Learning and Representing Object Shape Through an Array of Orientation Columns

Hui Wei, Qiang Li, and Zheng Dong

Abstract— Recognizing an object against its background is always a challenging task in pattern recognition, especially when the size, pose, or illumination of the object or the background is changing. The most essential way of handling this classical problem is to learn and define the structure of an object using its topological or geometrical features and components. To create a data structure that can describe the spatial relationships of object components formally, and to join knowledge learning and knowledge application in a seamless loop, a representation platform must be developed. This platform can serve as a shared workspace for both learning and recognition. In this paper, the platform is established by simulating the primary functional modules of the biological primary visual cortex (V1). V1 is located at the middle level of the visual information processing system. As the conjunction of low-level data and high-level knowledge, it performs visual processing for general purposes. Orientation columns in V1 are simulated in our platform, and an array of such columns is designed to represent the orientation features of edges in an image. With this platform, formalized prototypes are designed to represent each typical view of an object and thus the object concept. Data- and concept-driven processing can alternate iteratively on this platform, so the processes of acquiring knowledge of an object and applying that knowledge coincide with each other. The experimental results show that our algorithm can learn from a small training set and can recognize the same types of objects in natural backgrounds without any preliminary information. This bioinspired representation platform offers a promising prospect for handling semantic-concerned problems that need prior knowledge.

Index Terms— Bidirectional processing, neural modeling, object representation, orientation column.

I. INTRODUCTION

Currently, there are two dominant methods of object recognition: the model-based method and the appearance-based method [1]. The latter is more popular because it requires fewer preprocessing stages, such as image enhancement, contour detection, segmentation, figure–ground separation, knowledge-based analysis, and topological analysis, which lightens the burden of detailed image analysis.

Manuscript received March 19, 2013; accepted November 21, 2013. Date of publication January 10, 2014; date of current version June 10, 2014. This work was supported by the 973 Program under Project 2010CB327900, in part by the National Science Foundation of China under Project 61375122, Project 81373556, and Project 30990260, and in part by the National Twelfth Five-Year Plan for Science and Technology under Project 2012BAI37B06. H. Wei is with the Department of Computer Science, Laboratory of Cognitive Model and Algorithm, Brain Science Research Center, Fudan University, Shanghai 200433, China (e-mail: [email protected]). Q. Li and Z. Dong are with the Department of Computer Science, Laboratory of Cognitive Model and Algorithm, Fudan University, Shanghai 200433, China (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2013.2293178

Fig. 1. Matching task for kindergarten children.

In this method, an image is usually described using pixel-level features, and sample training is used to acquire some kind of implicit classification standard. Good generalization can then be expected if proper statistical learning algorithms are used. Although this kind of method can achieve image classification and image retrieval, it is rather vague with respect to the semantics of an object. In conventional learning stages, tags or labels are assigned to images to describe their semantics, and a mapping is therefore learned between tags and images. This does answer the question of whether an image is similar to a benchmark, but it cannot answer the questions of what the target object is, where it is, or how many objects there are. For example, an image may be tagged "a buffalo in a river," but tags such as buffalo, river, and bank do not need to be defined explicitly. This leaves the nature of a buffalo, what it is, somewhat uncertain. The model-based method, in contrast, is strict with respect to template definition, and it is closer to human cognitive processing.

A. Learning and Representing the Shape of an Object Are Crucial to Image Understanding

Defining the semantics of an object using its structure or shape is more natural than defining the semantics using tags. The topological or geometric structure of an object can tolerate changes in size, pose, position, color, illumination, and background, which improves invariant recognition. Fig. 1 shows a task assigned to children. In this figure, there are balls and holes. Children are asked to: 1) find the biggest and the smallest ball; 2) find out which two of the balls are of the same size; and 3) find a proper hole for every ball. Learning what a ball is and recognizing its shape are critical to the fulfillment of this task. The appearance-based method is weak here because pixel-level features are insufficient for describing geometrical information. Once several positive samples of a type of object are available, inductive learning can be used to draw a prototype of the object.

From a psychological point of view, the prototype is an inner representation that must be memorized. For a computer program, formal data structures are used to describe that prototype. One direct method is to choose some kind of knowledge representation scheme, such as semantic networks, production rules, frames, or ontologies. From the point of view of knowledge engineering, however, a good knowledge representation scheme must be able to smoothly join two levels of knowledge: the knowledge acquired in the learning phase and the knowledge used in the reasoning phase. The knowledge concerned in this paper pertains to the structural characteristics of an object. These characteristics should be the result of learning, but they should also be usable as evidence in recognition processes. Although it is possible to define structure-related knowledge manually using knowledge representation tools such as production rules, doing so breaks the natural transition from learning to recognition, because the form of knowledge acquired during the learning phase can be out of joint with the form required in recognition. Therefore, a representational form that runs through the whole process of perception in a natural and uniform way should be developed. The goal of this paper is to determine the structure of an object, represent it in an efficient form, and use this prototype or template to recognize other samples.

B. Related Work

This paper concerns two types of studies: 1) object representation and recognition and 2) computational simulation of the primary visual cortex (V1). With respect to object definition, four previous studies used chain codes to represent object contours [2]–[5]. The major limitation of this approach is that the chain code is too small in scale to reflect any macrofeature, such as the slope of a line, and it incurs considerable data storage costs for contour representation. Another three studies used spline curves to fit object contours [6]–[8]. The major limitation of this method is that the accuracy with which the shape is represented is highly dependent on the selection of the point set and the coordinate system, both of which have direct and marked effects on the fitting computation. Some previous studies used polygon approximation for shape representation [9]–[11]. The major limitation of this approach is that it is posture and size sensitive, which affects the forming of the polygons. In [12]–[15], scale-space features were used to represent original images. The major limitation of this method is that color, brightness, and noise may make the object difficult to describe and make it impossible to produce accurate descriptors for parts of the object. In [16]–[18], the use of global geometrical features, such as area, perimeter, major axis direction, eccentricity, compactness, and convexity, was described. However, these measures cannot describe the shape of an object accurately, because shape is not necessarily related to any of them. Several other studies describe experiments based on Hu-moment, Fourier, wavelet, or geometric spectrum descriptors [19]–[22]. The major limitations of these methods are color, brightness, and noise sensitivity; they are also limited for intensive analysis of position, posture, and size, because their parameters, such as moment values or Fourier coefficients, usually obscure this information.


Some previous studies used templates [23]–[25]. This strategy is consistent with psychological theory [26] and can actually reflect human cognitive processing; this paper also follows this theory. With respect to V1 modeling, much simulation-related work has been done. In [27]–[31], biological cortical maps were used. In [32]–[40], computational cortical maps were used. Such a map is also a concern in this paper, and its efficiency with respect to image representation is addressed here. In [41]–[43], hardware chips were used to simulate V1. In [44], a supercomputing-based method was proposed. In [45]–[49], some V1 mechanisms were simulated and used for invariance recognition for special purposes. However, these simulations were somewhat crude, and some basic physiological constraints were ignored. In [50]–[52], the modeling of functional columns of V1 for neurobiological purposes was described. Several more studies also simulated the functional columns of V1 for task-specific applications [53]–[59]. The purpose of this paper is somewhat different from those of the previous studies. The present model of V1 has more computational reliability; on the other hand, it can be used in a more general way for image representation. The diverse vision-dependent tasks have not yet diverged at the V1 stage, so a representation schema inspired by V1 should also be multipurpose. The limitations of the previous works highlight the value of the current model.

C. Paper Organization

Sections I-A and I-B summarize the motivation behind this paper. Section II discusses the modeling of a single orientation column after a brief introduction of the biological ice-cube model; an artificial column array is then designed for representing edge images, in which hexagons at two scales are used to construct the physical model of the column array, and the output of the column array is vectorized lines. Section III suggests a line-context descriptor and discusses how to use descriptors to represent an object; this process turns raw image data into structured vector data. Section IV details how to learn features from multiple samples and how to form different prototypes of a type of object. Top–down processing is crucial to object recognition, and Section V addresses how bidirectional processing works based on this new representation schema. Section VI describes the present set of experiments, including a representation stability experiment, an object detection statistical experiment, and an active processing experiment. Finally, Section VII presents the conclusions and prospects for future work.

II. ORIENTATION COLUMNS MODELING

Biological visual systems, especially those of higher mammals, have been profoundly optimized by natural evolution, and their tradeoff between system complexity and function is ideal. This paper simulates the primary visual cortex (V1) because of its hub role in visual information processing. All information from the lower retina comes here, and all information that is eventually sent to the higher cortex starts here. The information stored in the primary cortex is sufficient for the upcoming advanced processing.


Fig. 2. Ice-cube model of the cortex (in terms of [62]).

A. Neurophysiological Mechanism of the Orientation Column

In the V1 area of the striate cortex, many cells respond strongly to specific orientations [39], [60]; these are called simple cells. This kind of orientation selectivity was studied by Hubel and Wiesel [61], who won the 1981 Nobel Prize. They suggested the concept of an orientation column in V1, in which the orientation selectivity of the simple cells within a column varies systematically and adjacent cells share approximately the same orientation preference [40]. Hubel and Wiesel proposed that this column should be the basic unit of functional organization. Fig. 2 shows columns redrawn in terms of the results in [62]. It shows the ice-cube model of the cortex and illustrates how the cortex is divided into two kinds of slabs, one set for ocular dominance (L here is left and R is right) and one set for orientation [63].

B. Hint for Computational Model Design

From the perspective of computer programming, the ice-cube model discussed in the previous section is an ideal data structure for recording orientations. One column is considered the primary functional unit, and an array of columns can serve as a representation platform. This is similar to the duplicated micromodules that can be found in V1 by visual inspection of its anatomical structure. An edge detection algorithm outputs a boundary image, and boundaries serve as lines, also called orientations in some studies. The ice-cube model inspired us to design a similar array to record the orientation distribution of an image, and we believe that contour information must be included in this distribution. The significance of this artificial array is not limited to the modeling of simple cells in V1. It can also be used to create new image representation schemes based on contour information. This is completely different from other representation schemes, such as histogram-based, Fourier-transform-based, wavelet-based, Scale Invariant Feature Transform (SIFT)-descriptor-based, and basis-function-based schemes, because orientation is a higher order statistical feature and can reflect more geometrical attributes in a local region than other features can.

C. Modeling of Orientation Columns

Let us begin with the design of a single unit that simulates the orientation column in V1. Each orientation column is in charge of a very small part of the stimulus, called the receptive field of the column. Any orientation may vary from 0° to 180°.

Fig. 3. Every orientation column includes 19 chips, and every chip is responsible for a special orientation that appears in the receptive field of the orientation column.

This closed interval is divided into subintervals, each spanning about 10°. In this way, the orientations that are grouped into a common subinterval have approximately the same slope. In V1, many small chips (simple cells) have evolved to stand for these orientations one by one; that is, once an orientation occurs, a corresponding chip fires. This is called the orientation selectivity of the simple cell. An orientation column assigns a chip to represent each possible subinterval of slope; these chips are called orientation chips. All of the orientation chips that share a common receptive field and vary their sensitive slopes from 0° to 180° make up a column. According to the preceding principle, an orientation column can be designed in the form of a large hexagon composed of 19 smaller hexagons. Each small hexagon is an orientation chip, and each orientation chip has a receptive field inherited from its column. Three neighboring columns are shown in Fig. 3 (right). Their receptive fields overlap somewhat, and they share some low-level ganglion and lateral geniculate nucleus cells. Each of these cells obtains color and grayscale information from a pixelated image. A hexagon was chosen as the basic module unit because: 1) the slope span between every two neighboring orientation chips is (180/19)°, which is sufficiently fine and precise, and 2) 19 small hexagons can cover a big hexagon completely, which makes the array of orientation chips compact, regular, and orderly. Constructing a dense array of small hexagons is exacting work, and it must be confirmed that the coverage of the visual field is continuous. This makes it easier to reorganize the 19 small hexagons with the same receptive field into an orientation column. To guarantee that no information is missed, neighboring orientation columns were given overlapping receptive fields. In this paper, every column has a receptive field of 9 × 9 pixels, which is shared by all chips in the column. The organization of the receptive fields is shown in Fig. 4. On the right are seven orientation columns and on the left are their overlapping receptive fields. The Roman numerals show the one-to-one relationship between a given column and its own receptive field. Fig. 4 shows a conceptual model. The hexagon array is not conventional; orthogonal grids are usually preferable because they are easy to store. Fig. 5 shows how these orientation columns are laid out in memory. This kind of transformation serves the data structure design. More implementation details can be found in [64].
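To make this design concrete, the following Python sketch (ours, not from the paper) shows how a single column could quantize an edge orientation into one of its 19 chips and how columns could be laid out over 9 × 9 pixel receptive fields; the names and the non-overlapping receptive fields here are simplifying assumptions.

```python
import math

N_CHIPS = 19            # orientation chips per column
SPAN = 180.0 / N_CHIPS  # each chip covers a subinterval of about 9.5 degrees
RF_SIZE = 9             # receptive field of 9 x 9 pixels per column

def chip_index(orientation_deg):
    """Map an edge orientation in [0, 180) degrees to one of the 19 chips."""
    return int(orientation_deg % 180.0 // SPAN)

def chip_preferred_orientation(index):
    """Center orientation (degrees) that a given chip is most sensitive to."""
    return (index + 0.5) * SPAN

def column_of_pixel(x, y):
    """Which column's receptive field a pixel falls into (non-overlapping
    approximation; the paper's columns actually overlap slightly)."""
    return (x // RF_SIZE, y // RF_SIZE)

if __name__ == "__main__":
    theta = 37.0                                 # an edge orientation in degrees
    k = chip_index(theta)
    print(k, chip_preferred_orientation(k))      # chip 3, preferred ~33.2 degrees
    print(column_of_pixel(22, 75))               # column (2, 8)
```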


Fig. 4. Organization of overlapping receptive fields. (a) Overlapping RFs. (b) Orientation columns.

Fig. 5. Storage model of neighboring orientation columns.

Fig. 7. Training and neural physiological results. (a) Neural physiological result of an orientation sensitivity map. (b) and (c) Simulated results by training our model.

Fig. 6. Orientation representation array initialized by training. (a) Part of an orientation array. (b) Examples of pinwheel locations.

D. Array of Orientation Columns

The purpose of designing a V1-inspired orientation column is to use it to extract orientations within a small region. In effect, it is a kind of feature descriptor that exhausts a limited space of orientations. For an extended area, an array of orientation columns can be expected to act as the infrastructure for image processing. The principle of using multiple column units to construct an array, just like splicing tiles on a floor, is shown in Figs. 3 and 4. The manner in which an orientation column array works is also addressed. First, the array must respond to orientation stimuli correctly. Using Kohonen's self-organizing map (SOM) algorithm, all orientation chips were trained using many real lines [64]. The advantages of this strategy are that: 1) the SOM algorithm has a strong brain-like computational background and 2) the mode of its training samples coincides with the prospective mode of its stimuli in a real image. The validity of using SOM to simulate the cortical map has also been confirmed in purely neurobiological studies [65]. Fig. 6(a) shows a part of an array after training, in which every chip has developed its own sensitive orientation. The effect of partially shared receptive fields can be seen at the pinwheel locations marked by red circles in Fig. 6(b). This phenomenon is an important characteristic of the biological V1 cortex [66], [67].

Fig. 8. Image represented by a group of active orientation chips.

It shows that the distribution of feature detectors tends to be continuous in any local region, which guarantees that no representation will be missed, regardless of where the object appears. Fig. 7(a) shows a map of orientation sensitivity obtained in an electrophysiological experiment [60]. Fig. 7(b) and (c) shows the results of the present simulation. As in Fig. 7(a), they show the orientation selectivities of chips in false color. Regarding the mode of organization of the functional units, the artificial results are similar to the natural results in the following respects: 1) no type of orientation selectivity is dominant or densely distributed; 2) all orientation selectivities are equally balanced; and 3) in any continuous block, all orientation selectivities are well proportioned. In this way, it can be expected that, in a computational sense, if the natural cortical map can record orientation information efficiently and reliably, then the artificial map can as well. Fig. 8 shows an example of how an image is represented. On the left is a stimulus that includes lines. On the right is an orientation column array, wherein each chip has a specific sensitive direction. The three circles show three columns, and their corresponding receptive fields are represented by the three dotted-line rectangles on the left. The small black hexagons represent chips that are activated by bars. Boundary information is very important for any shape-based algorithm. The edges output by algorithms such as the Canny operator can be considered fine bars, so a column array can be used to extract orientation information from a scene. The representation through an orientation column array is much more condensed than a representation through pixels, because orientations and lines are more informative than pixels and the local topological information of the image is well preserved. Orientation is a higher order statistical feature in that it is more directly concerned with the shape of an object, and it is used here to describe the geometric and structural characteristics of objects.
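As a rough illustration of this data flow, the sketch below (not the authors' code) turns an edge image into a set of active orientation chips: each 9 × 9 receptive field activates the chip whose preferred orientation is closest to the dominant local edge direction. A plain gradient filter stands in for the Canny operator, and all names are illustrative.

```python
import numpy as np

N_CHIPS, RF = 19, 9  # chips per column, receptive field size in pixels

def active_chips(image, mag_thresh=0.1):
    """Return {(col_row, col_col): chip_index} for columns that see an edge.

    A simple gradient filter stands in for the Canny detector used in the
    paper; each 9 x 9 receptive field activates the single chip whose
    preferred orientation is closest to the dominant local edge orientation.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Edge (tangent) orientation is perpendicular to the gradient, in [0, 180).
    ori = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0

    actives = {}
    rows, cols = image.shape
    for r in range(0, rows - RF + 1, RF):
        for c in range(0, cols - RF + 1, RF):
            m = mag[r:r + RF, c:c + RF]
            if m.max() < mag_thresh:
                continue  # no edge in this receptive field
            dominant = ori[r:r + RF, c:c + RF][np.unravel_index(m.argmax(), m.shape)]
            actives[(r // RF, c // RF)] = int(dominant // (180.0 / N_CHIPS))
    return actives

if __name__ == "__main__":
    img = np.zeros((45, 45))
    img[:, 20:] = 1.0          # a vertical boundary in the image
    print(active_chips(img))   # columns on the boundary report a near-vertical chip
```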


Fig. 9. Line-context descriptor. (a) Constructing a polar coordinate system on an active chip. (b) Using a spiral coordinate system to sort columns.

Fig. 10. Determining which bin a column belongs to.

TABLE I
LINE-CONTEXT MATRIX

III. TYPE OF LINE-CONTEXT DESCRIPTOR

The process in which an edge image is represented by dozens of active orientation chips is like using letters (chips) to describe an object. It may be possible to upgrade them to words or sentences, which means that these active orientation chips must be organized.

A. Revised Shape-Context Descriptor

Although orientation chips have been shown to be able to integrate information to some extent in local areas, they are still too fine in scale to use over large areas. Here, the well-known shape-context descriptor is revised [68]–[70]. A new descriptor, called a line-context descriptor here, is designed to describe the distribution of active chips. Fig. 9 shows an array of orientation columns in which some of the chips, represented by black hexagons, have been activated by a stimulus. On any active chip, a polar coordinate system whose starting axis coincides with this chip's sensitive slope can be constructed. The positive direction coincides with the direction in which the columns are sorted by a spiral coordinate system, as shown in Fig. 9(b). As in the shape-context method, the whole area of the polar coordinate system can be divided into 4–6 rings, and each ring is evenly divided into 6–8 sectors. Every sector is a bin in which the active chips can be counted as follows. An important technical matter here is determining which bin a column belongs to while the polar coordinate system changes with the center chip's position and slope. Fig. 10 shows a solution. Suppose the center of the polar coordinate system is column A(x0, y0), and column B(x1, y1)'s ownership is to be determined. Let α be the sensitive orientation of the center chip, |bin| the scope of a bin after the coordinate system is rotated, and r the length of one side of a hexagon. Then, the ring index i_ring and the sector index j_sector to which column B belongs are calculated using the following equations:

\[
i_{\text{ring}} = \left\lceil \frac{\sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}}{\sqrt{\left(\frac{9r}{2}\right)^2 + \left(\frac{7\sqrt{3}\,r}{2}\right)^2}} \right\rceil
\]

\[
j_{\text{sector}} = \left\lceil \frac{|\text{bin}|}{2\pi}\left( \sin^{-1}\frac{x_1 - x_0}{\sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}} - \alpha \right) \right\rceil .
\]
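Read literally, the two index computations above could be coded as follows; the function names and the interpretation of |bin| as the number of sectors per ring are our assumptions.

```python
import math

def ring_index(x0, y0, x1, y1, r):
    """Ring that column B = (x1, y1) falls into, seen from center column A = (x0, y0).

    The denominator is the center-to-center spacing of neighboring 19-hexagon
    columns whose small hexagons have side length r.
    """
    dist = math.hypot(x1 - x0, y1 - y0)
    ring_width = math.hypot(9 * r / 2, 7 * math.sqrt(3) * r / 2)
    return math.ceil(dist / ring_width)

def sector_index(x0, y0, x1, y1, alpha, n_sectors):
    """Sector of column B after rotating the polar frame by the center chip's
    sensitive orientation alpha (radians); n_sectors plays the role of |bin|."""
    dist = math.hypot(x1 - x0, y1 - y0)
    angle = math.asin((x1 - x0) / dist) - alpha
    return math.ceil(n_sectors * angle / (2 * math.pi))

if __name__ == "__main__":
    print(ring_index(0.0, 0.0, 40.0, 25.0, r=3.0))                       # e.g. ring 3
    print(sector_index(0.0, 0.0, 40.0, 25.0, alpha=0.3, n_sectors=8))    # e.g. sector 1
```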

Like the shape-context algorithm, this algorithm also collects statistics from these bins. Suppose the polar coordinate system has four concentric rings and each of them is divided into eight sectors [Fig. 9(a)]. Centering it at an active chip C0, the line-context descriptor (LD) is the matrix

\[
\mathrm{LD}(C_0) = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix}_{4\times 8} \tag{1}
\]

where b_{ij} is a 1-D histogram that represents the distribution of the relative orientations of the active chips in bin_{ij}. The relative orientation of an active chip is the difference between its own sensitive orientation and that of the center chip C0. The histogram is defined as a vector b_{ij} = (h_1, h_2, ..., h_19), where

\[
h_k = \left|\left\{ C \;\middle|\; C \in \mathrm{Active}_{bin_{ij}} \text{ and } \frac{(k-1)\pi}{19} \le |O_C - O_{C_0}| \le \frac{k\pi}{19} \right\}\right| \quad \text{for } k = 1, 2, \ldots, 19,
\]

Active_{bin_{ij}} is the set of active chips in bin_{ij}, and |O_C − O_{C_0}| is the orientation difference between a chip C and the center chip C0. Therefore, what we finally obtain is a matrix like Table I. For example, the number 3 in the second row and the third column means that in bin1,2 (i.e., sector 2 of ring 1), there are three active chips whose orientation difference relative to the center chip is less than 3π/19 and greater than 2π/19. The line-context descriptor is thus a 3-D matrix, which presents a kind of feature describing how the surrounding lines are distributed around a central one. Fig. 11 shows a visualization of a line context.
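The sketch below shows, in a self-contained way, how such a 4 × 8 × 19 line-context matrix might be accumulated from a list of active chips; for brevity it uses plain Euclidean rings and angular sectors instead of the hexagon-lattice bin assignment above, so it is only an approximation of the descriptor described in the text, and all names are illustrative.

```python
import math
import numpy as np

N_RINGS, N_SECTORS, N_ORI = 4, 8, 19  # rings, sectors per ring, orientation intervals

def line_context(center, chips, ring_width=25.0):
    """Build a 4 x 8 x 19 line-context matrix for one active chip.

    center and every element of chips is (x, y, orientation), with orientation
    in radians within [0, pi). Simplified polar binning: rings by Euclidean
    distance, sectors by angle measured from the center chip's orientation.
    """
    cx, cy, c_ori = center
    lc = np.zeros((N_RINGS, N_SECTORS, N_ORI), dtype=int)
    for x, y, ori in chips:
        if (x, y) == (cx, cy):
            continue  # skip the center chip itself
        dist = math.hypot(x - cx, y - cy)
        ring = min(int(dist // ring_width), N_RINGS - 1)
        angle = (math.atan2(y - cy, x - cx) - c_ori) % (2 * math.pi)
        sector = min(int(angle / (2 * math.pi / N_SECTORS)), N_SECTORS - 1)
        rel_ori = abs(ori - c_ori) % math.pi            # relative orientation in [0, pi)
        k = min(int(rel_ori / (math.pi / N_ORI)), N_ORI - 1)
        lc[ring, sector, k] += 1
    return lc

if __name__ == "__main__":
    chips = [(10.0, 5.0, 0.3), (40.0, 40.0, 1.2), (70.0, 10.0, 2.9)]
    descriptor = line_context((0.0, 0.0, 0.5), chips)
    print(descriptor.sum(), descriptor.shape)  # 3 chips counted, shape (4, 8, 19)
```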


Fig. 11. Selected active chip and the visualizations of its line-context matrix.

Fig. 12. Some standard learning samples of car images.

One active chip (in black) was selected, and its line context is shown at the right as a 3-D bar graph. The x-, y-, and z-axes show the bin level, the interval of relative orientation, and the number of active chips, respectively. This descriptor can be local or global, depending on how far the polar coordinate system is extended.

IV. MULTIPROTOTYPE REPRESENTATION OF AN OBJECT

The views of an object under different postures are always different. Accordingly, multiple prototypes with different structural characteristics can be formed for those typical views. The theory of multiple prototypes is helpful for invariant recognition, so we want our program to learn some prototypes first and then apply them in future recognition.

A. Standard Samples for Prototype Learning

The learning process here is to have our program grasp the essential structural characteristics of an object in different postures. For accurate learning, standard images were selected for use as learning samples. For example, Fig. 12 shows some clean pictures from the ETH-80 image database. They can be considered typical images of this kind of car, and they were chosen for the construction of prototypes through learning. These pictures were subjected to edge detection and uploaded to the orientation column array for representation.

B. Inductive Learning of Prototypes

Once an image has been processed by the orientation column array, a honeycomb-like output becomes available. The line-context algorithm is then applied to the activated orientation chips, and a number of line-context descriptors are produced. This means that a learning sample has been decomposed into a set of descriptors. If, from the point of view of a relational database, the descriptors are considered features, then any set of descriptors can be treated as a sheet of entries or a Microsoft Excel table. In this way, all processed learning samples can be stored in the form of a database sheet, which fits the goal of this paper exactly. The consequent benefit is immediate and significant because multiple prototypes can now be represented in a single table whose form is regular and structured.

Fig. 13. Correspondence of descriptors in multiple samples.

A pixel image is usually considered unstructured raw data, but it can here be converted into a structured item. To produce a better structured sheet, the degrees of similarity between different samples must be established; only by doing so can shared structural features be identified. Fig. 13 shows the reason for this. It shows six learning samples, and the circled numbers show the center positions of line-context descriptors. Pairs of descriptors shown in the same color should match each other. Only matched descriptors can be identified as features, because they occur repeatedly in more than one instance and do not do so by accident. The degree to which descriptors match one another must therefore be determined. A measure of similarity between two descriptors can be defined: it involves measuring the similarity between two corresponding bins and then adding the weighted values of all bin-to-bin similarities together to produce a value representing descriptor-to-descriptor similarity. The similarity between two bins is defined as follows:

\[
S_{\text{bin}}(b, b') = 1 - \frac{1}{19}\sum_{k=1}^{19}\frac{|h_k - h'_k|}{\max(h_k, h'_k)} \tag{2}
\]

where each bin is a histogram over 19 intervals, b = (h_1, h_2, ..., h_19) and b' = (h'_1, h'_2, ..., h'_19). The similarity between two line-context descriptors is the weighted sum of the similarities of corresponding bins

\[
S_{\text{descriptor}}(\mathrm{LD}, \mathrm{LD}') = \sum_{i=1}^{4}\sum_{j=1}^{8} w_{ij} \times S_{\text{bin}}(b_{ij}, b'_{ij}) \tag{3}
\]

where w_{ij} is an element of a predefined weight matrix and b_{ij} and b'_{ij} are elements of the descriptor matrices LD and LD', as defined by (1) in Section III-A.
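A literal Python reading of (2) and (3) might look like the following; the guard against empty histograms and the uniform weight matrix are our assumptions, since the paper only states that the weights are predefined.

```python
import numpy as np

def bin_similarity(b, b2):
    """Eq. (2): 1 minus the mean normalized absolute difference over 19 intervals."""
    b, b2 = np.asarray(b, float), np.asarray(b2, float)
    denom = np.maximum(np.maximum(b, b2), 1e-9)   # guard against empty bins (assumption)
    return 1.0 - np.mean(np.abs(b - b2) / denom)

def descriptor_similarity(ld, ld2, weights=None):
    """Eq. (3): weighted sum of bin similarities over the 4 x 8 bins.

    ld and ld2 are 4 x 8 x 19 arrays as built for a line-context descriptor.
    """
    if weights is None:
        weights = np.full((4, 8), 1.0 / 32.0)     # uniform weights (assumption)
    total = 0.0
    for i in range(4):
        for j in range(8):
            total += weights[i, j] * bin_similarity(ld[i, j], ld2[i, j])
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(0, 4, size=(4, 8, 19))
    print(descriptor_similarity(a, a))            # identical descriptors -> 1.0
    b = rng.integers(0, 4, size=(4, 8, 19))
    print(descriptor_similarity(a, b))            # smaller value for different descriptors
```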


Algorithm 1 Line-Context Descriptor Alignment(pool)
Require: pool is a set of line-context descriptors {LD_{i,j}}, where i = 1, ..., M is the index of samples and j = 1, ..., N_i is the index of line-context descriptors of the ith sample
Ensure: each line-context descriptor in pool is assigned a class label
  class ← 0
  for LD_{i,j} ∈ pool do
    if LD_{i,j}.class_label is not assigned then
      class ← class + 1
      LD_{i,j}.class_label ← class
      for LD_{i',j'} ∈ pool, where i' > i do
        if S_descriptor(LD_{i,j}, LD_{i',j'}) > 0.7 then
          LD_{i',j'}.class_label ← class
        end if
      end for
    end if
  end for
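A compact Python transcription of Algorithm 1 is sketched below; the pool layout and the 0.7 threshold follow the listing, while the similarity function is supplied by the caller (in the paper it is S_descriptor from (3)), and all names are ours.

```python
import numpy as np

def align_descriptors(pool, similarity, threshold=0.7):
    """Algorithm 1: group similar line-context descriptors across samples.

    pool is a list of lists: pool[i] holds the descriptors of sample i.
    Returns labels[i][j], the class label of descriptor j of sample i.
    """
    labels = [[None] * len(sample) for sample in pool]
    next_class = 0
    for i, sample in enumerate(pool):
        for j, ld in enumerate(sample):
            if labels[i][j] is not None:
                continue
            next_class += 1
            labels[i][j] = next_class
            # Compare only against descriptors of later samples, as in the listing.
            for i2 in range(i + 1, len(pool)):
                for j2, ld2 in enumerate(pool[i2]):
                    if labels[i2][j2] is None and similarity(ld, ld2) > threshold:
                        labels[i2][j2] = next_class
    return labels

if __name__ == "__main__":
    # Toy similarity on plain vectors; in the paper this role is played by (3).
    sim = lambda a, b: 1.0 - np.abs(np.asarray(a) - np.asarray(b)).mean()
    pool = [[[0.1, 0.2], [0.9, 0.9]],      # sample 1: two descriptors
            [[0.12, 0.18], [0.5, 0.4]]]    # sample 2: two descriptors
    print(align_descriptors(pool, sim))    # matching descriptors share a label
```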

TABLE II
SHEET OF SAMPLES DEFINED BY FEATURES

Fig. 14. One branch of a recognition decision tree based on data set COIL-20.

C. Applying a Decision Tree Algorithm

Suppose that there are M samples available for learning an object, and each sample is a typical prototype. After orientation column array processing and line-context representation, a set {LD_{s,j} | 1 ≤ s ≤ M and 1 ≤ j ≤ N_s} is produced, where N_s is the number of descriptors obtained from the sth sample and LD_{s,j} represents the jth line-context descriptor of the sth sample. After expanding this set into lists, its data structure looks like this:

pool = { sample_1 : LD_{1,1}, LD_{1,2}, ..., LD_{1,N_1}
         sample_2 : LD_{2,1}, LD_{2,2}, ..., LD_{2,N_2}
         ...
         sample_M : LD_{M,1}, LD_{M,2}, ..., LD_{M,N_M} }

This is called a pool of line-context descriptors. Based on the similarity defined in (3), descriptors in different samples that actually map to each other can be identified, which reveals which descriptors occur frequently. Algorithm 1 is used for this purpose. After the pool of line-context descriptors is processed by the alignment algorithm, all line-context descriptors in the pool are classified into dozens of subsets. Each subset contains multiple similar line-context descriptors that come from different samples. If a subset is considered a feature, then its size means that this feature can be observed in more than one prototype of this object. If these features are arranged in a table, then a well-structured sheet can be formed. Finally, ordinary images can be converted into a Boolean sheet attributed by geometrical or topological features. An example is shown in Table II. Refer to Fig. 13: after descriptor clustering, the matched descriptors are marked by numbers with the same value and color, and each descriptor's geometrical meaning is obvious. This constitutes important progress in transforming a geometrical problem into a logical one, in that this well-formed representation opens the door to many other mature machine learning algorithms, all of which require structured data.

The data presented in Table II strongly suggest that a decision tree algorithm would be suitable. This table can be used to train and build a decision tree, which is not only the classifier for new samples but also an explicit knowledge representation form, in that it reveals the logical associations between multiple prototypes and observable features (or evidence). Fig. 14 shows one of the branches of a real decision tree, produced from four types of objects selected from the COIL-20 data set using WEKA's J48 decision tree tool. This algorithm is well developed, so its details are omitted here. It must be emphasized that the decision tree algorithm itself is not our goal, because it has been studied thoroughly elsewhere. What we are interested in is the course of transforming a bitmap image into a Boolean sheet and making a decision tree algorithm run at such a symbolic level. Reasoning methods using this symbolic representation are worth further investigation.

V. CONCEPT-DRIVEN PROCESSING

Cognitive psychology holds that perception is a dual-directional and iterative process. Normally, the data-driven process is well studied, but concept-driven processes are not. The key to concept-driven processing lies in two points: the first is a proper high-level representation form for concepts, and the second is a seamless route for feeding back top–down controls. Fig. 15 shows the present solution. Once a concept has been learned, according to the prototype theory given in the previous section, it is defined using multiple geometric features. In actuality, these features are descriptors (top-right corner of Fig. 15). Each descriptor records a particular distribution of short lines located at specific positions. In this way, an active descriptor can conversely project its expectation onto the column array and trigger a comparison between the actual stimulus and the expectation.

The data presented in Table II strongly suggest that a decision tree algorithm would be suitable. This table can be used to train and build a decision tree, which is not only the classifier for new samples but also an explicit knowledge representation form in that it reveals the logical associations between multiple prototypes and observable features (or evidence). Fig. 14 shows one of the branches of a real decision tree, which is produced by four types of objects selected from the COIL-20 data set and based on WEKAs J48 decision tree tool. This algorithm is well developed, so the details regarding that algorithm are omitted here. It must be emphasized that the decision tree algorithm itself is not our goal because it has been carefully studied. What we are interested in is the course of transforming a bitmap image into a Boolean sheet and making a decision tree algorithm run on such a symbolic level. Some reasoning methods using symbolic representation are worth further investigation. V. C ONCEPT-D RIVEN P ROCESSING Cognitive psychology dictates that perception is a dualdirectional and iterative process. Normally, the data-driven process is well studied, but concept-driven processes are not. The key to concept-driven processing lies in two points. The first is a proper high-level representation form for concepts, and the second is a seamless route for feeding back top–down controls. Fig. 15 shows the present solution. Once a concept has been learned, according to the prototype theory given in the previous section, it must be defined using multiple geometric features. In actuality, these features are descriptors (top-right corner of Fig. 15). Each descriptor then records a special distribution of short lines, which are located at some specific positions. In this way, an active descriptor can conversely project its expectation to a column array and trigger a comparison between the actual stimulus and the expectation. Then, some pixel-concerned


Fig. 15. Concept-driven processing model.

Algorithm 2 Bidirectional Recognition(D)
Require: D is a set of line-context descriptors to recognize; conc is an array of concepts; each concept is a symbolic representation of an object class; the symbolic representation could be a decision tree as described above but is not necessarily restricted to it; threshold is a threshold value for the matching ratio and its default value is 75%
Ensure: the return value is the concept recognized from D
  for c ∈ conc do
    ratio ← Matching-Ratio(D, c)
    if ratio > threshold then
      return c
    end if
    Rank concept c with ratio
  end for
  for c ∈ conc by decreasing order of ranks do
    repeat
      Project unmatched descriptors in concept c downward
      Reorganize raw data with new hypothesis
      Update the set of descriptors D
      ratio ← Matching-Ratio(D, c)
    until ratio > threshold or maximum iterations exceeded
    if ratio > threshold then
      return c
    end if
  end for
  return null {Showing a failure of recognition}
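A Python sketch of this control flow is given below; matching_ratio and refine are placeholders for the Matching-Ratio and projection/reorganization steps named in the listing, not the authors' implementation, and the demo values are purely illustrative.

```python
def bidirectional_recognition(descriptors, concepts, matching_ratio,
                              refine, threshold=0.75, max_iters=5):
    """Sketch of Algorithm 2.

    matching_ratio(descriptors, concept) -> fraction of the concept's
    descriptors found in the scene; refine(descriptors, concept) -> new
    descriptor set after projecting unmatched descriptors downward and
    reorganizing the raw data (both supplied by the caller).
    """
    # Bottom-up pass: accept any concept that already matches well enough.
    ranked = []
    for concept in concepts:
        ratio = matching_ratio(descriptors, concept)
        if ratio > threshold:
            return concept
        ranked.append((ratio, concept))

    # Top-down pass: try the best-ranked concepts first, refining the data.
    for _, concept in sorted(ranked, key=lambda rc: rc[0], reverse=True):
        current = descriptors
        for _ in range(max_iters):
            current = refine(current, concept)
            if matching_ratio(current, concept) > threshold:
                return concept
    return None  # recognition failed

if __name__ == "__main__":
    concepts = ["car", "mug"]
    mr = lambda d, c: d.get(c, 0.0)                               # toy matching ratio
    rf = lambda d, c: {k: min(1.0, v + 0.1) for k, v in d.items()}  # toy refinement
    print(bidirectional_recognition({"car": 0.6, "mug": 0.2}, concepts, mr, rf))
```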

Then, some pixel-level manipulations, such as re-searching, reselecting, rediscovering, and reorganizing pixels, can be performed (top–down arrow in the middle of Fig. 15). Each bin of a descriptor has a limited area of the pixel image for which it is responsible, so for the purpose of confirming a certain hypothesis, it can accurately project a search request onto this area (bottom of Fig. 15). Algorithm 2 describes this bidirectional recognition.

VI. EXPERIMENT

To prove the feasibility of the current method, a series of experiments was conducted.


Fig. 16. Diversity between different objects.

A. About the Representation Diversity Between Objects

The aim of this experiment is to show that object representation based on line-context descriptors possesses sufficient diversity to differentiate objects. We randomly selected five classes of objects from the COIL-20 data set. Each class includes five samples. The five samples are similar to each other and thus belong to the same prototype. The reason for such a selection of samples is that different views of an object generally have very different geometrical characteristics, which implies that a stable representation can only be reached under a single view (or prototype); different views are represented by different prototypes. The samples are first represented with descriptor vectors, and then the Euclidean distance between any two representation vectors can be calculated. Fig. 16 shows these samples and a curved surface that records the distances between vectors. Five areas, labeled A, B, C, D, and E, are shown in Fig. 16. They correspond to comparisons among objects in the same class, where the distances are small. Conversely, the other areas show comparisons between objects in different classes, where the large distances are obvious. It can therefore be concluded that this representation schema provides not only intraprototype similarity but also interprototype distinguishability and can be applied to object recognition.

B. About Recognition

After an image is transformed into a structured vector, almost any classification algorithm is suitable. A decision tree algorithm was selected here because: 1) it has a simple and clear architecture for decision making; 2) it allows convenient extraction of explicit knowledge for object recognition; 3) it is scalable and incremental; and 4) it is suitable for symbolic reasoning and reasoning with incomplete information. All of these properties are attractive for object recognition. The experiment presented here is somewhat different from conventional practice (splitting a fixed data set into a training part and a testing part), because the samples were not confined to a limited data set. Some standard prototypes were obtained from a small model, the decision tree was trained using these prototypes, and the tree was then tested using real instances in an actual scene. Fig. 17 shows this process. Fig. 17(a) shows a plastic F117 model. Its contour can be used as a learning sample.


Fig. 18. Statistical comparison based on the ETH80 data set. (a) Left image was cut, rotated, and reconstructed into a new image shown on the right. In our testing image database, 50% of the images were preprocessed like this. (b) Two images were matched using the SIFT features. Some features in the background, rather than those inside of the cars, were matched. (c) Two images were matched using the shape-context descriptors. Red and blue points: the contours of two cars. A link between a pair of points (red and blue) shows where they match. (d) P–R curves of three algorithms.

Fig. 17. Recognizing multiple instances from a real scene.

Fig. 17(b) and (c) shows two actual pictures, which include multiple F117s of different sizes. A flexible window was used to search each picture and capture blocks for line-context processing, and the result was then fed to the decision tree. Each red dotted box represents a sample, and the value attached to each box is the accumulated degree of feature difference obtained by comparing it with the decision tree nodes. We can see that real objects have significantly smaller values.

C. Comparison Using the ETH80 Data Set

Statistical comparison between different algorithms is necessary, but it requires a common image data set. Data sets such as COIL-20, ETH80, MPEG7, and the Princeton 3-D database include many samples of the same type of object, but all images have almost the same background, and a clean, uniform background can obviate the need for shape definition. As a tradeoff, the ETH80 data set was still chosen for our experiment because its colorful images benefit appearance-based methods. Here, some of the images were reconstructed. This involves cutting an original image into small blocks, rotating some randomly selected blocks, and then reshuffling and rearranging all of the blocks into a new image. Fig. 18(a) shows an example, which can be used to determine whether a recognition algorithm is dominated by local physical features. We then established a database for algorithm testing. First, we designated the car as the recognition target, and all car images came from ETH80.

Some 10% of the images in the present database are of cars. Second, the remaining 90% of the images were randomly selected from ETH80. Third, 50% of the samples in the present database were randomly selected for reconstruction. For comparison, the widely accepted SIFT descriptor [71] and the shape-context algorithm [70] were selected. The former is bioinspired and orientation concerned, and the latter is a well-known contour-oriented method. Fig. 18(b) and (c) shows the correspondences found between two car images using SIFT feature matching and shape-context matching. All three approaches were evaluated on our data set: the data set was classified into car images and noncar images with each approach, and classification errors (false positives and false negatives) were used as the measure of performance. Finally, precision–recall curves were drawn [Fig. 18(d)]. Our results are much better than those of the other two approaches. One explanation for the good performance of the present algorithm is its improvement in representation over existing methods: 1) the SIFT representation considers much more unimportant background information; 2) the shape-context representation is more sensitive to the number of pixel points than to their correlations; and 3) our representation is based on line vectors, which contain more geometrical information than the SIFT descriptor or the shape context's histogram of pixel counts.

D. Comparison Using the ETHZ Data Set

The ETHZ data set [72] is another widely used and well-studied database for object detection. In this data set, there are five types of objects, i.e., the Apple logo, bottle, giraffe, mug, and swan. In each category, there are more than 50 instances.


Fig. 19. Statistical comparison based on the ETHZ data set. (a)–(e) show the results over the five object categories, respectively.

Fig. 20. Top–down active processing. (a) Original model of a Ford Transit van, and its contour image. Red rectangles: specific areas, including dominant features such as wheels and windows. They are good for defining local line-context descriptors. (b) Twenty areas used to define line-context descriptors. (c) Real scene including two Ford Transit vans. They are different from the van in (a). (d) Three local areas are validated and shown to match predefined descriptors. (e) Nine descriptors are matched. (f) Only the front half of the second van is visible. According to those matched descriptors, the rear half of it has been rebuilt successfully using the learned concept Transit van and the rest of the descriptors.

These instances not only differ in appearance but also have different backgrounds. Many high-performance algorithms have been evaluated on this popular data set. The present algorithm was also executed on ETHZ, and the results were compared with those of [72]–[75]. The commonly used false positives per image (FPPI) versus detection rate (DR) curve is also used here. The x-axis is FPPI and the y-axis is DR.

Detection was considered to have failed if more than half of the candidate results in a given image were false positives, because DR can generally increase up to one as long as FPPI is unrestricted. For this reason, the proportion of false positives per image was required not to exceed 0.5. Fig. 19 shows the comparisons. Under the condition FPPI ≤ 0.5, Fig. 19(a)–(e) shows that the performance of our algorithm approaches that of the best algorithm shown.


Surveying the results of the recognition experiments in this and the previous sections, we conclude that the proposed object representation method can improve the performance of object recognition.

E. Active Processing

Bidirectional processing is an attractive challenge that has troubled pattern recognition studies for a long time. The difficulty lies in the top–down direction; specifically, it is difficult to apply high-level knowledge downward to low-level stimuli. A successful example using our algorithm is shown here. According to Algorithm 2, the learned line-context descriptors of a candidate object can be projected onto an image. An active search and comparison procedure was used to verify and validate the hypothesis. Fig. 20 shows this top–down processing. The present algorithm was first used to determine which descriptors in a given real scene matched the learned descriptors. Then, according to the hypothesis produced by those matched descriptors, the locations and identities of the remaining descriptors were predicted. From this example, it can be seen that: 1) the present method generalizes well, as shown by the fact that it can adapt to two totally different scenes, a learning model in one and a recognized sample in the other; 2) this ability to generalize is not gained by repetitious training on different cases but rather by learning intrinsic structural features; 3) this is a method of intensive analysis, which can be used to position an object's components accurately; and 4) invisible or hidden components of an object can be predicted from known evidence, which is enabled by active processing. In summary, the color, size, and environmental tolerances of this algorithm are good. If the diversity of prototypes is rich enough, then posture invariance can be achieved easily.

VII. CONCLUSION

In this paper, only simple mathematical tools were used for long-term machine learning. Neural mechanisms were simulated and a moderately complex representation schema was established. Because of the significant amount of evolution that it has undergone, the natural visual system is highly efficient and optimized. The present practice of simulating a column map of V1 can offer a fairly general representation and simplify downstream processing. Another difference from previous V1-inspired studies is that the present model was designed as a primary representation platform instead of a task-specific tool, and it was upgraded to the level of semantic representation and data- and concept-driven processing. There are always two important issues in computer vision and pattern recognition: 1) concept-driven processing and 2) the use of high-level knowledge. The root of the semantic gap problem is the break in the continuity of representations from raw data to formal concepts. The conventional approach involves formalizing only some of these concepts according to the particular requirements of a task. In human cognition, however, there is no such discontinuity. Here, the array of simulated orientation columns in the primary visual cortex is sparser than raw data and denser than formal concepts.

Fig. 21. Multiple representations.

The semantic responsibilities assigned to this array are heavier than those of raw data and lighter than those of formal concepts. In this way, it is a very good place, with the right granularity, for data and concepts to fully converge. Raw data from downstream are transformed into short lines (evidence), and concepts from upstream are decomposed into short lines (hypotheses). They are all displayed in the same manner, which greatly facilitates manipulation processes such as selecting and organizing evidence, comparing hypotheses with evidence, and suggesting new hypotheses. This makes active processing and the translation of high-level concepts into low-level features possible. In the present model, learning an object, representing it, and extracting this information are considered under a common architecture. In this way, a possible separation between the definition of a piece of knowledge and the use of that knowledge can be avoided or mitigated. Fig. 21 summarizes this paper and shows the whole course of transforming unstructured data into declarable knowledge. Future work should involve defining scale-extended geometrical attributes and introducing probabilistic reasoning to provide more benefit for invariant recognition.

REFERENCES

[1] G.-S. Hsu, T. T. Loc, and S.-L. Chung, “A comparison study on appearance-based object recognition,” in Proc. 21st ICPR, Nov. 2012, pp. 3500–3503. [2] Y. Kui Liu and B. Žalik, “An efficient chain code with Huffman coding,” Pattern Recognit., vol. 38, no. 4, pp. 553–557, Apr. 2005. [3] Q. Niu, H. Zhang, J. Liu, Q. Wang, G. Xu, and Y. Xue, “A shape contour description method based on chain code and fast Fourier transform,” in Proc. IEEE 7th ICNC, vol. 3, Jul. 2011, pp. 1809–1812. [4] H. Sánchez-Cruz, E. Bribiesca, and R. M. Rodríguez-Dagnino, “Efficiency of chain codes to represent binary objects,” Pattern Recognit., vol. 40, no. 6, pp. 1660–1674, Jun. 2007.


[5] Y.-K. Liu, B. Žalik, P.-J. Wang, and D. Podgorelec, “Directional difference chain codes with quasi-lossless compression and runlength encoding,” Signal Process., Image Commun., vol. 27, no. 9, pp. 973–984, Oct. 2012. [6] Y.-C. Hwang, T.-L. Shieh, and L.-S. Lan, “A weighted least-squares approach for B-spline shape representation,” in Proc. IEEE 3rd ISCCSP, Mar. 2008, pp. 894–898. [7] M. S. Khan, A. F. Ayob, A. Isaacs, and T. Ray, “A novel evolutionary approach for 2D shape matching based on B-spline modeling,” in Proc. IEEE CEC, Jun. 2011, pp. 655–661. [8] O. Grove, K. Rajab, L. Piegl, and S. Lai-Yuen, “From CT to NURBS: Contour fitting with B-spline curves,” Comput., Aided Des. Appl., vol. 8, no. 1, pp. 3–21, 2011. [9] A. Carmona-Poyato, F. J. Madrid-Cuevas, R. Medina-Carnicer, and R. Muñoz-Salinas, “Polygonal approximation of digital planar curves through break point suppression,” Pattern Recognit., vol. 43, no. 1, pp. 14–25, Jan. 2010. [10] A. Masood and S. A. Haq, “A novel approach to polygonal approximation of digital curves,” J. Vis. Commun. Image Represent., vol. 18, no. 3, pp. 264–274, Jun. 2007. [11] A. Carmona-Poyato, R. Medina-Carnicer, F. J. Madrid-Cuevas, R. Muñoz-Salinas, and N. Fernández-García, “A new measurement for assessing polygonal approximation of curves,” Pattern Recognit., vol. 44, no. 1, pp. 45–54, Jan. 2011. [12] J. Hu and J. Hua, “Salient spectral geometric features for shape matching and retrieval,” Vis. Comput., vol. 25, nos. 5–7, pp. 667–675, May 2009. [13] A. M. Pinheiro and M. Ghanbari, “Piecewise approximation of contours through scale-space selection of dominant points,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1442–1450, Jun. 2010. [14] K. Zhu, Y. Wong, W. F. Lu, and J. Y. Fuh, “A diffusion wavelet approach for 3-D model matching,” Comput., Aided Des., vol. 41, no. 1, pp. 28–36, Jan. 2009. [15] K. Nasreddine, A. Benzinou, and R. Fablet, “Variational shape matching for shape classification and retrieval,” Pattern Recognit. Lett., vol. 31, no. 12, pp. 1650–1657, Sep. 2010. [16] G. Zou, J. Hua, Z. Lai, X. Gu, and M. Dong, “Intrinsic geometric scale space by shape diffusion,” IEEE Trans. Visualizat. Comput. Graph., vol. 15, no. 6, pp. 1193–1200, Nov./Dec. 2009. [17] C. Luo and S. Zhou, “On 3D contour matching based on geometry features,” in Applied Informatics and Communication. New York, NY, USA: Springer-Verlag, 2011, pp. 622–630. [18] B. Huang, J. Wu, D. Zhang, and N. Li, “Tongue shape classification by geometric features,” Inf. Sci., vol. 180, no. 2, pp. 312–324, Jan. 2010. [19] J. Žuni´c, K. Hirota, and P. L. Rosin, “A Hu moment invariant as a shape circularity measure,” Pattern Recognit., vol. 43, no. 1, pp. 47–57, Jan. 2010. [20] C. Direko˘glu and M. S. Nixon, “Shape classification via image-based multiscale description,” Pattern Recognit., vol. 44, no. 9, pp. 2134–2146, Sep. 2011. [21] K. Arai, “Visualization of 3D object shape complexity with wavelet descriptor and its application to image retrievals,” J. Visualizat., vol. 15, no. 2, pp. 155–166, May 2012. [22] I. Epifanio, “Shape descriptors for classification of functional data,” Technometrics, vol. 50, no. 3, pp. 284–294, 2008. [23] Z. Lin and L. S. Davis, “Shape-based human detection and segmentation via hierarchical part-template matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 4, pp. 604–618, Apr. 2010. [24] B. Al-Diri, A. Hunter, and D. Steel, “An active contour model for segmenting and measuring retinal vessels,” IEEE Trans. Med. Imaging, vol. 28, no. 9, pp. 
1488–1497, Sep. 2009. [25] H. Li, Y. Wang, and Z. Yu, “Template match of buildings and results analysis,” in Proc. IEEE 18th Int. Conf. Geoinformat., Jun. 2010, pp. 1–5. [26] I. Biederman, “Recognition-by-components: A theory of human image understanding,” Psychol. Rev., vol. 94, no. 2, pp. 115–147, 1987. [27] W. H. Bosking, J. C. Crowley, and D. Fitzpatrick, “Spatial coding of position and orientation in primary visual cortex,” Nature Neurosci., vol. 5, no. 9, pp. 874–882, Aug. 2002.


[28] S. O. Murray, P. Schrater, and D. Kersten, “Perceptual grouping and the interactions between visual cortical areas,” Neural Netw., vol. 17, nos. 5–6, pp. 695–705, Jun./Jul. 2004. [29] S. Tanaka, M. Miyashita, and J. Ribot, “Roles of visual experience and intrinsic mechanism in the activity-dependent self-organization of orientation maps: Theory and experiment,” Neural Netw., vol. 17, nos. 8–9, pp. 1363–1375, Oct./Nov. 2004. [30] P. A. Watters, “Coding distributed representations of natural scenes: A comparison of orthogonal and non-orthogonal models,” Neurocomputing, vol. 61, pp. 277–289, Oct. 2004. [31] Y. Kamitani and F. Tong, “Decoding the visual and subjective contents of the human brain,” Nature Neurosci., vol. 8, no. 5, pp. 679–685, Apr. 2005. [32] R. Prenger, M. C.-K. Wu, S. V. David, and J. L. Gallant, “Nonlinear V1 responses to natural scenes revealed by neural network analysis,” Neural Netw., vol. 17, nos. 5–6, pp. 663–679, Jun./Jul. 2004. [33] J. A. Bednar and R. Miikkulainen, “Joint maps for orientation, eye, and direction preference in a self-organizing model of V1,” Neurocomputing, vol. 69, no. 10, pp. 1272–1276, 2006. [34] J. A. Bednar, J. B. De Paula, and R. Miikkulainen, “Self-organization of color opponent receptive fields and laterally connected orientation maps,” Neurocomputing, vols. 65–66, no. 8, pp. 69–76, 2005. [35] S. Jegelka, J. A. Bednar, and R. Miikkulainen, “Prenatal development of ocular dominance and orientation maps in a self-organizing model of V1,” Neurocomputing, vol. 69, no. 10, pp. 1291–1296, 2006. [36] Y. F. Sit and R. Miikkulainen, “Self-organization of hierarchical visual maps with feedback connections,” Neurocomputing, vol. 69, no. 10, pp. 1309–1312, 2006. [37] A. Plebe and R. G. Domenella, “Object recognition by artificial cortical maps,” Neural Netw., vol. 20, no. 7, pp. 763–780, Sep. 2007. [38] M. Gheiratmand, H. Soltanian-Zadeh, and H. Khaloozadeh, “Towards an inclusive computational model of visual cortex,” in Proc. 8th IEEE Int. Conf. BIBE, Oct. 2008, pp. 1–5. [39] E. Erwin, K. Obermayer, and K. Schulten, “Models of orientation and ocular dominance columns in the visual cortex: A critical comparison,” Neural Comput., vol. 7, no. 3, pp. 425–468, May 1995. [40] N. Swindale, “The development of topography in the visual cortex: A review of models,” Netw., Comput. Neural Syst., vol. 7, no. 2, pp. 161–247, 1996. [41] K. Shimonomura, T. Kushima, and T. Yagi, “Neuromorphic binocular vision system for real-time disparity estimation,” in Proc. IEEE Int. Conf. Robot. Autom., Apr. 2007, pp. 4867–4872. [42] K. Shimonomura, T. Kushima, and T. Yagi, “Binocular robot vision emulating disparity computation in the primary visual cortex,” Neural Netw., vol. 21, nos. 2–3, pp. 331–340, Mar./Apr. 2008. [43] K. Shimonomura and T. Yagi, “Neuromorphic VLSI vision system for real-time texture segregation,” Neural Netw., vol. 21, no. 8, pp. 1197–1204, Oct. 2008. [44] K. L. Rice, C. N. Vutsinas, and T. M. Taha, “A preliminary investigation of a neocortex model implementation on the Cray XD1,” in Proc. ACM/IEEE Conf. SC, Nov. 2007, pp. 1–8. [45] M. Mathur and B. Bhaumik, “Study of spatial frequency selectivity and its spatial organization in the visual cortex through a feedforward model,” Neurocomputing, vols. 65–66, pp. 85–91, Jun. 2005. [46] S. Xiao and N. Gao, “Research on visual invariance based on dynamic receptive field,” in Proc. IEEE Int. Conf. Comput. Sci. Softw. Eng., Dec. 2008, pp. 273–276. [47] L. Y. Zhao and B. E. 
Shi, “Integrating contrast invariance into a model for cortical orientation map formation,” Neurocomputing, vol. 72, nos. 7–9, pp. 1887–1899, Mar. 2009. [48] H. Kolb, E. Fernandez, and R. Nelson, Webvision. University of Utah Health Sciences Center, 1995. [49] W. Yang and L. Zhang, “Spatiotemporal feature extraction based on invariance representation,” in Proc. IEEE IJCNN, Jun. 2008, pp. 927–932. [50] J. Z. Jin, C. Weng, C.-I. Yeh, J. A. Gordon, E. S. Ruthazer, M. P. Stryker, et al., “On and off domains of geniculate afferents in cat primary visual cortex,” Nature Neurosci., vol. 11, no. 1, pp. 88–94, Dec. 2007. [51] R. Kupper, A. Knoblauch, M.-O. Gewaltig, U. Körner, and E. Körner, “Simulations of signal flow in a functional model of the cortical column,” Neurocomputing, vol. 70, nos. 10–12, pp. 1711–1716, Jun. 2007. [52] B. Çürüklü and A. Lansner, “A model of the summation pools within the layer 4 (area 17),” Neurocomputing, vols. 65–66, pp. 167–172, Jun. 2005.


[53] J. M. Delgado, A. Turiel, and N. Parga, “Receptive fields of simple cells from a taxonomic study of natural images and suppression of scale redundancy,” Neurocomputing, vol. 69, nos. 10–12, pp. 1224–1227, Jun. 2006. [54] Q. Tang, N. Sang, and T. Zhang, “Extraction of salient contours from cluttered scenes,” Pattern Recognit., vol. 40, no. 11, pp. 3100–3109, Nov. 2007. [55] G.-E. La Cara and M. Ursino, “A model of contour extraction including multiple scales, flexible inhibition and attention,” Neural Netw., vol. 21, no. 5, pp. 759–773, Jun. 2008. [56] S. Aly, N. Tsuruta, R.-I. Taniguchi, and A. Shimada, “Visual feature extraction using variable map-dimension hypercolumn model,” in Proc. IEEE IJCNN, Jun. 2008, pp. 845–851. [57] Y. Huang, K. Huang, L. Wang, D. Tao, T. Tan, and X. Li, “Enhanced biologically inspired model,” in Proc. IEEE CVPR, Jun. 2008, pp. 1–8. [58] W. Huang, L. Jiao, and J. Jia, “Modeling contextual modulation in the primary visual cortex,” Neural Netw., vol. 21, no. 8, pp. 1182–1196, 2008. [59] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, “A biologically inspired system for action recognition,” in Proc. 11th IEEE ICCV, Jun. 2007, pp. 1–8. [60] G. G. Blasdel and G. Salama, “Voltage-sensitive dyes reveal a modular organization in monkey striate cortex,” Nature, vol. 321, no. 6070, pp. 579–585, Jun. 1986. [61] D. H. Hubel and T. N. Wiesel, “Sequence regularity and geometry of orientation columns in the monkey striate cortex,” J. Comparat. Neurol., vol. 158, no. 3, pp. 267–293, 1974. [62] (2013, Jul. 19). Webvision: The Organization of the Retina and Visual System [Online]. Available: http://webvision.med.utah.edu [63] M. Stemmler, M. Usher, and E. Niebur, “Lateral interactions in primary visual cortex: A model bridging physiology and psychophysics,” Science, vol. 269, no. 5232, pp. 1877–1880, 1995. [64] H. Wei and Y. Wang, “A new model to simulate the formation of orientation columns map in visual cortex,” in Advances in Neural Networks (LNCS). New York, NY, USA: Springer-Verlag, 2011, pp. 35–41. [65] H. Yu, B. J. Farley, D. Z. Jin, and M. Sur, “The coordinated mapping of visual space and response features in visual cortex,” Neuron, vol. 47, no. 2, pp. 267–280, 2005. [66] R. Mikkulainen, Computational Maps in the Visual Cortex. New York, NY, USA: Springer-Verlag, 2005. [67] A. Shmuel and A. Grinvald, “Coexistence of linear zones and pinwheels within orientation maps in cat visual cortex,” Proc. Nat. Acad. Sci., vol. 97, no. 10, pp. 5568–5573, Jan. 2000. [68] S. Belongie, G. Mori, and J. Malik, “Matching with shape contexts,” in Statistics and Analysis of Shapes. New York, NY, USA: Springer-Verlag, 2006, pp. 81–105. [69] A. Ghosh and N. Petkov, “Robustness of shape descriptors to incomplete contour representations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 11, pp. 1793–1804, Nov. 2005. [70] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002. [71] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proc. 7th IEEE Int. Conf. Comput. Vis., vol. 2. Sep. 1999, pp. 1150–1157. [72] V. Ferrari, F. Jurie, and C. Schmid, “From images to shape models for object detection,” Int. J. Comput. Vis., vol. 87, no. 3, pp. 284–303, 2010.

[73] V. Ferrari, T. Tuytelaars, and L. Van Gool, “Object detection by contour segment networks,” in Proc. ECCV, Jun. 2006, pp. 14–28. [74] C. Lu, L. J. Latecki, N. Adluru, X. Yang, and H. Ling, “Shape guided contour grouping with particle filters,” in Proc. IEEE 12th ICCV, Oct. 2009, pp. 2288–2295. [75] A. Thayananthan, B. Stenger, P. H. Torr, and R. Cipolla, “Shape context and chamfer matching in cluttered scenes,” in Proc. IEEE Comput. Soc. Conf. CVPR, vol. 1. Jun. 2003, pp. 127–133.

Hui Wei received the Ph.D. degree from the Department of Computer Science, Beijing University of Aeronautics and Astronautics, Beijing, China, in 1998. He was a Post-Doctoral Fellow with the Department of Computer Science and the Institute of Artificial Intelligence, Zhejiang University, Zhejiang, China, from 1998 to 2000. Since 2000, he has been with the Department of Computer Science and Engineering, Fudan University, Shanghai, China. His current research interests include artificial intelligence and cognitive science.

Qiang Li received the B.S. degree in computer science from Fudan University, Shanghai, China, in 2009, where he is currently pursuing the Ph.D. degree with the Department of Computer Science. His current research interests include cognition, brain-inspired computer vision, and neural network.

Zheng Dong received the B.S. and M.S. degrees in computer science from Fudan University, Shanghai, China, in 2008 and 2013, respectively. He is currently pursuing the Ph.D. degree with the Department of Computer Science. His current research interests include artificial intelligence and brain-inspired image processing.