AUN/SEED-Net Regional Conference for Computer and Information Engineering 2016
Flower species identification using deep convolutional neural networks

Thi Thanh Nhan Nguyen1,2, Van Tuan Le1, Thi Lan Le1, Hai Vu1, Natapon Pantuwong3, Yasushi Yagi4
1) International Research Institute MICA, Hanoi University of Science and Technology, Hanoi, Vietnam
2) Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Vietnam
3) Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang
4) Department of Intelligent Media, ISIR, Osaka University

Abstract
This paper demonstrates the robustness of deep convolutional neural networks (CNNs) for automatically identifying plant species from flower images. Among plant organs, the flower plays an important role because its appearance is highly distinctive. Moreover, flower observations are stable and vary little with weather conditions, the age of the tree, or other artifacts. A number of traditional features have been proposed for this basic-level recognition task. However, these approaches may eliminate many useful natural cues during feature extraction, and they also require domain-related expert knowledge. In this paper, the robustness of a deep Convolutional Neural Network (CNN) is presented. To select the appropriate network, we first conduct a comparative study evaluating the performance of well-known CNNs: AlexNet, CaffeNet, and GoogLeNet. We conclude that GoogLeNet achieves the highest performance. By tuning the network parameters, the best accuracy reaches 67.45% at rank 1 and 90.82% at rank 10 on a flower dataset of 967 species extracted from PlantCLEF 2015. These results are six times higher than those of the conventional Kernel Descriptor (KDES) technique [1]. Consequently, the proposed technique is a promising solution for developing image-based search applications in botanical classification, ecological monitoring systems, and the multimedia community.
Keywords: flower identification, GoogLeNet, KDES, Convolutional neural network.
1. Introduction
Plant identification is an important task for researchers, students, and practitioners in the fields of agriculture, forestry, biodiversity protection, and so on. Recently, thanks to advanced research in the computer vision community, a number of works have been dedicated to automatic plant identification based on images of plant organs (e.g., leaf, leaf scan, fruit, stem, entire plant, flower). Among them, the flower image plays an important role for plant identification because its appearance (e.g., color, shape, texture) is highly distinctive. The appearance of flowers is also stable and varies little with weather conditions or the age of the plant; in the view of botanical experts, flower images are therefore the most valuable source for the plant identification task. However, to develop an automatic plant identification system based on flower images, the proposed techniques face many challenges, such as large inter-class similarity but small intra-class similarity, lighting and viewpoint variations, occlusion, clutter, and object deformations [2]. These issues are illustrated in Fig. 1 for several species.
Tel.: +84 4 38 68 30 87. Email: [email protected]
Fig.1. Challenges of flower identification. (a) Viewpoint variations; (b) Occlusion; (c) Clutter; (d) Light variation; (e) Object deformations; (f) Small intra-class similarity; (g) Large inter-class similarity.
In the literature, several approaches to flower identification have been proposed [1], [2], [5]. They usually consist of four steps: pre-processing, segmentation, hand-designed feature extraction, and classification. Since flower images have complex backgrounds, these methods are time-consuming and their accuracy is still low, particularly with a large number of species. Recently, learning feature representations using Convolutional Neural Networks (CNNs) has shown a number of successes in different computer vision topics such as object detection, segmentation, and image classification [9]. Feature learning approaches provide a natural way to capture cues by using a large number of code words (sparse coding) or neurons (deep networks); these cues are useful because the natural characteristics of the objects are captured. Therefore, in this paper, we examine and demonstrate the effectiveness of deep convolutional neural networks for flower-based plant species identification.
2. Related work
There are two main approaches to plant identification based on images of plant organs: hand-designed features and deep learning. A number of hand-crafted (or hand-designed) features have been used for flower-based identification, such as Kernel Descriptors (KDES) [1] and color, shape, and texture features [2]. In [2], the authors extracted different types of features, such as HSV values, MR8 filter responses, SIFT, and Histograms of Oriented Gradients (HOG), on a dataset of 17 flower categories. They then used a Support Vector Machine (SVM) classifier combining different linearly weighted kernels. They evaluated and selected the optimal features and applied them to a dataset of 102 categories/species, achieving a good recognition rate. In [3], the authors also utilized color and shape features extracted from flower images; to discriminate species, they applied Principal Component Analysis (PCA) to different types of flowers.
In [4], the authors extract HOG features and then employ an SVM for classification. In [5], the authors propose a flower image retrieval tool based on ROIs (Regions-Of-Interest). They use the color histogram of a flower region and two shape-based features, the Centroid-Contour Distance and the Angle Code Histogram, to characterize the shape of a flower; for evaluation, they use a database of 14 plant species. Le et al. [1, 2] used KDES, first proposed by Liefeng Bo et al. [10], for plant identification. KDES is a robust feature extraction technique that allows building hierarchical models from the low level (pixel) to higher levels (patch and/or whole flower image). After computing KDES, the authors apply an SVM classifier for classification. KDES obtained very promising results
for leaf-based plant identification. However, the recognition rate is still unsatisfactory when adapting KDES to flower-based identification. Regarding feature learning approaches, in the PlantCLEF 2015 competition [7], several research teams utilized CNNs for plant identification based on multi-image plant observation queries, where each image of a query observation is one of seven view types: entire plant, branch, fruit, leaf, flower, stem, or leaf scan. To the best of our knowledge, only a few studies focus on flower images using CNNs. In this paper, we explicitly exploit CNNs for flower-based plant identification, and we compare the performance of a CNN with that of a hand-designed feature technique to show the robustness of the CNN.
3. Proposed method
Flower images are normally captured against complicated backgrounds in the presence of different objects. Although a CNN can be applied directly to these images, in order to evaluate the effect of the background on flower identification, we deploy pre-processing techniques that extract only the flower regions from a natural image.
3.1. Preprocessing
Flower regions are usually overlaid on a complex background and are therefore difficult to separate correctly from it. In this work, we apply a saliency-segmentation-based approach to select the ROI (Region-Of-Interest) in flower images. The main flow of the pre-processing is shown in Fig. 2. First, we adopt the saliency extraction method described in [13] and a common segmentation technique (the mean-shift algorithm). A segmented region is selected if its corresponding saliency value is large enough; connected-region techniques are then applied to merge the selected segments into regions of interest. Figure 3 shows ROIs (left panel) whose top-left and bottom-right points form a rectangle on the original images (right panel).
[Pipeline of Fig. 2: input image → mean-shift segmentation and saliency map extraction → saliency value SM computed on each segmented region (SK on the whole image) → segments with SM > αSK selected → connected regions merged → ROIs.]
Fig.2. The proposed pre-processing to select the regions of interest (ROI) of flower.
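A minimal sketch of the selection rule above, assuming the saliency map (e.g., from the frequency-tuned method of [13]) and the mean-shift segment labels have already been computed; the function name and the default α are illustrative, not the authors' exact settings:

```python
import numpy as np

def select_flower_roi(saliency, labels, alpha=1.5):
    """Keep segments whose mean saliency S_M exceeds alpha * S_K (the mean
    saliency of the whole image), merge them, and return the bounding box
    (top-left x, top-left y, bottom-right x, bottom-right y)."""
    s_k = saliency.mean()                      # S_K: whole-image saliency
    mask = np.zeros(labels.shape, dtype=bool)
    for region in np.unique(labels):
        pixels = labels == region
        s_m = saliency[pixels].mean()          # S_M: per-segment saliency
        if s_m > alpha * s_k:                  # selection rule of Fig. 2
            mask |= pixels
    if not mask.any():
        return None                            # no sufficiently salient segment
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```

The returned corners define the rectangular ROI that is cropped from the original image before classification.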
Fig.3. Flower images and detected ROIs.
3.2. Convolutional neural network
CNN is one of the most famous deep learning approaches, in which multiple layers are trained. It has been very successful in computer vision, especially in the annual ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), which is conducted on a very large database (ImageNet) containing 1.2 million images in 1,000 classes. Some famous CNNs are LeNet, AlexNet, Clarifai, SPP, VGG, GoogLeNet, and ResNet. A CNN typically comprises multiple convolutional and sub-sampling layers, optionally followed by fully-connected layers, as in a standard multi-layer neural network. The main advantage of CNNs over fully-connected networks is that they are easier to train and have fewer parameters for the same number of hidden units.
In this paper, we exploit the robustness of a deep CNN on a large dataset of nearly 1,000 species. To select the appropriate network, we first conduct a comparative study evaluating the performance of well-known CNNs: AlexNet, CaffeNet, and GoogLeNet. AlexNet, proposed by Krizhevsky et al. [11], won ILSVRC 2012. This network has 8 layers and obtains a 15.3% top-5 error; it includes 60 million parameters and 650,000 neurons. The architecture of AlexNet is described in Fig.4.

Fig.4. An illustration of AlexNet's architecture (adapted from [11])

CaffeNet, presented in [12], is a modified version of AlexNet. It has 5 convolutional layers and 3 fully-connected layers. The difference between CaffeNet and AlexNet is that some layers are switched to reduce the memory footprint, and the bias-filler value is increased in CaffeNet. GoogLeNet, by Szegedy et al., won ILSVRC 2014. It uses a new variant of the convolutional neural network called "Inception" for classification. Fig.5 shows a schematic view of GoogLeNet. It is a very deep model with 22 layers when counting only layers with parameters (or 27 layers if pooling is also counted) [8]; the overall number of layers (independent building blocks) used in the construction of the network is about 100. GoogLeNet incorporates the Inception module with the intention of increasing network depth with computational efficiency. With 5 million parameters, GoogLeNet achieves a 6.7% top-5 error.

Fig.5. A schematic view of the GoogLeNet network (adapted from [8])

3.3. Fine-tuning the CNNs for classification
In our approach, we fine-tune the CNNs to optimize their parameters. First, we use weights pre-trained on ImageNet and fine-tune with the following settings:
• Test iterations: 1666
• Initial learning rate: 0.001
• Step size: 10,000 (the learning rate is updated after every 10,000 iterations)
• Batch size: 5 (test set)
• Number of iterations: 50,000
For fine-tuning, the top fully-connected layer was retrained on a flower dataset extracted from the flower images of PlantCLEF. To capture the low-level features, the bottom layers of the networks were retained. Intuitively, the lower layers of a network contain generic features such as edge or shape detectors, while the higher layers become progressively more specific to cues related to the classes in the training data.
3.4. Hand-designed features based on KDES
To evaluate the performance of the CNNs, we compare the classification results of the CNNs against hand-designed features based on KDES. Readers can refer to [1, 2] for details of this technique; here, we summarize its use for flower-based plant identification. The KDES technique extracts features from the processed organ images at three levels, as listed below:
- Pixel-level: At this level, a normalized gradient vector is computed for each pixel of the flower image.
- Patch-level: For each patch, we compute patch features based on a given definition of match kernel. The gradient match kernel is constructed from three kernels.
- Image-level: Given an image, the final representation is built from the features extracted at the lower levels using efficient match kernels (EMK).
The extracted features from the three levels are concatenated to form a KDES feature vector. We then deploy a multi-class SVM for plant identification.

Fig.6. Examples of flower images with complicated backgrounds in PlantCLEF 2015
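The three-level pipeline can be made concrete with a deliberately simplified sketch. The real KDES of [1, 10] uses gradient match kernels and EMK pooling; here each level is replaced by a plain aggregation (magnitude-weighted orientation histograms, then mean/max pooling) purely to show the data flow from pixels to a single image vector. All function names and parameters are illustrative:

```python
import numpy as np

def pixel_level(image):
    # per-pixel normalized gradient: magnitude and orientation
    gy, gx = np.gradient(image.astype(float))
    mag = np.sqrt(gx**2 + gy**2)
    mag /= (mag.max() + 1e-8)
    ori = np.arctan2(gy, gx)
    return mag, ori

def patch_level(mag, ori, patch=8, bins=8):
    # one feature vector per patch: orientation histogram weighted by magnitude
    feats = []
    h, w = mag.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            hist, _ = np.histogram(ori[y:y+patch, x:x+patch],
                                   bins=bins, range=(-np.pi, np.pi),
                                   weights=mag[y:y+patch, x:x+patch])
            feats.append(hist)
    return np.array(feats)

def image_level(patch_feats):
    # stand-in for EMK pooling: mean and max over patches, concatenated
    return np.concatenate([patch_feats.mean(axis=0), patch_feats.max(axis=0)])

rng = np.random.default_rng(0)
descriptor = image_level(patch_level(*pixel_level(rng.random((32, 32)))))
```

The resulting `descriptor` would then be fed to the multi-class SVM, one vector per flower image.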
4. Experimental setup
4.1. Dataset
In the experiments, we evaluate the performance of the CNNs on a dataset from PlantCLEF 2015 [7]. To the best of our knowledge, it is the largest plant identification dataset and is close to real-world conditions. This dataset presents different challenges due to its large intra-class variation and inter-class similarity. From this dataset, we extract a flower dataset containing 27,975 images for training and 8,327 images for testing, covering 967 plant species living in Western Europe. The dataset includes single flowers on complex backgrounds, captured from different views. Each image has a corresponding XML file describing its metadata, such as observationId, Filename, mediaId, view content, classId, species, genus, family, date, vote, and location. Some examples of the flowers are illustrated in Fig.6. To evaluate performance, we compute the accuracy at rank k as follows:

Accuracy = T / N    (1)
where T is the number of correctly recognized queries and N is the total number of queries. An image is correctly recognized if the relevant plant is among the first k plants of the retrieved list. In our experiments, we compute the accuracy at Rank 1 and Rank 10. For the implementation, we installed the Caffe framework on a GTX 970 GPU and used three well-known Caffe models: AlexNet, CaffeNet, and GoogLeNet.
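The rank-k accuracy of Eq. (1) can be computed with a short helper (names are illustrative; each query contributes one ranked list of candidate species):

```python
def accuracy_at_rank(ranked_lists, ground_truth, k):
    """Eq. (1): T / N, where a query counts towards T if its true
    species appears among the first k species of its retrieved list."""
    t = sum(truth in ranked[:k]
            for ranked, truth in zip(ranked_lists, ground_truth))
    return t / len(ground_truth)
```

For Rank 1 only the top prediction is checked; for Rank 10 the first ten retrieved species are.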
4.2. Results
We perform three experiments. The first experiment makes a comparative evaluation of several CNNs on the pre-processed flower database. The second experiment compares the performance of the CNN with a robust hand-designed descriptor, KDES [1]. The third experiment evaluates the effect of the preprocessing techniques on flower identification.

Table 1 shows the obtained results of the three CNN models on the pre-processed images. We find that GoogLeNet achieves the highest performance, which is consistent with the results of the annual ILSVRC competition: GoogLeNet has more layers than AlexNet and CaffeNet and incorporates the Inception module.

Tab. 1. The accuracy (%) at Rank 1 of the CNNs on the pre-processed database

            AlexNet   CaffeNet   GoogLeNet
Accuracy    50.60     54.84      66.60

Table 2 shows the result of the second experiment, in which we compare the performance of KDES and GoogLeNet on the pre-processed images. The accuracy of GoogLeNet is six times higher than that of KDES. It is worth noting that KDES is a robust descriptor for various object recognition tasks and has proved effective for leaf-based plant identification because of its rich information [2]. However, for other plant organs such as flowers, this descriptor is not flexible enough to reflect the different aspects of the species. In such cases, the CNN becomes a better choice.

Tab. 2. The obtained results (%) of KDES and GoogLeNet on the pre-processed flower images

            Rank 1    Rank 10
KDES        10.95     24.62
GoogLeNet   66.60     90.23
Table 3 compares the results of GoogLeNet on raw and pre-processed images. The obtained results on raw images are slightly higher than those on the pre-processed
images. The main reason is that the pre-processing step can, in some cases, lose part of the flower as well as the natural context information. This result confirms that, for CNN-based flower identification, we do not need to perform pre-processing. Some examples of identification results are illustrated in Fig.7. The first column shows the query images and the remaining columns show the top 5 retrieved plants; the correct identification is marked by a green box. We can see that the CNN provides correct matches even when the flowers are captured from very different points of view.

Tab. 3. The results (%) of GoogLeNet on raw and pre-processed images

                    Rank 1    Rank 10
Raw images          67.45     90.82
Processed images    66.60     90.23
[Fig. 7 content: for each test image (Helianthemum nummularium, Punica granatum, Convolvulus soldanella, Cirsium eriophorum, Epilobium angustifolium), the retrieval results at ranks 1st through 5th are shown.]

Fig.7. Some examples of identification results. The first column shows the query images and the remaining columns show the top 5 retrieved plants. The correct identification is marked by a green frame.
5. Conclusions
In this paper, we have proposed to apply CNNs to flower identification. Different experiments have been performed, and the obtained results show the effectiveness of CNNs for this task: the accuracy at Rank 10 is greater than 90% on a dataset with a large number of classes. However, the accuracy at Rank 1 is still limited. In future work, we will focus on improving the CNN design to increase the accuracy at the first ranks, as well as on fusing the recognition results from different plant organs.
Acknowledgment
The authors thank the Collaborative Research Program for Common Regional Issues (CRC) funded by the ASEAN University Network (AUN/SEED-Net), under grant reference HUST/CRC/1501.
References
[1] Thi-Lan Le, Nam-Duong Duong, Hai Vu, Thanh-Nhan Nguyen, "MICA at LifeCLEF 2015: Multi-organ plant identification", Working Notes of CLEF 2015, 2015.
[2] Nilsback, Maria-Elena, and Andrew Zisserman, "An automatic visual flora: segmentation and classification of flower images", Oxford University, 2009.
[3] Rodrigo, Ranga, Kalani Samarawickrame, and Sheron Mindya, "An Intelligent Flower Analyzing System for Medicinal Plants", Conference on Computer Graphics, Visualization and Computer Vision (WSCG 2013), 2013.
[4] Angelova, Anelia, et al., "Development and deployment of a large-scale flower recognition mobile app", NEC Labs America Technical Report, 2012.
[5] Hong, An-xiang, et al., "A flower image retrieval method based on ROI feature", Journal of Zhejiang University Science 5(7), 2004, 764-772.
[6] Mattos, Andréa Britto, et al., "Flower Classification for a Citizen Science Mobile App", Proceedings of the International Conference on Multimedia Retrieval, ACM, 2014.
[7] Goëau, Hervé, et al., "LifeCLEF Plant Identification Task 2015", CLEF2015 Working Notes, CEUR-WS Vol. 1391, Toulouse, France, 2015.
[8] Szegedy, Christian, et al., "Going deeper with convolutions", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[9] Yoo, Hyeon-Joong, "Deep Convolution Neural Networks in Computer Vision", IEIE Transactions on Smart Processing & Computing 4(1), 2015, 35-43.
[10] Bo, L., Ren, X., Fox, D., "Kernel descriptors for visual recognition", Advances in Neural Information Processing Systems, 2010, 244-252.
[11] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems, 2012.
[12] Jia, Yangqing, et al., "Caffe: Convolutional architecture for fast feature embedding", Proceedings of the 22nd ACM International Conference on Multimedia, 2014.
[13] Achanta, R., Hemami, S., Estrada, F., Susstrunk, S., "Frequency-tuned salient region detection", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2009, 1597-1604.