Assessing the Performance of Convolutional Neural Networks on Classifying Disorders in Apple Tree Leaves

Pedro Ballester, Ulisses B. Correa, Marco Birck, and Ricardo Araujo

Federal University of Pelotas, Pelotas, RS, Brazil
{plballester,ub.correa,mafbirck,ricardo}@inf.ufpel.edu.br
Abstract. This paper evaluates the deep learning architecture AlexNet applied to the diagnosis of disorders from leaf images, using a recent dataset containing five apple tree disorders. It extends previous work by providing more extensive testing and by validating the dataset with visualization methods. We show that previous results likely overestimate general accuracy, but that the model is able to learn relevant features from the images.
1 Introduction
According to the United Nations, by 2050 the world population is expected to reach 9 billion individuals, while global warming could cut worldwide crop yields by more than 25% [4]. Besides climate change, a number of other factors threaten food security, such as the decline in pollinators, pests, and pathogens. New technologies and techniques to improve food production must continue to be developed to ensure food security.

Fruits are an important food source for humans, and apple is the most consumed fruit after banana. Many disorders can affect the production of apples, and the timely diagnosis of such disorders is critical to improving crop yields. However, a correct diagnosis requires experts, who are either not always available or very costly [3]. Therefore, automating diagnosis could greatly reduce costs and improve apple production.

Many disorders, such as diseases and nutritional deficiencies, affect primarily the leaves of apple trees, making leaves a natural target for diagnosis, and several previous works tackle this problem with Machine Learning [1,6,8]. In particular, in [3] Convolutional Neural Networks (CNN) are applied to an extensive dataset containing five disorders that have a major impact on Brazilian crops: Glomerella, Scab, Potassium Deficiency, Magnesium Deficiency, and Herbicide Damage. While the reported results were promising, testing was only conducted over a small test set of 75 images. Such a small number was justified by the focus on comparing performance against a set of experts, who were asked to diagnose the same images; however, it can introduce undesired bias, making the reported accuracy (97%) unreliable.
The present work aims at improving the testing methodology used in [3] in order to provide a more robust and reliable measurement of how accurate the proposed approach is. To do so, we conduct a more extensive test using far more images and also evaluate the attributes learned by the network after training. We show that while accuracy remains high (92%), it is lower than the one reported in the original paper. We also show that the network is indeed able to make use of regions of the images that contain leaf damage, evidence that the model is learning not only useful but also correct features.
2 Related Works
Several works propose to automatically diagnose plant diseases from image inputs, thus cutting the costs of inspection by highly specialized personnel. Recent works focus on Convolutional Neural Networks (CNN), a kind of Artificial Neural Network (ANN) capable of learning which features of the input are most significant to the training task. Despite the increasing use of CNNs, there are still works using classic methods to extract features from images before applying a machine learning method. In [1], a total of 38 color, shape, and texture features are extracted from pictures of leaves. These features are used as input to a Support Vector Machine classifier, obtaining an overall accuracy of 94.22% on a test set containing images of leaves affected by powdery mildew, mosaic, and rust.

In [8], the authors tested two approaches to image-based diagnosis: a fine-tuned deep network versus an ad hoc solution based on a shallow neural network. They had access to a small dataset of apple leaf pictures affected by black rot (Botryosphaeria obtusa), whose images botanists labeled to indicate the disease's stage of development. In their experiments, a shallow network achieved an accuracy of 79.3%, while a VGG16 architecture pre-trained on ImageNet and fine-tuned to the same dataset achieved an accuracy of 80%.

In [6], a CNN architecture based on CaffeNet and pre-trained on ImageNet is proposed to classify 13 different types of diseases present in leaves of apple, peach, pear, and grapevine trees. This model reaches a reported accuracy of 96.3% when data augmentation techniques are used, such as affine, perspective, and rotation transformations.

In [3], the authors focused on five different disorders present exclusively in apple trees, including not only diseases but also nutritional deficiencies. A dataset with expert labeling was built for that work and made available; it is used in the present work. The authors report a 97% accuracy using an AlexNet architecture pre-trained on ImageNet. However, only a small test set of 75 images was used. The small number was due to the need to compare against the classifications provided by human experts, a comparison that gave evidence that the method could be more accurate than the experts themselves.
The need for understanding how a neural network makes decisions has been discussed in many works, such as [9]. In particular, identifying which parts of an image influence the network's classification is an ongoing problem in deep learning research. In [10], a heatmap is generated indicating which parts of the input participated in the classification process. However, this method gives only a broad view, not showing which pixels were most influential, and tends not to work with Multilayer Perceptrons (MLPs), of which AlexNet's output layers are composed. The approach proposed by [7], on the other hand, allows observing which pixels had the most influence; however, the method is not class-discriminative, meaning that if a different label is passed to the method, its result remains the same. In [5], a pixel-specific, class-discriminative approach is proposed that was shown to work well with MLPs. This latter approach is used in the present work.
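For reference, the class-discriminative map of [5] (Grad-CAM) is computed from the gradients of a class score with respect to the feature maps of a convolutional layer. In the notation of that paper, with $y^c$ the score for class $c$, $A^k$ the $k$-th feature map, and $Z$ the number of spatial positions,

$$\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A^k_{ij}}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right).$$

Guided Grad-CAM is then the element-wise product of this (upsampled) map with the guided backpropagation saliency map of [7], restoring pixel-level detail while remaining class-discriminative.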
3 Goals and Methodology
Our main goal is to conduct a more extensive analysis of CNNs applied to the problem of classifying disorders in apple trees from leaf images. In particular, we use the same architecture and dataset as [3] but conduct a more comprehensive test of its performance and a more thorough analysis of the results. Our specific goals are: (i) to apply a more robust testing methodology to assess the performance of the network and (ii) to better understand the image features learned by the network.

To accomplish these goals, we use the same AlexNet CNN architecture used in [3]. The network is pre-trained on ImageNet in order to obtain useful filters, and a Multilayer Perceptron is then trained at the end of the network to conduct the classification. The Adam optimizer [2] was used with a learning rate of 1e-5 and a mini-batch size of 5. While [3] uses only 75 images to test the network, we randomly split the examples into a training set (1000 images) and a test set (500 images), providing a more reliable measurement of the generalization capabilities of the model. Both sets are stratified, containing the same number of examples in each class.

We considered two methods to identify and visualize network activations: CAM [10] and Guided Grad CAM [5]. Figure 1 shows an assessment of both methods applied to the dataset. The CAM method loses its ability to identify which parts of the image are responsible for a given classification as the information used for classifying becomes more spread out. Guided Grad CAM, however, more accurately shows which pixels have more influence. Hence, we use this latter method throughout this paper.

Guided Grad CAM is used in two different ways. First, we observe the progression of the visualization over training epochs, which allows us to measure which image parts become more relevant as training progresses. Second, we relate the regions of the image that are most important for classification to spots in the image that are characteristic of the disorder being classified.
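To make the training setup concrete, the following is a minimal sketch assuming PyTorch and torchvision; the dataset directory layout and names are hypothetical, and only the architecture and hyperparameters (AlexNet pre-trained on ImageNet, a five-class head, Adam with learning rate 1e-5, mini-batches of 5, a stratified 1000/500 split) come from the text.

```python
# Sketch of the fine-tuning setup described above (assumed PyTorch/torchvision;
# directory names are hypothetical).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 5  # Glomerella, Scab, K deficiency, Mg deficiency, Herbicide damage

# AlexNet pre-trained on ImageNet; replace the last layer of the MLP head so it
# outputs the five disorder classes. No convolutional layers are frozen.
model = models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: one subdirectory per class, already split into a
# stratified training set (1000 images) and test set (500 images).
train_set = datasets.ImageFolder("apple_leaves/train", transform=preprocess)
test_set = datasets.ImageFolder("apple_leaves/test", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=5, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=5)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:  # one epoch shown; repeat until convergence
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```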
[Fig. 1 here: a grid of leaf visualizations, one row per training step (200, 2000, 6000, 10000), with columns for CAM and Guided Grad-CAM.]
Fig. 1. Progression of the classic CAM and Guided Grad CAM methods over different numbers of training steps. Guided Grad CAM is better able to represent the image and its active areas.
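In the same hedged spirit as the snippet above, the class-discriminative map compared in Fig. 1 can be computed along the following lines, reusing `model` from the fine-tuning sketch. The target layer index and helper names are assumptions, and the full Guided Grad-CAM additionally multiplies this map element-wise by a guided backpropagation saliency map [7], which is omitted here for brevity.

```python
# Sketch of the Grad-CAM computation (the class-discriminative half of Guided
# Grad-CAM), reusing `model` from the fine-tuning sketch above.
import torch
import torch.nn.functional as F

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["maps"] = output.detach()

def save_gradient(module, grad_input, grad_output):
    gradients["maps"] = grad_output[0].detach()

# Assumption: torchvision's AlexNet, where features[10] is the last Conv2d
# layer -- a usual target layer for Grad-CAM.
target_layer = model.features[10]
target_layer.register_forward_hook(save_activation)
target_layer.register_full_backward_hook(save_gradient)

def grad_cam(image, class_idx):
    """Return a 224x224 heatmap for class_idx; image is a preprocessed 3x224x224 tensor."""
    model.eval()
    scores = model(image.unsqueeze(0))  # forward pass stores the activations
    model.zero_grad()
    scores[0, class_idx].backward()     # backward pass stores the gradients

    # alpha_k^c: global-average-pool the gradients over the spatial dimensions.
    weights = gradients["maps"].mean(dim=(2, 3), keepdim=True)
    # Weighted sum of the feature maps, ReLU, then upsampling to input size.
    cam = F.relu((weights * activations["maps"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()
```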
4 Results
Figure 2 shows how accuracy over the test set evolves over the course of training. The network converges after about 2000 epochs, and the final accuracy after 12000 epochs is 92%, below the 97% reported in [3].
Fig. 2. Accuracy over training steps for AlexNet. The results correspond to accuracy over the test set, over which the error is not backpropagated. No convolutional layers are frozen during the fine-tuning procedure.
Fig. 3. Evolution of Guided Grad CAM during the network's training process. As the network becomes better at the classification task, the method shows more clearly which parts of the image are responsible for the correct classification. The numbers represent the number of training steps at each snapshot. (Color figure online)
Fig. 4. Relations between regions related to the disorder and areas used by the network. At the right end, a magnification of the red regions of the top image and the blue regions of the bottom image, with their Grad-CAM counterparts. (Color figure online)
Nonetheless, this is a more accurate estimate of the true performance of the approach, since far more examples were used to conduct the test.

Figure 3 depicts the Guided Grad CAM method over a single image. The parts of the image used for classifying a specific class are highlighted in color, while the unused ones tend to black. The image shows how the portions of the image used in the network's decision-making process grow over training steps. The observed increase in colored areas resembles how plant disorders affect the leaf, spreading through the surface and creating highly damaged spots.

Figure 4 shows pairs of affected-leaf images and their Guided Grad CAM counterparts. The circles highlight regions of the image related to the disorder that had a high response in the network. In the top pair, the disorder damage can be identified in the right image by the presence of more colored areas, mostly yellow. The bottom image presents green patterns in most of the regions of interest. This kind of pattern persists across other classes and
images, maintaining high-response areas in regions directly related to the most damaged parts of the leaf. The above results are evidence that the model is able to learn useful features that correlate well with relevant areas of the image; in contrast, a bias would manifest itself as highlights in areas that are not related to the disorder, such as the image background or mostly healthy parts of the leaf, neither of which is observed.
5 Conclusions
This paper provided a more in-depth analysis of the results presented in [3], which were limited due to the need to compare performance against a dataset labeled by a panel of experts. We aimed at obtaining a more reliable indicator of the performance of the trained model and at gathering evidence that this performance is not due to extraneous artifacts in the images. Our methodology used the same dataset and architecture as [3], but with a much larger number of test images (500) and a technique, Guided Grad CAM, that allows visualizing the regions of images that contribute to the model's classifications.

Our results support the conclusion that the AlexNet architecture, pre-trained on ImageNet and fine-tuned to the apple leaves dataset, is indeed able to classify the five disorders in the dataset with high accuracy. However, we found that the original accuracy (97%) over the small sample is likely an overestimate of the model's performance, with 92% being a more reliable estimate. Furthermore, by visualizing the areas of images that contribute to the classifications using the Guided Grad CAM technique, we were able to show evidence that the model is learning relevant features from the images, namely damaged leaf areas. No evidence of learned artifacts or bias towards irrelevant image areas (e.g. background) was found.

Finally, we point out directions for further improvements. Cross-validation over the dataset would improve the estimate, even if it is very costly due to the long training times. We also believe it is necessary to test the trained model over a dataset with more diverse examples, closer to a real-world application, for instance with different lighting conditions, diverse backgrounds, and leaf positions.

Acknowledgements. This work is supported by CNPq through grant number 407780/2016-5. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.
References

1. Chuanlei, Z., Shanwen, Z., Jucheng, Y., Yancui, S., Jia, C.: Apple leaf disease identification using genetic algorithm and correlation based feature selection method. Int. J. Agric. Biol. Eng. 10(2), 74–83 (2017)
2. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
3. Nachtigall, L.G., Araujo, R.M., Nachtigall, G.R.: Classification of apple tree disorders using convolutional neural networks. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 472–476. IEEE (2016)
4. United Nations: Food, July 2017. http://www.un.org/en/sections/issues-depth/food/
5. Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391 (2016)
6. Sladojevic, S., Arsenovic, M., Anderla, A., Culibrk, D., Stefanovic, D.: Deep neural networks based recognition of plant diseases by leaf image classification. Comput. Intell. Neurosci. (2016)
7. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
8. Wang, G., Sun, Y., Wang, J.: Automatic image-based plant disease severity estimation using deep learning. Comput. Intell. Neurosci. (2017)
9. Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H.: Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579 (2015)
10. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: CVPR (2016)