Classification of Closed and Open Shell Pistachio Nuts by Machine Vision

Musa ATAŞ*, Yahya DOĞAN
{musa.atas, yahyadogan}@siirt.edu.tr
Computer Engineering Department, Siirt University, El-Cezeri Vision & Cybernetics Laboratory, Turkey

Abstract: In this study, a machine vision based pistachio nut classifier system is presented. The proposed system is evaluated on the Siirt pistachio species. Siirt pistachio nuts differ from other pistachio species, such as Antep pistachio, in shape, size and taste. Traditionally, pistachio nuts are inspected and classified manually by workers through visual inspection. As a result, the classification process is inefficient in terms of time and cost. Moreover, manual visual inspection and classification is tedious and may carry various health risks. The developed machine vision system aims to classify pistachio nuts into closed-shell and open-shell classes in a fully automated manner. For the sake of simplicity and for its ability to extract rules from the training dataset, the J48 decision tree was used as the main classifier. The classification performance of J48 was also compared with other well-known classifiers, namely Naïve Bayes and Multi-Layer Perceptron (MLP). Experiments revealed that the proposed system using the J48 decision tree yields a simple and interpretable classifier with a satisfactory classification accuracy of 94.5%.

Keywords: Machine Vision; Machine Learning; Classification; Pistachio Nuts; J48; Naïve Bayes; Decision Tree; MLP.

I. INTRODUCTION

The pistachio nut is one of Turkey's significant commodities. In terms of production capacity, Turkey is the third-largest pistachio producer in the world. Two pistachio types are grown in Turkey, known as Antep and Siirt. Siirt province is the third-largest producer in Turkey and supplies approximately 18% of the national pistachio production [1]. Siirt pistachios are consumed as fresh nuts and are more sought after in the market than Antep pistachios because of their coarser grains, lower oil level and higher nutrient content. Today, Siirt pistachios are exported to international markets and are preferred by both local and global consumers. Product quality therefore has a significant influence on consumer choice: consumers typically have no tolerance for closed-shell pistachios in a purchased product.

Today, a hybrid system is mostly used. First, a mechanical sorter attempts to separate closed-shell from open-shell nuts. Afterwards, workers carry out a manual inspection, which is potentially unhealthy, inconsistent, time consuming and expensive. Hence, a fully automated system with a higher classification accuracy is desirable.

According to [2], Pearson reported that at harvest approximately 17%, 5% and 78% of pistachios are closed-shell, thin-split and open-shell, respectively. Closed-shell nuts are separated from open and thin-split ones by a so-called mechanical pin-picker or needle sorter. The needle sorter consists of a barrel lined with thousands of sharp needles. Raw pistachio nuts are fed into the turning barrel. As the barrel turns, the needles catch the split-shell pistachios, so the closed-shell nuts roll out at the end of the barrel. The mechanical sorting system achieves 97% and 95% classification accuracy on closed-shell and open-shell pistachio nuts, respectively. Yet 8% of thin-split pistachio nuts are misclassified as open-shell by the pin-picker. Although the pin-picker yields reasonably good sorting accuracy, it is generally not preferred because its internal structure may damage the pistachio shells. Furthermore, its classification performance on thin-split nuts is poor, and only the closed-shell and open-shell cases can be handled.

Alternatives to the mechanical system have been studied in the literature, and several impact acoustic and image based systems have been built. These two approaches can be considered the main branches of pistachio classification. For impact acoustics, Pearson et al. showed that a separator built from a microphone, a Digital Signal Processing (DSP) device and an air rejection nozzle can reach almost 97% classification accuracy for open-shell and closed-shell pistachio nuts [2]. However, when thin-split samples are included, overall accuracy drops to 85%, which implies that the impact acoustic system may not give satisfactory results for three classes (closed, thin-split, open-shell). Other deficiencies relate to ambient noise interference and to the direction in which the pistachio nut strikes the steel plate.
Cracks in pistachio nuts generally run along the suture, and the strike point on the nut cannot be controlled during real-time processing, so this mechanism increases the misclassification rate. On the other hand, the system offers an efficient processing speed of 40 nuts/s. Other impact acoustic studies [3]–[5] rest on a similar theoretical basis.

For image based sorting, Haff et al. studied the separation of in-shell pistachios from kernels using color images and achieved 99.9% accuracy for regular in-shell pistachios. For smaller in-shell pistachios, however, accuracy drops to 85% and 96% with the discriminant analysis (DA) and K-Nearest Neighbor (KNN) approaches, respectively [6]. Ghazanfari et al. used Fourier descriptors as features and an MLP as the classifier to grade pistachios into three United States Department of Agriculture (USDA) size grades and a closed-shell class, achieving 94.8% overall classification accuracy [7]. Another image based study was conducted by Kouchakzadeh to discriminate five different pistachio varieties and obtained a 99.6% accuracy rate [8]. It should be noted that both studies performed classification off-line, and the image datasets consisted of ideal pistachio postures and positions. In real-time operation, classification performance might therefore be adversely affected by challenging cases arising from the positions of the nuts.

The objective of this study is to assess the feasibility and efficiency of an image based pistachio sorting system that classifies closed-shell and open-shell pistachios in real time using machine vision techniques. Section 2 describes the major components of the proposed machine vision system and the image acquisition and separation processes. Feature extraction methods and machine learning issues are covered in Section 3. Experimental results and discussion are presented in Section 4. Finally, concluding remarks are given in the last section.

II. MATERIAL AND METHODS

Pistachio nut samples were collected from local markets. A total of 3 kg of pistachio nuts was purchased from six different dealers. The nuts were then mixed homogeneously, and 100 closed-shell and 100 open-shell nuts were sampled from the lot to form the training and validation set. The 10-fold cross-validation technique was applied to evaluate overall generalization performance. Fig. 1 depicts typical closed-shell and open-shell pistachio nut samples. The first and second rows show three different closed-shell and open-shell samples, respectively. Note that the actual image resolution is 640x480 pixels; only cropped samples are shown here.

Fig. 1 Typical closed-shell (upper row) and open-shell (bottom row) pistachios.

A. Image Acquisition

The key objective of this study is to establish a machine vision system that can separate closed-shell samples from open-shell ones in real time. To this end, the simple setup shown in Fig. 2 was designed and built at the El-Cezeri Vision & Cybernetics Laboratory.

Fig. 2 Major components of the proposed machine vision system are manual feed (a), optical sensor (b), illumination ring and auxiliary lamps (c) and GigE industrial camera (d).

The system operates as follows. A project assistant manually drops pistachio nut samples one by one into the feeding part. The feeding part is designed so that samples leave its outlet in an almost vertical orientation and fall in front of the optical sensor. We used an MZ80 industrial infrared optical sensor, whose operating range can be adjusted from 3 cm to 80 cm and whose response time is about 2 ms. The sensor is sensitive enough to detect a falling object in its field of view and emit a data/trigger signal. The data signal is about 3 V and cannot trigger the camera directly, because the camera needs 12 V to activate its trigger function. To strengthen the signal, an Arduino Uno Rev3 microcontroller board is used as an interface between the optical sensor and the industrial camera. Using the Arduino also makes it possible to tune the exact delay so that the sample is captured at the center of the frame.

A DFK23G618 ¼” Sony CCD industrial camera is employed in the proposed machine vision system. The camera has a GigE interface, which allows a network connection, and a trigger input. To eliminate motion blur in the captured image, the camera exposure time is set to 1/10000 s. Such a short exposure requires strong illumination, so a high-power LED ring is added as the illumination source. The special design of the LED ring (an inclined high-power LED array) lets it illuminate objects directly in front of the ring while filtering out distant objects. In this way, background subtraction can be performed with a simple threshold. The camera is connected to the network and powered via PoE, so in principle any PC, laptop or workstation on the same network can access it. To acquire the captured frames and store them on the PC hard drive, a simple image acquisition program was developed in Java using the NetBeans integrated development environment. As soon as a captured frame arrives at the computer, the software assigns a unique ID to the image and saves it to the hard drive. This step is used only in the training and validation phases; in the real-time phase it can be skipped because of its computational cost.

B. Feature Extraction

One significant contribution of the proposed algorithm is determining the region of interest (ROI) that contains the pistachio nut sample in the captured image. Cropping to the ROI is beneficial because the cropped image is about 1/10 of the full image, which speeds up image processing roughly tenfold. The ROI was calculated from the full image using a basic pixel weight center approach. We first analyzed the histogram of the gray-level image and set the threshold for background subtraction to 10, the mean intensity of the background pixels. We then applied a high-pass filter based on this threshold and projected all remaining pixels onto the X and Y axes. The ROI coordinates were extracted from this projection information.

As Fig. 3 indicates, edge information appears promising for separating closed-shell (left-hand side) from open-shell (right-hand side) pistachios. Edges generally appear in regions where the gradient is relatively high. The Canny edge detector was developed by John F. Canny in 1986 and has been used as a robust and effective edge detector in various studies [9]–[11]. The Canny algorithm first applies a Gaussian filter to smooth the image, which removes noise pixels and potential spurious edge regions. However, this smoothing also degrades genuine edge pixels. To preserve the useful information in edge regions, an adaptive filter is employed. Wang et al. proposed an adaptive filter algorithm that addresses this particular problem [12]. The key point of their algorithm is evaluating the discontinuity of each pixel region: the higher the discontinuity, the weaker the filtering, and the lower the discontinuity, the stronger the filtering. We observed that using the adaptive filter improved the results.

Because the output of the Canny edge algorithm contains only the gray levels 0 and 255, we simply count the pixels with value 255 to obtain the total edge length. This total is then normalized by the square root of the sample area (the total number of pixels belonging to the pistachio nut). Edge length is one-dimensional, so for a consistent normalization the two-dimensional sample area is also reduced to one dimension by taking its square root. We use this normalized value as the proposed normalized edge based (NEB) feature in this study.
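The NEB computation described above can be sketched in a few lines of Java. This is a minimal illustration, not the authors' code: it assumes the edge map and a binary object mask are already available as 2-D integer arrays of equal size, and the class name `NebFeature`, the method `neb` and the toy data are hypothetical.

```java
class NebFeature {
    /**
     * Normalized edge based (NEB) feature: number of edge pixels (value 255)
     * divided by the square root of the object area in pixels.
     * Assumption: objectMask marks nut pixels with any non-zero value.
     */
    static double neb(int[][] edgeMap, int[][] objectMask) {
        int edgePixels = 0;
        int area = 0;
        for (int y = 0; y < edgeMap.length; y++) {
            for (int x = 0; x < edgeMap[y].length; x++) {
                if (edgeMap[y][x] == 255) edgePixels++;
                if (objectMask[y][x] != 0) area++;
            }
        }
        if (area == 0) {
            throw new IllegalArgumentException("object mask is empty");
        }
        return edgePixels / Math.sqrt(area);
    }

    public static void main(String[] args) {
        // Hypothetical 4x4 toy data: 16 object pixels, 8 edge pixels.
        int[][] mask = new int[4][4];
        int[][] edges = new int[4][4];
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++) {
                mask[y][x] = 1;
                if (y == 0 || y == 3) edges[y][x] = 255;
            }
        }
        System.out.println(neb(edges, mask)); // 8 / sqrt(16) = 2.0
    }
}
```

Dividing by the square root of the area keeps the feature roughly independent of image scale, which is the motivation the text gives for the normalization.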

Fig. 3 Feature extraction steps.
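The ROI localization step of the feature extraction stage (thresholding followed by projection onto the X and Y axes) could look roughly like the sketch below. It assumes the ROI is taken as the extent of the non-empty projections, which is one plausible reading of the description rather than the confirmed implementation; the class `RoiByProjection` and its methods are hypothetical names.

```java
import java.util.Arrays;

class RoiByProjection {
    /**
     * Returns {xMin, yMin, xMax, yMax} of all pixels brighter than the
     * threshold, or null if no pixel exceeds it.
     */
    static int[] findRoi(int[][] gray, int threshold) {
        int h = gray.length;
        int w = gray[0].length;
        int[] colCount = new int[w]; // projection onto the X axis
        int[] rowCount = new int[h]; // projection onto the Y axis
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (gray[y][x] > threshold) {
                    colCount[x]++;
                    rowCount[y]++;
                }
            }
        }
        int xMin = firstNonZero(colCount);
        if (xMin < 0) return null; // only background found
        return new int[] {xMin, firstNonZero(rowCount),
                          lastNonZero(colCount), lastNonZero(rowCount)};
    }

    static int firstNonZero(int[] a) {
        for (int i = 0; i < a.length; i++) if (a[i] > 0) return i;
        return -1;
    }

    static int lastNonZero(int[] a) {
        for (int i = a.length - 1; i >= 0; i--) if (a[i] > 0) return i;
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical 6x8 image with a bright block in rows 2..3, columns 3..5.
        int[][] img = new int[6][8];
        for (int y = 2; y <= 3; y++)
            for (int x = 3; x <= 5; x++) img[y][x] = 180;
        // Threshold 10 is the background level quoted in the text.
        System.out.println(Arrays.toString(findRoi(img, 10))); // [3, 2, 5, 3]
    }
}
```

A real frame would likely need some tolerance for isolated bright noise pixels, for example by requiring a minimum count per row or column before it is treated as part of the object.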

In this study we found that a single NEB feature was adequate for separating closed-shell nuts from open-shell ones. The discrimination power of the proposed NEB feature can be seen in Table I, which lists several statistical measures, and in Fig. 4.

TABLE I
SOME STATISTICAL MEASURES FOR THE PROPOSED NORMALIZED EDGE BASED FEATURE

                Mean    Standard Deviation    Minimum    Maximum
Closed-Shell    4.63    0.76                  1.78       6.62
Open-Shell      8.43    1.63                  5.67       12.56

Fig. 4 Distribution of closed (left), and open (right) shell pistachio samples in the dataset.
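The separation visible in Table I and Fig. 4 can be summarized with a two-class Fisher-type ratio. The sketch below assumes the common form (squared difference of the class means divided by the sum of the class variances) and uses the rounded Table I statistics as inputs; the class and method names are illustrative only.

```java
class FisherRatio {
    /**
     * Two-class Fisher ratio: (mean1 - mean2)^2 / (var1 + var2),
     * with each variance computed as the squared standard deviation.
     */
    static double fisherRatio(double mean1, double std1,
                              double mean2, double std2) {
        double diff = mean1 - mean2;
        return (diff * diff) / (std1 * std1 + std2 * std2);
    }

    public static void main(String[] args) {
        // Rounded Table I values: closed-shell (4.63, 0.76), open-shell (8.43, 1.63).
        double fdp = fisherRatio(4.63, 0.76, 8.43, 1.63);
        System.out.println("Fisher ratio from Table I values: " + fdp);
    }
}
```

With the rounded Table I inputs this evaluates to roughly 4.46, in line with the FDP score of 4.47 reported for the NEB feature; the small gap is consistent with rounding in the tabulated statistics.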

The discrimination power of the proposed NEB feature can also be evaluated with the Fisher formula. The Fisher discriminant (FD) was first proposed by Fisher [13]. As Ataş et al. [14] reported, FD projects data from an n-dimensional space onto a one-dimensional space (i.e. a line or axis) along which the between-class separation is maximal and the within-class scatter is minimal. It can be computed as in Expression (1):

$$\mathrm{FDP} = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} \qquad (1)$$

where FDP is the Fisher discrimination power, and $\mu_i$ and $\sigma_i^2$ designate the mean and variance of each class (closed-shell and open-shell), respectively. The FDP score of the proposed NEB feature was calculated as 4.47. Note that an FDP score higher than one indicates good class separation. A score of zero means poor class separation, because the two classes overlap.

C. Classifier Selection

Because the proposed NEB feature is powerful enough, there is no need for complex and sophisticated non-linear learning models; simple linear models are adequate. MLP was used only to check for non-linearity, and it did not improve classification performance further. Thus, we used Naïve Bayes and J48 as simple linear classifiers and MLP as a non-linear classifier. The Weka data mining tool [15] was employed for machine learning throughout the study. For image processing, visualization and matrix calculations, the Open Cezeri Library (OCL) was used [16].

III. EXPERIMENTAL RESULTS

The proposed NEB feature was evaluated with the Naïve Bayes, J48 decision tree and MLP algorithms for binary classification. To assess classifier performance, the 10-fold cross-validation technique was used. Because the number of features is limited and the dataset is large enough, an isolated test was not employed. The curse of dimensionality is therefore not an issue for this problem.
As a result, we inferred that a completely isolated test set would not add to classification confidence or consistency. We nevertheless also checked classification performance on an isolated test set, and the results did not change. The test set results are therefore omitted here because they are similar to the cross-validation results. Table II lists the 10-fold cross-validation performance of each learning model. The criteria used are the percentage of correctly classified instances, kappa statistic, F-measure, receiver operating characteristic area (ROC-Area), true positive rate (TPR), false positive rate (FPR), precision and recall. For every criterion except FPR, a higher value means better classifier performance; for FPR, lower is better. Note that although MLP outperforms Naïve Bayes and J48 on most of the performance criteria, we still chose J48 for the sake of simplicity. In addition, an explicit if-then-else rule can easily be generated from the J48 decision tree. In terms of ROC-Area and minimum false negative rate, Naïve Bayes appears to be the best.
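To make the evaluation protocol concrete, the sketch below shows one way to split 100 closed-shell and 100 open-shell samples into 10 folds. It assumes stratified folds (equal class proportions in every fold); the text does not state how the folds were formed, and the class `StratifiedFolds`, its method and the seed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class StratifiedFolds {
    /**
     * Assigns every sample to one of k folds so that each class is spread
     * evenly across the folds. labels[i] is a class index starting at 0.
     */
    static int[] assignFolds(int[] labels, int k, long seed) {
        int[] fold = new int[labels.length];
        Random rng = new Random(seed);
        int maxLabel = 0;
        for (int label : labels) maxLabel = Math.max(maxLabel, label);
        for (int c = 0; c <= maxLabel; c++) {
            // Collect and shuffle the indices belonging to class c.
            List<Integer> idx = new ArrayList<>();
            for (int i = 0; i < labels.length; i++) {
                if (labels[i] == c) idx.add(i);
            }
            Collections.shuffle(idx, rng);
            // Deal the shuffled indices round-robin into the k folds.
            for (int j = 0; j < idx.size(); j++) {
                fold[idx.get(j)] = j % k;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        // 0 = closed-shell, 1 = open-shell, 100 samples each as in the dataset.
        int[] labels = new int[200];
        for (int i = 100; i < 200; i++) labels[i] = 1;
        int[] fold = assignFolds(labels, 10, 42L);
        int[] sizes = new int[10];
        for (int f : fold) sizes[f]++;
        System.out.println("Samples in fold 0: " + sizes[0]); // 20
    }
}
```

In each of the 10 rounds, one fold (20 nuts) is held out for validation and the other nine (180 nuts) are used for training; the reported figures are then aggregated over all rounds.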

TABLE II
CLASSIFICATION PERFORMANCES OF UTILIZED LEARNING MODELS IN THE STUDY

                          MLP       Naïve Bayes    J48
Correctly Classified %    95.500    94.500         94.500
Kappa Statistics          0.910     0.890          0.890
F-Measure                 0.955     0.945          0.945
ROC-Area                  0.968     0.977          0.922
TP-Rate                   0.955     0.945          0.945
FP-Rate                   0.045     0.055          0.055
Precision                 0.955     0.945          0.945
Recall                    0.955     0.945          0.945

TABLE III
CONFUSION MATRICES OF MLP (TOP), NAÏVE BAYES (MIDDLE) AND J48 (BOTTOM)

Classified As    Closed-Shell    Open-Shell
Closed-Shell     95              4
Open-Shell       5               96

Classified As    Closed-Shell    Open-Shell
Closed-Shell     96              7
Open-Shell       4               93

Classified As    Closed-Shell    Open-Shell
Closed-Shell     93              4
Open-Shell       7               96
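The summary figures in Table II can be cross-checked against the counts in Table III. The sketch below assumes that the rows of each matrix are the predicted class and the columns the actual class (each column sums to the 100 samples per class), with closed-shell treated as the positive class; the class `ConfusionMetrics` and its methods are hypothetical.

```java
class ConfusionMetrics {
    // Layout assumption: m[predicted][actual], 0 = closed-shell, 1 = open-shell.

    static int total(int[][] m) {
        return m[0][0] + m[0][1] + m[1][0] + m[1][1];
    }

    static double accuracy(int[][] m) {
        return (m[0][0] + m[1][1]) / (double) total(m);
    }

    /** Closed-shell nuts that were predicted as open-shell (false negatives). */
    static int missedClosed(int[][] m) {
        return m[1][0];
    }

    /** Cohen's kappa: agreement corrected for chance agreement. */
    static double kappa(int[][] m) {
        double n = total(m);
        double observed = accuracy(m);
        double expected = ((m[0][0] + m[0][1]) * (m[0][0] + m[1][0])
                + (m[1][0] + m[1][1]) * (m[0][1] + m[1][1])) / (n * n);
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        String[] names = {"MLP", "Naive Bayes", "J48"};
        int[][][] matrices = {
            {{95, 4}, {5, 96}},
            {{96, 7}, {4, 93}},
            {{93, 4}, {7, 96}}
        };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i]
                    + " accuracy=" + accuracy(matrices[i])
                    + " kappa=" + kappa(matrices[i])
                    + " closed-as-open=" + missedClosed(matrices[i]));
        }
    }
}
```

Under this reading the accuracies come out at 95.5%, 94.5% and 94.5% and the kappa values at 0.91, 0.89 and 0.89, matching Table II, while the closed-as-open counts are 5, 4 and 7 for MLP, Naïve Bayes and J48.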

Predicting a closed-shell pistachio as open-shell is the more detrimental type-II error, also known as a severe error or false negative. Conversely, classifying an open-shell sample as closed is a type-I error, known as a false alarm. In our problem, encountering closed-shell pistachios is undesirable from the standpoint of consumer satisfaction. We should therefore keep this constraint in mind when evaluating the confusion matrices. Table III shows that MLP classified five closed-shell pistachios as open-shell and thus produced severe errors. J48 is even worse, because it predicted seven closed-shell pistachios as open. Naïve Bayes, on the other hand, misclassified only four closed-shell samples as open-shell. As a result, Naïve Bayes is the best with regard to the minimum false-negative rate. Fig. 5 visualizes the J48 decision tree. Note that the if-then-else rule can easily be extracted from the visualized tree. We can write the simple Java code snippet below.
if (edgeRatio