1
2
Automatic Defect Segmentation of ‘Jonagold’ Apples on Multi-Spectral Images: A
3
Comparative Study
4
D. Unay ∗
5
TCTS Lab., Facult´e Polytechnique de Mons,
6
Multitel Building, Avenue Copernic 1, Parc Initialis, B-7000, Mons, Belgium
7
B. Gosselin
8
TCTS Lab., Facult´e Polytechnique de Mons,
9
Multitel Building, Avenue Copernic 1, Parc Initialis, B-7000, Mons, Belgium
10
Abstract
11
In this work, several thresholding and classification-based techniques were employed
12
for pixel-wise segmentation of surface defects of ‘Jonagold’ apples. Observations
13
showed that segmentation by supervised classifiers was more accurate than the
14
rest. Also, average of class-specific recognition errors was more reliable than error
15
measures based on defect size or global recognition. Segmentation accuracy im-
16
proved when pixels were represented as a neighborhood. Effect of down-sampling
17
on segmentation accuracy and computation times showed that multi-layer percep-
18
trons were the best. Russet was the most difficult defect to segment, whereas flesh
19
damage the least. The proposed method was much more precise on healthy fruit.
20
Key words: defect, segmentation, apple, machine vision, thresholding, classifiers
Preprint submitted to Elsevier Science
19 April 2006
21
1
Introduction
22
Fresh fruit market, providing the link between producers and consumers, has
23
to determine external quality of produce by certain rules. Quality of apple
24
fruit, in particular, depends on size, colour, shape and presence-type of de-
25
fected skin according to the marketing standard of European Commission
26
(Anonymous, 2001). Visual inspection of apples with respect to size and colour
27
by machine vision is already automated in the industry. However, detection of
28
defects is still problematic due to high variance of defect types and presence
29
of stem/calyx concavities.
30
Extracting important objects from image by partitioning it into foreground
31
and background pixels is called segmentation. In order to accurately perform
32
quality inspection of apples by machine vision, robust and precise segmen-
33
tation of defected skin is crucial. Therefore, defect segmentation is the main
34
concern of this paper. Du and Sun (2004) divided image segmentation tech-
35
niques used for food quality evaluation into four groups:
36
Thresholding-based: These techniques partition pixels with respect to an opti-
37
mal value (threshold). They can be further categorized by how the threshold
38
is calculated (global-local, simple-adaptive). Global techniques find one thresh-
39
old value for whole image, whereas local ones calculate different thresholds for
40
each pixel within a neighborhood. Simple threshold is a fixed value usually
41
determined from previous observations, however adaptive techniques calcu-
42
late a new value for each image. Note that, simple techniques can only be ∗ Corresponding author. Tel.: +32 65 374745; fax: +32 65 374729 Email address:
[email protected] (D. Unay). URL: http://www.tcts.fpms.ac.be (D. Unay).
2
43
global, due to the single threshold used. Majority of the works performing
44
defect segmentation of apples by thresholding used simple techniques (Dav-
45
enel et al., 1988; Crowe and Delwiche, 1996; Li et al., 2002; Mehl et al., 2002;
46
Throop et al., 2005). Quasi-spherical shape of apples leads to boundary light
47
reflectance effect, which causes segmentation problems at the boundaries of
48
fruit. In order to eliminate this effect, some researchers performed adaptive
49
spherical transform of images before employing simple thresholding (Tao and
50
Wen, 1999; Wen and Tao, 1999). Surface defects of apples vary not only by
51
size, but also by texture, shape and spectral characteristics, which makes their
52
segmentation difficult by simple techniques. Therefore, Kim et al. (2005) and
53
Bennedsen and Peterson (2005) used adaptive global thresholding, where the
54
latter employed one simple and two adaptive global techniques. Locally adap-
55
tive thresholding, on the other hand, require excessive computation that may
56
limit their practical use in real-time systems, which was probably why we have
57
not encountered any related work in the literature.
58
Region-based: Region-based techniques segment images by finding coherent,
59
homogeneous regions subject to a similarity criterion. They can be divided
60
into two basic classes; merging, which is a bottom-up method continuously
61
grouping sub-regions into larger ones and splitting, which is the top-down
62
version that recursively divides image into smaller regions. In order to seg-
63
ment patch-like defects, Yang (1994) used flooding algorithm, a region-based
64
technique, on monochromatic images of apples. Region-based techniques are
65
computationally more costly than the thresholding-based ones, but they do
66
not require any a priori information about images.
67
Edge-based: These techniques segment images by interpreting gray level dis-
68
continuities using an edge detecting operator and combining these edges into 3
69
contours to be used as region borders. Unfortunately, we did not run across any
70
work focusing on defect segmentation of apples with edge-based techniques.
71
Classification-based: Such techniques attempt to partition pixels into several
72
classes using different classification methods. They can be categorized into
73
two groups, supervised and unsupervised, where output target values of learn-
74
ing samples are provided in the former and missing in the latter. Among
75
supervised methods, Bayesian classification is the most used method by re-
76
searchers (Molt´o et al., 1998; Leemans et al., 1999; Blasco et al., 2003; Kleynen
77
et al., 2005), where pixels were compared to a pre-calculated Bayesian model
78
and classified as defected or healthy. On the other hand, Nakano (1997) intro-
79
duced a neural network based system to classify pixels of apple skin into six
80
classes, one of which was ‘defect’. Unsupervised classification does not benefit
81
any guidance in the learning process due to lacking target values. Such an
82
approach was used by Leemans et al. (1998) to segment defects using multi-
83
spectral images.
84
Above literature survey reveals that in segmenting surface defects of apple
85
fruit, researchers have mainly focused on global thresholding-based approaches
86
and Bayesian-based classification methods. Furthermore, there is no compar-
87
ative work discussing advantages and disadvantages of several segmentation
88
methods in this field. Therefore, novelty of this paper comes in two ways:
89
• some existing (but not yet applied) methods were employed for segmentation
90
of defects; like Isodata and Entropy thresholding (global, adaptive), Niblack
91
thresholding (local, adaptive), k-means classification (unsupervised), neu-
92
ral networks classification (unsupervised/supervised), discriminant analysis,
93
nearest neighbor and support vector machines (supervised). 4
94
• accuracies of several segmentation methods were calculated and compared visually and by performance measures.
95
96
2
Methodology
97
2.1 Database
98
Database contained images of 246 defected and 280 healthy apples of ‘Jon-
99
agold’ variety coming from three different sources: Orchard of Gembloux Agri-
100
cultural University, a private orchard in Gembloux and Belgische Fruit Veil-
101
ing auction in St. Truiden. ‘Jonagold’ variety was selected instead of mono-
102
coloured ones, because it has a bi-coloured skin causing more difficulties in
103
defect segmentation due to colour transition areas. Concerning defected ap-
104
ples; 42 fruit were injured by russet, 55 by bruise (6 natural and 49 artificial),
105
23 by rot, 17 by scald (from sunlight), 47 by hail damage (16 with skin per-
106
foration), 7 by limb rub, 24 by visible flesh damage, 11 by frost damage and
107
20 by other kinds of defects (e.g. scar tissue). Artificial bruises were produced
108
by dropping fruit from 30cm height onto a steel plate. After impact, the ap-
109
ples were stored for about 1-2 hours at room temperature and then imaged.
110
Note that, most related literature waits about 20-24 hours for bruises to fully
111
develop. The purpose behind 1-2 hours of waiting time was to evaluate if we
112
could detect bruises that are not yet fully developed. In order to serve as ref-
113
erence, defected areas of apples within the database were manually segmented
114
by O. Kleynen from the Gembloux Agricultural University of Belgium. These
115
manually segmented images will be referred to as theoretical segmentations
116
from now on in this paper. 5
117
Apples were placed one-by-one (to provide full view of defected or healthy skin)
118
on a conveyor belt inside a diffusely illuminated tunnel, where a high resolution
119
monochrome digital camera, four interference band-pass filters (centered at
120
450, 500, 750, and 800 nm with corresponding bandwidths of 80, 40, 80, and
121
50 nm) and a frame grabber acquired their corresponding filter images. This
122
system was capable of inspecting the fruit only from one-view. Filter images
123
of each fruit had to be separated by alignment based on pattern matching,
124
after which flat field correction was applied to remove vignetting. Finally, each
125
separated filter image was composed of 430x560 pixels with 8 bits-per-pixel
126
resolution. Figure 1 displays some examples from the database with related
127
theoretical segmentations.
128
Assembly of the image acquisition system and collection of the database were
129
done in the Mechanics and Construction Department of Gembloux Agricul-
130
tural University of Belgium. Therefore, readers concerned in more details
131
about image acquisition and database can refer to Kleynen et al. (2003, 2005).
132
2.2 Region-of-interest Extraction
133
As observed from Figure 1, background was lower than the fruit area in in-
134
tensity. Therefore, fruit area was separated from background by thresholding
135
the 750 nm filter image at intensity value of 30. However, this fixed threshold-
136
ing falsely removed some defects, stems or calyxes that are lower in intensity.
137
Hence, morphological filling was applied to eliminate this error.
138
Initial observations revealed that segmentation was problematic at far edges
139
of the fruit probably due to illumination artifacts. Therefore, after background 6
140
removal, fruit area was eroded by a rectangular structuring element with size
141
adaptive to fruit size (15 % of fruit’s bounding-box). Result of this erosion
142
step was the region-of-interest (roi ) defining the fruit area to be inspected.
143
2.3 Feature Extraction
144
Segmentation of defects at pixel level required each pixel to be represented by
145
features. Thus, intensity values within the tested neighborhood of each pixel
146
from four filter images formed its local features. Different neighborhoods tested
147
in this work are displayed in Figure 2. As the neighborhood size increases,
148
the amount of data to be processed for segmentation increases exponentially.
149
Preliminary tests showed that neighborhoods having greater than 8 neighbor-
150
pixels produced extremely overloaded computations and therefore were not
151
tested. Previous work presented by Unay and Gosselin (2004) showed that an
152
additional local feature (relative distance) related to pixels’ location relative to
153
geometric center of roi improved segmentation of defects on the same database.
154
In addition to the local features, average and standard deviation of intensity
155
values over the roi were also calculated from each filter image, making up the
156
global features. Hence, each pixel was represented by 13 + 4 × n features in the
157
feature space, where n refers to the number of neighbor pixels used (Table 1).
158
Feature values were also normalized to fall into the range of [-1,+1] before the
159
defect detection step. 7
160
2.4 Defect Detection
161
2.4.1 By thresholding
162
Thresholding can be applied globally or locally, i.e. within a neighborhood of
163
each pixel. Three global thresholding techniques were tested for defect seg-
164
mentation in this work : Entropy (Kapur et al., 1985) tries to maximize class
165
entropies, Isodata (Ridler and Calvard, 1978) is an iterative scheme based on
166
two-class Gaussian mixture-models and Otsu (Otsu, 1979) focuses on mini-
167
mizing weighted sum of inter-class variances.
168
Local techniques work well if the sizes of objects searched do not vary much,
169
which is unfortunately not the case for defects of apples. Hence, only Niblack ’s
170
method (Niblack, 1986), which adapts threshold to the local mean and stan-
171
dard deviation, was tested. Size of neighborhood used was 23-by-23 pixels.
172
Thresholding techniques are applicable on gray-level images. However in a
173
multi-spectral imaging system providing multiple images per object, one can
174
combine filter images to get a final gray-level image or select one of them by
175
a criterion. As revealing the optimal combination is arduous, in this work we
176
considered using filter images separately.
177
2.4.2 By classification
178
Classifiers using labelled training samples are said to be supervised and those
179
using unlabelled samples are grouped as unsupervised (Duda et al., 2001).
180
Guidance through labels makes supervised classifiers generally more promis-
181
ing in pattern recognition than others. But it is also known that visual char8
182
acteristics of apples can vary as seasons change, which means characteristics
183
of defected and healthy skin can change slowly with time. Thus, unsupervised
184
methods (also known as clustering methods) capable of adapting themselves
185
to such changes might be very robust.
186
Three unsupervised classifiers were employed for segmentation of defects:
187
k-Means, Competitive Neural Networks (CNN) and Self Organizing Feature
188
Maps (SOM). k-Means partitions samples into k classes by minimizing sum-
189
of-squares between samples and the corresponding class centroid. CNN and
190
SOM can recognize groups of similar inputs, where the latter applies data
191
reduction by producing a similarity map of 1 or 2 dimensions. SOM used here
192
had a 5x8 hexagonal topology.
193
Following ten supervised classifiers were tested in this research: k-Nearest
194
Neighbor (k-NN), Linear Discriminant Classifier (LDC), Quadratic Discrim-
195
inant Classifier (QDC), Logistic Regression (LR), Support Vector Machines
196
(SVM), Perceptron, Multi-Layer Perceptrons (MLP), Cascade Forward Neu-
197
ral Networks (CFNN), Elman Neural Networks (ENN) and Learning Vector
198
Quantizers (LVQ). For a test sample, k-NN finds the k closest samples in the
199
training set and assigns it to the most frequent class among these. Observa-
200
tions showed that k = 25 was a good choice. LDC assumes that samples are
201
linearly separable and tries to find the linear decision boundary that separates
202
them. QDC assumes that samples are separable by a hyper-quadratic surface.
203
LR tries to minimize the logarithm of the likelihood ratio of samples. SVM
204
performs non-linear mapping of the feature space to a new space (typically
205
higher dimensioned) through kernels and tries to find a hyperplane in this
206
new space that separates the classes with maximum margin (Burges, 1998).
207
Gaussian radial basis function (rbf) and polynomial kernels were tested in 9
208
this work. Perceptron, composed of a single neuron, is the simplest form of
209
neural networks that can only solve linearly separable problems. MLP is also
210
known as feed-forward networks, because input signals propagate layer-by-
211
layer through the network in forward direction (Haykin, 1994). It performs er-
212
ror back-propagation, where classification errors are propagated back through
213
the network and used to update weights during training. CFNN is similar to
214
MLP, except that the neurons of each subsequent layer has inputs coming
215
from not only the previous layers but also input layer. ENN is a two-layered
216
recurrent network with feedback from the first layer output to the first layer
217
input. LVQ is composed of a competitive layer followed by a linear layer, where
218
the former learns to classify inputs and the latter transforms the outputs of
219
the former into labels defined by user.
220
In each test, classifiers were trained by a training set that was constructed as
221
follows. Assume, m fruit of the database would be used for training, where
222
m1 of them belonged to defect type 1, m2 of them belonged to defect type
223
2,. . . Total number of defect types in our database was 10, so m = m1 + m2 +
224
. . .+m10 . ‘Sample size’, used from now on in this paper, refers to the number of
225
pixels (samples) per defect type per class (healthy-defected) in a training set.
226
size’ From each fruit injured by defect type 1, we randomly selected ‘sample m1
227
size’ defected pixels for training by the help of theoretical healthy and ‘sample m1
228
segmentations. We formed the training set by repeating this selection for each
229
defect type. Consequently, total number of pixels in a training set was 20
230
times ‘sample size’. Such a procedure was employed, because it permitted
231
equal representation of defect types and classes in a training set. Please note
232
that, fruit were never used for training (and validation) and testing at the
233
same in any test in order to prevent possible forced learning. 10
234
Cross-Validation is a training method for supervised classifiers, where a por-
235
tion of training set is separated as validation data and training of the classifier
236
is done on training set while evaluation on validation set. Among the super-
237
vised classifiers Perceptron, MLP, CFNN and ENN permitted cross-validation,
238
thus they were trained with this method.
239
All the artificial neural networks (ANNs) used in this study had 2-layered ar-
240
chitecture with 5 neurons in the hidden layer, if not stated otherwise. Sigmoid
241
neurons were used as long as architecture permitted. Levenberg-Marquardt
242
algorithm, learning rate of 0.01 and maximum epoch number of 200 were used
243
for their training. Above parameters were found optimum after several trials.
244
In this work Discriminant Analysis Toolbox of M. Kiefte (Kiefte, 1999) was
245
used for QDC and LR classification; SVM Light of Joachims (1999) was uti-
246
lized for SVM; Matlab built-in libraries were employed for LDC and all the
247
neural networks classifiers except MLP. Whereas, the rest of the methods (k-
248
means, k-NN and MLP classifiers and all the thresholding ones) were imple-
249
mented by the authors. Furthermore, the whole system was implemented un-
250
der Matlab environment (version 7 R14, The Math Works Inc., M.A., U.S.A.)
251
and tested on a Intel Pentium IV machine with 1.5 GHz CPU and 256 MB
252
memory.
253
2.5 Stem/Calyx Recognition
254
Stem and calyx are natural parts of fruit that show similar spectral charac-
255
teristics with defects. As orientation of fruit was not controlled during image
256
acquisition, these parts were also visible in the images and had to be dis11
257
criminated from real defects. Hence, a highly accurate image processing-based
258
method (Unay and Gosselin, 2007) was employed for this purpose. The method
259
first performed background removal and threshold-based object segmentation.
260
Then, from each object statistical, textural and shape features were extracted
261
and introduced to an SVM classifier in order to eliminate false stems or ca-
262
lyxes. Once stem/calyx regions were accurately found by this method, they
263
were removed from segmented areas providing refined segmentation.
264
2.6 Performance Measures
265
Accuracy of segmentation was calculated by the following measures:
266
• E-1 : This measure presents the percentage of error between the defect size
267
268
269
of theoretical and experimental results. • E-2 : It depicts recognition error by comparing experimental result with the theoretical one pixel-by-pixel.
270
• E-3 : Recognition error assumes that classes are equally represented, which
271
was not true for our case where defect sizes highly varied within the database.
272
Hence, calculating class-specific recognition errors and averaging them in-
273
stead could be more enlightening. If a class did not exist (ex. no defected
274
skin, i.e. fruit in perfect quality), then error was made up of only the other
275
class.
276
Note that these measures were calculated for each test image, whereas error
277
of a test was estimated as the average of measures of all test images. 12
278
3
Results and Discussion
279
3.1 Hold-out Tests
280
Pixel-wise segmentation lead to excessive amount of data to be processed,
281
which was computationally very expensive. Thus initial tests were done by
282
hold-out method, where 12 ,
283
in training, validation and test sets, respectively, providing 116 fruit for train-
284
ing, 38 for validation and 92 for testing. ‘Sample size’ of 2000 was used in
285
these tests.
286
Figure 3 shows performances of all the segmentation methods measured on the
287
test set of hold-out evaluation. All three graphs reveal that global thresholding
288
methods were slightly better than local ones and supervised classifiers were
289
generally more accurate than unsupervised ones. E-1 and E-2 measures show
290
similar results with k-means and SOM being the worst performers, whereas
291
results of E-3 were somewhat different with perceptron being the worst. Thus,
292
it is difficult to determine best method(s), while the error measures show such
293
contradictory responses. However, observations on visual results (an example
294
is seen in Figure 4) revealed some important facts. Perceptron, assigning all
295
pixels as healthy, could not segment defects at all. But it was said to per-
296
form well by E-1 and E-2 measures. Moreover, results of entropy thresholding
297
were worse than those of isodata or otsu, which was confirmed by only E-3.
298
Hence, E-3 measure that calculates weighted form of recognition error, was
299
found to be more accurate than others. It revealed that segmentation of su-
300
pervised methods were better than the rest, except for perceptron. Therefore,
301
LDC, SVM, MLP and CFNN were selected for further tests together with E-3
1 6
and
1 3
of the fruit of each defect type were placed
13
302
measure.
303
3.2 Leave-one-out Tests
304
Leave-one-out evaluation tests response of a classifier for each sample, while
305
training it with the rest. Final error is the average of error rates of all sam-
306
ples. It is said to be more reliable than hold-out when the database is small.
307
Although number of training samples was very high for our case, number of
308
fruit used was not so diverse. Therefore, leave-one-out method was used in the
309
following tests.
310
3.2.1 ‘Sample size’ v.s. segmentation
311
First, the effect of number of training samples on segmentation performance
312
was examined. ‘Sample size’ was varied from 800 to 4000 and the E-3 was
313
calculated using LDC, SVM, MLP and CFNN. Preliminary tests revealed that
314
gaussian rbf kernel performed better than polynomial, thus following results
315
include those of SVM with gaussian rbf. Figure 5 displays result of this test.
316
Performance of LDC, which was the worst among the four, did not depend
317
on ‘sample size’. Error of SVM smoothly decreased as ‘sample size’ increased,
318
whereas those of MLP and CFNN showed unstable oscillations. Lowest errors
319
were performed by MLP in the range [1800-2200]. As ‘sample size’ increased,
320
computational expense increased. So, ‘sample size’ of 1800 seemed a suitable
321
choice for the next tests. 14
322
3.2.2 Neighborhood v.s. segmentation
323
In an image, intensity of a pixel and those of its neighbors are correlated. This
324
correlation information can be used by classifiers to improve segmentation.
325
Thus, effects of different neighborhoods on segmentation performance were
326
tested (Figure 6). As observed, presentation of neighbors decreased segmen-
327
tation error significantly for all four classifiers. However, there was no specific
328
neighborhood type that lead to better segmentation. Thus, using neighbor-
329
hoods having 4 neighbors was more logical to keep computational cost low.
330
And also, it is believed that a pixel is more correlated with its side-neighbors
331
than its corner-ones. Hence, side-neighbors of pixels in 3-by-3 window (wp1 )
332
was a good choice to advance.
333
3.2.3 Down-sampling v.s. segmentation
334
Fruit inspection systems have to be as rapid and accurate as possible to cope
335
with the demands of the industry. Unfortunately, rapidity is achieved mostly
336
with a reduction in accuracy. Down-sampling, for example, provides small-
337
sized images and leads to lower computational load (more rapid) with the
338
compromise of higher segmentation errors (less accurate). Hence, an optimum
339
point should be found.
340
In order to test the effect of sampling on segmentation performance, we down-
341
sampled original images (430x560) by factors of 1, 2 and 3, and then performed
342
segmentation. Down-sampling by a factor f outputs a new image with size
343
equal to the original divided by 2f . Thus, sizes of the new images were 215x280,
344
107x140 and 58x70, respectively. Sampling was achieved by averaging. wp1
345
neighborhood was used for segmentation with ‘sample size’ of 1800. Figure 15
346
7 displays the results, where an increasing rise in segmentation errors was
347
observed as the down-sampling factor increased. Sampling by factor 1 did not
348
affect segmentation. LDC was the most affected classifier with 3 % rise in error
349
at factor 3, whereas MLP was the least with 1.3 % rise at same factor. Notice
350
that, defect shape and size can also influence these results (e.g. small or thin
351
defects may be missed after down-sampling). Our observations revealed that
352
segmentation of russet defects, which were generally thin and elongated, was
353
relatively more erroneous when down-sampling was applied.
354
3.2.4 Computational expense
355
For the search of optimum down-sampling, further test on computation times
356
of classifiers had to be done. Previous tests showed that SVM and MLP were
357
good choices for defect segmentation on apples, hence their computational
358
expense was considered here. Note that, both algorithms were implemented
359
in C language.
360
‘Sample size’ selected for training was 1800 and side-neighbors in 3-by-3 win-
361
dow were used here. The fruit occupying the highest number of pixels in image
362
space was selected for testing. Processing times of both classifiers were mea-
363
sured 30 times for segmentation of this fruit and their averages are displayed in
364
Table 2. Computation times of SVM were extremely high compared to MLP,
365
which was due to the high number of support vectors (around 4500) found by
366
the algorithm. To the contrary, processing time of MLP dropped under 0.3
367
sec as soon as down-sampling was applied. More research on pruning could
368
be done to remove redundant support vectors and have faster SVMs. But it
369
is beyond the scope of this paper. Therefore, MLP seemed more appropriate 16
370
for high-speed inspection of apples recalling that segmentation performances
371
of both classifiers were quite similar.
372
Aim of this project is to build an inspection system capable of processing 10
373
fruit per second. Moreover, inspection of whole fruit surface requires imaging
374
from at least three different locations. Thus, our algorithm should process
375
30 images/sec. In our experimental setup, MLP (with down-sampling by 3)
376
performed closest to this constraint, which is not an utopia to be fulfilled
377
considering the massive and rapid progress of computers.
378
3.2.5 Visual results
379
Figure 8 displays some examples of segmentation executed by MLP with wp1
380
neighborhood on images down-sampled by 3. Flesh damaged skin was correctly
381
found. There was slight over-segmentation in hail damage. Bruise and russet
382
defects seemed more problematic with some false segmentations at the edges.
383
But, in general, segmentations were promising. Note that down-sampling pro-
384
duced loss of details at the edges of defects and fruit.
385
3.2.6 Defect type specific comparison
386
In order to understand which defect types were better/worse segmented by
387
MLP, errors for each defect type are shown in Figure 9. Consistent with the
388
visual results, segmentations of russet were the most erroneous while those
389
of flesh damage the least. Hail damage and scald defects were slightly better
390
segmented than the whole database. Errors of the rest were around 15 %. 17
391
3.2.7 Evaluation on healthy apples
392
Our observations until now showed that MLP was very promising for external
393
defect segmentation of apple fruit. However, evaluation of an inspection system
394
should also be tested on healthy fruit to have a more reliable decision. Hence,
395
performance of our MLP-based segmentation approach (wp1 neighborhood
396
and ‘sample size’ of 1800) was tested on the images of the healthy database
397
(280 apples), which were down-sampled by factor of 3. Segmentation error on
398
these healthy apples was measured as 4.9 %, whereas it was 15.2 % for defected
399
apples. As noticed, MLP was clearly more accurate on healthy apples, but not
400
perfect. Considering that healthy apples make up a large portion of a raw
401
batch under normal conditions Kleynen et al. (2005), segmentation error of
402
our MLP-based system would be slightly higher than 5 % for a raw batch of
403
fruit. Please note that, such false segmentations may be compensated by a
404
powerful fruit grading stage that will classify apples into quality categories.
405
3.3 Ensemble tests
406
Ensemble systems are composed of several classifiers (experts) where final de-
407
cision is based on the outputs of experts. Segmentation performance of such
408
a system can be higher than individual performances of experts, if false seg-
409
mentations of experts are different. Combinations of LDC, SVM, CFNN and
410
MLP classifiers were used as experts, while final decision was made by four
411
approaches: Majority voting, averaging, LDC and single layer perceptrons.
412
Tests with several combinations were done, but there was no significant im-
413
provement in segmentation by ensemble systems. Besides, computational load
414
of ensemble systems is the sum of loads of each expert, which makes them 18
415
impractical for defect segmentation in quality inspection. Hence, no further
416
research was done by ensemble classifiers.
417
4
418
Segmentation of external defects of apples by image processing is a difficult
419
task. Although many works were introduced for this problem, comparative
420
studies involving different segmentation techniques were still missing. To fill
421
this gap, several thresholding and classification-based techniques were imple-
422
mented for defect segmentation on ‘Jonagold’ apples. Performances of classi-
423
fiers were found to surpass those of thresholding methods. Moreover, super-
424
vised classifiers were more accurate than unsupervised ones. Error measures
425
based on defect size or recognition rate could be misleading, whereas class-wise
426
recognition rate was more accurate. Segmentation was observed to be more
427
precise when neighbors of pixels were also used. Due to computational con-
428
straints, down-sampling of images was necessary. But, one has to find optimum
429
sampling factor in order to fulfill computational constraints and have accept-
430
able rates of segmentation errors, both of which are application specific. In this
431
work, even factor of 3 was found to be acceptable, because increase in errors
432
were quite low. In terms of segmentation accuracy and computational expense
433
multi-layer perceptrons were more promising than the other techniques. De-
434
fect type specific results showed that flesh damage was the most accurately
435
segmented defect, while russet was the least. Furthermore, multi-layer per-
436
ceptrons were much more accurate on healthy apples relative to the defected
437
ones. Finally, ensemble classifiers methods were tested for segmentation, but
438
no significant improvement was observed.
Conclusion
19
439
Results of this work showed that among many classification and tresholding-
440
based methods, multi-layer perceptrons were the most promising to be used
441
for segmentation of surface defects in high-speed machine vision-based apple
442
inspection systems.
443
5
444
This research was funded by the General Directorate of Technology, Research
445
and Energy of the Walloon Region of Belgium with Convention No 9813783.
446
The authors would like to thank Prof. M.-F. Destain, O. Kleynen and V.
447
Leemans from Gembloux Agricultural University of Belgium for the image
448
database, and the anonymous reviewers for their invaluable contributions.
449
References
450
Anonymous, August 2001. Commission regulation (ec) no 1619/2001 of 6 au-
451
gust 2001 on marketing standards for apples and pears. Official Journal L
452
215, 0003–0016.
Acknowledgements
453
Bennedsen, B., Peterson, D., 2005. Performance of a system for apple surface
454
defect identification in near-infrared images. Biosystems Engineering 90 (4),
455
419–431.
456
457
458
459
460
Blasco, J., Aleixos, N., Molt´o, E., 2003. Machine vision system for automatic quality grading of fruit. Biosystems Engineering 85 (4), 415–423. Burges, C., 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121–127. Crowe, T., Delwiche, M., 1996. Real-time defect detection in fruit - part ii: 20
461
An algorithm and performance of a protoype system. Transactions of ASAE
462
39 (6), 2309–2317.
463
Davenel, A., Guizard, C., Labarre, T., Sevila, F., 1988. Automatic detection
464
of surface defects on fruit by using a vision system. Journal of Agricultural
465
Engineering Research 41, 1–9.
466
Du, C.-J., Sun, D.-W., 2004. Recent developments in the applications of image
467
processing techniques for food quality evaluation. Trends in Food Science
468
& Technology 15, 230–249.
469
470
471
472
Duda, R., Hart, P., Stork, D., 2001. Pattern classification, 2nd Edition. John Wiley & Sons, Inc. Haykin, S., 1994. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper Saddle River, NJ, USA.
473
Joachims, T., 1999. Making large-scale svm learning practical. In: Sch¨olkopf,
474
B., Burges, C., Smola, A. (Eds.), Advances in Kernel Methods – Support
475
Vector Learning. MIT Press, pp. 169–185.
476
Kapur, J., Sahoo, P., Wong, A., 1985. A new method for gray-level picture
477
thresholding using the entropy of the histogram. Graphical Models and Im-
478
age Processing 29, 273–285.
479
Kiefte, M., 1999. Discriminant analysis toolbox ver. 0.3 for matlab. [avail-
480
able at] http://www.mathworks.com/matlabcentral/fileexchange/ un-
481
der ‘Statistics and Probability’ category.
482
Kim, M., Lefcourt, A., Chen, Y.-R., Tao, Y., 2005. Automated detection of
483
fecal contamination of apples based on multispectral fluorescence image
484
fusion. Journal of food engineering 71 (1), 85–91.
485
Kleynen, O., Leemans, V., Destain, M.-F., December 2003. Selection of the
486
most efficient wavelength bands for ‘jonagold’ apple sorting. Postharvest
487
Biology and Technology 30 (3), 221–232. 21
488
Kleynen, O., Leemans, V., Destain, M.-F., 2005. Development of a multi-
489
spectral vision system for the detection of defects on apples. Journal of
490
Food Engineering 69 (1), 41–49.
491
Leemans, V., Magein, H., Destain, M.-F., July 1998. Defect segmentation on
492
‘golden delicious’ apples by using colour machine vision. Computers and
493
Electronics in Agriculture 20 (2), 117–130.
494
Leemans, V., Magein, H., Destain, M.-F., June 1999. Defect segmentation on
495
‘jonagold’ apples using colour vision and a bayesian classification method.
496
Computers and Electronics in Agriculture 23 (1), 43–53.
497
Li, Q., Wang, M., Gu, W., November 2002. Computer vision based system for
498
apple surface defect detection. Computers and Electronics in Agriculture
499
36 (2-3), 215–223.
500
Mehl, P., Chao, K., Kim, M., Chen, Y., 2002. Detection of defects on selected
501
apple cultivars using hyperspectral and multispectral image analysis. Ap-
502
plied Engineering in Agriculture 18 (2), 219–226.
503
Molt´o, E., Blasco, J., Benlloch, J., November 1998. Computer vision for auto-
504
matic inspection of agricultural produces. In: SPIE Symposium on Precision
505
Agriculture and Biological Quality. Boston, MA, USA.
506
507
508
509
510
511
Nakano, K., August 1997. Application of neural networks to the color grading of apples. Computers and Electronics in Agriculture 18 (2-3), 105–116. Niblack, W., 1986. An introduction to digital image processing. Prentice-Hall, Englewood Cliffs, NJ. Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 9 (1), 62–66.
512
Ridler, T., Calvard, S., 1978. Picture thresholding using an iterative selection
513
method. IEEE Transactions on Systems, Man and Cybernetics SMC-8, 630–
514
632. 22
515
516
Tao, Y., Wen, Z., 1999. An adaptive spherical image transform for high-speed fruit defect detection. Transactions of ASAE 42 (1), 241–246.
517
Throop, J., Aneshansley, D., Anger, W., Peterson, D., 2005. Quality evaluation
518
of apples based on surface defects: development of an automated inspection
519
system. Postharvest Biology and Technology 36, 281–290.
520
Unay, D., Gosselin, B., September 2004. A quality sorting method for ‘jon-
521
agold’ apples. In: Proc. Int. Agricultural Engineering Conf. (AgEng). Leu-
522
ven, Belgium.
523
Unay, D., Gosselin, B., 2007. Stem and calyx recognition on ‘jon-
524
agold’ apples by pattern recognition. Journal of Food Engineering DOI:
525
10.1016/j.jfoodeng.2005.10.038 to appear.
526
Wen, Z., Tao, Y., April 1999. Building a rule-based machine-vision system for
527
defect inspection on apple sorting and packing lines. Expert Systems with
528
Applications 16 (3), 307–313.
529
Yang, Q., March 1994. An approach to apple surface feature detection by
530
machine vision. Computers and Electronics in Agriculture 11 (2-3), 249–
531
264.
23
450nm
500nm
750nm
800nm
segmentation
Fig. 1. Examples of apple images and related theoretical segmentations. First four columns present images from different filters, while the last one shows corresponding theoretical segmentations. Rows display apples with different defect types (Top to bottom: bruise, flesh damage and russet). 532
Fig. 2. Neighborhoods tested for defect segmentation. 533
534
535
536
537
538
539
540
24
(a) E-1
(b) E-2
(c) E-3 Fig. 3. Performances of segmentation methods by hold-out measured with E-1, E-2 and E-3.
25
(a) Theoretical
(b) Otsu
(c) Entropy
(d) Niblack
(e) k-Means
(f) Perceptron
(g) k-NN
(h) LDC
(i) SVM
(j) MLP
Fig. 4. Segmentations of some of the methods by hold-out on a fruit with bruise defect. Top-left image displays theoretical segmentation. In each image healthy skin is displayed in white colour, while defected areas in gray.
Fig. 5. Effect of ‘sample size’ on defect segmentation by leave-one-out method. Vertical axis refers to segmentation errors of classifiers. Horizontal axis shows sample used for training.
26
Fig. 6. Effect of neighborhood on defect segmentation by leave-one-out method. Vertical axis refers to segmentation errors of classifiers. Horizontal axis shows different neighborhoods (wp0 refers to neighborhood that provides only center pixel, while others include neighbors of it as well).
Fig. 7. Effect of down-sampling on defect segmentation by leave-one-out method. Vertical axis refers to segmentation errors of classifiers. Horizontal axis shows dimensions of test images (From left-to-right: original image and down-sampled versions by factors 1, 2 and 3.).
27
Fig. 8. Examples of segmentation by MLP with down-sampling factor of 3. Top row displays theoretical segmentations, whereas results of MLP are shown below. In each image healthy skin is displayed in white colour, while defected areas in gray. Examples were from fruit defected by bruise (left-most), flesh damage (mid-left), hail damage (mid-right) and russet (right-most).
Fig. 9. Defects-specific segmentation errors of MLP with down-sampling factor of 3. Number of apples belonging to each defect type is displayed in parenthesis. ‘hail d. p.’ refers to hail damage with perforation and ‘all defects’ indicates whole defected apples.
28
category
local
global
description
quantity
intensity of center pixel
4
intensity of n neighbor pixels
4×n
relative distance
1
average of intensities in roi
4
standard deviation of intensities in roi
4
Table 1 Details of features extracted for defect segmentation. Columns refer to category, description and quantity of features, respectively. Note that, all features (except for relative distance) were computed using each four filter images. 541
computational expense image size
# of samples
SVM(ms)
MLP(ms)
430x560
123662
280341
1081
215x280
31116
70983
290
107x140
7875
18474
101
53x70
1963
5038
51
Table 2 Maximum computation times in milliseconds (ms) observed for SVM and MLP. Each row displays results obtained relative to the dimension of test image (From top-to-bottom: original image and its down-sampled versions by factors 1, 2 and 3.). First two columns refer to the dimensions of test image and the corresponding number of samples (pixels) classified, respectively. 542
29