
Automatic Defect Segmentation of ‘Jonagold’ Apples on Multi-Spectral Images: A Comparative Study

D. Unay ∗

TCTS Lab., Faculté Polytechnique de Mons,
Multitel Building, Avenue Copernic 1, Parc Initialis, B-7000, Mons, Belgium

B. Gosselin

TCTS Lab., Faculté Polytechnique de Mons,
Multitel Building, Avenue Copernic 1, Parc Initialis, B-7000, Mons, Belgium

Abstract

In this work, several thresholding and classification-based techniques were employed for pixel-wise segmentation of surface defects of ‘Jonagold’ apples. Observations showed that segmentation by supervised classifiers was more accurate than the rest. Also, the average of class-specific recognition errors was more reliable than error measures based on defect size or global recognition. Segmentation accuracy improved when pixels were represented together with their neighborhood. Tests on the effect of down-sampling on segmentation accuracy and computation times showed that multi-layer perceptrons were the best. Russet was the most difficult defect to segment, whereas flesh damage was the least. The proposed method was much more precise on healthy fruit.

Key words: defect, segmentation, apple, machine vision, thresholding, classifiers

Preprint submitted to Elsevier Science

19 April 2006

1 Introduction

The fresh fruit market, which links producers and consumers, has to determine the external quality of produce according to certain rules. The quality of apple fruit, in particular, depends on size, colour, shape, and the presence and type of skin defects, according to the marketing standard of the European Commission (Anonymous, 2001). Visual inspection of apples with respect to size and colour by machine vision is already automated in the industry. However, detection of defects is still problematic due to the high variance of defect types and the presence of stem/calyx concavities.

Extracting important objects from an image by partitioning it into foreground and background pixels is called segmentation. In order to accurately perform quality inspection of apples by machine vision, robust and precise segmentation of defected skin is crucial. Therefore, defect segmentation is the main concern of this paper. Du and Sun (2004) divided image segmentation techniques used for food quality evaluation into four groups:

Thresholding-based: These techniques partition pixels with respect to an optimal value (threshold). They can be further categorized by how the threshold is calculated (global-local, simple-adaptive). Global techniques find one threshold value for the whole image, whereas local ones calculate a different threshold for each pixel within a neighborhood. A simple threshold is a fixed value usually determined from previous observations, whereas adaptive techniques calculate a new value for each image. Note that simple techniques can only be global, due to the single threshold used. The majority of the works performing defect segmentation of apples by thresholding used simple techniques (Davenel et al., 1988; Crowe and Delwiche, 1996; Li et al., 2002; Mehl et al., 2002; Throop et al., 2005). The quasi-spherical shape of apples leads to a boundary light reflectance effect, which causes segmentation problems at the boundaries of fruit. In order to eliminate this effect, some researchers performed an adaptive spherical transform of images before employing simple thresholding (Tao and Wen, 1999; Wen and Tao, 1999). Surface defects of apples vary not only by size, but also by texture, shape and spectral characteristics, which makes their segmentation difficult by simple techniques. Therefore, Kim et al. (2005) and Bennedsen and Peterson (2005) used adaptive global thresholding, where the latter employed one simple and two adaptive global techniques. Locally adaptive thresholding techniques, on the other hand, require excessive computation that may limit their practical use in real-time systems, which is probably why we have not encountered any related work in the literature.

∗ Corresponding author. Tel.: +32 65 374745; fax: +32 65 374729. Email address: [email protected] (D. Unay). URL: http://www.tcts.fpms.ac.be (D. Unay).

Region-based: Region-based techniques segment images by finding coherent, homogeneous regions subject to a similarity criterion. They can be divided into two basic classes: merging, which is a bottom-up method continuously grouping sub-regions into larger ones, and splitting, which is the top-down version that recursively divides the image into smaller regions. In order to segment patch-like defects, Yang (1994) used the flooding algorithm, a region-based technique, on monochromatic images of apples. Region-based techniques are computationally more costly than the thresholding-based ones, but they do not require any a priori information about images.

Edge-based: These techniques segment images by interpreting gray level discontinuities using an edge detecting operator and combining these edges into contours to be used as region borders. Unfortunately, we did not come across any work focusing on defect segmentation of apples with edge-based techniques.

Classification-based: Such techniques attempt to partition pixels into several classes using different classification methods. They can be categorized into two groups, supervised and unsupervised, where output target values of learning samples are provided in the former and missing in the latter. Among supervised methods, Bayesian classification is the one most used by researchers (Moltó et al., 1998; Leemans et al., 1999; Blasco et al., 2003; Kleynen et al., 2005), where pixels were compared to a pre-calculated Bayesian model and classified as defected or healthy. On the other hand, Nakano (1997) introduced a neural network based system to classify pixels of apple skin into six classes, one of which was ‘defect’. Unsupervised classification does not benefit from any guidance in the learning process because target values are lacking. Such an approach was used by Leemans et al. (1998) to segment defects using multi-spectral images.

The above literature survey reveals that, in segmenting surface defects of apple fruit, researchers have mainly focused on global thresholding-based approaches and Bayesian classification methods. Furthermore, there is no comparative work discussing the advantages and disadvantages of several segmentation methods in this field. Therefore, the novelty of this paper is twofold:

• some existing methods not yet applied to this problem were employed for segmentation of defects, such as Isodata and Entropy thresholding (global, adaptive), Niblack thresholding (local, adaptive), k-means classification (unsupervised), neural network classification (unsupervised/supervised), discriminant analysis, nearest neighbor and support vector machines (supervised).
• accuracies of several segmentation methods were calculated and compared visually and by performance measures.

2 Methodology

2.1 Database

The database contained images of 246 defected and 280 healthy apples of the ‘Jonagold’ variety coming from three different sources: the orchard of Gembloux Agricultural University, a private orchard in Gembloux and the Belgische Fruit Veiling auction in St. Truiden. The ‘Jonagold’ variety was selected instead of mono-coloured ones because its bi-coloured skin causes more difficulties in defect segmentation due to colour transition areas. Concerning defected apples, 42 fruit were injured by russet, 55 by bruise (6 natural and 49 artificial), 23 by rot, 17 by scald (from sunlight), 47 by hail damage (16 with skin perforation), 7 by limb rub, 24 by visible flesh damage, 11 by frost damage and 20 by other kinds of defects (e.g. scar tissue). Artificial bruises were produced by dropping fruit from a height of 30 cm onto a steel plate. After impact, the apples were stored for about 1-2 hours at room temperature and then imaged. Note that most related literature waits about 20-24 hours for bruises to fully develop. The purpose behind the 1-2 hours of waiting time was to evaluate whether we could detect bruises that are not yet fully developed. In order to serve as reference, defected areas of apples within the database were manually segmented by O. Kleynen from the Gembloux Agricultural University of Belgium. These manually segmented images will be referred to as theoretical segmentations from now on in this paper.

Apples were placed one-by-one (to provide full view of defected or healthy skin) on a conveyor belt inside a diffusely illuminated tunnel, where a high resolution monochrome digital camera, four interference band-pass filters (centered at 450, 500, 750, and 800 nm with corresponding bandwidths of 80, 40, 80, and 50 nm) and a frame grabber acquired their corresponding filter images. This system was capable of inspecting the fruit from only one view. Filter images of each fruit had to be separated by alignment based on pattern matching, after which flat field correction was applied to remove vignetting. Finally, each separated filter image was composed of 430x560 pixels with 8 bits-per-pixel resolution. Figure 1 displays some examples from the database with related theoretical segmentations.

Assembly of the image acquisition system and collection of the database were done in the Mechanics and Construction Department of Gembloux Agricultural University of Belgium. Therefore, readers interested in more details about image acquisition and the database can refer to Kleynen et al. (2003, 2005).

2.2 Region-of-interest Extraction

As observed from Figure 1, the background was lower in intensity than the fruit area. Therefore, the fruit area was separated from the background by thresholding the 750 nm filter image at an intensity value of 30. However, this fixed thresholding falsely removed some defects, stems or calyxes that are low in intensity. Hence, morphological filling was applied to eliminate this error.

Initial observations revealed that segmentation was problematic at the far edges of the fruit, probably due to illumination artifacts. Therefore, after background removal, the fruit area was eroded by a rectangular structuring element with size adaptive to fruit size (15 % of the fruit’s bounding-box). The result of this erosion step was the region-of-interest (roi) defining the fruit area to be inspected.
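The three steps of this subsection (fixed thresholding, morphological filling and erosion) can be sketched as follows. This is only an illustrative pure-Python sketch and not the original Matlab implementation. The function names, the strict `>` comparison at level 30, the 4-connected hole filling and the way the 15 % ratio is turned into a structuring element are all assumptions.

```python
from collections import deque

def threshold_mask(image, level=30):
    """Mark pixels brighter than `level` as fruit (True)."""
    return [[value > level for value in row] for row in image]

def fill_holes(mask):
    """Turn background pixels that are not connected to the border into fruit."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not mask[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    while queue:  # flood the true background from the image border
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and not outside[nr][nc]):
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[not outside[r][c] for c in range(cols)] for r in range(rows)]

def erode(mask, half_h, half_w):
    """Binary erosion with a (2*half_h+1) x (2*half_w+1) rectangle."""
    rows, cols = len(mask), len(mask[0])
    return [[all(0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]
                 for rr in range(r - half_h, r + half_h + 1)
                 for cc in range(c - half_w, c + half_w + 1))
             for c in range(cols)] for r in range(rows)]

def extract_roi(image, level=30, fraction=0.15):
    mask = fill_holes(threshold_mask(image, level))
    fruit = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not fruit:
        return mask
    height = max(r for r, _ in fruit) - min(r for r, _ in fruit) + 1
    width = max(c for _, c in fruit) - min(c for _, c in fruit) + 1
    # Assumption: the element spans about 15 % of the bounding box per direction.
    return erode(mask, int(fraction * height) // 2, int(fraction * width) // 2)
```

With a 30-by-30 pixel fruit, for instance, the element is 5-by-5, so a two-pixel rim is removed while dark spots inside the fruit stay in the roi.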

2.3 Feature Extraction

Segmentation of defects at pixel level required each pixel to be represented by features. Thus, intensity values within the tested neighborhood of each pixel from the four filter images formed its local features. The different neighborhoods tested in this work are displayed in Figure 2. As the neighborhood size increases, the amount of data to be processed for segmentation increases rapidly. Preliminary tests showed that neighborhoods having more than 8 neighbor pixels produced an excessive computational load and therefore were not tested. Previous work presented by Unay and Gosselin (2004) showed that an additional local feature (relative distance), related to a pixel’s location relative to the geometric center of the roi, improved segmentation of defects on the same database. In addition to the local features, the average and standard deviation of intensity values over the roi were also calculated from each filter image, making up the global features. Hence, each pixel was represented by 13 + 4 × n features in the feature space (4 × (1 + n) intensities, 1 relative distance and 8 global features), where n refers to the number of neighbor pixels used (Table 1). Feature values were also normalized to fall into the range of [-1,+1] before the defect detection step.
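To make the count of 13 + 4 × n features concrete, the following sketch assembles the feature vector of one pixel. It is an illustration under assumptions: the function names, the neighbor offsets, the population standard deviation and the scaling of the relative distance by the largest distance in the roi are not taken from the original implementation, and the final normalization to [-1,+1] is left out.

```python
import math

# Hypothetical offsets for the four side-neighbors in a 3-by-3 window (n = 4).
SIDE_NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def global_features(filter_images, roi):
    """Mean and standard deviation of intensity over the roi, per filter image."""
    feats = []
    for image in filter_images:
        values = [image[r][c] for r, c in roi]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        feats += [mean, math.sqrt(var)]
    return feats

def pixel_features(filter_images, roi, pixel, offsets=SIDE_NEIGHBORS):
    """Return the 13 + 4*n features of `pixel` (n = len(offsets))."""
    r, c = pixel
    # Local intensities: the pixel and its neighbors in every filter image.
    local = [image[r + dr][c + dc]
             for image in filter_images
             for dr, dc in [(0, 0)] + list(offsets)]
    # Relative distance to the geometric center of the roi (assumed scaling).
    center_r = sum(p[0] for p in roi) / len(roi)
    center_c = sum(p[1] for p in roi) / len(roi)
    max_dist = max(math.hypot(p[0] - center_r, p[1] - center_c) for p in roi) or 1.0
    rel_dist = math.hypot(r - center_r, c - center_c) / max_dist
    return local + [rel_dist] + global_features(filter_images, roi)
```

With four filter images and four neighbors the vector has 29 entries, and with no neighbors it has 13.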

2.4 Defect Detection

2.4.1 By thresholding

Thresholding can be applied globally or locally, i.e. within a neighborhood of each pixel. Three global thresholding techniques were tested for defect segmentation in this work: Entropy (Kapur et al., 1985) tries to maximize the sum of class entropies, Isodata (Ridler and Calvard, 1978) is an iterative scheme based on two-class Gaussian mixture models, and Otsu (Otsu, 1979) focuses on minimizing the weighted sum of within-class variances (equivalently, maximizing the between-class variance).
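As an illustration of an adaptive global technique, the sketch below follows the common histogram formulation of Otsu's method, which selects the gray level that maximizes the between-class variance (equivalent to minimizing the weighted within-class variance). It assumes 8-bit integer intensities and is not the authors' implementation; the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best separates the classes {<= t} and {> t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Proportional to the between-class variance of this split.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Because the threshold is recomputed from the histogram of every image, the technique is adaptive, yet a single value is applied to the whole image, which makes it global.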

Local techniques work well if the sizes of the objects searched do not vary much, which is unfortunately not the case for defects of apples. Hence, only Niblack’s method (Niblack, 1986), which adapts the threshold to the local mean and standard deviation, was tested. The size of the neighborhood used was 23-by-23 pixels.
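A minimal sketch of Niblack's local threshold, T = m + k · s, computed from the mean m and standard deviation s of the window around each pixel, is given below. The weight k = -0.2, the clipping of the window at the image border and the convention that darker pixels are labelled as defect candidates are assumptions, since the text only specifies the 23-by-23 window.

```python
import math

def niblack_mask(image, window=23, k=-0.2):
    """Label a pixel True when it is darker than mean + k * std of its window."""
    rows, cols = len(image), len(image[0])
    half = window // 2
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the window around (r, c), clipped at the image border.
            values = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(values) / len(values)
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            mask[r][c] = image[r][c] < mean + k * std
    return mask
```

Recomputing the statistics for every pixel is what makes local techniques costly: this direct version visits window × window values per pixel.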

Thresholding techniques are applicable to gray-level images. However, in a multi-spectral imaging system providing multiple images per object, one can combine filter images to get a final gray-level image or select one of them by a criterion. As finding the optimal combination is arduous, in this work we considered using filter images separately.

2.4.2 By classification

Classifiers using labelled training samples are said to be supervised and those using unlabelled samples are grouped as unsupervised (Duda et al., 2001). Guidance through labels makes supervised classifiers generally more promising in pattern recognition than others. But it is also known that visual characteristics of apples can vary as seasons change, which means characteristics of defected and healthy skin can change slowly with time. Thus, unsupervised methods (also known as clustering methods) capable of adapting themselves to such changes might be very robust.

Three unsupervised classifiers were employed for segmentation of defects: k-Means, Competitive Neural Networks (CNN) and Self Organizing Feature Maps (SOM). k-Means partitions samples into k classes by minimizing the sum of squared distances between samples and the corresponding class centroid. CNN and SOM can recognize groups of similar inputs, where the latter applies data reduction by producing a similarity map of 1 or 2 dimensions. The SOM used here had a 5x8 hexagonal topology.

The following ten supervised classifiers were tested in this research: k-Nearest Neighbor (k-NN), Linear Discriminant Classifier (LDC), Quadratic Discriminant Classifier (QDC), Logistic Regression (LR), Support Vector Machines (SVM), Perceptron, Multi-Layer Perceptrons (MLP), Cascade Forward Neural Networks (CFNN), Elman Neural Networks (ENN) and Learning Vector Quantizers (LVQ). For a test sample, k-NN finds the k closest samples in the training set and assigns it to the most frequent class among these. Observations showed that k = 25 was a good choice. LDC assumes that samples are linearly separable and tries to find the linear decision boundary that separates them. QDC assumes that samples are separable by a hyper-quadratic surface. LR models the logarithm of the likelihood ratio of the classes as a linear function of the features and fits it by maximizing the likelihood of the samples. SVM performs a non-linear mapping of the feature space to a new space (typically of higher dimension) through kernels and tries to find a hyperplane in this new space that separates the classes with maximum margin (Burges, 1998). Gaussian radial basis function (rbf) and polynomial kernels were tested in this work. Perceptron, composed of a single neuron, is the simplest form of neural network and can only solve linearly separable problems. MLP is also known as a feed-forward network, because input signals propagate layer-by-layer through the network in the forward direction (Haykin, 1994). It is trained by error back-propagation, where classification errors are propagated back through the network and used to update weights. CFNN is similar to MLP, except that the neurons of each subsequent layer have inputs coming not only from the previous layers but also from the input layer. ENN is a two-layered recurrent network with feedback from the first layer output to the first layer input. LVQ is composed of a competitive layer followed by a linear layer, where the former learns to classify inputs and the latter transforms the outputs of the former into labels defined by the user.
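Of the classifiers listed above, k-NN is the simplest to write down. The sketch below is a hedged illustration of the decision rule only (Euclidean distance, majority vote among the k closest training samples, with k = 25 as the default reported in the text). The function name and the data layout are assumptions, and no attempt is made to reproduce the authors' implementation.

```python
from collections import Counter
import math

def knn_classify(train_features, train_labels, sample, k=25):
    """Assign `sample` to the most frequent class among its k nearest neighbors."""
    distances = sorted(
        (math.dist(features, sample), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Every test pixel is compared with every training pixel, so the cost of classifying one image grows linearly with the size of the training set.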

In each test, classifiers were trained by a training set that was constructed as follows. Assume that m fruit of the database would be used for training, where m1 of them belonged to defect type 1, m2 of them belonged to defect type 2, and so on. The total number of defect types in our database was 10, so m = m1 + m2 + . . . + m10. ‘Sample size’, used from now on in this paper, refers to the number of pixels (samples) per defect type per class (healthy-defected) in a training set. From each fruit injured by defect type 1, we randomly selected ‘sample size’/m1 healthy and ‘sample size’/m1 defected pixels for training with the help of theoretical segmentations. We formed the training set by repeating this selection for each defect type. Consequently, the total number of pixels in a training set was 20 times ‘sample size’. Such a procedure was employed because it permitted equal representation of defect types and classes in a training set. Please note that fruit were never used for training (and validation) and testing at the same time in any test, in order to prevent possible forced learning.
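The balanced sampling described above can be sketched as follows. The data layout (one dictionary per fruit holding its healthy and defected pixel feature vectors), the function name and the integer division of ‘sample size’ by the number of fruit per defect type are assumptions made for illustration.

```python
import random

def build_training_set(fruits, sample_size, rng=None):
    """Draw 'sample size' pixels per defect type and per class.

    fruits: list of dicts with keys 'defect_type', 'healthy' and 'defected',
    the last two being lists of pixel feature vectors (hypothetical layout).
    """
    rng = rng or random.Random(0)
    by_type = {}
    for fruit in fruits:
        by_type.setdefault(fruit["defect_type"], []).append(fruit)
    samples, labels = [], []
    for group in by_type.values():
        per_fruit = sample_size // len(group)  # 'sample size' / m_i
        for fruit in group:
            for label in ("healthy", "defected"):
                pool = fruit[label]
                chosen = rng.sample(pool, min(per_fruit, len(pool)))
                samples.extend(chosen)
                labels.extend([label] * len(chosen))
    return samples, labels
```

With 10 defect types and 2 classes this yields 20 times ‘sample size’ pixels, provided every fruit holds enough pixels of each class and ‘sample size’ is divisible by each m_i.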

Cross-validation is a training method for supervised classifiers, where a portion of the training set is separated as validation data, and the classifier is trained on the remaining training set while being evaluated on the validation set. Among the supervised classifiers, Perceptron, MLP, CFNN and ENN permitted cross-validation, thus they were trained with this method.

All the artificial neural networks (ANNs) used in this study had a 2-layered architecture with 5 neurons in the hidden layer, if not stated otherwise. Sigmoid neurons were used as long as the architecture permitted. The Levenberg-Marquardt algorithm, a learning rate of 0.01 and a maximum epoch number of 200 were used for their training. The above parameters were found optimum after several trials.

In this work, the Discriminant Analysis Toolbox of M. Kiefte (Kiefte, 1999) was used for QDC and LR classification; SVM Light of Joachims (1999) was utilized for SVM; Matlab built-in libraries were employed for LDC and all the neural network classifiers except MLP. The rest of the methods (k-means, k-NN and MLP classifiers and all the thresholding ones) were implemented by the authors. Furthermore, the whole system was implemented under the Matlab environment (version 7 R14, The MathWorks Inc., MA, USA) and tested on an Intel Pentium IV machine with a 1.5 GHz CPU and 256 MB memory.

2.5 Stem/Calyx Recognition

Stem and calyx are natural parts of fruit that show spectral characteristics similar to those of defects. As the orientation of fruit was not controlled during image acquisition, these parts were also visible in the images and had to be discriminated from real defects. Hence, a highly accurate image processing-based method (Unay and Gosselin, 2007) was employed for this purpose. The method first performed background removal and threshold-based object segmentation. Then, statistical, textural and shape features were extracted from each object and introduced to an SVM classifier in order to eliminate false stems or calyxes. Once stem/calyx regions were accurately found by this method, they were removed from segmented areas, providing refined segmentation.

2.6 Performance Measures

Accuracy of segmentation was calculated by the following measures:

• E-1: This measure presents the percentage of error between the defect sizes of the theoretical and experimental results.
• E-2: It depicts recognition error by comparing the experimental result with the theoretical one pixel-by-pixel.
• E-3: Recognition error assumes that classes are equally represented, which was not true for our case, where defect sizes varied highly within the database. Hence, calculating class-specific recognition errors and averaging them instead could be more enlightening. If a class did not exist (e.g. no defected skin, i.e. fruit in perfect quality), then the error was made up of only the other class.

Note that these measures were calculated for each test image, whereas the error of a test was estimated as the average of the measures of all test images.
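The three measures can be sketched for a single test image as follows. The exact normalization of E-1 is not given in the text, so expressing the difference in defect size as a percentage of the roi size is an assumption, as are the function name and the flat boolean representation of the segmentations.

```python
def segmentation_errors(theoretical, experimental):
    """Return (E-1, E-2, E-3) in percent for one image.

    Both arguments are flat lists of booleans over the roi, True = defected.
    """
    n = len(theoretical)
    # E-1 (assumed form): difference in defect size relative to the roi size.
    e1 = 100.0 * abs(sum(experimental) - sum(theoretical)) / n
    # E-2: pixel-by-pixel recognition error.
    wrong = sum(t != e for t, e in zip(theoretical, experimental))
    e2 = 100.0 * wrong / n
    # E-3: average of class-specific recognition errors, skipping absent classes.
    class_errors = []
    for cls in (True, False):
        members = [e for t, e in zip(theoretical, experimental) if t == cls]
        if members:
            class_errors.append(100.0 * sum(e != cls for e in members) / len(members))
    e3 = sum(class_errors) / len(class_errors)
    return e1, e2, e3
```

For an image with 10 defected pixels out of 100, a segmentation that labels every pixel as healthy scores 10 % on E-1 and E-2 but 50 % on E-3, which shows why the class-averaged measure is harder to satisfy with a trivial answer.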

3 Results and Discussion

3.1 Hold-out Tests

Pixel-wise segmentation led to an excessive amount of data to be processed, which was computationally very expensive. Thus, initial tests were done by the hold-out method, where 1/2, 1/6 and 1/3 of the fruit of each defect type were placed in training, validation and test sets, respectively, providing 116 fruit for training, 38 for validation and 92 for testing. A ‘sample size’ of 2000 was used in these tests.

Figure 3 shows performances of all the segmentation methods measured on the test set of the hold-out evaluation. All three graphs reveal that global thresholding methods were slightly better than local ones and supervised classifiers were generally more accurate than unsupervised ones. E-1 and E-2 measures show similar results with k-means and SOM being the worst performers, whereas results of E-3 were somewhat different with perceptron being the worst. Thus, it is difficult to determine the best method(s) while the error measures show such contradictory responses. However, observations on visual results (an example is seen in Figure 4) revealed some important facts. Perceptron, assigning all pixels as healthy, could not segment defects at all, yet it was rated as performing well by the E-1 and E-2 measures. Moreover, results of entropy thresholding were worse than those of isodata or otsu, which was confirmed only by E-3. Hence, the E-3 measure, which calculates a weighted form of recognition error, was found to be more accurate than the others. It revealed that segmentations by supervised methods were better than the rest, except for perceptron. Therefore, LDC, SVM, MLP and CFNN were selected for further tests together with the E-3 measure.

3.2 Leave-one-out Tests

Leave-one-out evaluation tests the response of a classifier for each sample, while training it with the rest. The final error is the average of the error rates of all samples. It is said to be more reliable than hold-out when the database is small. Although the number of training samples was very high in our case, the set of fruit used was not so diverse. Therefore, the leave-one-out method was used in the following tests.
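The leave-one-out procedure can be sketched generically as below. The `train` and `evaluate` callables are placeholders for any classifier and error measure, and what constitutes one "item" (a fruit, in the sense of this section) is left to the caller; none of these names come from the original implementation.

```python
def leave_one_out(items, train, evaluate):
    """Average error over all items, each tested on a model trained with the rest."""
    errors = []
    for i, held_out in enumerate(items):
        model = train(items[:i] + items[i + 1:])   # train without item i
        errors.append(evaluate(model, held_out))   # test on item i only
    return sum(errors) / len(errors)
```

The procedure trains one model per item, which is why it is far more expensive than a single hold-out split.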

3.2.1 ‘Sample size’ vs. segmentation

First, the effect of the number of training samples on segmentation performance was examined. ‘Sample size’ was varied from 800 to 4000 and E-3 was calculated using LDC, SVM, MLP and CFNN. Preliminary tests revealed that the Gaussian rbf kernel performed better than the polynomial one, thus the following results include those of SVM with Gaussian rbf. Figure 5 displays the result of this test. Performance of LDC, which was the worst among the four, did not depend on ‘sample size’. The error of SVM decreased smoothly as ‘sample size’ increased, whereas those of MLP and CFNN showed unstable oscillations. The lowest errors were achieved by MLP in the range [1800-2200]. As ‘sample size’ increased, computational expense increased. So, a ‘sample size’ of 1800 seemed a suitable choice for the next tests.

3.2.2 Neighborhood vs. segmentation

In an image, the intensity of a pixel and those of its neighbors are correlated. This correlation information can be used by classifiers to improve segmentation. Thus, effects of different neighborhoods on segmentation performance were tested (Figure 6). As observed, the inclusion of neighbors decreased segmentation error significantly for all four classifiers. However, there was no specific neighborhood type that led to better segmentation. Thus, using neighborhoods having 4 neighbors was more logical to keep computational cost low. Also, it is believed that a pixel is more correlated with its side-neighbors than its corner-ones. Hence, side-neighbors of pixels in a 3-by-3 window (wp1) was a good choice to proceed with.

3.2.3 Down-sampling vs. segmentation

Fruit inspection systems have to be as rapid and accurate as possible to cope with the demands of the industry. Unfortunately, rapidity is achieved mostly with a reduction in accuracy. Down-sampling, for example, provides smaller images and leads to lower computational load (more rapid) at the cost of higher segmentation errors (less accurate). Hence, an optimum point should be found.

In order to test the effect of sampling on segmentation performance, we down-sampled original images (430x560) by factors of 1, 2 and 3, and then performed segmentation. Down-sampling by a factor f outputs a new image with each dimension equal to that of the original divided by 2^f. Thus, sizes of the new images were 215x280, 107x140 and 58x70, respectively. Sampling was achieved by averaging. The wp1 neighborhood was used for segmentation with a ‘sample size’ of 1800. Figure 7 displays the results, where an accelerating rise in segmentation errors was observed as the down-sampling factor increased. Sampling by factor 1 did not affect segmentation. LDC was the most affected classifier with a 3 % rise in error at factor 3, whereas MLP was the least with a 1.3 % rise at the same factor. Notice that defect shape and size can also influence these results (e.g. small or thin defects may be missed after down-sampling). Our observations revealed that segmentation of russet defects, which were generally thin and elongated, was relatively more erroneous when down-sampling was applied.
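Down-sampling by averaging can be sketched as below, assuming that factor f means averaging non-overlapping blocks whose side is 2 to the power f, which is consistent with the 215x280 and 107x140 sizes quoted above. Dropping incomplete blocks at the image border and the function name are further assumptions.

```python
def downsample(image, factor):
    """Average non-overlapping blocks of side 2**factor; partial blocks are dropped."""
    block = 2 ** factor
    rows = len(image) // block
    cols = len(image[0]) // block
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = sum(image[r * block + i][c * block + j]
                        for i in range(block) for j in range(block))
            out_row.append(total / (block * block))
        out.append(out_row)
    return out
```

Each step of the factor divides the number of pixels to classify by four, while structures thinner than the block side are blurred into their surroundings, which matches the difficulty reported for thin russet defects.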

3.2.4 Computational expense

In the search for the optimum down-sampling, a further test on computation times of classifiers had to be done. Previous tests showed that SVM and MLP were good choices for defect segmentation on apples, hence their computational expense was considered here. Note that both algorithms were implemented in C language.

The ‘sample size’ selected for training was 1800 and side-neighbors in a 3-by-3 window were used here. The fruit occupying the highest number of pixels in image space was selected for testing. Processing times of both classifiers were measured 30 times for segmentation of this fruit and their averages are displayed in Table 2. Computation times of SVM were extremely high compared to MLP, which was due to the high number of support vectors (around 4500) found by the algorithm. In contrast, the processing time of MLP dropped under 0.3 sec as soon as down-sampling was applied. More research on pruning could be done to remove redundant support vectors and have faster SVMs, but it is beyond the scope of this paper. Therefore, MLP seemed more appropriate for high-speed inspection of apples, recalling that segmentation performances of both classifiers were quite similar.

The aim of this project is to build an inspection system capable of processing 10 fruit per second. Moreover, inspection of the whole fruit surface requires imaging from at least three different locations. Thus, our algorithm should process 30 images/sec. In our experimental setup, MLP (with down-sampling by 3) performed closest to this constraint, which is not a utopian goal considering the massive and rapid progress of computers.

3.2.5 Visual results

Figure 8 displays some examples of segmentation executed by MLP with the wp1 neighborhood on images down-sampled by 3. Flesh damaged skin was correctly found. There was slight over-segmentation in hail damage. Bruise and russet defects seemed more problematic with some false segmentations at the edges. But, in general, segmentations were promising. Note that down-sampling produced loss of details at the edges of defects and fruit.

3.2.6 Defect type specific comparison

In order to understand which defect types were better/worse segmented by MLP, errors for each defect type are shown in Figure 9. Consistent with the visual results, segmentations of russet were the most erroneous while those of flesh damage were the least. Hail damage and scald defects were slightly better segmented than the whole database. Errors of the rest were around 15 %.

3.2.7 Evaluation on healthy apples

Our observations until now showed that MLP was very promising for external defect segmentation of apple fruit. However, an inspection system should also be evaluated on healthy fruit to reach a more reliable decision. Hence, performance of our MLP-based segmentation approach (wp1 neighborhood and ‘sample size’ of 1800) was tested on the images of the healthy database (280 apples), which were down-sampled by a factor of 3. Segmentation error on these healthy apples was measured as 4.9 %, whereas it was 15.2 % for defected apples. As noticed, MLP was clearly more accurate on healthy apples, but not perfect. Considering that healthy apples make up a large portion of a raw batch under normal conditions (Kleynen et al., 2005), segmentation error of our MLP-based system would be slightly higher than 5 % for a raw batch of fruit. Please note that such false segmentations may be compensated by a powerful fruit grading stage that will classify apples into quality categories.
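The estimate for a raw batch follows from weighting the two measured errors by the share of healthy and defected fruit. The sketch below shows this arithmetic; the healthy fraction of 0.95 used in the example is purely hypothetical, since the text only states that healthy apples make up a large portion of a batch.

```python
def batch_error(healthy_fraction, healthy_error=4.9, defected_error=15.2):
    """Expected segmentation error (%) for a batch mixing healthy and defected fruit."""
    return healthy_fraction * healthy_error + (1.0 - healthy_fraction) * defected_error

# Hypothetical batch with 95 % healthy fruit: about 5.4 % expected error.
example = batch_error(0.95)
```

The expected error stays close to the 4.9 % measured on healthy fruit as long as defected apples remain a small minority.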

3.3 Ensemble tests

Ensemble systems are composed of several classifiers (experts) where the final decision is based on the outputs of the experts. Segmentation performance of such a system can be higher than individual performances of the experts, if false segmentations of the experts are different. Combinations of LDC, SVM, CFNN and MLP classifiers were used as experts, while the final decision was made by four approaches: majority voting, averaging, LDC and single layer perceptrons.

Tests with several combinations were done, but there was no significant improvement in segmentation by ensemble systems. Besides, the computational load of ensemble systems is the sum of the loads of each expert, which makes them impractical for defect segmentation in quality inspection. Hence, no further research was done on ensemble classifiers.
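Two of the four decision rules mentioned in this section, majority voting and averaging, can be sketched as below. The representation of expert outputs as per-pixel 0/1 labels or scores in [0, 1], the tie-breaking toward the healthy class and the 0.5 decision threshold are assumptions for illustration.

```python
def majority_vote(expert_labels):
    """expert_labels: one list of 0/1 pixel labels per expert; ties go to healthy (0)."""
    n_experts = len(expert_labels)
    return [1 if 2 * sum(votes) > n_experts else 0
            for votes in zip(*expert_labels)]

def average_scores(expert_scores, threshold=0.5):
    """expert_scores: one list of per-pixel defect scores in [0, 1] per expert."""
    n_experts = len(expert_scores)
    return [1 if sum(scores) / n_experts > threshold else 0
            for scores in zip(*expert_scores)]
```

Both rules need the output of every expert for every pixel, which is the source of the summed computational load noted above.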

4 Conclusion

Segmentation of external defects of apples by image processing is a difficult task. Although many works were introduced for this problem, comparative studies involving different segmentation techniques were still missing. To fill this gap, several thresholding and classification-based techniques were implemented for defect segmentation on ‘Jonagold’ apples. Performances of classifiers were found to surpass those of thresholding methods. Moreover, supervised classifiers were more accurate than unsupervised ones. Error measures based on defect size or recognition rate could be misleading, whereas class-wise recognition rate was more accurate. Segmentation was observed to be more precise when neighbors of pixels were also used. Due to computational constraints, down-sampling of images was necessary. But one has to find the optimum sampling factor in order to fulfill computational constraints and have acceptable rates of segmentation errors, both of which are application specific. In this work, even a factor of 3 was found to be acceptable, because the increase in errors was quite low. In terms of segmentation accuracy and computational expense, multi-layer perceptrons were more promising than the other techniques. Defect type specific results showed that flesh damage was the most accurately segmented defect, while russet was the least. Furthermore, multi-layer perceptrons were much more accurate on healthy apples relative to the defected ones. Finally, ensemble classifier methods were tested for segmentation, but no significant improvement was observed.

439

Results of this work showed that among many classification and tresholding-

440

based methods, multi-layer perceptrons were the most promising to be used

441

for segmentation of surface defects in high-speed machine vision-based apple

442

inspection systems.
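The remark that a global recognition rate can mislead, while a class-wise rate is more reliable, can be illustrated with a toy case in which defect pixels are rare. The sketch below is an assumed reading of the two kinds of measure, not the exact definitions used in this work; the function names and the pixel counts are hypothetical.

```python
def global_error(truth, predicted):
    """Share of all pixels that are misclassified."""
    wrong = sum(t != p for t, p in zip(truth, predicted))
    return wrong / len(truth)


def class_wise_error(truth, predicted, classes=(0, 1)):
    """Mean of the per-class misclassification rates (0 = healthy, 1 = defect)."""
    rates = []
    for c in classes:
        members = [p for t, p in zip(truth, predicted) if t == c]
        if members:  # skip classes absent from the reference labels
            rates.append(sum(p != c for p in members) / len(members))
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy image: 95 healthy pixels, 5 defect pixels (hypothetical counts).
    truth = [0] * 95 + [1] * 5
    # A degenerate segmentation that labels every pixel as healthy.
    predicted = [0] * 100
    print(global_error(truth, predicted))
    print(class_wise_error(truth, predicted))
```

In this toy case the degenerate segmentation misses every defect pixel, yet its global error is only 0.05, while the class-wise error is 0.5. A measure that averages over classes is therefore less easily flattered by the dominance of healthy skin.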

5 Acknowledgements

This research was funded by the General Directorate of Technology, Research
and Energy of the Walloon Region of Belgium with Convention No 9813783. The
authors would like to thank Prof. M.-F. Destain, O. Kleynen and V. Leemans from
Gembloux Agricultural University of Belgium for the image database, and the
anonymous reviewers for their invaluable contributions.

References

Anonymous, August 2001. Commission regulation (EC) no 1619/2001 of 6 August 2001 on marketing standards for apples and pears. Official Journal L 215, 0003–0016.

Bennedsen, B., Peterson, D., 2005. Performance of a system for apple surface defect identification in near-infrared images. Biosystems Engineering 90 (4), 419–431.

Blasco, J., Aleixos, N., Moltó, E., 2003. Machine vision system for automatic quality grading of fruit. Biosystems Engineering 85 (4), 415–423.

Burges, C., 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121–127.

Crowe, T., Delwiche, M., 1996. Real-time defect detection in fruit - part II: An algorithm and performance of a prototype system. Transactions of ASAE 39 (6), 2309–2317.

Davenel, A., Guizard, C., Labarre, T., Sevila, F., 1988. Automatic detection of surface defects on fruit by using a vision system. Journal of Agricultural Engineering Research 41, 1–9.

Du, C.-J., Sun, D.-W., 2004. Recent developments in the applications of image processing techniques for food quality evaluation. Trends in Food Science & Technology 15, 230–249.

Duda, R., Hart, P., Stork, D., 2001. Pattern classification, 2nd Edition. John Wiley & Sons, Inc.

Haykin, S., 1994. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper Saddle River, NJ, USA.

Joachims, T., 1999. Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (Eds.), Advances in Kernel Methods – Support Vector Learning. MIT Press, pp. 169–185.

Kapur, J., Sahoo, P., Wong, A., 1985. A new method for gray-level picture thresholding using the entropy of the histogram. Graphical Models and Image Processing 29, 273–285.

Kiefte, M., 1999. Discriminant analysis toolbox ver. 0.3 for MATLAB. [available at] http://www.mathworks.com/matlabcentral/fileexchange/ under ‘Statistics and Probability’ category.

Kim, M., Lefcourt, A., Chen, Y.-R., Tao, Y., 2005. Automated detection of fecal contamination of apples based on multispectral fluorescence image fusion. Journal of Food Engineering 71 (1), 85–91.

Kleynen, O., Leemans, V., Destain, M.-F., December 2003. Selection of the most efficient wavelength bands for ‘Jonagold’ apple sorting. Postharvest Biology and Technology 30 (3), 221–232.

Kleynen, O., Leemans, V., Destain, M.-F., 2005. Development of a multi-spectral vision system for the detection of defects on apples. Journal of Food Engineering 69 (1), 41–49.

Leemans, V., Magein, H., Destain, M.-F., July 1998. Defect segmentation on ‘Golden Delicious’ apples by using colour machine vision. Computers and Electronics in Agriculture 20 (2), 117–130.

Leemans, V., Magein, H., Destain, M.-F., June 1999. Defect segmentation on ‘Jonagold’ apples using colour vision and a Bayesian classification method. Computers and Electronics in Agriculture 23 (1), 43–53.

Li, Q., Wang, M., Gu, W., November 2002. Computer vision based system for apple surface defect detection. Computers and Electronics in Agriculture 36 (2-3), 215–223.

Mehl, P., Chao, K., Kim, M., Chen, Y., 2002. Detection of defects on selected apple cultivars using hyperspectral and multispectral image analysis. Applied Engineering in Agriculture 18 (2), 219–226.

Moltó, E., Blasco, J., Benlloch, J., November 1998. Computer vision for automatic inspection of agricultural produces. In: SPIE Symposium on Precision Agriculture and Biological Quality. Boston, MA, USA.

Nakano, K., August 1997. Application of neural networks to the color grading of apples. Computers and Electronics in Agriculture 18 (2-3), 105–116.

Niblack, W., 1986. An introduction to digital image processing. Prentice-Hall, Englewood Cliffs, NJ.

Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 9 (1), 62–66.

Ridler, T., Calvard, S., 1978. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man and Cybernetics SMC-8, 630–632.

Tao, Y., Wen, Z., 1999. An adaptive spherical image transform for high-speed fruit defect detection. Transactions of ASAE 42 (1), 241–246.

Throop, J., Aneshansley, D., Anger, W., Peterson, D., 2005. Quality evaluation of apples based on surface defects: development of an automated inspection system. Postharvest Biology and Technology 36, 281–290.

Unay, D., Gosselin, B., September 2004. A quality sorting method for ‘Jonagold’ apples. In: Proc. Int. Agricultural Engineering Conf. (AgEng). Leuven, Belgium.

Unay, D., Gosselin, B., 2007. Stem and calyx recognition on ‘Jonagold’ apples by pattern recognition. Journal of Food Engineering, DOI: 10.1016/j.jfoodeng.2005.10.038, to appear.

Wen, Z., Tao, Y., April 1999. Building a rule-based machine-vision system for defect inspection on apple sorting and packing lines. Expert Systems with Applications 16 (3), 307–313.

Yang, Q., March 1994. An approach to apple surface feature detection by machine vision. Computers and Electronics in Agriculture 11 (2-3), 249–264.

[Fig. 1 column labels: 450nm, 500nm, 750nm, 800nm, segmentation]

Fig. 1. Examples of apple images and related theoretical segmentations. The first four columns present images from different filters, while the last one shows the corresponding theoretical segmentations. Rows display apples with different defect types (top to bottom: bruise, flesh damage and russet).

Fig. 2. Neighborhoods tested for defect segmentation.

[Fig. 3 panels: (a) E-1, (b) E-2, (c) E-3]

Fig. 3. Performances of segmentation methods by hold-out measured with E-1, E-2 and E-3.

[Fig. 4 panels: (a) Theoretical, (b) Otsu, (c) Entropy, (d) Niblack, (e) k-Means, (f) Perceptron, (g) k-NN, (h) LDC, (i) SVM, (j) MLP]

Fig. 4. Segmentations of some of the methods by hold-out on a fruit with bruise defect. The top-left image displays the theoretical segmentation. In each image healthy skin is displayed in white, while defected areas are displayed in gray.

Fig. 5. Effect of ‘sample size’ on defect segmentation by leave-one-out method. The vertical axis refers to segmentation errors of classifiers. The horizontal axis shows the sample size used for training.

Fig. 6. Effect of neighborhood on defect segmentation by leave-one-out method. The vertical axis refers to segmentation errors of classifiers. The horizontal axis shows different neighborhoods (wp0 refers to the neighborhood that provides only the center pixel, while the others include its neighbors as well).

Fig. 7. Effect of down-sampling on defect segmentation by leave-one-out method. The vertical axis refers to segmentation errors of classifiers. The horizontal axis shows dimensions of test images (from left to right: original image and down-sampled versions by factors 1, 2 and 3).

Fig. 8. Examples of segmentation by MLP with down-sampling factor of 3. The top row displays theoretical segmentations, whereas results of MLP are shown below. In each image healthy skin is displayed in white, while defected areas are displayed in gray. Examples are from fruit defected by bruise (left-most), flesh damage (mid-left), hail damage (mid-right) and russet (right-most).

Fig. 9. Defect-specific segmentation errors of MLP with down-sampling factor of 3. The number of apples belonging to each defect type is displayed in parentheses. ‘hail d. p.’ refers to hail damage with perforation and ‘all defects’ indicates all defected apples.

category   description                                 quantity
local      intensity of center pixel                   4
           intensity of n neighbor pixels              4×n
           relative distance                           1
global     average of intensities in roi               4
           standard deviation of intensities in roi    4

Table 1
Details of features extracted for defect segmentation. Columns refer to category, description and quantity of features, respectively. Note that all features (except for relative distance) were computed from each of the four filter images.

image size   # of samples   computational expense
                            SVM (ms)   MLP (ms)
430x560      123662         280341     1081
215x280      31116          70983      290
107x140      7875           18474      101
53x70        1963           5038       51

Table 2
Maximum computation times in milliseconds (ms) observed for SVM and MLP. Each row displays results obtained relative to the dimension of the test image (from top to bottom: original image and its down-sampled versions by factors 1, 2 and 3). The first two columns refer to the dimensions of the test image and the corresponding number of samples (pixels) classified, respectively.
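The timings in Table 2 can be turned into per-sample costs and SVM-to-MLP ratios with a short script. The sketch below only rearranges the values listed in Table 2; the names `TIMINGS` and `summarize` are hypothetical, and the per-sample figures are simple quotients rather than measurements reported in this work.

```python
# (image size, number of samples, SVM time in ms, MLP time in ms), from Table 2
TIMINGS = [
    ("430x560", 123662, 280341, 1081),
    ("215x280", 31116, 70983, 290),
    ("107x140", 7875, 18474, 101),
    ("53x70", 1963, 5038, 51),
]


def summarize(rows):
    """Return (size, SVM ms per sample, MLP ms per sample, SVM/MLP ratio)."""
    summary = []
    for size, n_samples, svm_ms, mlp_ms in rows:
        summary.append((size, svm_ms / n_samples, mlp_ms / n_samples,
                        svm_ms / mlp_ms))
    return summary


if __name__ == "__main__":
    for size, svm_each, mlp_each, ratio in summarize(TIMINGS):
        print(f"{size:>8}: SVM {svm_each:.3f} ms/sample, "
              f"MLP {mlp_each:.4f} ms/sample, ratio {ratio:.0f}x")
```

For every image size in Table 2 the ratio is well above 1, that is, the maximum MLP time is a small fraction of the maximum SVM time.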
