Discrimination of Wheat Grain Varieties Using ...

This article was downloaded by: [Uniwersytet Warminsko Mazurski], [Piotr Zapotoczny] On: 20 March 2013, At: 16:35 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Food Properties Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ljfp20

Discrimination of Wheat Grain Varieties Using Image Analysis and Multidimensional Analysis Texture of Grain Mass

Piotr Zapotoczny

Department of Agri-Food Process Engineering, University of Warmia and Mazury in Olsztyn, Heweliusza 14, 10-718 Olsztyn, Poland

Accepted author version posted online: 20 Mar 2013.

To cite this article: Piotr Zapotoczny (2013): Discrimination of Wheat Grain Varieties Using Image Analysis and Multidimensional Analysis Texture of Grain Mass, International Journal of Food Properties, DOI:10.1080/10942912.2011.615085 To link to this article: http://dx.doi.org/10.1080/10942912.2011.615085

Disclaimer: This is a version of an unedited manuscript that has been accepted for publication. As a service to authors and researchers we are providing this version of the accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proof will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to this version also.

PLEASE SCROLL DOWN FOR ARTICLE Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

ACCEPTED MANUSCRIPT


ABSTRACT

This paper presents the results of discrimination of 11 wheat grain varieties. The statistical analysis included reduction of variables to a set of 49 textures with the highest discriminating strength, followed by multidimensional analysis. Reduction of variables was performed by the following methods: genetic algorithms, the sequential forward floating search (SFFS) method, and the Class Ranker and Class RankSearch methods. The multidimensional analysis was performed with the following classifiers: Bayes, Lazy, Meta, decision trees and discriminant analysis. The classification accuracy for individual varieties, regardless of the year of cultivation, was between 98 and 100%.

Keywords: digital image analysis, automated kernel grading, image texture analysis, cereal grain classification, bulk samples


Corresponding author: Department of Agri-Food Process Engineering, University of Warmia and Mazury in Olsztyn, Heweliusza 14, 10-718 Olsztyn, Poland. E-mail address: [email protected]


INTRODUCTION

The use of computer vision systems in the food industry has been growing. They are used, for example, to evaluate product quality, to control processing and to identify varieties. These methods can help identify such features of an object as colour, geometry and surface texture. An image may be acquired with a video camera, a digital camera or a scanner (1-3). In recent years, devices for taking 2.5D and 3D images have been used increasingly often. Typical image analysis is based on photographs taken in electromagnetic radiation whose wavelength ranges from 400 to 700 nm. Owing to the development of digital camera technology, it is now possible to take ultraviolet (below 400 nm) and infrared (above 700 nm) photographs. The latest trends in imaging techniques include hyperspectral photography, in which photographs of an object are taken in several or even several dozen spectral channels. Any vision system must include software for image analysis (object identification and separation from the background), object measurement and statistical analysis of the results (multidimensional analyses, reduction of variables).

With tools like these, researchers can develop vision systems for such purposes as evaluation of food product quality and sorting and identification of varieties of various crops, including grains. Variety identification is important because varieties are usually intended for a specific use. For example, wheat grain can be used for either fodder or food production. Traditional methods of variety identification are both costly and time-consuming. Therefore, studies should aim at constructing a cheap and quick system which could identify varieties with the smallest possible error. Such conditions can be met by a vision system based on a flatbed scanner and en masse grain analysis.

Such research has been carried out for several years (4-7). These authors employed digital image recording with a CCD camera to identify different grain species. They isolated three groups of features which determine the physical properties of caryopses, i.e. geometric features, colour and surface texture, and they identified 25 different indexes. Subsequently, they used neural networks to develop a system for identification of grain species. The final recognition effectiveness ranged from 92.5% (rice) to 95.7% (wheat). Other authors (8-13) used a flatbed scanner to identify different species of Indian wheat. Out of 45 indexes of geometric dimensions and shape, they isolated 5 indexes which can be used to identify varieties. Utku (14) made an attempt at developing a system to recognise 31 wheat varieties with a CCD camera. There are also systems for quality evaluation of agricultural products (15-18). Such a system can identify lentil varieties with an accuracy of 99.8%, beans with an accuracy of 99.0%, and varieties of wheat and wheat products with a similar accuracy.

However, those reports do not provide any information about the effectiveness of the statistical models in identifying varieties in successive years of cultivation. Developing a model based on data from only one specific year, using neural networks or multidimensional analyses, brings very good results, and scientists can build such models. But would such a model, developed for one year, prove useful in classification in subsequent years? Another question is whether a model based on varieties harvested in specific climate and weather conditions would be useful in other conditions. Therefore, the author developed a statistical model for variety identification which would be effective in successive years of study. The model is based on measuring the texture of en masse images of caryopses. In order to reduce the number of variables, several methods of feature space reduction were employed, as well as several discriminatory methods, to achieve the best classification model.

MATERIALS AND METHODS

Grain samples

The experimental material comprised treated grain of common spring and winter wheat of four quality classes (elite wheat, prime quality wheat, bread wheat, forage wheat). Elite wheat (E) includes varieties of very good milling and baking quality which are also resistant to overgrowth. Flour from such grain may be used to upgrade bread flour (B). Wheat of A quality (prime quality wheat) features good milling quality and very good baking quality; it is also resistant to overgrowth. It can be used to improve mixes of wheat grain of poorer quality; however, it should be added in larger volumes than E quality wheat. Bread wheat grain (B) may be used for milling and baking; however, it should usually be mixed with E or A quality wheat. Bread wheat features an average baking value. Wheat varieties which did not qualify for any of the groups (A, B or E) are classified as group C, which comprises all the remaining varieties, including forage wheat. Grain samples were supplied by the Plant Cultivation Centre in Strzelce Sp. z o.o. near Kutno and were cultivated in central Poland.

The study covered three cultivation years (2005, 2006 and 2007) and 11 varieties (seven winter and four spring varieties): elite wheat: Torka*; prime quality wheat: Nawra*, Koksa*, Zyta, Sukces, Tonacja, Fregata; bread wheat: Cytra*, Soraja, Nutka; forage wheat: Symfonia. Each variety was analysed each year at three moisture content levels: 12%, 14% and 16%. Initial moisture content was determined in two replications using the drying method according to Polish standard PN-71A-75101. The samples were ground and placed in a laboratory dryer at a temperature of 100°C for four hours. Samples with low initial moisture content were hydrated: water was added, the grain was stirred for 24 hours, placed in tight plastic containers and stored for 48 h at room temperature to ensure even moisture distribution throughout the sample. Moisture content was determined again after the hydration treatment.

Image analysis

The image acquisition workstation consisted of an EPSON PERFECTION 4490 PHOTO flatbed scanner connected to a graphics workstation based on an Intel Pentium D 830 processor; the scanner was operated with SILVERFAST EPSON V 6.4.3 software. Before each series of images was acquired, the scanner was calibrated with an IT8.7/2 template supplied with the scanner software. This enabled control of the image quality, which is of key importance in image texture measurement. The images were analysed with modified MaZda v 4.3 software (27). Compared to the original software, the modification involved a module for automatic image segmentation which can be used to define the type of ROI (region of interest) and to set the channel to be analysed and the type of textures measured. The caryopses were placed randomly on the measurement scene in a layer thick enough to prevent the scanner light from passing through it (approx. 20 mm). Before the textures were measured, 16 ROI were randomly superimposed on each image fed into the software (Fig. 1). Each variety was described by textures from 384 ROI (24 scans × 16 ROI).
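The random placement of 16 ROI on a scan can be sketched as follows. This is only an illustration, not the segmentation module added to MaZda: the ROI size (64 × 64 pixels), the image dimensions and the function name `random_rois` are assumptions, and overlapping ROI are allowed because the text does not state otherwise.

```python
import random

def random_rois(img_w, img_h, n_rois=16, roi_size=64, seed=None):
    """Return n_rois square regions as (left, top, right, bottom) tuples.

    roi_size and the uniform placement are illustrative assumptions.
    """
    if roi_size > img_w or roi_size > img_h:
        raise ValueError("ROI does not fit inside the image")
    rng = random.Random(seed)
    rois = []
    for _ in range(n_rois):
        left = rng.randint(0, img_w - roi_size)  # randint bounds are inclusive
        top = rng.randint(0, img_h - roi_size)
        rois.append((left, top, left + roi_size, top + roi_size))
    return rois

rois = random_rois(img_w=2000, img_h=1500, seed=1)  # hypothetical scan size
scans_per_variety = 24
rois_per_variety = scans_per_variety * len(rois)    # 24 scans x 16 ROI
```

With 24 scans per variety, this placement yields the 384 ROI per variety reported above.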

Statistical analysis

The statistical analysis of results involved an unsupervised selection of data, followed by multidimensional analysis of the selected data in order to test the possibility of variety classification. The aim of the selection of variables was to reduce the set of 1,960 variables which describe a single ROI to a set of the best 49 variables. The initial number was so large because 280 variables were calculated for each of the 7 channels taken into account (280 × 7 = 1,960). The data for further multidimensional analyses were selected from those available for the 2006 grain. It was assumed that the variables obtained for that year would be used to classify the 2005, 2006 and 2007 grain. If satisfactory discrimination is achieved for three consecutive harvest years based on the same set of textures, it will confirm the good quality of the statistical model developed for discrimination of wheat grain varieties.

Reduction of variables

Seven methods of variable selection were used. The first group comprises genetic algorithms (HGA+Adaptive, HGA+Fixed) and the sequential forward floating search. These methods are implemented in the HGA.sel software (19). Another group of methods includes those based on Class Ranker + InfoGainAttributeEval, Class Ranker + ChiSquaredAttributeEval, Class RankSearch + CfsSubsetEval and Class RankSearch + ConsistencySubsetEval; in this case, the WEKA v. 3.7 software was used (20).

Genetic algorithms

The application of genetic algorithms has been presented by Pudil et al. (21-23). This study applied genetic algorithms with two strategies of feature space searching. In the first (Adaptive), only the maximum dimension of the feature space was set and the optimal dimension was established automatically. The other strategy (Fixed) was based on setting the exact dimension of the target space (24). Regardless of the search method, the algorithm operation parameters were set as follows: Ripple parameter = 1, Population size = 10, Mutation rate = 0.10, Generation limit = 15, Selection pressure = 0.25, Cross-over points = 3, Number of clusters = 11, Reduced dimensionality = 10.

The sequential forward floating search (SFFS)

Before the search was started, it was necessary to declare the maximum number of features n to be included in the target observation space. When the SFFS procedure is initialised, all the features are placed in the Δ set, whereas the Ξ set remains empty. As the algorithm operates, successive significant features are found, which results in a collection of subsets Ξt, where 1 ≤ t ≤ n, which are the best sets of dimension t (19).
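The floating search described above can be sketched in a few lines. This is a generic textbook-style SFFS, not the HGA.sel implementation: the function name `sffs`, the `criterion` callback and the toy feature scores are assumptions made for illustration, and the criterion actually used in the study is not stated in the text.

```python
def sffs(features, criterion, n_max):
    """Sequential forward floating search.

    features  : iterable of feature names (the set Delta in the text)
    criterion : function mapping a tuple of features to a score (higher is better)
    n_max     : maximum subset dimension n
    Returns {t: (score, subset)} with the best subset found for each size t.
    """
    remaining = list(features)   # Delta: features not yet selected
    selected = []                # Xi: starts empty
    best = {}                    # best[t] = (score, subset of size t)

    while len(selected) < n_max and remaining:
        # Inclusion: add the single most significant remaining feature.
        f_add = max(remaining, key=lambda f: criterion(tuple(selected + [f])))
        selected.append(f_add)
        remaining.remove(f_add)
        score, t = criterion(tuple(selected)), len(selected)
        if t not in best or score > best[t][0]:
            best[t] = (score, tuple(selected))

        # Conditional exclusion: drop the least significant feature while the
        # reduced subset beats the best subset already known for that size.
        while len(selected) > 2:
            f_drop = max(selected,
                         key=lambda f: criterion(tuple(x for x in selected if x != f)))
            reduced = [x for x in selected if x != f_drop]
            r_score = criterion(tuple(reduced))
            if r_score > best[len(reduced)][0]:
                selected = reduced
                remaining.append(f_drop)
                best[len(reduced)] = (r_score, tuple(reduced))
            else:
                break
    return best

weights = {"a": 9, "b": 5, "c": 7, "d": 1}   # hypothetical per-feature scores
best = sffs(weights, lambda subset: sum(weights[f] for f in subset), n_max=3)
```

With this purely additive toy criterion no backtracking occurs, so the result equals plain forward selection; the exclusion step only matters when features interact.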

Class Ranker, Class RankSearch

A detailed description of the selection methods applied and their assumptions has been presented by Witten (25). The study employs two methods of selection: Class Ranker and Class RankSearch. In the first method, the attributes were evaluated by the InfoGainAttributeEval method, which measures their information gain with respect to the class. It first discretizes numeric attributes using the MDL-based discretization method (it can be set to binarize them instead). This method, along with the next three, can treat missing values as a separate value or distribute the counts among other values in proportion to their frequency (25).
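Information gain with respect to the class can be illustrated with a short sketch. It is not WEKA code: it assumes the attribute has already been discretized (the MDL-based discretization step is omitted), and the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """H(class) - H(class | attribute) for an already discretized attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_attributes(table, labels):
    """Attribute names sorted by decreasing information gain."""
    return sorted(table, key=lambda name: info_gain(table[name], labels), reverse=True)

# Toy data: two varieties, two discretized texture attributes (hypothetical).
labels = ["Torka", "Torka", "Zyta", "Zyta"]
table = {
    "texture_b": ["low", "high", "low", "high"],   # unrelated to the class
    "texture_a": ["low", "low", "high", "high"],   # separates the classes
}
ranking = rank_attributes(table, labels)
```

Here `texture_a` removes all class uncertainty (gain of 1 bit) and is ranked first, while `texture_b` carries no information about the class.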

Another method was based on the chi-squared statistic. ChiSquaredAttributeEval evaluates attributes by computing the chi-squared statistic with respect to the class. GainRatioAttributeEval evaluates attributes by measuring their gain ratio with respect to the class (25). In the Class RankSearch reduction method, the quality of features is evaluated by the CfsSubsetEval and ConsistencySubsetEval methods. Application of the methods discussed above provided a set of variables with potentially the greatest discriminating power. Table 1 shows the 4 best variables for each selection method and each colour channel. Finally, the variables ranked first on each list were chosen for the multidimensional analysis, with the reservation that if the same texture was ranked first by several selection methods, textures from the next places were selected. This provided a set of 49 variables from all the colour channels and methods of selection. It was assumed that such a procedure would ensure the best possible set of information for further analyses.
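The rule for building the final set (take the first-ranked texture for each channel and method, and move down the list when that texture is already taken) might look like the sketch below. The function name, the data layout and the example rankings are hypothetical and are not taken from Table 1; treating duplicates per channel is also an assumption.

```python
def pick_unique_top(rankings):
    """rankings: {channel: {method: [texture names, best first]}}.

    For every channel and method, take the highest-ranked texture that has not
    already been taken by another method for the same channel.
    """
    chosen = []
    for channel, by_method in rankings.items():
        taken = set()
        for method, ranked in by_method.items():
            for texture in ranked:
                if texture not in taken:
                    taken.add(texture)
                    chosen.append((channel, method, texture))
                    break
    return chosen

# Illustrative rankings for one channel and three methods (made-up order).
rankings = {
    "B": {
        "SFFS": ["WavEnLL_s-8", "Teta2"],
        "Ranker + InfoGain": ["WavEnLL_s-8", "WavEnLL_s-7"],
        "RankSearch + Cfs": ["WavEnLL_s-8", "WavEnLL_s-7", "Perc.90"],
    }
}
chosen = pick_unique_top(rankings)
n_expected = 7 * 7   # 7 channels x 7 selection methods
```

Applied to all 7 channels and 7 selection methods, one texture per combination gives the 49 variables mentioned in the text.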

Multidimensional analysis

The variety classification was performed with 7 classification methods from the following groups: Bayes, Lazy, Meta, decision trees and discriminant analysis. Discriminant analysis (stepwise regressive and progressive, as well as the best subset) was performed using Statistica v 9.0 software; the other analyses were performed with WEKA v 3.7. The strategy adopted in developing the statistical model involved dividing the data sets into subsets by the following methods: cross-validation (k=10), percentage split (30% of the input set) and training (the model was tested on the training set itself). In discriminant analysis, the data set was divided by the cross-validation method (k=10). At that stage, the aim was to find the method which would ensure the smallest classification error for the 11 wheat varieties in successive years of cultivation.
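The two data-division schemes can be sketched as index splits. This is a generic illustration rather than the WEKA or Statistica procedure: the function names are hypothetical, the case count assumes 384 ROI for each of the 11 varieties, and the text does not say whether the 30% refers to the training or the testing part (the sketch holds 30% out for testing).

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every case is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def percentage_split(n_samples, held_out=0.30, seed=0):
    """Return (train_idx, test_idx) with a fraction held_out kept for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * held_out)
    return idx[n_test:], idx[:n_test]

n_cases = 11 * 384                      # assumed: 11 varieties x 384 ROI
splits = list(k_fold_indices(n_cases))  # k = 10 as in the study
train_idx, test_idx = percentage_split(n_cases)
```

In both schemes the test cases are never seen during training, which is what distinguishes them from the "training" variant.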

RESULTS AND DISCUSSION

Reduction of variables

Table 1 presents the results of selection of variables for all the channels under analysis. For the genetic algorithms, the largest group of selected textures was calculated by the co-occurrence matrix method. The textures selected by Ranker and RankSearch were usually calculated by the Haar wavelet transform, with the exception of the textures selected by RankSearch for the U and V channels; in that case, the best discriminants were the values calculated from the histogram distribution. Fig. 2 shows the categorised diagrams of case distribution based on selected variables (texture parameters). The discriminating power of the selected "raw" variables, without multidimensional analysis, was so great that introducing the variables to the model could be expected to result in an effective final classification. An example is the case distribution presented in Fig. 2c, where textures calculated from channel S were the grouping variables. Cases of different varieties were grouped in isolated regions, each around its own centre. The Cytra, Koksa and Soraja varieties formed sets of points distant from the others. The other varieties were also clustered in separate regions, but with their centres close to one another.

Multidimensional analysis

The results of multidimensional analysis are presented in Table 2. Depending on the method, the correctness of classification ranged from 86 to 100%, regardless of the year of cultivation. When training and testing were performed on the same set, the percentage of correct classifications was 100% for every method except decision trees. This was partly because the discriminating power of the selected variables was very high, but mainly because classifying cases from the same set that was used for training gives much better results. Therefore, further analyses divided the data into training and validation sets. In that case, the correctness of classification was slightly lower, but it remained high. The Lazy_IB1 method gave 100% correct classifications for 2006, whereas the value for 2005 and 2007 was 95%. In general, classification of the 2006 varieties was better than that for grain from the other years. This resulted from the fact that the textures used for classification were chosen on the basis of the 2006 data and were then used to classify varieties in all the years. Decision trees (Trees_J48) proved to be the worst method. In that case, 100% correctness of classification was not achieved even for the Training variant, and the classification level was 10% lower than in the other methods.
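Why testing on the training set inflates the score is easy to see for a nearest-neighbour classifier such as IB1: each case is its own nearest neighbour. The sketch below uses a deliberately awkward, made-up one-dimensional data set; the function names are hypothetical and the code is not the WEKA implementation.

```python
def nearest_label(x, train_X, train_y):
    """Label of the training case closest to x (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])))
    return train_y[best]

def accuracy(test_X, test_y, train_X, train_y):
    hits = sum(nearest_label(x, train_X, train_y) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Made-up, heavily overlapping classes on a single texture value.
X = [(1.0,), (2.0,), (2.1,), (3.0,)]
y = ["A", "B", "A", "B"]

# Testing on the training set: every case finds itself at distance zero.
resubstitution = accuracy(X, y, X, y)

# Leave-one-out: each case is classified by the remaining cases only.
loo_hits = 0
for i in range(len(X)):
    rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
    loo_hits += nearest_label(X[i], rest_X, rest_y) == y[i]
held_out = loo_hits / len(X)
```

On this toy set the resubstitution score is perfect although the classes are not separable at all, which is why the held-out variants are the meaningful ones.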

Multidimensional analysis: stepwise regressive, progressive, and best subset analysis

Table 2 shows the results of discrimination of 11 wheat varieties in different years of cultivation. Wheat discrimination with the methods under discussion was highly effective. The classification error was 8% in the worst case, when the "best subset" method was used (2007). The cumulative error was only 1-2% when the stepwise progressive or regressive method was used. Errors occurred only in the Sukces and Nutka varieties, where individual cases were classified incorrectly. Classification of the other varieties was 100% correct. In the stepwise analysis, it is possible to evaluate the effect of individual variables on the discriminant ability during the classification. This allows a decision to be made either to continue or to stop introducing variables to the model. Wilks' lambda is the statistic which shows the strength of the variables. The lower its value, combined with a high value of the F statistic, the higher the discriminating power of the variables fed into the model. Fig. 4 shows the distribution of cases for the 3 best discriminating variables. The grouping of cases based on them was satisfactory; it was possible to distinguish centres for individual varieties. When only 9 variables were fed into the model, Wilks' lambda was 0.000001 at F=670, which ensured 100% effective classification with such a small number of variables (Fig. 3).
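For reference, Wilks' lambda is commonly defined as the ratio of within-group to total generalised variance. The symbols below are introduced here for illustration and do not appear in the original text: W is the within-group sum of squares and cross-products matrix, B the between-group matrix and T = W + B the total matrix.

```latex
\Lambda = \frac{\det(\mathbf{W})}{\det(\mathbf{T})}
        = \frac{\det(\mathbf{W})}{\det(\mathbf{W} + \mathbf{B})},
\qquad 0 \le \Lambda \le 1
```

Values close to 0 mean that almost all variation lies between the groups, that is, the varieties are well separated; values close to 1 mean that the variables do not discriminate.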

CONCLUSIONS

The method of discrimination of grain of wheat varieties can achieve 100% effectiveness. Based on the textures selected in 2006, it was possible to discriminate the 2007 and 2005 varieties. One of the advantages of the method is its speed: it takes about a minute to place grain on the scanner and to perform the image analysis together with the statistical analysis. These conclusions are true with the reservation that discriminant analysis is performed on variables for a specific year and applies only to that year. Fig. 4a shows the mean values of the selected texture B_WavEnLL_s-8 for the varieties under analysis and the 3 years of cultivation. The mean values of the texture differed between years. The differences in the parameter level exceeded 100% in most cases. The discrepancy between the parameter values for the three years under analysis and the 11 analysed varieties was 4.5 units, which, with a small standard deviation, will make discrimination more difficult. Further studies aimed at improving the proposed method will have to produce a model which is based on the standardised value of the texture used in classification of varieties in different years of cultivation.
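One possible form of such standardisation is a z-score computed separately for each year. This is only a sketch of an option the text leaves open: the function name, the use of the population standard deviation and the example values are assumptions, not the author's method or data.

```python
from statistics import mean, pstdev

def standardise_by_year(values_by_year):
    """Map {year: [texture values]} to {year: [z-scores]}.

    Each year is centred on its own mean and scaled by its own standard
    deviation, so that year-to-year shifts in level and spread are removed.
    """
    out = {}
    for year, values in values_by_year.items():
        m, s = mean(values), pstdev(values)
        if s == 0:
            raise ValueError(f"no variation in year {year}")
        out[year] = [(v - m) / s for v in values]
    return out

# Made-up texture values: 2006 is shifted and stretched relative to 2005.
raw = {2005: [2.0, 4.0, 6.0], 2006: [4.0, 8.0, 12.0]}
z = standardise_by_year(raw)
```

After standardisation the two hypothetical years coincide, although their raw levels differ by a factor of two.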

ACKNOWLEDGMENTS

The author is grateful for the financial support provided by the Ministry of Scientific Research within the framework of grant no. 1089/P06/2005/29.

REFERENCES

Seyed, M.A.; Razavia, A.; Rahbaria, R. Computer Image Analysis and Physico-Mechanical Properties of Wild Sage Seed (Salvia macrosiphon). International Journal of Food Properties. 2010, 13 (2), 308-316.

Yong H.; Xiaoli, Lia.; Yongni, S. Fast Discrimination of Apple Varieties Using Vis/NIR Spectroscopy. International Journal of Food Properties. 2007, 10 (1), 9-18.

Mahmoodia M.; Khazaeia, J.; Narjes. Modeling of Geometric Size Distribution of Almond. International Journal of Food Properties. 2010, DOI: 10.1080/10942910903501872.

Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: I. Morphology models. American Society of Agricultural Engineering. 2000, 43, 1669-1675.

Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: III. Texture models. American Society of Agricultural Engineering. 2000, 43, 1681-1687.

Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: II. Color models. American Society of Agricultural Engineering. 2000, 43, 1677-1680.

Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: IV. Combined morphology, color, and texture models. American Society of Agricultural Engineering. 2000, 43, 1689-1694.

Jayas, D.S.; Paliwal, J.; Visen, N.S. Multi-layer neural networks for image analysis of agricultural products. Journal of Agricultural Engineering Research. 2000, 77, 119-128.

Visen, N.S.; Paliwal, J.; Jayas, D.S.; White, N.D.G. Specialist neural networks for cereal grain classification. Biosystems Engineering. 2001, 82, 151-159.

Visen, N.S.; Shashidhar, N.S.; Paliwal, J.; Jayas, D.S. Identification and segmentation of occluding groups of grain kernels in a grain sample image. Journal of Agricultural Engineering Research. 2002, 79, 159-166.

Paliwal, J.; Visen, N.S.; Jayas, D.S. Evaluation of neural network architectures for cereal classification using morphological features. Journal of Agricultural Engineering Research. 2001, 79, 361-370.

Paliwal, J.; Visen, N.S.; Jayas, D.S.; White, N.D.G. Comparison of a neural network and non-parametric classifier for grain kernel identification. Biosystems Engineering. 2003, 85, 405-413.

Paliwal, J.; Visen, N.S.; Jayas, D.S.; White, N.D.G. Cereal grain and dockage identification using machine vision. Biosystems Engineering. 2003, 85, 51-57.

Utku, H. Application of the feature selection method to discriminate digitized wheat varieties. Journal of Food Engineering. 2000, 46, 211-216.

Venora, G.; Grillo, O.; Ravalli, C.; Cremonini, R. Identification of Italian landraces of bean (Phaseolus vulgaris L.) using an image analysis system. Scientia Horticulturae. 2009, 121, 410-418.

Venora, G.; Grillo, O.; Saccone, R. Quality assessment of durum wheat storage centres in Sicily: Evaluation of Vitreous, Starchy and Shrunken Kernels using an Image Analysis System. Journal of Cereal Science. 2009, 49, 429-440.

Venora, G.; Grillo, O.; Saccone, R.; Ravalli, C. Speck evaluation on commercial spaghetti using an imaging system. In: From Seed to Pasta: The Durum Wheat Chain. International Durum Wheat Symposium. Bologna, June 30 - July 3, Edizioni Avenue Media, Milano, 2008.

Venora, G.; Grillo, O.; Shahin, M.A.; Symons, S.J. Identification of Sicilian landraces and Canadian cultivars of lentil using image analysis system. Food Research International. 2007, 40, 161-166.

Klepaczko, A. Zastosowanie algorytmów analizy skupień do selekcji cech dla zadań klasyfikacji wektorów danych. Doctoral thesis. 2006, Politechnika Łódzka, Instytut Elektroniki.

Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update. SIGKDD Explorations. 2009, Volume 11, Issue 1.

Pudil, P.; Novovičova, J.; Kittler, J. Floating Search Methods in Feature Selection. Pattern Recognition Letters. 1994, 15, 1119-1125.

Pudil, P.; Novovičova, J. Novel Methods for Subset Selection with Respect to Problem Knowledge. Intelligent Systems. 1998, 13, 66-74.

Pudil, P.; Somol, P. Current Feature Selection Techniques in Statistical Pattern. Advances in Intelligent and Soft Computing. 2005, 30, 53-68.

Oh, I.S.; Lee, J.S.; Moon, B.R. Hybrid Genetic Algorithms for Feature Selection. IEEE Trans. Pattern Analysis and Machine Intelligence. 2004, 26, 1424-1437.

Witten, I.H.; Frank, E. Data Mining. Practical Machine Learning Tools and Techniques, Second Edition. Elsevier, 2005.

Szczypiński, P.M.; Strzelecki, M.; Materka, A.; Klepaczko, A. MaZda - A software package for image texture analysis. Computer Methods and Programs in Biomedicine. 2009, 94, 66-76.

Table 1

Listing of the best variables from different colour channels and selection methods

Column groups: Selection unsupervised; Selection supervised

Columns: Channel | HGA + Adaptive | HGA + Fixed | SFFS | Ranker + InfoGainAttributeEval | Ranker + ChiSquaredAttributeEval | RankSearch + CfsSubsetEval | RankSearch + ConsistencySubsetEval

S(0,1)SumA

S(0,2)SumE

verg

ntrp

S(0,1)SumE Y

ntrp S(0,2)SumA verg S(2,0)SumA verg

R

GrMean

S(0,5)SumO fSq S(1,0)SumA

WavEnLL_s_ 8

WavEnLL_s_8

WavEnLL_s_ Perc.99%

Perc.90

7

WavEnLL_s _7 WavEnLL_s_1

WavEnLL_s_

WavEnLL_s_8

8

WavEnLL_s_7

WavEnLL_s_ 7

WavEnLL_s_

WavEnLL_s_

1

1

S(1,0)SumO

WavEnLL_s_

WavEnLL_s_

fSqs

2

3

verg

GrMean

GrNonZeros

S(5,-5)

GrMean

SumAverg

GrKurtosis

135dr_LngR

S(0,1)Entrop y

WavEnLL_s_1 WavEnLL_s_3

WavEnLL_s

WavEnLL_s_

Vertl_LngREmp WavEnLL_s_

WavEnLL_s_8

-8

8

h Perc.90

8

Vertl_Fraction

WavEnHH_

Vertl_LngRE

Vertl_Fractio

WavEnLL_s_1

s-2

mph Perc.90

n

WavEnLL_s_7

WavEnLL_s_8

S(2,-2) InvDfMom S(2,0)DifEnt

Emph Vertl_Fracti on

Downloaded by [Uniwersytet Warminsko Mazurski], [Piotr Zapotoczny] at 16:35 20 March 2013

rp

WavEnHH_

Vertl_Fractio

s-1

n

WavEnLL_s_1 WavEnLL_s_

Vertl_LngR

7

Emph

S(5,-

WavEnHH_

WavEnLL_s

WavEnLL_s_

WavEnLL_s_1

WavEnLL_s_

WavEnLL_s_8

5)SumOfSqs

s-1

-8

8

WavEnLL_s_8

8

S_0_5_SumAve

WavEnHH_

S_1_0_SumA

S_4_0_SumAve

S(0,4)SumA

Sigma

verg

s-1 Teta3

G S(3,3)SumOfSqs

Teta2 verg

GrNonZeros

Teta2

S(3,3)Contrast

B

Vertl_Fraction

rg

S_0_4_SumA

S_5_0_SumAve

verg

rg

S_0_5_SumAverg

rg S_3_0_SumAverg

S_3_0_SumA

S_2_0_SumAve

S_2_0_SumA

verg

rg

verg

S_2_0_SumAverg

WavEnHH_s Teta2

WavEnLL_s

WavEnLL_s_

WavEnLL_s_8

WavEnLL_s_

WavEnLL_s_8

-3

-8

8

S_4_0_SumAve

8

WavEnLL_s_1

WavEnHH_

S_4_0_SumA

rg

WavEnLL_s_

S_5_0_SumAve

s-1

verg

S_1_0_SumAve

1

rg

GrNonZeros

S_3_0_SumA

rg

S_5_0_SumA

S_3_0_SumAve

verg

S_3_0_SumAve

verg

rg

S(5,5)AngScMom S(0,5)SumV

S(4,4)Correlat S(3,3)DifVarnc

GrVariance

S_5_0_SumA

S_3_0_SumA

arnc

S(3,-

S(0,5)InvDf

3)Correlat

verg

rg

verg

Teta3

Teta3

Perc.50

Perc.50

Perc.10

Perc.10

Perc.90

Perc.90

Perc.90

Perc.90

Perc.50

Perc.50

Perc.10

Perc.10

Teta3

Teta3


Mom

WavEnHH_s WavEnHH_

WavEnLL_s

WavEnLL_s_

WavEnLL_s_8

-1

-8

8

WavEnLL_s_7

WavEnHH_

WavEnLL_s_

WavEnLL_s_6

s-2

7

WavEnHH_

WavEnLL_s_

s-1 Teta2

6

Vertl_ShrtR

s-1 Sigma

Emp U S(0,3)DifEntrp

Teta3 Teta2

WavEnLL_s_5

S(0,2)SumA

WavEnLL_s_

verg

5

S(0,4)SumO

45dgr_GLev

WavEnLL_s

WavEnLL_s_

WavEnLL_s_8

fSqs

NonU

-8

8

WavEnLL_s_7

WavEnHH_

WavEnLL_s_

s-1

7

S(3,0)DifEntrp

S(5,5)DifVarnc

V S(1,1)SumVarnc

135dr_LngREmph Vertl_Fraction S(5,5)DifVarnc S(5,0)DifVarnc

WavEnLL_s_6 WavEnLL_s_5

WavEnLL_s_6 WavEnLL_s_5

S(0,5)Entrop

WavEnHH_

WavEnLL_s

WavEnLL_s_

WavEnLL_s_8

y

s1

-8

8

WavEnLL_s_7

S(0,5)AngSc

Teta3

WavEnHH_

WavEnLL_s_

WavEnLL_s_1

s-1 Teta2

7

Mom

Teta2

S

135dr_Fracti

S(3,-

135dr_LngR

3)SumEntrp S

(3,-

Emph

on

WavEnLL_s_2

WavEnLL_s_

Teta3

Teta3

WavEnLL_s_

WavEnLL_s_8

8

Perc.50

Perc.50

WavEnLL_s_7

WavEnLL_s_7

6 WavEnLL_s_2

3)SumVarnc


Table 2


Results of multidimensional analysis for 11 wheat varieties cultivated in the years 2005-2007, grain humidity 14%.

Method of discrimination    Total [%]   CYTRA*   NAWRA*   TORKA*   KOKSA*  FREGATA    NUTKA   SORAJA   SUKCES SYMFONIA  TONACJA     ZYTA

Naive Bayes
  2005 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2005 Cross-validation            94       95       96       88       95       97       98       79       91      100       98       91
  2005 Percentage split            92       93       94       90       89       96       98       69       98      100       97       87
  2006 Training                    93      100       98       97      100       73      100       84      100       88       93       87
  2006 Cross-validation            93      100       98       97      100       72      100       84      100       87       92       88
  2006 Percentage split            93      100       99       98      100       69      100       84      100       86       93       86
  2007 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2007 Cross-validation            93       93       95       88       96       97       97       78       91      100       99       91
  2007 Percentage split            92       93       94       91       89       97       98       68       94      100       97       88

BayesNet
  2005 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2005 Cross-validation            94       95       96       88       95       97       98       79       91       99       98       91
  2005 Percentage split            92       93       94       90       89       96       98       69       98      100       97       87
  2006 Training                    94      100       92       99      100       83      100       82      100       88       91       89
  2006 Cross-validation            93      100       99       98      100       80      100       80       99       87       91       89
  2006 Percentage split            93      100       98       99      100       82      100       84      100       82       92       88
  2007 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2007 Cross-validation            94       95       96       88       95       97       98       79       91      100       98       91
  2007 Percentage split            88       94       88       95       99       85       73       85       93       84       99       78

Lazy.IB1
  2005 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2005 Cross-validation            95       94       96       89       97       98      100       84       97       98       99       89
  2005 Percentage split            94       96       93       87       93       98       98       77      100       99       97       90
  2006 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2006 Cross-validation            99      100      100      100      100       99      100      100      100       99       99       99
  2006 Percentage split           100      100      100      100      100      100      100      100      100      100      100      100
  2007 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2007 Cross-validation            95       94       96       89       97       98      100       84       97       98       99       89
  2007 Percentage split            94       96       93       87       93       98       98       77      100       99       97       90

Meta Attribute Ser
  2005 Training                    90       84       92       75       96       97       97       76       98       95       98       78
  2005 Cross-validation            86       84       87       75       96       92       97       71       83       91       97       76
  2005 Percentage split            86       88       89       76       92       91       97       68       73       90       94       84
  2006 Training                   100      100      100      100      100      100      100      100      100      100      100      100
  2006 Cross-validation            98      100      100      100      100       95      100       94       99       94       97       97
  2006 Percentage split            97      100      100      100      100       94      100       90      100       92       94       95
  2007 Training                    99      100       99      100      100      100       98      100       99       98      100       98
  2007 Cross-validation            94       99       89       97       96       96       91       95       98       83       99       84
  2007 Percentage split            93       98       88       99       98       98       88       91       98       83      100       76

Trees.J48
  2005 Training                    90       84       92       75       96       97       97       76       98       95       98       78
  2005 Cross-validation            86       84       87       75       96       92       97       71       83       91       97       76
  2005 Percentage split            86       88       89       76       92       91       97       68       74       90       94       84
  2006 Training                    99      100      100      100      100       99      100       99       99       99       99       99
  2006 Cross-validation            98      100      100      100      100       95      100       94       99       93       97       96
  2006 Percentage split            97      100      100      100      100       92      100       90      100       93       96       95
  2007 Training                    90       84       92       75       96       97       97       76       98       95       98       78
  2007 Cross-validation            86       84       87       75       96       92       97       71       83       91       97       76
  2007 Percentage split            86       88       89       76       92       91       97       68       74       90       94       84

Discriminant analysis
  2005 Stepwise progressive        99      100      100      100      100      100      100       99      100      100      100      100
  2005 Stepwise regressive         99      100      100      100      100      100      100       99      100      100      100      100
  2005 The best subset             97       99       99       95       97       98      100       91      100      100       99       90
  2006 Stepwise progressive        98      100      100      100      100       98      100      100      100      100      100      100
  2006 Stepwise regressive         98      100      100      100      100       98      100      100      100      100      100      100
  2006 The best subset             98      100      100      100      100       98      100      100      100      100      100      100
  2007 Stepwise progressive        99      100      100      100      100      100      100      100      100       99      100      100
  2007 Stepwise regressive         99      100      100      100      100      100      100      100      100       99      100      100
  2007 The best subset             92       98      100       98      100       98       80       94       99       80       78       84


* spring cultivars
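Table 2 reports accuracy under three evaluation modes: on the training set, by cross-validation, and on a percentage split. The following is a minimal sketch of how these three figures differ, not the WEKA procedure used in the study. It assumes a hand-written 1-nearest-neighbour rule (the idea behind Lazy.IB1), synthetic two-feature data with hypothetical class names, 10 folds and a 66% split; none of these settings is stated in the table.

```python
import random


def nearest_neighbour_predict(train, query):
    """Return the label of the training sample closest to `query` (1-NN)."""
    best = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)))
    return best[1]


def accuracy(train, test):
    """Percentage of `test` samples whose label the 1-NN rule predicts correctly."""
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return 100.0 * hits / len(test)


def training_accuracy(data):
    """Evaluate on the same samples that were used for training."""
    return accuracy(data, data)


def cross_validation_accuracy(data, folds=10):
    """Mean accuracy over `folds` disjoint test folds (fold count is an assumption)."""
    scores = []
    for k in range(folds):
        test = data[k::folds]
        train = [s for i, s in enumerate(data) if i % folds != k]
        scores.append(accuracy(train, test))
    return sum(scores) / folds


def percentage_split_accuracy(data, train_fraction=0.66):
    """Train on the first part of the data, test on the rest (fraction is an assumption)."""
    cut = int(len(data) * train_fraction)
    return accuracy(data[:cut], data[cut:])


def make_synthetic_data(n_per_class=60, seed=0):
    """Three hypothetical classes drawn from 2-D Gaussians; stands in for texture features."""
    rng = random.Random(seed)
    centres = {"variety_a": (0.0, 0.0), "variety_b": (3.0, 3.0), "variety_c": (0.0, 4.0)}
    data = [((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
            for label, (cx, cy) in centres.items() for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    samples = make_synthetic_data()
    print("Training:         %.1f" % training_accuracy(samples))
    print("Cross-validation: %.1f" % cross_validation_accuracy(samples))
    print("Percentage split: %.1f" % percentage_split_accuracy(samples))
```

With a 1-nearest-neighbour rule the training figure is 100% by construction whenever no two samples coincide, because each sample is its own nearest neighbour. The cross-validation and percentage-split figures are therefore the more informative estimates.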


Fig. 1. An image of the kernel setting with ROI.

Panels: Channel R, Channel G, Channel B, Channel S, Channel U, Channel V.


Fig. 2. A categorised diagram of the distribution of cases representing 11 varieties of wheat (year 2006, humidity 14%). Diagrams A and B show the distribution of cases vs. variables from the RGB channels and the Ranker and Ranker + Fixed selection methods. Diagrams C and D show the distribution of cases vs. variables from channels S and Y and the HGA + Fixed and SFFS + Fixed selection methods.


Fig. 3. Classification of 11 varieties of wheat (A, C, E: stepwise progressive method; B, D, F: best subset method).

Fig. 4. Distribution of cases based on three best variables in progressive discriminant analysis.


Fig. 4a. Mean values and standard deviation of texture B_WavEnLL_s-8 for the years 2005, 2006 and 2007.

Panels: Year 2005, Year 2006, Year 2007.