This article was downloaded by: [Uniwersytet Warminsko Mazurski], [Piotr Zapotoczny] On: 20 March 2013, At: 16:35 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
International Journal of Food Properties Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ljfp20
Discrimination of Wheat Grain Varieties Using Image Analysis and Multidimensional Analysis Texture of Grain Mass
Piotr Zapotoczny
Department of Agri-Food Process Engineering, University of Warmia and Mazury in Olsztyn, Heweliusza, 14, 10-718, Olsztyn, Poland Accepted author version posted online: 20 Mar 2013.
To cite this article: Piotr Zapotoczny (2013): Discrimination of Wheat Grain Varieties Using Image Analysis and Multidimensional Analysis Texture of Grain Mass, International Journal of Food Properties, DOI:10.1080/10942912.2011.615085 To link to this article: http://dx.doi.org/10.1080/10942912.2011.615085
Disclaimer: This is a version of an unedited manuscript that has been accepted for publication. As a service to authors and researchers we are providing this version of the accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proof will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to this version also.
ACCEPTED MANUSCRIPT
Discrimination of Wheat Grain Varieties Using Image Analysis and Multidimensional Analysis Texture of Grain Mass
Piotr Zapotoczny 1
Department of Agri-Food Process Engineering, University of Warmia and Mazury in Olsztyn, Heweliusza 14, 10-718 Olsztyn, Poland, e-mail: [email protected]
ABSTRACT
This paper presents the results of the discrimination of 11 wheat grain varieties. The statistical analysis comprised reduction of the variables to a set of the 49 textures with the highest discriminating power, followed by multidimensional analysis. The variables were reduced by genetic algorithms, the sequential forward floating search (SFFS) method, and the Ranker and RankSearch methods. The multidimensional analysis was performed with Bayes, Lazy, Meta and decision-tree classifiers and with discriminant analysis. Correct classification of the individual varieties, regardless of the year of cultivation, ranged from 98 to 100%.
Keywords: digital image analysis, automated kernel grading, image texture analysis, cereal grain classification, bulk samples
Downloaded by [Uniwersytet Warminsko Mazurski], [Piotr Zapotoczny] at 16:35 20 March 2013
1 Corresponding author: Department of Agri-Food Process Engineering, University of Warmia and Mazury in Olsztyn, Heweliusza 14, 10-718 Olsztyn, Poland, e-mail address: [email protected]
INTRODUCTION
The use of computer vision systems in the food industry has been growing. They are used, for example, to evaluate product quality, to control processing and to identify varieties. These methods can measure features of an object such as colour, geometry and surface texture. An image may be acquired with a video camera, a digital camera or a scanner(1-3). In recent years, devices for acquiring 2.5D and 3D images have been used increasingly often. Typical image analysis is based on photographs taken in visible light, i.e. electromagnetic radiation with wavelengths from 400 to 700 nm. Owing to the development of digital camera technology, it is now also possible to take ultraviolet (below 400 nm) and infrared (above 700 nm) photographs. The latest trends in imaging include hyperspectral photography, in which an object is photographed in several or even several dozen spectral channels. Any vision system must include software for image analysis (identifying objects and separating them from the background), for measuring the objects and for statistical analysis of the results (reduction of variables, multidimensional analysis).
With such a tool, researchers can develop vision systems for purposes such as evaluation of food product quality and sorting and identification of varieties of various crops, including cereals. Variety identification is important because varieties are usually intended for a specific use; for example, wheat grain can be used either for fodder or for food production. Traditional methods of variety identification are costly and time-consuming. Studies are therefore needed to construct a cheap and fast system that identifies varieties with the smallest possible error. These requirements can be met by a vision system based on a flatbed scanner and analysis of grain in bulk (en masse).
Such research has been carried out for several years(4-7). These authors used digital images recorded with a CCD camera to identify different grain species. They isolated three groups of features describing the physical properties of caryopses, i.e. geometric features, colour and surface texture, and identified 25 different indexes. They subsequently used neural networks to develop a system for identifying grain species; the final recognition effectiveness ranged from 92.5% (rice) to 95.7% (wheat). Other authors(8-13) used a flatbed scanner to identify different species of Indian wheat. Out of 45 indexes of geometric dimensions and shape, they isolated 5 which can be used to identify varieties. Utku(14) attempted to develop a system to recognise 31 wheat varieties with a CCD camera. Systems have also been developed for quality evaluation of agricultural products(15-18). Such systems can identify lentil varieties with an accuracy of 99.8%, bean varieties with an accuracy of 99.0%, and varieties of wheat and wheat products with similar accuracy.
However, those reports do not provide any information about the effectiveness of the statistical models in identifying varieties in successive years of cultivation. Models developed from the data of a single year, using neural networks or multidimensional analysis, give very good results, and such models can be built. But would a model developed for one year prove useful for classification in subsequent years? Another question is whether a model based on varieties harvested in specific climate and weather conditions would be useful in other conditions. Therefore, the author developed a statistical model to identify varieties that would be effective in classifying varieties in successive years of the study. The model is based on measuring the texture of images of caryopses in bulk. To reduce the number of variables, several methods of feature space reduction were employed, together with several discriminant methods, in order to obtain the best classification model.
MATERIALS AND METHODS
Grain samples
The experimental material comprised treated grain of common spring and winter wheat of four quality classes: elite wheat (E), prime quality wheat (A), bread wheat (B) and forage wheat (C). Elite wheat (E) includes varieties of very good milling and baking quality that are also resistant to pre-harvest sprouting. Flour from such grain may be used to improve flour from bread wheat (B). Wheat of quality A has good milling quality and very good baking quality, and it is also resistant to sprouting. It can be used to improve mixes of poorer-quality wheat grain, although it has to be added in larger proportions than E quality wheat. Bread wheat grain (B) may be used for milling and baking, but it should usually be mixed with E or A quality wheat; it has an average baking value. Wheat varieties that do not qualify for any of the groups E, A or B are classified as group C, which comprises all the remaining varieties, including forage wheat. The grain samples were supplied by the Plant Cultivation Centre in Strzelce Sp. z o.o. near Kutno and were cultivated in central Poland.
The study covered three cultivation years (2005, 2006 and 2007). Eleven varieties (seven winter and four spring varieties; elite wheat: Torka*; prime quality wheat: Nawra*, Koksa*, Zyta, Sukces, Tonacja, Fregata; bread wheat: Cytra*, Soraja, Nutka; forage wheat: Symfonia; * spring varieties) were analysed each year at three moisture content levels: 12%, 14% and 16%. The initial moisture content was determined in two replications by the drying method according to Polish standard PN-71A-75101: the samples were ground and dried in a laboratory dryer at 100°C for four hours. Samples with a low initial moisture content were hydrated: water was added, and the grain was stirred for 24 hours, placed in tight plastic containers and stored for 48 h at room temperature to ensure even distribution of moisture in the sample. The moisture content was determined again after hydration.
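The text does not give the formulas behind the drying method or the amount of water added during hydration. The sketch below assumes the usual wet-basis definitions: moisture content is the mass lost on drying divided by the initial (wet) mass, and the water needed to raise a sample from one moisture level to another follows from a dry-matter mass balance. The function names are hypothetical.

```python
def moisture_wet_basis(wet_mass: float, dry_mass: float) -> float:
    """Moisture content in % (wet basis) from the sample mass before and after oven drying."""
    if wet_mass <= 0 or dry_mass < 0 or dry_mass > wet_mass:
        raise ValueError("expected wet_mass > 0 and 0 <= dry_mass <= wet_mass")
    return 100.0 * (wet_mass - dry_mass) / wet_mass


def water_to_add(sample_mass: float, m_initial: float, m_target: float) -> float:
    """Mass of water to add so a sample at m_initial % reaches m_target % (wet basis).

    Assumes dry matter is conserved:
    sample_mass * (100 - m_initial) == final_mass * (100 - m_target).
    """
    if not 0 <= m_initial <= m_target < 100:
        raise ValueError("hydration only: expected 0 <= m_initial <= m_target < 100")
    final_mass = sample_mass * (100.0 - m_initial) / (100.0 - m_target)
    return final_mass - sample_mass


# Example: a 1000 g sample at 12% moisture brought up to the 16% level used in the study.
print(round(water_to_add(1000.0, 12.0, 16.0), 1))  # 47.6
```

Because the added water also raises the total mass, the amount needed is slightly more than 4% of the original sample mass; this is why the calculation works from the conserved dry matter rather than from a simple percentage difference.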
Image analysis
The image acquisition workstation consisted of an EPSON PERFECTION 4490 PHOTO flatbed scanner connected to a workstation based on an Intel Pentium D 830 processor; the scanner was operated with SILVERFAST EPSON V 6.4.3 software. Before each series of images was acquired, the scanner was calibrated with an IT8.7/2 target supplied with the scanner software. This made it possible to control image quality, which is of key importance in image texture measurement. The images were analysed with a modified version of MaZda v 4.3 software(27). Compared with the original software, the modified version includes a module for automatic image segmentation, which can be used to define the type of ROI (region of interest), to set the channel to be analysed and to select the types of texture to be measured. The caryopses were placed at random on the measurement scene in a layer thick enough (approx. 20 mm) to prevent the scanner light from passing through it. Before the textures were measured, 16 ROIs were randomly superimposed on each image loaded into the software (Fig. 1). Each variety was thus described by textures from 384 ROIs (24 scans × 16 ROIs).
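The random ROI placement step can be illustrated with a short sketch. It assumes square ROIs of a hypothetical side length and uses rejection sampling to keep them inside the image and non-overlapping; the paper does not state the ROI size or whether overlaps were allowed, and MaZda's own placement routine is not reproduced here.

```python
import random
from typing import List, Tuple

Roi = Tuple[int, int, int]  # (x, y, side): top-left corner and side length in pixels


def overlaps(a: Roi, b: Roi) -> bool:
    """True if two axis-aligned square ROIs share at least one pixel."""
    ax, ay, s = a
    bx, by, t = b
    return ax < bx + t and bx < ax + s and ay < by + t and by < ay + s


def place_random_rois(width: int, height: int, side: int, n: int = 16,
                      seed: int = 0, max_tries: int = 10_000) -> List[Roi]:
    """Place n non-overlapping square ROIs uniformly at random inside a width x height image."""
    if side > width or side > height:
        raise ValueError("ROI larger than image")
    rng = random.Random(seed)
    rois: List[Roi] = []
    for _ in range(max_tries):
        if len(rois) == n:
            break
        # randint is inclusive, so x + side <= width and y + side <= height.
        cand = (rng.randint(0, width - side), rng.randint(0, height - side), side)
        if not any(overlaps(cand, r) for r in rois):
            rois.append(cand)
    if len(rois) < n:
        raise RuntimeError("could not place all ROIs; reduce n or side")
    return rois


rois = place_random_rois(2000, 2000, side=100, n=16, seed=1)
print(len(rois), 24 * len(rois))  # 16 ROIs per scan; 24 scans give 384 ROIs per variety
```

Rejection sampling is the simplest correct approach while the ROIs cover only a small fraction of the image; for dense packings a grid-based scheme would be more reliable.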
Statistical analysis
The statistical analysis involved selection of variables, followed by multidimensional analysis of the selected variables in order to test the possibility of variety classification. The aim of the selection was to reduce the set of 1,960 variables describing a single ROI to a set of the 49 best variables. This large number of variables resulted from computing 280 variables for each of the 7 colour channels analysed (Y, R, G, B, U, V and S; 280 × 7 = 1,960). The variables for further multidimensional analyses were selected using the data for the 2006 grain only, and the same variables were then used to classify the 2005, 2006 and 2007 grain. If satisfactory discrimination is achieved for three successive harvest years on the basis of the same set of textures, this will confirm the quality of the statistical model developed for discriminating wheat grain varieties.
Variables’ reduction
Seven methods of variable selection were used. The first group comprises genetic algorithms (HGA + Adaptive, HGA + Fixed) and the sequential forward floating search (SFFS); these methods are implemented in the HGA.sel software(19). The second group comprises methods based on Ranker + InfoGainAttributeEval, Ranker + ChiSquaredAttributeEval, RankSearch + CfsSubsetEval and RankSearch + ConsistencySubsetEval; for these, the WEKA v. 3.7 software was used(20).
Genetic algorithms
The application of genetic algorithms to feature selection has been discussed by Pudil et al.(21-23). This study applied genetic algorithms with two strategies for searching the feature space. In the first strategy (Adaptive), only the maximum dimension of the feature space was specified, and the optimal dimension was established automatically. The second strategy (Fixed) was based on specifying the exact dimension of the target space(24). For both strategies, the algorithm parameters were set as follows: ripple parameter 1, population size 10, mutation rate 0.10, generation limit 15, selection pressure 0.25, crossover points 3, number of clusters 11, reduced dimensionality 10.
The sequential forward floating search (SFFS)
Before the search was started, the maximum number of features n to be included in the target observation space had to be declared. On initialisation of the SFFS procedure, all the features are placed in the set Δ, whereas the set Ξ remains empty. As the algorithm runs, successive significant features are moved from Δ to Ξ; after each such inclusion, features are conditionally removed from Ξ as long as this yields a better subset of the lower dimension than any found before (the "floating" step). The result is a collection of subsets Ξ_t, 1 ≤ t ≤ n, each of which is the best set of dimension t(19).
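The SFFS procedure described above can be sketched as follows, using the inclusion and conditional-exclusion steps of Pudil et al. The criterion function is an assumption: any function that scores a feature subset (for example, a class-separability measure) can be passed in; the criterion actually used in HGA.sel is not reproduced here, and the toy scores in the example are invented.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Criterion = Callable[[FrozenSet[int]], float]


def sffs(features: Iterable[int], criterion: Criterion,
         n_max: int) -> Dict[int, Tuple[FrozenSet[int], float]]:
    """Sequential forward floating search; returns the best subset found for each size t."""
    delta: Set[int] = set(features)  # features not yet selected (the Δ set)
    xi: Set[int] = set()             # currently selected features (the Ξ set)
    best: Dict[int, Tuple[FrozenSet[int], float]] = {}

    while len(xi) < n_max and delta:
        # Inclusion: add the feature whose addition gives the highest criterion value.
        f_add = max(delta, key=lambda f: criterion(frozenset(xi | {f})))
        xi.add(f_add)
        delta.remove(f_add)
        score = criterion(frozenset(xi))
        if len(xi) not in best or score > best[len(xi)][1]:
            best[len(xi)] = (frozenset(xi), score)

        # Conditional exclusion ("floating" step): drop the least useful feature
        # as long as the smaller subset beats the best one recorded for its size.
        while len(xi) > 2:
            f_rm = max(xi, key=lambda f: criterion(frozenset(xi - {f})))
            reduced = frozenset(xi - {f_rm})
            r_score = criterion(reduced)
            if r_score > best[len(reduced)][1]:
                xi.remove(f_rm)
                delta.add(f_rm)
                best[len(xi)] = (reduced, r_score)
            else:
                break
    return best


# Toy criterion in which the best pair {1, 2} does not contain the best single
# feature 0, so plain forward selection would stop at {0, 1} or {0, 2}.
J = {frozenset({0}): 10, frozenset({1}): 6, frozenset({2}): 6,
     frozenset({0, 1}): 11, frozenset({0, 2}): 11, frozenset({1, 2}): 20,
     frozenset({0, 1, 2}): 21}
result = sffs([0, 1, 2], lambda s: J[s], n_max=3)
print(sorted(result[2][0]), result[2][1])  # [1, 2] 20
```

The exclusion step only accepts a removal when it strictly improves the best subset recorded for the smaller size, which both implements the floating behaviour and guarantees that the search terminates.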
Ranker, RankSearch
A detailed description of the selection methods applied and their assumptions has been presented by Witten(25). The study employed two selection methods: Ranker and RankSearch. In the first method, the attributes were evaluated by InfoGainAttributeEval, which measures their information gain with respect to the class. It first discretizes numeric attributes using the MDL-based discretization method (it can be set to binarize them instead). This evaluator, like the other attribute evaluators described below, can treat missing values as a separate value or distribute their counts among the other values in proportion to their frequency(25). Another method was based on the chi-squared statistic: ChiSquaredAttributeEval evaluates attributes by computing the chi-squared statistic with respect to the class. GainRatioAttributeEval evaluates attributes by measuring their gain ratio with respect to the class(25). In the RankSearch reduction method, the quality of features is evaluated by CfsSubsetEval and ConsistencySubsetEval. Application of the methods discussed above provided a set of variables with potentially the greatest discriminating power. Table 1 shows the 4 best variables for each selection method and each colour channel.
Finally, the first-ranked variables were chosen for the multidimensional analysis, with the reservation that if the same texture was ranked first by several selection methods, textures from the next places were selected. This provided a set of 49 variables (7 colour channels × 7 selection methods). It was assumed that such a procedure would ensure the best possible set of information for further analyses.
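The two ranking criteria and the rule used to assemble the final set of variables can be sketched as below. Assumptions: the attributes are already discretised (WEKA's InfoGainAttributeEval first applies MDL-based discretisation, which is omitted here), and the tie rule is read as "if a texture has already been taken by an earlier method, take the next texture on the later method's list", applied within one colour channel. The example rankings are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(attr: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(class) - H(class | attribute) for an already discretised attribute."""
    n = len(labels)
    cond = 0.0
    for value, count in Counter(attr).items():
        subset = [lab for a, lab in zip(attr, labels) if a == value]
        cond += count / n * entropy(subset)
    return entropy(labels) - cond


def chi_squared(attr: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic of the attribute-by-class contingency table."""
    n = len(labels)
    joint = Counter(zip(attr, labels))
    a_tot, c_tot = Counter(attr), Counter(labels)
    stat = 0.0
    for a in a_tot:
        for c in c_tot:
            expected = a_tot[a] * c_tot[c] / n
            stat += (joint[(a, c)] - expected) ** 2 / expected
    return stat


def pick_first_unique(rankings: Dict[str, List[str]]) -> Dict[str, str]:
    """For each method, take its top-ranked texture not already taken by an earlier method."""
    chosen: Dict[str, str] = {}
    used = set()
    for method, ranked in rankings.items():
        for texture in ranked:
            if texture not in used:
                chosen[method] = texture
                used.add(texture)
                break
    return chosen


print(info_gain(["lo", "lo", "hi", "hi"], ["Torka", "Torka", "Zyta", "Zyta"]))  # 1.0
ranks = {"Ranker + InfoGainAttributeEval": ["WavEnLL_s-8", "WavEnLL_s-7"],
         "Ranker + ChiSquaredAttributeEval": ["WavEnLL_s-8", "WavEnLL_s-1"]}
print(pick_first_unique(ranks))
```

Both criteria score each attribute independently of the others, which is why they are combined with a ranking search; the subset evaluators (CfsSubsetEval, ConsistencySubsetEval) instead score whole groups of attributes.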
Multidimensional analysis
Variety classification was performed with seven classification methods belonging to the following groups: Bayes, Lazy, Meta, decision trees and discriminant analysis. Discriminant analysis (backward (regressive) and forward (progressive) stepwise, and best subset) was performed with Statistica v 9.0 software; the other analyses were performed with WEKA v 3.7. In developing the statistical model, the data sets were divided into subsets by three methods: cross-validation (k = 10), percentage split (30% of the input set) and training (the test set was the training set itself). In discriminant analysis, the data set was divided by cross-validation (k = 10). At this stage, the aim was to find the method ensuring the smallest classification error for the 11 wheat varieties in successive years of cultivation.
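A sketch of 10-fold cross-validation with a 1-nearest-neighbour classifier, the decision rule behind WEKA's IB1 (Lazy_IB1 in the Results). Simplifications, all assumptions of this sketch: plain Euclidean distance on raw values (IB1 additionally normalises attribute ranges), folds built by random shuffling without stratification (WEKA's cross-validation stratifies nominal classes), and toy data in place of the study's textures.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1, split into k folds and return (train, test) index lists per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in range(k) if f != i for j in folds[f]], folds[i]) for i in range(k)]


def nn1_predict(train_x: List[Point], train_y: List[str], x: Point) -> str:
    """Label of the nearest training case (Euclidean distance), as in a 1-NN rule."""
    nearest = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))
    return train_y[nearest]


def cv_accuracy(xs: List[Point], ys: List[str], k: int = 10, seed: int = 0) -> float:
    """Fraction of cases classified correctly when each fold is predicted from the other k-1."""
    correct = 0
    for train, test in kfold_indices(len(xs), k, seed):
        tx = [xs[j] for j in train]
        ty = [ys[j] for j in train]
        correct += sum(nn1_predict(tx, ty, xs[j]) == ys[j] for j in test)
    return correct / len(xs)


# Two well-separated toy "varieties" described by two texture features each.
xs = [(0.1 * i, 0.0) for i in range(10)] + [(10.0 + 0.1 * i, 10.0) for i in range(10)]
ys = ["A"] * 10 + ["B"] * 10
print(cv_accuracy(xs, ys))  # 1.0
```

Every case is tested exactly once, by a model that never saw it during training, which is what distinguishes cross-validation from the training-set variant.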
RESULTS AND DISCUSSION
Reduction of variables
Table 1 presents the results of variable selection for all the channels analysed. For the genetic algorithms, the largest group of selected textures was calculated by the co-occurrence matrix method. The textures selected by Ranker and RankSearch were usually calculated by the Haar wavelet transform method, except for those selected by RankSearch for the U and V channels, for which the best discriminants were values calculated from the histogram distribution. Fig. 2 shows categorised plots of the distribution of cases for selected variables (texture parameters). The discriminating power of the selected raw variables, even without multidimensional analysis, was so great that introducing them into the model could be expected to give an effective final classification. An example is the distribution of cases in Fig. 2c, where the grouping variables were textures calculated from channel S. The cases of each variety were grouped in a separate region around their own centre. The Cytra, Koksa and Soraja varieties formed sets of points distant from the others, while the remaining varieties were clustered in separate regions whose centres lay close to one another.
Multidimensional analysis
The results of the multidimensional analysis are presented in Table 2. Depending on the method, correct classification ranged from 86 to 100%, regardless of the year of cultivation. When training and testing were performed on the same set, correct classification was always 100%. This was partly due to the very high discriminating power of the selected variables, but mainly to the fact that classifying the same cases that were used for training gives much better results than classifying new cases. Therefore, further analyses divided the data into training and validation sets. In that case, correct classification was slightly lower but remained high. The Lazy_IB1 method gave 100% correct classification for 2006, compared with 95% for 2005 and 2007. In general, the 2006 varieties were classified better than those from the other years, because the textures used for classification in all the years were selected on the basis of the 2006 data. Decision trees (Trees_J48) proved to be the worst method: 100% correct classification was not achieved even for the training variant, and the classification level was 10% lower than for the other methods.
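Why testing on the training set inflates accuracy can be seen with a 1-nearest-neighbour rule on deliberately uninformative toy data: every case is its own nearest neighbour, so resubstitution accuracy is 100%, whereas leave-one-out accuracy (each case classified from all the others) collapses. The data and function names are illustrative assumptions, not taken from the study.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def nearest_label(xs: List[Point], ys: List[str], x: Point,
                  exclude: Optional[int] = None) -> str:
    """1-NN label for x, optionally ignoring training case number `exclude`."""
    candidates = [j for j in range(len(xs)) if j != exclude]
    best = min(candidates, key=lambda k: math.dist(xs[k], x))
    return ys[best]


def resubstitution_accuracy(xs: List[Point], ys: List[str]) -> float:
    """Accuracy when the classifier is tested on the very cases it was trained on."""
    return sum(nearest_label(xs, ys, x) == y for x, y in zip(xs, ys)) / len(xs)


def leave_one_out_accuracy(xs: List[Point], ys: List[str]) -> float:
    """Accuracy when each case is classified from all the other cases."""
    hits = sum(nearest_label(xs, ys, xs[i], exclude=i) == ys[i] for i in range(len(xs)))
    return hits / len(xs)


# Points on a line with alternating labels: each point's neighbours carry the other label.
xs = [(float(i),) for i in range(20)]
ys = ["A" if i % 2 == 0 else "B" for i in range(20)]
print(resubstitution_accuracy(xs, ys), leave_one_out_accuracy(xs, ys))  # 1.0 0.0
```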
Multidimensional analysis: backward (regressive) and forward (progressive) stepwise and best subset analysis
Table 2 shows the results of the discrimination of 11 wheat varieties in different years of cultivation. Discrimination of wheat with these methods was highly effective. The classification error was 8% in the worst case, when the best subset method was used (2007). The cumulative error was only 1-2% when the forward or backward stepwise method was used. Errors occurred only for the Sukces and Nutka varieties, in which individual cases were misclassified; classification of the other varieties was 100% correct. Stepwise analysis makes it possible to evaluate the effect of individual variables on discriminating ability during classification, so that a decision can be made to continue or stop introducing variables into the model. Wilks' lambda is the statistic which shows the discriminating strength of the variables: the lower its value, with a high value of the F statistic, the higher the discriminating power of the variables entered into the model. Fig. 4 shows the distribution of cases for the 3 best discriminating variables. The grouping of cases based on them was satisfactory, and it was possible to distinguish centres for individual varieties. When only 9 variables had been entered into the model, Wilks' lambda was 0.000001 at F = 670, which ensured 100% correct classification with such a small number of variables (Fig. 3).
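Wilks' lambda for a set of variables is the ratio of the determinant of the pooled within-group sum-of-squares-and-cross-products (SSCP) matrix W to that of the total SSCP matrix T, Λ = det(W) / det(T); values near 0 indicate well-separated groups and values near 1 indicate no separation. The minimal pure-Python sketch below computes only Λ; Statistica's stepwise procedure and its F-to-enter and F-to-remove tests are not reproduced, and the example data are invented.

```python
from typing import Hashable, List, Sequence

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def sscp(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sum-of-squares-and-cross-products matrix of rows about their own mean."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [r[i] - mean[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def wilks_lambda(x: Sequence[Sequence[float]], y: Sequence[Hashable]) -> float:
    """Wilks' lambda = det(W) / det(T) for cases x with group labels y."""
    p = len(x[0])
    within = [[0.0] * p for _ in range(p)]
    for g in set(y):
        w = sscp([row for row, lab in zip(x, y) if lab == g])
        for i in range(p):
            for j in range(p):
                within[i][j] += w[i][j]
    return det(within) / det(sscp(x))


# One variable, two groups: within-group SS = 4, total SS = 58.
x_demo = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y_demo = ["A"] * 3 + ["B"] * 3
print(round(wilks_lambda(x_demo, y_demo), 4))  # 0.069
```

In a stepwise analysis, the variable entered at each step is the one that lowers Λ the most (equivalently, has the largest partial F), which is why a very small Λ with a large F, as reported above, signals strong discrimination.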
CONCLUSIONS
The proposed method of discriminating wheat grain varieties achieved 100% effectiveness. Using the textures selected for 2006, it was possible to discriminate the 2005 and 2007 varieties. One advantage of the method is its speed: it takes about a minute to place the grain on the scanner and to perform the image analysis together with the statistical analysis. These conclusions hold with the reservation that the discriminant analysis is performed on variables for a specific year and applies only to that year. Fig. 4a shows the mean values of the selected texture B_WavEnLL_s-8 for the analysed varieties in the 3 years of cultivation. The mean values of this texture differed between years, and in most cases the differences exceeded 100%. The difference in the parameter value across the three years and the 11 analysed varieties was 4.5 units, which, given the small standard deviation, will make discrimination more difficult. Further studies aimed at improving the proposed method will have to produce a model based on standardised values of the textures used to classify varieties in different years of cultivation.
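The conclusion calls for a model based on standardised texture values. One possible reading, assumed here and not taken from the paper, is to z-score each texture within each harvest year before classification, so that year-to-year shifts in the mean level are removed while the differences between varieties within a year are kept. The variety codes and values in the example are invented.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, str, float]  # (harvest year, variety, texture value)


def standardise_within_year(records: List[Record]) -> List[Record]:
    """Replace each texture value by its z-score relative to all samples from the same year."""
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, _, value in records:
        by_year[year].append(value)
    stats = {y: (statistics.mean(v), statistics.pstdev(v)) for y, v in by_year.items()}
    out: List[Record] = []
    for year, variety, value in records:
        mean, sd = stats[year]
        out.append((year, variety, (value - mean) / sd if sd > 0 else 0.0))
    return out


# Hypothetical values: 2007 is shifted up by 10 units relative to 2006,
# but the two varieties keep the same relative position.
data = [(2006, "V1", 1.0), (2006, "V2", 3.0),
        (2007, "V1", 11.0), (2007, "V2", 13.0)]
for row in standardise_within_year(data):
    print(row)
```

After standardisation both years map to the same values, so a classifier trained on one year would see the varieties in the same positions in the other; whether this holds for real textures would have to be tested.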
ACKNOWLEDGMENTS
The author is grateful for the financial support provided by the Ministry of Scientific Research within the framework of grant no. 1089/P06/2005/29.
REFERENCES
Seyed, M.A.; Razavia, A.; Rahbaria, R. Computer Image Analysis and Physico-Mechanical Properties of Wild Sage Seed (Salvia macrosiphon). International Journal of Food Properties 2010, 13 (2), 308-316.
Yong, H.; Xiaoli, L.; Yongni, S. Fast Discrimination of Apple Varieties Using Vis/NIR Spectroscopy. International Journal of Food Properties 2007, 10 (1), 9-18.
Mahmoodia, M.; Khazaeia, J.; Narjes. Modeling of Geometric Size Distribution of Almond. International Journal of Food Properties 2010, DOI: 10.1080/10942910903501872.
Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: I. Morphology models. American Society of Agricultural Engineering 2000, 43, 1669-1675.
Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: III. Texture models. American Society of Agricultural Engineering 2000, 43, 1681-1687.
Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: II. Color models. American Society of Agricultural Engineering 2000, 43, 1677-1680.
Majumdar, S.; Jayas, D.S. Classification of cereal grains using machine vision: IV. Combined morphology, color, and texture models. American Society of Agricultural Engineering 2000, 43, 1689-1694.
Jayas, D.S.; Paliwal, J.; Visen, N.S. Multi-layer neural networks for image analysis of agricultural products. Journal of Agricultural Engineering Research 2000, 77, 119-128.
Visen, N.S.; Paliwal, J.; Jayas, D.S.; White, N.D.G. Specialist neural networks for cereal grain classification. Biosystems Engineering 2001, 82, 151-159.
Visen, N.S.; Shashidhar, N.S.; Paliwal, J.; Jayas, D.S. Identification and segmentation of occluding groups of grain kernels in a grain sample image. Journal of Agricultural Engineering Research 2002, 79, 159-166.
Paliwal, J.; Visen, N.S.; Jayas, D.S. Evaluation of neural network architectures for cereal classification using morphological features. Journal of Agricultural Engineering Research 2001, 79, 361-370.
Paliwal, J.; Visen, N.S.; Jayas, D.S.; White, N.D.G. Comparison of a neural network and non-parametric classifier for grain kernel identification. Biosystems Engineering 2003, 85, 405-413.
Paliwal, J.; Visen, N.S.; Jayas, D.S.; White, N.D.G. Cereal grain and dockage identification using machine vision. Biosystems Engineering 2003, 85, 51-57.
Utku, H. Application of the feature selection method to discriminate digitized wheat varieties. Journal of Food Engineering 2000, 46, 211-216.
Venora, G.; Grillo, O.; Ravalli, C.; Cremonini, R. Identification of Italian landraces of bean (Phaseolus vulgaris L.) using an image analysis system. Scientia Horticulturae 2009, 121, 410-418.
Venora, G.; Grillo, O.; Saccone, R. Quality assessment of durum wheat storage centres in Sicily: Evaluation of vitreous, starchy and shrunken kernels using an image analysis system. Journal of Cereal Science 2009, 49, 429-440.
Venora, G.; Grillo, O.; Saccone, R.; Ravalli, C. Speck evaluation on commercial spaghetti using an imaging system. In: From Seed to Pasta: The Durum Wheat Chain. International Durum Wheat Symposium, Bologna, June 30 - July 3; Edizioni Avenue Media: Milano, 2008.
Venora, G.; Grillo, O.; Shahin, M.A.; Symons, S.J. Identification of Sicilian landraces and Canadian cultivars of lentil using image analysis system. Food Research International 2007, 40, 161-166.
Klepaczko, A. Zastosowanie algorytmów analizy skupień do selekcji cech dla zadań klasyfikacji wektorów danych [Application of cluster analysis algorithms to feature selection for data vector classification tasks]. PhD thesis, Politechnika Łódzka, Instytut Elektroniki (Technical University of Łódź, Institute of Electronics), 2006.
Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update. SIGKDD Explorations 2009, 11 (1).
Pudil, P.; Novovičová, J.; Kittler, J. Floating Search Methods in Feature Selection. Pattern Recognition Letters 1994, 15, 1119-1125.
Pudil, P.; Novovičová, J. Novel Methods for Subset Selection with Respect to Problem Knowledge. Intelligent Systems 1998, 13, 66-74.
Pudil, P.; Somol, P. Current Feature Selection Techniques in Statistical Pattern. Advances in Intelligent and Soft Computing 2005, 30, 53-68.
Oh, I.S.; Lee, J.S.; Moon, B.R. Hybrid Genetic Algorithms for Feature Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004, 26, 1424-1437.
Witten, I.H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed.; Elsevier, 2005.
Szczypiński, P.M.; Strzelecki, M.; Materka, A.; Klepaczko, A. MaZda: a software package for image texture analysis. Computer Methods and Programs in Biomedicine 2009, 94, 66-76.
Table 1
299
Listing of the best variables from different colour channels and selection methods
Selection unsupervised
Selection supervised
Downloaded by [Uniwersytet Warminsko Mazurski], [Piotr Zapotoczny] at 16:35 20 March 2013
Chan el
Ranker +
Ranker +
RankSearch +
RankSearch +
InfoGainAttri
ChiSquaredAttri
CfsSubsetEva
ConsistencySub
buteEva
buteEval
l
setEval
HGA + HGA
SFFS Fixed
S(0,1)SumA
S(0,2)SumE
verg
ntrp
S(0,1)SumE Y
ntrp S(0,2)SumA verg S(2,0)SumA verg
R
GrMean
S(0,5)SumO fSq S(1,0)SumA
WavEnLL_s_ 8
WavEnLL_s_8
WavEnLL_s_ Perc.99%
Perc.90
7
WavEnLL_s _7 WavEnLL_s_1
WavEnLL_s_
WavEnLL_s_8
8
WavEnLL_s_7
WavEnLL_s_ 7
WavEnLL_s_
WavEnLL_s_
1
1
S(1,0)SumO
WavEnLL_s_
WavEnLL_s_
fSqs
2
3
verg
GrMean
GrNonZeros
S(5,-5)
GrMean
SumAverg
GrKurtosis
135dr_LngR
S(0,1)Entrop y
WavEnLL_s_1 WavEnLL_s_3
WavEnLL_s
WavEnLL_s_
Vertl_LngREmp WavEnLL_s_
WavEnLL_s_8
-8
8
h Perc.90
8
Vertl_Fraction
WavEnHH_
Vertl_LngRE
Vertl_Fractio
WavEnLL_s_1
s-2
mph Perc.90
n
WavEnLL_s_7
WavEnLL_s_8
ACCEPTED MANUSCRIPT 17
ACCEPTED MANUSCRIPT S(2,-2) InvDfMom S(2,0)DifEnt
Emph Vertl_Fracti on
Downloaded by [Uniwersytet Warminsko Mazurski], [Piotr Zapotoczny] at 16:35 20 March 2013
rp
WavEnHH_
Vertl_Fractio
s-1
n
WavEnLL_s_ 1 WavEnLL_s_
Vertl_LngR
7
Emph
S(5,-
WavEnHH_
WavEnLL_s
WavEnLL_s_
WavEnLL_s_1
WavEnLL_s_
WavEnLL_s_8
5)SumOfSqs
s-1
-8
8
WavEnLL_s_8
8
S_0_5_SumAve
WavEnHH_
S_1_0_SumA
S_4_0_SumAve
S(0,4)SumA
Sigma
verg
s-1 Teta3
G S(3,3)SumOfSqs
Teta2 verg
GrNonZeros
Teta2
S(3,3)Contra st
B
Vertl_Fraction
rg
S_0_4_SumA
S_5_0_SumAve
verg
rg
S_0_5_SumA verg
rg S_3_0_SumAve rg
S_3_0_SumA
S_2_0_SumAve
S_2_0_SumA
verg
rg
verg
S_2_0_SumA verg
WavEnHH_s Teta2
WavEnLL_s
WavEnLL_s_
WavEnLL_s_8
WavEnLL_s_
WavEnLL_s_8
-3
-8
8
S_4_0_SumAve
8
WavEnLL_s_1
WavEnHH_
S_4_0_SumA
rg
WavEnLL_s_
S_5_0_SumAve
s-1
verg
S_1_0_SumAve
1
rg
GrNonZeros
S_3_0_SumA
rg
S_5_0_SumA
S_3_0_SumAve
verg
S_3_0_SumAve
verg
rg
S(5,5)AngScMo m S(0,5)SumV
S(4,4)Correlat S(3,3)DifVarnc
GrVariance
S_5_0_SumA
S_3_0_SumA
arnc
S(3,-
S(0,5)InvDf
3)Correlat
verg
rg
verg
Teta3
Teta3
Perc.50
Perc.50
Perc.10
Perc.10
Perc.90
Perc.90
Perc.90
Perc.90
Perc.50
Perc.50
Perc.10
Perc.10
Teta3
Teta3
Mom
WavEnHH_s WavEnHH_
WavEnLL_s
WavEnLL_s_
WavEnLL_s_8
-1
-8
8
WavEnLL_s_7
WavEnHH_
WavEnLL_s_
WavEnLL_s_6
s-2
7
WavEnHH_
WavEnLL_s_
s-1 Teta2
6
Vertl_ShrtR
s-1 Sigma
Emp U S(0,3)DifEntrp
Teta3 Teta2
WavEnLL_s_5
S(0,2)SumA
WavEnLL_s_
verg
5
S(0,4)SumO
45dgr_GLev
WavEnLL_s
WavEnLL_s_
WavEnLL_s_8
fSqs
NonU
-8
8
WavEnLL_s_7
WavEnHH_
WavEnLL_s_
s-1
7
S(3,0)DifEntrp
S(5,5)DifVarnc
V S(1,1)SumVarnc
135dr_LngREmph S(5,5)DifVarnc Vertl_Fraction S(5,0)DifVarnc
WavEnLL_s_6 WavEnLL_s_5
WavEnLL_s_6 WavEnLL_s_5
S(0,5)Entropy
WavEnHH_
WavEnLL_s
WavEnLL_s_
WavEnLL_s_8
s1
-8
8
WavEnLL_s_7
S(0,5)AngSc
Teta3
WavEnHH_
WavEnLL_s_
WavEnLL_s_1
s-1 Teta2
7
Mom
Teta2
S
135dr_Fracti
S(3,-
135dr_LngR
3)SumEntrp S
(3,-
Emph
on
WavEnLL_s_2
WavEnLL_s_
Teta3
Teta3
WavEnLL_s_
WavEnLL_s_8
8
Perc.50
Perc.50
WavEnLL_s_7
WavEnLL_s_7
6 WavEnLL_s_ 2
3)SumVarnc
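Table 1 names Weka attribute evaluators such as InfoGainAttributeEval combined with the Ranker search. As a rough illustration of what an information-gain ranking computes, the sketch below scores features that are assumed to be already discretised into bins and sorts them by IG = H(class) − H(class | feature). Weka handles the discretisation of numeric texture features itself; the binning, the feature names (`feat_perfect`, `feat_useless`) and the toy labels here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(feature_bins, labels):
    """IG = H(class) - sum over bins v of P(v) * H(class | feature = v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_bins, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Return (name, IG) pairs sorted from most to least informative."""
    scores = {name: info_gain(bins, labels) for name, bins in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    labels = ["A", "A", "B", "B"]
    features = {
        "feat_perfect": [0, 0, 1, 1],  # bins separate the two classes completely
        "feat_useless": [0, 1, 0, 1],  # bins are independent of the class
    }
    for name, ig in rank_features(features, labels):
        print(f"{name}: {ig:.3f}")  # feat_perfect: 1.000, then feat_useless: 0.000
```

A ranker only orders attributes; the number of top-ranked variables kept is a separate choice.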
Table 2
Results of multidimensional analysis for 11 wheat varieties cultivated in 2005-2007 (grain moisture content 14%).
Total discrimination [%]
Training
CY
NA TOR KOK
TR
FRE
FONI
ACJ
GAT
A
A
A
NUT SOR SUK WR
KA* SA* A*
SYM TON KA
ZY
AJA CES
A*
TA
100
100
100
100
100
100
100
100
100
100
100
100
94
95
96
88
95
97
98
79
91
100
98
91
92
93
94
90
89
96
98
69
98
100
97
87
93
100
98
97
100
73
100
84
100
88
93
87
93
100
98
97
100
72
100
84
100
87
92
88
93
100
99
98
100
69
100
84
100
86
93
86
Cross-validation
2005
Percentage split
Naive Bayes
Method of
Training
Cross-validation
2006
Percentage split
Training
100
100
100
100
100
100
100
100
100
100
100
100
93
93
95
88
96
97
97
78
91
100
99
91
92
93
94
91
89
97
98
68
94
100
97
88
100
100
100
100
100
100
100
100
100
100
100
100
94
95
96
88
95
97
98
79
91
99
98
91
92
93
94
90
89
96
98
69
98
100
97
87
94
100
92
99
100
83
100
82
100
88
91
89
93
100
99
98
100
80
100
80
99
87
91
89
93
100
98
99
100
82
100
84
100
82
92
88
Cross-validation
2007
Percentage split
Training
Cross-validation
2005
Percentage split
BayesNet
Training
Cross-validation
2006
Percentage split
100
100
100
100
100
100
100
100
100
100
100
100
94
95
96
88
95
97
98
79
91
100
98
91
88
94
88
95
99
85
73
85
93
84
99
78
100
100
100
100
100
100
100
100
100
100
100
100
95
94
96
89
97
98
100
84
97
98
99
89
94
96
93
87
93
98
98
77
100
99
97
90
100
100
100
100
100
100
100
100
100
100
100
100
99
100
100
100
100
99
100
100
100
99
99
99
Cross-validation
2007
Percentage split
Training
Cross-validation
Lazy.IB1
Training
2005
Percentage split
Training
2006
Cross-validation
Percentage
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
95
94
96
89
97
98
100
84
97
98
99
89
94
96
93
87
93
98
98
77
100
99
97
90
90
84
92
75
96
97
97
76
98
95
98
78
86
84
87
75
96
92
97
71
83
91
97
76
86
88
89
76
92
91
97
68
73
90
94
84
Training
100
100
100
100
100
100
100
100
100
100
100
100
Cross-
98
100
100
100
100
95
100
94
99
94
97
97
split
Cross-validation
2007
Percentage split
Training
Cross-validation
Meta Attribute Ser
Training
2005
Percentage split
2006
validation
Percentage
97
100
100
100
100
94
100
90
100
92
94
95
99
100
99
100
100
100
98
100
99
98
100
98
94
99
89
97
96
96
91
95
98
83
99
84
93
98
88
99
98
98
88
91
98
83
100
76
90
84
92
75
96
97
97
76
98
95
98
78
86
84
87
75
96
92
97
71
83
91
97
76
86
88
89
76
92
91
97
68
74
90
94
84
99
100
100
100
100
99
100
99
99
99
99
99
Training
Cross-validation
2007
Percentage split
Training
Cross-validation
Trees.J48
split
2005
Percentage split
Training
2006
Cross
98
100
100
100
100
95
100
94
99
93
97
96
97
100
100
100
100
92
100
90
100
93
96
95
90
84
92
75
96
97
97
76
98
95
98
78
86
84
87
75
96
92
97
71
83
91
97
76
86
88
89
76
92
91
97
68
74
90
94
84
99
100
100
100
100
100
100
99
100
100
100
100
99
100
100
100
100
100
100
99
100
100
100
100
97
99
99
95
97
98
100
91
100
100
99
90
validation
split
Training
Cross-validation
2007
Percentage split
Discriminant analysis
Stepwise progressive
2005
Percentage
Stepwise regressive
The best subset
Stepwise
98
100
100
100
100
98
100
100
100
100
100
100
98
100
100
100
100
98
100
100
100
100
100
100
98
100
100
100
100
98
100
100
100
100
100
100
99
100
100
100
100
100
100
100
100
99
100
100
99
100
100
100
100
100
100
100
100
99
100
100
92
98
100
98
100
98
80
94
99
80
78
84
progressive
2006
Stepwise regressive
The best subset
Stepwise progressive
2007
Stepwise regressive
The best subset
* Spring cultivars.
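Table 2 reports, for each variety, the percentage of correctly classified cases and a total discrimination percentage under three evaluation schemes (training set, cross-validation and percentage split). As an illustration of how such percentages are usually derived, the sketch below computes per-class and overall rates from a confusion matrix. The matrices are hypothetical and are not data from this study. When every variety contributes the same number of cases, the overall rate equals the plain mean of the per-variety rates; otherwise it is a mean weighted by class size.

```python
def classification_rates(confusion):
    """Per-class and overall percentages of correctly classified cases.

    confusion[i][j] is the number of cases of true class i that the
    classifier assigned to class j (rows = true varieties).
    """
    per_class = []
    correct = total = 0
    for i, row in enumerate(confusion):
        n_i = sum(row)
        per_class.append(100.0 * row[i] / n_i)  # diagonal cell / row total
        correct += row[i]
        total += n_i
    return per_class, 100.0 * correct / total


if __name__ == "__main__":
    # Hypothetical 3-class example, 50 cases per class (balanced).
    balanced = [[48, 2, 0], [5, 45, 0], [0, 1, 49]]
    per_class, overall = classification_rates(balanced)
    print(per_class, round(overall, 2))  # [96.0, 90.0, 98.0] 94.67
    # Unbalanced classes: the overall rate is no longer the plain mean.
    unbalanced = [[10, 0], [10, 80]]
    print(classification_rates(unbalanced))
```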
Fig. 1. An image of the kernel arrangement with the ROI, shown in channels R, G, B, S, U and V.
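Fig. 1 shows the kernels in the R, G and B channels and in the derived S, U and V channels (Table 1 also lists Y). The exact conversion used by the analysis software is not given in this section. The sketch below assumes ITU-R BT.601 luma weights, the analogue YUV scaling U = 0.492(B − Y) and V = 0.877(R − Y), and HSV-style saturation for S. Other tools may scale or offset these channels differently.

```python
def rgb_to_channels(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to Y, U, V and S.

    Assumptions (not taken from the paper): BT.601 luma weights,
    analogue YUV scaling for U and V, and HSV saturation for S.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    mx, mn = max(r, g, b), min(r, g, b)
    s = 0.0 if mx == 0 else (mx - mn) / mx  # HSV saturation
    return {"Y": y, "U": u, "V": v, "S": s}


if __name__ == "__main__":
    print(rgb_to_channels(1.0, 0.0, 0.0))  # a pure red pixel
    print(rgb_to_channels(0.5, 0.5, 0.5))  # a grey pixel: U, V and S near 0
```

Applying the function pixel by pixel to the ROI gives the channel images from which texture features such as those in Table 1 would be computed.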
Fig. 2. Categorised plots of the distribution of cases representing 11 wheat varieties (2006, grain moisture content 14%). Diagrams A and B show the distribution of cases against variables from the RGB channels selected by the Ranker and Ranker + Fixed methods; diagrams C and D show the distribution of cases against variables from channels S and Y selected by the HGA + Fixed and SFFS + Fixed methods.
Fig. 3. Classification of 11 wheat varieties (A, C, E: stepwise progressive method; B, D, F: best subset method).
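Fig. 3 and Table 2 compare stepwise progressive (forward) discriminant analysis with best-subset selection. Forward selection is greedy. It starts with no variables, and at each step it adds the variable that most improves a criterion, stopping when no remaining variable improves it enough. Statistical packages commonly use Wilks' lambda with an F-to-enter threshold as that criterion. The sketch below keeps the criterion abstract as a caller-supplied `score` function, so it shows the search strategy only and not the software settings used in this study. The toy additive score in the example is hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def forward_stepwise(candidates: Iterable[str],
                     score: Callable[[Sequence[str]], float],
                     min_gain: float = 0.0,
                     max_vars: Optional[int] = None) -> List[str]:
    """Greedy forward (progressive) selection of variables.

    At each step, the candidate whose addition gives the highest score is
    added, provided it raises the score by more than `min_gain`.
    """
    remaining = list(candidates)
    selected: List[str] = []
    current = score(selected)
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Evaluate every remaining variable added to the current subset.
        best = max(remaining, key=lambda v: score(selected + [v]))
        if score(selected + [best]) - current <= min_gain:
            break  # no remaining variable improves the criterion enough
        selected.append(best)
        remaining.remove(best)
        current = score(selected)
    return selected


if __name__ == "__main__":
    weights = {"a": 0.5, "b": 0.3, "c": 0.0}  # hypothetical per-variable gains
    toy_score = lambda subset: sum(weights[v] for v in subset)
    print(forward_stepwise(["c", "a", "b"], toy_score))  # ['a', 'b']
```

A best-subset search instead evaluates all subsets of a given size, which is exhaustive but far more expensive than this greedy path.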
Fig. 4. Distribution of cases based on the three best variables in progressive discriminant analysis.
Fig. 4a. Mean values and standard deviations of the texture feature B_WavEnLL_s-8 for 2005, 2006 and 2007 (panels: year 2005, year 2006, year 2007).