1 A low-cost automated system for high-throughput

0 downloads 0 Views 3MB Size Report
Oct 16, 2018 - NY 14853 USA; P. R. Armstrong, United States Department of ..... the Logitech driver software and images of a standard grey card (model 24 ColorCard, ..... A colorimeter (model CR-400, Konica Minolta, Tokyo, Japan) was ...
Page 1 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

1

A low-cost automated system for high-throughput phenotyping of single oat seeds

2

James W. Clohessy*†, Duke Pauli‡, Kevin M. Kreher, Edward S. Buckler V, Paul R. Armstrong,

3

Tingting Wu, Owen A. Hoekenga, Jean-Luc Jannink, Mark E. Sorrells, and Michael A. Gore*

4 5

J. W. Clohessy, D. Pauli, K. M. Kreher, E. S. Buckler V, M. E. Sorrells, and M. A. Gore. Plant

6

Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca,

7

NY 14853 USA; P. R. Armstrong, United States Department of Agriculture - Agriculture

8

Research Service (USDA-ARS), Center for Grain and Animal Health Research, Manhattan, KS

9

66502 USA; T. Wu, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi

10

712100 China; O. A. Hoekenga, Cayuga Genetics Consulting Group LLC, Ithaca, NY 14850

11

USA; J.-L. Jannink, USDA-ARS, Robert W. Holley Center for Agriculture and Health, Plant

12

Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca,

13

NY 14853 USA.

14 15

J. W. Clohessy and D. Pauli contributed equally to this work.

16 17

† Present address: North Florida Research and Education Center, University of Florida, 155

18

Research Road, Quincy, FL 32351 USA.

19

‡ Present address: School of Plant Sciences, University of Arizona, 451A Marley Hall, Tucson,

20

AZ 85721 USA.

21 22

*Corresponding authors.

23

Email address: [email protected] (M. A. Gore)

1

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

24

Mailing address: Plant Breeding and Genetics Section, School of Integrative Plant Science,

25

Cornell University, 358 Plant Science Building, Ithaca, NY 14853 USA

26 27

Email address: [email protected] (J.W. Clohessy)

28

Mailing Address: North Florida Research and Education Center, University of Florida, 155

29

Research Road, Quincy, FL 32351 USA

2

Page 2 of 64

Page 3 of 64

30

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

Core Ideas

31



A single-seed analyzer system was modified to provide low-cost seed imaging.

32



The system throughput allows rapid, non-destructive phenotyping of single seeds.

33



Accuracy of seed shape and size measurements were similar to those obtained manually.

34



Repeatabilities of morphometric seed traits were higher than seed color traits.

3

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

35

Abstract

36

Efforts focused on the genetic improvement of seed morphometric and color traits would greatly

37

benefit from efficient and reliable quantitative phenotypic assessment in a non-destructive

38

manner. Although several seed phenotyping systems exist, none of them combine the cost-

39

effectiveness, identity preservation, throughput, and accuracy needed for implementation in plant

40

breeding. We integrated an image analysis component into a single-seed analyzer (SSA) system

41

that also captures near-infrared reflectance (NIR) and weight data. Through the development and

42

utilization of an open-source computational image analysis pipeline, image data acquired by two

43

cameras mounted on the SSA machine were automatically processed to derive estimates of

44

length, width, height, volume, and color for 96 individual dehulled seeds (groats) of five oat

45

genotypes replicated across days. With the exception of color, the four traits were found to be

46

strongly correlated with and have repeatabilities comparable to manual measurements. The seed

47

color values had moderately strong correlation with those measured by a colorimeter, but further

48

improvements to the SSA system are needed to increase measurement accuracy. These results

49

demonstrate that the SSA system has the potential to provide a low-cost solution for the rapid,

50

accurate measurement of morphological traits on individual seeds of oat and potentially other

51

crop species, allowing for the screening of seeds from numerous genotypes in breeding

52

programs.

53 54

Abbreviations

55

BLUE, best linear unbiased estimator; CART, classification and regression tree; CIELAB,

56

Commission Internationale de l'Eclairage L*a*b*; EFD, elliptic Fourier descriptors; GIMP,

4

Page 4 of 64

Page 5 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

57

GNU image manipulation program; NIR, near-infrared reflectance; NMR, nuclear magnetic

58

resonance; RGB, red, green, and blue; SSA, single-seed analyzer.

5

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

59

Introduction Cereal yield, measured as the amount of dry grain over a harvested area (kg/ha), has a

60 61

complex genetic architecture consisting of numerous small-effect loci that interact with the

62

environment. The grain (or seed) yield of small grain cereals, including oat (Avena sativa L.),

63

common wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.) can be separated into

64

three primary components that are typically more heritable and genetically tractable than yield

65

itself: individual seed weight, seed number per spike, and spikes or reproductive tillers on a per

66

area basis (Rustgi et al., 2013). Of these three components, the weight of an individual seed is

67

largely determined by both the rate and duration of seed fill that occurs from anthesis to

68

maturity. The accumulation of water and dry matter (e.g., carbohydrates, oil, protein etc.)

69

throughout this developmental period results in changes to seed length, width, height, and

70

volume, all of which contribute to defining the final dry weight of an individual seed (Xie et al.,

71

2015).

72

In oat, a single grain consists of a groat (caryopsis) and a hull, with the hull removed by a

73

mechanical dehuller during the milling process for human consumption. The size and shape of

74

the groat is important because it affects the maximum size of oat flakes, milling yield, and test

75

weight (Doehlert et al., 2006; Groh et al., 2001). Additionally, low groat weight has been

76

associated with high oil concentration (Peterson and Wood, 1997). Oat in particular is consumed

77

as a whole grain, and consumers have shown preference for groats based on color (Johnson et al.,

78

1997). Furthermore, constituents such as lignin and suberin may be surface enriched in the oat

79

groat, affecting both color and quality (Miller and Fulcher, 2016). To improve via direct

80

selection on these traits, oat breeding programs need a phenotyping platform that provides

6

Page 6 of 64

Page 7 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

81

accurate and precise quantitative measures of single dehulled seeds (hereafter seeds) in a cost-

82

effective, high-throughput manner.

83

While it is possible to manually measure the dimensions and weight of a single seed

84

using a digital caliper and analytical lab balance, there are restrictions on the number of samples

85

that can be quickly and cheaply processed and the accuracy to which two- and three-dimensional

86

(2D and 3D) parameters can be estimated. To overcome these limitations, past efforts have

87

focused on the use of digital image analysis for the phenotyping of individual seeds, but these

88

methods are not without weaknesses. Elliptic Fourier (EF) analysis implemented in the software

89

SHAPE (Iwata and Ukai, 2002) enables the 2D closed contour shape variation of seed to be

90

described by Fourier series expansions (i.e., a series of curves that approximate the curve under

91

consideration) through the generation of Elliptic Fourier descriptors (EFDs). This image-based

92

analysis approach allowed for the genetic architecture of rice (Oryza sativa L.) grain shape to be

93

studied, which provided novel insight into the genetics responsible for trait variation (Iwata et

94

al., 2010). However, the sampled seeds had to be carefully oriented for imaging in a manual

95

fashion, and principal components of the coefficients of the EFDs rather than physical

96

measurements (e.g., length, width etc.) were used as morphological phenotypes (Iwata et al.,

97

2010, Williams et al., 2013). More recently, the development of software methods for the

98

analysis of images acquired by consumer grade flatbed scanners has enabled the measurement of

99

various 2D seed shape parameters and color (Moore et al., 2013, Tanabata et al., 2012, Whan et

100 101

al., 2014), but has lacked the additional dimension needed to measure seed height. Instruments constructed to enable non-destructive phenotyping of individual seeds from a

102

range of small- to large-seeded plant species have potential to overcome the drawbacks of early

103

generation digital image analysis systems. As an example, Spielbauer et al. (2009) constructed an

7

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

104

instrument that collected near-infrared reflectance (NIR) and weight data on single maize (Zea

105

mays L.) seeds to predict grain quality traits such as protein and starch content. This instrument

106

served as the basis for building a single-seed analyzer (SSA) machine to collect the same data

107

types from individual oat seeds (Montilla-Bascón et al., 2017). However, these and other related

108

high-throughput systems (Melchinger et al., 2018, Rolletschek et al., 2015) for evaluating end

109

use quality traits do not have the capability to measure seed morphometric characteristics such as

110

length, width, and height. Illustrative of the advantages derived from integrating lab automation

111

and digital image analysis, the phenoSeeder system constructed by Jahnke et al. (2016)

112

employed a volume-carving method (Roussel et al., 2016) to reconstruct the three-dimensional

113

(3D) shape of individual seeds. Although the system had high reproducibility for the measured

114

seed morphological traits such as volume and mass, it was hindered by a less than optimal

115

throughput and complexity of the robotics needed to manipulate individual seeds.

116

In an effort to address the individual seed phenotyping needs of oat breeding programs,

117

we incorporated an imaging component into the existing low-cost SSA machine of Montilla-

118

Bascón et al. (2017) that already had NIR spectroscopy and weighing capabilities. The objectives

119

of our study were to: (i) develop an open-source computational pipeline for oat seed image

120

analysis, (ii) quantify morphological and color phenotypes of individual seeds from five oat

121

genotypes and compare the results to manual measurements, and (iii) estimate the number of oat

122

seeds that need to be phenotyped to provide high trait repeatabilities and accurate estimation of

123

trait parameters.

124 125

Materials and Methods

126

Oat Samples for Measurements

8

Page 8 of 64

Page 9 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

127

In 2015, a set of five different oat genotypes (Ogle, Newdak, Corral, Hidalgo, and

128

SD081107) was evaluated at Cornell University’s Snyder Farm in Ithaca, NY. Conventional oat

129

cultivation practices for the Northeastern US were employed. The plots were mechanically

130

harvested. After harvesting, grain samples were placed in a dryer at 21 C to remove excess

131

moisture and dehulled using a mechanical dehuller (Codema LLC, Maple Grove, MN, USA).

132

The visual color differences observed among the dehulled seeds (groats) from the five oat

133

genotypes spanned a narrow range of hue, saturation, and lightness values for the color brown.

134

Only unbroken and healthy seeds were evaluated using the SSA. For each genotype, 96 seeds

135

were randomly sampled and individually placed into two 48-well plates for replicated

136

measurements with the SSA.

137 138 139

SSA Design, Function, and Control The SSA provides NIR spectroscopy paired with weight measurement and imaging to

140

non-destructively phenotype single seeds rapidly in a batch process that preserves the identity of

141

seeds by placing them in the same well position of a second receiving plate (Supplemental Fig.

142

S1). This design of the SSA (Supplemental Fig. S2) represents a combination and improvement

143

of two previously constructed seed analysis devices (Spielbauer et al., 2009, Ziegler et al., 2013).

144

The NIR component of the SSA was not a focus of this study, but has been previously

145

demonstrated for individual dehulled oat seeds by Montilla-Bascón et al. (2017). In this study,

146

individual seeds were processed through the SSA for weight measurement and imaging using

147

gravity-enhanced compressed air and a series of V-shaped trough seed slides.

148 149

The controlling software for the SSA used Visual C++ based on Visual Studio 2010 and a SQL Server 2008 database (Microsoft Inc., Redmond, WA, USA). Multithreading, consisting of

9

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

150

five threads, was used for collecting data or controlling each instrument component. These were

151

a primary thread and an X-Y table, NIR spectrometer, analytical balance, and camera thread. The

152

primary thread provided a user interface and interacted with the other four threads that served as

153

subordinate threads. Each subordinate thread undertook separate tasks related to different

154

hardware/sensor requirements as indicated by their names.

155 156 157

Seed Plate Loading and Seed Weight Data Collection The SSA was used to analyze 96 or 192 seeds in a single session, as seeds placed into

158

two or four 48-well plates were stored on the SSA. A total of three sessions were required to

159

analyze the 10 48-well plates on each day. In all, the complete set of 10 plates were analyzed on

160

six different days, for a total of six repeated measurements per seed. The run order of the 10

161

plates was randomized across the three sessions on each day. For each session, two or four 48-

162

well plates were inverted and loaded onto an indexing X-Y table that was used to deliver

163

individual seeds sequentially into the SSA. Single seeds traveled through a glass tube and then

164

dropped onto an analytical balance (GH-200, A&D Company Limited, Tokyo, Japan) to obtain a

165

weight measurement. Once a stable seed weight was recorded, a burst of compressed air ejected

166

the seed into a funnel that guided the seed onto a slide and then onto a rotary imaging table.

167 168 169

Seed Image Data Collection The SSA used a rotary imaging table (Supplemental Fig. S3) that rotated the seed to the

170

web camera (model C615, Logitech, Lausanne, Switzerland) imaging position, enabling the

171

acquisition of top and side view seed images and eventual deposition of the seed in the same

172

position of a second 48-well plate, so as to retain seeds in their original order (Supplemental Fig.

10

Page 10 of 64

Page 11 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

173

S2). To block any reflections and non-seed objects from the side and top cameras, top (4.5 × 3

174

cm) and side (4.9 × 2.7 cm) backgrounds made from matte black construction paper (Mi-teintes

175

160 GSM #425 Stygian Black, Canson, Annonay, France) were attached to the rotary imaging

176

table. This allowed for a consistent image background at any location where a seed could be

177

imaged on the table, but with a tradeoff of eliminating the potential imaging benefit of a bottom

178

view camera.

179

We provided uniform lighting across the seed by constructing a lighting system with

180

multiple spectra and variable intensity. Briefly, an aluminum shroud with a diameter of 9 cm and

181

a 1 cm diameter hole in the center was affixed to each of the web cameras, serving as a mount

182

for strips of LED lights. Each strip consisted of six LED lights. Two sets of three strips were

183

placed around the lens of the top camera, and three strips were placed on the top half of the side

184

camera. The LED strip lights consisted of three light colors: 4000K neutral white (model

185

2026NW-40K, Qurosy, Shenzhen, China), 3100K warm white (model 2026WW-31K, Qurosy,

186

Shenzhen, China), and 395-405nm UV (model 5050, Joygo, Guangdong, China). All of the same

187

color LEDs were on a separate circuit, with each circuit having a potentiometer dimmer (model

188

T90040, Triangle Bulbs, Linden, NJ, USA) to control the intensity of light. The UV lights were

189

disabled and intended to detect phenotypes beyond those included in this study.

190

The warm and neutral white light potentiometers were set to 534Ω and 557Ω of

191

resistance for LEDs, respectively, to produce a full color gradient on the seeds without

192

overexposure. For the top camera, white balance and exposure values were set manually using

193

the Logitech driver software and images of a standard grey card (model 24 ColorCard,

194

CameraTrax, Menlo Park, CA, USA). Similarly, the focus was manually set for both web

195

cameras using settings in the Logitech driver software and images of 8pt Times New Roman

11

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

196

printed text. The web cameras were instructed to take an image by the SSA control software

197

using the OpenCV Visual C++ library (www.opencv.org). Each webcam provided a 1920 × 1080

198

single bitmap image that was received by the OpenCV software and saved to a hard drive

199

directory.

200

In total, 5,760 bitmap images were generated, comprising 2,880 sets of top and side

201

images (Supplemental Table S1). To improve portability, all bitmap images were converted to

202

PNG-24 format. The machine technician recorded if any seeds were lost or potentially reordered,

203

resulting in the removal of 90 image sets prior to image processing. The raw image data are

204

available upon written request.

205 206 207

Image Color Correction The process to estimate seed length, width, height, volume, and color from images

208

required a multistep computational pipeline (Supplemental Fig. S4). The web camera software

209

did not allow the application of separate image settings to different web cameras connected to

210

the same computer, thus the color accuracy of the top camera was prioritized over the side

211

camera. Briefly, we initiated color correction of all collected image data using a color reference

212

card (model 24 ColorCard) imaged by the top camera at the start of each of the six days. Each

213

image of the color reference card was analyzed using GNU Image Manipulation Program

214

(GIMP) (www.gimp.org) by selecting all of the pixels contained in the grey color box on the

215

imaged reference card. With the mean red, green, and blue values of the selected area, individual

216

channel offsets were determined by calculating the difference between these mean values and

217

values expected for the reference (red = 162, green = 163, blue = 162). These calculated daily

218

offsets of a few pixels were applied to both top (i.e., color and light fluctuation correction) and

12

Page 12 of 64

Page 13 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

219

side (i.e., only light fluctuation correction) images using a color offset function in IrfranView

220

(www.irfanview.com) (Supplemental Fig. S5).

221 222 223

Top Image Cropping and Seed Segmentation There was inherent variability in where seeds were positioned on the matte black

224

background affixed to the rotary imaging table such that not every imaged seed was centered

225

relative to the top web camera. Therefore, image cropping was needed to reduce extraneous

226

background area that otherwise significantly increased image processing time. Through

227

implementation of an image pre-processing Python script (sapre-processing.PY;

228

https://github.com/GoreLab/SSA_Image_Processor) that utilized OpenCV and Pillow modules,

229

all top camera images were cropped at the extreme top and left sides. Removal of pixels from the

230

right and bottom sides would also be preferred for processing speed, but the variability in seed

231

positions necessitated the inclusion of these sides.

232

To detect and segment seed pixels from the background area of all top camera images,

233

the EasyPPC software (Guo et al., 2013) used a classification and regression tree (CART) model

234

to differentiate seed from background (Supplemental Fig. S6). The CART model was trained

235

using a random selection of 33 color corrected and six non-color corrected top camera images.

236

The addition of non-color corrected images provided a wider color palette, improving results for

237

darker and lighter hues caused by shading and glare, respectively.

238 239 240 241

Side Image Cropping and Seed Segmentation Variability in seed position on the matte black background was even more extreme in side camera images. Therefore, the image pre-processing Python script was used in combination

13

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

242

with the EasyPPC software to crop the side images. The top and bottom rows of each side image

243

were cropped by the Python script, as those areas could not contain a seed. Subsequently, the

244

EasyPCC software used a CART model to detect the matte black background in each side image,

245

resulting in cropped side images that were segmented into non-background and background areas

246

(Supplemental Fig. S7). This CART model was trained using a random selection of 33 color

247

corrected side images that had their top and bottom rows cropped as described above. Next, all

248

image areas detected as not of interest (i.e., neither the matte black background nor seed) were

249

cropped for a second time by the Python script, producing images of a seed against the matte

250

black background. Finally, the EasyPCC software used a second CART model trained to detect

251

the seed from the remaining background. We trained this CART model using a random selection

252

of 35 images from the second round of cropping (Supplemental Fig. S8).

253

When training the CART models for side images, the EasyPPC software often detected

254

colors in the regions of interest that were the same or similar to those found in the background

255

area of the images. The occurrence of these false positives was minimized using an iterative

256

process where a CART model was trained, applied to a test set of images, evaluated, and

257

retrained, if needed. The metric for model evaluation was proper detection of the seed

258

boundaries, such that there were fewer regions at the seed-background boundary where the seed

259

was improperly labeled as background. If severe encroachment was found, the CART model was

260

retrained with a different background area highlighted. This process was repeated no more than

261

two times. All images used for model training were removed from the result set (Supplemental

262

Table S1).

263 264

Image-based Measurement of Seed Height, Length, and Width

14

Page 14 of 64

Page 15 of 64

265

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

The top and side segmented images were used to measure seed phenotypes. The Python

266

OpenCV module was used in the main-processing Python script (Samain.PY;

267

https://github.com/GoreLab/SSA_Image_Processor) to create a 2D rectangle around the seed in

268

both the top and side images (Supplemental Fig. S9). The edges of the rectangle were bound to

269

the outermost points of the seed. The rectangle was allowed to freely rotate, thus minimizing the

270

size. The length and width of this translated and rotated rectangle in the top image represented

271

the length and width of the seed in pixels. Seed height in pixels was found using the same

272

process of rectangle translation and rotation applied to the side image. Other variables measured

273

from this 2D rectangle included the position in the image and the angle of the midline of the

274

seed. These two variables were used to convert the measured pixels to millimeters.

275

Conversion of pixel measurements to seed length in millimeters was accomplished using

276

scaling variables calculated from calibration images (i.e., images of a ruler) taken before the

277

experimental seeds were processed through the SSA. For the top image of the seed, the process

278

of pixel to millimeter conversion was linear as the seed always fell in the same plane. However,

279

for the side image the position of the seed was subject to change between images due to the

280

physical design of the SSA. To address this, a series of calibration images were taken using the

281

side camera with a ruler placed at several different known distances from the camera. The

282

scaling factors produced from this process allowed a pixel representing a portion of a seed at a

283

known distance from the side camera to be converted to a millimeter distance. The top camera

284

image contained the position of the seed and thus the distance between the seed and side camera

285

could be calculated. Using the pixel height of the seed found in the side image, the scaling

286

factors were used to convert the side image height distance from pixels to millimeters. This

15

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

287

process of finding the seed location and scaling was repeated for each set of images and, as such,

288

for each seed.

289 290

Seed Image Error Detection

291

The SSA collected a low number (~13%) of top and side images that contained

292

misaligned or missing seeds. Briefly, the main-processing Python script (Samain.PY) detected

293

these error events by analyzing several seed phenotypes and their positions in both the top and

294

side images. The position of the seed was saved as a binary 2D array representing the pixels that

295

contain the seed in the image and the non-seed containing pixels. The perimeter of this array was

296

compared with the max bounds of the source raw image. If the perimeter intersected with the

297

max bounds, then this indicated that the seed was only partially in the image. Next, the area of

298

the seed as defined by the total number of pixels was found using the binary 2D array. If the total

299

number of pixels comprising the seed was less than 400 pixels (i.e., an order of magnitude less

300

than an averaged sized seed), then this indicated that only dust or debris (i.e., no seed) was

301

detected. If the total number of pixels encompassing the seed was an order of magnitude more

302

than an average sized seed, then this indicated a likely non-seed object in the image. Lastly, seed

303

angle measurements from the top image were used to identify potential errors contained in the

304

side image. If the angle of the seed was greater than 80° with respect to the x-axis in the top

305

image, then this indicated that the side image would only contain a small cross-section of the

306

seed. Therefore, seed volume could not be accurately estimated. Top and side images having

307

seed perimeter, area, and/or angle values exceeding these defined thresholds were flagged in the

308

output data file with a description of detected errors. Flagged images were removed and not

309

included in the analyzed results (Supplemental Table S1).

16

Page 16 of 64

Page 17 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

310 311 312

Image-based Seed Volume Measurement Volume of an individual seed was estimated using an ellipsoidal approximation of the

313

seed cross-section in combination with a Riemann sum over the length of the seed. Briefly, the

314

data from the top and side images were used to create a 2D seed location array, which was then

315

used to measure the height and width of any given cross section and total length (Supplemental

316

Fig. S10). Notably, the side image array rarely matched the top image array with respect to

317

length because of the variability in seed angle. To resolve this issue, we scaled the side array

318

along the seed length axis, enabling the pixel length of the side array to equal the pixel length of

319

the top array. The seed height was proportionally maintained throughout this process.

320 321

The estimated height, width, and rescaled length values were used to estimate volume as follows: 𝑙=𝐿

𝑉𝑜𝑙𝑢𝑚𝑒 = ∑𝑙 = 0𝜋 ⋅

322

𝑤(𝑙) ℎ(𝑙) 2 ⋅ 2

⋅ 𝛥𝑙

(Equation 1)

323

where summation occurs along the length of the 2D array from 𝑙 = 0 to 𝑙 = 𝐿 where 𝐿 represents

324

the total array length in pixels. The area of an ellipse and, as such, the cross sectional area at any

325

point in the seed 𝑙 is given by 𝜋 ⋅

326

respectively, at point 𝑙 along the length of the 2D seed location arrays. 𝑤(𝑙) and ℎ(𝑙) were

327

calculated first in pixels and then scaled to millimeters using the scale factors previously

328

described in the “Image-based measurement of seed height, length, and width” section of the

329

Materials and Methods. The cross section as defined above was multiplied by the length of each

330

pixel step 𝛥𝑙 in millimeters and summed over the length of the seed, generating an estimated

331

value for the volume.

𝑤(𝑙) ℎ(𝑙) 2 ⋅ 2

where 𝑤(𝑙) and ℎ(𝑙) represent the width and height,

332

17

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

333

Image-based Seed Color Measurement

334

Seed color was measured using the color calibrated top camera image and 2D array of

335

pixels that represented the seed location. As an example of this procedure, Supplemental Fig.

336

S11 shows a red line boundary around the pixels used for color detection. The main-processing

337

Python script (Samain.PY) recorded every colored pixel and its color value, followed by the

338

creation of a comma delimited text file. The comma separated values (CSV) file contained a row

339

for each seed with three columns containing the average red, green, or blue (RGB) color value

340

over the entire seed. These color value data were only available for the oat genotypes Ogle and

341

Hidalgo on five of the six days, but were obtained for the other three oat genotypes (Newdak,

342

Corral, and SD081107) on all six days.

343 344 345

Final Data Output from Image Analysis The final phenotypic data file generated by the main-processing Python script

346

(Samain.PY) was a CSV file containing individual seeds as rows and the seven seed phenotypes

347

(length, width, height, average red, average green, average blue, and volume) as columns. An

348

additional column included script error messages or warnings that indicated missing image files

349

or data points where the phenotype values were outside of expected ranges.

350 351 352

Manual Seed Measurements Manual measurements of length, height, and width on individual seeds were performed

353

with a digital caliper (model 500-171-30, Mitutoyo, Kawasaki, Japan) having a measurement

354

accuracy of +/-0.01 mm. The digital caliper directly sent measurements to a laptop via a USB

18

Page 18 of 64

Page 19 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

355

connection. All seed positions were maintained within the tray, and seed measurements were

356

taken by two different technicians. Length was defined as the radius that is parallel to the seed crease, whereas height was

357 358

defined as the axis that intersects the seed crease at the top and the embryo at the bottom. Width

359

was defined as the remaining axis that is perpendicular to the seed crease. These measurements

360

were used to calculate the volume of an ellipsoid as follows: 4

361 362

(Equation 2)

𝑉𝑜𝑙𝑢𝑚𝑒 = 3𝜋𝑎𝑏𝑐

where a is the radius and b and c are the axes of the ellipsoid. Seed weights obtained by the SSA were independently validated by two technicians that

363 364

manually weighed each seed using the same scale (GH-200, A&D Company Limited, Tokyo,

365

JPN).

366

A colorimeter (model CR-400, Konica Minolta, Tokyo, Japan) was used to measure seed

367

color for each genotype by bulking the two trays of 96 seeds for each genotype. The colorimeter

368

was first calibrated using the included white standard panel and set to use a 2° view angle and a

369

D65 illuminant. Each set of seeds was placed within a glass dish, ensuring that there were at least

370

two layers of seeds throughout and no visible bottom. The colorimeter was then placed over the

371

center of the petri dish and moved around the sample to collect a total of 100 color

372

measurements. The colorimeter returned measurements in the L*a*b* color space format. The

373

measurements were made by two different technicians on different days.

374 375

Statistical Analyses

19

Page 20 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

376

To assess whether the raw SSA phenotypic data contained significant outliers, we

377

initially fitted a mixed linear model for each of the eight seed traits in ASReml-R version 3.0

378

(Gilmour et al., 2009):

379

Yijk = µ + genotypei + dayj + (genotype×day)ij + tray(genotype)ki + εijk,

(Equation 3)

380

in which Yijk is the individual phenotypic observation; µ is the grand mean; genotypei is the

381

effect of the ith genotype; dayj is the effect of the jth day of samples being run; (genotype×day)ij

382

is the interaction effect between the ith genotype and jth day; tray(genotype)ki is the effect of the

383

kth tray nested within the ith genotype; and εijk is the residual error effect assumed to be

384

independently and identically distributed according to a normal distribution with mean zero and

385

variance 𝜎2𝜀 , that is ~iid N(0, 𝜎2𝜀 ). The model terms genotypei, dayj, and (genotype×day)ij were

386

fitted as fixed effects, while tray(genotype)ki was modeled as a random effect according to ~iid

387

N(0, 𝜎2). To detect significant outliers, Studentized deleted residuals (Neter et al., 1996) obtained

388

from these mixed linear models were examined. Data transformations and unequal variance

389

models were not needed because the assumptions of normality and homoscedasticity were met.

390

Once significant outliers were removed, an iterative mixed linear model (Equation 3)

391

fitting procedure was conducted in ASReml-R version 3.0. Likelihood ratio tests were conducted

392

to remove terms fitted as random effects that were not significant at α = 0.05 (Littell et al., 2006).

393

The final model for each trait was used to estimate a best linear unbiased estimator (BLUE) for

394

each genotype on individual days and across all six days. Sequential tests of fixed effects were

395

conducted with degrees of freedom being calculated using the Kenward-Rogers (KR)

396

approximation (Kenward and Roger, 1997). The KR approximation, which uses a scaled Wald

397

statistic that involves an approximate covariance matrix to account for variability introduced due

398

to estimation of variance components, adjusts the degrees of freedom used in tests of

20

Page 21 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

399

significance together with an F approximation of the sampling distribution for the scaled Wald

400

statistic (Kenward and Roger, 1997). The bias-adjusted degrees of freedom in the KR

401

approximation results in more conservative estimates of significance values associated with

402

model terms.

403

The mixed linear model analysis of phenotypic data obtained by a handheld digital

404

caliper and analytical balance was identical to that used for the SSA data, with the exception of

405

having fitted the following model in ASReml-R version 3.0:

406

Yijk = µ + genotypei + techj + (genotype×tech)ij + tray(genotype)ki + εijk,

(Equation 4)

407

in which Yijk is the individual phenotypic observation collected with the digital caliper or

408

analytical balance; µ is the grand mean; genotypei is the effect of the ith genotype; techj is the

409

effect of the jth technician who has taken measurements; (genotype×tech)ij is the interaction

410

effect of the ith genotype with the jth technician; tray(genotype)ki is the effect of the kth tray

411

nested within the ith genotype; and εijk is the residual error effect assumed to be ~iid N(0, 𝜎2𝜀 ).

412

The model term genotypei was the only fixed effect, while all other terms were modeled as

413

random effects according to ~iid N(0, 𝜎2). Data transformations and unequal variance models

414

were not needed in this analysis. For each trait, a BLUE for each genotype was estimated with

415

the final model fitted to an outlier screened data set.

416

The mixed linear model analysis of phenotypic data collected by a handheld colorimeter

417

was also identical to that used for the SSA except for fitting the following model in ASReml-R

418

version 3.0:

419

Yij = µ + genotypei + techj + (genotype×tech)ij + εij,

(Equation 5)

420

in which Yij is an individual phenotypic observation with the colorimeter; µ is the grand mean;

421

genotypei is the effect of the ith genotype; techj is the effect of the jth technician that has taken

21

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

422

measurements; (genotype×tech)ij is the interaction effect between the ith genotype and jth

423

technician, and εij is the residual error effect assumed to be ~iid N(0, 𝜎2𝜀 ). The model term genotypei

424

was the only fixed effect, with all other terms modeled as random effects according to ~iid N(0, 𝜎2

425

). For each trait, a BLUE for each genotype was estimated using the final model fitted to an outlier

426

screened data set, without a need for data transformations and unequal variance models.

427

We assessed the validity of phenotypic data collected by the SSA through comparisons

428

with measurements manually obtained by a digital caliper, analytical balance, and colorimeter.

429

The strength of a relationship between the same traits scored by two different methods was

430

measured with the Pearson’s correlation coefficient (rp) in R (R Core Team, 2017). Phenotypic

431

data sets with outliers removed (i.e., prior to estimation of BLUEs) were used to calculate the

432

correlation of daily single seed measurements obtained by the SSA to those collected using the

433

digital caliper and analytical balance. Given that single seed measurements based on the digital

434

caliper and analytical balance were taken independently by two different technicians, an average

435

single seed measurement value across technicians was used in pairwise comparisons with SSA

436

measurements across genotypes within each of the six days.

437

To enable the comparison of seed color measurements between the SSA and colorimeter,

438

the BLUEs for the three color traits (RGB) scored by the SSA were used to generate

439

Commission Internationale de l'Eclairage L*a*b* (CIELAB) color space values for each

440

genotype across the six days. The conversion from RGB to CIELAB color space values was

441

performed through a web interface (http://colormine.org/convert/rgb-to-lab) that implemented

442

the open source ColorMine library. Next, the resultant SSA-based CIELAB color values were

443

tested for correlation (rp) with those obtained on bulked seed samples by the colorimeter.

22

Page 22 of 64

Page 23 of 64

444

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

For each trait, repeatability (r) was estimated to provide a measure of technical

445

performance of the SSA relative to manual measurements for individual seeds. In the context of

446

this study, repeatability is the proportion of the variance in measurements taken by a single

447

instrument on the same seed under the same conditions that is due to permanent, or non-

448

localized, differences between oat genotypes (Falconer and Mackay, 1996, Piepho and Möhring,

449

2007). To estimate the appropriate variance components, the final fitted model (Equations 3 or 4)

450

for each trait was reformulated so that all terms were modeled as random effects. For the traits

451

measured by the SSA, variance components were then used to estimate r as follows (Holland et

452

al., 2003):

453

𝑟=

𝜎2𝑔 𝜎2𝑔 + 𝜎2𝑔𝑑 + 𝜎2𝑔𝑡 + 𝜎2𝜀

(Equation 6)

454

where 𝜎2𝑔 is the estimated genetic variance, 𝜎2𝑔𝑑 is the estimated variance associated with

455

genotype-by-day interaction, 𝜎2𝑔𝑡 is the estimated variance associated with the effect of sampling

456

tray nested within genotype, and 𝜎2𝜀 is the residual error variance. With respect to the

457

repeatability estimates for the traits measured by the digital caliper and analytical balance, the

458

equation used was as follows:

459

𝑟=

𝜎2𝑔 𝜎2𝑔 + 𝜎2𝑔𝑡𝑒𝑐ℎ + 𝜎2𝑔𝑡 + 𝜎2𝜀

(Equation 7)

460

where 𝜎2𝑔 is the estimated genetic variance, 𝜎2𝑔𝑡𝑒𝑐ℎ is the estimated variance associated with the

461

effect of genotype nested within research technician, 𝜎2𝑔𝑡 is the estimated variance associated

462

with the effect of sampling tray nested within genotype, and 𝜎2𝜀 is the residual error variance.

463

Standard errors for the repeatability estimates were approximated using the delta method

464

(Holland et al., 2003, Lynch and Walsh, 1998).

23

Page 24 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

465

To assess how measurement repeatability impacted the number of phenotypic

466

observations needed to be collected per seed, we calculated the gain in accuracy from a reduction

467

in phenotypic variation measured as a percentage as more measurements were obtained for an

468

individual seed. With the inclusion of repeatability estimates calculated in Equation 5, the

469

following formula was used to derive the change in phenotypic variance as the number of

470

measurements, n, was increased from 1 to 10 for each SSA trait (Falconer and Mackay, 1996):

471

𝑉𝑝(𝑛) 𝑉𝑝

=

1 + 𝑟(𝑛 ― 1) 𝑛

(Equation 8)

472

where Vp(n) is the phenotypic variance for a number of measurements, n, for a trait, Vp is the

473

overall phenotypic variance based on a single measurement, and r is the trait repeatability.

474

The number of different seeds that need to be measured per genotype by the SSA to

475

reliably estimate seed trait parameters for an oat genotype, as determined by confidence

476

intervals, was also calculated. With 10% of the trait mean as the defined amount of error, the

477

following formula was used to determine seed sample size (Ott et al., 2013):

478

𝑛=

(𝑧𝛼/2)2𝜎2 𝐸2

(Equation 9)

479

where Zα/2 is the Z value associated with either a confidence interval of 90% or 95% (α = 0.10 or

480

0.05, respectively), σ2 is the variance for the trait, and E is the amount of acceptable error in the

481

trait measurement. E was calculated as 10% of the estimated mean for each trait. To derive

482

means and variances for a trait, an iterative sampling and modeling approach was employed.

483

First, for each trait, random sampling without replacement from the outlier screened data set was

484

conducted to generate sample sizes ranging from 2 to 20 measurements (n) per genotype. Within

485

each subsampled data set, the following model was fitted to derive genotypic BLUEs for each

486

trait within the n = 2 to 20 sample sizes:

487

Yi = µ + genotypei + εi, 24

(Equation 10)

Page 25 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

488

in which Yi is an individual phenotypic observation; µ is the grand mean; genotypei is the fixed

489

effect of the ith genotype; and εi is the residual error effect assumed to be ~iid N(0, 𝜎2𝜀 ). The

490

derived trait BLUEs for each genotype within the individual n sample sizes were used to

491

estimate trait means and variances specific to that n-sized sample. Once all the trait means and

492

variances for the n = 2 to 20 sample sizes were calculated, an overall, across-n sample size

493

average was calculated for both the trait mean and variance. These representative values were

494

used in Equation 10 for determining the seed sample size for each trait.

495 496

Results The single-seed analyzer (SSA) and manual (digital caliper and analytical balance)

497 498

measurements of the individual oat seeds revealed highly significant (P < 0.0001) differences

499

among the five genotypes for seed weight, length, width, height, and volume (Table 1). With

500

respect to the performance of the SSA across the six days on which seeds were measured, there

501

was no effect of day on seed weight measurements, but there were significant differences among

502

the days for length, width, height, and volume (P < 0.05 to 0.0001, Table 1, Fig. 1). The

503

interaction effect of genotype-by-day was only significant for volume (P < 0.01, excluding the

504

seed color traits), indicating that the day-to-day variability did not influence the rank-order of the

505

genotypes (Supplemental Table S2). For the RGB color values extracted from the SSA collected

506

images, no significant differences among the five oat genotypes were detected, but the effect of

507

day and genotype-by-day were both highly significant (P < 0.0001) for all three color value

508

traits.

509

25

Page 26 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

510

Table 1. F values for fixed effects from an analysis of variance (ANOVA) for oat seed traits

511

measured using the single-seed analyzer (SSA), digital caliper, analytical balance, and

512

colorimeter. Trait Weight (g) Length (cm) Width (cm) Height (cm) Volume (cm3) Red value Green value Blue value L* a* b*

513 514 515 516 517 518

Method SSA Balance SSA Caliper SSA Caliper SSA Caliper SSA Caliper SSA SSA SSA Colorimeter Colorimeter Colorimeter

Genotype 170.800**** 61.48**** 140.900**** 57.450**** 33.860**** 94.970**** 89.450**** 31.150**** 32.470**** 59.640**** 2.258NS 2.764 NS 2.353 NS 5.668* 3.591 NS 1.715 NS

NS

Not Significant at the < 0.05 level. * Significant at the < 0.05 level. ** Significant at the < 0.01 level. *** Significant at the < 0.001 level. **** Significant at the < 0.0001 level.

26

Day 1.811 NS 2.462* 9.670**** 76.720**** 125.400**** 343.100**** 293.900**** 193.200**** -

Genotype-by-Day 0.211 NS 0.650 NS 0.799 NS 1.536 NS 1.835** 9.029**** 7.052**** 8.922**** -

Page 27 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

519 520

Fig. 1. Boxplots of five oat seed traits measured by the single-seed analyzer (SSA) across six

521

days.

522 523

When comparing the trait BLUEs of seed weight, length, width, height, and volume for

524

the five oat genotypes between the SSA and manual measurements (Table 2), the estimates were

525

not significantly different (two-sided t-test, P > 0.05). Additionally, the correlation (rp) of daily

526

SSA measurements to those obtained with the digital caliper and analytical balance ranged from

527

0.618 to 0.994, with an average rp of 0.886 across all days and traits (Table 3). We observed an

528

across-day, average correlation between seed weight and volume, of 0.888 for the SSA and 0.986

529

for manual measurements (Fig. 2). With the exception of weight and width, the weakest

530

correlations (rp = 0.618 to 0.900) between the SSA and manual measurements were on the fifth

531

day that seed samples were evaluated. Irrespective of this one anomalous day, estimated

27

Page 28 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

532

correlation coefficients were found to be consistent across days for these five traits. In contrast,

533

the three CIELAB color values determined via the SSA and colorimeter were significantly

534

different (two-sided t-test, P < 0.01) between the two methods (Table 4). However, strong

535

positive correlations (rp = 0.742 to 0.915) were observed between color values based on the two

536

measurement methods (Table 4).

537 538

Table 2. Best linear unbiased estimators (BLUEs) for oat seed traits measured using the single-

539

seed analyzer (SSA), digital caliper, and analytical balance, as well as estimates of repeatability

540

(r) on an individual seed basis with their standard error (SE). Seed Trait Weight (g) Length (cm) Width (cm) Height (cm) Volume (cm3) Red value Green value Blue value

Method SSA Balance SSA Caliper SSA Caliper SSA Caliper SSA Caliper SSA SSA SSA

Corral 0.022 0.023 0.651 0.609 0.271 0.247 0.207 0.198 0.017 0.127 137.254 110.472 69.046

Genotype Newdak Ogle 0.027 0.026 0.028 0.028 0.720 0.736 0.672 0.688 0.272 0.284 0.247 0.258 0.224 0.208 0.213 0.205 0.020 0.019 0.150 0.154 140.379 141.315 116.022 114.956 74.816 73.061

Hidalgo 0.020 0.022 0.694 0.638 0.248 0.223 0.210 0.197 0.016 0.120 133.227 107.808 69.417

541

28

SD081107 0.030 0.031 0.771 0.726 0.288 0.262 0.220 0.210 0.022 0.169 138.827 113.475 73.359

Repeatabilities Estimate SE 0.281 0.144 0.272 0.144 0.227 0.126 0.252 0.138 0.299 0.153 0.377 0.169 0.151 0.094 0.147 0.097 0.193 0.115 0.268 0.142 0.028 0.028 0.054 0.054 0.006 0.094

Page 29 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

542

Table 3. Pearson’s correlation coefficients (rp) generated from evaluating the linear relationship

543

of single-seed analyzer (SSA) measured oat seed traits with the same traits measured by a digital

544

caliper and analytical balance across six days. All correlations were significant at α = 0.05. Day 1 2 3 4 5 6

Weight 0.941 0.983 0.979 0.975 0.993 0.994

Length 0.965 0.974 0.964 0.964 0.900 0.977

Width 0.893 0.927 0.918 0.898 0.895 0.912

Height 0.707 0.719 0.723 0.687 0.618 0.741

Volume 0.910 0.952 0.936 0.929 0.669 0.929

545 546

547 548

Fig. 2. The relationship between seed weight and volume estimated by measurements from the

549

single-seed analyzer (SSA) (A) or digital caliper (B).

550

29

Page 30 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

551

Table 4. Best linear unbiased estimators (BLUEs) for oat L*a*b* color space seed traits

552

measured using the single-seed analyzer (SSA) and colorimeter, as well as Pearson’s correlation

553

coefficients (rp) from evaluating the linear relationship of these same traits between the two

554

measurement methods. All correlations were significant at α = 0.10. Color Parameter L* a* b*

Method

Corral

Hidalgo

Newdak

Ogle

SD081107

SSA Colorimeter SSA Colorimeter SSA Colorimeter

48.290 56.869 4.935 6.062 26.993 23.371

47.138 58.062 4.726 6.182 25.230 22.266

50.227 59.294 3.732 5.666 26.276 23.538

50.006 59.756 4.569 6.089 26.999 23.348

49.346 59.156 4.355 5.910 25.937 22.785

Correlation 0.742 0.915 0.857

555 556

The estimates of repeatability (r) on an individual seed basis for weight, height, length,

557

width, and volume measured by the SSA ranged from 0.151 to 0.299, whereas for manual

558

measurements repeatability estimates for an individual seed ranged from 0.147 to 0.377 for the

559

same five traits (Table 2). The repeatability estimates for these five traits were not significantly

560

different (paired t-test, P > 0.05) between the SSA and manual methods. Neither the SSA nor the

561

manual methods consistently had the highest repeatabilities. The SSA measurements exhibited

562

higher repeatability for weight and height; however, manual measurements for seed length,

563

volume, and width had higher repeatability estimates. Conversely, the RGB color values had the

564

lowest repeatability estimates (mean r of 0.029) of any measured trait (Table 2). On average, the

565

repeatabilities for the three color value traits were nearly six to seven fold lower than those

566

obtained for the other five non-color traits by the SSA (mean r of 0.230) and manual (mean r of

567

0.263) individual seed phenotyping methods.

568 569

The repeatability estimates for the SSA measured traits (Table 2) were used to calculate the gain in accuracy expected from a reduction in the phenotypic variance by multiple

30

Page 31 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

570

measurements on an individual seed. The gain in accuracy expected from multiple measurements

571

(n = 1 to 10) of each seed was highest for the traits with the lowest repeatabilities, and lowest for

572

traits with the highest repeatabilities (Fig. 3), congruent with expectations. Of the eight SSA

573

measured traits, the RGB color value traits would benefit the most from multiple measurements.

574

Additionally, there was an observed concomitant decrease in accuracy gains as the number of

575

measurements increased. Generally, the results suggested that two to three measurements per

576

seed by the SSA provided the best balance between gain in accuracy and resource expenditure

577

(Fig. 3), as there was only a minimal reduction in the phenotypic variance with four or more

578

measurements for an individual seed.

31

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

579 580

Fig. 3. Gain in accuracy from multiple phenotypic measurements of an individual seed with the

581

single-seed analyzer (SSA).

582 583

In a sampling analysis focused on the accuracy in estimation of morphological

584

parameters, we calculated the number of different seeds per genotype that needed to be measured

585

by the SSA to estimate a mean value with an assumed error rate of 10% for the more repeatable

586

non-seed color traits. The results of this analysis showed that an estimated sample size of only

587

one seed was required for seed height, width, and length, while weight and volume had estimated

588

sample sizes of seven and eight seeds, respectively, to allow for estimation of trait values such 32

Page 32 of 64

Page 33 of 64

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

589

that the 90% confidence interval was equal to 10% of the trait mean (Table 5). When considering

590

a confidence interval of 95%, the number of estimated seeds that needed to be measured for

591

these five traits increased by one seed for length and three seeds each for weight and volume,

592

whereas no additional seeds were required for measurement for width and height.

593 594

Table 5. Number of seeds needed for measurement by the single-seed analyzer (SSA) to estimate

595

a mean trait value with an error rate of 10% at a confidence interval (CI) of 90% and 95%. Trait Weight (g) Length (cm) Width (cm) Height (cm) Volume (cm3)

Mean 0.025 0.714 0.273 0.214 0.019

Error +/- 10% of mean 0.003 0.071 0.027 0.021 0.002

n for 90% CI 6.738 1.333 0.919 0.549 8.002

n for 95% CI 9.624 1.903 1.313 0.784 11.430

596 597

Discussion

598

The quantification of phenotypic trait variation on large numbers of genetically distinct

599

individuals continues to require a considerable investment of both time and financial resources

600

for plant breeding programs (Araus and Cairns, 2014, Furbank and Tester, 2011, Pauli et al.,

601

2016, White et al., 2012). To address this challenge, there have been significant efforts focused

602

on the development, testing, and validation of plant phenotyping systems for both greenhouse

603

and field environments (Andrade-Sanchez et al., 2014, Deery et al., 2014, Jiang et al., 2018,

604

Massimo et al., 2017). These phenotyping advancements have shifted the emphasis towards the

605

use of validated systems for comprehensively assessing plant morphological and physiological

606

traits over varying spatiotemporal scales (Fahlgren et al., 2015, Honsdorf et al., 2014, Pauli et al.,

607

2016). Often of less focus relative to agronomically and economically important phenotypes

608

evaluated under field conditions, such as grain and biomass yield, are measurement of traits on a

33

The Plant Phenome Journal Accepted Paper, posted 10/16/2018. doi:10.2135/tppj2018.07.0005

609

single seed basis (Jahnke et al., 2016). These traits, which have consistently been shown to have

610

a direct relationship with end-product quality (Melchinger et al., 2018, Montilla-Bascón et al.,

611

2017, Rolletschek et al., 2015, Spielbauer et al., 2009), would significantly benefit from having a

612

low-cost, easy to operate seed phenotyping system amenable to laboratory settings. Herein, we

613

developed and tested a novel computational image analysis pipeline combined with a SSA

614

machine that enables the accurate measurement of individual oat seed characteristics in a cost-

615

effective and high-throughput manner.

616

The SSA machine and cameras, which cost an estimated $20,000, in combination with

617

the implementation of image analysis methods based on open source software allowed for the

618

detection of significant differences in individual seed weight, length, width, height, and volume

619

parameters among the five oat genotypes evaluated. The estimates of these seed properties were

620

highly concordant with those measured manually using a handheld digital caliper. Conversely,

621

the more subtle variation in seed color across the five oat genotypes was not significantly (P

Suggest Documents