Accepted Manuscript

Title: Estimation of the Limit of Detection Using Information Theory Measures

Authors: Jordi Fonollosa, Alexander Vergara, Ramon Huerta, Santiago Marco

PII: S0003-2670(13)01338-X

DOI: http://dx.doi.org/doi:10.1016/j.aca.2013.10.030

Reference: ACA 232904

To appear in: Analytica Chimica Acta

Received date: 9-8-2013

Revised date: 8-10-2013

Accepted date: 11-10-2013

Please cite this article as: J. Fonollosa, A. Vergara, R. Huerta, S. Marco, Estimation of the Limit of Detection Using Information Theory Measures, Analytica Chimica Acta (2013), http://dx.doi.org/10.1016/j.aca.2013.10.030 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Estimation of the Limit of Detection Using Information Theory Measures

Jordi Fonollosa (a, 1), Alexander Vergara (b), Ramon Huerta (a), Santiago Marco (c, d)

(a) BioCircuits Institute (BCI), University of California San Diego, La Jolla, CA 92093, USA

(b) Biomolecular Measurement Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899-8362, USA

(c) Signal and Information Processing for Sensing Systems, Institute for Bioengineering of Catalonia (IBEC), Baldiri Reixac, 4-8, 08028 Barcelona, Spain

(d) Departament d'Electrònica, Universitat de Barcelona, Martí i Franqués 1, 08028 Barcelona, Spain

(1) Author to whom all correspondence should be addressed: Dr. Jordi Fonollosa. Tel.: +1 858 534-6758. Fax: +1 858 534-7664. E-mail: [email protected]

Page 1 of 44

Abstract

Definitions of the Limit of Detection (LOD) based on the probability of false positive and/or false negative errors have been proposed over the past years. Although such definitions are straightforward and valid for any kind of analytical system, the methodologies proposed to estimate the LOD are usually restricted to signals with Gaussian noise. Additionally, there is a general misconception that two systems with the same LOD provide the same amount of information on the source regardless of the prior probability of presenting a blank/analyte sample. Based upon an analogy between an analytical system and a binary communication channel, in this paper we show that the amount of information that can be extracted from the analytical system depends on the probability of presenting each of the two possible states. We propose a new definition of LOD, built on Information Theory tools, that deals with noise of any kind and easily accommodates prior knowledge. Unlike most traditional LOD estimation approaches, the new definition is based on the amount of information that the chemical instrumentation system provides on the chemical information source. Our findings indicate that benchmarking analytical systems by their ability to provide information about the presence/absence of the analyte (our proposed approach) is a more general and appropriate framework, one that converges to the usual values when dealing with Gaussian noise.

Keywords

Limit of Detection; Information Theory; Mutual Information; Heteroscedasticity; False positive/negative errors; Gas Discrimination and Quantification


1. Introduction

The Limit of Detection (LOD) of an analytical method (or measurement instrument) is a fundamental figure of merit. In most settings, it specifies the smallest concentration at which the analyte can be detected or distinguished from a blank measurement within a stated confidence level, and it therefore constitutes a limiting factor of a chemical detection system. The LOD value given in the specifications of a system is highly significant, since it may have legal implications and play a relevant role in the market regulations coordinated by standardization agencies. For example, in 2006 the US Environmental Protection Agency (EPA) lowered the maximum admissible contamination level of arsenic in drinking water from 50 nmol/mol to 10 nmol/mol. Based on the LOD of the different methodologies to measure the concentration of arsenic, EPA invalidated some of the previously accepted techniques to measure arsenic content in water [1, 2]. Therefore, a clear and accurate definition of LOD, along with a consistent experimental protocol for its estimation, is imperative both for determining the LOD reported in the specifications of any chemical system and for the mutual understanding among technicians, system developers, and policy makers.


Over the last decades, different definitions of the LOD have been proposed, each leading to substantially different estimated values of the LOD [3]. Early definitions of LOD considered only the probability of false positive errors (i.e., the probability of falsely claiming the presence of the analyte in a sample, or Type I errors) and not the probability of false negative errors (i.e., the probability of falsely claiming the absence of the target compound, or Type II errors), which can lead to a Type II error probability of 50% [4]. Similar approaches based on error probabilities were used to evaluate the acceptability of analytical methods [5]. Subsequent definitions of LOD were adapted to take into account both Type I and Type II errors [6], recommending that the LOD be set at the concentration level that makes the probabilities of both Type I and Type II errors 5% or less. However, although all these widely accepted definitions are generally valid for any kind of analytical system, the methodologies proposed to estimate the LOD usually assume signals with Gaussian noise.

A chemical measurement system can in most cases be considered a black box with input/output signals. The input signal is the concentration of the target analyte, whereas the output signal takes the form of an instrument-dependent quantity derived from the raw instrument output (e.g., the peak-area integral at a certain retention time in a GC-FID configuration). Note that the LOD is always given in terms of analyte concentration. Therefore, the uncertainty in concentration (input space) must be estimated from the variance at the instrument output. To do so and to build the corresponding calibration model, systems are frequently assumed to have a linear input-output relationship and a constant output variance (homoscedasticity), because linear models allow an easy transformation of the output noise variance to the input space. In some instrumental techniques, however, the input-output relationship can be non-linear (e.g., due to competition effects) and the noise distribution can depart from the assumed Gaussian distribution. Interestingly, even though some authors have considered more complex sources of noise, such as heteroscedastic stochastic processes, to solve problems in realistic scenarios, they still assume Gaussian noise in their calculations [7, 8].


The assumption of Gaussian noise in the estimation of the LOD can be too restrictive for chemical detection systems because the measured value (i.e., the analyte concentration) is necessarily non-negative. Therefore, the noise distribution inevitably becomes asymmetric and non-Gaussian at very small concentration levels, which is precisely the concentration range explored to estimate the LOD.

To deal with measurements whose noise is neither Gaussian nor homoscedastic, Fraga et al. introduced a methodology to estimate the LOD of chemical systems based on repeated measurements with and without the chemical of interest in a test sample [9]. Because the presence or absence of the chemical is known by the practitioner, the authors were able to estimate the Type I and Type II error probabilities from the predictions of the chemical system. Then, to estimate the LOD, the concentration of the chemical of interest was increased gradually and the error probabilities were evaluated at each concentration level. Finally, the LOD was set at the concentration that made the error probabilities lower than a defined threshold (10% for both types of error). The authors ultimately used their methodology to optimize the operation of the system by building a Receiver Operating Characteristic (ROC) curve while varying the relevant parameter.
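The error-rate scan described above can be sketched as follows. This is not Fraga et al.'s implementation: the simulated instrument (a reading equal to the concentration plus unit-variance Gaussian noise), the fixed decision threshold of 1.5, the tested concentration levels, and the helper names `measure`, `predict`, and `scan_lod` are all hypothetical choices made for illustration.

```python
import random

def error_rates(predict, blank_readings, analyte_readings):
    """Empirical Type I and Type II error probabilities from labelled replicates."""
    # Type I: a blank reading is (wrongly) predicted as "analyte present"
    type1 = sum(1 for r in blank_readings if predict(r)) / len(blank_readings)
    # Type II: an analyte reading is (wrongly) predicted as "analyte absent"
    type2 = sum(1 for r in analyte_readings if not predict(r)) / len(analyte_readings)
    return type1, type2

def scan_lod(concentrations, measure, predict, n_rep=2000, max_error=0.10, seed=0):
    """Return (LOD, type1, type2) for the lowest tested concentration whose two
    error rates are both <= max_error, or None if no tested level qualifies."""
    rng = random.Random(seed)
    blanks = [measure(0.0, rng) for _ in range(n_rep)]
    for c in sorted(concentrations):
        analyte = [measure(c, rng) for _ in range(n_rep)]
        type1, type2 = error_rates(predict, blanks, analyte)
        if type1 <= max_error and type2 <= max_error:
            return c, type1, type2
    return None

# Hypothetical instrument: reading = concentration + Gaussian noise (sigma = 1)
def measure(c, rng):
    return c + rng.gauss(0.0, 1.0)

# Hypothetical fixed decision rule on the reading
def predict(reading):
    return reading > 1.5

levels = [0.5 * k for k in range(1, 11)]   # 0.5, 1.0, ..., 5.0
result = scan_lod(levels, measure, predict)
print(result)
```

Note that the result depends on the decision threshold as much as on the noise; sweeping that threshold and plotting the two error rates against each other is what produces a ROC curve.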


All methodologies based on the probability of Type I and Type II errors implicitly assume that the two states representing analyte/no-analyte exposure are presented with the same probability, i.e., the same a priori probability for both classes. However, in most applications one of the possible states is expected to occur more often than the other, so the prior probability is skewed towards one of the classes. In this paper we show that the amount of information about the sample that can be extracted from the analytical system depends on the prior probability of presence or absence of the analyte. We propose a new definition of LOD based on the amount of information that the chemical system can extract from the presented stimulus. This new approach clearly shows that the amount of information provided by the analytical method depends strongly on the prior probabilities. In contrast to previous definitions, our methodology, which is based on information-theoretic tools [10], is sensitive to the probability of presenting the analyte and deals with noise of any kind.

The remainder of the paper is organized as follows. In Section 2 we review previous definitions of LOD. Section 3 explores the amount of information that can be extracted by a chemical system, Section 4 presents the proposed methodology to estimate the LOD, Section 5 gives two examples of LOD estimation, and Section 6 presents the conclusions of this work.

2. Limits of Classic Definitions of LOD

The LOD is a fundamental figure of merit whose definition has changed over the years. An early definition of LOD adopted by the IUPAC in 1975 [4] stated that "the limit of detection, expressed as a concentration CL (or amount, qL), is derived from the smallest measure, XL, that can be detected with reasonable certainty for a given analytical procedure", the value XL being given by the following equation:

$$X_L = \bar{x}_b + k\,\sigma_b \qquad (1)$$

where $\bar{x}_b$ represents the mean of the blank measures, σb is its standard deviation, and k is a parameter set according to a defined confidence level [11]. Note that the LOD is expressed in units of concentration and that the distribution of the samples is estimated in the input space (concentration). However, since systems are usually simplified to a linear input-output relationship, the standard deviation of the blank measurements is often estimated in the output space, and the LOD is converted into analyte concentration using a previously obtained calibration model:

$$\mathrm{LOD} = \frac{k\,\sigma_b^{y}}{b} \qquad (2)$$

where b is the slope of the linear calibration and $\sigma_b^{y}$ is the standard deviation of the blank measurements calculated in the output space (sensor output).
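Equations (1) and (2) reduce to a few lines of arithmetic. The sketch below uses hypothetical blank readings and a hypothetical calibration slope, and it takes the sample standard deviation of the blank replicates as the estimate of σb (Eq. (1), applied here in the units of the readings) and of $\sigma_b^{y}$ (Eq. (2)).

```python
from statistics import mean, stdev

def decision_level(blank_values, k=3.0):
    """Eq. (1): X_L = mean of the blanks + k * their standard deviation
    (in the same units as the blank values)."""
    return mean(blank_values) + k * stdev(blank_values)

def classic_lod(blank_outputs, slope, k=3.0):
    """Eq. (2): LOD = k * sigma_b^y / b, with sigma_b^y estimated from blank
    replicates in the output space and b the calibration slope."""
    return k * stdev(blank_outputs) / slope

# Hypothetical blank replicates (sensor output units) and calibration slope
blank_outputs = [2.1, 1.9, 2.0, 2.2, 1.8]
slope = 0.5

print(decision_level(blank_outputs))              # X_L in output units, k = 3
print(classic_lod(blank_outputs, slope))          # LOD in concentration units, k = 3
print(classic_lod(blank_outputs, slope, k=3.3))   # LOD with the factor of Eq. (4)
```

Only the blank dispersion and the slope enter this estimate; nothing about the noise shape or how often blanks occur is used.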

Based on Kaiser's work, k is usually set to 3, which corresponds to a confidence level of 99.86% for a one-sided normal distribution [12]. In 1995, the IUPAC adopted a new definition that considers the probabilities of both Type I and Type II errors [6]. In this approach, the calculation of XL includes the standard deviation of the net concentration when the analyte is not present (σb) and when the analyte is present at the level of the LOD (σLOD). Assuming that the noise follows a normal distribution, XL can then be expressed as:

$$X_L = \bar{x}_b + z_{1-\alpha}\,\sigma_b + z_{1-\beta}\,\sigma_{LOD} \qquad (3)$$

where α and β define the thresholds for the Type I and Type II error probabilities, and z1-α and z1-β are, respectively, the upper percentage points of the two noise distributions that limit the accepted probabilities. If the noise in the system is considered homoscedastic (i.e., with constant variance, so that σLOD = σb) and Type I and Type II errors are limited to 5% (as recommended by IUPAC), Eq. (3) can be rewritten as [13]:

$$X_L = \bar{x}_b + 2\,z_{0.95}\,\sigma_b \approx \bar{x}_b + 3.3\,\sigma_b \qquad (4)$$

where the factor 3.3 is the common value used to estimate the LOD when the noise is considered homoscedastic and its variance is known. If the variance of the noise is unknown, which is the common scenario, the z-values must be replaced by the equivalent t-values of Student's t-distribution [14]. Figure 1 shows a visual comparison between the two classic LOD definitions outlined above². Despite the efforts made by the IUPAC to standardize the more rigorous definition adopted in 1995, many investigations still present LOD estimates based on the definition that only considers false positive errors. The limitations of the frequent simplifications in methodologies that estimate the LOD from error probabilities have been analyzed previously [7, 15]. They include the assumptions that errors in the blank signal are normally distributed, systematic errors are negligible, errors occur only in the y-direction, the y-intercept is not significantly different from the measured value, and a good calibration function is obtained. Additionally, these methodologies do not take into account the a priori probabilities of measuring blank samples or the chemical of interest.
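The factor in Eq. (4) and the confidence level quoted for k = 3 follow from the standard normal quantile function. A minimal check with Python's standard-library `statistics.NormalDist` (the function name `iupac_factor` is ours; the Student's t correction mentioned above is not shown, since it needs the number of replicates and a t quantile function outside the standard library):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def iupac_factor(alpha=0.05, beta=0.05):
    """k = z_{1-alpha} + z_{1-beta}, i.e. Eq. (3) with sigma_LOD = sigma_b."""
    return std_normal.inv_cdf(1 - alpha) + std_normal.inv_cdf(1 - beta)

k_iupac = iupac_factor()
conf_k3 = std_normal.cdf(3.0)      # one-sided confidence level for k = 3

print(f"k (alpha = beta = 0.05): {k_iupac:.3f}")
print(f"one-sided confidence for k = 3: {conf_k3:.5f}")
```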

3. Information Theory Applied to Chemical Sensing

Over six decades ago, Shannon developed the discipline of Information Theory (IT), a mathematical model of communication that quantifies the efficiency of data transmission over noisy channels by measuring the information content at the source and at the receiver [10]. Information Theory is today a sound and complete framework for parameter estimation and for learning machines in general [16]. In measurement science in general, and in chemical sensing in particular, Information Theory techniques are basic tools for algorithm analysis, parameter evaluation and optimization, molecular visualization, feature selection, and inference, as attested by numerous works [17-27].

² The reader is referred to comprehensive reviews [13-17] for a more detailed discussion of these definitions.

The efficiency of a communication system is evaluated by comparing the a posteriori probability (i.e., the probability of decoding the original message after its reception) with the a priori probability (i.e., the probability of guessing the original message without any additional information). Hence, it is possible to establish an analogy between a communication system for data transmission and an analytical system: the analyte presence/absence can be seen as a source of information, the analytical system is equivalent to the transmission channel, and the output of the analytical system (sensor reading) represents the received message. By virtue of this analogy, the efficiency of an analytical system can be evaluated by estimating the a posteriori probability (the probability assigned to each state of the source knowing the output of the analytical system) and the a priori probability (the probability assigned to each state with no other information available). The source of information can be a continuous quantity if the goal is to estimate the analyte concentration, or a discrete variable if the purpose of the system is to determine the presence/absence of the analyte.

This study focuses on determining whether or not the target compound is present in a test sample. Hence, the source of information is a binary random variable (analyte present/absent). The output of the analytical system, after the definition of a proper decision threshold, is also a binary random variable, since it predicts the presence or absence of the target compound. Ideally, both variables should present the same state but, as in data transmission over a noisy channel, there may be Type I (false positive) and Type II (false negative) errors. Figure 2 shows a schematic representation of the analytical system with the error probabilities. This analogy corresponds to a binary communication channel, where the emitter transmits a bit representing the presence/absence of the analyte and the receiver acquires a bit. The probability of flipping the bit during transmission, which is equivalent to a wrong prediction by the analytical system, determines the efficiency of the analytical system.

Information Theory is based on probability theory and statistics. The two most important measures of information in Shannon's mathematical theory of communication are entropy (S), the information contained in a random variable, and Mutual Information (MI), the amount of information that a second random variable Y yields about the random variable of interest X.

First, the entropy is the amount of self-contained information in a process and describes the level of uncertainty (or disorder) of a system. The entropy of a Discrete Memoryless Source (DMS) depends on the probability of presenting each of its N possible states (or symbols) xi:

$$S = -\sum_{i=1}^{N} p(x_i)\,\log_2 p(x_i) \qquad (5)$$

where S is expressed in bits and represents the minimum average number of bits per symbol needed by a receiver to reconstruct the original message, and p(xi) is the probability of finding the state xi. For a DMS with two different states (presence or absence of an analyte, see Fig. 2), the entropy can be expressed simply as a function of p(x1), the probability of finding the state x1; thus

$$S = -p(x_1)\,\log_2 p(x_1) - [1 - p(x_1)]\,\log_2[1 - p(x_1)] \qquad (6)$$

where p(x2) = 1 − p(x1). In the extreme case in which a system has a probability p(x1) = 1, the system always presents the state x1; hence there is no uncertainty and S = 0. Figure 3 illustrates the entropy of a two-state DMS, which is maximized when both states are equally probable, i.e., p(x1) = p(x2) = 0.5, providing a maximum entropy of 1 bit.

Second, the Mutual Information is a measure of the information about one random variable contained in another random variable. Hence, the MI quantifies the amount of information on the state of the variable X contained in the known state of another random variable Y:

$$MI = \sum_{i,j} p(i,j)\,\log_2 \frac{p(i,j)}{p_x(i)\,p_y(j)} \qquad (7)$$

where px(i) and py(j) are the marginal probability distribution functions of variables X and Y, and p(i,j) is their joint probability distribution function. In our analogy, X is the absence/presence of the analyte, while Y is the thresholded binary output of the analytical instrument. Our approach is based on the evaluation of the MI given by the observation of Y regarding X. If the two random variables are statistically independent, the known state of the first variable does not bring any information on the unknown state of the second variable, and MI = 0. Conversely, if both variables are coincident, the known state of the first variable makes the state of the second variable perfectly ascertainable, and the MI equals S.
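Equations (5) to (7) can be evaluated directly on the 2 × 2 joint distribution of the source X and the thresholded output Y. The sketch below (function names are ours) reproduces the two limiting cases just described: an output independent of the source gives MI = 0, and an error-free output with equiprobable states gives MI = S = 1 bit.

```python
from math import log2

def entropy(probs):
    """Eq. (5): S = -sum p log2 p, skipping zero-probability states."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """Eq. (7) for a joint table joint[i][j] = p(X = i, Y = j)."""
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    return sum(
        p * log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

p_blank = 0.5
perfect = [[p_blank, 0.0], [0.0, 1 - p_blank]]   # Y always equals X
independent = [[0.25, 0.25], [0.25, 0.25]]       # Y carries no information on X

print(entropy([p_blank, 1 - p_blank]))           # Eq. (6) at p(x1) = 0.5
print(mutual_information(perfect))
print(mutual_information(independent))
```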


From Eq. (7) it follows that 0 ≤ MI ≤ S; hence, the maximum MI of an analytical system working to discriminate the presence/absence of a compound is 1 bit (Smax = 1 bit, see Fig. 3), which corresponds to a system where both states (presence/absence of analyte) are equally probable and there are no Type I or Type II errors. In general, however, the MI of a DMS depends on the a priori probability of the states and on the probabilities of Type I and Type II errors. Figure 4 shows the MI between the source (presence/absence of analyte) and the analytical prediction of presence/absence of analyte for different probabilities of presenting a blank sample in the source and different Type I and Type II error probabilities³. When presence and absence of the analyte are equally probable in the source (p(x1) = pblank = 0.5), the maximum value of MI is 1 bit and the obtained map is symmetric. However, when the probability of presenting a blank sample increases, the maximum value of MI decreases and the Type I error becomes more significant, because blank samples, the only input that can produce a false positive reading, are presented more often.

4. Definition of LOD based on Mutual Information

Our methodology to estimate the LOD of analytical systems is based on the analogy with a binary channel, where the source of information is the random variable representing the absence/presence of an analyte (represented by 0/1) and the random variable representing the prediction made by the analytical system corresponds to the received message. The maximum MI between these two variables is 1 bit, which would correspond to an ideal analytical system with zero errors (Type I or Type II, i.e., α = β = 0, see Fig. 2) and a prior probability of analyte presence of 50%. Two different contributions affect the amount of information, MI, that can be extracted from the system given knowledge of the output Y. On the one hand, the prior probability of analyte presence limits the entropy of the source and hence the MI. On the other hand, the efficiency of the analytical system itself, given by the Type I and Type II error probabilities, determines how often the predictions made by the system are wrong. Therefore, in order to estimate the LOD of an analytical system, it is necessary (i) to set the desired thresholds for the Type I and Type II error probabilities, and (ii) to estimate the prior probability of presenting blank/analyte samples.

³ We assigned the state 'blank sample presented' to x1 and the state 'analyte presented' to x2. Therefore, from now on, the probability of presenting a blank sample to an analytical system is p(x1) = pblank.

The thresholds for the Type I and Type II error probabilities can be set to any value according to the needs and restrictions of the user. The typical IUPAC definition of the LOD suggests a probability of 5% for both Type I and Type II errors. In this work we assume that there is some a priori knowledge that allows the estimation of the probability of presenting the analyte or a blank sample. However, if such information is not available, the results of the IUPAC definition are reproduced by setting the probability of presenting the analyte to 50%. Therefore, before estimating the LOD, our methodology requires the parameters α, β, and pblank to be defined in order to determine the MI threshold, MIth. Once MIth is set, the LOD is given by the minimum analyte concentration that makes the MI between the source of information and the analytical system higher than MIth. Figure 5 shows MIth for different values of the Type I and Type II errors and of the probability of presenting blank samples, and provides a guide to determine MIth from the relevant parameters (α, β, and pblank). For the convenience of the reader, the MIth values from Figure 5 for the most common scenarios are provided in Table 1 (Equation A.1 in the appendix gives MIth for arbitrary values of α, β, and pblank).
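Equation A.1 is not reproduced in this section. Assuming that MIth is the MI of the binary channel of Fig. 2 operating exactly at the tolerated error probabilities α and β with prior pblank, it can be sketched as below (the function names are ours). With pblank = 0.5 and α = β, the expression reduces to 1 − S, where S is Eq. (6) evaluated at p(x1) = α; the loop shows how the threshold drops as blank samples become more frequent.

```python
from math import log2

def binary_channel_mi(p_blank, alpha, beta):
    """MI (bits) between source X (blank / analyte) and prediction Y for a binary
    channel with Type I probability alpha and Type II probability beta."""
    p_analyte = 1.0 - p_blank
    # Joint p(x, y): rows = blank, analyte; columns = predicted blank, predicted analyte
    joint = [[p_blank * (1 - alpha), p_blank * alpha],
             [p_analyte * beta, p_analyte * (1 - beta)]]
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def mi_threshold(alpha=0.05, beta=0.05, p_blank=0.5):
    """Assumed form of MI_th: the channel MI at the tolerated error rates."""
    return binary_channel_mi(p_blank, alpha, beta)

for p_blank in (0.5, 0.7, 0.9, 0.99):
    print(f"p_blank = {p_blank:4.2f}  MI_th = {mi_threshold(p_blank=p_blank):.4f} bits")
```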


In summary, as the concentration rises, the system provides more information about the input, and the MI between the binary input variable (gas present/gas absent) and the binary output variable (prediction of the sensor) increases. An MI threshold is defined from the accepted tolerances for Type I and Type II errors and the probability of presenting blank samples (see Figure 5). Finally, the LOD is the lowest concentration level that makes the MI higher than the threshold.
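The three steps can be illustrated end to end on a simulated instrument. The sketch rests on assumptions that are ours, not the paper's: homoscedastic Gaussian noise with σ = 1 in concentration units, a binary output obtained by thresholding the reading, a decision threshold chosen on a grid between 0 and the tested concentration so as to maximize MI, MIth taken as the channel MI at the tolerated α and β, and a concentration scan in steps of 0.01. With pblank = 0.5 and α = β = 0.05 the scan lands next to the 3.3σ factor of Eq. (4).

```python
from math import log2
from statistics import NormalDist

def binary_channel_mi(p_blank, alpha, beta):
    """MI (bits) between the source (blank/analyte) and the thresholded prediction."""
    joint = [[p_blank * (1 - alpha), p_blank * alpha],
             [(1 - p_blank) * beta, (1 - p_blank) * (1 - beta)]]
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

NOISE = NormalDist(0.0, 1.0)  # assumed noise model, in concentration units

def best_mi_at(conc, p_blank, n_thresholds=100):
    """Largest MI over decision thresholds t in [0, conc]; reading > t means 'analyte'."""
    best = 0.0
    for i in range(n_thresholds + 1):
        t = conc * i / n_thresholds
        alpha = 1.0 - NOISE.cdf(t)        # a blank reading exceeds t
        beta = NOISE.cdf(t - conc)        # an analyte reading stays at or below t
        best = max(best, binary_channel_mi(p_blank, alpha, beta))
    return best

def mi_lod(p_blank=0.5, alpha=0.05, beta=0.05, step=0.01, max_conc=20.0):
    """Lowest scanned concentration whose best MI exceeds MI_th (None if none does)."""
    mi_th = binary_channel_mi(p_blank, alpha, beta)   # assumed form of MI_th
    for k in range(1, int(round(max_conc / step)) + 1):
        conc = k * step
        if best_mi_at(conc, p_blank) > mi_th:
            return conc
    return None

lod_half = mi_lod(p_blank=0.5)
print(f"MI-based LOD (p_blank = 0.5): {lod_half:.2f} sigma")
```

The threshold-selection rule is the main design choice here: a different rule (for instance, fixing the threshold so that the blanks meet α) would change the estimates obtained for pblank ≠ 0.5.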


It is important to note that MIth depends on three parameters that need to be defined before estimating the LOD: the Type I and Type II error probabilities and the probability of presenting blank samples. The first two parameters depend exclusively on the accepted tolerances for the errors made by the system and can take any value agreed upon by the practitioner, the community, or standardization agencies. However, estimating the probability of presenting blank samples may not be straightforward. Although in most common scenarios such information is available and can be obtained either from previous experiments or from the literature, the practitioner sometimes has to face a lack of information. If no information on pblank is available, one can simply assume pblank = 50% to reproduce the LOD values proposed by the IUPAC. A more accurate solution is to estimate pblank from the same set of measurements used to calculate the LOD. If the samples used to estimate the LOD are obtained from the same environment from which the system acquires samples in normal operation, the pblank obtained from this subset of samples can be expected to be a good estimate of the actual pblank. In this scenario, the practitioner should first collect all the samples to estimate pblank and then determine MIth to estimate the LOD.

5. Examples of LOD estimation

In this section we validate our methodology with two different analytical systems. First, we estimate the LOD of a system using synthetic data to show that the noise distribution of the measurement does indeed affect the LOD estimate. Second, we estimate the LOD of an analytical system to detect benzene, a compound of special interest due to its carcinogenic properties and its presence in industrial environments.

5.1 LOD estimation under different noise probability density functions (pdf)

In order to compare our definition of LOD with the methodologies that assume homoscedastic Gaussian noise and neglect the probability of presenting the analyte, we simulated a system with different noise distributions. Specifically, we built a linear model of a system with sensitivity S = 0.5 and offset 2:

$$Y = 0.5\,X + 2 \qquad (8)$$

We generated synthetic noise with different probability density functions and added it to the system output (Y). In particular, we studied four different noise distributions⁴: a) Gaussian noise with mean 0 and standard deviation σ = 1; b) a uniform distribution in the interval (−1.73, +1.73); c) a discrete binary noise distribution where the values ±1 are equally probable; and d) Gaussian noise with mean 0 and a standard deviation that increases linearly with the input signal according to σ = 1 + 0.1X. The latter distribution reproduces a system whose variability grows with the input concentration, a common attribute of analytical systems [28]. Note that the four distributions above have been designed to have the same standard deviation at zero concentration (σ ≈ 1; for the uniform case, 1.73/√3 ≈ 1). Therefore, the measured standard deviation of the blank samples (in the output space Y) is sn = 0.5 for all the noise distributions, and the estimated LOD coincides for the methodologies that only consider the dispersion of the blank samples. In particular, assuming k = 3.3, the LOD estimate is 3.3 in all the considered examples.
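The coincidence of the classic estimates is simple arithmetic: each noise distribution has (approximately) unit standard deviation at zero concentration in the input space, the sensitivity of Eq. (8) scales it to 0.5 in the output space, and Eq. (2) divides by that same slope. A short check using the analytic standard deviation of each distribution (the dictionary labels are ours):

```python
from math import sqrt

SLOPE = 0.5   # sensitivity of Eq. (8)
K = 3.3       # factor of Eq. (4)

# Standard deviation of each noise distribution at zero concentration (input space)
sigma_at_zero = {
    "a) Gaussian, sigma = 1": 1.0,
    "b) uniform on (-1.73, +1.73)": 1.73 / sqrt(3.0),   # half-width / sqrt(3)
    "c) binary, +/-1 equiprobable": 1.0,                # sqrt(E[n^2]) = 1
    "d) Gaussian, sigma = 1 + 0.1 X": 1.0 + 0.1 * 0.0,  # evaluated at X = 0
}

classic_lods = {}
for name, sigma_x in sigma_at_zero.items():
    s_n = SLOPE * sigma_x                   # blank standard deviation in the output space Y
    classic_lods[name] = K * s_n / SLOPE    # Eq. (2); the slope cancels
    print(f"{name:32s} s_n = {s_n:.3f}  LOD = {classic_lods[name]:.2f}")
```

Because only the blank dispersion enters, the shape of the noise and its growth with concentration (case d) are invisible to this estimate.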


The methodology based on MI estimation, however, is sensitive to the different noise distributions and to the probability of presenting blank samples. Figure 6 illustrates that MI increases differently with concentration for the different noise distributions, which is why the MI-based LOD estimates differ. Figure 7 shows the LOD estimates for the different noise distributions and probabilities of blank samples (pblank). From Figure 7 we can conclude that i) the LOD changes for different noise distributions if we want to obtain the same amount of information from the analytical system, and ii) the LOD needs to be more restrictive (higher analyte concentration) when the probability of presenting blank samples increases. Although the effect of pblank becomes significant only when blank samples are expected most of the time, this is a scenario faced by many applications. Additionally, the variation of the LOD caused by different values of pblank or of the Type I and Type II error probabilities is comparable to the divergence among the parameters of other methodologies and definitions. For example, a change in the parameter k from 3 to 3.3 represents a variation of the LOD estimate of the same order of magnitude as the variations caused by the different noise distributions and pblank. Therefore, the MI-based methodology can provide sound and consistent LOD estimates, can be adapted to any value of Type I and Type II errors and, unlike the methodologies presented before, makes it possible to introduce a priori knowledge of the probability of presenting blank/analyte samples.

⁴ The noise distributions are defined in the input space (in units of concentration).

5.2 LOD of a chemosensory system

5.2.1 Data Collection

To illustrate the proposed methodology for estimating the LOD, we studied an analytical system to detect benzene, which has been identified as carcinogenic to humans and can be present in the atmosphere from natural sources such as oil seeps and wildfires, or can originate in industrial environments, gasoline filling stations, and automobile combustion engines [29]. To minimize the risks to individuals exposed to benzene, the Occupational Safety and Health Administration (OSHA) defined permissible exposure limits (PEL) in terms of the time-weighted average (TWA) and short-term exposure limits (STEL) for different industrial scenarios [30]. Therefore, measuring benzene levels is important for worker safety, but such measurements need to be performed reliably with instrumentation whose LOD is clearly defined.


For illustration purposes, we consider a detection system based on a metal oxide (MOX) gas sensor, whose conductivity changes when the sensing layer is exposed to reducing/oxidizing volatiles. MOX sensors are a common choice due to their cost-effective design, native cross-reactivity (i.e., the wide range of volatiles they can detect), sensitivity, and ease of operation [31-33]. In particular, we used a TGS2610 MOX gas sensor (Figaro [34]) placed in a 60 ml air-tight Teflon/stainless steel gas chamber into which benzene could be injected at different concentrations. To make the sensor response as reproducible as possible, the sensor was pre-heated for several days before the set of measurements started, while 200 ml/min of clean air flowed through the test chamber. Briefly, the vapor delivery system was composed of two fluidic branches that met at the injection point before bringing the resulting gas mixture to the test chamber. On the one hand, the solvent branch included a high-pressure cylinder containing the carrier gas (medical-grade dry air supplied by Airgas [35]) connected in series to a mass flow controller (supplied by Bronkhorst High-Tech B.V. [36]) with a maximum flow of 200 ml/min. On the other hand, the solute branch was based on a high-pressure calibrated cylinder of benzene at 500 nmol/mol in air (provided and certified by Airgas), whose flow was regulated by a mass flow controller (Bronkhorst High-Tech B.V.) with a maximum flow rate of 100 ml/min. A computer platform equipped with a National Instruments acquisition board (PCI-6014) and LabView software (ver. 6) was adapted to control the full set of experiments, which required control over several user-defined parameters. First, the sensor's operating temperature was controlled by the voltage applied to the sensor's built-in heating element, which was kept constant at 5 V during the whole set of experiments. Second, the concentration of the gas sample (benzene) was controlled by the flows of the two fluidic branches, in such a way that the total flow was kept constant at 200 ml/min. Finally, the resistance of the sensor was acquired at a sampling frequency of 100 Hz and stored in a computer for further processing.
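As a rough check of the delivery settings, the benzene concentration reaching the chamber follows from mixing the two branches. The sketch below assumes ideal mixing, C = C_cylinder × Q_solute / Q_total, with the 500 nmol/mol cylinder, the 200 ml/min total flow, and the 100 ml/min solute controller range given above; the function name `branch_flows` is hypothetical, and the actual controller set points used in the experiments are not reported in this section.

```python
CYLINDER_NMOL_MOL = 500.0  # benzene in the calibrated cylinder, nmol/mol
TOTAL_FLOW = 200.0         # total flow into the chamber, ml/min
MAX_SOLUTE_FLOW = 100.0    # range of the solute mass flow controller, ml/min


def branch_flows(target_nmol_mol: float) -> tuple:
    """Return (solute, carrier) flows in ml/min for a target concentration.

    Assumes ideal mixing: target = CYLINDER_NMOL_MOL * solute / TOTAL_FLOW.
    """
    solute = target_nmol_mol * TOTAL_FLOW / CYLINDER_NMOL_MOL
    if solute > MAX_SOLUTE_FLOW:
        raise ValueError("target exceeds the solute flow controller range")
    return solute, TOTAL_FLOW - solute


levels = [12.5, 18.75, 25.0, 31.25, 37.5, 43.75, 50.0, 56.25]
for c in levels:
    solute, carrier = branch_flows(c)
    print(f"{c:6.2f} nmol/mol -> solute {solute:5.2f} ml/min, carrier {carrier:6.2f} ml/min")
```

Under this assumption the eight levels correspond to solute flows from 5 to 22.5 ml/min in steps of 2.5 ml/min, well within the range of the solute controller.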


The sensor was exposed to eight different benzene concentration levels (12.5, 18.75, 25, 31.25, 37.5, 43.75, 50, and 56.25 nmol/mol), each repeated 13 times in random order, for a total of 104 measurements. The experimental procedure for each measurement consisted of three steps. First, clean air was circulated through the test chamber for 5 minutes to capture the signal baseline of the sensor. Second, the sensor was exposed for 5 minutes to benzene at a concentration level randomly selected from the list of concentrations. Finally, clean air was circulated again for 5 minutes to purge the gas sample cell.


To consider only the steady-state portion of the sensor signal, we selected the samples of the time series just before the air conditions were changed. In particular, for each measurement, we selected the 4000 samples from 250 s to 290 s to capture the system response to a blank stimulus, and the 4000 samples from 550 s to 590 s to capture the response to the benzene presentation. Therefore, in total, we have 52000 samples (4000 samples × 13 repetitions) for each concentration level and 416000 samples from blank responses (we extracted the blank response from each of the 104 measurements).
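The window arithmetic can be made explicit. The sketch below assumes that each measurement is stored as a one-dimensional array sampled at 100 Hz, with index 0 at t = 0 s (the start of the clean-air baseline step), and that a window from t0 to t1 covers the half-open index range [t0·fs, t1·fs). The storage layout and the function name `steady_state_windows` are hypothetical.

```python
import numpy as np

FS = 100                     # sampling frequency, Hz
BLANK_WINDOW = (250, 290)    # s, end of the clean-air baseline step
ANALYTE_WINDOW = (550, 590)  # s, end of the benzene exposure step


def steady_state_windows(trace: np.ndarray) -> tuple:
    """Return the (blank, analyte) steady-state segments of one measurement.

    Assumes trace[i] was acquired at t = i / FS seconds.
    """
    b0, b1 = (t * FS for t in BLANK_WINDOW)
    a0, a1 = (t * FS for t in ANALYTE_WINDOW)
    return trace[b0:b1], trace[a0:a1]


# One 15-minute measurement (900 s) of placeholder values, used only to check sizes.
trace = np.zeros(900 * FS)
blank, analyte = steady_state_windows(trace)
print(len(blank), len(analyte))              # -> 4000 4000
print(len(analyte) * 13, len(blank) * 104)   # -> 52000 416000
```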


Usually, the sensor baseline is removed to improve the performance of the system. The baseline is estimated from the value of the transient response just before the gas exposure [37]. Therefore, we subtracted the mean of the sensor response during the 40 seconds before the beginning of the gas exposure⁵.
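A minimal sketch of this baseline correction, assuming a trace sampled at 100 Hz with index 0 at the start of the measurement and the gas exposure beginning at t = 300 s (after the 5-minute clean-air step): the mean over the 40 s preceding the exposure is subtracted from the whole trace. The function name and the synthetic trace are hypothetical; no further normalization is applied because none is described in this section.

```python
import numpy as np

FS = 100              # sampling frequency, Hz
EXPOSURE_START = 300  # s, end of the 5-minute clean-air step (t = 0 at measurement start)
BASELINE_SPAN = 40    # s, averaging span before the exposure


def subtract_baseline(trace: np.ndarray) -> np.ndarray:
    """Subtract the mean response over the 40 s before gas exposure."""
    start = (EXPOSURE_START - BASELINE_SPAN) * FS
    stop = EXPOSURE_START * FS
    baseline = trace[start:stop].mean()
    return trace - baseline


# Synthetic trace: constant 10 kOhm baseline and a resistance drop during exposure,
# as expected for an n-type MOX sensor exposed to a reducing gas.
t = np.arange(900 * FS) / FS
trace = np.full_like(t, 10_000.0)
trace[(t >= 300) & (t < 600)] -= 2_000.0
corrected = subtract_baseline(trace)
print(corrected[280 * FS], corrected[570 * FS])  # -> 0.0 -2000.0
```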

5.2.2 LOD Estimation

To estimate the LOD, it is necessary to define the accepted limits for the Type I and Type II errors and to estimate the probability of presenting blank samples. For the following example, we set the limits for the Type I and Type II errors to 5%, as suggested by the IUPAC. However, to better illustrate our methodology, we estimated the LOD for two cases: i) assuming pblank = 50%, which corresponds to the scenario with no a priori knowledge, and ii) assuming pblank = 90%. The corresponding MI thresholds for the LOD estimation are 0.7136 and 0.2978, respectively (see Table 1). Finally, it is necessary to compute the MI between the information source and the sensor prediction for different benzene concentrations. The LOD is then defined by the minimum benzene concentration level that makes the MI larger than the corresponding threshold value.
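The two thresholds follow from the closed-form MI of a binary channel with Type I error α, Type II error β, and prior pblank (the four-term expression given in the Appendix). The sketch below is an illustrative Python transcription of that sum; the function name `mi_threshold` is hypothetical.

```python
import math


def mi_threshold(p_blank: float, alpha: float, beta: float) -> float:
    """Mutual information (bits) between the source and the system prediction.

    Source: blank with probability p_blank, analyte otherwise.
    alpha = P(predict analyte | blank), beta = P(predict blank | analyte).
    """
    p_pred_blank = p_blank * (1 - alpha) + (1 - p_blank) * beta
    p_pred_analyte = p_blank * alpha + (1 - p_blank) * (1 - beta)
    # Each term: (joint probability, conditional probability, marginal of the prediction).
    terms = [
        (p_blank * (1 - alpha), 1 - alpha, p_pred_blank),
        (p_blank * alpha, alpha, p_pred_analyte),
        ((1 - p_blank) * beta, beta, p_pred_blank),
        ((1 - p_blank) * (1 - beta), 1 - beta, p_pred_analyte),
    ]
    # Zero-probability terms contribute nothing (0 * log 0 = 0 by convention).
    return sum(joint * math.log2(cond / marg) for joint, cond, marg in terms if joint > 0)


for p_blank in (0.5, 0.9):
    print(f"pblank = {p_blank:.0%}: MI_th = {mi_threshold(p_blank, 0.05, 0.05):.4f}")
# -> pblank = 50%: MI_th = 0.7136
# -> pblank = 90%: MI_th = 0.2978
```

The same function reproduces other entries of Table 1, for example `mi_threshold(0.9, 0.05, 0.10)` ≈ 0.2663.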


To estimate the LOD of a system for a given pblank, it is necessary to build a dataset with the same ratio of blank measurements to analyte exposures as pblank. Therefore, to estimate the LOD for pblank = 50%, we built a dataset of sensor predictions with a balanced number of blank and analyte exposures, whereas the dataset used to estimate the LOD for pblank = 90% was unbalanced accordingly (nine blank measurements per analyte exposure). The MI between the source of information (analyte present/absent) and the sensor prediction increases for higher benzene concentrations (see Figure 8). The MI threshold that defines the LOD is set to 0.7136 (pblank = 50%) and 0.2978 (pblank = 90%). According to the definition proposed in this work, the LOD is given by the smallest input concentration that makes the MI higher than the corresponding threshold. Therefore, for each pblank we fitted a 4th-degree polynomial to the MI values of Fig. 8 to determine the concentration level at which the MI reaches the corresponding threshold. The obtained LOD estimates are 22.4 nmol/mol and 25.1 nmol/mol for pblank = 50% and pblank = 90%, respectively. We also estimated the LOD with the definition proposed by the IUPAC. The standard deviation of the blank measurements (in the output space) is 7.8 Ω/Ω. To convert the LOD to the concentration space, we used the Clifford-Tuma model, which relates the resistance of a MOX sensor exposed to clean air to its resistance when exposed to different gas concentrations [38, 39]. The LOD estimated from the standard deviation of the blank samples (k = 3.3) would be 21.1 nmol/mol, which is an optimistic value (especially for low probabilities of presenting the analyte) compared with the LOD estimates based on Information Theory. From Figure 8 we can conclude that, if the thresholds for the Type I and Type II error probabilities, α and β, are held constant, the LOD must be shifted towards higher concentration levels as the prior probability of presenting blank samples increases.

⁵ The collection of the full set of experiments took about 32 hours, and the measurements were acquired during day and night. Although uncontrolled variables such as room temperature and pressure can be considered constant during a single experiment (15 minutes), they affect the baseline of the sensor over the whole duration of the experiment set.
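The two computational steps of this procedure, estimating the MI from a labelled set of predictions and locating the threshold crossing with a 4th-degree polynomial, can be sketched as follows. This is an illustrative Python/NumPy sketch, not the authors' code: `empirical_mi` is a plug-in estimate from the confusion counts, and because the measured MI values of Figure 8 are not tabulated in the text, the curve passed to `lod_from_fit` is synthetic (Gaussian noise with a hypothetical σ = 7 nmol/mol, pblank = 50%, and a decision threshold halfway between the blank and analyte means). Both function names are hypothetical.

```python
import math

import numpy as np


def empirical_mi(labels: np.ndarray, preds: np.ndarray) -> float:
    """Plug-in MI (bits) between true labels and predictions (0 = blank, 1 = analyte)."""
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_xy = np.mean((labels == x) & (preds == y))
            if p_xy > 0:
                p_x, p_y = np.mean(labels == x), np.mean(preds == y)
                mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def lod_from_fit(conc: np.ndarray, mi: np.ndarray, mi_th: float, deg: int = 4):
    """Smallest concentration at which a degree-`deg` polynomial fit of MI equals mi_th."""
    coeffs = np.polyfit(conc, mi, deg)
    coeffs[-1] -= mi_th  # roots of fit(c) - mi_th
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    inside = real[(real >= conc.min()) & (real <= conc.max())]
    return float(inside.min()) if inside.size else None


# Dataset with pblank = 90%: 900 blanks (45 false positives), 100 analyte samples (5 missed).
labels = np.array([0] * 900 + [1] * 100)
preds = np.array([0] * 855 + [1] * 45 + [0] * 5 + [1] * 95)
print(f"empirical MI: {empirical_mi(labels, preds):.4f}")  # -> 0.2978

# Synthetic MI curve at the eight benzene levels (hypothetical sigma, pblank = 50%).
conc = np.array([12.5, 18.75, 25, 31.25, 37.5, 43.75, 50, 56.25])
sigma = 7.0
alpha = 0.5 * np.array([math.erfc(c / (2 * sigma * math.sqrt(2))) for c in conc])
mi_curve = 1 + alpha * np.log2(alpha) + (1 - alpha) * np.log2(1 - alpha)  # 1 - H(alpha)
print(f"LOD from fit: {lod_from_fit(conc, mi_curve, 0.7136):.1f} nmol/mol")
```

With real data, the fitted polynomial should be checked for monotonicity over the measured range before its first crossing is taken as the LOD.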

6. Conclusions

In this work, we showed that the amount of information on the absence/presence of a target analyte that can be extracted using a particular instrumental technique depends not only on the prior probabilities of presenting the analyte or a blank sample, but also on the noise distribution. In other words, to obtain the same information on the chemical source, the limit of detection must be set at different concentration levels if the noise distribution is different. This result is relevant because analytical systems are usually compared under the misconception that if the error probabilities (false positives and/or false negatives) of two systems are the same, the systems are equivalent regardless of the noise distribution present in the system. Based on the amount of information, we proposed a methodology to estimate the LOD that deals with noise of any kind and can provide more accurate comparisons between systems.


Using synthetic and experimental data with different noise distributions, we showed that our methodology better captures the information contained in the analytical signals and is sensitive to the noise distribution and the prior probabilities, whereas classical methodologies provide the same LOD estimate whenever the standard deviation of the blank samples is the same. We showed that a limit of detection estimated from the actual information that can be extracted from an analytical system is more informative than one based only on error probabilities, thereby providing better comparisons between systems. Therefore, we believe that our methodology can be of special interest for obtaining accurate benchmarks of analytical systems, and that it can be used to estimate the LOD of a wide variety of analytical systems. Further investigations on the definition of the LOD and the methodologies to estimate it may include maximum likelihood ratio tests [40], which have already been used to evaluate the discrimination ability of analytical systems and can also deal with unbalanced numbers of samples [41-43]. The Neyman-Pearson lemma [44] shows that, for a given Type I error rate, likelihood ratio testing provides the test with the lowest Type II error rate (maximum power) for simple-vs-simple hypotheses. Therefore, when performing a hypothesis test between H0: µ = µ0 and H1: µ = µ1, the likelihood ratio test is the most powerful test. However, when designing a test to estimate the LOD of an analytical system, a new sample has a certain probability of belonging to class blank (H0) and a certain probability of belonging to class analyte (H1). Therefore, one does not have to reject H0 in favor of H1, since the goal is the optimization of a combined hypothesis of the form ηH0 + (1-η)H1. Additionally, the models for class blank and class analyte must be built from experimental data, and their parameters will unavoidably carry some uncertainty that may limit the power of the likelihood ratio test. Hence, adapting maximum likelihood ratio tests to estimate the LOD of analytical systems remains as further work.
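For the simple-vs-simple Gaussian case mentioned above, the likelihood ratio test reduces to a threshold on the measured signal, which makes its link with the classical kσ rule explicit. The sketch below assumes H0: µ = µ0 and H1: µ = µ1 > µ0 with a common, known σ, and uses Python's standard `statistics.NormalDist`; the function names are hypothetical. As noted above, this test ignores the prior probability of blank samples.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def np_threshold(mu0: float, sigma: float, alpha: float) -> float:
    """Most powerful level-alpha test of H0: mu = mu0 vs H1: mu = mu1 > mu0.

    For equal-variance Gaussians the likelihood ratio is monotone in x,
    so the test rejects H0 when x exceeds this threshold.
    """
    return mu0 + STD_NORMAL.inv_cdf(1 - alpha) * sigma


def type_ii_error(threshold: float, mu1: float, sigma: float) -> float:
    """Probability of accepting H0 when H1 (mean mu1) is true."""
    return NormalDist(mu1, sigma).cdf(threshold)


def min_separation(alpha: float, beta: float) -> float:
    """Separation (mu1 - mu0) / sigma needed to meet both alpha and beta."""
    return STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(1 - beta)


t = np_threshold(mu0=0.0, sigma=1.0, alpha=0.05)
print(f"threshold: {t:.3f}")                                          # -> 1.645
print(f"beta at mu1 = 3.29: {type_ii_error(t, 3.29, 1.0):.3f}")       # -> 0.050
print(f"separation for alpha = beta = 5%: {min_separation(0.05, 0.05):.2f} sigma")  # -> 3.29
```

With α = β = 5% the required separation is about 3.29σ, which corresponds to the k = 3.3 factor used in the IUPAC-based estimate above.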


Appendix:

Mutual Information for arbitrary values of α (Type I error probability), β (Type II error probability), and p1 (probability of presenting a blank sample):

$$
\begin{aligned}
\mathrm{MI} ={}& p_1(1-\alpha)\log_2\frac{1-\alpha}{p_1(1-\alpha)+(1-p_1)\beta}
+ p_1\alpha\log_2\frac{\alpha}{p_1\alpha+(1-p_1)(1-\beta)} \\
&+ (1-p_1)\beta\log_2\frac{\beta}{p_1(1-\alpha)+(1-p_1)\beta}
+ (1-p_1)(1-\beta)\log_2\frac{1-\beta}{p_1\alpha+(1-p_1)(1-\beta)}
\end{aligned}
$$

Acknowledgements:

This work has been supported by the Jet Propulsion Laboratory under contract number 2013-1479652 and partially funded by the Spanish Ministerio de Economía y Competitividad under the project TEC2011-26143. Alex Vergara was financially supported by the NIST/NIH Research Associateship program administered by the National Research Council and partially financed by NATO under the Science for Peace & Security Program under grant no. SPS-984511. Santiago Marco is a member of the consolidated research group SGR2009-0753 of the Generalitat de Catalunya. The authors also thank Joanna Zytkowicz for proofreading and revising the manuscript. The suppliers and methodological tools identified herein are specified only to describe the experimental procedures presented in this manuscript. Their mention in no way implies recommendation or endorsement by the National Institute of Standards and Technology.

Vitae:

Jordi Fonollosa (Ph.D., 2009 – University of Barcelona) is a Postdoctoral Researcher at the BioCircuits Institute, UC San Diego. His research is focused on gas sensor array robustness and optimization, support vector machines, and Information Theory applied to chemical sensing. Other strong interests include biologically inspired algorithms, signal recovery systems, and infrared sensing technologies.

Alexander Vergara (Ph.D., 2006 – Universitat Rovira i Virgili) is an NRC research associate jointly working at the National Institute of Standards and Technology (NIST) and the National Institutes of Health (NIH), and a Visiting Research Scholar at the BioCircuits Institute, UC San Diego, where he was a postdoctoral researcher until summer 2012. His work mainly focuses on the use of dynamic methods and information-theoretic formalisms for the optimization of micro gas-sensory systems and on building autonomous vehicles that can localize odor sources through a process resembling biological olfactory processing. His areas of interest also include information theory, signal processing, pattern recognition, feature extraction, chemical sensor arrays, and machine olfaction.

Ramón Huerta (Ph.D., 1994 – Universidad Autónoma de Madrid) is a research scientist at the BioCircuits Institute, UC San Diego. Prior to his current appointment, he was an associate professor at the Universidad Autónoma de Madrid (Spain). His areas of expertise include dynamic systems, artificial intelligence, and neuroscience. His work deals with the development of algorithms for the discrimination and quantification of complex multidimensional time series, model building to understand information processing in the brain, and chemical sensing and machine olfaction applications based on bio-inspired technology. Dr. Huerta's research has produced over 90 articles in peer-reviewed journals at the intersection of computer science, physics, and biology.

Santiago Marco (Ph.D., 1993 – University of Barcelona) is Associate Professor at the University of Barcelona and head of the Signal and Information Processing for Sensor Systems Lab at the Institute for Bioengineering of Catalonia, Barcelona, Spain. His research concerns the development of signal/data processing algorithmic solutions for smart chemical sensing based on sensor arrays or microspectrometers, typically integrated using microsystem technologies. Dr. Marco's research has produced over 100 articles in peer-reviewed archival journals. More at http://www.ibecbarcelona.eu/artificial_olfaction


References:

[1] K.A. James, J.R. Meliker, J.A. Marshall, J.E. Hokanson, G.O. Zerbe, T.E. Byers, Journal of Exposure Science and Environmental Epidemiology (2013).

[2] Analytical Feasibility Support Document for the Second Six-Year Review of Existing National Primary Drinking Water Regulations, US Environmental Protection Agency, 2009.

[3] L.A. Currie, Analytica Chimica Acta, 391 (1999).

[4] Pure and Applied Chemistry, 45 (1976).

[5] E.F. McFarren, R.J. Lishka, J.H. Parker, Analytical Chemistry, 42 (1970) 358.

[6] L.A. Currie, Analytica Chimica Acta, 391 (1999).

[7] E. Desimoni, B. Brunetti, Analytica Chimica Acta, 655 (2009).

[8] E. Voigtman, K.T. Abraham, Spectrochimica Acta Part B-Atomic Spectroscopy, 66 (2011).

[9] C.G. Fraga, A.M. Melville, B.W. Wright, Analyst, 132 (2007) 230.

[10] C.E. Shannon, Bell System Technical Journal, 27 (1948) 379.

[11] H.M.N.H. Irving, H. Freiser, T.S. West, IUPAC, Compendium of Analytical Nomenclature - The Orange Book. Definitive Rules, Pergamon, Oxford, 1978.

[12] H. Kaiser, Analytical Chemistry, 42 (1970) 24A.

[13] R. Boque, A. Maroto, J. Riu, F.X. Rius, Grasas y Aceites, 53 (2002) 128.

[14] M.C. Ortiz, L.A. Sarabia, M.S. Sanchez, Analytica Chimica Acta, 674 (2010).

[15] E. Desimoni, B. Brunetti, R. Cattaneo, Annali di Chimica, 94 (2004).

[16] J. Principe, Information Theoretic Learning, Springer, 2010.

[17] F. Dupuis, A. Dijkstra, Analytical Chemistry, 47 (1975) 379.

[18] K. Eckschlager, Information Theory in Analytical Chemistry, John Wiley & Sons, 1994.

[19] V. David, A. Medvedovici, Journal of Chemical Information and Computer Sciences, 40 (2000) 976.

[20] T.K. Alkasab, J. White, J.S. Kauer, Chemical Senses, 27 (2002) 261.

[21] T. Pearce, A. Sanchez-Montañes, Handbook of Artificial Olfaction Machines, Wiley-VCH, Weinheim, 2002.

[22] P.P. Vazquez, M. Feixas, M. Sbert, A. Llobet, Computers & Graphics-UK, 30 (2006) 98.

[23] A. Vergara, M.K. Muezzinoglu, N. Rulkov, R. Huerta, Sensors and Actuators B-Chemical, 148 (2010) 298.

[24] M. Trincavelli, A. Loutfi, IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, 2010, p. 2852.

[25] J. Fonollosa, A. Gutierrez-Galvez, S. Marco, PLoS One, 7 (2012).

[26] A. Vergara, E. Llobet, Talanta, 88 (2012) 95.

[27] J. Fonollosa, L. Fernández, R. Huerta, A. Gutiérrez-Gálvez, S. Marco, Sensors and Actuators B-Chemical, 187 (2013) 331.

[28] D.T. O'Neill, E.A. Rochette, P.J. Ramsey, Analytical Chemistry, 74 (2002) 5907.

[29] National Toxicology Program, Report on Carcinogens: Carcinogen Profiles, U.S. Dept. of Health and Human Services, Public Health Service, National Toxicology Program, 12 (2011) iii.

[30] Toxic and Hazardous Substances: Benzene, Occupational Safety and Health Administration.

[31] Y.K. Min, H.L. Tuller, S. Palzer, J. Wollenstein, H. Bottner, Sensors and Actuators B-Chemical, 93 (2003) 435.

[32] N. Barsan, D. Koziej, U. Weimar, Sensors and Actuators B-Chemical, 121 (2007) 18.

[33] G.F. Fine, L.M. Cavanagh, A. Afonja, R. Binions, Sensors, 10 (2010) 5469.

[34] Figaro USA, Inc.

[35] Airgas, Inc.

[36] Bronkhorst High-Tech B.V.

[37] J.W. Gardner, P.N. Bartlett, Oxford University Press, New York, 1999.

[38] P.K. Clifford, D.T. Tuma, Sensors and Actuators, 3 (1983) 233.

[39] P.K. Clifford, D.T. Tuma, Sensors and Actuators, 3 (1983) 255.

[40] A.P. Dempster, Statistics and Computing, 7 (1997) 247.

[41] A. Vexler, A. Liu, E. Eliseeva, E.F. Schisterman, Biometrics, 64 (2008) 895.

[42] R. Thiebaut, H. Jacqmin-Gadda, Computer Methods and Programs in Biomedicine, 74 (2004) 255.

[43] H.S. Lynn, Statistics in Medicine, 20 (2001) 33.

[44] J. Neyman, E.S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Springer, 1992.

Tables:

Table 1: Mutual Information (in bits) between the source of information and the prediction of the system for different values of the probability of presenting blank samples (pblank) and of the Type I (α) and Type II (β) error probabilities. The table sets the MI threshold that defines the LOD for different configurations of the system. The case MI (α = 0%; β = 0%) corresponds to the entropy of the system.

| | pblank = 0.5 | pblank = 0.67 | pblank = 0.75 | pblank = 0.9 | pblank = 0.98 |
|---|---|---|---|---|---|
| MI (α = 0%; β = 0%) | 1 | 0.9149 | 0.8113 | 0.4690 | 0.1414 |
| MI (α = 5%; β = 5%) | 0.7136 | 0.6450 | 0.5622 | 0.2978 | 0.0720 |
| MI (α = 5%; β = 10%) | 0.6205 | 0.5688 | 0.4984 | 0.2663 | 0.0646 |
| MI (α = 10%; β = 5%) | 0.6205 | 0.5497 | 0.4727 | 0.2402 | 0.0553 |
| MI (α = 10%; β = 10%) | 0.5310 | 0.477 | 0.4123 | 0.2111 | 0.0488 |

Figure captions

Figure 1: Representation of the classic definitions of LOD for a linear homoscedastic system. The first definition has no control over the probability of Type I errors (left). The second LOD definition considers the probability of Type I and Type II errors (right).

Figure 2: An analytical system is analogous to a communication channel. The presence or absence of an analyte represents the message at the source (the information to be transmitted). The analytical system is equivalent to a noisy channel through which the message is transmitted. The readings of the sensory system determine whether the analyte is present at the source. The probabilities of Type I errors (false positives) and Type II errors (false negatives) are given by α and β, respectively.

Figure 3: Entropy of a two-state discrete memoryless source as a function of the probability of finding the state x1. The entropy takes its maximum value, S = 1 bit, when both states (analyte present/analyte not present) are equally probable (p(x1) = p(x2) = 0.5).

Figure 4: Mutual Information (in bits) between the source (analyte present/absent) and the prediction of the analytical system for different values of the probability of presenting blank samples. Type I errors become more significant as the probability of presenting blank samples increases.

Figure 5: The entropy of an analytical system aimed at discriminating between the absence and presence of an analyte is limited to 1 bit. The MI between the source of information (analyte present or not) and the prediction of the system depends on the probability of presenting a blank sample (pblank) and on the defined thresholds for Type I and Type II errors (α and β, respectively). The LOD can be defined by the amount of information that can be extracted from the analytical system. Once the parameters α, β, and pblank are set, this figure provides a guide to determine MIth. Then, the LOD is the lowest concentration level that makes the MI between the source and the output higher than MIth.

Figure 6: MI across the concentration of X for a system with homoscedastic Gaussian noise with σ = 1 (blue), uniform noise (green), discrete binary noise (red), and Gaussian noise with standard deviation increasing linearly with the input signal according to σ = 1 + 0.1X (black). The probability of presenting a blank sample (pblank) is 50%, so the threshold that defines the LOD is MIth = 0.7136 (see Table 1). The LODs obtained with our methodology for the four systems are 3.30, 3.12, 2.0, and 4.98, respectively. Our methodology for estimating the LOD is sensitive to the noise distribution, whereas the IUPAC methodology provides the same estimate for the four systems (3.3, assuming k = 3.3), since the standard deviation of the blank samples is the same for all the simulated systems.

Figure 7: LOD for an analytical system with different noise distributions. The LOD is estimated so that the amount of information provided by the system remains constant. In contrast to classical definitions, which would estimate LOD = 3.3 in all cases, the MI-based methodology is sensitive to the noise distribution and the probability of presenting blank samples. Gaussian noise (dark blue), uniform distribution (light blue), discrete binary noise (yellow), and Gaussian noise with increasing standard deviation (dark red).

Figure 8: Mutual Information between the source of information (presence/absence of analyte) and the sensor prediction. We estimated the LOD for two different a priori probabilities: pblank = 50% (top) and pblank = 90% (bottom). The red line shows the MI threshold, MIth, which determines the concentration for the LOD. The MI increases for higher benzene concentrations until it reaches the corresponding maximum value set by the entropy.


Highlights

• We propose a definition of the Limit of Detection (LOD) based on Information Theory.
• Analytical systems are compared based on their ability to provide information.
• The methodology to estimate the LOD deals with noise distributions of any kind.
• Our methodology converges to the same LOD values as traditional methods.
• We show different examples of estimating the LOD with our methodology.
