Using the Log-Normal Distribution in the Statistical Treatment of Experimental Data Affected by Large Dispersion

Carlo F. M. Carobbi
Marco Cati
Luigi M. Millanta
Dept. of Electronics and Telecommunications, University of Florence, Italy
0-7803-7835-0/03/$17.00 © 2003 IEEE
Authorized licensed use limited to: Universita degli Studi di Firenze. Downloaded on April 29,2010 at 09:46:16 UTC from IEEE Xplore. Restrictions apply.

Abstract
There are situations in experimental work where large fluctuations of the measurand are experienced, either because of inherent variations of the observed quantity or because of the complexity of the system or process leading to the output quantity. The result is a large spread of the measured values around the center value. It often appears to the experimenter that the spread of values around the mean is not symmetric. Instead, values obtained by multiplying the mean by a certain factor are approximately as likely as those obtained by dividing the mean by the same factor. A log-normal distribution thus appears to be a candidate for representing the distribution of the observed quantity, at least as a simplifying assumption, when such a distribution cannot be assumed a priori on the basis of physical reasoning. An example of a system producing large variations of the observed quantity is radiator-to-receiver transmission in the dominant presence of reflecting surfaces, such as in a screened room. Overall, large variations are often observed in EMC work, where we are usually faced with complex experimental or predictive processes. In the following we describe the procedure for obtaining the parameters of the log-normal distribution that fits a given set of experimental outcomes. This procedure is applied to the measured field distribution in a screened room.

Keywords: Uncertainty, Log-normal, Large dispersion

INTRODUCTION
Uncertainty evaluations in EMC measurements have recently received increasing attention from the scientific and technical community. Several papers, workshops and tutorials have been presented on the subject, and particular attention has been drawn to the use of log versus linear units [1]-[5]. Logarithmic units such as dB, dBm, dB(μV), dB(m⁻¹) and many others have been in wide use in Electromagnetic Compatibility (EMC) technology since its beginning. The evaluation of the uncertainty involved in a given test or measurement, however, is of more recent interest and still appears to give rise to controversy. Two characteristics of EMC physical quantities are relevant to the present discussion: the common use of log units and the typically large uncertainty range. Textbooks and standards do not give adequate assistance on how to evaluate measurement results characterized by large deviations and expressed in log units. For instance, the Guide to the Expression of Uncertainty in Measurement (GUM) [6] does not discuss this case. This lack of coverage is likely to lead to questionable or incorrect interpretation of EMC test results. The consequences are especially serious when emission or susceptibility limits are exceeded by a relatively small amount. Apart from the strict interest in uncertainty evaluations, the statistical analysis of large fluctuations is more generally important. One example is an experiment that intrinsically involves large fluctuations, such as measuring the voltage across an impedance when the injected current fluctuates because of contact imperfections.

The use of log units in EMC is justified by several reasons, a fundamental one being the large dynamic range involved in the description of the majority of EMC quantities. The variation of the calibration factor of a broadband antenna may amount to tens of dB over its calibration frequency range. The emissions of a piece of electrical equipment may vary by several orders of magnitude over the frequency range of interest. It is this wide range of variations that makes the dB option a necessity. EMC measurements are required to be performed over large frequency ranges [7]. Measuring instruments, antennas, sensors, probes and test facilities must often be calibrated over several frequency decades, which contributes to relatively large uncertainties in the evaluation of the quantities under calibration. Also note that EMC measurement results are markedly variable depending on the test set-up configuration and on the specific device under test (DUT). This large variability is best dealt with using a statistical approach. Whenever, as is usually done, we freeze the experimental configuration in an effort to increase repeatability, we correspondingly lose information. In the limit, the only information content left is 1 bit: pass or fail.

In this paper we discuss the correspondence between statistical analyses performed in the linear domain and in the log domain. Special reference is made to the normal distribution (logarithmic domain) and the corresponding log-normal distribution (anti-logarithmic domain). The analysis is also extended, with practical value, to more general distributions that are expected to be symmetric in the log domain. Best estimators for the natural (anti-log) quantities can be derived in simple terms. The log-normal distribution is seen to be well suited to many situations where the observed quantities show a large dispersion.

BASIC ANALYSIS
Historically, the (implicit) use of log-normal distributions in EMC originated from two practices: a) using logarithmic units, and b) using the normal distribution as the first-choice resource for uncertainty estimates, i.e., using dBs with an assumed normal probability density function (pdf) [8, Table IV], [9]. In general, selecting the pdf best suited to describe a given quantity requires knowledge of its physical properties. Even when the physical/mathematical model relating the measurand to its input quantities is known, together with the pdfs of all input quantities, the pdf of the measurand can only be obtained numerically, except in extremely simple situations [10]. There are, however, experimental considerations (A to D below) that at least allow some pdfs to be excluded in favor of a preferred or more reasonable one.

A) An intrinsically positive quantity (an amplitude) may be affected by a large dispersion around its expected value. In this case a symmetrical distribution (e.g., normal) would imply a large probability of obtaining non-physical (negative) values. The only acceptable candidates are then asymmetrical, positive pdfs. In addition, expensive or time-consuming tests or experiments often yield very few data or measurements for analysis. When large dispersions are present, the best estimators of the characteristic parameters of the pdf are then also affected by large uncertainties [6]. It consequently becomes very uncertain which pdf fits the experiment best. The moderate worth of the available data does not warrant investment in cumbersome analysis. The log-normal pdf is positive and asymmetrical, and it allows a very simple mathematical treatment for estimating its parameters, and hence the confidence intervals.

B) The measurement chain consists of several cascaded blocks whose individual transfer functions are multiplied (summed in dB). If the conditions for applying the central limit theorem can be considered at least approximately satisfied, the variability of the measurand can be described by a normal pdf in log units.

C) Manufacturers often specify equipment tolerances in dB as equal deviations above and below the expected value. Again, this corresponds to assuming log-normal distributions for the linear quantities.

D) In many experimental situations, experience and insight suggest the following. After identifying, in some way, a center of the distribution of the randomly varying outcome (i.e., our best guess), we realize that equally likely deviations appear as relative (ratio) rather than absolute (difference) quantities. Suppose our best guess is 50, and a value of 100 (a factor of 2 above) is likely to occur in a given percentage of cases. We may then also know that the equally likely value below the best guess is 25 (a factor of 2 below), rather than 0 (the same absolute difference below). This behavior corresponds to a log-normal distribution.

In the following section some important aspects of the statistics of log and linear quantities are outlined.

STATISTICAL CONCEPTS AND DERIVATIONS
For a sum of statistically independent random variables (RVs), the mean value is the sum of the individual mean values and the variance is the sum of the individual variances [11]. This property does not depend on the choice of units (linear or log in our case). The standard deviation of the sum is evaluated through the root-sum-of-squares rule (see the GUM), which also applies to log quantities (square root of the sum of the dB² values). In the case of samples of small size, the central limit theorem is of help in the evaluation of the uncertainty. Small samples are a typical occurrence not only in EMC but also in other fields, such as destructive testing and some biomedical applications. If the standard deviations of the RVs in the sum have similar magnitudes, the probability density function of the sum is approximately normal. Thus a coverage factor of two, corresponding to a coverage probability of about 95 % for the normal distribution, can be extended to many practical cases. However, when the standard deviations have different magnitudes, statistical inference tests and estimates (e.g., Student's t and chi-squared tests) may be needed to derive the confidence interval correctly.
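To illustrate the root-sum-of-squares rule in log units, the following minimal Python sketch combines independent standard deviations expressed in dB and applies a coverage factor of two. The function names and the uncertainty-budget values are hypothetical and serve only as an illustration. The roughly 95 % coverage relies on the normal approximation discussed above.

```python
import math

def combined_std_db(std_devs_db):
    """Root-sum-of-squares of independent standard deviations given in dB."""
    return math.sqrt(sum(s * s for s in std_devs_db))

def normal_coverage(k):
    """Probability that a normal RV falls within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

# Hypothetical budget (dB) for four cascaded blocks of a measurement chain,
# e.g. cable loss, antenna factor, receiver reading, site imperfection.
budget_db = [0.5, 1.0, 1.5, 2.0]

k = 2.0                            # coverage factor
u_c = combined_std_db(budget_db)   # combined standard uncertainty, dB
U = k * u_c                        # expanded uncertainty, dB
p = normal_coverage(k)             # coverage probability if the sum is normal

print(f"u_c = {u_c:.3f} dB, U = {U:.3f} dB, coverage = {p:.4f}")
# u_c = 2.739 dB, U = 5.477 dB, coverage = 0.9545
```

Summing the squares of dB values is legitimate here because each cascaded block contributes additively in log units (item B above).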
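Let us assume for the time being (the assumption is relaxed in the following derivations) that the sum of the log RVs is normal (symmetrical) with expected value $\mu$ and variance $\sigma^2$. The corresponding linear RV is then log-normal (asymmetrical) with parameters $\mu$ and $\sigma^2$ [11].

If $X$ is the linear RV and $Y$ the log one, we have:

$$ Y = \ln(X), \qquad X = \exp(Y) \qquad (1) $$

and the corresponding probability density functions are:

$$ f_Y(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right] \qquad (2) $$

normal with parameters $\mu$, $\sigma$, and:

$$ f_X(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right], \quad x > 0 \qquad (3) $$

log-normal with parameters $\mu$, $\sigma$.

The log-normal distribution has mean value $\mu_X$ given by:

$$ \mu_X = \exp\left(\mu + \frac{\sigma^2}{2}\right) \qquad (4) $$

mode $M_X$:

$$ M_X = \exp\left(\mu - \sigma^2\right) \qquad (5) $$

and median $m_X$:

$$ m_X = \exp(\mu) \qquad (6) $$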
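Equation (3) follows from (2) by the change of variable $y = \ln(x)$, $dy = dx/x$. The minimal Python sketch below, which uses only the standard library, checks this numerically for the example parameters $\mu = 3$, $\sigma^2 = 2$. An interval of $X$ and the corresponding interval of $Y$ should carry the same probability. The helper names and the interval [5, 50] are illustrative choices, not taken from the paper.

```python
import math

def normal_pdf(y, mu, sigma):
    """Eq. (2): normal pdf of the log quantity Y = ln(X)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def lognormal_pdf(x, mu, sigma):
    """Eq. (3): pdf of X = exp(Y), from (2) with y = ln(x) and dy = dx / x."""
    if x <= 0.0:
        return 0.0  # X is an intrinsically positive quantity
    return normal_pdf(math.log(x), mu, sigma) / x

def normal_cdf(y, mu, sigma):
    """Closed-form cumulative distribution of the normal RV Y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def trapezoid(f, a, b, n=100_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

mu, sigma = 3.0, math.sqrt(2.0)  # example parameters of Fig. 1 and Fig. 2
a, b = 5.0, 50.0                 # illustrative interval of the linear quantity X

p_x = trapezoid(lambda x: lognormal_pdf(x, mu, sigma), a, b)
p_y = trapezoid(lambda y: normal_pdf(y, mu, sigma), math.log(a), math.log(b))
p_exact = normal_cdf(math.log(b), mu, sigma) - normal_cdf(math.log(a), mu, sigma)
print(f"P(a<X<b) = {p_x:.6f}, P(ln a<Y<ln b) = {p_y:.6f}, closed form = {p_exact:.6f}")
```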
It is evident that if the standard deviation $\sigma$ of the log RV is large, then $\mu_X$, $M_X$ and $m_X$ differ by large amounts. Fig. 1 and Fig. 2 illustrate the situation ($\mu = 3$, $\sigma^2 = 2$ in the example).
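For the example values $\mu = 3$, $\sigma^2 = 2$, equations (4) to (6) give widely separated mean, mode and median. The sketch below evaluates them. It then cross-checks the mean and the median against a Monte Carlo sample drawn with Python's standard-library `random.lognormvariate`. That function's `mu` and `sigma` arguments are the mean and standard deviation of the underlying normal, as in (3). The seed and the sample size are arbitrary choices.

```python
import math
import random
import statistics

def lognormal_mean_mode_median(mu, sigma):
    """Eqs. (4)-(6): mean, mode and median of a log-normal X with log-parameters mu, sigma."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    return mean, mode, median

mu, sigma = 3.0, math.sqrt(2.0)  # example of Fig. 1 and Fig. 2 (sigma**2 = 2)
mean, mode, median = lognormal_mean_mode_median(mu, sigma)
print(f"mean = {mean:.2f}, mode = {mode:.2f}, median = {median:.2f}")
# mean = 54.60, mode = 2.72, median = 20.09

# Monte Carlo cross-check (seed and sample size chosen arbitrarily).
rng = random.Random(12345)
samples = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)
print(f"sample mean = {sample_mean:.2f}, sample median = {sample_median:.2f}")
```

The sample median settles quickly near $\exp(\mu)$. The sample mean converges more slowly, because the long upper tail of the log-normal pdf dominates it.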
Fig. 1: Normal probability density function.

Fig. 2: Log-normal probability density function.

Let $(x_1, x_2, \ldots, x_n)$ be $n$ independent RVs in linear units (experimental outcomes of the random variable $X$), log-normally distributed with unknown parameters $(\mu, \sigma)$. The best estimators of $\mu$ and $\sigma$ are shown below to be:

$$ \mu = \frac{1}{n}\sum_{i=1}^{n} \ln(x_i) \qquad (7) $$

$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left[\ln(x_i) - \mu\right]^2} \qquad (8) $$

In fact, the pdf of each RV $x_i$ ($i = 1, 2, \ldots, n$) is:

$$ f_X(x_i) = \frac{1}{x_i\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x_i-\mu)^2}{2\sigma^2}\right] \qquad (9) $$

According to the maximum-likelihood criterion, the best estimates of $\mu$ and $\sigma$ are those that maximize the joint probability density $f_X(x_1, x_2, \ldots, x_n)$. Under the assumption of independent RVs $\{x_1, x_2, \ldots, x_n\}$, we have:

$$ f_X(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i) \qquad (10) $$

which can be rewritten as:

$$ f_X(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}\prod_{i=1}^{n} x_i} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(\ln x_i-\mu)^2\right] \qquad (11) $$

Elementary analytical methods yield the values of $\mu$ and $\sigma$ that maximize (11), giving (7) and (8). Equation (7) can be rewritten as:

$$ \exp(\mu) = \left(\prod_{i=1}^{n} x_i\right)^{1/n} \qquad (12) $$

showing that the best estimate of $\exp(\mu)$ is the geometric mean of the $x_i$. It is further possible to prove (with a different line of reasoning, not presented here) that (12) has more general validity. Suppose the RV expressed in log units, $Y$, has a pdf $f_Y(y)$ that is symmetric (not necessarily normal) about $\mu$. Then a) the geometric mean (12) is the best estimator of the RV $X$ in linear units, and b) $\exp(\mu)$ is the median $m_X$ of $f_X(x)$, the pdf of $X$. This is exactly the property expressed by equation (6) for the log-normal case.

As is ordinarily done in technical work, we may use quantities expressed in dB, namely:

$$ Y = A\log(X) \qquad (13) $$

where $A$ is either 10 or 20 and $\log$ is the decimal logarithm. The best estimates of the parameters $\mu$ and $\sigma$ are then:

$$ \mu = \frac{1}{n}\sum_{i=1}^{n} A\log(x_i), \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[A\log(x_i) - \mu\right]^2} \qquad (14) $$
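The estimators (7), (8), (12) and (14) translate directly into a few lines of Python. The minimal sketch below assumes strictly positive samples and keeps the maximum-likelihood 1/n normalization of (8) and (14). The readings and helper names are hypothetical and serve only as an illustration. The sketch shows two things: $\exp(\mu)$ equals the geometric mean, and the dB estimates are the natural-log estimates scaled by $A/\ln 10$.

```python
import math
import statistics

def ml_log_params(samples, to_log=math.log):
    """Maximum-likelihood mu and sigma of the log-transformed samples.

    With to_log = math.log this implements eqs. (7)-(8); with a dB transform, eq. (14).
    The 1/n normalization (not 1/(n-1)) follows the maximum-likelihood result.
    """
    if any(x <= 0 for x in samples):
        raise ValueError("log-normal samples must be strictly positive")
    ys = [to_log(x) for x in samples]
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)
    return mu, sigma

A = 20.0  # 20 for field (amplitude) quantities, 10 for power quantities

def to_db(x):
    """Eq. (13): Y = A * log10(X)."""
    return A * math.log10(x)

# Hypothetical positive readings in linear units (illustration only).
x = [8.0, 12.5, 20.0, 9.5, 31.0, 14.0]

mu, sigma = ml_log_params(x)               # natural-log units, eqs. (7)-(8)
mu_db, sigma_db = ml_log_params(x, to_db)  # dB units, eq. (14)
geo_mean = math.exp(mu)                    # eq. (12): exp(mu) is the geometric mean

print(f"exp(mu) = {geo_mean:.4f}, statistics.geometric_mean = {statistics.geometric_mean(x):.4f}")
print(f"mu_dB = {mu_db:.3f} dB, sigma_dB = {sigma_db:.3f} dB")
print(f"scaled ln estimates: {mu * A / math.log(10):.3f} dB, {sigma * A / math.log(10):.3f} dB")
```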
A final note concerns the determination of the confidence intervals. Let us assume that $f_Y(y)$ is symmetrical about $\mu$. It is then immediately shown that:

$$ \int_{m_X/\exp(k_p\sigma)}^{m_X\exp(k_p\sigma)} f_X(x)\,dx = p \qquad (15) $$

where $k_p$ is the coverage factor corresponding to the probability $p$, i.e.,

$$ \int_{\mu-k_p\sigma}^{\mu+k_p\sigma} f_Y(y)\,dy = p $$

and $m_X = \exp(\mu)$ is the median of $f_X(x)$.

The symmetrical interval $(\mu - k_p\sigma, \mu + k_p\sigma)$ and the corresponding asymmetrical interval $(m_X/\exp(k_p\sigma), m_X\exp(k_p\sigma))$, geometrically centered on the median, have the same probability $p$ for the RVs $Y$ and $X$, respectively. The shaded areas in Fig. 1 and Fig. 2 represent the case ($Y$ normal, $X$ log-normal) with $p = 0.68$, $k_p = 1$ ($\mu = 3$, $\sigma^2 = 2$). It can be shown that when using dBs we have

$$ m_X = 10^{\mu/A} $$

and the asymmetric interval becomes

$$ \left[\frac{m_X}{10^{k_p\sigma/A}},\; m_X\cdot 10^{k_p\sigma/A}\right] $$

with $\mu$ and $\sigma$ as in (14).

EXPERIMENTAL CONFIRMATION
This section presents an experiment in which the measured values show a large dispersion. The distribution cannot be adequately represented as normal, whereas the log-normal fit to the data is satisfactory. Inside a screened room (6.08 m length × 4.58 m width × 2.55 m height), two biconical antennas are coupled in transmission (vertical polarization, distance 3 m, phase-center height 1.45 m). The antenna pair is rigidly moved to 50 randomly selected positions across the room, with a minimum clearance to the walls of about 0.3 m. With constant transmitted power (about 0.1 mW), the received power was recorded for each position. The behavior was tested at various frequencies, four of which are reported in Table I. Let us consider the example at 150 MHz (68 modes excited in the room [12]). We start by treating the electric field in linear units. The 50 measurements give an average of 14.3 mV/m and a standard deviation of 6.0 mV/m. Suppose we assume a normal pdf with this average and standard deviation and perform a χ² test. The assumption is then found to be unacceptable (probability less than 2 %). Suppose instead we assume a log-normal distribution with parameters $\mu = 82.2$ dB(μV/m) and $\sigma = 4.1$ dB, i.e., the average and standard deviation in dB units. The χ² test now yields a probability of about 50 %, so the assumption is acceptable. Correspondingly, the electric field in logarithmic units follows a normal pdf, with the same level of credibility as the log-normal in linear units. The measured values and their log-normal distribution are shown in Fig. 3. The results for the other frequencies explored suggest the same conclusion (see Table I).

Fig. 3: Experimental data and corresponding log-normal distribution (horizontal axis: Electric Field [mV/m]).

Table I: Probabilities of the χ² test: normal vs. log-normal distribution (test for Rice distribution added, last column).

The physics of the experiment requires one comment. Each measured field value, as observed through envelope detection in the receiver, is the superposition of two parts. The first is many essentially uncorrelated contributions from multiple wall reflections. The second is a contribution that is essentially constant from point to point, due to the direct ray and the reflections from floor and ceiling. A Rice distribution for the field (in linear units) would therefore be more closely related to the physical phenomenon. Finding the best estimates of the Rice pdf parameters from the measured data is a complex task, and only a numerical solution is viable [13]. However, once this is done and the χ² test is completed, the log-normal and Rice distributions appear statistically equivalent (see Table I).
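Returning to the 150 MHz log-normal fit, the dB form of the median and of the asymmetric interval can be evaluated directly from the quoted parameters $\mu = 82.2$ dB(μV/m) and $\sigma = 4.1$ dB. The Python sketch below assumes $A = 20$ (field quantity). It also assumes a normal distribution in dB when computing the coverage probabilities. The function names are illustrative.

```python
import math

A = 20.0  # field quantity expressed in dB(uV/m)

def median_and_interval(mu_db, sigma_db, k, a_coef=20.0):
    """m_X = 10**(mu/A) and the interval [m_X / 10**(k*sigma/A), m_X * 10**(k*sigma/A)]."""
    m = 10.0 ** (mu_db / a_coef)
    factor = 10.0 ** (k * sigma_db / a_coef)
    return m, m / factor, m * factor

def normal_coverage(k):
    """Probability p of the interval mu +/- k*sigma for a normal RV in dB."""
    return math.erf(k / math.sqrt(2.0))

mu_db, sigma_db = 82.2, 4.1  # 150 MHz parameters quoted in the text

for k in (1.0, 2.0):
    m, lo, hi = median_and_interval(mu_db, sigma_db, k, A)
    # divide by 1e3 to convert uV/m to mV/m
    print(f"k = {k:.0f}, p = {normal_coverage(k):.3f}: median {m / 1e3:.1f} mV/m, "
          f"interval [{lo / 1e3:.1f}, {hi / 1e3:.1f}] mV/m")

# Mean implied by eq. (4) once sigma is converted from dB to natural-log units.
sigma_ln = sigma_db * math.log(10.0) / A
mean_mv = 10.0 ** (mu_db / A) * math.exp(sigma_ln ** 2 / 2.0) / 1e3
print(f"implied mean = {mean_mv:.1f} mV/m")
```

With these parameters the implied mean is about 14.4 mV/m, close to the 14.3 mV/m linear average reported above. The 68 % interval (about 8.0 to 20.7 mV/m) is visibly asymmetric about the 12.9 mV/m median.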
It is also important to note that, in technical practice, most tests or measurements cannot indicate at the outset which distribution is recommended on physical grounds. An acceptable fit to the data is then a sensible (or a forced) solution.

CONCLUSIONS
There are cases in experimental activity where a large dispersion of the measured quantity is observed. The distribution (probability density) can then be represented simply and satisfactorily by a log-normal function. When log units are used, the appropriate choice of the best estimator and of the confidence intervals is not immediately apparent. Suppose the probability density function of the logarithm of the random variable is symmetrical (not necessarily normal). Then the best estimator of the log random variable is its average value $\mu$, while the median $m_X$ is the best estimator of the same random variable in linear units. It is also of interest to note that $m_X = \exp(\mu)$, the antilog of the expected value of the log random variable, is the geometric mean of the distribution of the linear random variable. Symmetrical and asymmetrical confidence intervals for the log and linear random variables have been determined for the case of a symmetrically distributed log random variable. Calculations can be confined to the (simpler) log units, obtaining the expected value, the standard deviation and the coverage factor of the log random variable. An EMC-type experiment has been presented and discussed. It demonstrates the convenience of the log-normal option in cases of practical interest where the measured results show a large dispersion.

REFERENCES
[1] E. L. Bronaugh, J. D. M. Osburn, "A Process for the Analysis of the Physics of Measurement and Determination of Measurement Uncertainty in EMC Test Procedures," IEEE Int. Symp. on EMC, Santa Clara, CA, Aug. 1996.
[2] E. R. Heise, R. E. W. Heise, "Test Facility Uncertainty Calculation Methodology and Rationale," IEEE Int. Symp. on EMC, Seattle, WA, Aug. 1999.
[3] E. L. Bronaugh, J. D. M. Osburn, "Estimating EMC Measurement Uncertainty Using Logarithmic Terms (dB)," IEEE Int. Symp. on EMC, Seattle, WA, Aug. 1999.
[4] J. Perini, "EMC Measurement Uncertainty - What is it?," Workshop, IEEE Int. Symp. on EMC, Washington, DC, Aug. 2000.
[5] Several authors, "Tutorial on EMC Measurement Basics," Tutorial, IEEE Int. Symp. on EMC, Washington, DC, Aug. 2000.
[6] Guide to the Expression of Uncertainty in Measurement, First Ed. 1993, corrected and reprinted in 1995, International Organization for Standardization, Geneva, Switzerland.
[7] CISPR 16: Specification for radio disturbance and immunity measuring apparatus and methods, International Electrotechnical Commission, Second Ed. 1999, Geneva, Switzerland.
[8] D. N. Heirman, "CISPR Subcommittee A Uncertainty Activity," IEEE Trans. on Electromag. Compat., vol. 44, no. 1, Feb. 2002.
[9] J. DeMarinis, "Qualification of Radiated EMI Test Sites," IEEE Int. Symp. on EMC, Anaheim, CA, Aug. 1992.
[10] M. G. Cox and P. M. Harris, "Measurement Uncertainty and the Propagation of Distributions," Metrologie 2001, 10th International Metrology Congress, Saint-Louis, France, 22-25 October 2001.
[11] A. Papoulis, Probability, Random Variables and Stochastic Processes, Third Ed., McGraw-Hill, New York, 1991.
[12] B. H. Liu, D. C. Chang, M. T. Ma, "Design consideration of reverberating chambers for electromagnetic interference measurements," IEEE Int. Symp. on EMC, Arlington, VA, Aug. 1983.
[13] J. Sijbers et al., "Maximum-Likelihood Estimation of Rician Distribution Parameters," IEEE Trans. on Medical Imaging, vol. 17, no. 3, Jun. 1998.