ON EFFICIENT ESTIMATION FOR THE LOG-NORMAL MODEL MEAN

Winston A. Richards, Department of Mathematics and Statistics, The Pennsylvania State University, Harrisburg, USA.
*Ashok Sahai, Robin Antoine, Letetia Addison, Department of Mathematics & Computer Science, Faculty of Science & Agriculture, St. Augustine Campus, The University of The West Indies, Trinidad & Tobago, West Indies.
And M. Raghunadh Acharya, Department of Statistics and Computer Science, Aurora's Post Graduate College, Osmania University, Hyderabad (Andhra Pradesh), India.

ABSTRACT

Highly skewed data appear frequently in research investigations. In many instances, transformation of such data is advocated, primarily to establish the normal distribution required for parametric statistical analysis. Such a transformation retains its advantage in the context of meta-analysis (Joseph et al. (2000)), as well as in Bayesian case studies (Stevens et al. (2003)). The logarithmic transformation is a commonly employed method within the decision sciences for normalizing skewed data. Patterson (1966) discussed and defined the challenges involved in estimating the population mean following a transformation of the sample data; the same challenges apply to the logarithmic transformation. The purpose of this paper is to address the logarithmic transformation and to provide the most efficient estimation of the lognormal mean. This is achieved by using the sample information from the resultant normal distribution more fully in order to estimate the mean of the untransformed population. The gains in efficiency of the proposed optimal mean estimators are numerically illustrated through a simulation study, comparing them with the minimum "Risk/RMSE (Relative Mean Square Error)" estimator of the log-normal mean recently studied by Shen et al. (2006).
2010 Mathematics Subject Classification: 62F. Keywords: Lognormal population mean, lognormal modeling, simulation study.
Corresponding Author: Ashok Sahai ([email protected])
1. INTRODUCTION

This paper addresses the transformation problem and seeks the most efficient mean estimation for the log-normal model. This is achieved by using the sample information from the resultant normal distribution (after the logarithmic transformation) more fully in estimating the population mean of the original (lognormal) distribution. Suppose Y is a random variable with a log-normal distribution and mean E(Y) = ς. Then log(Y) is normally distributed with mean, say, μ and variance, say, σ². We write Y ~ LN(μ, σ²) with mean ς. Consequently, the three parameters have the following relationship: ς = exp(μ + σ²/2)
… (1.1)
Consider a random sample y1, y2, …, yn that is i.i.d. LN(μ, σ²) with mean ς. Then xi = log(yi) is i.i.d. N(μ, σ²), for i = 1, …, n. Let us define:

ȳ = (1/n)·Σ yᵢ, x̄ = (1/n)·Σ xᵢ, and s² = Σ (xᵢ − x̄)²/(n − 1), the sums running over i = 1, …, n.
… (1.2)
We know that x̄ is the ML (Maximum Likelihood) estimator of μ, and s² is the usual unbiased estimator of σ². The simple plug-in principle, in view of (1.1), leads to the usual estimator of ς: exp(x̄ + s²/2) = UER, say.
… (1.3)
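As a concrete illustration, the plug-in estimator (1.3) can be computed from simulated data. The following Python sketch uses illustrative parameter values (not taken from the paper's study): it draws a lognormal sample, takes logs, and forms UER per (1.2)–(1.3).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (not from the paper's study).
mu, sigma2, n = 0.5, 0.25, 21
zeta = np.exp(mu + sigma2 / 2)          # true lognormal mean, per (1.1)

y = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)
x = np.log(y)                           # x_i ~ N(mu, sigma^2)
xbar, s2 = x.mean(), x.var(ddof=1)      # sample mean and variance, per (1.2)

UER = np.exp(xbar + s2 / 2)             # plug-in estimator, per (1.3)
print(UER, zeta)
```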
Shen et al. (2006) proposed a new estimator by considering its Relative Mean Square Error (RMSE). For example, the RMSE of UER in (1.3) above is: RMSE(UER) = E[(UER − ς)/ς]²
… (1.4)
Shen et al. (2006) use the following class of estimators: δc = exp(x̄ + c·s²/2), c = 1/(n + d); d > −n.
… (1.5)
By minimizing the RMSE of the estimators in the class (1.5) above, up to terms of order 1/n², they found the optimal value of d. Their optimal estimator in the class has: c = 1/(n + 4 + 3σ²/2)
… (1.6)
Using the unbiased estimate s² of σ² in (1.6), they obtained their minimum Risk/RMSE (Relative Mean Square Error) estimator of ς: exp{x̄ + (n − 1)s²/(2(n + 4) + 3s²)} = ER2006, say.
… (1.7)
The RMSE of ER2006 in (1.7) above is: RMSE(ER2006) = E[(ER2006 − ς)/ς]²
… (1.8)
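The estimator (1.7) can be sketched as a short Python function; the sample shown is illustrative only.

```python
import numpy as np

def er2006(x):
    # Shen et al. (2006) minimum-RMSE estimator, per (1.7),
    # computed from the log-scale data x.
    n = len(x)
    xbar, s2 = x.mean(), x.var(ddof=1)
    return np.exp(xbar + (n - 1) * s2 / (2 * (n + 4) + 3 * s2))

rng = np.random.default_rng(7)
x = rng.normal(0.5, 0.5, size=31)       # illustrative log-scale sample
print(er2006(x))
```

Note that the exponent in (1.7) is always below s²/2, so ER2006 shrinks the plug-in estimator UER toward smaller values.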
2. PRELIMINARY RESULTS

It is well known that the sample mean x̄ and the sample variance s² are independently distributed for a random sample from a normal population with mean μ and standard deviation σ. While the original variable y (on which we have the original data) is not normally distributed, the transformed variable x = log(y) is N(μ, σ²). The population mean ς of y is a function of μ and σ², as noted in (1.1). As such, we use the sample mean and sample variance

x̄ = Σ xᵢ/n and s² = Σ (xᵢ − x̄)²/(n − 1),

where n is the size of the random sample drawn from the normal population x ~ N(μ, σ²). Doing so leads to the usual estimator of ς as UER = exp(x̄ + s²/2), as in (1.3). It is significant to note that, in view of the celebrated Rao–Blackwell theorem (together with the Lehmann–Scheffé theorem), any unbiased estimator that is a function of the complete sufficient statistic (x̄, s²) is the UMVUE (Uniformly Minimum Variance Unbiased Estimator) of the corresponding function of the population parameters (μ, σ²); likewise, a minimum mean squared error estimator within such a class is the UMMSE (Uniformly Minimum Mean Squared Error Estimator) of the parametric function we are interested in estimating.
Hence ln(UER) = x̄ + s²/2 is the UMVU estimator of ln ς (the logarithm of the population mean of the original lognormal distribution). We know that (n − 1)s²/σ² has a χ² (Chi-Square) distribution on (n − 1) degrees of freedom (d.f.). Using that sampling distribution, one easily checks that E(s²) = σ², so that s² is an unbiased estimate of σ², whereas E(s⁴) is not equal to σ⁴. On the other hand, one can also check that E(cn1·s⁴) = σ⁴, wherein cn1 = (n − 1)/(n + 1); therefore cn1·s⁴ is an unbiased estimate of σ⁴. Moreover, it is easily verified that the MMSE (Minimum Mean Square Error) estimator of σ² is as below: MMSE(σ²) = cn1·s²; wherein cn1 = (n − 1)/(n + 1)
… (2.1)
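The moment identities above are easy to confirm by simulation. The following Python sketch (sample size and σ² chosen arbitrarily for illustration) checks that E(s²) = σ², that cn1·s⁴ is unbiased for σ⁴, and that cn1·s² has smaller mean squared error than s².

```python
import numpy as np

# Monte Carlo check of the Section 2 identities: E(s^2) = sigma^2,
# E[cn1 * s^4] = sigma^4, and MMSE(sigma^2) = cn1 * s^2, cn1 = (n-1)/(n+1).
rng = np.random.default_rng(0)
n, sigma2, reps = 11, 0.25, 200_000
cn1 = (n - 1) / (n + 1)

s2 = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).var(ddof=1, axis=1)
print(s2.mean())               # ~ sigma^2 = 0.25
print((cn1 * s2**2).mean())    # ~ sigma^4 = 0.0625
# MSE comparison: the shrunken estimator beats s^2 in mean squared error.
mse_s2   = ((s2 - sigma2) ** 2).mean()
mse_mmse = ((cn1 * s2 - sigma2) ** 2).mean()
print(mse_mmse < mse_s2)       # True
```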
3. PREPARATORY IMPROVEMENT

It is common to encounter situations in which the sample estimate of the coefficient of variation of the sample mean (a more stable random variable than the original study variable X) is not large, i.e., C.V.(x̄) < 1.0, or even much less than 1.0. For such cases, which are quite frequent, we propose an alternative estimator, say t, as below: t = x̄ + x̄/[n·(x̄)²/s² − 1]
… (3.1)
We note that the relative efficiency of t with respect to (w.r.t.) the usual estimator x̄ (in %) is: 100/η, where η = E(t − μ)²/E(x̄ − μ)².
… (3.2)
It suffices for us to find an unbiased estimator of the efficiency ratio η as a function of (x̄, s²) alone, since (x̄, s²) is jointly a complete sufficient statistic for (μ, σ²). The so-determined unbiased estimator, being a function of a complete sufficient statistic, is UMVUE. Now, we note that:
η = (n/σ²)·E[(x̄ − μ) + x̄·{n·(x̄)²/s² − 1}⁻¹]²
… (3.3)
= 1 + 2A + B, wherein:
A = (n/σ²)·E[x̄·(x̄ − μ)·{n·(x̄)²/s² − 1}⁻¹]
= (n/σ²)·E_s² E_x̄ [x̄·(x̄ − μ)·{n·(x̄)²/s² − 1}⁻¹]
= c·(n/σ²)·E_s² [∫ x̄·{n·(x̄)²/s² − 1}⁻¹·(x̄ − μ)·exp{−n(x̄ − μ)²/(2σ²)} dx̄], with c = {n/(2πσ²)}^(1/2)
= −c·E_s² [∫ x̄·{n·(x̄)²/s² − 1}⁻¹·(d/dx̄)[exp{−n(x̄ − μ)²/(2σ²)}] dx̄].

Now, using the technique of integration by parts (the boundary terms vanishing), we have A = E_s² E_x̄ [a] = E[a], wherein
a = −[{n·(x̄)²/s² + 1}·{n·(x̄)²/s² − 1}⁻²] = −[(u + 1)·(u − 1)⁻²],
… (3.4)
where we set u ≡ n·(x̄)²/s².
Also, B = (n/σ²)·E[(x̄)²·{n·(x̄)²/s² − 1}⁻²].
… (3.5)

Similarly, we now take note of the well-known fact that, independently of x̄, (n − 1)s²/σ² has a χ² distribution with (n − 1) degrees of freedom (d.f.). Therefore, we have:

B = (n/σ²)·E_x̄ E_s² [(x̄)²·{n·(x̄)²/s² − 1}⁻²]
= C*·E_x̄ [∫₀^∞ {n·(x̄)²/σ²}·{n·(x̄)²/s² − 1}⁻²·(s²)^((n−3)/2)·exp{−(n − 1)s²/(2σ²)} ds²],
… (3.6)
where C* = {(n − 1)/(2σ²)}^((n−1)/2)·{1/Γ((n − 1)/2)}.

Now, because we have:
(d/ds²)[(s²)^((n−1)/2)·exp{−(n − 1)s²/(2σ²)}] = {(n − 1)/(2σ²)}·[σ²·(s²)^((n−3)/2) − (s²)^((n−1)/2)]·exp{−(n − 1)s²/(2σ²)},
… (3.7)

using (3.7) in (3.6) and again resorting to integration by parts (the boundary terms again vanishing), we obtain B = E(b), wherein:
b = u·(u − 1)⁻² + {2/(n − 1)}·u·(u + 1)·(u − 1)⁻³
… (3.8)
= u·(u − 1)⁻³·{(n + 1)·u − (n − 3)}/(n − 1),
… (3.9)
with, as mentioned earlier, u ≡ n·(x̄)²/s².
Thus, the UMVUE of η, and hence of the relative efficiency of t w.r.t. the usual estimator x̄, is obtained from 1 + 2A + B as 1 + 2a + b, which, as per (3.4) and (3.9), equals
1 − 2·(u + 1)·(u − 1)⁻² + u·(u − 1)⁻³·{(n + 1)·u − (n − 3)}/(n − 1)
= 1 − (u − 1)⁻³·{(n − 3)·u² + (n − 3)·u − 2·(n − 1)}/(n − 1).
… (3.10)

Now, the estimated relative efficiency 100/(1 + 2a + b) exceeds 100% if 0 < 1 + 2a + b < 1,
… (3.11)
which holds if u > (n + 1)/(n − 3). Since u = n·(x̄)²/s² is the reciprocal of the squared sample C.V. of x̄, this condition requires (x̄)² > s²/n (i.e., the C.V. of x̄ below 1) by only a modest margin, and it is satisfied for all values of the C.V. of x̄ met in practice.
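A minimal Python sketch of this efficiency estimate, with an illustrative sample chosen so the C.V. of x̄ is small; the estimate 1 + 2a + b is assembled directly from (3.4) and (3.9).

```python
import numpy as np

def efficiency_estimate(x):
    # UMVUE 1 + 2a + b of eta = E(t - mu)^2 / E(xbar - mu)^2,
    # per (3.4), (3.9) and (3.10).
    n = len(x)
    xbar, s2 = x.mean(), x.var(ddof=1)
    u = n * xbar**2 / s2
    a = -(u + 1) / (u - 1) ** 2                                   # (3.4)
    b = u * ((n + 1) * u - (n - 3)) / ((n - 1) * (u - 1) ** 3)    # (3.9)
    return 1 + 2 * a + b

rng = np.random.default_rng(3)
x = rng.normal(0.7, 0.5, size=21)       # illustrative sample with small C.V.
eta_hat = efficiency_estimate(x)
print(eta_hat, 100 / eta_hat)           # estimated efficiency of t w.r.t. xbar (%)
```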
4. THE IMPROVED LOGNORMAL MEAN ESTIMATOR

As noted in the introduction, our aim is to improve the estimator proposed by Shen et al. (2006) in (1.7): exp{x̄ + (n − 1)s²/(2(n + 4) + 3s²)} = ER2006.
… (4.1)
Using the improvement achieved through t in the estimation of μ, we replace x̄ in (4.1) by
t = x̄ + x̄/[n·(x̄)²/s² − 1] = x̄·(1 + v), with v = s²/(n·(x̄)² − s²),
… (4.2)
as defined in (3.1). Also, we may use the MMSE estimate from (2.1), MMSE(σ²) = cn1·s², in (4.1) rather than s². Thus our proposed efficient estimators, say ER2007I and ER2007II, using (4.2) alone, and then cn1·s² as well, in (4.1), are respectively:
exp{t + (n − 1)·s²/(2(n + 4) + 3·s²)} = ER2007I
… (4.3)
and exp{t + (n − 1)·cn1·s²/(2(n + 4) + 3·cn1·s²)} = ER2007II.
… (4.4)
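The two proposed estimators can be sketched as a single Python function; the sample is illustrative, and the flag shrink_s2 switches between ER2007I and ER2007II.

```python
import numpy as np

def er2007(x, shrink_s2=False):
    # Proposed estimators from log-scale data x:
    # ER2007I per (4.3) (shrink_s2=False) and ER2007II per (4.4) (True).
    n = len(x)
    xbar, s2 = x.mean(), x.var(ddof=1)
    t = xbar + xbar / (n * xbar**2 / s2 - 1)             # t, per (3.1)/(4.2)
    v = (n - 1) / (n + 1) * s2 if shrink_s2 else s2      # cn1*s^2, per (2.1)
    return np.exp(t + (n - 1) * v / (2 * (n + 4) + 3 * v))

rng = np.random.default_rng(11)
x = rng.normal(0.6, 0.5, size=41)       # illustrative log-scale sample
print(er2007(x), er2007(x, shrink_s2=True))
```

Since the exponent is increasing in v, ER2007II is always below ER2007I for the same sample.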
Now, in order to compare the estimator ER2006 of Shen et al. (2006) in (4.1) with its proposed improvements ER2007I and ER2007II in (4.3) and (4.4), we define their relative efficiencies w.r.t. the usual ML estimator UER in (1.3) as follows: ReffOER2006 = RMSE(UER)*100/RMSE(ER2006) %
--- (4.5)
ReffOER2007I = RMSE (UER)*100/RMSE (ER2007I) %
--- (4.6)
& ReffOER2007II = RMSE (UER)*100/RMSE (ER2007II) %
--- (4.7)
Wherein, RMSE(UER) = E[(UER − ς)/ς]², as in (1.4),
--- (4.8)
RMSE(ER2006) = E[(ER2006 − ς)/ς]², as in (1.8),
--- (4.9)
and similarly, RMSE(ER2007I) = E[(ER2007I − ς)/ς]²
--- (4.10)
and RMSE(ER2007II) = E[(ER2007II − ς)/ς]²
--- (4.11)
5. SIMULATION AND COMPARISONS

The simulation for the proposed point estimators considered sample sizes n = 11, 21, 31, 41, 51, 71, 101, and 151, and population mean values μ = 0.5000, 0.5500, 0.6000, 0.6500, 0.7000, 0.7500, and 0.8000, for a fixed variance σ² = 0.25 (i.e., σ = 0.50). The simulation drew 11,000 random samples of the relevant size from N(μ, σ²). The actual RMSEs of the four estimators (the usual ML estimator UER, and ER2006, ER2007I & ER2007II) were then calculated per the expressions in (4.8), (4.9), (4.10), and (4.11), as averages of the actual relative squared errors over the 11,000 iterations. Thence the relative efficiencies of ER2006, ER2007I & ER2007II with respect to the usual ML estimator UER were calculated per the expressions in (4.5), (4.6) & (4.7), respectively. The resulting values of these relative efficiencies appear in Table 4.1 in the APPENDIX.
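The study just described can be sketched in a few lines of Python for a single (n, μ) cell; the seed and the cell are chosen arbitrarily here, and the full study loops over all cells.

```python
import numpy as np

# One cell of the Section 5 study: 11,000 samples of size n from N(mu, sigma^2),
# with sigma = 0.50; the cell (n, mu) is illustrative.
rng = np.random.default_rng(2025)
n, mu, sigma, reps = 21, 0.8, 0.5, 11_000
zeta = np.exp(mu + sigma**2 / 2)                 # true lognormal mean, (1.1)

x = rng.normal(mu, sigma, size=(reps, n))
xbar, s2 = x.mean(axis=1), x.var(ddof=1, axis=1)
cn1 = (n - 1) / (n + 1)
t = xbar + xbar / (n * xbar**2 / s2 - 1)         # improved mean estimate, (3.1)

UER   = np.exp(xbar + s2 / 2)                                          # (1.3)
ER06  = np.exp(xbar + (n - 1) * s2 / (2 * (n + 4) + 3 * s2))           # (1.7)
ER7I  = np.exp(t + (n - 1) * s2 / (2 * (n + 4) + 3 * s2))              # (4.3)
ER7II = np.exp(t + (n - 1) * cn1 * s2 / (2 * (n + 4) + 3 * cn1 * s2))  # (4.4)

def rmse(est):
    return (((est - zeta) / zeta) ** 2).mean()   # per (4.8)-(4.11)

base = rmse(UER)
for name, est in [("ER2006", ER06), ("ER2007I", ER7I), ("ER2007II", ER7II)]:
    print(name, 100 * base / rmse(est))          # relative efficiency (%), (4.5)-(4.7)
```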
6. CONCLUSION

The need for an accurate point estimate of a lognormal mean is great within clinical research and other scientific disciplines. For numerous, often well-established reasons, data are transformed to meet the assumptions required for parametric statistical inference. However, standard methods, which capture only a sample's mean and variance, do not necessarily yield the most efficient estimator. By utilizing the more complete information available within the sample coefficient of variation of the mean, s/(√n·x̄), this paper presented and tested more efficient point estimators for the lognormal mean. The results from the simulation, which incorporated various population means and sample sizes, indicated a relative improvement in efficiency, particularly as a function of increasing population CV (i.e., σ/μ) and decreasing sample size. It is worth noting here that in most applications in biomedical research in general, and for pharmacokinetic data in particular, the sample size is rather small.
REFERENCES

Shen, H., Brown, L. D. and Zhi, H. (2006), "Efficient estimation of log-normal means with application to pharmacokinetic data", Statistics in Medicine, 25: 3023-3038.
Zhou, X. H. (1998), "Estimation of the log-normal mean", Statistics in Medicine, 17: 2251-2264.
Evans, I. G. and Shaban, S. A. (1974), "A note on estimation in lognormal models", Journal of the American Statistical Association, 69: 779-781.
APPENDIX

Relative Efficiencies of ER2007I & ER2007II vis-à-vis that of Shen et al. (2006)

TABLE 4.1: Relative efficiencies (REff, in %) of ER2006, ER2007I & ER2007II with respect to the usual estimator UER [11,000 iterations/samples with σ = 1.00]

n = 11         μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    112.092   113.836   114.865   115.021   113.545   114.075   114.381
REffER2007I   114.374   115.104   115.873   116.002   115.611   116.3621  115.731
REffER2007II  120.825   121.771   121.250   121.661   120.574   119.271   119.161

n = 21         μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    107.299   107.889   106.820   107.684   108.008   108.267   107.867
REffER2007I   111.158   110.874   110.494   110.061   109.171   109.188   109.029
REffER2007II  112.216   114.544   113.628   113.030   111.455   111.543   111.283

n = 31         μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    104.675   105.389   106.196   105.139   105.243   105.149   105.376
REffER2007I   108.363   107.747   107.506   106.793   106.524   106.330   106.215
REffER2007II  110.812   110.097   109.887   108.552   108.145   107.758   107.597

n = 41         μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    104.363   104.090   103.924   104.268   104.620   104.165   104.720
REffER2007I   106.387   105.850   105.464   105.302   105.261   104.897   105.080
REffER2007II  108.425   107.547   106.895   106.702   106.666   106.026   106.313

n = 51         μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    102.972   103.339   103.794   103.553   103.625   102.770   103.056
REffER2007I   105.054   104.729   104.584   104.337   104.208   103.752   103.742
REffER2007II  106.431   106.068   105.938   105.477   105.273   104.408   104.439

n = 71         μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    102.384   102.477   102.342   102.486   102.160   102.526   101.811
REffER2007I   103.642   103.423   103.211   103.101   102.876   102.953   102.538
REffER2007II  104.687   104.377   104.011   103.867   103.450   103.600   102.869

n = 101        μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    101.768   101.663   101.505   101.940   101.743   102.318   101.854
REffER2007I   102.586   102.386   102.200   102.279   102.137   102.352   102.073
REffER2007II  103.337   103.008   102.693   102.872   102.603   102.994   102.509

n = 151        μ=0.50    μ=0.55    μ=0.60    μ=0.65    μ=0.70    μ=0.75    μ=0.80
REffER2006    100.922   101.720   101.313   100.843   101.329   100.751   101.089
REffER2007I   101.673   101.780   101.585   101.340   101.503   101.185   101.327
REffER2007II  102.067   102.412   102.019   101.560   101.869   101.312   101.551