Polynomial estimation of the measurand parameters

International Conference

AMCTM 2017: Advanced Mathematical and Computational Tools in Metrology and Testing XI, 29-31 August 2017, University of Strathclyde, Glasgow, Scotland

A polynomial estimation of measurand parameters based on higher order statistics

Serhii ZABOLOTNII 1), Zygmunt WARSZA 2), Jacek PUCHALSKI 3)
1) Cherkasy State University of Technology, Ukraine
2) Research Institute of Automation and Measurements & Polish Metrological Society, Warsaw, Poland
3) Central Office of Measures (GUM), Warsaw, Poland


Contents
• Introduction
• Moments and cumulants
• Stochastic Polynomial Maximization Method (PMM)
  • of first degree, S = 1
  • of second degree, S = 2
  • of third degree, S = 3
• Analysis of the accuracy of polynomial estimates
• Numerical examples of the Monte Carlo method for different pdfs: uniform, arcsine, trapezoidal, triangular, Gaussian
• Conclusions and references

Introduction
The accuracy of a measurement is characterized by:
• the mean value (bias, i.e. an unknown and uncorrected systematic error);
• the distribution of the experimental errors.
According to the GUM [1], both components are treated as random uncertainties, uB and uA, with symmetric probability density functions, and the expanded uncertainty UP is calculated. Experimental studies show that the Gaussian law is not universal and does not suit every kind of measuring system. Therefore, a number of other distributions (uniform, trapezoidal, triangular, arcsine, etc.) are used [2]. The main problem of parametric approaches is that they require a priori information about the form of the distribution. This work considers statistical methods that minimize the required amount of a priori information. The use of higher-order statistics (moments or cumulants) is one alternative approach to processing non-Gaussian experimental data [3].

Purpose of work
The aims of this work are:
• application of the Polynomial Maximization Method (PMM) to the synthesis of algorithms that estimate the measurand value and its uncertainty from samples of non-Gaussian symmetric random populations;
• analysis of the accuracy of polynomial estimates;
• investigation of the effectiveness of these algorithms by statistical (Monte Carlo) modeling.
This is the first step towards answering whether these algorithms can be applied to all kinds of symmetric and asymmetric data samples of various measured signals and processes.

Description by moments and cumulants
Some theoretical background on cumulants is useful. The characteristic function of a random variable ξ is

$$ f(u) = \int_{-\infty}^{\infty} p(x)\, e^{jux}\, dx , $$

where p(x) is the probability density of ξ. Its moments are

$$ m_r = \int_{-\infty}^{\infty} x^r p(x)\, dx = j^{-r} \left. \frac{d^r f(u)}{du^r} \right|_{u=0} , $$

and its cumulants follow from the Taylor-Maclaurin series expansion of ln f(u):

$$ \kappa_r = j^{-r} \left. \frac{d^r \ln f(u)}{du^r} \right|_{u=0} . $$

There is a one-to-one relationship between moments and cumulants, but the description by cumulants has several advantages. For the normal (Gaussian) distribution, all cumulants of order r ≥ 3 are zero. Normalizing the cumulants to the variance (the second-order cumulant κ2 equals the variance) gives the cumulant coefficients

$$ \gamma_r = \frac{\kappa_r}{\kappa_2^{r/2}} . $$

The most common are γ3 and γ4, the coefficients of skewness and kurtosis.
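The cumulant coefficients can be estimated from data by replacing κr with sample values. A minimal sketch, assuming the simple moment-based (biased) estimates κ2 = m2, κ3 = m3, κ4 = m4 − 3m2²; the function name `cumulant_coefficients` is hypothetical and not from the slides:

```python
import numpy as np


def cumulant_coefficients(x):
    """Moment-based estimates of gamma_3 (skewness) and gamma_4 (kurtosis).

    gamma_r = kappa_r / kappa_2**(r/2), with the cumulants estimated from the
    biased sample central moments m_k = mean((x - mean(x))**k).
    """
    x = np.asarray(x, dtype=float)
    y = x - x.mean()
    m2, m3, m4 = (np.mean(y ** k) for k in (2, 3, 4))
    kappa2, kappa3, kappa4 = m2, m3, m4 - 3.0 * m2 ** 2  # cumulants from central moments
    return kappa3 / kappa2 ** 1.5, kappa4 / kappa2 ** 2


# A 0/1 sample with a single "1" in four values: gamma_3 = 2/sqrt(3), gamma_4 = -2/3.
print(cumulant_coefficients([0, 0, 0, 1]))
```

For a Gaussian sample both values fluctuate around zero, in line with κr = 0 for r ≥ 3.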

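Some properties of cumulants
• Equivariance and invariance: the first cumulant is shift-equivariant; all the others are shift-invariant. If κn(X) denotes the n-th cumulant of the probability distribution of a random variable X, then for any constant c:
  κ1(X + c) = κ1(X) + c,  κn(X + c) = κn(X) for n ≥ 2.
• Homogeneity: the n-th cumulant is homogeneous of degree n, i.e. for any constant c:
  κn(cX) = c^n κn(X).
• Additivity: if X and Y are independent random variables, then:
  κn(X + Y) = κn(X) + κn(Y).

Moments in terms of cumulants and vice versa
The n-th moment μ′n is an n-th-degree polynomial in the first n cumulants:

$$ \mu'_1 = \kappa_1, \quad \mu'_2 = \kappa_2 + \kappa_1^2, \quad \mu'_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3, \quad \mu'_4 = \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4, \ \dots $$

The central moments μn are given by the same formulas with κ1 = 0:

$$ \mu_2 = \kappa_2, \quad \mu_3 = \kappa_3, \quad \mu_4 = \kappa_4 + 3\kappa_2^2, \quad \mu_5 = \kappa_5 + 10\kappa_3\kappa_2, \quad \mu_6 = \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3, \ \dots $$

Conversely, when μ′1 = 0, the cumulants κn for n > 1 are:

$$ \kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2, \quad \kappa_5 = \mu_5 - 10\mu_3\mu_2, \quad \kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3, \ \dots $$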
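The two conversion formulas above can be checked mechanically. A small sketch using exact rational arithmetic; the function names are hypothetical, and the uniform distribution on [−1, 1] (central moments 1/3, 0, 1/5, 0, 1/7) is our chosen test case:

```python
from fractions import Fraction


def cumulants_from_central(mu2, mu3, mu4, mu5, mu6):
    """kappa_2..kappa_6 from central moments mu_2..mu_6 (i.e. mu'_1 = 0)."""
    return (
        mu2,
        mu3,
        mu4 - 3 * mu2**2,
        mu5 - 10 * mu3 * mu2,
        mu6 - 15 * mu4 * mu2 - 10 * mu3**2 + 30 * mu2**3,
    )


def central_from_cumulants(k2, k3, k4, k5, k6):
    """Central moments mu_2..mu_6 from cumulants kappa_2..kappa_6 (kappa_1 = 0)."""
    return (
        k2,
        k3,
        k4 + 3 * k2**2,
        k5 + 10 * k3 * k2,
        k6 + 15 * k4 * k2 + 10 * k3**2 + 15 * k2**3,
    )


# Uniform distribution on [-1, 1]: central moments 1/3, 0, 1/5, 0, 1/7.
uniform_mu = (Fraction(1, 3), 0, Fraction(1, 5), 0, Fraction(1, 7))
uniform_kappa = cumulants_from_central(*uniform_mu)
print(uniform_kappa)  # kappa_4 = -2/15, kappa_6 = 16/63
assert central_from_cumulants(*uniform_kappa) == uniform_mu  # round trip is exact
```

Applied to Gaussian central moments (σ², 0, 3σ⁴, 0, 15σ⁶), the same function returns zero for every cumulant above the second.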

Cumulants of some probability distributions
• For the normal distribution with expected value μ and variance σ², the cumulants are κ1 = μ, κ2 = σ² and κ3 = κ4 = … = 0. The special case σ² = 0 is a constant random variable X = μ.
• The cumulants of the uniform distribution on the interval [−1, 0] are κn = Bn/n, where Bn is the n-th Bernoulli number (B0 = 1, B1 = −1/2, B2 = 1/6, B3 = 0, B4 = −1/30, B5 = 0, B6 = 1/42, B7 = 0, …).
• The cumulants of the exponential distribution with rate λ are κn = λ^(−n) (n − 1)!.
• Poisson distribution with parameter μ: the cumulant generating function is g(t) = μ(e^t − 1), so its derivative is g′(t) = μ·e^t and all cumulants are equal to the parameter: κ1 = κ2 = κ3 = … = μ.
The variance-to-mean ratio is ε = σ²/μ = κ2/κ1 (equal to 1 for the Poisson distribution).
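The uniform-distribution formula κn = Bn/n can be evaluated exactly to obtain the cumulant coefficients used later in the PMM3 analysis. A sketch, assuming the Bernoulli numbers are generated by the standard recurrence Σ_{k=0}^{m} C(m+1, k) Bk = 0, which yields the convention B1 = −1/2 (so that κ1 = −1/2, the mean of [−1, 0]); function names are hypothetical:

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m_max):
    """B_0..B_m_max from sum_{k=0}^{m} C(m+1, k) B_k = 0 (gives B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def uniform_cumulants(n_max):
    """kappa_n = B_n / n for the uniform distribution on [-1, 0]."""
    B = bernoulli_numbers(n_max)
    return {n: B[n] / n for n in range(1, n_max + 1)}


k = uniform_cumulants(6)
gamma4 = k[4] / k[2] ** 2   # normalized cumulants are scale-free
gamma6 = k[6] / k[2] ** 3
print(k[1], k[2], gamma4, gamma6)  # -1/2 1/12 -6/5 48/7
```

The values γ4 = −1.2 and γ6 = 48/7 ≈ 6.9 agree with the "Uniform" row of Table 3.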

Advantages of stochastic polynomials application
The practical advantages of stochastic polynomials result from their use of higher-order statistics (moments and cumulants) to describe random variables and processes. This allows one to:
• describe the degree of non-Gaussianity of the statistics simply and effectively from the practical point of view;
• find a compromise between the required amount of a priori information and the complexity and accuracy of the statistical processing algorithms;
• make the resulting algorithms adaptive to the probabilistic nature of the statistical data.

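Polynomial Maximization Method (1)
A stochastic polynomial of the general form

$$ l_{sn}(\mathbf{x} \mid \theta) = k_0(\theta) + \sum_{v=1}^{n} \sum_{i=1}^{s} k_i(\theta)\, \varphi_i(x_v) $$

has a maximum in the vicinity of the true value of the estimated parameter θ, where

$$ k_0(\theta) = -\sum_{v=1}^{n} \sum_{i=1}^{s} \int^{\theta} h_i(z)\, \Psi_i(z)\, dz , \qquad k_i(\theta) = \int^{\theta} h_i(z)\, dz , $$

φi are the basis functions and Ψi(θ) are their mathematical expectations. If power transformations φi(x) = x^i are used as basis functions, the estimate of the unknown parameter θ is found by solving the equation

$$ \left. \sum_{i=1}^{s} h_i(\theta) \sum_{v=1}^{n} \left[ x_v^i - \alpha_i(\theta) \right] \right|_{\theta = \hat\theta} = 0 , $$

where αi(θ) are the initial (raw) moments. The coefficients hi(θ) that minimize the variance of the estimates for a given s are found from the linear system

$$ \sum_{j=1}^{s} h_j(\theta)\, F_{i,j}(\theta) = \frac{d\alpha_i(\theta)}{d\theta} , \qquad i = 1, \dots, s , $$

where

$$ F_{i,j}(\theta) = \alpha_{i+j}(\theta) - \alpha_i(\theta)\, \alpha_j(\theta) . $$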

Polynomial Maximization Method (2)
To evaluate the uncertainty, one has to find the amount of information on the estimated parameter θ extracted by the polynomial, which in general is

$$ J_{sn}(\theta) = n \sum_{i=1}^{s} h_i(\theta)\, \frac{d\alpha_i(\theta)}{d\theta} . $$

The statistical sense of the function J_sn(θ) is similar to Fisher's classical concept of the quantity of information: as n → ∞, its inverse approaches the variance of the estimates,

$$ \sigma^2_{(\theta)s} \simeq J_{sn}^{-1}(\theta) , \qquad n \to \infty . $$
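For s = 1 the system can be solved by hand. The worked case below is our own illustration, assuming the location model x_v = a + ξ_v with zero-mean ξ used in the following slides (so θ = a); it shows that J_sn reproduces the familiar variance of the arithmetic mean:

$$ F_{1,1} = \alpha_2 - \alpha_1^2 = \kappa_2 , \qquad \frac{d\alpha_1}{da} = 1 \;\Rightarrow\; h_1 = \frac{1}{\kappa_2} , \qquad J_{1n} = \frac{n}{\kappa_2} , \qquad \sigma^2_{a(1)} \simeq J_{1n}^{-1} = \frac{\kappa_2}{n} . $$

With this h1 the estimating equation becomes Σ(x_v − a)/κ2 = 0, whose root is the sample mean x̄.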

Mathematical formulation of the problem (asymmetric pdf)
Let a be a measured value determined from repeated tests affected by measurement errors. The set of measurement results can be interpreted as a sample x = (x1, x2, …, xn) of n independent and identically distributed random components described by the model

$$ x_v = a + \xi_v , \qquad v = 1, \dots, n . $$

In this model a = const is the permanent component (the value of the measurand), and ξ is an asymmetrically distributed, zero-mean random variable (the measurement error) whose probabilistic properties are given by a sequence of cumulants (cumulant coefficients). In this mathematical model, the second-order cumulant κ2 determines the variance of the random error component, and the higher-order cumulant coefficients γ3, γ4, etc. numerically describe the deviation of the random errors from the Gaussian distribution.

Estimation of measured value 2
In PMM with polynomials of degree s = 2, the estimate â is the solution of the equation

$$ \left. \left\{ h_1(a) \sum_{v=1}^{n} (x_v - a) + h_2(a) \sum_{v=1}^{n} \left[ x_v^2 - (a^2 + \kappa_2) \right] \right\} \right|_{a = \hat a} = 0 , $$

where h1(a) and h2(a) are the optimal coefficients that, for a polynomial of order s = 2, minimize the variance of the estimates of the parameter a.

Polynomial estimation of the measurand parameters
The linear PMM statistic (s = 1) for estimating the parameter a reduces to the sample arithmetic mean:

$$ \hat a_1 = \frac{1}{n} \sum_{v=1}^{n} x_v = \bar x . $$

The quadratic PMM statistic (s = 2) for estimating the parameter a is

$$ \hat a_2 = \hat a_1 - \delta_{(2)} , $$

where the correction δ(2) in expanded form is

$$ \delta_{(2)} = \frac{\sqrt{\kappa_2}\,(2+\gamma_4)}{2\gamma_3} - \operatorname{sign}(\gamma_3) \sqrt{ \left( \frac{\sqrt{\kappa_2}\,(2+\gamma_4)}{2\gamma_3} \right)^2 + \kappa_2 - \left[ \frac{1}{n}\sum_{v=1}^{n} x_v^2 - \left( \frac{1}{n}\sum_{v=1}^{n} x_v \right)^2 \right] } , $$

with γi = κi / κ2^(i/2).
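A minimal sketch of the s = 2 estimator, assuming κ2, γ3 and γ4 are known a priori (Table 1 lists theoretical parameter values; how the authors supplied them in the simulation is not stated here). The name `pmm2_estimate`, the zero-mean exponential test case and the fallback to the mean when the square root has a negative argument are our own choices:

```python
import math

import numpy as np


def pmm2_estimate(x, kappa2, gamma3, gamma4):
    """Quadratic PMM (s = 2) estimate a_2 = mean - delta_(2) for x_v = a + xi_v."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    if gamma3 == 0.0:
        return mean                                # symmetric error: s = 2 gives the mean
    m2 = np.mean((x - mean) ** 2)                  # (1/n) sum x^2 - ((1/n) sum x)^2
    c = math.sqrt(kappa2) * (2.0 + gamma4) / (2.0 * gamma3)
    disc = c * c + kappa2 - m2
    if disc < 0.0:
        return mean                                # no real root: fall back to the mean (our choice)
    delta = c - math.copysign(math.sqrt(disc), gamma3)
    return mean - delta


def variance_ratio_exponential(n, trials=2000, a_true=10.0, seed=0):
    """Monte Carlo var(a_2) / var(mean) for zero-mean exponential errors."""
    rng = np.random.default_rng(seed)
    linear, quadratic = [], []
    for _ in range(trials):
        x = a_true + rng.exponential(1.0, n) - 1.0  # kappa2 = 1, gamma3 = 2, gamma4 = 6
        linear.append(x.mean())
        quadratic.append(pmm2_estimate(x, 1.0, 2.0, 6.0))
    return np.var(quadratic) / np.var(linear)


print(variance_ratio_exponential(200))  # should lie near the theoretical 0.5
```

One design consequence is worth noting: if κ2 is replaced by the sample value m2 computed from the same sample, the expression under the square root becomes the square of the first term and δ(2) vanishes identically, so the correction relies on κ2 being known independently of the sample.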

Analysis of the accuracy of polynomial estimates
To quantify the change in measurement uncertainty, the coefficient of variance reduction is proposed:

$$ g_{a(s)} = \frac{\sigma^2_{a(s)}}{\sigma^2_{a(1)}} . $$

The variance σ²a(1) does not depend on the value of the estimated parameter; it is determined only by the variance of the random component of the measurement error (the second-order cumulant) and by the sample size n:

$$ \sigma^2_{a(1)} = \frac{\kappa_2}{n} . $$

The asymptotic variance σ²a(2) is

$$ \sigma^2_{a(2)} = \frac{\kappa_2}{n} \left( 1 - \frac{\gamma_3^2}{2 + \gamma_4} \right) . $$

The coefficient of variance reduction is therefore a function of the cumulant coefficients of skewness and kurtosis only:

$$ g_{a(2)} = 1 - \frac{\gamma_3^2}{2 + \gamma_4} . $$

For γ3 = γ4 = 0, g_a(2) = 1: there is no gain for a Gaussian pdf.
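The theoretical column g_a(2) of Table 1 follows directly from this formula. A short sketch, assuming the standard closed forms of skewness and excess kurtosis for the Gamma distribution (γ3 = 2/√α, γ4 = 6/α) and for the lognormal distribution with log-variance σ²; function names are hypothetical:

```python
import math


def g2(gamma3, gamma4):
    """Coefficient of variance reduction g_a(2) = 1 - gamma3^2 / (2 + gamma4)."""
    return 1.0 - gamma3 ** 2 / (2.0 + gamma4)


def gamma_dist_coeffs(shape):
    """Skewness and excess kurtosis of a Gamma distribution with the given shape."""
    return 2.0 / math.sqrt(shape), 6.0 / shape


def lognormal_coeffs(sigma2):
    """Skewness and excess kurtosis of a lognormal distribution with log-variance sigma2."""
    e = math.exp(sigma2)
    return (e + 2.0) * math.sqrt(e - 1.0), e**4 + 2.0 * e**3 + 3.0 * e**2 - 6.0


for shape in (0.5, 1.0, 2.0, 4.0):
    print(f"Gamma(shape={shape}): g_a(2) = {g2(*gamma_dist_coeffs(shape)):.2f}")
print(f"Lognormal(sigma^2=0.1): g_a(2) = {g2(*lognormal_coeffs(0.1)):.2f}")
# prints 0.43, 0.50, 0.60, 0.71 and 0.74
```

Shape 1 is the exponential case, for which the variance is halved.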

Analysis of the accuracy of polynomial estimates
Fig. 1. Coefficient of variance reduction g_a(2) as a function of the cumulant coefficients γ3, γ4.

Statistical modeling of polynomial estimation

Table 1. Results of the Monte Carlo simulation of parameter estimation (γ3, γ4, g_a(2): theoretical values of parameters; ĝa(2): Monte Carlo results)

Distribution | γ3 | γ4 | g_a(2) | ĝa(2), n = 20 | ĝa(2), n = 50 | ĝa(2), n = 200
Gamma (α = 0.5) | 2.83 | 12 | 0.43 | 0.47 | 0.46 | 0.43
Exponential (Gamma, α = 1) | 2 | 6 | 0.5 | 0.58 | 0.52 | 0.5
Gamma (α = 2) | 1.41 | 3 | 0.6 | 0.63 | 0.61 | 0.6
Gamma (α = 4) | 1 | 1.5 | 0.71 | 0.74 | 0.72 | 0.71
Lognormal (σ² = 0.1, μ = 1) | 1 | 1.86 | 0.74 | 0.76 | 0.075 | 0.74
Weibull (a = 1, b = 2) | 0.63 | 0.25 | 0.82 | 0.84 | 0.83 | 0.82

Statistical modeling of polynomial estimation

Table 2. Results of testing the adequacy of the Gaussian distribution model for the linear (s = 1) and polynomial (s = 2) estimates with the Lilliefors test (output parameters: LSTAT, CV)

Distribution | LSTAT, n = 20, s = 1 | n = 20, s = 2 | n = 50, s = 1 | n = 50, s = 2 | n = 200, s = 1 | n = 200, s = 2
Gamma (α = 0.5) | 0.045 | 0.036 | 0.028 | 0.021 | 0.018 | 0.009
Exponential (Gamma, α = 1) | 0.034 | 0.027 | 0.023 | 0.013 | 0.011 | 0.008
Gamma (α = 2) | 0.021 | 0.017 | 0.013 | 0.012 | 0.009 | 0.007
Gamma (α = 4) | 0.02 | 0.017 | 0.012 | 0.011 | 0.008 | 0.007
Lognormal (σ² = 0.1, μ = 1) | 0.016 | 0.014 | 0.012 | 0.011 | 0.008 | 0.007
Weibull (a = 1, b = 2) | 0.013 | 0.017 | 0.011 | 0.011 | 0.006 | 0.004

Critical value of the test statistic: CV = 0.009.
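The LSTAT and CV columns correspond to the Lilliefors test [13]: a Kolmogorov-Smirnov distance to a normal CDF whose mean and standard deviation are estimated from the same data. A hand-rolled sketch follows; function names are hypothetical, and it uses the commonly quoted large-sample 5% critical value 0.886/√N. Under that approximation the tabulated CV = 0.009 would correspond to roughly N ≈ 10⁴ simulated estimates, which is our inference rather than a figure stated in the slides:

```python
import math

import numpy as np


def lilliefors_statistic(x):
    """Kolmogorov-Smirnov distance to a normal CDF with estimated mean and SD."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - cdf), np.max(cdf - (i - 1) / n)))


def lilliefors_critical_value_5pct(n):
    """Commonly quoted large-sample 5% critical value, 0.886 / sqrt(n)."""
    return 0.886 / math.sqrt(n)


rng = np.random.default_rng(1)
for name, sample in (("normal", rng.normal(size=2000)),
                     ("exponential", rng.exponential(size=2000))):
    stat, cv = lilliefors_statistic(sample), lilliefors_critical_value_5pct(2000)
    print(f"{name}: LSTAT = {stat:.4f}, CV = {cv:.4f}, reject = {stat > cv}")
```

As in the tables, the Gaussian hypothesis is not rejected when LSTAT < CV.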

Statistical modeling of polynomial estimation
Fig. 2. Gaussian probability plots approximating the experimental values of the measured parameter estimates for the exponential error distribution model: a) n = 20; b) n = 50; c) n = 200.

Statistical modeling of polynomial estimation
Fig. 3. Gaussian probability plots approximating the experimental values of the measured parameter estimates for the lognormal error distribution model: a) s = 1; b) s = 2.

Mathematical formulation of the problem (symmetric error)
If the measurand a is determined from tests affected by random errors, the set of measurement results can be interpreted as a sample x = (x1, x2, …, xn) of n independent and identically distributed random components described by the model

$$ x_v = a + \xi_v^0 , \qquad v = 1, \dots, n , $$

where a = const is the value of the measurand and ξ⁰ is the centered (zero-mean), symmetrically distributed non-Gaussian random error component. The probabilistic properties of ξ⁰ are described by the cumulant κ2 (variance) and the cumulant coefficients γ4 and γ6. The parameter a and the standard uncertainty σa(s) of the measurand are estimated from the observations of the sample x.

Estimation of measured value
In many practical cases, the parameter is estimated with a simple linear statistic, the arithmetic mean

$$ \hat a = \frac{1}{n} \sum_{v=1}^{n} x_v . $$

The statistic â is the estimate of the mathematical expectation of the random variable found by the method of moments (MM). This estimate has the minimum variance only when the random variable is Gaussian. The estimate of the mean value a obtained by PMM with a polynomial of degree s = 1 coincides with the linear MM estimate. It is also shown that, for a symmetric distribution, all odd-order cumulant coefficients vanish, and the estimates obtained with polynomials of degree s = 2 degenerate to the linear estimate (the arithmetic mean).
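The degeneration for s = 2 can be checked by solving the 2 × 2 system Σj hj F(i,j) = dαi/da by hand for the model x_v = a + ξ_v. The derivation below is our own sketch (not reproduced from the slides), using α1 = a, α2 = a² + κ2, α3 = a³ + 3aκ2 + κ3:

$$ F_{1,1} = \kappa_2 , \quad F_{1,2} = 2a\kappa_2 + \kappa_3 , \quad F_{2,2} = 4a^2\kappa_2 + 4a\kappa_3 + \kappa_4 + 2\kappa_2^2 , $$

$$ h_1(a) = \frac{\kappa_4 + 2\kappa_2^2 + 2a\kappa_3}{\Delta} , \qquad h_2(a) = -\frac{\kappa_3}{\Delta} , \qquad \Delta = \kappa_2(\kappa_4 + 2\kappa_2^2) - \kappa_3^2 . $$

For a symmetric pdf κ3 = 0, so h2 = 0 and h1 = 1/κ2: the quadratic term drops out and the equation reduces to Σ(x_v − a) = 0, i.e. â = x̄. In the asymmetric case the same coefficients give J_2n = n(κ4 + 2κ2²)/Δ, whose inverse is (κ2/n)(1 − γ3²/(2 + γ4)), the variance σ²a(2) quoted earlier.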

Estimation of measured value 2 (cont.)
In accordance with PMM, using polynomials of degree s = 3, the estimate â is the solution of the equation

$$ \left. \left\{ h_1(a) \sum_{v=1}^{n} (x_v - a) + h_2(a) \sum_{v=1}^{n} \left[ x_v^2 - (a^2 + \kappa_2) \right] + h_3(a) \sum_{v=1}^{n} \left[ x_v^3 - (a^3 + 3a\kappa_2) \right] \right\} \right|_{a = \hat a} = 0 , $$

where h1(a), h2(a), h3(a) are the optimal coefficients that, for a polynomial of order s = 3, minimize the variance of the estimates of the parameter a.

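For a sample from a symmetric pdf, the parameters needed by PMM3 are calculated from the sample central moments mi:

$$ \hat\kappa_1 = \bar x , \quad \hat\kappa_2 = m_2 , \quad \hat\gamma_4 = \frac{m_4}{m_2^2} - 3 , \quad \hat\gamma_6 = \frac{m_6}{m_2^3} - 15\,\frac{m_4}{m_2^2} + 30 , \qquad m_i = \frac{1}{n}\sum_{v=1}^{n} (x_v - \bar x)^i . $$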
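A minimal sketch of a PMM3 estimator built on these sample estimates. It rests on our own rearrangement, not on formulas shown in the slides: rewriting the s = 3 equation in centered powers of (x − a) does not change the optimal estimating function, and for a symmetric pdf it becomes c1 Σ(x_v − a) + c3 Σ(x_v − a)³ = 0 with c1/c3 = κ2(γ6 + 12γ4 + 6)/(−γ4). Setting a = x̄ + d gives the depressed cubic d³ + p·d − m3 = 0; we take the real root closest to zero. Names are hypothetical:

```python
import numpy as np


def pmm3_estimate(x, kappa2=None, gamma4=None, gamma6=None):
    """Cubic PMM (s = 3) estimate of a for a symmetric error distribution.

    Parameters left as None are replaced by the sample estimates kappa2 = m2,
    gamma4 = m4/m2^2 - 3, gamma6 = m6/m2^3 - 15 m4/m2^2 + 30.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    y = x - mean
    m2, m3, m4, m6 = (np.mean(y ** k) for k in (2, 3, 4, 6))
    kappa2 = m2 if kappa2 is None else kappa2
    gamma4 = m4 / m2**2 - 3.0 if gamma4 is None else gamma4
    gamma6 = m6 / m2**3 - 15.0 * m4 / m2**2 + 30.0 if gamma6 is None else gamma6
    if abs(gamma4) < 1e-12:
        return mean                                   # c3 = 0: the estimate is the mean
    ratio = kappa2 * (gamma6 + 12.0 * gamma4 + 6.0) / (-gamma4)   # c1 / c3
    p = ratio + 3.0 * m2
    roots = np.roots([1.0, 0.0, p, -m3])              # d^3 + p d - m3 = 0
    real = roots.real[np.abs(roots.imag) <= 1e-9 * (1.0 + np.abs(roots.real))]
    if real.size == 0:                                # guard against round-off
        real = roots.real[[np.argmin(np.abs(roots.imag))]]
    return mean + real[np.argmin(np.abs(real))]       # root closest to the mean


def variance_ratio_uniform(n=200, trials=2000, seed=0):
    """Monte Carlo var(a_3) / var(mean) for errors uniform on [-1, 1]."""
    rng = np.random.default_rng(seed)
    x = 5.0 + rng.uniform(-1.0, 1.0, size=(trials, n))
    cubic = np.array([pmm3_estimate(row) for row in x])
    return cubic.var() / x.mean(axis=1).var()


print(variance_ratio_uniform())  # theoretical g_a(3) for the uniform pdf is 0.3
```

Whether the authors estimated γ4 and γ6 from each simulated sample or used theoretical values is not stated here; both variants can be run by passing or omitting the keyword arguments.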

Accuracy of polynomial estimates
To quantify the change in measurement uncertainty, the coefficient of variance reduction g_a(s), defined as the ratio of the variances for degrees s and s = 1, is proposed:

$$ g_{a(s)} = \frac{\sigma^2_{a(s)}}{\sigma^2_{a(1)}} . $$

This coefficient is the ratio of the variance σ²a(s) of the estimates of the parameter a found by PMM with an s-th order polynomial to the variance σ²a(1) of the linear estimate found by the method of moments (equivalent to using a polynomial of degree s = 1 in PMM).

Accuracy of polynomial estimates 2 (cont.)
The variance σ²a(1) does not depend on the value of the estimated parameter; it is determined only by the variance of the random component of the measurement error (the second-order cumulant κ2) and by the sample size n:

$$ \sigma^2_{a(1)} = \frac{\kappa_2}{n} . $$

Thus the coefficient of variance reduction

$$ g_{a(3)} = \frac{\sigma^2_{a(3)}}{\sigma^2_{a(1)}} = 1 - \frac{\gamma_4^2}{6 + 9\gamma_4 + \gamma_6} $$

is a function of the higher-order cumulant coefficients (γ4 and γ6) only and does not depend on κ2 or on the sample size n. Here γi = κi / κ2^(i/2).

Accuracy of polynomial estimates 3 (cont.)
Fig. 4. Coefficient of variance reduction g_a(3) as a function of the cumulant coefficients γ4, γ6.
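For the trapezoidal family, g_a(3) can be computed exactly. A sketch, assuming the standard fact that a trapezoidal PDF with base ratio β = a/b is the convolution of two uniform PDFs of widths proportional to 1 + β and 1 − β (β = 0 triangular, β = 1 uniform), so that their cumulants add; a uniform law of width w has κ2 = w²/12, κ4 = −w⁴/120, κ6 = w⁶/252. Function names are hypothetical:

```python
from fractions import Fraction


def uniform_even_cumulants(width):
    """kappa_2, kappa_4, kappa_6 of a uniform law of the given width (odd cumulants vanish)."""
    w = Fraction(width)
    return w**2 / 12, -w**4 / 120, w**6 / 252


def trapezoid_g3(beta):
    """gamma_4, gamma_6 and g_a(3) for Trap with base ratio beta (0 = triangular, 1 = uniform).

    Assumes Trap = U(width 1 + beta) + U(width 1 - beta), independent, so cumulants add.
    """
    beta = Fraction(beta)
    k2 = k4 = k6 = Fraction(0)
    for width in (1 + beta, 1 - beta):
        c2, c4, c6 = uniform_even_cumulants(width)
        k2, k4, k6 = k2 + c2, k4 + c4, k6 + c6
    gamma4, gamma6 = k4 / k2**2, k6 / k2**3
    return gamma4, gamma6, 1 - gamma4**2 / (6 + 9 * gamma4 + gamma6)


for beta in (Fraction(1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4), Fraction(0)):
    g4, g6, g = trapezoid_g3(beta)
    print(f"beta = {float(beta):.2f}: gamma4 = {float(g4):.2f}, "
          f"gamma6 = {float(g6):.2f}, g_a(3) = {float(g):.2f}")
```

The printed g_a(3) values (about 0.30, 0.36, 0.55, 0.76 and 0.84) match the theoretical column of Table 3 for the uniform, trapezoidal and triangular rows; exactly, g_a(3) = 3/10 for the uniform and 38/45 for the triangular distribution.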

Statistical MC modeling of polynomial estimation

Table 3. Results of parameter estimation by Monte Carlo simulation (γ4, γ6, g_a(3): theoretical values of parameters; ĝa(3): Monte Carlo results)

Distribution | γ4 | γ6 | g_a(3) | ĝa(3), n = 20 | ĝa(3), n = 50 | ĝa(3), n = 200
Arcsine | −1.3 | 8.2 | 0.2 | 0.27 | 0.22 | 0.21
Uniform | −1.2 | 6.9 | 0.3 | 0.41 | 0.33 | 0.31
Trapezoidal (β = 0.75) | −1.1 | 6.4 | 0.36 | 0.47 | 0.39 | 0.37
Trapezoidal (β = 0.5) | −1 | 5 | 0.55 | 0.63 | 0.58 | 0.55
Trapezoidal (β = 0.25) | −0.7 | 2.9 | 0.76 | 0.82 | 0.78 | 0.77
Triangular | −0.6 | 1.7 | 0.84 | 0.89 | 0.86 | 0.85

Statistical MC modeling of polynomial estimation 2

Fig. 2. Example of experimental estimates of the measured parameter for a Trap distribution (β = 0.5) of random errors: a) histograms and Gaussian approximation of the PDF; b) empirical distribution function and Gaussian probability plots.

Statistical MC modeling of polynomial estimation 3
Table 4. Results of testing the adequacy of the Gaussian distribution model for the polynomial estimates (s = 3) with the Lilliefors test

Distribution | P, n = 20 | P, n = 50 | P, n = 200 | LSTAT, n = 20 | LSTAT, n = 50 | LSTAT, n = 200
Arcsine | 0.001 | 0.003 | 0.25 | 0.03 | 0.01 | 0.007
Uniform | 0.001 | 0.006 | 0.31 | 0.02 | 0.01 | 0.007
Trapezoidal (β = 0.75) | 0.001 | 0.12 | 0.5 | 0.02 | 0.008 | 0.006
Trapezoidal (β = 0.5) | 0.07 | 0.18 | 0.5 | 0.008 | 0.007 | 0.006
Trapezoidal (β = 0.25) | 0.24 | 0.26 | 0.5 | 0.007 | 0.007 | 0.006
Triangular | 0.31 | 0.5 | 0.5 | 0.007 | 0.006 | 0.005

Critical value of the test statistic: CV = 0.009.

P is the significance level corresponding to the sample value of the test statistic; LSTAT is the sample value of the test statistic; CV is the critical value of the test statistic. If LSTAT < CV, the null hypothesis (Gaussian distribution of the estimates) is not rejected at the given significance level.

Conclusions 1
The analysis of the results confirms that Kunchenko's stochastic polynomials [9] can be used in algorithms for finding nonlinear estimates of the measured parameter. The polynomial estimates synthesized for s = 2 are substantially more accurate than the linear estimates by the arithmetic mean (according to the GUM), and their distribution also approaches the normal distribution faster. The reduction of the uncertainty of estimates for non-Gaussian pdfs is determined by the degree of non-Gaussianity of the measurement errors, numerically expressed by the absolute values of the cumulant coefficients of skewness and kurtosis. The decrease in the standard uncertainty (variance) of the estimates is achieved by taking into account additional information about the probabilistic properties of the measurement errors. This information depends on the values of a specified number of cumulants.

Conclusions 2
The research also shows that the polynomial estimates synthesized for s = 3 for symmetric samples are significantly more accurate than the linear estimates (arithmetic mean).

The improvement is determined by the level of non-Gaussianity, expressed by the absolute values of the higher-order cumulant coefficients. The decrease in the standard uncertainty of the estimates is achieved by taking into account additional information about the probabilistic properties of the measurement errors. This information depends on the values of a specified number of cumulants. However, estimating these cumulants seems to be a much simpler task than selecting the law of distribution that approximates the dispersion of the measurement errors and describes the uncertainties, and testing its adequacy.

Conclusions 3
Possible directions of further research include:
• increasing the degree of the stochastic polynomial, which is necessary to obtain more effective solutions;
• analysis of how the accuracy of determining the parameters of the non-Gaussian model (cumulants) affects the stability of the polynomial estimation of the measured parameters;
• synthesis and analysis of the properties of recursive algorithms for the polynomial estimation of the measured parameter.

References 1
1. Supplement 1 to the Guide to the expression of uncertainty in measurement (GUM): Propagation of distributions using a Monte Carlo method. Guide OIML G 1-101, Ed. 2008.
2. Novickij P. V., Zograf I. A. (1991). Ocenka pogreshnostiej resultatov izmierenii (Estimation of the measurement result errors). Energoatomizdat, Leningrad (in Russian).
3. Mendel J. M. (1991). Tutorial on higher-order statistics (spectra) in signal processing and system theory: theoretical results and some applications. Proc. IEEE, 79(3), 278-305.
4. Kendall M. G., Stuart A. (1969). The Advanced Theory of Statistics, Volume 1: Distribution Theory, 3rd Edition. Griffin.
5. Toybert P. (1988). Otsenka tochnosti rezultatov izmereniy (Estimation of accuracy of measurement results). Energoatomizdat, Leningrad (in Russian).
6. Zaharov I. P., Klimova E. A. (2014). Application of excess method to obtain reliable estimate of expanded uncertainty. Systemy obrobky informatsii (SOI), no. 3(119), pp. 24-28 (in Russian).
7. Kuznetsov B. F., Borodkin D. K., Lebedeva L. V. (2013). Cumulant models of additional errors. Sovremennye tekhnologii. Sistemnyi analiz. Modelirovanie, no. 1(37), pp. 134-138 (in Russian).
8. DeCarlo L. T. (1997). On the meaning and use of kurtosis. Psychological Methods, 2(3), 292-307. doi:10.1037/1082-989X.2.3.292.

References 2
9. Kunchenko Y. (2002). Polynomial Parameter Estimations of Close to Gaussian Random Variables. Shaker Verlag, Aachen, Germany.
10. Zabolotnii S. W., Warsza Z. L. (2016). Semi-parametric estimation of the change-point of parameters of non-Gaussian sequences by polynomial maximization method. Advances in Intelligent Systems and Computing, Vol. 440. Springer. http://doi.org/10.1007/978-3-319-29357-8_80.
11. Cramér H. (1999). Mathematical Methods of Statistics (Vol. 9). Princeton University Press.
12. Beregun V. S., Garmash O. V., Krasilnikov A. I. (2014). Mean square error of estimates of cumulative coefficients of the fifth and sixth order. Electronic Modelling, 36(1), 17-28.
13. Lilliefors H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399-402.

Examples: 1. Classic estimators of samples from the TRAP distribution

Fig. 1. Standard deviations S[X̂] of the measurand estimator 2C, X̂ = 0.5(X̄ + qV/2), of the mean S[X̄] and of the midrange S[qV/2] of a sample from the linear trapezoidal PDF Trap(a, b), as functions of the ratio of bases β = a/b. Number of sample elements n = 40.

Tab. 1. Three estimators of the measurement result and uncertainty for the Trap PDF sample

Estimator | Estimated value and standard deviation S of the measurand | Value of the measurand with expanded uncertainty | Borders
Mean X̄ | X̄ = 22,87; uA = 0,09 | X = (22,87 ± 0,19), P = 0,95 | X ∈ (22,68; 23,06), P = 0,95
Midrange qV/2 | X = 23,01; S[qV/2] = 0,09 | X = (23,01 ± 0,18), P = 0,95 | X ∈ (22,83; 23,19), P = 0,95
Estimator 2C X̂ | X̂ = 22,94; S[X̂] = 0,07 | X = (22,94 ± 0,14), P = 0,95 | X ∈ (22,87; 23,15), P = 0,95
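The qualitative behaviour in Fig. 1 can be reproduced by a small simulation. A sketch, assuming the trapezoid Trap(a, b) on [−1, 1] is generated as the sum of two independent uniforms with half-widths (1 + β)/2 and (1 − β)/2, and n = 40 as in the slide; function names, the trial count and the seed are our own choices, and the output is not meant to match the slide's numbers:

```python
import numpy as np


def trap_sample(rng, beta, size):
    """Trap PDF on [-1, 1] with base ratio beta = a/b, as a sum of two uniforms."""
    h1, h2 = (1.0 + beta) / 2.0, (1.0 - beta) / 2.0    # half-widths of the two uniforms
    return rng.uniform(-h1, h1, size) + rng.uniform(-h2, h2, size)


def classic_estimator_sds(beta, n=40, trials=5000, seed=0):
    """Monte Carlo SDs of the mean, the midrange qV/2 and the 2C estimator 0.5*(mean + midrange)."""
    rng = np.random.default_rng(seed)
    x = trap_sample(rng, beta, (trials, n))
    mean = x.mean(axis=1)
    midrange = 0.5 * (x.min(axis=1) + x.max(axis=1))
    combined = 0.5 * (mean + midrange)
    return mean.std(ddof=1), midrange.std(ddof=1), combined.std(ddof=1)


for beta in (0.0, 0.5, 1.0):
    s_mean, s_mid, s_2c = classic_estimator_sds(beta)
    print(f"beta = {beta:.2f}: S[mean] = {s_mean:.4f}, "
          f"S[midrange] = {s_mid:.4f}, S[2C] = {s_2c:.4f}")
```

For the uniform case (β = 1) the midrange is far more precise than the mean, while for the triangular case (β = 0) the mean wins; the 2C estimator sits between the two.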

Examples: 2. PMM estimators of samples from the TRAP distribution

$$ \hat g_{(3)} = \frac{\hat\sigma^2_{(3)}}{\hat\sigma^2_{(1)}} , \qquad \hat\sigma^2_{(3)} = \frac{\kappa_2}{n} \left( 1 - \frac{\gamma_4^2}{6 + 9\gamma_4 + \gamma_6} \right) , \qquad \frac{\hat\sigma^2_{(3)}}{\hat\sigma^2_{(1)}} = 1 - \frac{\gamma_4^2}{6 + 9\gamma_4 + \gamma_6} , \qquad \gamma_i = \frac{\kappa_i}{\kappa_2^{i/2}} $$

Fig. 2. Ratio of the experimental to the theoretical values of the coefficient of variance reduction for the parameter a as a function of the number n of elements of the Trap distribution sample.

Fig. 4. Empirical distribution of the estimates of the SD parameter: a) probability plot (Q-Q plot) with Gaussian approximation; b) box plots (99% confidence interval).

Tab. 2. Coefficients of the variance ratio of estimates (results of simulation)

β | g(3) | ĝ(3) = σ̂²(3)/σ̂²(1), n = 20 | ĝ(3), n = 50 | ĝ(3), n = 200 | q̂(3) = σ̂²(3)/σ̂²(qV/2), n = 20 | q̂(3), n = 50 | q̂(3), n = 200
β = 1 | 0,3 | 0,56 | 0,36 | 0,32 | 2,15 | 3,51 | 10,4
β = 0,75 | 0,36 | 0,61 | 0,45 | 0,38 | 1,53 | 1,29 | 1,04
β = 0,5 | 0,55 | 0,78 | 0,63 | 0,57 | 1,02 | 0,85 | 0,74
β = 0,25 | 0,76 | 0,97 | 0,86 | 0,79 | 0,9 | 0,77 | 0,71
β = 0 | 0,84 | 1,03 | 0,95 | 0,87 | 0,84 | 0,76 | 0,69

Fig 3. Areas of effectiveness of estimates of standard deviation of samples from the Trap distribution

Thanks for your attention!


First mentioned: 1286
City status: 1795
Country: Ukraine
Oblast: Cherkasy
Raion: Cherkasy
Population (Sept. 01, 2011), city: 286 037

Cherkasy State Technological University

T. G. Shevchenko Drama Theatre

"Druzhba Narodiv" (Friendship of Peoples) Palace

"Ukrsotsbank"

"Valley of Roses" on the bank of the Dnieper

"Hill of Glory" (burial site of those killed in World War II)

"Mytnytsia" neighborhood (built on sand reclaimed from the Dnieper)

Park in Cherkasy

Wedding Palace

Art Museum

Regional History Museum