PARAMETER ESTIMATION METHODS FOR DIFFERENT PROBABILITY DENSITY FUNCTIONS by
Andrew Chikondi Peter Mkolesia Submitted in partial fulfillment of the requirements for the degree
DOCTOR OF PHILOSOPHY IN SCIENCE
in the Department of Mathematics and Statistics FACULTY OF SCIENCE
TSHWANE UNIVERSITY OF TECHNOLOGY
Supervisor: Prof MY Shatalov
Co-Supervisor: Dr CR Kikawa
MARCH 2017
DECLARATION
“I hereby declare that the thesis submitted for the degree of Doctor of Philosophy in Science, at the Tshwane University of Technology, is my own original work and has not previously been submitted to any other institution of higher education. I further declare that all sources cited or quoted are indicated and acknowledged by means of a comprehensive list of references”.
Andrew Chikondi Mkolesia
Copyright © Tshwane University of Technology 2017
DEDICATION
I dedicate my thesis to my family. A special feeling of gratitude to my loving parents, Ruth and Peter Mkolesia, whose words of encouragement and tenacity echo in my ears. I also dedicate this thesis to my many friends and church family who have supported me throughout the process. I will always appreciate all they have done, especially Jabu Mtshweni for always checking on me, Bishop Dr. Emmanuel Klufio for his understanding and allowing me to take time off from church work and do my studies. I dedicate this work and give special thanks to my son Caleb Chimwemwe Mkolesia, daughter Naomi Chisomo Mkolesia and my wife Ndaga Lilly Mkolesia for being there for me throughout the entire doctorate program. You have been my best cheerleaders. Soli Deo Gloria.
ACKNOWLEDGEMENTS
I wish to thank my supervisors, Prof. Michael Yu Shatalov and Dr. Cliff Richard Kikawa, who were most generous with their expertise and precious time throughout this research. I would like to acknowledge Prof. Shatalov for steering me in the right direction whenever I needed it. Special thanks to Dr. Kikawa; his door was always open whenever I ran into a trouble spot or had a question about my research or writing. I am gratefully indebted to him for his countless hours of reflecting, reading, encouraging, and most of all patience through the entire process. He always brought energy and insight into every one of our many discussions. Finally, I must express my very profound gratitude to my parents Peter and Ruth Mkolesia and my wife Ndaga Mkolesia for providing me with unfailing support and encouragement through my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.
ABSTRACT
Natural and physical phenomena are modeled by non-linear mathematical models. The non-linearity of the models poses a challenge in the estimation of the parameters that describe such phenomena. Current methods of estimation use iterative routines which require the provision of initial guess values (IGVs) of the unknown parameters in order to optimize the model parameters. To provide IGVs one should have expert knowledge in the field being studied, and this is always more of an art than a science. Providing IGVs that are not near the optimal solution may result in long computation times and, in some instances, failure to converge to the required optimal solution. The objective of the study was to develop parameter estimation methods which do not require initialization of the non-linear model parameters in order to provide optimal estimates. Non-linear statistical models that can be formulated as first-order linear ordinary differential equations are considered. For parameter estimation, the models are firstly linearized using differential techniques. The formulated models are linear in the unknown parameters, which are estimated using ordinary least-squares methods. In this work a three-component Gaussian mixture model and the one-parameter and two-parameter Rayleigh distributions are considered in the one-dimensional parameter estimation framework. In the two-dimensional estimation framework a multivariate Gaussian is considered. Three methods and theorems are proposed and proven using appropriate numerical examples. The methods are also tested on real-life data to check their accuracy and statistical properties. A novel method for the recognition of multiple Gaussian patterns (RoMGP) is formulated and a theorem is developed to estimate the multiple Gaussian patterns. Another method is developed for parameter estimation for the Rayleigh (one- and two-parameter) distributions. The technique is to linearize the density function via differential methods; the estimation is done using the ordinary least-squares method through the minimization of the formulated goal function. Two data sets, one simulated and the other real, are used to assess the performance of the proposed differential least-squares method (DLSM). Graphical methods are also used to compare the DLSM and the maximum likelihood estimator (MLE) on the real data. It is shown that the proposed DLSM works well when the sample size is n ≥ 15. A method for estimating the parameters of a multivariate Gaussian using the principle of n-cross sections (PCS) is proposed and tested using numerical analysis. The PCS uses hyperplanes that slice the multivariate Gaussian distribution; the generated one-dimensional distributions are in turn estimated through the minimization of the goal function, and the estimation of the parameters is done via the systems of equations generated from the hyperplanes. The proposed methods can be adopted to compute the parameters of mixed density function models and of data that follows a Rayleigh distribution. The methods can also be used or adapted to compute the initial guess values for the maximum likelihood estimator, the method of moments estimator (MME) or other iterative routines, for both procedural and theoretical statistics, since the RoMGP, DLSM and PCS use non-trivial assumptions on the data.
LIST OF TABLES

Table 3.1 Parameter values for the Three Mixture Model
Table 3.2 Sufficient Parameter values for the Three Mixture Model
Table 4.1 One Parameter Rayleigh Distribution Model
Table 5.1 Two Parameter Rayleigh Distribution Model
Table 5.2 Scale and Location Parameter for Glass Fiber Data
Table 5.3 Performance of Proposed and Existing Methods
Table 5.4 Error Analysis
Table 5.5 Average Error Analysis
Table 6.1 Hyperplanes for the Numerical Simulation
Table 6.2 Second Approximations using Hyperplanes Pij
Table 6.3 Parameter Estimation using the principle of n-cross sections
Table 7.1 Parameter Estimation using the STG1 and the principle of MGF
Table 7.2 Sufficient Parameters using the STG1
Table 7.3 Sufficient Parameters using the STG1, MLE and MME
Table 7.4 Approximation for the Specialization of the Gaussian
Table 7.5 Sufficient Parameters using the STG3
LIST OF FIGURES

Figure 3.1 Three Component Mixture Model
Figure 4.1 pdfs for a one Parameter Rayleigh Distribution
Figure 4.2 pdfs for a one Parameter Rayleigh Distribution
Figure 4.3 pdfs for a one Parameter Rayleigh Distribution
Figure 4.4 pdfs for a one Parameter Rayleigh Distribution
Figure 5.1 pdf Variations for a Two Parameter Rayleigh Distribution
Figure 5.2 pdf Variations for a Two Parameter Rayleigh Distribution
Figure 5.3 pdf Variations for a Two Parameter Rayleigh Distribution
Figure 5.4 pdf Variations for a Two Parameter Rayleigh Distribution
Figure 6.1 Multivariate Gaussian Density Function
Figure 6.2 Multivariate Gaussian Density Function Approximated
Figure 6.3 Multivariate Gaussian Density Function Exact
Figure 6.4 Multivariate Gaussian Density Function Approximated
Figure 7.1 Special Two Mixture Gaussian Function
Figure 7.2 Specialization of the Gaussian Function
Figure 7.3 Specialization of the Gaussian Function
GLOSSARY

CAS: Computer Algebra System
DLSM: Differential Least-Squares Method
EM: Expectation Maximisation
FMM: Finite Mixture Models
GF: Goal Function
IGV: Initial Guess Value
i.i.d: Independent and Identically Distributed
LS: Least-Squares
MGF: Multiple Goal Functions
ML: Maximum Likelihood
MLE: Maximum Likelihood Estimator
MME: Method of Moments Estimator
NLS: Non-linear Least-Squares
ODM: Optimized Differential Method
OLS: Ordinary Least-Squares
PCS: Principle of n-Cross Sections
pdf: Probability Density Function
PM: Proposed Method
RoMGP: Recognition of Multiple Gaussian Patterns
SCPs: Single Component Peaks
STG1: Specialization of the Gaussian 1, the multiple goal functions (MGF) algorithm
STG2: Specialization of the Gaussian 2, the analytic (algebraic) solution
STG3: Specialization of the Gaussian 3, using a computer algebra system (CAS)
w.r.t: with respect to
TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
GLOSSARY

CHAPTER 1. INTRODUCTION
1.1 OVERVIEW
1.2 BACKGROUND INFORMATION
1.3 STATEMENT OF THE RESEARCH PROBLEM
1.4 JUSTIFICATION OF THE STUDY
1.5 OBJECTIVES OF THE STUDY
1.6 SUMMARY OF CONTRIBUTIONS
1.6.1 List of Publications in Refereed Scientific Journals
1.6.2 Manuscripts Submitted in Refereed Scientific Journals and in Progress
1.6.3 Conferences and Seminars
1.7 THESIS LAYOUT

CHAPTER 2. LITERATURE REVIEW
2.1 OVERVIEW
2.2 INTRODUCTION
2.3 APPROACHES TO PARAMETER ESTIMATION
2.3.1 The Frequentist Approach to Parameter Estimation
2.3.2 The Bayesian Approach to Parameter Estimation
2.4 PARAMETER ESTIMATION METHODS
2.4.1 Maximum Likelihood Estimation
2.4.2 Method of Moments Estimators (MME)
2.5 SOME PROPERTIES OF GOOD ESTIMATORS
2.5.1 Bias of an Estimator
2.5.2 Mean Squared Error and Efficiency of Estimators
2.6 Consistency
2.6.1 The Expectation-Maximization (EM) Algorithm
2.7 PROPOSED METHODS OF ESTIMATION
2.8 CONCLUSION

CHAPTER 3. RECOGNITION OF A MIXTURE OF MULTIPLE GAUSSIAN PATTERNS
3.1 OVERVIEW
3.2 INTRODUCTION
3.3 MOTIVATION
3.4 CURRENT METHODS OF SOLUTION
3.5 PROPOSED METHOD FOR GAUSSIAN MIXTURES
3.6 A MONTE CARLO SIMULATION STUDY
3.6.1 Three Component Mixture Model
3.7 RESULTS AND DISCUSSION
3.8 CONCLUSION

CHAPTER 4. ESTIMATION OF THE RAYLEIGH DISTRIBUTION PARAMETER
4.1 OVERVIEW
4.2 INTRODUCTION
4.3 MOTIVATION
4.4 PARAMETER ESTIMATION FOR RAYLEIGH DISTRIBUTIONS
4.4.1 Maximum Likelihood Estimator
4.4.2 Frequency Ratio Estimator
4.4.3 Differential Least-Squares Method
4.5 APPLICATION OF THE DIFFERENTIAL LEAST-SQUARES METHOD
4.5.1 Monte Carlo Simulation Study, estimation using the DLSM
4.5.2 Real Data for the Strength of Glass Fiber
4.5.3 Real Data for Electronic Component Failure Times
4.6 RESULTS
4.7 CONCLUSION

CHAPTER 5. EXACT SOLUTIONS FOR A TWO-PARAMETER RAYLEIGH DISTRIBUTION
5.1 OVERVIEW
5.2 INTRODUCTION
5.3 MOTIVATION
5.4 PARAMETER ESTIMATION
5.4.1 Maximum Likelihood Estimator
5.4.2 Method of Moment Estimators
5.4.3 Optimized Differential Method
5.5 NUMERICAL SIMULATIONS
5.5.1 Results
5.6 DATA ANALYSIS
5.6.1 Strength of Glass Fiber Data
5.7 SUMMARY OF RESULTS
5.8 CONCLUSION

CHAPTER 6. MULTIVARIATE GAUSSIAN PARAMETER ESTIMATION: THE PRINCIPLE OF n-CROSS SECTIONS
6.1 OVERVIEW
6.2 INTRODUCTION
6.3 MOTIVATION
6.3.1 Relation to the Univariate Gaussians
6.3.2 The Covariance Matrix
6.3.3 The Diagonal Covariance Matrix Case
6.4 PARAMETER ESTIMATION FOR MULTIVARIATE GAUSSIAN DISTRIBUTIONS
6.4.1 Multivariate Gaussian in the MLE Framework
6.4.2 The Principle of n-Cross Sections
6.5 NUMERICAL SIMULATIONS
6.6 CONCLUSION

CHAPTER 7. PARAMETER ESTIMATION FOR A SPECIALIZED GAUSSIAN DISTRIBUTION
7.1 OVERVIEW
7.2 INTRODUCTION
7.3 MOTIVATION
7.3.1 Specialization for the Mixture of the Gaussian Distributions
7.3.2 Methodology for the Specialization of the Gaussian Distributions
7.4 ALGORITHM 1: SPECIALIZATION OF THE GAUSSIAN 1 (STG1) IN THE MULTIPLE GOAL FUNCTIONS (MGF) FRAMEWORK
7.4.1 First Goal Function (G1)
7.4.2 Second Goal Function (G2)
7.4.3 Third Goal Function (G3)
7.4.4 Fourth Goal Function (G4)
7.4.5 Numerical Simulation for Algorithm 1: Specialization of the Gaussian 1 (STG1) in the Multiple Goal Functions (MGF) Framework
7.4.6 Summary of Results for the Parameter Estimation using the STG1 and the principle of MGF
7.5 ALGORITHM 2: SPECIALIZATION OF THE GAUSSIAN MIXTURES 2 (STG2) IN THE ANALYTIC FRAMEWORK
7.6 ALGORITHM 3: SPECIALIZATION OF THE GAUSSIAN 3 (STG3) USING THE COMPUTER ALGEBRA SYSTEM (CAS)
7.6.1 Numerical Simulation using Algorithm 3 (STG3)
7.7 CONCLUSIONS

CHAPTER 8. CONCLUSIONS AND RECOMMENDATIONS
8.1 OVERVIEW
8.2 INTRODUCTION
8.3 MAIN AIM OF THE THESIS
8.4 SUMMARY OF THE THESIS FINDINGS
8.5 RECOMMENDATIONS FOR FUTURE WORK

REFERENCES
CHAPTER 1. INTRODUCTION
1.1 OVERVIEW
In this chapter the shortcomings of current methods of parameter estimation are reviewed and the proposed estimation methods are introduced. Chapter 2 presents the literature review on the current methods of parameter estimation. In Chapter 3 the recognition of a mixture of multiple Gaussian patterns is considered and a method of estimating the sufficient parameters for a Gaussian mixture model is developed. In Chapter 4 a method of estimating the parameter of a one-parameter Rayleigh distribution is proposed and numerically validated; the method is also implemented on real-life data and checked for its accuracy. In Chapter 5 the method of parameter estimation developed in Chapter 4 is extended so as to estimate a two-parameter Rayleigh distribution. The result is a new method that can be implemented in a computer algebra system (CAS) to estimate parameters for different distributions. In Chapter 6 the multivariate Gaussian distribution is discussed and a method for estimating its parameters is proposed; the proposed method is tested on simulated data. In Chapter 7 a specialization for the case of a mixture of two Gaussians is investigated, and three mathematical approaches for estimating the parameters of this mixture are proposed. The methods are then tested using a CAS and numerical examples. Chapter 8 concludes the proposed methods of Chapters 3, 4, 5, 6 and 7; recommendations and further research are also stated.
1.2 BACKGROUND INFORMATION
Distributions occur naturally in both theoretical and experimental data. Usually these distributions are present as a mixture of two or more components in the data being studied. The distribution most often assumed for natural phenomena is the Gaussian. Modeling a “single” Gaussian distribution is relatively uncomplicated compared with the “multiple” distributions that may be present in a data set. The classical parameter estimation methods, namely maximum likelihood estimation (MLE), method of moments estimators (MME), frequency estimators, and the ordinary least squares (OLS) method, usually rest on the basic assumption that the residual error terms are independent and identically distributed (i.i.d); hence there is a need for estimation techniques and approaches that relax these conditions and estimate the parameters of complex model phenomena in a flexible manner (Brown & Prescott, 2006). Estimation methods may be parametric or non-parametric. The parametric methods (MME, MLE and OLS) are restrictive, as they model the distributions using only the given parameters of the assumed model or family of distributions under stringent model assumptions, while in the non-parametric methods, commonly known as distribution-free methods, the data itself determines the model to be used. Thus, for non-parametric estimation, the values of a probability density function are approximated from the given samples of the associated distribution. These non-parametric estimators make use of kernel methods and require on the order of n² arithmetic operations to evaluate the density at n sample points.
Non-parametric regression methods achieve flexibility by assuming only the continuity of the unknown regression function. However, non-parametric regression procedures are difficult to extend to the estimation of model parameters. There is therefore a demand for fast, easy-to-implement methods that can be used to estimate the parameters of given distribution models.
In response to this demand, this research develops procedures and methods for estimating the parameters of mixed models as well as of known distributions. Parametric methods are developed that assume an a priori probability density function for the data. For the mixed Gaussian density function, a procedure called the recognition of multiple Gaussian patterns (RoMGP) estimation method is proposed, which is also used to estimate a general n-component Gaussian model. Another method, called the differential least-squares method (DLSM), is developed for estimating the sufficient parameter of a one-parameter Rayleigh distribution. The parameter estimation for a two-parameter Rayleigh distribution is also considered and investigated; the optimized differential method (ODM) is employed for the estimation task in this case. Multivariate parameter estimation is also considered for the Gaussian distribution, and a method called the principle of n-cross sections (PCS) is proposed for the multivariate Gaussian. Algorithms for estimating a specialization for the case of two Gaussian mixtures are developed and tested. These algorithms eliminate the need for initial guess values by making use of mathematical procedures to determine the estimates of the parameters for the mixture of two Gaussians. The algorithms are called specialization of two Gaussian mixtures 1, 2 and 3 (STG1, STG2, STG3): STG1 is the multiple goal functions (MGF) algorithm, STG2 is the analytic (algebraic) approach and STG3 is the computer algebra system (CAS) approach, respectively.
1.3 STATEMENT OF THE RESEARCH PROBLEM
Considerable research has been done on the estimation of finite mixture distributions using mixtures of normals (Van Dijk, 2009); (Lindsay, 1995). Methods developed for estimating the parameters of finite mixture distributions include the multiple goal function (MGF) method (Kikawa, 2013), where a two-normal mixture model is estimated, maximum likelihood (ML) (Mclachlan & Peel, 2000); (Mclachlan & Basford, 1988); (Hastie et al., 2001), finite mixture models (Filho, 2007), (Van Dijk, 2009), and clustering methods (Mclachlan & Basford, 1988). The estimation of the parameters of mixture distributions may take considerable time. In the case of the MGF it was observed that its performance was better than that of the maximum likelihood method (Kikawa, 2013). The parameter estimation of mixture distribution models involves iterative methods (Duda et al., 2012), which require initial guess values for the unknown parameters in the model; this contributes to time constraints (Mclachlan & Peel, 2000). The problems associated with mixture models largely relate to making statistical inferences about the properties of the constituent components, provided that the data on the overall mixture is available (Mclachlan & Peel, 2000). The identification of heterogeneous information presents a challenge and requires highly involved computations (Duda et al., 2012), (Mclachlan & Peel, 2000), (Sonka et al., 1999), but n-component density functions overcome the estimation problems through parametrization. In practice, complex distributions may often be modeled relatively well with two-component density functions, using well-selected components which represent the essential features of the distribution (Jain et al., 2000). The parameter estimation of mixture distributions remains of considerable interest (Kikawa, 2013), as its applications are wide in the scientific domain. The problem addressed by this research is to propose methods that do not depend on the initialization of the unknown parameters in the model to be estimated; this requirement will increase the convergence rate for the parameters to be estimated.
1.4 JUSTIFICATION OF THE STUDY
The main aim of this thesis was to demonstrate methods of estimating unknown parameter values for selected distributions. These distributions can be decomposed in terms of ordinary differential equations that are linear with respect to the unknowns. The theorems and methods proposed throughout this thesis can be extended to other distributions. The existing parameter estimation methods may not guarantee convergence to the required solution, and they may exhibit high computation time and a slow convergence rate to the optimal solution when the initial guess value (IGV) is far from that solution. Therefore, there is a need to propose new estimation methods or approaches that do not require the provision of initial approximations to the unknown parameters in the model(s) to be identified. The research problem applicable to this study was to develop theorems and methods for computing initial guess values and estimating parameters for different distributions.
1.5 OBJECTIVES OF THE STUDY
The objectives of this study are:

• To estimate complex probability density functions (pdfs) using ordinary differential equations (ODEs), by expressing a multifaceted function (a general pdf) of n variables as a composition of functions of fewer than n variables.

• To identify sufficient parameters for multiple Gaussian patterns using the decomposition method, and to validate the proposed methods using appropriate simulations.

• To develop methods of estimating parameters for selected probability distribution functions.

• To establish algorithms that estimate the parameters for the specialized mixture of Gaussians using a mathematical approach, as opposed to guess values and linear search models.
1.6 SUMMARY OF CONTRIBUTIONS
The contributions of this work are theorems and methods that can be applied to estimate parameters for different probability density functions by converting transcendental least-squares models into least-squares problems that are linear with respect to the unknown parameters and have closed-form, unique series solutions for the unknown parameters.
1.6.1 List of Publications in Refereed Scientific Journals
[1] MKOLESIA, A.C., KIKAWA, C.R., SHATALOV, M.Y. & KALEMA, B.M. 2016. Recognition of a Mixture of Multiple Gaussian Patterns. International Journal of Pure and Applied Mathematics, 108(2):307-326. [Online]. Available from: DOI: 10.12732/ijpam.v108i2.8 or http://www.ijpam.eu/contents/2016-108-2/8/8.pdf

[2] MKOLESIA, A.C., KIKAWA, C.R. & SHATALOV, M.Y. 2016. Estimation of the Rayleigh Distribution Parameter. Transylvanian Review Journal, 24(8):1158-1163. [Online]. Available from: http://www.researchgate.net/publication/304195622_Estimation_of_the_Rayleigh_Distribution_Parameter

[3] KIKAWA, C.R., SHATALOV, M.Y., KLOPPERS, P.H. & MKOLESIA, A.C. 2015. On the Estimation of a Univariate Gaussian Distribution: A Comparative Approach. Open Journal of Statistics, 5:445-454. [Online]. Available from: DOI: 10.4236/ojs.2015.55046 or http://file.scirp.org/pdf/OJS_2015081314271990.pdf

[4] KIKAWA, C.R., SHATALOV, M.Y., KLOPPERS, P.H. & MKOLESIA, A.C. 2016. Parameter Estimation for a Mixture of Two Univariate Gaussian Distributions: A Comparative Analysis of the Proposed and Maximum Likelihood Methods. British Journal of Mathematics & Computer Science, 12(1):1-8. [Online]. Available from: DOI: 10.9734/BJMCS/2016/16617 or http://www.journalrepository.org/media/journals/BJMCS_6/2015/Sep/Kikawa1212015BJMCS16617.pdf
1.6.2 Manuscripts Submitted in Refereed Scientific Journals and in Progress
[1] MKOLESIA, A.C., KIKAWA, C.R. & SHATALOV, M.Y. 2016. Exact Solutions for a Two-Parameter Rayleigh Distribution. Alexandria Engineering Journal. (Under review). [Online]. http://ees.elsevier.com/aej/default.asp

[2] MKOLESIA, A.C., KIKAWA, C.R. & SHATALOV, M.Y. 2016. Multivariate Gaussian Parameter Estimation: The Principle of n-Cross Sections. .... (In progress)
1.6.3 Conferences and Seminars
[1] Mkolesia, A.C. 2015. Methods to Solve Transcendental Least Squares Problems and Their Statistical Inference. Postgraduate Seminar in Mathematics and Statistics, 29 November - 3 December 2015. Host: Nelson Mandela Metropolitan University, Port Elizabeth, South Africa.
1.7 THESIS LAYOUT
In Chapter 2, a review of literature concerning the concepts of both linear and non-linear least-squares estimation techniques is presented. In Chapter 3, the univariate and n-component Gaussian distributions are assessed, and a parameter estimation method for these distributions is developed and presented. In Chapter 4, one-parameter Rayleigh distribution parameter estimation is investigated and a novel method for estimating the parameter is developed, presented and assessed. In Chapter 5, the two-parameter Rayleigh distribution is investigated and its parameters estimated using a proposed method. In Chapter 6, a multivariate Gaussian distribution is considered, and a method for estimating the parameters of the multivariate Gaussian distribution is proposed and investigated. In Chapter 7, a special case of the mixture of two Gaussians is considered, and three algorithms for estimating the parameters of this specialization of the two-Gaussian mixture are proposed and investigated using numerical simulations. In Chapter 8 the overall conclusions of the estimation methods proposed in the thesis are discussed; recommendations for future and further research are also presented.
CHAPTER 2. LITERATURE REVIEW
2.1 OVERVIEW
This Chapter details much of the related work concerning parameter estimation methods that are used in the rest of this thesis. It is aimed at giving the reader a context for the use of parameter estimation methods as well as an insight into their general applicability and usefulness in solving both practical and theoretical problems. The review of existing methodologies presents and justifies the problem of the study and motivation for proposing novel approaches for parameter estimation.
2.2 INTRODUCTION
Physical phenomena may be modeled and analyzed using transcendental mathematical models (Mindlin, 2006). For example, in analytical chemistry, in the field of chemical chromatography (Schure & Davi, 2011), engineers usually encounter the challenge of identifying trace elements whose chromatographic peaks lie close to each other. These peaks may sometimes be seen as single component peaks (SCPs), while there may in fact be several peaks presenting as one. There exist several methods that can be used to identify the component mixtures; however, such methods require initial guess values (IGVs). Regular methods of finite mixture model (FMM) identification require IGVs that may result in high computation time, slow convergence, or even failure to converge if the provided IGVs are far from the optimal solution. In the proposed methods, the FMM is firstly decomposed into its even and odd parts, which are linearized through differential techniques (ordinary linear homogeneous differential equations, ODEs). Secondly, the ordinary least squares (OLS) method is employed to estimate the unknown parameters in the linearized models. Numerical simulations are used to evaluate the performance of the proposed method (PM).
2.3 APPROACHES TO PARAMETER ESTIMATION

2.3.1 The Frequentist Approach to Parameter Estimation
A statistical model for the distribution of an observation X is specified by its probability density function (pdf) (Frühwirth-Schnatter, 2006). The choice of a model is an art in statistics, and a model should only be adopted after ascertaining that the data conforms to the selected model and its parameters (Piantadosi et al., 2014). Although one might be sure of the family from which the distribution of X comes, the distribution's parameters are not known (Diebolt & Robert, 1994). The main point of doing simulations or experiments is (usually) to find out more about the parameters or other fixed characteristics of the distribution, and this process is called estimation (Frühwirth-Schnatter, 2006). In the frequentist paradigm it is also assumed that the signal-to-noise ratio is low (Palle & Esteban, 2013) and that the confidence interval is limited and significant (Lyons & Unel, 2005).
2.3.2 The Bayesian Approach to Parameter Estimation
The Bayesian approach to parameter estimation makes use of the posterior density distribution of the parameters to yield an appropriate posterior prediction model (Bretthorst, 1997). This approach can be used when little is known about the data; the advantage of this is that it helps to avoid fitting a density function to the data (Bretthorst, 1997). The Bayesian approach is also useful when the data is sparse or incomplete (few data), because of its probabilistic and subjective framework. The Bayesian density model may be used to update the expectation-maximization problem to be solved (Jaakkola, 2000). The methods of least squares (LS) and maximum likelihood (ML) require all parameters of the model to be estimated. This leads to the computational problem of determining a global maximum in a parameter space of high dimension. The Bayesian parameter estimation approach makes use of intuitive analysis (Kramer & Sorenson, 1988); thus certain nuisance parameters can be eliminated through the process of integration, so that only the necessary parameters are estimated (Bretthorst, 1997). The problem of high dimensionality is thereby reduced to lower dimensionality by reducing the nuisance parameters to only the interesting ones that are needed (Bretthorst, 1997), which in turn reduces the search dimension. An advantage of Bayesian parameter estimation is that, through probability theory, the accuracy of the estimate is computed directly (Bretthorst, 1997), while in the least squares method this is not given at all (Bretthorst, 1997), and in ML an extra computation has to be done to determine the accuracy of the estimates (Bretthorst, 1997). The parameters estimated using the Bayesian approach need not differ from the ones obtained using least squares or maximum likelihood (Bretthorst, 1997). Bayesian parameter estimation can be used in probability risk assessment (PRA) (Siu & Kelly, 1998), in understanding system behavior with the posterior density at hand (Kramer & Sorenson, 1988), to compute loss functions (Kramer & Sorenson, 1988), and in decision-making under uncertainty (Siu & Kelly, 1998). A disadvantage of Bayesian parameter estimation is that it is mathematically intractable and nonlinear (Kramer & Sorenson, 1988); the Bayesian approach requires high integration dimensions (Christensen et al., 2001), due to uncertainty of knowledge rather than variability of outcomes (Lyons & Unel, 2005).

In this study the frequentist, classical or orthodox (Murphy, 2012) approach is used, instead of the Bayesian approach. The frequentist approach has been selected as the data is considered to be complete and the general model fit for the data is assumed to be known. The frequentist approach generates a higher parameter dimension, but with the advent of computational power this can be overcome. There is also a need to track the parameters through reparameterization, and the Bayesian approach may give a different solution under reparameterization (Murphy, 2012). In the case of the Gaussian probability density function, the frequentist estimates are the parameter values that maximize the probability of obtaining the data if a model signal is present within the data (Murphy, 2012), (Bretthorst, 1997). The frequentist approach is tractable and computes faster (Palle & Esteban, 2013) than the Bayesian approach.
2.4 PARAMETER ESTIMATION METHODS
There exist many parameter estimation methods, such as maximum likelihood estimators (MLE) (Frühwirth-Schnatter, 2006), Bayes estimators (Bretthorst, 1997), method of moments estimators (MME) (Gibson, 2015), the Cramér-Rao bound, minimum mean squared error (MMSE) (Hall, 2005), also known as Bayes least squared error (BLSE), maximum a posteriori (MAP) (Murphy, 2012), the minimum variance unbiased estimator (MVUE), nonlinear system identification, the best linear unbiased estimator (BLUE), unbiased estimators, particle filters, Markov chain Monte Carlo (MCMC) (Jasra et al., 2005), (Murphy, 2012), the Kalman filter and the Wiener filter. In this study the MLE and MME were considered, as they are the classical frequentist approaches to parameter estimation. Most of the other parameter estimation methods are derived from these classical methods (Frühwirth-Schnatter, 2006).
2.4.1 Maximum Likelihood Estimation
With the availability of powerful computers and elaborate numerical algorithms, maximum likelihood estimation (MLE) has been the preferred approach to parameter estimation for finite mixture models for many decades (Frühwirth-Schnatter, 2006); (Redner & Walker, 1984) provide a concise and excellent review of ML estimation for finite mixture models. The MLE was used for a univariate mixture of two normal distributions with $\sigma_1^2 = \sigma_2^2$ as early as 1948 (Rao, 1948). Further pioneering work on MLE was done by (Hasselblad, 1966) for univariate mixtures of normals and for general mixtures from the exponential family, and by (Wolfe, 1970) for multivariate mixtures of normals. In these early papers, the ML estimator $\hat{\vartheta}$ is obtained by maximizing the mixture likelihood $p(y \mid \vartheta)$ with respect to $\vartheta$ using some direct method such as Newton's method (Hasselblad, 1966) or a gradient method (Quandt, 1972). An iterative scheme for maximizing the likelihood function, developed by (Hasselblad, 1966), (Hasselblad, 1969), is nothing but an early variant of the EM algorithm, introduced later by (Dempster et al., 1977), which is the most commonly applied method to find the ML estimator for a finite mixture model.

The maximum likelihood estimate (MLE) is the value $\hat{\theta}$ which maximizes the function $L(\theta)$ given by
$$L(\theta) = g(X \mid \theta) \qquad (2.1)$$
where $g$ is the pdf for a continuous random variable $X = \{x_1, x_2, \ldots, x_n\}$; thus the estimated parameter $\hat{\theta}$ is given by
$$\hat{\theta} = \arg\max_{\theta} L(\theta). \qquad (2.2)$$
If the continuous random variables $X_i \in \mathbb{R}$ are independent and identically distributed (i.i.d), then
$$L(\theta) = \prod_{i=1}^{n} g(x_i \mid \theta). \qquad (2.3)$$
Thus, the task is to determine the estimate $\hat{\theta} \in \Omega$ that maximizes $L(\theta)$, where $\Omega$ is the parameter space (the range of all possible values of the parameter $\theta$).

Definition 2.1 (Maximum Likelihood Estimate). Let a random sample $X_1, X_2, \ldots, X_n$ arise from a distribution that depends on one or more unknown parameters $\theta_1, \theta_2, \ldots, \theta_m$ with pdf $g(x_i \mid \theta_1, \theta_2, \ldots, \theta_m)$. Suppose that $(\theta_1, \theta_2, \ldots, \theta_m)$ is restricted to a given parameter space $\Omega$. Then:

1. $L(\theta_1, \theta_2, \ldots, \theta_m) = \prod_{i=1}^{n} g(x_i \mid \theta_1, \theta_2, \ldots, \theta_m)$ is the likelihood function of $\theta$, where $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$.
2. If $[g_1(X_i), g_2(X_i), \ldots, g_m(X_i)]$ is the $m$-tuple that maximizes the likelihood function, then $\hat{\theta}_j = g_j(X_i)$, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, is the maximum likelihood estimator of $\theta_j$ for $j = 1, 2, \ldots, m$.
3. The observed statistics $g_j(X_i)$, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, are called the maximum likelihood estimates of $\theta_j$ for $j = 1, 2, \ldots, m$.
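As a quick numerical sketch of Definition 2.1, the snippet below maximizes a Gaussian likelihood both with a generic optimizer and through the known closed-form Gaussian MLEs; the sample and all parameter values are illustrative assumptions, not data from this thesis.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=500)   # illustrative i.i.d sample

def nll(theta):
    # negative log-likelihood, -log L(theta), for a Gaussian pdf g(x | mu, sigma)
    mu, log_sigma = theta
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(nll, x0=[0.0, 0.0])             # generic numerical maximization
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
# closed-form Gaussian MLEs for comparison: sample mean and (ddof=0) std
print(mu_hat, sigma_hat, x.mean(), x.std())
```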
2.4.2 Method of Moments Estimators (MME)
The method of moments estimation (MME) of parameters proceeds by deriving equations that relate the density moments, that is, the expected values of powers of the random variable under consideration, to the unknown parameters (Pearson, 1894). The method was introduced by Pearson (1894) and involves equating sample moments with theoretical moments (Pearson, 1894).

Definition 2.2 (Method of Moments). In order to estimate $k$ unknown parameters $\theta_1, \theta_2, \ldots, \theta_k$ of a distribution $f_X(x \mid \theta)$ of a random variable $X$, the moments of the distribution are expressed as functions of the $\theta$s:

1. $\mu_k \equiv E[X^k] = g_k(\theta_k)$ is the $k$th (theoretical) moment of the distribution (about the origin), for $k = 1, 2, \ldots$
2. $E[(X - \mu)^k]$ is the $k$th (theoretical) moment of the distribution (about the mean), for $k = 1, 2, \ldots$
3. $\hat{\mu}_k = M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k = \hat{g}_k(\hat{\theta}_k)$ is the $k$th sample moment, for $k = 1, 2, \ldots$
4. $M_k^{*} = \frac{1}{n}\sum_{i=1}^{n} \big(X_i - \bar{X}\big)^k$ is the $k$th sample moment about the mean, for $k = 1, 2, \ldots$

The procedure for the MME is as follows:

1. Equate the first sample moment about the origin, $M_1$, to the first theoretical moment $E(X)$.
2. Equate the second sample moment about the origin, $M_2$, to the second theoretical moment $E(X^2)$.
3. Continue equating the sample moments about the origin, $M_k$, with the corresponding theoretical moments $E(X^k)$, $k = 3, 4, \ldots$, until there are as many equations as parameters.
4. Solve for the parameters. The resulting values are called the method of moments estimators.

The sample moments converge in probability to the corresponding distribution moments; thus corresponding moments should be approximately equal.
Example. Consider a random variable $X$ following some distribution; the $k$th moment of the distribution is defined as
$$M_k = E(X^k). \qquad (2.4)$$
Thus $M_1 = E(X)$ and $M_2 = E(X^2) = \mathrm{Var}(X) + [E(X)]^2$. The sample moments of i.i.d observations $X_i$, $i = 1, \ldots, n$, from some distribution are defined as
$$\hat{M}_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k. \qquad (2.5)$$
For example, $\hat{M}_1 = \bar{X}$ is the familiar sample mean, and $\hat{M}_2 = \hat{\sigma}^2 + \bar{X}^2$, where $\hat{\sigma}$ is the standard deviation of the sample. The MME simply equates the moments of the distribution with the sample moments, $M_k = \hat{M}_k$, and solves for the unknown parameters. This implies that the distribution must have finite moments.

Nice properties of the MME:

1. It yields consistent estimators, though they may be biased (Hall, 2005).
2. It is easier to compute (Gibson, 2015) than the MLE (Hall, 2005); the MME can often be computed by hand, whereas other methods may require iterative schemes such as the Newton-Raphson method.

Not so nice properties of the MME:

1. It is not sufficient: a sufficient estimator uses all the data relevant to estimating the parameters of interest, which the MME may not do.
2. In some cases, for large samples, the estimated parameters may fall outside the parameter space (D); thus the estimates may not be reliable (Gibson, 2015).
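As a sketch of this procedure applied to the one-parameter Rayleigh distribution studied later in this thesis: since $E[X] = \sigma\sqrt{\pi/2}$, equating the first sample moment to the first theoretical moment gives $\hat{\sigma} = \bar{X}\sqrt{2/\pi}$ directly. The sample below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_true = 2.5                                   # illustrative value
x = rng.rayleigh(scale=sigma_true, size=1000)      # illustrative sample

# Rayleigh first moment: E[X] = sigma * sqrt(pi / 2); equating M1-hat
# to E[X] (step 1 of the procedure) yields the MME in closed form.
sigma_mme = x.mean() * np.sqrt(2.0 / np.pi)
print(sigma_true, sigma_mme)
```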
2.5 SOME PROPERTIES OF GOOD ESTIMATORS
An estimator $g(X)$ of a random variable $X$ is itself random, because it is computed from a random sample (Frühwirth-Schnatter, 2006) and (Dempster et al., 1977). The sampling distribution of an estimator indicates how the estimates would vary over a large number of independent experiments. The estimated parameter(s) of a sample should be close to the actual (exact) value (Frühwirth-Schnatter, 2006). The closeness of the estimated value to the true value is examined through the expectation: if the expectation of the estimator is close to the actual value, this is a favorable property for the estimator. Naturally, this leads to the concept of bias (Spokoiny & Dickhaus, 2015).
2.5.1 Bias of an Estimator
The bias of an estimator arises from the difference between the expected value of the estimator and the true value of the parameter (see Section 2.5); that is to say, for an estimator $\hat{\theta} = g(X)$, where $g$ is computed from the random sample $X = (x_1, x_2, \ldots, x_n)$ (Spokoiny & Dickhaus, 2015) and (McLachlan & Krishnan, 2008), the bias of the estimator is given by
$$\mathrm{Bias}\big[g(X)\big] = E\big[g(X)\big] - \theta \qquad (2.6)$$
where $E$ denotes the expectation. The bias measures the difference between the expectation of the estimator and the value it is meant to estimate, in this case $\theta$.

Definition 2.3 (Bias). If $E[g(X_i)] = \theta$ holds, then the statistic $g(X)$ is an unbiased estimator of the parameter $\theta$. Otherwise, $g(X)$ is a biased estimator of $\theta$.

From Definition 2.3, $\mathrm{Bias}(g) = 0$ means that the expectation of $g$ is precisely $\theta$, and $g$ is unbiased; that is to say, an estimator with zero bias is called unbiased. Just because an estimator is unbiased does not mean it is useful. Other characteristics also have to be considered in order to make the estimation of a pdf useful (Frühwirth-Schnatter, 2006) and (Mclachlan & Peel, 2000).
2.5.2 Mean Squared Error and Efficiency of Estimators
For an estimator to be useful, the random error
$$g(X) - \theta$$
has to be considered (Pardoe, 2012). For a "good" estimator this random error should be small on average. One could quantify the accuracy of the estimator $g(X)$ by
$$E\big[\,|g(X) - \theta|\,\big],$$
but the modulus function is not particularly nice to work with when computing expectations. A better measure is the mean squared error (MSE) of $g(X)$, given by
$$\mathrm{MSE}\big[g(X)\big] = E\big[(g(X) - \theta)^2\big]. \qquad (2.7)$$
If $g(X)$ is unbiased, then $E[g(X)] = \theta$, so that
$$\mathrm{MSE}\big[g(X)\big] = \mathrm{Var}\big[g(X)\big]; \qquad (2.8)$$
in general,
$$\mathrm{MSE}\big[g(X)\big] = \mathrm{Bias}^2\big[g(X)\big] + \mathrm{Var}\big[g(X)\big]. \qquad (2.9)$$
The quality and efficiency of an estimator are judged by the MSE. An estimator is efficient if it has the lowest possible variance among all unbiased estimators (Pardoe, 2012).
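A small Monte Carlo sketch of the decomposition (2.9), comparing the biased (maximum likelihood) and unbiased estimators of a Gaussian variance; all numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 1.0, 10, 20000                  # illustrative settings
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

for ddof in (0, 1):                               # biased (ML) vs unbiased
    est = samples.var(axis=1, ddof=ddof)          # estimator g(X) of sigma^2
    bias = est.mean() - sigma2
    mse_direct = ((est - sigma2) ** 2).mean()     # Equation (2.7)
    mse_decomp = bias**2 + est.var()              # Equation (2.9)
    print(ddof, bias, mse_direct, mse_decomp)     # the two MSEs agree
```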
2.6 Consistency
An estimator is consistent if the estimate improves as the size of the random sample $X$ grows (Pardoe, 2012) and (Spokoiny & Dickhaus, 2015). Thus a consistent estimator converges to the true parameter value as the sample size increases (Reddy, 2011).

Definition 2.4 (Consistency). For a random sample $X$ with $g(X)$ an estimator of the parameter $\theta$, $g$ is consistent if, for all $\delta > 0$,
$$\lim_{n \to \infty} P\big(|g(X) - \theta| > \delta\big) = 0.$$
Thus, as the sample size $n$ increases, the probability that the modulus of the error exceeds $\delta$ tends to zero (for any $\delta$); the smaller the $\delta$, the larger $n$ needs to be to ensure the probability of an error smaller than $\delta$. The relationship between the MSE and the consistency of an estimator is obtained from Tchebychev's inequality:
$$P\big(|g(X) - \theta| > \delta\big) \le \frac{E\big[(g(X) - \theta)^2\big]}{\delta^2} = \frac{\mathrm{MSE}\big[g(X)\big]}{\delta^2}. \qquad (2.10)$$
From Equation (2.10), the estimator $g(X)$ is consistent if $\mathrm{MSE}[g(X)] \to 0$ as $n \to \infty$, where
$$\mathrm{MSE} = \mathrm{Var} + \mathrm{Bias}^2.$$
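A short simulation of Definition 2.4, taking the sample mean as the estimator of a Gaussian mean (all values illustrative assumptions); the empirical MSE shrinks toward zero as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 5.0                                        # illustrative true mean
for n in (10, 100, 1000, 10000):
    means = rng.normal(theta, 2.0, size=(500, n)).mean(axis=1)
    print(n, ((means - theta) ** 2).mean())        # MSE -> 0 as n -> infinity
```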
2.6.1 The Expectation-Maximization (EM) Algorithm
The expectation-maximization (EM) algorithm was introduced for general latent variable models in the seminal paper of (Dempster et al., 1977), who also mentioned applications to finite mixture models. The use of the EM algorithm for the estimation of mixture models has been studied in detail in (Redner & Walker, 1984); (Meng & VanDyk, 1997) provide a very inspiring general-level tutorial on the EM algorithm in the context of finite mixtures of Poisson distributions, whereas the monograph of (Mclachlan & Peel, 2000) gives full details for a wide range of finite mixture models. To implement the EM algorithm for a finite mixture model, the log of the complete-data likelihood function $p(y, S \mid \vartheta)$ is written as
$$\log p(y, S \mid \vartheta) = \sum_{i=1}^{N}\sum_{k=1}^{K} D_{ik} \log\big[\eta_k\, p(y_i \mid \theta_k)\big], \qquad (2.11)$$
where $D_{ik}$ is a 0/1 coding of the allocations $S_i$: $D_{ik} = 1$ iff $S_i = k$. Starting from $\hat{\vartheta}^{(0)}$, the EM algorithm iterates between two steps: an E-step, where the conditional expectation of $\log p(y, S \mid \vartheta)$, given the current data and the current parameter, is computed; and an M-step, in which the parameters that maximize the expected complete-data log likelihood function obtained from the E-step are determined. Under fairly mild regularity conditions, the EM algorithm converges to a local maximum of the mixture likelihood function (Dempster et al., 1977), (Wu, 1983). For mixture models, the E-step leads, for $m \ge 1$, to the following estimator of $D_{ik}$:
$$\hat{D}_{ik}^{(m)} = \frac{\hat{\eta}_k^{(m-1)}\, p\big(y_i \mid \hat{\theta}_k^{(m-1)}\big)}{\sum_{j=1}^{K} \hat{\eta}_j^{(m-1)}\, p\big(y_i \mid \hat{\theta}_j^{(m-1)}\big)}, \qquad (2.12)$$
and the M-step involves maximizing
$$\sum_{i=1}^{N}\sum_{k=1}^{K} \hat{D}_{ik}^{(m)} \log\big[\eta_k\, p(y_i \mid \theta_k)\big] \qquad (2.13)$$
with respect to all unknown components in $\vartheta = (\theta_1, \ldots, \theta_K, \eta)$, leading to the new estimate $\hat{\vartheta}^{(m)}$. It is easy to verify that for an arbitrary mixture
$$\hat{\eta}_k^{(m)} = \frac{n_k}{N}, \qquad n_k = \sum_{i=1}^{N} \hat{D}_{ik}^{(m)}, \qquad (2.14)$$
whereas the estimator of the component parameters $\theta_k$ of course depends on the distribution family underlying the mixture. For mixtures of Poisson distributions, for instance, the estimator of the component mean $\mu_k$ reads
$$\hat{\mu}_k^{(m)} = \frac{1}{n_k}\sum_{i=1}^{N} \hat{D}_{ik}^{(m)}\, y_i. \qquad (2.15)$$
A disadvantage of the EM algorithm compared to direct maximization of the likelihood function is its much slower convergence. Following (Redner & Walker, 1984), who recommended combining the EM algorithm with Newton's method, several authors have used hybrid algorithms for mixture estimation; see (Aitkin & Aitkin, 1996) for a review.
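A compact sketch of Equations (2.12), (2.14) and (2.15) for a two-component Poisson mixture; the component means, weights and sample sizes below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
# simulated two-component Poisson mixture (illustrative means and sizes)
y = np.concatenate([rng.poisson(2.0, 300), rng.poisson(9.0, 200)])

eta = np.array([0.5, 0.5])    # starting weights eta_k
mu = np.array([1.0, 10.0])    # starting component means mu_k
for _ in range(200):
    # E-step: allocation probabilities D_ik, Equation (2.12)
    dens = eta * poisson.pmf(y[:, None], mu[None, :])
    D = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weights, Equation (2.14), and Poisson means, Equation (2.15)
    n_k = D.sum(axis=0)
    eta = n_k / len(y)
    mu = (D * y[:, None]).sum(axis=0) / n_k
print(eta, mu)
```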
2.7 PROPOSED METHODS OF ESTIMATION
In this work, the estimation of the parameters of different distributions, namely the Gaussian and the Rayleigh (one- and two-parameter) distributions, is considered. The proposed method is achieved by converting the pdf underlying the random sample X into its even and odd components. The estimation of the sufficient parameters of the distribution is then done via a series expansion (the differential method) for the converted data. This conversion enlarges the parameter space by introducing new parameters that are estimated together with the sufficient parameters of the distribution of X. The estimates of the parameters are then computed using the standard method of least squares. The Rayleigh distribution parameter estimation is done by increasing the parameter-space dimension (D) of the original distribution and minimizing a goal function, using the differential method. An optimal differential method is developed to estimate the parameters of a two-parameter Rayleigh distribution.

A multivariate Gaussian distribution is also investigated and its respective parameters estimated. The method for estimating the parameters of a multivariate Gaussian distribution is called the principle of n-cross sections. This is achieved by deriving hyperplanes that intersect the multivariate Gaussian distribution at different angles. The dimension of the multivariate distribution is reduced so that fewer parameters have to be solved for than in the original multivariate Gaussian distribution. That is to say, from a two-dimensional problem with six unknown parameters, the distribution is reduced to a one-dimensional one with only three undetermined parameters, which thus requires three or more cross sections (hyperplanes) to solve for the required parameters. This is in contrast with the six parameters that require six or more equations generated from a corresponding number of cross sections (hyperplanes). A specialized mixture of two Gaussian distributions is considered, and three algorithms (STG1, STG2 and STG3) are proposed. These algorithms are not iterative and require no initialization. The proposed algorithms present a new mathematical approach to approximating the parameters of the mixture of two Gaussian distributions.
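A minimal numerical sketch of the cross-section idea, assuming a bivariate Gaussian with diagonal covariance and illustrative parameter values: slicing the density along a hyperplane y = c yields a scaled one-dimensional Gaussian in x, which is what makes the reduction to univariate estimation possible.

```python
import numpy as np

# bivariate Gaussian density with diagonal covariance (illustrative values)
mu = np.array([1.0, -0.5])
s = np.array([1.5, 0.8])
A = 1.0 / (2.0 * np.pi * s[0] * s[1])

def f(x, y):
    return A * np.exp(-(x - mu[0])**2 / (2*s[0]**2) - (y - mu[1])**2 / (2*s[1]**2))

# cross section along the hyperplane y = c: a scaled 1-D Gaussian in x
c = 0.3
x = np.linspace(-4.0, 6.0, 7)
scale = A * np.exp(-(c - mu[1])**2 / (2*s[1]**2))
gauss_1d = scale * np.exp(-(x - mu[0])**2 / (2*s[0]**2))
print(np.allclose(f(x, c), gauss_1d))   # True: each slice is univariate Gaussian
```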
2.8 CONCLUSION
Parameter estimation theory is an important area of study, and thus there is a need for more investigations. In this study, emphasis has been put on the frequentist approach to parameter estimation due to its tractability and scalability. Extensive numerical simulations have also been presented to test the viability of the proposed methods.
CHAPTER 3. RECOGNITION OF A MIXTURE OF MULTIPLE GAUSSIAN PATTERNS
3.1 OVERVIEW
In this Chapter a novel method for the recognition of multiple Gaussian patterns, by estimating their sufficient parameters, is proposed. Regular methods of FMM identification require initial guess values (IGVs) that may result in high computation time, slow convergence, or even failure to converge if the provided IGVs are far from the optimal solution (Hastie & Tibshirani, 1996), (Duda et al., 2012), and (Figueiredo et al., 2002). The FMM is firstly decomposed into its even and odd parts, which are linearized through differential techniques. Secondly, the ordinary least squares (OLS) method is employed to estimate the unknown parameters in the linearized models. A Monte Carlo simulation is done to evaluate the performance of the proposed method, recognition of multiple Gaussian patterns (RoMGP). It is shown that the numerical results of the PM compare well with the simulated values.
3.2 INTRODUCTION
Data measurements of real-world phenomena are modeled by mixtures of continuous density functions (Mclachlan & Peel, 2000); such data may be represented by finite mixture models (FMMs) (Van Dijk, 2009), that is, mixtures of Gaussian distributions, and these mixture models are capable of approximating any arbitrary distribution (Mclachlan & Peel, 2000). The FMM provides a natural representation of the heterogeneity of a finite number of latent classes; it concerns modeling a statistical distribution by a mixture (or weighted sum) of other distributions, such as Gaussian distributions. An FMM provides a parametric alternative that describes the unknown distribution in terms of a mixture of known distributions (Kessler & McDowell, 2012). FMMs are also known as latent class models or unsupervised learning models (Mclachlan & Peel, 2000), (Mclachlan & Basford, 1988), and (Hastie et al., 2001). A mixture model is usually probabilistic in nature and encompasses finite or infinite sub-populations referred to as components. FMMs are closely related to intrinsic classification models, clustering (Jain & Dubes, 1988) and numerical taxonomy. Finite mixture models have a wide range of applications in the fields of signal processing (Lindsay, 1995), image processing (Hastie et al., 2001), (Jain et al., 2000), (Liang et al., 1992b), and (Liang et al., 1992a), pattern recognition (Hastie & Tibshirani, 1996), (Jain et al., 2000), machine learning, such as modeling, clustering and classification (Filho, 2007), (Hastie et al., 2001), and (Hastie & Tibshirani, 1996), and survival analysis. Mixture models of normal densities with common variance in the univariate case may be estimated with a continuous distribution. Thus there is a need to develop a methodology for the recognition of multiple Gaussian patterns by estimating the sufficient parameters of a finite mixture model (FMM); Gaussian mixture models are capable of approximating any arbitrary distribution (Van Dijk, 2009), (Mclachlan & Peel, 2000). In pattern reconstruction, the FMM permits a probabilistic model-based approach for unsupervised learning (Jain et al., 2000). Generally, from a statistical perspective, the FMM is a semi-parametric/non-parametric estimator of the density function (Lindsay, 1995). Experience suggests that usually only a few latent classes are needed to approximate the density function well (Titterington et al., 1985). In practice FMMs are flexible extensions of basic parametric models and can generate skewed distributions from symmetric components. Identification of heterogeneous information presents an enormous challenge and requires highly involved computations, but n-component density functions overcome the estimation problems (Duda et al., 2012).
This chapter is organized as follows. Section 3.3 describes the motivation for the proposed method. In Section 3.4, current methods (the regular guess-value methods) of parameter estimation are investigated, and a theorem is stated for estimating a general n-component mixture model. In Section 3.5, the proposed method is discussed. In Section 3.6, a Monte Carlo simulation is considered to assess the performance of the proposed method. In Section 3.7, the results and discussion of the study are presented, and conclusions are presented in Section 3.8.
3.3 MOTIVATION
The estimation of parameters for finite mixture distributions containing normal distribution components was studied by (Hastie & Tibshirani, 1996). In the work of (Kikawa, 2013) and (Kikawa et al., 2015), a proposed method, recognition of multiple Gaussian patterns (RoMGP), for estimating the parameters of a two-component mixture with equal prior probabilities was considered; the two-component mixture model was presented as
$$f(x, \sigma_1, \sigma_2, \mu_1, \mu_2, \lambda) = \frac{\lambda}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(x - \varepsilon\mu_1)^2}{2\sigma_1^2}} + \frac{1 - \lambda}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{(x - \varepsilon\mu_2)^2}{2\sigma_2^2}}, \qquad (3.1)$$
where $x$ is the study variable, $\sigma_1^2, \sigma_2^2$ are the variances, $\mu_1, \mu_2$ are the means and $\lambda$ is the proportion of each component in the mixture model, for $0 < \lambda < 1$. The general form of Equation (3.1) is given as
$$f(x, \Psi) = A_1 e^{-\alpha_1 (x - \varepsilon\mu_1)^2} + A_2 e^{-\alpha_2 (x - \varepsilon\mu_2)^2}, \qquad (3.2)$$
where $\Psi = (A_1, A_2, \mu_1, \mu_2, \alpha_1, \alpha_2)$.

From Equation (3.2), $\Psi$ is the complete collection of distinct parameters occurring in the mixture model, and $\varepsilon$ is the "small" parameter, which characterizes the smallness of the distance of the two means $\mu_1$ and $\mu_2$ from their absolute mean in the mixture. The goal is now to estimate the parameters in $\Psi$. In the case of an n-component distribution, Equation (3.1) can be written as
$$f(x, \sigma_i, \mu_i, \lambda_i) = \sum_{i=1}^{n} \lambda_i\, \frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}}, \qquad (3.3)$$
where $\sum_{i=1}^{n} \lambda_i = 1$, for $\lambda_i \in [0, 1]$. The general form of Equation (3.3) for an n-component normal mixture is given as
$$f(x, \Psi) = A_1 e^{-\alpha_1 (x - \mu_1)^2} + A_2 e^{-\alpha_2 (x - \mu_2)^2} + \cdots + A_n e^{-\alpha_n (x - \mu_n)^2}, \qquad (3.4)$$
where $\Psi = (A_i, \mu_i, \alpha_i)$ for $i = 1, 2, 3, \cdots, n$.

There exist several methods used to compute FMM parameters, for example expectation maximization (EM), Newton-Gauss, Bayesian, Markov chain Monte Carlo, spectral and graphical methods. However, all these methods use iterative routines (Van Dijk, 2009) and (Duda et al., 2012), where initial guess values must be provided to start the iteration process (Hastie et al., 2001) before computing the required optimal solutions. The challenges with initial-guess-value routines, illustrated numerically below, are:

1. slow convergence (Hastie & Tibshirani, 1996) and (Duda et al., 2012);
2. high computational time (Mclachlan & Peel, 2000) (this is due to the problems associated with mixture models largely relating to making statistical inferences about the properties of the constituent components, provided that the data on the overall mixture is available (Mclachlan & Peel, 2000)); and
3. possible failure to converge if the initialization values are far from the required solution.
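A brief hedged illustration of these challenges, fitting a three-component instance of Equation (3.4) with a standard iterative least-squares routine; the parameter values, noise level and starting points below are illustrative assumptions, not the simulation settings used in this thesis.

```python
import numpy as np
from scipy.optimize import curve_fit

def mix3(x, *p):
    # three-component instance of Equation (3.4); p = (A1..A3, a1..a3, m1..m3)
    A, a, m = np.split(np.asarray(p), 3)
    return (A * np.exp(-a * (x[:, None] - m) ** 2)).sum(axis=1)

rng = np.random.default_rng(0)
x = np.linspace(-4.0, 4.0, 200)
true = [1.0, 0.7, 0.5, 1.0, 2.0, 0.8, -1.5, 0.0, 1.8]      # illustrative values
y = mix3(x, *true) + rng.normal(0.0, 0.01, x.size)

# IGVs near the truth: the iterative routine converges quickly
near = curve_fit(mix3, x, y, p0=[0.9, 0.8, 0.4, 1.2, 1.5, 1.0, -1.0, 0.2, 1.5])[0]
print(np.round(near, 2))
# IGVs far from the truth: slow progress, and possibly no convergence at all
try:
    curve_fit(mix3, x, y, p0=[5, 5, 5, 9, 9, 9, 8, 8, 8], maxfev=2000)
except RuntimeError as err:
    print("far IGVs failed:", err)
```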
3.4 CURRENT METHODS OF SOLUTION
As mentioned in Section 3.3, the existing methods of solution use iteration procedures to determine the optimal solutions. Consider the three-component mixture model given as
$$f(x) = \sum_{i=1}^{n} A_i e^{-\alpha_i (x - \mu_i)^2}, \quad \text{for } n = 3. \qquad (3.5)$$
The function $f(x)$ in Equation (3.5) is decomposed into its even and odd parts
$$f_e(x) = \frac{1}{2}\big[f(x) + f(-x)\big], \qquad (3.6)$$
and
$$f_o(x) = \frac{1}{2}\big[f(x) - f(-x)\big]. \qquad (3.7)$$
Consider the expansion of $f_e(x)$ and $f_o(x)$ in series with respect to $\varepsilon$ (at $\varepsilon = 0$). At first approximation, with error of order one, $O(\varepsilon)$, Equation (3.5) yields
$$f_e(x) \overset{\mathrm{def}}{=} \sum_{i=1}^{n} A_i e^{-\alpha_i x^2} + O(\varepsilon), \quad \text{for } n = 3, \qquad (3.8)$$
$$\frac{f_o(x)}{x} \overset{\mathrm{def}}{=} \sum_{i=1}^{n} B_i e^{-\alpha_i x^2} + O(\varepsilon). \qquad (3.9)$$
From Equation (3.8) the parameters $A_i$ and $\alpha_i$ can be approximated. The odd function of Equation (3.7) was divided by an $x$ term so as to approximate it as an even function like Equation (3.6); in this way the equations from the odd function can be compared with those from the even component. From Equation (3.9) the following parameters can be obtained: $a_i$, where
$$a_i = \frac{B_i}{2\alpha_i A_i}.$$
At second approximation, with error of order two, Equation (3.5) yields
$$f_e(x) \overset{\mathrm{def}}{=} \sum_{i=1}^{n} \big(\tilde{A}_i + \tilde{B}_i x^2\big) e^{-\alpha_i x^2} + O(\varepsilon^2), \qquad (3.10)$$
where $\tilde{A}_i = A_i\big(1 - \alpha_i (\varepsilon\mu_i)^2\big)$ and $\tilde{B}_i = 2 A_i \alpha_i^2 (\varepsilon\mu_i)^2$, and
$$\frac{f_o(x)}{x} \overset{\mathrm{def}}{=} \sum_{i=1}^{n} \big(\tilde{C}_i + \tilde{D}_i x^2\big) e^{-\alpha_i x^2} + O(\varepsilon), \qquad (3.11)$$
where $\tilde{C}_i = 2 A_i \alpha_i \varepsilon\mu_i \big(1 - \alpha_i (\varepsilon\mu_i)^2\big)$ and $\tilde{D}_i = \frac{4}{3} A_i \alpha_i^3 (\varepsilon\mu_i)^3$.

By the change of variables (reparameterization) $Z = x^2$, Equations (3.10) and (3.11) can be written as
$$f_e(x) \overset{\mathrm{def}}{=} \sum_{i=1}^{n} \big(\tilde{A}_i + \tilde{B}_i Z\big) e^{-\alpha_i Z} + O(\varepsilon^2), \qquad (3.12)$$
$$\frac{f_o(x)}{x} \overset{\mathrm{def}}{=} \sum_{i=1}^{n} \big(\tilde{C}_i + \tilde{D}_i Z\big) e^{-\alpha_i Z} + O(\varepsilon). \qquad (3.13)$$
Equation (3.12) represents the solution of an exact linear differential equation with real roots, having a characteristic equation of the form
$$\prod_{i=1}^{n} (\lambda + \alpha_i)^2 = 0. \qquad (3.14)$$
Thus the solutions of Equation (3.14) are $\lambda_{j,k} = -\alpha_i$, where $j = 1, 3, 5, \cdots, n$ and $k = 2, 4, 6, \cdots, n$ and $i = 1, 2, 3, \cdots, n$; a short symbolic check of this fact is given below.
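As a sketch (using sympy, with symbols standing in for any one component), one can verify that each term $(\tilde{A}_i + \tilde{B}_i Z)e^{-\alpha_i Z}$ of Equation (3.12) is annihilated by the operator $(\mathrm{d}/\mathrm{d}Z + \alpha_i)^2$, matching the repeated roots of Equation (3.14):

```python
import sympy as sp

Z, alpha, A, B = sp.symbols('Z alpha A B', positive=True)
y = (A + B * Z) * sp.exp(-alpha * Z)

# apply (d/dZ + alpha) twice: the repeated root lambda = -alpha of (3.14)
L1 = sp.diff(y, Z) + alpha * y
L2 = sp.diff(L1, Z) + alpha * L1
print(sp.simplify(L2))   # 0: (A + B*Z)*exp(-alpha*Z) is annihilated
```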
28
(3.8 & 3.9), the goal function for a three component mixture model is given as N
1X 2 G3 = G3 (Ai , αi ) = {Ai e−αi (xk −µi ) − fk }2 2 k=0
(3.15)
G3 → min, for i = 1, 2, 3.
e1 is given by The approximated goal function of Equation (3.15), G 2 N 1X e 2 −αi x2k e e e e e e e G3 = G3 (Ai , Bi , Ci , αi ) = Ai + Bi xk + Ci xk e − fk , 2 k=0
(3.16)
e3 → min, for i = 1, 2, 3. G
Deriving the partial derivatives of Equation (3.16) with respect to \tilde{A}_i, \tilde{B}_i, \tilde{C}_i and α_i, the following equations are generated:

    \frac{\partial \tilde{G}_3}{\partial \tilde{A}_i} = \sum_{k=0}^{N} e^{-\alpha_i x_k^2}\left(\xi e^{-\alpha_i x_k^2} - f_k\right),    (3.17)

    \frac{\partial \tilde{G}_3}{\partial \tilde{B}_i} = \sum_{k=0}^{N} x_k e^{-\alpha_i x_k^2}\left(\xi e^{-\alpha_i x_k^2} - f_k\right),    (3.18)

    \frac{\partial \tilde{G}_3}{\partial \tilde{C}_i} = \sum_{k=0}^{N} x_k^2 e^{-\alpha_i x_k^2}\left(\xi e^{-\alpha_i x_k^2} - f_k\right),    (3.19)

    \frac{\partial \tilde{G}_3}{\partial \alpha_i} = \sum_{k=0}^{N} x_k^2 e^{-\alpha_i x_k^2}\,\xi\left(f_k - e^{-\alpha_i x_k^2}\,\xi\right),    (3.20)

where ξ = \tilde{A}_i + \tilde{B}_i x_k + \tilde{C}_i x_k^2 for i = 1, 2, 3.
It can be observed that the least-squares normal Equations (3.17, 3.18, 3.19 and 3.20) cannot be solved explicitly to give exact solutions; iterative methods that require IGVs to compute the optimal solutions have to be employed. This justifies the development of the proposed method, which does not require initial guess values in order to compute the optimal solutions.

Theorem 3.1. A generalised n-component Gaussian mixture model,

    f(x) = \sum_{i=1}^{n} A_i e^{-\alpha_i(x-\mu_i)^2},

can be approximated using a series about a predetermined origin.

Proof: Considering the general form of an n-component normal mixture model, Equation (3.4) can be generalised as

    f(x) = \sum_{i=1}^{n} A_i e^{-\alpha_i(x-\mu_i)^2}.    (3.21)
Equation (3.21) can be decomposed into even and odd functions as
fe (x) =
n X
ei e−αi x2 + O(2 x2 ) + O(2 ), A
(3.22)
i=1
ei = Ai e−αi µ2i 2 for i = 1, 2, · · · , n, where A
and
fo (x) =
n X
ei xe−αi x2 + O(2 x3 ) + O(2 x), B
(3.23)
i=1
ei = 2Ai αi µi e−αi µ2i 2 for i = 1, 2, · · · , n, where B
respectively.
The working error of the even and odd parts is of order two, O(\varepsilon^2 x^2) + O(\varepsilon^2); the error indicates the closeness of the means µ_i of the mixture model. Extending the order of the error, say to order three, would only increase the number of parameters to be estimated, thus rendering the formulation less accurate; that is, the even and odd parts with error of order three become Equation (3.24) and Equation (3.25) respectively,

    f_e(x) = \sum_{i=1}^{n} \tilde{A}_i e^{-\alpha_i x^2}\left(1 + 4\alpha_i^2\mu_i^2\varepsilon^2 x^2\right) + O(\varepsilon^4 x^2) + O(\varepsilon^4),    (3.24)

    f_o(x) = \sum_{i=1}^{n} \tilde{B}_i\, x\, e^{-\alpha_i x^2} + O(\varepsilon^4 x^2) + O(\varepsilon^4).    (3.25)
The odd part was divided by an x term so as to approximate it as an even part; in this way the odd part can be estimated in the same manner as the even part:

    \frac{f_o(x)}{x} = \sum_{i=1}^{n} \tilde{B}_i e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2),    (3.26)

where \tilde{B}_i = 2A_i\alpha_i\mu_i\varepsilon\, e^{-\alpha_i\mu_i^2\varepsilon^2} for i = 1, 2, \cdots, n, so that

    \frac{f_o(x)}{x} = \sum_{i=1}^{n} \tilde{B}_i e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2).    (3.27)

Hence,

    f_e(x) \equiv \frac{f_o(x)}{x}.    (3.28)

The parameters of the even function are

    \tilde{A}_i = A_i e^{-\alpha_i\mu_i^2\varepsilon^2}, \quad \text{for } i = 1, 2, \cdots, n.    (3.29)

The parameters of the odd function are

    \tilde{B}_i = 2A_i\alpha_i\mu_i\varepsilon\, e^{-\alpha_i\mu_i^2\varepsilon^2}, \quad \text{for } i = 1, 2, \cdots, n.    (3.30)
Using the data for the n-component mixture model, Equation (3.21), the parameter estimates \tilde{B}_i and \tilde{A}_i, for i = 1, 2, \cdots, n, may be obtained using the ordinary least-squares (OLS) method, via parametric ODEs. In order to simplify the computations, let y = x^2, so that f(x) = f(\sqrt{y}); for ease of presentation this is written as f(x) = f(\sqrt{y}) = \tilde{f}(y). Thus the even part can be represented as

    f_e(x) = \tilde{f}_e(y) = \sum_{i=1}^{n} \tilde{A}_i e^{-\alpha_i y} + O(\varepsilon^2 y) + O(\varepsilon^2).    (3.31)
Similarly the odd part is represented by

    \frac{f_o(x)}{x} = \tilde{f}_o(y) = \sum_{i=1}^{n} \tilde{B}_i e^{-\alpha_i y} + O(\varepsilon^2 y) + O(\varepsilon^2),    (3.32)

where \tilde{B}_i = 2A_i\alpha_i\mu_i\varepsilon\, e^{-\alpha_i\mu_i^2\varepsilon^2} for i = 1, 2, \cdots, n. Let Z_i = e^{-\alpha_i y} for i = 1, 2, \cdots, n; this transforms Equations (3.31 & 3.32) to Equations (3.33 & 3.34) respectively:

    f_e(x) = \tilde{f}_e(y) = \tilde{f}_e(Z) = \sum_{i=1}^{n} \tilde{A}_i Z_i + O(\varepsilon^2 y) + O(\varepsilon^2),    (3.33)

    \frac{f_o(x)}{x} = \tilde{f}_o(y) = \tilde{f}_o(Z) = \sum_{i=1}^{n} \tilde{B}_i Z_i + O(\varepsilon^2 y) + O(\varepsilon^2).    (3.34)
This substitution yields a linear transformation of the model. Equation (3.31) is the solution to an exact linear ordinary differential equation of order n with real roots, Equation (3.35):

    \frac{d^n \tilde{f}_e}{dy^n} + \eta_1\frac{d^{n-1} \tilde{f}_e}{dy^{n-1}} + \eta_2\frac{d^{n-2} \tilde{f}_e}{dy^{n-2}} + \cdots + \eta_{n-1}\frac{d\tilde{f}_e}{dy} + \eta_n\tilde{f}_e = 0.    (3.35)
Using the multiple OLS method, the estimated values of η_i for i = 1, 2, \cdots, n may be obtained from the data set {y_i, \tilde{f}_e(y_i)} for i = 1, 2, 3, \cdots, m. Numerical computations of \frac{d^i \tilde{f}_e}{dy^i} for i = 1, 2, \cdots, n are used to obtain the estimated parameters η_i. The multiple regression is given by Equation (3.36):

    \tilde{Y} = \sum_{i=1}^{n} \tilde{M}_i \tilde{X}_i, \quad \text{where } \tilde{Y} = \frac{d^n \tilde{f}_e}{dy^n},\ \tilde{X}_i = \frac{d^{n-i} \tilde{f}_e}{dy^{n-i}},\ \text{and } \tilde{M}_i = -\eta_i, \text{ for } i = 1, 2, \cdots, n.    (3.36)
The auxiliary equation of the linear ODE, Equation (3.35), is given by

    \gamma^n + \eta_1\gamma^{n-1} + \eta_2\gamma^{n-2} + \cdots + \eta_n = 0.    (3.37)

Considering the linear roots of the auxiliary Equation (3.37), we have

    \gamma^n + \eta_1\gamma^{n-1} + \eta_2\gamma^{n-2} + \cdots + \eta_n = \prod_{i=1}^{n}(\gamma + \tau_i).    (3.38)

Thus solving for the roots of Equation (3.37) yields the parameters of Z_i = e^{-\alpha_i y}:

    \prod_{i=1}^{n}(\gamma + \tau_i) = 0, \quad \text{for } i = 1, 2, \cdots, n.    (3.39)
Given the finite-mixture dataset {x_i, f(x_i)} for i = 1, \ldots, m, a new dataset {y_i = x_i^2, f(y_i) = f(x_i^2)} for i = 1, 2, \cdots, m can be constructed. Equation (3.31) can be solved using the multiple ordinary least-squares (OLS) method to obtain parameter estimates for \tilde{A}_i, for i = 1, 2, \cdots, n, since Y_{ji} = e^{-\alpha_j y_i}, where i = 1, 2, \cdots, n and j = 1, 2, \cdots, m. Substituting the \tilde{A}_i into Equation (3.22), n equations are obtained, but with 2n unknown parameters; at this stage, the parameters cannot be estimated. From Equation (3.22), \tilde{A}_i = A_i e^{-\alpha_i\mu_i^2\varepsilon^2} for i = 1, 2, \cdots, n.

The odd part can be estimated exactly as the even part, since the relation in Equation (3.28) holds; thus the approaches for parameter estimation in both odd and even components are similar:

    \frac{f_o(x)}{x} = \sum_{i=1}^{n} \tilde{B}_i e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2),    (3.40)

where \tilde{B}_i = 2A_i\mu_i\alpha_i\varepsilon\, e^{-\alpha_i\varepsilon^2\mu_i^2}.
The even and odd components can be compared using Equations (3.31 & 3.32). Let y = x^2; then f(x) = f(\sqrt{y}) and \tilde{f}(y) = f(\sqrt{y}), so that

    \tilde{f}(y) = \sum_{i=1}^{n} \tilde{B}_i e^{-\alpha_i y} + O(\varepsilon^2 y) + O(\varepsilon^2).    (3.41)
Consider

    Y_i = e^{-\alpha_i y}, \quad \text{for } i = 1, 2, \cdots, n.    (3.42)

Let \tilde{f}(y) = \tilde{f}(Y) and substitute Equation (3.42) into Equation (3.41) to obtain

    \tilde{f}(Y) = \sum_{i=1}^{n} \tilde{B}_i Y_i.    (3.43)
Considering the dataset for the mixture model {x_i, f(x_i)}, and using multiple OLS on this dataset for the mixture model Equation (3.43), the parameter estimates of \tilde{B}_i for i = 1, 2, \cdots, n are obtained. Thus the parameters of the mixed Gaussian are recovered as

    A_i = \frac{\tilde{B}_i}{2\mu_i\alpha_i\varepsilon\, e^{-\alpha_i\varepsilon^2\mu_i^2}}, \qquad \alpha_i = -\frac{1}{x^2}\ln|Z_i|, \qquad \mu_i = \frac{\tilde{B}_i}{2\tilde{A}_i\alpha_i\varepsilon}, \quad \text{for } \varepsilon \neq 0.

The function now has n equations with n unknowns. Combining the even and odd functions, Equations (3.36 & 3.43), and estimating the general unknown parameters A_i, µ_i of the model Equation (3.21), the sufficient parameters of the mixture model can be obtained, that is, a system of n equations in n unknowns. ∎
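The steps of Equations (3.35)-(3.39) admit a direct numerical realisation. The following is a hedged sketch in Python (the thesis' computations were done in Mathematica, and the function names estimate_alphas and estimate_amplitudes are hypothetical): the ODE coefficients η_i are found by OLS on numerical derivatives of the even part, the α_i are recovered as the negated roots of the auxiliary polynomial, and a second OLS step then yields the amplitudes.

```python
# Sketch: recover alpha_i and amplitudes of f(y) = sum_i A_i exp(-alpha_i y)
# via the ODE coefficients of Eq. (3.36) and the roots of Eq. (3.37).
import numpy as np

def estimate_alphas(y, fe, n=3):
    """OLS fit of d^n f/dy^n = sum_i M_i d^{n-i} f/dy^{n-i}, with M_i = -eta_i."""
    derivs = [np.asarray(fe, dtype=float)]
    for _ in range(n):
        derivs.append(np.gradient(derivs[-1], y))    # numerical derivatives
    Y = derivs[n]                                    # highest derivative
    X = np.column_stack(derivs[:n][::-1])            # d^{n-1}f, ..., f
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    eta = -M
    # Roots of gamma^n + eta_1 gamma^{n-1} + ... + eta_n = 0 are gamma = -alpha_i
    roots = np.roots(np.concatenate(([1.0], eta)))
    return -np.real_if_close(roots)                  # assumes real, distinct rates

def estimate_amplitudes(y, f, alphas):
    """Second OLS step: fit f(y) = sum_i c_i exp(-alpha_i y)."""
    Z = np.exp(-np.outer(y, alphas))                 # design matrix Z_i = e^{-alpha_i y}
    c, *_ = np.linalg.lstsq(Z, f, rcond=None)
    return c
```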
3.5 PROPOSED METHOD FOR GAUSSIAN MIXTURES

In this section Theorem 3.1 is applied to a three-component Gaussian mixture model, in order to estimate the parameters of the model. The proposed method is called the recognition of multiple Gaussian patterns (RoMGP) estimation method. The 3-component univariate normal density mixture is considered and can be formulated as follows:
    f(x, Ψ) = f(x, \sigma_1, \sigma_2, \sigma_3, \mu_1, \mu_2, \mu_3, \lambda, \varepsilon), \quad \text{where } Ψ = (\sigma_i, \mu_i, \varepsilon) \text{ for } i = 1, 2, 3.    (3.44)
A complete collection of distinct parameters for the mixture model is given by Ψ, and ε is the “small” distance of the means µ_i for i = 1, 2, 3. Thus Equation (3.44) can be written as
    f(x, Ψ) = \sum_{i=1}^{3} A_i e^{-\alpha_i(x-\varepsilon\mu_i)^2}, \quad \text{where } A_i = \frac{\lambda_i}{\sigma_i\sqrt{2\pi}}, \ \sum_{i=1}^{3}\lambda_i = 1, \ \lambda_i \in [0,1],    (3.45)

for

    f_e(x) = \sum_{i=1}^{3} \tilde{A}_i e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2),    (3.46)

where \tilde{A}_i = A_i e^{-\alpha_i\varepsilon^2\mu_i^2}, and

    f_o(x) = \sum_{i=1}^{3} \tilde{B}_i\, x\, e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2),    (3.47)

where \tilde{B}_i = 2A_i\mu_i\alpha_i\varepsilon\, e^{-\alpha_i\varepsilon^2\mu_i^2}.
Equation (3.47) may be estimated in the same way as the even components:

    \frac{f_o(x)}{x} = \sum_{i=1}^{3} \tilde{B}_i e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2),    (3.48)

where \tilde{B}_i = 2A_i\mu_i\alpha_i\varepsilon\, e^{-\alpha_i\varepsilon^2\mu_i^2}.
The parameters α_i for i = 1, 2, 3 in Equation (3.46) and Equation (3.48) can now be estimated using the ordinary least-squares (OLS) method. Using the change of variables y = x^2, Equation (3.46) yields the transformed even component:

    f(x) = f(\sqrt{y}) = \tilde{f}(y),    (3.49)

    f_e(x) = \tilde{f}_e(y),    (3.50)

    \tilde{f}_e(y) = \sum_{i=1}^{3} \tilde{A}_i e^{-\alpha_i y} + O(\varepsilon^2 y) + O(\varepsilon^2).    (3.51)
The transformation of the odd and even parts of the three-component FMM is given as

    f_e(x) = \tilde{f}_e(y) = \tilde{f}_e(Z) = \sum_{i=1}^{3} \tilde{A}_i Z_i,    (3.52)

    \frac{f_o(x)}{x} = \frac{\tilde{f}_o(y)}{\sqrt{y}} = \frac{\tilde{f}_o(Z)}{\sqrt{y}} = \sum_{i=1}^{3} \tilde{B}_i Y_i,    (3.53)

where Y_i = e^{-\alpha_i y} for i = 1, 2, 3.
Ordinary least squares (OLS) is used to estimate the parameters \tilde{A}_i and \tilde{B}_i for i = 1, 2, 3 from the data set {x_i, f(x_i)} for i = 1, \cdots, m. Equations (3.52 & 3.53) are solutions to the homogeneous linear ordinary differential equations (ODEs) with constant coefficients, Equations (3.54 & 3.55):

    \frac{d^3\tilde{f}_e}{dy^3} + \eta_1\frac{d^2\tilde{f}_e}{dy^2} + \eta_2\frac{d\tilde{f}_e}{dy} + \eta_3\tilde{f}_e = 0,    (3.54)

    \frac{d^3\tilde{f}_o}{dy^3} + \eta_1\frac{d^2\tilde{f}_o}{dy^2} + \eta_2\frac{d\tilde{f}_o}{dy} + \eta_3\tilde{f}_o = 0.    (3.55)
Using the multiple OLS method, the estimated values of η_i for i = 1, 2, 3 may be obtained from the data set {y_i, \tilde{f}_e(y_i)} for i = 1, 2, 3, \cdots, m. The solutions η_i yield the values of α_i through the characteristic equation

    \prod_{i=1}^{3}(\gamma + \tau_i) = \gamma^3 + \eta_1\gamma^2 + \eta_2\gamma + \eta_3,    (3.56)

with \prod_{i=1}^{3}(\gamma + \tau_i) = (\gamma+\tau_1)(\gamma+\tau_2)(\gamma+\tau_3), so that

    \eta_1 = \tau_1 + \tau_2 + \tau_3,    (3.57)

    \eta_2 = \tau_1\tau_2 + \tau_1\tau_3 + \tau_2\tau_3,    (3.58)

    \eta_3 = \tau_1\tau_2\tau_3.    (3.59)
Thus solving Equations (3.57, 3.58 & 3.59) yields

    \tau_1 = \alpha_1,    (3.60)

    \tau_2 = \alpha_2,    (3.61)

    \tau_3 = \alpha_3.    (3.62)
From the finite-mixture dataset {x_i, f(x_i)} for i = 1, \ldots, m, a new dataset {y_i = x_i^2, f(y_i) = f(x_i^2)} for i = 1, \ldots, m can be constructed. Equation (3.52) can also be solved using the multiple ordinary least-squares (OLS) method to obtain parameter estimates for \tilde{A}_1, \tilde{A}_2 and \tilde{A}_3, since Y_{1i} = e^{-\alpha_1 y_i}, Y_{2i} = e^{-\alpha_2 y_i} and Y_{3i} = e^{-\alpha_3 y_i}. Substituting \tilde{A}_1, \tilde{A}_2 and \tilde{A}_3 into Equation (3.46), three equations are obtained with six unknown parameters; at this stage, the parameters cannot be estimated from Equation (3.46). The odd part of the finite mixture model is estimated in the same way, Equation (3.53); since the even part is similar to the odd part, Equation (3.28), the approaches of estimation are similar:

    \frac{f_o(x)}{x} = \sum_{i=1}^{3} \tilde{B}_i e^{-\alpha_i x^2} + O(\varepsilon^2 x^2) + O(\varepsilon^2),    (3.63)

where \tilde{B}_i = 2A_i\mu_i\alpha_i\varepsilon\, e^{-\alpha_i\varepsilon^2\mu_i^2}.
The even and odd components can be compared using Equations (3.49 & 3.64). Let y = x^2; then f(x) = f(\sqrt{y}) and \tilde{f}(y) = f(\sqrt{y}), so that

    \tilde{f}(y) = \sum_{i=1}^{3} \tilde{B}_i e^{-\alpha_i y} + O(\varepsilon^2 y) + O(\varepsilon^2).    (3.64)

Consider

    Y_i = e^{-\alpha_i y}, \quad \text{for } i = 1, 2, 3.    (3.65)

Let \tilde{f}(y) = \tilde{f}(Y) and substitute Equation (3.65) into Equation (3.64) to obtain

    \tilde{f}(Y) = \sum_{i=1}^{3} \tilde{B}_i Y_i.    (3.66)
Considering the data set for the mixture model {x_i, f(x_i)}, and using multiple OLS on this dataset for the mixture model Equation (3.66), the parameter estimates of \tilde{B}_i for i = 1, 2, 3 are obtained. The odd function has three equations with six unknowns. Combining the even and odd functions, Equations (3.52 & 3.53), and estimating the general unknown parameters A_i, µ_i of model (3.68), the sufficient parameters of the mixture model can be obtained, i.e. a system of six equations in six unknowns. The estimated parameters \tilde{A}_i and µ̃_i, for i = 1, 2, 3, can also be used as the IGVs for the unknown parameters in iteration methods for the three-component Gaussian mixture model.
3.6 A MONTE CARLO SIMULATION STUDY

3.6.1 Three Component Mixture Model
The three-component mixture model is represented by

    f(x, \sigma_i, \mu_i, \lambda_i) = \sum_{i=1}^{3} \lambda_i \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left(\frac{-(x-\mu_i)^2}{2\sigma_i^2}\right), \quad \text{where } \sum_{i=1}^{3}\lambda_i = 1 \text{ and } \lambda_i \in [0,1].    (3.67)
From Equation (3.67), σ_i is the standard deviation, µ_i is the mean, and λ_i is the scaling factor. In the form of Equation (3.4),

    f(x) = \sum_{i=1}^{3} A_i e^{-\alpha_i(x-\varepsilon\mu_i)^2},    (3.68)

where A_i = \frac{\lambda_i}{\sigma_i\sqrt{2\pi}}.
Consider a simulation of a three-component mixture model, Equation (3.68), with parameters ε = 0.1, A_1 = 1, A_2 = 1.5, A_3 = 2, α_1 = 3, α_2 = 2.5, α_3 = 3, λ_1 = 1.023327, λ_2 = 1.681497, λ_3 = 2.046653, µ_1 = 0.1, µ_2 = 1.3, µ_3 = 1, σ_1 = 0.408248, σ_2 = 0.447214, σ_3 = 0.408248. This is presented as

    f(x) = e^{-3(x-(0.1)(0.1))^2} + 1.5\, e^{-2.5(x-(0.1)(1.3))^2} + 2\, e^{-3(x-(0.1)(1))^2}.    (3.69)
Figure 3.1: Three Component Mixture Model
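As an illustration of how the RoMGP steps apply to this simulation, the sketch below evaluates Equation (3.69) on a grid and feeds its even and odd parts into the helper routines sketched in Section 3.4 (estimate_alphas and estimate_amplitudes; hypothetical names, with the thesis' own computations done in Mathematica). Note that two of the simulated decay rates coincide (α_1 = α_3 = 3), so the numerical-differentiation step is ill-conditioned here; the sketch shows the workflow rather than reproducing the tabulated accuracy.

```python
# Sketch: apply the even/odd decomposition and the OLS-ODE steps to Eq. (3.69).
import numpy as np

def f(t):
    # Eq. (3.69), the simulated three-component mixture (eps = 0.1)
    return (np.exp(-3.0 * (t - 0.01)**2)
            + 1.5 * np.exp(-2.5 * (t - 0.13)**2)
            + 2.0 * np.exp(-3.0 * (t - 0.10)**2))

eps = 0.1
x = np.linspace(0.0, 3.0, 301)[1:]           # avoid x = 0 for the odd part
fe = 0.5 * (f(x) + f(-x))                    # even part, Eq. (3.6)
fo_over_x = 0.5 * (f(x) - f(-x)) / x         # odd part divided by x, Eq. (3.7)
y = x**2                                     # change of variables, Eq. (3.49)

alphas = estimate_alphas(y, fe, n=3)         # decay rates alpha_i
A_t = estimate_amplitudes(y, fe, alphas)     # A~_i from the even part
B_t = estimate_amplitudes(y, fo_over_x, alphas)   # B~_i from the odd part
mu_hat = B_t / (2 * A_t * alphas * eps)      # mu_i = B~_i / (2 A~_i alpha_i eps)
```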
From Figure 3.1, the three-component mixture model may appear as one continuous Gaussian distribution. The main idea is to estimate the sufficient parameters that represent the three-component mixture model by increasing the parameter dimensions of the original mixture model on subsequent estimation steps (see Section 3.5).
3.7 RESULTS AND DISCUSSION
Tables 3.1 & 3.2 present the results of the proposed method for a three-component mixture model. In Table 3.1, the parameters of the three-component mixture model are presented, and in Table 3.2, the sufficient parameters of the three-component mixture model are presented, which are the main focus of the estimation problem. The exact values represent the Monte Carlo simulation of the model and the estimated values represent the values obtained using the proposed method.
Table 3.1: Parameter values for the Three Mixture Model

Parameter    Exact     Estimated
ε            0.1       0.10
A1           1.0       0.99
A2           1.5       1.31
A3           2.0       1.87
α1           3.0       2.71
α2           2.5       2.47
α3           3.0       3.01
λ1           1.023     1.0622
λ2           1.682     1.4780
λ3           2.047     1.9107
Table 3.2: Sufficient Parameter values for the Three Mixture Model

Parameter    Exact     Estimated
µ1           0.1       0.06
µ2           1.3       1.30
µ3           1.0       0.87
σ1           0.408     0.4293
σ2           0.447     0.4501
σ3           0.408     0.4074
Considering the results in Section 3.7: according to Table 3.1, the values of the exact simulated three-component mixture model can be estimated using the proposed RoMGP method (Theorem 3.1). The sufficient parameters of the three-component mixture model can also be computed using the developed method; in Table 3.2, the sufficient parameters are accurately estimated with minimal errors. The proposed method thus estimates the sufficient parameters for the mixture model.
3.8 CONCLUSION
In this Chapter the problem of estimation of the finite mixture model was considered. The problem was solved using the proposed method (RoMGP), which does not require initialization or iterative methods for estimating the sufficient parameters. It was also shown in the Monte Carlo simulation how the proposed method can be applied to a FMM and generalized for an n-component FMM. All simulations and computations were performed in Mathematica®. Considering the estimates of the proposed method, it is reasonable to conclude that the RoMGP method produced consistent results. The study also indicates that

1. the proposed method (RoMGP) can be used symbiotically with the regular methods to compute IGVs; and

2. it can be used to estimate a general n-component Gaussian model.

Nevertheless, it has to be noted that the results of the regular guess-value methods largely depend on the initial approximations. Hence, the RoMGP method can be used as an algorithmic approach to computing initial approximations for the iteration methods.
CHAPTER 4. ESTIMATION OF THE RAYLEIGH DISTRIBUTION PARAMETER
4.1 OVERVIEW

This Chapter proposes an approach for estimating the scale parameter of a Rayleigh distribution; the technique is to minimize a goal function using a differential method. The proposed method estimates the scale parameter by increasing the parameter dimensional space (D) of the original function. Three datasets, one simulated and two real, are used to assess the performance of the proposed difference least-squares method (DLSM). Graphical presentations are also used to compare the DLSM and the maximum likelihood method (MLM) on real data. It is shown that the proposed DLSM works well when the sample size is n ≥ 15. Since the DLSM uses non-trivial assumptions on the data, it is recommended as a substitute for the current approaches used in estimating the scale parameter of a Rayleigh distribution.
4.2 INTRODUCTION
The usual assumption in Rayleigh distribution and regression analysis is that the function of the predictor variables is the scale parameter and a constant (Chansoo & Keunhee, 2009), (Akhter & Hirai, 2009) . In probability theory and statistics, the Rayleigh distribution is a continuous probability distribution for positive valued random variables (Siddiqui, 1961).
The Rayleigh distribution has a number of applications in settings where magnitudes of normal variables are important. An example for the application of Rayleigh distribution is the analysis of wind velocity into its orthogonal two-dimensional vector components. Assuming that each component is uncorrelated, normally distributed with equal variance, σ 2 , and zero mean, µ = 0, then the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. The estimation of the scale parameter in Rayleigh distributions can be found in several areas of applications, that is, magnetic resonance imaging (MRI) (Aja-Fernandez et al., 2008), Radar (Akhter & Hirai, 2009), Lifetime of an object, reliability analysis, service times, communication theory and engineering (Akhter & Hirai, 2009), (Dyer & Whisenand, 1973), & (Merovci & Elbata, 2015). The Rayleigh distribution can also be used to model scattered signals that reach a receiver by multiple paths. The background data for many applications that is, MRI is Rayleigh distributed. Hence, the noise variance for the MRI data may be estimated using background data (Aja-Fernandez et al., 2008).
This chapter is organized as follows. Section 4.3, describes the motivation of the proposed method. In Section 4.4, different methods of parameter estimation for the Rayleigh distribution are presented and the proposed differential method is discussed. In Section 4.5, a Monte Carlo simulation and real data are considered to assess the performance of the proposed method. In Section 4.6, results and discussions of the study are presented and conclusions presented in Section 4.7.
4.3 MOTIVATION
The estimation of parameters for a one-parameter Rayleigh distribution is a problem of continuous probability distributions which usually arises when a two-dimensional vector has its two orthogonal components normally and independently distributed; the Euclidean norm of the vector will then have a Rayleigh distribution. The one-parameter Rayleigh probability density function (pdf) is represented by

    f(x, \sigma) = \frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}, \quad \text{for } x \ge 0.    (4.1)
The scale parameter of the distribution is σ. The cumulative distribution function (CDF) on the support of X is given by Equation (4.2):

    F(x) = P(X \le x) = 1 - e^{-\frac{x^2}{2\sigma^2}}, \quad \text{for } x \in [0, \infty).    (4.2)

From Equation (4.1), by letting y = \frac{x}{\sigma}, the Rayleigh pdf is given by Equation (4.3):

    f(y) = \frac{y}{\sigma}\, e^{-\frac{y^2}{2}}, \quad \text{for } y \ge 0.    (4.3)
Although extensive work has been done in the estimation of the one-parameter Rayleigh distribution (Dey et al., 2014), (Akhter & Hirai, 2009), (Kundu & Raqab, 2005), (Merovci & Elbata, 2015), (Siddiqui, 1961), the differential method has not been considered.
4.4 PARAMETER ESTIMATION FOR RAYLEIGH DISTRIBUTIONS
There exist different methods of estimating parameters for data that are assumed to follow a Rayleigh distribution. Some of these methods are maximum likelihood estimation (Chansoo & Keunhee, 2009), (Akhter & Hirai, 2009), (Dey et al., 2014), (Merovci & Elbata, 2015), (Dyer & Whisenand, 1973), (Kundu & Raqab, 2005), (Soliman, 2005), method of moments estimators (Merovci & Elbata, 2015), (Dey et al., 2014), modified moment estimators (Kundu & Raqab, 2005), the local frequency ratio method (Moya et al., 2005), the L-moment estimator (Kundu & Raqab, 2005), (Dey et al., 2014), least-squares estimators (Kundu & Raqab, 2005), (Dey et al., 2014), weighted least-squares estimators (Merovci & Elbata, 2015), (Kundu & Raqab, 2005), percentile-based estimators, Bayes estimators (Chansoo & Keunhee, 2009), (Diebolt & Robert, 1994), (Dey et al., 2014), (Soliman, 2005), and simulation-consistent estimators. In this section, the maximum likelihood estimator (MLE), the frequency ratio estimator and the proposed difference least-squares method (DLSM) are discussed.
4.4.1 Maximum Likelihood Estimator
Consider a random sample x_i, i = 1, \ldots, n, of n observations from a Rayleigh population with a pdf of the form of Equation (4.1). The likelihood function of this sample is given by

    L = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{x_i}{\sigma^2}\, e^{-\frac{x_i^2}{2\sigma^2}} = \left(\prod_{i=1}^{n} x_i\right) \frac{1}{\sigma^{2n}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i}{\sigma}\right)^2}.    (4.4)
The log-likelihood function l(x) = l of Equation (4.4) is given by

    l = \ln L = \ln\prod_{i=1}^{n} x_i - 2n\ln\sigma - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i}{\sigma}\right)^2.    (4.5)
The likelihood normal equation is given by

    \frac{\partial}{\partial\sigma}\{\ln L\} = 0,    (4.6)

    \frac{\partial l}{\partial\sigma} = -\frac{2n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n} x_i^2 = 0.    (4.7)

The parameter σ can be approximated by σ̂ using

    \frac{2n}{\sigma} = \frac{1}{\sigma^3}\sum_{i=1}^{n} x_i^2 \;\Rightarrow\; \hat{\sigma} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} x_i^2}.    (4.8)
The estimator σ̂ is biased; the unbiased maximum likelihood estimator is obtained by correcting σ̂ using

    \tilde{\sigma} = \frac{\hat{\sigma}\,\Gamma(n)\sqrt{n}}{\Gamma\left(n+\frac{1}{2}\right)} = \frac{\hat{\sigma}\, 4^n\, n!\,(n-1)!\,\sqrt{n}}{(2n)!\,\sqrt{\pi}}.    (4.9)
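The estimators in Equations (4.8) and (4.9) are straightforward to compute; the following is a small illustrative sketch (Python, with a log-gamma formulation assumed for numerical stability of the correction factor; the thesis' own computations were done in Mathematica):

```python
# Biased and bias-corrected MLE of the Rayleigh scale, Eqs. (4.8)-(4.9).
import numpy as np
from scipy.special import gammaln

def rayleigh_mle(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma_hat = np.sqrt(np.sum(x**2) / (2 * n))                 # Eq. (4.8)
    corr = np.exp(gammaln(n) - gammaln(n + 0.5)) * np.sqrt(n)   # Eq. (4.9)
    return sigma_hat, sigma_hat * corr

rng = np.random.default_rng(0)
print(rayleigh_mle(rng.rayleigh(scale=0.7, size=30)))
```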
4.4.2 Frequency Ratio Estimator
The Rayleigh distribution, Equation (4.1), can be considered at two local frequencies by letting x = x_i, where i = 1, 2. Thus the Rayleigh distribution becomes

    f_1 = \frac{x_1}{\sigma^2}\, e^{-\frac{1}{2}\left(\frac{x_1}{\sigma}\right)^2},    (4.10a)

    f_2 = \frac{x_2}{\sigma^2}\, e^{-\frac{1}{2}\left(\frac{x_2}{\sigma}\right)^2}.    (4.10b)

The ratio of Equations (4.10a) and (4.10b) is given by

    \frac{f_1}{f_2} = \frac{x_1}{x_2}\, e^{-\frac{1}{2\sigma^2}\left(x_1^2 - x_2^2\right)}.    (4.11)
The approximation σ̂ of the parameter σ is obtained from

    \ln\frac{f_1}{f_2} = \ln\frac{x_1}{x_2} - \frac{1}{2\sigma^2}\left(x_1^2 - x_2^2\right),    (4.12)

    \sigma^2 = \frac{x_2^2 - x_1^2}{2\left(\ln f_1 x_2 - \ln f_2 x_1\right)},    (4.13)

    \hat{\sigma} = \sqrt{\frac{x_2^2 - x_1^2}{2\left(\ln f_1 x_2 - \ln f_2 x_1\right)}}.    (4.14)
4.4.3 Differential Least-Squares Method
The application of the proposed DLSM uses the following assumptions on the dataset:

Assumption 1. The data are assumed to follow a Rayleigh distribution;

Assumption 2. The sample is assumed to be sufficient if n ≥ 15.

Theorem 4.1. For any random variable X that is assumed to follow a Rayleigh distribution, f(x, σ) = \frac{x}{\sigma^2} e^{-\frac{x^2}{2\sigma^2}} for x ≥ 0, the scale parameter σ can be estimated by

    \min_{i\to n} \sum_{i=1}^{n}\left(\varphi_1^2 x_i^2 f_i - 3\varphi_1 f_i - f_i''\right)^2 \to 0, \quad \text{for } n \ge 15,

where σ̂ = \sqrt{1/\varphi_1}.

Proof: Assuming that a random sample X = {x_1, x_2, x_3, \cdots, x_n} is available, using numerical differentiation the first derivative of Equation (4.1) is given by
    \frac{df}{dx} = f(x)\left(\frac{\sigma^2 - x^2}{x\sigma^2}\right).    (4.15)

Introducing φ_1 = \frac{1}{\sigma^2} as a new parameter, Equation (4.15) becomes

    \frac{df}{dx} = f(x)\left(\frac{1}{x} - x\varphi_1\right).    (4.16)
Taking the second numerical derivative of Equation (4.1) we obtain

    \frac{d^2 f}{dx^2} = f(x)\left(\frac{x^2}{\sigma^4} - \frac{3}{\sigma^2}\right).    (4.17)

Further re-parametrizing Equation (4.17), by introducing φ_2 = 3φ_1, the parameter dimensional space is increased from two to three:

    \frac{d^2 f}{dx^2} = f(x)\left(\varphi_1^2 x^2 - \varphi_2\right),    (4.18a)

    \frac{d^2 f}{dx^2} = f(x)\left(\varphi_1^2 x^2 - 3\varphi_1\right).    (4.18b)
The scale parameter for the Rayleigh distribution can now be estimated by minimization of the second-order goal function

    \min_{i\to n} \sum_{i=1}^{n}\left(\varphi_1^2 x_i^2 f_i - 3\varphi_1 f_i - f_i''\right)^2 \to 0, \quad \text{for } n \ge 15,    (4.19)

where σ̂ = \sqrt{1/\varphi_1}. ∎
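A hedged numerical sketch of this minimization follows (Python; the histogram-based density estimate and the scipy optimiser are assumptions of this sketch, not part of the thesis' Mathematica implementation). The residual is exactly the bracketed quantity of Equation (4.19), with f and f'' obtained numerically:

```python
# DLSM sketch: minimise Eq. (4.19) over phi_1 and return sigma_hat.
import numpy as np
from scipy.optimize import minimize_scalar

def dlsm_sigma(sample, bins=20):
    dens, edges = np.histogram(sample, bins=bins, density=True)
    x = 0.5 * (edges[:-1] + edges[1:])       # bin centres
    f = dens
    f2 = np.gradient(np.gradient(f, x), x)   # numerical second derivative

    def goal(phi1):                          # Eq. (4.19)
        r = phi1**2 * x**2 * f - 3.0 * phi1 * f - f2
        return np.sum(r**2)

    res = minimize_scalar(goal, bounds=(1e-6, 100.0), method="bounded")
    return np.sqrt(1.0 / res.x)              # sigma_hat = sqrt(1 / phi_1)

rng = np.random.default_rng(1)
print(dlsm_sigma(rng.rayleigh(scale=0.7, size=200)))
```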
4.5 APPLICATION OF THE DIFFERENCE LEAST-SQUARES METHOD

In this section, a Monte Carlo simulation and real-life data are modeled using the proposed DLSM.
4.5.1 Monte Carlo Simulation Study, Estimation Using the DLSM
Using a simulation of a Rayleigh distribution with σ = 0.7, for different sample sizes n5 = 5, n10 = 10, \cdots, n30 = 30, and applying Equation (4.1), the proposed method is employed to re-estimate the scale parameter σ = 0.7. Figures 4.1 and 4.2 show the variations of the probability density function (pdf) of the Rayleigh distribution for σ̂ = 0.55, σ̂ = 0.62, σ̂ = 0.65, σ̂ = 0.67, σ̂ = 0.68, and σ̂ = 0.69. It is noticed that the pdf of the Rayleigh distribution appears to be a decreasing function for σ̂ = 0.55, see Figure 4.1, and it increases for σ̂ ≥ 0.62, see Figures 4.1 and 4.2.

Figure 4.1: The variations of the pdfs of the Rayleigh distribution for different scale parameters and sample sizes (σ = 0.7; σ̂ = 0.671 for n = 5, σ̂ = 0.624 for n = 10, σ̂ = 0.654 for n = 15).
Figure 4.2: The variations of the pdfs of the Rayleigh distribution for different scale parameters and sample sizes (σ = 0.7; σ̂ = 0.67 for n = 20, σ̂ = 0.68 for n = 25, σ̂ = 0.69 for n = 30).
The detailed results of the Monte Carlo simulation are discussed and presented in Sections 4.6 and 4.7. The simulations were performed using Mathematica® software.
4.5.2 Real Data for the Strength of Glass Fiber
The data, taken from Smith and Naylor (1987) (Vining & Kowalski, 2010), concern the strength of 1.5 cm glass fibers measured at the National Physical Laboratory, England. The data comprise 63 observations: 0.55, 0.93, 1.25, 1.36, 1.49, 1.52, 1.58, 1.61, 1.64, 1.68, 1.73, 1.81, 2, 0.74, 1.04, 1.27, 1.39, 1.49, 1.53, 1.59, 1.61, 1.66, 1.68, 1.76, 1.82, 2.01, 0.77, 1.11, 1.28, 1.42, 1.5, 1.54, 1.6, 1.62, 1.66, 1.69, 1.76, 1.84, 2.24, 0.81, 1.13, 1.29, 1.48, 1.5, 1.55, 1.61, 1.62, 1.66, 1.7, 1.77, 1.84, 0.84, 1.24, 1.3, 1.48, 1.51, 1.55, 1.61, 1.63, 1.67, 1.7, 1.78, 1.89. The scale parameter for the strength of glass fiber was estimated using the difference least-squares method (DLSM); the estimated scale parameter for this distribution was computed to be σ = 0.4657 ≈ 0.47, while using maximum likelihood the estimated value for the scale parameter is σ = 1.0894752632374674 ≈ 1.09.
Figure 4.3: The variations of the pdfs of the Rayleigh distribution for the scale parameter estimation using the difference least-squares method and maximum likelihood estimators

From Figure 4.3, it can be observed that the MLE method estimates (models) the glass fiber data better than the DLSM at the peak. However, the DLSM performs better than the MLE method on the tail. It can be said that both the MLE and DLSM have relative importance in modeling fiber data.

4.5.3 Real Data for Electronic Component Failure Times
The failure times involving an electronic component were provided by Juran and Gryna (1980) (Vining & Kowalski, 2010). These are the failure times in hours for 84 of the components: 1, 1.2, 1.3, 2, 2.9, 3, 3.1, 3.5, 3.8, 4.3, 4.7, 4.8, 5.2, 5.4, 6.4, 6.8, 6.9, 7.2, 8.3, 8.7, 9.2, 10.2, 10.4, 11.9, 14.4, 15.6, 16.2, 17, 19.2, 28.1, 28.2, 29, 30.6, 32.4, 33, 36.1, 40.1, 42.8, 44.5, 50.4, 51.2, 52, 54.2, 55.6, 56.4, 58.3, 63.7, 64.6, 65.3, 70.1, 71, 75.1, 78.4, 79.2, 84.1, 86, 88.4, 89.9, 90.8, 91.1, 92.1, 97.9, 100.8, 103.2, 104, 104.3, 105.8, 106.5, 110.7, 112.6, 114.8, 115.1, 117.4, 118.3, 120.6, 121, 122.9, 124.5, 125.8, 126.6, 128.4, 129.2, 129.5, 129.9. The scale parameter for the failure times involving an electronic component was estimated using the difference least-squares method (DLSM). The estimated unbiased scale parameter for this distribution was computed to be σ² = 1426.9500257374 ≈ 1426.95, and the biased estimated scale parameter is σ = 37.7749920679992 ≈ 37.77. Using maximum likelihood, the estimated value for the scale parameter is σ = 51.35960811024647 ≈ 51.36.
Figure 4.4: The variations of the pdfs of the Rayleigh distribution for the scale parameter estimation using the difference least-squares method and the MLE methods, on failure times for electronic component
Figure 4.4 shows that the MLE method estimates the electronic component data correctly on the right tail, while the DLSM estimates the data well at the peak. The MLE and DLSM methods yield Rayleigh distributions that correctly model the given electronic component failure data. The MLE estimator for the electronic component failure-time data is a heavy-tailed (fat-tailed) probability distribution whose tail is not exponentially bounded (Asmussen, 2003), while the DLSM estimator for the electronic component failure-time data is a long heavy-tailed probability distribution that estimates the data well on the extremes. Thus the DLSM can be used to estimate the parameters for data that follow a Rayleigh distribution in the absence of a maximum likelihood estimator.
4.6 RESULTS
Table 4.1 presents the estimated values of the scale parameter using the simulation of the Rayleigh data, Section 4.5. The original values represent the Monte Carlo simulation of the model and the estimated values represent the values obtained using the proposed DLSM.
Table 4.1: Parameter Estimation for the Rayleigh Distribution Model using the Difference Least-Squares Method

Original parameter σ = 0.7

Sample Size    DLSM
n5 = 5         σ̂ = 0.55
n10 = 10       σ̂ = 0.62
n15 = 15       σ̂ = 0.65
n20 = 20       σ̂ = 0.67
n25 = 25       σ̂ = 0.68
n30 = 30       σ̂ = 0.69
The difference least-squares method (DLSM) performs well with data samples n ≥ 15, see Figures 4.1, 4.2 & Table 4.1; thus the DLSM is more accurate in estimating Rayleigh-distributed data for large samples n ≥ 15, see Figures 4.3 and 4.4.
4.7 CONCLUSION
In this Chapter a new approach for estimating the scale parameter of a Rayleigh distribution has been discussed. A theorem on which the proposed approach is based has been stated and proved. Three data sets, one simulated, see Section 4.5.1, and two real, see Sections 4.5.2 & 4.5.3, have been used to assess the performance of the proposed method. The proposed method shows comparable results with the MLE when used to model the real data sets. The DLSM gives better results when used to model samples n ≥ 15. As mentioned in Section 4.5.2, the MLE and DLSM methods have relative importance in modeling specific data sets. This is justified by the MLE modeling the electronic data better on the tail and the DLSM modeling the same data better at the peak, see Figure 4.4; on the contrary, the DLSM models the fiber glass data better on the tail and the MLE method models the same data better at the peak, see Figure 4.3. Thus the DLSM method can be used to estimate parameters for the Rayleigh distribution where tail datasets are of importance.
CHAPTER 5. EXACT SOLUTIONS FOR A TWO-PARAMETER RAYLEIGH DISTRIBUTION
5.1 OVERVIEW

In attempting to estimate parameters of a Rayleigh distribution, numerical iterative methods or routines are frequently employed. In this study, an exact method based on minimization of a goal function is proposed. The scale and location parameters of the Rayleigh distribution are estimated by changing the parameter dimensional space of the original function. Linearization of the probability density function (PDF) is done through differential techniques. A numerical simulation and a real-life data set of the strength of glass fiber are used to evaluate the performance of the proposed optimization differential method (ODM), the maximum likelihood estimators (MLE) and the method of moments estimator (MME). The probability density functions for the three methods are constructed using the estimated parameters from each method. The ODM shows better convergence as the sample size increases and can be adopted in practice.
5.2 INTRODUCTION
The Rayleigh distribution was introduced by Lord Rayleigh (1880) in connection with problems in the field of acoustics (Strutt, 2011; original publication year 1877). Since then, the Rayleigh distribution has been applied in many different areas of science and technology. The two-parameter Rayleigh distribution is a special case of the three-parameter Weibull distribution (Khan et al., 2010), (Murthy et al., 2004), (Kundu & Raqab, 2005), and it may model real-life data better than the one-parameter Rayleigh distribution (Dey et al., 2016), (Mkolesia et al., 2016a). The Rayleigh distribution has a number of applications in settings where magnitudes of normal variables are important (Chansoo & Keunhee, 2009), (Akhter & Hirai, 2009). An example for the Rayleigh distribution is the analysis of wind velocity (Pessanha et al., 2016) into its orthogonal two-dimensional vector components (Mclachlan & Peel, 2000). Assuming that each component is uncorrelated, normally distributed with equal variance, σ², and zero mean, η = 0, then the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. The parameter estimation of the Rayleigh distribution can be found in several areas of application, that is to say, magnetic resonance imaging (MRI) (Aja-Fernandez et al., 2008), radar (Akhter & Hirai, 2009), fiber strength (Smith & Naylor, 1987), (Jasra et al., 2005), plant data (Jasra et al., 2005), clustering (Fraley & Raftery, 1998), communication theory and engineering (Akhter & Hirai, 2009), (Dyer & Whisenand, 1973), (Merovci & Elbata, 2015), and seismic analysis (Zerva, 2009), (Moya et al., 2005). In this Chapter, a method of estimating the location and scale parameters of a Rayleigh distribution is developed. The proposed method is tested against other existing estimation methods, that is, maximum likelihood estimation (MLE) and method of moments estimation (MME). The tests are done on numerical simulations and real-life examples. The estimation of the scale and location parameters for two-parameter Rayleigh distributed random variables is achieved by minimizing a differential goal function. It is shown that the developed method (optimization differential method, ODM) works well when the data sample size is n ≥ 5.

The chapter is organized as follows: Section 5.3 describes the motivation of the proposed method. In Section 5.4, different methods of parameter estimation for the Rayleigh distribution are presented and the proposed method is discussed. In Section 5.5, a numerical simulation is considered to assess the performance of the proposed method, and results of the proposed method are presented. In Section 5.6, the proposed method (optimized differential method) is compared with existing parameter estimation models on real-life data. In Section 5.7, results and discussion of the study are presented, and conclusions are presented in Section 5.8.
5.3 MOTIVATION
The estimation of parameters in all algebraic models is regarded as a fundamental step in statistical modeling for a class of problems. In this study, estimating the parameters of a Rayleigh distribution is considered. The specification of the two-parameter Rayleigh distribution is

    f(x; \lambda, \eta) = \begin{cases} 2\lambda(x-\eta)\, e^{-\lambda(x-\eta)^2}, & \text{if } x > \eta,\ \lambda > 0;\\ 0, & \text{elsewhere,} \end{cases}    (5.1)

where λ > 0 is the scale parameter and 0 < η < ∞ the location parameter. The cumulative distribution function on the support of X, x > η, is as follows:

    F(x; \lambda, \eta) = 1 - e^{-\lambda(x-\eta)^2}.    (5.2)
Although extensive work has been done on the estimation of the one-parameter Rayleigh distribution (Mkolesia et al., 2016a), according to Dey et al. (2014) not much attention has been paid to the two-parameter Rayleigh distribution; however, brief literature does exist (Johnson et al., 1995). In this study a method for exact estimation of the Rayleigh distribution is discussed.
5.4 PARAMETER ESTIMATION
There exist different methods for estimating parameters for data that are assumed to follow a Rayleigh distribution. Some of these methods are maximum likelihood estimation (Chansoo & Keunhee, 2009), (Akhter & Hirai, 2009), (Dey et al., 2014), (Merovci & Elbata, 2015), (Dyer & Whisenand, 1973), (Kundu & Raqab, 2005), (Soliman, 2005), method of moments estimators (Merovci & Elbata, 2015), (Dey et al., 2014), modified moment estimators (Kundu & Raqab, 2005), the local frequency ratio method (Moya et al., 2005), the L-moment estimator (Kundu & Raqab, 2005), (Dey et al., 2014), least-squares estimators (Kundu & Raqab, 2005), (Dey et al., 2014), weighted least-squares estimators (Merovci & Elbata, 2015), (Kundu & Raqab, 2005), and Bayes estimators (Chansoo & Keunhee, 2009), (Diebolt & Robert, 1994), (Dey et al., 2014), (Soliman, 2005). In this section a summary of the maximum likelihood estimator (MLE), the method of moments estimation (MME) and the proposed method is presented.
5.4.1 Maximum Likelihood Estimator
Assume a random sample x_i, i = 1, \ldots, n, of n observations from Rayleigh-distributed random data with a PDF of the form of Equation (5.1) is obtainable. The log-likelihood function of x_i on the support of X is given by

    l(\lambda, \eta) = C + n\ln\lambda + \sum_{i=1}^{n}\ln(x_i - \eta) - \lambda\sum_{i=1}^{n}(x_i - \eta)^2.    (5.3)

The two normal equations are then
    \frac{\partial l(\lambda,\eta)}{\partial\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n}(x_i-\eta)^2 = 0,    (5.4)

and

    \frac{\partial l(\lambda,\eta)}{\partial\eta} = -\sum_{i=1}^{n}(x_i-\eta)^{-1} + 2\lambda\sum_{i=1}^{n}(x_i-\eta) = 0.    (5.5)
It can be observed that the normal equations, Equations (5.4 and 5.5), cannot be solved explicitly to obtain exact solutions for the scale (λ) and location (η) parameters, hence the application of numerical iterative approaches. From Equation (5.4), we obtain the maximum likelihood estimate of λ as a function of η, say λ̂(η):

    \hat{\lambda}(\eta) = \frac{n}{\sum_{i=1}^{n}(x_i-\eta)^2}.    (5.6)
Substituting λ̂(η) into Equation (5.3), we obtain the log-likelihood function of η, without the additive constant, as

    g(\eta) = l(\hat{\lambda}(\eta), \eta) = -n\ln\sum_{i=1}^{n}(x_i-\eta)^2 + \sum_{i=1}^{n}\ln(x_i-\eta) = \sum_{i=1}^{n}\ln\frac{x_i-\eta}{\sum_{i=1}^{n}(x_i-\eta)^2}.    (5.7)
The profile log-likelihood function, Equation (5.7), is an increasing function of η for n = 1; hence the MLE of η is x_1 in that degenerate case and is not finite in general. For n > 1, the MLE of η, say η̂_MLE, can be obtained by maximizing Equation (5.7) with respect to η. Setting the derivative of Equation (5.7) to zero and rearranging, the maximum of Equation (5.7) can be obtained as a fixed-point solution of the equation

    h(\hat{\eta}_{MLE}) = \hat{\eta}_{MLE},    (5.8)

where

    h(\eta) = \bar{x} - \frac{1}{2n^2}\left[\sum_{i=1}^{n}(x_i-\eta)^2\right]\left[\sum_{i=1}^{n}(x_i-\eta)^{-1}\right].    (5.9)

A fixed-point iteration sketch based on this form is given below.
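Taking the rearranged form of h in Equation (5.9) as given, a direct-iteration sketch reads as follows (Python; an illustrative realisation of the fixed-point scheme described in the text, not the thesis' own code):

```python
# Fixed-point iteration for the two-parameter Rayleigh MLE, Eqs. (5.6)-(5.9).
import numpy as np

def mle_two_param_rayleigh(x, n_iter=200, tol=1e-10):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    eta = x[0] - 0.1                      # start strictly below min(x)
    for _ in range(n_iter):
        d = x - eta
        # h(eta) = xbar - sum(d^2) * sum(1/d) / (2 n^2), Eq. (5.9)
        eta_new = x.mean() - np.sum(d**2) * np.sum(1.0 / d) / (2.0 * n**2)
        if abs(eta_new - eta) < tol:
            eta = eta_new
            break
        eta = min(eta_new, x[0] - 1e-12)  # keep eta below the sample minimum
    lam = n / np.sum((x - eta)**2)        # Eq. (5.6)
    return lam, eta
```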
Once η̂_MLE is obtained, the MLE of λ, λ̂_MLE = λ̂(η̂_MLE), can easily be obtained. Observe that a very simple iterative technique, say h(η^{(j)}) = η^{(j+1)}, where η^{(j)} is the j-th iterate, can be used to solve Equation (5.8). The variances and distributional properties of η̂_MLE and λ̂_MLE are not available in explicit form, and it is expected that the exact distributions of the MLEs will not be obtainable; we therefore rely mainly on the asymptotic properties of the MLEs. The two-parameter Rayleigh distribution does not satisfy the standard Cramér or Cramér-Rao regularity conditions (Rinne, 2009), (Rao, 1948). Note that

    -E\left(\frac{\partial^2 l(\lambda,\eta)}{\partial\eta^2}\right) = 2n\lambda + E\left(\sum_{i=1}^{n}(x_i-\eta)^{-2}\right) \to \infty.    (5.10)
It follows from Theorem 3 of Smith (1985) that (η̂_MLE − η) is asymptotically normally distributed with mean 0 and variance V(η̂_MLE − η) = O\left(\frac{1}{n\ln n}\right), and that (λ̂_MLE − λ) is asymptotically normally distributed with mean 0 and V(λ̂_MLE − λ) = \left[-E\left(\frac{\partial^2 l}{\partial\lambda^2}\right)\right]^{-1} = \frac{\lambda^2}{n}; η̂ and λ̂ are asymptotically independent. The exact asymptotic variance of η̂ cannot be obtained in explicit form. From the Corollary of Theorem 3 of Smith (1985), it follows that V(η̂ − η) can be well approximated by the inverse of the observed information, in this case

    V(\hat{\eta}_{MLE} - \eta) \approx \frac{1}{2\hat{\eta}_{MLE} + \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\eta}_{MLE})^{-2}}.    (5.11)

Thus 100(1 − α)% approximate confidence intervals for λ and η can be obtained as
    \hat{\lambda}_{MLE} \pm z_{\alpha/2}\,\frac{\hat{\lambda}_{MLE}}{\sqrt{n}}    (5.12)

and

    \hat{\eta}_{MLE} \pm z_{\alpha/2}\left[2\hat{\eta}_{MLE} + \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\eta}_{MLE})^{-2}\right]^{-\frac{1}{2}},    (5.13)

respectively, where z_{α/2} is the (α/2)-th percentile point of the standard normal distribution.

5.4.2 Method of Moment Estimators
The MMEs of the two-parameter Rayleigh distribution can be obtained as

    \hat{\lambda}_{MME} = \frac{1}{s^2}\left[1 - \Gamma^2\left(\frac{3}{2}\right)\right]    (5.14)

and

    \hat{\eta}_{MME} = \bar{x} - \hat{\lambda}_{MME}^{-\frac{1}{2}}\,\Gamma\left(\frac{3}{2}\right),    (5.15)

where Γ(τ) = (τ − 1)!, \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i and s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 are the sample mean and sample variance respectively.
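Equations (5.14) and (5.15) are closed-form, so they translate directly into code; the following is a small illustrative sketch (Python; the function name is hypothetical):

```python
# Method-of-moments estimators for the two-parameter Rayleigh, Eqs. (5.14)-(5.15).
import math
import numpy as np

def mme_two_param_rayleigh(x):
    x = np.asarray(x, dtype=float)
    xbar, s2 = x.mean(), x.var(ddof=1)
    g = math.gamma(1.5)                  # Gamma(3/2) = sqrt(pi) / 2
    lam = (1.0 - g**2) / s2              # Eq. (5.14)
    eta = xbar - g / math.sqrt(lam)      # Eq. (5.15)
    return lam, eta
```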
The exact distributions of λ̂_MME and η̂_MME are not possible to obtain, but their asymptotic distributions can be. For that we need the following notation. Suppose the random variable X has a Rayleigh distribution with parameters λ and η, and define

    a_k = E\left(X^k\right) \quad \text{and} \quad b_k = E\left[(X-\eta)^k\right] = \frac{1}{\lambda^{k/2}}\,\Gamma\left(\frac{k}{2}+1\right), \quad \text{for } k = 1, 2, \cdots.    (5.16)

Then

    a_1 = b_1 + \eta, \quad a_2 = b_2 + 2\eta a_1 - \eta^2, \quad a_3 = b_3 + 3\eta a_2 - 3\eta^2 a_1 + \eta^3,    (5.17)

    a_4 = b_4 + 4\eta a_3 - 6\eta^2 a_2 + 4\eta^3 a_1 - \eta^4.    (5.18)
The asymptotic properties of the MME can now be provided. Since the first two moments exist, it is immediate that the MMEs are consistent estimators of the corresponding parameters. Using the δ-method, the asymptotic distributions of the MME can be obtained (Rinne, 2009):

    \left[\sqrt{n}\,(\hat{\eta}_{MME}-\eta),\ \sqrt{n}\,(\hat{\lambda}_{MME}-\lambda)\right] \xrightarrow{d} N_2(0, \Sigma),    (5.19)

where “\xrightarrow{d}” means convergence in distribution, and Σ is a 2 × 2 matrix of the form Σ = C^{-1} A C^{-1}. Denote

    C = \begin{pmatrix} c_{11} & c_{12}\\ c_{21} & c_{22} \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix};

then, for

    c_1 = \frac{\Gamma\left(\frac{3}{2}\right)}{\sqrt{1-\Gamma^2\left(\frac{3}{2}\right)}} \quad \text{and} \quad c_2 = 1 - \Gamma^2\left(\frac{3}{2}\right),

we have

    a_{11} = a_2 - a_1^2, \quad a_{12} = a_{21} = a_3 - a_1 a_2, \quad a_{22} = a_4 - a_2^2,

    c_{11} = 1 + c_1\left(a_2 - a_1^2\right)^{-\frac{1}{2}} a_1, \quad c_{12} = c_{21} = -\frac{c_1}{2}\left(a_2 - a_1^2\right)^{-\frac{1}{2}}, \quad c_{22} = -c_2\left(a_2 - a_1^2\right)^{-2}.

It is clear that both η̂_MLE and η̂_MME are consistent estimators of η; η̂_MLE converges to η faster than η̂_MME.
5.4.3 Optimized Differential Method
Proposition 5.1. Nonlinear least-squares problems can be linearized in parameters, through differential techniques, by changing their parameter dimensional space.

Theorem 5.2. For any random variable X that is assumed to follow a Rayleigh distribution f(x; λ, η) = 2λ(x − η)e^{−λ(x−η)²} for x > η and λ > 0, the scale parameter λ and the location parameter η can be estimated by

    \min_{i\to n}\sum_{i=1}^{n}\left[f_i(\xi)\left(x_i^2\varphi_1 - 2x_i\varphi_1\varphi_2 + \varphi_1\varphi_2^2 - 3\sqrt{\varphi_1}\right) - \frac{d^2}{dx_i^2}\{f_i(\xi)\}\right]^2 \to 0,

where ξ = (x; φ_1, φ_2). The estimate of the scale parameter λ is λ̂ = \frac{1}{2}\sqrt{|\varphi_1|}, and that of the location parameter η is η̂ = φ_2.
Proof: It follows from the work done by (Kikawa et al., 2015), (Kikawa, 2013). Assuming that a random sample X = {x_1, x_2, \cdots, x_N} is available, using numerical differentiation on Equation (5.1), the first derivative is

    \frac{\partial f}{\partial x} = f(\xi)\left[\frac{1}{x-\eta} - 2\lambda(x-\eta)\right],    (5.20)

and the second-order derivative is

    \frac{\partial^2 f}{\partial x^2} = f(\xi)\left(4\lambda^2 x^2 - 8\lambda^2\eta x + 4\lambda^2\eta^2 - 6\lambda\right).    (5.21)
With the assumption that the random sample X = {x_1, x_2, \cdots, x_n} is available and is assumed to follow a Rayleigh distribution, the scale (λ) and location (η) parameters of the distribution can be obtained by minimizing the goal function

    \min_{i\to n}\sum_{i=1}^{n}\left[f_i(\xi)\left(x_i^2\varphi_1 - 2x_i\varphi_1\varphi_2 + \varphi_1\varphi_2^2 - 3\sqrt{\varphi_1}\right) - \frac{d^2}{dx_i^2}\{f_i(\xi)\}\right]^2 \to 0,

where ξ = (x; φ_1, φ_2). The estimate of the scale parameter λ is

    \hat{\lambda} = \frac{1}{2}\sqrt{|\varphi_1|},    (5.22)

and that of the location parameter η is

    \hat{\eta} = \varphi_2.    (5.23)

Hence Equations (5.22 and 5.23) give the exact solutions for the scale parameter λ = λ̂ and the location parameter η = η̂. ∎
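A hedged numerical sketch of this minimization follows (Python; the histogram density estimate, the generic optimiser, and its starting point are illustrative assumptions only, and the thesis' computations were done in Mathematica):

```python
# ODM sketch: minimise the goal function of Theorem 5.2 over (phi_1, phi_2).
import numpy as np
from scipy.optimize import minimize

def odm_two_param_rayleigh(sample, bins=25):
    sample = np.asarray(sample, dtype=float)
    dens, edges = np.histogram(sample, bins=bins, density=True)
    x = 0.5 * (edges[:-1] + edges[1:])            # bin centres
    f = dens
    f2 = np.gradient(np.gradient(f, x), x)        # numerical d^2 f / dx^2

    def goal(p):
        phi1, phi2 = p
        core = (x**2 * phi1 - 2.0 * x * phi1 * phi2
                + phi1 * phi2**2 - 3.0 * np.sqrt(abs(phi1)))
        return np.sum((f * core - f2)**2)

    res = minimize(goal, x0=[1.0, sample.min() - 0.1], method="Nelder-Mead")
    phi1, phi2 = res.x
    return 0.5 * np.sqrt(abs(phi1)), phi2         # Eqs. (5.22)-(5.23)
```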
5.5 NUMERICAL SIMULATIONS

A simulation of the Rayleigh distribution R(λ, η), Equation (5.1), with scale λ = 0.346 and location η = 0.5, is performed. Estimation of the scale and location parameters using the proposed method is represented in Figures 5.1 and 5.2.
Figure 5.1: The variation of the probability density function for the Rayleigh distribution using the ODM for different sample sizes n = 5, 10, 15 (λ = 0.346, η = 0.5; λ̂ = 0.349, η̂ = 0.403 for n = 5; λ̂ = 0.348, η̂ = 0.404 for n = 10; λ̂ = 0.347, η̂ = 0.405 for n = 15).
Figure 5.2: The variation of the probability density function for the Rayleigh distribution using the ODM for different sample sizes n = 20, 25, 30 (λ = 0.346, η = 0.5; λ̂ = 0.347, η̂ = 0.406 for n = 20; λ̂ = 0.347, η̂ = 0.407 for n = 25; λ̂ = 0.346, η̂ = 0.408 for n = 30).

Figure 5.3: The variation of the probability density function for the Rayleigh distribution using the ODM for a considerably large sample size n → ∞ (λ = 0.346, η = 0.5; λ̂ = 0.351, η̂ = 0.496).
Figures 5.1 and 5.2 represent the results of the ODM in determining the scale and location parameters for the numerical simulation of a two-parameter Rayleigh distribution with scale λ = 0.346 and location η = 0.5. Using the ODM, it can be shown that with a sample of n = 5, 10, 15 (small), Figure 5.1, the scale parameter λ = λ̂ is approximated better than the location parameter η = η̂; the differences in the estimates across the small samples n = 5, 10, 15 are very minimal. With samples of n = 20, 25 (moderate) and n = 30 (large), Figure 5.2, both the scale parameter λ = λ̂ and the location parameter η = η̂ are approximated; again the proposed method gives a better approximation for the scale parameter than for the location parameter, and the differences in the estimates across n = 20, 25, 30 are very minimal. If the sample size is increased to a considerably large sample size (n → ∞; a sample size of n → ∞ is considered for any n ≥ 60, and for this simulation n → ∞ was taken as n = 280), the estimates of the scale and location parameters are improved, Figure 5.3. The detailed results of the numerical simulation are discussed and presented in Sections 5.5.1, 5.7 and 5.8. The numerical simulations were performed using Mathematica® CAS.
5.5.1 Results

Table 5.1 represents the estimated values of the scale and location parameters for the numerical simulation, Section 5.5. The values λ = 0.346 and η = 0.5 represent the scale and location parameters of the two-parameter Rayleigh distribution respectively. The estimated values of the scale parameter λ̂ and the location parameter η̂ represent the values obtained using the ODM on different sample sizes n. It can be observed that the reliability of the estimates increases with increasing sample size, Table 5.1, which is consistent with statistical estimation theory.

Table 5.1: Parameter estimation for a two-parameter Rayleigh distribution model using the optimized differential method
Scale parameter λ = 0.346, location parameter η = 0.5

Sample     Estimated scale λ̂    Estimated location η̂
n = 5      0.3486                0.4032
n = 10     0.3482                0.4043
n = 15     0.3478                0.4054
n = 20     0.3474                0.4063
n = 25     0.3469                0.4071
n = 30     0.3465                0.4077
n → ∞      0.3509                0.4962

5.6 DATA ANALYSIS
(Real-life data example.) In this section the real-life data for the strength of glass fiber is considered. The ODM, MLE and MME approaches are employed to estimate both the scale and location parameters of the real-life data, and reconstruction of the data pdf pattern is done, Figure 5.4.
5.6.1 Strength of Glass Fiber Data
The data, taken from Smith and Naylor (1987) (Vining & Kowalski, 2010), concern the strength of 1.5 cm glass fibers measured at the National Physical Laboratory, England. The data comprise 63 observations:

0.55, 0.93, 1.25, 1.36, 1.49, 1.52, 1.58, 1.61, 1.64, 1.68, 1.73, 1.81, 2, 0.74, 1.04, 1.27, 1.39, 1.49, 1.53, 1.59, 1.61, 1.66, 1.68, 1.76, 1.82, 2.01, 0.77, 1.11, 1.28, 1.42, 1.5, 1.54, 1.6, 1.62, 1.66, 1.69, 1.76, 1.84, 2.24, 0.81, 1.13, 1.29, 1.48, 1.5, 1.55, 1.61, 1.62, 1.66, 1.7, 1.77, 1.84, 0.84, 1.24, 1.3, 1.48, 1.51, 1.55, 1.61, 1.63, 1.67, 1.7, 1.78, 1.89.

The scale parameter for the strength of glass fiber was estimated using the ODM; the estimated scale parameter for this distribution was computed to be σ = 0.466, while using maximum likelihood the estimated value for the scale parameter is σ = 1.089.
5.6.1.1 Strength of Glass Fiber Data Using Different Estimation Methods
Figure 5.4: Variation of probability density functions for the Rayleigh distribution using scale and location estimates from ODM, MLE and MME
The estimation of the location and scale parameters for the strength of glass fiber data using the ODM, MLE and MME approaches is shown in Figure 5.4. Table 5.2 shows the estimates obtained using the ODM, MLE and MME methods.

Table 5.2: Scale and location parameter estimates for the strength of glass fiber data using the ODM, MLE and MME

Method    Scale parameter λ̂    Location parameter η̂
ODM       5.073                 1.024
MLE       5.781                 1.628
MME       5.402                 1.634
From the results, Table 5.2 and Figure 5.4, it is shown that the ODM gives better estimates for both the scale and location parameters. The pdf using the ODM, Figure 5.4, takes into consideration the majority of the data points, as it is centrally positioned, as opposed to the pdf patterns of the MLE and MME, which are greatly skewed to the right and hence leave out a greater part of the data points on the left. The MLE and MME ignore a majority of the data and thus poorly estimate the strength of glass fiber data. The location parameter has a significantly high discrepancy, and thus the ODM can be considered for the estimation of the location and scale parameters for the strength of glass fiber data.
5.7 SUMMARY OF RESULTS

In this section the developed method (ODM) and the existing MLE and MME approaches are compared, in order to assess their estimation capabilities for the two-parameter Rayleigh distribution.

Table 5.3: Summary of the performance of the proposed method and the existing MLE and MME techniques for the estimated values of the scale (λ = λ̂) and location (η = η̂) parameters
Original values λ = 0.346 and η = 0.5

           ODM                 MLE                     MME
Sample     λ̂        η̂        λ̂         η̂           λ̂         η̂
n = 5      0.3486    0.4032    0.0425     0.0091       0.0000     0.0494
n = 10     0.3482    0.4043    0.1167     0.0137       -0.2422    0.3479
n = 15     0.3478    0.4054    0.1522     0.02791      -0.3009    0.4386
n = 20     0.3474    0.4063    25.0528    -24.7945     -0.3437    0.5122
n = 25     0.3469    0.4071    0.4942     -0.2096      -0.3699    0.5679
n = 30     0.3465    0.4077    0.1895     0.0137       -0.3809    0.6119
n → ∞      0.3509    0.4962    0.5695     0.1280       -1.3633    1.4673

N.B.: A sample size of n → ∞ is considered for any n ≥ 60; for this sample, n → ∞ was taken as n = 280.
Table 5.4: The mean absolute error analysis of the proposed method and the existing MLE and MME techniques for the estimation of the original values of the scale (λ = 0.346) and location (η = 0.5) parameters

Error analysis for the scale λ = λ̂ and location η = η̂

           ODM error           MLE error               MME error
Sample     λ̂        η̂        λ̂          η̂          λ̂         η̂
n = 5      0.75      19.36     87.72       98.18       100.00     90.12
n = 10     0.64      19.14     66.27       97.26       170.00     30.42
n = 15     0.52      18.92     56.01       94.42       186.97     12.28
n = 20     0.40      18.74     7140.69     5058.9      199.34     2.44
n = 25     0.26      18.58     42.83       141.92      206.91     13.58
n = 30     0.14      18.46     45.23       97.26       210.09     22.38
n → ∞      1.42      0.76      64.59       74.40       494.00     193.46

N.B.: A sample size of n → ∞ is considered for any n ≥ 60; for this sample, n → ∞ was taken as n = 280.

From Table 5.4, the errors for the different estimation methods are presented. It is observed that the proposed method presents a relatively smaller error for the scale and location parameters respectively, while the MLE and MME present bigger errors for the estimated parameters.
Table 5.5: The average mean absolute error for the estimation methods ODM, MLE and MME

Method    Scale parameter error    Location parameter error
ODM       ± 0.59                   ± 16.28
MLE       ± 1071.91                ± 808.91
MME       ± 223.90                 ± 52.10
Thus, in this case, the ODM has the lowest error margin compared with the other estimation methods (MLE and MME respectively), Table 5.5. The proposed method performs better than the existing methods in the numerical analysis; thus the ODM method can be adopted to estimate the location and scale parameters of Rayleigh-distributed samples. The proposed optimized differential method works well when the following assumptions are met and not relaxed:

Assumption 3. The data are assumed to follow a Rayleigh probability density function (Rpdf);

Assumption 4. A sample size of n ≥ 5 is required for the proposed method to converge.
5.8 CONCLUSION
It has been shown that the proposed method estimates the scale and location parameters of the Rayleigh distribution better than the MLE and MME, Sections 5.6 and 5.7. Table 5.1 shows that the ODM performs well when the sample size is large and shows convergence at n ≥ 5. In Table 5.2 the ODM covers more data points than the MLE and MME; thus the ODM estimates the real-life data well. It is observed from Table 5.3 that the ODM performs better in estimating the parameters for the Rayleigh distribution; thus the proposed method may be implemented as an algorithm for estimating the sufficient parameters of a Rayleigh distribution. In the error analysis of the estimation techniques investigated (ODM, MLE, MME), Table 5.4, it can be observed that the ODM works better with sample sizes n ≥ 5, see Section 5.7. It can be observed from Table 5.5 that, overall, the ODM gives the minimum average error.
CHAPTER 6. MULTIVARIATE GAUSSIAN PARAMETER ESTIMATION: THE PRINCIPLE OF n-CROSS SECTIONS
6.1 OVERVIEW
In this Chapter, a new method called the principle of n-cross sections (PCS) for estimating the parameters of a multivariate Gaussian distribution is proposed. The importance of parameter estimation for a multivariate distribution is discussed in Section 6.2. In Section 6.3 the motivation for the study is discussed. An existing parameter estimation method, the maximum likelihood for multivariate Gaussian distributions, is presented in Section 6.4.1. A theorem on which the PCS method is based is first stated and proved in Section 6.4.2. A numerical simulation is performed in Section 6.5 in order to evaluate the performance of the PCS method. In Section 6.6 a conclusion is presented.
6.2 INTRODUCTION
Estimating parameters of a multivariate Gaussian distribution is usually done by methods such as maximum likelihood (Li, 2012), (Rencher, 2002), (Wickens, 2002), which assumes that the training (available) data set is large enough to provide robust estimates. This study proposes a new method of parameter estimation for the multivariate Gaussian distribution known as the principle of n-cross sections (PCS). The multivariate Gaussian distribution is considered because many multivariate distributions rely, in some manner, on the multivariate Gaussian (Li, 2012), (Timm, 2002) and (Rencher, 2002). It is stated that many real-world problems fall naturally within the framework of normal theory (Li, 2012). The importance of the normal distribution rests on its dual role as both a population model for natural phenomena and an approximate sampling distribution for many statistics (Li, 2012), (Rencher, 2002). Although real-life data may never exactly come from the true multivariate Gaussian distribution (Rencher, 2002), the multivariate Gaussian distribution provides a robust approximation to the “true” population distribution (Rencher, 2002). The multivariate distribution also has many “nice” mathematical properties that can be explored and is mathematically tractable (Rencher, 2002). Ideally, multivariate methods should be used so that the data can be treated in an integrated way (Fricker, 2006). Because of the central limit theorem, many multivariate statistics converge to the multivariate Gaussian distribution as the sample sizes increase. Multivariate distributions have a number of applications, like speaker authentication (Li, 2012), decision theory (Chatfield & Collins, 1980), and cluster analysis (Chatfield & Collins, 1980).
6.3 MOTIVATION
Much work on the estimation of multivariate Gaussian distributions has been done using nonparametric methods like the kernel estimators (Scott & Sain, 2004), (Murphy, 2012). There is a need to establish new methods that are more robust and have fewer distributional assumptions which can estimate the parameters for multivariate Gaussian distributions. There exist some traditional methods that are used to estimate the parameters of the multivariate Gaussian distribution, like the maximum likelihood estimators (Gallant, 1987), (Searle, 1971), (Kotz et al., 2000) & (Fichet et al., 2011); the MLE needs initialization (Searle, 1971). However, these methods have stringent distributional assumptions and require initial guess values to start the estimation process. Thus there is a need for a parameter estimation method that is not based on initial values. In this chapter a method for estimating the parameters of a multivariate Gaussian distribution is presented. This new method is called the principle of three cross sections and does not require any initialization.

Definition 6.1. Multivariate Gaussian: Any Rⁿ-valued random variable x is a multivariate Gaussian if for every t ∈ Rⁿ the real-valued random variable t · x is normal or Gaussian.

From Definition 6.1, the distribution of the univariate t · x is determined uniquely by its mean µ = µ_t and its standard deviation σ = σ_t; thus µ_t = t · µ, where µ = E(x) (Rencher, 2002), (Timm, 2002). A vector-valued random variable x = [x_1, \ldots, x_n]^T is said to have a multivariate normal (or Gaussian) distribution with mean (location parameter) µ ∈ Rⁿ and covariance matrix Σ ∈ S^n_{++}, where

    S^n_{++} = \left\{A \in \mathbb{R}^{n\times n} : A = A^T \text{ and } x^T A x > 0 \text{ for all } x \in \mathbb{R}^n \text{ such that } x \neq 0\right\},

that is to say, A is a positive definite n × n matrix, and the probability density function is given by

    N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{n}{2}}\,|\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right), \quad \text{where } x \in \mathbb{R}^n.    (6.1)

Thus x is a Gaussian distribution, x ∼ N(µ, Σ), where µ ∈ Rⁿ and Σ > 0, with moments: mean E(x) = µ and covariance cov(x) = Σ (Kotz et al., 2001). From Equation (6.1), within the mean vector µ there are n (independent) parameters and within the symmetric covariance matrix Σ there are ½n(n + 1) independent parameters, thus ½n(n + 3) independent parameters in total (Wickens, 2002).
6.3.1 Relation to the Univariate Gaussians
Since $\Sigma$ is positive definite, so is $\Sigma^{-1}$. This implies that for any vector $x \neq \mu$,
$$(x - \mu)^T \Sigma^{-1} (x - \mu) > 0, \qquad -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) < 0. \qquad (6.5)$$
From Equation (6.4), $D_m^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$ is known as the Mahalanobis distance (Murphy, 2012), first proposed by (Mahalanobis, 1936), which is the squared general distance from x to µ. If a random variable has a larger variance than another, it receives relatively less weight in the Mahalanobis distance. Similarly, two highly correlated variables do not contribute as much as two variables that are less correlated. This means that the use of the inverse of the covariance matrix in the Mahalanobis distance has the effect of 1. standardizing all variables to the same variance and 2. eliminating correlations.
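To make the weighting effect concrete, a minimal numerical sketch (my illustration, not part of the thesis) of the squared Mahalanobis distance follows; the values are arbitrary.

```python
import numpy as np

# Squared Mahalanobis distance D_m^2 = (x - mu)^T Sigma^{-1} (x - mu),
# the squared general distance from x to mu (Equation (6.4)).
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])             # covariance matrix (positive definite)
x = np.array([2.0, 1.0])

diff = x - mu
d2 = diff @ np.linalg.inv(Sigma) @ diff    # inverse covariance reweights coordinates
print(d2)
```

The high-variance first coordinate contributes less to the distance than an equal displacement along the second coordinate, which is exactly the standardizing effect described above.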
Like in the univariate case, the argument of the exponential function is a downward-opening quadratic bowl (Murphy, 2012). The coefficient $\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}$ does not depend on x; it is a normalizing factor which ensures that
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right) dx_1 \, dx_2 \cdots dx_n = 1. \qquad (6.6)$$

6.3.2 The Covariance Matrix
The covariance matrix of the multivariate Gaussian is an important concept (Rencher, 2002), (Timm, 2002), and is usually computed first when analyzing multivariate data. For a pair of random variables X and Y, their covariance is defined as
$$\operatorname{cov}[X, Y] = E[(X - E(X))(Y - E(Y))] = E[XY] - E[X]\,E[Y].$$
The covariance matrix provides a succinct summary of the covariances of all pairs of multiple variables (Kotz et al., 2000). The covariance matrix is denoted by $\Sigma$ and is an n × n matrix whose (i, j)th entry is $\operatorname{cov}[x_i, x_j]$ (Anderson & Olkin, 1985).
Proposition 6.1 provides an alternative way to characterize the covariance matrix of a random vector x.

Proposition 6.1. For any random vector x with mean µ and covariance matrix $\Sigma$,
$$\Sigma = E[(x - \mu)(x - \mu)^T] = E[xx^T] - \mu\mu^T. \qquad (6.7)$$

Proof: From Equation (6.7),
$$\Sigma = \begin{bmatrix} \operatorname{cov}[x_1, x_1] & \cdots & \operatorname{cov}[x_1, x_n] \\ \vdots & \ddots & \vdots \\ \operatorname{cov}[x_n, x_1] & \cdots & \operatorname{cov}[x_n, x_n] \end{bmatrix} = \begin{bmatrix} E[(x_1 - \mu_1)^2] & \cdots & E[(x_1 - \mu_1)(x_n - \mu_n)] \\ \vdots & \ddots & \vdots \\ E[(x_n - \mu_n)(x_1 - \mu_1)] & \cdots & E[(x_n - \mu_n)^2] \end{bmatrix} \qquad (6.8)$$
$$= E \begin{bmatrix} (x_1 - \mu_1)^2 & \cdots & (x_1 - \mu_1)(x_n - \mu_n) \\ \vdots & \ddots & \vdots \\ (x_n - \mu_n)(x_1 - \mu_1) & \cdots & (x_n - \mu_n)^2 \end{bmatrix} = E \left( \begin{bmatrix} x_1 - \mu_1 \\ \vdots \\ x_n - \mu_n \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 & \cdots & x_n - \mu_n \end{bmatrix} \right) \qquad (6.9)$$
$$= E[(x - \mu)(x - \mu)^T].$$

Equation (6.8) is obtained from the fact that the expectation of a matrix is the matrix of componentwise expectations of its entries. Equation (6.9) is obtained
from the fact that, for any vector $z \in \mathbb{R}^n$,
$$z z^T = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix} = \begin{bmatrix} z_1 z_1 & z_1 z_2 & \cdots & z_1 z_n \\ z_2 z_1 & z_2 z_2 & \cdots & z_2 z_n \\ \vdots & \vdots & \ddots & \vdots \\ z_n z_1 & z_n z_2 & \cdots & z_n z_n \end{bmatrix}. \qquad (6.10)$$
Proposition 6.2. Suppose that $\Sigma$ is the covariance matrix corresponding to some random vector x. Then $\Sigma$ is symmetric positive semidefinite.

Proof: The symmetry of $\Sigma$ follows from its definition. Next, for any vector $z \in \mathbb{R}^n$,
$$z^T \Sigma z = \sum_{i=1}^{n} \sum_{j=1}^{n} \Sigma_{ij} z_i z_j \qquad (6.11)$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{cov}[x_i, x_j] \, z_i z_j = \sum_{i=1}^{n} \sum_{j=1}^{n} E\big[ (x_i - E[x_i])(x_j - E[x_j]) \big] z_i z_j$$
$$= E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - E[x_i])(x_j - E[x_j]) \, z_i z_j \right]. \qquad (6.12)$$

Equation (6.11) is obtained by expansion of the quadratic form, and Equation (6.12) by the linearity of expectation (Murphy, 2012). Writing $y_i = x_i - E[x_i]$, the quantity inside the expectation is of the form
$$\sum_i \sum_j y_i y_j z_i z_j = (y^T z)^2 \geq 0.$$
Thus the quantity inside the expectation is always nonnegative, hence the expectation itself must be nonnegative (Timm, 2002), and it is concluded that $z^T \Sigma z \geq 0$ (Murphy, 2012).
Any symmetric real matrix $\Sigma$ is said to be positive definite if for every nonzero column vector z of real numbers the scalar $z^T \Sigma z > 0$; it is positive semidefinite if $z^T \Sigma z \geq 0$.

From Propositions (6.1) and (6.2) it follows that $\Sigma$ must be symmetric positive semidefinite in order to be a valid covariance matrix (Li, 2012), (Timm, 2002) & (Rencher, 2002). For the multivariate Gaussian density it is further required that $\Sigma$ be invertible, so that $\Sigma^{-1}$ exists, and hence of full rank. Since any full-rank symmetric positive semidefinite matrix is necessarily symmetric positive definite, it follows that $\Sigma$ must be symmetric positive definite.
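In practice, the validity of a candidate covariance matrix can be checked numerically; a small sketch (mine, not from the thesis) using the eigenvalue criterion:

```python
import numpy as np

def is_valid_covariance(S, tol=1e-12):
    """A matrix is a valid covariance for the Gaussian density if it is
    symmetric and all its eigenvalues are strictly positive."""
    symmetric = np.allclose(S, S.T)
    eigvals = np.linalg.eigvalsh(S)        # eigenvalues of a symmetric matrix
    return symmetric and np.all(eigvals > tol)

print(is_valid_covariance(np.array([[2.0, 0.5], [0.5, 1.0]])))  # True
print(is_valid_covariance(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False (indefinite)
```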
6.3.3 The Diagonal Covariance Matrix Case
One way to understand the multivariate Gaussian is to consider the case where n = 2 and the covariance matrix $\Sigma$ is diagonal, that is to say,
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}; \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}; \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}. \qquad (6.13)$$
In this case, the bivariate Gaussian density has the form
$$N(x \mid \mu, \Sigma) = \frac{1}{2\pi \left| \begin{smallmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{smallmatrix} \right|^{1/2}} \exp\left( -\frac{1}{2} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \right) \qquad (6.14)$$
$$= \frac{1}{2\pi(\sigma_1^2 \cdot \sigma_2^2 - 0 \cdot 0)^{1/2}} \exp\left( -\frac{1}{2} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \begin{bmatrix} \frac{1}{\sigma_1^2}(x_1 - \mu_1) \\ \frac{1}{\sigma_2^2}(x_2 - \mu_2) \end{bmatrix} \right)$$
$$= \frac{1}{2\pi\sigma_1\sigma_2} \exp\left( -\frac{1}{2}\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - \frac{1}{2}\left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 \right)$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{1}{2}\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 \right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{1}{2}\left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 \right). \qquad (6.15)$$
Equation (6.15) is the product of two independent Gaussian densities, one with mean $\mu_1$ and variance $\sigma_1^2$, the other with mean $\mu_2$ and variance $\sigma_2^2$. More generally, one can show that an n-dimensional Gaussian with mean $\mu \in \mathbb{R}^n$ and diagonal covariance matrix
$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} = \operatorname{diag}\left( \sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2 \right) \qquad (6.16)$$
is the product of n independent Gaussian random variables with means $\mu_i$ and variances $\sigma_i^2$, respectively (Murphy, 2012), (Wickens, 2002).
In this case the multivariate Gaussian density has the form
$$N(x \mid \mu, \Sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} e^{-\alpha_i (x_i - \mu_i)^2}, \qquad \alpha_i = \frac{1}{2\sigma_i^2}. \qquad (6.17)$$
The right-hand side of Equation (6.17) is evaluated by substituting the individual components $x_1, \ldots, x_n$, in turn, into the corresponding univariate pdfs and taking the product. If the variables are uncorrelated then the covariance matrix is the diagonal matrix of Equation (6.16), with the variances of the individual variables given by $\sigma_{x_i}^2 = \frac{1}{2\alpha_i}$. Thus the elements of the random vector $x = [x_1, x_2, \ldots, x_n]^T$ are independent random variables: when the variance-covariance matrix is diagonal, Equation (6.16), the density function (6.14) factors and the random variables are independent (Chatfield & Collins, 1980), (Wickens, 2002). Hence, for multivariate Gaussian variables, being uncorrelated both implies and is implied by independence (Searle, 1971), (Wickens, 2002).
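As a quick numerical check (a minimal sketch, not part of the thesis), the factorization of the diagonal-covariance density into univariate densities can be verified with off-the-shelf functions:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Diagonal-covariance bivariate Gaussian: the joint density should equal
# the product of the two univariate marginal densities (Equation (6.15)).
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 2.0])            # standard deviations
Sigma = np.diag(sigma**2)               # diagonal covariance matrix

x = np.array([0.7, -1.1])
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
product = norm(mu[0], sigma[0]).pdf(x[0]) * norm(mu[1], sigma[1]).pdf(x[1])
print(np.isclose(joint, product))       # True
```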
6.4 PARAMETER ESTIMATION FOR MULTIVARIATE GAUSSIAN DISTRIBUTIONS

Several methods are applied to estimate the parameters of multivariate Gaussian distributions. In this section the MLE is considered, due to its robustness, and the proposed PCS method is then presented.
6.4.1 Multivariate Gaussian in the MLE Framework
Theorem 6.3 (MLE for a Gaussian). Suppose a set of n i.i.d. random vectors $x_1, \ldots, x_n$ is sampled from a multivariate Gaussian distribution $x_i \sim N(\mu, \Sigma)$. Then the MLEs of the parameters are given by
$$\hat{\mu}_{mle} = \frac{1}{n} \sum_{i=1}^{n} x_i \overset{\text{def}}{=} \bar{x}, \qquad (6.18)$$
$$\hat{\Sigma}_{mle} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T - \bar{x}\bar{x}^T. \qquad (6.19)$$
That is, the MLE is just the empirical mean and empirical covariance. In the univariate case, the following familiar results are obtained:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}, \qquad (6.20)$$
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i} (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i} x_i^2 - \bar{x}^2. \qquad (6.21)$$
Proof: To prove this result, several results from matrix algebra are needed:
$$\frac{\partial}{\partial a}\left( b^T a \right) = b, \qquad \frac{\partial}{\partial a}\left( a^T A a \right) = \left( A + A^T \right) a, \qquad \frac{\partial}{\partial A} \operatorname{Trace}(BA) = B^T, \qquad \frac{\partial}{\partial A} \ln |A| = A^{-T} \overset{\text{def}}{=} \left( A^{-1} \right)^T, \qquad (6.22)$$
$$\operatorname{Trace}(ABC) = \operatorname{Trace}(CAB) = \operatorname{Trace}(BCA). \qquad (6.23)$$
Equation (6.23) is called the cyclic permutation property of the trace operator (Murphy, 2012). Using this property, the scalar inner product $x^T A x$ can be reordered as follows:
$$\operatorname{Trace}\left( x^T A x \right) = \operatorname{Trace}\left( x x^T A \right) = \operatorname{Trace}\left( A x x^T \right). \qquad (6.24)$$
Now the proof begins. The log-likelihood is
$$L(\mu, \Sigma \mid x) = \ln p(x \mid \mu, \Sigma) = -\frac{n}{2} \ln |\Sigma| - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) + \text{const}, \qquad (6.25)$$
where
$$p(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^D |\Sigma|}} \exp\left( -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right).$$
The precision matrix is given by $\Lambda = \Sigma^{-1}$. Using the substitution $y_i = x_i - \mu$ and differentiating with the chain rule,
$$\frac{\partial}{\partial \mu} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) = \frac{\partial}{\partial y_i}\left( y_i^T \Sigma^{-1} y_i \right) \frac{\partial y_i}{\partial \mu} \qquad (6.26)$$
$$= -\left( \Sigma^{-1} + \Sigma^{-T} \right) y_i. \qquad (6.27)$$
Hence
$$\frac{\partial}{\partial \mu} L(\mu, \Sigma) = -\frac{1}{2} \sum_{i=1}^{n} \left( -2\Sigma^{-1} \right) (x_i - \mu) = \Sigma^{-1} \sum_{i=1}^{n} (x_i - \mu) = 0, \qquad (6.28)\text{--}(6.29)$$
so that
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}. \qquad (6.30)\text{--}(6.32)$$
Thus the MLE of the mean µ is the empirical (sample) mean $\bar{x}$.

Using the trace of a matrix (Timm, 2002), (Gallant, 1987) and Equation (6.24), the log-likelihood can be written in terms of $\Lambda$ as
$$L(\mu, \Sigma \mid x) = -\frac{n}{2} \ln |\Sigma| - \frac{1}{2} \sum_{i=1}^{n} \operatorname{Trace}\left( \Sigma^{-1} (x_i - \mu)(x_i - \mu)^T \right) = -\frac{n}{2} \ln |\Sigma| - \frac{1}{2} \operatorname{Trace}\left( S_\mu \Sigma^{-1} \right), \qquad (6.33)$$
where
$$S_\mu \overset{\text{def}}{=} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T. \qquad (6.34)$$
The matrix $S_\mu$ is called the scatter matrix centered at µ (Murphy, 2012). Taking the derivative of Equation (6.33) with respect to $\Lambda$ yields
$$\frac{\partial L(\Lambda)}{\partial \Lambda} = \frac{n}{2} \Lambda^{-T} - \frac{1}{2} S_\mu^T = 0, \qquad (6.35)$$
$$\Lambda^{-T} = \Lambda^{-1} = \Sigma = \frac{1}{n} S_\mu, \qquad (6.36)$$
so
$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T. \qquad (6.37)$$
Equation (6.37) is the empirical covariance matrix centered on µ. Plugging in the MLE $\mu = \bar{x}$ (since both parameters must be simultaneously optimized) gives the standard equation for the MLE of a covariance matrix.
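A short numerical sketch (my illustration, not from the thesis) of Equations (6.18)-(6.19), checking that the two forms of the empirical covariance agree:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([1.0, -1.0])
Sigma_true = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5000)  # rows are x_i

n = X.shape[0]
mu_hat = X.mean(axis=0)                                      # Equation (6.18)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n                        # Equation (6.19), first form
Sigma_hat2 = X.T @ X / n - np.outer(mu_hat, mu_hat)          # Equation (6.19), second form
print(np.allclose(Sigma_hat, Sigma_hat2))                    # True
```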
6.4.2 The Principle of n-Cross Sections
Theorem 6.4. The sufficient parameters of a multivariate Gaussian distribution can be estimated as
$$1. \quad \hat{\sigma} = \hat{A}\sqrt{2\pi} \qquad \text{and} \qquad 2. \quad \hat{\mu} = \frac{1}{\hat{\sigma}(\tau - 2\xi\eta) - 1}.$$
Proof: The multivariate Gaussian distribution with two variables can be represented as
$$f(x_1, x_2, \alpha, \beta, \rho, a, b) = A e^{-\alpha(x_1 - \xi a)^2 - \beta(x_2 - \xi b)^2 - 2\rho(x_1 - \xi a)(x_2 - \xi b)}, \qquad (6.38)$$
$$f(x_1, x_2, H) = A \exp\left[ -\alpha(x_1 - \xi a)^2 - \beta(x_2 - \xi b)^2 - 2\rho(x_1 - \xi a)(x_2 - \xi b) \right], \quad \text{where } H = \{\alpha, \beta, \rho, a, b\}. \qquad (6.39)$$
The multivariate Gaussian distribution in Equation (6.38) has two variables $(x_1, x_2)$ and six unknown parameters $\{A, \alpha, \beta, \rho, a, b\}$, where $\xi$ is introduced as a small scale parameter. These unknown parameters are functions of the sufficient parameters $\sigma, \mu$. The unknown parameters H could be estimated by formulating six equations with six unknowns, but this is a tedious process; instead, the number of equations is reduced by the proposed method. Letting $x_2 = k_i x_1$, Equation (6.39) becomes the one-dimensional equation
$$f(x_1, H) = A \exp\left[ -\alpha(x_1 - \xi a)^2 - \beta(k_i x_1 - \xi b)^2 - 2\rho(x_1 - \xi a)(k_i x_1 - \xi b) \right]. \qquad (6.40)$$
Consider the argument of the exponent, $\Lambda$, in Equation (6.40):
$$\Lambda = -\alpha(x_1 - \xi a)^2 - \beta(k_i x_1 - \xi b)^2 - 2\rho(x_1 - \xi a)(k_i x_1 - \xi b). \qquad (6.41)$$
Expanding Equation (6.41) and collecting powers of $x_1$ yields
$$\Lambda = -x_1^2 \left( \alpha + \beta k_i^2 + 2\rho k_i \right) + 2 x_1 \xi \left( \alpha a + \beta b k_i + \rho(a k_i + b) \right) - \xi^2 \left( \alpha a^2 + \beta b^2 + 2\rho a b \right).$$
Let
$$\tau = \alpha + \beta k_i^2 + 2\rho k_i, \qquad \eta = \alpha a + \beta b k_i + \rho(a k_i + b), \qquad \upsilon = \alpha a^2 + \beta b^2 + 2\rho a b. \qquad (6.42)$$
Thus the argument of the exponent, $\Lambda$, in Equation (6.40) can be expressed as
$$\Lambda = -\tau x_1^2 + 2\xi\eta x_1 - \xi^2 \upsilon. \qquad (6.43)$$
The first approximation of the multivariate Gaussian, Equation (6.38), can now be written as
$$\tilde{f}(x_1, H) = \tilde{A} \exp\left( -\tau x_1^2 + 2\xi\eta x_1 \right), \quad \text{where } \tilde{A} = A \exp\left( -\xi^2 \upsilon \right). \qquad (6.44)$$
Consider the argument of the exponent in Equation (6.44),
$$\phi = -\tau x_1^2 + 2\xi\eta x_1. \qquad (6.45)$$
Equation (6.45) is a quadratic; thus the argument of the multivariate Gaussian has been transformed to a quadratic in $x_1$. By completing the square, Equation (6.45) is transformed to
$$\phi = -\tau\left( x_1^2 - \frac{2\xi\eta}{\tau} x_1 \right) = -\tau\left\{ \left( x_1 - \frac{\xi\eta}{\tau} \right)^2 - \left( \frac{\xi\eta}{\tau} \right)^2 \right\} \qquad (6.46)$$
$$= -\tau\left( x_1 - \frac{\xi\eta}{\tau} \right)^2 + \frac{(\xi\eta)^2}{\tau}. \qquad (6.47)$$
Letting $\zeta = \eta/\tau$ in Equation (6.47),
$$\phi = -\tau x_1^2 + 2\xi\eta x_1 = -\tau(x_1 - \xi\zeta)^2 + \xi^2\eta\zeta. \qquad (6.48)$$
From the reparameterization in Equations (6.47) and (6.48), the second approximation of the multivariate Gaussian, Equation (6.38), is now
$$\tilde{\tilde{f}}(x_1) = \tilde{\tilde{A}}\, e^{-\tau(x_1 - \xi\zeta)^2}, \quad \text{where } \tilde{\tilde{A}} = \tilde{A}\, e^{\xi^2\eta\zeta}. \qquad (6.49)$$
Thus the approximation of the parameters of a two-variate Gaussian distribution with six unknowns, Equation (6.38), has been reduced to an estimation problem in one variable with three parameters. To estimate these parameters three equations are needed; these equations are generated by hyperplanes that slice through the multivariate Gaussian. These hyperplanes ($P_{ij}$ for $i = 0, 1, 2$ and $j = 0, 1, \ldots, 6$) are separated at angles $\alpha_j$ with slopes $k_j$. The multivariate Gaussian is then reflected on these geometric hyperplanes as one-dimensional Gaussian distributions, which are used to approximate the parameters of the multivariate Gaussian.
Equation (6.49) is the solution of the first-order homogeneous linear differential equation
$$\frac{d\tilde{\tilde{f}}}{dx_1} + 2\tau\left( x_1 - \xi\zeta \right) \tilde{\tilde{f}} = 0. \qquad (6.50)$$
In order to estimate the parameters of Equation (6.49), the goal function
$$G = \min \sum_{i=1}^{n} \left( 2\tau\xi\zeta\, \tilde{\tilde{f}}_i - 2\tau x_i\, \tilde{\tilde{f}}_i - \frac{d\tilde{\tilde{f}}_i}{dx_i} \right)^2 \qquad (6.51)$$
is minimized to fit the multivariate Gaussian on each of the hyperplanes. The parameters obtained from the minimized goal functions are then recovered by back substitution, completing the estimation of the parameters of the multivariate Gaussian via the principle of n-cross sections. Thus the sufficient parameters of the multivariate Gaussian distribution are estimated as
$$1. \quad \hat{\sigma} = \hat{A}\sqrt{2\pi}, \qquad 2. \quad \hat{\mu} = \frac{1}{\hat{\sigma}(\tau - 2\xi\eta) - 1}.$$
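As a minimal sketch of how the minimization of Equation (6.51) can be set up on a single cross section (my own illustration with hypothetical variable names, assuming the one-dimensional profile has been sampled on a grid), the residual of Equation (6.50) is linear in the combinations $2\tau\xi\zeta$ and $2\tau$, so ordinary least squares suffices:

```python
import numpy as np

# Fit tau and zeta on one cross section from the ODE residual
# f' = 2*tau*xi*zeta*f - 2*tau*x*f  (Equations (6.50)-(6.51)).
xi = 0.1
x = np.linspace(-3, 3, 400)
tau_true, zeta_true, A_true = 5.2, 0.37, 1.02
f = A_true * np.exp(-tau_true * (x - xi * zeta_true) ** 2)

df = np.gradient(f, x)                       # numerical derivative
# Linear model df = p1*f - p2*(x*f), with p1 = 2*tau*xi*zeta and p2 = 2*tau.
M = np.column_stack([f, -x * f])
p1, p2 = np.linalg.lstsq(M, df, rcond=None)[0]
tau_hat = p2 / 2
zeta_hat = p1 / (p2 * xi)
print(tau_hat, zeta_hat)                     # close to tau_true, zeta_true
```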
6.5 NUMERICAL SIMULATIONS

In this section a numerical simulation is performed on a simulated multivariate Gaussian distribution and the parameters are estimated using the method of the principle of n-cross sections. The data generation and computation of the estimates are done using the Mathematica® CAS.
Considering the functional form of a two-variate Gaussian distribution,
$$f(x_1, x_2, \alpha, \beta, \rho, a, b) = f(x_1, x_2, H) = A e^{-\alpha(x_1 - \xi a)^2 - \beta(x_2 - \xi b)^2 - 2\rho(x_1 - \xi a)(x_2 - \xi b)}, \quad H = \{\alpha, \beta, \rho, a, b\}, \qquad (6.52)$$
data are generated for $X = x_1, \ldots, x_n$; in this data set n is set to 282, so 282 data points are considered. Equation (6.52) has the following exact values for the parameters that are going to be estimated: A = 1.03, ξ = 0.143, α = 5.203, β = 5.047, ρ = 3.0131, a = 0.1, b = 0.3, σ = 2.581 and µ = 0.092. Equation (6.52) with the given parameters is represented in Figure 6.1.
Figure 6.1: The Multivariate (2D) Gaussian Density Function for exact values A = 1.03, ξ = 0.143, α = 5.203, β = 5.047, ρ = 3.0131, a = 0.1 and b = 0.3
The estimation of the multivariate Gaussian distribution is done by generating seven cross sections (hyperplanes) that intersect the graph in Figure 6.1. The data of the multivariate Gaussian on the hyperplanes $P_{ij}$, for $i = 0, 1, 2$ and $j = 0, 1, \ldots, 6$, are generated as two-dimensional data and the estimation of the parameters is carried out. For this numerical simulation the hyperplanes in Table 6.1 were generated.

Table 6.1: Hyperplanes for the Numerical Simulation
Hyperplane $P_{ij}$ | $k_j$ | Angle $\alpha_j$
P00 | $k_0 = 0$ | $\alpha_0 = 0$
P11 | $k_1 = \sqrt{3}$ | $\alpha_1 = \pi/3$
P22 | $k_2 = -\sqrt{3}$ | $\alpha_2 = -\pi/3$
P13 | $k_3 = 1$ | $\alpha_3 = \pi/4$
P24 | $k_4 = -1$ | $\alpha_4 = -\pi/4$
P15 | $k_5 = \sqrt{2}/2$ | $\alpha_5 = 0.6154797$
P26 | $k_6 = -\sqrt{2}/2$ | $\alpha_6 = -0.6154797$
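As an illustration (my own sketch; the thesis computations used the Mathematica® CAS), the one-dimensional cross-section data on each hyperplane of Table 6.1 can be generated by substituting $x_2 = k_j x_1$ into Equation (6.52):

```python
import numpy as np

def f2d(x1, x2, A=1.03, xi=0.143, al=5.203, be=5.047, rho=3.0131, a=0.1, b=0.3):
    """Two-variate Gaussian of Equation (6.52) with the exact simulation values."""
    u, v = x1 - xi * a, x2 - xi * b
    return A * np.exp(-al * u**2 - be * v**2 - 2 * rho * u * v)

k = [0, np.sqrt(3), -np.sqrt(3), 1, -1, np.sqrt(2)/2, -np.sqrt(2)/2]  # Table 6.1
x1 = np.linspace(-1.5, 1.5, 282)
sections = [f2d(x1, kj * x1) for kj in k]   # 1-D Gaussian profiles on the hyperplanes
```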
The hyperplanes generated in Table 6.1 reduce the multivariate Gaussian to a one-dimensional Gaussian that is used to estimate the parameters of the second approximation, Equation (6.49). The unknown parameters $\tilde{\tilde{A}}$, τ and ζ are approximated by minimizing the goal function, Equation (6.51). The following results are obtained for the estimation of the second approximation using the generated hyperplanes $P_{ij}$, assuming ξ ≪ 1, thus ξ = 1 × 10⁻¹.
Table 6.2: Minimization of the Goal Function for the Second Approximation using the Hyperplanes $P_{ij}$

Hyperplane | $\tilde{\tilde{A}}$ | τ | ζ | η
P00 | 1.022595555 | 5.202012751 | 0.037119404 | 0.193095613
P11 | 1.018863171 | 30.75061713 | 0.019198013 | 0.59035074
P22 | 1.02193435 | 9.903096606 | |−0.026856134| | |−0.265958891|
P13 | 1.024775658 | 16.26751179 | 0.0264374 | 0.430070721
P24 | 1.016342564 | 4.222994171 | |−0.015267556| | |−0.064474802|
P15 | 1.025458907 | 11.98295359 | 0.030277166 | 0.362809873
P26 | 1.015285857 | 3.464224689 | 0.003783799 | 0.013107929
Since τ, η and υ are given by Equation (6.42), a system of equations for estimating the unknown parameters $\tilde{\tilde{A}}$, τ, ζ and η can be generated from Table 6.2. Table 6.3 presents the estimated unknown parameters in the functional form of the multivariate Gaussian distribution obtained using the principle of n-cross sections. Using appropriate algebraic manipulations, the required sufficient parameters σ and µ are then computed as
$$\hat{\sigma} = \hat{A}\sqrt{2\pi}, \qquad (6.53)$$
$$\hat{\mu} = \frac{1}{\hat{\sigma}(\tau - 2\xi\eta) - 1}. \qquad (6.54)$$
Table 6.3: Comparison of the Exact Solutions and Parameter Estimates using the Principle of n-Cross Sections

Parameter | Exact | Approximated
A | 1.03 | 1.023
ξ | 0.143 | 0.100
α | 5.203 | 5.4256
β | 5.047 | 5.3548
ρ | 3.0131 | 2.74356
a | 0.10 | 0.151
b | 0.30 | 0.437
σ | 3.00 | 2.582
µ | 0.100 | 0.0918
Figure 6.2: The Multivariate (2D) Gaussian Density Function for approximated values A = 1.023, ξ = 0.100, α = 5.4256, β = 5.3548, ρ = 2.74356, a = 0.151 and b = 0.437

Figure 6.2 represents the approximated multivariate Gaussian distribution of Equation (6.52), to be compared with the exact density represented in Figure 6.1.
6.6 CONCLUSION

The approximation of the multivariate Gaussian distribution is done using the proposed approach, the principle of n-cross sections. The developed method estimates the functions of the sufficient parameters of the multivariate Gaussian distribution very close to the exact values (see Table 6.3). Figures 6.3 and 6.4 compare the exact and approximated multivariate Gaussian distributions side by side.
Figure 6.3: The Multivariate Gaussian Density Function for Exact Values
Figure 6.4: The Multivariate Gaussian Density Function for Approximated Values
Thus the proposed method approximates the multivariate Gaussian very well (see Figures 6.3 and 6.4). The principle of n-cross sections can therefore be used to approximate the multivariate Gaussian distribution. Care should be taken when computing the values of the parameter functions from the system of equations, as absolute values are considered. The method can also be implemented in software and automated to estimate multivariate Gaussian models.
CHAPTER 7. PARAMETER ESTIMATION FOR A SPECIALIZED GAUSSIAN DISTRIBUTION
7.1 OVERVIEW

In this chapter, novel algorithms for the estimation of the sufficient parameters of a specialized Gaussian distribution are presented. Traditional methods (the maximum likelihood estimator (MLE) and the method of moments estimator (MME)) for the estimation of parameters require initial guess values (IGVs) and a definition of the parameter space (McNicol, 2005), which may result in high computation time (Hastie & Tibshirani, 1996) and (Figueiredo et al., 2002). The specialized Gaussian distribution is a mixture of two Gaussian pdfs with unequal amplitudes and equal variances. The proposed algorithms approach the estimation of the parameters mathematically, using the multiple goal function framework, Taylor series decomposition and the least-squares methodology. Numerical simulations are performed to evaluate the performance of the proposed algorithms: specialization of the Gaussian 1 (STG1), using the multiple goal function framework for α ≠ β; specialization of the Gaussian 2 (STG2), using the analytic framework for α = β; and specialization of the Gaussian 3 (STG3), using a computer algebra system for α = β. It was established that the STG1 algorithm at order two gives good estimates for the specialized Gaussian distribution. The STG2 algorithm yields exact approximations for the specialized Gaussian distribution, but at the cost of computational time.
7.2 INTRODUCTION

In the field of astrophysics it is important to distinguish between one star and two stars in a cluster (Fabian, 1992); in some clusters what appears to be a single star may be a collection of two or more stars. Thus the estimation of the specialized Gaussian is essential in astrophysics. Parameter estimation is also used in helioseismology (Palle & Esteban, 2013) and asteroseismology (Gizon et al., 2008) for the estimation of stellar oscillation parameters, where stochastic solar-like oscillations and deterministic sinusoidal oscillations are considered (Gizon et al., 2008); in X-ray astronomy (Wachter et al., 1978); in signal-detection models (Wickens, 2002) and (McNicol, 2005); and in determining cosmological parameters from observations of the anisotropies in the cosmic microwave background (Christensen et al., 2001). The point of parameter estimation is that merely knowing a signal is present (a cluster of stars) does not provide the pertinent information needed for doing physics or astrophysics (isolating the stars) (Palle & Esteban, 2013). Methods used for parameter estimation include maximum likelihood (Palle & Esteban, 2013), the Bayesian approach (Palle & Esteban, 2013) and least-squares estimation (Kaastra, 2008). In this chapter three algorithms are presented and tested using numerical simulations to estimate the parameters of the specialized Gaussian distributions. The chapter is organized as follows. In Section 7.3, the motivation for the proposed algorithms is described. In Section 7.4, the first algorithm (STG1, for α ≠ β) is presented and tested on a numerical simulation, and a comparison of the proposed algorithm with the existing maximum likelihood estimator (MLE) and method of moments estimator (MME) is conducted. In Section 7.5, the second algorithm (STG2, for α = β) is presented. In Section 7.6, the third algorithm (STG3, for α = β) is presented and tested on a numerical simulation. In Section 7.7, the results of the performance of the three algorithms are presented.
7.3 MOTIVATION

Parameter estimation in astrophysics is a fundamental requirement for understanding the physics of space. To estimate the parameters of different models, there is a need to improve the speed at which the algorithms compute the necessary parameters while keeping the methods tractable. Traditional parameter estimation methods require initialization; the proposed methods require no initialization and may be used to supply initial values for the traditional methods.

7.3.1 Specialization for the Mixture of the Gaussian Distributions
The specialization of the mixture of the Gaussian distributions is given by
$$f(r) = A e^{-\alpha(r - \varepsilon a)^2} + B e^{-\beta(r - \varepsilon b)^2}, \qquad (7.1)$$
where $r, \varepsilon, \alpha, \beta, a, b \in \mathbb{R}$.

Two cases of the specialization of the Gaussian mixture, Equation (7.1), are considered: Section 7.4 treats α ≠ β and Section 7.5 treats α = β. In the case α ≠ β, Equation (7.1) is a standard mixture of two Gaussian distributions, while in the case α = β Equation (7.1) reduces to the specialized Gaussian distribution
$$f(r) = A e^{-\alpha(r - \varepsilon a)^2} + B e^{-\alpha(r - \varepsilon b)^2}, \qquad (7.2)$$
where $r, \varepsilon, \alpha, a, b \in \mathbb{R}$. In practice Equation (7.2) has different amplitudes $A_i$, which are independent of α, but equal variances.
Considering the general form of the Gaussian distribution model (Mkolesia et al., 2016b),
$$f(r) = \sum_{i}^{n} A_i\, e^{-\alpha(r - \varepsilon a_i)^2}, \qquad (7.3)$$
where
$$A_i = \frac{1}{\sqrt{2\pi\sigma_i^2}}, \qquad \alpha = \frac{1}{2\sigma_i^2}, \qquad \mu_i = \varepsilon a_i.$$
From Equation (7.3), the specialized mixture of two Gaussians, Equation (7.2), has
$$A_1 = A, \quad A_2 = B, \qquad (7.4)$$
$$a_1 = a, \quad a_2 = b. \qquad (7.5)$$
From Equations (7.2) and (7.3), the moments, which are the sufficient parameters (Wickens, 2002) of the specialized Gaussian distribution, can be computed using the series approximation with the ordinary least-squares method as
$$\alpha_i = \frac{1}{2\sigma_i^2} \;\Rightarrow\; \sigma_i = \pm\sqrt{\frac{1}{2\alpha_i}} \quad \text{(biased)}, \qquad (7.6)$$
$$\mu_i = \varepsilon a_i \quad \text{(unbiased)}, \qquad (7.7)$$
where i = 1, 2.

Thus the sufficient parameters of the specialized Gaussian mixture, Equation (7.2), are: $\sigma_i^2$, the scale parameter; $\mu_i$, the location parameter; and r, the support for $r \sim N(\mu, \sigma^2)$. The expectation of the random variable is given by $\mu = E(r)$ and the expected squared deviation from the mean by $\sigma^2 = E(r^2) - E(r)^2$. The support of the Gaussian pdf is the subset of the domain containing the elements which are not mapped to zero; that is, the support consists of all values at which the Gaussian pdf is non-zero:
$$f : X \longrightarrow \mathbb{R}, \qquad \operatorname{Supp}(f) = \{ r \in X \mid f(r) \neq 0 \}. \qquad (7.8)$$

7.3.2 Methodology for the Specialization of the Gaussian Distributions
Algorithms for estimating the sufficient parameters of the specialized Gaussian distributions are proposed in Sections 7.4, 7.5 and 7.6, employing different approaches. The specialized Gaussian is given by Equation (7.2). Equation (7.2) is split into even and odd parts so that series approximations and the ordinary least-squares method can be used to determine the sufficient parameters, as follows. The even part of the mixture of two Gaussian distributions is
$$f_e(r) = \frac{1}{2}\left( f(r) + f(-r) \right) = \frac{1}{2} A e^{-\alpha(r-\varepsilon a)^2} + \frac{1}{2} B e^{-\alpha(r-\varepsilon b)^2} + \frac{1}{2} A e^{-\alpha(-r-\varepsilon a)^2} + \frac{1}{2} B e^{-\alpha(-r-\varepsilon b)^2}; \qquad (7.9)$$
the odd part of the mixture of two Gaussian distributions is
$$f_o(r) = \frac{1}{2}\left( f(r) - f(-r) \right) = \frac{1}{2} A e^{-\alpha(r-\varepsilon a)^2} + \frac{1}{2} B e^{-\alpha(r-\varepsilon b)^2} - \frac{1}{2} A e^{-\alpha(-r-\varepsilon a)^2} - \frac{1}{2} B e^{-\alpha(-r-\varepsilon b)^2}. \qquad (7.10)$$
The first series approximation of Equation (7.9) at ε = 0 is
$$\tilde{f}_e = \left[ 2\alpha^2 \left( A a^2 + B b^2 \right)\varepsilon^2 r^2 - \alpha \left( A a^2 + B b^2 \right)\varepsilon^2 + A + B \right] e^{-\alpha r^2} + O\left(\varepsilon^3 r^3\right) + O\left(\varepsilon^3\right). \qquad (7.11)$$
Equation (7.10) has the first series approximation at ε = 0
$$\tilde{f}_o = 2\varepsilon r \left[ \frac{2}{3}\alpha^3 \left( A a^3 + B b^3 \right)\varepsilon^2 r^2 - \alpha^2 \left( A a^3 + B b^3 \right)\varepsilon^2 + \alpha \left( A a + B b \right) \right] e^{-\alpha r^2} + O\left(\varepsilon^4 r^4\right) + O\left(\varepsilon^4\right). \qquad (7.12)$$
Simplifying Equation (7.12) yields
$$\frac{\tilde{f}_o}{2\varepsilon r} = \left[ \frac{2}{3}\alpha^3 \left( A a^3 + B b^3 \right)\varepsilon^2 r^2 - \alpha^2 \left( A a^3 + B b^3 \right)\varepsilon^2 + \alpha \left( A a + B b \right) \right] e^{-\alpha r^2} + O\left(\varepsilon^3 r^3\right) + O\left(\varepsilon^3\right). \qquad (7.13)$$
The odd part, Equation (7.13), can then be handled in the same way as the even part, Equation (7.11) (Kikawa, 2013). Equations (7.11) and (7.13) have errors of order three in ε, O(ε³), while Equation (7.12) has an error of order four in ε, O(ε⁴).
Theorem 7.1. Consider the specialized Gaussian distribution $f(r) = A e^{-\alpha(r-\varepsilon a)^2} + B e^{-\beta(r-\varepsilon b)^2}$, where $r, \varepsilon, a, b \in \mathbb{R}$ and α ≠ β. The sufficient parameters $\sigma_i$ and $\mu_i$ for i = 1, 2, contained in A, α, a, B, β, b, can be estimated using a method of multiple goal functions:
$$G_1 = \frac{1}{2} \sum_{i=0}^{N} \left( c_1 I_{1,i} + c_2 I_{2,i} + c_3 \Delta\rho_i - \Delta U_i \right)^2 \longrightarrow \min,$$
$$G_2 = \frac{1}{2} \sum_{i=0}^{N} \left( \hat{A} e^{-\hat{\alpha}\rho_i} + \hat{B} e^{-\hat{\beta}\rho_i} - U_i \right)^2 \longrightarrow \min,$$
$$G_3 = \frac{1}{2} \sum_{i=0}^{N} \left( w_1 e^{-\hat{\alpha}\rho_i} + w_2 e^{-\hat{\beta}\rho_i} - U_i \right)^2 \longrightarrow \min,$$
$$G_4 = \frac{1}{2} \sum_{i=0}^{N} \left( \hat{A} e^{-\hat{\alpha}(r_i - \varepsilon\hat{a})^2} + \hat{B} e^{-\hat{\beta}(r_i - \varepsilon\hat{b})^2} - U_i \right)^2 \longrightarrow \min,$$
where $U_i = U_i(\rho) = f_i(r^2)$ and $\rho = r^2$. The sufficient parameters are then estimated as
$$\sigma_1 = \pm\sqrt{\frac{1}{2\hat{\alpha}}}, \qquad \sigma_2 = \pm\sqrt{\frac{1}{2\hat{\beta}}}, \qquad \mu_1 = \varepsilon\hat{a}, \qquad \mu_2 = \varepsilon\hat{b},$$
where $\hat{A}$ and $\hat{B}$ are independent of $\hat{\alpha}$ and $\hat{\beta}$; $\sigma_1$ and $\sigma_2$ are biased, $\mu_1$ and $\mu_2$ are unbiased.

Note 1: The specialized Gaussian is considered as a specialization of the Gaussian for the case of two mixed Gaussians.
Note 2: In the formulated algorithm, Theorem (7.1), the terms used are proportional to errors of order one, O(1), in $\tilde{f}_e$ and of order ε, O(ε), in $\tilde{f}_o$, Equations (7.11) and (7.13).
Proof: The proof is given in Section 7.4.

7.4 ALGORITHM 1: SPECIALIZATION OF THE GAUSSIAN 1 (STG1) IN THE MULTIPLE GOAL FUNCTIONS (MGF) FRAMEWORK
The parameter estimation for the specialized Gaussian (α ≠ β) is considered. The specialized Gaussian is given by
$$f(r) = A e^{-\alpha(r - \varepsilon a)^2} + B e^{-\beta(r - \varepsilon b)^2}, \qquad (7.14)$$
where $r, \varepsilon, \alpha, \beta, a, b \in \mathbb{R}$ and α ≠ β. Equation (7.14) has even and odd parts given by Equations (7.9) and (7.10), respectively. Expanding Equations (7.9) and (7.10) with respect to ε in the first approximation,
$$f_e(r) \cong A e^{-\alpha r^2} + B e^{-\beta r^2}, \qquad (7.15)$$
$$f_o(r) \cong \alpha a A e^{-\alpha r^2} + \beta b B e^{-\beta r^2}. \qquad (7.16)$$
Equations (7.15) and (7.16) are general solutions of the linear constant-coefficient ordinary differential equation
$$\frac{d^2 U}{d\rho^2} + c_1 \frac{dU}{d\rho} + c_2 U = 0, \qquad (7.17)$$
where
$$c_1 = \alpha + \beta, \qquad c_2 = \alpha\beta, \qquad (7.18)$$
$c_1$ and $c_2$ are unknown constants, and $U = U(\rho) = f_e(r^2)$ with $\rho = r^2$.
Taking the first integral ($1^{st}\!\int$) of Equation (7.17) gives
$$U' - U_0' + c_1 (U - U_0) + c_2 I_1 = 0, \qquad (7.19)$$
where
$$U' = \frac{dU}{d\rho}, \qquad U_0' = \left. \frac{dU}{d\rho} \right|_{\rho = \rho_0}, \qquad U_0 = U(\rho_0), \qquad I_1 = I_1(\rho) = \int_{\rho_0}^{\rho} U(\sigma)\, d\sigma.$$
Introducing a new unknown constant $c_3$ into Equation (7.19) avoids working with the integro-differential form; thus Equation (7.19) can also be written as
$$U' + c_1 U + c_2 I_1 + c_3 = 0, \qquad (7.20)$$
where
$$c_3 = -\left( U_0' + c_1 U_0 \right). \qquad (7.21)$$
Taking the second integral ($2^{nd}\!\int$) of Equation (7.20),
$$(U - U_0) + c_1 I_1 + c_2 I_2 + c_3 \Delta\rho = 0, \qquad (7.22)$$
where
$$I_2 = \int_{\rho_0}^{\rho} I_1(\sigma)\, d\sigma, \qquad \Delta\rho = \rho - \rho_0, \qquad \text{and} \quad -\Delta U = U - U_0.$$
7.4.1 First Goal Function (G1)

To compute the unknown constants $c_1$, $c_2$ and $c_3$, the first goal function is constructed:
$$G_1 = G_1(c_1, c_2, c_3) = \frac{1}{2} \sum_{i=0}^{N} \left( c_1 I_{1,i} + c_2 I_{2,i} + c_3 \Delta\rho_i - \Delta U_i \right)^2 \longrightarrow \min. \qquad (7.23)$$
Minimizing this goal function, Equation (7.23), yields the unknown constants. Setting the derivatives of Equation (7.23) with respect to $c_1$, $c_2$ and $c_3$ to zero,
$$\frac{\partial G_1}{\partial c_1} = c_1 \sum_{i=0}^{N} I_{1,i}^2 + c_2 \sum_{i=0}^{N} I_{1,i} I_{2,i} + c_3 \sum_{i=0}^{N} I_{1,i} \Delta\rho_i - \sum_{i=0}^{N} I_{1,i} \Delta U_i = 0, \qquad (7.24)$$
$$\frac{\partial G_1}{\partial c_2} = c_1 \sum_{i=0}^{N} I_{1,i} I_{2,i} + c_2 \sum_{i=0}^{N} I_{2,i}^2 + c_3 \sum_{i=0}^{N} I_{2,i} \Delta\rho_i - \sum_{i=0}^{N} I_{2,i} \Delta U_i = 0, \qquad (7.25)$$
$$\frac{\partial G_1}{\partial c_3} = c_1 \sum_{i=0}^{N} I_{1,i} \Delta\rho_i + c_2 \sum_{i=0}^{N} I_{2,i} \Delta\rho_i + c_3 \sum_{i=0}^{N} (\Delta\rho_i)^2 - \sum_{i=0}^{N} \Delta\rho_i \Delta U_i = 0. \qquad (7.26)$$
From Equations (7.24), (7.25) and (7.26) the unknown constants $c_1$, $c_2$, $c_3$ are obtained. The parameters α and β can then be estimated from Equation (7.18):
$$\beta = c_1 - \alpha \;\rightarrow\; \alpha(c_1 - \alpha) = c_2 \;\rightarrow\; \alpha^2 - c_1\alpha + c_2 = 0. \qquad (7.27)$$
Thus from Equation (7.27) the estimates of the parameters α and β are computed as
$$\hat{\alpha} = \frac{c_1 - \sqrt{c_1^2 - 4c_2}}{2}, \qquad (7.28)$$
$$\hat{\beta} = \frac{c_1 + \sqrt{c_1^2 - 4c_2}}{2}. \qquad (7.29)$$
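A compact numerical sketch (my own, with hypothetical variable names) of solving the normal equations (7.24)-(7.26) and recovering $\hat{\alpha}$ and $\hat{\beta}$ via Equations (7.28)-(7.29), assuming the arrays I1, I2, drho and dU have already been computed from the data:

```python
import numpy as np

def estimate_alpha_beta(I1, I2, drho, dU):
    """Solve the 3x3 normal equations (7.24)-(7.26) for c1, c2, c3,
    then return the roots of alpha^2 - c1*alpha + c2 = 0 (Equation (7.27))."""
    M = np.column_stack([I1, I2, drho])      # design matrix of G1
    c1, c2, c3 = np.linalg.lstsq(M, dU, rcond=None)[0]
    disc = np.sqrt(c1**2 - 4 * c2)
    alpha_hat = (c1 - disc) / 2              # Equation (7.28)
    beta_hat = (c1 + disc) / 2               # Equation (7.29)
    return alpha_hat, beta_hat, (c1, c2, c3)
```

Solving the least-squares problem on the design matrix is equivalent to solving the normal equations (7.24)-(7.26) directly, but is numerically more stable.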
7.4.2 Second Goal Function (G2)

The second goal function, in A and B, is generated using the estimates obtained in Equations (7.28) and (7.29):
$$G_2 = G_2(A, B) = \frac{1}{2} \sum_{i=0}^{N} \left( A e^{-\hat{\alpha}\rho_i} + B e^{-\hat{\beta}\rho_i} - U_i \right)^2 \longrightarrow \min. \qquad (7.30)$$
Minimizing Equation (7.30) yields the unknown constants A and B. Setting the derivatives of Equation (7.30) with respect to A and B to zero,
$$\frac{\partial G_2}{\partial A} = A \sum_{i=0}^{N} e^{-2\hat{\alpha}\rho_i} + B \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} - \sum_{i=0}^{N} e^{-\hat{\alpha}\rho_i} U_i = 0, \qquad (7.31)$$
$$\frac{\partial G_2}{\partial B} = A \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} + B \sum_{i=0}^{N} e^{-2\hat{\beta}\rho_i} - \sum_{i=0}^{N} e^{-\hat{\beta}\rho_i} U_i = 0. \qquad (7.32)$$
Solving Equations (7.31) and (7.32) gives the estimates
$$A = \hat{A}, \qquad B = \hat{B}. \qquad (7.33)$$
7.4.3 Third Goal Function (G3)

The third goal function, in a and b, is generated using the estimates obtained in Equations (7.28), (7.29) and (7.33):
$$G_3 = G_3(w_1, w_2) = \frac{1}{2} \sum_{i=0}^{N} \left( w_1 e^{-\hat{\alpha}\rho_i} + w_2 e^{-\hat{\beta}\rho_i} - U_i \right)^2 \longrightarrow \min, \qquad (7.34)$$
where
$$w_1 = 2\hat{\alpha}\hat{a}\hat{A} \;\rightarrow\; a = \hat{a} = \frac{w_1}{2\hat{\alpha}\hat{A}}, \qquad (7.35)$$
$$w_2 = 2\hat{\beta}\hat{b}\hat{B} \;\rightarrow\; b = \hat{b} = \frac{w_2}{2\hat{\beta}\hat{B}}. \qquad (7.36)$$
Minimizing the third goal function, Equation (7.34), yields the unknown constants. Setting the derivatives of Equation (7.34) with respect to $w_1$ and $w_2$ to zero,
$$\frac{\partial G_3}{\partial w_1} = w_1 \sum_{i=0}^{N} e^{-2\hat{\alpha}\rho_i} + w_2 \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} - \sum_{i=0}^{N} e^{-\hat{\alpha}\rho_i} U_i = 0, \qquad (7.37)$$
$$\frac{\partial G_3}{\partial w_2} = w_1 \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} + w_2 \sum_{i=0}^{N} e^{-2\hat{\beta}\rho_i} - \sum_{i=0}^{N} e^{-\hat{\beta}\rho_i} U_i = 0. \qquad (7.38)$$
Solving Equations (7.37) and (7.38) gives the estimates of a and b via Equations (7.35) and (7.36).
7.4.4 Fourth Goal Function (G4)

The fourth goal function minimizes the fit of the function f(r), Equation (7.14), using the estimates obtained in Equations (7.28), (7.29), (7.33), (7.35) and (7.36):
$$G_4 = G_4\left( \hat{A}, \hat{B}, \hat{\alpha}, \hat{\beta}, \hat{a}, \hat{b} \right) = \frac{1}{2} \sum_{i=0}^{N} \left( \hat{A} e^{-\hat{\alpha}(r_i - \varepsilon\hat{a})^2} + \hat{B} e^{-\hat{\beta}(r_i - \varepsilon\hat{b})^2} - U_i \right)^2 \longrightarrow \min. \qquad (7.39)$$
Thus the estimates of the specialized Gaussian model are computed using the multiple goal functions, Equations (7.23), (7.30), (7.34) and (7.39). The least-squares method is used on the fourth goal function, with the estimates from the first three goal functions serving in place of initial guess values; unlike the MLE and MME methods, no initialization or definition of the parameter space is required. The parameters are estimated from the series expansion of the specialized Gaussian mixture by generating multiple goal functions. The algorithm has seven steps that are executed in order to estimate the parameters.

Algorithm 1: Specialization of the Gaussian 1 (STG1), Multiple Goal Functions (MGF) Framework
• STEP 1: Determine the even and odd parts of the function ($f_e(r)$ and $f_o(r)$), which are solutions of a linear ordinary differential equation with constant coefficients.
• STEP 2: Compute the first numerical integral ($1^{st}\!\int$).
• STEP 3: Compute the second numerical integral ($2^{nd}\!\int$).
• STEP 4: Construct and evaluate the first goal function ($G_1$) in order to approximate $\alpha = \hat{\alpha}$ and $\beta = \hat{\beta}$.
• STEP 5: Construct and compute the second goal function ($G_2$) to approximate $A = \hat{A}$ and $B = \hat{B}$.
• STEP 6: Construct and calculate the third goal function ($G_3$) to approximate $a = \hat{a}$ and $b = \hat{b}$.
• STEP 7: Construct the fourth goal function ($G_4$) and estimate the parameters of the specialized Gaussian model using the approximated values $\hat{A}$, $\hat{\alpha}$, $\hat{a}$, $\hat{B}$, $\hat{\beta}$ and $\hat{b}$.

The sufficient parameters are then estimated as
$$\sigma_1 = \pm\sqrt{\frac{1}{2\hat{\alpha}}}, \qquad \sigma_2 = \pm\sqrt{\frac{1}{2\hat{\beta}}}, \qquad \mu_1 = \varepsilon\hat{a}, \qquad \mu_2 = \varepsilon\hat{b}.$$
7.4.5 Numerical Simulation for Algorithm 1: Specialization of the Gaussian 1 (STG1), Multiple Goal Functions (MGF) Framework
In this section a numerical simulation is used to check the performance of the proposed STG1 algorithm. Consider the specialized mixture of two Gaussians given by
$$f(r) = A e^{-\alpha(r - \varepsilon a)^2} + B e^{-\beta(r - \varepsilon b)^2}, \qquad (7.40)$$
where ε = 1, A = 1, α = 0.8, a = −0.3, B = 1.6, β = 1.7, b = 0.2. Figure 7.1 represents the original function, Equation (7.40).
Figure 7.1: Two Gaussian density mixture functions for A = 1, α = 0.8, a = −0.3, B = 1.6, β = 1.7, b = 0.2
STEP 1: Even and Odd Parts of the Function
The first step is to obtain the even and odd parts of the Gaussian distributions, given by
$$f_e(r) = \frac{1}{2}\left( f(r) + f(-r) \right), \qquad (7.41)$$
$$f_o(r) = \frac{1}{2}\left( f(r) - f(-r) \right). \qquad (7.42)$$
The odd and even parts of the density function are represented in Figure 7.2.
Figure 7.2: Specialization of the Gaussian Density Function, Odd and Even Parts; A = 1, α = 0.8, a = −0.3, B = 1.6, β = 1.7, b = 0.2
The first derivative of Equation (7.40) is
$$\frac{df(r)}{dr} = 2A\alpha(a - r)\, e^{-\alpha(a-r)^2} + 2B\beta(b - r)\, e^{-\beta(b-r)^2}. \qquad (7.43)$$
The second derivative of Equation (7.40) is
$$\frac{d^2 f}{dr^2} = 2A\alpha\, e^{-\alpha(a-r)^2}\left[ 2\alpha(a-r)^2 - 1 \right] + 2B\beta\, e^{-\beta(b-r)^2}\left[ 2\beta(b-r)^2 - 1 \right]. \qquad (7.44)$$
STEP 2: First Numerical Integral ($1^{st}\!\int$)
By interpolating the data generated by the odd part of the function with cubic splines, the numerical integral of Equation (7.19) can now be computed:
$$I_1 = \int_{\rho_0}^{\rho} U(\sigma)\, d\sigma. \qquad (7.45)$$

STEP 3: Second Numerical Integral ($2^{nd}\!\int$)
Now integrating Equation (7.45), as in Equation (7.22),
$$I_2 = \int_{\rho_0}^{\rho} I_1(\sigma)\, d\sigma. \qquad (7.46)$$
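A sketch (mine, not the thesis code) of Equations (7.45)-(7.46) using cubic-spline interpolation in SciPy, on a toy profile of the form of Equation (7.15):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sample U(rho) on a grid, then build I1 and I2 (Equations (7.45)-(7.46))
# by integrating the cubic-spline interpolant cumulatively from rho_0.
rho = np.linspace(0.0, 9.0, 300)
U = np.exp(-0.8 * rho) + 1.6 * np.exp(-1.7 * rho)   # toy f_e(r^2)

F = CubicSpline(rho, U).antiderivative()
I1 = F(rho) - F(rho[0])                             # Equation (7.45)

G = CubicSpline(rho, I1).antiderivative()
I2 = G(rho) - G(rho[0])                             # Equation (7.46)
```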
STEP 4: First Goal Function: Approximate $\alpha = \hat{\alpha}$ and $\beta = \hat{\beta}$
From Equations (7.45) and (7.46) the first goal function is generated, and the estimates $\hat{\alpha}$ and $\hat{\beta}$ are computed from the linear system
$$\begin{bmatrix} \sum_{i=0}^{N} I_{1,i}^2 & \sum_{i=0}^{N} I_{1,i} I_{2,i} & \sum_{i=0}^{N} I_{1,i}\Delta\rho_i \\ \sum_{i=0}^{N} I_{1,i} I_{2,i} & \sum_{i=0}^{N} I_{2,i}^2 & \sum_{i=0}^{N} I_{2,i}\Delta\rho_i \\ \sum_{i=0}^{N} I_{1,i}\Delta\rho_i & \sum_{i=0}^{N} I_{2,i}\Delta\rho_i & \sum_{i=0}^{N} (\Delta\rho_i)^2 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^{N} I_{1,i}\Delta U_i \\ \sum_{i=0}^{N} I_{2,i}\Delta U_i \\ \sum_{i=0}^{N} \Delta\rho_i \Delta U_i \end{bmatrix}. \qquad (7.47)$$
Numerically solving Equation (7.47) yields
$$\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 2.122471236 \\ 0.9848751487 \\ -2.3111534166 \end{bmatrix}. \qquad (7.48)$$
The estimates of the parameters α and β are then computed from Equations (7.28) and (7.29):
$$\hat{\alpha} = 0.8 \quad \text{and} \quad \hat{\beta} = 1.7. \qquad (7.49)$$

STEP 5: Second Goal Function: Approximate $A = \hat{A}$ and $B = \hat{B}$
Using the estimated parameters from Equation (7.49), the second goal function gives the linear system
$$\begin{bmatrix} \sum_{i=0}^{N} e^{-2\hat{\alpha}\rho_i} & \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} \\ \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} & \sum_{i=0}^{N} e^{-2\hat{\beta}\rho_i} \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^{N} e^{-\hat{\alpha}\rho_i} U_i \\ \sum_{i=0}^{N} e^{-\hat{\beta}\rho_i} U_i \end{bmatrix}. \qquad (7.50)$$
Numerically solving Equation (7.50) yields
$$\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} 0.8629271279 \\ 1.56261340396 \end{bmatrix}, \qquad (7.51)$$
so from Equation (7.50) the parameters $A = \hat{A}$ and $B = \hat{B}$ are estimated as
$$\hat{A} = 0.8629271279 \quad \text{and} \quad \hat{B} = 1.56261340396. \qquad (7.52)$$
STEP 6: Third Goal Function: Approximate $a = \hat{a}$ and $b = \hat{b}$
Using the estimated parameters from Equations (7.49) and (7.52), the third goal function gives the linear system
$$\begin{bmatrix} \sum_{i=0}^{N} e^{-2\hat{\alpha}\rho_i} & \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} \\ \sum_{i=0}^{N} e^{-(\hat{\alpha}+\hat{\beta})\rho_i} & \sum_{i=0}^{N} e^{-2\hat{\beta}\rho_i} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^{N} e^{-\hat{\alpha}\rho_i} U_i \\ \sum_{i=0}^{N} e^{-\hat{\beta}\rho_i} U_i \end{bmatrix}. \qquad (7.53)$$
Numerically solving Equation (7.53) yields
$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} -0.5043494082 \\ 1.05995333 \end{bmatrix}. \qquad (7.54)$$
Solving Equation (7.54) for the parameters via Equations (7.35) and (7.36),
$$\hat{a} = -0.3652709978 \quad \text{and} \quad \hat{b} = 0.199506147. \qquad (7.55)$$
STEP 7: Fourth Goal Function: Estimate the Parameters of the Function using the Approximated Values $\hat{A}$, $\hat{\alpha}$, $\hat{a}$, $\hat{B}$, $\hat{\beta}$ and $\hat{b}$
Using the approximated parameters obtained in Equations (7.48), (7.51) and (7.54), the fourth goal function is constructed in order to estimate the parameters of the original specialized Gaussian model, Equation (7.39). The goal function has the form
$$G_4 = \frac{1}{2} \sum_{i=0}^{N} \left( \hat{A} e^{-\hat{\alpha}(r_i - \hat{a})^2} + \hat{B} e^{-\hat{\beta}(r_i - \hat{b})^2} - U_i \right)^2. \qquad (7.56)$$
Minimizing the goal function, Equation (7.56), $G_4 \rightarrow 0$, with the first estimates from the STG1 algorithm,
$$\left( \hat{A}, \hat{\alpha}, \hat{a}, \hat{B}, \hat{\beta}, \hat{b} \right) = \left( 0.8629271279,\ 0.6852756624,\ -0.3652709978,\ 1.5926134039,\ 1.4371955736,\ 0.199506147 \right), \qquad (7.57)$$
the following estimates are obtained:
$$\left( \hat{\hat{A}}, \hat{\hat{\alpha}}, \hat{\hat{a}}, \hat{\hat{B}}, \hat{\hat{\beta}}, \hat{\hat{b}} \right) = \left( 0.9999999713,\ 0.7999999766,\ -0.3000000613,\ 1.6000000738,\ 1.7000000805,\ 0.2000000204 \right). \qquad (7.58)$$
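A sketch in Python (my own; the thesis computations were done in a CAS) of the STEP 7 refinement, starting from the first STG1 estimates of Equation (7.57); with ε = 1 the model of Equation (7.56) is fitted directly:

```python
import numpy as np
from scipy.optimize import minimize

def g4(theta, r, U):
    """Fourth goal function, Equation (7.56), with epsilon = 1."""
    A, alpha, a, B, beta, b = theta
    model = A * np.exp(-alpha * (r - a)**2) + B * np.exp(-beta * (r - b)**2)
    return 0.5 * np.sum((model - U)**2)

r = np.linspace(-3, 3, 500)
U = 1.0 * np.exp(-0.8 * (r + 0.3)**2) + 1.6 * np.exp(-1.7 * (r - 0.2)**2)
theta0 = [0.8629271279, 0.6852756624, -0.3652709978,   # Equation (7.57)
          1.5926134039, 1.4371955736, 0.199506147]
res = minimize(g4, theta0, args=(r, U), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})
print(res.x)   # close to [1, 0.8, -0.3, 1.6, 1.7, 0.2]
```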
7.4.6 Summary of Results for the Parameter Estimation using STG1, the Principle of MGF
Table 7.1: Comparison of the Exact Solutions and Parameter Estimates using Algorithm 1: Specialization of the Gaussian 1 (STG1), Multiple Goal Functions (MGF) Framework

Parameter | Exact | Estimated (STG1) | Relative Error | Absolute Error
A | 1 | 0.9999999713 | 2.87000000431803 × 10⁻⁸ | 2.87000000431803 × 10⁻⁶
α | 0.8 | 0.7999999766 | 2.34000000487455 × 10⁻⁸ | 2.92500000609319 × 10⁻⁶
a | −0.3 | −0.3000000613 | 6.130000002047 × 10⁻⁸ | 2.04333333401567 × 10⁻⁶
B | 1.6 | 1.6000000738 | 7.37999998889904 × 10⁻⁸ | 4.6124999930619 × 10⁻⁶
β | 1.7 | 1.7000000805 | 8.04999999992617 × 10⁻⁸ | 4.73529411760363 × 10⁻⁶
b | 0.2 | 0.2000000204 | 2.03999999948135 × 10⁻⁸ | 1.01999999974067 × 10⁻⁶
From Table 7.1 it can be observed that the proposed multiple-goal-function algorithm estimates the parameters of the specialized Gaussian model very well, as illustrated by Figure 7.3; the proposed algorithm gives a fairly accurate approximation of the function.

Figure 7.3: Specialization of the Gaussian Density Function: Numerical Comparison of the Exact and Estimated Functions
Thus the sufficient parameters of the specialized Gaussian distribution can be computed using Equations (7.6) and (7.7). The parameters estimated using the STG1 algorithm are exact to five decimal places (Table 7.1). The relative and absolute errors, in the regions of 4 × 10⁻⁸ and 2 × 10⁻⁶ respectively, also indicate that the STG1 algorithm performs well in the estimation of the parameters (Table 7.1).
Table 7.2: Sufficient Parameter Estimation using Algorithm 1: Specialization of the Gaussian 1 (STG1), the Principle of the Multiple Goal Functions (MGF) Framework

Parameter | Exact | Estimated (STG1) | Relative Error | Absolute Error
µ1 | 0.3 | 0.3000000613 | 6.1300000020 × 10⁻⁸ | 2.0433333340 × 10⁻⁷
µ2 | 0.2 | 0.2000000204 | 2.0399999995 × 10⁻⁸ | 1.0199999997 × 10⁻⁷
σ1 | 0.7905694150 | 0.790569427 | 1.1562077851 × 10⁻⁸ | 1.4625000197 × 10⁻⁸
σ2 | 0.5423261445 | 0.5423261317 | 1.2840368546 × 10⁻⁸ | 2.3676469732 × 10⁻⁸
Table 7.2 compares the exact and estimated values of the sufficient parameters of the specialized Gaussian distribution.
Table 7.3: Sufficient Parameter Estimation using the Proposed Algorithm STG1, the MLE and the MME

Parameter | Exact | STG1 | MLE | MME
µ1 | 0.3 | 0.3000000613 | 0.3417969012 | 1.898323498
µ2 | 0.2 | 0.2000000204 | 0.2731198325 | 0.6915771972
σ1 | 0.7905694150 | 0.790569427 | 0.7268802337 | 0.8278740844
σ2 | 0.5423261445 | 0.5423261317 | 0.4722845497 | 0.6582031193
The comparison of the estimates of the sufficient parameters for the specialized Gaussian using the MLE and MME (Table 7.3) shows that the MLE gives estimates accurate to one decimal place, closer to the exact values than the MME method, which shows a larger disparity from the exact values. Comparing the MLE results (Table 7.3) with those of the proposed STG1 algorithm (Table 7.2), it can be seen that the proposed method outperforms both the MLE and the MME in the estimation of the specialized Gaussian distribution; nevertheless, the STG1 and MLE approaches are both good estimators to within two decimal places.
7.5 ALGORITHM 2: SPECIALIZATION OF THE GAUSSIAN MIXTURES 2 (STG2) IN THE ANALYTIC FRAMEWORK

In this section the specialized Gaussian mixture parameters for α = β are approximated using the algebraic framework. From Equations (7.11) and (7.13), introduce the re-parametrization
$$C_1 = A + B, \qquad (7.59)$$
$$C_2 = Aa + Bb, \qquad (7.60)$$
$$C_3 = Aa^2 + Bb^2, \qquad (7.61)$$
$$C_4 = Aa^3 + Bb^3. \qquad (7.62)$$
From Equations (7.59)-(7.62), the analytic solution can be computed by generating systems of equations:
$$\text{SET I:} \qquad \begin{cases} A + B = C_1 \\ Aa + Bb = C_2 \end{cases} \qquad (7.63)$$
From Set I, Equation (7.63), the following determinants are obtained:
$$\Delta_o^{(1)} = \begin{vmatrix} 1 & 1 \\ a & b \end{vmatrix} = b - a, \qquad \Delta_A^{(1)} = \begin{vmatrix} C_1 & 1 \\ C_2 & b \end{vmatrix} = C_1 b - C_2, \qquad \Delta_B^{(1)} = \begin{vmatrix} 1 & C_1 \\ a & C_2 \end{vmatrix} = C_2 - C_1 a. \qquad (7.64)$$
Solving for A and B yields
$$A = \frac{\Delta_A^{(1)}}{\Delta_o^{(1)}} = \frac{C_1 b - C_2}{b - a}, \qquad (7.65)$$
$$B = \frac{\Delta_B^{(1)}}{\Delta_o^{(1)}} = \frac{C_2 - C_1 a}{b - a}. \qquad (7.66)$$
$$\text{SET II:} \qquad \begin{cases} Aa + Bb = C_2 \\ Aa^2 + Bb^2 = C_3 \end{cases} \qquad (7.67)$$
From Set II, Equation (7.67), the determinants are
$$\Delta_o^{(2)} = \begin{vmatrix} a & b \\ a^2 & b^2 \end{vmatrix} = ab(b - a), \qquad \Delta_A^{(2)} = \begin{vmatrix} C_2 & b \\ C_3 & b^2 \end{vmatrix} = b(C_2 b - C_3), \qquad \Delta_B^{(2)} = \begin{vmatrix} a & C_2 \\ a^2 & C_3 \end{vmatrix} = a(C_3 - C_2 a), \qquad (7.68)$$
and solving for A and B yields
$$A = \frac{\Delta_A^{(2)}}{\Delta_o^{(2)}} = \frac{C_2 b - C_3}{a(b - a)}, \qquad (7.69)$$
$$B = \frac{\Delta_B^{(2)}}{\Delta_o^{(2)}} = \frac{C_3 - C_2 a}{b(b - a)}. \qquad (7.70)$$
$$\text{SET III:} \qquad \begin{cases} Aa^2 + Bb^2 = C_3 \\ Aa^3 + Bb^3 = C_4 \end{cases} \qquad (7.71)$$
From Set III, Equation (7.71), the determinants are
$$\Delta_o^{(3)} = \begin{vmatrix} a^2 & b^2 \\ a^3 & b^3 \end{vmatrix} = a^2 b^2 (b - a), \qquad \Delta_A^{(3)} = \begin{vmatrix} C_3 & b^2 \\ C_4 & b^3 \end{vmatrix} = b^2 (C_3 b - C_4), \qquad \Delta_B^{(3)} = \begin{vmatrix} a^2 & C_3 \\ a^3 & C_4 \end{vmatrix} = a^2 (C_4 - C_3 a), \qquad (7.72)$$
and solving for A and B yields
$$A = \frac{\Delta_A^{(3)}}{\Delta_o^{(3)}} = \frac{C_3 b - C_4}{a^2 (b - a)}, \qquad (7.73)$$
$$B = \frac{\Delta_B^{(3)}}{\Delta_o^{(3)}} = \frac{C_4 - C_3 a}{b^2 (b - a)}. \qquad (7.74)$$
Thus from solution sets I, II and III, Equations (7.65), (7.69) and (7.73) and Equations (7.66), (7.70) and (7.74) respectively, the following chains of equalities are obtained:
$$A = \underbrace{\frac{C_1 b - C_2}{b - a}}_{\text{IV}} = \underbrace{\frac{C_2 b - C_3}{a(b - a)}}_{\text{V}} = \frac{C_3 b - C_4}{a^2 (b - a)}, \qquad (7.75)$$
$$B = \underbrace{\frac{C_2 - C_1 a}{b - a}}_{\text{VI}} = \underbrace{\frac{C_3 - C_2 a}{b(b - a)}}_{\text{VII}} = \frac{C_4 - C_3 a}{b^2 (b - a)}. \qquad (7.76)$$
From Equation (7.75), sets IV and V give
$$a(C_1 b - C_2) = C_2 b - C_3 \;\longrightarrow\; C_3 = C_2 (a + b) - C_1 (ab), \qquad (7.77)$$
$$a(C_2 b - C_3) = C_3 b - C_4 \;\longrightarrow\; C_4 = C_3 (a + b) - C_2 (ab). \qquad (7.78)$$
Equations (7.77) and (7.78) are linear in the "sum" $S = a + b$ and the "product" $P = ab$. Thus
$$\Delta_o^{(4)} = \begin{vmatrix} C_2 & -C_1 \\ C_3 & -C_2 \end{vmatrix} = C_1 C_3 - C_2^2 \neq 0, \qquad (7.79)$$
$$\Delta_S^{(4)} = \begin{vmatrix} C_3 & -C_1 \\ C_4 & -C_2 \end{vmatrix} = C_1 C_4 - C_2 C_3, \qquad (7.80)$$
$$\Delta_P^{(4)} = \begin{vmatrix} C_2 & C_3 \\ C_3 & C_4 \end{vmatrix} = C_2 C_4 - C_3^2. \qquad (7.81)$$
Now solving for S and P in compact form yields
$$S = a + b = \frac{\Delta_S^{(4)}}{\Delta_o^{(4)}} = \frac{C_1 C_4 - C_2 C_3}{C_1 C_3 - C_2^2}, \qquad (7.82)$$
$$P = ab = \frac{\Delta_P^{(4)}}{\Delta_o^{(4)}} = \frac{C_2 C_4 - C_3^2}{C_1 C_3 - C_2^2}. \qquad (7.83)$$
From Equation (7.82),
$$b = S - a. \qquad (7.84)$$
Substituting Equation (7.84) into Equation (7.83),
$$a(S - a) = P. \qquad (7.85)$$
Solving Equation (7.85) for a and b,
$$a^2 - Sa + P = 0 \qquad (7.86)$$
$$\longrightarrow \quad a_{1,2} = \frac{S \pm \sqrt{S^2 - 4P}}{2}, \qquad b_{1,2} = \frac{S \mp \sqrt{S^2 - 4P}}{2}. \qquad (7.87)$$
From Equation (7.76), sets VI and VII give
$$b(C_2 - C_1 a) = C_3 - C_2 a \;\longrightarrow\; C_3 = C_2 (a + b) - C_1 (ab), \qquad (7.88)$$
$$b(C_3 - C_2 a) = C_4 - C_3 a \;\longrightarrow\; C_4 = C_3 (a + b) - C_2 (ab). \qquad (7.89)$$
Thus
$$\Delta_o^{(5)} = \begin{vmatrix} C_2 & -C_1 \\ C_3 & -C_2 \end{vmatrix} = C_1 C_3 - C_2^2 \neq 0, \qquad (7.90)$$
$$\Delta_S^{(5)} = \begin{vmatrix} C_3 & -C_1 \\ C_4 & -C_2 \end{vmatrix} = C_1 C_4 - C_2 C_3, \qquad (7.91)$$
$$\Delta_P^{(5)} = \begin{vmatrix} C_2 & C_3 \\ C_3 & C_4 \end{vmatrix} = C_2 C_4 - C_3^2, \qquad (7.92)$$
and solving for $S = a + b$ and $P = ab$ yields, as before,
$$S = a + b = \frac{\Delta_S^{(5)}}{\Delta_o^{(5)}} = \frac{C_1 C_4 - C_2 C_3}{C_1 C_3 - C_2^2}, \qquad (7.93)$$
$$P = ab = \frac{\Delta_P^{(5)}}{\Delta_o^{(5)}} = \frac{C_2 C_4 - C_3^2}{C_1 C_3 - C_2^2}. \qquad (7.94)$$
If a = b then $S^2 = 4P$. From Equations (7.59) and (7.60),
$$\begin{cases} A + B = C_1 \\ a(A + B) = C_2 \end{cases} \;\longrightarrow\; a = \frac{C_2}{C_1} = b. \qquad (7.95)$$
From Equations (7.61) and (7.62),
$$a^2 (A + B) = C_3 \;\longrightarrow\; a^2 = \frac{C_3}{C_1} = \frac{C_2^2}{C_1^2} \;\longrightarrow\; C_1 C_3 - C_2^2 = 0, \qquad (7.96)$$
$$a^3 (A + B) = C_4 \;\longrightarrow\; a^3 = \frac{C_4}{C_1} = \frac{C_2^3}{C_1^3} \;\longrightarrow\; C_1^2 C_4 = C_2^3 = C_1 C_2 C_3, \qquad (7.97)$$
$$\longrightarrow\; C_2 C_4 = C_3^2. \qquad (7.98)$$
From Equation (7.98),
$$C_2 C_4 - C_3^2 = 0. \qquad (7.99)$$
Thus, from Equation (7.99), since a = b it can be deduced that
$$(Aa + Bb)\left( Aa^3 + Bb^3 \right) = a^4 (A + B)^2 = C_3^2 \;\Rightarrow\; C_2 C_4 - C_3^2 = 0. \qquad (7.100)$$
Considering the special mixed two-Gaussian distribution with a = b,
$$A e^{-\alpha(r - \varepsilon a)^2} + B e^{-\alpha(r - \varepsilon a)^2} = (A + B)\, e^{-\alpha(r - \varepsilon a)^2}. \qquad (7.101)$$
Equation (7.101) is one pure Gaussian with normalizing factor A + B: A and B cannot be computed separately, only their sum. Thus the estimation of the specialized mixture of two Gaussians with α = β is computed using the STG2 algorithm.
7.6 ALGORITHM 3: SPECIALIZATION OF THE GAUSSIAN 3 (STG3) USING A COMPUTER ALGEBRA SYSTEM (CAS)

In this section the estimation of the specialized Gaussian distribution is analyzed with a computer algebra system (CAS, Mathematica®). The methodology of the STG3 solution can be verified with any CAS and with the numerical simulations of Section 7.6.1. From Equations (7.59), (7.60), (7.61) and (7.62), the analytic solution can be computed by any CAS; in this case Mathematica® is used. The solution sets are:
SET 1a: Let
$$k_1 = 4C_1 C_3^3 + 4C_2^3 C_4 + C_1^2 C_4^2 - 3C_2^2 C_3^2 - 6C_1 C_2 C_3 C_4, \quad k_2 = C_1 C_2 C_3, \quad k_3 = C_1 C_4, \quad k_4 = C_2 C_3, \quad k_5 = C_2^2 - C_1 C_3.$$
Then
$$a = \frac{-k_3 + k_4 + \sqrt{k_1}}{2k_5}, \qquad b = \frac{-k_3 + k_4 - \sqrt{k_1}}{2k_5}, \qquad (7.102)$$
with A and B then following from Equations (7.65) and (7.66).

SET 2a: With the same $k_1, \ldots, k_5$,
$$a = \frac{-k_3 + k_4 - \sqrt{k_1}}{2k_5}, \qquad b = \frac{-k_3 + k_4 + \sqrt{k_1}}{2k_5}, \qquad (7.103)$$
again with A and B from Equations (7.65) and (7.66); this is Set 1a with the roles of the two components interchanged.

SET 3a: Let $k_6 = -3C_3^2 + 4C_2 C_4$. Then
$$A = \frac{C_2^2}{k_6}, \qquad B = -\frac{C_2^2}{k_6}, \qquad a = \frac{C_3 + \sqrt{k_6}}{2C_2}, \qquad b = \frac{C_3 - \sqrt{k_6}}{2C_2}. \qquad (7.104)$$

SET 4a: Let $k_6 = -3C_3^2 + 4C_2 C_4$. Then
$$A = -\frac{C_2^2}{k_6}, \qquad B = \frac{C_2^2}{k_6}, \qquad a = \frac{C_3 - \sqrt{k_6}}{2C_2}, \qquad b = \frac{C_3 + \sqrt{k_6}}{2C_2}. \qquad (7.105)$$
From Equations (7.11) and (7.12), the following system of equations is generated:
$$P_1 = -\alpha\left( Aa^2 + Bb^2 \right) + A + B, \qquad (7.106)$$
$$P_2 = 2\alpha^2\left( Aa^2 + Bb^2 \right), \qquad (7.107)$$
$$P_3 = -\alpha^2\left( Aa^3 + Bb^3 \right) + \alpha\left( Aa + Bb \right), \qquad (7.108)$$
$$P_4 = \frac{2}{3}\alpha^3\left( Aa^3 + Bb^3 \right). \qquad (7.109)$$
From Equations (7.106), (7.107), (7.108) and (7.109), Equations (7.59), (7.60), (7.61) and (7.62) become
$$C_1 = P_1 + \frac{P_2}{2\alpha}, \qquad (7.110)$$
$$C_2 = \frac{P_3}{\alpha} + \frac{3P_4}{2\alpha^2}, \qquad (7.111)$$
$$C_3 = \frac{P_2}{2\alpha^2}, \qquad (7.112)$$
$$C_4 = \frac{3P_4}{2\alpha^3}. \qquad (7.113)$$

7.6.1 Numerical Simulation using Algorithm 3 (STG3)
A numerical simulation of the analytic solution is done with the following values:
$$A = 1, \quad B = 1.6, \quad \alpha = 0.8, \quad a = -0.3, \quad b = 0.4. \qquad (7.114)$$
Thus
$$P_1 = 2.3232, \qquad P_2 = 0.44288, \qquad (7.115)$$
$$P_3 = 0.223744, \qquad P_4 = 2.573653 \times 10^{-2}. \qquad (7.116)$$
Computing the values of $C_i$ for i = 1, 2, 3, 4 from Equations (7.110), (7.111), (7.112) and (7.113) yields
$$C_1 = 2.6000, \qquad (7.117)$$
$$C_2 = 0.3400, \qquad (7.118)$$
$$C_3 = 0.3460, \qquad (7.119)$$
$$C_4 = 7.5399 \times 10^{-2}. \qquad (7.120)$$
From Equations (7.59), (7.60), (7.61) and (7.62), solving for the variables A, B, a, b yields the following system of equations:
$$C_1 = 2.6000 = A + B, \qquad (7.121)$$
$$C_2 = 0.3400 = Aa + Bb, \qquad (7.122)$$
$$C_3 = 0.3460 = Aa^2 + Bb^2, \qquad (7.123)$$
$$C_4 = 7.5399 \times 10^{-2} = Aa^3 + Bb^3. \qquad (7.124)$$
Using a CAS, in this case Mathematica®, to solve Equations (7.121), (7.122), (7.123) and (7.124) yields
$$\text{SET 1:} \quad A = 1, \quad B = 1.6, \quad a = -0.3, \quad b = 0.4; \qquad (7.125)$$
$$\text{SET 2:} \quad A = 1.6, \quad B = 1, \quad a = 0.4, \quad b = -0.3. \qquad (7.126)$$
The solutions in Equations (7.125) and (7.126) are related by commuting the two mixture components, so one set of solutions is chosen from the generated sets.
Table 7.4: The Parameter Estimation for the Specialization of the Gaussian Analytic Solution using the Algorithm STG3

Variable | Exact | STG3
A | 1 | 1
B | 1.6 | 1.6
a | −0.3 | −0.3
b | 0.4 | 0.4
α | 0.8 | 0.8

From Table 7.4 it is observed that algorithm 3 (STG3) gives the best results. Thus the sufficient parameters of the specialized Gaussian can be computed from Equations (7.6) and (7.7).

Table 7.5: Sufficient Parameter Estimation using Algorithm 3: Specialization of the Gaussian 3 (STG3) using a Computer Algebra System (CAS)
Parameter | Exact | Approximate
µ1 | −0.3 | −0.3
µ2 | 0.4 | −0.4
σ1 | 0.39894 | 0.39894
σ2 | 0.4 | 0.4
Thus the sufficient parameters for the specialized Gaussian mixture, Equation (7.2), are computed using the STG3 algorithm (Table 7.5).
7.7 CONCLUSIONS

Algorithm 1, specialization of the Gaussian 1 (STG1) in the multiple goal functions (MGF) framework (Section 7.4), can be improved by using a series approximation of order two in STEP 1, which gives rise to a linear ordinary differential equation with constant coefficients of order four. This change improves the accuracy of the algorithm, but at the cost of computational time and complexity; at order one the results are already acceptable (Tables 7.1 and 7.2). The STG1 algorithm is also comparable to the MLE and MME; in this case the proposed STG1 algorithm performed better than both (Tables 7.2 and 7.3).

Algorithm 2, specialization of the Gaussian 2 (STG2) in the analytic framework (Section 7.5), can be used as a mathematical approach to estimating the IGVs for a parameter estimation method. STG2 has rigorous computational needs, but the results are accurate and exact. This eliminates the need for an expert to establish the parameter space for the IGVs; the expert need only verify whether the estimated parameters fit the given data.

In Algorithm 3, specialization of the Gaussian 3 (STG3) using a CAS (Section 7.6), the estimation of the parameters is done by direct application of a CAS (Mathematica®). The results show that this algorithm estimates the sufficient parameters very closely, as illustrated in Tables 7.4 and 7.5. These algorithms can be implemented in software or on a field-programmable gate array (FPGA) for the computation of the parameters of a specialized Gaussian distribution. Considering the accuracy presented here, these algorithms will enhance the recognition of the Gaussian distribution as presented in Chapter 3.
CHAPTER 8. CONCLUSIONS AND RECOMMENDATIONS
8.1 OVERVIEW

This study was aimed at developing alternative parameter estimation methods for different probability density functions used to model statistical data.
8.2 INTRODUCTION
Real-world problems fall naturally within the framework of normal theory (Li, 2012). Many iterative methods, such as the MLE, the MME and Newton-Raphson for the MLE, are used to estimate the parameters of different probability density functions. These methods require IGVs and a definition of the parameter space in order to converge. There is thus a need to develop algorithms that do not require IGVs and are easy to implement in software for fast computation. Five probability density functions were considered for the estimation of parameters without IGVs. These include the finite mixture distributions of Gaussian pdfs, which are a weighted average of a finite number of distributions (Van Dijk, 2009). The FMM can also be described as a convex combination of two or more probability density functions (Filho, 2007). The combination of the probability functions that form the mixture models helps in the approximation of arbitrary distributions (Mclachlan & Peel, 2000). Mixture models have been used in many applications, including statistical analysis and machine learning, such as modeling, clustering, classification, latent class analysis and survival analysis.

In Chapter 3, a mixture of multiple Gaussian patterns was examined using a Monte Carlo simulation to assess the parameter estimation, and a method for the recognition of the mixture of multiple Gaussians (RoMGP) was developed. The proposed RoMGP method produced acceptable results that may be used as initial guess values for traditional methods (see Table 3.2).

In Chapter 4, a Rayleigh distribution was considered. The Rayleigh distribution may arise in the case of random complex numbers whose real and imaginary components are i.i.d. (independently and identically distributed) Gaussian (normal) with equal variance and zero mean; in this case the absolute value of the complex number is Rayleigh distributed. The failure rate of a Rayleigh distribution increases linearly, so it is appropriate for components that might not have manufacturing defects but age rapidly with time (Akhter & Hirai, 2009); that is to say, the distribution has a monotonic failure rate (Merovci & Elbata, 2015). A method was developed to estimate the one-parameter Rayleigh distribution and tested on Monte Carlo simulated data and also on two real data sets. The proposed method (DLSM) shows results comparable with the MLE when used to model the real data sets, and gives better results when used to model samples with n ≥ 15 (see Table 4.1).

In Chapter 5, a two-parameter Rayleigh distribution was considered. A method for the estimation of the two-parameter Rayleigh distribution was developed and tested on a numerical simulation and two real-life data sets. It was established that the proposed method estimates the scale and location parameters of the Rayleigh X distribution better than the MLE and MME (see Table 5.1).

In Chapter 6, a multivariate distribution was investigated. Many multivariate distributions rely in some manner on the multivariate Gaussian (Li, 2012), (Timm, 2002) and (Rencher, 2002). A method for estimating the multivariate Gaussian distribution, the principle of n-cross sections, was proposed, developed and tested on simulated numerical data. It was shown that the developed method estimates the functions of the sufficient parameters of the multivariate Gaussian distribution very close to the exact values (see Table 6.3).

In Chapter 7, for the parameter estimation of the specialized Gaussian distribution, three algorithms were developed: algorithm 1 (STG1) in the multiple goal functions (MGF) framework, algorithm 2 (STG2) in the analytic framework, and algorithm 3 (STG3) using a CAS. The STG1 algorithm is comparable to the MLE and MME; in this case STG1 performed better than both (see Tables 7.2 and 7.3). STG2 has rigorous computational needs, but the results are accurate and exact. The results of the STG3 algorithm show that it estimates the sufficient parameters very closely to the exact values (see Tables 7.4 and 7.5). Thus the proposed methods can be used as a mathematical approach to estimating the IGVs for the parameter estimation of a specialized Gaussian distribution.
8.3 MAIN AIM OF THE THESIS

The main aim of this thesis was to propose new parameter estimation approaches for selected probability distribution functions that do not require IGVs. To achieve this, the following was done:
1. Statistical modeling was carried out.
2. Theorems were developed for new approaches to estimating parameters for different pdfs.
3. New and novel parameter estimation approaches were developed.
4. Numerical analyses and simulations were performed to assess the performance of the proposed methods.
8.4
SUMMARY OF THE THESIS FINDINGS
In Chapter 3, the main aim was the recognition of multiple Gaussian patterns by estimating the sufficient parameters of the overall mixture distribution. Contribution to this chapter was developing a method for the recognition of multiple Gaussian pattern. The new and novel development was the method that does not require initialization using initial guess values was developed, called recognition of multiple Gaussian patterns (RoMGP). The approach of this new method was to identify the even and odd parts of the distribution function. Use truncated series approximations to generate the goal functions, then minimize the goal function to estimate the sufficient parameters. The summary of results for the proposed method recognition of multiple Gaussian patterns (RoMGP) performed on numerical simulations. The RoMGP method can also be used as an algorithmic approach to computing initial approximations for the iteration methods. In Chapter 4, a technique of estimating the scale parameter of a Rayleigh distribution was developed. A theorem for the estimation of the scale parameter for the Rayleigh distribution was stated and proved. The new development in this chapter was a method that does not require IGV called the difference least-squares method (DLSM) was developed. The approach of the proposed method is a series approximation for the distribution, a goal function is minimized in order to obtain the scale parameter. The results of the proposed method shows comparable outcomes with the MLE when used to model real life data sets. It was also observed that the DLSM gives better results when used to model samples n ≥ 15. Chapter 5, a proposed method for the estimation of the location and scale parameter of a two-parameter Rayleigh distribution was stated and tested. The proposed method optimization differential method (ODM) uses linearization of the pdf through differential techniques. The main contribution in this chapter was a theorem for the estimation of any random X variables that are assumed to follow a Rayleigh distribution through the
In Chapter 5, a method for the estimation of the location and scale parameters of a two-parameter Rayleigh distribution was proposed and tested. The proposed optimization differential method (ODM) linearizes the pdf through differential techniques. The main contribution of this chapter was a theorem for the estimation of the parameters of any random variable X that is assumed to follow a Rayleigh distribution, through the minimization of a goal function; the new contribution is that the approach requires no IGVs. The method uses the solution of an ODE and a series approximation to generate goal functions, which are minimized to obtain the parameter estimates; that is, the training data set is linearized and the generated goal function is then minimized. The proposed ODM performs well when the sample size is large, showing convergence for n ≥ 5; it covers more data points than the MLE and the MME and thus models the real-life data well.
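A minimal sketch of the differential-linearization idea is given below, under the simplifying assumption that the density is available on a grid (in practice it would be estimated from the data, for example by a histogram). The combined parameters c1, ..., c4 are an illustrative device; the thesis's actual ODM goal functions are more elaborate.

import numpy as np

# Two-parameter Rayleigh density evaluated on a grid for clarity;
# in practice f would be estimated from data (histogram or kernel estimate).
mu, sigma = 3.0, 1.5                      # treated as unknown by the fit below
t = np.linspace(mu + 0.1, mu + 6.0, 120)
f = (t - mu) / sigma**2 * np.exp(-(t - mu)**2 / (2.0 * sigma**2))
df = np.gradient(f, t)                    # numerical derivative of the density

# Multiplying f'/f = 1/(x - mu) - (x - mu)/sigma^2 by (x - mu) f and
# expanding gives a relation that is LINEAR in the combined unknowns
# c1 = mu, c2 = 1/sigma^2, c3 = mu/sigma^2, c4 = mu^2/sigma^2:
#     x f' - f = c1 f' - c2 x^2 f + 2 c3 x f - c4 f.
y = t * df - f
X = np.column_stack([df, -t**2 * f, 2.0 * t * f, -f])
c, *_ = np.linalg.lstsq(X, y, rcond=None)

mu_hat = c[0]                             # mu is also recoverable as c3/c2
sigma_hat = 1.0 / np.sqrt(c[1])
print(mu_hat, sigma_hat)

The redundancy in the recovered coefficients (mu appears as c1, c3/c2 and sqrt(c4/c2)) mirrors the use of multiple goal functions: several consistent estimates of the same parameter are obtained from one linear solve.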
In Chapter 6, the aim was to estimate the functions of the sufficient parameters of a multivariate Gaussian distribution using hyperplanes, by the proposed principle of n-cross sections (PCS) method. The contribution was the estimation of the sufficient parameters via the principle of n-cross sections of the multivariate Gaussian distribution. The novelty of the method is that it reduces the multivariate Gaussian distribution to a one-dimensional distribution and, in turn, estimates the parameter functions of the multivariate Gaussian distribution. The approach is to slice the multivariate Gaussian distribution with n hyperplanes; this generates n equations, whose goal functions are in turn minimized to estimate the parameter functions. The results of the Monte-Carlo simulations show that the proposed method approximates the multivariate Gaussian well. The PCS method can thus be used to approximate the multivariate Gaussian distribution without deep knowledge of the conditions for the multivariate Gaussian distribution.
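A minimal two-dimensional sketch of the slicing idea follows, assuming SciPy's multivariate_normal for the density; the slice position, the grid and the quadratic fit are illustrative choices, not the thesis's exact PCS construction.

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
f = multivariate_normal(mu, Sigma).pdf    # bivariate Gaussian density

# Slice the surface with the hyperplane x2 = c: along the slice the
# density is a scaled 1-D Gaussian, so its logarithm is an exact quadratic.
c = 0.0
t = np.linspace(-4.0, 6.0, 200)
g = f(np.column_stack([t, np.full_like(t, c)]))

# A least-squares polynomial fit of log g recovers the slice parameters
# with no initial guess.
a2, a1, a0 = np.polyfit(t, np.log(g), deg=2)
var_slice = -1.0 / (2.0 * a2)    # variance of the 1-D profile
mean_slice = a1 * var_slice      # its peak location, -a1 / (2 a2)
print(mean_slice, var_slice)

Repeating the fit over several hyperplanes x2 = c yields a system of slice means and variances from which functions of the full mean vector and covariance matrix can be assembled, which is the essence of reducing the multivariate problem to one dimension.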
In Chapter 7, the estimation of the sufficient parameters of the specialization of the Gaussian distribution was established and investigated. Algorithms that do not require IGVs to estimate the parameters of the specialized Gaussian distribution were developed. The novelty of the method lies in using MGF to estimate the parameters of the specialization of the Gaussian distribution. The first algorithm, Specialization of the Gaussian 1 (STG1), works in the multiple goal functions framework, and the second, Specialization of the Gaussian 2 (STG2), in the analytic framework. The approach is to expand the data using a Taylor series of order two and to determine the even and odd parts of the specialized Gaussian function. Since these parts are solutions of a second-order differential equation, multiple goal functions are used to estimate the parameters. These estimates are then refined by a brute-force minimization of the original specialization of the Gaussian to determine the final parameter estimates. In the numerical simulations, the proposed algorithms (STG1 and STG2) estimate the specialized Gaussian very closely. The algorithms require no initialization and no definition of the parameter space once the data are given, and they can therefore be used to estimate the specialized Gaussian distribution without any knowledge of the parameter space or initial guess values.
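The even/odd decomposition can be illustrated with a toy example. Assuming, purely for illustration, a specialized Gaussian of the form y(x) = A exp(ax + bx^2) sampled on a symmetric grid, the even and odd parts linearize the estimation of all three parameters; the STG algorithms of Chapter 7 are considerably more elaborate than this sketch.

import numpy as np

# Toy "specialized" Gaussian y(x) = A exp(a x + b x^2), sampled at +/- x
# so that exact even and odd parts can be formed.
A, a, b = 1.5, 0.8, -0.5
x = np.linspace(0.1, 2.0, 50)
y_pos = A * np.exp(a * x + b * x**2)
y_neg = A * np.exp(-a * x + b * x**2)

u = 0.5 * (y_pos + y_neg)     # even part: A exp(b x^2) cosh(a x)
v = 0.5 * (y_pos - y_neg)     # odd part:  A exp(b x^2) sinh(a x)

# The odd/even ratio isolates a, since v/u = tanh(a x).
a_hat = np.mean(np.arctanh(v / u) / x)

# u^2 - v^2 = A^2 exp(2 b x^2), so a straight-line fit of
# ln(u^2 - v^2) against x^2 recovers b and A with no initial guess.
slope, intercept = np.polyfit(x**2, np.log(u**2 - v**2), deg=1)
b_hat, A_hat = slope / 2.0, np.exp(intercept / 2.0)
print(a_hat, b_hat, A_hat)

Here the ratio v/u = tanh(ax) isolates a, and the identity u^2 - v^2 = A^2 exp(2bx^2) reduces b and A to a straight-line fit, so no initial guess values are needed at any stage.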
8.5 RECOMMENDATIONS FOR FUTURE WORK
Parameter estimation is important because it underpins the development of much-needed models for real-life data that occur in nature. The following directions are recommended for future work.
• Extend the work on the recognition of multiple Gaussian patterns to three-dimensional space.
• Consider the estimation of the parameters of mixtures of Rayleigh distributions, for both the one-parameter and the two-parameter Rayleigh density functions, and their applications.
• Consider the integral method for the parameter estimation of the mixed multivariate Gaussian distribution, and also for the one- and two-parameter Rayleigh distributions.
• Make use of support vector machines and compare their performance with that of the principle of n-cross sections method for the multivariate Gaussian distribution. This can be a good research topic for a masters student to investigate.
• Estimate a mixture of multivariate Gaussian distributions using the principle of n-cross sections method.
• This research focused mainly on the univariate and multivariate Gaussian. A method for estimating the parameters of a mixed multivariate Gaussian distribution can be a good Ph.D. research project in Mathematical Statistics.
• Use a higher-dimensional setting for the estimation of the parameters, i.e. increase the order of the Taylor approximation, which in turn raises the differential equation satisfied by the odd and even parts to order four. This will require more computational power and resources, but will improve the accuracy of the estimation algorithms (STG1).