Eliminating User-Interaction in Probability Density Estimation

Kurt Barbé (1), Lee Gonzales Fuentes (1), Lee Barford (2) and Wendy Van Moer (1)
1. Dept. ELEC/M2ESA, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium
2. Measurement Research Laboratory, Agilent Technologies, 561 Keystone Ave Unit 434, Reno, NV 89503 USA
Email:

Abstract – Despite the extensive literature, describing the probability content of measurements remains an important topic for engineering problems. The histogram remains the golden standard, even though kernel density estimation is a strong competitor when smooth estimates are desired. Critical user interaction is required for the use of both histograms and kernel densities: a good choice of the bandwidth is essential in both cases, and on top of that the kernel density method requires a proper choice of the kernel. Incorrect choices may lead to incorrect results, either by masking important details or by introducing false details. In this paper, we propose a new approach which requires no user-defined choices. The method is therefore fully automatic and provides the user with a smooth estimate of the probability content.

Keywords – Probability density, Measurement uncertainty, Non-parametric, Data analysis, Histograms, Probability theory.

I. INTRODUCTION

The use of histograms in engineering applications is omnipresent. Histograms appear in many engineering problems: detection of melanoma [1], analysis of astrophysical phenomena [2], ADC-testing benches [3], cardiovascular measurements [4], image pre-processing [5]-[8] and feature detection [9]-[10]. The histogram allows accessing the probability density function of the data in a user-friendly manner without a large computational burden. The histogram is considered a golden reference, but it never produces a smooth result. The lack of smoothness may mask important details or may produce incorrect peaks. Since the 1950s the statistical community has developed a smooth version of the histogram known as the kernel density estimator [10]-[13]. Although it is less popular than the histogram, this technique is adopted in various fields. Driven by Fourier analysis, one can also use a weighted sum of orthogonal polynomials to smooth the histogram [14].

A mutual problem for the histogram and the kernel densities is the bandwidth selection problem. There exists no best bandwidth or optimal number of bins for the histogram: a good choice depends on the actual shape of the probability density function. Depending on the actual data distribution and the goals of the analysis, different bin widths may be appropriate, and different choices reveal different features of the data. There are some interesting rules of thumb to guide the user [15]-[17]. Switching to kernel density estimators results in a smooth estimate, but it comes at the price of an additional degree of freedom: next to the bandwidth, the user also needs to select a kernel function. At least for smaller data sets an inappropriate choice may introduce a bias, e.g. a symmetrical kernel for a heavily skewed density function. For the orthogonal polynomial approach one needs to choose an appropriate family of orthogonal polynomials together with the maximal degree used. The weights of the polynomials are estimated by a least squares fit, such that high degrees often lead to ill-conditioned regression matrices.

In this paper, we propose a novel approach to density estimation. It is based on the empirical cumulative distribution function, with the advantage that no user interaction is required. This objective is met through the development of a family of distributions, which we shall call the Moustache distributions, exhibiting both heavily skewed and symmetrical shapes as its parameters are changed. Hence, this family serves as an ideal basis for the estimation of any distribution.

The remainder of this paper is organized as follows: Section II highlights the user-interaction problem with a preliminary example, Section III introduces a new family of flexible distribution functions able to describe skewed as well as symmetrical densities, Section IV introduces the novel density estimator, Section V illustrates the approach on the preliminary example and a time-jitter problem, and finally conclusions are formulated in Section VI.

II. PROBLEM STATEMENT

In this section, a simulation example illustrates the downsides of the classical methods: the histogram, the kernel density estimator and orthogonal polynomials. An example of a symmetrical but heavily peaked distribution is the cubic power of a Gaussian distributed random variable. A change of variables reveals that the true probability density function (pdf) of the cube X = Z^3 of a Gaussian random variable Z is given by

f(x) = φ(x^(1/3)) / (3 |x|^(2/3)),

where φ is the pdf of the Gaussian random variable Z.

In a first example, we select the optimal number of bins (by trial and error) for the histogram and kernel density estimator, and we choose the optimal maximal degree for the orthogonal polynomials. In order to match the true pdf, we selected 195 bins, i.e. a bin width of 0.4, for the histogram of a data set of 1000 samples. For the kernel density estimator, the same bandwidth was chosen together with the Epanechnikov kernel (half circle) [18]. The orthogonal polynomials were Hermite polynomials [19] with a maximal degree of 175.
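The rules of thumb mentioned in the introduction [15]-[17] are simple to state. A minimal sketch of two of them (Sturges' rule for the bin count and Scott's rule for the bin width; these are the standard formulas, not equations taken from this paper):

```python
import math

def sturges_bins(n):
    # Sturges' rule [15]: number of histogram bins for n samples
    return math.ceil(math.log2(n)) + 1

def scott_bandwidth(std, n):
    # Scott's rule [17]: bin width h = 3.49 * sigma * n^(-1/3)
    return 3.49 * std * n ** (-1.0 / 3.0)

# for a 1000-sample data set such as the preliminary example:
print(sturges_bins(1000))   # -> 11
```

Both rules depend only on the sample size and spread, which is one reason they can fail for strongly non-Gaussian shapes such as the cubed Gaussian considered here.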
978-1-4673-4623-8/13/$31.00 ©2013 IEEE
1295
All the different methods reveal similar results as shown in Figure 1.
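The preliminary simulation is straightforward to reproduce in outline. A minimal sketch, using a Gaussian-kernel density estimate as a stand-in for the Epanechnikov kernel of the paper; the true pdf follows from the change of variables for the cube of a Gaussian variable:

```python
import numpy as np

def gauss_pdf(t):
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)

def true_pdf(x):
    # change of variables for X = Z^3 with Z standard Gaussian:
    # f_X(x) = phi(x^(1/3)) / (3 |x|^(2/3))
    x = np.asarray(x, dtype=float)
    return gauss_pdf(np.cbrt(x)) / (3.0 * np.abs(x) ** (2.0 / 3.0))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) ** 3          # heavily peaked, symmetrical data

# histogram with the user-chosen bin count quoted in the text
hist, edges = np.histogram(x, bins=195, density=True)

# Gaussian-kernel density estimate with the user-chosen bandwidth 0.4
def kde(points, data, h=0.4):
    points = np.asarray(points, dtype=float)
    return gauss_pdf((points[:, None] - data[None, :]) / h).mean(axis=1) / h
```

Both estimates depend on a user-chosen bin count and bandwidth, which is exactly the interaction the proposed method removes.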
Figure 1: Different density estimators with options optimally chosen: true pdf (red), histogram (blue), kernel density estimator (green) and orthogonal polynomials (black).

Using default settings for the histogram and kernel density bandwidth, based on the number of samples, and Akaike's information criterion for the maximal degree of the polynomials, the estimators give clearly different results, as seen in Figure 2.

Figure 2: Different density estimators where the options were classically chosen: true pdf (red), histogram (blue), kernel density estimator (green) and orthogonal polynomials (black).

An important observation is that the classical alternatives to the histogram are simply histogram smoothers, such that one should not expect improved performance with respect to the histogram. The histogram is unfortunately not an unbiased estimator of the pdf. The bias is proportional to the slope of the pdf [20]:

(1)

with the bias evaluated at the center value of each bin. The slope of the pdf is very large in a neighborhood of its point of symmetry, which explains the histogram's poor performance: the bandwidth needs to be sufficiently small to counter the steep slope, as predicted by equation (1).

III. MOUSTACHE DISTRIBUTIONS

The basis of the newly proposed approach is a new family of distribution functions. This family is very flexible, such that it captures both symmetrical and skewed pdfs. The family consists of polynomial cumulative distribution functions on the interval [0,1] of the lowest degree satisfying the following constraints:

(2)

with the constraints expressed in terms of the distribution function and its derivative. Equation (2) describes 5 constraints that need to hold, such that the lowest degree of the polynomial equals 4. A straightforward calculation solving the set of equations (2) yields the coefficients of the quartic. The probability density function of the Moustache family of pdfs then becomes

(3)

a family of unimodal density functions parameterized by a single parameter, allowing skewed and symmetrical densities. The family of density functions has compact support, such that the interval [0,1] should be stretched to the minimal and maximal observations of the data set, covering the data range.

Figure 3: The Moustache family of pdfs parameterized in its shape parameter. The most left-skewed and most right-skewed densities are shown in dark green and light green respectively, while the only symmetrical density is given in red.

IV. NOVEL DENSITY ESTIMATOR

A. Philosophy of the new approach

The estimation procedure is performed in terms of the empirical cumulative distribution function. First, the data is sorted in ascending order, and a cumulative probability of k/N is assigned to the k-th sorted sample, k = 1, ..., N. This step provides the raw empirical cumulative distribution function, which is simple and fast to compute.
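The closed forms of the Moustache construction in Section III are not reproduced here, but the idea can be made concrete under an assumption. Taking, for the sake of a sketch, the four boundary constraints F(0)=0, F(1)=1, F'(0)=0 and F'(1)=0 on a quartic CDF leaves one free coefficient c; sweeping c over [0, 6] then moves the density from one extreme skew through a symmetrical shape to the opposite extreme skew, matching the qualitative behavior described for the Moustache family (the paper's exact parameterization may differ):

```python
import numpy as np

def moustache_cdf(x, c):
    # quartic CDF on [0, 1] with F(0)=0, F(1)=1, F'(0)=F'(1)=0;
    # c is the remaining free coefficient (illustrative assumption,
    # not the paper's exact formula)
    x = np.asarray(x, dtype=float)
    return (c - 3.0) * x**4 + (4.0 - 2.0 * c) * x**3 + c * x**2

def moustache_pdf(x, c):
    # derivative of the CDF: a unimodal density on [0, 1] for c in [0, 6]
    x = np.asarray(x, dtype=float)
    return 4.0 * (c - 3.0) * x**3 + (12.0 - 6.0 * c) * x**2 + 2.0 * c * x

# c = 3 gives the only symmetrical member, f(x) = 6 x (1 - x);
# c = 0 and c = 6 give the two extreme skews
```

For every c in this range the density vanishes at both end points, so in practice the unit interval is stretched to the observed data range, as described above.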
Figure 4: Raw empirical cumulative distribution function for a standard Gaussian distributed data-set of 100 samples.
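The raw empirical cumulative distribution function of Figure 4 needs nothing more than a sort. A minimal sketch:

```python
import numpy as np

def raw_ecdf(data):
    # sort the data and assign the cumulative probability k/N
    # to the k-th sorted sample, k = 1, ..., N
    x = np.sort(np.asarray(data, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size

rng = np.random.default_rng(0)
xs, ps = raw_ecdf(rng.standard_normal(100))   # the staircase of Figure 4
```

Note that no bandwidth, kernel or polynomial degree enters at this stage; the staircase (xs, ps) is the only input to the subsequent local Moustache fits.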
Next, the ordered data set is partitioned in windows of a few samples. These points are sufficient for the polynomial to provide a fit through them. However, we add the constraint that the fits over two consecutive windows need to be twice continuously differentiable from the left. This constraint is sufficient to obtain smooth estimates of the probability density function. Hence, the Moustache distributions are locally estimated to match the raw empirical cumulative distribution. In a second step, the pdf is obtained by computing the derivative of the estimated cumulative distribution.

B. Flow chart of the approach

The algorithm is recursive over the different windows. We observed that this choice results in well-conditioned regression matrices.

1. Initialization of the algorithm

We start by normalizing the data to the interval [0,1], since the Moustache distributions are defined on the unit interval. Then, we order the data in ascending order and associate the empirical cumulative distribution with every ordered data point. The regression matrix of the Moustache distribution function is constructed from these points, where the powers are assumed to be computed elementwise. Next, the first three points of the ordered vector are selected to fit the first Moustache distribution, which results in an initial least squares fit.

2. Update of the algorithm from one step to the next

Let the estimated cumulative distribution function from the previous step be available on the ordered data set. We numerically compute its first two derivatives from the left at the boundary point and collect them in a vector, which defines the constraint matrix. We then solve the constrained least squares problem; the analytical solution is shown in Appendix A. The estimated parameters from this step allow updating the estimated cumulative distribution function.

3. Computing a smooth estimate of the pdf

Finishing the previous step provides the user with an estimate of the cumulative distribution function by stacking the locally estimated pieces. Hence, to obtain an estimate of the pdf, it remains to compute the derivative of this estimate. Since the cumulative distribution function is estimated locally, we compute the derivative over the different local parts to enhance the smoothing over the different segments. As a result, we approximate the derivative of the function by a finite difference; Appendix B shows that the optimal solution is given, as expected, by the 3-point finite difference.

V. EXAMPLES

A. Revisiting the preliminary example

We apply the novel approach to the second case of the preliminary example, in which standard choices were made to select the bandwidth and the maximal degree of the orthogonal polynomials. The result is shown in Figure 5. The true pdf has a vertical asymptote at 0, which makes it particularly hard for smooth estimators such as the kernel density and orthogonal polynomial estimators to follow the slope. The novel method is very successful at estimating the sharp mode and does not suffer from a possibly poorly chosen bandwidth.

Figure 5: Different density estimators where the options were classically chosen: histogram, true density, kernel density, orthogonal polynomial and the novel approach.

In Figure 6 the resulting fit of the empirical cumulative distribution function is shown. The derivative, as computed in the third step of the algorithm, results in the magenta curve in Figure 5.

Figure 6: Illustration of the algorithm on the preliminary example: empirical distribution function and its smooth estimate.

B. Time jitter measurements

In this simulation example, we consider the measurement of a sine-wave. The sampling frequency is 22.5 kHz, a fixed number of points was collected, and the sampling period was distorted by uniformly distributed time jitter. An example of the time domain signal is shown in Figure 7.
Figure 7: Time jitter example – the undistorted signal in blue and the time-distorted measurement denoted by the green crosses.

The probability density function of the error is estimated. In a first example the number of collected samples is 2500. The results are shown in Figure 8.

Figure 8: The probability density estimates for the time jitter example based on 2500 samples: histogram (blue), kernel (green), orthogonal polynomials (black) and the novel method (magenta).

The kernel density estimate tends to act as a smoothing operator for the histogram, for which the Epanechnikov kernel was used (50 bins and a bandwidth of 8 mV). The degree of the Hermite polynomials was chosen to be 20; increasing the order further resulted in ill-conditioned regression matrices. The orthogonal polynomials and the novel method treat the various local minima and maxima shown by the histogram as irrelevant features. We can verify this claim by increasing the sampling frequency such that, in the same observation window, we collect 15000 samples. In this case 122 bins were used for the histogram and a bandwidth of 3.3 mV for the Epanechnikov kernel density estimator. The order of the orthogonal polynomials was not changed. The result is shown in Figure 9.

Figure 9: The probability density estimates for the time jitter example based on 15000 samples: histogram (blue), kernel (green), orthogonal polynomials (black) and the novel method (magenta).

The oscillating behavior has vanished from the kernel density estimate, and both the novel method and the kernel density estimator show similar results. The orthogonal polynomial exhibits problems in following the sharp mode.

C. Measurement noise characterization

We consider the set-up depicted in Figure 10. A voltage signal is realized by the actuator. In order to scramble the information, a white noise sequence is added to it. Two Helmholtz coils are positioned such that both have the same angle with respect to the surface at a fixed distance. The applied voltage induces a voltage in the secondary coil, which is measured by the acquisition system connected to the PC.

Figure 10: Measurement set-up: induced voltage through Helmholtz coils.

In the current measurement experiment the chosen reference signal was a single sinewave. The noise sequence was a stationary white process following a scaled chi-square distribution with 1 degree of freedom, where the scale factor is the noise power. This signal is realized by a Gaussian noise generator passed through a mixer to square the noise signal.

Figure 11: Measurement induced voltage – raw signal (blue), averaged signal over the different periods (red) and difference (green).

In Figure 11, we show a measurement of the raw signal over a time interval of 10 seconds in blue. Since the true frequency is known, we can average over the 10 different periods, shown in red. Finally, the difference between these signals reveals an estimate of the noise contribution, shown in green. The actual noise signal is white, but its probability distribution is a mixture of the applied chi-square distribution and Gaussian background noise.
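The noise generation used in the set-up of Section V.C, Gaussian noise squared by a mixer to give a scaled chi-square with 1 degree of freedom, can be sketched as follows (the noise power value is illustrative, not the one used in the measurement):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1e-4                       # illustrative noise power (assumption)
g = rng.standard_normal(100_000)    # output of the Gaussian noise generator
noise = sigma2 * g**2               # the mixer squares the noise signal

# a chi-square variable with 1 degree of freedom has mean 1 and variance 2,
# so the generated sequence has mean sigma2 and variance 2 * sigma2**2
```

In the measured data this chi-square contribution mixes with Gaussian background noise, which is exactly the mixture visible in the estimated density.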
We apply the different techniques to obtain an estimate of the probability density function of this noise signal. We start with a smooth estimate of the cumulative distribution. The estimation algorithm shows a good fit in Figure 12.

Figure 12: Distribution function of the noise signal – smooth estimate (blue) and empirical distribution (red).

The computation of the derivative of the fitted cumulative distribution by the algorithm reveals the estimated pdf. However, in Figure 13 we see that all probability density estimation techniques reveal similar results. The only advantage of the novel technique is that no user interaction is required, whereas the classical approaches used a pre-defined bandwidth of 3.5 mV.

Figure 13: Estimated probability density function for the measurement example: histogram, kernel density, orthogonal polynomial and the novel approach.

VI. CONCLUSIONS

In this paper the probability density function was estimated by fitting the empirical cumulative distribution function. A very flexible family of Moustache distribution functions was introduced to fit any cumulative distribution function given by the data. The advantage of fitting the cumulative distribution instead of the density is that user interaction can be eliminated. The examples reveal that this approach performs similarly to the alternatives if proper bandwidths are selected; for improper bandwidths the novel technique outperforms the classical approaches.

ACKNOWLEDGEMENT

This research was funded in part by an Agilent Foundation Research grant #2705-BEL12, a Post-doctoral research fellowship of the Flemish Research Foundation FWO #1.2.046.11.N.00 and the research council of the VUB.

APPENDIX A: CONSTRAINED LINEAR REGRESSION

Let us solve the linear system of equations. The least squares solution is well known; in this case, we intend to obtain the solution subject to additional linear equality constraints. Note that the design matrix and the constraint matrix are not necessarily of the same size. We compute the solution by means of a Lagrange multiplier and compute the Jacobian of the objective function. Solving for the parameter vector results in the constrained least squares solution.

APPENDIX B: 3-POINT FINITE DIFFERENCE

By means of a first order Taylor approximation, we obtain an under-determined system for the weights of the difference scheme. To uniquely identify the parameters, we desire a minimal decrease of the SNR, such that the noise gain of the weights should be minimized. The optimal solution is then given by the 3-point finite difference used in the algorithm.

REFERENCES

[1] Sforza, G.; Castellano, G.; Arika, S.A.; LeAnder, R.W.; Stanley, R.J.; Stoecker, W.V.; Hagerty, J.R., "Using Adaptive Thresholding and Skewness Correction to Detect Gray Areas in Melanoma In Situ Images," IEEE Transactions on Instrumentation and Measurement, Vol. 61, No. 7, pp. 1839-1847, July 2012.
[2] Bovy, J. et al., "Photometric redshifts and quasar probabilities from a single, data-driven generative model," The Astrophysical Journal, Vol. 749, No. 1, April 2012.
[3] Alegria, F.C.; da Silva, H.P., "Choosing Between Terminal and Independently Based Gain and Offset Error in the ADC Histogram Test," IEEE Transactions on Instrumentation and Measurement, Vol. 61, No. 1, pp. 9-16, January 2012.
[4] Lee, Q.Y. et al., "Multivariate classification of systemic vascular resistance using photoplethysmography," Physiological Measurement, Vol. 32, No. 8, August 2011.
[5] Thomas, G.; Flores-Tapia, D.; Pistorius, S., "Histogram Specification: A Fast and Flexible Method to Process Digital Images," IEEE Transactions on Instrumentation and Measurement, Vol. 60, No. 5, pp. 1565-1578, May 2011.
[6] De Decker, A.; François, D.; Verleysen, M.; Lee, J.A., "Mode estimation in high-dimensional spaces with flat-top kernels: Application to image denoising," Neurocomputing, Vol. 74, No. 9, pp. 1402-1410, April 2011.
[7] Zagorodnov, V.; Ciptadi, A., "Component Analysis Approach to Estimation of Tissue Intensity Distributions of 3D Images," IEEE Transactions on Medical Imaging, Vol. 30, No. 3, pp. 838-848, March 2011.
[8] Krstinic, D.; Skelin, A.K.; Slapnicar, I., "Fast two-step histogram-based image segmentation," IET Image Processing, Vol. 5, No. 1, pp. 63-72, February 2011.
[9] Aldhubaib, F.; Lui, H.-S.; Shuley, N.V.; Al-Zayed, A., "Aspect segmentation and feature selection of radar targets based on average probability of error," IET Microwaves, Antennas & Propagation, Vol. 4, No. 10, pp. 1654-1664, October 2010.
[10] Akgul, C.B.; Sankur, B.; Yemez, Y.; Schmitt, F., "3D Model Retrieval Using Probability Density-Based Shape Descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 6, pp. 1117-1133, June 2009.
[11] Nadaraya, E., "On Estimating Regression," Theory of Probability & Its Applications, Vol. 9, No. 1, pp. 141-142, 1964.
[12] Azzalini, A., "A note on the estimation of a distribution function and quantiles by a kernel method," Biometrika, Vol. 68, No. 1, pp. 326-328, 1981.
[13] Silverman, B.W., Density Estimation for Statistics and Data Analysis, Chapman & Hall/CRC, 1986.
[14] Provost, S.B.; Jiang, M., "Orthogonal Polynomial Density Estimates: Alternative Representation and Degree Selection," International Journal of Computational and Mathematical Sciences, pp. 17-24, 2012.
[15] Sturges, H.A., "The choice of a class interval," Journal of the American Statistical Association, pp. 65-66, 1926.
[16] Doane, D.P., "Aesthetic frequency classification," American Statistician, Vol. 30, pp. 181-183, 1976.
[17] Scott, D.W., "On optimal and data-based histograms," Biometrika, Vol. 66, No. 3, pp. 605-610, 1979.
[18] Comaniciu, D.; Meer, P., "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 603-619, 2002.
[19] Kaplan, W., Advanced Calculus, 4th edition, Addison-Wesley, 2002.
[20] Härdle, W.; Müller, M.; Sperlich, S.; Werwatz, A., Nonparametric and Semiparametric Models, Springer, 2004.