AN ADJUSTED BOXPLOT FOR SKEWED DISTRIBUTIONS

Recommend Documents

An Adjusted Boxplot for Skewed Distributions - K.U.Leuven

Oct 30, 2007 - probability at the lognormal distribution. Note that for the normal .... exponential model is analogous to that of the linear case. At the exponential.

Approximating Independent Loss Distributions with an Adjusted ...

The growth of the Single Tranche synthetic CDO (STCDO) market has been greatly assisted by the existence of pricing models which are flexible enough to fit ...

A Constructive Representation of Univariate Skewed Distributions

As examples, the Normal and Logistic distributions have MG = â while the heavier tailed tÎ½ has. MG = Î½. ..... distribution is always unimodal with mode at y = log.

Linear Time Sorting of Skewed Distributions

the time to build word-based Hu man codes. 1 Introduction. Sorting a generic list of numbers is a well stud- ied problem that can be e ciently solved by using.

conditionally specified multivariate skewed distributions - CiteSeerX

The basic skew normal model f(x; Î») = 2Ï(x)Î¦(Î»x), where Ï (respectively Î¦) is the standard normal density (respectively distribution), may be used as a.

Skewed Reflected Distributions Generated by the

exp(â|z|/Î¸) [1 + sgn(Î»z)(1 â exp(â|Î»z|Î±))] , z,Î» â R , Î±,Î¸ > 0. From formula 3.381(4) in Gradshteyn and Ryzhik (1965) we obtain the k-th moment of the distribution ...

Skewed Wealth Distributions - Department of Economics - NYU

above the "bliss point," marginal utility can become negative, creating complications. For an ...... https://sites.googl

Skewed Wealth Distributions - Department of Economics - NYU

In all places and all times, the distribution of income remains the same. Nei- ther institutional change ... constant of

Penultimate gamma approximation in the CLT for skewed distributions

and include an example concerning the noncentral chi-squared distribution. Finally, in Section 4, we present two illustrative examples concerning sums of ...

Skewed alpha-stable distributions for modeling and ... - Tubitak Journals

Urla, 35430, ËIzmir-TURKEY e-mail: [email protected]. Received: 09.02.2011. Abstract. Music information retrieval and particularly musical instrument ...

Bayesian Inference for Shape Mixtures of Skewed Distributions, with

We introduce a class of shape mixtures of skewed distributions and .... (4) can be applied in the Bayesian specification of the SN model for a ..... Some multivariate extensions of the skew-generalized-normal pdf in (12) are .... Moreover, Sahu, Dey

Control Charts for Skewed Distributions: Weibull, Gamma, and ...

vations on lifetimes in accelerated life test samples are often skewed. When the quality variable ... nearly symmetric to highly skewed. Based on the simulation ...

Sample size calculations for skewed distributions - Springer Link

the sample size may result in the actual power differing from the nominal value. Methods have been proposed for the specific cases of logistic [11-14] or Poisson ...

Bayesian Inference for Shape Mixtures of Skewed Distributions, with ...

consists of skew-symmetric (SS) distributions with probability density function (pdf) of the form ...... âBayesian nonparametric meta-analysis using Polya tree ...

An ECM algorithm for Skewed Multivariate Variance

Apr 9, 2015 - Madan and Seneta (1987) applied the ML approach using characteristic ... It is well known that the VG distribution has normal mean-variance mixtures rep- .... distribution with the probability density function (pdf) given by: fV G(y) =

A reflection on outlier detection and skewed distributions - DiVA portal

Oct 20, 2017 - A reflection on outlier detection and skewed distributions. Kenneth ... outliers) observations deviated to some extent from the Normal distribution. As stated in ... work, even after application of a suitable transformation.â (Carlin

Skewed Wealth Distributions: Theory and Empirics - Department of ...

F. S. Fitzgerald: The rich are different from you and me. ... properties of distributions of wealth from the mechanics o

a new class of skewed multivariate distributions with applications to ...

Then we use this new class of distributions in a multivariate regression context and propose .... As an illustration, assuming a common fj(Â·) = f(Â·), j = 1,...,m, the ...

Delivering Left-Skewed Portfolio Payoff Distributions in the ... - MDPI

Aug 21, 2015 - Keywords: portfolio management; payoff distributions; pension ... Providing a dynamic portfolio investor with a payoff distribution overcomes.

Delivering Left-Skewed Portfolio Payoff Distributions in the ... - MDPI

Aug 21, 2015 - transaction fees are required for the cautious-relaxed strategy to lose .... A âsemi" analytic solution could be obtained by a functional expansion.

Symmetries and structure of skewed and double distributions

Oct 23, 1998 - The structure of the integral (5) relating double distributions with the usual ones has a simple graphic illustration. (see Fig.3): integrating DDs ...

On Skewed Multi-dimensional Distributions: the FusionRP ... - UCR CS

for the case of Bob) that are very skewed. Most real-life data are skewed. ..... For Î±, we use Newton's method to find the Maxi- mum Likelihood Estimator (MLE), ...

Parton Skewed Distributions in the Pion and Quark-Hadron Duality

Alexander P. Bakulev, Rusko Ruskov â. Bogolyubov Laboratory of Theoretical Physics, JINR, 141980 Dubna, Russia and. Klaus Goeke, N. G. Stefanis.

Impact of Less Skewed Distributions on Efficiency and Effectiveness of ...

(1) Fondazione Bruno Kessler (FBK-irst), via Sommarive 18, I-38100 Povo, Trento .... instances Blum and Langley (1997) pointed out that as learning progresses ...

AN ADJUSTED BOXPLOT FOR SKEWED DISTRIBUTIONS

Download PDF

32 downloads 477 Views 843KB Size Report

Comment

However, when the data are skewed, usually too many points are classified ..... A frequently used graphical tool to analyze a univariate data set is the boxplot.

COMPSTAT’2004 Symposium

c Physica-Verlag/Springer 2004

AN ADJUSTED BOXPLOT FOR SKEWED DISTRIBUTIONS Ellen Vanderviere and Mia Huber Key words: Boxplot, skewness, medcouple. COMPSTAT 2004 section: Graphics. Abstract: The boxplot is a very popular graphical tool to visualize the distribution of continuous univariate data. First of all, it shows information about the location and the spread of the data by means of the median and the interquartile range. The length of the whiskers on both sides of the box and the position of the median within the box are helpful to detect possible skewness in the data. Finally, observations that fall outside the whiskers are pinpointed as outliers, hence the boxplot also includes information from the tails. However, when the data are skewed, usually too many points are classified as outliers. This is because the outlier rule is solely based on measures of location and scale, and the cutoff values are derived from the normal distribution. We present a generalization of the boxplot that includes a robust measure of skewness in the determination of the whiskers. We show with several simulation results that this adjusted boxplot gives a more accurate representation of the data and of possible outliers.

1

Introduction

One of the most frequently used graphical techniques for analyzing a univariate data set is the boxplot, proposed by Tukey [6]. If Xn = {x1 , x2 , . . . , xn } is a univariate data set, the boxplot is constructed by • putting a line at the height of the sample median Q2 • drawing a box from the first quartile Q1 to the third quartile Q3 • classifying all points outside the interval [Q1 − 1.5 IQR ; Q3 + 1.5 IQR]

(with IQR = Q3 − Q1 )

as outlier and marking them on the plot • drawing the whiskers (i.e. the lines that go from the ends of the box to the most remote points that are no outliers) This construction implies that a boxplot gives information about the location, spread, skewness and tails of the data. However, in some cases the information about the tails that is given by the boxplot, is not reliable. As was already mentioned in Hoaglin, Mosteller

1934

Ellen Vandervieren and Mia Hubert

and Tukey [3], too many points are classified as outlier when the data are sampled from a skewed distribution. Consider e.g. the chi-squared distribution with df degrees of freedom, which is very skewed for small df . Table 1 lists the theoretical lower whisker and the upper whisker for df = 1, 5 and 20. The last column gives the probability of a type I error, which we define as the probability to exceed the whiskers, or equivalently, to mark regular observations as outliers. We see that this probability is about 7.6% for χ21 which is very high, and decreases with df . For the normal distribution on the other hand, the expected percentage of outliers is only 0.7%. Distr. χ21 χ25 χ220 N(0,1)

Lower outlier cutoff -1.730 -3.252 2.888 -2.698

Upper outlier cutoff 3.155 12.552 36.392 2.698

Total % outliers 7.58 2.80 1.39 0.70

Table 1: Theoretical lower and upper outlier cutoff values for several distributions, and the expected percentage of classified outliers according to the boxplot rule.

The large discrepancy between these percentages is caused by the fact that the outlier rule is solely based on measures of location and scale, and the cutoff values are derived from the normal distribution. We present a generalization of the boxplot that includes a robust measure of skewness in the determination of the whiskers. To construct this adjusted boxplot we will derive new outlier rules at the population level. To draw the boxplot at a particular data set, we then just need to plug in the finite-sample estimates.

2 2.1

Generalization of the boxplot A robust measure of skewness

To measure the skewness of a continuous distribution F , we will use the medcouple, which we denote as MC. It is defined as MC(F ) =

med

x1