This article was downloaded by: [Imperial College] On: 25 February 2010 Access details: Access Details: [subscription number 758844008] Publisher Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 3741 Mortimer Street, London W1T 3JH, UK

Communications in Statistics - Theory and Methods

Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713597238

A method for estimating Lorenz curves

Enrique Castillo a; Ali S. Hadi b; José María Sarabia c a Department of Applied Mathematics and Computational Sciences, University of Cantabria, Santander, Spain b Department of Statistical Sciences, Cornell University, Ithaca, NY, USA c Department of Economics, University of Cantabria, Santander, Spain

To cite this Article Castillo, Enrique, Hadi, Ali S. and Sarabia, José María(1998) 'A method for estimating lorenz curves',

Communications in Statistics - Theory and Methods, 27: 8, 2037 — 2063 To link to this Article: DOI: 10.1080/03610929808832208 URL: http://dx.doi.org/10.1080/03610929808832208


COMMUN. STATIST.-THEORY METH., 27(8), 2037-2063 (1998)

A METHOD FOR ESTIMATING LORENZ CURVES

Enrique Castillo
Department of Applied Mathematics and Computational Sciences, University of Cantabria, 39005 Santander, Spain

Ali S. Hadi
Department of Statistical Sciences, Cornell University, 358 Ives Hall, Ithaca, NY 14853-3901, USA

Jose Maria Sarabia
Department of Economics, University of Cantabria, 39005 Santander, Spain

Key Words: Elemental percentile method, Gini index, Income inequality measures, Non-linear least-squares, Parameter estimation, Pareto distribution, Pietra index, Robust estimation, Spanish Family Income.

ABSTRACT This paper proposes a new method for estimating the parameters of Lorenz Curves (LC's) and fitting LC's to observed data. The method is very general. It is applicable to any family of LC's as long as it is given in closed form which is often the case in practice. The method can also be applied to either the LC or to its associated distribution. The estimators are easy to compute as they are obtained one at a time by solving only one equation in one unknown and in many cases the solutions are given in closed-forms. An additional advantage, that is not shared with the currently used method of estimation, is that the method is invariant as to the specification of which variable is written as a function of the other in the LC form. The method is applied to the most commonly suggested LC's families. An example of real-life data is used to illustrate the methodology. A simulation study is performed to study the properties of the proposed estimators and to compare them with existing ones. The results seem to indicate that the proposed estimators have good properties and they often perform much better than the existing ones.

Copyright © 1998 by Marcel Dekker, Inc.

1 INTRODUCTION


Let $F(x; \theta)$ be the cumulative distribution function (cdf) of a continuous random variable $X$ with support in a subset of the non-negative real numbers and with a finite expectation $\mu = \int x f(x; \theta)\,dx$, where $f(x; \theta)$ is the corresponding probability density function. Here $\theta$ is an unknown, possibly vector-valued, parameter. Associated with $F(x; \theta)$ is the function

$$L(p; \theta) = \frac{1}{\mu} \int_0^p F^{-1}(y; \theta)\,dy, \quad 0 \le p \le 1, \tag{1}$$

where $F^{-1}(y; \theta) = \sup\{x : F(x; \theta) \le y\}$. The function $L(p; \theta)$ is known as the Lorenz Curve (LC) associated with $F(x; \theta)$ (Gastwirth, 1971). For simplicity of notation we shall sometimes write $F(x)$ and $L(p)$ instead of $F(x; \theta)$ and $L(p; \theta)$, respectively.

A Lorenz curve is usually used in practice to study income and wealth distributions and, for that matter, the distributions of other non-negative random variables. A Lorenz curve can be thought of as a function $q = L(p)$ that gives the proportion $q$ of the total income received by the proportion $p$ of units whose income is at most $x$. From a given LC, one can derive measures of income inequality such as the Pietra and Gini indexes. Lorenz curves are particularly useful in this regard because they can be used for fitting grouped data (Slottje, 1990), a form usually used to summarize income data. A range of applications of Lorenz curves in economics can be found in Kakwani (1977) and Ryu and Slottje (1996).

It can be seen from (1) that the function $L(p)$ satisfies the following conditions:

$$L(0) = 0, \quad L(1) = 1, \quad L'(0^+) \ge 0, \quad L''(p) \ge 0, \tag{2}$$

where $L'(p)$ and $L''(p)$ are the first and second derivatives of $L(p)$ with respect to $p$. These four conditions are necessary and sufficient for any continuous function on $[0, 1]$ to be a LC. As given by the following theorem, a LC characterizes its cdf.

Theorem 1 Let $L(p)$ be any continuous function satisfying the conditions in (2). If $L''(p)$ exists and is positive everywhere in an interval $(a, b)$, then the associated density function is given by $f(x) = [\mu L''(F(x))]^{-1}$, where $\mu$ is the mean of $X$.

Accordingly, if $F(x)$ is given, the associated $L(p)$ can be obtained using (1); see, for example, Arnold (1983) and McDonald (1984) and the references therein. Conversely, if $L(p)$ is given, the associated $f(x)$ can be derived, at least implicitly, using Theorem 1. This fact has led some authors to start by defining a functional form for an LC which satisfies the conditions in (2), and then to derive the corresponding cdf $F(x)$ using Theorem 1. Parametric models for LC's have been proposed by Kakwani and Podder (1973), Rasche et al. (1980), Pakes (1981), Aggarwal (1984), Aggarwal and Singh (1984), Arnold (1986), Villaseñor and Arnold (1989), Ortega et al. (1991), Chotikapanich (1993) and Sarabia (1997). Arnold et al. (1987) generate, from any strongly unimodal density, a one-parameter family of LC's. The corresponding family is also Lorenz ordered with respect to the indexing parameter.


Basmann et al. (1990) considered several descriptive parametric approximations of empirical LC's which generalized the Kakwani and Podder proposal. More recently, Ryu and Slottje (1996) introduced two flexible-form approaches to approximate Lorenz curves. Their method is based on maximum entropy estimation. Unfortunately, very few proposals allow for an explicit expression of both the LC and the corresponding density or cdf (Villaseñor and Arnold, 1989).

The choice of methods for estimating the parameter $\theta$ depends on two considerations: (a) whether or not the functional form of $f(x)$ is known explicitly, and (b) the form of the available data (whether the data are grouped or ungrouped). When the form of $f(x)$ is known and a set of ungrouped data is available, classical estimation methods, such as maximum likelihood and the method of moments, could be utilized to estimate $\theta$. Alternatively, the associated LC could be derived and the parameters estimated by fitting the LC to the data. It is noteworthy that, apart from a scale transformation, there is a one-to-one correspondence between the Lorenz curve, $L(p)$, and the distribution function $F(x)$. This implies that from the data $(p_i, q_i)$, $i = 1, \ldots, n$, it is not possible to estimate $F(x)$. In this paper we address the problem of the direct estimation of $L(p)$.

On the other hand, if $f(x)$ is not expressed explicitly or if the available data set is grouped, the LC can be used to estimate $\theta$. In this paper we use the elemental percentile method (EPM) suggested by Castillo and Hadi (1995) to estimate $\theta$. This method can be applied to both grouped and ungrouped data. But since most income data are available in grouped form, we shall give a detailed treatment of the application of the proposed method to grouped data.

The rest of this article is organized as follows. Section 2 gives a brief description of the proposed estimation method. Applications of the method to several of the LC's proposed in the literature are given in Section 3. A real-life data set is used in Section 4 to illustrate the method and to compare it with the currently favored estimation methods. Further evaluations and comparisons are performed through a simulation experiment in Section 5. Section 6 concludes with some remarks. Finally, some technical results required for computational purposes are given in the appendix.

2 THE PROPOSED METHOD

We advocate the use of the elemental percentile method to estimate the parameters using either $F(x; \theta)$ or $L(p; \theta)$. This method consists of two stages: in the first stage a set of initial estimates of the parameters is obtained, and in the second stage these estimates are combined in a suitable way to obtain the final estimates of the parameters.

2.1 Initial Estimates Using $F(x; \theta)$

Let $x_{1:n}, \ldots, x_{n:n}$ be the observed order statistics of a random sample of size $n$ obtained from $F(x; \theta)$. The method starts by equating the cdf evaluated at the observed order statistics to their corresponding percentile values and then using the resulting equations as a basis for obtaining initial estimates of the parameters. Let $I = \{i_1, \ldots, i_r\}$ be a set of $r$ distinct indices, where $r$ is the number of parameters in $\theta$ and $i_j \in \{1, 2, \ldots, n\}$, $j = 1, 2, \ldots, r$. Then, equating the cdf evaluated at the observed order statistics to their corresponding percentile values, we have

$$F(x_{i:n}; \theta) = p_{i:n}, \quad i \in I, \tag{3}$$


where $p_{i:n}$ is a suitable plotting position, e.g., $p_{i:n} = (i - a)/(n - b)$, for some values of $0 \le a \le 1$ and $0 \le b \le 1$. The system in (3) is a set of $r$ independent equations in $r$ unknowns, $\theta_1, \theta_2, \ldots, \theta_r$. An initial estimate of $\theta$ can then be obtained by solving (3) for $\theta$. To estimate the $j$th parameter, $\theta_j$, we eliminate all other parameters but $\theta_j$, and obtain one equation in one parameter which can then be easily solved, either in closed form or by using the bisection method. The other parameters are then obtained one at a time in a similar way, or by back substitution in the elimination process.

Example 1 The Pareto Distribution. The Pareto cdf $F(x; \alpha, \sigma) = 1 - (x/\sigma)^{-\alpha}$ depends on two parameters, with $x > \sigma > 0$ and $\alpha > 0$. Here $\theta = \{\alpha, \sigma\}$ and $r = 2$. We therefore choose two distinct observations $x_{i:n}$ and $x_{j:n}$. Accordingly, (3) becomes:

$$1 - \left(\frac{x_{i:n}}{\sigma}\right)^{-\alpha} = p_{i:n}, \qquad 1 - \left(\frac{x_{j:n}}{\sigma}\right)^{-\alpha} = p_{j:n}. \tag{4}$$

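After arranging terms, these equations can be written as

$$\left(\frac{x_{i:n}}{\sigma}\right)^{-\alpha} = 1 - p_{i:n}, \qquad \left(\frac{x_{j:n}}{\sigma}\right)^{-\alpha} = 1 - p_{j:n}. \tag{5}$$

By dividing the first by the second equation in (5) we eliminate $\sigma$ and obtain

$$\left(\frac{x_{i:n}}{x_{j:n}}\right)^{-\alpha} = \frac{1 - p_{i:n}}{1 - p_{j:n}}, \tag{6}$$

from which an initial estimate of $\alpha$ is obtained as

$$\hat{\alpha}_{ij} = \frac{\log(1 - p_{j:n}) - \log(1 - p_{i:n})}{\log(x_{i:n}) - \log(x_{j:n})}. \tag{7}$$

This estimate can then be substituted in (5), which yields an associated initial estimate of $\sigma$ given by

$$\hat{\sigma}_{ij} = x_{i:n} (1 - p_{i:n})^{1/\hat{\alpha}_{ij}}. \tag{8}$$

Note that estimators (7) and (8) are consistent. Thus, $\hat{\alpha}_{ij}$ and $\hat{\sigma}_{ij}$ are initial estimators of $\alpha$ and $\sigma$ obtained from the two order statistics $x_{i:n}$ and $x_{j:n}$. Note that, in a sample of size $n$, there are $n(n-1)/2$ possible initial estimates. These estimates are combined in the second stage to obtain final estimates of the parameters (see Section 2.3).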
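As a hedged illustration of Example 1, the following Python sketch computes one pair of initial Pareto estimates from two order statistics and their plotting positions. The function name `pareto_elemental` and the noise-free test values are assumptions made for illustration only; they are not taken from the paper.

```python
import math


def pareto_elemental(x_i, p_i, x_j, p_j):
    """Initial (alpha, sigma) from two order statistics and plotting positions."""
    # Ratio of the two percentile equations eliminates sigma.
    alpha = (math.log(1 - p_j) - math.log(1 - p_i)) / (math.log(x_i) - math.log(x_j))
    # Back-substitute alpha into the first percentile equation.
    sigma = x_i * (1 - p_i) ** (1 / alpha)
    return alpha, sigma


# Exact Pareto quantiles for alpha = 2, sigma = 1: x = sigma * (1 - p) ** (-1 / alpha).
x_i = (1 - 0.25) ** (-1 / 2)
x_j = (1 - 0.75) ** (-1 / 2)
alpha_hat, sigma_hat = pareto_elemental(x_i, 0.25, x_j, 0.75)
```

With exact quantiles the pair recovers the generating parameters; with real data each pair gives a different initial estimate, which is why the second stage combines them.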

2.2 Initial Estimates Using $L(p; \theta)$

Suppose now we wish to estimate $\theta$ by fitting a LC, $q = L(p; \theta)$, to the data. The observations corresponding to $L(p; \theta)$ consist of $n$ pairs of points $(p_1, q_1), \ldots, (p_n, q_n)$, where $p_i = i/n$, $q_i = s_i/s_n$, and $s_i = x_{1:n} + \cdots + x_{i:n}$ for $i = 1, \ldots, n$. Thus $s_i$ is the sum of the first $i$ order statistics. Using the EPM, we choose a set of $r$ distinct indices $I = \{i_1, \ldots, i_r\}$ and set

$$q_i = L(p_i; \theta), \quad i \in I, \tag{9}$$


or, equivalently,

$$p_i = L^{-1}(q_i; \theta), \quad i \in I. \tag{10}$$

An initial estimate of $\theta$ can then be obtained by solving (9) or (10) for $\theta$ in a similar way as we did in Section 2.1. Section 3 gives several illustrative examples of this. Note that the initial estimators are the same whether we use (9) or (10). This means that the method is invariant as to the specification of which variable is written as a function of the other. The least-squares method, which is the currently used method of estimation, does not have this property because the estimators obtained from minimizing $\sum_{i=1}^{n} [q_i - L(p_i; \theta)]^2$ are not the same as the ones obtained from minimizing $\sum_{i=1}^{n} [p_i - L^{-1}(q_i; \theta)]^2$.
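The data points $(p_i, q_i)$ used throughout this section can be computed from a raw sample as in the following minimal Python sketch. The function name `lorenz_points` and the toy sample are illustrative assumptions.

```python
def lorenz_points(sample):
    """Return [(p_i, q_i)] with p_i = i/n and q_i = s_i/s_n from a raw sample."""
    xs = sorted(sample)          # order statistics x_{1:n} <= ... <= x_{n:n}
    n = len(xs)
    total = sum(xs)              # s_n
    points = []
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x             # s_i, the sum of the first i order statistics
        points.append((i / n, running / total))
    return points


pts = lorenz_points([4, 1, 3, 2])
```

For the toy sample the partial sums are 1, 3, 6 and 10, so the cumulative income shares are 0.1, 0.3, 0.6 and 1.0.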

2.3 Final Estimates

The initial estimates obtained either from $F(x; \theta)$ or $L(p; \theta)$ depend on $r$ observations. For large values of $n$ and $r$, the number of all possible initial estimates that can be obtained from a sample of size $n$ may be too large to be computationally feasible. In such cases, instead of computing all possible initial estimates, one may select a pre-specified number, $N$, of elemental subsets either systematically, based on some theoretical considerations, or at random. For each of these subsets, an initial estimate of $\theta$ is computed. Let us denote these initial estimates by $\hat{\theta}_{j1}, \hat{\theta}_{j2}, \ldots, \hat{\theta}_{jN}$, $j = 1, 2, \ldots, r$. These elemental estimates can then be combined, using some suitable (preferably robust) functions, to obtain an overall final estimate of $\theta$. Examples of robust functions include the median (MED) and the least median of squares (LMS) of Rousseeuw (1984). Thus, a final estimate of $\theta_j$ can be defined as

$$\hat{\theta}_j(\mathrm{MED}) = \mathrm{median}(\hat{\theta}_{j1}, \hat{\theta}_{j2}, \ldots, \hat{\theta}_{jN}), \tag{11}$$

$$\hat{\theta}_j(\mathrm{LMS}) = \mathrm{LMS}(\hat{\theta}_{j1}, \hat{\theta}_{j2}, \ldots, \hat{\theta}_{jN}), \tag{12}$$

where $\mathrm{median}(a_1, \ldots, a_n)$ is the median of $\{a_1, \ldots, a_n\}$, and $\mathrm{LMS}(a_1, a_2, \ldots, a_n)$ is the estimate obtained using the LMS method, which in this case is equal to the midpoint of the shortest interval containing half of the numbers $a_1, a_2, \ldots, a_n$ (see Rousseeuw and Leroy (1987), p. 169), that is, $\hat{\theta}(\mathrm{LMS}) = (a_{k:n} + a_{(k+h):n})/2$, where $h = [n/2] + 1$, $[a]$ being the integer part of $a$, and $k$ is the index satisfying

$$a_{(k+h):n} - a_{k:n} = \min_j \left\{ a_{(j+h):n} - a_{j:n} \right\}. \tag{13}$$

In some cases, alternatives to (11) and (12) may be available (see Example 5 below). These estimators can be used for the calculation of the standard income inequality measures such as the Gini index, which is defined by

$$G = 1 - 2 \int_0^1 L(p; \theta)\,dp. \tag{14}$$
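The combination step of Section 2.3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `lms_location` is hypothetical, and the sketch assumes the Rousseeuw and Leroy shortest-half convention in which the window holds $h = [n/2] + 1$ consecutive order statistics (indexing conventions for the half-sample differ between references).

```python
import statistics


def lms_location(values):
    """Midpoint of the shortest window holding h = n // 2 + 1 sorted values."""
    a = sorted(values)
    n = len(a)
    h = n // 2 + 1
    # Index of the window [a[k], a[k + h - 1]] with the smallest width.
    k = min(range(n - h + 1), key=lambda j: a[j + h - 1] - a[j])
    return (a[k] + a[k + h - 1]) / 2


# Five hypothetical initial estimates, one of them a gross outlier.
initial = [1.9, 2.0, 2.05, 2.3, 9.0]
med = statistics.median(initial)
lms = lms_location(initial)
```

Both summaries ignore the outlying value 9.0, whereas the arithmetic mean of the five values (3.45) would be pulled well away from the bulk of the estimates.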

3 APPLICATIONS TO LORENZ CURVES

We now apply the method of Section 2 to several of the LC's proposed in the literature. These LC's are grouped according to the number of their parameters.

3.1 One-Parameter Lorenz Curves


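Example 2 The Pareto LC. The well-known classical Pareto LC,

$$L_{CP}(p; \alpha) = 1 - (1 - p)^{(\alpha - 1)/\alpha}, \tag{15}$$

corresponds to the Pareto cdf, $F(x; \alpha, \sigma) = 1 - (x/\sigma)^{-\alpha}$, $\alpha > 1$, $x > \sigma > 0$. Note that although $F(x; \alpha, \sigma)$ depends on two parameters, $L_{CP}(p; \alpha)$ depends on only one parameter $\alpha$. Letting $q = L_{CP}(p; \alpha)$, (15) can be expressed as

$$\log(1 - q) = \beta \log(1 - p), \tag{16}$$

where $\beta = (\alpha - 1)/\alpha$. Equation (16) describes a straight line through the origin relating $\log(1 - q)$ to $\log(1 - p)$. Thus, a least-squares like estimate of $\beta$ can be obtained by minimizing

$$\sum_{i=1}^{n} \left[ \log(1 - q_i) - \beta \log(1 - p_i) \right]^2, \tag{17}$$

which gives

$$\hat{\beta}(\mathrm{LSE}) = \frac{\sum_{i=1}^{n} \log(1 - p_i) \log(1 - q_i)}{\sum_{i=1}^{n} [\log(1 - p_i)]^2}. \tag{18}$$

An estimate of $\alpha$ is then obtained as $\hat{\alpha}(\mathrm{LSE}) = (1 - \hat{\beta}(\mathrm{LSE}))^{-1}$. To estimate $\alpha$ using the EPM, we have $r = 1$, hence we need only one data point $(p_i, q_i)$ to compute an initial estimate of $\alpha$. Substituting $(p_i, q_i)$ for $(p, q)$ in (15), we obtain $q_i = 1 - (1 - p_i)^{(\alpha - 1)/\alpha}$, which gives the initial estimates

$$\hat{\alpha}_i = \frac{\log(1 - p_i)}{\log(1 - p_i) - \log(1 - q_i)}. \tag{19}$$

These initial estimates are then combined as in (11) or (12) to obtain a final estimate of $\alpha$.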
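To make Example 2 concrete, here is a small Python sketch that computes both the least-squares estimate and the median of the one-point EPM estimates for the Pareto LC. The function names and the noise-free test points are assumptions for illustration; the point $p = 1$ is left out because $\log(1 - p)$ is undefined there.

```python
import math
import statistics


def pareto_lc(p, alpha):
    """Classical Pareto Lorenz curve."""
    return 1 - (1 - p) ** ((alpha - 1) / alpha)


def pareto_lc_initial(p, q):
    """One-point EPM estimate of alpha from a single (p, q)."""
    lp, lq = math.log(1 - p), math.log(1 - q)
    return lp / (lp - lq)


def pareto_lc_lse(points):
    """Slope of the through-origin fit of log(1-q) on log(1-p), mapped back to alpha."""
    num = sum(math.log(1 - p) * math.log(1 - q) for p, q in points)
    den = sum(math.log(1 - p) ** 2 for p, q in points)
    beta = num / den
    return 1 / (1 - beta)


points = [(p, pareto_lc(p, 2.0)) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
alpha_med = statistics.median(pareto_lc_initial(p, q) for p, q in points)
alpha_lse = pareto_lc_lse(points)
```

On exact data both estimators return the generating value. Note that the mapping $\alpha = 1/(1 - \beta)$ is very steep when the fitted slope is close to 1, which is one plausible reason a slope-based estimate can become unstable.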

Example 3 The Chotikapanich LC. Chotikapanich (1993) proposed the one-parameter LC,

$$L(p; \alpha) = \frac{e^{\alpha p} - 1}{e^{\alpha} - 1}, \quad \alpha > 0, \tag{20}$$

and the non-linear least-squares method to estimate the parameter $\alpha$, that is, the value of $\alpha$ that minimizes

$$\sum_{i=1}^{n} \left[ q_i - \frac{e^{\alpha p_i} - 1}{e^{\alpha} - 1} \right]^2 \tag{21}$$

is used to estimate $\alpha$. We denote this estimate by $\hat{\alpha}(\mathrm{LSE})$. Using the EPM to estimate $\alpha$, we substitute $(p_i, q_i)$ for $(p, q)$ in (20) and obtain

$$q_i = \frac{e^{\alpha p_i} - 1}{e^{\alpha} - 1}, \tag{22}$$

which implies that an initial estimate of $\alpha$ is the zero-root of the function

$$h_i(\alpha) = q_i (e^{\alpha} - 1) - (e^{\alpha p_i} - 1). \tag{23}$$

This equation can be solved for $\alpha$ using the bisection method. The bisection method requires knowledge of lower and upper bounds for the solution. These bounds are given in the appendix. Once the initial estimates are computed, they are combined as in (11) or (12) to obtain a final estimate of $\alpha$.
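The bisection step of Example 3 can be sketched in Python as below. This is a hedged illustration: it assumes the Chotikapanich form $L(p; \alpha) = (e^{\alpha p} - 1)/(e^{\alpha} - 1)$, and because the bounds from the paper's appendix are not reproduced here, it uses an ad hoc bracket that starts at $[10^{-8}, 1]$ and doubles the upper end until the root is enclosed. The function names are hypothetical.

```python
import math


def chotikapanich_lc(p, alpha):
    """Chotikapanich Lorenz curve, using expm1 for accuracy at small alpha."""
    return math.expm1(alpha * p) / math.expm1(alpha)


def chotikapanich_initial(p, q, lo=1e-8, hi=1.0, tol=1e-10):
    """Solve L(p; alpha) = q for alpha by bisection (requires 0 < q < p < 1)."""
    if not 0 < q < p < 1:
        raise ValueError("need 0 < q < p < 1 for a positive root")

    def gap(alpha):
        return chotikapanich_lc(p, alpha) - q   # decreasing in alpha

    while gap(hi) > 0:                          # expand until the root is bracketed
        hi *= 2
        if hi > 500:
            raise ValueError("no root found below alpha = 500")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_true = chotikapanich_lc(0.5, 3.0)
alpha_hat = chotikapanich_initial(0.5, q_true)
```

The guard on $q < p$ reflects that the curve lies below the diagonal for $\alpha > 0$; a data point on or above the diagonal has no positive solution.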

Example 4 The Strongly Unimodal LC's. Arnold et al. (1987) propose a class of LC's of the form

$$L(p; \sigma) = F(F^{-1}(p) - \sigma), \quad \sigma \ge 0, \tag{24}$$

where $F(x)$ is any strongly unimodal cdf. For example, if $F(x) = \Phi(x)$, where $\Phi(x)$ is the cdf of the standard normal distribution, then (24) is the LC corresponding to the log-normal distribution, that is, $\log(X) \sim N(0, \sigma^2)$. There is no general method for estimating $\sigma$ in (24). However, to estimate $\sigma$ using the EPM, we set $q_i = F(F^{-1}(p_i) - \sigma)$, $i = 1, \ldots, n$, which gives the initial estimate $\hat{\sigma}_i = F^{-1}(p_i) - F^{-1}(q_i)$, $i = 1, \ldots, n$. A final estimate of $\sigma$ is then obtained as in (11) or (12).
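For the log-normal case of Example 4 the initial estimates have a closed form, which the following Python sketch evaluates with the standard library's `statistics.NormalDist`. The function names and the test points are illustrative assumptions.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def lognormal_lc(p, sigma):
    """Log-normal Lorenz curve: Phi(Phi^{-1}(p) - sigma)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) - sigma)


def lognormal_initial(p, q):
    """One-point EPM estimate: Phi^{-1}(p) - Phi^{-1}(q)."""
    return STD_NORMAL.inv_cdf(p) - STD_NORMAL.inv_cdf(q)


points = [(p, lognormal_lc(p, 0.8)) for p in (0.2, 0.4, 0.6, 0.8)]
sigma_hat = median(lognormal_initial(p, q) for p, q in points)
```

The point $p = 1$ must be excluded in practice, since the normal quantile function is undefined at 0 and 1.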

3.2 Two-Parameter Lorenz Curves

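Example 5 The Hyperbolic LC. Aggarwal (1984) and Aggarwal and Singh (1984) consider the one-parameter LC (25). This form was later extended by Arnold (1986) to the hyperbolic functional form,

$$L_{HY}(p; \alpha, \beta) = \frac{p\,[1 + (\alpha - 1)p]}{1 + (\alpha - 1)p + \beta(1 - p)}. \tag{26}$$

It can be seen that (25) is obtained by setting $\beta = 1$ in (26). Note that (26) can be expressed as

$$\frac{L_{HY}(p; \alpha, \beta)}{p - L_{HY}(p; \alpha, \beta)} = \frac{1}{\beta} + \frac{\alpha}{\beta} \cdot \frac{p}{1 - p}. \tag{27}$$

This representation has led Arnold (1986) to suggest estimating $\alpha$ and $\beta$ by minimizing

$$\sum_{i=1}^{n} \left[ u_i - \frac{1}{\beta} - \frac{\alpha}{\beta} v_i \right]^2, \tag{28}$$

where $u_i = q_i/(p_i - q_i)$ and $v_i = p_i/(1 - p_i)$. The resultant estimates (29) follow from the ordinary least-squares intercept, which estimates $1/\beta$, and slope, which estimates $\alpha/\beta$, of the regression of $u_i$ on $v_i$.

To estimate $\alpha$ and $\beta$ using the EPM, we have $r = 2$, so we need two data points $(p_i, q_i)$ and $(p_j, q_j)$. From (9) and (26), we have

$$q_i = \frac{p_i\,[1 + (\alpha - 1)p_i]}{1 + (\alpha - 1)p_i + \beta(1 - p_i)}, \qquad q_j = \frac{p_j\,[1 + (\alpha - 1)p_j]}{1 + (\alpha - 1)p_j + \beta(1 - p_j)}. \tag{30}$$

After arranging terms, (30) becomes

$$\alpha x_i + \beta z_i = y_i, \qquad \alpha x_j + \beta z_j = y_j, \tag{31}$$

where $x_i = p_i(p_i - q_i)$, $z_i = q_i(p_i - 1)$, and $y_i = (p_i - 1)(p_i - q_i)$. It can be seen that (31) is a system of linear equations in $\alpha$ and $\beta$. The solution of this system,

$$\hat{\alpha}_{ij} = \frac{y_i z_j - y_j z_i}{x_i z_j - x_j z_i}, \qquad \hat{\beta}_{ij} = \frac{x_i y_j - x_j y_i}{x_i z_j - x_j z_i}, \tag{32}$$

gives the initial estimates of $\alpha$ and $\beta$ obtained from the two distinct points $(p_i, q_i)$ and $(p_j, q_j)$. Note that there are $n(n - 1)/2$ initial estimates of the form (32). These estimates (or a subset of them if $n$ is too large) can be combined as in (11) or (12). Alternatively, since the EPM equations (31) are linear in the parameters and each equation depends on only one observation, all $n$ equations can be combined in a regression like estimator. These EPM estimators, denoted by $\hat{\alpha}(\mathrm{REG})$ and $\hat{\beta}(\mathrm{REG})$, are given by

$$\hat{\alpha}(\mathrm{REG}) = \frac{\sum z_i^2 \sum x_i y_i - \sum x_i z_i \sum z_i y_i}{\sum x_i^2 \sum z_i^2 - \left(\sum x_i z_i\right)^2}, \qquad \hat{\beta}(\mathrm{REG}) = \frac{\sum x_i^2 \sum z_i y_i - \sum x_i z_i \sum x_i y_i}{\sum x_i^2 \sum z_i^2 - \left(\sum x_i z_i\right)^2}. \tag{33}$$

The results of Sections 4 and 5 indicate that the REG estimators are even better than the MED and LMS estimators.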
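The two EPM routes of Example 5, pairwise linear solves combined by the median and a single no-intercept regression over all points, can be sketched in Python as follows. This is an illustration under stated assumptions: the hyperbolic form $L(p) = p[1 + (\alpha - 1)p]/[1 + (\alpha - 1)p + \beta(1 - p)]$ is the one implied by the linear relation $\alpha x_i + \beta z_i = y_i$ with the $x_i$, $z_i$, $y_i$ defined in the text, the regression is ordinary least squares through the origin, and all function names and test values are hypothetical.

```python
from itertools import combinations
from statistics import median


def hyperbolic_lc(p, alpha, beta):
    """Hyperbolic LC implied by the linear relation alpha*x + beta*z = y."""
    return p * (1 + (alpha - 1) * p) / (1 + (alpha - 1) * p + beta * (1 - p))


def linear_terms(p, q):
    """Return (x, z, y) such that alpha * x + beta * z = y."""
    return p * (p - q), q * (p - 1), (p - 1) * (p - q)


def pair_estimate(point_i, point_j):
    """Solve the 2x2 linear system for one pair of points by Cramer's rule."""
    xi, zi, yi = linear_terms(*point_i)
    xj, zj, yj = linear_terms(*point_j)
    det = xi * zj - xj * zi
    return (yi * zj - yj * zi) / det, (xi * yj - xj * yi) / det


def reg_estimate(points):
    """No-intercept least squares of y on (x, z) over all points."""
    terms = [linear_terms(p, q) for p, q in points]
    sxx = sum(x * x for x, z, y in terms)
    szz = sum(z * z for x, z, y in terms)
    sxz = sum(x * z for x, z, y in terms)
    sxy = sum(x * y for x, z, y in terms)
    szy = sum(z * y for x, z, y in terms)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


# Noise-free points for alpha = 1.5, beta = 3.0; p = 1 is excluded (it gives 0 = 0).
points = [(p, hyperbolic_lc(p, 1.5, 3.0)) for p in (0.2, 0.4, 0.6, 0.8)]
pairs = [pair_estimate(a, b) for a, b in combinations(points, 2)]
alpha_med = median(a for a, _ in pairs)
beta_med = median(b for _, b in pairs)
alpha_reg, beta_reg = reg_estimate(points)
```

The regression route uses every point once and needs no enumeration of pairs, which is a practical advantage when $n$ is large.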


Example 6 The Kakwani and Podder LC. Kakwani and Podder (1973) propose an LC of the form (34). Taking the logarithm of both sides of (34), we obtain (35). Kakwani and Podder (1973) use (35) and linear least-squares to estimate $\alpha$ and $\beta$. Accordingly, the values of $\alpha$ and $\beta$ that minimize (36) are used to estimate $\alpha$ and $\beta$. These estimates are denoted by $\hat{\alpha}(\mathrm{LSE})$ and $\hat{\beta}(\mathrm{LSE})$. Using the EPM, $r = 2$ distinct points $(p_i, q_i)$ and $(p_j, q_j)$ are substituted in (34) for $(p, q)$, and the two equations (37) are obtained. Taking logarithms of both sides of the two equations in (37), we obtain (38), which is a linear system of two equations in $\alpha$ and $\beta$. The solution of (38) gives the initial estimates (39) of $\alpha$ and $\beta$, respectively. Final estimates are obtained as in Example 5. An alternative is to use regression estimates as in (33).

Example 7 The Mixture LC's. Two forms of mixture-type LC's are given by

$$L_{M1}(p; \alpha, \pi) = (1 - \pi)p + \pi p^{\alpha + 1}, \quad \alpha \ge 0, \; 0 \le \pi \le 1, \tag{40}$$

and

$$L_{M2}(p; \alpha, \pi) = (1 - \pi)p + \pi \left[ 1 - (1 - p)^{(\alpha - 1)/\alpha} \right], \quad \alpha > 1, \; 0 \le \pi \le 1. \tag{41}$$


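The mixture LC's (40) and (41) form part of a hierarchy of LC's which are generated from the generalized Tukey's Lambda distribution (Sarabia, 1997). The parameters $\alpha$ and $\pi$ can be estimated by the maximum likelihood method using Theorem 1. To estimate the two parameters using the EPM, we choose $r = 2$ distinct points $(p_i, q_i)$ and $(p_j, q_j)$. Then, for the first LC, $L_{M1}(p; \alpha, \pi)$, we have

$$q_i = (1 - \pi)p_i + \pi p_i^{\alpha + 1}, \qquad q_j = (1 - \pi)p_j + \pi p_j^{\alpha + 1}. \tag{42}$$

We can then substitute one of these equations into the other to eliminate $\pi$ and obtain

$$\frac{q_i - p_i}{p_i (p_i^{\alpha} - 1)} = \frac{q_j - p_j}{p_j (p_j^{\alpha} - 1)}, \tag{43}$$

which suggests that an initial estimate $\hat{\alpha}_{ij}$ can be obtained by finding the zero-root of

$$h_{ij}(\alpha) = (q_i - p_i)\, p_j (p_j^{\alpha} - 1) - (q_j - p_j)\, p_i (p_i^{\alpha} - 1). \tag{44}$$

This root can be computed by the bisection method. The lower and upper bounds for the solution are given in the appendix. This estimate is then substituted in one of the equations in (42) and the corresponding initial estimate of $\pi$ is obtained as

$$\hat{\pi}_{ij} = \frac{q_i - p_i}{p_i (p_i^{\hat{\alpha}_{ij}} - 1)}. \tag{45}$$

Similarly, for $L_{M2}(p; \alpha, \pi)$, we have

$$q_i = (1 - \pi)p_i + \pi \left[ 1 - (1 - p_i)^{(\alpha - 1)/\alpha} \right], \qquad q_j = (1 - \pi)p_j + \pi \left[ 1 - (1 - p_j)^{(\alpha - 1)/\alpha} \right]. \tag{46}$$

Eliminating $\pi$, we obtain

$$\frac{q_i - p_i}{(1 - p_i) \left[ 1 - (1 - p_i)^{-1/\alpha} \right]} = \frac{q_j - p_j}{(1 - p_j) \left[ 1 - (1 - p_j)^{-1/\alpha} \right]}, \tag{47}$$

which is a non-linear equation in $\alpha$. Thus, an initial estimate $\hat{\alpha}_{ij}$ can be obtained by solving (47) using the bisection method. The lower and upper bounds are similar to those for Equation (43). This estimate is then substituted in one of the equations in (46) to obtain the associated initial estimate of $\pi$, which is given by

$$\hat{\pi}_{ij} = \frac{q_i - p_i}{(1 - p_i) \left[ 1 - (1 - p_i)^{-1/\hat{\alpha}_{ij}} \right]}. \tag{48}$$

Final estimates are then obtained by combining the initial estimates $\hat{\alpha}_{ij}$ and $\hat{\pi}_{ij}$ as in (11) or (12).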

Example 8 The Rasche et al. (1980) LC. Rasche et al. (1980) propose a two-parameter family of LC's:

$$L(p; \alpha, \beta) = \left[ 1 - (1 - p)^{\alpha} \right]^{\beta}. \tag{49}$$

A simple method for estimating the parameters of this family of LC's does not seem to exist (Gupta, 1984). To estimate the two parameters of this family using the EPM, we first select two distinct points $(p_i, q_i)$ and $(p_j, q_j)$, substitute them in (49), and obtain

$$q_i = \left[ 1 - (1 - p_i)^{\alpha} \right]^{\beta}, \qquad q_j = \left[ 1 - (1 - p_j)^{\alpha} \right]^{\beta}. \tag{50}$$

Taking the logarithm of both sides and eliminating $\beta$, we obtain

$$\frac{\log q_i}{\log\left[ 1 - (1 - p_i)^{\alpha} \right]} = \frac{\log q_j}{\log\left[ 1 - (1 - p_j)^{\alpha} \right]}, \tag{51}$$

which is a non-linear equation in $\alpha$. An initial estimate of $\alpha$ is the zero-root of the function

$$h_{ij}(\alpha) = \log(q_i) \log\left[ 1 - (1 - p_j)^{\alpha} \right] - \log(q_j) \log\left[ 1 - (1 - p_i)^{\alpha} \right]. \tag{52}$$

The solution, $\hat{\alpha}_{ij}$, is obtained by the bisection method using the lower and upper bounds which are given in the appendix. The corresponding initial estimate of $\beta$ is then obtained as $\hat{\beta}_{ij} = \log(q_i)/\log(1 - (1 - p_i)^{\hat{\alpha}_{ij}})$. Final estimates of $\alpha$ and $\beta$ are then obtained as in Example 5.
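The two-step solve of Example 8 (bisection for $\alpha$, then a closed form for $\beta$) can be sketched in Python as follows. The bracket $[10^{-6}, 1]$ is an assumption standing in for the appendix bounds, which are not reproduced here; it presumes the usual restriction $0 < \alpha \le 1$ for this family. The helper names are hypothetical.

```python
import math


def rasche_lc(p, alpha, beta):
    """Rasche et al. Lorenz curve [1 - (1 - p)^alpha]^beta."""
    return (1 - (1 - p) ** alpha) ** beta


def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; requires func(lo) and func(hi) to differ in sign."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def rasche_initial(point_i, point_j, lo=1e-6, hi=1.0):
    """Initial (alpha, beta) from two points; beta is eliminated first."""
    (p_i, q_i), (p_j, q_j) = point_i, point_j

    def h(alpha):
        return (math.log(q_i) * math.log(1 - (1 - p_j) ** alpha)
                - math.log(q_j) * math.log(1 - (1 - p_i) ** alpha))

    alpha = bisect(h, lo, hi)
    beta = math.log(q_i) / math.log(1 - (1 - p_i) ** alpha)
    return alpha, beta


pts = [(p, rasche_lc(p, 0.6, 1.5)) for p in (0.3, 0.7)]
alpha_hat, beta_hat = rasche_initial(pts[0], pts[1])
```

The same `bisect` helper could serve the mixture LC's of Example 7, given an appropriate bracket for each root function.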

3.3 Multi-Parameter Lorenz Curves

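Example 9 The Elliptical LC. The family of elliptical LC's introduced by Villaseñor and Arnold (1989) is a flexible family for fitting income data. This family is given by

$$L(p; \alpha, \beta, \delta) = 0.5 \left[ (a - \beta p) - \sqrt{a^2 + bp + cp^2} \right], \tag{53}$$

where $a = \alpha + \beta + \delta + 1 > 0$, $b = -2a\beta - 4\delta$, $c = \beta^2 - 4\alpha < 0$, $\alpha + \delta \ge 1$, and $\delta \ge 0$. The condition $\alpha + \delta \ge 1$ is what makes (53) satisfy $L(1) = 1$, since $a^2 + b + c = (\alpha + \delta - 1)^2$. This is a three-parameter LC. Equation (53) implies that any point $(p_i, q_i)$ must satisfy $y_i = \alpha x_i + \beta z_i + \delta w_i$, $i = 1, \ldots, n$, where $y_i = q_i(1 - q_i)$, $x_i = (p_i^2 - q_i)$, $z_i = q_i(p_i - 1)$, and $w_i = (p_i - q_i)$. This is a linear function of $\alpha$, $\beta$, and $\delta$. Least-squares like estimators of $\alpha$, $\beta$, and $\delta$ can be obtained by minimizing

$$\sum_{i=1}^{n} (y_i - \alpha x_i - \beta z_i - \delta w_i)^2, \tag{54}$$

which gives

$$(\hat{\alpha}, \hat{\beta}, \hat{\delta})^T = (A^T A)^{-1} A^T y, \tag{55}$$

where $A$ is the $n \times 3$ matrix with rows $(x_i, z_i, w_i)$ and $y = (y_1, \ldots, y_n)^T$.

Using the EPM to compute an initial estimate of the three parameters, we take any three distinct observations, $\{(p_i, q_i), (p_j, q_j), (p_k, q_k)\}$, and obtain three equations in three unknowns, the solution of which yields the initial estimates

$$(\hat{\alpha}_{ijk}, \hat{\beta}_{ijk}, \hat{\delta}_{ijk})^T = A_{ijk}^{-1} y_{ijk}, \tag{56}$$

where $A_{ijk}$ and $y_{ijk}$ contain the rows of $A$ and the elements of $y$ corresponding to the three selected observations.

There are $n(n - 1)(n - 2)/6$ possible sets of initial estimates. These estimates (or a subset of them when $n$ is too large) are combined as in (11) or (12) to obtain final estimates of the parameters.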
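The three-point solve for the elliptical LC reduces to a $3 \times 3$ linear system, which the following Python sketch solves by Cramer's rule. It is a hedged illustration: the closed form used to generate test points is $L(p) = 0.5[(a - \beta p) - \sqrt{a^2 + bp + cp^2}]$ with $a$, $b$, $c$ as defined in the text, the parameter values are arbitrary choices that satisfy $a > 0$, $c < 0$, $\delta \ge 0$ and $L(1) = 1$, and all function names are hypothetical.

```python
import math


def elliptical_lc(p, alpha, beta, delta):
    """Elliptical LC: 0.5 * [(a - beta*p) - sqrt(a^2 + b*p + c*p^2)]."""
    a = alpha + beta + delta + 1
    b = -2 * a * beta - 4 * delta
    c = beta ** 2 - 4 * alpha
    return 0.5 * ((a - beta * p) - math.sqrt(a * a + b * p + c * p * p))


def linear_row(p, q):
    """Return ((x, z, w), y) with y = alpha*x + beta*z + delta*w."""
    return (p * p - q, q * (p - 1), p - q), q * (1 - q)


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def triple_estimate(points):
    """Solve the three linear equations for (alpha, beta, delta) by Cramer's rule."""
    rows, rhs = zip(*(linear_row(p, q) for p, q in points))
    base = det3(rows)
    estimates = []
    for col in range(3):
        # Replace one column of the coefficient matrix by the right-hand side.
        replaced = [tuple(rhs[r] if k == col else rows[r][k] for k in range(3))
                    for r in range(3)]
        estimates.append(det3(replaced) / base)
    return tuple(estimates)


points = [(p, elliptical_lc(p, 0.8, -0.5, 0.6)) for p in (0.2, 0.5, 0.8)]
alpha_hat, beta_hat, delta_hat = triple_estimate(points)
```

For fitting all $n$ points at once, the same rows could be passed to any linear least-squares routine instead of being solved three at a time.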

4 AN EXAMPLE

In this section we apply the methodology to a real-life data set: the Spanish Family Income (SFI) data. The distribution of the SFI in 1989 is given in Table 1. The data $(p_i, q_i)$ represent points on a LC. Five LC's are fitted to the data. The resultant estimated LC's and the corresponding estimated parameters are given in Tables 2-6. Several criteria are available for judging the goodness-of-fit. Since the currently used estimation method for the LC is least-squares, we use the sum of squared errors,

$$\mathrm{SSE} = \sum_{i=1}^{n} \left[ q_i - L(p_i; \hat{\theta}) \right]^2, \tag{57}$$

as a criterion for comparison. Clearly, this criterion favors the least-squares estimators (LSE). The SSE's are also given in Tables 2-6. The results from this example are summarized as follows.

For the Pareto LC case (Table 2), using the SSE criterion, the LSE is inferior to the LMS estimators but not to the MED estimators. The reason that the LSE is not the best according to the SSE criterion is that the quantity to be minimized, which is given in (17), is not the same as the quantity given in (57).

For the Chotikapanich LC case (Table 3), the least-squares estimators are better than the MED and LMS estimators. Of course, this result is expected because by design the least-squares estimators minimize the criterion (57) of choice. Thus, according to this criterion, no other estimator can be better than the LSE, so the SSE is not really a good criterion for comparison in the Chotikapanich LC case.

For the Kakwani and Podder LC case (Table 4), the SSE for the LSE is larger than those of the MED and LMS estimators.

For the Hyperbolic LC case (Table 5), the SSE for the LSE is smaller than those of the MED and LMS estimators, but larger than the SSE for the REG estimator. Recall that the REG is also obtained using the EPM.

Table 1: The Spanish Family Income (SFI) data.

Table 2: Estimated Pareto LC for the SFI data.

Table 3: Estimated Chotikapanich LC for the SFI data.

[Table body not recovered. Surviving entries, in source order: 0.7138; SSE, d; 0.7802, 0.0064, 2.1635; 0.7840, 0.0075, 2.1020.]

Table 4: Estimated Kakwani and Podder LC for the SFI data.

[Table body not recovered. Surviving entries, in source order: 0.7138; SSE; 0.7684, 0.0049; 0.7885; 0.7919; 0.0089; 0.0100.]

Table 5: Estimated Hyperbolic LC for the SFI data.

Table 6: Estimated Elliptical LC for the SFI data.

Finally, for the Elliptical LC case (Table 6), the SSE for the LSE is smaller than those of the MED and LMS estimators. From the above results, we can conclude that even using the SSE criterion, which favors least-squares estimators, the proposed estimators are at least as good as the least-squares estimators. In the following section, we use fairer criteria for comparing the various methods.

5 MONTE CARLO COMPARISONS

In this section we carry out a simulation experiment to study the performance of the proposed method and to compare it with the currently used methods for several of the LC's discussed in Section 3.

5.1 Design of the Experiments


The performances of the estimators depend on the following factors:

• The functional form of the LC. Due to space limitation, we consider four LC's: two one-parameter LC's (the Pareto and the Chotikapanich LC's), one two-parameter LC (the Hyperbolic LC), and the three-parameter elliptical LC. We note here that we have performed the simulation for the other LC's and the conclusions are found to be essentially the same.

• The sample size. We consider two sample sizes, $n = 10$ and $n = 20$.

• The values of the parameters. For each LC, we select the parameter values which satisfy the necessary conditions imposed on the parameters.

The data for each of these configurations are generated by (58), where $p_i \sim \mathrm{Uniform}(0, 1)$. For each case we generate 500 samples. For each parameter we compute the bias, the standard deviation, and the root mean squared error, $\mathrm{RMSE} = \sqrt{\mathrm{variance} + \mathrm{bias}^2}$. The results are obtained using programs written in the computer language C and run on a Macintosh PowerPC computer. The $p_i$ are generated using the standard uniform congruential random number generator provided by the C function rnd().
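The three summary measures reported in the simulation tables can be computed from a set of replicated estimates as in the Python sketch below. It is an illustration only: the function name `summarize` is hypothetical, and the use of the population variance (divisor equal to the number of replications) is an assumption, since the text does not say which divisor was used.

```python
import math
import statistics


def summarize(estimates, true_value):
    """Return (bias, standard deviation, RMSE) of replicated estimates."""
    mean = statistics.fmean(estimates)
    bias = mean - true_value
    variance = statistics.pvariance(estimates, mu=mean)  # divisor = len(estimates)
    return bias, math.sqrt(variance), math.sqrt(variance + bias ** 2)


# Three hypothetical replicated estimates of a parameter whose true value is 1.5.
bias, st_dev, rmse = summarize([1.0, 2.0, 3.0], 1.5)

# Consistency check of the RMSE identity on one reported row of the Pareto results
# (bias -0.0352, standard deviation 0.1540, reported RMSE 0.1579).
reported_check = math.sqrt(0.1540 ** 2 + (-0.0352) ** 2)
```

With the population variance, the RMSE computed this way equals the square root of the mean squared deviation of the estimates from the true value.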

5.2 The Pareto LC

The parameter $\alpha$ in the Pareto LC must be larger than 1 for the mean of the distribution to exist. We, therefore, consider five values of $\alpha$ (1.25, 1.50, 1.75, 2.00, 2.50). The simulation results for these cases for two sample sizes are shown in Table 7. We observe the following:

• The bias for the MED and LMS is sometimes positive and sometimes negative, an indication that the estimators could be unbiased. The bias of the LSE is negative in all cases.

Table 7: The Pareto LC: Simulation results

Block  Estimator    Bias     St. Dev.   RMSE
1      MED        -0.0352    0.1540    0.1579
1      LMS        -0.0143    0.1461    0.1468
1      LSE        -0.0241    0.1318    0.1340
2      MED        -0.0722    0.3294    0.3372
2      LMS        -0.0480    0.3916    0.3945
2      LSE        -0.0924    0.3636    0.3752
3      MED        -0.0989    0.5963    0.6045
3      LMS        -0.0930    0.7206    0.7265
3      LSE        -0.3221    2.3169    2.3392
4      MED         0.1041    1.0089    1.0143
4      LMS         0.1572    1.1041    1.1152

[Blocks are listed in source order; the columns identifying $n$ and $\alpha$ for each block, and the remaining rows of the table, were not recovered.]

• For $\alpha$ = 1.25 and 1.50, the MED and LMS have slightly smaller bias but slightly larger standard deviation than the LSE. As a result, the RMSE for the LSE is slightly smaller than the RMSE for the MED and LMS.

• The RMSE for all three methods increases with $\alpha$. However, the rate of increase for the LSE is huge compared to that of the MED and LMS. For example, for $n = 10$, the RMSE for the MED and LMS increases from 0.1579 and 0.1468 when $\alpha$ = 1.25 to 1.2660 and 1.4501 when $\alpha$ = 2.5, whereas the RMSE for the LSE increases from 0.1340 to 494.7815. Although the bias of the LSE also increases with $\alpha$, the large increase in the RMSE for the LSE is mainly due to the large variability of the LSE.

• The performances of the MED and LMS are approximately the same.

• The performances of all three methods improve as the sample size increases from 10 to 20, as would be expected.

Therefore, the simulation results for the Pareto LC show clearly that both the MED and LMS are far superior to the LSE.

5.3 The Chotikapanich LC

The parameter $\alpha$ in the Chotikapanich LC is positive. We, therefore, consider six values of $\alpha$ (0.25, 0.75, 1.00, 1.50, 2.00, 5.00). The simulation results for these cases for two sample sizes are shown in Table 8. The following interpretations can be made:

• As in the case of the Pareto LC, the bias for the MED and LMS is sometimes positive and sometimes negative, but the bias of the LSE is negative in all cases.

• The bias, standard deviation, and RMSE are all consistently larger for the LSE than for the MED and LMS.

• The RMSE for the MED and LMS slightly increases with $\alpha$. The RMSE for the LSE first increases then decreases as $\alpha$ increases from 0.25 to 5.

• As in the Pareto LC case, the performances of the MED and LMS for the Chotikapanich LC are approximately equal.

• The performances of all three methods improve as the sample size increases from 10 to 20.

The simulation results for the Chotikapanich LC case also show that both the MED and LMS are superior to the LSE.

5.4 The Hyperbolic LC

The two parameters in the Hyperbolic LC must satisfy the following conditions: a > 0 , ,f3 > 0 , and a - /3 < 1. We, therefore, consider four values of a (0.50, 1.00, 1.50, 2.50). In each case, we set ,f3 = 2 a . Recall that in this case the new REG estimators defined in (33 are given in addition to the MED and LMS estimators. The simulation results for these cases for two sample sizes are shown in Tables 9-10. The results are summarized as follows: 0

The bias of all four estimators is positive.

For estimating both α and β, the REG estimators are better than the MED and LMS estimators.

For estimating α, the RMSE for the LSE is consistently but slightly smaller than that for the REG. The converse is true for estimating β.

The RMSE for all four methods increases with α and β.

Table 8: The Chotikapanich LC: Simulation results

n    α      Estimator   Bias      St. Dev.   RMSE
10   0.25   MED        -0.2968    0.7328     0.7906
            LMS        -0.2354    0.7404     0.7770
            LSE        -4.7030    4.4969     6.5070
10   0.75   MED        -0.1185    0.8365     0.8449
            LMS        -0.0377    0.8662     0.8670
            LSE        -2.7948    3.9324     4.8244
10   1.00   MED        -0.0518    0.9346     0.9360
            LMS         0.0354    0.9754     0.9760
            LSE        -2.2015    3.5815     4.2040
10   1.50   MED        -0.0103    1.0533     1.0533
            LMS         0.0393    1.1309     1.1316
            LSE        -1.2699    2.8190     3.0918
10   2.00   MED        -0.0419    1.1884     1.1892
            LMS         0.0151    1.2734     1.2735
            LSE        -0.6234    2.0815     2.1729
10   5.00   MED        -0.0245    1.4809     1.4811
            LMS         0.0542    1.4949     1.4959
            LSE        -0.4567    2.4034     2.4464
20   0.25   MED        -0.1684    0.5302     0.5563
            LMS        -0.1103    0.5341     0.5454
            LSE        -4.1604    4.5644     6.1760
20   0.75   MED        -0.0481    0.6902     0.6919
            LMS         0.0211    0.7324     0.7327
            LSE        -1.9341    3.5474     4.0404
20   1.00   MED         0.0092    0.7177     0.7177
            LMS         0.0987    0.7770     0.7833
            LSE        -1.1633    2.8905     3.1158
20   1.50   MED        -0.0216    0.7924     0.7927
            LMS         0.0337    0.8685     0.8692
            LSE        -0.4535    1.7931     1.8496
20   2.00   MED         0.0642    0.8473     0.8497
            LMS         0.0866    0.9055     0.9096
            LSE        -0.1400    1.2726     1.2803
20   5.00   MED         0.0438    1.0493     1.0502
            LMS         0.1012    1.0553     1.0602
            LSE        -0.2515    1.8986     1.9152

Table 9: The Hyperbolic LC: Simulation results for α̂.

n    α      β      Estimator   Bias     St. Dev.   RMSE
10   0.50   1.00   MED         0.6730   0.6738     0.9523
                   LMS         0.5995   0.6425     0.8788
                   REG         0.6695   0.3360     0.7491
                   LSE         0.5628   0.1840     0.5922
10   1.00   2.00   MED         1.1865   0.8761     1.4749
                   LMS         1.1123   0.7964     1.3680
                   REG         1.1835   0.4199     1.2558
                   LSE         1.0288   0.2496     1.0586
10   1.50   3.00   MED         1.6040   0.7598     1.7749
                   LMS         1.5485   0.8117     1.7483
                   REG         1.6723   0.4338     1.7277
                   LSE         1.5119   0.2671     1.5353
10   2.50   5.00   MED         2.7022   0.9293     2.8575
                   LMS         2.5703   0.9371     2.7358
                   REG         2.7319   0.4938     2.7762
                   LSE         2.5292   0.2650     2.5431
20   0.50   1.00   MED         0.4152   0.4046     0.5797
                   LMS         0.3706   0.5030     0.6248
                   REG         0.5356   0.2948     0.6114
                   LSE         0.4942   0.1291     0.5108
20   1.00   2.00   MED         0.8587   0.4971     0.9922
                   LMS         0.7989   0.5985     0.9982
                   REG         1.0112   0.3586     1.0729
                   LSE         0.9712   0.1417     0.9815
20   1.50   3.00   MED         1.3750   0.6489     1.5205
                   LMS         1.3218   0.7907     1.5403
                   REG         1.5523   0.4299     1.6108
                   LSE         1.4607   0.1609     1.4696
20   2.50   5.00   MED         2.3315   0.7539     2.4504
                   LMS         2.2308   0.9528     2.4258
                   REG         2.5434   0.4553     2.5839
                   LSE         2.4702   0.1684     2.4760

The performances of all four methods improve as the sample size increases from 10 to 20. The simulation results for the Hyperbolic LC case also show that the LSE and REG estimators are comparable to each other and are better than the MED and LMS estimators.

Table 10: The Hyperbolic LC: Simulation results for β̂.

n    α      β      Estimator   β̂        Bias     St. Dev.   RMSE
10   0.50   1.00   MED         0.9559   0.0441   1.3865     1.3872
                   LMS         0.9322   0.0678   1.3597     1.3614
                   REG         0.7354   0.2646   0.8067     0.8490
                   LSE         0.5337   0.4663   0.7237     0.8609
10   1.00   2.00   MED         1.4017   0.5983   2.1294     2.2119
                   LMS         1.3728   0.6272   2.0570     2.1505
                   REG         1.0324   0.9676   1.1300     1.4876
                   LSE         0.8294   1.1706   1.0976     1.6047
10   1.50   3.00   MED         1.9153   1.0847   2.4327     2.6636
                   LMS         1.9098   1.0902   2.7486     2.9569
                   REG         1.2420   1.7580   1.5035     2.3132
                   LSE         1.0098   1.9902   1.4493     2.4619
10   2.50   5.00   MED         2.1436   2.8564   5.2076     5.9395
                   LMS         1.9345   3.0655   4.5706     5.5035
                   REG         1.1837   3.8163   2.0133     4.3149
                   LSE         1.0233   3.9767   1.7725     4.3539
20   0.50   1.00   MED         0.9965   0.0035   0.9581     0.9581
                   LMS         0.9622   0.0378   0.9696     0.9704
                   REG         0.7484   0.2516   0.6052     0.6554
                   LSE         0.4183   0.5817   0.5059     0.7709
20   1.00   2.00   MED         1.5863   0.4137   1.4046     1.4642
                   LMS         1.5506   0.4494   1.6894     1.7481
                   REG         1.1076   0.8924   0.8264     1.2163
                   LSE         0.6211   1.3789   0.6470     1.5232
20   1.50   3.00   MED         1.9412   1.0588   1.9225     2.1948
                   LMS         1.9973   1.0027   2.2738     2.4850
                   REG         1.2309   1.7691   1.1968     2.1359
                   LSE         0.6996   2.3004   0.8353     2.4473
20   2.50   5.00   MED         2.4311   2.5689   2.3660     3.4924
                   LMS         2.4072   2.5928   2.7325     3.7669
                   REG         1.4110   3.5890   1.2992     3.8169
                   LSE         0.8029   4.1971   0.9602     4.3055

The three parameters in the Elliptical LC must satisfy the following conditions: α + β + δ + 1 > 0, β² - 4α < 0, α + δ ≤ 1, and δ ≥ 0. We therefore consider three values of α (0.25, 0.50, 0.75). For each value of α, we consider three values of β (a positive value, 0, and a negative value). For each pair of α and β values, we consider three values of δ, but only those values which satisfy the above conditions. All in all there are 18 possible combinations of the parameters satisfying the above conditions. Because of space limitations, we include here only the results for n = 10 and also omit seven combinations so that the tables can fit on one page. The simulation results are shown in Tables 11-13.

Table 11: The Elliptical LC: Simulation results for α̂ (n = 10).

Estimator   RMSE
MED         0.2446
LMS         0.2431
LSE         0.3320

MED         0.4079
LMS         0.4014
LSE         0.4883

MED         0.5018
LMS         0.5000
LSE         0.5604

MED         0.2015
LMS         0.1985
LSE         0.2975

MED         0.3361
LMS         0.3289
LSE         0.4500

MED         0.4371
LMS         0.4294
LSE         0.5224

MED         0.1710
LMS         0.1652
LSE         0.2775

MED         0.2927
LMS         0.2862
LSE         0.4126

MED         0.3944
LMS         0.3902
LSE         0.4929

MED         0.1810
LMS         0.1913
LSE         0.2535

MED         0.2505
LMS         0.2587
LSE         0.3044

Table 12: The Elliptical LC: Simulation results for β̂ (n = 10).

Estimator   Bias     St. Dev.   RMSE
MED         1.1871   0.2481     1.2128
LMS         1.1587   0.2685     1.1894
LSE         1.4087   0.6806     1.5645

MED         1.2712   0.2072     1.2879
LMS         1.2455   0.2194     1.2646
LSE         1.4206   0.3561     1.4645

MED         1.3475   0.2179     1.3650
LMS         1.3204   0.2392     1
LSE         1.4537   0.3357     1.4920

MED         1.6839   0.3256     1.7151
LMS         1.6381   0.3650     1.6783
LSE         1.9521   0.7545     2.0929

MED         1.7668   0.2658     1.7867
LMS         1.7191   0.2824     1.7421
LSE         2.0151   0.5675     2.0935

MED         1.8230   0.2420     1.8390
LMS         1.7810   0.2697     1.8013
LSE         1.9870   0.3965     2.0262

MED         2.1591   0.3228     2.1831
LMS         2.1286   0.3595     2.1587
LSE         2.4797   0.8201     2.6118

MED         2.2813   0.3232     2.3041
LMS         2.2420   0.3220     2.2650
LSE         2.5535   0.6178     2.6272

MED         2.3112   0.2519     2.3249
LMS         2.2822   0.2948     2.3012
LSE         2.5024   0.4095     2.5357

MED         1.0815   0.2765     1.1163
LMS         1.0466   0.3143     1.0927
LSE         1.2698   0.4722     1.3548

MED         1.1276   0.2133     1.1476
LMS         1.0976   0.2189     1.1192
LSE         1.2804   0.3744     1.3340

We make the following observations: The bias is negative for all three estimators of α, but is positive for all estimators of δ.

Table 13: The Elliptical LC: Simulation results for δ̂ (n = 10).

Estimator   Bias     St. Dev.   RMSE
MED         0.3509   0.0659     0.3570
LMS         0.3398   0.0655     0.3461
LSE         0.4268   0.1274     0.4454

MED         0.6166   0.0845     0.6223
LMS         0.6026   0.0853     0.6086
LSE         0.6893   0.1358     0.7026

MED         0.8723   0.1001     0.8780
LMS         0.8530   0.0997     0.8588
LSE         0.9332   0.1467     0.9446

MED         0.3486   0.0583     0.3534
LMS         0.3342   0.0589     0.3394
LSE         0.4319   0.1112     0.4460

MED         0.6246   0.0771     0.6293
LMS         0.6064   0.0753     0.6111
LSE         0.7223   0.1381     0.7354

MED         0.8811   0.0924     0.8859
LMS         0.8568   0.0924     0.8617
LSE         0.9610   0.1370     0.9708

MED         0.3410   0.0553     0.3454
LMS         0.3293   0.0586     0.3345
LSE         0.4317   0.1224     0.4487

MED         0.6217   0.0738     0.6260
LMS         0.6024   0.0716     0.6067
LSE         0.7231   0.1370     0.7359

MED         0.8808   0.0810     0.8845
LMS         0.8592   0.0852     0.8634
LSE         0.9716   0.1347     0.9809

MED         0.3706   0.0811     0.3794
LMS         0.3509   0.0791     0.3597
LSE         0.4572   0.1311     0.4757

MED         0.6216   0.0775     0.6264
LMS         0.6026   0.0807     0.6080
LSE         0.6956   0.1300     0.7077

The RMSE for the LSE is consistently larger than those for the MED and LMS for all three parameter estimators. The difference, however, is not large. The MED and LMS are comparable to each other. Although the results for n = 20 are not shown here, the performances of all three methods improve as the sample size increases from 10 to 20.

The simulation results for the Elliptical LC case indicate that the MED and LMS are better than the LSE, although this conclusion is not as clear-cut as in the cases of the Pareto and the Chotikapanich LC's.
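The parameter restrictions for the Elliptical LC lend themselves to a direct feasibility check when building a simulation grid. The sketch below assumes the four conditions read as α + β + δ + 1 > 0, β² - 4α < 0, α + δ ≤ 1, and δ ≥ 0. The function name, the β grid (√α, 0, -√α), and the δ grid (0.25, 0.50, 0.75) are hypothetical choices made only for illustration; they are not claimed to be the values used in the simulation reported here.

```python
import math


def is_valid_elliptical(alpha, beta, delta):
    """Check the four parameter conditions assumed for the Elliptical LC."""
    return (
        alpha + beta + delta + 1 > 0
        and beta ** 2 - 4 * alpha < 0
        and alpha + delta <= 1
        and delta >= 0
    )


# Hypothetical grids, for illustration only.
alphas = [0.25, 0.50, 0.75]
deltas = [0.25, 0.50, 0.75]

valid = [
    (a, b, d)
    for a in alphas
    for b in (math.sqrt(a), 0.0, -math.sqrt(a))
    for d in deltas
    if is_valid_elliptical(a, b, d)
]
print(len(valid), "of", len(alphas) * 3 * len(deltas), "combinations are feasible")
```

With these illustrative grids, only the condition α + δ ≤ 1 removes combinations, so the number of feasible triples is smaller than the full 27-point grid.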

6 SUMMARY AND CONCLUDING REMARKS

In this paper we advocate the use of the elemental percentile method to estimate the parameters of and fit Lorenz curves to observed data. The method can be used to estimate the parameters by either fitting Lorenz curves or fitting the associated cumulative distribution functions to the data. The method is suitable for both grouped and ungrouped data and for all forms of the LC as long as these forms are written in closed form, which is typically the case in practice. The method is described in Section 2 and illustrated by applications to several LC's in Section 3. A real-life data set and a simulation study are used to evaluate the proposed estimators and to compare them with the currently used methods. The results indicate that the proposed methods perform as well as the currently used methods in some cases, but in many other cases they outperform current methods by a wide margin.

APPENDIX

In this appendix we derive upper and lower bounds to be used by the bisection method for solving the equations of the initial estimators.

A.1. Bounds for $\alpha$ in (23). The estimation equation is

$$\exp(\alpha p_i) - q_i \exp(\alpha) - 1 + q_i = 0. \tag{59}$$

Initially, we make the change of variable $\beta = \exp(\alpha)$ and define

$$h(\beta) = \beta^{p_i} - q_i \beta - 1 + q_i. \tag{60}$$

The zeros of (60) are the solutions of the estimation equation. The function $h(\beta)$:

has one zero at $\beta = 1$;

attains one maximum at $\beta_0 = (q_i/p_i)^{1/(p_i - 1)}$;

is concave from below, because $h''(\beta) = p_i (p_i - 1) \beta^{p_i - 2} < 0$;

satisfies $\lim_{\beta \to 0} h(\beta) = q_i - 1 < 0$;

satisfies $\lim_{\beta \to \infty} h(\beta) = -\infty$.

Thus, the tangent at the point $(2\beta_0, h(2\beta_0))$ is an upper bound for $h(\beta)$ in the interval $(\beta_0, \infty)$ and its zero $2\beta_0 - h(2\beta_0)/h'(2\beta_0)$ is an upper bound for the zero of $h(\beta)$. Consequently, the second zero of $h(\beta)$ is in the interval $(\beta_0,\; 2\beta_0 - h(2\beta_0)/h'(2\beta_0))$. Once we know $\hat\beta$, we can get the estimate of $\alpha$ by $\hat\alpha = \log(\hat\beta)$.
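The bracketing argument of A.1 can be sketched in a few lines of Python. This is a minimal illustration, assuming a single point with 0 < q < p < 1 and the function h(β) = β^p - qβ - 1 + q with β = exp(α); the function names, tolerance, and iteration cap are illustrative choices rather than part of the method as published.

```python
import math


def h(beta, p, q):
    """h(beta) = beta**p - q*beta - 1 + q, with beta = exp(alpha)."""
    return beta ** p - q * beta - 1.0 + q


def h_prime(beta, p, q):
    """First derivative of h with respect to beta."""
    return p * beta ** (p - 1.0) - q


def bracket(p, q):
    """Return (beta0, upper): beta0 maximizes h, and upper is the zero of
    the tangent to h at 2*beta0. The nontrivial zero lies between them."""
    beta0 = (q / p) ** (1.0 / (p - 1.0))
    b2 = 2.0 * beta0
    upper = b2 - h(b2, p, q) / h_prime(b2, p, q)
    return beta0, upper


def solve_alpha(p, q, tol=1e-12, max_iter=200):
    """Bisection for the zero of h in (beta0, upper); returns log(beta)."""
    lo, hi = bracket(p, q)  # h(lo) > 0 and h(hi) <= 0 when 0 < q < p < 1
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid, p, q) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return math.log(0.5 * (lo + hi))


# Usage: a point generated from the curve with alpha = 2 recovers alpha.
alpha_true = 2.0
p = 0.5
q = (math.exp(alpha_true * p) - 1.0) / (math.exp(alpha_true) - 1.0)
print(solve_alpha(p, q))
```

Starting the bisection at β₀ rather than at 0 avoids converging to the trivial zero β = 1, which corresponds to α = 0.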

A.2. Bounds for $\alpha$ in (44). The estimation equation is

[…]

Let

$$h(\alpha) = \frac{1 - p_i^{\alpha}}{1 - p_j^{\alpha}} - B_{ij},$$

where $B_{ij} = \dfrac{p_j (p_i - q_i)}{p_i (p_j - q_j)} > 0$. If the LC is concave from above we have $B_{ij} > 1$. Without loss of generality we assume $p_i < p_j$. Then, we have:

$\lim_{\alpha \to -\infty} h(\alpha) = \infty$.

$\lim_{\alpha \to \infty} h(\alpha) = 1 - B_{ij}$.

$\lim_{\alpha \to 0} h(\alpha) = \dfrac{\log(p_i)}{\log(p_j)} - B_{ij}$.

Thus, if $\log(p_i)/\log(p_j) - B_{ij} > 0$, then the zero $\alpha_0$ of $h(\alpha)$ is such that $\alpha_0 > 0$ and hence

[…] (63)

and $\alpha_0$ is less than the zero of the right hand side function in (63), i.e.

[…]

But if $\log(p_i)/\log(p_j) - B_{ij} < 0$, the zero $\alpha_0$ of $h(\alpha)$ is such that $\alpha_0 < 0$ and hence

[…]

and $\alpha_0 > \log(B_{ij})/\log(p_i/p_j)$.

Consequently we can conclude with the following theorem:

Theorem 2 If $\log(p_i)/\log(p_j) - B_{ij} > 0$ the zero of the function $h(\alpha)$ is in the interval

[…]

otherwise it is in the interval

$$\left( \frac{\log(B_{ij})}{\log(p_i/p_j)},\; 0 \right). \tag{66}$$

A.3. Bounds for $\alpha$ in (52). The estimation equation is

$$\frac{\log(q_i)}{\log(q_j)} = \frac{\log\left[1 - (1 - p_i)^{\alpha}\right]}{\log\left[1 - (1 - p_j)^{\alpha}\right]}.$$

We define

$$h(\alpha) = \frac{\log\left[1 - (1 - p_i)^{\alpha}\right]}{\log\left[1 - (1 - p_j)^{\alpha}\right]} - C_{ij},$$

where $C_{ij} = \log(q_i)/\log(q_j)$. Note that this function is defined only for $\alpha > 0$. We assume without loss of generality that $p_i < p_j$, which implies that $q_i < q_j$ and $C_{ij} > 1$. Then, we have:

$\lim_{\alpha \to \infty} h(\alpha) = \infty$.

$\lim_{\alpha \to 0} h(\alpha) = 1 - C_{ij} < 0$.

The function is increasing.

Thus, if $\alpha_0$ is large enough for $h(\alpha_0) > 0$, then the zero of $h(\alpha)$ is in the interval $(0, \alpha_0)$.
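The bracketing rule of A.3 can be illustrated with a short Python sketch. It assumes h(α) is the ratio log[1 - (1 - p_i)^α] / log[1 - (1 - p_j)^α] minus C_ij, with p_i < p_j, and finds a suitable α₀ by repeated doubling before bisecting on (0, α₀). The doubling strategy, function names, and tolerances are illustrative assumptions; the text above only requires that α₀ be large enough for h(α₀) > 0.

```python
import math


def log_term(alpha, p):
    """log[1 - (1 - p)**alpha], written with log1p/exp for stability."""
    return math.log1p(-math.exp(alpha * math.log1p(-p)))


def h(alpha, p_i, q_i, p_j, q_j):
    """Ratio of the two log terms minus C_ij = log(q_i) / log(q_j)."""
    c_ij = math.log(q_i) / math.log(q_j)
    return log_term(alpha, p_i) / log_term(alpha, p_j) - c_ij


def solve_alpha(p_i, q_i, p_j, q_j, tol=1e-12, max_iter=200):
    """Find alpha0 with h(alpha0) > 0 by doubling, then bisect on (0, alpha0)."""
    hi = 1.0
    try:
        while h(hi, p_i, q_i, p_j, q_j) <= 0.0:
            hi *= 2.0
    except ZeroDivisionError:
        # (1 - p_j)**alpha underflowed before h became positive.
        raise ValueError("no bracketing value alpha0 found") from None
    lo = 0.0  # h tends to 1 - C_ij < 0 as alpha -> 0; only midpoints are evaluated
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid, p_i, q_i, p_j, q_j) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Usage: two points generated from q = [1 - (1 - p)**alpha]**(1/beta)
# with alpha = 2 and beta = 1.5; the ratio form eliminates beta.
alpha_true, beta_true = 2.0, 1.5
p_i, p_j = 0.3, 0.7
q_i = (1.0 - (1.0 - p_i) ** alpha_true) ** (1.0 / beta_true)
q_j = (1.0 - (1.0 - p_j) ** alpha_true) ** (1.0 / beta_true)
print(solve_alpha(p_i, q_i, p_j, q_j))
```

Because h is increasing, any α₀ with h(α₀) > 0 gives a valid bracket, so the choice of the starting value 1.0 affects only the number of doubling steps.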

ACKNOWLEDGEMENTS

The authors are grateful to the Dirección General de Investigación Científica y Técnica (DGICYT) (projects PB92-0500 and PB92-0504), and the NATO Research Office for partial support of this work.

BIBLIOGRAPHY

Aggarwal, V. (1984), "On Optimum Aggregation of Income Distribution Data," Sankhya, B, 46, 343-355.

Aggarwal, V. and Singh, R. (1984), "On Optimum Stratification with Proportional Allocation for a Class of Pareto Distributions," Communications in Statistics, A13, 3017-3116.

Arnold, B. C. (1983), Pareto Distributions, International Cooperative Publishing House, Fairland, MD.

Arnold, B. C. (1986), "A Class of Hyperbolic Lorenz Curves," Sankhya, B, 48, 427-436.

Arnold, B. C., Robertson, C. A., Brockett, P. L., and Shu, B. Y. (1987), "Generating Ordered Families of Lorenz Curves by Strongly Unimodal Distributions," Journal of Business and Economic Statistics, 5, 305-308.

Basmann, R. L., Hayes, K. L., Slottje, D. J., and Johnson, J. D. (1990), "A General Functional Form for Approximating the Lorenz Curve," Journal of Econometrics, 43, 77-90.

Castillo, E. and Hadi, A. S. (1995), "A Method for Estimating Parameters and Quantiles of Continuous Distributions of Random Variables," Computational Statistics and Data Analysis, 20, 421-439.

Chotikapanich, D. (1993), "A Comparison of Alternative Functional Forms for the Lorenz Curve," Economics Letters, 41, 129-138.

Gastwirth, J. L. (1971), "A General Definition of the Lorenz Curve," Econometrica, 39, 1037-1039.

Gupta, M. R. (1984), "Functional Form for Estimating the Lorenz Curve," Econometrica, 52, 1313-1314.

Kakwani, N. C. (1977), "Application of Lorenz Curves in Economic Analysis," Econometrica, 45, 719-727.

Kakwani, N. C. and Podder, N. (1973), "On Estimation of Lorenz Curves from Grouped Observations," International Economic Review, 14, 137-148.

McDonald, J. B. (1984), "Some Generalized Functions for the Size Distribution of Income," Econometrica, 52, 647-663.

Ortega, P., Martin, A., Fernández, A., Ladoux, M., and Garcia, A. (1991), "A New Functional Form for Estimating Lorenz Curves," Review of Income and Wealth, 37, 447-452.

Pakes, A. G. (1981), "On Income Distributions and Their Lorenz Curves," Technical Report, Department of Mathematics, University of Western Australia, Nedlands, W.A.

Rasche, R. H., Gaffney, J., Koo, A. Y. C., and Obst, N. (1980), "Functional Forms for Estimating the Lorenz Curve," Econometrica, 48, 1061-1062.

Rousseeuw, P. J. (1984), "Least Median of Squares Regression," Journal of the American Statistical Association, 79, 871-880.

Rousseeuw, P. J. and Leroy, A. M. (1987), Robust Regression and Outlier Detection, John Wiley and Sons, New York.

Ryu, H. K. and Slottje, D. J. (1996), "Two Flexible Functional Form Approaches for Approximating the Lorenz Curve," Journal of Econometrics, 72, 251-274.

Sarabia, J. M. (1997), "A Hierarchy of Lorenz Curves Based on the Generalized Tukey's Lambda Distribution," Econometric Reviews, 16, 305-320.

Slottje, D. J. (1990), "Using Grouped Data for Constructing Inequality Indices: Parametric vs. Non-parametric Methods," Economics Letters, 32, 193-197.

Villaseñor, J. A. and Arnold, B. C. (1989), "Elliptical Lorenz Curves," Journal of Econometrics, 40, 327-338.

Received April, 1997; Revised February, 1998.
