DALTON TRANSFERS, INEQUALITY AND ALTRUISM
Dominique Thon Bodoe Graduate School of Business, Norway.
Stein W. Wallace Center for Advanced Study Norwegian Academy of Science and Letters and Norwegian University of Sciences and Technology
April 2000; revised April 2001.
Abstract. This paper characterizes the set of income distributions attainable through a sequence of equalizing decentralized pairwise transfers which preserve the relative ranking of the donor and the recipient. This kind of transfer was considered by Dalton in his famous 1920 article. We provide a description of the set of allocations attainable through a sequence of Dalton transfers. This paper argues that, although such transfers are often mentioned in the literature on income inequality, they do not really play any role there. It also argues that such transfers are, on the other hand, of interest in modeling altruism and its consequences.
Key words. Income distribution, inequality, altruism, majorization.
J.E.L. classification: D31, D63, D64.
CORRESPONDING AUTHOR: Stein W. Wallace Center for Advanced Study Norwegian Academy of Science and Letters, Norway Drammensveien, 78, N-2071 Oslo Norway
Tel: + 47 22122526
Fax: + 47 22122501
Thanks are due to Michal Kaut of the Norwegian University of Sciences and Technology, who produced the graphics.
I INTRODUCTION.
In spite of the ubiquity of references to “Dalton’s principle of transfer” in the economic literature, the set of income distributions which can be reached from some initial distribution by a sequence of transfers such as those actually described in Dalton (1920) has never been characterized. One purpose of this paper is to provide such a characterization. A second purpose is to try to ascribe to such transfers their proper place in the study of the distribution of income. We start with a familiar kind of transfer, which is not the one described by Dalton. Let n be a number of individuals and x, y ∈ ℜ^n, Σx = Σy, be two allocations of income to those individuals. An important concept in the economic literature on the distribution of a constant sum of incomes between n individuals is that of a T-transform, that is, a special n×n bistochastic matrix of the form
T = λ Q + (1 − λ)I;
0 ≤ λ ≤ 1,
where I is the identity matrix and Q is a permutation matrix which interchanges exactly two coordinates, say j and k. Thus Q and T have exactly two elements off the diagonal, and xT = λxQ + (1 − λ)x has the form:
xT = (x_1, ..., x_{j−1}, (1 − λ)x_j + λx_k, x_{j+1}, ..., x_{k−1}, (1 − λ)x_k + λx_j, x_{k+1}, ..., x_n).
The concept is due to Muirhead (1903); see Marshall and Olkin (1979). If one thinks of a T-transform as performing a single transfer of income between two individuals, and calls this transfer a Muirhead-transfer, then the desirability of such an operation can be formulated as the following axiom:
Muirhead-transfer axiom. A transfer of income between two individuals is desirable if, after the transfer is performed, the income of the recipient is not strictly larger than the original income of the donor.
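As an illustration, the action of a T-transform on an income vector can be sketched in a few lines of Python. This is only a minimal sketch: the function name `t_transform` and the 0-based indexing of individuals are assumptions made for the example, and the formula is read off directly from T = λQ + (1 − λ)I.

```python
def t_transform(x, j, k, lam):
    """Return xT for T = lam*Q + (1 - lam)*I, where Q swaps coordinates j and k.

    Indices are 0-based; the function name is hypothetical (not from the paper).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    y = list(x)
    # xT = lam * (x with entries j and k swapped) + (1 - lam) * x
    y[j] = (1 - lam) * x[j] + lam * x[k]
    y[k] = (1 - lam) * x[k] + lam * x[j]
    return y


x = [1.0, 4.0, 13.0]
print(t_transform(x, 0, 2, 0.25))  # a quarter of the gap moves from individual 3 to individual 1
print(t_transform(x, 0, 2, 1.0))   # lam = 1 simply permutes the two incomes
```

The total income is unchanged by construction, and λ = 1 reproduces the permutation that the Muirhead-transfer axiom admits as its limiting case.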
The following well-known “majorization” preorder has a number of alternative definitions; we define it here on the basis of Muirhead-transfers. Note that in our notation, “M” refers to Muirhead and not to majorization; in Definition 1, it is x which majorizes y.
Definition 1. We say that x ≺_M y if and only if y can be reached from x through a succession of Muirhead-transfers.
For a given x, we call S^M(x) the set of y’s such that x ≺_M y; it is the better-than-x set according to the preferences expressed by the axiom. On the basis of the well-known relationship between T-transforms and Birkhoff’s Theorem (see Section 3), the set S^M(x) can be described as the convex hull of the points produced by applying all permutation matrices to x. To illustrate, for n = 3, Figure 1 represents S^M(x) for x = (1, 4, 13). The figure also shows that the simplex is divided into 6 (= n!) “cells”, each of which corresponds to a distinct ordering of the incomes, that is, to a particular permutation. Without loss of generality, we assume that the original income vector x is increasingly ordered; we call C_123 the corresponding cell; likewise C_321 is the cell where incomes are decreasingly ordered, and so on. Let ∆n = {z | z_1 ≤ z_2 ≤ ... ≤ z_n}.
Now, quite independently, Dalton (1920) investigated a number of standard statistical measures of dispersion in order to determine whether or not they satisfy certain desirable conditions when applied to the measurement of income inequality. One of the properties which, according to Dalton, a measure of income inequality should satisfy is that it ranks income distributions in line with the following consideration.
"If there are only two income receivers and a transfer of income takes place from the richer to the poorer, inequality is diminished. There is indeed an obvious limiting condition. The transfer must not be so large as to more than reverse the relative position of the two income receivers, and it will produce its maximum result, that is to say create equality, when it equals half the difference between the two receivers and whatever the amount of their incomes, any transfer between any two of them, or in general any series of such transfers, subject to the above condition, will diminish inequality". (Dalton (1920), p. 351)
The pairwise income transfer considered by Dalton (henceforth a Dalton-transfer) is accomplished by constructing xT*, with:
(1) T* = λ*Q + (1 − λ*)I; 0 ≤ λ* ≤ ½,
where Q and I are as in a T-transform. Formulated as an axiom, Dalton’s idea is thus:
Dalton-transfer axiom. A transfer of income between two individuals is desirable if, after the transfer is performed, the income of the recipient is not strictly larger than the income of the donor.
This leads to the definition of the following partial order:
Definition 2. We say that x ≺_D y if and only if y can be reached from x through a succession of Dalton-transfers.
Inspection of the definitions will reveal that both axioms refer to the desirability of a transfer which is from some individual to some other individual who is initially strictly poorer. Note carefully the difference between the two axioms. The Muirhead-transfer axiom, for example, allows (maximally) for a permutation but the Dalton-transfer axiom does, barring trivialities, preclude a permutation. While ≺_M is a preorder, that is, it is transitive and reflexive, ≺_D is furthermore antisymmetric and is thus a partial order. To anticipate somewhat, S^D(x), being the set of y’s such that x ≺_D y, is represented in Figure 2, for the same x as in Figure 1.
The preorder ≺_M implies the property of “anonymity” or “symmetry” in that if, for T being a T-transform with λ = 1, y = xT and thus x ≺_M y, then also y ≺_M x, as x = yT also holds; thus it does not matter who receives a particular income. Put otherwise, if one thinks of the two income allocations x and y being compared as two random variables with a uniform probability measure, the way they are jointly distributed does not play any role for ≺_M. As regards ≺_D, on the other hand, it does matter who owns each original and each final income, that is, the joint distribution is of relevance.
To illustrate some properties of S^D(x), consider again the income vector x = (1, 4, 13) as in Figure 2. A striking feature of S^D(x) is that it is not convex. It is also clear that S^D(x) is a subset of S^M(x). The intersection of S^D(x) with every cell of the simplex is non-empty. This is in particular true of the intersection with C_321. Thus, in spite of the restriction λ* ≤ ½ on every Dalton-transfer, it is quite possible, for n > 2, that a sequence of such transfers strictly inverts the complete ordering of the incomes. The following figure illustrates such a sequence, with y = (6.1, 6, 5.9) as the final result (the superscript indicates the owner of the income).
[Figure: four successive positions of the three incomes on the income axis, each row listed from poorest to richest.]
Initial allocation: 1^1, 4^2, 13^3
After a transfer between individuals 3 and 1: 4^2, 6.5^1, 7.5^3
After a transfer between individuals 3 and 2: 5.6^2, 5.9^3, 6.5^1
After a transfer between individuals 1 and 2: 5.9^3, 6^2, 6.1^1
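The sequence shown in the figure can be replayed numerically. The sketch below is illustrative only: the helper `dalton_transfer` is a hypothetical name, individuals are indexed from 0, and the transfer amounts (5.5, 1.6 and 0.4) are inferred from the successive income vectors of the example rather than stated in the text. Each step is checked against the bound λ* ≤ ½, that is, the amount transferred may not exceed half the income gap.

```python
def dalton_transfer(x, donor, recipient, amount, tol=1e-12):
    """Move `amount` from donor to recipient; reject it if it is not a Dalton-transfer."""
    if x[donor] < x[recipient]:
        raise ValueError("the donor must be the richer of the two")
    if amount < 0 or amount > (x[donor] - x[recipient]) / 2 + tol:
        raise ValueError("transfer exceeds half the income gap")
    y = list(x)
    y[donor] -= amount
    y[recipient] += amount
    return y


x = [1.0, 4.0, 13.0]
# (donor, recipient, amount), 0-based; amounts inferred from the example
steps = [(2, 0, 5.5), (2, 1, 1.6), (0, 1, 0.4)]
for donor, recipient, amount in steps:
    x = dalton_transfer(x, donor, recipient, amount)
    print([round(v, 10) for v in x])
```

Every individual step leaves the donor at least as rich as the recipient, yet the final vector ranks the three individuals in exactly the reverse of the original order.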
The ordering ≺_D has a physical interpretation and in fact figures in the literature on statistical mechanics. Let there be three solid bodies with different temperatures, and assume they have the same heat capacities, normalized to unity, which makes transfers of heat equivalent to transfers of temperature (temperature being by definition the ratio of heat over heat capacity). Suppose they are successively connected pairwise, in some sequence, by a “perfect wire” which conducts heat without losses. The above example represents such a sequence at the end of which the originally warmest body is the coldest and vice-versa. More generally, the following question is of interest: What are the combinations of temperatures which can be reached from x by a sequence of such pairwise heat transfers, keeping in mind that by the second law of thermodynamics, heat flows continuously only from a warm body to a cold one, and stops flowing when the two bodies have the same temperature? This set of temperatures is precisely the set S^D(x) represented in Figure 2. The representation of such physical phenomena is surprisingly recent and due to Zylka (1985, 1987, 1990); we make extensive use of his work.
A Dalton-transfer is thus a Muirhead-transfer which satisfies the further condition that it respects the equivalent of the second law of thermodynamics; the income allocation that it produces could have been reached by a continuous pairwise process by which “income flows only from a rich to a poor” with the consequence that this flow stops when their incomes become equal. For the purpose of comparison, consider the following alternative transfer axiom; “H” stands for either “horizontal equity” or Hoffman (1969), who investigated the properties of S^H(x), the corresponding better-than set.
H-TRANSFER AXIOM. A transfer of income between two individuals is desirable if it is a transfer from richer to poorer which does not invert the ranking of any pair of individuals.
We say that x ≺_H y if y can be reached from x through a sequence of such transfers. Of course, if x ∈ ∆n, then S^H(x) = S^M(x) ∩ ∆n = S^D(x) ∩ ∆n, and one has in general: x ≺_H y → x ≺_D y → x ≺_M y; S^H(x) ⊆ S^D(x) ⊆ S^M(x).
The interest of investigating the better-than set generated by a sequence of Dalton-transfers is that the “obvious limiting condition” in Dalton’s quote above, that after the transfer is performed the rank-orders of the donor and recipient are not strictly inverted, seems to provide a natural bound on decentralized pairwise altruism or charity. “Decentralized” because the pair of individuals involved in a Dalton-transfer (as well as a Muirhead-transfer) need to consider only their own two incomes. This is in contradistinction to an H-transfer, for example, where knowledge of income levels other than those of the donor and recipient is necessary in order to set a suitable limit on the size of the pairwise transfer (namely knowledge of the incomes of the poorest individual strictly richer than the recipient and of the richest one strictly poorer than the donor). We suggest that when human beings interact pairwise and altruism prevails, the outcome is typically a transfer such as described by Dalton. The legend of Saint Martin giving half of his cloak to a destitute man and similar edifying tales of decentralized pairwise transfers illustrate the prevalence of this conception of altruism.
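The informational difference between the two bounds can be made concrete. The sketch below is one possible reading of the limits described above, not a construction taken from the paper: the function names are hypothetical, individuals are indexed from 0, and ties created by a transfer are taken to be admissible.

```python
def max_dalton_transfer(x, donor, recipient):
    """Largest Dalton-transfer: half the gap, computed from the two incomes alone."""
    return max(0.0, (x[donor] - x[recipient]) / 2)


def max_h_transfer(x, donor, recipient):
    """Largest transfer that inverts no pair's ranking; needs the whole vector x."""
    if x[donor] <= x[recipient]:
        return 0.0
    # poorest income strictly above the recipient's (possibly the donor's own)
    ceiling = min(z for z in x if z > x[recipient])
    # richest income strictly below the donor's (possibly the recipient's own)
    floor = max(z for z in x if z < x[donor])
    return min((x[donor] - x[recipient]) / 2,
               ceiling - x[recipient],
               x[donor] - floor)


x = [1.0, 4.0, 13.0]
print(max_dalton_transfer(x, 2, 0), max_h_transfer(x, 2, 0))
print(max_dalton_transfer(x, 1, 0), max_h_transfer(x, 1, 0))
```

For rank-neighbors the two bounds coincide; for the richest and poorest individuals of x = (1, 4, 13) the H-bound is tighter because the middle income of 4 must not be overtaken.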
In the next section we give a description of the set of income vectors attainable from some given income vector through a sequence of Dalton-transfers and derive a number of properties of the extreme points of this set. Those properties are used to construct two algorithms, one to enumerate all extreme points for a given x and one to check whether x ≺_D y, for some given pair x, y. Those algorithms are given in the Appendix. Section 3 provides concluding comments and a short discussion of the literature.
II THE BETTER-THAN SET.
Following Zylka (1985), on which the following description of S^D(x) for n = 3 is based, let without loss of generality the initial income vector x be increasingly ordered: x ∈ ∆n. Consider the following idempotent bistochastic matrices:
(2) A_1 = \begin{pmatrix} .5 & .5 & 0 \\ .5 & .5 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad A_2 = \begin{pmatrix} .5 & 0 & .5 \\ 0 & 1 & 0 \\ .5 & 0 & .5 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & .5 & .5 \\ 0 & .5 & .5 \end{pmatrix}.
The extreme points of S^D(x), that is, the points that cannot be written as a non-trivial convex combination of other points of this set, are:
(3) ε^1 = x; ε^2 = xA_1; ε^3 = xA_1A_2; ε^4 = xA_1A_2A_3; ε^5 = xA_3; ε^6 = xA_3A_2; ε^7 = xA_3A_2A_1.
Those points are represented in Figure 3, with x taken to be x = (1, 4, 13) as an illustration. For every cell, construct its intersection with the set S^M(ε^i) of any ε^i whose ranking is the same as that of the cell (there might be several such ε^i’s). Any such intersection is the intersection of two convex sets and is thus convex. The set S^D(x) is the union of those intersections.
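For n = 3, the construction just described translates into a membership test for S^D(x). The following is a minimal sketch under stated assumptions: x is increasingly ordered, individuals are indexed from 0, membership in S^M(ε) is tested through the usual partial-sum condition on increasingly ordered vectors, and all function names are hypothetical.

```python
from itertools import accumulate, permutations


def average(x, i, j):
    """Give individuals i and j (0-based) the mean of their two incomes."""
    y = list(x)
    y[i] = y[j] = (x[i] + x[j]) / 2
    return y


def extreme_points(x):
    """The seven points of (3); A1, A2, A3 average the pairs (1,2), (1,3), (2,3)."""
    a1, a2, a3 = (0, 1), (0, 2), (1, 2)
    points = [list(x)]
    for sequence in ((a1, a2, a3), (a3, a2, a1)):
        p = list(x)
        for i, j in sequence:
            p = average(p, i, j)
            points.append(p)
    return points


def ranked(z, order, tol):
    """True if z is nondecreasing along `order`, i.e. z lies in that (closed) cell."""
    return all(z[order[k]] <= z[order[k + 1]] + tol for k in range(len(z) - 1))


def in_dalton_set(x, y, tol=1e-9):
    """Membership test for S^D(x), n = 3, x increasingly ordered."""
    if abs(sum(x) - sum(y)) > tol:
        return False
    for order in permutations(range(3)):            # one ordering per cell
        if not ranked(y, order, tol):
            continue
        y_sums = list(accumulate(y[i] for i in order))
        for e in extreme_points(x):
            e_sums = accumulate(e[i] for i in order)
            if ranked(e, order, tol) and all(a <= b + tol for a, b in zip(e_sums, y_sums)):
                return True                          # y is in S^M(e) and in e's cell
    return False


x = [1.0, 4.0, 13.0]
print(in_dalton_set(x, [6.1, 6.0, 5.9]))            # the fully reversed example
print(in_dalton_set(x, [13.0, 4.0, 1.0]))           # a permutation of x
```

Looping over all orderings rather than a single sorted order is a design choice that keeps the test correct when y has tied incomes and therefore lies on the boundary between cells.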
In Figure 3, one of the “wedges” which constitute S^D(x) is the convex intersection of S^M(x) with the cell C_123. This intersection is of course S^H(x), the polytope with vertices (x, ε^5, x^=, ε^2) in the Figure, with x^= the point of equality. Figure 3 shows as well the set S^M(ε^5), which is the triangle ε^5AB; its intersection with C_132 is another wedge of S^D(x). Figure 4 represents S^M(ε^7) and S^M(ε^4) as the two triangles ε^7BD and ε^4AC, respectively. Their intersections with C_321 are the two shaded triangles. A point y ∈ C_321 belongs to one of those intersections if and only if x ≺_D y.
We now turn to a general characterization of the extreme points of S^D(x).
Definition 3. To average income between individuals i and j is to give them each the income (x_i + x_j)/2, where x_i, x_j are the incomes before averaging.
Proposition 1. A transfer from a richer individual to a poorer one which is strictly less than half the difference in income does not produce an extreme point of S^D(x).
This is obvious, as such a transfer produces an income vector which is a convex combination of the original income vector and the income vector obtained by averaging. Let now v = (v(1), v(2), …, v(n)) be a permutation of the individuals. As we assume x ∈ ∆n, one initially has v(i) = i. In general, v(i) = k means that individual k is the i-th poorest, with income x(k). We also use the inverse function v^{-1}(k) = i, meaning likewise that individual k is the i-th poorest. If |v^{-1}(i) − v^{-1}(j)| = 1, then we call i, j neighbors.
Proposition 2. If the incomes of individuals i and j are different, averaging those incomes does not produce an extreme point of S^D(x) if |v^{-1}(i) − v^{-1}(j)| > 1.
The proposition is proven in the Appendix and illustrated in Figure 5, where x* is obtained by averaging the incomes of individuals 1 and 3, who are not neighbors (x = (2, 7, 9)). Note that in (3), the matrix A_2, which equalizes the incomes of individuals 1 and 3, appears in the construction of an extreme point only after either the first two incomes have been equalized (e.g. in ε^3) or the last two (e.g. in ε^6), and thus in the above construction (3), A_2 performs an averaging between neighbors.
Propositions 1 and 2 characterize points in S^D(x) which are not extreme, but are not sufficient to construct the finite set of extreme points. To illustrate the difficulty, consider in Figure 6 the “clockwise” construction of the points ε^2, ε^3, ε^4, ι^5, ι^6, ι^7, ι^8, …, with ε^2, ε^3, ε^4 as above and ι^5 = ε^4A_1, ι^6 = ι^5A_2, ι^7 = ι^6A_3, ι^8 = ι^7A_1, …, see (2), all constructed by pairwise averaging between neighbors. Yet, obviously, ι^5, ι^6, ι^7, ι^8, … are interior points. The following proposition allows one to distinguish.
Proposition 3. If v^{-1}(i) = v^{-1}(j) − 1 and i > j, then averaging the incomes of the two individuals will not produce a new extreme point.
The premises of the proposition mean that the incomes of i, j have already been averaged once, as originally i has a higher income than j. Essentially the proposition says that one should not average the incomes of the same two individuals more than once if one wishes to construct an extreme point of S^D(x). The proof is given in the Appendix. While Propositions 1 and 2 tell us what to do to obtain extreme points (average income between neighbors), Proposition 3 tells us when to stop (when one runs out of pairs of individuals whose incomes have not been averaged yet).
To illustrate, consider Figure 7. Going from x to ε^2 averages the incomes of individuals 1 and 2; going from there to ε^3 averages the incomes of individuals 1 and 3, and then to ε^4 averages the incomes of individuals 2 and 3. We have now run out of pairs of individuals between whom income has not been averaged. From ε^4, averaging 2 and 3 would reproduce ε^4, averaging 1 and 2 brings us to interior point ι^5, and averaging 1 and 3 to interior point A.
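The rule suggested by Propositions 1 to 3 (average only between rank-neighbors, and never the same pair twice) can be sketched as a small recursive enumeration. This is not the algorithm of the Appendix, only an illustrative reading of the three propositions; for general n the points it returns should be regarded as candidates satisfying these necessary conditions. The function name and the 0-based indexing are assumptions.

```python
def candidate_extreme_points(x):
    """Enumerate points reached by averaging rank-neighbors at most once per pair.

    x must be increasingly ordered; v lists individuals from poorest to richest.
    """
    n = len(x)
    found = []

    def visit(incomes, v):
        if incomes not in found:
            found.append(incomes)
        for pos in range(n - 1):
            i, j = v[pos], v[pos + 1]
            if i > j:
                # Proposition 3: the lower-ranked neighbor has the larger index,
                # so this pair has already been averaged once
                continue
            new = list(incomes)
            new[i] = new[j] = (incomes[i] + incomes[j]) / 2
            w = list(v)
            w[pos], w[pos + 1] = j, i   # the two individuals swap rank positions
            visit(new, w)

    visit(list(x), list(range(n)))
    return found


for point in candidate_extreme_points([1.0, 4.0, 13.0]):
    print(point)
```

Each averaging adds one inversion to v, so the recursion stops after at most n(n − 1)/2 steps along any branch; for x = (1, 4, 13) it returns the seven points listed in (3).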
Finally, what is the meaning of the non-convexity of S^D(x)? We give its intuition with the help of an example. Start as above from the income allocation x = (1, 4, 13) and let every income recipient split his income in two equal parts, one of which is held in coins, x^c = (.5, 2, 6.5), and the other in bank notes, x^b = x^c. Suppose now that the sequence of averagings y^c = x^cA_1A_2A_3 = (3.875, 2.5625, 2.5625) is performed on the coin incomes, while the sequence of averagings y^b = x^bA_3A_2A_1 = (3.3125, 3.3125, 2.375) is performed on the bank note incomes. The income allocation is now y = y^c + y^b = (7.1875, 5.875, 4.9375). Noting that ε^4 = (7.75, 5.125, 5.125) and ε^7 = (6.625, 6.625, 4.75), one has that the point y is in the “dent” in S^D(x) in cell C_321 and is not attainable through Dalton-transfers performed on the whole incomes, see Figure 3. More generally, if income allocation x is split into λx and (1 − λ)x, with 0 < λ < 1, then it is possible, by performing Dalton-transfers on each of those two income vectors separately, to reach a point which is in the convex hull of the extreme points of S^D(x), yet not in S^D(x) itself. While S^D(x) is not convex, it is starshaped with respect to the point of perfect equality x^=, in the sense that a convex combination of x^= and any point in S^D(x) belongs to S^D(x), as pointed out by Zylka (1985); it is also starshaped with respect to x.
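The arithmetic of the coins and bank notes example can be checked directly. In this sketch the helper names are hypothetical and the pairs averaged by A_1, A_2 and A_3 are written with 0-based indices; the last lines verify that y is the midpoint of ε^4 and ε^7, which is why it lies in the convex hull of the extreme points.

```python
def average(x, i, j):
    """Give individuals i and j (0-based) the mean of their two incomes."""
    y = list(x)
    y[i] = y[j] = (x[i] + x[j]) / 2
    return y


def average_sequence(x, pairs):
    """Apply a sequence of pairwise averagings, in order."""
    for i, j in pairs:
        x = average(x, i, j)
    return x


A1, A2, A3 = (0, 1), (0, 2), (1, 2)     # pairs averaged by the matrices in (2)
x = [1.0, 4.0, 13.0]
half = [v / 2 for v in x]               # coins and bank notes hold the same amounts

y_c = average_sequence(half, [A1, A2, A3])
y_b = average_sequence(half, [A3, A2, A1])
y = [c + b for c, b in zip(y_c, y_b)]

eps4 = average_sequence(x, [A1, A2, A3])
eps7 = average_sequence(x, [A3, A2, A1])
midpoint = [(p + q) / 2 for p, q in zip(eps4, eps7)]
print(y, midpoint)
```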
III CONCLUSION.
As pointed out in the introduction, the preorder ≺_M is the opposite of the “majorization” preorder. If x, y ∈ ℜ^n, then x is said to majorize y if there exists a bistochastic matrix B such that y = xB, and the set of y’s majorized by a prescribed x is the set that we have called S^M(x). This follows from Muirhead’s Lemma (see Marshall and Olkin (1979), p. 21), which we state as follows:
Proposition 4. x ≺_M y (Definition 1) if and only if there exists a non-negative matrix B such that: Be = e, eB = e, and y = xB.
By Birkhoff’s Theorem, the matrices generating S^M(x) (i.e. the bistochastic matrices) can each be decomposed by (i.e. written as a convex combination of) permutation matrices. Note that Muirhead’s Lemma does not say that all bistochastic matrices can be factorized by (i.e. written as the product of) T-transforms. It merely says that if x majorizes y, then y = xB for some bistochastic matrix B which can be so factorized. The fact is that there need not be a unique bistochastic matrix corresponding to a given majorization; Marshall and Olkin (1979, p. 39) give examples of bistochastic matrices which cannot be factorized by T-transforms.
A similar pair of a decomposition result and a factorization result exists for the preorder ≺_H. The matrices generating S^H(x) can each be decomposed by a subset of the set of bistochastic matrices (here the idempotent bistochastic matrices, see Hoffman (1969)) and if x ≺_H y, then there exists some bistochastic matrix B which can be factorized by T-transforms which do not strictly invert the ranking of all incomes (Dasgupta et al. (1973), who assume w.l.o.g. that x, y ∈ ∆n).
On the other hand, the partial order x ≺_D y is defined (Definition 2) by the fact that y is produced by applying to x some bistochastic matrix which can be factorized by T* matrices (see (1)), but no decomposition characterization of all bistochastic matrices which generate S^D(x) is available. The non-convexity of S^D(x) militates strongly against the possibility of such a result. There is thus in this respect a significant difference between ≺_M and ≺_H on the one hand and ≺_D on the other.
Turning to a short review of the literature, we note that it is curious that Dalton (1920) considered “asymmetric” Dalton-transfers rather than “symmetric” Muirhead-transfers, as the purpose of his article was to put to the test inequality measures (the mean deviation, the Gini mean difference and others) which are all symmetric functions, as well as Lorenz dominance, which is likewise a symmetric concept. (A function F(x) of a vector x is symmetric if F(x) = F(Πx) for Π a permutation matrix.) The idea of symmetry is mostly implicit in Dalton’s article, although in his introduction there is a lone and token reference to symmetry when he explicitly states the assumption that “the relation of income to economic welfare is the same for all members of the community” (Dalton (1920), p. 349). The only clue that Dalton gives to justify the particular formulation of his “axiom”, which is clearly at variance with his symmetry assumption just quoted, is contained in his footnote to the passage quoted in Section 1 above, which reads:
“Inequality is certain to be diminished by a series of transfers such that all transfers from A, the richer, to B, the poorer, still leaves A richer than, or just as rich as, B. But if some of the transfers make B richer than A, it is possible that the effect of the series of transfers might cancel out and leave inequality the same as before”
The first sentence of this quote is plain enough; it restates what appears in Dalton’s text and confirms that this is what he really meant. The second sentence is more mysterious. Dalton’s motivation is seemingly to avoid defining a class of transfers a sequence of which would “leave inequality the same as before”. If we take this to mean that the end allocation of income is a permutation of the initial one, then Dalton seems to be saying that a series of T-transforms with λ > ½ could produce a permutation. This is undoubtedly true; in fact, without any upper bound on λ, one opens the door to transfers (even a single one) which would “leave inequality the same as before” or even increase it, in an obvious sense. Why Dalton needed to consider “a series of transfers” is not clear. In any case, if Dalton’s intention was to exclude permutations, then the milder condition λ < 1 (rather than λ ≤ ½) would have done as well, as permutation matrices can be factorized only by permutation matrices, and this would have spared his article from suffering from acute schizophrenia as regards symmetry.
Kolm (1969), who introduced Muirhead-transfers and the preorder ≺_M in the economic literature, carefully distinguished between such transfers and Dalton-transfers and seems to be the only economist to have discussed Dalton-transfers per se: he calls them progressive transfers (see also Kolm (1996), who provides an economic discussion of the concept, and Foster (1985)). There are otherwise some errors and much confusion in the later literature. A particularly egregious error is that of Marshall and Olkin (1979, p. 6), who affiliate Dalton-transfers to Muirhead’s Lemma, to the set S^M(x), and in general to majorization. It is otherwise often the case in the literature that “the transfer axiom” merely refers to a transfer “from the rich to the poor” (a tradition going back to Pigou (1912, p. 24)), a definition which is not sufficiently precise to be useful.
The perusal of the literature on the evaluation of income distribution, of which a recent synthesis can be found in Sen (1997), shows that among the possible transfer concepts, Muirhead-transfers reign supreme: the typical inequality measure is a Schur-convex function, that is, a function preserving ≺_M, and the typical better-than-x set is S^M(x). The classical mathematical results which are mobilized to study income inequality, such as Hardy, Littlewood and Polya’s Theorem, Blackwell’s Theorem and the Schur-Ostrowski Theorem, all deal with ≺_M; so do concepts such as Lorenz curves, stochastic dominance, and mean-preserving spreads, of which Muirhead-transfers are a particular case. H-transfers have a natural niche in the literature on redistributions preserving horizontal equity.
Dalton-transfers as such thus do not play any real role in the literature on income inequality. Admittedly one finds many references to “Dalton’s principle of transfer”, but its specific meaning is completely lost when it is complemented, as it typically is, by a symmetry (or anonymity) axiom, giving in all a very recondite version of the Muirhead-transfer axiom. Note that in such a construction, the concept of a Dalton-transfer is not indispensable; one may very well use the H-transfer axiom instead.
From the above discussion, it seems reasonable to suggest that the natural field of application of the preorder ≺_M and the associated Muirhead-transfers is the discussion of income distribution (as opposed to re-distribution) starting from an “original position” where the rules of society are chosen before the individuals know whom they will be; hence symmetry. Dalton-transfers on the other hand are contingent on a prescribed existing allocation of income to individuals and are associated with redistribution rules and altruism in a well-defined sense, from a position where individuals know what their incomes are.
Dalton’s principle of transfer appears indeed in a meaningful way in the literature on altruism, in the form of the condition that altruism should not be “excessive”, meaning that every agent cares at least as much about himself as he cares about someone else; see for example Collard (1978, p. 9). The bound on the size of a Dalton-transfer corresponds exactly to this idea. The study of altruism in evolutionary games seems to be particularly relevant here, due to the fact that the creatures in such games interact precisely in a pairwise decentralized fashion. Bester and Guth (1998) for example show the important role played by the equivalent of Dalton’s bound on the size of the transfer in the determination of the evolutionary stability of altruism. In conclusion, it appears that the concept of a Dalton-transfer invaded the literature on income inequality through an historical accident and that it is quite misplaced there. It is highly relevant though for modeling altruism and its consequences; hence the interest of being able to describe properly and characterize the set of allocations attainable through a sequence of such “non-excessive” altruistic transfers, which, unlike Muirhead-transfers, are descriptive of human behavior.
APPENDIX.
Proof of Proposition 2. It is sufficient to consider n = 3. Let us assume that initially:
v = (1, 2, 3) and x = (a, b, c) with a < b < c and b > (a + c)/2.
(The case b < (a + c)/2 is similar and is left to the reader.) If we now average income between the first and last individuals, the resulting income vector is:
x* = ((a + c)/2, b, (a + c)/2),
corresponding to both v = (1, 3, 2) and v = (3, 1, 2).
Now, reaching the latter permutation by twice equalizing income between neighbors, we first reach v = (1, 3, 2), giving:
x' = (a, (b + c)/2, (b + c)/2),
and then
x° = ((2a + b + c)/4, (b + c)/2, (2a + b + c)/4),
corresponding to v = (3, 1, 2). Comparing x* and x°, we have that in both cases individuals 1 and 3 have equal incomes, and have less income than individual 2. But since (b + c)/2 > b, individual 2 is richer in x° than in x*. In other words, x* is a convex combination of x° and the point of equality x^= where all individuals have income (a + b + c)/3. Thus x*, obtained by averaging over non-neighbors, is in the interior of S^D(x). The argument is illustrated in Figure 5. The reader can easily perform the corresponding construction for the case b < (a + c)/2, for example on Figure 2. If one had instead taken three arbitrary individuals i, j, k such that v^{-1}(i) < v^{-1}(j) < v^{-1}(k), one would obtain that averaging over i and k would have brought us to a point which is a convex combination of the point x^= and the point obtained by successively averaging between j and k and then i and k. (The latter may not be extreme, but this is irrelevant.) □
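The argument can be checked numerically on the vector x = (2, 7, 9) used in Figure 5. The sketch below is illustrative only: the variable names are assumptions, individuals are indexed from 0, and the weight t is read off the second coordinate.

```python
def average(x, i, j):
    """Give individuals i and j (0-based) the mean of their two incomes."""
    y = list(x)
    y[i] = y[j] = (x[i] + x[j]) / 2
    return y


x = [2.0, 7.0, 9.0]                          # a < b < c with b > (a + c)/2
x_star = average(x, 0, 2)                    # averaging the non-neighbors 1 and 3
x_circ = average(average(x, 1, 2), 0, 2)     # neighbors 2 and 3 first, then 1 and 3
mean = sum(x) / len(x)                       # income at the point of equality

# weight t such that x_star = t * x_circ + (1 - t) * (point of equality)
t = (x_star[1] - mean) / (x_circ[1] - mean)
recombined = [t * c + (1 - t) * mean for c in x_circ]
print(x_star, x_circ, t, recombined)
```

A weight t between 0 and 1 confirms that x* lies on the segment joining x° and the point of equality.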
Before turning to the proof of Proposition 3, we establish a couple of lemmas.
Lemma 1. Let two vectors x, y ∈ ∆n be such that:
∑_{i=1}^{s} x_i ≤ ∑_{i=1}^{s} y_i, for s = 1, …, n,
and construct two new vectors x', y' such that x'_i = x_i and y'_i = y_i for i = 1, …, j−1, j+2, …, n, and
x'_j = x'_{j+1} = (x_j + x_{j+1})/2 and y'_j = y'_{j+1} = (y_j + y_{j+1})/2.
Then:
(A-1) ∑_{i=1}^{s} x'_i ≤ ∑_{i=1}^{s} y'_i, for s = 1, …, n.
Proof. By construction, x', y' ∈ ∆n. Next, note that ∑_{i=1}^{s} x'_i ≤ ∑_{i=1}^{s} y'_i for s = 1, …, j−1, j+1, …, n, since x'_i = x_i and y'_i = y_i for i = 1, …, j−1, j+2, …, n, and ∑_{i=1}^{j+1} x'_i = ∑_{i=1}^{j+1} x_i, ∑_{i=1}^{j+1} y'_i = ∑_{i=1}^{j+1} y_i. It remains to show that (A-1) also holds for s = j, that is:
∑_{i=1}^{j−1} x_i + (x_j + x_{j+1})/2 ≤ ∑_{i=1}^{j−1} y_i + (y_j + y_{j+1})/2.
This follows from summing the following two inequalities:
½ ∑_{i=1}^{j−1} x_i + (x_j + x_{j+1})/2 ≤ ½ ∑_{i=1}^{j−1} y_i + (y_j + y_{j+1})/2
and
½ ∑_{i=1}^{j−1} x_i ≤ ½ ∑_{i=1}^{j−1} y_i. □
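Lemma 1 can be illustrated on a small numerical instance. The vectors below are arbitrary illustrative choices (not taken from the paper), the function names are hypothetical, and positions are 0-based; the check is of course no substitute for the proof.

```python
from itertools import accumulate


def dominated(x, y, tol=1e-12):
    """True if every partial sum of x is at most the matching partial sum of y."""
    return all(a <= b + tol for a, b in zip(accumulate(x), accumulate(y)))


def average_adjacent(z, j):
    """Replace z[j] and z[j+1] (0-based positions) by their mean."""
    w = list(z)
    w[j] = w[j + 1] = (z[j] + z[j + 1]) / 2
    return w


x = [1.0, 4.0, 13.0]    # increasingly ordered, same total as y
y = [2.0, 5.0, 11.0]
before = dominated(x, y)
after = [dominated(average_adjacent(x, j), average_adjacent(y, j))
         for j in range(len(x) - 1)]
print(before, after)
```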
Lemma 2. Let two vectors x, y ∈ ℜ^n, Σx = Σy, have x^∧ and y^∧ as their increasing rearrangements. Then x ≺_M y if and only if
(A-2) ∑_{i=1}^{s} x^∧_i ≤ ∑_{i=1}^{s} y^∧_i, for s = 1, …, n − 1.
If furthermore x, y are similarly ordered, then (A-2), x ≺_M y, x ≺_D y and x ≺_H y are all equivalent.
The first part of the lemma is another way of stating Muirhead’s Lemma; (A-2) is commonly used as the definition of majorization (“x majorizes y”), see Marshall and Olkin (1979). The second part is obvious from Section 1.
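Condition (A-2) gives a direct computational test for x ≺_M y. The following is a minimal sketch; the function name `muirhead_precedes` and the numerical tolerance are assumptions made for the example.

```python
from itertools import accumulate


def muirhead_precedes(x, y, tol=1e-9):
    """Check (A-2): partial sums of the increasing rearrangement of x never
    exceed those of y, the totals being equal."""
    if len(x) != len(y) or abs(sum(x) - sum(y)) > tol:
        return False
    px = accumulate(sorted(x))
    py = accumulate(sorted(y))
    return all(a <= b + tol for a, b in zip(px, py))


x = [1.0, 4.0, 13.0]
print(muirhead_precedes(x, [6.1, 6.0, 5.9]))     # a more equal allocation
print(muirhead_precedes(x, [13.0, 1.0, 4.0]))    # a permutation of x
print(muirhead_precedes([6.0, 6.0, 6.0], x))     # equality does not precede x
```

Because both vectors are sorted before comparison, the test is symmetric in the identity of the income recipients, in line with the anonymity of ≺_M.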
In the following, we use the notation x(i) rather than xi to index the elements of a vector in order to avoid, in many cases, multiple subscripts.
Proof of Proposition 3. Let c(t) = i mean that in iteration (averaging) t, we averaged income between individuals v(i) and v(i+1). For technical reasons, we allow c(t) = 0, meaning that we did nothing in iteration t. Whenever income is averaged between individuals v(i) and v(i+1), we switch the values of v(i) and v(i+1), that is we move from one cell to the next (Figure 1). Technically this is accomplished by performing the following sequence of assignments: w:= v(i); v(i):= v(i+1); v(i+1):= w.
Now, consider two situations:
1. In the first T iterations, we perform some averagings described by c_1(1), c_1(2), …, c_1(T). Iteration T+1 results in some averaging c_1(T+1) ≠ 0. Let the individuals taking part in this averaging be i and j with i > j, so that in fact c_1(T+1) = v_1^{-1}(j). Then a new series of averagings is performed, described by c_1(T+2), c_1(T+3), …, c_1(N). Assume that after averaging N, i and j are again neighbors, that is: v_1^{-1}(i) = v_1^{-1}(j) − 1. In a final iteration N+1, we average income between i and j (for the second time), i.e. c_1(N+1) = v_1^{-1}(i).
2. The first T iterations are as in situation 1, so that c_2(t) = c_1(t) for t = 1, …, T. Iteration T+1, on the other hand, is such that c_2(T+1) = 0. Thereafter, iterations T+2 to N are again the same as in situation 1 and described by c_2(T+2), …, c_2(N), with c_2(t) = c_1(t) for t = T+2, …, N. Here also, i and j are neighbors after iteration N, but v_2^{-1}(i) = v_2^{-1}(j) + 1. Let c_2(N+1) = 0.
It is important to note that after iteration N + 1, v1 = v2 , hence both situations bring us to the same cell, see Figure 1. We can thus compare the two income vectors under the assumption that they are similarly ordered, and thus apply Lemma 2.
From this construction, one has the following; remember that $x^t(i)$ is the income of individual i after t iterations. We use subscripts “1” and “2” on x (as we did on c and v) to distinguish between the two above situations. Note that by summing over x(v(k)) rather than x(k), we are in fact summing over the increasing re-arrangement of x.
$$\sum_{k=1}^{s} x_2^{t}(v_2(k)) = \sum_{k=1}^{s} x_1^{t}(v_1(k)) \quad \text{for } s = 1, \dots, n \text{ and } t = 1, \dots, T,$$
$$\sum_{k=1}^{s} x_2^{T+1}(v_2(k)) = \sum_{k=1}^{s} x_1^{T+1}(v_1(k)) \quad \text{for } s = 1, \dots, c_1(T+1) - 1,\; c_1(T+1) + 1, \dots, n,$$
$$\sum_{k=1}^{c_1(T+1)} x_2^{T+1}(v_2(k)) \le \sum_{k=1}^{c_1(T+1)} x_1^{T+1}(v_1(k)).$$
This inequality is strict unless in iteration T+1, we averaged over two individuals who happened to have identical income. Generally we have thus:
$$\sum_{k=1}^{s} x_2^{T+1}(v_2(k)) \le \sum_{k=1}^{s} x_1^{T+1}(v_1(k)) \quad \text{for } s = 1, \dots, n.$$
It now follows from Lemma 1 used repeatedly that as long as c2 (t) = c1 (t), this inequality remains valid for t = T+2, …, N as well, that is:
$$\sum_{k=1}^{s} x_2^{t}(v_2(k)) \le \sum_{k=1}^{s} x_1^{t}(v_1(k)) \quad \text{for } s = 1, \dots, n \text{ and } t = T+2, \dots, N.$$
Our final step is now to average income between i and j in situation 1 but not in situation 2. It is this final averaging which makes v1 = v2 .
To finalize the proof, we note that $v_1^{-1}(i) = v_2^{-1}(j)$ and $v_1^{-1}(j) = v_2^{-1}(i)$, as individuals i and j have opposite positions in the two situations. Furthermore, $x_1^N(j) \ge x_2^N(j)$, since individual j has a higher income in situation 1 than in situation 2. Likewise, $x_1^N(i) \le x_2^N(i)$. We can also compare the incomes of i and j across the two situations: $x_2^N(j) \le x_1^N(i)$ and $x_1^N(j) \le x_2^N(i)$.
These two inequalities follow from the fact that we averaged income in situation 1 but not in situation 2. We now use those results to show that
$$\sum_{k=1}^{s} x_2^{N+1}(v_2(k)) \le \sum_{k=1}^{s} x_1^{N+1}(v_1(k)) \quad \text{for } s = 1, \dots, n.$$
As before, only one partial sum is changed from iteration N, namely the one corresponding to $s = c_1(N+1)$. Hence, since $x_1^N(i) \le x_1^N(j)$,
$$\sum_{k=1}^{c_1(N+1)} x_2^{N+1}(v_2(k)) = \sum_{k=1}^{c_1(N+1)-1} x_2^{N}(v_2(k)) + x_2^N(j) \le \sum_{k=1}^{c_1(N+1)-1} x_1^{N}(v_1(k)) + x_1^N(i)$$
$$\le \sum_{k=1}^{c_1(N+1)-1} x_1^{N}(v_1(k)) + \frac{x_1^N(i) + x_1^N(j)}{2} = \sum_{k=1}^{c_1(N+1)} x_1^{N+1}(v_1(k)),$$
which completes the proof. □
We now give an algorithm which produces each extreme point of the set D S (x) exactly once, for some x assumed without loss of generality to be increasingly ordered. Thus, when the procedure is called for the first time, v(i) = i.
Procedure Average(v,x)
Begin
    output(v,x)
    for i:= 1 to n−1 do
        if v(i) < v(i+1) then
            w:= v; w(i):= v(i+1); w(i+1):= v(i);
            y:= x; y(i):= y(i+1):= (x(i) + x(i+1))/2;
            Average(w,y);
        end; (*for, if*)
End; (*Procedure*)
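A minimal Python sketch of Procedure Average follows, for illustration only. It assumes that x is stored by position (poorest first) and that v(k) is the individual occupying position k. The helper by_individual, which re-indexes a positional income vector by individual, is a hypothetical addition and does not appear in the paper.

```python
def average(v, x, out=None):
    """Enumerate the nodes of the tree built by Procedure Average.

    v : list, v[k] = individual at position k+1 (initially 1, 2, ..., n).
    x : list of incomes by position, poorest first.
    Returns a list of (v, x) tuples, one per call to output(v, x).
    """
    if out is None:
        out = []
    out.append((tuple(v), tuple(x)))          # output(v, x)
    for i in range(len(x) - 1):
        if v[i] < v[i + 1]:                   # this pair has not been swapped yet
            w = list(v)
            w[i], w[i + 1] = v[i + 1], v[i]   # move to the neighboring cell
            y = list(x)
            y[i] = y[i + 1] = (x[i] + x[i + 1]) / 2
            average(w, y, out)
    return out


def by_individual(v, x):
    """Hypothetical helper: convert positional incomes to incomes per individual."""
    incomes = [0.0] * len(x)
    for pos, ind in enumerate(v):
        incomes[ind - 1] = x[pos]
    return tuple(incomes)
```

Calling average([1, 2, 3], [1, 4, 13]) visits seven nodes (the root plus six others); the two leaves with v = (3, 2, 1), re-indexed with by_individual, are (7.75, 5.125, 5.125) and (6.625, 6.625, 4.75).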
The procedure “Average” constructs a tree; the root represents the original income distribution, with v(i) = i. The root has n − 1 children, one for each pair of neighbors between which averaging takes place. For the example in Figure 8, the root has two children, corresponding to averaging between individuals 1 and 2, and individuals 2 and 3. We do not average between individuals 1 and 3, as they are not neighbors. The nodes at level 2 again have children, one for each pair of neighbors between whom averaging takes place. In Figure 8, each node has only one child, as only one pair of neighbors has not yet participated in an averaging. At the leaves (i.e. the ends of the branches), the income distribution is reversed in all cases, but the actual incomes vary from leaf to leaf. Each arc in Figure 8 is given a number, corresponding to the index of the extreme points ε2, …, ε7 in Figure 3. Since the smallest of those numbers is “2”, it is the first arc to be generated and corresponds to extreme point ε2 in Figure 3. By following the arcs of Figure 8, one can see the order in which the corresponding extreme points of Figure 3 are created. Each node in Figure 8 contains the vector v from the statement output(v,x) in Procedure Average. This corresponds one-to-one with the cell-indexing of Figure 3.
We now turn to an algorithm to test whether, starting from x ∈ ∆n and thus v(i) = i, it is possible to reach by a sequence of Dalton-transfers the point y with permutation u, whose increasing rearrangement is ξ ∈ ∆n . It is not necessary here to produce all extreme points but only those with the same permutation as y; only a portion of the above tree need be constructed.
Procedure Test(v,x,u);
Begin
    if u = v then output(x)
    else begin
        for i:= 1 to n – 1 do
            if v(i) < v(i+1) then
                if u⁻¹(v(i)) > u⁻¹(v(i+1)) then begin
                    w:= v; w(i):= v(i+1); w(i+1):= v(i);
                    ξ:= x; ξ(i):= ξ(i+1):= (x(i) + x(i+1))/2;
                    Test(w, ξ, u);
                end; (*if*)
        end; (*for, if*)
    end; (*else*)
End; (*Procedure*)
Note that the line “if u⁻¹(v(i)) > u⁻¹(v(i+1)) then begin” is a test to see whether, at this stage, the i’th and (i + 1)’th poorest have an ordering which is the opposite of the one in u. If so, their income is averaged (Proposition 3). The procedure output(x) simply writes the resulting vectors to some file. It is then a simple matter to check whether the better-than set according to ≺_M of any of those vectors contains y, using the partial sums conditions (A-2) defining majorization (Lemma 2). For example, take x = (1, 4, 13) with v = (1, 2, 3) and y = (6.7, 6.0, 5.3), u = (3, 2, 1). The outputs of the procedure are the two extreme points ε4 = (7.75, 5.125, 5.125) and ε7 = (6.625, 6.625, 4.75). The test (A-2) of Lemma 2 indicates that ε4 ≺_M y and thus x ≺_D y.
Had one instead taken y = (7.2, 5.8, 5), one would have found that neither “ε4 ≺_M y” nor “ε7 ≺_M y” holds and thus neither does x ≺_D y. See Figure 4.
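The test and the worked example can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names reachable_leaves, majorized_by and dalton_attainable are hypothetical, incomes inside the recursion are stored by position (poorest first), y is given per individual, and (A-2) is taken to be the partial-sum condition on increasing rearrangements, checked with a small floating-point tolerance.

```python
def reachable_leaves(v, x, u, out=None):
    """Procedure Test: collect the extreme points whose permutation equals u."""
    if out is None:
        out = []
    if list(v) == list(u):
        out.append(tuple(x))                      # output(x), incomes by position
        return out
    rank_in_u = {ind: k for k, ind in enumerate(u)}   # plays the role of u^{-1}
    for i in range(len(x) - 1):
        # average only pairs not yet swapped whose order is reversed in u
        if v[i] < v[i + 1] and rank_in_u[v[i]] > rank_in_u[v[i + 1]]:
            w = list(v)
            w[i], w[i + 1] = v[i + 1], v[i]
            xi = list(x)
            xi[i] = xi[i + 1] = (x[i] + x[i + 1]) / 2
            reachable_leaves(w, xi, u, out)
    return out


def majorized_by(e_sorted, y_sorted, tol=1e-12):
    """Partial-sum test: sums of the s poorest in e never exceed those in y."""
    se = sy = 0.0
    for a, b in zip(e_sorted[:-1], y_sorted[:-1]):
        se += a
        sy += b
        if se > sy + tol:
            return False
    return True


def dalton_attainable(x, y):
    """x: increasingly ordered incomes; y: target incomes per individual."""
    n = len(x)
    u = sorted(range(1, n + 1), key=lambda k: y[k - 1])  # individuals, poorest first
    y_sorted = [y[k - 1] for k in u]
    leaves = reachable_leaves(list(range(1, n + 1)), list(x), u)
    return any(majorized_by(e, y_sorted) for e in leaves)
```

With x = (1, 4, 13), the call dalton_attainable([1, 4, 13], [6.7, 6.0, 5.3]) succeeds through the leaf corresponding to ε4, while dalton_attainable([1, 4, 13], [7.2, 5.8, 5]) fails for both leaves.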
REFERENCES
Bester, H. and W. Guth “Is altruism evolutionary stable?” Journal of Economic Behavior and Organization, 34, 1998, 193-309.
Collard, D. “Altruism and economy”, Oxford, Robinson, 1978.
Dalton, H. “The measurement of the inequality of income” Economic Journal, Sept. 1920, 348-361.
Foster, J. “Inequality measurement” in H. P. Young (Ed.) “Fair Allocation” Proceedings of Symposia in Applied Mathematics, 33, American Mathematical Society, Providence, 1985.
Hoffman, A. “A special class of doubly stochastic matrices” Aequationes Mathematicae, 2, 1969, 319-326.
Kolm, S. “The optimal production of social justice” in H. Guitton and J. Margolis (eds) “Public Economics” Macmillan, London, 1969.
Kolm, S. “Modern Theories of Justice” M.I.T. Press, 1996.
Marshall, A. and I. Olkin “Inequalities: Majorization Theory and its Applications” Academic Press, 1979.
Muirhead, R. “Some methods applicable to identities and inequalities of symmetric algebraic functions of n letters” Proceedings of the Edinburgh Mathematical Society, 21, 1903, 144-157.
Pigou, A. “Wealth and Welfare” Macmillan, London, 1912.
Sen, A. (with J. Foster) “On Economic Inequality” Clarendon Press, Oxford, 1997.
Zylka, C. “A note on the attainability of states by equalizing processes” Theoretica Chimica Acta, 68, 1985, 363-377.
Zylka, C. “Zum Problem der Erreichbarkeit von Zustanden” Annalen der Physik, 7, 44, 3, 1987, 247-248.
Zylka, C. “On the accessibility of states for systems with dissipative dynamics” Annalen der Physik, 47,7, 1990, 268-274.
[Figures: a plot with axes running from 0 to 18, partitioned into the cells c123, c213, c132, c231, c321 and c312; further plots marking the point x, the extreme points ε2, …, ε7 and the regions A, B, C and D.]