Fair-by-design algorithms: matching problems and beyond
David García-Soriano (ISI Foundation, Italy)
Francesco Bonchi (ISI Foundation, Italy)
arXiv:1802.02562v1 [cs.DS] 7 Feb 2018
ABSTRACT
In discrete search and optimization problems where the elements that may or may not be included in a solution correspond to humans, individual fairness needs to be expressed in terms of each individual’s satisfaction probability (the probability of being included in the solution). In this paper we introduce the maxmin fairness framework which provides, on any given input instance, the strongest guarantee possible for all individuals, in terms of satisfaction probability. A probability distribution over valid solutions is maxmin-fair if it is not possible to improve the satisfaction probability of any individual without decreasing it for some other individual who is no better off. We provide an efficient exact algorithm for maxmin-fair bipartite matching combining flow-based methods with algorithms for edge-coloring regular bipartite graphs. As shown in our experimental evaluation, our algorithm scales to graphs with millions of vertices and hundreds of millions of edges, taking only a few minutes on a simple architecture. We generalize our method to the case where the structure of valid solutions forms a matroid, in which case the price of fairness is zero, and show that then maxmin-fair distributions minimize social inequality among Pareto-efficient distributions. More generally, we prove that a maxmin-fair distribution of solutions to any combinatorial search problem may be found efficiently under the sole assumption that, given an arbitrary assignment of nonnegative weights to individuals, a maximum-weight solution may be found in polynomial time. This class of problems extends beyond matchings and matroids, and includes the vast majority of search problems for which exact algorithms are known.
ACM Reference Format: David García-Soriano and Francesco Bonchi. Fair-by-design algorithms: matching problems and beyond. In Proceedings of arXiv Technical Report. ACM, New York, NY, USA, 16 pages. https://doi.org/
1 INTRODUCTION
In domains such as education and employment, finance, search and recommendation, policy making, and criminal justice, algorithms have become an essential tool to provide better decisions by relying on data and quantitative measures. While decision-making based on algorithms and machine learning is becoming pervasive, awareness of and concern about the risks of unfair automated decisions are rising quickly across society. In response, the Obama Administration’s Big Data Working Group released a report [1] in May 2014 arguing that decisions informed by big data could have inadvertent discriminatory effects due to potential bias existing in the data and encoded in automated decisions. A subsequent report [2] called for algorithms that are “fair by design” and identified “poorly designed matching systems” as one of the main flaws of algorithmic decision-making. The rising level of concern for algorithmic bias and the ensuing ethical and societal issues is reflected in the attention that the topic has garnered within the research community. However, despite the fact that combinatorial search mechanisms lie at the basis of many automated decision systems, the bulk of the research in the area of algorithmic bias and fairness has focused on one single problem: avoiding discrimination against a sensitive attribute (i.e., a protected social group) in supervised machine learning [33]. Our work departs from this literature in three main directions: (1) we focus on individual fairness instead of group-level fairness; (2) we focus on bias stemming from the algorithm design itself, rather than the bias existing in the input data and potentially leading to discriminatory decision-making models; (3) instead of supervised learning we focus on combinatorial search and optimization problems, where the solution may not be unique and individuals correspond to elements to be included in the solution. In this setting, the utility (or satisfaction) function of each individual is based on whether the individual is selected or not for the solution. Two individuals satisfying all relevant criteria equally well (e.g., having the same skill set) should have, in principle, the same probability of being selected. This is usually not the case, as algorithms may be “biased by design”: bias may stem from something as petty as the order in which the algorithm chooses to process the list of candidates in its main loop (e.g., by alphabetical order or application date, which should be irrelevant), or from some details of the internal workings of the algorithm.
At KDD 2017, Cynthia Dwork opened her keynote [15] discussing the National Resident Matching Program and the Gale-Shapley algorithm for stable matching as an extreme example of an algorithm which is “biased by design”: among all stable matchings, this algorithm in fact always produces the one that is the best for every man and the worst for every woman (see the box “Gale-Shapley algorithm: biased by design”).
Algorithmic bias and randomization. Consider a job-search setting and suppose we have a certain number of positions and applicants. Assume that each applicant has a binary fitting for each of the positions (either she is fit for the job or not) and a binary satisfaction function (either she is selected or not). This can be represented as a bipartite graph with applicants on one side and positions on the other; the selection task amounts to finding a matching. Unless a matching satisfying all applicants simultaneously exists, some of them will have to be left out of the solution. An applicant who was not selected could notice that there are other matchings satisfying her. Although other equally good solutions exist, even of maximum size, the algorithm is programmed to pick one deterministically: the applicant might rightfully deem this unfair. Unlike the Gale-Shapley algorithm, whose bias can be characterized by a simple theorem, for the problems we consider in this
arXiv Technical Report, February 7, 2018
Gale-Shapley algorithm: biased by design
In 2012 Lloyd Shapley and Alvin Roth were awarded the Nobel Prize in Economics for their contributions to the theory and practice of stable matchings.ᵃ Shapley developed the notion of stability, the central concept in cooperative game theory, during the 1950s. The Gale-Shapley algorithm, presented in 1962 [22], solves the stable matching problem: given n men and n women, where each individual has ranked all members of the opposite sex in order of preference, match each man to a woman so that there are no two people of the opposite sex who would both rather be matched to each other than to their current partners. The practical relevance of the Gale-Shapley algorithm was recognized in the early 1980s, when Alvin Roth studied the job market for recently graduated medical students in the U.S. who are employed as residents (interns) at hospitals. Hospitals and residents rank each other, and a centralized system, called the National Resident Matching Program (NRMP), produces a matching. Roth discovered that the algorithm in use by the NRMP since the early 1950s was closely related to the Gale-Shapley algorithm, and hypothesized that the fundamental reason for its success was that it produces stable matchings. In the early 1990s, Roth went on to study similar medical markets in the UK, where different regions had adopted different methods: those resulting in stable matches were found to be more successful than those that did not.
The Gale-Shapley algorithm is a propose-and-reject algorithm: in every round each unengaged man proposes to the first woman on his list not yet crossed off, and each woman “provisionally engages” with her preferred proposer while definitively rejecting all the others. Rejected men cross off the woman from their list. The provisional nature of engagements permits an engaged woman to change partners in case a better proposal arrives. Once jilted, a man becomes unengaged again, crosses off the woman from his list, and proposes to the top woman on his list not yet crossed off. The algorithm always terminates with a stable matching which, among all possible stable matchings for a specific problem instance, is simultaneously the best for every man and the worst for every woman (of course the property can be reversed by changing the side that proposes).
ᵃ https://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/2012/popular-economicsciences2012.pdf
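The propose-and-reject procedure described in the box can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `gale_shapley`, the dictionary-based input format and the two-person toy instance are all assumptions made here. It processes one free proposer at a time instead of in synchronized rounds, which is known to yield the same matching.

```python
from collections import deque

def gale_shapley(proposer_prefs, receiver_prefs):
    """Deferred acceptance; returns a dict proposer -> receiver.

    Both arguments map each person to a complete preference list over
    the other side (most preferred first). Hypothetical input format.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    engaged_to = {}                               # receiver -> proposer
    free = deque(proposer_prefs)
    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1                       # p crosses r off the list
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # provisional engagement
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # r trades up
            free.append(current)                  # jilted proposer is free again
        else:
            free.append(p)                        # proposal rejected
    return {p: r for r, p in engaged_to.items()}

# Toy instance (assumed for illustration): both matchings below are stable,
# and each one gives every member of the proposing side their first choice.
men = {"A": ["X", "Y"], "B": ["Y", "X"]}
women = {"X": ["B", "A"], "Y": ["A", "B"]}
men_propose = gale_shapley(men, women)
women_propose = gale_shapley(women, men)
```

On this instance, swapping which side proposes flips the outcome from one stable matching to the other, which is the "biased by design" behaviour the box refers to.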
paper it may be hard to tell in advance which individuals a particular algorithm will favour. However, the fact that the bias is not easy to pinpoint does not mean it does not exist, just that we do not know what it is. Since no single candidate solution satisfying all individuals at the same time can exist in general, we turn our attention to randomized algorithms, which make random choices to pick from among several valid solutions. We request that every individual has as high a chance as possible to be satisfied.
Contributions and roadmap. Consider again the job-search setting with a single open position and n applicants fit for it: intuitively, the fairest solution would choose one applicant uniformly at random, so every applicant has a guaranteed satisfaction probability of 1/n. However, as the graph between applicants and jobs grows more complex it becomes unclear how to proceed. Simple techniques (such as randomizing the input order before running the standard deterministic assignment algorithm) only serve to shift the bias somewhere else. Instead, we need to define which properties our randomized algorithm must exhibit in order to be fair.
In this paper we introduce the maxmin fairness framework which provides, on any given input instance, the strongest guarantee possible for all individuals, in terms of satisfaction probability, and discuss methods for its transparent and accountable deployment (Section 2). We then study the case in which the structure of valid solutions forms a matroid: in this case maxmin fairness is attainable at no cost in solution size (i.e., the price of fairness is zero). We also show that maxmin-fairness minimizes the largest inequality gaps in satisfaction probabilities between all pairs of individuals, among all Pareto-efficient distributions. Finally, we characterize the “amount of fairness” attainable in any given matroid problem instance, generalizing Hall’s matching theorem (Section 3).
We apply our framework to bipartite matching (Section 4) and show that, by combining flow-based methods to find a “fair decomposition” with algorithms for edge-coloring regular bipartite graphs, an exact solution for maxmin-fair bipartite matching can be determined in a logarithmic number of maximum flow computations. Our algorithm runs faster in practice than its theoretical
running time of $O(\min(|E|^{3/2}, |E||V|^{2/3}) \cdot (\log |V|)^2)$ and scales to very large graphs, as shown in our experiments (Section 6). We also show how to extend these ideas to matchings in general graphs. Finally, we generalize our algorithmic techniques to prove that a maxmin-fair distribution of solutions to any search problem may be found efficiently under the sole assumption that, given an arbitrary assignment of non-negative weights to individuals, a maximum-weight solution may be found in polynomial time (Section 5). This is a large class of problems including the vast majority of search problems for which exact algorithms are known.
2 MAXMIN FAIRNESS
Consider a general search problem instance $I = (U, \mathcal{S})$ defined over a finite set of individuals $U$, where $\mathcal{S} \neq \emptyset$ denotes the set of feasible solutions for the problem instance $I$. Note that $\mathcal{S}$ is defined implicitly by the structure of the problem, rather than being explicitly encoded in the input. (For example, instance $I$ could represent a bipartite graph between jobs and a set $U$ of applicants, and $\mathcal{S}$ could represent the set of all matchings.) We assume that for every solution $S \in \mathcal{S}$, each individual $u \in U$ is either fully satisfied or fully dissatisfied, and this is the only property of the solution we are concerned with. For the sake of simplicity we will identify each solution in $\mathcal{S}$ with the subset of users satisfied by it, so $S \subseteq U$. Given $I$, our problem is to return an element of $\mathcal{S}$ while providing a fairness guarantee to all individuals in $U$. Since no single candidate solution satisfying all $u \in U$ at the same time exists in general, we turn our attention to a randomized algorithm $A$ that, for any given problem instance $I$, always halts and selects one solution $A(I)$ from $\mathcal{S}$. Thus $A$ induces a probability distribution $D$ over $\mathcal{S}$: $\Pr_D[S] = \Pr[A(I) = S]$ for each $S \in \mathcal{S}$. In the following we will blur the distinction between the randomized algorithm $A$ and the distribution $D$ induced by it.
Definition 1 (Satisfaction probability). Given a search problem instance $I = (U, \mathcal{S})$ and a probability distribution $D$ defined over $\mathcal{S}$, the satisfaction probability of each individual $u \in U$ under $D$ is $D[u] = \Pr_{S \sim D}[u \in S]$.
[Figure 1 graphic: applicants $a_0, a_1, a_2, a_3$ on the left and jobs $b_0, b_1, b_2$ on the right.]
Figure 1: An example bipartite graph between people (on the left) and jobs (on the right): there is an edge when the applicant is fit for the job.
It may be tempting to call a distribution fair if all $u \in U$ have exactly the same satisfaction probability. The next example illustrates why this would not make for a good definition.
Example 1. Consider the matching problem in Figure 1. Here $U = \{a_0, a_1, a_2, a_3\}$ and $\mathcal{S}$ is the set of all possible matchings. An individual $u \in U$ is satisfied by a solution $S \in \mathcal{S}$ iff she is matched in $S$ (i.e., she is selected for a job). Let $D$ be the distribution assigning probability $\frac{1}{3}$ to each of the following solutions: $S_1 = \{(a_0, b_0), (a_1, b_1), (a_2, b_2)\}$, $S_2 = \{(a_0, b_0), (a_1, b_1), (a_3, b_2)\}$, $S_3 = \{(a_2, b_2), (a_3, b_1)\}$. All satisfaction probabilities under $D$ are exactly the same, namely $\frac{2}{3}$. While $D$ might naively look “fair”, notice that the job $b_0$ is left unassigned in $S_3$, despite the existence of a fitting candidate left unemployed ($a_0$). This artificially restricts the satisfaction probability of $a_0$, who can always be satisfied in a very strong sense: for any matching covering a subset $T \subseteq \{a_1, a_2, a_3\}$, there is another matching covering $T \cup \{a_0\}$. So $a_0$ can always be satisfied without hurting anyone else’s chances, and any reasonable solution has to match $a_0$ with probability 1. Other applicants will have lower satisfaction probability (because no matching covers $a_1$, $a_2$ and $a_3$).
Based on the insight from this example, we next provide the key definition of our work. Intuitively, a distribution over solutions is maxmin-fair if it is impossible to improve the satisfaction probability of any individual without decreasing it for some other individual who is no better off.
Definition 2 (Maxmin-fairness). A distribution $F$ over $\mathcal{S}$ is maxmin-fair for $U$ if for all distributions $D$ over $\mathcal{S}$ and all $u \in U$
$D[u] > F[u] \implies \exists v \in U \mid D[v] < F[v] \le F[u]$. (1)
Similarly, a randomized algorithm is maxmin-fair if it induces a maxmin-fair distribution.
Example 2. Consider again the job matching problem of Example 1. A distribution assigning non-zero probability to a solution not covering $a_0$ (such as $S_3$) cannot be maxmin-fair, as one can increase the satisfaction probability of $a_0$, by adding $(a_0, b_0)$, without detriment to anyone else. So for $D$ to be maxmin-fair we must have $D[a_0] = 1$. On the other hand, notice that $\{a_1, a_2, a_3\}$ have only two neighbors $\{b_1, b_2\}$: this makes it impossible to guarantee satisfaction probability better than $\frac{2}{3}$ for $a_1$, $a_2$ and $a_3$ at the same time. The graph in this example has four maximum matchings: $S_1$ and $S_2$ from Example 1, $S_4 = \{(a_0, b_0), (a_1, b_2), (a_3, b_1)\}$, and $S_5 = \{(a_0, b_0), (a_2, b_2), (a_3, b_1)\}$. The distribution choosing among $S_1$, $S_2$, $S_4$ and $S_5$ with probability $\frac{1}{3}$, $\frac{1}{6}$, $\frac{1}{6}$, $\frac{1}{3}$, respectively, is maxmin-fair. The satisfaction probabilities of $a_0$, $a_1$, $a_2$ and $a_3$ are then $1$, $\frac{2}{3}$, $\frac{2}{3}$, $\frac{2}{3}$. Any attempt to match, say, $a_1$ with probability $> \frac{2}{3}$ will necessarily result in satisfaction probability $< \frac{2}{3}$ for $a_2$ or $a_3$. Another maxmin-fair distribution is, e.g., the one choosing uniformly at random between $S_1$, $S_2$ and $S_5$.
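The numbers in Examples 1 and 2 can be checked mechanically. The sketch below is an illustration under one stated assumption: the edge set of Figure 1 is not spelled out in the text, so `EDGES` is inferred here from the matchings $S_1, \ldots, S_5$ and from the statement that there are exactly four maximum matchings. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed edge set of Figure 1 (inferred from the matchings in the examples).
EDGES = {("a0", "b0"), ("a1", "b1"), ("a1", "b2"),
         ("a2", "b2"), ("a3", "b1"), ("a3", "b2")}
APPLICANTS = ["a0", "a1", "a2", "a3"]

S1 = {("a0", "b0"), ("a1", "b1"), ("a2", "b2")}
S2 = {("a0", "b0"), ("a1", "b1"), ("a3", "b2")}
S3 = {("a2", "b2"), ("a3", "b1")}
S4 = {("a0", "b0"), ("a1", "b2"), ("a3", "b1")}
S5 = {("a0", "b0"), ("a2", "b2"), ("a3", "b1")}

def is_matching(edges):
    """True if edges is a subset of EDGES with no shared endpoint."""
    left = [a for a, _ in edges]
    right = [b for _, b in edges]
    return (edges <= EDGES and len(set(left)) == len(left)
            and len(set(right)) == len(right))

def satisfaction(dist):
    """D[u] for every applicant; dist is a list of (matching, probability)."""
    assert all(is_matching(m) for m, _ in dist)
    assert sum(p for _, p in dist) == 1
    return {u: sum((p for m, p in dist if any(a == u for a, _ in m)),
                   Fraction(0))
            for u in APPLICANTS}

def maximum_matchings():
    """Brute-force enumeration of all maximum matchings of the graph."""
    found = [set(c) for k in range(len(EDGES) + 1)
             for c in combinations(sorted(EDGES), k) if is_matching(set(c))]
    best = max(len(m) for m in found)
    return [m for m in found if len(m) == best]

third, sixth = Fraction(1, 3), Fraction(1, 6)
equal_but_unfair = satisfaction([(S1, third), (S2, third), (S3, third)])
maxmin_fair = satisfaction([(S1, third), (S2, sixth), (S4, sixth), (S5, third)])
uniform_125 = satisfaction([(S1, third), (S2, third), (S5, third)])
```

Here `equal_but_unfair` gives every applicant 2/3, whereas the other two distributions give $a_0$ probability 1 and keep 2/3 for everyone else, matching the discussion in the examples.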
Finding a fair distribution is a daunting task, as it involves solving a continuous optimization problem over (infinitely many) distributions over the set $\mathcal{S}$ of valid solutions (which is commonly exponential in size). The problem that we face is thus how to design a randomized algorithm inducing a maxmin-fair distribution.
Problem 1. Given a search problem, design a randomized algorithm $A$ that always terminates and such that, for each instance $I = (U, \mathcal{S})$, the distribution of $A(I)$ is maxmin-fair for $U$ over $\mathcal{S}$.
Transparency and algorithmic accountability. Even a provably fair algorithm might still be perceived by the average user as a black box outputting an arbitrary solution. For the sake of transparency and accountability, it can be useful to publish all the solutions in a maxmin-fair distribution (along with their respective probabilities). Once a complete fair distribution is published, convincing any user $u$ of fair treatment amounts to: (1) letting $u$ verify independently the fairness guarantees of the distribution (for this it is also possible to output a short certificate, based on what we call a fair decomposition, of the fact that no higher probability for $u$ is possible in a maxmin-fair distribution); and (2) picking one of the published solutions at random, via any fair and transparent lottery mechanism or coin-tossing protocol (this is the only stage where randomness plays a role, as the fair list itself may be found deterministically).
One difficulty is the potentially large support size of the maxmin-fair distribution, which could prevent publication. An interesting question is whether we can produce a maxmin-fair distribution with small support. For matchings, we prove (Sec. 4) that |U|−1 solutions suffice; the actual number can be substantially less in practice (Sec. 6). This bound can be substantially reduced: if the “fairness parameter” is not too small, a distribution with nearly optimal fairness and small support can be found (Section 5).
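Step (2) of the publication procedure can be illustrated with a reproducible lottery. The sketch below is only one possible mechanism and is not taken from the paper: the function name `draw`, the use of a SHA-256 hash of a public seed string, and the labels `"S1"`, `"S2"`, `"S4"`, `"S5"` (standing for the matchings of Example 2) are assumptions made for illustration.

```python
import hashlib
from fractions import Fraction

def draw(published, public_seed):
    """Pick one published solution from a public seed string.

    published: list of (solution_label, probability) with exact Fractions.
    Anyone who knows the list and the seed can recompute the outcome.
    """
    assert sum(p for _, p in published) == 1
    digest = hashlib.sha256(public_seed.encode("utf-8")).hexdigest()
    r = Fraction(int(digest, 16), 2 ** 256)   # reproducible point in [0, 1)
    cumulative = Fraction(0)
    for label, p in published:
        cumulative += p
        if r < cumulative:                    # r falls in this solution's slice
            return label
    raise AssertionError("unreachable when probabilities sum to 1")

# Published maxmin-fair distribution of Example 2 (labels are placeholders).
published = [("S1", Fraction(1, 3)), ("S2", Fraction(1, 6)),
             ("S4", Fraction(1, 6)), ("S5", Fraction(1, 3))]
winner = draw(published, "hypothetical-public-seed")
```

Using exact fractions avoids rounding disputes about where one solution's probability slice ends and the next begins; the seed would in practice have to come from a source that no party controls.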
3 FAIRNESS AND SOCIAL INEQUALITY
In this section we present several properties of maxmin-fair distributions. The first observation is that, while several maxmin-fair distributions may exist, their corresponding satisfaction probabilities for each user are unique:
Lemma 1. Let $F$ and $D$ be two maxmin-fair distributions. Then $F[u] = D[u]$ for all $u \in U$.
Proof. Assume that the set
$A = \{u \in U \mid F[u] \neq D[u]\}$
is non-empty. Let
$u = \arg\min\{\min(F[u], D[u]) \mid u \in A\}$,
where ties are broken arbitrarily. Then $D[u] \neq F[u]$; suppose that $D[u] > F[u]$. Then for any $v \in A$, our choice of $u$ implies that $D[v] \ge \min(D[v], F[v]) \ge \min(D[u], F[u]) = F[u]$; and for any $v \notin A$, we have $D[v] = F[v]$ by definition. In either case one of the inequalities required by condition (1) fails, so $F$ is not maxmin-fair. Put differently, we have shown the following implication: $F$ is maxmin-fair $\implies D[u] < F[u]$.
Similarly, $D$ is maxmin-fair $\implies F[u] < D[u]$. But then $F$ and $D$ cannot both be maxmin-fair. The only way out of this contradiction is to conclude that $A$ is empty.
Given a distribution $D$ over $\mathcal{S}$, write $D^{\uparrow} = (\lambda_1, \ldots, \lambda_n)$ for the vector of satisfaction probabilities $(D[u])_{u \in U}$ sorted in increasing order. Let $\succ$ denote the lexicographical order of vectors.¹ The following characterization holds.
Theorem 1. A distribution $F$ is maxmin-fair if and only if $F^{\uparrow} \succeq D^{\uparrow}$ for all distributions $D$ over $\mathcal{S}$.
Proof. ($\Rightarrow$) Let $F$ be maxmin-fair and consider any other distribution $D$. We need to show that $F^{\uparrow} \succeq D^{\uparrow}$ (that is, $F^{\uparrow}$ is lexicographically largest). Define
$A = \{u \in U \mid F[u] \neq D[u]\}$.
If $A$ is empty, the claim is trivial; otherwise let
$u = \arg\min\{\min(F[u], D[u]) \mid u \in A\}$.
As in the proof of Lemma 1, from the maxmin-fairness of $F$ we deduce that $F[u] > D[u]$. Then for any $v$ it cannot be the case that both $F[v] < D[u]$ and $F[v] \le D[v]$ hold, as this would imply $\min(F[v], D[v]) = F[v] < D[u] = \min(F[u], D[u])$. In other words:
$F[u] > D[u]$ and $(F[v] < D[u] \implies F[v] > D[v])$.
It is readily verified that this implies $F^{\uparrow} \succ D^{\uparrow}$.
($\Leftarrow$) Let $F$ be a distribution which is not maxmin-fair. We show that $F$ is not lexicographically largest either. Since (1) does not hold for $F$, there exist another distribution $D$ and a user $u \in U$ such that
$D[u] > F[u]$ and $(D[v] < F[v] \implies F[v] > F[u]) \ \forall v$. (2)
For any $\varepsilon \in (0, 1)$, let $X_\varepsilon$ denote the distribution picking $F$ with probability $1 - \varepsilon$ and $D$ with probability $\varepsilon$, so that
$X_\varepsilon[v] = F[v] + \varepsilon (D[v] - F[v]) \ \forall v$.
Choose $\varepsilon > 0$ small enough so as to guarantee that
$(F[v] < F[u] \implies X_\varepsilon[v] < X_\varepsilon[u]) \ \forall v$ (3)
and
$(F[v] > F[u] \implies X_\varepsilon[v] > X_\varepsilon[u]) \ \forall v$. (4)
For instance, any $\varepsilon < \frac{1}{2} \min\left\{ \frac{F[u] - F[v]}{D[v] - F[v]} \;\middle|\; v;\ D[v] \neq F[v] \neq F[u] \right\}$ will do. We have, by (2),
$(F[v] \le F[u] \implies D[v] \ge F[v] \implies X_\varepsilon[v] \ge F[v]) \ \forall v$ (5)
and
$D[u] > F[u]$. (6)
But (3), (4), (5) and (6) say that $X_\varepsilon^{\uparrow}$ is strictly larger than $F^{\uparrow}$ in lexicographical order, as we wished to show.
In other words, a maxmin-fair distribution maximizes the smallest satisfaction probability; subject to that, it maximizes the second-smallest satisfaction probability, and so on. As the probability vectors defining distributions over $\mathcal{S}$ form a non-empty compact set, and the mapping from such vectors to their corresponding sorted satisfaction vectors is continuous, the following also holds by the Weierstrass theorem.
¹ Recall that $(v_1, \ldots, v_n) \succ (w_1, \ldots, w_n)$ iff there is some index $i \in [n]$ such that $v_j = w_j$ for all $j < i$ and $v_i > w_i$; the relations $\succeq$, $\prec$ and $\preceq$ are defined similarly.
Theorem 2. Given a search problem instance $I = (U, \mathcal{S})$, a maxmin-fair distribution always exists.
Matroid problems. Many search/optimization problems (including matching) can be formulated in terms of matroids.
Definition 3 (Matroid problem). Let $L$ be a finite set. A matroid with ground set $L$ is a non-empty collection $M$ of subsets of $L$ satisfying the following two properties: (1) if $A \in M$ and $B \subseteq A$, then $B \in M$; (2) for any $X \subseteq L$, all maximal subsets of $X$ belonging to $M$ have the same size.² A search problem is a matroid problem if for any instance $I = (U, \mathcal{S})$, the set $\mathcal{S}$ is a matroid.
Condition (1) is natural after identifying solutions with user subsets. Condition (2) is met, e.g., by the sets of matchable vertices in a graph (Sec. 4) and many other problems (see Sec. 5 for examples).
The elements of a matroid $M$ are called independent sets. The maximal elements of $M$ are called bases. All bases have the same size. The rank function of $M$ is $\rho_M(S) = \max\{|X| \mid X \subseteq S, X \in M\}$. The rank function is monotone submodular, meaning that for all $S, T \subseteq L$, it holds that $0 \le \rho(S \cup T) - \rho(S) \le \rho(T) - \rho(S \cap T)$. Conversely, any monotone submodular function $f$ defined on subsets of $L$ and such that $f(X) \le |X|$ for all $X$ is the rank function of some matroid. The dual matroid of $M$ is the matroid $M^*$ with ground set $L$ whose bases are the complements $L \setminus B$ of the bases $B$ of $M$; clearly the dual of $M^*$ is $M$ itself. The rank function of $M^*$ is given by $\rho^*(S) = |S| - (\rho(L) - \rho(L \setminus S))$. The span of $X \subseteq L$ is the unique maximal superset of $X$ with rank $\rho(X)$. A parallel element of $M$ is an element $u \in L$ such that $\rho(\{u\}) = 0$, that is, $\{u\} \notin M$. See [34] for proofs of these statements.
Our next result plays a major role in the development of fair algorithms: it tells us that the only obstruction to achieving high satisfaction probability for every user is the existence of a set of users with small rank-to-size ratio.
Finding these obstruction sets will enable us to devise a divide and conquer strategy to obtain fair distributions. This characterization is of independent interest.
Theorem 3. Let $M$ be a matroid with ground set $L$ and rank function $\rho : 2^L \to \mathbb{Z}$. The minimum satisfaction probability in a maxmin-fair distribution over $M$ is
$\lambda_1 = \min\left\{ \frac{\rho(X)}{|X|} \;\middle|\; \emptyset \neq X \subseteq L \right\}$.
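As a sanity check on the statement of Theorem 3, the formula can be evaluated by brute force on the small matching instance of Figure 1, where the rank of a set of applicants is the size of its largest matchable subset. The sketch below is an illustration only: the adjacency lists in `FITS` are inferred from the matchings listed in Examples 1 and 2 (an assumption), the helper names are hypothetical, and the exhaustive search over subsets is exponential, so it is not the algorithm of the paper.

```python
from fractions import Fraction
from itertools import combinations

# Assumed adjacency of Figure 1: applicant -> jobs she is fit for.
FITS = {"a0": ["b0"], "a1": ["b1", "b2"], "a2": ["b2"], "a3": ["b1", "b2"]}

def rank(X):
    """rho(X): size of the largest matchable subset of X (augmenting paths)."""
    owner = {}                       # job -> applicant currently holding it

    def augment(a, seen):
        for b in FITS[a]:
            if b in seen:
                continue
            seen.add(b)
            # Take b if it is free, or if its holder can be moved elsewhere.
            if b not in owner or augment(owner[b], seen):
                owner[b] = a
                return True
        return False

    return sum(augment(a, set()) for a in X)

def min_rank_ratio(ground):
    """Return (min rho(X)/|X|, minimizing X) over non-empty subsets X."""
    best = None
    for k in range(1, len(ground) + 1):
        for X in combinations(ground, k):
            ratio = Fraction(rank(X), k)
            if best is None or ratio < best[0]:
                best = (ratio, X)
    return best

lambda_1, obstruction = min_rank_ratio(sorted(FITS))
```

On this instance the minimum is attained by the three applicants competing for two jobs, which agrees with the value 2/3 derived by hand in Example 2.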
Proof. Any maxmin-fair distribution is supported on the collection $\mathcal{B}$ of bases of $M$, since extending an independent set to a base containing it never decreases any satisfaction probability. Optimizing the smallest satisfaction probability $\lambda_1$ amounts to finding a suitable distribution over $\mathcal{B}$; let us denote the corresponding probabilities by $\{p_B\}_{B \in \mathcal{B}}$. Since the probability of $v \in U$ being included is $\sum_{B \ni v} p_B$, maximizing the minimum such probability is modeled by Program (7) below. It may be written as a linear program by introducing an additional variable; its dual is equivalent to (8).
² Equivalently, the second condition may be replaced with: (2’) if $A, B \in M$ and $|A| < |B|$, then there is $y \in B \setminus A$ such that $A \cup \{y\} \in M$.
E[u] for at least one u ∈ U. Clearly any maxmin-fair distribution is Pareto-efficient, hence any solution in its support is maximal (with regard to set inclusion). We next introduce the notion of minmax-Pareto efficiency, a kind of dual to maxmin-fairness: it says that no user satisfaction can be decreased without increasing that of another user who is no worse off, or losing Pareto-efficiency.
Definition 5. A Pareto-efficient distribution $F$ over $\mathcal{S}$ is minmax-Pareto for $U$ if for all Pareto-efficient distributions $D$ over $\mathcal{S}$ and all $u \in U$, it holds that $D[u] < F[u] \implies \exists v \in U \mid D[v] > F[v] \ge F[u]$.
Requiring Pareto-efficiency is redundant for maxmin-fairness, but crucial for minmax-Pareto efficiency; without it, a solution satisfying nobody would be minmax-Pareto. Similarly to Lemma 1, the satisfaction probabilities of a minmax-Pareto distribution are uniquely determined.
Lemma 2. If $F$ and $D$ are both minmax-Pareto, then $F[u] = D[u]$ for all $u \in U$.
We also have an analogue of Theorem 1: let $F^{\downarrow}$ denote the satisfaction vector of distribution $F$ sorted in decreasing order. Then the following holds.
Theorem 4. For matroid problems, a distribution $F$ is minmax-Pareto if and only if $F$ is Pareto-efficient and $F^{\downarrow} \preceq D^{\downarrow}$ for all Pareto-efficient distributions $D$.
Proof. First observe that, for matroids, a distribution is Pareto-efficient if and only if it is supported over bases. For any distribution $D$ of bases over a matroid $M$ with ground set $L$, consider the distribution $D^*$ of $(L \setminus X \mid X \sim D)$ of bases over the dual
matroid $M^*$. Then we have $D[u] + D^*[u] = 1$ for all $u \in L$, so clearly $F$ is minmax-Pareto if and only if $F^*$ is maxmin-fair, which (by Theorem 1) occurs if and only if $F^*$ is lexicographically largest for $M^*$, which in turn is equivalent to $F$ being lexicographically smallest among Pareto-efficient distributions for $M$, as we wished to show.
In fact, for matroid problems, this notion coincides with that of maxmin fairness: any excess satisfaction probability for the best-off user can be taken away from him and redistributed to others.³ Thus the maxmin-fair solution minimizes the largest gap in satisfaction probabilities; among those, it minimizes the second-largest gap, etc.
Definition 6 (Sorted inequality vector). The sorted inequality vector of a distribution $D$ over $\mathcal{S}$, written $D^{\downarrow}_{\neq}$, is the vector of all pairwise differences in the satisfaction probabilities of the elements of $U$ under $D$, sorted in decreasing order.
Theorem 5. For matroid problems, the following are equivalent: (1) $D$ is maxmin-fair; (2) $D$ is minmax-Pareto; (3) $D$ is Pareto-efficient and $D^{\downarrow}_{\neq} \preceq E^{\downarrow}_{\neq}$ for all Pareto-efficient distributions $E$ over $\mathcal{S}$.
Proof. First observe that whether $D$ is maxmin-fair (resp., minmax-Pareto) depends only on the satisfaction probabilities given by $D$, by Theorem 1 (resp., Theorem 4). Therefore, by Lemmas 1 and 2, showing the existence for every problem of a distribution which is both maxmin-fair and minmax-Pareto suffices to show that (1) ⇔ (2). Furthermore, if said equivalence holds, then a maxmin-fair distribution simultaneously maximizes the minimum satisfaction probability and minimizes the maximum satisfaction probability (among Pareto-efficient distributions), hence it also minimizes the largest difference in satisfaction probabilities. An easy inductive argument then shows that the equivalence (1) ⇔ (3) follows from the equivalence (1) ⇔ (2), which we prove below.
Let $M$ be a matroid. We argue by induction on the size of its ground set $L$. Let $\lambda = \min\left\{ \frac{\rho(X)}{|X|} \;\middle|\; \emptyset \neq X \subseteq L \right\}$. By Theorem 3, there is a maxmin-fair distribution over $M$ with minimum satisfaction probability $\lambda$. We distinguish two cases:
(a) $\rho(L) = \lambda|L|$. Observe that any Pareto-efficient distribution $D$ is supported on the bases of $M$, so the expected number of satisfied elements under $D$ is $\sum_{u \in L} D[u] = \rho(L)$. In particular, if $D$ is maxmin-fair then $D[u] = \lambda$ for every $u \in L$, because if the strict inequality $D[u] > \lambda$ held for some $u \in L$, we would have $\sum_{u \in L} D[u] > \lambda|L| = \rho(L)$. But then $D$ must also be minmax-Pareto, for otherwise there would be a Pareto-efficient distribution $D'$ with $\sum_{u \in L} D'[u] < \sum_{u \in L} D[u] = \rho(L)$.
(b) $\rho(L) > \lambda|L|$. Then there is a maximal proper subset $S$ of $L$ such that $\rho(S) = \lambda|S|$. Consider the restriction of $M$ to the set $S$, i.e., the matroid $M|_S$ with ground set $S$ and independent sets $M|_S = \{I \in M \mid I \subseteq S\}$. By the reasoning of case (a), a maxmin-fair distribution $D_1$ over $M|_S$ satisfies $D_1[u] = \lambda$ for all $u \in S$. By the maximality of $S$, for any $\emptyset \neq X \subseteq L \setminus S$ it holds that
$\rho(S \cup X) > \lambda|S \cup X| = \rho(S) + \lambda|X|$. (11)
³ This equivalence fails outside matroid problems. A counterexample is $U = \{0, 1, 2, 3\}$ and $\mathcal{S} = \{\{0, 1\}, \{1, 3\}, \{0, 2, 3\}\}$.
Now consider the contraction of $M$ to the set $L \setminus S$, i.e., the matroid $M/S$ with ground set $L \setminus S$ and rank function $\rho_{M/S}(X) = \rho(S \cup X) - \rho(S)$. Since $\rho_{M/S}(X) > \lambda|X|$ by (11), Theorem 3 gives a maxmin-fair distribution $D_2$ over $M/S$ such that $D_2[v] > \lambda$ for all $v \in L \setminus S$. The contracted matroid $M/S$ satisfies the following (see [34]):
for any base $B$ of $S$, a subset $I \subseteq L \setminus S$ is independent in $M/S$ if and only if $I \cup B$ is independent in $M$. (12)
Given distributions A over M | S and B over M/S, denote by [A ∪ B] the distribution of (M ∪ N | M ∼ A, N ∼ B). It follows from (12) that for any distribution X over M, there exists a distribution of the form [X 1 ∪ X 2 ] with the same satisfaction probabilities for all u ∈ L. Indeed, one can take X 1 to be the distribution of A ∩ S (where A is drawn from X ), and X 2 to be the distribution of A ∩ (L \ S); by property (12), if A1 is drawn from X 1 and A2 is drawn from X 2 , then the set A1 ∪ A2 is independent in M. In particular some maxmin-fair distribution F for M can be written in this form: F = [F 1 ∪ F 2 ] (recall Lemma 1). Since the distribution D 3 = [D 1 ∪ D 2 ] satisfies D 3 [u] = λ for all u ∈ S and D 3 [v] > λ for all v ∈ S, the fact that F ↑ D 3 ↑ implies that F [u] = λ for all u ∈ S, i.e., F 1 is maxmin-fair. But then F 2 must be maxmin-fair as well, for otherwise there would be a distribution G 2 over M/S such that G 2 ↑ ≻ F 2 ↑ , which would imply [F 1 ∪ G 2 ] ↑ ≻ F ↑ . Conversely, if F 1 and F 2 are maxminfair, then (again by Lemma 1) F 1 [u] = D 1 [u] for all u ∈ S and F 2 [v] = D 2 [v] for all v < S; hence [F 1 ∪ F 2 ][v] = F [v] for all v ∈ L and [F 1 ∪ F 2 ] must be maxmin-fair. We have shown that F 1 and F 2 satisfy [F 1 ∪ F 2 ] is maxmin-fair ⇔ F 1 and F 2 are maxmin-fair. Similarly, [F 1 ∪ F 2 ] is minmax-Pareto ⇔ F 1 and F 2 are minmax-Pareto. But since |S |, |L \S | < |L|, the induction hypothesis applies and we conclude that F 1 is maxmin-fair if and only if it is minmaxPareto, and likewise for F 2 . This implies the result. Corollary 3.1. The maximum satisfaction probability in a maxmin-fair distribution over M is ρ(L) − ρ(L \ X ) λ |L | = max |0,X ⊆L . |X | Proof. By Theorem 5, it suffices to determine the maximum satisfaction probability in a minmax-Pareto distribution. 
By definition this is the smallest number µ such that there is a distribution over bases in which each element is covered with probability ≤ µ; equivalently, there is a distribution over bases of the dual matroid in which each element is covered with probability ≥ 1 − µ. The rank function of the dual matroid is ρ∗(X) = |X| − (ρ(L) − ρ(L \ X)); plugging this expression into Theorem 3 yields the result.
4 FAIR MATCHING
Preliminaries. Let G = (V , E) be an unweighted simple graph. A matching in G is a set of vertex-disjoint edges of G. A maximum matching is a matching of largest size. The matching M covers a vertex v ∈ V if v is incident to some edge in M. A set S ⊆ V is matchable if there is a matching of G covering all of S. For S ⊆ V , define ρG (S) as the size of the largest matchable subset of S; then ρG (V ) is the size of the maximum matching of G. Denote by ΓG (S) the set of neighbours of S in G. Usually we will drop the G subscript. An instance of the fair matching problem encodes a graph G = (V , E) and a set U ⊆ V of users. Following our assumption of binary satisfaction, user u ∈ U is satisfied by a matching M if u is covered by M. The set of valid solutions is the set S of matchable subsets of V . Problem 1 asks for a maxmin-fair distribution of matchings. The following states that we may extend any matchable set of vertices to a matchable set of maximum size. 4 The existential result was discovered by Berge, and its algorithmic counterpart by Edmonds. Theorem 6 (Berge [3], Edmonds[18]). For any matchable set S ⊆ V , there is a matchable set T ⊇ S with |T | = ρ(V ). Moreover, given S, a matching covering T may be found in polynomial time. Consequently, any matching problem is a matroid problem with rank function ρ (Section 3). In particular, any matching in the support of a Pareto-efficient distribution is maximum in size. Reduction to one-sided matching. The one-sided fair bipartite matching problem is the special case of fair matching where G is bipartite (with bipartition V = L ∪Û R) and U = L. In our example in Figure 1, L represents candidates and R represents jobs. Notably, this special case can encode any other matching problem: Theorem 7 (Ford and Fulkerson [20]). 
For any graph G there exists a bipartite graph H with bipartition (L H , R H ) such that L H = V (G) and the collection of matchable subsets of V (G) in G equals the collection of matchable subsets of L H in H . This is normally stated as “any matching matroid is transversal”. The construction of H in Theorem 7 can be carried out in polynomial time (see [39] for a simple proof). Hence the case of non-bipartite G can be reduced to the one-sided bipartite case. A similar remark applies to general user sets U ⊆ V , as we can remove from L H the elements of V \ U, which has no effect on the collection of matchable subsets of U in H . We make the additional simplifying assumption that R is matchable. If not, find an arbitrary maximum matching of G and remove from R all unmatched vertices. Let R ′ denote the remaining vertices. Theorem 8 (Mendelsohn and Dulmage [14]). If both A ⊆ L and B ⊆ R are matchable in a bipartite graph G with bipartition (L, R), then A ∪ B is also matchable in G.
It follows that for each distribution of matchings of G there is another distribution with the same coverage (satisfaction) probabilities for L and covering only elements of L ∪ R′. Note that the coverage probability of each v ∈ R′ in this distribution is 1.
Footnote 4: Such an extension is not possible for the edges in a matching.
arXiv Technical Report, February 7, 2018 The case where ρ(L) = |R| is easily handled separately (any maximum matching algorithm is maxmin-fair in this case) , yielding: Theorem 9. The fair matching problem on arbitrary graphs with arbitrary user sets can be reduced in polynomial time to the one-sided fair bipartite matching problem on graphs where ρ(L) = |R| < |L|. Proof. Let A be a maxmin-fair algorithm for one-sided bipartite matching. Given a graph G and a user set U, we (1) construct H as in Theorem 7; (2) remove from L H the elements of V (G) \ U; (3) find an arbitrary maximum matching M and remove from R H the elements not covered by M, using any polynomial-time maximum matching algorithm. (4) find a fair one-sided bipartite matching by either (1) using A on the resulting graph if |M | < |L H | or (2) returning M if |M | = |L H |; (5) given the solution S found at the previous step, return a matching in G covering the same vertices as S using Theorem 6. (Step (5) is technically redundant as we identify solutions with sets of matchable users, but is included for clarity.) It is plain to see that the resulting distribution is maxmin-fair for the general problem if and only if A is maxmin-fair for the onesided problem. All steps run in polynomial time, possibly excluding the call to A itself. By Theorem 9 we henceforth focus on one-sided bipartite matching, assuming ρ(L) = |R| < |L| , n. Fairness parameter. We next ask the following important question: what is the minimum satisfaction probability λ 1 of a maxminfair distribution? Hall’s marriage theorem gives a necessary and sufficient condition for the existence of a matching covering the whole of L, which is equivalent to having λ 1 = 1. Theorem 10 (Hall [28]). In a bipartite graph with bipartition (L, R), the set L is matchable if and only if |Γ(S)| ≥ |S | for all S ⊆ L. 
We show a generalization of Hall’s theorem which will prove useful to characterize the fairness parameter in bipartite matching, and is simpler than the one given by Theorem 3.
Theorem 11. Let {α_v | v ∈ L} be reals in [0, 1]. A necessary and sufficient condition for the existence of a distribution D of matchings of G such that D[v] ≥ α_v for all v ∈ L is
$$\text{for all } S \subseteq L, \quad |\Gamma(S)| \ge \sum_{v \in S} \alpha_v. \tag{13}$$
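Condition (13) can be checked directly on small instances. The following is a minimal sketch, assuming a brute-force enumeration of all subsets of L (exponential, so only for illustration); the helper names `neighbours` and `satisfies_condition_13` and the toy instance are hypothetical and not part of the paper.

```python
from fractions import Fraction
from itertools import combinations

def neighbours(adj, S):
    """Return Γ(S): the union of the neighbour sets of the vertices in S."""
    out = set()
    for u in S:
        out |= adj[u]
    return out

def satisfies_condition_13(adj, alpha):
    """Check |Γ(S)| >= sum of alpha_v over v in S, for every non-empty S ⊆ L."""
    L = sorted(adj)
    for k in range(1, len(L) + 1):
        for S in combinations(L, k):
            if len(neighbours(adj, S)) < sum(alpha[v] for v in S):
                return False
    return True

# Hypothetical instance: candidates a and b can only take job x; c can take x or y.
adj = {"a": {"x"}, "b": {"x"}, "c": {"x", "y"}}
half = Fraction(1, 2)
feasible = satisfies_condition_13(adj, {"a": half, "b": half, "c": Fraction(1)})
infeasible = satisfies_condition_13(adj, {"a": Fraction(1), "b": half, "c": Fraction(1)})
print(feasible, infeasible)
```

In this toy instance the set {a, b} has a single neighbour, so the targets of a and b cannot sum to more than 1.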
Proof. Necessity is clear because no matching can cover more than |Γ(S)| elements of any set S, while by linearity of expectation the expected number of elements of S covered by D is $\sum_{v \in S} D[v] \ge \sum_{v \in S} \alpha_v$. For sufficiency, we may assume that all the α_v are rational because (13) is a finite set of inequalities with integral coefficients, so the maximizer of $\sum_v \beta_v$ subject to $|\Gamma(S)| \ge \sum_{v \in S} \beta_v$ and β_v ≥ α_v will have β_v ∈ Q. Let M be a suitable common denominator, so that α_u = β_u = n_u/M where M ≥ n_u ∈ N. Construct a graph G′ with
• n_u replicas u^(1), ..., u^(n_u) of each u ∈ L;
• M replicas v^(1), ..., v^(M) of each v ∈ R;
• V(G′) = L′ ∪ R′, where L′ = {u^(i) | u ∈ L, i ≤ n_u} and R′ = {v^(i) | v ∈ R, i ≤ M};
• E(G′) = {(u^(i), v^(j)) | (u, v) ∈ E(G), i ≤ n_u, j ≤ M}.
This graph is bipartite with bipartition (L′, R′). Notice that vertices with α_v = 0 have no replica in G′. Consider (in G) the sets A_k = {u ∈ L | n_u ≥ k} for k = 1, 2, ..., M. Given k and a set S ⊆ A_k, let S^(k) = {u^(k) | u ∈ S}. If A_1 = ∅ the theorem is trivial. Otherwise, let H_1 denote the subgraph of G′ induced by A_1^(1) ∪ R′. Any subset of A_1^(1) in H_1 is of the form S^(1), for some S ⊆ A_1. Using (13) we obtain
$$|\Gamma_{H_1}(S^{(1)})| = M \cdot |\Gamma_G(S)| \ge \sum_{u \in S} n_u \ge |S^{(1)}|,$$
because n_u ≥ 1 for u ∈ A_1 ⊇ S. By Hall’s Theorem, there is a matching X_1 in H_1 covering A_1^(1). If A_2 ≠ ∅, let H_2 denote the subgraph of G′ \ V(X_1) induced by A_2^(2) ∪ R′. As we removed the edges of the matching X_1, the number of neighbours in G′ of any set S ⊆ A_2 has decreased by at most |S|, so for any S ⊆ A_2 we have
$$|\Gamma_{H_2}(S^{(2)})| \ge M \cdot |\Gamma_G(S)| - |S| \ge \sum_{u \in S} (n_u - 1) \ge |S^{(2)}|,$$
because nu ≥ 2 for u ∈ A2 ⊇ S. Hence there is a matching X 2 (2) in H 2 covering A2 . Proceeding similarly, an inductive argument shows that we obtain at most M vertex-disjoint matchings in G ′ ; their union is a matching X ′ in G ′ covering L ′. By restricting X ′ to each replica of R in R ′ , we can decompose X ′ into M matchings X 1 , . . . , X M , each of them inducing a matching in G. Furthermore, each u ∈ L is covered in exactly nu of these, since X ′ covers L ′. Thus the uniform distribution over X 1 , . . . , X M yields coverage probability nu /M = αu for each u ∈ L. The proof gives a maxmin-fair distribution which is uniform over a multiset of M matchings, but M may be fairly large. In fact one may construct examples where the support √ size of any uniform maxmin-fair distribution is as high as 2 Ω( |U |) , but |U| − 1 matchings suffice if we do not insist on uniformity (Section 4). There is a dual result about guaranteed satisfaction probabilities bounded from above. We omit the proof. Theorem 4.1. Suppose ρ(L) = |Γ(L)|. Let {αv | v ∈ L} be reals in [0, 1]. A necessary and sufficient condition for the existence of a Pareto-efficient distribution D of matchings of G such that D[v] ≤ αv for all v ∈ L is Õ αv ≥ |Γ(L)|. (14) for all S ⊆ L, |Γ(S)| + v 0} , ∅ satisfies |S | = v ∈r yv . Í Proof. Write λ = v ∈R yv . For any r ∈ (0, 1), define S(r ) = {u ∈ L | yu ≥ r } and T (r ) = {v ∈ R | yv ≥ r }. We show that T (r ) = |Γ(S(r ))| and |T (r )|/|S(r )| = λ for every r ∈ (0, 1). To see this, observe that for any v ∈ R, yv ≥ maxu ∈Γ−1 (v) yu . In fact in any optimal solution equality must hold: yv = maxu ∈Γ−1 (v) yu for all v ∈ R; otherwise we may decrease some yv and hence the objective function without sacrificing feasibility. Consequently, v ∈ T (r ) ⇔ yv ≥ r ⇔
max_{u ∈ Γ⁻¹(v)} y_u ≥ r
⇔ ∃u ∈ Γ⁻¹(v) such that y_u ≥ r ⇔ v ∈ Γ(S(r)). Recall from Lemma 3 that we can construct a solution to LP (17) from any non-empty set, including S(r). Since λ is the optimal value of LP (17), for any r we have |T(r)|/|S(r)| ≥ λ, i.e., 0 ≤ |T(r)| − λ|S(r)|. On the other hand, if we pick r uniformly at random from (0, 1), we have
$$\mathbb{E}_r[|S(r)|] = \sum_u \Pr_r[u \in S(r)] = \sum_u \Pr_r[r \le y_u] = \sum_u y_u = 1$$
and
$$\mathbb{E}_r[|T(r)|] = \sum_v \Pr_r[v \in T(r)] = \sum_v \Pr_r[r \le y_v] = \sum_v y_v = \lambda,$$
so
$$0 \le \mathbb{E}_r[|T(r)| - \lambda |S(r)|] = \mathbb{E}_r[|T(r)|] - \lambda \cdot \mathbb{E}_r[|S(r)|] = \lambda - \lambda \cdot 1 = 0,$$
which implies that |T(r)| − λ|S(r)| = 0 almost surely when r is uniform in (0, 1). Observe that |T(r)|/|S(r)| is a piecewise-constant function of r (all distinct possibilities are given by taking r = y_w for some w ∈ L ∪ R). Moreover, for any r ∈ (0, 1) there is some interval I of non-zero length containing r such that S(r′) = S(r) and T(r′) = T(r) for all r′ ∈ I. Thus, any event that is a measurable function of S(r) and T(r) and holds with probability 1 when r ∼ U(0, 1) must actually hold for every r ∈ (0, 1) as well.
Thus, |T(r)| = λ|S(r)| for all r ∈ (0, 1). In particular if we pick r_0 = min{y_u | u ∈ L, y_u > 0}, then S(r_0) = {v ∈ L | y_v > 0} satisfies $\sum_{v \in S(r_0)} y_v = 1$, hence is non-empty, and by the above it also has the desired property |Γ(S(r_0))| − λ · |S(r_0)| = 0.
In combination with Corollary 1, these two lemmas yield an effective way of computing λ_1(G) by linear programming:
Corollary 2. For any bipartite graph with ρ(L) = |R|, the fairness parameter λ_1(G) is equal to the optimum value of the LP in (17).
Fair decompositions. The next ingredient towards an efficient algorithm is a decomposition of L into a nested collection of “fairly isolated” sets based on the maximum attainable satisfaction probability. For A ⊆ L, denote by G|A the subgraph of G induced by A ∪ Γ(A), and by G/A the subgraph of G induced by (L \ A) ∪ (R \ Γ(A)). Intuitively, G|A represents the subproblem where only the elements of A are important, and G/A represents the subproblem for L \ A where the use of neighbours of A is disallowed. For any subgraph H of G, let π(H) (resp., Π(H)) be the minimum (resp., maximum) satisfaction probability of an element of V(H) ∩ L in a maxmin-fair solution. The set X ⊆ L is fairly isolated if Π(G|X) < π(G/X). This means that every u ∉ X can be guaranteed satisfaction larger than the largest maxmin-fair satisfaction inside X, even after removing all possibly conflicting edges from L \ X to Γ(X). Finding fairly isolated sets enables a divide-and-conquer strategy to find maxmin-fair distributions, since the matchings used inside X have no bearing on the satisfactions needed for users in L \ X:
Lemma 5. Let X be fairly isolated. Then
(a) If D_1 is maxmin-fair for G|X and D_2 is maxmin-fair for G/X, then (S_1 ∪ S_2)_{S_1∼D_1, S_2∼D_2} is maxmin-fair for G.
(b) If D is maxmin-fair for G, then the distributions of (S ∩ X)_{S∼D} and (S \ X)_{S∼D} are maxmin-fair for G|X and G/X, respectively.
Proof.
The proof is only sketched as it closely follows the argument in case (b) in the proof of Theorem 5; in fact the notions of restriction and contraction just defined coincide with those for matroids. For any maximum matching B in G|X, a subset I ⊆ L \ X is matchable in G/X if and only if B ∪ I is matchable in G. For any distribution X over L, there exists a distribution of the form [X_1 ∪ X_2] with the same satisfaction probabilities for all u ∈ L. In particular some maxmin-fair distribution F for G can be written in this form: F = [F_1 ∪ F_2], where F_1 is a distribution over G|X and F_2 is a distribution over G/X. As in the proof of Theorem 5, it follows from the fair isolation of X that F_1 and F_2 are maxmin-fair, implying (b) by the uniqueness of satisfaction probabilities of maxmin-fair distributions; and conversely, if F_1 and F_2 are maxmin-fair, then [F_1 ∪ F_2] is maxmin-fair, proving (a).
The following gives alternative definitions:
Lemma 6. The following are equivalent for every X ⊆ L:
(a) X is fairly isolated;
(b) (u, v ∈ Y, F is maxmin-fair, and F[v] ≤ F[u]) ⟹ v ∈ X;
(c) Either Γ(X) = ∅, or X = L, or
$$\min\left(1, \max_{Y \subsetneq X} \frac{|\Gamma(X)| - |\Gamma(Y)|}{|X \setminus Y|}\right) < \min\left(1, \min_{Z \supsetneq X} \frac{|\Gamma(Z)| - |\Gamma(X)|}{|Z \setminus X|}\right).$$
Proof. The equivalence between (a) and (c) is a consequence of Corollary 1 and the definition of fair isolation. Their equivalence with (b) follows from Lemma 5.
We are now ready to state our fair decomposition theorem:
Theorem 12 (Fair decompositions). The fairly separated sets form a chain S_0 ⊆ S_1 ⊆ S_2 ⊆ ... ⊆ S_{k−1} ⊆ S_k = L. The following hold:
(a) S_0 = {v ∈ L | deg(v) = 0} and S_{i+1} is the unique maximal set Y ⊋ S_i minimizing $\min\left(1, \frac{|\Gamma(Y) \setminus \Gamma(S_i)|}{|Y \setminus S_i|}\right)$.
(b) S_k = L and S_{i−1} is the unique maximal set Y ⊊ S_i maximizing $\min\left(1, \frac{|\Gamma(S_i) \setminus \Gamma(Y)|}{|S_i \setminus Y|}\right)$.
(c) Let B_0 = S_0 and B_i = S_i \ S_{i−1} for i > 0. If v ∈ B_i, then the coverage probability of v in any maxmin-fair solution is $\lambda_i = \min\left(1, \frac{|\Gamma(B_i) \setminus \Gamma(S_{i-1})|}{|B_i|}\right)$, and any w ∈ Γ(B_i) \ Γ(S_{i−1}) is matched to some u ∈ B_i with probability 1.
We call B_0, ..., B_k the blocks in the fair decomposition of G; B_0 is the trivial block.
Proof. We show that the sets S_i defined by (a), which form a chain by definition, comprise all fairly isolated sets and have property (c); part (b) follows by applying the fair decomposition to the dual of the matching matroid and recalling that maxmin-fairness and minmax-Pareto efficiency are equivalent for matroids. Since the sequence S_0, S_1, ... is strictly increasing (with respect to inclusion) and L is finite, there exists some k such that S_k = L. We reason by induction on k. The argument closely resembles the proof of Theorem 5. It is easy to see that the smallest fairly isolated set is S_0 = {v ∈ L | deg(v) = 0} and that (c) holds for i = 0. For any i > 0, let
$$\mu_i = \min_{\emptyset \ne X \subseteq L \setminus S_{i-1}} \frac{|\Gamma(X)|}{|X|}.$$
From the submodularity of Γ and the minimality of µ_i it follows that if |Γ(X)| = µ_i|X| and |Γ(Y)| = µ_i|Y| for ∅ ≠ X, Y ⊆ L \ S_{i−1}, then |Γ(X ∪ Y)| = µ_i|X ∪ Y|. Thus the union of all X such that |Γ(X)| = µ_i|X| is the unique maximal set (with respect to set inclusion) such that |Γ(X)| = µ_i|X|; call it B′_i. Note that µ_i < 1 for all i < k; under the simplifying assumption of Theorem 9, we have µ_k < 1 as well. Then S′_i = S_{i−1} ∪ B′_i is the unique maximal set minimizing |Γ(Y) \ Γ(S_{i−1})| / |Y \ S_{i−1}|; this shows that S′_i = S_i and B′_i = B_i. We also have µ_i = λ_i. It is easy to see that S_i is fairly separated. Let H_i denote the graph (G/S_{i−1})|B_i. By Theorem 3, there is a distribution of matchings in H_i with minimum satisfaction probability at least λ_i; the expected number of covered elements from B_i is then at least λ_i|B_i| = |Γ(B_i) \ Γ(S_{i−1})| = |Γ_{H_i}(B_i)|. Hence equality must always hold and the maxmin-fair distribution for H_i has satisfaction probability precisely λ_i for all u ∈ B_i. Since the expected number of covered elements equals |Γ_{H_i}(B_i)|, it follows that any w ∈ Γ_{H_i}(B_i) is matched to some v ∈ B_i with probability 1. Since S_i is fairly separated, by Lemma 5 the maxmin-fair distribution F for G has the same satisfaction probability λ_i for all u ∈ B_i. (In particular, in F no element of Γ_{H_i}(B_i) is matched to any vertex outside B_i with non-zero probability.) This shows (c). It remains to be checked that no fairly separated set exists other than S_0, ..., S_k. This follows from the fact that λ_1 < λ_2 < ... < λ_k and property (b) of Lemma 6.
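The construction in part (a) of Theorem 12 can be illustrated on tiny graphs. The sketch below is a brute-force version under stated assumptions: it enumerates every subset of the remaining users (exponential time), takes the union of all minimizers as the maximal minimizer, and reports each block with its probability λ_i. The function names and the example graph are hypothetical; the paper's efficient method uses LP or max-flow instead.

```python
from fractions import Fraction
from itertools import combinations

def neighbours(adj, S):
    """Return Γ(S) for a set S of left-side vertices."""
    out = set()
    for u in S:
        out |= adj[u]
    return out

def fair_decomposition(adj):
    """Return the blocks as a list of (sorted block, lambda_i), in chain order."""
    L = set(adj)
    S = {u for u in L if not adj[u]}  # S_0: users of degree 0 (trivial block)
    blocks = [(sorted(S), Fraction(0))] if S else []
    while S != L:
        rest = sorted(L - S)
        covered = neighbours(adj, S)  # Γ(S_i)
        best, union = None, set()
        for k in range(1, len(rest) + 1):
            for B in combinations(rest, k):
                # min(1, |Γ(Y) \ Γ(S_i)| / |Y \ S_i|) with Y = S_i ∪ B
                ratio = min(Fraction(1), Fraction(len(neighbours(adj, B) - covered), k))
                if best is None or ratio < best:
                    best, union = ratio, set(B)
                elif ratio == best:
                    union |= set(B)  # union of minimizers is the maximal minimizer
        blocks.append((sorted(union), best))
        S |= union
    return blocks

# Hypothetical instance: a and b compete for job x; c can also take job y.
example = fair_decomposition({"a": {"x"}, "b": {"x"}, "c": {"x", "y"}})
print(example)
```

On this instance the first block is {a, b} with probability 1/2 and the second is {c} with probability 1, matching the intuition that c should always be sent to job y.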
Envy-freeness. The maxmin-fair matching mechanism is envy-free, i.e., each user x has nothing to gain by switching to the random assignment that a maxmin-fair distribution F gives another user y. Given a matching M drawn from F, switching x and y entails replacing any edge (y, z) with the edge (x, z) if z ∈ Γ(x), and deleting (y, z) if not. Let x appear in block B_i and y appear in block B_j. If j ≤ i, then F[y] ≤ F[x], and switching cannot increase the satisfaction of x. On the other hand, if j > i, then F only matches y with elements of Γ(S_j) \ Γ(S_i), which is disjoint from Γ(x). Therefore switching would result in satisfaction probability 0 for x in this case.
4.1 The basic algorithm
Theorem 12 and Lemma 4 suggest a line of attack to solve the one-sided fair bipartite matching problem:
1. Find the blocks B_0, ..., B_k in a fair decomposition.
2. For each i = 0, 1, 2, ..., k, obtain a distribution D_i of matchings in H_i = G/(∪_{j≤i} B_j) such that D_i[u] = |Γ_{H_i}(B_i)|/|B_i| ∀u ∈ B_i.
3. Return a matching from a distribution D of matchings with marginals D_0, ..., D_k (such D is maxmin-fair by Lemma 5(a)).
Step 1: Finding a fair decomposition. Theorem 12 suggests the following method. By solving the LP in (17) and using Lemma 4, we obtain a set X minimizing |Γ(S)|/|S|. Remove X and Γ(X) from the graph G and repeat (if G is non-empty); let Y be the new set obtained. If |Γ(Y)|/|Y| = |Γ(X)|/|X| =: λ, replace X by X′ = X ∪ Y; then |Γ(X′)| ≤ |Γ(X)| + |Γ(Y)| = λ(|X| + |Y|), so the minimality of |Γ(X)|/|X| implies that |Γ(X′)|/|X′| = λ too. When we obtain a Y satisfying |Γ(Y)|/|Y| > |Γ(X)|/|X| we know that X was the maximal set minimizing |Γ(S)|/|S|, i.e., the first non-trivial block B_1 is X. Remove B_1 and Γ(B_1) and repeat (if applicable) to obtain B_2, ..., B_k.
Step 2: Obtaining a fair distribution for each block. The empty matching is the only solution for B_0. If i > 0, consider the graph H_i = G/(∪_{j≤i} B_j). To simplify notation, rename L ∩ V(H_i) and R ∩ V(H_i) to L and R. We have |R| ≤ |L| and λ = |R|/|L| ≤ 1. First we calculate the (as yet unknown) probabilities x_ij (i ∈ L, j ∈ R) that each edge (i, j) is saturated (i.e., i is matched to j) in some fixed maxmin-fair distribution. Clearly $\sum_j x_{ij} = \lambda$ for each i and $\sum_i x_{ij} = 1$ for each j. Let us add a set Z of |L| − |R| fictitious vertices to R and extend the definition of x_ij to satisfy x_ij = 1/|L| for each i ∈ L, j ∈ Z. We obtain a bipartite graph G′ with |L| vertices on each side; let Γ′ denote its neighborhood function. Then $\sum_{v \in \Gamma'(u)} x_{uv} = 1$ ∀u ∈ L, $\sum_{u \in \Gamma'(v)} x_{uv} = 1$ ∀v ∈ R ∪ Z, and x_uv ≥ 0 ∀u ∈ L, v ∈ R ∪ Z. We can find a solution x_uv to these constraints by solving the associated linear program. By the Birkhoff-von Neumann theorem on doubly stochastic matrices [5], the quantities x_uv thus obtained represent the edge saturation probabilities of an actual distribution of matchings in G′:
Lemma 7. Let {x_uv}_{(u,v)∈E} be non-negative numbers satisfying $\sum_{v \in R} x_{uv} \le 1$ ∀u ∈ L and $\sum_{u \in L} x_{uv} \le 1$ ∀v ∈ R. Then a distribution 𝓜 over |E| + 1 matchings such that Pr_{M∈𝓜}[(u, v) ∈ M] = x_uv exists and may be found in polynomial time.
Using Lemma 7 we obtain a distribution D of matchings in G′ in which each edge (u, v) is used with probability x_uv. If we pick each matching with its probability in D and remove from it the edges incident to the “fictitious” elements in Z, we obtain a distribution of matchings where each element i of L is matched with probability $1 - \sum_{j \in Z} x_{ij} = 1 - |Z|/|L| = 1 - (|L| - |R|)/|L| = \lambda$, as desired.
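The Birkhoff-von Neumann step can be sketched as follows. This is an illustrative decomposition for the special case of a square doubly stochastic matrix (as for G′ above), not the general sub-stochastic statement of Lemma 7. It repeatedly finds a perfect matching on the support with a simple augmenting-path routine and peels off the smallest entry along it; all function names and the example matrix are assumptions made for this sketch.

```python
from fractions import Fraction

def perfect_matching(support):
    """Augmenting-path bipartite matching; support[i] lists the columns allowed for row i."""
    n = len(support)
    col_to_row = [None] * n

    def augment(i, seen):
        for j in support[i]:
            if j not in seen:
                seen.add(j)
                if col_to_row[j] is None or augment(col_to_row[j], seen):
                    col_to_row[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, set()):
            raise ValueError("no perfect matching on the support")
    row_to_col = [None] * n
    for j, i in enumerate(col_to_row):
        row_to_col[i] = j
    return row_to_col

def birkhoff_decomposition(x):
    """Write a doubly stochastic matrix as a list of (probability, row -> column matching)."""
    n = len(x)
    x = [[Fraction(v) for v in row] for row in x]  # exact arithmetic, work on a copy
    parts = []
    while any(v > 0 for row in x for v in row):
        support = [[j for j in range(n) if x[i][j] > 0] for i in range(n)]
        match = perfect_matching(support)
        delta = min(x[i][match[i]] for i in range(n))
        for i in range(n):
            x[i][match[i]] -= delta  # at least one entry becomes zero
        parts.append((delta, match))
    return parts

# Hypothetical saturation probabilities for three users and three (possibly fictitious) jobs.
h = Fraction(1, 2)
x = [[h, h, 0], [h, 0, h], [0, h, h]]
parts = birkhoff_decomposition(x)
print(parts)
```

Each iteration zeroes at least one entry, so the number of matchings returned is at most the number of non-zero entries of the input.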
Step 3: Combining the distributions. The last step requires combining the distributions D 1 , . . . , Dk , each defined for a block B i , into a single maxmin-fair distribution for G. The simplest way is to draw M 1 , . . . , Mk from the product distribution D 1 ×D 2 . . . ×Dk and return M 1 ∪ M 2 . . . ∪ Mk . (This is an easily samplable maxminfair distribution with potentially large support.) Remark. The algorithm outlined above requires solving many LP subproblems. It was presented to showcase the main steps required and to establish the existence of a polynomial-time algorithm. We next propose a more efficient algorithm which uses maximum flow computations for Step 1 and edge-colorings of regular bipartite graphs for Step 2. We also show how to obtain a distribution of small support (needed for transparency, as explained in Sec 2).
4.2 An improved algorithm
Improved step 1: Finding a fair decomposition. Fix λ ∈ (0, 1). We wish to separate L into vertices with satisfaction probability < λ and vertices with satisfaction probability ≥ λ. Construct a graph G(λ) by adding to G a source vertex s connected to every u ∈ L with an edge of capacity λ, and a sink vertex t connected to every v ∈ R with an edge of capacity 1; all other edges have infinite capacity.
Lemma 8. Let κ be the value of a minimum s-t cut in G(λ). Then exactly one of the following cases holds: (a) κ = λ|L| and π(G) ≥ λ; or (b) κ < λ|L| and there is a fairly isolated subset X ⊊ L such that Π(G|X) < λ. We can determine which case occurs, and obtain X in case (b), with a min-cut computation on G(λ).
In either case we have “made progress” by solving max-flow on G(λ); either we showed that fairness parameter λ is possible, or found a fair separation (and a reason why it is not possible).
Proof. Consider a minimum-value s-t cut in G(λ). Because the capacities of the edges from s are no larger than any other capacity, there is always a cut C with no larger value containing no edges from L to R. C only contains edges from s to some subset A_L ⊆ L and from some subset A_R ⊆ R to t; its value is λ|A_L| + |A_R|. Let Ā_L = L \ A_L and Ā_R = R \ A_R. Because C is an s-t cut, there are no edges in G (or in G(λ)) between Ā_L and Ā_R, so Γ(Ā_L) ⊆ A_R. As C is a minimum cut, we must in fact have Γ(Ā_L) = A_R and furthermore, for any X ⊆ A_L we must have |Γ(X) \ A_R| ≥ λ|X|, for otherwise there would be a cut of smaller value λ|A_L \ X| + |Γ(Ā_L ∪ X)| = (λ|A_L| + |A_R|) − λ|X| + |Γ(X) \ A_R|. So the fairness parameter π(G/Ā_L) is at least λ. Assume further that C = C(λ) is the minimum cut with the smallest |Ā_L|. Then Ā_L is unique and may be determined in linear time from the residual network of a maximum (pre)flow [36]. For any non-empty Y ⊆ Ā_L, we must have |Γ(Ā_L)| − |Γ(Ā_L \ Y)| < λ|Y|, otherwise another cut of no larger value but with a smaller Ā_L would exist. Hence Π(G|Ā_L) < λ. Case (a) occurs when Ā_L = ∅, and case (b) when Ā_L ≠ ∅.
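The dichotomy of Lemma 8 can be explored on small graphs without a flow solver. The sketch below is an assumption-laden stand-in for the min-cut computation: it enumerates the candidate cuts described in the proof, where a cut is identified with the set X of users whose source edge is not cut and has value λ(|L| − |X|) + |Γ(X)|, and it returns the minimum value together with the smallest minimizing X. The name `min_cut_by_enumeration` and the example are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def neighbours(adj, S):
    """Return Γ(S) for a set S of left-side vertices."""
    out = set()
    for u in S:
        out |= adj[u]
    return out

def min_cut_by_enumeration(adj, lam):
    """Return (kappa, X): the min cut value of G(lam) and the smallest minimizing set X.

    X is empty exactly in case (a) of Lemma 8, where kappa == lam * |L|.
    """
    L = sorted(adj)
    best_value, best_X = lam * len(L), frozenset()  # cut all source edges
    for k in range(1, len(L) + 1):  # increasing size, strict '<' keeps the smallest X
        for X in combinations(L, k):
            value = lam * (len(L) - k) + len(neighbours(adj, X))
            if value < best_value:
                best_value, best_X = value, frozenset(X)
    return best_value, best_X

# Hypothetical instance with blocks {a, b} (probability 1/2) and {c} (probability 1).
adj = {"a": {"x"}, "b": {"x"}, "c": {"x", "y"}}
case_b = min_cut_by_enumeration(adj, Fraction(3, 4))
case_a = min_cut_by_enumeration(adj, Fraction(1, 2))
print(case_b, case_a)
```

With λ = 3/4 the enumeration isolates {a, b}, whose users cannot reach probability 3/4; with λ = 1/2 the cut of all source edges is already minimum, which corresponds to case (a).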
The parametric flow algorithm of Gallo et al [23] can find the cuts C(λ) simultaneously for all λ (in the sense of giving a cut for all possible |L| − 1 “breakpoints” for λ). Its running time is asymptotically the same as that of a single maximum-flow computation via the push-relabel algorithm of [25]. However, this technique does not extend to all max-flow algorithms, and [25] is not optimal for the graphs G(λ). A better idea is the following. Start with λ = 1 and keep halving λ as long as case (a) holds in Lemma 8. The first time that (b) occurs we have found a fairly isolated set X. We recursively find the blocks in the fair decompositions of G|X and G/X. The crucial insight is that we can find both in a single recursive call: G|X and G/X are disjoint, so min-cuts for (G|X)(λ_1) and (G/X)(λ_2) are easily obtained from min-cuts for a single graph G(λ_1, λ_2; X, L \ X) containing a disjoint copy of each (except for a single source s and a single sink t). In general we keep a partition of L into t ≤ k subsets S_1, ..., S_t. We maintain the invariant that (a) each S_i is the union of consecutive blocks in the decomposition; (b) we have computed lower and upper bounds λ_i and µ_i for the maxmin-fair probabilities of vertices in S_i, i.e., [π(S_i), Π(S_i)] ⊆ [λ_i, µ_i); and (c) these bounds satisfy µ_i − λ_i = 2^{−j} at iteration j ≥ 0. Initially, t = 1, S_1 = L, λ_1 = 0, µ_1 = 1 and j = 0. Construct the graph G(λ′_1, ..., λ′_t; S_1, ..., S_t) where the edge capacities from s to each u ∈ S_i are λ′_i = (λ_i + µ_i)/2, and edges from u ∈ S_i to v ∈ Γ(S_j) where j ≠ i are deleted. With a min-cut computation in G(λ′_1, ..., λ′_t; S_1, ..., S_t) we reduce the range of parameter bounds within S_i by half for each i, and possibly split S_i into two (increasing t) if we found a new isolated set.
After the min-cut computation, obtaining the new partition of L, the new upper bounds, and removing the edges from lower blocks to higher ones takes linear time. After O(log |L|) iterations (each performing a min-cut and a linear-time update), we have µ_i − λ_i < 1/(2|L|²), at which point we have determined the full decomposition (because each maxmin-fair satisfaction probability is of the form a/b where a ≤ b, 1 ≤ b ≤ |L|). The running time of the max-flow algorithm of [24] for bipartite networks with rational capacities with denominators bounded by a polynomial in |V| is O(min(|E|^{3/2}, |E||V|^{2/3}) · log |V|). We obtain:
Theorem 13. The fair decomposition of a graph G = (V, E) for the one-sided fair bipartite matching problem can be found in time O(min(|E|^{3/2}, |E||V|^{2/3}) · (log |V|)²).
Improved step 2: Obtaining a fair distribution for each block. As before, suppose that G itself has a single block, so |Γ(L)| = |R| = λ|L|. Let g = gcd(|L|, |R|) and l = |L|/g, r = |R|/g. Let G(λ) be as in Lemma 8. By the max-flow/min-cut theorem, there is a flow in G(λ) of value λ|L| = |R|. Since the edge from s to any u ∈ L has capacity λ, the flow from s to u must be precisely λ. Let x_uv be the flow between u ∈ L and v ∈ R. Then $\sum_v x_{uv} = \lambda = r/l$ for each u ∈ L and $\sum_u x_{uv} = 1$ for each v ∈ R, so we found the edge saturation probabilities {x_uv} of a maxmin-fair distribution. Consider now the subgraph G′ of G containing only those edges for which x_uv > 0. By Lemma 7, the same edge probabilities x_uv warrant the existence of a distribution of matchings in G′ with satisfaction probability λ.
By the integral flow theorem [34], each x_uv may be assumed to be a multiple of 1/l, because all capacities in G′ are multiples of 1/l; in fact a maximum-flow algorithm returns such a solution. Now consider the (r, l)-biregular multigraph P obtained by putting n_uv = x_uv · l parallel edges between u ∈ L and v ∈ R. We use the same trick as in Step 2 of Section 4.1: add to the right side of P a set Z of |L| − |R| fictitious vertices, each joined with a single edge to each element of L. The resulting graph P′ is bipartite and l-regular. A theorem of Vizing [40] states that any l-regular bipartite graph is l-colorable so that no two adjacent edges share a color. Each color class is a matching, so there are l matchings in P′ covering each u ∈ L exactly r times in total. Cole et al [11] give an algorithm to color regular bipartite graphs in time O(m log r) = O(m log |Γ(L)|), where m is the number of edges of P′ without counting multiplicity; in particular m ≤ |E(G)|. If we remove the “fictitious” vertices in Z from each of these matchings, we are left with a multiset of l matchings in G covering each u ∈ L exactly r times. The uniform distribution over them is therefore maxmin-fair for G. Now consider the case that the decomposition of G has several blocks B_1, ..., B_k. The values x^i_uv for all blocks i can be computed from a single maximum-flow computation in G(λ_1, ..., λ_k; B_1, ..., B_k) if we know the blocks and each satisfaction probability λ_i. Then each corresponding coloring can be found in time O(m_i log |R|); summing these running times and noticing that $\sum_i m_i \le |E|$, we deduce:
Theorem 14. Given the fair decomposition, a maxmin-fair distribution for all blocks in it can be found in total time O(|E| log |R|) after a max-flow computation.
Thus the running time of the fair algorithm is dominated by the time to obtain the fair decomposition. Altogether:
Theorem 15.
The maxmin-fair one-sided bipartite matching problem can be solved in time O(min(|E| 3/2 , |E||V | 2/3 ) · (log |V |)2 ). Reducing the distribution support size. Let us find a maxminfair distribution F using at most |L| + 1 − k matchings, where k is the number of blocks in Theorem 12. When k = 1, the technique from Step 2 gives a multiset of l ≤ |L| matchings. Consider the case k = 2, which implies our claim for larger k by induction. Suppose D (resp., D ′ ) chooses matching Mi , i ∈ [r ] on B 1 (resp., N j , j ∈ [t] on B 2 ) with probability pi (resp., q j ). (Here B 1 ∩ B 2 = ∅.) A simple greedy algorithm can construct a distribution Z of matchings in B 1 ∪ B 2 such that D[u] = Z [u] for u ∈ B 1 and D ′ [u] = Z [u] for u ∈ B 2 with at most r +t −1 matchings, as follows. Keep indices i ∈ [r ], j ∈ [t] and let S denote a set of (probability, matching) pairs, which will define the desired distribution at the end. At the outset S = ∅ and i = j = 1; at each iteration we add to S the new pair (δ, Mi ∪ N j ) where δ = min(pi , q j ). We decrement pi and q j by δ and increment i (resp., j) if pi (resp, q j ) vanishes. The process terminates when i and j reach the end of their range, at which point |S | = r + t − 1 and all probabilities in S sum up to 1.
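The greedy combination of two block distributions described above can be sketched in a few lines. This is a minimal illustration, assuming each distribution is given as a list of (probability, matching) pairs with positive probabilities summing to 1 and matchings represented as frozensets of edges on disjoint blocks; the name `merge_distributions` and the toy inputs are hypothetical.

```python
from fractions import Fraction

def merge_distributions(D1, D2):
    """Greedily couple two distributions into one with at most len(D1) + len(D2) - 1 matchings."""
    p = [Fraction(prob) for prob, _ in D1]  # remaining mass of each M_i
    q = [Fraction(prob) for prob, _ in D2]  # remaining mass of each N_j
    i = j = 0
    merged = []
    while i < len(D1) and j < len(D2):
        delta = min(p[i], q[j])
        merged.append((delta, D1[i][1] | D2[j][1]))  # the pair (delta, M_i ∪ N_j)
        p[i] -= delta
        q[j] -= delta
        if p[i] == 0:
            i += 1
        if q[j] == 0:
            j += 1
    return merged

# Hypothetical block distributions: three matchings on B_1, two on B_2.
third, half = Fraction(1, 3), Fraction(1, 2)
D1 = [(third, frozenset({("a", "x")})), (third, frozenset({("b", "x")})), (third, frozenset({("c", "x")}))]
D2 = [(half, frozenset({("d", "y")})), (half, frozenset({("e", "y")}))]
Z = merge_distributions(D1, D2)
print(len(Z), sum(prob for prob, _ in Z))
```

Every iteration exhausts at least one of the two current probabilities, and the last iteration exhausts both, which is where the bound r + t − 1 on the support size comes from.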
5 GENERALIZATIONS
We discuss some natural ways to generalize our algorithmic results from Section 4. The first deals with matroid problems (see Section 3). For this we need an additional condition about the computational efficiency of verifying the feasibility of a candidate solution.
Efficiently encoded matroid problems. Call a matroid problem efficiently encoded if it admits a polynomial-time independence oracle, which answers the question: “given X and A ⊆ U, is A ∈ S?” Examples of efficiently encoded matroid problems include finding (a) sets of matchable vertices in a graph; (b) linearly independent sets of columns in a matrix; (c) acyclic sets of edges in a graph; (d) sets of unit-length tasks (each with a given deadline) that may be scheduled on a single machine; (e) sets of vertices in a graph for which edge-disjoint paths from another specified vertex exist. One may prove the existence of a fair decomposition analogous to Theorem 12.
Theorem 5.1 (Fair decomposition of matroids). The fairly separated sets in a matroid M form a chain S_0 ⊆ S_1 ⊆ S_2 ⊆ ... ⊆ S_{k−1} ⊆ S_k = L with respect to set inclusion. Define λ_0 = 0 and
$$\lambda_i = \frac{\rho(S_i) - \rho(S_{i-1})}{|S_i \setminus S_{i-1}|}.$$
The following properties hold:
(a) S_0 = span(∅) and S_{i+1} is the unique maximal set Y ⊋ S_i minimizing $\frac{\rho(Y) - \rho(S_i)}{|Y \setminus S_i|}$.
(b) If v ∈ S_i \ S_{i−1}, then the coverage probability of v in any maxmin-fair solution is λ_i.
(c) The chain L \ S_k ⊆ L \ S_{k−1} ⊆ ... ⊆ L \ S_1 ⊆ L \ S_0 is the fair decomposition of the dual matroid M∗.
This condition enables efficient maxmin-fair algorithms:

Theorem 16. For any efficiently encoded matroid problem, there is a polynomial-time algorithm inducing a maxmin-fair distribution.

Proof. Since this result will be shown shortly to be a consequence of the more general Theorem 5.2, we only sketch how to generalize the steps of our fair matching algorithm from Section 4. (This technique is still interesting, as it yields a more efficient algorithm for matroids than that implied by Theorem 5.2.)

Step 1: Finding a fair decomposition. Given S_0, ..., S_{i−1}, to find S_i we need to minimize (ρ(X) − ρ(S_{i−1})) / |X − S_{i−1}|. Write S_i = S_{i−1} ∪ B_i. Note that ρ(X) − ρ(S_{i−1}) = ρ_{M/S_{i−1}}(X). We can build a polynomial-time independence oracle for the contracted matroid M/S_{i−1}. So finding the next block reduces to minimizing ρ_{M/S_{i−1}}(B)/|B| in the contracted matroid. For notational convenience we may assume i = 1 and M/S_{i−1} = M. For any λ, determining the existence of a set with ρ(S) ≤ λ|S| is equivalent to determining whether the minimum of the submodular function f_λ(S) = ρ(S) − λ|S| is at most zero. The existence of an independence oracle implies a polynomial-time algorithm to compute this function. Since submodular function minimization is solvable in polynomial time [38], we can determine if there is some set with f_λ(S) ≤ 0. There is a polynomial number of rational values for λ with denominator at most |L|. By performing binary search on the set of possible values for λ (or trying all of them), we obtain in polynomial time a set minimizing ρ(X)/|X|. The union of any two such sets also minimizes ρ(X)/|X| by the submodularity of the rank function ρ, hence by repeating this procedure as many times as needed we can find the maximal set B minimizing this ratio. Contracting the current matroid by B and proceeding in the same way if B ≠ L, we find a fair decomposition in polynomial time.

Step 2: Obtaining a fair distribution for each block. If we find fair distributions D_1, ..., D_k so that each D_i is maxmin-fair for (M/S_i)|(S_i − S_{i−1}), then the distribution of ⋃_{i=1}^{k} M_i, where each M_i is drawn from D_i, will be maxmin-fair. So again we can reduce the general case to the case where there is a single block and all elements have the same satisfaction probability λ in the maxmin-fair distribution. To find the distribution of bases realizing the desired probabilities, we may use the idea behind the proof of Theorem 10. Suppose λ = r/s, where r and s are integers. Make r disjoint copies of the ground set L, L′ = L ∪ ... ∪ L. For any S′ ⊆ L′, let o(S′) = {a | (a, i) ∈ S′}. Define ρ′(S′) = ρ(o(S′)). Then ρ′ is the rank function of a matroid M′ on L′ because it satisfies (a) 0 ≤ ρ′(S′) ≤ |S′|, (b) S′ ⊆ T′ ⟹ ρ′(S′) ≤ ρ′(T′), and (c) ρ′(S′ ∪ T′) + ρ′(S′ ∩ T′) ≤ ρ′(S′) + ρ′(T′). Since |o(S′)| ≥ |S′|/r, we have that for all S′, ρ′(S′) = ρ(o(S′)) ≥ λ|o(S′)| = (r/s)|o(S′)| ≥ (1/s)|S′|. Then Edmonds' matroid partitioning algorithm [17] may be used to partition M′ into s ρ′-independent sets, and since each element of M has r copies in M′, every element is included r times among those s sets. By definition, ρ′-independent sets do not include two copies of the same element. Therefore we have found s bases such that each element of L is included in exactly r of them; picking one of them uniformly at random gives the desired maxmin-fair distribution.
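As a sanity check of the copy construction in Step 2, the following worked instance may help. It is an illustrative example chosen here (not taken from the paper): the graphic matroid of a triangle, whose ground set L = {a, b, c} is the edge set and whose bases are the three spanning trees.

```latex
\begin{align*}
&\rho(\{a\}) = 1,\quad \rho(\{a,b\}) = 2,\quad \rho(L) = 2
  \;\Longrightarrow\; \min_{\emptyset \neq X \subseteq L} \frac{\rho(X)}{|X|} = \frac{\rho(L)}{|L|} = \frac{2}{3},\\
&\text{so there is a single block with } \lambda = \tfrac{r}{s} = \tfrac{2}{3},\quad r = 2,\ s = 3.\\
&L' = \{(a,1),(b,1),(c,1),(a,2),(b,2),(c,2)\},\qquad \rho'(S') = \rho(o(S')).\\
&\text{One partition of } L' \text{ into } s = 3\ \rho'\text{-independent sets: }
  \{(a,1),(b,1)\},\ \{(c,1),(a,2)\},\ \{(b,2),(c,2)\}.\\
&\text{Projecting with } o(\cdot) \text{ gives the bases } \{a,b\},\ \{a,c\},\ \{b,c\};
  \text{ each element lies in exactly } r = 2 \text{ of them.}\\
&\text{Choosing one of the three bases uniformly at random covers every element with probability } \tfrac{2}{3} = \lambda.
\end{align*}
```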
Efficiently optimizable problems. The class of problems with efficiently samplable maxmin-fair distributions is much larger than that of efficiently encoded matroid problems. We show that, in a specific sense, the fair version of any problem is as easy as its optimization version. Call a problem efficiently optimizable if there is a polynomial-time weight optimization oracle: an algorithm O which, given a non-negative weight function defined on U, returns an element of S of maximum weight sum. (Such an oracle may be used as an independence oracle.) By a characterization of matroids [19], efficiently encoded matroid problems comprise the special case where O is provided by the simple greedy algorithm.

Example 3. The following two problems are not matroid problems, but are efficiently optimizable nonetheless:
• Given a graph and two vertices s and t, find the vertices in a shortest path from s to t. Indeed, given any assignment of weights to vertices, a shortest path of maximum weight may be found by dynamic programming on the directed acyclic graph of a breadth-first search from s.
• The bipartite matching problem where U = E, that is, users correspond to edges (not vertices) in the graph. Then, given an assignment of weights to edges, a maximum matching of maximum weight may be found by finding a maximum-weight set of edges belonging to the intersection of two 1-uniform matroids, which can be solved in polynomial time [34].
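The first item of Example 3 can be sketched as follows. This is a minimal illustration, assuming an unweighted undirected graph stored as an adjacency dictionary; the function name `max_weight_shortest_path` and the input format are choices made for this sketch, not part of the paper.

```python
from collections import deque


def max_weight_shortest_path(adj, s, t, weight):
    """Return the vertices of a shortest s-t path of maximum total vertex
    weight, or None if t is unreachable from s.

    adj: dict mapping each vertex to an iterable of neighbours (assumed format).
    weight: dict mapping each vertex to a non-negative number.
    """
    # Breadth-first search from s: dist[v] is the hop distance from s.
    dist = {s: 0}
    order = [s]
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    if t not in dist:
        return None

    # Dynamic programming over the BFS DAG (edges going one level deeper).
    # BFS order is a topological order of that DAG.
    best = {s: weight[s]}
    parent = {s: None}
    for u in order:
        for v in adj[u]:
            if dist[v] == dist[u] + 1:
                candidate = best[u] + weight[v]
                if v not in best or candidate > best[v]:
                    best[v] = candidate
                    parent[v] = u

    # Walk the parent pointers back from t.
    path = []
    v = t
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]
```

Every path in the BFS DAG from s to t is a shortest path, so maximizing the weight sum over that DAG is exactly the oracle call described in the example.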
Fair-by-design algorithms: matching problems and beyond
arXiv Technical Report, February 7, 2018
As usual, we characterize the fairness parameter. Given an assignment W : L → R≥0 and A ⊆ L, define W(A) = Σ_{v∈A} W(v).

Theorem 17. For any problem S, the minimum satisfaction probability in the maxmin-fair distribution is
\[
\max_{D \in \Delta} \min_{u \in L} D[u] \;=\; \min_{W \in \mathcal{W}} \max_{S \in \mathcal{S}} \frac{W(S)}{W(L)}, \tag{18}
\]
where 𝒲 = {W : L → R≥0 | W(L) > 0} and Δ is the set of all distributions over S.
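To see both sides of (18) on a concrete case, consider the following toy instance. It is an illustrative example chosen here (not from the paper), with three users and two valid solutions.

```latex
\begin{align*}
&L = \{u_1, u_2, u_3\}, \qquad \mathcal{S} = \bigl\{\{u_1, u_3\},\ \{u_2, u_3\}\bigr\}.\\
&\text{Left-hand side: a distribution } D \text{ picks } \{u_1,u_3\} \text{ with probability } q, \text{ so }
  \min_{u} D[u] = \min(q,\, 1-q,\, 1) \le \tfrac12, \text{ with equality at } q = \tfrac12.\\
&\text{Right-hand side: for any } W \text{ with } W(L) = 1,\quad
  \max\bigl(w_1 + w_3,\ w_2 + w_3\bigr) \;\ge\; \tfrac12\,(w_1 + w_2 + 2 w_3) \;\ge\; \tfrac12,\\
&\text{and } W = (\tfrac12, \tfrac12, 0) \text{ attains } \tfrac12.
  \text{ Hence both sides of (18) equal } \tfrac12.
\end{align*}
```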
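Proof. By homogeneity we need only consider functions W with W(L) = 1. Just like in the proof of Theorem 3, the following two problems can be written as linear programs which are duals of each other:
\[
\begin{aligned}
\max\ & \min_{v \in L} \sum_{B \ni v} p_B \\
\text{s.t.}\ & \sum_{B \in \mathcal{S}} p_B = 1, \\
& p_B \ge 0,
\end{aligned}
\tag{19}
\]
\[
\begin{aligned}
\min\ & \max_{S \in \mathcal{S}} \sum_{v \in S} w_v \\
\text{s.t.}\ & \sum_{v \in L} w_v = 1, \\
& w_v \ge 0.
\end{aligned}
\tag{20}
\]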
The left-hand side of (18) is the optimum of Program (19), and its right-hand side is the optimum of Program (20). The result follows from linear programming duality.

Using (18) one may compute the satisfaction probabilities of a maxmin-fair distribution of an efficiently optimizable problem in polynomial time. Moreover, the distribution itself can be found:

Theorem 5.2. For any efficiently optimizable problem, the satisfaction probabilities of a maxmin-fair distribution can be computed in polynomial time.

In this case, however, the connection between maxmin-fairness and minimizing social inequality is lost.

Proof. Let D be a maxmin-fair distribution. We maintain the invariant that we know the satisfaction probabilities α_v = D[v] for all v in a subset K ⊆ L, and that D[v] ≥ max_{w∈K} α_w for all v ∉ K. Initially K = ∅. We show how to augment K in polynomial time, which gives the result. We take (19) and add constraints to express the required satisfaction probabilities for K, obtaining the primal program
\[
\begin{aligned}
\max\ & \lambda \\
\text{s.t.}\ & \sum_{B \ni v} p_B \ge \alpha_v && \forall v \in K, \\
& \sum_{B \ni v} p_B - \lambda \ge 0 && \forall v \notin K, \\
& \sum_{B \in \mathcal{S}} p_B = 1, \\
& p_B \ge 0,
\end{aligned}
\tag{21}
\]
and its dual
\[
\begin{aligned}
\min\ & \Bigl[\max_{S \in \mathcal{S}} \sum_{v \in S} w_v\Bigr] - \sum_{v \in K} \alpha_v w_v \\
\text{s.t.}\ & \sum_{v \in L} w_v = 1, \\
& w_v \ge 0.
\end{aligned}
\tag{22}
\]
The dual (22) has |L| variables but a possibly exponential number of constraints (one for each candidate solution S, plus the constraint Σ_v w_v = 1). To get around this difficulty, observe that it has a separation oracle: given {w_v}_{v∈L} and λ ∈ R, we can determine if the optimum of (22) is at most λ by using the weight optimization oracle and answering "yes" if the weight of the solution is no larger than λ + Σ_{v∈K} α_v w_v. Otherwise we answer "no" and we can detect a violated constraint, given either by the base found by the optimization oracle, or by the constraint Σ_v w_v = 1, which can be checked separately. The existence of a separation oracle implies the solvability of (22) in polynomial time via the ellipsoid algorithm, by a result of Grötschel, Lovász and Schrijver [26, 27].

Suppose now that the optimal dual solution found is given by {w_v} and has optimum value λ. These variables are dual to {p_S} in (19) and (20). By complementary slackness, w_v > 0 implies that the corresponding constraint is tight: Σ_{B∋v} p_B = α_v if v ∈ K, and Σ_{B∋v} p_B = λ if v ∉ K. Because D[v] ≥ max_{w∈K} α_w by assumption, the set K′ = {v ∉ K | w_v > 0} is non-empty. Therefore we can add K′ to K and repeat the process; we stop when K = L. The number of iterations is at most |L|, and each iteration runs in polynomial time.

Theorem 18. For any efficiently optimizable search problem, there is a polynomial-time algorithm inducing a maxmin-fair distribution whose support size is polynomial in |U|.

Why should Theorem 18 be true? Imagine an adaptive weight scheme initially setting all user weights to 1. Our weight optimization oracle O gives a maximum-weight solution. With a view to attaining maxmin-fairness, we reduce the weight of the users included in this solution, and raise the rest. Now O returns a second solution. By suitably adjusting user weights to encourage including users that have been too often unsatisfied (and the probability of each solution returned) we may obtain a maxmin-fair distribution.

Proof. Consider the pair of LP programs used in the last iteration in the proof of Theorem 5.2 (so that K ∪ K′ = L). We used the separation oracle and the ellipsoid algorithm to solve LP (22). Since all coefficients in our LPs can be represented in a polynomial number of bits, the number t of calls to the separation oracle during the run of the ellipsoid algorithm can be bounded by a polynomial in the number of variables |L| (see [27]). Consider the subprogram of (22) formed by using only these t constraints, along with Σ_v w_v = 1 and the non-negativity constraints. If we run the ellipsoid algorithm on the new subprogram instead, we will find the same solution. Therefore the reduced set of constraints is enough by itself to guarantee that the optimum of LP (22) is at least λ (hence exactly λ); all other constraints are redundant. The dual of this subprogram is a subprogram of the primal LP (21) using only t of the variables p_B and having the same optimal value. Any solution to this reduced primal program gives the desired distribution.

Approximate fairness. We define approximately maxmin-fair distributions, in which satisfaction guarantees are nearly as good as those of a maxmin-fair distribution.

Definition 7 (ε-unfairness). Let ε ∈ [0, 1]. A distribution D over S is ε-maxmin-unfair for U if for all u ∈ U, D[u] ≥ (1 − ε)F[u], where F is a maxmin-fair distribution.
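The adaptive weighting intuition given for Theorem 18 and the check in Definition 7 can be illustrated together on a toy instance. The sketch below is not the ellipsoid-based procedure of the proofs, and no convergence guarantee is claimed for it: it is a simple multiplicative reweighting heuristic, with a brute-force search over an explicit list of solutions standing in for the oracle O. The names `adaptive_reweighting` and `is_eps_unfair`, the parameter `eta`, and the toy instance are all assumptions made for illustration.

```python
from fractions import Fraction


def adaptive_reweighting(users, solutions, rounds, eta=Fraction(1, 10)):
    """Heuristic sketch: repeatedly ask for a maximum-weight solution and
    shrink the weights of the users it satisfies. Returns the empirical
    satisfaction probability of each user over the rounds."""
    weight = {u: Fraction(1) for u in users}
    counts = {u: 0 for u in users}
    for _ in range(rounds):
        # Stand-in for the weight optimization oracle O (brute force;
        # ties are broken in favour of the earlier solution in the list).
        best = max(solutions, key=lambda sol: sum(weight[u] for u in sol))
        for u in best:
            counts[u] += 1
            weight[u] *= 1 - eta  # satisfied users become less attractive
    return {u: Fraction(counts[u], rounds) for u in users}


def is_eps_unfair(d, f, eps):
    """Definition 7: D[u] >= (1 - eps) * F[u] for every user u,
    where f holds the satisfaction probabilities of a maxmin-fair F."""
    return all(d[u] >= (1 - eps) * f[u] for u in f)


# Toy instance (hypothetical): u1 and u2 compete, u3 is always satisfied.
users = ["u1", "u2", "u3"]
solutions = [{"u1", "u3"}, {"u2", "u3"}]
fair = {"u1": Fraction(1, 2), "u2": Fraction(1, 2), "u3": Fraction(1)}

d_long = adaptive_reweighting(users, solutions, rounds=100)
d_short = adaptive_reweighting(users, solutions, rounds=3)
```

On this instance the heuristic alternates between the two solutions, so an even number of rounds reproduces the maxmin-fair probabilities exactly, while three rounds leave u2 at 1/3, which fails the ε = 1/10 test of Definition 7.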
Our final result yields approximately maxmin-fair distributions with small support size, as long as the fairness parameter λ_1(G) is not too small. (The number of matchings is roughly 1/λ_1(G), which is easily seen to be necessary.) As noted at the end of Section 2, this is important from the point of view of transparency.

Theorem 19. There is an algorithm that takes an instance (U, S) of an efficiently-encoded matroid problem and a number ε ∈ (0, 1) and, in time poly(|U|, 1/ε), outputs a multiset S of ⌈1/(λ_1(X)·ε)⌉ solutions from S such that the uniform distribution over S is ε-maxmin-unfair.

Further motivation for approximate fairness comes from optimization problems. Suppose that solutions may have different business value (depending on the users selected), possibly unrelated to the users' satisfaction probabilities. We can reach a compromise between value and individual fairness by fixing a parameter ε ∈ (0, 1) and looking for the ε-unfair distribution of largest expected value.⁵ We leave the study of such problems for future work.
6 EXPERIMENTS
We evaluate the practical performance of our fair matching algorithm by measuring its running time and its ability to scale to large graphs, and analyzing the distribution of maxmin-fair satisfaction probabilities and how they compare with those from two baselines. We also describe the features of the fair decompositions obtained.

Implementation. We used the improved algorithm from Section 4. For max-flows we chose the highest-label push-relabel algorithm of [25], which performs best with the gap heuristic from [9]. Efficient gap detection is done via bucket lists of active nodes at each level [10]. We arrange edges from/to the same vertex consecutively to take advantage of cache locality. Except for the final computation of flows x_uv, only the first phase of [25] is run, since it suffices to find min-cuts (not flows) and thus fairly isolated sets. We avoid floating-point computations by using exact integral multipliers. Our code was compiled with g++ using -O3 optimizations and run on a dual-core Intel i7-7560U CPU (2.40 GHz) with 16 GB RAM. Reproducibility: our code is available at https://goo.gl/cbhwsJ.

Baselines. To the best of our knowledge, we are the first to study maxmin-fair randomized matching algorithms. As baselines we adapt two mechanisms from economics [31] designed for randomized assignments on full bipartite graphs where every u ∈ L has a full ranking of all possible partners v ∈ R. We compare:
(MF) Our maxmin-fair mechanism.
(PS) Probabilistic Serial: the goal is to find a set of edge flows from L to R which can be converted into a matching distribution. PS works as follows: each user u ∈ L sends flow at the same fixed rate, sharing this rate equally among her neighbours. When the outgoing flow of u ∈ L (or the incoming flow of v ∈ R) reaches 1, remove u (or v). Repeat while there are remaining edges.
(RP) Random Priority: let S = ∅. Process all users in random order, adding u to S if S ∪ {u} is matchable. Return the final set S.
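The Random Priority baseline, and the sampling used to estimate its satisfaction probabilities, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it tests whether S ∪ {u} is matchable by searching for an augmenting path from u in the matching built so far, and the function names and the adjacency-dictionary input are assumptions of this sketch.

```python
import random


def try_augment(u, adj, match_right, visited):
    """Depth-first search for an augmenting path starting at left vertex u."""
    for v in adj[u]:
        if v in visited:
            continue
        visited.add(v)
        # v is free, or its current partner can be re-matched elsewhere.
        if v not in match_right or try_augment(match_right[v], adj, match_right, visited):
            match_right[v] = u
            return True
    return False


def random_priority(adj, rng):
    """One run of RP: adj maps each user in L to her neighbours in R."""
    users = list(adj)
    rng.shuffle(users)  # uniformly random processing order
    match_right = {}    # right vertex -> matched user
    satisfied = set()
    for u in users:
        # S ∪ {u} is matchable iff an augmenting path from u exists.
        if try_augment(u, adj, match_right, set()):
            satisfied.add(u)
    return satisfied


def estimate_rp_probabilities(adj, trials, seed=0):
    """Empirical satisfaction probabilities of RP over independent runs."""
    rng = random.Random(seed)
    counts = {u: 0 for u in adj}
    for _ in range(trials):
        for u in random_priority(adj, rng):
            counts[u] += 1
    return {u: counts[u] / trials for u in adj}


# Toy graph (hypothetical): u1 and u2 compete for r1, u3 has r2 to herself.
toy = {"u1": ["r1"], "u2": ["r1"], "u3": ["r2"]}
probs = estimate_rp_probabilities(toy, trials=200)
```

Because the users already in S stay matched after each augmentation, every run returns a matchable set that cannot be extended by any remaining user.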
⁵ This gives more robust fairness guarantees than the alternative of fixing some δ ∈ (0, 1) and calling a solution feasible only when its value is at least a (1 − δ) fraction of the optimum value, as the solution may be unique for a large range of δ values.
Table 1: Datasets used: code, number of left and right nodes (|L|, |R|), number of edges (|E|), and maximum matching size (ρ). Source: http://konect.uni-koblenz.de/networks/

dataset (code)              |L|          |R|          |E|            ρ
actor-movie (AM)        127,823      383,640    1,470,404      114,762
pics-ti (Vui)            82,035      495,402    2,298,816       67,608
citeulike-ti (Cti)      153,277      731,769    2,411,819      120,125
bibsonomy-2ti (Bti)     204,673      767,477    2,555,080      152,757
wiki-en-cat (WC)      1,853,493      182,947    3,795,796      179,546
movielens (M3)           69,878       10,677   10,000,054       10,544
flickr (FG)             395,979      103,631    8,545,307       96,866
dblp-author (Pa)      1,425,813    4,000,150    8,649,016    1,425,803
discogs-aff (Di)      1,754,823      270,771   14,414,659      248,796
edit-dewiki (de)        425,842    3,195,148   57,323,775      355,045
livejournal (LG)      3,201,203    7,489,073  112,307,385    2,171,971
trackers (WT)        27,665,730   12,756,244  140,613,762    4,006,867
orkut (OG)            2,783,196    8,730,857  327,037,487    1,980,077
The exact determination of the satisfaction probabilities of RP is computationally infeasible. To approximate them, we run RP T = 1000 times with independent uniformly random permutations. For this comparison we focus on the smaller graphs, due to the limited scalability of PS (which needs a large number of iterations in its main loop, each taking linear time) and RP (which needs to be run T times to approximate the satisfaction probabilities).

Datasets. We used publicly available bipartite graphs of various types, sizes and domains: Table 1 reports their main characteristics.

Fair decompositions. Table 2 shows the number of blocks k in the fair decomposition, the number of distinct edges e_1 used in a maxmin-fair distribution F, the number of matchings M in the support of F, and the time (user and real) to compute the decomposition. Note that e_1 (which determines the space needs for publishing F) exceeds |L| only slightly. Runtimes were below 8 minutes for all instances except OG, where the memory needs for graph and data structures exceeded the RAM available, causing excessive disk swapping.

Satisfaction probabilities. Table 3 reports the distribution of satisfaction probabilities: minimum value (λ_min), quantiles, percentage of users with satisfaction 1 (p%), and Nash welfare (N_0), the geometric mean of utilities (satisfaction probabilities in our setting):
\[
N_0(D) = \Bigl( \prod_{u \in U} D[u] \Bigr)^{1/|U|}. \tag{23}
\]
Nash welfare is a standard measure of fairness when allocating divisible resources [7], and penalizes small values. As in [6], we also study the generalization of Nash welfare using power means (for a parameter p ∈ R):
\[
N_p(D) = \Bigl( \frac{\sum_{u \in U} D[u]^p}{|U|} \Bigr)^{1/p}. \tag{24}
\]
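The welfare measures (23) and (24) are straightforward to compute from a vector of satisfaction probabilities. The sketch below is an illustration only; the function name `power_mean_welfare` is hypothetical, and returning 0 when some probability is zero and p ≤ 0 is a convention adopted here (it is the limiting value).

```python
import math


def power_mean_welfare(probs, p):
    """Generalized Nash welfare N_p of (24); p = 0 gives the geometric
    mean N_0 of (23)."""
    n = len(probs)
    if p <= 0 and min(probs) == 0:
        return 0.0  # convention of this sketch: limiting value
    if p == 0:
        # Geometric mean, computed through logarithms for stability.
        return math.exp(sum(math.log(x) for x in probs) / n)
    return (sum(x ** p for x in probs) / n) ** (1.0 / p)


# Toy vector of satisfaction probabilities (hypothetical).
toy_probs = [0.5, 0.5, 1.0]
n1 = power_mean_welfare(toy_probs, 1)    # arithmetic mean
n0 = power_mean_welfare(toy_probs, 0)    # geometric mean
nm1 = power_mean_welfare(toy_probs, -1)  # harmonic mean
```

Smaller values of p weigh poorly served users more heavily, which is why the negative-p columns are the ones where mechanisms differ most.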
The power means are a non-decreasing function of p. When p = 1, N_p(D) is the mean satisfaction probability, which equals ρ/|L| for any Pareto-efficient mechanism. Taking the limit in (24) as p → 0 one obtains (23) [29], justifying the notation N_0 for (standard) Nash welfare. Taking the limit as p → −∞ yields min_{u∈U} D[u], which by definition is maximized by MF. Table 4 shows these metrics for the three mechanisms tested, on those graphs where PS terminated in under eight hours. Notice that N_1(MF) = N_1(RP) > N_1(PS), confirming that MF and RP are
Table 5: Fraction of satisfied users among the bottom t %.
Table 2: Fair decompositions.

dataset      k          e_1            M   user (s)   real (s)
AM         194      143,425       13,762      1.812      1.845
Vui         72       84,003        1,068      1.206      1.248
Cti        151      157,744        2,726      1.731      1.761
B          179      212,667        4,119      2.152      2.176
WC        1468    1,883,431      350,518      17.57      17.78
M3         245       92,841       52,332      11.12      11.24
FG         924      435,612      109,242      15.17      15.29
Pa           2    1,425,813            2       5.96       6.02
Di        1117    1,784,259      305,104      23.37      23.49
de         163      432,472        5,596      21.67      22.16
LG        1480    3,314,628      302,410     103.59     105.36
WT        3612   27,842,321   16,548,387     444.71     447.21
OG        2266    3,041,112      224,738     370.76    5102.08
Table 3: Distribution of maxmin-fair satisfaction probabilities.

dataset   λ_min      λ_25%    λ_50%     λ_75%    p%        N_0
AM        0.156      1        1         1        76.18     0.860
Vui       0.0208     0.5      1         1        69.16     0.743
Cti       0.01       0.5      1         1        65.42     0.670
B         0.0128     0.5      1         1        58.63     0.630
WC        5.52e-4    0.0149   0.0417    0.1      2.17      0.0366
M3        0.0298     0.0833   0.137     0.1798   0.49      0.121
FG        0.001012   0.116    0.2       0.254    6.04      0.161
Pa        0.5        1        1         1        99.9999   0.99999
Di        0.000025   0.0135   0.0588    0.167    4.28      0.0377
de        0.00334    0.667    1         1        74        0.718
LG        1.83e-4    0.2      1         1        59.43     0.326
WT        4.38e-7    4e-6     1.26e-4   0.0294   10.89     3.66e-4
OG        0.00453    0.333    1         1        56.2      0.583
Table 4: Generalized Nash welfare of MF, PS and RP.

dataset      AM      Vui      Cti      B        M3
N_1(MF)      0.898   0.824    0.784    0.746    0.151
N_1(PS)      0.796   0.775    0.739    0.702    0.147
N_1(RP)      0.898   0.824    0.784    0.746    0.151
N_0(MF)      0.860   0.743    0.670    0.630    0.121
N_0(PS)      0.695   0.684    0.618    0.580    0.0824
N_0(RP)      0.855   0.738    0.662    0.622    0.110
N_−1(MF)     0.797   0.585    0.470    0.445    0.0927
N_−1(PS)     0.524   0.522    0.419    0.395    0.0452
N_−1(RP)     0.776   0.554    0.434    0.408    0.0721
N_−2(MF)     0.699   0.339    0.232    0.244    0.0720
N_−2(PS)     0.335   0.300    0.210    0.214    0.0285
N_−2(RP)     0.636   0.251    0.187    0.183    0.0441
N_−5(MF)     0.409   0.0822   0.0431   0.0573   0.0486
N_−5(PS)     0.100   0.077    0.0429   0.025    0.0165
N_−5(RP)     0.261   0.0433   0.0379   0.0382   0.0203
Pareto-efficient but PS is not. The generalized welfares for p < 1 are computed exactly for MF and PS, but estimated from the empirical probabilities after T samples for RP. MF comes out on top for all (generalized) Nash welfares in all instances. Interestingly, N_0(RP) is typically within 1% of N_0(MF), but for smaller p the gap can widen to as much as 20% for p = −5, in accordance with the fact that MF was designed to provide better guarantees to low-satisfaction users. Finally, Table 5 shows the expected fraction of satisfied users among the bottom t% using each method, for t = 1, 5, 10, and 20. We observe again that our method always gives the highest values, the gap being as high as 50% in some instances.
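One plausible way to compute the quantity reported in Table 5 is sketched below. The text does not spell out the definition, so the reading used here is an assumption: the "bottom t%" are taken to be the t% of users with the lowest satisfaction probability under the mechanism being evaluated, and the expected fraction of satisfied users among them is the average of their probabilities. The function name and the rounding rule (at least one user, rounded up) are likewise choices of this sketch.

```python
import math


def bottom_fraction_satisfied(probs, t):
    """Average satisfaction probability of the t% worst-off users
    (assumed reading of 'fraction of satisfied users among the bottom t%')."""
    k = max(1, math.ceil(len(probs) * t / 100))  # number of users in the bottom t%
    lowest = sorted(probs)[:k]
    return sum(lowest) / k


# Toy vector of satisfaction probabilities (hypothetical).
toy_probs = [0.1, 0.2, 0.3, 0.4, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
bottom_20 = bottom_fraction_satisfied(toy_probs, 20)
everyone = bottom_fraction_satisfied(toy_probs, 100)
```

Under this reading the value at t = 100 is simply the mean satisfaction probability, and the value for very small t approaches the minimum satisfaction probability.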
7 RELATED WORK
To the best of our knowledge, we are the first to study maxmin fairness for randomized matching algorithms. As discussed in Section 1, the bulk of the research in the area of algorithmic bias and fairness has mainly focused on avoiding discrimination against a
dataset       AM       Vui      Cti      B        M3
MF, t = 1     0.189    0.0610   0.0481   0.0459   0.0298
PS, t = 1     0.0640   0.0528   0.0417   0.039    0.00430
RP, t = 1     0.141    0.0518   0.0353   0.0337   0.00417
MF, t = 5     0.282    0.158    0.100    0.0970   0.0298
PS, t = 5     0.118    0.130    0.087    0.0825   0.00605
RP, t = 5     0.256    0.144    0.088    0.845    0.00417
MF, t = 10    0.389    0.231    0.151    0.143    0.0338
PS, t = 10    0.167    0.192    0.128    0.123    0.00883
RP, t = 10    0.363    0.216    0.137    0.131    0.0200
MF, t = 20    0.513    0.344    0.238    0.221    0.0381
PS, t = 20    0.259    0.287    0.204    0.193    0.0157
RP, t = 20    0.509    0.326    0.229    0.210    0.0319
sensitive attribute (i.e., a protected social group) in supervised machine learning [13, 16, 21]. Most of this literature focuses on statistical parity, or group-level fairness, i.e., the difference in the probability of a positive outcome for a random individual drawn from two different subpopulations (e.g., men and women). Feldman et al. [21] propose to repair attributes in such a way as to maintain per-attribute within-group ordering while enforcing statistical parity, so that a single decision threshold applied to the transformed attributes would result in equal success rates among the two different groups. Corbett-Davies et al. [13] reformulate algorithmic fairness as constrained optimization in the context of criminal justice and pretrial release decision-making: the objective is to maximize public safety while satisfying formal fairness constraints designed to reduce racial disparities. Dwork et al. [16] provide a series of examples in which statistical parity is maintained, but from the point of view of an individual, the outcome is clearly unfair. Then they study a randomized solution for classifiers to guarantee that "similar individuals are treated similarly" in an expected sense. The idea that more qualified individuals should be chosen preferentially is present in the work of Joseph et al. [30], who study fairness in multi-armed bandit problems. Pedreschi et al. [35] introduced the related data mining problem of discovering discrimination practices (i.e., bias against a protected social group) in a given dataset containing past decisions. If such a dataset is used as training set for a machine learning model, the bias detected can be fixed before the learning phase [32, 41], preventing its further propagation in the algorithmic decision-making process. Finally, maxmin-fairness (in a non-distributional sense) as an optimization objective is used for flow control in telecommunication networks [4, 12].
The origin of the concept in the context of non-discrimination dates back to Rawls’ theory of justice [37], where a “difference principle” is advocated whereby social and financial inequalities are required to be to the advantage of the worst-off. In Rawls’ distributive justice, social and economic measures should be designed so as to bring the greatest benefit to the least-advantaged members of society, in order to maximize their prospects.
8 CONCLUSIONS
We proposed the notion of maxmin fairness for randomized algorithms for problems in which humans are the elements that may or may not be included in a solution. A series of theoretical results characterize maxmin-fair distributions and pave the road to our practical contribution: an exact O(min(|E|^{3/2}, |E||V|^{2/3}) · (log |V|)^2) algorithm for maxmin-fair bipartite matching, which scales to graphs with millions of vertices and hundreds of millions of edges. We also discussed methods for the transparent and accountable real-world deployment of our framework. Finally, we identified a widely applicable condition (efficient optimizability) which guarantees the computational efficiency of discovering maxmin-fair distributions of solutions for general search problems. This result opens the door to further research on maxmin-fair algorithms: although existence is shown, the method implicit in the proof may lead to suboptimal running times, calling for improved methods for specific problems. While our focus has been on search problems, we have also proposed a notion of approximate fairness intended to deal with optimization problems, where solutions may have different business value, possibly unrelated to satisfaction probabilities. The goal is to reach a compromise between fairness and expected business value. It would be desirable to be able to find approximate maxmin-fair distributions faster than exact maxmin-fair distributions; we leave this as an open problem. Finally, future work may consider other notions of fairness for randomized algorithms for search, optimization, ranking and learning problems.
REFERENCES
[1] Big data: Seizing opportunities, preserving values. Technical report, US Executive Office of the President, 2014. https://obamawhitehouse.archives.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf.
[2] Big data: A report on algorithmic systems, opportunity, and civil rights. Technical report, US Executive Office of the President, 2016. https://obamawhitehouse.archives.gov/blog/2016/05/04/big-risks-big-opportunities-intersection-big-data-and-civil-rights.
[3] C. Berge. Two Theorems in Graph Theory. PNAS, 43(9):842–844, 1957.
[4] D. P. Bertsekas, R. G. Gallager, and P. Humblet. Data networks, volume 2. Prentice-Hall International New Jersey, 1992.
[5] G. Birkhoff. Tres observaciones sobre el algebra lineal. Univ. Nac. Tucumán Rev. Ser. A, 5:147–151, 1946.
[6] S. Brânzei, V. Gkatzelis, and R. Mehta. Nash social welfare approximation for strategic agents. In EC, pages 611–628. ACM, 2017.
[7] I. Caragiannis, D. Kurokawa, H. Moulin, A. D. Procaccia, N. Shah, and J. Wang. The unreasonable fairness of maximum Nash welfare. In EC 2016.
[8] M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84–95, 2000.
[9] B. V. Cherkassky and A. V. Goldberg. On implementing the push-relabel method for the maximum flow problem. Algorithmica, 19(4):390–410, 1997.
[10] B. V. Cherkassky and A. V. Goldberg. HIPR 3.5. Technical report, IG Systems, Inc., 2004. http://www.avglab.com/andrew/soft.html.
[11] R. Cole, K. Ost, and S. Schirra. Edge-coloring bipartite multigraphs in O(E log D) time. Combinatorica, 21(1):5–12, 2001.
[12] A. Coluccia, A. D'Alconzo, and F. Ricciato. On the optimality of max–min fairness in resource allocation. Annals of telecommunications, 67(1-2):15–26, 2012.
[13] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq. Algorithmic decision making and the cost of fairness. In KDD 2017.
[14] A. L. Dulmage and N. S. Mendelsohn. Coverings of bipartite graphs. Can. Jrl. of Math., 10(4):516–534, 1958.
[15] C. Dwork. What's fair? In KDD 2017 [Keynote].
[16] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Proc. 3rd Conf. Innov. Theor. Comp. Sci., pages 214–226. ACM, 2012.
[17] J. Edmonds. Minimum partition of a matroid into independent subsets. J. Res. Nat. Bur. Standards Sect. B, 69:67–72, 1965.
[18] J. Edmonds. Paths, trees, and flowers. Can. Jrl. of Math., 17(3):449–467, 1965.
[19] J. Edmonds. Matroids and the greedy algorithm. Math. Prog., 1(1):127–136, 1971.
[20] J. Edmonds and D. Fulkerson. Transversals and matroid partition. J. Res. Nat. Bur. Standards, 69B:147–153, 1965.
[21] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. In KDD 2015.
[22] D. Gale and L. S. Shapley. College admissions and the stability of marriage. The American Mathematical Monthly, 69(1):9–15, 1962.
[23] G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18(1):30–55, 1989.
[24] A. V. Goldberg and S. Rao. Beyond the flow decomposition barrier. Journal of the ACM, 45(5):783–797, 1998.
[25] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. Journal of the ACM (JACM), 35(4):921–940, 1988.
[26] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
[27] M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and combinatorial optimization, volume 2. Springer Science & Business Media, 2012.
[28] P. Hall. On representatives of subsets. Jrl. London Math. Soc., 1(1):26–30, 1935.
[29] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. CUP, 1952.
[30] M. Joseph, M. Kearns, J. H. Morgenstern, and A. Roth. Fairness in learning: Classic and contextual bandits. In Adv. Neur. Inf. Proc. Syst., pages 325–333, 2016.
[31] Y. Kamada and F. Kojima. Efficient matching under distributional constraints: Theory and applications. American Economic Review, 105(1):67–99, 2015.
[32] F. Kamiran and T. Calders. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst., 33(1):1–33, 2011.
[33] M. Kearns. Fair algorithms for machine learning. In EC 2017.
[34] E. L. Lawler. Combinatorial optimization: networks and matroids. Courier Corporation, 1976.
[35] D. Pedreschi, S. Ruggieri, and F. Turini. Discrimination-aware data mining. In KDD 2008.
[36] J.-C. Picard and M. Queyranne. On the structure of all minimum cuts in a network and applications. Combinatorial Optimization II, pages 8–16, 1980.
[37] J. Rawls. A theory of justice. MA: Harvard University Press, 1971.
[38] A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. J. of Comb. Theory, Series B, 80(2):346–355, 2000.
[39] E. Triesch. A short proof that matching matroids are transversal. Applied Mathematics Letters, 5(2):19–20, 1992.
[40] V. G. Vizing. On an estimate of the chromatic class of a p-graph. Diskret analiz, 3:25–30, 1964.
[41] I. Zliobaite, F. Kamiran, and T. Calders. Handling conditional discrimination. In ICDM 2011.