MIC’2001 - 4th Metaheuristics International Conference
Design of Balanced MBA Student Teams

Jacques Desrosiers∗, Nenad Mladenović∗†, Daniel Villeneuve∗

∗ GERAD and École des Hautes Études Commerciales, 3000 chemin de la Côte-Sainte-Catherine, Montréal, Canada H3T 2A7. Email: {jacques, nenad, danielv}@gerad.ca
† Mathematical Institute, Serbian Academy of Sciences and Arts, Kneza Mihajla 35, 11000 Belgrade, Yugoslavia. Email: [email protected]

1 Introduction
In some schools and universities, students from the same grade must sometimes be divided into several teams within a classroom in such a way that each team is a good representation of the classroom population. A first problem is therefore to choose the student attributes (criteria) to include in the team design process and to determine their relative importance for the school administration. Possible balancing criteria include: (i) closeness of the ratio of women to men in a team to that of the classroom population; (ii) closeness of the average age or of past school grades; (iii) similarity of the mix of countries of origin, or of qualifications, etc. A second problem, which is the subject of this paper, is to form these teams and to measure the quality of their balance.

In this paper we assume that the attributes and their weights are given. We first propose mathematical models that take into account the attributes assigned to the students. Two ways of measuring the balance among teams are proposed: min-sum and min-max models. Two formulations are considered, and both exact and heuristic solution methods are developed. The exact method consists in solving a Set Partitioning problem based on the enumeration of all possible teams. In order to solve large problem instances, we have also developed Variable Neighborhood Search (VNS) metaheuristics.

The remainder of the paper is organized as follows. Sections 2 and 3 present the mathematical formulations and the solution methods, respectively. Concluding remarks on the computational experiments conducted on real-life problems from the MBA Program at École des Hautes Études Commerciales in Montréal appear in the last section.
2 Mathematical formulations
The following notation is used for the design of balanced teams (DBT).
• $n$ : number of students.
• $p$ : number of teams.
• $q$ : number of attributes associated with each student.
• $m$ : number of students in each team; we assume first that $m = n/p$ is an integer.
• $A = (a_i) = (a_{ik})$ : value of attribute $k$ ($k = 1, \ldots, q$) for student $i$ ($i = 1, \ldots, n$); $a_i \in \mathbb{R}^q$.
• $w = (w_k)$ : weight factor of attribute $k$ ($k = 1, \ldots, q$); $w \in \mathbb{R}^q$.
• $b = (b_k)$ : target value of attribute $k$ ($k = 1, \ldots, q$); $b \in \mathbb{R}^q$.
For some attributes, $b_k$ ($k = 1, \ldots, q$) can be computed as the average value over the student population; otherwise, it is given by the school authority. Let $C_j$ denote the set of students in team $j$ ($j = 1, \ldots, p$) and let $v_j$ denote its centroid in $\mathbb{R}^q$, that is, $|C_j| = m$ and $v_j = (1/m) \sum_{i \in C_j} a_i$. A combinatorial formulation of the min-sum DBT model is to find a partition $C_j^*$ ($j = 1, \ldots, p$) such that the following function reaches its minimum:

$$\sum_{j=1}^{p} d(b, v_j), \qquad (1)$$
where $d(\cdot, \cdot)$ denotes some weighted distance function in $\mathbb{R}^q$. For the min-max DBT model, we simply replace the sum operator with the maximum operator. In the min-max model, the centroid of the worst team should be as close as possible to the target vector $b$.

The DBT problem is related to several well-known areas of mathematical programming and combinatorial optimization. Its mathematical models may be considered within Goal Programming, Multiobjective Programming, Clustering, Location Analysis, Integer Programming, etc. For example, if we consider each criterion separately, then we have a multiobjective problem with $q$ different objective functions, where the vector $b = (b_1, \ldots, b_q) \in \mathbb{R}^q$ may be seen as the ideal vector.

In order to provide some mathematical programming formulations, we first introduce the following two sets of decision variables.
• $x = (x_{ij})$ : binary variable that takes the value 1 if student $i$ ($i = 1, \ldots, n$) is assigned to team $j$ ($j = 1, \ldots, p$), and the value 0 otherwise.
• $v_j(x) = (v_{jk}(x))$ : centroid value of attribute $k$ ($k = 1, \ldots, q$) for team $j$ ($j = 1, \ldots, p$), defined as

$$v_{jk}(x) = \frac{1}{m} \sum_{i=1}^{n} a_{ik} x_{ij} \qquad (j = 1, \ldots, p;\ k = 1, \ldots, q). \qquad (2)$$
Regardless of the area of interpretation, the min-sum and min-max DBT problems can be formulated as follows:

$$\min \sum_{j=1}^{p} d(b, v_j(x)) \quad \text{or} \quad \min \max_{j=1,\ldots,p} d(b, v_j(x)) \qquad (3)$$

s.t.

$$\sum_{j=1}^{p} x_{ij} = 1 \qquad (i = 1, \ldots, n) \qquad (4)$$

$$\sum_{i=1}^{n} x_{ij} = m \qquad (j = 1, \ldots, p) \qquad (5)$$

$$x_{ij} \ \text{binary} \qquad (i = 1, \ldots, n;\ j = 1, \ldots, p). \qquad (6)$$
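As a small illustration of constraints (4)–(5) and definition (2), the following Python sketch builds the assignment matrix from a list of teams, checks the two constraint sets, and computes the team centroids. The function names and the toy data are assumptions made for this example only; they do not come from the paper.

```python
def assignment_matrix(teams, n):
    """Return x with x[i][j] = 1 if student i is in team j, else 0."""
    p = len(teams)
    x = [[0] * p for _ in range(n)]
    for j, team in enumerate(teams):
        for i in team:
            x[i][j] = 1
    return x


def is_feasible(x, m):
    """Check constraints (4) and (5): one team per student, m students per team."""
    n, p = len(x), len(x[0])
    each_student_once = all(sum(x[i]) == 1 for i in range(n))
    each_team_full = all(sum(x[i][j] for i in range(n)) == m for j in range(p))
    return each_student_once and each_team_full


def centroids(a, x, m):
    """Centroid v[j][k] = (1/m) * sum_i a[i][k] * x[i][j], as in (2)."""
    n, p, q = len(a), len(x[0]), len(a[0])
    return [[sum(a[i][k] * x[i][j] for i in range(n)) / m for k in range(q)]
            for j in range(p)]


# Toy data (hypothetical): 4 students, q = 2 attributes (gender flag, age).
a = [[1, 24], [0, 30], [1, 28], [0, 26]]
teams = [[0, 1], [2, 3]]          # p = 2 teams of m = 2 students
x = assignment_matrix(teams, n=4)
v = centroids(a, x, m=2)
```

In this toy instance both teams have the same centroid, so the partition is perfectly balanced with respect to the class averages.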
Constraint sets (4)–(6) describe the structure of a network flow problem in which each student $i$ is assigned to a single team while each team $j$ comprises exactly $m$ students. In the objective function, $d(b, v_j(x))$ represents some measure of the weighted distance in $\mathbb{R}^q$ between the target vector $b$ and the centroid $v_j(x)$ of team $j$. It can be defined in several ways, two of which are given below:

$$d_1(b, v_j(x)) = \sum_{k=1}^{q} w_k \, |b_k - v_{jk}(x)|, \qquad (7)$$

$$d_2(b, v_j(x)) = \sum_{k=1}^{q} w_k \, (b_k - v_{jk}(x))^2. \qquad (8)$$
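The two distance functions and the two objectives can be evaluated directly for a given partition. The sketch below is a minimal Python illustration; the helper names (`centroid`, `min_sum`, `min_max`) and the toy data are assumptions, not part of the paper.

```python
def centroid(team, a):
    """Mean attribute vector of the students listed in team."""
    q = len(a[0])
    return [sum(a[i][k] for i in team) / len(team) for k in range(q)]


def d1(b, v, w):
    """Weighted absolute deviation, as in (7)."""
    return sum(wk * abs(bk - vk) for wk, bk, vk in zip(w, b, v))


def d2(b, v, w):
    """Weighted squared deviation, as in (8)."""
    return sum(wk * (bk - vk) ** 2 for wk, bk, vk in zip(w, b, v))


def min_sum(teams, a, b, w, dist=d1):
    """Min-sum objective: total distance of all team centroids to the target."""
    return sum(dist(b, centroid(t, a), w) for t in teams)


def min_max(teams, a, b, w, dist=d1):
    """Min-max objective: distance of the worst team centroid to the target."""
    return max(dist(b, centroid(t, a), w) for t in teams)


# Toy data (hypothetical): attributes are (gender flag, age); b is the class mean.
a = [[1, 24], [0, 30], [1, 28], [0, 26]]
b = [0.5, 27.0]
w = [1.0, 0.5]
balanced = [[0, 1], [2, 3]]
unbalanced = [[0, 2], [1, 3]]
```

With these numbers, the `balanced` partition has objective value 0 under both distances, while the `unbalanced` one groups students of the same gender and is penalized on both attributes.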
Even though it is possible to linearize the first weighted distance function in the min-sum model by making use of relation (2) and by adding some additional variables, solving the resulting Integer Programming problem is, in most cases, of little practical use. Indeed, the solution to the Linear Programming relaxation is usually so fractional that even finding a single integer solution in a reasonable amount of time is very hard. The second type of formulation overcomes this difficulty.

A classical Set Partitioning formulation can be obtained by enumerating all the possible teams with exactly $m$ students. The number of such teams is given by $T = \binom{n}{m}$. Then, in the min-sum model, we need to find $p$ teams among $T$ that partition the students in such a way that the sum of the weighted distances between their centroids and the target vector is minimum. In the next model, we use the following notation:
• $\delta_{it}$ : binary coefficient taking the value 1 if student $i$ ($i = 1, \ldots, n$) belongs to team $t$ ($t = 1, \ldots, T$), and the value 0 otherwise; note that $\sum_i \delta_{it} = m$ holds for each team $t$.
• $v_t = (v_{tk})$ : centroid value of attribute $k$ ($k = 1, \ldots, q$) for team $t$ ($t = 1, \ldots, T$), defined as

$$v_{tk} = \frac{1}{m} \sum_{i=1}^{n} a_{ik} \delta_{it} \qquad (t = 1, \ldots, T;\ k = 1, \ldots, q). \qquad (9)$$

• $c_t = d(b, v_t)$ : cost coefficient for team $t$ ($t = 1, \ldots, T$) computed using (7) or (8).
• $y_t$ : binary variable that equals 1 if team $t$ ($t = 1, \ldots, T$) is chosen, and 0 otherwise.

It should be noted that $v_t$ and $\delta_{it}$ are no longer decision variables since they do not depend on the $y_t$ variables. Parameters $c_t$ and $\delta_{it}$ can be calculated in a preprocessing step. Hence the min-sum DBT model in (3)–(6) can be reformulated as follows:
$$\min \sum_{t=1}^{T} c_t y_t \qquad (10)$$

s.t.

$$\sum_{t=1}^{T} \delta_{it} y_t = 1 \qquad (i = 1, \ldots, n) \qquad (11)$$

$$y_t \ \text{binary} \qquad (t = 1, \ldots, T). \qquad (12)$$
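To make model (10)–(12) concrete, the following Python sketch enumerates all $\binom{n}{m}$ candidate teams, computes their costs with $d_1$, and then finds the cheapest partition by exhaustive search. The exhaustive search is a stand-in chosen for this illustration; it is not the mixed-integer optimizer used in the paper and is only viable for very small instances. All function names and data are hypothetical.

```python
from itertools import combinations


def team_cost(team, a, b, w):
    """Cost c_t = d1(b, v_t): weighted absolute deviation of the team centroid."""
    m = len(team)
    return sum(w[k] * abs(b[k] - sum(a[i][k] for i in team) / m)
               for k in range(len(b)))


def enumerate_teams(a, b, w, m):
    """All T = C(n, m) candidate teams with their costs (the preprocessing step)."""
    n = len(a)
    return {team: team_cost(team, a, b, w) for team in combinations(range(n), m)}


def best_partition(costs, n):
    """Exhaustive search over (10)-(12); only viable for very small n."""
    best = (float("inf"), None)

    def extend(unassigned, chosen, total):
        nonlocal best
        if total >= best[0]:
            return                      # prune: cannot improve the incumbent
        if not unassigned:
            best = (total, list(chosen))
            return
        first = min(unassigned)         # the lowest-index student must be covered
        for team, c in costs.items():
            # Teams are sorted tuples, so team[0] is the team's lowest index.
            if team[0] == first and unassigned.issuperset(team):
                chosen.append(team)
                extend(unassigned - set(team), chosen, total + c)
                chosen.pop()

    extend(frozenset(range(n)), [], 0.0)
    return best


# Toy data (hypothetical): 6 students, (gender flag, age), teams of m = 2.
a = [[1, 24], [0, 30], [1, 28], [0, 26], [1, 27], [0, 27]]
b = [0.5, 27.0]
w = [1.0, 0.5]
costs = enumerate_teams(a, b, w, m=2)
value, partition = best_partition(costs, n=6)
```

Always covering the lowest-index unassigned student first means each partition is generated once, which mirrors the fact that the Set Partitioning model does not distinguish between relabelings of the teams.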
There is an important difference between these two mathematical programming formulations of the min-sum model. The first one (3)–(6) suffers from the duplication of the same solution p! times, and the feasible set should be reduced by introducing into the model some symmetry breaking constraints. For example, if two teams just exchange their index number, the solution is still the same. Note that the second model does not have such difficulties. Indeed, model (10)–(12) can be obtained from the first one by using the appropriate Dantzig-Wolfe [1] decomposition or, equivalently, Lagrangian relaxation [2]: constraint set (5) must be kept within the subproblem structure used to identify the m-student teams.
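The $p!$-fold duplication in the assignment formulation can be quantified with a short count. The sketch below is an illustration under the assumption that $m = n/p$ is an integer; the function names are hypothetical.

```python
from math import factorial


def count_assignments(n, p):
    """Number of feasible x in (4)-(6): labeled teams of m = n/p students each."""
    m = n // p
    return factorial(n) // factorial(m) ** p


def count_partitions(n, p):
    """Number of distinct partitions once the p! team relabelings are merged."""
    return count_assignments(n, p) // factorial(p)


labeled = count_assignments(20, 4)      # 20 students, four 5-student teams
unlabeled = count_partitions(20, 4)
```

For 20 students in four 5-student teams this gives 11,732,745,024 feasible assignment matrices but only 488,864,376 distinct partitions, a factor of $4! = 24$.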
3 Solution methods
In this section we present different strategies for solving either the min-sum or the min-max DBT problem. If the number of possible teams is not too large, then a solution to the min-sum problem can be obtained by solving the Set Partitioning problem (10)–(12). First, we propose an exact solution method based on this Set Partitioning formulation. Second, we develop two heuristics based on Variable Neighborhood Search (VNS) and Variable Neighborhood Decomposition Search (VNDS), respectively. The Set Partitioning approach is restricted to the min-sum problem, whereas both neighborhood search heuristics can handle the min-sum and the min-max problems.
Set Partitioning solution for the min-sum DBT problem. If the number $T = \binom{n}{m}$ of possible teams of students is less than a few thousand, the Set Partitioning formulation (10)–(12) may be solved directly by using a standard optimizer for mixed-integer programs. However, for 5-student teams, this approach is limited to instances of about 20 students. In order to solve larger instances, we reduce the Set Partitioning model to an equivalent smaller one by computing an upper bound $\bar{z}$ on the objective value and a vector of dual variables feasible for the linear relaxation of (10)–(12). This vector of dual variables provides a lower bound $\underline{z}$ on the objective function (10) of the min-sum problem, and it is well known that a variable $y_t$ ($t = 1, \ldots, T$), with reduced cost $\bar{c}_t$ relative to these dual variables, will not be part of any optimal integer solution if $\bar{c}_t > \bar{z} - \underline{z}$. Obviously, a smaller gap $\bar{z} - \underline{z}$ leads to a smaller Set Partitioning model.

In order to minimize the gap, we first try to maximize the lower bound $\underline{z}$ by solving the linear relaxation of (10)–(12). Since the number $T$ of variables will usually be large (several million), we solve the problem by using an external pricing procedure coupled to a standard optimizer for linear programs, resulting in an algorithm similar to the revised simplex method. Since we do not know of a formulation of the pricing subproblem that would allow its efficient solution, our procedure performs an exhaustive enumeration of all the $T$ teams in order to find a subset of columns with negative reduced cost. After a few iterations, we obtain an optimal vector of dual variables and a lower bound $\underline{z}$.

Second, we need to compute a small upper bound $\bar{z}$ on the objective function (10). Since all the teams are admissible, an arbitrary partition made of $m$-student teams yields an initial upper bound. A better bound could be obtained by using one of the heuristics described below. In practice, we expect the value of the optimal solution to (10)–(12) to be within a small percentage $\epsilon$ of $\underline{z}$. Thus, we might conveniently consider using $\hat{z} = (1 + \epsilon/100)\,\underline{z}$ as an initial approximate upper bound. If there is no solution to the reduced Set Partitioning model based on this approximate upper bound, we may still increase it by a fixed amount, eventually finding an optimal solution.

We have solved instances of size $n = 65$ and $m = 5$ (implying that $T > 8 \cdot 10^6$) to within 0.01% of optimality in a few minutes using this approach. We used the linear and mixed-integer optimizers of CPLEX 6.6 and a parallel version of the pricing procedure, running on a SUN ULTRA 2360 workstation.
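The column elimination rule can be sketched in a few lines of Python. The example assumes that a dual vector feasible for the linear relaxation is already available (here it is simply given by hand), so that its sum is a valid lower bound; the function name, the tolerance, and the toy numbers are assumptions for illustration.

```python
from itertools import combinations
from math import comb


def filter_columns(costs, duals, upper_bound, tol=1e-9):
    """Keep only teams whose reduced cost does not exceed the gap.

    costs: dict mapping a team (tuple of student indices) to its cost c_t.
    duals: one dual value per student, assumed feasible for the LP relaxation.
    """
    lower_bound = sum(duals)            # dual objective value
    gap = upper_bound - lower_bound
    kept = {}
    for team, cost in costs.items():
        reduced_cost = cost - sum(duals[i] for i in team)
        if reduced_cost <= gap + tol:
            kept[team] = cost
    return lower_bound, kept


# Toy instance (hypothetical numbers): 4 students, teams of 2.
pair_costs = [2.0, 4.0, 5.0, 5.0, 4.0, 2.0]
costs = dict(zip(combinations(range(4), 2), pair_costs))
duals = [1.0, 1.0, 1.0, 1.0]            # feasible: every pair costs at least 2
lower_bound, kept = filter_columns(costs, duals, upper_bound=5.0)

# Column counts for the instance sizes mentioned in the text.
T_small, T_large = comb(20, 5), comb(65, 5)
```

Here the gap is 1, so only the two teams with zero reduced cost survive, and they form the optimal partition. The counts `T_small` and `T_large` evaluate to 15,504 and 8,259,888 columns, respectively.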
Basic VNS for the DBT problem. The basic idea of Variable Neighborhood Search is to change neighborhoods systematically in the search for a better solution (see [6], and for recent surveys, see [3] or [4]). To construct different neighborhood structures and to perform a systematic search, one needs a way of finding the distance between any two solutions, i.e., one needs to supply the solution space with some metric (or quasi-metric) and then induce neighborhoods from it. Let the solution space $S$ be the set of all possible partitions of the students into $p$ teams. Such a partition $s \in S$ can be represented as a graph $G(V, E)$ with $p$ stars ($|V| = n + p$, $|E| = n$), where the centroid of each team is the center of a star. Then we supply the solution space $S$ with a distance $\rho$ as follows:

$$\rho(s_1, s_2) = |E_1 \setminus E_2| / 2, \qquad s_1, s_2 \in S.$$

In other words, if one student pair just exchanges teams, the two solutions are at distance 1 since exactly two edges have been removed (and replaced by two others). All such possible pairwise exchanges define the neighborhood $N_1(s)$ of a solution $s$. If $\ell$ student pairs exchange their team membership, we say that the old and the new solutions are at distance $\ell$, hence defining the neighborhood structure $N_\ell(s)$, for $s \in S$. The single parameter in the basic VNS is a maximum distance $\ell_{\max}$ that is used in the metaheuristic search.

In our basic VNS, the interchange of $\ell$ ($\ell = 1, \ldots, \ell_{\max}$) student pairs is used for the perturbation of an incumbent solution. We have also developed another shaking procedure that forces all students from a randomly chosen team to change their team. Each of them is interchanged in the best possible way, i.e., so that the objective function increases as little as possible. In that way, the maximum distance of the newly generated solution to the incumbent one is $m$. The local search is performed within $N_1$, and an initial solution is obtained by using a random partition.
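The scheme described above can be sketched as follows. This is a minimal Python illustration under several assumptions that are not in the paper: solutions are stored as a list giving the team label of each student, the objective is min-sum with $d_1$ (with an optional min-max switch), the local search uses first improvement, the stopping rule is a fixed number of outer iterations, and $p \geq 2$. All names are hypothetical.

```python
import random


def objective(team_of, a, b, w, p, use_max=False):
    """Min-sum (or min-max) of weighted absolute deviations d1 over the p teams."""
    q = len(b)
    sums = [[0.0] * q for _ in range(p)]
    sizes = [0] * p
    for i, j in enumerate(team_of):
        sizes[j] += 1
        for k in range(q):
            sums[j][k] += a[i][k]
    dists = [sum(w[k] * abs(b[k] - sums[j][k] / sizes[j]) for k in range(q))
             for j in range(p)]
    return max(dists) if use_max else sum(dists)


def rho(s1, s2):
    """Distance between two labeled solutions: half the number of moved students."""
    return sum(t1 != t2 for t1, t2 in zip(s1, s2)) / 2


def shake(team_of, ell, rng):
    """Exchange ell random student pairs from different teams (distance <= ell)."""
    s = list(team_of)
    for _ in range(ell):
        i, j = rng.sample(range(len(s)), 2)
        while s[i] == s[j]:
            i, j = rng.sample(range(len(s)), 2)
        s[i], s[j] = s[j], s[i]
    return s


def local_search(team_of, f):
    """First-improvement descent in N1 (single pair exchanges)."""
    s, best = list(team_of), f(team_of)
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                if s[i] == s[j]:
                    continue
                s[i], s[j] = s[j], s[i]
                value = f(s)
                if value < best - 1e-12:
                    best, improved = value, True
                else:
                    s[i], s[j] = s[j], s[i]   # undo the exchange
    return s, best


def basic_vns(a, b, w, p, ell_max=3, iterations=50, seed=0):
    """Basic VNS: shake in N_ell, descend in N1, move or enlarge ell."""
    rng = random.Random(seed)
    start = [i % p for i in range(len(a))]
    rng.shuffle(start)                        # random initial partition

    def f(s):
        return objective(s, a, b, w, p)

    incumbent, best = local_search(start, f)
    for _ in range(iterations):
        ell = 1
        while ell <= ell_max:
            candidate, value = local_search(shake(incumbent, ell, rng), f)
            if value < best - 1e-12:
                incumbent, best, ell = candidate, value, 1
            else:
                ell += 1
    return incumbent, best
```

A call such as `basic_vns(a, b, w, p=3)` returns the team label of each student together with the objective value. Because exchanges never change team sizes, every solution visited satisfies constraints (4)–(6).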
VNDS for the DBT problem. VNDS extends basic VNS into a two-level VNS scheme based upon decomposition of the problem (see [5]). It can be seen as a variant of the well-known Successive Approximation method in Numerical Analysis. VNDS is proposed as a means of solving large problem instances. In solving DBT problems by VNDS, we first join two teams and check whether there is a better partition of the $2m$ students. If so, the incumbent solution is improved and we try again with two other teams; if not, we take three teams at random and try to find a better reorganization (partition) of the $3m$ students. If we find a better solution, we accept it as the incumbent solution and start again with merging two teams; otherwise we increase the number of merged teams to four. When the number of merged teams reaches $p$, we start again with two teams selected at random. The procedure stops when some condition is met, for example when the maximum time allowed for the search is reached. Recall that each smaller problem, involving the students of two teams up to $p$ teams, is solved by using the basic VNS. The use of this extended VNS scheme allowed us to obtain results of similar quality to those presented above for the Set Partitioning model, but faster.
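The decomposition loop can be sketched as follows. This Python illustration makes simplifying assumptions that differ from the paper: the subproblem on the merged teams is handled by a plain exchange-based descent instead of the basic VNS, the objective is min-sum with $d_1$, the stopping rule is a fixed number of rounds rather than a time limit, and $p \geq 2$. With such a weak subproblem solver, merging more than two teams adds little; the sketch is meant to show the control flow only. All names are hypothetical.

```python
import random


def team_cost(team, a, b, w):
    """d1 distance between target b and the centroid of the given team."""
    m = len(team)
    return sum(w[k] * abs(b[k] - sum(a[i][k] for i in team) / m)
               for k in range(len(b)))


def reoptimize(teams, a, b, w):
    """Exchange-based descent on a few merged teams (stand-in for basic VNS)."""
    teams = [list(t) for t in teams]

    def cost(t):
        return team_cost(t, a, b, w)

    improved = True
    while improved:
        improved = False
        for x in range(len(teams)):
            for y in range(x + 1, len(teams)):
                for i in range(len(teams[x])):
                    for j in range(len(teams[y])):
                        tx, ty = teams[x], teams[y]
                        before = cost(tx) + cost(ty)
                        tx[i], ty[j] = ty[j], tx[i]
                        if cost(tx) + cost(ty) < before - 1e-12:
                            improved = True
                        else:
                            tx[i], ty[j] = ty[j], tx[i]   # undo the exchange
    return teams, sum(cost(t) for t in teams)


def vnds(teams, a, b, w, rounds=200, seed=0):
    """Merge k randomly chosen teams, re-partition them, and adapt k."""
    rng = random.Random(seed)
    teams = [list(t) for t in teams]
    p, k = len(teams), 2
    for _ in range(rounds):
        chosen = rng.sample(range(p), k)
        old = sum(team_cost(teams[j], a, b, w) for j in chosen)
        new_teams, new = reoptimize([teams[j] for j in chosen], a, b, w)
        if new < old - 1e-12:
            for j, t in zip(chosen, new_teams):
                teams[j] = t
            k = 2                         # success: restart with two teams
        else:
            k = k + 1 if k < p else 2     # failure: merge one more team
    return teams, sum(team_cost(t, a, b, w) for t in teams)
```

Since the teams outside the merged group are untouched, any improvement found on the subproblem is an improvement of the same size for the whole min-sum objective, which is why the acceptance test only compares the costs of the chosen teams.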
4 Concluding remarks
Computational experiments will be presented for several data sets from the MBA Program at École des Hautes Études Commerciales in Montréal. These were used to test the various approaches described in the previous sections. In order to take into account some additional constraints encountered in these real-life instances, we had to slightly modify the above-mentioned approaches in two main ways. The first modification was necessary because the ratio $n/p$ is not always an integer. For example, if there are 64 students, the school seeks a partition into eight 5-student teams and six 4-student teams. The second modification is needed in practice after each semester. At that time, all the teams must be redesigned for the next semester in such a way that any two students who were previously in the same team are not together again during the next semester of their program. The mathematical models proposed in Section 2 can be extended to handle these two requirements.
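Both practical requirements are easy to express in code. The Python sketch below shows one natural way (an assumption, not necessarily the authors' procedure) to derive team sizes that differ by at most one, and a helper that lists the student pairs violating the "never together twice" rule. The function names are hypothetical.

```python
from itertools import combinations


def team_sizes(n, p):
    """Split n students into p teams whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    return [base + 1] * extra + [base] * (p - extra)


def repeated_pairs(old_teams, new_teams):
    """Pairs of students that share a team in both semesters."""
    old = {pair for t in old_teams for pair in combinations(sorted(t), 2)}
    new = {pair for t in new_teams for pair in combinations(sorted(t), 2)}
    return old & new


sizes = team_sizes(64, 14)
clashes = repeated_pairs([[0, 1, 2], [3, 4, 5]], [[0, 3, 4], [1, 2, 5]])
```

For 64 students and 14 teams, `team_sizes` reproduces the split quoted in the text: eight teams of 5 and six teams of 4. A new partition is acceptable for the next semester only when `repeated_pairs` returns an empty set.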
References
[1] G.B. Dantzig and P. Wolfe. Decomposition Principle for Linear Programs. Oper. Res. 8: 101–111, 1960.
[2] A.M. Geoffrion. Lagrangian Relaxation for Integer Programming. Math. Prog. Study 2: 82–114, 1974.
[3] P. Hansen and N. Mladenović. An Introduction to Variable Neighborhood Search. In S. Voss, editor, Meta-heuristics, Advances and Trends in Local Search Paradigms for Optimization, pages 433–458, Kluwer Academic Publishers, Dordrecht, 1999.
[4] P. Hansen and N. Mladenović. Variable Neighborhood Search: Principles and Applications. European J. Oper. Res. 130 (3): 449–467, 2001.
[5] P. Hansen, N. Mladenović and D. Perez-Brito. Variable Neighborhood Decomposition Search. Les Cahiers du GERAD, G-99-53, Montréal, Canada, (1998). (to appear in J. of Heuristics).
[6] N. Mladenović and P. Hansen. Variable Neighborhood Search. Computers Oper. Res. 24: 1097–1100, 1997.
Porto, Portugal, July 16-20, 2001