Course Scheduling: Metrics, Models, and Methods


Gary Lewandowski
Xavier University
Department of Mathematics and Computer Science
Cincinnati, OH 45207-4441
July 9, 1996

Abstract

We consider the course scheduling problem in which one is given lists of courses students would like to take, a section bound B, and a number of time slots T in which the courses can be scheduled. In this problem we must both assign students to sections of their selected courses and build a timetable for the sections. This variant of the timetabling problem has seen little previous work. In this paper we suggest a metric for measuring the success of a solution, study two models for the problem, and introduce an algorithm that cycles between improving the timetable and the student schedules in order to get a better overall solution than simply treating the two problems separately. In solving the two sub-problems we have explored heuristics that include iterative improvement and bipartite matching as key components. We have conducted experiments on the heuristics presented, both to test the heuristics and to evaluate the models we present. Our experiments indicate that cycling is indeed better than treating the two sub-problems separately, and that one of the models we studied, the CourseGroup model, may produce data similar enough to real data that it can be used by researchers who do not have access to real data. Our code, data, and data generators are all available to provide some basic benchmarking materials for future research.

1 Introduction

Course scheduling has been a much-studied problem, with many variations introduced reflecting the local concerns of various researchers. In this paper, we consider an instance of the Course Scheduling Problem to be the following. We are given K courses, N student schedules listing the courses a student wishes to take, a section bound B on the maximum number of students that can be in a section of any given course, and a bound T on the number of time slots available for scheduling sections of courses. The problem is to build a timetable (a set of triples) that schedules

all sections in no more than T time slots, and to place students into sections of the courses listed in their schedules in a way that maximizes overall student satisfaction with their schedules. This problem encompasses both the traditional timetabling problem of scheduling courses given conflicts based either on student desires or teacher conflicts and the less-studied student sectioning problem of assigning students to sections of courses given a timetable. Considering the problem this way allows us to consider optimality from a student's perspective, an approach not generally taken in previous work.

The course scheduling problem is NP-complete. This can be shown by a reduction from graph coloring, since any non-trivial instance (G(V, E), T) of the coloring problem can be mapped into an instance of the course scheduling problem with |V| courses and a list of |E| student schedules (each schedule selects the 2 courses corresponding to an edge of G). Setting the section bound B = |V|, we can create a timetable of size T with no conflicts if and only if we can color G with T colors.

In this paper we suggest a new metric for measuring solution quality, study two models for generating data to study the problem, and present a method of solving the problem that cycles between building student schedules and a timetable for the courses. Information from the student schedules is used to improve the timetable, and the timetable is then used to improve the student schedules. Our heuristic methods include graph coloring and iterative improvement techniques for building a timetable, and bipartite matching for creation of student schedules. Our results indicate that of the two models studied here, one appears to be a reasonable model for generating data similar to real data and may provide a basis for more theoretical work. Our cyclic method for solving the problem proves to be very successful compared to simply building a timetable without rebuilding student schedules.
The results indicate that treating course scheduling as mainly a graph coloring problem yields poor overall results. We find that using bipartite matching to assign students to schedules after a timetable is built vastly improves the overall solution. Although our method did not find the optimal schedule on our test data, we believe these techniques are a step towards a general method that will provide high quality solutions.

The paper is organized as follows. In Section 2, we define terms and conventions we use in the paper. Section 3 discusses previous work related to this problem. In Section 4 we discuss the models we are exploring to study the problem. Section 5 introduces our method of attacking the course scheduling problem; we break the problem into three parts: initial schedule construction, timetable construction, and schedule construction based on a timetable. This section describes the complexity of each of these parts and the algorithms we have tested. Section 6 describes the experimental work we have done on this problem as it relates to solving the course scheduling problem as a whole, its sub-parts, and what we have learned about the models we are exploring.


2 Terms, Definitions and Conventions

For any list of N student schedules and assignment of students to sections of those courses, the section conflict graph is the graph constructed by considering each section of a course to be a vertex, placing an edge between two vertices if a student has been assigned to both of those sections. The weighted section conflict graph is a section conflict graph with edge weights; the weight of an edge (u, v) is the number of students assigned to both u and v. We will use the term Minimum Coloring to refer to the problem of obtaining a proper coloring on a graph with as few colors as possible. We will use the term T-coloring to refer to the problem of coloring a graph using only T colors while minimizing the number of conflicts according to some metric M. We considered two different metrics for measuring conflicts in T-colorings and in student section assignments.

• Edge Conflicts: this metric corresponds to thinking of the problem as a graph coloring problem. The edge conflict score is the sum of the weights of all edges (u, v) such that u and v have the same color.

• Course Conflicts: this metric is the sum over all students of the number of conflicts each student has (e.g. if a student wants twelve courses and has them scheduled in five time slots, then that student has seven course conflicts).

To see the differences in these metrics, consider the following example listing a partial timetable for several courses, and individual schedules for three students. (In this example, we have only one section of each course; in the general version of the course scheduling problem we must also be concerned with assigning students to sections.)

Course      Time
Physics     1
Band        2
Calculus    3
Welding     4
Geometry    2
Chemistry   3
History     3

Student   Course 1   Course 2   Course 3    Course 4
Sean      Physics    Band       Calculus    Welding
Leslie    Band       Geometry   Chemistry   Calculus
Pat       Band       Calculus   Chemistry   History

The edge conflict score, restricted to these students, would be 5 (a score of two from Leslie and three from Pat). The course conflict score would be 4 (two from both Leslie and Pat). This difference in scores reflects the idea that, in terms of student satisfaction, Leslie and Pat are equally unsatisfied, since each only gets to take two of their desired courses. While edge conflicts are not the same as course conflicts, we have observed empirically that the best solutions as measured by course conflicts generally also have lower edge conflicts.

An advantage to thinking in terms of course conflicts is that the metric makes sense when one considers altering student schedules. The edge conflict metric is oriented towards altering the timetable, since an obvious decrease to the objective function is obtained by removing an edge from the graph and placing one somewhere else. Unless otherwise noted, we will use the course conflict metric whenever we discuss conflicts.

We divide the Course Scheduling Problem into three parts: Initial Sectioning, which is the initial assignment of students to sections of courses; Timetable Construction; and Schedule Construction, which is a reassignment of students to sections based on the current timetable.
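The two metrics can be checked on the three-student example with a short Python sketch. The function names `edge_conflicts` and `course_conflicts` are illustrative, not taken from the paper's code, and the sketch assumes one section per course, as in the example.

```python
from itertools import combinations

# Timetable and student schedules copied from the example above.
timetable = {
    "Physics": 1, "Band": 2, "Calculus": 3, "Welding": 4,
    "Geometry": 2, "Chemistry": 3, "History": 3,
}
schedules = {
    "Sean": ["Physics", "Band", "Calculus", "Welding"],
    "Leslie": ["Band", "Geometry", "Chemistry", "Calculus"],
    "Pat": ["Band", "Calculus", "Chemistry", "History"],
}

def edge_conflicts(timetable, schedules):
    """Count, per student, every pair of their courses that shares a time slot."""
    return sum(
        1
        for courses in schedules.values()
        for u, v in combinations(courses, 2)
        if timetable[u] == timetable[v]
    )

def course_conflicts(timetable, schedules):
    """Count, per student, the courses lost: courses requested minus distinct slots used."""
    return sum(
        len(courses) - len({timetable[c] for c in courses})
        for courses in schedules.values()
    )

print(edge_conflicts(timetable, schedules))    # 5
print(course_conflicts(timetable, schedules))  # 4
```

Summing same-slot pairs per student is equivalent to summing the weights of same-colored edges in the weighted conflict graph, because each edge weight counts the students who share both endpoints.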

3 Related Work

Most previous work related to the course scheduling problem has concentrated on timetable construction with little regard to student satisfaction, concentrating instead on constraints imposed by teachers. After a great deal of work in the 1960s, usually applying various coloring heuristics to the problem [1, 2, 8, 23], the timetabling problem with teacher constraints was shown to be NP-complete in 1976 by Even, Itai and Shamir [9].

More recently, a variety of techniques have been used in timetable construction. We give here a few examples to show the various flavors of approaches. Selim [20] treated the problem as a conflict graph and showed how one could pick out certain vertices in the conflict graph to split, reducing the chromatic number of the graph to the desired value by increasing the number of sections in the timetable. Chahal and de Werra [6] presented an interactive system for timetable construction using network flows. Franklin, Jenkins and Woodson [11] use a linear programming model to solve a timetabling problem in which only single sections of each course are offered. Our heuristics for timetable construction grow out of the graph coloring heuristics but also incorporate iterative improvement methods. One of our interests in this study was to compare the effectiveness of graph coloring versus an iterative improvement algorithm. Our work on schedule construction uses bipartite matching, and thus could be viewed as an extension of the work using flow methods on timetable construction.

There has been less work on schedule construction than on timetable construction. In 1966, Macon and Walker [18] gave a Monte Carlo algorithm for assigning students to desired sections of

courses; the probability of being assigned to the desired section was set as the ratio of the number of seats left in the section to the number of requests remaining for the section. In 1967, Busam [5] gave an algorithm that allowed students to indicate a preferred section, trying this section first if it was available, and otherwise trying in order the sections with the most seats available. Feldman and Golumbic [10] have approached the problem as a constraint satisfaction problem, building an interactive system for students to construct a schedule based on a static timetable. Our work does not currently allow students to indicate section preference. However, our work has the advantage that more students may take their desired courses, since we will modify the timetable to satisfy student needs.

Efforts addressing both timetable and schedule construction are few. In 1990, Hertz [13] presented work using Tabu Search to build a timetable for instructor-oriented constraints with variable length courses and groups of students all taking the same courses. He points out that this solution can easily be extended to students with individual schedules without increasing the size of the problem. Still more constraints can be added to the problem to take into account the problem of dividing students into sections. While our method also uses iterative improvement techniques, we are interested in exploring the use of graph coloring techniques and bipartite matching to build the timetable and schedules; in the future we plan to compare our method against Tabu Search to see if there is a speed and/or quality gain in using these coarser modifications to the schedules over the many small modifications he uses in the Tabu search. In 1992, Tripathy [21] considered both teachers and students, including section assignments.
He breaks the solution into three steps: generation of a conflict matrix describing the courses that should not be concurrently scheduled; sectioning of students; and the generation of a class schedule. His system is interactive, relying on the user to adjust section assignments if the solution found is not feasible. Our work presents heuristics to automate the sectioning process.

Vitányi [22] considered the problem of constructing a T-coloring of a graph with the goal of minimizing edge conflicts. This corresponds to building a timetable with T time slots. He gives a (1 - 1/T) approximation algorithm for the problem (i.e. the number of conflicts is within a factor of 1/T of optimal). Berry, Condon and Halsey [3] have found a (1 - 1/e) approximation algorithm for T-coloring when using the course conflict metric. Both of these algorithms are based on the idea of randomly assigning nodes of the conflict graph to colors (the algorithms are then made deterministic using the method of conditional expectations). Frieze and Jerrum [12] improved on Vitányi's result using semidefinite programming. So far, the results have not extended to T-coloring when the conflicts are counted as course conflicts instead of edge conflicts.

4 Models

Not surprisingly, almost all previous work involved a particular set of data, specific to the authors. This, along with small variations in the problems, has prevented any direct comparison of the

algorithms presented. While we too have acquired data from a typical high school, we have also explored modeling the course scheduling problem. The software to generate random data based on these models is publicly available and we believe it can be used to aid anyone studying the problem. The models may also serve as a starting point for a theoretical attack on the problem.

4.1 CourseGroup Model

The first model, called the CourseGroup model, gives several levels of control over the distribution of the popularity of the courses selected by students. It is designed to reflect two important factors in the selection of courses. The first is that students in different groups, such as grades or tracks of study, will have differing sets of core courses. The core courses of different groups can usually be scheduled at the same time. The core courses are also likely to be the most popular, hence needing the most sections, which complicates the scheduling process. The second important factor is the sets of courses that intersect the interests of various groups. These courses will often be less popular and make scheduling more difficult by preventing the scheduler from considering each student group as an independent, smaller problem. They also often have only one section, reducing the viable options for the student's assignment to sections of core courses. The CourseGroup model captures these factors by first placing students into groups and then creating course groups that reflect the individual groups of core courses and the courses that intersect the interests of several groups of students.

More precisely, the CourseGroup model first specifies the number of students, the number of courses, and the number of courses selected by each student. Next, the model allows specification of student groups. Designation of a student group is done by labeling the group and indicating the percentage of total students in this group. Course groups are specified by listing the student groups that have interest in taking courses in the group; for each student group s listed as being interested in the course group, a probability p_s is given, specifying that as a student in group s is selecting courses, he or she will select a course from this course group with probability p_s. For each course group, the number of courses in the group is also specified, allowing further control over how popular courses within a group will become. (That is, a course could be popular because there are very few courses in its course group, or because many students select courses from its course group.)

A student's schedule is built as follows. While the student's schedule is not complete, a course group is randomly selected based on the course group probabilities relevant to the student's group. From the selected course group, a course is randomly chosen. Duplicate course selection is not allowed.
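The schedule-building loop of the CourseGroup model can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout, the function name `build_schedule`, and the sample group names are hypothetical and do not reflect the format of the paper's data generator. Courses within a selected group are assumed to be chosen uniformly at random.

```python
import random

def build_schedule(student_group, course_groups, size, rng):
    """Draw `size` distinct courses for one student of the given student group.

    course_groups maps a course-group name to
    {"courses": [...], "prob": {student_group: p_s}} (hypothetical layout).
    """
    eligible = [g for g, spec in course_groups.items()
                if spec["prob"].get(student_group, 0) > 0]
    weights = [course_groups[g]["prob"][student_group] for g in eligible]
    reachable = {c for g in eligible for c in course_groups[g]["courses"]}
    if len(reachable) < size:
        raise ValueError("not enough distinct courses for this student group")

    schedule = []
    while len(schedule) < size:
        # Pick a course group with probability proportional to p_s ...
        group = rng.choices(eligible, weights=weights)[0]
        # ... then a course inside it; duplicates are rejected and redrawn.
        course = rng.choice(course_groups[group]["courses"])
        if course not in schedule:
            schedule.append(course)
    return schedule

course_groups = {
    "core9": {"courses": ["Algebra", "English 9", "Biology"], "prob": {"grade9": 0.7}},
    "core10": {"courses": ["Geometry", "English 10"], "prob": {"grade10": 0.7}},
    "electives": {"courses": ["Band", "Welding"], "prob": {"grade9": 0.3, "grade10": 0.3}},
}
print(build_schedule("grade9", course_groups, 4, random.Random(1)))
```

A small course group with a high p_s concentrates many students on few courses, which is the mechanism the model uses to make core courses popular.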

4.2 Uniform Model

Our second model is a simplification of the CourseGroup model. In this model, we remove the ability to specify the size of a course group. There is only one type of group in this model; a number k of groups is specified. This is used to divide the students and courses uniformly (thus with each group we associate some number of students and some number of courses). Crossovers between groups are specified with a k × k crossover matrix; entry (i, j) is the probability that a student in group i will choose a course in group j. We refer to this model as the Uniform model.

While the CourseGroup model is more complicated, our experiments indicate it creates data more closely representing our real data than the Uniform model. However, we emphasize that more work is needed on this and/or other models for both our version of the problem and closely related problems. A flaw in both models is that we cannot construct a set of data for which we know the optimal solution.

5 Methods

Our work on course scheduling methods started from an interest in the problem as an application of graph coloring [17]. In the course of applying graph coloring to the timetable construction aspect of this problem we learned that the subtle details of sectioning students are as important as building the timetable itself. Additionally, as our results below will show, we discovered that graph coloring itself is not the best approach to timetable construction, despite many earlier uses of graph coloring in studying the problem [1, 2, 8, 23].

In general, little work has been done on schedule construction, and none on the initial sectioning of students. Our approach here is to try some simple heuristics for initial sectioning, one of which is very naive and another of which is similar to ideas of Selim on splitting vertices, then construct the timetable, and then do schedule construction based on the timetable. The schedule construction improves the overall solution even though the timetable was built based on the initial sectioning. From this we have learned that the current timetable and current schedules can be used in a cycle to improve the overall course scheduling solution. Below we describe each subproblem as well as the overall algorithm in more detail.

5.1 Initial Sectioning

5.1.1 Complexity Discussion

Given N student schedules listing selected courses, we would like our initial section assignment to create a section conflict graph that leads to a timetable with the fewest possible conflicts. It is not clear a priori how to achieve this goal, given that minimizing the number of conflicts

in the timetable is NP-complete. We have very little information to work with, given that we have no timetable at all when we start. One approach is to simply use a naive assignment with no specific goal other than making sure no more than B students are in each section. Another approach is to section with the goal of minimizing the density of the section conflict graph. In G_{n,p} graphs, lower density graphs have lower chromatic numbers (see, for example, Bollobás and Thomason [4]), and we may hope that this is also the case for section conflict graphs, so that a lower density graph will result in fewer conflicts when using coloring methods to produce a timetable. Unfortunately, section assignment with this goal, which we call the Minimum Density Section Assignment problem, is NP-complete [16]. Below we give a heuristic for this problem.

However, even if a heuristic could minimize the density of the graph, this will not guarantee that the number of colors needed is minimized. The following small example shows a difference of one color from optimal, but can be extended quite simply to give an example where the difference is arbitrarily large. Consider courses A, B, C, D, F, G with section bound 3. Let the student schedules be

Student 1: A B C
Student 2: B C D
Student 3: A B C
Student 4: A F G
Student 5: A C D

The minimum density graph uses eleven edges, placing students 1, 3, and 5 into section one of course A, and student 4 into section two of course A. The resulting graph has a 4-clique. If instead we placed student 4 into section one and student 5 into section two of course A, then the resulting graph would have 12 edges, but would need only 3 colors. This example can be expanded so that for every 11 edges in the lowest density graph we have one extra color.

This counterexample should not deter us from pursuing heuristics based on this approach. For example, the goal of Leighton's RLF graph coloring heuristic [15] is to minimize the density of the remaining uncolored subgraph. Johnson et al. [14] showed that achieving this goal would not yield an optimal coloring. Nonetheless, RLF performs very well in practice.
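The edge counts and color counts in the five-student example above can be verified by brute force. The sketch below rests on one assumption that the text does not state: course C, which has four requests and therefore also needs two sections under a bound of 3, is split so that students 1, 3, and 5 share section C1 and student 2 is alone in C2. With that split the sketch gives 11 and 12 edges, matching the text. The section labels and function names are illustrative.

```python
from itertools import combinations, product

def conflict_graph(assignment):
    """Edges of the section conflict graph: one edge per pair of sections sharing a student."""
    edges = set()
    for sections in assignment.values():
        edges.update(frozenset(pair) for pair in combinations(sections, 2))
    return edges

def chromatic_number(edges):
    """Smallest k admitting a proper coloring, by exhaustive search (tiny graphs only)."""
    vertices = sorted({v for e in edges for v in e})
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, colors))
            if all(color[u] != color[v] for u, v in map(tuple, edges)):
                return k
    return 0

# Students 1, 3, 5 in section A1; student 4 in A2 (minimum density).
min_density = {1: ["A1", "B", "C1"], 2: ["B", "C2", "D"], 3: ["A1", "B", "C1"],
               4: ["A2", "F", "G"], 5: ["A1", "C1", "D"]}
# Students 4 and 5 swapped between the two sections of A.
alternative = {1: ["A1", "B", "C1"], 2: ["B", "C2", "D"], 3: ["A1", "B", "C1"],
               4: ["A1", "F", "G"], 5: ["A2", "C1", "D"]}

for name, assignment in [("min density", min_density), ("alternative", alternative)]:
    edges = conflict_graph(assignment)
    print(name, len(edges), chromatic_number(edges))
```

In the minimum density assignment the sections A1, B, C1, and D are pairwise adjacent, which is the 4-clique mentioned in the text.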

5.1.2 Heuristics in this work

We examine two heuristic approaches to this problem. The first is simply a naive assignment, arbitrarily ordering the students and then assigning students to the first available section. The second is an attempt to minimize the density of the section conflict graph by clustering students with similar schedules. The clustering heuristic attempts to reduce the density of the section conflict graph by

preventing edges between several sections of popular courses; e.g., if course A has two sections, a and a', and course B has sections b and b', the aim of this heuristic is to avoid having edges (a, b), (a, b'), (a', b), (a', b') by grouping the students taking these courses into sections so that instead there will be only edges (a, b) and (a', b'). (This idea is somewhat similar to Selim's [20] vertex splitting.)

The heuristic is as follows (also see the pseudocode for the inner loop below in Figure 1). Let the popularity of course i be the number of students who have chosen this course and have not yet been assigned to a section. Pick the most popular course and group students having this and several other courses in common. Pick B of these students and assign them to the same section of each of these courses. (Details on how this grouping is done are shown in Figure 1.) Repeat until there are no more empty sections in which to assign students, then finish up by sectioning the remaining courses using the naive assignment.

Details on how to cluster students choosing course v:

Let Courses = {v}
Let C = set of courses taken by all students who also took v
for each course u in C
    Let weight(v,u) = number of students requesting both
    Let weight(u) = popularity(u) * weight(v,u)
Let L[1..|C|] = C ordered by decreasing weight(u)
Let max = largest index such that the number of students requesting L[1..max]
          is at least Threshold * B
          (Threshold is a percentage set by the user.)
Let S' = up to B students requesting courses L[1..max]
for i = 1 to max do
    If an empty section of L[i] is available,
        assign all students in S' to this section
endfor

Figure 1: Clustering students that choose course v
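One way to read Figure 1 as executable code is the Python sketch below. It covers only the selection of the course prefix L[1..max] and the student set S'; the actual assignment into empty sections is left out. The function name `cluster_step`, the data layout, and the alphabetical tie-breaking are assumptions made for the illustration. Threshold is expressed as a fraction rather than a percentage.

```python
from collections import Counter

def cluster_step(v, requests, unassigned, B, threshold):
    """Choose courses to cluster with v and up to B students to place together.

    requests:   student -> set of requested courses
    unassigned: course -> set of students not yet placed in a section of it
    Returns (courses, students); courses always starts with v.
    """
    takers = unassigned[v]
    # weight(v,u): students still needing both v and u.
    both = Counter(u for s in takers for u in requests[s]
                   if u != v and s in unassigned[u])
    # weight(u) = popularity(u) * weight(v,u), popularity = unassigned requests.
    weight = {u: len(unassigned[u]) * n for u, n in both.items()}
    ordered = sorted(weight, key=lambda u: (-weight[u], u))

    # Extend the prefix while enough students still request every course in it.
    # The common-student count never grows, so the first failure marks max.
    common, chosen = set(takers), []
    for u in ordered:
        narrowed = {s for s in common if s in unassigned[u]}
        if len(narrowed) < threshold * B:
            break
        common, chosen = narrowed, chosen + [u]
    return [v] + chosen, sorted(common)[:B]

requests = {"s1": {"A", "B", "C"}, "s2": {"A", "B", "C"}, "s3": {"A", "B", "C"},
            "s4": {"A", "B", "C"}, "s5": {"A", "D"}}
unassigned = {c: {s for s, r in requests.items() if c in r} for c in "ABCD"}
print(cluster_step("A", requests, unassigned, B=3, threshold=1.0))
# (['A', 'B', 'C'], ['s1', 's2', 's3'])
```

Here course D is dropped from the cluster because no student requests A, B, C, and D together, so only three of the four students sharing A, B, and C are grouped into one section of each.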


5.2 Timetable Construction

5.2.1 Complexity Discussion

After sections have been assigned, a timetable can be considered to be a T-coloring of the weighted section conflict graph. Timetable construction is often mentioned as an application for minimum graph coloring; however, if the minimum coloring results in a graph colored with more than T colors, the timetable must be constructed by reducing the number of colors to T. Unfortunately, reducing a K-colored graph to a coloring using only T colors in a way that minimizes conflicts is NP-complete (the Max Cut problem trivially reduces to this problem), so we must use yet another heuristic strategy to remove the extra colors.

5.2.2 Heuristics in this work

We explore both a T-coloring method and a minimum coloring approach to building the timetable. In order to compare a minimum coloring approach to timetabling with a T-coloring approach, we color the section conflict graph using each approach. The minimum coloring approach we use is the Hybrid coloring heuristic, a parallel graph coloring heuristic of Lewandowski and Condon that combines branch and bound and simulated annealing techniques [17]. If the coloring uses more than T colors, the extra colors are folded back onto the first T colors, placing color class i, i > T, into color class i mod T.

We also use a T-coloring heuristic, which we call conflict coloring; it is an iterative improvement algorithm, similar to the Fixed K algorithm used in Johnson et al. [14]. We use only T colors and attempt to find a coloring of the section conflict graph that minimizes the total number of course conflicts. The heuristic begins by quickly obtaining some T-coloring of the graph. (Currently we color the graph with RLF, then fold down to T colors.) Given a T-coloring, the main loop of the heuristic runs I times or until the conflict score becomes zero. Each iteration of the loop consists of selecting a vertex (section) and a new color (time) it should be assigned. The selection is done as follows. A random (vertex, new color) pair is examined. If the conflict score would decrease as a result of this re-coloring of the vertex, the pair is selected. Otherwise, the pair is selected with a probability based on the amount by which the score would worsen. Possible pairs are examined until some pair is selected.

The algorithm can be adapted to concentrate on minimizing edge conflicts instead of course conflicts. We spent some time on initial experiments to compare the metrics. Our experiments indicate that the final number of course conflicts in the resulting timetable is nearly the same whether course conflicts or edge conflicts are minimized during the actual coloring.
In fact, the results are similar even if one uses one of the metrics to choose a move and the other metric when deciding if the best result should be updated.
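The folding step and the conflict coloring loop described in this section might be sketched in Python as follows. Several details are assumptions, since the text does not give them: colors are numbered from 0 so that `c % T` is always a valid color, the acceptance probability for a non-improving move is taken to be exp(-delta / temperature) with a fixed temperature, and the score is recomputed from scratch for clarity rather than updated incrementally. All names are illustrative.

```python
import math
import random

def fold(coloring, T):
    """Fold color classes beyond the first T back onto them (colors numbered from 0)."""
    return {v: c % T for v, c in coloring.items()}

def course_conflicts(coloring, students):
    """students is a list of section lists; a student loses one course per repeated slot."""
    return sum(len(secs) - len({coloring[v] for v in secs}) for secs in students)

def conflict_coloring(coloring, students, T, iterations, temperature=1.0, seed=0):
    """Iterative improvement on a T-coloring; returns the best coloring seen and its score."""
    if T < 2:  # no alternative color exists, so nothing can be changed
        return dict(coloring), course_conflicts(coloring, students)
    rng = random.Random(seed)
    current = dict(coloring)
    score = course_conflicts(current, students)
    best, best_score = dict(current), score
    vertices = sorted(current)
    for _ in range(iterations):
        if score == 0:
            break
        while True:  # examine random (vertex, new color) pairs until one is selected
            v, new = rng.choice(vertices), rng.randrange(T)
            old = current[v]
            if new == old:
                continue
            current[v] = new
            delta = course_conflicts(current, students) - score
            # Improving moves are always taken; others with an assumed probability.
            if delta < 0 or rng.random() < math.exp(-delta / temperature):
                score += delta
                break
            current[v] = old  # rejected: undo and try another pair
        if score < best_score:
            best, best_score = dict(current), score
    return best, best_score

students = [["a", "b", "c"], ["b", "c", "d"], ["a", "d"]]
start = fold({"a": 0, "b": 4, "c": 8, "d": 12}, T=4)  # every section lands in slot 0
print(course_conflicts(start, students))              # 5
print(conflict_coloring(start, students, T=4, iterations=200, seed=1)[1])
```

Keeping the best coloring separately from the current one lets the search accept worsening moves without losing the best timetable found so far.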

5.3 Schedule Construction

5.3.1 Complexity Discussion

After a timetable has been constructed, we may be able to decrease the number of conflicts by rebuilding the student schedules that were created during the initial sectioning. If we remove the restriction on the number of students assigned to a course at a given time, then there is no limit on the number of students in a section and the problem for a given student reduces to bipartite matching: the courses a student wishes to take make up one side of the bipartite graph, and the possible time slots of all courses make up the other side. Edges are placed between a course and each time slot in which it could be scheduled. The best matching will yield the best possible schedule for the student. Unfortunately, when the number of students allowed into a section is bounded, this problem is NP-complete [16].

5.3.2 Heuristics in this work

We present two heuristics for constructing schedules given a timetable. While neither can guarantee an optimal revision, we do note that results from both of these heuristics may improve with repetition.

The first heuristic works on the observation that if there are conflicts between times of some of the sections selected by a student, then there will be time slots unused by that student. If the student can move from a section in conflict to a section at an unused time, the conflicts are reduced. The algorithm proceeds by processing the students in some order (currently this is the initial order of the data). For each student with conflicts, the section assignments are examined in order of the number of conflicts they have with other section assignments. For each section in conflict, a list of possible new sections is generated. From this list (if non-empty), some section is used to replace the current assignment. Currently the new section used is the least popular section. Sections become unavailable when they have B students in them. We call this heuristic simple.

The second heuristic uses the fact that matching works when the section size is unbounded. Again examining the students in some order, the bipartite graph is constructed with currently available sections (i.e. those currently having fewer than B students assigned to them, or the section to which the student is currently assigned) and the matching is done. This heuristic will be referred to as matching.
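The per-student matching step can be illustrated with a standard augmenting-path bipartite matching. The Python sketch below is not the paper's implementation; it assumes the caller has already filtered each course's sections down to those currently available to the student, and the function name `best_schedule` and the section labels are hypothetical.

```python
def best_schedule(options):
    """Match courses to distinct time slots for one student.

    options: course -> list of (section, slot) pairs open to this student.
    Returns course -> (section, slot) for every course that could be placed.
    """
    slot_owner = {}  # slot -> (course, section) currently occupying it

    def try_place(course, seen):
        for section, slot in options[course]:
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if slot not in slot_owner or try_place(slot_owner[slot][0], seen):
                slot_owner[slot] = (course, section)
                return True
        return False

    for course in options:
        try_place(course, set())
    return {course: (section, slot)
            for slot, (course, section) in slot_owner.items()}

# Leslie's courses, with hypothetical second sections of Geometry and Calculus.
options = {
    "Geometry": [("Geometry-1", 2), ("Geometry-2", 5)],
    "Band": [("Band-1", 2)],
    "Chemistry": [("Chemistry-1", 3)],
    "Calculus": [("Calculus-1", 3), ("Calculus-2", 4)],
}
print(best_schedule(options))
```

In this example Geometry first takes slot 2, is then moved to slot 5 so that Band can use slot 2, and all four courses end up in distinct slots.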


5.4 The CYCLE Course Scheduling Heuristic

In this section we present the notation for the algorithm that combines the heuristic strategies used on the subproblems to solve the course scheduling problem. We refer to this algorithm as Cycle. Cycle has parameters specifying the heuristics for the initial sectioning, timetable construction, and schedule construction. In addition to the above parameters, Cycle also takes a cycling parameter. After building the initial section assignment, Cycle cycles between the timetable and schedule constructors, with the goal of using the revised student schedules to improve the timetable construction, which in turn will be used in building new student schedules. Figure 2 gives the pseudocode for the Cycle algorithm.

Procedure Cycle(IS, TT, SC, C)
/* IS is the Section Assignment heuristic         */
/* TT is the Timetable Construction heuristic     */
/* SC is the Schedule Construction heuristic      */
/* C is the number of (TT,SC) cycles to run       */
/* Global: N: the list of student schedules,
           B: the section bound,
           T: the number of time slots            */
Begin
    Let N' = IS(N,B)                       /* the initial section assignments */
    for i = 1 to C do
        Let timetable = TT(N',T)           /* build/revise timetable */
        Let N' = SC(timetable,N',N,B,T)    /* schedule construction */
    endfor
end

Figure 2: Outline of Cycle
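Figure 2 translates almost directly into a higher-order function. In the Python sketch below the three heuristics are passed in as callables with the argument orders used in the pseudocode; the name `cycle`, the explicit passing of N, B, and T (globals in Figure 2), and the returned pair are choices made for this illustration.

```python
def cycle(initial_sectioning, timetabler, scheduler, cycles, schedules, B, T):
    """Alternate timetable construction and schedule construction `cycles` times.

    initial_sectioning(schedules, B)              -> section assignment N'
    timetabler(assignment, T)                     -> timetable
    scheduler(timetable, assignment, schedules, B, T) -> revised assignment N'
    """
    assignment = initial_sectioning(schedules, B)   # initial section assignments
    timetable = None
    for _ in range(cycles):
        timetable = timetabler(assignment, T)       # build/revise timetable
        assignment = scheduler(timetable, assignment, schedules, B, T)
    return timetable, assignment
```

Because each heuristic is an ordinary argument, a parameter mix such as naive sectioning with matching-based schedule construction amounts to passing a different pair of functions.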

6 Experimental work on the Course Scheduling Problem

6.1 Experimental Method

We have run experiments to study the heuristics for each sub-problem in the course scheduling problem, the Cycle heuristic, and the models we have built. The data we use comes from three

sources. Course schedules were collected from a midwestern high school of around 500 students. This data is guaranteed to have some section assignment for which the students can be assigned to sections and scheduled without conflict into 14 time slots. We also generated data using both the CourseGroup and Uniform models. Parameters for the models were chosen with the aim of getting data that is comparable to the real data in terms of number of courses, students, and schedule sizes. The exact specification files used for each model can be found in Appendix 7.

The experiments use three sets of schedules: real data (school-nsh) and one set from each of the CourseGroup model (CourseGroup) and the Uniform model (Uniform), using the specifications given in Appendix 7. Four other sets of schedule data were built by generating the model graphs using the same specification files with two different random seeds. The results were similar on each class of model data, so we concentrate on the data described above. For each set of schedules, we ran the heuristics with the section bound set at 25 and at 30. For the real data we also ran the heuristics with additional data indicating the number of sections available for the course in the actual school timetable. Thus we have a total of seven sets of data.

We ran Cycle with nearly every possible combination of parameters to solve the entire course scheduling problem. The only heuristic we did not try with all other combinations was the Hybrid coloring heuristic. The comparison between Hybrid and conflict coloring, as described below, convinced us there would be no point to using it. The number of time slots (T) was always 14. Preliminary trials were used to determine that a cycling parameter of C = 3 was the largest showing improvement in results. The number of iterations of the conflict coloring heuristic is 1000 when more than one cycle is used, and 3000 when only one cycle is used.
We empirically observed that improvement slows down significantly beyond 3000 iterations and so used this as the base number of iterations in conflict coloring. The parameter settings for the experiments are summarized in Table 1. For each setting and data set, Cycle was run ten times, using ten different random seeds. The random seeds were generated to give as independent a stream of random numbers as possible.

In our experiments, the very best results were achieved using Cycle(Clustering, Conflict Coloring, Matching, 3, 1000). We do not feel confident, however, in declaring Clustering obviously superior to naive, as we discuss below. We found that the feedback mechanism in Cycle definitely contributes to its success. In most cases, Cycle with C = 1 and I = 3000 could not match the results when C = 3 and I = 1000. Below we discuss the results of each sub-part in more detail. The results are summarized in Figures 3, 4, and 5 and Tables 5, 6, and 7. The figures in particular give one an easy way to see how the parameters relate to the number of conflicts for each data set.


Id code   Initial Sectioning   Schedule Construction   Cycles   Iterations
          heuristic            heuristic
1         Naive                Simple                  1        3000
2         Naive                Matching                1        3000
3         Clustering           Simple                  1        3000
4         Clustering           Matching                1        3000
6         Naive                Simple                  3        1000
7         Naive                Matching                3        1000
8         Clustering           Simple                  3        1000
9         Clustering           Matching                3        1000

Table 1: Id code for parameter mix of Cycle(IS, Conflict Coloring, SC, C, I)
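The parameter mixes of Table 1 can also be written as a small lookup table, which makes the iteration budget explicit: every setting spends the same total of 3000 conflict coloring iterations, either in one cycle or split across three. The Python sketch below is illustrative only; the names `CycleSetting`, `SETTINGS`, and `total_iterations` are hypothetical and are not taken from the authors' code.

```python
from typing import NamedTuple


class CycleSetting(NamedTuple):
    initial_sectioning: str     # IS: "naive" or "clustering"
    schedule_construction: str  # SC: "simple" or "matching"
    cycles: int                 # C
    iterations: int             # I, conflict coloring iterations per cycle


# Id codes exactly as listed in Table 1 (the table has no id code 5).
SETTINGS = {
    1: CycleSetting("naive", "simple", 1, 3000),
    2: CycleSetting("naive", "matching", 1, 3000),
    3: CycleSetting("clustering", "simple", 1, 3000),
    4: CycleSetting("clustering", "matching", 1, 3000),
    6: CycleSetting("naive", "simple", 3, 1000),
    7: CycleSetting("naive", "matching", 3, 1000),
    8: CycleSetting("clustering", "simple", 3, 1000),
    9: CycleSetting("clustering", "matching", 3, 1000),
}


def total_iterations(setting: CycleSetting) -> int:
    """Total conflict coloring iterations spent over all cycles."""
    return setting.cycles * setting.iterations


# Seven data sets and ten random seeds per setting, as described in the text.
DATA_SETS = 7
SEEDS_PER_SETTING = 10
total_runs = len(SETTINGS) * DATA_SETS * SEEDS_PER_SETTING

for id_code, setting in SETTINGS.items():
    print(id_code, setting, total_iterations(setting))
print("runs of Cycle:", total_runs)
```

Holding the total fixed means that a difference between, say, settings 4 and 9 reflects the feedback between cycles rather than extra coloring work.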

6.2 Initial Sectioning

In our experiments, the median number of conflicts using clustering was lower than the median number of conflicts using naive in all cases except Uniform with a section bound of 30. However, it is hard to proclaim clustering clearly superior since the medians were often very close, and in several cases the highest number of conflicts in the clustering runs was higher than the maximum in the naive runs. Figures 3, 4, and 5 show the data for all runs at each parameter setting and give a good visual indication of how similar naive and clustering are. The comparison of quality between clustering and naive is best seen by comparing parameter settings 1 and 3, 2 and 4, 6 and 8, and 7 and 9. [...] starting from a naive assignment.

Intriguingly, although the aim of the clustering heuristic is to minimize the density of the section conflict graph, the naive heuristic generally produces section conflict graphs with lower density than clustering. In fact, comparing the densities resulting from these two heuristics with the density of the graph produced by Cycle(Clustering, Conflict Coloring, Matching, 3, 1000) (see Table 2) reveals that our best scheduling results correspond to a graph of higher density. These observations indicate that the course scheduling problem is quite different from coloring on (random) G_{n,p} graphs, and indicate that a key component of the solution is adding structure to the data.
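Density here is the fraction of possible edges that are present in the section conflict graph. As a minimal sketch, assuming that two sections are joined by an edge whenever at least one student is assigned to both of them, the density can be computed directly from a section assignment. The function name and the input layout are illustrative assumptions, not the paper's code.

```python
from itertools import combinations


def conflict_graph_density(student_sections, num_sections):
    """Density of the section conflict graph.

    student_sections: one set of section ids per student (assumed layout).
    An edge joins two sections that share at least one student.
    """
    edges = set()
    for sections in student_sections:
        for a, b in combinations(sorted(sections), 2):
            edges.add((a, b))
    possible = num_sections * (num_sections - 1) // 2
    return len(edges) / possible if possible else 0.0


# Three students spread over five sections; section 4 has no students.
example = [{0, 1, 2}, {0, 1, 3}, {2, 3}]
print(conflict_graph_density(example, num_sections=5))
```

In this toy assignment six of the ten possible section pairs share a student, so the density is 0.6.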

6.3 Timetable Construction

Our experimental results indicate that folding a minimum coloring gives results much worse than conflict coloring. Using conflict coloring on the same data gives results much better than Hybrid (see Table 3). In fact, the results presented for conflict coloring took about seventeen minutes (10,000 iterations) on a PC equipped with an Intel 486/100 MHz processor, while the results for Hybrid ran as long as three hours (or 10,000 iterations of its iterative improvement algorithm) on a CM-5. This suggests that while timetabling has often been listed as an application of graph coloring, traditional minimum coloring is not the best approach.

Table 2: Density of conflict graph after sectioning data using naive, clustering, and Cycle(Clustering, Conflict Coloring, Matching, 3, 1000).

Graph                 naive   cluster   Cycle
CourseGroup, sbd 25   0.22    0.27      0.30
CourseGroup, sbd 30   0.26    0.30      0.34
Uniform, sbd 25       0.25    0.29      0.32
Uniform, sbd 30       0.30    0.32      0.35
school, sbd 25        0.19    0.20      0.25
school, sbd 30        0.21    0.21      0.27
school, sbd input     0.22    0.22      0.24

Table 3: Comparison of course conflicts when using Hybrid and Conflict Coloring on schedules with section assignments built by Cycle(Clustering, Conflict Coloring, Matching, 3, 1000). * means Hybrid did not finish after running for three hours on the graph.

Graph                 hybrid   conflict
CourseGroup, sbd 25   1174     920
CourseGroup, sbd 30   1265     916
Uniform, sbd 25       1361     1029
Uniform, sbd 30       1392     1027
school, sbd 25        1034     513
school, sbd 30        *        509
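The contrast drawn here is between minimizing the number of colors and minimizing conflicts for a fixed number of colors T. The sketch below illustrates the second objective with a generic iterative improvement loop that repeatedly moves one vertex to the color carrying the least conflict weight among its neighbors. It is a stand-in under stated assumptions, not the Conflict Coloring heuristic defined earlier in the paper; `improve_coloring`, `total_conflicts`, and the weighted-edge input format are hypothetical.

```python
import random


def total_conflicts(weights, color):
    """Sum the weights of edges whose endpoints share a color (time slot)."""
    return sum(w for (u, v), w in weights.items() if color[u] == color[v])


def improve_coloring(num_vertices, weights, num_colors, iterations, seed=0):
    """Random start, then single-vertex moves that strictly reduce conflicts."""
    rng = random.Random(seed)
    neighbors = {v: [] for v in range(num_vertices)}
    for (u, v), w in weights.items():
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    color = [rng.randrange(num_colors) for _ in range(num_vertices)]
    for _ in range(iterations):
        v = rng.randrange(num_vertices)
        # Conflict weight that v would incur in each color.
        cost = [0] * num_colors
        for u, w in neighbors[v]:
            cost[color[u]] += w
        best = min(range(num_colors), key=cost.__getitem__)
        if cost[best] < cost[color[v]]:
            color[v] = best
    return color


# Edge weights stand for the number of students shared by two sections.
weights = {(0, 1): 3, (1, 2): 1, (0, 2): 2, (2, 3): 4}
start = improve_coloring(4, weights, num_colors=2, iterations=0)
final = improve_coloring(4, weights, num_colors=2, iterations=200)
print(total_conflicts(weights, start), "->", total_conflicts(weights, final))
```

Because a move is accepted only when it strictly lowers the conflict weight at the chosen vertex, the total never increases, but the loop can stop in a local minimum.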

6.4 Schedule Construction

Our experiments indicate that the matching heuristic works much better than the simple heuristic, though it does not yield an optimal solution. Figures 3, 4, and 5 dramatically show the difference in quality. Table 4 compares the best result over all runs using only the simple heuristic with the worst result over all runs using the matching heuristic. For the simple heuristic, the best result was almost always achieved using clustering as the initial sectioning heuristic and only doing one cycle of 3000 iterations (the lone exception was school-nsh with section bound 25, where the best three-cycle run was better). The worst for matching varied a bit more, though generally the single-cycle run was among the worst. Even in this lopsided comparison, the ratio of the worst matching score to the best simple score was never worse than 0.85.

Table 4: Comparison of simple and matching heuristics. Best simple results vs. worst matching results.

Graph                 best simple   worst matching   matching/simple
CourseGroup, sbd 25   725           433              0.60
CourseGroup, sbd 30   725           492              0.68
Uniform, sbd 25       843           536              0.64
Uniform, sbd 30       852           596              0.70
school, sbd 25        363           259              0.71
school, sbd 30        337           285              0.85
school, input         397           211              0.53

Figure 3: Summary of experiments on coursegroup data set. Y-axis is the number of course conflicts, X-axis is the parameter setting id number. Plotted series: coursegroup_25, coursegroup_30.
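To see why matching helps, consider a single student once the timetable is fixed: each requested course is offered in some set of time slots, and a conflict-free schedule is a matching that pairs every course with a distinct slot. The sketch below uses the standard augmenting-path method for maximum bipartite matching. It is an illustration only: it ignores section capacities, and `match_courses_to_slots` and its input layout are hypothetical rather than the paper's Matching heuristic.

```python
def match_courses_to_slots(course_slots):
    """Maximum matching of one student's courses to distinct time slots.

    course_slots: dict mapping a course to the time slots in which some
    section of that course is offered (assumed input layout).
    Returns a dict course -> slot for the courses that could be placed.
    """
    slot_owner = {}  # slot -> course currently holding it

    def try_assign(course, visited):
        for slot in course_slots[course]:
            if slot in visited:
                continue
            visited.add(slot)
            # Take a free slot, or push its current owner to another slot.
            if slot not in slot_owner or try_assign(slot_owner[slot], visited):
                slot_owner[slot] = course
                return True
        return False

    for course in course_slots:
        try_assign(course, set())
    return {course: slot for slot, course in slot_owner.items()}


# Taking the first free slot for "math" (slot 1) would block "art";
# the augmenting path moves "math" to slot 2 instead.
request = {"math": [1, 2], "art": [1], "gym": [2, 3]}
schedule = match_courses_to_slots(request)
print(schedule)
```

A simple first-fit assignment would leave "art" in conflict here, while the matching places all three courses.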

6.5 Discussion of Models for Course Scheduling

The experimental results give some insight into how useful model graphs can be in studying the course scheduling problem and some of the important factors in the model.

Comparing the distribution of course popularity (using a histogram noting the number of courses with popularity in a given range, Figure 6) among the two models and the real data, we see that the CourseGroup model schedules are reasonably similar to the real data, while the Uniform model has a density distribution that is much more uniform, despite our efforts to fit probabilities into the crossover matrix similar to that of the real data. The key difference appears to be the ability to affect the size of a course group. Although both the CourseGroup model and the Uniform model use 8 course groups, the number of courses available in each group of the Uniform model is large enough that no single course is selected enough times to push the number of sections needed beyond 3 or 4 (while the real data has several courses needing six). The relatively even distribution of course popularity may also be the factor making the Uniform model schedules much harder to section and schedule. For all heuristics, the conflicts in the Uniform model are significantly higher than the best results of the CourseGroup model and the real school data. The CourseGroup model section conflict graph always had more conflicts than the real data, but it was closer in score than the Uniform model.

While within a particular class of data a lower-density section conflict graph did not correspond to a better solution, the correspondence does hold when comparing the models and the real data. Table 8 shows the density and largest clique size found (using the dfmax algorithm of Carraghan and Pardalos [7]) in the section conflict graphs formed from the section assignment given by Cycle(Clustering, Conflict Coloring, Matching, 3, 1000). Both the clique size and density correspond to the solution quality. Given that the CourseGroup data corresponds to the real data much better than the Uniform model data, our conclusion at this time is that the more elaborate nature of the CourseGroup model is necessary to control the factors simulating the real data and that, while the CourseGroup model is not perfect, it seems to give reasonable data for modeling the course scheduling problem.

Section   Parameter   Min         Median      Max
bound     setting     Conflicts   Conflicts   Conflicts
25        1           777         810.5       825
25        2           372         408.5       433
25        3           725         772         804
25        4           274         312         354
25        6           758         783         812
25        7           164         255.5       299
25        8           733         757         804
25        9           137         213.5       307
30        1           793         826         846
30        2           426         469         492
30        3           731         764.5       797
30        4           357         381.5       468
30        6           765         791         802
30        7           214         304         324
30        8           725         743.5       772
30        9           235         308         352

Table 5: Summary of Results on CourseGroup data; 10 runs of each parameter setting. Maximum number of conflicts is 5952.

Section   Parameter   Min         Median      Max
bound     setting     Conflicts   Conflicts   Conflicts
25        1           898         919.5       934
25        2           444         474.5       505
25        3           843         881.5       906
25        4           410         450.2       536
25        6           879         913.5       949
25        7           253         334         413
25        8           849         882         945
25        9           339         387         445
30        1           896         940         955
30        2           513         539.5       591
30        3           852         889         914
30        4           540         574         596
30        6           889         939.5       969
30        7           378         441.5       498
30        8           856         877.5       897
30        9           431         476.5       568

Table 6: Summary of Results on Uniform data; 10 runs of each parameter setting. Maximum number of conflicts is 5952.
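The link between course popularity and the number of sections can be made concrete. Assuming the number of sections of a course is the smallest number that keeps every section within the bound B, a course chosen by p students needs ceil(p / B) sections. The histogram helper follows the description of Figure 6, grouping popularity into ranges of 10 keyed by the maximum of each range. Both function names are hypothetical, and the popularity values in the example are made up for illustration rather than taken from the data sets.

```python
import math
from collections import Counter


def sections_needed(popularity, section_bound):
    """Smallest number of sections that keeps each section within the bound."""
    return math.ceil(popularity / section_bound)


def popularity_histogram(popularities, width=10):
    """Count courses per popularity range, keyed by the range maximum.

    Popularity 1..10 falls in bin 10, 11..20 in bin 20, and so on.
    """
    histogram = Counter()
    for p in popularities:
        top = max(1, -(-p // width)) * width  # ceiling division on integers
        histogram[top] += 1
    return dict(sorted(histogram.items()))


print(sections_needed(150, 25))
print(popularity_histogram([4, 10, 11, 25, 150]))
```

Under this reading, a course needs six sections at B = 25 only when between 126 and 150 students select it, which is why a flat popularity distribution rarely produces such courses.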

6.6 Conclusions

• Building the timetable in conjunction with assigning students to sections results in fewer conflicts. While we did not get a perfect schedule for the real data, we came close, with most students having no conflicts and very few having more than one.

• Initial section assignment is much harder than revising an assignment from a timetable. Finding a way to apply matching techniques to this problem may provide much better results. We are currently exploring this idea, in fact partially creating an initial timetable while doing the initial section assignment.

• Conflict Coloring is a simple yet powerful timetabling heuristic; it may perform better if the measure of conflicts in the heuristic is modified to measure course conflicts.

• Bipartite matching is a very powerful tool in section assignment and should be exploited wherever possible.

• Having a section bound and determining the number of sections of each course strictly based on this value may be an inaccurate way of scheduling courses. The real data with section counts input separately tended to yield much better results than the other data sets. One of the major reasons for this is that the number of sections was higher than the number obtained by strictly dividing the number of students in a course by the section bound. It is not hard to add flexibility to the heuristics to allow for smaller classes by setting a minimum class size along with the upper bound. The minimum can be used to compute the total number of sections used.

Figure 4: Summary of experiments on uniform data set. Y-axis is the number of course conflicts, X-axis is the parameter setting id number. Plotted series: uniform_25, uniform_30.
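The last point can be sketched in code. The text does not say exactly how the minimum class size would determine the section count, so the rule below is only one possible reading: open as many sections as the minimum size allows, but never fewer than the upper bound requires. The names `section_count` and `balanced_sizes` are hypothetical.

```python
import math


def section_count(enrollment, max_size, min_size):
    """Number of sections under an assumed minimum/maximum class size rule."""
    if enrollment <= 0:
        return 0
    required = math.ceil(enrollment / max_size)  # forced by the section bound
    allowed = enrollment // min_size             # most sections with >= min_size each
    return max(required, allowed)


def balanced_sizes(enrollment, sections):
    """Spread students as evenly as possible across the sections."""
    base, extra = divmod(enrollment, sections)
    return [base + 1] * extra + [base] * (sections - extra)


# 100 students, bound 30: the bound alone gives 4 sections,
# while a minimum class size of 20 allows 5.
print(section_count(100, max_size=30, min_size=20))
print(balanced_sizes(70, section_count(70, max_size=30, min_size=20)))
```

With a minimum equal to the bound, the rule falls back to the strict division by the section bound used in the experiments.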

7 Future Work

Our experiments indicate that our cycled approach to scheduling is promising, though not yet ideal since we cannot yet find an optimal schedule. We are currently working on an approach that constructs the student schedules simultaneously with the initial timetable, since part of our problem now may be that none of our initial sectioning heuristics seems impressive. In terms of models, our CourseGroup model seems to generate data that has many characteristics of our real data, but a flaw in the model is that we cannot know if it has an optimal solution. This, of course, hinders its usefulness as a research tool. Our continuing work in this area concentrates on maintaining the structure of this data, while ensuring an optimal solution. In general, we hope that this paper spurs continued work on this application, particularly in the problem of both building a timetable and assigning students to courses.

Figure 5: Summary of experiments on school-nsh data set. Y-axis is the number of course conflicts, X-axis is the parameter setting id number. Plotted series: school_25, school_30, school_input.

Figure 6: Histogram of number of courses with a given range of popularity. The X-axis groups popularity of courses into ranges of 10, with the numbers on the axis indicating the maximum value of the range. The Y-axis indicates the number of courses having popularity in that range. Plotted series: courses-school, courses-UG, courses-CG.

References

[1] M. Almond. An algorithm for constructing university timetables, Computer Journal, 8, 1966, 331-340.
[2] J.S. Appleby, D.V. Blake and E.A. Newman. Techniques for producing school timetables on a computer and their application to other scheduling problems, Computer Journal, 3, 1961, 237-245.
[3] C. Berry, A. Condon, B. Halsey. Best-schedule, manuscript, 1995.
[4] B. Bollobas and A. Thomason. Random graphs of small order, Ann. Discrete Math., 28, 1985, 47-97.
[5] V. Busam. An algorithm for class scheduling with section preference, Communications of the ACM, 10, 9, 1967, 567-569.
[6] N. Chahal and D. de Werra. An interactive system for constructing timetables on a PC, European Journal of Operational Research, 40, 1989, 32-37.
[7] R. Carraghan and P.M. Pardalos. An exact algorithm for the maximum clique problem, Operations Research Letters, 9, 1990, 375-382.
[8] A.K. Duncan. Further results on a computer construction of school timetables, Communications of the ACM, 8, 1965, 72.
[9] S. Even, A. Itai, and A. Shamir. On the complexity of timetable and multicommodity flow problems, SIAM Journal on Computing, 5, 1976, 691-703.
[10] R. Feldman and M.C. Golumbic. Optimization algorithms for student scheduling via constraint satisfiability, The Computer Journal, 33, August 1990, 356-364.
[11] T. Franklin, E. Jenkins, and K. Woodson. A case study in scheduling courses, The UMAP Journal, 15.2, 1995, 115-122.
[12] A. Frieze and M. Jerrum. Improved approximation algorithms for max k-cut and max bisection, Integer Programming and Combinatorial Optimization, 1995.
[13] A. Hertz. Finding a feasible course schedule using tabu search, Discrete Applied Mathematics, 35, 1992, 255-270.
[14] D.S. Johnson, C.R. Aragon, L.A. McGeoch and C. Schevon. Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning, Operations Research, 39, 3, 1991, 378-406.
[15] F.T. Leighton. A graph coloring algorithm for large scheduling problems, Journal of Research of the National Bureau of Standards, 84, 1979, 489-506.
[16] G. Lewandowski. Practical implementations and applications of graph coloring, PhD thesis, University of Wisconsin-Madison, Department of Computer Sciences, 1994.
[17] G. Lewandowski and A. Condon. Experiments with parallel graph coloring heuristics and applications of graph coloring, Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, David S. Johnson and Michael A. Trick (eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science. To appear.
[18] N. Macon and E.W. Walker. A Monte Carlo algorithm for assigning students to classes, Communications of the ACM, 9, 5, 1966, 339-340.
[19] C. Morgenstern. Distributed coloration neighborhood search, Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, David S. Johnson and Michael A. Trick (eds.), DIMACS Series in Discrete Mathematics and Theoretical Computer Science. To appear.
[20] S.M. Selim. Split vertices in vertex colouring and their application in developing a solution to the faculty timetable problem, The Computer Journal, 31, 1, 1988, 76-82.
[21] A. Tripathy. Computerised decision aid for timetabling - a case analysis, Discrete Applied Mathematics, 35, 1992, 313-323.
[22] P.M.B. Vitanyi. How well can a graph be n-colored?, Discrete Mathematics, 34, 1981, 69-80.
[23] D.C. Wood. A technique for coloring a graph applicable to large scale time-tabling problems, Computer Journal, 12, 1969, 317-319.

A Specification for CourseGroup model

# A model class graph specification.

# NumCourses is total number of courses available
NumCourses 158;
# NumStudents is the total number of students who will be
# making selections
NumStudents 496;
NumChoices 12;
# group id, percentage of students in that group
# (e.g., a group could be a grade)
# The group id must be an integer.
StudentGroup 9 0.25;
StudentGroup 10 0.25;
StudentGroup 11 0.25;
StudentGroup 12 0.25;
# list of groups in the course group in pairs
# (group_id, prob of picking from this course group)
# followed by the percentage of total courses found
# in this group.
CourseGroup 9 0.6 0.10;
CourseGroup 10 0.6 0.07;
CourseGroup 11 0.5 0.07;
CourseGroup 12 0.5 0.12;
CourseGroup 9 0.10 10 0.05 11 0.03 0.02;
CourseGroup 9 0.30 10 0.15 11 0.10 12 0.10 0.14;
CourseGroup 10 0.20 11 0.15 12 0.20 0.35;
CourseGroup 11 0.22 12 0.20 0.15;
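The comments in the specification describe each CourseGroup line as a list of (group id, probability) pairs followed by the fraction of all courses in that group. As a quick sanity check on that reading, the sketch below parses the CourseGroup lines and sums, for each student group, the probabilities of picking from each course group. The parser and its function names are hypothetical and are not the authors' generator; only the CourseGroup lines are copied from the specification.

```python
from collections import defaultdict

COURSE_GROUP_LINES = """
CourseGroup 9 0.6 0.10;
CourseGroup 10 0.6 0.07;
CourseGroup 11 0.5 0.07;
CourseGroup 12 0.5 0.12;
CourseGroup 9 0.10 10 0.05 11 0.03 0.02;
CourseGroup 9 0.30 10 0.15 11 0.10 12 0.10 0.14;
CourseGroup 10 0.20 11 0.15 12 0.20 0.35;
CourseGroup 11 0.22 12 0.20 0.15;
"""


def parse_course_groups(text):
    """Return a list of (pick probability by student group, course fraction)."""
    groups = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip().rstrip(";")  # drop comments and ';'
        fields = line.split()
        if not fields or fields[0] != "CourseGroup":
            continue
        values = fields[1:]
        pairs = values[:-1]  # the last value is the course fraction
        probs = {int(pairs[i]): float(pairs[i + 1]) for i in range(0, len(pairs), 2)}
        groups.append((probs, float(values[-1])))
    return groups


def pick_probability_totals(groups):
    """Total pick probability per student group over all course groups."""
    totals = defaultdict(float)
    for probs, _fraction in groups:
        for student_group, p in probs.items():
            totals[student_group] += p
    return dict(totals)


course_groups = parse_course_groups(COURSE_GROUP_LINES)
print(len(course_groups), pick_probability_totals(course_groups))
```

For each of the four student groups the pick probabilities add up to 1, which supports reading them as a distribution over course groups.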

B Specification File for Uniform Model

#N=number courses, E = number students, C = size of schedule
N 160
E 496
C 12
#S = number of groups, crossover matrix follows
S 8
0.90 0.10 0.00 0.00 0.00 0.00 0.00 0.00
0.90 0.10 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.05 0.75 0.10 0.00 0.10 0.00 0.00
0.00 0.05 0.75 0.10 0.00 0.10 0.00 0.00
0.00 0.03 0.00 0.10 0.60 0.10 0.00 0.17
0.00 0.03 0.00 0.10 0.60 0.10 0.00 0.17
0.00 0.00 0.00 0.10 0.00 0.20 0.50 0.20
0.00 0.00 0.00 0.10 0.00 0.20 0.50 0.20


Section   Parameter   Min         Median      Max
bound     setting     Conflicts   Conflicts   Conflicts
25        1           371         446         477
25        2           220         240         259
25        3           365         387         410
25        4           198         216         255
25        6           389         420         463
25        7           152         182.5       211
25        8           363         386         412
25        9           139         176         204
30        1           413         442         497
30        2           230         257.5       285
30        3           337         369         401
30        4           168         205         218
30        6           371         446         487
30        7           170         186         237
30        8           344         367         414
30        9           146         176.5       209
input     1           449         505.5       544
input     2           150         176         211
input     3           397         428.5       471
input     4           105         139.5       183
input     6           458         493.5       550
input     7           105         137.5       176
input     8           412         429         499
input     9           86          130         162

Table 7: Summary of Results on school-nsh data; 10 runs of each parameter setting. Maximum number of conflicts is 5916.


Table 8: Density and largest clique found in conflict graphs resulting from best section assignment found.

Graph                 clique size   density
CourseGroup, sbd 25   18            0.30
CourseGroup, sbd 30   22            0.34
Uniform, sbd 25       22            0.32
Uniform, sbd 30       25            0.35
school, sbd 25        17            0.25
school, sbd 30        17            0.27
school, sbd input     16            0.24