Int. J. Services Operations and Informatics, Vol. 3, Nos. 3/4, 2008
Optimatch: applying constraint programming to workforce management of highly skilled employees

Yossi Richter* and Yehuda Naveh
IBM Haifa Research Lab, Haifa University, Mount Carmel, Haifa 31905, Israel
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Donna L. Gresh and Daniel P. Connors
IBM Research Division, Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA
E-mail: [email protected]
E-mail: [email protected]

Abstract: Today many companies face the challenge of matching highly skilled professionals to high-end positions in large organisations and human deployment agencies. Unlike traditional workforce management problems such as shift scheduling, highly skilled employees are professionally distinguishable from each other and hence non-interchangeable. Our work specifically focuses on the services industry, where much of the revenue comes from the assignment of highly professional workers. Here, inaccurate matches may result in significant monetary losses and other negative effects. We deal with very large pools of both positions and employees, where optimal decisions should be made rapidly in a dynamic environment. Since traditional operations research methods fail to address this problem, we employ Constraint Programming (CP), a subfield of Artificial Intelligence with strong algorithmic foundations. Our CP model builds on new constraint propagators designed for this problem (but applicable elsewhere), as well as on information retrieval methods used for analysing the complex text describing high-end professionals and positions. Optimatch, which is based on this technology and is being used by IBM services organisations, provides strong experimental results.

Keywords: WM; workforce management; highly skilled employees; CP; constraint programming; text analysis; information retrieval.

Reference to this paper should be made as follows: Richter, Y., Naveh, Y., Gresh, D.L. and Connors, D.P. (2008) 'Optimatch: applying constraint programming to workforce management of highly skilled employees', Int. J. Services Operations and Informatics, Vol. 3, Nos. 3/4, pp.258–270.
Biographical notes: Yossi Richter received a BA degree in computer science and economics, and MSc and PhD degrees in computer science specialising in algorithms, all from Tel Aviv University, Israel. Since 2005, he has been a Research Staff Member at the IBM Haifa Research Laboratory, working on the theory and practice of Constraint Programming (CP).

Yehuda Naveh received a BS degree in physics and mathematics, an MS degree in experimental physics, and a PhD degree in theoretical physics, all from the Hebrew University of Jerusalem, Israel. He joined IBM Research in 2000 after working for four years as a Research Associate at Stony Brook University in New York. His current research interests include the theory and practice of CP and the theory and practice of workforce optimisation.

Donna L. Gresh received her BS degree in engineering in 1983 from Swarthmore College and her MS and PhD degrees in electrical engineering in 1985 and 1990, respectively, from Stanford University, where she studied the rings of Uranus using data from the spacecraft Voyager. She joined the IBM Thomas J. Watson Research Laboratory as a Research Staff Member in 1990 and spent 12 years conducting research in scientific and information visualisation. Since 2002, Dr. Gresh has been a member of the Mathematical Sciences Department, with research interests in the area of workforce optimisation.

Daniel P. Connors received his BSE degree in electrical engineering from the University of Michigan in 1982, and his MS and PhD degrees in electrical engineering from the University of Illinois in 1984 and 1988, respectively. Since 1988, he has been a Research Staff Member at the IBM Thomas J. Watson Research Center. Dr. Connors has worked on modelling, simulating and designing business processes and developing decision support tools for manufacturing and supply chain logistics. He is a member of the Mathematical Sciences Department at the Research Center, where he is currently working on developing business processes and workforce management optimisation tools.
1 Introduction
We are witnessing a constantly increasing demand for skilled professionals with specialised combinations of expertise, who are essential in accomplishing high-end projects. This trend can be observed in most markets and industries. As a result, many large business organisations, as well as private and public human deployment agencies, face the Workforce Management (WM) Identification and Assignment (ID&Assign) problem of assigning skilled professionals to positions with specialised requirements. Moreover, the problem is dynamic in the sense that position descriptions and professionals' availability change on a regular basis, and business pressures create a need for fast decisions. The ultimate goal is therefore to rapidly create matches that are accurate, while maximising generated revenue and minimising the idle time of the expensive highly skilled workers.

Currently, in many companies, this critical task is handled manually by human Resource Deployment Professionals (RDPs). When the number of professionals and positions to be matched grows beyond a few dozen, this process results in assignments that are far from optimal and take a long time to create. An automated assignment mechanism for this WM problem that produces near-optimal assignments is therefore essential.
Traditionally, the typical WM problems addressed in the literature have dealt with different variants of shift scheduling. In these problems we have a large number of employees, roughly divided into a small number of groups. Each group contains employees with similar skills, and is considered to be approximately homogeneous. Employees in the same group can be thought of (from the automation point of view) as indistinguishable and interchangeable. Given this partition into distinct groups, employees are scheduled for shifts, where each shift requires a specific combination of personnel. Such WM problems are widely solved using traditional Operations Research (OR) methods (e.g. linear and integer programming, reductions to other OR problems) or by other methodologies such as modern meta-heuristics (in particular Tabu search and genetic algorithms) and multi-agent systems. For example, Swops (Gilat et al., 2006), a tool suitable for shift scheduling, is based on integer linear programming. Resource capacity planning (Gresh et al., 2007) is a different WM scenario, concerned with aggregates of employees rather than individuals; here, planning is performed in order to estimate future gaps and gluts of workforce, and the problem again lends itself naturally to mathematical programming methods.

In contrast, the ID&Assign problem we are dealing with is at the opposite extreme: the individual employees are highly skilled, each with his or her own unique combination of competencies, and are highly distinguishable and non-interchangeable. Additionally, it is essential to find a good match between workers and their assigned positions; otherwise we run the risk of under- or over-qualified assignment, or understaffing, with the obvious contingent problems. Our WM problem is also inherently different from the usual supply chain problems in OR. Our entities are people rather than parts; we cannot model them as pure sets of attributes. Individuals have their own unique skills, behaviours, interests and expectations. Our ID&Assign problem has not been addressed before and seems to be harder to automate.

The traditional OR methods listed above fail to work for a number of reasons. First, the constraints, which depend on the particulars of professionals and positions, are complex and do not translate easily into linear constraints. This is in contrast to the simple constraints, such as vacation time and maximum daily work hours, seen in mainstream workforce scheduling applications. Second, most OR methods rely on optimising an objective function, and in our case it is nearly impossible to put a price tag on most of the variables involved. For example, how can we quantify the cost of a dissatisfied customer or a displeased employee resulting from a non-perfect match? Finally, new rules and constraints arise constantly. To handle them quickly and efficiently, the desired mechanism should have a rich expressive language that easily allows the formulation and maintenance of these constraints. Translating the problem into a linear model would create a maintenance nightmare, as the model would be very far from the original constraints.

Our tool, Optimatch, relies on Constraint Programming (CP) methodology, whose expressive language is rich, natural and modular, contains many types of constraints, and therefore allows the rapid development and maintenance of models. Additionally, the strong algorithmic foundation of CP allows for fast execution and good optimality.
Therefore, it suits the nature of our WM problem better than traditional OR methods. Indeed, there have been a few attempts to employ CP in solving WM problems, although these were scheduling problems of the more traditional nature. (For example, British Telecom used CP to solve a real-life problem (Munaf and Tester, 1993) that was later also solved in Yang (1996).) A high-level overview of Optimatch was given in Naveh et al. (2007), along with many background references. In this paper, we focus on new technological advancements (e.g. flexibility and text analysis), experimental results and the close relationship with the services industry.

The rest of this paper is organised as follows: In Section 2 we give a brief introduction to CP. In Section 3 we describe how we employed CP in Optimatch, and the novel constraint propagator that lies at the heart of our solution. In Section 4 we describe the flexibility feature that adjusts a given assignment when some of the terms have changed. In Section 5 we present an information retrieval approach to matching position definitions to employees' skill sets using free text analysis. In Section 6 we present experimental results. We conclude in Section 7.
2 A brief introduction to CP
CP deals with modelling and solving Constraint Satisfaction Problems (CSPs) (see Dechter (2003) for an excellent text). In this section we briefly introduce the mathematical formulation of CP, problem modelling and algorithmic background.
2.1 Mathematical formulation

A CSP is formulated mathematically as a triplet $(V, D, C)$, where $V = (v_1, \ldots, v_n)$ is a set of variables, $D = (D_1, \ldots, D_n)$ is the set of associated domains and $C = (C_1, \ldots, C_m)$ is a set of constraints, each defined over a subset of the variables. A constraint $C_j$ over the $k$-tuple $(v_{j_1}, \ldots, v_{j_k})$ defines a relation over $D_{j_1} \times \cdots \times D_{j_k}$. Therefore, in principle, CP supports the definition of general constraints, without requiring any specific structure. A solution to a CSP is an assignment $(a_1, \ldots, a_n)$ such that $a_i \in D_i$ for each $i = 1, \ldots, n$, and all constraints are satisfied when $v_i = a_i$. A CSP $P$ is satisfiable if there exists a solution to $P$; otherwise it is unsatisfiable.

The CP methodology also allows the introduction of soft constraints. These are constraints that the solver attempts to satisfy, but is not strictly required to. Soft constraints may be prioritised, so that the solver favours solutions satisfying higher-priority ones over solutions satisfying only lower-priority constraints. This is one way the CP methodology can cope with optimisation in addition to satisfiability.
2.2 CP modelling

As in other methodologies, there are usually several equivalent ways to model a problem in CP. We usually strive to have a CP model that is as close as possible to our natural conception of the problem, without unnecessary translation. This is often easy to achieve through the rich expressive languages that are used in CP modelling, such as Numerica (van Hentenryck et al., 1997) and the Optimisation Programming Language (OPL) (van Hentenryck, 1999).

When modelling a problem in CP, one needs to identify the problem's variables. These usually correspond directly to the physical entities of the problem. The use of auxiliary variables, common in traditional OR methods, is generally avoided. The domain of each variable should contain all possible values that can be assigned to it in the natural formulation of the problem. For example, the domain of a variable representing the colour of a car should be {black, blue, white} rather than {1, 2, 3}. Constraints are also modelled in the problem domain, often using one of the generic constraint languages, rather than using a mathematical representation. Generic languages usually support arithmetic and logic operators and comparators, as well as a set of global constraints that may act on a large number of variables. An example of a global constraint is alldifferent, which is defined over a set of variables and requires that they are assigned different values.

As a classic example of CP modelling, consider the n-queens problem. We wish to place n queens on an n × n chess board such that no two queens are placed in the same row, column or diagonal. In one model of this problem, one can define n variables $(v_1, \ldots, v_n)$ such that $v_i$ represents the position of the queen in the ith row. Naturally, all domains are $\{1, \ldots, n\}$. The corresponding CSP now becomes:

alldifferent$(v_1, v_2, \ldots, v_n)$    (1)

alldifferent$(v_1 - 1, v_2 - 2, \ldots, v_n - n)$    (2)

alldifferent$(v_1 + 1, v_2 + 2, \ldots, v_n + n)$.    (3)
Here, constraint (1) guarantees that no two queens share the same column, and constraints (2) and (3) guarantee that none share the same downward or upward diagonals, respectively.
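To make the model concrete, the following is a minimal sketch in Python (ours, not from the paper). It encodes the n-queens CSP exactly as above and illustrates the semantics of a solution by checking every point of the Cartesian product of the domains against the three constraints; all names are ours.

```python
from itertools import product

def n_queens_csp(n):
    """The n-queens CSP: variable i is the column of the queen in row i,
    with domain {1, ..., n}.  The three constraints mirror (1)-(3) in the
    text (rows here are 0-based, which leaves diagonal differences intact)."""
    variables = list(range(n))
    domains = {v: range(1, n + 1) for v in variables}

    def alldifferent(values):
        values = list(values)
        return len(set(values)) == len(values)

    constraints = [
        lambda a: alldifferent(a[i] for i in variables),      # (1) columns
        lambda a: alldifferent(a[i] - i for i in variables),  # (2) downward diagonals
        lambda a: alldifferent(a[i] + i for i in variables),  # (3) upward diagonals
    ]
    return variables, domains, constraints

# A solution is a point of the Cartesian product of the domains that
# satisfies every constraint; brute-force enumeration shows the semantics.
variables, domains, constraints = n_queens_csp(5)
solutions = []
for values in product(*(domains[v] for v in variables)):
    a = dict(zip(variables, values))
    if all(c(a) for c in constraints):
        solutions.append(a)
print(len(solutions), "solutions for n = 5; one of them:", solutions[0])
```

Brute-force enumeration is of course exponential; the systematic algorithms described next avoid it by pruning the search space.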
2.3 Algorithmic background

Constraint solving algorithms fall into two broad categories: systematic and stochastic. In this paper, we are interested primarily in systematic algorithms. These algorithms perform a complete search over the space defined by the Cartesian product of the variable domains. They rely on implementations of constraint propagators in order to prune the search space and possibly make the problem tractable. A constraint propagator filters unsupported values from the domains of variables upon which the constraint acts. For example, consider our n-queens model. Suppose we decided to place the queen in the first row on the second square, hence $v_1 = 2$. In this simple example, the propagator for the first alldifferent constraint (1) will remove the value 2 from the domains of all other variables, since assigning 2 to any other variable violates this constraint. When designing or extending a constraint language to be used with systematic search, one needs to implement a propagator for each constraint incorporated in the language.

Given the set of propagators corresponding to all constraints in a CSP, a systematic algorithm typically works in stages. At each stage, we have a partial (possibly empty) assignment of values to variables that is consistent, i.e. it does not violate any constraint. In each stage, two phases are applied. In the instantiation phase, a value is assigned to one of the unassigned variables. Then, in the propagation phase, the algorithm repeatedly activates all constraint propagators to filter all unsupported values. If a domain becomes empty, the algorithm backtracks, i.e. replaces one of the previous assignments with a different value. The algorithm stops when all variables are instantiated (satisfiable case) or when a domain is empty and all variables have already been instantiated with all consistent assignments (unsatisfiable case). Naturally, state-of-the-art systematic algorithms add many additional subroutines and heuristics to this general method.
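The sketch below (ours, not the authors' solver) implements this stage-wise scheme for the n-queens model: each stage instantiates one variable, then propagates by filtering unsupported values from the remaining domains, backtracking when a domain wipes out. The propagation here is simple forward checking; real solvers are far more elaborate.

```python
def solve(n):
    """Systematic search for n-queens (0-based rows and columns):
    instantiate, propagate, backtrack on a domain wipe-out."""
    domains = {i: set(range(n)) for i in range(n)}

    def propagate(domains, var, val):
        # Filter values unsupported by the new assignment: same column,
        # or same downward/upward diagonal.  None signals a wipe-out.
        new = {}
        for v, dom in domains.items():
            if v == var:
                new[v] = {val}
                continue
            filtered = {a for a in dom
                        if a != val and a - v != val - var and a + v != val + var}
            if not filtered:
                return None
            new[v] = filtered
        return new

    def search(domains, unassigned):
        if not unassigned:
            return {v: next(iter(d)) for v, d in domains.items()}
        var = min(unassigned, key=lambda v: len(domains[v]))  # fail-first heuristic
        for val in sorted(domains[var]):                      # instantiation phase
            reduced = propagate(domains, var, val)            # propagation phase
            if reduced is not None:
                result = search(reduced, unassigned - {var})
                if result is not None:
                    return result
        return None                                           # backtrack

    return search(domains, set(range(n)))

print(solve(8))  # one valid placement of 8 queens, row -> column
```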
3 Modelling WM ID&Assign using CP
3.1 The somedifferent constraint

The goal of our work is to maximise the number of positions assigned, while maintaining the best fit of employees to their assigned positions. The basic constraint we wish to enforce is that although the same employee can be assigned to multiple positions, these positions are not allowed to have overlapping execution times. We model each position by a variable whose domain is the set of all workers who can perform it. Suppose every pair of positions overlapped. In this case, we could define a single alldifferent constraint over all variables, thereby guaranteeing that no person is assigned to two positions. Since, in general, not all positions overlap, we employ the somedifferent constraint (Richter et al., 2006) instead.

The somedifferent constraint is a natural generalisation of alldifferent that answers our needs. It is defined over a subset of the variables, together with an underlying graph whose vertices are the participating variables. The constraint requires that variables that are adjacent in the graph are assigned different values. (Note that the alldifferent constraint is the special case obtained when the underlying graph is complete.) Formulated mathematically,

$\mathrm{somedifferent}_G(v_1, \ldots, v_k) = \{(a_1, \ldots, a_k) : a_i \in D_i,\ (v_i, v_j) \in E(G) \Rightarrow a_i \neq a_j\}$,

where $E(G)$ is the set of edges of the graph $G$. We note that defining a single constraint, rather than a separate constraint for each pair of conflicting variables, guarantees better pruning during propagation (the same is true for alldifferent). The only concern with somedifferent is that its propagation is an NP-hard problem (since, e.g., it generalises colouring problems), and therefore it is not likely that an efficient (polynomial) propagator exists. However, there exists a non-trivial propagator (see Richter et al., 2006), which, together with a few heuristics, works well in practice. Still, defining this constraint over a large set of variables is not recommended.
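For intuition, here is a sketch (ours) of the weak per-edge decomposition of somedifferent: once one endpoint of an edge is fixed, its value is removed from the other endpoint's domain. This is explicitly not the dedicated propagator of Richter et al. (2006), which prunes strictly more; it only illustrates the constraint's semantics.

```python
def somedifferent_edge_filter(domains, edges):
    """Weak propagation for somedifferent_G: for each edge (u, v), once one
    endpoint's domain is a singleton, discard that value from the other's
    domain.  Returns False on a domain wipe-out."""
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if len(domains[a]) == 1:
                    val = next(iter(domains[a]))
                    if val in domains[b]:
                        domains[b].discard(val)
                        changed = True
                        if not domains[b]:
                            return False
    return True

# Positions p1..p3: p1 overlaps p2 and p3, but p2 and p3 do not overlap,
# so the underlying graph has edges (p1,p2) and (p1,p3) only.
domains = {"p1": {"alice"}, "p2": {"alice", "bob"}, "p3": {"alice", "carol"}}
edges = [("p1", "p2"), ("p1", "p3")]
print(somedifferent_edge_filter(domains, edges), domains)
# -> True {'p1': {'alice'}, 'p2': {'bob'}, 'p3': {'carol'}}
```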
3.2 The CSP model

3.2.1 Variables and domains

Each position is modelled by a variable whose domain is the set of all workers who are qualified to perform it and available throughout its duration. We define a single fictitious employee whom we add to all domains. We treat this employee specially: although we admit it as viable in the instantiation phases, we ignore it in the propagation phases, in effect allowing it to be assigned to conflicting positions. The reason we introduce this fictitious employee is that it is quite likely that we will not be able to staff all positions due to insufficient resources. Ordinarily, in such cases we would simply get an indication that the problem is unsatisfiable. By adding the fictitious employee we can guarantee solvability, and by use of unary soft constraints we can direct the solver to prefer real employees over the fictitious one. Of course, once we obtain a solution, we remove the fictitious employee and reject all positions to which it has been assigned.
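A minimal sketch of this domain construction follows; the position/employee record fields (skill, free_from, and so on) and the sentinel name are hypothetical, invented purely for illustration.

```python
FICTITIOUS = "unstaffed"  # hypothetical sentinel name, not from the paper

def build_domains(positions, employees):
    """One variable per position; its domain holds every employee qualified
    for the position and available throughout its duration, plus the
    fictitious employee, which keeps the CSP satisfiable when staffing
    falls short."""
    domains = {}
    for pos in positions:
        qualified = {e["name"] for e in employees
                     if pos["skill"] in e["skills"]
                     and e["free_from"] <= pos["start"]
                     and pos["end"] <= e["free_to"]}
        domains[pos["id"]] = qualified | {FICTITIOUS}
    return domains

positions = [{"id": "P1", "skill": "java", "start": 3, "end": 7}]
employees = [
    {"name": "dana", "skills": {"java", "c++"}, "free_from": 1, "free_to": 9},
    {"name": "lee",  "skills": {"c"},           "free_from": 1, "free_to": 9},
]
print(build_domains(positions, employees))  # e.g. {'P1': {'dana', 'unstaffed'}}
```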
3.2.2 Hard constraints

The hard constraints should guarantee that no employee is assigned to two positions that overlap in time. This can be accomplished by using a single somedifferent constraint whose underlying graph contains an edge between every two overlapping positions. However, because the propagator for somedifferent cannot be efficient (i.e. cannot run in worst-case polynomial time), we use the following partitioning heuristic, which results in several somedifferent constraints, each applying to a small underlying graph (a code sketch follows the list).

• Edges in the full somedifferent graph connecting pairs of variables with disjoint domains are obviously redundant (in our application these represent positions whose durations overlap in time, but for which there are no potential workers in common). We delete them. We then partition the resultant graph into its connected components.

• We partition each connected component larger than size t into clusters of size t (a user-defined threshold, set by default to 10). If the size of the connected component is not divisible by t, one of the clusters will be smaller than t. We apply a somedifferent constraint to each cluster.

• To capture the connectedness of the clusters within the original connected components, we further add an approximate somedifferent constraint over each connected component that is larger than t. The approximate constraint has the same semantics as the ordinary one, but is associated with an efficient (polynomial) propagator. The drawback is that this propagator may not filter all unsupported domain values.

Although this scheme risks suboptimal staffing, in practice it works well and achieves a considerable speedup compared with both a single somedifferent constraint and a single approximate somedifferent constraint applied to all variables.
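The sketch below implements the partitioning steps under assumed inputs: domains maps each position variable to its candidate set, and overlaps lists the pairs of time-overlapping positions. Both names, and the return shape, are ours.

```python
def partition_somedifferent(domains, overlaps, t=10):
    """Partitioning heuristic for the hard constraints: drop edges whose
    endpoint domains are disjoint, split the remaining graph into connected
    components, and cut each component into clusters of at most t variables.
    Returns (clusters, big_components): post a somedifferent constraint per
    cluster, plus an approximate one per component larger than t."""
    # Keep only edges between overlapping positions with a common candidate.
    edges = {(u, v) for u, v in overlaps if domains[u] & domains[v]}

    # Connected components by depth-first traversal.
    adj = {v: set() for v in domains}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for v in domains:
        if v in seen:
            continue
        comp, stack = [], [v]
        seen.add(v)
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        components.append(comp)

    clusters = [comp[i:i + t] for comp in components
                for i in range(0, len(comp), t)]
    big_components = [comp for comp in components if len(comp) > t]
    return clusters, big_components

domains = {"P1": {"a", "b"}, "P2": {"b"}, "P3": {"c"}, "P4": {"c", "d"}}
overlaps = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
print(partition_somedifferent(domains, overlaps, t=2))
# (P2, P3) is pruned (disjoint domains), leaving components {P1, P2}, {P3, P4}.
```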
3.2.3 Soft constraints

We apply two types of soft constraints. First, unary soft constraints prefer the assignment of real persons over the fictitious one. Second, we capture user preferences, which are typically hard to quantify, by adding soft constraints. For example, suppose a position requires a Java programmer as a first choice, a C++ programmer as a second choice and a C programmer as a last resort. We add a soft unary constraint over this position that requires a Java programmer, and an additional soft constraint of lower priority that requires either a Java or a C++ programmer (a sketch of this prioritisation follows). More complex (non-unary) soft constraints arise in the context of team building, where we consider staffing of complete teams rather than individual positions. Here, preferences refer to the entire team combination.
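One plausible realisation of prioritised soft constraints (ours, not necessarily the solver's actual mechanism) is lexicographic scoring: count satisfied constraints per priority level, highest level first. The predicate shapes and skill lookup below are invented for illustration.

```python
def soft_score(assignment, soft_constraints):
    """Score an assignment against prioritised soft constraints: one count
    per priority level, highest first, so tuples compare lexicographically.
    A solver would prefer assignments with higher scores."""
    priorities = sorted({p for p, _ in soft_constraints}, reverse=True)
    return tuple(
        sum(1 for p, pred in soft_constraints if p == prio and pred(assignment))
        for prio in priorities
    )

# The Java / C++ / C preference from the text, for a single position "P1".
skills = {"ann": "java", "ben": "c++", "cal": "c"}
soft = [
    (2, lambda a: skills[a["P1"]] == "java"),           # first choice
    (1, lambda a: skills[a["P1"]] in ("java", "c++")),  # second choice
]
for cand in ("ann", "ben", "cal"):
    print(cand, soft_score({"P1": cand}, soft))
# ann (1, 1) > ben (0, 1) > cal (0, 0): Java beats C++, which beats C.
```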
4 Flexibility
Suppose we generate an assignment for the next months, and shortly afterwards some of the conditions change: some positions are added to the pool, others removed, a few employees leave the resource pool, others join, etc. We now face a WM problem where we wish to generate a new assignment that is both nearly optimal, so there is minimal revenue loss, and at the same time as close as possible to the current assignment, namely, one in which a minimal number of positions/employees switch their intended assignment. In general, these requirements can be conflicting and, in extreme cases, an optimal new assignment may require a complete reassignment. However, on average we can meet both requirements to a high degree.

Problems of this sort, i.e. when a given solution has to be adjusted to accommodate new constraints, are usually termed flexibility problems in the CP literature (see Verfaillie and Jussien (2005) for a survey). The mainstream approach is to repair the given solution using stochastic local search methods that perturb it until the new constraints are satisfied. This approach assumes that only minor modifications are needed for the adjustment, hence they are best found by searching the local vicinity of the assignment space.

We take a different approach and achieve flexibility through the systematic solver, for two main reasons. First, using a single solver to solve the entire problem decreases the application development and maintenance burden. Second, local search methods work best when only minor modifications are needed, and are less suitable for the common case where only a partial prior assignment is given.

We employ two popular heuristic methods in CP systematic solving: variable ordering and value ordering. The former allows us to specify the next variable to instantiate at the instantiation phase of each step of our traversal of the search tree, while the latter decides the next instantiation value. Our heuristic flexibility algorithm is defined as follows.

• Variable ordering: Define a variable (position) to be prioritised if it participated in the previous solution and its current domain contains its previously assigned value (employee). If there are any prioritised variables, choose one randomly; otherwise choose a random variable.

• Value ordering: If the current variable is prioritised, assign its value from the previous solution; otherwise, assign to it a random value from its domain.
As our experiments show (see Section 6), this heuristic works well, since it is interleaved in the regular solving process while giving preference, when applicable, to original variables and their original assigned values.

We are left with the computational problem of identifying the prioritised variables at each stage. Note that a prioritised variable may become non-prioritised when propagation takes place, since its domain may lose its previously assigned value. The variable can later return to being prioritised when we backtrack to an earlier point. We can, of course, scan all variables in each step to find the prioritised ones, but the potentially large number of variables renders this inefficient in practice. A second option is to hold the prioritised variables in a set that 'remembers' its state and knows how to restore it when backtracks occur. Such objects are supplied by modern CP systematic solvers and we refer to them as undoable objects. The pitfall is that we store the state of a possibly large set, which is potentially time-consuming. The following efficient data structure holds the prioritised variables, does not compute them from scratch at each step and uses undoable objects minimally (a code sketch follows the list).

• We use a vector along with a pointer that divides the vector into two sections: left and right. The only undoable object is the pointer. We initialise the vector with the indexes of prioritised variables at the beginning of the solution process. The pointer is initialised to the end of the vector. (The right side is therefore empty.)

• Whenever we require a random prioritised variable (variable ordering heuristic), we do the following. If the left side is empty, we report that there is none; otherwise we randomly choose variables from the left side until we pick a prioritised variable, which is returned. Whenever a variable from the left side is considered, we move it to the right side, while updating the pointer accordingly.

Obviously, whenever we move a variable from the left side to the right side, we only execute a small constant number of operations. Additionally, whenever we backtrack to this point in the search tree, we will not try variables that were already determined to be non-prioritised at this point, since they were moved to the right side (recall that the pointer is undoable). This ensures that no redundant computations are made. Our experiments indeed show that there is no run-time overhead incurred when employing this algorithm.
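A sketch of this data structure in Python follows; the class and method names are ours, and save/restore stands in for the solver's undoable-object machinery.

```python
import random

class PrioritisedVars:
    """Vector of previous-solution variables split by a pointer: candidates
    on the left, already-considered ones on the right.  The pointer is the
    only per-node state (the 'undoable' object); restoring it on backtrack
    returns moved variables to the candidate side in O(1)."""

    def __init__(self, prev_solution):
        self.prev = dict(prev_solution)  # var -> previously assigned value
        self.vec = list(self.prev)       # the vector of variable names
        self.ptr = len(self.vec)         # left side = vec[:ptr]

    def save(self):
        return self.ptr                  # what a solver would push per node

    def restore(self, saved_ptr):
        self.ptr = saved_ptr             # undo every move made since save()

    def _move_right(self, i):
        self.ptr -= 1
        self.vec[i], self.vec[self.ptr] = self.vec[self.ptr], self.vec[i]

    def pick(self, domains):
        """Return a random variable that is still prioritised (its previous
        value survives in its current domain), or None if none remains;
        every considered variable is moved to the right side."""
        while self.ptr > 0:
            i = random.randrange(self.ptr)
            var = self.vec[i]
            self._move_right(i)
            if self.prev[var] in domains[var]:
                return var
        return None

pv = PrioritisedVars({"P1": "ann", "P2": "ben"})
mark = pv.save()
print(pv.pick({"P1": {"ann"}, "P2": {"cal"}}))  # -> "P1" (P2 lost "ben")
pv.restore(mark)                                # backtrack: both candidates again
```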
5 Textual data
Our typical input data consists of many structured fields characterising the professionals and the positions. Examples of such fields are job role, skill set, experience level, pay rate and geographical location. However, neither the positions nor the professionals are described using purely structured data. The positions in our application also include four free text fields: 'project description', 'position description', 'required skills' and 'optional skills'. Employees generally have a resume, which can be browsed by anyone looking to staff a project.

When working manually, RDPs trying to staff a position rely extensively on textual search to look for relevant terms in resumes. This occurs both because many RDPs are unfamiliar with the details of the expertise taxonomy used to describe job roles and skill sets (which are part of the structured data) and do not know what the 'right' job role is, and because a potential professional may have relevant skills that are not completely described by their current formal job role and skill set. Thus, we want to provide additional functionality to Optimatch so that it exploits the significant relevant information contained both in the position description and in an employee's resume.

In the prototypes we have built, we use the 'required skills' field along with the 'position description' because, in our experience, these appear to be most closely related to the necessary skills. In practice, while some position 'required skills' fields do not contain text that is likely to lead to good matches (e.g. 'Ability to take ownership of the tasks provided'), most requests include a relatively straightforward list of necessary skills. A good example is 'ECATT scripting and testing, Solution Manager, Functional and Testing experience in Product Lifecycle Management, Project Systems modules', or, even more to the point, 'Java 2, J2EE, Websphere Application Server'.

Our approach is to find all those employees who are a 'good fit' to the position, based on the content of their resume. We begin by extracting the text of the resume from the enterprise data warehouse for each employee under consideration. We then use the Lucene open source text search engine (see Hatcher and Gospodnetic, 2004; Apache Lucene webpage) to index the resumes, allowing rapid search for terms in the open position request. We use the Lucene 'Snowball' stemmer to reduce terms to their root form (developer, developed and develop are all considered to be the same word). We also index the resumes using expansion of acronyms and abbreviations (using a custom-created list) so that searches will return either the short or long version.
Queries are constructed using the text of the relevant open seat fields (the choice of fields is configurable at run-time), forming a Boolean query from these terms (as many terms as possible should be found in the resume, but none are required to be found). When Lucene returns 'hits' of a document in the index against a query, it provides a score of the goodness of the hit, using a combination of the Vector Space Model (VSM) (Salton et al., 1975) and the Boolean model (Salton and Michael, 1986) of information retrieval to determine how relevant a given document is to a query. We are currently using the default Similarity Class provided by Lucene. In general, the idea behind the default scoring mechanism (see Similarity Class web page) is that the more times a query term appears in a particular document, relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. A score is also increased if a high proportion of the terms in the original query are found in the target document, and if the target document is relatively short (a long document which contains many of the query terms is a worse match than a short document which contains the same number of query terms).

We are interested in how well a resume matches a position, so we use as our query all the words in the position text fields under consideration (stemmed, and with common English words such as 'to' and 'and' removed). We also remove a user-configurable set of words which we have found to be common, and unhelpful, in our application, such as 'accountable', 'assist', 'business', 'discuss' and so on. We can restrict matches to those with a relatively high score, or to only the top N matches to a position. An example of a 'required skills' field with a fairly extensive, skills-oriented description is the following:

Mercury LoadRunner expertise required. Experience with formal test scripts, execution and reporting required.
The query that is created (in the Lucene query language) from this text is the following: Query = text:loadrunn text:mercuri text:formal text:script text:expertis text:report text:test
In this case we are searching for all words in the ‘text’ field, which includes the entire resume; section-specific searching could also be supported. Note that some words from the original text are not included in the query, as they are considered ‘stop’ or common English words. A high-scoring resume for this query shows the following fragments; matched terms are in bold: “...completion. Key Skills: Load Testing – Mercury LoadRunner (Certified Product Specialist) – Rational. Company Reliant, Tx, Project Description: Primary Mercury LoadRunner scripter and test execution. Developed test plans, test scripts, and test reports. Project. client/server using Mercury LoadRunner. Generate data for testing and analysed performance results. – Mercury Quick Test Pro (QTP) Test Management – Mercury Test Director – Rational Clear Quest and Clear Case. Project Description: Test lead for SeaWare reservation system using Mercury LR GUI.”
It is interesting to note that in this case the open position request was for a job role of 'Performance Tester' and a skill set of 'General', while the best-matched resume was for a professional with a job role of 'Test Consultant' and a skill set of 'Test Planning'. It remains for the RDP to decide whether the matched job role and skill set are indeed what is required, but it is plausible that this professional would indeed be a good match for the position.
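To illustrate the pipeline end to end, here is a self-contained toy stand-in (ours, not the Lucene implementation): a crude suffix-stripping stemmer instead of Snowball, a small configurable stop-word list, and a simplified vector-space score with a coordination factor and length normalisation in the spirit of Lucene's default Similarity. The exact formula and word lists are assumptions for illustration only.

```python
import math
import re

STOP = {"to", "and", "with", "the", "in", "of", "a", "for", "required"}

def tokens(text):
    """Lowercase, split, drop stop words, crudely stem (a toy stand-in for
    the Snowball stemmer; e.g. 'loadrunner' -> 'loadrunn')."""
    words = re.findall(r"[a-z0-9+#]+", text.lower())
    return [re.sub(r"(ing|ed|er|s)$", "", w) for w in words if w not in STOP]

def score(query_text, resumes):
    """Boolean-OR query over all query terms: rare terms weigh more (idf),
    documents matching more of the query score higher (coordination), and
    long documents are normalised down."""
    docs = {name: tokens(text) for name, text in resumes.items()}
    n = len(docs)
    qterms = set(tokens(query_text))
    results = {}
    for name, doc in docs.items():
        s, matched = 0.0, 0
        for term in qterms:
            df = sum(1 for d in docs.values() if term in d)
            if term in doc:
                matched += 1
                s += doc.count(term) * math.log((n + 1) / (df + 1) + 1)
        results[name] = s * (matched / len(qterms)) / math.sqrt(len(doc) or 1)
    return sorted(results.items(), key=lambda kv: -kv[1])

resumes = {
    "r1": "Load testing with Mercury LoadRunner; developed test scripts and test reports.",
    "r2": "Java developer: J2EE, Websphere Application Server, business consulting.",
}
print(score("Mercury LoadRunner expertise, formal test scripts and reporting", resumes))
# r1 ranks well above r2 for this query, mirroring the example above.
```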
In some cases the required skills field of a position yields no high-scoring matches. This can occur for two reasons. Either the required skills field has few technical terms, so there are no distinctive terms that retrieve any particular resume with a high score; or the required skills field contains so many requirements that no resume satisfies a sizable percentage of the terms.

Once we have a method for measuring how well a resume fits a position (goodness of fit), we can easily integrate it with the matching results obtained from structured information (band, job role, skill set, etc.). Then, as with any other matching criterion, our interface allows the user to determine the precedence of text matching in the overall prioritisation scheme. We can also let the user accept a match, even when a required matching criterion is violated, for a person with a particularly high score, as in the example above, or, alternatively, eliminate an otherwise matched person when the resume indicates a particularly low level of correspondence with that position.
6 Experiments
We experimented with a real dataset of an IBM Global Services organisation that contained 4000 employees and 1354 positions and is a partial snapshot of a typical organisation at a given time. We randomly partitioned this dataset into our primary instance, which contained 3200 employees and 1083 positions, and an auxiliary instance for the flexibility experiments, containing the remaining input. Experiments were performed on an Intel(tm) P4, 3.6 GHz machine running Linux.

Solving the primary instance took 8 seconds and produced an assignment for 263 positions. This latter number is quite reasonable, since we were using real data in which most employees were already assigned to previously created positions, and we accepted only good-quality matches. The fast run-time establishes Optimatch as suitable for a real-time decision support system. Our experiments only considered a data snapshot and did not concern time-related performance measures such as average wait time for position assignment, number of delayed positions, etc. Such considerations can be incorporated into Optimatch using its preference mechanisms.

Below we discuss experiments related to flexibility. We refer to the 263-position assignment mentioned above as the previous solution. We created four new datasets out of the primary instance: Flex-1%, Flex-5%, Flex-10% and Flex-25%, each consisting of 10 input files. Each file in Flex-n% was obtained by randomly deleting n% of the positions and employees in the primary instance, and replacing them with an equal number of positions and employees selected randomly from the auxiliary instance. We ran Optimatch on each file in each dataset twice: once requiring proximity to our previous solution (flexibility feature on) and once from scratch.

We first compare the running time and the number of matches found in the two modes of execution in Table 1. Indeed, flexibility incurs no running-time overhead, as the times reported in the second column were obtained for both methods of execution. The other columns specify the number of matches: average, minimum and maximum, taken over the 10 files in each dataset. In each entry, the left number refers to execution with flexibility and the right number to execution from scratch. Using flexibility does not induce sub-optimality; in fact, it actually produced slightly better assignments.

Finally, we address the question of proximity to the previous solution in Table 2. We measure proximity by counting the number of preserved matches divided by the number of matches we could have potentially preserved, namely those matches where both the previously assigned position and the previously assigned employee remained in the input file. Not surprisingly, it is harder to stay close to a previous solution when the changes are bigger. However, even when 25% of the input is changed, almost 90% of the previous matches were maintained without impairing the quality of the assignment.

Table 1  Results: running time and matches

Input      Time (sec)   Matches-avg.   Matches-min.   Matches-max.
Flex-1%    8–9          264.3/263.9    262/261        268/267
Flex-5%    8–9          266/264.9      260/258        278/276
Flex-10%   8–9          265/264.6      259/259        271/269
Flex-25%   9–10         271.9/271.6    257/257        283/282
Table 2  Proximity to previous solution

Input      Proximity-avg.   Proximity-min.   Proximity-max.
Flex-1%    0.94             0.93             0.97
Flex-5%    0.92             0.90             0.936
Flex-10%   0.90             0.88             0.94
Flex-25%   0.877            0.832            0.91
7 Concluding remarks
We described a CP approach to the ID&Assign problem of highly skilled professionals. CP has many advantages over traditional OR methods in solving this problem, most notably its separation between problem modelling and algorithmic foundations, which enables easy modelling of complex rules, and rapid adjustment to newly created constraints. Optimatch demonstrates the applicability of CP to the problem, and shows that large industrial-scale problems can be solved with near-optimal results and with real-time performance. It is aimed at automating the tedious and repetitive tasks performed manually by resource deployment managers, while allowing them to concentrate on real decision-making. As such, our main direction of current development is in modelling and solving complex CSPs that arise when building coherent teams of professionals for assignment to large projects.
References

Apache Lucene. Available online at: http://lucene.apache.org/java/docs/index.html

Dechter, R. (2003) Constraint Processing, Morgan Kaufmann Publishers, San Francisco.

Gilat, D., Landau, A., Ribak, A., Shiloach, Y. and Wasserkrug, S. (2006) 'Swops – shift work optimized planning and scheduling', Proceedings of the 6th International Conference on the Practice and Theory of Automated Timetabling (PATAT), pp.518–523.

Gresh, D.L., Connors, D.P., Fasano, J.P. and Wittrock, R. (2007) 'Applying supply chain optimization techniques to workforce planning problems', IBM Journal of Research and Development, Vol. 51, Nos. 3/4, pp.251–262.

Hatcher, E. and Gospodnetic, O. (2004) Lucene in Action, Manning Publications.

Munaf, D. and Tester, B. (1993) And/Or Parallel Programming in Practice, Technical Report WP12:1203, British Telecom Research Lab, Project 1251, London, UK.

Naveh, Y., Richter, Y., Altshuler, Y., Gresh, D.L. and Connors, D.P. (2007) 'Workforce optimization: identification and assignment of professional workers using constraint programming', IBM Journal of Research and Development, Vol. 51, Nos. 3/4, pp.263–279.

Richter, Y., Freund, A. and Naveh, Y. (2006) 'Generalizing alldifferent: the somedifferent constraint', Proceedings of Principles and Practice of Constraint Programming (CP), pp.468–483.

Salton, G. and Michael, J. (1986) Introduction to Modern Information Retrieval, McGraw-Hill.

Salton, G., Wong, A. and Yang, C.S. (1975) 'A vector space model for automatic indexing', Communications of the ACM, Vol. 18, No. 11, pp.613–620.

Similarity Class. Available online at: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html

van Hentenryck, P. (1999) The OPL Optimization Programming Language, MIT Press, Cambridge, MA.

van Hentenryck, P., Michel, L. and Deville, Y. (1997) Numerica: A Modeling Language for Global Optimization, MIT Press, Cambridge, MA.

Verfaillie, G. and Jussien, N. (2005) 'Constraint solving in uncertain and dynamic environments: a survey', Constraints, Vol. 10, No. 3, pp.253–281.

Yang, R. (1996) 'Solving a workforce management problem with constraint programming', Proceedings of the 2nd International Conference on the Practical Application of Constraint Technology, pp.373–387.