A variable neighbourhood search algorithm to generate piano fingerings for polyphonic sheet music
Matteo Balliauw¹, Dorien Herremans¹, Daniel Palhazi Cuervo¹, and Kenneth Sörensen¹
¹ University of Antwerp, Prinsstraat 13, 2000 Antwerp, Belgium
Abstract. A piano fingering indicates which finger should play each note in a piece. Such a guideline is very helpful for both amateur and experienced players in order to play a piece fluently. In this paper, we propose a variable neighbourhood search algorithm to generate piano fingerings for complex polyphonic music, a frequently encountered case that was ignored in previous research. The algorithm takes into account the biomechanical properties of the pianist's hand in order to generate a fingering that is user-specific and as easy to play as possible. An extensive statistical analysis was carried out in order to tune the parameters of the algorithm and evaluate its performance. The results of computational experiments show that the algorithm generates good fingerings that are very similar to those published in sheet music books.
Keywords: Metaheuristics, OR in music, Piano fingering generation, Variable neighbourhood search, Combinatorial optimisation
Author's version. The official version appears in print in International Transactions in Operational Research, October 2015.
1 Introduction
The piano is an instrument with one of the widest ranges of playable notes, spanning over seven octaves. It offers tremendous musical opportunities to both the pianist and the composer. Unfortunately, a pianist only has ten fingers to play the piano's 88 keys. Therefore, it can be very helpful to have a piano fingering available, especially for beginner pianists. A piano fingering indicates which finger should play each note in a piece. Any finger could be used to play any note on a piano, contrary to instruments such as the saxophone or the recorder, for which the fingerings are fixed. Depending on the melodic and harmonic properties of piano pieces, some fingerings are easier or better suited than others. The quality of a piano fingering depends to some degree on the pianist's expertise level and biomechanical properties (e.g., the maximum distance between two notes that can be played with the same hand varies widely between pianists), as well as the composer's desired interpretation (e.g., accented notes should be played with 'strong' fingers) (Sloboda et al., 1998). There is widespread agreement that a good piano fingering is crucial to allow a pianist to play a piece fluently (Parncutt et al., 1997). However, deciding on the right fingering for a piece of music can be a complex, time-consuming and burdensome task (Hart et al., 2000). Automation of this process could help pianists, both novices and experts (Yonebayashi et al., 2007a). For the former, finding a fingering that allows them to play a piece fluently is difficult in its own right. The latter could save valuable time by having an automatically determined fingering, possibly consisting of different alternatives with different expressive characteristics (Gellrich & Parncutt, 1998; Parncutt et al., 1999).
Generating a piano fingering for a specific piece can be seen as a combinatorial optimisation problem. This is characterised by discrete decision variables that assign a specific finger to each note, an objective function that indicates the difficulty of the fingering, and a set of constraints to ensure the fingering is playable by a human. In this paper, we formulate the generation of a piano fingering as such a combinatorial optimisation problem and propose an effective and efficient algorithm to solve it. This algorithm and its parameter settings are thoroughly tested in two computational experiments. The approach proposed in this paper goes further than the current state of the art in that it is able to handle complex polyphonic music, determine a fingering for both hands, and allow for user-specific input.

In the next section, an overview of the current literature on the generation of piano fingerings is given. In Section 3, finding a piano fingering for polyphonic music is defined as a combinatorial optimisation problem and a metric for evaluating the difficulty of a fingering is described. In Section 4, the developed variable neighbourhood search algorithm is discussed in detail. Section 5 evaluates the performance of the algorithm and describes the computational experiments carried out to tune its parameters. Section 6 shows example outputs and discusses the results. Conclusions and recommendations for further research are given in Section 7.
2 Literature review
The first exercises for learning how to select piano fingerings were published in the 18th century. C.P.E. Bach was one of the first to publish such work. He was followed by many others in the 19th century (Gellrich & Parncutt, 1998). The first mathematical formulations and algorithms for this problem were proposed at the end of the 20th century. Similar research can be found for other instruments: e.g., automatic string instrument fingering by Radicioni et al. (2004).
2.1 Measuring the quality of a piano fingering
Throughout history, different rules emerged for deciding on an appropriate piano fingering. They even became repertory dependent (Clarke et al., 1997; Gellrich & Parncutt, 1998). According to Parncutt et al. (1997), the quality of a fingering can be measured by three objectives. The first objective is the ease of playing, which is related to the size and biomechanics of the hand. A second objective is the ease of memorisation, as some musicians can memorise certain combinations more easily than others. Finally, the third objective is musical expressiveness: a good fingering should help the pianist to convey the musical message intended by the composer. In order to comply with the first objective, any algorithm that generates piano fingerings should take the personal biomechanics of a pianist into account. Robine (2009) shows that the chosen fingering influences the musical execution and expressiveness, as considered by the third objective. This author gives the specific example of small lags between two notes, caused by passing the thumb underneath another finger. Because of the aforementioned, different pianists choose different fingerings for the same piece, as proven by Parncutt et al. (1997, p. 369-372). The third criterion is especially important for more advanced pianists, since different fingering combinations open up the possibility to express certain musical interpretations. When deciding on a fingering, a trade-off between these three criteria should be made (Clarke et al. 1997; Sloboda et al. 1998, p. 185). According to Clarke et al. (1997, p. 100), however, some details of fingerings are subject to personal preferences. As a result, we believe that they cannot all be captured by the model and that manual adaptations afterwards may be required.

Quantifying the quality of a fingering is one of the most important aspects in defining the generation of a piano fingering as a combinatorial optimisation problem. One of the first papers to introduce a cost-minimising paradigm to quantify the difficulty of a piano fingering is due to Parncutt et al. (1997). The higher the cost of a fingering, the more difficult it is to play the piece using the specified fingers. The authors increase the cost of a fingering based on rules that penalise difficult sequences and distances in a fingering. This set of rules is also applied
by Parncutt (1997) and Lin & Liu (2006) and expanded by Jacobs (2001). An important feature of this cost-minimising paradigm is that it uses a set of complementary rules. This makes it possible to identify certain trade-offs between different origins of difficulty and assign a weight to them. The rules from this approach are interpretable and have a musical/biomechanical origin, contrary to machine learning methods that learn a transition matrix based on a monophonic fragment (Robine, 2009; Hart et al., 2000) or short fragments with simple polyphonic chords (Al Kasimi et al., 2005, 2007). Yonebayashi et al. (2007a) learned Hidden Markov Models [HMM], whereby the fingering with the highest probability is considered to be the best alternative. Nakamura et al. (2014) implement an HMM for simple polyphonic music. They state that the advantage of this approach is that it differentiates between the left and right hand without needing a predetermined separation between notes in the right-hand and left-hand staves.
2.2 Solution strategies
The solution strategies found in the literature for finding good piano fingerings differ widely. Parncutt et al. (1997), Al Kasimi et al. (2005, 2007) and Lin & Liu (2006) make use of the Dijkstra algorithm. This algorithm calculates the shortest path in a graph by retaining the minimal total cost paths through nodes containing the previous, current and following finger used. The total cost is the accumulation of the costs of all previous nodes on the path, increased with the cost of the current node. At the last node, the shortest path is found. Moreover, because the algorithm was applied to a setting in which a considerable number of solutions that could not be played legato (i.e., smoothly connected) were excluded from the solution space, Parncutt et al. (1997) claimed that its execution time had been reduced from exponential to polynomial. This approach, however, is not well suited to deal with complex polyphonic music in which simultaneous notes have different start and end points, because this would result in an exponentially large number of nodes. It is also possible that a complex polyphonic piece cannot or should not be played legato; a similar algorithm would then result in an empty solution space. In general, complex polyphony with different simultaneous melody lines complicates the application of any algorithm using networks or graphs.

Hart et al. (2000) and Robine (2009) use dynamic programming algorithms with characteristics very similar to those of the Dijkstra algorithm. This alternative approach finds the shortest path in a graphical network, using transition matrices and calculating backwards from the end to the beginning. Each node n indicates the finger used for the n-th note. For the last node, the cheapest transition to reach it from all previous possible nodes is calculated. This is repeated for the previous node, considering the sum of the previously calculated score of the best transitions to the last node and the score of this additional transition. The algorithm calculates its way back to the first node by repeating this action N − 1 times for a total of N nodes. The first node corresponding to the lowest total cost is chosen as the solution. From then on, the best transitions have to be chosen from the previously explored transitions. Similar to this approach, Yonebayashi et al. (2007a) and Nakamura et al. (2014) use the Viterbi Algorithm [VA] to find the optimal solution from an HMM for monophonic music. As Forney Jr (1973) states, the VA is a simplified version of forward dynamic programming to find the shortest path in a graph, which corresponds to the maximum a posteriori probability [MAP] estimation problem for a sequence of states, such as in an HMM.
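To make the dynamic-programming strategy concrete, the following sketch computes a minimum-cost fingering for a short monophonic right-hand fragment. It is only an illustration of the shortest-path idea discussed above, not the method of any of the cited papers: the transition cost function cost() is a placeholder that merely penalises repeated fingers and uncomfortable spans, whereas the cited approaches use full rule sets or learned transition matrices.

#include <array>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Placeholder transition cost between consecutive notes (MIDI pitches) played
// by fingers f1 and f2 of the right hand: penalise repeating a finger and
// deviations from a crude "comfortable" span of two semitones per finger step.
static double cost(int p1, int f1, int p2, int f2) {
    if (f1 == f2) return 4.0;
    return std::abs((p2 - p1) - 2 * (f2 - f1));
}

int main() {
    const std::vector<int> pitches = {60, 62, 64, 65, 67, 69, 71, 72};  // C major scale
    const std::size_t n = pitches.size();
    // dp[i][f]: minimal cost of fingering notes 0..i with finger f (1..5) on note i.
    std::vector<std::array<double, 6>> dp(n);
    std::vector<std::array<int, 6>> pred(n);
    for (int f = 1; f <= 5; ++f) dp[0][f] = 0.0;
    for (std::size_t i = 1; i < n; ++i)
        for (int f = 1; f <= 5; ++f) {
            dp[i][f] = 1e18;
            for (int g = 1; g <= 5; ++g) {
                const double c = dp[i - 1][g] + cost(pitches[i - 1], g, pitches[i], f);
                if (c < dp[i][f]) { dp[i][f] = c; pred[i][f] = g; }
            }
        }
    // Pick the cheapest final finger and walk the predecessor chain backwards.
    int best = 1;
    for (int f = 2; f <= 5; ++f)
        if (dp[n - 1][f] < dp[n - 1][best]) best = f;
    std::vector<int> fingering(n);
    for (std::size_t i = n; i-- > 0; ) { fingering[i] = best; if (i > 0) best = pred[i][best]; }
    for (std::size_t i = 0; i < n; ++i)
        std::printf("note %d -> finger %d\n", pitches[i], fingering[i]);
    return 0;
}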
3 Problem description
In this section, the problem of finding an optimal fingering is presented as a combinatorial optimisation problem. The decision variables are discussed below, followed by a description of the objective function and the imposed constraints. At the end of this section, the complexity of the problem is discussed.
3.1 Decision variables
A piano fingering indicates which finger should be used to play each note. In piano music, a piece consists of two staves, whereby the notes from the upper staff are played by the right hand and those from the lower staff by the left hand. This means that a solution of the piano fingering problem can be split into two sub-problems, one for each hand or staff. The convention when writing fingerings is to label every finger with a number from one to five, where the first finger is the thumb. This is illustrated in Figure 1.
Figure 1: Convention for piano fingering representations (Yonebayashi et al., 2007b).
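As a sketch of how such a solution can be represented in code (an illustrative data layout of our own, not the actual implementation described in Section 4), each note carries its position, duration and pitch, and the decision variable is simply the finger number assigned to it, kept in separate lists for the right and the left hand:

#include <cstdio>
#include <vector>

// One note of the score together with its decision variable (the finger).
struct Note {
    int startTime;   // onset, in some discrete time unit
    int duration;    // length, in the same time unit
    int pitch;       // e.g. a MIDI pitch, or the key number of Figure 2
    int finger;      // decision variable: 1 (thumb) .. 5 (little finger)
};

// A fingering for a whole piece: one independent sub-problem per hand/staff.
struct Fingering {
    std::vector<Note> rightHand;  // upper staff
    std::vector<Note> leftHand;   // lower staff
};

int main() {
    Fingering f;
    f.rightHand.push_back({0, 2, 60, 1});   // C4 held for two units, played by the thumb
    f.rightHand.push_back({2, 1, 64, 3});   // E4, middle finger
    f.leftHand.push_back({0, 4, 48, 5});    // C3 held, little finger of the left hand
    std::printf("%zu right-hand notes, %zu left-hand notes\n",
                f.rightHand.size(), f.leftHand.size());
    return 0;
}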
3.2 Objective function
In this paper, we define a set of rules that is used to evaluate the difficulty of a candidate fingering. This is done by measuring the extent to which the rules are followed. In analogy with Parncutt et al. (1997), Jacobs (2001) and Al Kasimi et al. (2005), an objective function based on an adapted version of their rules is minimised. This function penalises difficult combinations of fingers. In order to properly evaluate biomechanical characteristics of finger positions, a distance matrix is first defined.

3.2.1 Distance matrix
How easy is it to play two subsequent notes with a particular set of fingers? This depends mainly on the finger pair used and the physical distance between the notes on the piano. When using two adjacent fingers, the distance between the notes should be small. To quantify this idea, a distance matrix is defined. The convention of Parncutt et al. (1997) is used, in which six different types of distances are defined per finger pair. MinRel and MaxRel stand for minimal and maximal relaxed distances, which means that two notes separated by this physical distance on the keyboard can easily be played with this finger pair, while still maintaining a relaxed hand. MinPrac and MaxPrac define the largest distances that can be played by each finger pair. All of these distances are user-specific and can easily be adapted in the implementation of the described algorithm. The two final distances are MinComf and MaxComf. These show the distances that can be played comfortably, in the sense of not having to stretch to a maximal spread. They depend on the values of MinPrac and MaxPrac, as can be derived from Eq. (1) (Parncutt et al., 1997).

MaxComf = MaxPrac − 2   (for all finger pairs),
MinComf = MinPrac + 2   (for finger pairs including the thumb),        (1)
MinComf = MinPrac       (for finger pairs without the thumb).
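Expressed in code, Eq. (1) is a one-line derivation per bound. The small helper below (our own illustration; the field names are hypothetical) fills in the comfortable range from the user-specific practical range and reproduces the MinComf and MaxComf values of finger pair 1-2 in Table 1:

#include <cstdio>

struct FingerPairRange { int minPrac, maxPrac, minComf, maxComf; bool includesThumb; };

// Derive the comfortable range from the practical range following Eq. (1).
void deriveComfortRange(FingerPairRange& r) {
    r.maxComf = r.maxPrac - 2;
    r.minComf = r.includesThumb ? r.minPrac + 2 : r.minPrac;
}

int main() {
    FingerPairRange rh12{-10, 11, 0, 0, true};   // right-hand pair 1-2 from Table 1
    deriveComfortRange(rh12);
    std::printf("MinComf = %d, MaxComf = %d\n", rh12.minComf, rh12.maxComf);  // -8 and 9
    return 0;
}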
When evaluating a fingering, ranges of distances are required in order to know which are the largest spreads that can be attained. For this reason, minimal and maximal distances are defined to delimit the relaxed, comfortable and practical ranges. In this paper, distances are expressed in units that measure how many piano keys one note is separated from another. This is analogous to the definition of Parncutt et al. (1997) and is visualised in Figure 2. Nevertheless, the distances between some keys are altered in this research in order to solve the issue noticed by Jacobs (2001). He stated that a missing black key between E and F on the one hand, and between B and C on the other hand, does not influence the distance between these respective key pairs. Therefore we introduce two imaginary, unused keys (keys 6 and 14 in Figure 2) where a black key is missing. The numbering of the keys (or relevant pitches) continues in the next octave, in such a way that the next key (C) is attributed a value of 15. This representation system has several advantages. First, the distance between two keys can now be calculated correctly, without a bias caused by the 'missing keys'. Moreover, by calculating modulo 2, one can distinguish between black and white keys. Finally, by calculating modulo 14, the pitch class can be decoded.

Table 1: Example distance matrix for a large right hand (adapted from Parncutt et al. (1997)).

Finger pair  MinPrac  MinComf  MinRel  MaxRel  MaxComf  MaxPrac
1-2            -10       -8       1       6        9       11
1-3             -8       -6       3       9       13       15
1-4             -6       -4       5      11       14       16
1-5             -2        0       7      12       16       18
2-3              1        1       1       2        5        7
2-4              1        1       3       4        6        8
2-5              2        2       5       6       10       12
3-4              1        1       1       2        2        4
3-5              1        1       3       4        6        8
4-5              1        1       1       2        4        6
Figure 2: Distances on a piano keyboard with additional imaginary black keys.
All finger pairs involving the same finger twice (e.g., fingers 1-1) are set to zero in the distance matrix. An example of the other values in such a distance matrix for a large right hand is given in Table 1 as an adaptation of Parncutt et al. (1997), taking into account the newly defined keyboard distances. For pianists with smaller hands, tables with reduced absolute values of MinPrac, MaxPrac, MinRel and MaxRel are available in the developed software. Alternatively, users could adapt them according to their personal preferences, as was also suggested by Al Kasimi et al. (2005, 2007). Interpreting the values in Table 1 is straightforward in combination with the keyboard image in Figure 2. A value of -10 for MinPrac R(1-2) means that when the thumb is put on A4 (11), the index finger of a large right hand can be stretched no further than C4 (1), because 1 − 11 equals -10. In order to calculate the distances for the finger pair in which the order of the fingers used in first and second position is swapped, Min and Max have to be interchanged and the values have to be multiplied by -1. E.g., to calculate MinPrac R(2-1) from finger 2 to 1 of the right hand: take MaxPrac R(1-2) (11) and multiply by -1. This gives MinPrac R(2-1) = −11. To deduce the information for the left hand, the order of the finger pair needs to be swapped (e.g., 1-2 becomes 2-1). E.g., MinPrac L(2-1) = MinPrac R(1-2) = −10 (Parncutt et al., 1997).
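The key-numbering scheme of Figure 2 is easy to implement. The sketch below converts MIDI pitches to the 14-positions-per-octave representation (including the imaginary keys 6 and 14), distinguishes black from white keys by parity, and reproduces the MinPrac R(1-2) example above. Anchoring the numbering on MIDI pitch, with C4 = key 1, is our own illustrative choice; the paper does not prescribe a particular input encoding.

#include <cstdio>

// Map a MIDI pitch to the key numbering of Figure 2 (14 positions per octave,
// including the imaginary black keys 6 and 14). C4 (MIDI 60) maps to key 1.
static int keyNumber(int midiPitch) {
    static const int position[12] = {1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13};  // C..B
    const int octave = midiPitch / 12 - 5;            // octaves relative to C4
    return octave * 14 + position[midiPitch % 12];
}

static bool isBlackKey(int key) { return key % 2 == 0; }  // black keys have even numbers

int main() {
    const int thumbKey = keyNumber(69);   // A4 -> key 11
    const int indexKey = keyNumber(60);   // C4 -> key 1
    // Signed distance as used in the distance matrix: second key minus first key.
    std::printf("A4 = %d, C4 = %d, distance thumb->index = %d\n",
                thumbKey, indexKey, indexKey - thumbKey);              // -10, cf. MinPrac R(1-2)
    std::printf("C#4 is a %s key\n", isBlackKey(keyNumber(61)) ? "black" : "white");
    return 0;
}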
3.2.2 Rules
The previously defined distance matrix is used to calculate how well a fingering adheres to the set of fingering rules. These rules are listed in Table 2, which includes a description of each rule and, in parentheses, its application: whether the rule is relevant in the context of monophonic music, polyphonic music or both. Rules 1, 2, and 13 take the distance matrix into account for consecutive notes. These rules can be applied to each pair of subsequent notes, both for monophonic and polyphonic music. To prevent unnatural finger crossings in chords, rules 1, 2, and 13 are also applied within a chord in rule 14 to avoid awkward positions. An example of such a position would be the right index finger placed on a lower note than the right thumb within one chord. Hence, within rule 14, rules 1 and 2 are applied with doubled scores to account for the importance of this natural hand position within one chord. It is important to point out that Parncutt et al. (1997) use the distances MinPrac and MaxPrac in rule 13 as hard constraints: these minimal and maximal distances of each finger pair are not to be violated. If no feasible fingering is found, then that particular (monophonic) part cannot be played legato. In this paper, these practical distances are implemented as soft constraints in the objective function, but they are assigned very high penalties per unit of violation (e.g., +10). This makes it easier to apply the set of rules of Parncutt et al. (1997) to polyphonic music; otherwise, it would be very hard for the algorithm to find a feasible initial solution.

Rules 3 and 4 also consider the distance matrix and account for hand position changes, which should be limited to necessary cases only (Parncutt et al., 1997). Rules 5 to 11 prevent difficult transitions in monophonic music. The original scores of rules 8 and 9 could add up to five and four points, respectively, between one pair of notes, which is relatively high compared to the score of one or two resulting from the application of the other rules. Therefore, the score obtained by these rules is divided by two compared to the original suggestion made by Parncutt et al. (1997). In addition, the use of the fourth finger should not always be avoided, since the pianist might want to train this finger. Hence, the user is advised to change the score in rule 5 to zero, which is also the default setting for our experiments. However, since the use of fingers three and four in any consecutive order is complicated by their biomechanical connection, rules 6 and 7 are retained in order to prevent this difficult combination (Leijnse, 2014). Rule 12 prevents the repetitive use of a finger in combination with a hand position change. For instance, a sequence C4–G4–C5 fingered 2–1–2 in the right hand forces the pianist to reuse the second finger very quickly, which becomes problematic in fast pieces. Therefore, the finger combination 2-1-3, which is favoured by rule 12, offers a better solution. Rule 15 tries to prevent hand position changes when notes with the same pitch are played consecutively. If this rule were not implemented, consecutive notes with the same pitch could be played by different fingers. The logic of the rules and their relative scores were obtained from discussions with professional musicians (Eeman, 2014; Swerts, 2014; Chew, 2014; Leijnse, 2014).
The scores resulting from the application of each rule are summed for each hand to form the total difficulty score per hand for a candidate fingering, as shown in Eq. (2). This sum forms the objective function of the piano fingering problem and should be minimised in order to find an easily playable fingering. For each hand, one objective function is defined, because each hand can be optimised independently.

min f(s) = Σ_i f_i(s)        (2)
An advantage of this approach is that it allows the user to change the score of different rules and thus change their relative importance. For instance, pianists who do not like to use the fourth finger might set the score of rule 5 back to one instead of zero.
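To make the cost evaluation concrete, the sketch below applies two distance-based penalties, in the spirit of rules 1 and 2 in Table 2, to a candidate right-hand fingering. The two per-unit weights are placeholders of our own; the three finger-pair ranges are copied from Table 1. The full implementation evaluates the complete rule set of Table 2 with the user-specific distance matrix.

#include <cstdio>
#include <map>
#include <utility>
#include <vector>

struct Range { int minRel, maxRel, minComf, maxComf; };

// Illustrative entries of the right-hand distance matrix (cf. Table 1);
// only the finger pairs needed for the example fragment are listed.
static const std::map<std::pair<int, int>, Range> kRanges = {
    {{1, 2}, {1, 6, -8, 9}},
    {{2, 3}, {1, 2, 1, 5}},
    {{3, 5}, {3, 4, 1, 6}},
};

// Per-unit penalties for leaving the comfortable and the relaxed range.
// These weights are placeholders, not the calibrated scores of the paper.
static const double kComfWeight = 2.0, kRelWeight = 1.0;

static double pairCost(int key1, int f1, int key2, int f2) {
    const auto it = kRanges.find({f1, f2});
    if (it == kRanges.end()) return 0.0;               // unlisted pairs ignored in this sketch
    const Range& r = it->second;
    const int d = key2 - key1;                          // signed distance in keys (Figure 2)
    double cost = 0.0;
    if (d < r.minComf) cost += kComfWeight * (r.minComf - d);
    if (d > r.maxComf) cost += kComfWeight * (d - r.maxComf);
    if (d < r.minRel)  cost += kRelWeight  * (r.minRel - d);
    if (d > r.maxRel)  cost += kRelWeight  * (d - r.maxRel);
    return cost;
}

int main() {
    // A candidate fingering: (key number, finger) for four consecutive notes.
    const std::vector<std::pair<int, int>> notes = {{1, 1}, {9, 2}, {11, 3}, {18, 5}};
    double total = 0.0;
    for (std::size_t i = 1; i < notes.size(); ++i)
        total += pairCost(notes[i - 1].first, notes[i - 1].second,
                          notes[i].first, notes[i].second);
    std::printf("difficulty score of this fingering: %.1f\n", total);  // lower is easier
    return 0;
}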
Table 2: Set of rules composing the objective function (adapted from Parncutt et al. (1997), Jacobs (2001) and Al Kasimi et al. (2007)).

Rule 1 (all music): For every unit the distance between two consecutive notes is below MinComf or exceeds MaxComf.
Rule 2 (all music): For every unit the distance between two consecutive notes is below MinRel or exceeds MaxRel.
Rule 3 (monophonic): For three consecutive notes: if the distance between the first and third note is below MinComf or exceeds MaxComf, add one point. In addition, if the pitch of the second note lies between the other two pitches, is played by the thumb, and the distance between the first and third note is below MinPrac or exceeds MaxPrac, add another point. Finally, if the first and third note have the same pitch but are played by a different finger, add another point.
Rule 4 (monophonic): For every unit the distance between a first and third note is below MinComf or exceeds MaxComf.
Rule 5 (monophonic): For every use of the fourth finger.
Rule 6 (monophonic): For the use of the third and the fourth finger consecutively.
Rule 7 (monophonic): For the use of the third finger on a white key and the fourth finger on a black key consecutively, in any order.
Rule 8 (monophonic): When the thumb plays a black key, add a half point. Add one more point for a different finger used on a white key just before, and one extra for one just after, the thumb.
Rule 9 (monophonic): When the fifth finger plays a black key, add zero points. Add one more point for a different finger used on a white key just before, and one extra for one just after, the fifth finger.
Rule 10 (monophonic): For a thumb crossing or passing another finger on the same level (white–white or black–black).
Rule 11 (monophonic): For a thumb on a black key crossed by a different finger on a white key.
Rule 12 (monophonic): For a different first and third consecutive note played by the same finger, with the second pitch being the middle one.
Rule 13 (all music; own rule, based on a constraint of Parncutt et al.): For every unit the distance between two consecutive notes is below MinPrac or exceeds MaxPrac.
Rule 14 (polyphonic; own rule, based on Al Kasimi et al.): Apply rules 1, 2 (both with doubled scores) and 13 within one chord.
Rule 15 (all music; own rule): For consecutive slices containing exactly the same notes (with identical pitches) played by a different finger, for each different finger.
3.3 Constraints
In addition to the soft constraints defined in the objective function above, a hard constraint is defined, stating that the same finger cannot be used on two different keys played at the same time. A violation of this constraint results in a rejection of the solution.
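A minimal sketch of this feasibility check, assuming the notes sounding at a given moment in one hand are collected in a list of (key, finger) assignments:

#include <cstdio>
#include <vector>

// One sounding note: the key it occupies and the finger of the hand playing it.
struct Assignment { int key; int finger; };

// Hard constraint: within one hand, the same finger may not be used on two
// different keys played at the same time. Returns false if violated.
bool feasibleSimultaneous(const std::vector<Assignment>& sounding) {
    bool used[6] = {false, false, false, false, false, false};
    for (const Assignment& a : sounding) {
        if (used[a.finger]) return false;   // finger already occupied at this moment
        used[a.finger] = true;
    }
    return true;
}

int main() {
    const std::vector<Assignment> triad = {{1, 1}, {5, 3}, {9, 5}};  // C-E-G, fingers 1-3-5
    const std::vector<Assignment> bad   = {{1, 1}, {5, 1}};          // thumb on two keys at once
    std::printf("triad feasible: %s\n", feasibleSimultaneous(triad) ? "yes" : "no");
    std::printf("double thumb feasible: %s\n", feasibleSimultaneous(bad) ? "yes" : "no");
    return 0;
}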
3.4 Complexity of the problem
The piano fingering problem can be considered as a special case of the partial constraint satisfaction problem [PCSP] (Koster et al., 1998). The PCSP is defined by a so-called constraint graph G = (V, E), where V is the set of vertices and E is the set of edges. Each vertex v ∈ V represents a decision variable that must be assigned a value from a given domain D_v. Each value in D_v has a penalty associated to it, which is defined by a vertex-penalty function Q_v : D_v → R. An edge {v, w} ∈ E indicates that there are penalties attached to certain combinations of values for vertices v and w. These penalties are defined by an edge-penalty function P_{v,w} : {{d_v, d_w} | d_v ∈ D_v, d_w ∈ D_w} → R. In a more formal notation, the PCSP can be defined by the quadruple (G = (V, E), D_V, P_E, Q_V), where D_V is the set of domains D_v (for each vertex v ∈ V), P_E is the set of edge-penalty functions P_{v,w} (for each edge {v, w} ∈ E) and Q_V is the set of vertex-penalty functions Q_v. The objective of the PCSP is to find an assignment A = {a_v | v ∈ V, a_v ∈ D_v} that minimises the total sum of the penalties Σ_{{v,w}∈E} P_{v,w}({a_v, a_w}) + Σ_{v∈V} Q_v(a_v). This problem is known to be NP-hard as it can be reduced to the maximum satisfiability problem (Koster et al., 1998, 1999).

The piano fingering problem can be modelled as an instance of the PCSP in which each note in a piece is represented by a vertex v ∈ V. In this sense, each vertex (note) must be assigned a value (finger) from a common domain D_v = {1, 2, 3, 4, 5}. The set of rules defines the edges that connect each pair of vertices {v, w} ∈ E, the edge-penalty functions in P_E and the vertex-penalty functions in Q_V. Two vertices are connected by an edge if the notes they represent are played simultaneously or consecutively. (Note that this model of the piano fingering problem excludes complex rules that involve triplets of notes, such as Rule 4 in Table 2; the simplification is nevertheless useful to illustrate the complexity of the problem.) This condition causes the graph G that defines the PCSP instance to have a very specific structure. Nonetheless, to the best of our knowledge, it is unclear whether such properties make this kind of instances easier to solve.
4 Variable neighbourhood search algorithm
Exact algorithms are often used to solve a wide range of combinatorial optimisation problems. These algorithms guarantee to find the optimal solution. However, for complex problems, their execution time can grow exponentially with the size of the instance. For these cases, metaheuristics present a good alternative solution technique. Although they are not guaranteed to find the optimal solution, they tend to find very good solutions for complex instances in a reasonable execution time. A metaheuristic is a high-level, problem-independent algorithmic framework that provides a set of guidelines or strategies to develop heuristic optimisation algorithms (Sörensen & Glover, 2012). Heuristics are based on rules of thumb and are a frequently applied alternative to solve complex combinatorial optimisation problems. Metaheuristics are divided into three categories: constructive, population-based and local search [LS] (Sörensen & Glover, 2012; Herremans & Sörensen, 2013). LS metaheuristics form a large class of metaheuristics that are characterised by the fact that they iteratively improve a solution (called the current solution) by applying small adaptations (moves) to it. For most representations of combinatorial optimisation problems, different move types can be defined. The set of all solutions reachable from a given current solution by a single move of a given move type is called this solution's neighbourhood (Sörensen & Glover, 2012). When no more improving moves of a certain type exist in the neighbourhood of the
current solution, a local optimum is reached. A possible way to escape such a local optimum is by using a different move type, a strategy commonly called variable neighbourhood search [VNS]. This strategy is based on the fact that a local optimum relative to a certain move type is not necessarily a local optimum relative to another (Mladenović & Hansen, 1997). When all neighbourhoods are exhausted, a perturbation randomly changes part of the best solution so that the next iteration of the algorithm can continue the search (Lourenço et al., 2003). The algorithm developed in this paper is a VNS algorithm. Herremans & Sörensen (2012) also successfully applied a VNS algorithm to a musical problem, namely the automatic composition of counterpoint music. A local search metaheuristic might also prove to be successful in piano fingering, as the process of making small changes to a fingering is analogous to an artist making final adaptations to a given fingering (whether or not indicated by the composer). This approach is similar to how Sloboda et al. (1998, p. 185) describe the human process, which involves some heuristics. Moreover, since metaheuristics can generate good-quality solutions in a reasonable execution time, they offer a promising optimisation framework to generate good fingerings for complex polyphonic pieces. The implemented VNS algorithm does not require a graph-based representation of solutions, which allows it to be easily applied to polyphonic music. The complex nature of polyphonic music makes it very difficult and computationally expensive to use an exact approach. Metaheuristics offer an efficient way of solving the specified piano fingering problem.

As can be seen from the flowchart diagram in Figure 3, the algorithm starts by generating a random initial solution. Every note is assigned a finger at random. In order to obtain a feasible solution, once a finger is used in a particular time slice, this finger can no longer be assigned to another note in the same slice. A slice is a part of the piece with the duration of the shortest note of the piece. In this way, the duration of a note consists of an integer number of slices. The described process works its way through the piece from left to right. Assuming that at most five notes are to be played at the same time by one hand, a feasible solution is guaranteed. After generating the initial fingering for both staves, the right and left hand are optimised separately. An initial preprocessing step, called Swap, is performed to make a large improvement by interchanging all assignments of two fingers throughout the entire piece. This step produces a relatively small neighbourhood with a maximum size of (5 choose 2) = 10 possible moves.

The implemented local search procedure (VNS) uses different neighbourhood structures, each defined by a move type. In order to get an indication of the complexity of the algorithm, the formulas for calculating the size of each type of neighbourhood are given below. These formulas are based on a piece containing chords, each with the same number of notes; t indicates the total number of notes in the piece and s refers to how many notes are played simultaneously within one chord. The neighbourhood sizes for the example piece in Figure 4 are calculated below. Three neighbourhood structures are included in the algorithm:

Change1: The Change1 move type changes the fingering of each note to any other allowed finger. The resulting neighbourhood contains feasible fragments in which one finger is changed.
An example of this type of move is given in Figure 5(a). The size of this neighbourhood can be calculated as (5 − s)t. In the example piece in Figure 4, there would be 24 possible solutions in a Change1 neighbourhood.

Change2: For the Change2 move type, the previous Change1 move is expanded to changing the fingering of two notes that are either adjacent or simultaneously played. This results in a neighbourhood which consists of fragments with two changed fingers. For an example, we refer to Figure 5(b). The size of this neighbourhood is given by (s choose 2)[(6 − s)(5 − s) + 1] + (t/s − 1){(s choose 2)[(6 − s)(5 − s) + 1] + s²(5 − s)²}. In the example piece in Figure 4, this neighbourhood contains 192 solutions.
Figure 3: Structure of the algorithm.
Figure 4: A short example piece used to calculate sizes of the neighbourhoods, with s = 3 and t = 12.
SwapPart: A SwapPart move type is included to enhance the changeability of the fingering of simultaneous notes. Suppose a long note a, assigned finger f, is played simultaneously with multiple short notes b1, b2, ..., bk, each assigned the same finger g. Since all notes bi are fully contained within note a, none of the previous move types would allow note a to be assigned finger g, as this would break the basic feasibility rule (because one finger would be playing multiple notes at the same time). The SwapPart move, however, does allow the algorithm to swap fingers f and g, even in cases where this would otherwise not be possible without breaking feasibility. Moreover, this strategy only swaps the fingering of simultaneous notes that are fully contained within the interval of the longer note, in order to reduce the probability of arriving at an infeasible solution. An example of this move type is shown in Figure 5(c). In the figure, D5 is note a, played by finger 1. The notes F4, A4 and again F4 are played simultaneously with D5. Both F4's can be considered as notes b1 and b2, played with finger 5. The fingering of the longer note can only be changed from 1 to 5 when all simultaneous notes played with finger 5 are changed to the previous fingering of the long note. This is exactly what the SwapPart move does: it allows us to change the fingering of longer notes in a polyphonic piece. This neighbourhood has a size of 4t. A SwapPart neighbourhood for the example piece in Figure 4 would contain 48 moves.
Figure 5: Examples of a move from the different neighbourhoods: (a) Change1, (b) Change2, (c) SwapPart.
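The neighbourhood sizes given above are easy to verify numerically. The snippet below evaluates the three formulas for the example piece of Figure 4 (s = 3, t = 12) and reproduces the sizes 24, 192 and 48 mentioned in the text.

#include <cstdio>

int main() {
    const int s = 3, t = 12;                 // notes per chord and total number of notes
    const int pairsInChord = s * (s - 1) / 2;  // (s choose 2): finger pairs within one chord

    const int change1  = (5 - s) * t;
    const int block    = pairsInChord * ((6 - s) * (5 - s) + 1);
    const int change2  = block + (t / s - 1) * (block + s * s * (5 - s) * (5 - s));
    const int swapPart = 4 * t;

    std::printf("Change1:  %d\n", change1);   // 24
    std::printf("Change2:  %d\n", change2);   // 192
    std::printf("SwapPart: %d\n", swapPart);  // 48
    return 0;
}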
In order to select a solution from a neighbourhood as the new current solution, a move strategy has to be selected. In this paper, two different strategies were implemented, namely steepest and first descent. In a steepest descent strategy, the entire neighbourhood is calculated before the solution with the biggest improvement is selected. In a first descent strategy, the first solution that improves the current solution is selected. In Section 5, we study the influence of the chosen move strategy on the solution quality and the execution time. Figure 3 shows how the different neighbourhoods are chained in the VNS strategy. The algorithm continues to perform a LS in a neighbourhood until it reaches a local optimum. The algorithm then moves on to the next neighbourhood, and continues until no better solution is found. When the local optimum of the last neighbourhood in the chain (Change2) is reached, the algorithm starts from the first neighbourhood again. The flowchart shows a split after the initial Swap step. This indicates that the order of Change1 and SwapPart is chosen randomly (with a 50–50% probability) by the algorithm. The reason for this design choice is that the best order is fragment dependent. The performance of both neighbourhoods depends on the monophonic or polyphonic character of a fragment. This search strategy continues until none of the three LS neighbourhoods can find a better solution. When such a local optimum is reached, this solution is compared to the best solution up to that point. When it is better, it is stored as the new best solution and the number of iterations without improvements is reset to zero. If the current solution is not better than the best solution, the number of iterations without improvement is compared with the stopping parameter (maximum allowed number of iterations without improvement). When the maximum number
of iterations without improvement is reached, the algorithm stops. Otherwise, a perturbation is performed on the best solution found so far. This new solution is then handed back to the Swap step and a new iteration of the algorithm is initialised. The perturbation splits the piece up into different parts. A set of subsequent slices is selected in each part, and the fingerings in the selected slices are replaced by new ones. The new random fingerings are chosen only from the unused fingers per slice, in order to guarantee the feasibility of the solution. The size of the parts and the set of slices within each part are defined as a percentage of the length of the piece and of the number of slices in each part, respectively. This approach is preferred over changing the fingering of random notes spread across the piece, because it is very likely that in polyphonic music such changes are infeasible. For example, in a chord of five notes, the fingering of at least two notes has to be removed in order to obtain a feasible perturbed solution. Hence, perturbing entire sets of consecutive slices is the preferred option.

The VNS algorithm has been implemented in C++ and is available as Open Source software (http://dorienherremans.com/automatic-piano-fingerings). For inputting sheet music and outputting sheets with annotated fingerings, the MusicXML format is used. This format was designed for easy interchangeability (Good, 2001). MusicXML files can be read by most music notation software, including the open source program MuseScore (available from musescore.org), which was used to visualise the results in this paper.
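The control flow of Figure 3 can be summarised in a compact skeleton. The sketch below is structural only: Solution, the local-search routines and the perturbation are stubs operating on a dummy score, and only the chaining of the neighbourhoods, the random 50-50 ordering of Change1 and SwapPart, the update of the best solution and the stopping rule follow the description above.

#include <algorithm>
#include <cstdio>
#include <random>

// Stand-in for a fingering of one hand; the real solution stores a finger per note.
struct Solution { double score; };

double evaluate(const Solution& s) { return s.score; }            // objective of Eq. (2)

// Local-search stubs. Each stands in for "descend to the local optimum of one
// neighbourhood" (Swap, Change1, SwapPart, Change2); here they simply lower a
// dummy score, bounded below by zero.
Solution improveBy(Solution s, double d) { s.score = std::max(0.0, s.score - d); return s; }
Solution swapStep(Solution s)       { return improveBy(s, 5); }
Solution searchChange1(Solution s)  { return improveBy(s, 3); }
Solution searchSwapPart(Solution s) { return improveBy(s, 2); }
Solution searchChange2(Solution s)  { return improveBy(s, 1); }
Solution perturb(Solution s)        { s.score += 4; return s; }   // re-randomise part of the piece

Solution vns(int maxItersWithoutImprovement, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    Solution current{100.0};                                      // random initial fingering
    Solution best = current;
    int iters = 0;
    while (iters <= maxItersWithoutImprovement) {
        current = swapStep(current);                              // preprocessing step
        bool improved = true;
        while (improved) {                                        // chain the neighbourhoods
            const double before = evaluate(current);
            if (coin(rng)) {                                      // 50-50 order of Change1/SwapPart
                current = searchChange1(current);
                current = searchSwapPart(current);
            } else {
                current = searchSwapPart(current);
                current = searchChange1(current);
            }
            current = searchChange2(current);
            improved = evaluate(current) < before;
        }
        if (evaluate(current) < evaluate(best)) { best = current; iters = 0; }
        else                                    { ++iters; }
        current = perturb(best);                                  // perturb the best solution
    }
    return best;
}

int main() {
    std::printf("best objective value: %.1f\n", evaluate(vns(5, 42)));
    return 0;
}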
5 Experiments
The algorithm developed in Section 4 requires decisions concerning the parameter settings of its different components (e.g., the size of the perturbation) that ensure optimal performance. Two experiments were conducted in order to base the choice of these parameters on a thorough statistical analysis. First, we describe the experiment conducted to determine the settings that are used by the intensification phase of the VNS. Afterwards, a second experiment was carried out to set the parameters of the entire VNS. Independence of these two parameter groups is assumed, an approach similar to the one used by Palhazi Cuervo et al. (2014) in their experiments, which focus on a subset of parameters. The resulting best parameters of the first experiment are implemented in the second. Setting up the experiments in this way decreases the computing time: it is reasonable to assume that the performance of the intensification phase is independent of the entire VNS framework, so the experiment can be split into two sub-experiments and, instead of having to test A · B combinations, we only have to test A + B combinations. Both experiments were run ten times on ten pieces of one to three pages in sheet music notation. All experiments were run on a 2.93 GHz Intel Core i7 processor.
5.1 Intensification phase
The goal of the first experiment is to determine the optimal values for the parameters of the intensification phase. It takes five parameters into account. The four parameters step swap and nbh X indicate the inclusion (On) or exclusion (Off) of the improvement step or the respective neighbourhood in the algorithm. A final parameter, improv strat, indicates the use of steepest descent or first descent as the improvement strategy. An overview of the tested parameter values is given in Table 3. The impact of these parameters on two dependent variables is studied: the solution quality and the computing time. The quality of the solution is measured through the summed scores of the objective functions for the left and right hand, denoted Objective function. The variable Runtime is equal to the total computing time (user time) used for one test run. A full factorial experiment was conducted on ten different classical input pieces from varying style periods. For every piece, 2^5, or 32, executions of the algorithm were performed. This was
repeated ten times for every piece. Since we considered a benchmark set of ten pieces, a total of 3200 observations was obtained.

Table 3: Tested values of the parameters for the intensification phase experiment.

Parameter      Values
step swap      Off, On
nbh change1    Off, On
nbh change2    Off, On
nbh swapp      Off, On
improv strat   Steepest, First
The data generated by the experiment was analysed by means of a mixed-effects analysis of variance (ANOVA) using the statistical software JMP. The model considers the name of the piece as a random effect and includes second-degree interaction effects. The p-values of the F-tests listed in Table 4 indicate the significance of the impact of the different parameters. The table also includes parameters with interesting, significant interaction effects. This table shows that the inclusion of the preprocessing step and the activation of each neighbourhood have a significant impact on the performance of the algorithm in terms of the Objective function. The mean plots in Figure 6 show that including a component in the algorithm has a positive impact on its performance.

When examining the influence of the improv strat parameter, it is important to notice that using steepest or first descent has no impact on the Runtime, nor on the Objective function (see Table 4). This was verified by a non-significant first-order effect in the models where only first-order effects were included. When looking at the impact of some significant interaction effects on the execution time in the extended model (see Figure 7), it becomes clear that the impact of the improvement strategy on the Runtime is neighbourhood-dependent. In Change1, first descent appears to be the fastest strategy, but in Change2, steepest descent is significantly faster. Figure 8 shows the evolution of the score over time for a run of the algorithm with steepest versus first descent. It is apparent that first descent requires more moves to arrive at a similar final solution, as the improvements are smaller than those of steepest descent. These smaller steps, however, generally take less time, as the algorithm does not have to search through the entire neighbourhood. As a result, first descent improves faster at the beginning of a run. However, the described inverse relationship between the improving capacity of a move resulting from an improvement strategy on the one hand, and the time it requires on the other hand, causes both strategies to converge after a while and obtain similar results, which is confirmed by the high p-value of this parameter.

Based on the analysis above, the Swap step and every neighbourhood are included in the optimal design of the algorithm. The chosen improvement strategy is steepest descent, as it

Table 4: p-Values for an extract of the F-tests in the ANOVA model of Objective function and Runtime for the intensification phase experiment.
Parameter
Objective function