Ancient Music Recovery for Digital Libraries

J. Caldas Pinto *, P. Vieira *, M. Ramalho *, M. Mengucci **, P. Pina ** and F. Muge **

* IDMEC/IST - Technical University of Lisbon - Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, PORTUGAL
** CVRM/Centro de Geo-Sistemas - Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, PORTUGAL

Abstract. This paper describes the “ROMA” (Reconhecimento Óptico de Música Antiga, or Ancient Music Optical Recognition) Project and its current state. The project consists of building an application for the recognition and restoration of ancient music manuscripts (from the XVI to the XVIII century). Beyond the inventory of the musical collections of the Biblioteca Geral da Universidade de Coimbra, the project aims to develop algorithms for score restoration and musical symbol recognition, in order to allow a suitable representation and restoration in digital format. Both objectives have an intrinsic research nature, one in the area of musicology and the other in digital libraries.

Introduction

Most national and foreign libraries face the problem of keeping and administering rich collections of musical manuscripts, mainly from the XVI to the XIX centuries. Much of that music was written by important composers. There is a growing interest in those manuscripts from both musicologists and Ancient Music performers, national and foreign. However, libraries have two main problems to tackle: document preservation and edition. At present, libraries are powerless against the degradation of their collections. Reading the scores is becoming increasingly difficult, due to the continuous degradation of the physical documents caused by internal (paper, ink, etc.) and external (light radiation, heat, pollution) factors. In addition, it is frequently necessary to photocopy those manuscripts, which is itself a harmful operation. This project has a twofold objective: inventorying the musical collections of the Biblioteca Geral da Universidade de Coimbra, and developing algorithms for score restoration and musical symbol recognition, naturally for a suitable and significant class of scores. Both objectives have an intrinsic research nature, one in the area of musicology and the other in digital libraries.

Statement of the Problem

ROMA (Ancient Music Optical Recognition) is a project intended to recover ancient music score manuscripts, in order to obtain a digital music heritage that is easy to manage, easy to conserve and, last but not least, easy to handle. Optical Music Recognition (OMR) is the process of identifying music from an image of a music score. The music scores under consideration are, most of the time, on paper. Identifying music means constructing a model of the printed music, in some format that enables the score to be re-printed on paper or even played by a computer. These formats capture the semantics of the music (note pitches and durations) instead of the image of a music score, bringing, among others, the following advantages: (i) the music occupies considerably less space; (ii) it can be printed over and over again, without loss of quality; (iii) it can be easily edited (with a proper editor). These advantages will bring self-correction capabilities to the system under development. OMR has some similarities with OCR (Optical Character Recognition). In OMR, instead of discovering what character is in the image, the aim is to discover what musical symbol is in the image (including notes, rests, clefs, accidentals, etc.). However, OMR cannot be supported by a dictionary, although some grammatical rules can help to resolve misunderstood signs.

Ancient music recognition raises additional difficulties:
(i) notation varies from school to school and even from composer (or copyist) to composer (or copyist);
(ii) simple (but important) and sometimes even large changes in notation occur within the same score;
(iii) staff lines mostly do not have the same height, and are not always straight;
(iv) symbols were written with different sizes, shapes and intensities;
(v) the relative size of the different components of a musical symbol can vary.

As some documents were hand-written, additionally:
(i) more symbols are superimposed in hand-written music than in printed music;
(ii) different symbols can appear connected to each other, and the same musical symbol can appear in separated components;
(iii) paper degradation requires specialised image cleaning algorithms.

Perhaps because of these difficulties, attempts to tackle this problem are sparse in the literature ([1], [5]).

Basic Process of OMR

The most common and simple approach to OMR, found in most of the literature, is a set of stages composed in a pipelined architecture: the system works from stage to stage, from the first to the last, each stage producing results that are the inputs of the next stage. The most common stages are:

• Pre-processing of the image: corrections at the level of image processing to simplify the recognition. Includes image rotation (to straighten staff lines), binarization (transforming a coloured image into a black and white one) and image cleaning (in ROMA, this is a key issue: degradation of the paper requires great investment in image cleaning).
• Removal of non-musical symbols. This stage consists of removing symbols that are not relevant to the music.
• Identification (and removal) of staff lines: localisation of the lines in the image and their deletion, to obtain a new image having only musical objects ([1], [2] and [5]).
• Object segmentation. This is the process of identification (recognition) of simple objects, like blobs or lines, that are part of a musical symbol, isolating them and constructing an internal model of the objects ([6]).
• Object reconstruction. The stage in which the isolated simple objects are assembled into all sorts of musical symbols (notes, rests, etc.). For this process it is usual to use DCGs (Definite Clause Grammars) that describe each musical symbol through its components [9].
• Constructing the final representation. The final stage transforms the identified musical objects into a musical description format, such as NIFF or MIDI ([10]).

The work that has already been done in this project is related to the stages of identification and removal of staff lines and object segmentation. We present the developments next.

Image Pre-processing

A pre-processing approach mainly based on mathematical morphology operators [10][11] was developed.

Input: The input consists of true colour images of approximately 2100x1500 pixels. They contain lyrics and music scores printed in black, over a light yellow background, decorated with blue, red and gold signs like “illuminated letters” (figure 1a) or with particular notations within the music scores (figure 1b). In addition, the background of the images reveals, with lighter intensity, the printed signs of the verso (the other side of the page), apart from other normal dirt resulting from natural causes or human handling.

Output: The segmentation/classification of the several different components of the initial coloured images will produce binary images.

Steps: The developed algorithm consists of 2 steps, described in the following: (1) segmentation of coloured signs, (2) segmentation of music scores.

1. Segmentation of Coloured Signs

The colour images are converted from the Red-Green-Blue (RGB) to the Hue-Intensity-Saturation (HIS) colour space, which makes it easier to classify the coloured signs and to extract separately the important marks, which are simply black signs (notes, lyrics and staff lines). Since the coloured signs present strong colours, their segmentation is quite simple and mainly based on the combination of simple thresholdings on the Hue and Saturation channels. The sets corresponding to the previously identified main colours (red, gold-yellow and blue) are separated into three different binary images. The application of these masks to the images of figure 1 is shown in figure 2.
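The thresholding on the Hue and Saturation channels can be sketched as follows. This is only an illustration under stated assumptions: the paper gives no threshold values, so the hue windows and the saturation floor below are hypothetical, and Python's standard `colorsys.rgb_to_hsv` is used as a stand-in for the HIS conversion (only hue and saturation are used).

```python
import colorsys

# Hypothetical hue windows (in degrees) and saturation floor; not from the paper.
HUE_RANGES = {
    "red": [(0, 20), (340, 360)],
    "gold": [(35, 65)],
    "blue": [(200, 260)],
}
MIN_SATURATION = 0.5


def classify_pixel(r, g, b):
    """Return 'red', 'gold', 'blue' or None for an 8-bit RGB pixel."""
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < MIN_SATURATION:
        return None  # weakly coloured: background or black ink
    hue_deg = h * 360.0
    for name, ranges in HUE_RANGES.items():
        if any(lo <= hue_deg < hi for lo, hi in ranges):
            return name
    return None


def colour_masks(image):
    """image: rows of (r, g, b) tuples -> one binary mask per colour."""
    masks = {name: [[0] * len(row) for row in image] for name in HUE_RANGES}
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            name = classify_pixel(r, g, b)
            if name is not None:
                masks[name][y][x] = 1
    return masks
```

For example, `colour_masks(page)["red"]` would be the binary image of the red signs for a page given as rows of RGB tuples. On real scans the thresholds would have to be tuned to the images.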



Fig. 1. Coloured signs in musical scores

This information is very useful in the next step, where the dark signs to be segmented must be distinguished from residues of the coloured signs.

2. Segmentation of Music Scores

Since the music scores are colourless, their segmentation is performed on the image of the Intensity channel. After the application of a smoothing (median) filter to remove local noise, the morphological gradient is used as the basis to construct, by dilation, a mask that covers all the significant structures, i.e., the structures that contrast most with the background. Within this binary mask, the segmentation of the music scores is obtained through the reconstructed gradient approach developed to segment pages in books of the Renaissance [12]. Since the application of the reconstructed gradient approach is limited to the mask zone, most of the noise located far from the music scores is suppressed.
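The construction of the mask (median smoothing, morphological gradient, dilation) can be illustrated with a small pure-Python sketch. The 3x3 structuring element and the `gradient_threshold` value are assumptions made for the example, and the reconstructed gradient approach of [12] that is applied inside the mask is not reproduced here.

```python
def neighbourhood_op(image, reducer):
    """Apply reducer to the 3x3 neighbourhood of every pixel (clipped at borders)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = reducer(values)
    return out


def median(values):
    return sorted(values)[len(values) // 2]


def structure_mask(intensity, gradient_threshold=50):
    """Binary mask covering the structures that contrast with the background."""
    smoothed = neighbourhood_op(intensity, median)   # remove local noise
    dilated = neighbourhood_op(smoothed, max)        # grey-level dilation
    eroded = neighbourhood_op(smoothed, min)         # grey-level erosion
    # Morphological gradient: dilation minus erosion.
    gradient = [[d - e for d, e in zip(d_row, e_row)]
                for d_row, e_row in zip(dilated, eroded)]
    binary = [[1 if g > gradient_threshold else 0 for g in row]
              for row in gradient]
    return neighbourhood_op(binary, max)             # dilate the binary marker
```

An isolated dark pixel is removed by the median filter and therefore leaves no trace in the mask, while a larger dark structure produces a strong gradient along its contour and is covered by the dilated mask.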



Fig. 2. Segmentation of coloured signs: (a) gold-yellow, and (b) red

The application of the developed methodology to a full coloured page (figure 3a) is presented in figures 3b and 3c, respectively for the music scores and for the coloured signs.

Identification and Removal of Staff Lines

A method for identifying staff lines has already been implemented and applied to a musical manuscript from the XVI century, with promising results. The algorithm is as follows:

Input: a binary image having an unknown number of staff lines.

Output: a coloured image, with the staff lines marked with a different colour. (Not only the centre of each line is marked: all the black pixels identified as belonging to staff lines are coloured. The pixels belonging to other objects that superimpose with the staff lines are not coloured.)

Steps: The general algorithm has 2 steps: (1) finding the staff lines, and (2) marking the staff lines. These are described next.

Note: For this objective (detecting the staff lines), a simple Hough transform technique could have been used. However, we find our method more adaptable and extensible with regard to future work on staff images where lines are not completely straight (having slight curvatures) and have different heights across the image (see [1]).




Fig. 3. (a) Full coloured page of a musical score; (b) Segmented music scores after subtraction of coloured signs; (c) Segmented coloured signs with vivid colour (blue, red, yellow)

1. Finding the Staff Lines

The first step consists of finding the centre of each line belonging to a staff line. This is done by analysing the horizontal projection of the black pixels. The horizontal projection is the count of black pixels for each line (row) of the image. A perfect line of a staff should correspond to a line of the image that has only black pixels. We consider that the lines with the biggest count of black pixels (the greatest projection) belong to staff lines. We define a threshold, which we call the projection threshold: lines whose projection is greater than that threshold are considered to belong to staff lines. How this threshold is found is explained later. For now, what matters is that around each line of a staff there will be an area of consecutive image lines whose projection exceeds the projection threshold. We mark the centre of each of these areas as the centre of a line (figure 4). The decision process is as follows:

For all l1, l2 (where l1 and l2 are lines of the image, and l2 > l1),
If for all l such that l1 < l < l2, projection(l) > projection_threshold,
Then c = l1 + ((l2 - l1) / 2) is the centre position of a line.
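The decision process above can be sketched in a few lines of Python. The function names are illustrative, and the image is assumed to be a list of rows with 1 for black pixels. The run boundaries used here are the first and last rows of each area, which gives the same centre as the exclusive bounds l1 and l2 of the text (up to integer rounding).

```python
def horizontal_projection(image):
    """Number of black pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def line_centres(projection, projection_threshold):
    """Centre row of every maximal run of rows whose projection exceeds the threshold."""
    centres = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last row.
    for row, value in enumerate(list(projection) + [float("-inf")]):
        if value > projection_threshold and start is None:
            start = row
        elif value <= projection_threshold and start is not None:
            end = row - 1
            centres.append(start + (end - start) // 2)
            start = None
    return centres
```

With a projection of `[0, 0, 10, 10, 10, 2, 0, 10, 0, 0]` and a threshold of 4, the two areas are rows 2 to 4 and row 7, so the centres are rows 3 and 7.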

Rotating the Image. To find the angle for which the staff lines in the image are horizontal, the algorithm calculates the horizontal projection of the image, rotated between two pre-defined angles, α1 and α2, with a pre-defined interval. The best angle, αb, will be the one for which the centres of the horizontal lines have the greatest projection (number of pixels):

$\alpha_b = \arg\max_{\alpha} \{ \mathrm{MaxSum}(\alpha, \mathrm{Image}) \}$, where $\alpha_1 < \alpha < \alpha_2$, and

$\mathrm{MaxSum}(\alpha, \mathrm{Image}) = \sum_{l} \mathrm{projection}(l)$, where $l$ ranges over the lines of Image (rotated by $\alpha$) for which $\mathrm{projection}(l)$ is a local maximum of the projection and $\mathrm{projection}(l) > \mathrm{projection\_threshold}$.

Fig. 4. (a) A clip from a binary image of a hand-written music score. (b) Horizontal projection in blue, best projection threshold in green and centre of the max areas in red (from a).

(See figure 5, and compare it with figure 4: notice that the projection of figure 5 is smaller).
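A minimal sketch of this angle search is given below, under some assumptions: instead of resampling a rotated image, the coordinates of the black pixels are rotated about the image centre and counted per row; the threshold is taken as a fraction of the image width; and the end points α1 and α2 are included in the search. All names are illustrative.

```python
import math


def rotated_projection(image, angle_deg):
    """Horizontal projection of a binary image rotated by angle_deg about its centre."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    sin_a = math.sin(math.radians(angle_deg))
    cos_a = math.cos(math.radians(angle_deg))
    projection = [0] * h
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                new_y = int(round(cy + (x - cx) * sin_a + (y - cy) * cos_a))
                if 0 <= new_y < h:
                    projection[new_y] += 1
    return projection


def max_sum(projection, projection_threshold):
    """Sum of the projection over its local maxima that exceed the threshold."""
    total = 0
    for l, value in enumerate(projection):
        left = projection[l - 1] if l > 0 else 0
        right = projection[l + 1] if l + 1 < len(projection) else 0
        if value > projection_threshold and value >= left and value >= right:
            total += value
    return total


def best_angle(image, alpha1, alpha2, step, threshold_fraction=0.4):
    """Angle in [alpha1, alpha2] (sampled every step degrees) that maximises MaxSum."""
    threshold = threshold_fraction * len(image[0])
    best, best_score = None, -1
    n_steps = int(round((alpha2 - alpha1) / step))
    for k in range(n_steps + 1):
        alpha = alpha1 + k * step
        score = max_sum(rotated_projection(image, alpha), threshold)
        if score > best_score:
            best, best_score = alpha, score
    return best
```

When the lines are horizontal, their black pixels concentrate in a few rows and the local maxima are high; for any other angle the same pixels spread over several rows and the sum drops, which is the behaviour illustrated by figures 4 and 5.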



Fig. 5. (a) A clip from a binary image of a hand-written music score, with a rotation of 0.5 degrees right. (b) Horizontal projection in blue, best projection threshold in green and centre of the max areas in red (from figure 5a).

Finding the Projection Threshold. We use two different projection thresholds in the algorithm. For rotating the image, we use a pre-defined threshold. This value is fixed because it is not very relevant for finding the best angle; 40% is the best value for the images currently being handled. For finding the staff lines themselves, we execute an algorithm that finds the best projection threshold for the image. The algorithm is as follows:

Input: The horizontal projection of the image.

Output: An integer (corresponding to the best projection threshold).

Process: Starting from the pre-defined threshold, the algorithm incrementally tests several thresholds above and below the initial one, to find the best one. For each threshold, it counts the number of staff lines in the image (a staff line is a group of 5 lines with similar distance between them; note that there may be some lines in the image that do not belong to staff lines). The best threshold is the one for which the number of lines belonging to staff lines minus the number of all lines is maximum:

$\mathrm{best\_projection\_threshold} = \arg\max_{\mathrm{threshold}} \{ \#\mathrm{StaffLines}(L) - \#L \}$,

where $L$ is the set of lines found in the horizontal projection using a certain threshold (see the decision process above), and $\mathrm{StaffLines}(L)$ is the subset of $L$ corresponding to the lines that belong to staff lines.
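The threshold search can be sketched as follows. Several details are assumptions added for the example and are not specified in the paper: the spacing `tolerance` used to decide that 5 lines form a staff, the `span` of thresholds tested around the initial one, and the tie-breaking rule. The tie-break is needed because the criterion #StaffLines(L) - #L is also zero when no line is found at all, so among equal scores this sketch prefers the threshold that yields more staff lines, and then the one closest to the initial threshold.

```python
def line_centres(projection, threshold):
    """Centres of the maximal runs of rows whose projection exceeds the threshold."""
    centres, start = [], None
    for row, value in enumerate(list(projection) + [float("-inf")]):
        if value > threshold and start is None:
            start = row
        elif value <= threshold and start is not None:
            centres.append((start + row - 1) // 2)
            start = None
    return centres


def count_staff_lines(centres, tolerance=2):
    """Number of lines that fall in groups of 5 consecutive, evenly spaced lines."""
    count, i = 0, 0
    while i + 5 <= len(centres):
        gaps = [centres[i + k + 1] - centres[i + k] for k in range(4)]
        if max(gaps) - min(gaps) <= tolerance:
            count += 5
            i += 5
        else:
            i += 1
    return count


def best_projection_threshold(projection, initial, span):
    """Test thresholds around `initial` and keep the one maximising #StaffLines - #L."""
    candidates = sorted(range(initial - span, initial + span + 1),
                        key=lambda t: abs(t - initial))
    best, best_score = initial, None
    for threshold in candidates:
        centres = line_centres(projection, threshold)
        staff = count_staff_lines(centres)
        score = (staff - len(centres), staff)  # second term is the assumed tie-break
        if best_score is None or score > best_score:
            best, best_score = threshold, score
    return best
```

In a projection with five evenly spaced strong peaks and one weaker spurious peak, the search moves the threshold up until the spurious peak is no longer counted as a line.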

2. Marking the Staff Lines

After we have found the centre of each line in a staff line, we mark all the pixels that belong to the line. Note that we want to separate the pixels that belong to lines from the pixels that belong to musical objects. A straightforward approach is followed (e.g. [6]). It works as follows:

Inputs: The image, and the positions of the centres of the lines belonging to staff lines.

Outputs: The image, with the lines marked.

Steps: For each line,
1. Estimate the line width.
2. For all columns of the image,
2.1. Retrieve the black stripe of that column (the set of black pixels that are vertically connected to the pixel at the centre of the line).
2.2. Mark or not the black stripe, according to the following decision process:
If length(black_stripe) < stripe_threshold, MARK, Else, DON'T MARK,
where stripe_threshold = stripe_threshold_factor * estimated_line_width.

(The stripe threshold is proportional to the expected width of the lines of the staff lines, since we want to remove the lines but not the objects that superimpose with them. The stripe threshold factor was calculated empirically and has the value of 1.6. However, an estimation or learning process should be used in the future for application to other pieces of music. More sophisticated decision processes could have been used; for instance, a template matching process with the objects that superimpose with the lines could identify more clearly the pixels that belong to lines and the ones that belong to objects. However, our simple process has proven to work well for the images currently being handled.)

The final result is shown in figure 6.
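The marking step can be sketched as below. The image is assumed to be a list of rows with 1 for black pixels; marked pixels are given the hypothetical value 2; and the line width is passed in as a parameter, since the text does not say how it is estimated. The factor 1.6 is the empirical value quoted above.

```python
def black_stripe(image, centre_row, col):
    """Rows of the vertical run of black pixels through (centre_row, col)."""
    if not image[centre_row][col]:
        return range(0)  # the centre pixel is white: empty stripe
    top = centre_row
    while top > 0 and image[top - 1][col]:
        top -= 1
    bottom = centre_row
    while bottom + 1 < len(image) and image[bottom + 1][col]:
        bottom += 1
    return range(top, bottom + 1)


def mark_staff_line(image, centre_row, estimated_line_width,
                    stripe_threshold_factor=1.6, mark=2):
    """Mark in place the pixels of one line, leaving superimposed objects untouched."""
    stripe_threshold = stripe_threshold_factor * estimated_line_width
    for col in range(len(image[0])):
        stripe = black_stripe(image, centre_row, col)
        # Short stripes are line pixels; long ones belong to crossing objects.
        if 0 < len(stripe) < stripe_threshold:
            for row in stripe:
                image[row][col] = mark
```

For a line 2 pixels thick the stripe threshold is 3.2, so a column containing only the line (stripe length 2) is marked, while a column crossed by a long vertical stroke is left as it is.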



Fig. 6. (a) Lines of the staff marked in pink (from the image on figure 6a). (b) Lines of the staff deleted (from the image on figure 6a).

Object Segmentation

Object segmentation is the process of identifying and separating the different components of the image that represent musical objects or parts of musical objects. In ROMA, a pre-segmentation process exists to identify bar lines. This is the work completed so far regarding object segmentation in this project.

Identifying bar lines

In the OMR systems described in the literature (e.g., [5]), the bar lines are recognised after the segmentation process. For the images targeted by ROMA it is more efficient to know the bar line positions prior to this process, because (1) the bar lines are visibly different from the rest of the objects in the image, and (2) they divide the score into smaller regions, producing a block segmentation and the definition of regions of interest as a decomposition of the image [4]. The bar line identification process can be seen as a two-phase classification algorithm:

Input: The image, and the line boundaries of the staff line.

Output: Centre positions (columns of the image) of the bar lines.

Steps:
1. Calculate the vertical projection of the staff line (see figure 7).

Fig. 7. Vertical projection in blue (from the image on figure 6b).

Fig. 8. Features of a local maxima window: max height and width.

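2. Pick local maxima regions (windows) where all the vertical projection in the window is above a given threshold (window height threshold):

$W = \{ (c_1, c_2) : \forall c,\ c_1 < c < c_2,\ \mathrm{vertical\_projection}(c) > \mathrm{window\_height\_threshold} \}$

3. First classification process: choose the bar line candidates from the windows, using the width and max height of each window (figure 8) as features: [...] > hg1, where wd1, wd2 and hg1 are pre-defined.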

(The values of wd1, wd2 and hg1 were set empirically, and should be estimated or automatically learned in the future, for application to other pieces of music. This process seems to be enough, as the following rules apply: (1) the areas, in pixels, of the bar lines are mostly identical, (2) bar lines are always almost vertical, and (3) bar lines are isolated from other objects in the image.)

Some other musical objects can be included as bar line candidates when we use the above features (see figure 9a). To differentiate them, we use an additional feature in a second phase classification, which is the standard deviation of the horizontal projection of the window (see figure 9b).



Fig. 9. (a) Several objects, besides bar lines, are classified as bar line candidates. (b) Horizontal projection of each window classified as a bar line candidate.

4. Second classification process: choose the bar lines from the bar line candidates.

Let w ∈ bar_line_candidates(W), and fw = (sd) be the feature vector of w, where sd = standard_deviation(horizontal_projection(w)). Then w is a bar line window if sd > sd1, where sd1 is pre-defined.

(The value of sd1 was set empirically, but should be estimated or automatically learned in the future, for application to other pieces of music. This process seems to be enough, since the bar lines are the only objects that occupy the whole height of the staff line and, at the same time, pass through the prior classification process.)
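The feature extraction behind the two classification phases can be sketched as follows. This is a rough illustration with several assumptions: the staff region is a binary image (list of rows, 1 for black) with the staff lines already removed; the first-phase rule is taken to be `wd1 < width < wd2 and max_height > hg1`, which is a guess at the form of the rule and not its definition in the paper; and the population standard deviation is used for sd. The final comparison of sd with sd1 (step 4) is left to the caller.

```python
import statistics


def vertical_projection(staff):
    """Number of black pixels in each column of the staff region."""
    return [sum(column) for column in zip(*staff)]


def windows_above(projection, window_height_threshold):
    """Maximal column ranges (c1, c2) whose projection stays above the threshold."""
    windows, start = [], None
    for col, value in enumerate(list(projection) + [float("-inf")]):
        if value > window_height_threshold and start is None:
            start = col
        elif value <= window_height_threshold and start is not None:
            windows.append((start, col - 1))
            start = None
    return windows


def bar_line_candidates(staff, window_height_threshold, wd1, wd2, hg1):
    """First phase: keep narrow, tall windows and attach the second-phase feature sd."""
    projection = vertical_projection(staff)
    candidates = []
    for c1, c2 in windows_above(projection, window_height_threshold):
        width = c2 - c1 + 1
        max_height = max(projection[c1:c2 + 1])
        # Assumed form of the first-phase rule (see the lead-in).
        if not (wd1 < width < wd2 and max_height > hg1):
            continue
        # Horizontal projection restricted to the columns of the window.
        h_proj = [sum(row[c1:c2 + 1]) for row in staff]
        candidates.append({
            "centre": (c1 + c2) // 2,
            "width": width,
            "max_height": max_height,
            "sd": statistics.pstdev(h_proj),
        })
    return candidates
```

Each candidate carries the centre column that would be reported as the bar line position, together with the sd feature that the second phase compares with sd1.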

Experimental Results

We present next the final result for a full page of the musical piece from the XVI century that is being used for experiments (figure 10).

Conclusions and Future Work

The work completed so far in our project is a system that can mark (or remove) the staff lines from an image of a music score (possibly slightly rotated), and locate the bar lines in the score. Currently, the system has been tested on a set of images from a XVI century musical piece. Several parameters were set empirically, and a more extensive analysis should be performed for their generalisation and calculation. Future work includes not only the automatic calculation or estimation of these parameters, but also the continuation of the music recognition process. This will require the completion of the object segmentation, recognition and assembling procedures, and the translation of the results to a musical representation language, or the printing of the recognised music.

Fig. 10. Full page of a musical piece from the XVI century, with the staff lines removed and the bar lines located.

Acknowledgements

This project is supported by PRAXIS/C/EEI/12122/1998.






References

1. F. Pépin, R. Randriamahefa, C. Fluhr, S. Philipp and J. P. Cocquerez, “Printed Music Recognition”, IEEE - 2nd ICDAR, 1993.
2. I. Leplumey, J. Camillerapp and G. Lorette, “A Robust Detector for Music Staves”, IEEE - 2nd ICDAR, 1993.
3. K. C. Ng and R. D. Boyle, “Segmentation of Music Primitives”, British Machine Vision Conference, 1992.
4. S. N. Srihari, S. W. Lam, V. Govindaraju, R. K. Srihari and J. J. Hull, “Document Image Understanding”, Center of Excellence for Document Analysis and Recognition.
5. Kia C. Ng and Roger D. Boyle, “Recognition and reconstruction of primitives in music scores”, Image and Vision Computing, 1994.
6. David Bainbridge and Tim Bell, “An Extensible Optical Music Recognition system”, Nineteenth Australian Computer Science Conference, 1996.
7. “Musical notation codes: SMDL, NIFF, DARMS, GUIDO, abc, MusiXML”, http://www.s
8. Christian Pellegrini, Mélanie Hilario and Marc Vuilleumier Stückelberg, “An Architecture for Musical Score Recognition using High-Level Domain Knowledge”, IEEE - 4th ICDAR, 1997.
9. Jean-Pierre Armand, “Musical Score Recognition: A Hierarchical and Recursive Approach”, IEEE - 2nd ICDAR, 1993.
10. Jean Serra, “Image Analysis and Mathematical Morphology”, Academic Press, London, 1982.
11. Pierre Soille, “Morphological Image Operators”, Springer, Berlin, 1999.
12. Michele Mengucci and Isabel Granado, “Morphological Segmentation of Text and Figures in Renaissance Books (XVI Century)”, to be presented at ISMM’2000, 5th International Symposium on Mathematical Morphology and its Applications to Image and Signal Processing, Palo Alto, USA, June 2000.