Image and Text Coupling for Creating Electronic Books from Manuscripts

Laurent ROBERT Laurence LIKFORMAN-SULEM Eric LECOLINET

Ecole Nationale Supérieure des Télécommunications Signal Processing Department - CNRS URA 820 46 rue Barrault, F-75634 Paris cedex 13 e-mail : { robert, lauli, elc }@sig.enst.fr

Correspondence should be addressed to : Laurent ROBERT ENST/SIG 46 rue Barrault, F-75634 Paris cedex 13 FRANCE Phone : (33)-01-45-81-72-48 Fax : (33)-01-45-88-79-35 e-mail : [email protected]


Abstract : This article concerns research on handwritten documents and electronic books. We present here the first achievements of a Hypermedia Edit and Read Station (HERS) devoted to browsing and editing hypermedia documents built from literary material, including document images. Our purpose is twofold. First, capabilities are offered to transcribe manuscripts. Second, tools are offered to organize the electronic document and establish hypermedia links between its different components.

Transcribing the text consists in coupling lines typed on the keyboard with their corresponding text lines in the manuscript images. This task can be carried out with a collaborative system based on computer-human interaction and document analysis. Automatic segmentation into text lines can be performed when the layout of the page is regular enough. Segmentation can also be performed interactively by using « markers », which make it possible to select areas in an intuitive and accurate way.

Links can be created interactively between image areas, transcribed words or lines, or other kinds of heterogeneous data. We have developed a specific tool for visualizing the organization of the manuscripts, including their different versions and the associated links. By providing a global knowledge of the document organization, our system helps the user to browse through the whole hypermedia document in an efficient way.

Keywords : manuscripts / text line extraction / hypermedia / electronic publishing / computer-human interaction / document analysis.


1. Introduction

Computer analysis of handwritten documents can serve goals other than character or word recognition : forensic document analysis, the study of texts in a paleographic sense, writer identification, verification of signatures. The computer does not replace the human expert but helps the expert to make decisions. Features are extracted from the writing and the makeup of the text in order to give information about the writing style or the writer [1] [2] [3]. Other studies concern the critical edition of literary works from drafts, manuscripts and their variants, and historical and bibliographical notes. The systematic investigation of such material is known as genetic criticism [4] [5]. Critical editions generally include facsimiles of manuscripts with their transcriptions. Projects related to electronic libraries have recently been proposed to edit and browse library material [6] [7] [8]. These projects aim at building read stations that will offer tools to display, code and transcribe manuscripts. Hypertext techniques are necessary, as are browsing tools. Document analysis is mostly required for transcription purposes, as the transcribed text reflects the layout of the manuscript.

We present here the first achievements of a system dedicated to the edition of electronic books. It will give philologists, geneticists and other specialists of manuscripts (medieval documents, author manuscripts...) tools for producing an electronic literary work. This requires a system for manipulating a large set of heterogeneous data (the facsimile images of the manuscripts, the different versions of the manuscripts, the comments and notes of the editor or even of the readers, possible text, graphic or sound documentation...). It is a real challenge to represent and to organize all this data. Our application is therefore based on hypermedia capabilities, which make it possible to link heterogeneous kinds of data (text, image, sound, video). We consider manuscript images as « hypertexts » to be created. We call them « hyperscripts » ; the main idea consists in defining special areas in manuscript images which can be linked to other parts of documents or to other kinds of data.


Our research also aims at combining document analysis with computer-human interaction in the context of unconstrained manuscripts. The main idea consists in taking advantage of these two fields by making them cooperate. As dealing with large documents in a fully interactive way could become a tedious task, we propose a semi-automatic approach that lets the user interactively validate or correct editing hypotheses that are automatically provided by a document analysis module. The expected processing steps include the separation of the handwritten text from graphic elements and the segmentation of the text into text lines and words.

The paper is organized as follows. The next section presents the general design of our hypermedia system. Section 3 describes the interactive methods for transcribing manuscripts and for creating hypermedia links. In that section, we also describe a system dedicated to information visualization. Section 4 describes the method used for the automatic processing of the manuscript images. Finally, we conclude and present future work.

2. The General design of our « Hypermedia Edit and Read Station » (HERS)

This section presents the general design of HERS. This piece of software has been developed for editing electronic books from manuscript images and other sources and for browsing the resulting hypermedia documents.

2.1 Editing capabilities

Manuscript images are at the basis of the hypermedia editing process : the final document is created from these images. The edition of electronic books based on manuscripts involves two different tasks :
• the capability to transcribe manuscript images in an intuitive way,
• the capability to organize the whole document and to establish hyper-links between its components.


2.1.1 Transcription and image/text duality

The transcription task consists in editing a typed version of the manuscripts from their original images while respecting their graphic aspect (the spatial disposition of text lines, erasures and other editing marks are preserved), and in associating image elements in the manuscripts with the corresponding textual elements in the transcriptions (see parts 2, 4 and 5 of figure 2). A specialist must study and analyze a manuscript before transcribing it. The specialist must extract information on the meaning of this document and on the process of production that gave birth to it : « writing is a storage medium, a track, and also the process producing this track » [9]. Such a process requires considering the dual nature of the manuscript images. They can be seen either as bitmap images that contain text and graphic elements (writing shapes, erasures, framed texts, marks of cross-reference such as arrows or asterisks) or as pure textual representations : they are « texts with a graphic interest » [10]. Consequently, our system has been designed in such a way that it can simultaneously manage the dual representation of documents (text and image).


figure 1 : a Flaubert manuscript and its classical diplomatic transcription, typed on paper, which respects the spatial disposition of the original text. The words in square brackets are crossed out in the original document.


2.1.2 Hypermedia links

Our model includes several different kinds of links. First, the transcription process generates an implicit association between the handwritten text of a manuscript image and the corresponding typed-text in the transcription (see section 3.2.1 and part 5 of figure 2). Another kind of link is used to create index tables (see section 3.2.2). Finally, it is possible to link the different areas of the manuscript images and of the transcriptions with one another or even with external hypermedia entities such as image, sound, video... (see section 3.2.3). These last links constitute a network that can be compared to a graph. We have developed interactive tools for creating all these kinds of links : the idea consists in connecting the documents of the electronic book by creating and associating sensitive areas (graphic tracks) in the manuscript images and in the transcriptions (see section 3.2 and part 3 of figure 2). By clicking on these areas the user activates the browsing system.

2.2 Browsing capabilities

Within an electronic book, the reader does not simply go through the one-dimensional space of a printed page, but can design his own « journey » by choosing a mode of navigation in the « network of links » that constitutes the hypertext document. Our system therefore lets users follow different links to navigate the whole hypermedia document in a non-sequential way, and create their own links and notes (thus enabling a « productive reading »). The nodes of the hypertext network correspond to the sensitive areas that are defined in the document images and their corresponding transcriptions.

However, an efficient navigation system requires extended browsing capabilities. It is important to provide the user with a global knowledge of the document organization in order to facilitate navigation through the different parts of the hypermedia document. Consequently, our system makes it possible to visualize elements of the hypermedia document in different ways. For instance, several documents can be visualized simultaneously as small scale images in order to give users an idea of the nature and the shape of the current document (the one that the user is looking at) and of its context (see part 1 of figure 2).

Part 1 : The visualization area where several documents of the hypermedia book can be displayed (manuscript images or transcriptions) as small scale images. Thus, the user can have an idea of the context of the current document (it appears in a red frame).

Part 2 : Manuscript image (facsimile).

Part 3 : Sensitive areas defined in the document. By clicking on these areas the user navigates through the whole hypermedia book or calls up heterogeneous kinds of data (notes, drawings, other documents or transcriptions, images, tape readings, text, etc.).

Part 4 : Transcription of the document located above.

Part 5 : A mere click on one line of the document and the user can visualize the corresponding text in the transcription (and vice versa). A transcribed line can be deleted or moved by drag and drop.

figure 2 : the main window of the application.

2.3 Human-computer interaction and document analysis

Our system includes an automatic text line extraction algorithm which avoids fully interactive segmentation when possible (see section 4). Lines of a manuscript to be transcribed can be selected either with interactive tools or by a semi-automatic process.


Furthermore, we have developed interactive tools which give users control over the automatic processing of manuscript images.

3. Interactive tools for editing hypermedia documents

In this section we describe the processes for editing an electronic book from manuscripts. We describe how to transcribe a manuscript and how to create hypermedia links.

3.1 Creating a transcribed version

The transcription is performed in two successive steps : the user first interactively selects the handwritten text in the manuscript image and then enters the corresponding typed-text. The system provides automatic layout capabilities : once entered, the typed-text is automatically positioned in a way that respects the spatial disposition of the original manuscript image. We have also developed protocols based on image analysis and man-machine interaction for selecting areas of a document image. This selection can be performed with interactive tools called « markers ». As mentioned before, selection can also be assisted by an automatic page analysis stage that produces line segmentation hypotheses (see section 4). It is then up to the user to validate or correct these hypotheses interactively.

Four kinds of interactive markers are provided to the user. These tools make it possible to « drop » a graphic track on or around the handwritten text that has to be selected. The first two markers are based on the « stabilo model ». They make it possible to highlight a text line or a subpart of it (for instance a single word or a group of words). The first marker (or « line marker ») is used to select a straight line (i.e. a thin rectangular area) on the text image, while the second marker (or « stabilo marker ») can be used to follow a fluctuating text line. This marker is materialized by a roll that can change its direction several times and that follows the user’s freehand line.


figure 3 : the « line marker ».

figure 4 : the « stabilo marker ».

The two remaining markers have been designed for framing several lines of text or any arbitrary part of the manuscript image. The « rectangle marker » is used to define large image areas, without any concern for the shape of the handwritten text, or to select a text which is separated from the main text. This tool can also be used to select a word or a line. The « lasso marker » is used to select any part of the image : the user can thus define very precise areas, which is useful when the form and style of the writing are not clear.

figure 5 : the « rectangle marker ».

figure 6 : the « lasso marker ».
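As an illustration only, the four markers can be thought of as selection regions that answer a single question : does a given image point fall inside the graphic track ? The Python sketch below is not the HERS implementation. The class names, the `thickness` parameter and the choice of a ray-casting test for the lasso are all assumptions made for the example. The line marker and the stabilo marker are modelled by the same class, since a straight line is simply a stroke with two points.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p on the segment and clamp the projection to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if p lies inside the closed polygon."""
    x, y = p
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class StrokeMarker:
    """Line marker (two points) or stabilo marker (freehand polyline)."""
    path: List[Point]
    thickness: float  # hypothetical width of the highlighted band

    def contains(self, p: Point) -> bool:
        half = self.thickness / 2
        return any(distance_to_segment(p, a, b) <= half
                   for a, b in zip(self.path, self.path[1:]))


@dataclass
class RectangleMarker:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, p: Point) -> bool:
        return self.left <= p[0] <= self.right and self.top <= p[1] <= self.bottom


@dataclass
class LassoMarker:
    outline: List[Point]  # closed freehand outline drawn by the user

    def contains(self, p: Point) -> bool:
        return point_in_polygon(p, self.outline)
```

With a common `contains` method, the rest of an application could treat any graphic track the same way when the user clicks on the image, whichever marker produced it.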

A transcription interface is provided to the user for entering the typed-text. This window appears on the manuscript image of the document and, at the same time, on the page containing the transcription. It automatically positions itself under the handwritten line in the image and under the corresponding typed-text in the text window. The window can be moved at will with the mouse so that the user can see the context (surrounding lines) of the line being processed in the facsimile and in its transcription.

figure 7 : Transcription of the selected line. The interface appears inside the image and can be moved at will by direct manipulation.


3.2 « Hyperlinks »

Our system lets the user create links in a homogeneous way : the user « drops » a graphic track on a manuscript image or transcription area using one of the interactive markers defined above. Then the user associates a « pointer » to the object that is to be linked to this zone. This object may be another document part or transcription, text, image, sound, bibliographic information, or reading notes. Such an area is « sensitive » because a link between two documents is activated by clicking on this area with the mouse. The three following sections describe the different kinds of links.

3.2.1 Text-image coupling

Using a marker to select the part of a document to transcribe implicitly creates a link between the selected handwritten text and the corresponding typed-text. The user can take advantage of these links when viewing the document and its transcription simultaneously : clicking on a manuscript image area automatically highlights the corresponding typed-text in the transcription (and vice versa).
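The coupling described here can be pictured as a two-way table keyed by a line identifier. The following Python sketch is only a plausible data structure, not the one used in HERS : the class `CoupledPage`, its method names and the use of axis-aligned boxes for image areas are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels


@dataclass
class CoupledPage:
    """One manuscript page with its transcription, coupled line by line."""
    regions: Dict[int, Box] = field(default_factory=dict)  # line id -> image area
    texts: Dict[int, str] = field(default_factory=dict)    # line id -> typed text
    _next_id: int = 0

    def transcribe(self, region: Box, text: str) -> int:
        """Selecting an area and typing its text creates the implicit link."""
        line_id = self._next_id
        self._next_id += 1
        self.regions[line_id] = region
        self.texts[line_id] = text
        return line_id

    def text_at(self, x: int, y: int) -> Optional[str]:
        """Click in the image: typed text of the line under the cursor, if any."""
        for line_id, (left, top, right, bottom) in self.regions.items():
            if left <= x <= right and top <= y <= bottom:
                return self.texts[line_id]
        return None

    def region_of(self, line_id: int) -> Box:
        """Click in the transcription: image area to highlight."""
        return self.regions[line_id]
```

Because both dictionaries share the same key, a click on either side resolves to the other side with a single lookup, which is what makes the highlighting work in both directions.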

3.2.2 Indexes

Another kind of link can be created to define index tables. Indexes are used to access the content of the document from selected terms in order to navigate effectively through the electronic book. The software makes it possible to link a term to a manuscript image area and/or to its associated transcription. When the user clicks on an entry of the index table, the reference of this entry is highlighted in the document and in its transcription (figure 8).


figure 8 : In this example, the user has selected the reference of the name « Félicité » in a table of indexes. So, the application makes it possible to visualize the line of the manuscript which contains this reference and the corresponding typed-text.
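An index table of this kind can be sketched as a mapping from a term to the list of places where it occurs. The Python code below is a minimal illustration under assumptions : the `IndexTable` and `Reference` names, the page identifier format and the line identifiers are hypothetical and are not taken from HERS.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Reference(NamedTuple):
    page: str     # hypothetical page identifier, e.g. "P1V1"
    line_id: int  # hypothetical identifier of a coupled line on that page


class IndexTable:
    """Maps each indexed term to the coupled lines that contain it."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Reference]] = defaultdict(list)

    def add(self, term: str, page: str, line_id: int) -> None:
        ref = Reference(page, line_id)
        if ref not in self._entries[term]:  # avoid duplicate references
            self._entries[term].append(ref)

    def lookup(self, term: str) -> List[Reference]:
        """References to highlight when the user clicks on an index entry."""
        return list(self._entries.get(term, []))

    def terms(self) -> List[str]:
        return sorted(self._entries)
```

Since each reference points to a coupled line, highlighting the manuscript area and the typed-text at the same time only requires resolving the line identifier on both sides.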

3.2.3 Links for navigating or « calling » hypermedia entities

The « rectangle » and « stabilo » markers can help to define another kind of link, which can be used :
• to associate different parts (graphic tracks) of manuscript images or of transcriptions with one another. In this way, users can create meaningful associations between words or lines of documents or transcriptions,
• to « anchor » heterogeneous hypermedia entities (text, image, sound, video) in handwritten documents and transcriptions. It would be possible, for instance, to call an image or a tape-recording by a mere click on a graphic track.

figure 9 : When the user activates the graphic track of image (a), he goes to image (b) (and vice versa).
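These two uses suggest a small graph in which sensitive areas are nodes : associations between areas work in both directions, while anchors point one way towards an external entity. The Python sketch below is an assumption about how such a network could be stored. The `LinkNetwork` class and the string identifiers are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LinkNetwork:
    """Sensitive areas and external entities, connected by links."""

    def __init__(self) -> None:
        self._targets: Dict[str, Set[str]] = defaultdict(set)

    def associate(self, area_a: str, area_b: str) -> None:
        """Two-way link between two sensitive areas."""
        self._targets[area_a].add(area_b)
        self._targets[area_b].add(area_a)

    def anchor(self, area: str, entity: str) -> None:
        """One-way link from a sensitive area to an external entity."""
        self._targets[area].add(entity)

    def activate(self, area: str) -> List[str]:
        """Targets reached by clicking on a sensitive area, in sorted order."""
        return sorted(self._targets.get(area, ()))
```

For example, after `associate("P1V1#a", "P1V2#b")`, activating either area leads to the other, whereas an anchored note or recording has no link back to the area that calls it.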


3.3 Visualizing and browsing the hypermedia document

We have built a three-dimensional system for visualizing the organization of the manuscript images and the links which compose the electronic book. This browsing tool gives users knowledge of the context of each manuscript image (the number of different versions of the same manuscript, for instance), so that users can navigate more efficiently.

This entity represents a manuscript : the page 1/ version 1 of the electronic book.

A link between two manuscript image areas.

(a) the front side of the wall. (b) the back side of the wall.

figure 10 : (a) represents manuscript images (horizontally the n pages [Pn]Vi and vertically the i different chronological versions Pn[Vi]). (b) represents links between these documents or parts of them.

The user can scroll along the wall, horizontally through the pages and vertically through the versions, or can navigate by following the links. By clicking on an entity of the wall the user can visualize the links that leave this manuscript image, or the detail of this manuscript in a window of the main application (see part 2 of figure 2).

This approach is related to the domain of « information visualization », which aims at representing many documents in a restricted space (the screen) [11] [12] [13].


4. Document analysis

We have mainly been working with Flaubert's manuscripts, which were digitized at a resolution of about 150 dpi. The resolution was chosen so as not to break thin graphical elements such as erasures and lines. Document images appear on the screen and may be read directly for transcription purposes. Noise has to be removed, as well as the black stripes which are sometimes found along the sides of the sheet. We removed point-like components using an area criterion. Black stripes were removed by considering the elongatedness of components in the vicinity of the image's border.
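The two cleaning rules can be illustrated with a short Python sketch. It is a simplified reading of the text rather than the code actually used : the image is a small binary grid, connected components are found by a breadth-first search with 8-connectivity, and the thresholds `min_area`, `border` and `min_elongation` are hypothetical values chosen for the example. Elongation is approximated here by the aspect ratio of the bounding box.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def connected_components(image: List[List[int]]) -> List[List[Pixel]]:
    """8-connected components of the black (value 1) pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def clean(image: List[List[int]], min_area: int = 3, border: int = 2,
          min_elongation: float = 5.0) -> List[List[int]]:
    """Erase point-like components and elongated components near the border."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for pixels in connected_components(image):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        elongation = max(height, width) / min(height, width)
        near_border = (min(ys) < border or min(xs) < border
                       or max(ys) >= rows - border or max(xs) >= cols - border)
        too_small = len(pixels) < min_area          # area criterion
        stripe = near_border and elongation >= min_elongation
        if too_small or stripe:
            for y, x in pixels:
                cleaned[y][x] = 0
    return cleaned
```

On a real page the thresholds would have to be tuned to the scanning resolution, and thin strokes that belong to the writing must not be mistaken for stripes, which is why the elongation test is restricted to the vicinity of the border.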

Taking advantage of document analysis is possible for document images close to the final manuscript. As the transcription task involves coupling text lines of the manuscript with typed text, text lines are automatically extracted from images. We have applied a segmentation method which was developed for unconstrained documents such as handwritten letters, drafts and author manuscripts. Such documents may present fluctuating or interwoven lines, irregular spacing between lines and words, and insertions between lines, and may include several text line directions. The method is an iterative, bottom-up process which groups connected components into alignments. We recall here the main steps of the method [14].

Connected components are first searched for in the image. Some of these components, which have a reliable direction, are selected as anchor points. Each anchor point constitutes the seed of an alignment. The grouping process consists in linking other neighboring components to anchor points by perceptual criteria : local neighborhood constraints are combined with global quality measures.

Two neighboring components are grouped if they satisfy three local constraints : proximity, similarity and direction continuity. A threshold distance provides the proximity constraint by defining, at each iteration, the allowable distance between components. The direction continuity criterion is computed by considering the density of black picture elements along the alignment direction. Components are incrementally included into alignments, and the size similarity constraint eliminates the small components which stop this incremental construction because they do not have neighbors in the alignment direction. In case of dissimilarity, and when the construction is stopped, more similar neighbors are searched for in order to continue the alignment.

During the grouping process, conflicts may occur. They are solved by applying a set of rules which consider the configuration of the alignments and their quality. Alignment quality is a global measure computed, at each iteration, from the alignment obtained. The quality measure mostly depends on the length of the alignment. The actions of these rules consist in merging, splitting or extending alignments, or in deleting irrelevant ones.

The process is iterated in order to link all the components of a text line. At each iteration, the proximity constraint is relaxed by incrementing the distance threshold by a fixed value. When alignments do not change between two iterations, the segmentation is complete.
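The iteration scheme can be sketched in Python as follows. This is a deliberately reduced model and not the method of [14] : anchor point selection, quality measures and conflict rules are left out, components are reduced to a centre and a height, and the direction continuity criterion (which the method computes from the density of black pixels) is replaced here by a simple bound on the angle between two centres and the horizontal. All names and default values are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Component:
    x: float       # centre of the bounding box
    y: float
    height: float  # must be positive


def compatible(a: Component, b: Component, max_distance: float,
               max_height_ratio: float = 2.0, max_angle_deg: float = 15.0) -> bool:
    """Proximity, size similarity and (simplified) direction continuity."""
    if math.hypot(b.x - a.x, b.y - a.y) > max_distance:
        return False
    if max(a.height, b.height) / min(a.height, b.height) > max_height_ratio:
        return False
    angle = math.degrees(math.atan2(abs(b.y - a.y), abs(b.x - a.x)))
    return angle <= max_angle_deg


def group(components: List[Component], max_distance: float) -> Set[FrozenSet[int]]:
    """Groups of component indices linked by compatible pairs (union-find)."""
    parent = list(range(len(components)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            if compatible(components[i], components[j], max_distance):
                parent[find(i)] = find(j)
    groups: Dict[int, Set[int]] = {}
    for i in range(len(components)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def extract_lines(components: List[Component], start: float = 10.0,
                  step: float = 5.0) -> Set[FrozenSet[int]]:
    """Relax the distance threshold until the alignments stop changing."""
    distance = start
    alignments = group(components, distance)
    while True:
        distance += step  # relax the proximity constraint by a fixed value
        relaxed = group(components, distance)
        if relaxed == alignments:
            return alignments
        alignments = relaxed
```

The loop always terminates, because relaxing the threshold can only merge groups. The stopping rule is the one stated in the text, so in this reduced model the starting threshold and the step must be chosen with the typical spacing between components in mind.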

At present, processing time is too long for the segmentation to be called on request when transcribing. It takes a few minutes for a whole page to perform component labeling, anchor point selection and grouping. Images are therefore processed off-line, so that the user does not have to wait. For all lines of the pages, the coordinates of their components are provided. In a second stage, the result of the segmentation can be changed interactively by selecting with the mouse those components that must be added or removed. For instance, in figure 11 the last two components of alignment 16 must be removed and added to alignment 17. The last two components following alignment 14 are left isolated (as they are not included in a rectangle) and must be added to it.
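The interactive correction amounts to moving components from one alignment to another. A minimal Python sketch is given below. The function name and the representation of alignments as lists of component identifiers are assumptions, and the component numbers in the usage note are invented, since the text only gives the alignment numbers.

```python
from typing import Dict, List


def move_components(alignments: Dict[int, List[int]],
                    components: List[int], target: int) -> None:
    """Detach the components from wherever they are and add them to target.

    `alignments` maps an alignment number to the identifiers of its
    components and is modified in place. Isolated components, which belong
    to no alignment yet, are simply appended to the target.
    """
    for members in alignments.values():
        members[:] = [c for c in members if c not in components]
    alignments.setdefault(target, []).extend(components)
```

With `alignments = {14: [1, 2], 16: [3, 4, 5, 6], 17: [7, 8]}`, the call `move_components(alignments, [5, 6], 17)` mirrors the first correction described above, and `move_components(alignments, [9, 10], 14)` attaches two isolated components to alignment 14.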


As we mentioned before, degraded documents cannot be processed by the segmentation method. We can increase the number of documents that can be processed by removing the big erasures drawn through the page, which cannot be visually confused with text lines because of their size and direction. Flaubert drew these erasures on a great many documents once he had corrected the draft and copied it onto another sheet. The elimination of these erasures will also improve the visibility of the text for transcription purposes.

5. Conclusion and future works

We have presented the preliminary achievements of a read and edit station devoted to manuscript transcription and hypertextual link creation. Document analysis and human-computer interaction are combined to segment the image into text lines. Once a line is selected in the image, the user can type the corresponding text and easily create hypertextual links. Links can also be created between other types of data, which makes it possible to build an electronic book.

Future work will consist in enlarging the class of documents that can be processed automatically, by removing linear erasures drawn through the page. Another point concerns the representation of hypermedia documents. We will focus on the development and testing of new techniques for information visualization. The third point concerns the links and the link network. We will define new tools for organizing hypermedia links by direct manipulation. The coherence of the link network will also be checked automatically.

Furthermore, research still has to be done to develop functions within the HERS system in close collaboration with specialists of manuscripts.

References :

[1] J.B. Friedman, « Some contour features in medieval scripts : a preliminary study », in Advances in Handwriting and drawing : a multidisciplinary approach, C. Faure, P. Keuss, G. Lorette, A. Winter (Eds), pp. 547-560, Europia, Paris, 1994.

[2] L. Likforman-Sulem, H. Maitre, C. Sirat, « An expert vision system for analysis of Hebrew characters and authentication of manuscripts », Pattern Recognition, Vol 24, No 2, pp. 121-137, 1991.

[3] R. Plamondon, G. Lorette, « Automatic signature authentication and writer identification: the state of the art », Pattern Recognition, Vol 22, No 2, pp. 107-131, 1989.

[4] J. Anis, « Thought, language and handwriting : what can we guess from literary drafts ? », in Advances in Handwriting and drawing : a multidisciplinary approach, C. Faure, P. Keuss, G. Lorette, A. Winter (Eds), pp. 531-546, Europia, Paris, 1994.

[5] J-L. Lebrave, « L'hypertexte et l'avant texte », Actes du Colloque Texte et Ordinateur : les mutations du lire-écrire, revue LINX (Centre de Recherches Linguistiques de Paris X), pp. 103-119, 1990 (in French).

[6] Philectre, Projet présenté dans le cadre du GIS Sciences de la Cognition sur le thème mutation de l'édition induite par le livre électronique, B. Cerquiglini et J-L Lebrave (coordonateurs), 1995 (in French).

[7] A. Bozzi, A. Sapuppo, « Computer-aided Preservation and Transcription of Ancient Manuscripts and Old Printed Documents », Ercim News, vol. 19, pp. 27-28, 1995.

[8] B. Stiegler, « Lecture et édition savante assistées par ordinateur : l’hypertraitement de texte », Actes du congrès Afcet 1993 (tome 4 : Bureautique, document, groupware et multimedia), Gérard Dupoirier, ed., pp. 37-45, Afcet, Versailles, Juin 1993 (in French).

[9] J.-L Lebrave, « Hypertextes-mémoires-écritures », Genesis, manuscrits, recherches, invention, n° 5, 1994, pp. 9-24 (in French).

[10] J. André, J-D. Fekete, H. Richy, « Mixed text/image processing of old documents », Actes du congrès GuTenberg, La grande Motte, p. 75-85, 1995 (in French).

[11] J.D Mackinlay, G.G. Robertson & S.K. Card, « The perspective Wall : Detail and context smoothly integrated », In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 1991, Addison-Wesley, Reading, MA, pp. 173-179.

[12] B.B. Bederson, J.D. Hollan, K. Perlin, J. Meyer, D. Bacon, G. Furnas, « Pad++ : A Zoomable Graphical Sketchpad For Exploring Alternate Interface Physics », Computer Science Department, University of New Mexico, Journal of Visual Languages and Computing, 1996 (7), pp. 3-31.

[13] J. Lamping, R Rao, « The Hyperbolic Browser: A Focus + Context Technique for Visualizing Large Hierarchies », Xerox Palo Alto Research Center, Journal of Visual Languages and Computing, 1996 (7), pp. 33-55.

[14] L. Likforman-Sulem, C. Faure, « Extracting lines on handwritten documents by perceptual grouping », in Advances in Handwriting and drawing : a multidisciplinary approach, C. Faure, P. Keuss, G. Lorette, A. Winter (Eds), pp. 21-38, Europia, Paris, 1994.
