DIVAServices – A RESTful Web Service for Document Image Analysis Methods

Marcel Würsch, Rolf Ingold, and Marcus Liwicki
DIVA Group, Department of Informatics, University of Fribourg, Boulevard de Pérolles 90, 1700 Fribourg, Switzerland
{firstname.lastname}@unifr.ch

Abstract

In this article we present a web service framework that provides automatic document processing methods to the public. Furthermore, we briefly describe an assessment environment and sample applications that use this framework. Research on Document Image Analysis (DIA) focuses mainly on developing and refining automatic processing steps, e.g., text line extraction, binarization, and layout analysis. While many state-of-the-art methods perform satisfactorily, the algorithms used to obtain the results are not easily accessible to other researchers. Making the source code available is often not sufficient, as using it typically requires a cumbersome installation of libraries and reading long usage manuals. We present a new approach for making methods available to researchers in the digital humanities without requiring them to have detailed knowledge of the algorithms. We propose a RESTful web service architecture, the current state of the art in web communication. For a developer, this reduces the steps needed to access a method to sending and receiving HTTP requests with JavaScript Object Notation (JSON) data, removing all installation steps. We will build on standards such as the Text Encoding Initiative (TEI) and the International Image Interoperability Framework (IIIF). Thus, methods hosted on DIVAServices can be integrated easily into document processing workflows, not only by software engineers in computer science but also by researchers in the digital humanities, without specific knowledge of the mathematical and algorithmic details of DIA.

1. Introduction

In recent years, many digital humanities projects have started to incorporate or build new (semi-)automatic methods for solving image analysis problems (e.g., layout analysis or text line extraction). A shortcoming of the outcomes of these projects (e.g., IMPACT¹ or TextGrid²) is that the resulting automatic or semi-automatic methods are tightly coupled to the application they were developed for. We call this the ‘Island Problem’; it is visualized in Fig. 1. Both applications use several methods, some of which are identical. To develop these applications, each developer had to undertake all the steps necessary to get the methods running within the respective program. Since the methods are then coupled to the resulting application, it is almost impossible to reuse them in other software. DIVAServices³ aims to solve this problem by providing access to different DIA algorithms through a RESTful web service architecture (Richardson & Ruby, 2008). To integrate a hosted method into any kind of application, all that is needed is an internet connection, the ability to send and receive HTTP requests, and the ability to parse JSON. Fig. 2 shows the general concept of DIVAServices. Our hosted platform provides access to different DIA algorithms, which can be invoked from external applications using HTTP requests. Every application can use the results according to its specific needs, for further processing or for visualization. Since the computation runs on our infrastructure, which provides high computing power, this opens up possibilities for all kinds of applications: desktop, web, and even mobile.

Fig. 1 The ‘Island Problem’ prevalent in many digital humanities software projects. Each application uses automatic or semi-automatic methods that are tightly coupled to it. Therefore, it is difficult to reuse the methods in other projects.

We aim to be compatible with existing standards: we will use IIIF for making newly computed images available, and we will provide methods to generate TEI information from computed results.

1 http://impact-project.eu
2 http://textgrid.de
3 The current version is available at: http://divaservices.unifr.ch

2. State of the art

Our research is motivated by the availability of many different web-based tools for researchers with a humanities background who want to perform document image analysis (e.g., SALSAH (Schweizer & Rosenthaler, 2011), Transcribe Bentham (Causer & Wallace, 2012), or the Genizah project (Wolf et al., 2011)). These tools either were developed to solve one specific problem, with the methods directly coupled into them, or do not make use of (semi-)automatic methods at all. Web services have already been proposed in Document Image Analysis. One example is the Document Analysis and Exploitation (DAE) system (Lamiroy & Lopresti, 2011), which makes different algorithms available as SOAP⁴ web services and allows for workflow creation. We aim to extend this research with a special focus on digital humanities researchers with limited computer science knowledge. We will provide simple web interfaces that showcase the web services and demonstrate how they can be used and integrated into new applications. Some of them are described in this article.

Fig. 2 The proposed approach: DIVAServices serves as a gateway to different DIA algorithms. Computations and results can be included in any kind of application: desktop, HTML5 clients, and even mobile applications.

3. Methodology

We propose an open source framework for providing algorithms to the public. For this we designed a RESTful web service architecture that exposes all information using JSON (Zyp & Court, 2013). The intention is to include a wide range of services for different tasks, covering the following categories:

4 http://www.w3.org/TR/soap/

• Image processing and enhancement, to make the desired content more easily visible or to simplify further automatic analysis. These methods include, for example, binarization (Otsu, 1979), Laplacian of Gaussian (LoG), and Difference of Gaussian (DoG).

• Document layout analysis methods for automatically extracting text, text lines, or images. These include pixel-based (Wei, 2013) and interest-point-based (Garz, Sablatnig, & Diem, 2011) approaches.

• Optical Character Recognition (OCR) to support the transcription of documents.

• Methods for palaeographic analysis, such as script identification (Ghosh, Dube, & Shivaprasad, 2010), writer identification (Fiel, Hollaus, Gau, & Sablatnig, 2013), and watermark analysis.

• Methods for feature extraction and feature selection, so that computer scientists can work directly on extracted meta-information without specific knowledge of DIA. For example, the following methods are included: Local Binary Patterns (LBP) (Nicolaou, Slimane, Maergner, & Liwicki, 2014), Scale-Invariant Feature Transform (SIFT) (Lowe, 1999), Gabor features (Chen, Cao, Prasad, Bhardwaj, & Natarajan, 2010), standard feature search algorithms, and several feature selection methods (Wei, Chen, Ingold, & Liwicki, 2014).

• Machine learning algorithms: Support Vector Machines (SVMs), the k-nearest neighbor algorithm (k-NN), and Gaussian Mixture Models (GMMs).

• Evaluation metrics for the automatic assessment of results, allowing computer science researchers to compare their systems. Here we will build on the standards laid out in DAE.
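To illustrate the kind of method in the first category, the following is a minimal sketch of global Otsu thresholding (Otsu, 1979), not the DIVAServices implementation. It assumes an 8-bit grayscale image given as a list of rows and dark ink on light paper; the names `otsu_threshold` and `binarize` are hypothetical and chosen only for this example.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grey values (0 = black, 255 = white)

def otsu_threshold(pixels: Image) -> int:
    """Return the grey level t that maximises the between-class variance
    when pixels are split into {v <= t} and {v > t} (Otsu, 1979)."""
    # Histogram of grey levels.
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of their grey values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no dark class yet
        if w_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(pixels: Image) -> Image:
    """Map dark pixels (assumed ink) to 0 and light pixels (background) to 255."""
    t = otsu_threshold(pixels)
    return [[0 if v <= t else 255 for v in row] for row in pixels]
```

A single global threshold works poorly on unevenly lit or stained pages, which is one reason a service of this kind would host several binarization methods side by side.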

Fig. 3 Conceptual overview of the proposed DIVAServices framework. Access to all provided methods and tools is standardized using HTTP requests, with JSON as the input/output format.

Besides a large set of our own implementations, we integrated open source software such as Tesseract⁵ (Smith, 2007) and OCRopus⁶ (Breuel, 2008). These projects provide a wide range of image processing algorithms, have been in development for years, and have proven to produce reliable results. A high-level overview of the proposed framework is provided in Fig. 3. Access to the provided tools and algorithms is standardized across all possible applications, using HTTP requests and JSON as the input/output format. Note that the purpose of DIVAServices is to provide algorithms. The creation of specific workflows is left to the developers of client applications, who can therefore tailor them to the specific needs of their end users. Our group has already provided a workflow for text line analysis (Wei, Chen, Seuret, Würsch, et al., 2015). Further approaches and interfaces are described in Section 5.

3.1. Representational State Transfer (REST)

In this section we provide the background necessary to understand RESTful web services, as they are the core concept of DIVAServices. An in-depth introduction to RESTful architecture is given by Richardson and Ruby (Richardson & Ruby, 2008). REST follows a resource-oriented architecture (ROA), meaning that the distributed resources may be under the control of different owners. At the center of an ROA is the resource, a logical entity exposed for direct interaction (Overdick, 2007); that is, a user of an ROA system interacts directly with the exposed objects. REST was proposed by Fielding as the architectural style of ROA for the web (Fielding, 2000), using HTTP requests for performing actions. It is based on the following set of constraints:

1. Client-server interaction: The core of the REST architecture is the distinction between two logical components: the client, which performs requests, and the server, which provides responses.

2. Stateless interaction: When generating a response, the server considers only the information provided by the client in the request. Sessions are not supported, and the server therefore cannot rely on stored client state.

3. Cache support: Performance can be improved by storing responses on the server or the client.

4. Uniform interface: The client-server interface needs to fulfil several characteristics (e.g., identification of resources).

5. Layered systems: If a server is unable to generate a response by itself, it can act as a client by performing a request to another server.

5 https://github.com/tesseract-ocr
6 http://github.com/tmbdev/ocropy

DIVAServices adopts these principles: methods and results are reachable via unique URIs, information is accessed using HTTP GET requests, and no state is stored on the server. In REST, HTTP POST requests are used to create an entity. We draw on this analogy and use POST requests to start the execution of a method, arguing that executing the method creates a new entity: the result.
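The mapping of HTTP verbs onto resources described above can be sketched as a small in-memory dispatcher. This is a hypothetical illustration only: the URI layout (`/methods`, `/methods/<name>`, `/results/<id>`), the `echo` method, and the `handle` function are assumptions, not the actual DIVAServices routes. Each call uses nothing but the verb, path, and body it carries, and a POST creates a new result resource that can later be fetched with GET.

```python
import json
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical registry of hosted methods: name -> function(params) -> result.
METHODS: Dict[str, Callable[[dict], dict]] = {
    "echo": lambda params: {"received": params},
}
# Computed results are resources with their own URIs (not session state).
RESULTS: Dict[str, dict] = {}

def handle(verb: str, path: str, body: str = "") -> Tuple[int, str]:
    """Serve a request using only the verb, path and body it carries."""
    parts = [p for p in path.split("/") if p]
    if parts == ["methods"] and verb == "GET":
        return 200, json.dumps(sorted(METHODS))          # list available methods
    if len(parts) == 2 and parts[0] == "methods":
        name = parts[1]
        if name not in METHODS:
            return 404, json.dumps({"error": "unknown method"})
        if verb == "GET":
            return 200, json.dumps({"name": name})       # method information
        if verb == "POST":
            # Executing a method creates a new resource: its result (201 Created).
            result_id = uuid.uuid4().hex
            RESULTS[result_id] = METHODS[name](json.loads(body or "{}"))
            return 201, json.dumps({"result": f"/results/{result_id}"})
    if len(parts) == 2 and parts[0] == "results" and verb == "GET":
        if parts[1] in RESULTS:
            return 200, json.dumps(RESULTS[parts[1]])
        return 404, json.dumps({"error": "unknown result"})
    return 405, json.dumps({"error": "unsupported verb or path"})
```

Returning the result URI from the POST, rather than keeping a per-client session, is what lets any later request (from any client) retrieve the result with a plain GET.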

3.2. JavaScript Object Notation (JSON)

JSON is a data format for exchanging information between two systems. Compared to XML it is much simpler: objects are enclosed in curly brackets and contain only name-value pairs. We chose JSON over XML because it has become the de facto standard for data transfer in modern web applications (Burke, 2013) and because we can adapt it specifically to the needs of DIVAServices.

Fig. 4 A comparison of JSON (left) with XML (right)

Fig. 4 shows an example of JSON (left) and how the same information would be encoded in XML (right). The advantages of JSON are that it avoids some of the overhead of XML, and that data in JSON format can be processed in almost any programming language.
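To make the comparison concrete, the sketch below encodes the same record once as JSON and once as XML using only Python's standard library. The record (a method name with two parameters) is invented for illustration and does not reproduce the content of Fig. 4; the element-per-key XML mapping is one naive choice among several (attributes could be used instead).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical method description (not the content of Fig. 4).
record = {"name": "binarization", "parameters": {"threshold": 128, "invert": False}}

def to_json(data: dict) -> str:
    # Compact separators: no whitespace between tokens.
    return json.dumps(data, separators=(",", ":"))

def to_xml(data: dict, tag: str = "method") -> str:
    """Naive mapping: every key becomes a child element."""
    def build(parent: ET.Element, value: dict) -> None:
        for key, val in value.items():
            child = ET.SubElement(parent, key)
            if isinstance(val, dict):
                build(child, val)
            elif isinstance(val, str):
                child.text = val
            else:
                child.text = json.dumps(val)  # 128 -> "128", False -> "false"
    root = ET.Element(tag)
    build(root, data)
    return ET.tostring(root, encoding="unicode")

json_text = to_json(record)
xml_text = to_xml(record)
print(json_text)
print(xml_text)
print(len(json_text), "vs", len(xml_text), "characters")
```

Besides the closing tags that make the XML longer, the XML version turns every value into text, so the distinction between numbers, booleans, and strings is lost unless a schema restores it, whereas JSON carries these types natively.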

3.3. Communication with DIVAServices

All communication with DIVAServices is performed using HTTP requests. GET requests are used to access the list of available methods, detailed information about a method, and already computed results. Fig. 5 shows the JSON response returned for a GET request on a specific method. The response contains information about the origin of the method, basic runtime information, and the required parameters (e.g., user-defined regions and input values).
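A client can read such a response to discover what it has to send before invoking a method. The sketch below parses a method-information document and lists the required parameters and any declared defaults. The field names (`name`, `input`, `required`, `type`, `default`) and the sample body are assumptions for illustration; the actual schema is the one shown in Fig. 5 and may differ.

```python
import json
from typing import Dict, List

# Hypothetical body of a GET response for one method; the field names are
# assumptions and need not match the real DIVAServices schema (Fig. 5).
SAMPLE_INFO = """
{
  "name": "textline-segmentation",
  "input": [
    {"name": "image", "type": "url", "required": true},
    {"name": "region", "type": "rectangle", "required": true},
    {"name": "minLineHeight", "type": "number", "required": false, "default": 10}
  ]
}
"""

def required_parameters(body: str) -> List[str]:
    """Names of all parameters the caller must supply."""
    info = json.loads(body)
    return [p["name"] for p in info.get("input", []) if p.get("required", False)]

def default_values(body: str) -> Dict[str, object]:
    """Defaults declared for optional parameters."""
    info = json.loads(body)
    return {p["name"]: p["default"] for p in info.get("input", []) if "default" in p}
```

A generic client or user interface could use this information to build an input form for any hosted method without hard-coding its parameters.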

Fig. 5 JSON response to a GET request, asking for more information about one specific method. The response contains information about the origin of the method and the required parameters to run it.

Methods are invoked using POST requests that carry image and parameter information in a JSON object. Once the computation is finished, the response contains all computed information, again as JSON. Fig. 6 (left) shows the JSON information needed to invoke a text line segmentation method. It specifies which image should be used, given as a URL, as well as a user-selected area of the image that should be processed. A response to such a request is shown on the right; in this case it contains the bounding box of each detected text line. The structure of the response varies from method to method.
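The invocation step can be sketched with Python's standard `urllib.request`. The endpoint URL and the JSON field names (`image`, `region`, `lines`, `boundingBox`) are hypothetical placeholders, not the documented DIVAServices interface. The request is only constructed here, not sent, and the response is a hand-written sample of the shape described for Fig. 6.

```python
import json
import urllib.request
from typing import List, Tuple

# Hypothetical endpoint (".example" is a reserved, non-resolving domain).
ENDPOINT = "http://divaservices.example/methods/textline-segmentation"

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def build_request(image_url: str, region: Box) -> urllib.request.Request:
    """Create, but do not send, the POST request that starts the method."""
    x, y, w, h = region
    payload = {
        "image": image_url,                                   # image given as a URL
        "region": {"x": x, "y": y, "width": w, "height": h},  # user-selected area
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_lines(body: str) -> List[Box]:
    """Return one bounding box per detected text line."""
    result = json.loads(body)
    return [
        (b["x"], b["y"], b["width"], b["height"])
        for b in (line["boundingBox"] for line in result["lines"])
    ]

# Hand-written response of the assumed shape, not real DIVAServices output.
SAMPLE_RESPONSE = json.dumps({"lines": [
    {"boundingBox": {"x": 12, "y": 40, "width": 300, "height": 25}},
    {"boundingBox": {"x": 12, "y": 70, "width": 290, "height": 24}},
]})
```

Against a live service, `urllib.request.urlopen(req)` would send the request, and the decoded response body could then be passed to `parse_lines`.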

4. Method Effectiveness Testing Environment

To test the effectiveness of DIA methods, we have built a web-based assessment interface called DIVAServices-Spotlight⁷. It allows end users and application developers to perform experiments. We aimed at building a user-friendly interface that provides all necessary information. Users can upload their own images, apply the available methods to them, and try different parameter values to find the most suitable combination for the task at hand. To help users interpret the results, we provide simple interfaces for displaying them. Fig. 7 shows an example of such an interface: on the left is the user-selected input required for processing, and on the right the output computed by the algorithm (in this case a text line segmentation).
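Trying different parameter values in this way amounts to a search over parameter combinations. The following is a minimal sketch of an exhaustive grid search; `grid_search`, `run`, and `score` are hypothetical names, where `run` would stand for invoking a hosted method with one parameter set and `score` for comparing its output against ground truth. It is not part of Spotlight.

```python
import itertools
from typing import Callable, Dict, List, Tuple

def grid_search(
    grid: Dict[str, List],
    run: Callable[[dict], object],
    score: Callable[[object], float],
) -> Tuple[dict, float]:
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(grid)
    best_params, best_score = {}, float("-inf")
    # Cartesian product of all candidate values, one tuple per combination.
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(run(params))
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

Because every combination is a separate, independent request, such a search fits the stateless service model well; its cost, however, grows with the product of the numbers of candidate values.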

5. Applications using DIVAServices

7 Available at: http://divaservices.unifr.ch/spotlight

Methods hosted on DIVAServices are already used for real-world tasks in several applications. We present two examples here: a transcription application and a ground-truth generation application.

Fig. 6 The JSON information needed to invoke a text line segmentation method (left) includes image information and a selected rectangle within the image that should be processed. Once the computation is finished, DIVAServices sends the result back, also as JSON (right); for a text line segmentation it contains the bounding box of each detected text line.

The first application, DIVADIAWT, is a web application for transcribing documents directly in the browser. Its user interface is shown in Fig. 8. In a first step, a user can manually define several text blocks, each of which can contain multiple text lines. Text blocks can then be automatically separated into text lines using a segmentation algorithm provided by DIVAServices. This process is visualized in Fig. 9: the user has drawn a rectangle with the mouse around a region to be automatically segmented into text lines (left), and the result of the segmentation is shown on the right. An additional feature of this web interface is that the transcription on the right side is aligned with the corresponding position on the original image.
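One simple way to realise such an alignment is to sort the detected line boxes into reading order and pair them with the transcribed lines. The sketch below assumes a single column of horizontal lines, boxes given as (x, y, width, height), exactly one transcription line per detected box, and, optionally, boxes that are relative to the selected region and must be shifted by its offset. The function `align` is hypothetical; DIVADIAWT's actual alignment logic is not described here and may differ.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def align(
    boxes: List[Box],
    transcription: List[str],
    offset: Tuple[int, int] = (0, 0),
) -> List[Tuple[Box, str]]:
    """Pair each transcription line with a line box, top to bottom."""
    if len(boxes) != len(transcription):
        raise ValueError("expected one transcription line per detected text line")
    dx, dy = offset
    # Shift boxes into page coordinates (only needed if they are region-relative).
    page_boxes = [(x + dx, y + dy, w, h) for x, y, w, h in boxes]
    # Reading order for a single column: sort by the top edge.
    ordered = sorted(page_boxes, key=lambda b: b[1])
    return list(zip(ordered, transcription))
```

Multi-column layouts or skewed lines would need a proper reading-order step first; sorting by the top edge is only adequate for the simple case assumed here.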

Fig. 7 The user selected input (left), and the computed result of a text line segmentation method (right)

Fig. 8 Overview of DIVADIAWT. The original image is displayed on the left side, transcriptions on the right side. In Layout mode, transcriptions are aligned with the original image.

Another application using DIVAServices is DIVADIAWI (Wei, Chen, Seuret, Liwicki, & Ingold, 2015), a web interface for semi-automatically creating ground truth for historical documents. It uses methods for segmentation into text lines, as well as for merging and splitting parts of lines. Furthermore, we have developed a library in Java 8 that provides access to all available methods. Any Java application can use this library to access the methods hosted on DIVAServices; it takes care of all the necessary communication.

Fig. 9 The user marked a box (visible as a grey shaded area) that should be automatically divided into text lines using an algorithm provided by DIVAServices (left). The result of the automatic text line segmentation, visualized by DIVADIAWT (right).

6. Conclusion and Future Work

In this paper we presented a new approach for making DIA algorithms available over the internet for easy integration into existing or new research in the digital humanities. With a web-service-based approach, the time researchers and software engineers in the digital humanities need to use and apply state-of-the-art DIA methods should decrease significantly. DIVAServices removes all complicated installation and configuration steps by providing easily accessible RESTful web services.

8 Available at: https://github.com/DIVA-DIA/DivaServicesCommunicator

An important aspect in designing such a system is scalability. The questions it raises are not easily solved and will be investigated as part of our future research. We aim to address issues such as: What is the best way to transfer all the data? How can large datasets be processed? How can processing power be distributed? In the future, we also want to make the results of certain methods exportable to other standard formats such as TEI. Furthermore, we plan to provide toolkits for other programming languages to lower the entry barrier even further.
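As an indication of what a TEI export could look like, the sketch below turns text-line bounding boxes into a TEI `<facsimile>` element with one `<zone>` per line, using the TEI coordinate attributes `ulx`, `uly`, `lrx`, and `lry`. The function name, the `xml:id` scheme, and the `type="line"` value are illustrative assumptions, not a format DIVAServices has defined.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def lines_to_tei_facsimile(image_url: str, boxes: List[Box]) -> str:
    """Build a TEI <facsimile> with one <zone> per detected text line."""
    facsimile = ET.Element(f"{{{TEI_NS}}}facsimile")
    surface = ET.SubElement(facsimile, f"{{{TEI_NS}}}surface")
    ET.SubElement(surface, f"{{{TEI_NS}}}graphic", url=image_url)
    for i, (x, y, w, h) in enumerate(boxes, start=1):
        ET.SubElement(
            surface,
            f"{{{TEI_NS}}}zone",
            {
                XML_ID: f"line{i}",                    # hypothetical ID scheme
                "type": "line",
                "ulx": str(x), "uly": str(y),          # upper-left corner
                "lrx": str(x + w), "lry": str(y + h),  # lower-right corner
            },
        )
    return ET.tostring(facsimile, encoding="unicode")
```

The zone identifiers could then be referenced from the transcription (for example via `facs` attributes on line elements) to link text and image regions.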

References

Breuel, T. M. (2008). The OCRopus open source OCR system. In Proc. SPIE 6815, Document Recognition and Retrieval XV, Vol. 6815, pp. 1–15.
Burke, B. (2013). RESTful Java with JAX-RS 2.0. O’Reilly Media, Inc.
Causer, T., & Wallace, V. (2012). Building a Volunteer Community: Results and Findings from Transcribe Bentham. Digital Humanities Quarterly, 6, 1–28.
Chen, J., Cao, H., Prasad, R., Bhardwaj, A., & Natarajan, P. (2010). Gabor features for offline Arabic handwriting recognition. In International Workshop on Document Analysis Systems, pp. 53–58.
Fiel, S., Hollaus, F., Gau, M., & Sablatnig, R. (2013). Writer Identification on Historical Glagolitic Documents. In Document Recognition and Retrieval, Vol. 9021, pp. 902102–902102–10.
Fielding, R. T. (2000). Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California, Irvine.
Garz, A., Sablatnig, R., & Diem, M. (2011). Layout Analysis for Historical Manuscripts Using SIFT Features. In International Conference on Document Analysis and Recognition, pp. 508–512.
Ghosh, D., Dube, T., & Shivaprasad, A. P. (2010). Script Recognition – A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, XX, 1–21.
Lamiroy, B., & Lopresti, D. (2011). An Open Architecture for End-to-End Document Analysis Benchmarking. In 2011 International Conference on Document Analysis and Recognition, pp. 42–47.
Lowe, D. G. (1999). Object Recognition from Local Scale-Invariant Features. In International Conference on Computer Vision, pp. 1150–1157.
Nicolaou, A., Slimane, F., Maergner, V., & Liwicki, M. (2014). Local Binary Patterns for Arabic Optical Font Recognition. In International Workshop on Document Analysis Systems, pp. 76–80. IEEE.
Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66.
Overdick, H. (2007). The Resource-Oriented Architecture. In IEEE Congress on Services, pp. 340–347.
Richardson, L., & Ruby, S. (2008). RESTful Web Services. O’Reilly.
Schweizer, T., & Rosenthaler, L. (2011). SALSAH – eine virtuelle Forschungsumgebung für die Geisteswissenschaften. In EVA 2011, pp. 147–153.
Smith, R. (2007). An overview of the Tesseract OCR engine. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Vol. 2, pp. 629–633.
Wei, H. (2013). Layout Analysis on Historical Documents. In International Conference on Document Analysis and Recognition, Doctoral Consortium (ICDAR DC).
Wei, H., Chen, K., Ingold, R., & Liwicki, M. (2014). Hybrid Feature Selection for Historical Document Layout Analysis. In International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 87–92.
Wei, H., Chen, K., Seuret, M., Liwicki, M., & Ingold, R. (2015). DIVADIAWI – A Web-based Interface for Semi-automatically Generating Ground Truth of Historical Document Images. Digital Scholarship in the Humanities, submitted.
Wei, H., Chen, K., Seuret, M., Würsch, M., Liwicki, M., & Ingold, R. (2015). DIVADIAWI – A Web-based Interface for Semi-automatic Labeling of Historical Document Images. Digital Humanities.
Wolf, L., Littman, R., Mayer, N., German, T., Dershowitz, N., Shweka, R., & Choueka, Y. (2011). Identifying join candidates in the Cairo Genizah. International Journal of Computer Vision, 94, 118–135.
Zyp, K., & Court, G. (2013). JSON Schema. http://tools.ietf.org/html/draft-zyp-json-schema-04 (accessed 17 September 2015)
