Practical Experiences of Testing Web Services

Ghita Kouadri-Mostefaoui and Andrew Simpson
Oxford University Computing Laboratory
Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom
{ghita.kouadri.mostefaoui, andrew.simpson}@comlab.ox.ac.uk

Abstract. The utilisation of web services in building e-commerce applications and business-to-business solutions is becoming increasingly widespread. The nature of these applications dictates a strong need for robust and fault-tolerant infrastructures. To this end, the establishment of sound approaches to testing web services is essential. Existing literature on the subject typically focuses on theoretical classifications of possible tests, with limited consideration being given to practical issues. In this paper we describe the approach taken to testing web services within the (iterative) development of a large-scale service-based infrastructure. The system in question is being developed to support a variety of distributed applications pertaining to healthcare delivery and research. We also report upon our experiences, which have given rise to a collection of generic principles for the testing of systems built on web services.

1 Introduction

Web services are becoming increasingly popular in the development of loosely-coupled service-oriented infrastructures, owing to their use of open standards such as XML and the flexibility they afford developers to provide custom implementations. Systems built on web services are now starting to be deployed in business- and security-critical contexts, which increases the need for clear thought to be given to appropriate testing techniques and methodologies. Recent efforts in this area have distinguished between two approaches: the use of ready-to-use tools (of either commercial or open source varieties),1 and the development of research prototypes for testing specific aspects of web service functionality. In this paper we describe the approach to testing web services that has been taken within the GIMI (Generic Infrastructure for Medical Informatics) project [1]. Within GIMI, a project team at Oxford consisting of seven developers is building middleware to support the sharing of clinical data to facilitate distributed healthcare delivery and research. The requirements of interoperability and a low technical barrier to entry for application developers have led the project team to adopt an architecture based on web services.

1 Such as Apache JMeter, IBM TestStudio and Parasoft SOAPtest.

This architecture, together with a focus on test-led, iterative development, has required the team to give serious consideration to the appropriate testing of web services. The approach taken relies on the careful selection of multiple testing tools and techniques.
The structure of the paper is as follows. Section 2 describes the GIMI project, and highlights the pertinent aspects of both the project and the system. In Section 3 we provide a necessarily brief overview of previous contributions in the area of testing web services. In Section 4 we discuss the derivation of tests, before, in Section 5, providing details of our testing strategy and the various tools and techniques that have been used to implement it. Section 6 comments upon the results obtained. Section 7 details a number of generic principles that we have distilled from our work. Finally, in Section 8, we summarise the contribution of this paper and discuss some potential areas of future work.

2 GIMI

GIMI (Generic Infrastructure for Medical Informatics) is a collaborative project funded by the UK's Department of Trade and Industry. The main aim of GIMI is to develop a generic, dependable middleware layer capable of supporting data sharing across disparate sources to facilitate healthcare research, delivery, and training. This key deliverable, the middleware layer, is complemented by applications being developed for the self-management of long-term conditions, image analysis for cancer care, and training and auditing for radiology. The project partners are drawn from academia and the commercial sector: the University of Oxford (Computing Laboratory and Engineering Science), University College London, Loughborough University, t+ Medical, IBM UK, Siemens Molecular Imaging, and the National Cancer Research Institute. The project work is divided into four key work-packages: the development of the core technology (Oxford University Computing Laboratory); long-term conditions (t+ Medical and Engineering Science, Oxford); mammography auditing and training (UCL and Loughborough University); and medical imaging in cancer care (Engineering Science, Oxford). The focus of this paper is the first of these.
The middleware being developed is sympathetic to the design of the idealised secure health grid of [2], and builds upon the implementation of [3]. The current system utilises: Linux (Gentoo) and IBM AIX for the server operating system; Java 2 Standard Edition 5.0; Apache Tomcat 5.0 and the Java Web Services Development Pack 2.0; Apache 2.2 with OpenSSL; Apache Derby and IBM DB2 databases; the Ant build tool; and the Eclipse development platform. The motivation for using Java-based technologies is platform independence; using web services allows maximum interoperability with client implementations in a variety of languages. We also use the Apache web server, which is available for most platforms. The technologies used to create the GIMI infrastructure were chosen to ensure that it would be as portable and interoperable as possible: we have attempted to choose solutions with excellent cross-platform support which are at least freely available, if not open source.

Fig. 1. GIMI node

Figure 1 illustrates the architecture of a GIMI node. WebDAV (Web-based Distributed Authoring and Versioning) [4] folders are used to provide scratch space for operations. Access to web services and WebDAV is possible only via PKI mutual authentication using X.509 certificates. Users communicate with the WebDAV server over HTTPS with mutual authentication enabled, which establishes a secure channel over which otherwise insecure messages can be passed.
WebDAV plays a key role in the GIMI architecture. First, there is no standard high-speed and secure way of transferring binary data using SOAP messages; WebDAV allows us both to get files from and to put files to web servers securely and efficiently. Second, WebDAV allows the user to browse, upload and download files directly to their scratch space using a variety of third-party WebDAV clients; this is especially useful when wrapping the web services with a portal. The WebDAV folders also provide an ideal location for any intermediate results that might be generated by algorithms: such results do not belong in the file-store but may be of interest to the user. A file-store and a database of meta-data provide part of the back-end of the system, with all interactions occurring through web service calls. We separate image data from patient and meta-data (with appropriate references between the two to guarantee referential integrity).
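To make the mutual-authentication requirement concrete, the following sketch shows how a client might upload a file to its WebDAV scratch space over HTTPS using an X.509 client certificate. It uses only the standard Java SSL APIs; the keystore locations, passwords, file names and URL are placeholders rather than details of the actual GIMI deployment.

import java.io.FileInputStream;
import java.io.OutputStream;
import java.net.URL;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class WebDavUploadSketch {
    public static void main(String[] args) throws Exception {
        // Client keystore: the user's X.509 certificate and private key (placeholder path).
        KeyStore clientKs = KeyStore.getInstance("PKCS12");
        clientKs.load(new FileInputStream("client.p12"), "changeit".toCharArray());
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientKs, "changeit".toCharArray());

        // Truststore: the CA certificate used to verify the server (placeholder path).
        KeyStore trustKs = KeyStore.getInstance("JKS");
        trustKs.load(new FileInputStream("truststore.jks"), "changeit".toCharArray());
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustKs);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);

        // WebDAV PUT of a local file into the user's scratch space (placeholder URL).
        URL url = new URL("https://gimi-node.example.org/dav/scratch/image.dcm");
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        FileInputStream in = new FileInputStream("image.dcm");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        in.close();
        out.close();
        System.out.println("PUT returned HTTP " + conn.getResponseCode());
    }
}

The same SSL context could equally be handed to a SOAP client, so web service and WebDAV traffic share one certificate-based identity.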

3 Related work

Current contributions on testing web services can be divided into three main classes: model-based testing of web services, web services monitoring, and practical testing of web services.
Model-based testing of web services. In coordinated web services, the invocations of operations obey some ordering or coordination protocol. Taking GIMI as an example, the user first has to build a query and submit it to a relevant database before being able to retrieve the results. In [5], the authors investigate the use of symbolic transition systems to approach the testing of coordinated web services. They argue that the coordination protocols can be exploited as input for MBT (Model-Based Testing) for the sake of automating both test generation and execution. However, the description of these protocols needs to be made available in the web service registry. Similar contributions are made by [6] and [7].
Web services monitoring. This approach is generally used for testing commercial web services, where access to the source code is limited or impossible. In this case, the web service is mainly tested for robustness [8] and QoS. The process is conducted from a user perspective and relies on the service interface rather than on its implementation [9]. It is generally realised by implementing an external testing entity or observer [10], which is aimed at detecting faulty outputs either at run-time, by checking the various inputs and outputs, or off-line, by analysing interaction traces. Testing entities are generally implemented as mobile agents [11, 12] or as web services [13].
Practical testing of web services. There is very little literature pertaining to the practical testing of web services, with the most relevant contributions focusing on testing specific aspects, such as XML schemas and WSDL. In this respect, [14] describes a method to generate tests for XML-based communication. The approach relies on the modification and then instantiation of XML schemas based on defined primitive perturbation operators. Related contributions include [15]. Industry reports such as [16] tend to provide a broad classification of the different tests one might perform, with little information given on their implementation and on best practices.
The main conclusion one might draw from current contributions is that the proposed testing techniques are mainly applied to research prototypes and target specific aspects of web services. There is a lack of literature describing practical experiences in testing real-life web services. In particular, little has been written about the test case derivation process, and the concrete implementation of such tests, in such contexts.

4 The derivation of tests

'What to test?' is the first question that a tester should answer before embarking upon the testing process. The answer to this question is relevant not only to the preparation of the tools and knowledge needed to generate tests, but also to estimating the time needed for the task. Planning to develop and run every possible type of test on web services is idealistic, for (at least) two reasons. First, testing is a never-ending process; second, in practice, software testing always involves a trade-off between cost, time and quality. As such, a preliminary selection and planning of tests is essential.
When considering the testing of large-scale systems, performance testing, load testing and stress testing are all essential. Despite the fact that these terms can be rather loosely defined at times, essentially they refer to different, and complementary, activities. Performance testing is mainly dedicated to measuring the response time of the web service under a certain workload; load testing is dedicated to analysing a web service when it is invoked simultaneously by many clients; stress testing is aimed at ensuring that the web service recovers gracefully when one of its resources (database, port, or server) is no longer available. While definitions of these categories of testing abound, practical advice on how to undertake the activities is limited. While these categories of testing are clearly important parts of any distributed or web application testing plan, sticking religiously to this classification for deriving tests is insufficient: the categories do not cover all aspects of the system and, indeed, may ignore the most critical ones.
To derive the set of tests to perform, we argue that it is important to start from the specification of the system. In our case, we initiate our study from the requirements document. If this document is not available, a verbal description of the functionalities of the web services infrastructure and/or its source code should be used instead. It is worth mentioning that in the GIMI project the development team is following an iterative approach to development, with new aspects of functionality being incorporated continuously. Thus, tests are also carried out iteratively, following the regression testing paradigm. Before continuing, it is worth considering the key characteristics of GIMI.

4.1 GIMI characteristics

In addition to being built as a service-oriented infrastructure using web services, there are other aspects to the project that are relevant to our discourse.
– Layered architecture. The GIMI infrastructure comprises two layers: the client-side artifacts layer, which contains the details of web service methods, stubs, and the management of SOAP messages, and the API layer, which abstracts all of these implementation details and provides a lightweight interface for invoking web services. However, it is perfectly possible to achieve full functionality by interacting directly with the underlying web services. The API layer allows developers to extend the GIMI framework easily without needing to deal with the low-level details of web services. (A minimal sketch of this layering is given after the list.)
– Several developers. The development of the core middleware within GIMI involves a team of seven developers working on various aspects of the middleware, with code changes being committed constantly.
– End-user community. The premise upon which the GIMI project is based is that the middleware should enable researchers and clinicians to share data in a secure and ethical fashion. The middleware is a conduit for applications, and consideration of application developers and the delivery of appropriate APIs is at the heart of the project.
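The following sketch illustrates the intent of the layering; the class and method names are hypothetical stand-ins rather than the actual GIMI types, which are not listed in this paper.

import javax.sql.rowset.WebRowSet;

// Hypothetical illustration of the two-layer client design; all names below are
// placeholders, not the actual GIMI classes.
public class ApiLayerSketch {

    // Stand-in for the generated client-side artifacts (stubs, SOAP handling).
    interface PatientServiceStub {
        WebRowSet invokeQuery(String sql) throws Exception;
    }

    // API-layer facade: a lightweight interface that hides the stub details.
    public static class GimiConnection {
        private final PatientServiceStub stub;

        public GimiConnection(PatientServiceStub stub) {
            this.stub = stub;
        }

        // Application developers call a single typed method rather than
        // driving the stub, SOAP handlers and certificates directly.
        public WebRowSet queryPatients(String sql) throws Exception {
            return stub.invokeQuery(sql);
        }
    }
}

Extending the framework then amounts to adding methods to the facade, while the underlying web services remain directly invocable for clients that need them.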

4.2 Testing perspectives

Service-oriented architectures raise a set of testing challenges and issues. The issues highlighted in [17] relate to the different kinds of testing that might be carried out by the services' stakeholders, such as developers, service providers, service integrators, and end-users. Since a single development team at Oxford is undertaking most of these responsibilities (i.e., development, provision, integration, etc.), we have more freedom with respect to the types of tests that can be run on the system. Nevertheless, the 'test the worst things first' principle of [17] is a key driver in our test set selection, with this being manifested as 'test the most critical aspects first'. Based on the nature of the project, and the characteristics of Section 4.1, a collection of guidelines has been derived. The intention of these guidelines is to drive the testing project with a view to achieving the optimum trade-off between cost, time, and quality.
Top-down approach. A combination of the 'test the worst things first' principle and the layered architecture of GIMI has led us to adopt a top-down approach to testing. The API layer is tested first, since it provides the starting point for developers to extend the framework with new functionality. The extension/customisation process is more likely to introduce bugs than the client-side artifacts layer, which relies on standard code for SOAP management, XML parsing, and so on.
Incremental testing. The requirements on the middleware are evolving constantly. As such, tests also follow an incremental pattern. To ensure that this process works, we utilise two main technologies: a version control system and a continuous build framework. The former relies on Subversion,2 a successor to CVS. We rely on the Subversion plug-in for the Eclipse IDE, known as Subclipse. This is of particular benefit since multiple developers are working on the project simultaneously, and the latest version of the code base is needed by the continuous build. The latter is achieved using CruiseControl,3 an open source framework that supports continuous build processes, including testing. CruiseControl can be configured to run periodically in order to reduce code integration problems. It affords a web-based user interface for monitoring the project status, displaying the test result history, and sending notifications regarding the status of the build.
Test automation. While test automation is a laudable aim, it is not possible (or desirable) to automate all tests: over time, some tests may have no real prospect of being rerun and, as such, their presence becomes useless. The question of when automation should be employed is an interesting one (see [18, 19] for a discussion of such issues). Potential strategies include trading off the time needed to develop tests against the cost they generate.

2 http://subversion.tigris.org/
3 http://cruisecontrol.sourceforge.net/

It emerges (implicitly) from the contributions of [18] and [19], and (explicitly) from our practical experiences, that the decision to automate a given test is dictated by the constraints of the specific context. Indeed, only the developers and testers can evaluate whether a test needs to be added to the automated test suite. In GIMI, the automation of a specific test (or test suite) is driven by two main factors. The first is the automation of tests that relate to satisfying obligations that are contractual in nature. For example, being able to create a temporary folder on a grid node and then to move a medical image to the newly created folder is a base functionality that should be available at any point of the development process. These kinds of tests are also referred to as acceptance tests. They allow project progress to be estimated by ensuring that a given requirement remains implemented and fully functional across the multiple iterations of the code. This first class of tests is primarily concerned with ensuring that scenarios defined in the requirements document are still achievable. (A sketch of such an acceptance test is given at the end of this subsection.) The second is the automation of tests for the underlying support code upon which the aforementioned code relies. This helps to bound the source of any bug and to realise the top-down approach.
The choice of adequate frameworks and tools. Many tools and frameworks for testing web services are available. Some of these tools are dedicated to testing web services from a developer's perspective, while others allow testing from a consumer's perspective. The latter are generally used to ensure that the functionality of the web service is as expected: testers have no access to the code, and the tools mainly perform direct invocations of the web services under different conditions. The first type, which is of interest to us, allows access to the code and is mainly used by the developers of web services to reduce bugs before delivering a software release. Our choice of testing tools was also driven by the complexity of the GIMI infrastructure, which relies on functionalities such as a custom implementation of web service session management. Common testing tools do not offer the possibility of testing such extra functionalities; it is thus of paramount importance to select tools that give sufficient control over the source code. Our preference is to utilise free tools and Java frameworks. This choice has the benefit that such tools are continuously evaluated by the research and development communities.
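As an illustration of the first class, the following JUnit sketch exercises the 'create a scratch folder and move a file into it' scenario through the API layer. The WSConnection, Client and Server calls are those that appear in the listing of Section 5.2; the host name, certificate path and file locations are placeholders.

import java.io.File;
import junit.framework.TestCase;

// Acceptance-test sketch (placeholder paths and host name): checks that a scratch
// folder can be created on a node and that a file can be moved into the store and
// retrieved again, using the API-layer classes shown in the listing of Section 5.2.
public class TestScratchFolderAcceptance extends TestCase {

    public void testCreateFolderAndTransferFile() throws Exception {
        Server node = new Server("gmsvim01.gimi.ox.ac.uk", 8080, "gmsvim01");
        WSConnection con =
                new WSConnection(new Client("/home/gimi/certs/client"), node);

        // Create a temporary WebDAV folder on the node.
        String folderId = con.createDavDir();
        assertNotNull("No folder identifier returned", folderId);

        // Move a local file into the store and bring it back again.
        File input = new File("/home/gimi/files/inputFile.tst");
        int storeId = con.localToStore(input, "output");

        File output = new File("/tmp/outputFile");
        con.storeToLocal(storeId, output);
        assertTrue("Retrieved file is missing or empty",
                output.exists() && output.length() > 0);
    }
}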

5 Test implementation

In this section we present the different types of tests that have been undertaken on the GIMI infrastructure, and provide some details pertaining to their implementation and execution.

5.1 Functional correctness

The aim of functional testing is to verify the achievability of the functional requirements. In order to devise the tests relevant to this class of testing, we rely on the system use cases. In our context, the actors are the Medic Group and the Researcher Group. The Medic Group includes General Practitioner (GP), Specialist Nurse (SN) and Consultant (CN) roles, whose typical task is to investigate the data of a particular patient. The members of the Researcher Group are more interested in querying medical data sources for the purpose of statistical analysis. A sample GIMI use case diagram is depicted in Figure 2.
Verifying functional correctness includes testing scenarios for manipulating patient data, transferring medical images, retrieving medical data, etc. Some use cases require the development of a single test application, while others require multiple independent tests. Complex use cases are refined into smaller ones, with tests being devised accordingly. This approach allows us to test both higher-level functionalities (depicted by use cases) and fine-grained functionalities (not visible in the use cases but relevant to the technological choices made during development). For example, the identify GIMI resources for patient use case implicitly requires connect to data source (which in turn relies on the remote listing of relevant databases) and store retrieved data (which requires connecting to and creating a WebDAV folder and transferring the result set into it). This fragmentation of tests equates to testing smaller scenarios of the parent use case. More complicated use cases require the inclusion of the use cases they depend on. A sample code snippet illustrating the latter case is shown in the following listing, which tests sequentially both the submit query on patient data and retrieve/display patient data use cases.

import java.io.IOException;
import java.sql.SQLException;
import javax.sql.rowset.WebRowSet;
import javax.xml.parsers.ParserConfigurationException;

// A test class showing queries on patient databases.
public class TestQueries {

    public static void main(String args[])
            throws ParserConfigurationException, IOException, SQLException {
        // ... (connection set-up and variable declarations elided)

        // Create a SQL query.
        sqlQuery = "SELECT * FROM cis.patient";
        try {
            // Retrieve the result as a WebRowSet.
            WebRowSet result = (WebRowSet) con.query(sqlQuery);
            System.out.println("\nPrinting the WebRowSet...");
            // ...
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

In summary, there is no direct mapping between the number of use cases and the number of tests that must be implemented and run, and this should be borne in mind in any test plan. Additionally, despite the fact that the main use cases are mined from the requirements, some of them emerge as a result of adding helper elements to the system. These helpers are important, and sometimes vital, to the health of the whole system. For instance, a GIMI healthcheck web service has been implemented to check periodically the status of the GIMI nodes. Including such components of the system in the testing process is valuable; a minimal sketch of a corresponding test follows.
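The sketch below stands in for a healthcheck test: rather than calling the healthcheck service itself (whose interface is not listed in this paper), it simply verifies that each node's web service port accepts connections within a bounded time. The host name, port and timeout are placeholders.

import java.net.InetSocketAddress;
import java.net.Socket;
import junit.framework.TestCase;

// Node-availability check in the spirit of the GIMI healthcheck service; the test
// fails with an IOException if any listed node does not accept a TCP connection
// within the timeout.
public class TestNodeAvailability extends TestCase {

    private static final String[] NODES = { "gmsvim01.gimi.ox.ac.uk" }; // placeholder node list
    private static final int PORT = 8080;       // Tomcat port used in the later listings
    private static final int TIMEOUT_MS = 5000;

    public void testNodesAcceptConnections() throws Exception {
        for (int i = 0; i < NODES.length; i++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(NODES[i], PORT), TIMEOUT_MS);
            } finally {
                socket.close();
            }
        }
    }
}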

Fig. 2. GIMI sample use cases

5.2 Performance and load testing

The performance of a web service can be evaluated by measuring its response time and throughput. These two parameters are the QoS indices that end-users can see and understand, and so they are used to compare multiple web services providing the same functionality. For performance and load testing, JMeter4 emerges as a commonly used tool. It provides GUIs for specifying the testing parameters, such as the number of concurrent threads, the type of request and the number of requests per time interval. The main issue here is that the implementation of the GIMI web services is not trivial; the system supports extra functionalities that JMeter cannot exercise. For example, in order to manage sessions, GIMI relies on a 'ticketing' system [3], which creates and publishes a ticket to be used as an identification token during the multiple invocations of the web service by the same client. Thus, to have more control over the tests to be performed, we rely instead on GroboUtils,5 a free Java package which extends JUnit with multi-threaded capabilities. The framework allows the simulation of heavy web traffic: multiple unit tests can be run simultaneously with the aim of assessing the limits of the web services. It provides code patterns to use for building tests and, since the tests are implemented manually, GroboUtils gives developers greater control over the code they write to run tests. This was of great help in our case, where users of GIMI access the web services through an API. The following listing shows a sample test implemented using the GroboUtils framework.

4 http://jakarta.apache.org/jmeter/
5 http://groboutils.sourceforge.net/

import java.io.File;
import junit.framework.TestCase;
import net.sourceforge.groboutils.junit.v1.MultiThreadedTestRunner;
import net.sourceforge.groboutils.junit.v1.TestRunnable;

// The GIMI API classes (Client, Server, WSConnection, WSException) are assumed
// to be on the classpath.
public class MultithreadedRequests extends TestCase {

    private class TestScenario extends TestRunnable {
        private Server wsServer;

        private TestScenario(Server wsServer) {
            this.wsServer = wsServer;
        }

        public void runTest() throws Throwable {
            long l;
            String myID = new String();
            // stagger the threads by a random delay of two to five seconds
            l = Math.round(2 + Math.random() * 3);
            Thread.sleep(l * 1000);
            // creating a web service connection
            WSConnection con = new WSConnection(
                    new Client("/home/gimi/certs/client"), this.wsServer);
            try {
                System.out.println("\nCreating a DAV folder on node -- "
                        + wsServer.getServerAlias().toString());
                myID = con.createDavDir();
                System.out.println("\nThe returned ID is -- " + myID);
                System.out.println("\nCalling localToStore...");
                File myfile = new File("/home/gimi/files/inputFile.tst");
                int newID = con.localToStore(myfile, "output");
                System.out.println("\nNew ID is -- " + newID);
                con.storeToLocal(newID, new File("/files/outputFile"));
                System.out.println("\nDestination file downloaded.");
            } catch (WSException e) {
                e.printStackTrace();
            }
        }
    }

    // use of the MultiThreadedTestRunner
    // to run the recorded threads simultaneously
    public void testsPackThread() throws Throwable {
        // instantiate the TestRunnable class
        TestRunnable tr1;
        TestRunnable[] trs = new TestRunnable[20];
        tr1 = new TestScenario(new Server("gmsvim01.gimi.ox.ac.uk", 8080, "gmsvim01"));
        // pass the created instances to
        // the MultiThreadedTestRunner
        for (int i = 0; i < trs.length; i++) {
            trs[i] = tr1;
        }
        MultiThreadedTestRunner mttr = new MultiThreadedTestRunner(trs);
        mttr.runTestRunnables();
    }
}
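A test such as this is then added to the automated suite run by the continuous build described in Section 4.2. A minimal sketch of such an aggregating suite follows; the suite class name and the particular tests registered are illustrative rather than the actual GIMI build configuration.

import junit.framework.Test;
import junit.framework.TestSuite;

// Illustrative JUnit suite that an Ant target invoked by CruiseControl can execute.
public class GimiRegressionSuite {

    public static Test suite() {
        TestSuite suite = new TestSuite("GIMI automated regression tests");
        suite.addTestSuite(MultithreadedRequests.class);
        // acceptance and functional test classes would be registered here as well
        return suite;
    }
}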