2015 IEEE International Conference on Web Services
Using WADL Specifications to Develop and Maintain REST Client Applications
Marios Fokaefs and Eleni Stroulia
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Email: {fokaefs,stroulia}@ualberta.ca

Abstract: Service orientation is one of the most popular paradigms for developing modular distributed software systems. In spite of the substantial research effort dedicated to the development of methods and tools to support SOAP-based service-oriented application development, in practice, RESTful services have surpassed SOAP-based services in popularity and adoption, primarily due to the simplicity of their invocation. However, poor adoption of REST specification standards and lack of systematic development tools have given rise to many, more or less compliant, variants of the RESTful style constraints, which undermine the evolvability and interoperability of these systems. In this paper, we describe a tool that supports the systematization of RESTful application development, through the use of semi-automatically constructed WADL interface specifications, without compromising the ease of the overall practice. We illustrate the use and advantages of our tool on real-world REST APIs. Additionally, we comment on how REST APIs are documented, especially in comparison to the auto-generated WADLs.
interactive tools to support service discovery, selection and binding, web-service maintenance and evolution, web-service middleware platforms and more. This line of research has focused primarily on SOAP-based services. The relative scarcity of tool support for engineering REST-based systems is due to the same factor that makes them easy to use: with no need for any intermediate specifications mediating the service and its clients, software-engineering researchers are deprived of the artifacts that are usually manipulated by tools, and of a “natural” place at which to intervene in the service-system development process. For example, the interfaces of REST services are usually published as semi-structured HTML pages that do not follow the same standards across different providers and are specified in free text. Furthermore, these specifications may be incomplete; for example, responses are usually specified through simple and incomplete examples. As a result, automated support for tasks such as service discovery and service evolution is limited for the REST service ecosystem. Although standard interface formats have been proposed for specifying REST services, like WADL1 or WSDL 2.02 for example, providers keep publishing REST APIs as HTML pages, which are easily understandable by humans but not as easily consumed and reasoned about by software. A challenge, even more important than the informality of the public documentation of REST APIs, is the high degree of variance in their design and evolution. Variation in the REST API-design conventions can occur between different vendors or even within a single API. Consider, for example, the response media type (e.g. JSON or XML): there are services that return only one type (e.g. Twitter3 or Tumblr4 ); yet other services return both types (e.g. Google Maps5 or RottenTomatoes6 ).
The type may be specified as an extension to the URL request (Twitter and RottenTomatoes) or as a separate resource or parameter (Google and Tumblr). Another variability dimension concerns the concept of “resource” itself. For example, the Twitter and the Tumblr APIs allow some inputs (e.g. type of post or user ID) to be specified as either
I. INTRODUCTION
Service orientation has been established as one of the most popular paradigms for the development of modular distributed software systems. Run-time dynamic service binding, implementation-information hiding, language interoperability and flexible invocation through HTTP-based protocols are some of the key advantages of the service-oriented architecture style that motivate its adoption by software businesses today. Within the broad domain of service technologies, the REST style [1] and its associated technologies are gaining ever-increasing popularity and adoption. This popularity can be attributed to the fact that REST services are invoked through HTTP (requiring no formal middleware and/or additional knowledge), which makes them very easy to use since most programming languages (especially those popular in the development of web systems) provide libraries to support HTTP-request construction. The increasing popularity of REST services over the more specification-heavy SOAP-based stack of technologies is also evident in the recent developments in the software industry. Big companies like Amazon [2] and Google [3] have discontinued their SOAP APIs in favour of their RESTful counterparts. Amazon has also reported that 80% of the requests to their services come through their REST API [4] and that querying the services using REST is 6 times faster than with SOAP [5]. Software-engineering research has already contributed numerous methods to support the development and maintenance of service systems. There exist many automatic and
978-1-4673-7272-5/15 $31.00 © 2015 IEEE DOI 10.1109/ICWS.2015.21
1 http://www.w3.org/Submission/wadl/ 2 http://www.w3.org/TR/wsdl20/ 3 https://dev.twitter.com/rest/public 4 https://www.tumblr.com/docs/en/api/v2 5 https://developers.google.com/maps/documentation/webservices/ 6 http://developer.rottentomatoes.com/docs
resources, in which case they are part of the request body, or as parameter values. These differences increase the service consumers' effort to understand these services, make the maintenance of client applications more difficult and costly, and hinder the development of automated tools. In general, such unsystematic variation prevents developers from consistently following best practices. Systematic methods and automatic tools can alleviate some of these problematic situations, standardize the use of development best practices, and effectively assist the developers of REST client applications. In this work, we argue that the first step to such RESTful application-development standardization is the construction of standard WADL (Web Application Description Language) interface specifications for REST services, in an automatic and efficient manner with no overhead to either providers or clients. To this end, we have developed a tool as part of the WSDarwin toolkit for service client adaptation [6]. The goal of the tool is not to enforce the use of a formal middleware or standard but to enable the generation of the respective artifacts, so as to promote the use of existing middleware with relatively low effort overhead for the developers. We demonstrate the applicability and usefulness of the WSDarwin WADL generator on three real-world REST APIs: Tumblr, Google Maps and Wolfram Alpha7. In addition, we analyse how these APIs are actually documented, compare their documentation to the auto-generated WADL specifications and discuss how the latter can be more useful in automating software-engineering tasks. The rest of the paper is organized as follows. Section II introduces the proposed automatic WADL generation tool. In Section III, we present the cases of the Tumblr, Google and Wolfram Alpha APIs, on which we demonstrate the application of the WADL generator and discuss the differences between existing and generated documentation.
In Section IV, we present an overview of research literature related to the evolution and maintenance of service systems focusing more specifically on REST applications. Finally, Section V concludes our work and briefly discusses some of our future plans.
REST purists sometimes argue that WADL violates the philosophy behind the REST style, which is supposed to be flexible and lightweight. For this reason, in this paper, we propose an efficient semi-automatic method to generate WADL interfaces for REST APIs with minimal effort, in order to, first, relieve any specific party (provider or client) from the responsibility of having to generate a WADL interface and, second, generate an interface once, which will continue to be useful many times in the future and for a variety of tasks. Since the tool does not directly impose any formal specification (at least on the provider’s side), but rather complements the development of REST services, we argue that it does not violate the REST philosophy, while facilitating the consumption of these services in an easy and maintainable manner in the longer term. The proposed WADL generator is part of the larger WSDarwin framework [6] for the support of the evolution of web services and the adaptation of service clients, complying with web-services principles and aware of actual development practices. WSDarwin operates under the assumptions that providers and clients share the minimum required information regarding the implementation of their software, i.e., services and applications, and that evolution decisions are made independently, without consultation between providers and clients. As a result, service clients may have to deal with unexpected breaking changes to web services with little information. WSDarwin offers the ability to analyse software, generate middleware assets, and manipulate client applications using only publicly available software artifacts in a systematic and interactive manner. Service evolution and client adaptation are two tasks that can greatly benefit from the existence of a structured and machine-readable specification. First, there exists the capability to compare different versions of a WADL interface efficiently and effectively [7]. 
The comparison will not only help the developer infer what has changed in a service from one version to another, but it will also make it easier to immediately and accurately understand the impact of the change on service clients, since the WADL reflects how the service may be used by a client. In particular, when a client proxy has been generated from a WADL interface, the client application can be automatically adapted with considerable efficiency [8]. The WSDarwin WADL generator is built in accordance with the general philosophy of REST services; it is lightweight and easy to use, and it operates under the realistic assumption that some best practices may not necessarily be followed. Its simple interface and degree of automation are intended to make it a useful tool for service developers and client developers alike. Service developers can use the application to provide additional information to their clients about the application in an efficient and cost-effective manner. Service clients can obtain more information about a service even when this information is not provided by the service developers. We have implemented the WADL generation on the WSDarwin platform. Primarily, the tool is offered as part of a
II. THE WSDarwin WADL GENERATOR
In order to facilitate the documentation of REST APIs, abstract away the variability in defining and documenting them, and support a number of software-engineering tasks related to RESTful application development, we propose the use of standardized machine-readable WADL interface specifications. Although WADL is not widely adopted, it is a W3C Member Submission with substantial support by Apache8 and Oracle9, to name just a few organizations. There are a number of efforts that take advantage of the machine-readability of WADL to propose tools that automatically extract information from the service interface to generate other software artifacts, including client proxies in various languages, documentation and test cases.
7 http://products.wolframalpha.com/api/documentation.html
8 http://cxf.apache.org/docs/jaxrs-services-description.html
9 https://wadl.java.net/
web application10 complemented with a client proxy generator, a WADL interface comparator, and a service mapper. With respect to the WADL generator, the application offers an interactive interface so that the developers can edit the produced WADL specification. Eventually, the user can download the WADL file and the client proxy locally.
in the hundreds or thousands) to be issued to the API per day. Moreover, every API allows a number of requests (usually around a dozen) to be issued per second before shutting down access. Our method falls well below these limits, as will be discussed later. The input URLs are analysed using the rules described in the RestDescribe tool [9], which is also designed as a WADL-interface generation tool. Figure 1 summarizes these parsing rules.
A. WADL Generation
In order to generate the WADL interface of a REST API, WSDarwin requires as input one or more URLs corresponding to API requests, at least one request per method. This input is easily available; such request examples are typically part of the API documentation and, even if they were not, client-application developers have to construct such requests anyway to test the service. Arguably, the WSDarwin WADL-construction method generates a “complete” interface specification with very few examples, since it is able to generalize over combinations of parameter values or classes of parameters with variable identifiers, like accounts, usernames and so on. A naïve method to construct input requests is to copy the ones that are provided as examples in the API's documentation. Although such examples are not always provided, it is a popular convention to assist clients, which the user of the generator can take advantage of. However, these examples may not cover all the service methods and may be insufficient to produce the complete range of responses returned by the service. A more systematic method of constructing requests in order to increase coverage would involve creating a request per method and then appending all the possible input parameters. It is not necessary to provide values for optional parameters. The responding service, when invoked, will assume that these parameters are missing and will assign them default values. The generator will register them and include them in the request part of the method. The risk here is that, without any example values, the generator will assume the type of any (and all) optional parameter(s) is “string”. Users can mitigate this risk by explicitly providing a default value, if one is specified by the provider and if the type of this value is anything but string. Another issue that requires attention is how to cover the range of the returned response data.
If the input requests (given their particular values) do not produce the whole spectrum of possible responses, the user may need to create additional requests to cause the service to return a complete set of response types. We encountered a case like this in the Tumblr and Google Maps case studies, which we will discuss in the next section. Clearly, the more URLs are given to WSDarwin, the more complete the generated interface specification will be, but at a greater cost in effort on the part of the user. The number of requests issued to the API can also have practical implications, for example, costs per request or detection as a denial-of-service attack, if the requests come too often. Fortunately, most APIs tend to have a free tier, where they permit a number of requests (usually
Fig. 1. The analysis of an input URL into its components
The first part (authority) corresponds to the service endpoint, i.e., the address of the service. In the WADL specification, the endpoint is the base attribute of the resources element. The next part of the URL (path) specifies the individual resources, whose paths are separated by a slash (“/”). The individual resources are organized hierarchically in the WADL file and each resource is nested within the previous one. Each URL request is a method that is added to the last resource specified by the URL. The HTTP operation of the method, i.e., GET, PUT, POST, DELETE, is not part of the URL and has to be specified explicitly by the user. When resources are followed by a question mark (“?”), the implication is that the method corresponding to the resource has parameters, which come in name-value pairs, separated with an ampersand (“&”). These parameters and their inferred types (the type-resolution heuristic method is described in detail in Section II-A1) are added to the request part of the method. Once the requests have been analysed, they are used to invoke the service and obtain the corresponding responses, which are returned either as XML or as JSON. In cases where the format of the response is not explicitly specified, WSDarwin examines the response instance, automatically infers its format, and specifies the mediaType of the response element in the WADL file. The response is then parsed and analysed to infer its underlying schema. The identified types of the output parameters and their structure are added as an XML schema in the grammars element of the WADL file. A representation element is added in the response part of the WADL method with a reference to the corresponding type in the grammars part. The generated WADL file is now available to the developer, who can further edit it by changing the attributes of an element, adding and removing elements, or editing attribute types.
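The URL analysis described above can be illustrated with a minimal Python sketch. It is not the WSDarwin implementation: the function names, the example URL and the placeholder `xs:string` parameter type are assumptions made for illustration, and the response-side steps (service invocation, schema inference, the grammars element) are omitted.

```python
from urllib.parse import urlsplit, parse_qsl
import xml.etree.ElementTree as ET

# Namespace of the 2009 WADL submission.
WADL_NS = "http://wadl.dev.java.net/2009/02"


def analyse_url(url):
    """Split a request URL into endpoint, resource paths and name-value parameters."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}/"                # authority -> service endpoint
    segments = [s for s in parts.path.split("/") if s]        # path -> nested resources
    params = parse_qsl(parts.query, keep_blank_values=True)   # query -> request parameters
    return base, segments, params


def build_wadl(url, http_method="GET"):
    """Build the request side of a WADL skeleton for a single example URL."""
    base, segments, params = analyse_url(url)
    app = ET.Element("application", {"xmlns": WADL_NS})
    parent = ET.SubElement(app, "resources", {"base": base})
    for segment in segments:                                  # each resource nests in the previous one
        parent = ET.SubElement(parent, "resource", {"path": segment})
    # The HTTP operation is not part of the URL, so the caller supplies it.
    method = ET.SubElement(parent, "method", {"name": http_method})
    request = ET.SubElement(method, "request")
    for name, _value in params:
        # Placeholder type; a real generator would infer it from the value.
        ET.SubElement(request, "param",
                      {"name": name, "style": "query", "type": "xs:string"})
    return app


if __name__ == "__main__":
    # Hypothetical request URL, used only for illustration.
    wadl = build_wadl("https://api.example.com/v2/blog/posts?api_key=KEY&limit=5")
    print(ET.tostring(wadl, encoding="unicode"))
```

The method is attached to the innermost resource, mirroring the rule that each URL request is a method of the last resource named in the URL.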
A significant feature of WSDarwin is its ability to batch process multiple requests for a service. This enables the tool to resolve parameter types with high confidence, to identify enumeration types, and to identify resources with variable identifiers. The assumption that the user of the tool will be
10 http://ssrg17.cs.ualberta.ca/wsdarwin/
able to easily provide multiple URLs is quite realistic. A second important feature of WSDarwin’s on-demand WADL generation is that the produced WADL specifies only the parts of the service used by the client application (and specified by the provided requests). This results in a concise and compact interface without unnecessary data, a fact that can facilitate the maintenance of the client application. While this may somewhat limit the extendibility of the client application (a service method not currently used might still become useful in the future), the WADL-generation process is so simple that it can be invoked at any point in time to produce a new WADL to meet the current client-application requirements. 1) Type Resolution: WADL specifications must include type information because certain programming languages, for which client proxies can be generated, explicitly require them. Given that REST requests and responses do not explicitly specify their parameter types, the inference of a parameter’s type by analysing its value is a critical part of the WADL-generation process. WSDarwin can identify a variety of primitive types as specified in the W3C definition for the XML Schema11: string, double, float, long, int, short, byte, dateTime, date, boolean and anyURI. Each type is determined based on a regular expression and, if necessary, by a set of specific conditions (Table I). Given the value of a parameter, the expressions are checked starting from the most specific type and moving towards the most general one. In essence, everything can be expressed as a string and every number can be expressed as a double. So, these general types are examined last. Numeric types differ from each other based on the specific range of numbers they cover. Therefore, numeric values are examined against additional conditions, in order to infer the particular range in which their values belong.
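The specific-to-general check described above can be sketched in Python as follows. This is an illustrative approximation, not the WSDarwin code: the order of the non-numeric checks, the explicit range tests that stand in for the parse conditions of Table I, the use of `\d+` in place of `\d*`, and the single-precision bound used to separate float from double are all assumptions.

```python
import re

# Patterns adapted from Table I; integers require at least one digit here.
BOOLEAN = re.compile(r"true|false")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T?\d{2}:\d{2}:\d{2}Z?")
ANY_URI = re.compile(
    r"(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]")
INTEGER = re.compile(r"[-+]?\d+")
DECIMAL = re.compile(r"[-+]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?")

# XML Schema integer types with their bit widths, narrowest first.
INTEGER_TYPES = [("byte", 8), ("short", 16), ("int", 32), ("long", 64)]
FLOAT_MAX = 3.4028235e38  # largest finite single-precision value


def infer_type(value):
    """Return the most specific XML Schema type that can hold the given string."""
    if BOOLEAN.fullmatch(value):
        return "boolean"
    if DATE.fullmatch(value):
        return "date"
    if DATE_TIME.fullmatch(value):
        return "dateTime"
    if ANY_URI.fullmatch(value):
        return "anyURI"
    if INTEGER.fullmatch(value):
        number = int(value)
        for name, bits in INTEGER_TYPES:
            if -(2 ** (bits - 1)) <= number < 2 ** (bits - 1):
                return name
        return "double"  # wider than 64 bits: fall back to the general numeric type
    if DECIMAL.fullmatch(value):
        return "float" if abs(float(value)) <= FLOAT_MAX else "double"
    return "string"      # everything can be expressed as a string


if __name__ == "__main__":
    for sample in ["true", "42", "70000", "3.14", "2015-06-27", "hello"]:
        print(sample, "->", infer_type(sample))
```

Checking the integer pattern before the decimal one matters, because the decimal expression of Table I also matches plain integers.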
If a parameter is a list of values, then a complex type is created for the list and an XSD element is added to the type for the contents of the list. The type of the element is determined as previously and an additional attribute is added (maxOccurs="unbounded"). When processing multiple URLs, if more than one type is inferred for a particular parameter, the most generic of all the identified candidate types is associated with the parameter (e.g., string is more generic than int), under the assumption that the generic type can subsume the values of all the more specific types. When presenting the final inferred WADL specification, the tool also presents its level of confidence for each identified type, where confidence is measured as the percentage of the processed requests in which that type was inferred. For example, if a parameter was found to be string in 7 out of 10 processed URLs, then it is reported as string with 70% confidence. 2) Enumerations: Enumerations are a special type, whose values are restricted within a predefined set. In a WADL specification, enumerations can be implemented in one of two ways: either as option elements of a param or as simpleType
TABLE I. REGULAR EXPRESSIONS TO IDENTIFY PARAMETER TYPES.

Type       Regex
double     ^[-+]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?$ AND parseDouble == TRUE
float      ^[-+]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?$ AND parseFloat == TRUE
long       ^[-+]?\\d*$ AND parseLong == TRUE
int        ^[-+]?\\d*$ AND parseInteger == TRUE
short      ^[-+]?\\d*$ AND parseShort == TRUE
byte       ^[-+]?\\d*$ AND parseByte == TRUE
dateTime   ^(\\d{4})-(\\d{2})-(\\d{2})[T]?(\\d{2}):(\\d{2}):(\\d{2})[Z]?$
date       ^(\\d{4})-(\\d{2})-(\\d{2})$
boolean    true|false
anyURI     \\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]
email      ^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$
in the XML schema with a restriction element that contains the possible values as enumeration elements. The WSDarwin WADL generator adds param elements for method requests, since requests usually have a limited number of input parameters (if any at all), and representation elements that refer to the XML schema simpleTypes for method responses, because responses may be as long as a JSON or XML file. Identifying enumerations is a hard task, given that they manifest themselves only through their values. The occurrence of a value more than once may be an indication of an enumeration or it may just be a coincidence. Moreover, the identification would require a prohibitive number of input requests to cover all possible values, with a very small gain in terms of interface coverage. For this reason, WSDarwin does not identify enumerations but rather provides an automated refactoring to transform a simple element into an enumeration, i.e., a simpleType with multiple enumeration values. The user can then manually specify the values for the enumeration. Enumerations play an important role in the service interface, since they restrict the value range of specific parameters to the valid values only. 3) Resources with Variable ID: WSDarwin’s ability to recognize resources with variable identifiers and appropriately specify them in the WADL specification is key to the construction of concise and compact interface specifications. The alternative would be to record each instance of a resource class (e.g. each different account identifier under the accounts resource) as a different resource, which could result in a long interface depending on the number of instances. Resource classes are identified during the batch processing of the URL requests. The various resource paths comprising the URL are compared and counted and the tool looks for systematic differences in the same part of the URL. For every such path, its prefixes and suffixes in every URL are
11 http://www.w3.org/TR/xmlschema-2/
also checked. If the path component that differs between URLs is found to have a common prefix in all the examined URLs, and a common suffix in all the URLs that are at least as long (in terms of number of path components) as the longest examined URL, then this path component is considered to correspond to a resource class. The reason for the last condition is that the examined URLs are expected to have different lengths (covering different parts of the API) and, thus, it is unrealistic to expect that a path will have a common suffix in all the examined URLs. Furthermore, a path is not examined with respect to its variable identifier if it occupies the last position in the longest examined URL, since this might just correspond to different resources instead of a class of resources. Eventually, different paths that are surrounded by the same resources are clear indications of resources with variable identifiers. As with enumerations, a positive identification of a variable ID may be true or coincidental (as in the case of the Google Maps API, which we will discuss later). Therefore, this part of the generation is still under consideration. Once a resource with variable identifier is found, it is specified accordingly in the WADL document. A resource is added, whose path attribute is the term resource along with the position of the path in the URL, surrounded by curly brackets, e.g. {resource3}. Moreover, a param is added in the resource with the same id as the path and its style is set to template (instead of query, which is the norm for input parameters). This parameter is used to indicate to the middleware, which will generate the client proxy based on the WADL interface, that a concrete identifier needs to be specified before trying to access the particular resource of the service.
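A simplified reading of the prefix/suffix heuristic above can be sketched as follows. This is only one possible interpretation of the description, not the WSDarwin algorithm: the function names, the example paths and the 1-based numbering of `{resourceN}` are assumptions.

```python
def find_variable_positions(paths):
    """Return the 0-based path positions that look like variable identifiers.

    `paths` is a list of URL paths, each given as a list of path components.
    """
    longest = max(len(path) for path in paths)
    variable = []
    # The last position of the longest URL is skipped: differences there may
    # simply be different resources rather than instances of one class.
    for i in range(longest - 1):
        having = [path for path in paths if len(path) > i]
        if len({path[i] for path in having}) < 2:
            continue  # no variation at this position
        same_prefix = len({tuple(path[:i]) for path in having}) == 1
        # The suffix is only compared among URLs as long as the longest one.
        full = [path for path in having if len(path) == longest]
        same_suffix = len({tuple(path[i + 1:]) for path in full}) == 1
        if same_prefix and same_suffix:
            variable.append(i)
    return variable


def to_template(path, variable):
    """Replace variable components with {resourceN}, N being the 1-based position."""
    return [f"{{resource{i + 1}}}" if i in variable else component
            for i, component in enumerate(path)]


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    examples = [
        ["v2", "blog", "alpha", "posts"],
        ["v2", "blog", "beta", "posts"],
        ["v2", "user"],
    ]
    positions = find_variable_positions(examples)
    print("/".join(to_template(examples[0], positions)))
```

Under this reading, two paths such as maps/api/geocode/json and maps/api/directions/json are also flagged at the third position, which is the kind of coincidental match that the text warns about.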
TABLE II. RESULTS OF THE WADL GENERATOR ON REAL REST APIS.

           API                                    WSDarwin
           resources operations input output     requests resources operations input output     not found  extra  implied
Tumblr     15        12         88    56         13       15        12         88    126        6          76     7
Google     17        10         82    288        11       17        10         82    270        34         16     14
Wolfram    3         2          26    30         11       3         2          26    52         3          25     18
Wolfram*   3         2          26    30         2        3         2          26    33         6          9      4
successfully test the clients using the input parameters we used for the generation of the WADL specifications.
A. Tumblr
Tumblr has one of the best-structured documentations we have encountered in our study. It documents input and output parameters for each method in tables with their names, a general type for each parameter (text, number, boolean), an indication of whether a parameter is an array of values, and specific comments for each parameter on how it can be specified and what its effect is on the response. Tumblr also completely documents the response parameters, providing examples of the returned JSON file. For our study, we prepared 13 requests to access the blog posts. The reason for this convenient limitation is that the blog-post requests are GET requests that only require an API key for authorization, while requests for other resources require authentication through OAuth 2.0. These requests correspond to all 12 operations for these resources, with an additional request to identify the variable identifier that corresponds to the blog name. Using the 13 requests, on blogs provided as examples in the API’s documentation or on some of the most popular Tumblr blogs, WSDarwin was able to generate all resources, their operations, and their input parameters. With respect to response parameters, WSDarwin missed six parameters (11% of the complete set of documented parameters). The reason for these misses was primarily that some of the blogs did not actually have posts of a particular type (most notably audio and video), thus depriving the tool of responses that included data about music albums and artists. A more careful choice of blogs might have produced perfect results, but the extra effort would likely have outweighed the benefit. In fact, this case demonstrates the sensitivity of the method to the provided input requests.
Nevertheless, the generator was able to produce 76 additional output parameters (a staggering 135% over the documented ones). It is not clear why such a large number of parameters were not documented, although 7 of them were implied in the documentation, which states that if a request specifies the reblog_info parameter, the response contains the corresponding data. This is a first indication that
III. CASE STUDIES
We used the WSDarwin WADL generator to produce WADL interface specifications for three APIs: Tumblr, Google Maps, and Wolfram Alpha. Table II summarizes the results of applying the WADL generator on the three APIs, with respect to the documentation provided by the providers, the structure of the WADL, and the schema corresponding to the API responses. We ran the tool twice for Wolfram, once with the naïve construction of input URLs and once with the systematic one, to see which of the two achieves the larger coverage. In order to ensure that the generated WADL specifications are functional, we first checked their validity against the WADL schema12 using an online validator13. All produced WADL files were syntactically valid. Second, using these specifications, we were able to generate fully compilable and executable client proxies in Java using the Glassfish WADL2Java tool. In order to execute these clients, we first had to bind them with Jersey14, the Oracle toolkit to develop REST services in Java as part of the Glassfish project. We were able to
12 http://www.w3.org/Submission/wadl/wadl.xsd
13 http://www.freeformatter.com/xml-validator-xsd.html
14 https://jersey.java.net/
REST APIs may be under-documented, preventing users from exploiting the full potential of the API. Another issue we encountered was that Tumblr provides a large set of common parameters for all types of blog posts and a smaller set of parameters specific to each type. Due to the great similarity between the responses, and the fact that there is no explicit name for the surrounding return type of each method in the JSON file, WSDarwin interpreted all the responses as being of the same type and merged the various parameters under the same complex type. Although this may not exactly correspond to the conceptual model of the API, it effectively reduces duplication and produces a more concise interface.
relative to the branching factor (how many subresources and methods each resource has) and the depth of the interface. Every time a request is processed, a tree is created and it is compared and merged to the one created in the previous step. Each node of the two trees is compared and merged once (a node is not revisited once it has been merged). Therefore, the total time for the generator equals the execution of a DFS for the interface tree times the number of the input requests. In practice, this time is negligible even for a medium-sized set of input requests which makes the generator a practical tool that can be invoked on demand and frequently without significant effort or overhead.
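The per-request tree construction and merge described above can be illustrated with a small Python sketch. It is a simplified stand-in rather than the WSDarwin implementation: the dictionary-based node layout, the function names and the example paths are assumptions, and parameters and response schemas are left out.

```python
def new_node():
    """An interface-tree node: HTTP methods plus named sub-resources."""
    return {"methods": set(), "children": {}}


def request_tree(segments, http_method):
    """Build the single-branch tree that one request contributes."""
    root = new_node()
    node = root
    for segment in segments:
        node = node["children"].setdefault(segment, new_node())
    node["methods"].add(http_method)
    return root


def merge(target, source):
    """Depth-first merge of `source` into `target`.

    Each node of `source` is handled once; a subtree that is new to
    `target` is attached as a whole and never revisited.
    """
    target["methods"] |= source["methods"]
    for name, child in source["children"].items():
        if name in target["children"]:
            merge(target["children"][name], child)
        else:
            target["children"][name] = child
    return target


if __name__ == "__main__":
    interface = new_node()
    # Hypothetical request paths, used only for illustration.
    for segments in (["v2", "blog", "posts"], ["v2", "blog", "info"], ["v2", "user"]):
        merge(interface, request_tree(segments, "GET"))
    print(sorted(interface["children"]["v2"]["children"]))
```

Because each request contributes one traversal of its own tree, the total work grows with the number of requests times the size of the trees being merged, in line with the complexity argument in the text.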
Fig. 3. The autogenerated client proxy for the Tumblr API.
B. Google Maps
Google separates the documentation of the Maps API into different sections, presenting each operation as a distinct API, although they refer to different resources of one API under the same service endpoint. The resources we have studied include directions, geocoding, distance matrix, time zone and elevation. Unlike Tumblr, Google returns responses to the API in two formats: XML and JSON. The two formats are documented as separate resources with a GET method each. Therefore we have 17 resources (5 API resources, with two formats each, and two additional resources as part of the service endpoint) and a total of 10 operations. In total, the operations receive 82 parameters as input (41 for each format). The request parameters are well-structured in the Google documentation but, unlike in the case of Tumblr, they lack an explicit definition of their type. Google documents the name of a parameter and a short description that implies its type and gives details about its purpose and its effect on the response. The description may also hide some extra parameters, if the parameter represents a complex object. Response parameters (288 in total, 144 for each format) are similarly documented. We constructed 11 requests to generate a WADL specification for the Google Maps API, one for each operation
Fig. 2. WADL specifications produced for the Tumblr API by WSDarwin.
Figure 2 shows part of the WADL specification (focusing on the service part and omitting the schema details) produced for the Tumblr API by WSDarwin. A detail that can be observed from the figure is the resource after “blog” that is enclosed within curly brackets to denote a resource with variable ID. Additionally, a parameter of style template is added to guide clients in the invocation of the particular resource. The structure of the generated proxy for the Tumblr API is shown in Figure 3. One can see the nested resources of the REST service and the getAsJson methods to access the posts resource. We ran the interface generator 10 times and the average execution time was approximately 16.5 ms (st. dev. 3.5 ms). These time measurements indicate two things. First, the execution time for the generator depends on the number of the input requests and the complexity of the produced interface. This is expected since the interface generator has the time complexity of a depth-first search (DFS) algorithm. The WADL interface has the structure of a tree and the complexity of the DFS is linear to the number of its edges, or in the case of WADL
(practically one for each format of the 5 resources) and an additional one for the Geocoding API. The reason for the last request is that the Geocoding API has an additional piece of functionality that is not documented as a separate operation. The API receives an address as input and returns the geographical coordinates of the address as a response. It also offers the reverse process, where the user provides a set of coordinates and the service returns a formatted address. The reverse process required an additional request so that WSDarwin could capture a few more parameters. It is worth noting that, given the structure of the Google requests (same prefix and similar suffix, json or xml, for each resource), WSDarwin initially perceived the various APIs as the same resource with variable IDs, so we had to manually override this functionality and disable it for this API. This part of the tool is therefore still a work in progress. Once again, WSDarwin was able to generate all resources, their operations and the input parameters in their entirety, since we provided a set of input requests of adequate number and quality. As far as the output is concerned, WSDarwin did not find 34 of the 288 parameters (about 12%), which mostly concern fare data for transit and vehicle data. However, it did find 16 undocumented parameters. Of these, 14 were implied in the documentation as parts of more complex objects. For the Google API, we manually provided a specific value other than the default for one parameter: we specified travel_mode as transit for some requests, because we knew that such requests return some additional parameters.
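The per-request tree construction and merge described for the generator, together with the variable-ID problem encountered on the Google requests, can be illustrated with a small sketch. This is not WSDarwin's implementation: the `Resource` class, the function names, the `example.com` URLs and, in particular, the rule used to detect variable IDs (collapsing two or more sibling resources whose subtrees have the same shape) are assumptions made purely for illustration. The `collapse` flag plays the role of the manual override mentioned above.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit, parse_qsl


@dataclass
class Resource:
    """One node of the interface tree: a path segment with methods and children."""
    name: str
    methods: dict = field(default_factory=dict)   # HTTP method -> set of query parameter names
    children: dict = field(default_factory=dict)  # path segment -> Resource


def tree_from_request(method: str, url: str) -> Resource:
    """Build a single-branch tree from one request URL."""
    parts = urlsplit(url)
    root = Resource(name=parts.netloc)
    node = root
    for segment in filter(None, parts.path.split("/")):
        node = node.children.setdefault(segment, Resource(name=segment))
    node.methods[method] = {key for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return root


def merge(target: Resource, other: Resource) -> Resource:
    """Depth-first merge of `other` into `target`; each node of `other` is visited once."""
    for method, params in other.methods.items():
        target.methods.setdefault(method, set()).update(params)
    for name, child in other.children.items():
        if name in target.children:
            merge(target.children[name], child)
        else:
            target.children[name] = child
    return target


def shape(node: Resource):
    """Structural signature of a subtree, ignoring the node's own name."""
    return (
        tuple(sorted(node.methods)),
        tuple(sorted((name, shape(child)) for name, child in node.children.items())),
    )


def collapse_variable_ids(node: Resource) -> None:
    """Assumed heuristic: same-shaped siblings become one '{id}' template resource."""
    for child in node.children.values():
        collapse_variable_ids(child)
    siblings = list(node.children.values())
    if len(siblings) > 1 and len({shape(s) for s in siblings}) == 1:
        template = Resource(name="{id}")
        for sibling in siblings:
            merge(template, sibling)
        node.children = {"{id}": template}


def build_interface(requests, collapse: bool = True) -> Resource:
    """Merge one tree per request; optionally detect variable-ID resources."""
    trees = [tree_from_request(method, url) for method, url in requests]
    root = trees[0]
    for tree in trees[1:]:
        merge(root, tree)
    if collapse:
        collapse_variable_ids(root)
    return root
```

Under this heuristic, two requests that differ only in a blog identifier are merged into a single `{id}` resource, which is the intended behaviour. Two requests that differ only in a trailing `json` or `xml` segment are collapsed in the same way, which is the kind of misclassification that has to be switched off with `collapse=False`.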
C. Wolfram Alpha
The Wolfram Alpha API was the most poorly documented of the three we studied. The documented elements are presented as they are encountered in examples of more complicated requests. At the end of the documentation, there is a more structured presentation of the input and output parameters with their names and a description similar to the Google documentation. The design quality of the API is also questionable. In spite of the small number of operations (2), there are many diverse input and output parameters, implying that these two operations actually cover a large number of functionalities, effectively providing a very incohesive interface. Further evidence of the API’s poor design and documentation is that the two methods are named query and validatequery, which do not provide any insight into their actual behaviour, while one of the input parameters (the basic one) is called simply input. Due to the complexity of the design and the documentation, we used the naïve approach of constructing input requests from the documentation examples. This resulted in 11 requests that correctly identified the 3 resources, 2 operations and 26 input parameters of the API. The generator was able to identify 27 out of the 30 documented output parameters (90%, missing three parameters indicating errors) and additionally 25 parameters that were not documented in the structured part of the documentation, although 18 of them were implied in the unstructured part. In retrospect, and after the experience we had with the first execution for the Wolfram API, we consulted the structured part of the documentation and we were able to construct two more comprehensive requests, one for each operation. With the systematic construction of the requests, we were able to identify all resources, operations and input parameters, but in this case only 24 out of the 30 output parameters (80%), with 9 extra parameters, 4 of which were implied by the documentation.
IV. RELATED WORK
Our work is mainly related to the development and evolution of REST applications. With the rise of REST services as dominant software components, interest in their evolution and their software ecosystems has also grown, and this has spawned a number of research works around these topics. More specifically, in this work, we discuss a number of empirical studies on the evolution of REST and web APIs. Wang et al. [10] perform an empirical study on how REST APIs evolve. They take a social approach and discuss the changes in a variety of REST APIs based on the discussions these changes raised among client developers on StackOverflow. The authors recognize changes similar to those we report in this paper, mainly changes in the requests and the responses of the APIs. Additionally, they report changes in the authentication method of the API, which can be considered a special change in the request of the API, and changes in the rate limit, i.e. how often a client can access the API, which is usually specified in the Service Level Agreement (SLA) and not in the service interface. According to the findings of the study, adding new methods raised the most questions on StackOverflow, although the authors give no clear justification for this, while deleting existing methods produced the longest discussions, since this is a breaking change. Li et al. [11] present another empirical study on the evolution of web APIs and how it affects the clients, but they focus more on the technical aspects of the problem. Their data and arguments are derived by examining the native client development toolkits provided by the service vendors for various programming languages. This way they can reason about the impact of the changes on client applications and about which of them can be addressed automatically by client developers. Once again, their findings are in accordance with our claims and with the findings of our previous work on the evolution of SOAP services [12], both in terms of the identified changes and in terms of their impact. Finally, Espinha et al. [13] present a very comprehensive study on the evolution of web APIs from both a social and a technical perspective. They conduct interviews with developers of large and popular REST APIs and separate interviews with developers of client applications of these APIs. Among other things, these interviews revealed certain evolution policies from the providers, like Twitter’s blackout tests and Google’s extensive grace period for client migration to a new version of
the API, policies which were generally received with appreciation by the client developers. Then, the authors examined two open source web APIs and their clients, which are also open source. Access to the source code of both the services and the clients gave the authors the opportunity to examine the dependencies between the two software systems when the service evolves. This study validates two of our most important claims: first, that there are great variations and inconsistencies in how REST applications are developed and evolve and, second, that these evolution strategies actually create strong dependencies between providers and clients. Another set of relevant works concerns our contribution of an automatic tool to generate WADL service interfaces for REST APIs. RestDescribe15 is a web application developed by Thomas Steiner [9]. The tool works very similarly to WSDarwin; it receives one or more URL requests, which are parsed to generate the WADL interface. The interface is presented in an editor, so that the user can manually change the WADL by adding, removing or changing elements. Finally, the tool offers the generation of a client proxy, based on the generated WADL, in a number of programming languages. Despite their similarities, RestDescribe has several shortcomings compared to WSDarwin. First, it does not exercise the service with the provided requests; as a result, no response element is produced and, by extension, no schema is inferred for the service, although there is functionality to infer a schema if the user manually provides the response elements. Second, when batch analysing requests, the tool does not merge common elements but rather appends the analysed elements to the same WADL. As a result, the tool is also incapable of recognizing resources with variable identifiers. Another tool that has the capability of producing WADL interfaces is soapUI16.
Unlike RestDescribe, soapUI does exercise the REST service given a URL request and as a result it can infer the XML schema of the service. However, unlike WSDarwin, soapUI can only process one request at a time and it has no batch processing option. Furthermore, the capabilities of the tool stop at the schema inference and it cannot produce a complete WADL from the URL request. Finally, soapUI cannot recognize certain types that WSDarwin can, including anyURI and email.
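To make the notion of schema inference from an exercised response more concrete, the following sketch maps the values of a parsed JSON response to XML Schema type names. The rules shown here are assumptions for illustration only; the text does not specify how WSDarwin or soapUI infer types. The function names are hypothetical, and `email` is treated as a custom simple type name, since XML Schema has no built-in email type.

```python
import re
from urllib.parse import urlsplit

# Deliberately loose pattern; a real tool would likely be stricter.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def infer_type(value) -> str:
    """Guess a schema type name for a single JSON scalar."""
    if value is None:
        return "xs:anyType"
    if isinstance(value, bool):       # check bool before int: bool is a subclass of int
        return "xs:boolean"
    if isinstance(value, int):
        return "xs:int"
    if isinstance(value, float):
        return "xs:double"
    text = str(value)
    try:
        parts = urlsplit(text)
        if parts.scheme in ("http", "https") and parts.netloc:
            return "xs:anyURI"
    except ValueError:
        pass                          # malformed URL-like text falls through to the checks below
    if _EMAIL.fullmatch(text):
        return "email"                # hypothetical custom simple type
    return "xs:string"


def infer_schema(value):
    """Recursively mirror the structure of a parsed JSON value with type names."""
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assume homogeneous arrays and describe them by their first element.
        return [infer_schema(value[0])] if value else []
    return infer_type(value)
```

Applied to a parsed response, `infer_schema` returns a nested structure of the same shape whose leaves are type names, which could then be serialized as the schema part of a WADL document.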
V. CONCLUSIONS
In this work, we present a tool for automatically generating WADL specifications for REST services, as part of the WSDarwin platform. The tool analyses REST requests and the corresponding responses to infer a specification of the service interface. The client can then use this specification to perform a number of tasks, including generating client proxies for a variety of programming languages and comparing different versions of the service to understand its evolution. To validate the usefulness of this tool, we used it to infer specifications of three real-world REST APIs, namely Tumblr, Google Maps, and Wolfram Alpha. We validated the syntactic correctness of the generated WADL documents against the WADL schema and their functional validity by generating compilable and runnable client proxies for Java applications. Going beyond the validation of our WSDarwin WADL-generation tool, these case studies constitute a second, more general contribution to the field: through these studies, we gained interesting insights on how REST APIs are actually documented. We saw large variations between providers and, more generally, we encountered incomplete and ambiguous documentation, verifying our original assumption that current API design and documentation practices are not very systematic. Therefore, the automatically generated WADL interface specifications are even more useful as API documentation and, at the same time, they provide machine-readable software artifacts to support the development and maintenance of REST client applications. The prototype we have built for the WADL generator is functional but still under development. We plan to further extend it and conduct additional experiments with real developers to assess its usability.
ACKNOWLEDGMENT
The authors would like to acknowledge the generous support of NSERC, iCORE, and IBM.
REFERENCES
[1] R. T. Fielding, “Architectural styles and the design of network-based software architectures,” Ph.D. dissertation, University of California, Irvine, 2000.
[2] Monsoon Stone Edge User Forum, “Amazon soap being discontinued,” http://www.stoneedge.net/forum/pop printer friendly.asp?TOPIC ID= 12687, June 2011.
[3] E. Tholomé, “A well earned retirement for the soap search api,” http://googlecode.blogspot.ca/2009/08/well-earned-retirement-forsoap-search.html, August 2009.
[4] T. Anderson, “Ws-* vs the rest,” http://www.theregister.co.uk/2006/04/29/oreilly amazon/, April 2006.
[5] A. Trachtenberg, “Php web services without soap,” http://www.onlamp.com/pub/a/php/2003/10/30/amazon rest.html, October 2003.
[6] M. Fokaefs and E. Stroulia, “The WSDarwin Toolkit for Service-Client Evolution,” in Proceedings of the 2014 IEEE International Conference on Web Services, Work In Progress (ICWS’14 WIP). Anchorage, Alaska, USA: IEEE, 2014, pp. 716–719.
[7] M. Fokaefs and E. Stroulia, “Wsdarwin: Studying the evolution of web service systems,” in Advanced Web Services. Springer, 2014, pp. 199–223.
[8] M. Fokaefs and E. Stroulia, “Wsdarwin: Automatic web service client adaptation,” in CASCON ’12, 2012.
[9] T. Steiner, “Automatic multi language program library generation for rest apis,” Ph.D. dissertation, 2007.
[10] S. Wang, I. Keivanloo, and Y. Zou, “How do developers react to restful api evolution?” in Service-Oriented Computing. Springer, 2014, pp. 245–259.
[11] J. Li, Y. Xiong, X. Liu, and L. Zhang, “How does web service api evolution affect clients?” in Web Services (ICWS), 2013 IEEE 20th International Conference on. IEEE, 2013, pp. 300–307.
[12] M. Fokaefs, R. Mikhaiel, N. Tsantalis, E. Stroulia, and A. Lau, “An Empirical Study on Web Service Evolution,” in Proceedings of the 2011 IEEE International Conference on Web Services, ser. ICWS ’11, Washington, DC, USA, 2011, pp. 49–56.
[13] T. Espinha, A. Zaidman, and H.-G. Gross, “Web api growing pains: Loosely coupled yet strongly tied,” Journal of Systems and Software, 2014.
15 http://tomayac.com/rest-describe/latest/RestDescribe.html
16 http://www.soapui.org/