Prevalence and Maintenance of Automated Functional Tests for Web Applications

Laurent Christophe∗, Reinout Stevens∗, Coen De Roover†∗, Wolfgang De Meuter∗
∗ Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium
† Software Engineering Laboratory, Osaka University, Osaka, Japan
Email: lachrist—resteven—cderoove—[email protected]
Abstract—Functional testing requires executing particular sequences of user actions. Test automation tools enable scripting user actions such that they can be repeated more easily. Selenium, for instance, enables testing web applications through scripts that interact with a web browser and assert properties about its observable state. However, little is known about how common such tests are in practice. We therefore present a cross-sectional quantitative study of the prevalence of Selenium-based tests among open-source web applications, and of the extent to which such tests are used within individual applications. Automating functional tests also brings about the problem of maintaining test scripts. As the system under test evolves, its test scripts are bound to break. Even less is known about the way test scripts change over time. We therefore also present a longitudinal quantitative study of whether and for how long test scripts are maintained, as well as a longitudinal qualitative study of the kind of changes they undergo. To the former's end, we propose two new metrics based on whether a commit to the application's version repository touches a test file. To the latter's end, we propose to categorize the changes within each commit based on the elements of the test upon which they operate. As such, we are able to identify the elements of a test that are most prone to change.

I. INTRODUCTION

Testing is of vital importance in software engineering [22]. Of particular interest is functional GUI testing, in which an application's user interface is exercised along requirement scenarios. Functional GUI testing has recently seen the arrival of test automation tools such as HP Quick Test Pro, SWTBot1, Robotium2 and Selenium3. These tools execute so-called test scripts, which are executable implementations of the traditional requirements scenarios. Test scripts consist of commands that simulate the user's interactions with the GUI (e.g., button clicks and key presses) and of assertions that compare the observed state of the GUI (e.g., the contents of its text fields) with the expected one. Although test automation allows repeating tests more frequently, it also brings about the problem of maintaining test scripts: as the system under test (SUT) evolves, its test scripts are bound to break. Assertions may start flagging correct behavior and commands may start timing out, thus precluding the test from being executed at all. A study on Adobe's Acrobat Reader found that 74% of its test scripts get broken between two successive releases [20]. Worse, even the simplest individual GUI change can cause defects in 30%-70% of a

1 http://eclipse.org/swtbot/
2 https://code.google.com/p/robotium/
3 http://docs.seleniumhq.org/

system's test scripts [12]. For diagnosing and repairing these defects, test engineers have little choice but to step through the script using a debugger [3], an activity found to cost, e.g., the Accenture company $120 million per year [12]. Several automated techniques for repairing broken test scripts have therefore been explored [19], [15], [4], [12], [3]. These techniques apply straightforward repairs such as updating the arguments to a test command (e.g., its reference to a widget) or a test assertion (e.g., its comparison value). However, little is known about the kind of repairs that developers perform in practice. Insights about manually performed repairs are therefore of vital importance to researchers in automated test repair.

In this paper, we present an empirical study on the prevalence and maintenance of automated functional tests for web applications. More specifically, we study functional tests implemented in Java that use the popular Selenium library to automate their interactions with the web browser. Section II will outline the specifics of this library. Our study aims to answer the following research questions:

RQ1 How prevalent are Selenium-based functional tests for open-source web applications? To what extent are they used within individual applications?
RQ2 Do Selenium-based functional tests co-evolve with the web application? For how long is such a test maintained as the application evolves over time?
RQ3 How are Selenium-based functional tests maintained? Which parts of a functional test are most prone to changes?

The remainder of this paper is structured as follows. Section II identifies the components of a typical automated functional test that is implemented using Selenium. Examples include test commands (e.g., sending keystrokes to the browser) and test assertions (e.g., comparisons against the browser's observable state). Section III seeks to answer RQ1 through a quantitative analysis. To this end, we gather a large-scale corpus of 287 Java-based web applications with references to Selenium. We classify the projects in this corpus and study the extent to which they use functional tests in terms of relative code sizes. Answering the other research questions requires a corpus that is representative of projects with an extensive usage of Selenium. We therefore refine the large-scale corpus into a smaller high-quality corpus of 8 projects that feature many functional tests. Section IV

Language     Keyword                            # Repositories   # Files
Java         openqa.selenium                    4287             113435
Python       from selenium import webdriver     1800             6207
Ruby         require 'selenium-webdriver'       1503             9169
C#           using OpenQA.Selenium              558              14473
JavaScript   require('selenium-webdriver')      237              643

TABLE I: Search hits for Selenium on GitHub at 9/12/2013.

answers RQ2 through a quantitative analysis of every commit in their version repository. We complement this analysis with a discussion of illustrative "Change History Views" [23] for a selection of projects. Finally, Section V answers RQ3 through an automated qualitative analysis of the changes to Selenium-based tests within each of these commits. To this end, we categorize each change based on the test component upon which it operates.
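As a concrete illustration of this categorization step, the sketch below maps the name of an invoked API method to one of the test components of Table II. The ComponentClassifier name and its method lists are hypothetical simplifications for illustration only; the study's actual analysis operates on full abstract syntax trees and the complete Selenium and JUnit APIs.

```java
import java.util.Map;
import java.util.Set;

// Illustrative classifier assigning an invoked method to a test component
// of Table II. The method sets are small examples, not exhaustive lists.
class ComponentClassifier {

    static final Map<String, Set<String>> CATEGORIES = Map.of(
        "Driver-related", Set.of("ChromeDriver", "close", "quit"),
        "Locator",        Set.of("findElement", "findElements"),
        "Inspector",      Set.of("getAttribute", "getText", "isDisplayed"),
        "Command",        Set.of("get", "back", "click", "sendKeys"),
        "Demarcator",     Set.of("Test", "BeforeClass", "After"),
        "Assertion",      Set.of("assertTrue", "assertEquals"));

    // Return the component category of the given invoked method name,
    // or a fallback category when the name is not recognized.
    static String categorize(String invokedMethod) {
        for (Map.Entry<String, Set<String>> e : CATEGORIES.entrySet()) {
            if (e.getValue().contains(invokedMethod)) {
                return e.getKey();
            }
        }
        return "Constant or other";
    }
}
```

With such a mapping, every changed expression in a commit can be tallied under one component, which is what enables identifying the components most prone to change.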

II. SELENIUM FOR AUTOMATED FUNCTIONAL TESTING

Several open-source frameworks for automating functional tests have become available. Examples include Robotium for testing mobile Android applications, SWTBot for testing SWT-based desktop applications, and Selenium for testing web applications. Apart from the applications they target, these frameworks are very similar. Each provides an API for scripting interactions with an application's user interface and asserting properties about its observable state. Our study focuses on Selenium because its API is representative and it has grown into an official W3C standard called WebDriver4.

4 http://www.w3.org/TR/2013/WD-webdriver-20130117/

Table I depicts the popularity of Selenium amongst different programming languages of projects available on GitHub. This table is populated by looking for projects that contain the appropriate import statements. The fact extractors for our experiments will scope our results to web applications and tests that are implemented in Java. Moreover, we only consider tests that use the API directly rather than through some third-party framework. Such tests require an import of the form import org.openqa.selenium at the top of the compilation unit in which they reside. In the remainder of this paper, we will refer to such files as Selenium files. Any project that contains at least one Selenium file will be called a Selenium project.

A. Selenium By Example: a GitHub Scraper

Listing 1 depicts a Java program that illustrates most of the Selenium WebDriver API for Java. This API provides functionality for interacting with a web browser (e.g., opening a URL, clicking on a button, or filling a text field), and functionality for inspecting the website that is currently displayed (e.g., retrieving its title or a particular element of its DOM). Table II gives an overview of the typical components of a Selenium-based functional test. Assertions are missing from the depicted program because it is not a functional test, but rather a scraper for the GitHub website.

The program's main method opens a connection to the web browser on line 47. Line 54 invokes method fetchURL with this connection and the URL of a GitHub page with search results. Line 32 directs the web browser to this URL. Table II refers to such navigation requests and user actions as commands. Assuming the page rendered correctly, line 33 retrieves all h1 elements in its DOM. We refer to such calls as locator expressions. Through a similar inspector expression, line 36 verifies that there is no text "Whoa there!" within these elements. In this way, the program checks that it has not exceeded GitHub's requests-per-minute threshold. It then proceeds to locate all div elements that include the class code-list. Those elements correspond to the hits shown on each search page: a matching source code fragment, and the file and repository it belongs to. Lines 24 and 25 extract this information from each element.

Note that our Selenium-based scraper is strongly coupled to the user interface of GitHub. Small changes to GitHub web pages can easily break the scraper. For instance, a change to the "Whoa there!" text will cause it to keep on exceeding the threshold and miss hits. Likewise, changes to the element class that labels search hits will also cause our scraper to miss hits. Selenium-based functional tests are prone to the same problem. The problem arises every time the interface of the system under test is changed.

Driver-related     Expressions opening and closing a connection to the web browser; e.g., new ChromeDriver(), driver.close()
Locators           Expressions locating specific DOM elements; e.g., driver.findElements(By.ByName("...")), webElement.findElement(By.ById("..."))
Inspectors         Expressions retrieving properties of a DOM element; e.g., element.getAttribute("..."), element.isDisplayed()
Commands           Navigation requests and interface interactions; e.g., driver.get("..."), navigation.back(), element.click(), element.sendKeys("...")
Demarcators        Means for demarcating actual tests and setting up or tearing down their fixtures, typically provided by a unit testing framework; e.g., @Test, @BeforeClass
Assertions         Predicates over values, typically provided by a unit testing framework; e.g., assertTrue(element.isDisplayed()), assertEquals(element.getText(), value)
Exception-related  Means for handling exceptions that stem from interacting with a separate process; e.g., StaleElementReferenceException, TimeoutException
Constants          Constants specific to web pages, such as identifiers and classes of DOM elements.

TABLE II: Components of a Selenium-based functional test.
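The Selenium-file criterion described in Section II, a compilation unit with a direct import of org.openqa.selenium, underlies the fact extraction in the next section. A minimal sketch of such a check follows, assuming source files are available as plain strings; the SeleniumFileDetector name and the regular expression are simplifications for illustration, not the paper's actual fact extractor, which would parse the source instead.

```java
import java.util.regex.Pattern;

// Sketch of the "Selenium file" criterion: a Java compilation unit counts
// as a Selenium file when it imports org.openqa.selenium directly.
class SeleniumFileDetector {

    // Matches a direct import of org.openqa.selenium or any of its members.
    private static final Pattern SELENIUM_IMPORT =
        Pattern.compile("^\\s*import\\s+org\\.openqa\\.selenium[.\\w*]*\\s*;",
                        Pattern.MULTILINE);

    static boolean isSeleniumFile(String source) {
        return SELENIUM_IMPORT.matcher(source).find();
    }
}
```

A project is then a Selenium project as soon as one of its files satisfies this predicate.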

III. PREVALENCE OF SELENIUM TESTS

RQ1 How prevalent are Selenium-based functional tests for open-source web applications? To what extent are they used within individual applications?

Table I gives an informal account of the popularity of Selenium among the projects that are hosted on GitHub. In this section, we provide more in-depth insights about the usage of Selenium within these projects.

A. Large-Scale Corpus of Projects that use Selenium

We will answer RQ1 in an exploratory manner using a large-scale corpus of Java-based Selenium projects. We gather this corpus using a variant of the GitHub scraper explained above. This variant not only respects the threshold on requests per minute, but also accounts for the fact that GitHub search pages show at most 1000 results (100 pages of 10 hits). To overcome this limitation, the variant implements a sliding window that limits all search requests to files of a window-wise increasing size. Scraping GitHub for the initial 4287 candidate repositories for our corpus (cf. Table I) took most of the week of December
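The sliding-window workaround can be sketched as follows. The buildSearchUrl method mirrors the URL construction of Listing 1 (lines 48-52); the SearchWindows name, the windowedUrls helper, and the concrete window bounds are illustrative assumptions, not the study's actual scraper.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the sliding-window workaround for GitHub's 1000-result cap:
// instead of one query, issue a series of queries, each restricted to
// files whose size falls within a small window.
class SearchWindows {

    // Mirrors the URL construction of Listing 1.
    static String buildSearchUrl(int minSize, int maxSize) {
        String url = "https://github.com/search?ref=searchresults";
        url = url + "&type=Code";
        url = url + "&l=java";
        url = url + "&q=\"import org.openqa.selenium\"";
        url = url + "+size:" + minSize + ".." + maxSize;
        return url;
    }

    // One query URL per file-size window of the given width, covering
    // all files up to maxFileSize bytes.
    static List<String> windowedUrls(int maxFileSize, int windowWidth) {
        List<String> urls = new ArrayList<>();
        for (int lo = 0; lo < maxFileSize; lo += windowWidth) {
            urls.add(buildSearchUrl(lo, lo + windowWidth - 1));
        }
        return urls;
    }
}
```

Each windowed query stays under the 1000-result cap, so iterating over the windows recovers the full result set.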

 1  import java.util.Iterator;
 2  import java.util.List;
 3
 4  import org.openqa.selenium.By;
 5  import org.openqa.selenium.WebDriver;
 6  import org.openqa.selenium.WebElement;
 7  import org.openqa.selenium.firefox.FirefoxDriver;
 8
 9  public class Scraper {
10
11    public static class RateLimitException extends Exception {
12      public RateLimitException(String msg) {
13        super(msg);
14      }
15    }
16
17    private static void printHits(WebDriver driver) {
18      String xpath = "//div[@class='code-list']/*";
19      List<WebElement> hits = driver.findElements(By.xpath(xpath));
20      Iterator<WebElement> iter = hits.iterator();
21      while (iter.hasNext()) {
22        xpath = "./p[@class='title']/a";
23        List<WebElement> links = iter.next().findElements(By.xpath(xpath));
24        String repositoryURL = links.get(0).getAttribute("href");
25        String fileURL = links.get(1).getAttribute("href");
26        System.out.println(repositoryURL + " " + fileURL);
27      }
28    }
29
30    private static void fetchURL(WebDriver driver, String url)
31        throws RateLimitException {
32      driver.get(url);
33      List<WebElement> h1s = driver.findElements(By.xpath("//h1"));
34      Iterator<WebElement> iter = h1s.iterator();
35      while (iter.hasNext()) {
36        if (iter.next().getText().contains("Whoa there!")) {
37          throw new RateLimitException("GitHub rate exceeded!");
38        }
39      }
40    }
41
42    public static void main(String[] args)
43        throws RateLimitException, InterruptedException {
44      int minSize = 1000;
45      int maxSize = 1050;
46      long delay = 5000;
47      WebDriver driver = new FirefoxDriver();
48      String url = "https://github.com/search?ref=searchresults";
49      url = url + "&type=Code";
50      url = url + "&l=java";
51      url = url + "&q=\"import org.openqa.selenium\"";
52      url = url + "+size:" + minSize + ".." + maxSize;
53      for (int i = 1; i

Listing 1: A Selenium-based scraper for GitHub search results.

This primitive moves the current version to one of its successors. Other primitives are available as well, such as q=>*, which skips an arbitrary number of versions (i.e., including none). For each primitive a counter-primitive is available that traverses to a predecessor instead of a successor. For example q
