Behavior Driven Testing in ALMA Telescope Calibration Software

Juan Pablo Gil(a), Mario Garces(a), Dominique Broguiere(b), Tzu-Chiang Shen(a)

(a) Atacama Large Millimeter/submillimeter Array, Alonso de Cordova 3107, Santiago, Chile
(b) Institut de Radioastronomie Millimetrique (IRAM), Grenoble, France

Further author information: send correspondence to Juan Pablo Gil, E-mail: [email protected]

ABSTRACT
The ALMA software development cycle includes well defined testing stages that involve developers, testers and scientists. We adapted Behavior Driven Development (BDD) to the testing activities applied to the Telescope Calibration (TELCAL) software. BDD is an agile technique that encourages communication between roles by defining test cases in natural language to specify features and scenarios, which allows participants to share a common language and provides a high-level set of automated tests. This work describes how we implemented and maintain BDD testing for TELCAL, the infrastructure needed to support it, and proposals to expand this technique to other subsystems.

Keywords: behavior driven testing, TELCAL, agile, automation

1. INTRODUCTION
The ALMA observatory reached its operational stage in 2013, but its software is still under active development, implementing new capabilities to be offered to the scientific community. The current software release process involves the delivery of short releases that include a set of new features for one or more subsystems of the ALMA Software [1]. The testing stage of this process involves not only developers and testers, but also stakeholders (i.e., scientists and engineers). There are still some drawbacks in the testing methodology used. One of the most relevant problems is that most software subsystems rely on functional tests that are executed either by exploratory analysis or by custom scripts. As a consequence, the current methodology produces a knowledge gap between developers, testers and scientists in terms of comprehension and/or repeatability of testing results. For that reason, BDD was proposed in order to increase test formalization and the automation of testing activities.

BDD is an agile technique derived from Test Driven Development that encourages communication between roles by defining and automating test cases using natural language. In BDD, each requirement is translated into several test cases (scenarios) that cover the correctness of the requirement's functionality [2]. In addition, the use of natural language in test case definitions allows participants to develop a common domain language and to perform peer reviews in advance to assure the correctness of the testing strategy defined for a specific release. This paper describes how we have implemented and maintained test cases to verify TELCAL based on BDD and the infrastructure needed to support automated tests, and explores mechanisms to expand this technique to other subsystems, in particular to the online parts of the TELCAL subsystem that interact with real time observing processes.

1.1 ALMA Software Lifecycle
Currently, the ALMA software follows an incremental release approach in which each cycle lasts two months. Within a cycle, there are three main phases:

1. Phase A - Developer Integration and Testing: Aims to demonstrate that a functionality is implemented as requested, based on unit tests. All tests must pass to move to the verification phase.

2. Phase B - Verification: Aims to test new functionality using both simulation and, when required, production environments. This includes a regression test suite for detecting bugs and providing fixes for them. All tests must pass to proceed with the validation phase; otherwise features are "de-scoped".

3. Phase C - Validation: Aims to validate additional capabilities and scientific data content. Performance and robustness aspects are also analyzed as part of the validation tests (new releases should always behave better, or at least not worse, than the previous ones).

Software acceptance was carried out after several releases had accumulated.

1.2 Telescope Calibration Software
Telescope Calibration Software (TELCAL) is a collection of calibration engines whose purpose is to keep the ALMA interferometer optimally tuned to successfully execute the planned observations. TELCAL primarily deals with:

1. the measurement of the atmospheric absorption and the determination of the phase radiometric correction
2. the computation of pointing and focus offsets
3. the delay measurements
4. the solving of antenna positions
5. the monitoring of phase and amplitude on the astronomical calibrator sources

TELCAL is a component of the ALMA instrument operation. It receives data from several instruments (Baseline and ACA Correlators, Total Power detectors, Water Vapor Radiometers), performs the required calibrations and delivers the results to the Control System and the Archive. The communication between TELCAL and the other components relies on the ALMA Common Software (ACS), based on object oriented CORBA middleware. TELCAL uses the ACS infrastructure and deploys its code into CORBA components which process in parallel the data from several arrays and exchange messages with other CORBA components. Metadata, which describe the observing process, are sent to TELCAL as CORBA messages.

Figure 1. TELCAL modes

Binary data, issued from the correlators, are sent to TELCAL through the Data Distribution System (DDS) to support high data rates (up to 60 MBytes/s). This is the "online" mode of TELCAL (Figure 1). The input data of TELCAL are fully described by the ALMA Science Data Model (ASDM) [3]. In practice, an ASDM data set is composed of a set of tables which group all the parameters relative to the observation (metadata) and a set of binary blocks containing correlator, square law detector, or radiometer raw data.

TELCAL also provides an "offline" mode (aka TELCAL Standalone), whose primary goal is to process the calibrations that cannot be run in real time (i.e., the ones that need human interaction to validate the results). In fact, all the TelCal engines can be run in this standalone mode. This is extremely useful to reproduce the calibrations done "online" and to investigate possible issues. Also, more refined data reductions can be done afterwards, without any dependency on the online system. TELCAL "offline" is run through the CASA program, the official data reduction package for ALMA. For this, the TELCAL calibration algorithms, developed in C++, are interfaced with Python using the SWIG tool; a specific mechanism in CASA then makes the TELCAL code appear as a plug-in of CASA. Listing 1 shows the interaction of the CASA software and the TELCAL engines. In this case, the Pointing engine (used to determine pointing offsets) is being called through the tc_pointing task in CASA. The listing shows the input parameters of the tc_pointing task, using the generic CASA command inp().

Listing 1. TELCAL running in CASA Software
CASA: inp tc_pointing
--------> inp()
# tc_pointing :: Process a pointing calibration on an ASDM and plot the results.
asdm            = 'uid___A002_X788be1_X12cf'  # name of input ASDM dataset
dataorigin      = 'totalpower'  # origin of the data to be used
scan            = '3'           # scans to be plotted
antenna         = ''            # list of antennas
fitwidth        = True          # fit beam width
polaraverage    = True          # average polarizations
planetfit       = False         # perform the calibration using the planet profile
calresult       = ''            # name of output ASDM dataset
showplot        = True          # whether or not display the plot
showxmlflags    = False         # show the XML flags
showbinaryflags = False         # show the binary flags
verbose         = False         # verbose mode

Once the input parameters have been set, the specified TELCAL task can be run. When the calculations are completed, the results are displayed on the screen; if the showplot option is enabled, a plot is also displayed. In Listing 2, the tc_pointing task is run using the parameters detailed in Listing 1. The relative pointing offsets (offRel column), expressed in arcseconds, correspond to the pointing corrections that are supposed to be applied to each antenna by CONTROL when TELCAL is run "online".

Listing 2. Execution of a Pointing calibration using CASA Software
CASA: tc_pointing
--------> tc_pointing()
Start: 2014-01-12T11:43:29.352000000 End: 2014-01-12T12:43:29.352000000 Duration 3600.000 s
Method: CROSS Band: ALMA_RB_07 Freq: 282.88-297.11 GHz Temp: 271.84 Az -54.69 El 60.27
Ant: CM01 Pol: X offRel: (-0.72, 1.01) offAbs: (651.49, 12.14)   Err: (0.19, 0.19)
  Peak: 1148.194846 (9.32)  Width: 34.62 (0.35)
Ant: CM02 Pol: X offRel: (-1.90, 1.00) offAbs: (160.80, -36.81)  Err: (0.15, 0.15)
  Peak: 1302.570554 (8.79)  Width: 34.28 (0.28)
Ant: CM09 Pol: X offRel: (-0.01, -0.47) offAbs: (316.45, 9.27)   Err: (0.18, 0.18)
  Peak: 1309.402669 (10.29) Width: 34.37 (0.33)
Ant: DA60 Pol: X offRel: (-0.03, 0.17) offAbs: (293.43, 11.12)   Err: (0.20, 0.20)
  Peak: 3162.441935 (42.23) Width: 21.58 (0.35)
Ant: DV22 Pol: X offRel: (-0.83, 1.41) offAbs: (319.18, -363.03) Err: (0.21, 0.22)
  Peak: 3473.674386 (50.12) Width: 21.94 (0.38)

TelCal results are used by the scientists who have to monitor and tune the instrument during an observation, for example to adjust the pointing of each antenna to keep the required accuracy (0.6 arcsec), or to monitor the atmosphere to see whether the conditions are adequate to observe in the millimeter spectrum. Another important goal is to detect and report possible hardware failures as soon as possible during the observation. As ALMA offers more capabilities to the scientific community in each observing cycle, new corresponding requirements are defined by scientists. These requirements result in new implementations, bug fixes or improvements to the existing software. In the testing process of the TELCAL software, each calibration engine is tested separately using TELCAL Standalone, with suitable data sets chosen for each test. With this approach, BDD is a well suited testing paradigm for TELCAL.

2. BEHAVIOR DRIVEN TESTING
The term Behavior Driven Development was coined in 2006 by Dan North in his seminal article "Introducing BDD" [4], where the author established a set of practices derived from Test Driven Development [5]. The main difference between the two techniques is that BDD emphasizes team collaboration and facilitates information sharing between the different participants of software development by moving the focus from unit tests to feature tests, making this set of tools a natural choice for functional testing stages. While unit tests are still the bricks of automated regression testing, this complementary approach is especially useful to promote a common language between developers, testers and stakeholders by using feature tests that are understandable, executable, and usable as a functional regression test suite, as will be shown later in this paper.

2.1 Concepts of Behavior Driven Development
BDD in its pure form was initially proposed as a methodology for writing software following the principles of agile development. The Agile Manifesto emphasizes individuals and interactions, working software and customer collaboration, and it also states that "the most efficient and effective method of conveying information to and within a development team is face-to-face conversation" [6]. The process starts with the analysis of the requirements and expected business outcomes, translating them into user stories and writing the feature tests that achieve those outcomes [7]. A typical story includes the description of the requirement, the expected benefit, and a set of acceptance criteria in the form of scenarios that must pass the tests. The tests written in this way define the scope of the feature as well as its acceptance criteria. The last stage of the process is to write the code required to pass the scenarios.

As in test driven development, BDD starts by writing tests using a framework and then writing just enough code to pass them; as more tests are written they become a regression suite and later also serve as documentation. But instead of implementing tests to check code correctness, BDD tests are written in a business-readable, domain-specific language that resembles natural language, describing expected behavior rather than implementation details. This extra layer is intended to improve communication between the different roles (developer, tester and customer) and provides tests decoupled from the details of the implementation.

2.2 Feature Tests Definition
There are good sources of information on how to write a feature test, the language to be used, how to map it to executable code, and the different frameworks currently available to develop and test using BDD. Although the principles of BDD are very broad and do not restrict it to a particular syntax, in practice some conventions have emerged and are now part of many frameworks. In this section the core concepts are explained using the terminology of Cucumber [2], one of the most popular supporting tools for BDD.

A requirement can be understood as one or more specific behaviors of the software to be built. These behaviors are written as feature tests using a ubiquitous language; technically, they are simple text files starting with a description and one or more scenarios to deal with the details. Each scenario follows the same logic: given the system is in a specific state, when a set of actions is executed, then some expected outcomes should happen. These given/when/then sentences are then mapped to actual pieces of code that act as traditional unit tests. The main advantage of writing tests this way is that the focus is on expected results, specifying behaviors under different scenarios. It also allows all actors to understand how the requirement should work in terms of results rather than implementation details. Cucumber and similar tools use a special syntax called Gherkin [8] to write the feature tests. Gherkin is line oriented, allows comments and has some language keywords, among them: Feature, Scenario, Given, When, Then, And. With this simple syntax, it is possible to write feature tests for the canonical example in the literature, an ATM machine:

Listing 3. A simple feature test
Feature: Withdraw Money from ATM
  A user with an account at a bank would like to withdraw money from an ATM.
  Provided he has a valid account and debit or credit card, he should be
  allowed to make the transaction. The ATM will tend the requested amount
  of money, return his card, and subtract the amount of the withdrawal
  from the user's account.

  Scenario: Eric wants to withdraw money from his bank account at an ATM
    Given Eric has a valid Credit or Debit card
    And his account balance is $100
    When he inserts his card
    And withdraws $45
    Then the ATM should return $45
    And his account balance is $55

  Scenario: A second scenario
    ...
    Given defined preconditions
    When actions have been performed
    Then results should be expected
    ...

Even though a feature test by itself seems to have no functionality, it is possible to implement steps (functional code that maps to specific parts of the feature test) so that a feature test can perform specific actions and validations. Depending on the framework selected, step definitions can be written in several programming languages: Ruby (the default for Cucumber), Java, Python and many others. Since BDD is a methodology that fits the Agile paradigm, the number of different approaches and frameworks developed over time has increased notoriously, and so the maturity of BDD tools is constantly evolving. While one of the most popular BDD frameworks at the moment is Cucumber, there are other tools such as JBehave, a Java-based BDD framework; Fitnesse [9], which features the use of decision tables; and RSpec [10], which is widely used in Ruby and works by embedding specifications directly in the code to be tested. All of these tools share a specification/behavior document, a parser for that file, a mapping system from scenarios to executable code, and a framework to execute those scenarios.
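To make this mapping concrete, the following is a minimal sketch, not taken from any real project, of step definitions for the ATM feature of Listing 3. It uses Python and the Behave framework (introduced in Section 3); the Account class and all step logic are invented purely for illustration.

# Illustrative step definitions for Listing 3 (invented, not project code).
from behave import given, when, then

class Account:
    """Toy in-memory account, invented for this illustration."""
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount
        return amount

@given('Eric has a valid Credit or Debit card')
def step_impl(context):
    context.card_valid = True  # assume card validation succeeded

@given('his account balance is ${balance:d}')
def step_impl(context, balance):
    context.account = Account(balance)

@when('he inserts his card')
def step_impl(context):
    assert context.card_valid

@when('withdraws ${amount:d}')
def step_impl(context, amount):
    context.dispensed = context.account.withdraw(amount)

@then('the ATM should return ${amount:d}')
def step_impl(context, amount):
    assert context.dispensed == amount

@then('his account balance is ${balance:d}')
def step_impl(context, balance):
    assert context.account.balance == balance

Note that the same sentence ("his account balance is $...") can be registered both as a Given and as a Then: the framework keeps separate registries per step type, so the precondition and the assertion do not clash.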

2.3 BDD Applied to Testing
The principles and practices of BDD can be naturally applied to testing workflows like the one used for ALMA software; examples can be found in the literature [11].

A fixed number of new features is agreed between developers, testers and stakeholders for a specific release, and each feature usually relates to a new specific behavior or system configuration. The behavior described in a new feature can be quickly translated into a feature test and then used as a functional acceptance criterion. This also allows peer reviews to be performed at an early stage to assure the correctness of the testing strategy defined for a specific release. The first advantage of this approach is that once a feature test is tied to a functional requirement, it remains valid even if the implementation changes. Documentation maintained by separate means tends to become obsolete quickly; feature tests, in contrast, serve as up-to-date documentation of the software, given that they are maintained to pass after every new release. When writing the tests, a common domain specific language is developed among the participants, so that inexperienced people can quickly get a clear idea of what is implemented and tested. As with TDD, it is fundamental to maintain and grow the regression test suites in order to ensure that existing features still work in newer releases. Although implementing a feature test from a specific requirement may seem seamless, in fact it is not, because the implementation must be as robust as possible. This is not always a disadvantage, as doing this job builds deeper expertise on what is tested, even though it takes more time. Test driven development has been used in astronomical software [12][13], and that experience can also be ported to the BDD testing approach. It is important to note that BDD applies mainly at the functional level. In big astronomical projects like ALMA, and in others under construction like SKA or E-ELT, this technique should be part of a battery of automated tests covering performance, scalability and web applications, which must be addressed with other sets of tools to obtain complete coverage of the system under test [14].

3. FEATURE TESTS FOR TELESCOPE CALIBRATION SOFTWARE
The subsystem selected for the first experience of applying BDD in the testing phase was TELCAL in its offline version. In this mode, results can be reproduced using an observational result (ASDM) previously generated by the online part of the software. One of the main advantages of this mode of operation is that the state of the system is almost completely described by the data inside the ASDM; the other preconditions are given by the TELCAL version used, dependencies on other software, and some machine-specific configurations such as the number of cores. As said before, TELCAL Standalone works on top of CASA, which is written in Python. Considering that, the software tool chosen to apply BDD to TELCAL was Behave [15], which follows the same logic as Cucumber but is native to Python. Given the implementation-agnostic nature of the feature tests, this choice could be changed in the future with low impact.
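Behave follows Cucumber's conventions for organizing a suite: feature files live in a features directory and their step implementations in a steps subdirectory, optionally with an environment.py providing per-run hooks. A layout along these lines (file names are illustrative, matching the paths that appear in the listings below) is assumed in the rest of this section:

features/
    ICT-4953.feature        # one feature file per JIRA ticket
    ICT-4420.feature
    steps/
        steps.py            # step implementations shared by all features
    environment.py          # optional before/after hooks (setup, teardown)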

3.1 A Simple TELCAL Feature Test
To illustrate how a feature test is used in TELCAL Standalone, a simple, self-contained example is shown in Listing 4.

Listing 4. A feature test for an execution using the Bandpass calibration engine
@ICT-4953
Feature: Exception thrown by CalibrationFunctions::callPolyant must be handled
  Reduce some asdms with known fails in BandPassScan and check that
  exceptions are correctly handled.

  Scenario: Catch an error in test description uid.
    Given uid://A002/Xa57844/X1f9b is already exported
    When the commands below are executed
      """
      tc_bandpass(asdm="uid___A002_Xa57844_X1f9b", scan="8")
      """
    Then output should have "Error in polyant"

This feature file is written in the Gherkin language, which Behave matches to a Python file using regular expressions. An excerpt of the actual steps.py file (which contains the implementation of multiple feature tests) is shown in Listing 5.

Listing 5. Implementation of matched expressions
@given('{uid} is already exported')
def step_impl(context, uid):
    do_export_asdm(uid)

@when('the commands below are executed')
def step_impl(context):
    context.stdout = execute_casapy(context.text, inDirectory="data")

@then('output should have "{expected_result}"')
def step_impl(context, expected_result):
    assert expected_result in context.stdout
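The helper functions do_export_asdm() and execute_casapy() hide the interaction with the environment. Their real implementations are not reproduced here; the sketch below is only our guess at the shape of execute_casapy(), feeding the docstring commands to a casapy process and capturing the combined output for the assertions (the exact casapy flags vary with the deployed CASA version):

# Hypothetical reconstruction, not the actual TELCAL test library code.
import subprocess

def execute_casapy(commands, inDirectory="."):
    """Run a block of TELCAL/CASA commands inside casapy and return its
    textual output, so that steps can assert on it."""
    # '-c' passes commands/scripts to casapy; '--nogui' suppresses the GUI
    # (flag names depend on the CASA version installed on the test machine).
    result = subprocess.run(
        ["casapy", "--nogui", "-c", commands],
        cwd=inDirectory,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout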

After showing the feature test file and its implementation, the execution of that test file is described in Listing 6. After each execution, Behave provides valuable information about the test, such as the execution time per step, the number of errors encountered (with details, if any), and the total execution time.

Listing 6. Execution of a feature test using Behave
$ behave -k --tags @ICT-4953
Feature: Exception thrown by CalibrationFunctions::callPolyant must be handled  # ICT-4953.feature:3
  Reduce some asdms with known fails in BandPassScan and check that
  exceptions are correctly handled.

  Scenario: Catch an error in test description uid.      # ICT-4953.feature:8
    Given uid://A002/Xa57844/X1f9b is already exported   # steps/steps.py:29 0.016s
    When the commands below are executed                 # steps/steps.py:85 19.902s
      """
      tc_bandpass(asdm="uid___A002_Xa57844_X1f9b", scan="8")
      """
    Then output should have "Error in polyant"           # steps/steps.py:110 0.000s

1 feature passed, 0 failed, 24 skipped
1 scenario passed, 0 failed, 72 skipped
3 steps passed, 0 failed, 336 skipped, 0 undefined
Took 0m58.910s

3.2 From JIRA Requirement to TELCAL Feature Test
After the initial implementation of the TELCAL feature tests, it was decided to integrate this practice into a complete testing cycle. In our normal workflow, the new features for each release are agreed upon through JIRA tickets. Each JIRA ticket contains a full requirement description, test instructions to be run manually, a discussion of the feature, and a set of testing states to be marked as passed/failed (including test details [1]). These last testing-related fields are filled exclusively by the tester, without any restriction on format and without a template definition. The first attempt to include BDD was aimed at trying out a BDD framework and ensuring the replicability of the tests. Before BDD, most of the tests used to verify the correctness of ALMA software were performed with simple automation approaches, mainly based on Bash scripts, so the first step in applying BDD was to translate those scripts into feature tests. Listing 7 shows a real feature test, which covers the requirements detailed in JIRA ticket ICT-4420.

Listing 7. A feature test covering the requirements of JIRA ticket ICT-4420
@2014.6 @ICT-4420 @NewFeature
Feature: take WVR channel centres and widths from ASDM and remove code related to hard-coded values

  Scenario: Proposed Developer Test
    Given Telcal version is at least "2014.6"
    And Telcal build is newer than "2015-04-01"
    And uid://A002/X9bb85e/Xcb is already exported
    When the commands below are executed
      """
      tc_wvr(asdm="uid___A002_X9bb85e_Xcb", scan="4",
             fullwetpath=False, showplot=False)
      """
    Then these antennas should have results 1% near these values
      | Antenna | Values                                            |
      | DV03    | wH2O: 11.063 mm wetP: 74.235 mm dryP: 1820.405 mm |
      | PM01    | wH2O: 11.087 mm wetP: 74.410 mm dryP: 1820.667 mm |

After the migration from the older testing approach to BDD, some basic lessons were learned:

• Feature tests need constant refactoring, as any written code does. Most of the refactoring is oriented toward making the tests concise and clear: every reader must understand them with minimum effort. The description of the feature written in JIRA tends to be clear and detailed, but the language used sometimes tends to be scientific-oriented, and often the comments present in JIRA give useful insights into what the expected behavior should be. To write a clear, short and understandable feature test, the whole JIRA ticket must be analyzed first and the behavior rewritten in a concise but complete form.

• The initial test is proposed by the developer in the ticket. With deterministic software like TELCAL offline, translating it into given/when/then format is straightforward.

• Environment state must be taken into account. Preconditions (givens) related to version and build were added later, when the next testing campaign started.

• When testing the output of TELCAL Standalone, it is important to take special care of how each test is set up to be executed, how the output is captured, and what ancillary mechanisms should be mocked up. In our case, an X11 interface was needed to load the environment without errors, so we configured an Xvfb server. This special server accepts incoming X11 requests and silently drops them, providing a good mockup for an X server.

• Output values could be wrongly treated as simple text strings to be matched against a stored pattern. This is not true in our case: while running BDD tests on different servers, some of them failed because numerical precision changes slightly between machines. Since those numerical mismatches are still valid for scientific purposes, a mechanism was implemented to deal with them. The solution adopted was a specialized function that parses the strings generated by TELCAL Standalone and, for each value, performs a range comparison near the expected result instead of matching fixed values. All this logic is mapped in feature tests onto phrases like "Then these antennas should have results 1% near these values" (see the sketch below).
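As an illustration of the last point, the range comparison could look like the following sketch (the helper name, line format handling and example values are our own, not the project's actual code):

import re

# Match decimal numbers such as "11.063" or "1820.405" (integers like the
# "2" in "wH2O" are deliberately ignored).
NUMBER = re.compile(r'[-+]?\d+\.\d+')

def values_near(actual_line, expected_line, tolerance=0.01):
    """Return True when every number in expected_line is matched, in order,
    by the corresponding number in actual_line within the relative
    tolerance (1% by default)."""
    actual = [float(x) for x in NUMBER.findall(actual_line)]
    expected = [float(x) for x in NUMBER.findall(expected_line)]
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= tolerance * abs(e)
               for a, e in zip(actual, expected))

# Example with a DV03-like line from Listing 7: small numerical drift
# between machines still passes the 1% comparison.
assert values_near("wH2O : 11.065 mm wetP : 74.240 mm dryP : 1820.500 mm",
                   "wH2O : 11.063 mm wetP : 74.235 mm dryP : 1820.405 mm")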

For the first software release in which the BDD approach was implemented, nearly half of the time assigned to testing was invested in developing the initial feature test suite. Even so, the team gained more experience with the subsystem and, with that initial implementation in place, subsequent releases took less and less time. After four subsequent software releases (equivalent to 10 months) of having BDD as part of the testing methodology, a feature test suite is now available which provides a testing basis for performing regression tests when required.

The experience gained gave more insights into BDD testing. The use of tags in each feature test was useful from the beginning: it allows executing just a portion of the test suite, for example the tests belonging to a specific software release. While developing new feature tests it is necessary to take them out of the regression suite, so the tag @skip is used to ignore them. There are plans to integrate SikuliX for graphical testing; as this is not done yet, the tag @GUI is added to such feature tests so they can be skipped in the running environment. To play with different parameter combinations, "Scenario Outlines" are very useful: they allow tests to be defined by example and later executed as individual tests. All the code associated with feature tests was designed to become a specialized ancillary library for TELCAL. As with any other library, refactoring is expected over time, but those changes must be backwards compatible so as not to break previous tests. This code will also serve for testing at different scopes; in particular, we plan to reuse the same library in TELCAL online to have an automated way to verify real time observations during the regression tests used to deploy software into operation [16]. Listing 8 shows the current style of writing feature tests, which gives a glimpse of the benefits obtained by using the BDD testing methodology.

Listing 8. An example of the style used to write feature tests using Behave
@ICT-6202 @2015.8 @NewFeature
Feature: Add tc_wvrskydip task to calculate wvr efficiencies from skydip
  A new task called tc_wvrskydip was made to easily process fast skydip
  scans taken in CSV-3221, to measure the sky coupling efficiencies of the
  WVR receivers. This task is derived from tc_wvr. The ultimate goal is to
  update these values in the TMCDB so that they can appear in the ASDM
  Feed table for access by TelCal.

  # This feature was implemented in 2015.8 and backported to 2014.6.
  # Running these scenarios takes as long as 4 hours!
  #
  # TODO:
  # - Approval of Developer and Scientist pending

  Background: # Common to all scenarios
    Given Telcal version is at least "2014.6"
    And Telcal build is newer than "2015-12-21 00:00"
    And uid://A002/Xac5575/X4086 is already exported

  # This scenario is intended to run the default test proposed by the developer.
  # It runs in 400 s in telcal-dev
  @slow
  Scenario: 1) Canonical tc_wvrskydip test with one representative ASDM
    When the commands below are executed
      """
      asdm = 'uid___A002_Xac5575_X4086'
      scan = '1'
      fitskycoupling = '1'
      waterscaleheight = 1000
      lapserate = 6.5
      spillovertemp = 290
      useactualaltitude = False
      usewvrloads = False
      calresult = ''
      showplot = True
      verbose = True
      inp(tc_wvrskydip)
      go()
      """
    Then these antennas should have results 1% near these values
      | Antenna | Values                                                                  |
      | DA41    | wH2O: 1.144 mm skyCoupling: 0.984 0.984 0.984 0.984 sigma fit: 1.376 K |
      | DA42    | wH2O: 1.156 mm skyCoupling: 0.963 0.963 0.963 0.963 sigma fit: 5.168 K |
      | DA43    | wH2O: 1.111 mm skyCoupling: 0.992 0.992 0.992 0.992 sigma fit: 1.297 K |
      | DA45    | wH2O: 1.024 mm skyCoupling: 0.980 0.980 0.980 0.980 sigma fit: 1.150 K |
    And these files should be in dataset directory
      | filename                                                     |
      | uid___A002_Xac5575_X4086-LR6.5-SH1000-SOT290-skyCoupling.dat |
      | uid___A002_Xac5575_X4086-DV12-LR6.5-SH1000-SOT290.png        |
      | uid___A002_Xac5575_X4086-DA43-LR6.5-SH1000-SOT290.png        |

  @slow
  Scenario Outline: 2) Exercise valid parameters combinations
    When the commands below are executed
      """
      tc_wvrskydip(asdm='uid___A002_Xac5575_X4086', scan='1',
                   fitskycoupling='1', waterscaleheight=<waterscaleheight>,
                   lapserate=<lapserate>, spillovertemp=<spillovertemp>,
                   useactualaltitude=<useactualaltitude>,
                   usewvrloads=<usewvrloads>, showplot=False, verbose=True)
      """
    Then no telcal error should appear

    Examples:
      | waterscaleheight | lapserate | spillovertemp | useactualaltitude | usewvrloads |
      | 500              | 3.5       | 260           | False             | True        |
      | 900              | 5.5       | 260           | False             | False       |
      | 1100             | 7.5       | 290           | True              | True        |

3.3 Feature Tests Categorization
The BDD framework installed for TELCAL turned out to be suitable for more kinds of tests than just new features. According to their function, tests were tagged as @ancillary, @sanity, @NewFeature and @BugFix.

• Ancillary Tests: Some functions developed for the tests, such as "Given Telcal version is at least", need proof by themselves, so a set of support tests was developed and tagged @ancillary. They are run as proof that the test supporting code is still valid after refactoring of the steps code.

• Sanity Tests: When a new software release is ready to be tested, it needs to be installed on the testing server. The sanity tests validate the installation by exercising very basic functionality of TELCAL offline.

• New Features: These are the features agreed with the scientists to be implemented, verified and validated in every software release. The previous sections explained this kind of test in depth.

• Bug Fixes: When bugs are found in the software, they are marked to be fixed in a specific software release. The logic behind this set of tests is similar to that of new feature tests, but they usually contain fewer scenarios, just enough to verify that previous bugs are actually fixed.
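Combined with the release tags (e.g. @2015.8) and ticket tags shown earlier, this classification lets a tester narrow a run in the same way as in Listing 6. The invocations below are illustrative; the exact tag-expression syntax depends on the Behave version in use:

$ behave --tags @sanity                 # quick check after installing a new build
$ behave --tags @2015.8                 # everything verified for release 2015.8
$ behave --tags @BugFix --tags @2015.8  # repeated --tags options are ANDed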

4. AUTOMATED TESTING ENVIRONMENT
One of the main aspects of automated regression testing is the infrastructure that supports it, especially when Continuous Integration is desired as part of ALMA's software development methodology. Until now, the main components of the infrastructure that handles the software used in ALMA operations are virtual machines, STEs and a Buildfarm, which is in charge of producing operative builds for each incremental software release version [17]. Unfortunately, the software implemented for ALMA is extremely specific and complex, and it also requires physical resources (on the online side), so implementing automated solutions for testing is very challenging and has been the subject of many discussions inside ALMA [18]. In general, most of the software is tested by hand; the only thing currently automated is the compilation, via the Buildfarm infrastructure. This Buildfarm is based on Jenkins and is in charge of gathering all new changes and compiling everything necessary to obtain an operative build for a specific release version.

Before the inclusion of BDD for automated testing, the TelCal software was tested completely by hand by IRAM staff, and the version of TelCal used ("offline" or "online") depended on which feature needed to be tested. Each version of TelCal has a different deployment. The "online" version is deployed in the same way as the other subsystems of the ALMA Online Software on an STE; that is, a specific machine (part of that environment) is used to deploy all CORBA components of TELCAL by setting the deployment parameters in the TMCDB (a special database that handles all configuration options for the deployment). On the other hand, TELCAL Standalone is installed as a tool on top of CASA and deployed on all the machines that belong to the ALMA reduction cluster. Considering that the existing STEs are shared resources among all software testers and stakeholders (even the ones used for testing purposes), the inclusion of an automated testing solution is much more feasible on the "offline" side of TELCAL, as those environments can be replicated without much extra cost compared with the cost of replicating an STE. Regarding TELCAL Standalone, the reduction cluster involves machines whose deployed versions of TelCal (and CASA) differ according to specific needs. Also, each week a new deployment is requested by stakeholders, in which APO staff installs a fresh build on each reduction machine. Even though ALMA's incremental release methodology controls quite well the changes done in each software branch, there is still a risk that new changes break old features or system dependencies, and such breakage would only be detected during the weekly deployment in production unless regression testing is performed for all the active versions of TELCAL. Considering all those aspects, some changes were made to the existing infrastructure of "offline" TelCal:

1. Testing machine: The first change toward an automated platform for Phase B testing was the introduction of a specific machine (telcal-dev) used to run regression tests for each active version in production, as well as for the candidates to be released, which should be tested in the current Phase B testing campaign. For each TELCAL version on which regression test suites will be run, a separate context is created, sharing the same data directory in order to avoid duplication of ASDMs or other input data needed for the regression tests.

2. Automatic deployment: Another utility, implemented not only to automate the regression test execution but also to provide an easier way to deploy TelCal on reduction machines, is a script which downloads the latest successful build from the Buildfarm, obtains the installer from that build, and then installs it.

3. Automatic orchestration script and reporting tool: One of the most important parts of the automation of the TELCAL tests was the implementation of a script that orchestrates the execution of the regression tests and provides a summary of their execution. The first implementation checks daily whether a newer build is available and, if so, tries to install TELCAL, execute the regression tests and, finally, upload a summary report containing relevant information such as the machine used to execute the tests, the global regression test status, the build and software release version used, and a detailed description of each test executed. A sketch of such a loop is shown after Figure 2.

Figure 2. Automated BDD Testing execution Workflow

With the infrastructure implementations described above, it was possible to obtain an automated solution which picks up new code changes and produces as output a report summarizing the regression tests. The whole automatic process is depicted in Figure 2.
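As an illustration of item 3 above, the following condensed sketch shows the shape of such a daily orchestration loop. All names below (latest_build_id, install_build, publish_report, the Buildfarm query) are placeholders of our own; the actual ALMA scripts are not reproduced here.

# Condensed, hypothetical orchestration loop (placeholders, not ALMA code).
import datetime
import socket
import subprocess

def latest_build_id():
    """Placeholder: query the Buildfarm (Jenkins) for the newest successful build."""
    raise NotImplementedError

def installed_build_id():
    """Placeholder: read the identifier of the TELCAL build currently deployed."""
    raise NotImplementedError

def install_build(build_id):
    """Placeholder: download the installer for build_id and run it."""
    raise NotImplementedError

def publish_report(report):
    """Placeholder: upload the summary report to the shared reporting area."""
    raise NotImplementedError

def run_regression_suite():
    """Run the Behave suite and return (passed, full output). Tag filtering
    (e.g. leaving out @skip or @GUI tests) can be added via behave's --tags option."""
    result = subprocess.run(["behave"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.returncode == 0, result.stdout

def daily_check():
    newest = latest_build_id()
    if newest == installed_build_id():
        return  # nothing new to test today
    install_build(newest)
    passed, log = run_regression_suite()
    publish_report({
        "machine": socket.gethostname(),
        "date": datetime.datetime.now().isoformat(),
        "build": newest,
        "status": "PASSED" if passed else "FAILED",
        "details": log,
    })

if __name__ == "__main__":
    daily_check()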

5. CONCLUSIONS
Behavior Driven Development concepts applied to testing are being used for the TELCAL software with good results. To date, four software releases have been verified using feature tests, and replicability is guaranteed by the very nature of BDD tools. Also, during feature test definition and implementation, an important amount of time was invested in exploratory testing, an activity that brings deeper understanding and helps to find unexpected issues related to other areas such as configuration or performance.

From the administrative point of view, the adoption of BDD testing was aligned with corporate software life cycle policies. The Gherkin language used to write feature tests resembles natural specifications; it was introduced into our normal workflow without impact and was perceived simply as a clarification of the requirements and of the test strategy to be used. BDD was created to focus on behavior, an area traditionally covered by functional and acceptance testing. Specific details of implementation, border conditions and extensive parameter combinations should be covered by unit testing, while integration, scalability and end-to-end testing should be done at other levels [19].

After ten months of creating feature tests for the different software releases, TELCAL offline now has a functional test suite which covers sanity checks, new features and bug fixes. It was found that having a common hardware infrastructure dedicated to testing, separate from the production environment but configured in the same way, greatly improves testing activities. Among other benefits, it helped us uncover non-standard configurations on production machines and get a close feel for the way final users will use TELCAL, because our testing group became another set of users in its own right. In the case of TELCAL online, special care must be taken regarding replicability; the first task will be to use the simulation environments currently available and test the communication with other subsystems.

There is still work pending, and new lines of development are foreseen in the short and middle term. To get the most out of BDD, it is desirable to have more involvement from the other actors: developers and scientists. If all interested parties agree on the description of the behavior from the beginning, then hidden implementation details can be handled early, testing strategies can be adapted, and acceptance of the feature becomes a simple verification, because the feature tests already pass. An automated testing environment was implemented to run this suite periodically, but more progress is expected in this area; the reporting system itself must be refactored to serve as a template for reporting automated testing of other ALMA software subsystems. Another feature expected in the short term is support for graphical interfaces using frameworks like SikuliX. Also, our experience with BDD testing and automation must be adapted to the rest of our software, and other scopes of testing should follow this approach whenever possible.

ACKNOWLEDGMENTS The Atacama Large Millimeter/submillimeter Array (ALMA), an international astronomy facility, is a partnership of Europe, North America and East Asia in cooperation with the Republic of Chile. ALMA is funded in Europe by the European Organization for Astronomical Research in the Southern Hemisphere (ESO), in North America by the U.S. National Science Foundation (NSF) in cooperation with the National Research Council of Canada (NRC) and the National Science Council of Taiwan (NSC) and in East Asia by the National Institutes of Natural Sciences (NINS) of Japan in cooperation with the Academia Sinica (AS) in Taiwan. ALMA construction and operations are led on behalf of Europe by ESO, on behalf of North America by the National Radio Astronomy Observatory (NRAO), which is managed by Associated Universities, Inc. (AUI) and on behalf of East Asia by the National Astronomical Observatory of Japan (NAOJ). The Joint ALMA Observatory (JAO) provides the unified leadership and management of the construction, commissioning and operation of ALMA.

References
[1] Soto, R., Ibsen, J.P.A., Saez, N., and Shen, T.C., "ALMA release management: a practical approach," Proc. 15th International Conference on Accelerator & Large Experimental Physics Control Systems (ICALEPCS2015) (2015).

[2] Wynne, M. and Hellesoy, A., [The Cucumber Book: Behaviour-Driven Development for Testers and Developers], The Pragmatic Programmers, Dallas, Tex., Pragmatic Bookshelf (2012).
[3] Viallefond, F., "The ALMA Science Data Model," in [Astronomical Data Analysis Software and Systems XV], Gabriel, C., Arviset, C., Ponz, D., and Enrique, S., eds., Astronomical Society of the Pacific Conference Series 351, 627 (July 2006).
[4] North, D., "Introducing BDD," Better Software Magazine (Mar. 2006).
[5] Beck, K., [Test Driven Development: By Example], Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (2002).
[6] Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W., Fowler, M., Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick, B., Martin, R. C., Mellor, S., Schwaber, K., Sutherland, J., and Thomas, D., "Manifesto for agile software development," (2001).
[7] Park, S. and Maurer, F., [Agile Processes in Software Engineering and Extreme Programming: 11th International Conference, XP 2010, Trondheim, Norway, June 1-4, 2010. Proceedings], ch. A Literature Review on Story Test Driven Development, 208–213, Springer Berlin Heidelberg, Berlin, Heidelberg (2010).
[8] Marchese, M., Zen, R., and Villafiorita, A., "Gherkin cucumber."
[9] Mugridge, R. and Cunningham, W., [Fit for Developing Software: Framework for Integrated Tests], Pearson Education (2005).
[10] Chelimsky, D., Astels, D., Helmkamp, B., North, D., Dennis, Z., and Hellesoy, A., [The RSpec Book: Behaviour Driven Development with RSpec, Cucumber, and Friends], Pragmatic Bookshelf (2010).
[11] Diepenbeck, M., Kühne, U., Soeken, M., and Drechsler, R., [Tests and Proofs: 8th International Conference, TAP 2014, Held as Part of STAF 2014, York, UK, July 24-25, 2014. Proceedings], ch. Behaviour Driven Development for Tests and Verification, 61–77, Springer International Publishing, Cham (2014).
[12] Kulas, M., Borelli, J. L., Gässler, W., Peter, D., Rabien, S., Orban de Xivry, G., Busoni, L., Bonaglia, M., Mazzoni, T., and Rahmer, G., "Practical experience with test-driven development during commissioning of the multi-star AO system ARGOS," Proc. SPIE 9152, 91520D (2014).
[13] Müller-Nilsson, O., Pavlov, A., and Feldt, M., "SPHERE data reduction software: first insights into data reduction software development for next-generation instruments," Proc. SPIE 7740, 774022 (2010).
[14] Wicenec, A., Parsons, R., Kitaeff, S., Vinsen, K., Wu, C., Nelson, P., and Reed, D., "Distributed agile software development for the SKA," Proc. SPIE 8451, 845106 (2012).
[15] Rice, B., Jones, R., and Engel, J., "Behave: BDD in Python," (2016).
[16] Soto, R., González, V., Ibsen, J., Mora, M., Sáez, N., and Shen, T.-C., "ALMA software regression tests: the evolution under an operational environment," Proc. SPIE 8451, 84511R (2012).
[17] Shen, T.-C., Soto, R., Mora, M., Reveco, J., and Ibsen, J., "ALMA operation support software and infrastructure," Proc. SPIE 8451, 845106 (2012).
[18] Glendenning, B., Schmid, E., Kosugi, G., Kern, J. S., Ibsen, J., Watanabe, M., Chavan, M., Griffith, M., and Soto, R., "Ten things we would do differently today: reflections on a decade of ALMA software development," Proc. SPIE 9152, 91521L (2014).
[19] Mora, M., Avarias, J., Tejeda, A., Gil, J. P., and Sommer, H., "ALMA software scalability experience with growing number of antennas," Proc. SPIE 8451, 84510W (2012).