Colorado Technical University
SYMBOLIC EXECUTION FOR KIVY APPLICATIONS
Chapters 1 to 5
Dr. Danette Lance
A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree of Doctor of Computer Science
By
Herbert Dawkins
Colorado Springs, Colorado
August 2015
Committee
__________________________________________ [Mentor name], [Degree], Chair
__________________________________________ [Committee Name], [Degree], Committee Member
__________________________________________ [Committee Name], [Degree], Committee Member
__________________________________________ Date Approved
© Herbert Dawkins, 2015
Abstract
Unit tests written by developers and quality assurance engineers require massive amounts of time during the software development lifecycle (SDLC) and fail to cover a wide range of possible execution scenarios, leaving applications susceptible to attacks. This dissertation presents Halfwaytree, a tool capable of automatically generating a unit test for every feasible execution path in simple Python programs. The study determined whether test cases generated by the symbolic execution of a module in a Kivy mobile phone application discovered more unique defects and achieved more code coverage than test cases generated randomly. The results indicated that Halfwaytree discovered 17 unique inputs of death and covered 21 unique feasible execution paths, compared with random testing, which discovered 10 unique inputs of death and covered 14 unique feasible execution paths.
Dedication
To humanity....
Acknowledgements
I would like to thank my advisor, Dr. Danette Lance, for helping me navigate the obstacles of the research process and for allowing me to bombard her with countless questions. I would also like to thank the members of my committee, Dr. James Prunier and Dr. Henry Felch, for their thought-provoking questions and suggestions that helped propel my research in the right direction. I thank Guido van Rossum for creating the Python programming language, which embodies simplicity and readability. I deeply appreciate the efforts of the head librarian assistant, Martha Hall, who helped me find the most inaccessible research articles on the topic of symbolic execution. Finally, I would like to offer special thanks to my fellow graduate students, who told me to keep my research simple and straightforward. Without the efforts of all of the individuals mentioned above, none of this research would have been possible.
Table of Contents
Dedication
Acknowledgements
Table of Contents
List of Tables
CHAPTER ONE
    Background
    Statement of the Problem
    Purpose of the Study
    Significance of the Study
    Primary Research Questions
    Hypotheses
    Research Objectives
    Theoretical Framework
    Assumptions
    Limitations
    Scope and Delimitations
    Definition of Terms
    Summary
CHAPTER TWO: LITERATURE REVIEW
    Introduction
    Vulnerability Scanning
    Model-Based Testing
    Symbolic Execution
    Generalized Symbolic Execution
    Lazy Initialization
    Dynamic Symbolic Execution
    Symbolic Execution Tools
    Tools for C Programs
    Developed for Java Programs
    Developed for JavaScript Programs
    Developed for Python Programs
    Developed for Multiple Languages
    Constraint Solvers Applied
    Summary
    References
CHAPTER THREE: METHODOLOGY
    Introduction
    Rationale for Research Approach
    Research Setting
    Data Collection Methods
    Data Analysis Methods
    Issues of Trustworthiness
    Summary
    References
CHAPTER FOUR: RESULTS
    Introduction
    Halfwaytree Validation
    Description of Samples
    Survey of Results
    Control
    Experiment
    Summary
    References
CHAPTER FIVE: CONCLUSION
    Introduction
    Summary of This Study
    Significance of This Study
    Compendium of Study
    Interpretation of Results
    Explanation of Findings
    Acceptance of Hypotheses
    Limitations
    Suggestions for Future Research
    Summary
    References
List of Tables
Table 1: Side-by-Side Comparison of Source Code from Cadar and Sen (2013)
Table 2: Side-by-Side Comparison of Execution Tree from Cadar and Sen (2013)
Table 3: Side-by-Side Comparison of Source Code from Khurshid, Pasareanu, and Visser (2003) and Python
Table 4: Side-by-Side Comparison of Execution Tree from Khurshid, Pasareanu, and Visser (2003) along with the Halfwaytree Tree
Table 5: Control Code Coverage and Defects
Table 6: Experiment Code Coverage and Defects
CHAPTER ONE
Background
Organizations have lost millions of dollars due to software defects (Bradbury, 2014; Dougherty & Huang, 2014; Securities and Exchange Commission, 2013). Specifically, Knight Capital Group's automated routing system, SMARS, had a software defect that caused it to route millions of orders over a period of 45 minutes, resulting in the company losing over $460 million (Securities and Exchange Commission, 2013). Similarly, Mt. Gox, a Bitcoin exchange, lost $480 million after malicious users exploited a software defect that allowed them to alter Bitcoin withdrawal transactions before the transactions were confirmed by Bitcoin's network. This defect, when exploited by attackers, single-handedly caused the largest Bitcoin exchange at the time to file for bankruptcy (Dougherty & Huang, 2014). A former Mt. Gox developer revealed that the company had only recently introduced a testing environment, meaning that previously inadequately tested software changes were pushed out to production for customers to use (McMillan, 2014). The financial damages caused by these defects far outweigh the costs associated with discovering them.
Unfortunately, software development can be a very expensive process (Kumari & Pushkar, 2013). Many organizations exhaust their resources on software planning and development and leave only a fraction for their software verification and validation processes. Software testing is too important to be an afterthought and should be an ongoing process during the entire software development lifecycle (SDLC). According to Kumari and Pushkar (2012), a classical approach for
software testing is described by the V-model, which requires user acceptance tests to comply with the business requirements, system and integration tests to comply with the architecture design, and unit tests to comply with the code specifications. Under this testing paradigm, the software is considered complete only after each type of testing has satisfied its corresponding requirements. However, the reality is that software testing on projects often terminates only when resources are depleted.
One of the challenges in software testing is that, even when the software is considered complete, edge-case scenarios can still cause unanticipated defects to appear. This problem cannot be solved simply by increasing the number of quality assurance engineers, because they would have to manually write test cases for every possible execution path. Worse, when the software is altered during development, a set of new test cases must be created to guard against possible software regression. Hence, there is a dire need for automated test case generation that achieves higher code coverage. One approach that addresses this issue is to symbolically execute the source code. In symbolic execution, a program is executed using symbolic variables in place of concrete values for inputs (Sen, Marinov, & Agha, 2005). This forms an execution tree representing all possible execution paths, with constraints expressed in terms of symbolic variables at each node in the tree. Therefore, to determine the concrete inputs that would execute a particular node in the execution tree, the constraints at that node must be solved with a constraint solver.
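To illustrate with a minimal, hypothetical example (the function and values below are constructed for this discussion and are not drawn from the study's materials), consider a Python function with a single branch:

    def classify(x):
        if x > 10:
            return 100 / (x - 11)  # path A, reached only when x > 10
        return x                   # path B, reached only when x <= 10

Executing classify with a symbolic x yields two paths: path A with the constraint x > 10 and path B with the constraint x <= 10. A constraint solver can then produce a concrete input for each path, such as x = 11 for path A and x = 0 for path B. Note that x = 11 is also an input of death: it satisfies path A's constraint yet raises a ZeroDivisionError at runtime.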
There are limitations to symbolic execution, such as path explosion and complex constraints, which will be discussed in Chapter Two.
This dissertation will present an approach to automated test case generation using symbolic execution for applications written with Kivy, a cross-platform Python mobile phone framework. The novel symbolic execution engine, Halfwaytree, will be implemented using Python's abstract syntax tree (AST) module to obtain all execution paths, together with Z3, a constraint solver developed by Microsoft (Moura & Bjorner, 2014).
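As a sketch of the first half of that pipeline (illustrative only; this is not Halfwaytree's actual implementation), Python's ast module can recover the branch structure from which an execution tree is built:

    import ast

    source = """
    if x > 10:
        y = 1
    else:
        y = 2
    """

    # Parse the source into an abstract syntax tree, then report every
    # explicit path fork (If node) together with its branch condition.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.If):
            print("branch at line", node.lineno, "-", ast.dump(node.test))

Each branch discovered this way contributes a constraint (here, x > 10 and its negation) to the corresponding paths, and the constraint solver is then asked for concrete values satisfying each path's accumulated constraints.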
Statement of the Problem
Unit tests manually written by developers and quality assurance engineers require massive amounts of time during the SDLC and still fail to cover a wide range of possible execution scenarios. This problem has been evidenced by Esfahani, Kacem, Mirzaei, Malek, and Stavrou (2013); Mirzaei, Malek, Pasareanu, Esfahani, and Mahmood (2012); Yang, Mukul, and Tao (2013); and Bucur, Kinder, and Candea (2014), and it is the genesis of this research. Mobile phone application developers should spend most of their time designing and analyzing the results of automated tests. Unfortunately, the reality is that most of their time is spent conducting tedious manual tests and trying to duplicate hard-to-reproduce defects. The research proposed here describes Halfwaytree, a symbolic execution engine prototype that can be used to generate test cases for cross-platform Python mobile phone applications.
Purpose of the Study
Let X be the number of unique defects discovered by unit tests generated randomly, and let Y be the number of unique defects discovered by unit tests automatically generated by Halfwaytree. This study uses an experimental quantitative research design to test the differences between X and Y; specifically, the purpose of this study is to determine if Y > X. Moreover, let N represent the number of execution paths covered by test cases generated randomly, and let M represent the number of execution paths covered by test cases generated through symbolic execution; another aim of this study is to determine if M > N.
The main reason symbolic execution for Kivy applications should be studied is that Kivy has emerged as a dominant cross-platform development framework in the mobile market, yet no method exists to quickly generate test cases for applications built on this framework. Moreover, this research will help the mobile phone industry because it will reduce the number of defects in an application before it ships. Even though defects present at release would more than likely be caught and dealt with later, they would be more expensive to resolve because of the obligatory administrative processes needed to discover and remove them. Even worse, sometimes new functionality is built on top of a pre-existing defect, so resolving the defect later breaks the new functionality. Hence, this research is in the best interest of the mobile phone industry because it assists in discovering defects as early as possible in the software development process.
Significance of the Study
The contributions of this study would be a significant step forward in testing
Python-based mobile applications. Currently, there is no off-the-shelf product or open-source tool capable of symbolically executing Kivy applications. This study would greatly benefit quality assurance engineers because they would be able to allocate more time to tests not suited for automation, such as regression testing for user interface defects. The study is beneficial to quality assurance and to all stakeholders of the SDLC. Developers would be able to code, conduct a code review, and shortly thereafter receive quick feedback from quality assurance on inputs of "death" that would crash their application and are not covered by their test-driven development unit tests. Subsequently, developers could edit the pre-existing source code to properly exception-handle such scenarios. This research would be helpful to project managers because their teams would be more productive, which, in turn, means they would be more likely to finish code releases on time. Moreover, product owners would benefit significantly because they would receive code with greater execution path coverage.
Another substantial contribution of this study is that Python mobile phone applications would become more secure against attacks that succeed in breaking functionality. This is important because mobile phones are used for activities that involve personally identifying information, such as accessing bank accounts, checking email, and storing personal images. According to Esfahani, Kacem, Mirzaei, Malek, and Stavrou (2013), mobile markets harbor apps that are either malicious or vulnerable, which makes millions of devices susceptible, and security is not one of the primary design tenets of mobile applications. Hence, many
defect-ridden applications are downloaded unknowingly by end users. This represents a major security threat. Finally, this research would be useful in creating a software tool that could automatically scan the source code of an application for test cases that break security-related functionality. Such a system could be implemented by the App Store and/or Google Play to analyze Kivy applications before the binaries are created and uploaded.
Primary Research Questions
1. Do test cases generated by symbolic execution of Python programs discover more unique defects than test cases generated randomly?
2. Do test cases generated by symbolic execution of Python programs have more code coverage than test cases generated randomly?
Hypotheses
H0 1: Test cases automatically generated by Halfwaytree will detect the same number of unique defects in a Kivy mobile application as test cases generated randomly.
H1 1: Test cases generated automatically by Halfwaytree will detect more unique defects in a Kivy mobile application than test cases generated randomly.
H2 1: Test cases generated automatically by Halfwaytree will detect fewer unique defects in a Kivy mobile application than test cases generated randomly.
H0 2: Test cases generated automatically by Halfwaytree will have the same code coverage as test cases generated randomly.
H1 2: Test cases generated automatically by Halfwaytree will have more code coverage than test cases generated randomly.
H2 2: Test cases generated automatically by Halfwaytree will have less code coverage than test cases generated randomly.
Research Objectives
1. To develop a prototype called Halfwaytree that is capable of symbolically executing Kivy mobile applications.
2. To compare random test case generation to programmatic test case generation in terms of the number of unique defects discovered and code coverage.
3. To identify whether symbolic execution can reduce development time during the SDLC.
Theoretical Framework
There is evidence demonstrating that test suites generated by symbolic execution of software written in C can achieve high code coverage, discover new security vulnerabilities, and locate hard-to-find defects in code ranging from third-party library code to device drivers (Cadar et al., 2011). Similarly, others have noted that manually selecting values for test inputs is labor-intensive and does not guarantee that all possible execution paths of a particular code unit will be observed during testing (Sen, Marinov, & Agha, 2005). The quantitative experimental research for this dissertation is grounded in these observations. In addition, this theory has not yet been tested on Python-based mobile phone applications, which is one of the objectives of this research.
Assumptions
This research has three assumptions. First, the applications being tested have defects that have not yet been analyzed by symbolic execution and then corrected. This is extremely important because the H0 hypothesis is automatically accepted if the source code has absolutely no defects. Second, the Kivy application was developed for Python version 2.7.3 or above, as required by Halfwaytree. Third, the source code of the application being analyzed must be available, because Halfwaytree does not work on Python binaries.
Limitations
This study has five major limitations. The first three limitations are inherent to all
symbolic executions, and the last two occur because Halfwaytree is only a prototype. First, path conditions must be solvable by the Z3 solver used by Halfwaytree. Unsolvable path conditions occur when the constraint solver is unable to find solutions because the constraints are too complex; for instance, the MathSAT5 constraint solver is incapable of solving non-linear constraints (MathSAT5, 2014). Second, Halfwaytree cannot analyze function calls into compiled code. Compiled code cannot be scrutinized by symbolic execution to the same extent as regular Python code; therefore, defects caused by compiled code in third-party libraries represent a blind spot for Halfwaytree. Third, Halfwaytree cannot cover all execution paths in large programs due to path explosion, which occurs when the exploration of too many execution paths exhausts the computer's RAM, causing the symbolic execution engine to crash or become stuck. Symbolic execution does not scale well for programs with many possible paths because the number of feasible paths increases exponentially with program size (Cadar et al., 2011). Fourth, Halfwaytree can only handle integer variables; hence, all the variables in the program being analyzed must be integers. Fifth, Halfwaytree can analyze only programs composed of assert, print, assignment, and if statements, and the if statements can have only a single conditional or multiple conditionals joined with the "and" conjunction.
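As a concrete, hypothetical illustration of the fifth limitation (this fragment is constructed for this discussion and is not taken from the study's test programs), a module within Halfwaytree's supported subset looks like the following, where the initial assignments stand in for the inputs the engine would treat as symbolic integers:

    x = 7
    y = 3
    if x > 5 and y < 10:
        z = x - 7
        print(z)
        assert z != 0  # violated whenever x == 7 on this path

Treating x and y symbolically, the path constraints {x > 5, y < 10, x - 7 == 0} are satisfiable (for example, x = 7, y = 3), so a Halfwaytree-style analysis would report x = 7 as an input of death for this path.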
Scope and Delimitations
To ensure the Kivy application is sufficiently large to meet the first assumption but not so large as to cause path explosion, the applications tested by Halfwaytree must have between 1 and 5,000 lines of source code. Additionally, the Kivy mobile application must be developed for Python 2.7.3 to Python 3 to meet the second assumption. Furthermore, the source code cannot contain function calls to compiled code or to code in a language other than Python. Conditional statements present in the source code can be composed only of linear and non-linear arithmetic, so the constraints passed to Z3 are more likely to be solvable. Moreover, this study considers only non-graphical unit tests because they are more concrete than graphical unit tests and better documented (Kivy, 2014). To satisfy the third assumption and avoid infringing on any proprietary restrictions, another requirement of this scope is to test only open-source Kivy applications.
Definition of Terms
The terms used in this study are defined in Table 1.

Table 1
Definition of Terms

SDLC: Software development lifecycle.
QA: Quality assurance; also called a testing engineer.
BA: Business analyst; gathers the requirements for projects.
Acceptance Criteria: Business requirements written from the perspective of the user.
TDD: Test-driven development; a development process in which a developer writes unit tests before coding.
Completion Criteria: Criteria used to determine when testing is complete.
Regression Testing: Retesting of unit tests to ensure code modifications did not break the functionality.
Validation: Determining if the software meets its business requirements.
Verification: Determining if the software is being built according to structural design specifications.
AST: Abstract syntax tree.
Execution Tree: A tree representing all possible execution paths and constraints, expressed in terms of symbolic variables at each node in the tree.
Symbolic Variable: A variable that contains a symbolic value of a particular data type, e.g., x = 'x' so 2*x = 2*'x' = 2'x'.
Concrete Variable: A variable that contains a non-symbolic value, e.g., x = 3 so 2*x = 2*3 = 6.
Explicit Path Fork: A fork or possibility in the code execution arising from source code control structures, e.g., if statements, try/except blocks, and loops.
Implicit Path Fork: A fork or possibility in the code execution arising from possible errors, e.g., array references to elements that do not exist.
Constraint: A conditional that must be true at a particular node in the execution tree. Constraints are written in terms of symbolic and concrete variables and are created by any implicit or explicit path fork in the source code. A typical execution path has many constraints.
Symbolic Execution: Executing a program with symbolic variables instead of concrete variables. Often described as a three-step process involving instrumentation, execution with symbolic variables, and constraint solving.
Path Explosion: Occurs when the exploration of too many execution paths consumes too much of the computer's resources, causing the symbolic execution engine to crash.
Test Suite: A collection of test cases.
Test Case: A particular set of input values for a computer program.
Inputs of Death: A test case that causes a runtime error.
Unique Defect: A defect that traverses a previously unexplored execution path.
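To tie several of these terms together, the following hypothetical fragment (constructed for illustration, not taken from the study) contains one explicit path fork and one implicit path fork:

    values = [10, 20]
    i = 5                 # stand-in for a symbolic integer input
    if i > 0:             # explicit path fork: constraints {i > 0} vs. {i <= 0}
        print(values[i])  # implicit path fork: any i > 1 raises an IndexError here

An engine that models the implicit fork would add the constraint i > 1 to a failing path and ask the constraint solver for a witness such as i = 5, which is then an input of death for that path.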
Summary
Chapter One presented the background, the problem statement, the purpose of the study, and the significance of the study. The two research questions and their corresponding hypotheses were also discussed in detail. Furthermore, this chapter introduced the research objectives, theoretical framework, assumptions, limitations, scope, and delimitations with regard to symbolic execution for Kivy applications. Chapter Two will present an extensive review of the literature pertinent to symbolic execution.
References
Bradbury, D. (2014). What the Bitcoin bug means: A guide to transaction malleability. CoinDesk. Retrieved from http://www.coindesk.com/bitcoin-bug-guide-transaction-malleability
Bucur, S. (2014). chef-symbex-python. Retrieved from https://github.com/dslab-epfl/chef-symbex-python
Bucur, S., Kinder, J., & Candea, G. (2014). Prototyping symbolic execution engines for interpreted languages. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems. doi: 10.1145/2541940.2541977
Cadar, C., Godefroid, P., Khurshid, S., Pasareanu, C., Sen, K., Tillmann, N., & Visser, W. (2011). Symbolic execution for software testing in practice: Preliminary assessment. International Conference on Software Engineering, 1066–1071. doi: 10.1145/1985793.1985995
Dougherty, C., & Huang, G. (2014). Mt. Gox seeks bankruptcy after $480 million Bitcoin loss. Bloomberg. Retrieved from http://www.bloomberg.com/news/2014-02-28/mt-gox-exchange-files-for-bankruptcy.html
Kivy (2014). Kivy: Unit tests. Retrieved from http://txzone.net/files/projects/kivy/kivydoc/contribute-unittest.html
Kumari, S., & Pushkar, S. (2012). Perking up the V-Model to VM-Model. IJEIT, 1(5). Retrieved from http://ijeit.com/vol%201/Issue%205/IJEIT1412201205_46.pdf
Kumari, S., & Pushkar, S. (2013). Performance analysis of the software cost estimation methods: A review. International Journal of Advanced Research in Computer Science and Software Engineering. Retrieved from http://www.ijarcsse.com/docs/papers/Volume_3/7_July2013/V3I7-0247.pdf
MathSAT5 (2014). An SMT solver for formal verification & more. Retrieved from http://mathsat.fbk.eu/
McMillan, R. (2014). The inside story of Mt. Gox, Bitcoin's $460 million disaster. Wired. Retrieved from http://www.wired.com/2014/03/bitcoin-exchange/
Moura, L., & Bjorner, N. (2014). Z3: A powerful SMT solver. Retrieved from http://research.microsoft.com/en-us/um/redmond/projects/z3/Z3_system.pdf
Securities and Exchange Commission (2013). Securities Exchange Act of 1934: Release No. 70694. Retrieved from http://www.sec.gov/litigation/admin/2013/34-70694.pdf
Sen, K., Marinov, D., & Agha, G. (2005). CUTE: A concolic unit testing engine for C. Proceedings of the 10th European Software Engineering Conference, 263–272. doi: 10.1145/1081706.1081750
CHAPTER TWO: LITERATURE REVIEW
Introduction
Over the years, the complexity of software architecture has increased, bringing extensive testing of software components to the forefront of the development process. This has become even more relevant today, when it is commonplace to construct new software on top of pre-existing software packages that may contain serious errors. Consequently, organizations have increased the effectiveness of their software analytical methods in order to foster improved code quality. Lee, Lee, and Kang (2012) completed a comprehensive inspection of software testing methods and tools by examining the current practices used by experts from 73 Fortune 1000 companies. To conduct their study, the researchers developed a survey of 50 questions covering three areas: the testing environment, current testing practices and perceived weaknesses, and the capabilities desired of future testing strategies. The results demonstrated that the amount of software testing is proportional to the maturity of the organization. This is not surprising, because companies with more experience understand the enormous value added by extensively testing their software.
In order to improve code quality, others have resorted to developing improved mathematical models for predicting the existence of defects. Garg, Lai, and Huang (2011) utilized statistical approaches to ascertain the appropriate time to end software testing by examining the exponential, Rayleigh, Weibull, exponential-Weibull, and logistic probability distributions of reliability. As a result, they determined that software
testing should stop once the failure intensity falls below some chosen threshold and that the most important factors in software reliability are the size of the code base, the penalties for failures, and the resources available. In a similar fashion, Asthana and Okumoto (2012) presented an integrative software design reliability framework that showed high accuracy in predicting the number of software defects during the software development lifecycle (SDLC). There are seven steps in their framework: (1) define the software reliability requirements and tools to be used, (2) develop a reliability model for the software architecture, (3) design system integration, (4) conduct fault injection testing and stability testing, (5) predict defects using parameters from all aspects of the SDLC, (6) perform field reliability and validation, and (7) refine the prediction models. One weakness of this framework is that quality assurance engineers may not have enough input in designing the system integration, because this is usually developed by the system architect.
Likewise, Maevsky, Yaremchuk, and Shapa (2014) sought to improve code quality by designing a better software reliability system. Specifically, they created an a priori software reliability evaluation system that applies parameters derived from software development to form a mathematical model that is solved to obtain reliability indexes. The strength of this approach is that it can estimate latent faults and fault density at the beginning of the testing process (Maevsky, Yaremchuk, & Shapa, 2014). Even though there have been attempts to construct better software by improving defect prediction and software reliability using mathematical models (Garg, Lai, & Huang, 2011; Lee, Lee, & Kang, 2012; Maevsky, Yaremchuk, & Shapa, 2014), a more effective approach is to apply automated testing
tools such as vulnerability scanning, model-based testing, and symbolic execution.
In prior related research on symbolic execution engines for Python programs, others have measured performance time, discussed algorithm design and ease of use, or introduced their symbolic execution analysis tools to the research community (Bucur, 2014; Canini, Venzano, Peresini, Kostic, & Rexford, 2012; Chen, Zhang, Chen, Yang, & Bai, 2014; Jiang, Liu, Yin, & Liu, 2009; Jacky, 2011). However, there is a gap in the research that no one has dealt with thus far. Specifically, there is no research-based evidence that the symbolic execution of Python programs actually results in deeper code coverage and/or discovers more defects. Hence, the aim of this dissertation is to fill this knowledge gap by answering these questions.
Vulnerability Scanning
Vulnerability scanning is an automated software evaluation process conducted by testing an application against common exploits. Kuzma (2011) analyzed the web application security of 60 online pharmacy websites with the N-Stalker web application security scanner. Kuzma's approach was effective at detecting serious flaws that may be caused by improper implementations and outdated installations of software such as OpenSSL and Apache. The results indicated that 24% of the sites were susceptible to SQL injection, 18% to denial of service, 14% to cross-site scripting, 11% to buffer overflow, 7% to directory traversal, and 14% to miscellaneous attacks. This is a significant issue because many of these sites managed personal
identifying information (PII) of patients. Even more, an intrusion into PII could result in a violation of the Health Insurance Portability and Accountability Act (HIPAA) of 1996, which levies a costly first-time fine of between $100 and $450,000 per infringement (HIPAA, 2014).
Model-Based Testing
Model-based testing (MBT) is an automated testing methodology that creates a formal model of the system under test (SUT) and processes the model to generate test cases (Takala, Katara, & Harty, 2011). One of the benefits of MBT is that it can be conducted without direct access to the source code, which is why it has been used numerous times by researchers. In one such case, Mehlitz, Tkachuk, and Ujma (2011) created JPF-AWT, an extension of the Java Path Finder (JPF) software that is capable of analyzing large graphical user interfaces (GUIs) made with the Abstract Window Toolkit (AWT) library. They tested JPF-AWT on Robot Manager, a Java application that serves as a control center for managing robots. Although the application had already been rigorously tested, the researchers discovered 12 new defects. Some advantages of JPF-AWT are its ability to handle multiple threads and to check for deadlocks and race conditions, which makes it ideal for evaluating complex GUIs. It can also execute test cases on the unmodified, compiled application; hence, the source code is not needed for implementing tests.
Additional work has been done in MBT for automatic test case generation. Amalfitano, Fasolino, and Tramontana (2011) presented an automated mobile testing technique based on a crawler that constructs a GUI tree model of the application
and generates test cases that can be executed. The GUI tree is composed of nodes that denote the interfaces and edges that denote the event-based transitions between nodes. This study tackled two major issues: how to design an MBT approach that, one, caters to heterogeneity in device hardware and, two, works with a variety of inputs such as user, environment, and context inputs. However, both of these questions were left unanswered by the researchers.
Unlike Amalfitano, Fasolino, and Tramontana (2011), other researchers conducted MBT using the actual unified modeling language (UML) diagrams from the design phase of the SDLC instead of crawling the GUI. Karsoliya, Sinhal, and Kanungo (2013) developed a model-based test generation architecture (MBTGA) framework by applying unified modeling and combinatorial testing. Their approach applied input modeling and combinatorial coverage to generate test suites. Even though their method was valid, a major shortcoming was that the researchers did not elaborate enough on the inner workings of their approach. In a similar study, researchers devised a clever approach to the automatic generation of valid test cases from unified modeling language (UML) diagrams with Prim's and Dijkstra's algorithms (Prasanna, Chandran, & Thiruvenkadam, 2011). These UML diagrams were also based on the design specifications of the software and represented the static and dynamic interaction between objects in the SUT. Prasanna et al. (2011) applied the UML diagrams to generate a grid-like graph consisting of nodes that represent objects and edges that represent messages passed between objects. Subsequently, the edges were arbitrarily assigned weights, and then Prim's and Dijkstra's algorithms were used to
determine test cases by looping through all possible paths in the grid. The primary advantage of this approach is that it allows software engineers to develop test cases early in the SDLC that can reveal inconsistencies or weaknesses in the requirements or software architecture.
Even though the source code of an application may not be necessary, some investigators have conducted MBT using it. Hu and Neamtiu (2011) proposed a method for detecting GUI bugs automatically by generating JUnit test cases from an application's source code, running the test cases on the Dalvik virtual machine while feeding it random and deterministic events with Monkey, and analyzing the log files produced for errors. The study was conducted on 10 open-source Android applications. Bugs were classified as dynamic errors such as run-time errors, which are unhandled exceptions that occur when the application does not catch errors capable of crashing the program; API errors, which are caused by the application assuming it is running on a different API than the one provided by the system; I/O errors; and thread concurrency errors. For this MBT study, Hu and Neamtiu used test cases focused on activities because activities are the main entry points of Android applications. Moreover, each JUnit test was based on the initial condition, specification, or state management of an activity. Hu and Neamtiu's approach was a practical technique for automated testing because the Android SDK includes out-of-the-box support for the event generator they applied in their study, which increased the simplicity of their approach.
With regard to MBT of Python programs, only a small number of tools are available. One such tool is PyModel, which presents itself as an ideal MBT test case
generation approach because it can generate unit tests for nondeterministic systems (Jacky, 2011). PyModel is a model-based, offline and on-the-fly testing framework for Python that consists of an analyzer, a graphics program, and a testing program. The analyzer generates a finite state model of the SUT, and each model represents the allowable paths of the application.
Takala, Katara, and Harty (2011) introduced an open-source MBT tool for the GUI analysis of Android applications utilizing Python and TEMA tools, and they tested their approach on the BBC News Widget Android application. Their method works by injecting GUI events and verifying the state of the GUI with random testing. A major contribution of the study was the design of a keyword-based test automation tool devised for the Android emulator. The researchers faced two challenges during the study: first, the network connection with the emulator would fail after a few hours of operation; second, executing MBT would sometimes find defects that were already known. Nonetheless, the researchers discovered eight new defects during the modeling process and six new defects during the test execution. The utility of their method was demonstrated by the fact that two of the defects were severe runtime errors capable of crashing the application. However, a significant drawback of the framework is the lengthy amount of time it requires: the test case execution for the BBC application lasted 115 hours. Moreover, many of the tests did not require a complex series of actions and could have been covered by manual testing.
With regard to the theory behind their approach, Takala and colleagues provided
thorough documentation supporting why the keyword-driven MBT approach applied in this study was the best method for GUI analysis. A key benefit mentioned was that applying keywords to MBT separated the test automation logic from the test design, making it easier to update test cases when the GUI changes.
In a similar manner, other researchers have conducted MBT on Android applications. Amalfitano, Fasolino, De Carmine, Memon, and Tramontana (2012) implemented Android Ripper, an automated GUI testing tool capable of detecting unhandled run-time exceptions by exploring an event flow graph (EFG) derived from the application's GUI. In this study, the researchers compared the effectiveness of Android Ripper to Monkey, a random test generator, in terms of criteria such as defects detected, runtime errors detected, and code coverage under three preconditions. To elaborate, Android Ripper applied three different strategies to traverse the application's GUI and create JUnit test cases: depth-first, breadth-first, and random strategies. Once the test cases were obtained, they were executed by an Android testing framework such as Robotium. A noteworthy advantage of Android Ripper is that its underlying algorithm can be configured by altering the termination criteria, the application preconditions, or the GUI traversal strategy being used. According to the results, Android Ripper was more effective than Monkey at detecting code defects. This approach is also promising because of its demonstrated speed at detecting bugs. To illustrate, unlike the keyword-driven MBT framework developed by Takala and colleagues, Android Ripper automatically detected four new bugs in the
application's source code in just under five hours. However, one disadvantage of this approach is that it does not handle the path explosion problem caused by applications with complex GUIs or convoluted events.
Another Android testing framework, called Orbit, is composed of an action detector and a dynamic crawler (Yang, Prasad, & Xie, 2012). The action detector conducts a static analysis of the source code utilizing IBM's Watson Libraries for Analysis (WALA), which produces disconnected call graphs. Subsequently, an action-inferring algorithm merges the call graphs to form an action map. The dynamic crawler, based upon Robotium, uses this action map to evaluate test cases at runtime. Yang and his colleagues compared Orbit to other Android application testing tools in terms of code coverage and speed. The results demonstrated that, although Orbit was not the fastest, it had more code coverage.
A very different MBT approach was developed by Esfahani, Kacem, Mirzaei, Malek, and Stavrou (2013). Unlike other researchers who conducted Android MBT on test devices or computers, Esfahani and colleagues created a technique to fuzz test the security posture of Android applications in the cloud. The technique leveraged the scalability of Amazon EC2 virtual servers to run test cases simultaneously for multiple instances of the same application. In particular, test cases were generated by a four-step process: (1) converting the application's executable (.apk) to Java byte code (.class) with a tool such as Dex2Jar, (2) decompiling the Java byte code (.class) into source code files (.java) with JD-GUI, (3) parsing the source code and application manifest file to generate the
call graph model and architectural model with MoDisco, and (4) using the models along with Android specifications to create JUnit test cases (.java).
MBT is applicable outside the realm of mobile applications. In 2012, researchers applied MBT to generate test cases for web applications (Mesbah, Deursen, & Lenselink, 2012). Specifically, they developed Crawljax, an open-source web analyzer designed to automatically crawl AJAX-based web applications through dynamic analysis of client-side code. Crawljax has three main components: a robot used for simulating user actions, a document object model (DOM) analyzer that monitors changes in the DOM, and a finite state machine that maintains a state flow graph (SFG) of the web application. A major appeal of Crawljax is its capability to analyze the JavaScript of a web application and to keep track of elements being created or destroyed dynamically. This feature is especially useful in modern web applications because most of the elements in the GUI do not exist the moment the web application starts up; instead, they are created on the fly as the user interacts with the application.
Likewise, in a similar study, Dallmeier, Burger, Orth, and Zeller (2013) developed Webmate, an automated tool used to analyze web applications. More specifically, Webmate is capable of extracting a usage model from a web application and using it to explore functionality by conducting cross-browser compatibility checks, regression tests, and code analysis. The usage model is obtained by Webmate crawling through the application and exploring all HTML elements linked to a JavaScript event handler. In general, this approach is both a practical and effective
technique for mapping and MBT of web applications, especially because it is simple to use, requiring the user to enter only the URL of the website. However, a large drawback is that, unlike Crawljax developed by Mesbah, Deursen, and Lenselink, Webmate is unable to map elements in the DOM that are created dynamically. Therefore, there is certainly room for improvement in future versions.
Symbolic Execution
Symbolic execution is a program analysis technique that determines what input values cause a particular part of the code to execute. Its usage in automatic test case generation has been well documented since its inception in the 1970s. Boyer, Elspas, and Levitt (1975) pioneered work on symbolic execution with SELECT, a systematic debugger capable of handling paths of programs written in LISP. What made SELECT unique, in comparison to its contemporary debuggers, was its ability to conduct formal processing of a program's path to generate potential test input data, simplified symbolic values of variables, and a proof of correctness for a path. Their approach used the hill-climbing algorithm to solve the system of inequalities generated by a particular program path. Other ground-breaking work in symbolic execution was led by King in 1976. He devised a system that used symbolic execution in a general-purpose interactive debugger called Effigy (King, 1976). Effigy performed symbolic execution of Programming Language One programs by substituting symbols into variables, executing the code to generate an execution tree, and solving the path conditions for each terminal branch of the tree.
Concurrently, Clarke (1976) implemented similar research in symbolic execution. In particular, she introduced a system that conducted symbolic testing of FORTRAN applications. In her program, the logic of the source code was stored in a directed control flow graph in which the nodes represented executable statements and the edges represented program flow. Her research clearly explained the conceptual framework behind FORTRAN's symbolic execution in such a way that it could be readily ported and applied to other languages.
Ramamoorthy, Ho, and Chen (1976) contributed to symbolic execution from a different standpoint, proposing an approach for handling array references and loop iterations during the symbolic execution of FORTRAN programs. To elaborate, their approach used a path selector to track and determine test paths. Afterwards, a constraint generator obtained path constraints for each test path, and a constraint solver calculated the path constraints to generate concrete test data. Another early FORTRAN symbolic execution engine was developed by Howden (1976), who extended the work of King, Boyer, Elspas, and Levitt. Howden introduced Dissect, which was heavily influenced by other contemporary symbolic programs such as Effigy and Select (Howden, 1976). Dissect was designed to take the source code of a program and a list of input commands to carry out symbolic execution. One benefit of Dissect was that test engineers could choose path selection commands to specify which path the symbolic execution engine would execute. For instance, if a test engineer wanted Dissect to explore and execute only the True branches of an "if" statement, then he would specify
"SELECT ANY CONSISTENT". Hence, the path selection commands available in Dissect provided some optimization by reducing the total number of paths traversed.
In the early 2000s, symbolic execution was improved by designing symbolic execution engines to handle software of increased complexity, such as multi-threaded programs. Notably, Khurshid, Pasareanu, and Visser (2003) presented a symbolic execution framework capable of handling multi-threaded applications, dynamically allocated data structures, primitive data, and method preconditions. In modern times, researchers have advanced symbolic execution even further by creating engines that apply heuristics to reduce path explosion and by employing more powerful constraint solvers. Cadar et al. (2011) discussed recent advancements in generalized symbolic execution and dynamic symbolic execution, such as directed automated random testing (DART) and concolic unit test engines (CUTE). Likewise, Cadar and Sen (2013) discussed the evolution of symbolic execution for software testing from its birth in the 1970s to the modern era. Moreover, they explored present-day techniques such as concolic testing and execution-generated testing. In addition, they examined the current state of key challenges such as path explosion, constraint solving, and concurrency handling in complex applications. Even more, Cadar and Sen surveyed state-of-the-art software solutions best suited to tackle some of these challenges, such as CUTE, CREST, DART, EXE, and KLEE.
Generalized Symbolic Execution
Generalized symbolic execution, also referred to as non-dynamic symbolic
execution or simply symbolic execution for short, works by executing programs with symbolic values as inputs instead of concrete values. At a high level, generalized symbolic execution consists of at least three steps. First, the source code of a computer program is preprocessed to obtain an execution tree that represents all possible execution paths. The branches in the execution tree arise from the additional possibilities in the code path created by control structures such as if statements, loops with non-concrete iteration counts, and array references. Moreover, each path has a corresponding constraints container that stores the set of constraints for that particular path. Second, the selected path is executed with symbolic variables by running it with a modified virtual machine, a modified interpreter, or a custom-built program, or by instrumenting the program before execution. For instance, Mirzaei, Malek, Pasareanu, Esfahani, and Mahmood (2012) symbolically executed Android applications by using a Java virtual machine that they had modified. With regard to interpreted languages, in practice many researchers choose to modify the interpreter to conduct symbolic execution (King, 1976; Sapra, Minea, Khaki, Gurfinkel, & Clarke, 2013; Bucur, Kinder, & Candea, 2014; Chen, Zhang, Chen, Yang, & Bai, 2014). In another case, Esfahani, Kacem, Mirzaei, Malek, and Stavrou (2013) used the custom-built approach to symbolic execution by creating a customized test case generator that read the symbolic execution tree expressed as a call graph. On the other hand, the majority of researchers prefer to implement symbolic execution by first instrumenting the program (Larson & Austin,
SYMBOLIC EXECUTION FOR KIVY APPLICATIONS 2003; Sen, Marinov, & Agha, 2005; Anand, Pasareanu, & Visser, 2006; Anand, Naik, Harrold, & Yang, 2012; Canini, Venzano, Peresini, Kostic, & Rexford, 2012; Sen, Kalasapur, Brutch, & Gibbs, 2013). Specifically, instrumentation is a process that allows the state tracking and monitoring of symbolic variables during the execution. A common approach to instrumentation is source code instrumentation in which the source code is parsed and then injected with other code such as functions that allow tracking of specified input variables (Sen, Marinov, & Agha, 2010). Other common instrumentation approaches are to use the byte code of a program or the abstract syntax tree of the program by injecting nodes into the tree. During the execution of a particular path on the execution tree, for every statement that is executed, a new constraint is appended to that path’s constraints container. Therefore, once the path terminates, the constraints in that path’s constraints container represent the input variables that cause that particular path to execute. Note that each constraint is expressed in terms of the symbolic variables. The third step in symbolic execution is to solve the constraints to determine the concrete input variables. Once a single path has been executed, the symbolic execution engine would repeat steps 1–3 for all paths in the execution tree. Before symbolic execution can occur, there are a few items that must be either specified by the test engineer or assumed/determined by the symbolic execution engine. At the start, the test engineer must identify which input variables should be tracked and considered symbolic while executing. Most implementations of symbolic execution
Before symbolic execution can occur, there are a few items that must be either specified by the test engineer or assumed or determined by the symbolic execution engine. At the start, the test engineer must identify which input variables should be tracked and considered symbolic while executing. Most implementations of symbolic execution assume all input variables are symbolic (Cadar & Sen, 2013). Next, for loops whose termination depends on a symbolic variable, the maximum number of iterations must be specified by the test engineer or assumed by the symbolic execution engine. Otherwise, the loop would run indefinitely (Boyer, Elspas, & Levitt, 1975; Anand, Pasareanu, & Visser, 2006). Another important consideration for the test engineer is to specify where code termination occurs. Typically, symbolic executors have been designed to terminate a path once it reaches an error via a specified assertion or the end of a program (Boyer, Elspas, & Levitt, 1975; Clarke, 1976; Cadar et al., 2011).

Lazy Initialization

Lazy initialization is a programming technique in which variables remain uninitialized until they are first used in a calculation or comparison. This technique is especially useful in generalized symbolic execution because it allows the symbolic execution engine to handle unbounded data structures such as arrays. Much work in this area has been led by Anand, Pasareanu, and Visser (2006), who developed an alternative approach to symbolic execution capable of handling recursive data structures and arrays with lazy initialization. In the researchers' implementation of lazy initialization, arrays had a symbolic length and were composed of a collection of initialized array cells, each having its own symbolic index and symbolic value. Moreover, when an uninitialized cell was accessed, a new cell was created non-deterministically at the index that was previously uninitialized. Furthermore, the value inside this newly initialized cell would be set as symbolic. Another contribution by these researchers was a subsumption algorithm that was used to compare and eliminate redundant constraints, an improvement that can reduce the processing time of symbolic execution.

Other researchers took lazy initialization and expanded upon it to develop new methods. Geldenhuys, Aguirre, Frias, and Visser (2013) introduced field-bounding and symmetry-breaking, two methods of improving lazy initialization in symbolic execution. To explain further, symmetry-breaking prevented the construction of isomorphic structures, which eliminated the processing of redundant or similar paths. In field-bounding, on the other hand, the quality assurance engineer defined the scope by specifying the maximum number of iterations and object instances of classes, which reduced the number of paths that were symbolically executed. In addition, the authors compared the conventional, field-bounded, and symmetry-breaking lazy initialization approaches. The results demonstrated that the two new approaches were faster than conventional lazy initialization. Interestingly, the symmetry-breaking approach was faster than the field-bounded approach when the data structure contained more than seven nodes.
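The essence of lazy initialization can be shown in a few lines. The sketch below is an illustration only, not how Anand, Pasareanu, and Visser implemented their engine: it materializes a fresh symbolic value the first time an array cell is read, so an array of symbolic length never has to be allocated up front:

    import itertools

    class SymbolicArray:
        # Cells spring into existence, as fresh symbolic values, on first access.
        _ids = itertools.count()

        def __init__(self):
            self.cells = {}  # maps index -> symbolic value name

        def __getitem__(self, index):
            if index not in self.cells:
                # A real engine would also branch non-deterministically over
                # aliasing choices here; this sketch just creates a fresh symbol.
                self.cells[index] = 'sym_%d' % next(SymbolicArray._ids)
            return self.cells[index]

    a = SymbolicArray()
    print(a[3])  # sym_0 (cell 3 initialized on first access)
    print(a[3])  # sym_0 (later reads reuse the same symbol)
    print(a[7])  # sym_1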
Dynamic Symbolic Execution

In dynamic symbolic execution, the variables being tracked during execution are represented by both symbolic and concrete values simultaneously. This differs from generalized symbolic execution, in which, at any given time during the execution, the variables being tracked are represented either symbolically or concretely, but never both. There are two types of dynamic symbolic execution: concolic execution, where concolic stands for concrete and symbolic, and execution-generated testing (EGT) (Cadar & Sen, 2013; Cadar & Engler, 2014).

The process of concolic execution consists of two steps, beginning with the instrumentation of the program, much like generalized symbolic execution. The second step is to randomly select concrete values for the input variables being tracked. During execution, along with the initial random values, the variables of interest are tracked and represented symbolically as well. The key difference between concolic and generalized symbolic execution occurs whenever the symbolic execution engine arrives at a branch in the code. At this branch, the concolic execution engine examines all the possible paths at that particular branch that are not taken under the pre-existing concrete values of the variables. Specifically, for each such path, the concolic execution engine negates the specific constraint that makes that path false and solves the modified constraints for a set of new concrete values. Note that both the constraints and the pre-existing concrete values are used to calculate the set of new concrete values. The significance of this is that, if the variables of interest were initialized with the new set of concrete values, the code execution from the beginning would arrive at the corresponding path. Concolic execution exploits this fact and repeats the execution of the program from the beginning by using the sets of concrete variable values calculated from negating the constraints at a branch. Keep in mind that one branch may have more than two possibilities. As a result, a single branch point may produce multiple sets of new concrete variable values that can be used to execute the program. Similar to generalized symbolic execution, the process of concolic execution continues until it terminates at an error, a specified assertion, or the end of the program.

The process of EGT symbolic execution proceeds in a similar manner to concolic symbolic execution, except that, at any particular time during the execution, EGT permits some tracked variables to be represented only symbolically, unlike concolic execution, which always represents tracked variables both concretely and symbolically (Cadar & Sen, 2013). That said, the majority of authors who apply dynamic symbolic execution prefer the concolic approach over EGT (Sen, Marinov, & Agha, 2005; Anand, Naik, Harrold, & Yang, 2012; Canini, Venzano, Peresini, Kostic, & Rexford, 2012; Clements, Kaashoek, Zeldovich, Morris, & Kohler, 2013; Sapra, Minea, Chaki, Gurfinkel, & Clarke, 2013; Sen, Kalasapur, Brutch, & Gibbs, 2013; Chen, Zhang, Chen, Yang, & Bai, 2014).

One of the benefits of dynamic symbolic execution is that it eliminates issues caused by executing code from a third-party module. Hence, even if a program uses a function that is compiled or part of an external code base, the dynamic symbolic execution engine will only input concrete values into that function. Moreover, another significant benefit is that dynamic symbolic execution does not need to worry about unbounded data structures as in generalized symbolic execution, because all the values being executed are actually concrete. Consequently, there is no need to utilize lazy initialization to simplify the symbolic execution process.
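One negate-and-solve iteration of the concolic loop can be sketched as follows. The program under test mirrors the testme example from Cadar and Sen (2013) that is revisited in Chapter Four; the brute-force search stands in for a real constraint solver, and everything else is illustrative rather than taken from any cited tool:

    import random

    def program(x, y, trace):
        # Instrumented program: record each branch predicate and its outcome.
        z = 2 * y
        c1 = lambda x, y: 2 * y == x
        trace.append((c1, c1(x, y)))
        if c1(x, y):
            c2 = lambda x, y: x > y + 10
            trace.append((c2, c2(x, y)))
            if c2(x, y):
                raise AssertionError('error path reached')

    def solve(constraints):
        # Stand-in for an SMT solver: exhaustive search over a small box.
        for x in range(-50, 51):
            for y in range(-50, 51):
                if all(pred(x, y) == expected for pred, expected in constraints):
                    return x, y
        return None

    # One concolic iteration: run on random concrete inputs, then negate the
    # last branch outcome to steer the next run down an unexplored path.
    x, y = random.randint(-50, 50), random.randint(-50, 50)
    trace = []
    try:
        program(x, y, trace)
    except AssertionError:
        pass
    pred, outcome = trace[-1]
    print('next inputs:', solve(trace[:-1] + [(pred, not outcome)]))

A complete engine repeats this step, negating a different branch each time, until every feasible path has been covered or a budget is exhausted.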
Symbolic Execution Tools

Symbolic execution tools are best categorized by the language that they target and the constraint solver they utilize. The constraint solver used for symbolic execution is important because it is a determining factor for the speed of the overall symbolic execution process. In addition, in order for a constraint solver to be used for symbolic execution, it must have a language binding for that symbolic execution tool's target language. Among the languages targeted, many symbolic execution tools have been developed for programming languages such as C, Java, JavaScript, and Python, to name a few.

Tools for C Programs

In 2005, Sen and his colleagues developed a concolic unit testing engine (CUTE) and used it to perform testing for two case studies. In the first case study, the researchers used CUTE to analyze and improve its own source code. In the second case study, the researchers used CUTE to automatically find errors in SGLIB, a popular open-source C library used for manipulating data structures, and they were able to discover two previously unknown serious errors. The chief value added by their research is that they devised a method for representing and solving approximate pointer constraints to generate test inputs, which widens the range of languages that could be used with the concolic testing approach.

In a similar manner, Larson and Austin (2003) conducted dynamic symbolic execution on the source code of eight popular C programs. They found two new defects in OpenSSH that would have been very difficult to find with static analysis alone. This is significant because OpenSSH is a widely used program for encrypted communication, and such defects could be exploited by malicious users to crack the encryption.

Likewise, Le (2013) developed Helium, a hybrid approach to symbolic execution that combined generalized symbolic execution and dynamic symbolic execution concurrently for C programs. Unlike other symbolic execution engines, Helium could handle constructs such as loops and library calls. Another advantage of Helium was its ability to easily generate unit test cases given a program segment.

Biere, Knoop, Kovacs, and Zwirchmayr (2013) designed a symbolic execution tool for C programs called SmacC, whose components included a lexer, which tokenized the C source code; a parser, which generated an abstract syntax tree; PathGen, which extracted paths from the abstract syntax tree; and BtorGen, which symbolically executed selected paths while using an SMT solver, Boolector, to solve constraints. This tool is an excellent choice for determining worst-case execution times.

In another C symbolic execution tool, Bugrara and Engler (2013) developed a new method for dealing with path explosion, called dynamic slicing, that was capable of speeding up symbolic execution by eliminating redundant paths. The researchers took an open-source C symbolic execution engine called KLEE and modified it by enabling dynamic slicing. Subsequently, the version of KLEE enhanced by dynamic slicing was about 50 times faster than the original version. One of the strong points of Bugrara and Engler's work was how clearly they described their approach with basic examples.
McCamant, Payer, Caselden, Bazhanyuk, and Song (2013) created FuzzBALL, a symbolic execution tool for software written in the C language, and used it to analyze AbiWord, an open-source text editor, and Poppler, an open-source PDF library. Their main objective was to develop a symbolic execution engine capable of analyzing software that specialized in transforming image data from one format to another. FuzzBALL's algorithm was very similar to KLEE's, with the exception that FuzzBALL did not use parallel processing to explore multiple branches at a fork simultaneously. Instead, it processed one branch completely and then returned to the fork later to process the other branch.

Other researchers altered KLEE's symbolic execution engine in different ways. Siddiqui and Khurshid (2012) improved the efficiency of KLEE by modifying it such that it conducted symbolic execution in two stages instead of all at once. To explain, the first stage generated abstract symbolic inputs that could be reused across methods, and the second stage expanded each abstract symbolic test for systematic testing. In comparison to the standard version of KLEE, the system devised by Siddiqui and Khurshid demonstrated an improvement in performance in terms of the time required to carry out symbolic testing for the methods of a binary search tree, a sorted linked list, and a binary heap.

Developed for Java Programs

Mirzaei, Malek, Pasareanu, Esfahani, and Mahmood (2012) applied symbolic execution to automatically generate test cases for Android applications. They made two contributions. First, they developed customized libraries to replace the Android-specific libraries, which enabled the application's source code to be compiled with the Java Virtual Machine (JVM) instead of the Dalvik Virtual Machine (DVM). Hence, Mirzaei and his colleagues were able to analyze the source code with the Java Pathfinder (JPF) tool to determine path divergence. Second, they created a process that automatically generated drivers that simulated all possible valid user input permutations within the Android application. Subsequently, each driver produced would correspond to a valid test case that could be used to evaluate the performance of the application.

Prior to this, one of the main challenges of symbolic execution for Java was processing multi-threaded programs. In 2003, this problem was tackled by Khurshid, Pasareanu, and Visser, who discussed how to conduct symbolic execution capable of handling multi-threaded Java applications that had dynamically allocated data structures, primitive data, and method conditions.

Another challenge in the symbolic execution of Java applications is path explosion. One approach researchers have applied in dealing with this problem was to utilize dynamic symbolic execution. Anand, Naik, Harrold, and Yang (2012) presented an automated test generation technique based on a concolic algorithm to solve the path explosion problem for Java-based programs. Specifically, they measured the efficiency of two algorithms, CONTEST and CLASSIC, for five Android applications in terms of algorithm execution time, the number of feasible paths discovered, and the number of feasible constraint checks conducted. The main contribution of their research was the development of CONTEST, an algorithm that was a modification of the CLASSIC algorithm. The difference between the two was that CONTEST checked a subsumption condition between event sequences. As a result, it was able to exclude redundant event sequences, which suppressed path explosion. In addition, the subsumption condition was analyzed in real time during a test run iteration by dynamically checking facts about an application's data and control flow. The results of the study showed that CONTEST took 5% to 36% of the time CLASSIC required. In addition, CONTEST explored fewer feasible paths and conducted fewer constraint checks. A major strength of this approach was that the system they developed ran off the Android SDK, which enabled it to be portable across mobile devices and to run on a cluster of machines capable of emulating the Android SDK. However, there were some unanswered questions. The researchers did not discuss why CONTEST discovering a smaller number of feasible paths was beneficial, especially when fewer paths could result in finding fewer defects. Additionally, their approach was restricted to mobile applications that have only tap events, which excludes a plethora of Android applications that utilize more than just tap events.

Developed for JavaScript Programs

In terms of symbolic execution for JavaScript-based applications, researchers created Jalangi, a dynamic analysis framework that provides an offline mode for recording and replaying events and an online mode for in-browser analysis (Sen, Kalasapur, Brutch, & Gibbs, 2013).
The strong points of this framework were its ability to conduct concolic testing, track the origins of null and undefined values, perform taint analysis, and profile object allocation and usage. Furthermore, the authors claimed that their approach could be extended to implement dynamic analysis of other languages. However, this task would be extremely difficult due to underlying differences between languages.

Developed for Python Programs

Sapra, Minea, Chaki, Gurfinkel, and Clarke (2013) developed CutiePy, which used dynamic symbolic execution to generate test cases for Python-based applications. Their goal was to devise an automated test suite generation mechanism that would find as many buggy execution paths as possible in a Python application. Such a task is actually more challenging than it may appear because Python, unlike statically typed languages, is dynamically typed, which means that a variable at one point in the program can refer to a string and at another point can refer to an integer. This dynamic behavior makes it harder to perform symbolic execution because it complicates formalizing and tracking type constraints; the short fragment below illustrates the problem. Another challenge of concolic testing in Python is that many Python applications use other languages or have compiled code embedded in an imported package. Consequently, these sections of code cannot be scrutinized by symbolic execution at the same level as regular Python code. The researchers dealt with this problem with the clever approach of combining symbolic and concrete execution, that is, concolic testing.
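For example, in the following hypothetical fragment (constructed for this discussion, not drawn from any cited study), the type of the same variable depends on which branch executes, so a type error hides on only one path:

    def parse(flag):
        value = "10"             # value is a str here
        if flag:
            value = int(value)   # value becomes an int on this branch only
        return value + 1         # succeeds when flag is True; raises TypeError when False

A symbolic execution engine for Python must therefore carry type constraints alongside value constraints in order to recognize that the path where flag is False ends in a defect.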
In a similar fashion, other researchers used concolic testing in symbolic execution for Python applications. Specifically, Canini, Venzano, Peresini, Kostic, and Rexford (2012) built NICE, a tool for automating the testing of OpenFlow applications coded in Python by applying model checking and concolic execution. OpenFlow is a programmable communication protocol that researchers used to develop and test experimental routing and switching protocols. In addition, they were able to demonstrate that NICE was five times faster than two other tools, Java Pathfinder and Spin. Moreover, NICE detected 11 new defects. The main disadvantage of this research is that working with OpenFlow applications is not yet mainstream. Hence, other researchers interested in software testing may overlook the insights offered by Canini and his colleagues.

In another study, researchers modified the Python interpreter and used it as a universal, automated, real-time embedded software tester (Jiang, Liu, Yin, & Liu, 2009). In their study, the researchers integrated the modified Python interpreter to simulate a test platform for the flight control and management system of an unmanned aerial vehicle. Their approach was able to conduct multiple test cases in under 25 milliseconds. One of the benefits of this study is that there is still room for performance improvement, because the execution speed of the interpreter can be increased with tools such as Psyco and Pyrex.

On the other hand, there have been noteworthy approaches to automated test case generation that are similar to symbolic execution, such as mutation testing. Derezińska and Hałas (2014) compared first-order and higher-order mutation testing of Python applications using the Astmonkey, Bitstring, Dictset, and MutPy tools. To explain further, in a first-order mutation, a single change is made to the source code by a mutation operator. In a higher-order mutation, multiple changes are made to the code by applying more than one mutation operator or the same mutation operator multiple times. Note that these changes do not occur to the actual source code, but to copies of the source code in the computer's memory; a sketch of a first-order mutation follows below. The researchers in this study examined the effects of structural mutation operators, object-oriented mutation operators, and Python-related mutation operators. Moreover, in order to quantify the effectiveness of the mutation testing tools, the authors calculated a mutation score based on the number of mutants killed by the tests, the number of mutants killed by timeout, the number of all generated mutants, the number of incompetent mutants, and the number of equivalent mutants. The results demonstrated that MutPy had the highest first-order mutation score, while Astmonkey had the highest second- and third-order mutation scores. A major challenge researchers faced in this study was dealing with incompetent mutants, which are mutants whose changes are syntactically and semantically correct but cause errors at runtime. This issue is common in mutation testing for dynamic languages such as Python. Their solution to incompetent mutants was simply to exclude them from the study altogether.
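The sketch below shows what a single first-order mutation looks like when performed on an in-memory copy of the code, in the spirit of the tools compared above (the choice of operator and all names are illustrative, not taken from MutPy):

    import ast

    class ArithmeticMutator(ast.NodeTransformer):
        # First-order mutation operator: replace the first '+' with '-'.
        def __init__(self):
            self.mutated = False

        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.op, ast.Add) and not self.mutated:
                node.op = ast.Sub()
                self.mutated = True
            return node

    source = "def total(a, b):\n    return a + b\n"
    tree = ast.fix_missing_locations(ArithmeticMutator().visit(ast.parse(source)))
    namespace = {}
    exec(compile(tree, '<mutant>', 'exec'), namespace)
    print(namespace['total'](2, 3))  # -1; a test asserting total(2, 3) == 5 kills this mutant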
More automated test case generation tools have been developed for Python programs. Chen, Zhang, Chen, Yang, and Bai (2014) developed ConPy, a concolic execution tool for Python programs. ConPy classified variables as either symbolic, which are pre-selected by the user, or non-symbolic. During execution, ConPy ignored the non-symbolic variables and only created constraints for the symbolic variables. Consequently, this resulted in fewer constraints in comparison to non-concolic execution.

Developed for Multiple Languages

Sakamoto, Shimojo, Takasawa, Washizaki, and Fukazawa (2013) proposed the Open Code Coverage Framework (OCCF), a single tool capable of determining test coverage for procedural languages such as Python, C#, C, C++, Java, JavaScript, Lua, and Ruby. The significance of their work is that there is no single approach to quantifying the effectiveness of test suites. Furthermore, OCCF provided common code able to parse and analyze the abstract syntax tree of an application's source in order to measure test coverage regardless of the language; a toy analogue of such coverage measurement appears below. Specifically, the instrumentation component processed source code by generating an abstract syntax tree used to create a modified version of the production code. Subsequently, the testing framework executed the modified production code to generate coverage data. However, despite the benefits of OCCF, it does have a significant drawback. In particular, it required the use of the .Net framework and could only work with procedural languages. Hence, those requirements limit the range of future users who are likely to adopt OCCF.
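To make the notion of statement coverage concrete, the following minimal sketch records which lines of a function execute by using Python's tracing hook; it is a toy analogue built for this discussion, whereas OCCF instruments the abstract syntax tree instead:

    import sys

    def run_with_coverage(func, *args):
        # Record the line numbers of func that actually execute.
        covered = set()
        code = func.__code__

        def tracer(frame, event, arg):
            if event == 'line' and frame.f_code is code:
                covered.add(frame.f_lineno)
            return tracer

        sys.settrace(tracer)
        try:
            func(*args)
        finally:
            sys.settrace(None)
        return covered

    def absolute(n):
        if n < 0:
            n = -n
        return n

    print(run_with_coverage(absolute, 5))  # the 'n = -n' line is never reached

Dividing the number of executed statements by the number of reachable statements then yields the coverage percentage used throughout this study.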
In a similar fashion, Rosu (2013) designed the K framework, which allowed developers to conduct a wide variety of program analyses such as interpreting, parsing, symbolic execution, model checking, and semantic debugging. Even though the project was not yet mature, the K framework demonstrated promise because it was an open-source project with ample video documentation and was supported by 18 developers.

Another multilingual tool capable of symbolic execution was developed by Bucur, Kinder, and Candea (2014), who introduced CHEF, an open-source system for implementing the symbolic execution of interpreted languages such as Python, Ruby, and Lua. One of CHEF's strong points was its language-independent approach: it defined language semantics using the interpreter of the target language, which eliminated the possibility of having multiple dialects. CHEF worked by utilizing the target language's interpreter, the program to be analyzed, and a test specification to produce concrete test cases. Despite CHEF's flexibility, a downside was that it was not as fast as symbolic execution engines that target only one language, such as NICE (Bucur, Kinder, & Candea, 2014).

Constraint Solvers Applied

Constraint solvers are used to obtain test cases by solving the sets of constraints generated from paths traversed by the symbolic executor. There are a variety of constraint solvers currently used in practice for symbolic execution (Oe, Stump, Oliver, & Clancy, 2012; Cimatti, Griggio, Schaafsma, & Sebastiani, 2013; Biere, Knoop, Kovacs, & Zwirchmayr, 2013). Case in point, researchers collaborated to create MathSAT5, a satisfiability modulo theories (SMT) solver (Cimatti, Griggio, Schaafsma, & Sebastiani, 2013). MathSAT5 was the most recent advancement in a succession of MathSAT solvers. What made this version stand out in comparison to its predecessors was its increased efficiency, additional extensions, and capability to integrate with third-party software. This was important because the faster the SMT solver, the faster constraints are solved, which increases the overall speed of symbolic execution. Moreover, the researchers demonstrated an improvement by showing that MathSAT5, in comparison to MathSAT4, completed more COMP09 benchmarks in less time. One key advantage of MathSAT5 was its cross-platform compatibility: MathSAT5 was available on the Linux, MacOS, and Windows operating systems, and it has Java and Python API bindings. Unfortunately, due to license restrictions, MathSAT5 can only be used for non-commercial applications.

Another popular constraint solver used by researchers was Versat, a Boolean satisfiability (SAT) solver capable of clause learning, watched literals, optimized conflict analysis, non-chronological backtracking, and decision heuristics (Oe, Stump, Oliver, & Clancy, 2012). Versat stands out because it used low-level data structures such as mutable arrays for clauses and machine integers for literals. However, one major drawback is that Versat is still immature and needs more development to compete with mainstream SAT solvers such as PicoSat. Researchers conducted an experiment that showed Versat completed only 19 out of 50 benchmarks, in comparison to PicoSat, which solved 46, and MiniSat, which solved 47 (Oe, Stump, Oliver, & Clancy, 2012).

Developed in a similar fashion, Z3 was an SMT solver and theorem prover created by Microsoft that produced solutions based on Boolean-constraint inputs.
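To illustrate how such a solver is driven from a host language, the short example below uses Z3's Python bindings to solve the path constraints of the error path in the Cadar and Sen (2013) example revisited in Chapter Four; the specific constraints are chosen here only for illustration:

    from z3 import Ints, Solver, sat

    x, y = Ints('x y')
    s = Solver()
    s.add(2 * y == x, x > y + 10)  # path constraints for the error path
    if s.check() == sat:
        print(s.model())           # e.g., [y = 11, x = 22]

Any assignment the solver returns is a concrete test case that drives execution down the corresponding path.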
Recently, researchers extended Z3 to produce Z3-str, which differed because it was capable of treating strings as a primitive type (Zheng, Zhang, & Ganesh, 2013). The key advantage of this added functionality was that it permitted reasoning about program paths that manipulate strings, which could be used for both static and dynamic analysis. Additionally, Z3-str reached a larger audience in comparison to other string solvers because it was based on Z3, which has API bindings for Python, C++, C, and .Net.

Summary

This chapter discussed automated test case generation in terms of its exposition, vulnerability scanning, model-based testing, and symbolic execution. The exposition highlighted gaps in the research that will be addressed by this dissertation. The vulnerability scanning section brought up ramifications that occur due to common security flaws in web applications. Subsequently, the model-based testing section described how automated test case generation could be used to find defects in applications by generating a model of the GUI and executing actions that could cause errors. The majority of Chapter Two focused on various aspects of symbolic execution, such as the two types of symbolic execution, an overview of symbolic execution tools for multiple languages, and the steps that make up the symbolic execution process. Chapter Three will present the methodology used to conduct the experiments that determine whether symbolic execution of Python applications increases code coverage and/or discovers more defects.
References

Amalfitano, D., Fasolino, A., Carmine, S., Memon, A., & Tramontana, P. (2012). Using GUI ripping for automated testing of Android applications. Automated Software Engineering 2012, 258-261. doi: 10.1145/2351676.2351717

Amalfitano, D., Fasolino, A., & Tramontana, P. (2011). Automating GUI testing for Android applications. International Conference on Software Testing, Verification and Validation Workshops, 252-261. doi: 10.1109/ICSTW.2011.77

Anand, S., Naik, M., Harrold, M., & Yang, H. (2012). Automated concolic testing of smartphone apps. Foundations of Software Engineering 2012. doi: 10.1145/2393596.2393666

Anand, S., Pasareanu, C., & Visser, W. (2006). Symbolic execution with abstract subsumption checking. 13th International SPIN Workshop, 3925, 163-181. doi: 10.1007/11691617_10

Asthana, A., & Okumoto, K. (2012). Integrative software design for reliability: Beyond models and defect prediction. Bell Labs Technical Journal, 17(3), 37-59. doi: 10.1002/bltj

Biere, A., Knoop, J., Kovacs, L., & Zwirchmayr, J. (2013). SmacC: A retargetable symbolic execution engine. Automated Technology for Verification and Analysis, 482-486. doi: 10.1007/978-3-319-02444-8_40

Boyer, R., Elspas, B., & Levitt, K. (1975). SELECT—A formal system for testing and debugging programs by symbolic execution. ACM SIGPLAN Notices–International Conference on Reliable Software, 10(6), 234-245. doi: 10.1145/800027.808445

Bucur, S., Kinder, J., & Candea, G. (2014). Prototyping symbolic execution engines for interpreted languages. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems. doi: 10.1145/2541940.2541977

Bugrara, S., & Engler, D. (2013). Redundant state detection for dynamic symbolic execution. 2013 USENIX Annual Technical Conference, 199-211. Retrieved from http://www.stanford.edu/~suhabe/atc13-bugrara.pdf

Cadar, C., & Engler, D. (2014). Execution generated test cases: How to make systems code crash itself. Stanford University Computer Systems Laboratory. Retrieved from https://www.doc.ic.ac.uk/~cristic/papers/egt-spin-05.pdf

Cadar, C., & Sen, K. (2013). Symbolic execution for software testing: Three decades later. Communications of the ACM, 56(2), 82-90. doi: 10.1145/2408776.2408795

Cadar, C., Godefroid, P., Khurshid, S., Pasareanu, C., Sen, K., Tillmann, N., & Visser, W. (2011). Symbolic execution for software testing in practice—Preliminary assessment. International Conference on Software Engineering, 1066-1071. doi: 10.1145/1985793.1985995

Canini, M., Venzano, D., Peresini, P., Kostic, D., & Rexford, J. (2012). A NICE way to test OpenFlow applications. Presented as part of the 9th USENIX Symposium on Networked Systems, 127-140. Retrieved from https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final105.pdf

Chen, T., Zhang, X., Chen, R., Yang, B., & Bai, Y. (2014). ConPy: Concolic execution engine for Python applications. Algorithms and Architectures for Parallel Processing Lecture Notes in Computer Science, 150-163. doi: 10.1007/978-3-319-11194-0_12

Cimatti, A., Griggio, A., Schaafsma, B., & Sebastiani, R. (2013). The MathSAT5 SMT solver. Tools and Algorithms for the Construction and Analysis of Systems Lecture Notes in Computer Science, 7795, 93-107. Retrieved from https://es-static.fbk.eu/people/griggio/papers/tacas13.pdf

Clarke, L. (1976). A system to generate test data and symbolically execute programs. IEEE Transactions on Software Engineering, 2(3), 215-222. doi: 10.1109/TSE.1976.233817

Dallmeier, V., Burger, M., Orth, T., & Zeller, A. (2013). WebMate: A tool for testing web 2.0 applications. Software Quality. Increasing Value in Software and Systems Development, 133, 55-69. doi: 10.1007/978-3-642-35702-2_5

Derezińska, A., & Hałas, K. (2014). Experimental evaluation of mutation testing approaches to Python programs. 2014 IEEE International Conference on Software Testing, Verification, and Validation Workshops, 156-164. doi: 10.1109/ICSTW.2014.24

Esfahani, N., Kacem, T., Mirzaei, N., Malek, S., & Stavrou, A. (2013). A whitebox approach for automated security testing of Android applications on the cloud. Automation of Software Test, 22-28. doi: 10.1109/IWAST.2012.6228986

Garg, M., Lai, R., & Huang, S. (2011). When to stop testing: A study from the perspective of software reliability models. IET Software, 5(3), 263-273. doi: 10.1049/iet-sen.2010.0007

Geldenhuys, J., Aguirre, N., Frias, M., & Visser, W. (2013). Bounded lazy initialization. NASA Formal Methods: Lecture Notes in Computer Science, 7871, 229-243. doi: 10.1007/978-3-642-38088-4_16

HIPAA (2014). The reality of HIPAA violations and enforcement. Retrieved from http://www.hipaa.com/2013/10/the-reality-of-hipaa-violations-and-enforcement/

Howden, W. (1977). Symbolic testing and the DISSECT symbolic evaluation system. IEEE Transactions on Software Engineering, 3(4), 266-278. doi: 10.1109/TSE.1977.231144

Hu, C., & Neamtiu, I. (2011). Automating GUI testing for Android applications. International Workshop on Automation of Software Test, 77-83. doi: 10.1145/1982595.1982612

Jacky, J. (2011). PyModel: Model-based testing in Python. Proceedings of the 10th Python in Science Conference.

Jiang, C., Liu, B., Yin, Y., & Liu, C. (2009). Study on real-time test script in automated test equipment. Reliability Maintainability and Safety, 738-742. doi: 10.1109/ICRMS.2009.5270090

Karsoliya, S., Sinhal, A., & Kanungo, A. (2013). Combined architecture for early test case generation and test suite reduction. International Journal of Computer Science Issues, 10(1), 484-489.

Khurshid, S., Pasareanu, C., & Visser, W. (2003). Generalized symbolic execution for model checking and testing. TACAS '03 Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 553-568. Retrieved from http://cse.unl.edu/~dwyer/csce896/calendar/papers/khurshid03generalized.pdf

King, J. (1976). Symbolic execution and program testing. Communications of the ACM, 19(7), 385-394. doi: 10.1145/360248.360252

Kuzma, J. (2011). Web vulnerability study of online pharmacy sites. Informatics for Health and Social Care, 36(1), 20-34. doi: 10.3109/17538157.2010.520418

Larson, E., & Austin, T. (2003). High coverage detection of input-related security faults. Proceedings of the 12th USENIX Security Symposium, 12.

Le, W. (2013). Segmented symbolic analysis. Proceedings of the 2013 International Conference on Software Engineering, 212-221. Retrieved from http://phd.gccis.rit.edu/weile/docs/le_icse13.pdf

Lee, J., Kang, S., & Lee, D. (2012). Survey on software testing practices. IET Software, 6(3), 275-282. doi: 10.1049/iet-sen.2011.0066

Maevsky, D., Yaremchuk, S., & Shapa, L. (2014). A method of a priori software reliability evaluation. Reliability: Theory and Applications, 9(1), 64-72. Retrieved from http://gnedenkoforum.org/Journal/2014/012014/RTA_1_2014-02.pdf

McCamant, S., Payer, M., Caselden, D., Bazhanyuk, A., & Song, D. (2013). Transformation-aware symbolic execution for system test generation. EECS Department, University of California, Berkeley. Retrieved from http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-125.pdf

Mehlitz, P., Tkachuk, O., & Ujma, M. (2011). JPF-AWT: Model checking GUI applications. 2011 International Conference on Automated Software Engineering, 584-587. doi: 10.1109/ASE.2011.6100131

Mesbah, A., Deursen, A., & Lenselink, S. (2012). Crawling AJAX-based web applications through dynamic analysis of user interface state changes. Association for Computing Machinery Transactions on the Web, 6(1). doi: 10.1145/2109205.2109208

Mirzaei, N., Malek, S., Pasareanu, C., Esfahani, N., & Mahmood, R. (2012). Testing Android apps through symbolic execution. ACM SIGSOFT Software Engineering Notes, 37(6). Retrieved from http://dl.acm.org/citation.cfm?id=2382798

Oe, D., Stump, A., Oliver, C., & Clancy, K. (2012). Versat: A verified modern SAT solver. Verification, Model Checking, and Abstract Interpretation Lecture Notes in Computer Science, 7148, 363-378. doi: 10.1007/978-3-642-27940-9_24

Prasanna, M., Chandran, K., & Thiruvenkadam, K. (2011). Automatic test case generation for unified modelling language collaboration diagrams. IETE Journal of Research, 57(1), 77-81. doi: 10.4103/0377-2063.78373

Ramamoorthy, C., Ho, S., & Chen, W. (1976). On the automated generation of program test data. IEEE Transactions on Software Engineering, 2(4), 293-300. doi: 10.1109/TSE.1976.233835

Rosu, G. (2013). Specifying languages and verifying programs with K. SYNASC '13, IEEE/CPS. Retrieved from http://fsl.cs.illinois.edu/FSL/papers/2013/rosu-2013synasc/rosu-2013-synasc-public.pdf

Sakamoto, K., Shimojo, K., Takasawa, R., Washizaki, H., & Fukazawa, Y. (2013). OCCF: A framework for developing test coverage measurement tools supporting multiple programming languages. 2013 IEEE International Conference on Software Testing, Verification, and Validation, 422-430. doi: 10.1109/ICST.2013.59

Sapra, S., Minea, M., Chaki, S., Gurfinkel, A., & Clarke, E. (2013). Finding errors in Python programs using dynamic symbolic execution. Lecture Notes in Computer Science, 8254, 283-289. doi: 10.1007/978-3-642-41707-8_20

Sen, K., Kalasapur, S., Brutch, T., & Gibbs, S. (2013). Jalangi: A tool framework for concolic testing, selective record-replay, and dynamic analysis of JavaScript. Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 615-618. doi: 10.1145/2491411.2494598

Sen, K., Marinov, D., & Agha, G. (2005). CUTE: A concolic unit testing engine for C. Proceedings of the 10th European Software Engineering Conference, 263-272. doi: 10.1145/1081706.1081750

Siddiqui, J., & Khurshid, S. (2012). Staged symbolic execution. SAC '12 Proceedings of the 27th Annual ACM Symposium on Applied Computing, 1339-1346. doi: 10.1145/2245276.2231988

Takala, T., Katara, M., & Harty, J. (2011). Experiences of system-level model-based GUI testing of an Android application. Fourth IEEE International Conference on Software Testing, 377-386. doi: 10.1109/ICST.2011.11

Wilson, C. (2013). Portable game based instruction of American Sign Language (Master's thesis). Retrieved from ProQuest Dissertations and Theses. (UMI No. 1544218).

Yang, W., Prasad, M., & Xie, T. (2013). A grey-box approach for automated GUI-model generation of mobile applications. Fundamental Approaches to Software Engineering Lecture Notes in Computer Science, 7793, 250-265. doi: 10.1007/978-3-642-37057-1_19

Zheng, Y., Zhang, X., & Ganesh, V. (2013). Z3-str: A Z3-based string solver for web application analysis. ESEC/FSE 2013 Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 114-124. doi: 10.1145/2491411.2491456
CHAPTER THREE: METHODOLOGY

Introduction

Given that X is the number of unique defects discovered by test cases conceived by the test engineer and Y is the number of unique defects discovered by test cases generated through symbolic execution, this research aims to determine if Y > X. Moreover, let A represent the number of execution paths covered by test cases created by the test engineer and B represent the number of execution paths covered by test cases generated through symbolic execution. Then another aim of this study is to determine if B > A. This chapter will describe the methodology used to implement this study in terms of research rationale, setting, data collection, data analysis, issues of trustworthiness, limitations, and delimitations.

Rationale for Research Approach

In prior literature, code coverage was defined as the percentage of reachable program statements executed by a test case or a test suite (Amalfitano, Fasolino, Carmine, Memon, & Tramontana, 2012; Yang, Prasad, & Xie, 2013). Therefore, this dissertation adopted the same definition for code coverage. In regard to the performance of the symbolic execution engine, this dissertation used the method applied by previous researchers (Larson & Austin, 2003; Sen, Marinov, & Agha, 2005). To elaborate, this study compared the number of defects discovered with symbolic execution to the number of defects discovered without symbolic execution. Note that, even though execution times would typically be recorded, this study does not call into question the speed of the symbolic execution. Therefore, it is not necessary to measure symbolic execution times. However, in the future, it might be useful to improve the symbolic execution algorithm being used and repeat this study to ascertain any speed improvements.

Research Setting

Test cases were run on a machine with the Linux Mint 17 operating system, a 64-bit Intel i5 processor, and 6 GB of RAM. The version of Python used for this study was Python 2.7.6. Git was used as the version control system for the source code being studied. The source code repository that was analyzed has two branches: scrutinized and unscrutinized. The unscrutinized branch stores the application's source code before defects have been removed; the scrutinized branch stores the application's source code after defects have been removed.

Data Collection Methods

The application analyzed in this study was a Kivy mobile phone calculator app, which consisted of many modules. Out of all these modules, the calculation module was the workhorse of the app. Therefore, this study selected the unscrutinized branch of the calculation module for analysis to meet the first assumption expressed in Chapter One, which states that the program being analyzed must have some defects. In that way, the utility of the symbolic execution engine is demonstrated.

This study involved two groups. The control group consisted of three test suites with 400, 4,000, and 40,000 test cases, respectively. Each test case in the control test suites was created by randomly selecting an integer to represent each input variable.
Specifically, the "left" variable is a random integer between -1001 and 1000, the "right" variable is a random integer between -1001 and 1000, and the "op" variable is a random integer between 0 and 3. To clarify, "left" is the first number of a calculation; "op" is the operation, represented as an integer; and "right" is the second number of a calculation. This process, sketched below, emulates the actions taken by a test engineer while manually testing the calculator application's functionality.

The experiment group consisted of a single test suite, generated by Halfwaytree symbolically executing the unscrutinized branch of the calculation module. Subsequently, the unscrutinized branch of the calculation module was executed with the test suites from the control group and the experiment group. After the module was executed with the test suites, data on total time, inputs of death, and coverage were recorded. A similar approach was used to generate test cases in a related study (Amalfitano, Fasolino, Carmine, Memon, & Tramontana, 2012).
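As a sketch of the control-group generation just described (assuming the stated bounds are inclusive and that the integer encoding of "op" maps to the calculator's four operations):

    import random

    def random_test_case():
        left = random.randint(-1001, 1000)   # first operand
        right = random.randint(-1001, 1000)  # second operand
        op = random.randint(0, 3)            # 0-3 encodes the operation
        return left, op, right

    suite = [random_test_case() for _ in range(400)]  # smallest control suite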
Data Analysis Methods

With regard to code coverage, T was the total number of executable statements in the entire calculation module, N was the total number of unique statements executed by all the non-symbolic test cases, and M was the total number of unique statements executed by all the symbolic test cases. This study determined if M == T and if M > N. With regard to defects discovered, Y is the total number of unique defects discovered by symbolic execution-generated test cases and X is the total number of defects discovered by non-symbolic execution-generated test cases. This study determined if Y > X.

Issues of Trustworthiness

In terms of internal validity and reliability, the non-symbolic execution tests for the control group were repeated 20 times to remove the influence of variations. This was a necessity because the non-symbolic execution-based test cases were generated randomly. On the other hand, the symbolic execution test for the experiment group was conducted only once because it produced the same test cases regardless of how many times it was repeated. With regard to external validity, a similar approach was used to generate test cases for an Android application in a related study (Amalfitano, Fasolino, Carmine, Memon, & Tramontana, 2012). Additionally, the following three-step approach is used to demonstrate Halfwaytree's external validation. One, an independent source obtains Python code and its known inputs of death from symbolic execution publications. Two, the independent source uses Halfwaytree to analyze the Python code and discover the inputs of death. Three, the inputs of death obtained in step one are compared to the inputs of death obtained in step two.

Summary

Chapter Three discussed the aspects of the methodology in terms of the exposition, research rationale, research setting, data collection, and data analysis. Furthermore, issues of trustworthiness were addressed in terms of internal and external validity. For simplicity, this chapter referred to any common and measurable characteristic of the study as a research variable. Chapter Four will present the results obtained from applying the research methodology described in Chapter Three.

References

Amalfitano, D., Fasolino, A., Carmine, S., Memon, A., & Tramontana, P. (2012). Using GUI ripping for automated testing of Android applications. Automated Software Engineering 2012, 258-261. doi: 10.1145/2351676.2351717

Larson, E., & Austin, T. (2003). High coverage detection of input-related security faults. Proceedings of the 12th USENIX Security Symposium, 12.

Sen, K., Marinov, D., & Agha, G. (2005). CUTE: A concolic unit testing engine for C. Proceedings of the 10th European Software Engineering Conference, 263-272. doi: 10.1145/1081706.1081750

Yang, W., Prasad, M., & Xie, T. (2013). A grey-box approach for automated GUI-model generation of mobile applications. Fundamental Approaches to Software Engineering Lecture Notes in Computer Science, 7793, 250-265. doi: 10.1007/978-3-642-37057-1_19
CHAPTER FOUR: RESULTS

Introduction

The purpose of the study was to compare the number of unique defects discovered by randomly generated unit tests to the number of unique defects discovered by unit tests created with Halfwaytree. This research also compared the number of execution paths traversed by the unit tests created randomly to the number of execution paths traversed by the unit tests created with Halfwaytree. This chapter is arranged in terms of Halfwaytree validation, description of samples, survey of results, and a brief summary. In addition, the survey of results is organized in terms of the control and experimental groups.

Halfwaytree Validation

To test the effectiveness of Halfwaytree, two source code samples were selected from the symbolic execution literature and rewritten as Python equivalents (Cadar & Sen, 2013; Khurshid, Pasareanu, & Visser, 2003). Next, the Python equivalents of the source code were analyzed with Halfwaytree to create test cases for every feasible execution path. Subsequently, for each source code sample, the Python equivalent was executed with each test case to obtain the same execution paths indicated in the literature (Table 1).
Table 1

Side-by-Side Comparison of Source Code from Cadar and Sen (2013)

Source Code from Cadar and Sen:

    int twice(int v) {
        return 2*v;
    }

    void testme(int x, int y) {
        z = twice(y);
        if (z == x) {
            if (x > y+10)
                ERROR;
        }
    }

    int main() {
        x = sym_input();
        y = sym_input();
        testme(x, y);
        return 0;
    }

Python Equivalent of Source Code:

    x = 0
    y = 0
    z = 2*y
    if z == x:
        if x > y+10:
            assert False

Note. The Python equivalent code was simplified due to the limitations of Halfwaytree, but the execution tree of the original source code from Cadar and Sen is still represented by the Python equivalent.
Table 2

Side-by-Side Comparison of Execution Trees from Cadar and Sen (2013) and Halfwaytree

[The table presents two figures side by side: the execution tree from Cadar and Sen (2013) and the execution tree from Halfwaytree.]
As shown in Table 2, a major visual difference between the two execution trees being compared is that Cadar and Sen's execution tree has the True if-statement branches on the right-hand side and the False if-statement branches on the left-hand side. Therefore, the tree from Halfwaytree appears mirrored. Another difference between the two execution trees occurs at the if-statement where z==x. The execution tree from Cadar and Sen expresses the variable "z" in terms of its node state, which is "2*y," unlike Halfwaytree, which displays "z" as "z." Both execution trees have three paths, and one of the paths causes an error. Even though the two execution trees do not show the same test cases per execution path, the test cases correspond to each other. Specifically, as shown in Tables 3 and 4, [x=0, y=1] in Cadar and Sen corresponds to [x=1, y=0] in Halfwaytree; [x=2, y=1] in Cadar and Sen corresponds to [x=20, y=10] in Halfwaytree; and [x=30, y=15] in Cadar and Sen corresponds to [x=22, y=11] in Halfwaytree.

Table 3

Side-by-Side Comparison of Source Code from Khurshid, Pasareanu, and Visser (2003) and Its Python Equivalent

Source Code from Khurshid, Pasareanu, and Visser:

    int x, y;
    if (x > y) {
        x = x + y;
        y = x - y;
        x = x - y;
        if (x - y > 0)
            assert(false);
    }

Python Equivalent of Source Code:

    x = 0
    y = 0
    if x > y:
        x = x + y
        y = x - y
        x = x - y
        if x - y > 0:
            assert False
Table 4

Side-by-Side Comparison of Execution Trees from Khurshid, Pasareanu, and Visser (2003) and Halfwaytree

[The table presents two figures side by side: the execution tree from Khurshid, Pasareanu, and Visser (2003) and the execution tree obtained from Halfwaytree.]
In Table 4, the execution tree from Khurshid, Pasareanu, and Visser (2003) does not show the actual concrete test cases. Instead, it shows path constraints that can be solved to obtain feasible solutions at each node. Similar to the execution tree from Cadar and Sen, a major difference between the execution tree from Halfwaytree and the execution tree from Khurshid, Pasareanu, and Visser is that the latter shows the node state of symbolic variables. Both trees have three execution paths, and one of these paths is infeasible because the path constraints along that particular path have no solution. With regard to the relationship among execution paths and test cases, [x>y, y-x>0] in Khurshid, Pasareanu, and Visser corresponds to [path unsatisfiable] in Halfwaytree; [x>y, y-x