Data Science Challenges to Improve Quality Assurance of Internet of Things Applications

Harald Foidl, Michael Felderer
Institute for Computer Science, University of Innsbruck, Innsbruck, Austria
{herald.foidl,michael.felderer}@uibk.ac.at
Abstract. With the increasing importance and complexity of Internet of Things (IoT) applications, the development of adequate quality assurance techniques also becomes essential. Due to the massive amount of data generated in workflows of IoT applications, data science plays a key role in their quality assurance. In this paper, we present respective data science challenges to improve quality assurance of Internet of Things applications. Based on an informal literature review, we first outline quality assurance requirements evolving with the IoT, grouped into six categories (Environment, User, Compliance / Service Level Agreement, Organizational, Security and Data Management). Finally, we present data science challenges to improve the quality assurance of Internet of Things applications, sub-divided into four categories (Defect prevention, Defect analysis, User incorporation and Organizational) derived from the six quality assurance requirement categories.

Keywords: Software quality assurance; software testing; Internet of Things; data science; process mining; software quality engineering
1 Introduction
As the third phase of the Internet revolution, the Internet of Things (IoT) has attracted a lot of attention from practitioners, researchers and industries from all over the world [1, 2]. Compared to its predecessors, the World Wide Web (the 1990s) and the mobile Internet (the 2000s), the IoT aims to connect and link all physical entities and objects of the real world with the Internet and with each other [1, 3]. Thus, information and communication systems will be invisibly embedded in our environment and enable connectivity for anything and anyone at any time and any place with anything and anyone [1]. This opens tremendous opportunities for a variety of new and innovative applications [4] that promise a significant improvement of our everyday lives [5] (e.g. smart homes where smart fridges monitor the expiration date of food and reorder consumed food completely autonomously online). Besides the huge number of opportunities that come along with it, the emergence of the IoT also unveils several challenges in data privacy, safety and security [3]. In addition, the IoT requires the handling of a huge number of heterogeneous devices, from simple sensors and RFID tags to intelligent cars and fridges. This in turn leads to the challenge of dealing with the growing complexity caused by the IoT. Therefore, developers of the IoT
have to deal with a massive amount of data and a great number of different devices which cannot be envisioned by them [6]. This resulting new and challenging environment consequently increases the requirements for developing and delivering high-quality and error-free IoT software applications.

With the emergence of the IoT, especially the amount of available data literally explodes and accumulates at an unpredictable speed, finally bringing humanity into the big data era [7-9]. This poses new challenges to software quality assurance and requires radically new quality assurance methods and techniques. Hence, typical software quality assurance processes, which can be seen as “a set of activities and tasks that enable software suppliers to produce, collect, and validate evidence […] that the software product conforms to its established requirements” [10], must be adapted to address these new requirements. Marwah et al. [3] even claim that software quality assurance in the IoT can be seen as a new era of research.

However, researchers and practitioners have realized that analyzing and exploiting data and information provides a great opportunity to gain significant benefits in different application scenarios [7]. Therefore, data science is currently gaining a lot of attention and becoming an important topic of investigation [8, 11]. Provost and Fawcett [12] describe data science at a high level as “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data”. The application of data science principles in the field of software engineering has already emerged as a successful research direction with a lot of contributions (i.e. mining software repositories, mining software engineering data, software analytics) [13-15]. Bird et al. [16] even state that there is currently an unsustainable “bubble” in data science for software engineering. Nevertheless, with the advent of the IoT and the associated tremendous increase of available data, there arises the promising opportunity to apply data science techniques for assuring the challenging quality requirements of IoT applications. Hence, this paper first presents software quality assurance requirements of the IoT based on an informal literature review. Afterwards, data science challenges related to improving the quality assurance of IoT applications are illustrated.

The remainder of this paper is structured as follows. Section 2 briefly introduces the IoT and its main characteristics. Afterwards, Section 3 presents basic background information on data science. Section 4 then presents the upcoming requirements of the IoT compared to traditional software regarding software quality assurance. Based on these requirements, Section 5 outlines challenges and opportunities for how data science can be applied in quality assurance of the IoT. Finally, Section 6 concludes the paper and suggests possible future work in this area.
2 Internet of Things
As a promising area of future technology [2], the IoT is defined by various researchers and practitioners in different ways depending on the perspective taken [17]. Semantically, the term “Internet of Things” describes a world-wide network of seamlessly integrated and uniquely addressable objects and devices interacting via standard communication protocols [1, 4, 17]. Such “smart” objects and devices will communicate and interact completely wirelessly by using emerging technologies such as
real-time localization or near-field communication, and in synergy within a network of various different sensors and actuators [17]. The application domains and scenarios of the IoT are manifold, and so is the expected increase in the quality of our lives [17]. Typical application scenarios are the transportation and logistics domain (i.e. intelligent decisions on routing of products), the healthcare domain (i.e. personalized patient care) and smart cities, homes and factories (e.g. energy savings and property protection, Industry 4.0) [2, 4, 18]. These are by far not all the possible application domains, but they clearly demonstrate the three categories where the IoT can be applied: monitoring and control, big data and business analytics, and information sharing and collaboration [2]. Monitoring and control IoT applications allow the user to constantly track the performance or condition of devices, equipment or other things which generate data and are connected to the IoT network (e.g. tracking temperature and humidity of transported goods, tracking body temperature, heart rate, blood pressure). Therefore, decisions can be taken based on real-time data, future outcomes can be predicted and areas of potential improvement can be identified (i.e. predictive maintenance). By using business analytics, one is able to discover and resolve critical business issues based on real-time data generated by different embedded sensors and actuators of IoT machines and devices (e.g. changes in market conditions or customer behavior). Through information sharing and collaboration between all participants of the IoT (things as well as people), information delay can be avoided and situational awareness can be significantly enhanced (e.g. increasing customer engagement, increasing productivity in plants) [2, 4, 17-20].

Several technologies are used to implement and deploy the concept of the IoT [2, 18]. The most essential and widely used are radio-frequency identification (RFID), wireless sensor networks (WSN), middleware, cloud computing and IoT applications [2]. As the IoT is currently in its infancy, there is a lack of common underlying software to build IoT applications, and developers must build coherent IoT applications out of several unrelated software modules. Therefore, a big challenge for developers of IoT applications is to combine the numerous different user- and industry-specific IoT applications in order to build stable systems ensuring robust and reliable interactions and communication [2, 21].

To put it simply, the IoT is characterized by the facts that anything can communicate and interact with anything and anything is uniquely identified [22]. In more detail, Miorandi et al. [22], Zhang et al. [23] and Liu and Zhou [24] describe the main characteristics and features of the IoT as: devices heterogeneity, scalability and real-time capability, localization and tracking capabilities, self-organization capabilities, semantic interoperability and data management, embedded security and privacy-preserving mechanisms, reliability, and energy-optimized solutions.
3 Data Science
This section aims to give a brief overview of data science. First, Subsection 3.1 introduces the emerging field of data science. Subsection 3.2 then illustrates related work on the application of data science techniques to software engineering data.

3.1 Background
Driven by the large amount of abundantly available data, van der Aalst [25] states that we are currently seeing the birth of a new discipline called “data science”. Although the term “data science” was first used by Peter Naur in 1966 [26] as a suitable replacement for the term “computer science”, its common usage in academia and research did not begin before the 1990s [8, 27]. In accordance with the emergence of data science in research, several journals dealing with scientific data and its application were founded (e.g. Data Science Journal [28], Journal of Data Science [29]). Although several research contributions have dealt with the content, scope and topics of data science [8, 12, 30-32], a common definition of it is still missing or varies between researchers [8]. Waller and Fawcett [33] define data science as “the application of quantitative and qualitative methods to solve relevant problems and predict outcomes”. A more detailed definition was stated by Smith [27], who defines data science as “the study of the capture of data, their analysis, metadata, fast retrieval, archiving, exchange, mining to find unexpected knowledge and data relationships, visualization in two and three dimensions including movement, and management”. In this paper, we use what is in our opinion the most intuitive definition, given by Provost and Fawcett [12], who describe data science at a high level as “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data”. However, data science is multidisciplinary and draws from many fields of study [12, 34]. Van der Aalst [34] depicts data science as a broad discipline which combines different sub-disciplines, as Figure 1 illustrates.
Fig. 1. Data Science [34]
It is important to realize that data science is more than analytics, statistics and mining data. Besides knowledge about methods to visualize, analyze and interpret data, data scientists should also have practical knowledge about information
technology and programming to realize solutions. Further, data science is a discipline where intuition, common sense and creativity are needed. In addition, it is an essential capability of every data scientist to view business problems from a data perspective. Therefore, successful data scientists must have not only excellent analytical skills but also deep domain knowledge, knowledge about business models and the ability to communicate their message [12, 33, 34]. Basically, the application domains of data science are manifold (e.g. marketing, finance, manufacturing) [12]. In the course of this paper, we focus on data science and its application to software engineering. The next subsection describes related work on the application of data science in the field of software engineering.
3.2 Data Science in Software Engineering
In order to make daily decisions, software engineers and managers typically have to answer several questions (e.g. which parts of a software system to change and test, when to release a new feature, and who uses this feature) [15]. As the types and volume of software engineering data have grown at a vast rate over the past few years, manual investigation and browsing through these data has become impossible [35]. Therefore, practitioners and researchers need automated tools and methods to analyze this huge amount of data in order to extract valuable information from it [16, 35]. As a result, the application of data science methods to software projects has become common in research [16, 36]. How valuable and important the application of data science to software engineering is for software practitioners is shown by two surveys of Begel and Zimmermann [37]. They presented a catalog of 145 questions grouped into 12 categories that software engineers would like to ask data scientists about software projects, processes and practices. Hence, mining software engineering data (e.g. execution traces, bug databases, mailing lists, code bases, historical code changes) emerged as a successful research direction over time by providing valuable information about the status, progress and evolution of a software project [14]. Various published papers and studies underpin the successful application of data science and mining techniques in software engineering (e.g. Mockus et al. [38], Zimmermann et al. [39], Cheatham [40], Gorla et al. [41], or Chaturvedi et al. [42]). In conclusion, the use of data science techniques in the area of software engineering has already resulted in a variety of successful applications and promises great potential given the steadily increasing amount of data generated in software development processes.
4 Software Quality Assurance Requirements of the IoT
Today, software applications and systems govern and permeate nearly all aspects of our daily life [43]. Resulting from this great dependence of today's society on software, there is a strong demand for software products of the highest quality [44]. Therefore, assuring the quality of software products is an essential part of current software development practices. In general, quality assurance is “a planned and systematic pattern of all actions necessary to provide adequate confidence that an item or product conforms to
established technical requirements” [45]. Hence, “quality assurance includes all techniques we use to build products in a way to increase our confidence as well as techniques to analyze products” [46]. As an essential quality assurance technique for modern software-intensive systems, software testing aims to improve the quality of software products by finding and correcting defects. Therefore, we treat software testing as an essential main part of software quality assurance and especially focus on it in the remainder of this paper.

With the emergence of the IoT and its characteristics, the environment in which developers and software engineers develop and deliver high-quality and error-free IoT software applications changes radically. Hence, software quality assurance and software testing activities must meet these new requirements and must be adapted to be compliant with the new and rapidly changing environment caused by the IoT. As already stated, Marwah et al. [3] even claim that software quality assurance in the IoT can be seen as a new era of research.

This section outlines the requirements related to software quality assurance and software testing activities which are evolving with the IoT. Therefore, the main differences to traditional software quality assurance and testing are described. The presented requirements are based on an informal literature review considering grey as well as academic literature [47]. Google as a web search engine and renowned databases such as JSTOR, EBSCO, Emerald Insight, Elsevier ScienceDirect and AISeL were used for reviewing the literature (only abstracts and titles were searched to limit the number of hits). The following keywords were used: “testing“, “internet of things“, “quality assurance“. The word order of the keywords was changed (e.g. “internet of things“ “testing“) and slightly modified (e.g. “assuring quality“ “IoT“) in order to increase the number of hits. Since the IoT is still in its infancy, not much academic literature on quality assurance and testing of the IoT was available. Hence, we focused on grey literature including whitepapers, articles, presentations, blog entries and websites.

We grouped the identified requirements based on the identified literature into six categories, namely Environment (E), User (U), Compliance / Service Level Agreements (C/SLA), Organizational (O), Security (S) and Data Management (DM) requirements. Whereas most requirement categories are self-explanatory, we briefly describe the Environment and Organizational requirement categories. The Environment category describes the surrounding field in which IoT devices must operate. All managerial and organizational activities (e.g. planning, communication, social interaction, monitoring, controlling, or project management) of the software development life cycle are summarized under the requirement category Organizational. Table 1 shows the result of the literature review and the assigned requirement categories. We assigned a requirement category to a contribution if the contribution at least mentioned or dealt with requirements within the defined category.

Environment. With the emergence of the IoT, the environment in which quality assurance and testing activities are executed radically changes. As classical software quality assurance and testing activities are typically applied in defined environments (e.g. a
personal computer with a defined operating system and a network connection), testing and assuring the quality of IoT applications and solutions demands a shift to a complex dynamic environment with millions of sensors, actuators and different types
of devices in conjunction with intelligent software engines. Due to this great number of different devices and components in the IoT, many of them will not be available when IoT applications are developed and tested.

Whitepapers (Title | Company / Institution | Author | Year | Category)
The Internet of Things: QA Unleashed [48] | Cognizant | Muthiah Subbiah, Venkatasubramanian Ramakrishnan | 2015 | E, U, C/SLA, O, S
Testing for Internet of Things (IOT) [49] | TechArcis Solutions | n/a | n/a | E, U, S, DM
Testing the Internet of Things [50] | Polarion Software | n/a | 2015 | E, S
The Importance of Quality Assurance Testing for the Internet of Things [51] | Ayla Networks | n/a | 2016 | E, U, C/SLA, S
Keeping up with the Internet of Things (IoT): 10 Hints on Testing and Optimizing a Connected World [52] | Testbirds | n/a | n/a | E, U, S
Internet of Everything – Test Strategy [53] | Gerrard Consulting | Gerrard Paul | 2015 | E, U, DM

Presentations (Title | Event | Author | Year | Category)
How to test the Internet of Things [54] | German Testing Night Munich | Böger, Henning | 2013 | E, U
Testing the Internet of Things [55] | Presentation TMF | n/a | n/a | E, U, S

Articles (Title | Journal | Author | Year | Category)
Testing the Internet of Things [56] | Printed Circuit Design and Fab | Lau Mark | 2014 | E

Websites / Blogs (Title | Website / Blog | Author | Year | Category)
Functional Testing for IoT [57] | DevOps | Riley Chris | 2015 | E, O, S
How do we test the Internet of Things? [58] | LeanTesting | Hill Simon | 2015 | E, U, S
Performance Testing 101: How to Approach the Internet of Things [59] | Neotys | Rexed Henrik | 2015 | E, U, C/SLA
How To Cut Verification Costs For IoT [60] | Semiconductor Engineering | Bailey Brian | 2014 | E
Testing the Internet of Things [61] | SmartBear | Rohrman Justin | 2015 | E, U, S
Testing the Internet of Things? It’s time to plan your test strategy [62] | IoT-Now | Lanka Venkata Ramana | 2015 | E, C/SLA, S, DM
Security Testing the Internet of Things IoT [63] | Beyond Security | n/a | n/a | E, S
The Function of Quality Assurance (QA) with the Internet of Things [64] | CenturyLink Cloud | Townsend Jonathan | 2016 | E, U, C/SLA, S
The testing challenges ahead for the Internet of Things [65] | embedded | Hammerschmidt Christoph | 2014 | E
Testing Strategy for the IoT [66] | LogiGear | Hagar Jon | 2014 | E, C/SLA, O, S
Internet-of-Things [67] | TestPlant | n/a | 2016 | E, U

Academic Literature (Title | Journal | Author | Year | Category)
Software Quality Assurance in Internet of Things [3] | Int. Journal of Computer Applications | Marwah, Mateen Qudsia, Sirshar Mehreen | 2015 | E, S

Tab. 1. Literature Review
In order to develop IoT applications, developers must understand and know the architecture of the third party hardware and devices used. Often, vendors of third party subcomponents restrict access to their components and devices, which increases the effort of development and also of quality assurance activities. Therefore, an authentic replication of such a dynamic and fully connected environment (e.g. smart cars, smart cities) is virtually impossible and classical integration testing is not feasible anymore. In addition, IoT devices must operate correctly and reliably under different physical conditions such as extreme heat, cold, rain, snow or humidity, which are also costly and complex to simulate.

Quality assurance and testing of IoT applications and solutions requires testing the ability of devices to support the required functionality among other external devices and implementations. IoT devices must be able to deal with different messaging protocols, operating systems, software versions, types of networks and communication channels. Due to the fact that IoT devices are movable, they move in and out of different networks. This requires testing the functionality in different networks including all possible circumstances (e.g. bandwidth, dropped connections, lost packets). IoT applications must ensure that data is stored in case of a dropped connection or loss of power. The energy consumption of some IoT devices can be critical (for instance, for wearable devices). Therefore, quality assurance activities must also consider testing different power modes to ensure the device works properly in each power mode. Further, many smart devices in the IoT have integrated self-healing abilities which recover the device’s state in case of failures. The purposeful creation of such failures in order to evaluate a device’s ability to recover itself properly is an upcoming challenge in testing and assuring the quality of smart IoT devices.

Since IoT devices typically interact with clouds and mobile apps, quality assurance and testing of IoT applications does not mean focusing only on the devices themselves. Cloud services must be able to deal with increasing user demand (e.g. at Christmas, typically many new devices will go online) and user requests (e.g. on New Year’s Eve, typically many people interact with their devices in order to send new year greetings). Therefore, scalability and reliability testing of services which are used by IoT applications and offered by cloud computing is essential.

As a lot of IoT applications operate and deliver data in real time, quality assurance and testing activities should also incorporate evaluating the real-time behavior of IoT applications. In general, testing real-time ability is challenging because of the variety of possible real-time use case scenarios and the complexity which emerges in combination with the application’s intelligence. The real-time replication, simulation and mimicking of data generated by real-world business processes is difficult (e.g. for stock exchange business processes). Further, testing and validating all analytical real-time rules of a business process (e.g. which drug dose to administer in relation to health parameters) can result in a massive effort due to the huge amount of different application scenarios of IoT applications. Nevertheless, testing real-time ability, although challenging, is important for IoT applications.
Therefore, logging to obtain time-stamped events is a very important prerequisite to evaluate and assess the real-time ability of IoT devices.

With the deployment of IoT devices, the feature to update and upgrade device software and firmware over the network (over-the-air) [68] will become common. Therefore, over-the-air testing will be needed in order to ensure that this feature works properly. Automatic regression testing routines can ensure that each device
works correctly after each upgrade. In case of failures, remotely initiated downgrading to earlier versions must be possible and remote debugging procedures should be considered in the architecture of IoT applications.

User. As already stated, classical integration testing is not feasible anymore in the new emerging IoT environment. IoT devices can communicate and interact with various other devices such that a lot of bugs and failures will only emerge when the IoT device is used in the real world. Therefore, quality assurance and testing of IoT applications must incorporate user behavior and experience. Testers and quality assurance personnel should mimic user behavior and test “as a user” to identify and correct as many failures as possible before the product is released. Based on the huge amount of different application scenarios of IoT products, testing and quality assurance activities should focus on evaluating functions and features that are important for the user. Nevertheless, user behavior varies strongly between different concurrent users, and IoT applications offer so many possible ways to use them that sufficient quality assurance and testing is not feasible without incorporating users. Therefore, quality assurance and testing of IoT applications and devices should include field testing executed by different user groups. Hence, data on how users use the product can be collected and tracked in order to use it for assuring quality and testing. The user experience must be considered in order to guarantee a high-quality and error-free product as well as to satisfy user requirements. Along with the integration of the user in the quality assurance and testing procedure, a new demand for remote and immediate user support arises. Failures of IoT devices must be notified immediately, and remote debugging and error correction methods must be available. Otherwise, user satisfaction will decrease dramatically.

Organizational. Assuring the quality of IoT deployments incorporates devices, network infrastructure and cloud services, which will require different teams or persons testing different parts of the IoT deployment. The coordination of all involved people, exchanging information and data, is a new challenge which must be considered in the quality assurance of the IoT. Moreover, testing IoT devices and applications includes a lot of different components of different vendors. In case of failures or questions about unexpected behavior of these components, fast and efficient communication with third party vendors is essential to meet the strict quality assurance time plan.

Compliance / Service Level Agreements. IoT solutions can be used at different geographic locations in different countries. Therefore, large scale releases of IoT devices and solutions must meet the governmental standards and regulations of each country. Assuring that the IoT deployment is compliant with these standards and regulations is an important activity in the quality assurance of the IoT. In addition, vendors of third party components that are integrated in IoT solutions typically validate parameters of their components (e.g. performance) in defined environments. Assuring that these parameters are also met in new environments and in conjunction with other subcomponents of IoT devices must be done while testing the IoT application. This is essential because IoT devices must be compliant with service level agreements in order to meet user requirements.
Moreover, a failure in one component of an IoT deployment can cause a ripple effect that leads to wrong behavior of another IoT device, which results in an
undesired outcome. In such a case it must be clearly defined who is responsible for the consequences.

Security. Due to the distributed nature of IoT devices communicating and interacting in a worldwide network, several security issues arise. Quality assurance activities must ensure that sufficient authentication methods are integrated in every IoT device. The transmitted data must be properly protected and data storage in web clouds must fulfill stringent requirements. Different data privacy regulations, depending on the business sector in which the IoT applications are used, must be considered and evaluated while testing IoT applications. Moreover, a lot of devices will be upgraded with Internet and network connections to enter the demanding IoT market. The hardware used in these devices is therefore often rather old and can expose various vulnerabilities to hackers as soon as it is connected to the Internet. As a result, quality assurance activities must also consider already existing hardware and software which were not designed to be connected to the World Wide Web.

Data Management. With the advent of the IoT, the amount of data generated and transmitted will explode. IoT devices will create vast amounts of varying data in differing formats and quality. As a result, quality assurance activities must deal with a diversity of data sources producing several different data types and complex data structures in real time. The management of this data will play a key role in ensuring feasible and efficient quality assurance of IoT applications (e.g. regarding the usage of recorded data from different types of devices and environments for validating IoT applications).

In summary, the advent of the IoT causes several new requirements for the domain of software quality assurance and software testing. In order to meet these new requirements, quality assurance and testing activities must be adapted accordingly. The next section presents data science challenges to address these new emerging requirements.
5 Data Science Challenges
In this section we present challenges, opportunities and application scenarios of data science related to software quality assurance and software testing activities of the IoT. As stated in Section 3, the field of data science is very broad and includes many different sub-disciplines. Each of these disciplines is in itself very comprehensive, incorporating dozens of algorithms, methods and techniques. Moreover, a lot of algorithms and methods are used interchangeably in several disciplines [69, 70]. In order to avoid confusion and ambiguity, the challenges, opportunities and application scenarios presented in this section focus especially on the sub-disciplines data mining, process mining, statistics, machine learning and visualization. Concretely, we aim to motivate the application and usage of algorithms and techniques (e.g. classification, characterization, association and clustering algorithms) as well as methods (e.g. predictive modeling, finding patterns and relations) of these disciplines to support the quality assurance of IoT solutions and deployments. As the IoT will generate a massive amount of data, we see promising potential for innovative applications.
In the following, we present data science challenges grouped into four categories (Defect prevention, Defect analysis, User incorporation and Organizational) for which we see promising opportunities to apply data science techniques and methods in order to improve quality assurance for IoT applications. These four categories are derived from the six requirement categories of the previous section, as shown in Figure 2 by the dashed framed rectangles. Due to the large amount of data which becomes available with the emergence of the IoT, the quality assurance requirement category Data Management influences all four data science challenge categories. The IoT with its huge amount of generated and available data is illustrated as an ellipse in Figure 2. The six software quality assurance requirements of the IoT are illustrated by rounded rectangles. Further, the field of data science with its methods, techniques and algorithms is outlined by the rectangle with the two cut corners. The four remaining rectangles with cut corners represent the four categories which contain the data science challenges. Moreover, the double arrows outline, on the one hand, the derivation of the four data science challenge categories from the six software quality assurance requirements and, on the other hand, which data science challenges can support which quality assurance requirements of IoT solutions and deployments.
Fig. 2. Derived Data Science Challenges
The challenges, opportunities and application scenarios of each of the four categories presented in the following subsections are, if not otherwise stated, inspired by the work of Kim et al. [36], Menzies and Zimmermann [35] as well as Buse and Zimmermann [71]. Further, we underpin the mentioned challenges with related work and example applications presented in published academic contributions. In order to avoid misunderstandings, we use the term defect in the following according to Wagner [46] as a superset of faults and failures.
5.1 Defect prevention
Preventing defects in IoT applications and solutions is an important and essential quality assurance task. Defect prevention aims to identify defects and unexpected behavior of the software as early as possible and to ensure that these defects and anomalies do not occur again. Applied at each stage of development, it can reduce overheads, costs, resources and the time needed for building IoT applications with a minimum of defects [72, 73]. In the following, we present some data science opportunities to improve the defect prevention of IoT applications and solutions.

One example of preventing defects in IoT applications and solutions is to closely monitor their behavior. By applying data science techniques to monitor applications and to mine runtime data, valuable information can be gained. For example, Han et al. [74] proposed an approach that mines call stack traces to effectively discover performance bugs. Further, Jiang et al. [75] presented an approach that automatically analyzes execution logs of a load test and flags possible performance problems. Hindle [76] investigated how software changes impact software power consumption by mining software repositories. Moreover, Rubin et al. [77] suggest the usage of process mining techniques for application monitoring, predicting system failures and discovering architectural anti-patterns. As one sub-discipline of data science [34], process mining is typically applied to extract knowledge from event logs of information systems in order to discover, monitor and improve real processes [78, 79]. Hence, process mining “connects process models and data analytics” [25]. According to van der Aalst [78], there are already a lot of mature process mining techniques available which can be directly used in everyday practice. Another application scenario of data science techniques for preventing defects is the injection of telemetry to detect anomalies. Dynamic program analysis can also be enhanced by applying methods from data science. Shershakov and Rubin [80] presented several examples of how process mining techniques can be successfully applied for system runtime analysis. Moreover, determining defect predictors is a promising way to apply data science techniques (i.e. data miners [81]) for preventing defects in IoT applications. For example, Ostrand et al. [82] developed a negative binomial regression model to predict the number of faults for a large inventory system by using information from previous releases.

Using simulation for testing software products is not new [83]. With the large amount of available data in the IoT, the application of data science techniques provides promising opportunities to simulate and mimic the complex and distributed IoT environment. This becomes rather important due to the variety and the sheer number of IoT devices. Mimicking sensor data, middleware functionality and communication is an essential task of future quality assurance of the IoT and demands strong support by data science techniques.
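To make the idea of determining defect predictors more concrete, the following Python sketch trains a simple classifier on per-module static code metrics of previous releases and ranks the modules of a new release by predicted defect risk. It is only a minimal illustration in the spirit of mining static code attributes [81]; the metric names, the file layout and the choice of a logistic regression classifier are our own illustrative assumptions and not taken from the cited works.

```python
# Minimal sketch of a defect predictor (file layout, metric names and
# classifier choice are illustrative assumptions).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical input: one row per software module with static code metrics from
# previous releases and a label indicating whether a defect was later reported.
data = pd.read_csv("module_metrics.csv")  # assumed columns, see below
features = ["loc", "cyclomatic_complexity", "num_changes", "num_authors"]
X, y = data[features], data["defective"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# A simple linear classifier serves as the defect predictor.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate how well past defects predict future ones; in practice the split
# should follow release boundaries rather than being random.
print(classification_report(y_test, model.predict(X_test)))

# Rank modules by predicted defect probability so that quality assurance effort
# can be focused on the riskiest parts of the IoT application.
data["defect_risk"] = model.predict_proba(X)[:, 1]
print(data.sort_values("defect_risk", ascending=False)[["module", "defect_risk"]].head())
```

Such a predictor does not replace testing; it merely helps to prioritize quality assurance effort across the large number of components of an IoT deployment.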
5.2 Defect analysis
Analyzing defects is a very useful and vital task to assure and continuously improve the quality of IoT applications and solutions [73, 84]. Basically, defect analysis seeks to identify possible causes of defects in order to eliminate them [73]. However, according to Kumaresh and Baskaran [73], the analysis of defects is limited and restricted
by human investigation capabilities. Hence, data science provides promising potential for improvement. Data science techniques can be used to detect root causes of defects. Hence, the exact localization of faults in the code can be determined. For example, Wong and Qi [85] successfully applied a machine learning model (a back-propagation neural network) to precisely localize faults in software programs. Also Kannadhasan and Maheswari [86] successfully applied a machine learning algorithm to classify fault and non-fault statements in object-oriented applications. Further, efficient bug prioritization and bug reproduction can be realized by applying data science methods. In addition, the application of data science techniques generates richer information about defects and enables the generation of more detailed bug reports. The dependencies of different software modules, components as well as third party software can also be illustrated in an effective way by applying data science techniques.
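As an illustration of machine-learning-based fault localization, the following sketch loosely follows the idea behind the back-propagation approach of [85]: a small neural network learns the relation between the statement coverage of a test case and its pass/fail outcome, and "virtual" test cases that cover exactly one statement each are then used to obtain a suspiciousness score per statement. The coverage matrix, the network configuration and the use of scikit-learn are illustrative assumptions on our part.

```python
# Minimal sketch of neural-network-based fault localization (illustrative data).
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical coverage matrix: rows are test cases, columns are statements
# (1 = statement executed by the test); outcome: 1 = test failed, 0 = passed.
coverage = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
])
outcome = np.array([1, 0, 1, 1, 0])

# Train a small feed-forward network (trained by back-propagation) that maps
# coverage vectors to the observed test outcome.
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(coverage, outcome)

# Virtual test cases: each covers exactly one statement; the network output is
# interpreted as the suspiciousness of that statement.
virtual_tests = np.eye(coverage.shape[1])
suspiciousness = net.predict(virtual_tests)

# Report statements ordered from most to least suspicious.
for stmt in np.argsort(-suspiciousness):
    print(f"statement {stmt}: suspiciousness {suspiciousness[stmt]:.2f}")
```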
5.3 User Incorporation
Academia and practice have already recognized that strong user involvement in software development is very important to meet user requirements and expectations. Hence, ISO/IEC 25010:2011 [87] defines, besides the software product quality model which describes a software product's quality, also the quality-in-use model which represents the perspective of a user interacting with the software product [46]. As outlined in the previous section, classical integration testing is no longer feasible in the IoT area. Hence, strong user involvement and incorporation is essential in assuring the quality of the IoT. Users who test IoT applications generate a lot of data which can be used to extract valuable information about different attributes (e.g. performance or user habits). For example, Cao et al. [88] proposed an approach that mines behavior patterns of mobile users by analyzing context logs. Moreover, Rubin et al. [79] presented how process mining techniques can be used to analyze and predict user behavior as well as to discover usability anti-patterns [77]. A quite different, but also promising approach was presented by Gruska et al. [89]. They mined more than 6,000 open source Linux projects in order to extract rules which reflect normal interface usage. By using these usage rules they were able to efficiently determine anomalies (i.e. code smells, defects) in new software projects. Applying data science techniques to user data of IoT applications promises great potential to increase user satisfaction. Moreover, it is an effective means to address the new requirements evolving with the IoT, where a classical integration testing approach is no longer feasible and testers should “test as a user” in order to better understand the final customers.
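To illustrate how usage data collected from field testing could be exploited, the following sketch counts frequent short action sequences per user from an IoT usage log, in the spirit of mining user behavior patterns [79, 88]. The log format and the event values are illustrative assumptions; real deployments would work on recorded context or event logs of the IoT application.

```python
# Minimal sketch of mining user behavior patterns from IoT usage logs
# (the log format and events are illustrative assumptions).
from collections import Counter, defaultdict

# Hypothetical, time-ordered usage events: (user_id, action).
events = [
    ("u1", "open_app"), ("u1", "check_temperature"), ("u1", "set_schedule"),
    ("u2", "open_app"), ("u2", "check_temperature"), ("u2", "close_app"),
    ("u1", "open_app"), ("u1", "check_temperature"), ("u1", "set_schedule"),
]

# Collect the action sequence of each user.
sequences = defaultdict(list)
for user, action in events:
    sequences[user].append(action)

# Count pairs of consecutive actions (bigrams) across all users.
bigrams = Counter()
for actions in sequences.values():
    bigrams.update(zip(actions, actions[1:]))

# The most frequent patterns indicate the workflows users actually follow and
# therefore the scenarios that field and regression tests should cover first;
# rare patterns point to features that need explicit test attention.
for (first, second), count in bigrams.most_common(3):
    print(f"{first} -> {second}: {count} times")
```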
5.4 Organizational
As stated earlier, we use the term “organizational” in this paper to summarize all managerial and organizational activities (e.g. planning, communication, social interaction, monitoring, controlling, or project management) of the software development life cycle. Mining and analyzing software development processes can provide significant and valuable insights into how organizations execute software projects [90]. Rubin et al. [77] even mention software process mining as a new area which opens numerous challenges and research directions.
One interesting approach to how data science techniques can be applied was proposed by Bacchelli et al. [91]. They presented an approach which exploits valuable information (i.e. design choices) embedded in emails related to software development. Additional work on this was done by Bird et al. [92], who mined the email social network of an open source software project. This is very valuable in order to understand and uncover misunderstandings related to user requirements between software engineers and therefore to improve the quality of the developed application. Moreover, the variety and amount of different devices and components used while developing IoT solutions offers great opportunities to measure and monitor the progress of a software project on the basis of each involved software engineer. This could further be used to determine exactly which engineers are the most suitable choice for specific development projects that demand special technical skills. According to Begel and Zimmermann [37], software engineers are very interested in comparing each other as well as assessing their individual performance. The emergence of the IoT causes organizations to tailor their software quality assurance processes in order to meet the new evolving requirements. Applying data science techniques promises great potential to enhance the organizational perspective of software development related to quality assurance of the IoT.
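As a small illustration of how such organizational data could be analyzed, the following sketch classifies development e-mails into discussion categories with a standard text classification pipeline, in the spirit of the content classification approach of [91]. The example messages, the category labels and the choice of a naive Bayes classifier are illustrative assumptions and not taken from the cited work.

```python
# Minimal sketch of classifying development e-mails (illustrative data/labels).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: e-mail bodies with manually assigned categories.
emails = [
    "I think we should switch the gateway protocol to MQTT for battery reasons",
    "The build fails on the CI server since the last sensor driver commit",
    "Can someone clarify what the customer expects when the connection drops?",
    "Proposal: keep the firmware update service stateless and idempotent",
]
labels = ["design", "build_issue", "requirements", "design"]

# TF-IDF features combined with a naive Bayes classifier.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(emails, labels)

# Classify a new message from the project mailing list.
new_mail = ["Which team owns the requirement for offline data buffering?"]
print(classifier.predict(new_mail))  # e.g. ['requirements']
```

Classified in this way, design decisions or requirement clarifications buried in mailing lists become retrievable for quality assurance purposes.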
6 Conclusion and Future work
With the advent of the IoT, the amount of available data literally explodes and accumulates at an unpredictable speed [7]. This poses promising opportunities to apply data science techniques for assuring the challenging quality requirements of IoT applications. Although the application of data science algorithms and techniques in the field of software testing is not new [93, 94], we expect sustainable improvement in this area caused by the large amount of data becoming available with the IoT. One of the main challenges will be to deal with the huge amount of data which becomes available with the emergence of the IoT. Applying the right techniques, algorithms and methods to this data to extract valuable knowledge and using this gained knowledge in proper application scenarios will be a further big upcoming challenge which opens promising new research directions. This is also in line with Menzies and Zimmermann [35], who state that “this is an exciting time for those of us involved in data science and software analytics”.

In this paper, we presented data science challenges to improve the quality assurance of IoT applications. Therefore, we first described the main characteristics of the IoT. Afterwards, we provided a brief introduction to data science and its application in the software engineering domain. Based on an informal literature review, we further outlined requirements, grouped into six categories (Environment, User, Compliance / Service Level Agreements, Organizational, Security and Data Management), which are evolving with the emergence of the IoT related to software quality assurance. Finally, we presented data science challenges related to quality assurance of the IoT, sub-divided into four categories derived from the six quality assurance requirement categories. Namely, we see promising potential for data science applications in the following four categories: Defect prevention, Defect analysis, User incorporation and Organizational.
The challenges outlined in this paper motivate the following future work. First, existing algorithms and methods of data science and its sub-disciplines must be investigated in order to determine their meaningful application to the stated challenges. After selecting possible algorithms and methods, their specific application must be considered. Secondly, their application must be roughly evaluated and their successful usage must be documented. Currently, the implementation of IoT solutions lacks generic methodologies, tool support and automation. Therefore, the quality assurance of such solutions is also done without structure and tool support. Hence, methodologies, frameworks and tool support should be developed incorporating the selected and evaluated algorithms and methods. This would contribute to improving the quality assurance of IoT deployments. Further possible future work comprises the development and evaluation of new algorithms and methods as well as their innovative application to address the stated challenges.
References 1. Santucci, G.: The Internet of Things: Between the Revolution of the Internet and the Metamorphosis of Objects In: Sundmaeker, H., Guillemin, P., Friess, P., Woelfflé, S. (eds.) Vision and Challenges for Realising the Internet of Things pp. 11-24. CERP-IoT – Cluster of European Research Projects on the Internet of Things, Luxembourg (2010) 2. Lee, I., Lee, K.: The Internet of Things (IoT): Applications, investments, and challenges for enterprises. Business Horizons 58, 431-440 (2015) 3. Marwah, Mateen, Q., Sirshar, M.: Software Quality Assurance in Internet of Things. International Journal of Computer Applications 109, 16-24 (2015) 4. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems 29, 1645-1660 (2013) 5. Xia, F., Yang, L.T., Wang, L., Vinel, A.: Internet of Things. International Journal of Communication Systems 25, 1101-1102 (2012) 6. Prasad, N.R., Eisenhauer, M., Ahlsén, M., Badii, A., Brinkmann, A., Hansen, K.M., Rosengren, P.: Open Source Middleware for Networked Embedded Systems towards Future Internet of Things In: Sundmaeker, H., Guillemin, P., Friess, P., Woelfflé, S. (eds.) Vision and Challenges for Realising the Internet of Things pp. 153-163. CERP-IoT – Cluster of European Research Projects on the Internet of Things, Luxembourg (2010) 7. Cai, L., Zhu, Y.: The Challenges of Data Quality and Data Quality Assessment in the Big Data Era. Data Science Journal 14, 1-10 (2015) 8. Zhu, Y., Xiong, Y.: Towards Data Science. Data Science Journal 14, 1-7 (2015) 9. Katal, A., Wazid, M., Goudar, R.H.: Big Data: Issues, Challenges, Tools and Good Practices. In: Sixth International Conference on Contemporary Computing (IC3), 2013, pp. 404-409. IEEE, (2013) 10. IEEE: IEEE Standard for Software Quality Assurance Processes vol. 730™-2014 (Revision of IEEE Std 730-2002) IEEE Computer Society, New York (2014) 11. May, T.: The New Know: Innovation Powered by Analytics. Wiley, New Jersey (2009) 12. Provost, F., Fawcett, T.: Data Science and its relationship to Big Data and data-driven decision making. Big Data 1, 51-59 (2013) 13. Taylor, Q., Giraud-Carrier, C.: Applications of data mining in software engineering. International Journal of Data Analysis Techniques and Strategies 2, 243-257 (2010) 14. Xie, T., Pei, J., Hassan, A.E.: Mining Software Engineering Data. 29th International Conference on Software Engineering (ICSE'07 Companion), pp. 172 - 173. IEEE, Minneapolis, MN, USA (2007)
15. Hassan, A.E., Xie, T.: Software Intelligence: The Future of Mining Software Engineering Data Proceedings of the FSE/SDP workshop on Future of software engineering research, pp. 161-165. ACM, Santa Fe, New Mexico, USA (2010) 16. Bird, C., Menzies, T., Zimmermann, T.: The Art and Science of Analyzing Software Data. Morgan Kaufmann, Waltham, MA (2015) 17. Santucci, G., Lange, S.: Internet of Things in 2020 - A Roadmap for the Future. INFSO D.4 NETWORKED ENTERPRISE & RFID INFSO G.2 MICRO & NANOSYSTEMS (2008) 18. Agrawal, S., Vieira, D.: A survey on Internet of Things. Abakós 1, 78-95 (2013) 19. Atzori, L., Iera, A., Morabito, G.: The Internet of Things: A survey. Computer Networks 54, 2787-2805 (2010) 20. Foidl, H., Felderer, M.: Research Challenges of Industry 4.0 for Quality Management. In: Felderer, M., Piazolo, F., Ortner, W., Brehm, L., Hof, H.-J. (eds.) Innovations in Enterprise Information Systems Management and Engineering, pp. 121-137. Springer (2016) 21. Vermesan, O., Harrison, M., Vogt, H., Kalaboukas, K., Tomasella, M., Wouters, K., Gusmeroli, S., Haller, S.: Strategic Research Agenda. In: Sundmaeker, H., Guillemin, P., Friess, P., Woelfflé, S. (eds.) Vision and Challenges for Realising the Internet of Things pp. 39-82. CERP-IoT – Cluster of European Research Projects on the Internet of Things, Luxembourg (2010) 22. Miorandi, D., Sicari, S., De Pellegrini, F., Chlamtac, I.: Internet of things: Vision, applications and research challenges. Ad Hoc Networks 10, (2012) 23. Zhang, D., Yang, L.T., Huang, H.: Searching in Internet of Things: Vision and Challenges. In: Ninth IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 11), pp. 201-206. IEEE, (2011) 24. Liu, Y., Zhou, G.: Key Technologies and Applications of Internet of Things. Fifth International Conference on Intelligent Computation Technology and Automation (ICICTA 12), pp. 197-200. IEEE, Zhangjiajie, Hunan (2012) 25. van der Aalst, W.M.P.: Extracting Event Data from Databases to Unleash Process Mining. In: vom Brocke, J., Schmiedel, T. (eds.) BPM - Driving Innovation in a Digital World. Springer, Cham (2015) 26. Naur, P.: The Science of Datalogy. Communications of the ACM 9, 485 (1966) 27. Smith, J.F.: Data Science as an academic discipline. Data Science Journal 5, 163-164 (2006) 28. Data Science Journal http://datascience.codata.org/ 29. Journal of Data Science, http://www.jds-online.com/about 30. Hayashi, E.C.: What is Data Science ? Fundamental Concepts and a Heuristic Example. In: Data Science, Classification, and Related Methods - Proceedings of the Fifth Conference of the International Federation of Classification Societies (IFCS-96), pp. 40-51. Springer, (1996) 31. Liu, L., Zhang, H., Li, J., Wang, R., Yu, L., Yu, J., Li, P.: Building a community of data scientists: An explorative analysis. Data Science Journal 8, 201-208 (2009) 32. Dhar, V.: Data Science and Prediction. (2012) 33. Waller, M.A., Fawcett, S.E.: Data Science, Predictive Analytics, and Big Data: A Revolution That Will Transform Supply Chain Design and Management Journal of Business Logistics 34, (2013) 34. van der Aalst, W.M.P.: Data Scientist: The Engineer of the Future. In: Mertins, K., Bénaben, F., Poler, R., Bourrières, J.-P. (eds.) Enterprise Interoperability VI Interoperability for Agility, Resilience and Plasticity of Collaborations, vol. 7, pp. 13-26. Springer (2014) 35. Menzies, T., Zimmermann, T.: Software Analytics: So What? IEEE Software 30, 31-37 (2013) 36. 
Kim, M., Zimmermann, T., DeLine, R., Begel, A.: The Emerging Role of Data Scientists on Software Development Teams - Technical Report. MSR-TR-2015-30. Microsoft Research (2015)
37. Begel, A., Zimmermann, T.: Analyze this! 145 questions for data scientists in software engineering. In: Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), pp. 12-23. ACM, (2014) 38. Mockus, A., Weiss, D.M., Zhang, P.: Understanding and Predicting Effort in Software Projects. In: Proceedings of the 25th International Conference on Software Engineering (ICSE'03), pp. 274-284. IEEE Computer Society, (2003) 39. Zimmermann, T., Weißgerber, P., Diehl, S., Zeller, A.: Mining Version Histories to Guide Software Changes. In: Proceedings of the 26th International Conference on Software Engineering (ICSE’04), pp. 563-572. IEEE, (2004) 40. Cheatham, T.J.: Software Testing: A Machine Learning Experiment. In: 23rd annual conference on Computer science (CSC '95) pp. 135-141. ACM, (1995) 41. Gorla, A., Tavecchia, I., Gross, F., Zeller, A.: Checking app behavior against app descriptions In: 36th International Conference on Software Engineering (ICSE 2014), pp. 1025-1035. ACM, (2014) 42. Chaturvedi, K.K., Singh, V.B., Singh, P.: Tools in Mining Software Repositories. In: 13th International Conference on Computational Science and Its Applications (ICCSA 2013), pp. 89-98. IEEE, (2013) 43. Fuggetta, A., Di Nitto, E.: Software Process. In: Proceedings of the Future of Software Engineering FOSE'14, pp. 1-12. ACM, (2014) 44. Trendowicz, A., Kopczynska, S.: Adapting Multi-Criteria Decision Analysis for Assessing the Quality of Software Products. Current Approaches and Future Perspectives. In: Hurson, A., Memon, A. (eds.) Advances in Computers, vol. 93, pp. 153-226. Academic Press, Waltham (2014) 45. ISO/IEC/IEEE: ISO/IEC/IEEE 24765:2010 - Systems and software engineering — Vocabulary. ISO (2010) 46. Wagner, S.: Software Product Quality Control. Springer, Heidelberg (2013) 47. Garousi, V., Felderer, M., Mäntylä, M.V.: The need for multivocal literature reviews in software engineering: complementing systematic literature reviews with grey literature. 20th International Conference on Evaluation and Assessment in Software Engineering (EASE 2016). ACM, Limerick, Ireland (2016) 48. Cognizant, http://www.cognizant.com/InsightsWhitepapers/the-internet-of-things-qaunleashed-codex1233.pdf 49. TechArcis Solutions, http://techarcis.com/whitepaper/testing-for-internet-of-things/ 50. Polarion Software, https://www.polarion.com/resources/download/testing-the-internet-ofthings?utm_campaign=Blog-2016-Embedded-Q1&utm_medium=Blog&utm_source=Blog 51. Ayla Networks, http://theinternetofthings.report/Resources/Whitepapers/7f4b81fe-25c34fa1-a1fb-a12fa6d42f44_Ayla_Whitepaper_Art-of-IoT-QA.pdf 52. Testbirds, https://www.testbirds.com/fileadmin/Whitepaper-Studies/Whitepaper-Internetof-Things-EN.pdf 53. Gerrard Consulting, http://gerrardconsulting.com/sites/default/files/IoETestStrategy.pdf 54. Böger Henning, http://de.slideshare.net/HenningBoeger/german-testingnightiothenningboeger20130228enexport 55. Test and Verification Solutions, http://www.testandverification.com/wpcontent/uploads/Testing%20the%20Internet%20of%20Things.pdf 56. Lau, M.: Testing the Internet of Things. Printed Circuit Design and Fab (April), 43 (2014) 57. DevOps, http://devops.com/2015/02/24/functional-testing-iot/ 58. LeanTesting, https://leantesting.com/resources/how-do-we-test-the-internet-of-things/ 59. Neotys, http://www.neotys.com/blog/performance-testing-101-how-to-approach-theinternet-of-things/ 60. Semiconductor Engineering, http://semiengineering.com/how-to-cut-verification-costs-foriot/ 61. 
SmartBear, http://blog.smartbear.com/user-experience/testing-the-internet-of-things/ 62. IoT-Now, http://www.iot-now.com/2015/05/25/33241-testing-the-internet-of-things-itstime-to-plan-your-test-strategy/
63. Beyond Security, http://www.beyondsecurity.com/security_testing_iot_internet_of_things.html 64. CenturyLink Cloud, https://www.ctl.io/blog/post/qa-with-the-iot/ 65. embedded, http://www.embedded.com/electronics-news/4437315/The-testing-challengesahead-for-the-Internet-of-things 66. LogiGear, http://www.logigear.com/magazine/issue/past-articles/testing-strategy-for-theiot/ 67. TestPlant, http://www.testplant.com/explore/testing-use-cases/testing-the-internet-ofthings-set-top-boxes/ 68. Nilsson, D.K., Larson, U.E.: Secure Firmware Updates over the Air in Intelligent Vehicles. In: ICC Workshops - 2008 IEEE International Conference on Communications Workshops, pp. 380-384. IEEE, (2008) 69. Zumel, N., Mount, J.: Practical Data Science with R. Manning Publications, New York (2014) 70. Schutt, R., O´Neil, C.: Doing Data Science. O´Reilly, Sebastopol, CA, USA (2014) 71. Buse, R.P.L., Zimmermann, T.: Information needs for software development analytics. In: Proceedings of the 34th International Conference on Software Engineering (ICSE '12), pp. 987-996. IEEE, (2012) 72. Suma, V., Nair Gopalakrishnan, T.R.: Effective Defect Prevention Approach in Software Process for Achieving Better Quality Levels. Proceedings of World Academy of Science: Engineering & Technology 42, 258-262 (2008) 73. Kumaresh, S., Baskaran, R.: Defect Analysis and Prevention for Software Process Quality Improvement International Journal of Computer Applications 8, 42-47 (2010) 74. Han, S., Dang, Y., Ge, S., Zhang, D., Xie, T.: Performance debugging in the large via mining millions of stack traces. In: Proceedings of the 34th International Conference on Software Engineering (ICSE `12), pp. 145-155. IEEE, (2012) 75. Jiang, Z.M., Hassan, A.E., Hamann, G., Flora, P.: Automated Performance Analysis of Load Tests. In: International Conference on Software Maintenance (ICSM 09), pp. 125134. IEEE, (2009) 76. Hindle, A.: Green Mining: A Methodology of Relating Software Change to Power Consumption. In: 9th IEEE Working Conference on Mining Software Repositories (MSR), pp. 78-87. IEEE, (2012) 77. Rubin, V., Lomazova, I., van der Aalst, W.M.P.: Agile Development with Software Process Mining. In: Proceedings of the 2014 International Conference on Software and System Process (ICSSP 2014), pp. 70-74. ACM, (2014) 78. van der Aalst, W.M.P.: Process Mining - Discovery, Conformance and Enhancement of Business Processes. Springer, Berlin Heidelberg (2011) 79. Rubin, V.A., Mitsyuk, A.A., Lomazova, I.A., van der Aalst, W.M.P.: Process Mining Can Be Applied to Software Too! . Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM ’14). ACM, Torino, Italy (2014) 80. Shershakov, S.A., Rubin, V.A.: System runs analysis with process mining. Modeling and Analysis of Information Systems 22, 813-833 (2015) 81. Menzies, T., Greenwald, J., Frank, A.: Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Transactions on Software Engineering 32, 2-13 (2007) 82. Ostrand, T.J., Weyuker, E.J., Bell, R.M.: Where the Bugs Are. In: International Symposium on Software Testing and Analysis (ISSTA '04), pp. 86-96. ACM, (2004) 83. Lazic, L., Velasevic, D.: Applying simulation and design of experiments to the embedded software testing process. Software Testing, Verification and Reliability 14, 257-282 (2004) 84. Wagner, S.: Defect classification and defect types revisited. 
In: International Symposium on Software Testing and Analys (ISSTA '08 ) - workshop on Defects in large software systems (DEFECTS '08), pp. 39-40. ACM, (2008) 85. Wong, E.W., Qi, Y.: BP Neural Network-Based Effective Fault Localization. International Journal of Software Engineering and Knowledge Engineering 19, 573-597 (2009)
86. Kannadhasan, N., Maheswari, U.B.: Machine Learning based Methodology for Testing Object Oriented Applications. Journal of Engineering and Applied Sciences 10, 7400-7405 (2015) 87. ISO/IEC: ISO/IEC 25010:2011 - Systems and software engineering -- Systems and software Quality Requirements and Evaluation (SQuaRE) -- System and software quality models. ISO/IEC (2011) 88. Cao, H., Bao, T., Yang, Q., Chen, E., Tian, J.: An Effective Approach for Mining Mobile User Habits. In: 19th ACM international conference on Information and knowledge management (CIKM '10), pp. 1677-1680. ACM, (2010) 89. Gruska, N., Wasylkowski, A., Zeller, A.: Learning from 6,000 projects: lightweight crossproject anomaly detection. In: Proceedings of the 19th international symposium on Software testing and analysis (ISSTA '10), pp. 119-130. ACM, (2010) 90. Santos, R.M.S., Oliveira, T.C., Brito e Abreu, F.: Mining Software Development Process Variations. In: 30th Annual ACM Symposium on Applied Computing (SAC '15), pp. 16571660. ACM, (2015) 91. Bacchelli, A., Dal Sasso, T., D`Ambros, M., Lanza, M.: Content Classification of Development Emails. In: 34th International Conference on Software Engineering (ICSE 12), pp. 375-385. IEEE, (2012) 92. Bird, C., Gourley, A., Devanbu, P., Gertz, M., Swaminathan, A.: Mining Email Social Networks. In: International workshop on Mining software repositories (MSR '06), pp. 137143. ACM, (2006) 93. Noorian, M., Bagheri, E., Du, W.: Machine Learning-based Software Testing: Towards a Classification Framework In: 23rd International Conference on Software Engineering & Knowledge Engineering (SEKE'2011), pp. 225-229. (2011) 94. Lenz, A.R., Pozo, A., Vergilio, S.R.: Linking software testing results with a machine learning approach. Engineering Applications of Artificial Intelligence 26, 1631-1640 (2013)