Full citation: Licorish, S. A., Zolduoarrati, E. and Stanger, N. 2018. Linking User Requests, Developer Responses and Code Changes: Android OS Case Study, in Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering (EASE 2018) (Christchurch, New Zealand, June 28-29). ACM, 79-89. 10.1145/3210459.3210467

Linking User Requests, Developer Responses and Code Changes: Android OS Case Study

Sherlock A. Licorish, Elijah Zolduoarrati, Nigel Stanger
Information Science Department, University of Otago, Dunedin, New Zealand
[email protected], [email protected], [email protected]

ABSTRACT

Since software systems are designed to satisfy customers' needs, developers have an obligation to address users' requirements and demands logged via issue trackers and other forums. Having to respond to a large number of requests while developing and perfecting systems presents prioritization challenges, however. Android Operating System (OS) developers have largely overcome this obstacle by responding to specific user requests, which may be traced back to actual software code changes, providing lessons for the software engineering community. This study applies text and data mining techniques to investigate the Android community as an ecosystem, exploring how developers responded to issues raised by the community over several versions of the OS. Results show a strong relationship between issues raised by the community and developer responses to these issues. This relationship also extended to actual source code changes made by developers. Furthermore, the findings show a correlation between user requests and developer responses enacted via code changes across specific Android versions and important functionalities. This evidence suggests that developers have invested in the Android platform to guarantee its survival and overall success, largely through addressing user demands. We outline implications for software engineering professionals and software systems success.

KEYWORDS

Android OS; User Requests; Developer Responses; Mining Software Repositories; NLP.

1 INTRODUCTION

Software developers play a fundamental part in addressing and responding to requests from users over a system's evolution. In particular, this activity is central to software project success, in terms of user satisfaction [1]. In a market-driven environment [2], developers tend to build software for many potential anonymous users, which is different from the usual approach in which the development team and client negotiate requirements for system implementation [3]. To this end, developers must be nimble in delivering client value. However, when under pressure, practitioners may sometimes claim to deliver and fix features when this is not the case, or may remove defects from issue trackers [4]. Short-term measures may also be used to appease clients, if only minor improvements are requested.

Android OS developers have managed to satisfy a very large community of users. The development of this software, like many open source systems (OSS), relied in part on user requests as logged on the issue tracker [5]. Little is known about how Android developers achieve this feat, although evidence has shown that they at least respond in their release notes to user requests logged on the issue tracker [5]. However, it would be more beneficial to investigate user requests, developer responses and associated code changes that lead to the delivery of features that users desire. Such insights would be useful for the software engineering community in terms of understanding how to deliver client value in light of many competing interests among users. Furthermore, this evidence would help with user confidence, in terms of developers' sincerity in responding to user requests [4].

An essential challenge of such a proposition, however, relates to how to associate a large number of user requests and developer responses with changes in specific source code files. Android development teams have established a stringent process of responding to user enhancement requests and defects [5]. Research evidence has shown that Android developers respond to user requests to ensure that they address the most critical requirements from the users' perspective [5]. This is essential for organizations to consider, as it may identify ways of gaining or increasing competitive advantage [1, 6]. That said, researchers have not examined how user requests, developer responses and code changes are related, particularly for successful systems such as the Android OS.

The aim of the current study is to investigate how developers respond to user requests by tracing these back to Android development teams' source code. The study examines Android history data provided by Google to look at how Android system versions have evolved in relation to user requests, developer responses, and source code changes, providing an in-depth view of how Android evolved in light of user feedback. An additional benefit of this study is understanding and awareness of how Android updates are prioritized, an important challenge faced by the software engineering community. Through this investigation the software engineering community is also provided insights into how client value is delivered in light of many competing interests among users. Furthermore, the work validates the sincerity of software developers in terms of responding to user requests.

The remaining sections of the paper are organized as follows. We provide our study background and research questions in Section 2. We next provide our research setting in Section 3, before providing our results in Section 4. We then discuss our findings, their implications, and the threats to the study in Section 5. Finally, we provide concluding remarks in Section 6.

2 BACKGROUND

Software system evolution relies on feedback from users, and thus, developers must track such feedback to make necessary upgrades [7]. To this end, researchers have argued that user reviews are integral to the development of successful applications, particularly in the mobile apps space. For instance, Palomba et al. [8] undertook a study of 100 Android apps with the aim of showing how application developers enhance the success of their applications (ratings), finding that developers who make use of user reviews enhance their apps' ratings. Ha and Wagner [9] manually analyzed 556 reviews from 59 different Google Play apps, finding that users place little emphasis on security and permission issues, instead focusing on working features. An analysis of app reviews in support of software maintenance was conducted by Panichella et al. [10], who demonstrated that reviews are useful for directing maintenance effort. Natt och Dag et al. [11] examined mechanisms for linking online customer requests to established system requirements, finding that automatic extraction of reviews could link corresponding user requests and requirements with around 50% accuracy, saving developers time. Krusche and Bruegge [12] explored the need for early user feedback when evolving and perfecting mobile applications, finding that user feedback enables developers to continuously enhance the usability and user experience of mobile applications.

In harnessing the power of end-user feedback, studies have also looked at the provision of tools to automate review extraction, providing rapid insights for developers. For instance, tools such as the Crowd Listener for releAse Planning (CLAP) [13] and AR-Miner [14] promise the generation of reports that may offer insights to developers in terms of ranking and grouping user reviews depending on their purpose (e.g., reporting bugs), merging related reviews (e.g., the same bug), and automatically prioritizing review clusters when planning subsequent application releases.

Another stream of work has examined issue trackers. For instance, given the backdrop that negative reviews may affect software usage, Licorish et al. [15] evaluated Android issue tracker feedback from various communities of users to understand the issues that affected the Android community. They observed that features associated with a particular subset of topics were those most frequently targeted for improvement. Licorish et al. [16] also explored stakeholder perceptions and experiences regarding specific Android security issues. They found that Android confidentiality and privacy concerns had varying severity, and that such issues were most prevalent over the Jelly Bean releases. Android community users also expressed different preferences for new security features, ranging from more relaxed to very strict, potentially posing a dilemma for developers in terms of the optimal security mechanisms to enact.

The Android issue tracker and developer release notes were used to investigate the issues that were logged by the Android community, and how Google's remedial efforts correlated with user requests [5]. These authors found very strong alignment between end-users' top feature requests and Android developer responses, particularly for the more recent Android releases. While this study examined user requests and developer responses to such requests [5], it did not seek to reconcile user requests with actual source code changes made by developers. Developers may develop workarounds, or claim to fix broken features when in fact issues with such features remain. Evidence indeed shows that some software development practitioners may even go as far as removing specific records from the issue tracker in order to reduce community attention around such issues and delay providing remedial work [4]. Under these circumstances developer release notes may not be trusted for accurately understanding how developers satisfy a very large community of user requests.

Given the success of Android developers in satisfying the community's needs, and evidence that these developers do respond to the issue tracker via their release notes [5], it would be interesting to investigate user requests, developer responses, and associated code changes that lead to the delivery of features that users desire. Such insights promise lessons for how to deliver client value in light of many competing interests among users, understanding and awareness of how Android updates are prioritized, and validation of the sincerity of software developers in responding to user requests. These lessons would be particularly noteworthy if provided over Android's evolution. We thus use the Android OS as a case study to answer the following two research questions to provide these contributions:

RQ1. How do Android OS developer responses to user requests correlate with their source code activities?

RQ2. Do these correlations hold across different Android OS versions?

3 RESEARCH SETTING

We used the Android OS community as our case study "organization". Issues identified by the Android community are submitted to an issue tracker hosted by Google (https://issuetracker.google.com/issues?q=status:open). Among the data stored in the issue tracker are the following details: Issue ID, Type, Summary description, Stars (the number of people following the issue), Open date, Reporter, Reporter Role, and OS Version. We extracted a snapshot of the issue tracker, comprising issues submitted between January 2008 and March 2014. Our particular snapshot comprised 21,547 issues. We next gathered release notes from the official Android developers' portal (https://developer.android.com/about/). Google provides, for most versions, two types of notes: user and developer release notes. We obtained and combined all available release notes for both users and developers for each version (note that "Early versions", described shortly, provide only developer release notes).

Table 1 provides a brief summary of the numbers of enhancement requests logged between each of the major releases, from the very first release through to KitKat (Android version 4.4). Of note is that the issue tracker was extracted during March 2014, when KitKat was the most recent release. We used text mining to study requests logged via the issue tracker and Android release notes, and we mined logs of code changes to perform comparisons. These steps are described in the following two subsections, before we outline our approach to reliability assessment.

3.1 Text Mining Requests and Release Notes

These issues and release notes were imported into a Microsoft SQL Server database, and we then performed data cleaning by executing previously written scripts to remove all HTML tags and foreign characters [17], particularly those in the summary description and release notes, to prevent these confounding our analysis. We next employed exploratory data analysis (EDA) techniques to investigate the data properties, in order to identify missing values, transform data, detect anomalies, and select our research sample. Issues were labelled (by their submitters) as defects (15,750 issues), enhancements (5,354 issues), and others (438 issues). Given our goal of studying how the Android development teams respond to user requests in new releases of the Android OS, we selected the 5,354 enhancement requests, as logged by 4,001 different reporters. Of the 5,354 enhancement requests, 577 were logged by members identifying themselves as developers, 328 were sourced from users, and 4,449 were labelled as anonymous.
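To make the cleaning step concrete, the following is a minimal sketch of HTML-tag and foreign-character removal; clean_text is a hypothetical helper for illustration, not the authors' actual scripts [17]:

```python
import re

def clean_text(raw: str) -> str:
    """Strip HTML tags and non-ASCII ("foreign") characters, then normalize whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)                   # drop HTML tags
    ascii_only = no_tags.encode("ascii", "ignore").decode()  # drop foreign characters
    return re.sub(r"\s+", " ", ascii_only).strip()           # collapse leftover whitespace

print(clean_text("<b>Add SMS search</b> – please!"))  # -> "Add SMS search please!"
```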

Natural language processing (NLP) techniques including part-of-speech (POS) tagging [18] and n-gram analysis [19] were employed to study enhancement requests and release notes, in order to investigate whether Android developers paid attention to their users' needs. We created a program that incorporated the Stanford NLP API to enable the extraction of noun phrases from the enhancement requests and release notes, before counting the frequency of each noun as a unigram in the enhancement requests and release notes respectively (e.g., if "SMS" appeared at least once in 20 enhancement requests the program would output SMS = 20). Checks were also made for misspellings and plural forms of the nouns that the program found, ensuring that these were counted in the analysis. The findings from these checks demonstrated strong alignment between end-users' top feature requests and Android developer responses, particularly for the more recent Android releases [5].
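A minimal sketch of this noun-counting step follows; it substitutes NLTK's POS tagger for the Stanford NLP API the authors used, and omits the misspelling and plural-folding checks described above:

```python
from collections import Counter
import nltk  # assumes the 'punkt' and 'averaged_perceptron_tagger' models are downloaded

def noun_frequencies(requests):
    """For each noun, count the number of requests in which it appears at least once."""
    counts = Counter()
    for text in requests:
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        nouns = {tok.lower() for tok, tag in tagged if tag.startswith("NN")}
        counts.update(nouns)  # one count per request: "SMS" in 20 requests -> sms = 20
    return counts

reqs = ["SMS alerts should vibrate", "Let me search SMS threads by contact"]
print(noun_frequencies(reqs).most_common(3))  # e.g., [('sms', 2), ...]
```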

We examined the date of each request in our database to align these with the official releases of the Android OS. The first official release was in September 2008 (https://android-developers.googleblog.com/2008/09/announcing-android-10-sdk-release-1.html), although the first issue was logged in the issue tracker in January 2008. This suggests that the community was already actively engaged with the Android OS after the release of the first beta version in November 2007, with issues being reported just two months later. Given this level of active engagement, occurring even before the first official Android OS release, we partitioned the issues based on Android OS release date and major name change. For instance, all of the issues logged from January 2008 (the date the first issue was logged) to February 2009 were labelled "Early versions", reflecting the period occupied by Android OS releases 1.0 and 1.1, which both lacked formal names. The subsequent partition comprised the period between Android OS version 1.1 and Cupcake (Android version 1.5), and so on. It was straightforward to mine the release notes, as these were labelled as per the individual releases. Thus, apart from data cleaning, we partitioned the release notes in line with the separation of enhancement requests described above.
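This partitioning amounts to bucketing each issue's open date against the release boundary dates in Table 1; a minimal sketch (only the first three boundaries shown; the labels and helper are assumptions for illustration):

```python
from bisect import bisect_left
from datetime import date

# Release boundary dates from Table 1 (first three shown) and the period labels.
BOUNDARIES = [date(2009, 2, 9), date(2009, 4, 30), date(2009, 9, 15)]
LABELS = ["Early versions", "Cupcake", "Donut", "later"]

def release_period(opened: date) -> str:
    """Label an issue with the release period during which it was opened."""
    return LABELS[bisect_left(BOUNDARIES, opened)]

print(release_period(date(2008, 6, 1)))  # "Early versions" (before the 1.1 release date)
print(release_period(date(2009, 5, 2)))  # "Donut" (between the Cupcake and Donut releases)
```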

3.2 Data Mining Code Changes

Having extracted matching nouns in associated enhancement requests and release notes, we then mined the Android code repository to validate the linkage between code changes, nouns in release notes, and those in enhancement requests. We imported 4 gigabytes of data, comprising over 6 million records from the Android code repository, into our database. Subsequently we performed data aggregation involving the Android commitTable and commitTarget tables. Altogether, 1,243,225 cleaned records were accounted for, with fields: hash number, author name, author date, committer name, committer date, and subject line (i.e., the file path for the change). The subject line was of particular interest to us, as it captured developers' work on a specific class, function or subroutine (or other file).

To answer our two research questions (RQ1 and RQ2), we created a script to return the specific file that was created or edited, and mined these records (195,705 unique files altogether), correlating counts of each specific file with the results from the enhancement requests and release notes discussed above. The periods during which the files were created or changed were tracked through the committer date (corresponding to the dates in Table 1). We observed that records were first logged in the code repository in June 1998, with the final date being June 2011. We therefore aligned developer changes with our enhancement request dates, and our results were restricted to the versions between Early versions and Honeycomb in Table 1 (seven versions of the OS), covering 512,538 records for Early versions, 133,073 for Cupcake, 104,520 for Donut, 116,837 for Éclair, 168,644 for Froyo, 148,971 for Gingerbread, and 58,642 for Honeycomb.
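For illustration, the per-feature counting over commit records might look like the following sketch; the CSV input and the 'subject' field name are assumptions (the authors worked against their SQL Server database):

```python
from collections import Counter
import csv

TOP_FEATURES = ["file", "api", "text", "call", "message"]  # subset of the top 20

def feature_change_counts(commit_csv: str) -> Counter:
    """Count commit records whose subject line (file path) mentions a feature noun."""
    counts = Counter()
    with open(commit_csv, newline="") as fh:
        for row in csv.DictReader(fh):  # assumed column names, incl. 'subject'
            path = row["subject"].lower()
            counts.update(f for f in TOP_FEATURES if f in path)
    return counts
```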

Table 1: Android OS enhancement requests over major releases

Version (Release)                 Last release date   Days between releases   Total requests logged   Mean requests per day
Early versions (1.0, 1.1)         09/02/2009          451                     173*                    0.4
Cupcake (1.5)                     30/04/2009          80                      64                      0.8
Donut (1.6)                       15/09/2009          138                     141                     1.0
Éclair (2.0, 2.01, 2.1)           12/01/2010          119                     327                     2.8
Froyo (2.2)                       20/05/2010          128                     349                     2.7
Gingerbread (2.3, 2.37)           09/02/2011          265                     875                     3.3
Honeycomb (3.0, 3.1, 3.2)         15/07/2011          156                     372                     2.4
Ice Cream Sandwich (4.0, 4.03)    16/12/2011          154                     350                     2.3
Jellybean (4.1, 4.2, 4.3)         24/07/2013          586                     1,922                   3.3
KitKat (4.4)                      31/10/2013          99                      781                     7.9
Total                                                 ∑ = 2,176               ∑ = 5,354               x̄ = 2.7

* Total number of requests logged between the first beta release on 16/11/2007 and Android version 1.1, released on 09/02/2009.

3.3 Reliability Assessment

Correlation analysis was used to triangulate results observed visually. In addition, we conducted manual coding, involving a random sample of 50 outputs from each block of enhancement request and release note outputs of the POS and n-gram analyses, to check that nouns were correctly classified, and to verify that misspellings and plural forms of nouns were appropriately considered by our program (flagging each as true or false). To do so we traced each noun back to the enhancement request and/or release note from which it was derived. We then computed reliability measurements from these coding outputs using Holsti’s coefficient of reliability [20] to evaluate our agreement. Overall, our reliability check revealed 90% agreement, meaning that our NLP processing largely extracted the correct noun from the text that was supplied. In fact, we also agreed that the 10% not accounted for were nouns that appeared sparsely, and thus did not affect our outcomes, as we only focused on the top features as discussed below. Our checks here show excellent agreement between coders, confirming that our data extraction was reliable. We provide our results next.
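For reference, Holsti's coefficient [20] is twice the number of agreed coding decisions divided by the total number of decisions made by the two coders; a one-line sketch with illustrative counts:

```python
def holsti(agreements: int, n1: int, n2: int) -> float:
    """Holsti's coefficient of reliability: 2M / (N1 + N2) [20]."""
    return 2 * agreements / (n1 + n2)

print(holsti(agreements=90, n1=100, n2=100))  # 0.9, i.e., 90% agreement as reported above
```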

4 RESULTS

We present the results to answer our two research questions in this section, first presenting those pertaining to RQ1 in Section 4.1, and then results relating to RQ2 in Section 4.2.

4.1 RQ1. How do Android OS developer responses to user requests correlate with their source code activities?

We present results pertaining to the top 20 enhancement requests and responses identified in [5], to examine how these correlate with the source code activities of Android development teams. Fig. 1 presents the top 20 enhancement features requested by the community. This shows that the top five features in enhancement requests across all versions were contacts, screen, notification, call, and calendar, with an average overall count of 147.6 requests per feature.

Figure 1: Android's top 20 most frequent enhancement requests

Android developer responses to these requests (via release notes) are shown in Fig. 2. On average there were 115 responses per requested feature [5]. Fig. 3 shows code changes associated with enhancement requests. In total, there were 106,809 code changes relating to the top 20 enhancement requests, with an average count of 5,340.5 code changes per feature (median = 994.5, std dev = 11,126.4). We observe that file, api, and text attracted the most code changes across the Android OS releases studied, with developers working on the file feature 46,193 times, api 23,523 times, and text 11,865 times. These were followed by call (7,197 changes), message (5,042), default (2,594), and keyboard (1,576). On the other hand, email appeared to be the least worked-on feature, accounting for only 0.1% (104) of the source code changes in Fig. 3. Taking the average of the top 20 code changes as a threshold (i.e., 5,340.5 changes), we observed that four features were changed more frequently than normal (file, api, text, and call).

Overall, looking at Figs. 2 and 3 we see strong convergence between developer release note responses and the corresponding source code changes for features. Notable, for instance, is the pattern of results for file, api, text, call, message, default, and screen, which appear among the top 10 features in both figures. More than half of these features also appear among the top 10 user enhancement requests in Fig. 1.

These observations were triangulated through formal correlation tests to examine the pattern of relationships between user requests, developer responses, and corresponding source code changes. The objective was to see whether there were significant correlations between requests, responses, and code changes. Our three distributions violated normality, so the degree of association between variables was assessed using Spearman's rho (r) correlation analysis, where a 5% threshold was used to measure significance (i.e., p ≤ 0.05). This investigation may also reveal negative associations, where a relationship is inverse rather than direct. For the top 20 requested features, the possible associations between the numbers of requests, responses and code changes were examined. A statistically significant and strong correlation was found between the number of user requests and the number of developer release note responses to these requests (r = 0.54, p = 0.01). We also observed a statistically significant correlation between developer responses and their corresponding source code changes (r = 0.58, p = 0.01). However, there was no significant correlation between user requests and developer source code changes (r = 0.02, p = 0.92). These results demonstrate that Android OS development teams responded to the user community's top enhancement requests online via their release notes, and these responses aligned with the actual software development they performed. These outcomes validate that these developers were genuine in responding to user requests. We next examine the pattern of requests, responses, and source code changes over Android OS versions to answer RQ2.
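The correlation tests above can be reproduced with SciPy; a minimal sketch with illustrative counts (not the study's data):

```python
from scipy.stats import spearmanr

# Illustrative per-feature counts: user requests vs. release-note responses.
requests  = [391, 254, 201, 188, 160, 142, 131, 118, 104, 97]
responses = [210, 180, 95, 160, 150, 60, 120, 98, 90, 40]

rho, p = spearmanr(requests, responses)
print(f"r = {rho:.2f}, p = {p:.2f}")  # significant at the 5% level if p <= 0.05
```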

4.2 RQ2. Do these correlations hold across different Android OS versions?

The numbers of user enhancement requests, developer responses, and corresponding code changes for the top 20 features across Android OS versions can be observed in Fig. 4, Fig. 5, and Fig. 6, respectively. In Fig. 4 we see reasonably consistent feature requests across all features over the Early versions of Android. These requests reduced between Cupcake and Donut, before increasing over Éclair and Froyo, peaking for Gingerbread, and then dropping for Honeycomb. The pattern of developer responses to user enhancement requests across Android OS versions (Fig. 5) was not uniform, however. While the pattern of results for some responses to enhancement requests remained stable over Android OS versions (e.g., for sms, volume, keyboard), others were erratic (e.g., api, screen, search). That said, all responses for features increased from Gingerbread to Honeycomb. Correspondingly, Fig. 6 reveals that source code changes were relatively high for all features over the Early versions (particularly for file and api), before reducing between Cupcake and Donut. Code changes then increased over Éclair and Froyo, before reducing again for Gingerbread and Honeycomb.

Figure 2: Responses to Android's top 20 most frequent enhancement requests

The results shown in Figs. 5 and 6 were further analyzed to examine the convergence of developer responses and their source code changes for the top features file, api, text, call, message, default, and screen. Visual evidence does not demonstrate convergence across all versions, but there were similarities in the trends of responses and source code changes over Donut and Gingerbread. When the results shown in Figs. 4 and 6 were analyzed, there was no uniform pattern of results across Android versions for enhancement requests and source code changes. To triangulate the observations in these figures, Spearman's correlations between user requests, developer responses, and corresponding source code changes across all Android OS main versions were formally examined (as detailed in Tables 2, 3 and 4 respectively; significant results are denoted with *).

Table 2 shows that user requests did not correlate with developer responses at the granular level across versions, and a similar pattern of outcomes is noted in Table 3 for user requests and developer source code changes. In contrast, the results in Table 4 show that developer responses correlated with their enacted source code changes for Donut (r = 0.54, p = 0.01) and Gingerbread (r = 0.56, p = 0.01).

In examining the specific enhancement requests that were most often addressed with responses and code changes, we observed interesting patterns in our results, shown in Tables 5, 6, and 7. Table 5 shows that there were statistically significant correlations between user requests and developer responses for call (r = 0.69, p = 0.03), message (r = 0.66, p = 0.04), number (r = 0.77, p = 0.01), text (r = 0.71, p = 0.02), and volume (r = 0.79, p = 0.01). Strong correlations are also seen between requests and code changes in Table 6 for alarm (r = 0.82, p = 0.02), button (r = -0.82, p = 0.02), and volume (r = -0.92, p = 0.00). Furthermore, Table 7 shows strong correlations between responses and source code changes for number (r = -0.78, p = 0.04) and text (r = -0.95, p = 0.00). Of note here is that some of these relationships are inverse rather than direct, which we discuss next.

Figure 3: Code changes to Android’s top 20 most frequent enhancement requests

5 DISCUSSION AND IMPLICATIONS

We now examine our results to answer the research questions posed and discuss the implications of our findings for research and practice. Section 5.1 is dedicated to answering RQ1, where we assess our outcomes in relation to previous work and draw comparisons with software development practice. Section 5.2 considers RQ2, evaluating our findings against research and practice, and outlining implications for software engineering. Finally, we consider threats to the study in Section 5.3.

5.1 RQ1. How do Android developer responses to user requests correlate with their source code activities?

The findings in this study show that, overall, Android development teams respond to the Android community's top enhancement requests. We observe that there were correlations between end-user requests for enhancement and Android developer responses to those requests. There was also correlation between developer responses and their corresponding source code changes. Our findings suggest that Android developers recognize the importance of end-user feedback, and strive to reach a balance by involving clients during system evolution, despite the high number of forwarded requests [5]. This evidence confirms speculation that mobile developers focus on working software, where features are perfected after initial release based on customer feedback. In this way, software systems target user desires, and developer responses enhance the likelihood of satisfaction.

Beyond this assessment, our findings confirm that the Android OS is not designed exclusively based on Google's specifications, but also according to user feedback, suggesting that listening to end-users is an essential strategy for product evolution and success. In fact, the results of this work provide some implications for software engineering practitioners and teams who aim to satisfy unknown end-user groups in similar settings, with regard to involving users in the product building process by auditing online portals and issue trackers. The fact that Android development teams provide access to their code changes is also noteworthy in terms of enhancing community confidence in the platform. This strategy may be vital to enhancing community satisfaction, and to gaining competitive advantage in the marketplace. Thus, taking user feedback seriously and addressing customer concerns through product and file updates seems to have a significant impact on product quality, despite the challenges associated with requirements overload, ambiguity, and prioritization [21].

Of the feature enhancements requested, our results reveal that for the Android OS, contacts, screen, notification, call, and calendar were the top features requested for enhancement; api, screen, file, notification, and text received most attention in developer release notes; and file, api, text, call, and message were most prominent in source code changes made by Google's development teams. There are clear intersections between user requests, developer responses, and source code changes here, and our statistical analysis confirms the significance of these relationships. In particular, the first four features under source code changes (file, api, text, call) attracted above-average attention from developers. Text, call, and message are primary features of mobile devices, so our finding that these aspects were given most attention is sensible. In fact, file and api are also considered primary features for interaction channels between software applications (e.g., file storage API, camera API, and so on). Thus, those features (i.e., file and api) are expected to attract more developer attention than those that reflect unique functionalities, and those that are found in specialized application subsystems (e.g., notification).

That said, a large number of enhancement requests may not always correlate with a large number of code changes, and vice versa. This may especially be the case where there are differences in the way developers work to satisfy features, or where features have higher business value. For instance, some developers are more verbose than others, and so may enact numerous changes for correspondingly little work [25]. Also, developers may prioritize work on features corresponding to their business value [26]. Furthermore, the actual rigour of developing a feature may also mediate how much effort developers must commit to it.

Figure 4: Top 20 enhancement requests over Android versions

Figure 5: Responses to top 20 enhancement requests over Android versions

Figure 6: Code changes to top 20 enhancement requests over Android versions

Table 2: Spearman's correlation between requests and responses over Android versions

Version       r       p-value
Early         0.23    0.32
Cupcake       0.18    0.45
Donut         -0.08   0.73
Éclair        0.21    0.39
Froyo         0.22    0.34
Gingerbread   0.00    0.99
Honeycomb     0.40    0.08

Table 3: Spearman's correlation between requests and code changes over Android versions

Version       r       p-value
Early         0.22    0.68
Cupcake       0.10    0.68
Donut         -0.03   0.89
Éclair        -0.19   0.43
Froyo         0.08    0.74
Gingerbread   -0.08   0.72
Honeycomb     0.14    0.57

Table 4: Spearman's correlation between responses and code changes over Android versions

Version       r       p-value
Early         0.08    0.73
Cupcake       0.27    0.25
Donut         0.54    0.01*
Éclair        0.14    0.56
Froyo         0.24    0.31
Gingerbread   0.56    0.01*
Honeycomb     0.35    0.14

Table 5: Spearman's correlation between requests and responses for individual features over Android versions

Feature        r       p-value
alarm          0.23    0.53
api            0.30    0.39
browser        0.14    0.69
button         0.53    0.11
calendar       0.17    0.64
call           0.69    0.03*
contacts       0.07    0.85
default        0.48    0.16
email          0.09    0.81
file           0.42    0.23
keyboard       0.34    0.34
message        0.66    0.04*
notification   0.55    0.10
number         0.77    0.01*
screen         0.46    0.19
search         -0.55   0.10
sms            0.22    0.53
text           0.71    0.02*
voice          0.16    0.67
volume         0.79    0.01*

When we analyzed the features that were given the strongest consideration by Android teams in their source code changes, irrespective of end-user requests and developer release note responses, we observed drivers to be at the top of the list. However, file, text, and api remain prominent. We saw a similar pattern for developer responses provided in release notes, where api, screen, notification, and text remained prominent. This evidence provides awareness of how Android updates are prioritized. We believe that Android developers pay attention to the issues that attract most users' interest, in order to deliver client value in light of many competing interests among users. Overall, these developers also seem sincere in the responses they provide to users in their release notes. These are important lessons for the software engineering community in terms of the issues to target for development when there are many user requests, and the payoff provided for demonstrating sincerity in responding to user requests. That said, we concede that our analysis here only detects preliminary patterns, so follow-up inductive analysis is necessary to further explore the deeper connections between user requests, developer responses and their code changes. We now examine our outcomes with respect to Android OS evolution.

Table 6: Spearman's correlation between requests and code changes for individual features over Android versions

Feature        r       p-value
alarm          0.82    0.02*
api            0.04    0.96
browser        -0.13   0.79
button         -0.82   0.02*
calendar       -0.11   0.84
call           -0.16   0.73
contacts       -0.71   0.09
default        -0.14   0.78
email          0.49    0.27
file           0.05    0.91
keyboard       0.39    0.38
message        -0.57   0.20
notification   0.54    0.21
number         -0.45   0.31
screen         0.04    0.94
search         -0.43   0.33
sms            0.22    0.53
text           -0.36   0.44
voice          -0.47   0.29
volume         -0.92   0.00*

Table 7: Spearman's correlation between responses and code changes for individual features over Android versions

Feature        r       p-value
alarm          0.47    0.28
api            -0.57   0.20
browser        -0.24   0.61
button         -0.17   0.72
calendar       -0.12   0.80
call           -0.04   0.93
contacts       0.24    0.61
default        -0.17   0.72
email          -0.58   0.17
file           -0.36   0.44
keyboard       -0.39   0.38
message        -0.44   0.33
notification   -0.07   0.89
number         -0.78   0.04*
screen         -0.53   0.24
search         -0.27   0.56
sms            0.31    0.50
text           -0.95   0.00*
voice          -0.41   0.36
volume         -0.47   0.28

5.2 RQ2. Do these correlations hold across different Android OS versions?

There were slightly higher numbers of enhancement requests from the community during the early versions of the Android OS, with variable levels of requests subsequently, and the most requests evident towards the release of Gingerbread. Correspondingly higher levels of source code changes were also seen over the early versions of the Android OS, but this trend changed as the OS evolved. On the other hand, while the pattern of results for some responses to enhancement requests remained stable over Android OS versions (e.g., for sms, volume, keyboard), others were erratic (e.g., api, screen, search). Source code changes in relation to enhancement requests in early versions of the Android OS were also likely driven by internal Google teams and market research, given that the product was new to the user community. Thus, this outcome was anticipated.

Furthermore, it was noted that developers committed more source code changes in relation to user requests over specific versions of the Android OS. This finding was triangulated with statistically significant correlations between developer responses in release notes and actual source code changes over both Donut and Gingerbread. In fact, there were statistically significant correlations between user requests and developer responses for the call, message, number, text, and volume features. This implies that Android developers considered user requests for specific issues and felt obligated to respond accordingly as the OS evolved. Also of note is that these features are all heavily customer facing, and would thus likely affect a large cohort of users.

Strong correlations were also seen between requests and code changes for alarm, with the reverse noted for button and volume, where developers made a disproportionately larger number of source code changes corresponding to fewer enhancement requests. This pattern was also seen for developer release note responses and their source code changes, where developers enacted much higher levels of source code changes corresponding to their release note responses for the number and text features. Our outcomes here mirror those established in previous work that examined request-response ratios [5], suggesting that Android OS developers have committed to specific features more than others over some OS versions. This, however, aligns with the strategies mentioned in the previous section to maintain high levels of user satisfaction. Of course, this evidence could also be interpreted to mean that developers were actually shifting attention to forgotten issues that had not been addressed in previous releases, or that the most prominent features might be harder to code, thus attracting more code changes (as noted above). These explanations are all plausible; of note, however, is that users were treated to a steady flow of software improvements and innovation in the form of the Android OS.

Beyond the evidence for the higher level of focus developers gave to the features mentioned above, despite a relative lack of user enhancement requests, other features were given proportionally more attention at specific times of the Android OS evolution. For instance, Google developers paid more attention to file and api requests over the earlier versions of Android, although text, call, and message issues were a bit more pressing. While these latter features may be regarded as fairly significant, perhaps aspects of file and api were related to the announcement of new evolving hardware, since there was rapid advancement in Android-supported hardware devices over the early versions [22]. This growth likely had an impact on developers' software evolution and improvement strategies, and therefore some features were prioritized over others, regardless of the level of user interest expressed.

These findings need inductive triangulation, but they still provide important lessons for software development practitioners and companies that want to ensure they produce essential products that answer consumer demands. First, there is good value in analyzing developer traces to reflect on a project's ecosystem, in order to identify improvement opportunities or to track a project's health. Our evidence also suggests that developers may contribute significantly to product success by considering user feedback to identify ways to enhance their offerings and to remain competitive [5]. Such a strategy is likely to boost a company's profit and satisfy stakeholders [21]. Furthermore, the findings in this work show that involving end-users during Android OS development, and especially from project inception, provided developers with critical insights that helped with perfecting software features, improving the success of the product. Such a user-centered policy is deemed crucial for software development (e.g., in Agile approaches).

Beyond practical relevance, our outcomes have implications for future work in terms of understanding how developers respond to user requests in evolving systems when there is competing user feedback. Replication studies are likely to provide evidence that will strengthen developer credibility in terms of their sincerity in responding to user requests. Furthermore, we see evidence of features with the highest number of user requests receiving most developer effort on some occasions, but the opposite at other times. While it is understandable that developers would pay attention to the issues that attract most user interest as a strategy for prioritizing work, the evidence presented here is not conclusive in explaining the higher level of effort developers expended on specific features with limited user requests. Furthermore, there is scope for studies to quantify the actual value provided to companies that respond to user feedback in the way that Android OS teams have done. These issues warrant further research.

5.3 Threats to Validity

While we have examined an important topic area, and have provided insights into the way developers use end-user feedback to evolve a product, we concede that, like all case studies, there are shortcomings to this work that may affect its generalizability [24]. We acknowledge that the Android OS community is quite large, and thus our outcomes may not be typical of small teams of developers. In particular, Android OS developers are likely to be adequately resourced, allowing them to conduct extensive tests of their product before launch. However, given that the Android issue tracker brings together developers and users, such that users can provide diverse feedback on the product's performance and improvement opportunities, we believe that the lessons learned from this case would be relevant to developers of other large products (e.g., Microsoft teams).

Although the Android issue tracker is publicly hosted, and so is likely to capture most of the community's concerns, issues may also be informally communicated to and addressed internally within development teams at Google. Similarly, unreported issues are not captured by our analysis. We also accept the possibility that we could have missed misspelled features. That said, the convergence of our results across multiple separate data sources (issue logs, release notes, and code changes) suggests that our approach was generally robust. In fact, our reliability assessment, albeit computed from conservative techniques, revealed excellent agreement between coders, suggesting that our findings benefitted from accuracy, precision, and objectivity [20].

We separated artifacts based on the dates of the major Android OS releases. Given that device manufacturers have been shown to delay upgrading their hardware to more recent Android OS releases [23], there is a possibility that some issues reported between specific releases were in fact related to earlier releases. However, this misalignment was not detected during previous contextual analysis [5], suggesting that our approach appropriately classified issues. Furthermore, our approach to extracting features from enhancement requests and release notes using NLP techniques [18, 19] may be open to validity and reliability concerns, especially if features are described using multiple words. This may also affect the alignment with code changes, which was studied at the file level. That said, we observed overlap in naming conventions across these sources (noted in the results), which suggests that this may not be a major threat to the work.

Finally, although the issue trackers of many mobile OSs are not publicly available, and the distribution of issues in these OSs may not be similar to that observed in the Android OS, mobile OSs such as Microsoft Windows and Apple iOS are all likely to follow release-maintenance cycles similar to that of the Android OS in order to remain competitive in the market.

6 CONCLUSION

User feedback is considered fundamental to developing successful software systems. Thus, developers stand to benefit from acknowledging and addressing such feedback in future software updates. At times developers must build initial products for potential users where conventional requirements gathering with a particular group of customers is not an option. Rather, in such settings developers release a beta version of their software to users, who then contribute suggestions for enhancements and new innovations towards feature perfection. Thus, to ensure the success and prosperity of software products, these developers should translate user concerns into meaningful system changes.

While previous researchers have explored the utility of user feedback, providing insights into the nature of the content provided by users and how such content may be extracted and used, few studies have examined how user feedback is actually utilized by developers. In particular, it has been observed that developers do respond to user requests in their release notes, but studies have not validated that such responses actually correspond to the development of code features. This study is an effort to bridge this gap.

Exploring Android OS enhancement requests, release notes, and source code histories, we studied how developer responses to user requests correlated with their source code activities, and whether such correlations held across different Android OS versions. Our outcomes reveal that Android development teams responded to the Android community's top enhancement requests, where user enhancement requests correlated with developer responses in release notes and with their source code changes. This pattern of results also held across some Android OS versions and specific features. In fact, at times developers seemed to spend more time working on features than could be justified by the observed level of user interest. We anticipate that these developers recognized the value of a user-centered approach to development, and its necessity for evolving a successful product. In doing so, these developers focused on features that attracted most user interest, but were equally strategic in paying attention to features that laid the foundation for further growth of the Android OS.

Notwithstanding the single case that was examined and the threats considered in Section 5.3, this work shows that a user-centered policy may be helpful for fueling systems success, and validates the sincerity of software developers in responding to user requests. That said, we encourage similar follow-up studies aimed at strengthening developer credibility. Future work may also look to further explain some of the patterns observed in this work, where developers expended higher levels of effort on some features that had limited user requests, and investigate how to quantify the value of user feedback.

ACKNOWLEDGMENT

The authors would like to recognize Google for making the Android issues, release notes, and code repository publicly available, which facilitated the analyses performed in this research.

REFERENCES

[1] Mohd Hairul Nizam Nasir and Shamsul Sahibuddin. 2011. Critical success factors for software projects: A comparative study. Scientific Research and Essays 6, 10 (2011), 2174-2186.
[2] Mahmood Hosseini, Keith Phalp, Jacqui Taylor, and Raian Ali. 2014. Towards crowdsourcing for requirements engineering. In Proceedings of the Empirical Track of REFSQ 2014. 82-88.
[3] Soo Ling Lim and Anthony Finkelstein. 2012. StakeRare: Using social networks and collaborative filtering for large-scale requirements elicitation. IEEE Transactions on Software Engineering 38, 3 (2012), 707-735.
[4] Pamela Bhattacharya, Liudmila Ulanova, Iulian Neamtiu, and Sai Charan Koduru. 2013. An empirical analysis of bug reports and bug fixing in open-source Android apps. In Proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR). IEEE, Washington, DC, 133-143.
[5] Sherlock A. Licorish, Amjed Tahir, Michael Franklin Bosu, and Stephen G. MacDonell. 2015. On satisfying the Android OS community: User feedback still central to developers' portfolios. In Proceedings of the 2015 24th Australasian Software Engineering Conference (ASWEC). IEEE, Piscataway, NJ, 78-87.
[6] Sherlock A. Licorish. 2016. Exploring the prevalence and evolution of Android concerns: A community viewpoint. Journal of Software 11, 9 (2016), 848-869.
[7] M. M. Lehman, D. E. Perry, and J. F. Ramil. 1998. Implications of evolution metrics on software maintenance. In Proceedings of the International Conference on Software Maintenance (ICSM). IEEE, Bethesda, MD, 208-217.
[8] Fabio Palomba, Mario Linares-Vásquez, Gabriele Bavota, Rocco Oliveto, Massimiliano Di Penta, Denys Poshyvanyk, and Andrea De Lucia. 2015. User reviews matter! Tracking crowdsourced reviews to support evolution of successful applications. In Proceedings of the 2015 International Conference on Software Maintenance and Evolution (ICSME). IEEE, Washington, DC, 291-300.
[9] Elizabeth Ha and David Wagner. 2013. Do Android users write about electric sheep? Examining consumer reviews in Google Play. In Proceedings of the IEEE 10th Consumer Communications and Networking Conference (CCNC). IEEE, Piscataway, NJ, 149-157.
[10] Sebastiano Panichella, Andrea Di Sorbo, Emitza Guzman, Corrado A. Visaggio, Gerardo Canfora, and Harald C. Gall. 2015. How can I improve my app? Classifying user reviews for software maintenance and evolution. In Proceedings of the IEEE 31st International Conference on Software Maintenance and Evolution (ICSME). IEEE, Washington, DC, 281-290.
[11] Johan Natt och Dag, Vincenzo Gervasi, Sjaak Brinkkemper, and Björn Regnell. 2004. Speeding up requirements management in a product software company: Linking customer wishes to product requirements through linguistic engineering. In Proceedings of the 12th IEEE International Requirements Engineering Conference. IEEE, Los Alamitos, CA, 283-294.
[12] Stephan Krusche and Bernd Bruegge. 2014. User feedback in mobile development. In Proceedings of the 2nd International Workshop on Mobile Development Lifecycle (MobileDeLi '14). ACM, Portland, OR, 25-26.
[13] Lorenzo Villarroel, Gabriele Bavota, Barbara Russo, Rocco Oliveto, and Massimiliano Di Penta. 2016. Release planning of mobile applications based on user reviews. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, 14-24.
[14] Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, and Boshen Zhang. 2014. AR-Miner: Mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, 767-778.
[15] Sherlock A. Licorish, Chan Won Lee, Bastin Tony Roy Savarimuthu, Priyanka Patel, and Stephen G. MacDonell. 2015. They'll know it when they see it: Analyzing post-release feedback from the Android community. In Proceedings of the 21st Americas Conference on Information Systems (AMCIS 2015). Curran Associates, Inc., Red Hook, NY, 1.
[16] Sherlock A. Licorish and Stephen G. MacDonell. 2015. Analyzing confidentiality and privacy concerns: Insights from Android issue logs. In Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering (EASE). ACM, Nanjing, China, 18.
[17] Sherlock A. Licorish and Stephen G. MacDonell. 2013. The true role of active communicators: An empirical study of Jazz core developers. In Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering (EASE). ACM, New York, NY, 228-239.
[18] Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Association for Computational Linguistics, Stroudsburg, PA, 173-180.
[19] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
[20] Ole Holsti. 1969. Content Analysis for the Social Sciences and Humanities. Addison-Wesley, Reading, MA.
[21] Björn Regnell and Sjaak Brinkkemper. 2005. Market-driven requirements engineering for software products. In Aybüke Aurum and Claes Wohlin (Eds.), Engineering and Managing Software Requirements. Springer, Berlin, Germany, 287-308.
[22] Margaret Butler. 2011. Android: Changing the mobile landscape. IEEE Pervasive Computing 10, 1 (2011), 4-7. DOI:10.1109/mprv.2011.1
[23] T. Wimberly. 2010. Top 10 Android phones, best-selling get software updates first. Android and Me, Nov. 26, 2010. https://androidandme.com/2010/11/news/top-10-android-phones-best-selling-get-software-updates-first/. Retrieved May 4, 2018.
[24] P. Runeson and M. Höst. 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering 14, 2 (2009), 131-164.
[25] Sherlock A. Licorish and Stephen G. MacDonell. 2018. Exploring the links between software development task type, team attitudes and task completion performance: Insights from the Jazz repository. Information and Software Technology 97 (2018), 10-25.
[26] Sherlock A. Licorish, B. T. R. Savarimuthu, and S. Keertipati. 2017. Attributes that predict which features to fix: Lessons for App Store mining. In Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering (EASE 2017) (Karlskrona, Sweden). ACM, 108-117.