Meet your Users: In Situ Data Collection from within Apps in Large-Scale Deployments

Nikos Batalas, P. Markopoulos, Javier Quevedo Fernandez, Jean-Bernard Martens
Department of Industrial Design, Eindhoven University of Technology

Abstract

Increasingly, ‘app-store’ releases of software are used as a vehicle for large-scale user trials ‘in the wild’. Several opportunities and methodological challenges arise from having little or no access to users, other than through the application itself. So far, researchers have needed to hardcode survey items into the software application studied, which is laborious and error prone. This paper discusses how these problems are addressed using TEMPEST, a platform for longitudinal in situ data collection. We discuss the use of TEMPEST, alongside commercial tools, to study the deployment and real-world use of a tablet application called idAnimate; this application has been designed to support the creation of simple animations as design representations during the creative design process. We discuss how the tool has supported the gathering of data in over 2000 installations, from both a development and a research perspective, and add to the discussion on future methodological research regarding large-scale app trials.

1. Introduction

Usability and user experience testing typically requires the recruitment of test participants who represent as closely as possible the intended users of the system being evaluated. Evaluations, whether in the laboratory or in the field, aim to ensure as much as possible that the usage evaluated is realistic. This helps ensure that the understanding of the user experience achieved reflects what can be experienced in real use, and allows one to expect that the problems identified in the interaction are also ones that might occur during actual usage. Potential improvements can then be prioritized and carried out before releasing the system.

Inevitably though, there are limits to how representative of actual use a test situation can be. In the short time of testing, participants will usually not have the opportunity to develop their own strategies of use, or become expert users of the application (Henze, Pielot, Poppinga, Schinke, & Boll, 2011). Field studies that ensure relatively longer exposure to the system under test have been heralded as the gold standard for evaluations and evaluative user research (Carter, Mankoff, Klemmer, & Matthews, 2008), but even they are limited in duration and sample size due to costs and other logistics. These limitations make it difficult to address larger populations of test participants and may even introduce sampling bias (Dufau et al., 2011). Last but not least, as pointed out by Brown, Reeves, & Sherwood (2011), reactivity is unavoidable, with participants configuring their behaviour during trials to meet what they interpret to be the expectations of the researchers.

All the above arguments suggest that, from the perspective of users, field trials are still fundamentally different from actual installation and use of a system on their own initiative and for their own purposes. Distribution channels modeled as app stores open up new opportunities to reach large numbers of users and gather insights from contexts of actual use, allowing for evaluative studies with increased ecological validity.
Compared to lab or field studies, the evaluation of app-store-released software takes place on user-owned hardware, which relieves researchers from the burden of providing and supporting devices along with the application software (McMillan, Morrison, Brown, Hall, & Chalmers, 2010). It also enhances the external validity of test results, as they now reflect the operation of software applications by actual users on the actual range of target devices where the software will be used.

Interestingly, this distribution mode also allows researchers to release applications as probes in a wider research context. In such cases, having their application put to actual use in the wild allows researchers to investigate not just how a particular application can be improved in terms of its usability, but also to achieve a better understanding of how users function in a wider environment, of which the particular software application is only a part. The application thus functions both as a conversation piece and as the medium through which a dialogue between developer (or researcher) and user can be realized. The potential for this is also acknowledged by Kranz, Murmann, & Michahelles (2013).

Techniques for surveying users during wide deployments for actual use are still open to exploration, and researchers are confronted with challenges on many levels. Cramer, Rost, Belloni, & Bentley (2010) identified the following challenges:
1. Getting the users and data we need: attracting desired users, but also acquiring feedback from users and their demographic data for association.
2. Technical development issues: coping with a variety of platforms, devices, and their differences, and dealing with technical challenges.
3. Applications under construction: deciding at what stage of development a research prototype can be released to maximize the value of user feedback.
4. Monetary concerns: development costs, server operating costs, and technical support issues.
5. Playing by the rules: taking ethics into account.

In our research, we seek to lower the barrier for enabling the dialogue of researchers with an unknown, large, and remote user population, as conducted through an application distribution. With regard to getting the data needed (1), we advocate an approach in which subjective (user-derived) data is collected alongside objective (application-generated) data. Such a practice is characterized by asking for data over time and progressively refining queries to users. The readily adaptable nature of such research constitutes fertile ground for a discussion of ethics in this context (5). With regard to technical development (2), we demonstrate, by way of an example solution, how tools can help developers add adaptive data-collection capabilities to their applications with minimal effort. The tool that we present in this paper is platform-independent, and allows serving users with both default and custom-designed, expressive interfaces that can facilitate the collection of rich data. Making use of general-purpose tools such as the one presented, instead of spending resources on developing application-specific solutions, alleviates concerns related to development costs (4). Finally, we present the instrumentation of an iPad application with the tool. The application has been deployed to over 2000 users through Apple’s App Store, and provides a concrete example of how the tool can be put to use. We offer an account of the issues that have arisen during the process.

2. Motivation

In order to collect data from deployments in the wild, it is common practice to instrument applications with logging capabilities that can provide detailed knowledge of software, device, and user performance, as Henze (2012) did.
Examples could be the frequency of button presses or the time spent on a specific task. This data, quantitative in nature, is generated by the application as a by-product of user interactions. A host of commercially available tools can be used to perform such logging, such as Flurry (n.d.), AskingPoint (n.d.), and UserMetrix (n.d.). While these logs can be used to infer a user's activity within an application, logging tools do not facilitate the gathering of data directly reported by the users themselves, or do so with substantial limitations as to when questions can be asked and what interfaces they offer users for responding, e.g. allowing a survey consisting of native form elements to be conducted when a user launches the application.

It is important to ask users explicitly about their personal motivations, experiences, and opinions. Even more so, such queries benefit from taking place in the immediacy of a specific context within the application; e.g., when a user decides not to save but, rather, to discard work done with the software, it could be important to understand the reason for such a decision.

In mainstream practice, application stores offer two main channels for user opinions to be fed back to developers. One allows users to report their satisfaction with an application by using rating scales. The second hosts comments, where users can provide open-ended feedback by typing in text fields (Ferreira, Kostakos, & Dey, 2012). However, these types of communication often take place outside the application, thus losing the immediacy and accuracy of the context and the time that the user feedback pertains to. An additional issue is that the initiative to provide feedback lies with the user, and there is no guidance as to what the feedback should be about. Consequently, the feedback provided does not necessarily focus on what is most useful for development and research purposes.

It is left up to the application itself to perform the collection of such data, and developers are tasked with programming such capabilities, usually as point solutions that might serve a specific application or purpose sufficiently, but do not lend themselves easily to wider adoption. A route for developers to take, especially when pressed for resources, is to hardcode (i.e. to make fixed parts of the source code) survey items into the application itself. If the questions need to be revised, this demands that users update their installed applications, at the risk, however, of introducing problems. Issuing frequent application updates may impact the usage and user experience. Also, users themselves decide if and when to update their applications, which might hinder data gathering purposes (Möller, Michahelles, Diewald, Roalter, & Kranz, 2012).
Figure 1. The TEMPEST client can be executed inside a native web browser component (webView) and interface with the rest of the application, while encapsulating operations related to the server.
This work is situated within the context of gathering application data as well as explicit user feedback, in the frame of a large-scale actual deployment, over sustained periods, within the deployed software itself, on widely adopted mobile and desktop platforms. It differs from the aforementioned approaches, though, in that it focuses on a general-purpose tool, rather than a point solution, and is platform-independent. The tool can be used to log both qualitative and quantitative data. Furthermore, it allows the survey to be adapted after release, without application updates: not only the questions submitted to users, but also the interface elements through which users provide data. In this way it is possible to go beyond the limited collections of radio buttons, checkboxes, and sliders that are typically found in digitized questionnaires, and use richer interfaces, possibly better suited to user experience research, that may allow richer data to be reported with less effort. For example, users could be asked to sketch their satisfaction with an application over time and identify relevant episodes that influenced this satisfaction, as proposed in the iScale method (Karapanos, Martens, & Hassenzahl, 2010), or use an expressive instrument for the assessment of emotions, as proposed in PrEmo (Desmet, Hekkert, & Jacobs, 2000).

We draw on the paradigm of the Experience Sampling Method (ESM) (Larson & Csikszentmihalyi, 1983), where study participants are prompted to provide, in situ, answers to repeated questions over time, as set by researchers. The method is meant to take place in the context of participants' regular lives, as they unfold. Prompting can be time-based, where the system prompts for experiences at specified times; it can be triggered by specific activities; or the initiative to report can be left to the participant. In the particular case of application deployments, the context is application-specific, and prompts for the provision of data can be triggered by the user's activity within the application (e.g., editing a document, saving it, etc.).
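To illustrate how such triggering might be organized in practice, the sketch below (written in Swift) shows one possible way for an application to decide whether to prompt for a report on a time basis, on a specific application event, or only at the participant's own initiative. The type and method names are illustrative assumptions, not part of TEMPEST or of any particular application.

import Foundation

// Illustrative sketch, not part of TEMPEST: one way to gate prompts so that
// time-based, event-based, and self-initiated reporting can coexist.
enum SamplingTrigger {
    case timeBased(interval: TimeInterval)   // prompt once a given interval has passed
    case eventBased(event: String)           // prompt on a specific application event
    case selfInitiated                       // the participant opens the report UI themselves
}

final class SamplingScheduler {
    private var lastPrompt: Date?
    private let minimumGap: TimeInterval     // lower bound between prompts, to avoid over-sampling

    init(minimumGap: TimeInterval = 60 * 60) {
        self.minimumGap = minimumGap
    }

    // Decides whether a prompt should be shown for the given trigger.
    func shouldPrompt(for trigger: SamplingTrigger, now: Date = Date()) -> Bool {
        switch trigger {
        case .selfInitiated:
            return true                      // never block the participant's own initiative
        case .eventBased:
            guard let last = lastPrompt else { return true }
            return now.timeIntervalSince(last) >= minimumGap
        case .timeBased(let interval):
            guard let last = lastPrompt else { return true }
            return now.timeIntervalSince(last) >= max(interval, minimumGap)
        }
    }

    func didPrompt(at date: Date = Date()) {
        lastPrompt = date
    }
}

// Example: ask whether to prompt when the user saves a document.
let scheduler = SamplingScheduler()
if scheduler.shouldPrompt(for: .eventBased(event: "documentSaved")) {
    // present the survey interface here
    scheduler.didPrompt()
}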
Gathering data that pertain to the same line of inquiry over time can reveal the unfolding of a temporal process, both descriptively and in terms of causal analysis (Bolger & Laurenceau, 2013). However, unlike ESM studies, where the participating population is carefully selected, in the case of large-scale deployments applications are released to an unknown audience. Thus, initial hypotheses about who chooses to use an application might need to be revised, and questions to participants might need to be adapted. Consequently, we consider it important for the researcher to be able to modify the questions at any time after deployment, both with regard to content and with regard to the feedback interfaces that are presented to users. In the sections that follow we elaborate on the technical requirements stemming from such configurability.

3. Tooling for Data Collection in App Store Deployments

We have argued that it is important to support an iterative approach to data gathering from large-scale application deployments, mainly because of limited initial knowledge about the user base. For this to take place, the following central requirements need to be addressed:
• The ability to adapt the focus of the information requested from users over time. As more information about users and the context of use becomes available, initial assumptions might need to be modified or revised, and requests submitted to users adapted accordingly.
• Maintaining an overview and control of the evolving study should be possible for researchers, in real time.
• Changes to the questions or sampling strategy should not require users to update the application.
To cater to such requirements, a solution would need to consist of:
• A server to store data and provide central monitoring.
• An application for the authoring and configuration of the sampling instruments by researchers, possibly featuring a graphical interface.
• A client that can be used as part of the application and can communicate with the server for updates and render the specified user interfaces when it is appropriate to do so. This client must be able to function offline, since connectivity to the Internet cannot be guaranteed at all times, especially for users of smartphones and tablets.
Implementing custom software for such functionality can be very expensive for small software developers and researchers, and quite likely out of scope for a typical app development project. To better address developer needs, a tool can abstract away from the components involved in the task and simply offer the following (a minimal interface along these lines is sketched after this list):
• A (single) function call to serve the relevant interface to the application user (client).
• Events generated by the client informing the host application of how the process went, e.g., that submission of data to a server failed or was successful.
• The ability to associate application-generated data with subjective data reported by users.
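As an illustration of how small this surface can be, the following Swift sketch outlines one possible shape for such a minimal interface. All names are hypothetical and are not TEMPEST's actual API; the sketch only mirrors the three points above: a single call to present a survey, events reported back to the host, and a way to associate application-generated data with user reports.

import Foundation

// Hypothetical sketch of the minimal surface such a tool could expose to a
// host application; the names are illustrative and not TEMPEST's actual API.
protocol SurveyClientDelegate: AnyObject {
    // Event reported back to the host application: did submission succeed?
    func surveyClient(_ client: SurveyClient, didSubmitData success: Bool)
}

final class SurveyClient {
    weak var delegate: SurveyClientDelegate?
    private var pendingMetadata: [String: String] = [:]

    // The single call the host application makes: present the survey
    // identified by `containerID` to the user.
    func presentSurvey(containerID: String) {
        // ...render the configured interface, collect input, try to submit,
        // attaching `pendingMetadata` to the user's answers...
        delegate?.surveyClient(self, didSubmitData: true)
    }

    // Associate application-generated (objective) data with the next report,
    // so that subjective and objective data can be joined on the server.
    func attachMetadata(_ metadata: [String: String]) {
        pendingMetadata.merge(metadata) { _, new in new }
    }
}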
Description of TEMPEST

TEMPEST (Batalas & Markopoulos, 2012) is a software platform aimed at supporting in situ data collection. It has been developed with the aim of enabling researchers to easily develop and administer surveys in situ. Surveys are easy to perform for standard questionnaires, for which web-based survey instruments abound (e.g., SurveyMonkey (n.d.)), but setting up and administering surveys in experience-sampling studies is an endeavour for which no tools are considered established. We hypothesize that studies using experience sampling or its variants would be more widespread if they did not require a substantial software engineering effort. The driving principles in building this software lie in two propositions:
• First, different user survey methods (questionnaires, diaries, experience sampling, etc.) and different methodological choices within each method can be effected as variations of programmed behavior obtained by setting relevant parameters. For example, the difference between repeated longitudinal sampling and a traditional web-based survey lies in the number of iterations that need to be performed, but both can be performed with the same materials. Likewise, the difference between a diary and an experience sampling procedure might be limited to how a set of questions is triggered.
• Second, in situ data collection technology needs to accommodate a spectrum of stakeholders in terms of their respective skills. This calls for a modular and layered design that is addressable by people with different types of expertise. This could mean, for example, not having to employ a software engineer to rebuild an application just to implement a change in the styling of an interface's typeface, or in the content and triggering of questions. On the other hand, a software expert should be able to easily extend the range of functions and report formats supported by the platform.
Figure 2. Previews of configurations in the graphical authoring editor, showing some standard and some custom widgets.

At its core, TEMPEST consists of a Web Application, written in JavaScript, which can render survey items on a variety of target platforms while preserving the essential interface content and structure. The Web Application takes advantage of an internet connection for immediate data relay and updates, but also remains functional offline by making use of the Application Cache and localStorage in HTML5. The client currently uses jQuery Mobile for its UI. Applications on mobile and desktop platforms can instantiate the Web Application inside a native web-browser component, on any application framework that offers one (e.g. Android: WebView, iOS: UIWebView, Windows Phone: WebBrowser, Windows 8: WebView, Qt: QWebView). Through a JavaScript bridge, the Web Application exposes its own API to native applications, to allow them to make use of the functionality it encapsulates.

Additionally, the tool offers a graphical WYSIWYG editor, which allows researchers to configure the interfaces with which to survey participants, as well as parameters for the Web Application and its native host, and is meant to democratize the effort of building protocols and monitoring collected data (Figures 2 and 4). The Graphical Editor application generates configuration interfaces procedurally (Figure 3), making use of the exact same configuration mechanisms and widgets that can be used for the composition of queries to the participant, essentially making use of itself for this purpose (Figure 1).
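The embedding pattern described above can be sketched for iOS as follows. This is a minimal illustration using the UIWebView component named in the text; the container URL, the "tempest-event" scheme, and the tempest.setMetadata call are assumptions made for the sake of the example, not TEMPEST's documented API.

import UIKit

// Minimal sketch of embedding a web client and bridging it to native code.
final class SurveyViewController: UIViewController, UIWebViewDelegate {
    private let webView = UIWebView()

    override func viewDidLoad() {
        super.viewDidLoad()
        webView.frame = view.bounds
        webView.delegate = self
        view.addSubview(webView)

        // Load the survey container; the web client can cache itself via the
        // HTML5 Application Cache so that later launches also work offline.
        if let url = URL(string: "https://example.org/tempest/container/idanimate-profile") {
            webView.loadRequest(URLRequest(url: url))
        }
    }

    // JavaScript-to-native direction: UIWebView bridges are commonly built by
    // intercepting navigation requests to a custom URL scheme.
    func webView(_ webView: UIWebView, shouldStartLoadWith request: URLRequest,
                 navigationType: UIWebView.NavigationType) -> Bool {
        if let url = request.url, url.scheme == "tempest-event" {
            print("Web client reported event: \(url.host ?? "unknown")")
            return false   // handled natively; do not actually navigate
        }
        return true
    }

    // Native-to-JavaScript direction: call into the API the web client exposes.
    func attachMetadata(json: String) {
        _ = webView.stringByEvaluatingJavaScript(from: "tempest.setMetadata(\(json));")
    }
}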
Figure 3. Configuring parameters for a standard multiple choice widget, and a custom made stopwatch with seconds and milliseconds hands.
Finally, the tool includes a server from which the Web Application and Graphical Editor components retrieve configurations and to which they submit data. Data is exchanged in the JSON format. With regard to the treatment of collected data, TEMPEST offers a monitoring interface where the latest data submissions can be overseen. It exports the data in the Comma Separated Values (CSV) text format, which can be imported into popular data analysis packages (e.g. R, Matlab, Excel, SPSS). To enable data analysis in more traditional programming environments, it also gives researchers access to the collected data in the JSON format. The software architecture enforces a separation of concerns, which renders the tool customizable to a high degree by developers and designers at different levels of involvement. Moreover, it enables end-users to design, test, and debug data collection experiments on a level that abstracts away from the more technical aspects of the platforms and instead deals only with components and representations relevant to designing in situ, longitudinal data surveys.

In the next section we discuss a case study of using TEMPEST to evaluate an application deployed through Apple's App Store for research purposes. We first introduce this application, called idAnimate.

4. Case Study: idAnimate

idAnimate is a tool for rapid animation authoring, which enables users without explicit animator skills to author expressive animations (Quevedo & Martens, 2012). The tool was designed to support the early conceptual phases of interaction design, and allows different stakeholders to ‘sketch’ interactions and interactive experiences. idAnimate has already been evaluated in two separate studies. First, a study was carried out in order to compare it with another state-of-the-art animation sketching tool (Quevedo & Martens, 2013). The study was conducted in a laboratory setting, where users were asked to perform a series of tasks with two tools (idAnimate and K-Sketch (Davis, Colwell, & Landay, 2008)). Qualitative and quantitative measures were collected, such as time to completion, user satisfaction, and simplicity of use. The second study was carried out in collaboration with the design department of a multinational company, where practitioners conducted a brainstorming workshop for generating game concepts (Quevedo & Buskermolen, 2013). The questions in this study were mostly aimed at understanding how well idAnimate suited the context of design workshops.

By their nature, these studies targeted a small and select audience, expected normative uses of the application, and could not shed light on the diversity of uses that idAnimate could have in practice. A new wide deployment through the Apple App Store was decided upon, aimed at achieving a deeper understanding of what role animated visualizations may play in various contexts and how the tool fares in supporting such a role. Understanding exactly in what contexts this application can become useful, and for what purposes it is used, was the key research question of this deployment. For the purposes of conducting a longitudinal study of the use of idAnimate in the wild, the following requirements needed to be met:
• Easily ask questions to users, in different parts of the application.
• Be able to quickly set the questions up at first and redefine their exact content at later times, after the application has been initially deployed.
• Avoid disturbing users when such changes happen.
• Capture data additional to what users report, e.g. metadata, and relate what is collected with streams from other tools, such as Flurry.
• Be able to retrieve and visualize the outcome as soon as it has been submitted by the user.
5. Data Collection Mechanisms for idAnimate

The researchers employed two types of data collection mechanisms for the study: analytics and user sampling.

Analytics

Although TEMPEST can interface with the application and gather machine data as well, and despite the fact that laborious testing went into TEMPEST and into its use within idAnimate, analytics tools that have been commercially proven stable were used to collect application-generated data. These include measures such as the number of sketches created per animation and the amount of time spent drawing and animating. For this purpose, the Flurry and AskingPoint analytics tools for iOS were used.

User Sampling

While analytics can provide objective measures about actual usage of the tool, these alone are not sufficient to learn about the different contexts of use. For this purpose the study required explicitly asking users about their motivations and context while they were working with the tool. The researchers wished to know the background of the users, i.e., who is using the application, how much previous experience with animation they have, and demographic information, such as age or profession. Furthermore, the researchers needed to know for what purpose each particular animation was made, why it was made with that particular tool as opposed to other available ones, and how many people were involved in making it. It was important to be able to determine the exact point in time at which a particular question would be presented to the user. It is not possible to collect such information with available analytics tools.

Hard-coding specific survey items within the application itself would have been a quick solution, but was considered sub-optimal. Since the sampling instruments would need to be updated over time as the investigation of users progressed, this would require issuing frequent application updates. Furthermore, the researchers did not have the time to develop a complete sampling solution for the study. The surveys needed for this case consisted of conventional forms, of the kind that can be created using existing surveying websites (the likes of Google Forms). Such forms can be loaded within a WebView component and would also have allowed researchers to update their survey. Such solutions, however, have not been developed to explicitly serve as embedded modules in an application, and do not accommodate requirements specific to this scenario. To do so, they would have to be able to function offline, interface programmatically with the native tablet application, and allow the addition of metadata to the user responses. In light of these limitations of other solutions, TEMPEST presented the best match to the developers' needs, so it was used for surveying the application users.
Figure 4. Calendar heatmap and overview of recently collected data within the tool's interface.

Embedding TEMPEST in idAnimate

idAnimate is an iPad application; its developer is a computer scientist with experience in iOS development, but relatively little hands-on experience with JavaScript. However, embedding the Web Application module of TEMPEST in the application was deemed a straightforward programming process. In more detail, using the module required the developer to take two steps:
• Instantiate a UIWebView object with an HTTP request for a survey container.
• In the authoring interface of TEMPEST, in his role as a researcher, configure the content of the survey.
In functional terms, the module behaves within idAnimate as follows:
• On first run, the WebView downloads the TEMPEST client, which then resides in the browser's app-cache, making itself available offline. The client assigns a unique ID to itself, which also identifies the particular user (without violating their privacy).
• On each initialization, the client queries its server for updates to itself and for content that is specific to the survey container. It also uploads data that had been collected offline. If it finds itself offline, it resorts to working with previously downloaded content.
• As users interact with the survey and submit data, the client either transmits the data or, if offline, stores it in the browser's localStorage.
• The JavaScript client makes calls to the native application at various points to let it know of important events, such as making a request to the server or its result.
• If idAnimate needs to attach implicit data to user input, it does so in code through the JavaScript bridge, and the client attaches the data accordingly. Since the developer also wished to use Flurry, an ID was attached by the application to the answers provided by users, to associate the two streams of data (qualitative and quantitative) as derived from the same source.
User sampling is triggered at various events during the use of the application: on first launch, data about the user's profile is requested (e.g. gender, age group, professional identity). Repeated measures are taken when editing a project (e.g. asking about the purpose of the work, Figure 6), replaying an animation, or sharing a project by email.
Figure 5. Distribution of reported reasons for exporting animations made with idAnimate.

Given the use of different data collection platforms, there had to be a way to couple the data collected from each, and to associate and filter the different system-level events and user responses based on the background of the participants or the context of use. To achieve the coupling of the data sources, when the application starts for the first time, a unique anonymous identifier (UDID) is generated and stored in the application. The answers to queries performed with TEMPEST are tagged with this UDID, by calling the instantiated Web Application's JavaScript API. The UDID is also attached to every event that is logged using the analytics tools.
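A minimal sketch of this coupling mechanism is given below. It assumes that the identifier is a locally generated UUID persisted in UserDefaults and that the embedded web client exposes an illustrative tagging call; neither assumption reflects a documented TEMPEST or analytics API.

import UIKit

// Sketch of coupling subjective and objective data streams via one identifier.
func anonymousInstallID() -> String {
    let key = "anonymousInstallID"
    let defaults = UserDefaults.standard
    if let existing = defaults.string(forKey: key) {
        return existing
    }
    // Generated once, on first launch; it identifies the installation, not the person.
    let fresh = UUID().uuidString
    defaults.set(fresh, forKey: key)
    return fresh
}

func tagDataStreams(webView: UIWebView) {
    let id = anonymousInstallID()
    // Tag answers collected through the embedded web client with the identifier...
    _ = webView.stringByEvaluatingJavaScript(from: "tempest.tagResponses('\(id)');")
    // ...and attach the same identifier to every analytics event, so that the
    // two streams can be joined later. The call below is a placeholder standing
    // in for whatever the chosen analytics SDK provides.
    logAnalyticsEvent("animation_exported", parameters: ["installID": id])
}

// Placeholder to keep the sketch self-contained; a real application would call
// its analytics SDK (e.g. Flurry or AskingPoint) here instead.
func logAnalyticsEvent(_ name: String, parameters: [String: String]) {
    print("analytics:", name, parameters)
}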
Deployment

Adaptations in questions. The release on Apple's App Store yielded over 2000 unique installations of idAnimate from March 2013 to December 2013. During this time, the researchers monitored the answers that were provided by users, and were able to effectively adapt the questions based on intermediate results. An example of such an adaptation concerns the profile questionnaire that is issued the first time the application runs. It was initially set up to ask for gender, occupation, and age group. After the release, the research team realized that it was also important to ask how experienced the user is with animation. Using the web-based interface of TEMPEST, they adapted the questionnaire to include an item about experience in animation in a matter of minutes. Later on, they also realized that the options for the age groups in the profile ("younger than 18", "18-45", "older than 45") did not have sufficient resolution. They were changed to "younger than 14", "14 to 18", "18-25", "25-45", "older than 45". Also, the profile question included a limited number of options regarding the background and occupation of the user (student, engineer, designer, etc.), but users were also allowed to describe custom professions using the "Other" field. After collecting data from a few hundred users, the research team realized that a significant number of users described themselves as teachers. The research team then proceeded to add this as one of the predefined options. Similar changes took place over time in questions that were asked in other contexts of the application, such as when editing an animation or when playing one back. In all these cases, no changes needed to be made in the application code, and users could continue using the existing version of the software they had downloaded initially.

Collecting and analyzing the data. Collecting the data is only the first phase of a study. It is equally important to be able to quickly access this data as it builds up over the course of time. The Flurry analytics tool posed a serious problem for the researchers, as it was not possible to easily access all of the data that had been submitted for each of the UDIDs registered. AskingPoint, on the other hand, did make this possible, with a mechanism that, despite not being documented, was provided by the AskingPoint staff after the research team contacted them about this particular issue. TEMPEST offers a simple approach to extracting the data. The research team was able to access the website and check the responses as they came in, but, more importantly for them, they were able to write custom programs to process the data. Retrieving the data in the JSON format, the idAnimate research team wrote a series of custom scripts in the Ruby language that generate reports on the particular aspects they were interested in.

The following example illustrates the type of results that the idAnimate research team was able to collect using TEMPEST. idAnimate has the capability to generate movies from the animations. The movies can be stored on the device, or shared via email or to social networks. The researchers sought to understand the actual purposes for which people were exporting animations to movies. For this purpose, upon exporting a movie, users are asked to describe the reason behind their action through the following question, presented in the middle of the screen: "For what reasons did you export this movie?"
Users could answer "I want to archive the animation", "I want to put it in a presentation", "I want to share it with someone", or "Other" with a text input. Figure 5 illustrates the distributions of the possible answers for each of the user groups. The responses show that designers export animations mostly to share them, students are more interested in archiving a copy, and artists wish to present them to an audience.
Among the "Other" responses, people reported exporting movies also to test or check their quality. Also, in all contexts the tool was being used most of the time by a single person and less frequently by groups.
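To illustrate the kind of report generation described above, the following sketch shows how the exported JSON responses could be tallied per profile group. It is written in Swift for consistency with the other sketches in this paper, whereas the idAnimate team used Ruby scripts; the field names and the file path are assumptions about the export format.

import Foundation

// Sketch of report generation from exported responses (assumed JSON layout).
struct ExportResponse: Codable {
    let installID: String
    let profession: String
    let exportReason: String
}

// Counts export reasons per reported profession, e.g. designers vs. students.
func tallyExportReasons(fromJSONAt path: String) throws -> [String: [String: Int]] {
    let data = try Data(contentsOf: URL(fileURLWithPath: path))
    let responses = try JSONDecoder().decode([ExportResponse].self, from: data)

    var counts: [String: [String: Int]] = [:]
    for response in responses {
        counts[response.profession, default: [:]][response.exportReason, default: 0] += 1
    }
    return counts
}

// Usage: print a simple textual report.
if let report = try? tallyExportReasons(fromJSONAt: "export_responses.json") {
    for (profession, reasons) in report {
        print(profession, reasons)
    }
}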
Figure 6. Asking a question in idAnimate.

6. Related Work and Discussion

Conducting research with applications in wide distribution, such as through an online app store, and specifically soliciting feedback from users in a targeted way at specific events and contexts, presents software challenges. These pertain to conducting longitudinal research, to the diversity of devices and contexts the application is used in (e.g. lack of connectivity to the Internet), to the diversity of the user population, and to the development effort required to service these needs. This holds even more for mobile applications, as opposed to desktop environments. A general-purpose solution, packaged as an easy-to-use tool, can be a useful aid, both to the development effort and to the research process. The multitude of commercial general-purpose platforms for collecting application usage data is indicative of the value of such systems. However, typical commercial solutions for surveying or analytics have not yet turned their focus to features that fully respond to the spectrum of challenges associated with gathering feedback from users. For example, while AskingPoint currently offers the ability to serve one-time questionnaires to the users of a mobile application, this is neither meant to take place over time, nor with respect to the specific context of activity that the user is performing.

Other works have also addressed the issue of sampling users within the application, with minimal impact on the application development process. For encouraging users to report satisfaction from within applications, Lettner, Holzmann, & Loesch (2013) propose a way to make it easy for developers to push and trigger questionnaires with 5-point rating scales, focusing on the System Usability Scale (SUS), after the application has been released. Their proposed method is specific to Android development. With regard to adapting the questions posed to users after the release of the application, McMillan et al. (2010) released an iPhone game in which they could ask questions of their own interest to their participant population in exchange for in-game tokens. The questions could be updated without requiring an application update, and would either offer drop-down lists of possible answers or allow free text input. However, while consistent with the approach presented here, their survey instrument was specific to the game deployed. These works remain point solutions to a problem that can be addressed in more generic terms, and they only address facets of the larger issues it involves. Instead, the work presented here takes into account the diversity of platforms that applications can be deployed to, and also considers the fact that the data collection process needs to be monitored and maintained. An indispensable part of the tool is the Graphical Editor, which not only allows configuration of the questions, but also helps monitor the data collection.

The deployment with idAnimate has provided a unique opportunity to test the tool's suitability for such research and to derive directions for future features. In particular, the following opportunities for future functionality have been identified. A first insight is to further enhance the adaptivity of the questionnaire, by making it possible to associate survey items with practically any action the user can take with the application after release: this requires that the mapping of user actions to survey items be factored out of the application software, and can be accomplished by implementing a queryable registry that remains synchronized with the server for updates. Second, an important direction to take in the immediate future is to provide further automation of the study setup, with rule-based partitioning of the user population.
Depending on how they respond to questions, users could automatically be assigned to dynamic groups. Separate questions, more appropriate to the common characteristics of each group's users, could then be assigned to that group. The need for such a development was illustrated by the following situation during the case study: idAnimate was developed for use in design studios by professionals or students; however, out of 655 users who reported their age group, 103 answered "younger than 18", and after the revision of the question, 121 additional users answered that they were younger than 14 years old, revealing the potential recreational use of the application by an unexpected user group. The evaluation planning had not anticipated this audience, and the research methods were not tailored to child participants. In this case, the opportunity arises to address the survey of younger participants separately, by splitting the survey according to reported age and adapting questions to each user group.

This occurrence and its possible solution also point to an issue of ethical practice, namely how to manage a user's understanding and expectations of the data that is being gathered and, in this particular case, asked for. The importance of dealing with such issues is stressed by McMillan, Morrison, & Chalmers (2013). In idAnimate, users had the option to dismiss the questions, but the issue remains of keeping users sufficiently informed about possible implications when they do decide to offer responses. Flexibility in adapting the questions during the course of the study means that they can change in ways that researchers might be unable to predict at the start, and certainly after participants have provided their consent. This calls for special attention to potential issues that might arise with regard to trust, transparency, and the implementation of informed consent procedures that are appropriate for adaptive longitudinal studies. Finally, in light of changing the measures that are recorded over time, methodological pitfalls can arise, such as the treatment of past data in relation to more recent data. Should data that has been collected earlier and has been superseded by more recent inputs be taken into account and, if so, in what way?

7. Conclusion

Conducting user studies with actual software applications deployed on a large scale, in context, is an emerging area of investigation for the field of human-computer interaction, with enormous potential, where best practices are yet to be established. This paper has argued that tool development is a necessary step that can create new opportunities for data collection and can maximize the potential to gather valuable data, while avoiding pitfalls from a methodological and an engineering perspective. The contribution of this paper has been to identify and elaborate a set of distinct requirements that tools in support of research with large-scale app deployments could satisfy. The paper discusses these requirements as they arise from the specific case of researching application software in the wild, and discusses in what ways TEMPEST, a system for in situ data collection, meets these requirements. The validity of TEMPEST was demonstrated with a case study involving the deployment of an application for research purposes to a population of over 2000 users. The solution we proposed can help researchers cope with the lack of prior knowledge about the user base. TEMPEST can work on different hardware and software platforms, and allows a researcher to monitor results and update sampling instruments as the deployment unfolds and knowledge about the actual user base and their usage of the application is accumulated.

8. References

1. AskingPoint. (n.d.). Retrieved from http://www.askingpoint.com
2. Batalas, N., & Markopoulos, P. (2012). Considerations for computerized in situ data collection platforms. Proceedings of the 4th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS '12). ACM, New York, NY, USA, 231-236.
3. Bolger, N., & Laurenceau, J. (2013). Intensive longitudinal methods: An introduction to diary and experience sampling research.
4. Brown, B., Reeves, S., & Sherwood, S. (2011). Into the wild: Challenges and opportunities for field trial methods. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). ACM, New York, NY, USA, 1657-1666.
5. Carter, S., Mankoff, J., Klemmer, S., & Matthews, T. (2008). Exiting the cleanroom: On ecological validity and ubiquitous computing. Human-Computer Interaction, 23, 47-99.
6. Cramer, H., Rost, M., Belloni, N., & Bentley, F. (2010). Research in the large: Using app stores, markets, and other wide distribution channels in ubicomp research. Proceedings of the 12th ACM International Conference Adjunct Papers on Ubiquitous Computing - Adjunct (Ubicomp '10 Adjunct). ACM, New York, NY, USA, 511-514.
7. Davis, R. C., Colwell, B., & Landay, J. A. (2008). K-sketch: A 'kinetic' sketch pad for novice animators. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08). ACM, New York, NY, USA, 413-422.
8. Desmet, P. M., Hekkert, P., & Jacobs, J. J. (2000). When a car makes you smile: Development and application of an instrument to measure product emotions. Advances in Consumer Research, 27(1).
9. McMillan, D., Morrison, A., & Chalmers, M. (2013). Categorised ethical guidelines for large scale mobile HCI. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). ACM, New York, NY, USA, 1853-1862.
10. Dufau, S., Duñabeitia, J. A., Moret-Tatay, C., McGonigal, A., Peeters, D., et al. (2011). Smart phone, smart science: How the use of smartphones can revolutionize research in cognitive science. PLoS ONE, 6(9), e24974.
11. Ferreira, D., Kostakos, V., & Dey, A. K. (2012). Lessons learned from large-scale user studies: Using Android Market as a source of data. International Journal of Mobile Human Computer Interaction (IJMHCI), 4(3), 28-43.
12. Flurry. (n.d.). Retrieved from http://www.flurry.com
13. Henze, N. (2012). Hit it!: An apparatus for upscaling mobile HCI studies. CHI '12 Extended Abstracts on Human Factors in Computing Systems (CHI EA '12). ACM, New York, NY, USA, 1333-1338.
14. Henze, N., Pielot, M., Poppinga, B., Schinke, T., & Boll, S. (2011). My app is an experiment: Experience from user studies in mobile app stores. International Journal of Mobile Human Computer Interaction (IJMHCI), 3(4), 71-91.
15. Quevedo, J., & Buskermolen, D. (2013). Towards understanding the potential of sketching animated visualizations in generative workshops. Proceedings of the 6th International Conference on Designing Pleasurable Products and Interfaces (DPPI '13). ACM, New York, NY, USA, 106-115.
16. Quevedo, J., & Martens, J. (2012). Demonstrating idAnimate: A multi-touch system for sketching and rapidly manipulating animations. Proceedings of the 7th Nordic Conference on Human-Computer Interaction: Making Sense Through Design (NordiCHI '12). ACM, New York, NY, USA, 767-768.
17. Quevedo, J., & Martens, J. (2013). idAnimate: A general-purpose animation sketching tool for multi-touch devices. Proceedings of the Fifth International Conference on Creative Content Technologies, May 2013, Valencia, Spain, 38-47.
18. Karapanos, E., Martens, J., & Hassenzahl, M. (2010). On the retrospective assessment of users' experiences over time: Memory or actuality? CHI '10 Extended Abstracts on Human Factors in Computing Systems (CHI EA '10). ACM, New York, NY, USA, 4075-4080.
19. Kranz, M., Murmann, L., & Michahelles, F. (2013). Research in the large: Challenges for large-scale mobile application research. A case study about NFC adoption using gamification via an app store. International Journal of Mobile Human Computer Interaction (IJMHCI), 5(1), 45-61.
20. Larson, R., & Csikszentmihalyi, M. (1983). The experience sampling method. New Directions for Methodology of Social & Behavioral Science, 15, 41-56.
21. Lettner, F., Holzmann, C., & Loesch, L. (2013). Mobile surveys. In R. Moreno-Díaz, F. Pichler, & A. Quesada Arencibia (Eds.), Computer Aided Systems Theory - EUROCAST 2013 (Vol. 8112, pp. 400-408). Springer Berlin Heidelberg.
22. McMillan, D., Morrison, A., Brown, O., Hall, M., & Chalmers, M. (2010). Further into the wild: Running worldwide trials of mobile systems. Proceedings of the 8th International Conference on Pervasive Computing, 210-227. Berlin, Heidelberg: Springer-Verlag.
23. Möller, A., Michahelles, F., Diewald, S., Roalter, L., & Kranz, M. (2012). Update behavior in app markets and security implications: A case study in Google Play. Proceedings of the 3rd Intl. Workshop on Research in the Large, held in conjunction with Mobile HCI, 3-6.
24. SurveyMonkey. (n.d.). Retrieved from http://www.surveymonkey.com
25. UserMetrix. (n.d.). Retrieved from http://www.usermetrix.com