Considerations for Computerized In Situ Data Collection Platforms

Nikolaos Batalas
Eindhoven University of Technology
Den Dolech 2, 5600MB Eindhoven, The Netherlands
[email protected]

Panos Markopoulos
Eindhoven University of Technology
Den Dolech 2, 5600MB Eindhoven, The Netherlands
[email protected]

The past decade has seen an impressive evolution of devices that are affordable, portable and networked, merging computing and sensing capabilities. PDAs in the past, and smartphones and tablet computers in the present, are prime products of this evolution. The more these devices are adopted by consumers and become instruments of communication and information handling within their daily activities, the better suited they become to in situ data collection. They are always on one's person, always functioning, multitasking, ready to serve the needs of a researcher, while still performing their primary functions.

ABSTRACT

Computerized tools for in situ data collection from study participants have proven invaluable in many diverse fields. Yet the platforms developed within academic settings eventually tend to be abandoned and become obsolete, and newer tools are susceptible to a similar fate. We believe this is because, although most of the tools aim to satisfy the same functional requirements, little attention has been paid to aligning their development models as well. In this paper we propose an architectural model that satisfies established requirements and also promotes extensibility, interoperability and cross-platform functionality between tools. In doing so, we aim to introduce development considerations into the larger discussion on the design of such platforms.

To leverage the potential of these devices for data collection, the research community has, over the last decade, built software tools, some of which have been made freely available[3][5][9][12]. They tend to be complex pieces of software that need to be configurable by researchers, have networking capabilities, and often feature server components. Developing such platforms is a non-trivial task. It therefore makes sense to have generic tools that can accommodate common data collection needs across research areas, such as the compilation and distribution of questionnaires to be filled out at opportune moments, the capture by participants of photos and other media of use to the researcher, or the detection, through the device's sensors, of the context within which events take place.

Author Keywords

Software Architecture and Engineering; End-User Programming; In situ data collection

ACM Classification Keywords

J.4 [Computer Applications]: Social and Behavioural Sciences; D.2.11 [Software Engineering]: Software Architectures

INTRODUCTION

In the academic community, efforts to build such tools have always been carried out on platforms available at the given time, targeting the functional requirements needed to implement research methods. The focus has been on the particular methods and their applications, rather than the development process. As such, development has been carried out with an end-user mindset, where the software is a means to an end, not the end itself[14]. As a result, iterations on the tools have been concerned with the evolution of the requirements that need to be satisfied and not with the iterative evolution of the software as an artifact.

Research methods that rely on data collection from participants in situ, while their daily lives unfold, are seeing wide adoption by researchers in a variety of fields, ranging from clinical psychology[16] to human-computer interaction[7]. Such methods include diaries, where participants are instructed to log events, as well as the Experience Sampling Method[15] and its variants. Among the most common reasons for using these methods are mitigating memory biases in self-reports and making sure that observations take place within the context of interest, thus ensuring ecological validity[6].

Admittedly, the community interested in building data collection tools seems to be converging on an implicit paradigm for the tools' constituent components, as indicated by agreement on the need for client-server components and configuration interfaces[13][8]. However, this paradigm has only been roughly outlined and has not yet been made explicit in more specific terms. We feel that the lack of discourse on this topic is hampering the software products in terms of interoperability and cross-platform functionality.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. EICS’12, June 25–26, 2012, Copenhagen, Denmark. Copyright 2012 ACM 978-1-4503-1168-7/12/06...$10.00.


A study by Agresti[1] identifies awareness (the developer must know of the existence of the reusable artifact) and acceptability (the reused artifact must be acceptable to the developer for use in the new project and its environment) as the two major factors that prevent one from reusing code. In the same study, the complexity of the code examined for reuse was found to be a prominent inhibitor of acceptability. Additionally, in the case of open source software developed in academic environments, projects naturally tend to lose their leading developers as people move on to other things, which causes them to fade as viable options for reuse[4].

In this paper we address this issue by introducing requirements for development into the general discussion on requirements, and by proposing a generic but realistically implementable model for the construction of software for data collection in the field. In doing so, we aspire to promote the case for a general development platform, extensible and future-proof, as the basis for these tools. In the following sections we outline the issues that data collection tools face in addressing challenges in the application of in situ research methods. Requirements gathered from previous works are then summarized, and requirements for development are proposed. Finally, a model which satisfies these requirements is detailed, and an example of its implementation is briefly discussed. We conclude by offering thoughts on the benefits of the approach.

In sum, for all their merits, individual software applications for in situ data collection have not, to our knowledge, been able to withstand the corrosive effects of ageing hardware and inscrutable codebases. Past tools have been good at the tasks they were designed for, on the platforms they were written for. However, they are platform-specific; their code bases have not been migrated to newer platforms, nor have new features been added to them, and so they are being rendered obsolete. More recent works are susceptible to the same challenges.

BACKGROUND

Researchers planning to perform data collection in a computerized manner face significant obstacles if they choose to employ ad hoc means. They have to make sure that their tools are robust, perform as expected during the course of the study, and reliably gather the data intended for collection.

The situation is aggravated by the fact that each tool stands insulated, developed in isolation from similar efforts. This has led to a very fragmented landscape of tools. However, the observation that all these tools meet familiar functional requirements gives us hope that expanding the problem domain to include developers in the set of stakeholders will, much as the functional requirements have, lead to succinct, interoperable components for future development efforts to rely upon.

A preferable option is to use one of the tools already available, as is. Tools that have been used widely in the past and extensively tested in the field[3][9] are still in use, but have aged along with the hardware they were written for. Handheld devices become outdated at quite a fast pace: smartphones running Windows Mobile, dominant in 2007, are no longer on the market, and PDAs are now relics of the past. To use such tools in the present, researchers have to get hold of legacy devices and impose their use on participants.

REQUIREMENTS

Previous efforts to build data collection tools have identified a number of requirements the software should meet with regard to researchers and participants. However, they have focused only on these two roles, overlooking that of the developer. The following section lists the findings of past works in relation to the researcher and the participant, and suggests considerations for the development of the software.

Modern tools do not currently present such problems in terms of hardware, but still need to be considered in light of how well they can adapt to the evolving needs of research methods. As the sensor technology embedded in smartphones grows, the potential for capturing rich data in novel modalities should not be left untapped, and ways to elicit input from participants are evolving as well, moving past the traditional questionnaire format into diverse research instruments. These might involve multimedia or even functional application prototypes, as can be the case in user-centered design[10]. A given piece of software therefore cannot universally satisfy every requirement researchers might conceive, and for them, employing alternative ways to collect data also comes at the expense of implementing the software interfaces to do so.

Requirements for researchers

Researchers should have the ability to monitor the process of data collection in real time. This lets them detect and deal with participant dropout, which could occur because of fatigue or device failures, and better curate collected data for timely follow-ups such as interviews[3][5][11][12]. In conjunction with real-time monitoring, real-time modification of the study should also be provided. The setup of a study could contain oversights or errors, or be found to yield lower information quality than expected; easy, remote reconfiguration of the study in real time can help salvage such situations[5][8].

Indeed, some of these tools are available as open source software[3][9][5][12], and researchers who can develop software could theoretically adapt an open source tool to tailor its functionality to their own needs. Reuse of source code is commonly seen as a way to increase productivity and save time and effort in software development. On the other hand, several barriers to software reuse exist, and a study by Agresti[1] is indicative of those developers face when considering reuse of software components.

It is impossible to meet the exact needs of every researcher with regard to the interfaces that can be presented to participants for data gathering. Domain-specific studies require custom interfaces, up to the point of fully functional applications, that only the researchers understand and should be able to deliver to their participants. In the programmed behaviour of data collection software, the strict separation of studies into diary, experience sampling, and context- or event-contingent questionnaires and all their variants can be quite artificial. Study protocols could benefit from mixing methods that have traditionally been considered separately, as has been the case in [17].

Professional programmers, on the other hand, cannot foresee or cater for all the possible ways a participant might be required to interact with the system for data input, or how collected data could be visualized. To deal with this, the platform needs to maximize the ability of problem-owners to cater to their own needs, without requiring them to be concerned with greater issues.

At the same time, stakeholders in the development process should be thought of as users as much as developers. A programmer focusing on a client application can be thought of as a user of the server's facilities, or vice versa, while a researcher interested in developing a specialized widget for participants to interact with can be thought of as a user of the client's facilities, and so on.

Requirements for participants

Participants should not have to use mobile devices issued by the researchers. For researchers, the monetary cost can be high, and the number of participants would be limited by the number of available devices. For participants, adversities related to the adoption and retention of technology could apply. Rather, a participant's own device should be used[5][11].

It would therefore be fair to say that the platform we should be building faces an interesting dichotomy. On one hand, it needs to rely on professional programming skills that can abstract, and make accessible, sets of features realized through hard-to-manage technologies, such as servers, databases and low-level hardware capabilities. These relate to the whole range of stakeholder interests, right down to the participant.

Also, the burden of installing the software or maintaining its uninterrupted function should be minimal, and the software should not stand in the way of the regular use of the device[9][13]. Access to the study should be possible from multiple platforms: as participants move between devices and contexts, from smartphone to tablet to desktop PC, access to the study should be able to follow.

On the other hand, our platform should accommodate more specialized end-user development intents, which relate only to small partitions of stakeholders on a case-by-case basis. These could be achievable with means as simple as producing configuration specs, with the slightly more complex customization of components, or even with regular programming. In these cases, the application of widely established practices should be allowed and the use of pre-existing knowledge encouraged; demand for domain-specific scripting and the use of custom frameworks should be discouraged.

The ability or desire to comply with the study's instructions may vary from user to user. Study designs should allow tailored configurations for each user[5].

Requirements for development

In addition to the previous set of requirements, as proposed by previous works, this section lays out considerations that pertain to the development of the tools. Both the platform, as the product of software engineering, and the people involved in its development and use, as drivers of the engineering process, need to be considered.

SYSTEM DESIGN SPECIFICATIONS

Our model acknowledges two dimensions of development-related concerns. One dimension is that of developer roles, adopting the view that the distinction between end-user and professional development is one of intent, and is continuous rather than dichotomous, as defined by Ko et al. In [14] they state:

A notable characteristic of platforms for data collection in the field is that they can, and need to, be in perpetual development. Their goals will constantly shift, and new sets of features will always need to be added to take advantage of the latest hardware or to satisfy the evolving needs of researchers. To address such issues, a careful layering of components with clear roles needs to be applied. Hardware-specific layers should aim to be as thin as possible, and component layers should be isolated from one another, so that modifications to one cause minimal ripple effects in the next.

“as the number of intended uses of the program increases, a programmer will have to increasingly consider software engineering concerns in order to satisfy increasingly complex and diverse constraints. Second, even if a programmer does not intend for a program to be used by others, circumstances may change: the program may have broader value, and the code which was originally untested, hacked together, and full of unexercised bugs may suddenly require more rigorous software engineering attention”.

Another distinctive property is that development of, and extensions to, the platform need to accommodate a wide spectrum of developer roles. Multiple people with varying intent could potentially take on the role of developer, ranging from the social scientist to the professional programmer. They have varying levels of expertise and they target different aspects of the platform. For example, the former have little expertise in, and little patience for, building low-level mobile services or database systems.

This view of developers allows us to match their variable intent to specific system components, which make up the second dimension of our concerns. In this way, we aim to make explicit how the people involved in developing, and making use of, the platform relate to each system layer, and to offer some insight as to how concerns can be separated not only across components, but also across developers who aim to take advantage of the potential for reuse.


We are equally excited about the opportunities for development, deployment and use that are thereby opened up[2]. In Figure 1, we take the liberty of dividing the layers below the browser into optional and essential, to indicate that the essential layers are the absolute minimum for the most basic data collection study, such as a diary, to run. The more the needs of a study scale, the more essential and feature-rich the 'optional' layers need to be.


By discussing the role each component plays in the platform, we hope to show how the requirements we established previously can be satisfied:


Server


The server maintains the central data store, and exposes an API, implemented in the controller, through which requests to retrieve or store data can be made. Requests can be made by both the researcher client and the participant client, providing the grounds for real-time communication between researchers and participants.
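As a sketch of what issuing requests against such a server API might look like, the snippet below composes HTTP request descriptors for storing and retrieving data. The endpoint paths, action names and payload fields are our own illustrative assumptions; the model does not prescribe them.

```javascript
// Sketch of a minimal client for a server API of the kind described
// above. Routes and payload shapes are hypothetical illustrations.
function buildRequest(baseUrl, action, payload) {
  const routes = {
    storeResponse: { method: 'POST', path: '/responses' },
    fetchConfig:   { method: 'GET',  path: '/configurations' },
  };
  const route = routes[action];
  if (!route) throw new Error('unknown action: ' + action);
  return {
    method: route.method,
    url: baseUrl + route.path,
    // POSTs carry the payload as a JSON body, GETs as query parameters.
    body:  route.method === 'POST' ? JSON.stringify(payload) : null,
    query: route.method === 'GET'  ? payload : null,
  };
}

// Both the participant client and the researcher client would issue
// such requests over HTTP, e.g. via XMLHttpRequest or jQuery.ajax.
const req = buildRequest('https://example.org/api', 'storeResponse',
                         { participant: 'p01', answers: { mood: 4 } });
```

Keeping the request description separate from the transport is one way to let the same code run in a plain browser and inside a native wrapper.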


Participant client

Native layer and client API


There are cases where the browser's features are not sufficient. The native layer serves two purposes. First, it integrates the platform with the rest of the participant's device as a regular application: it can be launched and used in a way familiar to the participant, or it can monitor its environment through a background process, trigger events, and make use of the server API. Second, it provides access to the client's hardware components and operating system functions. Sensors, local data stores and background processes can be accessed, encapsulated in the client API, and offered to the browser as javascript functions to be used by the layers above.
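A small sketch of what such a client API shim could look like: native-layer functions (represented here by a hypothetical `nativeBridge` object; the name and its members are our assumptions) are wrapped as plain javascript functions, with fallbacks so the same code still runs in an ordinary browser without a native layer.

```javascript
// Sketch of a client API wrapping a hypothetical native bridge.
// When no native layer is present, the wrappers degrade gracefully,
// so platform objects can run unchanged in a plain browser.
function makeClientApi(nativeBridge) {
  return {
    getLocation: function () {
      if (nativeBridge && nativeBridge.getLocation) {
        return nativeBridge.getLocation(); // delegate to native sensors
      }
      return null;                         // plain browser: no sensor access
    },
    storeLocally: function (key, value, store) {
      store = store || {};                 // stand-in for a local data store
      store[key] = JSON.stringify(value);
      return store;
    },
  };
}

// With a native layer present, sensor data becomes available to the
// javascript layers above:
const api = makeClientApi({ getLocation: () => ({ lat: 51.44, lon: 5.49 }) });
```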


Figure 1. An overview of the system’s components. Arrows indicate http requests to store or retrieve data.

This section presents a generic view of how the system is layered. Figure 1 gives an overview of the three main subsystems: a server and two client components, one used by the participant and the other by the researcher. Central to the system's composition is the dynamic execution environment that the browser has become, commonplace nowadays on smartphones, tablets and desktop systems. Besides existing as a standalone application, it is also integrated into frameworks such as iOS, Android and Qt, providing interfaces to lower-level system features. The javascript programming language features many capable and very actively developed frameworks, such as jQuery and jQuery Mobile, which offer exemplary extensibility in the form of easy-to-reuse plugins. Also, the advancing scripting features of HTML5 make the browser a robust and rich platform that can accommodate many levels of development expertise.

Platform objects and third party objects

The platform objects, adhering to the object-oriented paradigm, are units that encapsulate javascript code and state variables, and are instantiated and executed in the browser. They can be the interface components exposed to the user, and can be as simple or as complicated as the needs of the study dictate and the developer's skills allow. They can make use of the underlying client API, as well as the server API. An example of a very simple platform object would be an HTML form. More complicated ones can implement complex logic, display graphics and audio, take pictures with a smartphone's camera, or even be functional application prototypes to be evaluated by users. If, in the set of their state variables, they make explicit the kinds of parameters they are initialized with, then object configurations can be produced for them through the authoring interface on the researcher's client.
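A minimal sketch of such a platform object, assuming a simple question widget (the member names, including the `params` declaration, are our own illustration): it makes its initialization parameters explicit, keeps collected state in instance variables, and renders itself as an HTML form.

```javascript
// Sketch of a minimal platform object: explicit parameters, internal
// state, and a render method producing the interface shown to the
// participant. Member names are illustrative, not a fixed contract.
function QuestionObject(config) {
  // Declared parameters, so an authoring GUI can discover and set them.
  this.params = { question: 'string', choices: 'array' };
  this.question = config.question;
  this.choices = config.choices || [];
  this.answer = null;            // state collected from the participant
}

QuestionObject.prototype.render = function () {
  const options = this.choices
    .map(c => '<option>' + c + '</option>')
    .join('');
  return '<form><label>' + this.question + '</label>' +
         '<select>' + options + '</select></form>';
};

const q = new QuestionObject({ question: 'Mood right now?',
                               choices: ['good', 'neutral', 'bad'] });
```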

This makes our system revolve around the authoring and distribution of what are essentially web applications, with the additional capacity to call system-specific features made available by the client API, and with the native layer substituting for the server that traditional web applications need to be in constant communication with. We agree with Anttonen et al., who state that in the future the vast majority of software will be developed using web technologies, while binary programs will be limited to system software.

To illustrate what can be made possible, the system could also afford third-party objects: custom or off-the-shelf javascript code that can be agnostic of the underlying API and extend the platform objects with access to data, well-established applications, and APIs already available on the Web.


Object configurations

Object configurations, while also objects in themselves, are distinct from platform objects in that they are the system components that can be authored in the simplest possible way, even through a GUI. They contain sets of parameters meant to initialize and customize the platform objects. JSON is a format well suited to expressing them.
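A sketch of how a JSON object configuration might initialize a platform object, assuming (as in our earlier illustration) that objects declare their parameters in a `params` member: the configuration is plain key-value pairs, checked against what the object declares. The function and field names are hypothetical.

```javascript
// Sketch: applying a JSON configuration, as might be produced by the
// authoring GUI, to a platform object that declares its parameters.
function applyConfiguration(target, json) {
  const config = JSON.parse(json);
  for (const key of Object.keys(config)) {
    if (!(key in target.params)) {
      // Reject keys the object did not declare, so a GUI-produced
      // configuration cannot silently clobber internal state.
      throw new Error('unknown parameter: ' + key);
    }
    target[key] = config[key];
  }
  return target;
}

// A configuration as the authoring GUI might emit it:
const widget = { params: { prompt: 'string', maxPhotos: 'number' } };
applyConfiguration(widget,
  '{"prompt": "Photograph your lunch", "maxPhotos": 3}');
```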


Researcher client

Platform objects and authoring


Purely a web application, the researcher client harbours exactly the same platform objects that are available to the participant client. Its configuration authoring interface can parse these objects and enable the production of object configurations through a GUI. Moreover, the code for additional platform objects can be submitted to the server for distribution to participant clients. Participant management, and the allocation of platform objects to participants, are also handled through this module by calling the server API.


Figure 2. Estimation of how system components are distributed across development stakeholders.

Monitoring interface

The monitoring interface is a distinct component that uses the server API to query the server for data submitted by participant clients, which it handles locally in the browser for processing or visualization.

The platform objects that have been implemented are configurable through a generic GUI, which produces key-value pairs in JSON, and they can be customized and updated for each participant in real time. Participants' responses are also stored in JSON. They can be monitored and graphically visualized over time through a separate monitoring component, which uses jQuery plugins to draw graphs. By observing the behaviour of participants, the researcher has been able to modify the interfaces to better suit the inputs participants provide. Optionally, responses can be downloaded as comma-separated values for processing in packages such as Matlab or SPSS.
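The JSON-to-CSV export mentioned above can be sketched as follows; this is a simplified illustration (assuming one flat JSON record per response, with no commas inside values), not the platform's actual export code.

```javascript
// Sketch: flattening JSON-encoded participant responses into
// comma-separated values for import into SPSS or Matlab. Assumes flat
// records sharing the same keys, with no commas or quotes in values.
function responsesToCsv(responses) {
  if (responses.length === 0) return '';
  const columns = Object.keys(responses[0]);       // header from first record
  const rows = responses.map(r => columns.map(c => r[c]).join(','));
  return columns.join(',') + '\n' + rows.join('\n');
}

const csv = responsesToCsv([
  { participant: 'p01', time: '2012-05-01T09:00', mood: 4 },
  { participant: 'p01', time: '2012-05-01T13:00', mood: 2 },
]);
```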

Implications for developer roles

Having explored the dimension of system components, in Figure 2 we offer a conceptual distribution of how each component maps onto development concerns, from the end-user developer role, whose product concerns a small set of the user population, to the professional developer role, whose output affects a greater body of users. The system components that fall to end-user developers, such as object configurations, can be mere sets of key-value pairs, simply produced through a GUI. Platform objects contain scripted and programmed behaviour, and the popular libraries and tools available for end-user authoring of HTML and javascript can be put to use to produce them; they are the key components in making the platform truly customizable. Behaviours more complex than the browser allows need to be implemented in the client API, and are more aptly dealt with in a professional mindset.

The researcher also has the ability to add to the pool of platform objects, produce configurations for the code, and serve the new content to participants, without having to redeploy any component. For this to take place, the javascript object needs to implement two short methods and provide values for three specific members, in order to interface with the system. In the future, the addition of an event-handling component in the client API and the extension of the configuration authoring GUI with trigger specifications will allow the implementation of experience sampling studies as well, without modification to the platform objects.
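The text does not name the two methods and three members, so the contract below is entirely hypothetical; it sketches only the *kind* of interface meant: a little metadata the system can read, plus hooks to initialize the object and to hand back collected data.

```javascript
// Hypothetical illustration of an object/system contract of the kind
// described above. None of these names come from the paper.
const exampleObject = {
  // three illustrative members the system might read:
  name: 'photoPrompt',
  version: '0.1',
  params: { prompt: 'string' },
  // two illustrative methods the system might call:
  init: function (config) { this.prompt = config.prompt; return this; },
  collect: function () { return { prompt: this.prompt, photo: null }; },
};

exampleObject.init({ prompt: 'Photograph your workspace' });
```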

It should also be noted that the more volatile and disposable a component is, the easier it should be to produce.

CONCLUSION

Challenges for in situ data collection tools lie not only in how to allow participants to run applications that perform data collection, or in how to allow researchers to appropriate the tools in their studies. The need for developers to keep chasing moving targets, such as constantly evolving hardware platforms and new design goals, without a general direction for the research community at large, leads to recurring reimplementation of tools where only incremental or partial implementation should be needed.

IMPLEMENTATION

The model is being progressively implemented, and the resulting application is able, at the time of this writing, to support an ongoing diary study with conditionally branching questionnaires, featuring offline data logging for Android clients with limited connectivity. In the absence of a dedicated client for a participant's device, access has still been possible from a regular browser on a smartphone, tablet or desktop, thanks to the support that the jQuery Mobile library provides for a host of different devices.

By enforcing a separation of end-user development concerns as related to system components, we have proposed a model that can be implemented on established standards which enjoy massive support from the greater software development community, and an exciting future.

Applying a common direction for tools, such as the one suggested in this paper, provides benefits for both developers and users of the tools.

• For those interested in the development of the tools, agreement on a modular approach can lead to shared client and server APIs that will promote the interoperability and reuse of components. The cost of implementing new methods in software can be lowered, and iterations on past efforts can focus only on areas of interest without having to address the system as a whole.

• For those interested in using the tools to conduct studies, access to a greater base of components to choose from becomes possible, as well as the mixing of components from different projects to produce a custom system that suits their needs. Additionally, having systems that are directly comparable can take us closer to concisely explicating research protocols for purposes of repetition and comparison across individual studies.

This work is not at odds with previously implemented platforms, but rather complementary to their efforts. Modern tools also apply the technologies we advocate in their implementations[12], but do so in isolation from each other. It is therefore the opportune moment to address, within the academic community, the overarching issue of how to attain maintainable, reliable and interoperable data collection tools. We urge academic developers and software designers who are building these tools to also consider these issues in the relevant discourse.

REFERENCES

1. Agresti, W. W. Software reuse: Developers' experiences and perceptions. Business 2010, January 2010 (2011), 48–58.

2. Anttonen, M., Salminen, A., Mikkonen, T., and Taivalsaari, A. Transforming the web into a real application platform. ACM Press, 2011, 800–807.

3. Barrett, L. F., and Barrett, D. J. An introduction to computerized experience sampling in psychology. Social Science Computer Review 19, 2 (2001), 175–185.

4. Bezroukov, N. Open source software development as a special type of academic research (critique of vulgar Raymondism). First Monday 4, 10 (1999), 1–16.

5. Carter, S., Mankoff, J., and Heer, J. Momento: support for situated ubicomp experimentation. ACM, 2007, 125–134.

6. Carter, S., Mankoff, J., Klemmer, S., and Matthews, T. Exiting the cleanroom: On ecological validity and ubiquitous computing. Human-Computer Interaction 23, 1 (2008), 47–99.

7. de Sá, M., and Carriço, L. A mobile tool for in-situ prototyping. Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI '09 (2009), 1.

8. Fischer, J. E. Experience-sampling tools: a critical review. Journal of Youth and Adolescence 57, 3 (2009), 1–3.

9. Froehlich, J., Chen, M. Y., Consolvo, S., Harrison, B., and Landay, J. A. MyExperience: a system for in situ tracing and capturing of user feedback on mobile phones. ACM, 2007, 57–70.

10. Froehlich, J., Dillahunt, T., Klasnja, P., Mankoff, J., Consolvo, S., Harrison, B., and Landay, J. A. UbiGreen: investigating a mobile tool for tracking and supporting green transportation habits. ACM, 2009, 1043–1052.

11. Gerken, J., Dierdorf, S., Schmid, P., Sautner, A., and Reiterer, H. Pocket Bee: a multi-modal diary for field research. ACM, 2010, 7–10.

12. Hicks, J., Ramanathan, N., Falaki, H., Longstaff, B., Parameswaran, K., Monibi, M., Kim, D. H., Selsky, J., Jenkins, J., Tangmu, H., and Estrin, D. ohmage: An Open Mobile System for Activity and Experience Sampling. ACM (2011).

13. Khan, V.-J., and Eggen, B. Features for the future experience sampling tool. Human Factors (2009), 2–5.

14. Ko, A. J., Myers, B., Rosson, M. B., Rothermel, G., Shaw, M., Wiedenbeck, S., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., et al. The state of the art in end-user software engineering. ACM Computing Surveys 43, 3 (2011), 1–44.

15. Larson, R., and Csikszentmihalyi, M. The experience sampling method. New Directions for Methodology of Social and Behavioral Science 15 (1983), 41–56.

16. Myin-Germeys, I., Oorschot, M., Collip, D., Lataster, J., Delespaul, P., and Van Os, J. Experience sampling research in psychopathology: opening the black box of daily life. Psychological Medicine 39, 9 (2009), 1533–1547.

17. Khan, V.-J., de Ruyter, B., Markopoulos, P., and Eggen, B. Reconexp: A way to reduce the data loss of the experience sampling method. Technology (2008), 471–476.