School of Computing and Mathematical Sciences

CMS 21 (2001)

Using the World-Wide Web as a Platform for an Interactive Experiment

Paul Vickers
School of Computing and Mathematical Sciences
Liverpool John Moores University
Byrom Street, Liverpool L3 3AF, UK
Tel: +44 (0)151 231-2283; Fax: +44 (0)870 133-9127
E-mail: [email protected]

ISBN 1 902560 050

1. Abstract

Computer science research (especially that in human-computer interaction) often requires empirical studies to be conducted. When more than a few subjects are involved in complex and time-constrained experimental tasks, conducting experiments becomes problematic. Faced with the task of undertaking a quantitative analysis of a program auralisation system, we designed and implemented a system that would allow a laboratory-based experiment to be conducted and administered automatically. The system addressed the following challenges:

· How can we effectively utilise the multimedia capabilities of web browsers to support empirical experimentation in general and for program auralisation specifically?
· How can we conduct experiments remotely or at multiple locations?
· How can we use the Internet to collect experimental results and data?

The application was used to conduct an experiment into how musical program auralisation can be used to assist novice programmers with debugging. The application gave two main benefits. First, its novel mechanisms facilitated investigation into how the program debugging process can be improved for novices. Second, it offered the facility to obtain experimental data that would have been much harder to capture otherwise. We discuss issues surrounding the implementation of such a system, including the technological challenges and problems faced by those wishing to implement similar systems, especially those who are not computing specialists (e.g. psychologists).

2. Keywords

World-wide web, experiments, HCI, development

3. Introduction

Research in computer science tends to be either experimental (quantitative) or case-study-based (mostly qualitative, perhaps with some quantitative analysis). Surveys and action-research are also sometimes used. Experiment-based research (what Robson (1993, p.18) terms the hypothetico-deductive approach), especially when in the field of human-computer interaction (HCI), or other cognitive disciplines such as psychology, normally requires observations and measurements to be made of subjects performing tasks. The data collected from the observers can then be analysed by statistical techniques to test hypotheses. To get data that can be meaningfully analysed with statistical methods requires a number of observations and subjects. These observations typically require either large teams of observers or the experiment to be conducted repeatedly with smaller groups. When an experimental design calls for a number of subjects to undertake a series of experimental tasks, some means of observing their behaviour is needed.
The type of evaluation will determine exactly what data need to be recorded, but it is usual to collect information about the time taken to complete certain tasks and the different ways individual subjects respond to the stimuli (protocol data). For instance, it may be important to know how many times subjects select certain menu options in an experimental program interface. A common solution to the problem of gathering subject protocol and behaviour data is to record the sessions onto video tape for later analysis. When small numbers of subjects are used this approach can be quite effective. However, when larger numbers are required, or when the experimental sessions are lengthy, this approach can easily become burdensome. Many experimental designs require human observers to collect the data as the experiment proceeds. However, this can be problematic. First, it is possible for human observers to unintentionally affect the situation under observation (perhaps by inappropriate interventions or by subjects feeling uneasy about being observed). Second, it is time consuming to involve observers in lengthy experiments, especially when a number of different observers are needed. Having multiple observers can lead to inconsistency in the way data are gathered. Observer bias can increase inconsistency. Automation of experimental procedures offers some advantages. Graziano and Raulin (1993) proposed automation as a way of reducing observer bias because a machine treats all subjects the same. If a single machine (or set of networked machines) can be programmed to carry out the data collection automatically, then the need for multiple observers is also removed. Having no human observers then reduces the risk of subjects feeling they are being watched, assessed, or judged as they perform their tasks.

The Internet and associated technologies, such as web browsers, offer researchers the opportunity to automate much of the routine work associated with conducting an experiment and carrying out the observations, and to do it consistently. Such applications require specialist software to be developed and specialist networking and information technology (IT) knowledge and support. Computer science education is moving more and more towards using the Internet and the world-wide web (WWW, or web) as a medium for learning activities. One topic that is particularly challenging to teach is computer programming. As part of a research project that investigated how novices could be assisted in the program debugging task using a technique called program auralisation (the mapping of program data to non-speech audio), Vickers and Alty (1996; 1998; 1999) constructed the CAITLIN musical program auralisation system. The project required experimentation to test the effectiveness of the approach. This report discusses some of the technological issues surrounding the development and implementation of a system for conducting an interactive experiment into program auralisation and debugging via the WWW. The project showed that the WWW offers many opportunities and benefits for carrying out experiments, but that there are technological challenges that must be met to do so successfully. Furthermore, the WWW and the Internet require a mixture of technologies to build computer-administered-experiment applications. This means that developers must be technically skilled, particularly in the realm of programming.

4. The WWW as a platform for a program auralisation and debugging experiment

4.1. Background

As part of an investigation into program auralisation we needed to conduct an experiment.
This report describes how a web-based application was built that would automatically conduct and administer (including data gathering) such an experiment. Within the field of auditory display (see Kramer (1994) for a summary of the field), program auralisation, or the mapping of program data to sound (usually non-speech audio), has attracted increasing levels of interest. Brown and Hershberger (1992) were among the first to suggest using sound to aid the visualisation of software. Jameson's Sonnet (Jameson, 1994), Bock's Auditory Domain Specification Language (ADSL) (Bock, 1994) and the LISTEN Specification Language (LSL) (Boardman, Greene, Khandelwal et al., 1995) developed the idea. Jameson built a visual programming language to add audio capabilities to a debugger, while ADSL and LSL added audio to programs at the pre-processing stage. However, to date, very little formal evaluation has been undertaken. The CAITLIN system was developed to provide auralisations within a structured musical framework as it has been argued that music offers many advantages as a communication medium (Alty, 1995; Alty, Vickers & Rigas, 1997; Alty & Rigas, 1998). Details of the CAITLIN musical auralisation approach are given elsewhere (Vickers & Alty, 1996; Alty & Vickers, 1997; Vickers & Alty, 1998; Vickers, 1999). In summary, auralisations are achieved by mapping the constructs of a Pascal program (IF, IF…ELSE, CASE, CASE…ELSE, REPEAT, WHILE and FOR) to short musical tunes (motifs). The key aspects of a construct (what we call its points of interest (Vickers & Alty, 1996)), such as entry and exit, and evaluation of Boolean expressions, are each assigned a motif whose content is consistent with the structure and harmony of the other motifs for the construct. A hierarchic approach was taken to the motif design to allow the taxonomy of Pascal constructs to be maintained (Alty & Vickers, 1997).
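To make the mapping concrete, the point-of-interest scheme can be pictured as a lookup table. The construct names and points of interest follow the description above, but the motif labels below are invented placeholders rather than CAITLIN's actual motifs:

```javascript
// Hypothetical sketch of a construct-to-motif lookup of the kind CAITLIN performs.
// Each Pascal construct has motifs for its points of interest (entry, exit,
// Boolean test); the motif names here are placeholders, not the real tunes.
var motifs = {
  'IF':     { entry: 'if-entry', test: 'if-test', exit: 'if-exit' },
  'WHILE':  { entry: 'while-entry', test: 'while-test', exit: 'while-exit' },
  'REPEAT': { entry: 'repeat-entry', test: 'repeat-test', exit: 'repeat-exit' },
  'FOR':    { entry: 'for-entry', exit: 'for-exit' }
};

// Return the motif to play when execution reaches a given point of interest,
// or null if no motif is defined for that construct and point.
function motifFor(construct, pointOfInterest) {
  var m = motifs[construct];
  return m ? (m[pointOfInterest] || null) : null;
}
```

The important property, noted above, is that the motifs for one construct are musically consistent with each other, so a listener can associate entry and exit sounds with the same construct.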
One aspect of that research was to explore how musical program auralisations could assist novice Pascal programmers with the debugging process. An experiment was required in which subjects would be given a set of programs to debug using program auralisations for some of the tasks. An experiment was designed in which the time taken by subjects to complete a series of bug location tasks could be measured. Qualitative data regarding subject workload and perceived annoyance was also required as was information regarding subjects’ prior computing and music experience. The experimental design chosen was a within-subjects or repeated-measures design. In half the tasks subjects would be given full program documentation (specification, pseudo-code design, program source, sample input data, expected output data and actual output data). For the other half, subjects would have auralisations of the programs available in addition to the standard documentation. This design allowed each subject to be tested in both experimental conditions (auralisation present and not present).


It is beyond the scope of this report to describe the experiment fully, details of which can be found elsewhere (Vickers, 1999; Vickers & Alty, 2000). However, to provide context for the description of the application, we briefly summarise the experimental method. The experiment comprised eight different debugging exercises, A1 to A8. For each exercise subjects were given a specification for the program, a pseudo-code program description, sample input data, expected output data, and the actual output data. Each program was syntactically correct (that is, it would compile) and contained a single logical error (bug). Subjects were allowed as long as necessary to read the program documentation. When ready to continue, they were provided with the program source code after which they had a maximum of ten minutes to locate the bug (by highlighting the line of code in which they thought the bug resided). After completing each exercise, (whether by locating a candidate bug, or by running out of time) subjects were required to provide NASA task-load index (TLX) measures (NASA Human Performance Research Group, 1987; Hart & Staveland, 1988) so that comparisons of their perceived workload could be made between the auralised and non-auralised conditions. Subjects performed the eight tasks in order beginning with A1. Half the tasks had accompanying auralisations, the treatment (auralisation) being given to alternate tasks. Half the subjects had the treatment applied to tasks A1, A3, A5 and A7; the others had the treatment applied to tasks A2, A4, A6 and A8. Thus, every program was tested in both the treatment and non-treatment states so as to attempt to balance any differences between programs (such as complexity). In addition, as all subjects performed tasks in both states, any differences between the subject groups should be balanced. Without automation, subjects would require a set of printed documentation for each exercise as well as access to the musical auralisations. 
Some means of preventing access to program source code until they were ready to begin the debugging phase would also be needed, as would the ability to measure the time spent in each debugging phase. In a manual implementation, a team of human observers equipped with stopwatches would be necessary to gather this timing data and to give subjects the program source code at the appropriate points. The experiment was conducted on twenty-two subjects, which meant that the required observer team would be large. As the observers would only be needed to note the time taken to complete each exercise and to give subjects the documentation and source code for subsequent exercises, such an approach was considered unwieldy and not resource-efficient. First, finding sufficient observers was problematic. Second, it would be necessary to train observers in the requirements of the experiment. Even then, it could not be guaranteed that all observers would behave consistently. Therefore, it was decided to automate the experiment so that a computer could collect timing information and provide subjects with the documentation for each exercise at the appropriate point. Additionally, an automated system could collect user protocol data and subjects' qualitative responses for the TLX analysis and experience questionnaire. Gathering the protocol data would be problematic in a manual system. Either each subject's session would have to be video-taped for later analysis, or the observers would have to make very detailed observations; this amount of attention could lead to subjects feeling uncomfortable.

4.2. Application design and structure

The requirement, then, was to build an interactive system that would allow the above experiment to be administered automatically. The experiment had the following main requirements:

1. To provide subjects with all necessary program documentation at the appropriate time.
2. To log and control the amount of time spent on the debugging phase of each task.
3. To log all user responses in respect of the bug location tasks, the TLX scores, and the questionnaire.
4. To provide access to the musical program auralisations for those exercises that required them.

It makes sense to meet these requirements within a single application, rather than have separate programs to administer the different tasks. If a single application can provide a seamless and logical flow through the various experimental tasks and at the same time control the timing issues, capture all required user input, and provide mechanisms for playing the musical auralisations, then the user's experience will be better. The ubiquitous nature of the graphical user interface (e.g. Windows) and its mouse-based point-and-click interaction paradigm means that most computer users already possess the skills necessary to operate a web browser. The rapid growth of the WWW has exposed millions of users to browser-based interaction in a very short time. On-line shopping and other services mean that it is now commonplace to interact with sophisticated web applications via a browser program. Given people's familiarity with this interaction paradigm (especially computer science students, from which population the experiment's subjects were drawn), it makes sense, from a usability standpoint, to provide a web-based front end to the application that will run the experiment. This offers the advantages of a simple and well-understood interface and the ability to transmit all collected data back to a central web server. Researchers are starting to use the WWW to conduct experiments. For example, Walker, Kramer, and Lane (2000) built a simple application to allow a psycho-physical experiment to be conducted remotely over the WWW1. The application was a bespoke system designed to meet the needs of a single experiment. In our case, the features needed from the application also meant that a bespoke system was necessary as no general-purpose systems existed that would support all the requirements of our experimental design. In their implementation of an on-line learning system, Phelps and Reynolds (1998) identified the importance of using a common interface to ensure the consistency of the application's look-and-feel. In turn, this helps to make the application simple to use. Using a web browser for the interface enabled us to follow this principle. Phelps and Reynolds also found that users wanted on-line help. We did not consider extensive help necessary for our system as it had very few controls. Instead, we made sure that interface components had short descriptions (tool-tips) that popped up when the mouse pointer was moved over them (see Figure 3). Additional clues were provided by the clickable headers that were dynamically labelled according to their status (see Figure 3). Navigation through the application was in the form of what Ng (1997) described as a guided tour, with the buttons to advance subjects to the next stage being dynamically provided as required.
The status pane (see Figures 2 and 3) provided additional contextual cues to help subjects determine where they were at any given time, where they had been, and what remained to be done. The hypertext paradigm, an integral part of the WWW, encourages non-linear thinking and navigation. Because progress through the experiment’s tasks was linear, the use of hypertext was kept to the bare minimum necessary to implement the required system functions (such as buttons, pop up messages and forms, etc.) In our case, the main reason for using the WWW as a delivery mechanism is its ability to package text, graphics, audio, data gathering, and control buttons within a single document. An instructions screen (provided after logging in) gave exemplars of all the application’s interface components and instructions on how to use them. Subjects tended to spend around ten minutes working through these instructions. As Phelps and Reynolds (1998) observed from evaluation of their learning system, users would rather read detailed information from paper than from the screen. For this reason, and because the documentation for the programming exercises could not all fit on the screen at the same time, the documentation was also made available in printed form. Subjects could then choose the presentation that best suited them. Thus, the structure of the application can be described with the Jackson structure diagram given as Figure 1 (see Davies and Layzell (1993) for an explanation of the Jackson method and notation).
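The guided-tour navigation described above reduces to a fixed, linear sequence of stages with a dynamically supplied 'next' action. A minimal sketch, with stage names assumed from the structure described in this report:

```javascript
// Fixed, linear sequence of experiment stages (a "guided tour"); the only
// navigation offered to subjects is a dynamically provided "next" button.
// Stage names are illustrative, inferred from the report's description.
var stages = ['login', 'instructions', 'questionnaire',
              'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'finished'];

// Given the current stage, return the stage the "next" button should lead to,
// or null if the stage is unknown or the tour is already complete.
function nextStage(current) {
  var i = stages.indexOf(current);
  if (i < 0 || i === stages.length - 1) return null;
  return stages[i + 1];
}
```

Because the sequence is fixed, the application can both label the 'next' button appropriately and drive the status pane from the subject's position in this list.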

1. Walker's experiment can be found at http://milgram.rice.edu


[Figure 1 is a Jackson structure diagram of the application. The Application is a sequence of Login, Instructions, Complete Questionnaire (C1), and an iteration of Experimental Tasks (C2). Each Exercise is a sequence of Read Documentation; Debugging (either Provide Auralisation (C3) or No Auralisation, then Locate Bug with an iterated Select Line# (C4)); Gather TLX Weightings; and Gather TLX Ratings. Conditions: C1 - IF first login for this user ID; C2 - WHILE exercises to complete; C3 - IF auralisation to be given (for alternate exercises); C4 - WHILE NOT (user submits OR timeout).]
Figure 1 Program structure of the experiment administrator application

The requirements for each task shown in Figure 1 will now be discussed.

Login

At the start of the experiment, each subject is given a unique identification code. The application uses this number to generate a data file for each subject into which the subject's responses will be stored. The code may also be used for deciding which exercises will be given auralisations.

Instructions

Because the subjects must operate the application to complete the experiment, an element of user training is needed. The first activity after logging in is to give subjects detailed instructions on how to interact with the system. The instructions screen contains examples of all interface elements used (e.g., buttons, radio buttons, selection lists, dynamic HTML (DHTML) and Windows Media Player controls).

Complete questionnaire

Next, the application gathers subjects' responses to the questionnaire on musical knowledge and experience, and programming experience. The results are saved in the subject's data file.

Read documentation

For each exercise, the system should display the relevant documentation (program specification, outline design, and input/output data). Subjects press a button to indicate that they are ready to move on to the time-constrained bug location task (see Figure 2).
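As an illustration of the Login step, deriving a per-subject data file name from the identification code might look as follows. The naming scheme shown is hypothetical; the report does not give the actual file-name format:

```javascript
// Derive the per-subject data file name from the identification code.
// The "subjectNN.dat" scheme is illustrative only, not the application's
// actual naming convention.
function dataFileName(subjectId) {
  var padded = subjectId < 10 ? '0' + subjectId : String(subjectId);
  return 'subject' + padded + '.dat';
}
```

Keying every stored response to this one file makes it simple to collate each subject's questionnaire answers, timings, and TLX scores afterwards.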


Figure 2 Screen shot of the web application: subject in the reading documentation phase

Select auralisation

We stated above that half the programs would be given auralisations. To achieve the desired balancing effect, programs A1, A3, A5, and A7 would be auralised for subjects with even-numbered identification codes; for odd-numbered subjects, programs A2, A4, A6, and A8 would be auralised. Therefore, the application decides whether or not to provide an auralisation on the basis of the subject's identification code and the current exercise.

Select line

The experimental task is for subjects to identify which single line of code in a program contains the bug. Therefore, the application must provide a means of displaying the source code and allowing the user to choose a line of code. Also, the user must be allowed to change his/her mind as many times as desired until either the user presses the button to move to the next exercise or he/she runs out of time (ten minutes, in this case).

Gather TLX ratings & weightings

After completing each program, subjects were required to evaluate their perceived workload during the task using the NASA TLX ratings. The application provides a form on which these responses can be made. After completing all eight exercises subjects are required to rank the seven TLX ratings in order of influence (that is, which factors had the most effect on their perceived workload). These weightings are needed to complete the TLX analysis. Thus, the application provides another form for gathering this information.

Additional requirements

In addition to administering the above tasks, the application should provide status information to inform the user of their progress through the experiment. If the computer crashes, then the application should also allow re-entry to the experiment at the next available exercise.
It is also desirable for the application to collect subject protocol data (for example, how many times subjects listened to particular auralisations, or how many times they changed their mind about a bug's location before submitting a response). Gathering this sort of data allows for more in-depth analysis of the results.
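The auralisation-selection rule described above, in which even-numbered subject identification codes receive auralisations on exercises A1, A3, A5 and A7 and odd-numbered codes on A2, A4, A6 and A8, reduces to a parity check. A minimal sketch:

```javascript
// Decide whether the current exercise is auralised for this subject.
// Even-numbered subject IDs hear auralisations on odd-numbered exercises
// (A1, A3, A5, A7); odd-numbered IDs hear them on even-numbered exercises
// (A2, A4, A6, A8). The treatment is given whenever the parities differ.
function shouldAuralise(subjectId, exerciseNumber) {
  return subjectId % 2 !== exerciseNumber % 2;
}
```

This single comparison is all the server needs to decide, for each page request, whether to include the auralisation player for the exercise being served.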

4.3. Technologies used

In its early days as a pure hypertext information provider, the WWW presented few challenges but also few opportunities for useful, interactive sites. With more recent innovations like scripting languages, Java, dynamic HTML (DHTML), object embedding (e.g. ActiveX), database integration, and server-side processing (e.g. Microsoft's Active Server Pages (ASP) technology), it is possible to write sophisticated web-based applications for administering the kind of experiment described in this report. However, the same technologies that offer these increased opportunities also bring associated challenges of implementation in terms of ease-of-use, skills required, security, and compatibility. Below we discuss the technologies used in the experiment application, how they were used to provide the required function, some of the challenges they presented, and what advantages and disadvantages arose from their use.

Browser

Because we had full control over the computer laboratory in which the experiment was to be conducted, and hence could guarantee that all subjects would be using machines equipped with Microsoft Internet Explorer 4.0, it was decided to support only that platform. This meant that differences in browser implementations of various HTML and other web features did not need to be explicitly catered for in the application's construction. Obviously, if a more general platform is desired then cross-browser compatibility becomes a design issue, and increases the complexity of the application. In an application of this type, state and transitions between states become of critical importance. For example, if a subject decides that they want another attempt at an already completed exercise, all that is required (in an application that does not explicitly prevent this) is sufficient presses of the browser's 'back' button to load the desired page from the client machine's memory cache.
State preservation (in this case, remembering what tasks have already been attempted) is difficult to achieve and requires sophisticated server-side processing. An alternative solution is to limit the browser's capabilities by removing the navigation buttons from the interface. This can be achieved comprehensively by constructing a new browser client (for example, by using Microsoft's Internet Explorer Application Kit) in which all the relevant functions are removed. Of course, this has no effect if a user simply uses a non-customised version of the browser. A quicker, albeit partial, solution, and the one that was used in this case, was to have the application spawn a new browser window in which the tool bars are hidden. By so doing, the navigation buttons are removed (although their functions can still be accessed via the context menu by a 'right-click' on the mouse—a feature not advertised to the users).
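The partial solution described above, spawning a window with its tool bars hidden, might be scripted as follows. The page URL and window name are placeholders, not the application's actual values:

```javascript
// Build the features string for a spawned window with navigation chrome
// suppressed. Hiding the toolbar removes the Back button, although (as noted
// above) the same functions remain reachable via the right-click context menu.
function chromelessFeatures(width, height) {
  return 'toolbar=no,menubar=no,location=no,status=no,resizable=no' +
         ',width=' + width + ',height=' + height;
}

// In the browser the experiment would be launched in the chrome-less window.
// The guard keeps this snippet inert when run outside a browser.
if (typeof window !== 'undefined') {
  window.open('experiment.asp', 'experimentWindow', chromelessFeatures(800, 600));
}
```

As the report notes, this is only a partial defence: the same `window.open` features are advisory, and a determined user can still reach the hidden functions.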


Figure 3 Screen shot of the web application: subject in the bug location phase

JavaScript

A condition of the experiment was that subjects have only ten minutes to complete each bug location task. Figure 3 shows how the screen would look for a subject in the bug location phase of exercise A4. In the upper right-hand corner can be seen a clock indicating the time remaining. The countdown clock was implemented in JavaScript, which allows client-side timekeeping to be carried out.

Active Server Pages

The application was implemented using Active Server Pages (ASP). ASP is a technology that provides server-side script processing in order to customise the HTML output sent to the client browser. The source pages on the server contain the script routines (in either JavaScript or VBScript) that render the HTML received by the client. ASP allows data for each browser session to be stored in session variables (a special implementation of Internet cookies). By using session variables it was relatively straightforward to keep track of each subject's progress (and therefore, state) through the experiment and to provide this information in a status window (see the left-hand column in Figures 2 and 3). By allowing decisions about HTML content to be made prior to serving the pages to the client, the HTML can be tailored according to information provided by the user. For example, if a subject aborts the experiment (for a natural break, perhaps) and then logs in again, the instructions screen will lead directly to the experimental tasks rather than to the questionnaire (which was completed the first time the user started the application). Instead of having a separate page for each debugging exercise, a single ASP page was written that used the subject's session variables to decide which exercise was being undertaken, and thus which program documentation, source code, and auralisation files to serve to the client browser.
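The heart of such a countdown clock, computing and formatting the time remaining, can be sketched in plain JavaScript. In the actual page this would be driven by a timer and written into the clock element; the function names here are illustrative:

```javascript
// Given the task deadline and the current time (both in milliseconds),
// return the remaining time formatted as "m:ss", clamped at "0:00".
function timeRemaining(deadlineMs, nowMs) {
  var left = Math.max(0, Math.floor((deadlineMs - nowMs) / 1000)); // whole seconds
  var m = Math.floor(left / 60);
  var s = left % 60;
  return m + ':' + (s < 10 ? '0' : '') + s;
}

// True once the ten-minute bug-location phase has expired.
function timedOut(deadlineMs, nowMs) {
  return nowMs >= deadlineMs;
}
```

In the live page, a repeating timer (e.g. `setTimeout`) would call these once a second, repaint the clock, and submit the exercise automatically when `timedOut` becomes true.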
The data provided by the subjects (such as their answers to the questionnaire, TLX scores, etc.) were written to the data files by ASP scripts that had access to the server's local file system. Using ASP meant that no data had to be stored on the client machines and so there were no problems with transferring the results back to the server. This had the added benefit that the subject data files could not be tampered with prior to submission. In addition, because ASP scripts and expressions are evaluated prior to serving the HTML to the clients, it was possible to ensure that the values of session variables could not be seen or tampered with by users viewing the pages' raw HTML source from the browser.

ActiveX

It was not possible to give subjects access to the CAITLIN program auralisation system itself as the development version in use at the time ran under MS-DOS and required an external General MIDI synthesiser. In addition, to ensure that every user gets identical auralisations for the eight experimental programs, identical synthesisers would be needed. Instead, the eight programs were processed by the CAITLIN system and their auralisations sampled using Sonic Foundry's Sound Forge®2 program and stored as digital audio files on the server. Although most of them were quite short, some auralisations lasted up to two minutes. Using 16-bit stereo sampling at a sampling rate of 44.1 kHz (CD quality) requires approximately ten megabytes of storage for each minute of audio. Thus, even short auralisations have large storage requirements, and hence longer download times. Downloading a 20 Mb audio file could take too long (especially if network traffic is high, or available bandwidth is low) so, to reduce the audio file sizes, the auralisation files were converted to MPEG-1 Layer 3 (MP3) format at a sampling rate of 22.05 kHz and a bit-rate of 40 kbit/second (near-CD-quality MP3 requires a 44.1 kHz sampling rate and a bit-rate of 128 kbit/second). This allowed the files to be compressed to about ten percent of their original size, so the largest file was 2.2 megabytes instead of 22 megabytes. Because the dynamic range of the music was narrow the perceived loss of quality was negligible. To play back the auralisations, the Windows Media Player (WMP)3 was embedded in the application as an ActiveX control (see top of Figure 3).
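The storage figures quoted above are easy to check: uncompressed CD-quality audio uses 44,100 samples per second, two bytes per 16-bit sample, and two channels:

```javascript
// Bytes needed for uncompressed PCM audio of a given duration.
function pcmBytes(seconds, sampleRate, bytesPerSample, channels) {
  return seconds * sampleRate * bytesPerSample * channels;
}

// One minute of 16-bit stereo at 44.1 kHz:
// 60 * 44100 * 2 * 2 = 10,584,000 bytes, i.e. roughly ten megabytes,
// matching the figure quoted in the text.
var perMinute = pcmBytes(60, 44100, 2, 2);
```

The same arithmetic explains the attraction of MP3 at 40 kbit/second: 40,000 bits per second is a small fraction of the uncompressed rate, giving the roughly ten-to-one reduction described above.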
The control offers users the usual audio file transport functions (play, stop, pause, rewind, etc). The Windows Media Player was used principally because it allowed playback of MP3 audio and was easily embedded in an HTML page. Other solutions that were considered but rejected included Rich Music Format and Real Audio. Rich Music Format (RMF) is a proprietary music format used by the Beatnik plug-in software4. Beatnik's RMF files can contain both compressed MIDI data and compressed audio. The sound is rendered using the Beatnik engine plug-in on the client machine. Beatnik was rejected for this application because it required more software to convert the audio files into RMF files. Additionally, at the time the application was built we were unable to integrate the Beatnik player into the web pages cleanly. Real Audio5 (a streaming audio format) was rejected as its audio quality was not high enough. Furthermore, we felt that adding an audio streaming service to the application made the implementation even more complicated. Streaming increases the load placed on the web server with a consequent impact on performance. Breaks in a streamed service can also be caused by network congestion.

Java applet

Because the program auralisations were stored as audio files on the server, they had to be downloaded to the client browser as required. The auralisation files are not needed by the subject until the timed bug location phase begins. However, some of the auralisation files were over two megabytes in size, which meant that significant download delays could occur, which in turn would impact on the subject's performance. The solution was to pre-fetch the required auralisation files while the user was reading the program documentation. This was done by using the TurboSite Accelerator from Intellisoft6. TurboSite Accelerator is a Java applet that loads files directly into the client browser's cache before they are needed. Because this is done while subjects are reading the program specification and other documentation, the auralisations are immediately available when the bug location phase is begun.

2. See http://www.sonicfoundry.com
3. See http://www.microsoft.com/windows/mediaplayer/en/default.asp
4. See http://www.beatnik.com/
5. See http://www.realaudio.com/
6. See http://www.intellisoft-inc.com/

Dynamic HTML

Figures 2 and 3 show how the program documentation and source code are displayed on the screen. Because the program documentation (description, input data, expected output, and actual output) would not fit onto one screen, dynamic HTML (DHTML) was used to organise the information. Each part of the documentation has its own heading that is 'clickable'. For example, using the mouse to click on the heading labelled Input Data (Show) (see Figure 3) would cause the text holding the input data to be made visible inside a scrolling window (just like the program description in Figure 3). Clicking on the heading again causes that section to be hidden from view. This technique obviates the need for multiple screens, and allows several components to be shown at once. Of course, if all sections were made visible at once, little useful information could be seen. Therefore, a printed version of the documentation was also made available to the subjects. DHTML was also used to enable on-line selection of candidate bug locations. Subjects were required to select which line of code in the program source code contained a bug. By using DHTML to control the colour of the text in this region of the screen, subjects could click on a line of code and it would be highlighted in red. Moving the mouse over a line causes that line to turn blue as long as the mouse hovers over that line. Clicking on a different line would cause the previously selected line to revert to its original colour and the new line to turn red. Subjects' clicking activity can also be monitored to allow protocol data to be gathered and stored in the data file (e.g.
the number of times a subject changes their mind over the bug location).

5. Discussion

The experiment described in section 4.1 was carried out using the application described above.7 The session ran smoothly, with all subjects successfully using the web application to complete all the experimental tasks. The data from the experiment were automatically stored in a form that was easily readable by data analysis programs (e.g. SPSS and Excel). Having the data stored directly on the server removed any need to key the results in manually from observation logs (with the consequent danger of transcription errors). This had the added benefit that no observers were needed. Instead, one of the authors supervised the session and was free to patrol the laboratory lending procedural assistance as required. The application worked well and none of the users had any difficulty in using it.

The decision was taken early on to store subject data in simple flat text files rather than in a database. A database solution offers several advantages, not the least of which is that all the data are kept in one file which can then be exported to a program like SPSS for statistical analysis. The principal reason for using flat files instead was the relative simplicity of implementation. Data were written to and read from the files by simple ASP script commands; to use a database would mean specifying structured query language (SQL) queries to add and extract data. A database solution would also lead to a higher server load and higher rates of data transmission between the server and the clients. Also, with many users accessing the single database at the same time, delays could occur.

An advantage of using the web application was that a team of observers was not necessary to collect the various subject response and protocol data. Indeed, an automated application makes it easy to collect much more subject protocol data than a human observer could. 
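As a sketch of how such protocol capture can work, the following JavaScript models the bug-location selection logic: one line is highlighted at a time, and every click is recorded so that the number of changes of mind can be counted afterwards. The function and field names here are illustrative only; they are not taken from the CAITLIN application, which implemented this behaviour with DHTML event handlers.

```javascript
// Selection-tracking logic for the bug-location screen (a sketch; names are
// illustrative). One line is selected at a time, and every click is logged so
// protocol data such as changes of mind can be derived later.
function createBugSelector(lineCount) {
  var state = { selected: null, clicks: [] };
  return {
    // Called from each source line's onclick handler. In the real page this
    // would also restore the old line's colour and turn the new line red.
    click: function (lineNo) {
      if (lineNo < 1 || lineNo > lineCount) throw new Error("no such line");
      state.clicks.push(lineNo);
      state.selected = lineNo;
    },
    selectedLine: function () { return state.selected; },
    // A change of mind is any click that moves the selection to a new line.
    changesOfMind: function () {
      var changes = 0;
      for (var i = 1; i < state.clicks.length; i++) {
        if (state.clicks[i] !== state.clicks[i - 1]) changes++;
      }
      return changes;
    }
  };
}
```

Keeping the click history in a single state object makes the protocol data trivial to serialise into the subject's data file when the session ends.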
For instance, the application kept track of how many times each subject played, paused, and stopped each auralisation, which allows quite sophisticated statistical analysis to be carried out on the experiment’s results. Administering the experiment required only a single person to patrol the computer laboratory lending procedural assistance when necessary. However, a drawback of this approach is that it was not possible to carry out think-aloud protocol recording. If this is a necessary part of the experimental design, then additional resources (such as tape recorders) would need to be provided. An alternative solution would be to capture think-aloud data digitally as compressed audio to be sent to the server on completion of the experiment; of course, this would require extra programming.

Developing a web-based application, rather than a more traditional piece of software, means that the experiment can be conducted at almost any location without the need to install specialist software. The application runs on a ubiquitous platform requiring only Microsoft’s Internet Explorer 4.0 (or above), Microsoft’s Windows Media Player 6.0 (or above), and a standard sixteen-bit sound card. With a little more effort, the application could be modified to ensure compatibility with Netscape’s range of browsers (Navigator and Communicator), although this was not necessary for the purposes of our experiment. Furthermore, the application means that experiments can be conducted on much larger groups of subjects than would normally be possible, with the consequent benefit of increased statistical power. The experiment data are automatically stored in files on the server. This has two principal advantages: a) the data do not have to be keyed manually into a computer after the experiment, thus avoiding transcription errors, and b) the experiment can be conducted off site and the data are still saved to the server.

The web application allowed a computer science experiment to be conducted (off-site at a different university, in this case) easily and effectively without the need for specialist software to be installed on multiple computers. However, development of the application was not without its challenges. First, such applications require specialist knowledge in order to construct them. In our case, several different technologies were used to achieve the desired results (ASP, DHTML, JavaScript, Java, and ActiveX), all of which required a range of programming knowledge and skills. Those with only a rudimentary knowledge of programming and program design would find it hard to develop such applications.

7 The system is now open-access and can be found at http://www.cms.livjm.ac.uk/caitlin/experiments
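The flat-file storage described in section 5 amounts to flattening each subject’s responses into one delimited record per line. The sketch below illustrates the idea; the field names and layout are our illustration, as the actual file format used by the application is not given here.

```javascript
// Build one tab-delimited record per subject. SPSS and Excel import such files
// directly, which is why no manual keying-in of results was needed.
// NOTE: the field names are illustrative, not the application's real layout.
function toRecord(subject) {
  return [
    subject.id,        // anonymous subject identifier
    subject.program,   // which test program was debugged
    subject.bugLine,   // line number the subject selected as the bug
    subject.plays,     // how many times the auralisation was played
    subject.seconds    // time taken on the task
  ].join("\t");
}

// In an ASP page (JScript flavour) the record would be appended server-side
// with the FileSystemObject, e.g.:
//   var fso = Server.CreateObject("Scripting.FileSystemObject");
//   fso.OpenTextFile(path, 8 /* ForAppending */, true).WriteLine(toRecord(s));
```

Appending one line per subject is also what keeps the server load low compared with a database: each submission is a single sequential write with no query processing.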
Web site development programs (such as Microsoft FrontPage) only provide a skeleton into which these additional technologies must be fitted manually (at best semi-automatically, with manual adjustment of the relevant parameters). The Internet/WWW and their associated technologies offer researchers the opportunity to write sophisticated applications to conduct on-line experiments. Authoring tools, such as Microsoft’s FrontPage system, make the task of creating the architectural framework for such a web site straightforward, as hyperlinks and style sheets can be managed automatically by the software. Some web-authoring tools also provide a measure of support and automation for database connectivity and form handling. The challenges arise when one wishes to implement anything other than the straightforward sites catered for by these authoring tools. Indeed, even some of the simpler uses of databases and forms still require a level of knowledge and technical skill beyond that needed to operate the tools’ interfaces. For non-computer specialists the task is that much harder. When sophisticated form handling, state preservation, automated decision making, and multimedia components are required, the developer is faced with an array of different technologies from which to choose, all of which can produce the desired results.

6. Guidance for developers of interactive experiments

From our experiences in developing the experiment application, we offer the following insights to assist developers of future applications. The overarching principle to apply when developing such systems is that which is colloquially known as ‘KISS’ (keep it simple, stupid): work with the tools and technologies with which you are already familiar. 
Despite having a good deal of experience in computer science, programming, and software development, we still found it necessary to learn the rudiments of client-side scripting (JavaScript), server-side processing (Active Server Pages), embedding of Java applets (TurboSite Accelerator), dynamic HTML, and ActiveX component embedding (Windows Media Player) to satisfy the system’s basic functional requirements. We can therefore summarise our findings as three challenges faced by developers of systems to support interactive experiments.

6.1. Challenge 1—skill acquisition

The principal challenge in delivering on-line experiments via the web is the acquisition of the technical skills necessary to develop and build the supporting WWW applications. This includes basic competence in computer programming.


6.2. Challenge 2—choice of technology

Assuming that the technical skills are available, the second challenge is the choice of technology. The Internet and the WWW are developing continuously and growing rapidly. More and more applications are being devised for the WWW, and so more and more supporting technologies are emerging. Whilst this offers developers much scope and freedom, it can be difficult to know which of the available technologies is best suited to the task in hand. For instance, when we were deciding how to provide the music audio to the subjects in the CAITLIN experiment, we were faced with a number of possible solutions (RMF, RealAudio, and digitally sampled wave files, for example). A case could even have been made for building the whole application in Java, but this was ruled out because, at the time, the Java Media Framework would not support the required audio quality. In our case we chose the MP3/Windows Media Player solution because it was the simplest to implement at the time. There may have been technically superior solutions, but these would have delayed the project’s development beyond what was acceptable. Thus, assuming appropriate technical expertise is available, we would identify the second main challenge of using the WWW to support experimentation to be the choice of solution strategy. Although some technologies and methods may provide superior solutions, often the pressures of the project schedule demand that an alternative is used in order to hasten completion. In effect, this is a rephrasing of the KISS rule (above).

6.3. Challenge 3—choice of platform

Having identified which technologies to use, a third challenge that emerges is that of client platform independence. 
Despite attempts to standardise the various Internet and WWW protocols, significant differences are found in the way that browsers from different manufacturers render the various HTML and DHTML components. For instance, Netscape’s browsers will render the <BLINK> tag, whereas Microsoft’s Internet Explorer ignores it. Of course, from an ergonomics point of view there is a question over whether one should ever use such a tag in the first place; nevertheless, this and other differences make the web application developer’s task even harder. Added to this, HTML is a dynamic standard, and successive generations of the language offer more and more features. Developers have to keep in mind that older browsers will not support the newer HTML and DHTML features. This becomes critically important when the application relies on the use of those features to achieve its aims. Again, the KISS rule applies. If an application is to be developed for a ubiquitous platform, then appropriate steps must be taken during development to ensure that it will work across different browsers and different generations of browser. This adds to the complexity of the development task. The complexity of the experiment application described in section 3 and the short time scale in which it had to be developed meant that the decision was taken to support only a very specific platform (Microsoft IE4, Windows Media Player 6, and a 16-bit sound card). The application achieves very satisfactory results when run on the target platform, but fails to work when accessed via a Netscape browser. Conversely, Walker’s experiment (Walker, Kramer & Lane, 2000) can only be run using a Netscape browser with the Beatnik plug-in installed. If the task of on-line experimentation is tricky for experienced and skilled computing specialists, what can be offered by way of hope to those from other disciplines? 
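Cross-browser differences of this kind were usually handled at the time by object (feature) detection rather than by parsing user-agent strings. The following is an illustrative sketch only, not part of the experiment application (which targeted IE4 alone); it takes the document object as a parameter so the decision logic can be exercised outside a browser.

```javascript
// Classify which DHTML object model a browser exposes. Passing the document
// object in as a parameter keeps the decision logic testable in isolation.
function domFlavour(doc) {
  if (doc.getElementById) return "w3c";   // W3C DOM (IE5+, Netscape 6+)
  if (doc.all) return "ie4";              // IE4's proprietary document.all
  if (doc.layers) return "netscape4";     // Netscape 4's layer model
  return "none";                          // no usable DHTML support
}

// In a page one would call domFlavour(document) once and branch on the result
// when showing/hiding documentation sections or re-colouring source lines.
```

Testing for the capability itself, rather than the browser's name, also degrades gracefully when a browser generation the developer never saw comes along.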
The continued development of WWW authoring packages means that ever more powerful support will soon be available to non-specialists for the construction of sophisticated systems. One only has to look back a few years to see how much easier it is now to create basic web sites, as much of the need to write HTML code has been taken away. Furthermore, products are now available that aim to solve specific web application problems. In UK higher education we are faced with much larger class sizes than just a few years ago, and the resultant assessment burden has grown. Specialist applications such as Question Mark Computing’s Perception software8 provide facilities for setting and administering surveys and examinations on-line, with all the results automatically saved in a Microsoft Access database for later analysis. The tool is designed for general use and does not require much technical expertise. We have identified two drawbacks with this tool. First, it has a steep learning curve, so one could not expect to be using it productively in a very short time. Second, Perception is licensed on the number of concurrent users that may access it; in our case, the application we developed is limited only by the ability of the server to cope with demand. Additionally, once one requires more than just the basic facilities of the tool, one must again resort to scripting and other forms of web programming.

8 See http://www.qmark.com/

7. Conclusion

The aim of this work was to build a web application that would allow an experiment into program auralisation and debugging to be carried out. Auralisations from the CAITLIN system were successfully embedded into the application and evaluated empirically. Analysis of the data gathered by the application provided insights into how useful novices found the musical program auralisations. These insights can now be fed back into further development of the CAITLIN system for additional study. In summary, we identify three principal challenges to those wishing to support experiment administration on the WWW:

· A good deal of technical skill is required to develop anything other than rudimentary web applications.

· The availability of different technologies makes choosing the most appropriate solution strategy difficult. Often, decisions will be made on the basis of expediency and time pressures rather than technical superiority.

· Despite the move towards increased standardisation, much explicit coding is still required to ensure cross-browser compatibility.
If these challenges can be overcome, then the multimedia environment of the WWW offers researchers great opportunities for automating much of the administration and data gathering of scientific experiments. If replication of results is required, then a web application offers a framework for repeated experimentation with the assurance that the experimental method, protocols, and observation remain consistent. Such applications can be readily built by computing specialists with the appropriate skills. Researchers from non-IT-related disciplines would do well to seek the cooperation of those with the necessary technical skills to ensure the development of usable and reliable systems.

8. Acknowledgements

Thanks go to Prof. J. L. Alty of Loughborough University, with whom the program auralisation research was carried out.

9. References

Alty, J. L. (1995). Can We Use Music in Computer-Human Communication? In D. Diaper & R. Winder (Eds.), People and Computers X. Cambridge: Cambridge University Press.

Alty, J. L. & Rigas, D. I. (1998). Communicating Graphical Information to Blind Users Using Music: The Role of Context. In Proc. CHI 98 Conference on Human Factors in Computing Systems, Los Angeles, CA, April 18-23. ACM Press.

Alty, J. L. & Vickers, P. (1997). The CAITLIN Auralization System: Hierarchical Leitmotif Design as a Clue to Program Comprehension. In Proc. Fourth International Conference on Auditory Display, Palo Alto, CA: Xerox PARC.

Alty, J. L., Vickers, P. & Rigas, D. (1997). Using Music as a Communication Medium. In Proc. Refereed Demonstrations, CHI 97 Conference on Human Factors in Computing Systems, Atlanta, GA, March 22-27. ACM Press.

Boardman, D. B., Greene, G., Khandelwal, V. & Mathur, A. P. (1995). LISTEN: A Tool to Investigate the Use of Sound for the Analysis of Program Behaviour. In Proc. 19th International Computer Software and Applications Conference, Dallas, TX, Aug. 9-11. IEEE.

Bock, D. S. (1994). ADSL: An Auditory Domain Specification Language for Program Auralization. In Proc. Second International Conference on Auditory Display (ICAD ’94), Santa Fe, NM, Nov. 7-9. Santa Fe Institute.

Brown, M. H. & Hershberger, J. (1992). Color and Sound in Algorithm Animation. Computer 25(12): 52-63.

Davies, C. G. & Layzell, P. J. (1993). The Jackson Approach to System Development: An Introduction. Chartwell-Bratt.

Graziano, A. M. & Raulin, M. L. (1993). Research Methods: A Process of Inquiry. New York: HarperCollins College Publishers.

Hart, S. & Staveland, L. (1988). Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In P. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North-Holland: 139-183.

Jameson, D. H. (1994). Sonnet: Audio-Enhanced Monitoring and Debugging. In G. Kramer (Ed.), Auditory Display. Reading, MA: Addison-Wesley: 253-265.

Kramer, G. (Ed.) (1994). Auditory Display. Santa Fe Institute Studies in the Sciences of Complexity Proceedings. Reading, MA: Addison-Wesley.

NASA Human Performance Research Group (1987). Task Load Index (NASA-TLX) v1.0, computerised version. NASA Ames Research Center.

Ng, E. H. (1997). Software Reusability and its Application to Interactive Multimedia Learning Systems. School of Computing & Mathematical Sciences, Liverpool John Moores University.

Phelps, J. & Reynolds, R. (1998). Summative Evaluation of a Web-based Course in Meteorology. In M. Oliver (Ed.), Innovation in the Evaluation of Learning Technology. London: University of North London: 135-150.

Robson, C. (1993). Real World Research. Oxford: Blackwell Publishers Ltd.

Vickers, P. (1999). CAITLIN: Implementation of a Musical Program Auralisation System to Study the Effects on Debugging Tasks as Performed by Novice Pascal Programmers. Department of Computer Science, Loughborough University.

Vickers, P. & Alty, J. L. (1996). CAITLIN: A Musical Program Auralisation Tool to Assist Novice Programmers with Debugging. In Proc. Third International Conference on Auditory Display, Palo Alto, CA, Nov. 4-6. Xerox PARC.

Vickers, P. & Alty, J. L. (1998). Towards some Organising Principles for Musical Program Auralisation. In Proc. ICAD ’98 International Conference on Auditory Display, Glasgow, November 1998. British Computer Society.

Vickers, P. & Alty, J. L. (2000). Musical Program Auralisation: Empirical Studies. In Proc. ICAD 2000 Sixth International Conference on Auditory Display, Atlanta, GA, April 2-5. International Community for Auditory Display.

Walker, B. N., Kramer, G. & Lane, D. M. (2000). Psychophysical Scaling of Sonification Mappings. In Proc. ICAD 2000 Sixth International Conference on Auditory Display, Atlanta, GA, April 2-5. International Community for Auditory Display.


©2001 by the author. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the author.


PUBLISHED BY: School of Computing and Mathematical Sciences, Liverpool John Moores University

ISBN 1 902560 050

