Voice-Based Approach for Surmounting Spatial and Temporal Separations

Kate O’Toole, GreenSun and Kastle Data Systems, USA
Srividhya Subramanian, University of Arizona, USA
Nathan Denny, University of Arizona, USA

ABSTRACT

This article describes a new voice-based tool for global collaboration. This tool, called EchoEdit, attempts to provide multimedia capabilities to program source code editing for the purpose of eliciting in situ vocal commentary from active developers.

Keywords: agile documentation; voice annotation; voice-based collaboration

INTRODUCTION

Global teams are a growing reality. When a team is distributed, individuals lose the ability to walk down the hall and ask other team members a question. When team members are distributed but work in similar time zones, they can still pick up the phone and call their peers. When teams are distributed both temporally and spatially, team members lose these forms of immediate feedback, and different methodologies are needed to surmount the separations in time and space that many global teams experience.

One way to distribute a team is to break the project into modules (e.g., site A is responsible for module A while site B is responsible for module B). In this situation, the two teams need to agree on an interface, but further interaction is limited. Another way for distributed teams to interact is in a master-slave type of relationship; in this case, one team may do the design work while a second team does the testing. In this scenario, one site tells the other site what to do and holds more authority. A third way for distributed teams to interact is what is known as the “24-Hour Knowledge Factory” (24HrKF) (Denny, Mani, et al., in press), in which work on shared artifacts is handed off between sites in different time zones at the end of each shift.

One way to reduce the confusion that can arise when multiple people share the same piece of code is to increase the documentation and explain how decisions were made. This article focuses on the use of voice recognition software for this purpose.

EMBEDDED PROGRAM DOCUMENTATION

Most programming languages have a built-in feature that allows programmers to make notes on what the code is doing and how it works. These notes are usually written as comments and are denoted by a specific symbol that the compiler ignores. However, many people who have gone back to look at code that they or their colleagues had written wish that the code had been better documented and that they could access the logic that the original programmer used while creating it. It is much easier to understand code when someone tells you what he or she was doing. Three surveys, conducted in 1977, 1983, and 1984, highlight the lack of documentation as one of the biggest problems that code maintenance staff have to deal with (Dekleva, 1992).

Prechelt conducted controlled experiments using pattern comment lines (PCL), which document the usage of design patterns directly in the code (Prechelt, Unger-Lamprecht, Philippsen, & Tichy, 2002). By adding a few lines of comments and running experiments in which students took existing code and modified it, substantial improvements in maintenance performance were observed.

Documentation is not always a burden on a project. When it is done correctly, it can be a positive and very useful part of the project. An example of good documentation on legacy software is IBM’s involvement with the space program. Starting in the 1970s, IBM worked with NASA to develop the software for the space shuttle. The documenting process was high on the team’s to-do list when developing the software and when finding and correcting bugs. Billings et al. report that the data collected to track and fix errors paid off quickly, with the software becoming nearly error free (Billings, Clifton, Kolkhorst, Lee, & Wingert, 1994). However, they further state that when this data was brought in late, it tended to disrupt development activities. Timely documenting is as important as good documenting.

The oldest, and probably the most famous, code documentation approach is called literate programming, developed by Knuth in the early 1980s (Knuth, 1984). This style combines documentation with the code and then compiles the two into separate documents. A tangle function compiles the actual code into a program, and a weave function produces documentation from the comments embedded in the code. This style requires that the programmer have a good understanding of the weave function so that the comments come out properly formatted; the comments are written in the code document in a format similar to LaTeX.

Literate programming ideas have evolved into what is now called elucidative programming (Vestdam, 2003). In elucidative programming, the code and the documentation exist as separate documents. Links are then added to the code so that the external documentation is easily accessed. There are no rigid standards for what the comments have to say, only that they should describe the functionality of the code.

Javadoc comments are another standard for documenting code (Kramer, 1999). Javadoc compiles the comments to create API documentation for the source code. Here, the documentation requirements are more rigid: the input types and return types must be described, along with what the method does to produce the correct output. The comments are embedded in the source document itself, the thought being that when the comments are so close to the code, programmers are more likely to update them as they update the code.
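For example, a minimal Javadoc comment of the kind Kramer describes might look like the following sketch; the method and its names are our own illustration, not an example from the article:

/**
 * Returns the remaining balance after a payment is applied.
 *
 * @param balance the current balance, in cents
 * @param payment the payment amount, in cents
 * @return the remaining balance, never negative
 */
public static long applyPayment(long balance, long payment) {
    return Math.max(0L, balance - payment);
}

Because the @param and @return tags sit directly beside the signature they describe, a programmer changing the method at least sees the stale comment and is more likely to update it.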

VOCAL ANNOTATION IN COLLABORATION

Fish et al. provide strong evidence that a more expressive medium is especially valuable for the more complex, controversial, and social aspects of a collaborative task (Fish, Chaljonte, & Kraut, 1991). Voice is preferred over text for commenting on higher-level issues in a document, while text is more appropriate for lower-level, intricate details. This finding extends to software development in the 24-Hour Knowledge Factory: programmers can use voice comments to annotate an issue, such as probing the logic used by another programmer or suggesting a change to the overall design of the software being developed, and use textual comments to correct lower-level syntactic details.

Various plausible scenarios can be contrasted by permuting the annotation modalities. When the users’ annotation modalities were restricted, written annotations led them to comment on more local problems in the text, while speech led them to comment on higher-level concerns. Further, when they did use written annotations to comment on global problems, they were less successful than when they used spoken annotations. Finally, when they offered spoken annotations, they were more likely to add features such as personal pronouns and explanations that made their comments more equivocal and socially communicative. All the scenarios discussed by Fish and his group focused on the annotator; they did not examine the effects of spoken annotations on the recipient (Fish et al., 1991).

It is crucial to note that recipients of voice annotations may be at a disadvantage compared to recipients of written annotations. For example, recipients of a voice message are more limited in their ability to review, skim, and otherwise abstract the content. While speech is faster than writing for the producer, it can be slow and tedious for the receiver to process (Fish et al., 1991; Gould, 1978). Experiments have shown that reviewers making voice annotations produced roughly two and one half times more words per annotation than reviewers making written annotations. Experiments assessing writers’ preferred modality established that, regardless of the mode in which comments were produced, writers preferred to receive comments about purpose, audience, and style in voice, and comments about mechanics in writing.

In our system, we have minimized the problem of remembering the content of voice annotations (a problem not observed with text annotations) by putting voice recognition software to use to convert speech into text. The text thus obtained can be conveniently revised whenever required. The annotation, as in the PREP editor, is anchored to the block of code being annotated. When the user listens to the voice comment, the relevant block of code is highlighted, thus eliminating the problem of annotation orphaning that occurred with the technique used by Cadiz, Gupta, and Grudin (2000).

Other systems have experimented with combining voice and text annotations in coauthoring documents (e.g., Quilt, Coral, and Filochat; Whittaker, Hyland, & Wiley, 1994). Our experience corroborated two findings from Filochat (Whittaker et al., 1994). First, people appeared to be learning to take fewer handwritten notes and to rely more on the audio. Second, individuals want to improve their handwritten notes afterwards by playing back portions of the audio. Dynomite facilitates this task by allowing users to visually navigate the audio using the audio timeline and to automatically skip from highlight to highlight in playback mode. This, again, is a system that relies on the temporal aspect of the audio notes, unlike our spatial vocal commenting approach, which anchors the comment to the block of code germane to it.

Steeples (2000) conducted an experiment that used voice annotations on video files of professionals in learning situations. By having the participants of the study vocally annotate what was going on during their discussions, it was easier for other people to understand how they were learning. Steeples found that voice was better than text because viewers could watch the video and listen to the comments at the same time, rather than having to read in one window and watch the video in another. Another interesting point was that the professionals were more likely to give a better description of what was going on if they talked about it rather than wrote about it; when told to analyze the video in writing, their accounts were more abstract and less detailed. The last thing found helpful about the voice annotations was that they did not change over time. When a telephone conversation ends, it is up to the people on the phone to remember what was said; with voice annotations, participants could keep the recordings and refer back to them when needed. In later work, Steeples found that people could save a considerable amount of time by adding spoken rather than written annotations: adding one minute of recorded annotation took about the same time as typing for four minutes (Steeples, 2002). She also noted that recordings are not easy to scan through to find the spot one is looking for; some of the participants in the study did not like the video and audio comments, since they were used to scanning through text comments to quickly find the relevant information.

The need for documenting-while-developing tools is great, and if these tools were simplified, they might be more widely used (Aguiar & David, 2005). Aguiar developed a wiki-based documentation tool that works with IDEs and allows external documentation that supports the actual code. The wiki links to the actual source code and UML diagrams to help developers as they work on the design of the project. The wiki approach also allows technical and nontechnical people to collaborate on the same documents (Aguiar & David, 2005). A wiki-based approach could be combined with voice comments so that instead of having to type out discussions, people could talk through them. With speech recognition software, the next person could quickly read through what the other person said and then respond through either voice or text. If the wiki were set up like the voice annotations in the current Eclipse plug-in prototype, users would have a choice between reading the transcribed comments and listening to the original voice comments, to see whether vocal cues allow a listener to retrieve more information from the voice than from the text.

Adding voice comments to documents seems to improve both the quality of the comments and the amount of information the producer imparts to the next person. Applying this idea to programming was the basis for the development of EchoEdit.
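To make the anchoring idea concrete, a voice annotation of the kind described above could be represented roughly as follows. This is a sketch with hypothetical names, not the actual data structures of EchoEdit or the PREP editor:

/** Illustrative only: one voice annotation anchored to a block of code. */
public class VoiceAnnotation {
    private final String sourceFile;  // file being annotated
    private final int firstLine;      // start of the anchored block
    private final int lastLine;       // end of the anchored block
    private final String audioFile;   // recorded clip, e.g., a .wav path
    private final String transcript;  // text produced by speech recognition

    public VoiceAnnotation(String sourceFile, int firstLine, int lastLine,
                           String audioFile, String transcript) {
        this.sourceFile = sourceFile;
        this.firstLine = firstLine;
        this.lastLine = lastLine;
        this.audioFile = audioFile;
        this.transcript = transcript;
    }

    /** The lines an editor would highlight while the clip plays back. */
    public int[] anchoredRange() {
        return new int[] { firstLine, lastLine };
    }

    /** The revisable text form, for readers who prefer skimming to listening. */
    public String transcript() {
        return transcript;
    }
}

Because the anchor is a line range within a named file rather than a position on an audio timeline, the comment stays attached to the code it describes and cannot be orphaned as easily.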

ECHOEDIT

EchoEdit was developed for use in the 24-Hour Knowledge Factory by teams following the composite personae model (Denny, Mani, et al., in press). In this model, there is a person at every site in the 24-Hour Knowledge Factory who is responsible for making decisions about different pieces of the project. This way, there is always someone available who can clear up confusion and answer questions.

EchoEdit is a plug-in for Eclipse that allows users to add voice comments to their code. It aims to make the best of both the speech and text modalities, as Neuwirth, Chandhok, Charney, Wojahn, and Kim (1994) suggest in the conclusion of their research. The system gives programmers the choice of both text and voice comments. In production mode, the programmer can use text comments for syntactic or other lower-level semantic concerns and voice comments for addressing higher-level, complex issues in his or her code (Neuwirth et al., 1994). In reception mode, users play voice annotations by clicking on an iconic representation displayed in the margin, similar to the idea used in the PREP environment (Neuwirth et al., 1994). When a voice annotation is played, the block of code relevant to the comment is highlighted, showing the anchoring of the comment; this helps avoid orphaning of annotations. The purpose of the voice recognition software is to convert the voice notes into text, both for ease of indexing and to give the programmer a choice of modality in reception mode. Some people find it easier to talk through their thoughts than to type them, and a lightweight mechanism for recording voice comments keeps the tool easy to use.

The 24HrKF requires programmers to be accountable for their code after every shift, instead of only when they check new code into their code repository. This may mean that code coming out of the 24HrKF is of higher quality, since it is better commented and, therefore, easier to maintain.

EchoEdit was built for the IBM Eclipse environment because the prototype was driven by the desire to create an add-on to a tool that many developers are already comfortable with. The Eclipse platform provides an easy mechanism for creating a plug-in project that can be distributed to any other user of the platform, and it offers a feature-rich IDE similar to other Java IDEs such as NetBeans and JBuilder. Since Eclipse is open source and free to download, it was an obvious choice: developing the prototype as an Eclipse plug-in allowed tests to be conducted in a full-featured IDE with the prototype fully integrated.

Three user actions can be performed: recording a comment, playing back an audio file, and converting an audio file to a comment. An example of the text comment that is added to the code after it is translated from voice to text can be seen in Figure 1.

Figure 2 shows the design of EchoEdit. The user interacts with the file in the IDE. When the user presses the record button, an event is sent to the Record Action Handler. The Action Handler initializes the Audio Recorder and then listens for a Record Event, which is received when audio recording has begun. Upon receiving the Record Event, the Record Event Listener starts a thread to do the actual recording, starts a thread to convert the speech to text, and launches the dialog box that tells the user that recording has begun. The Audio Recorder then listens for events from the Converter Thread and inserts the generated text into the original file the user was modifying.

Recording a comment generates both an audio file and a text comment of the dictation. A pop-up dialog window informs the user that a recording is in progress, and the user can press a button to stop recording. Everything spoken is stored in memory and written to an audio file once the recording finishes, along with being converted to text and inserted into the document being edited. A separate conversion step creates a text comment from an existing audio file.

EchoEdit allows the creators of program code to speak their thoughts out loud as notes.

Figure 1. Comment inserted using EchoEdit

/**
 * The following method will take in an integer and then return
 * the summation from zero to the given integer
 * @audioFile HelloWorld.java.1177889903437.wav
 **/
private int returnSum(int num) {
    int sum = 0;
    for (int i = 1; i < num + 1; i++) {
        sum = sum + i;
    }
    return sum;
}
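As a quick check of the example: returnSum(4) iterates i from 1 through 4 and returns 1 + 2 + 3 + 4 = 10, matching the dictated description of a summation from zero to the given integer.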


Figure 2. Simplified design overview (diagram of the IDE document, Action Handler, Audio Recorder, Recorder Thread, and Converter Thread, linked by Initialize Recorder, Begin Record Event, Recording Event, Launch Dialog, and Insert Text interactions)

It differs from existing tools in that it allows the user to access the comments in both audio and written formats.
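As a rough illustration of the recording flow just described, the sketch below captures microphone audio on a worker thread and buffers it in memory until the user presses the stop button, as the prototype is said to do. The class name and structure are our own assumptions, not EchoEdit’s actual source:

import java.io.ByteArrayOutputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// Illustrative only: a recorder thread that keeps spoken audio in memory
// until recording finishes, as described for EchoEdit above.
public class RecorderThread implements Runnable {
    private volatile boolean recording = true;
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    // Called from the stop button of the progress dialog.
    public void stopRecording() {
        recording = false;
    }

    @Override
    public void run() {
        // 16 kHz, 16-bit, mono, signed, little-endian: a common dictation format.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        try (TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info)) {
            line.open(format);
            line.start();
            byte[] buffer = new byte[4096];
            while (recording) {
                int n = line.read(buffer, 0, buffer.length);
                captured.write(buffer, 0, n);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // The captured bytes would now be written to a .wav file and handed
        // to the speech-to-text converter thread shown in Figure 2.
    }
}

The dialog’s stop button would simply call stopRecording(), letting the capture loop exit and the audio line close before the conversion step begins.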

CONCLUSION

While this article has examined the arena of software development, the idea of voice comments can be applied to other disciplines. Trambert (2006) showed that adding voice comments to images in radiology improved efficiency by reducing physician confusion and, therefore, unnecessary interaction. Using these concepts in the development of software and other knowledge-based artifacts can help to reduce the amount of time required to complete the task at hand, as well as improve documentation for future updates. By adding voice comments to code or other documents, one is much more likely to quickly find out the purpose of the text and, therefore, to update it more efficiently.

Finally, the 24-Hour Knowledge Factory may also change overall perspectives on offshoring. This type of development environment allows people to see offshoring as a collaborative endeavor rather than an “us versus them” proposition. Software development in the 24-Hour Knowledge Factory was explored because this is one discipline in which many people in the industry feel that they are losing their jobs to people in other countries. Rather than viewing jobs as leaving one country for another, individuals can look at the 24-Hour Knowledge Factory as an environment where everyone wins: products are developed faster, people get to work normal daylight hours, talent around the world is leveraged, and more jobs are created. Voice-based collaboration can help to attain this vision.

REFERENCES

Aguiar, A., & David, G. (2005). WikiWiki weaving heterogeneous software artifacts. In Proceedings of the 2005 International Symposium on Wikis (WikiSym ’05). Retrieved November 16, 2007, from http://doi.acm.org/10.1145/1104973.1104980

Billings, C., Clifton, J., Kolkhorst, B., Lee, E., & Wingert, W. B. (1994). Journey to a mature software process. IBM Systems Journal, 34(1), 46.

Cadiz, J. J., Gupta, A., & Grudin, J. (2000). Using Web annotations for asynchronous collaboration around documents. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (p. 309).

Dekleva, S. (1992). Delphi study of software maintenance problems (pp. 10-17).

Denny, N., Crk, I., Sheshu, R., & Gupta, A. (in press). Agile software processes for the 24-hour knowledge factory environment. Journal of Information Technology Research.

Denny, N., Mani, S., Sheshu, R., Swaminathan, M., Samdal, J., & Gupta, A. (in press). Hybrid offshoring: Composite personae and evolving collaboration technologies. Information Resources Management Journal.

Fish, R. S., Chaljonte, B. L., & Kraut, R. E. (1991). Expressive richness: A comparison of speech and text as media for revision. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology (pp. 1-6).

Gould, J. D. (1978). An experimental study of writing, dictating, and speaking. In J. Requin (Ed.), Attention and performance: Vol. 7 (pp. 299-319).

Gupta, A., & Seshasai, S. (2007). 24-hour knowledge factory: Using Internet technology to leverage spatial and temporal separations. ACM Transactions on Internet Technology, 7(3).

Knuth, D. E. (1984). Literate programming. Computer Journal, 27(2), 97-111.

Kramer, D. (1999). API documentation from source code comments: A case study of Javadoc. In Proceedings of the 17th Annual International Conference on Computer Documentation (SIGDOC ’99) (pp. 147-153).

Neuwirth, C. M., Chandhok, R., Charney, D., Wojahn, P., & Kim, L. (1994). Distributed collaborative writing: A comparison of spoken and written modalities for reviewing and revising documents. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Celebrating Interdependence (p. 51).

Prechelt, L., Unger-Lamprecht, B., Philippsen, M., & Tichy, W. F. (2002, June). Two controlled experiments assessing the usefulness of design pattern documentation in program maintenance. IEEE Transactions on Software Engineering, 28(6), 595-606.

Sheshu, R., & Denny, N. (2007). The nexus of entrepreneurship & technology (NEXT) initiative (Tech. Rep. 20070220). Unpublished manuscript.

Steeples, C. (2000). Reflecting on group discussions for professional learning: Annotating videoclips with voice annotations (pp. 251-252).

Steeples, C. (2002). Voice annotation of multimedia artifacts: Reflective learning in distributed professional communities (p. 10).

Trambert, M. (2006). PACS voice clips enhance productivity, efficiency; radiologists, referrers, front offices, and patients benefit from reporting in reader’s own voice. Diagnostic Imaging, S3.

Vestdam, T. (2003). Elucidative programming in open integrated development environments for Java. In Proceedings of the Second International Conference on Principles and Practice of Programming in Java (PPPJ ’03) (pp. 49-54).

Whittaker, S., Hyland, P., & Wiley, M. (1994). FILOCHAT: Handwritten notes provide access to recorded conversations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Celebrating Interdependence (p. 271).

Katharina O’Toole is currently working with two start-up companies: GreenSun, which is focused on renewable energy, and Kastle Data Systems, which is focused on network-attached storage. She holds an MBA from the Eller College of Management at the University of Arizona and a master’s degree in electrical and computer engineering from the University of Arizona.

Srividhya Subramanian is pursuing her master’s in computer science at the University of Arizona. She is currently working with Intel in the Mobility group on the Graphics Software Validation team as part of her summer internship. Her interests and associated academic projects lie in firmware validation and software development, more precisely the 24-Hour Knowledge Factory and the use of agile processes in this scenario.

Nathan Denny is currently pursuing a doctoral degree in computer engineering at the University of Arizona. He holds a master’s degree in computer science from Southern Illinois University and has previous publications in design automation for reliable computing, knowledge management, and computer and Internet security. His current interests include the 24-Hour Knowledge Factory and distributed agile software development.
