Gender in End-User Software Engineering

Margaret Burnett*, Susan Wiedenbeck†, Valentina Grigoreanu*, Neeraja Subrahmaniyan*, Laura Beckwith*, Cory Kissinger*

*Oregon State University, Corvallis, OR, USA
{burnett,grigorev,subrahmn,beckwith,ckissin}@eecs.orst.edu

†Drexel University, Philadelphia, PA
[email protected]

ABSTRACT

In this paper, we describe research that reports gender differences in usage of software engineering tools by end-user programmers. We connect these findings with possible explanations based on theories from other disciplines, and then add to that our recent results that these differences go deeper than software engineering tool usage to software engineering strategies. We enumerate the strategies that work better for males and the ones that work better for females, and discuss implications and possible directions for follow-up.

Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging; H.1.2 [Information Systems]: User/Machine Systems—Human factors; H.4.1 [Information Systems Applications]: Office Automation—Spreadsheets

General Terms

Human Factors, Reliability

Keywords

Gender, debugging, end-user programming, end-user software engineering, tinkering, self-efficacy, strategy, Surprise-Explain-Reward.

1. INTRODUCTION

A goal of our research is to support people who engage in end-user software engineering tasks, such as testing and debugging spreadsheet formulas. For example, we have developed features that can be seamlessly blended into spreadsheet software to encourage and assist end users in systematically testing and debugging spreadsheets [7]. In the course of this research, we began to notice that females' behavior with these testing and debugging tools seemed to differ from males' behavior. This led us to conduct a series of studies investigating whether there were indeed gender differences pertinent to the design of these end-user software engineering tools.

Our results showed consistently that male and female end-user programmers did indeed use debugging tools differently. In this paper, we first offer possible explanations for these phenomena from theories in the areas of information processing and problem solving, demonstrating how these differences should indeed affect males' and females' behaviors during end-user debugging. Second, we describe our most recent empirical results on gender in end-user software engineering, focusing on two lines of research: (1) what strategies males and females employ in debugging, and (2) how two variants of just-in-time explanations of debugging strategy affect males' and females' debugging.

2. BACKGROUND

Areas such as psychology and marketing have identified ways in which males and females differ, and these differences have been linked to behaviors in software-based problem-solving tasks such as end-user debugging [2].

In psychology, Bandura defines the construct of self-efficacy as an individual's judgment of his or her ability to carry out a specific action and thus to attain a desired performance outcome [1]. Individuals who have low self-efficacy tend to exhibit lower use of cognitive strategies, less persistence, and lower effort overall than individuals who have high self-efficacy. Indeed, studies of learning computer applications showed that females had lower self-efficacy than males [8, 17], but they did not go on to tie the effects of self-efficacy to performance outcomes.

An empirical study that our group carried out on the effects of self-efficacy in end-user debugging did go on to show downstream effects of self-efficacy [3]. The environment used was WYSIWYT ("What You See Is What You Test"), which provides visual debugging tools [7]. Participants in the study, males and females, were each given a research spreadsheet with enhanced features to aid debugging. Their self-efficacy was measured before they did the debugging tasks. As in the studies above, females had lower self-efficacy than males. More importantly, their self-efficacy was tied to their performance. Females' level of self-efficacy predicted their final percent testedness of the spreadsheet—low self-efficacy was associated with low feature usage—whereas males' performance was not predicted by self-efficacy, suggesting that self-efficacy is much more important in females' problem solving than in males'. Follow-on studies sometimes found females' self-efficacy to be lower than males' and sometimes did not—but they consistently found this phenomenon of a significant tie between females' self-efficacy and their success, and no such tie for males [4].
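As an aside for readers unfamiliar with WYSIWYT, the sketch below illustrates one simplified way a "percent testedness" measure of this kind could be computed. It is a minimal sketch under our own assumptions: WYSIWYT itself tracks testedness at the finer granularity of definition-use relationships in formulas and colors cell borders from red (untested) to blue (fully tested) [7]; the data model and names below are hypothetical.

```python
# Illustrative sketch (not the real WYSIWYT code) of a "percent testedness"
# measure: each formula cell has situations a test could cover, and the user
# checks off a cell's value as correct to cover the current situation.
from dataclasses import dataclass, field

@dataclass
class FormulaCell:
    name: str
    situations: set = field(default_factory=set)  # what a test could cover
    validated: set = field(default_factory=set)   # what the user has checked off

    def check_off(self, situation: str) -> None:
        """User judges the cell's current value correct, covering one situation."""
        if situation in self.situations:
            self.validated.add(situation)

def percent_testedness(cells: list) -> float:
    """Overall testedness: covered situations / total situations, as a percent."""
    total = sum(len(c.situations) for c in cells)
    covered = sum(len(c.validated) for c in cells)
    return 100.0 * covered / total if total else 100.0

# A two-branch IF formula contributes two situations; a simple sum just one.
grade = FormulaCell("Grade", {"if-true branch", "if-false branch"})
total = FormulaCell("Total", {"sum of inputs"})
grade.check_off("if-true branch")
total.check_off("sum of inputs")
print(percent_testedness([grade, total]))  # ~66.7: border tinted between red and blue
```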

Research in motivation has shown that females perceive higher risks than males do in many situations [10], including intellectual risks [9]. Blackwell's Attention Investment Model [6] describes the risks and benefits weighed during problem solving via programming by end users who are not necessarily highly skilled. In the model, an individual considers the perceived benefit of programming and its expected payoff versus the perceived risk and cost if the programming fails. If females are risk-averse, they may perceive the risks and costs to greatly outweigh the benefits of programming. Blackwell's model applies not only to programming but also to other software-based problem-solving tasks, especially those requiring use of (perceived) sophisticated or time-intensive devices in the software environment.

In our study above, we did find evidence of risk-averse behaviors by females. As we have mentioned, several features were provided to help participants debug spreadsheets. Females showed significantly lower acceptance of these debugging features. They were willing to edit formulas, a debugging activity they were familiar with, but were less willing than males to initially try out the new features. The females were also less willing to adopt new features, that is, to intellectually engage with the features in repeated usage. In a post-questionnaire, females said that they did not use a new feature because they thought it would take too long to learn, suggesting that risk was an issue for them and a barrier to adopting new features that could improve their debugging. Here again self-efficacy arises: although the females said they thought it would take them too long to learn the features, by the end of the task our measures of feature comprehension showed no differences in how well males versus females actually understood the features.

A partial replication of the study above, with real-world end-user developers using Excel on a real-world spreadsheet, showed the findings to be robust. It replicated the finding that females' self-efficacy predicts effectiveness, while males' self-efficacy does not. Furthermore, females in the Excel study again focused on the most familiar features, especially value edits in this study, suggesting that they may have avoided more complicated features (or at least those features perceived to be complicated)—something that did not occur among the males.

We then carried out a follow-on empirical study to understand male and female end-user programmers' exploration and self-learning of spreadsheet testing and debugging features. In this study we used two environments: one similar to that of the prior study (called the low-cost environment), and a second that was also similar but designed to provide greater support (called the high-support environment).

Participants engaged in tinkering (playful experimentation) while exploring and using the new features. The educational literature reveals that hands-on, self-guided tinkering can benefit learners [14]. However, the educational literature also reports that males have a greater propensity to tinker than females do [11].

In the study, males tinkered a great deal, but across both environments their tinkering predicted neither effective testing nor the number of bugs fixed. Males tinkered highly, even excessively, in the low-cost interface, which makes tinkering easy to do. They tended to tinker without pausing to reflect, and this may have reduced both the educational benefit of tinkering and, correspondingly, their effectiveness in debugging [14]. In contrast to the males, females tinkered equally in both interfaces, and their tinkering was predictive of testing effectiveness and of more bugs fixed. Females paused during tinkering more than males did. Tinkering in a moderate and "pause-ful" manner appears to have helped the females learn about the features, leading to effective outcomes. Nevertheless, self-efficacy remained an important predictor of effectiveness in females' debugging, interacting with the benefits gained from tinkering. For example, in some cases in the high-support environment, females' tinkering actually interfered with their self-efficacy.

3. CURRENT AND FUTURE RESEARCH DIRECTIONS

3.1 Strategies

Because of the tool usage and behavior differences we had found in end-user debugging, we wondered whether the differences went deeper than mere behaviors, down to strategies. Thus, we decided to investigate what debugging strategies end-user programmers were actually trying to follow, and whether there were gender differences in those strategies. Research in the area of marketing also provides reasons to think that there may be gender differences in end-user debugging strategies. The Selectivity Hypothesis [13] proposes that males and females behave differently in decision making and problem solving. Females tend to process information in a comprehensive way, examining all the available cues and making elaborative inferences in order to reach a decision. Furthermore, they practice comprehensive processing whether the problem is simple or complex. Males, on the other hand, avoid comprehensive information processing, using heuristics to make decisions and falling back on comprehensive processing only if a complex task requires it. If the Selectivity Hypothesis holds across multiple areas involving problem solving and decision making, female end users may be most effective using debugging strategies that make good use of comprehensive processing.

There are many potential strategies for debugging spreadsheets. For example, one might begin by testing, and then use a dataflow strategy to narrow down the possible cause of incorrect output. Alternatively, one might inspect the code looking for logic flaws. To discover the strategies males and females attempted to use in debugging, we asked them, via an open-ended question, what strategies they had used. We then combined these verbal descriptions of strategies with behavioral evidence of the strategies, to determine whether males and females actually used different strategies in debugging and whether those strategy choices led to success.
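To make these candidate strategies concrete before turning to the results, here is a minimal sketch of testing versus dataflow-based fault localization on a toy spreadsheet modeled as a dependency graph. The cell names, formulas, and the seeded bug are invented for illustration and are not taken from the study materials.

```python
# Toy spreadsheet as a dependency graph. "Testing" compares computed values
# with expected ones; "dataflow" walks backward from a wrong cell to collect
# the upstream cells that could explain the failure. All names are invented.
inputs = {"Quiz1": 8, "Quiz2": 9}
formulas = {
    "Total":   (["Quiz1", "Quiz2"], lambda q1, q2: q1 + q2),
    "Percent": (["Total"],          lambda t: t / 20 * 10),  # seeded bug: should be * 100
}

def evaluate(cell):
    """Recursively compute a cell's value from its inputs."""
    if cell in inputs:
        return inputs[cell]
    deps, formula = formulas[cell]
    return formula(*(evaluate(d) for d in deps))

def test(expected):
    """Testing strategy: flag every cell whose value disagrees with expectations."""
    return [cell for cell, want in expected.items() if evaluate(cell) != want]

def dataflow_suspects(cell):
    """Dataflow strategy: the wrong cell plus everything it transitively reads."""
    if cell in inputs:
        return {cell}
    deps, _ = formulas[cell]
    return {cell}.union(*(dataflow_suspects(d) for d in deps))

wrong = test({"Total": 17, "Percent": 85.0})  # only Percent fails
print(wrong)                                  # ['Percent']
print(dataflow_suspects(wrong[0]))            # cells to inspect, narrowed by dataflow
```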

Their responses revealed eight strategies: testing, dataflow, code inspection, specification checking, color following, to-do listing, fixing formulas, and spatial. Three strategies stood out: testing, dataflow, and code inspection.

The behavioral evidence showed that testing values to verify cells' correctness was a successful strategy for males. In particular, the males who had high success in debugging used testing more than the less successful males did; likewise, successful male debuggers used testing more than successful female debuggers did. Males also used dataflow strategies more than females did, and furthermore, males using dataflow were more successful at the end of the experiment than were females who used dataflow.

By contrast, code inspection was more associated with females: successful females had more instances of code inspection than males, and more total formulas displayed across all instances. Referring back to the Selectivity Hypothesis, it may be that females using code inspection were practicing comprehensive processing, investigating many formulas in detail. (Males, in contrast, may have been successful because strictly following dataflow arrows through the spreadsheet minimizes the amount of information processing required.) Other strategies that were successful for females but not for males were specification following, in which users verified formulas by comparing them to a specification document, and to-do listing, in which users marked cells to track their code inspection progress.

Overall, of the eight strategies the end-user participants described, there were gender differences for seven of them! A main result of this study was that debugging strategies that worked well for males did not work well for females. A disadvantage for female debuggers was that, for the most part, their preferred strategies of code inspection, specification following, and to-do listing are not well supported by features in end-user environments. Future work may assess whether new design features can provide better support for these strategies.

3.2 Strategy Explanations

We have also been exploring how to support the end-user programmers who struggle to find a suitable strategy, because the study above also revealed that not everyone used reasonable strategies to track down the bugs. In particular, we would like to entice them into strategies that have a chance of working well for them. The WYSIWYT environment in which we prototype our approaches provides a motivational strategy, called Surprise-Explain-Reward, which attempts to entice end-user programmers to use end-user software engineering features for systematic testing and debugging [18]. The first step in the strategy is to (gently) surprise the user with something unexpected in the environment, such as a red cell border. Second, if the user is curious about it, he or she can seek an explanation via tool tips that pop up when the user hovers over the red cell border. Finally, if the user reads and carries out what the tool tip advises, rewards may follow; for example, the feature might help the user find the bug. While the surprises and rewards are fairly well developed, there is still much to do in determining what explanations users want and how those explanations can most effectively be conveyed.
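As a rough illustration of how the three steps fit together, this sketch models the Surprise-Explain-Reward flow as a simple interaction loop. The state names, tool tip text, and reward check below are hypothetical stand-ins for exposition, not the actual WYSIWYT implementation [18].

```python
# Minimal sketch of the Surprise-Explain-Reward loop. The states, messages,
# and reward check below are hypothetical stand-ins for illustration only.
SURPRISES = {
    # feature state -> (surprise shown to the user, explanation offered on hover)
    "untested_cell": ("red cell border",
                      "This value has not been judged correct yet; "
                      "checking it off makes testing progress."),
}

def on_render(state):
    """Step 1 (Surprise): draw a gentle attention-getter for this state."""
    surprise, _ = SURPRISES[state]
    return surprise

def on_hover(state):
    """Step 2 (Explain): if the user is curious, a tool tip explains the surprise."""
    _, explanation = SURPRISES[state]
    return explanation

def on_follow_advice(testedness_before, testedness_after):
    """Step 3 (Reward): acting on the advice should yield visible progress."""
    if testedness_after > testedness_before:
        return "cell border shifts from red toward blue"  # the visible reward
    return "no visible change yet; keep exploring"

print(on_render("untested_cell"))
print(on_hover("untested_cell"))
print(on_follow_advice(40.0, 55.0))
```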

We have conducted an initial qualitative empirical study to determine what types of explanations users need while debugging [12]. These explanations fell into five groups. One of the most prominent needs the study participants expressed was for explanations of how to proceed in debugging, e.g., what would be a suitable strategy, or how to accomplish a particular goal. This type of information gap was far more common than the need for explanations of new features.

Building on these findings about information gaps, we carried out a qualitative empirical study focusing on explaining ways to approach debugging strategies, rather than explaining features. Six such explanations were developed: how to find errors, how to fix errors, how I can test my spreadsheet, why should I change values, what is a good overall strategy, and am I doing it right. One important question was whether participants would take the time to learn via the explanations—the Attention Investment Model makes clear that individuals assess the potential benefits and costs in order to make such a decision.

Tool tips in WYSIWYT addressed part of this need but were limited to short texts and did not seem to be a viable way to explain strategies. We therefore chose two new vehicles for short strategy explanations: video explanation snippets (1-2.5 minutes long) and hyperlinked textual explanations. The videos showed two people (one male and one female) problem solving and discussing one of the six strategies while demonstrating how to carry it out on a spreadsheet. The female took on the role of the confused or questioning student, who by the end of the video was always successful. This was done because one way of increasing self-efficacy is to view another person like oneself struggle in a task and ultimately succeed [1]. The wording of the text version was identical to the video's, except that there was no sample spreadsheet to which the reader could refer. The participants were 7 males and 3 females. When they needed an explanation during the course of debugging, they had the option of using either the text version or the video.

The results showed that the explanations improved the participants' choices, allowing them to close information gaps that had hindered their debugging. In addition, females reported an increase in their confidence due to the explanations, whereas no male reported this. Within this small group of participants the choices of media varied, suggesting that both presentation options should be provided. The findings also yielded recommendations for future versions of the explanations: some participants lacked the motivation to read or view the explanations, and some misinterpreted them.

4. IMPLICATIONS AND CONCLUSIONS

The implications of our findings so far are quite clear: the features in supposedly gender-neutral end-user software environments are not gender-neutral after all. Males' and females' use of, and benefits from, these features vary greatly. Currently, we are designing a larger and more definitive empirical study of strategy explanations and gender. We are also investigating whether small changes to the attributes of features may remove some of the barriers to males' or females' success with, and willingness to use, those features. We point out that there are no "typical" males or "typical" females; for example, many males use comprehensive information processing, and many females use heuristic processing. Therefore, we are not advocating separate versions of software for each gender. Rather, we advocate adjusting features and offering flexible options, so that any end-user programmer whose cognitive or problem-solving style does not fall into the patterns preferred by the developer of that software can still be effective in their software development efforts.

5. ACKNOWLEDGMENTS

This work was supported in part by Microsoft Research and by NSF CNS-0420533, ITR-0325273, and CCR-0324844.

6. REFERENCES

[1] Bandura, A. Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84, 2 (1977), 191-215.
[2] Beckwith, L. and Burnett, M. Gender: An important factor in end-user programming environments? In Proc. IEEE Symposium on Visual Languages and Human-Centric Computing (2004), 107-114.
[3] Beckwith, L., Burnett, M., Wiedenbeck, S., Cook, C., Sorte, S., and Hastings, M. Effectiveness of end-user debugging software features: Are there gender issues? In Proc. CHI 2005, ACM Press (2005), 869-878.
[4] Beckwith, L., Inman, D., Rector, K., and Burnett, M. On to the real world: Gender and self-efficacy in Excel. In Proc. VL/HCC, IEEE (2007).
[5] Beckwith, L., Kissinger, C., Burnett, M., Wiedenbeck, S., Lawrance, J., Blackwell, A., and Cook, C. Tinkering and gender in end-user programmers' debugging. In Proc. CHI 2006, ACM Press (2006), 231-240.
[6] Blackwell, A. First steps in programming: A rationale for attention investment models. In Proc. IEEE Human-Centric Computing Languages and Environments (2002), 2-10.
[7] Burnett, M., Cook, C., and Rothermel, G. End-user software engineering. Communications of the ACM 47, 9 (2004), 53-58.
[8] Busch, T. Gender differences in self-efficacy and attitudes toward computers. Journal of Educational Computing Research 12, 2 (1995), 147-158.
[9] Byrnes, J. P., Miller, D. C., and Schafer, W. D. Gender differences in risk taking: A meta-analysis. Psychological Bulletin 125 (1999), 367-383.
[10] Finucane, M., Slovic, P., Mertz, C. K., Flynn, J., and Satterfield, T. Gender, race and perceived risk: The white male effect. Health, Risk and Society 2, 2 (2000), 159-172.
[11] Jones, M. G., Brader-Araje, L., Carboni, L. W., Carter, G., Rua, M. J., Banilower, E., and Hatch, H. Tool time: Gender and students' use of tools, control, and authority. Journal of Research in Science Teaching 37, 8 (2000), 760-783.
[12] Kissinger, C., Burnett, M., Stumpf, S., Subrahmaniyan, N., Beckwith, L., Yang, S., and Rosson, M. B. Supporting end-user debugging: What do users want to know? In Proc. Advanced Visual Interfaces, ACM Press (2006), 135-142.
[13] Meyers-Levy, J. Gender differences in information processing: A selectivity interpretation. In P. Cafferata and A. Tybout (Eds.), Cognitive and Affective Responses to Advertising. Lexington Books, Lexington, MA, 1987.
[14] Rowe, M. B. Teaching Science as Continuous Inquiry: A Basic (2nd ed.). McGraw-Hill, New York, NY, 1978.
[15] Subrahmaniyan, N., Beckwith, L., Grigoreanu, V., Burnett, M., Wiedenbeck, S., Narayanan, V., Bucht, K., Drummond, R., and Fern, X. Testing vs. code inspection vs. ... what else? Male and female end users' debugging strategies. In Proc. CHI 2008, ACM Press (to appear).
[16] Subrahmaniyan, N., Kissinger, C., Rector, K., Inman, D., Kaplan, J., Beckwith, L., and Burnett, M. Explaining debugging strategies to end-user programmers. In Proc. IEEE Symposium on Visual Languages and Human-Centric Computing (2007), 127-134.
[17] Torkzadeh, G. and Koufteros, X. Factorial validity of a computer self-efficacy scale and the impact of computer training. Educational and Psychological Measurement 54, 3 (1994), 813-821.
[18] Wilson, A., Burnett, M., Beckwith, L., Granatir, O., Casburn, L., Cook, C., Durham, M., and Rothermel, G. Harnessing curiosity to increase correctness in end-user programming. In Proc. CHI 2003, ACM Press (2003), 305-312.
