A Study in Cooperative Spreadsheet Development

Raymond R. Panko
Decision Sciences Dept., University of Hawaii
2404 Maile Way, E303, Honolulu, HI 96822
Richard P. Halverson, Jr.
Information and Computer Sciences Dept., University of Hawaii
2565 The Mall, Keller 319, Honolulu, HI 96822
ABSTRACT
Studies of spreadsheet usage have revealed that spreadsheeting is very often a cooperative activity. How a group cooperates may be an important factor in its success at pooling the domain and programming knowledge of its individual members to produce an error-free spreadsheet. This study compares individual and group spreadsheet development with an expected performance measure derived from Lorge and Solomon’s model for groups solving eureka-type problems. It found evidence that same-place/same-time meetings may reduce a group’s effectiveness, suggesting that different-place or different-time interaction may be better for cooperative spreadsheet development.

1. INTRODUCTION
Spreadsheeting is one of the two most important personal computer applications [Panko, 1988]. Among managers, it ranks first in importance [Igbaria, 1992; Igbaria, Pavri, & Huff, 1989; Lee, 1986]. In one recent survey, senior information systems executives rated spreadsheeting as the application making the most significant contribution to their firms [McLean, et al., 1993]. While spreadsheet development by individuals is an important concern, there is growing evidence that spreadsheet development is not just a solitary activity. Wilkins [1992] surveyed recent business school graduates and found that 70% of the respondents
were already using spreadsheets and that only 5% of their applications were both built and used solely by themselves. Nardi and Miller’s [1991] ethnographic study of spreadsheet usage found that spreadsheet codevelopment was the rule, not the exception. They found that codevelopers shared both programming and domain expertise to build commonly used spreadsheets, which served as a medium of communication. Groups should be able to produce larger or more useful spreadsheets as well as reduce errors, both because they combine knowledge and because they allow people to check one another’s work.

Lorge and Solomon [1955], analyzing previously published data, proposed a model based on the reasoning that groups will perform better than individuals on intellective tasks simply because the probability that at least one person in the group knows the correct answer increases with the number of members. We propose that if actual groups perform better than the model predicts, there is synergism; if they perform worse, there is dysfunctionality. If there is dysfunctionality, then we may examine structural or other alternatives for group interaction to help improve group performance.

In this experiment we compare the performance of individuals and of ad hoc same-place/same-time laboratory groups [McGrath, 1984]. A total of 108 undergraduate business students developed pro forma income statements from a word problem. Subjects worked individually, in a group of two (dyad), or in a group of four (tetrad). First, we verify that groups do in fact perform better than individuals on the experimental spreadsheet task. (This will, in part, validate our experimental task.) Next, we examine whether spreadsheet programming errors were significant for our sample. (This will help us see, in part, the effects of group size on spreadsheet programming skills.)
Finally, we compare the observed group performance with the performance we would expect from the observed individual performance under Lorge and Solomon’s model. (This will
help us, in part, examine the efficacy of same-place/same-time groups for spreadsheet development.) Section 2 of the paper describes the task, methodology, and sample used for this experiment, and reviews Lorge and Solomon’s model for deriving expected group performance probabilities from observed individual performance averages. Section 3 discusses the experimental results. Section 4 concludes with our observations and plans for future research in this area.

2. METHODOLOGY
This section discusses the experimental methodology. Section 2.1 describes the spreadsheet task and the domain knowledge necessary to participate in the experiment. Section 2.2 reviews Lorge and Solomon’s equations for expected group performance based on observed individual performance. Section 2.3 describes the experimental design, the problem, the sample, and how the experiment was conducted.

2.1 The Spreadsheet Problem Task
Nardi and Miller [1991] observed that completing a spreadsheet task requires both domain knowledge and spreadsheet programming skills. Domain knowledge for this experiment involved understanding certain accounting principles, specifically the construction of pro forma income statements. Errors resulting from incorrect or incomplete domain knowledge would cause rows or formulas in the spreadsheet model to be incorrect or missing. Spreadsheet programming skills include knowing how to use the spreadsheet software, for example keyboarding and typing formulas. Successful programming also involves simple tasks such as reading, transcribing, and avoiding other simple (non-accounting) mistakes that can cause problems. Errors resulting from incomplete programming skills cause entered or computed cell values to be incorrect.
Figure 1 illustrates that, for this experiment, domain knowledge consists of (a) the purpose and layout of a pro forma income statement (which calculates a company’s year-by-year net income after taxes). To construct one, the spreadsheet designer(s) must also know (b) how to calculate revenues and (c) how to calculate expenses. Members of a group would likely discuss actively how to perform these tasks and eventually agree on one particular approach.

[Figure 1. Knowledge and Skills for Building a Spreadsheet: the problem task requires domain knowledge (building a pro forma income statement) and programming skills (using spreadsheet software, typing formulas, etc.).]

There are two types of spreadsheet programming skills: (a) knowing the functions of the spreadsheet software and (b) constructing formulas and typing numbers without making mistakes. In this experiment, the only spreadsheet skills needed were constructing and entering simple formulas into cells and copying them from one year to the next. These tasks would likely not be discussed at length but would be left primarily to the person at the keyboard.

2.2 Model for Expected Group Performance
When a group performs a subtask requiring domain knowledge, its members interact, pooling their knowledge to recognize and work out how to complete the subtask correctly. It is reasonable to expect that if one member of the group identifies the correct solution to a subtask, then the others will recognize it as
such, or at least that the person who identifies it will be able to explain it and convince the others. This follows the “truth-wins” social combination model that Laughlin and Ellis [1986] tested using mathematical intellective tasks, and it matches how groups solved what Lorge and Solomon [1955] called “eureka-type” problems. Both models assume that if just one individual is able to identify the correct solution, the rest of the group will eventually recognize it as correct. Therefore, average individual performance on a subtask can be used as the probability that a member of a group knows that particular subtask. From this we can derive the expected probability of success on each subtask for dyads and tetrads.

For each subtask $s$, let $p_s$ be the proportion of individuals who performed $s$ correctly when working alone; this is the probability that any given member knows the subtask. Let $q_s = 1 - p_s$ be the probability that a member does not know subtask $s$. Assuming that members are independent, the probability that neither member of a dyad knows the subtask is $q_s \cdot q_s = q_s^2$, so the probability that at least one member knows it is

$$E_{D,s} = 1 - q_s^2 = 1 - (1 - p_s)^2.$$

Similarly, the probability that none of the four members of a tetrad knows the subtask is $q_s^4$, so the probability that at least one member knows it is

$$E_{T,s} = 1 - q_s^4 = 1 - (1 - p_s)^4.$$
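The truth-wins projection above can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original analysis: `expected_group_success` is a hypothetical helper name, and it assumes, as the model does, that members know a subtask independently, each with the individual success rate $p_s$. Applied to the capital depreciation subtask (46.4% of the 28 individuals, i.e. 13 subjects), it reproduces the dyad and tetrad expectations listed in Table 8.

```python
def expected_group_success(p_s: float, group_size: int) -> float:
    """Truth-wins expectation: probability that at least one of `group_size`
    independent members knows the subtask, i.e. 1 - (1 - p_s) ** group_size."""
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("p_s must be a probability between 0 and 1")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    q_s = 1.0 - p_s  # probability that one member does not know the subtask
    return 1.0 - q_s ** group_size  # complement of "no member knows it"


# Capital depreciation (subtask 2.3): 13 of 28 individuals (46.4%) succeeded.
p_depreciation = 13 / 28
for label, size in (("dyad", 2), ("tetrad", 4)):
    print(f"{label}: {expected_group_success(p_depreciation, size):.1%}")
# dyad: 71.3%
# tetrad: 91.8%
```

The same formula covers any group size $n$; with $n = 1$ it simply returns the individual rate $p_s$.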
If, for a particular subtask, actual groups perform better on average than the expected value based on observed individual performance, then we have evidence of synergism: the group discovers answers that none of its members would likely have discovered alone. If actual groups are observed to
perform worse than the expected value, then we have evidence that group interaction is preventing a member who would know the correct answer when working alone from recognizing it or from convincing the other members of it.

2.3 The Experiment
The experiment was conducted over a two-year period. The sample consisted of third- and fourth-year undergraduate business students, all of whom had taken two accounting courses and an introductory computer course covering spreadsheeting. This commonality allowed us to use a spreadsheet problem requiring basic business accounting knowledge. Students who had taken additional accounting or finance courses were excluded. Students were given course credit for participating. In addition, roughly every semester, prizes of $25 per person were awarded to the developers of the highest-scoring spreadsheets.

Subjects could indicate their preferred times to participate in the experiment. When they appeared on their chosen day, they were randomly assigned to treatments. Subjects worked in only one condition: individually, in a group of two (dyad), or in a group of four (tetrad). Subjects could use either Lotus 1-2-3 or Microsoft Excel, except tetrads, which used a DOS version of Lotus on a network. We believed that the difference between the two programs would have little effect on the spreadsheet task, and allowing both let us include subjects who were trained in Excel and felt uncomfortable using Lotus for the first time. The length of the problem (subjects had 75 minutes to complete it) and the length of the post-experiment questionnaire discouraged us from using multiple problems across multiple treatments.

Spreadsheet knowledge requirements were minimal. Students merely had to enter data and formulas. An experimenter stood by to answer spreadsheet programming questions while the subjects were getting started (e.g., starting formulas with a + sign or widening a column).
The purpose was to allow the results to reflect accounting
knowledge, general conceptual modeling skills, and oversight mistakes, rather than knowledge of specific techniques that differ from program to program. (In practice, the subjects rarely needed help.)

The subjects were given a problem statement explaining that their overall task was to build a two-year pro forma income statement for a small company using a spreadsheet modeling program. The problem statement contained the eight specific items listed in Table 1, each of which was needed to compute the bottom-line net income on the statement.

Table 1. Items on Problem Statement
#     Item
(1)   Owner’s salary is $80,000 per year. Manager’s salary is $60,000 per year
(2)   Tax rate is 25%
(3)   Unit materials cost is $40 and labor cost is $25 the first year. Unit materials cost is $35 and labor cost is $29 the second year
(4)   Capital purchase of $500,000 the first year. 10% straight line depreciation with no scrap value
(5)   Unit sales price is $200 the first year and $180 the second year
(6)   Three sales people with an average salary of $30,000 the first year and $31,000 the second year
(7)   Rent is $3,000 per month
(8)   Sales are projected to be 3,000 the first year and 3,200 the second year
Table 2 shows how the items of Table 1 combine into the eight subtasks for building the pro forma income statement for this problem. The third column identifies which items in Table 1 each subtask uses. Subtask 2.3, charging the capital depreciation expense, was the most difficult, as it requires knowing that the income statement includes only the yearly $50,000 depreciation expense (10% of the original capital purchase price), not the $500,000 capital purchase itself.
The last subtask in Table 2 is to avoid spreadsheet programming oversights. This covers all the types of mistakes one can make when creating the spreadsheet, such as typing mistakes and formula pointing errors. This subtask may or may not take place as a separate activity (e.g., debugging), but in this experiment a spreadsheet was scored as having completed it correctly only if it had no spreadsheet programming or oversight errors.

Table 2. Subtasks for this problem
Subtask  Description                                         Items
1        Calculate Revenues                                  (5) x (8)
2.1      Add Owner’s and Manager’s Salaries to Expenses      (1)
2.2      Calculate Manufacturing Costs and Add to Expenses   (3) x (8)
2.3      Calculate Depreciation Cost and Add to Expenses     (4)
2.4      Charge Sales Salaries to Expenses                   (6)
2.5      Calculate Rent and Add to Expenses                  (7)
3        Calculate Income Tax                                (2)
4        Avoid Oversights*                                   (1)..(8)
* requires only spreadsheet programming skills
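As a concrete illustration of the domain knowledge Table 2 requires, the sketch below computes a two-year pro forma income statement from the Table 1 items, one subtask per line. It is a minimal sketch under stated assumptions, not an answer key reported by this study: it assumes that only the yearly straight-line depreciation (10% of $500,000) is expensed and that income tax is 25% of income before tax. `YearInputs` and `income_statement` are hypothetical names. Subtask 4 has no formula of its own, since it is met simply by entering everything without typing or pointing errors.

```python
from dataclasses import dataclass

# Items from Table 1 that are the same in both years.
OWNER_AND_MANAGER_SALARIES = 80_000 + 60_000  # item (1)
TAX_RATE = 0.25                                # item (2)
CAPITAL_PURCHASE = 500_000                     # item (4)
DEPRECIATION_RATE = 0.10                       # item (4): straight line, no scrap value
SALESPEOPLE = 3                                # item (6)
MONTHLY_RENT = 3_000                           # item (7)


@dataclass
class YearInputs:
    """Year-specific items from Table 1 (illustrative container)."""
    units_sold: int        # item (8)
    unit_price: float      # item (5)
    unit_materials: float  # item (3)
    unit_labor: float      # item (3)
    sales_salary: float    # item (6): average salary per salesperson


def income_statement(y: YearInputs) -> dict[str, float]:
    revenue = y.unit_price * y.units_sold                      # subtask 1
    expenses = (
        OWNER_AND_MANAGER_SALARIES                             # subtask 2.1
        + (y.unit_materials + y.unit_labor) * y.units_sold     # subtask 2.2
        + DEPRECIATION_RATE * CAPITAL_PURCHASE                 # subtask 2.3: $50,000, not $500,000
        + SALESPEOPLE * y.sales_salary                         # subtask 2.4
        + 12 * MONTHLY_RENT                                    # subtask 2.5
    )
    income_before_tax = revenue - expenses
    tax = TAX_RATE * income_before_tax                         # subtask 3
    return {
        "revenue": revenue,
        "expenses": expenses,
        "income_before_tax": income_before_tax,
        "tax": tax,
        "net_income": income_before_tax - tax,
    }


year1 = YearInputs(units_sold=3_000, unit_price=200, unit_materials=40,
                   unit_labor=25, sales_salary=30_000)
year2 = YearInputs(units_sold=3_200, unit_price=180, unit_materials=35,
                   unit_labor=29, sales_salary=31_000)
for label, year in (("Year 1", year1), ("Year 2", year2)):
    print(label, {k: f"{v:,.0f}" for k, v in income_statement(year).items()})
```

Under these assumptions the sketch yields net income of $66,750 for the first year and $39,150 for the second; expensing the full $500,000 purchase instead (the subtask 2.3 error) would turn the first year into a large loss.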
3. RESULTS
Spreadsheet analysis is appealing in groupwork experiments because performance can be determined unambiguously. To score a spreadsheet, each incorrect subtask was corrected one at a time until the entire spreadsheet was correct. This allowed us to identify each error and to give each spreadsheet a score based on the number of subtasks performed correctly. (The Appendix lists the different errors encountered and their frequencies.) A spreadsheet was given one point for each correct subtask; if there were no spreadsheet programming oversight mistakes, the last subtask was counted as correct. The bottom line was correct when all eight subtasks were completed correctly.
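The scoring rule just described can be sketched as follows. This is a hypothetical helper, assuming each spreadsheet has already been checked subtask by subtask and recorded as a mapping from the Table 2 subtask labels to True (correct) or False; `score_spreadsheet` and `SUBTASKS` are illustrative names.

```python
SUBTASKS = ("1", "2.1", "2.2", "2.3", "2.4", "2.5", "3", "4")  # labels from Table 2


def score_spreadsheet(correct: dict[str, bool]) -> tuple[float, bool]:
    """Return (fraction of the eight subtasks correct, bottom line correct)."""
    missing = set(SUBTASKS) - set(correct)
    if missing:
        raise ValueError(f"subtasks not scored: {sorted(missing)}")
    points = sum(1 for s in SUBTASKS if correct[s])  # one point per correct subtask
    # The bottom line counts as correct only when every subtask, including
    # subtask 4 (no oversight errors), is correct.
    return points / len(SUBTASKS), points == len(SUBTASKS)


# Example: depreciation expensed incorrectly, all other subtasks correct.
sheet = {s: True for s in SUBTASKS}
sheet["2.3"] = False
print(score_spreadsheet(sheet))  # (0.875, False)
```

Expressing the score as a fraction of the eight subtasks matches the percentage scale used in Table 3.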
Section 3.1 verifies that a larger group size does increase the chance of producing a correct spreadsheet for this problem task. Section 3.2 examines whether spreadsheet programming errors were significant for this sample. Section 3.3 compares the observed group performance with the expected values from Lorge and Solomon’s model.

3.1 Group Size and Correctness
Individuals working alone do not have the benefit of consulting partners; they must rely on their own knowledge and reflection. Dyads have the benefit of two minds working together to perform the subtasks correctly. Tetrads have four people, all working toward the same goal of creating a correct spreadsheet model of the hypothetical business described in the problem statement.

First, we compare the average number of subtasks completed correctly for each of the three group sizes. We expect individuals to complete fewer subtasks on average than dyads, and dyads fewer than tetrads. Table 3 compares the percent subtask completion scores for the three group sizes. On average, individuals completed 79.9% of the eight subtasks, dyads 85.6%, and tetrads 92.5%. The analysis of variance in Table 3 shows that larger groups got significantly higher scores (F(2, 55) = 4.4095, p < .02). To determine which group sizes differ significantly, we applied Duncan’s multiple range test (Table 4). This comparison method was chosen to balance concerns over Type I and Type II errors. Only the difference between individuals and tetrads was statistically significant (p < .05). Even so, we can conclude that group size does make a difference for this spreadsheet task, and we see evidence that adding group members increases group performance.
Table 3. Group Size and Spreadsheet Score
Groups        Count   Sum     Average Score   Variance
Individuals   28      22.38   79.91%          0.0189
Dyads         20      17.13   85.63%          0.0120
Tetrads       10      9.25    92.50%          0.0042

ANOVA
Source    SS       df   MS       F
Between   0.1243   2    0.0622   4.4095*
Within    0.7754   55   0.0141
Total     0.8998   57
* p < .02
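The ANOVA in Table 3 can be approximately recomputed from the per-group summary statistics alone. The sketch below is an illustration, not the original computation: `GroupSummary` and `one_way_anova` are hypothetical names, and because the counts, means, and sample variances it starts from are rounded as published, its sums of squares and F differ slightly from Table 3.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    name: str
    n: int           # number of spreadsheets (Count)
    mean: float      # average score as a fraction
    variance: float  # sample variance (n - 1 denominator)


def one_way_anova(groups: list[GroupSummary]) -> dict[str, float]:
    """One-way ANOVA table entries computed from group summaries only."""
    n_total = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / n_total
    # Between-groups variation: spread of group means around the grand mean.
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    # Within-groups variation: pooled from each group's sample variance.
    ss_within = sum((g.n - 1) * g.variance for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"SS_between": ss_between, "SS_within": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f_stat}


table3 = [
    GroupSummary("Individuals", 28, 0.7991, 0.0189),
    GroupSummary("Dyads", 20, 0.8563, 0.0120),
    GroupSummary("Tetrads", 10, 0.9250, 0.0042),
]
for key, value in one_way_anova(table3).items():
    print(f"{key}: {value:.4f}")
```

A p-value would then come from the F distribution with (2, 55) degrees of freedom, for example `scipy.stats.f.sf(F, 2, 55)` if SciPy is available; the Python standard library has no F distribution.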
Table 4. Duncan’s Multiple Range Test
                        Individuals   Dyads     Tetrads
Average Score           79.91%        85.63%    92.50%
Individuals (79.91%)                  0.0571    0.1259*
Dyads (85.63%)                                  0.0688
Tetrads (92.50%)
Shortest Significant Ranges: R2 = 0.0845, R3 = 0.0889
* p < .05
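The comparison step behind Table 4 can be sketched as below. This illustration takes the shortest significant ranges R2 and R3 as given (computing them requires Duncan’s tables of significant studentized ranges, which are omitted) and compares each pair of means with the range for the number of ordered means the pair spans. It also applies Duncan’s rule that a pair lying inside a range already found non-significant is not declared significant. The function name `duncan_pairs` is hypothetical.

```python
def duncan_pairs(means: dict[str, float],
                 shortest_ranges: dict[int, float]) -> list[tuple[str, str, float, bool]]:
    """Return (lower, higher, difference, significant) for every pair of means.

    shortest_ranges[p] is the shortest significant range R_p for a pair that
    spans p ordered means (both ends included)."""
    names = sorted(means, key=means.get)  # ascending order of means
    k = len(names)
    nonsig_spans: list[tuple[int, int]] = []
    results = []
    # Test the widest spans first so narrower spans inherit non-significance.
    for span in range(k, 1, -1):
        for i in range(k - span + 1):
            j = i + span - 1
            diff = means[names[j]] - means[names[i]]
            covered = any(a <= i and j <= b for a, b in nonsig_spans)
            significant = not covered and diff > shortest_ranges[span]
            if not significant:
                nonsig_spans.append((i, j))
            results.append((names[i], names[j], diff, significant))
    return results


table4_means = {"Individuals": 0.7991, "Dyads": 0.8563, "Tetrads": 0.9250}
ranges = {2: 0.0845, 3: 0.0889}
for low, high, diff, significant in duncan_pairs(table4_means, ranges):
    print(f"{high} vs {low}: {diff:.4f} ({'significant' if significant else 'not significant'})")
```

Because the means here are the rounded values from Table 4, the computed differences may differ from the table in the fourth decimal place; the conclusion that only individuals versus tetrads is significant is unchanged.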
3.2 Programming Errors
Individuals, dyads, and tetrads will all make at least some spreadsheet programming errors. Here we examine whether these errors alone cause a significant change (i.e., decrease) in the proportion of spreadsheets with correct bottom lines for each group size. To test the effects of programming errors, we use paired t-tests with a hypothesized mean difference of zero between the correctness of a spreadsheet with its programming errors and the correctness of the same spreadsheet with those errors corrected. If a significant number of spreadsheets had incorrect bottom lines only because of a programming error, then it would be worth investigating group process modifications aimed specifically at eliminating these types of errors.

We expect the proportion of correct bottom lines to increase when we correct only the oversight mistakes in the spreadsheets constructed by individuals. As Table 5 shows, 17.86% of the individuals produced spreadsheets with correct bottom lines. If, however, oversight errors had been eliminated, 28.57% of the individuals would have had correct bottom lines. This difference produced t = –1.800, which was significant at p < .05.

Table 5. With/Without Oversights: Individuals
                           Oversights   No Oversights
Average                    17.9%        28.6%
Variance                   0.1521       0.2116
Observations               28           28
Hypothesized Mean Diff.    0
df                         27
t Stat                     –1.8000*
* p < .05
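The paired t statistic in Table 5 can be reproduced from 0/1 bottom-line outcomes. The per-spreadsheet data are not published, so the pairs below are a hypothetical reconstruction consistent with Table 5: because a bottom line counts as correct only when all eight subtasks, including subtask 4, are correct, fixing oversights can only turn an incorrect bottom line into a correct one. That gives 5 spreadsheets correct either way, 3 that become correct after fixing, and 20 that remain incorrect. `paired_t` is an illustrative name; the sketch uses only the Python standard library.

```python
import math
from statistics import mean, stdev


def paired_t(before: list[int], after: list[int]) -> tuple[float, int]:
    """Paired t statistic and df for H0: mean(before - after) = 0."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are equal; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical reconstruction for individuals (Table 5); 1 = correct bottom line.
with_oversights = [1] * 5 + [0] * 3 + [0] * 20   # 5 of 28 correct as submitted
oversights_fixed = [1] * 5 + [1] * 3 + [0] * 20  # 8 of 28 correct after fixing
t, df = paired_t(with_oversights, oversights_fixed)
print(f"t = {t:.4f}, df = {df}")  # t = -1.8000, df = 27
```

The same kind of reconstruction for dyads (5 of 20 correct as submitted, 9 of 20 after fixing) gives t ≈ –2.18 with 19 df, in line with Table 6.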
Similarly, we expect the proportion of correct bottom lines to increase when the dyads’ spreadsheets are corrected of oversight mistakes. Table 6 shows the results for dyads: 25% of the laboratory dyads developed spreadsheets with correct bottom lines, and if oversight errors had been eliminated, 45% would have been correct. This difference was significant at p < .05. For tetrads, we likewise expect fewer correct bottom lines with oversight errors than without them. Table 7 shows that 40% of the tetrads developed spreadsheets with correct bottom lines. If oversight errors had been eliminated, 60% would have been correct, but with p < .05 as our significance level we could not rule out that this difference was due to chance.

Table 6. With/Without Oversights: Dyads
                           Oversights   No Oversights
Average                    25.0%        45.0%
Variance                   0.1974       0.2605
Observations               20           20
Hypothesized Mean Diff.    0
df                         19
t Stat                     –2.1795*
* p < .05
Table 7. With/Without Oversights: Tetrads
                           Oversights   No Oversights
Average                    40.0%        60.0%
Variance                   0.2667       0.2667
Observations               10           10
Hypothesized Mean Diff.    0
df                         9
t Stat                     –1.4213*
* p < .1
Tables 5 and 6 suggest that, at least for individuals and dyads, an extra effort to eliminate “dumb mistakes” from the spreadsheet would be worthwhile, and we can experiment with adding specific steps to the group process to eliminate these types of errors. Because the difference for tetrads (Table 7) did not reach significance at the p < .05 level, tetrads may be better at catching these mistakes, or our tetrad sample may simply have been too small to reach a stronger significance level.
3.3 Evaluating Group Performance
As explained in Section 2, we use a truth-wins combination rule to obtain an expected level of performance for groups, based on how the sample of individuals performed. When a group works on an intellective task, ideally, if one member recognizes the correct answer, he or she should be able to explain it to the others so that the group ends up with the correct answer. For each subtask $s$ with individual success probability $p_s$, the observed laboratory dyad performance $O_{D,s}$ is expected to equal $E_{D,s} = 1 - (1 - p_s)^2$. Similarly, the observed laboratory tetrad performance $O_{T,s}$ is expected to equal $E_{T,s} = 1 - (1 - p_s)^4$.

The observed groups may perform better than expected, in which case we may conclude that a synergistic effect occurs when groups interact. It is also possible, however, that the observed groups will perform as well as or worse than expected. Groups performing worse than expected would imply some dysfunctionality in the group interaction, suggesting that a person who knows the correct answer is sometimes unable to convince the others in the group. If this is observed, we can experiment in the future with process modifications to overcome this dysfunctionality, allowing groups to perform at least as well as expected under Lorge and Solomon’s model.

To compare observed with expected performance, the number of groups of a given size that performed a subtask correctly is compared with the number expected under Lorge and Solomon’s model. A χ² (chi-square) analysis is performed whenever the expected frequencies are at least five; in those cases we report the χ² value and its significance probability for how closely the laboratory groups matched the expected performance on that subtask. Table 8 lists the percentage of individuals that got each subtask correct, together with the expected and observed group performance.
Column 3 of Table 8 shows that 96.4% of the individuals calculated revenues correctly and 92.9% knew to charge the owner’s and manager’s salaries as expenses. Only 46.4% knew how to calculate the capital depreciation expense. Everyone knew to charge the salespeople’s salaries. The income tax was calculated correctly by 71.4%. Only 42.9% of the individuals had no oversight mistakes on their spreadsheets.

Table 8. χ² Expected Group Performance vs. Observed (Percentage Correct)
                                     Individuals  Dyads                    Tetrads
s    Subtask                         ps (NI=28)   Expected    Observed     Expected    Observed
                                                  1–(1–ps)²   (ND=20)      1–(1–ps)⁴   (NT=10)
1    Calculate Revenues              96.4%        99.9%       85.0%        100%        100%
2.1  Owner’s and Manager’s Salary    92.9%        99.5%       100%         100%        100%
2.2  Manufacturing Costs             92.9%        99.5%       90.0%        100%        100%
2.3  Capital Depreciation            46.4%        71.3%       65.0%        91.8%       60.0%
2.4  Sales Salaries                  100.00%
2.5  Rent Expense                    96.4%
3    Income Tax                      71.4%
4    Avoid oversight errors          42.9%
     Correct Bottom Line             17.9%                    25.0%        54.5%
χ² = 8.648, p
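The χ² comparison described in Section 3.3 can be sketched as a goodness-of-fit test on the counts of groups that did and did not perform a subtask correctly, with expected counts taken from the truth-wins model. This is an illustrative sketch: `truth_wins_chi_square` is a hypothetical name, it uses one degree of freedom and no continuity correction, and it reads the text’s rule as requiring both expected counts (correct and incorrect) to be at least five. With one degree of freedom the upper-tail probability equals erfc(√(χ²/2)), so only the standard library is needed. The example uses the capital depreciation subtask from Table 8 (13 of 20 dyads and 6 of 10 tetrads correct).

```python
import math
from typing import Optional


def truth_wins_chi_square(p_s: float, group_size: int, n_groups: int,
                          n_correct: int,
                          min_expected: float = 5.0) -> Optional[tuple[float, float]]:
    """Return (chi2, p_value), or None when an expected count is below min_expected."""
    expected_rate = 1.0 - (1.0 - p_s) ** group_size  # truth-wins expectation
    expected = (n_groups * expected_rate, n_groups * (1.0 - expected_rate))
    if min(expected) < min_expected:
        return None  # expected frequency too small: test not performed
    observed = (n_correct, n_groups - n_correct)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(chi-square > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Capital depreciation (subtask 2.3): p_s = 13/28.
print(truth_wins_chi_square(13 / 28, 2, 20, 13))  # dyads: 13 of 20 correct
print(truth_wins_chi_square(13 / 28, 4, 10, 6))   # None: expected failures below five
```

For tetrads, the expected number of failures on this subtask is below one, so the sketch skips the test, as the text prescribes for small expected frequencies.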