EXPLORING THE USE OF TECHNIQUES FROM GROUNDED THEORY IN PROCESS ENGINEERING By Lucila Carvalho, Louise Scott and Ross Jeffery
Report No. 03/1
Centre for Advanced Software Engineering Research (CAESER)
School of Computer Science and Engineering
University of New South Wales
Sydney 2052 Australia
© 2003 CAESER
Exploring the use of techniques from Grounded Theory in Process Engineering
Lucila Carvalho, Louise Scott and Ross Jeffery
l.carvalho, l.scott, [email protected]
Centre for Advanced Software Engineering Research (CAESER)
School of Computer Science and Engineering
University of New South Wales, 2052, Sydney, Australia
Abstract
This paper describes a study that investigated two process models generated from the same process data by two different researchers. One model was generated by a psychologist inexperienced in process modelling, based on the psychologist’s analysis of the data using grounded theory procedures. The second model was generated by an experienced process engineer using an ad hoc approach that relied heavily on experience and skill. The results show that (1) grounded theory methods can be applied to analyse process data, (2) a person with little experience in process modelling can produce a process model based on a grounded theory analysis of the data, and (3) the resulting process model will not necessarily be equivalent to one produced by an experienced process engineer. The paper describes grounded theory methods, explores the nature of the differences between the two models, and suggests ways in which grounded theory methods may contribute to the process modelling task. It also serves as exploratory research on the application of grounded theory methods in the software engineering research domain.

Keywords: Qualitative Research Methods, Process Engineering, Grounded Theory, Software Engineering
1. Introduction
Researchers in the software engineering domain are constantly examining the potential advantages and disadvantages of innovative approaches in their work. Among other things, they explore the contributions that methodologies and theories from other disciplines may make to the domain. Recently, there has been discussion about how qualitative methodologies, such as grounded theory, might complement empirical investigations in the field [1]. In this paper we explore the use of grounded theory methods in the software engineering domain by analysing their potential contribution to the construction of software process models from elicited process data. In doing so we are not seeking to develop a general theory of software process modelling, but rather to explore the benefits that might be obtained by using these methods in a software engineering problem domain. As such, this work is exploratory.

Descriptive software process modelling is an important part of any software process improvement programme. Descriptive modelling allows process engineers to understand existing processes, communicate processes and analyse existing practices for improvement. Much has been written about process modelling languages and tools [2-4], but little about the elicitation of process models. In general, a process engineer elicits process data from sources such as interviews, documents, surveys, e-mails and observation, and then interprets this data to produce a process model. This approach relies heavily on the experience and skill of the process engineer, which can be very costly, and without a structured method neither quality nor repeatability can be ensured. The types of data involved (e.g. interviews, observations, e-mails) suggest that this field lends itself naturally to the application of qualitative data analysis techniques.
Grounded theory is a qualitative and inductive method that allows theory to emerge in the course of an investigation. Theory is developed as a result of the investigator’s interpretation of meanings, experiences, events and the reality of the phenomena under study. Theory here consists of developing “plausible relationships among concepts and set of concepts” (p. 168) [5], and as such grounded theory may at least provide a methodical way of producing a process model from process data. The theory generated by a grounded theory study is always traceable to the data from which it was developed, “within the interactive context of data collecting and data analysing, in which the analyst is also a crucially significant interactant” (p.169) [5].

Within this context, the goal of this paper is to explore the use of grounded theory methods in process modelling. In particular, the study set out to answer the following research questions:
Q1. Could grounded theory methods be used to analyse elicited process data?
Q2. Could a psychologist, inexperienced in software engineering, produce a process model from process data using grounded theory methods?
Q3. Would a model produced by a psychologist inexperienced in software engineering, using grounded theory methods, be equivalent to a model produced by an experienced process engineer using an experiential approach?
Q4. How could grounded theory contribute to process modelling?
To answer these questions, two software development process models were produced independently from the same textual data. A psychologist experienced in qualitative research methods but with limited knowledge of software engineering created the first model, based on an analysis of the data using grounded theory methods. The second model was produced by an experienced software engineer using an ad hoc approach.
The results showed that it was possible to analyse the process data using grounded theory methods. Moreover, the psychologist was able to derive a process model from the analysed data. When the models were compared, however, the psychologist’s model was very different from the model produced by the experienced process engineer. The models differed significantly both in the types of entities they described and in the details of those entities, and very little correspondence was found in the relationships between entities. During discussions, concrete reasons for some of the differences were found, including effects of the grounded theory methods, the differing backgrounds and experience of the researchers, human error and misunderstanding. The analysis suggests, however, that grounded theory methods could contribute to process modelling by encouraging systematic decomposition of process concepts and ensuring coverage of the data. It also suggests that grounded theory methods alone cannot replace software engineering experience in the interpretation of the data.

The next section of this paper presents the background for the study, including an overview of current practices in process modelling, some considerations on the nature of the social sciences and an overview of the grounded theory methodology. Section 3 describes the research design for the study, including participants, procedures, approaches and analysis. Section 4 presents the results, which are discussed in Section 5. Section 6 discusses the limitations of the study and Section 7 concludes the paper.
2. Background
This section discusses the background relevant to this study, including the problems faced in process modelling and how grounded theory methods may be applied to address them. It also presents some considerations on the nature of the social sciences and describes the grounded theory approach to qualitative data analysis.

2.1. Process modelling
Software Process Improvement (SPI) is a recognised way for companies to ensure they are able to produce high quality software competitively. Descriptive process modelling is a central activity of SPI: it allows process engineers to understand existing processes, communicate processes and analyse existing practices for improvement. For this reason, much work has been done on proposing languages, techniques and tools for descriptive process modelling (see, for example, [2, 4, 6]). The aim of descriptive process modelling is to produce models of high quality in terms of accuracy, completeness and fitness for purpose, in an efficient and repeatable manner.

According to the ELICIT method [6], descriptive process modelling consists of four core activities, ranging from understanding the environment to validating the process models developed. The development of the process models includes the sub-steps “elicit process info”, “translate” and “review models”. Information may be elicited by observation (e.g. video taping), interview, survey or document analysis, for example. It is the translation step that interests us in this paper: the activity that takes the elicited process information and creates a process model. To translate the process information, the ELICIT approach recommends analysing verb-noun relationships in the data (for example, documents or interview transcripts) to identify the components of the process models, where nouns represent potential objects or entities, and verbs represent potential operations on the objects, or actions. However, this approach can be problematic because sentences can be imprecise or incomplete, facts can be expressed implicitly, and verbs and nouns can be overloaded, so the approach still requires a great deal of interpretation by the process engineer. Moreover, it cannot guarantee the quality of the resulting model, is generally not repeatable and requires a large amount of effort from the process engineer.

While process elicitation, process languages and process tools have received much attention in process modelling research, there is still no systematic method for translating process data into a descriptive process model. Without a systematic method, the quality of the resulting model cannot be ensured, repeatability is low (i.e. different engineers are unlikely to produce the same model) and the translation may be inefficient. Because it is so poorly understood and often involves interpreting and translating large amounts of unstructured textual data (e.g. interviews, surveys, documents and e-mails), this aspect of process engineering is an ideal candidate for the use of a qualitative research method such as grounded theory.

2.2. Nature of Social Sciences
In this paper we discuss the comparison of two process models, each constructed independently using a specific approach and method. When comparing these models it is important to consider some of the underlying differences the two researchers might bring to their work a priori, because these differences lie not only in the method employed but also in the theoretical and philosophical elements behind any individual’s choice of a particular method and methodology. When exploring the use of a particular method, it is very important to be aware of its philosophical foundations.

Whenever a social scientist approaches a subject, there are always explicit or implicit assumptions about the nature of the social world and the way the phenomenon may be investigated [7]. As described by Burrell and Morgan [7], these assumptions may be ontological, epistemological, about human nature, or methodological. Ontological assumptions are concerned with the essence of the phenomena, that is, with whether the ‘reality’ under investigation is external to the individual or a product of his or her mind. Epistemological assumptions deal with the basis of knowledge: how it is possible to identify and communicate what we know. Assumptions about human nature are concerned with the relationship between human beings and their environment. These three sets of assumptions have important implications for the methodological assumptions, and so affect the way we investigate and obtain knowledge about the social world. As a result, different ontologies, epistemologies and models of human nature can lead social scientists towards different methodologies. The range of choice is so vast that “what is regarded as science by the traditional ‘natural scientist’ covers but a small range of options” [7].

Crotty [8] provided an interesting framework for guiding the social research process and clarifying how methods and methodologies relate to more theoretical elements. This framework considers four elements in the research process: methods, methodology, theoretical perspective and epistemology. Methods are the specific techniques and procedures used in data gathering and data analysis.
Methodology comprises the research strategy and plan of action that lie behind the choice of methods. Theoretical perspective refers to the philosophical stance lying behind the chosen methodology. This means that when a social scientist chooses a particular methodology, the choice is connected with the assumptions the scientist holds about reality; by asking about these assumptions, we ask about the scientist’s theoretical perspective. Epistemology, as mentioned before, deals with how we understand and explain how we know what we know.

Traditionally, most empirical studies in software engineering appear to have objectivism as their underpinning epistemology. Objectivism considers truth and meaning to be independent of individual consciousness and experience. Scientists taking this view assume that there is an objective truth, a meaningful reality that is detached from individuals, a reality which is “out there” regardless of human awareness, waiting to be discovered. Objectivism lies at the foundation of the positivist approach. At the other end of the spectrum is the constructionist view, which holds (as the name suggests) that all knowledge, and therefore all meaningful reality, is constructed. Human beings construct meaning as they interact with the world they are interpreting. Rather than searching for an objective truth, constructionists consider that there is no meaning without a mind, and different people may construct meaning in different ways even when examining the same phenomenon. Constructionism is the epistemology embedded in symbolic interactionism. This theoretical perspective emphasises the symbolic nature of human interaction and of linguistic and gestural communication. For symbolic interactionists, social reality and human behaviour are conceptualised as symbolic, communicated and subjective. Symbolic interactionism is regarded as the theoretical perspective underlying the grounded theory methodology [8].

2.3. Grounded theory
Grounded theory was first described by Glaser and Strauss in 1967 [9] as a method for the study of complex social behaviour from a sociological point of view.
Since then, grounded theory has been used as a research strategy in many studies within different disciplinary contexts [10]. In information systems, examples of studies using grounded theory include [11], where the author applies its techniques to investigate the interactional tactics used by analysts and clients in requirements gathering, and [12], where the author explores organizational transformation by examining subtle shifts in organizational actors’ actions and the implications of such changes for the organization.

The basis of grounded theory is that theory is developed inductively from the data, and is therefore generated (or grounded) in a process of continual sampling and analysis of data [10, 13]. The dynamic relationship between data analysis and data collection is a significant characteristic of the grounded theory approach. The theory is shaped by two fundamental analytical commitments: the method of constant comparison and theoretical sampling. The method of constant comparison specifies that the investigator continually examines and compares elements (such as data instances, emerging categories and theoretical propositions) throughout the whole research project. Theoretical sampling refers to the sampling of new data as the analysis proceeds. This means that the researcher does not need to wait until all the data is collected to start the analysis: analysis may start as soon as sufficient material is available, and this analysis drives the sampling of additional data. New data is often selected for its potential to generate new theory by extending or deepening the investigator’s evolving understanding of the phenomena under study.

Most qualitative studies gather their data through observations, questionnaires and interviews. Nevertheless, the investigation of archival material has also been reported in qualitative studies, with data sources such as documents, newspapers or books [10].
Strauss and Corbin [10] considered a “cache of archival material” to be equivalent to a collection of interviews and field notes. The authors explain that when archival material is used, the grounded theory procedures (the sample and the interplay of coding and sampling) follow exactly the same techniques used with interview or observational data. They stress, however, that documentary data should not be located in just one single place: as in any qualitative study, grounded theory values triangulation (i.e. gathering data from more than one source of evidence).

The process of analysis in grounded theory begins with “coding” the data. As pointed out by Seaman [1], coding in this context is the process of generating labels (or codes) to describe concepts and relevant features of certain passages of the data. The researcher searches the data for similarities and differences, collecting a number of indicators that may point to multiple qualitative aspects of a potentially significant concept, and assigns labels to passages that seem relevant to an idea of interest in the study. These labelled passages are then searched for patterns and grouped together, and the resulting groups (or categories) are examined in search of meanings, themes and explanations of the phenomena.

The process of coding is laborious. It starts with the investigator reading a section of the data once. The researcher then goes back, reading it again and assigning labels to pieces of the text, and then reads through it yet again, checking the codes for consistency and making sure no relevant information has been missed. The pieces of text that receive a label may vary in size, and the same piece of text may be coded under different labels. Also, not everything in the data necessarily needs to be assigned a code. The researcher may start with a set of pre-formed codes, developed from the goals of the study, the research questions and pre-established variables of interest [1]. On the other hand, codes may be post-formed if the study objectives are very open and unfocused. In both cases, the researcher can always add new codes as the study progresses, as well as delete, modify, merge or subdivide them. The resulting set of codes often has a structure of codes and sub-codes.
To illustrate this idea, in this study the researcher found that a number of passages in the e-mails were related to meetings. These passages were labelled under the code “meetings”. However, on re-reading the passages under “meetings”, the researcher identified a possible further subdivision, so the data was re-arranged and sub-groups were formed. Passages about organizing the meeting itself (for example, confirmation of presence or suggestion of dates or times) formed one sub-group, “organizing the meeting”. Passages related to the content of the meeting (for example, listing priorities for discussion) formed a separate sub-group, “meeting content”. Table 1 shows the structure of the category “meetings” with its sub-groups:

Meetings
    Organizing the meeting
        Request of confirmation
        Confirmation of presence
        Suggestion of dates/time
        Confirmation of time
    Meeting content
        Priorities for discussion
Table 1: Example of codes and sub-codes
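The hierarchy in Table 1 is essentially a tree of codes. As a minimal sketch (assuming Python 3.9+; the `Code` class and its methods are our own hypothetical illustration, not part of NVivo or of the study’s procedure), each code can be held as a node that may be subdivided, with a field for the passages coded under it, and the whole structure can be listed as paths:

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    """A node in a coding hierarchy: a label, its coded passages and its sub-codes."""
    label: str
    passages: list[str] = field(default_factory=list)  # text coded under this label
    children: list["Code"] = field(default_factory=list)

    def add(self, label: str) -> "Code":
        """Create a sub-code under this code and return it."""
        child = Code(label)
        self.children.append(child)
        return child

    def paths(self, prefix: tuple[str, ...] = ()) -> list[str]:
        """List every code as a 'parent / child' path, depth first."""
        here = prefix + (self.label,)
        result = [" / ".join(here)]
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Rebuild the structure shown in Table 1.
meetings = Code("Meetings")
organizing = meetings.add("Organizing the meeting")
for label in ("Request of confirmation", "Confirmation of presence",
              "Suggestion of dates/time", "Confirmation of time"):
    organizing.add(label)
meetings.add("Meeting content").add("Priorities for discussion")

for path in meetings.paths():
    print(path)
```

Keeping sub-codes as child nodes means that subdividing a code, as the researcher did with “meetings”, is a local change that leaves the rest of the hierarchy untouched.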
Strauss and Corbin [10] defined what is called the “paradigm model” in grounded theory. In this model, subcategories are linked to a category in a set of relationships representing causal conditions, phenomenon, context, intervening conditions, action/interaction strategies and consequences. As the authors point out, the paradigm model (Figure 1) enables the researcher to think systematically about the data.

Figure 1: The paradigm model (causal conditions, phenomenon, context, intervening conditions, action/interaction strategies, consequences)
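Figure 1 names six components: a central phenomenon and five kinds of relationship that link subcategories to it. One way to record these links during analysis is sketched below. This is a hypothetical Python representation (the `ParadigmModel` class, its field names and the example entries are our own illustration, not data from the study or a format prescribed by Strauss and Corbin); it files each subcategory under one relationship type and flags the types that are still empty, which can prompt further analysis.

```python
from dataclasses import dataclass, field

# The five relationship types that link subcategories to the phenomenon,
# which is the sixth component of the paradigm model.
ROLES = ("causal_conditions", "context", "intervening_conditions",
         "action_interaction_strategies", "consequences")


@dataclass
class ParadigmModel:
    """Links subcategories to a central phenomenon by paradigm-model role."""
    phenomenon: str
    links: dict[str, list[str]] = field(
        default_factory=lambda: {role: [] for role in ROLES})

    def link(self, role: str, subcategory: str) -> None:
        """File a subcategory under one paradigm-model role."""
        if role not in self.links:
            raise ValueError(f"unknown paradigm-model role: {role}")
        self.links[role].append(subcategory)

    def missing_roles(self) -> list[str]:
        """Roles with no subcategory yet, in the order of ROLES."""
        return [role for role, subs in self.links.items() if not subs]


# Hypothetical example, not taken from the study's data.
model = ParadigmModel("Meetings")
model.link("causal_conditions", "Need to set priorities")
model.link("action_interaction_strategies", "Organizing the meeting")
print(model.missing_roles())
```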
In the paradigm model, the incidents or events that lead to the occurrence of the phenomenon are referred to as causal conditions. The phenomenon represents the central idea or event. Context refers to the specific properties of a phenomenon and also to a series of particular conditions that affect the action/interaction strategies. Intervening conditions can facilitate or constrain the action/interaction strategies in a particular context. Action/interaction strategies refer to the ways in which the phenomenon is managed, handled, carried out and responded to, in a certain context and under specific conditions. Consequences are the outcomes.

Researchers using grounded theory also employ other strategies [14]. Memos are written during coding, as the researcher records observations while the analysis proceeds, so that any “preliminary hypothesis” the researcher formulates is not lost. In this process additional questions may also emerge, prompting the researcher to re-examine the data or to seek new data to elucidate aspects of the emerging theory (this is the theoretical sampling mentioned above). Theoretical saturation occurs when new categories are no longer found in the process of coding. Once theoretical saturation is reached, additional data collection becomes unproductive, and the researcher then begins to search for relationships between the categories. The categories that have achieved saturation are defined, and the researcher then tries to integrate them by establishing relationships between them. As the final result of this whole process, the researcher formulates a proposition that insightfully describes the phenomenon under study.

Strauss and Corbin [10] present a list of seven criteria to be used when evaluating the research process in studies using grounded theory. The authors emphasise that these criteria are guidelines rather than evaluative rules, since new areas of investigation may require procedures to be adapted to the specific circumstances of the research. This set of criteria was used to evaluate the grounded theory methods employed in Approach A, and the results are summarised in Section 3.6 of this paper. Table 2 summarises the questions to be asked when examining a grounded theory study:
Criterion 1: Are the concepts generated?
Criterion 2: Are the concepts systematically related?
Criterion 3: Are there many conceptual linkages, and are the categories well developed? Do they have conceptual density?
Criterion 4: Is much variation built into the theory?
Criterion 5: Are the broader conditions that affect the phenomenon under study built into its explanation?
Criterion 6: Has process been taken into account?
Criterion 7: Do the theoretical findings seem significant, and to what extent?
Table 2: Set of criteria to evaluate the empirical grounding of the study.
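The interplay of coding and theoretical sampling described earlier in this section, with theoretical saturation as the point at which further data collection stops being productive, can be sketched as a loop. This is a deliberately simplified illustration under stated assumptions: coding is reduced to a caller-supplied function that maps a passage to category labels (in practice this is the researcher’s interpretive judgement, which no program replaces), comparison with earlier categories is reduced to a set difference, and saturation is approximated as `patience` consecutive batches that yield no new category. The names `code_until_saturated`, `keyword_coder` and `patience` are hypothetical.

```python
from collections.abc import Callable, Iterable


def code_until_saturated(
    batches: Iterable[list[str]],
    code_passage: Callable[[str], set[str]],
    patience: int = 2,
) -> tuple[set[str], int]:
    """Code batches of passages until `patience` consecutive batches add no new category.

    Returns the categories found and the number of batches examined.
    """
    categories: set[str] = set()
    quiet_batches = 0  # consecutive batches that produced no new category
    examined = 0
    for batch in batches:
        examined += 1
        new: set[str] = set()
        for passage in batch:
            # Crude stand-in for constant comparison: keep only labels
            # not already among the categories found so far.
            new |= code_passage(passage) - categories
        categories |= new
        quiet_batches = 0 if new else quiet_batches + 1
        if quiet_batches >= patience:
            break  # approximated saturation: stop sampling further data
    return categories, examined


def keyword_coder(passage: str) -> set[str]:
    """Hypothetical stand-in for the researcher's coding judgement."""
    labels: set[str] = set()
    if "meeting" in passage:
        labels.add("meetings")
    if "test" in passage:
        labels.add("testing")
    return labels


batches = [["meeting on Friday"], ["test results attached"],
           ["meeting moved"], ["meeting agenda"], ["more test notes"]]
found, examined_batches = code_until_saturated(batches, keyword_coder)
print(sorted(found), examined_batches)
```

With these toy batches the loop stops after the fourth batch, because the third and fourth add nothing new, so the fifth is never sampled. This mirrors the idea that data collection ends once it no longer extends the categories, though the real judgement of saturation is qualitative.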
Disadvantages of using grounded theory include the fact that the analysis process is not easy for a beginning researcher and is still considered very subjective, relying a great deal on the researcher’s abilities. As pointed out by Seaman [1], the literature still lacks specific guidance for the intellectual process of finding patterns in the data. The analysis of qualitative data is also often tedious and extremely time-consuming. Nevertheless, the benefits of qualitative inductive methods such as grounded theory include the ability to derive theory from within the context of the data collected. These methods force the researcher to explore the complexity of the problem, and often produce a richer and more informative outcome. In grounded theory, any propositions formulated by the researcher must be clearly and strongly supported by the data, which allows other researchers to follow the paths taken in the study.
The rationale for investigating the use of grounded theory methods in process modelling rests on two points: (1) the methodology provides a systematic set of procedures for analysing qualitative data, and (2) process modelling still lacks an effective methodology for generating a descriptive software process model.
3. Research Design
Given the potential of grounded theory methods to help with some of the problems present in process engineering, the study was designed with the following goal: to investigate the use of grounded theory methods in the development of a descriptive software process model, to determine whether the methods can contribute to improving current practices in software process modelling. We formulated three hypotheses to be tested in the study:
H1. The grounded theory methods can be used to analyse descriptive process data.
H2. An inexperienced software engineer using grounded theory methods can produce a process model from descriptive process data.
H3. The model resulting from the analysis using the grounded theory methods will be equivalent to a model produced by an experienced process modeller.
The details of the research design, including the participants, data, procedures, approaches and analysis used, are described below.

3.1. Participants
The participants in this study were:
Researcher A, a psychologist with a limited background in software engineering, but with knowledge of qualitative research methodologies and experience in the use of grounded theory;
Researcher B, a software engineer with a solid background in software engineering and three years’ experience in process modelling.

3.2. Data
Three sources of evidence were used in this research: a series of e-mails exchanged by project participants, documented reports, and agendas and minutes from project meetings. All the data referred to a single project being developed by a team of professionals from an organisation. The data was obtained with the consent of a small software development company based in Sydney.

To enable the comparison of the models and ensure that each model would be derived from the same data, neither researcher sought feedback from the process participants on their models. Although we are aware that practices in qualitative research and process modelling emphasise the importance of obtaining such feedback, we considered that both models would change significantly in response to it. Seeking feedback would have added two more uncontrollable variables to the experiment (the modeller’s skill in obtaining and interpreting the feedback, and the feedback itself), making comparison of the models even more difficult. We acknowledge that in a non-experimental context both approaches would certainly have sought continuous feedback on the emerging models.

3.3. Procedures
To test H1, the psychologist was asked to use grounded theory methods to analyse the data. The researcher’s previous experience with grounded theory meant that difficulties arising from inexperience with the method itself would not be present. To test H2, the psychologist was asked to produce a process model (Model A) from the data analysed using grounded theory methods. The psychologist was given some basic training in process modelling concepts so that the process model could be captured in a suitable language. We ensured that the researcher selected to develop process models using grounded theory methods had limited knowledge of software engineering. To test H3, an experienced software engineer was also asked to produce a process model (Model B) using the same project data that was given to the psychologist. The experienced engineer recorded the process model in the same language as the psychologist. The resulting models were compared to see whether they were equivalent (see Section 3.4 for details of equivalence).
In order to control external (inter-personal) influences on the models, Researchers A and B did not exchange information or discuss the study during the model building phase. Both researchers recorded the time spent on the study by filing time sheets for their respective tasks. Task profiles included data organisation, data analysis, model building and model refinement.

3.4. Analysis
To determine equivalence, the aspects of the models considered included semantic content (meaning), structure and decomposition hierarchies. The comparison was done by mapping process concepts across the models. A process concept is defined as a singular piece of the process, which may be represented by a single entity or a collection of entities. The researchers’ task was to jointly analyse the models, searching for equivalent process concepts. Equivalence occurred when a concept was identified in both models.

3.5. Spearmint
Both approaches used the Spearmint process modelling tool to capture the process models created [15]. Spearmint is a graphical modelling tool supporting the Spearmint process modelling language. The concepts in the language are activity, artifact, role and tool, the product-flow relationships between activities and artifacts, and the decomposition of activities into sub-models. Spearmint is designed to support the descriptive modelling of large, complex, real-world software processes, and is therefore suited to defining the kinds of models involved in this study, although it does not provide any explicit support for process elicitation or model creation. Expressing both models in the same language simplified their comparison.

Researcher A used Spearmint and its process modelling language on a small process modelling project before this study to gain familiarity with the tool and the language. However, although the tool and concepts were not completely new to her, she would by no means be considered an expert in their use. Researcher B had been actively involved in the definition and use of process modelling languages and tools (including Spearmint) for about three years before the start of this project, and had an intimate knowledge of the features and process modelling language of Spearmint.

3.6. Approaches: Grounded theory and ad hoc
In this study we compared the models resulting from two approaches: grounded theory methods and an ad hoc experiential approach. Approach A was carried out by the psychologist, and Approach B by the software engineer.
Researcher                         Approach                  Data                Tools
Researcher A (psychologist)        Grounded theory + NVivo   e-mails + minutes   Spearmint
Researcher B (software engineer)   Ad hoc / experiential     e-mails + minutes   Spearmint
Table 3: Summary of approach, data and tools
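Both approaches in Table 3 end in a Spearmint model, and Section 3.4 counts a process concept as equivalent when it is identified in both models. Assuming that the researchers’ joint analysis has already matched each concept to a shared name (that matching is the interpretive work and is not automated here), and simplifying each concept to a single typed entity rather than a collection of entities, the presence check reduces to set operations. The `Kind`, `Concept` and `compare_models` names and the example entries below are hypothetical, not the study’s actual models; the element kinds follow the Spearmint concepts listed in Section 3.5. The sketch does not capture the structural and decomposition aspects of equivalence.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Element kinds of the Spearmint language as listed in Section 3.5."""
    ACTIVITY = "activity"
    ARTIFACT = "artifact"
    ROLE = "role"
    TOOL = "tool"


@dataclass(frozen=True)  # frozen makes instances hashable, so they can go in sets
class Concept:
    """A process concept after the researchers have agreed on a shared name."""
    kind: Kind
    name: str


def compare_models(model_a: set[Concept],
                   model_b: set[Concept]) -> dict[str, set[Concept]]:
    """Split concepts into those identified in both models and those in only one."""
    return {
        "equivalent": model_a & model_b,
        "only_a": model_a - model_b,
        "only_b": model_b - model_a,
    }


# Hypothetical concepts, for illustration only.
model_a = {Concept(Kind.ACTIVITY, "meetings"), Concept(Kind.ARTIFACT, "minutes")}
model_b = {Concept(Kind.ACTIVITY, "meetings"), Concept(Kind.ROLE, "project manager")}
result = compare_models(model_a, model_b)
for key, concepts in result.items():
    print(key, sorted(c.name for c in concepts))
```

Including the element kind in each concept means that, for example, a “meetings” activity in one model and a “meetings” artifact in the other are not counted as equivalent.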
Approach A: Grounded theory methods
Approach A comprised two major stages:
I. Grounded theory methods were applied in the analysis of the textual data.
II. The categories identified by the analysis were used to build the process model.

I. Grounded theory analysis
In this approach the textual data was analysed using the procedures of the grounded theory methodology, with the support of NVivo, a software application designed to support qualitative research [16]. The data analysis considered the e-mails, project documents, and project agendas and minutes. The researcher assigned labels to parts of the data describing an idea or theme expressed in specific pieces of the text. The data classified under each label was then searched for commonalities and differences, and grouped together when patterns were found. These groups were then examined for explanations that could contribute to the understanding of the development process, and relationships between the groups were established following the paradigm model (for an overview of some of the categories created, see the Appendix).

The grounded theory procedures were primarily evaluated against the set of criteria described in Table 2:
(1) the concepts identified were grounded in the data, and more than one source of evidence was used;
(2) the linkages between the concepts were established systematically;
(3) categories were linked to their subcategories, and links were also established between several categories using the paradigm model described in the background section;
(4) a number of conditions and action/interaction strategies were identified;
(5) cultural specificities (such as the use of informal means of communication, informal relationships, and the pressure to produce quality software quickly and competitively) were understood as having an effect, although they were not always explicitly represented in the model itself;
(6) the process was the centre of the conceptualisation of the study;
(7) the significance of the findings is explored in this paper by comparing the model produced by the psychologist (based on the grounded theory analysis) with the model generated by a software engineering expert.

II. Model building
The codes and sub-codes identified were then used to generate the activities and artifacts in the construction of the process model. In the “meetings” example (summarised in Table 1), the passages coded under “meetings” became the activity “meetings”.
This activity was decomposed into the sub-activities “organizing the meeting”, “organizing meeting content” and “send feedback on time/dates”. The artifacts related to these activities were: (a) “list of priorities for discussion”, an output of the activity “organizing meeting content”; (b) “request of confirmation” and “suggestion of dates”, related to the activity “organizing the meeting”; and (c) “confirmation of time”, the output of the activity “send feedback on time”, which incorporates both of the sub-codes “confirmation of presence” and “confirmation of time” from Table 1. Figure 2 below illustrates these artifacts and activities:
Figure 2: Example of how labels become activities and artefacts.
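The mapping from codes to model elements described above can be pictured as a small tree of activities with their associated artifacts. The Python sketch below is illustrative only: the `Activity` and `Artifact` classes and the `count_entities` helper are hypothetical names of ours, not part of NVivo or Spearmint, and the encoding simply assumes that a code becomes an activity and its sub-codes become sub-activities or artifacts, as in the “meetings” example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str


@dataclass
class Activity:
    name: str
    sub_activities: list[Activity] = field(default_factory=list)
    # Artifacts linked to this activity (outputs or otherwise related).
    artifacts: list[Artifact] = field(default_factory=list)


def count_entities(root: Activity) -> tuple[int, int]:
    """Count (activities, artifacts) in the decomposition rooted at `root`."""
    activities, artifacts = 1, len(root.artifacts)
    for sub in root.sub_activities:
        a, r = count_entities(sub)
        activities += a
        artifacts += r
    return activities, artifacts


# Hypothetical encoding of the "meetings" example: the code becomes an
# activity, and its sub-codes become sub-activities with related artifacts.
meetings = Activity(
    "meetings",
    sub_activities=[
        Activity("organizing the meeting",
                 artifacts=[Artifact("request of confirmation"),
                            Artifact("suggestion of dates")]),
        Activity("organizing meeting content",
                 artifacts=[Artifact("list of priorities for discussion")]),
        # "confirmation of time" merges the sub-codes "confirmation of
        # presence" and "confirmation of time" from Table 1.
        Activity("send feedback on time/dates",
                 artifacts=[Artifact("confirmation of time")]),
    ],
)

print(count_entities(meetings))  # (4, 4)
```

A recursive structure like this mirrors the layered decomposition that the grounded theory coding produced.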
The structure of the codes and sub-codes identified by the grounded theory analysis was followed, and sometimes adapted, in the construction of the process model. The result was a process model with several levels of decomposition. Roles were assigned according to who sent or received the e-mails and to any mention in the textual data of the person responsible for a specific activity. Although the language supports tools (or resources), the researcher did not identify any.
Approach B – Ad Hoc
Researcher B used an ad hoc approach to create the process model, relying on experience to interpret the e-mails and transform them into a process model.
Figure 3 below presents a graphic representation of the study design. It shows that grounded theory methods (Approach A) were applied in the analysis of the process data, from which Model A was developed. Ad hoc methods (Approach B) were applied to the same process data to generate Model B. Comparing the two models produced the summary of differences that we discuss and explore in this paper.
Figure 3: Study design
4. Results
In this section we present the results of the comparison of Model A and Model B. The main objective of grounded theory is the identification of the key elements influencing the phenomenon and the categorization of the relationships between those elements. However, Researcher A reported difficulties in conceptualising certain passages of the process data, more specifically passages containing technical terms. Because the method relies heavily on interpretation, difficulties in deciphering the data would inevitably affect the analysis of these passages. Researcher A's strategy for overcoming this problem was to acknowledge that these were technical concepts and to represent them with little abstraction. This means that although categories were identified in the data, some of them were possibly poorly developed. Despite these difficulties, Researcher A considered that it was possible to apply the methods of grounded theory to the data, satisfying hypothesis 1, and she was also able to produce a process model, satisfying hypothesis 2. Comparison of the models, however, showed that the models produced by the two approaches could not be considered equivalent, refuting hypothesis 3. Table 4 below summarises the models produced by the researchers in terms of effort, size and structure:
Researcher | Effort | Size (entities) | Structure
A | 24 hrs 10 m | 106 | 5 levels, including Project Documents, Requirements, Development, Management, Create, Meetings, Work on development tasks, Dissemination of information, Content generator, Screen tool, Queries and Details
B | 12 hrs 58 m | 136 | 3 levels, including Project Documentation, Requirements, Technical development, Meetings, Management and Details
Table 4: Summary of models produced (time, size and structure)
The data shows that it took almost twice as long to produce Model A, which was nevertheless significantly smaller (in terms of number of entities) than Model B. The models also differ in decomposition structure: they agree on the first two levels of abstraction, but Model A has more levels (five, compared with three in Model B).
Table 5 shows the relative sizes of the models in terms of the number of activities, artifacts, roles and tools modelled. Model B has significantly more activities than Model A (80 compared with 50), and Model A contains no tools, even though the concept of a tool is available in both the language and the modelling tool.
Model | Total entities | Activities | Artifacts | Roles | Tools
A | 106 | 50 | 51 | 5 | 0
B | 136 | 80 | 43 | 8 | 5
Table 5: Comparison of size in terms of numbers of activities, artifacts, roles and tools.
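The size and effort comparisons in Tables 4 and 5 reduce to simple arithmetic. The Python sketch below (the variable names are ours) checks that each reported total equals the sum of its entity types and computes the Model A/Model B ratios for effort and size.

```python
# Figures transcribed from Tables 4 and 5 of this report.
effort_minutes = {"A": 24 * 60 + 10, "B": 12 * 60 + 58}
counts = {
    "A": {"activities": 50, "artifacts": 51, "roles": 5, "tools": 0, "total": 106},
    "B": {"activities": 80, "artifacts": 43, "roles": 8, "tools": 5, "total": 136},
}

for model, c in counts.items():
    # Each reported total should equal the sum of the four entity types.
    parts = c["activities"] + c["artifacts"] + c["roles"] + c["tools"]
    assert parts == c["total"], (model, parts, c["total"])

effort_ratio = effort_minutes["A"] / effort_minutes["B"]
size_ratio = counts["A"]["total"] / counts["B"]["total"]
print(f"effort A/B = {effort_ratio:.2f}, size A/B = {size_ratio:.2f}")
# effort A/B = 1.86, size A/B = 0.78
```

An effort ratio of about 1.86 corresponds to the “almost twice as long” noted above, while Model A has roughly 78% as many entities as Model B.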
The models differ considerably in their detail, making analysis difficult. Examples of the representation of the requirements process in each model are shown in Figure 4. We can see that although both sub-models claim to represent the requirements process they do so very differently. It was therefore painstaking to identify equivalence in the models.
Model A – “Work on Requirements”
Model B – “Requirements”
Figure 4: Model A and Model B for the Requirements process
One of the complicating factors in the analysis was the vast difference between the relationships identified among entities. For example, there is little correspondence between the relationships of the sub-models shown in Figure 4: only two of the relationships in the grounded theory sub-model have corresponding relationships in the ad hoc sub-model (shown with bold lines). Analysing the entities shows only slightly more equivalence. Figure 4 shows that the correspondence between some entities is quite close; for example, the “client” who “identifies the requirements” to produce a “list of requirements” in Model A corresponds to the “customer” who “provides requirements” to produce “customer requirements” in Model B (the groups shown in grey in Figure 4). Examples of this kind of correspondence are, however, rare in the models (only 4 entities in these two sub-models). Many more entities in one model, however, correspond to entities in other parts of the other model. For example, “create task” and “list of tasks” in Researcher A’s “requirements” sub-model correspond to “identify tasks” and “task plan” in the “project management” sub-model of Model B. Similarly, the activities associated with meetings and writing minutes in Researcher A’s sub-model shown in Figure 4 were included in the “meetings” sub-model of Model B. Entities that fall into this category are circled in the two sub-models. Another kind of agreement exists where an entity has been modelled as an activity in one model but as an artifact in the other. For example, Model B contains an activity “request new task” where Model A has an artifact “request for a task”. A final kind of agreement exists when one entity in one model maps to a collection of entities in the other model.
This often occurs where one modeller has represented a process concept as an activity/artifact pair (for example, “suggest time for meeting”/“suggested time for meeting”) and the other modeller has simply included it as a single activity or artifact (for example, “suggestion of dates”).
To characterise these kinds of agreement we defined a process concept. As mentioned before, a process concept is a singular piece of the process, independent of decomposition structure or surrounding relationships, and may be represented by a single entity or a collection of entities. Agreement on a process concept occurs when the concept is represented in both models, regardless of where it is represented or whether it is represented as one entity or a group of entities. Using this analysis, we were able to identify some agreement amongst the models' entities and therefore begin to analyse the reasons for the differences between the models. Entities agreeing with process concepts somewhere in the other model are circled in Figure 4. Analysis showed that the models agreed on 64 independent process concepts. Model A contained a further 43 concepts not present in Model B, and Model B a further 72 concepts not present in Model A. The total number of independent concepts in the models was therefore 179 (64 agreed concepts, 43 more from Model A and 72 more from Model B), giving an agreement of only about 36% (64 agreed concepts/179 total concepts). An interesting observation is that the models agreed much more on concepts to do with “meetings” than on any other topic, perhaps because the source of the data was partly meeting agendas and minutes, making meeting concepts more explicit in the data. A breakdown of the major causes of non-equivalence between the models revealed the following patterns:
• Model A has more layers of decomposition.
• Model A failed to identify 58 concepts identified by the experienced process engineer.
• Model A contained 20 entities represented as instances rather than abstractions.
• Model B was more detailed.
• Model B failed to identify 10 concepts identified in Model A.
• Model A included some behavioural modelling.
• Model A took longer to produce.
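The counting step of the concept comparison can be sketched in Python, assuming the difficult part, deciding which entities in each model express the same process concept, has already been done by the analysts (as it was, jointly, in this study). Each model is reduced to a hypothetical mapping from entity names to concept labels of our own choosing; entity type, position in the hierarchy and the number of entities per concept are deliberately ignored, mirroring the definition of a process concept given above.

```python
from __future__ import annotations


def concepts(entity_to_concept: dict[str, str]) -> set[str]:
    """Distinct process concepts represented by a model's entities."""
    return set(entity_to_concept.values())


def concept_agreement(model_a: dict[str, str], model_b: dict[str, str]):
    """Return (shared, only_a, only_b, agreement) over process concepts.

    Entity type (activity or artifact), position in the hierarchy and the
    number of entities per concept are ignored; only concept identity counts.
    """
    ca, cb = concepts(model_a), concepts(model_b)
    shared = ca & cb
    return len(shared), len(ca - cb), len(cb - ca), len(shared) / len(ca | cb)


def agreement_from_counts(shared: int, only_a: int, only_b: int):
    """Agreement when only the three concept counts are known."""
    total = shared + only_a + only_b
    return total, shared / total


# Hypothetical entity-to-concept mappings; the concept labels are ours.
model_a = {
    "request for a task": "request task",           # artifact in A
    "suggestion of dates": "suggest meeting time",  # single artifact in A
    "create task": "create task",
    "organizing meeting content": "prepare agenda",
}
model_b = {
    "request new task": "request task",                    # activity in B
    "suggest time for meeting": "suggest meeting time",    # activity/artifact
    "suggested time for meeting": "suggest meeting time",  # pair in B
    "identify tasks": "create task",
    "record timesheet": "record effort",
}

print(concept_agreement(model_a, model_b))  # (3, 1, 1, 0.6)

total, ratio = agreement_from_counts(64, 43, 72)
print(f"{total} concepts, agreement {ratio:.1%}")  # 179 concepts, agreement 35.8%
```

In the toy mappings, both an activity/artifact mismatch and an activity/artifact pair count as agreement. Applied to the counts reported above, the same formula gives 179 concepts and an agreement of about 36%.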
The next section will discuss some possible reasons for these differences.
5. Discussion
Model A and Model B were produced independently, based on the individual analyses of the two researchers, and the models were consequently compared as independently produced artifacts. The analysis shows that the models differ vastly in their interpretation of the data, and it identifies some patterns in the differences. This raises the question “Why do they differ?” We understand that grounded theory methods point to a highly interactive context, in which the analyst has a crucial influence on model building. So, is it a matter of different individuals looking at different aspects of the phenomenon, or can any of the differences be attributed to particular aspects of the different approaches used? Such an analysis can help us to identify where it may be possible to apply grounded theory methods successfully in process modelling. During analysis, four major reasons for the differences between the models were established: a) the use of different methodologies; b) the difference in background and experience; c) human error; d) misunderstanding of the language/process modelling concepts. In this section, we discuss in more detail the differences resulting from the use of the methodologies and from background and experience (items a and b). We consider that human error and misunderstanding of the language are likely to occur regardless of the method, and these issues would not be addressed simply by using grounded theory methods. Moreover, human error and misunderstanding account for only a small proportion of the errors (human error: about 3 of the 58 concepts missed in Model A; misunderstanding: 3 of the 106 entities in Model A).
5.1. Use of different methodologies
As with any kind of conceptual modelling, process model decomposition can simplify the understanding of a process model by reducing its complexity at any particular level. Table 4 shows that Model A contains five levels of decomposition while Model B contains three. The decomposition hierarchies agree almost exactly (with the exception of the placement of the “meetings” category) down to the level to which Model B has been decomposed. On discussion, it was agreed that the extra levels of decomposition contained in Model A were valid and complete representations of the structure of the data. We believe these extra levels emerged because grounded theory methods specifically encourage the researcher to generate abstractions grounded in the data, whereas the ad hoc approach provides no technique for generating the decomposition hierarchy other than the modeller’s own initiative. It was also found that the models sometimes differed significantly in the level of detail represented. In two cases, shown in Examples 1 and 2 below, many individual items in Model B correspond to a single abstraction of the same items in Model A. We believe this is because grounded theory methods emphasise the identification of patterns in the process but do not emphasise the search for isolated but still important aspects of the process (that is, concepts that appear just once in the data). During joint re-analysis the process engineer was adamant that many concepts that appear only once in the data are still key to the accuracy and completeness of the process model. The argument supporting this is that not all process concepts are repetitive; many occur only once during a project (and the data corresponded to just one project). Nonetheless, these concepts are still important and should be included in the model.
This highlights a limitation of our study: the data were gathered from only one project. Data from several projects would help to ensure that activities occurring just once in a project’s lifetime appear repeatedly in the data, and thus have a chance of being identified using grounded theory methods.
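Why isolated concepts drop out, and why pooling projects might help, can be illustrated with a deliberately crude model. The Python sketch below is not grounded theory, which relies on constant comparison and interpretation rather than counting; it simply assumes that “emerging as a pattern” means “coded in at least two passages”. The passages, code labels and the `recurring_codes` helper are hypothetical.

```python
from collections import Counter


def recurring_codes(coded_passages, min_occurrences=2):
    """Codes applied to at least `min_occurrences` passages.

    A deliberately crude stand-in for "emerging as a pattern"; grounded
    theory itself relies on constant comparison, not on a count threshold.
    """
    counts = Counter(code for passage in coded_passages for code in passage)
    return {code for code, n in counts.items() if n >= min_occurrences}


# Hypothetical coded passages from a single project: each set holds the
# codes applied to one passage.
one_project = [
    {"meetings"},
    {"meetings", "task request"},
    {"task request"},
    {"sign off requirements document"},  # happens once per project
]

print(sorted(recurring_codes(one_project)))
# ['meetings', 'task request']

# Pooling three similar projects makes the one-off activity recur.
print(sorted(recurring_codes(one_project * 3)))
# ['meetings', 'sign off requirements document', 'task request']
```

Under this simplification, a once-per-project activity is invisible in single-project data but crosses the threshold as soon as comparable projects are pooled.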
Example 1. Documentation
The activity “create document” found in Model A corresponds to many separate, specific activities in Model B. Table 6 below shows the correspondence between the activity “create document” (from Model A) and “Create System Requirements”, “Create Specific System Requirements” and “Create System Architecture” (from Model B):
Model A activity | Model B activities
Create document | Create System Requirements; Create Specific System Requirements; Create System Architecture; etc.
Table 6: Documentation/Example 1
Example 2. Requirements
Again, in the requirements model Researcher A abstracts the 9 activities described by Researcher B, grouping them under “meetings discussion”. Researcher A identified and represented the meeting itself, while Researcher B identified and represented the relevant activities taking place in the meetings and the artifacts they produced. Table 7 below shows the activity “meetings discussion” (from Model A) and the equivalent activities in Model B:
Model A activity | Model B activities
Meetings discussion | Distribute consolidated requirements doc; Finalise software requirements; Consolidate requirements; Add more detail; Sign off SR Doc; Circulate Systems requirement doc; Request more detail; Query requirement; Formalize requirements for new system
Table 7: Requirements/Example 2
Example 3. Missed Concepts in Model B
Concepts were occasionally missed in Model B as well, possibly owing to the lack of a structured methodology for model building. The unplanned and unsystematic analysis of the reports and e-mails made it difficult to ensure that all the data had been considered, which resulted in artifacts and activities occasionally being overlooked in Model B. A total of nine activities were not represented in Model B; these were activities in Documentation, Technical Development and Meetings. Table 8 below shows some of the concepts that were overlooked in Model B:
Concepts missed in Model B
Review of document
Templates creation
Work on templates
Integration into system
Meetings with customers
Table 8: Missed Concepts in Model B/Example 3
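The coverage problem described for Model B suggests a simple bookkeeping check that a systematic coding pass makes possible: listing the source items to which no code has yet been applied. The Python sketch below assumes hypothetical item identifiers (for example, one per e-mail or set of minutes) and a record of which items each code was applied to; `uncoded_items` is our own helper, not a description of any NVivo feature.

```python
def uncoded_items(all_items, coding):
    """Source items to which no code has been applied, in source order."""
    coded = set()
    for items in coding.values():
        coded.update(items)
    return [item for item in all_items if item not in coded]


# Hypothetical source identifiers and code assignments.
sources = ["email-01", "email-02", "email-03", "minutes-01", "minutes-02"]
coding = {
    "meetings": {"minutes-01", "email-02"},
    "task request": {"email-01"},
}

print(uncoded_items(sources, coding))  # ['email-03', 'minutes-02']
```

An empty result would show that every item has been considered at least once; it says nothing about whether the codes applied are appropriate.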
A final difference in the use of the methodologies was that Approach A took almost twice as long as Approach B to produce its model (approximately 24 hours compared with approximately 13 hours). This is partly due to unquantifiable differences between the researchers (for example, English is Researcher A’s second language, so she took more time to read and interpret the e-mails), but it does support the view that applying grounded theory methods can be a painstaking and time-consuming process.
5.2. Differences in background and experience
It has already been noted above that grounded theory methods do not emphasize the identification of isolated concepts in the data. The process engineer relied heavily on experience to decide whether isolated concepts should be included in the abstract model. One rule for deciding whether to model a particular activity, for example, was the knowledge that “if it is likely to be done in every project, it should be explicit in the abstract/generic model”. Lacking experience, Researcher A could not call on this knowledge to compensate for the failure of the method to identify these concepts. As a result, a number of important (but isolated) activities were not represented in Model A; we identified a total of 49 activities missing from Model A. This effect is especially apparent when the data content is more technical, for example in the technical development process. On these occasions Researcher A is more likely to miss artifacts and activities, and Researcher B’s models are much more detailed. To compensate, when it becomes hard to generate an abstraction, Researcher A often presents a literal interpretation of the data, representing repetitive activities in the same way as they appear in the data, as shown in Example 4 below. Interestingly, Researcher B employs the same strategy when she is unsure about the content. Nevertheless, because Researcher B has a more solid background in process engineering, this occurs less often in her models.
Example 4. Differences in the representation of Technical Development
Table 9 below shows the literal representation of technical development activities in Model A and the single activity in Model B to which they are equivalent.
Model A activities | Model B activity
Work on content generator; Work on screen tool component; Work on queries | Work on technical input
Table 9: Technical Development/Example 4
Example 5. Differences in definitions of roles
Another example of background experience producing a more detailed model is found in the identification of roles. While Researcher A identified five roles, Researcher B identified eight. Moreover, Researchers A and B gave different definitions of the role of “developer”: for Researcher B, developer had a more general meaning, whereas for Researcher A the developer was the person performing the tasks, that is, doing the coding. Table 10 below compares the roles identified in Models A and B:
Model A | Model B
Developer | Internal developer; External developer
Client | Customer
Project manager | Project manager
Software Engineer | System requirement developer; Mock-up page creator
Technical project manager | Task manager; Administrator
Table 10: Roles/Example 5
Nevertheless, the researchers’ different backgrounds also had some positive aspects, as they allowed distinct aspects of the data to be identified. For example, as a psychologist, Researcher A emphasizes the representation of the behavioural aspects of the process, offering a dimension to the process model that is not present in Model B. This is shown in Example 6. Both researchers agree that this kind of perspective is an accurate interpretation of the data that could prove very useful for process modelling.
Example 6. Management
In the Management layer of the model, Researcher A represents information flow, modelling the behaviour established by the development team rather than the content of the information. Model B, on the other hand, models the actual activities and the artifacts produced and consumed by these activities. The result is that the two models contain very different activities and artifacts, even though they were obtained from the same set of data and the higher level, Management, referred to the same process.
Model A activity | Model A artifact | Model B activity | Model B artifact
Meeting discussions | Actions to be done or discussed | Discuss revised pricing | Revised pricing
Performing tasks | Actions done | Record timesheet | Effort information
Distribution of information | Email announcements; List of priorities for discussion | Estimate effort tasks | Effort estimates
Table 11: Management/Example 6
6. Summary and future work
The comparison showed that the two models were very different. Table 12 gives a summary of common patterns identified in the models and the causes to which they were attributed.
Feature | Pattern observed | Cause in Model A | Cause in Model B
Completeness | Missing activities and artifacts | GT method misses isolated concepts; lack of experience means the researcher cannot compensate for the method's shortcoming | Lack of a structured methodology that ensures coverage of the data
Completeness | Missing levels of decomposition | – | Lack of a structured methodology that supports abstraction into a refinement hierarchy
Completeness | Failure to model behavioural aspects | – | Lack of experience; lack of training in behavioural modelling
Accuracy | Concepts included that were inaccurate | Lack of experience in software engineering; inability to interpret technical discussions | –
Effort | Model A took almost twice as long as Model B to interpret | Carrying out GT is a time-consuming task | –
Table 12: A summary of the major results.
Approach | Positives | Negatives
Grounded theory | Explicitly encourages abstraction to a refinement hierarchy; ensures coverage of all the data | Misses isolated but important concepts; still relies on experience and skill to interpret the data
Ad hoc, relying on experience | Greater ability to judge the importance of concepts, even if they appear in the data only once (but not methodically); ability to understand technical discussions in the data | Coverage is not ensured; decomposition is not encouraged; relies solely on the experience and skill of the process engineer
Table 13: A summary of the positives and negatives of each of the approaches used.
Table 13 gives a summary of the positives and negatives of each of the approaches used based on their performance in our study.
Based on this analysis, it seems unlikely that the use of grounded theory methods alone can compensate for experience in process modelling and software engineering. However, it does suggest that an experienced process engineer could utilise grounded theory methods to help ensure coverage of the data and to encourage decomposition of the process model. As a first step in the development of a defined process for extracting process models from descriptive qualitative data, grounded theory methods have been shown to offer advantages that are worth pursuing in the future. It is likely that modifications to the method will be necessary to overcome its disadvantages.
For example, the effort data suggests that grounded theory is a time-consuming and painstaking method, which may add to the time needed to construct a process model. The trade-off between cost, repeatability and coverage will need to be explored. This is the first study of this kind known to the authors. Future work needs to address some of the open questions identified in this study. One issue not addressed here is quality, owing to the lack of validated measures of process model quality. One way to address this problem would be to use independent expert opinion to judge the quality of the models generated; this would show whether methods such as grounded theory can contribute to the quality of process models. Another interesting issue is making the process more repeatable. Without a defined method, there is no guarantee that two experts would have produced equivalent models in this study. However, methods such as grounded theory, in which the model is generated from the data, could provide traceability and a more repeatable approach. Following on from this study, we suggest an experiment comparing an experienced engineer using grounded theory methods with an experienced engineer not using them. Overall, this study has shown that qualitative methods have the potential to offer advantages to software engineering and that their further refinement should be a fruitful area for research.
7. Limitations of the study
Limitations of this study include the difference in the professional backgrounds of the participants, which clearly influenced how they interpreted the data. Specific aspects of the models were the result of the participants’ particular backgrounds (psychologist versus software engineer). It would have been useful to also have a process model produced by a software engineering researcher using grounded theory. Another important factor is that English is Researcher A’s second language. This may limit the validity of the effort results, since she may have needed extra time to read and understand the data.
8. Conclusion
Our research shows that grounded theory methods can be applied to process data and that a psychologist inexperienced in software engineering could produce a process model based on her analysis using grounded theory methods. The model, however, differed significantly from that produced by an experienced process engineer from the same data. One of the main differences between the models, the inability of grounded theory methods to identify isolated but important concepts in the data, suggests that process engineers should not rely on grounded theory methods alone to analyse process data. On the other hand, grounded theory methods could help to ensure coverage of the data and encourage solid decomposition hierarchies in process modelling. The benefits observed in the use of grounded theory methods support the further development of these techniques as an aid in software engineering process modelling.
Acknowledgements
We would like to thank Allette Systems (Australia) for providing the context for this study and the Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Germany, for supplying the Spearmint modelling tool. Funding for this project was provided through a collaborative SPIRT grant by the Australian Government (DETYA), AccessOnline Pty Ltd Sydney and Allette Systems (Australia).
References
1. Seaman, C., Qualitative Methods in Empirical Studies in Software Engineering. IEEE Transactions on Software Engineering, 1999. 25(4): p. 557-572.
2. Curtis, B., M.I. Kellner, and J. Over, Process Modelling. Communications of the ACM, 1992. 35(9): p. 75-90.
3. Bandinelli, S., et al., SPADE: An Environment for Software Process Analysis, Design and Enactment. 1994, GOODSTEP.
4. Jaccheri, M.L., P. Lago, and G.P. Picco, Eliciting Software Process Models with the E3 Language. ACM Transactions on Software Engineering and Methodology, 1998. 7(4): p. 368-410.
5. Strauss, A. and J. Corbin, Grounded theory methodology: an overview, in Strategies of Qualitative Inquiry, N.K. Denzin and Y.S. Lincoln, Editors. 1998, Sage Publications.
6. Hoeltje, D., et al., Eliciting Formal Models of Software Engineering Processes. Proceedings of the 1994 CAS Conference, 1994.
7. Burrell, G. and G. Morgan, Sociological Paradigms and Organisational Analysis. Elements of the Sociology of Corporate Life. 1979, London: Heinemann.
8. Crotty, M., The Foundations of Social Research: meaning and perspective in the research process. 1998, Australia: Allen & Unwin.
9. Glaser, B. and A. Strauss, The discovery of grounded theory: strategies for qualitative research. 1967, New York: Aldine Publishing Company.
10. Strauss, A. and J. Corbin, Basics of qualitative research: grounded theory procedures and techniques. 1990: Sage Publications.
11. Urquhart, C., Exploring Analyst-Client Communication: Using Grounded Theory Techniques to Investigate Interaction in Informal Requirements Gathering, in Information systems and qualitative research. Proceedings of the IFIP TC8 WG 8.2 International Conference on Information Systems and Qualitative Research, A.S. Lee, J. Liebenau, and J.I. DeGross, Editors. 1997, Chapman & Hall: Philadelphia, Pennsylvania, USA.
12. Orlikowski, W., Improvising Organizational Transformation Over Time: A Situated Change Perspective. Information Systems Research, 1996. 7(1): p. 63-92.
13. Pidgeon, N., Grounded theory: theoretical background, in Handbook of Qualitative Research Methods, J. Richardson, Editor. 1996, BPS Books: Leicester, UK. p. 75-85.
14. Pidgeon, N. and K. Henwood, Grounded theory: practical implementation, in Handbook of Qualitative Research Methods, J. Richardson, Editor. 1996, BPS Books: Leicester, UK. p. 86-101.
15. Becker-Kornstaedt, U., et al., Support for the Process Engineer: The Spearmint Approach to Software Process Definition and Process Guidance. Proceedings of the 11th Conference on Advanced Information Systems Engineering CAiSE'99, 1999: p. 119-133.
16. NVivo. 2000: QSR International Pty Ltd. www.qsrinternational.com.
Phenomenon Causal Conditions
Development of the System
Management
Production of Documents
- clients’ request of a system - company’s willingness to get the job
- requirements identification - understanding of what is necessary to build the system - communication between team members - development team technical skills
- need of communicating and exchanging information with team members and client - organizing team members - need to keep track of what is happening in the project
- communicating ideas to team/client - organization of technical information - need to keep track of what is happening in the project
- communication between development team at the company and client. - systems specifications
- work of the development team at the company - client input - systems specifications
- project’s dissemination of information - organization of meeting (content and the arrangements for meeting itself)
- documents’ designing - available standards of documentation
- specificities of culture of small organization (pressure to remain competitive, informal relationships/communication, others).
- specificities of culture of small organization.
- specificities of culture of small organization.
- specificities of culture of small organization.
- client/development team exchange of ideas - meetings between members of the development team - task creation - draft/ production of requirements document
- client/development team exchange of ideas - questions/feedback about development - meeting discussions - task request - work on the system - test the system - rework - exchange emails about development - examination of requirements - research of other products - follow a development plan - creation of tasks - report on the development plan - design and write part to be implemented - test and identify improvements - reports and documents - list of actions to be done/list of actions done - task - part is developed and implemented into system - system
- meeting discussions - exchange/ distribution of information
- client/development team - meeting discussions
- emails requesting confirmation/ suggestion of dates - list of things to be done and things discussed - suggestion/confirmation of dates
- creation of documents templates - produce a documentation list - draft documents and review of draft - research standards for ideas of documents content
- announcements - list of priorities for discussion - meeting minutes - reports of time spent in activities
- documents templates - documents
conditions Interactions
Actions/
Intervening
Context
Gathering of Requirements
Consequences
Strategies
- group discussion - produce a list of improvements - organize list of actions - develop a plan
- identification of requirements - requirements document - list of task - minutes of discussions
Appendix: Example of categories identified in the grounded theory analysis.