Qualitative data in modeling and simulation: A survey among interdisciplinary scholars
Roman Seidl
ETH Zurich, Department of Environmental Systems Science – USYS TdLab, Universitaetstrasse 22, CH-8092 Zurich, Switzerland
Email:
[email protected]
Abstract. Empirical data obtained with social science methods can be useful for informing agent-based models, for instance, by specifying behavioral rules, fixing the attitudinal profiles of heterogeneous agents, or validating simulation results. In this paper, I present the preliminary results of a survey among interdisciplinary researchers. The survey asked for a definition of qualitative data and for their functions in modeling and simulation. Furthermore, challenges linked to the inclusion or implementation of qualitative data were investigated. The results show that qualitative data are often defined in two ways at once: in positive terms (what they are) and in negative terms (what they are not), the latter typically by comparison with quantitative data, which (allegedly naturally) carry a number. Moreover, three types of functions of qualitative data for modeling can be identified, and the major challenge appears to be the appropriate quantification of these data.
Keywords: survey, qualitative data, modeling, simulation, functions, challenges
1 INTRODUCTION
Qualitative data can be defined in different ways and can play various roles in their implementation in models (conceptual and simulation). For instance, Yang and Gilbert’s [1] basic definition is that “qualitative data cannot be converted into numeric values without distorting the information they contain.” Their colleagues may have other notions of qualitative data. Moreover, the functions of qualitative data for modeling and simulations may also vary [2], and different methods [3, 4] and challenges may be linked to these functions [5]. Robinson and colleagues [4] illustrate different empirical methods to collect qualitative data, including surveys, participant observation, experiments in the field or the laboratory, companion modeling, geographic information systems (GIS), and remotely sensed spatial data. The method applied may depend on the disciplinary background and experience of a researcher or a research team. However, it would be interesting to know more about how often which methods are used and for what kinds of functions. For instance, surveys or databases may be used for more explorative investigations, whereas “softer” methods may be applied to elicit details on rules of agent interaction or decision making [6, 7]. Moreover, the
validation of a model’s plausibility and simulation results may demand other methods [8]. Future scenarios are often used in several research and modeling communities [9–11]. These scenarios can be in the form of more quantitative variants but often include qualitative parts, denoting a relative increase or decrease in exogenous variables, such as environmental consciousness or non-economic valuing of biodiversity. Several challenges in data acquisition and implementation await the modeler. For instance, acquiring necessary data for model development versus as input for a model may require different types of data and different elicitation techniques [12]. Some scholars have argued that there are ways to combine qualitative with quantitative evidence and formal methods. For instance, a recent special issue in the Journal of Artificial Societies and Social Simulation “amply demonstrates that there is a well-founded way to combine the qualitative and quantitative—via agent-based simulation” [13, section 1.3]. As Edmonds [13, section 1.4] describes, the process of how the design of simulations is informed by qualitative data often happens in an informal way, neither systematic nor well documented. He notes that the papers published in the special issue use two approaches to formalize this process, as follows: “(1) by constraining the process of elicitation/analysis or (2) via the use of prior structures/frameworks.” [13, section 1.4] In this contribution, I aim to at least partly illuminate the issue of what constitutes the definitions, functions, and challenges of qualitative data in modeling and simulation by referring to the results of a survey among researchers.
2 METHOD
2.1 Sample
The survey was conducted from March 15 to 28, 2016 via an online questionnaire using SurveyMonkey (surveymonkey.net). The sample was recruited through a search in the Scopus database (http://www.scopus.com), including authors of articles published in two journals, GAIA (Ecological Perspectives for Science and Society)1 and AMBIO: A Journal of the Human Environment2 (those who were listed as authors, starting in 2002). This sample was assumed to be sufficiently related to modeling and simulation (although not exclusively) and work with qualitative data. The link to the online questionnaire was then sent to the email addresses of these authors. In total, 2,643 addresses were used, of which 634 were no longer valid and the message could therefore not be delivered; thus, 2,009 potential participants should have received the request to participate in the survey. A total of 80 participants responded after two weeks (4% return rate), of which 74 cases were deemed suitable for further analysis. Gender was biased, with 50 participants (73%) indicating “male.” Five respondents did not indicate their gender. As intended, the participants showed different disciplinary backgrounds although the majority was categorized as environmental scientists
1 http://www.gaia-online.net/
2 http://link.springer.com/journal/13280
(see Table 1). Moreover, 27 (40%) participants reported belonging to more than one discipline.
Table 1. Frequency of disciplines (more than one option could be selected; N = 67).
Discipline                Frequency
Environmental sciences    50
Social sciences           21
Engineering               14
Economics                 7
Mathematics               5
Humanities                4
Physics                   2
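The recruitment figures reported in the Sample section can be re-derived with a few lines of arithmetic. The following is only a minimal sketch: the function name `response_summary` and its dictionary keys are hypothetical, and the inputs are the counts stated in the text (addresses used, undeliverable addresses, responses, and cases kept for analysis).

```python
def response_summary(addresses_used, undeliverable, responses, usable):
    """Derive delivery and return figures from raw recruitment counts."""
    delivered = addresses_used - undeliverable
    return {
        "delivered": delivered,
        # Share of potential participants who responded, in percent.
        "return_rate_pct": 100 * responses / delivered,
        # Share of responses kept for further analysis, in percent.
        "usable_share_pct": 100 * usable / responses,
    }


# Counts as reported in the text of this section.
summary = response_summary(addresses_used=2643, undeliverable=634,
                           responses=80, usable=74)
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

Rounded to whole percentages, the return rate reproduces the roughly 4% stated above.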
The additionally mentioned disciplines (N = 10) included agronomy, artificial intelligence, chemistry, education, environmental engineering, geography, information science, medicine, and systems ecology, and one participant noted “risk management, complex systems, research on interactions between systems, and CHANS”. The respondents worked in different sectors, but the overwhelming majority was involved in research and education (see Table 2).3 This was expected as the recruitment was done through published scientific papers.
Table 2. Sectors where the participants worked (three participants indicated more than one sector, N = 69).
Sector                                                                                   Frequency
Research and education (university, research center, etc.)                              64
Administration and management in the public sector (government, agency, survey, etc.)   5
Private sector (consulting, industry, etc.)                                              3
Other                                                                                    4
2.2 Questionnaire and data analysis
Since the first item in the questionnaire was an open question, the answers had to be categorized. The question was “What is your definition of ‘qualitative data’? Please answer from your (disciplinary or practice) perspective.” The answers were categorized by coding terms that appeared repeatedly across the answers. First, the type of definition was investigated, that is, whether it stated what qualitative data were (by description and example) or were not. For instance, the (non)numerical nature of such data was highlighted rather often. Second, more detailed categories were generated bottom-up for each basic category. For the positive side, these were 1) relations/relational aspects, 2) scaling issues (indicating nominal or categorical measurement levels), 3) subjectivity (linking to judgments, values, etc.), 4) descriptive nature (in contrast to explanation), 5) verbal (or pictorial, etc.) representation of the data,
3 Four other options were added, as follows: consulting for the United Nations, NGOs and others; independent consulting on atmospheric chemistry and air quality; private (not for profit) associations mostly dedicated to research and knowledge transfer on the environment; and research to support the government.
and 6) definition by example (the respondents often cited examples of what they meant by qualitative data). To obtain an overview of the different functions of qualitative data, the second question was “What are the functions of qualitative data in your modeling and simulation approach? I use qualitative data ...” Eight statements were given as possible answers, to be rated on a Likert scale from 1 = “not at all” to 4 = “very much.” Besides comparing mean values, a factor analysis (Alpha factoring with Varimax rotation, see [14]) was conducted to learn more about the structure of the functions. The third section used the same Likert scale to let participants rate how intensely they applied certain methods. In the fourth section, the respondents were asked to state, based on their experiences, the challenges they encountered that were linked to the named functions. Here, more than one of eight possible options could be selected, for instance, “translation of (static) behavioral models or theories into (dynamic) simulation models.” A fifth section asked what kinds of modeling approaches/methods the respondents applied. Again, more than one option could be chosen. In these sections, the respondents could always check the “none of the above” option and add their own functions, methods, challenges, and modeling approaches, respectively. Overall, most participants used the given options. Since the use of qualitative data in modeling and simulation would often mean crossing disciplinary boundaries (content-wise or methodologically), I also asked the participants, “[Do you] see [the] need for action to achieve better cooperation between your discipline and other disciplines to reach better simulation/research results?” At the end of the questionnaire, the following demographic questions were asked: the respondents’ gender, disciplinary background, and sector in which they worked. For all of these questions, I conducted descriptive analyses as well as some structural analyses.
Specifically, I applied a factor analysis using Alpha factoring (with Varimax rotation) to reduce the number of dimensions regarding the functions of qualitative data. Three factors were revealed by this method. The next section summarizes the results.
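To make the rotation step more tangible, the following is a minimal pure-Python sketch of a Varimax rotation with Kaiser normalization, using the classic pairwise (planar) rotation scheme. It is not the procedure of the statistics package used for this paper, and it does not perform the Alpha factoring extraction; it only rotates a loading matrix that is assumed to be given. The function names `varimax` and `suppress_small` and the example loadings are hypothetical.

```python
import math


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Rotate a loading matrix (list of rows) by pairwise Varimax rotations."""
    n, k = len(loadings), len(loadings[0])
    # Kaiser normalization: scale each row to unit length before rotating.
    h = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in loadings]
    a = [[v / h[i] for v in row] for i, row in enumerate(loadings)]
    for _ in range(max_sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [row[p] ** 2 - row[q] ** 2 for row in a]
                v = [2 * row[p] * row[q] for row in a]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the Varimax criterion for this pair.
                phi = 0.25 * math.atan2(D - 2 * A * B / n,
                                        C - (A * A - B * B) / n)
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for row in a:
                    x, y = row[p], row[q]
                    row[p], row[q] = x * c + y * s, -x * s + y * c
        if largest < tol:  # no pair needed a noticeable rotation
            break
    # Undo the Kaiser normalization.
    return [[v * h[i] for v in row] for i, row in enumerate(a)]


def suppress_small(loadings, cutoff=0.35):
    """Blank out loadings whose absolute value does not exceed the cutoff."""
    return [[round(v, 3) if abs(v) > cutoff else None for v in row]
            for row in loadings]


# Hypothetical unrotated loadings for four items on two factors.
example = [[0.69, 0.40], [0.61, 0.35], [-0.30, 0.52], [-0.45, 0.78]]
for row in suppress_small(varimax(example)):
    print(row)
```

Because the rotation is orthogonal, the communality of each item (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes.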
3 RESULTS
In this section, I present some descriptive results obtained from the survey. A categorization of the answers to the open question on the definition of qualitative data reveals a basic distinction between positive and negative definitions. In other words, the participants state what qualitative data are or (often in the same sentence) what they are not, particularly in comparison to genuine numeric data. Further distinctions can be made by considering whether the negative definitions refer to the (basic) non-numeric nature of qualitative data, the non-measurability/-countability/-quantifiability, or the metric scale (and whether statistics can be applied to the data). On the positive side, references to the relational aspect of qualitative data (with terms such as bigger, more or less valued, etc.) and the low level of measurement (scale) can be identified. Moreover, qualitative data are described as subjective (e.g., referring to expert statements or other judgments) or descriptive (rather than explanatory) and expressed in verbal or pictorial terms. Fourteen participants use examples to clarify their notions of qualitative data. Fig. 1 shows the frequency of the statements within the different categories. The “rest” category contains four statements that could not be assigned otherwise. An interesting result is that colleagues evidently have varying notions of scales and levels of measurement. For instance, one participant indicates that qualitative data have “no interval scale level and [are] more nominal than ordinal”; another states that these are data “that [are] not of a continuous, ordinal, or nominal data type.”
Fig. 1. Quantification of qualitative data in this survey (explanations in the text).
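The kind of category counting summarized in Fig. 1 could be approximated programmatically. The sketch below is only an illustration under strong assumptions: the coding for this paper was done by reading the answers and building categories bottom-up, whereas the keyword patterns in `CATEGORY_PATTERNS`, the category labels, and the function names `code_answer` and `tally_categories` are hypothetical simplifications.

```python
import re
from collections import Counter

# Hypothetical keyword patterns, one per category; a real coding scheme
# would be developed and checked against the answers themselves.
CATEGORY_PATTERNS = {
    "non-numeric (negative)": r"non-?numeric|not (?:in )?numeric",
    "relational": r"\bmore or less\b|\brelative",
    "scale level": r"\bnominal\b|\bordinal\b|\bcategor",
    "subjective": r"\bsubjectiv|\bjudg(?:e)?ment|\bopinion",
    "verbal/pictorial": r"\bverbal|\bnatural language\b|\bpictor",
    "by example": r"\be\.g\.|\bfor example\b|\bsuch as\b",
}


def code_answer(answer):
    """Return every category whose pattern occurs in the answer."""
    text = answer.lower()
    return [name for name, pattern in CATEGORY_PATTERNS.items()
            if re.search(pattern, text)]


def tally_categories(answers):
    """Count how many answers fall into each category (multi-label)."""
    counts = Counter()
    for answer in answers:
        counts.update(code_answer(answer))
    return counts


answers = [
    "information that can be substantiated and that is not in numerical form",
    "data with no relation to a scale or at most a nominal scale "
    "(e.g. statements like 'beautiful' and 'ugly')",
    "data expressed relatively to something else, usually in natural language",
]
for category, count in tally_categories(answers).items():
    print(category, count)
```

An answer can receive several codes, which mirrors the observation that positive and negative definitions often occur in the same sentence.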
The ranking of the potential functions of qualitative data shows only gradual differences. Table 3 shows the mean values for all given statements (the ranking scale is from 1 = not at all to 4 = very much). This means that most often, qualitative data are used for scenario development, better problem definition, and exploration of human behavior. To gain deeper insights into the structures of these statements, a factor analysis was performed.
Table 3. Rankings of the functions of qualitative data in modeling and simulation.
Function                                                             Mean   Standard Deviation
To develop future scenarios                                          3.1    0.91
To define a problem better                                           3.1    0.95
To explore the diversity of human perceptions or behaviors           2.9    1.08
To define specific rules of agent behavior or interaction            2.8    1.08
To validate simulation results (with experts, stakeholders, etc.)    2.7    0.86
To gain expert judgments on the general modeling approach            2.7    1.11
To elicit concrete actions, given a certain situation                2.6    0.94
To calibrate existing models                                         2.1    0.92
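A summary such as Table 3 can be produced from raw ratings in a few lines. The following sketch assumes ratings on the 1 to 4 scale described in the Method section, with `None` marking a skipped item; the ratings shown are invented for illustration and are not the survey data, the function name `summarize_items` is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def summarize_items(ratings):
    """Return (item, mean, standard deviation, n) rows, highest mean first."""
    rows = []
    for item, values in ratings.items():
        answered = [v for v in values if v is not None]  # drop skipped items
        rows.append((item,
                     round(mean(answered), 1),
                     round(stdev(answered), 2),  # sample SD (assumption)
                     len(answered)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented ratings on the 1-4 scale, for illustration only.
ratings = {
    "calibration": [2, 1, 3, 2, 2],
    "scenarios": [4, 3, 3, None, 2],
}
for item, item_mean, item_sd, n in summarize_items(ratings):
    print(f"{item:12s} mean={item_mean} sd={item_sd} n={n}")
```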
Three factors emerge from the factor analysis across the functions of qualitative data (see Table 4). The first factor combines functions related to behavioral issues, such as agent rules or actions (note that the scenario function does not load equally high on this factor). The second factor comprises two items related to the better understanding of the model and the problem at hand, respectively. The third factor links to the function of expert judgment. Table 4. Results from the factor analysis of the ratings of the functions of qualitative data. The question is “What are the functions of qualitative data in your modeling and simulation approach? I use qualitative data ...” (Only values above 0.35 are shown; see [15].)
Statements                                                           Loading
Factor 1
To elicit concrete actions, given a certain situation                0.836
To define specific rules of agent behavior or interaction            0.621
To explore the diversity of human perceptions or behaviors           0.571
To develop future scenarios                                          0.390
Factor 2
To calibrate existing models                                         0.613
To define a problem better                                           0.454
Factor 3
To gain expert judgments on the general modeling approach;
To validate simulation results (with experts, stakeholders, etc.)    0.593  0.354  0.590
Extraction method: Alpha factoring. Rotation method: Varimax with Kaiser normalization. Rotation converged in seven iterations.
Frequency of modeling methods (bar chart; horizontal axis from 0 to 36):
Combinations of modeling approaches    35
Agent-based modeling                   27
System dynamics                        25
Bayesian networks                      19
Qualitative system modeling            19
Optimization models                    14
Linear modeling                        12
None of the above                      8
Cellular automata                      8
Game and decision theory               5
Fig. 2. Modeling methods applied by the respondents. The question is “What modeling approach/method do you apply?” (As shown, the respondents most often use a combination of modeling approaches, N = 74.)
Fig. 2 shows how often the respondents apply certain modeling methods. A combination of approaches is most often used, but agent-based and system dynamics modeling are also rather frequent.4
Fig. 3 illustrates the frequency of the challenges experienced by the participants. The main challenge appears to be the appropriate quantification of qualitative data. Equally frequent are interviewer bias, implicit expert knowledge, stakeholder bias, and the translation problem, specifically, how to arrive at dynamic simulation models starting from behavioral models.
Fig. 3. Results of the question about challenges in qualitative data. The question is “From your experience, what challenges do you have to deal with regarding the named functions of qualitative data in your modeling/simulation?”
4 For the “none of the above” option, only one comment is entered: “Iterative model development (knowing the key data missing after first model implementations => requires several iterations between data collection and implementation)”
At the end of the questionnaire (but before the demographic questions), the participants are asked to indicate whether they see the need for more interdisciplinary cooperation. The results show that the majority agree (yes, N = 48), some only partly (partly, N = 17), and only a few disagree (no, N = 5). To keep the questionnaire short, no additional questions about the nature of this cooperation are asked.
4 DISCUSSION
The purpose of the survey was to shed some light on the notions about qualitative data among scholars with modeling and simulation experiences. The ad hoc definitions given by the respondents fall into two basic categories: positive or negative descriptions of what qualitative data are or are not, respectively. An important feature of qualitative data is evidently their subjectivity, resembling what Yang and Gilbert [1] call “ethnographic” data, which are “not readily converted into a variable/actor matrix without losing information or doing an injustice to the data” (p. 176). Linked to this, qualitative data are expressed in verbal (or pictorial) forms and perceived as rather descriptive. Another feature of qualitative data is related to the level of measurement, that is, the question of on what scale such data can be measured and whether statistics can be employed. Usually, a nominal scale is assumed. As one participant states, “Qualitative data: data with no relation to a scale or at most a nominal scale (e.g. statements like ‘beautiful’ and ‘ugly’)”, which is also a definition by example. However, some participants also put forth a categorical scale, such as “Qualitative data are the information that is gathered by [the] categor[y] scale”, which would not comply with the system of Yang and Gilbert [1] because they distinguish “ethnographic” from categorical data. A slightly different notion can be found in the work of Agar [16, section 4.8], where he states, “all that matters is ‘more or less’ validity. All that is required is an ordinal scale, not interval or ratio.” This “relativity” issue can also be found rather frequently in the sample. For instance, one participant notes that “qualitative data = ‘quantitative imprecise data’ or ‘data expressed relatively to something else, usually in natural language.’” The negative definitions in this survey most often refer to the non-numeric nature of qualitative data. 
For instance, one participant states that it is “information that can be substantiated […] and that is not in numerical form.” However, generally, the respondents do not neglect the positive features of qualitative data or argue against the use of these. On the contrary, they often show that it may be useful or necessary to rely on qualitative data. For instance, it is mentioned that this kind of data “adds context about something but in a way that is often supportive of quantitative data [...]. It can be very useful in establishing trust in other data.” Regarding functions, the most important applications of qualitative data are in scenario analysis and problem definition. The first one is obvious since many examples deal with more bottom-up approaches [11]. For the second, a researcher can remain skeptical of whether this is actually pursued often [17], but it is a good sign if qualitative data are considered in the early phases of modeling efforts. The definition of agents’ rules from qualitative data is also rated as rather important. These rules may
exhibit relative preferences under certain conditions. For instance, Boero and Squazzoni [2, section 2.21] mention that qualitative data “allow introducing realistic rules of behavior or cognitive aspects at the micro level of individual action. They refer to everything that cannot always be quantified, but can be expressed in a logic[al] language.” It is possibly of interest that the function ranked as the least important is the calibration of existing models. Probably, this is perceived as the task of the modeler without reliance on qualitative judgments, for instance, from experts. Regarding the structure of functions, two important functions load on one factor because they are related to the knowledge/judgment from third parties—to gain expert judgments on the general modeling approach and to validate simulation results. Concerning validation, a specific problem has been mentioned in the literature, for instance, by Moss and Edmonds [18, p. 1097], linked to the descriptive nature of qualitative data in a model: “The properties of the model as a whole are amenable to summary using descriptive statistics, while the behavior of the individual agents can (and we argue should) be described qualitatively.” In other words, researchers often describe the outcomes of simulations in qualitative terms although they know that everything in the machine is formal and quantitative. As Verplanken [19] shows, the understanding of numbers can differ, depending on the framing; thus, a qualitative description of a quantitative model’s result may be an option. However, the qualitative description of (or narrative about) quantitative simulation results may pose an additional challenge, particularly when communication with non-modelers is at stake. This links to the next topic investigated by this survey. What are the most important challenges that the participants perceive as related to qualitative data and modeling? 
Expert bias is ranked among the more important challenges, and the problem of potential stakeholder bias is obvious in scenario development or the validation of simulation results. Countermeasures include a thorough stakeholder analysis and a representative group of participants (actors) from different sectors to cover different factors [20], although in reality, a convenience sample will be the most likely option for various reasons. Nonetheless, the most important challenge appears to be the quantification of qualitative data. This fits the definitions given by the respondents, which often mention the low level of measurement or the lack of numerical values for such data. In this case, researchers could probably collect successful strategies and propose some sort of best practice as advice for modelers who are currently hesitating to include qualitative data. Indeed, several examples can be found in the literature for different modeling communities [21–24]. Moreover, there is the option of “getting away from numbers” [1, 25]. However, as shown in this study, psychologists routinely quantify qualitative judgments by applying, for instance, Likert-type scales. The results of participants’ ratings are numbers, for example between 1 and 5, and those numbers are used to calculate scales (by calculating mean values) and to do other manipulations. Similarly, counting specific terms in interview transcripts or in answers to open questions in a questionnaire allows at least for presenting a frequency table. As could be seen from the remarks of the current sample (and actually from reviewers’ comments on this paper), this may not be obvious to all modelers. Of course, in this paper the data were not implemented in an agent-based model, but see for example [12].
5 CONCLUSIONS
In conclusion, I suppose this exercise was helpful in highlighting the need for better knowledge of how to translate qualitative data (or basically, most social science data) into numbers, which can be used in modeling and simulation. More effort is needed to inform modelers about the potentials and options, probably in the form of workshops (such as the essa@work initiative), which may be a good opportunity to show and discuss the pros and cons of different solutions to this problem. Moreover, there is literature, written from a modeler’s perspective, about how to deal with social science data: how to obtain, process, apply, and implement it. Some of it is referenced in this paper, e.g. [4, 12], but see also a special issue coming up in the Journal of Environmental Psychology (accessible via http://www.sciencedirect.com/science/journal/aip/02724944). One important point in this respect is the interdisciplinarity of the modeling community. The majority of the respondents did acknowledge that; I concur and confirm based on my experience that even within disciplines (be it psychology or hydrology) there are different communities who use slightly different terms and refer to different concepts when talking about certain kinds of data (e.g. what scale level can be obtained by social science methods or what is a parameter versus a variable). One can presume that much cooperation work has to be done. Moreover, it is interesting that many participants indicated using a variety of modeling approaches. It would be worthwhile to further investigate whether there are specific problems related to this kind of multi-modeling endeavor. The survey results cannot shed light on this; there was no clear pattern indicating that those applying multiple modeling methods answered differently.
6 REFERENCES
1. Yang, L., Gilbert, N.: Getting away from numbers: Using qualitative observation for agent-based modeling. Advances in Complex Systems 11, 175-185 (2008)
2. Boero, R., Squazzoni, F.: Does Empirical Embeddedness Matter? Methodological Issues on Agent-Based Models for Analytical Social Science. Journal of Artificial Societies and Social Simulation 8, 6 (2005)
3. Kemp-Benedict, E.J., Bharwani, S., Fischer, M.D.: Methods for linking social and physical analysis for sustainability planning. Ecology and Society 15(3), no pages (2010)
4. Robinson, D.T., Brown, D.G., Parker, D.C., Schreinemachers, P., Janssen, M.A., Huigen, M., Wittmer, H., Gotts, N., Promburom, P., Irwin, E.: Comparison of empirical methods for building agent-based models in land use science. Journal of Land Use Science 2, 31-55 (2007)
5. Seidl, R.: Social Scientists, Qualitative Data, and Agent-Based Modeling. In: Quesada, M.F.J., Amblard, F., Barceló, J.A., Madella, M. (eds.) Social Simulation Conference, Barcelona (2014)
6. Taylor, R.I.: Agent-based Modelling Incorporating Qualitative and Quantitative Methods: A Case Study Investigating the Impact of E-commerce Upon the Value Chain. Manchester Metropolitan University (2003)
7. Moss, S.: Critical incident management: An empirically derived computational model. Journal of Artificial Societies and Social Simulation 1, 1 (1998)
8. Windrum, P., Fagiolo, G., Moneta, A.: Empirical validation of agent-based models: Alternatives and prospects. Journal of Artificial Societies and Social Simulation 10, 8 (2007)
9. Soboll, A., Elbers, M., Barthel, R., Schmude, J., Ernst, A., Ziller, R.: Integrated regional modelling and scenario development to evaluate future water demand under global change conditions. Mitigation and Adaptation Strategies for Global Change 16, 477-498 (2011)
10. Brand, F.S., Seidl, R., Le, Q.B., Brändle, J.M., Scholz, R.W.: Constructing Consistent Multiscale Scenarios by Transdisciplinary Processes: the Case of Mountain Regions Facing Global Change. Ecology and Society 18, (2013)
11. Alcamo, J.: Environmental futures: the practice of environmental scenario analysis. Elsevier Science Ltd (2008)
12. Smajgl, A., Barreteau, O.: Empiricism and Agent-Based Modelling. In: Smajgl, A., Barreteau, O. (eds.) Empirical Agent-Based Modelling. Challenges and Solutions, vol. 1, pp. 1-26. Springer, New York (2014)
13. Edmonds, B.: Using Qualitative Evidence to Inform the Specification of Agent-Based Models. Journal of Artificial Societies and Social Simulation 18, 18 (2015)
14. Field, A.P.: Discovering statistics using SPSS. SAGE Publications Ltd, London (2009)
15. Costello, A.B., Osborne, J.W.: Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation 10, 1-9 (2005)
16. Agar, M.: My kingdom for a function: Modeling misadventures of the innumerate. Journal of Artificial Societies and Social Simulation 6, (2003)
17. Hering, J.: Do we need “more research” or better implementation through knowledge brokering? Sustain. Sci. first online, 1-7 (2015)
18. Moss, S., Edmonds, B.: Sociology and Simulation: Statistical and Qualitative Cross-Validation. American Journal of Sociology 110, 1095-1131 (2005)
19. Verplanken, B.: The Effect of Catastrophe Potential on the Interpretation of Numerical Probabilities of the Occurrence of Hazards. Journal of Applied Social Psychology 27, 1453-1467 (1997)
20. Kok, K., Rothman, D.S., Patel, M.: Multi-scale narratives from an IA perspective: Part I. European and Mediterranean scenario development. Futures 38, 261-284 (2006)
21. Weiß, M., Schaldach, R., Alcamo, J., Flörke, M.: Quantifying the human appropriation of fresh water by African agriculture. Ecology and Society 14, 25 (2009)
22. Ghorbani, A., Dijkema, G., Schrauwen, N.: Structuring Qualitative Data for Agent-Based Modelling. Journal of Artificial Societies and Social Simulation 18, 2 (2015)
23. Luna-Reyes, L.F., Andersen, D.L.: Collecting and analyzing qualitative data for system dynamics: methods and models. System Dynamics Review 19, 271-296 (2003)
24. Eldabi, T., Irani, Z., Paul, R.J., Love, P.E.D.: Quantitative and qualitative decision-making methods in simulation modelling. Management Decision 40, 64-73 (2002)
25. Edmonds, B., Hales, D.: When and why does haggling occur? Some suggestions from a qualitative but computational simulation of negotiation. Journal of Artificial Societies and Social Simulation 7, (2004)