Chapter X
Measures of Resilient Performance
David Mendonça
Introduction

In order to theorize, manage – even engineer – resilience, the factors that contribute to resilience must be identified, and measures of these factors validated and exercised. Yet to date there have been few systematic attempts to create such measures. Complicating the matter is the fact that the resilience of safety-critical systems may only be manifested during actual operations. As a result, opportunities for controlled study (or even systematic observation) of resilient organizations are severely limited. There is therefore a clear need to identify the factors that contribute to resilience, to develop measures for these factors, and to validate instruments for estimating the values of these factors. This chapter takes as a starting point a set of factors developed in prior research on organizational resilience. It then discusses an approach to refining and measuring these factors. The framework is then applied to the development and assessment of a candidate set of measures for the factors, using data drawn from observation of infrastructure restoration in New York City following the 11 September 2001 World Trade Center attack.
Defining and Measuring Resilience

Among the definitions of resilience are an ability to resist disorder (Fiksel, 2003), as well as an ability to retain control, to continue and to rebuild (Hollnagel & Woods, 2006). Indeed, despite its relevance to the maintenance and restoration of system safety and operability, resilience may be a difficult concept to measure. For example, during system operation it may be possible to measure only a system's potential for resilience, rather than its resilience per se (Woods, 2006). The following factors are thought to contribute to resilience (Woods, 2006):
• buffering capacity: size or kind of disruption that can be absorbed/adapted to without fundamental breakdown in system performance/structure
• flexibility/stiffness: system's ability to restructure itself in response to external changes/pressure
• margin: performance relative to some boundary
• tolerance: behavior in proximity to some boundary
• cross-scale interactions: how context leads to (local) problem solving; how local adaptations can influence strategic goals/interactions

Resilience engineering is "concerned with monitoring and managing performance at the boundaries of competence under changing demands" (Hollnagel & Woods, 2006). In seeking to engineer resilience, it is therefore appropriate to consider how these factors may be measured. Resilient performance (or the lack thereof) may arise out of need or opportunity, though the latter case is very rarely studied. In the former case, there are numerous studies of how organizations have dealt with situations that push them to the boundaries of competence. Disaster or extreme event situations combine many elements that—by definition—challenge capabilities for planning and response. Opportunities for examining resilient performance in response to extreme events are nonetheless limited. First, there may be high costs associated with large-scale and nearly continuous observation of pre-event conditions. Second, the consequences of extreme events can include the destruction of established data collection instruments, as occurred at the emergency operations center and, later, at the New York Fire Department command post as a consequence of the World Trade Center attack. Third, new processes, technologies and personnel brought in to aid the response may not be measurable with any instruments that remain available, as commonly occurs when the victims of an extreme event act as first responders, or when ad hoc communication networks are formed. A very real challenge in engineering resilience is therefore fundamentally methodological: how can organizational theorists and designers develop and implement measurement instruments for "experiments" which are essentially undesignable?
Broadly speaking, measurement may be defined as the "process of linking abstract concepts to empirical indicants" (Carmines & Zeller, 1979). It is worth emphasizing that this definition does not presuppose that measures are quantitative, merely that the linkage between abstract concepts and their instantiation in the real world be provided empirically. Intimately bound up in any discussion of measurement in science and engineering are the notions of reliability and validity. Reliability refers to "the tendency toward consistency found in repeated measures of the same phenomenon" (Carmines & Zeller, 1979). In other words, a measurement instrument is reliable to the extent that it provides the same value when applied to the same phenomenon. An indicator of some abstract concept, on the other hand, is valid to the extent that it measures what it purports to measure (Carmines & Zeller, 1979). In other words, a valid measurement is one that is capable of accessing a phenomenon and placing its value along some scale. Two types of validity are commonly investigated. Content validity "depends on the extent to which an empirical measurement reflects a specific domain of content. For example, a test in arithmetical operations would not be content valid if the test problems focused only on addition, thus neglecting subtraction, multiplication and division" (Carmines & Zeller, 1979). Construct validity is "the extent to which an operationalization measures the concepts it purports to measure" (Boudreau, Gefen, & Straub, 2001). More precisely, construct validity "is concerned with the extent to which a particular measure relates to other measures, consistent with theoretically-derived hypotheses concerning the concepts (or constructs) that are being measured" (Carmines & Zeller, 1979). Construct validation involves determining the theoretical relation between the concepts themselves, examining the empirical relationship between the measures and the concepts, then interpreting the empirical evidence to determine the extent of construct validity. With a few prominent exceptions, the path to instrument development is seldom discussed, leaving researchers and practitioners with little insight into the validity and reliability of the measurements these instruments produce. A common exception is the survey instrument, which is often used to access attitudes and other psychological states that might be difficult to measure directly.
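To make the notion of reliability concrete, the sketch below computes a simple test-retest coefficient for a hypothetical coding instrument: the same instrument, applied twice to the same ten incident reports, should yield consistent values. The data, the 1-5 "non-routineness" scale and the function name are invented for illustration; Python 3.10 or later is assumed for statistics.correlation.

```python
from statistics import correlation

def test_retest_reliability(first_pass: list[float], second_pass: list[float]) -> float:
    """Pearson correlation between two applications of the same instrument
    to the same phenomena; values near 1.0 indicate the consistency that
    Carmines and Zeller (1979) identify with reliability."""
    if len(first_pass) != len(second_pass):
        raise ValueError("repeated measures must be paired")
    return correlation(first_pass, second_pass)

# Hypothetical example: two codings of the same ten incident reports on a
# 1-5 'non-routineness' scale (illustrative numbers only).
pass_one = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
pass_two = [3, 4, 3, 5, 4, 2, 1, 4, 5, 2]
print(f"test-retest reliability: {test_retest_reliability(pass_one, pass_two):.2f}")
```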
Yet for quite some time, a number of researchers have argued for expanding the portfolio of measures used in social science. For example, numerous unobtrusively collected measures may be of use in advancing understanding of organizations (e.g., Weick, 1985). In the early days of the development of theory for a new (or newly discovered) class of phenomena, the need for discussions of instrument development is particularly great. Without adequate attention to the assumptions underlying instrument development, the theory may develop too quickly, leading either to an unnecessarily narrow view or to a hopelessly broad one. Resilience engineering is clearly a field in the midst of defining itself and its relationship to other fields, and this includes identifying and defining the phenomena which researchers in the field intend to investigate. Research in resilience engineering has been predominantly informed by field observations and not, for example, by laboratory studies. More to the point, research in the field has been strongly interpretive, focusing primarily on case studies. A study may be said to be interpretive "if it is assumed that our knowledge of reality is gained only through social constructions such as language, consciousness, shared meanings, documents, tools and other artifacts" (Klein & Myers, 1999). The types of generalizations that may be drawn from interpretive case studies are the development of concepts, the generation of theory, the drawing of specific implications and the contribution of rich insights (Walsham, 1995). Principles for evaluating interpretive case studies may be used to establish their reliability and validity, though the methods for doing so differ from those used in positivist studies. Klein and Myers (1999) provide a set of principles for conducting interpretive field studies, as follows. The fundamental principle—that of the hermeneutic circle—suggests that "we come to understand a complex whole from preconceptions about the meanings of its parts and their relationships." Other principles emphasize the need to reflect critically on the social and historical background of the research setting (contextualization) and on how research materials were socially constructed through interaction between researchers and participants (interaction). The principle of dialogical reasoning requires sensitivity to possible contradictions between theory and findings. Similarly, the principle of multiple interpretations requires sensitivity to differences in participants' views, while the principle of suspicion requires sensitivity
to possible biases and distortions in those views. Application of the principles of the hermeneutic circle and contextualization yields interpretations of data collected in the field. The principle of abstraction and generalization requires relating these interpretations to theoretical, general concepts concerning human understanding and social action. In contrast to interpretive studies are positivist studies. A research study may be said to be positivist "if there is evidence of formal propositions, quantifiable measures of variables, hypothesis testing, and the drawing of inferences about a phenomenon from a representative sample to a stated population" (Orlikowski & Baroudi, 1991). There are some obvious challenges associated with a positivist approach to research in resilience engineering at this stage. For example, there are still many contrasting definitions of resilience itself, as well as of the factors that are associated with it. Combining interpretive and positivist approaches seems a reasonable way to make progress in developing this new area of research, but few studies—at least in the social sciences—seek to do so, and indeed there are very few guidelines to lead the way. One approach is triangulation, which may be defined as "the combination of methodologies in the study of the same phenomenon" (Denzin, 1978). There are two main approaches to triangulation: between (or across) methods, and within method (Denzin, 1978). Within-method triangulation "essentially involves cross-checking for internal consistency or reliability while 'between-method' triangulation tests the degree of external validity" (Jick, 1979). Triangulation can provide "a more complete, holistic, and contextual portrayal of the units under study," though it is important to keep in mind that the "effectiveness of triangulation rests on the premise that the weaknesses in each single method will be compensated by the counter-balancing strengths of another" (Jick, 1979). The remainder of this chapter discusses a combined interpretive and positivist approach to the measurement of factors associated with resilience, with a particular focus on the use of triangulation for improving measurement reliability and validity.
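As a minimal sketch of between-method triangulation in this spirit, the following code cross-checks two hypothetical sources (interview recall and system logs) and flags incidents where they disagree beyond a threshold, so that disagreement prompts follow-up rather than being averaged away. All names, values and the 25 per cent tolerance are illustrative assumptions, not drawn from the studies described below.

```python
def flag_discrepancies(recalled: dict[str, float], logged: dict[str, float],
                       tolerance: float = 0.25) -> list[str]:
    """Between-method cross-check: flag incidents where two independent
    methods (e.g., interview recall vs. system logs) disagree by more than
    `tolerance` as a fraction of the logged value, marking them for
    follow-up rather than averaging the disagreement away."""
    flagged = []
    for incident, log_value in logged.items():
        recall_value = recalled.get(incident)
        if recall_value is None:
            continue  # incident captured by only one method
        if abs(recall_value - log_value) > tolerance * abs(log_value):
            flagged.append(incident)
    return flagged

# Hypothetical restoration durations (hours) for four incidents.
recalled = {"feeder-A": 12.0, "feeder-B": 30.0, "substation-C": 8.0, "riser-D": 48.0}
logged = {"feeder-A": 11.0, "feeder-B": 44.0, "substation-C": 8.5, "riser-D": 50.0}
print(flag_discrepancies(recalled, logged))  # ['feeder-B']
```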
Identifying and Measuring Factors Affecting Resilience in Extreme Events

Extreme events may be regarded as events which are rare, uncertain, and of potentially high and broad consequence (Stewart & Bostrom, 2002). There are some immediately obvious reasons to study resilience in the context of the response to extreme events. The performance of organizations in these situations is often at the boundary of their experience. The response is conducted by skilled individuals and organizations, who must make high-stakes decisions under time constraint (Mendonça & Wallace, 2007a). On the other hand, the boundaries of experience may be difficult to identify a priori (i.e., before the event has occurred) and perhaps even afterwards. It is very likely that unskilled individuals and organizations will participate in the response. The decisions taken during the response may be very difficult to evaluate, even after the event. Finally, the long lag times between events—coupled with the difficulties involved in predicting the location of events—can make pre-event monitoring impractical and perhaps impossible. When a disaster is sufficiently consequential (e.g., Category IV or V hurricanes, so-called strong earthquakes), public institutions may provide essentially unlimited buffering capacity in the form of personnel, supplies or cost coverage. On the other hand, non-extreme events that nonetheless test organizational resilience (i.e., those typically called crises) require that this buffering capacity reside within the impacted organization. In the extreme event situation, then, buffering capacity is essentially unlimited. The remainder of this section therefore offers preliminary thoughts on the measurement of margin, tolerance, and flexibility/stiffness (cross-scale interactions will be discussed briefly in the context of flexibility/stiffness).
Margin

System boundaries may be said to represent both limits of performance (e.g., person-hours available for assignment to a task within the system) and the borders that separate an organization from the outside world (e.g., entry and exit points for the products associated with the system). For all but the simplest systems, multiple boundaries of both types will be present, requiring organizations to reckon their performance along multiple (sometimes conflicting) dimensions. Measuring the margin of a system, then, requires an approach that acknowledges these dimensions, along with possible trade-offs among them. Given the nature of extreme events, as well as their ability to impact system performance, the dimensionality of this assessment problem poses a considerable challenge to the measurement of margin.
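One simplification, offered here only as a sketch, is to collapse the multiple dimensions into a conservative scalar: the minimum normalized headroom across whichever boundary dimensions can be measured. This hides the trade-offs just noted, but makes the dimensionality problem explicit. The dimension names and values below are hypothetical.

```python
def scalar_margin(current: dict[str, float], limits: dict[str, float]) -> float:
    """Conservative scalar margin: the minimum normalized headroom
    (1 - current/limit) across all boundary dimensions for which both a
    current value and a limit are known. A value near zero means the
    system is operating at one of its performance boundaries."""
    headrooms = [1.0 - current[dim] / limits[dim]
                 for dim in limits if dim in current]
    if not headrooms:
        raise ValueError("no boundary dimensions to assess")
    return min(headrooms)

# Hypothetical values for two of the boundary types named above:
# person-hours committed vs. available, and power delivered vs. deliverable.
current = {"person_hours": 930.0, "transmission_mw": 380.0}
limits = {"person_hours": 1000.0, "transmission_mw": 400.0}
print(f"margin: {scalar_margin(current, limits):.2f}")  # 0.05, set by transmission headroom
```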
Tolerance

Like margin, tolerance refers to boundary conditions of the system. In this case, however, the concept describes not the performance of the system but rather how that performance is achieved: that is, how the people, technologies and processes of the system function. In measuring margin, a chief problem is paucity of data; in measuring tolerance, the challenge is to develop process-level descriptions of organizational behavior. For example, this might entail pre- and post-event comparisons of communication and decision making processes at the individual, group and organizational levels. Given the rarity of extreme events, cross-organizational comparisons may not be valid beyond a very limited number of organizations.
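A minimal sketch of such a pre- and post-event comparison follows, assuming a process-level indicator (here, hypothetical hourly message counts from an operations log) observed in windows before and after an initiating incident. The indicator, the data and the function name are illustrative assumptions.

```python
from statistics import mean

def process_shift(pre_event: list[float], post_event: list[float]) -> float:
    """Relative change in a process-level indicator between pre-event and
    post-event observation windows. Large shifts in how work is done
    (rather than in what is produced) are the kind of signal a tolerance
    measure would need to capture."""
    baseline = mean(pre_event)
    return (mean(post_event) - baseline) / baseline

# Hypothetical hourly message counts from an operations log, before and
# after an initiating incident (illustrative numbers only).
pre_counts = [14, 11, 16, 12, 13]
post_counts = [41, 55, 48, 62, 50]
print(f"communication rate shift: {process_shift(pre_counts, post_counts):+.0%}")
```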
Flexibility/Stiffness

The literature on organized response to disaster has shown the importance of planning (Drabek, 1985; Perry, 1991) to organizational capacity to respond to extreme events, but it has also shown that flexibility and an ability to improvise remain crucial in mitigating losses during response (Kreps, 1991; Turner, 1995). Indeed, the literature on emergency response is replete with examples of how response personnel have improvised social interaction (Kreps & Bosworth, 1993), behavior (Webb & Chevreau, 2006) and cognition (Vidaillet, 2001; Mendonça & Wallace, 2003; Mendonça, 2007) in seeking to meet response goals. Yet the measurement of flexibility and improvisation has concentrated on product-related constructs, such as the perceived degree of effectiveness and creativity in the response. Only recently have there been attempts to develop process-related measures, and these are limited to cognitive and behavioral constructs. The final factor thought to contribute to resilience is cross-scale interactions, which relates closely to decision making and communication strategies, and therefore to the cognitive processes that underlie these strategies. Cross-scale interactions generally refer to within-organization interactions, though it may be that cross-organization interactions are also relevant to resilient performance. These two factors are complementary: flexibility/stiffness refers to organizational restructuring, while cross-scale interactions may be seen as a particular type of restructuring, one in which new processes emerge during the response. Consequently, cross-scale interactions will be discussed in the context of flexibility/stiffness.
Resilient Performance in Practice

As suggested above, development of concepts concerning the factors that contribute to resilience has progressed to the point where it is now appropriate to consider how these factors may be measured. This section reports on the development of measures for margin, tolerance and flexibility/stiffness as manifested during the response to the 2001 attack on the World Trade Center (WTC). As a result of the attack, there were extensive disruptions to critical infrastructure systems in New York City, leading to local, national and international impacts. Some disruptions were isolated to single systems, while others cascaded across systems, clearly demonstrating interdependencies that existed either by design (e.g., power needed to run subway system controls) or that emerged during and after the event itself (e.g., conflicting demands for common resources) (Mendonça & Wallace, 2007b). A number of studies have detailed the impact of the attack on critical infrastructure systems (O'Rourke, Lembo, & Nozick, 2003), as well as described some of the restoration activities of subsequent months. Damage to the electric power system was considerable, certainly beyond what had been experienced in prior events. It included the loss of 400 megawatts (MW) of capacity from two substations which were destroyed following the collapse of World Trade Center building 7, and severe damage to five of the feeders that distributed power to the power networks. Indeed, five of the eight electric power distribution networks in Manhattan were left without power. In total, about 13,000 customers lost power as a result of this damage. Restoration of this power was an immediate high priority for the city and, in the case of the New York Stock Exchange, the nation. Within the telecommunications infrastructure, the loss of power impacted a major switching station, backup emergency 911 call routing and consumer telephone service, all located within the building housing the center. The task of the company was to restore power to the building and recommence telecommunications services as quickly as possible. Taken together, these studies provide a means for understanding the link from initiating incidents (e.g., power outages), to disruptions (e.g., loss of subway service due to lack of power for signaling devices) and finally to restoration (e.g., the use of trailer-mounted generators for providing power to individual subway stations). The human side of both impacts and restoration, on the other hand, has not been nearly as well explored. Since resilience encompasses both human and technological factors, it is appropriate to consider how measures for both sets of factors may be defined and estimated in order to clarify the concept of resilience.
Method

Data collection activities associated with both studies may be characterized as initially opportunistic, followed by stages of focused attention to salient sources. An initial concern was simply how to gain access to these organizations. Existing contacts within both industries, combined with the support of the National Science Foundation, were instrumental in providing initial entrée. The brief for the project was to study organized response in the restoration of interdependent critical infrastructure systems. At the time the data were being collected (beginning in late 2001), few studies had addressed the role of the human managers of these systems, concentrating instead on technical considerations of design and management. There were therefore few exemplar studies – and very little direct methodological guidance – on how to proceed. Both studies therefore adopted a strongly interpretive approach to their evolving designs. Initial consultations with the companies responsible for the power and telecommunications infrastructures being studied were held in order to identify critical incidents, particularly those which involved highly non-routine responses. Direct consultations were held with upper management-level personnel, who then contacted individuals involved with the candidate incidents in order to assess whether they would be able (or available) to take part in the study. This occasionally led to additional, clarifying discussions with management, usually to investigate expanding the respondent pool. A considerable amount of
time went into developing a respondent pool that spanned the levels of the organization involved in the incident. For example, study participants ranged from senior vice presidents to line workers (e.g., those who conducted the physical work of repairing the infrastructures). For the power company, the initial consultations led to a set of eight incidents. For the telecommunications company, various incidents were discussed, but only one could be investigated given the time commitments of interview subjects, many of whom were still deeply involved in other restoration activities. In hindsight, timely data collection was paramount to the success of both studies. From a practical perspective, data collected so soon after the fact were fresh – something particularly desirable for data drawn from human subjects. It also provided an opportunity for the study team to demonstrate that it could collect data without causing unreasonable perturbations in the work patterns of study participants. Data collection methods reflected both the goals of the project and the perspectives of the four investigators, two of whom were involved in the study of human-machine systems, and two of whom were involved in the technical design aspects of infrastructure systems. Discussions amongst the investigators produced agreement on the salience to the study of core concepts from systems engineering (e.g., component and system reliability, time to restoration) as well as human psychology (e.g., planning, decision making, feedback). Given the range of core concepts, the points of contact at the companies were asked to request that study participants come prepared to discuss the incident, and to bring with them any necessary supplementary materials (e.g., maps, drawings). Suggestions on which supplementary materials to bring were sometimes made by the points of contact and the investigators. A detailed protocol for the interviews was provided to these points of contact for review and comment. With a few exceptions, the Critical Decision Method (Flanagan, 1954; Klein, Calderwood, & MacGregor, 1989) was used for the interviews, with two interviewers and one or two respondents. One interviewer asked the probe questions (Klein et al., 1989); a second took notes (with two exceptions, it was not possible to audio- or video-record the interviews). The critical decision method (CDM) is a modified version of the critical incident technique (Flanagan, 1954) and, like other cognitive task analysis methods, is intended to reveal information about
human knowledge and thinking processes during decision making, particularly non-routine decision making (Klein et al., 1989). It has been used in a wide variety of studies (see Hoffman, Crandall, & Shadbolt, 1998 for a review). The five stages of the procedure were completed in all interviews (i.e., incident identification and selection; incident recall; incident retelling; time line verification and decision point identification; progressive deepening and the story behind the story). However, not all interviews were equally detailed. In practice—and following guidance in the use of this method—the choice of probe questions asked of respondents was determined mainly by study objectives, but also by exigency. For example, all respondents were asked whether the incident fit a standard or typical scenario, since the study was strongly informed by work on organizational improvisation, and plans may be highly relevant as referents for improvised action. On the other hand, probe questions concerning mental modeling were never asked, since the investigators had decided early on that formal modeling of the reasoning processes of respondents would not be feasible for the project. At other times, respondents simply did not have the time to commit to a full-scale interview. Most respondents appeared highly cooperative. The investigators emphasized throughout their discussions with points of contact and interview participants that logs of system behavior were vital to the study design, since these provided the closest possible approximation of the behavior of technical systems during response and recovery activities. Materials brought to interviews included system maps, engineering drawings, photos, field notes and meeting minutes. These materials were sometimes used extensively; in fact, interviews which did not include such materials tended to be less illuminating than those in which they were used. When these materials were present, it was far easier to keep the interviews grounded in the lived experiences of participants. Finally, other logs were collected with the help of the points of contact. These were reviewed with company personnel for completeness and accuracy, and any identified deficiencies were noted.
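For readers unfamiliar with the method, the sketch below encodes the five CDM stages listed above as a simple bookkeeping structure for tracking stage coverage and probes across sessions. The class, the coverage metric and the paraphrased probe are invented illustrations, not part of the published method.

```python
from dataclasses import dataclass, field

# The five CDM stages, as listed in the text (Klein et al., 1989).
CDM_STAGES = (
    "incident identification and selection",
    "incident recall",
    "incident retelling",
    "time line verification and decision point identification",
    "progressive deepening and the story behind the story",
)

@dataclass
class CdmSession:
    """Minimal record of one interview session. The probe shown below is
    paraphrased from the text; the class itself is an invented bookkeeping
    aid, not part of the published method."""
    respondent_role: str
    stages_completed: set[str] = field(default_factory=set)
    probes_asked: list[str] = field(default_factory=list)

    def stage_coverage(self) -> float:
        """Fraction of the five stages completed in this session."""
        return len(self.stages_completed & set(CDM_STAGES)) / len(CDM_STAGES)

session = CdmSession(respondent_role="distribution engineer")
session.stages_completed.update(CDM_STAGES)
session.probes_asked.append("Did the incident fit a standard or typical scenario?")
print(f"stage coverage: {session.stage_coverage():.0%}")  # 100%
```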
At the conclusion of each interview, participants filled out a brief questionnaire on their background and experience. A second questionnaire, adapted from work by Moorman and colleagues on improvisation by organizations (Moorman & Miner, 1998; Miner, Bassoff, & Moorman, 2001), was used to measure organizational improvisation, organizational memory and the evaluation of the response. Finally, a small number of supplementary materials, such as newspaper and other reports from the popular press, were sometimes used by the investigators to provide context on the activities of the two companies. The distribution of the different types of data across the two studies is given in Table 1.
Results

Data collection activities began in late 2001 and continued throughout 2002. In total, eleven in-depth interviews were conducted (ten for electric power, one for telecommunications), along with approximately twenty shorter sessions, typically with one interview subject per session. Other data sources are described below.
Infrastructure     | Organizational Units of Interview Participants | Data Sources
Electric Power     | engineering; emergency management; electric operations; energy services; distribution engineering | interviews; log data; meeting notes; after-action reports; photographs; drawings; questionnaires
Telecommunications | network operations | interview; questionnaire; after-action reports; photographs

Table 1: Summary of organizational units and data sources in the studies
Overview of Restoration Activities

The power company engaged in two inter-related strategies for restoring power: connecting trailer-mounted portable generators to provide spot power, and installing temporary feeder lines – called shunts – to connect live networks to dead ones. The telecommunications company also relied upon trailer-mounted portable generators. An overview of these three critical decisions is presented
before illustrating the development and implementation of measures of margin, tolerance and flexibility/stiffness. For the power company, the loss of distribution capacity was far beyond the scale of previous incidents. Soon after the attack, the company began attempting to procure trailer-mounted generators in order to provide spot power to critical customers. By 12 September, it was clear that the amount of time and effort required to secure, install and operate these generators would be considerable. As a result, the company decided to create a Generator Group, composed of individuals from various parts of the organization, which would have primary responsibility for work in this area. The second part of the company's strategy was the use of shunts – cables with 13 kilovolt (kV) capacity – to make connections between dead networks and live ones. This task was handled by existing units in the organization (such as Distribution Engineering and Electric Operations). Procedures executed by these units included determining shunt routes through the city and coordinating pick-ups (i.e., the actual connecting of the shunts to the networks). For the telecommunications company, the loss of power to the building would normally have triggered a standard operating procedure to connect a generator to the building via hookups in the basement. However, water and debris in the basement made this procedure impossible to execute. A decision was then made to connect cable from the diesel generators directly to the floors which they were to power, but, according to an interview respondent, "there's no good way of doing that, because it's all hard wired in an elaborate system of switches." The solution required cutting riser cables above the basement and attaching them to portable generators with between 1 and 2.5 megawatts of capacity. (Risers are cables within the building that are normally used to transmit power throughout the building.) The task of connecting the cables required considerable care in order to ensure that cables were properly matched. Generators were running by Friday, 14 September. A gradual transition was then made to commercial power, essentially resolving the incident (though generators remained on stand-by and were periodically tested). (See Mendonça, 2007 for a complete discussion of the case.)
Measuring Resilience

Examining the initial definitions for margin and tolerance, it is clear that – in order to estimate these factors – it is necessary to identify system boundaries. In the electric power study, study participants and members of the research team offered numerous suggestions for a candidate set of boundaries. For example, staff utilization represents the extent to which personnel in an organization are utilized. Proxy measures for this construct had been monitored (e.g., sign-in sheets for on-duty employees), but ultimately could not be made available to the research team for reasons of employee confidentiality. Another example is transmission system capacity, which represents the amount of power that could be delivered over existing infrastructure. A complete picture of transmission system capacity was not available. It was, however, possible to estimate the incremental contributions made to transmission capacity by the installation of generators and shunts. A more sophisticated measure might combine capacity measures with estimates of anticipated load from customers. In the telecommunications study, system boundaries were considerably more difficult to discern. The main reasons were the study's reliance on a limited range of participants and the highly localized nature of the incident: the case study concerned the restoration of power to a single building. It should also be mentioned that restoration activities were still being conducted during site visits by the research team, and there were therefore limits on the amount of time that participants could devote to supporting data collection. Resource utilization was discussed in terms of managing demand, since there was sufficient slack in the system to allow services that were normally provided through the facility to be provided through other facilities. The amount of load on the network was also discussed in this context.
Factor                | Power | Telecommunications
Margin/Tolerance      | transmission capacity; network stability; network load; resource utilization | resource utilization; network load
Flexibility/Stiffness | restructuring of organizational units; development of new procedures | development of new procedures; recognition of unplanned-for contingencies; identification of opportunities for renewal
Table 2: Candidate measures for factors contributing to resilience

As with other extreme events, then, both margin and tolerance are difficult to evaluate, since organizational boundaries are difficult to identify. In the power restoration case, a key observation is that the magnitude of the restoration problem far exceeded that of previous experience. Indeed, while generators had been part of previous restorations, the company had never before needed this quantity in such a short time. Using the available data for the generator strategy, it does appear that the path to restoration – as indicated by the cumulative number of generators connected to the network – followed an S-shape, similar to that of a prototypical learning curve. In the case of the shunt strategy, the number of feeder connections made per day suggests a straight path to achieving sufficient interim capacity. The nature of flexibility/stiffness in the power restoration case is suggested by the company's decision to create a new organizational structure—the Generator Group—almost immediately after the attack in order to manage generator procurement and use. The group was dissolved once the generators ceased to be a crucial part of the restoration plan. In other interviews (not discussed here), respondents stated that some existing organizational units improvised their roles, undertaking tasks that were within the capability of the organization but which were not in the usual range of activities for the units themselves. This phenomenon has been amply demonstrated in the response to many other events (Webb, 2004). Flexibility/stiffness is also reflected in the major restructuring of the physical network, resulting in a new design for the distribution system (i.e., one using three larger networks instead of eight smaller ones). In contrast to the generator situation, this involved no major restructuring of organizational units. In the case of telecommunications restoration, evidence of flexibility is found in the development of new procedures. For example, during the interview the manager emphasized the limited usefulness of plans during the response. He stated, "If I'd had to go to anything other than my head or someone else's it wouldn't have worked. You don't pull a binder off the shelf on this one. You certainly wouldn't grab a laptop and go into something." Indeed, "no one to my knowledge went into a system quote unquote that gave them an answer in terms of what to do." Yet on the other hand, he stated earlier in the interview that the decision to use diesel generators was made in a "split-second." Similarly, the decision to connect the generators to the risers was "one of those decisions that truly took milliseconds. I said, OK we have to get the building risers – meaning the hard-wired cables – cut them and splice cables from the street into the riser."
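Returning to the restoration trajectories noted above, the S-shaped generator curve lends itself to a simple quantitative check: fitting a logistic (learning-curve-like) function to the cumulative counts, with the shunt strategy expected to show a near-constant daily rate instead. The sketch below uses hypothetical numbers, since the chapter reports the qualitative shape rather than the underlying data, and assumes SciPy is available.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, capacity, rate, midpoint):
    """Prototypical S-curve: cumulative units connected by day t."""
    return capacity / (1.0 + np.exp(-rate * (t - midpoint)))

# Hypothetical cumulative daily counts of generators connected
# (illustrative values only, chosen to show the qualitative S-shape).
days = np.arange(1, 11, dtype=float)
connected = np.array([2, 5, 11, 22, 38, 52, 61, 66, 68, 69], dtype=float)

# Fit the three logistic parameters; p0 is a rough initial guess.
(capacity, rate, midpoint), _ = curve_fit(logistic, days, connected,
                                          p0=[70.0, 1.0, 5.0])
print(f"fitted capacity ~ {capacity:.0f} generators, midpoint ~ day {midpoint:.1f}")
```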
Discussion

By considering these studies with respect to the principles for evaluating interpretive field studies, recommendations may be made for how to proceed in future studies of power and telecommunications restoration. Many of the principles given by Klein and Myers (1999) speak directly to the challenges involved in researching organized response to extreme events. For example, there were close interactions between researchers and subjects throughout the project. Indeed, some individuals in the organizations were both subjects and points of contact, and it was through interactions with these individuals that potential data sources and subjects were identified. Both companies were clearly interested in seeing the results of this work and in looking for ways to apply them to their organizations. The principle of suspicion applied here to both sides of the relationship between researchers and subjects. For example, interim reports to key points of contact enabled both groups to look for evidence of bias or distortion. In practice, the principle of multiple interpretations can be difficult to follow in situations such as these, since there is a natural tendency in after-action reporting to construct coherent, even logical or linear, narratives to explain the observed sequence of events. Finally, numerous personnel – particularly those who had been with the company for extended periods of time – discussed the relevance of prior experience to their efforts towards restoration. While the inclusion of these observations may enrich the study results, in practice it was difficult to apply the principle of suspicion to them, since they drew upon incidents that had occurred decades earlier.
A variety of approaches to measuring the factors is evident from the cases and subsequent discussion. In order of decreasing granularity, they may be described as follows:

• Output measures that describe the resilient performance (e.g., mean time to restoration). These offer limited insights.
• Measures that describe the impact of contextual factors on resilient performance. This approach is only useful to the extent that these contextual factors can be measured, and it does not unveil process-level phenomena.
• Process measures that show the observed relationship between inputs and outputs, perhaps including explanations of the impact of contextual factors.
• Model-based explanations, which make ongoing predictions about the processes that translate (observed) inputs into (observed) outputs.

Associated with any of these approaches are two threats to validity. First, post-event reports by response personnel are notoriously unreliable and potentially invalid, particularly when cognitive demands are unusually high. To achieve consistency (i.e., internal validity), it is often necessary to triangulate the observations of numerous participants, and almost certainly to give considerable weight to data collected concurrently with the occurrence of the phenomena. Second, external validity is necessarily limited for all but a few cases. To achieve generalizability (i.e., some measure of external validity), it will probably be necessary to measure phenomena associated with these factors at a much lower level and then aggregate the results – for example, beginning with the study of individual processes and aggregating these, rather than looking first for results at the group or organizational level.
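A sketch of what such bottom-up aggregation might look like follows: hypothetical individual-level process scores are averaged into unit-level values, rather than being measured at the group or organizational level directly. The scores, unit names and averaging rule are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

def aggregate_upward(person_scores: dict[str, float],
                     unit_of: dict[str, str]) -> dict[str, float]:
    """Bottom-up aggregation: individual-level process measures (here a
    hypothetical per-person improvisation score) are averaged into
    unit-level values, rather than being measured directly at the group
    or organizational level."""
    by_unit: defaultdict[str, list[float]] = defaultdict(list)
    for person, score in person_scores.items():
        by_unit[unit_of[person]].append(score)
    return {unit: mean(scores) for unit, scores in by_unit.items()}

# Hypothetical scores and unit assignments (illustrative only).
scores = {"p1": 0.8, "p2": 0.6, "p3": 0.9, "p4": 0.4}
units = {"p1": "generator group", "p2": "generator group",
         "p3": "distribution engineering", "p4": "distribution engineering"}
print(aggregate_upward(scores, units))
```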
Concluding Comments

A number of observations from the conduct of this study may be used to improve the quality of further research into measuring the factors thought to contribute to resilience. In the case of power restoration, margin and tolerance have been assessed according to the behavior of the physical system. Yet even with such assessments, it is difficult – perhaps even impossible – to evaluate performance in this case against some theoretical optimum. Even post-event assessments are challenging, leading to the use of measures of relative performance or efficiency. Engineering estimates of anticipated system performance tend to be heavily informed by expert judgment rather than historical data (National Institute for Building Sciences, 2001). Evidence of flexibility in the power case is found in the company's efforts at revising its organizational structure, but also in revising its activities and the assignment of people to those activities. The design of the physical system may have helped determine the organizational structure, a question that might be further investigated through other studies of physical change in infrastructure systems. Given the practical difficulties of developing and using measures for assessing – and eventually engineering – organizational resilience in response to extreme events, it is reasonable to plan on achieving understanding through the use of multiple methods. Given the importance of field studies to these efforts, the evaluation of results may benefit from an assessment of the methods of these studies against principles for evaluating interpretive field studies. This study has illustrated the possible benefits (and complications) of triangulating observations using both quantitative and qualitative methods. Yet it is certainly the case that a broader and more comprehensive range of observation techniques and analytic methods will be necessary in order to inform theory about how to engineer resilience in power and telecommunications infrastructures. Access to a wide variety of pre- and post-event data sources may facilitate this process, but this access must be negotiated through organizational gatekeepers – further emphasizing the need to embrace the principles discussed here. There are certainly opportunities for the development of technologies that may support measurement concerning organizational boundaries. Concurrently, better approaches must be developed for capturing data associated with low-level phenomena in order to support analysis of organizational resilience at a broad level.
References

Boudreau, M. C., Gefen, D., & Straub, D. W. (2001). Validation in Information Systems Research: A State-of-the-Art Assessment. MIS Quarterly, 25(1), 1-16.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and Validity Assessment. Newbury Park, CA: Sage Publications.
Denzin, N. K. (1978). The Research Act: A Theoretical Introduction to Sociological Methods. New York: McGraw-Hill.
Drabek, T. (1985). Managing the Emergency Response. Public Administration Review, 45, 85-92.
Fiksel, J. (2003). Designing Resilient, Sustainable Systems. Environmental Science and Technology, 37, 5330-5339.
Flanagan, J. C. (1954). The Critical Incident Technique. Psychological Bulletin, 51, 327-358.
Hoffman, R. R., Crandall, B., & Shadbolt, N. (1998). Use of the Critical Decision Method to Elicit Expert Knowledge: A Case Study in the Methodology of Cognitive Task Analysis. Human Factors, 40(2), 254-276.
Hollnagel, E., & Woods, D. (2006). Epilogue: Resilience Engineering Precepts. In E. Hollnagel, D. Woods, & N. Leveson (Eds.), Resilience Engineering: Concepts and Precepts. Aldershot, UK: Ashgate.
Jick, T. D. (1979). Mixing Qualitative and Quantitative Methods: Triangulation in Action. Administrative Science Quarterly, 24(4), 602-611.
Klein, G., Calderwood, R., & MacGregor, D. (1989). Critical Decision Method for Eliciting Knowledge. IEEE Transactions on Systems, Man and Cybernetics, 19, 462-472.
Klein, H. K., & Myers, M. D. (1999). A Set of Principles for Conducting and Evaluating Interpretive Field Studies in Information Systems. MIS Quarterly, 23(1), 67-94.
Kreps, G. A. (1991). Organizing for Emergency Management. In T. E. Drabek, & G. J. Hoetmer (Eds.), Emergency Management: Principles and Practice for Local Governments. Washington, DC: International City Management Association, 30-54.
Kreps, G. A., & Bosworth, S. L. (1993). Disaster, Organizing and Role Enactment: A Structural Approach. American Journal of Sociology, 99(2), 428-463.
Mendonça, D. (2007). Decision Support for Improvisation in Response to Extreme Events. Decision Support Systems, 43(3), 952-967.
Mendonça, D., & Wallace, W. A. (2003). Studying Organizationally-situated Improvisation in Response to Extreme Events. Newark, NJ: New Jersey Institute of Technology.
Mendonça, D., & Wallace, W. A. (2007a). A Cognitive Model of Improvisation in Emergency Management. IEEE Transactions on Systems, Man, and Cybernetics: Part A, 37(4), 547-561.
Mendonça, D., & Wallace, W. A. (2007b). Impacts of the 2001 World Trade Center Attack on New York City Critical Infrastructures. Journal of Infrastructure Systems, 12(4), 260-270.
Miner, A., Bassoff, P., & Moorman, C. (2001). Organizational Improvisation and Learning: A Field Study. Administrative Science Quarterly, 46(2), 304-337.
Moorman, C., & Miner, A. S. (1998). Organizational Improvisation and Organizational Memory. Academy of Management Review, 23(4), 698-723.
National Institute for Building Sciences (2001). Earthquake Loss Estimation Methodology HAZUS99 SR2, Technical Manuals I-III. Washington, DC: National Institute for Building Sciences.
O'Rourke, T. D., Lembo, A. J., & Nozick, L. K. (2003). Lessons Learned from the World Trade Center Disaster about Critical Utility Systems. In J. L. Monday (Ed.), Beyond September 11th: An Account of Post-Disaster Research. Boulder, CO: Natural Hazards Research and Applications Information Center, 269-290.
Orlikowski, W. J., & Baroudi, J. J. (1991). Studying Information Technology in Organizations: Research Approaches and Assumptions. Information Systems Research, 2(1), 1-28.
Perry, R. (1991). Managing Disaster Response Operations. In T. Drabek, & G. Hoetmer (Eds.), Emergency Management: Principles and Practice for Local Government. Washington, DC: International City Management Association, 201-224.
Stewart, T. R., & Bostrom, A. (2002). Extreme Event Decision Making: Workshop Report. Albany, NY: University at Albany.
Turner, B. A. (1995). The Role of Flexibility and Improvisation in Emergency Response. In T. Horlick-Jones, A. Amendola, & R. Casale (Eds.), Natural Risk and Civil Protection. London: E. & F. Spon, 463-475.
Vidaillet, B. (2001). Cognitive Processes and Decision Making in a Crisis Situation: A Case Study. In T. K. Lant, & Z. Shapira (Eds.), Organizational Cognition: Computation and Interpretation. Mahwah, NJ: Lawrence Erlbaum Associates, 241-263.
Walsham, G. (1995). Interpretive Case Studies in IS Research: Nature and Method. European Journal of Information Systems, 4(2), 74-81.
Webb, G. R. (2004). Role Improvising during Crisis Situations. International Journal of Emergency Management, 2(1-2), 47-61.
Webb, G. R., & Chevreau, F.-R. (2006). Planning to Improvise: The Importance of Creativity and Flexibility in Crisis Response. International Journal of Emergency Management, 3(1), 66-72.
Weick, K. E. (1985). Systematic Observational Methods. In G. Lindzey, & E. Aronson (Eds.), The Handbook of Social Psychology. New York: Random House, 567-634.
Woods, D. (2006). Essential Characteristics of Resilience. In E. Hollnagel, D. Woods, & N. Leveson (Eds.), Resilience Engineering: Concepts and Precepts. Aldershot, UK: Ashgate.