
RED TEAM PERFORMANCE FOR IMPROVED COMPUTER SECURITY

Sara Kraemer+, Pascale Carayon+ and Ruth Duggan*

+ Department of Industrial Engineering, Center for Quality and Productivity Improvement, University of Wisconsin-Madison, 610 Walnut Street, 575 WARF, Madison, WI 53726. Tel: 1-608-263-2520. Fax: 1-608-263-1425. Email: [email protected] / [email protected]

* Sandia National Laboratories, PO Box 5800, MS-1375, Albuquerque, NM 87185-1375. Tel: 1-505-844-9320. Fax: 1-505-284-9043. Email: [email protected]

ABSTRACT

This research attempts to develop a human factors understanding of red team assessment strategies in computer and information security. Red teaming is an advanced form of assessment that can be used to identify weaknesses in a variety of security systems. The purpose of this research is to identify and define the various dimensions of red team effectiveness with the aim of improving red team performance. A study of a red team was conducted in collaboration with Sandia National Laboratories' Information Design Assurance Red Team (IDART). The design of the study included semi-structured individual interviews and focus groups with red team members and observation of red team practices. The analysis yielded various dimensions of red team effectiveness from the customer, management, individual, and team member perspectives.

INTRODUCTION AND BACKGROUND

Red teaming in computer and information security

Adversaries of computer and information systems can and will plan and execute strategic attack campaigns against the United States (Tinnel, Saydjari, & Farrell, 2002). The Department of Defense has recognized that red teaming has long been a valuable, if underutilized, tool for deepening the understanding of the adversaries the United States faces in the war on terrorism (Defense Science Board Task Force, 2003). In particular, red teaming is valuable for understanding adversaries' capabilities and potential responses to United States initiatives (Defense Science Board Task Force, 2003). To expand and improve the experience base of system defenders, a developed understanding of the strategies and tactics employed by red teams is critical to warding off attacks on computer and information systems. Red teams reveal weaknesses in computer and information security systems. The use of red teams (Schudel & Wood, 2000a) and so-called "ethical hacking" (see, for example, Palmer, 2001) are important mechanisms for detecting system

vulnerabilities and hence enhancing security, since they allow system defenders to understand system weaknesses from the adversary's perspective. Whereas 'daily security enforcement' may work for a while, red team attacks, and correction of the defects they reveal, are necessary for organizations' computer and information security systems (Computer Science and Telecommunications Board-National Research Council, 2002). This paper specifically examines Sandia National Laboratories' Information Design Assurance Red Team (IDART). The objective of this study was to identify measures of red team performance with the purpose of improving red team performance. The knowledge obtained by red teams is especially beneficial when the target system is still in development and designers can readily effect improvements (Wood & Duggan, 1999). The red team approach is based on the premise that an analyst who attempts to model an adversary can find systemic vulnerabilities in a computer and information system that would otherwise go undetected. Red teams seek opportunities to combine system, organizational, and architectural vulnerabilities in order to execute a successful attack. Sandia's red team has developed a formal methodology of assessment

(Wood & Duggan, 1999). This method includes team building, system assessment and attack, and reporting to the customer. A significant portion of a red team project is system assessment. This includes gathering source information, describing the system, creating an objective purpose, identifying critical success factors, formulating functional, spatial, temporal, system lifecycle, and consequence-based views of the system, identifying candidate vulnerabilities for attack, and formulating attack plans. Specific adversary goals resulting in negative consequences are called "flags"; the red team has "captured the flag" when it has successfully accomplished those goals. The customers of Sandia's red teams come from the private sector, including banking and finance, information technology, manufacturing, and e-commerce, as well as the public sector, including the Departments of Defense, Energy, Interior, Homeland Security, and State. There is a need for a more comprehensive understanding of red teams. Not only is there very high demand for red team assessments, but little research has been done to understand the effectiveness of red teams, owing to red teams' lack of availability, accessibility, and funding for team performance analysis. Teams are more likely to be successful and efficient when they are proficient in recording and filing information and function in a systematic process (Lynn & Reilly, 2000), yet the knowledge obtained by red teams is often recorded only in project-specific confidential reports or left to the memories of red team members. Some research has been done to understand red team performance (Carayon, Duggan, & Kraemer, 2003), but more and better information is needed to define and measure red team performance in order to have a more meaningful impact on the mitigation of vulnerabilities, security breaches, and attacks.
Specifically, formalized measures of red team performance are needed in order to monitor performance over time, track the varying factors affecting performance, and indicate types of interventions for performance improvement (e.g., training, feedback). The purpose of this study was to describe preliminary measures of red team performance in order to improve red team performance.

Research on red teaming and teams

The existing human factors research on red teaming is extremely limited. Cognitive task analyses of individuals and groups of hackers who have attacked networks and websites have been used to reveal how hackers select targets, distribute and share responsibilities, and conduct actual attacks (McCloskey & Stanard, 1999). Experimental research has measured the effects of deception defenses on attacks against computer systems and networks (Cohen, Marin, Sappington, Stewart, & Thomas, 2001). Experimentation conducted or funded by the Defense Advanced Research Projects Agency (DARPA) shows the use of red teams in evaluating different defense mechanisms (Kewley & Bouchard, 2000; Schudel & Wood, 2000b) such as survivability solutions (Pal, Atighetchi, Webber, Schantz, & Jones, 2003). These experiments have not directly evaluated red team performance, but have identified a

number of factors that may contribute to red team performance, such as experience or proficiency (Pal et al., 2003) and learning (Kewley & Bouchard, 2000; Pal et al., 2003). They have also identified some red team behaviors, such as usage of time, work process, and risk perception (Schudel & Wood, 2000b). Research has been done to develop measures of adversary work factor, or red team work factor (Schudel & Wood, 2000b; Wood & Bouchard, 2001). Red team work factor measures the amount of effort required by a red team (an adversary) to accomplish an attack, i.e., to capture a flag (Wood & Bouchard, 2001). Experiments conducted by DARPA show that red team work factor can be useful in comparing different system configurations (Schudel & Wood, 2000b), especially if red team capability varies. Capability may include different red team behaviors, which depend upon the team's preparation, training, and talents. However, red team work factor may be more a measure of red team capability than of system improvement (Schudel & Wood, 2000b). Red team work factor may still be useful in measuring the effectiveness of a red team. In particular, comparing red team work factor between different exercises by the same team may provide information on how different team characteristics affect it (Wood & Bouchard, 2001). When trying to identify factors that contribute to red team performance, red team work factor may be one measure of red team performance that can be correlated with various team characteristics. In order to understand what is required for effective teamwork, it is first necessary to define a team. A team is a set of two or more individuals who interact interdependently and adaptively toward a common goal or objective (Cannon-Bowers & Salas, 1998). There are different types of teams, and Sandia's red team fits most closely with the work team definition.
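The work-factor comparison described above can be sketched in a few lines of code. This is a hypothetical illustration only: the exercise records, field names, and hour values below are invented for the example and are not data from the DARPA experiments or from this study.

```python
# Hypothetical sketch of a red team work factor comparison in the spirit
# of Schudel & Wood (2000b) and Wood & Bouchard (2001). All records and
# values are invented for illustration.

# Each record: one red team exercise against one system configuration,
# with the effort (person-hours) needed to capture the flag.
exercises = [
    {"team": "A", "config": "baseline", "hours_to_flag": 40},
    {"team": "A", "config": "hardened", "hours_to_flag": 110},
    {"team": "B", "config": "baseline", "hours_to_flag": 55},
    {"team": "B", "config": "hardened", "hours_to_flag": 130},
]

def mean_work_factor(records, config):
    """Average person-hours to capture the flag for one configuration."""
    hours = [r["hours_to_flag"] for r in records if r["config"] == config]
    return sum(hours) / len(hours)

baseline = mean_work_factor(exercises, "baseline")  # 47.5
hardened = mean_work_factor(exercises, "hardened")  # 120.0
print(f"baseline: {baseline:.1f} h, hardened: {hardened:.1f} h")
```

Under the assumption of comparable team capability across runs, the higher mean work factor against the hardened configuration would suggest that the defenses raised the adversary's cost; as the literature cautions, if capability differs between runs, the metric reflects the team as much as the system.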
Work teams are continuing work units responsible for producing goods or providing services; their membership is typically stable, full time, and well defined (Cohen & Bailey, 1997). This definition also includes self-managing work teams whose members are cross-trained in a variety of skills relevant to the tasks they perform. There is no single measure of team performance that is appropriate for all purposes. A key distinction in team performance assessment is outcome versus process. Although teams are valued in large part for their outcomes, outcome measures often contain variance attributable to factors other than teamwork (Brannick & Prince, 1997). Team process measures may be closer to a true picture of team functioning, but a comprehensive measure of team performance needs to contain elements of both process and outcome (Brannick & Prince, 1997). The primary types of team performance measures are: (1) descriptive measures (i.e., process), which describe what is happening at any given time and seek to document individual and team behaviors; (2) evaluative measures (i.e., outcome), which judge performance against identifiable standards and serve to answer questions of effectiveness; and (3) diagnostic measures (i.e., process), which seek to identify causes of behavior and question how and why things occurred as they did (Paris, Salas, & Cannon-Bowers, 2000). Diagnostic measures contribute inputs to the feedback process necessary to improve subsequent performance (Salas & Cannon-Bowers, 1997).

METHOD

Due to the lack of research on red team effectiveness and the exploratory nature of this study, the study design was qualitative, consisting of the following elements: fifteen semi-structured individual interviews and two focus groups with red team members, observation of a red team group training session, attendance at a Sandia technical presentation, personal observation of site surroundings, and analyses of documents pertaining to red team work. The data were collected by the first author. Individual interviews and focus groups used the same open-ended interview guide (see Appendix 1). The individual interviews lasted approximately one hour and the focus groups lasted approximately two and one half hours. One focus group and eleven interviews were audio-recorded; one focus group and four individual interviews were not. Personal notes of these interactions were taken by the first author, and the audio recordings were transcribed.

The IDART program consists of core, non-core, and matrix red team members. Core red team members are system analysts who regularly participate in red team projects and whose full-time job is within the IDART assessment department. Non-core members are system analysts who semi-regularly participate in red team projects and are not members of the IDART assessment department. Matrix members rarely participate in red team projects; they are accessed for specific expertise needed for the systems under consideration. For example, a red team examining a biological and chemical agent detection system could include experts on biological and chemical warfare agents. These members are drawn from the pool of experts within the Sandia organization. Individual interviews were conducted with eleven core members, three non-core members, and two matrix members. The first focus group included six core members and the second focus group consisted of seven core members. The transcribed notes and interviews were analyzed by coding the themes of interviews and observations using the qualitative software package QSR NVivo©. The coding structure consisted of nodes, each representing a defined category of red team effectiveness. When coded, a node held references to passages of text from the observation and interview data. In this paper, the analysis data on performance are reported and discussed.

RESULTS

Findings are reported as measurement category totals and the most frequently cited dimensions of measurement. The coding process resulted in 67 nodes, and the total number of comments coded was 95. The nodes were grouped into five major categories. The first four categories corresponded to four perspectives: (1) individual team members (12 comments); (2) the team as a whole (27 comments); (3) management (12 comments); and (4) customer (30 comments). These perspective categories were further stratified into three dimensions of team measurement: descriptive, evaluative, and diagnostic. Refer to Table 1 for a summary of the number of comments on the perspective categories of performance. The fifth category consisted of comments regarding difficulties in measuring red team effectiveness (14 comments). The measurement dimensions with the largest number of comments within each perspective category are reported below.

Table 1. Comments on red team performance

Perspective    Descriptive (Process)    Evaluative (Outcome)    Diagnostic (Process)    Total
Individual     0                        0                       12                      12
Team           0                        14                      13                      27
Management     11                       1                       0                       12
Customer       22                       8                       0                       30
Total          33                       23                      25                      81
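The comment tally behind counts like those in Table 1 can be reproduced with a short sketch. The individual coded records below are synthetic placeholders standing in for NVivo node references; they are constructed only so that their totals match the Individual and Management rows of the table.

```python
# Sketch of the comment tally behind Table 1. The per-comment records are
# synthetic; they are built to match the Individual and Management rows.
from collections import Counter

# One (perspective, measurement type) pair per coded comment.
coded_comments = (
    [("individual", "diagnostic")] * 12
    + [("management", "descriptive")] * 11
    + [("management", "evaluative")] * 1
)

cell_counts = Counter(coded_comments)               # table cells
row_totals = Counter(p for p, _ in coded_comments)  # per-perspective totals

print(cell_counts[("individual", "diagnostic")])  # 12
print(row_totals["management"])                   # 12
```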

The most regularly cited individual team member measure was diagnostic (12 comments). Within this grouping, the dimension of individual professional development (4 comments) was reported most frequently. Individual professional development was defined as individual growth in computer science and system analysis acumen. One core member described this experience: “Especially in our particular setting where we are not like auto mechanics where you get trained to do something and every time you service an engine. We need to be learning at every engagement…very important.” From the perspective of the team as a whole, the most frequently cited measures of effectiveness were evaluative (14 comments) and diagnostic (13 comments). System understanding (5 comments), the most frequently cited dimension in the team evaluative measurement category, was defined as the extent to which the red team synthesizes and characterizes the system from different viewpoints. One core member described system understanding: “The system understanding is fundamental to the root cause of system weaknesses. This is more than other auditing efforts that scan for vulnerabilities; the red team looks at the hardware, software, physical layout, organizational processes, and policies. The red team is communicating across the functions of the customer’s organization and they are creating a view that not even the organization itself has. The red team ‘marries’ these elements into a larger understanding, which is something that an adversary would take advantage of.” Team dynamics (10 comments) was the most frequently cited dimension in the team diagnostic measurement category and was defined as team behaviors and attitudes. One comment described team dynamics: “No matter how much you plan or prepare attack, there is a huge space of variable stuff that changes when you go from preparation to action. This space is void of any planning, and that’s where team dynamics come

in.” Red team members spend a considerable amount of time in the planning phase (i.e., describing the system, identifying vulnerabilities, planning attacks). When the team moves from the planning phase to engaging in an attack on a system, time is limited. Good team dynamics refers to the ability of the red team to address and resolve, swiftly and accurately, the unplanned issues that arise in an attack setting. In the management perspective category, the most frequently cited measures were in the descriptive dimension. Obtaining all the system targets in a simulated attack (4 comments) was the most frequently cited descriptive measure. One member explained the importance of the red team’s system targets: “In the red team engagements, we are very careful to define what the goals or the “flags” are. Otherwise, at the end of the day, it is very hard to determine whether or not you are successful.” From the customer’s perspective, red team performance is largely descriptive (22 comments). The quality of communication with the customer (14 comments) was the most frequently cited dimension of customer descriptive measurement. Communication with the customer is related to the ease of communication (e.g., accessibility of the customer to the team and vice versa), the frequency and formalism of the communication, and the feedback from the customer, which is elicited at multiple steps within the assessment methodology. A comment to illuminate the customer communication and feedback standpoint: “Were you able to show the customer that they were well protected or there were holes? It becomes a matter of how the customer perceives the information you give them and what solutions you provide them.” Red team members expressed difficulty in measuring red team effectiveness (14 total comments). Among the various topics mentioned, the difficulty of measuring how effective their work was for the customer (9 comments) was cited most often.
One red team member spoke about the difficulty of assessing the effectiveness of the work from the customer’s perspective: “Because every assessment we do is so unique and so different, I think it’s really hard to come up with a metric that you could use to quantify the effectiveness of the red team. I think that’s what it comes down to really, is the customer. It’s whether or not we’ve done a good job, whether they think we’ve…taken a good shot at it, really.”

DISCUSSION

The current study focused on describing dimensions of red team performance measures. The range of perspectives reported by red team members was consistent with the team literature, which emphasizes the need for more than one type of performance measurement (i.e., process and outcome measures). In general, the red team placed an emphasis on the team and customer perspectives. At the team performance level, red team members tended to view performance both diagnostically and evaluatively, surmising that not only is team process an important measure, but so is the output of the team. Team dynamics stood out as an important measure. Sandia’s red

team spends a considerable amount of time assessing systems and creating ‘multiple system views’. This includes brainstorming and planning attacks, efforts that require substantial group effort, usually under the pressure of limited time. Red team members also emphasized team evaluative (i.e., outcome) measures. They highlighted the ‘lessons learned’ at the end of the project as a marker for the things that went well in the project, as well as the problems encountered. In addition, the extent to which the red team understood and accurately characterized the system under consideration was also stressed. How well the team creates the system understanding is directly associated with the ultimate measure of success: capturing the flags with the planned attacks. The learning or feedback processes emphasized in this study could be another measure to examine when assessing future red team performance. Red team members viewed the customer’s perspective of performance as largely descriptive, deeming communication with the customer a key measure. Since the customer is not present in all of the team processes where team dynamics or other key team factors play roles, the quality of the interactions with the customer is important in order to address how the team is working and meeting its project goals. Previous red team studies did not address the customer perspective; assessing that perspective may be beneficial in validating red team members’ evaluation of performance. Red team members described individual member effectiveness in terms of professional learning and having fun at work. Individual learning during projects was considered an important measure of effectiveness because of the uniqueness of each project.
These dimensions of individual team member performance measurement may be useful in tracking how increased professional learning is correlated with other outcome measures, such as capturing flags or achieving other statement-of-work goals (e.g., time and budget constraints). In summary, red team members described numerous measures of red team effectiveness, ranging from team behaviors and attitudes to more easily quantifiable measures such as meeting deadlines and capturing system targets. In previous red team studies (Schudel & Wood, 2000b; Wood & Bouchard, 2001), time to complete an engagement, as in red team work factor, provided a quantifiable measure that could be used to assess some dimensions of red team performance. The various measures addressed in this study could broaden the understanding of red team performance in that other factors affecting performance could be used to improve both the systems under inspection and red team performance.

CONCLUSION

Comprehensive measures of red team effectiveness could improve red team performance in several ways. Firstly, to determine red team effectiveness for the purpose of pinpointing team performance strengths and weaknesses, a set of team performance measures must exist. Secondly, teams evolve over time (Morgan, Glickman, Woodard, Blaiwes, &

Salas, 1986), and the length of time they have worked together can have a significant effect on group processes (Foushee, Lauber, Baetge, & Acomb, 1986). Establishing baseline and continuous measures of red team performance could provide feedback and other mechanisms for self-correction over time. This has significant implications for self-managed work teams (Cannon-Bowers & Salas, 1998) and may also be applied to Sandia's red team. Further, red team experiments, such as measuring red team work factor, require establishing a baseline for red team and system performance before engaging in multiple runs of a given experiment (Schudel & Wood, 2000b). Thirdly, a set of performance measures would help guide team improvement efforts, such as team training or resource allocation. Limitations of this study include the fact that descriptions of measures were based solely upon red team members' perceptions. Further, IDART's red teams represent one type of red team. Other red teams exist, and it would be interesting to assess whether the same or different dimensions of performance would be identified in other groups. This includes extending this preliminary work by interviewing other red team members and leaders at Sandia, as well as other red teams. Another area of future work is the investigation of the factors that contribute to and hinder red team performance. In addition to interviewing red team members, we may conduct observations of red team interaction at the following stages: team forming, brainstorming sessions, sessions formulating candidate vulnerabilities and attacks, engagement of the system, and wrap-up sessions. Identifying these factors, such as team design or member composition, could help in determining how to configure a high-performing red team. There is much to be learned about the factors associated with red team performance, and understanding red team performance measurement and its various facets is the first step in this inquiry.
ACKNOWLEDGMENTS

Funding provided by the Department of Defense under "Modeling and Simulation for Critical Infrastructure Protection" (#DAAD19-01-1-0502, PI: S. Robinson, UW-Madison).

REFERENCES

Brannick, M. T., & Prince, C. (1997). Overview of team performance measurement. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team Performance Assessment and Measurement (pp. 3-16). Mahwah, NJ: Lawrence Erlbaum Associates.

Cannon-Bowers, J. A., & Salas, E. (1998). Team performance and training in complex environments: Recent findings from applied research. Current Directions in Psychological Science, 7(3), 83-87.

Carayon, P., Duggan, R., & Kraemer, S. (2003). A model of red team performance. In K. J. Zink (Ed.), Seventh International Symposium on Human Factors in Organizational Design and Management. Aachen, Germany.

Cohen, F., Marin, I., Sappington, J., Stewart, C., & Thomas, E. (2001). Red teaming experiments with deception technologies. Retrieved from the Strategic Security Intelligence website: http://www.all.net/journal/deception/experiments/experiments.html

Cohen, S. G., & Bailey, D. E. (1997). What makes teams work: Group effectiveness research from the shop floor to the executive suite. Journal of Management, 23(3), 239-290.

Computer Science and Telecommunications Board-National Research Council. (2002). Cybersecurity Today and Tomorrow: Pay Now or Pay Later. Washington, DC: National Academy Press.

Defense Science Board Task Force. (2003). The Role and Status of DoD Red Teaming Activities. Washington, DC: Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics.

Foushee, H. C., Lauber, J., Baetge, M., & Acomb, D. (1986). Crew factors in flight operations: III. The operational significance of exposure to short-haul air transport operations (NASA Technical Memorandum 88322). Sunnyvale, CA: National Aeronautics and Space Administration-Ames Research Center.

Kewley, D. L., & Bouchard, J. F. (2000). DARPA Information Assurance program dynamic defense experiment summary. In Proceedings of the 2000 IEEE Workshop on Information Assurance and Security (pp. 117-122). United States Military Academy, West Point, NY.

Lynn, G. S., & Reilly, R. R. (2000). Measuring team performance. Research-Technology Management, 43(2), 48-56.

McCloskey, M. J., & Stanard, T. (1999). A red team analysis of the electronic battlefield: A cognitive approach to understanding how hackers work in groups. In Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting (pp. 179-183). Human Factors and Ergonomics Society.

Morgan, B. B., Glickman, A. S., Woodard, E. A., Blaiwes, A. S., & Salas, E. (1986). Measurement of team behaviors in a Navy environment (Tech. Rep. No. TR-86-014). Orlando, FL: Naval Training Center.

Pal, P., Atighetchi, M., Webber, F., Schantz, R., & Jones, C. (2003). Reflections on evaluating survivability: The APOD experiments. In The 2nd IEEE International Symposium on Network Computing and Applications (NCA-03). Cambridge, MA.

Palmer, C. C. (2001). Ethical hacking. IBM Systems Journal, 40(3), 769-780.

Paris, C. R., Salas, E., & Cannon-Bowers, J. A. (2000). Teamwork in multi-person systems: A review and analysis. Ergonomics, 43(8), 1052-1075.

Salas, E., & Cannon-Bowers, J. A. (1997). Methods, tools, and strategies for team training. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a Rapidly Changing Workforce: Applications of Psychological Research (pp. 249-279). Washington, DC: American Psychological Association.

Schudel, G., & Wood, B. (2000a). Modeling behavior of the cyber-terrorist. In Conference Proceedings: Research on Mitigating the Insider Threat to Information Systems-#2. Santa Monica, CA: RAND.

Schudel, G., & Wood, B. (2000b). Adversary work factor as a metric for information assurance. In Proceedings of the New Security Paradigms Workshop. Ballycotton, County Cork, Ireland: Association for Computing Machinery.

Tinnel, L. S., Saydjari, O. S., & Farrell, D. (2002). Cyberwar strategy and tactics: An analysis of cyber goals, strategies, tactics, and techniques. In Proceedings of the 2002 IEEE Workshop on Information Assurance. United States Military Academy, West Point, NY.

Wood, B. J., & Duggan, R. (1999). Red teaming of advanced information assurance concepts. In DISCEX 2000: DARPA Information Survivability Conference (SAND99-2590C). Hilton Head, SC.

Wood, B., & Bouchard, J. F. (2001). Red team work factor as a security measurement. In Proceedings of the Workshop on Information Security Scoring and Ranking. Williamsburg, VA: Applied Computer Security Associates.

APPENDIX 1

Individual and focus group interview guide

1. Various factors affect red team performance, and red team performance can be evaluated on different dimensions. What are the various criteria for evaluating red team performance?
