Visualizing patterns in a global terrorism incident database

1 downloads 0 Views 742KB Size Report
visualize temporal data, where the horizontal and vertical axes represent ... actually not countriesöfor example, `occupied territory', `Kashmir', `Northern ... For a very small number of records, FRY is reassigned as `Serbia and Montenegro' and.
Environment and Planning B: Planning and Design 2007, volume 34, pages 767 ^ 784

DOI:10.1068/b3305

Visualizing patterns in a global terrorism incident database

Diansheng Guo, Ke Liao, Michael Morgan

Department of Geography, University of South Carolina, 709 Bull Street, Columbia, SC 29208, USA; e-mail: [email protected], [email protected], [email protected] Received 21 January 2006; in revised form 27 July 2006

Abstract. The terrorism database includes more than 27 000 terrorism incidents between 1968 and 2006. Each incident record has spatial information (country names for all records and city names for some records), a time stamp (ie year, month, and day), and several other fields (eg tactics, weapon types, target types, fatalities, and injuries). We introduce a unified visualization environment that is able to present various types of patterns and thus to facilitate explorations of the incident data from different perspectives. With the visualization environment one can visualize either spatiomultivariate, spatiotemporal, temporal ^ multivariate, or spatiotemporal ^ multivariate patterns. For example, the analyst can examine the characteristics (in terms of target types, tactics, or other multivariate vectors) of aggregated incidents and at the same time perceive how multivariate characteristics change over time and vary spatially. Special attention is devoted to the application-specific data analysis process, from data compilation, geocoding, preprocessing, and transformation, through customization and configuration of visualization components, to the interpretation and presentation of discovered patterns.

1 Introduction The occurrence of terrorism activities involves various social, political, and economic factors, which differ from country to country and vary over time. The characteristics of such terror acts, including the specific tactics, weapons, target types, or fatalities inflicted, also change across time and space. Several organizations have engaged in major efforts to collect data that will assist analysts, policymakers, and practitioners who are attempting to understand the trends of terrorism activities. For example, the RAND Corporation (http://www.rand.org) and the Oklahoma City National Memorial Institute for the Prevention of Terrorism (MIPT, http://www.mipt.org) maintain an online database of terrorism incidents (http://www.tkb.org), including international terrorist incidents that have occurred since 1968 and domestic incidents that have occurred since 1998. The database has more than 27 000 incident records, each having a geographic identifier (eg country name), a date, and several fields describing the characteristic of that incident. It is a challenging task to explore and understand the patterns and trends in this dataset. Very often we are forced to examine patterns along one dimension at a time. For example, the data may be examined chronologically to determine the rate of change. The data may also be organized geographically by generating country-level choropleth or density maps (see figure 1). The data may also be summarized by target types to understand the characteristics of incidents. There are two major challenges to visualize such a complex dataset. First, it is difficult to visualize the data across several dimensions (including geography, time, and multiple attributes) and to present the patterns in an easy-to-understand form. For example, we may want to examine the spatial variations of target compositions. Did terrorists in different countries pick different targets? Did such target compositions form spatial clusters? Second, to gain a comprehensive understanding of the data, we often need to visualize the data from different perspectives and search for different types of patterns.

768

D Guo, K Liao, M Morgan

Number of domestic terrorist incidents (1998 ^ May 2006)

(a)

Number of international terrorist incidents (1998 ^ 2005)

(b)

Figure 1. (a) Domestic incidents (1998 ^ May 2006), (b) international incidents (1968 ^ 2005). Instead of using the raw count of incidents, we can also map the fatalities, injuries, or fatalities and injuries with such choropleth maps. Available in color at http://www.envplan.com/misc/b3305.

For example, in addition to the multivariate spatial patterns described above, we may also want to look at the spatial variations of temporal trends or the evolvement of multivariate patterns over time. What is the temporal trend of terrorism in each country and do countries with similar temporal trends tend to cluster in space? What were the predominant weapons for a period and how did such weapon preferences change over time? To meet the special challenges in visualizing this terrorist incident data, we extend recent research on multivariate mapping (Guo et al, 2005), and space ^ time and multivariate visualization (Guo et al, 2006), to build a unified visualization environment that can flexibly support the visualization of spatiomultivariate (SV), spatiotemporal (ST), temporal ^ multivariate (TV), and spatiotemporal ^ multivariate (STV) patterns. We also incorporate proportional symbols with multivariate visualization, where symbol colors represent multivariate or temporal patterns while symbol sizes represent an additional quantitative variable (eg the number of incidents, or fatalities and injuries). In this paper we specifically focus on the application-oriented data visualization process, including data compilation, geocoding, transformation, configuration of visualization components, and the interpretation of discovered patterns. The purpose of such analyses is to reveal interesting patterns (eg changing characteristics, geographic variations, and shifting temporal trends), which may shed light on our understanding of terrorist activities. This paper is organized as follows. We briefly review related research in the next section. In section 3 we introduce how we collect, geocode, and preprocess the incident data. Section 4 presents an overview of the visualization environment, dynamic data transformation, and related methodologies, including multivariate clustering and coloring, a reorderable matrix, multivariate legend and visualization, and an optimal hierarchical clustering based ordering method. Section 5 presents the visualization results for four different types of patterns in the terrorist incident data, namely, SV, ST, TV, and STV patterns. We conclude the paper with a summary and discussions. 2 Related work 2.1 Quantitative modeling of terrorism

Terrorism analyses are often qualitative (Cronin, 2002) or use simple summary statistics (eg histograms) to show trends in a particular variable (eg number of incidents) over a time period (Hoffman, 1999). Enders and Sandler (1999) used time-series

Visualizing patterns in a global terrorism incident database

769

analysis to support their hypotheses that the end of the cold war resulted in a decrease in transnational terrorism. Later they analyzed transnational terrorism incidents with casualties during the period 1970 ^ 99, using a linear model and a threshold autoregressive model to search for relationships between the occurrence of terrorism incidents and significant policy changes (Enders and Sandler, 2002). O'Brien (1996) described the relationship between foreign policy crisis outcomes and terrorist incidents using Box ^ Jenkins time-series methods. Game theory was used to develop a probability distribution for insurance losses related to terrorism (Major, 2002). Another application of game theory in terrorism analysis was presented in Sandler and Arce (2003); the authors described the interaction between terrorist groups and targeted states as a strategic game between rational actors. They concluded that tourists, civilians, and businesses are most likely to be targets because they have the least deterrence capability. The modeling approaches mentioned above build on rigorous statistical or mathematical models, which are formulated with a priori hypotheses and calibrated with observational data. However, currently we only have limited understanding of the causes, development, and diffusion of terrorism activities. Moreover, terrorism data are often incomplete or inaccurate, and only represent the outcome (eg incidents) but not the process. These factors limit our ability to formulate a valid hypothesis and to test the hypothesis. Therefore, we believe that exploratory analysis approaches, including both visual and computational methods, have much to offer in the exploration of terrorism data by revealing unknown trends or regularities, prompting new thoughts, and helping the analyst to gain insights to formulate better hypotheses and models. 2.2 Multivariate visualization and mapping

Multivariate visualization research focuses on developing methodologies to visualize data that have multiple variables. Multivariate visualization methods range from the commonly used tables, histograms, scatter plots, and charts, to more sophisticated methods like scatterplot matrices (Andrews, 1972), pixel-oriented approaches (Keim and Kriegel, 1996), reorderable matrices (Bertin, 1983; 2001), and parallel coordinate plots (PCP) (Inselberg, 1985). In its generic form, a reorderable matrix is organized like a data table with variables as columns and data items as rows. Each data item (row) has a value for each variable (column). As its name implies, the reorderable matrix can sort its columns and rows so that similar columns or similar rows are placed next to each other. Thus, patterns across multiple variables and multiple data items can emerge. There are also approaches designed to help visualize multiple nominal data variables (Rosario et al, 2003). The goal of multivariate mapping is to make a map that represents the spatial distribution of multiple variables and their relationships to each other. Multivariate mapping methods can be divided into three primary groups, including composite glyphs, strategies for overlay of multiple layers, and multiple linked views. One of the best known composite glyphs are the Chernoff faces (Chernoff and Rizvi, 1975), which visualize multivariate data by relating different variables to different facial features to form a face icon for each data object and then draw each face icon on a map (Dorling, 1994). A recent approach employs a self-organizing map to cluster and color multivariate data and then constructs a single `choropleth map' to visualize multivariate spatial patterns (Guo et al, 2005). Instead of making a single map with multivariate information, multiple views (including a map component) with dynamic linking have been commonly used to visualize multivariate geographic data (Dykes, 1998; MacEachren et al, 1999, Monmonier, 1989).

770

D Guo, K Liao, M Morgan

2.3 Spatiotemporal visualization

In addition to multiple variables and spatial information, data to be analyzed often have temporal information as well, as in the terrorism incident data. There are numerous strategies to visually represent temporal or spatiotemporal data, as outlined in recent reviews (Andrienko et al, 2003; Silva and Catarci, 2002). A natural choice for temporal visualization is a time line, where time points are plotted on a horizontal line (Havre et al, 2002). Pixel-based techniques (Keim et al, 2004) can also be adopted to visualize temporal data, where the horizontal and vertical axes represent different temporal scales (eg year, day, hour) and each data point is shown as a pixel. Spiral displays arrange temporal data points along a spiral in the temporal order (Carlis and Konstan, 1998). Animated maps have been extensively studied as a means to depict spatiotemporal trends (Blok, 2005; Blok et al, 1999; MacEachren and DiBiase, 1991). However, these different techniques are only able to explore univariate datasets and cannot visualize multivariate patterns that change over space and time. Exploratory temporal analysis has recently been studied in a variety of research fields. For example, in a brain study, a homogeneity map was made by clustering a set of multidimensional time courses and then assigning each time course a unique color according to the cluster it belonged to (Baumgartner and Somorjai, 2001). Thus, their homogeneity map shows brain regions that behave similarly over time. In another research, the self-organizing map (SOM) was customized to perform sequence analysis with temporal data and was renamed the recurrent self-organizing map (RSOM) (Varsta et al, 2001). A recent approach introduced in Guo et al (2006) also treats time steps as variables and is thus able to map temporal patterns and to examine their spatial variations. There is research that visualizes highway incidents through multiple views, with each view focusing on a different variable (eg time, location, or categorical data) (Fredrikson et al, 1999). However, with this approach the user has to rely on interactions (eg selections) to highlight patterns across different viewsöfor example, where, when, and what type of vehicles caused the most incidents. 3 Data compilation and preprocessing 3.1 Data source

We collected 27 940 global terrorist incidents (from 1968 to May 2006) from the MIPT terrorism knowledge database (http://www.tkb.org). The database is frequently updated to include the most recent incidents that comply with the definition of terrorism adopted by MIPT. It defines `terrorism' as `premeditated, politically motivated violence perpetrated against noncombatant targets by subnational groups or clandestine agents, usually intended to influence an audience' (http://www.mipt.org/terrorismdefined.asp). However, as pointed out in Cronin (2002), there is no universally accepted definition of terrorism and sometimes it is also very difficult to determine if one specific incident fits a definition. Nevertheless, the MIPT dataset is well maintained and consistent with its own definition, and has been used in several published analyses (eg Enders and Sandler, 1999; 2002; Hoffman, 1999). Table 1 shows the data fields that we collected from the MIPT website. Each incident record has several data fields: (1) location, (2) date, and (3) characteristics of the incidentöfor example, injuries and fatalities, weapon, tactic, target, and a text description of the incident. Incidents are classified into two nonoverlapping typesö that is, domestic incidents and international incidents. A domestic incident is one that is `perpetrated by local nationals against a purely domestic target', while an international incident is one `in which terrorists go abroad to strike their targets, select domestic targets associated with a foreign state, or create an international incident by attacking airline passengers, personnel or equipment' (quoted from the MIPT glossary,

Visualizing patterns in a global terrorism incident database

771

Table 1. Data fields for each terrorist incident record. Field

Type

Unique values

Incident date

date

±

Country

text

±

Targeta

text

abortion related, airports and airlines, business, diplomatic, educational institutions, food or water supply, government, journalists and media, maritime, military, NGO, other, police, private citizens and property, religious figures or institutions, telecommunication, terrorists, tourists, transportation, unknown, utilities

Weapona

text

biological agent, chemical agent, explosives, fire or firebomb, firearms, knives and sharp objects, other, remote-detonated explosive, unknown

Tactic a

text

armed attack, arson, assassination, barricade or hostage, bombing, hijacking, kidnapping, other, unconventional attack, unknown, unspecified

Injuries

integer

±

Fatalities

integer

±

Description

memo

(a brief description of the incident)

a The

classifications of terrorist-incident tactics, targets, and weapons are maintained by the RAND Corporation. All unique categorical values for these three fields are listed in the third column.

http://www.tkb.org/Glossary.jsp). The MIPT database has different temporary coverage for domestic and international incidents. International incidents are recorded from 1968 to 2006 while domestic incidents are recorded from 1998 to 2006. In this paper international incidents and domestic incidents are analyzed separately. 3.2 Geocoding of incidents

Geographic information includes a country name and a city name for each record. However, only about 67% of the records have valid city namesöthat is, there are over 8000 incidents with no city-level geographic information. Therefore, the analysis reported in this paper focuses on country-level patterns. Country names in the original data are not always consistent with international standards. Some `country names' are actually not countriesöfor example, `occupied territory', `Kashmir', `Northern Ireland (UK)', and `West Bank/Gaza'. Some countries do not exist any moreöfor example, `Federal Republic of Yugoslavia (FRY)' and `Union of Soviet Socialist Republics (USSR)'. The National Geospatial-Intelligence Agency (NGA, formerly NIMA) maintains a GEOnet Names Server (GNS, http://earth-info.nga.mil/gns/html/) that provides foreign geographic feature names. We use this NGA database to support the geocoding of incidents at the country level. However, we keep three special regions (which are not countries): `West Bank/Gasa', `Northern Ireland (UK)', and `Kashmir'. `Occupied territories' in the MIPT data was changed to `West Bank/Gaza'. We only use currently existing country names (plus the above three special regions). Records with historical names (eg FRY or USSR) are resolved to a current country if city names are available. For a very small number of records, FRY is reassigned as `Serbia and Montenegro' and USSR is reassigned as `Russia' due to missing or unknown city names.

772

D Guo, K Liao, M Morgan

The analyses reported in the remainder of this paper focus on a selected area that covers European countries, the Middle East, North Africa, and some Asian countries ö altogether 104 countries and 3 special regions [West Bank/Gaza, Northern Ireland (UK), and Kashmir] (see figure 1). We select this study region rather than all countries for three reasons. First, many small-sized countries are not visible in a world map. Although this is not a problem with large computer displays, it dramatically decreases the readability of figures in a paper presentation. Second, as mentioned earlier, the data quality from different countries may differ. Since Europe and its surrounding regions have long been the focus of terrorism and counterterrorism, we believe that its data coverage is relatively reliable. Third, the selected area covers the majority of incidents in the database. Globally there are 10 030 international incidents (1968 ^ 2005) and 17 854 domestic incidents (1998 ^ May 2006). The selected area has 14 311 domestic incidents (80%) and 6699 international incidents (67%). 3.3 Data aggregation and transformation

The entire set of geocoded incidents is the initial data input to the visualization environment. Depending on the user's interactive configuration, the data are aggregated and transformed `on the fly'. We conceptually represent the aggregated data with a 3D data cube, which has a vertical dimension, horizontal dimension, and a multivariate dimension [eg weapons in figure 2(d)]. Figure 2 shows four possible configurations of the 3D aggregation cube. See table 2 for a full range of options that the user can use to visualize the incident data from different perspectives. The user can specify only two of the three dimensions, in which case we use ALL to represent the third dimension [see figures 2(a), 2(b), and 2(c)]. Table 2. Visualization configuration options. Configuration item

Options

Dataset

international incidents (1968 ± 2005) domestic incidents (1998 ± May 2006)

Data cube

select any three of the following to define the cube: countries, years, tactics, targets, weapons, ALL

Value type

raw count of incidents, fatalities, injuries, or fatalities plus injuries

Data normalization

original values (no normalization) percentage values

The value for each cell in the cube can be the total number of incidents, injuries, fatalities, or injuries plus fatalities. Thus, each vertical ^ horizontal combination [eg country/year in figure 2(d) is a multivariate vector (eg values for different weapon types for that country and year]. If years are linked to the multivariate dimension [see figure 2(b)], then each multivariate vector becomes a time series. The user has the option to normalize each vector to percentage values by dividing each value with the total of that vector. Such normalization is often necessary due to the extremely uneven distribution of incidents from country to country. For example, Iraq or Israel has hundreds of incidents each year while some countries only have fewer than ten incidents for over thirty years.

Visualizing patterns in a global terrorism incident database

773

(a)

(b)

(c)

(d) Figure 2. A unified visualization environment. Here it shows four possible configurations: (a) spatiomultivariate (SV) visualization, (b) spatiotemporal (ST) visualization, (c) temporal ^ multivariate (TV) visualization, and (d) spatiotemporal ^ multivariate (STV) visualization.

774

D Guo, K Liao, M Morgan

4 Overview of the visualization environment Our unified visualization environment can visualize the incident data in a variety of ways and thus allow the analyst to examine patterns from different perspectives. Although figure 2 only shows four different configurations of the visualization system, the user can follow the options for the four configuration items to create more variations. Given a 3D cube of aggregated data, the visualization system treats it as a set of multivariate vectors, which are organized with a 2D matrix (see figure 2). First, an SOM is used to reduce each multivariate vector to a color and thus to compress the 3D cube to a color-coded 2D matrix (hereafter reorderable matrix), where colors represent multivariate information. Second, if countries are used to define the 2D matrix [figures 2(a), 2(b), and 2(c)], then that color-coded reorderable matrix can be converted to a map [figures 2(a) and 2(b)] or a map matrix [figure 2(d)]. If countries are not involved in the cube, then a map will not appear in the view [figure 2(c)]. Third, a PCP-based multivariate legend is constructed to show the multivariate meaning of each color so that the analyst can understand the colors in the reorderable matrix or maps. Fourth, to present patterns better and to help the analyst to perceive a holistic view, a hierarchical clustering based optimal ordering method can be used to order rows and/or columns in the reorderable matrix, maps in the map matrix, or dimensions in the parallel coordinate plot. Below we briefly introduce the methods used in each of these four steps. 4.1 Multivariate clustering and coloring

An SOM (Kohonen, 2001) is a nonlinear clustering method that projects multivariate data to clusters and organizes the clusters in a 2D layout so that nearby clusters are similar to each other. We use a 2D color scheme to assign each cluster a unique color and we make sure that nearby clusters have similar colors. Readers are referred to Guo et al (2005) for methodological details. Both figure 3 and figure 4 show a colored SOM. A circle in the SOM view represents a cluster, and the circle size (ie area) is proportional to the cluster size (ie the number of multivariate vectors contained in the cluster). The shade of a hexagon represents the `distance' between neighboring clusters. For example, in figure 3 the similarity between the blue and white clusters is greater than that between the red cluster and the pink cluster. To examine the characteristic of each cluster, the user can select a single cluster or a subset of the clusters to highlight in the SOM view and in other visual components. 4.2 Reorderable matrix and map matrix

Depending on the configuration, the reorderable matrix can be 1D [figures 2(a), 2(b), and 2(c)] or 2D [figure 2(d)]. The rows and/or columns of the matrix can be reordered so that similar rows or columns are placed next to each other (Guo et al, 2006). The similarity between two rows (or two columns) is defined by the Euclidean distance between the two multivariate vectors [figures 2(a), 2(b), and 2(c)] or between the two sets of multivariate vectors [figure 2(d)]. If a 2D reorderable matrix is accompanied by a map matrix, then the maps will also be in the same order as the columns in the reorderable matrix. The optimal ordering method is introduced in section 4.4. In addition to colors that represent multivariate characteristics, we also vary symbol sizes (eg rectangles in the reorderable matrix, or circles in maps) to represent another quantitative measure (eg the total number of incidents for that aggregation unit). This feature is particularly important in visualizing the terrorist incident data. A country that has ten incidents and another country that has 1000 incidents may be colored the same way if they have a similar multivariate profile (eg the composition of weapons, as shown in figure 3). However, it is still important to inform the analyst of their difference in terms of the total number of incidents or total fatalities. Therefore, we

Visualizing patterns in a global terrorism incident database

775

use proportional symbols with colors to show both types of information ö that is, multivariate similarities (represented by colors) and quantities (represented by symbol sizes). 4.3 Multivariate legend and mapping

A PCP is used as a multivariate legend to help the analyst to understand the meaning of colors (Guo et al, 2005; 2006). The PCP can visualize either clusters (ie each string in the PCP represents a cluster using its average vector) or data items (ie each string represents an individual multivariate vector). Each string is painted in the color that is assigned to that cluster by the SOM. If clusters are visualized, the thickness of each string represents the cluster size (ie the number of multivariate vectors contained in that cluster). In other words, larger circles in the SOM view correspond to thicker strings in the PCP. The user can easily switch between the cluster view and the data item view. The cluster view is clearer, less cluttered, and thus easier to perceive, while the data item view conveys more accurate information and can show the internal variance within a cluster. There are several options to scale each dimension (or axis) in the PCP and we normally use a linear scaling, which linearly stretches data between given minimum and maximum values. Using the PCP as the legend, we can understand, for example, that the big blue cluster in figure 3 contains countries that have almost 100% of their fatalities and injuries caused by explosives. 4.4 One-dimensional optimal ordering

Let A ˆ fa1 , a2 , _ , ad g be a set of data items, which can be a set of countries [figures 2(a) and 2(b)], a series of years [figure 2(c)], a set of country ^ year combinations [figure 2(d)], or a set of dimensions (eg target types in the PCP in figure 5). Let jai ÿ aj j be the similarity value between items ai and aj . All pairwise similarity values within A form a symmetric similarity matrix. A 1D ordering of these data items is a projection of data items onto a 1D space according to their similarity matrix. Ideally, similar items should be close in the ordering and the more similar two items are, the closer they should be in the ordering. The purpose of such an ordering is to simplify views and to help humans recognize overall patterns. There are several existing methods that can be used to order data items (Ankerst et al, 1998; Bar-Joseph et al, 2001; 2003; Friendly, 2002; Guo, 2003). Bar-Joseph et al (2001; 2003) propose a hierarchical-clustering-based ordering method to find an optimal (ie the shortest) ordering that is consistent with a cluster hierarchy. Guo (2003) also proposes a hierarchical-clustering-based ordering method, which is faster but does not guarantee an optimal solution. Since the number of data items to be ordered in this research is not very large (90%) of fatalities and injuries caused by firearms, and green countries suffered most from remote-detonated explosives. However, the total number of fatalities and injuries for each of these red and green countries is relatively small (except Spain, where remote-detonated explosives caused a large number of fatalities

Figure 4. Spatiotemporal visualization of international incidents (1968 ^ 2005). Colors represent different temporal patterns, which are grouped into eight clusters by the self-organizing map. Countries are ordered so that countries with similar temporal trends are next to each other in the ordered list (left). Note: the numbers of incidents is used here (instead of fatalities and injuries as used in figure 3).

778

D Guo, K Liao, M Morgan

or injuries). Another interesting pattern we notice is that regions like Northern Ireland, West Bank/Gaza, and Kashmir all belong to the pink cluster, which on average had about 50% fatalities by explosives and about 40% by firearms. Nigeria also belongs to the pink cluster but it has much fewer fatalities or injuries. We want to emphasize that two clustering methods are independently applied to the data: one is the SOM (whose result is represented by colors) and the other is the complete-linkage clustering (whose result is represented by the ordering in the reorderable matrix). Since these two clustering methods are methodologically different from each other, it is interesting and useful to examine the agreement and disagreement between their results. If they agree (ie similar colors are next to each other in the ordering), we have more confidence in the discovered patterns. For example, the two methods are completely in agreement regarding the blue, red, pink, and purple clusters. If they disagree (eg two different colors are mixed in the ordering), it indicates that the two clusters are close in the data space and the boundary between them is not clear-cut. 5.2 Spatiotemporal patterns of international incidents

With the unified visualization environment, ST patterns are visualized in the same way as were SV patternsöby treating temporal series as multivariate vectors. For the result shown in figure 4 we used the international incident dataset (1968 ^ 2005), defined the data cube by countries and years [see figure 2(b)], and set values as the raw count of incidents for each country ^ year combination. To make the comparison between time series more meaningful and robust, we carry out two additional preprocessing steps. First, we exclude countries that have only one or two incidents during the past thirtyeight years. Second, we smooth each time series using a three-year moving average window. These two additional steps are optionalöthe user can choose to include all countries and not to smooth the data. It is possible for future work to incorporate more advanced time smoothing or warping techniques to preprocess temporal series (Fu et al, 2005; Lin et al, 2004). Each time series is then converted to percentage data by dividing each value by the total of that time series. Thus, each time series represents the relative trend for a country. Two countries (time series) are similar if they have peaks of incidents around the same time, even if the total counts of incidents for the two countries are very different. The similarity between two trends is defined as the Euclidean distance between the two time series. Countries are ordered in the reorderable matrix using the optimal ordering method so that countries with similar temporal trends of incidents are placed next to each other (figure 4, left). Countries are grouped into eight clusters by the SOM (figure 4, middle right) and colors represent temporal trends ösimilar trends have the same (or similar) colors. We can see which trend each color represents in the PCP. For example, a red color signifies time series that have a peak in very recent years (2003 ^ 05) ö that is, current hot spots of terrorist activities. The map shows the spatial distribution of these hot spots, which cluster mainly in the Middle East (eg West Bank/Gaza, Iraq, and Saudi Arabia) and in Central Asia (eg Afghanistan, Kashmir, Uzbekistan, and Kyrgyzstan). The SOM and the complete-linkage clustering agree that these red countries have similar trends because we see that all red countries are next to each other in the ordering. Similarly, we can interpret the trend for the pink cluster (peaks around 1997 and 2001), the purple cluster (peaks around 1995), the green cluster (peaks around 1990), and the blue cluster (peaks around the mid-1980s and before). Once we understand the meaning for each color, we can perceive in the map how terrorist activities have spread across space through time.

Visualizing patterns in a global terrorism incident database

779

Note that the PCP shows the average trend of a cluster of temporal series. The user can interactively change the view to show each individual temporal series and to examine the internal variation inside a cluster. The user can also select one or several countries in the matrix to examine the exact trend for each country selected. A cluster can also be selected in the SOM or in the PCP to show only these selected countries in the map. 5.3 Temporal ^ multivariate patterns of international incidents

In this section we introduce the visualization of another interesting and useful type of patterns in the international terrorist incidents ömultivariate patterns that shift over time. For the result shown in figure 5 we used the international incident dataset, defined the data cube by years and target types [figure 2(c)], set values as the raw counts of incidents for each year/target combination, and normalized each multivariate vector (ie the target profile for a year) to percentage values. A map is not needed here since spatial information is not involved. The SOM groups the thirty-eight years into nine clusters and assigns each cluster (thus each year) a color. Now colors represent combinations of terrorism targets. To help the user recognize patterns and perceive targets that are associated with each other, the twenty target types are reordered in the PCP using the optimal complete-linkage ordering method. For this case the similarity between two target types is the Euclidean distance between their temporal series. Very interesting patterns emerge from the visualization result shown in figure 5. Terrorist activities exhibit a clear temporal trend or strategy shift in terms of their targets. In the earliest years (1968 ^ 73, in blue), terrorists primarily targeted diplomatic targets and airports or airlines. Then, for the next ten years (1974 ^ 84, in green or light blue), the target composition changed slightly and business became an additional common target. For the late 1980s (in white and yellow), attacks on diplomatic, business, and airline targets dropped while attacks on military targets increased (relatively). The early and mid-1990s (1992 ^ 97, in red and pink) witnessed a dramatic change, when attacks on diplomatic and airline targets continued to drop while tourists, other, and NGO (nongovernment organizations) were increasingly targeted. Yet, from 1998 to 2003 (in purple), another dramatic change occurred. For this period, attacks on airline, military, and other targets dropped to the lowest point, while the government, the police, transportation, and especially private citizens (>25%) became the popular targets. This shows a dangerous sign that terrorists were getting more aggressive. Most recently (2004 and 2005, in red), the trend returned to that of the mid-1990s, which had a high percentage of incidents targeting business and other targets. To help readers understand what an `other' target is, here are two example incidents whose targets are classified as `other' in the MIPT database. . A Turkish sports and cultural center in Paris was firebombed. Six people were injured (20 August 1995). . Twelve Nepalese workers (who were working as cooks and cleaners for a Jordanian company) were kidnapped and killed in Iraq. The kidnappers said that they kidnapped the men because they were cooperating with the occupation (23 August 2004). Most of the incidents on `other' targets during 2004 and 2005 occurred in Iraq. If we exclude the incidents in Iraq for 2004 and 2005, then the composition of targets for these two years would become more similar to that of the `purple' years (1998 ^ 2003). Target preferences change not only over time, but also from country to country. In the next section we introduce another type of visualization in order to examine multivariate patterns across geography and over time.

780

D Guo, K Liao, M Morgan

Figure 5. A temporal ^ multivariate visualization of international incidents (1968 ^ 2005). The selforganizing-map view is not shown, which grouped the thirty-eight years into nine clusters based on their target compositions. The twenty target types are reordered in the parallel coordinate plot using the complete-linkage clustering method. 5.4 Spatiotemporal ^ multivariate patterns of domestic incidents

As pointed out above, multivariate patterns can change both across geography and across time. In this section we introduce how our unified environment visualizes the spatial variations and temporal trends of multivariate information. Here we used the domestic incident dataset again; defined the data cube by countries, years, and weapons [see figure 2(d)]; set values as fatalities and injuries; and normalized each multivariate vector to percentage values. In other words, except for the fact that the cube is defined by countries, years, and weapons, the configuration here is the same as that used in section 5.1 and in figure 3. In addition to examining the spatial distribution of weapon preferences, we will also be able to see how they change over years. In contrast to the matrix shown in figure 3, the reorderable matrix in figure 6 has multiple columns (one for each year) and thus a map matrix is constructed (one map for each year). The maps are always in the same order as that of the columns in the reorderable matrix. Each country ^ year combination is a weapon composition vector in terms of the fatalities and injuries that were caused. Empty vectors (ie country ^ year combinations that have no fatalities and injuries) are excluded in the subsequent analysis. The SOM (which is not shown) groups all nonempty vectors (converted to percentage values) into twenty-four clusters, which are colored so that similar clusters have similar colors. Both the reorderable matrix and maps use colored proportional symbols whose sizes (ie areas) represent the total fatalities and injuries for country ^ year combinations. We do not directly color each country (as we did in previous sections), because small countries are barely noticeable when maps get smaller in the matrix. Rows (ie countries) in the reorderable matrix are ordered. Since each country now has a 2D array of values (one value for each year and weapon), the similarity between two countries is the Euclidean distance between the two 2D arrays.

Visualizing patterns in a global terrorism incident database

781

Figure 6. Space ^ time-multivariate visualization of domestic incidents (1998 ^ May 2006). Nonempty multivariate vectors (ie weapon compositions) are grouped into twenty-four clusters. The self-organizing map (which is not shown in the snapshot) groups all nonempty multivariate vectors into twenty-four clusters, which are colored with a 2D color scheme so that similar clusters have similar colors. Note that 2006 is not a full year öit only includes incidents up to 28 May 2006.

The four views (including the SOM view that is not shown) in figure 6 provide a holistic view of the major patterns, across space, time, and multivariate dimensions. One can interpret a variety of patterns from the visual presentation. From the reorderable matrix, we can easily perceive how the makeup of weapons (in terms of fatalities that each weapon caused) changes over time. For example, the West Bank/Gaza (the second row at the bottom) suffered most from firearms during the period 1999 ^ 2002, while since 2003 explosives have caused more damage than other weapons in that region. We also notice that in the year 2001 more countries were troubled by remotedetonated bombs (in green). From the map matrix, we can see that in 1998 many Eastern Europe countries had a moderate number of fatalities and injuries, which were mostly caused by explosives (in blue). The years 1999 and 2000 were relatively quiet. Since 2001 the Middle East and Central/South Asian countries have had more fatalities and injuries. At the beginning (2001 ^ 03) the choice of weapons was diverse ö firearms (in red/pink) explosives (in blue/purple), and firebombs (in green) were all involved. However, since 2004 a clear trend has emergedöwith the use mostly of explosives and a spatial focus on the Middle East and on South Asia. Although the year 2006 only has a five-month data included, the trend persists.

782

D Guo, K Liao, M Morgan

6 Conclusion and discussions We introduced a unified visualization environment that can flexibly support the visualization of spatiomultivariate (SV), spatiotemporal (ST), temporal ^ multivariate (TV), and spatiotemporal ^ multivariate (STV) patterns. We explored a large dataset of terrorist incidents with the visualization system and presented a selection of interesting patterns from different perspectives. The paper focuses on the application-specific data analysis process and includes details on data compilation, transformation, various visualization results, and the interpretation of discovered patterns. In terms of this unique dataset, the usefulness of this visualization system lies in the complexity and diversity of patterns that it can present. In other words, it can simultaneously visualize data across several dimensions (including geography, time, and/or multiple attributes) and can explore different types of patterns with a unified environment. In addition to the ability to construct a holistic visual representation of complex patterns, the visualization system supports a variety of user interactions to help the analyst understand patterns. However, it does not depend on user interactions (eg selections) to link different views. Instead, it uses colors to glue different components together. Thus, the visualization system is effective in presenting patterns with static means (eg paper prints or images), although with interactions the user can acquire more specific and concise information. For example, the user can select a single country to examine its temporal trend (instead of looking at the average trend of a cluster). The presented visualization system, by nature, is relatively more complex than conventional approaches. The learning curve for a nonexpert user can be steep. The evaluation of such a complex visual-computation environment is a challenging task that requires careful planning (Tobon, 2005). A recent research suggests a strategy to evaluate a similar geovisualization environment (Koua and Kraak, 2005). We are in the process of designing usability experiments and user tests to make the approach more intuitive and easier to use. In its current form, the reorderable matrix cannot handle a large number of columns and rows. To extend the system to address a large matrix (or cube) of data, one possible solution is to apply data reduction approaches (eg clustering) to merge rows (or columns) into groups and then to treat each group as a single row or column (Guo, 2006). The ordering of countries in the reorderable matrix does not consider spatial relationships (eg the spatial proximity between countries). It might be desirable that the ordering of countries also preserves spatial relations as much as possible. It would also be interesting to study the relationships between terrorism incidents and political ^ socioeconomic factors if more data variables from other sources could be incorporated. Acknowledgements. This research was partially supported by a grant from the Walker Institute of International and Area Studies, University of South Carolina. This research was also partially supported by the United States Department of Homeland Security through the National Consortium for the Study of Terrorism and Responses to Terrorism (START), grant number N00140510629. However, any opinions, findings, and conclusions or recommendations in this document are those of the authors and do not necessarily reflect the views of the US Department of Homeland Security. References Andrews D F, 1972, ``Plots of high-dimensional data'' Biometrics 29 125 ^ 136 Andrienko N, Andrienko G, Gatalsky P, 2003, ``Exploratory spatio-temporary visualization: an analytical review'' Journal of Visual Languages and Computing 14 503 ^ 541 Ankerst M, Berchtold S, Keim D A, 1998, ``Similarity clustering of dimensions for an enhanced visualization of multidimensional data'', in Proceedings of the IEEE Symposium on Information Vizualization IEEE Computer Society, 1730 Massachusetts Avenue, NW, Washington, DC, pp 52 ^ 60

Visualizing patterns in a global terrorism incident database

783

Bar-Joseph Z, Gifford D K, Jaakkola T S, 2001, ``Fast optimal leaf ordering for hierarchical clustering'' Bioinformatics 17 S22 ^ S29 Bar-Joseph Z, Demaine E D, Gifford D K, Hamel A M, Jaakkola T S, Srebro N, 2003, ``K-ary clustering with optimal leaf ordering for gene expression data'' Bioinformatics 19 1070 ^ 1078 Baumgartner R, Somorjai R, 2001, ``Graphical display of fMRI data: visualizing multidimensional space'' Magnetic Resonance Imaging 19 283 ^ 286 Bertin J, 1983 Semiology of Graphics, Diagrams, Networks, Maps (The University of Wisconsin Press, Madison, WI) Bertin J, 2001, ``Matrix theory of graphics'' Information Design Journal 10 5 ^ 19 Blok C A, 2005, ``Dynamic visualization variables in animation to support monitoring of spatial phenomena'', Netherlands Geographical Studies 328, ITC Dissertation 119, http://www.itc.nl/ library/Papers 2005/phd/blok.pdf Blok C A, Ko«bben B J, Cheng T, Kuterema A A, 1999, ``Visualization of relationships between spatial patterns in time by cartographic animation'' Cartography and Geographic Information Science 26 139 ^ 151 Carlis V J, Konstan A J, 1998, ``Interactive visualization of serial periodic data'', in Proceedings of the 11th Annual Symposium on User Interface Software and Technology (ACM Press, New York) pp 29 ^ 38 Chernoff H, Rizvi M H, 1975, ``Effect on classification error of random permutations of features in representing multivariate data by faces'' Journal of American Statistical Association 70 548 ^ 554 Cronin A K, 2002, ``Behind the curve: globalization and international terrorism'' International Security 27 30 ^ 58 Dorling D, 1994, ``Cartograms for visualizing human geography'', in Visualization and GIS Eds D Unwin, H Hearnshaw (Belhaven Press, London) pp 85 ^ 102 Dykes J, 1998, ``Cartographic visualization: exploratory spatial data analysis with local indicators of spatial association using Tcl/Tk and cdv'' The Statistician 47 485 ^ 497 Enders W, Sandler T, 1999, ``Transnational terrorism in the post-cold war era'' International Studies Quarterly 43 145 ^ 167 Enders W, Sandler T, 2002, ``Patterns of transnational terrorism, 1970 ^ 99: alternative time series estimates'' International Studies Quarterly 46 145 ^ 167 Fredrikson A, North C, Plaisant C, Shneiderman B, 1999, ``Temporal, geographical and categorical aggregations viewed through coordinated displays: a case study with highway incident data'', in Workshop on New Paradigms in InformationVisualization and Manipulation (in conjunction with ACM CIKM'99) 6 November, ACM, 2 Penn Plaza, New York, Kansas City, MO, pp 26 ^ 34 Friendly M, 2002, ``Corrgrams: exploratory displays for correlation matrices'' The American Statistician 19 316 ^ 324 Fu A W-C, Keogh E, Lau L Y H, Ratanamahatana C A, 2005, ``Scaling and time warping in time series querying'', in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB) Trondheim, Norway, pp 649 ^ 660, http://www.vldb.org/dblp/db/conf/vldb/vldb2005.html Guo D, 2003, ``Coordinating computational and visualization approaches for interactive feature selection and multivariate clustering'' Information Visualization 2 232 ^ 246 Guo D, 2006, ``Spatio-temporal visual analytics for pandemic response and decision support'', paper presented at Workshop on Visualization, Analytics and Spatial Decision Support, Mu«nster, Germany, copy available from the author Guo D, Gahegan M, 2006, ``Spatial ordering and encoding for geographic data mining and visualization'' Journal of Intelligent Information Systems 27 243 ^ 266 Guo D, Gahegan M, MacEachren A M, Zhou B, 2005, ``Multivariate analysis and geovisualization with an integrated geographic knowledge discovery approach'' Cartography and Geographic Information Science 32 113 ^ 132 Guo D, Chen J, MacEachren A M, Liao K, 2006, ``A visualization system for space ^ time and multivariate patterns (VIS-STAMP)'' IEEE Transactions on Visualization and Computer Graphics 12 1461 ^ 1474 Havre S, Hetzler E,Whitney P, Nowell L, 2002, ``Theme river: visualizing thematic changes in large document collections'' IEEE Transactions on Visualization and Computer Graphics 8 9 ^ 20 Hoffman B, 1999, ``Terrorism trends and prospects'', in Countering the New Terrorism Eds I Lesser, B Hoffman, J Arquilla, D Ronfeldt, M Zanini, RAND Corporation, Santa Monica, CA, pp 7 ^ 38, http://www.rand.org/pubs/monograph reports/MR989/index.html Inselberg A, 1985, ``The plane with parallel coordinates'' The Visual Computer 1 69 ^ 97

784

D Guo, K Liao, M Morgan

Keim D A, Kriegel H P, 1996, ``Visualization techniques for mining large databases: a comparison'' IEEE Transactions on Knowledge and Data Engineering 8 923 ^ 938 Keim D, Panse C, Sips M, North S, 2004, ``Pixel based visual mining of geo-spatial data'' Computers and Graphics 28 327 ^ 344 Kohonen T, 2001 Self-organizing Maps (Springer, Berlin) Koua E L, Kraak M J, 2005, ``Evaluating self-organizing maps for geovisualization'', in Exploring Geovisualization Ed. M-J Kraak (Elsevier, Amsterdam) pp 627 ^ 644 Lin J, Keogh E, Lonardi S, Lankford J P, Nystrom D M, 2004, ``Visually mining and monitoring massive time series'', in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery (ACM Press, New York) pp 460 ^ 469 MacEachren A M, DiBiase D, 1991, ``Animated maps of aggregate data: conceptual and practical problems'' Cartography and Geographic Information Systems 18 221 ^ 229 MacEachren A M, Wachowicz M, Edsall R, Haug D, Masters R, 1999, ``Constructing knowledge from multivariate spatiotemporal data: integrating geographical visualization with knowledge discovery in database methods'' International Journal of Geographical Information Science 13 311 ^ 334 Major J A, 2002, ``Advanced techniques for modeling terrorism risk'' Journal of Risk Finance 4 15 ^ 24 Monmonier M, 1989, ``Geographic brushing: enhancing exploratory analysis of the scatterplot matrix'' Geographical Analysis 21 81 ^ 84 O'Brien S, 1996, ``Foreign policy crises and the resort to terrorism: a time-series analysis of conflict linkages'' The Journal of Conflict Resolution 40 320 ^ 335 Rosario G E, Rundensteiner E A, Brown D C, Ward M O, 2003, ``Mapping nominal values to numbers for effective visualization'' Information Visualization 3 80 ^ 95 Sandler T, Arce D G, 2003, ``Terrorism and game theory'' Simulation and Gaming 34 319 ^ 337 Silva S F, Catarci T, 2002, ``Visualization of linear time-oriented data: a survey'' Journal of Applied Systems Studies 3 454 ^ 478 Tobon C, 2005, ``Evaluating geographic visualization tools and methods: an approach and experiment based upon user tasks'', in Exploring Geovisualization Eds J Dykes, A M MacEachren, M-J Kraak (Elsevier, Amsterdam) pp 645 ^ 666 Varsta M, Heikkonen J, Lanmpinen J, Millan J D R, 2001, ``Temporal Kohonen map and the recurrent self-organizing map: analytical and experimental comparison'' Neural Processing Letters 13 237 ^ 251

ß 2007 a Pion publication printed in Great Britain

Conditions of use. This article may be downloaded from the E&P website for personal research by members of subscribing organisations. This PDF may not be placed on any website (or other online distribution system) without permission of the publisher.

Suggest Documents