1 Knowledge Discovery from Sensor Data For Scientific Applications

Auroop R. Ganguly*, Yi Fang, Shiraj Khan, and Olufemi A. Omitaomu
Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN 37831.
*Contact Author Email: [email protected]

Summary. Wireless sensor networks, in-situ sensor infrastructures and remote sensors offer new opportunities for pervasive monitoring of the built and natural environments. The ability to generate actionable predictive insights from raw sensor data is critical for many scientific applications of high societal priority. One example is sensor-based early warning systems for geophysical extremes like tsunamis or hurricanes, which can help preempt disaster damage through tactical decisions. Indeed, the impacts of the 2004 Indian Ocean tsunami may have been almost entirely preventable if early warning systems had been well developed. Another example is high-resolution risk-mapping for natural hazards, based on predictive insights obtained through a combination of historical and real-time sensor-based data with physics-based computer simulations. High-resolution risk maps for geophysical extremes, in combination with information about consequences and resiliency, can lead to strategic policy tools, which in turn can be utilized to reduce the impacts of anticipated natural hazards. Sensor-based knowledge discovery requirements need to be geared towards scalable and efficient implementations of offline predictive insights and fast real-time analysis of incremental information. Predictive insights regarding weather, climate and geophysical hazards require models of rare, anomalous and extreme events, nonlinear phenomena, and change analysis, in particular from massive volumes of geographic data. On the other hand, historical data may also be noisy and incomplete; thus, robust tools need to be developed for these situations.
This chapter describes the problems and research challenges in the interdisciplinary area of hazards mitigation, specifically focusing on new opportunities for knowledge discovery from sensor data and model simulations.

Keywords: Sensors, Knowledge Discovery, Scientific Applications, Weather Extremes, Natural Hazards


1.1 Introduction

Remote sensors [68] like earth observing satellites or weather radars, large-scale sensor infrastructures [72], and environmental wireless sensor networks [10] yield massive volumes of dynamic and geographically-distributed data at multiple space-time resolutions. The raw data needs to be converted to summary information and subsequently utilized for generating new knowledge, or actionable predictive insights, ultimately leading to faster and more accurate tactical decisions or more reliable longer-term policy. In addition to observations, scientific applications demand that information about the known physics, or data-dictated process dynamics, be taken into account. The scientific domains are diverse, and requirements for sensor-based data processing and analysis can be fairly broad on the one hand and domain specific on the other. This chapter focuses on knowledge discovery based on data obtained from sensor observations and/or computer-generated simulations, specifically in the context of providing decision or policy aids for hazards mitigation [43]. Hazards can be natural [85], as in weather extremes such as hurricanes and heat waves; technological, as in leakage and spread of toxic plumes from industrial facilities [53]; or adversarial, as in security [2] and war. This chapter primarily focuses on hazards due to weather or climate extremes [65]. The utility of intelligent data sciences and sensor information in the context of hazards mitigation has been demonstrated in a proof-of-concept manner (e.g., [76]). However, the opportunities far exceed the accomplishments. This chapter is organized as follows. The challenges facing knowledge discovery in hazards mitigation are discussed in Section 2, where a proposed holistic knowledge discovery framework is also described in relation to weather and hazards mitigation. Some short- and long-term predictive insights on natural hazards are the focus of Section 3.
The impacts of weather extremes on societies and infrastructures, as well as the challenges for short-term tactical decisions and long-term strategic policy, are discussed in Section 4. In Section 5, we present an overview of some knowledge discovery solutions developed or being developed by the authors and their collaborators in the area of weather and climate extremes and related natural hazards. How these solutions can serve as closed-loop knowledge discovery processes is also discussed in Section 5. The chapter is summarized in Section 6.

1 Knowledge Discovery for Scientific Applications


1.2 Knowledge Discovery Challenges and Framework

In this section, we describe the challenges of the current knowledge discovery framework, especially in the context of natural hazards, and propose a holistic approach that incorporates these challenges into the conventional framework.

1.2.1 Knowledge Discovery Challenges

The earth science community, including researchers working on weather or climate extremes, has developed and utilized traditional statistics [127] as well as non-traditional statistical models like extreme value theory [56, 57], spatial and space-time statistics [78, 17], and nonlinear dynamics and signal processing [93, 79, 127]. Preliminary success stories in knowledge discovery for the earth sciences have been reported in the last few years [91, 107]. The direct contributions to this area from data mining have been limited, with a few notable exceptions (e.g., [91, 49]). However, there are a few encouraging signs [76]. Newer developments in data mining include the ability to deal with dependence of learning samples or “relational” data [95, 74, 27]; mining rare events from dynamic sequences [119, 120]; as well as anomaly detection (primarily in the context of cyber-security, e.g., [67]) and event detection [44]. Computationally efficient methods for anomalies, extreme values, rare events, change, and nonlinear processes need to be developed for massive geographical data. An interdisciplinary focus within the data sciences is necessary, such that new tools can be motivated from a combination of traditional and non-traditional statistics, nonlinear dynamics, information theory, signal processing, econometrics and decision theory, and lead toward application solutions that span these areas.
In a section describing the “interdisciplinary nature of KDD”, [34] mention that KDD evolves from the “intersection of research fields” like “machine learning, pattern recognition, databases, statistics, AI, knowledge acquisition for expert systems, data visualization, and high performance computing”. The time may have come to further broaden the interdisciplinary roots. Scientific applications are built on domain knowledge, which is typically embedded within numerical models. The sensor-based knowledge discovery community also has a particular responsibility to work closely with modelers to develop efficient strategies for real-time data assimilation within physically-based computer models. In addition, analysis from observations needs to be effectively synthesized with model simulations. While the challenges are tremendous, the scientific opportunities [9] and benefits can be immense.

1.2.2 Knowledge Discovery Framework

Based on the challenges described in Section 1.2.1, we propose a somewhat broader knowledge discovery framework (see Figure 1.1), which describes the end-to-end process. The components of the proposed system, in the context of natural hazards, are briefly listed below:


Fig. 1.1. A holistic approach to knowledge discovery

1. Data Integration
   • Remote sensors, wired and wireless in-situ sensor networks
   • Numerical physics-based computer model outputs
   • Ancillary information and encoded domain knowledge
2. Pattern Detection
   • Offline knowledge discovery from observations and models
   • Computational efficiency and scalability to massive data
   • Anomalies, extremes, nonlinear processes, in space and time
   • Probabilities, intensities, duration, frequency, risks
3. Process Detection
   • Numerical models with sensor data assimilation schemes
   • Extraction of dynamics from massive sensor observations
   • Extraction of dynamics from incomplete, noisy information
4. Decision Support
   • Online knowledge discovery from models and observations
   • Algorithmic efficiency for dynamic, distributed processing
   • Resiliency, vulnerability, impacts, GIS-based metrics
   • Visualization and decision or policy aids

1.2.3 Pattern Detection or Offline Knowledge Discovery

The broad requirements of the offline pattern detection, other than the need for both capacity and capability computing, are the following:

1. Multidisciplinary: Multiple aspects of problems solved using a set of individual tools, each motivated from one or more disciplinary areas
2. Interdisciplinary: Comprehensive solutions developed based on a blend of methodologies spanning traditional disciplinary areas
3. Process-based: Larger overall problem partitioned into component processes and solved using physics and a suite of data science tools
4. Holistic: End-to-end solutions for an application from raw sensor and model data to decision and policy aids

The distinguishing features compared to traditional data science areas are the following (sub-bullets list the primary differences from the traditional):

• Data Mining
  – Enhanced focus on scientific rather than business
  – Algorithms for anomalies, extremes, rare and unusual
  – Geographic, time series, spatial, space-time relational data
• Statistics and Econometrics
  – Focus on computational efficiency and scalability
  – Methods for nonlinear processes and representations
  – Statistics of rare events and extremes/anomalies
• Nonlinear Dynamics and Information Theory
  – Robust to limited or incomplete and noisy information
  – Scalability to massive data
  – Spatial, space-time and geographic data
• Signal processing
  – Nonlinear dynamical, even chaotic, system behavior
  – Colored, even 1/f, noise
  – Noisy and incomplete information

One key challenge is the multi-scale nature of the problem, which may mean that the physical processes, parameterization and parameter values, as well as the fitted data generation models or distributions, do not remain invariant across space-time scales. The major requirement is to remain scalable to massive data and hence develop or utilize approaches for faster processing speeds and efficient data retrieval or analysis. The massive data volumes and increasing demands on the computational infrastructures suggest the utilization of high-performance computing platforms, both for simulations with real-time data updates and for pattern analysis or detection.

1.2.4 Online Knowledge Discovery and Decision Support

The “decision support” component comprises online knowledge discovery and the decision sciences. The online knowledge discovery processes need to be efficient in terms of memory usage and analysis times, must be able to handle incremental information in real time, and must generate time-phased or event-based decision metrics, at multiple geospatial locations and times, such that the metrics can be utilized for automated alert mechanisms or to facilitate the task of the “human in the loop” in the space-time context. One new example of this application is the concept of ubiquitous sensing. Ubiquitous sensing describes the concept that one has either an array of many sensors that generate high flows of data, much of which may be null (background, uninteresting or contradictory), or that one has a few mobile sensing platforms that need to be cost-effectively deployed. Examples of the first category include arrays detecting contraband crossing borders or denied areas. Examples of the latter category include satellite or air-breathing remote sensing assets. Some off-line modeling and decision support tools have been semi-coupled in real time to direct the next sequence of data acquisition. In one example of these decision support tools, Bayesian approaches formulate hypotheses (such as a missile launch being detected) and then either marshal the data from other elements of the array (in the first example) or move the mobile platform to the next location (in the second example) to gain the next most valuable data point that would reduce the uncertainty in the conclusion that an event was detected. Some of the applications and challenges of ubiquitous sensing have been discussed in the literature [92]. Efficient real-time algorithms are required to react to the real-time, dynamic, and distributed nature of the knowledge discovery as well as to direct this data acquisition within the time cycle of an event. Overall, the online approaches need to be algorithmically efficient, that is, the mathematical algorithms must be amenable to robust online implementations, which implies fast, storage-efficient, memory-efficient, adaptive, and real-time or near real-time analysis. However, there is a need for a tradeoff among computational efficiency, algorithm performance, and domain requirements. A good example of such a tradeoff is the SPIRIT algorithm [83], which is essentially an incremental version of the Principal Component Analysis (PCA) technique in which the weight estimates are slightly different from the principal directions in conventional (offline) PCA. The difference in computational requirements does not affect the algorithm performance.
Therefore, a good understanding of application domains is the key to achieving such a compromise. The decision science component encompasses the development of decision metrics in space and time for visualization and visual analytics; utilization of predictive insights from offline analysis and real-time distributed discovery processes; processing of dynamic and event-based streams of data in conjunction with offline discovery and real-time analysis; feedback loops from prior decisions or policies; and decision metrics, uncertainty and impacts of risks, including resiliency and consequences in the context of natural disasters.
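
The trade-off between an offline decomposition and an incremental one can be illustrated with a minimal sketch. The code below does not implement SPIRIT [83]; it swaps in Oja's rule, a classical streaming update that tracks only the leading principal direction, and it assumes a zero-mean stream. The function names, the learning rate, and the synthetic two-sensor stream are hypothetical choices made purely for illustration.

```python
import math
import random


def oja_first_direction(stream, dim, learning_rate=0.01):
    """Track the leading principal direction of a zero-mean stream (Oja's rule)."""
    # Arbitrary unit-length starting guess; only O(dim) state is kept in memory.
    w = [1.0] + [0.0] * (dim - 1)
    for x in stream:
        # Projection of the new reading onto the current direction estimate.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Oja update: move w toward x, with a decay term that limits growth.
        w = [wi + learning_rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
        # Renormalize so the estimate stays a unit vector.
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def synthetic_stream(n, direction, rng):
    """Hypothetical readings: strong signal along `direction` plus weak noise."""
    for _ in range(n):
        s = rng.gauss(0.0, 3.0)
        yield [s * d + rng.gauss(0.0, 0.3) for d in direction]


rng = random.Random(0)                 # fixed seed for repeatability
true_direction = [0.6, 0.8]            # unit vector chosen for the illustration
estimate = oja_first_direction(synthetic_stream(5000, true_direction, rng), dim=2)
# Absolute value because a principal direction is defined only up to sign.
alignment = abs(sum(a * b for a, b in zip(estimate, true_direction)))
print(f"alignment with true direction: {alignment:.3f}")
```

Each reading is processed once and then discarded, which is the memory and latency property that online knowledge discovery requires; the price is that the estimate only approximates the direction an offline PCA over the full history would return.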


1.3 Focus Area: Actionable Predictive Insights on Natural Hazards

The exposure of human life and economy to natural disasters, ranging from hurricanes, tornadoes, tsunamis and earthquakes to heat waves, cold spells, droughts and floods or flash floods, appears to have increased even as world economies have developed and prospered [85, 111]. While debates may persist on the exact causes that led to increased losses to human life and property from natural hazards in recent years, the fact that enhanced predictive insights can help mitigate the impacts of such hazards through improved decisions and policies is becoming relatively well accepted. Pearce [84] describes the need for a “shift [in] focus from response and recovery to sustainable hazard mitigation” and laments that in current practices “hazard awareness is absent from local decision-making processes”. An interesting early article by Sarewitz and Pielke, Jr. [98] discusses the art of making scientific predictions relevant to policy-makers, and uses weather extremes and natural hazards as examples. Two recent Science magazine articles [7, 70] highlighted the requirement to move beyond post-disaster consequence management and humanitarian aid disbursal, and toward preemptive policies based on predictive insights, as well as the need to realize that while natural disasters may or may not be “acts of God”, their consequences certainly are not, and can be remedied by policies. Hazards mitigation, especially via predictive insights of some rudimentary form, has been attempted since time immemorial with varying success.
However, what has changed dramatically in recent years is the availability of massive volumes of historical and real-time data from remote and in-situ sensors and of pattern detection approaches to extract actionable predictive insights from the data, as well as enhanced understanding of physical processes leading to improved short- and longer-term computer simulation models, which in turn are initialized and updated in real time with sensor observations for improved accuracy, and yield massive volumes of simulation outputs. In this sense, both the new opportunities and the key challenges in generating predictive insights for natural hazard mitigation rely on extracting knowledge from massive volumes of dynamic and distributed sensory data as well as large volumes of computer-based simulations. As we show later, while there has been some progress in this area, what have been lacking are concerted research efforts.

1.3.1 Predictive Insights for Decisions and Policies leading to Natural Hazards Mitigation

Sykes et al. [109] argue that “earthquake prediction should not be equated solely with short-term prediction - those with time scales of hours to weeks”, and that longer-term warnings are useful to society as well with “stakeholders likely to take different mitigation measures in response to each type of prediction”. We believe this argument is equally valid in the context of other types of hazards like tsunamis, hurricanes, floods, droughts, heat waves or cold spells. However, since the requirements, challenges, opportunities and implications of each type of predictive modeling may be different from the other, we discuss these separately. The fundamental distinction we make in this chapter is that short-term predictive analysis is for events (e.g., hurricanes, extreme floods, tsunamis) that are either already in progress or imminent (e.g., the underlying processes which generate the events have been initiated) and where tactical decisions may be appropriate and urgent, while longer-term predictive analysis generates anticipatory information about the probability, or about characteristics like intensity, frequency, duration, location and timing, of future hazardous events and the risks thereof, thus providing tools to policy-makers. We note that both short-term and longer-term insights need to be put in the context of their end-use before they can begin to facilitate tactical decisions or strategic policy. The fact that a hurricane of a certain category upon landfall in a specific location at a given time may cause a dam to break, or the fact that the sustained increase of extreme rainfall may cause crops to fail or cause tremendous damage to society, needs to be quantified in advance for the decision or policy tools to be effective. Hence, a complete understanding and quantification of risks and consequences of natural extremes, as well as the vulnerability of human population and critical infrastructures, are required to make the predictive insights effective and actionable.

Short-term Natural Hazards Prediction

An editorial in the journal Nature, a few days after the devastation caused by the Indian Ocean tsunami, mentions that “an effective warning system... 
could have reduced the scale of the disaster” [77]. A BBC news article [5] mentions that “the only warning most people in the region had was the sight of a giant wave heading towards them” and proceeds to describe the warning systems and sensors currently being developed. The importance of even marginal improvements in predictive modeling, as well as the need to couple predictive models with follow-on impacts within a decision framework, has been painfully evident for major named hurricanes like Katrina and Rita. The substantial uncertainty surrounding the track of Rita and the inability to assess the impact of mass evacuation on the transportation networks caused a traffic standstill in and around Houston on September 21, 2005. This left about 3 million people more exposed to the hurricane than necessary, although eventually Rita made landfall elsewhere and caused significant destruction at multiple locations [35]. The damage from Katrina, as is well known by now, was mostly caused by the levee breach [4] and resultant flooding rather than the direct impact of the hurricane. While there are controversies surrounding how much of that information was known in advance to decision makers [116, 11], the fact that advance warning could have resulted in less damage is not doubted. In a recent article in Science magazine, [52] indicate that remotely measured data from aircraft can improve the forecast of hurricanes by calibrating the numerical weather prediction model outputs. Willoughby [123] discusses the issues of forecasting hurricane intensities and impacts in another article of the same issue of Science. An active area of research, particularly in the civil engineering sciences and social sciences respectively, has been the understanding and quantification of infrastructural and societal disaster resiliency. Examples are provided in a special issue of Science magazine, which was devoted to this area [46], as well as in other recent work [113, 105, 19, 69, 110]. Post-disaster consequence management has been an area of focus within disaster mitigation agencies [82, 18]. The forecasting and prediction problem has been primarily left to meteorologists and numerical weather prediction models, even though convective rainfall and weather physics at high resolutions are known to be somewhat poorly approximated by these models. The combination of sensor-based data with model outputs can be a promising direction, based on results cited in the literature [52, 40]. However, the development of data acquisition and extraction systems leading to knowledge discovery from sensory data for actionable predictive insights has been a relatively neglected area. This situation has persisted even though the need to move beyond post-disaster consequence management is clearly felt by the community. Even so, there are encouraging signs based on recent developments in warning systems. Einstein [30] discuss the availability of warning systems for natural hazards, [109] discusses earthquake prediction, [122] discusses early warning systems for tsunamis, while hurricane prediction based on sensors and computer models has been discussed by [125] and [3]. Walker et al. [115] discuss the improvements in hurricane prediction over the years.
As the above review of the literature indicates, significant strides have been made but considerable challenges remain in terms of pattern detection from historical sensor data in a way that these can be used in real-time analysis and decision-making frameworks, the assimilation of sensor data into physics-based and large-scale computer models, sensor-based early warning systems and predictive models with continuous data-dictated feedback loops, as well as predictive modeling approaches that synthesize sensor observations with model simulation outputs in real time or near real time. These areas represent tremendous opportunities for the knowledge discovery and modeling/simulation communities, as well as in the area of high performance capability computing [108].

Longer-term Natural Hazards Prediction

The need to move beyond mere post-disaster consequence management and the disbursing of humanitarian aid, toward the generation of actionable insights which can guide decisions and policy for the pre-emption of the impacts of natural disasters, has been emphasized in venues like Science magazine. Bohannon [7] suggests that natural disasters are not so much “acts of God” as policy failures or lack of forewarning and preparedness, and that sustainable mitigation strategies can be put in place in the longer term if, for example, risk landscapes for natural hazards were available at zip code levels. High-resolution risk mapping for natural disasters entails an understanding of risks, consequences and vulnerability, and thus represents a call to action for interdisciplinary research involving geophysicists, atmospheric scientists, civil and environmental engineers (including water resources, infrastructures, foundation, transportation and environmental engineers), epidemiologists and geographers. Linnerooth-Bayer et al. [70] emphasize that post-disaster consequence management has failed the needs of developing countries and that there is a need for predictive analysis as well as to transfer the risk of disasters to financial markets. They feel these issues have become critical in light of climate change negotiations. Researchers like the above have pointed to the urgent need for generating anticipatory insights about natural disasters at high resolutions for multiple time-scales, such that these can lead to tactical, operational and strategic decisions at local, national and regional scales. There are several key challenges for the results of predictive insights to turn into effective policy tools, some of which are described below:

1. High-Resolution Risk Mapping: One key requirement is the generation of spatial vulnerability maps which can help national agencies or world bodies determine the optimal allocation of grant funding to reduce future disaster impacts.
2. Root-Cause Analysis: The other challenge is root-cause analysis, for example, the determination of whether the anticipated damage due to future disasters would be primarily due to geophysical, weather and climate extremes or due to population pressures and reduced community resiliency, or a combination. This determination can help sustainable spending of allocated funds. Thus, if resiliency is the major problem, investments in public education may be appropriate, while if the variability of natural extremes is a problem, then early warning systems may need to be further developed. On the other hand, if population pressure in an urban area is expected to be the primary cause of disaster damage, then the development of satellite towns as well as improvements in civil structural and transportation infrastructural systems, and sensor-based monitoring of such systems, may be needed.
3. Dynamic Preparedness Assessments: The assessment of return on investment from grant funding allocated for enhancing preparedness and resiliency to natural disasters, as well as a modeling of the temporal and spatial change in vulnerability, for example changes in expected disaster damage due to climate change or population growth, may help assess the continued need and effectiveness of fund allocation strategies.
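
As a rough illustration of how risk, consequence and vulnerability might be combined when comparing areas for fund allocation, the sketch below ranks areas by an expected annual loss. The multiplicative form (event probability × exposed value × vulnerability), the class and field names, and all numbers are assumptions introduced here for illustration; the chapter does not prescribe a specific risk metric.

```python
from dataclasses import dataclass


@dataclass
class AreaRisk:
    area_id: str                     # hypothetical identifier, e.g. a zip code
    annual_event_probability: float  # chance of a damaging event in a given year
    exposed_value: float             # value at stake (arbitrary units)
    vulnerability: float             # expected fraction lost if the event occurs (0..1)

    def expected_annual_loss(self) -> float:
        # Assumed multiplicative risk model: hazard x exposure x vulnerability.
        return self.annual_event_probability * self.exposed_value * self.vulnerability


def rank_by_expected_loss(areas):
    """Return areas sorted from highest to lowest expected annual loss."""
    return sorted(areas, key=lambda a: a.expected_annual_loss(), reverse=True)


# Made-up inputs; not derived from any real hazard or census data.
areas = [
    AreaRisk("zip-A", 0.02, 500.0, 0.6),
    AreaRisk("zip-B", 0.10, 100.0, 0.3),
    AreaRisk("zip-C", 0.01, 2000.0, 0.5),
]

for area in rank_by_expected_loss(areas):
    print(area.area_id, round(area.expected_annual_loss(), 2))
```

Keeping the three factors separate also supports a simple form of root-cause analysis: one can see whether an area ranks highly because of hazard frequency, exposure, or low resiliency, which point to different mitigation investments.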


The design of policy tools is critical as these can facilitate sustainable and long-term mitigation of natural geophysical hazards. The quantification of risks at high geospatial-temporal resolutions, assessment of change in the risks due to environmental change or societal movement, and optimal allocation of mitigation-related funding or scarce resources based on these risks, require significant offline processing of information as well as the ability to react to new information. Historical weather, hydrological or other geophysical observations from remote or in-situ sensors and sensor networks, as well as simulations from geophysical or climate models, need to be exploited to extract actionable predictive insights on past and anticipated extremes and anomalies, which can in turn facilitate risk modeling as described earlier. In addition to determining the properties of extremes and anomalies based on tails (ends) of probability distributions, stochastic process modeling, or simple user-defined criteria, quantitative information regarding second-order properties of extremes or anomalies, as well as changes in the properties thereof due to large-scale regional or temporal effects, may be useful and may lead to a further quantification of the expected impacts on infrastructures based on design specifications. The predictive insights can be based on observations alone, which can help in reconstructions of past scenarios as well as provide insights about the major changes in the past, including any possible abrupt regional change. These insights can help in discovering similar patterns in the more recent past or current data. The insights from observations can also yield predictive insights on the immediate future based on trends and other temporal patterns in the historical records. The predictive insights for the longer term need to be obtained from model projections.
Thus, climate models can generate scenario-based ensemble projections for the next century at fairly high resolutions. However, precision may not imply accuracy, especially when extremes and anomalies are of interest. Thus, a comparison of the insights obtained from model simulations with those from observations, based on time periods where both data may be available, can help understand and quantify the errors and uncertainties in the model outputs and the corresponding insights extracted from these outputs. Meehl and Tebaldi [73] provide an example where a relatively straightforward analysis led to significant insights on the future properties of heat waves. The predictive insights on geophysical, weather and climate extremes can be combined with consequence metrics like population and critical infrastructures as well as resiliency indices based on economic or financial indicators and other social or community factors. In the short term, these factors may be assumed to be static or exhibiting a simple analytically tractable variability; however, for realistic longer-term projections at higher resolutions, the projections for population, demographics, economic development and resiliency need to be developed for future time-periods. These are major areas of future research.
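
Characterizing extremes from the tail of a probability distribution can be made concrete with a small worked example. The sketch below assumes that annual maxima follow a Gumbel distribution (a special case of the generalized extreme value family) and fits it by the method of moments; both are simplifying assumptions for illustration rather than the approach of any cited work, and the rainfall values are invented. For location μ and scale β, the T-year return level is x_T = μ − β ln(−ln(1 − 1/T)).

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates (location, scale) of a Gumbel distribution."""
    mean = statistics.mean(annual_maxima)
    stdev = statistics.stdev(annual_maxima)
    # Gumbel variance is (pi^2 / 6) * scale^2; mean is location + gamma * scale.
    scale = stdev * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period_years):
    """Level exceeded on average once every `return_period_years` years."""
    p_non_exceed = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(p_non_exceed))


# Hypothetical annual maximum daily rainfall (mm); not real observations.
annual_maxima = [62.0, 75.5, 58.3, 91.2, 70.1, 66.4, 84.9, 73.0, 79.6, 68.8]
location, scale = fit_gumbel_moments(annual_maxima)

for years in (10, 50, 100):
    print(f"{years}-year return level: {return_level(location, scale, years):.1f} mm")
```

Running the same calculation on observations and on model simulations over a common period is one simple way to compare the tail behavior of the two sources, in the spirit of the comparison described above.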


Prediction and predictability in the longer terms invariably lead to significant debates, both within the scientific community and between scientists and policy-makers. Thus, while many researchers readily relate weather-related disasters in recent years to global warming [101], a few others suggest population pressures and vulnerability as the primary cause [88, 99]. In fact, researchers have even questioned the accuracy of disaster loss data. However, we believe these debates will get resolved as more and better observations are gathered and analyzed, compared with model simulations, utilized in conjunction with other data for root-cause analysis regarding past and anticipated disaster damage, and mined to discover new insights. Meanwhile, there is a need to be cognizant of the ongoing debates, as well as the mainstream scientific opinions, and convey the uncertainties and viewpoints accurately to the policy-makers.


1.4 Case Study: Weather Extremes Impacts on Infrastructures and Communities

This section describes weather extremes impacts in relation to short-term predictions, long-term predictions, and computational requirements. The research and practice requirements are summarized in Table 1.4.

Table 1.4: Weather extremes and impacts on societies and infrastructures

             Weather Extremes              Resiliency and Readiness
Short Term   NWP Models with Updates       Infrastructural Impact Level
             Nowcasting; Extrapolation     Sensor-based Monitoring
             Flood and Surge Models        Population and Evacuation
Long Term    Probabilities and IDF         Design Safety Factors
             Climate variability impact    Resiliency and Redundancy
             Climate change impact         Policies for Mitigation

1.4.1 Short-Term Predictions and Tactical Decisions

Short-term weather predictions involve the use of Numerical Weather Prediction (NWP) models [24] with real-time updates based on sensor observations [51, 25] and data assimilation schemes [126, 45], for example, in hurricane forecasts [104, 55]. However, one problem with the use of NWP models for short-term forecasting of hurricanes and convective storms is that the physical processes for convection are not well developed. This has remained true in spite of improvements in our understanding of the physics, better data from sensors, and enhanced data assimilation schemes. Nowcasting based on extrapolation [89] of observations has tended to outperform NWP models at high resolutions and for very short prediction lead times (an hour or two), while NWP models appear to perform better beyond 6 hours or so, especially at lower resolutions. Approaches which can blend NWP model outputs with extrapolation-based predictions, in a way that the physics of the high-resolution processes are taken into account, appear to be a way forward, especially for the critical 1-6 hour lead times [40]. Some of the short-term impacts are discussed below:

1. Hydrologic and Infrastructural Impacts: The damages from hurricanes are typically caused by floods and coastal surges; thus an ability to couple the hydro-meteorological results with flood models [114] and surge models [21] is an important requirement. The readiness and resiliency of infrastructures and communities to weather extremes like hurricanes have been topics of intense research lately [1]. The assessment of impacts on infrastructures (e.g., doing in advance what ASCE, 2005, did after the fact) and the dynamic sensor-based monitoring thereof [71] can go a long way toward improved infrastructure resiliency if the approaches are intimately tied to short-term forecasts of weather extremes and a decision-making framework.
2. Tactical Decisions toward Hazard Mitigation: The short-term tactical decisions in the context of natural hazards like hurricanes are typically restricted to understanding impacts on population [96, 22, 6, 23, 106] and infrastructures [8, 66] to help save lives and economies, devising effective evacuation strategies for the impacted population [112], planning strategies for humanitarian aid disbursal [42], and understanding the capacity for short-term post-disaster resiliency and recovery [1, 50, 20].
3. Challenges and Opportunities: Tremendous challenges and opportunities exist in terms of sensor-based data assimilation, as well as synthesis of observations and model outputs in real time, based on multi-scale information synthesis [40] schemes like the Ensemble Kalman filter [32, 93], and computational infrastructures for capability computing [108]. There is also a need for real-time analysis of data at distributed locations from remote or in-situ sensors, which motivates online analysis and discovery tools for real-time processing of sensor data [33]. Integrated decision frameworks require holistic solutions, where knowledge discovery opportunities exist for dynamic sensor-based monitoring of infrastructures and transportation networks, real-time sensor-based traffic control and advisories for evacuation planning, and sensor-based resiliency monitoring of hydraulic and other structures. These challenges require interdisciplinary teams for holistic decision frameworks.

1.4.2 Longer-Term Predictions and Strategic Policy

Patterns and Trends from Observations

An important aspect of longer-term predictive modeling of weather extremes involves the analysis of trends and patterns in historical and current observations and utilizing these for future projections [15, 57, 13, 36].
The probabilities of occurrence, as well as the intensities, durations and frequencies, of extreme weather events have been analyzed from observations, along with the trends and variability in these characteristics of extremes [124, 100, 37, 28, 62]. However, there is significant scope for pattern analysis and offline discovery because massive datasets lie unexplored and new datasets are becoming available every day. We note that there is a need for caution during the analysis and interpretation of extremes [14].

Debates and Uncertainties on Trends

There have been intense debates in recent years among the scientific community on whether changes in the number, duration and intensity of tropical cyclones can be considered a trend [117, 12, 118]. The debates are pervasive and do not necessarily spring from the veracity of statistical analysis alone [31, 87]. We also note that these debates are different from previous ones regarding the occurrence of global warming, on which a consensus has nearly been achieved, at least among mainstream scientists [54], or the steady increase in atmospheric carbon dioxide over the last several decades, which is an observed fact [58]. The debates here deal with specific impacts of climate change on weather extremes like hurricanes, rather than the phenomenon of climate change itself.

Climate Impacts on Weather Extremes

The tremendous uncertainty surrounding some of the issues regarding the climate-weather linkage, for example the links of global warming to the increase in the number and intensity of hurricanes, suggests that a closer inspection is necessary. Specifically, historical weather-sensor observations and climate data gathered from various sources need to be analyzed, in an offline mode, in significant detail and with much greater care. When climate models or indicators are utilized to understand and quantify the impacts of climate on weather extremes, there is a need to delineate the impacts of natural climate variability before or during the quantification of the impacts of climate change. These issues are described in detail below:

1. Natural Climate Variability and Weather Extremes: In the longer term, natural climate variability, for example the inter-annual El Nino phenomena, can have significant impact on weather and hydrologic extremes [47, 97, 59]. In fact, the 2006 hurricane season turned out to be much quieter than anticipated and the most plausible hypothesis is the occurrence of the El Nino [26], although an influence of African dust storms has also been suggested as an added factor [75]. Incidentally, as of writing this manuscript, a “very active” 2007 hurricane season is being predicted by forecasters [102].
The need to understand and quantify the impacts of natural climate variability on weather extremes is underscored by the previous examples. The ability to quantify climate variability, including climate anomalies like El Nino, requires processing of massive geographic data obtained from remote sensors like satellites and aircraft as well as in-situ sensors for ocean temperature or salinity and sensor networks like ocean monitoring instrumentation. The ability to relate such large-scale geophysical phenomena to weather or hydrologic extremes and regional change, both for offline discovery and online analysis, requires holistic knowledge discovery approaches and massive computational capabilities. The need to quantify the impact of natural climate variability also stems from the requirement to delineate and isolate the effects of global or regional climate change [54], and in particular possible anthropogenic effects [48], on weather extremes and natural hazards. The assertions and debates around the impacts of global warming on weather related hazards have been alluded to earlier in this chapter. The impacts and current wisdom in the insurance sector (e.g., see [121] and [101] for two interesting viewpoints) may provide some indications on how the financial world may be adapting, or may be anticipating the need to adapt, to climate change. Some researchers have emphasized human factors, rather than climate change, as the primary driver of recent natural disasters [29], although a consensus is lacking on the relative impacts of change in climate versus human factors (e.g., [101, 121]). These debates point to two different lines of research. First, there is a need to understand the relative and complementary roles of climate and human-factor induced changes and their combined impact on natural disaster losses. Second, there is a need to understand how future changes in climate may influence the variability of the weather extremes as well as related disaster losses. Finally, there is a need to combine the risks, consequences, vulnerabilities and anticipatory damage assessments on impacts within one policy tool which can provide metrics and visual guidance to policy makers.
2. Climate Change and Weather Extremes: Future projections, especially over sufficiently long terms when projections based on trends in past or current observations may no longer be valid, need to rely directly or indirectly on climate model simulations. State of the art climate models like the Community Climate System Model, Version 3 (CCSM3), can generate precise climate reconstructions and predictions, for example daily and even 6-hourly estimates of climate variables in three dimensions and on one-degree spatial grids, from the year 1870 to 2100 [16].
However, precision does not necessarily imply accuracy, and precise predictions may be only as good as the temporal and spatial scales of the atmospheric and coupled earth system processes that such systems can model. A thorough and careful analysis of the model simulations, with a quantification of the uncertainty, can yield interesting and novel insights. The first step is to compare the model outputs with observations for time periods when both are available, in an effort to understand the problems in the model outputs and quantify the inherent uncertainties in space and time. Since any simulation model is an imperfect realization of reality and tends to smooth out the outliers and extremes, this can be a hard test for climate models. However, simplified tests may help prove a point. Thus, while the parameters of extreme value theory obtained from observations and model simulations of temperature may or may not be statistically similar, the number, frequency and duration of heat waves based on user-defined criteria and thresholds may align well, and this alignment may provide sufficient information in some cases. There have been attempts to compare modeled and observed extremes [63], even based on detailed statistical analysis of extreme values [64].
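As a minimal sketch of the simplified test described above, the following Python fragment counts heat waves under a user-defined threshold and minimum duration, and compares an observed series with a simulated one. The function name, the threshold of 35 degrees, the three-day minimum and the sample values are hypothetical and chosen only for illustration; they are not taken from [63], [64] or [73].

```python
def count_heat_waves(temps, threshold, min_days):
    """Count runs of consecutive values above `threshold` lasting at least `min_days`.

    Returns (number_of_heat_waves, list_of_durations).
    """
    durations = []
    run = 0  # length of the current run of hot days
    for t in temps:
        if t > threshold:
            run += 1
        else:
            if run >= min_days:
                durations.append(run)
            run = 0
    if run >= min_days:  # a heat wave may extend to the end of the record
        durations.append(run)
    return len(durations), durations


# Hypothetical daily maximum temperatures (illustrative values only).
observed = [30, 36, 37, 38, 31, 36, 36, 29, 37, 38, 39, 40]
simulated = [31, 35.5, 36.5, 37, 30, 35.2, 34, 28, 36, 37, 38, 39]

obs_summary = count_heat_waves(observed, threshold=35, min_days=3)
sim_summary = count_heat_waves(simulated, threshold=35, min_days=3)
print("observed:", obs_summary)
print("simulated:", sim_summary)
```

In this toy case the two series differ value by value, yet the heat wave counts and durations agree, which is the kind of alignment the text suggests may be sufficient for some purposes.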

The next step is to investigate trends and patterns within climate model projections and quantify the uncertainties based on the results of model-observation comparisons. The Science paper by Meehl and Tebaldi [73] demonstrated how insights about future weather extremes, in their case heat waves, can be obtained in this fashion. However, papers like these merely scratch the surface of what can potentially be achieved in terms of actionable predictive insights from a combination of observations and models if temporally-, spatially- and geographically-aware knowledge discovery tools [64, 60, 62], specifically tailored for earth science applications, are “let loose” on the massive volumes of sensor-observed and model-simulated data to validate, and perhaps discover, insights about weather extremes. This is an urgent and high-priority research area whose time has clearly come.

Human Impacts of Weather Extremes

1. Globalization and Change in Human Factors: While our discussions have focused on weather extremes alone, there have been claims that the current increase in disaster losses is caused more by human factors, as discussed previously. However, there is also an understanding that anticipated climate change may begin to change the relative impacts. In any case, a link needs to be firmly established (or rejected) between the anticipated change in weather extremes, whether caused by inherent climate variability or change, and the corresponding impacts on human population [38, 7] and adaptability.
2. Resiliency, Vulnerability and Policy Tools: Weather disaster impacts relate to the design and safety of infrastructures and the resiliency of vulnerable societies [110, 85, 113]. An integrated policy tool is needed for various levels of strategic decisions.
Thus, observation-based insights on hurricane or rainfall extremes may be utilized to design more resilient hydraulic structures in the short term near the coasts or stronger foundations of offshore structures. In addition, these policy aids can be used to assess the extreme variability, risks or consequences, resiliency and overall damage caused by anticipated extremes (for a proof-of-concept example, see [38, 103, 80]).
3. Policy Impacts: The climate-weather links in the medium term may lead to the design of highway and infrastructure sensing or monitoring systems, build redundancies for contingency planning, enhance readiness of societies through early warning systems and public education, and perhaps even plan human habitations such that vulnerabilities may be reduced. Long-term planning based on climate projections may influence human habitation and demographics through carefully regulated policies, for example through planned movement of population from vulnerable regions and the development of sustainable resiliency, and may guide climate policy.

1.4.3 Computational Requirements

The difference between the requirements for short-term predictive modeling and tactical decision-making versus longer-term predictive analysis and strategic policy-making has interesting facets. One important example is the differing needs for high performance computing. As noted by Strohmaier et al. [108], solutions for the more computationally challenging problems, including ones with real-time constraints like weather prediction, require “capability computing”, while parametric studies and problems requiring simultaneous running of jobs need to leverage “capacity computing”. At the risk of over-generalization, one can perhaps state that capacity computing may be sufficient for exploratory statistical analysis of historical weather-sensor observations and climate-model simulations, and even for more involved non-traditional analysis algorithms. However, as the knowledge discovery processes and analysis approaches get more involved, the requirements are expected to move toward capability computing.

1.5 Knowledge Discovery Solutions for Weather Extremes and Impacts

This section provides an overview of the knowledge discovery solutions developed or being developed by the authors and their collaborators in the area of weather and climate extremes and related natural hazards. While the sample solutions are neither exhaustive nor intended to be directional, they can help illustrate the possibilities that knowledge discovery, broadly construed, can bring to the area of hazards mitigation. Attempts to predict natural hazards and their consequences are perhaps as old as human civilization but are by now recognized to be a major challenge. However, there are recent developments which make us believe, to quote a news article [7] in Science magazine, that “science can provide solutions”. The reasons for our optimism are, specifically:

1. Availability of massive volumes of geographic data from remote sensors, in-situ sensor infrastructures and wireless sensor networks
2. Enhanced understanding of process physics in the earth sciences or environmental systems leading to the development of advanced modeling and computational platforms
3. New developments in knowledge discovery for extracting actionable predictive insights based on raw data from sensors or from simulation models
4. Ability to develop closed-loop decision and policy aids that combine risks, consequences and vulnerability through quantifiable metrics for infrastructural and community resiliency and vulnerability

The examples presented in this section attempt to emphasize the potential of the above based on the work done by our team and collaborators. In addition, these are intended to provide examples of both the closed-loop knowledge discovery process depicted in Figure 1.1 and an end-to-end solution for decision and policy making for natural hazards mitigation.

1.5.1 End-to-end solution for natural hazards

This sub-section discusses examples of a few building blocks which can ultimately lead to a closed-loop process for tactical and strategic decision-making in the context of weather or climate related hazards. In this respect, we highlight the following solutions:

1. Short-term weather prediction based on numerical weather model outputs and remotely sensed observations: [40]
2. Real-time change detection from remotely sensed data and metrics for dynamic and distributed alarms and decisions: [33]
3. Observed trends in weather extremes from massive and geographically-distributed data: [62]
4. Predictability from short and noisy geophysical datasets and the extraction of underlying dynamics from data: [61]
5. Natural large-scale climate variability and the impacts on local geophysical phenomena like hydrologic cycles: [59]
6. Comparison of weather extreme properties extracted from climate model simulations and historical observations: [64]
7. Evaluation of high-resolution population estimates to develop metrics related to population under threat: [96]
8. Quantification and visualization of human impacts of climate and weather extremes leading to policy tools: [38]

Fig. 1.2. Short-term rainfall prediction [40]

Figure 1.2 depicts the short-term rainfall prediction methodology developed by Ganguly and Bras [40]. Numerical weather prediction model outputs and remote sensor observations from weather radar were blended for high-resolution forecasts at radar resolutions and for 1-6 hour lead times. The approach relied on a process-based strategy, where the overall problem was partitioned into component processes based on domain knowledge and exploratory data analysis and then the results were re-combined. The forecasting strategy utilized a combination of weather physics like advection, as well as a suite of traditional and new adaptations of data-dictated tools like exponential smoothing, space-time disaggregation, and Bayesian neural networks. Case studies with real data indicated [39, 40] that the methodology was able to outperform the state of the art approach at that time.
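Of the data-dictated tools named above, exponential smoothing is the simplest to illustrate. The sketch below shows simple exponential smoothing in its textbook form, not the adapted version used in [39, 40]; the function name, the smoothing weight `alpha` and the sample rainfall values are assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    `alpha` in (0, 1] controls how quickly older observations are discounted.
    The last smoothed value can serve as a one-step-ahead forecast.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical hourly rainfall accumulations in mm (illustrative values only).
rainfall = [2.0, 4.0, 6.0]
smoothed = exponential_smoothing(rainfall, alpha=0.5)
one_step_forecast = smoothed[-1]
print(smoothed, one_step_forecast)
```

A larger `alpha` tracks recent observations more closely, which matters at very short lead times; a smaller `alpha` damps noise at the cost of responsiveness.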

Fig. 1.3. Online change detection and alarms [33]

The online approach developed by [33], shown in Figure 1.3, was motivated by statistical process control methods for change detection or alarm generation, and relied on a novel adaptation of a simulated annealing based technique for change point detection. The method is real-time in the sense that incremental data can be analyzed efficiently as soon as these become available. The method was applied to a remotely sensed and geographically-dimensioned vegetation index. The approach was validated with data for Wal-Mart store openings [90], which caused significant changes in vegetation indices in the store area as well as in surrounding regions. This approach can be used in the context of real-time natural disaster management by identifying regions in space and time with significant and rapid change in land cover, or to identify known features. The analysis methods can be used to investigate geographic and remote sensing data at low resolutions and then zero in on areas where change is occurring or has occurred in the recent past. In addition, the methods can be used to identify changes due to deforestation and changes in satellite-based products like cloud cover. The algorithmic efficiency and incremental analysis approach differentiate this method from many traditional approaches used in GIS and remote sensing change detection applications.

Figure 1.4 exhibits a preliminary result from the approach developed by [62] for computing the spatio-temporal trends in the volatility of precipitation extremes. The methods were utilized for large-scale geographically-dimensioned data at high spatial resolutions. The geospatial-temporal indices were computed at each grid point in South America for which a rainfall time series was available. The change in the indices can be quantified and visualized by utilizing multiple time windows of data. The color scheme in Figure 1.4 and the GIS-based visualization were done as part of the [38] study described later.
The red-amber-green color combination is used to denote high (red) to low (green) volatility for the extremes. The extremes volatility index was a new measure based on the ratios of return levels, computed at each grid point. The index presented here was normalized to scale between zero and unity by [38]. The ability to quantify and visualize weather and hydrologic extreme values and their properties (e.g., 100-year levels) in space and time is an important first step to studying the impacts of these extremes on infrastructures and human societies.

Fig. 1.4. Space-time trends in extremes volatility [62]
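The text describes the extremes volatility index only as a ratio of return levels, so the following is a rough sketch of that idea rather than the definition used in [62] or the normalization of [38]. It assumes, purely for illustration, a Gumbel distribution fitted to annual maxima by the method of moments and a ratio of the 100-year to the 10-year return level; the function names and sample values are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_fit_moments(annual_maxima):
    """Method-of-moments Gumbel fit: returns (location, scale)."""
    mean = statistics.mean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def return_level(loc, scale, period_years):
    """Level exceeded with probability 1/period_years in any given year."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / period_years))


def return_level_ratio(annual_maxima, high_period=100, low_period=10):
    """Assumed illustrative index: ratio of a rarer to a more common return level."""
    loc, scale = gumbel_fit_moments(annual_maxima)
    return return_level(loc, scale, high_period) / return_level(loc, scale, low_period)


# Hypothetical annual maximum daily rainfall at one grid point, in mm.
maxima = [52, 61, 48, 75, 66, 58, 90, 55, 63, 70]
ratio = return_level_ratio(maxima)
print(round(ratio, 3))
```

Computing such a ratio independently at every grid point, and over several time windows, gives a field that can be mapped in the manner of Figure 1.4.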

Fig. 1.5. Predictability in short and noisy data [61]

The ability to deal with massive volumes of geographic data from remote and in-situ sensors needs to be complemented by the ability to derive predictive insights from short and noisy geophysical and weather or climate related observations. Some of the most relevant historical geophysical data and indices may be limited or incomplete; however, the presence of nonlinear dynamics and chaos on the one hand, and colored or even 1/f noise with seasonal fluctuations on the other, cannot be ruled out a priori. In fact, the ability to detect the underlying nonlinear dynamical signals from such data may be of significant value for studies in short and long-term predictability. Khan et al. [61] developed a methodology to extract the seasonal, random and dynamical components from short and noisy time series, and applied the methods to simulated data and real river flows. Figure 1.5 shows the application to the Arkansas River. The methodology was based on a combination of tools from signal processing, traditional statistics, nonlinear prediction and chaos detection.

Fig. 1.6. Nonlinear impact of El Nino on rivers [59]

The ability to quantify nonlinear dependence from historical data, even in situations where such data are short and noisy, is a critical first step in studies on predictability, predictive modeling, and physical understanding of weather, climate and geophysical systems. Khan et al. [59] developed an approach based on nonlinear dynamics and information theory, along with traditional statistics, to develop and validate new adaptations of emerging techniques for this purpose. The approach was tested on simulated data, and then applied to an index of the El Nino climate phenomena and the variability in the flow of tropical rivers. The approach revealed more dependence between the variables than previously thought. The ability to quantify the impacts of natural climate variability on weather and hydrologic variables, as shown in Figure 1.6, can help refine our understanding of the impacts of climate change. The methodology, which can have significant broader impact beyond the case study considered here, was refined and expanded in a subsequent work [60].
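To illustrate why an information theoretic measure can reveal dependence that linear correlation misses, the sketch below estimates mutual information with a plain equal-width histogram. This is a generic textbook estimator, not the adapted measures of [59, 60], which are designed for short and noisy data; the function name, the number of bins and the synthetic series are assumptions.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Histogram estimate of mutual information (in nats) between two series."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    marg_x, marg_y = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log( p(i,j) / (p(i) * p(j)) ), written in terms of counts
        mi += (count / n) * math.log(count * n / (marg_x[i] * marg_y[j]))
    return mi


# Synthetic example: y depends on x only through a symmetric nonlinear map,
# so the linear correlation between x and y is zero by symmetry.
x = [i / 50 - 1 for i in range(101)]
y = [v * v for v in x]

mi_nonlinear = mutual_information(x, y)
mi_constant = mutual_information(x, [1.0] * len(x))
print(mi_nonlinear, mi_constant)
```

The histogram estimator is known to be biased for short records, which is one reason more robust adaptations are needed for the kinds of geophysical series discussed here.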

Fig. 1.7. Geospatial-temporal extreme dependence [64]

Kuhn et al. [64] developed a new approach to quantify the geospatial-temporal dependence among extreme values from massive geographic data. The methodology was motivated by recent developments in multivariate extremes, and hence can be applied to quantify the dependence among extremes of multiple variables, for example heat waves and precipitation extremes, in space and time. In addition, this copula-based measure can be useful to analyze simultaneous occurrence of extremes, and use these as indicators of possible change. Thus, if two 100-year events which have zero extremes dependence were to occur simultaneously, this would be a 10,000-year event (see [64] for details), whereas if the extremes have complete dependence then the simultaneous occurrence still represents a 100-year event. The new measure was utilized on geospatial-temporal rainfall observations and climate model simulations for inter-comparisons and model evaluation as shown in Figure 1.7. In addition, the extremes dependence in space and time was compared with the corresponding spatial correlation values, obtained here through a rank-based correlation measure.

Metrics were developed by [96] for the comparative evaluation of geospatial datasets, with particular application to high-resolution population maps.
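The return-period arithmetic for simultaneous extremes quoted above from [64] can be checked with a few lines of Python. The sketch covers only the two limiting cases named in the text, zero and complete extremes dependence; the function name and the `dependence` labels are hypothetical, and intermediate dependence, which is what the copula-based measure quantifies, is deliberately left out.

```python
def joint_return_period(period_a, period_b, dependence):
    """Return period (years) of two extremes occurring in the same year.

    dependence: "independent" (zero extremes dependence) or
                "complete" (the two extremes always co-occur).
    """
    p_a, p_b = 1.0 / period_a, 1.0 / period_b  # annual exceedance probabilities
    if dependence == "independent":
        p_joint = p_a * p_b
    elif dependence == "complete":
        p_joint = min(p_a, p_b)
    else:
        raise ValueError("only the two limiting cases are sketched here")
    return 1.0 / p_joint


print(joint_return_period(100, 100, "independent"))
print(joint_return_period(100, 100, "complete"))
```

Any real pair of variables falls between these two limits, so the simultaneous occurrence of two 100-year events corresponds to a return period somewhere between 100 and 10,000 years.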

Fig. 1.8. Threat metrics for population-at-risk [96]

The metrics could be used for the evaluation of the population-at-risk, if a disaster were to strike at any geographic location at a given point in time. Three sets of metrics were developed and utilized for comparative analysis of high-resolution population data. The first set comprised traditional spatial auto- and cross-correlation measures. The second set of metrics was used for analyzing the differences at spatially aggregated or distributed scales. The third set of metrics was designed to measure the effectiveness of geospatial data in terms of their end-use, specifically the use of population data for disaster management. This last set of metrics combined and refined the concepts of “equitable threat scores” and other skill scores, typically used to rank meteorological or climate predictions, and “receiver operating characteristic” (ROC) curves used in signal processing. The metrics were refined to provide measures for evaluating multiple geospatial datasets. Case studies in regions like the one in Figure 1.8 demonstrated the significant difference in information content between high-resolution population maps.

Figure 1.9 shows a typical map generated by [38]. This work investigated how a combination of variables, specifically the precipitation extremes volatility of [62], the high-resolution population maps described by [6] and utilized by [96], as well as measures representing development or financial indices like the GDP, could be utilized in conjunction with each other to quantify and visualize the human impacts of natural disasters, specifically those caused by rainfall extremes in South America. The computed geospatial indices included the probabilities of truly unusual rainfall extremes, the risks to human population associated with such extremes, and the resiliency, or the ability of a region to respond to the disaster. Anticipatory information on disaster damage based on refinements of this study can aid policy makers.
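For readers unfamiliar with the equitable threat score mentioned above, the following sketch computes it in its standard meteorological form from a 2x2 contingency table. It is not the refined metric of [96]; applying it to two gridded population maps by thresholding each cell is an assumption made here for illustration, as are the function names, the threshold and the sample values.

```python
def contingency_counts(reference, candidate, threshold):
    """Count hits, misses, false alarms and correct negatives cell by cell.

    A cell is an "event" when its value is at or above `threshold`.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for ref_value, cand_value in zip(reference, candidate):
        ref_event = ref_value >= threshold
        cand_event = cand_value >= threshold
        if ref_event and cand_event:
            hits += 1
        elif ref_event:
            misses += 1
        elif cand_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """Standard ETS: threat score corrected for hits expected by chance."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denominator if denominator else 0.0


# Hypothetical population counts per grid cell from two datasets.
reference_map = [120, 5, 300, 0, 80, 10]
candidate_map = [100, 20, 250, 0, 10, 5]

counts = contingency_counts(reference_map, candidate_map, threshold=50)
score = equitable_threat_score(*counts)
print(counts, score)
```

A score of 1 indicates perfect agreement on which cells exceed the threshold, and a score near 0 indicates agreement no better than chance.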

Fig. 1.9. Human impacts of weather related disasters [38]

1.5.2 Closed-loop knowledge discovery process

This sub-section attempts to demonstrate how the solutions described briefly in the earlier sub-section, and in detail within the references cited, serve as examples of closed-loop knowledge discovery processes, specifically focusing on some of the key requirements and challenges for knowledge discovery described in the previous sections. The specific examples are listed below:

1. Multidisciplinary knowledge discovery for the multivariate dependence estimation problem: [96, 60, 64]
2. Interdisciplinary knowledge discovery for problems in short-term weather modeling and for the extraction of underlying nonlinear dependence and dynamics from historical geophysical data: [40, 39, 61, 60]
3. Process-based knowledge discovery for short-term weather modeling: [39, 40]
4. Holistic knowledge discovery for policy tools based on predictive insights on weather and climate extremes and their human impacts: [62, 96, 38]
5. Geographic knowledge discovery with a focus on extremes and anomalies for offline and online analysis of massive geospatial-temporal data: [62, 64, 96, 33]
6. Offline and online knowledge discovery requiring solutions for computational efficiency and algorithmic efficiency: [64, 33]

The specific examples have already been briefly described in the previous section as well as in the cited references to our recent and ongoing research. We now discuss how the approaches serve as examples of the knowledge discovery concepts described above.

Multidisciplinary Knowledge Discovery

The utilization of knowledge discovery approaches spanning traditional disciplinary areas to develop solutions in the broad area of multivariate dependence estimation is described here. While each solution is motivated from a specific disciplinary area, the disciplines across these areas range from spatial statistics and extreme value theory to nonlinear dynamics and information theory. Multivariate dependence estimation is relatively mature in areas like linear correlation estimation between non-dimensioned data, auto-correlation and cross-correlation functions within and among time serial data, Fourier analysis and signal processing, and spatial auto-correlation and cross-correlation. There have been previous attempts to develop nonlinear information theoretic measures for dependence, especially for time serial data. However, the measures are not usually chosen to be robust to short data sizes and the presence of noise and seasonality. Nonlinear dependence among spatial or space-time data is an area yet to be explored significantly. Finally, while dependence is typically calculated only among prevalent or usual values, the ability to compute the dependencies among unusual or extreme values, as well as anomalies, is an area that has so far been neglected. The problems we targeted spanned linear, nonlinear and extreme dependence estimation within and among temporal, spatial and spatio-temporal data. Thus, Sabesan et al.
[96] utilized spatial correlation based metrics to evaluate high-resolution population data, [60, 59]developed new adaptations of metrics motivated from information theory, nonlinear dynamics and statistics, for extracting time serial dependence from short and noisy data, while [64] developed novel metrics for geospatial-temporal dependence among extreme values, motivated by recent developments in multivariate extreme value theory, and adapted for scalability and efficiency. The examples presented here discuss a targeted utilization of existing theory [96], adaptations of recent or emerging theory [60] and development of new methodologies [64]. The articles, as well as the references therein, also provide examples of some of the existing methods in multivariate dependence, which are used in those articles for comparative analysis and/or for exploratory analysis. We note that significant open challenges remain in multivariate dependence estimation, some of which have been discussed within the previous references. Specifically, theories for nonlinear spatial and temporal dependence, or for computing dependence among anomalies and extremes within and among each

28

Auroop R. Ganguly*, Yi Fang, Shiraj Khan, and Olufemi A. Omitaomu

other in space and time, are still at rudimentary stages. Perhaps a major challenge is the development of high performance computing that can embed these approaches for automated dependence extraction within and among massive volumes of temporal, spatial, spatio-temporal and geographically-dimensioned data. Multivariate dependence estimation is typically expected to be within the realm of capacity computing, where jobs need to be run simultaneously, however capability computing may be occasionally needed. Interdisciplinary Knowledge Discovery Certain problems cannot be solved unless a single unified approach is developed which relies on combination of tools or methods motivated from multiple disciplinary areas. The short-term weather forecasting problem described in [39] and [40] could not have been solved unless approaches ranging from weather physics to traditional statistical methods for time series like exponential smoothing were used in conjunction with an adaptation of neural networks for space-time forecasting and uncertainty estimation. Khan et al. [61] leveraged a combination of tools from signal processing, traditional statistics and nonlinear dynamics to extract underlying nonlinear dynamical signals from short and noisy time serial datasets obtained from stream gage sensors. The predictability bounds and extraction of dynamical signals would not have been possible without an approach that relied on methods drawn from the various disciplinary areas. Khan et al. [59] and [60] adapt and utilize various emerging approaches for nonlinear dependence from time serial datasets. The overall dependence estimation and their inter-comparisons would not have been possible without methods drawn from multiple disciplines like time series statistics, nonlinear dynamics, and information theory. 
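To make the contrast between linear and nonlinear dependence measures concrete, the following Python sketch compares the Pearson correlation with a simple histogram-based (plug-in) mutual information estimate on synthetic data. It is not the estimator studied in [59] or [60]; the function names (pearson, rank_bins, mutual_information), the equal-frequency binning and the synthetic series are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins using its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=8):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    bx, by = rank_bins(x, n_bins), rank_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(i, j) * log( p(i, j) / (p(i) * p(j)) ) over occupied cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# Synthetic data (assumption): y depends on x strongly, but not linearly.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y_nonlinear = [a * a + rng.gauss(0.0, 0.05) for a in x]
y_independent = [rng.uniform(-1.0, 1.0) for _ in x]

print("Pearson r, nonlinear pair:  ", round(pearson(x, y_nonlinear), 3))
print("MI (nats), nonlinear pair:  ", round(mutual_information(x, y_nonlinear), 3))
print("MI (nats), independent pair:", round(mutual_information(x, y_independent), 3))
```

The plug-in estimate is biased upward for short series, which is one reason robustness to short and noisy data is emphasized above; the bin count is a tuning choice rather than a recommendation.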
Process-based Knowledge Discovery

Ganguly and Bras [40] developed a short-term rainfall forecasting strategy that utilized numerical weather prediction model output and remote sensor data from weather radar. A combination of tools was developed, which was motivated by knowledge of the physics of rainfall at high resolutions and the capability of data-dictated tools to model key processes. The overall problem was decomposed into component processes based on an understanding of the domain and exploratory data analysis. The component processes were modeled based on physics-based approaches or a suite of data-dictated tools ranging from exponential smoothing to neural networks. The individual components were put together to obtain overall forecast means and uncertainty bounds in space and time.

Holistic Knowledge Discovery

Holistic knowledge discovery solutions go all the way from raw data to decision aids in the context of an application domain. Here we describe how three lines of research were put together into a holistic solution in the context of designing strategic policy tools for natural hazards due to rainfall extremes in South America. Khan et al. [62] utilized extreme value theory for single time series at each grid point in space, and utilized the results of the analysis to explore spatial and temporal trends in extreme values. The data were available from NOAA in the form of daily rainfall at one degree and 2.5 degree spatial grids over South America. Sabesan et al. [96] developed a set of geospatial-temporal indices for the comparative evaluation of geospatial datasets and utilized the metrics to evaluate high-resolution global datasets for human population under threat. Studies such as this can help quantify and validate the uncertainties and risks associated with disasters and the human population potentially in harm’s way. Fuller et al. [38] developed a set of geospatial-temporal indices to quantify the impacts of anticipated precipitation extremes in South America. The results of the rainfall extremes study of Khan et al. [62] were utilized, while the use of high-resolution population data was motivated by the study of Sabesan et al. [96]. In addition, gross domestic product (GDP) was used as a metric for the development index of a region, and hence as a measure of the ability of the region to respond to disasters. This can be thought of as a rough measure of resiliency. The quantification and visualization of the human impacts of catastrophic natural disasters in the study by Fuller et al. [38] lends itself to root-cause analysis. Thus, the geospatial and temporal metrics can enable policy-makers to visualize the variability of the natural extremes, risk to human population, and anticipated damage potential, at high spatial resolutions, and in the case of extremes variability, over multiple time windows.
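The per-grid-point extreme value analysis mentioned above can be pictured with a small Python sketch that extracts annual maxima from a daily series, fits a Gumbel distribution by the method of moments, and reports return levels. This is only an illustration under stated assumptions: the text does not say which extreme value distribution or fitting method was used in [62], the function names are hypothetical, and the daily rainfall is synthetic.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def annual_maxima(daily_values, days_per_year=365):
    """Split a daily series into consecutive years and keep each year's maximum."""
    return [max(daily_values[i:i + days_per_year])
            for i in range(0, len(daily_values) - days_per_year + 1, days_per_year)]


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit; returns (location, scale)."""
    scale = statistics.stdev(maxima) * math.sqrt(6.0) / math.pi
    location = statistics.fmean(maxima) - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period):
    """Level exceeded on average once every `return_period` blocks (here, years)."""
    p_non_exceed = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(p_non_exceed))


# Synthetic daily rainfall at one hypothetical grid point (assumption): 40 years.
rng = random.Random(42)
daily_rain = [rng.expovariate(1.0 / 5.0) for _ in range(40 * 365)]

loc, scale = fit_gumbel_moments(annual_maxima(daily_rain))
for period in (10, 50, 100):
    print(f"{period}-year return level: {return_level(loc, scale, period):.1f} mm/day")
```

Repeating such a fit independently at every grid point yields a map of return levels, which is the kind of input that can then be combined with population and GDP layers as described above.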
This capability can enable policymakers to allocate possibly scarce resources to individual regions depending on vulnerability, and even focus on the areas where improvement may be necessary. In addition, the tools may provide some directions on whether the variability of natural extremes causes the damage potential and hence may be mitigated through early warning, or whether reduced economic resiliency is the cause, leading to the need for regional development, or whether the issue is population pressure, which may require evacuation strategies in the short term and better planning in the longer term to gradually cause the population to move to safer areas. A recent news article in Science magazine [7] indicates that the community “envisions a public database like Google Earth” to develop “risk landscape down to zip code level”. The holistic approach described here takes a first step in that direction.

Geographic Knowledge Discovery

The generation of actionable predictive insights and new information from geographic data is an immense challenge of significant societal importance [74]. However, examples of knowledge discovery approaches from geospatial-temporal data are relatively few. The study by Sabesan et al. [96], which developed metrics for evaluating geospatial datasets for high-resolution population, is a relatively simple example of generating new insights from geospatial data. The work by Kuhn et al. [64] is interesting because it focuses on the analysis of extremes rather than the usual, and because it uses large volumes of geospatial-temporal data (grid-based daily rainfall in South America). In fact, Khan et al. [62] utilized extreme value theory for single time series at each grid point in space, and further utilized the results of the individual time series analysis at each grid to explore spatial and temporal trends in extreme rainfall values over South America. Kuhn et al. [64] went one step further and defined a new measure for quantifying the dependence among extreme values in space and time, and utilized the measure for quantifying and visualizing the geospatial-temporal dependence among extreme values in South America. The new measure was motivated by recent developments in multivariate extreme value theory and relied on a copula-based approach. The work by Fang et al. [33] provides a completely different example of geographic knowledge discovery, where the problem of change and change-point detection from remotely sensed data was addressed, with a focus on making the methodologies applicable to real-time analysis.

Online and Offline Knowledge Discovery

Offline pattern detection is based on analysis of massive volumes of data, which can be dimensioned by time, space or geography, and hence needs to be scalable and computationally efficient. These methods may be useful in the context of analyzing archived sensor-based data for predictive insights, and hence the use of high performance computing is a natural requirement.
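As a rough picture of the kind of pair-wise, rank-based computation that offline analysis of dependence among extremes involves, the Python sketch below estimates an empirical upper tail dependence coefficient for two series. It is not the measure defined in [64]; the rank (empirical copula) transform, the fixed threshold of 0.95, the function names and the synthetic "grid cell" series are assumptions for illustration.

```python
import random


def pseudo_observations(values):
    """Map values to (0, 1) by rank: the empirical-copula (rank) transform."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def upper_tail_dependence(x, y, threshold=0.95):
    """Estimate P(V > t | U > t) from ranks; values near 0 suggest independent extremes."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    joint = sum(1 for a, b in zip(u, v) if a > threshold and b > threshold)
    marginal = sum(1 for a in u if a > threshold)
    return joint / marginal if marginal else 0.0


# Synthetic series (assumption): cells A and B share a common storm signal, C does not.
rng = random.Random(7)
n = 5000
shared = [rng.expovariate(1.0) for _ in range(n)]
cell_a = [s + rng.gauss(0.0, 0.3) for s in shared]
cell_b = [s + rng.gauss(0.0, 0.3) for s in shared]
cell_c = [rng.expovariate(1.0) for _ in range(n)]

print("A vs B:", round(upper_tail_dependence(cell_a, cell_b), 2))
print("A vs C:", round(upper_tail_dependence(cell_a, cell_c), 2))
```

Applying such an estimate to every pair of grid cells is what makes the offline workload grow quickly with the number of cells. Real rainfall series contain many tied zero values, which this sketch does not handle.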
Online or real-time analysis needs to process incremental information efficiently, based on dynamic and possibly evolving data, often in a spatially distributed setting. These approaches need to be algorithmically efficient for fast and memory-efficient processing such that they can be useful for real-time and distributed processing of sensor-based information. Online approaches may rely on the insights generated from offline discovery, and may in turn feed directly into real-time decision support tools. Here we describe one example each in offline and online knowledge discovery.

Kuhn et al. [64] developed a copula-based tail dependence metric for geospatial-temporal dependence among extreme values, which was in turn motivated by recent developments in multivariate extreme value theory. The new metric was applied to observations of rainfall in South America to quantify and visualize the dependence among extreme values within each grid-pair, based on the time series data available at each grid. This is an example of an approach for offline knowledge discovery, where massive datasets need to be processed to gain new insights from data, in this case to obtain the initial pair-wise dependency estimates. As currently implemented, the computational complexity of the algorithm is O(n^2 d^2), where n is the sample length and d the number of dimensions. The number of grid-pairs is about half a million, thus the computational task was daunting. The problem was solved through the efficient use of high-performance computing facilities at the Oak Ridge National Laboratory. While the solutions used thus far fall under the umbrella of capacity computing, with the running of simultaneous jobs being the primary requirement, future modifications are likely to require capability computing, especially when real-time predictive modeling and change analysis are attempted based on the dependence results.

Fang et al. [33] developed a novel approach for online change detection from time series of remotely sensed geospatial data sets. The purpose was to detect any change in real-time or near real-time (where real-time implies in this context the time when incremental data are available), and once a change was detected, estimate the change-point. The change detection algorithm was capable of distributed and dynamic processing and operated in “constant time”, or O(1) complexity, while the complexity of the change-point detection could vary from constant time to O(n), depending on how many points back the change-point happened to be, once a change had been detected at a particular time-point.

The computational complexity of the algorithmic solutions developed by [64] and [33] naturally delineates these as offline and online tools. However, the distinction may not always be obvious. The computational complexity of offline approaches can be improved, while online approaches may benefit from high-end computational infrastructures, for example, if holistic insights need to be generated based on the dynamic and distributed data available from multiple sensors. In addition, offline and online knowledge discovery approaches typically need to work in a coordinated fashion. Thus, predictive insights about the nature of anomalies and extremes extracted from offline analysis of massive datasets may be utilized as parameters within the online approaches for assessing to what extent incremental dynamic or event-based data can be classified as usual.
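The coordination between offline and online analysis can be sketched in Python as follows: an offline step summarizes an archived record, and an online detector then consumes one observation at a time with constant work per update. The detector shown is a textbook one-sided CUSUM, swapped in purely for illustration; it is not the algorithm of Fang et al. [33], and the class and function names, the slack and threshold settings and the synthetic data are all assumptions.

```python
import random
import statistics


def calibrate_offline(archive):
    """Offline step: summarize an archived 'usual' record as (mean, standard deviation)."""
    return statistics.fmean(archive), statistics.stdev(archive)


class OnlineCusum:
    """One-sided CUSUM for an upward mean shift; O(1) time and memory per update."""

    def __init__(self, baseline_mean, slack, threshold):
        self.baseline_mean = baseline_mean
        self.slack = slack
        self.threshold = threshold
        self.score = 0.0
        self.step = 0
        self.last_reset = 0  # most recent step at which the score was zero

    def update(self, value):
        """Consume one observation; return (change_detected, change_point_estimate)."""
        self.step += 1
        self.score = max(0.0, self.score + value - self.baseline_mean - self.slack)
        if self.score == 0.0:
            self.last_reset = self.step
        if self.score > self.threshold:
            # The change is estimated to start right after the score last sat at zero.
            return True, self.last_reset + 1
        return False, None


# Synthetic data (assumption): an archived record, then a stream whose mean shifts upward.
rng = random.Random(3)
archive = [rng.gauss(10.0, 1.0) for _ in range(5000)]
mean, sd = calibrate_offline(archive)
detector = OnlineCusum(baseline_mean=mean, slack=0.5 * sd, threshold=8.0 * sd)

stream = [rng.gauss(10.0, 1.0) for _ in range(300)] + [rng.gauss(12.0, 1.0) for _ in range(100)]
for value in stream:
    detected, change_point = detector.update(value)
    if detected:
        print(f"change flagged at step {detector.step}, estimated change-point {change_point}")
        break
```

In this sketch the change-point estimate is carried along with each update, which is a simplification relative to the backward search described above, whose cost can grow to O(n).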

1.6 Closing Remarks

Actionable predictive insights generated from sensor data, in conjunction with data obtained from other sources like computer-based simulations, can facilitate short-term decisions and longer-term policies. The overall process where raw data from sensors or simulations are ultimately converted to actionable predictive insights for decision and policy makers is defined as knowledge discovery in this chapter. This chapter presents a broader view of knowledge discovery compared to the traditional definitions by the data mining community, but with data mining and other data sciences as key aspects of the overall process. In addition, we define sensors broadly to include wireless sensor networks, in-situ sensor infrastructures and remote sensors. The challenges and opportunities for knowledge discovery based on data from sensors and simulations are described. In particular, we present a vision of knowledge discovery in the context of scientific applications.

Scientific applications and scientific knowledge discovery may make sense primarily in the context of a specific domain. Our focus is weather, climate and geophysical hazards. While prediction of natural hazards and mitigation of their consequences have been attempted since the dawn of human civilization with varying degrees of success, the possibility of enhanced knowledge discovery from ever-increasing and improving sensor and simulation data makes us optimistic that significant breakthroughs may be possible in the near future. Specifically focusing on tactical decision-making and strategic policy design in the context of natural disasters, this chapter describes how knowledge discovery from historical and real-time sensor data and computer model simulations can lead to improved predictive insights about weather, climate and geophysical extremes, which can in turn be combined with metrics for disaster risks, consequence and vulnerability.

The challenges and opportunities, both in terms of the deep science and the societal benefits, are immense. There is a need for closer research interactions between the knowledge discovery and the geophysical communities. A recent article in Science magazine [7] mentions that “Because the causes and impacts of disasters are so broad . . . we need teams of geophysicists who can talk fluently with epidemiologists, and engineers with psychologists”. Another article in Science [70] says that “post disaster assistance . . . has failed to meet the needs . . . ”. Indeed, consequence management needs to be complemented with anticipatory and predictive insights for the short and long terms, which can in turn lead to tactical decisions and strategic policies for mitigating the immediate or anticipated impacts of natural hazards.

Acknowledgments

This research was funded by the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory (ORNL), managed by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. We are grateful to Dr. David J. Erickson III and Mark Tuttle, both of ORNL, for their reviews of the manuscript within ORNL’s internal publication tracking system. In addition, we are thankful to Drs. Ranga Raju Vatsavai and Steven J Fernandez, both of ORNL, for their informal reviews and comments. We would also like to thank our co-authors and others who have helped and are continuing to help in our recent and ongoing research highlighted in Section 6 of this chapter. In particular, we thank Dr. Gabriel Kuhn, Aarthy Sabesan, Kathleen Abercrombie, and Christopher Fuller, who worked on some of the cited projects while Gabriel and Aarthy were at ORNL as post-masters associates, and Kate and Chris as undergraduate interns, respectively. We are grateful to Drs. Vladimir Protopopescu and Brian Worley of ORNL for their contributions to Figure 1.1, to Professor George Cybenko of Dartmouth for the process detection emblem, and to David Gerdes, formerly of ORNL, for his help with the figure. We also gratefully acknowledge the helpful suggestions of the original “PRIDE” team, comprising collaborators at ORNL and multiple university partners. Table 1 is an adapted version of the requirements proposed by the first author at a working group meeting of the Weather Extremes Impacts Workshop organized by Los Alamos National Laboratory and the National Center for Atmospheric Research at Santa Fe on February 27-28, 2007. The authors would like to thank Dr. Budhendra L. Bhaduri of ORNL for his help in many of these research efforts and for his insightful comments on an earlier related draft. Shiraj Khan and Auroop Ganguly would like to acknowledge the help and support of Professor Sunil Saigal, currently at the University of South Florida (USF), including his help in enabling Shiraj to complete his PhD at USF with Auroop as his major supervisor.

This chapter has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

References [1] Adger, W.N., Hughes, T.P., Folke, C., Carpenter, S.R. and Rockstrom, J. (2005). Social-ecological resilience to coastal disasters, Science, 309(5737): 1036–1039. [2] Agovic, A., Banerjee, A., Ganguly, A.R., and Protopopescu, V.A. (2007). Anomaly Detection in Transportation Corridors Using manifold Embedding, (Under Review). [3] Amerault, C. and Zou, X. (2003). Preliminary steps in assimilating SSM/I brightness temperatures in a hurricane prediction scheme, Journal of Atmospheric and Oceanic Technology, 20(8): 1154–1169. [4] ASCE (2005). Hurricane Katrina: Why Did the Levees Fail? Testimony of Peter Nicholson, Ph.D., P.E., Associate Professor of Civil and Environmental Engineering and Graduate Program Chair University of Hawaii, On behalf of the AMERICAN SOCIETY OF CIVIL ENGINEERS Before the Committee on Homeland Security and Governmental Affairs U.S., Senate, November 2. [5] BBC (2005). Indian Ocean tsunami warning system, British Broadcasting Corporation, News Report, Friday, 23 December. [6] Bhaduri, B., Bright, E., Coleman, P. and Dobson, J. (2002). LandScan: Locating People is What Matters, Geoinformatics, 5(2): 34–37. [7] Bohannon, J. (2005). Disasters: Searching for Lessons from a Bad Year, Science, 310(5756): 1883, 25 December. [8] Boin, A. and McConnell, A. (2007). Preparing for Critical Infrastructure Breakdowns: The Limits of Crisis Management and the Need for Resilience, Journal of Contingencies and Crisis Management, 15(1): 50. [9] Brodley, C.E., Terran, L. and Stough, T.M. (1999). Knowledge discovery and data mining, Computers taught to discern patterns, detect anomalies and apply decision algorithms can help secure computer systems and find volcanoes on Venus, American Scientist, 87(1): 54. [10] Butler, D. (2006). 2020 computing: Everything, everywhere, Nature, 440, 402–405, doi: 10.1038/440402a, 23 March. [11] CNN (2005). Chertoff: Katrina scenario did not exist, CNN.com, Monday, September 5. [12] Chan, J.C.L. (2006). 
Comment on Changes in Tropical Cyclone Number, Duration, and Intensity in a Warming Environment, Science, 311(5768): 1713b, 24 March. [13] Coles, S. (2001). An introduction to statistical modeling of extreme values, Springer. [14] Coles, S.G., Pericchi, L.R., Sisson, S. (2003). A fully probabilistic approach to extreme rainfall modeling, Journal of Hydrology, 273(1): 35–60. [15] Coles, S. and Tawn, J. (2005). Bayesian modelling of extreme surges in the UK east coast, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 363(1831): 1387–1406.

[16] Collins W.D., Bitz C.M., Blackmon M.L., Bonan G.B., Bretherton C.S., et al. (2006). The Community Climate System Model Version 3 (CCSM3), Journal of Climate, 19(11): 2122–2143. [17] Cressie, N. (1993). Statistics for spatial data, Wiley Interscience. [18] Cutter, S.L. (2003). GI Science, Disasters, and Emergency Management, Transactions in GIS, 7(4): 439–446. [19] Cutter, S.L., Boruff, B.J. and Shirley, W.L. (2003). Social Vulnerability to Environmental Hazards, Social Science Quarterly, 84 (2): 242–261. [20] Cutter, S.L. and Emrich, C.T. (2006). Moral hazard, social catastrophe: The changing face of vulnerability along the hurricane coasts, The ANNALS of the American Academy of Political and Social Science, 604(1): 102–112. [21] Davis, J.R. and Sheng, Y.P. (2003). Development of a parallel storm surge model, International Journal for Numerical Methods in Fluids, 42(5): 549–580. [22] De, U.S., Khole, M. and Dandekar, M.M. (2004). Natural hazards associated with meteorological extreme events, Natural Hazards, 31: 487–497. [23] Dobson, J.E., Bright, E.A., Coleman, P.R., Durfee, R.C. and Worley, B.A. (2000). LandScan: A global population database for estimating population at risk, Photogrammetric engineering and remote sensing, 66(7): 849–857. [24] Done, J., Davis, C.A. and Weisman, M. (2004). The next generation of NWP: explicit forecasts of convection using the weather research and forecasting (WRF) model, Atmospheric Science Letters, 5(6): 110–117. [25] Droegemeier, K. K. and Co-Authors (2004). Linked environments for atmospheric discovery (LEAD): A cyberinfrastructure for mesoscale meteorology research and education, Preprints, 20th Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology, Seattle , WA , American Meteorological Society. [26] Drye (2006). Mild U.S. hurricane season defied predictions, National Geographic, November 30. [27] Dzeroski, S. and N. Lavrac (2001). Relational Data Mining, Springer. 
[28] Easterling, D.R., Evans, J.J., Groisman, P.Ya., Karl, T.R., Kunkel, K.E. and Ambenje. P. (2000a). Observed variability and trends in extreme climate events: A brief review, Bulletin of the American Meteorological Society, 81(3): 417–425. [29] Easterling, D.R., Meehl, G.A., Parmesan, C., Changnon, S.A., Karl, T.R. and Mearns, L.O. (2000b). Climate extremes: Observations, modeling and impacts, Science, 22 September, 289(5487): 2068–2074. [30] Einstein, H.H. and Sousa, R. (2007). Warning systems for natural threats, Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 1(1): 3–20. [31] Emanuel, K. (2005). Increasing destructiveness of tropical cyclones over the past 30 years, Nature, 436: 686–688. [32] Evensen, G. (2003). The Ensemble Kalman Filter: theoretical formulation and practical implementation, Ocean Dynamics, 53(4): 343–367.

[33] Fang, Y., Ganguly, A.R., Singh, N., Vijayaraj, V., Feierabend, N., and Potere, D.T. (2006). Online change detection: Monitoring land cover from remotely sensed data, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW’06), pp. 626–631. [34] Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996). From data mining to knowledge discovery in databases, AI Magazine, 17(3): 37-54. [35] Forbes (2005). Hurricane Rita leaves trail of destruction in southern US, Forbes.com, 9–25. [36] Frei C. and Schär C. (2001). Detection Probability of Trends in Rare Events: Theory and Application to Heavy Precipitation in the Alpine Region, Journal of Climate, 14(7): 1568-1584. [37] Frich, P., Alexander, L.V., Della-Marta, P., Gleason, B., Haylock, M., Klein Tank, A.M.G. and Peterson, T. (2002). Observed coherent changes in climatic extremes during the second half of the twentieth century, Climate Research, 19: 193-212. [38] Fuller, C., Sabesan, A., Khan, S., Kuhn G., Ganguly, A., Erickson, D. and Ostrouchov, G. (2006). Quantification and visualization of the human impacts of anticipated precipitation extremes in South America, Eos Transactions, American Geophysical Union, 87(52), Fall Meeting Supplement, Abstract GC44A-03. [39] Ganguly, A.R. (2002). A hybrid approach to improving rainfall forecasts, Computing in Science and Engineering, 4(4), 14-21, July/August. [40] Ganguly A.R. and Bras R.L. (2003). Distributed Quantitative Precipitation Forecasting Using Information from Radar and Numerical Weather Prediction Models, Journal of Hydrometeorology, 4(6): 1168-1180. [41] Ganguly, A.R., Omitaomu, O.A. and Walker, R. (2007). Knowledge discovery from wide-area sensors for security applications; Case study: Wide-area sensors at truck weigh stations, In: Gaber M. and Gama J. (eds), Learning from Data Streams - Processing Techniques in Sensor Networks, Springer-Verlag (to appear). [42] Griekspoor, A. and Collins, S. (2001). 
Raising standards in emergency relief: how useful are Sphere minimum standards for humanitarian assistance? BMJ, 373(7315): 740-742. [43] Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V. and Namburu, R. R. (2001). Data Mining for Scientific and Engineering Applications, Kluwer. [44] Guralnik, V. and Srivastava, J. (1999). Event detection from time series data, Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 33-42. [45] Hacker, J.P. and Snyder, C. (2005). Ensemble Kalman filter assimilation of fixed screen-height observations in a parameterized PBL, Monthly Weather Review, 133(11): 3260-3275. [46] Hanson, B. and Roberts, L. (2005). Resiliency in the Face of Disaster, Science, 309(5737): 1029, 12 August.

[47] Hannachi, A. (2006). Quantifying changes and their uncertainties in probability distribution of climate variables using robust statistics, Climate Dynamics, 27(2-3): 301-317. [48] Hegerl, G., Zwiers, C., Stott, F.W. and Kharin, V.V. (2004). Detectability of Anthropogenic Changes in Annual Temperature and Precipitation Extremes, Journal of Climate, 17(19): 3683-3700. [49] Hinke, T.H., Rushing, J., Ranganath, H. and Graves, S.J. (2000). Techniques and experience in mining remotely sensed satellite data, Artificial Intelligence Review, 14(6): 503-531. [50] Hitti, M. (2006). Resilience lets Katrina survivors cope, WebMD Health News. [51] Hou, A.Y., Ledvina, D.V., Da Silva, A.M., Zhang, S.Q., Joiner, J. and Atlas, R.M. (2000). Assimilation of SSM/I-derived surface rainfall and total precipitable water for improving the GEOS analysis for climate studies, Monthly Weather Review, 128, 509-537. [52] Houze, Jr., R.A., Chen, S.S., Smull, B.F., Lee, W.-C. and Bell, M.M. (2007). Hurricane Intensity and Eyewall Replacement, Science, 315 (5816), 1235. [53] Huang, C., Hsing, T., Cressie, N., Ganguly, A.R., Protopopescu, V.A., and Rao, N.S. (2007), Statistical Analysis of Plume Model Identification Based on Sensor Network Measurements, (submitted). [54] IPCC (2007). Climate Change 2007: The Physical Science Basis. Summary for Policymakers, Intergovernmental Panel on Climate Change, 18 pages. [55] Janjic, Z., Black, T., Pyle, M., Rogers, E., Chuang, H. and DiMego, G. (2006). Hazardous weather prediction using the WRF NMM, Geophysical Research Abstracts, 8(02293). [56] Katz, R.W. (2002). Stochastic modeling of hurricane damage, Journal of Applied Meteorology, 41, 754-762. [57] Katz, R. W., M. B. Parlange and Naveau, P. (2002). Statistics of extremes in hydrology. Advances in Water Resources, 25, 1287-1304. [58] Keeling, C.D., Whorf, T.P., Wahlen, M. and van der Plichtt, J. (1995). 
Interannual extremes in the rate of rise of atmospheric carbon dioxide since 1980, Nature, 375: 666–670. [59] Khan, S., Ganguly, A.R., Bandyopadhyay, S., Saigal, S., Erickson, D.J., III, Protopopescu, V., and Ostrouchov, G. (2006). Nonlinear statistics reveals stronger ties between ENSO and the tropical hydrological cycle, Geophysical Research Letters, 33, L24402. [60] Khan, S., Bandyopadhyay, S., Ganguly, A.R., Saigal, S., Erickson, D.J., III, Protopopescu, V., and Ostrouchov, G. (2007). Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data, Physical Review E (in revision). [61] Khan, S., Ganguly, A.R. and Saigal, S. (2005). Detection and predictive modeling of chaos in finite hydrological time series, Nonlinear Processes in Geophysics, 12: 41-53.

[62] Khan, S., Kuhn, G., Ganguly, A.R., Erickson, D.J., and Ostrouchov, G. (2007b). Spatio-temporal variability of daily and weekly precipitation extremes in South America, Water Resources Research, (in revision). [63] Kiktev D, Sexton D.M.H., Alexander L. and Folland C.K. (2003). Comparison of Modeled and Observed Trends in Indices of Daily Climate Extremes, Journal of Climate, 16(22): 3560-3571, 2003. [64] Kuhn, G., Khan, S., Ganguly, A.R., Branstetter, M. (2007). Geospatialtemporal dependence among weekly precipitation extremes with applications to observations and climate model simulations in South America, Advances in Water Resources. (Accepted for publication). [65] Kunkel, K.E., Pielke Jr., R.A. and Changnon, S.A. (1999). Temporal fluctuations in weather and climate extremes that cause economic and human health impacts: A review, Bulletin of the American Meteorological Society, 80, 1077-1098. [66] Leavitt, W.M. and Kiefer, J.J. (2006). Infrastructure Interdependency and the Creation of a Normal Disaster: The Case of Hurricane Katrina and the City of New Orleans, Public Works Management & Policy, 10(4): 306-314. [67] Lee, W., Stolfo, S.J. and Mok, K.W. (1999). A data mining framework for building intrusion detection models, IEEE Symposium on Security and Privacy, sp, p.0120. [68] Lillesand, T.M., Kiefer, R.W. and Chipman, J.W. (2004). Remote sensing and image interpretation, Fifth Edition, John Wiley and Sons. [69] Lindell, M.K. and Prater, C.S. (2003). Assessing Community Impacts of Natural Disasters, Natural Hazards Review, 4(4): 176-185. [70] Linnerooth-Bayer, J., Mechler, R. and Pflug, G. (2005). Refocusing disaster aid. Science, 309(5737): 1044-1046. [71] Lynch, J.P. and Loh, K.J. (2006). A Summary Review of Wireless Sensors and Sensor Networks for Structural Health Monitoring, The Shock and Vibration Digest, 38(2): 91-128. [72] McEnery, J., Ingram, J., Duan, Q., Adams., T. & Anderson, L. (2005). 
NOAA’s advanced hydrologic prediction service: building pathways for better science in water forecasting, Bulletin of the American Meteorological Society, 86: 375–385. [73] Meehl, G. and Tebaldi, C. (2004). More intense, more frequent and longer lasting heat waves in the 21st century, Science, 305(5686): 994–997. [74] Miller, H.J. and Han, J. (2001). Geographic Data Mining and Knowledge Discovery, London: Taylor and Francis, 3–32. [75] NASA (2007). Did dust bite the 2006 hurricane season forecast? NASA News. [76] NASA (2006). A Growing Intelligence Around Earth, Science@NASA, Headline News Feature. [77] Nature (2005). A divided world, Nature, Editorial article, 433(1).

[78] Nychka, D., Wikle, C.K. and Royle, J.A. (2002). Multiresolution models for nonstationary spatial covariance functions, Statistical Modelling: An International Journal, 2: 315–331.
[79] North, G.R., Kim, K.-Y., Shen, S.S.P. and Hardin, J.W. (1995). Detection of forced climate signals. Part I: Filter theory, Journal of Climate, 8(3): 401–408.
[80] Oak Ridger (2007). ORNL project aims to lessen impact of natural disasters, Oak Ridger.
[81] Ohsawa, Y. (2002). Chance discoveries for making decisions in complex real world, New Generation Computing, 20(2): 143–163.
[82] PAHO (2004). Management of Dead Bodies after Disasters: A Field Manual for First Responders, Washington, D.C.: Pan American Health Organization and World Health Organization, 194 pp, http://www.paho.org/english/dd/ped/ManejoCadaveres.htm.
[83] Papadimitriou, S., Sun, J., and Faloutsos, C. (2005). Streaming pattern discovery in multiple time-series, VLDB, pages 697–708.
[84] Pearce, L. (2003). Disaster Management and Community Planning, and Public Participation: How to Achieve Sustainable Hazard Mitigation, Natural Hazards, 28(2–3): 211–228.
[85] Pelling, M. (2003a). Natural disasters and development in a globalizing world, Routledge.
[86] Pelling, M. (2003b). The vulnerability of cities. Natural disasters and social science, Earthscan.
[87] Pielke, Jr., R.A., Landsea, C., Mayfield, M., Laver, J. and Pasch, R. (2005). Hurricanes and global warming, Bulletin of the American Meteorological Society, 86: 1571–1575.
[88] Pielke, Jr., R.A. and Mills, E. (2005). Attribution of disaster losses, Science, 310(5754): 1615–1616.
[89] Pierce C.E., Ebert E., Seed A.W., Sleigh M., Collier C.G., et al. (2004). The Nowcasting of Precipitation during Sydney 2000: An Appraisal of the QPF Algorithms, Weather and Forecasting, 19(1): 7–21.
[90] Potere, D., Feierabend, N., Bright, E., Strahler, A. (2007). Walmart from Space: A New Source for Land Cover Change Validation, Photogrammetric Engineering and Remote Sensing (accepted for publication).
[91] Potter, C., Tan, P.-N., Steinbach, M., Klooster, S., Kumar, V., Myneni, R. and Genovese, V. (2003). Major Disturbance Events in Terrestrial Ecosystems Detected using Global Satellite Data Sets, Global Change Biology, 9(7): 1005–1021.
[92] Puccinelli, D., Haenggi, M. (2005). Wireless sensor networks: applications and challenges of ubiquitous sensing, IEEE Circuits and Systems Magazine, 5(3): 19–31.
[93] Reichle, R.H., McLaughlin, D.B. and Entekhabi, D. (2002). Hydrologic data assimilation with the Ensemble Kalman filter, Monthly Weather Review, 130(1): 103–114.
[94] Rind D. (1999). Complexity and climate, Science, 284(5411): 105–107.
[95] Roddick, J.F., Hornsby, K., Spiliopoulou, M. (2001). An updated bibliography of temporal, spatial and spatio-temporal data mining research, Lecture Notes in Computer Science, 147 pages.
[96] Sabesan, A., Abercrombie, K., Ganguly, A.R., Bhaduri, B.L., Bright, E.A. and Coleman, P. (2007). Metrics for the comparative analysis of geospatial datasets with applications to high resolution grid-based population data, GeoJournal (in review).
[97] Sankarasubramanian, A. and Lall, U. (2003). Flood quantiles in a changing climate: Seasonal forecasts and causal relations, Water Resources Research, 39(5): 1134.
[98] Sarewitz, D. and Pielke, Jr., R. (1999). Prediction in science and policy, Technology in Society, 21: 121–133.
[99] Sarewitz, D. and Pielke, Jr., R. (2005). Rising Tide, The New Republic.
[100] Schar, C., Vidale, P.L., Luthi, D., Frei, C., Haberli, C., Liniger, M.A. and Appenzeller, C. (2004). The role of increasing temperature variability in European summer heatwaves, Nature, 427: 332–336.
[101] Schiermeier, Q. (2006). Insurers’ disaster files suggest climate is culprit: Rising costs hint at weather effect, Nature, 441: 674–675.
[102] ScienceDaily (2007). Very active 2007 hurricane season predicted, Science Daily.
[103] ScienceDaily (2006). Study aims to ease natural disaster impact, Science Daily.
[104] Shi, J.J., Tao, W., Ardizzone, J., Shen, B. (2006). Impact of GEOS5 Data on Hurricane Forecasts using WRF, Eos Transactions, American Geophysical Union, 87(52), Fall Meeting Supplement, Abstract A13E–0987.
[105] Shinozuka, M. and Chang, S.E. (2004). Evaluating the Disaster Resilience of Power Networks and Grids, in Modeling Spatial Economic Impacts of Disasters, Springer-Verlag, pp. 289–312, edited by Y. Okuyama and S.E. Chang.
[106] Smith, S.K. (1996). Demography of disaster: Population estimates after Hurricane Andrew, Population Research and Policy Review, 15(5–6): 459–477.
[107] Steinbach, M., Tan, P.-N., Kumar, V., Potter, C. and Klooster, S. (2003). Discovery of climate indices using clustering, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 446–455.
[108] Strohmaier, E., Dongarra, J.J., Meuer, H.W. and Simon, H.D. (2005). Recent trends in the marketplace of high performance computing, Parallel Computing, 31(3–4): 261–273.
[109] Sykes, L.R., Shaw, B.E. and Scholz, C.H. (1999). Rethinking earthquake prediction, Pure and Applied Geophysics, 155(2–4): 207–232.
[110] Tierney, K.J., Lindell, M.K. and Perry, R.W. (2001). Facing the Unexpected: Disaster Preparedness and Response in the United States. The National Academies Press, 318 pages.
[111] UNDP (2004). Reducing Disaster Risk: A Challenge for Development. A Global Report. United Nations Development Programme, 164 pages.
[112] Urbina, E. and Wolshon, B. (2003). National review of hurricane evacuation plans and policies: a comparison and contrast of state policies, Transportation Research A: Policy and Practice, 37(3): 257–275.
[113] Vale, L.J., Campanella, T.J. and Fishwick, M.W. (2005). The Resilient City: How Modern Cities Recover from Disaster, The Journal of American Culture, 28(4): 456.
[114] Vivoni E.R., Entekhabi D., Bras R.L., Ivanov, V.Y., Van Horne, M.P., Grassotti, C. and Hoffman, R.N. (2006). Extending the Predictability of Hydrometeorological Flood Events Using Radar Rainfall Nowcasting, Journal of Hydrometeorology, 7(4): 660–677.
[115] Walker, N.D., Haag, A., Balasubramanian, S., Leben, R., Heerden, I.V., Kemp, P. and Mashriqui, H. (2006). Hurricane Prediction: A Century of Advances, Oceanography, 19(2): 24–36.
[116] Washington Times (2006). No Katrina coverup, Editorial, The Washington Times.
[117] Webster, P.J., Curry, J.A., Liu, J. and Holland G.J. (2005). Changes in Tropical Cyclone Number, Duration, and Intensity in a Warming Environment, Science, 309(5742): 1844–1846.
[118] Webster, P.J., Curry, J.A., Liu, J. and Holland G.J. (2005). Response to Comment on Changes in Tropical Cyclone Number, Duration, and Intensity in a Warming Environment, Science, 311(5768): 1713.
[119] Weiss, G.M. (2004). Mining with rarity: a unifying framework, SIGKDD Explorations, 6(1): 7–19.
[120] Weiss, G.M. and Hirsh, H. (1998). Learning to predict rare events in event sequences, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 359–363.
[121] White, R. and Etkin, D. (2004). Climate change, extreme events and Canadian insurance, Natural Hazards, 16(2–3): 135–163.
[122] Williamson, M. (2005). Catch the wave [tsunami warning system], IEE Review, 51(3): 30–34.
[123] Willoughby, H.E. (2007). ATMOSPHERE: Forecasting Hurricane Intensity and Impacts, Science, 315(5816): 1232.
[124] Wulfmeyer, V. and Henning-Muller, I. (2006). The climate station of the University of Hohenheim: analyses of air temperature and precipitation time series since 1878, International Journal of Climatology, 26(1): 113–138.
[125] Zhu T., Zhang D.L., Weng F. (2002). Impact of the Advanced Microwave Sounding Unit Measurements on Hurricane Prediction, Monthly Weather Review, 130(10): 2416–2432.
[126] Zupanski, M., Zupanski, D., Vukicevic, T., Elis, K. and Haar, T.V. (2005). CIRA/CSU Four-Dimensional Variational Data Assimilation System, Monthly Weather Review, 133(4): 829–843.
[127] Zwiers, F.W. and von Storch, H. (2004). On the role of statistics in climate research, International Journal of Climatology, 24(6): 665–680.