IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 52, NO. 7, JULY 2014
Automated Poststorm Damage Classification of Low-Rise Building Roofing Systems Using High-Resolution Aerial Imagery

Jim Thomas, Member, IEEE, Ahsan Kareem, and Kevin W. Bowyer, Fellow, IEEE
Abstract—Techniques for postdisaster assessment from remotely sensed images have been studied by different research communities over the past decade. Such an assessment benefits a range of stakeholders, e.g., government organizations, the insurance industry, local communities, and individual homeowners. This work explores detailed damage assessment on an individual building basis by utilizing supervised classification. In contrast with previous research efforts in the field, this work attempts to predict the type of damage, such as missing tiles, collapsed rooftops, and the presence of holes, gaps, or cavities. Various existing and novel intensity-, edge-, and color-based features are evaluated. Additionally, preprocessing steps that automatically correct photometric and geometric differences are proposed. Furthermore, a study on the reliability of high-resolution aerial imagery in damage interpretation is conducted by comparing results with the assessment of expert volunteers. Results show that the proposed damage detection framework is very effective and performs at a level similar to that of the experts. This paper concludes that the type and extent of damage to individual rooftops can be identified with good accuracy from high-resolution aerial images. It is envisaged that the automated tools presented in this paper will play a significant role in rapid posthurricane damage estimation and in helping to better manage rescue and recovery missions.

Index Terms—Aerial imagery, hurricane disaster assessments, emergency response planning, supervised damage classification.
I. INTRODUCTION

Damage estimation from remote sensing (RS) imagery has gained much interest in various research communities over the past decade. In recent years, images of affected areas have become easy to obtain through satellite or aerial sensors. Many parties are interested in damage assessment immediately following a disaster, including individual homeowners, local authorities, the Federal Emergency Management Agency, and insurance companies. The rapid detection and assessment of damage is essential for effective emergency-management efforts, as the assessment of the geographic extent of relative levels of damage is of principal importance in
prioritizing relief efforts. Assessment from such images can also assist in providing rapid loss estimates. Building damage and debris spread can be identified and classified by applying change detection algorithms to pre- and poststorm image pairs. In this comparison process, building damage appears as changes in shape, lines, colors, texture, or other image properties. Previous research has shown that the severity of windstorm damage to buildings can be estimated from the extent of change in the roof structure [1]. Furthermore, the damage states observed at ground level have been found to correlate well with those observed from space using RS data [2].

A popular damage scale for hurricane damage is the RS scale, which was proposed by Womble [1] and categorizes rooftops into four damage states, RS-A through RS-D. Rooftops with no damage are classified under the RS-A category. RS-B rooftops are still intact but typically have what can be considered minor damage; some new roofing materials may be visible due to exposed decking from missing tiles, metal, or shingles. For simplicity, we refer to all such minor damage as missing tiles for the rest of this work. Rooftops categorized as RS-C have rooftop structures that are partially destroyed. Finally, collapsed or missing rooftops are categorized as RS-D. A detailed description of the RS scale is shown in Fig. 1.

Even after nearly a decade of research on automated damage assessment from RS imagery, the deployment of postdisaster efforts is largely based on manual work. For example, Ghosh et al. [3] describe how the Global Earth Observation Catastrophe Assessment Network (GEO-CAN) was formed to facilitate rapid damage assessment after the January 2010 Haiti earthquake. GEO-CAN used crowdsourcing for RS-based damage interpretation. The GEO-CAN community, working with the World Bank, the United Nations Institute for Training and Research (UNITAR) Operational Satellite Applications Programme (UNOSAT), and the European Commission's Joint Research Centre (JRC), led the way for a rapid postdisaster needs assessment. Expert volunteers were asked to grade the level of damage to individual rooftops in high-resolution and very high resolution (VHR) aerial imagery using a web portal. They also made use of pictometry, or oblique images, to confirm the level of damage. After comparison with actual field data collected by the JRC, this study was found to produce nearly 78% total accuracy. This recent interest in damage assessment through crowdsourcing is a good indicator of the difficulty of automatic assessment and the insufficiency of existing research.
Fig. 1. RS damage scale for hurricane damages as described in [1].
In this paper, we determine the efficacy of supervised classification for damage interpretation. The key objectives of this work are to determine the reliability of high-resolution images for a detailed fine-grained damage analysis and to match the performance of human volunteers who label the images in various damage categories. We assume that algorithms can perform no better than expert volunteers at damage classification. Furthermore, the scope of this work is limited to studying freely and widely available 37-cm or coarser resolution posthurricane imagery. Our work is unique from previous studies because of the following: 1) we study the reliability of using the freely available 37-cm to 1-m aerial images for determining not just whether a building has been damaged but also the type and severity of damage; 2) the effect of preprocessing steps such as the correction of geometric and photometric differences is explored; 3) we propose new color-, edge-, and intensity-based features for damage classification and compare them with previous works; and 4) to resolve ground-truth ambiguity, we ask expert and nonexpert volunteers to identify different damage states using visual inspection and compare the predictions provided by the algorithms with their interpretation.

II. RELATED WORK

Feature vectors computed by measuring the dissimilarity between before- and after-storm rooftop images can be used to visualize, measure, and classify damage. Much of the early work used edge-, intensity-, or color-based features. Yamazaki [4] discusses the use of color indices and edge elements in damage classification. Adams et al. [5] and Matsuoka et al. [6] use edge-based measures to analyze textural dissimilarity. Simple grayscale statistics were calculated in [7]. Womble et al. [8], [9] computed statistics for each color channel. In [8], values of pixels comprising each roof-facet
object were extracted from the before and after image pairs for each of the four multispectral bands available in QuickBird satellite imagery. Comparison of before and after object-level statistics (such as by differencing or ratioing) resulted in damage metrics, which numerically described temporal changes in the roof facets. For that case study, nine separate damage metrics were examined: standard deviation (ratio and difference), variance (ratio), skewness (difference), average deviation (ratio), uniformity (ratio and difference), and entropy (ratio and difference).

The case of the Bam earthquake is studied in [10]. It uses two VHR images and focuses on the footprints of the buildings. Using an object-based assessment and correlation coefficients as features, it achieved a classification performance among four damage grades of up to 69% in VHR imagery.

A unique and different approach to damage classification was proposed in [11], which used a system-level methodology. An image-driven data mining approach with sigma-tree structures was demonstrated and evaluated. Results showed a capability to detect hurricane debris fields and storm-impacted near-shore features (such as wind-damaged buildings, sand deposits, standing water, etc.) and an ability to detect and classify nonimpacted features (such as buildings, vegetation, roadways, railways, etc.). The sigma-tree-based image information mining capability was demonstrated to be useful in disaster response planning by detecting blocked access routes and autonomously discovering candidate rescue/recovery staging areas.

More recently, other texture and statistical measures have been used for damage detection. Chen and Hutchinson [12] used wavelet feature extraction to enhance features suggested by Womble et al. [8]. Vijayaraj et al. [13] used correlation analysis, principal component analysis, and the boundary compactness index of extracted rooftops. Sirmacek and Unsalan [14] used local binary pattern (LBP), local edge pattern (LEP), and Gabor texture features computed over pixels and blocks of pixels. Thomas et al. [15] used shadow length differences as a hint of damage, and the ratio of rooftop area to shadow region was used as a damage metric.

Many of these efforts [5], [6], [11], [14] are pixel based and only provide information regarding patches of the images that could have changed. None of these efforts attempt a detailed analysis of the nature of the rooftop damage or evaluate the reliability of images in such classifications. In contrast to all these efforts, we address the ground-truth ambiguity problem and present an evaluation of some of the existing and proposed features.
III. PREPROCESSING

A. Data Collection and Rooftop Extraction

We used before- and after-hurricane images from publicly available National Oceanic and Atmospheric Administration (NOAA) and U.S. Geological Survey (USGS) aerial imagery. NOAA images are available publicly for ongoing research works. The images are uncorrected and not rotated. The approximate ground sample distance for each pixel is 37 cm (1.2 ft) or coarser. Each image varies from 4077 to 8000 pixels in its height/width dimensions. The High Resolution Orthoimagery
collection has been acquired by the USGS through contracts, partnerships with other federal, state, tribal, or regional agencies, and direct purchases from private industry vendors. Since the data come from a variety of sources, the resolution and area of coverage vary by data set. However, all Orthoimagery images used in this study were between 60-cm and 1-m resolution. The images used were from the Joplin tornado (2011), Hurricane Ike (2008), Hurricane Katrina (2005), Hurricane Dennis (2005), and Hurricane Ivan (2004), covering Joplin (MO), Galveston (TX), Pensacola (FL), and New Orleans (LA).

Automatic building detection and rooftop extraction is beyond the scope of this paper. Instead, we created our data set manually. An image data set of 635 rooftops was created by manually outlining individual rooftops in the before-storm images. The mask images produced contain white pixels for every rooftop pixel and black pixels for every nonrooftop pixel. Multiple volunteers ensured that the masks produced were accurate and reliable.

B. Image Registration

First, the two-step process described in [16] is used to achieve fast and robust registration of before- and after-disaster aerial image pairs. The images are coarsely registered using a phase-correlation-based algorithm. For fine image registration, we adopted SURF-feature-based matching. The coarsely registered images are divided into grids, and features are matched across corresponding grids using an approximate nearest neighbor search combined with a constrained random sample consensus (RANSAC) algorithm for point-pair subset selection.

Second, to overcome the effect of building heights, each extracted rooftop may require a separate registration stage. It should be noted that a previous work [10] used a cross-correlation-based technique for rooftop registration and observed improvements in classification rates. We propose to improve rooftop registration accuracy by maximizing and thresholding the Fourier cross-power spectrum, which is known to be more robust than cross-correlation. The bounding rectangle for each extracted rooftop is calculated by fitting a rectangle circumscribing the boundary contour of the rooftop. Additionally, the dimensions of the rectangle are expanded to account for possible registration error. Each before-storm rooftop defined by its bounding rectangle is then compared with the corresponding area in the after-storm image. Let $f_1$ and $f_2$ be the two rooftop images that differ only by a displacement $(t_x, t_y)$, i.e.,

$$f_2(x, y) = f_1(x - t_x, y - t_y). \quad (1)$$
The cross-power spectrum $p_s$ of two images $f_1$ and $f_2$ is defined as

$$p_s = \mathrm{IFT}_{\max\,\mathrm{peak}}\left[\frac{F_1(\xi,\eta)\,F_2^{*}(\xi,\eta)}{\left|F_1(\xi,\eta)\,F_2^{*}(\xi,\eta)\right|}\right] = \mathrm{IFT}_{\max\,\mathrm{peak}}\left[e^{\,j2\pi(\xi t_x + \eta t_y)}\right] \quad (2)$$

where $F_2^{*}$ is the complex conjugate of $F_2$. By taking the inverse Fourier transform of this frequency-domain representation, we obtain a function that is approximately an impulse; that is, it is approximately zero everywhere except at the displacement $(t_x, t_y)$ that is needed to optimally register the two images. Slight rotation and scaling differences between two images are found by computing the scale and rotation that maximize $p_s$, as described in [16]. In addition, we thresholded the value of the peak of the phase correlation, $p_s \leq p_s^{\mathrm{thresh}}$, to filter out rooftop images that do not register correctly. The effect of rooftop registration on classification is described in the evaluation section.
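A minimal sketch of this per-rooftop phase-correlation step, assuming NumPy and equal-size grayscale patches; the threshold value is illustrative, since the paper does not report the $p_s^{\mathrm{thresh}}$ it used:

```python
import numpy as np

def phase_correlate(f1, f2):
    """Estimate the displacement between two equal-size grayscale rooftop
    patches by locating the peak of the inverse FFT of the normalized
    cross-power spectrum, per (2)."""
    F1 = np.fft.fft2(f1.astype(float))
    F2 = np.fft.fft2(f2.astype(float))
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12        # normalize; epsilon avoids /0
    impulse = np.real(np.fft.ifft2(cross))
    peak = impulse.max()                   # p_s, the height of the impulse
    ty, tx = np.unravel_index(impulse.argmax(), impulse.shape)
    # Wrap displacements larger than half the patch into negative shifts.
    if ty > f1.shape[0] // 2:
        ty -= f1.shape[0]
    if tx > f1.shape[1] // 2:
        tx -= f1.shape[1]
    return (tx, ty), peak

# Rooftops whose peak falls below a threshold are discarded as misregistered.
PS_THRESH = 0.1                            # illustrative value only
# (tx, ty), ps = phase_correlate(before_roof, after_roof)
# keep = ps > PS_THRESH
```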
C. Color Correction

Color correction is done by transferring the color characteristics of the before-storm rooftop image to the after-storm rooftop image. The color balancing algorithm described in [17] is used; we briefly discuss its main steps here. Consider the before-storm rooftop image $s(i, j)$, the after-storm image $t(i, j)$, and the new after-storm image $t_{\mathrm{new}}(i, j)$. The color transfer first converts the RGB color space into the $l\alpha\beta$ color space. Once the channels have thus been decorrelated, the statistics are transferred by the following equations:
$$t_{\mathrm{new}}(i,j) = \mu^{k}_{s(i,j)} + \frac{\sigma^{k}_{s(i,j)}}{\sigma^{k}_{t(i,j)}}\left(t(i,j) - \mu^{k}_{t(i,j)}\right) \quad (3)$$

$$\mu^{k}_{s(i,j)} = \frac{1}{k^{2}}\sum_{l=i-\frac{k}{2}}^{i+\frac{k}{2}}\ \sum_{m=j-\frac{k}{2}}^{j+\frac{k}{2}} s(l,m) \quad (4)$$

$$\sigma^{k}_{s(i,j)} = \left[\frac{1}{k^{2}}\sum_{l=i-\frac{k}{2}}^{i+\frac{k}{2}}\ \sum_{m=j-\frac{k}{2}}^{j+\frac{k}{2}} \left(s(l,m) - \mu^{k}_{s(i,j)}\right)^{2}\right]^{1/2}. \quad (5)$$

The means are indicated by $\mu^{k}_{s(i,j)}$ and $\mu^{k}_{t(i,j)}$, where $k$ denotes the length of the window used for transferring the statistics around the pixel $(i, j)$. Similarly, the standard deviations are indicated by $\sigma^{k}_{s(i,j)}$ and $\sigma^{k}_{t(i,j)}$. The window length $k$ for each pixel can be fixed by calculating the value of the normalized cross-correlation $\mathrm{NCC}(i, j)$ for a range of window sizes and choosing the smallest window size that gives a sufficiently high NCC value. For more details, refer to [17].
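The following sketch applies (3)-(5) to a single channel over a fixed window size, assuming SciPy; the $l\alpha\beta$ conversion and the per-pixel NCC-based choice of $k$ used in the paper are omitted here:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def transfer_channel_stats(s, t, k=15):
    """Apply (3)-(5) to one decorrelated color channel: impose the local
    mean and standard deviation of the before image s onto the after image
    t over a k x k window (the paper instead picks k per pixel via NCC)."""
    s = s.astype(float)
    t = t.astype(float)
    mu_s = uniform_filter(s, size=k)
    mu_t = uniform_filter(t, size=k)
    var_s = uniform_filter(s * s, size=k) - mu_s ** 2
    var_t = uniform_filter(t * t, size=k) - mu_t ** 2
    sigma_s = np.sqrt(np.maximum(var_s, 1e-12))
    sigma_t = np.sqrt(np.maximum(var_t, 1e-12))
    return mu_s + (sigma_s / sigma_t) * (t - mu_t)   # eq. (3)
```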
IV. FEATURES FOR DAMAGE CLASSIFICATION

Due to their ease of implementation and efficacy in identifying damaged rooftops, we implemented various features used in previous works and mentioned in the previous section. These features were identified as the top performers in their respective works and provide simple measures to quantify textural change. They include standard deviation, uniformity [8], correlation analysis [13], LBP, and LEP [14]. In addition, we propose new features, which include gradient magnitude bins, features based on an invariant color model, and edge density. Two approaches can be used to calculate each feature: extraction per rooftop and extraction over a grid of cells.
Fig. 2. Grid is placed over (a) the before-event building image and (b) the after-disaster building image. (c) Features are extracted for each cell. (d) Each feature calculated per cell is quantized into bins, as indicated by the intensity of the heat map. A histogram of these cells is then typically computed.
However, due to the poor performance of features computed over entire rooftops, we do not report those results in this paper. For the rest of this work, all features (with the exception of gradient magnitude bins) are calculated after dividing each rooftop into a grid of cells. In this approach, each rooftop is divided using a grid, and the features are extracted for each cell in the grid, quantized into bins, and described by histograms of cells (see Fig. 2 for an example; a sketch of this generic procedure follows below). This is done in order to capture the localization of damage and improve classification. A feature vector is thus extracted for each rooftop by concatenating all features as $v = \{v_{\mathrm{stddev}}, v_{\mathrm{uniform}}, \ldots\}$, where $v_{\mathrm{stddev}}, v_{\mathrm{uniform}}$, etc., correspond to the various features calculated over the grid of cells. The rest of this section describes these features in greater detail.
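The sketch referenced above: a generic per-cell feature pipeline, assuming registered grayscale NumPy arrays. The max-normalization used to fix the bin edges is our assumption, as the paper does not specify how the bins are defined:

```python
import numpy as np

def grid_feature_histogram(before, after, cell=10, n_bins=10, feature=np.var):
    """Tile both rooftop images into cell x cell blocks, compute the
    absolute per-block change of a scalar feature, quantize into n_bins,
    and return the histogram of bin counts (one entry of the vector v)."""
    h, w = before.shape
    diffs = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            b = before[i:i + cell, j:j + cell].astype(float)
            a = after[i:i + cell, j:j + cell].astype(float)
            diffs.append(abs(feature(b) - feature(a)))
    diffs = np.asarray(diffs)
    if diffs.max() > 0:                      # normalize so bin edges are fixed
        diffs = diffs / diffs.max()
    hist, _ = np.histogram(diffs, bins=n_bins, range=(0.0, 1.0))
    return hist
```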
A. Variance and Uniformity

Variance is used to describe the relative smoothness of the texture in grayscale images. The variance $\mathrm{Var}_{i,j}$ of cell $(i, j)$ is calculated using the pixel values $I_{i,j}(x, y)$ of the cell as

$$\mathrm{Var}_{i,j} = \sigma^{2} = \frac{\sum_{x}\sum_{y}\left(I_{i,j}(x,y) - \mu_{i,j}\right)^{2}}{N} \quad (6)$$

where $N$ is the total number of pixels in the cell and $\mu_{i,j}$ denotes the mean of the cell.

Uniformity is an illumination-invariant textural measure used to describe the coarseness of an object based on its histogram [8]. For a particular RGB channel, the frequency of a pixel value represents the number of occurrences of that value within the object. The occurrence probability of a pixel value, $p(x)$, is simply the frequency of pixel value $x$ divided by the number of pixels in the cell and ranges in value from 0 to 1. The sum of the probabilities over all pixel values is exactly 1. The uniformity for each cell is given by

$$\mathrm{Unif}_{i,j} = \sum_{x} p^{2}_{i,j}(x). \quad (7)$$

In our implementation, we calculated the variances and uniformities for each 10 × 10 pixel cell in both the before- and after-storm images. The absolute differences of these values for each cell were then quantized into 10 bins.
Fig. 3. Heat map displaying the severity of damage found by the variance and uniformity features, with more severe damage shown closer to red. (First column) Before-storm image, (second column) after-storm image, (third column) variance, and (fourth column) uniformity. A rooftop with no after-storm damage is shown in the first row, and one with damage is shown in the second row. Changes in textural uniformity are shown to be sensitive and more prone to false detection of damage.
The histogram of bin values was then computed to obtain two histograms, $v_{\mathrm{stddev}}$ and $v_{\mathrm{uniform}}$, each of size 10. Fig. 3 shows a heat map for both of these features. As shown in this example, uniformity is usually very sensitive to slight textural differences and hence less reliable than variance.

B. Correlation Analysis

The Pearson correlation coefficient (PCC) is a measure of the linear relationship between two random variables. It is calculated between cells $(i, j)$ corresponding to the before- and after-storm images as

$$\mathrm{PCC}(i,j) = \frac{\sigma_{s_{i,j}t_{i,j}}}{\sigma_{s_{i,j}}\,\sigma_{t_{i,j}}} \quad (8)$$

where $\sigma_{s_{i,j}}$ and $\sigma_{t_{i,j}}$ are the standard deviations of the before- and after-storm image cells and $\sigma_{s_{i,j}t_{i,j}}$ is the cross-covariance between the two cells. Physically, a higher correlation coefficient indicates that less change occurred between a pair of images. Correlation coefficients can be calculated over a pair of cells. Alternatively, a pixelwise calculation within a sliding fixed-size window can first be performed throughout the before and after images, and a representative correlation coefficient for each cell can be taken as the average of all the calculated correlation coefficients confined by the boundary of the cell. Previous work [13] found that the latter method better characterizes structural damage, and hence, it was implemented for this work. In our implementation, the cell sizes were 10 × 10 pixels, and the correlation coefficient values for each cell were then quantized into 10 bins. The extracted feature $v_{\mathrm{CC}}$ is thus a histogram of bin values and is of length 10.
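A short sketch of the sliding-window PCC map just described, assuming SciPy; the 5-pixel window size is illustrative, as the paper does not report the one used:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def pcc_map(s, t, win=5):
    """Pixelwise Pearson correlation inside a sliding win x win window;
    the per-cell feature then averages this map within each 10 x 10 cell."""
    s = s.astype(float)
    t = t.astype(float)
    mu_s = uniform_filter(s, win)
    mu_t = uniform_filter(t, win)
    cov = uniform_filter(s * t, win) - mu_s * mu_t
    std_s = np.sqrt(np.maximum(uniform_filter(s * s, win) - mu_s ** 2, 1e-12))
    std_t = np.sqrt(np.maximum(uniform_filter(t * t, win) - mu_t ** 2, 1e-12))
    return cov / (std_s * std_t)             # eq. (8), evaluated per pixel
```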
Fig. 4. Heat map displaying the severity of damage found by the PCC and LBP features, with more severe damage shown closer to red. (First column) Before-storm image, (second column) after-storm image, (third column) PCC, and (fourth column) LBP.
C. Local Binary and Edge Pattern

LBP-based features have been used in various applications such as face detection, image analysis, and image retrieval. Sirmacek and Unsalan [14] used LBP for the damage detection of rooftops. The LBP is computed by using a moving window operator and producing a binary pattern by thresholding the window elements by the center pixel. The binary pattern is assigned to the center pixel. The histogram of the binary patterns in an image is computed and compared. The LBP values encode different patterns such as lines, edges, spots, and corners under varying illumination. LEP is similar to LBP, except that it is extracted from edge maps rather than pixel intensity values. For more details on LBPs, the reader is referred to [18]. In our implementation, an LBP/LEP histogram of size 256 was calculated for each cell in the before and after images. The cell sizes were 10 × 10 pixels, and the chi-square distance between corresponding histograms was calculated as

$$\chi^{2}(i,j) = \frac{1}{2}\sum_{k=1}^{256}\frac{\left[h_i(k) - h_j(k)\right]^{2}}{h_i(k) + h_j(k)}. \quad (9)$$
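A sketch of the per-cell LBP histogram distance of (9), using scikit-image's 8-neighbor LBP (an assumption; the paper does not name its implementation):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_chi_square(before_cell, after_cell):
    """256-bin LBP histograms of two corresponding cells and their
    chi-square distance, per (9)."""
    hists = []
    for cell in (before_cell, after_cell):
        codes = local_binary_pattern(cell, P=8, R=1, method='default')
        hist, _ = np.histogram(codes, bins=256, range=(0, 256))
        hists.append(hist.astype(float))
    hi, hj = hists
    denom = hi + hj
    denom[denom == 0] = 1.0    # empty bins contribute 0 (numerator is 0 too)
    return 0.5 * np.sum((hi - hj) ** 2 / denom)
```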
These distance values for each cell were then quantized into 10 bins; the extracted features $v_{\mathrm{LBP}}$ and $v_{\mathrm{LEP}}$ are thus histograms of bin values, each of length 10. Fig. 4 shows a heat map for the LBP features. In general, while LBP features are usually able to distinguish damage from no damage, they do not appear to capture the severity of the damage.

D. Edge Density

Some previous works [5], [12] used edge-based features. We propose the use of edge density to identify and categorize more serious structural damage. When a roof collapses partially or fully, the image shows an increase in the number of new edges and in the nonlinearity of existing edges. An edge detection performed on the difference of the before and after images can capture this change in the appearance of edges. We define edge density as a measure of the number of new or changed edges that appear per unit area. First, the before and after images are converted into grayscale. Then, the absolute difference of the before and after images is computed. A binary thresholding of this difference image sets all pixels above a certain value to 255 and all others to 0. This thresholding eliminates all nonrelevant or minor changes in edges. Next, Canny edge detection [19] is performed on the binary image. Finally, the number of pixels classified as edges in each 10 × 10 cell is divided by the size of the cell. This gives an edge density value for each cell of the grid. If this measure is very high, then it is usually more likely that the roof has collapsed. These density values for each cell were then quantized into 10 bins; the extracted feature $v_{\mathrm{ED}}$ is thus a histogram of bin values of length 10. A sketch of this procedure follows below.
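The sketch referenced above, assuming OpenCV and uint8 grayscale inputs; the difference threshold and the Canny limits are illustrative values not reported in the paper:

```python
import cv2
import numpy as np

def edge_density_cells(before, after, cell=10, diff_thresh=40):
    """Per-cell edge density: threshold the absolute grayscale difference,
    run Canny on the binary image, and count edge pixels per cell area."""
    diff = cv2.absdiff(before, after)
    _, binary = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    edges = cv2.Canny(binary, 50, 150)
    h, w = edges.shape
    density = np.zeros((h // cell, w // cell))
    for i in range(h // cell):
        for j in range(w // cell):
            block = edges[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            density[i, j] = np.count_nonzero(block) / float(cell * cell)
    return density
```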
Fig. 5. Before- and after-storm images from (top) FL and (bottom) TX are shown. The gradient magnitude images extracted around each rooftop show how strong gradients in certain directions change significantly if the rooftop is missing.
E. Gradient Magnitude Bins

Rooftops and other man-made structures are typically characterized by strong gradients in certain directions in the image. For example, a rectangular building shows strong gradients in four orientations. When rooftops are missing or totally destroyed, these gradients become weak in the after-storm image, and the magnitude of the gradients is more evenly distributed among all possible orientations (see Fig. 5). To capture this, we propose a new feature, called gradient magnitude bins, which is computed over the entire rooftop (unlike the other features described in this section) as follows. First, we calculate smoothed (using a Gaussian function) gradients in the $x$- and $y$-directions of the grayscale image $I(x, y)$

$$g_x(x,y) = \frac{-x}{2\pi T_g^{4}}\exp\left(-\frac{x^{2}+y^{2}}{2T_g^{2}}\right) \quad (10)$$

$$g_y(x,y) = \frac{-y}{2\pi T_g^{4}}\exp\left(-\frac{x^{2}+y^{2}}{2T_g^{2}}\right) \quad (11)$$

where $T_g$ is the smoothing parameter. We calculate the smoothed gradients of the image $I(x, y)$ as

$$I_x = \frac{d}{dx}I(x,y) = g_x(x,y) \ast I(x,y) \quad (12)$$

$$I_y = \frac{d}{dy}I(x,y) = g_y(x,y) \ast I(x,y) \quad (13)$$
where $\ast$ denotes the 2-D convolution operation. The orientation and magnitude of the gradients are calculated as

$$I_\theta(x,y) = \tan^{-1}\left(\frac{I_y}{I_x}\right) \quad (14)$$

$$I_{\mathrm{mag}}(x,y) = \sqrt{I_x^{2} + I_y^{2}}. \quad (15)$$

The gradient magnitude bins $\mathrm{hist}_{I_\theta}$ can then be computed over all pixels $(x, y)$ as

$$\mathrm{hist}_{I_\theta}\left(\left\lfloor \frac{I_\theta(x,y)}{N} \right\rfloor\right) = \mathrm{hist}_{I_\theta}\left(\left\lfloor \frac{I_\theta(x,y)}{N} \right\rfloor\right) + I_{\mathrm{mag}}(x,y) \quad (16)$$

where $0 \leq I_\theta(x,y) \leq 360$ and $N$ denotes the bin size. $N$ is typically larger than 1 to reduce the space required to store the features as well as to make them more invariant to noise. In our implementation, $N = 5$ gave good performance. The feature vector $v_{\mathrm{GMB}}$ is obtained by simply differencing the corresponding gradient magnitude bins of the before and after rooftops and is of length 72.
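A sketch of (10)-(16) using SciPy's separable derivative-of-Gaussian filtering; the smoothing parameter value is illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def gradient_magnitude_bins(img, sigma=1.0, bin_size=5):
    """Magnitude-weighted histogram of gradient orientations over the whole
    rooftop: derivative-of-Gaussian gradients (10)-(13), orientation and
    magnitude (14)-(15), and accumulation into 360/bin_size bins (16)."""
    img = img.astype(float)
    ix = gaussian_filter1d(gaussian_filter1d(img, sigma, axis=0),
                           sigma, axis=1, order=1)   # smoothed d/dx
    iy = gaussian_filter1d(gaussian_filter1d(img, sigma, axis=1),
                           sigma, axis=0, order=1)   # smoothed d/dy
    theta = np.degrees(np.arctan2(iy, ix)) % 360.0
    mag = np.hypot(ix, iy)
    n_bins = 360 // bin_size                          # 72 bins for bin_size=5
    hist = np.zeros(n_bins)
    np.add.at(hist, (theta // bin_size).astype(int) % n_bins, mag)
    return hist

# v_GMB: elementwise difference of the before and after histograms, e.g.,
# v_gmb = gradient_magnitude_bins(before) - gradient_magnitude_bins(after)
```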
Fig. 6. The first row shows a collapsed building and the corresponding false-color images for edge density, V histogram, and H means. Notice that the edge density values are significantly higher in this case. The second row corresponds to a partially damaged building with a cavity in the rooftop. The V histogram indicates a significant change in the cavity area, and the H means shows minor damage on the roof.
F. Features From Invariant Color Model

Most previous works extracted features from grayscale or RGB images. In contrast, we propose new features based on the HSV color model. Previous studies [20] have established the use of such color models in problems such as shadow compensation. This color space is more invariant to photometric differences, as it decouples luminance and chromaticity. Color tone is a powerful descriptor, and conversion into the HSV color model helps identify features that intuitively have more discriminating capacity. In this section, we introduce two features based on the HSV model, called hue means and V difference. Our approach assumes three types of damage: minor damage includes the removal of tiles, slight irregularities in edges, etc., and a moderate version includes holes in the roof, dislodged decking, and partial change in the elevation of the roof. The RGB-to-HSV conversion is done as follows:

$$h = \begin{cases} 0^\circ, & \text{if } \max = \min \\ \left(60^\circ \times \frac{g-b}{\max - \min} + 0^\circ\right) \bmod 360^\circ, & \text{if } \max = r \\ 60^\circ \times \frac{b-r}{\max - \min} + 120^\circ, & \text{if } \max = g \\ 60^\circ \times \frac{r-g}{\max - \min} + 240^\circ, & \text{if } \max = b \end{cases} \quad (17)$$

$$s = \begin{cases} 0, & \text{if } \max = 0 \\ \frac{\max - \min}{\max} = 1 - \frac{\min}{\max}, & \text{otherwise} \end{cases} \quad (18)$$

$$v = \max. \quad (19)$$

Here, max and min denote the maximum and minimum of the RGB values, respectively.

1) V Difference: When the roof structure is partially or completely destroyed, it appears in the aerial view as a dark cavity. The change in the V (value) component can be used to discriminate roofs with such cracks, openings, or holes, as it represents a decrease in illumination in the corresponding regions. Hence, to compute the V difference, the values are subtracted, and the differences are summed; a larger sum usually indicates the presence of cavities. Consider the V components of corresponding cells of the before and after images to be $V_1$ and $V_2$, respectively. The proposed feature $V_{\mathrm{diff}}$ is computed over each 10 × 10 cell as

$$V_{\mathrm{diff}}(V_1, V_2) = \sum_{x}\sum_{y}\left(V_1(x,y) - V_2(x,y)\right). \quad (20)$$

$V_{\mathrm{diff}}$ is quantized into 10 bins, and a histogram $v_{\mathrm{VDiff}}$ is calculated.

2) Hue Means Histogram: The hue component corresponds to the color value of a pixel. Since it is independent of luminance and saturation (the richness of the color), it is assumed to be invariant to photometric differences. The variation in color that occurs in the case of milder damage, such as removed tiles and exposed decking, is noticeable in the hue spectrum. The hue value varies from 0° to 360°. The distance between two hue angles tells us how different the colors are; however, the distance should always correspond to the smaller angle between the two hue values. Furthermore, damage such as removed tiles shows distinct patterns in the corresponding after-storm hue histograms. Those patterns can be trusted only when the difference between the hue values of the before- and after-storm images is high enough. Hence, we compute another feature, a 2-D hue means histogram, to reflect the hue spectrum of the after-storm image as well as the degree of change corresponding to each part of the spectrum. More formally, consider the hue components of the before and after image cells to be $H_1$ and $H_2$, respectively, with corresponding means $\bar{H}_1$ and $\bar{H}_2$

$$H_{\mathrm{diff}} = \left(\bar{H}_1 - \bar{H}_2\right). \quad (21)$$

We ensure that the smaller angle is chosen by computing $360^\circ - H_{\mathrm{diff}}$ if $H_{\mathrm{diff}} > 180^\circ$. A 2-D histogram of cells is computed by considering $H_{\mathrm{diff}}$ as the first dimension and $\bar{H}_2$ as the second dimension. In order to reduce the space required to store the values, $H_{\mathrm{diff}}$ and $\bar{H}_2$ were quantized to 3 and 90 bins, respectively. The feature vector $v_{\mathrm{HMHist}}$ is obtained by concatenating the rows of the 2-D histogram to form a single linear vector of length 270. See Fig. 6 for examples that demonstrate how edge density, V difference, and hue means capture different types of damage.
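A sketch of the two HSV features, assuming OpenCV BGR uint8 inputs; note that OpenCV stores hue in [0, 180), so hue means are doubled to degrees before the smaller-angle wrap. The 2-D hue means histogram itself can then be built with np.histogram2d:

```python
import cv2
import numpy as np

def hsv_cell_features(before_bgr, after_bgr, cell=10):
    """Per-cell V difference (20) and smaller-angle hue-mean distance (21)."""
    hsv1 = cv2.cvtColor(before_bgr, cv2.COLOR_BGR2HSV).astype(float)
    hsv2 = cv2.cvtColor(after_bgr, cv2.COLOR_BGR2HSV).astype(float)
    h, w = hsv1.shape[:2]
    v_diff, h_diff, h2_mean = [], [], []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            c1 = hsv1[i:i + cell, j:j + cell]
            c2 = hsv2[i:i + cell, j:j + cell]
            v_diff.append(np.sum(c1[..., 2] - c2[..., 2]))        # eq. (20)
            d = abs(c1[..., 0].mean() - c2[..., 0].mean()) * 2.0  # eq. (21)
            h_diff.append(360.0 - d if d > 180.0 else d)          # smaller angle
            h2_mean.append(c2[..., 0].mean() * 2.0)
    return np.array(v_diff), np.array(h_diff), np.array(h2_mean)

# v_HMHist: np.histogram2d(h_diff, h2_mean, bins=(3, 90)), flattened to 270.
```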
V. SUPERVISED LEARNING FOR DAMAGE CLASSIFICATION

The different edge-, intensity-, and color-based features proposed in the previous section can be used to classify buildings in an image into categories ranging from no damage to severely damaged. The damage classification process proposed here is a supervised learning approach that uses the features to classify damage into qualitative states. The goal of supervised learning is to build a general hypothesis that models the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances where the values of the predictor features are known but the class label is unknown. In this evaluation, we experimented with many supervised learning algorithms for predicting the damage state of buildings, including decision trees, rule learners, perceptron-based techniques, and Bayesian networks.

In addition, we experimented with ensembles of classifiers. An ensemble is an aggregation of the predictions of multiple classifiers with the goal of improving accuracy. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing extra computation. Common types of ensembles include bagging, boosting, and buckets of models [21]. Bootstrap aggregating, often abbreviated as bagging, has each model in the ensemble vote with equal weight. In order to promote model variance, bagging trains each model in the ensemble on a randomly drawn subset of the training set. The random forest algorithm combines random decision trees with bagging to achieve very high classification accuracy [22]. In our evaluation, the random forest algorithm performed the best and was selected as the classifier of choice.

A high-level overview of the proposed damage classification process is shown in Fig. 7. In the preprocessing stage, the before- and after-storm images are corrected for photometric and geometric differences. The feature vectors described in the previous section are then extracted from the rooftops. They are used to train supervised learning algorithms with the aid of ground truth. As described in the next section, separate classifiers are trained to make decisions on the appearance states of rooftops. These appearance states are more specific to the nature of the damage and are usually much less ambiguous than RS-scale damage grades. Each new rooftop sample whose damage state needs to be evaluated can then be classified into an RS-scale category on the basis of the predicted appearance states.
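A minimal sketch of this classification stage with a random forest, assuming scikit-learn rather than the (unnamed) toolkit used by the authors; the file names and shapes are hypothetical placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: one row of concatenated Sec. IV histograms per rooftop; y: one
# appearance-state label per rooftop (a separate classifier per state).
X = np.load('rooftop_features.npy')        # hypothetical file, shape (635, d)
y = np.load('collapse_state_labels.npy')   # hypothetical file, shape (635,)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
predicted_states = clf.predict(X[:5])      # appearance states for new rooftops
```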
Fig. 7. Overview of the proposed approach.
VI. RELIABILITY OF GROUND-TRUTH PREPARATION

An expert volunteer labeled all 635 buildings with the four RS damage grades described previously. As this damage scale can be ambiguous, the appearance-based qualitative categories described in Table I were also labeled for each individual rooftop. These simple categories are more intuitive and are thought to be less ambiguous for visual interpretation. Ambiguity in the definition of the RS-scale classes may affect manual ground-truth preparation (see Fig. 8). Previous studies have mostly ignored this problem or used a very small data set.

To estimate the level of ambiguity in preparing ground truth, a study of manual classification by multiple expert and nonexpert volunteers was conducted. Each volunteer was presented with before- and after-storm rooftop images and asked to choose one of the four damage states. In addition, the various appearance categories in Table I were labeled as part of the questionnaire. At the end of the experiment, the classifications from all the volunteers were compiled to produce agreement percentages for the damage state of the rooftops. Nine volunteers worked on a set of 50 rooftop image pairs. Another 13 volunteers worked on a separate set of 65 rooftop image pairs. Both sets of results are shown in Table II. For the group of nine volunteers, a class label was considered agreed upon if at least nine volunteers marked a rooftop with the same label. For the group of 13, this threshold was nine volunteers. Both groups of volunteers appear to have reached similar levels of agreement. Since the percentages for missing tiles and collapsed area show higher levels of interpretation ambiguity, we simplify these categories and add two new ones that can take two labels, ≤ 5% or > 5%.

The overall agreement percentages shown help redefine the accuracy expectations of supervised classification for damage interpretation. Damage classification from high-resolution aerial imagery is an inherently ambiguous problem, and the estimates provided by classifiers should not be expected to exceed the agreement percentages shown in Table II. As the agreement percentage of the RS damage scale is the lowest, we focus on classifier performance in identifying the appearance-based damage categories. These included five categories: missing building, collapse state, debris, cavity, and state of tiles. To show that these classes are highly correlated with the RS damage scale, we trained a J48 decision tree on the 635-rooftop data set, using the marked classes as features to predict the RS damage scale.
TABLE I
APPEARANCE-BASED QUALITATIVE CATEGORIES FOR DAMAGE CLASSIFICATION
Fig. 8. The first row shows before-storm images. The second row shows after-storm images. The first column shows an image pair where the exact state of the rooftop is ambiguous and hard to discern: the tiles are removed, but it is unclear whether there are holes or cavities. The second column is an example where the entire rooftop structure is missing, yet the building is standing, as evidenced by the shadows. The third column shows either a collapsed or missing rooftop; the exact nature of the damage is unclear, although it is evidently serious.

The resultant decision tree is shown in Fig. 9 and predicts the RS scale with 94% accuracy. Since debris presence was not selected as a feature by the J48 algorithm and is the category that is hardest to predict (as demonstrated by its lower agreement percentage), we focus on predicting the other four classes for the rest of this work.

VII. EVALUATION

For evaluation, we used the data set of 635 rooftop image pairs with the manually prepared ground truth described in the previous section. The class distribution for the various damage appearance states is shown in Table III. It can be observed that, with the exception of the missing-tile state, the data set has an imbalanced class distribution. This can be a problem when learning models from the data set, as the classifier can be biased in favor of the majority class. To overcome this, we used cost-sensitive classification: each class label is weighted with a predetermined cost. Reweighting the training instances according to the total cost assigned to each class is a common cost-sensitive classification approach and is adopted in this work.

To establish the statistical validity of the classification results, all results reported in this section were generated using k-fold cross-validation. In k-fold cross-validation, the original data set is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds are then averaged to produce a single estimate. We used tenfold cross-validation to generate all our results.

A. Selected Features and Classification Performance

We considered all features described in the previous section for the various classification experiments. After being extracted from the images, all features were L2 normalized. As mentioned previously, we chose random forests based on their best performance among the various classifiers; for the rest of this section, all results reported were generated by various iterations of random forests. Additionally, we experimented with each feature separately, with all features together, and with features selected using a best-first feature selection algorithm. We report only the best results and their corresponding features. The results are reported in Table IV. The metrics used for evaluation are accuracy, precision, and recall. If TP is the true positive rate, FP is the false positive rate, TN is the true negative rate, and FN is the false negative rate

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \quad (22)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (23)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}. \quad (24)$$
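Continuing the earlier sketch, tenfold cross-validation with a simple stand-in for the cost-sensitive reweighting; scikit-learn's class_weight='balanced' weights instances inversely to class frequency, whereas the paper uses predetermined per-class costs that are not reported:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

clf = RandomForestClassifier(n_estimators=100, class_weight='balanced',
                             random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring='accuracy')  # tenfold CV
print(scores.mean(), scores.std())
```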
It can be observed from Tables IV and V that missing buildings are best identified using gradient magnitude bins. This feature also had the highest accuracy among all of the predictions. This result correlates well with the ground-truth agreement percentages reported previously. Best-first feature selection was the best strategy for cavity presence and collapse state. The missing-tile state had the worst performance in terms of precision and recall because of the high number of false positives and false negatives in identifying missing tiles. Fig. 10 shows some of the rooftops and their classifications.

B. Effect of Preprocessing

The preprocessing steps described previously have varying effects on the final accuracy. The effects of color balancing and rooftop registration on classification are reported in Table V. The measures used for evaluation include accuracy, F-measure, and receiver operating characteristic (ROC) area. If TP is the
TABLE II
AGREEMENT % OF VISUAL INTERPRETATION BY TWO GROUPS OF VOLUNTEERS
TABLE IV
BEST CLASSIFICATION PERFORMANCE FOR DIFFERENT FEATURE SELECTION SCHEMES. THE MEANS OF CLASS ACCURACY, PRECISION, AND RECALL FOR EACH CATEGORY ARE REPORTED
Fig. 9. J48 tree trained on the 635-rooftop data set to predict the RS damage scale. The leaves represent the A to D categories of the RS damage state, and the nodes represent features: building missing (F0), cavity presence (F1), missing tiles (F2), and collapse state (F3).

TABLE III
CLASS DISTRIBUTION OF APPEARANCE-BASED DAMAGE STATES IN THE 635-ROOFTOP DATA SET
true positive rate, FP is the false positive rate, TN is the true negative rate, and FN is the false negative rate

$$F\text{-}\mathrm{measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100. \quad (25)$$
The ROC area is the area under the curve of a ROC plot, created by plotting the true positive rate versus the false positive rate at various threshold settings. Values closer to 1 are more desirable for the ROC area.

Overall, images after color balancing alone perform best in terms of accuracy, F-measure, and ROC area. The highest improvement in accuracy is seen for the missing-tile category. This may be because the correction of the hue spectrum reduced false positives and increased true negatives (see Fig. 11). The collapse state appears to be hardly affected by any preprocessing, while cavity presence shows significant improvement due to color balancing. Detecting missing buildings, however, shows only marginal improvement in accuracy. There appears to be little or no advantage in rooftop registration. Although this appears counterintuitive, the inefficacy may be partly attributed to the fact that most rooftops are already registered correctly, and rooftop registration may in fact be counterproductive by introducing minor registration error.

Fig. 11. Effect of color balancing on the hue means feature. After the color correction, only the exact places where tiles are removed are shown as severely damaged.
C. Estimating the RS Damage Scale

To evaluate the performance of predicting the RS damage scale, we first trained a classifier to separate RS-A from the rest of the classes. Then, the instances categorized as non-RS-A were classified using the appearance-based damage states. The predictions of these damage states were then used to predict the RS damage scale using the simple decision tree described in Section VI. The final results are shown in Table VI. The overall accuracy in classifying into the four categories is 67.7%. Many of the RS-B-categorized buildings are confused with RS-A and RS-C. RS-C- and RS-D-labeled buildings had the worst performance, with many of the buildings being confused with each other.

VIII. CONCLUSION

A common outcome of all previous research, including ours described in this paper, is that predicting multiple levels of damage categories is an inherently hard problem. Through our experiments, we show that this is due to the ambiguity in classifying a rooftop into any one category. We estimated the expected accuracy by establishing a correlation with the performance of expert human volunteers. Furthermore, we identified more intuitive appearance-based damage categories and showed that predicting these may decrease the ambiguity in identifying the RS damage scale as well. Toward this end, we evaluated novel as well as existing textural features for damage recognition and classification. Additionally, a color balancing preprocessing step was introduced, which improves the performance of the proposed scheme. To the best of our knowledge, this evaluation was conducted on the largest known collection of rooftop images.

The final evaluation indicates that this technique matches the performance of expert human visual interpretation for most of the damage categories identified. This approach detects missing
TABLE V
BEST CLASSIFICATION PERFORMANCE FOR DIFFERENT PREPROCESSING SCHEMES. THE RESULTS REPORTED INCLUDE NO PREPROCESSING (noprepro), AFTER COLOR BALANCING (cb), AND AFTER COLOR BALANCING PLUS ROOFTOP REGISTRATION (cb + rr). MEAN ACCURACY, F-MEASURE, AND ROC AREA FOR EACH CLASS ARE REPORTED
TABLE VI
CONFUSION MATRIX FOR PREDICTING THE RS SCALE FOR WINDSTORM DAMAGE
buildings with 96% accuracy, the presence of cavities with 84%, missing tiles with 78%, and the collapse state with 74%. Finally, the RS scale can be predicted using our techniques with 67% accuracy. This compares well with the predictions based on expert opinion, which were accurate nearly 70% of the time. While the reliability of high-resolution aerial imagery has been shown to be limited for fine-grained damage analysis, this work has pushed the boundaries of what is currently possible.

A limitation of our approach is that both the pre- and poststorm images of the area are needed for the damage analysis. The presence of strong occlusions such as clouds in either of the images can render that area of the image unusable. One possible solution is to reconstruct those areas by combining images taken at different times of the day, but this may be precluded by the lack of images at different times. Future work may concentrate on other textural measures, reclassifying damage states for a more refined damage analysis, and the use of VHR imagery.

Fig. 10. Actual and predicted damage categories for some of the images in the data set. All misclassifications are marked in red.
ACKNOWLEDGMENT

The authors would like to thank all the expert volunteers who took part in this study. The authors would also like to thank the Global Center of Excellence at Tokyo Polytechnic University, funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, for providing the support required for undertaking this research.

REFERENCES
[1] J. A. Womble, "Remote-sensing applications to windstorm damage assessment," Ph.D. dissertation, Civil Eng., Texas Tech Univ., Lubbock, TX, USA, 2005.
[2] T. M. Brown, D. Liang, and J. A. Womble, "Development of a statistical relationship between ground-based and remotely-sensed damage in windstorms," in Proc. 13th Int. Conf. Wind Eng., 2011.
[3] S. Ghosh, C. Huyck, M. Greene, S. Gill, J. Bevington, W. Svekla, R. DesRoches, and R. Eguchi, "Crowdsourcing for rapid damage assessment: The Global Earth Observation Catastrophe Assessment Network (GEO-CAN)," Earthquake Spectra, 2011, pp. 179–198.
[4] F. Yamazaki, "Applications of remote sensing and GIS for damage assessment," in Proc. 8th Int. Conf. Struct. Safety Rel., 2001, pp. 12–20.
[5] B. J. Adams, C. K. Huyck, B. Mansouri, R. T. Eguchi, and M. Shinozuka, Application of High-Resolution Optical Satellite Imagery for Post-Earthquake Damage Assessment: The 2003 Boumerdes (Algeria) and Bam (Iran) Earthquakes. Buffalo, NY, USA: MCEER, 2004.
[6] M. Matsuoka, T. T. Vu, and F. Yamazaki, "Automated damage detection and visualization of the 2003 Bam, Iran earthquake using high-resolution satellite images," in Proc. 25th Asian Conf. Remote Sens., Chiang Mai, Thailand, 2004, pp. 841–845.
[7] L. Sampath, "Image based assessment of windstorm damage," M.S. thesis, Elect. Eng., Texas Tech Univ., Lubbock, TX, USA, 2004, pp. 1–7.
[8] J. A. Womble, K. C. Mehta, and B. J. Adams, "Remote-sensing assessment of wind damage," presented at the 5th Int. Workshop Remote Sens. Appl. Nat. Hazards, Washington, DC, USA, 2007.
[9] J. A. Womble, S. Ghosh, B. J. Adams, and J. F. Carol, Advanced Damage Detection for Hurricane Katrina: Integrating Remote Sensing and VIEWS Field Reconnaissance. Buffalo, NY, USA: MCEER, 2006.
[10] A. Chesnel, R. Binet, and L. Wald, "Object oriented assessment of damage due to natural disaster using very high resolution images," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2007, pp. 3736–3739.
[11] C. F. Barnes, H. Fritz, and J. Yoo, "Hurricane disaster assessments with image-driven data mining in high-resolution satellite imagery," IEEE Trans. Geosci. Remote Sens., 2007, pp. 1631–1641.
[12] S. Radhika, M. Matsui, and Y. Tamura, "Using wavelets as an effective alternative tool for wind disaster detection from satellite images," in Proc. 5th Int. Symp. Comput. Wind Eng., 2010.
[13] Z. Chen and T. C. Hutchinson, "Urban damage estimation using statistical processing of satellite images: 2003 Bam, Iran earthquake," in Proc. SPIE Color Imag. X: Process., Hardcopy, Appl., 2005, pp. 289–300.
[14] V. Vijayaraj, E. Bright, and B. Bhaduri, "Rapid damage assessment from high resolution imagery," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2008, pp. 499–502.
[15] B. Sirmacek and C. Unsalan, "Damaged building detection in aerial images using shadow information," in Proc. 4th Int. Conf. Recent Adv. Space Technol., 2009, pp. 249–252.
[16] J. Thomas, K. W. Bowyer, and A. Kareem, "Fast robust perspective transform estimation for automatic registration in disaster response applications," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2012, pp. 2190–2193.
[17] J. Thomas, K. W. Bowyer, and A. Kareem, "Color balancing for change detection in multitemporal images," in Proc. IEEE Workshop Appl. Comput. Vis., 2012, pp. 385–390.
[18] T. Ojala, M. Pietikainen, and D. Harwood, "A comparative study of texture measures with classification based on feature distributions," Pattern Recog., vol. 29, no. 1, pp. 51–59, Jan. 1996.
[19] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[20] V. J. D. Tsai, "A comparative study on shadow compensation of color aerial images in invariant color models," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1661–1671, Jun. 2006.
[21] B. Zenko, "Is combining classifiers better than selecting the best one?" Mach. Learn., vol. 54, no. 3, pp. 255–273, Mar. 2004.
[22] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996.
Jim Thomas (M’11) received the M.S. degree in computer science and engineering from the University of Notre Dame, Notre Dame, IN, USA, in 2010. Prior to that, he received the B.Tech. degree from the National Institute of Technology, Bhopal, India, in 2007 and worked as a Software Engineer with Hewlett Packard, Bangalore, India, for nearly a year. He received the Ph.D. degree in computer science and engineering from the University of Notre Dame in 2013. He currently works as a Research Scientist with Amazon.com. He previously worked as a Research Assistant with the Computer Vision Research Laboratory, University of Notre Dame. His research dealt with interpreting remote-sensing imagery, particularly for postdisaster analysis. His research interests include change detection, object recognition, image registration, and applications of computer vision in solving real-world problems.
Ahsan Kareem received the B.Sc. degree in civil engineering from the West Pakistan University of Engineering and Technology, Lahore, Pakistan, in 1968, the M.Sc. degree in civil engineering from the University of Hawaii, Hilo, HI, USA, with a joint program at the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1975, and the Ph.D. degree in civil engineering from Colorado State University, Fort Collins, CO, USA, in 1978. He is the Robert M. Moran Professor of Civil Engineering and Geological Sciences and the Director of the NatHaz Modeling Laboratory at the University of Notre Dame, Notre Dame, IN, USA. His research uses computer models and laboratory and full-scale experiments to study the dynamic effects of environmental loads under winds, waves, and earthquakes in order to understand and predict the impact of natural hazards on the constructed environment and to develop mitigation strategies that enhance the performance and safety of structures. Dr. Kareem was elected to the National Academy of Engineering in 2009 for his contributions to analyses and designs that account for wind effects on tall buildings, long-span bridges, and other structures. In 2010, he was elected a Foreign Fellow of the Indian National Academy of Engineering and a Distinguished Member of the American Society of Civil Engineers.
Kevin W. Bowyer (F'98) received the Ph.D. degree in computer science from Duke University, Durham, NC, USA, in 1980. He is the Schubmehl-Prein Professor and the Department Chair of the Department of Computer Science and Engineering at the University of Notre Dame, Notre Dame, IN, USA. He has made major contributions in several areas of biometrics research, including face recognition, iris biometrics, multibiometrics, and other areas. His research group has been active in support of a variety of government-sponsored biometrics programs, including the Human ID Gait Challenge, the Face Recognition Grand Challenge, the Iris Challenge Evaluation, the Face Recognition Vendor Test 2006, and the Multiple Biometric Grand Challenge. Prof. Bowyer is a Golden Core Member of the IEEE Computer Society. He has served as the Editor-in-Chief of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and the IEEE Biometrics Compendium, and as the General Chair of the 2007, 2008, and 2009 IEEE International Conference on Biometrics: Theory, Applications and Systems and the 2011 International Joint Conference on Biometrics.