JOURNAL OF HYDROMETEOROLOGY
VOLUME 15
Multi-Index Rain Detection: A New Approach for Regional Rain Area Detection from Remotely Sensed Data

SHRUTI UPADHYAYA AND R. RAMSANKARAN
Remote Sensing Division, Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India

(Manuscript received 2 January 2014, in final form 22 June 2014)

ABSTRACT

In this article, a new approach called Multi-Index Rain Detection (MIRD) is proposed for regional rain area detection and is tested for India using Kalpana-1 satellite data. The approach is based on the hypothesis that combined indices should give better results than an individual index. Different combinations (scenarios) were developed by combining six commonly used rain detection indices with AND and OR logical connectives. For the study region, the optimal rain area detection scenario and the optimal threshold values of the indices were found through a statistical multi-decision-making technique called the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). The TOPSIS analysis was based on independent categorical statistics: probability of detection, probability of no-rain detection, and Heidke skill score. Notably, for the first time in the literature, a sensitivity analysis has been carried out to understand how the proportion of rain/no-rain pixels in the calibration/validation dataset influences a few commonly used statistics; its results were used to identify the above-mentioned independent categorical statistics. Based on the results and on validation with different independent datasets, scenario 8 (TIRt < 260 K and TIRt − WVt < 19 K, where TIRt and WVt are the brightness temperatures from the thermal IR and water vapor channels, respectively) is found to be the optimal rain detection index.
The results also indicate that the texture-based indices [standard deviation and mean of 5 × 5 pixels at time t (mean5)] did not perform well, perhaps because of the coarse resolution of Kalpana-1 data. Scenario 8 also performs much better than the Roca method used in the Indian National Satellite (INSAT) Multispectral Rainfall Algorithm (IMSRA) developed for India.
Corresponding author address: Dr. R. Ramsankaran, Remote Sensing Division, Department of Civil Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India.
DOI: 10.1175/JHM-D-14-0006.1
© 2014 American Meteorological Society

DECEMBER 2014
UPADHYAYA AND RAMSANKARAN

1. Introduction

India is predominantly an agricultural country, so the success or failure of its crops in any year is crucial for the development of its economy. The southwest monsoon, which brings 60%–80% of India's annual rainfall, is therefore of great significance (Prakash et al. 2014). Extreme events such as floods and droughts pose a major threat not only to crops but also to human life. In recent years, a number of extreme hydrological events have occurred over various parts of India. For example, in 2012, drought became a critical issue and affected major parts of India, such as southern and eastern Maharashtra, northern Karnataka, Andhra Pradesh, Orissa, Gujarat, and Rajasthan. In contrast, a devastating flood occurred in northern India in 2013 because of cloud bursts. Experts warn that such extreme events will occur more frequently because of global climate change. Hence, accurate knowledge of the spatial and temporal distribution of rainfall at finer resolutions is needed for disaster preparedness and flood forecasting.

The detection and estimation of rainfall events at high spatial and temporal resolutions can only be achieved from the vantage of space, that is, from satellites. Rain gauges and ground radars are not well distributed, and their continuous maintenance is highly expensive in developing countries like India. Hence, for the past two decades, hydrologists have used cost-effective satellite data with global coverage to estimate rainfall all over the world (e.g., Adler et al. 1993, 2000; Haile et al. 2010; Joyce et al. 2004). A few algorithms have even been developed specifically for Indian regions (e.g., Mishra et al. 2009, 2010) to estimate rainfall at high temporal and spatial resolutions. Such high-temporal-resolution estimates can only be achieved using geostationary satellite data, which provide images at 15–30-min intervals. Generally, these satellites carry visible (VIS; 0.4–0.7 μm), water vapor (WV; 6.2 μm), and thermal infrared (TIR; 10.8 μm) sensors. Among these, TIR data are widely used in rainfall estimation because they provide information about the cloud-top brightness temperature (CTBT). The assumption behind the use of CTBT is that clouds with a colder CTBT produce heavier rainfall (Haile et al. 2010); however, TIR measurements do not penetrate the clouds and have only a weak, indirect relation with rainfall rates. The main drawback of infrared-based methods lies in rainfall area detection (Ba and Gruber 2001). If raining pixels are not properly delineated, the rainfall area may be over- or underestimated, or a complete rainfall event may even be missed. Hence, rain area detection is a crucial initial step in satellite rainfall retrieval algorithms, and many indices have been developed to detect the rainfall area that use not only CTBT but also the multispectral data available from geostationary satellites, such as the visible, WV, and other IR bands.

One of the oldest and most basic methods for separating raining from nonraining pixels is simple thresholding. Arkin and Meisner (1987) used the TIR channel and defined a CTBT threshold of 235 K: a pixel with a CTBT below 235 K is considered a rain pixel; otherwise, it is classified as a no-rain pixel. Most TIR-based techniques (e.g., Haile et al. 2010; Huffman et al. 2007; Todd et al. 2001) use this method for rain area detection. However, this method cannot identify thin cirrus clouds, which usually have a low CTBT but do not produce rain. Vicente et al. (1998) noted that using CTBT alone for rain area detection, without considering the evolution of the cloud system, results in an excessive precipitation area and hence in overestimation of rainfall. They therefore proposed a cloud growth rate correction factor, used along with simple CTBT thresholds, to improve the detection of rain pixels. Todd et al. (2001) showed that the relation between CTBT and rain rate is not consistent and varies regionally. Thus, different algorithms use different threshold values for different regions (Ba and Gruber 2001; Kalinga and Gan 2010; Roca et al. 2002; Todd et al. 2001), but without a well-defined method for selecting them. Haile et al. (2010) and Kuligowski (2002) chose the optimal threshold value as the one with the highest Heidke skill score (HSS), without considering the results of other categorical statistics. Likewise, multispectral data available from various geostationary satellites can be used to develop thresholding indices, such as the brightness temperature difference (BTD) between the 10.8- and 6.2-μm channels and between the 10.8- and 12.0-μm channels, for improving
the rain area detection. However, very few studies (e.g., Ba and Gruber 2001; Behrangi et al. 2009; Haile et al. 2010; Kühnlein et al. 2014; Mishra et al. 2009; Thies et al. 2008) use such multispectral data. Among these multispectral algorithms, Kühnlein et al. (2014) and Behrangi et al. (2009) use machine-learning techniques to delineate the rain area, whereas the others are empirical methods that use either one multispectral index or a combination of indices. Note that all of these algorithms were developed for specific areas or specific satellite data. Given these facts, there is a need for better, generalized regional algorithms that can be easily adapted to any region and any satellite data. Therefore, a new regional approach for rain area detection called Multi-Index Rain Detection (MIRD) is proposed in this article. The MIRD approach has been tested for the Indian region by evaluating six multispectral indices commonly used in satellite-based rain area detection techniques. Further, to evaluate our hypothesis that combined indices should give better results than an individual index, these six indices were combined in different ways using various AND or OR combinations. In the following sections, each of these combinations is called a “scenario.” To avoid subjectivity in choosing the optimal threshold values and the optimal scenario/index, a statistical multi-decision-making technique called the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) has been used, based on various commonly used independent categorical statistics. Because rainfall is a rare event over a large area, raining pixels are far fewer than nonraining pixels, and their proportion varies from dataset to dataset.
If the categorical statistics depend on the proportion of rain/no-rain pixels in the calibration/validation dataset, then comparing two datasets with different proportions may lead to wrong conclusions, so this dependence needs to be studied. Therefore, in addition to the above-mentioned work, a sensitivity analysis has been carried out to understand how the size of the test dataset and its proportion of rain and no-rain pixels influence the commonly used categorical statistics [accuracy (ACC), probability of detection (POD), probability of no-rain detection (POND), bias (BIAS), false alarm ratio (FAR), and HSS]. The results of this sensitivity analysis are also reported in this article and were used while implementing the MIRD approach. To the best of our knowledge, no such study has been reported in the literature to date.
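Why the rain/no-rain proportion can matter is easiest to see in an idealized calculation. The sketch below is not the authors' sensitivity analysis: it assumes a hypothetical detector whose per-class rates are fixed (POD = 0.7 and POND = 0.8, chosen only for illustration) and computes the expected contingency counts as the rain fraction changes. POD and POND are each computed within a single observed class, so they stay fixed. ACC and FAR mix both classes, so they shift with the rain fraction.

```python
def expected_stats(rain_fraction, pod, pond, n_pixels=10_000):
    """Expected statistics for a hypothetical detector whose per-class
    rates (POD, POND) do not depend on the rain fraction."""
    rainy = rain_fraction * n_pixels
    dry = n_pixels - rainy
    h = pod * rainy   # hits
    m = rainy - h     # misses
    z = pond * dry    # correct negatives
    f = dry - z       # false alarms
    return {
        "POD": h / (h + m),
        "POND": z / (z + f),
        "ACC": (h + z) / n_pixels,
        "FAR": f / (f + h),
    }


if __name__ == "__main__":
    # Illustrative rates only; they are not taken from the article.
    for p in (0.5, 0.2, 0.05):
        stats = expected_stats(p, pod=0.7, pond=0.8)
        print(p, {k: round(v, 3) for k, v in stats.items()})
```

Running the loop shows POD and POND constant while ACC rises and FAR grows as rain becomes rarer. Real datasets add sampling noise and detectors whose rates are not fixed, which is why an empirical analysis is still needed.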
TABLE 1. Sensor details of the K-1 satellite.

Band   Spectral resolution (μm)   Spatial resolution (km)   Temporal resolution (min)
VIS    0.55–0.75                  2                         30
TIR    10.5–12.5                  8                         30
WV     5.7–7.1                    8                         30
This article is organized into five sections. The data used in the present work are described in section 2. The methodology and the background of the study are given in section 3. The results of the various analyses carried out in the present work are presented and discussed in section 4. Finally, section 5 summarizes the study, lists its major conclusions, and outlines the scope for future work.
2. Data description

a. Kalpana-1 data

The Kalpana-1 (K-1) satellite is a dedicated Indian meteorological geostationary satellite launched by the Geosynchronous Satellite Launch Vehicle (GSLV), and it has been operating since 24 September 2002. Among other instruments, it carries a Very High Resolution Radiometer (VHRR), which operates in three wavelength bands: VIS, TIR, and WV. Table 1 gives the details of each band of K-1. The level-1 processed data can be downloaded free of cost from the Meteorological and Oceanographic Satellite Data Archival Centre (MOSDAC) website (www.mosdac.gov.in) after user registration. Considering the importance of southwest monsoon rainfall for India, southwest monsoon data for 2010–13 have been used in this study to calibrate and validate the proposed MIRD approach.

b. TRMM 2A25 data

The Tropical Rainfall Measuring Mission (TRMM) product 2A25 (TRMM 2A25) is the orbital data product of the TRMM satellite's precipitation radar (PR) instrument, which gives instantaneous surface rainfall rates over the PR swath. The precipitation radar is regarded as a flying rain gauge that provides the three-dimensional structure of rainfall. Hence, 2A25 rainfall rates serve as the reference dataset for calibrating and evaluating the rain detection indices. The 2A25 products are snapshots of rainfall rates with a horizontal resolution of 5 km over a 220-km swath, with one or two observations per day depending on latitude. They are processed by the TRMM science team using the algorithms described in Iguchi et al. (2000) and Meneghini et al. (2000). The orbital rainfall rate data available for the 2010–13 southwest monsoon periods have been used in this study to compare and validate the results.

c. AWS rain gauge data

Rainfall data from automatic weather stations (AWSs) installed by the Indian Space Research Organisation (ISRO) across the country have been used in this study. ISRO's AWSs have a tipping-bucket rain gauge designed for unlimited rain-measuring capacity with an accuracy better than 1 mm. Data for June 2012 were obtained from the MOSDAC website.

3. Methodology adopted

In this section, the proposed MIRD approach and the procedures followed in implementing it for the study area are briefly described. The sensitivity analysis carried out to identify the independent categorical statistics used in implementing the MIRD approach is discussed separately.
a. Description of the MIRD approach

The MIRD approach involves three major steps. The first step is collocation: identifying the pixels of geostationary meteorological satellite images for which reference data are available over the study region at roughly the same time. The second step is finding the optimal threshold values of the selected rain detection indices through TOPSIS analysis, based on various independent categorical statistics. The last step is forming different combinations of rain detection indices and evaluating each combination, again through TOPSIS, to select the optimal combination. The following sections discuss how the MIRD approach was adopted and tested for the study area; the procedure is also illustrated in Fig. 1.
FIG. 1. Framework of the proposed MIRD approach.

1) COLLOCATION

Collocations between two satellite sensors are locations where both sensors observe the same place at roughly the same time (Holl et al. 2010). Here, one dataset comes from Kalpana-1, a geostationary satellite, and the other is TRMM 2A25, from TRMM, a low-Earth-orbiting satellite. Because TRMM 2A25 orbital rain rates are considered the standard reference rain rates, collocation is required to compare and validate the results. In-house computer programs were developed to extract the collocated pixels in K-1 images. Collocation is a two-step process. In temporal collocation, the K-1 images nearest in time to each TRMM 2A25 image (with a maximum time lag of 10 min) are found. In spatial collocation, the spatially matching pixels are then identified in those nearest-in-time K-1 images. This collocation process was carried out over the Indian region (between 37.1° and 8.076°N and between 68.1167° and 97.4167°E) for all June 2012 data and for the southwest monsoon periods listed in Table 2. All June 2012 data were used for calibration and sensitivity analysis, and the datasets listed in Table 2 were used for validation.
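The two collocation steps can be sketched as follows. This is a minimal illustration, not the authors' in-house programs. It assumes that K-1 image times are available as a list of `datetime` values and, purely for simplicity, that K-1 pixels lie on a regular latitude–longitude grid with a hypothetical origin and spacing (real K-1 products may use a different projection). `nearest_image_time` and `nearest_pixel` are hypothetical helper names.

```python
from datetime import datetime, timedelta

MAX_LAG = timedelta(minutes=10)  # maximum K-1/TRMM time lag stated in the text


def nearest_image_time(trmm_time, k1_times, max_lag=MAX_LAG):
    """Temporal collocation: return the K-1 image time closest to trmm_time,
    or None when no image lies within max_lag."""
    best = min(k1_times, key=lambda t: abs(t - trmm_time), default=None)
    if best is None or abs(best - trmm_time) > max_lag:
        return None
    return best


def nearest_pixel(lat, lon, lat0, lon0, dlat, dlon, n_rows, n_cols):
    """Spatial collocation on a hypothetical regular lat/lon grid.

    (lat0, lon0) is the centre of the top-left pixel; rows run southward
    and columns eastward with spacings dlat and dlon (degrees).
    Returns (row, col), or None if the point falls outside the image.
    """
    row = round((lat0 - lat) / dlat)
    col = round((lon - lon0) / dlon)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


if __name__ == "__main__":
    images = [datetime(2012, 6, 1, 0, 0), datetime(2012, 6, 1, 0, 30)]
    print(nearest_image_time(datetime(2012, 6, 1, 0, 38), images))
```

With 30-min K-1 images, a TRMM overpass at 00:38 is matched to the 00:30 image. An overpass at 00:45 lies 15 min from both neighbors, so it is rejected under the 10-min limit.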
2) SELECTION OF RAIN DETECTION INDICES

Because rain is highly nonuniform in space, correct estimation within a spatial grid box depends not only on accurately determining the instantaneous rainfall rate of every raining pixel but also on effectively screening out the nonraining pixels. A few threshold-based indices suitable for geostationary meteorological satellite datasets are available in the literature (Haile et al. 2010). Among them, simple TIR thresholding is the most commonly used. This index is a necessary condition for detecting raining pixels but not a sufficient one; it therefore has to be combined with other indices, serving as the necessary condition.
Because K-1 records only a single TIR band and a WV band, a few commonly used indices (Table 3) that depend only on TIR and WV data were selected. For quick reference, these indices are briefly described below.

WV-band images (5.7–7.1 μm; WVt) give a brightness temperature that depends on the moisture content in the troposphere. If the water vapor brightness temperature (WVBT) is low, the atmosphere is moist and favorable to condensation; if WVBT is high, the atmosphere is dry. Hence, WVt can be used as a supporting index for rain pixel detection.

Another index checks the brightness temperature difference between the TIR and WV channels (TIRt − WVt). If WVt is higher than or equal to TIRt, this indicates deep convective cloud tops rather than mere cirrus clouds. Hence, this is an important index, as shown by Ba and Gruber (2001) and Haile et al. (2010).

If the coldest pixels in the first IR image (TIRt) become colder in the second image (TIRt+1), that is, TIRt+1 < TIRt, the cloud system is intensifying, and the pixels in the first image are associated with the heaviest rainfall rates. Hence, Vicente et al. (1998) used the index TIRt − TIRt+1, the difference between the CTBT of a pixel in the current image and that of the same pixel in the next image, treating cloud growth as an important factor for rain area detection.

TIRt − mean5, where mean5 is the mean CTBT of the 5 × 5 pixel window around the pixel, describes the spatial variation of CTBT by estimating the gradient within that window. Along with overshooting tops, cold cloud tops, and expanding clouds, a high spatial gradient is an important property of convective clouds. This index is used in the algorithms of Vicente et al. (1998) and Kalinga and Gan (2010), which showed that the spatial gradient of CTBT can be used to identify convective cloud pixels.

The CTBT standard deviation (SD) over 3 × 3 pixels is the texture property used by Hsu et al. (1997) as an input to artificial neural networks for estimating rainfall. A large SD is associated with highly variable brightness temperatures, indicating rain-producing clouds, whereas a small SD indicates a weak gradient, associated with cirrus clouds within the window. Hence, the SD of CTBT can be used to identify nonraining thin cirrus clouds.

TABLE 2. Top scenarios obtained for different periods of four southwest monsoon seasons, along with categorical statistics results.

Case study   Time period       Top scenario   POD     HSS     POND
1            11–16 Jun 2010    8              0.757   0.431   0.674
2            21–26 Jul 2010    8              0.737   0.327   0.593
3            20–16 Aug 2010    81             0.592   0.237   0.645
4            4–10 Sep 2010     8              0.516   0.210   0.703
5            26–30 Jun 2011    8              0.730   0.429   0.722
6            15–22 Jul 2011    8              0.845   0.587   0.742
7            12–17 Aug 2011    8              0.819   0.419   0.600
8            3–9 Sep 2011      1              0.724   0.498   0.774
9            6–11 Jul 2012     8              0.701   0.472   0.771
10           20–25 Aug 2012    8              0.712   0.490   0.777
11           14–19 Sep 2012    65             0.726   0.497   0.771
12           11–16 Jun 2013    8              0.765   0.506   0.740
13           19–26 Jul 2013    8              0.766   0.445   0.679
14           13–17 Aug 2013    1              0.829   0.621   0.792
15           15–23 Sep 2013    1              0.800   0.580   0.779

TABLE 3. List of the indices used for rain area detection. Time t + 1 denotes the next image, one 30-min interval after time t.

Scenario index   Index            Reference
1                TIRt             Arkin and Meisner (1987)
2                WVt              Behrangi et al. (2009)
3                TIRt − WVt       Ba and Gruber (2001); Haile et al. (2010)
4                TIRt − TIRt+1    Vicente et al. (1998)
5                TIRt − mean5     Kalinga and Gan (2010)
6                SD               Hsu et al. (1997)
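The six indices can be computed per pixel from co-registered TIR and WV brightness-temperature grids. The following plain-Python sketch rests on stated assumptions. `rain_indices` and `window` are hypothetical names. Image edges are handled by replicating border pixels. SD is the population standard deviation over the 3 × 3 window. mean5 is taken over the 5 × 5 window centred on the pixel. The original implementations may differ on any of these points.

```python
import statistics


def window(field, row, col, size):
    """Values in a size x size window centred on (row, col).
    Border pixels are replicated at the image edges (an assumption)."""
    half = size // 2
    n_rows, n_cols = len(field), len(field[0])
    values = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r = min(max(row + dr, 0), n_rows - 1)
            c = min(max(col + dc, 0), n_cols - 1)
            values.append(field[r][c])
    return values


def rain_indices(tir_t, tir_t1, wv_t, row, col):
    """Values of the six Table 3 indices (keys 1-6) at one pixel.

    tir_t, tir_t1, wv_t: 2-D lists of brightness temperatures (K) for the
    TIR image at time t, the TIR image at t + 1, and the WV image at t.
    """
    tir = tir_t[row][col]
    wv = wv_t[row][col]
    return {
        1: tir,                                                  # TIRt
        2: wv,                                                   # WVt
        3: tir - wv,                                             # TIRt - WVt
        4: tir - tir_t1[row][col],                               # TIRt - TIRt+1
        5: tir - statistics.fmean(window(tir_t, row, col, 5)),   # TIRt - mean5
        6: statistics.pstdev(window(tir_t, row, col, 3)),        # SD over 3 x 3
    }
```

Each index is then compared with its optimal threshold. The direction of each comparison is not spelled out in this part of the text, so none is encoded here.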
In addition to these indices, an empirical method for classifying different cloud levels, developed by Roca et al. (2002) for the Indian Ocean region, has also been evaluated in this study. This empirical technique has been widely used in India as a rain area detection index, as is evident from the studies of Mishra et al. (2009, 2010) and Prakash et al. (2010). Those authors used the Roca et al. cloud classification to develop a rain-rate estimation algorithm suitable for Indian land regions, called the Indian National Satellite (INSAT) Multispectral Rainfall Algorithm (IMSRA). The rain area detection scheme adopted in IMSRA is given in Table 4. Pixels that satisfy either the mid- to upper-level-cloud or the low-level-cloud criterion are considered raining, and the remaining pixels are classified as no-rain pixels.

TABLE 4. Cloud classification scheme (scenario 121; Roca et al. 2002).

Cloudiness class                     Test
Clear sky                            If TIRt > 282 K and SD ≤ 0.5 K
Cloudy sky                           Otherwise
Mid- to upper-level clouds           If TIRt ≤ 270 K
Low-level clouds                     Cloudy and if TIRt > 270 K and WVt > 246 K
Semitransparent thin cirrus clouds   Cloudy and if TIRt > 270 K and WVt ≤ 246 K
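The Table 4 tests and the IMSRA rain rule described above can be sketched per pixel as follows. This is a minimal sketch, assuming brightness temperatures in kelvin and a precomputed SD. The function names `roca_class` and `imsra_rain` are hypothetical, and the code is not taken from IMSRA itself.

```python
def roca_class(tir, wv, sd):
    """Cloud class of one pixel following the Table 4 tests (all in K)."""
    if tir > 282.0 and sd <= 0.5:
        return "clear sky"
    # Every other pixel is cloudy; the remaining tests assign a cloud level.
    if tir <= 270.0:
        return "mid- to upper-level clouds"
    if wv > 246.0:
        return "low-level clouds"
    return "semitransparent thin cirrus"


def imsra_rain(tir, wv, sd):
    """Rain/no-rain decision of scenario 121: mid- to upper-level and
    low-level clouds are raining; all other pixels are no rain."""
    return roca_class(tir, wv, sd) in ("mid- to upper-level clouds", "low-level clouds")
```

Because clear sky requires TIRt > 282 K, any pixel with TIRt ≤ 270 K is necessarily cloudy, so testing the classes in table order reproduces the scheme.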
3) SELECTION OF CATEGORICAL STATISTICS

To evaluate the effectiveness of these indices, various commonly used categorical statistics, namely POD, POND, HSS, FAR, BIAS, and ACC (Ba and Gruber 2001; Haile et al. 2010; Kalinga and Gan 2010; Kuligowski 2002; Mishra et al. 2009), are considered in the present work. Because of their importance in this study, they are briefly described in this section for quick reference. The categorical statistics are defined using a standard 2 × 2 contingency table. Table 5 shows a sample contingency matrix comparing rain detection by a selected method with rain detection by the reference rainfall data (here TRMM 2A25, which is considered accurate for comparison):

• Hits h are the number of pixels classified as rainy by both the selected rain detection index (RDI) and TRMM 2A25, that is, the pixels correctly classified as rainy by the RDI.
• False alarms f are the number of pixels for which rainfall is detected by the selected RDI but not by TRMM 2A25, that is, the pixels incorrectly classified as rainy by the RDI.
• Misses m are the number of pixels for which rainfall is detected by TRMM 2A25 but not by the selected RDI, that is, the pixels incorrectly classified as nonrainy by the RDI.
• Correct negatives z are the number of pixels for which rainfall is detected by neither the selected RDI nor TRMM 2A25, that is, the pixels correctly classified as nonrainy by the RDI.

TABLE 5. Contingency matrix of forecast (any rain detection index) and observed (TRMM 2A25) data.

                          Observed: Yes (rain)   Observed: No (no rain)
Forecast: Yes (rain)      h                      f
Forecast: No (no rain)    m                      z
Based on the contingency matrix, the categorical statistics are defined as follows:

$$\mathrm{BIAS} = \frac{f + h}{m + h}, \tag{1}$$

$$\mathrm{POD} = \frac{h}{m + h}, \tag{2}$$

$$\mathrm{POND} = \frac{z}{z + f}, \tag{3}$$

$$\mathrm{ACC} = \frac{h + z}{h + m + z + f}, \tag{4}$$

$$\mathrm{FAR} = \frac{f}{f + h}, \tag{5}$$

$$\mathrm{HSS} = \frac{C - E}{N - E}, \tag{6}$$

$$E = \frac{(h + f)(h + m) + (f + z)(m + z)}{h + m + z + f}, \quad \text{and} \tag{7}$$

$$C = h + z, \tag{8}$$

where C is the number of pixels for which the selected RDI and TRMM 2A25 report the same result; E is the number of pixels (rainy and nonrainy) expected to be classified correctly by the selected RDI purely by chance; and N is the total number of pixels, that is, h + f + m + z.
• BIAS is the ratio of the number of rainy pixels detected by an RDI to the number of rainy pixels in TRMM 2A25; it indicates whether the selected RDI under- or overestimates the number of rainy pixels in TRMM 2A25. A BIAS greater than 1 implies that the RDI overestimates the number of rainy pixels, and a BIAS less than 1 implies that it underestimates them.
• POD is the ratio of the number of rainy pixels correctly detected by an RDI to the total number of rainy pixels in TRMM 2A25; it indicates the fraction of reference rainy pixels that the selected index detects correctly. A POD of 1 indicates that the selected RDI detects all the rainy pixels.
• Similarly, POND is the ratio of the number of nonrainy pixels correctly detected by an RDI to the total number of nonrainy pixels in TRMM 2A25.
• ACC is the ratio of correctly classified pixels to the total number of pixels, that is, the overall fraction of pixels classified correctly. Expressed as a percentage, it varies between 0% and 100%, and the higher the ACC, the better the RDI. ACC combines the information of POD and POND: if the RDI has both a high POD and a high POND, ACC will also be high.
• FAR is the ratio of the number of pixels incorrectly detected as rainy by an RDI to the total number of rainy pixels detected by that RDI; it indicates the fraction of the index's rain detections that are false when compared with TRMM 2A25.
• HSS values range from −1 to 1. HSS measures forecast accuracy relative to that of random chance. Some special cases are (i) HSS = 0 when h = 0 and m = 0, that is, the reference dataset (the standard rainfall product) detected no rainy pixels; (ii) HSS = 0 when h = 0 and f = 0, that is, the selected RDI detected no rainy pixels; (iii) HSS = −1 when h = 0, z = 0, and f = m, that is, the selected RDI classified no rainy or nonrainy pixel correctly, and the numbers of pixels misclassified as rainy and as nonrainy are equal; and (iv) HSS = 1 when m = 0 and f = 0, that is, the selected index misclassified no pixel.
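Equations (1)–(8) translate directly into code. The sketch below is a plain-Python illustration. `Contingency`, `categorical_stats`, and `_ratio` are hypothetical names. It also makes one assumption beyond the text: any statistic whose denominator is zero (for example POD when TRMM 2A25 has no rainy pixels) is reported as NaN rather than raising an error.

```python
import math
from dataclasses import dataclass


@dataclass
class Contingency:
    h: int  # hits: rain in both the RDI and TRMM 2A25
    f: int  # false alarms: rain in the RDI only
    m: int  # misses: rain in TRMM 2A25 only
    z: int  # correct negatives: no rain in either


def _ratio(num, den):
    """Division that returns NaN when the statistic is undefined (den == 0)."""
    return num / den if den else math.nan


def categorical_stats(c):
    """Eqs. (1)-(8): BIAS, POD, POND, ACC, FAR, and HSS from the counts."""
    n = c.h + c.f + c.m + c.z                                               # N
    chance = ((c.h + c.f) * (c.h + c.m) + (c.f + c.z) * (c.m + c.z)) / n    # E
    correct = c.h + c.z                                                     # C
    return {
        "BIAS": _ratio(c.f + c.h, c.m + c.h),
        "POD": _ratio(c.h, c.m + c.h),
        "POND": _ratio(c.z, c.z + c.f),
        "ACC": correct / n,
        "FAR": _ratio(c.f, c.f + c.h),
        "HSS": _ratio(correct - chance, n - chance),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(categorical_stats(Contingency(h=70, f=30, m=30, z=870)))
```

For the hypothetical counts above, POD = 0.7, FAR = 0.3, BIAS = 1, and HSS = 2/3. Special cases (iii) and (iv), for example `Contingency(0, 5, 5, 0)` and `Contingency(3, 0, 0, 7)`, give HSS = −1 and HSS = 1, respectively.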
Because each of the statistics above conveys different information, no single statistic should be used alone to choose the better index or the optimal threshold value, as was done by Haile et al. (2010), Kuligowski (2002), and many others. Visual interpretation of all these statistics together is difficult and leads to subjective decisions. Hence, for objective, optimized decision making, the statistical methodology TOPSIS was used; it is explained in the next section.
4) TOPSIS

TOPSIS is a method for finding the best option among all feasible alternatives (Jahanshahloo et al. 2006). According to Hsu et al. (2007), TOPSIS originates from the concept of a displaced ideal point, from which the compromise solution has the shortest distance. Alternatives are ranked so that the best alternative has the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS), or nadir. This methodology has been widely used in areas such as company financial ratio comparison (Deng et al. 2000), facility location selection (Chu 2002), gear material selection (Milani et al. 2005), manufacturing plant location analysis (Yoon and Hwang 1985), solid waste management (Cheng et al. 2002), and water management (Srdjevic et al. 2004). Considering its utility, this study applies TOPSIS to select optimal threshold values for different rain detection indices as well as the better rain area detection scenario for Indian regions.

In the present work, TOPSIS was first applied to find the optimal threshold value for each of the six selected indices given in Table 3. To illustrate the TOPSIS process, the results of the simple threshold TIRt index at representative threshold values, at an interval of 20 K, are shown in Table 6. Note that the actual analysis was carried out at a finer interval of 1 K. From Table 6 alone, it is very difficult to identify the optimal threshold value by visual interpretation. If the decision were based on FAR, the threshold would be 200 K, as the lowest FAR is considered best; POD, however, should be as high as possible, but it is lowest at 200 K. If all the statistics are considered together, visual interpretation becomes highly subjective and no unique solution is possible. Hence, to overcome this difficulty and find an optimized solution, the TOPSIS approach was used. The steps involved in TOPSIS are described below.

TABLE 6. Categorical statistics results at a few different threshold values of the TIRt index.

Threshold (K)   200     220     240     260     280     300
FAR             0.093   0.133   0.211   0.362   0.458   0.649
HSS             0.051   0.247   0.41    0.415   0.285   0.067
ACC             0.655   0.719   0.763   0.709   0.583   0.4
POD             0.047   0.269   0.507   0.776   0.933   0.999
POND            0.995   0.967   0.894   0.646   0.37    0.132
BIAS            0.055   0.301   0.636   1.213   1.732   3.152

TABLE 7. Positive and negative ideal solution for the selected categorical statistics along with their possible min and max values.

Statistic   Max value   Min value   Positive ideal (I)   Negative ideal (J)
FAR         1           0           Min                  Max
HSS         1           −1          Max                  Min
ACC         1           0           Max                  Min
POD         1           0           Max                  Min
POND        1           0           Max                  Min
BIAS        +∞          −∞          Not defined          Not defined

Step 1: Calculate the normalized decision matrix. As the input statistics may have values in different ranges, the first step is to normalize them to a common scale (between 0 and 1 for nonnegative statistics). The normalized value $n_{ij}$ is calculated as

$$n_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^{2}}}, \quad i = 1, \ldots, m \ \text{and}\ j = 1, \ldots, n, \tag{9}$$

where i indexes the threshold values (m in total); j indexes the categorical statistics (n in total); $x_{ij}$ is the value of the jth statistic at the ith threshold value; and $n_{ij}$ is its normalized value.
Step 2: Determine the positive and negative ideal solutions:

$$A^{+} = \{y_1^{+}, \ldots, y_n^{+}\} = \{(\max n_{ij} \mid i \in I) \ \text{OR}\ (\min n_{ij} \mid i \in I)\} \tag{10}$$

and

$$A^{-} = \{y_1^{-}, \ldots, y_n^{-}\} = \{(\min n_{ij} \mid i \in J) \ \text{OR}\ (\max n_{ij} \mid i \in J)\}, \tag{11}$$
where I is associated with the best-result criteria; J is associated with the worst-result criteria; $A^{+}$ and $A^{-}$ are the positive and negative ideal solutions, respectively; and $y_j^{+}$ ($y_j^{-}$) is the positive (negative) ideal value of the jth of the n categorical statistics. The positive ideal solution is the best value a statistic can attain; similarly, the negative ideal solution is the worst value it can attain. For example, in the current work, the positive ideal solution for FAR is the lowest normalized value over all threshold results ($\min n_{ij} \mid i \in I$), and the negative ideal solution is the highest normalized value ($\max n_{ij} \mid i \in J$). Table 7 gives the positive (best result) and negative (worst result) ideal solution selection criteria for each categorical statistic, along with the maximum and minimum values each statistic can take. Statistics such as BIAS cannot be included in the TOPSIS analysis: the positive ideal solution for BIAS is 1, whereas its negative ideal has two values, −∞ and +∞, so it cannot be defined as a minimum or maximum value (Table 7).

Step 3: Calculate the separation measures using the n-dimensional Euclidean distance. The separation of each alternative from the positive ideal solution is

$$d_i^{+} = \left[\sum_{j=1}^{n} \left(n_{ij} - y_j^{+}\right)^{2}\right]^{1/2}, \quad i = 1, \ldots, m. \tag{12}$$
TABLE 8. Combination of the indices (scenarios) adopted for rain detection. For example, scenario 20 is the AND combination of indices 4 and 6 given in Table 3, that is, (TIRt − TIRt+1) AND (SD). Likewise, scenario 77 is the OR combination of indices 4 and 6, that is, (TIRt − TIRt+1) OR (SD). In each column group, AND and OR give the scenario index for that logical connective, and Indices lists the basic indices combined in the scenario.

AND  OR   Indices        AND  OR   Indices        AND  OR   Indices
7    64   1, 2           26   83   1, 3, 4        45   102  1, 2, 4, 5
8    65   1, 3           27   84   1, 3, 5        46   103  1, 2, 4, 6
9    66   1, 4           28   85   1, 3, 6        47   104  1, 2, 5, 6
10   67   1, 5           29   86   1, 4, 5        48   105  1, 3, 4, 5
11   68   1, 6           30   87   1, 4, 6        49   106  1, 3, 4, 6
12   69   2, 3           31   88   1, 5, 6        50   107  1, 3, 5, 6
13   70   2, 4           32   89   2, 3, 4        51   108  1, 4, 5, 6
14   71   2, 5           33   90   2, 3, 5        52   109  2, 3, 4, 5
15   72   2, 6           34   91   2, 3, 6        53   110  2, 3, 4, 6
16   73   3, 4           35   92   2, 4, 5        54   111  2, 3, 5, 6
17   74   3, 5           36   93   2, 4, 6        55   112  2, 4, 5, 6
18   75   3, 6           37   94   2, 5, 6        56   113  3, 4, 5, 6
19   76   4, 5           38   95   3, 4, 5        57   114  1, 2, 3, 4, 5
20   77   4, 6           39   96   3, 4, 6        58   115  1, 2, 3, 4, 6
21   78   5, 6           40   97   3, 5, 6        59   116  1, 2, 3, 5, 6
22   79   1, 2, 3        41   98   4, 5, 6        60   117  1, 2, 4, 5, 6
23   80   1, 2, 4        42   99   1, 2, 3, 4     61   118  1, 3, 4, 5, 6
24   81   1, 2, 5        43   100  1, 2, 3, 5     62   119  2, 3, 4, 5, 6
25   82   1, 2, 6        44   101  1, 2, 3, 6     63   120  1, 2, 3, 4, 5, 6
Similarly, the separation from the negative ideal solution is given as

$$d_i^{-} = \left[\sum_{j=1}^{n} \left(n_{ij} - y_j^{-}\right)^2\right]^{1/2}, \quad i = 1, \ldots, m, \qquad (13)$$

where d represents the distance from the corresponding ideal solution.

Step 4: Calculate the relative closeness $R_i$ to the ideal solution. The relative closeness of the alternative $A_i$ with respect to $A^{+}$ is defined as

$$R_i = \frac{d_i^{-}}{d_i^{+} + d_i^{-}}, \quad i = 1, \ldots, m. \qquad (14)$$

Since $d_i^{-} \ge 0$ and $d_i^{+} \ge 0$, clearly $R_i \in [0, 1]$.

Step 5: Rank the preference order. The set of alternatives can now be ranked in descending order of $R_i$. The optimal threshold value of an index is the one for which the relative closeness is maximum.
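The five TOPSIS steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes vector (Euclidean) normalization for Step 1 and equal weights for all statistics, because the normalization and weighting choices are defined outside this section. The function name `topsis` is hypothetical. Each statistic is flagged either as a benefit criterion (larger is better, as for POD, POND, and HSS) or as a cost criterion (smaller is better, as for FAR).

```python
import numpy as np


def topsis(scores, benefit):
    """Relative closeness R_i of each alternative (hypothetical helper).

    scores  : (m, n) array; one row per alternative (threshold or scenario),
              one column per categorical statistic.
    benefit : length-n booleans; True where larger values are better.
    """
    x = np.asarray(scores, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)

    # Step 1 (assumed): vector normalization of each column, equal weights.
    norms = np.sqrt((x ** 2).sum(axis=0))
    norms[norms == 0] = 1.0  # leave all-zero columns unchanged
    n = x / norms

    # Step 2: positive (y+) and negative (y-) ideal value of each statistic.
    y_pos = np.where(benefit, n.max(axis=0), n.min(axis=0))
    y_neg = np.where(benefit, n.min(axis=0), n.max(axis=0))

    # Step 3: Euclidean separations, Eqs. (12) and (13).
    d_pos = np.sqrt(((n - y_pos) ** 2).sum(axis=1))
    d_neg = np.sqrt(((n - y_neg) ** 2).sum(axis=1))

    # Step 4: relative closeness, Eq. (14); 0 if all alternatives coincide.
    denom = d_pos + d_neg
    return np.divide(d_neg, denom, out=np.zeros_like(denom), where=denom > 0)


# Step 5: rank alternatives by descending R_i (hypothetical POD, POND, HSS).
scores = [[0.90, 0.90, 0.80],
          [0.50, 0.50, 0.40],
          [0.70, 0.60, 0.50]]
R = topsis(scores, benefit=[True, True, True])
ranking = np.argsort(-R)  # alternative indices, best first
```

In this toy example, alternative 0 is best on every statistic and gets R = 1. Alternative 1 is worst on every statistic and gets R = 0, so `ranking` is `[0, 2, 1]`.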
5) DEVELOPMENT OF RAIN DETECTION SCENARIOS
After finding the optimal threshold values for all six indices, as mentioned earlier, various scenarios have been developed by combining the six indices with the AND or OR logical connectives. In AND combinations, a pixel is classified as raining only if all the indices in the combination classify that pixel as raining. In OR combinations, a pixel is classified as raining if at least one of the indices in the combination classifies that pixel as raining. With the six basic indices, 328 combinations can be made using AND and OR logical connectives. To reduce the computational burden, only the scenarios that use either AND or OR connectives (but not both) have been considered in the first stage, which reduced the number of scenarios to be tested to 121. The scenarios that involve a mix of AND and OR logical connectives are considered in the second stage. Each of the 121 selected scenarios has been assigned an index (code) so that it can be easily referred to in further discussions. The first six scenarios are the basic simple threshold indices given in Table 3, scenarios 7–120 are given in Table 8, and the last scenario is the multispectral threshold classification (Table 4) developed by Roca et al. (2002), the most widely used method for Indian regions. Based on the categorical statistics for all 121 first-stage scenarios, a few indices that do not perform well have been identified. Then, during the second stage, which involves combinations using both AND and OR logical connectives, the scenarios with the nonperforming indices have been excluded. Details of the scenarios considered in the second stage are given in Table 9. With this two-stage approach, the total number of scenarios considered for the calibration exercise is 135.

TABLE 9. List of scenarios that include both AND and OR logical connectives. For example, scenario 122 is the combination of indices 1, 2, and 3 given in Table 3 connected with AND and OR logical connectives as follows: (TIRt) AND (WVt) OR (TIRt − WVt).

Scenario index   Basic indices combined   Respective logical connectives
122              1, 2, 3                  AND, OR
123              1, 2, 3                  OR, AND
124              1, 2, 4                  AND, OR
125              1, 2, 4                  OR, AND
126              1, 3, 4                  AND, OR
127              1, 3, 4                  OR, AND
128              2, 3, 4                  AND, OR
129              2, 3, 4                  OR, AND
130              1, 2, 3, 4               AND, OR, OR
131              1, 2, 3, 4               OR, AND, OR
132              1, 2, 3, 4               OR, OR, AND
133              1, 2, 3, 4               OR, AND, AND
134              1, 2, 3, 4               AND, OR, AND
135              1, 2, 3, 4               AND, AND, OR
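The stage-I numbering and the AND/OR logic described above can be sketched as follows. This is a minimal illustration built on several assumptions. The helper names are hypothetical, the Roca et al. (2002) classification (scenario 121) is only a placeholder, and the toy brightness temperatures are invented. Enumerating the 2- to 6-index subsets in lexicographic order with `itertools.combinations` yields 57 subsets, which reproduces the ordering of Table 8: AND scenarios 7–63 and OR scenarios 64–120. The demo applies scenario 8 with the thresholds TIRt < 260 K and TIRt − WVt < 19 K.

```python
from itertools import combinations

import numpy as np


def stage1_scenarios():
    """Map scenario index -> (connective, basic indices), following Table 8."""
    subsets = [c for k in range(2, 7) for c in combinations(range(1, 7), k)]
    scenarios = {i: ("SINGLE", (i,)) for i in range(1, 7)}  # Table 3 indices
    for offset, subset in enumerate(subsets):
        scenarios[7 + offset] = ("AND", subset)
        scenarios[7 + len(subsets) + offset] = ("OR", subset)
    scenarios[7 + 2 * len(subsets)] = ("ROCA", ())  # placeholder for Table 4
    return scenarios


def apply_scenario(masks, connective, subset):
    """Combine per-index rain masks (dict: index -> boolean array)."""
    stack = np.stack([masks[i] for i in subset])
    if connective == "AND":
        return stack.all(axis=0)  # rain only if every index flags rain
    if connective in ("OR", "SINGLE"):
        return stack.any(axis=0)  # rain if at least one index flags rain
    raise ValueError(f"unsupported connective: {connective}")


# Hypothetical brightness temperatures (K) for four pixels.
tir = np.array([250.0, 265.0, 255.0, 240.0])
wv = np.array([235.0, 240.0, 230.0, 225.0])
masks = {1: tir < 260.0, 3: (tir - wv) < 19.0}

scenarios = stage1_scenarios()
rain_and = apply_scenario(masks, *scenarios[8])   # scenario 8: 1 AND 3
rain_or = apply_scenario(masks, *scenarios[65])   # scenario 65: 1 OR 3
```

The OR mask is always a superset of the AND mask built from the same indices. This is why OR-based scenarios gain POD at the expense of POND, and AND-based scenarios do the opposite.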
6) SELECTION OF OPTIMAL RAIN DETECTION SCENARIO

With the optimal threshold values for all six indices, various rain detection scenarios have been constructed, as described in the previous section. The next and last step of the MIRD approach is to find the optimal rain detection scenario for the study region among all the scenarios considered. To accomplish this, the TOPSIS analysis has once again been carried out for all the scenarios, based on the independent categorical statistics identified by the sensitivity analysis (section 3b). Here, the optimal scenario is the one with the highest TOPSIS relative closeness value (RCV). Details of the various analyses carried out in the MIRD approach and the results obtained are discussed in sections 4b–e.

b. Sensitivity analysis

As rainfall is generally a rare event, the collocated dataset will be highly biased toward no-rain pixels. Every collocated dataset may also have a different size and a different proportion of rain and no-rain pixels. If the categorical statistics depend on the size and proportion of rain and no-rain pixels in the collocated dataset, then the same rain detection index may give different results for different datasets. This would make the results very difficult to interpret and may lead to wrong conclusions. Hence, it is important to understand the effect of the size and proportion of rain and no-rain pixels in the test dataset on the categorical statistics. A sensitivity analysis was therefore carried out using test datasets of varying size and proportion of rain and no-rain pixels, as given in Table 10. These five test datasets were prepared from the June 2012 collocated dataset. The results of the sensitivity analysis were used to identify the categorical statistics that are independent of the size and proportion of rain and no-rain pixels in the test dataset, and only these independent statistics were used in the calibration and validation exercises. Details of the analysis and the results obtained are discussed in section 4a.

TABLE 10. Details of test datasets used for rain detection.

             No. of pixels
Test set     Rain    No rain    Total
Set 1        1000    1000       2000
Set 2        1500    500        2000
Set 3        500     1500       2000
Set 4        2000    2000       4000
Set 5        200     3800       4000
4. Results and discussion

a. Sensitivity analysis

Before applying TOPSIS for decision making, a sensitivity analysis, as described in section 3b, was carried out to select the independent categorical statistics. Regardless of whether a statistic could be used in the TOPSIS analysis, the sensitivity analysis was carried out on all the statistics listed in section 3a(3), as these are the most widely used categorical statistics. The June 2012 dataset was used for this purpose. The full collocated dataset for June 2012 contained ~2 100 000 pixels, of which only ~100 000 were rain pixels and the remaining ~2 000 000 were no-rain pixels. Rain and no-rain pixels were identified from the rainfall rates of the collocated TRMM 2A25 pixels: if the 2A25 pixel rain rate was greater than zero, the pixel was considered a rain pixel; otherwise, it was a no-rain pixel. The influence of the proportion of rain and no-rain pixels, and of the size of the dataset, on the categorical statistics was studied by carrying out the sensitivity analysis on the test datasets (Table 10) using the simple threshold TIRt index with different threshold values. Following previous studies (Haile et al. 2010; Arkin and Meisner 1987), threshold temperatures between 200 and 300 K were used in the analysis. The categorical statistics were then computed for each threshold value (at an interval of 1 K) for all five test datasets.

FIG. 2. Plot of various categorical statistics values for five test datasets against various threshold values: (a) BIAS in log scale, (b) POD, (c) POND, (d) ACC, (e) FAR, and (f) HSS.

For illustration purposes, the general behavior of the categorical statistics for the five test datasets (Table 10) is shown in Figs. 2a–f for different representative threshold values. From Fig. 2a, it can be observed that BIAS increases as the number of rain pixels in the dataset decreases. From the formula for BIAS given in Eq. (1), if the dataset contains far fewer observed "yes" events than observed "no" events (like set 5), BIAS will usually be very high, even if only 5%–10% of the observed "no" events are forecast as "yes," because the numerator will be very large compared to the denominator. Hence, BIAS is a good measure only when the dataset contains equal numbers of rain and no-rain pixels. Moreover, BIAS does not measure how well the forecast corresponds to the observations; it only compares relative frequencies. Consider a case where h + f = h + m and h = 0. In this case BIAS equals 1, even though not a single pixel is correctly classified as raining (h = 0). This example shows that BIAS can be 1 even when no raining pixel is identified at all. Hence, BIAS, which depends on the proportion of rain and no-rain pixels, may lead to a wrong conclusion, and therefore it has not been used in the TOPSIS analysis.

From Fig. 2b, it can be observed that the POD values do not depend on the size and the proportion of rain and no-rain pixels, because the values are almost equal for all the test datasets. Equation (2) for POD contains the variables h and m, which depend only on the rain pixels in the dataset and do not vary with any change in the size and proportion of no-rain pixels in the dataset. The same holds for POND (Fig. 2c), indicating that POND [Eq. (3)] is also independent of the size and proportion of rain and no-rain pixels in the dataset. From Fig. 2c, it can also be observed that the POND value decreases as the threshold increases, which indicates that more and more no-rain pixels are incorrectly classified as rain pixels. At the same time, the POD value increases (Fig. 2b), indicating that more of the rain pixels are correctly classified. Ideally, both POD and POND should be high; however, selecting a threshold for which both are as high as possible becomes highly subjective. Hence, since POD and POND depend neither on the size of the dataset nor on the proportion of rain and no-rain pixels in it, both have been considered in the TOPSIS analysis for selecting the optimal threshold and the optimal rain detection scenario.

From Fig. 2d, it can be observed that ACC does not show a clear trend, and it is very difficult to ascertain the influence of test datasets of varying size and proportion of rain and no-rain pixels. Moreover, the formula for ACC [Eq. (4)] shows that ACC combines the information of POD and POND. Hence, to avoid redundancy, it is not included in the TOPSIS analysis. From Fig. 2e, it can be observed that as the number of raining pixels in the test dataset increases, FAR decreases (i.e., the prediction appears to improve). FAR is very sensitive to false alarms (i.e., f) and can be artificially improved by increasing the number of rain pixels or decreasing the number of no-rain pixels in the dataset. Thus, it cannot be considered a true measure of false alarms and has therefore not been considered in the TOPSIS analysis.

From Fig. 2f, it can be observed that HSS, like ACC (Fig. 2d), does not show a clear enough trend to establish its dependence on the proportion of rain and no-rain pixels in the dataset. However, Fig. 2f shows that the results for sets 1 and 4 are similar at given threshold values, indicating that HSS is independent of the size of the dataset. HSS measures the forecast accuracy relative to that of random chance and is less prone to bias when evaluating performance in rare-event situations (Todd et al. 2001). Therefore, HSS has been considered in the TOPSIS analysis, as it provides information different from that of POD and POND. Based on the above discussion, it can be said that POD, POND, and HSS are not affected by the size and the proportion of rain and no-rain pixels in the dataset.
TABLE 11. TOPSIS RCV and the ranking of TIRt index at different thresholds.

Threshold (K)   TOPSIS RCV   TOPSIS ranking
200             0.463        6
220             0.550        3
240             0.820        2
260             0.847        1
280             0.544        4
300             0.474        5
However, to further validate this, another test dataset (set 6) containing all June 2012 collocated pixels, that is, ~100 000 rain pixels and about 2 000 000 no-rain pixels (~2 100 000 pixels in total), was prepared and the sensitivity analysis was repeated. The results obtained (not shown here) were similar to those obtained earlier, which confirms that POD, POND, and HSS are not affected by the size and the proportion of rain and no-rain pixels in the dataset. The sensitivity analysis also shows that BIAS and FAR can lead to a wrong conclusion if datasets with unequal proportions of rain and no-rain pixels are used for calibration/validation. Considering these results, and the fact that a satellite image will always contain unequal proportions of rain and no-rain pixels, only the independent statistics POD, POND, and HSS, each of which provides distinct information, are considered in the TOPSIS analysis in the present work.
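The sensitivity argument can be checked with plain arithmetic on the 2 × 2 contingency table. The sketch below uses the standard definitions, with h hits, m misses, f false alarms, and c correct negatives, which we assume match the equations of section 3a(3). The class and function names are hypothetical. The sketch applies a hypothetical detector with a fixed POD of 0.8 and a fixed POND of 0.9 to the rain/no-rain counts of sets 1, 4, and 5 in Table 10.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    h: int  # hits: rain observed and detected
    m: int  # misses: rain observed but not detected
    f: int  # false alarms: no rain observed but rain detected
    c: int  # correct negatives: no rain observed or detected


def categorical_stats(t):
    """Standard categorical statistics (assumed to match section 3a(3))."""
    n = t.h + t.m + t.f + t.c
    hss_den = (t.h + t.m) * (t.m + t.c) + (t.h + t.f) * (t.f + t.c)
    return {
        "BIAS": (t.h + t.f) / (t.h + t.m),
        "POD": t.h / (t.h + t.m),
        "POND": t.c / (t.c + t.f),
        "ACC": (t.h + t.c) / n,
        "FAR": t.f / (t.h + t.f) if t.h + t.f else float("nan"),
        "HSS": 2 * (t.h * t.c - t.f * t.m) / hss_den if hss_den else float("nan"),
    }


def fixed_detector(n_rain, n_norain, pod=0.8, pond=0.9):
    """Contingency table of a hypothetical detector with fixed POD and POND."""
    h = round(pod * n_rain)
    c = round(pond * n_norain)
    return Contingency(h=h, m=n_rain - h, f=n_norain - c, c=c)


# Rain / no-rain pixel counts of sets 1, 4, and 5 (Table 10).
test_sets = {1: (1000, 1000), 4: (2000, 2000), 5: (200, 3800)}
results = {k: categorical_stats(fixed_detector(*v)) for k, v in test_sets.items()}

# BIAS = 1 although no rain pixel is detected (h = 0, f = m).
blind = categorical_stats(Contingency(h=0, m=50, f=50, c=900))
```

POD (0.8) and POND (0.9) are identical for all three sets by construction. By contrast, BIAS rises from 0.9 (sets 1 and 4) to 2.7 (set 5), and FAR rises from about 0.11 to about 0.70. HSS is 0.7 for both set 1 and set 4, which differ only in size. The `blind` case reproduces the BIAS = 1, h = 0 example from the text.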
b. Selection of optimal threshold values

The entire June 2012 collocated dataset has been used as the calibration dataset. The TOPSIS analysis has been carried out based on the POD, POND, and HSS statistics to estimate the optimal threshold values for all six indices. For illustration purposes, the TOPSIS relative closeness values obtained for the simple threshold TIRt index at representative threshold values are given in Table 11; the actual analysis was carried out at a 1-K interval. The highest TOPSIS relative closeness value was obtained for 260 K; hence, 260 K is taken as the optimal threshold for the TIRt index. In a similar manner, optimal threshold values (Table 12) have been obtained for the other five indices as the thresholds with the highest relative closeness, that is, jointly high POD, POND, and HSS.
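The threshold search can be sketched as a sweep over 200–300 K at the 1-K interval used in the text. Each candidate rule TIRt < T is scored by POD, POND, and HSS, and the threshold with the highest relative closeness is kept. The sketch is illustrative only. It re-implements TOPSIS compactly under assumed settings: vector normalization, equal weights, and all three statistics treated as larger-is-better. The helper names are hypothetical, and the toy data are synthetic: rain pixels lie at 210–259 K and no-rain pixels at 260–300 K, so the expected optimum is 260 K.

```python
import numpy as np


def pod_pond_hss(rain_obs, rain_det):
    """POD, POND, and HSS of a detection mask (both classes assumed present)."""
    h = np.sum(rain_obs & rain_det)
    m = np.sum(rain_obs & ~rain_det)
    f = np.sum(~rain_obs & rain_det)
    c = np.sum(~rain_obs & ~rain_det)
    den = (h + m) * (m + c) + (h + f) * (f + c)
    hss = 2.0 * (h * c - f * m) / den if den else 0.0
    return h / (h + m), c / (c + f), hss


def closeness(scores):
    """TOPSIS relative closeness with every column treated as larger-is-better."""
    x = np.asarray(scores, dtype=float)
    n = x / np.sqrt((x ** 2).sum(axis=0))
    d_pos = np.sqrt(((n - n.max(axis=0)) ** 2).sum(axis=1))
    d_neg = np.sqrt(((n - n.min(axis=0)) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)


def optimal_threshold(tir, rain_obs, thresholds):
    """Threshold T (rain where TIR < T) with the highest relative closeness."""
    scores = [pod_pond_hss(rain_obs, tir < t) for t in thresholds]
    r = closeness(scores)
    return thresholds[int(np.argmax(r))], r


# Synthetic calibration data: 50 rain pixels, 41 no-rain pixels.
tir = np.concatenate([np.arange(210, 260), np.arange(260, 301)]).astype(float)
rain_obs = np.concatenate([np.ones(50, dtype=bool), np.zeros(41, dtype=bool)])
thresholds = np.arange(200, 301)  # 1-K interval
best, rcv = optimal_threshold(tir, rain_obs, thresholds)
```

With these perfectly separable toy data, T = 260 K yields POD = POND = HSS = 1. It therefore coincides with the positive ideal solution and obtains a relative closeness of 1. Real calibration data would not be separable, so the maximum would be below 1, as in Table 11.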
c. Development of new scenarios based on the performance evaluation of the 121 developed rain detection scenarios

In this section, the categorical statistics obtained from the calibration dataset for the 121 scenarios are examined in detail. Based on this examination, new scenarios that use both AND and OR logical connectives have been developed from the selected indices.

TABLE 12. Optimal threshold values obtained for the selected indices.

Serial No.   Index             Threshold value (K)
1            TIRt              260
2            WVt               237
3            TIRt − WVt        19
4            TIRt − TIRt+1     −27
5            TIRt − mean5      −28
6            SD                2.5
1) STAGE I: ANALYSIS OF THE SCENARIOS WITH AND OR OR LOGICAL CONNECTIVES (121 SCENARIOS)

In this section, the independent categorical statistics have been worked out for all 121 scenarios and their performances compared. Figures 3a–c show the categorical statistics obtained for the 121 scenarios, with the scenarios arranged in descending order of POD for easier interpretation. Specifically, Fig. 3a shows the results of the AND-based scenarios, Fig. 3b shows the results of the OR-based scenarios along with scenario 121, and Fig. 3c shows the results of a few better-performing scenarios (based on POD and POND) from all the considered scenarios. Figure 3a shows that many of the AND-based scenarios detect only a few rain pixels (POD ≪ 1). Figure 3b shows that most of the OR-based scenarios have a high POD but a low POND, indicating that most of the rain pixels are identified at the cost of falsely detecting no-rain pixels as rain. In an OR scenario, a pixel is classified as raining if any index in the combination classifies it as raining. Hence, it is logical that OR-based scenarios tend to overestimate rain pixels by misclassifying no-rain pixels, and that AND-based scenarios tend to do the reverse. POD and POND are thus inversely related, as can be confirmed by closely comparing Figs. 3a and 3b. As mentioned earlier, high POD and POND values are a necessary but not a sufficient criterion for an optimal scenario. Figure 3c shows scenarios for which both POD and POND are considerably high. However, it is very difficult to determine the best-performing scenario by visual interpretation of Fig. 3c, and doing so may lead to subjective decision making; hence, the TOPSIS analysis is necessary for selecting the optimal scenario. From Figs. 3a and 3b, it can be observed that the Roca classification (scenario 121) and the AND-based scenarios with SD (e.g., scenarios 53, 54, and 58) did not perform well. The effect of the individual indices combined in a scenario is most evident in the AND-based scenarios, because with the AND operator each individual index's performance directly constrains the scenario's results. Hence, the poor performance of the AND-based scenarios with SD, and of the Roca method, can be attributed to the poor performance of the SD index. It can also be observed that there is little improvement in results for scenarios with more than three indices, suggesting that combining a large number of indices does not necessarily produce better results.
2) STAGE II: ANALYSIS FOR SELECTION OF THE SCENARIOS WITH BOTH AND AND OR LOGICAL CONNECTIVES
Based on the results obtained from the analysis of the scenarios with AND or OR logical connectives [section 4c(1)], it can be observed that the scenarios with SD did not perform well. SD is a texture-based index that depends on the spatial distribution of CTBT over the surrounding pixels. However, the scenarios with the TIR − mean5 index, another texture-based index, seem to perform better than the scenarios with SD (Fig. 3). Hence, to check whether this behavior of the scenarios with texture-based indices is consistent, the texture-based indices were further evaluated for the other three monsoon months of 2012, that is, July, August, and September. The datasets for these three months were prepared (i.e., collocated) in the same manner as the June 2012 dataset. Figures 4a–c show the results for only the AND-based scenarios with the TIR − mean5 and SD indices, so that the effect of these indices on the scenarios' performance can be clearly seen. From the categorical statistics shown in Figs. 4a–c, it can be observed that the scenarios that include the TIR − mean5 index (e.g., scenarios 10, 14, 33, and 63) do not detect any raining pixels in July and September and detect only very few in August. Likewise, the scenarios with the SD index (e.g., scenarios 15, 18, 20, and 36) detect only very few raining pixels during these months. Hence, it can be concluded that the texture-based indices do not perform well. Perhaps because of the coarse resolution (8 km × 8 km) of the Kalpana-1 satellite data used in the analysis, the SD and TIR − mean5 indices could not capture the textural information properly, which leads to the poor performance of the scenarios in which they are included. For example, the mean5 index considers the surrounding 5 × 5 pixels, that is, an area of about 40 km × 40 km. This is a large area, over which it is practically impossible for a single cloud to spread. Hence, such an index may provide useful information only when fine-resolution data are available. The poor performance of scenario 121 may have the same cause, as it uses SD for rain pixel detection. Considering these results, certain combinations of indices have been excluded when developing the scenarios that include both AND and OR logical connectives: because of the poor performance of the texture-based indices, the scenarios that include the TIR − mean5 and SD indices are ignored. Table 9 gives the list of scenarios with both AND and OR connectives considered for the selection of the optimal scenario.

FIG. 3. Independent categorical statistics results for the calibration dataset (June 2012) for (a) AND-based scenarios, (b) OR-based scenarios along with scenario 121, and (c) a few better-performing scenarios based on POD and POND among all 121 scenarios.
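The footprint argument can be made concrete with a short sketch of the TIRt − mean5 index. Here it is computed as each pixel's brightness temperature minus the mean over the 5 × 5 window centred on that pixel. The edge padding at image borders and the helper names are assumptions for illustration, and `sliding_window_view` requires NumPy 1.20 or later. With 8-km Kalpana-1 pixels, the window spans 5 × 8 km = 40 km on a side.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def tir_minus_mean(tir, size=5):
    """TIRt minus the mean of the size x size window centred on each pixel.

    Border pixels use edge padding (an assumption; the text does not say
    how image edges are handled).
    """
    pad = size // 2
    padded = np.pad(tir, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return tir - windows.mean(axis=(-2, -1))


def window_span_km(size=5, pixel_km=8.0):
    """Side length of the averaging window on the ground."""
    return size * pixel_km


# A single cold pixel (200 K) in an otherwise uniform 250 K field.
field = np.full((5, 5), 250.0)
field[2, 2] = 200.0
index = tir_minus_mean(field)
```

For the centre pixel, the window mean is (24 × 250 + 200)/25 = 248 K, so the index there is −48 K.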
d. Selection of optimal scenario

After finding the optimal threshold values for all the indices, the next task is to find the optimal scenario for Indian conditions. To select the most suitable (optimal) scenario among the 135 scenarios (Tables 3, 4, 8, and 9), the TOPSIS analysis was carried out based on the independent categorical statistics POD, POND, and HSS, and the optimal scenario was identified from the TOPSIS relative closeness value. Table 13 gives the top five scenarios obtained for the calibration dataset along with their TOPSIS relative closeness values. Scenario 8 (TIRt < 260 K and TIRt − WVt < 19 K), which has the highest TOPSIS relative closeness value of all the scenarios, has been taken as the optimal scenario for rain detection.

FIG. 4. Independent categorical statistic results for the AND-based scenarios with the SD and TIR − mean5 indices for (a) July, (b) August, and (c) September 2012.
e. Validation of the optimal scenario for independent datasets

To confirm the performance of scenario 8 relative to the other 134 scenarios, 15 different datasets from four southwest monsoon seasons (between 2010 and 2013) were used. For this purpose, the TOPSIS analysis was again carried out for all 135 scenarios using these 15 validation datasets. The top-ranked scenarios, the period covered by each dataset, and the corresponding categorical statistics are given in Table 2. Even though rainfall events are highly uncertain, scenario 8 performs best in 10 of the 15 cases. Because of uncertainties in the reference dataset (TRMM 2A25) and in the satellite (Kalpana-1) images, scenario 8 did not rank first in the remaining five cases (3, 8, 11, 14, and 15), but its performance is close to that of the top scenario in each of these cases (Table 14). Hence, based on the TOPSIS results (Table 2), it can be confirmed that scenario 8 is an optimal scenario for rain detection over Indian regions.
f. Validation of the optimal scenario using AWS hourly gauge data

Considering the poor performance of the scenarios listed in Table 9, only the scenarios listed in Tables 3, 4, and 8 were considered for the validation with AWS datasets. For this validation, AWS-based ground truth rainfall information for June 2012 was used. Gauge data are point data with accumulated hourly rain rates. To compare the point data with the pixel data, a pixel was considered collocated with a rain gauge station only if the station lies within that pixel. This assumption was made because the rain gauges are not uniformly distributed over the study area and, most of the time, data from many of the stations were missing. So instead of interpolating the gauge data, a stationwise comparison similar to that of Nair et al. (2009) was made to collocate the Kalpana-1 images with the gauge data. To temporally collocate the hourly gauge data with the half-hourly Kalpana-1 images, three consecutive Kalpana-1 rain/no-rain classified images of each scenario were overlaid. In the overlay, if a pixel shows rain in any one of the three images, the collocated pixel for that scenario is considered raining. After analyzing the entire June 2012 satellite data with the gauge data, a total of ~28 000 collocated pixels were obtained, of which ~1200 were rain pixels and the rest were no-rain pixels. Using this collocated dataset, the independent categorical statistics POD, POND, and HSS were estimated for each scenario. The TOPSIS analysis was then carried out to determine the top five scenarios (Table 15). Validation using the gauge data also shows that scenario 8 performs well, consistent with the results obtained from the comparison with TRMM 2A25 data.

TABLE 13. Top five scenarios obtained for June 2012.

Scenario index   TOPSIS RCV   TOPSIS ranking
8                0.944        1
3                0.931        2
1                0.928        3
16               0.926        4
7                0.915        5

TABLE 14. TOPSIS RCV of scenario 8 and other top scenarios for a few cases where scenario 8 is not ranked first.

Case study   Scenario index   TOPSIS RCV
3            81               0.860
             8                0.848
8            1                0.902
             8                0.902
12           65               0.948
             8                0.948
15           1                0.954
             8                0.949
16           1                0.943
             8                0.937

TABLE 15. Top five scenarios by comparing with AWS gauge data for June 2012 dataset.

Ranking   Scenario index
1         8
2         1
3         3
4         7
5         2
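The spatial and temporal collocation rules described above can be sketched as follows. This is a hypothetical illustration, not the authors' code. It assumes a regular, north-up latitude/longitude grid described by its top-left corner and pixel spacing, because the actual Kalpana-1 geolocation is not described in this section. It also treats any positive hourly accumulation as gauge rain and skips stations with missing data. The grid values in the demo are a toy choice, not Kalpana-1's 8-km grid.

```python
import math

import numpy as np


def hourly_rain_mask(half_hourly_masks):
    """OR-overlay of three consecutive half-hourly rain/no-rain images."""
    stack = np.stack(half_hourly_masks)
    if stack.shape[0] != 3:
        raise ValueError("expected three consecutive images")
    return stack.any(axis=0)  # raining if any of the three images shows rain


def station_pixel(lat, lon, lat0, lon0, dlat, dlon):
    """Row and column of the grid pixel that contains a station."""
    return math.floor((lat0 - lat) / dlat), math.floor((lon - lon0) / dlon)


def collocate(stations, gauge_mm, masks3, grid):
    """Pair gauge rain/no-rain with the satellite pixel containing each station."""
    hourly = hourly_rain_mask(masks3)
    obs, det = [], []
    for (lat, lon), rain_mm in zip(stations, gauge_mm):
        if rain_mm is None or math.isnan(rain_mm):
            continue  # missing station data are skipped, not interpolated
        row, col = station_pixel(lat, lon, *grid)
        if 0 <= row < hourly.shape[0] and 0 <= col < hourly.shape[1]:
            obs.append(rain_mm > 0.0)
            det.append(bool(hourly[row, col]))
    return np.array(obs, dtype=bool), np.array(det, dtype=bool)


# Toy 4 x 4 grid: top-left corner (20.0 N, 72.0 E), 0.25-degree pixels.
grid = (20.0, 72.0, 0.25, 0.25)
masks3 = [np.zeros((4, 4), dtype=bool) for _ in range(3)]
masks3[1][3, 3] = True  # rain only in the second half-hourly image
stations = [(19.1, 72.9), (19.9, 72.1), (19.5, 72.5)]
gauge_mm = [2.0, 0.0, float("nan")]
obs, det = collocate(stations, gauge_mm, masks3, grid)
```

Here the first station falls in pixel (3, 3). Rain appears there in only one of the three half-hourly images, yet the station is still counted as a hit. The third station is dropped because its gauge value is missing.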
5. Summary and conclusions

This article presents a new approach called Multi-Index Rain Detection (MIRD) for regional rain area detection using satellite data. The approach has been tested for Indian land regions using Kalpana-1 satellite data by evaluating six commonly used rain detection indices. The algorithm was built on the following hypothesis: better results should be obtained for combined indices than for an individual index. In total, 328 different scenarios can be formed by combining the six basic indices using AND and OR logical connectives, of which 135 were evaluated through a two-stage approach. The optimal threshold value for each basic index and the optimal rain detection scenario were found through TOPSIS analysis based on the independent categorical statistics POD, POND, and HSS. These independent categorical statistics were identified through a sensitivity analysis. Its results show that statistics such as BIAS and FAR lead to biased or wrong conclusions if the dataset contains unequal proportions of rain and no-rain pixels; hence, they are not used in the analysis. The proposed MIRD approach has been evaluated across Indian land regions using different datasets. Based on the validation results obtained using TRMM 2A25 rainfall data and AWS rain gauge data, scenario 8 (TIRt < 260 K and TIRt − WVt < 19 K) performs better than the other scenarios over Indian land regions. It is also found that the scenarios with texture-based indices (SD and TIR − mean5) did not perform well, perhaps because of the coarse resolution of Kalpana-1 data. Only a few commonly used indices (Table 3) that depend on single TIR- and WV-band data were used in this study, because Kalpana-1 imagery has only single TIR and WV bands. If high-resolution geostationary satellite data with more bands are available, as from INSAT-3D or Meteosat-10, the MIRD approach could be further improved by using a greater number of multispectral and texture-based indices. 
Finally, the proposed algorithm (MIRD) could be used in operational satellite-based rainfall rate retrieval algorithms to obtain a more accurate spatiotemporal distribution of rain and no-rain areas, a crucial initial step in satellite rainfall rate retrieval.

Acknowledgments. The authors thank all three reviewers for their insightful comments, which made us rethink many aspects of the manuscript and present them in a better way. We would also like to thank the TRMM science team and the Meteorological and Oceanographic Satellite Data Archival Centre (MOSDAC) team for providing the satellite data products and ground truth data free of cost through their data repositories. Finally, we thank the AMS Full Waiver Committee for considering our article for full waiver of publication charges.
REFERENCES

Adler, R. F., A. J. Negri, P. R. Keehn, and I. M. Hakkarinen, 1993: Estimation of monthly rainfall over Japan and surrounding waters from a combination of low-orbit microwave and geosynchronous IR data. J. Appl. Meteor., 32, 335–356, doi:10.1175/1520-0450(1993)032<0335:EOMROJ>2.0.CO;2.
Adler, R. F., G. J. Huffman, D. T. Bolvin, S. Curtis, and E. J. Nelkin, 2000: Tropical rainfall distributions determined using TRMM combined with other satellite and rain gauge information. J. Appl. Meteor., 39, 2007–2023, doi:10.1175/1520-0450(2001)040<2007:TRDDUT>2.0.CO;2.
Arkin, P. A., and B. Meisner, 1987: The relationship between large-scale convective rainfall and cold cloud over the Western Hemisphere during 1982–1984. Mon. Wea. Rev., 115, 51–74, doi:10.1175/1520-0493(1987)115<0051:TRBLSC>2.0.CO;2.
Ba, M., and A. Gruber, 2001: GOES Multispectral Rainfall Algorithm (GMSRA). J. Appl. Meteor., 40, 1500–1514, doi:10.1175/1520-0450(2001)040<1500:GMRAG>2.0.CO;2.
Behrangi, A., K. L. Hsu, B. Imam, S. Sorooshian, and R. Kuligowski, 2009: Evaluating the utility of multispectral information in delineating the areal extent of precipitation. J. Hydrometeor., 10, 684–700, doi:10.1175/2009JHM1077.1.
Cheng, S., C. W. Chan, and G. H. Huang, 2002: Using multiple criteria decision analysis for supporting decision of solid waste management. J. Environ. Sci. Health, Part A: Environ. Sci. Eng., 37, 975–990, doi:10.1081/ESE-120004517.
Chu, T. C., 2002: Facility location selection using fuzzy TOPSIS under group decision. Int. J. Uncertainty Fuzziness Knowl. Based Syst., 10, 687–701, doi:10.1142/S0218488502001739.
Deng, H., C. H. Yeh, and R. J. Willis, 2000: Inter-company comparison using modified TOPSIS with objective weights. Comput. Oper. Res., 27, 963–973, doi:10.1016/S0305-0548(99)00069-6.
Haile, A. T., T. H. Rientjes, A. Gieske, and M. Gebremichael, 2010: Rainfall estimation at the source of the Blue Nile: A multispectral remote sensing approach. Int. J. Appl. Earth Obs. Geoinf., 12, S76–S83, doi:10.1016/j.jag.2009.09.001.
Holl, G., S. A. Buehler, B. Rydberg, and C. Jimenez, 2010: Collocating satellite-based radar and radiometer measurements – Methodology and usage examples. Atmos. Meas. Tech., 3, 693–708, doi:10.5194/amt-3-693-2010.
Hsu, K., X. Gao, S. Sorooshian, and H. V. Gupta, 1997: Precipitation estimation from remotely sensed information using artificial neural networks. J. Appl. Meteor., 36, 1176–1190, doi:10.1175/1520-0450(1997)036<1176:PEFRSI>2.0.CO;2.
Hsu, S.-S., J.-S. Huan, and S. E. Lee, 2007: An extension of TOPSIS for group decision making. Math. Comput. Modell., 45, 801–813, doi:10.1016/j.mcm.2006.03.023.
Huffman, G. J., and Coauthors, 2007: The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales. J. Hydrometeor., 8, 38–55, doi:10.1175/JHM560.1.
Iguchi, T., T. Kozu, R. Meneghini, J. Awaka, and K. Okamoto, 2000: Rain-profiling algorithm for the TRMM precipitation radar. J. Appl. Meteor., 39, 2038–2052, doi:10.1175/1520-0450(2001)040<2038:RPAFTT>2.0.CO;2.
Jahanshahloo, G. R., F. H. Lotfi, and M. Izadikhah, 2006: An algorithmic method to extend TOPSIS for decision-making problems with interval data. Appl. Math. Comput., 175, 1375–1384, doi:10.1016/j.amc.2005.08.048.
Joyce, R., J. E. Janowiak, P. A. Arkin, and P. Xie, 2004: CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeor., 5, 487–503, doi:10.1175/1525-7541(2004)005<0487:CAMTPG>2.0.CO;2.
Kalinga, A. K., and T. Y. Gan, 2010: Estimation of rainfall from infrared-microwave satellite data for basin-scale hydrologic modelling. Hydrol. Processes, 24, 2068–2086, doi:10.1002/hyp.7626.
Kühnlein, M., T. Appelhans, B. Thies, and T. Nauss, 2014: Improving the accuracy of rainfall rates from optical satellite sensors with machine learning – A random forests-based approach applied to MSG SEVIRI. Remote Sens. Environ., 141, 129–143, doi:10.1016/j.rse.2013.10.026.
Kuligowski, R. J., 2002: A self-calibrating real-time GOES rainfall algorithm for short-term rainfall estimates. J. Hydrometeor., 3, 112–130, doi:10.1175/1525-7541(2002)003<0112:ASCRTG>2.0.CO;2.
Meneghini, R., T. Iguchi, T. Kozu, L. Liao, K. Okamoto, J. Jones, and J. Kwiatowski, 2000: Use of the surface reference technique for path attenuation estimates from the TRMM Precipitation Radar. J. Appl. Meteor., 39, 2053–2070, doi:10.1175/1520-0450(2001)040<2053:UOTSRT>2.0.CO;2.
Milani, A. S., A. Shanian, and R. Madoliat, 2005: The effect of normalization norms in multiple attribute decision making models: A case study in gear material selection. Struct. Multidiscip. Optim., 29, 312–318, doi:10.1007/s00158-004-0473-1.
Mishra, A., R. M. Gairola, A. K. Varma, and V. K. Agarwal, 2009: Study of intense rainfall events over India using Kalpana-IR and TRMM precipitation radar observations. Int. J. Curr. Sci., 97, 689–695.
Mishra, A., R. M. Gairola, A. K. Varma, and V. K. Agarwal, 2010: Remote sensing of precipitation over Indian land and oceanic regions by synergistic use of multi-satellite sensors. J. Geophys. Res., 115, D08106, doi:10.1029/2009JD012157.
Nair, S., G. Srinivasan, and R. Nemani, 2009: Evaluation of multisatellite TRMM derived rainfall estimates over a western state of India. J. Meteor. Soc. Japan, 87, 927–939, doi:10.2151/jmsj.87.927.
Prakash, S., C. Mahesh, R. M. Gairola, and P. K. Pal, 2010: Estimation of Indian summer monsoon rainfall using Kalpana-1 VHRR data and its validation using rain gauge and GPCP data. Meteor. Atmos. Phys., 110, 45–57, doi:10.1007/s00703-010-0106-8.
Prakash, S., V. Sathiyamoorthy, C. Mahesh, and R. M. Gairola, 2014: An evaluation of high-resolution multisatellite rainfall products over the Indian monsoon region. Int. J. Remote Sens., 35, 3018–3035, doi:10.1080/01431161.2014.894661.
Roca, R., M. Viollier, L. Picon, and M. Desbois, 2002: A multisatellite analysis of deep convection and its moist environment over the Indian Ocean during the winter monsoon. J. Geophys. Res., 107, 8012, doi:10.1029/2000JD000040.
Srdjevic, B., Y. D. P. Medeiros, and A. S. Faria, 2004: An objective multi-criteria evaluation of water management scenarios. Water Resour. Manage., 18, 35–54, doi:10.1023/B:WARM.0000015348.88832.52.
Thies, B., T. Nauss, and J. Bendix, 2008: Discriminating raining from non-raining cloud areas at mid-latitudes using Meteosat Second Generation SEVIRI nighttime data. Meteor. Appl., 15, 219–230, doi:10.1002/met.56.
Todd, M. C., C. Kidd, D. Kniveton, and T. J. Bellerby, 2001: A combined satellite infrared and passive microwave technique
VOLUME 15
for estimation of small-scale rainfall. J. Atmos. Oceanic Technol., 18, 742–755, doi:10.1175/1520-0469(2001)058,0742: ACSIAP.2.0.CO;2. Vicente, G. A., R. A. Scofield, and W. P. Mensel, 1998: The operational GOES infrared rainfall estimation technique. Bull. Amer. Meteor. Soc., 79, 1883–1898, doi:10.1175/ 1520-0477(1998)079,1883:TOGIRE.2.0.CO;2. Yoon, K., and C. L. Hwang, 1985: Manufacturing plant location analysis by multiple attribute decision making: Part I—Singleplant strategy. Int. J. Prod. Res., 23, 345–359, doi:10.1080/ 00207548508904712.