hydrologic regionalization using wavelet based ...

HYDROLOGIC REGIONALIZATION USING WAVELET BASED MULTISCALE ENTROPY TECHNIQUE Submitted By: ANKIT AGARWAL 2013CEW2204 Department Of Civil Engineering Submitted in fulfillment of the requirements for the degree of MASTER OF TECHNOLOGY in WATER RESOURCES ENGINEERING Under the Guidance of Dr. R. MAHESWARAN, Prof. C. BERNHOFER & Prof. R. KHOSA

To the DEPARTMENT OF CIVIL ENGINEERING INDIAN INSTITUTE OF TECHNOLOGY, DELHI June, 2015

CERTIFICATE

It is hereby certified that the work being submitted by Mr. Ankit Agarwal in the report titled "Hydrologic regionalization using wavelet based multiscale entropy technique", in partial fulfillment of the requirements for the award of the degree of Master of Technology in Water Resources Engineering, is an authentic record of bona fide work carried out by him from July, 2014 to June, 2015 at the Department of Hydrology and Meteorology, Technical University-Dresden, Germany and the Department of Civil Engineering, Indian Institute of Technology-Delhi under our supervision and guidance.

It is certified that this report is an original product and has not been submitted in part or in full to any other institute or university.

Dr. R Maheswaran Inspire Faculty, Department of Civil Engineering, Indian Institute of Technology, Delhi

Prof. Rakesh Khosa Program coordinator Department of Civil Engineering, Indian Institute of Technology, Delhi

Prof. C. Bernhofer Chair of Meteorology Department of hydrology and meteorology Technical University-Dresden, Germany

Prof. M.F. Buchroithner Director Institute for Cartography Technical University-Dresden, Germany


ACKNOWLEDGEMENTS

First and foremost, praises and thanks to God, the Almighty, for his showers of blessings throughout my research work, which allowed me to complete the research successfully within the stipulated time.

I would like to express my deep gratitude to my esteemed project guides Prof. RAKESH KHOSA (Program Coordinator) and Dr. R MAHESWARAN (Inspire Faculty), Department of Civil Engineering, Indian Institute of Technology-Delhi, India, and Prof. CHRISTIAN BERNHOFER (Chair of Meteorology), Department of Hydrology and Meteorology, Technical University-Dresden, Germany, for giving me the opportunity to work under their supervision and guidance. They have always been my motivation for carrying out the project. Their constant encouragement at every step was a precious asset to me during my work.

I express my heartiest thanks to Prof. B. SIVAKUMAR, the University of New South Wales, Sydney, Australia, and Prof. MANFRED F. BUCHROITHNER (Director), Institute for Cartography, Technical University-Dresden, Germany, for their expert and kind guidance of my thesis work. My work would have been directionless without their encouragement and constant, patient guidance.

I express my deep appreciation and sincere thanks to Vinit Sehgal (Project Associate) and Sankalp Lahari (Research Associate) for providing all possible support, guidance, help and encouragement during my project work. I also extend my thanks to all the research scholars and my fellow classmates, especially Anchit Lakhanpal, Durga Prasad Panday and Poornima Mahalawat, for their immense cooperation and help whenever I needed it.

I wish to give special thanks to Thomas Gruenwald (Post-Doc) and Christian Engelman (M.Sc.), Technical University-Dresden, Germany, for their direct and indirect contributions to developing the project.


I am grateful to the staff members of the Department of Civil Engineering, Indian Institute of Technology, especially Mr. Rajveer Agarwal, Mr. Neeraj Gehlot and Mr. Tikaram, and of Technical University-Dresden, Germany, for providing me with all the facilities required for the project work.

An assemblage of this nature could never have been attempted without reference to and inspiration from the works of others, whose details are given in the reference section. I acknowledge my indebtedness to all of them. I am also greatly indebted to Deutscher Akademischer Austausch Dienst (DAAD) and MHRD, Govt. of India, without whose generous funding and support the project would not have been possible.

Last but not least, I am greatly indebted to my parents Mr. Rajesh Agarwal and Mrs. Rama Agarwal and my uncle Mr. Naresh Agarwal, who brought me up to this position, inspiring and supporting my pursuits, and who are always my guides in stepping ahead. I also thank all my family members, whose contribution to whatever I have achieved to date is beyond expression.

I would like to thank everybody who was important to the successful realization of my thesis, as well as expressing my apology that I could not mention them individually.

JUNE, 2015

ANKIT AGARWAL


MY FAMILY My Dear Son,

The way you think and the decisions you make will come with more confidence and nothing will ever be too hard to tackle. Always approach problems with an open heart and mind, embrace all things new – this will further enhance your knowledge and strengthen your capabilities in life. To be a leader you need courage and you have already demonstrated that. In whatever role you choose, you can always count on your family for love, guidance and support. Take these and any encouragement that come your way and put them to good use. We know you can do it – we believe in you but most importantly, you have to believe in yourself. You are a beautiful soul, and we are so proud and blessed to have you as our son and as our friend forever.

God bless you!!


ABSTRACT

Catchment regionalization is an important step in estimating hydrologic parameters of ungauged basins. This thesis proposes a hybrid approach for clustering hydrologic catchments that combines a wavelet-based multiscale entropy method with K-means clustering. A multi-resolution wavelet transform of a time series reveals structure that is often obscured in streamflow records, by permitting gross and small features of a signal to be separated. Wavelet-based Multiscale Entropy (WME) is a measure of the randomness of a given time series at different timescales. In this study, streamflow records observed during 1951–2002 at 530 selected catchments throughout the United States are used to test the proposed regionalization framework. Further, based on the pattern of entropy across multiple scales, each cluster is given an entropy signature, which provides an approximation of the entropy pattern of the streamflow data in that cluster.

In the second part of the thesis, a case study on the western United States is presented using the wavelet power spectrum technique coupled with a clustering technique. For this objective, streamflow records observed during 1951–2002 at 117 selected catchments in the western United States are used. Based on the wavelet power spectrum and the global wavelet spectrum, each streamflow station is assigned to one of 4 clusters. The homogeneity test and the discordancy measure indicate that the proposed approach works well for regionalization.

Keywords: Hydrologic regionalization, Ungauged catchments, Wavelet Transform, K-means clustering, Global wavelet spectrum, Multiscale entropy.


TABLE OF CONTENTS

CERTIFICATE
ACKNOWLEDGMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

1.1 General
1.2 Motivation of Study
1.3 Objective
1.4 Contribution
1.5 Outline of Thesis

2.1 Regionalization Studies
2.2 Time-Frequency Analysis
    2.2.1 Short Term Fourier Transform
    2.2.2 Wavelet Studies
2.3 Entropy Studies
    2.3.1 Concept of Entropy
2.4 Clustering Technique
    2.4.1 Hierarchical Clustering
    2.4.2 Partitioning Clustering
2.5 Cluster Validity Indices
    2.5.1 External Validity Indices
    2.5.2 Internal Validity Indices
2.6 Principal Component Analysis
2.7 Regional Homogeneity test
2.8 Discordancy Measure
2.9 Conclusion

4.1 Catchment Signature
4.2 Study region and data explanation
4.3 Methodology I
    4.3.1 Wavelet power spectrum
    4.3.2 K-means clustering
4.4 Results
4.5 Discussion
4.6 Conclusion

5.1 Study region and data explanation
5.2 Methodology II
    5.2.1 Multiscale Entropy
    5.2.2 Comparison of WME and GWS coefficient
    5.2.3 K-Mean clustering
5.3 Results
5.4 Discussion

LIST OF FIGURES

Figure 2.1 Discrete Wavelet Decomposition
Figure 3.1 USA stream flow station
Figure 4.1 Map of the western USA and location of 117 stream flow station selected for study
Figure 4.2 Schematic for methodology implemented in the study
Figure 4.3 Wavelet power spectrum and Global power spectrum of stream flow time series from a sample station
Figure 4.4 Variation of RSS_min with number of cluster
Figure 4.5 Figure showing Stream flow station lying in particular cluster
Figure 4.6 Average silhouette value against number of cluster
Figure 4.7 Plot shows the quality of cluster
Figure 4.8 Number of heterogeneous stations in the respective cluster as a result of regional homogeneity test
Figure 4.9 Wavelet power spectrum for sample station in cluster 1
Figure 4.10 Wavelet power spectrum for sample station in cluster 2
Figure 4.11 Wavelet power spectrum for sample station in cluster 3
Figure 4.12 Wavelet power spectrum for sample station in cluster 4
Figure 4.13 Box plot for drainage area of stream flow station in each cluster
Figure 4.14 WPS of sample stations selected for explanation belonging to cluster 1
Figure 4.15 WPS of sample stations selected for explanation belonging to cluster 2
Figure 4.16 WPS of sample stations selected for explanation belonging to cluster 3
Figure 4.17 WPS of sample stations selected for explanation belonging to cluster
Figure 4.18 Global wavelet spectrum for sample stations selected from cluster 1 (a), (b), (c), (d), cluster 2 (e), (f), cluster 3 (g), (h), (i), (j), cluster 4 (k), (l)
Figure 5.1 Selected USA stream flow station for Methodology II
Figure 5.2 Schematic for methodology implemented in the study
Figure 5.3 Plot shows the variation of MWE with the scale of decomposition
Figure 5.4 Illustration for multiscale entropy. Top Figure: Plot of the synthetic time series. Bottom Left: Plot of the wavelet coefficients. Bottom Right: Plot of entropy values across different scales.
Figure 5.5 Normalized Wavelet-Entropy and GWS coefficient for sample station
Figure 5.6 Variation of RSS_min with number of cluster
Figure 5.7 Wavelet power spectrum and Global power spectrum of stream flow time series for the station (01010500) in St. John River at Dickey, Maine
Figure 5.8 Validation indices for selection of optimum number of clusters
Figure 5.9 Cluster-wise geographical distribution of stream flow stations
Figure 5.10 Multiscale entropy values for five selected clusters: (a) Cluster 2; (b) Cluster 5; (c) Cluster 8; (d) Cluster 9; and (e) Cluster 12.
Figure 5.11 Discordancy measure test for cluster 1 to 4
Figure 5.12 Discordancy measure test for cluster 5 to 8
Figure 5.13 Discordancy measure test for cluster 9 to 12
Figure 5.14 Discordancy measure test for cluster 13 and 14
Figure 5.15 Variation of WME across scales for all stations in Cluster 5
Figure 5.16 Comparison of WME (Normalized) for each scale for all clusters
Figure 5.17 Statistical Properties of Cluster
Figure 5.18 Box plot for drainage area of stream flow stations in all clusters


LIST OF TABLES

Table 2.1 Literature review
Table 2.2 Dissimilarity measure for computing distance between cluster centroids, or feature vectors
Table 2.3 External validity indices
Table 2.4 Internal validity indices
Table 2.5 Heterogeneity Measure
Table 2.6 Critical values for Discordancy statistics
Table 4.1 Characteristics of Stream flow data selected for study
Table 4.2 Number of stations in each cluster
Table 4.3 Western United States Stream flow Station belonging to specific cluster
Table 4.4 Interpretation of Average silhouette value
Table 4.5 Number of Heterogeneous station in each cluster
Table 4.6 Cluster attributes
Table 4.7 Sample stations properties
Table 5.1 Characteristics of Stream flow data selected for study
Table 5.2 Number of stations in each cluster
Table 5.3 USGS Stream flow station in their respective cluster
Table 5.4 Number of Discordant sites
Table 5.5 Entropy signature of all clusters


LIST OF ABBREVIATIONS

AR        Autoregressive
CCA       Canonical correlation analysis
CLARA     Clustering large applications
CLARANS   Clustering large applications based on randomized search
CWT       Continuous wavelet transform
DB        Davies-Bouldin
DM        Discordancy measure
DU        Dunn's index
DWT       Discrete wavelet transform
EM        Euclidean measure
EOF       Empirical orthogonal function
FCA       Fuzzy clustering algorithm
FM        Fowlkes-Mallows index
GAMLSS    Generalized Additive Model in Location, Scale and Shape
GWS       Global wavelet spectrum
HM        Heterogeneity measure
ICA       Independent component analysis
LVR       Local variance reduction
MDS       Multidimensional scaling
ME        Multiscale entropy
PAM       Partitioning around medoids
PCA       Principal component analysis
PUB       Prediction in ungauged basin
RSS       Residual sum of squares
SI        Silhouette index
SOM       Self-organizing map
STFT      Short term Fourier transform
WME       Wavelet multiscale entropy
WPS       Wavelet power spectrum
WRF       Weather Research and Forecasting
WT        Wavelet transform


INTRODUCTION

1.1 General

Estimates of streamflow are a prerequisite for solving a number of engineering and environmental problems. These include the design or dimensioning of water control structures, economic evaluation of flood protection projects, land use planning and management, water quality control, and stream habitat assessment, among others. When streamflow records are limited at the site of interest, it is common practice to apply regionalization techniques to derive streamflow quantile estimates at sites with limited records or in ungauged catchments (Kokkonen et al., 2003). Regionalization can be defined as the transfer of information from one catchment to another (Blöschl and Sivapalan, 1995). This information may comprise characteristics describing hydrological data or models. To have greater confidence in extrapolating hydrological behavior from catchments with flow records to an ungauged catchment, all these catchments should form a relatively homogeneous group (Pilgrim, 1998; Nathan and McMahon, 1990; Post and Jakeman, 1999). This homogeneity concerns not only geographic contiguity but also hydrologic similarity.

Some of the common approaches to regionalization in hydrology include the method of residuals (MOR) (Choquette, 1988), the region of influence (ROI) approach (Zrinji and Burn, 1994, 1996), principal component analysis (PCA) (Singh et al., 1996), and cluster analysis and its extensions (Rao and Srinivas, 2006a, b; Isik and Singh, 2008; Srinivas et al., 2008; Satyanarayana and Srinivas, 2011); see also Razavi and Coulibaly (2013) for a review of regionalization methods for streamflow prediction in ungauged basins and Sivakumar et al. (2015) for a comprehensive account of catchment classification more broadly. Nathan and McMahon (1990) used a combination of multiple regression, cluster analysis, principal component analysis and graphical representation of eighteen physical catchment variables for predicting the low-flow characteristics of 184 catchments in south-eastern Australia.

Notwithstanding their ability to provide reasonable outcomes, these approaches have an important disadvantage: they rely mainly on a preconceived notion of the factors thought to influence the streamflow behavior of a catchment, and on the assumption that these factors are measurable. In reality, however, streamflow is the result of the integrated effects of many factors, such as topography, lithology and climate (Yadav et al., 2007), which act over a whole range of time (and space) scales. Therefore, it would be more appropriate to analyze the streamflow across different scales and group the catchments using their signatures.

In recent years, wavelet analysis has become a common tool for analyzing intermittent behavior and sharp events within a geophysical time series (Torrence and Compo, 1998; Smith et al., 1998; Labat et al., 2005). By decomposing a time series into time-frequency space, it is possible to determine both the dominant modes of variability and how those modes vary in time. Hence, the wavelet transform proves to be a useful tool for analyzing localized variations of power within a time series. Many studies have demonstrated the utility of wavelet analysis in regionalization. For example, Saco and Kumar (2000) used wavelets with rotated principal component analysis of the wavelet spectra to cluster streamflow stations in the United States. A similar approach using k-means clustering was adopted by Zoppou et al. (2002) to regionalize 286 catchments throughout Australia. They used the wavelet power spectra as the characterizing variable for the cluster analysis and the linear Pearson's correlation coefficient for measuring the degree of similarity between the clusters. The results revealed the capability of wavelets to quantify the temporal variability of streamflow and, thereby, to aid in the regionalization of different catchments.
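To make the idea of comparing stations through their wavelet spectra concrete, the following is a minimal sketch in plain Python. It is not the implementation used by the cited studies or by this thesis. It assumes a Morlet wavelet with a non-dimensional frequency of 6 and the normalization of Torrence and Compo (1998), evaluates the transform by direct summation, and uses three synthetic monthly series and an arbitrary list of scales; all function and variable names are hypothetical.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Morlet mother wavelet at dimensionless time t (omega0 = 6 is an assumption)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def cwt(x, scales, dt=1.0):
    """Direct-sum continuous wavelet transform: one row of complex coefficients per scale."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    rows = []
    for s in scales:
        norm = math.sqrt(dt / s)
        # Conjugated, scaled wavelet for every possible lag (k - b).
        kernel = {lag: morlet(lag * dt / s).conjugate() for lag in range(-(n - 1), n)}
        rows.append([norm * sum(centered[k] * kernel[k - b] for k in range(n))
                     for b in range(n)])
    return rows


def global_wavelet_spectrum(coeff_rows):
    """Time-averaged wavelet power |W|^2 at each scale."""
    return [sum(abs(w) ** 2 for w in row) / len(row) for row in coeff_rows]


def pearson(a, b):
    """Linear Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic "stations": 120 months of data, scales in months (illustrative only).
months = range(120)
scales = [2, 4, 8, 12, 16, 32, 48]
annual = [math.sin(2 * math.pi * m / 12) for m in months]
annual_strong = [3.0 * v for v in annual]
multi_year = [math.sin(2 * math.pi * m / 48) for m in months]

gws_annual = global_wavelet_spectrum(cwt(annual, scales))
gws_annual_strong = global_wavelet_spectrum(cwt(annual_strong, scales))
gws_multi_year = global_wavelet_spectrum(cwt(multi_year, scales))

peak_scale = scales[gws_annual.index(max(gws_annual))]
similar = pearson(gws_annual, gws_annual_strong)
different = pearson(gws_annual, gws_multi_year)
print(peak_scale, similar, different)
```

The two annual series differ only in amplitude, so their spectra have the same shape and correlate strongly, whereas the multi-year series concentrates its power at larger scales. A production analysis would more likely use an FFT-based transform from an established library and would account for edge effects.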


Even though the wavelet power spectrum has been used successfully to capture streamflow behavior, it becomes difficult to use when data are limited or incomplete. In such situations, the concept of entropy can be applied, as it enables determination of the least-biased probability distribution with limited knowledge of the signal and limited data. Entropy theory can serve as a better approach to determine the risk and reliability associated with hydrological and meteorological processes (Singh, 1997). Numerous studies have used the entropy concept to study a wide variety of problems in hydrology and water resources. Singh et al. (1987) presented new perspectives for potential applications of entropy theory in water resources. A historical perspective on entropy applications in water resources was presented by Singh and Fiorentino (1992). Harmancioglu et al. (1992) discussed the use of entropy in water resources, especially for the design and evaluation of water quality monitoring networks. Comprehensive reviews of the applications of entropy theory in hydrology and water resources are available in Singh (1997, 2011), among others.

The concept of entropy, when applied in conjunction with wavelet analysis, can be used to determine the randomness of a time series at different timescales. To this end, the Wavelet-based Multiscale Entropy (WME), which measures the degree of order or disorder of a signal and carries information associated with its multiple frequency components, can provide useful information about the underlying dynamical processes and can help in regionalization studies (Cazelles et al., 2008). This provides the motivation for the present study: to develop a robust regionalization tool based on WME.

In this study, the WME method is applied to monthly streamflow data observed at 530 stations in the contiguous United States. The Continuous Wavelet Transform is applied to each observed streamflow time series using the Morlet wavelet to capture the temporal multiscale variability of the streamflow in the form of wavelet coefficients. The wavelet coefficients at each scale are then used to obtain the entropy for that scale. The organization of this multiscale variability, expressed in terms of WME, is identified using k-means clustering.
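The step from wavelet coefficients to one entropy value per scale can be sketched as follows. This is an illustration under a stated assumption, namely that the entropy at a scale is the Shannon entropy of the normalized wavelet energy distribution over time at that scale, divided by the logarithm of the series length so that it lies between 0 and 1. The exact formulation used in the thesis is given in its methodology chapter and may differ. The coefficient rows below are toy values, not streamflow data, and the function names are hypothetical.

```python
import math


def scale_entropy(coeffs, normalize=True):
    """Shannon entropy of the normalized energy |W|^2 of one scale's coefficients."""
    energy = [abs(c) ** 2 for c in coeffs]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    # Terms with zero energy contribute nothing (p * log p -> 0 as p -> 0).
    entropy = -sum((e / total) * math.log(e / total) for e in energy if e > 0.0)
    if normalize and len(coeffs) > 1:
        entropy /= math.log(len(coeffs))  # 1.0 corresponds to evenly spread energy
    return entropy


def multiscale_entropy(coeff_rows):
    """One entropy value per scale: a candidate 'entropy signature' of a series."""
    return [scale_entropy(row) for row in coeff_rows]


# Toy wavelet coefficients for three scales, eight time steps each.
toy_coefficients = [
    [1 + 0j] * 8,                 # energy spread evenly in time: maximum disorder
    [0j, 0j, 3 + 0j] + [0j] * 5,  # energy in a single time step: fully ordered
    [2j, 0j, 0j, 0j, 2 + 0j, 0j, 0j, 0j],  # energy shared by two time steps
]
wme = multiscale_entropy(toy_coefficients)
print(wme)
```

Under this definition, a scale whose energy is concentrated in a few events yields a low entropy, while a scale with evenly distributed energy yields a value near one. The vector of values across scales is what would be passed to the clustering step.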

1.2 Motivation of Study

Streamflow estimates are required, often where no flow records of sufficient length are available at the site of interest, for problems such as:
• dimensioning a dam or bridge structure
• flood control
• water quality control
• stream habitat assessment
• providing boundary conditions for models dealing with atmospheric general circulation

In addition, criteria for regionalization are still not well established, owing to the scarcity of data and the subjectivity involved in selecting attributes, model parameters, weights, threshold values and distance measures.

1.3 Objective

The primary objectives of the study are as follows:
1. Investigation of hydrologic regionalization using the wavelet power spectrum technique: a case study in the western United States
2. Hydrologic regionalization using wavelet-based multiscale entropy


1.4 Contribution

This study focuses on developing a robust regionalization tool based on Wavelet-based Multiscale Entropy (WME). Monthly streamflow data observed at 530 stations throughout the United States were selected, and the Continuous Wavelet Transform was applied to each observed streamflow time series using the Morlet wavelet to capture the temporal multiscale variability of the streamflow in the form of wavelet coefficients. The wavelet coefficients at each scale are used to obtain the entropy for that scale. The organization of this multiscale variability, expressed in terms of WME, is then identified using k-means clustering. Results indicate the existence of fourteen clusters, which are not based on any a priori assumption of geographical contiguity.
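The clustering step can be illustrated with a small, self-contained k-means sketch (Lloyd's algorithm) applied to per-station entropy signatures. Everything here is hypothetical: the six four-scale signatures are made-up numbers, three clusters are used rather than the fourteen reported above, and the deterministic farthest-first initialization is chosen only to keep the example reproducible. It is not the procedure used to obtain the thesis results.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def residual_sum_of_squares(points, labels, centroids):
    """Total squared distance of every point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical normalized entropy signatures (4 scales) for six stations.
signatures = [
    [0.90, 0.80, 0.70, 0.60], [0.88, 0.82, 0.69, 0.61],
    [0.20, 0.30, 0.40, 0.50], [0.22, 0.28, 0.41, 0.52],
    [0.50, 0.10, 0.90, 0.10], [0.52, 0.12, 0.88, 0.09],
]
labels, centroids = kmeans(signatures, k=3)
rss = residual_sum_of_squares(signatures, labels, centroids)
print(labels, rss)
```

Repeating the run for several values of k and plotting the residual sum of squares against k is one common way to judge how many clusters the data support. Note that no geographic information enters the feature vectors, which is consistent with the statement that the clusters do not presuppose geographical contiguity.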

1.5 Outline of Thesis

The thesis begins with a literature review and a brief discussion of the techniques and indices used in past studies (Section 2), followed by a brief description of the study area and dataset in Section 3. Section 4 presents the complete study for objective 1 along with its results and discussion. Similarly, Section 5 describes the results and discussion for objective 2. Finally, Section 6 presents some of the important conclusions drawn from the present study, and Section 7 proposes potential research areas for further investigation.


LITERATURE REVIEW

2.1 Regionalization Studies: Regionalization can be carried out by various techniques. Some of them are discussed below in tabular format: Table 2.1 Literature review Author

Purpose classification

Ssegane et al. (2012)

Classification techniques

Study area

Major findings

Flow predictions in ungauged watersheds

K-means clustering using: geographic proximity; watershed hypsometry; causal selection algorithms; PCA and stepwise regression

Three Mid Atlantic Eco regions within USA

Classification performance was highest using causal algorithms

Prinzio M. D. et al. (2011)

Predicting stream flow indices i.e. mean annual runoff, mean annual flood, and flood quintiles in ungauged watersheds

SOM on the available catchment descriptors and derived variables obtained by applying PCA and Canonical Correlation Analysis (CCA)

~300 Italian catchments scattered nationwide

PCA and CCA on the available catchment descriptors before applying SOM improve the effectiveness of classifications

He et (2011)

Set up and test a nonparametric catchment classification scheme

Multidimensional scaling (MDS) and local variance reduction (LVR) using hydrologic model performance as a measure of similarities

27 Catchments in Germany

The scheme is potentially useful for prediction in ungauged watersheds and provides an alternative to conventional regressionbased regional approaches

Mwale et al. (2011)

Regionalize runoff variability and establish baseline pre disturbance hydrologic regimes

Wavelet, independent component analysis (ICA), and empirical orthogonal function (EOF) analysis

59 Stations of catchment runoff data in Alberta, Canada

ICA identified hydrologic clusters that agree better with the five eco regions of Alberta

Sawicz et al. (2011)

Understanding hydrologic similarity in a 6-dimensional signature space

A Bayesian clustering applied on 6 hydrological signatures including: runoff ratio, base flow index, snow day ratio, slope of the flow duration curve, stream flow elasticity, and rising limb density

280 catchments located in the Eastern US

Identification of nine clusters with a relatively clear separation which suggests that spatial proximity is a good indicator of similarity

al.

of

6

Kahya et al. (2008)

Delineating the geographical zones having similar monthly stream flow variations

Hierarchical clustering applied to stream flow data

80 Watersheds in Turkey

The zones with similar stream flow patterns did not overlap well with the conventional climate zones of Turkey

Stainton and Metcalfe (2007)

To identify reference watersheds in Ontario for understanding the ecological significance of hydrological variability

Classify the full range of flow variability using five components of the natural flow regime: the timing, magnitude, duration, frequency and rate-of-change

135 watersheds in the province of Ontario, Canada

Cluster analysis using mean monthly hydrographs identified a total of 8 hydroclimatic groups and FDC identified 13 groups

Rao and Srinivas (2006)

Estimation of flood quantiles in ungauged watersheds

Fuzzy clustering algorithm (FCA) on attributes and flow records

245 gauging stations in Indiana, USA

FCA derives homogeneous regions, effective for flood frequency analysis

Chiang et al. (2002b)

Stream flow estimation in ungauged watersheds

Discriminant Analysis and PCA using 16 parameters of Stream flow time series

94 watersheds in Alabama, Georgia, and Mississippi (USA)

The 6 regions seem to be separated by physiographical boundaries and regional membership is mainly identified by some of the watershed variables

Cavadias et al. (2001)

Estimation of flood characteristics of ungauged watersheds

Canonical correlation Analysis

20 Watersheds in Ontario, Canada

The homogeneous regions determined in the canonical space of the flood variables are based on relationship between the watershed and flood variable

Burn and Boorman (1993)

Estimation of rainfall-runoff model parameters in ungauged watersheds

K-means clustering on flow response variables and determination of group membership based on catchment attributes

99 Catchments, UK

Methods are effective in estimating the unit hydrograph time to peak and standard percentage runoff

Nathan and McMahon (1990)

Prediction of low flow characteristics which can be used in ungauged watersheds

Cluster analysis, multiple regression, Principal Component Analysis

184 Catchments in southeastern Australia

Use of watershed characteristics makes the grouping very sensitive to the initial choice of predictor variables


Acreman and Sinclair (1986)

Flood frequency analysis

Multivariate clustering algorithm applied on 11 watershed variables

168 Watersheds in Scotland

Four of the five identified regions yield homogenous distributions of flood frequency

2.2 Time-Frequency Analysis

2.2.1 Short Time Fourier Transform

The decomposition of a signal or time series into the time-frequency domain permits the identification of the dominant modes of variability and of how these modes vary in time. Time-frequency analysis can be done using either the short time Fourier transform (STFT) or multi-resolution analysis. The key difference between the two is that multi-resolution analysis does not resolve every spectral component equally, whereas the STFT does. In the STFT the signal is divided into segments short enough that each segment can be assumed to be stationary. For this purpose a window function $\omega$ is chosen, whose width must equal the length of the segment over which the assumption of stationarity is valid:

$$STFT_X^{\omega}(t', f) = \int \left[x(t)\,\omega^{*}(t - t')\right] e^{-j 2\pi f t}\, dt \tag{2.1}$$

where $x(t)$ is the signal, $t'$ is the position of the window, $f$ is the frequency and the asterisk denotes the complex conjugate.
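As an illustration of Eq. (2.1), the following is a minimal discrete sketch, not the procedure used in this study. It assumes a Hann window and a made-up test signal; the names `stft`, `window_len` and `hop` are hypothetical, and the phase is referenced to the start of each segment rather than to absolute time, which does not affect the magnitudes.

```python
import cmath
import math

def stft(x, window_len, hop):
    """Discrete short time Fourier transform with a Hann window (assumed).

    Returns one list of complex coefficients per window position; bin k
    corresponds to a frequency of k / window_len cycles per sample.
    """
    # Hann window: tapers each segment, which is treated as stationary.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window_len - 1))
         for i in range(window_len)]
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        segment = [x[start + i] * w[i] for i in range(window_len)]
        spectrum = [
            sum(segment[i] * cmath.exp(-2j * math.pi * k * i / window_len)
                for i in range(window_len))
            for k in range(window_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

# Made-up signal: 4 cycles per 32 samples, then 8 cycles per 32 samples.
signal = ([math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
          + [math.sin(2 * math.pi * 8 * t / 32) for t in range(64)])
frames = stft(signal, window_len=32, hop=32)
# Index of the strongest frequency bin in each window position.
dominant_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in frames]
```

Because the window width is fixed, every frequency is resolved with the same time resolution, which is the limitation that motivates the wavelet approach.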

The limitation of the short time Fourier transform is that it cannot provide accurate time-frequency localization and does not perform well on non-stationary signals or irregularly spaced events (Smith et al., 1998). A major advantage of the wavelet transform over the short time Fourier transform is that it is scale independent (Kaiser, 1994); hence there is no need to determine a scale or response interval in advance, which would limit the frequency range. In addition, the spectral components are not all resolved equally, as they are in the short time Fourier transform (Coulibaly et al., 2003).

2.2.2 Wavelet Studies

The approach described in this study utilizes the concept of multi-resolution analysis using the wavelet transform, which is useful in extracting the underlying information of a time series at different time-frequency scales. Extensive literature is available on wavelet based models for a diverse set of problems in hydrological modeling, such as monsoonal flood forecasting, drought forecasting (Kim and Valdes, 2003), streamflow analysis (Adamowski, 2008; Coulibaly and Burn, 2004; Kucuk and Agiralioglu, 2006; Smith et al., 1998), precipitation analysis (Kim, 2004; Lu, 2002; Partal and Kisi, 2007), rainfall-runoff relationships (Labat et al., 2000), prediction of river discharge (Zhou et al., 2008), analysis of suspended sediment load (Rajaee et al., 2010), estimation of unit hydrographs (Chou and Wang, 2002) and various other hydrological predictions (Wang and Ding, 2003).

Some researchers have successfully applied wavelets to aspects of climatic downscaling. Cai (2009) demonstrated wavelet and Bayesian methods in multi-model ensembles for climatic downscaling. A statistical approach was used to calculate the distributions of future climate change based on an ensemble of Weather Research and Forecasting (WRF) models. Wavelet analysis was then employed to de-noise the WRF model output before carrying out Bayesian analysis to decrease the uncertainties in the model. Rashid et al. (2014) used wavelet coherence to identify predictor variables for hydro-climatic variables and proposed wavelet coupled Generalized Additive Models in Location, Scale and Shape (GAMLSS) for downscaling rainfall. A good illustration of wavelet analysis can be found in the book of Rao (2004).

More recently, wavelet algorithms, which can process data at different scales or resolutions and thereby allow the gross and small features of a signal to be separated, have been used. Saco and Kumar (2000) used wavelets with rotated principal component analysis of the wavelet spectra to cluster stream flow stations in the United States. They found that over 89% of the stream flow variability could be explained by only three distinct spectral modes.
A similar approach using k-means clustering was adopted by Zoppou et al. (2002) to regionalize 286 catchments throughout Australia; the results revealed the capability of wavelets to quantify the temporal variability of stream flow and thereby aid the regionalization of different catchments. In the above studies the authors used the wavelet power spectra as the characterizing variable for the cluster analysis. However, these methods use the linear Pearson correlation coefficient to measure the degree of similarity between the clusters. Therefore, the entropy, or degree of disorder, is considered here to be a more reliable variable for forming the clusters.


2.2.2.1 Discrete Wavelet Transformation

Determining wavelet coefficients at every possible scale is an enormous task. Moreover, actual flow data are discrete in nature and measured at specific time intervals; in such cases the discrete wavelet transform (DWT) is more suitable. Normally, the DWT uses a dyadic scheme of wavelet decomposition, in which only scales and positions based on powers of two are adopted for calculating the transform coefficients, thereby reducing the computational burden. The DWT achieves time-frequency localization and multi-scale resolution of a signal by suitably focusing and zooming around the neighborhood of one's choice (Nanavati and Panigrahi, 2004). For a discrete time series $x_i$ with integer time steps, the DWT in the dyadic decomposition scheme is defined as

$$T_{m,n} = 2^{-m/2}\sum_{i=0}^{N-1} x_i\,\psi(2^{-m} i - n) \tag{2.2}$$

where $T_{m,n}$ is the discrete wavelet coefficient for scale $a = 2^{m}$ and location $b = 2^{m} n$, $\psi$ is the wavelet function, $m$ and $n$ are positive integers, and $N$ is the data length of the time series, which is an integer power of 2, i.e. $N = 2^{M}$. This gives the ranges of $m$ and $n$ as $0 < n < 2^{M-m} - 1$ and $1 < m < M$, respectively. This implies that only one wavelet is needed to cover the time interval at the largest scale (i.e. $2^{m}$ where $m = M$), producing only one coefficient. At the next scale ($2^{m-1}$), two wavelets cover the time interval, producing two coefficients, and so on down to $m = 1$. Thus, the total number of coefficients generated by the DWT for a discrete time series of length $N = 2^{M}$ is $1 + 2 + 4 + \dots + 2^{M-1} = N - 1$ (Nourani et al., 2009).

The process consists of a number of successive filtering steps in which the time series is decomposed into an approximation (A) and detail sub-time series or wavelet components (D1, D2, D3, etc.). The approximation component represents the slowly changing, coarse features of the time series and is obtained by correlating a stretched version (low-frequency, high-scale) of the wavelet with the original time series, while the detail components signify the rapidly changing features of the time series and are obtained by correlating a compressed version (high-frequency, low-scale) of the wavelet with the original time series.
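The dyadic scheme and the coefficient count $N - 1$ can be sketched as follows. This is a minimal illustration that assumes the Haar wavelet (the text above does not prescribe a wavelet) and uses made-up flow values; `haar_dwt` is a hypothetical helper name.

```python
import math

def haar_dwt(x):
    """Full dyadic Haar decomposition of a series whose length is a power of 2.

    Returns (approximation, details); details[0] holds the finest-scale
    coefficients (m = 1) and details[-1] the single coarsest one (m = M).
    """
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # High-pass step: scaled differences keep the rapidly changing part.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        # Low-pass step: scaled sums keep the slowly changing part.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx[0], details

flow = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # N = 2^3, made-up values
approx, details = haar_dwt(flow)
n_coeffs = sum(len(d) for d in details)              # 4 + 2 + 1 = N - 1
energy = approx ** 2 + sum(c * c for d in details for c in d)
```

With this orthonormal scaling the sum of squared coefficients equals the sum of squared data values, which is the energy conservation property of orthogonal wavelets.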


Figure 2.1 Discrete Wavelet Decomposition

2.2.2.2 Continuous Wavelet Transform

The continuous wavelet transform (CWT) $W_n(s)$ of a discrete sequence of observations $x_n$ is defined as the convolution of $x_n$ with a scaled and translated wavelet $\Psi(\eta)$ that depends on a nondimensional time parameter $\eta$, has zero mean and is localized in both frequency and time (Farge, 1992; Torrence and Compo, 1998):

$$W_n(s) = \sum_{n'=0}^{N-1} x_{n'}\,\Psi^{*}\!\left[\frac{(n' - n)\,\delta t}{s}\right] \tag{2.3}$$

where $n$ is the localized time index, $s$ is the wavelet scale, $\delta t$ is the sampling period, $N$ is the number of points in the time series, and the asterisk indicates the complex conjugate. By varying the wavelet scale $s$ and translating along the localized time index $n$, one can construct a picture showing both the amplitude of any features versus the scale and how this amplitude varies with time.


The choice of the wavelet function $\Psi(\eta)$ is neither unique nor arbitrary. The analyzing wavelet may be chosen from several functions that satisfy certain admissibility requirements. Farge (1992) describes the properties a function must possess if it is to be called a wavelet function. These include:

(i) Admissibility: the average of an integrable function should be zero.

(ii) Similarity: the scale decomposition should be obtained by the translation and dilation of only one mother function.

(iii) Invertibility: the function should have at least one reconstruction formula for recovering the signal exactly from the wavelet coefficients and for allowing the computation of energy from them.

(iv) Cancellations: the function should have some vanishing high-order moments, which allows the elimination of the most regular part of the signal and hence the study of higher-order fluctuations and possible singularities in some higher-order derivatives (Torrence and Compo, 1998; Zoppou et al., 2002).

The Fourier transform has only one set of basis functions, while wavelet analysis has an infinite number of possible basis functions (Holschneider, 1998). Examples of wavelets used for the continuous wavelet transform are the Marr wavelet,

$$\Psi_0(\eta) = \frac{2}{\sqrt{3}}\,\pi^{-1/4}\,(1 - \eta^{2})\,e^{-\eta^{2}/2} \tag{2.4}$$

which is a real wavelet, also known as the second derivative of a Gaussian ($m = 2$), and the Morlet wavelet, which is a complex, non-orthogonal wavelet defined as

$$\Psi_0(\eta) = \pi^{-1/4}\,e^{i\omega_0\eta}\,e^{-\eta^{2}/2} \tag{2.5}$$

where $\omega_0$ is the nondimensional frequency.

To obtain good results, the selected wavelet function should bear some resemblance to the signal being analyzed. The factors to consider when selecting a suitable wavelet function are whether it is orthogonal or non-orthogonal, whether it is complex or real, its width and its shape (Torrence and Compo, 1998). An orthogonal wavelet is recommended for signal processing, as it gives the most compact representation and conserves the total energy. A non-orthogonal wavelet transform is recommended for time series analysis, where smooth, continuous variations in wavelet amplitude are expected. A complex wavelet, having both real and imaginary parts, is useful for capturing oscillatory behavior, while a real wavelet, whose imaginary part is zero, is useful for isolating peaks and discontinuities. A narrow wavelet function gives good time resolution but poor frequency resolution, while a broad wavelet function gives poor time resolution but good frequency resolution. As mentioned earlier, the wavelet function should reflect the type of features present in the time series.

The most common way to display the result of a wavelet transform is to plot the amplitude of the wavelet coefficients, $|W_n(s)|$ (Chan, 1995; Farge, 1992; Meyer et al., 1993). The shortcoming of this method is that it is not directly comparable to the Fourier spectrum. To resolve this limitation and allow direct comparison of spectra, the squared amplitude $|W_n(s)|^{2}$, i.e. the wavelet power spectrum, is used instead. An important point to keep in mind when dealing with the wavelet power spectrum is that the choice of wavelet function seems to have a very significant influence on the decomposition but very little influence on the energy spectra.
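The wavelet power spectrum can be sketched by evaluating the sum in Eq. (2.3) directly with the wavelets of Eqs. (2.4) and (2.5). This is only an illustrative sketch: the value $\omega_0 = 6$, the test signal, the scales and the function names are assumptions, no scale normalization is applied (as in Eq. 2.3 as written), and the direct sum is far slower than the FFT-based evaluation commonly used in practice.

```python
import cmath
import math

OMEGA0 = 6.0   # assumed nondimensional frequency of the Morlet wavelet

def morlet(eta):
    """Morlet wavelet, Eq. (2.5)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * eta) * math.exp(-0.5 * eta ** 2)

def marr(eta):
    """Marr wavelet (second derivative of a Gaussian), Eq. (2.4)."""
    return 2.0 / math.sqrt(3.0) * math.pi ** -0.25 * (1.0 - eta ** 2) * math.exp(-0.5 * eta ** 2)

def cwt_power(x, scales, dt=1.0, wavelet=morlet):
    """Wavelet power |W_n(s)|^2 from the direct sum of Eq. (2.3)."""
    n_pts = len(x)
    power = []
    for s in scales:
        row = []
        for n in range(n_pts):
            w = sum(x[k] * wavelet((k - n) * dt / s).conjugate()
                    for k in range(n_pts))
            row.append(abs(w) ** 2)
        power.append(row)
    return power

# Made-up series: a sinusoid with a period of 16 time steps.
series = [math.sin(2 * math.pi * t / 16) for t in range(128)]
scales = [4, 8, 16, 32]
power = cwt_power(series, scales)
mid = len(series) // 2
# Scale with the largest power at the middle of the record.
best_scale = scales[max(range(len(scales)), key=lambda i: power[i][mid])]
```

For the Morlet wavelet with $\omega_0 = 6$ the scale is close to the Fourier period, so the power of this test series concentrates near scale 16.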

2.3 Entropy Studies

Data scarcity is generally encountered in hydrology and meteorology. In such situations, decisions are generally made on the basis of rules of thumb, crude analysis, experience, professional judgment, safety measures and probabilistic methods (Kokkonen et al., 2003; Singh, 1997). Among these methods, the probabilistic method allows a more explicit and quantitative account of uncertainty. Its limitation, however, lies in limited or incomplete data: with a small sample size and limited information on the signal, it is difficult to estimate the probability distribution of a system variable with conventional methods. Such problems are better dealt with using entropy theory, which enables the determination of the least biased probability distribution from limited knowledge and data. Entropy theory can therefore serve as a better approach for determining the risk and reliability associated with hydrological and meteorological processes (Singh, 1997).

Several authors have used this concept for problems in hydrology and water resources research. Rajagopal et al. (1987) presented new perspectives for potential applications of entropy in water resources research. Singh (1989) reported on hydrological modelling using entropy. A historical perspective on entropy applications in water resources was presented by Singh and Fiorentino (1992). Harmancioglu et al. (1992) discussed the use of entropy in water resources. Alpaslan et al. (1992) discussed the role of entropy, and Harmancioglu et al. (1992) its application, in the design and evaluation of water quality monitoring networks.

2.3.1 Concept of Entropy

Given a vector of data $X(i)$, a direct approach to estimating the unknown probability density $p(x)$ of the data is to build the histogram of the values $X(i)$ using a suitable interval $\Delta x$, counting the number of times $m_k$ that a value falls in each interval $(x_k, x_k + \Delta x)$ among the $N$ occurrences. The probability that a data value belongs to interval $k$ is then

$$p_k = \frac{m_k}{N}$$

and each data value in that interval is assigned the probability $p_k$ (Starck et al., 1997). The entropy is defined as

$$S(X) = -\sum_{k} p_k \ln(p_k) \tag{2.6}$$

This quantity is referred to as the entropy of the system and was introduced by Shannon in 1948. The measure has the following properties:

1. Its value is maximum when all events have the same probability, i.e. $p_i = 1/N_e$, where $N_e$ is the number of events; the maximum value is $\ln(N_e)$. This represents the most uncertain system.
2. Its value is minimum (zero) when one event is certain. In this case the system is perfectly known and no information can be added.
3. The entropy is a non-negative, continuous and symmetric function.
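The histogram estimate of Eq. (2.6) and its first two properties can be checked with a short sketch. The equal-width binning rule, the function name `shannon_entropy` and the data values are assumptions made for illustration.

```python
import math

def shannon_entropy(values, n_bins):
    """Histogram estimate of S(X) = -sum p_k ln(p_k), Eq. (2.6)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0      # guard against a constant series
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)   # maximum goes in last bin
        counts[k] += 1
    n = len(values)
    # Empty bins contribute nothing (p ln p -> 0 as p -> 0).
    return -sum((m / n) * math.log(m / n) for m in counts if m > 0)

uniform = [0.5, 1.5, 2.5, 3.5]   # one value per bin: all events equally likely
certain = [2.0, 2.0, 2.0, 2.0]   # a single certain event

s_uniform = shannon_entropy(uniform, n_bins=4)   # equals ln(4)
s_certain = shannon_entropy(certain, n_bins=4)   # equals 0
```

The estimate depends on the chosen bin width, so series that are to be compared should be binned in the same way.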

2.4 Clustering Technique

Many clustering methods exist, mainly because the notion of a “cluster” is not precisely defined (Estivill-Castro, 2000). Consequently, many clustering methods have been developed, each of which uses a different induction principle. Jain and Dubes (1988) and Fraley and Raftery (1998) classified clustering methods into two main groups: hierarchical clustering and partitional clustering. A second classification divides clustering methods into hard clustering and fuzzy clustering. Han and Kamber (2001) suggest three additional main categories: density based methods, model based clustering and grid based methods. A brief discussion of these methods is presented here.

2.4.1 Hierarchical Clustering

Hierarchical clustering procedures provide a nested sequence of partitions and can be subdivided into two main categories, agglomerative and divisive. These methods construct the clusters by recursively partitioning the instances in either a top-down or a bottom-up fashion.

• Agglomerative hierarchical clustering: for N given feature vectors, the procedure begins with N singleton clusters, each consisting of a single feature vector. A distance measure from Table 2.2, e.g. the Euclidean distance, is chosen to evaluate the dissimilarity between two clusters. The two clusters that are least dissimilar are found and merged. This process of identifying and merging continues until the desired number of clusters is obtained (Rao and Srinivas, 2008).
• Divisive hierarchical clustering: the procedure begins with a single cluster containing all N feature vectors. The feature vector that has the greatest dissimilarity to the other vectors of the cluster is identified and separated to form a splinter group. This step divides the original cluster into two groups, and the procedure continues until the desired number of clusters is obtained (Rao and Srinivas, 2008).

The result of hierarchical clustering is a dendrogram, which shows how the clusters formed at the various steps of the process are related. The distance measures that can be chosen to evaluate the dissimilarity between two clusters are listed in Table 2.2. Hierarchical clustering methods are further divided according to the manner in which the similarity measure is calculated (Jain et al., 1999):

• Single linkage clustering (also termed the connectedness, minimum or nearest neighbour method): the distance between two non-singleton clusters is the smallest of the distances between all possible pairs of feature vectors in the two clusters (Sneath and Sokal, 1973).

• Complete linkage clustering (also called the diameter, maximum or furthest neighbour method): the distance between two clusters is the largest of the distances between all possible pairs of feature vectors in the two clusters (King, 1967).
• Average linkage clustering (also called the minimum variance method): the distance between two clusters is taken to be the average of the distances from any member of one cluster to any member of the other cluster (Ward, 1963; Murtagh, 1984).

Table 2.2 Dissimilarity measures for computing the distance between cluster centroids or feature vectors

| Distance measure | Equation |
| --- | --- |
| Euclidean | $\sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^{2}}$ |
| Squared Euclidean | $\sum_{k=1}^{n} (x_{ik} - x_{jk})^{2}$ |
| Mahalanobis distance | $\sqrt{(x_i - x_j)^{T}\,\Sigma^{-1}\,(x_i - x_j)}$ |
| Manhattan or city block | $\sum_{k=1}^{n} \lvert x_{ik} - x_{jk} \rvert$ |
| Canberra | $\sum_{k=1}^{n} \dfrac{\lvert x_{ik} - x_{jk} \rvert}{\lvert x_{ik} \rvert + \lvert x_{jk} \rvert}$ |
| Chebychev | $\max_{1 \le k \le n} \lvert x_{ik} - x_{jk} \rvert$ |
| Cosine | $1 - \dfrac{\sum_{k=1}^{n} x_{ik}\,x_{jk}}{\sqrt{\sum_{k=1}^{n} x_{ik}^{2}\,\sum_{k=1}^{n} x_{jk}^{2}}}$ |
| Minkowski | $\left(\sum_{k=1}^{n} \lvert x_{ik} - x_{jk} \rvert^{t}\right)^{1/t}$ |
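The agglomerative procedure and the linkage rules described above can be sketched in a few lines. This is a minimal illustration on made-up two-dimensional feature vectors, implementing four of the measures in Table 2.2; the function and variable names are hypothetical, and the brute-force search is for clarity rather than efficiency.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def chebychev(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

def minkowski(a, b, t):
    return sum(abs(p - q) ** t for p, q in zip(a, b)) ** (1.0 / t)

# How the pairwise distances between two clusters are combined.
LINKAGE = {
    "single": min,                          # nearest neighbour
    "complete": max,                        # furthest neighbour
    "average": lambda d: sum(d) / len(d),   # mean of all pairs
}

def agglomerate(points, n_clusters, linkage="single", dist=euclidean):
    """Merge the two least dissimilar clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start with N singletons
    combine = LINKAGE[linkage]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([dist(points[i], points[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

stations = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
groups = agglomerate(stations, n_clusters=2, linkage="complete")
```

Recording the merge distance at each step would give the heights of the dendrogram; here only the final partition is returned.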

In general, hierarchical methods are characterized by the following strengths:

• Versatility: the single linkage methods, for example, maintain good performance on data sets containing non-isotropic clusters, including well-separated, chain-like and concentric clusters.
• Multiple partitions: hierarchical methods produce not one partition but multiple nested partitions, which allows different users to choose different partitions according to the desired similarity level. The hierarchical partition is presented using a dendrogram.

The main limitations of these methods can be summarized as:


• The resulting clusters are usually not optimal, because feature vectors committed to a cluster in the early stages cannot move to another cluster.
• Inability to scale well: the time complexity of hierarchical algorithms is at least $O(m^{2})$ (where m is the total number of instances), which is non-linear in the number of objects.

2.4.2 Partitioning Clustering

Partitioning methods relocate the feature vectors by moving them from one cluster to another, starting from an initial partitioning. Such methods typically require the desired number of clusters to be set in advance. An attempt is made to recover the natural grouping present in the data through a single partition. The methods are subdivided into K-means and K-medoids methods.

K-Medoids Clustering Method

In the K-medoids method, each cluster is represented by its medoid, the most centrally located data point of the cluster. This has two main advantages. First, the method can be used for both numerical and categorical attributes; second, the choice of medoids is dictated by the location of a predominant fraction of the data points inside a cluster, and it is therefore less sensitive to the presence of outliers (Berkhin, 2002). Examples are PAM (partitioning around medoids), CLARA (clustering large applications) and CLARANS (clustering large applications based on randomized search). Among them, PAM is effective for small data sets.

K-Means Clustering Method

One of the most popular clustering algorithms is the K-means method, which partitions the data into K clusters. Each cluster is represented by its centroid, which is the mean (weighted or otherwise) of the feature vectors within the cluster. The K-means algorithm may be viewed as a gradient-descent procedure, which begins with an initial set of K clusters and iteratively updates it so as to decrease the error function. A rigorous proof of the finite convergence of K-means type algorithms is given in Selim and Ismail (1984). The complexity of T iterations of the K-means algorithm performed on a sample of m feature vectors, each characterized by N attributes, is $O(T \cdot K \cdot m \cdot N)$.


• This linear complexity is one of the main reasons for the popularity of the K-means algorithm. Even if the number of instances is substantially large, the algorithm is computationally attractive, which gives it an advantage over methods with non-linear complexity.
• The method is known for its efficiency in clustering large data sets with numerical attributes. However, it is limited in clustering categorical data and is sensitive to outliers.
• Other reasons for its popularity are its ease of interpretation, simplicity of implementation, speed of convergence and adaptability to sparse data.
• The K-means algorithm cannot distinguish between a global and a local optimum, so the result depends on the initial partition.
• Being a typical partitioning algorithm, K-means works well only on data sets having isotropic clusters and is not as versatile as, for instance, the single linkage algorithm.
• It requires the number of clusters to be specified in advance, which is usually not known.
• The algorithm is sensitive to noisy data and outliers, and is applicable only when the mean is defined.
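The iterative update behind the K-means method can be sketched as follows. This is a minimal illustration with user-supplied initial centroids and made-up points, not the implementation used in this study; the name `kmeans` and its arguments are hypothetical.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """K-means by alternating assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each feature vector joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])   # keep an empty cluster's centroid
        if new_centroids == centroids:               # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 2.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Each iteration costs on the order of K times m times N operations, which is where the linear complexity comes from. Running the sketch from several different initial centroids and keeping the partition with the smallest error is a common way to reduce the dependence on initialization.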

2.5 Cluster Validity Indices

The selection of a suitable number of clusters and the evaluation of clustering results are important in cluster analysis. Validity indices are usually used to evaluate clustering results. They are classified as follows.

2.5.1 External Validity Indices

External validity indices measure the agreement between two partitions, one of which is usually a known (gold standard) partition, e.g. the true class labels, and the other of which comes from the clustering procedure. They include:

• Rand index
• Adjusted Rand index
• Jaccard index
• Fowlkes-Mallows (FM) index


Table 2.3 External validity indices

| External Indices | Evaluation Criteria |
| --- | --- |
| Rand index | The higher the score, the better the solution [Dudoit et al. 2002; Halkidi 2001; Sharan et al. 2003] |
| Adjusted Rand index | The higher the score, the better the solution [Dudoit et al. 2002; Halkidi 2001; Sharan et al. 2003] |
| Jaccard index | The higher the score, the better the solution [Dudoit et al. 2002; Halkidi 2001; Sharan et al. 2003] |
| Fowlkes-Mallows (FM) index | The higher the score, the better the solution [Dudoit et al. 2002; Halkidi 2001; Sharan et al. 2003] |
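The four external indices can all be computed from counts of object pairs. The sketch below uses the standard pair-counting definitions, which are not spelled out in the text above; the label vectors are made up and the function names are hypothetical. Degenerate partitions (for example, a single cluster in both partitions) would cause a division by zero and are not handled.

```python
import math
from itertools import combinations

def pair_counts(truth, predicted):
    """Count object pairs by whether each partition groups them together."""
    both = only_truth = only_pred = neither = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_p = predicted[i] == predicted[j]
        if same_t and same_p:
            both += 1
        elif same_t:
            only_truth += 1
        elif same_p:
            only_pred += 1
        else:
            neither += 1
    return both, only_truth, only_pred, neither

def external_indices(truth, predicted):
    a, b, c, d = pair_counts(truth, predicted)
    total = a + b + c + d
    # Expected and maximum number of agreeing "together" pairs,
    # used to correct the Rand index for chance.
    expected = (a + b) * (a + c) / total
    max_a = 0.5 * ((a + b) + (a + c))
    return {
        "rand": (a + d) / total,
        "adjusted_rand": (a - expected) / (max_a - expected),
        "jaccard": a / (a + b + c),
        "fowlkes_mallows": a / math.sqrt((a + b) * (a + c)),
    }

truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]
scores = external_indices(truth, predicted)
perfect = external_indices(truth, truth)
```

All four scores equal 1 when the two partitions are identical, consistent with the "higher is better" criterion in Table 2.3.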

2.5.2 Internal Validity Indices

Internal validity indices evaluate clustering results using only the features and information inherent in a dataset. They are usually used when the true solution is unknown. They include:

• Silhouette index
• Davies-Bouldin index
• Calinski-Harabasz index
• Dunn index
• R-squared index
• Hubert-Levin (C-index)
• Krzanowski-Lai index
• Hartigan index
• Root-mean-square standard deviation (RMSSTD) index
• Semi-partial R-squared (SPR) index
• Distance between two clusters (CD) index
• Weighted inter-intra index
• Homogeneity index
• Separation index


Four validity indices, namely the Davies-Bouldin (DB) index, Dunn's index, the Homogeneity index and the Separation index, were used to identify the best number of clusters for the data.

2.5.2.1 Davies-Bouldin (DB)

The Davies-Bouldin (DB) index (Davies and Bouldin, 1979; Kasturi et al., 2003) is an internal cluster evaluation criterion that is among the most popular and widely used in hydrology because of its ability to identify the optimal number of clusters that are well separated and compact. DB is defined as a function of the ratio of the sum of within-cluster scatter to between-cluster separation:

$$DB = \frac{1}{K}\sum_{i=1}^{K}\ \max_{j=1,\dots,K,\ j \neq i}\left\{\frac{diam(C_i) + diam(C_j)}{\lVert z_i - z_j \rVert}\right\} \tag{2.7}$$

where the diameter of a cluster is defined as

$$diam(C_i) = \left(\frac{1}{n_i}\sum_{x \in C_i} \lVert x - z_i \rVert^{2}\right)^{1/2} \tag{2.8}$$

with $n_i$ the number of points and $z_i$ the centroid of cluster $C_i$. Since the objective is to obtain clusters with minimum intra-cluster distances, a small value of DB is preferred.
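Eqs. (2.7) and (2.8) can be evaluated directly, as in the following sketch. It assumes the Euclidean norm and uses two made-up partitions; the helper names are hypothetical.

```python
import math

def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def diam(cluster):
    """Root-mean-square distance of the members to the centroid, Eq. (2.8)."""
    z = centroid(cluster)
    return math.sqrt(sum(math.dist(x, z) ** 2 for x in cluster) / len(cluster))

def davies_bouldin(clusters):
    """Eq. (2.7): average over clusters of the worst scatter-to-separation ratio."""
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (diam(clusters[i]) + diam(clusters[j]))
            / math.dist(centroid(clusters[i]), centroid(clusters[j]))
            for j in range(k) if j != i
        )
    return total / k

# Same within-cluster scatter, different separation between the clusters.
well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_together = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
db_far = davies_bouldin(well_separated)    # (1 + 1) / 10
db_near = davies_bouldin(close_together)   # (1 + 1) / 3
```

The better separated partition gives the smaller DB value, in line with the criterion stated above.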

2.5.2.2 Dunn's index

Dunn's index (Dunn, 1973; Bolshakova et al., 2003; Halkidi et al., 2001) is also an internal cluster evaluation criterion, defined as the ratio of the minimal inter-cluster distance to the maximal intra-cluster distance. Dunn's index for K clusters is defined as:

$$DU = \min_{i=1,\dots,K}\left\{\min_{j=i+1,\dots,K}\left(\frac{diss(C_i, C_j)}{\max_{m=1,\dots,K} diam(C_m)}\right)\right\} \tag{2.9}$$

where $diss(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} \lVert x - y \rVert$ is the dissimilarity between clusters $C_i$ and $C_j$, and $diam(C) = \max_{x, y \in C} \lVert x - y \rVert$ is the intra-cluster function (or diameter) of the cluster. A large value of Dunn's index is preferred, as it indicates compact and well-separated clusters.
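A direct evaluation of Eq. (2.9), using the pairwise definitions of dissimilarity and diameter given with it, might look like the sketch below. The Euclidean norm and the example partitions are assumptions; a partition consisting only of singleton clusters has zero diameter and is not handled.

```python
import math

def dunn_index(clusters):
    """Eq. (2.9): smallest between-cluster gap over largest cluster diameter."""
    def diss(ci, cj):
        # Closest pair of points, one from each cluster.
        return min(math.dist(x, y) for x in ci for y in cj)

    def diam(c):
        # Furthest pair of points within one cluster.
        return max(math.dist(x, y) for x in c for y in c)

    max_diam = max(diam(c) for c in clusters)
    min_diss = min(diss(clusters[i], clusters[j])
                   for i in range(len(clusters))
                   for j in range(i + 1, len(clusters)))
    return min_diss / max_diam

well_separated = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_together = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
du_far = dunn_index(well_separated)    # 10 / 2
du_near = dunn_index(close_together)   # 3 / 2
```

Because the denominator does not depend on i and j, the double minimum in Eq. (2.9) reduces to the smallest dissimilarity divided by the largest diameter.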

2.5.2.3 Homogeneity and Separation Index

The homogeneity index is calculated as the average distance between each gene expression profile and the center of the cluster it belongs to. Mathematically it is represented as

$$H_{ave} = \frac{1}{N_{gene}}\sum_{i} D\big(g_i, C(g_i)\big) \tag{2.10}$$

where $g_i$ is the $i$th gene, $C(g_i)$ is the center of the cluster that $g_i$ belongs to, $N_{gene}$ is the total number of genes and $D$ is the distance function (Chen G. et al., 2002). The separation index is calculated as the weighted average distance between cluster centers, defined as

$$S_{ave} = \frac{1}{\sum_{i \neq j} N_{ci} N_{cj}}\sum_{i \neq j} N_{ci} N_{cj}\, D(C_i, C_j) \tag{2.11}$$

where $C_i$ and $C_j$ are the centers of the $i$th and $j$th clusters, and $N_{ci}$ and $N_{cj}$ are the numbers of genes in the $i$th and $j$th clusters. Thus $H_{ave}$ reflects the compactness of the clusters while $S_{ave}$ reflects the overall distance between clusters. Decreasing $H_{ave}$ or increasing $S_{ave}$ suggests an improvement in the clustering results (Chen G. et al., 2002).
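Eqs. (2.10) and (2.11) can be computed together, as in this short sketch. It assumes that the distance function D is Euclidean and that cluster centers are arithmetic means; the example partition and the function name are hypothetical.

```python
import math

def homogeneity_separation(clusters):
    """Return (H_ave, S_ave) of Eqs. (2.10) and (2.11)."""
    centres = [tuple(sum(c) / len(cl) for c in zip(*cl)) for cl in clusters]
    n_total = sum(len(cl) for cl in clusters)
    # H_ave: mean distance of every object to its own cluster centre.
    h_ave = sum(math.dist(x, centres[k])
                for k, cl in enumerate(clusters) for x in cl) / n_total
    # S_ave: distance between centres, weighted by the product of cluster sizes.
    num = den = 0.0
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j:
                weight = len(clusters[i]) * len(clusters[j])
                num += weight * math.dist(centres[i], centres[j])
                den += weight
    return h_ave, num / den

partition = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
h_ave, s_ave = homogeneity_separation(partition)
```

In this example every object lies one unit from its cluster centre and the centres are ten units apart, so a compact, well-separated partition shows a small $H_{ave}$ and a large $S_{ave}$.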


Table 2.4 Internal validity indices

| Internal Indices | Evaluation Criteria |
| --- | --- |
| Silhouette index | A larger Silhouette value indicates a better quality of a clustering result [Chen et al. 2002] |
| Davies-Bouldin index | A low value indicates good cluster structures [Kasturi et al. 2003; Bolshakova et al. 2003] |
| Calinski-Harabasz index | It is the pseudo F statistic, which evaluates the clustering solution by looking at how similar the objects are within each cluster and how well the objects of different clusters are separated [Zhao et al. 2005] |
| Dunn index | Large values indicate the presence of compact and well-separated clusters [Bolshakova et al. 2003; Halkidi et al. 2001] |
| R-squared index | A large R-squared statistic indicates a large difference between clusters [Halkidi et al. 2001] |
| Homogeneity & Separation indices | Improving Homogeneity & Separation suggests an improvement in clustering results [Sharan et al. 2003; Chen G. et al. 2002] |

2.6 Principal Component Analysis

In PCA an orthogonal transformation is applied to a set of correlated predictor variables to produce principal components. The principal components are fewer in number than the original variables and uncorrelated with one another, i.e. PCA reduces both dimensionality and multicollinearity. These components carry almost the same variability as the original data. Although this approach works well and has been used widely, it has a limitation: the components and their coefficients are completely different from the original variables, so it is not possible to determine which of the original variables best reflects the observed data.

2.7 Regional Homogeneity Test

The homogeneity of regions obtained from cluster analysis is assessed statistically using homogeneity tests. Examples of regional homogeneity tests include those proposed by Acreman and Sinclair (1986), Wiltshire (1986), Buishand (1989), Chowdhury et al. (1991), Lu and Stedinger (1992), Hosking and Wallis (1993, 1997), Fill and Stedinger (1995), Cunderlik and Burn (2006b), and Viglione et al. (2007). The L-moment based homogeneity test of Hosking and Wallis (1993), which is widely used by practicing hydrologists, is described in this section.


Hosking and Wallis (1993) proposed heterogeneity measures that use the advantages offered by sampling properties of L-moment ratios. A discussion of L-moments is found in Hosking and Wallis (1997). One of the prime advantages of using L-moment based methods for testing homogeneity is that they avoid assumptions about the form of the underlying probability distribution of the observed data.

In a homogeneous region all sites are supposed to have the same population L-moment ratios. However, their sample L-moment ratios (LMRs: L-coefficient of variation (L-CV), L-skewness and L-kurtosis) may differ due to sampling variability. The regional homogeneity tests examine whether the between-site dispersion of the sample LMRs for the group of sites under consideration is larger than the dispersion expected in a homogeneous region. Suppose the region to be tested for homogeneity has $N_R$ sites, with site $i$ having a peak flow record of length $n_i$. Further, let $t^{i}$, $t_3^{i}$ and $t_4^{i}$ denote the L-CV, L-skewness and L-kurtosis, respectively, at site $i$. The regional averages, represented by $t^{R}$, $t_3^{R}$ and $t_4^{R}$ respectively, are computed as

$$t^{R} = \frac{\sum_{i=1}^{N_R} n_i\, t^{i}}{\sum_{i=1}^{N_R} n_i} \tag{2.12}$$

$$t_3^{R} = \frac{\sum_{i=1}^{N_R} n_i\, t_3^{i}}{\sum_{i=1}^{N_R} n_i} \tag{2.13}$$

$$t_4^{R} = \frac{\sum_{i=1}^{N_R} n_i\, t_4^{i}}{\sum_{i=1}^{N_R} n_i} \tag{2.14}$$

where $n_i / \sum_{i=1}^{N_R} n_i$ denotes the weight applied to the sample LMRs at site $i$, which is proportional to the site's record length.

The homogeneity measures are based on three measures of dispersion:

(i) the weighted standard deviation of the at-site sample L-CVs ($V$);

(ii) the weighted average distance from the site to the group weighted mean in the two-dimensional space of L-CV and L-skewness ($V_2$);

(iii) the weighted average distance from the site to the group weighted mean in the two-dimensional space of L-skewness and L-kurtosis ($V_3$).

$$V = \left[\frac{\sum_{i=1}^{N_R} n_i \left(t^{i} - t^{R}\right)^{2}}{\sum_{i=1}^{N_R} n_i}\right]^{1/2} \tag{2.15}$$

$$V_2 = \frac{\sum_{i=1}^{N_R} n_i \left[\left(t^{i} - t^{R}\right)^{2} + \left(t_3^{i} - t_3^{R}\right)^{2}\right]^{1/2}}{\sum_{i=1}^{N_R} n_i} \tag{2.16}$$

$$V_3 = \frac{\sum_{i=1}^{N_R} n_i \left[\left(t_3^{i} - t_3^{R}\right)^{2} + \left(t_4^{i} - t_4^{R}\right)^{2}\right]^{1/2}}{\sum_{i=1}^{N_R} n_i} \tag{2.17}$$

In these dispersion measure, distance of sample LMRs for site 𝑖 from the regional average LMR is weighted proportionally to the record length of the site, thus allowing greater variability of LMRs for sites having small sample size in a region. Let 𝜇𝑉 , 𝜇𝑉2 𝑎𝑛𝑑 𝜇𝑉3 denote the mean and 𝜎𝑉 , 𝜎𝑉2 𝑎𝑛𝑑 𝜎𝑉3 the standard deviation of the N values of V, 𝑉2 𝑎𝑛𝑑 𝑉3 respectively. These statistics are used to estimate the following three heterogeneity measures (HMs):

$$H_1 = \frac{V - \mu_V}{\sigma_V} \tag{2.18}$$

$$H_2 = \frac{V_2 - \mu_{V_2}}{\sigma_{V_2}} \tag{2.19}$$

$$H_3 = \frac{V_3 - \mu_{V_3}}{\sigma_{V_3}} \tag{2.20}$$
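A minimal sketch of Eqs. (2.18)-(2.20) follows. It assumes that the N values of a dispersion measure are already available as an array; the name `heterogeneity_measure`, the `ddof` argument and the choice of the population standard deviation as default are assumptions made for illustration.

```python
import numpy as np

def heterogeneity_measure(v_obs, v_values, ddof=0):
    """H = (V - mu_V) / sigma_V for one dispersion measure.

    v_obs    : dispersion measure computed from the observed region
    v_values : the N values of the same measure used for mu_V and sigma_V
    ddof     : 0 for the population standard deviation (assumed default),
               1 for the sample standard deviation
    """
    v_values = np.asarray(v_values, dtype=float)
    return (v_obs - v_values.mean()) / v_values.std(ddof=ddof)

# Hypothetical values, for illustration only
print(heterogeneity_measure(0.08, [0.05, 0.06, 0.055, 0.065, 0.07]))
```

The same function applies to $V$, $V_2$ and $V_3$, giving $H_1$, $H_2$ and $H_3$ in turn.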

A region can be defined as “Acceptably homogeneous” if HM1100m, non-stationary process at high scales, No outlier in drainage area box-plot

3

16

1

Although non-stationary process are distributed at all scales but Process are significant only at scale 1, No outlier in drainage area box-plot

4

29

1

Non-stationary process distributed at all scales but seems to influence WPS at lesser extent, 4 stations are outlier in drainage area box-plot

In this study, an attempt has been made to relate the GWS values at different scales to the respective catchment areas. Figure 4.13 shows the box plot of the catchment areas of the streamflow stations in each cluster. It is observed that stations showing strongly non-stationary processes at high scales have relatively small catchment areas, while the opposite relation is observed at lower scales. Typically, it is believed that a smaller catchment area (Table 4.7, cluster 2) leads to more random and unstable flows and hence to more uncertainty at high scales (Fig. 4.10). Similarly, the sample stations of cluster 3 (area = 164 km2) and cluster 4 (area = 182 km2) show more uncertainty at high scales than the sample station of cluster 1 (area = 1154 km2), as is clear from Fig. 4.9. However, the results indicate that other factors also play a vital role in determining the wavelet power spectrum properties of a catchment; this demands a deeper analysis, which is beyond the scope of this study.

Table 4.7 Sample stations properties

| Station ID | State | Cluster Number | Area (km2) | Figure Number |
|---|---|---|---|---|
| 06207500 | MT | 1 | 1,154.00 | Fig. 8 (a) |
| 11058500 | CA | 2 | 8.80 | Fig. 8 (b) |
| 06710500 | CO | 3 | 164 | Fig. 8 (c) |
| 06289000 | MT | 4 | 182.00 | Fig. 8 (d) |


Figure 4.13 Box plot for drainage area of stream flow station in each cluster

Figure 4.14 WPS of sample stations selected for explanation belonging to cluster 1

Figure 4.15 WPS of sample stations selected for explanation belonging to cluster 2


Figure 4.16 WPS of sample stations selected for explanation belonging to cluster 3

Figure 4.17 WPS of sample stations selected for explanation belonging to cluster 4


Figure 4.18 Global wavelet spectrum for sample stations selected from cluster 1 (a), (b),(c),(d), cluster 2 (e),(f), cluster 3 (g), (h), (i),(j), cluster 4 (k),(l)

4.6 Conclusion

This study evaluates the ability of the proposed technique, i.e. regionalization based on the wavelet power spectrum coupled with k-means clustering, to regionalize western United States watersheds into hydrologically similar clusters. Applying k-means to the wavelet power spectrum of each watershed and using the minimum value of the residual sum of squares leads to four clusters, which appears to be the best number of clusters for the watersheds studied and agrees with the result of Sivakumar et al. (2012). To validate the number of clusters, the average silhouette value is plotted, which justifies the selection of four clusters. The plot of cluster quality also shows that there is significant structure present in the clusters. The homogeneity of each cluster is evaluated using the regional homogeneity test, which gives satisfactory results.

Sivakumar et al. (2012) selected similar streamflow stations and applied the correlation dimension method, which is based on data reconstruction and nearest-neighbor concepts, and classified the stations into four groups: low-dimensional, medium-dimensional, high-dimensional and unidentifiable. According to that study, the dimension estimates show some “homogeneity” in flow complexity within certain regions of the western US, but there are also some strong exceptions.

Our technique broadly reproduces the results of Sivakumar et al. (2012), but in a different manner. That study formed four groups: high-dimensional, medium-dimensional, low-dimensional and unidentifiable. Using the wavelet power spectrum as the criterion, our study allocates some of the medium-dimensional stations to either the high-dimensional or the low-dimensional group, i.e. to either cluster 1 or cluster 4 (Fig. 4.5). This indicates that a separate medium-dimensional class may not be needed, as some of these stations are hydroclimatically similar to the high-dimensional stations and others to the low-dimensional ones; alternatively, the dimensionality-based classification criteria (high-dimensional, medium-dimensional, etc.) could be redefined. The same holds for the unidentifiable stations: streamflow stations falling into the unidentifiable group in Sivakumar et al. (2012) are merged with either cluster 1 or cluster 4, while the stations that behave quite differently fall in cluster 3.

Overall, the investigated technique for regionalizing watersheds into homogeneous clusters appears to be superior to other proposed techniques, as it involves no assumptions and is robust to data scarcity. The wavelet power spectrum coupled with the k-means clustering technique captures the variability of streamflow dynamics at each station independently and then allows the formation of homogeneous clusters that are not based on any a priori assumptions. This observation has very important implications for prediction in ungauged basins. Further, to improve homogeneity, operations such as those based on the discordancy measure can be performed on the heterogeneous stations, as suggested by Rao and Srinivas (2008); this is beyond the scope of this study and is left for future work.

HYDROLOGIC REGIONALIZATION USING WAVELET BASED MULTISCALE ENTROPY METHOD

5.1 Study region and data explanation

The streamflow records for the United States portion of the study area are selected from the U.S. Geological Survey (USGS) Hydro-Climatic Data Network (HCDN). The HCDN dataset contains streamflow observations from USGS stream gauges that are considered to be relatively unaffected by anthropogenic influences, land-use changes, measurement changes and measurement errors. In this study, monthly streamflows from across the United States (US) are studied, with data collected over an extensive network of 530 gauging stations (see Fig. 5.1).

The stations are spread across the US, including Arizona (AZ), California (CA), Colorado (CO), Idaho (ID), Montana (MT), Nevada (NV), New Mexico (NM), Oregon (OR), Utah (UT), Washington (WA) and Wyoming (WY), among other states. Streamflow data in the US are commonly expressed in “water years”, which commence in October. The records used in this study are average monthly streamflow values observed over a period of 52 years, starting in October 1951 and ending in September 2003. The magnitude of streamflow varies greatly among the 530 stations (even during the same period) as well as within a station (at different periods).

The drainage areas range from as small as 50 km2 to as large as 2000 km2. Notable observations of the flow variations (during the 52-yr period of 1951–2002) are as follows:


Table 5.1 Characteristics of streamflow data selected for study

| Dataset | Characteristics |
|---|---|
| Total number of streamflow stations selected for study | 530 catchments |
| Data type | Monthly data |
| Time period of streamflow data at each station | 1951-2002 |
| Range of area of selected catchments | 50 to 2000 km2 |
| Range of longitude (decimal degrees) | -124.070 to -67.935 |
| Range of latitude (decimal degrees) | 26.932 to 48.999 |

• the mean flows range from as low as 4.966 m3 s−1 at Station #06606600 (Little Sioux River at Correctionville, IA) to as high as 4443.4238 m3 s−1 at Station #07013000 (Meramec River near Steelville, MO);

• the standard deviation values range from as low as 66.8294 m3 s−1 at Station #06464500 (Keya Paha River at Wewela, SD) to as high as 4755.5 m3 s−1 at Station #13336500 (Selway River near Lowell, ID);

• the coefficient of variation (CV) values (defined as the standard deviation divided by the mean) range from as low as 0.11304 at Station #06797500 in NE to as high as 4.324 at Station #10258500 in CA;

• the maximum flow observed was 2339 m3 s−1 at Station #13317000 (the minimum flow at this station was 64 m3 s−1), while the flow was zero at 15 stations at one time or another.

All these observations clearly reflect the extreme variability in streamflow among the 530 stations. The variability in streamflow is due to, among others: (1) the different climatic regions across the US; (2) the different drainage basin characteristics associated with the streamflow stations; and (3) the variations in hydroclimatic factors and land-use changes over time at any of these stations.


Figure 5.1 Selected USA stream flow station for Methodology II

5.2 Methodology II

Figure 5.2 shows the schematic of the methodology proposed in this study. The streamflow data from all stations are standardized (to reduce the redundancy of the data), and the CWT with the Morlet wavelet is applied to each time series to obtain wavelet coefficients at different time-frequency scales. The wavelet coefficients, which are influenced only by local features and therefore provide a better measure of the variance attributed to localized events, are used to obtain multiscale entropy coefficients. These multiscale entropy coefficients are then used to form homogeneous clusters using the k-means clustering technique.


Figure 5.2 Schematic for methodology implemented in the study

5.2.1 Multiscale Entropy

In order to gauge the complexity of a time series (such as the streamflow time series in this study), the wavelet coefficients produced by the CWT analysis of the time series (section 2.2) can be utilized to obtain the multiscale wavelet entropy coefficient using the Shannon entropy measure (Shannon, 1948), which is defined as:

$$S_{wt}(x) = -\sum_{i=1}^{n} P(x_i)\,\ln P(x_i) \tag{5.1}$$

where $P(x_i)$ is the probability distribution function (pdf) used to describe the random behavior of the variable $x$ of length $n$. Entropy is a measure of the statistical variability of the random variable $x$ as described by the pdf. The base of the logarithm is arbitrary, but if base 2 is used, entropy is measured in bits. $S_{wt}(x)$ is a measure of the information content of the signal; more information corresponds to a lower entropy value and vice versa. Therefore, a high value of entropy represents a high degree of unpredictability and, hence, a highly complicated and disordered hydrologic system. To estimate the pdf $P(x_i)$ in Equation (5.1), Cek et al. (2009) proposed an entropy based on the wavelet energy distribution of a time series. Because the entropy $S_{wt}(x)$ calculated in this way is based on the wavelet results, Sang et al. (2011) were able to propose four entropy measures, namely continuous wavelet entropy, discrete wavelet entropy, continuous relative wavelet entropy, and discrete relative wavelet entropy. The present study uses the approach of Sang et al. (2011) to propose a new entropy measure, named wavelet-based multiscale entropy (WME). The CWT-based pdf $P(x_i)$ is estimated from the wavelet energy (i.e., variance):

$$P(x_i) = \frac{E(i,j)}{E(j)} = \frac{|W(i,j)|^2}{\sum_{i} |W(i,j)|^2} \tag{5.2}$$

where $E(i,j)$ represents the wavelet energy at time position $i$ and timescale $j$, and $E(j)$ represents the total wavelet energy of the time series at timescale $j$ (Cek et al., 2009; Sang et al., 2011).
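The computation in Eqs. (5.1) and (5.2) can be sketched as follows. This is a minimal illustration that assumes the CWT coefficients are already available as a two-dimensional array with one row per scale and one column per time position; the function name `wavelet_multiscale_entropy` and this array layout are assumptions, not part of the original study.

```python
import numpy as np

def wavelet_multiscale_entropy(coeffs):
    """Shannon entropy of the wavelet energy distribution at each scale.

    coeffs : array of shape (n_scales, n_times) holding the CWT
             coefficients W(i, j), possibly complex (e.g. Morlet).
    Returns an array of length n_scales with one entropy value per scale.
    """
    energy = np.abs(np.asarray(coeffs)).astype(float) ** 2  # E(i, j)
    total = energy.sum(axis=1, keepdims=True)               # E(j)
    # P(x_i) = E(i, j) / E(j); rows with zero total energy are left at zero
    p = np.divide(energy, total, out=np.zeros_like(energy), where=total > 0)
    # The term 0 * ln(0) is taken as 0
    logp = np.log(p, out=np.zeros_like(p), where=p > 0)
    return -(p * logp).sum(axis=1)

# Two limiting cases: energy spread evenly in time, and a single spike
even = np.ones((1, 8))
spike = np.zeros((1, 8))
spike[0, 3] = 5.0
print(wavelet_multiscale_entropy(np.vstack([even, spike])))
```

In the first row the energy is spread evenly over time, which gives the largest possible entropy for that length, while the single spike in the second row gives zero entropy; this matches the interpretation of high entropy as disorder.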

Figure 5.3 Plot shows the variation of MWE with the scale of decomposition

To illustrate the concept of multi-scale wavelet entropy, a synthetic time series 𝑆5 is analyzed. 𝑆5 is obtained through linear combination of a stationary time series 𝑆1, a linear component 𝑆2 , a nonlinear signal 𝑆3 and random noise 𝑆4 of range 0 to 10. These signals are mathematically described below:


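$$S_1 = \sin\left(\frac{2\pi t}{50}\right) + \cos\left(\frac{2\pi t}{60}\right) \tag{5.3}$$

$$S_2 = \frac{t}{200} \tag{5.4}$$

$$S_3 = 10\,S_1^2 + S_1 + \frac{t^2}{10^4} \tag{5.5}$$

$$S_4 = \text{random noise in } [0, 10] \tag{5.6}$$

$$S_5 = S_1 + S_2 + S_3 + S_4 \tag{5.7}$$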
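For readers who wish to reproduce a comparable test signal, a possible construction is sketched below. It reads Eqs. (5.3)-(5.7) as $S_1 = \sin(2\pi t/50) + \cos(2\pi t/60)$, $S_2 = t/200$ and $S_3 = 10 S_1^2 + S_1 + t^2/10^4$. The series length, the unit time step, the uniform distribution of the noise and the random seed are not stated in the text and are assumptions.

```python
import numpy as np

def synthetic_series(n=1024, seed=0):
    """Build the synthetic series S5 and its components.

    n and seed are illustrative assumptions; the noise S4 is assumed
    to be uniformly distributed on [0, 10).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    s1 = np.sin(2 * np.pi * t / 50) + np.cos(2 * np.pi * t / 60)  # stationary part
    s2 = t / 200                                                  # linear component
    s3 = 10 * s1 ** 2 + s1 + t ** 2 / 1e4                         # nonlinear signal
    s4 = rng.uniform(0, 10, size=n)                               # random noise
    s5 = s1 + s2 + s3 + s4
    return t, (s1, s2, s3, s4), s5

t, components, s5 = synthetic_series()
print(s5[:5])
```

The resulting array can be passed to any CWT routine, and the coefficient matrix can then be used to compute the entropy at each scale.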

The resultant synthetic time series, the plot of the wavelet coefficients and the multiscale entropy are shown in Figure 5.4. The wavelet coefficient plot shows that the time series has features with periods of 16 and 32 units. It can also be seen that there is a strong trend, which is captured at the scale of 128 units. It is evident that the multiscale entropy is sensitive to these features of the time series. High entropy values are observed at periods of 16, 32 and 128 units, indicating a low degree of orderliness and inconsistent features around these periods. Further, dips in the multiscale entropy plot around periods 30 and 50 indicate that strong and ordered features exist in the signal around these periods (belonging to $S_3$ and $S_1$, respectively). Lower entropy values indicate orderliness, and higher entropy values indicate variability.


Figure 5.4 Illustration for multiscale entropy. Top Figure: Plot of the synthetic time series. Bottom Left: Plot of the wavelet coefficients Bottom Right: Plot of entropy values across different scales.

5.2.2 Comparison of WME and GWS coefficient

Fig. 5.5 indicates that, up to scale 3, the process in the catchment is stationary and highly predictable; hence the entropy, which is a measure of disorder, is very low (close to zero). Initially the entropy shows a decreasing trend, possibly due to the absence of significant features at these scales. Over the scale of 3–8 months, the WME increases steadily, marking high variability at these scales. For the 9–13 month scale, the WME first increases and then decreases. The steep increase in WME up to a scale of 11 months can be attributed to a decrease in information and increased randomness of the wavelet coefficients at intermediate scales. As all streamflow stations have very distinct annual patterns, the decrease in WME at around the 12–13 month scale is well justified. Beyond the 14-month scale, the trend in WME becomes highly variable, which may be due to the irregular occurrence of features at these scales. As the scale increases, the information content decreases and unpredictability increases, giving high entropy values at large scales, i.e. at high scales the process becomes highly random.

Figure 5.5 Normalized Wavelet-Entropy and GWS coefficient for sample station

5.2.3 K-means clustering

As explained in section 4.3.2, the optimal number of clusters is decided by plotting $(RSS)_{min}(K)$ against $K$ (Fig. 5.6), which is a monotonically decreasing function, and locating the “knee” point of the curve, i.e. the point where successive decreases in $(RSS)_{min}(K)$ become noticeably smaller.

$$RSS_{min}(K) = \sum_{q=1}^{K} \sum_{i=1}^{n_q} \sum_{l=1}^{p} \left( x_{i,l} - \bar{x}_{q,l} \right)^2 \tag{5.8}$$

Here, the value of the objective function $RSS_{min}$ is calculated for cluster numbers from k = 3 to 25. It represents the sum of squared distances between the centroid of each cluster and all the points in that cluster. The minimum value occurs when k = n. However, the error becomes very small with only a few clusters, and there is a point beyond which increasing the number of clusters does not produce a significant reduction in the objective function. This is clear from Figure 5.6 shown below.


Figure 5.6 Variation of 𝑹𝑺𝑺_𝒎𝒊𝒏 with number of cluster

Therefore, the k-means analysis suggests that the streamflow could be adequately described by fourteen distinct clusters.
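The knee-point procedure can be illustrated with a small, self-contained sketch. It uses a plain NumPy implementation of Lloyd's algorithm with random restarts as a stand-in for whatever k-means routine was used in the study; the function name `kmeans_rss`, the number of restarts and the test data are assumptions made for illustration.

```python
import numpy as np

def kmeans_rss(X, k, n_init=10, n_iter=100, seed=0):
    """Smallest residual sum of squares found over n_init random
    restarts of Lloyd's algorithm with k clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Initialize centroids with k distinct observations
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Empty clusters keep their previous centroid
            new = np.array([X[labels == q].mean(axis=0) if np.any(labels == q)
                            else centroids[q] for q in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Squared distance of every point to its nearest final centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best

# Hypothetical feature matrix: 60 stations x 14 entropy values
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 14))
curve = {k: kmeans_rss(X, k) for k in range(3, 11)}
for k, rss in curve.items():
    print(k, round(rss, 2))
```

Plotting the values in `curve` against k and looking for the point where the decrease flattens corresponds to the knee-point selection described above.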

5.3 Results

Figure 5.7 shows the normalized wavelet power spectrum, i.e. $|W_n(s)|^2/\sigma^2$, of streamflow for the St. John River at Dickey, Maine. The figure shows the plot of wavelet coefficients and the global wavelet spectrum. The wavelet coefficient plot shows the presence of sub-annual and decadal (64 to 128 months) features apart from the annual cycle. The relative power of each of these features is shown in the global wavelet power spectrum. The wavelet multiscale entropy was estimated for all the stations across different scales, and the entropy values across scales are used as the basis for clustering. The optimal number of clusters was decided using different validation indices, whose values are plotted in Figure 5.8. The Dunn index and the DB index indicate that the optimal number of clusters is 14; beyond 14 clusters there is no significant improvement in terms of the homogeneity and separation indices.

The streamflow stations were segregated into 14 clusters using the WME-based method explained in the previous section. Table 5.2 shows the number of stations in each cluster, and Figure 5.9 shows the geographical locations of the stations in each cluster. It can be observed that, apart from geographic contiguity, the clustering reflects hydrologic similarity within the clusters. The stations in each cluster were further examined for common characteristics in terms of multiscale entropy. It was observed that the entropy at each scale varies little across the stations of a given cluster.
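As an example of how one such validation index can be evaluated, the sketch below computes the Davies-Bouldin index, assuming that this is what the "DB index" in the text denotes. The function name `davies_bouldin` and the toy data are illustrative; lower values indicate more compact and better separated clusters.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index for a given partition (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Mean distance of the members of each cluster to their centroid
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(ids)
    ])
    # Pairwise centroid distances; the diagonal is excluded from the maximum
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    return ratios.max(axis=1).mean()

# Toy example with two well separated groups
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
print(davies_bouldin(X, [0, 0, 1, 1]))
```

Evaluating the index for each candidate number of clusters and comparing the values is one way to support the choice made from the RSS curve.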

Figure 5.7 Wavelet power spectrum and Global power spectrum of stream flow time series for the station (01010500) in St. John River at Dickey, Maine

Figure 5.8 Validation indices for selection of optimum number of clusters

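Table 5.2 Number of stations in each cluster

| Cluster No | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No of stations | 28 | 50 | 65 | 43 | 67 | 5 | 59 | 7 | 33 | 33 | 52 | 13 | 49 | 26 |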

Figure 5.9 Cluster- wise geographical distribution of stream flow stations


Table 5.3 USGS streamflow stations in their respective clusters

| Cluster no | USGS site number |
|---|---|
| Cluster 1 | 02059500, 02064000, 05062000, 05489000, 06600500, 06799500, 06809500, 06817000, 06820500, 06889500, 06898000, 06899500, 07152000, 07153000, 07172000, 07180500, 07208500, 07243500, 07300500, 08380500, 09430500, 10174500, 10183500, 10234500, 10329500, 11152000, 08070000, 8164000 |
| Cluster 2 | 01452000, 01495000, 01568000, 01573000, 01574000, 01608500, 01614500, 01619500, 01632000, 01634000, 1634500, 01635500, 01637500, 01643000, 01644000, 01649500, 01664000, 01668000, 01674000, 02042500, 02045500, 02046000, 02051500, 02085500, 02226500, 02245500, 03285000, 03345500, 03346000, 03379500, 04094000, 05405000, 05412500, 05426000, 05438500, 05439500, 05440000, 05447500, 05466500, 05554500, 05556500, 05569500, 05570000, 05577500, 05579500, 05580000, 05582000, 06919500, 07189000, 12431000 |
| Cluster 3 | 01022500, 01127000, 01181000, 01197000, 01197500, 01371500, 01372500, 01379000, 01379500, 01396500, 01408500, 01411000, 01411500, 01413500, 01420500, 01439500, 01440000, 01445500, 01534000, 01539000, 01541000, 01543000, 01543500, 01548500, 01550000, 02132000, 02133500, 02337000, 03010500, 03011020, 03015500, 03034500, 03049000, 03070500, 03080000, 03093000, 03102500, 03109500, 03110000, 03112000, 03117500, 03182500, 03183500, 03184000, 03186500, 03198500, 03281500, 03329700, 03366500, 03410500, 03421000, 03473000, 03524000, 03540500, 04200500, 04214500, 04215500, 04217000, 04230500, 07029500, 12083000, 14020000, 14154500, 14178000, 14185000 |
| Cluster 4 | 01118000, 01544500, 02017500, 02018000, 02054500, 02055000, 02111500, 02118000, 02138500, 02143000, 02177000, 02333500, 03032500, 03118500, 03144000, 03146500, 03157000, 03157500, 03164000, 03167000, 03170000, 03173000, 03301500, 03433500, 03434500, 03439000, 03443000, 03446000, 03465500, 03471500, 03479000, 03488000, 03500000, 03504000, 04185000, 05525000, 08010000, 09310500, 10309000, 11264500, 11266500, 12330000, 14325000 |
| Cluster 5 | 01030500, 01047000, 01052500, 01055000, 01057000, 01064500, 01076500, 01078000, 01134500, 01137500, 01144000, 01169000, 01334000, 01334500, 02126000, 03020500, 03328500, 04010500, 04027000, 04040500, 04071000, 04073500, 04100500, 04105000, 04105500, 04113000, 04117500, 04124000, 04164000, 04165500, 04166500, 04168000, 04178000, 04180000, 04231000, 04256000, 04292000, 04293500, 05130500, 05131500, 05362000, 05393500, 05397500, 05399500, 06207500, 06775500, 09059500, 09112500, 09124500, 09255000, 09292500, 09304500, 10032000, 11230500, 12054000, 12056500, 12205000, 12332000, 13011000, 13120000, 13120500, 13185000, 13186000, 13235000, 13240000, 13258500, 13313000 |
| Cluster 6 | 01646000, 06846500, 06873000, 08408500, 10258500 |
| Cluster 7 | 01127500, 01176000, 01193500, 01350000, 01447500, 01530500, 01532000, 01555000, 01564500, 02088000, 02134500, 02154500, 02192000, 02198000, 02217500, 02317500, 02342500, 02347500, 02359000, 02371500, 02374500, 02392000, 02472000, 02475500, 03111500, 03208500, 03237500, 03238500, 03298000, 03303000, 03334500, 03378000, 03380500, 03438000, 03604000, 04027500, 04087000, 04099510, 04115000, 04146000, 04234000, 05379500, 05466000, 05539000, 06225500, 06289000, 07052500, 07056000, 07057500, 07066000, 07364150, 07376500, 07377500, 07378000, 08247500, 09279000, 11342000, 13139500, 14308000 |
| Cluster 8 | 05120500, 05479000, 06349500, 09499000, 11124500, 08189500, 08198000 |
| Cluster 9 | 01583500, 01586000, 01616500, 01639500, 02070000, 02369000, 05413500, 05414000, 05418500, 05421000, 05432500, 05434500, 05435500, 05453000, 05455500, 05463000, 05471500, 05472500, 05484000, 05486000, 05486490, 05487470, 05498000, 05584500, 05587000, 06409000, 06464500, 06609500, 06808500, 07176500, 07177500, 07187000, 07191000 |
| Cluster 10 | 01628500, 01631000, 01645000, 01666500, 02039000, 02039500, 02040000, 02041000, 02065500, 02074500, 02256500, 02267000, 02270500, 02296750, 02301500, 02310000, 02329000, 02376500, 02467500, 05444000, 05452000, 05457000, 05458500, 05458900, 05459500, 05481000, 05482500, 05500000, 05501000, 06601000, 06607200, 06608500, 12093500 |
| Cluster 11 | 01558000, 01559000, 01560000, 01562000, 01596500, 01606500, 02203000, 02225500, 02387000, 02398000, 02450000, 03050500, 03051000, 03066000, 03069500, 03075500, 03078000, 03180500, 03219500, 03220000, 03230500, 03265000, 03275000, 03331500, 03339500, 03349000, 03351500, 03361500, 03362000, 03362500, 03363500, 03364000, 03364500, 03406500, 03512000, 03528000, 03531500, 03550000, 03574500, 04176500, 04198000, 05394500, 05515500, 05516500, 05517000, 05517500, 05518000, 05525500, 05572000, 12048000, 14166500, 14209500 |
| Cluster 12 | 05066500, 05313500, 05317000, 06339500, 06483500, 06710500, 06797500, 06815000, 07203000, 07226500, 09471000, 08070500, 08172000 |
| Cluster 13 | 01477000, 01487000, 01491000, 01546500, 01555500, 01582000, 02027000, 02028500, 02061500, 02088500, 02108000, 02231000, 02232000, 02361000, 02492000, 03139000, 03302000, 03320500, 03325500, 05408000, 05495000, 05495500, 05497000, 5585000, 06892000, 06894000, 06932000, 07013000, 07014500, 07016500, 07018500, 07067000, 07068000, 07071500, 07167500, 07186000, 07196500, 07373000, 07375500, 08013000, 08291000, 08378500, 09330500, 11367500, 11381500, 11383500, 11501000, 14113000, 08041500 |
| Cluster 14 | 12020000, 12035000, 12082500, 12098500, 12134500, 12149000, 12167000, 12175500, 12186000, 12189500, 12306500, 12355500, 12358500, 12413000, 12414500, 12451000, 12488500, 13336500, 13337000, 14137000, 14182500, 14190500, 14222500, 14301000, 14301500, 14305500 |


Figure 5.10 Multiscale entropy values for five selected clusters: (a) Cluster 2; (b) Cluster 5; (c) Cluster 8; (d) Cluster 9; and (e) Cluster 12.


Figure 5.10 shows, for example, the multiscale entropy values for the stations in five selected clusters: Clusters 2, 5, 8, 9 and 12. As can be seen, the multiscale entropy values are, to a great extent, similar within any given cluster. Also, the pattern of entropy across scales is characteristic of each cluster and differs from one cluster to another. As Figure 5.10 shows, the basis of the clustering is the entropy signature of the streamflow observed at the stations within a given cluster. For example, in Cluster 2 (Figure 5.10(a)), the entropy signatures of all the stations are more or less similar. The peaks in the plots correspond to high entropy values, which in turn correspond to high variability of the feature at that scale across time.

Further, to examine the homogeneity of each cluster obtained from WME coupled with K-means clustering, the regional homogeneity test (Section 2.7) and the discordancy measure test (Section 2.8) are performed. The results are promising and show that most stations are homogeneous with respect to the other stations in their cluster and heterogeneous with respect to the stations in other clusters. Only the results of the discordancy measure are presented here, in Figures 5.11 to 5.14. The results of the homogeneity test are not presented for Methodology II, as they were similarly promising and the test has already been illustrated for Methodology I.


Figure 5.11 Discordancy measure test for cluster 1 to 4


Figure 5.12 Discordancy measure test for cluster 5 to 8

Figure 5.13 Discordancy measure test for cluster 9 to 12


Figure 5.14 Discordancy measure test for cluster 13 and 14

Table 5.4 shows the number of discordant stations in each cluster; these are represented by red dots in Figures 5.11 to 5.14. The criterion for deciding whether a site is discordant is given in Table 2.6 (Section 2.8) and depends on the number of sites in the cluster, which is given in Table 5.2. The discordant stations can be analyzed further to increase the homogeneity of the clusters, as suggested in Section 7 under the heading of future scope.


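Table 5.4 Number of discordant sites

| Cluster Number | Number of Discordant sites |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 3 |
| 4 | 2 |
| 5 | 1 |
| 6 | 0 |
| 7 | 1 |
| 8 | 2 |
| 9 | 0 |
| 10 | 0 |
| 11 | 1 |
| 12 | 0 |
| 13 | 1 |
| 14 | 0 |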

Further, to make the analysis meaningful and simple, the average entropy over a given cluster is used instead of the individual station-wise entropy values. The average entropy of all member stations of a cluster at each scale is taken as the representative value of entropy for that cluster for the given scale. Average entropy values at each scale are obtained for each of the cluster and further analysis is carried out.

5.4 Discussion

The streamflow stations were segregated into 14 clusters as explained in the previous sections. The clusters were further examined for the multiscale entropy of the stations falling into them. It is observed that the entropy at each scale varies little across the stations of a given cluster. Figure 5.15 shows the variation in entropy across multiple scales for all stations in Cluster 5.

Cluster 5 is chosen for illustration as it contains the largest number of streamflow stations (67), as given in Table 5.2. Hence, the average entropy of all member stations of a cluster at each scale is taken as the representative WME of that cluster for the given scale. Average entropy values at each scale are obtained for each cluster, and further analysis is carried out.

Figure 5.15 Variation of WME across scales for all stations in Cluster 5

The results (Fig. 5.16) indicate that the average scale-wise WME values of all clusters display a similar pattern. In most cases, the WME first decreases to a minimum at scales of up to 2 months, possibly due to the absence of significant features at these scales. Over the scale of 3–8 months, the WME increases steadily, marking high variability at these scales. For the 9–13 month scale, the WME first increases and then decreases. The steep increase in WME up to a scale of 11 months can be attributed to a decrease in information and increased randomness of the wavelet coefficients at intermediate scales. As all streamflow stations have very distinct annual patterns, the decrease in WME at around the 12–13 month scale is well justified. Beyond the 14-month scale, the trend in WME becomes highly variable. This may be due to the irregular occurrence of features at these scales, which are therefore not selected for further discussion.

Figure 5.16 Comparison of WME (Normalized) for each scale for all clusters

Three distinct bands are identified for further analysis. Band 1 captures the features up to the 2-month scale, while Band 2 and Band 3 capture the features at scales of 3 to 8 months and 9 to 14 months, respectively. Features with scales beyond 14 months are grouped under Band 4. Figure 5.16 shows the average normalized WME of all the clusters in the different bands (the WME values are normalized for better comparison). It can be seen that there is a clear distinction in the values of WME for different clusters in different bands. The WME of each cluster is further classified into “High”, “Medium” and “Low” categories based on the position of the individual WME plot with respect to the mean level for that band. If the WME of a cluster, for a given band, falls below the mean WME of all clusters, then that particular cluster is assigned a signature of 'Low'. Using this classification, an entropy signature is given to each cluster based on the entropy values in the three scale-based bands. For notational simplicity, the classifications “High”, “Medium” (listed as “Moderate” in Table 5.5) and “Low” are represented by “1”, “0” and “-1”, respectively. The resulting entropy signature of each cluster is given in Table 5.5.

Table 5.5 Entropy signature of all clusters (Band 1 to Band 3: comparative observation of WE)

| Cluster no. | Band 1 | Band 2 | Band 3 | Entropy Signature |
|---|---|---|---|---|
| 1 | Moderate | Moderate | Moderate | (0, 0, 0) |
| 2 | Moderate | Low | Moderate | (0, -1, 0) |
| 3 | Low | Moderate | High | (-1, 0, 1) |
| 4 | Low | Moderate | High | (-1, 0, 1) |
| 5 | Low | High | High | (-1, 1, 1) |
| 6 | High | Low | Low | (1, -1, -1) |
| 7 | Moderate | High | High | (0, 1, 1) |
| 8 | High | Low | Low | (1, -1, -1) |
| 9 | Moderate | Low | Moderate | (0, -1, 0) |
| 10 | High | Moderate | Moderate | (1, 0, 0) |
| 11 | Low | Moderate | High | (-1, 0, 1) |
| 12 | High | High | Moderate | (1, 1, 0) |
| 13 | Moderate | Moderate | High | (0, 0, 1) |
| 14 | Moderate | High | High | (0, 1, 1) |

As is clear from Table 5.5, an entropy signature of (0, -1, 1) indicates that the cluster has relatively moderate entropy for scales 1–2, low entropy for scales 3–8 and high entropy for scales 9–14.
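The assignment of signatures can be sketched as below. The text specifies only that the classification is made relative to the mean WME of all clusters in a band; the tolerance `tol` that separates the moderate class from the high and low classes, the function name `entropy_signature` and the example values are hypothetical choices made for illustration.

```python
import numpy as np

def entropy_signature(band_wme, tol=0.05):
    """Classify band-averaged WME relative to the band mean.

    band_wme : array of shape (n_clusters, n_bands) with the average
               normalized WME of each cluster in each band.
    tol      : assumed half-width of the moderate class around the mean.
    Returns an integer array with 1 (High), 0 (Moderate) and -1 (Low).
    """
    band_wme = np.asarray(band_wme, dtype=float)
    # Deviation of each cluster from the mean of all clusters, per band
    dev = band_wme - band_wme.mean(axis=0, keepdims=True)
    return np.where(dev > tol, 1, np.where(dev < -tol, -1, 0))

# Hypothetical band averages for three clusters and three bands
example = [[0.5, 0.2, 0.9],
           [0.5, 0.5, 0.5],
           [0.5, 0.8, 0.1]]
print(entropy_signature(example))
```

Each row of the result is the signature of one cluster, in the same (Band 1, Band 2, Band 3) order as in Table 5.5.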


Further, a statistical analysis of the resultant clusters (1 to 14) has been carried out. Figure 5.17 shows box plots of the mean and maximum streamflow of the stations in each of the fourteen clusters. Most stations within a cluster show statistically similar variation, and the clusters are even able to capture peak flows.

Figure 5.17 Statistical properties of clusters

Figure 5.18 Box plot for drainage area of stream flow stations in all clusters

As a further step in the analysis, an attempt is made to relate the WME values in the different scale-based bands to the respective catchment areas. Figure 5.18 is a box plot of the drainage areas of the streamflow stations in each cluster. It is observed that clusters with 'High' entropy for the 9–13 month scales have characteristically small drainage areas, whereas clusters characterized by 'Low' entropy for these scales have large drainage areas. This observation is consistent with the general notion that a catchment with a smaller area is characterized by high variability and unstable properties. The observation remains preliminary, as other factors also play a vital role in determining the properties of a catchment and hence demand an in-depth analysis. Such an investigation, however, is beyond the scope of this study.


CONCLUSION

This study has presented a novel method for catchment regionalization using multiscale wavelet entropy. Application of the method to streamflow data from 530 monitoring stations in the contiguous United States offers promising results for regionalization. The results lead to the following highlights:

➢ The study proposes a robust approach for the regionalization of hydrologic catchments that couples k-means clustering with wavelet-based multiscale entropy; it addresses the limitations of previous approaches and is quite efficient.
➢ The wavelet-based multiscale entropy (WME) technique captures the variability of streamflow dynamics at each station independently and then allows the formation of homogeneous clusters that are not based on any a priori assumptions.
➢ The study clearly shows that the clusters are not based on the proximity of stations, i.e. "near" does not mean "similar". Consequently, extrapolation and interpolation may not always work, even when using data from nearby catchments. This observation has very important implications for prediction in ungauged basins.
➢ WME appears to be an important statistic for capturing catchment characteristics. On this basis, the 530 stations under study can be categorized into 14 clusters, each having a distinct WME pattern across the scales under consideration.
➢ Based on the pattern of the average WME of each cluster for scales 1 to 14, a characteristic signature is assigned to each catchment, which approximates the WE of the catchment across scales 1-2, 3-8 and 9-14 relative to other stations.
➢ The precise cause of the fluctuations in WME at different scales remains to be investigated, although drainage area seems to be one of the factors.
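The coupling of a multiscale entropy feature vector with k-means can be illustrated with a minimal, standard-library Python sketch. It is not the implementation used in this study: it assumes a Haar wavelet, defines the entropy at each scale as the Shannon entropy of the normalized squared detail coefficients, and uses a plain Lloyd-type k-means. The function names (`haar_details`, `shannon_entropy`, `wme_vector`, `kmeans`) are hypothetical, and the wavelet and normalization adopted in the thesis may differ.

```python
import math
import random


def haar_details(series, levels):
    """Return Haar detail coefficients for scales 1..levels (finest first)."""
    approx = list(series)
    details = []
    for _ in range(levels):
        pairs = len(approx) // 2  # an unpaired trailing sample is dropped
        details.append([(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2)
                        for i in range(pairs)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2)
                  for i in range(pairs)]
    return details


def shannon_entropy(coeffs):
    """Shannon entropy of the normalized energy distribution at one scale."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0.0:
        return 0.0  # assumed convention: no energy at this scale
    return -sum((e / total) * math.log(e / total) for e in energy if e > 0.0)


def wme_vector(series, levels):
    """One entropy value per scale; this vector is the clustering feature."""
    return [shannon_entropy(d) for d in haar_details(series, levels)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd iterations with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels[i] = dists.index(min(dists))
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In use, each station's streamflow series would be passed through `wme_vector`, and the resulting list of vectors (one per station) would be passed to `kmeans`. Because the features are computed per station, station coordinates never enter the clustering, which is why the resulting clusters need not be geographically contiguous.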

FUTURE SCOPE

A primary focus of the present study was to develop a robust regionalization tool based on wavelet-based multiscale entropy coupled with a clustering technique. The results and conclusions show that the technique is quite efficient and yields homogeneous clusters. The homogeneity of the clusters can be increased further with the help of the measures suggested by Hosking and Wallis (1997) and Rao and Srinivas (2008). The options suggested by Hosking and Wallis (1997) for adjusting the regions resulting from a clustering algorithm include: (i) eliminating (or deleting) one or more sites from the data set; (ii) transferring (or moving) one or more sites from a region to other regions; (iii) dividing a region to form two or more new regions; (iv) allowing a site to be shared by two or more regions; (v) dissolving regions by transferring their sites to other regions; (vi) merging a region with another or others; (vii) merging two or more regions and redefining groups; and (viii) obtaining more data and redefining regions. Among these, the first three options are useful in reducing the values of the heterogeneity measures of a region, whereas options (iv) to (vii) help in ensuring that each region is sufficiently large. The study also indicates that wavelet-based multiscale entropy is an important statistic for capturing catchment characteristics. The precise cause of the fluctuations in WME at different scales remains to be investigated, although drainage area seems to be one of the factors.

APPLICATION OF PROPOSED TECHNIQUE

As mentioned, the prime concern was to develop a technique called "wavelet based multiscale entropy". The novelty of the method has already been described in the previous sections. Regionalization is one of the key areas where this technique has been applied successfully, and the results are presented in this study. Other areas where the technique may find application are:

➢ Climatic downscaling: Lakhanpal (2015) and Sehgal et al. (2015) have already applied this technique in the work "Statistical downscaling of GCM simulations using Wavelet coupled second order Volterra models". In that study, K-means clustering of the climatic variables was done based on multiscale wavelet entropy. From each cluster, representative variables were selected using PCA, which explained 90-95% of the variability of the corresponding cluster.
➢ Forecasting modelling
➢ Hydrological modelling
➢ Disaggregation, etc.

REFERENCES

Acreman, M. and Sinclair, C., 1986. Classification of drainage basins according to their physical characteristics; an application for flood frequency analysis in Scotland. Journal of Hydrology, 84(3): 365-380.

Adamowski, J.F., 2008. Development of a short-term river flood forecasting method for snowmelt driven floods based on wavelet and cross-wavelet analysis. Journal of Hydrology, 353(3): 247-266.

Allen, M.R. and Smith, L.A., 1996. Monte Carlo SSA: Detecting irregular oscillations in the presence of colored noise. Journal of Climate, 9(12): 3373-3404.

Alpaslan, A., Sayar, M. and Atilgan, A., Local Forecasting of Chaotic Time Series.

Anctil, F., Perrin, C. and Andreassian, V., 2003. ANN output updating of lumped conceptual rainfall/runoff forecasting models. JAWRA Journal of the American Water Resources Association, 39(5): 1269-1279.

Anctil, F. and Coulibaly, P., 2004. Wavelet analysis of the interannual variability in southern Québec streamflow. Journal of Climate, 17(1): 163-173.

Ball, G.H. and Hall, D.J., 1967. A clustering technique for summarizing multivariate data. Behavioral Science, 12(2): 153-155.

Berkin, A., Coxon, B. and Pozsgay, V., 2002. Towards a synthetic glycoconjugate vaccine against Neisseria meningitidis A. Chemistry-A European Journal, 8(19): 4424-4433.

Blanco, S., Figliola, A., Quiroga, R.Q., Rosso, O. and Serrano, E., 1998. Time-frequency analysis of electroencephalogram series. III. Wavelet packets and information cost function. Physical Review E, 57(1): 932.

Blöschl, G. and Sivapalan, M., 1995. Scale issues in hydrological modelling: a review. Hydrological Processes, 9(3-4): 251-290.

Blöschl, G., 2005. Rainfall-Runoff Modeling of Ungauged Catchments. Wiley Online Library.

Bolshakova, N. and Azuaje, F., 2003. Machaon CVE: cluster validation for gene expression data. Bioinformatics, 19(18): 2494-2495.

Brunsell, N., A multiscale information theory approach to assess spatial-temporal variability of daily precipitation. Journal of Hydrology, 385(1): 165-172.

Buishand, T., 1989. Statistics of extremes in climatology. Statistica Neerlandica, 43(1): 1-30.

Burn, D. and Boorman, D., 1993. Estimation of recharge and runoff volumes from ungauged catchments in eastern Australia. J. Hydrol, 143(3-4): 429-454.

Burn, D.H. and Arnell, N.W., 1993. Synchronicity in global flood responses. Journal of Hydrology, 144(1): 381-404.

Cai, X., Wang, D., et al., 2009. Assessing the regional variability of GCM simulations. Geophysical Research Letters, 36(2).

Cavadias, G.S., Ouarda, T.B., Bobée, B. and Girard, C., 2001. A canonical correlation approach to the determination of homogeneous regions for regional flood estimation of ungauged basins. Hydrological Sciences Journal, 46(4): 499-512.

Cazelles, B. et al., 2008. Wavelet analysis of ecological time series. Oecologia, 156(2): 287-304.

Cek, M.E., Ozgoren, M. and Savaci, F.A., Continuous time wavelet entropy of auditory evoked potentials. Computers in Biology and Medicine, 40(1): 90-96.

Chan, Y., 1995. Wavelet basics. Springer Science & Business Media.

Chen, G. et al., 2002. Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data. Statistica Sinica, 12(1): 241-262.

Chiang, S.-M., Tsay, T.-K. and Nix, S.J., 2002a. Hydrologic regionalization of watersheds. I: Methodology development. Journal of Water Resources Planning and Management, 128(1): 3-11.

Chiang, S.-M., Tsay, T.-K. and Nix, S.J., 2002b. Hydrologic regionalization of watersheds. II: Applications. Journal of Water Resources Planning and Management, 128(1): 12-20.

Chou, C.-M. and Wang, R.-Y., 2002. On-line estimation of unit hydrographs using the wavelet-based LMS algorithm/Estimation en ligne des hydrogrammes unitaires grâce à l'algorithme des moindres carrés moyens à base d'ondelettes. Hydrological Sciences Journal, 47(5): 721-738.

Chowdhury, J.U., Stedinger, J.R. and Lu, L.-H., 1991. Goodness-of-fit tests for regional generalized extreme value flood distributions. Water Resources Research WRERAQ, 27(7): 1765-1776.

Coulibaly, P. and Baldwin, C.K., 2003. Nonstationary hydrological time series forecasting using nonlinear dynamic methods. Journal of Hydrology, 307(1): 164-174.

Coulibaly, P. and Burn, D.H., 2004. Wavelet analysis of variability in annual Canadian streamflows. Water Resources Research, 40(3).

Cunderlik, J.M. and Burn, D.H., 2006. Site-focused nonparametric test of regional homogeneity based on flood regime. Journal of Hydrology, 318(1): 301-315.

Davies, D.L. and Bouldin, D.W., 1979. A cluster separation measure. Pattern Analysis and Machine Intelligence, IEEE Transactions on (2): 224-227.

Dudoit, S. and Fridlyand, J., 2002. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 3(7): research0036.

Dudoit, S., Yang, Y.H., Callow, M.J. and Speed, T.P., 2002. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12(1): 111-140.

Dunn, J.C., 1973. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters.

Ercan, K., Mehmet, C. and Osman, A., 2008. Hydrologic homogeneous regions using monthly streamflow in Turkey. Earth Sciences Research Journal, 12(2): 181-193.

Estivill-Castro, V. and Lee, I., 2000. Autoclust: Automatic clustering via boundary extraction for mining massive point-data sets, In Proceedings of the 5th International Conference on Geocomputation. Citeseer.

Farge, M., 1992. Wavelet transforms and their applications to turbulence. Annual Review of Fluid Mechanics, 24(1): 395-458.

Fill, H.D. and Stedinger, J.R., 1995. Homogeneity tests based upon Gumbel distribution and a critical appraisal of Dalrymple's test. Journal of Hydrology, 166(1): 81-105.

Fraley, C. and Raftery, A.E., 1998. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8): 578-588.

Gilman, D., Fuglister, F. and Mitchell Jr, J., 1963. On the power spectrum of red noise. Journal of the Atmospheric Sciences, 20(2): 182-184.

Grinsted, A., Moore, J.C. and Jevrejeva, S., 2004. Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Processes in Geophysics, 11(5/6): 561-566.

Gu, C. et al., 2003. Neuropilin-1 conveys semaphorin and VEGF signaling during neural and cardiovascular development. Developmental Cell, 5(1): 45-57.

Halkidi, M., Batistakis, Y. and Vazirgiannis, M., 2001. On clustering validation techniques. Journal of Intelligent Information Systems, 17(2-3): 107-145.

Han, J. and Kamber, M., 2001. Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco.

Harmancioglu, N.B. and Alpaslan, N., 1992. Water quality monitoring network design: a problem of multi-objective decision making. JAWRA Journal of the American Water Resources Association, 28(1): 179-192.

He, Y., Bárdossy, A. and Zehe, E., A catchment classification scheme using local variance reduction method. Journal of Hydrology, 411(1): 140-154.

Holschneider, D., Kumazawa, T., Chen, K. and Shih, J., 1998. Tissue-specific effects of estrogen on monoamine oxidase A and B in the rat. Life Sciences, 63(3): 155-160.

Hosking, J. and Wallis, J., 1993. Some statistics useful in regional frequency analysis. Water Resources Research, 29(2): 271-281.

Hosking, J.R.M. and Wallis, J.R., 1997. Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge University Press, Cambridge, UK.

Isik, S. and Singh, V.P., 2008. Hydrologic regionalization of watersheds in Turkey. Journal of Hydrologic Engineering, 13(9): 824-834.

Jain, A.K. and Dubes, R.C., 1988. Algorithms for clustering data. Prentice-Hall, Inc.

Jain, A.K., Murty, M.N. and Flynn, P.J., 1999. Data clustering: a review. ACM Computing Surveys (CSUR), 31(3): 264-323.

Kahya, E., Kalaycı, S. and Piechota, T.C., 2008. Streamflow regionalization: case study of Turkey. Journal of Hydrologic Engineering, 13(4): 205-214.

Kaiser, W.M. and Huber, S.C., 1994. Posttranslational regulation of nitrate reductase in higher plants. Plant Physiology, 106(3): 817.

Kasturi, J., Acharya, R. and Ramanathan, M., 2003. An information theoretic approach for analyzing temporal patterns of gene expression. Bioinformatics, 19(4): 449-458.

Kestin, T.S., Karoly, D.J., Yano, J.-I. and Rayner, N.A., 1998. Time-frequency variability of ENSO and stochastic simulations. Journal of Climate, 11(9): 2258-2272.

Kim, S., 2004. Wavelet analysis of precipitation variability in northern California, USA. KSCE Journal of Civil Engineering, 8(4): 471-477.

Kim, T.-W. and Valdes, J.B., 2003. Nonlinear model for drought forecasting based on a conjunction of wavelet transforms and neural networks. Journal of Hydrologic Engineering, 8(6): 319-328.

King, M., 1967. Measuring the religious variable: Nine proposed dimensions. Journal for the Scientific Study of Religion: 173-190.

Kokkonen, T.S., Jakeman, A.J., Young, P.C. and Koivusalo, H.J., 2003. Predicting daily flows in ungauged catchments: model regionalization from catchment descriptors at the Coweeta Hydrologic Laboratory, North Carolina. Hydrological Processes, 17(11): 2219-2238.

Kucuk, M. and Agiralioglu, O.N., 2006. Wavelet regression technique for streamflow prediction. Journal of Applied Statistics, 33(9): 943-960.

Labat, D., 2005. Recent advances in wavelet analyses: Part 1. A review of concepts. Journal of Hydrology, 314(1): 275-288.

Labat, D., et al., 2000. Rainfall-runoff relations for karstic springs. Part II: continuous wavelet and discrete orthogonal multiresolution analyses. Journal of Hydrology, 238(3): 149-178.

Locat, J., Bérubé, M.-A. and Choquette, M., 1990. Laboratory investigations on the lime stabilization of sensitive clays: shear strength development. Canadian Geotechnical Journal, 27(3): 294-304.

Lu, L.-H. and Stedinger, J.R., 1992. Sampling variance of normalized GEV/PWM quantile estimators and a regional homogeneity test. Journal of Hydrology, 138(1): 223-245.

Lu, R., 2002. Decomposition of interdecadal and interannual components for North China rainfall in rainy season. Chinese Journal of Atmosphere (in Chinese), 26: 611-624.

MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. Oakland, CA, USA., pp. 281-297.

Maimon, O. and Rokach, L., 2005. Data mining and knowledge discovery handbook, 2. Springer.

Meyers, S.D., Kelly, B.G. and O'Brien, J.J., 1993. An introduction to wavelet analysis in oceanography and meteorology: With application to the dispersion of Yanai waves. Monthly Weather Review, 121(10): 2858-2866.

Murtagh, F., 1984. Complexities of hierarchic clustering algorithms: State of the art. Computational Statistics Quarterly, 1(2): 101-113.

Mwale, D. et al., Regionalization of runoff variability of Alberta, Canada, by wavelet, independent component, empirical orthogonal function, and geographical information system analyses. Journal of Hydrologic Engineering, 16(2): 93-107.

Nanavati, S.P. and Panigrahi, P.K., 2004. Wavelet transform. Resonance, 9(3): 50-64.

Nathan, R. and McMahon, T., 1990. Evaluation of automated techniques for base flow and recession analyses. Water Resources Research, 26(7): 1465-1473.

Nourani, V., Alami, M.T. and Aminfar, M.H., 2009. A combined neural-wavelet model for prediction of Ligvanchai watershed precipitation. Engineering Applications of Artificial Intelligence, 22(3): 466-472.

Partal, T. and Kişi, Ö., 2007. Wavelet and neuro-fuzzy conjunction model for precipitation forecasting. Journal of Hydrology, 342(1): 199-212.

Pilgrim, D., Chapman, T. and Doran, D., 1988. Problems of rainfall-runoff modelling in arid and semiarid regions. Hydrological Sciences Journal, 33(4): 379-400.

Post, D.A. and Jakeman, A.J., 1999. Predicting the daily streamflow of ungauged catchments in SE Australia by regionalising the parameters of a lumped conceptual rainfall-runoff model. Ecological Modelling, 123(2): 91-104.

Prinzio, M.D., Castellarin, A. and Toth, E., Data-driven catchment classification: application to the PUB problem. Hydrology and Earth System Sciences, 15(6): 1921-1935.

Quiroga, R.Q., Arnhold, J., Lehnertz, K. and Grassberger, P., 2000. Kullback-Leibler and renormalized entropies: applications to electroencephalograms of epilepsy patients. Physical Review E, 62(6): 8380.

Rajaee, T., Mirbagheri, S., et al., Prediction of daily suspended sediment load using wavelet and neurofuzzy combined model. International Journal of Environmental Science & Technology, 7(1): 93-110.

Rao, A.R. and Srinivas, V., 2006. Regionalization of watersheds by hybrid-cluster analysis. Journal of Hydrology, 318(1): 37-56.

Rao, A.R. and Srinivas, V., 2006. Regionalization of watersheds by fuzzy cluster analysis. Journal of Hydrology, 318(1): 57-79.

Rao, A.R. and Srinivas, V., 2008. Regionalization of watersheds: an approach based on cluster analysis, 58. Springer Science & Business Media.

Rao, G.S., 2004. Wavelet Analysis and Applications. New Age International (P) Limited.

Rashid, M.M., Beecham, S., et al., Statistical downscaling of rainfall: a non-stationary and multiresolution approach. Theoretical and Applied Climatology: 1-15.

Razavi, T. and Coulibaly, P., Streamflow prediction in ungauged basins: Review of regionalization methods. Journal of Hydrologic Engineering, 18(8): 958-975.

Rokach, L. and Maimon, O., 2005. Clustering methods, Data mining and knowledge discovery handbook. Springer, pp. 321-352.

Saco, P. and Kumar, P., 2000. Coherent modes in multiscale variability of streamflow over the United States. Water Resources Research, 36(4): 1049-1067.

Sang, Y.-F., Wang, D., Wu, J.-C., Zhu, Q.-P. and Wang, L., Wavelet-based analysis on the complexity of hydrologic series data under multi-temporal scales. Entropy, 13(1): 195-210.

Satyanarayana, P. and Srinivas, V., Regionalization of precipitation in data sparse areas using large scale atmospheric variables-A fuzzy clustering approach. Journal of Hydrology, 405(3): 462-473.

Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. and Carrillo, G., Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA. Hydrology and Earth System Sciences, 15(9): 2895-2911.

Selim, S.Z. and Ismail, M.A., 1984. K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. Pattern Analysis and Machine Intelligence, IEEE Transactions on (1): 81-87.

Shannon, C.E., 1948. A mathematical theory of communication. Bell System Tech. J, 27: 379-423.

Sharan, R., Maron-Katz, A. and Shamir, R., 2003. CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics, 19(14): 1787-1799.

Singh, K., Singh, J. and Singh, H., 1996. A synthetic entry into fused pyran derivatives through carbon transfer reactions of 1, 3-oxazinanes and oxazolidines with carbon nucleophiles. Tetrahedron, 52(45): 14273-14280.

Singh, V. and Fiorentino, M., 1992. A historical perspective of entropy applications in water resources, Entropy and energy dissipation in water resources. Springer, pp. 21-61.

Singh, V., 1997. The use of entropy in hydrology and water resources. Hydrological Processes, 11(6): 587-626.

Singh, V.P. and Rajagopal, A., 1987. Some recent advances in application of the principle of maximum entropy (POME) in hydrology. IAHS Publ, 164: 353-364.

Singh, V.P., Hydrologic synthesis using entropy theory: Review. Journal of Hydrologic Engineering, 16(5): 421-433.

Sivakumar, B. and Singh, V., Hydrologic system complexity and nonlinear dynamic concepts for a catchment classification framework. Hydrology and Earth System Sciences, 16(11): 4119-4131.

Sivakumar, B., Singh, V.P., Berndtsson, R. and Khan, S.K., Catchment classification framework in hydrology: challenges and directions. Journal of Hydrologic Engineering, 20(1).

Smith, L.C., Turcotte, D.L. and Isacks, B.L., 1998. Stream flow characterization and feature detection using a discrete wavelet transform. Hydrological Processes, 12(2): 233-249.

Sneath, P.H. and Sokal, R.R., 1973. Numerical taxonomy. The principles and practice of numerical classification.

Srinivas, S.K. et al., 2007. Predicting failure of a vaginal birth attempt after cesarean delivery. Obstetrics & Gynecology, 109(4): 800-805.

Srinivas, V., Tripathi, S., Rao, A.R. and Govindaraju, R.S., 2008. Regional flood frequency analysis by combining self-organizing feature map and fuzzy clustering. Journal of Hydrology, 348(1): 148-166.

Ssegane, H., Tollner, E., Mohamoud, Y., Rasmussen, T. and Dowd, J., Advances in variable selection methods I: Causal selection methods versus stepwise regression and principal component analysis on data of known and unknown functional relationships. Journal of Hydrology, 438: 16-25.

Stainton, R. and Metcalfe, R., 2007. Characterisation and classification of flow regimes of natural rivers in Ontario to support the identification of potential reference basins. Waterpower Project Science Transfer Rep, 7.

Torrence, C. and Compo, G.P., 1998. A practical guide to wavelet analysis. Bulletin of the American Meteorological Society, 79(1): 61-78.

Torrence, C. and Webster, P.J., 1999. Interdecadal changes in the ENSO-monsoon system. Journal of Climate, 12(8): 2679-2690.

Viglione, A., Laio, F. and Claps, P., 2007. A comparison of homogeneity tests for regional frequency analysis. Water Resources Research, 43(3).

Wang, W. and Ding, J., 2003. Wavelet network model and its application to the prediction of hydrology. Nature and Science, 1(1): 67-71.

Ward Jr, J.H., 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301): 236-244.

Wiltshire, S., 1986. Regional flood frequency analysis II: Multivariate classification of drainage basins in Britain. Hydrological Sciences Journal, 31(3): 335-346.

Yadav, M., Wagener, T. and Gupta, H., 2007. Regionalization of constraints on expected watershed response behavior for improved predictions in ungauged basins. Advances in Water Resources, 30(8): 1756-1774.

Zhang, Q., Liu, C., Xu, C.-y., Xu, Y. and Jiang, T., 2006. Observed trends of annual maximum water level and streamflow during past 130 years in the Yangtze River basin, China. Journal of Hydrology, 324(1): 255-265.

Zhao, Y., Karypis, G. and Fayyad, U., 2005. Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 10(2): 141-168.

Zhou, H.-c., Peng, Y., et al., 2008. The research of monthly discharge predictor-corrector model based on wavelet decomposition. Water Resources Management, 22(2): 217-227.

Zoppou, C., Nielsen, O.M. and Zhang, L., 2002. Regionalization of daily stream flow in Australia using wavelets and k-means analysis. CMA Research Report MRR02-003, Australian National University, Canberra. Available at: http://wwwmaths.anu.edu.au/research.reports/mrr/02/003.

Zrinji, Z. and Burn, D.H., 1994. Flood frequency analysis for ungauged sites using a region of influence approach. Journal of Hydrology, 153(1): 1-21.

Zrinji, Z. and Burn, D.H., 1996. Regional flood frequency with hierarchical region of influence. Journal of Water Resources Planning and Management, 122(4): 245-252.
