

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 51, NO. 1, JANUARY 2013

A New Time Series Mining Approach Applied to Multitemporal Remote Sensing Imagery

Luciana Alvim S. Romani, Ana Maria H. de Avila, Daniel Y. T. Chino, Jurandir Zullo, Jr., Richard Chbeir, Caetano Traina, Jr., and Agma J. M. Traina, Member, IEEE

Abstract—In this paper, we present a novel unsupervised algorithm, called CLimate and rEmote sensing Association patteRns Miner, for mining association patterns on heterogeneous time series from climate and remote sensing data. The algorithm is integrated in a remote sensing information system developed to improve the monitoring of sugar cane fields. The system, called RemoteAgri, consists of a large database of climate data and low-resolution remote sensing images, an image preprocessing module, a time series extraction module, and time series mining methods. The preprocessing module was designed to perform accurate geometric correction, which is a requirement particularly for land and agriculture applications of satellite images. The time series extraction is accomplished through a graphical interface that allows easy interaction and high flexibility to users. The time series mining method transforms series into a symbolic representation in order to identify patterns in multitemporal satellite images and associate them with patterns in other series within a temporal sliding window. The algorithm was validated with agroclimatic data and NOAA-AVHRR images of sugar cane fields. Results show a correlation between agroclimatic time series and vegetation index images. Rules generated by our new algorithm show the association patterns in different periods of time in each time series, pointing to a time delay between the occurrences of patterns in the series analyzed. This corroborates what specialists usually forecast, without the burden of dealing with many data charts.

Index Terms—Association rules, image information mining, NOAA-AVHRR images, sequential patterns.

I. INTRODUCTION

Manuscript received September 27, 2010; revised May 6, 2011 and November 18, 2011; accepted April 1, 2012. Date of publication June 11, 2012; date of current version December 19, 2012.
L. A. S. Romani is with the Department of Computer Science, University of Sao Paulo, São Carlos 13560-970, Brazil and Embrapa Agriculture Informatics (e-mail: [email protected]).
A. M. H. de Avila and J. Zullo, Jr. are with Cepagri, Unicamp, Campinas, Brazil (e-mail: [email protected]; [email protected]).
D. Y. T. Chino, C. Traina, Jr., and A. J. M. Traina are with the Department of Computer Science, University of Sao Paulo, São Carlos 13560-970, Brazil (e-mail: [email protected]; [email protected]; agma@icmc.usp.br).
R. Chbeir is with LE2I-CNRS Lab., University of Bourgogne, 21078 Dijon, France (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2012.2199501

ADVANCES in computer technology as well as knowledge discovery and information mining methods have contributed to increasing the access to and application of remote sensing imagery. In the past few years, improvements in data acquisition technology have decreased the time interval of data gathering, sharply increasing the quantity of satellite images stored by institutions around the world. Specifically, polar-orbiting satellites that are scheduled to make two daily passes over the same target on earth have generated a relevant archive of multitemporal images. New technologies developed for the remote sensing area have increased its use in real applications. However, several users still have problems dealing with satellite images due to the different and more sophisticated demands imposed on them, as well as the fast growth in the quantity and complexity of remote sensing data [1]. In this context, the knowledge discovery approach has been considered a promising alternative to explore and find relevant information in this huge volume of data. Some initiatives involving information and image mining have been accomplished through different techniques with reasonable results [2]–[4]. Instead of extracting features from images, other approaches compute measurements (indexes) from images generated by a combination of remote sensor channels that can be used to identify, for example, the green biomass and the soil temperature. These indexes can be extracted for each pixel of multitemporal image data sets, generating different time series.

Time series are generated and studied in several areas, and data mining techniques have been developed to analyze them [5]–[7]. Association rules were proposed by Agrawal et al. [5] to solve the problem of discovering which items are bought together in a transaction. In the last decade, several techniques have been proposed to discover sequential patterns in temporal data [7], [8]. The number of rules discovered can be so large that analyzing the entire set and finding the most interesting ones can be a difficult task for the user. For this reason, Klemettinen et al. [9] proposed a method based on rule templates to identify interesting rules. Mannila et al. [10] proposed a method for mining episodes in sequential data that uses all frequent episodes within one sequence.
There are several methods that use constraints to focus the mining process on finding relevant items. Zaki [11] proposed the use of temporal constraints in transactional sequences. Harms et al. [12] defined methods that combine constraints and closure principles with a sliding window approach. Their objective was to find frequent closed episodes in multiple event sequences. In general, several techniques have been proposed in the last decade to discover sequential patterns in temporal data [7], [8], [13]–[15]. However, these techniques do not support a discretization process that can be tuned by relevance and preserves the time stamp of the event, which is very relevant in several applications involving heterogeneous time series, such as agroclimatic processes.

0196-2892/$31.00 © 2012 IEEE


Harms et al. [16] developed the MOWCATL algorithm to mine frequent association rules from sequential data sets. The authors showed that this approach is useful in applications to drought risk management. Wu et al. [17] proposed the GEAM algorithm to find association patterns in abnormal event sequences. Both algorithms, MOWCATL and GEAM, work over event sequences with discrete events. For example, an event type is denoted as a tuple of the form (attribute, level), where attribute is a variable such as rain or temperature, and level is the corresponding level of the variable’s value, such as low, normal, or high. Using discrete events, it is possible to discover interesting patterns, but the problem is simplified. For example, the following association pattern was found: “if El Nino, then low precipitation in region X.” In this case, the data are discretized according to the intensity of rainfall only, disregarding the period of time. However, the time period is important to understand the causes of the phenomenon. Indeed, researchers are more interested in finding the peaks (with different amplitudes) and the periods of their occurrence in the El Nino time series to study the effects of this phenomenon and its relation to the climate change scenario. Honda and Konishi [18] proposed a method for mining a set of time series obtained from images. They applied the method to weather satellite cloud images taken by GMS-5 (Geostationary Meteorological Satellite). The proposed algorithm extracts features from images and clusters the images according to changes in the mass of clouds. Julea et al. [19] applied the SPADE algorithm [7] to extract frequent evolutions that are observed for geographical zones represented by pixels. They used Meteosat (Meteorological Satellite) images to perform the experiments. The authors use feature vectors to represent the satellite images, or symbols associated with discretized intervals representing reflectance values of satellite channels.
The literature presents several other works that show data mining techniques applied to remotely sensed images and spatial databases [20]–[22]. However, few computational techniques have been developed to support research that associates heterogeneous data, such as climate, agroclimate, and remote sensing time series. In this paper, we present a technique aimed at understanding the correlation between series and the influence of one series on another in order to assist agribusiness. In a previous work, we combined a discretization method with the well-known association rule mining algorithm Apriori [5]. Results showed that the rules generated for rainfall associated with the vegetation index (Normalized Difference Vegetation Index, NDVI) from satellite images did not present a coherent association. For example, all discrete intervals of rainfall (mm) were associated with average values of NDVI (which ranges from −1 to 1), such as “rain [0–1] ⇒ NDVI [0.36–0.56]” and “rain [100–150] ⇒ NDVI [0.36–0.56]”. That is, the same NDVI interval was indicated both for rain between 0 and 1 mm and for rain between 100 and 150 mm. As NDVI is an index related to the green biomass, the effect of rain on crop growth can be detected by NDVI only after a period of time. This fact is supported by the studies described in [23] and [24], which show the interest of agrometeorologists in associating remote sensing and climate data to better understand the influence of climate on the development of agricultural crops.


The statistical analysis performed in [24] showed that there are correlations between NDVI and rainfall with a delay of one or two months. In this context, we focus on the association of local patterns in pairs of time series, aiming at improving yield forecasting of agricultural crops and increasing the sustainable usage of soil. Accordingly, we consider the problem of finding rules that associate patterns in a remote sensing time series with other patterns in climate series, considering a time delay. Examples of rules relating two or more time series could be “a period of gradual increases in the WRSI (Water Requirement Satisfaction Index) value is followed by an increase in NDVI value” or “in years when El Nino is strong, rainfall above average may occur in the Southern Brazilian region.” As a solution, we propose a new unsupervised algorithm for mining association patterns on heterogeneous time series, integrated into a remote sensing information system. The time series mining module was developed to generate rules considering a time lag. To do so, we define a time window constraint to find association patterns, which are extracted in two steps. First, the algorithm transforms multiple time series into a representation of patterns (peaks, mountains, and plateaus), with discrete intervals that maintain the time of occurrence and represent phenomena in climate or remote sensing time series. In a second step, the algorithm generates rules that associate patterns in multiple time series with qualitative information. This algorithm, CLimate and rEmote sensing Association patteRns Miner (CLEARMiner), uses a sliding window, whose size is given as a number of patterns, to find the rules. We have assessed the algorithm quality using time series of agrometeorological data and multitemporal images from an important region of sugarcane production fields in Brazil.
Sugarcane crops have expanded for different reasons, such as biofuel production, potential benefits to the environment as a possible way of mitigating greenhouse gas emissions, and economic impact, among others. Although traditional ways to assess the sugar cane expansion exist, remote sensing images have been widely adopted to evaluate the direct land conversion to sugar cane [25]. As sugarcane crops are cultivated on large fields, researchers have used satellites of medium and low spatial resolution, such as NOAA-AVHRR,¹ to identify areas for sugarcane expansion [26], [27]. We have also applied CLEARMiner to El Nino time series in order to discover their influence on the precipitation distribution regime in regions of South America. In fact, both case studies are suitable to test the CLEARMiner algorithm since both experiments presuppose a relationship between series considering a time lag. Preliminary results of this work were presented at ACM SAC’2010 [28].

This paper is organized as follows. After introducing the system architecture in Section II, we detail a theoretical formalization for association patterns in Section III. Section IV presents and discusses the experimental results, and Section V contains the conclusions and proposals for further research.

¹ http://noaasis.noaa.gov/NOAASIS/ml/avhrr.html



Fig. 1. Schematic diagram of the multitemporal images mining system—RemoteAgri. Module 1 corresponds to the image georeferencing step executed in batch mode by NAVPro [31]; Module 2 is executed by SatImagExplorer which was proposed to extract values or compute indexes from multitemporal images generating time series for each pixel of the image; and Module 3 refers to time series mining module (CLEARMiner) developed to associate climate data with indexes extracted from NOAA-AVHRR images.

II. ARCHITECTURE OF THE INFORMATION MINING SYSTEM—REMOTEAGRI

The knowledge discovery process in information mining systems involves three main phases: data preparation, data mining, and presentation of knowledge. The mining of complex objects, such as images or time series, becomes more complex due to the effort spent in the preprocessing phase. Thus, before applying data mining techniques to remote sensing imagery, it is necessary to preprocess the images.

In this paper, we have used a database available in the Center for Meteorological and Climatic Researches Applied to Agriculture (Cepagri) at the University of Campinas (Unicamp), Brazil, which holds approximately 6 TB of NOAA-AVHRR images acquired since April 1995. Since NOAA-AVHRR images often have geometric distortions caused by earth curvature, rotation, attitude errors, and imprecise orbits [29], these distortions must be corrected, especially for land applications that require highly accurate geometric matching. Geometric correction combines indirect navigation and spacecraft attitude error estimation. After that, the maximum cross correlation technique can be used to detect the geographic displacement between the base image and the target one [30].

The database used in this work also contains climate data (temperature, rainfall) obtained from the Agritempo system.² Thus, the problem addressed in this paper has time series (from multitemporal images or agrometeorological indexes) as inputs. Patterns and rules are the expected outputs of the system.

² http://www.agritempo.gov.br

In order to support the mining of NOAA-AVHRR multitemporal images associated with climate series, and thereby contribute to the advancement of agriculture research at a regional scale, we have developed the RemoteAgri system. Fig. 1 shows a schematic diagram of the system prototype consisting of three major components: image georeferencing module, time series extraction module, and time series mining method.

The first module to be executed in the RemoteAgri system corresponds to the image georeferencing process, as presented in Fig. 1. This module is composed of several C-shell scripts that call the subroutines of the NAV system [31], [32] in batch mode to accomplish the necessary tasks, such as:
• conversion from raw format to an intermediary one;
• radiometric calibration;
• geometric correction;
• identification of pixels classified as cloud.

Following the recommendations of [33], it is important to mask out inappropriate pixels, such as cloud-contaminated pixels. The georeferencing module allows users to generate four different synthesis images for a specific region: albedo, NDVI, surface temperature, and cloud cover, as shown in Fig. 1.

As the volume of images is huge, an extraction module called SatImagExplorer was proposed to perform the extraction faster and in a more flexible way. This module extracts values or computes the index from the opened images. Then, it generates a time series by computing the index values for all images at the same coordinate (latitude/longitude) of the region. In addition to the direct interaction with the system interface, users can also extract time series using a vector of coordinates that defines the desired region. Time series extracted from multitemporal images by SatImagExplorer are then mined in order to discover patterns or association patterns.

The last module refers to the time series mining developed to associate climate data with indexes extracted from NOAA-AVHRR images. In the next section, we describe in detail the three parts of this module.

III. TIME SERIES MINING MODULE

In this paper, we present a new unsupervised algorithm, called CLEARMiner, to mine association patterns from time series extracted from NOAA-AVHRR.
The process of time series mining was divided into three parts: quantization process, association patterns generation, and rules presentation, as can be seen in Fig. 1. First, time series are rewritten in a symbolic representation that is more succinct and manageable than continuous data. We propose the use of patterns similar to positive and negative peaks as well as plateaus that maintain the information about continuous data and time of occurrence. The proposed algorithm performs a quantization process that preserves the time series semantics proper to weather and agroclimatic events. The second part is related to generating rules from this symbolic representation. Finally, the third part corresponds to the presentation of association patterns in two formats: short and detailed.

A. Quantization Process

This module receives as input a set of remote sensing and climate time series. We define a time series as a sequence of pairs $(a_i, t_i)$ with $i = 1, \ldots, n$, i.e., $S = [(a_1, t_1), \ldots, (a_i, t_i), \ldots, (a_n, t_n)]$ and $(t_1 < \ldots < t_i < \ldots < t_n)$, where each $a_i$ is a data value, and each $t_i$ is the time value at which $a_i$ occurs. Each pair $(a_i, t_i)$ is called an event $e$. A set of events $E$ contains $n$ events of type $e_i = (a_i, t_i)$ for $i = 1, \ldots, n$. Each $a_i$ is a continuous value. Each $t_i$ is a unit of time that can be given in days, months, or years. Given two sequences $S_1$ and $S_2$, the values $t_i$ of both sequences must be measured in the same time unit. A set of consecutive $e_i$, i.e., $Se = (e_i, e_{i+1}, \ldots, e_k)$, where $e_i = (a_i, t_i)$ for $t_i \ge t_1$ and $t_k \le t_n$, is called the event sequence $Se$.

The number of elements $e_i$ in the event sequence depends on the difference between events, given by $d_i = (a_{i+1} - a_i)$ (first step in Fig. 3), and on a given parameter $\delta$ whose default value is set by the algorithm. The extracted event sequences comprise a period of events that tend to rise or fall when plotted as a graph. $\delta$ is a maximum threshold used to verify whether or not the next event in the sequence changes the current tendency. The value of $\delta$ is usually very small, tending to zero ($\delta \to 0$). However, $\delta$ depends on the time series, being initially set to a small value, which is an empirically given proportion. Moreover, $\delta$ can also be tuned by the user. This first step of the quantization process is shown in Fig. 3.

Fig. 2. Example of event sequence where small variation in the value of di is not enough to change type of event sequence from ascending to descending.

Fig. 3. Representation of the three steps of the quantization process executed by time series mining module (CLEARMiner). (1) Calculation of differences between previous and current values of time series, (2) Identification of ascending, descending, and stable event sequences, and (3) Detection of patterns M, V, and P.

From the difference array ($D$), we define a set of sequences that can be ascending, descending, or stable, as can be seen in the second step of Fig. 3. The ascending event sequence is defined as a set of consecutive $e_i$, i.e., $Se_a = (e_i, e_{i+1}, \ldots, e_k)$ where $\sum_{i}^{k} d_i > 0$, such that $\forall d_i$, $d_i > 0$ or $|d_{k-i}| < \delta$ for $(k - i) \le$ a parameter defined by the user. Additionally, we define a descending event sequence as a set of consecutive $e_i$, i.e., $Se_d = (e_i, e_{i+1}, \ldots, e_k)$ where $\sum_{i}^{k} d_i < 0$, such that $\forall d_i$, $d_i < 0$ or $|d_{k-i}| < \delta$ for $(k - i) \le$ a parameter defined by the user. Considering this parameter and $\delta$, small variations in the value of $d_i$ are not taken into account, so the event sequence keeps the same type, that is, ascending or descending. Fig. 2 exemplifies this type of occurrence in an ascending event sequence. Also, a stable event sequence is a set of consecutive $e_i$, i.e., $Se_s = (e_i, e_{i+1}, \ldots, e_k)$ where $\forall d_i$, $|d_i| < \delta$. This step is also presented in Fig. 3.

The combination of different types of event sequences generates patterns that resemble peaks (negative and positive) and intervals with constant distribution. A meaningful change or stability in the data distribution behavior should be monitored. As can be seen in the third step of Fig. 3, V (valley) corresponds to a pattern defined as the concatenation of a descending event sequence and an ascending event sequence (i.e., $V = Se_d\,Se_a$). P (plateau) represents a kind of pattern described as a stable event sequence (i.e., $P = Se_s$), while M (mountain) indicates a pattern generated by the concatenation of an ascending event sequence and a descending event sequence (i.e., $M = Se_a\,Se_d$). Fig. 4(a) presents an example of a pattern V. In real data, V can be observed when a sharp drop in the minimum temperature occurs. For WRSI time series, a pattern P can occur when $a_i$ has values close to 1. This behavior in the time series corresponds to the maximum soil water content, for example after a long period of rainfall. In a real data set, a pattern M occurs when there is a significant variation in the amplitude, such as a very heavy rain. Fig. 4(b) presents an interval in a time series and highlights a pattern M.

Fig. 4. Examples of patterns found by CLEARMiner algorithm are presented in graph format where the y-axis represents attribute value and time is measured in x-axis. (a) Pattern of type V is similar to negative peaks. (b) Pattern of type M is equivalent to a positive peak.

Two thresholds ($\rho$ and $\lambda$) are defined as filtering parameters to identify only the relevant patterns.
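The quantization steps described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' implementation: every difference with $|d_i| < \delta$ is labeled stable, so the tolerance for small variations inside an ascending or descending sequence (Fig. 2) and the relevance filtering of patterns are not modeled. The function names (`differences`, `event_sequences`, `patterns`) are hypothetical.

```python
def differences(values):
    """Step 1: d_i = a_{i+1} - a_i for consecutive data values."""
    return [b - a for a, b in zip(values, values[1:])]


def label(d, delta):
    """Classify one difference: 's' (stable), 'a' (ascending), 'd' (descending)."""
    if abs(d) < delta:
        return "s"
    return "a" if d > 0 else "d"


def event_sequences(values, delta):
    """Step 2: group consecutive differences with the same label.

    Returns (label, start, end) tuples, where start and end are the
    indices of the first and last event covered by the sequence.
    """
    runs = []
    for i, d in enumerate(differences(values)):
        kind = label(d, delta)
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1], i + 1)  # extend the current run
        else:
            runs.append((kind, i, i + 1))          # start a new run
    return runs


def patterns(runs):
    """Step 3: M = ascending then descending, V = descending then ascending, P = stable."""
    found = []
    for idx, (kind, start, end) in enumerate(runs):
        if kind == "s":
            found.append(("P", start, end))
        elif idx + 1 < len(runs):
            next_kind, _, next_end = runs[idx + 1]
            if kind == "a" and next_kind == "d":
                found.append(("M", start, next_end))
            elif kind == "d" and next_kind == "a":
                found.append(("V", start, next_end))
    return found


series = [1, 3, 6, 4, 2, 2, 2, 5]  # made-up data values a_1..a_8
print(patterns(event_sequences(series, delta=0.5)))
```

In this toy series, the rise from 1 to 6 followed by the fall to 2 is reported as an M pattern, and the flat stretch at 2 as a P pattern. Keeping the start and end indices is what allows the time of occurrence to be preserved alongside the symbol.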
The threshold $\rho$ is the relevance factor, and it depends on the amplitude measure ($y$), which is the difference between the maximum and minimum values of the time series ($y = a_{max} - a_{min}$). The relevance factor ($\rho$) is a percentage of the amplitude and is used to evaluate the height of an ascending ($Se_a$) and a descending ($Se_d$) event sequence. The threshold $\lambda$ defines the minimum length of a stable event sequence ($Se_s$) and is used to identify relevant P patterns. All thresholds have a default value, but they can be tuned by the user to spot more subtle, average, or extreme phenomena. This tuning procedure allows the user to better spot the regions of interest for a specific application. After the quantization process, time series are converted into a set of patterns V, M, and P, but the complete format with data value and time is preserved. Thus, patterns are presented in short notation $S_i[V]$, $S_i[M]$, or $S_i[P]$ and in an extended one, such as $[a_i, a_k, a_n](t_{init} - t_{end})$. Algorithm 1 presents the



main idea used to convert a time series into a pattern sequence of V, P, and M. For each time series, the PatternsFind module calculates an array composed of the differences between the previous and the current values, i.e., $d_i = a_{i+1} - a_i$ (lines 1 to 3 of Algorithm 1). Thus, by analyzing the array $d$, we can discover where there are tendencies of rising or falling in the time series, which helps in discovering the event sequences. Consequently, the algorithm generates a set of sequences that can be ascending, descending, or stable (lines 4 to 8 of Algorithm 1). Next, CLEARMiner eliminates event sequences $Se_a$ and $Se_d$ whose heights are smaller than $\rho$, and $Se_s$ whose lengths are smaller than $\lambda$ (lines 9 and 10 of Algorithm 1).

Algorithm 1 PatternsFind Method
Input: Time series $S_i$; thresholds $\delta$, $\rho$, $\lambda$
Output: Patterns V, M, P
1: for i := 1 to n − 1
2:   calculate the array of differences $d_i = a_{i+1} - a_i$
3: end for
4: for all $d_i$ values do
5:   Find $Se_a$ = Set of ascending event sequences
6:   Find $Se_d$ = Set of descending event sequences
7:   Find $Se_s$ = Set of stable event sequences
8: end for
9: Eliminate $Se_a$ and $Se_d$ whose height is smaller than $\rho$
10: Eliminate $Se_s$ whose length is smaller than $\lambda$
11: for all $Se$ not eliminated do
12:   V = concatenation of $Se_d\,Se_a$
13:   M = concatenation of $Se_a\,Se_d$
14:   P = $Se_s$
15: end for
16: Set of all patterns as $[a_{init}, a_i, a_{end}](t_{init}, t_{end})$

For each event sequence $Se_i$ not eliminated, the algorithm concatenates consecutive sequences $Se_a$ and $Se_d$ to generate an M pattern, $Se_d$ and $Se_a$ to generate a V pattern, and uses $Se_s$ to generate P patterns (lines 11 to 15 of Algorithm 1). The module PatternsFind stores the mined patterns in an array for each time series $S_i$. Each pattern is stored as an interval of events $[a_{init}, a_{mid}, a_{end}]$, where $a_{mid}$ is the intermediate (turning point) value found for the pattern, together with the time interval $[t_{init}, t_{end}]$ in which the pattern occurs (line 16 of Algorithm 1).

B. Association Patterns Generation

To understand the relationship between several time series, we define an association pattern as follows: if one pattern occurs at period $i$ in time series 1, then another or the same pattern occurs at period $j$ in time series 2. It means that a pattern in one time series can be associated with patterns in other time series. We consider an association pattern as an expression of the form $S_i[\alpha] \Rightarrow S_j[\beta]$, where $S_i$ and $S_j$ are different time series (for example, rainfall series), and $\alpha$ and $\beta$ are frequent patterns, such as M, V, or P.

The frequency of a pattern is the number of times the pattern occurs in the time series and is denoted by $fr(S_i[pattern])$. We have defined two metrics to assess the association patterns: support and confidence. The support of $S_i[\alpha] \Rightarrow S_j[\beta]$ represents the frequency of occurrences and is given by

$$support = \frac{fr(S_i[\alpha], S_j[\beta])}{T} \tag{1}$$

where $S_i$ and $S_j$ are time series, $\alpha$ and $\beta$ are patterns (M, V, or P), $fr(S_i[\alpha], S_j[\beta])$ corresponds to the total number of input patterns in the data set that contain $\alpha$ and $\beta$, and $T$ is the total number of patterns in the data set. Different from traditional algorithms for association rule mining, which consider $T$ as the total number of transactions in a database, we define $T$ as a function of the number of patterns in the time series converted into a sequence of symbolic patterns M, V, and P. Thus, we define $T$ by

$$T = \sum_{i=0}^{m-1} (n - i) \tag{2}$$

where $m$ corresponds to the size of the first time series represented by symbolic patterns and $n$ is the size of the other time series, also converted into a sequence of symbolic patterns, for all $(n - i) > 0$. The rules can be generated either for the complete series, which greatly increases the number of generated rules, or considering a sliding window of size $w$, which is defined as a number of patterns. This parameter can be changed by the user, depending on how far the user wants to analyze. In general, the value of $w$ is small because specialists are more interested in the correlation between two series over a short period of time, in order to understand the relationship between specific episodes in different series. If rules are calculated for a window of size $w$, $T$ is calculated by

$$T = \sum_{i=0}^{w-1} (n - i) \cdot \frac{m}{w} \tag{3}$$

where $m$ corresponds to the size of the first time series represented by symbolic patterns, $n$ is the size of the other time series, also converted into a sequence of symbolic patterns, for all $(n - i) > 0$, and $w$ is the window size defined by the user. For example, if a data set contains 96 patterns and 45 of them correspond to $S_1[V], S_2[M]$, then $support(S_1[V], S_2[M]) = 45/96 \approx 0.47$ (47%). Given a minimum support ($min\_sup$) specified by the user, we say that a pattern is frequent if its support is greater than or equal to $min\_sup$. Frequent patterns are then used to generate rules. The confidence measure indicates the percentage of all patterns in $S_i$ and $S_j$ containing $S_i[\alpha]$ that also contain $S_j[\beta]$. The confidence for the rule $S_i[\alpha] \Rightarrow S_j[\beta]$ is given by

$$conf = \frac{fr(S_i[\alpha], S_j[\beta])}{fr(S_i[\alpha])} \tag{4}$$

where $S_i$ and $S_j$ are time series, $\alpha$ and $\beta$ are patterns (M, V, or P), $fr(S_i[\alpha], S_j[\beta])$ corresponds to the total number of input patterns in the data set that contain $\alpha$ and $\beta$, and $fr(S_i[\alpha])$ is the total number of patterns that contain $\alpha$.
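The support and confidence measures defined above, together with the two definitions of $T$, can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. In particular, the pairing convention in `count_pairs` (pattern $i$ of the first series is paired with every pattern $j \ge i$ of the second) is an assumption, chosen only because it yields exactly the total $T$ of (2); all function names are hypothetical.

```python
from collections import Counter


def total_pairs(m, n):
    """Eq. (2): T = sum_{i=0}^{m-1} (n - i), keeping only terms with n - i > 0."""
    return sum(n - i for i in range(m) if n - i > 0)


def total_pairs_window(m, n, w):
    """Eq. (3): T = sum_{i=0}^{w-1} (n - i) * (m / w) for a window of w patterns."""
    return sum(n - i for i in range(w) if n - i > 0) * (m / w)


def count_pairs(s1, s2):
    """Count co-occurrences fr(S1[alpha], S2[beta]).

    Assumption: pattern i of s1 is paired with every pattern j >= i of s2,
    so the number of counted pairs equals total_pairs(len(s1), len(s2)).
    """
    counts = Counter()
    for i, alpha in enumerate(s1):
        for beta in s2[i:]:
            counts[(alpha, beta)] += 1
    return counts


def support(fr_pair, total):
    """Eq. (1): support = fr(Si[alpha], Sj[beta]) / T."""
    return fr_pair / total


def confidence(fr_pair, fr_alpha):
    """Eq. (4): conf = fr(Si[alpha], Sj[beta]) / fr(Si[alpha])."""
    return fr_pair / fr_alpha


# Two series of eight patterns each give T = 8 + 7 + ... + 1 = 36.
T = total_pairs(8, 8)
print(T, support(9, T))
```

With $m = n = 8$ and a pair frequency of 9, the sketch gives $T = 36$ and a support of 0.25.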


Fig. 5. Diagram illustrating the steps for rules generation in CLEARMiner algorithm. (a) Example of the frequent patterns discovery process. (b) Example of association patterns generation.

Given a user-specified minimum confidence ($min\_conf$), association patterns are generated if they satisfy the conditions $support \ge min\_sup$, to discover frequent patterns, and $conf \ge min\_conf$. The CLEARMiner algorithm first converts time series into a sequence of three types of patterns (V, M, and P) that are relevant and meaningful to agrometeorological research. At the same time, the algorithm considers the occurrence time of events, organizing the quantized pieces into patterns that are semantically related to weather events. Then, CLEARMiner generates rules for the full time series or by window of size $w$. Algorithm 2 shows the pseudocode for CLEARMiner.

Algorithm 2 CLEARMiner Algorithm
Input: Data set A of k time series structured as $\{e_1, e_2, \ldots, e_n\}$, where $e_i$ is an event of time series $S_i$ and k is the number of time series; p is a frequent pattern and m is the number of patterns; thresholds $\delta$, $\rho$, $\lambda$, and $w$
Output: The mined rules
1: Scan data set A
2: for each time series $S_i$ do
3:   PatternsFind($S_i$, $\delta$, $\rho$, $\lambda$)
4: end for
5: $F_1$ = {1-frequentPattern ($S_i[pattern]$)}
6: for p = 2; p ≤ m; p = p + 1 do
7:   $C_p$ = Set of candidate p-frequentPattern
8:     ($S_i[pattern]\,S_j[pattern]$ and so on)
9:   for all input-frequentPatterns in the data set do
10:     increment count of all p-frequentPattern ∈ $C_p$
11:   end for
12:   $F_p$ = {frequentPattern ∈ $C_p$ |
13:     sup(frequentPattern) ≥ min_sup}
14: end for
15: for all $w$ do
16:   RuleGenerate($F_p$, min_conf)
17: end for

The PatternsFind module is called to find patterns and to generate an array of patterns for all series. The pseudocode for PatternsFind is shown in Algorithm 1. The CLEARMiner algorithm calculates j-frequentPatterns for each time series. For example, if a data set contains three time series, a 2-frequentPattern can be $S_1[P]S_2[V]$ or $S_1[V]S_2[M]$, i.e., a frequentPattern combines patterns of different time series. The algorithm only stores j-frequentPatterns whose support is at least the $min\_sup$ threshold defined by the user (lines 5 to 14 of Algorithm 2). This step is shown in Fig. 5(a). To calculate the support of the pattern $S_1[M], S_2[P]$, we first calculate its frequency by counting the number of times that $S_1[M]$ is associated with $S_2[P]$, which is equal to 9. Then, we calculate $T$ by (2), where $m = n = 8$, i.e., both time series $S_1$ and $S_2$ have the same number of patterns. As a consequence, $T = \sum_{i=0}^{8-1} (8 - i) = \sum_{i=0}^{7} (8 - i) = 36$. Using (1), the support is $sup = fr(S_1[M], S_2[P])/T = 9/36 = 0.25 = 25\%$ for the pattern $S_1[M], S_2[P]$. As $sup \ge min\_sup$, the pattern $S_1[M], S_2[P]$ is selected as frequent.

Algorithm 3 RuleGenerate Method
Input: $F_p$ and min_conf
Output: The mined rules
1: for all frequentPattern $S_i[\alpha]$ and $S_j[\beta]$ ∈ $F_p$ do
2:   conf = $fr(S_i[\alpha], S_j[\beta]) / fr(S_i[\alpha])$
3:   if conf ≥ min_conf then
4:     output the rule $S_i[\alpha] \Rightarrow S_j[\beta]$ and conf
5:   end if
6: end for

For each frequent pattern in $F$, the algorithm calculates the confidence value via the RuleGenerate method (line 2 of Algorithm 3). If the confidence is greater than or equal to $min\_conf$, it generates rules (lines 3 to 5 of Algorithm 3), as can be seen in Fig. 5(b).

C. Rules Presentation

To better visualize rules, the algorithm presents them in two formats: short (the succinct way) and extended (with more details and time stamps). The short format is more succinct and easier to analyze. However, it contains no information about the context in which the phenomenon occurred. An example of a generated rule is $S_1[V] \Rightarrow S_2[M]$. In this example, the rule indicates that a decrease in time series 1 is associated with an increase in the other series ($S_2$). In addition to the rules in short format, the CLEARMiner algorithm also generates association patterns in extended format. An example is $S_1[a_i, a_k, a_n](t_{init1} - t_{end1}) \Rightarrow S_2[a_j, a_l, a_m](t_{init2} - t_{end2})$. This rule indicates that the pattern $[a_i, a_k, a_n]$, which occurred in the period $(t_{init1} - t_{end1})$ in the time series $S_1$, is associated with the pattern $[a_j, a_l, a_m]$, which occurred in the period $(t_{init2} - t_{end2})$ in the series $S_2$, with $t_{init1} \le t_{init2}$ and $t_{end1} \le t_{end2}$.
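As an illustration of the rule generation and presentation steps, the following Python sketch filters pattern pairs by confidence and prints them in the short and extended formats. It is a sketch under stated assumptions rather than the authors' implementation: the input is a plain dictionary of pair frequencies, $fr(S_1[\alpha])$ is taken as the sum of the frequencies of all pairs whose antecedent is $\alpha$, and the names `generate_rules`, `short_format`, and `extended_format` are hypothetical.

```python
def generate_rules(pair_counts, min_conf):
    """Keep the pairs (alpha, beta) whose confidence is at least min_conf.

    pair_counts maps (alpha, beta) to fr(S1[alpha], S2[beta]).
    Assumption: fr(S1[alpha]) is the sum of the counts of all pairs
    that have alpha as antecedent.
    """
    fr_alpha = {}
    for (alpha, _), count in pair_counts.items():
        fr_alpha[alpha] = fr_alpha.get(alpha, 0) + count
    rules = []
    for (alpha, beta), count in sorted(pair_counts.items()):
        conf = count / fr_alpha[alpha]
        if conf >= min_conf:
            rules.append((alpha, beta, conf))
    return rules


def short_format(alpha, beta, i=1, j=2):
    """Short format: only the pattern symbols, e.g. S1[V] => S2[M]."""
    return f"S{i}[{alpha}] => S{j}[{beta}]"


def extended_format(p1, p2, i=1, j=2):
    """Extended format: each pattern is (values, t_init, t_end)."""
    (v1, s1, e1), (v2, s2, e2) = p1, p2
    return f"S{i}{list(v1)} ({s1} - {e1}) => S{j}{list(v2)} ({s2} - {e2})"


# Made-up pair frequencies for two series S1 and S2.
counts = {("V", "M"): 6, ("V", "P"): 2, ("M", "P"): 9}
for alpha, beta, conf in generate_rules(counts, min_conf=0.7):
    print(short_format(alpha, beta), f"conf={conf:.2f}")

# Made-up values and time stamps for one rule in extended format.
print(extended_format(((30, 5, 28), 1, 4), ((0.4, 0.7, 0.5), 2, 6)))
```

Keeping the values and time stamps with each pattern is what makes the extended format possible; the short format simply drops them.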



TABLE I. Definition of Data Sets That Were Used to Evaluate the Performance of the CLEARMiner Algorithm

Fig. 6. Examples of rules in short and extended format generated in the time series mining module. The values are given as in the original series. Rain (mm) and Temperature (Celsius).

Fig. 7. The test area is located in Sao Paulo, an important state in the Southeast of Brazil, and corresponds to the scene with orbit/point 220/75 of the Landsat satellite.

Thus, the user can analyze rules in the short format to verify correlations between time series and use the extended format to obtain more details. An example with real data is presented in Fig. 6.

IV. EXPERIMENTAL RESULTS

Our study was conducted on multitemporal NDVI images from NOAA-AVHRR, covering the scene with orbit/point 220/75 of the Landsat satellite. We selected regions located in Sao Paulo, an important state in the Southeast of Brazil (54°00′ to 43°30′ W and 25°30′ to 19°30′ S), which is responsible for the majority of sugar cane production in the country (Fig. 7). Sugar cane crops are cultivated on flat terrain. The climate of this region presents fluctuations in temperature during the rainy season, from October to March.

We present the results of experiments on two real data sets, which were performed to evaluate and validate the proposed algorithm. The results of these experiments matched the specialists' expectations and helped in tuning the algorithms' parameters. Table I presents a summary of the data sets used, giving their number of dimensions (E) and the size of the time series (N).

A. Mining NDVI and WRSI Time Series From Sugar Cane Regions

We processed more than 2500 NOAA-AVHRR images from five sugar cane regions through the image georeferencing module. Once they were processed, we generated NDVI images by combining channels 1 and 2 of the NOAA-AVHRR satellite. To obtain high-quality images, we used the maximum value composite (MVC) technique, which is useful to minimize the effects of shadows, aerosols, and water vapor present in the atmosphere [34]. Thus, each image is the MVC of a month, resulting in one image per month for a period of 7 years. Fig. 8 shows an example of one year/harvest (2005–2006). There are months, such as January, February, and March, for which there are not enough good images to analyze because of cloud cover. Consequently, each time series contains ≈100 measures of NDVI. However, as we used five sugar cane regions, we obtained ≈500 measures of NDVI in the data set (I). For each region, we calculated the mean value over all pixels, obtaining a single value per month.

Fig. 8. Planting of sugar cane usually begins in June (C), and this stage is represented by green and blue shades in the NDVI images. These colors represent the lowest NDVI values, which indicate areas with exposed soil and sparse vegetation. These colors also appear in the NDVI images from July to November (D, E, F, G, H). From December (I), when sugar cane crops present more biomass, these regions in the images acquire yellow, orange, and red shades. The maximum NDVI is represented by a stronger red shade when sugar cane crops reach their peak of vegetative development from February to May (K, L, A, B). Dark areas in the images represent pixels covered by clouds. This phenomenon occurs more frequently in December (I), January (J), and February (K).

The agroclimatic conditions through the analyzed period are described by WRSI. This index varies from zero to one and represents the fraction of the water needed to ensure maximum productivity that was actually consumed by the plants. The water balance calculation proposed in [35] was used to compute the WRSI values used in this analysis. It accounts for the input and output water flows in a system, such as a column of soil or a drainage basin.

In this experiment, CLEARMiner mined more M and V patterns than plateaus (P), because the time interval is monthly. The M patterns detected are given by their actual amplitude values for the three regions analyzed, Jaboticabal, Jau, and Sertaozinho, as follows:
• Jaboticabal: [0.247385; 0.648107; 0.307657] in [09/2003–10/2004];
• Jau: [0.296928; 0.615282; 0.237196] in [10/2004–10/2005] and [0.237196; 0.618585; 0.264748] in [10/2005–10/2006];
• Sertaozinho: [0.264471; 0.611832; 0.269969] in [10/2002–09/2003].

The M patterns are related to periods when green biomass reaches its highest values, before the sugar cane harvest that begins each May. P patterns were found in the WRSI time series. They correspond to a small variation in the WRSI index, such as [0.95; 1.0; 0.99] [10/2001–03/2002] found in the Jaboticabal time series. This phenomenon occurs when the maximum soil water content is reached after a long rainy season.

We used a window of size 2 (w = 2) because the values are monthly and the number of patterns found was not very large. The thresholds min_sup and min_conf were set to 20% and 90%, respectively. The rules found show that when a negative peak occurs in the WRSI time series, the same pattern occurs in the NDVI time series, as we can see in the following rule:

Example 1:
Short Format: WRSI[V] ⇒ NDVI[V]
Extended Format: WRSI[0.8; 0.27; 0.87](05/2002 − 09/2002) ⇒ NDVI[0.54; 0.27; 0.63](05/2002 − 02/2003).

However, observing the rules in extended format, we can see that pattern V occurs in the NDVI time series with a time lag of 1 or 2 months, as can be seen in Fig. 9. This information is not evident in the short format, but it is important to better understand the context in which the phenomenon occurs.

Fig. 9. Graph illustrating a time lag of 2 months, detected in an association pattern.

Moreover, when a plateau with maximum WRSI values occurs, NDVI typically exhibits a positive peak, as in the following rule:

Example 2:
Short Format: WRSI[P] ⇒ NDVI[M]
Extended Format: WRSI[1.0; 0.96; 0.96](01/2003 − 05/2003) ⇒ NDVI[0.27; 0.58; 0.24](10/2002 − 09/2003).

This rule indicates that there is a correspondence between NDVI and WRSI: while the value of WRSI is high (around 1.0), the value of NDVI increases and only begins to decline after the value of WRSI decreases. These rules confirm the expectations of researchers in agrometeorology, because high values of WRSI indicate that there was enough rain to make the soil wet. NDVI measures the green biomass, and the index increases as the plant grows and acquires more biomass owing to the wet soil.

B. Mining Time Series of Rainfall and Anomalies Related to El Nino

We have also used the CLEARMiner algorithm to mine patterns in heterogeneous time series of meteorological data (rainfall) and anomalies related to the El Nino phenomenon. The El Nino data set is composed of monthly temperatures and anomalies from four regions in the Pacific Ocean from 1966 to 2008. The warming of the Pacific Ocean can occur in three or four regions, and the values of temperature were measured in these regions. Fig. 10 presents a graph of the sea surface temperature from 1970 to 2010. Both phenomena, El Nino and La Nina, influence the climate in the South and Southeast regions of South America.

Fig. 10. Anomalies of sea surface temperature (SST) in the region of Nino 3.4 (1970–2010). (Adapted from NOAA).
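By the usual climatological convention, a monthly anomaly is the departure of a monthly value from the long-term mean of the same calendar month. The sketch below illustrates that convention only. It is an assumption that the anomaly series in this data set were derived this way, the base period is not stated in the text, and the function name `monthly_anomalies` and the toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, int, float]  # (year, month, value)


def monthly_anomalies(series: List[Record]) -> List[Record]:
    """Subtract the mean of each calendar month from that month's values.

    The whole record is used as the base period here, which is a
    simplification; operational products use a fixed reference period.
    """
    by_month: Dict[int, List[float]] = defaultdict(list)
    for _, month, value in series:
        by_month[month].append(value)
    # Climatology: one mean value per calendar month.
    climatology = {m: sum(v) / len(v) for m, v in by_month.items()}
    return [(y, m, value - climatology[m]) for y, m, value in series]


# Toy sea surface temperatures (degrees Celsius), not real observations.
toy = [(2000, 1, 26.0), (2001, 1, 28.0), (2000, 7, 25.0), (2001, 7, 27.0)]
for year, month, anomaly in monthly_anomalies(toy):
    print(year, month, anomaly)
```

Removing the seasonal cycle in this way is what makes an anomaly series comparable across months, so that a positive or negative peak reflects an unusual event rather than the time of year.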



TABLE II. Association Patterns for the El Nino Data Set Mined by the CLEARMiner Algorithm, Which Corresponds to the Time Series Mining Module of the RemoteAgri System

TABLE III. Rules Generated From NDVI and WRSI Time Series by the CLEARMiner Algorithm, Which Corresponds to the Time Series Mining Module of the RemoteAgri System

TABLE IV. Sequences Generated by the Generalized Sequential Pattern (GSP) Algorithm, Which Was Used for Comparison With the CLEARMiner Algorithm

In this experiment, CLEARMiner detected M and V patterns. No plateau pattern was found in the El Nino data set, probably because the data were measured monthly and the anomaly series has very small variation and amplitude. Here, too, we used a window of size two, because the values are monthly and the number of patterns found was not very large. The thresholds min_sup and min_conf were again set to 20% and 90%, respectively. Examples of mined rules from the El Nino data set are presented in Table II (Table III). These association patterns indicate that an increase in one series led to an increase in the other series at a previous or subsequent time. The algorithm also found that a decrease in the values observed in one series led to a decrease in another series. It was also observed that an increase in one series can lead to a decrease in the other series analyzed, or vice versa. CLEARMiner detected several practical rules, exemplified as follows:

Example 1:
Short Format: Anom[M] ⇒ Rain[M]
Extended Format: Anom[−1.27; −0.55; −0.84](05/01/1966 − 09/01/1966) ⇒ Rain[43.0; 241.6; 18.8](05/01/1966 − 08/01/1966).

When an increase occurred in anomalies [−1.27, −1.03, −0.84] in the period between 05/01/1966 and 09/01/1966, the rain increased in the South region of Brazil [43.0, 241.6, 18.8] in the period between 05/01/1966 and 08/01/1966.

Example 2:
Short Format: Rain[M] ⇒ Anom[M]
Extended Format: Rain[44.8; 355.2; 70.0](12/01/1982 − 04/01/1983) ⇒ Anom[−1.15; 3.33; 2.13](03/01/1982 − 02/01/1983).

This result is a useful rule found by the algorithm: when an increase occurred in the anomaly series, rain increased during spring/summer in the South region of Brazil. This association pattern highlights the strong El Nino that occurred in 1983, as can be seen in Fig. 10.

C. Comparison With Other Algorithms of Association Rules Mining

In this section, we compare our algorithm with two classical baseline algorithms, apriori [5] and the generalized sequential pattern (GSP) algorithm [7]. Both algorithms were run on the Weka platform.3 As the two algorithms work with discrete data, we compared only the rule generation. The data sets used to run apriori and GSP were quantized by CLEARMiner to avoid distortions that could be caused by different quantization processes.

The apriori algorithm mined few rules and did not consider the time of occurrences. With confidence set to 0.8, the apriori algorithm generated only three rules from the data set with NDVI and WRSI values for the Jau region, as follows:
1) WRSI = V ⇒ NDVI = V conf: (1);
2) WRSI = M ⇒ NDVI = M conf: (1);
3) NDVI = V ⇒ WRSI = V conf: (0.86).

The GSP algorithm scans the database several times to generate a set of candidate k-sequences and to calculate their support. We executed the GSP algorithm with min_sup = 0.2, which generated the sequences presented in Table IV for the data set of NDVI and WRSI values for the Jau region. For min_sup values above 0.2, the GSP algorithm in Weka did not work properly. The sequences mined by GSP are similar to the rules generated by CLEARMiner. However, neither algorithm (apriori or GSP) keeps information about the time of occurrence of the events. CLEARMiner generates rules in an extended format, which can be used to obtain more details about the correlation between time series. Another advantage of our method is the quantization process that is executed as a first step. This quantization generates a representation that encompasses the semantics meaningful for climate and agroclimate time series. The criteria used to quantize the time series are based on phenomena that are observed by meteorologists and agrometeorologists and that impact the environment.

3 http://www.cs.waikato.ac.nz/ml/weka/

V. CONCLUSION

In this paper, we have presented a new unsupervised algorithm to mine association patterns in climate and remote sensing time series, integrated in a remote sensing information system produced to improve the monitoring of sugar cane fields. This algorithm works on multiple time series of continuous data, identifying patterns according to given relevance factor (ρ) and plateau length (λ) thresholds. In its last step, the algorithm associates patterns according to a temporal sliding window that corresponds to the number of patterns. The number of patterns decreases when the tuning parameters increase, as the experiments showed. Patterns can be seen as discrete intervals that allow the association between series.

CLEARMiner presents rules in two formats: short and extended. Short rules are easier to understand, but they are not sufficient to visualize the peak amplitudes and the length of the plateaus. Therefore, the algorithm also presents rules in extended format, including details of the value variations and time intervals.

The results show that the algorithm detects some association patterns that are known by experts, as expected, indicating the correctness and feasibility of the proposed method. Moreover, other patterns detected using the highest relevance factor coincide with extreme phenomena, such as many days without rain or heavy rain, as the specialists expected. The mined rules for the relevant patterns indicate a relation between series, allowing these patterns (phenomena) to happen in different intervals of time.

In summary, the main contributions of our algorithm are as follows: 1) it includes a discretization process that preserves the semantic meaning of the data with respect to time; 2) it keeps the discretized continuous intervals with their respective times of occurrence to generate the rules; and 3) it considers the time lag when it generates rules that associate different time series. Thus, this new method can be used by agrometeorologists to mine and discover knowledge from their long time series of past and forecast data, being a valuable tool to support their decision-making process.
Future work includes improving the CLEARMiner algorithm to find rules with more complex patterns.

ACKNOWLEDGMENT

The authors thank Embrapa, FAPESP, CNPq, CAPES, SticAmsud, and Microsoft Research for funding; Agritempo for climate data; and CEPAGRI/Unicamp for the database of remote sensing imagery.

REFERENCES

[1] M. Datcu, A. Pelizzari, H. Daschiel, M. Quartulli, and K. Seidel, “Advanced value adding to metric resolution sar data: Information mining,” in Proc. 4th EUSAR, s.p. 2002.
[2] M. Datcu, H. Daschiel, A. Pelizzari, M. Quartulli, A. Galoppo, A. Colapicchioni, M. Pastori, K. Seidel, P. G. Marchetti, and S. D’Elia, “Information mining in remote sensing image archives: System concepts,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 12, pp. 2923–2936, Dec. 2003.
[3] J. Li and R. M. Narayanan, “Integrated spectral and spatial information mining in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 673–685, Mar. 2004.
[4] H. Daschiel and M. Datcu, “Information mining in remote sensing image archives: System evaluation,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 1, pp. 188–199, Jan. 2005.


[5] R. Agrawal, C. Faloutsos, and A. Swami, “Efficient similarity search in sequence databases,” in Proc. 4th Int. CFDOA, Chicago, IL, 1993, pp. 69–84.
[6] G. Das, K. Lin, H. Mannila, G. Renganathan, and P. Smyth, “Rule discovery from time series,” in Proc. 4th ICKDDM, New York, 1998, pp. 16–22.
[7] M. J. Zaki, “Spade: An efficient algorithm for mining frequent sequences,” Mach. Learn., vol. 42, no. 1/2, pp. 31–60, 2001.
[8] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” in Proc. ICEDT, Avignon, France, 1996, pp. 3–17.
[9] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo, “Finding interesting rules from large sets of discovered association rules,” in Proc. CIKM, Gaithersburg, MD, 1994, pp. 401–407.
[10] H. Mannila, H. Toivonen, and A. I. Verkamo, “Discovery of frequent episodes in event sequences,” Data Mining Knowl. Discovery, vol. 1, no. 3, pp. 259–289, 1997.
[11] M. Zaki, “Sequence mining in categorical domains: Incorporating constraints,” in Proc. CIKM, Washington, DC, 2000, pp. 422–429.
[12] S. K. Harms, J. Deogun, J. Saquer, and T. Tadesse, “Discovering representative episodal association rules from event sequences using frequent closed episode sets and event constraints,” in Proc. ICDM, San Jose, CA, 2001, pp. 603–606.
[13] C. Bettini, X. S. Wang, S. Jajodia, and J. L. Lin, “Discovering frequent event patterns with multiple granularities in time sequences,” IEEE Trans. Knowl. Data Eng., vol. 10, no. 2, pp. 222–237, Mar./Apr. 1998.
[14] J. Wang and J. Han, “BIDE: Efficient mining of frequent closed sequences,” in Proc. ICDE, 2004, pp. 79–90.
[15] A. Julea, N. Meger, P. Bolon, C. Rigotti, M.-P. Doin, E. Lasserre, C. Trouve, and V. Lazarescu, “Unsupervised spatiotemporal mining of satellite image time series using grouped frequent sequential patterns,” Int. J. Remote Sens., vol. 49, no. 4, pp. 1417–1430, Apr. 2011.
[16] S. K. Harms and J. S. Deogun, “Sequential association rule mining with time lags,” J. Intell. Inf. Syst., vol. 22, no. 1, pp. 7–22, Jan. 2004.
[17] T. Wu, G. Song, X. Ma, K. Xie, X. Gao, and X. Jin, “Mining geographic episode association patterns of abnormal events in global earth science data,” Sci. China Ser. E, Technol. Sci., vol. 51, no. Supplement 1, pp. 155–164, 2008.
[18] R. Honda and O. Konishi, “Temporal rule discovery for time-series satellite images and integration with RDB,” in Proc. PKDD, Freiburg, Germany, 2001, pp. 204–215.
[19] A. Julea, N. Méger, and E. Trouvé, “Sequential patterns extraction in multitemporal satellite images,” in Proc. PKDD, 2006, pp. 96–99.
[20] W. Ding, C. Eick, J. Wang, and X. Yuan, “A framework for regional association rule mining in spatial data sets,” in Proc. 6th IEEE Int. Conf. Data Mining, Hong Kong, China, 2006, pp. 851–856.
[21] W. Ding, T. Stepinski, and J. Salazar, “Discovery of geospatial discriminating patterns from remote sensing datasets,” in Proc. SIAM ICDM, Sparks, NV, 2009, pp. 425–436.
[22] J. Zhang, L. Gruenwald, and M. Gertz, “VDM-RS: A visual data mining system for exploring and classifying remotely sensed images,” Comput. Geosci., vol. 35, no. 9, pp. 1827–1836, Sep. 2009.
[23] W. T. Liu and F. Kogan, “Monitoring Brazilian soybean production using NOAA/AVHRR based vegetation condition indices,” Int. J. Remote Sens., vol. 23, no. 6, pp. 1161–1179, 2002.
[24] R. R. V. Goncalves, C. R. Nascimento, J. Zullo, Jr., and L. A. S. Romani, “Relationship between the spectral response of sugar cane, based on AVHRR/NOAA satellite images, and the climate condition, in the state of Sao Paulo (Brazil), from 2001 to 2008,” in Proc. 5th Int. Workshop MultiTemp, Groton, CT, 2009, pp. 315–322.
[25] B. F. T. Rudorff, M. Adami, D. A. de Aguiar, A. Gusso, W. F. da Silva, and R. M. de Freitas, “Temporal series of EVI/MODIS to identify land converted to sugarcane,” in Proc. IEEE IGARSS, Cape Town, South Africa, 2009, vol. IV, pp. 252–255.
[26] A. C. Xavier, B. F. T. Rudorff, Y. E. Shimabukuro, L. M. S. Berka, and M. A. Moreira, “Multi-temporal analysis of MODIS Data to classify sugarcane crop,” Int. J. Remote Sens., vol. 27, no. 4, pp. 755–768, 2006.
[27] C. R. Nascimento, R. R. V. Goncalves, J. Zullo, Jr., and L. A. S. Romani, “Estimation of sugar cane productivity using a time series of avhrr/noaa17 images and a phenology-spectral model,” in Proc. 5th Int. Anal. MultiTemp, Groton, CT, 2009, pp. 365–372.
[28] L. A. S. Romani, A. M. H. Avila, J. Zullo, Jr., R. Chbeir, C. Traina, Jr., and A. J. M. Traina, “CLEARMiner: A new algorithm for mining association patterns on heterogeneous time series from climate data,” in Proc. ACM SAC, Sierre, Switzerland, 2010, pp. 900–905.
[29] G. W. Rosborough, D. G. Baldwin, and W. J. Emery, “Precise AVHRR image navigation,” IEEE Trans. Geosci. Remote Sens., vol. 32, no. 3, pp. 644–657, May 1994.



[30] W. Emery, D. G. Baldwin, and D. Matthews, “Maximum cross correlation automatic satellite image navigation and attitude corrections for open-ocean image navigation,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 1, pp. 33–42, Jan. 2003.
[31] J. C. D. M. Esquerdo, J. F. G. Antunes, D. G. Baldwin, W. J. Emery, and J. Zullo, Jr., “An automatic system for AVHRR land surface product generation,” Int. J. Remote Sens., vol. 27, no. 18, pp. 3925–3942, 2006.
[32] W. J. Emery, J. Brown, and Z. P. Novak, “AVHRR image navigation: Summary and review,” Photogramm. Eng. Remote Sens., vol. 55, no. 8, pp. 1175–1183, Aug. 1989.
[33] P. Y. Chen, R. Srinivasan, G. Fedosejevs, and J. R. Kinitry, “Evaluating different NDVI composites techniques using NOAA-14 AVHRR data,” Int. J. Remote Sens., vol. 24, no. 17, pp. 3403–3412, 2003.
[34] B. N. Holben, “Characteristics of maximum value composite images from temporal AVHRR data,” Int. J. Remote Sens., vol. 7, pp. 1417–1434, 1986.
[35] C. W. Thornthwaite and J. R. Mather, “The water balance,” Climatology, vol. 8, no. 1, p. 104, 1955.

Luciana Alvim S. Romani received the B.Sc. degree in computer science from the Federal University of São Carlos, São Carlos, Brazil, in 1993, and the M.Sc. degree in computer science from the State University of Campinas, Campinas, Brazil, in 2000. She has been a Researcher with the Brazilian Agricultural Research Corporation since 1994. Her research interests include remote sensing applied to agriculture, data mining, information visualization, and human-computer interaction.

Ana Maria H. de Avila received the B.Sc. degree in meteorology from the Federal University of Pelotas, Pelotas, Brazil, the M.Sc. degree in agrometeorology from Federal University of Rio Grande do Sul, Porto Alegre, and the Ph.D. degree in agricultural engineering at University of Campinas (UNICAMP), Brazil, in 2006. Since July 2006, she has been working at the Center of Research Meteorological and Climatological applied to Agricultural (CEPAGRI) at UNICAMP. Currently, she is the Director of CEPAGRI and a Group Member of Climate Risk Zoning in Brazil. Her research interests include agrometeorology and climatology.

Daniel Y. T. Chino received the B.Sc. degree in computer science from Instituto de Ciências Matemáticas e de Computação (ICMC)—USP São Carlos, Brazil, in 2012. Currently, he is a graduate student in computer science at ICMC—USP São Carlos, Brazil. He has engaged in an undergraduate research project since 2009. His research interests include image processing, remote sensing, and data mining.

Jurandir Zullo, Jr. received the M.Sc. degree in applied mathematics and the Ph.D. degree in electrical engineering from the University of Campinas (UNICAMP), Campinas, Brazil, in 1990 and 1994, respectively. He received the B.Sc. degree in agricultural engineering (1987) and applied mathematics (1985) at the UNICAMP. Since 1987, he has been working at Center of Research Meteorological and Climatological applied to Agricultural (CEPAGRI) at UNICAMP. His research interests include agrometeorology, remote sensing, climatology, and mathematical modeling.

Richard Chbeir received the Ph.D. degree in computer science from the University of INSA-France, in 2001. Currently, he is an Associate Professor in the Computer Science Department of the Bourgogne University, Dijon, France. His research interests are in the areas of distributed multimedia database management, XML similarity and rewriting, spatiotemporal applications, indexing methods, multimedia access control models, security, and watermarking.

Caetano Traina, Jr. received the Ph.D. degree in computational physics, M.Sc. degree in computer science, and B.Sc. degree in electrical engineering from the University of Sao Paulo, São Carlos, Brazil, in 1986, 1982, and 1977, respectively. He is a Full Professor with the Computer Science Department at University of Sao Paulo at São Carlos. His research interests include multimedia databases, similarity queries on complex data, data mining, and metric access methods.

Agma J. M. Traina (A’00–M’11) received the Ph.D. degree in computational physics, M.Sc. degree in computer science, and B.Sc. degree in computer science from the University of Sao Paulo, São Carlos, Brazil, in 1991, 1987, and 1983, respectively. She is a Full Professor with the Computer Science Department at University of Sao Paulo at São Carlos. Her research interests include image databases, image mining, visual data mining, content-based image retrieval, indexing methods for multidimensional data, information visualization, image processing for agroclimate applications, and complex data.