Perception Based Patterns in Time Series Data Mining
I. Batyrshin, L. Sheremetov, and R. Herrera-Avelar
Summary. Import of intelligent features to systems supporting human decisions in problems related to the analysis of time series data bases is a promising research field. Such systems should be able to operate with fuzzy perception-based information about time moments and time intervals; about time series values, trends and shapes; about associations between time series and time series patterns, etc., to formalize human knowledge, to simulate human reasoning and to reply to human questions. The chapter discusses methods developed in time series data mining (TSDM) to describe linguistic perception-based patterns in time series data bases. The survey considers different approaches to the description of such patterns, which use signs of derivatives, scaling of trends and shapes, linguistic interpretation of patterns obtained as a result of clustering, a grammar for generation of complex patterns from shape primitives, and temporal relations between patterns. These descriptions can be extended by using fuzzy granulation of time series patterns to make them more adequate to the perceptions used in human reasoning. Several approaches to relating linguistic descriptions given by experts to automatically generated texts of summaries and linguistic forecasts are considered. Finally, we discuss the role of perception-based time series data mining and computing with words and perceptions in the construction of intelligent systems that use expert knowledge and decision making procedures in time series data base domains.
I. Batyrshin et al.: Perception Based Patterns in Time Series Data Mining, Studies in Computational Intelligence (SCI) 36, 85–118 (2007) www.springerlink.com © Springer-Verlag Berlin Heidelberg 2007

1 Introduction

Till now most of the decision making procedures in problems related to time series (TS) analysis in economics and finance are based on human decisions
supported by statistical, data mining, or data processing software. Import of intelligent features to these systems, including the possibility of operating with linguistic information, reasoning, and replying to questions, is a promising research field. Computational theory of perceptions (CTP) [1–3] can serve as a basis for such an extension of these systems. Fuzzy logic, as a main constituent of CTP, gives powerful tools for modeling and processing linguistic information defined on numerical domains. The methodology of computing with words and perceptions proposes methods of reasoning with linguistic information based on fuzzy models. The success of fuzzy logic applications in control, technical systems modeling, and pattern recognition is based on a synergy of linguistic descriptions and numerical data available for these application areas. Fuzzy logic serves here as a bridge between linguistic and numerical information. One of the prerequisites of fuzzy logic applications in these areas is the existence of regular resources of numerical data obtained from traditional mathematical models, experiments, or measurements, which can be used as a basis for construction, examination, and tuning of fuzzy models.

In his recent works and lectures, Lotfi Zadeh called attention to decision making applications of fuzzy logic in economics, finance, Earth sciences, etc., with the central role of human perceptions. Perception-based propositions like “The price of gas is low and declining,” “It is very unlikely that there will be a significant increase in the price of oil in the near future,” etc. are usually used by people in decision making procedures. Perceptions like low, declining, very unlikely, significant increase, near future, etc. usually use fuzzy granulation of information [4] obtained from observations, measurements, life experience, mathematical analysis, visual perceptions about curves, etc.
The formation of perceptions is a process of knowledge extraction from different sources. Development of intelligent question answering systems [5] supporting decision making procedures related to time series data bases (TSDB) requires formalizing human perceptions about time, time series values, patterns and shapes, about associations between patterns and time series, etc. These perceptions can be represented by words whose meaning is defined on the domains of time series data bases:
1. On time domain:
   – Time intervals: one–two weeks, several days, end of the day
   – Absolute position on time scale: approximately on September 20
   – Relative position: after one month, near future
   – Periodic or seasonal time intervals: end of the day, several weeks before Christmas, in summer
2. On the domain of TS values: large price; very low level of production
3. As perception-based function or pattern of TS shape: slowly decreasing, quickly increasing, and slightly concave
4. On the set of time series, attributes, or system elements: stocks of new companies
5. On the set of relations between TS, attributes, or elements: highly associated
6. On the set of possibility or probability values: unlikely, very probable

Most of such perceptions can be represented as fuzzy sets defined on the corresponding domain. Figure 1 depicts an example of the fuzzy set SEVERAL DAYS defined on the time domain. This fuzzy set reflects the perception that 3, 4, and 5 days definitely correspond to the term SEVERAL DAYS, whereas 2 and 6 days correspond to this term only partially.
Fig. 1. Fuzzy set SEVERAL DAYS (vertical axis: membership m; horizontal axis: time t)
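A fuzzy set such as SEVERAL DAYS can be sketched as a trapezoidal membership function. The snippet below is only an illustration: the breakpoints (1, 3, 5, 7) are an assumption chosen so that 3, 4, and 5 days receive full membership and 2 and 6 days receive partial membership, and the names `trapezoid` and `several_days` are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def several_days(days):
    # Assumed breakpoints: full membership on 3..5 days, partial at 2 and 6 days.
    return trapezoid(days, 1, 3, 5, 7)


for days in range(1, 9):
    print(days, several_days(days))
```

With these assumed breakpoints, 2 and 6 days both receive membership 0.5. Other shapes or breakpoints would express the same perception equally well.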
Figure 2a depicts an example of a time series of PRICE values; Fig. 2b depicts a fuzzy set LARGE PRICE defined on a domain of price values y. This fuzzy set, together with the time series of price values, defines by the extension principle of Zadeh a fuzzy set DAYS WITH LARGE PRICE on the time domain (Fig. 2c).
Fig. 2. (a) Time series of PRICE values; (b) fuzzy set LARGE PRICE defined on domain of price values y; (c) fuzzy set DAYS WITH LARGE PRICE induced on time domain
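One simple reading of the induced fuzzy set is that the membership of a day t in DAYS WITH LARGE PRICE equals the membership of that day's price in LARGE PRICE. The sketch below follows this reading. The piece-wise linear shape of `large_price`, its thresholds, and the price values are all assumptions made for illustration.

```python
def large_price(y, low=6.0, high=10.0):
    """Assumed membership of LARGE PRICE: 0 up to `low`, 1 from `high`, linear between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


def days_with_large_price(prices):
    """Membership of each day t in DAYS WITH LARGE PRICE: mu(t) = mu_LARGE(price(t))."""
    return [large_price(y) for y in prices]


prices = [4.0, 6.0, 8.0, 10.0, 12.0, 7.0]  # made-up daily prices
print(days_with_large_price(prices))
```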
Figure 3 depicts an example of fuzzy sets defined on a domain of slope values and corresponding to perception-based trend patterns: Quickly Decreasing, Slowly Decreasing, Constant, Slowly Increasing, and Quickly Increasing. An exact definition of membership values of fuzzy sets used in models of computing with words is often not critical, because the input and output of the model are words [1], which are insensitive to small changes in the membership values of the fuzzy sets obtained as a result of translation of words.
Fig. 3. Example of fuzzy perception-based slopes (Q-DEC, S-DEC, CONST, S-INC, Q-INC) defined on a domain of slope values
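A granulation of slope values like the one in Fig. 3 can be sketched with overlapping triangular fuzzy sets. The peaks at -10, -5, 0, 5, and 10 and the common half-width of 5 are assumptions, not values read from the figure; slopes outside [-10, 10] are saturated to the nearest end of the domain.

```python
# Assumed peaks of the five fuzzy sets on the slope domain [-10, 10].
PEAKS = {"Q-DEC": -10.0, "S-DEC": -5.0, "CONST": 0.0, "S-INC": 5.0, "Q-INC": 10.0}
WIDTH = 5.0  # assumed half-width of every triangle


def slope_memberships(slope):
    """Return the membership of a slope value in each perception-based trend."""
    x = max(-10.0, min(10.0, slope))  # saturate outside the domain
    return {label: max(0.0, 1.0 - abs(x - peak) / WIDTH) for label, peak in PEAKS.items()}


def best_label(slope):
    """Pick the label with the highest membership (ties go to the first label)."""
    memberships = slope_memberships(slope)
    return max(memberships, key=memberships.get)


print(slope_memberships(2.5))
print(best_label(4.0))
```

A slope of 2.5 belongs equally to CONST and S-INC in this sketch, which illustrates the unsharp boundary between neighboring trend words.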
This situation differs from fuzzy modeling based on Mamdani or Sugeno models, where the initial definition of fuzzy sets usually does not play a large role in the construction of the final fuzzy model when tuning of membership functions is used in the presence of training input–output data [6, 7]. Human perceptions are intrinsically imprecise and granular, such that the boundaries of perceived classes are unsharp and the values of attributes are granulated [4]. Such a granulation may be crisp or fuzzy. The CTP gives a conceptual framework and a methodology for computing and reasoning with perceptions. The base for CTP is the methodology of computing with words (CW) [1]. In this case, computing and reasoning with perceptions is reduced to computing and reasoning with words. Granular perceptions can be represented by fuzzy sets, and computing with perceptions can be realized by methods of fuzzy set theory [8]. Computing with words and perceptions can serve as a basis for insertion of deduction capability [5] into decision support systems related to economic and financial time series data bases. An intelligent question answering system based on financial or economic time series data bases should reply to fuzzy questions, realize perception-based inference, and perform perception-based forecasting. Below are examples of questions for such systems:

1. Find:
   – Wells with high level of water production
   – Securities quickly increasing in price at the end of the day
   – Highly mutually associated currencies
   – Most promising securities
2. Forecast:
   – The price of sugar in the middle of July if we know this price at the beginning of May. Additional information: The price of sugar is slowly increasing in Spring and more quickly increasing in Summer.
   – The prices of cosmetics if the oil price will be greater than 75 dollars per barrel.
   – The sales of the new product after six months.
3. Optimize:
   – What commodities, when and what amount to buy to obtain maximal profit during the next year?
   – What production capacity is needed to produce a new product?
It is very difficult to give an exact reply to most of these questions without additional information about environment conditions and requirements, so only qualitative, perception-based evaluations may be given. These replies will depend on the evaluation of the current situation, tendencies, existing associations between system elements, expert knowledge, etc. The currently developed methods of time series analysis, forecasting, and mathematical programming may be useful for replying to some of these questions if the exact information needed to apply these methods is known. But very often such information is fuzzy, absent, or scarce. In contrast to time series of physical variables, e.g., electromagnetic waves, which find applications in technical systems as a result of applying methods of mathematical physics, economic and financial time series are often evaluated and used in economic and financial decisions based on human perceptions, expertise, and knowledge. Linguistic rules and perception-based descriptions are an intrinsic part of such human solutions. Moreover, economic and financial systems are very complex, and it is impossible to take into account all information influencing the solution of the decision making problems. To these systems the Principle of Incompatibility of Zadeh can be applied [9]: “As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics.” For this reason, often only qualitative, perception-based solutions make sense in decision making in complex systems. Realization of a system supporting perception-based decision making procedures in problems related to the analysis of time series data bases requires extending the methods of time series data mining (TSDM) so that they can operate with perceptions.
The goal of data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner [10]. The following list contains the main time series data mining tasks [11–17]:
Segmentation. Split a time series into a number of “meaningful” segments. Possible representations of a segment include an approximating line, perceptual pattern, word, etc.
Clustering. Find natural groupings of time series or time series patterns.
Classification. Assign given time series or time series patterns to one of several predefined classes.
Indexing. Realize efficient execution of queries.
Summarization. Give a short description of a time series (or multivariate time series) which retains its essential features in the considered problem.
Anomaly Detection. Find surprising and unexpected patterns.
Motif Discovery. Find frequently occurring patterns.
Forecasting. Forecast time series values based on time series history or human expertise.
Discovery of association rules. Find rules relating patterns in time series (e.g., patterns that occur frequently in the same or in neighboring time segments).

These tasks are mutually related; for example, segmentation can be used for indexing, clustering, summarization, etc. Perception-based time series data mining systems should be able to manipulate linguistic information, fuzzy concepts, and perception-based patterns of time series to support human decision making in problems related to time series data bases. Development of such systems requires extending the methods of time series data mining (TSDM) [12–20] so that they can operate with perceptions. Fortunately, a number of methods in time series data mining were recently developed for manipulating such information. A survey of perception-based patterns used in TSDM is given in the following sections. The goal of this survey is not to review all papers in TSDM and time series analysis that use perception-based patterns but to consider the main types of such patterns that can be useful for perception-based time series data mining (PTSDM). Usually the patterns used in TSDM are crisp, but they may be easily generalized to represent fuzzy patterns. The possible applications of fuzzy models to signal processing, data mining, and knowledge management in data bases were discussed, e.g., in [35, 57–61].
The linguistic description of time series and solutions of time series data mining tasks can have different forms depending on the goal of the linguistic description. Such a description can be given as a sequence of perception-based patterns (A1, A2,…, An), as a sequence of rules “If T is Tk then A is Ak,” k = 1,…,n, where Tk are crisp or fuzzy intervals and Ak are linguistic shape descriptors, or as a less formal text generated as a result of summarization of multivariate time series. Due to space limits we are not going to discuss the
methods and algorithms of extraction of these patterns. The necessary details can be found in the cited papers. Most of the discussed approaches can be used in different time series data mining tasks. In Sect. 2 we start with the perception-based patterns considered mainly in qualitative process analysis and used for process monitoring, fault detection, and qualitative reasoning about processes. Time series are divided into episodes described by temporal patterns or primitives defined by the signs of the first and the second derivatives. In Sect. 3 we consider patterns based on scaling of trends and shapes. Elementary patterns or primitives can be used for generation of more complicated patterns based on a suitable grammar. In Sect. 4 we consider the shape definition language [14], which can be used for generation of composed shape patterns. The language facilitates generation (and execution) of queries to discover important information in time series. In Sect. 5 we consider an approach to the definition of patterns and rules which is based on clustering of shapes and linguistic interpretation of clusters. Once the patterns are identified, the next step is finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. One specific type of such relationship, considered in Sect. 6, is a temporal one. Integration of a TSDM system with the experience of human experts and generation of summaries and textual descriptions of data sets require defining patterns in expert knowledge and relating them to patterns in time series. These approaches are discussed in Sect. 7. The Conclusion contains a discussion of possible directions of research in perception-based time series data mining.
2 Patterns Based on Signs of Derivatives

Triangular episodes representation language was formulated in [21, 22] for representation and extraction of temporal patterns. Figure 4 shows seven temporal episodes used for description of temporal patterns. These episodes can be linguistically described as A: Increasing and Concave; B: Decreasing and Concave; C: Decreasing and Convex; D: Increasing and Convex; E: Linearly Increasing; F: Linearly Decreasing; G: Constant.
Fig. 4. Temporal episodes
Fig. 5. Representation of time series by temporal episodes. Adopted from [23]
Figure 5 depicts an example of time series representation by temporal episodes. Episodes are separated by vertical dashed lines. This representation generates a segmentation of the time series into temporal episodes and codes it by the sequence of temporal patterns: ABCDABCDAB. Such a coded representation of time series can be used for dimensionality reduction, indexing, clustering of time series, etc. The possible applications of such a representation are process monitoring, diagnosis, and control [21–25]. An extension of this approach to the description of temporal episodes is considered in [24, 25]. Connected temporal episodes A, B and C, D with the same sign of the second derivative are joined together in composite episodes AB and CD. Such episodes are classified by means of the sign of the slope of the line joining the boundary points of the episode. In the proposed approach the new episodes AB↓, AB↑, AB=, CD↓, CD↑, CD= are added. Figure 6 depicts the proposed classification of episodes.
Fig. 6. Extended set of episodes. Adopted from [24]
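The coding of a time series by the episodes A to G can be sketched with finite differences standing in for the derivatives. This is a minimal illustration, not the extraction procedure of [21, 22]: it assumes that "concave" means a negative second derivative, uses central differences, skips points where the first difference vanishes at an extremum, and applies no smoothing, so it is only meaningful for noise-free data. The names `sign`, `episode`, and `encode` are hypothetical.

```python
import math


def sign(x, eps=1e-9):
    """Sign of x with a small dead zone around zero."""
    return 0 if abs(x) <= eps else (1 if x > 0 else -1)


def episode(d1, d2):
    """Map the signs of the first and second differences to an episode letter."""
    s1, s2 = sign(d1), sign(d2)
    if s1 == 0 and s2 == 0:
        return "G"                      # constant
    if s2 == 0:
        return "E" if s1 > 0 else "F"   # linearly increasing / decreasing
    if s1 == 0:
        return None                     # extremum sample: no episode assigned
    if s1 > 0:
        return "A" if s2 < 0 else "D"   # increasing: concave / convex
    return "B" if s2 < 0 else "C"       # decreasing: concave / convex


def encode(y):
    """Code a series as a string of episodes, merging runs of equal letters."""
    letters = []
    for i in range(1, len(y) - 1):
        d1 = (y[i + 1] - y[i - 1]) / 2.0
        d2 = y[i + 1] - 2.0 * y[i] + y[i - 1]
        label = episode(d1, d2)
        if label is not None and (not letters or letters[-1] != label):
            letters.append(label)
    return "".join(letters)


# Two and a half periods of a sine wave, sampled off the extrema and inflections.
wave = [math.sin(2 * math.pi * (k + 0.5) / 40) for k in range(100)]
print(encode(wave))
```

On real, noisy measurements the signs of raw differences change frequently, which is why approximation or smoothing is usually applied before the signs are extracted.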
A more extended dictionary of temporal patterns defined by the signs of the first (sd1) and the second (sd2) derivatives in the pattern is considered in [26]. This dictionary includes the perceptual patterns PassedOverMaximum, IncreasingConcavely, StartedToIncrease, ConvexMaximum, etc. (see Fig. 7). Figure 8 depicts an example of transformation of the profile of a measured variable xj(t) into the qualitative form as a result of approximation of xj(t) by a proper analytical function from which the signs of derivatives are extracted [26]. The methods of application of temporal episodes to the description of noisy time series as a result of approximation and smoothing are also discussed in [27]. The paper [26] introduces a method for reasoning about the form of the recent temporal profiles of process variables, which carry important information about the process state and underlying process phenomena. The method is illustrated by the example of control of fermentation processes. Below are examples of the shape analyzing rules used by the decision making system [26]:
Fig. 7. Elements of shape library. Adopted from [26]
Fig. 8. Example of transformation of the temporal profile into the qualitative form qshape = {sd1; sd2}. Adopted from [26]
IF (DOincrement > 5%) and (DuringTheLast30sec DO has been Increasing) THEN (Report: Glucose depletion) and (Activate glucose feeding);
IF (DuringTheLast1hr DO has been DecreasingConcavelyConvexly) THEN (Report: Foaming) and (Feed antifoam agent),

where DO denotes Dissolved Oxygen. The qualitative description of time series and processes can be used in qualitative reasoning about processes [28, 29], which takes into account the change of signs of derivatives of the considered processes. The temporal patterns considered in this section take into account only the signs of the first and second derivatives of a function representing a process. This property may be considered a positive feature of the temporal pattern representation language because (1) the time series representation is invariant to transformations of the time and time series value domains, and (2) these patterns give a qualitative description of processes or time series. But these representations do not use the scaling of patterns typical of human perceptions. Such perception-based patterns are considered in the following section.
3 Scaling of Perception-Based Patterns

A scaling of perception-based patterns is used in many papers. This scaling can be applied to time series values, to slope values, to convex-concave shapes, etc. The method of symbolic representation of time series called SAX was proposed in [12]. This method divides the domain of time series values in the considered window into intervals, and time series values are replaced by the codes of the respective intervals containing these values. An example of time series representation by SAX is shown in Fig. 9. The authors use a scale with grades a, b, c, d, e, f for coding time series values. A generalization of this method can be based on a suitable replacement of these symbols by linguistic labels like very small, medium, large, etc. and on fuzzy granulation of linguistic grades, e.g., as shown in Fig. 2b. The paper [12] also gives a classification of various time series representations based on piece-wise linear representation (PLR), wavelets, Discrete Fourier Transform, etc. Granulation of slope values of functional dependencies and time series is used in [30], which describes a system that generates linguistic descriptions of time series in the form of rules Rk: If T is Tk then Y is Ak, where Tk are fuzzy intervals, like Between A and B, Small, and Ak are linguistic descriptions of trends, like Very Quickly Increasing, Slowly Decreasing, etc. An evolutionary procedure is used to find an optimal partition of the time domain into fuzzy intervals where time series values are approximated by linear functions. The paper discusses the methods of retranslation of the obtained piece-wise linear approximation of time series into linguistic form.
Fig. 9. Example of time series representation by SAX as a string ffffffeeeddcbaabceedcbaaaaacddee. Adopted from [12]
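The symbolic coding can be sketched as follows. This is a simplified illustration of the usual SAX recipe (z-normalization, averaging over equal frames, and breakpoints that split the standard normal distribution into equiprobable intervals), written with the Python standard library only. The function name `sax` and its parameters are hypothetical, and trailing values are dropped when the series length is not a multiple of the word length.

```python
import bisect
import statistics
import string


def sax(series, word_length, alphabet_size):
    """Return a SAX-like word for `series` (simplified sketch)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # z-normalize; a constant series is mapped to all zeros
    z = [(v - mean) / std if std > 0 else 0.0 for v in series]

    # Piecewise aggregate approximation: mean of each equal-length frame.
    frame = len(z) // word_length
    paa = [statistics.fmean(z[i * frame:(i + 1) * frame]) for i in range(word_length)]

    # Breakpoints cutting the standard normal into equiprobable intervals.
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

    letters = string.ascii_lowercase[:alphabet_size]
    return "".join(letters[bisect.bisect_right(breakpoints, v)] for v in paa)


print(sax([0, 1, 2, 3, 4, 5, 6, 7], word_length=4, alphabet_size=4))
```

Replacing the letters by linguistic labels such as very small, medium, or large gives the generalization mentioned above; a fuzzy variant would return membership degrees instead of a single letter per frame.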
Another approach to linguistic description of time series was proposed in [31]. The method is based on the Moving Approximation Transform [32], which replaces a time series by a sequence of slope values of linear functions approximating time series values in a sliding window. Figure 10 shows an example of piece-wise linear representation of time series obtained by this method. The time series in this example contains information about the Industrial Production Index published by the Board of Governors of the Federal Reserve System [33]. It includes monthly data for the time period from 1940-01-01 to 2003-07-01. The part of the linguistic description of the time series corresponding to the last two segments can be presented, e.g., in the following form: “Last 2.5 years the Index Slowly Decreased, whereas during previous 8 years, it is Quickly Increased.” Similar linguistic descriptions of time series are reported in [34], which uses a scaling of trends in a system that detects and linguistically describes significant trends in time-series data, applying wavelets and scale space theory. In this work some experimental results of application of this system to summarization of weather data are presented. Below is an example of a description generated automatically by the system, which used perception-based patterns like dropped, dropped sharply, etc.:
Fig. 10. Piece-wise linear representation of time series. Adopted from [31]
“The temperature dropped sharply between the 3rd and the 6th. Then it rose between the 6th and the 11th, dropped sharply between the 14th and 16th and dropped between the 23rd and 28th.” The linguistic description of time series based on piece-wise linear representation (PLR) of time series can also be based on optimal PLR algorithms developed in time series data mining [17]. In this case, PLR algorithms should be modified to take into account a linguistic scaling of the set of possible slope values. The resulting algorithm should also avoid situations where neighboring intervals have the same shape descriptors. Possible applications of fuzzy models to signal processing were discussed in [35]. Table 1 gives a list of primitives used for description of Carotid waveforms [35, 36]. Figure 11 shows an example of segmentation of part of a Carotid pulse wave by means of these primitives. The sequence of primitives L = (Up, Cap, Cup, Cup, LN, Cup, Cap, TE, Up, Cap, Cup, Cap, LN) can be used in syntactic pattern recognition of systolic and diastolic epochs. The paper [35] discusses the possibility of introducing fuzziness into syntactic descriptions of digital signals in some very natural ways. Two regions in Fig. 11 marked F denote fuzzy boundaries between the systolic and diastolic regimes and between primitives Up and Cap of this signal [35]. The functions mT(t) and mu(t) denote the respective transition membership functions. The paper [37] presents an approach to modeling time series datasets using linguistic shape descriptors. A simple linguistic term such as “rising” indicates that the series at this point is changing, i.e., y_{k+1} > y_k. These terms are a measure of the first derivative trend of the series. A more complex term such as “rising more steeply” indicates that the trend of the series is changing, i.e., y_{k+2} - y_{k+1} > y_{k+1} - y_k. These terms are a measure of the second derivative trend of the series.
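The two inequalities above can be turned into a small crisp labeling routine. The sketch below is an illustration only: the "rising" and "rising more steeply" cases follow the inequalities in the text, while the symmetric "falling" and "less steeply" cases are an assumed extension, and [37] works with fuzzy rather than crisp terms. The function name `trend_term` is hypothetical.

```python
def trend_term(y, k):
    """Label the trend at position k from y[k], y[k + 1], y[k + 2]."""
    d_now = y[k + 1] - y[k]        # first difference at k
    d_next = y[k + 2] - y[k + 1]   # first difference at k + 1

    if d_now > 0:
        if d_next > d_now:
            return "rising more steeply"
        if d_next < d_now:
            return "rising less steeply"
        return "rising"
    if d_now < 0:
        if d_next < d_now:
            return "falling more steeply"
        if d_next > d_now:
            return "falling less steeply"
        return "falling"
    return "constant"


series = [0, 1, 3, 4, 4, 3, 1]  # made-up values
print([trend_term(series, k) for k in range(len(series) - 2)])
```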
Table 1. List of primitives used for description of Carotid waveforms [35, 36]

Up    upslope         long line, large positive slope
LP    large-pos       medium line, large positive slope
LN    large-neg       medium line, large negative slope
MP    med-pos         medium line, positive slope
MN    med-neg         medium line, negative slope
TE    trailing edge   long line, medium negative slope
Cup   parabola        opening up
Cap   parabola        opening down

Fig. 11. Carotid waveform representation as a sequence of primitives. Adopted from [35]

Parametric prototype trend shapes are given by the following functions:
falling less steeply: $f(t) = 1 - \sqrt{1 - (1 - t)^{\alpha}}$,
falling more steeply: $f(t) = 1 - t^{\alpha}$,
rising more steeply: $f(t) = 1 - \sqrt{1 - t^{\alpha}}$,
rising less steeply: $f(t) = 1 - (1 - t)^{\alpha}$,
crest: $f(t) = 1 - 2^{\alpha} (t - 0.5)^{\alpha}$,
trough: $f(t) = 2^{\alpha} (t - 0.5)^{\alpha}$.
Figure 12 depicts these prototype shapes for different values of parameter α. Figure 13 depicts how trend shapes can be matched with a series. Each region of interest can have membership in several trend shapes. The trend concepts are used to build fuzzy rules in the form [37]:
If trend is F then next point is Y,
If trend is F then next point is current point + dY,
Fig. 12. Examples of prototype shapes (a) “falling less steeply,” α = 6; 3; 1.5; (b) “falling more steeply,” α = 1.5; 3; 6; (c) “rising more steeply,” α = 6; 3; 1.5; (d) “rising less steeply,” α = 1.5; 3; 6; (e) “crest,” α = 2; (f) “trough,” α = 2
Fig. 13. Shape matching. Adopted from [37]
where F is a trend fuzzy set, such as “rising more steeply,” and Y, dY are fuzzy time series values. Prediction using these trend fuzzy sets is performed using the Fril evidential logic rule [38]. The approach also uses fuzzy scaling of trends similar to the method depicted in Fig. 3, with linguistic patterns: falling fast, falling slowly, constant, rising slowly, and rising fast.
More extended methods of scaling and fuzzy granulation of time series shape patterns are considered in [63].
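Scaling of trends needs slope values to work on. One simple way to obtain them, in the spirit of the sliding-window slopes of the Moving Approximation Transform [32] mentioned earlier in this section, is a least-squares line fit in every window. The sketch below assumes equally spaced time points, and the thresholds `slow` and `fast` used for the crisp labels are arbitrary assumptions; a fuzzy scaling would replace them by overlapping membership functions. The names `window_slopes` and `crisp_label` are hypothetical.

```python
def window_slopes(y, window):
    """Least-squares slope of y over each sliding window of equally spaced points."""
    if window < 2:
        raise ValueError("window must contain at least two points")
    t_mean = (window - 1) / 2.0
    denom = sum((t - t_mean) ** 2 for t in range(window))
    slopes = []
    for start in range(len(y) - window + 1):
        chunk = y[start:start + window]
        y_mean = sum(chunk) / window
        num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(chunk))
        slopes.append(num / denom)
    return slopes


def crisp_label(slope, slow=0.5, fast=2.0):
    """Map a slope to one of five trend words using assumed thresholds."""
    magnitude = abs(slope)
    if magnitude <= slow:
        return "constant"
    direction = "rising" if slope > 0 else "falling"
    speed = "fast" if magnitude >= fast else "slowly"
    return f"{direction} {speed}"


slopes = window_slopes([1, 3, 5, 7, 9], window=3)
print(slopes, [crisp_label(s) for s in slopes])
```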
4 Shape Definition Language

A shape definition language (SDL) was developed in [14] for retrieving objects based on shapes contained in the histories associated with these objects. For example, in a stock database, the histories of opening price, closing price, the high for the day, the low for the day, and the trading volume may be associated with each stock. SDL allows a variety of queries about the shapes of histories. It performs “blurry” matching [14], where the user cares about the overall shape but does not care about specific details. SDL has an efficient implementation based on an index structure for speeding up the execution of SDL queries. Table 2 gives an illustrative alphabet of SDL, where lb and ub are the lower and upper bounds, respectively, of the allowed variation from the initial value to the final value of the transition.

Table 2. An illustrative alphabet A

symbol       description                                         lb      ub
up           slightly increasing transition                      .05     .19
Up           highly increasing transition                        .20     1.0
down         slightly decreasing transition                      –.19    –.05
Down         highly decreasing transition                        –1.0    –.19
appears      transition from a zero value to a nonzero value     0       1.0
disappears   transition from a nonzero value to a zero value     –1.0    0
stable       the final value nearly equal to the initial value   –.04    .04
zero         both the initial and final values are zero          0       0
Figure 14 shows an example of a time sequence. Given alphabet A, this time sequence may be described, e.g., by two different transition sequences:
(zero appears up up up down stable Down down disappears)
(zero stable up up up down stable Down down stable)
This alphabet can be used for definition of shapes as follows:
(shape name(parameters) descriptor)
Fig. 14. Time sequence H=(0,0,.02,.17,.35,.50,.45,.43,.15,.03,0). Adopted from [14]
For example, “a spike” can be defined as
(shape spike ( ) (concat Up up down Down)),
where the list of parameters is empty and concat denotes concatenation. All the symbols of the alphabet correspond to elementary shapes. Complex shapes can be derived by recursive combination of elementary and previously defined shapes. A set of available operators provides multiple choice, concatenation, and multiple and bounded occurrences of shapes in complex shape descriptions. SDL is a natural and powerful language for expressing shape queries with the following syntax [19]:
(query (shape history-spec))
Here, shape is the descriptor for the shape to be matched. The history-spec is of the form: history-name, start-time, and end-time. Here history-name specifies the name of the history in which the shape should be matched, and start-time and end-time define the interval on which matching occurs. The
result of the execution of a query is the set of all rules that contain the desired shape in the specified history. In addition, the result also contains the list of subsequences of the history that matched the shape. The approach makes it possible to retrieve combinations of several shapes in different histories by using the logical operators “and” and “or”. The query language provides the capability to discover important information in time series data bases.
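The way the illustrative alphabet of Table 2 maps transitions to symbols can be sketched as follows. This is not the SDL implementation of [14]: the dictionary layout, the function name `admissible_symbols`, and the treatment of the zero and nonzero conditions are assumptions made for illustration, and differences are rounded to avoid floating-point artifacts at the interval bounds. The sequence H is the one given in the caption of Fig. 14.

```python
# symbol: (lb, ub, initial value must be zero?, final value must be zero?)
# None means the condition is not checked (an assumption of this sketch).
ALPHABET = {
    "up": (0.05, 0.19, None, None),
    "Up": (0.20, 1.0, None, None),
    "down": (-0.19, -0.05, None, None),
    "Down": (-1.0, -0.19, None, None),
    "appears": (0.0, 1.0, True, False),
    "disappears": (-1.0, 0.0, False, True),
    "stable": (-0.04, 0.04, None, None),
    "zero": (0.0, 0.0, True, True),
}


def admissible_symbols(start, end):
    """Return every symbol of the alphabet that can describe the transition."""
    change = round(end - start, 10)  # guard against 0.5 - 0.45 != 0.05 in floats
    result = set()
    for symbol, (lb, ub, init_zero, final_zero) in ALPHABET.items():
        if not lb <= change <= ub:
            continue
        if init_zero is not None and (start == 0) != init_zero:
            continue
        if final_zero is not None and (end == 0) != final_zero:
            continue
        result.add(symbol)
    return result


H = [0, 0, .02, .17, .35, .50, .45, .43, .15, .03, 0]
admissible = [admissible_symbols(a, b) for a, b in zip(H, H[1:])]
for (a, b), symbols in zip(zip(H, H[1:]), admissible):
    print(a, "->", b, sorted(symbols))
```

Several transitions admit more than one symbol (for example, the change from 0 to .02 is both "appears" and "stable"), which is why the same time sequence can be described by different transition sequences.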
5 Patterns with Human Interpretation

The paper [13] studies the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. The patterns are formed from data. The method first forms subsequences by sliding a window through the time series, and then clusters these subsequences by using a suitable measure of time series similarity. The discretized version of the time series is obtained by taking the cluster identifiers corresponding to the subsequences. Rule finding methods are then used to obtain rules from the discretized sequences. Figure 15 depicts a simplified example with the time series s = (1, 2, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4). The set of subsequences formed by a sliding window of width 3 is clustered by some clustering method based on a given distance measure between subsequences. The right side of Fig. 15 shows three primitive shapes a1, a2, and a3 obtained after clustering. Replacement of subsequences by the corresponding names of shapes gives the discretized series D(s) = (a1,a2,a1,a2,a3,a1,a2,a3,a1,a2). The discretization process depends on the choice of window size, on the choice of distance measure, and on the type of clustering algorithm used. The simplest rules discovered from a set of discretized sequences have the format:
if A occurs, then B occurs within time T,
Fig. 15. Example of time series discretization. Adopted from [13]
where A and B are identifiers of patterns (clusters of patterns) discovered in time series. In short form, this rule may be written as $A \overset{T}{\Rightarrow} B$. Frequency, confidence, and J-measure were used for selecting interesting rules [13, 39]. The approach was applied to several data sets. For example, from the daily closing share prices of ten database companies traded on the NASDAQ, the following significant rule was found: $s_{18} \overset{20}{\Rightarrow} s_{4}$. The patterns s18 and s4 are shown in Fig. 16. These patterns were obtained for window size w = 13. An interpretation of the rule is that a stock which follows the 2.5-week declining pattern s18 (“sharper decrease and then leveling out”) will likely incur a “short sharp fall” within 4 weeks (20 days) before leveling out again (the shape of s4).
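The discretization of the example series s in Fig. 15 can be reproduced with a minimal Python sketch. As a stand-in for clustering with a distance measure, it simply groups windows that have exactly the same shape after subtracting their first value. This shortcut and the function name `discretize` are assumptions for illustration, not the method of [13], which clusters subsequences using a similarity measure.

```python
from typing import Dict, List, Sequence, Tuple


def discretize(series: Sequence[float], width: int) -> Tuple[List[str], Dict[tuple, str]]:
    """Slide a window over `series` and name each window by its shape.

    Windows are made offset-invariant by subtracting their first value;
    identical shapes get the same name (a1, a2, ...) in order of first
    appearance. Exact matching replaces real clustering here.
    """
    names: Dict[tuple, str] = {}
    labels: List[str] = []
    for i in range(len(series) - width + 1):
        window = series[i:i + width]
        shape = tuple(v - window[0] for v in window)
        if shape not in names:
            names[shape] = f"a{len(names) + 1}"
        labels.append(names[shape])
    return labels, names


s = [1, 2, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4]
labels, names = discretize(s, width=3)
print(labels)
# ['a1', 'a2', 'a1', 'a2', 'a3', 'a1', 'a2', 'a3', 'a1', 'a2']
print(names)
# {(0, 1, 0): 'a1', (0, -1, 0): 'a2', (0, 1, 2): 'a3'}
```

With real data, windows rarely coincide exactly, which is why the result depends on the window size, the distance measure, and the clustering algorithm, as noted above.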
Fig. 16. Example of patterns of significant rule. Adopted from [13]
As stressed by the authors, the proposed technique is essentially intended as an exploratory method and thus, iterative and interactive application of the method coupled with human interpretation of the rules is likely to lead to the most useful results rather than any fully automated approach [13].
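For such exploratory use, it helps to see how a rule of the form "if A occurs, then B occurs within time T" can be scored. The sketch below takes the discretized series D(s) of Fig. 15, counts how often A occurs and how often B follows within the next T positions, and reports the ratio as a confidence. The window convention (positions i+1 to i+T) and the function names are assumptions made here; [13] should be consulted for the precise definitions of frequency, confidence, and the J-measure.

```python
from typing import List


def frequency(seq: List[str], a: str) -> float:
    """Relative frequency of symbol `a` in the discretized sequence."""
    return seq.count(a) / len(seq)


def confidence(seq: List[str], a: str, b: str, t: int) -> float:
    """Fraction of occurrences of `a` followed by `b` within `t` steps.

    Assumed convention: `b` must appear at one of the positions
    i+1, ..., i+t after an occurrence of `a` at position i.
    """
    occurrences = [i for i, sym in enumerate(seq) if sym == a]
    if not occurrences:
        return 0.0
    hits = sum(1 for i in occurrences if b in seq[i + 1:i + 1 + t])
    return hits / len(occurrences)


# Discretized series D(s) from the example of Fig. 15
d = ["a1", "a2", "a1", "a2", "a3", "a1", "a2", "a3", "a1", "a2"]

print(frequency(d, "a1"))             # 0.4
print(confidence(d, "a1", "a2", 1))   # 1.0
print(confidence(d, "a2", "a3", 1))   # 0.5
```

On this toy series the rule a1 ⇒ a2 with T = 1 holds every time a1 occurs, while a2 ⇒ a3 holds for half of the occurrences of a2; a human analyst would still have to judge whether such rules are meaningful.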
6 Temporal Relationships Between Patterns
The approach to knowledge discovery from multivariate time series developed in [40] consists of several stages. Initially, the time series are segmented and transformed into sequences of state intervals $(b_i, s_i, f_i)$, $i = 1, \dots, n$. Here, $s_i$ are time series states such as increasing, decreasing, constant, highly increasing, and convex, holding during the time periods $(b_i, f_i)$, where $b_i \le b_{i+1}$ and $b_i < f_i$. It is required that every state interval is maximal in the sense that the series contains no two state intervals with the same state that overlap or meet each other. The temporal relationships between state intervals are described by the 13 relations of Allen’s interval logic [41], shown in Fig. 17.
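The maximality requirement on state intervals can be illustrated with a short Python sketch that merges intervals with the same state whenever they overlap or meet. The tuple layout (b, s, f) follows the text; the function name `make_maximal` and the sample data are hypothetical, and the segmentation step that produces the intervals in [40] is not modelled.

```python
from typing import Dict, List, Tuple

StateInterval = Tuple[float, str, float]  # (b, s, f): begin, state, finish


def make_maximal(intervals: List[StateInterval]) -> List[StateInterval]:
    """Merge same-state intervals that overlap or meet each other."""
    runs: Dict[str, List[List[float]]] = {}
    for b, s, f in sorted(intervals):
        state_runs = runs.setdefault(s, [])
        if state_runs and b <= state_runs[-1][1]:
            # Overlaps or meets the previous interval of this state: extend it.
            state_runs[-1][1] = max(state_runs[-1][1], f)
        else:
            state_runs.append([b, f])
    # Return intervals ordered by begin time, so that b_i <= b_{i+1}.
    return sorted((b, s, f) for s, rs in runs.items() for b, f in rs)


raw = [
    (0, "increasing", 4),
    (4, "increasing", 7),   # meets the previous increasing interval
    (6, "constant", 9),
    (8, "constant", 10),    # overlaps the previous constant interval
    (12, "increasing", 15),
]
maximal = make_maximal(raw)
print(maximal)
# [(0, 'increasing', 7), (6, 'constant', 10), (12, 'increasing', 15)]
```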
Fig. 17. Allen’s interval relationships. Adopted from [27]
Finally, the rules with frequent temporal patterns in the premise and conclusion are derived. The method was applied to time series of air-pressure and wind strength/wind direction data [40]. The smoothed time series were partitioned into segments with primitive patterns such as very highly increasing, constant, and decreasing. Below is an example of an association rule generated by the proposed approach:
convex air pressure, highly decreasing air pressure, decreasing air pressure → highly increasing wind strength
where the following temporal relationships hold:
1. Convex air pressure “overlaps” highly decreasing air pressure
2. Highly decreasing air pressure “equals” decreasing air pressure
3. Decreasing air pressure “meets” highly increasing wind strength
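The three relationships in this rule are instances of Allen's 13 interval relations [41]. A minimal Python sketch of a function that names the relation between two crisp intervals is given below. The relation names follow common usage for Allen's algebra; the function name `allen_relation` and the time stamps in the example are hypothetical and are not taken from [40].

```python
from typing import Tuple

Interval = Tuple[float, float]  # (b, f) with b < f


def allen_relation(x: Interval, y: Interval) -> str:
    """Return the Allen relation that holds between intervals x and y."""
    xb, xf = x
    yb, yf = y
    if xf < yb:
        return "before"
    if yf < xb:
        return "after"
    if xf == yb:
        return "meets"
    if yf == xb:
        return "met-by"
    if xb == yb and xf == yf:
        return "equals"
    if xb == yb:
        return "starts" if xf < yf else "started-by"
    if xf == yf:
        return "finishes" if xb > yb else "finished-by"
    if yb < xb and xf < yf:
        return "during"
    if xb < yb and yf < xf:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if xb < yb else "overlapped-by"


# Hypothetical time stamps (in hours) for the states of the rule above
convex_pressure = (0, 5)
highly_decreasing_pressure = (3, 8)
decreasing_pressure = (3, 8)
highly_increasing_wind = (8, 12)

print(allen_relation(convex_pressure, highly_decreasing_pressure))      # overlaps
print(allen_relation(highly_decreasing_pressure, decreasing_pressure))  # equals
print(allen_relation(decreasing_pressure, highly_increasing_wind))      # meets
```

A temporal pattern in the sense of the rule above can then be checked by evaluating this function for each pair of state intervals named in the premise and conclusion.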
The proposed methodology may support a human in learning from temporal data. The meaningful rules obtained by the described technique can be used together with expert knowledge in the construction of an expert system [40]. A generalization of the proposed approach may be based on a fuzzy extension of Allen’s interval algebra and on a formalization of relations between fuzzy time intervals [42–50].
7 Perception-Based Patterns in Expert Knowledge, Summaries, and Forecasting Texts
In this section we consider systems that use a preliminary analysis of expert rules, forecasts, and summary texts to relate human perceptions to time series patterns and, finally, to generate texts in a form similar to human descriptions in the problem area considered. General questions of generating fuzzy linguistic summaries in data bases are discussed in [59–61]. The problem of analyzing intentions in utterances is discussed in [62].
A rule-based fuzzy expert system, WXSYS, attempts to predict local weather based on conventional wisdom [51]. Below are examples of expert rules used in the system:
Weather will be generally clear when the wind shifts to a westerly direction. The greatest change occurs when the wind shifts from east through south to west. Generally, if the barometer falls steadily and the wind comes from an easterly quarter, expect foul weather. If the current wind is blowing from S to SW and the current barometric pressure is rising from 30.00 or below, then the weather will be clearing within a few hours, then fair for several days.
This system is implemented in FuzzyClips and contains formalized descriptions of fuzzy expert rules. To formalize the patterns used in such rules, it is necessary to relate them to fuzzy concepts and to patterns in time series of weather parameters such as pressure and wind.
Another approach uses a special grammar for generating summaries, based on patterns retrieved from summaries written by human experts. The first system that generated descriptions of stock market behavior, called Ana, was described in [52]. Data from a Dow Jones stock quotes database serve as input to the system, and the opening paragraphs of a stock market summary are produced as output. As more semantic and linguistic knowledge about the stock market is added to the system, it is able to generate longer, more informative reports.
Figure 18 depicts a portion of the real data submitted to Ana for January 12, 1983. The following text sample is one possible interpretation of the data generated by Ana:
Fig. 18. Example of stock data used by Ana
Wall Street’s securities markets rose steadily through most of the morning, before sliding downhill late in the day. The stock market posted a small loss yesterday, with the indexes finishing with mixed results in active trading. The Dow Jones average of 30 industrials surrendered a 16.28 gain at 4pm and declined slightly, to finish at 1083.61, off 0.18 points.
A more extended system called StockReporter is discussed in [53, 54]. This system is one of a number of online text generation systems that generate textual descriptions of numeric data sets. The StockReporter project is heavily influenced by Karen Kukich’s work [52]. In contrast to Ana, StockReporter produces reports that incorporate both text and graphics. It reports on the behavior of any one of 100 US stocks and on how that stock’s behavior compares with the overall behavior of the Dow Jones Index or the NASDAQ. StockReporter takes numeric data that describe the performance of a particular stock and produces from these data a textual summary of how the stock performed over a user-specified reporting period. It can generate a text like the following:
Microsoft avoided the downwards trend of the Dow Jones average today. Confined trading by all investors occurred today. After shooting to a high of $104.87, its highest price so far for the month of April, Microsoft stock eased to finish at an enormous $104.37. The Dow closed after trading at a weak 5682, down 6 points.
Another system, which generates short (a few sentences) summaries of large (100KB or more) time-series data sets, is described in [55]. The architecture integrates pattern recognition, pattern abstraction, selection of the most
significant patterns, microplanning (especially aggregation), and realization. SumTime-Turbine is a prototype system that uses this architecture to generate textual summaries of sensor data from gas turbines. Figure 19 shows an ontology of the patterns used by this system. The goal is to classify patterns into the ontology, not to identify specific types of patterns. SumTime-Turbine’s pattern analysis components can operate at different temporal granularities, e.g., 1, 5, and 10 s. For example, at a temporal granularity of 1 s, the system applies the pattern concepts (such as “dip with oscillatory recovery”) that experts use while examining data visualized at a 1-s time scale; at a granularity of 5 s, it applies the concepts that experts use at a 5-s time scale [55]. The pattern recognition algorithm is composed of a pattern locator and a pattern classifier. The algorithm uses the shape description language described in [14] (see Sect. 4 of this chapter). Below is an example of sentences generated to describe patterns:
There were large erratic oscillations with short period in all channels at 18:17, large spikes in all channels at 18:40, 18:48 and 20:21. There were variant patterns in all channels at 18:03, 19:30 and 20:44.
Finally, we cite some perception-based weather forecasts generated by weather.com [56]: “Scattered thunderstorms ending during the evening,” “Skies will become partly cloudy after midnight,” “Occasional showers possible.” Such perception-based forecasts support the main idea of Computing with Words and Perceptions (CWP), where the inputs and/or outputs of a decision making system are words [1]. It would be interesting to find an application area in economics and finance where such perception-based forecasting plays an important role.
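Returning to the expert rules quoted earlier in this section, the following Python sketch shows one way the WXSYS rule "if the barometer falls steadily and the wind comes from an easterly quarter, expect foul weather" could be related to time series patterns. It is not the FuzzyClips code of [51]: the membership functions, their breakpoints, the use of min for conjunction, the pressure readings, and all function names are hypothetical choices made for illustration.

```python
from typing import Sequence


def mean_slope(values: Sequence[float]) -> float:
    """Average change per time step between the first and last reading."""
    return (values[-1] - values[0]) / (len(values) - 1)


def falls_steadily(slope_per_hour: float) -> float:
    """Degree to which pressure 'falls steadily' (slope in inHg per hour).

    Assumed ramp: 0 for a non-negative slope, 1 at -0.06 inHg/h or lower.
    """
    return min(1.0, max(0.0, -slope_per_hour / 0.06))


def easterly(direction_deg: float) -> float:
    """Degree to which the wind comes 'from an easterly quarter'.

    Assumed triangular membership centred on 90 degrees (east) that
    reaches 0 at north (0 degrees) and south (180 degrees).
    """
    diff = abs((direction_deg - 90.0 + 180.0) % 360.0 - 180.0)
    return max(0.0, 1.0 - diff / 90.0)


def foul_weather(pressure: Sequence[float], wind_deg: float) -> float:
    """Firing degree of the rule, using min as the fuzzy 'and'."""
    return min(falls_steadily(mean_slope(pressure)), easterly(wind_deg))


hourly_pressure = [30.10, 30.07, 30.04, 30.01]  # hypothetical readings, inHg
print(round(foul_weather(hourly_pressure, 90.0), 3))   # wind from the east
print(round(foul_weather(hourly_pressure, 270.0), 3))  # wind from the west
```

The sketch makes explicit the two steps named in the text: a time series pattern (here, the pressure trend) is first extracted from data, and a fuzzy concept ("falls steadily") is then evaluated on it before the rule is fired.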
Fig. 19. Patterns ontology in the gas turbine domain. Adopted from [55]
8 Toward Perception-Based Time Series Data Mining
Many decision making procedures in economics and finance use expert knowledge defined on time series data base domains. This knowledge can serve as a basis for the development of intelligent decision making systems that integrate expert knowledge with computing with words and perceptions [1] and perception-based time series data mining (see Fig. 20). The advantage of such a system over a human expert lies in its capability to process gigabytes of information in real time in the permanently changing situations typical of economics and financial markets. The role of CWP in such systems is to realize the human decision making procedures and reasoning mechanisms given in expert knowledge. Expert knowledge usually uses fuzzy perceptions, and the role of perception-based time series data mining (PTSDM) is to support CWP by extracting from TSDB perception-based patterns and associations relevant to decision making
Fig. 20. Architecture of intelligent decision making system based on expert knowledge in time series data base domains (figure labels: Intelligent Decision Making System; Expert Knowledge; Computing with Words and Perceptions; Perception Based Time Series Data Mining; Time Series Data Base)
procedures. As shown in this chapter, perception-based patterns are considered in many papers, and the approaches developed for manipulating such patterns can be used as a basis for PTSDM. Some of these approaches, as well as effective TSDM algorithms, should be adapted for extracting fuzzy perception-based information useful in the decision making models of computing with words and perceptions.
9 Conclusions
In spite of the growing number of applications of TSDM, we have only begun to scratch the surface in realizing the full benefits of these technologies using perception-based information. We showed the role of linguistic perception-based patterns defined on time series domains in representing expert knowledge in a wide range of application areas. Different approaches to the description of such patterns use the sign of derivatives, scaling of trends and shapes, linguistic interpretation of patterns obtained as a result of clustering, a grammar for generating complex patterns from shape primitives, and temporal relations between patterns. Several approaches that relate linguistic descriptions given by experts to automatically generated summary texts and linguistic forecasts were considered. Semantic imprecision of natural languages is a concomitant of imprecision of perceptions [64]. For this reason, the approaches considered in this chapter may be extended by using fuzzy granulation of time series patterns to make them more adequate to the perceptions used in human reasoning. Perception-based time series data mining, together with CWP and natural language computation [64], can serve as a basis for the construction of intelligent decision making systems that use expert knowledge in time series data base domains.
10 Acknowledgment
The support for this research work has been provided by the IMP, projects D.00006 and D.00322.
References
1. Zadeh L.A. (1999) From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 45, 105–119
2. Zadeh L.A. (2001) A new direction in AI: Toward a computational theory of perceptions. AI Magazine, Spring 2001, 73–84
3. Zadeh L.A. (2002) Toward a perception-based theory of probabilistic reasoning with imprecise probabilities. Journal of Statistical Planning and Inference, vol. 105, 233–264
4. Zadeh L.A. (1997) Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, vol. 90, 111–127
5. Zadeh L.A. (2003) Web intelligence and fuzzy logic – the concept of Web IQ (WIQ). WI’03 and IAT’03 Keynote Talk, Halifax, Canada, October 2003
6. Jang J.-S.R., Sun C.T., Mizutani E. (1997) Neuro-Fuzzy and Soft Computing. A Computational Approach to Learning and Machine Intelligence. Prentice-Hall, NJ, USA
7. Kosko B. (1997) Fuzzy Engineering. Prentice-Hall, NJ, USA
8. Klir G.J., Clair U.S., Yuan B. (1997) Fuzzy Set Theory: Foundations and Applications, Prentice Hall, NJ, USA
9. Zadeh L.A. (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics SMC-3, 28–44
10. Hand D., Mannila H., Smyth P. (2001) Principles of Data Mining. MIT, Cambridge
11. KDnuggets: Polls: Time-Series Data Mining (Nov 2004) What Types of Time-Series Data Mining You’ve Done? http://www.kdnuggets.com/polls/2004/time_series_data_mining.htm
12. Lin J., Keogh E., Lonardi S., Chiu B. (2003) A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA
13. Das G., Lin K.I., Mannila H., Renganathan G., Smyth P. (1998) Rule discovery from time series. Proceedings of KDD98, 16–22
14. Agrawal R., Psaila G., Wimmers E.L., Zait M. (1995) Querying shapes of histories. Proceedings of the 21st International Conference on Very Large Databases, VLDB ’95, Zurich, Switzerland, 502–514
15. Sripada S.G., Reiter E., Hunter J., Yu J. (2002) Segmenting time series for weather forecasting. Proceedings of ES2002, 193–206
16. Cohen P., Adams N. (2001) An algorithm for segmenting categorical time series into meaningful episodes. Proceedings of the Fourth International Symposium on Intelligent Data Analysis, Lisbon, Portugal
17. Keogh E.J., Chu S., Hart D., Pazzani M. (2001) An online algorithm for segmenting time series. Proceedings of IEEE International Conference on Data Mining, 289–296
18. Agrawal R., Faloutsos C., Swami A. (1993) Efficient similarity search in sequence databases. Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms, Chicago, 69–84
19. Agrawal R., Psaila G. (1995) Active data mining. Proceedings of the First International Conference on Knowledge Discovery and Data Mining, Montreal
20. Last M., Klein Y., Kandel A. (2001) Knowledge discovery in time series databases. IEEE Transactions on Systems, Man, and Cybernetics, vol. 31B, 160–169
21. Cheung J.T., Stephanopoulos G. (1990) Representation of process trends. Part I. A formal representation framework. Computers and Chemical Engineering, vol. 14, 495–510
22. Cheung J.T. (1992) Representation and extraction of trends from process data. D.Sci.Th., Massachusetts Institute of Technology, Cambridge/MA, USA
23. Kivikunnas S. (1999) Overview of process trend analysis methods and applications. Proceedings of Workshop on Applications in Chemical and Biochemical Industry. Aachen, Germany
24. Colomer J., Melendez J., De la Rosa J.L., Aguilar J. (1997) A qualitative/quantitative representation of signals for supervision of continuous systems. Proceedings of European Control Conference-ECC97, Brussels
25. Colomer J. (1998) Representacio Qualitativa Asincrona de Senyals Per a la Supervisio Experta de Processos, Ph.D. dissertation, University of Girona (UdG), Catalonia, Spain
26. Konstantinov K.B., Yoshida T. (1992) Real-time qualitative analysis of the temporal shapes of (bio) process variables. American Institute of Chemical Engineers Journal, vol. 38, no. 11, 1703–1715
27. Höppner F. (2003) Knowledge Discovery from Sequential Data. Dissertation. Braunschweig University
28. Forbus K.D. (1984) Qualitative process theory. Artificial Intelligence, vol. 24, 85–168
29. Kuipers B. (1984) Commonsense reasoning about causality: deriving behavior from structure. Artificial Intelligence, vol. 24, 169–203
30. Batyrshin I., Wagenknecht M. (2002) Towards a linguistic description of dependencies in data. International Journal of Applied Mathematics and Computer Science. Special Issue on Computing with Words and Perceptions (ed. by D. Rutkowska, J. Kacprzyk, L.A. Zadeh), vol. 12, no. 3, 391–401
31. Batyrshin I., Herrera-Avelar R., Sheremetov L., Suarez R. (2004) On qualitative description of time series based on moving approximations. Proceedings of the International Conference on Fuzzy Sets and Soft Computing in Economics and Finance, FSSCEF 2004, St. Petersburg, Russia, vol. I, 73–80
32. Batyrshin I., Herrera-Avelar R., Sheremetov L., Panova A. Moving approximation transform and local trend associations in time series data bases. In this book
33. Federal Reserve Board, http://www.federalreserve.gov/rnd.htm
34. Boyd S. (1998) TREND: A system for generating intelligent descriptions of time-series data. In Proceedings of the IEEE International Conference on Intelligent Processing Systems (ICIPS1998)
35. Bezdek J.C. (1993) Fuzzy models and digital signal processing (for pattern recognition): Is this a good marriage? Digital Signal Processing, vol. 3, no. 4, 253–270
36. Stockman G., Kanal L., Kyle M.C. (1976) Structural pattern recognition of carotid pulse waves using a general waveform parsing system. CACM 19, 2, 688–695
37. Baldwin J.F., Martin T.P., Rossiter J.M. (1998) Time series modelling and prediction using fuzzy trend information. Proceedings of the Fifth International Conference on Soft Computing and Information/Intelligent Systems, 499–502
38. Baldwin J.F., Martin T.P., Pilsworth B.W. (1995) Fril – Fuzzy and Evidential Reasoning in Artificial Intelligence. Research Studies Press Ltd
39. Smyth P., Goodman R.M. (1991) Rule induction using information theory. In: Knowledge Discovery in Databases, MIT, Cambridge, MA, Chapter 9, 159–176
40. Höppner F. (2001) Learning temporal rules from state sequences. IJCAI Workshop on Learning from Temporal and Spatial Data, Seattle, USA, 25–31
41. Allen J.F. (1983) Maintaining knowledge about temporal intervals. Communications of the ACM, vol. 26, no. 11, 832–843
42. Ohlbach H.J. (2004) Relations between fuzzy time intervals. Proceedings of 11th International Symposium on Temporal Representation and Reasoning, Tatihou, Normandie, France
43. Nagypál G., Motik B. (2003) A fuzzy model for representing uncertain, subjective, and vague temporal knowledge in ontologies. Proceedings of the International Conference on Ontologies, Databases and Applications of Semantics (ODBASE), volume 2888 of LNCS. Springer, Berlin Heidelberg New York, 906–923
44. Dubois D., Prade H. (1989) Processing fuzzy temporal knowledge. IEEE Transactions on Systems, Man and Cybernetics, vol. 19, 729–744
45. Dubois D., Prade H. (1986) Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum, New York
46. Kurutach W. (1995) Modelling fuzzy interval-based temporal information: a temporal database perspective. Proceedings of 1995 IEEE International Conference on Fuzzy Systems, Yokohama, Japan, 741–748
47. Godo L., Vila L. (1995) Possibilistic temporal reasoning based on fuzzy temporal constraints. IJCAI’95: Proceedings International Joint Conference on Artificial Intelligence, Montreal
48. Dutta S. (1988) An event-based fuzzy temporal logic. Proceedings of the 18th IEEE International Symposium on Multiple-Valued Logic, Palma de Mallorca, Spain, 64–71
49. Badaloni S., Giacomin M. (2000) A fuzzy extension of Allen’s interval algebra. In E. Lamma, P. Mello (Eds.), AI*IA99: Advances in Artificial Intelligence, Selected Papers – Lecture Notes in Artificial Intelligence, 1792, 155–165, Springer, Berlin Heidelberg New York
50. Badaloni S., Giacomin M. (2006) The algebra $IA^{fuz}$: a framework for qualitative fuzzy temporal reasoning. Artificial Intelligence, vol. 170, 872–908, Elsevier
51. Maner W., Joyce S. (1997) WXSYS: Weather Lore + Fuzzy Logic = Weather Forecasts. Presented at the 1997 CLIPS Virtual Conference (http://web.cs.bgsu.edu/maner/wxsys/wxsys.htm)
52. Kukich K. (1983) Design of a knowledge-based report generator. Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics (ACL1983), 145–150
53. StockReporter. http://www.ics.mq.edu.au/~ltgdemo/StockReporter/about.html
54. Reiter E., Dale R. (2000) Building Natural Language Generation Systems (Studies in Natural Language Processing). Cambridge University Press, Cambridge
55. Yu J., Reiter E., Hunter J., Mellish C. (2007) Choosing the content of textual summaries of large time-series data sets. Natural Language Engineering. (To appear)
56. www.weather.com
57. Pons O., Vila M.A., Kacprzyk J. (Eds.) (2000) Knowledge Management in Fuzzy Databases, Physica, Wurzburg
58. Kandel A., Last M., Bunke H. (Eds.) (2001) Data Mining and Computational Intelligence, Studies in Fuzziness and Soft Computing, vol. 68, Physica, Wurzburg
59. Kacprzyk J., Zadrożny S. (1998) Data Mining via Linguistic Summaries of Data: An Interactive Approach, In T. Yamakawa and G. Matsumoto (Eds.): Methodologies for the Conception, Design and Application of Soft Computing (Proceedings of IIZUKA’98, Iizuka, Japan), 668–671
60. Yager R.R. (1991) On linguistic summaries of data, In: Piatetsky-Shapiro G. and Frawley B. (Eds.): Knowledge Discovery in Databases, MIT, Cambridge, MA, 347–363
61. Yager R.R. (1995) Fuzzy summaries in database mining. Proceedings of the 11th Conference on Artificial Intelligence for Applications, Los Angeles, USA, 265–269
62. Allen J.F., Perrault C.R. (1980) Analyzing intention in utterances. Artificial Intelligence, vol. 15, 143–178
63. Batyrshin I., Sheremetov L. Perception-based functions in qualitative forecasting. In this book
64. Zadeh L.A. Computation with information described in natural language – the concept of generalized-constraint-based computation. International Conference on Computational Intelligence for Modelling Control and Automation – CIMCA’2005, Vienna, Austria, http://csdl2.computer.org/comp/proceedings/cimca/2005/2504/01/25041xxx.pdf