International Journal of Knowledge and Systems Sciences Vol. 2, No. 1, March 2005 http://www.jaist.ac.jp/iskss/

A Rough-Set-Refined Text Mining Approach for Crude Oil Market Tendency Forecasting

Lean Yu 1,2, Shouyang Wang 1,3,4, K.K. Lai 4,5

1 Institute of Systems Science, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100080, China
2 School of Management, Graduate School of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing 100039, China
3 Institute of Policy and Planning Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8573, Japan
4 College of Business Administration, Hunan University, Changsha 410082, China
5 Department of Management Sciences, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong

Abstract
In this study, we propose a knowledge-based forecasting system, the rough-set-refined text mining (RSTM) approach, for crude oil price tendency forecasting. The system consists of two modules. In the first module, text mining techniques are used to construct a metadata repository and to generate rough knowledge from unstructured text documents; this involves gathering related text documents, preprocessing them, extracting features, and mining the metadata to generate rough knowledge. In the second module, rough set theory is used as a knowledge refiner for the rough knowledge; this includes information table formulation, information reduction and rough knowledge refinement. By combining the two components, useful patterns and rules ("knowledge") are generated, which can be used for crude oil market tendency forecasting. To evaluate the forecasting ability of RSTM, we compare its performance with that of conventional methods (e.g., statistical models and time series models) and neural network models. The empirical results reveal that RSTM outperforms the other forecasting models, suggesting that the proposed approach is suitable for a wide range of practical prediction problems under uncertainty and is a promising alternative to the conventional methods for crude oil market tendency forecasting.
Keywords: Text mining, Rough set, Knowledge refinement, Crude oil market tendency forecasting

1. Introduction
The high volatility and irregularity of the crude oil market make it very difficult to predict market movements, mainly because of the interaction of many factors in crude oil markets. Crude oil market forecasting has attracted increasing attention from academics and practitioners in the past decade, and various methods, both qualitative and quantitative, have been tried on this problem. For example, Abramson and Finizza [1] used belief network models for oil price forecasting and obtained some good insights into crude oil price prediction. Nelson et al. [2] used the Delphi method to predict oil prices for the California Energy Commission. Huntington [3] used a

sophisticated econometric model to predict crude oil prices in the 1980s. Abramson and Finizza [4] utilized a probabilistic model to predict oil prices. Barone-Adesi et al. [5] suggested a semi-parametric approach for oil price forecasting, and Morana [6] used the same approach to predict short-term oil prices and reported some performance improvement. Recently, Wang et al. [7] proposed a new crude oil price forecasting framework based on the TEI@I methodology and obtained good forecasting performance. However, there are still several problems with the above methods, except [7]. The first problem is that only a few factors, such as demand and supply factors, are taken into account in these models. In fact, many factors interact to have a combined effect on crude oil prices: these include


economic, military and political factors, natural disasters and speculation, as well as the expected demand and supply factors. These important factors can be hard to handle and they are not included in the conventional models, so it is hard to generate satisfactory forecasting results. The second problem is how to quantify the qualitative factors. Some qualitative variables, such as political variables [1], are not easy to quantify due to uncertainty. Furthermore, the related data collection is very difficult. Thus, the above models are either not easy to operate or impractical. To summarize, these existing approaches cannot meet the needs of practical applications. Therefore, it is important that new methods be developed for crude oil price forecasting.

In developing a new approach for crude oil forecasting, three main problems must be considered. The first problem is how to find model variables (i.e., impact factors) for a specific model. Crude oil price formulation is very complex due to the interaction of many factors. Hence, all kinds of factors should be considered, and the problem is how to identify them (Problem I). The second problem is how to extract these factors (Problem II). As most factors affecting crude oil prices are hidden in unstructured textual documents (e.g., web pages), it is not easy to handle them due to extraction difficulties. The third problem is how to deal with inconsistency in a natural way when extracted rules or patterns conflict (Problem III).

In view of the first two problems, the text mining technique [8] is introduced to find and deal with various model variables. Usually a great number of factors affect crude oil prices. These factors are not explicit and they are hidden in a large amount of unstructured text collections. Text mining is a suitable technique for finding these factors in text documents. In view of the third problem, rough set theory is adopted to cope with the inconsistency of extracted rules or patterns. The rough set concept proposed by Pawlak [9] is a mathematical approach to imprecision, vagueness and uncertainty. By way of rough set theory, we can simplify the extracted rules or patterns and create new and useful patterns to predict crude oil market movements. In some sense, the rough set acts as a knowledge refiner for text mining.

Considering the three main problems described above, we propose a novel knowledge-based forecasting system, the rough-set-refined text mining (RSTM) approach, for crude oil market tendency forecasting. The RSTM consists of two components: one is text mining and the other is the rough set. The former is used to find the factors that impact the oil price and to construct a structured metadata repository from which rules or patterns are generated. The latter is used to reduce the inconsistency of the patterns and to refine them for crude oil market movement direction forecasting. It

should be noted that knowledge as a capacity is built on information extracted from data. The RSTM approach has two distinct advantages over conventional methods. The first is that our proposed approach can consider all quantitative and qualitative factors that we find and thus strengthen our approach’s generalization capability. This is impossible for conventional methods (e.g., statistical models). The second is that the proposed approach is more flexible than conventional methods. The proposed approach does not require quantifying all impact factors. It can predict the future using logical deductions. The main goal of this study is to construct a new forecasting approach for complex dynamic markets (e.g., crude oil market) and explore the predictability of the proposed RSTM approach. The rest of the study is organized as follows. The next section describes the main principles of the RSTM approach, including text mining and rough set theory, in detail. In Section 3, we give an experiment scheme. Empirical results and analysis are reported in Section 4. The concluding remarks are given in Section 5.

2. The Rough-Set-Refined Text Mining Approach
In this section, we describe the rough-set-refined text mining (RSTM) approach step by step. First we present the basic principles of text mining and rough set theory. Then a knowledge-based forecasting approach integrating text mining and rough sets is proposed for crude oil market tendency forecasting.

2.1 The main principles of text mining theory
2.1.1 Introduction
Text mining is a newly emerging research area that has become popular in recent years. According to Webster's online dictionary [10], text mining (also known as intelligent text analysis, text data mining or knowledge discovery in text (KDT)) refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text. As an emerging research area, there is still no established vocabulary covering text mining, intelligent text analysis, text data mining and KDT, which leads to confusion when attempting to compare results and techniques. In fact, the four terms denote the same thing. Text mining [8], intelligent text analysis [11], text data mining [12] and knowledge discovery in text (or in textual databases) [13, 14] are some of the terms that can be found in the literature. In this study, the term "text mining" is used for consistency. Text mining is a young interdisciplinary field which draws on information retrieval, data mining,


machine learning, statistics and computational linguistics. Since the most natural form of storing information is as text, text mining is believed to have a higher commercial potential than data mining. In fact, a recent study conducted by the Delphi Group (http://www.thedelphigroup.com/) indicated that 80% of a company's information is contained in textual documents. In some sense, the emergence of text mining is the result of the strong need to analyze vast amounts of textual data.

It should be noted that there are differences between text mining, data mining and knowledge management [15]. Data mining takes advantage of the infrastructure of stored data, e.g., labels and relationships, to extract additional useful information. For example, by mining a customer database, one might discover that everyone who buys product A also buys products B and C. The object of data mining is a structured database. Text mining is the application of the ideas of data mining to non-structured or less structured text documents. Text mining must operate in a less structured world: documents rarely have a strong internal infrastructure (and where they do, it frequently concerns document format rather than document content). In text mining, metadata about documents is extracted from the documents and stored in a database where it may be "mined" using database and data mining techniques. The metadata serves as a way to enrich the content of the document, not just on its own, but through the ways the mining software can then manipulate it. The text mining technique extends data mining methodologies to the immense and expanding volumes of stored text by an automated process that creates structured data describing documents. In some sense, text mining contains the overall content of data mining, i.e., the range of text mining is broader than that of data mining.

Knowledge management is not a technology, but rather a management concept. It is a way of reorganizing the way that knowledge is created, used, shared, and stored in an organization. Knowledge is recognized as a valuable asset and may include historical data of all types, methodologies, and the identification of workers and teams with particular and desirable skills. The major emphasis in most successful knowledge management projects is on the organizational and cultural changes required to create an organization where sharing knowledge has a high priority and information gatekeeping is no longer acceptable. Technology and tools are valuable enablers, but without the cultural changes, little knowledge management is likely to occur.

2.1.2 The main process of text mining
Text mining uses unstructured textual data and examines it in an attempt to discover structure and implicit meanings "hidden" within the text using


techniques from data mining, machine learning, natural language processing (NLP), information retrieval (IR), information extraction (IE), and knowledge management [14]. One main objective of text mining is the support of the knowledge discovery process in large document collections (web or conventional storage). The text mining process consists of four major stages: collecting documents, preprocessing documents, feature extraction, and metadata mining and rough knowledge generation, as illustrated in Figure 1.


Fig. 1. The process of text mining.

Stage I: Document collection
This is the first stage of text mining. The main work at this stage is to search for the text documents that the task needs. In general, the first step of document collection is to identify what documents are to be retrieved. Once a subject is determined, some keywords are selected, and then text documents can be searched and queried with the aid of a search engine. When a user comes to a search engine (e.g., Google, http://www.google.com) and makes a query, typically by giving keywords (sometimes advanced search options such as Boolean, relevancy-ranked, fuzzy and concept search [14, 16] are specified), the engine looks up its index and provides a listing of best-matching documents from internal file systems or the Internet according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text.

Stage II: Document preprocessing
After the documents are collected, preprocessing is necessary. Much of the literature emphasizes the importance of preprocessing [7, 17, 18]; it has been found, for example, that there are preprocessing biases in web-usage mining and that preprocessing has a significant effect on text mining [17]. In general, document preprocessing includes four steps: relevancy check, feature generation, feature selection and document representation [19].

A. Relevancy check
The first step of this stage is to check the relevancy of the matching documents retrieved. Because hundreds of thousands of documents may be retrieved from internal systems and the web, we need to select highly relevant documents on specific subjects to


improve the efficiency and performance of text mining.

B. Feature generation
Once a specific number of documents has been determined in terms of relevancy, further processing is needed to facilitate subsequent text document processing, and the feature generation step begins. Documents may be represented by a wide range of different feature descriptions, which we need to generate. In this step, two approaches are introduced.
(a) Full text division approach. The full text division approach treats each document as a bag-of-words [16]. For example, "the price of oil rises" can be divided into the word set {"the", "price", "of", "oil", "rises"}. More generally, we have the following definition.
Full text division approach: Let T = {T1, T2, …, Tm} be a collection of documents and let I = {I1, I2, …, In} be a set of interesting words appearing in T. Then each document Ti is a bag-of-words I′ = {Ij, …, Ik}, where 1 ≤ j ≤ k ≤ n. In this approach, we eliminate the uninteresting words by stemming and stop-word elimination. For example, "rises", "rising" and "risen" are all mapped to the same stem "rise". In addition, to avoid too many feature vectors, stop words such as "the", "a/an", "of" and "you" are eliminated.
(b) Keywords/indexed data approach. The keywords/indexed data approach to text preprocessing [20] emphasizes refining each bag-of-words by referring to a keyword list. This works well when processing a collection of homogeneous documents, i.e., automatically indexing each text-mining-related document with a keyword list. Even though this approach does not generate uninformative or uninteresting knowledge, it heavily destroys the "rich" character of the original collection of textual data: the newly created intermediate form is of little use for mining purposes other than computer-science-based rule mining.
Keywords/indexed data approach: Let T = {T1, T2, …, Tm} be a collection of documents and let I = {I1, I2, …, In} be a set of keywords appearing in T. Each Ti contains a set of keywords I′ ⊆ I. Various techniques for keyword determination have been proposed in natural language processing (NLP). In [21], Rajman et al. mention the idea of "frequency-based weighting schemes" [22] as follows:

w_{i,j} = [(1 + p_{i,j}) / (0.5 × max_j(p_{i,j}))] × log(N / n_j),  if p_{i,j} ≠ 0;  w_{i,j} = 0,  otherwise,   (1)

where w_{i,j} is the weight of word w_j in document t_i and p_{i,j} is the relative document frequency of word w_j in t_i (p_{i,j} = f_{i,j} / ∑_k f_{i,k}, where f_{i,j} is the number of occurrences of w_j in t_i), N is the number of documents in the collection, and n_j is the number of documents containing w_j.

C. Feature selection
Feature generation is followed by feature selection. The main objective of this step is to eliminate features that provide little or unimportant information; statistical values are used to determine the most meaningful features, which produces a low-dimensional representation. The most common indicators are term frequency (TF), inverse document frequency (IDF), and their multiplicative combination (TFIDF). TF assumes that important words occur more often in a document than unimportant ones. IDF assumes that the rarest words in the document collection have the greatest explanatory power. The combined measure TFIDF aggregates the two indicators into one variable. Whatever metric is used, only the top n words with the highest score are selected as features at the end of the selection process [19]. Usually, feature selection is based on the information retrieval measure TFIDF because it integrates both indicators.
TFIDF: Let TF(i, j) be the term frequency of term j in a document d_i ∈ D*, i = 1, …, N. Let DF(j) be the document frequency of term j, which counts in how many documents term j appears. Then the TFIDF (term frequency × inverse document frequency) of term j in document d_i is defined by:

TFIDF(i, j) = TF(i, j) × log(N / DF(j)).   (2)

TFIDF weights the frequency of a term in a document with a factor that discounts its importance when it appears in almost all documents. Terms that appear too rarely or too frequently are therefore ranked lower than terms that hold the balance and, hence, are expected to contribute better to clustering results. Following this approach, we produce the list of all terms contained in at least one of the documents of the corpus D*, except for terms that appear in a standard list of stop words. Then TFIDF selects the d best terms j that maximize W(j),

W(j) = ∑_{i=1}^{N} TFIDF(i, j),   (3)

and produces a d-dimensional vector for document d_i containing the TFIDF values TFIDF(i, j) for the d best terms.

D. Document representation
Document representation is the final task in document preprocessing. Here the documents are represented in terms of the selected features to which the dictionary was reduced in the preceding steps.


Thus, the representation of a document is a feature vector of d elements, where d is the number of features remaining after the selection process. The whole document collection can therefore be seen as an (m × d) feature matrix A (with m as the number of documents), where the element a_ij represents the frequency of occurrence of feature j in document i. Typical frequency measures are the above-mentioned values TF, IDF and TFIDF; all positive values may be replaced by 1, leading to a binary representation which indicates whether or not a certain feature appears in the document [19].

Stage III: Feature extraction
The primary objective of the feature extraction operation is to identify facts and relations in text. In general, text feature extraction problems are handled by the text weighting approach or by the semantic-analysis-based approach, but conventional text weighting schemes do not reveal the text characteristics of the related documents satisfactorily. Here a probability-based approach is introduced for feature extraction with the help of the K-mixture model [23]. In the K-mixture model, the probability P_i(k) of the word w_i appearing k times in a document is given by:

P_i(k) = (1 − x) δ_{k,0} + [x / (y + 1)] × [y / (y + 1)]^k,   (4)

where δ_{k,0} = 1 if and only if k = 0, and δ_{k,0} = 0 otherwise. The parameters x and y can be fitted using the observed mean t and the observed inverse document frequency (IDF) as follows:

t = CF_i / N,
IDF = log2(N / DF_i),
y = t × 2^IDF − 1 = (CF_i − DF_i) / DF_i,
x = t / y,   (5)

where the collection frequency CF_i refers to the total number of occurrences of the ith term in the collection, the document frequency DF_i refers to the number of documents in the collection in which the ith term occurs, and N is the total number of documents in the collection. The parameter x denotes the absolute frequency of the term, and y can be used to calculate the extra terms per document in which the term occurs (compared to the case where a term has only one occurrence per document). Using this probability-based approach, words/terms with high probability are used as text features, also called "metadata". This makes metadata repository construction possible. We use the extracted features as attributes of the metadata base. In view of these features or attributes, we may find some useful patterns in the metadata repository using text classification algorithms.
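To make Stages II and III more concrete, the following minimal Python sketch computes the TFIDF scores of Eq. (2), the selection score W(j) of Eq. (3) and the K-mixture parameters of Eqs. (4)-(5) on a small, hypothetical corpus. The toy documents, the whitespace tokenization and the base-2 logarithm used for the IDF term are our own illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): TFIDF-based feature selection
# (Eqs. 2-3) and K-mixture parameter fitting (Eqs. 4-5) on a toy corpus.
import math
from collections import Counter

docs = [
    "opec cuts production and crude oil price rises",
    "crude oil inventory rises and oil price falls",
    "speculation pushes crude oil price up",
]
tokens = [d.split() for d in docs]
N = len(docs)

tf = [Counter(t) for t in tokens]                       # TF(i, j): term counts per document
df = Counter(term for t in tokens for term in set(t))   # DF(j): documents containing term j
cf = Counter(term for t in tokens for term in t)        # CF(j): total occurrences in the collection

def tfidf(i, term):
    """TFIDF(i, j) = TF(i, j) * log(N / DF(j))   (Eq. 2)."""
    return tf[i][term] * math.log(N / df[term])

def w_score(term):
    """W(j) = sum_i TFIDF(i, j)   (Eq. 3), used to pick the d best terms."""
    return sum(tfidf(i, term) for i in range(N))

d_best = 5
features = sorted(df, key=w_score, reverse=True)[:d_best]
print("selected features:", features)

def k_mixture(term):
    """Fit x and y of the K-mixture model (Eq. 5); with a base-2 log,
    y = t * 2^IDF - 1 = (CF - DF) / DF holds exactly."""
    t = cf[term] / N
    idf = math.log2(N / df[term])
    y = t * 2 ** idf - 1
    x = t / y if y != 0 else float("nan")
    return x, y

def p_k(term, k):
    """P_i(k) = (1 - x) * delta_{k,0} + x/(y+1) * (y/(y+1))^k   (Eq. 4)."""
    x, y = k_mixture(term)
    delta = 1.0 if k == 0 else 0.0
    return (1 - x) * delta + (x / (y + 1)) * (y / (y + 1)) ** k

print("P(k=1) for 'oil':", round(p_k("oil", 1), 4))
```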


Stage IV: Metadata mining and rough knowledge generation
From feature extraction, high-level information can be obtained, i.e., metadata are created. Patterns and relationships are then discovered within the extracted metadata. The main work of this stage is to mine the hidden patterns and their associations in the metadata repository.
Pattern. If we consider the data as a set of facts (F), then a pattern is a rule expression (E) that describes facts in a subset FE of F [24]. Generally speaking, there are two kinds of patterns: predictive and informative. The former is used to predict one or more attributes in a database; this kind of pattern can make an educated guess about the value of an unknown attribute given the values of other known attributes. Informative patterns do not solve a specific problem, but they present interesting patterns that the user might not know.
Association. Given a set of documents, we identify relationships between attributes (features that have been extracted from the documents), such as the presence of one pattern implying the presence of another. Taking crude oil price forecasting as an example, "OPEC oil embargo → the crude oil price will rise in the international crude oil market" could be a discovered rule. An application based on this operation is presented by Feldman [25].

2.1.3 The evaluation of text mining theory
Text mining is powerful on its own, enabling users to turn volumes of electronic documents into new stores of insightful and valuable information. It can "see" the hidden content of documents, including useful relationships. It is even more beguiling when it is used as a powerful enabling tool in a carefully planned and implemented knowledge management project. In practical business applications, it allows the user to: (i) identify and solve problems; (ii) eliminate bottlenecks by finding repetitive patterns; (iii) find new business opportunities, many of them with an organization's "best" customers, the ones it already has. However, text mining is not perfect and has several drawbacks. First of all, the final solution depends on the initial conditions, e.g., the definition of keywords. Second, text mining suffers from too much human intervention [13]. Third, the final solution (i.e., rules and patterns) may be uncertain, vague and imprecise, making further refining necessary. Additional explanation on this point is presented in Section 2.3.

2.2 The main principles of rough sets theory
2.2.1 Introduction
Rough sets theory [9] is a powerful mathematical tool


that handles vagueness and uncertainty. The concept of rough sets theory is founded on the assumption that every object of the universe of discourse is associated with some information. For example, if the object is crude oil listed on a commodity market, the information about the crude oil is composed of its price behavior and its variability characteristics. Objects characterized by the same information are indiscernible (similar) in view of the available information. The indiscernibility relation generated in this way is the mathematical basis of rough sets theory. The most important problems that can be solved by rough sets theory are: finding descriptions of sets of objects in terms of attribute values; checking dependencies (full or partial) between attributes; reducing attributes; analyzing the significance of attributes; and generating decision rules [26]. Since Z. Pawlak [9] first proposed rough sets theory in 1982, the theory has been studied by a fast-growing group of researchers, who have made great progress [27, 28]. Rough sets theory is widely applied in many areas such as machine learning, knowledge discovery in databases, expert systems, inductive reasoning, neural networks, decision systems, automatic classification, pattern recognition and learning algorithms [29]. For the reader's convenience, some basic concepts of the theory are presented in this section. For a detailed review of rough sets theory, readers can refer to Pawlak [27] and Komorowski et al. [30].

2.2.2 The basic concepts of rough sets theory
A. Information systems
In rough sets theory, information systems are used to represent knowledge. The notion of an information system presented here is described in Pawlak [27]; a summary is included for completeness. An information system S = (U, Ω, Vq, fq) consists of: U, a nonempty, finite set called the universe; Ω, a nonempty, finite set of attributes, with Ω = C ∪ D, in which C is a finite set of condition attributes and D is a finite set of decision attributes; for each q ∈ Ω, Vq, called the domain of q; and fq, an information function fq : U → Vq. Objects can be interpreted as cases, states, processes, patients and observations. Attributes can be interpreted as features, variables and characteristic conditions. A special case of information system, called a decision table or attribute-value table, is applied in the following analysis. In a decision table, the rows and columns correspond to objects and attributes, respectively.

B. Lower and upper approximations
Due to the imprecision which exists in real-world data, there are always conflicting objects in a

decision table. Here conflicting objects refers to two or more objects that are indiscernible by any set of condition attributes but belong to different decision classes. Such objects are called inconsistent, and a decision table containing them is called an inconsistent decision table. In rough sets theory, the approximations of sets are introduced to deal with inconsistency. If S = (U, Ω, Vq, fq) is a decision table, B ⊆ Ω and X ⊆ U, then the B-lower and B-upper approximations of X are defined, respectively, as follows:

B̲X = ∪{Y ∈ U/IND(B) : Y ⊆ X},   (6)

B̄X = ∪{Y ∈ U/IND(B) : Y ∩ X ≠ ∅},   (7)

where U/IND(B) denotes the family of all equivalence classes of B (a classification of U), and IND(B), called the B-indiscernibility relation, is defined as follows:

IND(B) = {(x, y) ∈ U² : for every a ∈ B, a(x) = a(y)}.   (8)

The set BN_B(X) = B̄X − B̲X is called the B-boundary of X. B̲X is the set of all elements of U which can be certainly classified as elements of X employing the set of attributes B, while B̄X is the set of elements of U which can possibly be classified as elements of X using the set of attributes B.

C. Quality of approximation
One measure used to describe the inexactness of approximation classifications is the quality of approximation of X by B. It expresses the percentage of objects which can be correctly classified employing the attributes B:

γ_B(Ω) = ∑_i card(B̲X_i) / card(U).   (9)

If γ_B(Ω) = 1, the decision table is consistent; otherwise, it is inconsistent.

D. Reducts and core
An important issue in rough sets theory is attribute reduction, performed in such a way that the reduced set of attributes provides the same quality of approximation as the original set of attributes. There are two fundamental concepts connected with this attribute reduction. The B-reduct of Ω, denoted by RED(B), is a minimal subset of Ω that provides the same quality of classification of objects into elementary classes of B as the whole attribute set Ω. The B-core of Ω, CORE(B), is the essential part of Ω that cannot be eliminated without disturbing the ability to classify objects into elementary classes of B. It is the intersection of all reducts, i.e.,

CORE(B) = ∩_{Ri ∈ RED(B)} Ri,  i = 1, 2, …   (10)
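The following sketch shows how the notions in Eqs. (6)-(9) can be computed for a small decision table in the spirit of Table 2. The toy data and function names are our own; this is not the authors' code, just one straightforward way to obtain equivalence classes, lower and upper approximations and the quality of approximation.

```python
# Minimal sketch of Eqs. (6)-(9): equivalence classes of IND(B),
# lower/upper approximations and the quality of approximation.
from collections import defaultdict

# (Demand, Inventory, Speculation) -> Oil price movement
table = [
    (("Small",  "High", "No"),  "Down"),   # 0
    (("Small",  "High", "Yes"), "Down"),   # 1
    (("Medium", "Low",  "No"),  "Down"),   # 2
    (("Medium", "Low",  "No"),  "Up"),     # 3  inconsistent with row 2
    (("Large",  "High", "No"),  "Down"),   # 4
]
U = range(len(table))

def ind_classes(attr_idx):
    """Equivalence classes of IND(B) for the chosen condition attributes."""
    classes = defaultdict(set)
    for obj in U:
        key = tuple(table[obj][0][i] for i in attr_idx)
        classes[key].add(obj)
    return list(classes.values())

def approximations(attr_idx, decision):
    X = {obj for obj in U if table[obj][1] == decision}
    lower, upper = set(), set()
    for eq in ind_classes(attr_idx):
        if eq <= X:          # Eq. (6): class certainly contained in X
            lower |= eq
        if eq & X:           # Eq. (7): class possibly overlapping X
            upper |= eq
    return lower, upper

B = (0, 1, 2)  # use all three condition attributes
low_down, up_down = approximations(B, "Down")
low_up, _ = approximations(B, "Up")
print("lower(Down) =", sorted(low_down), " upper(Down) =", sorted(up_down))

# Eq. (9): quality of approximation = |union of lower approximations| / |U|
gamma = len(low_down | low_up) / len(table)
print("gamma_B =", gamma)   # 0.6 here, so this toy table is inconsistent
```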


2.2.3 The evaluation of rough sets theory
The advantages of rough sets theory are as follows [31, 32, 33]: (i) It is based on the original data only and does not need any external information, unlike probability in statistics or the grade of membership in fuzzy set theory. Nor is it necessary to correct the inconsistencies manifested in the data; instead, the lower and upper approximations are applied to describe the inconsistency and, consequently, deterministic and non-deterministic rules are induced. (ii) The rough sets model is a tool suitable for analyzing not only quantitative attributes but also qualitative ones. (iii) It discovers important facts hidden in data and expresses them in the natural language of decision rules. (iv) The set of decision rules derived by the rough sets model gives a generalized description of the knowledge contained in the information tables, eliminating the redundancy typical of the original data: this makes knowledge refinement possible. (v) The decision rules obtained from the rough sets model are based on facts, because each decision rule is supported by a set of real examples. Due to these advantages, which include the elimination of the need for additional information about the data and the ability to extract rules directly from the data itself, the theory has been used in more and more domains. However, there are still disadvantages to rough sets theory. (i) Reduct computation is a non-trivial task that cannot be solved by a simple-minded increase of computational resources [30]; this is one of the bottlenecks of rough sets theory. (ii) Decision rules induced by rough sets are not unique, and thus some


new objects are hard to judge because there is no clear indication of how to classify them [33]. These points are worth exploring further in the future.

2.3 The knowledge-based forecasting system for oil price movement
2.3.1 Introduction
As revealed in the previous two subsections, text mining does not generate perfect knowledge: in the metadata repository, text mining only generates some "rough" knowledge. Thus, it is necessary to refine the rough knowledge and generate "crisp" knowledge for prediction and decision purposes. In view of this point, a knowledge-based forecasting approach, the rough-set-refined text mining approach, is proposed for crude oil price forecasting. The proposed approach consists of two modules: the text mining module and the rough set refinement module. In the first module, text mining techniques are used to construct a metadata repository and generate rough knowledge from unstructured text documents; this involves gathering related text documents, preprocessing them, feature extraction, and metadata mining and rough knowledge generation. In the second module, rough set theory is used as a knowledge refiner for the rough knowledge; this includes information table formulation, information reduction and rough knowledge refinement. The overall process of the proposed knowledge-based forecasting approach is illustrated in Figure 2.

Fig. 2. The main process of the knowledge-based forecasting system.

As can be seen from Figure 2, the knowledge-based forecasting system is actually a rough-set-refined text mining system. Text mining techniques are used to construct a metadata database and generate rough knowledge, while rough sets theory is used to refine the rough knowledge and generate crisp knowledge. Subsequently, we explain the two modules in connection with crude oil price variability

in detail.

2.3.2 Text mining for metadata and rough knowledge generation
In Section 2.1, the text mining stages are described in detail. In this section, the metadata and rough knowledge generation processes are presented in


conjunction with crude oil price movements. We use "oil price", "crude oil market" and "oil volatility" as keywords to search for related data, including both numeric and textual data. The number of documents obtained exceeds 18 million. Via text processing (see Section 2.1.2) we obtain the metadata and metadata patterns affecting oil price movement shown in Table 1. It should be noted that the metadata patterns rest on an assumption: each pattern holds given that the other conditions remain unchanged.

Table 1. Metadata and metadata patterns.

No. | Metadata | Condition | Oil price tendency
1 | Total world demand | Increase/Decrease | Up/Down
2 | Total supply of crude oil | Increase/Decrease | Down/Up
3 | OPEC production | Increase/Decrease | Down/Up
4 | Core OPEC production | Increase/Decrease | Down/Up
5 | Core OPEC production capacity * | Increase/Decrease | Down/Up
6 | Non-OPEC production | Increase/Decrease | Down/Up
7 | Non-OPEC production capacity | Increase/Decrease | Down/Up
8 | Capacity utilization ratio | Increase/Decrease | Down/Up
9 | Inventory level | Increase/Decrease | Down/Up
10 | Fuel switching capacity | Increase/Decrease | Down/Up
11 | Crude oil disposition | Increase/Decrease | Down/Up
12 | OPEC oil price | Increase/Decrease | Up/Down
13 | Gasoline tax rate | Increase/Decrease | Up/Down
14 | Oil import fee | Increase/Decrease | Up/Down
15 | World economy growth | Increase/Decrease | Up/Down
16 | Foreign exchange rate | Increase/Decrease | Up/Down
17 | OPEC market share | Increase/Decrease | Up/Down
18 | Market price in exchange | Increase/Decrease | Up/Down
19 | Environment protection and tax | Increase/Decrease | Up/Down
20 | Rumors and false news | Increase/Decrease | Up/Down
21 | Forward price of crude oil | Increase/Decrease | Up/Down
22 | OPEC oil embargo | Yes/No | Up/Down
23 | Oil worker strike | Yes/No | Up/Down
24 | Natural disasters related to oil | Yes/No | Up/Down
25 | Wars in oil countries | Yes/No | Up/Down
26 | Revolutions in oil countries | Yes/No | Up/Down
27 | Political conflict in oil nations | Yes/No | Up/Down
28 | Economic sanction to oil nations | Yes/No | Up/Down
29 | Terrorist attack | Yes/No | Up/Down
30 | Hostage crisis | Yes/No | Up/Down
31 | Large oil company merger | Yes/No | Up/Down
32 | Speculation | Yes/No | Up/Down

* Core OPEC refers to the six Persian Gulf members, Saudi Arabia, Iran, Iraq, UAE, Kuwait, and Qatar [1].

A metadata pattern is actually a form of rule (of the "if … then" format). The patterns are classified into individual patterns and combination patterns. Individual patterns, which have relatively simple conditions and attributes, are used in defining combination patterns. In this study, the pattern itself can be considered the representation of a rule, because the conditions of a pattern can be seen as the conditions of a rule in the rule representation. Figures 3 and 4 show how individual patterns and combination patterns are defined and constructed. The syntax of an individual pattern uses reserved words such as PATTERN, IF, AND, OR and

EXPLANATION, as illustrated in Figure 3. If certain important events are matched with the IF condition of a particular pattern, then the pattern is identified by the conditions, and the EXPLANATION part gives the information about what the pattern really means. The individual pattern itself has its own meaning and can be an important clue in predicting oil price volatility. Likewise, the combination patterns integrate several conditions or patterns to explain a certain sophisticated phenomenon, as illustrated in Figure 4. From Table 1, we can see that there are 32 metadata related to crude oil price fluctuations.


Generally, individual metadata patterns are deterministic, while combination patterns are not necessarily deterministic. Furthermore, the latter are more important than the former, because oil price formulation is very complex due to multi-factor interaction: the predictive power of individual patterns is very limited, and in most situations combination patterns are more practical. However, many combination patterns contain indiscernibility relations, as illustrated in Table 2. Because of this, we call these patterns with indiscernibility relations rough knowledge. The rough knowledge needs to be processed further to make its predictions stronger and more precise. From Table 2, we observe that each pattern has a different description in terms of condition attributes and decision attributes. We find that rows 3 and 4 are indiscernible in terms of the three condition attributes, since they have the same values for these attributes while having different decision values. Therefore, it is necessary to refine these patterns. Subsequently, we use rough sets for this task.


PATTERN pattern_name
IF condition_A
  (AND condition_B)
  (OR condition_C)
  ...
THEN PATTERN=pattern_name
EXPLANATION=statement_A

Fig. 3. The syntax of an individual pattern.

PATTERN pattern_name
IF pattern_A
  (AND pattern_B)
  (OR pattern_C)
  (AND condition_A)
  (OR condition_B)
  ...
THEN PATTERN=pattern_name
EXPLANATION=statement_A

Fig. 4. The syntax of a combination pattern.

Table 2. Some combination patterns with indiscernibility relations (example).

Row no. | Demand | Inventory | Speculation | Oil company merger | Oil price movement
0 | Small | High | No | No | Down
1 | Small | High | Yes | Yes | Down
2 | Medium | High | Yes | Yes | Up
3 | Medium | Low | No | No | Down
4 | Medium | Low | No | No | Up
5 | Large | High | No | No | Down
6 | Large | Low | No | No | Down
7 | Large | Low | Yes | No | Up
8 | Large | Low | Yes | Yes | Up
9 | Large | Low | Yes | Yes | Up
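One possible way to carry the PATTERN/IF/THEN/EXPLANATION syntax of Figures 3 and 4 into code is sketched below. The class name, the fields and the AND-only composition are hypothetical simplifications (the OR branches of the syntax are omitted for brevity); the example events echo entries of Tables 1 and 2.

```python
# Sketch only: a hypothetical in-memory representation of individual and
# combination patterns; names and structure are not taken from the paper.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Pattern:
    name: str
    conditions: List[Callable[[dict], bool]]   # each condition inspects observed events
    explanation: str
    sub_patterns: List["Pattern"] = field(default_factory=list)  # combination patterns

    def matches(self, events: dict) -> bool:
        # IF part: all listed conditions (AND) and all sub-patterns must hold
        return all(c(events) for c in self.conditions) and \
               all(p.matches(events) for p in self.sub_patterns)

# An individual pattern (Figure 3 style)
embargo = Pattern(
    name="opec_embargo",
    conditions=[lambda e: e.get("opec_embargo") is True],
    explanation="An OPEC oil embargo tends to push the oil price up.",
)

# A combination pattern (Figure 4 style): embargo AND low inventory
tight_market = Pattern(
    name="embargo_with_low_inventory",
    conditions=[lambda e: e.get("inventory") == "low"],
    explanation="Embargo combined with low inventories: strong upward pressure.",
    sub_patterns=[embargo],
)

events = {"opec_embargo": True, "inventory": "low"}
for p in (embargo, tight_market):
    if p.matches(events):
        print(f"PATTERN={p.name}: {p.explanation}")
```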

2.3.3 Rough sets for rough knowledge refinement
In view of the problem described above and the advantages of rough sets theory, rough sets are an appropriate tool for pattern refinement. When the rough sets approach is used for knowledge refinement, production rules (of the "if … then" format) can be induced from the reduced set of condition and decision attributes. A unique feature (or particular strength) of the rough set approach is that, unlike many other approaches in machine learning, it allows inconsistency and can deal with it in a very natural way. The computation involved in the lower


approximation will produce certain rules while the computation involved in the upper approximation will produce possible rules. We use the simple example of Table 2 to illustrate rule induction and refinement using the rough sets approach. As already mentioned above, a notable strength of using rough sets theory to deal with rough knowledge is its ability to deal with inconsistent information. For example, rows 3 and 4 represent contradictory information. The rough sets approach can handle this by using possible rules (see later). The partition generated by the decision attribute (oil price movement) is:


X = {{0, 1, 3, 5, 6}, {2, 4, 7, 8, 9}}
The two subsets of X will be referred to as X1 and X2, respectively. We use the notation X* to denote the partition generated by a set of attributes X. Here, the partition generated by the condition attributes is (again indicated by row number):
{D, I, S, O}* = {D, I, S}* = {{0}, {1}, {2}, {3, 4}, {5}, {6, 7}, {8, 9}}
A set P of attributes is a reduct (or covering) of another set Q of attributes if P is minimal and the indiscernibility relations defined by P and Q are the same. Here {D, I, S} is a reduct obtained by removing O (the reader may verify that O has no effect on the partition).
For X1 = {0, 1, 3, 5, 6}: since {0} ⊂ X1, {1} ⊂ X1 and {5} ⊂ X1, we have P̲X1 = {0} ∪ {1} ∪ {5} = {0, 1, 5}; since X1 ⊂ {0} ∪ {1} ∪ {3, 4} ∪ {5} ∪ {6, 7} = {0, 1, 3, 4, 5, 6, 7}, we have P̄X1 = {0, 1, 3, 4, 5, 6, 7}.
For X2 = {2, 4, 7, 8, 9}: since {2} ⊂ X2 and {8, 9} ⊂ X2, we have P̲X2 = {2} ∪ {8, 9} = {2, 8, 9}; since X2 ⊂ {2} ∪ {3, 4} ∪ {6, 7} ∪ {8, 9} = {2, 3, 4, 6, 7, 8, 9}, we have P̄X2 = {2, 3, 4, 6, 7, 8, 9}.
For the lower approximation, we obtain the following certain rules by rough set refinement. From the set P̲X1 = {0, 1, 5}, we can induce the following rules:
(a) (Demand, small) ∧ (Inventory, high) ∧ (Speculation, no) → (Oil price movement, down)
(b) (Demand, small) ∧ (Inventory, high) ∧ (Speculation, yes) → (Oil price movement, down)
(c) (Demand, large) ∧ (Inventory, high) ∧ (Speculation, no) → (Oil price movement, down)
Note that the first two rules indicate that the value of Speculation does not have any impact on the decision attribute. Therefore, these two rules can be combined and simplified into the following rule:
(d) (Demand, small) ∧ (Inventory, high) → (Oil price movement, down)
Certain rules from the set P̲X2 = {2, 8, 9} can be obtained in a similar way. In addition, we can obtain possible rules from the upper approximations. For example, the following possible rules are obtained from P̄X1 = {0, 1, 3, 4, 5, 6, 7} (after simplification):
(e) (Demand, small) → (Oil price movement, down)
(f) (Speculation, no) → (Oil price movement, down)
Similarly, we can obtain possible rules from P̄X2 = {2, 3, 4, 6, 7, 8, 9}.
After refinement using rough sets theory, more precise knowledge (or "crisp" knowledge, relative to the rough knowledge) is obtained. Using the crisp knowledge, we can predict the future crude oil market

tendency.

2.3.4 The working process of the knowledge-based forecasting system

The knowledge-based forecasting system integrating text mining and rough sets theory is constructed according to the previous description. We believe that the steps illustrated in Figure 5 should generally be taken for the successful development of this type of prediction system.

Fig. 5. The working process of the knowledge-based forecasting system.

Generally, the proposed forecasting system includes the following eight steps.
Goal definition: The goals of the prediction system should adhere to the following requirements: (i) the predicted movement direction is consistent with the real volatility; (ii) the prediction accuracy is as high as possible, i.e., the prediction or classification error is minimized.
Document collection: A large number of texts must be prepared, which are used for extracting related rules and patterns.
Text preprocessing: Preprocessing of the texts is


essential for improving the text mining performance and efficiency.
Feature extraction and metadata generation: With the aid of feature extraction tools, distinct features are extracted. Accordingly, metadata are generated and stored, and a metadata repository is constructed.
Metadata mining and rough knowledge generation: Using data mining techniques, we mine basic rules and patterns from the metadata database. Accordingly, rough knowledge is generated.
Rough sets refinement: In the proposed forecasting system, the rough knowledge is refined further by rough sets theory, and more precise (crisp) knowledge is obtained.
Prediction and forecasting: With the use of the crisp knowledge, we can predict the future crude oil market tendency.
Evaluation and analysis of prediction results: In order to evaluate the proposed forecasting system, the prediction results are analyzed against the goal definition. Evaluating the prediction results can help us feed corresponding information back to related steps, e.g., goal definition and metadata mining, as illustrated in Figure 5.
With these steps, a novel knowledge-based forecasting system, the rough-set-refined text mining forecasting system, is constructed. Generally, the proposed approach is driven by text documents and by related event data extracted from those documents. This point distinguishes it from other methods, which are driven by numeric data. To confirm the efficiency of the proposed system, we perform an experimental analysis and comparison.
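To make the eight-step architecture more tangible, the schematic sketch below wires toy stand-ins for each stage into one driver. All function names and data here are our own placeholders, not the paper's implementation; real stages would use the techniques of Sections 2.1 and 2.2.

```python
# Schematic, runnable sketch of the eight-step working process of Figure 5.
# Every function is a toy stand-in with hypothetical names.
def collect_documents(keywords):          # Step 2: document collection
    return [f"document about {kw}" for kw in keywords]

def preprocess(doc):                      # Step 3: text preprocessing
    return doc.lower().split()

def extract_metadata(tokens_list):        # Step 4: feature extraction / metadata
    return [{"embargo": "embargo" in toks, "move": "up"} for toks in tokens_list]

def mine_rough_rules(metadata):           # Step 5: metadata mining -> rough knowledge
    return [("embargo", rec["move"]) for rec in metadata if rec["embargo"]]

def refine(rules):                        # Step 6: rough set refinement (placeholder)
    return sorted(set(rules))

def predict(rules, event):                # Step 7: prediction
    return next((move for cond, move in rules if event.get(cond)), "unknown")

goal = "forecast monthly oil price movement direction"   # Step 1: goal definition
docs = collect_documents(["oil price", "opec embargo"])
crisp = refine(mine_rough_rules(extract_metadata([preprocess(d) for d in docs])))
print(goal, "->", predict(crisp, {"embargo": True}))      # Step 8: evaluate against the goal
```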

3. Experiment Design


In our empirical analysis, we use the proposed approach to predict the crude oil market movement direction. The main reasons for selecting crude oil price movement as the experimental object are as follows. First of all, crude oil plays an increasingly significant role in the world economy and is called the "blood of the national economy". Second, crude oil is also the world's largest and most actively traded commodity, accounting for about 10 percent of total world trade [34]. Third, the crude oil price is affected by many interacting factors and the oil price formulation mechanism is very complex, which makes oil price prediction very difficult. In order to evaluate the forecasting ability of the rough-set-refined text mining (RSTM) approach, we compare its performance with that of conventional methods (e.g., statistical models and time series models) and neural network models. As mentioned above, the proposed RSTM approach is a forecasting approach driven by text documents, while the conventional methods and the neural network model belong to a class of models driven by data. Therefore, three requirements are taken into consideration: the other forecasting models, data and text collection, and model inputs.

3.1 Descriptions of the other comparable forecasting models
Among the conventional methods, we use the random walk model (RW) as a benchmark for comparison. RW is a one-step-ahead forecasting method, since it uses the current actual value to predict the future value as follows:

ŷ_t = y_{t−1},   (11)

where y_{t−1} is the actual value in the current period and ŷ_t is the predicted value for the next period. We also compare the RSTM's forecasting performance with that of the linear regression model (LRM), the auto-regressive integrated moving average (ARIMA) model and the back-propagation neural network (BPNN). The d-dimensional linear regression model has the form

ŷ_t = a_0 + ∑_{i=1}^{d} a_i x_{t−i},   (12)

where a_0 is the intercept, the x_{t−i} are the various factors, and the a_i are the coefficients of the related factors. In an ARIMA model [35], the future value of a variable is assumed to be a linear function of several past observations and random errors. That is, the underlying process that generates the time series takes the form

φ(B) y_t = θ(B) e_t,   (13)

where y_t and e_t are the actual value and random error at time t respectively; B denotes the backward shift operator, i.e., By_t = y_{t−1}, B²y_t = y_{t−2} and so on; φ(B) = 1 − φ_1B − ⋯ − φ_pB^p and θ(B) = 1 − θ_1B − ⋯ − θ_qB^q,

where p and q are integers, often referred to as the orders of the model. The random errors e_t are assumed to be independently and identically distributed with a mean of zero and a constant variance σ², i.e., e_t ~ IID(0, σ²). If the dth difference of {y_t} is an ARMA process of orders p and q, then y_t is called an ARIMA(p, d, q) process.

The BPNN [36] is widely used and produces successful learning and generalization results in various research areas. Usually, a BPNN is trained on historical data; the model parameters (connection weights and node biases) are adjusted iteratively by a process that minimizes the forecasting errors. For prediction purposes, the final computational form of the BPNN model is


ŷ_t = a_0 + ∑_{j=1}^{q} w_j f(a_j + ∑_{i=1}^{p} w_{ij} x_{t−i}) + ξ_t,   (14)

where a_j (j = 0, 1, 2, …, q) is the bias on the jth unit, w_{ij} (i = 1, 2, …, p; j = 1, 2, …, q) is the connection weight between layers of the model, x_{t−i} (i = 1, 2, …, p) are the input factors, f(·) is the transfer function of the hidden layer, p is the number of input nodes and q is the number of hidden nodes. In addition, we use the software package EViews to build the LRM and ARIMA models, while the neural network toolbox of Matlab is used to simulate the BPNN model. For comparative purposes, the price information is translated into tendency information.

3.2 Model input selection and data collection
In the proposed knowledge-based forecasting approach, the model input variables should be all possible events (or metadata) affecting the crude oil market. These input variables can be obtained by the rough-set-refined text mining system, which extracts data from internal file systems and the Internet. The LRM and BPNN models are multi-factor (multivariate) models, so we need to collect many related factors. When considering related factors, two criteria, data availability and strong correlation, must be satisfied. Considering these two criteria, four main variables are selected: world oil demand, world oil supply, crude oil production and crude oil stock level. In addition, some

historic price information is included in the BPNN model. Accordingly, the corresponding data are collected; our data source is the US Energy Information Administration (EIA) (http://www.eia.doe.gov/emeu/mer/petro.html). The RW and ARIMA models are time series models whose main variable is the crude oil price. The posted price is the spot price and the unit is dollars per barrel; the data source is the West Texas Intermediate (WTI) series (http://www.economagic.com/em-cgi/data.exe/var/west-texas-crude-long). The data used in our study are monthly and cover the period from January 1970 to October 2004. We take the monthly data from January 1970 to December 1999 as in-sample data for training and modeling purposes, and the remainder as out-of-sample data for testing purposes. For space reasons, the original data are not listed here.
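For concreteness, the sketch below fits the d-dimensional linear model of Eq. (12) by ordinary least squares and scores its out-of-sample direction forecasts with a hit ratio, the performance measure used in Section 4. The synthetic price series, the four-lag specification and the 180-month split are illustrative assumptions; the paper's actual experiments use the WTI data together with EViews and Matlab.

```python
# Hedged sketch: least-squares fit of the lagged linear model (Eq. 12) on a
# synthetic monthly series, evaluated out of sample with a hit ratio.
import numpy as np

rng = np.random.default_rng(0)
n, d = 240, 4                                     # 240 synthetic "months", 4 lags
y = 25 + np.cumsum(rng.normal(0, 1, n))           # synthetic monthly price level

# Lagged design matrix: y_hat_t = a0 + sum_i a_i * y_{t-i}
X = np.column_stack([np.ones(n - d)] + [y[d - i:n - i] for i in range(1, d + 1)])
target = y[d:]

split = 180                                       # in-sample / out-of-sample split
coef, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
pred = X[split:] @ coef

prev = y[d - 1 + split: n - 1]                    # y_{t-1} for each out-of-sample t
hit_ratio = np.mean(np.sign(pred - prev) == np.sign(target[split:] - prev))
print(f"out-of-sample hit ratio: {hit_ratio:.2%}")
```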

4. Experiment Results
Each of the forecasting models described in the preceding section is estimated and validated on the in-sample data. The model estimation and selection process is then followed by an empirical evaluation based on the out-of-sample data. At this stage, the relative performance of the models is measured by the hit ratio. Table 3 shows the experiment results.

Table 3. The performance of the different forecasting methods.

Forecasting model | Abbreviation | Hit ratio (%)
Random walk | RW | 51.72
Linear regression model | LRM | 55.17
Auto-regressive integrated moving average | ARIMA | 60.34
Back-propagation neural network | BPNN | 75.86
Rough-set-refined text mining | RSTM | 86.21

The RW model performs worst, producing a hit ratio of only 51.72 percent. RW assumes not only that all historic information is summarized in the current value, but also that increments, positive or negative, are uncorrelated (random) and balanced, that is, with an expected value equal to zero. In other words, in the long run there are as many positive as negative fluctuations, making long-term predictions other than the trend impossible. BPNN outperforms the conventional linear regression model and the time series model (ARIMA) in terms of hit ratio, because the BPNN model includes not only the related factors affecting oil price variability but also the historic time series information, which strengthens its generalization ability. On the other hand, the linear regression model only considers


a subset of the factors, in order to avoid complexity and data unavailability. Furthermore, the ARIMA model only considers historic price information (i.e., time series data). The RSTM model has the highest forecasting accuracy among the individual forecasting methods. One reason for this is that more of the related factors (quantitative and qualitative) affecting oil price movement are included in the RSTM model, which increases its generalization ability. Furthermore, the RSTM model is a knowledge-based forecasting system which can learn continuously and update its knowledge base (i.e., the forecasting rules and patterns), making its predictions stronger and more flexible, whereas the other techniques consider only part of the factors and thus generalize less well.


Furthermore, the previous models are usually based on minimization of empirical risk, which decreases their reliability. In other words, the construction of these models seeks to minimize the training error rather than the generalization error. Moreover, the BPNN method often suffers from over-fitting problems, which limits its generalization.


5. Conclusions
In this paper, we propose a novel knowledge-based forecasting approach, the rough-set-refined text mining (RSTM) approach, to predict the crude oil market tendency. RSTM is a promising type of tool for forecasting complex dynamic markets (e.g., the crude oil market). As demonstrated in our empirical analysis, RSTM is superior to the other individual forecasting methods in forecasting the monthly movement direction of international crude oil prices. This sends a clear message to oil forecasters and oil traders, which can lead to a profit gain. In addition, the experimental results reveal that our proposed approach is a promising alternative to the conventional methods for crude oil market tendency forecasting.


Acknowledgements
This work is partially supported by the National Natural Science Foundation of China, the Chinese Academy of Sciences, the Key Laboratory of Management, Decision and Information Systems, and the City University of Hong Kong.


References


[1] B. Abramson, A. Finizza, "Using belief networks to forecast oil prices", International Journal of Forecasting, Vol. 7, No. 3, pp. 299-315, 1991.
[2] Y. Nelson, S. Stoner, G. Gemis, H.D. Nix, "Results of Delphi VIII survey of oil price forecasts", Energy report, California Energy Commission, 1994.
[3] H.G. Huntington, "Oil price forecasting in the 1980s: what went wrong?", The Energy Journal, Vol. 15, No. 2, pp. 1-22, 1994.
[4] B. Abramson, A. Finizza, "Probabilistic forecasts from probabilistic models: a case study in the oil market", International Journal of Forecasting, Vol. 11, No. 1, pp. 63-72, 1995.
[5] G. Barone-Adesi, F. Bourgoin, K. Giannopoulos, "Don't look back", Risk, August, Vol. 8, pp. 100-103, 1998.
[6] C. Morana, "A semiparametric approach to short-term oil price forecasting", Energy Economics, Vol. 23, No. 3, pp. 325-338, 2001.



[7] D. Sullivan, Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing, and Sales, Wiley Computer Publishing, 2001.
[8] R. Chau, C.H. Yeh, "A multilingual text mining approach to web cross-lingual text retrieval", Knowledge-Based Systems, Vol. 17, No. 5-6, pp. 219-227, 2004.
[9] Z. Pawlak, "Rough sets", International Journal of Computing and Information Science, Vol. 11, pp. 341-356, 1982.
[10] Online Webster dictionary, http://www.webster-dictionary.org/.
[11] A. Gelbukh, "Computational Linguistics and Intelligent Text Processing", in Proceedings of CICLing'01, Lecture Notes in Computer Science, Vol. 2004, 2001.
[12] M.A. Hearst, "Untangling text data mining", in Proceedings of ACL'99: the 37th Annual Meeting of the Association for Computational Linguistics, pp. 3-10, 1999.
[13] R. Feldman, I. Dagan, "Knowledge discovery in textual databases", in Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD'95), pp. 112-117, 1995.
[14] H. Karanikas, B. Theodoulidis, "Knowledge discovery in text and text mining software", Technical Report, UMIST Department of Computation, 2002.
[15] Online documents: http://www.delft-cluster.nl/TextMiner/.
[16] M.W. Berry, M. Browne, Understanding Search Engines: Mathematical Modeling and Text Retrieval, SIAM, Philadelphia, 1999.
[17] M. Saravanan, P.C. Reghu Raj, S. Raman, "Summarization and categorization of text data in high-level data cleaning for information retrieval", Applied Artificial Intelligence, Vol. 17, pp. 461-474, 2003.
[18] Z. Zheng, B. Padmanabhan, S.O. Kimbrough, "On the existence and significance of data preprocessing biases in web-usage mining", INFORMS Journal on Computing, Vol. 15, No. 2, pp. 148-170, 2003.
[19] C.P. Wei, Y.X. Dong, "A mining-based category evolution approach to managing online document categories", in Proceedings of the 34th Annual Hawaii International Conference on System Sciences, Maui, Hawaii, 2001.
[20] R. Feldman, I. Dagan, "Mining text using keyword distributions", Journal of Intelligent Information Systems, Vol. 10, pp. 281-300, 1998.


[21] M. Rajman, R. Besancon, "Text mining: natural language techniques and text mining applications", in Proceedings of the 7th IFIP Working Conference on Database Semantics (DS-7), Chapman & Hall, 1997.
[22] G. Salton, C. Buckley, "Term weighting approaches in automatic text retrieval", Information Processing and Management, Vol. 24, No. 5, pp. 513-523, 1988.
[23] S.M. Katz, "Distribution of content words and phrases in text and language modeling", Natural Language Engineering, Vol. 2, No. 1, pp. 15-59, 1995.
[24] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, "From data mining to knowledge discovery: An overview", in U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, Menlo Park, CA, AAAI Press/The MIT Press, pp. 1-34, 1996.
[25] R. Feldman, M. Fresko, H. Hirsh, "Knowledge management: A text mining approach", in Proceedings of the 2nd International Conference on Practical Aspects of Knowledge Management (PAKM98), 1998.
[26] Z. Pawlak, "Rough sets", in T.Y. Lin, N. Cercone (Eds.), Rough Sets and Data Mining, Dordrecht, Kluwer, pp. 3-8, 1997.
[27] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Dordrecht, Kluwer, 1991.
[28] R. Slowinski, Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory, Dordrecht, Kluwer, 1992.
[29] M.E. Yahia, R. Mahmod, N. Sulaiman, F. Ahmad, "Rough neural expert systems", Expert Systems with Applications, Vol. 18, pp. 87-99, 2000.
[30] J. Komorowski, Z. Pawlak, L. Polkowski, A. Skowron, "Rough sets: A tutorial", in S.K. Pal, A. Skowron (Eds.), Rough Fuzzy Hybridization: A New Trend in Decision Making, Springer, Singapore, pp. 3-98, 1999.
[31] S. Greco, B. Matarazzo, R. Slowinski, "The use of rough sets and fuzzy sets in MCDM", in T. Gal, T. Stewart, T. Hanne (Eds.), Multicriteria Decision Making: Advances in MCDM Models, Algorithms, Theory, and Applications, Dordrecht, Kluwer, pp. 1-59, 1999.
[32] A.I. Dimitras, R. Slowinski, R. Susmaga, C. Zopounidis, "Business failure prediction using rough sets", European Journal of Operational Research, Vol. 114, pp. 263-280, 1999.
[33] F.E.H. Tay, L. Shen, "Economic and financial prediction using rough sets model", European Journal of Operational Research, Vol. 141, pp. 641-659, 2002.

[34] P.K. Verleger, “Adjusting to volatile energy prices”, Working paper, Institute for International Economics, Washington DC, 1993. [35] G.E.P. Box, G. Jenkins, Time series analysis: Forecasting and control, San Francisco: Holden-Day, 1970. [36] D. Rumelhart, G. Hinton, R. Williams, “Learning internal representations by error propagation”, in D. Rumelhart, J. McClelland, (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition I. MIT press, Cambridge, MA, pp. 318-363, 1986.