
MINING METALLOGENIC ASSOCIATION RULES COMBINING CLOUD MODEL WITH APRIORI ALGORITHM

Cui Ying* (a), Binbin He (a), Jianhua Chen (a, b), Zhonghai He (a), Liu Yue (b)

a. Institute of Geo-Spatial Information Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
b. College of Information Engineering, Chengdu University of Technology, Chengdu 611731, China
c. College of Geology Science, Chengdu University of Technology, Chengdu 611731, China

ABSTRACT

Spatial data mining refers to extracting ("mining") hidden, implicit, valid, novel, and interesting spatial or non-spatial patterns or rules from large, incomplete, noisy, fuzzy, random, real-world spatial databases. Spatial association rule mining extracts implicit association rules from spatial databases, and geological spatial databases contain many metallogenic association rules. This paper proposes a method for mineral resource prediction that consists of three parts: uncertain transition between qualitative and quantitative geological spatial data using the cloud model, extraction of metallogenic association rules using the Apriori algorithm, and comprehensive assessment of the rules. Finally, an iron resource prediction experiment is performed in the Eastern Kunlun Mountains, China. The results indicate that the proposed method is suitable for regional metallogenic prediction.

Index Terms—Spatial data mining, Apriori algorithm, cloud model, metallogenic prediction

1. INTRODUCTION

Spatial data mining refers to extracting ("mining") hidden, implicit, valid, novel, and interesting spatial or non-spatial patterns or rules from large, incomplete, noisy, fuzzy, random, real-world spatial databases [2]. However, spatial data mining research currently focuses on the efficiency of the mining algorithms themselves, and there are few studies of practical applications.
Association rule mining is an important part of spatial data mining. Mining metallogenic association rules is therefore worthwhile and can provide a reference for the evaluation of mineral resources. The study of spatial data uncertainty is particularly important [1, 2]. In data preprocessing, we use the cloud model to perform the uncertain transition between qualitative and quantitative data, which reduces the impact of data uncertainty. We extended the Apriori algorithm [4] so that it handles relational tables directly, without first transforming them into binary transactions. Finally, we combine the cloud-model-based generalization method with the Apriori algorithm to mine association rules and to perform metallogenic prediction using the spatial geological data of the Eastern Kunlun orogenic belt, which shows the efficiency and effectiveness of the approach.

978-1-4244-9566-5/10/$26.00 ©2010 IEEE

2. METHOD

2.1 Attribute Generalization Based on Cloud Model

The cloud model describes the uncertain transition between a linguistic term of a qualitative concept and its numerical representation. In short, it is a model of the uncertain transition between the qualitative and the quantitative [3]. A cloud is characterized by only three values: the expectation Ex, the entropy En, and the hyper-entropy (ultra-entropy) He. These three values outline a cloud formed by tens of thousands of cloud droplets, integrating the fuzziness and randomness of a qualitative linguistic value and thereby achieving a soft discretization of the data. Normal compatibility clouds are the most useful for representing linguistic atoms. The mathematical expected curve (MEC) [3, 7] of the normal compatibility cloud for a linguistic atom A is:

$$\mathrm{MEC}_A(x) = \exp\left[-\frac{(x - Ex)^2}{2\,En^2}\right]$$

IGARSS 2010

Fig. 1 Linguistic atoms for the linguistic variable “Iron geochemical anomaly”

First, based on the above definition, this experiment chooses the quantitative attribute iron geochemical anomaly as the object to be discretized. Second, the concepts are defined and described: the domain of the iron geochemical anomaly is [0, 10], and it is divided into three linguistic atoms, namely iron geochemical anomaly high, middle, and low. Adjacent atoms overlap, as shown in Fig. 1. Experts give the Ex, En, and He values that define each linguistic atom. Third, the numerical iron geochemical anomaly data are input, converted from quantitative values to qualitative concepts, and output as text.
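The generalization step above can be illustrated with a minimal Python sketch. The (Ex, En, He) triples below are hypothetical placeholders, since the expert-supplied values are not listed in the paper, and assigning each value to the atom with the largest MEC value is one simple reading of the conversion step, not necessarily the authors' exact procedure.

```python
import math
import random

# Hypothetical (Ex, En, He) triples on the domain [0, 10]; the paper's
# expert-supplied values are not given, so these are placeholders.
ATOMS = {
    "low": (0.0, 1.5, 0.1),
    "middle": (5.0, 1.5, 0.1),
    "high": (10.0, 1.5, 0.1),
}


def mec(x, ex, en):
    """Mathematical expected curve of a normal cloud: exp(-(x-Ex)^2 / (2 En^2))."""
    return math.exp(-((x - ex) ** 2) / (2.0 * en ** 2))


def droplet_membership(x, ex, en, he, rng):
    """One random membership degree of x: En is perturbed by He before
    the normal curve is evaluated, which models the randomness of the concept."""
    en_prime = rng.gauss(en, he)
    if en_prime == 0.0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


def generalize(x, atoms=ATOMS):
    """Map a numeric anomaly value to the linguistic atom with the largest MEC value."""
    return max(atoms, key=lambda name: mec(x, atoms[name][0], atoms[name][1]))


if __name__ == "__main__":
    rng = random.Random(0)
    for value in (0.8, 4.6, 7.8):
        ex, en, he = ATOMS[generalize(value)]
        print(value, generalize(value), round(droplet_membership(value, ex, en, he, rng), 3))
```

Because neighbouring atoms overlap, a value near a boundary has non-zero membership in two atoms. The sketch resolves this deterministically with the MEC, while `droplet_membership` shows how He adds randomness to a single membership degree.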

2.2 Mining Metallogenic Association Rules

The Apriori algorithm was proposed by Agrawal and Srikant in 1994 [4]. Association rule mining finds interesting association or correlation relationships among a large set of data items [4, 6]. However, the original Apriori algorithm only mines association rules among single-dimensional binary (transaction) data. We extended the Apriori algorithm so that it handles relational tables directly, without first transforming them into binary transactions.

The problem of discovering all association rules can be decomposed into two subproblems. First, find all sets of items (itemsets) whose transaction support is above the minimum support. The support of an itemset is the number of transactions that contain the itemset. Itemsets with minimum support are called large itemsets, and all others are called small itemsets. Second, use the large itemsets to generate the desired rules: for every large itemset l, find all non-empty subsets of l, and for every such subset a, output a rule of the form a => (l - a) if the ratio of support(l) to support(a) is at least minconf.

We use support, confidence, coverage, interesting, leverage, lift, and N (the number of items in the rule antecedent) to evaluate the quality of the association rules. Each rule also has a value K, which is used as the comprehensive evaluation index. K is calculated from the quality measures above: each measure is standardized and then multiplied by its weight, which is determined by the Analytic Hierarchy Process (AHP). The quality measures are defined as follows, where P(A ∪ B) denotes the probability that a record contains both A and B:

$$\mathrm{Support}(A \to B) = P(A \cup B)$$
$$\mathrm{Confidence}(A \to B) = P(B \mid A)$$
$$\mathrm{Lift}(A \to B) = \frac{P(B \mid A)}{P(B)}$$
$$\mathrm{Coverage}(A \to B) = \frac{N(A)}{N}$$
$$\mathrm{Leverage}(A \to B) = P(B \mid A) - P(A)\,P(B)$$
$$\mathrm{Interesting}(X \to Y) = \left[\left(\frac{P(X \cap Y)}{P(X)\,P(Y)}\right)^{k} - 1\right]\left(P(X)\,P(Y)\right)^{m}$$

Here N(A) is the number of records that contain A, N is the total number of records, and the exponents k and m are parameters (k is distinct from the comprehensive index K).
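As an illustration of mining a relational table directly, the following Python sketch treats every (attribute, value) pair of a record as an item and runs a level-wise, Apriori-style search. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy records, and the restriction to one value per attribute are all hypothetical. Lift is computed as confidence divided by P(B), the usual definition, and leverage as P(B|A) − P(A)P(B).

```python
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Level-wise search over (attribute, value) items of a relational table.
    Returns {itemset: support}, with support as a fraction of the rows."""
    n = len(rows)
    transactions = [frozenset(row.items()) for row in rows]
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while level:
        current = {}
        for candidate in level:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                current[candidate] = support
        frequent.update(current)
        size += 1
        # Join step: unions of two frequent itemsets that have exactly `size` items.
        joined = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: one value per attribute, and every (size-1)-subset must be frequent.
        level = {
            c for c in joined
            if len({attr for attr, _ in c}) == size
            and all(frozenset(s) in current for s in combinations(c, size - 1))
        }
    return frequent


def mine_rules(frequent, consequent, min_conf):
    """Rules antecedent -> consequent for one fixed consequent item."""
    p_b = frequent.get(frozenset([consequent]))
    rules = []
    if p_b is None:
        return rules
    for itemset, support in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            p_a = frequent[antecedent]  # frequent by downward closure
            confidence = support / p_a
            if confidence >= min_conf:
                rules.append({
                    "antecedent": antecedent,
                    "support": support,
                    "confidence": confidence,
                    "coverage": p_a,
                    "lift": confidence / p_b,
                    "leverage": confidence - p_a * p_b,
                    "N": len(antecedent),
                })
    return rules


# Hypothetical toy records; attribute names and values are illustrative only.
ROWS = [
    {"fault": "NW", "contact": "unconformity", "deposit": "iron"},
    {"fault": "NW", "contact": "unconformity", "deposit": "iron"},
    {"fault": "NW", "contact": "none", "deposit": "iron"},
    {"fault": "NE", "contact": "none", "deposit": "other"},
    {"fault": "NE", "contact": "unconformity", "deposit": "iron"},
]

if __name__ == "__main__":
    freq = frequent_itemsets(ROWS, min_support=0.4)
    for rule in mine_rules(freq, ("deposit", "iron"), min_conf=0.8):
        print(sorted(rule["antecedent"]), rule["confidence"], rule["lift"])
```

Fixing the consequent mirrors the experiment, where one deposit type is used as the rule consequent and the remaining attributes form the antecedent.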


The standardization formulas are as follows:

$$s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\right)$$
$$m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \cdots + x_{nf}\right)$$
$$z_{if} = \frac{x_{if} - m_f}{s_f}$$

where x_{1f}, …, x_{nf} are the n values of measure f, m_f is their mean, s_f is their mean absolute deviation, and z_{if} (the z-score) is the standardized value.

To verify the feasibility and effectiveness of our uncertain spatial data mining model, we conducted an experiment using geospatial data of the Eastern Kunlun orogenic belt in China, which contain geoscientific and geochemical data. After extraction, the task-relevant data comprise 481 records with seven attributes: regional fault trend, ore-bearing rocks, wall rock alteration, ore body status, size of deposit, iron anomaly, and genetic type of deposit.
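The standardization formulas, and one possible way of combining the standardized measures into K, can be sketched in Python as follows. The weights are hypothetical stand-ins for AHP weights, which the paper does not list, and the weighted sum is an assumed aggregation that is not claimed to reproduce the K values reported in the paper.

```python
def standardize(values):
    """z-scores based on the mean absolute deviation s_f (not the standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(x - mean) for x in values) / n
    if mad == 0:
        # All values are identical: there is no spread to standardize against.
        return [0.0] * n
    return [(x - mean) / mad for x in values]


def composite_k(rules, weights):
    """Assumed aggregation: weighted sum of the standardized measures of each rule.
    `weights` maps a measure name to a hypothetical AHP weight."""
    columns = {name: standardize([rule[name] for rule in rules]) for name in weights}
    return [
        sum(weights[name] * columns[name][i] for name in weights)
        for i in range(len(rules))
    ]


if __name__ == "__main__":
    # Three rules described by two measures; the numbers are illustrative.
    rules = [
        {"confidence": 0.90, "support": 0.11},
        {"confidence": 0.92, "support": 0.14},
        {"confidence": 1.00, "support": 0.06},
    ]
    weights = {"confidence": 0.6, "support": 0.4}  # hypothetical weights
    print(composite_k(rules, weights))
```

Dividing by the mean absolute deviation rather than the standard deviation makes the standardized values less sensitive to a single extreme rule, because deviations are not squared.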

Table 1. Metallogenic association rules

Rule No. | Rule | Confidence | Support | Lift | Interesting | Leverage | Coverage | N | K
1 | unconformity → contact metasomatic iron | 0.916667 | 0.135803 | 1.280172 | 0.7808642 | 0.810585 | 0.148148 | 1 | 11.31871
2 | NW ∧ Cu geochemical anomaly middle → contact metasomatic iron | 1 | 0.061728 | 1.396552 | 0.9382716 | 0.955799 | 0.061728 | 2 | 20.55859
3 | NW ∧ Indo-Chinese epoch intrusive rocks of Triassic Period → contact metasomatic iron | 0.9 | 0.111111 | 1.256896 | 0.7888889 | 0.811599 | 0.123457 | 3 | 31.13708
.. | .. | .. | .. | .. | .. | .. | .. | .. | ..
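As a quick consistency check on the reported measures, the definitions of support, confidence, and coverage imply a simple relation that the Table 1 values can be tested against. The derivation below only uses those definitions, with P(A ∪ B) read as the probability that a record contains both A and B.

```latex
\[
\mathrm{Coverage}(A \to B) = P(A)
  = \frac{P(A \cup B)}{P(B \mid A)}
  = \frac{\mathrm{Support}(A \to B)}{\mathrm{Confidence}(A \to B)}
\]
\[
\text{Rule 1: } \frac{0.135803}{0.916667} \approx 0.148148,
\qquad
\text{Rule 3: } \frac{0.111111}{0.9} \approx 0.123457
\]
```

Both quotients agree with the coverage values reported for rules 1 and 3. For rule 2 the confidence is 1, so coverage and support coincide (0.061728).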

The task-relevant data are organized as a relational table. Since the iron geochemical anomalies are numerical, strong association rules cannot be discovered directly at the primary level. Therefore, we use the linguistic atoms defined in the previous step to generalize this attribute. The generalized database is mined with a minimum support of 3% and a minimum confidence of 80%. We use “contact metasomatism hydrothermal iron ore” as the rule consequent and the other attributes as the rule antecedent. The mining process yields many rules, which can be written as production rules, as in Table 1.

3. EXPERIMENTS AND RESULTS

3.1 Data Preprocessing

This research uses vector ore-controlling data: stratum, unconformity, fault, regional chemical anomaly, remote sensing mineralization anomaly, Bouguer anomaly, aeromagnetic anomaly, and Gold mineral occurrence. All of these data were partitioned into a grid. Following Weights of Evidence Modeling, with the stratum layer as the background layer, the study area was partitioned into 1 km × 1 km cells in GIS software. The data of all layers were overlaid and spatially joined to the newly created grid layer, so that every grid polygon carries the corresponding attributes of the other layers and forms one itemset for association rule mining.


The unconformity, fault, and mineral occurrence features must be buffered. Following Weights of Evidence Modeling, the buffer distances differ by feature type for iron deposits: a 1000 m point buffer for mineral occurrences, a 300 m line buffer for unconformities, and a 3000 m line buffer for faults.

3.2 Metallogenic Association Rules Assessment

After obtaining the association rules above, we compare the attribute table of the grid described in Section 3.1 with the rules. Each record in the attribute table falls into one of three cases: its items match the antecedent of exactly one rule; its items match the antecedents of several rules; or its items match no rule antecedent. If no rule antecedent matches the record, its K is set to 0. If several rule antecedents match the record, the maximum K among them is taken as its K.

3.3 Results

Finally, every polygon of the grid layer has a K value. The K values are joined to the grid layer in the GIS environment, and maps of regional metallogenic potential are then plotted with graded color classes. Fig. 2 shows the metallogenic potential for contact metasomatic iron deposits in the East Kunlun area, China.
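The assessment step of Section 3.2 can be sketched in Python as follows. The sketch assumes that a rule "matches" a grid record when all items of its antecedent occur among the record's items; the text could also be read as requiring exact equality. The antecedents and K values are taken from Table 1, while the function name and the example cells are hypothetical.

```python
def cell_k(cell_items, rules):
    """Largest K among the rules whose antecedent is contained in the cell's items.
    Returns 0.0 when no rule antecedent matches."""
    matches = [k for antecedent, k in rules if antecedent <= cell_items]
    return max(matches, default=0.0)


# Antecedents and K values of the three rules listed in Table 1.
RULES = [
    (frozenset({"unconformity"}), 11.31871),
    (frozenset({"NW", "Cu geochemical anomaly middle"}), 20.55859),
    (frozenset({"NW", "Indo-Chinese epoch intrusive rocks of Triassic Period"}), 31.13708),
]

if __name__ == "__main__":
    # Hypothetical grid cells, each described by the items joined from the GIS layers.
    cells = {
        "cell_1": frozenset({"NW", "unconformity", "Cu geochemical anomaly middle"}),
        "cell_2": frozenset({"unconformity"}),
        "cell_3": frozenset({"NE"}),
    }
    for name, items in cells.items():
        print(name, cell_k(items, RULES))
```

Taking the maximum rather than, say, the sum keeps a cell's score on the same scale as the rule scores, so the values can be graded directly when the potential map is drawn.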


Fig.2 Prediction Map of Contact Metasomatic Iron Deposits in East Kunlun Mountains, China

CONCLUSIONS

The experiments show that uncertainty-based metallogenic association rule mining is efficient. At the same time, combining the cloud model with the Apriori algorithm to mine association rules in geological mineralization data can provide a reference for the evaluation of mineral resources. The prediction map for contact metasomatic iron shows that the known iron mines coincide with the high-potential areas, and that the distribution trends of the predicted potential areas are consistent with the directions of the strata and faults, which shows that the model is effective for metallogenic prediction. In addition, the high-potential area is about 10% of the whole area, and the medium-potential area is about 15% of the whole area. For comparison, the calculation by Weights of Evidence Modeling gives upper limits of 10% of the area for high potential and 15% for medium potential. However, spatial relationships were not considered in the experiments, and this is where future experiments need to improve.

ACKNOWLEDGEMENTS

This research work is funded by the National High-tech R&D Program of China (863 Program), project number 2007AA12Z227.

REFERENCES

[1] Binbin He, Cuihua Chen, “Spatial Data Mining with Uncertainty,” Proceedings of SPIE-The International Society for Optical Engineering, v 6787, 2007.
[2] Binbin He, Fang Tao, and Zhida Guo, “Uncertainty in spatial data mining,” Proceedings of 2004 International Conference on Machine Learning and Cybernetics, v2, pp. 1152-1156, 2004.
[3] Deyi Li, Kaichang Di, Deren Li, and Xuemei Shi, “Mining Association Rules with Linguistic Cloud Models,” Journal of Software, 11(2), pp. 143-158, 2000.
[4] Rakesh Agrawal, Ramakrishnan Srikant, “Fast algorithms for mining association rules in large databases,” In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[5] Michael Chau, Reynold Cheng, Ben Kao, and Jackey Ng, “Uncertain Data Mining: An Example in Clustering Location Data,” In: Proc. Workshop on the Sciences of the Artificial, Hualien, Taiwan, 2005.
[6] Han J, “Data mining techniques,” In: Proceedings of the ACM SIGMOD ’96 Conference, Tutorial, June 1996.
[7] Kaichang Di, Deren Li, “KDD and its application and extension in GIS,” In: Proceedings of the 2nd Annual Meeting of Chinese Association of GIS, Beijing, 1996.