Remote Sensing Image Classification Method Based on Evidence Theory and Decision Tree
LI Xuerong* (a, b), XING Qianguo (b), KANG Lingyan (a, b)
(a) Graduate University, Chinese Academy of Sciences, Beijing 100080, P.R. China; (b) Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, 17 Chunhui Road, Laishan District, Yantai 264003, P.R. China
ABSTRACT
Remote sensing image classification is an important and complex problem. Conventional classification methods are mostly based on Bayesian subjective probability theory, which has shortcomings in handling uncertainty. This paper first introduces evidence theory and the decision tree method, and then focuses on the support degree function through which evidence theory is applied to pattern recognition. Combining D-S evidence theory with the decision tree algorithm, a D-S evidence theory decision tree method is proposed, in which the support degree function is the link between the two. The method classifies classes such as water, urban land and green land, taking their characteristic spectral feature parameters as input values, and produces three classification images of support degree. A proper threshold is then chosen for each image and the image is binarized. The binary images are overlaid according to class to obtain the initial result, whose accuracy is then assessed. If the initial classification accuracy does not meet the requirement, the pixels whose support degree is below the threshold are reclassified until the final classification meets the accuracy requirement. Compared with Bayesian classification, the main advantages of this method are that it allows reclassification and can reach a high accuracy. The method is finally applied to classify the land use of the Yantai Economic and Technological Development Zone into three classes (urban land, green land and water), and the experiment supports the effectiveness of the method.
Keywords: evidence theory, decision tree, support degree, remote sensing classification
1. INTRODUCTION
Remote sensing image classification is a branch of pattern recognition in the remote sensing field. It aims to identify remote sensing images, i.e. to recognize and classify the ground cover information in the images, thereby distinguishing the corresponding ground truth and extracting the required information [12]. Remote sensing data are uncertain in that each attribute value carries a confidence level; this uncertainty arises from the acquisition, transmission and storage of the data.
Dempster-Shafer evidence theory (D-S evidence theory) [3] is an extension of probability theory that constructs a one-to-one relationship between propositions and sets, so that the uncertainty of a proposition is transformed into the uncertainty of a set. In recent years, D-S evidence theory has been applied to the description, processing and deduction of uncertain, incomplete and unreliable data or information [4-6].
Classification is an important task in data mining, in which models are constructed to assign data to different classes. The decision tree classifier [7-8] is a supervised classification method that is nonparametric and does not require the data to be normally distributed. It classifies the data according to classification rules, which can be learned during training or predefined. There are many decision tree algorithms, such as ID3, C4.5 and CART, which are effective and widely used in classification, but they cannot deal with uncertain data when the tree is constructed or applied.
To address this limitation of traditional decision tree algorithms, a D-S evidence theory decision tree method is proposed, which combines D-S evidence theory with the decision tree classifier. The method can deal with the uncertainty of remote sensing imagery. To apply evidence theory to the decision tree algorithm, a support degree is proposed, which serves as the classification rule of the decision tree. Compared with statistical classification methods, the decision tree method using support degree shows clear advantages. Experimental results demonstrate that the proposed method is effective and can improve the classification accuracy.

* [email protected]; phone 86-535-2109033; fax 86-535-2109000; yic.ac.cn.
Multispectral, Hyperspectral, and Ultraspectral Remote Sensing Technology, Techniques, and Applications III, edited by Allen M. Larar, Hyo-Sang Chung, Makoto Suzuki, Proc. of SPIE Vol. 7857, 78570Y · © 2010 SPIE CCC code: 0277-786X/10/$18 · doi: 10.1117/12.869544 Proc. of SPIE Vol. 7857 78570Y-1 Downloaded from SPIE Digital Library on 20 Apr 2011 to 159.226.100.156. Terms of Use: http://spiedl.org/terms
2. D-S EVIDENCE THEORY AND DECISION TREE
2.1 D-S evidence theory
The theory of evidence was first proposed by Dempster in 1967 and then extended by Shafer as a mathematical framework for the representation of uncertainty. D-S evidence theory allows for a representation of both imprecision and uncertainty [9-10] through the definition of two functions, belief ($Bel$) and plausibility ($Pls$), both derived from a mass function $m$ (or basic probability assignment). Mass functions are defined on the power set of the space of discernment $D$, i.e. a mass is attributed to each subset of $D$. In classification problems, $D$ may for instance be the set of classes of interest, and a subset of $D$ represents a union of classes. This is a major difference from probabilistic approaches, which only assign probabilities to singletons (i.e. to subsets of $D$ of cardinality 1). In the following, singletons are called simple hypotheses, whereas subsets containing at least two elements of $D$ are called compound hypotheses. A mass function $m$ is thus a function from $2^D$ to $[0, 1]$ such that
$$ m(\emptyset) = 0, \qquad \sum_{A \subseteq D} m(A) = 1. \tag{1} $$
A subset $A$ with non-zero mass value is called a focal element. The problem of assigning masses to hypotheses becomes more complicated if values have to be assigned to compound hypotheses. Belief and plausibility functions are derived from the mass function, and are respectively defined by
$$ Bel(\emptyset) = 0, \qquad Bel(A) = \sum_{B \subseteq A} m(B), \quad \forall A \subseteq D,\ A \neq \emptyset \tag{2} $$
$$ Pls(\emptyset) = 0, \qquad Pls(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq D,\ A \neq \emptyset \tag{3} $$
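Clearly, we have the following properties:
$$ Bel(D) = 1, \tag{4} $$
$$ Pls(D) = 1, \tag{5} $$
$$ Bel(A) \leq Pls(A), \quad \forall A \subseteq D, \tag{6} $$
$$ Pls(A) = 1 - Bel(\bar{A}), \quad \forall A \subseteq D, \tag{7} $$
where $\bar{A} = D \setminus A$ denotes the complement of $A$.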
D-S evidence theory provides an explicit measure of ignorance about an event $A$ and its complement $\bar{A}$ as the length of the interval $[Bel(A), Pls(A)]$ (called the belief interval). This length can also be interpreted as the imprecision on the “true probability” of $A$. The mass assigned to $D$ can be interpreted as the global ignorance, since this weight of evidence is not discernible among the hypotheses. In summary, as in probability theory, numerical values in $[0, 1]$ represent uncertainty, but by using the two functions $Bel$ and $Pls$, D-S evidence theory is also able to represent imprecision.
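The definitions of belief and plausibility can be checked numerically on a small frame of discernment. The following is a minimal sketch, assuming a hypothetical mass function on three classes; the class names and mass values are illustrative only and do not come from the experiment.

```python
from itertools import chain, combinations


def powerset(frame):
    """All non-empty subsets of the frame of discernment, as frozensets."""
    items = sorted(frame)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def belief(mass, a):
    """Bel(A): total mass of the non-empty subsets B contained in A."""
    return sum(m for b, m in mass.items() if b and b <= a)


def plausibility(mass, a):
    """Pls(A): total mass of the subsets B that intersect A."""
    return sum(m for b, m in mass.items() if b & a)


# Hypothetical mass function (illustrative values, not taken from the paper).
D = frozenset({"water", "urban", "green"})
mass = {
    frozenset({"water"}): 0.5,           # simple hypothesis
    frozenset({"urban", "green"}): 0.3,  # compound hypothesis
    D: 0.2,                              # mass on D: global ignorance
}

for a in powerset(D):
    bel, pls = belief(mass, a), plausibility(mass, a)
    # Bel(A) <= Pls(A), and Pls(A) = 1 - Bel(complement of A).
    assert bel <= pls + 1e-12
    assert abs(pls - (1.0 - belief(mass, D - a))) < 1e-12
    print(sorted(a), round(bel, 3), round(pls, 3))
```

The width of each printed interval, Pls(A) minus Bel(A), is the ignorance about A; it shrinks to zero only when all mass sits on singletons.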
If masses are assigned only to simple hypotheses ($m(A) = 0$ for $|A| > 1$), then the three functions $m$, $Bel$ and $Pls$ are equal and define a probability, called a Bayesian mass function. Otherwise, there is no direct equivalence with probabilities.
2.2 Decision tree
The decision tree is one of the most popular classification algorithms [11-15]. It classifies data with a tree-structured model of decisions and their possible consequences, i.e. decision rules that are constructed by learning from a training dataset. A decision tree consists of one root node, internal branch nodes and leaf nodes. Each internal node represents a test on an attribute of the records in the original dataset (usually called the test attribute), and each branch represents a possible value of the attribute tested at the corresponding node. Each leaf node represents one class or category; different leaf nodes can represent the same class. A decision tree can be described by a group of production rules in IF-THEN style: each path from the root to a leaf node stands for one rule, the condition of the rule is the conjunction of the attribute tests at the nodes along the path, and the result of the rule is the class of the leaf node at the end of the path. Compared with the tree form, rules are more often chosen in practical applications, because they are more concise and easier to comprehend, apply and adjust when building an expert system. A decision tree can be flexibly adjusted through the class conditions of its internal nodes or rules. The basic scheme of the decision tree here is to split off and mask every target as an image layer, so that one target does not affect or interfere with the extraction of another. The methodology is to gradually classify the remotely sensed data into the branches of the decision tree according to the rules.
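The correspondence between root-to-leaf paths and IF-THEN rules can be made concrete with a toy tree. This is a minimal sketch under assumed values: the attribute names (`band4`, `band3`), the thresholds and the tree layout are hypothetical and are not the tree used in the experiment.

```python
# Hypothetical toy tree: each internal node tests one attribute against a
# threshold; each leaf carries a class label.
tree = {
    "test": ("band4", 40),  # band4 <= 40 follows "yes", otherwise "no"
    "yes": {"leaf": "water"},
    "no": {
        "test": ("band3", 90),
        "yes": {"leaf": "green land"},
        "no": {"leaf": "urban land"},
    },
}


def extract_rules(node, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    attr, thr = node["test"]
    rules = extract_rules(node["yes"], conditions + (f"{attr} <= {thr}",))
    rules += extract_rules(node["no"], conditions + (f"{attr} > {thr}",))
    return rules


def classify(node, pixel):
    """Walk from the root to a leaf, following the outcome of each test."""
    while "leaf" not in node:
        attr, thr = node["test"]
        node = node["yes"] if pixel[attr] <= thr else node["no"]
    return node["leaf"]


rules = extract_rules(tree)
for conds, label in rules:
    print("IF " + " AND ".join(conds) + " THEN class = " + label)
```

Each printed rule is the conjunction of the tests along one path, so editing a threshold in the tree changes exactly the rules whose paths pass through that node.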
CART (Classification and Regression Trees) is a popular tree growing method that constructs a binary tree from a training dataset for supervised classification. The root node stands for all the samples and is divided into two child nodes; every child node is in turn divided into lower-level child nodes, and the division continues until no node can be divided further. As a non-parametric, multilayer method that makes no assumption about the data distribution, the decision tree is robust and flexible for data analysis and interpretation in applications.
3. D-S EVIDENCE THEORY DECISION TREE METHOD
3.1 Image classification
Classification of a digital image is the procedure of assigning image pixels to different classes according to their properties, structure, etc., so that pixels with similar properties fall into the same class. The core of classification is to define the central point and scope of every class and the corresponding classification decision functions. If two pixels are similar, they should have similar feature vectors, with a small distance between them. Suppose two pixel vectors $x$ and $y$ of a remote sensing image each have $m$ features:
$$ x = (x_1, x_2, \ldots, x_m)^T, \qquad y = (y_1, y_2, \ldots, y_m)^T. $$
The Euclidean distance $\|x - y\|$ represents the similarity of $x$ and $y$:
$$ \|x - y\| = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}. $$
If $\|x - y\|$ is smaller, the difference between $x$ and $y$ across the features is smaller; otherwise, the difference is bigger.
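As a small worked illustration of the Euclidean distance between two feature vectors, the sketch below compares one pixel with two others. The three-band pixel values are made up for the example.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same number of features")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical three-band pixel vectors (illustrative values only).
pixel = (12, 18, 25)
similar_pixel = (14, 17, 27)
different_pixel = (90, 60, 110)

print(euclidean_distance(pixel, similar_pixel))    # small: similar pixels
print(euclidean_distance(pixel, different_pixel))  # large: dissimilar pixels
```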
A pixel $x$ belongs to class $A$ when $x$ is nearest to the average vector (center) of class $A$. Let $A = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ has $m$ features, so $x_i = (x_i^1, x_i^2, \ldots, x_i^m)$, $(i = 1, 2, \ldots, n)$. With
$$ y^j = \frac{1}{n} \sum_{i=1}^{n} x_i^j, \quad (j = 1, 2, \ldots, m), $$
the average vector $y = (y^1, y^2, \ldots, y^m)$ of class $A$ is constructed. The pixel $x$ belongs to class $A$, which means that the distance between $x$ and the average vector of class $A$ is the smallest among all classes. This distance expresses the similarity between $x$ and class $A$. However, the distance alone makes it hard to conclude how much better the similarity between $x$ and class $A$ is than the similarity between $x$ and class $B$. So a support degree is needed to reflect the similarity between pixel $x$ and class $A$.
3.2 Support degree of D-S evidence theory
To apply evidence theory to the decision tree algorithm, a support degree is proposed. Compared with statistical classification methods, the decision tree method using support degree shows clear advantages. Finally, the result is obtained by fusing the two classification results with D-S evidence theory. Experimental results demonstrate that the proposed method is feasible and can improve the classification accuracy. Suppose $\theta_i$ represents the proposition $x \in A_i$ $(i = 1, 2, \ldots, n)$, and let $D = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be the recognition frame. A plausibility function ($Pls$) is derived on the recognition frame.
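$$ Pls(\{\theta_i\}) = \frac{C}{\|x - A_i\|}, \quad (i = 1, 2, \ldots, n), $$
where $C$ is a constant and $\|x - A_i\|$ is the distance between $x$ and the average vector of class $A_i$. For any $A \subseteq D$,
$$ Pls(A) = \max_{\theta_i \in A} Pls(\{\theta_i\}) = \max_{\theta_i \in A} \frac{C}{\|x - A_i\|} = \frac{\min_{i \in \{1, \ldots, n\}} \|x - A_i\|}{\min_{\theta_i \in A} \|x - A_i\|}, \quad \forall A \subseteq D, $$
where the last equality takes $C = \min_{i \in \{1, \ldots, n\}} \|x - A_i\|$, so that $Pls(D) = 1$.
So we can get the support degree function $S(A)$ on the recognition frame $D$ in D-S evidence theory:
$$ S(A) = 1 - Pls(\bar{A}) = 1 - \frac{\min_{i \in \{1, \ldots, n\}} \|x - A_i\|}{\min_{\theta_i \in \bar{A}} \|x - A_i\|}. $$
Here $\bar{A}$ is the complement of $A$ in $D$, so $S(A)$ is positive only when the class nearest to $x$ lies in $A$.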
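The support degree of a single class can be sketched in a few lines. The code below is an illustration under a stated assumption: the support degree is read as S(A) = 1 - Pls(complement of A), which is the reading under which larger values mean stronger membership, with the constant C taken as the smallest distance from the pixel to any class mean. The function names, class names and sample values are hypothetical.

```python
import math


def mean_vector(samples):
    """Average vector of a class from its training pixels."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def distance(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def support_degree(x, class_means, target):
    """Support for the singleton {target} under the assumed reading
    S = 1 - d_min / (smallest distance to any class other than target)."""
    d = {name: distance(x, mean) for name, mean in class_means.items()}
    d_min = min(d.values())
    d_others = min(v for name, v in d.items() if name != target)
    if d_others == 0.0:
        return 0.0  # x coincides with the mean of another class
    return 1.0 - d_min / d_others


# Hypothetical two-band class means (illustrative values only).
class_means = {
    "water": mean_vector([(9, 11), (11, 9)]),  # -> [10.0, 10.0]
    "urban": [50.0, 50.0],
    "green": [30.0, 60.0],
}
x = (13, 14)
for name in class_means:
    print(name, round(support_degree(x, class_means, name), 3))
```

Under this reading, only the class nearest to the pixel receives a positive support degree, and the value approaches 1 as the pixel gets much closer to that class than to the runner-up.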
The bigger the value of $S(A)$, the more strongly $x$ is supported as belonging to the classes in $A$. So the support degree function is the rule of the classification.
3.3 D-S evidence theory decision tree method
D-S evidence theory is an important tool for representing the uncertainty of data. The support degree indicates which class a pixel should belong to. By combining the support degree with the decision tree algorithm, a D-S evidence theory decision tree method is proposed. Its technical process consists of the following steps:
(1) Preprocess the remote sensing images, e.g. geometric correction of the TM data, selection of proper bands and so on.
(2) Choose proper bands and representative data as the training samples, according to the different ground objects and classification types, to construct the decision tree.
(3) Use ENVI’s decision tree algorithm, with the support degree from evidence theory as the branch conditions, to construct the decision tree of one object class; the leaf nodes represent different support degrees, such as 0, 0.1, 0.2, 0.3, up to greater than 0.6.
(4) Execute the decision tree to classify the data, and produce the classification image of one ground object based on the different support degrees.
(5) Produce the classification images of the other ground objects by repeating steps (2) to (4).
(6) Choose a proper threshold to produce binary images of the different ground object classifications. The pixels whose support degree is less than the threshold are assigned 0, while the others are assigned 1.
(7) Overlay the binary images of the ground object classifications.
(8) Assess the classification accuracy of the final overlaid image. If the accuracy is lower than required, go to (6); otherwise, the classification is finished.
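Steps (6) and (7) can be sketched on tiny support degree images. This is an illustrative sketch, not the ENVI workflow: the function names, the class codes, the 2x2 images and the tie-breaking rule (the class with the highest support wins when several classes pass their thresholds) are assumptions. A strict "greater than" comparison is used so that a zero threshold keeps only pixels with positive support.

```python
def binarize(support, threshold):
    """Step (6): 1 where the support degree exceeds the threshold, else 0."""
    return [[1 if s > threshold else 0 for s in row] for row in support]


def overlay(support_by_class, thresholds, codes):
    """Step (7): merge the per-class binary images into one class-code image.
    Pixels that pass no threshold keep code 0 (unclassified)."""
    names = list(support_by_class)
    rows = len(support_by_class[names[0]])
    cols = len(support_by_class[names[0]][0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_code, best_support = 0, -1.0
            for name in names:
                s = support_by_class[name][r][c]
                if s > thresholds[name] and s > best_support:
                    best_code, best_support = codes[name], s
            result[r][c] = best_code
    return result


# Hypothetical 2x2 support degree images (illustrative values only).
support_by_class = {
    "water": [[0.8, 0.0], [0.0, 0.0]],
    "urban": [[0.0, 0.5], [0.0, 0.0]],
    "green": [[0.0, 0.0], [0.2, 0.0]],
}
codes = {"water": 1, "urban": 2, "green": 3}

first = overlay(support_by_class, {"water": 0.0, "urban": 0.0, "green": 0.0}, codes)
second = overlay(support_by_class, {"water": 0.5, "urban": 0.4, "green": 0.3}, codes)
print(first)
print(second)
```

Step (8) then amounts to re-running `overlay` with adjusted thresholds until the assessed accuracy is acceptable; pixels left at code 0 are the candidates for reclassification.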
4. APPLICATION EXPERIMENT
The experiment uses Landsat 5 TM images of the Yantai Economic and Technological Development Zone acquired in 2006. The D-S evidence theory decision tree method is used to classify the land cover of the zone into four classes such as urban land, farmland, forest land and water, and the experiment supports the effectiveness of the classification. The steps are as follows:
(1) Select TM spectral bands 5, 4 and 3, perform geometric correction, and subset the images to the Yantai Economic and Technological Development Zone.
(2) Choose common and representative data as the training samples. Classification accuracy depends on the quality and quantity of the samples.
(3) Calculate the maximum, minimum and average values of the spectral bands of interest. These values are used to calculate the support degree of each ground object class.
(4) Construct the decision trees with different support degrees for the three classes.
(5) Choose zero as the threshold to produce binary images of the different ground object classifications, execute the decision tree algorithm to classify the data, and produce three ground object classification images (figure 1(a)-(c)).
(6) Overlay the binary images of the ground object classifications, and produce the classification image of the three ground objects (figure 2(a)).
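Step (3) of the experiment, the per-band statistics of the training samples, can be sketched as follows. The sample values and the band order are hypothetical; the paper does not list its training data.

```python
def band_statistics(samples):
    """Per-band (min, max, mean) of training pixels given as equal-length tuples."""
    stats = []
    for band in zip(*samples):
        stats.append((min(band), max(band), sum(band) / len(band)))
    return stats


# Hypothetical training pixels for one class, ordered as (TM5, TM4, TM3)
# digital numbers; the values are illustrative only.
water_samples = [(12, 18, 25), (10, 20, 27), (14, 16, 23)]
water_stats = band_statistics(water_samples)
for band_name, (lo, hi, mean) in zip(("TM5", "TM4", "TM3"), water_stats):
    print(band_name, lo, hi, mean)
```

The vector of per-band means is the class average vector from which distances, and hence support degrees, are computed.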
Figure 1. Classification images of support degree, (a) water, (b) urban and (c) green land.
Figure 2. The result of image classification based on evidence theory, (a) the first classification and (b) the second classification.
5. APPRAISAL OF CLASSIFICATION ACCURACY
Compared with Bayesian classification, the main advantages of this method are that it allows reclassification and can reach a high accuracy. Figure 1(a)-(c) shows the three classification images produced by the D-S evidence theory decision tree method. Comparison with the original remote sensing images shows that the accuracy for water (figure 1(a)) is high, while the accuracy of the other two classification results is low. The three classification results are then binarized: the pixels whose support degree is zero belong to one class, and the others belong to the other class. The three classes, water, urban land and green land, are numbered 1, 2 and 3. The three binary images are then overlaid into one result image (figure 2(a)). We randomly choose 320 points from figure 2(a) and compare them with the reference and original images to obtain the classification error matrix and the accuracy assessment report (table 1).

Table 1. Classification error matrix and the accuracy assessment report.
Class        Water   Urban   Green land   Number of samples   Classification accuracy
Water        69      1       5            75                  0.9200
Urban        1       71      23           95                  0.7474
Green land   3       46      101          150                 0.6733
Total number of samples: 320; correctly classified samples: 241; overall classification accuracy: 0.7531. From table 1, the overall accuracy is 0.7531, which is similar to the result obtained after six rounds of classification with the maximum likelihood method (0.7312). Because the accuracy for green land is lower, the threshold can be adjusted and the image reclassified until the desired accuracy is reached. Because the area of green land is large, we choose the following support degree thresholds for the three cover classes: urban land more than 0.4, water more than 0.5 and green land more than 0.3, and overlay the three images a second time. Figure 2(b) shows the result. We randomly choose 320 points from figure 2(b) and compare them with the reference and original images. The overall classification accuracy reaches 0.9023, and the accuracies of the three classes are 0.9600, 0.8532 and 0.9233. The classification accuracy meets our requirements.
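The accuracy figures reported for the first classification can be re-derived from the error matrix in table 1. The sketch below uses the counts from the table; the function name and the row-wise reading of the matrix (each row sums to the number of samples of that class) are the only assumptions.

```python
classes = ["water", "urban", "green land"]
# Error matrix from table 1: one row per class, in the column order above.
matrix = [
    [69, 1, 5],
    [1, 71, 23],
    [3, 46, 101],
]


def accuracy_report(matrix):
    """Row-wise class accuracy, correct count, total count and overall accuracy."""
    per_class = [row[i] / sum(row) for i, row in enumerate(matrix)]
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return per_class, correct, total, correct / total


per_class, correct, total, overall = accuracy_report(matrix)
for name, acc in zip(classes, per_class):
    print(f"{name}: {acc:.4f}")
print(f"correct {correct} of {total}, overall accuracy {overall:.4f}")
```

The off-diagonal counts show where the first pass fails: most errors are confusions between urban land and green land, which is why the thresholds of those two classes are the ones worth adjusting.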
6. CONCLUSION
Remote sensing data carry uncertainty that results from data acquisition, transmission, storage, handling, etc. D-S evidence theory is a powerful tool for expressing this uncertainty. The decision tree is a non-parametric, multilayer classification algorithm that makes no assumption about the data distribution, which makes it robust and flexible for data analysis and interpretation in applications. Its time complexity is low and it classifies quickly. Combining D-S evidence theory with the decision tree algorithm, a D-S evidence theory decision tree method is proposed, in which the support degree function is the link between the two. The method classifies classes such as water, urban land and green land, taking their characteristic spectral feature parameters as input values, and produces three classification images of support degree. A proper support degree threshold is then chosen for each classification image and the image is binarized. The binary images are overlaid according to class to obtain the initial result, whose accuracy is then assessed. If the initial classification accuracy does not meet the requirement, reclassification is conducted by choosing new support degree thresholds for the images of every ground object class, until the final classification meets the accuracy requirement. Compared with Bayesian classification, the main advantages of this method are that it allows reclassification and can reach a high accuracy. The method is successfully used to classify the land cover of the Yantai Economic and Technological Development Zone into three classes: water, urban land and green land. The experiment supports the effectiveness of the classification method and yields a precise classification result.
REFERENCES
[1] McClean S, Scotney B, Shapcott M, "Aggregation of imprecise and uncertain information in databases", IEEE Transactions on Knowledge and Data Engineering, 13(6): 902-912(2001).
[2] Blaschke T, "Object based image analysis for remote sensing", ISPRS Journal of Photogrammetry and Remote Sensing 65, 2-16(2010).
[3] Shafer G, [A Mathematical Theory of Evidence], Princeton University Press, Princeton, 1976.
[4] Duan X S, [Evidence Theory and Decision and Artificial Intelligence], China Renmin University Press, Beijing, 1993.
[5] Yager R R, Kacprzyk J, Fedrizzi M, [Advances in the Dempster-Shafer Theory of Evidence], John Wiley and Sons, New York, 1994.
[6] Friedl M A, Brodley C E, Strahler A H, "Maximizing Land Cover Classification Accuracies Produced by Decision Trees at Continental to Global Scales", IEEE Transactions on Geoscience and Remote Sensing, 37(2): 969-977(1999).
[7] Safavian S R, Landgrebe D, "A Survey of Decision Tree Classifier Methodology", IEEE Trans. Syst. Man Cybern, 21: 660-674(1991).
[8] McIver D K, Friedl M A, "Estimating Pixel-scale Land Cover Classification Confidence Using Non-parametric Machine Learning Methods", IEEE Transactions on Geoscience and Remote Sensing, 39: 1959-1968(2001).
[9] Mertikas P, Zervakis M E, "Exemplifying the theory of evidence in remote sensing image classification", International Journal of Remote Sensing, 22(6), 1081-1095(2001).
[10] Bloch I, "Some aspects of Dempster-Shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account", Pattern Recognition Letters 17, 905-919(1996).
[11] McIver D K, Friedl M A, "Using Prior Probabilities in Decision-tree Classification of Remotely Sensed Data", Remote Sensing of Environment, 81: 253-261(2002).
[12] Friedl M A, Brodley C E, "Decision Tree Classification of Land Cover from Remotely Sensed Data", Remote Sensing of Environment, 61, 399-409(1997).
[13] Niccolai A M, Hohl A, Niccolai M, Oliver C D, "Decision rule-based approach to automatic tree crown detection and size classification", International Journal of Remote Sensing, 31(12), 3089-3123(2010).
[14] Yang C-C, Prasher S O, Enright P, Madramootoo C, Burgess M, Goel P K, Callum I, "Application of decision tree technology for image classification using remote sensing data", Agricultural Systems 76, 1101-1117(2003).
[15] Yuan H, Zhang R, Li X, "Extracting Wetland Using Decision Tree Classification", Proc. WSEAS 8, 240-245.