Multi-label class assignment in land-use modelling


International Journal of Geographical Information Science Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tgis20

Hichem Omrani (a, *), Fahed Abdallah (b), Omar Charif (c) and Nicholas T. Longford (d)

(a) Urban Development and Mobility Department, Luxembourg Institute of Socio-Economic Research - LISER, Esch-sur-Alzette, Luxembourg
(b) Heudiasyc Laboratory, UMR CNRS 6599, Compiegne University of Technology, Compiegne, France
(c) Department of Geomatics Engineering, University of Calgary, Calgary, Alberta, Canada
(d) Statistics Research and Consulting, SNTL, Barcelona, Spain

Published online: 20 Mar 2015.

To cite this article: Hichem Omrani, Fahed Abdallah, Omar Charif & Nicholas T. Longford (2015): Multi-label class assignment in land-use modelling, International Journal of Geographical Information Science To link to this article: http://dx.doi.org/10.1080/13658816.2015.1008004


International Journal of Geographical Information Science, 2015 http://dx.doi.org/10.1080/13658816.2015.1008004


(Received 31 March 2014; final version received 11 January 2015)

During the last two decades, a variety of models have been applied to understand and predict changes in land use. These models assign a single-attribute label to each spatial unit at any particular time of the simulation. This is not realistic because mixed use of land is quite common. A more detailed classification allowing the modelling of mixed land use would be desirable for better understanding and interpreting the evolution of the use of land. A possible solution is the multi-label (ML) concept where each spatial unit can belong to multiple classes simultaneously. For example, a cluster of summer houses at a lake in a forested area should be classified as water, forest and residential (built-up). The ML concept was introduced recently, and it belongs to the machine learning field. In this article, the ML concept is introduced and applied in land-use modelling. As a novelty, we present a land-use change model that allows ML class assignment using the k nearest neighbour (kNN) method that derives a functional relationship between land use and a set of explanatory variables. A case study with a rich data-set from Luxembourg using biophysical data from aerial photography is described. The model achieves promising results based on the well-known ML evaluation criteria. The application described in this article highlights the value of the multi-label k nearest neighbour method (MLkNN) for land-use modelling.

Keywords: land-use modelling; multi-label; machine learning; geographic information systems

*Corresponding author. Email: [email protected]
© 2015 Taylor & Francis

1. Introduction

Land-use modelling is concerned with the evolution of the use of land. Experts from different disciplines (e.g. computer science, engineering, geography, mathematics and statistics) have contributed to this field. Machine learning techniques have been proposed as a way to model land-use changes because of their universal approximation capabilities, their high performance in a wide range of scientific fields and their ability to learn about spatial patterns in data (Pijanowski et al. 2014). They are presented as a powerful tool for land-use modelling, either on their own, for instance in land transformation models (Pijanowski et al. 2002, Tayyebi et al. 2011, 2013), or combined with cellular automata models (Charif et al. 2012, Omrani et al. 2012, Basse et al. 2014). Machine learning techniques such as artificial neural networks (ANN) and support vector machines (SVM) build a functional relationship (or decision function) capturing the pattern of the land-use changes and mimicking the land-use evolution (Pijanowski et al. 2002, 2014, Mas et al. 2004, Lin et al. 2005, Yang et al. 2008, Huang et al. 2010, Jiang et al. 2011, Tayyebi et al. 2011, Omrani et al. 2012, Grekousis et al. 2013, Basse et al. 2014). These methods, which use mono-class assignment to spatial units, have shown competitive performance compared to statistical regression models such as multinomial logistic regression. Tayyebi et al. (2014) developed a framework to classify land-use change models and compared three models to simulate land-use change in diverse areas of the world. Despite all the achievements in land-use modelling, the development of suitable land-use models remains an active and exciting research topic (Couclelis 2005, Batty 2011).

To date, only mono-label machine learning methods have been applied to land-use modelling with single-labelled spatial units. However, Tomas et al. (2012) and White et al. (2012) argue that the mono-label class assignment for spatial units (i.e. a spatial unit has only one class/label at a time) is not realistic for regions exhibiting mixed land use. The problem with conventional mono-label machine learning methods is that they use single-label units to predict land-use change, whereas spatial units are often associated with a set of disjoint labels. The traditional mono-label concept (binary or multi-class) is a special case of the multi-label (ML) concept, in which each unit is assigned only one class. The single-label outcome is partial and incomplete. Therefore, there is a need for improvement in existing land-use modelling approaches. A possible improvement is the use of the ML concept, where a spatial unit may be associated with a set of multiple classes. Compared to the mono-label concept, the ML concept (assuming that labels are not mutually exclusive) offers a more flexible approach that does not exclude possible evolution of the land use. Several studies have shown that ML assignment is useful for modelling complex ecological/environmental phenomena, since it may provide better interpretation and more accurate prediction compared to single-label assignment (Ferrier and Guisan 2006).
Jones et al. (2011) used ML learning in a species distribution model, which plays a key role in creating reserves for species conservation, predicting the effects of ecological change and testing ecological theory. Yang et al. (2012) applied ML learning algorithms to environmental modelling in relation to sustainable flood retention basins. This work belongs to the same research stream as these two studies (Jones et al. 2011, Yang et al. 2012). However, the ML concept has received little attention in a geographical context. Applying the ML concept in land-use change modelling may look simple, but its implementation is challenging. Taking on this challenge, we propose to use the multi-label k nearest neighbour (MLkNN) method for geographical phenomena.

The literature on ML machine learning methods is extensive. Among these methods, those based on the k nearest neighbour (kNN) principle have performed well in a wide range of applications. They have also been used successfully for single-label classification of land use. For instance, the IDRISI land-use change modeller of Eastman et al. (2005) and Sangermano et al. (2010) uses an instance-based learning algorithm (SimWeight) based on the logic of the kNN method. Also, the GEOMOD land-use model of Pontius et al. (2001) applies the kNN principle, as a transition rule, for simulation of land-use change. These methods use the local neighbourhood (in the chosen feature space) of a given spatial unit to assign a land-use class. Besides being simple to use and to implement, nearest neighbour-based methods perform well in the presence of a large training set (Zhang and Zhou 2007, Zhang and Zhang 2010). Nevertheless, the parameters of nearest neighbour methods (such as the number of neighbours k) have to be set with care. They can be adjusted by cross-validation to avoid over-fitting and under-fitting.


The ML class assignment is not new. It has been applied in computer science. It has been applied also in land-use science but only to compare maps with ML classes (Pontius and Cheuk 2006). This work differs from the existing works. In fact, the ML class assignment is different from modelling multiple land-use changes as applied by Tayyebi and Pijanowski (2014). In our model, the input, the output and the learning model are all multi-labelled. The learning model can handle ML data and learn a ‘multi-label prediction function’.

In this article, we perform a case study with a rich data-set from the Grand Duchy of Luxembourg for modelling the mixed land-use change. Each cell is classified into four classes of land use: agriculture (A), forest (F), industrial (I) and urban (U). Further, a set of 13 variables is defined for each cell. These variables and the classification are available for years 1999 and 2007. We relate these variables to the classification in 1999 and classify cells in 2007 assuming that the same association applies in 2007. Since the classification of the cells in 2007 is known, we can assess the quality of this method. The results of our analysis demonstrate the effectiveness of the ML learning modelling framework. It offers a new approach that is intuitively more attractive and can generate more realistic simulation results than the existing mono-label land-use models.

The key objectives of this research are summarised by the following two points:
(1) Modelling the land-use change using the ML concept.
(2) Applying ML learning to predict mixed land-use change.

The remainder of this article is organised as follows. The next section presents the ML data-set constructed from a vector map, the MLkNN approach and some associated evaluation metrics. Section 3 applies the MLkNN model to predict the land-use change for the Grand Duchy of Luxembourg. Results from the model are presented in Section 4. Section 5 provides a discussion. Section 6 draws conclusions and outlines further research directions.

2. Modelling framework

2.1. ML data-set construction from vector maps

A vector map is a vector-based collection of geographic information system (GIS) data about a specific region at different levels of detail (Niu et al. 2006). In the vector map, each feature has one class, and traditional rasterisation of these data would result in one class per cell (the majority class). Thus, to apply the proposed MLkNN method to land-use modelling, we have to preprocess this data-set and convert it into a ML one. Recall that by a ML cell, we mean a cell that can have more than one class at the same time. The preprocessing consists of extracting each land-use class (such as urban (U), industrial (I), agriculture (A) and forest (F)) from the vector map, converting each extracted layer to raster and finally summing the rasters. The summation result is coded so that each cell is associated with a vector of binary variables (see Algorithm 1). Each variable represents one of the land-use classes. The ArcGIS 10 software and the Python programming language were used for the rasterisation, as shown in Figure 1. The output is a map (i.e. a rectangular grid of cells) in which each cell may be associated with several labels.


Algorithm 1: Multi-label dataset construction
Input: map: vector land-use map; landUseClasses: set of land-use classes; nbClasses: number of land-use classes
Output: multilabelRaster: multi-label land-use dataset
begin
    exist ← 1; nonExist ← 9;
    for i ← 1 to nbClasses do
        /* extract vector map for each land-use class */
        extractedMap(i) ← extractMap(map, landUseClasses(i));
        /* convert extracted map into raster dataset */
        rasterData(i) ← convertToRaster(extractedMap(i));
        /* recode the ith raster dataset so that a cell is coded with the exist value
           if it has the ith land use and with the nonExist value if not */
        recodedRasterData(i) ← recodeRaster(rasterData(i), exist, nonExist);
        exist ← exist × 10; nonExist ← nonExist × 10;
    end
    /* sum all the raster datasets to obtain the multi-label dataset */
    multilabelRaster ← sum(recodedRasterData(1 … nbClasses));
    return (multilabelRaster);
end
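The recoding and summation steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-class rasters are taken as already extracted and rasterised into 0/1 grids (the GIS steps are not reproduced), and the function names `build_multilabel_raster` and `decode_cell` are hypothetical, not part of the authors' implementation.

```python
def build_multilabel_raster(class_rasters):
    """Sum recoded per-class rasters as in Algorithm 1.

    class_rasters: list of 2D 0/1 grids (lists of lists), one per land-use
    class, all with the same shape. Assumed to be already rasterised.
    """
    n_rows, n_cols = len(class_rasters[0]), len(class_rasters[0][0])
    coded = [[0] * n_cols for _ in range(n_rows)]
    exist, non_exist = 1, 9  # codes for the first class, as in Algorithm 1
    for raster in class_rasters:
        for r in range(n_rows):
            for c in range(n_cols):
                coded[r][c] += exist if raster[r][c] else non_exist
        # move to the next decimal digit for the next class
        exist *= 10
        non_exist *= 10
    return coded


def decode_cell(code, n_classes):
    """Read the summed code digit by digit: digit 1 means the class is present."""
    labels = []
    for _ in range(n_classes):
        labels.append(1 if code % 10 == 1 else 0)
        code //= 10
    return labels


# Toy 1 x 2 grid with class order U, I, A, F (hypothetical data)
urban, industrial = [[1, 0]], [[0, 0]]
agriculture, forest = [[1, 1]], [[0, 1]]
coded = build_multilabel_raster([urban, industrial, agriculture, forest])
print(coded)                     # [[9191, 1199]]
print(decode_cell(coded[0][0], 4))  # [1, 0, 1, 0] -> labels U and A
```

Each decimal digit of the summed code belongs to one class, so the binary label vector of a cell can be recovered from a single integer raster.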

Figure 1. Vector to grid conversion: vector data-set containing polygons with associated labels (on the left). A grid with the desired cell size is added (middle panel). The values of the grid cells become the values of the labels of the polygons which contain them (on the right).

2.2. ML machine learning approach for land-use change

Machine learning includes two main groups of models in land-use change science: (1) models that use supervised algorithms, such as ANNs and SVM, and (2) models that use unsupervised algorithms, such as k-means. The former build a relationship (or decision function between the inputs and the output) that captures the pattern of the land-use changes and mimics the evolution of land use; kNN is a supervised machine learning algorithm. The latter models simulate output based on the similarity between inputs. These models do not use any output in the training run. We use supervised ML machine learning, the MLkNN, for the calculation of land change transition potentials.

2.2.1. Method: multiple labels and MLkNN

Before describing the MLkNN method, we give a short background about kNN. The kNN algorithm is a simple classification method. For each cell in the testing set, we define its distances from all the cells in the learning set. For example, if the learning set contains 200 cells and the testing set has 5000 cells, then we have to evaluate a 200 × 5000 matrix of distances. We refer to a cell from the testing set as unseen. For each unseen cell u, we identify k cells in the learning set that are closest to u. These cells are called the (k nearest) neighbours (of u). For example, k may be set to 10. An unseen cell may be classified by the label that is in the majority among its k neighbours (Wu et al. 2008).

We present a Bayesian version of the kNN method for ML classification (MLkNN). This MLkNN method is adopted as a decision rule for modelling land use. For details of the kNN and MLkNN methods, we refer to Wu et al. (2008) and Zhang and Zhou (2007), respectively. The background to Bayes theory necessary for these methods can be found in Bernardo and Smith (2009).

Figure 2 shows a flow chart of the processing steps of the MLkNN modelling approach. This flow chart can very well be applied to mono-label learning. However, the Bayesian rule related to the ML learning brings an algorithmic complexity that the mono-label approach cannot handle. The MLkNN does not produce an explicit function. It stores the learning data-set, which usually contains the relevant information about the evolution of land use: (features, land-use label at time t) and the land-use label at time t + 1. For estimating the label at time t + 1, the label of a cell at time t is added to the input variables of the feature vector. Simply, the learning algorithm extracts the relevant information about the past land use of the cells in the learning data-set and applies it to cells in the testing data-set.

For each cell in the testing data-set, we identify the k nearest cells in the learning data-set (without using, at this first step, the labels of the neighbours at time t + 1). The distance used is defined by the Euclidean metric. At the second step, the labels of these neighbour cells at time t + 1 are used to predict the labels of the studied cell at time t + 1. For example, for 1-nearest neighbour, this method assigns directly to the cell the labels of the nearest neighbour. In the case of kNN, a simple and intuitive version is to assign to the studied cell the most frequent labels among the neighbours. In a more general version, we use the different information extracted from the neighbours in a probabilistic manner (Bayesian rule), as explained in the example given in Section 3.1.
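The two steps described above (neighbour search, then label assignment) can be sketched as follows for the simple, non-Bayesian version. The sketch assumes that "most frequent labels" means labels carried by more than half of the k neighbours; that threshold, the function names and the toy data are illustrative assumptions rather than the authors' implementation.

```python
import math


def k_nearest(train_features, query, k):
    """Step 1: indices of the k learning cells closest to `query` (Euclidean metric)."""
    dists = sorted((math.dist(f, query), idx) for idx, f in enumerate(train_features))
    return [idx for _, idx in dists[:k]]


def majority_labels(train_labels, neighbour_idx):
    """Step 2 (simple version): keep labels carried by more than half of the neighbours.

    The 'more than half' cut-off is an assumption made for this sketch.
    """
    counts = {}
    for idx in neighbour_idx:
        for label in train_labels[idx]:
            counts[label] = counts.get(label, 0) + 1
    k = len(neighbour_idx)
    return {label for label, n in counts.items() if n > k / 2}


# Hypothetical learning set: feature vectors at time t, label sets at time t + 1
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
labels = [{"A", "U"}, {"A"}, {"A", "U"}, {"F"}, {"F", "A"}]

neighbours = k_nearest(features, (0.2, 0.2), k=3)
print(sorted(neighbours))                          # [0, 1, 2]
print(sorted(majority_labels(labels, neighbours)))  # ['A', 'U']
```

Note that the labels are not consulted during the neighbour search; they enter only in the second step, as in the description above.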

Figure 2. Flow chart of the processing steps of the MLkNN modelling approach. Steps shown: start; select a cell at time t; compute the distance between the studied cell and all learning cells (learning set L: (features, land-use label) at time t; land-use label at time t + 1); find the k nearest neighbours; apply the Bayesian rule using the land-use labels from L at time t + 1; estimate the cell label at time t + 1; if this was the last cell, end; otherwise select the next cell.

2.2.2. Principles

We introduce some notation first. The set $\mathcal{X} = \mathbb{R}^d$ is considered as the domain of instances (also known as observations or examples). Each one is represented by a $d$-dimensional feature vector, and $\mathcal{Y} = \{\omega_1, \ldots, \omega_Q\}$ is the set of classes or labels. Let $D = \{(x_1, Y_1), \ldots, (x_m, Y_m)\}$ represent the multi-labelled data-set, consisting of $m$ training instances, drawn at random (without replacement) from $\mathcal{X} \times 2^{\mathcal{Y}}$. The MLkNN method learns a ML classifier $H: \mathcal{X} \to 2^{\mathcal{Y}}$ from the given training data, which predicts a set of labels for each unseen instance $x \in \mathcal{X}$ (Zhang and Zhou 2007, Spyromitros et al. 2008). In addition to $H$, MLkNN defines a scoring function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ assigning a score to each instance/label combination. For each class $\omega \in \mathcal{Y}$, the score $f(x, \omega)$ represents the probability that $\omega$ is relevant to the instance $x$. The scoring function $f$ is used to rank the labels based on their relevance for the instance that needs to be classified. Note that in the MLkNN method, given a threshold value $t$, the ML model $H(\cdot)$ and the scoring function $f(\cdot, \cdot)$ are related by the following relation:

$$H(x) = \{\omega \in \mathcal{Y} \mid f(x, \omega) > t\} \tag{1}$$

Label $\omega$ is associated with cell $x$ if $f(x, \omega) > t$, where $t$ is a threshold value which can be set by cross-validation (Fan and Lin 2007, Tayyebi and Pijanowski 2014). If $x$ is an instance with a label set $Y \subseteq \mathcal{Y}$, then $N_x^k$ is the set of the $k$ nearest training examples in $D$ according to a predefined distance function $d(\cdot, \cdot)$. The $Q$-dimensional vector $y_x$ qualifies the classes of $x$. Component $q$ of $y_x$ indicates whether $x$ belongs to class $\omega_q$:

$$y_x(q) = \begin{cases} 1 & \text{if } \omega_q \in Y \\ 0 & \text{otherwise} \end{cases} \qquad \forall q \in \{1, \ldots, Q\} \tag{2}$$

The vector $c_x$ is the $Q$-dimensional membership counting vector of $x$, where component $q$ is the number of examples among the $k$ nearest neighbours of $x$ that belong to the class $\omega_q$:

$$c_x(q) = \sum_{x_i \in N_x^k} y_{x_i}(q), \qquad \forall q \in \{1, \ldots, Q\} \tag{3}$$
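As a small illustration of Equations (2) and (3), the following Python sketch builds the indicator vector of a label set and sums it over a neighbourhood. The neighbourhood used here is hypothetical; it is chosen only so that the counts are easy to check by hand, and the function names are not taken from the paper.

```python
CLASSES = ["U", "I", "A", "F"]  # omega_1 .. omega_4, in the order used in Section 3.1


def label_indicator(label_set, classes=CLASSES):
    """Equation (2): component q is 1 if class omega_q is in the label set, else 0."""
    return [1 if c in label_set else 0 for c in classes]


def counting_vector(neighbour_label_sets, classes=CLASSES):
    """Equation (3): component q counts the neighbours that carry class omega_q."""
    counts = [0] * len(classes)
    for label_set in neighbour_label_sets:
        for q, bit in enumerate(label_indicator(label_set, classes)):
            counts[q] += bit
    return counts


# Hypothetical neighbourhood of k = 10 cells (label sets written as sets of letters)
neighbours = [{"A", "U"}] * 5 + [{"A", "U", "F"}] * 2 + [{"A", "I"}] + [{"A"}] * 2
print(label_indicator({"A", "U"}))  # [1, 0, 1, 0]
print(counting_vector(neighbours))  # [7, 1, 10, 2]
```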

2.2.3. Maximum-a-posteriori (MAP) rule

We present hereafter the MAP rule for the ML classification of an instance $x$. Using the same notation as in the previous section, $N_x^k$ and $c_x$ represent the $k$ nearest neighbours and the counting vector of $x$, which can be identified from the training set. Let $H_1^q$ be the hypothesis that $x$ belongs to class $\omega_q$, and $H_0^q$ be the hypothesis that the example $x$ is not assigned to the class $\omega_q$. The fact that we have exactly $j$ examples in $N_x^k$ that belong to $\omega_q$ is represented by $E_j^q$ ($j \in \{0, 1, \ldots, k\}$). The MAP rule is used to determine component $q$ of the category vector $y_x$:

$$\hat{y}_x(q) = \arg\max_{b \in \{0,1\}} \Pr\left(H_b^q \mid E_{c_x(q)}^q\right) \tag{4}$$

2.2.4. Posterior probability estimation

Based on Bayes' rule, Equation (4) gives:

$$\hat{y}_x(q) = \arg\max_{b \in \{0,1\}} \frac{\Pr(H_b^q)\,\Pr(E_{c_x(q)}^q \mid H_b^q)}{\Pr(E_{c_x(q)}^q)} = \arg\max_{b \in \{0,1\}} \Pr(H_b^q)\,\Pr(E_{c_x(q)}^q \mid H_b^q) \tag{5}$$

The probabilities $\Pr(H_b^q)$ and $\Pr(E_{c_x(q)}^q \mid H_b^q)$, for all $q \in \{1, \ldots, Q\}$ and $b \in \{0, 1\}$, are estimated straightforwardly from the training data-set $D$. Finally, the outcome of applying MLkNN to the classes of $x$ is:

$$H(x) = \{\omega_q \in \mathcal{Y} \mid \hat{y}_x(q) = 1\} \tag{6}$$

where label $\omega_q$ is associated with cell $x$ if $\hat{y}_x(q) = 1$.

2.3. Evaluation

The performance evaluation of an algorithm for ML learning is more complex than for single-label learning. The result given by a ML classifier may be correct fully, partly or not at all. A summary of the results is obtained by the confusion matrix. The element in cell $(k, h)$ is the number of observations for which the observed and estimated classes are, respectively, $k$ and $h$. The numbers on the diagonal of this matrix represent the correct estimates (true positives (TP) and true negatives (TN)), and the off-diagonal numbers represent the estimation errors (false positives (FP) and false negatives (FN)). We give next the performance/evaluation measures (metrics) used in this article (Tsoumakas and Katakis 2007). Other measures can be found in Boutell et al. (2004) and Tayyebi et al. (2013). Let $Y_i$ be the set of true classes of $x_i$ and let $\hat{Y}_i = H(x_i)$ be the set of predicted classes (or ML class) for the example $x_i$. The first measure is Accuracy (Acc), defined as the average degree of similarity between the predicted and the real label sets of all test examples:

$$\mathrm{Acc} = \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i \cap \hat{Y}_i|}{|Y_i \cup \hat{Y}_i|} \tag{7}$$

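where $m$ is the number of test cells. Another measure, known as F1, is the harmonic mean of Precision (Prec) and Recall (Rec) (Yang 1999). Precision is the fraction of the predicted labels that are correct. Recall (also known as sensitivity) is the fraction of the true labels that have been predicted. These measures are defined as follows:

$$\mathrm{Prec} = \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i \cap \hat{Y}_i|}{|\hat{Y}_i|} \tag{8}$$

$$\mathrm{Rec} = \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i \cap \hat{Y}_i|}{|Y_i|} \tag{9}$$

and

$$F_1 = \frac{1}{m} \sum_{i=1}^{m} \frac{2\,\mathrm{Prec}_i\,\mathrm{Rec}_i}{\mathrm{Prec}_i + \mathrm{Rec}_i} = \frac{1}{m} \sum_{i=1}^{m} \frac{2\,|Y_i \cap \hat{Y}_i|}{|Y_i| + |\hat{Y}_i|} \tag{10}$$

where $\mathrm{Prec}_i$ and $\mathrm{Rec}_i$ denote the per-cell terms of Equations (8) and (9).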

Finally, the Hamming loss (HLoss) uses the symmetric difference between the observed class $Y_i$ and the predicted one $\hat{Y}_i$ to evaluate the model output. This measure counts prediction errors (an incorrect label is predicted) and missing errors (a true label is not predicted). It is defined as follows:

$$\mathrm{HLoss} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{Q}\,|Y_i \,\Delta\, \hat{Y}_i| \tag{11}$$

where $\Delta$ stands for the symmetric difference between two sets, and $Q$ is the number of distinct classes/labels. The Hamming loss by cell is defined as the number of errors that occurred during classification divided by the number of classes. For each cell $i$, $Q = a_i + b_i + c_i + d_i$, where $a_i$ is the number of labels that are in both $Y_i$ and $\hat{Y}_i$ (the number of labels correctly predicted), $b_i$ is the number of labels in $Y_i$ but not in $\hat{Y}_i$ (missed by prediction), $c_i$ is the number of labels in $\hat{Y}_i$ but not in $Y_i$ (predicted inappropriately), and $d_i$ is the number of labels in neither $Y_i$ nor $\hat{Y}_i$. Then, Accuracy is the average of the fraction $a_i/(a_i + b_i + c_i)$, Precision the average of $a_i/(a_i + c_i)$, Recall the average of $a_i/(a_i + b_i)$, F1 the average of $2a_i/(2a_i + b_i + c_i)$, and the Hamming loss the average of $(b_i + c_i)/Q = 1 - (a_i + d_i)/Q$. The values of these evaluation metrics are in the interval $[0, 1]$. Larger values of Accuracy, Precision, Recall and F1 correspond to better classification performance, whereas a smaller value of the Hamming loss indicates better performance (Yang 1999, Tsoumakas and Katakis 2007).
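The per-cell counts make these measures straightforward to compute. The sketch below takes Precision as the share of predicted labels that are correct and Recall as the share of true labels that are recovered, in line with the set-based definitions of Section 2.3. The handling of empty label sets (ratios with a zero denominator) is an assumption of this sketch, as the text does not specify it, and the function name is illustrative.

```python
def multilabel_metrics(true_sets, pred_sets, n_classes):
    """Example-based Accuracy, Precision, Recall, F1 and Hamming loss.

    true_sets, pred_sets: lists of Python sets of labels, one pair per test cell.
    Assumption: a ratio with zero denominator counts as 1.0 for Accuracy/F1
    (both sets empty) and as 0.0 for Precision/Recall.
    """
    totals = {"acc": 0.0, "prec": 0.0, "rec": 0.0, "f1": 0.0, "hloss": 0.0}
    for y, y_hat in zip(true_sets, pred_sets):
        a = len(y & y_hat)   # correctly predicted labels
        b = len(y - y_hat)   # true labels missed by the prediction
        c = len(y_hat - y)   # labels predicted inappropriately
        totals["acc"] += a / (a + b + c) if a + b + c else 1.0
        totals["prec"] += a / (a + c) if a + c else 0.0
        totals["rec"] += a / (a + b) if a + b else 0.0
        totals["f1"] += 2 * a / (2 * a + b + c) if a + b + c else 1.0
        totals["hloss"] += (b + c) / n_classes
    m = len(true_sets)
    return {name: value / m for name, value in totals.items()}


# Two hypothetical test cells with Q = 4 classes
truth = [{"A", "U"}, {"F"}]
predicted = [{"A"}, {"F", "I"}]
print(multilabel_metrics(truth, predicted, n_classes=4))
```

For these two cells, Accuracy is 0.5, Precision and Recall are both 0.75, F1 is 2/3 and the Hamming loss is 0.25.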


3. Application

3.1. Textbook example

We illustrate the MLkNN method on a textbook example. We use a learning data-set L with 2074 cells that belong to classes/labels $\mathcal{Y} = \{\omega_1 = U, \omega_2 = I, \omega_3 = A, \omega_4 = F\}$. Each cell is associated with at least one of these labels. The frequency of the labels is given in Table 1. Let $x$ be a cell outside the learning data-set (called test cell). We want to predict its class by the MLkNN method. The number of nearest neighbours $k$ of the MLkNN method is set to 10. In Figure 3, we show the 10 neighbours (in the xy coordinates of cells) of the test cell $x$ with their corresponding label sets. The membership counting vector $c_x$ of the test cell $x$ is equal to $(7_U, 1_I, 10_A, 2_F)$, as shown in Figure 3. The label of cell $x$ is estimated from the following probabilities:

$$\begin{aligned}
\hat{y}_x(1) &= \arg\max_{b \in \{0,1\}} \Pr(H_b^1)\,\Pr(E_7^1 \mid H_b^1) \\
\hat{y}_x(2) &= \arg\max_{b \in \{0,1\}} \Pr(H_b^2)\,\Pr(E_1^2 \mid H_b^2) \\
\hat{y}_x(3) &= \arg\max_{b \in \{0,1\}} \Pr(H_b^3)\,\Pr(E_{10}^3 \mid H_b^3) \\
\hat{y}_x(4) &= \arg\max_{b \in \{0,1\}} \Pr(H_b^4)\,\Pr(E_2^4 \mid H_b^4)
\end{aligned} \tag{12}$$

Table 1. Summary of the learning data-set.

Land-use class    Frequency
F                 407
A                 520
AF                894
FI                2
AI                16
AFI               9
U                 3
FU                6
AU                138
AUF               58
IU                3
AIU               15
AFIU              3
Total             2074

Figure 3. Illustration of a test cell (x) with its 10 neighbours. The target of the test cell is marked by a distinct/grey colour.

Recall that $H_1^1$ is the event that $x$ belongs to label 1 ($\omega_1 = U$), and $E_7^1$ is the event that there are exactly seven cells having label ‘U’ in the neighbourhood of cell $x$. First, the prior probabilities are computed from the training set as follows:

$$\Pr(H_1^q) = \frac{\text{number of cells that belong to } \omega_q}{\text{total number of cells in the learning set}}$$

and

$$\Pr(H_0^q) = \frac{\text{number of cells that do not belong to } \omega_q}{\text{total number of cells in the learning set}}$$

which gives:

$\Pr(H_1^1) = 0.1090$, $\Pr(H_0^1) = 0.8910$
$\Pr(H_1^2) = 0.0231$, $\Pr(H_0^2) = 0.9769$
$\Pr(H_1^3) = 0.7970$, $\Pr(H_0^3) = 0.2030$
$\Pr(H_1^4) = 0.6649$, $\Pr(H_0^4) = 0.3351$
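These priors can be checked directly against the frequencies of Table 1. The short Python sketch below does so, assuming plain relative frequencies without any smoothing; the dictionary and function names are illustrative.

```python
# Frequencies of the label sets in the learning data-set (Table 1)
TABLE1 = {
    "F": 407, "A": 520, "AF": 894, "FI": 2, "AI": 16, "AFI": 9, "U": 3,
    "FU": 6, "AU": 138, "AUF": 58, "IU": 3, "AIU": 15, "AFIU": 3,
}


def class_priors(freq, classes="UIAF"):
    """Return {class: (Pr(H1), Pr(H0))} as relative frequencies (no smoothing assumed)."""
    total = sum(freq.values())
    priors = {}
    for c in classes:
        # a cell belongs to class c if the letter c occurs in its label set
        n_c = sum(n for label_set, n in freq.items() if c in label_set)
        priors[c] = (n_c / total, 1 - n_c / total)
    return priors


for c, (p1, p0) in class_priors(TABLE1).items():
    print(f"{c}: Pr(H1) = {p1:.4f}, Pr(H0) = {p0:.4f}")
```

With 226, 48, 1653 and 1379 of the 2074 cells carrying the labels U, I, A and F, respectively, the relative frequencies reproduce the four pairs of values listed above.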

Second, the posterior probabilities for the MLkNN method are calculated using the training set as follows:

$$\Pr(E_{c_x(q)}^q \mid H_1^q) = \frac{\text{number of cells having exactly } c_x(q) \text{ cells in } N_x^{10} \text{ that belong to } \omega_q}{\text{total number of cells in the learning set that belong to } \omega_q}$$

and

$$\Pr(E_{c_x(q)}^q \mid H_0^q) = \frac{\text{number of cells having exactly } c_x(q) \text{ cells in } N_x^{10} \text{ that belong to } \omega_q}{\text{total number of cells in the learning set that do not belong to } \omega_q}$$

which gives:

$\Pr(E_7^1 \mid H_1^1) = 0.1327$, $\Pr(E_7^1 \mid H_0^1) = 0.0076$
$\Pr(E_1^2 \mid H_1^2) = 0.3125$, $\Pr(E_1^2 \mid H_0^2) = 0.0281$
$\Pr(E_{10}^3 \mid H_1^3) = 0.6001$, $\Pr(E_{10}^3 \mid H_0^3) = 0$
$\Pr(E_2^4 \mid H_1^4) = 0.0015$, $\Pr(E_2^4 \mid H_0^4) = 0.1468$

Using the prior and the posterior probabilities, the category vectors associated with the test cell are calculated as follows (see Equation (12)):

$$\begin{aligned}
\hat{y}_x(1) &= \arg\max(0.1090 \times 0.1327 = 0.0144,\; 0.8910 \times 0.0076 = 0.0067) = 1 \\
\hat{y}_x(2) &= \arg\max(0.0231 \times 0.3125 = 0.0072,\; 0.9769 \times 0.0281 = 0.0274) = 0 \\
\hat{y}_x(3) &= \arg\max(0.7970 \times 0.6001 = 0.4782,\; 0.2030 \times 0 = 0) = 1 \\
\hat{y}_x(4) &= \arg\max(0.6649 \times 0.0015 = 0.0009,\; 0.3351 \times 0.1468 = 0.0491) = 0
\end{aligned}$$

The target class/label for the cell $x$ to classify is determined as follows:

$$H(x) = \{\omega_q \in \mathcal{Y} \mid \hat{y}_x(q) = 1\} \tag{13}$$

where label $\omega_q$ is associated with cell $x$ if $\hat{y}_x(q) = 1$. According to the results in Equation (13), the estimated label set for cell $x$ is ‘AU’, the same as the true label set.
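The decision step of this example can be replayed in a few lines of Python. The sketch below only applies the MAP comparison to the probabilities quoted in the text; it does not re-estimate them from data, and the variable and function names are illustrative.

```python
CLASSES = ["U", "I", "A", "F"]

# (Pr(H1), Pr(H0)) for each class, as quoted in the text
PRIOR = {"U": (0.1090, 0.8910), "I": (0.0231, 0.9769),
         "A": (0.7970, 0.2030), "F": (0.6649, 0.3351)}

# (Pr(E | H1), Pr(E | H0)) for the observed counts (7, 1, 10, 2), as quoted in the text
LIKELIHOOD = {"U": (0.1327, 0.0076), "I": (0.3125, 0.0281),
              "A": (0.6001, 0.0), "F": (0.0015, 0.1468)}


def map_label_set(prior, likelihood, classes=CLASSES):
    """Keep class q when Pr(H1) * Pr(E | H1) exceeds Pr(H0) * Pr(E | H0)."""
    selected = []
    for c in classes:
        score_present = prior[c][0] * likelihood[c][0]
        score_absent = prior[c][1] * likelihood[c][1]
        if score_present > score_absent:
            selected.append(c)
    return selected


print(map_label_set(PRIOR, LIKELIHOOD))  # ['U', 'A']
```

Only U and A pass the comparison, which corresponds to the label set ‘AU’ obtained in the text.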

3.2. Study area and data-set

We analyse land-use data based on aerial photographs of 1999 and 2007 (scale 1:10,000). The ‘Occupation Biophysique du sol’ (OBS) database is used as the land-cover data-set. These OBS data were produced by the Ministry of Environment in Luxembourg (Map, 1999 & 2007). The explanatory variables defined for cells are described in Table 2. The input variables are classified as physical, spatial and transport. They are commonly used in studying land use (Verburg et al. 2004, Yang et al. 2008). The outcome variable is the cell state, which is a single-label or ML class from a set of labels (see Figure 4 and Table 3).

The data-set for the Luxembourg area comprises 255,698 cells. As shown in Table 3, 1399 cells in 2007 have four labels each; their total area is nearly 14 km². The average number of labels per cell (called label cardinality) and its normalised version (cardinality divided by the number of labels Q, called label density) are equal to 1.54 and 0.38, respectively. Table 3 shows a binary coding for the ML classes with the corresponding number of examples. The classes forest (F) and agriculture (A) together represent 50.96% (22.40% + 28.56%) of the region in 2007. The class urban (U) represents 1.48%, and the class industrial (I) represents only 0.50% of the land. Nearly half of the cells have multiple labels. The ML categories are clearly useful.

Table 2. Explanatory variables.

Variable                   Description
State of cell              Multi-label class with 16 (2^4) different labels
Slope                      Slope value of cell (%)
Urban-neighbours           Number of urban cells in the MN3×3
Industrial-neighbours      Number of industrial cells in the MN3×3
Agriculture-neighbours     Number of agricultural cells in the MN3×3
Forest-neighbours          Number of forest cells in the MN3×3
Water-neighbours           Number of water cells in the MN3×3
Distance-border            Distance from the border of Luxembourg (m)
Transport-neighbours       Number of transport cells in the MN3×3
Distance-bus-station       Distance from cell to the closest bus station (m)
Distance-train-station     Distance to the closest train station (m)
Distance-highway           Distance to the nearest highway access point (m)
Number-bus-station         Number of bus stations within the distance of 2 km from cell
Number-train-station       Number of train stations within the distance of 2 km from cell

Note: The variables are grouped into three categories (physical, spatial and transport). MN3×3: 3 × 3 Moore neighbourhood.

Figure 4. Land-use map of Luxembourg in 1999 in vector format. Legend: land-use classes (urban, industrial, agriculture, forest), water, railways, major roads and the border of Luxembourg; scale bar 0 to 10 km.

Source: Biophysical data of land-use based on aerial photos, reference system: LUREF.

Table 3. Summary of land-use multi-label data.

| Land-use class | 1999 # | 1999 % | 2007 # | 2007 % |
|---|---|---|---|---|
| F | 56,730 | 22.186 | 57,297 | 22.408 |
| A | 75,320 | 29.456 | 73,025 | 28.559 |
| AF | 77,330 | 30.242 | 76,777 | 30.026 |
| I | 1305 | 0.510 | 1282 | 0.501 |
| FI | 1961 | 0.766 | 2141 | 0.837 |
| AI | 2558 | 1.000 | 3001 | 1.173 |
| AFI | 2277 | 0.890 | 2609 | 1.020 |
| U | 3933 | 1.538 | 3796 | 1.484 |
| FU | 3583 | 1.401 | 3862 | 1.510 |
| AU | 17,024 | 6.657 | 15,565 | 6.087 |
| AUF | 9342 | 3.653 | 9375 | 3.666 |
| IU | 973 | 0.380 | 1426 | 0.557 |
| FIU | 795 | 0.310 | 1046 | 0.409 |
| AIU | 1517 | 0.593 | 2943 | 1.151 |
| AFIU | 892 | 0.348 | 1399 | 0.547 |

Notes: The columns headed 1999 and 2007 give the observed label sets in those years. The output classes are agriculture (A), forest (F), industrial (I) and urban (U). The symbol # denotes the number of cells.
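The label cardinality and label density quoted in Section 3.2 can be checked against the 2007 counts of Table 3. The following is a minimal sketch, not the authors' code: the names `counts_2007` and `label_cardinality` are illustrative, and it assumes that the averages are taken over the cells listed in Table 3 (slightly fewer than the 255,698 cells of the full data-set).

```python
# Cell counts per observed label set in 2007, copied from Table 3.
counts_2007 = {
    "F": 57297, "A": 73025, "AF": 76777, "I": 1282, "FI": 2141,
    "AI": 3001, "AFI": 2609, "U": 3796, "FU": 3862, "AU": 15565,
    "AUF": 9375, "IU": 1426, "FIU": 1046, "AIU": 2943, "AFIU": 1399,
}
N_LABELS = 4  # |Q|: agriculture, forest, industrial, urban


def label_cardinality(counts):
    """Average number of labels per cell (each letter of a key is one label)."""
    total_cells = sum(counts.values())
    total_labels = sum(len(label_set) * n for label_set, n in counts.items())
    return total_labels / total_cells


cardinality = label_cardinality(counts_2007)
density = cardinality / N_LABELS  # cardinality normalised by |Q|

# Share of cells that carry more than one label.
multi_share = (
    sum(n for label_set, n in counts_2007.items() if len(label_set) > 1)
    / sum(counts_2007.values())
)

print(f"cardinality={cardinality:.2f}  density={density:.3f}  "
      f"multi-label share={multi_share:.3f}")
```

Under this assumption the cardinality rounds to the reported 1.54, and the density lies between 0.38 and 0.39.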

3.3. Model configuration and implementation

We split the data-set into 60% for training/learning (L) and 40% for testing (T) by stratified random sampling. The learning set L is used for model calibration, and the testing set T is used for assessing the performance. The model is fitted on L, and its performance is evaluated by comparing the fit with the observed values on T. To avoid the vagaries of sampling (the randomness of choosing one specific data split), we replicate the processes of splitting, fitting the model and evaluating its performance N times, obtaining N sets of the evaluation measures. Their means and standard deviations are the overall evaluations.

For the MLkNN calibration, the optimal or near-optimal number of neighbours (denoted by k) and value of the threshold (denoted by t) are determined by cross-validation using 100 replicates. The parameter k is varied over the integers 1, 2, ..., 20; we found k = 10 best suited. The threshold t is set to 1, which is the natural choice in the MLkNN method because Equation (4) compares the probabilities Pr(H_1^q) × Pr(E^q_{c_x(q)} | H_1^q) and Pr(H_0^q) × Pr(E^q_{c_x(q)} | H_0^q) by relating their ratio to 1. We use these parameter values (k = 10 and t = 1) throughout the application. The Euclidean metric is used to measure distances between instances. For the assessment of results, we use the evaluation measures introduced in Section 2.3.
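The decision rule described above can be sketched in a few lines of NumPy. This is an illustrative reading of the MLkNN procedure of Zhang and Zhou (2007), not the implementation used in the study: the function names (`knn_indices`, `mlknn_fit`, `mlknn_predict`), the Laplace smoothing constant `s = 1` and the toy data are all assumptions, and the brute-force distance computation is only suitable for small examples.

```python
import numpy as np


def knn_indices(X_train, X_query, k, exclude_self=False):
    """Indices of the k nearest training points (Euclidean) for each query."""
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:  # used when the queries are the training points themselves
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def mlknn_fit(X, Y, k=10, s=1.0):
    """Estimate priors Pr(H_1^q) and likelihoods Pr(E_j^q | H_b^q), b = 0, 1."""
    Y = np.asarray(Y, dtype=int)
    n, n_labels = Y.shape
    prior1 = (s + Y.sum(axis=0)) / (2 * s + n)
    neigh = knn_indices(X, X, k, exclude_self=True)
    counts = Y[neigh].sum(axis=1)  # neighbours carrying each label, per instance
    like1 = np.zeros((n_labels, k + 1))
    like0 = np.zeros((n_labels, k + 1))
    for q in range(n_labels):
        has = Y[:, q] == 1
        c1 = np.bincount(counts[has, q], minlength=k + 1)
        c0 = np.bincount(counts[~has, q], minlength=k + 1)
        like1[q] = (s + c1) / (s * (k + 1) + c1.sum())
        like0[q] = (s + c0) / (s * (k + 1) + c0.sum())
    return {"X": X, "Y": Y, "k": k,
            "prior1": prior1, "like1": like1, "like0": like0}


def mlknn_predict(model, X_new, t=1.0):
    """Assign label q when the ratio of the two posterior terms exceeds t."""
    neigh = knn_indices(model["X"], X_new, model["k"])
    counts = model["Y"][neigh].sum(axis=1)
    q_idx = np.arange(model["Y"].shape[1])
    num = model["prior1"] * model["like1"][q_idx, counts]
    den = (1 - model["prior1"]) * model["like0"][q_idx, counts]
    return (num / den > t).astype(int)


# Toy data (hypothetical): two well-separated clusters of cells.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(10, 0.3, (40, 2))])
Y = np.array([[1, 0]] * 40 + [[1, 1]] * 40)  # columns: (A, F), i.e. "A" vs "AF"
model = mlknn_fit(X, Y, k=10)
pred = mlknn_predict(model, np.array([[0.0, 0.0], [10.0, 10.0]]), t=1.0)
print(pred)
```

With t = 1 the rule reduces to choosing the larger of the two products, which is why that threshold is the natural default.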

4. Analysis and results

The results of the MLkNN method are reported in Table 4 as the means and standard deviations of the evaluation measures over N = 100 replications. The model achieves very good results, indicated by values of 0.9875, 0.8977, 0.8935, 0.8952 and 0.0795 for accuracy, precision, recall, F1 and Hamming loss, respectively (Table 4).

Table 4. Evaluation metrics: results (mean ± SD) on the land-use data-set by the MLkNN method.

| Accuracy | Precision | Recall | F1 | Hamming loss |
|---|---|---|---|---|
| 0.9875 ± 0.0012 | 0.8977 ± 0.0164 | 0.8935 ± 0.0219 | 0.8952 ± 0.0033 | 0.0795 ± 0.0014 |

Figure 5 shows a map of the Hamming loss, that is, the disagreement between the observed and predicted labels of each cell. The per-cell loss takes the values 0, 1/4, 2/4, 3/4 and 1 for 86.000%, 12.663%, 1.259%, 0.076% and 0.002% of the cells, respectively. The overall Hamming loss is low (0.0795 ± 0.0014). However, the loss is somewhat higher in some highly heterogeneous and strongly urbanised areas, mostly in the city of Luxembourg, Esch-sur-Alzette (the second municipality) and Differdange (the third municipality of the country), where a lot of construction and other development took place in the period 1999–2007.

Figure 5. Map of Hamming loss. Legend: Hamming loss of 0, 1/4, 2/4, 3/4 and 1.

Table 5 presents the 15 × 15 confusion matrix for ML classification. The rows and columns of this matrix are the labels for the ML classification (e.g. F, A, AF, I, FI, AI, AFI), and the entries are the numbers of cells. The numbers along the main diagonal of this matrix represent the correct predictions, and the numbers off this diagonal represent the errors in prediction. The confusion matrix for ML classification has to be summarised

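Table 5. Confusion matrix (rows: predicted label set in 2007; columns: observed label set in 2007).

| Predicted \ Observed | F | A | AF | I | FI | AI | AFI | U | FU | AU | AUF | IU | FIU | AIU | AFIU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 722 | 41 | 194 | 23 | 42 | 0 | 21 | 36 | 157 | 4 | 18 | 3 | 8 | 0 | 0 |
| A | 15 | 312 | 140 | 6 | 1 | 36 | 7 | 19 | 3 | 78 | 29 | 1 | 0 | 8 | 0 |
| AF | 61 | 88 | 375 | 3 | 13 | 16 | 76 | 2 | 12 | 15 | 55 | 0 | 0 | 1 | 0 |
| I | 104 | 48 | 8 | 4777 | 1069 | 676 | 79 | 14 | 4 | 4 | 0 | 336 | 80 | 62 | 0 |
| FI | 1624 | 43 | 151 | 2700 | 13,141 | 184 | 813 | 10 | 82 | 0 | 1 | 91 | 443 | 0 | 0 |
| AI | 8 | 191 | 47 | 269 | 31 | 1958 | 195 | 2 | 0 | 11 | 2 | 44 | 2 | 235 | 0 |
| AFI | 62 | 58 | 376 | 76 | 400 | 370 | 2579 | 2 | 1 | 1 | 4 | 1 | 1 | 1 | 6 |
| U | 44 | 39 | 4 | 6 | 4 | 1 | 1 | 1226 | 388 | 345 | 71 | 107 | 21 | 31 | 1 |
| FU | 134 | 3 | 15 | 0 | 7 | 0 | 2 | 193 | 1131 | 45 | 87 | 22 | 52 | 3 | 0 |
| AU | 8 | 108 | 28 | 2 | 1 | 10 | 1 | 127 | 25 | 1216 | 300 | 21 | 2 | 106 | 5 |
| AUF | 3 | 11 | 46 | 0 | 0 | 0 | 5 | 18 | 38 | 132 | 690 | 1 | 4 | 0 | 1 |
| IU | 7 | 4 | 0 | 1306 | 114 | 72 | 1 | 736 | 104 | 54 | 7 | 71,285 | 6297 | 2827 | 1 |
| FIU | 51 | 0 | 1 | 133 | 733 | 0 | 1 | 85 | 1028 | 0 | 8 | 3012 | 66,099 | 4 | 1 |
| AIU | 1 | 13 | 0 | 55 | 0 | 525 | 3 | 19 | 0 | 215 | 3 | 1837 | 11 | 53,978 | 31 |
| AFIU | 0 | 0 | 0 | 1 | 3 | 4 | 9 | 0 | 0 | 0 | 3 | 5 | 4 | 40 | 108 |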

Table 6. Results: evaluation measures.

| Class | TP | FP | FN | TN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| Urban | 35,643 | 2075 | 3769 | 214,211 | 0.94 | 0.90 | 0.92 |
| Industrial | 9965 | 732 | 5882 | 239,119 | 0.93 | 0.63 | 0.75 |
| Agriculture | 181,353 | 6143 | 3341 | 64,861 | 0.97 | 0.98 | 0.97 |
| Forest | 146,520 | 9490 | 7986 | 91,702 | 0.94 | 0.95 | 0.94 |
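The per-class measures in Table 6 follow directly from its TP, FP and FN counts. The short sketch below recomputes them with the standard definitions of precision, recall and F1; the dictionary `table6` and the helper `precision_recall_f1` are illustrative names, not part of the original study.

```python
# (TP, FP, FN, TN) per land-use class, copied from Table 6.
table6 = {
    "Urban":       (35643, 2075, 3769, 214211),
    "Industrial":  (9965, 732, 5882, 239119),
    "Agriculture": (181353, 6143, 3341, 64861),
    "Forest":      (146520, 9490, 7986, 91702),
}


def precision_recall_f1(tp, fp, fn):
    """Standard binary measures for one label."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of observed positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


scores = {
    name: precision_recall_f1(tp, fp, fn)
    for name, (tp, fp, fn, tn) in table6.items()
}

for name, (p, r, f1) in scores.items():
    print(f"{name:<12} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Each row of Table 6 sums to 255,698, the number of cells in the data-set, which is a quick consistency check on the transcribed counts.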

(condensed). The 4 × 4 matrix for a unilabel classification can be presented as the final assessment because it is easy to digest, especially if the counts are converted to percentages. Table 6 reports the evaluation measures by land-use class, computed from the confusion matrix. Almost all the evaluation measures are satisfactory: the precision, recall and F1 measures of the four classes are at least 0.90, except for the recall (0.63) and F1 (0.75) of the industrial class. This is due to the complexity of modelling this land-use class, which is not easy to dissociate from the other land-use classes. Generally, the results are influenced by the data structure and class separability, as well as by the limited spatial representation of some classes (in this case study, only 0.5% of cells are industrial). These results show that the ML concept is useful for studying and simulating mixed land use. Nevertheless, they should be treated with caution as they are based on a single case study. The final judgement of this approach also depends on a comparison with other, more frequently used methods (such as those in the Dinamica software or in the CLUE model) to show the additional value of the proposed ML modelling framework.

5. Discussion

In this work, we extend land-use modelling to cases where multiple classes occur within one cell. This approach is illustrated with a case study for Luxembourg. We did not consider any domination or preference among labels: the labels are treated symmetrically, so the land-use classes are not ordered. The issue of order could be considered in future research. When studying land-use change, scale and spatial resolution should be carefully considered with mono-label or ML concepts (Kok et al. 2001, Verburg et al. 2004, 2008). For high-resolution modelling applications, such as those at the ‘Landsat scale’ (e.g. 30 m), a single label may be appropriate. As the resolution gets coarser, individual grid cells are more likely to span several land-cover classes.
The spatial resolution depends on the size of the study area and the resolution of the available data (Verburg et al. 2008). In the current application for Luxembourg, all simulations are performed with a resolution of 100 × 100 m (cell size of 1 ha). This resolution is commonly used at a regional or national scale (White and Engelen 2000). Such a resolution is appropriate to capture variability that can be partly lost at lower resolutions (Hijmans et al. 2005).

The properties of classification algorithms are studied mainly by simulations. The data-set is split into learning (calibration) and test (assessment) sets. The learning set is used to build the decision/classification function, and the test set is used for comparing the classification predicted by the model to the observed classification. The learning set is selected by sampling. The sampling design has to have an element of clustering so that the extent of contiguity and other spatial (neighbourhood) features can be inferred. Stratification may also be applied. In this article, we split the data-set into 60% for training/learning (L) and 40% for testing (T) by stratified random sampling. We used stratified random sampling because it is more appropriate than simple random sampling, especially with an unbalanced class distribution. The model is fitted on L, and its performance is evaluated by comparing the fit with the observed values on T using popular evaluation measures (such as accuracy and precision).

The results demonstrate the performance of the MLkNN method for studying mixed land-use change: the method mimics the land-use changes well. Nevertheless, the parameters of MLkNN have to be set with care. Other ML machine learning methods should be considered (such as methods based on SVM or ANN). In several applications, the MLkNN method shows very strong performance and even outperforms some other ML machine learning methods (Zhang and Zhou 2007). This result should be confirmed with more numerical experiments on land-use data to establish the generality of these conclusions. The applied MLkNN method can be described as a weighted average of the labels in the learning set, in which neighbours (cells that are close) have weight 1 and non-neighbours (cells more distant from the target) have weight 0. A method in which the weight is a smooth decreasing function of the distance from the target is bound to be superior. We believe that nonparametric regression and kernel smoothing would offer an obvious improvement over the nearest neighbour method.

The principal tool for evaluating a classification procedure is the confusion matrix. In our study, there are four land-use classes and the final assessment is presented in the form of a 4 × 4 confusion matrix. It is easy to digest, especially if the counts are converted to percentages. The 15 × 15 confusion matrix for ML classification has to be summarised (condensed). A general way to do this is by defining a score (unit penalty) for each cell of the matrix and adding up the scores multiplied by the counts (or percentages) over all the cells, or over disjoint sets of cells. The score should reflect the severity of the error. Of course, the score for a diagonal entry (correct assessment) is set to 0. The construction of the summarised confusion matrix from ML classification needs further investigation with a suitable penalty score function.
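The penalty-score summary described above can be illustrated as follows. This sketch picks one possible penalty, the share of the four labels on which the observed and predicted label sets disagree; this choice is an assumption made for illustration, since the text leaves the penalty function open. The names `hamming_penalty` and `penalty_score` and the counts in `toy_confusion` are hypothetical.

```python
LABELS = "AFIU"  # agriculture, forest, industrial, urban


def hamming_penalty(observed, predicted):
    """Share of the four labels on which two label sets disagree (0 on the diagonal)."""
    return len(set(observed) ^ set(predicted)) / len(LABELS)


def penalty_score(confusion, penalty=hamming_penalty):
    """Count-weighted average penalty over all cells of a confusion matrix.

    `confusion` maps (observed label set, predicted label set) to a count.
    """
    total = sum(confusion.values())
    weighted = sum(penalty(obs, pred) * n for (obs, pred), n in confusion.items())
    return weighted / total


# Hypothetical counts for a fragment of a multi-label confusion matrix.
toy_confusion = {
    ("A", "A"): 50,    # correct, penalty 0
    ("A", "AF"): 10,   # one spurious label, penalty 1/4
    ("AF", "AF"): 30,  # correct, penalty 0
    ("AF", "U"): 10,   # three labels wrong, penalty 3/4
}

score = penalty_score(toy_confusion)
print(f"summary score = {score:.3f}")
```

With this particular penalty the summary coincides with the Hamming loss; a different `penalty` function (for example, one that penalises a missed urban label more heavily) can be passed in without changing the rest of the code.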

6. Conclusion and future work

Several modelling approaches have been applied in the prediction of land-use change. Conventional methods predict changes in land use with single-label cells. However, cells are often associated with a set of disjoint labels. The single-label outcome is partial and incomplete. Multiple labels are more appropriate for outcomes and, unlike single labels, do not rule out evolution. In this article, we applied an ML learning method, based on the MLkNN method, for modelling land-use change. The method is particularly useful when dealing with a high level of inter-mingling of the labels. When applied to the Luxembourg case study, the MLkNN method gave very good results according to the well-established accuracy, precision, recall and F-measures.

There are some points that need further investigation. First, it is important to compare the proposed method to recent land-use methods to confirm the merit of the ML learning method for mixed land-use modelling presented in this article. Second, classes in ML learning are usually correlated, and a key challenge is to exploit correlation among different classes. The next challenge is to adapt the methods to much more extensive and detailed data.

References

Basse, R.M., et al., 2014. Land use changes modelling using advanced methods: cellular automata and artificial neural networks. The spatial and explicit representation of land cover dynamics at the cross-border region scale. Applied Geography, 53, 160–171. doi:10.1016/j.apgeog.2014.06.016


Batty, M., 2011. Modeling and simulation in geographic information science: integrated models and grand challenges. Procedia-Social and Behavioral Sciences, 21, 10–17. doi:10.1016/j.sbspro.2011.07.003
Bernardo, J.M. and Smith, A.F., 2009. Bayesian theory. Vol. 405. Chichester: John Wiley & Sons.
Boutell, M.R., et al., 2004. Learning multi-label scene classification. Pattern Recognition, 37, 1757–1771. doi:10.1016/j.patcog.2004.03.009
Charif, O., et al., 2012. Cellular automata model based on machine learning methods for simulating land use change. In: O. Rose, ed. Proceedings of the 2012 Winter Simulation Conference (WSC), 9–12 December, Berlin. IEEE, 1–12.
Couclelis, H., 2005. “Where has the future gone?” Rethinking the role of integrated land-use models in spatial planning. Environment and Planning A, 37, 1353–1371. doi:10.1068/a3785
Eastman, J.R., Solórzano, L.A., and Fossen, M.E.V., 2005. Transition potential modeling for land-cover change. In: D. Maguire, M. Batty, and M. Goodchild, eds. GIS, spatial analysis and modeling. Redlands, CA: ESRI Press, 357–385.
Fan, R.E. and Lin, C.J., 2007. A study on threshold selection for multi-label classification. Department of Computer Science, National Taiwan University, 1–23.
Ferrier, S. and Guisan, A., 2006. Spatial modelling of biodiversity at the community level. Journal of Applied Ecology, 43, 393–404. doi:10.1111/j.1365-2664.2006.01149.x
Grekousis, G., Manetos, P., and Photis, Y.N., 2013. Modeling urban evolution using neural networks, fuzzy logic and GIS: the case of the Athens metropolitan area. Cities, 30, 193–203. doi:10.1016/j.cities.2012.03.006
Hijmans, R.J., et al., 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978. doi:10.1002/joc.1276
Huang, B., Xie, C., and Tay, R., 2010. Support vector machines for urban growth modeling. Geoinformatica, 14, 83–99. doi:10.1007/s10707-009-0077-4
Jiang, X., Lin, M., and Zhao, J., 2011. Woodland cover change assessment using decision trees, support vector machines and artificial neural networks classification algorithms. In: Proceedings of the International Conference on Intelligent Computation Technology and Automation (ICICTA), Vol. 2, 28–29 March, Shenzhen. IEEE, 312–315.
Jones, J., Miller, J., and White, M., 2011. Multi-label classification for multi-species distribution modeling. In: L. Getoor and T. Scheffer, eds. Proceedings of the 28th international conference on machine learning, New York, NY, 1–4.
Kok, K., et al., 2001. A method and application of multi-scale validation in spatial land use models. Agriculture, Ecosystems & Environment, 85, 223–238. doi:10.1016/S0167-8809(01)00186-4
Lin, H., et al., 2005. Modeling urban sprawl and land use change in a coastal area – a neural network approach. In: Proceedings of the 2005 annual meeting, 24–27 July, Providence, RI.
Mas, J.F., et al., 2004. Modelling deforestation using GIS and artificial neural networks. Environmental Modelling & Software, 19, 461–471. doi:10.1016/S1364-8152(03)00161-0
Niu, X., Shao, C., and Wang, X., 2006. A survey of digital vector map watermarking. International Journal of Innovative Computing, Information and Control, 2, 1301–1316.
Omrani, H., et al., 2012. Simulation of land use changes using cellular automata and artificial neural network. Technical report, CEPS/INSTEAD Working Paper.
Pijanowski, B., et al., 2002. Using neural networks and GIS to forecast land use changes: a land transformation model. Computers, Environment and Urban Systems, 26, 553–575. doi:10.1016/S0198-9715(01)00015-1
Pijanowski, B.C., et al., 2014. A big data urban growth simulation at a national scale: configuring the GIS and neural network based Land Transformation Model to run in a High Performance Computing (HPC) environment. Environmental Modelling & Software, 51, 250–268. doi:10.1016/j.envsoft.2013.09.015
Pontius Jr., R.G. and Cheuk, M.L., 2006. A generalized cross-tabulation matrix to compare soft-classified maps at multiple resolutions. International Journal of Geographical Information Science, 20, 1–30. doi:10.1080/13658810500391024
Pontius Jr., R.G., Cornell, J.D., and Hall, C.A., 2001. Modeling the spatial pattern of land-use change with GEOMOD2: application and validation for Costa Rica. Agriculture, Ecosystems & Environment, 85, 191–203. doi:10.1016/S0167-8809(01)00183-9
Sangermano, F., Eastman, J.R., and Zhu, H., 2010. Similarity weighted instance-based learning for the generation of transition potentials in land use change modeling. Transactions in GIS, 14, 569–580. doi:10.1111/j.1467-9671.2010.01226.x


Spyromitros, E., Tsoumakas, G., and Vlahavas, I., 2008. An empirical study of lazy multilabel classification algorithms. In: J. Darzentas et al., eds. Proceedings of the artificial intelligence: theories, models and applications, 2–4 October, Syros. Berlin: Springer, 401–406.
Tayyebi, A., et al., 2013. Hierarchical modeling of urban growth across the conterminous USA: developing meso-scale quantity drivers for the Land Transformation Model. Journal of Land Use Science, 8, 422–442. doi:10.1080/1747423X.2012.675364
Tayyebi, A. and Pijanowski, B.C., 2014. Modeling multiple land use changes using ANN, CART and MARS: comparing tradeoffs in goodness of fit and explanatory power of data mining tools. International Journal of Applied Earth Observation and Geoinformation, 28, 102–116. doi:10.1016/j.jag.2013.11.008
Tayyebi, A., et al., 2014. Comparing three global parametric and local non-parametric models to simulate land use change in diverse areas of the world. Environmental Modelling & Software, 59, 202–221. doi:10.1016/j.envsoft.2014.05.022
Tayyebi, A., Pijanowski, B.C., and Tayyebi, A.H., 2011. An urban growth boundary model using neural networks, GIS and radial parameterization: an application to Tehran, Iran. Landscape and Urban Planning, 100, 35–44. doi:10.1016/j.landurbplan.2010.10.007
Tomas, C., et al., 2012. Development of an activity-based cellular automata land-use model: the case of Flanders, Belgium. In: R. Seppelt et al., eds. Proceedings of the international Environmental Modelling and Software Society (iEMSs), 1–5 July, Leipzig, 2000–2007.
Tsoumakas, G. and Katakis, I., 2007. Multi-label classification: an overview. International Journal of Data Warehousing and Mining (IJDWM), 3, 1–13. doi:10.4018/jdwm.2007070101
Verburg, P.H., Eickhout, B., and Van Meijl, H., 2008. A multi-scale, multi-model approach for analyzing the future dynamics of European land use. The Annals of Regional Science, 42, 57–77. doi:10.1007/s00168-007-0136-4
Verburg, P.H., et al., 2004. Determinants of land-use change patterns in the Netherlands. Environment and Planning B, 31, 125–150. doi:10.1068/b307
White, R. and Engelen, G., 2000. High-resolution integrated modelling of the spatial dynamics of urban and regional systems. Computers, Environment and Urban Systems, 24, 383–400. doi:10.1016/S0198-9715(00)00012-0
White, R., Uljee, I., and Engelen, G., 2012. Integrated modelling of population, employment and land-use change with a multiple activity-based variable grid cellular automaton. International Journal of Geographical Information Science, 26, 1251–1280. doi:10.1080/13658816.2011.635146
Wu, X., et al., 2008. Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1–37. doi:10.1007/s10115-007-0114-2
Yang, Q., Li, X., and Shi, X., 2008. Cellular automata for simulating land use changes based on support vector machines. Computers & Geosciences, 34, 592–602. doi:10.1016/j.cageo.2007.08.003
Yang, Q., et al., 2012. Multi-label classification models for sustainable flood retention basins. Environmental Modelling & Software, 32, 27–36. doi:10.1016/j.envsoft.2012.01.001
Yang, Y., 1999. An evaluation of statistical approaches to text categorization. Information Retrieval, 1, 69–90. doi:10.1023/A:1009982220290
Zhang, M.L. and Zhang, K., 2010. Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, 999–1008.
Zhang, M.L. and Zhou, Z.H., 2007. ML-KNN: a lazy learning approach to multi-label learning. Pattern Recognition, 40, 2038–2048. doi:10.1016/j.patcog.2006.12.019
