IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 3, MARCH 2015
Spatial-Contextual Supervised Classifiers Explored: A Challenging Example of Lithostratigraphy Classification

Matthew J. Cracknell and Anya M. Reading

Abstract—Spatial-contextual classifiers exploit characteristics of spatially referenced data and account for random noise that contributes to spatially inconsistent classifications. In contrast, standard global classifiers treat inputs as statistically independent and identically distributed. Spatial-contextual classifiers have the potential to improve visualization, analysis, and interpretation: fundamental requirements for the subsequent use of classifications representing spatially varying phenomena. We evaluate random forests (RF) and support vector machine (SVM) spatial-contextual classifiers with respect to a challenging lithostratigraphy classification problem. Spatial-contextual classifiers are divided into three categories aligned with the supervised classification work flow: 1) data preprocessing—transformation of input variables using focal operators; 2) classifier training—using proximal training samples to train multiple localized classifiers; and 3) postregularization (PR)—reclassification of outputs. We introduce new variants of spatial-contextual classifier that employ self-organizing maps to segment the spatial domain. Segments are used to train multiple localized classifiers from k neighboring training instances and to represent spatial structures that assist PR. Our experimental results, reported as mean (n = 10) overall accuracy ±95% confidence intervals, indicate that focal operators (RF 0.754 ±0.010, SVM 0.683 ±0.010) and PR majority filters (RF 0.705 ±0.010, SVM 0.607 ±0.010 for 11 × 11 neighborhoods) generate significantly more accurate classifications than standard global classifiers (RF 0.625 ±0.011, SVM 0.581 ±0.011). Thin and discontinuous lithostratigraphic units were best resolved using non-preprocessed variables and segmentation coupled with postregularized RF classifications (0.652 ±0.011). These methods may be used to improve the accuracy of classifications across a wide variety of spatial modeling applications.

Index Terms—Decision trees, geology, geophysics, spatial filters, supervised learning, support vector machines (SVMs).
Manuscript received May 05, 2014; revised October 27, 2014; accepted December 05, 2014. Date of publication January 22, 2015; date of current version March 27, 2015. This work was conducted at the Australian Research Council (ARC) Centre of Excellence in Ore Deposits (CODES) and School of Physical Sciences (Earth Sciences), University of Tasmania, under Project P3A3A. The work of M. J. Cracknell was supported by the University of Tasmania Elite Ph.D. Scholarship. The authors are with the School of Physical Sciences (Earth Sciences) and ARC Centre of Excellence in Ore Deposits (CODES), University of Tasmania, Hobart, Tasmania 7001, Australia (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSTARS.2014.2382760

I. INTRODUCTION

SPATIAL-CONTEXTUAL supervised classifiers constitute an emerging field of research in geological remote sensing applications [1]. Unlike standard global classifiers, which
treat spatially distributed samples (pixels) as statistically independent and identically distributed [2], [3], spatial-contextual classifiers exploit the spatial heterogeneity and spatial dependencies commonly encountered in spatial processes [4]–[7]. Spatial heterogeneity refers to a nonstationary spatial process that exhibits contrasting local statistical properties across a given domain or region. Adequately characterizing nonstationary processes therefore requires different model parameters at different locations [3], [8]–[11]. Spatial dependency, first described by Tobler [12], refers to the tendency of spatially distributed variables to display autocorrelation, i.e., measurements that are close together in space are more likely to be similar than those farther apart [3], [10], [11], [13].

Lithostratigraphic units share common lithological characteristics and are the fundamental classes for geological mapping. Lithostratigraphic units also indicate specific intervals of geologic time, although the concept of time has little relevance in classifying these units and their boundaries [14]. In this study, the classification target represents generalized lithostratigraphic units with similar lithological attributes, i.e., sedimentary materials, igneous mineralogy, metamorphic grade, and units with comparable stratigraphic positions. While useful for interpreting tectonic histories and prospecting for mineralization, these classes encompass subtly different lithologies while also displaying significant lithological similarities to other classes. The motivation for geological mapping using remote sensing data is to complement subjective geological interpretations used to construct geological maps. This is especially true in environments where rugged terrain, dense vegetation, and a lack of exposed bedrock limit the number of observations available to constrain interpretations.
The characteristics of lithostratigraphic units, however, present a significant challenge to global supervised classifiers. In practice, global classifiers often generate spatially distributed classifications that contain a large amount of high frequency noise or “speckle.” This noise is a result of a high degree of intra-class variability and low interclass separability [15], [16], and the inclusion of irrelevant or erroneous data [17], [18]. It has a detrimental effect on supervised classifier outputs and inhibits visualization, analysis, and interpretation in the spatial domain. The majority of recent research into remote sensing classification applications has investigated and tested spatial-contextual supervised classifiers for the prediction of land cover and geomorphic features, e.g., [5]–[7], [15], [19]–[25]. These methods are reviewed and divided into three general categories
1939-1404 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
that provide spatial context to supervised classifiers based on the stages of the classification process: 1) data preprocessing using focal operators; 2) the generation of multiple classifiers from subsets of training data; and 3) postprocessing [postregularization (PR)] methods that reclassify outputs to represent spatially homogeneous regions.
A. Preprocessing Methods

Preprocessing input variables prior to classification model training approaches the problem of characterizing spatial context using focal operators to derive new values representing similarity (or dissimilarity) within local neighborhoods [4], [6], [11], [24]. Focal operators generate first- or second-order spatial statistics, which are a function of the values of pixels within a given neighborhood [11], [23], [26], [27]. First-order spatial statistics represent properties calculated from neighboring pixel intensity histograms, e.g., mean, variance, etc. Second-order spatial statistics, in contrast, assess the relationship between pairs of neighboring pixel values, e.g., textural properties estimated using gray-level co-occurrence matrices (GLCM) [28], [11], [23], [27]. Focal operator first-order statistics have been used to provide some notion of spatial context for the supervised classification of land cover, vegetation, and occasionally lithology from remote sensing data, e.g., [5], [15]–[17], [20], [23], [26], [27], [29]. Ghimire et al. [15] incorporated measures of spatial autocorrelation via the Getis statistic [30], [31] as input data for the prediction of vegetation classes using random forests (RF). The local Getis statistic G* is essentially a standardized local convolution filter that combines mean global values with a low-pass (mean) focal operator. Image texture can be thought of as a measure of the variability or regularity within a local region [32]. There have been several studies assessing the use of textural data for remote sensing classification. Zortea et al. [20] randomly selected GLCM texture derivatives with different neighborhood sizes to train multiple support vector machine (SVM) classifiers. In this example, the contributions of trained SVM models were weighted to generate predictions. Grebby et al. [16] and Ricchetti [17] integrated multispectral reflectance imagery with topographic derivatives to classify lithology. The inclusion of topographic derivatives increased classification accuracies when compared to those resulting solely from classifiers trained on multispectral imagery. Shankar [27] employed first- and second-order spatial statistics to derive textural information from airborne magnetics, whereas Li and Narayanan [5] derived textural derivatives using Gabor wavelet filters.

The separation of variable space into clusters with similar characteristics is carried out during preprocessing but does not strictly exploit spatial-contextual information. However, in the context of image segmentation, the resulting segments often represent spatially contiguous regions with homogeneous spectral characteristics. Spatial-contextual information indicating spatial structure can be extracted from these segments [4], [6]. The simplest method for segmenting an image is to divide the spatial domain into quadrants of equal area [11]. Alternatively, multiple regions of arbitrary size and shape can be derived using unsupervised clustering techniques [18], [21]. This is akin to the methods used in object-oriented classification, although object-oriented methods often require the provision of complex user-defined rules to segment image space [32]–[34].

B. Local Training Data Selection
A local approach to supervised classification that utilizes the k nearest neighbors of a given test sample to train a linear learning algorithm was first proposed by Bottou and Vapnik [36]. This method did not change the architecture of the learning algorithm, only the training procedure. Nonetheless, significant gains in classifier accuracy for handwritten digit recognition were obtained when compared to a standard k-nearest neighbors (kNN) classifier. More recently, Refs. [21], [37], and [38] proposed a kNN hybrid classifier using principles similar to those developed by Bottou and Vapnik [36]. The kNN hybrid classifier identifies the k training samples that neighbor, in variable space, the sample requiring classification. Instead of using a majority vote based on k class labels, the decision is made by an SVM classifier (kNN-SVM) trained on the selected k training samples [21], [25], [37]. The kNN hybrid method is computationally expensive, as the number of models required to generate predictions is equal to the number of test samples. A reduction in processing time can be obtained using variations proposed by Refs. [39] and [40]. For each training sample, the k nearest training samples in variable space are identified and used to train a localized classification model. Classifications are obtained using the local classification model centered on the closest training sample in variable space to the sample requiring classification.

C. Postregularization

Spatial-contextual postprocessing involves the PR of classifications based on the labels of neighboring samples [6]. PR encompasses methods that result in relatively homogeneous regions containing a single class. These approaches are commonly encountered in geological remote sensing classification applications, e.g., [16]–[18], [41]. This is because geological units often represent contiguous regions that are larger than the scale of a single sample (pixel).
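As an illustrative sketch (not the authors' implementation) of the kNN-SVM hybrid described in Section I-B, assuming scikit-learn is available; the function name `knn_svm_predict` and all parameter values are hypothetical:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def knn_svm_predict(X_train, y_train, X_test, k=5):
    """For each test sample, train a local SVM on its k nearest
    training samples in variable space and classify with it."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, neighbors in enumerate(idx):
        local_y = y_train[neighbors]
        if np.unique(local_y).size == 1:
            # all k neighbors agree: no local SVM needed
            preds[i] = local_y[0]
        else:
            local_svm = SVC(kernel="rbf").fit(X_train[neighbors], local_y)
            preds[i] = local_svm.predict(X_test[i:i + 1])[0]
    return preds
```

Because a model is trained per test sample, the cost grows with the number of predictions, which is exactly the expense the fast variants of Refs. [39] and [40] (one local model per training sample) are designed to reduce.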
Therefore, it is unlikely that a classified sample will differ markedly from its immediate neighbors unless there is a boundary between classes. Isolated classifications representing different classes are usually a response to high frequency noise present within input variables [17], [18]. The most commonly used PR method for remote sensing classification is the majority filter [6], [15], [24]. Majority filters assign the most abundant class to the center pixel of a local search neighborhood. Alternatively, Tarabalka et al. [6] outline a PR method that combines image segmentation with majority filters. This approach identifies the majority class within a given segment and assigns this class to the region covered by the segment in question.

D. Experiment Design

Machine learning algorithms, such as RF and SVM, offer practitioners the opportunity to integrate large disparate
Fig. 1. Study region location and generalized lithostratigraphic units, modified from Mineral Resources Tasmania [51]. See Table I for class descriptions.
datasets for supervised classification applications [42]–[45]. Recently, these classifiers have seen increasing use in multiclass lithological classification problems using remotely sensed geophysical data, e.g., [46]–[50]. These studies have demonstrated the efficacy of RF and SVM in classifying lithologies in remote or inaccessible terrain from limited training samples. However, classifier training and testing are usually carried out using a single global classification model [6], [20]. In this contribution, we compare classifications of lithostratigraphic units generated by RF and SVM supervised global classifiers to those generated by RF and SVM classifiers modified to exploit spatial-contextual information. Several elements of this experiment, such as limited training data, mixed class types, and interference from dense cover (e.g., vegetation and/or unconsolidated sediments), present a challenge to accurately classifying lithostratigraphic units from remote sensing data. Our hypothesis states that using spatial-contextual methods will increase the accuracy and interpretability of classifications representing lithostratigraphic units compared to those obtained using global methods. However, it is unclear which spatial-contextual classifiers, or combinations thereof, are optimal for this task. In addition, we introduce a new and efficient variant of kNN hybrid classifiers that utilizes segmentation to constrain training sample search regions. Furthermore, we assess whether there are differences in the abilities of RF and SVM to exploit spatial-contextual relationships.

II. DATA

This section summarizes the target classes representing lithostratigraphic units, followed by a description of input variables and specific data preprocessing steps.

A. Lithostratigraphic Units

The study region covers ∼1000 km² of heavily forested and mountainous terrain (west Tasmania; Fig. 1). This region was selected as it presents several challenges to lithostratigraphy
classification, namely: poor exposures of geological materials due to vegetation and/or unconsolidated sediments; and a high degree of intra-class variability and inter-class similarity. The lithostratigraphic units used to train and test spatial-contextual classifiers were obtained from the Tasmania 1:25 000 digital geological map available from Mineral Resources Tasmania [51]. This geological map has been compiled from geological reports; 1:25 000, 1:50 000, and 1:63 360 scale mapping; new field mapping; and interpretation of aerial photography and airborne geophysical data. We acknowledge that manually interpreted geological maps contain uncertainties that are rarely reported as estimates of accuracy. Hence, it is difficult to provide meaningful quantified measures of the accuracy of the reference map that we have used to obtain samples. Despite this, the area chosen for our experiment has been mapped in detail and can be regarded as a reliable estimate of the spatial distribution of lithostratigraphic units at the Earth’s surface. Geological classes were rasterized to 100 m cells using the polygon covering the cell center. The original lithostratigraphic units were generalized to 11 classes prior to classification based on common lithologies and overall stratigraphic relationships (Table I). Lithostratigraphic units range in age from Mesoproterozoic to Quaternary, exhibit variable metamorphic grades, and comprise a range of geological materials. Older units are, in places, obscured by Quaternary sediments of variable thickness. Minor Mesozoic clastic sedimentary and mafic intrusive rocks and Cenozoic mafic volcanic and terrestrial sedimentary rocks occur within the study region. These units, along with water bodies, were masked due to their limited coverage.

B. Geophysical Data

In this study, airborne geophysical data [digital elevation model (DEM), total magnetic intensity (TMI), and airborne gamma-ray spectrometer (GRS) data available from Mineral Resources Tasmania at http://www.mrt.tas.gov.au] and cloud free Landsat ETM+ imagery (scene LE70910892000328 EDC00 available from the United States Geological Survey at
TABLE I
SUMMARY OF LITHOSTRATIGRAPHIC CLASSES
Classes are based on generalized lithostratigraphic units sourced from the 1:25 000 scale geological map of Tasmania [51]. Class abbreviations represent geochronological differences between lithostratigraphic units.
http://eros.usgs.gov) were used as input variables. These data were transformed to a common coordinate system (GDA94 zone 55) and resampled to 100 m resolution. TMI data were downward continued to the ground surface to enhance shallow magnetic anomalies [52]. Downward continued TMI data were reduced-to-pole (RTP) using ERDAS ERMapper 7.2 with dec. = 13.1 and inc. = −72.0 parameters based on the Australian Geomagnetic Reference Field (Geoscience Australia: http://www.ga.gov.au/oracle/geomag/agrfform.jsp). RTP transforms the dipolar TMI anomalies as if the geomagnetic field were vertically oriented [52]. The regional gradient observed in the RTP data was removed by subtracting a linear trend surface. GRS data (i.e., K, eTh, and eU) were corrected for negative values. In addition, the natural logarithm of K/eTh was calculated and included as input. Landsat ETM+ band ratios were calculated corresponding to those described in Cracknell and Reading [50]. Landsat ETM+ variables with Pearson’s correlation coefficients >0.85 with a large number of other Landsat ETM+ variables were removed. The preprocessing steps described above resulted in 11 variables for training and testing (i.e., DEM, RTP, K, eTh, eU, K/eTh, Landsat 1, Landsat 4, Landsat 6, Landsat 3/5, and Landsat 5/7). These variables were standardized to zero mean and unit variance.

III. METHODS

This section summarizes the sampling methods used to obtain labeled samples (T) and subset these into training (Ta) and test (Tb) datasets. We then outline the design of the classifiers compared across a total of 16 experiments (listed in Table II). These are based on the methods previously described in Sections I-A–C. A schematic work flow indicating spatial-contextual classifier design is shown in Fig. 2. We end this section with a summary of the prediction evaluation strategies employed to compare our experiments.
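The variable reduction and standardization described in Section II-B can be sketched as follows; a minimal NumPy version that keeps the first variable of each highly correlated group (a simplification of the paper's ranking-based removal), with the 0.85 threshold taken from the text and all function names ours:

```python
import numpy as np

def prune_correlated(X, names, threshold=0.85):
    """Drop variables whose absolute Pearson correlation with an
    already-kept variable exceeds the threshold (keep-first heuristic)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

def standardize(X):
    """Scale each variable to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

The same pruning routine could be reused for the much larger GLCM texture set described later, where 561 candidate textures were reduced to 166.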
Fig. 2. Schematic work flow showing the design of the sixteen individual spatial-contextual classifiers evaluated in this study. The SOM-kNN method has been newly implemented as part of this study.
A. Data Sampling

Elements of Ta were obtained using stratified random sampling. For each class, excluding samples representing Qs, 100 training instances were sampled; thus, Ta represents ∼1% of the total number of samples in the study region. This ensured that classifiers were assessed with limited a priori knowledge of the spatial distribution of bedrock lithostratigraphic units [16]. Qs samples were omitted from elements of Ta and Tb in order to assess how well spatial-contextual classifiers can “see” through unconsolidated sediments and classify bedrock lithologies. Fig. 3(a) provides an example of the spatial distribution of Ta. Tb comprised 10 000 randomly sampled instances independent of the elements of Ta. Fig. 3(b) shows an example
of the proportions of Tb samples for individual classes. These proportions are approximately equivalent to the area covered by each class within the study region. Ta and Tb were resampled ten times to eliminate the possibility that classifications were influenced by the statistical and/or spatial distributions of randomly sampled data. In addition, the ten resampled sets of Ta and Tb reduce the impact of unreliable information in one or more of these datasets on classifier comparisons.

Fig. 3. (a) Example of Ta sample locations and (b) class proportions within Tb samples; the latter are approximately equivalent to the total area covered by a given class. Note that samples of the Qs class are not included in Ta or Tb. See Table I for class descriptions.

B. Random Forests

RF, developed by Breiman [53], is an ensemble classification scheme that induces multiple randomized decision tree (DT) classifiers, known as a forest. Randomness is introduced into the algorithm by randomly subsetting a predefined number of input variables (m) to split at each node of a DT and by bagging (bootstrap aggregation). Bagging [54] generates training samples for each tree by sampling with replacement a number of samples equal to the number of instances in Ta. This leaves approximately two-thirds of the unique samples in Ta available for training each tree, while the remaining (out-of-bag) samples are used for evaluation. Bagging is reported to improve classification predictions as long as the base classifiers are unstable in the presence of altered Ta. The Gini Index is used by RF to determine a “best split” threshold at each node of a DT. Predictions are based on a majority vote from the outcomes of all random DT within the forest [42], [44], [53], [55].

C. Support Vector Machines

SVM, first described by Vapnik [56], [57], has the ability to discriminate between classes in a high-dimensional variable space [58], [59]. Basic SVM theory states that for a linearly separable dataset containing points from two classes there are an infinite number of hyperplanes that divide these classes. Only a subset of instances in Ta, known as support vectors, are used to define class marginal hyperplanes M [42]. In this way, SVM exploit the geometric characteristics of data [60], [61]. In nonseparable linear cases, SVM find M while incorporating a cost parameter C, which adjusts the penalty associated with misclassifying support vectors. High values of C generate more complex M by assigning a higher penalty to support vector error [59]–[61]. A kernel function allows SVM to handle nonlinear relationships efficiently by projecting samples from the original variable space into a potentially infinite-dimensional kernel space [58], [61]. The Gaussian radial basis function kernel offers a good first choice for most applications [59]. The architecture of SVM described above deals with binary classification tasks. Nonetheless, SVM can be extended to multiclass problems by combining multiple classifiers. We use the one-against-one method to separate classes as this has been shown to efficiently generate robust classifications [46], [59], [62].
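The two base classifiers can be sketched with scikit-learn as follows; this is an illustrative stand-in (the authors used R), with synthetic data in place of the geophysical variables and all parameter values hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Toy stand-in for the 11 geophysical variables and lithostratigraphic labels
X, y = make_classification(n_samples=300, n_features=11, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# RF: a forest of bagged, randomized DTs; max_features plays the role of m,
# and oob_score evaluates on the out-of-bag samples described above
rf = RandomForestClassifier(n_estimators=1000, max_features=3,
                            oob_score=True, random_state=0).fit(X, y)

# SVM: Gaussian RBF kernel; scikit-learn's SVC trains one-against-one
# binary classifiers for multiclass problems
svm = SVC(kernel="rbf", C=1.0, decision_function_shape="ovo").fit(X, y)

print(rf.oob_score_)
```

With three classes, the one-against-one scheme trains 3 × 2 / 2 = 3 binary classifiers, which is why `decision_function` returns three columns per sample.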
D. Self-Organizing Maps

Self-organizing maps (SOM) [63], [64] is an unsupervised clustering algorithm that treats each sample as an n-dimensional (nD) vector in variable space. SOM employs vector quantization to project data in nD space onto a two-dimensional (2-D) map. SOM finds natural groups or clusters via an iterative two-stage process that 1) assigns input samples to the closest randomly seeded seed-nodes and 2) trains seed-nodes such that their values are adjusted to align more closely with associated samples. In this way, SOM links variables to trained seed-nodes (nodes, representing groups of similar samples) on a 2-D topologically relevant space. The topology between SOM nodes is preserved such that nodes that are close in nD space maintain their relative proximities on the 2-D map [65], [66].

E. Global Classifiers and Image Segmentation

Global standard variable RF (Ex. 1) and SVM (Ex. 5) classifiers were trained using the R caret package [67]. An optimum minimum number of standard variables was obtained using the ranked-variable selection method described in Cracknell et al. [68]. For RF, nonredundant geophysical variables were ranked using Gini Index measures of variable importance. For SVM, nonredundant geophysical variables were ranked using measures of variable importance calculated from receiver operating characteristic curves [69], as described in Cracknell and Reading [49]. In our experiment, the accuracy tolerance level was set to 0.02 with a step size of 2.
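The two-stage SOM update of Section III-D can be sketched from scratch as follows; a minimal NumPy version, not the implementation used in the study, with a rectangular (rather than hexagonal) grid and linearly decaying radius and learning rate loosely following the settings quoted later:

```python
import numpy as np

def train_som(X, grid=(10, 10), iters=100, lr0=0.05, lr1=0.01, seed=0):
    """Minimal rectangular-grid SOM: assign each sample to its best-
    matching node, then pull that node (and its grid neighbors) toward
    the sample, shrinking the radius and learning rate linearly."""
    rng = np.random.default_rng(seed)
    h, w = grid
    nodes = rng.normal(size=(h * w, X.shape[1]))            # random seed-nodes
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    r0 = max(h, w) * 2 / 3                                  # initial search radius
    for t in range(iters):
        frac = t / max(iters - 1, 1)
        lr = lr0 + (lr1 - lr0) * frac                       # linear lr decrease
        radius = max(r0 * (1 - frac), 0.5)                  # linear radius decrease
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(((nodes - x) ** 2).sum(axis=1)) # best-matching unit
            near = np.linalg.norm(coords - coords[bmu], axis=1) <= radius
            nodes[near] += lr * (x - nodes[near])           # pull nodes toward sample
    return nodes

def segment(X, nodes):
    """Label each sample with the index of its closest trained node."""
    d = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

Applied to the pixel-wise geophysical variables, `segment` yields the 100 spatial segments (one per node of a 10 × 10 map) used by the SOM-kNN and SOM-PR experiments.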
Fig. 4. Segmented images derived using 100 SOM clusters for (a) standard variables and (b) focal operator (texture) variables. Note color ramps are repeated for clarity but result in duplicated segment colors.
RF global classifiers were trained using 1000 trees, and m was selected using 10-fold cross-validation. A Gaussian radial basis function kernel was used to train SVM global classifiers. The SVM C parameter was selected using 10-fold cross-validation, and σ (kernel width) was estimated via the sigest() function of the R kernlab package [70]. SOM was employed to divide the study area into 100 segments representing regions with similar geophysical characteristics (Ex. 3–4, 7–8, 11–12, and 15–16). SOM was implemented using a 10 × 10 hexagonal grid; 100 iterations; an initial search radius of 2/3 the dimensions of variable space; a linear search radius decrease; and a percentage adjustment of seed-node properties from 0.05 to 0.01.
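The cross-validated parameter selection described above can be sketched with scikit-learn as follows; an illustrative stand-in for the R caret/kernlab workflow, with synthetic data, hypothetical parameter grids, and `gamma="scale"` standing in for the sigest() kernel-width estimate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=11, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Select m (max_features) for RF by 10-fold cross-validation
rf_cv = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                     {"max_features": [1, 3, 5, 7]}, cv=10).fit(X, y)

# Select C for an RBF SVM the same way; gamma="scale" is a rough stand-in
# for the kernlab sigest() width estimate
svm_cv = GridSearchCV(SVC(kernel="rbf", gamma="scale"),
                      {"C": [0.1, 1, 10, 100]}, cv=10).fit(X, y)

print(rf_cv.best_params_, svm_cv.best_params_)
```

The same pattern with `cv=3` mirrors the 3-fold selection used later for the local kNN hybrid classifiers, where small training sets make 10 folds impractical.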
F. Spatial-Contextual Classifiers

Focal operator preprocessing of standard variables obtained first- and second-order spatial statistics using 9 × 9 pixel neighborhoods (texture variables; Ex. 9–16). Edge effects were avoided by including a 1 km buffer. Six first-order statistics (mean, variance, covariance, skewness, kurtosis, and fractal dimension) and nine second-order spatial statistics (GLCM textures: contrast, dissimilarity, homogeneity, angular second moment, entropy, maximum probability, mean, variance, and correlation) were derived for the 11 variables documented
in Section II-B. A normalized measure of fractal dimension was calculated using the ratio of range to standard deviation described in Shankar [27]. GLCM textures were calculated from variables uniformly quantized to 64 gray levels (i.e., 6-bit). A 6-bit quantization level was chosen as Soh and Tsatsoulis [71] indicate that it generates sufficiently accurate and consistent GLCM textures while being more computationally efficient than higher quantization levels [20]. GLCM textures were derived for the four principal directions (0°, 45°, 90°, and 135°) and the direction-invariant texture (average of the principal directions). Pairwise gray-level comparisons were defined using a pixel offset of 2. A total of 51 GLCM textures were generated for every input variable. These were standardized to zero mean and unit variance prior to classifier training and testing. The methods for removing correlated variables, described in Section II-B, were used to reduce the total number of GLCM textures from 561 to 166.

SOM image segments for standard [Fig. 4(a)] and texture [Fig. 4(b)] variables highlight the effect of focal operators. The SOM segments derived from texture variables are considerably larger, more spatially contiguous, and exhibit less irregular boundaries than the segments obtained using standard variables.

We implemented the fast kNN hybrid (kNN-SVM and kNN-RF; Ex. 2, 6, 10, and 14) method, first described in Refs. [19] and [38], to select neighboring elements of Ta in variable space. Classifiers were trained using k = 200, centered on individual elements of Ta. This resulted in a total of 1000 classifiers (one for each element in Ta) trained for each of the kNN hybrid experiments. To avoid cross-validation errors resulting from not having all classes present within individual folds, 3-fold cross-validation was employed to select RF and SVM parameters.

IV. RESULTS

There is a >0.10 increase in Tb accuracies when using the texture variable RF global classifier (Ex. 9) compared to accuracies obtained using the standard variable RF global classifier (Ex. 1). Texture variable global and SOM-PR RF classifiers (Ex. 9 and 12) achieve mean Tb accuracies >0.05 higher than texture variable kNN-RF (Ex. 10). The best performing texture variable RF classifiers (global and SOM-PR; Ex. 9 and 12) show considerable overlap between resampled sets of Tb. Standard variable kNN hybrid and SOM-kNN SVM classifiers (Ex. 6 and 7) achieved significantly higher Tb accuracies than the standard variable SVM global (>0.04) and SOM-PR (>0.02) classifiers (Ex. 5 and 8). Considerable overlap in Tb accuracies is observed using texture variable global, kNN hybrid, and SOM-kNN SVM classifiers (Ex. 13–15). The best performing SVM classifier was obtained using texture variables coupled with the SOM-PR postprocessing method (Ex. 16). This method achieved a significant (>0.02) increase in mean Tb accuracy compared to the other SVM classifiers utilizing texture variables (Ex. 13–15).
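As a concrete illustration of the GLCM texture derivation described in Section III-F (uniform quantization, a fixed pixel offset, and texture measures such as contrast), a minimal NumPy sketch; the function names, the restriction to non-negative offsets, and the tiny level count in the usage note are ours, not the authors':

```python
import numpy as np

def quantize(img, levels=64):
    """Uniformly quantize an image to the given number of gray levels."""
    lo, hi = img.min(), img.max()
    return np.minimum(((img - lo) / (hi - lo) * levels).astype(int), levels - 1)

def glcm(q, levels=64, di=0, dj=2):
    """Gray-level co-occurrence matrix for one non-negative pixel offset;
    (di, dj) = (0, 2) pairs each pixel with the pixel two columns to its
    right (the paper's 0-degree direction with a pixel offset of 2)."""
    a = q[:q.shape[0] - di, :q.shape[1] - dj]   # reference pixels
    b = q[di:, dj:]                             # offset neighbors
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)     # count co-occurring level pairs
    return m / m.sum()                          # normalize to joint probabilities

def glcm_contrast(m):
    """GLCM contrast: the sum of P(i, j) * (i - j)^2."""
    i, j = np.indices(m.shape)
    return (m * (i - j) ** 2).sum()
```

Repeating `glcm` for each direction and averaging the matrices gives a direction-invariant texture; in practice this would be computed within each 9 × 9 focal neighborhood rather than over the whole image.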
Tb mean and standard deviations based on ten spatially and statistically independent resampled Ta and Tb . Standard deviation of 95% confidence intervals (CI) was, for all classifiers, 0.001. Ex. 1–16 refer to individual experiments described in the text.
Fig. 5. Comparison of spatial-contextual classifier Tb accuracies. Box plots indicate the distribution (n = 10) of classifier Tb accuracy. Points (see legend) indicate mean Tb accuracy resulting from PR majority filter methods with different neighborhood dimensions.
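The PR majority filter whose neighborhood sizes are compared in Fig. 5 can be sketched as follows; a minimal pure-NumPy version assuming a square, odd-sized window and edge replication (the function name and edge handling are ours, not the authors'):

```python
import numpy as np

def majority_filter(labels, size):
    """Assign each pixel the most abundant class within its
    size x size neighborhood (size must be odd)."""
    r = size // 2
    padded = np.pad(labels, r, mode="edge")   # replicate edges to avoid shrinkage
    out = np.empty_like(labels)
    H, W = labels.shape
    for i in range(H):
        for j in range(W):
            window = padded[i:i + size, j:j + size]
            vals, counts = np.unique(window, return_counts=True)
            out[i, j] = vals[np.argmax(counts)]
    return out
```

Isolated single-pixel classifications ("speckle") are overwritten by the surrounding majority class, which is why larger windows produce the increasingly smooth, rounded class boundaries described for Fig. 8.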
Substantial overlap between standard and texture variable RF and SVM, kNN hybrid, and SOM-kNN (Ex. 2–3, 6–7, 10–11, and 14–15) Tb accuracies is observed despite a considerable increase in kNN hybrid training times compared to those required for SOM-kNN. In our experiments, this increase in computational cost is a function of the number of classifiers trained for kNN hybrid classifiers compared to those trained for SOM-kNN classifiers. kNN hybrid classifiers were
Fig. 6. Examples of the best performing spatial-contextual RF classifiers, standard variable SOM-PR (Ex. 4), and texture variable global (Ex. 9). Black square indicates the location of Fig. 10. See Table I for class descriptions.
trained using 1000 classifiers and SOM-kNN was trained using 100 classifiers; hence, computational cost is reduced by a factor of 10. When using texture variables, kNN hybrid and SOM-kNN SVM and RF classifiers (Ex. 10–11 and 14–15) obtain mean Tb accuracies that fall within the ranges of standard deviations from the mean. Fig. 6 compares standard variable SOM-PR and texture variable RF global classifiers (Ex. 4 and 9). These classifiers are compared in terms of the spatial distributions of their predictions and the mismatch between these predictions and the interpreted geological map used to sample Ta. There is a ∼0.10 difference in mean Tb accuracies between these two classifiers. The most striking difference in their predictions is the degree to which errors (or correct predictions) become more spatially contiguous using texture variables. A similar pattern emerges in Fig. 7, which compares SVM classifiers utilizing standard and texture variables. In this case, the texture variable SVM SOM-PR classifier (Ex. 16) achieves a 0.075 increase in mean Tb accuracy over the standard variable kNN-SVM classifier (Ex. 6). Table III provides a summary of the differences observed in mean Tb accuracies obtained using majority filter PR methods and those not reclassified using these methods. As PR filter
dimensions increase, there is a corresponding increase in the difference in mean Tb accuracies. Classifiers trained using texture variables exhibit smaller increases in mean Tb accuracies under 7 × 7 and 11 × 11 majority filters than classifiers trained on standard variables. The largest increases in mean Tb accuracies (up to ∼0.09) using majority filters are obtained for standard variable RF classifiers (Ex. 1–4) and texture variable kNN-SVM classifiers (Ex. 14 and 15). There is a small increase in mean Tb accuracies using 3 × 3 PR filters combined with texture variable RF and SVM classifiers (Ex. 9–16). In contrast, the greatest increases in mean Tb accuracies are observed using majority filters larger than 3 × 3 cells for kNN hybrid and SOM-kNN predictions (Ex. 2–3, 6–7, 10–11, and 14–15). The results in Table III indicate that PR filters much smaller than the focal operator neighborhoods used to derive spatial statistics do not significantly increase prediction accuracy. Fig. 8 compares standard variable global RF and SVM classifier predictions (Ex. 1 and 5) and the effect of PR filters of different sizes on these predictions. As the size of the PR filter neighborhood increases, spatially contiguous classifications cover larger areas and exhibit increasingly rounded (convex) outer edges. The original predictions must be dominated by
correct classifications in the first instance in order for predictions to be reclassified correctly.

Fig. 7. Examples of best performing spatial-contextual SVM classifiers, standard variable kNN-SVM (Ex. 6), and texture variable SOM-PR (Ex. 16). See Table I for class descriptions.

Fig. 9 shows the proportion of individual classes classified as Qs for the best performing spatial-contextual RF and SVM classifiers. These plots show that the Ld, CO, and SD classes make up the majority of Qs classifications. The bulk of Ld and SD classifications in areas of Qs occur on the southern limits of the Huskisson Syncline and east of Renison Bell. The majority of CO classifications cover spatially contiguous regions in the southeast of the study area. The valley containing Lake Rosebery, and the area mapped as Qs to the northeast, likely reflect the presence of transported siliciclastic materials rather than plausible classifications of bedrock lithologies.

V. DISCUSSION

The best performing classifiers are those that employ texture variable spatial-contextual preprocessing methods (Ex. 9–16). In contrast, standard variable global classifiers (Ex. 1 and 5) generate the lowest or equal lowest Tb accuracies. Classifiers trained on texture variables were able to generate spatially homogeneous classifications unaffected by high frequency noise. The inability of texture variable classifiers
to resolve lithologies expressed as narrow bodies suggests that the smoothing effect of focal operators is inhibiting the classification of small scale features. This is because large neighborhood dimensions can, in situations where the features of interest are relatively small, incorporate information from neighboring features [26]. Focal operator neighborhood dimensions should, therefore, not be larger than the scale of the smallest features of interest. The use of the kNN-SVM method to train localized (in variable space) classifiers significantly increases the Tb accuracies of standard variable SVM classifiers (Ex. 6). These results align with the findings of Refs. [39] and [40]. The resulting Tb accuracies of kNN-SVM classifiers are, however, equivalent to those obtained by standard variable RF global classifiers (Ex. 1). In addition, there is no significant difference between RF global classifier (Ex. 1 and 9) Tb accuracies and those observed for kNN-RF classifiers (Ex. 2 and 10). These results reflect the adaptive nearest neighbor architecture of RF classifiers, which adjusts the geometry of decision boundaries based on the distribution of local training samples in variable space [44], [73]. By default, RF trains localized (in variable space) classifiers and does not require the use of methods that explicitly select Ta samples representing local sample characteristics.
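The localized-training idea behind the kNN-SVM hybrid discussed above can be sketched briefly. This is a minimal illustration, not the implementation used in this study: the function name and toy data are ours, and a nearest-centroid rule stands in for the local SVM so the example stays self-contained.

```python
import numpy as np

def local_knn_classify(X_train, y_train, X_test, k=5):
    """For each test sample, build a classifier from only its k nearest
    training samples in variable space (a nearest-centroid stand-in for
    the local SVM of the kNN-SVM hybrid, for illustration only)."""
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]                 # k nearest neighbors
        Xl, yl = X_train[idx], y_train[idx]
        classes = np.unique(yl)
        cents = np.array([Xl[yl == c].mean(axis=0) for c in classes])
        preds[i] = classes[np.argmin(np.linalg.norm(cents - x, axis=1))]
    return preds

# Two well-separated clusters; each test point is classified locally.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.repeat([0, 1], 20)
print(local_knn_classify(X, y, np.array([[0.0, 0.0], [2.0, 2.0]])))
```

Training one model per query point is what makes the kNN hybrid expensive; the SOM-kNN variant amortizes this by training one model per segment instead.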
TABLE III
PR MAJORITY FILTER COMPARISON
Difference between mean Tb accuracy obtained using 3 × 3, 7 × 7, and 11 × 11 neighborhood (cell) dimensions and mean Tb accuracy resulting from predictions not utilizing majority filters (see Table II). Standard deviation of mean majority filter Tb accuracies is 0.01 and mean 95% CIs are ~0.01 ±0.001 for all classifiers. ∗Denotes a majority PR filter Tb accuracy increase greater than the sum of one standard deviation and the 95% CI of mean Tb accuracy not using majority PR filters.
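The majority filter PR operation compared in Table III can be sketched in a few lines. This is a hedged illustration under our own naming (the function and the toy label image are hypothetical, not from the study).

```python
import numpy as np

def majority_filter(labels, size=3, n_classes=None):
    """Postregularize a 2-D label image: each cell is reassigned the
    modal class within a size x size neighborhood. Edge cells use the
    clipped window."""
    if n_classes is None:
        n_classes = labels.max() + 1
    h, w = labels.shape
    half = size // 2
    out = np.empty_like(labels)
    for i in range(h):
        for j in range(w):
            win = labels[max(0, i - half):i + half + 1,
                         max(0, j - half):j + half + 1]
            out[i, j] = np.bincount(win.ravel(),
                                    minlength=n_classes).argmax()
    return out

# A two-class image with one isolated "noise" cell: the filter removes it.
img = np.zeros((5, 5), dtype=int)
img[2, 2] = 1
print(majority_filter(img, size=3))   # the isolated 1 is reclassified to 0
```

Larger `size` values correspond to the 7 × 7 and 11 × 11 neighborhoods in Table III: they suppress more noise but, as noted in the text, also erode thin features.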
The kNN hybrid method (Ex. 2, 6, 10, and 14) induces multiple localized classifiers and hence incurs significantly higher computational cost than training a single global classifier; the number of classifiers trained equals the number of Ta samples. The SOM-kNN method (Ex. 3, 7, 11, and 15), developed for this study, reduced computational cost by a factor of 10 and generated Tb accuracies equivalent to those of kNN hybrid classifiers. SOM was implemented by selecting an arbitrary number of seed-nodes from which to cluster samples. An optimal number of SOM nodes and an appropriate SOM 2-D map topology can be identified by searching combinations of these SOM parameters and selecting those that result in minimum average quantization and topological errors [74], [75]. PR methods are useful for generating spatially contiguous regions of a given class by eliminating high frequency noise present in original classifications [6], [15], [17]. The SOM-PR method generated the highest Tb accuracies for standard variable RF and texture variable SVM classifiers. SOM-PR provided an efficient and effective means of limiting the effect of high frequency noise on classifications. Unlike the classifiers that used texture variables, standard variable SOM-PR classifiers (Ex. 4 and 8) were able to map thin and discontinuous lithostratigraphic units. A comparison of RF global [Fig. 10(a), Ex. 1] and RF SOM-PR [Fig. 10(b), Ex. 4] classifications highlights the SOM-PR method's ability to eliminate the bulk of isolated classifications and improve the definition of boundaries between classes. Fig. 10(c) shows numerous small SOM image
segments (derived from standard variables) are coincident with individual classes. Small segments are desirable as they reduce the possibility of a large segment being misclassified, thus mitigating the impact of misclassifications on overall accuracy [22]. Incorporating classifier uncertainty measures, derived from class membership probabilities as described in Cracknell and Reading [50], may provide a means of weighting predictions to reduce the effect of erroneous classifications on PR majority filters. Tarabalka et al. [7] demonstrate this concept using a probabilistic SVM classifier coupled with a PR filter that combines edge detection filters and Markov random field (MRF) regularization. This approach aims to maintain the correct position of boundaries between spatially varying phenomena during PR. Larger PR majority filter neighborhoods lead to higher overall classification accuracies at the expense of identifying narrow and/or linear geological features. An ellipsoidal PR filter that characterizes nonstationary spatial covariance parameters representing local spatial scale [26] and anisotropic spatial structure [76], [77] may result in improved representations of linear geological features. The results presented in Fig. 9 indicate that regions mapped as Qs were dominated by classifications representing Ld, CO, SD, and Lo units. The spectral characteristics of these classes are likely to overlap with those of Qs because large amounts of siliciclastic materials, specifically those derived from CO (Owen Group) lithologies, constitute the bulk of Quaternary sediments in the study area [78], [79]. For example, thick glacial outwash sands and gravels in the Boco Valley [78] obscure what is likely to be bedrock composed of lithologies associated with the Mount Read Volcanics [80]. These Quaternary sediments were dominantly classified as either CO or Lo.
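The segment-based reclassification behind SOM-PR, discussed above, can be sketched as follows. This is a minimal illustration under stated assumptions: in the study the segment image comes from SOM clustering of the input variables, whereas here a toy segment image is supplied directly, and the function name is ours.

```python
import numpy as np

def segment_pr(labels, segments):
    """SOM-PR style postregularization sketch: every cell in a segment
    is reassigned the modal class predicted within that segment."""
    out = labels.copy()
    for s in np.unique(segments):
        mask = segments == s
        vals, counts = np.unique(labels[mask], return_counts=True)
        out[mask] = vals[np.argmax(counts)]
    return out

# Toy predictions and a two-segment image: isolated labels inside a
# segment are overridden by that segment's majority class.
preds = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [2, 1, 1]])
segs = np.array([[0, 0, 1],
                 [0, 0, 1],
                 [0, 1, 1]])
print(segment_pr(preds, segs))
```

Because each segment is regularized as a whole, many small segments (as in Fig. 10(c)) limit the damage a single mislabeled segment can do to overall accuracy.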
In contrast, the valleys surrounding Lake Rosebery are more likely to be classified as SD and (to the southeast) Ld. During the preparation of Ta and Tb, units representing the Western Volcano-Sedimentary Sequence and correlates of the Tyndall Group, within the Huskisson Syncline south of Lake Pieman [51], were merged with the volcaniclastic sediments unit (Cd) of the Mount Read Volcanics. This unit comprises felsic to intermediate volcaniclastic sandstone, siltstone, and chert-rich granule-pebble conglomerate rocks [79]. The majority of spatial-contextual classifiers classify this region as a combination of siliciclastic and volcaniclastic materials (Ld), rather than as dominated solely by volcaniclastic sedimentary materials (Cd). A study by Leaman and Richardson [81] interpreted the presence of granitoid bodies at shallow depths (∼1 km) south of Renison Bell and Rosebery from gravity data. Many of the RF spatial-contextual classifiers trialed in this study predicted granite bodies in this area.
VI. CONCLUSION

We have evaluated new and preexisting methods for spatial-contextual RF and SVM supervised classification with respect to a challenging lithostratigraphy classification problem. We divide methods for including spatial-contextual information into the stages of the supervised classification work flow:
1) data preprocessing using focal operators; 2) training multiple classifiers from local neighborhoods of training samples identified using the k-nearest neighbor (kNN) hybrid method; and 3) postprocessing via PR.

Fig. 8. Example of global RF and SVM classifications trained on standard inputs (Ex. 1 and 5). Mismatch images represent correct and incorrect classification as compared to the reference geological map used to train classifiers. PR mismatch images identify cells that were reclassified using majority filters of different sizes. Colors indicate whether reclassification resulted in correct (green) or incorrect (red) predictions. See Table I for class descriptions.

These spatial-contextual methods, and combinations thereof, were compared to RF and SVM global classifiers. Our experiments, reported as mean (n = 10) overall accuracy ±95% confidence intervals, indicate spatial-contextual preprocessing focal operators (texture variables) significantly increased classification accuracies (RF 0.754 ±0.010, SVM 0.683 ±0.010) over the use of standard variables (RF 0.625 ±0.011, SVM 0.581 ±0.011). Combining image segmentation with PR methods was more advantageous for RF (0.652 ±0.011) than SVM (0.608 ±0.011) and was able to
resolve thin and discontinuous lithostratigraphic units using standard variables. The kNN hybrid method improves classification accuracies only for SVM (0.629 ±0.011). Combining textural data with the kNN hybrid method decreased RF classifier accuracy (0.696 ±0.010) and did not result in any significant difference in the accuracy of classifiers trained using SVM (0.671 ±0.010). Our newly introduced kNN hybrid classifier (SOM-kNN), which combines image segmentation with local training sample selection, significantly reduced processing times compared to standard kNN hybrid classifiers with no appreciable difference in accuracy. Majority filter PR methods resulted in a substantial increase in the accuracy of global classifiers using standard variables (RF 0.705, SVM 0.607 for
11 × 11 neighborhoods) but reduced the detail with which lithostratigraphic units could be defined. We find that preprocessing spatial data to generate first- and second-order spatial statistics is a good first choice for producing accurate classifications of spatially varying phenomena. The scale of features that can be classified, however, depends on the choice of focal operator neighborhood size. In situations where target features are likely to be thin and discontinuous, we recommend using image segmentation of non-preprocessed spatial data coupled with the PR of RF classifications.

Fig. 9. Comparisons of mean (n = 10) proportions of mapped Qs samples classified as classes present within Ta. Error bars indicate one standard deviation from the mean. See Table I for class descriptions.

Fig. 10. Comparison of (a) RF global (Ex. 1) and (b) RF SOM-PR (Ex. 4) classifications, in relation to (c) SOM image segments (white represents NA) covering the region indicated by the black square in Fig. 6.
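The first- and second-order focal statistics recommended as a preprocessing step can be sketched as a moving-window computation. This is a minimal illustration with our own naming and a toy grid; production code would use an optimized library routine, and the window size should not exceed the scale of the smallest features of interest.

```python
import numpy as np

def focal_stats(x, size=3):
    """Focal (moving-window) mean and standard deviation: first- and
    second-order spatial statistics of the kind used as texture
    variables. Edge cells use the clipped window."""
    h, w = x.shape
    half = size // 2
    mean = np.empty_like(x, dtype=float)
    std = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            win = x[max(0, i - half):i + half + 1,
                    max(0, j - half):j + half + 1]
            mean[i, j] = win.mean()
            std[i, j] = win.std()
    return mean, std

grid = np.arange(16, dtype=float).reshape(4, 4)
m, s = focal_stats(grid, size=3)
print(m[1, 1])   # mean of the full 3 x 3 window centred on cell (1, 1) -> 5.0
```

The resulting mean and standard deviation grids would be stacked with the original variables as additional classifier inputs.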
ACKNOWLEDGMENT

The authors acknowledge the R project for statistical computing (http://www.r-project.org). The newly implemented SOM-kNN classifier code is available from the authors. Airborne geophysical data were sourced from Mineral Resources Tasmania (http://www.mrt.tas.gov.au) and Landsat ETM+ imagery from the United States Geological Survey (http://eros.usgs.gov). Random Forests is a trademark of L. Breiman and A. Cutler. The authors would like to thank two anonymous reviewers whose comments improved this manuscript.
REFERENCES

[1] M. Shaheen, M. Shahbaz, Z. Rehman, and A. Guergachi, “Data mining applications in hydrocarbon exploration,” Artif. Intell. Rev., vol. 35, no. 1, pp. 1–18, Jan. 2011. [2] M. C. Burl et al., “Learning to recognize volcanoes on Venus,” Mach. Learn., vol. 30, no. 2, pp. 165–194, 1998. [3] U. Demšar, P. Harris, C. Brunsdon, A. S. Fotheringham, and S. McLoone, “Principal component analysis on spatial data: An overview,” Ann. Assoc. Amer. Geogr., vol. 103, no. 1, pp. 106–128, 2013. [4] M. Gahegan, “On the application of inductive machine learning tools to geographical analysis,” Geogr. Anal., vol. 32, no. 2, pp. 113–139, 2000. [5] J. Li and R. M. Narayanan, “Integrated spectral and spatial information mining in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 673–685, Mar. 2004. [6] Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, “Spectral–spatial classification of hyperspectral imagery based on partitional clustering techniques,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2973–2987, Aug. 2009. [7] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, “SVM- and MRF-based method for accurate classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010. [8] P. M. Atkinson and N. J. Tate, “Spatial scale problems and geostatistical solutions: A review,” Prof. Geogr., vol. 52, no. 4, pp. 607–623, 2000. [9] L. Anselin, “Local indicators of spatial association—LISA,” Geogr. Anal., vol. 27, no. 2, pp. 93–115, Apr. 1995. [10] A. S. Fotheringham, C. Brunsdon, and M. Charlton, Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Hoboken, NJ, USA: Wiley, 2002. [11] C. D. Lloyd, Local Models for Spatial Analysis, 2nd ed. Boca Raton, FL, USA: CRC Press/Taylor & Francis, 2011. [12] W. R. Tobler, “A computer movie simulating urban growth in the Detroit region,” Econ. Geogr., vol. 46, no. 2, pp. 234–240, 1970. [13] A.
Getis, “Spatial autocorrelation,” in Handbook of Applied Spatial Analysis: Software, Tools, Methods and Applications, M. M. Fisher and A. Getis, Eds. Berlin, Germany: Springer-Verlag, 2010, pp. 255–278. [14] IUGS Commission on Stratigraphy. International Subcommission on Stratigraphic Classification, International Stratigraphic Guide: A Guide to Stratigraphic Classification, Terminology, and Procedure. Hoboken, NJ, USA: Wiley, 1976. [15] B. Ghimire, J. Rogan, and J. Miller, “Contextual land-cover classification: Incorporating spatial dependence in land-cover classification models using random forests and the Getis statistic,” Remote Sens. Lett., vol. 1, no. 1, pp. 45–54, 2010. [16] S. Grebby, J. Naden, D. Cunningham, and K. Tansey, “Integrating airborne multispectral imagery and airborne LiDAR data for enhanced lithological mapping in vegetated terrain,” Remote Sens. Environ., vol. 115, no. 1, pp. 214–226, Jan. 2011. [17] E. Ricchetti, “Multispectral satellite image and ancillary data integration for geological classification,” Photogramm. Eng. Remote Sens., vol. 66, no. 4, pp. 429–435, 2000. [18] C. A. Link and S. Blundell, “Interpretation of shallow stratigraphic facies using a self-organizing neural network,” in Geophysical Applications of Artificial Neural Networks and Fuzzy Logic, W. Sandham and M. Leggett, Eds. Norwell, MA, USA: Kluwer, 2003, pp. 215–230. [19] B. D. Bue and T. F. Stepinski, “Automated classification of landforms on Mars,” Comput. Geosci., vol. 32, no. 5, pp. 604–614, Jun. 2006. [20] M. Zortea, M. De Martino, and S. Serpico, “A SVM ensemble approach for spectral-contextual classification of optical high spatial resolution imagery,” presented at the IEEE Int. Geosci. Remote Sens. Symp. (IGARSS’07), Barcelona, Spain, 2007, pp. 1489–1492. [21] E. Blanzieri and F. Melgani, “Nearest neighbor classification of remote sensing images with the maximal margin principle,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1804–1811, Jun. 2008. [22] S. 
Ghosh, T. F. Stepinski, and R. Vilalta, “Automatic annotation of planetary surfaces with geomorphic labels,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 1, pp. 175–185, Jan. 2010. [23] H. Murray, A. Lucieer, and R. Williams, “Texture-based classification of sub-Antarctic vegetation communities on Heard Island,” Int. J. Appl. Earth Observ. Geoinf., vol. 12, no. 3, pp. 138–149, 2010. [24] C.-H. Li, B.-C. Kuo, C.-T. Lin, and C.-S. Huang, “A spatial-contextual support vector machine for remotely sensed image classification,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 3, pp. 784–799, Mar. 2012. [25] N. Segata, E. Pasolli, F. Melgani, and E. Blanzieri, “Local SVM approaches for fast and accurate classification of remote-sensing images,” Int. J. Remote Sens., vol. 33, no. 19, pp. 6186–6201, 2012.
[26] S. E. Franklin, M. A. Wulder, and M. B. Lavigne, “Automated derivation of geographic window sizes for use in remote sensing digital image texture analysis,” Comput. Geosci., vol. 22, no. 6, pp. 665–673, 1996. [27] V. Shankar, Texture-Based Automated Lithological Classification Using Aeromagnetic Anomaly Images. Reston, VA, USA: U.S. Geological Survey, 2009. [28] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst. Man Cybern., vol. 3, no. 6, pp. 610–621, Nov. 1973. [29] L. Lepistö, I. Kunttu, and A. Visa, “Classification of natural rock images using classifier combinations,” Opt. Eng., vol. 45, no. 9, p. 097201, 2006. [30] A. Getis and J. K. Ord, “The analysis of spatial association by use of distance statistics,” Geogr. Anal., vol. 24, no. 3, pp. 189–206, 1992. [31] J. K. Ord and A. Getis, “Local spatial autocorrelation statistics: Distributional issues and an application,” Geogr. Anal., vol. 27, pp. 286– 306, 1995. [32] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2008. [33] D. Stow, “Geographic object-based image change analysis,” in Handbook of Applied Spatial Analysis: Software, Tools, Methods and Applications, M. M. Fisher and A. Getis, Eds. Berlin, Germany: Springer-Verlag, 2010, pp. 565–582. [34] A. Stumpf and N. Kerle, “Object-oriented mapping of landslides using random Forests,” Remote Sens. Environ., vol. 115, no. 10, pp. 2564–2577, 2011. [35] D. C. Duro, S. E. Franklin, and M. G. Dubé, “A comparison of pixelbased and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery,” Remote Sens. Environ., vol. 118, pp. 259–272, 2012. [36] L. Bottou and V. N. Vapnik, “Local learning algorithms,” Neural Comput., vol. 4, pp. 888–900, 1992. [37] E. Blanzieri and F. Melgani, An Adaptive SVM Nearest Neighbor Classifier for Remotely Sensed Imagery. 
New York, NY, USA: IEEE, 2006. [38] H. Zhang, A. C. Berg, M. Maire, and J. Malik, “SVM-KNN: Discriminative nearest neighbor classification for visual category recognition,” presented at the IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2006, vol. 2, pp. 2126–2136. [39] N. Segata and E. Blanzieri, “Fast local support vector machines for large datasets,” in Machine Learning and Data Mining in Pattern Recognition, vol. 5632, P. Perner, Ed. Berlin, Germany: Springer-Verlag, 2009, pp. 295–310. [40] N. Segata and E. Blanzieri, “Fast and scalable local kernel machines,” J. Mach. Learn. Res., vol. 11, pp. 1883–1926, 2010. [41] A. Toumani, “Fuzzy classification for lithology determination from well logs,” in Geophysical Applications of Artificial Neural Networks and Fuzzy Logic, W. Sandham and M. Leggett, Eds. Norwell, MA, USA: Kluwer, 2003, pp. 125–142. [42] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Reading, MA, USA: Addison-Wesley/Pearson Education, 2006. [43] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Amsterdam, The Netherlands: Elsevier/Morgan Kaufman, 2005. [44] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York, NY, USA: Springer, 2009. [45] S. Marsland, Machine Learning: An Algorithmic Perspective. Boca Raton, FL, USA: Chapman & Hall/CRC Press, 2009. [46] M. Kovacevic, B. Bajat, and B. Gajic, “Soil type classification and estimation of soil properties using support vector machines,” Geoderma, vol. 154, no. 3–4, pp. 340–347, 2010. [47] L. Yu, A. Porwal, E. J. Holden, and M. C. Dentith, “Towards automatic lithological classification from remote sensing data using support vector machines,” Comput. Geosci., vol. 45, pp. 229–239, Aug. 2012. [48] B. Waske, J. A. Benediktsson, K. Árnason, and J. R. Sveinsson, “Mapping of hyperspectral AVIRIS data using machine-learning algorithms,” Can. J. 
Remote Sens., vol. 35, no. 1, pp. 106–116, Sep. 2009. [49] M. J. Cracknell and A. M. Reading, “Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information,” Comput. Geosci., vol. 63, pp. 22–33, 2014. [50] M. J. Cracknell and A. M. Reading, “The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random Forests and support vector machines,” Geophysics, vol. 78, no. 3, pp. WB113–WB126, 2013.
[51] Mineral Resources Tasmania, “1:25,000 Scale Digital Geology of Tasmania,” Mineral Resources Tasmania, Rosny Park, Australia, 2011. [52] W. M. Telford, L. P. Geldart, and R. E. Sheriff, Applied Geophysics, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1990. [53] L. Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001. [54] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996. [55] B. Waske, J. A. Benediktsson, and J. R. Sveinsson, “Random Forests classification of remote sensing data,” in Signal and Image Processing for Remote Sensing, 2nd ed., C. H. Chen, Ed. Boca Raton, FL, USA: CRC Press, 2012, pp. 365–374. [56] V. N. Vapnik, The Nature of Statistical Learning Theory. Berlin, Germany: Springer-Verlag, 1995. [57] V. N. Vapnik, Statistical Learning Theory. Hoboken, NJ, USA: Wiley, 1998. [58] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification,” Dept. Computer Science, National Taiwan Univ., Taipei, Taiwan, Tech. Rep., 2010, p. 16 [Online]. Available: https://www. cs.sfu.ca/people/Faculty/teaching/726/spring11/svmguide.pdf [59] A. Karatzoglou, D. Meyer, and K. Hornik, “Support vector machines in R,” J. Stat. Softw., vol. 15, no. 9, p. 28, Apr. 2006. [60] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004. [61] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Min. Knowl. Discovery, vol. 2, no. 2, pp. 121–167, Jun. 1998. [62] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002. [63] T. Kohonen, “Self-organized formation of topologically correct feature maps,” Biol. Cybern., vol. 43, no. 1, pp. 59–69, Jan. 1982. [64] T. Kohonen, “The self-organizing map,” Neurocomputing, vol. 21, no. 1–3, pp. 
1–6, 1998. [65] F. P. Bierlein, S. J. Fraser, W. M. Brown, and T. Lees, “Advanced methodologies for the analysis of databases of mineral deposits and major faults,” Aust. J. Earth Sci., vol. 55, no. 1, pp. 79–99, Feb. 2008. [66] S. J. Fraser and B. L. Dickson, “A new method for data integration and integrated data interpretation: Self-organising maps,” presented at the 5th Decennial Int. Conf. Miner. Explor., Expanded Abstracts, 2007, pp. 907–910. [67] M. Kuhn et al. (2014). caret: Classification and Regression Training. R package version 5.15-023 [Online]. Available: http://CRAN.R-project.org/package=caret [68] M. J. Cracknell, A. M. Reading, and A. W. McNeill, “Mapping geology and volcanic-hosted massive sulfide alteration in the Hellyer–Mt Charter region, Tasmania, using Random Forests™ and self-organising maps,” Aust. J. Earth Sci., vol. 61, pp. 287–304, 2014. [69] F. Provost and T. Fawcett, “Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions,” presented at the Proc. 3rd Int. Conf. Knowl. Discovery Data Min. (KDD’97), 1997, pp. 43–48. [70] A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis, “kernlab—An S4 package for Kernel methods in R,” J. Stat. Softw., vol. 11, no. 9, pp. 1–20, 2004. [71] L.-K. Soh and C. Tsatsoulis, “Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 2, pp. 780–795, Mar. 1999. [72] R. G. Congalton and K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, 1st ed. Boca Raton, FL, USA: Lewis, 1998. [73] Y. Lin and Y. Jeon, “Random forest and adaptive nearest neighbors,” Dept. Statistics, Univ. Wisconsin, Madison, WI, USA, Tech. Rep. 1055, 2002. [74] K. Kiviluoto, “Topology preservation in self-organizing maps,” presented at the IEEE Int. Conf. Neural Netw., 1996, vol. 1, pp. 294–299. [75] E. A. Uriarte and F. D. Martín, “Topology preservation in SOM,” Int. J. Appl. Math.
Comput. Sci., vol. 1, no. 1, pp. 19–22, 2005. [76] J. Boisvert, J. Manchuk, and C. Deutsch, “Kriging in the presence of locally varying anisotropy using non-Euclidean distances,” Math. Geosci., vol. 41, no. 5, pp. 585–601, 2009. [77] J. B. Boisvert and C. V. Deutsch, “Programs for kriging and sequential Gaussian simulation with locally varying anisotropy using nonEuclidean distances,” Comput. Geosci., vol. 37, no. 4, pp. 495–510, 2011.
[78] P. Augustinus and E. A. Colhoun, “Glacial history of the upper Pieman and Boco valleys, western Tasmania,” Aust. J. Earth Sci., vol. 33, no. 2, pp. 181–191, Jun. 1986. [79] A. V. Brown, Geology of the Dundas-Mt Lindsay-Mt Youngbuck Region. Rosny Park, Australia: Tasmanian Geological Survey Bulletin, 1986, GSB62. [80] A. W. McNeill and K. D. Corbett, Geology of the Tullah–Mt Block Area. Rosny Park, Australia: Tasmanian Department of Mines, 1989. [81] D. E. Leaman and R. G. Richardson, The Granites of West and North-West Tasmania—A Geophysical Interpretation. Rosny Park, Australia: Dept. Mines, 1989. [82] D. B. Seymour and C. R. Calver, Explanatory Notes for the Time–Space Diagram and Stratotectonic Elements Map of Tasmania. Rosny Park, Australia: Tasmanian Geological Survey, 1995. [83] M. R. Banks and P. W. Baillie, “Late Cambrian–Devonian,” in Geology and Mineral Resources of Tasmania, vol. 15, C. F. Burrett and E. L. Martin, Eds. Brisbane, Australia: Special Publication Geological Society of Australia, 1989, pp. 182–237. [84] C. A. Noll and M. Hall, “Structural architecture of the Owen Conglomerate, West Coast Range, western Tasmania: Field evidence for Late Cambrian extension,” Aust. J. Earth Sci., vol. 52, no. 3, pp. 411–426, Jun. 2005. [85] K. D. Corbett and M. Solomon, “Cambrian Mt Read Volcanics and associated mineral deposits,” in Geology and Mineral Resources of Tasmania, vol. 15, C. F. Burrett and E. L. Martin, Eds. Brisbane, Australia: Special Publication Geological Society of Australia, 1989, pp. 84–153. [86] D. B. Seymour, G. R. Green, and C. R. Calver, The Geology and Mineral Deposits of Tasmania: A Summary. Rosny Park, Australia: Mineral Resources Tasmania, Department of Infrastructure, Energy and Resources, 2013. [87] M. J. Rubenach, “The origin and emplacement of the Serpentine Hill Complex, Western Tasmania,” J. Geol. Soc. Aust., vol. 21, no. 1, pp. 91– 106, Mar. 1974. [88] R. F. Berry and A. J. 
Crawford, “The tectonic significance of Cambrian allochthonous mafic-ultramafic complexes in Tasmania,” Aust. J. Earth Sci., vol. 35, no. 4, pp. 523–533, Dec. 1988. [89] A. J. Crawford and R. F. Berry, “Tectonic implications of Late Proterozoic-Early Palaeozoic igneous rock associations in western Tasmania,” Tectonophysics, vol. 214, no. 1–4, pp. 37–56, 1992. [90] N. J. Turner, “Precambrian,” in Geology and Mineral Resources of Tasmania, vol. 15, C. F. Burrett and E. L. Martin, Eds. Brisbane, Australia: Special Publication Geological Society of Australia, 1989, pp. 5–46.

Matthew J. Cracknell received the B.Sc. degree in geology and geophysics (first class hons.) and the Ph.D. degree in computational geophysics from the School of Physical Sciences (Earth Sciences)/CODES, Faculty of Science, Engineering, and Technology, University of Tasmania, Hobart, Tas., Australia, in 2009 and 2014, respectively. His research interests include the use of machine learning for the classification of geological and geomorphological features. Dr. Cracknell is a member of the Geological Society of Australia, the Society of Exploration Geophysicists, and the International Association for Mathematical Geosciences.

Anya M. Reading received the B.Sc. degree (hons.) in geophysics with astrophysics from the University of Edinburgh, Edinburgh, U.K., and the Ph.D. degree in geophysics from the University of Leeds, Leeds, U.K., in 1991 and 1997, respectively. She has held technical research and academic positions with the British Antarctic Survey, The University of Edinburgh, and the Australian National University. Currently, she leads the Computational Geophysics and Earth Informatics Group, School of Physical Sciences, University of Tasmania (UTas), Hobart, Tas., Australia. Her research interests include computational approaches to global and regional geophysics. Dr. Reading is a member of learned societies including the Royal Astronomical Society, the American Geophysical Union, and the Institute of Physics.
She currently serves on the Australian Academy of Science National Committee for Earth Sciences.