Combining Machine Learning and Geophysical ... - CSIRO Publishing

Combining Machine Learning and Geophysical Inversion for Applied Geophysics

Anya M. Reading
UTAS / CODES, Private Bag 79, Hobart, TAS, 7001
[email protected]

Matthew J. Cracknell
UTAS / CODES, Private Bag 79, Hobart, TAS, 7001
[email protected]

Daniel J. Bombardieri
Mineral Resources Tasmania, PO Box 56, Rosny Park, TAS, 7018
[email protected]

Tim Chalke
Mira Geoscience, 39 Sherwood Road, Toowong, Brisbane, QLD, 4066
[email protected]

SUMMARY

Machine learning and geophysical inversion both represent ways that the applied geophysicist might gain knowledge from field observations and remotely sensed data. The two approaches represent contrasting philosophies based respectively on statistics and physics. Both potentially add insights which might help constrain 3D geology by geophysical means. Machine learning uses patterns in data to provide statistically controlled predictions, e.g. of lithology. In contrast, geophysical inversion relies on modelling the physical response of 3D geological block geometry in a deterministic manner. Although both approaches are widely used, it is not currently commonplace in applied geosciences to make use of a combined approach. We present an example which aims to refine the 3D geology in a prospective region of west Tasmania. Although the region is geologically well-mapped, thick vegetation and significant topography present a challenging set of conditions under which to refine the lithology and block geometry to a level of detail which will support the next generation of exploration. We use multiple layers of remotely sensed geophysical data to provide probabilistic information on near-surface lithology extent using the Random Forests classifier. We show how the statistically robust output from the machine learning exercise can be used to guide the construction of improved volume geometry within a 3D GOCAD geological and geophysical modelling environment. This enables better constraints to be supplied to the geophysical inversion, with resulting improvements in the detail of the 3D geology.

Key words: Machine learning, Data mining, Modelling and inversion, Lithology, 3D Geological mapping.

INTRODUCTION

Applied geophysicists aim to improve knowledge of 3D geological volume geometries and rock properties. Any 'knowledge discovery' task may be accomplished using one of the wide variety of computational statistics techniques that are encompassed by the discipline of machine learning (Hastie et al., 2009; Cracknell and Reading, 2014; Cracknell et al., 2014). More conventionally, we can take a deterministic approach whereby the physical response of a system is measured and, by a process of geophysical inversion, we match observations to a modelled response and thereby find a best-fit estimation of parameters (Aster et al., 2013; Fullagar et al., 2004). This contribution presents an example whereby details of a 3D geological model are improved by using a workflow which combines machine learning and geophysical inversion.

DATA

A total of 23 geophysical data layers derived from both airborne and satellite sensors were prepared for input into Random Forests. Total Magnetic Intensity (nT) data were Reduced-to-Pole (RTP) with a linear regional trend removed, and the RTP 1st vertical derivative was calculated. Other airborne sensor data comprised radiometrics [K (%), Th (ppm), U (ppm) and log(K/Th)] and a Digital Elevation Model (DEM). Landsat ETM+ data (7 bands) and band ratios (3/1, 3/2, 3/5, 3/7, 5/1, 5/2, 5/4, 5/7 and 5x4/3x4) were obtained from a single cloud-free scene. The 1:25,000 scale geological map (Mineral Resources Tasmania, 2011) of the study region in the prospective Mount Read Volcanics of the Dundas element of Tasmania (Figure 1) was simplified to three classes prior to Random Forests training and testing (Figure 2a). These classes represent generalised geological units: U (incorporating the Cambrian Ultramafic-Mafic Complex and equivalent rocks), Q (all unconsolidated Quaternary sediments), and O (other rocks).

ASEG-PESA 2015 – Perth, Australia

Figure 1. Study area location (black rectangle) and major tectonic units exposed in Tasmania, SE Australia.

Combining Machine Learning and Geophysical Inversion

Reading et al.

All input data were resampled to 40 m resolution (if necessary), low pass filtered using 3x3 mean spatial filters and then standardised to zero mean and unit variance. Inputs that were highly correlated (mean Pearson’s correlation coefficient >0.9) with a large proportion of the other inputs were removed, leaving a total of 14 inputs available for Random Forests training and classification (Figure 2b). A total of 437,500 samples were obtained, representing pixel centroids covering the study area. Training data and test data (independent of the training data) contained stratified random samples of the U and O classes (excluding Q). Training data included 500 samples and test data 5000 samples for each class.
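The preprocessing steps described above can be sketched in Python with NumPy. This is an illustration only: the authors worked in R, and the paper does not spell out its exact correlation-pruning procedure, so the greedy rule in `drop_correlated` (and all function names and the synthetic data) are assumptions made for this example.

```python
import numpy as np

def mean_filter_3x3(grid):
    """Smooth a 2D grid with a 3x3 moving average (edges padded by replication)."""
    padded = np.pad(grid, 1, mode="edge")
    rows, cols = grid.shape
    total = np.zeros_like(grid, dtype=float)
    for dr in range(3):
        for dc in range(3):
            total += padded[dr:dr + rows, dc:dc + cols]
    return total / 9.0

def standardise(column):
    """Rescale values to zero mean and unit variance."""
    return (column - column.mean()) / column.std()

def drop_correlated(features, threshold=0.9):
    """Return indices of columns kept after greedy correlation pruning.

    Hypothetical rule: while any pair has |r| above the threshold, take the
    most correlated pair and drop the member with the higher mean |r|
    against all remaining columns.
    """
    keep = list(range(features.shape[1]))
    while len(keep) > 1:
        corr = np.abs(np.corrcoef(features[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)
        if corr.max() <= threshold:
            break
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        worst = i if corr[i].mean() >= corr[j].mean() else j
        keep.pop(worst)
    return keep

# Synthetic demonstration: column 3 is a near-copy of column 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
features = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=200)])
kept = drop_correlated(features)
```

In this toy case one of the two near-duplicate columns is discarded and the independent columns are retained, which mirrors the reduction from 23 to 14 inputs in spirit, though not necessarily in detail.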

METHODS

Machine Learning using Random Forests

Random Forests (Breiman, 2001) is an ensemble supervised classifier that induces multiple randomised decision tree classifiers, known as a forest. Randomness is introduced in two ways: by randomly subsetting a predefined number of input variables (mtry, which defaults to the square root of the number of variables) as candidates for the split at each node of a tree, and by bagging. Bagging generates the training samples for each tree by sampling with replacement a number of samples equal to the number of instances in the training data. This equates to approximately two-thirds of instances being available for training each tree, while the remainder, the so-called Out-of-Bag (OOB) samples, are used for evaluation. The Gini Index is used by Random Forests to determine a “best split” threshold at each tree node. The Gini Index is defined as:

$$\mathrm{Gini}(t) = \sum_{c=1}^{j} g_c (1 - g_c), \tag{1}$$

where $g_c$ is the probability of class $c$ at node $j$,

$$g_c = \frac{n_c}{n}, \tag{2}$$

and $n_c$ is the number of samples belonging to $c$ and $n$ is the total number of samples within $j$. For each candidate split, the threshold $t$ that results in the maximum reduction in class heterogeneity is selected (Breiman et al., 1984). The class membership probability $p_c$ that a given sample $i$ is one of $c$ in the training data is estimated by dividing the total number of votes for each class by the number of trees $T$ (Hastie et al., 2009),

$$p_c = \frac{1}{T} \sum_{c=1}^{T} y_{ic}. \tag{3}$$

Random Forests was trained using mtry = 3 (default) and 5000 trees. Figure 2c shows that the OOB Error (U = 0.082; O = 0.042; and Average = 0.062) reaches a stable minimum beyond 4000 trees. The trained Random Forests classifier was then applied to all samples within the study area.
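The split criterion and the vote-based probability described in this section can be illustrated with a small Python sketch. It is not the randomForest implementation the authors used: the function names and toy data are hypothetical, a single input variable is searched, and child nodes are weighted by sample count following the usual CART convention, which is an assumption about detail the text does not state.

```python
import numpy as np

def gini(labels):
    """Gini index of a node: sum over classes of g_c * (1 - g_c), with g_c = n_c / n."""
    _, counts = np.unique(labels, return_counts=True)
    g = counts / counts.sum()
    return float(np.sum(g * (1.0 - g)))

def best_split(values, labels):
    """Return (threshold, reduction) giving the largest drop in weighted Gini."""
    parent = gini(labels)
    best_t, best_gain = None, 0.0
    unique_vals = np.unique(values)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for t in (unique_vals[:-1] + unique_vals[1:]) / 2.0:
        left, right = labels[values <= t], labels[values > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if parent - child > best_gain:
            best_t, best_gain = float(t), parent - child
    return best_t, best_gain

def class_probabilities(votes, classes):
    """Fraction of trees voting for each class for one sample (one vote per tree)."""
    votes = np.asarray(votes)
    return {c: float(np.mean(votes == c)) for c in classes}

# Toy node with one input variable and two classes.
values = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
labels = np.array(["O", "O", "O", "U", "U", "U"])
threshold, reduction = best_split(values, labels)

# Four hypothetical trees voting on a single sample.
probs = class_probabilities(["U", "U", "O", "U"], classes=("U", "O"))
```

Here the parent node is evenly mixed (Gini index 0.5) and the split midway between 0.3 and 0.8 separates the classes completely, so the reduction equals the full parent heterogeneity. Three of the four votes go to U, giving a U probability of 0.75.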

Figure 2. a) Simplified geological map showing training data sample locations for U – “Ultramafic” and O – “Other” geological units. Note that Q - “Quaternary” was excluded from the training data. b) Plot of Random Forests ranked variable importance for non-correlated input data expressed as mean decrease in Gini Index. c) Random Forests Out-of-Bag Error estimates as a function of the number of trees.
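The "approximately two-thirds" in-bag proportion quoted for bagging, and hence the share of samples left Out-of-Bag for the error estimates in Figure 2c, follows from a short calculation. The derivation below is a standard result rather than something stated in the paper; $n$ denotes the number of training instances, and the value $n = 1000$ assumes 500 training samples in each of the two classes.

```latex
% Probability that a given instance is never drawn in n draws with replacement
P(\text{instance is OOB}) = \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[n \to \infty]{}\; e^{-1} \approx 0.368 ,
\qquad
P(\text{instance is in-bag}) = 1 - \left(1 - \frac{1}{n}\right)^{n} \approx 0.632 .

% For n = 1000 the finite-sample value is already close to the limit
\left(1 - \frac{1}{1000}\right)^{1000} \approx 0.368 .
```

Each tree therefore sees roughly 63% of the distinct training instances, and the remaining 37% or so provide an evaluation set that the tree was not trained on.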



3D Geophysical Modelling and Inversion

The deterministic modelling of 3D geology was carried out using the GOCAD™ software environment to create geologically realistic volumes that were assigned appropriate rock property values (McGaughey, 2006). As these methods are more widely used by the applied geophysics community than machine learning, we present only a brief summary here. After the painstaking process of creating a 3D geological model, based on field observations and balanced structural geology cross sections, the rock property values and the geometry of the volumes were optimised by constrained inversion using petrophysical data and mapped geology (Fullagar and Pears, 2007; Duffett et al., 2013). In geophysical modelling and inversion, an initial model is adjusted to improve the fit between the observed and modelled (deterministic) geophysical response (Fullagar et al., 2008). The analyst's decisions about which model parameters to constrain, and which to allow to vary, may be significantly assisted by the output from the machine learning process described herein. We suggest that a workflow in which machine learning is used first, to improve detail in the constraints supplied to the geophysical inversion, is a well-founded strategy. In this case, the shallow geometry constraints are improved in parts of the model covered by dense vegetation or Quaternary cover.
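The idea of adjusting an initial model to fit observations, while holding some parameters close to prior values and letting others vary, can be shown with a toy linear problem in Python. This is a sketch only and does not represent the GOCAD workflow or the inversion software used by the authors: the forward operator `G`, the weights and all values are hypothetical, and real geometry inversion is nonlinear.

```python
import numpy as np

def constrained_inversion(G, d_obs, m_ref, weights):
    """Solve min ||G m - d_obs||^2 + ||W (m - m_ref)||^2 with diagonal W.

    Large weights hold a parameter close to its reference value (constrained);
    small weights let the data adjust it (free to vary).
    """
    W2 = np.diag(np.asarray(weights, dtype=float) ** 2)
    lhs = G.T @ G + W2
    rhs = G.T @ d_obs + W2 @ m_ref
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(1)
G = rng.normal(size=(30, 4))             # hypothetical linear forward operator
m_true = np.array([1.0, 0.5, -0.3, 2.0])
d_obs = G @ m_true                       # noise-free synthetic observations

# Initial model: parameters 0 and 3 are assumed well known, 1 and 2 are not.
m_ref = np.array([1.0, 0.0, 0.0, 2.0])
weights = np.array([100.0, 1e-3, 1e-3, 100.0])

m_est = constrained_inversion(G, d_obs, m_ref, weights)
misfit_start = np.linalg.norm(G @ m_ref - d_obs)
misfit_final = np.linalg.norm(G @ m_est - d_obs)
```

In the workflow proposed here, the choice of which entries receive large weights is the kind of decision that class probabilities from machine learning could inform, for example by tightening constraints where a lithology is predicted with high probability.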

RESULTS

The independent test data error assessment of Random Forests classifications (Table 1) indicates an overall error (±95% Confidence Limits) of 0.062 ±0.005. Figure 3 shows the spatial distribution of the probability that class U (ultramafic rocks) is at or near the surface. There is a high degree of spatial correlation between samples that are class U and the extent of ultramafic rocks (see Figure 2a). At locations A and B (marked on Figure 3) it is highly probable that ultramafic rocks are present but obscured by shallow Quaternary sediments. Figures 4a and 4b compare a 3D volume of ultramafic rock magnetic susceptibility, output from a heterogeneous geophysical (magnetic) property inversion conducted in GOCAD™, to the probability that class U is at or near the surface. Locations A and B correspond to regions of anomalously high magnetic susceptibility within the 3D ultramafic volume. In addition, the U class probabilities at locations A and B indicate substantial differences between the modelled 3D ultramafic volume and its likely surface coverage. The magnetic residuals in Figure 4c confirm that changes to the geometry of ultramafic rocks are necessary to construct better fitting 3D volumes. Hence, the Random Forests U class probabilities can be used to guide the construction of model volumes of more accurate extent.
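The error figures quoted above can be reproduced from the Table 1 counts with a few lines of Python. The layout assumed here (rows as predictions, columns as reference classes) is the one consistent with the reported per-class errors, and the normal approximation used for the 95% limits is an assumption, since the paper does not state how the limits were derived.

```python
import math

# Counts from Table 1: outer keys are predictions, inner keys are reference classes.
confusion = {"U": {"U": 4628, "O": 246},
             "O": {"U": 372, "O": 4754}}

n = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[c][c] for c in confusion)
overall_error = 1.0 - correct / n

# Per-class prediction error: share of samples predicted as a class
# whose reference label is the other class.
prediction_error = {c: 1.0 - confusion[c][c] / sum(confusion[c].values())
                    for c in confusion}

# 95% half-width under a normal approximation to the binomial (assumed).
half_width = 1.96 * math.sqrt(overall_error * (1.0 - overall_error) / n)
```

Rounded to three decimal places, these give an overall error of 0.062 with a half-width of 0.005, and prediction errors of 0.050 for U and 0.073 for O, matching the reported values.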

Figure 3. Spatial distribution of Random Forests output probabilities that ultramafic rocks are at or near the surface.

DISCUSSION

With the goal of inferring 3D geology using geophysical data, including remote sensing data, applied geophysicists now have multiple computational tools at their disposal. These tools provide alternative ways of learning from data, and lend themselves naturally to different data types, with machine learning being well suited to high-dimensional data. Random Forests is a good first choice of algorithm for geological classification problems characterised by the challenging combination of high intra-class and low inter-class variability (Cracknell and Reading, 2014). Importantly, Random Forests is relatively straightforward for a non-expert in machine learning to implement. Using machine learning prior to geophysical inversion allows information from a greater variety of data sources to be used to improve detail in 3D geological mapping.

                 Reference U    Reference O    Prediction Error
Prediction U     4628           246            0.050
Prediction O     372            4754           0.073

Table 1. Test data confusion matrix showing the counts of misclassified and correctly classified reference samples (n = 10,000) for two classes – U (ultramafic) and O (other).

CONCLUSIONS


We have demonstrated that the spatial distribution of Random Forests class membership probabilities can be used to better constrain the geometry of a key lithological unit under cover. This information can be used to resolve the origins of anomalies observed in the residual maps obtained from 3D geophysical property inversions. Using the outputs of machine learning supervised classifications of remotely sensed data to guide 3D geophysical inversions enables the generation of more accurate 3D geological models. The approaches that we have introduced will be of significant utility in regions where the geometry of near-surface bedrock units is difficult to constrain.


Figure 4. Perspective views showing the Random Forests U class probability draped over topography (a and b), the 3D volume representing ultramafic rocks (b and c), and the magnetic residuals (c). The ultramafic volume indicates low (blue) and high (red) magnetic susceptibility generated from a heterogeneous 3D geophysical (magnetic) property inversion. A and B (magenta circles) - see Figure 3 and main text. Lower surface (light green) indicates granite.

ACKNOWLEDGMENTS

We use the R project for statistical computing (http://www.r-project.org), the R package randomForest (Liaw and Wiener, 2002), and caret (Kuhn, 2014) for the selection of variables and evaluation of outputs. Airborne data (westtas2001) are from Mineral Resources Tasmania (http://www.mrt.tas.gov.au) and Landsat ETM+ imagery is from the United States Geological Survey (http://eros.usgs.gov). Published with the permission of the Director, Mineral Resources Tasmania. Trademarks: Random Forests™/Breiman and Cutler; GOCAD™/Mira Geosciences.

REFERENCES

Aster, R.C., Borchers, B. and Thurber, C.H., 2013, Parameter Estimation and Inverse Problems, 2nd ed. Academic Press, Elsevier, MA, USA.

Breiman, L., 2001, Random Forests. Machine Learning, 45, 5-32.

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984, Classification and Regression Trees, The Wadsworth & Brooks/Cole Statistics/Probability Series, Pacific Grove, USA.

Cracknell, M.J. and Reading, A.M., 2014, Geological mapping using remote sensing data: a comparison of five machine learning algorithms, their response to variation in the spatial distribution of training data and the use of explicit spatial information. Computers and Geosciences, 63, 22-33.

Cracknell, M.J., Reading, A.M. and McNeill, A.W., 2014, Mapping geology and volcanic-hosted massive sulphide alteration in the Hellyer-Mt Charter region, Tasmania, using Random Forests and Self-Organising Maps. Australian Journal of Earth Sciences, 61, 287-304.

Duffett, M., Bombardieri, D. and Chalke, T., 2013, Next generation 3D geological and geophysical modelling, west Tasmania. ASEG-PESA 2013, doi:10.1071/ASEG2013ab246.

Fullagar, P.K., Pears, G.A., Hutton, D. and Thompson, A., 2004, 3D Gravity and Aeromagnetic Inversion for MVT Lead-Zinc Exploration at Pillara, WA. Expl. Geophys., 35, 142-146.

Fullagar, P.K. and Pears, G.A., 2007, Towards geologically realistic inversion: in Proceedings of Exploration 07: Fifth Dec. Int. Conf. on Min. Expl., B. Milkereit (ed.), 444-460.

Fullagar, P.K., Pears, G.A. and McMonnies, B., 2008, Constrained inversion of geological surfaces - pushing the boundaries. The Leading Edge, 27, 98-105.

Hastie, T., Tibshirani, R. and Friedman, J.H., 2009, The elements of statistical learning: data mining, inference and prediction, 2nd ed., Series in Statistics. Springer, New York, USA.

Kuhn, M., 2014, caret: Classification and Regression Training.

Liaw, A. and Wiener, M., 2002, Classification and Regression by randomForest. R News, 2, 18-22.

McGaughey, J., 2006, The Common Earth Model: A Revolution in Mineral Exploration Data Integration: in GIS Applications in the Earth Sciences, Geological Association of Canada Special Publication 44, ed. Harris, J.R., 567-576.

Mineral Resources Tasmania, 2011, 1:25,000 Scale Digital Geology of Tasmania.

