SPE 168111

Ensemble Machine Learning: The Latest Development in Computational Intelligence for Petroleum Reservoir Characterization

Fatai A. Anifowose, SPE, King Fahd University of Petroleum and Minerals

Copyright 2013, Society of Petroleum Engineers

This paper was prepared for presentation at the SPE Saudi Arabia section Annual Technical Symposium and Exhibition held in Khobar, Saudi Arabia, 19–22 May 2013.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract

With the persistent quest for better prediction accuracy of petroleum reservoir properties, research in Computational Intelligence (CI) continues to evolve new techniques to meet this noble objective of petroleum reservoir characterization. In previous presentations, it was established that individual CI techniques are limited in their performance, as each has its respective areas of strength and weakness. The concept of Hybrid CI (HCI) was presented to overcome this problem, as it utilizes the strengths of two or more techniques to complement their respective weaknesses. However, HCI techniques are not able to integrate the various expert opinions on the optimization of CI techniques and those that exist in their respective fields of application. The ensemble learning paradigm is presented here as a possible solution.

The ensemble learning paradigm, also called the committee of learning machines, is the latest development in Computational Intelligence and Machine Learning technologies. It is the method of combining the outputs of several individual learners, with different hypotheses, employed to solve the same problem in order to produce an overall best result. The success of this paradigm is based on the belief that the decision of a committee of experts is better than that of a single expert. The ensemble method has been successfully applied in other fields such as bioinformatics, hydrology, time series forecasting, soil science, and control systems. Its benefits have not been well utilized in petroleum engineering.

As a continuation of what has become an "SPE Computational Intelligence Lecture Series" over the past two years, this paper presents an overview of the ensemble learning paradigm, a review of its successful application in other fields, a justification of its necessity in petroleum engineering, and a general framework for its successful application in reservoir characterization. This paper will benefit interested persons wishing to explore the exciting world of computational intelligence and to appreciate the benefits of its latest development.

Introduction

Over the years, researchers in the petroleum industry have gradually moved from the use of empirical equations and correlations through linear regression models to the use of Computational Intelligence (CI) techniques (Ali, 1994). CI techniques have been well applied in the petroleum industry over the years, especially in reservoir characterization, but at a pace that does not match the rate of advancement and the dynamics of machine learning technology. The degree of application of CI techniques is an indication of the relative increase in the level of awareness of, and interest in, the concept (Anifowose and Abdulraheem, 2011). It is, however, not impressive that the application of CI that has gained much ground in the industry has largely been limited to the Artificial Neural Networks (ANN) and Fuzzy Logic toolboxes (Jong-Se, 2005; Abdulraheem et al., 2007; Kaviani et al., 2008; Khoukhi et al., 2010). Little work has been done in the area of hybrid CI modeling (Abe, 2004; Mohsen et al., 2007; Shahvar et al., 2009; Weldu et al., 2010; Anifowose et al., 2011; Anifowose et al., 2013a) and almost none yet in the application of ensemble models.

However, both ANN and Fuzzy Logic have a number of deficiencies and limitations. Some of the reported deficiencies of ANN are (Petrus et al., 1995):

• There is no general framework to design the appropriate network for a specific task. The number of hidden layers and hidden neurons of the network architecture are determined mostly by trial and error.
• A large number of parameters are frequently required to fit a good network structure.
• ANN uses pre-defined activation functions without considering the properties of the phenomena being modeled.
• ANN is usually trapped in local optima. This results in its instability when executed several times over the same data and operating conditions.
Fuzzy Logic, especially Type-2, has been reported to have the following deficiencies (Jang, 1993; Mendel, 2003) as well:

• The three-dimensional nature of type-2 fuzzy sets makes them very difficult to draw.
• There is no simple collection of well-defined terms that lets us communicate effectively about type-2 fuzzy sets.
• Derivations of the formulas for the union, intersection, and complement of type-2 fuzzy sets all rely on Zadeh's extension principle, which is itself a difficult concept.
• Using type-2 fuzzy sets is computationally more complicated than using type-1 fuzzy sets.
• Since it is rule-based, the number of rules grows exponentially with increased dimensionality of the input data.
Various studies to address the problems of ANN through the development of other algorithms, such as Cascade Correlation Neural Networks and Radial Basis Function Neural Networks, did not improve its overall performance (Bruen and Yang, 2005). Later, the reality of the No-Free-Lunch theorem (Wolpert and Macready, 1995; Wolpert, 1996; Wolpert and Macready, 1997; Wolpert, 2001) in the machine learning paradigm drew the interest of researchers to the concept of hybrid machine learning. Yet hybrid techniques, like the individual CI techniques, can only address one hypothesis at a time. The petroleum characterization problem, however, is so complex and full of uncertainties, with such a diversity of expert opinions leading to diverse hypotheses, that it needs a more cooperative and broad-based solution: one that is able to incorporate and integrate the existing diversity of opinions to solve these complex problems. In light of this, although the performance of hybrid CI models has been reported to be better than that of each of their components (Abe, 2004; Mohsen et al., 2007; Anifowose et al., 2011), they are not robust enough to solve the complex and complicated problems in the petroleum industry.

Most recently, the ensemble machine learning methodology has become widely applied in other fields but has not been adequately applied in oil and gas reservoir characterization. The ensemble learning paradigm, an imitation of the human social learning behavior of seeking several opinions before making a decision (Re and Valentini, 2010), is the most recent CI tool for designing a "mixture of experts". It has proved relevant in solving some of the most challenging industrial problems and has become popular and reportedly successful in applications outside the petroleum industry (Nanni and Lumini, 2009; Zaier et al., 2009; Sun and Li, 2012). Its superior performance over the conventional method of learning with individual techniques has been confirmed in both classification and regression problems. The ensemble learning paradigm is an advancement in supervised machine learning technology. Though it has gained much ground on classification problems in many fields, it is still a new technology whose great benefits are yet to be tapped in the petroleum industry. In view of this, petroleum engineers need to embrace this technology and maximize its utility in the modeling and prediction of petroleum reservoir properties. Ensemble models are especially suitable for the petroleum industry, where data are usually scarce and the few available datasets are noisy.

Reservoir characterization is an essential process in the petroleum industry in which various properties of petroleum reservoirs are determined. This, in turn, supports more successful reservoir exploration, production, and management. As the quest for increased performance of predictive models in reservoir characterization continues, the ensemble methodology offers great potential for better and more robust predictive models. Even a marginal increase in the prediction accuracy of these petroleum properties can improve the efficiency of exploration and production of petroleum resources with less time and effort. A lot of data is generated and acquired in the petroleum industry due to the proliferation of various sensor-based logging tools such as Wireline, Logging-While-Drilling, and Measurement-While-Drilling, and seismic measurements of increasing dimensions.
Due to the high dimensionality that may be involved in the data acquired through these systems, the ensemble methodology is ideal for extracting useful knowledge from them without compromising expert opinions or model performance. For those outside the facilities who may not have access to these voluminous data, the ensemble methodology is still the ideal technique for managing the little data that may be available to them. The ensemble learning methodology is suited to handling both cases: too much data and too little data (Polikar, 2009). Ensemble models have the capability to combine different architectures of their base models, diverse data sampling methodologies, different recommended best input features, and various optimized parameters obtained from different experts in the optimization of estimates and predictions of petroleum reservoir properties.

The main objectives of this paper are:

• To give an overview of the ensemble technology for better understanding and possible application among petroleum engineers.
• To establish premises for the application of the ensemble learning paradigm in petroleum engineering.
• To suggest identified cases of possibilities for its continued application in the petroleum industry.
Towards achieving these objectives, an overview of the ensemble learning methodology is given, a review of successful applications of this learning methodology is presented, premises for the application of ensemble learning in petroleum reservoir characterization are established, various ensemble possibilities for petroleum reservoir characterization are discussed, and a distinction is made between ensemble machine learning and the Ensemble Kalman Filter.

Overview of Ensemble Learning Methodology

Ensemble learning is the methodology by which multiple diverse expert hypotheses are strategically incorporated and smartly combined to solve a problem. The idea of combining the "opinions" of different "experts" (Rokach, 2009) to obtain an overall "ensemble" decision is justified by its association with human behavior, where the judgment of a committee is believed to be superior to those of individuals, provided the individuals have reasonable competence (Re and Valentini, 2010). The ensemble method was initially introduced for classification and clustering purposes (Caragea et al., 2007) and later extended and applied to time series prediction problems (Landassuri-Moreno and Bullinaria, 2009). The ensemble learning paradigm is basically used to improve the performance of a model by utilizing the best instances of expert knowledge while reducing the possibility of an unfortunate emergence of a poor decision. This resembles the way we solve problems as humans, using the committee system of making decisions. The ensemble methodology makes the selection of such candidate models (representing the best hypotheses) more confident, less risky, and unbiased. A generalized flowchart for the ensemble paradigm is shown in Figure 1.

As a methodology originally proposed for classification and clustering problems, ensemble learning was successfully implemented in the Adaptive Boosting technique and later extended to regression problems in the form of the Bootstrap Aggregate method. The traditional ensemble method is the Bootstrap Aggregate, commonly shortened to bagging (Breiman, 1996). Bagging involves training each of the ensemble base learners on a subset that is randomly drawn from the training data with replacement, while giving each data sample equal weight (Liang et al., 2011). This method trains a set of weak learners and combines their outputs using any of the algebraic combination rules such as Max(), Min(), Mean(), etc., as illustrated in the sketch below. The traditional combination rule in regression problems is Mean(), which is equivalent to the Mode() rule in classification problems.

The first implementation of the bagging method is found in Random Forest, an ensemble model made up of several instances of Decision Tree models (Breiman, 2001). This technique begins by building a single Decision Tree (DT) model and then creating more instances of this Tree by taking random samples (with replacement) of a portion of the dataset to train each Tree, with the remaining portion used for testing. This goes on for each Tree until the minimum node count is reached, in order to avoid the overfitting that comes with a larger number of Trees. Thus, the collection of these Tree models makes the Forest. The results of all Trees are combined, and the averages of all the performance criteria become the performance indices of the ensemble Forest.
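To make these combination rules concrete, the following minimal Python sketch (an illustration only; the numeric predictions and lithofacies labels are hypothetical, not from the paper) applies the Max(), Min(), Mean(), and Mode() rules to the outputs of three base learners:

import numpy as np
from statistics import mode

# Hypothetical predictions of a reservoir property from three base learners
base_outputs = np.array([12.4, 11.9, 13.1])

print(base_outputs.max())    # Max() rule
print(base_outputs.min())    # Min() rule
print(base_outputs.mean())   # Mean() rule: the traditional choice for regression

# For classification, the analogous Mode() (majority-vote) rule:
class_votes = ['sand', 'shale', 'sand']   # hypothetical lithofacies votes
print(mode(class_votes))                  # -> 'sand'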
The major motivation for the ensemble learning paradigm is the statistically sound argument that the paradigm is part of human daily life: we ask the opinions of several experts before making a decision; we seek the opinions of several doctors before accepting a medical procedure; we read user reviews before choosing a web hosting service provider; we evaluate referees' reports before hiring employees; manuscripts are reviewed by experts before being accepted or rejected; etc. In each case, the primary objective is to minimize the error associated with the final decision by combining the individual decisions of each respondent (Polikar, 2009).

The ensemble learning methodology can be applied to regression problems such as the prediction of porosity, permeability, optimal well placement, history matching, diagenesis, wellbore stability, pressure, temperature, volume, etc. For classification problems, the ensemble learning concept can be applied to the identification of lithofacies, the detection of drilling problems, the identification of hydraulic flow units, etc. For more details about ensemble combination rules and measures of model diversity, readers are referred to Rokach (2009); Re and Valentini (2010); Breiman (1996; 2001); Polikar (2006; 2009; 2012); Lofstrom et al. (2008); Minku et al. (2010); and Wang and Yao (2013).

The Bagging Method

The bagging method for regression problems works by giving the contribution of each base learner in the ensemble model an equal weight. In order to improve model variance, bagging trains each model in the ensemble using a subset that is randomly drawn from the training set with replacement. The results from the base learners are then averaged over all the base learners to obtain the overall result of the ensemble model.

The main concept of using the bagging method to increase the prediction accuracy of ensembles is similar to reducing high-variance noise using a moving average filter that averages each sample of the data over all available samples. The noise component is averaged out while the information content of the entire data is unaffected by the averaging operation (Polikar, 2012). When the prediction errors made on the data samples are averaged out, the error of the overall output is reduced. Prediction errors are composed of two controllable components: the accuracy of the model (bias) and the precision of the model when trained on different training sets (variance). Therefore, since averaging has a smoothing (variance-reducing) effect, the goal of bagging-based ensemble systems is to create several learners with relatively fixed (or similar) bias and then use the averaging combination rule on the individual outputs to reduce the variance. This is the statistical justification for the bagging method, illustrated numerically below.
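A small numeric illustration of this variance-reducing effect, using synthetic numbers (an assumption for illustration, not reservoir data), is given below:

import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0

# 25 noisy "base learner" estimates of the same quantity, each with
# similar bias (none here) but high variance
estimates = true_value + rng.normal(scale=1.0, size=25)

print(estimates.var())    # variance of the individual estimates (about 1.0)
print(estimates.mean())   # the averaged (ensemble) estimate is much closer to 5.0

# Averaging N such estimates reduces the variance of the combined
# estimate by a factor of about N, while leaving the signal intact.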
The bagging methodology is implemented using the following procedure:

    Set N to the number of desired iterations.
    Set T to the desired percentage of data for bootstrapped training data.
    Do for n = 1 to N:
        Randomly extract T% of the data for training.
        Use the training data to train the SVM model, Sn.
        Use the remaining (100 − T)% test data to predict the target variables.
        Keep the result of the above as hypothesis Hn.
    Continue.
    Compute the average of all hypotheses, $H_{final}(x)$, using the Mean() rule:

$$H_{final}(x) = \mu(x) = \frac{1}{N}\sum_{n=1}^{N} H_n(x)$$
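A minimal Python sketch of this procedure is given below, assuming scikit-learn's SVR as the base SVM learner; the synthetic data and the chosen values of N and T are illustrative assumptions, not values from the paper, and the per-iteration hold-out evaluation is omitted for brevity:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Illustrative stand-in for well-log inputs X and a target reservoir property y
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -1.2, 0.8, 2.0]) + rng.normal(scale=0.1, size=200)

N = 25                            # number of bagging iterations (assumed)
T = 70                            # percentage of data per bootstrapped training set (assumed)
n_train = int(T / 100 * len(X))

models = []
for n in range(N):
    # Randomly extract T% of the data (with replacement) and train model Sn
    idx = rng.integers(0, len(X), size=n_train)
    models.append(SVR(kernel='rbf').fit(X[idx], y[idx]))

def h_final(X_new):
    # Combine the hypotheses Hn with the Mean() rule
    return np.mean([m.predict(X_new) for m in models], axis=0)

print(h_final(X[:5]))             # ensemble predictions for the first five samples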
The Random Forest Technique

The Random Forest technique is an implementation of the bagging method in Classification And Regression Trees (CART), commonly called Decision Trees. Its counterpart for classification is called Boosting, which was implemented in the AdaBoost technique. Random Forest is an ensemble learning-based technique that consists of a bagging of un-pruned Decision Tree learners (Breiman, 1996) with a randomized selection of input data samples and predictors. The algorithm (Breiman, 2001) is based on the bagging technique developed by Breiman (1996) and the randomized feature selection developed by Ho (1995; 1998). More details about Decision Trees can be found in Sherrod (2008), and cases of successful application can be found in Park et al. (2010) and Leibovici et al. (2011). Random Forest has been shown to be effective and accurate (Caruana, 2008).

The algorithm of Random Forest is presented as follows. Starting with a tree:

a. Set N = number of training cases.
b. Set M = number of features.
c. Select a subset m of input variables such that m ≪ M.
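As a concrete illustration of the technique described above, the following minimal Python sketch builds a bagged forest of un-pruned Decision Trees with randomized feature selection, assuming scikit-learn's RandomForestRegressor; the synthetic data and parameter values are hypothetical, not the paper's:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative stand-in for N = 300 training cases with M = 8 features
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.2, size=300)

forest = RandomForestRegressor(
    n_estimators=100,      # number of un-pruned Trees in the Forest
    max_features='sqrt',   # random subset m of the M features at each split (m << M)
    bootstrap=True,        # each Tree trained on a random sample drawn with replacement
    oob_score=True,        # remaining (out-of-bag) samples used for testing
    random_state=0,
).fit(X, y)

print(forest.oob_score_)   # out-of-bag R^2 of the combined Forest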