Ubiquitous Computing and Communication Journal

HOT METHOD PREDICTION USING SUPPORT VECTOR MACHINES

Sandra Johnson, Dr. S. Valli
Department of Computer Science and Engineering, Anna University, Chennai – 600 025, India.
[email protected], [email protected]

ABSTRACT

Runtime hot method detection is an important dynamic compiler optimization parameter, and it has challenged researchers to explore and refine techniques that address the expensive profiling overhead incurred in the process. Although the recent trend has been toward applying machine learning heuristics in compiler optimization, their role in the identification and prediction of hot methods has been ignored. The aim of this work is to develop a model, using the Support Vector Machine (SVM) machine learning algorithm, that identifies and predicts the hot methods in a given program, to which the best set of optimizations could then be applied. When trained with ten static program features, the derived model predicts hot methods with an appreciable 62.57% accuracy.

Keywords: Machine Learning, Support Vector Machines, Hot Methods, Virtual Machines.

1 INTRODUCTION

Optimizers depend on profile information to identify the hot methods of program segments. The major inadequacy of dynamic optimization techniques is the high cost of accurate data profiling via program instrumentation; the central challenge is to minimize the overhead of profile collection, optimization strategy selection and re-optimization. While there is a significant amount of work on cost-effective and performance-efficient machine learning (ML) techniques for tuning individual optimization heuristics, relatively little has been done on identifying and predicting frequently executed program hot spots with machine learning algorithms so as to target the best set of optimizations. This study proposes a machine learning based predictive model built on the Support Vector Machine (SVM) classifier. Ten features are derived from the chosen domain knowledge for training and testing the classifier. The training data set is collected from the SPEC CPU2000 INT and UTDSP benchmark programs. The SVM classifier is trained offline with the training data set and is then used to predict the hot methods of programs on which it has not been trained. The system is evaluated for its hot method prediction accuracy.

This paper is structured as follows. Section 2 discusses related work. Section 3 gives a brief overview of Support Vector Machines. Section 4 describes this approach, and Section 5 the evaluation methodology. Section 6 presents the results of the evaluation. Section 7 proposes future work and concludes the paper.


2 RELATED WORK

Machine learning techniques are currently used to automate the construction of well-designed individual optimization heuristics. In addition, the search is on for automatic detection of the program segments to target for optimization. While, to the best of our knowledge, no previous work has used ML to predict program hot spots, this section reviews the research that uses ML for compiler optimization heuristics. In a recent review of the challenges confronting dynamic compiler optimizers, Arnold et al. [1] give a detailed survey of the adaptive optimizations used in virtual machine environments. They conclude that feedback-directed optimization techniques are not well used in production systems. Shun Long et al. [3] have used the instance-based learning algorithm to identify the best transformations for each program. For each optimized program, a database stores the transformations selected, the program features and the resulting speedup; the aim is to apply appropriate transformations when a new program is encountered. Cavazos et al. [4] have applied an offline ML technique to decide whether or not to inline a method. Their adaptive system uses online profile data to identify "hot methods", and method calls in the hot methods are inlined using the ML heuristics.


Cavazos et al. [5, 12] have also used supervised learning to decide which optimization algorithm to use for register allocation: either graph coloring or linear scan. They use three categories of method-level features for the ML heuristics, i.e., features of the edges of a control flow graph, features related to live intervals and, finally, statistical features about the size of a method. Cavazos et al. [11] report that the best compiler optimizations are method-dependent rather than program-dependent. Their paper describes how a logistic regression-based machine learning technique, trained using only the static features of a method, is used to automatically derive a simple predictive model that selects the best set of optimizations for individual methods within a dynamic compiler. They take the structure of a particular method within a program into consideration to develop a sequence of optimization phases. The automatically constructed regression model is shown to outperform hand-tuned models. Cavazos et al. [20] have used supervised learning to identify basic blocks for instruction scheduling, and Monsifrot et al. [2] have used a decision tree learning algorithm to identify loops for unrolling. Most of this work [4, 5, 11, 12, 20] is implemented and evaluated using the Jikes RVM.

The authors of [8, 19] have used genetic programming to choose an effective priority function which prioritizes the various compiler options available. They chose hyper-block formation, register allocation and data pre-fetching for evaluating their optimizations. Agakov et al. [9] have applied machine learning to speed up search-based iterative optimization. The statistical technique of Principal Component Analysis (PCA) is used in their work for appropriate program feature selection. Program features collected offline from a set of training programs are used for learning by the nearest neighbor algorithm; features are then extracted for a new program and processed by the PCA before being classified with the nearest neighbor algorithm. This reduces the search space, from the various available source-level transformations, to a few good transformations for the new program. However, this model can be applied only to whole programs.

The authors of [10] present a machine learning based model to predict the performance of a modified program using static source code features along with features, such as the execution frequencies of basic blocks, extracted from collected profile data. As in [9], the authors use the PCA to reduce the feature set. A linear regression model and an artificial neural network model are used to build the predictor, which is shown to work better than non-feature-based predictors. Fursin et al. [14] have used machine learning to identify the best procedure clone for the current run of a program. M. Stephenson et al. [18] have used two machine learning algorithms, the nearest neighbor (NN) and Support Vector Machines (SVMs), to predict the loop unroll factor. None of these approaches aims at prediction at the method level. However, machine learning has been widely used in work on branch prediction [21, 22, 23, 24].

3 SUPPORT VECTOR MACHINES

The SVM [15, 16] classifier maps training data (xi, yi), i = 1, ..., n, where each instance is a vector of feature values xi ∈ Rn and a class label yi ∈ {+1, -1}, into a higher-dimensional feature space Φ(x) and defines a separating hyperplane. The SVM is a binary classifier, so only two types of data can be separated. Fig. 1 shows a linear SVM hyperplane separating two classes. The linear separation in the feature space is done using the dot product Φ(x)·Φ(y). Positive definite kernel functions k(x, y) correspond to feature space dot products and are therefore used in the training algorithm in place of the explicit dot product, as in Eq. (1):

k(x, y) = Φ(x) · Φ(y)    (1)

The decision function computed by the SVM is given in Eq. (2):

f(x) = ∑i=1..n vi · k(x, xi) + b    (2)

where b is a bias parameter, the xi are the training examples and the coefficients vi are the solution of a quadratic optimization problem; that solution maximizes the margin of separation extending from the hyperplane.
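For reference, the kernel used in this study's experiments is the radial basis function (Section 4.5). Its standard form, as defined in libSVM [7] rather than spelled out in the text, is

k(x, y) = exp(-γ ‖x - y‖²),

where γ > 0 is a width parameter; libSVM's customary default is the reciprocal of the number of features.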

Figure 1: Optimal hyperplane and margin of separation in the feature space

4 HOT METHOD PREDICTION

This section briefly describes how machine learning can be used to develop a model that predicts the hot methods within a program. A discussion of the approach is followed by the scheme of the SVM-based strategy adopted in this study.

4.1 The approach

Static features of each method in a program are collected by offline program analysis. Each set of method-level features forms a feature vector, which is labeled either hot or cold based on classification by a prior execution of the program. The training data set thus generated is used to train the SVM-based predictive model. Next, a test data set is created by offline program analysis of a newly encountered program, and the trained model is used to predict whether each method of the new program is hot or cold. An offline analysis of the programs' Low Level Virtual Machine (LLVM) [6] bytecode representation provides both the training and the test data sets. The system architecture of the SVM-based hot method predictive model is shown in Fig. 2; it closely resembles the architecture proposed by C.-L. Huang et al. [26]. Fig. 3 outlines the strategy for building the predictive model.

Figure 2: System architecture of the SVM-based hot method predictive model

1. Create the training data set.
   a. Collect method-level features.
      i. Calculate the specified features for every method in an LLVM bytecode module.
      ii. Store the feature set in a vector.
   b. Label each method.
      i. Instrument each method in the program with a counter variable [25] (see the sketch after this outline).
      ii. Execute the program and collect the execution frequency of each method.
      iii. Using the profile information, label each method as either hot or cold.
      iv. Write the label and its corresponding feature vector for every method to a file.
   c. Repeat steps (a) and (b) for as many programs as are required for training.
2. Train the predictive model.
   a. The feature data set is used to train the SVM-based model.
   b. The predictive model is generated as output.
3. Create the test data set.
   a. Collect method-level features.
      i. Calculate the specified features for every method in a new program.
      ii. Store the feature set in a vector.
      iii. Assign the label '0' to each feature vector in a file.
4. Predict the label, either hot or cold, for the test data generated in step 3, using the predictive model derived in step 2.

Figure 3: System outline
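Step 1.b corresponds to classic counter-based profiling in the style of Wu and Larus [25]. Conceptually, the instrumentation rewrites each user-defined method so that its entry increments a per-method counter; the C++-level rendering below is only an illustration (the identifiers NUM_METHODS and FOO_ID are hypothetical), since the real system injects the equivalent increment into LLVM bytecode.

enum { NUM_METHODS = 128, FOO_ID = 0 };    // hypothetical module constants
static long method_counter[NUM_METHODS];   // one execution counter per method

void foo(void) {                           // an instrumented user-defined method
    ++method_counter[FOO_ID];              // increment injected at method entry
    /* ... original body of foo ... */
}

After the program run, the counter values give the per-method execution frequencies used for labeling in Section 4.3.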


4.2 Extracting program features

The 'C' programs used for training are converted into LLVM bytecode using the LLVM front end. Every bytecode file is organized into a single module, and each module contains methods which are either user-defined or pre-defined. Only the static features of the user-defined methods are extracted from the bytecode module, for the simple reason that they can easily be collected by offline program analysis. Table 1 lists the ten static features that are used to train the classifier (a sketch of the extraction follows the table). Each feature value of a method is calculated in relation to the identical feature value extracted from the entire bytecode module. The collection of all the feature values of a method constitutes its feature vector xi, which is stored for subsequent labeling. Each feature vector xi is then labeled yi and classified as either hot (+1) or cold (-1), based on a threshold scheme described in the next section.

Table 1: Static features used for identifying hot methods

1. Number of loops in a method
2. Average loop depth of all the loops in the method
3. Number of top-level loops in a method
4. Number of bytecode-level instructions in the method
5. Number of Call instructions in the method
6. Number of Load instructions in the method
7. Number of Store instructions in the method
8. Number of Branch instructions in the method
9. Number of Basic Blocks in the method
10. Number of call sites for each method
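To make the extraction step concrete, the sketch below computes a representative subset of the Table 1 features over a simplified, self-contained stand-in for a bytecode module. The struct layout is an assumption of this illustration (not LLVM's actual classes), and the plain ratio of the method count to the module-wide count is one plausible reading of "calculated in relation to" above.

#include <initializer_list>
#include <vector>

// Illustrative stand-ins for bytecode entities (not LLVM's real API).
struct Instruction { enum Kind { Call, Load, Store, Branch, Other } kind; };
struct BasicBlock  { std::vector<Instruction> insts; };
struct Method {
    std::vector<BasicBlock> blocks;
    int numLoops, numTopLevelLoops;
    double avgLoopDepth;
};
struct Module { std::vector<Method> methods; };

// Count the instructions of one kind in a method (used for features 5-8).
static double countKind(const Method& m, Instruction::Kind k) {
    double n = 0;
    for (const BasicBlock& bb : m.blocks)
        for (const Instruction& in : bb.insts)
            if (in.kind == k) ++n;
    return n;
}

// Build the feature vector xi of one method.
std::vector<double> extractFeatures(const Module& mod, const Method& m) {
    std::vector<double> x;
    x.push_back(m.numLoops);                  // feature 1
    x.push_back(m.avgLoopDepth);              // feature 2
    x.push_back(m.numTopLevelLoops);          // feature 3
    for (Instruction::Kind k : { Instruction::Call, Instruction::Load,
                                 Instruction::Store, Instruction::Branch }) {
        double moduleTotal = 0;               // same feature over the module
        for (const Method& f : mod.methods) moduleTotal += countKind(f, k);
        x.push_back(moduleTotal > 0 ? countKind(m, k) / moduleTotal : 0.0);
    }                                         // features 5-8
    x.push_back((double)m.blocks.size());     // feature 9; 4 and 10 analogous
    return x;
}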

4.3 Extracting method execution frequencies

Hot methods are frequently executed program segments. To identify the hot and cold methods within a training program, profile information is gathered during execution. The training bytecode modules are instrumented with a counter variable in each user-defined method. The instrumented bytecode module is then executed, and the execution frequency of each method is collected. Using this profile information, the top 'N' most frequently executed methods are classified as hot; the system keeps the value 'N' as the "hot method threshold". In this classification scheme, each feature vector (xi) is labeled yi = +1 for a hot method and yi = -1 for a cold one. The feature vector (xi), along with its label (yi), is then written into a training data set file, and the training data of the different training programs are accumulated in this file, which is used as the input to train the predictive model.

+1 1:1 2:1 3:1 4:0.880046 5:2.51046 6:0.875912 7:0.634249 8:1.23119 9:1.59314 10:29
-1 1:0 2:0 3:0 4:1.16702 5:1.25523 6:1.0219 7:3.38266 8:1.50479 9:1.83824 10:2
+1 1:2 2:2 3:2 4:1.47312 5:0.83682 6:1.89781 7:1.47992 8:2.59918 9:2.81863 10:3

Figure 4: Sample training data set

The general format of a feature vector is yi 1:xi1 2:xi2 3:xi3 ... j:xij, where 1, 2, 3, ..., j are the feature numbers and xi1, xi2, ..., xij are the corresponding feature values. Fig. 4 shows a sample of three feature vectors from the training data set collected for the user-defined methods of a SPEC benchmark program. The first feature vector in Fig. 4 represents a hot method and is labeled +1. The values of its ten features are listed serially; for example, '1' is the value of feature 1 and '29' that of feature 10, where the value '1' of feature 1 indicates the percentage of loops found in the method. With the "hot method threshold" set at 50%, 4 of the 8 most frequently executed methods in a program are designated as hot. The first element of each vector is the label yi (+1 or -1), and each subsequent element gives a feature number followed by its feature value.
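Labeling and file writing under the stated scheme can be sketched as follows; the ProfiledMethod struct is an assumption of this illustration, and the output lines follow the Fig. 4 format.

#include <algorithm>
#include <cstdio>
#include <vector>

struct ProfiledMethod {
    std::vector<double> features;   // xi, from Section 4.2
    long execCount;                 // counter value from instrumentation [25]
};

// Label the top N% most frequently executed methods hot (+1), the rest
// cold (-1), and append each labeled vector to a training data set file.
void writeTrainingData(std::vector<ProfiledMethod> ms,
                       double hotThresholdPercent, std::FILE* out) {
    std::sort(ms.begin(), ms.end(),
              [](const ProfiledMethod& a, const ProfiledMethod& b) {
                  return a.execCount > b.execCount;
              });
    size_t hot = (size_t)(ms.size() * hotThresholdPercent / 100.0);
    for (size_t i = 0; i < ms.size(); ++i) {
        std::fprintf(out, "%+d", i < hot ? +1 : -1);   // label yi
        for (size_t j = 0; j < ms[i].features.size(); ++j)
            std::fprintf(out, " %zu:%g", j + 1, ms[i].features[j]);
        std::fprintf(out, "\n");
    }
}

Called with hotThresholdPercent = 50 on a program with eight profiled methods, this labels the four most frequently executed ones +1, matching the threshold example above.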

4.4 Creating the test data set

When a new program is encountered, the test data set is collected in the same way as the training data set, except that each label is specified as zero.

0 1:1 2:1 3:1 4:1.13098 5:2.91262 6:2.05479 7:1.09091 8:1.34875 9:1.55172 10:34
0 1:0 2:0 3:0 4:0.552341 5:0.970874 6:1.14155 7:0.363636 8:0.385356 9:0.862069 10:4
0 1:1 2:1 3:1 4:1.26249 5:0 6:2.51142 7:2.90909 8:1.15607 9:1.2069 10:40

Figure 5: Sample test data set

4.5 Training and prediction using SVM

Using the training data set file as input, the SVM is trained with default parameters (C-SVM, C = 1, radial basis function kernel). Once trained, the predictive model is generated as output, and the derived model is used to predict the label of each feature vector in the test data set file. Both training and prediction are done offline. Subsequently, the new program used for creating the test data set is instrumented, and executing this instrumented program yields its most frequently executed methods. The prediction accuracy of the system is evaluated by comparing the predicted output with the actual profile values.
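This step maps directly onto the C API of the libSVM tool [7]. Below is a minimal sketch against the current libSVM interface, using the parameters named above (C-SVC, C = 1, RBF kernel) plus libSVM's customary defaults for the rest; loading the data files into an svm_problem and all error handling are elided.

#include "svm.h"   // libSVM header [7]

// Train on a filled-in svm_problem and classify one test vector.
// 'prob' holds the labeled training vectors; 'test' is one feature vector
// as an array of svm_node terminated by an entry with index = -1.
double trainAndPredict(const svm_problem& prob, const svm_node* test) {
    svm_parameter param = {};       // zero-initialize, then set what we use
    param.svm_type    = C_SVC;      // C-SVM, as stated above
    param.kernel_type = RBF;        // radial basis function kernel
    param.C           = 1;          // default cost parameter
    param.gamma       = 1.0 / 10;   // libSVM default: 1 / number of features
    param.cache_size  = 100;        // kernel cache size in MB
    param.eps         = 1e-3;       // stopping tolerance

    svm_model* model = svm_train(&prob, &param);
    double label = svm_predict(model, test);   // +1 (hot) or -1 (cold)
    svm_free_and_destroy_model(&model);
    return label;
}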

5 EVALUATION

5.1 Method

Prediction accuracy is defined as the ratio of correctly predicted events to all the events encountered. This prediction accuracy is of two types: hot method prediction accuracy and total prediction accuracy. Hot method prediction accuracy is the ratio of correct hot method predictions to the actual number of hot methods in a program, whereas total prediction accuracy is the ratio of correct predictions (either hot or cold) to the total number of methods in a program. Hot method prediction accuracy is evaluated at three hot method threshold levels: 50%, 40% and 30%. The leave-one-out cross-validation method, a standard machine learning technique, is used to evaluate this system: of the 'n' benchmark programs, one is used for testing and the remaining 'n-1' are used to train the model, and this is repeated for all 'n' programs in the benchmark suite.
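The evaluation loop is leave-one-out cross-validation over whole programs, with both accuracy ratios computed for each held-out program. A self-contained sketch follows; trainModel and predictProgram are placeholder stand-ins for the Section 4 pipeline, defined trivially here only so the sketch compiles.

#include <cstdio>
#include <vector>

struct Program {                     // one benchmark program
    std::vector<int> truth;          // profiled labels: +1 hot, -1 cold
};

// Placeholder hooks standing in for the SVM pipeline of Section 4.
struct Model {};
static Model trainModel(const std::vector<Program>&) { return Model{}; }
static std::vector<int> predictProgram(const Model&, const Program& p) {
    return std::vector<int>(p.truth.size(), +1);   // stub: predict all hot
}

// Leave one of the 'n' programs out, train on the remaining n-1, test on
// the held-out one, and report both accuracy measures from Section 5.1.
void leaveOneOut(const std::vector<Program>& progs) {
    for (size_t leftOut = 0; leftOut < progs.size(); ++leftOut) {
        std::vector<Program> train;
        for (size_t i = 0; i < progs.size(); ++i)
            if (i != leftOut) train.push_back(progs[i]);
        Model m = trainModel(train);
        std::vector<int> pred = predictProgram(m, progs[leftOut]);

        int hotCorrect = 0, hotTotal = 0, correct = 0;
        const std::vector<int>& truth = progs[leftOut].truth;
        for (size_t j = 0; j < truth.size(); ++j) {
            if (truth[j] == +1) { ++hotTotal; if (pred[j] == +1) ++hotCorrect; }
            if (pred[j] == truth[j]) ++correct;
        }
        std::printf("program %zu: hot acc %.1f%%, total acc %.1f%%\n", leftOut,
                    hotTotal ? 100.0 * hotCorrect / hotTotal : 0.0,
                    truth.empty() ? 0.0 : 100.0 * correct / truth.size());
    }
}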

5.2 Benchmarks

Two benchmark suites, SPEC CPU2000 INT [17] and UTDSP [13], have been used for training and prediction. UTDSP is a C benchmark suite, while SPEC CPU2000 INT contains both C and C++ benchmarks; evaluation of the system is based only on the C programs of either suite. The model trained on the 'n-1' benchmark programs in a suite is used to predict the hot methods in the left-out benchmark program.

5.3 Tools and platform

The system is implemented in the Low Level Virtual Machine (LLVM) version 1.6 [6]. LLVM is an open-source compiler infrastructure that supports compile-time, link-time, run-time and idle-time optimizations. The results are evaluated on an Intel(R) Pentium(R) D at 2.80 GHz with 480 MB of RAM, running Fedora Core 4. The system uses the libSVM tool [7], a simple library for Support Vector Machines written in C.

6 RESULTS

Fig. 6 shows the prediction accuracy of the trained model on the SPEC CPU2000 INT benchmark programs at three different hot method thresholds: 50%, 40% and 30%. The hot method prediction accuracy for all C programs in the benchmark varies from 0% to 100%, with averages of 57.86%, 51.43% and 39.14% for the three thresholds respectively, i.e., 49.48% overall on the SPEC CPU2000 INT suite. Similarly, on the UTDSP benchmark suite, within the same 0% to 100% range, the hot method prediction accuracy averages for the three thresholds are 84%, 81% and 62% respectively, i.e., 76% overall. Across both suites, the system obtains 62.57% hot method prediction accuracy.

Figure 6: Hot method prediction accuracy on the SPEC CPU2000 INT benchmark

The total method prediction accuracy on the SPEC CPU2000 INT and UTDSP benchmark suites is shown in Figs. 7 and 9. The total method prediction accuracy for all C programs on SPEC CPU2000 INT varies from 36% to 100%, with averages of 68.43%, 71.14% and 71.14% for the three hot method thresholds respectively, i.e., 70.24% overall. The average prediction accuracies on the UTDSP benchmark suite are 69%, 71% and 58% for the 50%, 40% and 30% thresholds respectively, i.e., 66% overall. Overall, the system predicts both hot and cold methods in a program with 68.15% accuracy.

Figure 7: Total prediction accuracy on the SPEC CPU2000 INT benchmark

7 CONCLUSION AND FUTURE WORK

Optimizers depend on profile information to identify the hot methods of program segments, and the major inadequacy of dynamic optimization techniques is the high cost of accurate data profiling via program instrumentation. In this work, a method has been worked out to identify the hot methods in a program using the SVM machine learning algorithm. In our study, with a set of ten static features used to train the system, the derived model predicts all methods within a program with 68.15% accuracy and hot methods with 62.57% accuracy; the hot method prediction is of greater value, because optimizations are more effective in those methods. Future work in this area is aimed at improving the prediction accuracy of the system by identifying more effective static and dynamic program features. The system can further be extended into a dynamic hot method prediction system usable by dynamic optimizers, and the same approach can be applied to evaluate the prediction accuracy of other machine learning algorithms in order to build additional models.


Figure 8: Hot method prediction accuracy on the UTDSP benchmark

Figure 9: Total prediction accuracy on the UTDSP benchmark

8 REFERENCES

[1] M. Arnold, S. Fink, D. Grove, M. Hind, and P. F. Sweeney: A Survey of Adaptive Optimization in Virtual Machines, Proceedings of the IEEE, pp. 449-466, February 2005.
[2] A. Monsifrot, F. Bodin, and R. Quiniou: A machine learning approach to automatic production of compiler heuristics, In Proceedings of the International Conference on Artificial Intelligence: Methodology, Systems, Applications, LNCS 2443, pp. 41-50, 2002.
[3] S. Long and M. O'Boyle: Adaptive java optimization using instance-based learning, In ACM International Conference on Supercomputing (ICS'04), pp. 237-246, June 2004.
[4] J. Cavazos and M. F. P. O'Boyle: Automatic Tuning of Inlining Heuristics, 11th International Workshop on Compilers for Parallel Computers (CPC 2006), January 2006.
[5] J. Cavazos, J. E. B. Moss, and M. F. P. O'Boyle: Hybrid Optimizations: Which Optimization Algorithm to Use?, 15th International Conference on Compiler Construction (CC 2006), 2006.
[6] C. Lattner and V. Adve: LLVM: A compilation framework for lifelong program analysis & transformation, In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04), March 2004.
[7] C.-C. Chang and C.-J. Lin: LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[8] M. Stephenson, S. Amarasinghe, M. Martin, and U.-M. O'Reilly: Meta optimization: Improving compiler heuristics with machine learning, In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'03), pp. 77-90, June 2003.
[9] F. Agakov, E. Bonilla, J. Cavazos, G. Fursin, B. Franke, M. F. P. O'Boyle, M. Toussaint, J. Thomson, and C. Williams: Using machine learning to focus iterative optimization, In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 295-305, 2006.
[10] C. Dubach, J. Cavazos, B. Franke, G. Fursin, M. O'Boyle, and O. Temam: Fast compiler optimization evaluation via code-features based performance predictor, In Proceedings of the ACM International Conference on Computing Frontiers, May 2007.
[11] J. Cavazos and M. O'Boyle: Method-Specific Dynamic Compilation using Logistic Regression, ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Portland, Oregon, October 22-26, 2006.
[12] J. Cavazos: Automatically Constructing Compiler Optimization Heuristics using Supervised Learning, Ph.D. thesis, Dept. of Computer Science, University of Massachusetts, 2004.
[13] C. Lee: UTDSP benchmark suite, http://www.eecg.toronto.edu/~corinna/DSP/infrastructure/UTDSP.html, 1998.
[14] G. Fursin, C. Miranda, S. Pop, A. Cohen, and O. Temam: Practical run-time adaptation with procedure cloning to enable continuous collective compilation, In Proceedings of the 5th GCC Developers' Summit, Ottawa, Canada, July 2007.
[15] V. N. Vapnik: The support vector method of function estimation, In Generalization in Neural Network and Machine Learning, Springer-Verlag, pp. 239-268, 1999.
[16] S. Kotsiantis: Supervised Machine Learning: A Review of Classification Techniques, Informatica Journal 31, pp. 249-268, 2007.
[17] The Standard Performance Evaluation Corporation, http://www.specbench.org.
[18] M. Stephenson and S. P. Amarasinghe: Predicting unroll factors using supervised classification, In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pp. 123-134, 2005.
[19] M. W. Stephenson: Automating the Construction of Compiler Heuristics Using Machine Learning, Ph.D. thesis, MIT, USA, 2006. Available at www.cag.csail.mit.edu/~mstephen/stephenson_phdthesis.pdf.
[20] J. Cavazos and J. E. B. Moss: Inducing heuristics to decide whether to schedule, In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2004.
[21] B. Calder, D. Grunwald, M. Jones, D. Lindsay, J. Martin, M. Mozer, and B. Zorn: Evidence-Based Static Branch Prediction Using Machine Learning, ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 19, 1997.
[22] D. A. Jiménez and C. Lin: Neural methods for dynamic branch prediction, ACM Transactions on Computer Systems (TOCS), Vol. 20, No. 4, pp. 369-397, November 2002.
[23] J. Singer, G. Brown, and I. Watson: Branch Prediction with Bayesian Networks, In Proceedings of the First Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation, pp. 96-112, January 2007.
[24] B. Culpepper and M. Gondre: SVMs for Improved Branch Prediction, ECS201A Technical Report, University of California, Davis, USA, 2005.
[25] Y. Wu and J. R. Larus: Static branch frequency and program profile analysis, In Proceedings of the 27th Annual International Symposium on Microarchitecture (MICRO-27), pp. 1-11, 1994.
[26] C.-L. Huang and C.-J. Wang: A GA-based feature selection and parameters optimization for support vector machines, Expert Systems with Applications, Vol. 31, Issue 2, pp. 231-240, 2006.
