Automatic Recommendation of Compiler Options

Elana Granston*
Texas Instruments Incorporated
P.O. Box 1443, M/S 730
Houston, Texas 77251-1443
[email protected]

Anne Holler*
Omnishift Technologies
451 El Camino Real, 2nd Floor
Santa Clara, California 95050
[email protected]
ABSTRACT

While many optimizations can yield substantial performance improvements under the right circumstances, these same optimizations may cause significant performance degradations or cause other problems under the wrong circumstances. The problem of determining which optimizations to apply is generally relegated to the over-burdened compiler user, who must wade through a daunting set of cryptic documentation to find the right set of compiler switches to use to achieve good performance. In this paper, we address the question of whether this process can be automated. To do so, we developed an automatic options recommender called Dr. Options, which attempts to recommend the best options to achieve high performance when using Hewlett-Packard's PA-RISC optimizing compilers. We have found that Dr. Options recommendations can be substantially better than those which typical users might come up with on their own. Moreover, Dr. Options has proven itself, at the very least, to be a useful consultant for even our most expert in-house performance analysts. In a few cases, Dr. Options even beat the experts!
1. INTRODUCTION

While many optimizations can yield substantial performance improvements under the right circumstances, these same optimizations may cause significant performance degradations or cause other problems under the wrong circumstances. The problem of determining which optimizations to apply is generally relegated to the over-burdened compiler user, who must wade through a daunting set of cryptic documentation to find the right set of compiler switches to use to achieve good performance. In this paper, we address the question of whether this process can be automated. In particular, how good can automatic recommendations be? To answer this question, we developed an automatic options recommender called Dr. Options. Dr. Options makes application-specific recommendations for use with Hewlett-Packard's PA-RISC compilers [Hewl95, Holl96, Kane96]. Dr. Options uses information from three sources: the user, the compiler and the profiler. User-supplied information includes that which a compiler cannot determine automatically, such as application type or optimization goals and constraints. Compiler-supplied information is gathered when users compile their applications with their current options and +Oadvise. Profile information includes function-level profile data. Only the compiler-supplied information is required.
To test the quality of Dr. Options recommendations, we compare the performance of the SPEC 95 benchmark suite using Dr. Options recommendations against the performance of these same benchmarks compiled with -O (to represent a typical user), as well as with our base and peak SPEC options (to represent the expert case). We present results which demonstrate that Dr. Options recommendations can be substantially better than those which typical unassisted users might select. Moreover, Dr. Options has proven itself, at the very least, to be a useful consultant for even our most expert in-house performance analysts.

Section 2 presents the reasoning behind the seemingly overwhelming choice of compiler options. Section 3 presents an overview of the Dr. Options tool. Section 4 describes the recommendation process that Dr. Options uses. Section 5 presents our experimental results. Section 6 describes a feature of Dr. Options that helps users to understand the basis for the tool's recommendations. Section 7 addresses the problem of testing compiler options and describes a companion tool that helps overcome some of these problems. Section 8 discusses related work. Section 9 revisits the original question: how good can automatic recommendations be?

2. WHY SO MANY OPTIONS?

Industry-wide, one of the most commonly used compiler options is -O. In addition to basic block optimizations, this option typically enables a variety of inter-block optimizations such as global register allocation and induction variable elimination. Because this option is so heavily used, it only triggers optimizations which are always safe (assuming legal source code, or at least common programming practices) and virtually always profitable. Moreover, -O typically precludes optimizations which might consume excessive compile time or memory resources.
* This work was supported by the California Language Lab at Hewlett-Packard and is covered by U.S. patents 5,960,202 and 5,966,538.
Figure 1: Dr. Options Framework

Within the HP PA-RISC compiler, -O is equivalent to +O2. This option enables intra-procedural optimization only. Optimization levels +O3 and +O4 enable interprocedural optimizations (e.g., constant propagation, inlining) within and across modules, respectively. +O3 and +O4 also enable high-level loop transformations. Additional options provide the user with fine-grain control over optimizations such as data prefetching (+Odataprefetch) and selecting between dynamic and static branch prediction (+Ostaticprediction); these optimizations are highly profitable for some applications but engender noticeable performance degradations for others. Other optimizations can only be safely applied to code that meets certain conventions. For example, +Ocachepadcommon tells the compiler that FORTRAN COMMON blocks conform to the FORTRAN 77 standard. This frequently violated standard requires that all declarations for any given COMMON block agree on the starting and ending points for all data structures declared within. +Ocachepadcommon enables the compiler to pad data structures within COMMON blocks so that the data structures exhibit better cache behavior. Unfortunately, it is not possible to describe all the PA-RISC performance-related compiler options in this paper. The interested reader is referred to [Burc00] for more detail.

One might ask whether all these options are really necessary. As can be seen from the various computer manufacturers' SPEC performance disclosures, expertly selected, performance-related options can significantly improve program performance on virtually every manufacturer's system.
3. DR. OPTIONS

Dr. Options can serve as the expert in this daunting options selection process. There are six steps to using Dr. Options:

Step 1: Provide user-supplied information (optional).
Step 2: Compile the application from scratch with the current options and +Oadvise to generate compiler-supplied information.
Step 3: Execute the instrumented application from Step 2 to generate profile information (optional).
Step 4: Invoke Dr. Options to recommend options based on the information gathered during Steps 1-3.
Step 5: Build with the new options (without modifying the build environment, if desired).
Step 6: Execute with the new options.
Figure 1 presents these steps pictorially. The sources of information gathered during Steps 1-3 are described below.
• User-supplied information: The purpose of user-supplied information is to provide Dr. Options with application-specific information that cannot be obtained by analyzing the code, such as the application type and whether the other two sources of input (compiler-supplied and, optionally, profile information) will be provided for the entire application or just a subset of the modules. User-supplied information also includes user preferences such as the set of systems on which the user would like to execute the binary, as well as the user's floating-point tolerance, compile-time tolerance and failure tolerance. The user can also supply other information when known, for example, regarding programming styles that are extra safe or extra sloppy (i.e., in violation of programming language standards), thereby affecting the safety of certain optimizations, such as the padding of FORTRAN COMMON data structures discussed in Section 2. While all user-supplied information is optional, certain information such as application type and user preferences is recommended. The user can set some, all or none of the parameters. The remaining parameters assume reasonable default values based on those that have been set (if any) and the defaults built into the tool. For example, if the user sets a conservative failure tolerance, other user preferences will assume conservative defaults as well.

• Compiler-supplied information: This information forms the basis of Dr. Options recommendations. It is computed at various points during program compilation when the +Oadvise flag is added to the set of normal options. The instrumented compiler collects basic information on language, function and module sizes, and the current set of options being used. It also collects more sophisticated information, such as cases where inter-procedural analysis might (or might not) help, characteristics of loops and data access patterns within individual functions and modules, and information on whether particular optimizations are kicking in.

• Profile information: When available, Dr. Options also exploits function-level profile information. The profile information includes dynamic execution-time percentages. It also includes the dynamic number of calls between each caller/callee pair. Profile information is used to determine the relative importance of various functions and modules, which in turn is used to determine whether an option that appears profitable for a particular function or module might be significant for the application as a whole. (A schematic sketch of these three input sources follows this list.)

Dr. Options can provide either a single set of recommendations that are tailored for the program as a whole, or it can provide recommendations for each individual module. The latter possibility allows the user to optimize hot modules aggressively and cold modules less aggressively (or not at all) to save on compile time and reduce the risk of problems that can arise when code does not strictly adhere to programming language standards.
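To make the inputs concrete, the following Python sketch models the three information sources as simple records. The type names, field names and default values are our own illustrative assumptions; they do not reflect Dr. Options' actual internal representation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class UserInfo:                       # optional user-supplied information
    app_type: str = "OTHER"           # e.g. SCIENTIFIC, MULTI_MEDIA, DATABASE, UTILITY
    target_systems: List[str] = field(default_factory=lambda: ["PA-8000"])
    fp_tolerance: str = "MEDIUM"      # floating-point tolerance
    compile_time_tolerance: str = "MEDIUM"
    failure_tolerance: str = "MEDIUM"
    common_blocks_standard: Optional[bool] = None   # FORTRAN 77 COMMON conformance

@dataclass
class FunctionAdvice:                 # per-function data gathered with +Oadvise (required)
    size: int                         # function size
    opts_skipped: bool                # optimizations skipped due to compile-time/memory limits
    regular_array_loops: bool         # loops that manipulate arrays in a regular fashion

@dataclass
class ProfileInfo:                    # optional function-level profile data
    time_pct: Dict[str, float]              # function name -> % of execution time
    call_counts: Dict[Tuple[str, str], int] # (caller, callee) -> dynamic call count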
4. RECOMMENDING OPTIONS

To determine the rules for recommending options, we interviewed PA-RISC performance tuning experts. We then attempted to duplicate their performance tuning effort in a deterministic fashion. This task was somewhat challenging because the options selection process is traditionally iterative, whereas we limited ourselves to making the best recommendations possible while relying on at most a single compilation of the application (from the user's perspective) and at most one execution of the application. Eventually, despite all the non-determinism that goes into the tuning process, we were able to derive a reasonable set of deterministic rules. Through our own experiments and a few results from the literature, we were able to refine these rules even further.

Dr. Options makes its recommendations on the basis of the three sources of information described in the last section. It first computes the best set of options for each function in isolation, then the best set of options for each module, and finally the best set of options for the overall application. Options are selected on the basis of safety and profitability estimates. Dr. Options attempts to minimize both the set of options that it recommends and the set of modules that must be compiled with these options. Since compiler options can be applied at the granularity of individual modules, Dr. Options makes recommendations for each module individually and/or recommends a single options set for the entire application. In the remainder of this section, we present a few examples of the option recommendation rules which Dr. Options uses.
4.1 +Onolimit

By default, the compiler tries to achieve a balance between run time and compile time and to ensure that it does not run out of memory. Occasionally, this can cause the compiler not to attempt certain optimizations. Turning on +Onolimit directs the compiler to ignore compile-time and resource constraints. However, it is preferable to enable this option only for performance-critical modules where potentially significant optimizations would otherwise be skipped.

When determining whether to recommend +Onolimit, Dr. Options first determines the functions which would benefit from enabling this option. Then it computes the set of modules for which it would recommend this option. Then Dr. Options determines whether to recommend +Onolimit for the entire application. The information gathered at compile time allows Dr. Options to identify candidate functions where potentially significant optimization opportunities are being skipped. Profile data, if available, allows Dr. Options to restrict the candidate set to functions that are performance-critical. Dr. Options also checks that the recommended optimization level is at least +O2 (equivalently, -O), since +Onolimit has no effect below this level. (Note: Dr. Options always selects the optimization level to recommend before considering any other options.)

Because of the risks associated with this particular option, Dr. Options also considers the user's failure and compile-time tolerances and whether the options the user is currently using already include +Onolimit. Obviously, if a module's current options include +Onolimit, this option is then trivially safe for that module and thus has no failure risk. Otherwise, failure tolerance must be at least MEDIUM to enable this option. The choice of MEDIUM is based on an assessment of risks and recovery effort. The user's compile-time tolerance is either LOW, MEDIUM, or HIGH. LOW implies that reducing compile time should be a primary goal. MEDIUM implies that compile time is not a major concern, but that controlling it should remain a secondary goal to increasing performance. HIGH implies that there are no compile-time constraints. Consequently, Dr. Options does not recommend +Onolimit when compile-time tolerance is LOW, regardless of whether +Onolimit is included in the user's current options. When compile-time tolerance is MEDIUM or HIGH, Dr. Options continues to recommend +Onolimit wherever it is currently being used, assuming that the user had a reason for enabling it. (The reason for this is that compiler-supplied information tells Dr. Options whether optimizations are being skipped when this option is not enabled, but does not tell Dr. Options whether optimizations would have been skipped had this option not been enabled.) If Dr. Options recommends +Onolimit for a given function, it also recommends +Onolimit for the module containing that function and for the application as a whole. Below is the algorithm for determining whether to recommend +Onolimit:
/* Check that the user's tolerances are high enough to consider +Onolimit. */
if (COMPILETIMETOLERANCE == LOW or FAILURETOLERANCE == LOW) then
    return
end if

APPNOLIMITCANDIDATE = FALSE
for each module m
    MODULENOLIMITCANDIDATE(m) = FALSE
    for each function f ∈ module m
        /* Turn on +Onolimit for this function? */
        FUNCTIONNOLIMITCANDIDATE(m,f) = FALSE
        if ((optimizations skipped when compiling f
             and ((PROFILEDATA and HOT(m,f)) or not PROFILEDATA))
            or +Onolimit ∈ CURRENTOPTIONS(m)) then
            FUNCTIONNOLIMITCANDIDATE(m,f) = TRUE
            MODULENOLIMITCANDIDATE(m) = TRUE
            APPNOLIMITCANDIDATE = TRUE
        end if
        if (FUNCTIONOPTLEVEL(m,f) ≥ +O2 and FUNCTIONNOLIMITCANDIDATE(m,f)) then
            FUNCTIONOPTIONS(m,f) ← +Onolimit
        end if
    end for each function

    /* Turn on +Onolimit for this module? */
    if (MODULEOPTLEVEL(m) ≥ +O2 and MODULENOLIMITCANDIDATE(m)) then
        MODULEOPTIONS(m) ← +Onolimit
    end if
end for each module

/* Turn on +Onolimit for the application? */
if (APPOPTLEVEL ≥ +O2 and APPNOLIMITCANDIDATE) then
    APPOPTIONS ← +Onolimit
end if
4.2 +Odataprefetch

As the name implies, the +Odataprefetch option enables the generation of data prefetching instructions. This option was first introduced within the PA-RISC family for the PA-8000 processor [Hunt95]. In our experiments on the PA-8000, we have seen some dramatic improvements from enabling this option -- up to 100% performance improvement. In general, the performance degradation from unnecessarily enabling this option is comparatively insignificant. However, under a few circumstances, we have seen performance drop by as much as 25%. (A detailed analysis of these observations is beyond the scope of this paper, but the interested reader is referred to [SaGH97].) The recommendations made by Dr. Options attempt to avoid introducing such circumstances.

While data prefetching has performance implications, the safety and compile-time risks associated with this option are usually low. Therefore, Dr. Options first checks that there are prefetching opportunities in the application. Dr. Options uses two simple heuristics for doing this. The first relies on information collected by the compiler as to whether there are loops that manipulate arrays in a regular fashion. The second heuristic is based on application type, which must be SCIENTIFIC or MULTI_MEDIA. The heuristics are based on experts' experience and results from the literature [SaGH97, ZuFL95]. If profile information is available, Dr. Options then checks that these opportunities arise in performance-critical functions. As in determining whether to recommend +Onolimit, Dr. Options first determines which functions would benefit from data prefetching. Then it computes the set of modules for which it would recommend prefetching. Then it determines whether to recommend prefetching for the entire application. Dr. Options also ensures that it only recommends prefetching when it is recommending at least optimization level +O2, since data prefetching is only performed at optimization level +O2 and above. Our experiments thus far have shown that our heuristics are sufficiently good at catching promising cases that Dr. Options does not consider whether the option is currently being used.
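As a rough illustration only, the following Python sketch expresses the +Odataprefetch rule described above; it is not Dr. Options' actual implementation. The record fields (regular_array_loops, app_type, time_pct) follow the hypothetical records sketched in Section 3, the hotness cutoff is an arbitrary illustrative value, and combining the two opportunity heuristics with a logical "or" is also an assumption.

HOT_THRESHOLD = 1.0   # illustrative cutoff: a function is "hot" at >= 1% of run time

def recommend_dataprefetch(module_opt_level, functions, advice, user, profile=None):
    """Hypothetical per-module check for recommending +Odataprefetch."""
    if module_opt_level < 2:          # prefetching requires at least +O2
        return False
    for f in functions:
        # Heuristic 1: the compiler reports loops with regular array accesses.
        # Heuristic 2: the application type suggests prefetching opportunities.
        opportunity = (advice[f].regular_array_loops
                       or user.app_type in ("SCIENTIFIC", "MULTI_MEDIA"))
        if not opportunity:
            continue
        # With profile data, restrict candidates to performance-critical functions.
        if profile is not None and profile.time_pct.get(f, 0.0) < HOT_THRESHOLD:
            continue
        return True
    return False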
4.3 +Ostaticprediction

Beginning with the PA-8000, branch prediction can be either static or dynamic. By default, branch prediction is done dynamically, using history information from the branch prediction cache (BPC). This works well as long as the working set of branches fits into the BPC without an excessive number of conflict or capacity misses. When the working set does not fit well into the BPC, it is better to use static branch prediction.

While +Ostaticprediction has performance implications, it has neither the safety nor the compile-time risks of +Onolimit. Drawing on the tuning experience of one of our local experts, Dr. Options uses a heuristic based on application size and type. For example, large DATABASE applications are generally good candidates. Since +Ostaticprediction is only valid if the optimization level is at least +O2, profile-based optimization (PBO) is enabled, and PA-8000-based systems are targeted, Dr. Options checks for these conditions as well. Because PBO is one of our more expensive optimizations with several other
restrictions, Dr. Options first determines whether to recommend PBO and then determines whether to recommend +Ostaticprediction. Dr. Options also checks whether the application is currently being compiled with +Ostaticprediction. If so, the user is assumed to have had a reason for enabling this option. If the requirements of +O2, PBO and PA-8000 are met, the option is retained. So far, this simple heuristic has worked well in practice. In the future, if we incorporate instruction-level profile information so that we can detect mispredicted branches, then we may experiment with more sophisticated heuristics.
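The following Python sketch captures the checks just described. The size threshold, the parameter names, and the exact decision order are assumptions rather than Dr. Options' actual code.

LARGE_APP_BYTES = 4 << 20    # illustrative threshold for a "large" application

def recommend_staticprediction(app_opt_level, pbo_recommended, target_systems,
                               app_type, code_size, current_options):
    """Hypothetical application-level check for recommending +Ostaticprediction."""
    # Requirements of the option itself: at least +O2, PBO enabled (decided
    # beforehand), and all targeted systems are PA-8000 based.
    requirements = (app_opt_level >= 2
                    and pbo_recommended
                    and all(t.startswith("PA-8000") for t in target_systems))
    if not requirements:
        return False
    # If the user already compiles with the option, retain it.
    if "+Ostaticprediction" in current_options:
        return True
    # Otherwise apply the size/type heuristic, e.g. large DATABASE applications.
    return app_type == "DATABASE" and code_size >= LARGE_APP_BYTES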
4.4 Selective Optimization

There are several reasons compiler users might not wish to compile their entire application at +O4 (the highest optimization level). First, users are often hesitant to optimize more aggressively than necessary. Their most common fear is introducing problems that only show up at run time, especially if these problems show up after they release their applications to their customers or base important scientific or business decisions on faulty application results. As mentioned earlier, there are several source code errors that may only be exposed under optimization. Second, interprocedural analysis and optimization, the biggest benefit of employing +O4, uses significantly more memory resources than lower optimization levels. The memory requirements can be excessive for large applications. Third, users have compile-time constraints. For example, developers of huge applications often have the requirement of being able to rebuild overnight with production-level options. This constraint could also preclude using +O4 for their entire application.

For all of these reasons, it makes sense to selectively optimize large applications where most of the execution time is spent in a small number of functions. Based on dynamic call graph profile information, Dr. Options can recommend +O4 for modules containing cross-module calls which are hot in the call graph profile, so that aggressive interprocedural optimization will be applied across those modules. For the rest of the significant modules (significance defined by the percentage of time spent), Dr. Options would then recommend +O3 or +O2, depending on whether the module would benefit from high-level loop transformations or intra-module interprocedural optimizations, both of which are only performed at +O3 and above. The selection of significant modules is based on function-level dynamic execution-time profile information. The selection of +O3 versus +O2 is based primarily on compiler-supplied information. For insignificant modules (where virtually no time is spent), Dr. Options would recommend +O0. Other optimizations would be selectively applied in a similar fashion.

This selective optimization process requires profile information. Although there are default cutoffs built into Dr. Options, the user can override the percentage to compile at +O4 and the percentage to drop to +O0. The user can express the first percentage in terms of lines of code, routines or call sites. The second percentage can be expressed in terms of lines of code, routines or execution time. Clearly, selective optimization should only be applied when users have high confidence in their profile data. Hence, it is an optional feature of Dr. Options.
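To make the mechanism concrete, here is a minimal Python sketch of per-module level selection driven purely by execution-time percentages. The cutoffs, the helper has_hot_cross_module_calls, and the per-module attribute benefits_from_O3 are hypothetical simplifications; the real tool also accepts cutoffs in the other units described above.

def select_module_levels(modules, profile, advice, hot_cutoff=80.0, cold_cutoff=1.0):
    """Hypothetical mapping of each module to an optimization level."""
    levels = {}
    covered = 0.0
    # Visit modules hottest-first so that +O4 covers the top of the profile.
    for m in sorted(modules, key=lambda m: profile.time_pct[m], reverse=True):
        pct = profile.time_pct[m]
        if covered < hot_cutoff and profile.has_hot_cross_module_calls(m):
            levels[m] = 4        # aggressive cross-module interprocedural optimization
        elif pct >= cold_cutoff:
            # +O3 if loop transformations or intra-module IPO would help; else +O2.
            levels[m] = 3 if advice[m].benefits_from_O3 else 2
        else:
            levels[m] = 0        # insignificant module: virtually no time spent here
        covered += pct
    return levels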
5. EXPERIMENTAL RESULTS

To test the quality of Dr. Options recommendations, we compare the performance of the SPEC 95 benchmark suite using Dr. Options recommendations against the performance of this same suite compiled with +O2 to represent a typical user. We also compare against the performance of these same benchmarks compiled with our base and peak SPEC options to represent the expert case. SPEC 95 is composed of two sets of benchmarks: SPECint95 and SPECfp95, representing integer and floating-point applications, respectively. Base performance numbers for SPECint95 are generated by compiling every benchmark in SPECint95 with the same options. Base performance numbers for SPECfp95 are similarly generated. In contrast, peak performance numbers are generated by selecting a potentially different set of options for each individual benchmark. Base and peak SPEC options are generally selected by a company's foremost performance tuning experts.

For our experiments, we used Dr. Options' application-level recommendations for each benchmark, since SPEC 95 rules require that the same set of options be used to compile each module within a given benchmark. The results of our experiments are presented in Figure 2, which shows the speedup for each case relative to +O2. For each benchmark, we provided Dr. Options with all three input sources: user-supplied information, compiler-supplied information and profile information. Within the user-supplied information, we noted the application types (in this case, they were SCIENTIFIC, MULTI_MEDIA, DATABASE, UTILITY or OTHER). Since we wished Dr. Options to be aggressive, we also designated that we wished to build a complete executable as opposed to a shared library and that it was acceptable to inline library routines. We also noted that, in each case, the complete source code was provided. Last, we noted that all but two of the benchmarks, namely 145.fpppp and 146.wave5, conform to the FORTRAN 77 COMMON block standard.

As can be seen in Figure 2, on average, the performance of the benchmarks with Dr. Options recommendations was about midway between that achieved with the SPEC base and peak options and substantially higher than that achieved when compiling with +O2. This was true for both the SPECfp95 benchmarks and the SPECint95 benchmarks. In general, the best options for SPECfp95 were relatively similar, except for the recommendation for +Ocachepadcommon, which was the primary differentiator between base and peak performance. There was much more variety in the best options for SPECint95, where Dr. Options relied heavily on compiler-supplied information and application type. In general, the options recommended by Dr. Options were both simpler and safer to use than our base or peak options.

The reader may note that the information which we supplied to Dr. Options regarding COMMON blocks is information about application internals that the typical user may very well not know, especially if the user is working with a large application that was developed by many engineers over a long period of time. If this information had not been provided, +Ocachepadcommon would not have been recommended and the results for the Dr. Options versions of SPECfp95 would have been closer to our base options, affecting 102.swim most noticeably. SPECint95 performance would not have been affected since none of the integer benchmarks are written in FORTRAN. In the long term, COMMON block information is an example of the type of information that we would like to gather automatically. However, even without the +Ocachepadcommon option, Dr. Options recommendations still achieve our goal of leading to performance that is substantially higher than a typical user might achieve by simply compiling with +O2.

Note that in two cases, 125.turb3d and 134.perl, the versions compiled with Dr. Options recommendations actually exceeded our peak recommendations. As a result of these experiments, we modified our peak options for both of these benchmarks, yielding a 6% improvement in 125.turb3d and a 3% improvement in 134.perl. Although Dr. Options recommendations were better than the (former) peak options for both of these benchmarks, we found that the best set of new peak options was actually a combination of Dr. Options' recommendations and the (former) peak recommendations. Therefore, from an expert perspective, Dr. Options was useful as a consultant, but some manual intervention was still required.
Figure 2: Dr. Options Performance. (Bar chart of speedup relative to +O2 for each SPEC 95 benchmark, with SPECfp95 and SPECint95 averages, comparing the +O2, Base, Dr. Options and Peak option sets.)
6. EXPLAINING RECOMMENDATIONS

While the recommendations that Dr. Options makes provide a useful starting point, users may want to know why. If users are provided with the reasons for Dr. Options' recommendations, they can learn about the characteristics of their applications, more easily tweak the recommendations if they so choose, and verify that the information that they supplied is what they intended. Therefore, Dr. Options optionally provides explanations for all of its recommendations in terms of the input sources (i.e., user-supplied, compiler-supplied and/or profile-based) on which the particular recommendation was based. For the case of recommendations based at least partially on user-supplied information, the particular parameter and its value are included.

For example, consider the case of 132.ijpeg, a multimedia application for which Dr. Options might recommend +Odataprefetch. If profile information is available, Dr. Options would explain that the recommendation was based on compile-time analysis and profile data. If no profile data is available, but the user informed Dr. Options of 132.ijpeg's application type, Dr. Options would explain to the user that the recommendation for +Odataprefetch was based on both compile-time analysis and the MULTI_MEDIA application type.

Even as the developers of the tool, we found this feature to be very useful. Occasionally, we found that the recommendations were not what we expected. The explanations allowed us to quickly determine whether the cause was a mistake in the user-supplied information or whether the application had characteristics of which we were unaware. Prior to adding this feature, we needed to use a debugger to uncover the reason for a particular recommendation.
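A small Python sketch of what such an explanation record might look like follows. The field names and output format are illustrative assumptions, not the tool's actual output.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class Recommendation:
    option: str                 # e.g. "+Odataprefetch"
    scope: str                  # "application" or a module name
    sources: Tuple[str, ...]    # any of "user-supplied", "compiler-supplied", "profile"
    detail: str                 # the parameter/value or analysis behind the choice

rec = Recommendation(option="+Odataprefetch",
                     scope="application",
                     sources=("compiler-supplied", "user-supplied"),
                     detail="regular array loops; application type = MULTI_MEDIA")
print(f"{rec.option}: based on {', '.join(rec.sources)} ({rec.detail})")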
7. TESTING COMPILER OPTIONS

We have found that many compiler users have very complicated build environments that are difficult to understand, let alone modify. Often, default options are overridden in obscure places. In many cases, users are not even sure which options they are currently using. These problems pose a significant hurdle toward convincing users to try out new options.

To address these problems, we developed the Log-and-Override wizard as a companion tool to Dr. Options. The Log-and-Override wizard allows users to log their current options and/or override them with other options. It currently works as a wrapper to the compiler and linker tools. In the long run, it could be incorporated directly into our compilers.
When a user wants to log options, the user activates the tool in log mode and then does a normal (complete) build. During the build, the tool logs the user's current options in a format that is readable by both the user and the Log-and-Override wizard. The user can then modify the log file to include a new set of options that the user would like the compiler to use either instead of or in addition to the current options. Then the user can activate the Log-and-Override wizard in override mode. When the user rebuilds the application, the tool will use the options from the log file either instead of or in addition to the current options, as the user directs. These options can be specified on a module-per-module basis. Dr. Options optionally outputs its recommendations in the format required by the Log-and-Override wizard so that users can test out the new options without modifying their build environments. Originally, the Log-and-Override wizard was intended to be a part of Dr. Options. However, since it turned out to be useful for other purposes as well, we implemented it as a separate companion tool.
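The following Python fragment sketches the wrapper idea under stated assumptions: the JSON log format, the mode names, and the direct "cc" invocation are all hypothetical, since the paper does not describe the wizard's actual file format or interface.

import json
import subprocess

LOG_FILE = "options.log"

def compile_module(source, current_options, mode):
    """Hypothetical per-module compile step wrapped by the log/override tool."""
    if mode == "log":
        # Normal build: record the options actually used for this module.
        with open(LOG_FILE, "a") as log:
            log.write(json.dumps({"module": source,
                                  "options": current_options}) + "\n")
        options = current_options
    else:  # mode == "override": take options from the (possibly edited) log
        per_module = {}
        with open(LOG_FILE) as log:
            for line in log:
                entry = json.loads(line)
                per_module[entry["module"]] = entry["options"]
        # Here the logged options replace the current ones; appending them
        # instead is the other behavior the text mentions.
        options = per_module.get(source, current_options)
    subprocess.run(["cc", "-c", source] + options, check=True)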
8. RELATED WORK

Many researchers have looked at pieces of the options (or optimization) selection problem. For example, Santhanam et al. [SaGH97] studied the benefits of data prefetching for floating-point applications on the PA-8000. Zucker, Flynn and Lee [ZuFL95] determined that prefetching is beneficial for multimedia applications. McKinley [McKi92] has looked at the problem of automatically determining whether to parallelize a loop based on a loop cost model and training sets. In contrast, our goal was to take research results such as these, as well as the tuning experiences of our experts, and tackle the general problem of options recommendation.

Other researchers have focused on developing tools that aid the performance tuning process in other ways. Code Coach, which is part of Intel's VTune™ performance-tuning suite [Inte], interactively suggests source code modifications. Suggestions can be restricted to hot spots identified using VTune profiling tools. Rice's Parascope editor [HHK+91] allows users to interact with the compiler during optimization to select transformations and make assertions about their codes. There also exist performance analysis tools such as the interactive Cache Visualization Tool (CVT) [DKTG97], which allows users to visualize the cache performance of small sections of code in minute detail, to tie the cache action back to the source code, and to compare performance of transformed and untransformed source. However, the CVT does not suggest improvements. Meanwhile, both Parascope and the CVT target users with a high degree of compiler and/or architecture knowledge. All three (VTune, Parascope and the CVT) target users who are willing to invest substantial time and effort to improve small sections of hot code. In contrast, Dr. Options and the Log-and-Override wizard target users of varying backgrounds seeking to improve overall application performance, with minimal user intervention.

Texas Instruments' Profile-Based Compiler [TeIn] and Greenhills' CodeBalance [Gree] automate the selection of options that affect code size. The user sets the goal (minimum code size at a given performance level or maximum performance at a given code size level) and the tools strive to meet this goal by selecting code-size-related options on a function-by-function basis, based on profile data from multiple executions of the application. The remainder of the options are selected manually by the user. While both of these tools allow the user to reduce code size with minimum performance degradation, they cannot be used to improve performance. Thus, all of the strategies presented in this section complement each other in the overall application optimization process.
9. CONCLUSIONS

The goal of the work presented in this paper is to study the feasibility of automatically recommending compiler options, and in particular to determine how good automatic recommendations can be. To address this question, we developed the Dr. Options tool for automatically recommending compiler options based on user-supplied information, compiler-supplied information and profile information. From our experience, we believe that automatic recommendations can be substantially better than those which unsophisticated compiler users would select on their own. Additionally, Dr. Options allows users to describe their applications, their personal tuning preferences, and their priorities in the terms in which the users think of them, as opposed to tediously hunting through documentation for cryptically-described options and then trying to determine which of these are relevant to their situation.

Even for the most expert compiler users, Dr. Options can be useful as a consultant. Recall that it already served as a useful consultant for helping us improve our peak SPEC 95 options. Dr. Options is also useful for doing some of the more tedious aspects of performance analysis on a function-by-function basis, which can result in significant time savings. When combined with the Log-and-Override wizard, which allows users to test options without modifying their build environments, we believe that Dr. Options can allow all levels of users to achieve better compiler performance with significantly less effort.

In the future, we would like to continue fine-tuning Dr. Options' recommendations and to extend Dr. Options to use assembly-level profile information to more closely pinpoint hot memory accesses and mispredicted branches. Although Dr. Options' rules were devised from experiences with a large variety of applications, we have focused primarily on SPEC 95 for our initial proof-of-concept experiments. The motivation for doing this is that we wanted to compare performance with Dr. Options recommendations against the best manual tuning efforts. However, in the future, we would like to do more experiments with larger applications. We would also like to add analyses to further reduce our reliance on user-supplied information. In particular, we would like to obtain information automatically on the use of extra-safe or extra-sloppy programming styles that enable or prevent certain optimizations, respectively. We would also like to add a GUI to the user-supplied information, with novice and expert levels, to guide users through the relevant set of parameters.

At the time of this writing, Dr. Options and its companion tool are still in the preliminary prototype phase. Overall, however, we were very pleased with our initial results. Our goal was simply to raise the level of the user beyond +O2. We were pleasantly surprised to find that, even with our relatively simple first set of rules, the performance achieved using Dr. Options' recommendations was competitive with the performance achieved using base and peak SPEC options. Moreover, the options recommended by Dr. Options were often simpler and safer to use. Although the rules used by Dr. Options are PA-RISC specific, it should be possible to develop rules for other compilers and platforms that lead to equally good recommendations.
ACKNOWLEDGEMENTS

This work would not have been possible without the input of the PA-RISC optimizer team, the performance delivery team and the technical consulting lab. We would like to thank the many members of these teams who shared their performance tuning experience and expertise with us, especially Carl Burch, Eddie Gornish, Wei-Chung Hsu, Ricky Benitez and Pam Taylor. Additional thanks go to Sridhar Ramakrishnan for his support of the project and to Carol Thompson and Rick Hank for proofreading this paper.
REFERENCES

[Burc00] "Performance Tuning with PA-RISC Compilers", Interworks 2000. http://devresource.hp.com/STK/partner/perf_tuning.pdf.

[DKTG97] van der Deijl, Eric, Gerco Kanbier, Olivier Temam and Elana Granston, "A Cache Visualization Tool", IEEE Computer, 30(7), July 1997.

[Gree] Greenhills Software, Inc., CodeBalance, http://www.ghs.com/products/CodeBalance.html.

[Hewl95] HP PA-RISC Compiler Optimization Technology White Paper, HP Part No. 5963-7250E, March 1995.

[HHK+91] Hall, M. W., T. Harvey, K. Kennedy, N. McIntosh, K. S. McKinley, J. D. Oldham, M. Paleczny and G. Roth, Experiences Using the Parascope Editor, CRPC-TR91173, Center for Research on Parallel Computation, Rice University, September 1991.

[Holl96] Holler, Anne, "Optimization for a Superscalar Out-of-Order Machine", Proceedings of the 29th International Conference on Microarchitecture (Micro 29), pp. 336-348, 1996.

[Hunt95] Hunt, Doug, "Advanced Performance Features of the 64-bit PA-8000", COMPCON '95 Digest of Papers, pp. 123-128, March 1995.

[Inte] Intel, VTune™, http://developer.intel.com/software/products/vtune/codecoach.

[Kane96] Kane, Gerry, PA-RISC 2.0 Architecture, ISBN 0-13-182734-0, Prentice-Hall, Englewood Cliffs, New Jersey, 1996.

[McKi92] McKinley, Kathryn S., Automatic and Interactive Parallelization, CRPC-TR92214, Center for Research on Parallel Computation, Rice University, Houston, Texas, April 1992.

[SaGH97] Santhanam, Vatsa, Edward H. Gornish and Wei-Chung Hsu, "Data Prefetching on the PA8000," The 24th Annual International Symposium on Computer Architecture, ACM Press, pp. 264-273, 1997.

[TeIn] Texas Instruments, Inc., Profile Based Compilation (PBC), http://www.ti.com/sc/c6000compiler (click on Profile Based Compiler).

[ZuFL95] Zucker, Daniel F., Michael J. Flynn and Ruby B. Lee, "A Comparison of Hardware Prefetching Techniques For Multimedia Benchmarks," Stanford Technical Report CSL-TR-95-683, December 1995.