
PUBLICATION INTERNE No 1065

AUTOMATIC, TEMPLATE-BASED RUN-TIME SPECIALIZATION: IMPLEMENTATION AND EXPERIMENTAL STUDY

ISSN 1166-8687

FRANÇOIS NOËL, LUKE HORNOF, CHARLES CONSEL, JULIA L. LAWALL

IRISA CAMPUS UNIVERSITAIRE DE BEAULIEU - 35042 RENNES CEDEX - FRANCE

INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTÈMES ALÉATOIRES Campus de Beaulieu – 35042 Rennes Cedex – France Tél. : (33) 02 99 84 71 00 – Fax : (33) 02 99 84 71 71 http://www.irisa.fr

Automatic, Template-Based Run-Time Specialization: Implementation and Experimental Study

François Noël, Luke Hornof, Charles Consel, Julia L. Lawall

Theme 2: Software engineering and symbolic computation. Project LANDE / pre-project COMPOSE. Internal publication no. 1065, November 1996, 23 pages.

Abstract: Specializing programs with respect to run-time values is an optimization strategy that has been shown to drastically improve code performance on realistic programs ranging from operating systems to graphics. Recently, various approaches to specializing code at run time have been proposed. However, these approaches still suffer from shortcomings that limit their applicability: they are either manual, require programs to be written in a dedicated language, or are too expensive to be widely applied. We solve these problems by introducing new techniques to implement run-time specialization. The key to our approach is the use of code templates. Templates are automatically generated from ordinary programs. These templates are compiled and optimized before run time, thus minimizing the time to generate code at run time. Because templates can be compiled by an optimizing compiler, the code generated is of high quality. Experimental results obtained on scientific and graphics code indicate that our approach is highly effective. Little run-time overhead is introduced since run-time specialization primarily consists of copying instructions. Run-time specialized programs run nearly (80% on average) as fast as fully optimized programs, improving performance up to a factor of 10. The combination of low run-time overhead and high code quality enables specialization to be amortized in as few as 3 runs. Despite the fact that this approach is highly effective, its implementation is relatively simple since it exploits existing partial evaluation and compiler technologies.

Key-words: partial evaluation, program specialization, run-time code generation

This research is supported in part by France Telecom/SEPT, ARPA grant N00014-94-1-0845 and contract F19628-95-C0193, and NSF grant CCR-92243375.

Centre National de la Recherche Scientifique (UPRESSA 6074) Université de Rennes 1 – Insa de Rennes

Institut National de Recherche en Informatique et en Automatique – unité de recherche de Rennes


1 Introduction

Specialization is a program transformation that takes a general program and the context in which it occurs, known as the specialization context, and generates a program specialized with respect to the specific context. Partial evaluation is a specialization technique that optimizes programs when some of a program's input values are available in the specialization context. More precisely, the program is transformed by performing all of the calculations that depend solely on the known input values. The specialized program contains fewer calculations and therefore should be more efficient than the original program. In this paper we present a partial evaluator that performs specialization at run time, and assess its performance.

We begin by motivating the need for specialization at run time. Specialization can be useful in a variety of domains, such as scientific computing [8], operating systems [28] and graphics [19, 24]. For example, in scientific computing, matrix multiplication can be specialized with respect to each row of the first matrix, improving efficiency when the first matrix is sparse without manually introducing a new matrix representation. Similarly, in graphics, an image viewer can be specialized with respect to the number of colors available on the host machine.

Traditionally, partial evaluation has been carried out at compile time. Compile-time specialization is not, however, always feasible. In some situations context information may change frequently at run time. For example, a scientific program may perform many matrix multiplications that depend on data that is not known until run time. In other cases the context information is inherently unchanging, but simply not known to the compiler of the source code. For example, only the executable code for an image viewer might be available. Thus, even though the number of colors available on a machine rarely changes, the ability to specialize with respect to this information must be built into the executable code.

An attempt has been made to address these issues within the framework of compile-time partial evaluation, by using program transformation. When the set of possible run-time contexts is fixed and small, the program can be rewritten to generate all possible specializations at compile time. For example, many machines provide either 8-bit or 24-bit color, so specializations could be prepared for these common cases. This rewriting is widely known in the partial evaluation community, and is referred to as "The Trick" [12]. Generally, however, it is not possible to specify all the possible specialization contexts at compile time. Even when all the possible contexts can be predicted, it may not be desirable to generate specializations that are never used. Previous studies show that generating code at compile time for all the contexts of the bitblt graphics routine, for example, would produce over ten thousand times more unused code than used code [24].

Thus we turn our attention to specialization at run time. Run-time specialization is the process of producing specialized code at run time, given a program and a description of the run-time context. Unlike compile-time specialization, which is generally a source-to-source transformation, run-time specialization directly produces machine code.

Specialization encompasses two phases. The front end identifies what code can be evaluated at specialization time. The back end produces the specialized program based on this information. Our approach addresses both issues. The code to evaluate at specialization time is inferred automatically from the specialization context before run time, using partial evaluation technology that is amenable to either compile-time or run-time specialization. The specialized code is constructed from templates, code fragments containing holes to fill with the values calculated during specialization. These templates are compiled before run time, using an existing C compiler. At run time, the specialization is constructed from this precompiled code, which introduces little run-time overhead. In contrast to other approaches [1, 15], we make extensive reuse of existing technology.
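The rewriting known as "The Trick", mentioned above for the 8-bit and 24-bit color cases, can be sketched as follows. This is only an illustration under stated assumptions: the pixel-fill routine, its little-endian byte layout, and the names `fill`, `fill_8`, `fill_24` and `fill_generic` are hypothetical and do not come from any program studied in this paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* General routine (hypothetical): the pixel depth is an ordinary
   run-time parameter, so every pixel pays for the inner loop. */
static void fill_generic(uint8_t *dst, size_t npixels, uint32_t color,
                         int depth_bits)
{
    size_t bytes_per_pixel = (size_t)depth_bits / 8;
    for (size_t i = 0; i < npixels; i++)
        for (size_t b = 0; b < bytes_per_pixel; b++)
            dst[i * bytes_per_pixel + b] = (uint8_t)(color >> (8 * b));
}

/* Versions prepared before run time for the two anticipated depths. */
static void fill_8(uint8_t *dst, size_t npixels, uint32_t color)
{
    for (size_t i = 0; i < npixels; i++)
        dst[i] = (uint8_t)color;
}

static void fill_24(uint8_t *dst, size_t npixels, uint32_t color)
{
    for (size_t i = 0; i < npixels; i++) {
        dst[3 * i]     = (uint8_t)color;
        dst[3 * i + 1] = (uint8_t)(color >> 8);
        dst[3 * i + 2] = (uint8_t)(color >> 16);
    }
}

/* "The Trick": dispatch on the unknown value over a fixed, small set of
   cases, so that inside each branch the depth is a known constant.
   Depths outside the anticipated set fall back to the general code. */
void fill(uint8_t *dst, size_t npixels, uint32_t color, int depth_bits)
{
    assert(dst != NULL);
    switch (depth_bits) {
    case 8:  fill_8(dst, npixels, color);  break;
    case 24: fill_24(dst, npixels, color); break;
    default: fill_generic(dst, npixels, color, depth_bits); break;
    }
}
```

The sketch also makes the limitation visible: every anticipated context costs one precompiled version whether or not it is ever used, which is what motivates generating specializations at run time instead.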


Concretely, our approach consists of the following steps. A program is first analyzed with respect to the specialization context. The analysis consists essentially of a binding-time analysis, relying on the result of prior alias and side-effect analyses. Binding-time analysis identifies program constructs that can be evaluated during specialization, based on known input values available in the specialization context. The results of the binding-time analysis are used to annotate the program with the action to perform for each construct during specialization. An action-annotated program can then be specialized either at compile time or at run time. Compile-time specialization consists of interpreting actions given some specialization values. To perform run-time specialization, the action-annotated program is further analyzed to determine a safe approximation of the possible specializations it may yield. This result is expressed as a tree grammar. This tree grammar is used to generate templates automatically at the source level. Templates correspond to the computations that do not depend solely on specialization values. They are compiled by a standard compiler. Various information on template delimiters and template holes is collected. Finally, a run-time specializer is generated; it consists of the compiled templates, and the static computations intertwined with simple operations to dump templates, fill holes, and relocate jump targets. To generate the specialized program at run time, the run-time specializer is invoked with the specialization values; it returns the address of the specialized code, in the form of a procedure. This procedure can then be called repeatedly with arbitrary values for the remaining inputs.

In previous work, we formally defined this approach for a subset of an imperative language and proved it correct [4, 22]. This paper describes the techniques used to carry out the implementation, using an existing C compiler. Both gcc and lcc are currently supported. We also demonstrate, based on experimental results, that our run-time specializer can be applied widely, drastically improves performance, and introduces very little overhead. Our approach has been implemented, and is part of our partial evaluator Tempo, which performs both compile-time and run-time specialization.

Recently there has been much interest in developing and implementing languages to express what code should be generated at run time [6, 30]. These languages provide some support, such as static typing, to ensure that the code generated at run time is meaningful. Such languages, however, correspond only to the back end of our system: they do not infer how the program should be split into source code and code to be generated at run time. We can, however, easily use our front end to construct code in such languages. Thus, we have additionally integrated the C-like run-time code generation language 'C [6] as the back end of our specializer, and are currently performing experiments.

In applying specialization to a large system, there are two issues to consider. This work addresses only the problem of how to specialize code at run time, given the code to specialize and an abstract description of the specialization context. An equally important problem is how to manage the specialization process: what code to specialize, how to safely describe the context, how many specializations to create, how to detect when specialized code should no longer be used, etc. These issues have begun to be investigated [5, 31], but are beyond the scope of this paper.

Our approach to run-time specialization, which we review in Section 2, has previously been published [4]. The contributions of this paper can be summarized as follows.

• Section 3 presents the techniques used to implement our run-time specializer. The implementation is relatively simple since it reuses existing systems: template creation uses part of a traditional compile-time partial evaluator and template compilation is performed by an existing compiler.


• Experimental studies, shown in Section 4, demonstrate that our strategy is widely applicable and the implementation is highly effective.

  – As our approach is fully automatic and treats the C language, any program can benefit from run-time specialization. In our experiments, significant speedups are obtained for existing, publicly available programs written by others.

  – Our run-time specializer is drastically faster than other automatic approaches. Our process is fully compiled: the operations needed to specialize code are simple, and no interpretation or auxiliary tables are necessary.

  – An additional study is made to assess the quality of the code generated by assembling binary templates at run time, by comparing it with a fully optimized specialized program. This is possible due to the fact that Tempo specializes programs at run time and at compile time. Results show that our run-time specialized programs are nearly (80% on average) as efficient as fully optimized specialized versions. These measurements demonstrate that the additional overhead incurred by deferring optimizations to run time, as advocated by others [1, 15], may not pay off in many cases.

Section 5 discusses related work, while Section 6 gives concluding remarks and outlines future directions.
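The run-time specializer outlined in this introduction, in which static computations are intertwined with operations that dump templates and fill holes, can be illustrated with a small simulation. The sketch below is not Tempo's implementation: it assumes a made-up pseudo-instruction array in place of compiler-generated machine code, an interpreter loop (`run`) in place of a call through a procedure address, and it omits the relocation of jump targets. All names (`struct insn`, `dump`, `specialize_dot`, `run`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pseudo-instructions standing in for machine code (an assumption of
   this sketch; the real templates are binary code from a C compiler). */
enum opcode { OP_MULADD, OP_RET };

struct insn {
    enum opcode op;
    int coeff;   /* hole: static value, filled during specialization */
    int index;   /* hole: static index, filled during specialization */
};

/* Templates prepared before run time; holes are left as 0. */
static const struct insn tmpl_body[] = { { OP_MULADD, 0, 0 } };
static const struct insn tmpl_exit[] = { { OP_RET, 0, 0 } };

#define MAX_CODE 64

struct code {
    struct insn buf[MAX_CODE];
    size_t len;
};

/* Dump a template: a plain memory copy into the code buffer.
   Returns a pointer to the copy so that its holes can be filled. */
static struct insn *dump(struct code *c, const struct insn *t, size_t n)
{
    assert(c->len + n <= MAX_CODE);
    struct insn *dst = &c->buf[c->len];
    memcpy(dst, t, n * sizeof *t);
    c->len += n;
    return dst;
}

/* Specializer for a dot product whose first vector u and length size
   are the specialization values. The loop and the reads of u are static
   computations; only template copies and hole fills produce code. */
void specialize_dot(struct code *c, const int *u, int size)
{
    c->len = 0;
    for (int i = 0; i < size; i++) {
        struct insn *p = dump(c, tmpl_body, 1);
        p->coeff = u[i];   /* fill hole with a specialization value */
        p->index = i;      /* fill hole with the static loop index */
    }
    dump(c, tmpl_exit, 1);
}

/* Stand-in for executing the generated code on the dynamic input v. */
int run(const struct code *c, const int *v)
{
    int acc = 0;
    for (size_t pc = 0; pc < c->len; pc++) {
        const struct insn *p = &c->buf[pc];
        if (p->op == OP_RET)
            return acc;
        acc += p->coeff * v[p->index];
    }
    return acc;
}
```

In this simulation, `specialize_dot` is called once per specialization context, and `run` can then be applied repeatedly to different dynamic vectors, mirroring the way the specialized procedure is reused for the remaining inputs.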

2 Our Approach

We begin with the basic concepts of partial evaluation and its extension to perform run-time specialization.

2.1 Overview of Partial Evaluation

The inputs to a partial evaluator are a source program and a specialization context. The source program can be a complete program, or simply a module. The specialization context specifies the values of global variables and entry-point parameters with respect to which the program should be specialized. It may also contain alias information and describe the behavior of external functions. These issues are beyond the scope of this paper, but are treated in the implementation of Tempo.

Tempo is an offline partial evaluator. Offline partial evaluation is divided into an analysis phase followed by a transformation phase (specialization). The analysis phase takes as input the source program and an abstraction of the specialization context. Based on this information it annotates program constructs to identify those that can be simplified during specialization. The annotated program and the actual specialization context are then passed to the specializer. The specializer produces the specialized program: a C program in the case of compile-time specialization and machine code in the case of run-time specialization.

We will use the program shown in Figure 1 to illustrate this section. The program computes the dot product of two vectors of length size.
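As a minimal sketch of such a dot-product program, and not necessarily the code of Figure 1, the following assumes integer vectors and hypothetical names (`dot`, `dot_size3`). The second function shows the kind of residual program that specialization with respect to the context size == 3 could yield.

```c
#include <assert.h>

/* General program: nothing is assumed about size, u, or v. */
int dot(const int *u, const int *v, int size)
{
    assert(size >= 0);
    int sum = 0;
    for (int i = 0; i < size; i++)
        sum += u[i] * v[i];
    return sum;
}

/* Residual program for the context size == 3 (illustrative). The loop
   test and the index arithmetic depend only on size, so they are
   performed during specialization and the loop disappears. The
   multiplications and additions depend on u and v, which are unknown
   in this context, so they remain in the specialized code. */
int dot_size3(const int *u, const int *v)
{
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}
```

For any vectors of length 3, `dot(u, v, 3)` and `dot_size3(u, v)` compute the same value; the specialized version simply performs fewer operations per call.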

2.2 Analysis Phase

The inputs to the analysis phase are the source program and an abstraction of the specialization context. The abstract specialization context specifies which global variables and entry point parameters are known


