Precise Inference of Polymorphic Constrained Types

Scott F. Smith and Tiejun Wang?
Department of Computer Science
The Johns Hopkins University
Baltimore, MD 21218, USA
{scott,wtj}@cs.jhu.edu
Abstract. This paper develops a precise polymorphic type inference algorithm. Several methods for polymorphic type inference have been developed, including let-polymorphism and flow-based approaches such as Agesen’s Cartesian Product Algorithm (CPA). In this paper we focus on the flow-based variety. There is a class of polymorphic behavior which CPA misses: so-called data polymorphism. In the context of imperative object-oriented languages such as Java, data polymorphism is in fact quite common. This paper develops a new extension to CPA, DCPA, which improves on CPA by accurately and efficiently analyzing data polymorphic programs. We develop DCPA in a type-constraint-based setting, prove its type-soundness, and implement the algorithm for the full Java language to test its feasibility in practice. Our test implementation is used to statically verify the correctness of Java downcasts. Initial benchmark results are given which show the algorithm has considerable promise, both in terms of precision and efficiency.
1 Introduction This paper focuses on expressive polymorphic type inference. The classic form of polymorphic type inference is the let-polymorphism of ML. Let-polymorphism is very useful, but is limited both in expressiveness and efficiency. Its expressiveness is limited in that only let-bound definitions can be polymorphic, and its efficiency is limited in that the types for let-bound definitions are always duplicated even if the definitions are not used polymorphically. In the context of a unification-based inference algorithm, this efficiency issue is not a problem in practice, but when using let-polymorphism over constraint-based type inference, it becomes a more serious issue. Our interest in this paper is to develop a polymorphic type inference algorithm for constrained types [AW93] so efficiency is an important issue. Several expressive flow-based type inference algorithms have been defined which have advantages over let-polymorphism in a constraint context. The cartesian product algorithm (CPA) [Age95, Age96] analyzes programs with parametric polymorphism in an efficient and adaptive manner. CPA detects functions as polymorphic at the point they are invoked by a particular argument: if a function is being applied at two different types, those two different applications are given different copies of the function type (i.e., different contours). If the types are the same, the contour can be shared. ?
Partial funding provided by NSF grants CCR-9619843 and CCR-9988491
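To make CPA's per-argument splitting concrete, here is a small Java analogue (our illustration, not from the paper). CPA gives each application of id its own contour, i.e. its own copy of id's constrained type, so each result keeps its precise type; a monomorphic analysis would merge both call sites and type each result as Object.

```java
// Illustrative sketch (class and method names are ours): id is applied at two
// different argument types. Under CPA each application gets a separate contour,
// so both downcasts below can be statically verified.
class CpaSplit {
    static Object id(Object x) { return x; }

    static String s = (String) id("hi");   // contour: id applied at String
    static Integer n = (Integer) id(7);    // contour: id applied at Integer
}
```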
So, the CPA form of polymorphism differs from let-polymorphism in the location of the ∀ introduction and elimination: in let-polymorphism, the ∀ is introduced at every let definition, and eliminated at every use of a let definition. In CPA, a ∀ is introduced around every single λ-term, and eliminated at every call site, additionally with possible sharing of eliminands for efficiency. We will call this form of polymorphism flow-based polymorphism due to the need for flow information to define ∀-elimination. nCFA [Shi91] is in the school of flow-based polymorphism, but does not share enough eliminands and thus is too inefficient in practice. In our previous ESOP paper [SW00], we developed a framework in which to express polymorphic constraint-based flow analyses. Compared to working in the flow-graph-based approach used in other implementations of flow analyses [Age96, GDDC97, PC94], a constraint-based framework has several advantages: using techniques described in [FF97, Pot99], constrained types can be simplified on-the-fly and garbage collection of unreachable constraints can be performed as well, leading to more efficient analyses; and, re-analysis of a function in a different polyvariant context is realized by instantiation of the function's constrained type scheme, and does not require re-analysis of the function body. In the context of flow-based polymorphism, there is a class of polymorphic behavior which CPA misses: so-called data polymorphism. The goal of this paper is to improve on CPA in this regard. Data polymorphism arises when a function creates and returns a mutable data structure, different applications of the same function return different data structures at run-time, and those mutable data structures are modified differently. An analysis such as CPA may fail to disambiguate those data structures by letting them share the same type, causing a precision loss.
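The following Java sketch (our illustration; names are ours) shows data polymorphism of exactly this kind: makeList returns a fresh mutable list at every call, and the two call sites fill their lists with different element types. An analysis that gives makeList a single contour types both lists as holding Integer or String, so neither downcast can be verified; splitting the two allocations keeps each list's contents precise.

```java
import java.util.ArrayList;
import java.util.List;

// Two applications of makeList return distinct mutable structures that are
// then modified differently -- the data-polymorphic pattern described above.
class DataPoly {
    static List<Object> makeList() { return new ArrayList<>(); }

    static String useString() {
        List<Object> l = makeList();
        l.add("hello");
        return (String) l.get(0);  // verifiable only if the two lists are split
    }

    static Integer useInt() {
        List<Object> l = makeList();
        l.add(42);
        return (Integer) l.get(0);
    }
}
```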
Data polymorphism is quite common in object-oriented languages, especially when generic container classes are used. We present results in this paper showing that CPA in fact misses a very large chunk of the polymorphism in average Java programs due to lack of treatment of data polymorphism. One precise algorithm for detecting data polymorphism is the iterative flow analysis (IFA) of Plevyak and Chien [PC94], but this algorithm may require many iteration passes for large programs. We proposed an algorithm in [SW00] without an implementation, but when we subsequently implemented it, it proved too inefficient to be practical. In this paper we develop a novel algorithm, Data-Polymorphic CPA (DCPA), which extends CPA with the ability to precisely analyze data-polymorphic programs. Unlike IFA, the algorithm requires no generational iteration. For each function application, the algorithm on-the-fly detects whether it is CPA-safe, which means that there cannot be any data polymorphism in it. For CPA-safe applications, only CPA splitting is performed, while for CPA-unsafe applications, more contours are generated to prevent the possible precision loss due to data polymorphism. The notion of CPA-safe is defined by several detection predicates which have two important advantages: they never mark a contour as safe when it is not (thus, no accuracy is lost); and, they mark the vast majority of contours as CPA-safe and thus prevent blowup from occurring by making too many contours. We believe DCPA is close to a maximally expressive polymorphic type inference algorithm. The algorithm is proven type-sound using our polyvariant
flow framework [SW00]. In general, the use of a constraint-based framework was very helpful not only in establishing soundness, but also in formulating our new analysis. Given the subtle efficiency/expressiveness trade-offs, we believe implementation is a critical test, and have completed a prototype implementation of DCPA for the Java language. We use DCPA as a static tool to determine whether Java type-casts will always succeed at run-time. We report some very promising initial results obtained on benchmarks—nearly all casts which could be verified statically as sound are verified by our system. The precision improvement over CPA is shown to be substantial, and our algorithm runs nearly as fast as CPA.
2 A Framework for Polyvariant Flow Analysis

This section briefly reviews our framework of polyvariant constrained type inference [SW00], and presents one instantiation of the framework to give CPA. Readers should refer to the above paper for more details. The DCPA algorithm of this paper uses the same type inference rules as presented in this section, and uses the closure framework presented here to establish soundness.

2.1 The Language

Definition 2.1 (The language):
e ::= x | n | succ e | if0 e e e | λx.e | e e | new | e := e | !e | ⟨e, e⟩ | e.1 | e.2 | e; e
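The reference-cell primitives of this grammar (new, e1 := e2, !e) can be modeled in Java as follows; this is our own minimal sketch, not part of the formal development.

```java
// A minimal Java model (ours) of the calculus' reference cells: `new` creates
// an uninitialized cell, `e1 := e2` writes it (and evaluates to the written
// value, as in the (Write) rule), and `!e` reads it.
class Cell<T> {
    private T contents;                         // uninitialized, as with `new`

    T read() { return contents; }               // !e
    T write(T v) { contents = v; return v; }    // e1 := e2
}
```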
This is a standard call-by-value lambda calculus extended with reference cells. Execution of a new expression creates a fresh, uninitialized reference cell. We use new because it models the memory creation mode of languages like Java and C++, where uninitialized references are routinely created. Our language is small, but since the focus of the paper is on data polymorphism it includes mutable references. Pairs and references provide the core building blocks of objects and the key problems arising in Java occur here as well. 2.2 The Types Our basis is an Aiken-Wimmers-style constraint system [AW93]; in particular it is most closely derived from the system described in [EST95], which combines constraints and mutable state. Definition 2.2 (Types): The type grammar is as follows.
τ ∈ Type    t ∈ TypeVar    u ∈ ImpTypeVar ⊆ TypeVar
t̄ ∈ TypeVarSet = P_fin(TypeVar)    v ∈ ValueType
τ1 < τ2 ∈ Constraint    C ∈ ConstraintSet = P(Constraint)

τ ::= t | v | read t | write τ | t1 → t2
v ::= int | (∀ t̄. t → τ \ C) | ref u | τ1 × τ2
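As a concrete aid, the type grammar of Definition 2.2 could be encoded as Java records along the following lines; this is our own sketch (all names are ours), not the paper's implementation.

```java
import java.util.Set;

// Sketch of the grammar tau ::= t | v | read t | write tau | t1 -> t2 and
// v ::= int | (forall tbar. t -> tau \ C) | ref u | tau1 x tau2.
interface Ty {}
record TVar(String name) implements Ty {}            // t (use for u as well)
record ReadT(TVar t) implements Ty {}                // read t
record WriteT(Ty tau) implements Ty {}               // write tau
record Arrow(TVar t1, TVar t2) implements Ty {}      // t1 -> t2 (call site)

interface ValueTy extends Ty {}                      // v: types for data values
record IntT() implements ValueTy {}                  // int
record RefT(TVar u) implements ValueTy {}            // ref u
record PairT(Ty t1, Ty t2) implements ValueTy {}     // tau1 x tau2

record Constraint(Ty lo, Ty hi) {}                   // tau1 < tau2
record Forall(Set<TVar> bound, TVar arg, Ty res,
              Set<Constraint> c) implements ValueTy {} // (forall tbar. t -> tau \ C)
```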
(Var)    A(x) = t  ⟹  A ⊢ x : t \ {}

(Int)    A ⊢ n : int \ {}

(Succ)   A ⊢ e : τ \ C  ⟹  A ⊢ succ e : int \ {τ < int} ∪ C

(If0)    A ⊢ e1 : τ1 \ C1, e2 : τ2 \ C2, e3 : τ3 \ C3  ⟹
         A ⊢ if0 e1 e2 e3 : t \ {τ1 < int, τ2 < t, τ3 < t} ∪ C1 ∪ C2 ∪ C3

(Abs)    A, {x : t} ⊢ e : τ \ C  ⟹
         A ⊢ λx.e : (∀ t̄. t → τ \ C) \ {}
         where t̄ = FreeTypeVar(t → τ \ C) − FreeTypeVar(A)

(Appl)   A ⊢ e1 : τ1 \ C1, e2 : τ2 \ C2  ⟹
         A ⊢ e1 e2 : t2 \ {τ1 < t1 → t2, τ2 < t1} ∪ C1 ∪ C2

(New)    A ⊢ new : ref u \ {}

(Read)   A ⊢ e : τ \ C  ⟹  A ⊢ !e : t \ {τ < read t} ∪ C

(Write)  A ⊢ e1 : τ1 \ C1, e2 : τ2 \ C2  ⟹
         A ⊢ e1 := e2 : τ2 \ {τ1 < write τ2} ∪ C1 ∪ C2

(Seq)    A ⊢ e1 : τ1 \ C1, e2 : τ2 \ C2  ⟹  A ⊢ e1; e2 : τ2 \ C1 ∪ C2

Fig. 1. Type inference rules
ValueType are the types for data values. Function uses (call sites) are given type t1 → t2. ref u is the type for a cell whose content has type u. We distinguish imperative type variables u ∈ ImpTypeVar. Read and write operations on reference cells are represented by types read t and write τ respectively. Functions are given polymorphic types (∀ t̄. t → τ \ C).
2.3 The Type Inference Rules

We present the type inference rules in Figure 1. A type environment A is a mapping from program variables to type variables. Given a type environment A, the proof system assigns a type to expression e via the type judgment A ⊢ e : τ \ C, where τ is the type for e, and C is the set of constraints which models the flow paths in e. We abbreviate A ⊢ e : τ \ C as ⊢ e : τ \ C when A is empty.

Definition 2.3 (Type inference algorithm): For closed expression e, its inferred type is τ \ C provided ⊢ e : τ \ C.
The (Abs) rule assigns each function a polymorphic type (∀ t̄. t → τ \ C). In this rule, FreeTypeVar() is a function that extracts free type variables, t̄ collects all the type variables generated when the inference is applied to the function body, and C collects all the constraints corresponding to the function body. The (New) rule assigns the reference cell type ref u, with u, the type of the cell content, initially unconstrained.
For space reasons, the (standard) inference rules for pair and projection expressions are not given.

2.4 Computation of the Closure

The inference algorithm applied to program e results in a type judgment ⊢ e : τ \ C. The constraint closure rules appear in Figure 2, propagating information via deduction rules on the subtyping constraints.
(Trans)    v < t, t < τ  ⟹  v < τ

(∀-Elim)   (∀ t̄. t → τ \ C) < t1 → t2, v < t1  ⟹  v < t′, τ′ < t2, C′
           where (t′ → τ′ \ C′) = Poly((∀ t̄. t → τ \ C) < t1 → t2, v)

(Read)     ref u < read t  ⟹  u < t

(Write)    ref u < write τ  ⟹  τ < u

(Pair)     τ1 × τ2 < τ1′ × τ2′  ⟹  τ1 < τ1′, τ2 < τ2′

Fig. 2. Constraint Closure rules
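The non-polymorphic closure rules amount to a saturation procedure over the constraint set. The following Java sketch (our own simplification, not the paper's implementation; it omits (∀-Elim) and (Pair)) iterates (Trans), (Read), and (Write) to a fixed point:

```java
import java.util.*;

// Naive fixed-point closure (ours) over constraints lo < hi, applying
// (Trans), (Read), and (Write) until no new constraints are derivable.
class Closure {
    interface Ty {}
    record Var(String n) implements Ty {}   // type variable t (or u)
    record IntT() implements Ty {}          // value type int
    record Ref(Var u) implements Ty {}      // ref u
    record Read(Var t) implements Ty {}     // read t
    record Write(Ty ty) implements Ty {}    // write tau
    record C(Ty lo, Ty hi) {}               // constraint lo < hi

    static Set<C> close(Set<C> cs) {
        Set<C> out = new HashSet<>(cs);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<C> snap = new ArrayList<>(out);
            for (C a : snap) {
                // (Read): ref u < read t  ==>  u < t
                if (a.lo() instanceof Ref r && a.hi() instanceof Read rd)
                    changed |= out.add(new C(r.u(), rd.t()));
                // (Write): ref u < write tau  ==>  tau < u
                if (a.lo() instanceof Ref r && a.hi() instanceof Write w)
                    changed |= out.add(new C(w.ty(), r.u()));
                // (Trans): v < t, t < tau  ==>  v < tau
                if (a.hi() instanceof Var)
                    for (C b : snap)
                        if (a.hi().equals(b.lo()))
                            changed |= out.add(new C(a.lo(), b.hi()));
            }
        }
        return out;
    }
}
```

For example, the program (x := 3; !x) with x a fresh cell yields constraints ref u < write int and ref u < read t; closing them derives int < u, u < t, and by (Trans) int < t, so the read has type int.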
The critical closure rule is (∀-Elim). The constraint (∀ t̄. t → τ \ C)