Generalising Techniques for Type Debugging

Bruce J. McAdam
Laboratory for Foundations of Computer Science
The University of Edinburgh
[email protected]

Abstract. Several authors have presented algorithms to help programmers understand the types in their programs and to help debug type errors. Each of these has supplied a different form of information to the programmer: the types of unbound identifiers and suggestions of where in programs mistakes may lie are examples of such information. This paper presents a means of representing type information as graphs from which we can extract a range of different facts about the types in programs. The graph representation is presented, and we see how it can be used to simulate some of the schemes proposed by other authors. We conclude that these graphs generalise a number of previously disjoint pieces of work.

1 Introduction

Practical programming experience tells us that it is difficult to correct programs (in strongly typed languages such as SML [MTH97] and Haskell [PH98]) which contain type errors. It has been noted that type error messages produced by compilers are often of little help, and can even be misleading [Wan86]. This problem has motivated a number of authors to present means for producing more useful information about untypeable programs.

The authors who have presented means for helping programmers have each selected a different form of information. For example, Bernstein and Stark [BS95] chose to describe the types of unbound identifiers, while Mitchell Wand [Wan86] describes locations in the program which may contain a mistake, and Duggan [Dug98] produces type explanations.

This paper presents a way of capturing information about the types of parts of a (possibly untypeable) program as a graph. We then examine the three pieces of previous work mentioned above and describe the form of information presented. It is shown that the graphs can be used to derive a number of different forms of information about programs. Hence, it is claimed that this work generalises a number of pieces of previous work.

2 Graphs

The vertices of the graphs represent types of parts of the program (fragments), type constructors (with sub-vertices to connect to their argument vertices), and type variables.

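Definition 1 (Varieties of vertex).

$$
\begin{array}{rcll}
v & ::= & f & \text{A program fragment} \\
  & \mid & [f_0]_{f_1, f_2 \ldots f_n} & \text{A program fragment tagged by other program fragments} \\
  & \mid & (\tau_1 \ldots \tau_n)c_i & \text{A type constructor } c \text{ with arity } n \text{ and unique identifier } i \\
  & \mid & c_i.j & \text{The } j\text{th connection point of the vertex } (\tau_1 \ldots \tau_n)c_i \\
  & \mid & \alpha & \text{A type variable}
\end{array}
$$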

A program fragment is a node of the syntax tree: it tells us both the content of the fragment and its location in the program. We add an extra nullary type constructor, unbound. Vertices for the function type constructor are denoted $\to_i$. The unique identifiers on the type constructors allow us to have several instances of a type constructor, for example to have an int list and a real list. All vertices should be thought of as representing types.

Edges are added between vertices which must represent the same type. Written $\mapsto_f$, they are labelled by program fragments and cannot go from type constructor or type variable vertices. They can, however, go from the connection points of type constructor vertices.


Fig. 1. Graph for the identity function. Read as $\alpha \to \alpha$.

The algorithm for generating graphs from programs will appear elsewhere [McA99]; it is omitted here because of space constraints. A simple example of graph generation can be seen in Figure 1. To generate the graph for $\lambda i.i$: first generate the graph for the result expression, $i$; then add vertices for the function type and the abstraction expression; connect the left connection point to the argument and the right connection point to the result. Both the argument and the result are $i$.

A more complex example involves let expressions, function application and tuples. This can be seen in Figure 2. Note the use of tagged fragment vertices in this graph. The vertex $[\lambda i.i]_{I_{\mathit{left}}}$ represents the type which $\lambda i.i$ in the definition must be able to take according to the left-hand use of identifier $I$.
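The construction steps for the identity function can be sketched in Python. This is only an illustration: the tagged-tuple encoding of vertices, the helper names (`frag`, `con`, `conn`, `tvar`, `add_edge`, `identity_graph`) and the choice of the abstraction as the label on all three edges are assumptions of this sketch, not part of the graph generation algorithm of [McA99].

```python
def frag(text, tags=()):
    """Program fragment vertex, optionally tagged by other fragments."""
    return ("frag", text, tuple(tags))

def con(name, ident, arity):
    """Type constructor vertex: constructor `name` with unique identifier `ident`."""
    return ("con", name, ident, arity)

def conn(c, j):
    """The j-th connection point (counting from 1) of constructor vertex c."""
    return ("conn", c, j)

def tvar(name):
    """Type variable vertex."""
    return ("tv", name)

def add_edge(edges, src, label, dst):
    """Add an edge from src to dst, labelled by a program fragment."""
    # Edges may not start at type constructor or type variable vertices.
    if src[0] in ("con", "tv"):
        raise ValueError("edges cannot leave constructor or type variable vertices")
    edges.add((src, label, dst))

def identity_graph():
    """Graph for the identity function, built in the order the text describes."""
    i = frag("i")                      # graph for the result expression i
    lam = frag("λi.i")                 # the abstraction expression
    arrow = con("->", 0, 2)            # a fresh function type vertex
    vertices = {i, lam, arrow, conn(arrow, 1), conn(arrow, 2)}
    edges = set()
    add_edge(edges, lam, "λi.i", arrow)           # the abstraction has the function type
    add_edge(edges, conn(arrow, 1), "λi.i", i)    # left connection point: the argument
    add_edge(edges, conn(arrow, 2), "λi.i", i)    # right connection point: the result
    return vertices, edges
```

Both connection points lead to the same fragment vertex, which is what lets the graph be read as a function from a type to that same type.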

Fig. 2. Graph for a let expression, $\mathbf{let}\ I = \lambda i.i\ \mathbf{in}\ (I\ 3, I\ \mathit{true})$. The final type is $\mathit{int} \times \mathit{bool}$.

The graph in Figure 2 looks quite complex. This indicates that it contains a great deal of information, but also that we require techniques for extracting more manageable amounts of information from it.

Graphs can also be generated for any syntactically correct but untypeable program. Figure 3 shows a classic example of a program which cannot be typed in a Hindley-Milner type system. We can see that the program is not typeable because there is a branch in the graph which allows us to reach two distinct types (int and bool) from a single vertex (the left connection point of the lower arrow).


Fig. 3. Graph for an untypeable program.

The problem in Figure 3 is in the type of the function argument $I$. It can be seen that $I$ is unambiguously intended to be a function, but the conflict lies in the type of its argument. Branches in the graph correspond to type error messages of the form "Cannot unify $\tau$ with $\tau'$". Graphs can also have cycles, which correspond to a failure to unify because of an 'occurs check' failure.

Fragment vertices corresponding to unbound identifiers are connected to the special unbound type constructor. If such a vertex is also connected to other vertices, it is possible to see what type it should have. This idea is explored in Section 4.

The number of vertices in a graph can be exponential in the size of the program it represents. As in most type inference algorithms, this limit is only reached if programs have let expressions nested to a depth proportional to their size, a situation which does not arise in practical programming (in which there is typically a constant limit to the nesting depth).

3 Analysis of Graphs

Before we go on to see how to perform complex analyses of graphs, we must consider how to extract the basic information of whether the program was correctly typed and, if so, what type it has. A program is correctly typed if its graph has no cycles, no vertex with edges leading to two different type constructor or leaf vertices, and no unbound vertices. The conditions the graph $(V, E)$ must meet for its generating program to be correctly typed are given in Definition 2.

Definition 2 (Conditions for $(V, E)$ being the graph for a correctly typed program).

- No conflicting types:
  $$\nexists v \in V : \exists v_1, v_2 \in V : v \mapsto v_1 \wedge (\nexists v_1' : v_1 \mapsto v_1') \wedge v \mapsto v_2 \wedge (\nexists v_2' : v_2 \mapsto v_2') \wedge v_1 \neq v_2$$
  i.e. there is at most one leaf reachable from any vertex (all type constructor vertices are leaves).
- No cycles: $\nexists v \in V : v \Rightarrow^{+} v$, where $v \Rightarrow v'$ iff $v \mapsto v'$ or $v'$ is a connection point of $v$.
- No unbound identifiers: $\nexists i : \mathit{unbound}_i \in V$.
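The three conditions of Definition 2 can be checked directly on a graph. The sketch below assumes a tagged-tuple encoding of vertices and a set of `(source, label, destination)` triples for edges; those choices and all function names are illustrative. For the first condition it follows the prose gloss (at most one leaf reachable along edges from any vertex), and it treats a leaf as any vertex with no outgoing edge.

```python
def frag(text):
    return ("frag", text)

def con(name, ident, arity):
    return ("con", name, ident, arity)

def conn(c, j):
    return ("conn", c, j)

def successors(edges, v):
    """Vertices reached from v by a single edge."""
    return {dst for (src, _label, dst) in edges if src == v}

def reachable_leaves(edges, v):
    """Vertices without outgoing edges that can be reached from v along edges."""
    seen, stack, leaves = set(), [v], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        nxt = successors(edges, u)
        if not nxt:
            leaves.add(u)
        stack.extend(nxt)
    return leaves

def has_conflict(vertices, edges):
    return any(len(reachable_leaves(edges, v)) > 1 for v in vertices)

def has_cycle(vertices, edges):
    """True if some vertex reaches itself by edges and constructor-to-connection-point steps."""
    def step(v):
        out = successors(edges, v)
        if v[0] == "con":
            out |= {conn(v, j) for j in range(1, v[3] + 1)}
        return out

    state = {}                          # vertex -> "open" (on the DFS path) or "done"

    def visit(v):
        state[v] = "open"
        for w in step(v):
            if state.get(w) == "open" or (w not in state and visit(w)):
                return True
        state[v] = "done"
        return False

    return any(v not in state and visit(v) for v in vertices)

def has_unbound(vertices):
    return any(v[0] == "con" and v[1] == "unbound" for v in vertices)

def violations(vertices, edges):
    """Names of the conditions of Definition 2 that the graph fails."""
    found = []
    if has_conflict(vertices, edges):
        found.append("conflicting types")
    if has_cycle(vertices, edges):
        found.append("cycle")
    if has_unbound(vertices):
        found.append("unbound identifier")
    return found
```

An empty result from `violations` corresponds to a graph for a correctly typed program under these assumptions.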

Reading a type from a graph involves a simple search from the vertex representing the complete program. This can only be done with graphs which meet the conditions in Definition 2. The procedure is given in Definition 3.

Definition 3 (Reading a type from a graph). When $G$ is a graph for program $e$ (meeting the conditions in Definition 2) then $T(G, e)$ is the type of program $e$, where $T$ is defined as follows:

$$
\begin{array}{lcll}
T(G, \alpha) & = & \alpha & \\
T(G, (\tau_1 \ldots \tau_n)c_i) & = & (T(G, c_i.1), \ldots, T(G, c_i.n))\,c_i & \\
T((V, E), v) & = & T((V, E), v') & \text{if } v \mapsto v' \in E \\
T((V, E), v) & = & v & \text{if } \nexists v' : v \mapsto v' \in E
\end{array}
$$
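Definition 3 translates almost line for line into a recursive function. The following is a minimal sketch, assuming vertices are tagged tuples, edges are `(source, label, destination)` triples, and the graph already meets the conditions of Definition 2. The string rendering of types and the names `read_type` and `render` are choices made for this illustration.

```python
def frag(text):
    return ("frag", text)

def con(name, ident, arity):
    return ("con", name, ident, arity)

def conn(c, j):
    return ("conn", c, j)

def render(name, args):
    """Print binary constructors infix, nullary ones bare, others postfix."""
    if len(args) == 2:
        return f"({args[0]} {name} {args[1]})"
    return name if not args else f"({', '.join(args)}) {name}"

def read_type(edges, v):
    """T(G, v): the type represented by vertex v, as a string."""
    if v[0] == "tv":                               # a type variable reads as itself
        return v[1]
    if v[0] == "con":                              # read each connection point in turn
        args = [read_type(edges, conn(v, j)) for j in range(1, v[3] + 1)]
        return render(v[1], args)
    nxt = {dst for (src, _label, dst) in edges if src == v}
    if nxt:                                        # follow an edge out of v
        return read_type(edges, next(iter(nxt)))
    if v[0] == "conn":                             # unconnected connection point
        return f"{v[1][1]}{v[1][2]}.{v[2]}"
    return v[1]                                    # leaf fragment: stands for itself
```

On the graph for the identity function this reads the abstraction as a function type whose argument and result are both the leaf fragment `i`, which plays the role of a type variable.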

The general strategy of program analysis described in this paper is as follows:

1. Create the graph for the program.
2. Examine the graph to see if it represents a valid typing according to the conditions in Definition 2.
3. If the typing is valid, read the type of the term using the function in Definition 3.
4. Examine the graph to find out other information which the programmer requests, as in Sections 4.2, 5.2 and 6.2 (whether or not the program was correctly typed).

The following sections describe a number of analyses for the fourth stage of this sequence. The key point is that these analyses only use the graph: they do not create any other structures such as substitutions and do not traverse the syntax tree. Thus it can be said that the graph encodes all the information obtained.

4 Bernstein and Stark: Assumption Environments

Bernstein and Stark's system [BS95] is concerned with the types which unbound identifiers must have in order for a program to be well typed. The results of their inference algorithm can also be obtained by reading graphs.

4.1 Summary of Bernstein and Stark's Technique

Bernstein and Stark present their debugging system as a set of operational semantics, much like the type semantics of ML but using assumption environments mapping identifiers to sets of types rather than the usual type environments which map identifiers to type schemes. Soundness and completeness theorems show an equivalence between these semantics and Hindley-Milner. Also presented is a deterministic inference algorithm for the semantics. In contrast to inference algorithms for Hindley-Milner semantics, Bernstein and Stark's algorithm produces an assumption environment as a result rather than taking it as an argument.

For the open expression $(I\ 3, I\ \mathit{true})$, Bernstein and Stark's algorithm derives the assumption environment $\{I : \{\mathit{int} \to \alpha, \mathit{bool} \to \beta\}\}$ and the type $\alpha \times \beta$. This tells us that in order for $(I\ 3, I\ \mathit{true})$ to type check, it should be put in a context which binds $I$ to a type scheme which can be specialised to the types $\mathit{int} \to \alpha$ and $\mathit{bool} \to \beta$, or the instances of $I$ should be replaced by expressions with these types.

To use Bernstein and Stark's work as a debugging aid, first estimate the probable location of the type error using, for example, a conventional error message. Then examine parts of the program, such as declarations and arguments to functions, to find their types and the types of the free identifiers. Bernstein and Stark's contribution is to overcome the limitation of Damas and Milner's system [DM82], which does not let you look inside expressions to see the types of their subexpressions.

If there is a conflict in the type of a bound identifier, such as in the type of $I$ in $\lambda I.(I\ 3, I\ \mathit{true})$, then this system will fail to produce an assumption environment. Hence this system cannot be considered to be a general technique for debugging all type errors by itself.

4.2 Extracting Assumption Environments from Graphs

The graph for the expression $(I\ 3, I\ \mathit{true})$ is shown in Figure 4. It can be seen that two types can be read for each instance of $I$: either unbound or a function type. It is with fragments like this that Bernstein and Stark's technique helps.


Fig. 4. The graph for an open program.

To generate an assumption environment, it is first of all necessary to locate all the vertices representing unbound identifiers. These are easily identified by edges going to unbound, labelled by the identifier. The types of these identifiers can then be read as normal, ignoring unbound.

Two algorithms are required to do this. Definition 4 gives search, which takes a graph and a vertex and returns the sets of vertices reachable from this vertex and relevant to working out its type: the set of type constructor vertices, the set of type variable vertices, and the other vertices without descendants. Definition 5 gives the algorithm type, which takes a graph, a vertex and the set of vertices already seen (initially empty), and returns the type represented by the vertex in the graph.

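Definition 4 (Algorithm search). Takes a graph and a root vertex. Returns three sets of vertices reachable from the root: constructors, type variables and leaf vertices.

$$
\begin{array}{l}
\mathit{search}(G, (\tau_1, \ldots, \tau_n)c) = (\{(\tau_1, \ldots, \tau_n)c\}, \{\}, \{\}) \\
\mathit{search}(G, \alpha) = (\{\}, \{\alpha\}, \{\}) \\
\mathit{search}(G, v) = \\
\quad \textbf{if } \mathit{children}(G, v) = \{\} \\
\quad \textbf{then } (\{\}, \{\}, \{v\}) \\
\quad \textbf{else let } r = \{\mathit{search}(G, v') : v' \in \mathit{children}(G, v)\} \\
\quad \textbf{in } \left(\bigcup \{cs : (cs, tvs, vs) \in r\},\ \bigcup \{tvs : (cs, tvs, vs) \in r\},\ \bigcup \{vs : (cs, tvs, vs) \in r\}\right)
\end{array}
$$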

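Definition 5 (Algorithm type). Takes a graph, a vertex and the set of vertices already visited (this should initially be empty). Returns the type represented by that vertex, ignoring the unbound type constructor. May terminate prematurely for one of two reasons: the type is cyclic (an 'occurs' error) or there is more than one possible type (a type conflict).

$$
\begin{array}{l}
\mathit{type}(G, v, s) = \\
\quad \textbf{if } v \in s \textbf{ then } \text{Terminate (cyclic type)} \\
\quad \textbf{else let } (cs, vs, tvs) = \mathit{search}(G, v) \\
\quad \textbf{and } cs' = \{c_i : c_i \in cs \wedge c \neq \mathit{unbound}\} \\
\quad \textbf{in case } (cs', tvs, vs) \textbf{ of} \\
\quad\quad (\{\}, \{\}, \{\}) \Rightarrow v \\
\quad\quad \mid (\{\}, \{v'\}, \{\}) \Rightarrow v' \\
\quad\quad \mid (\{\}, \{\}, \{\alpha\}) \Rightarrow \alpha \\
\quad\quad \mid (\{(\tau_1 \ldots \tau_n)c_i\}, \{\}, \{\}) \Rightarrow (\mathit{type}(G, c_i.1, \{v\} \cup s), \ldots, \mathit{type}(G, c_i.n, \{v\} \cup s))\,c_i \\
\quad\quad \mid \text{otherwise} \Rightarrow \text{Terminate (conflicting type)}
\end{array}
$$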

Applying algorithm type to the vertex representing the entire program in the graph for a correctly typed program will yield the same answer as Definition 3.

Bernstein and Stark's inference algorithm fails if no assumption environment exists. This is the case if the graph contains any branches representing type conflicts, or cycles. Before using type, the graph should be checked for cycles and branches, i.e. that it meets the first two conditions in Definition 2. Provided there are no cycles or branches, type will always succeed.

There can be a number of vertices for each identifier: each instance of the identifier will appear, possibly with a range of different tags. The set of types for the vertices representing a particular identifier is the set of types required for the assumption environment. Reading the graph in Figure 4 gives the assumption environment $\{I : \{\mathit{int} \to \alpha, \mathit{bool} \to \beta\}\}$.

Implementing Bernstein and Stark's technique using graphs allows us to give the programmer slightly more useful information, as it is possible to relate types to particular instances of identifiers (Bernstein and Stark's presentation discards this information). Indeed, it is possible to give the type of any instance of an identifier (even bound) or any other part of the program. It is also safe to use algorithm type even if the graph fails any of the conditions in Definition 2. This makes it more flexible than Bernstein and Stark's algorithm.

A further extension of this technique would be to generate the assumption environment of any fragment of the program without having to regenerate the graph. This is in contrast to Bernstein and Stark, who must rerun their inference algorithm on every subexpression of interest to the programmer.
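The algorithms search and type, and the reading of an assumption environment, can be sketched as follows. Everything concrete here is an assumption of the sketch: the tagged-tuple vertex encoding, the hand-built graph for the open expression (which omits the tuple vertices and only guesses at the shape of Figure 4), the `seen` guard in `search`, and the convention of printing an untyped leaf fragment as `type(fragment)` where the text would use a type variable.

```python
class CyclicType(Exception):
    pass

class ConflictingType(Exception):
    pass

def frag(text, tag=None):
    return ("frag", text, tag)

def con(name, ident, arity):
    return ("con", name, ident, arity)

def conn(c, j):
    return ("conn", c, j)

def search(edges, v, seen=None):
    """Return (constructors, type variables, leaf vertices) reachable from v."""
    seen = set() if seen is None else seen
    if v[0] == "con":
        return {v}, set(), set()
    if v[0] == "tv":
        return set(), {v}, set()
    if v in seen:                       # guard against revisiting; not in Definition 4
        return set(), set(), set()
    seen.add(v)
    kids = {dst for (src, _label, dst) in edges if src == v}
    if not kids:
        return set(), set(), {v}
    cs, tvs, leaves = set(), set(), set()
    for k in kids:
        c, t, l = search(edges, k, seen)
        cs, tvs, leaves = cs | c, tvs | t, leaves | l
    return cs, tvs, leaves

def show(v):
    """Render a type variable by name and any other leaf as type(...)."""
    if v[0] == "tv":
        return v[1]
    if v[0] == "conn":
        return f"type({v[1][1]}{v[1][2]}.{v[2]})"
    return f"type({v[1]})"

def type_of(edges, v, visited=frozenset()):
    """The type represented by v, ignoring the unbound constructor."""
    if v in visited:
        raise CyclicType(v)
    cs, tvs, leaves = search(edges, v)
    cs = {c for c in cs if c[1] != "unbound"}
    if not cs and not tvs and len(leaves) <= 1:
        return show(next(iter(leaves), v))
    if not cs and not leaves and len(tvs) == 1:
        return show(next(iter(tvs)))
    if len(cs) == 1 and not tvs and not leaves:
        c = next(iter(cs))
        args = [type_of(edges, conn(c, j), visited | {v}) for j in range(1, c[3] + 1)]
        if len(args) == 2:
            return f"({args[0]} {c[1]} {args[1]})"
        return c[1] if not args else f"({', '.join(args)}) {c[1]}"
    raise ConflictingType(v)

def open_program_graph():
    """Hand-built edges for (I 3, I true); the shape is assumed, not taken from Figure 4."""
    edges = set()
    for side, arg, ty, n in (("left", "3", "int", 1), ("right", "true", "bool", 2)):
        ident, app, arrow = frag("I", side), frag("I " + arg), con("->", n, 2)
        edges |= {
            (ident, "I", con("unbound", n, 0)),      # I is not bound
            (ident, app[1], arrow),                  # I is used as a function
            (conn(arrow, 1), app[1], frag(arg)),     # its argument
            (conn(arrow, 2), app[1], app),           # its result
            (frag(arg), arg, con(ty, 0, 0)),         # the type of the constant
        }
    return edges

def assumption_environment(edges):
    """Map each unbound identifier to the set of types its instances must have."""
    env = {}
    for (src, label, dst) in edges:
        if dst[0] == "con" and dst[1] == "unbound":
            env.setdefault(label, set()).add(type_of(edges, src))
    return env
```

Calling `assumption_environment(open_program_graph())` collects one function type per instance of `I`, one taking `int` and one taking `bool`, which corresponds to the two entries of the assumption environment discussed above.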

5 Wand's Source of Type Errors

Wand's concern [Wan86] is with the location of a mistake. He observed that the location announced in a type error message is rarely the location at which the programmer has made a mistake. This is attributed to the fact that "the type-checker can only report an error when it finds a program fragment that cannot be assigned a type; because of the flexibility introduced by polymorphism, the actual error may be deeply embedded in the erroneous fragment". Wand in fact understates the problem: sometimes the fragment can be assigned a type, and the actual error is in a separate fragment earlier in the program.

5.1 Finding the Source of Type Errors

Wand presents his system as a modified unification algorithm to be used in a type inference algorithm such as Milner's $\mathcal{W}$. For the unification algorithm, types are represented using structure sharing. A type is a tree with type variables for leaves; an environment also exists to bind type variables to types. Wand's environment is essentially a substitution, but it is always passed explicitly and type variables in types are never expanded.

The environment also contains reasons. A reason is a set of subexpressions of the program associated with a binding of a type variable to a type. For example, the binding $\alpha_I \mapsto \mathit{int} \to \beta$ might have the reason $\{(I\ 3)\}$.

The unification algorithm must do two things with reasons. When making a new binding a reason must be added, and when unification fails reasons for the failure should be given. A reason for each type is built up as the algorithm traverses the types: each time a type variable inside type $\tau$ is expanded using the environment, the reason from the environment is added to the reason for $\tau$. When a binding from a type variable, $\alpha$, found in type $\tau$ is made, the reason is taken from the other type, together with the subexpression which caused unification to be called. When unification fails, the two reasons are returned.

The error message produced for the program $\lambda I.(I\ 3, I\ \mathit{true})$ is in Figure 5. From it, the programmer can establish that the mistake is likely to be $I\ \mathit{true}$ or $I\ 3$. Wand's system does not consider the possibility that $I$ should be let-bound rather than $\lambda$-bound. Note that no reason is given for the second type, bool; this is because only expressions which cause a call to the unification algorithm can appear in reasons, and there is no such expression in $\mathit{true}$.

    Mismatch between int and bool
    In expression I true
    Reason for 1st type: {(I 3)}
    Reason for 2nd type: {}.

Fig. 5. Wand's error message for $\lambda I.(I\ 3, I\ \mathit{true})$.

The algorithm was implemented in the SPS system, but was abandoned because it was found to produce too much output to be useful to the programmer [Wan99].
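The way reasons travel through unification can be illustrated with a small unifier. This is a sketch written from the description above rather than Wand's algorithm itself: the type encoding, the function names, the omission of the occurs check, and the hand-written calls standing in for a full inference pass are all assumptions.

```python
class Mismatch(Exception):
    """Unification failure carrying the two types, the expression and both reasons."""
    def __init__(self, t1, t2, expr, reason1, reason2):
        super().__init__(f"Mismatch between {show(t1)} and {show(t2)}")
        self.t1, self.t2, self.expr = t1, t2, expr
        self.reason1, self.reason2 = reason1, reason2

def var(name):
    return ("var", name)

def con(name, *args):
    return ("con", name, args)

def show(t):
    if t[0] == "var":
        return t[1]
    name, args = t[1], t[2]
    if len(args) == 2:
        return f"({show(args[0])} {name} {show(args[1])})"
    return name

def expand(t, reason, env):
    """Follow bindings of a type variable, adding each binding's reason."""
    while t[0] == "var" and t[1] in env:
        t, r = env[t[1]]
        reason = reason | r
    return t, reason

def unify(t1, r1, t2, r2, env, expr):
    """Unify t1 and t2 while examining expr; env maps variable names to (type, reason)."""
    t1, r1 = expand(t1, r1, env)
    t2, r2 = expand(t2, r2, env)
    if t1 == t2:
        return
    if t1[0] == "var":
        # New binding: reason of the other type plus the expression being examined.
        env[t1[1]] = (t2, r2 | {expr})
    elif t2[0] == "var":
        env[t2[1]] = (t1, r1 | {expr})
    elif t1[1] != t2[1] or len(t1[2]) != len(t2[2]):
        raise Mismatch(t1, t2, expr, r1, r2)
    else:
        for a, b in zip(t1[2], t2[2]):
            unify(a, r1, b, r2, env, expr)
```

For the program of Figure 5, the application `I 3` first binds the type variable of `I` to a function type from `int` with reason `{I 3}`. Unifying that variable with a function type from `bool` while examining `I true` then raises `Mismatch` between `int` and `bool`, with `{I 3}` as the reason for the first type and an empty reason for the second.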

5.2 Using Graphs to Find Wand's Source

Reasons appear in graphs as edge labels. As graph generation adds edges it labels them with the fragment of interest, much as Wand adds the current expression of interest to reasons.

To find the probable source of type errors using graphs, first identify a branch in the graph which gives two distinct types to a vertex. The graph for the program $\lambda I.(I\ 3, I\ \mathit{true})$ was shown earlier in Figure 3. In this figure there is such a branch at the left-hand connection point of the bottom $\to$. A branch like this occurs in the graph for each unification failure in type inference.

Having found a branch, we must now find out which expressions are associated with it. These are the labels of the edges going from it and the program fragment vertices above it in the graph. In Figure 3, from the vertices above the branch we see that the expression being examined when the branch appeared was either $I\ 3$ or $I\ \mathit{true}$ (depending on the order of type inference), and we also see the reasons for the mismatched type, $I\ \mathit{true}$ or $I\ 3$.

From the graph we therefore have slightly different information from Wand's. Wand's system suffered from a left-to-right bias which led it to treat $I\ 3$ and $I\ \mathit{true}$ differently: one is given as the expression being examined, the other as a reason. This bias does not exist in the graph, where each may be either the reason or the expression being examined. Despite the differences between Wand's system and the graph, from the point of view of suggesting the 'source of type errors' they provide the same list of candidates.

We can also extract more information from the graph. We can see that not only is the mismatch associated with the application expression, but also that it is in the type of identifier $I$. Programming experience suggests that this information could be more useful than the site at which unification was performed.
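Finding branches and the expressions associated with them can be sketched as a search over edge labels. The code below assumes tagged-tuple vertices and `(source, label, destination)` edges, and it reports a vertex as a branch when two of its outgoing edges lead to different sets of type constructors. That test, the function names, and the small graph used to exercise it (a simplified fragment of the untypeable example, not a transcription of Figure 3) are assumptions of this illustration.

```python
def frag(text):
    return ("frag", text)

def con(name, ident, arity):
    return ("con", name, ident, arity)

def conn(c, j):
    return ("conn", c, j)

def reachable_constructors(edges, v):
    """Type constructor vertices reachable from v along edges."""
    seen, stack, found = set(), [v], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        if u[0] == "con":
            found.add(u)
        stack.extend(dst for (src, _label, dst) in edges if src == u)
    return found

def branches(vertices, edges):
    """Map each branch vertex to the labels of its disagreeing outgoing edges."""
    result = {}
    for v in vertices:
        out = [(label, frozenset(reachable_constructors(edges, dst)))
               for (src, label, dst) in edges if src == v]
        out = [(label, cs) for (label, cs) in out if cs]
        if len({cs for (_label, cs) in out}) > 1:
            result[v] = {label for (label, _cs) in out}
    return result

def untypeable_fragment():
    """Simplified graph fragment: I is applied to both 3 and true."""
    arrow = con("->", 0, 2)
    i_left, i_right = frag("I left"), frag("I right")
    three, true = frag("3"), frag("true")
    int_, bool_ = con("int", 0, 0), con("bool", 0, 0)
    edges = {
        (i_left, "I 3", arrow),                # both uses of I share one function type
        (i_right, "I true", arrow),
        (conn(arrow, 1), "I 3", three),        # the argument must be the type of 3 ...
        (conn(arrow, 1), "I true", true),      # ... and also the type of true
        (three, "3", int_),
        (true, "true", bool_),
    }
    vertices = {i_left, i_right, arrow, conn(arrow, 1), conn(arrow, 2),
                three, true, int_, bool_}
    return vertices, edges
```

On this fragment the only branch is the left connection point of the function type, and the labels returned are `I 3` and `I true`, with neither treated as more primary than the other.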

6 Duggan's Correct Type Explanations

Dominic Duggan has formally defined the notion of correct type explanation [Dug98]. This section discusses Duggan's definition and describes how type explanations may be extracted from graphs. This work is currently in progress.

6.1 Correct Type Explanations

A type explanation is a set of expressions, like Wand's reasons. Duggan associates the expressions which build up explanations with constraints of the form $\tau_1 = \tau_2$. Each constraint is labelled by a set of expressions which explain how it was obtained. Duggan's constraints differ from Wand's environments, which constrain a type variable to be equal to a type rather than any two types to be equal.

A set of constraints can be used to infer what some type is, for example what type some type variable $\alpha_I$ associated with a $\lambda$-bound identifier $I$ is. The sets of expressions labelling the constraints used to infer a type form the explanation of that type.

An example of a constraint set and explanation is in Figure 6. The constraint set equates the type variable, $\alpha_e$, for each expression, $e$, with some other type. The explanation is the union of the labels on the constraints used to work out the type for a particular type variable.

- Constraint set for $\lambda(i, x).(i\ x, i\ 3)$:
  $$\{\ \alpha_i =_{\{i\ x\}} \alpha_x \to \alpha_{i\ x},\quad \alpha_i =_{\{i\ 3\}} \alpha_3 \to \alpha_{i\ 3},\quad \alpha_3 =_{\{3\}} \mathit{int}\ \}$$
- Correct type explanation of $x : \mathit{int}$:
  $$\{i\ x,\ i\ 3,\ 3\}$$

Fig. 6. A constraint set and correct type explanation.

Duggan defines two conditions which must be met for an explanation to be correct:

- Completeness states that the explanation is large enough, i.e. no constraints with labels outside the set are required to infer the type.
- Soundness states that the explanation is not too large, i.e. all of the labels belong to some constraint necessary to infer the type.

If the expression $i\ x$ was not in an explanation of $x : \mathit{int}$ then the explanation would be incomplete. If the same expression was in an explanation of $i : \mathit{int} \to \alpha_{i\ 3}$ then the explanation would be unsound.

There can be a number of correct type explanations for a given subexpression. For example, if $i$ was applied to 2 and 3 then either application (but not both) would be acceptable for a correct explanation of its type. The explanation produced in inference depends on the inference strategy (top down or bottom up) and the constraint solver used.

6.2 Reading Correct Explanations from Graphs

Explanations from graphs are formed from the labels on edges. For example, in Figure 7 the explanation of the type of either occurrence of $i$ must involve the corresponding application ($i\ x$ or $i\ 3$), as this is the only edge which can be followed to discover a type. Explanations are built from the labels on the edges forming a path from a program fragment vertex to a type constructor vertex.


Fig. 7. Graph for $\lambda(x, i).(i\ x, i\ 3)$.

An algorithm for generating correct type explanations is currently in development. The main difficulty is that the paths to be followed are non-directed (you can traverse up an edge as well as down). For example, the correct type explanation of the type of $x$ is formed from an indirect path from $x$ to int which passes through $i\ x$, $i\ 3$ and $3$, rather than from a direct path from $x$ to int.

7 Other Forms of Information

Most of the other systems proposed to help programmers are similar to Wand's and Duggan's. Beaven and Stansifer [BS93] produce explanations consisting of lists of subexpressions with an accompanying commentary. This is likely to be a more useful form to present to the user, unless there is too much information for the user to absorb.

Johnson and Walz [JW86] presented a modified unification algorithm which could recover from failure by picking the most probable solution. For example, if $f$ is applied to 3, 2 and true, then it is more likely that it is supposed to be applied to integers. This is determined by a flow analysis of graphs representing types and substitutions. It would be possible to apply this approach to the graphs used in this paper.

It can be seen from the graphs shown that other forms of information may be derived from them. For example, in Figure 3 a set of possible types for the function can be read.

8 Conclusions

A way of capturing information about the types in programs as a graph has been presented. The graphs can describe both typeable and untypeable programs. From these graphs, we can extract a number of different forms of information about programs. Some of these (types of unbound identifiers and probable locations of mistakes) have been proposed previously as means for helping programmers [BS95][Wan86]. The graphs therefore generalise these other forms of information.

Work on this representation is continuing. It has yet to be shown whether Duggan's correct type explanations [Dug98] can be extracted from the graphs, and what other useful forms of information can be extracted. Investigations of other forms of information will need to be supported experimentally.

All the work described has been implemented for a $\lambda$-with-let calculus.

References

[BS93] Mike Beaven and Ryan Stansifer. Explaining type errors in polymorphic languages. ACM Letters on Programming Languages and Systems, 2(1):17-30, March 1993.
[BS95] Karen L. Bernstein and Eugene W. Stark. Debugging type errors (full version). Technical report, State University of New York at Stony Brook, Computer Science Department, November 1995. http://www.cs.sunysb.edu/~stark/REPORTS/INDEX.html.
[DM82] Luis Damas and Robin Milner. Principal type-schemes for functional programs. In Ninth Annual Symposium on Principles of Programming Languages, pages 207-212. Association for Computing Machinery, 1982.
[Dug98] Dominic Duggan. Correct type explanation. In Workshop on ML, pages 49-58. ACM SIGPLAN, 1998.
[JW86] Gregory F. Johnson and Janet A. Walz. A maximum-flow approach to anomaly isolation in unification-based incremental type-inference. In ACM Symposium on Principles of Programming Languages, number 13, pages 44-57. ACM, ACM Press, 1986.
[McA99] Bruce J. McAdam. A data structure for representing type derivations. Technical report, LFCS, 1999. In progress.
[MTH97] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML (revised). MIT Press, 1997.
[PH98] Simon Peyton Jones and John Hughes, editors. Report on the Programming Language Haskell 98. http://www.haskell.org/, 1998.
[Wan86] Mitchell Wand. Finding the source of type errors. In ACM Symposium on Principles of Programming Languages, number 13, pages 38-43. ACM, ACM Press, 1986.
[Wan99] Mitchell Wand. Personal communication, March 1999.