Typing in object-oriented languages: Achieving expressiveness and safety
Kim B. Bruce
Williams College
September 10, 1996
Abstract
While simple static-typing disciplines exist for object-oriented languages like C++, Java, and Object Pascal, they are often so inflexible that programmers are forced to use type casts to get around the restrictions. At the other extreme are languages like Beta and Eiffel, which allow more freedom, but require run-time or link-time checking to pick up the type errors that their type systems are unable to detect at compile time. This paper presents a collection of sample programs which illustrate problems with existing type systems, and suggests ways of improving the expressiveness of these systems while retaining static type safety. In particular we will discuss the motivations behind introducing "MyType", "matching", and "match-bounded polymorphism" into these type systems. We also suggest a way of simplifying the resulting type system by replacing subtyping by a type system with a new type construct based on matching. Both systems provide for binary methods, which are often difficult to support properly in statically-typed languages. The intent is to explain why the problems are interesting via the series of sample programs, rather than getting bogged down with pages of type-checking rules and formal proofs. The technical details (including proofs of type safety) are available elsewhere.
This research was partially supported by NSF grant CCR-9424123.
Contents
1 Introduction
  1.1 Why type checking?
  1.2 Plan of the paper
2 Types and Subtypes, Classes and Subclasses
  2.1 Types and subtypes
    2.1.1 Record types
    2.1.2 Function types
    2.1.3 Types of variables
    2.1.4 Object types
  2.2 Classes and Subclasses
  2.3 Differences between subtypes and subclasses
3 Simple type systems are lacking in flexibility
  3.1 The need to change return types in subclasses
  3.2 The need to change parameter and instance variable types in subclasses
  3.3 Other typing problems
4 Toward more flexible type systems
  4.1 Subtyping of method types in subclasses
  4.2 Examples using flexible types
5 Introducing MyType
6 The matching relation between types
  6.1 Type checking classes using matching
  6.2 Matching is necessary in type checking classes
7 Binary methods complicate subtyping
  7.1 Subclasses do not generate subtypes
  7.2 A new definition of subtyping for object types
8 Evaluating the use of MyType
9 Combining parametric polymorphism with matching
  9.1 Polymorphism and container classes
  9.2 Constraining polymorphism - the failure of subtyping
  9.3 Match-bounded polymorphism
  9.4 History of matching and match-bounded polymorphism
10 Solutions to Eiffel's covariant type problems
  10.1 Eiffel's system validity check
  10.2 Solving covariance problems with match-bounded polymorphism
  10.3 Meyer's solution: Banning polymorphic catcalls
11 Replacing subtyping with matching
  11.1 Simplifying matching
  11.2 Replacing subtyping by hash types
  11.3 Hash types are not compatible with binary methods
  11.4 Evaluation
12 Conclusions and related work
A Example linked list program using matching
List of Figures
1 A record s : {l1 : U1; l2 : U2; l3 : U3}, and r : {l1 : T1; l2 : T2; l3 : T3; l4 : T4} masquerading as an element of type {l1 : U1; l2 : U2; l3 : U3}.
2 A function f : Func(R): U, and f' : Func(S): T masquerading as f.
3 A variable x : ref T, and x' : ref S masquerading as x.
4 A Point class.
5 A Colorpoint subclass.
6 Typing DeepClone methods in subclasses.
7 Node class.
8 Changing types of methods in subclasses.
9 Circle and ColorCircle classes with more flexible types.
10 Node class with MyType.
11 Doubly-linked node class.
12 Node class with MyType.
13 Procedure illustrating that subclasses need not generate subtypes.
14 List type function.
15 The type Comparable.
16 Binary search tree type functions.
17 BinSearchTree classes.
18 Polymorphic Circle classes.
19 Animal and herbivore classes.
20 Polymorphic animal and herbivore classes.
21 A type safe rewrite of breakit using match-bounded polymorphism.
1 Introduction
The object-oriented paradigm has been adopted by an increasing number of programmers and organizations over the last decade because of its clear advantages in organizing and reusing software components. It would clearly be advantageous to be able to provide static type systems for object-oriented languages that are of the same quality as those available for more standard procedural languages. Unfortunately, commercially available object-oriented languages fall far short of that goal. The static type systems of object-oriented languages tend to be either insecure or more inflexible than one might desire. In some cases the rigidity of the type system leads programmers to rely on type casts (sometimes checked at run-time, sometimes not) in order to obtain the expressiveness desired. In other cases, the type systems are too flexible, requiring the run-time system to generate link-time or run-time checks to ensure the integrity of the computation. In this paper we explore the type-checking systems of object-oriented programming languages, examining problems and suggesting solutions.
1.1 Why type checking?
Every value generated in a program is associated with a type. In a strongly typed language, the language implementation is required to check the types of operands in order to ensure that nonsensical operations, like dividing the integer 5 by the string "hello", are not performed. In a dynamically typed language most operations are type-checked just before they are performed. In a statically typed language, every expression of the language is assigned a type at compile time. If the type system can ensure that the value of each expression has a type compatible with the type of the expression, then type checking of most operations can be moved to compile time.
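The dynamically typed case can be illustrated with a short sketch. Python is used here only as a convenient dynamically typed language (it is not one of the languages discussed in this paper), and the function name `divide` is hypothetical.

```python
def divide(a, b):
    # No operand types are declared; the check happens when "/" executes.
    return a / b


# A sensible use succeeds.
ok = divide(10, 4)

# The nonsensical operation is rejected, but only at run time,
# at the moment the division is actually attempted.
try:
    divide(5, "hello")
    error = None
except TypeError as exc:
    error = exc
```

Under a static discipline the second call would instead be rejected before the program ever runs, which is the behavior the rest of this paper aims to preserve.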
There are many advantages to having a statically type-checked language. These include providing earlier (and usually more accurate) information on programmer errors, providing documentation on the interfaces of components (e.g., procedures, functions, and packages or modules), eliminating the need for run-time type checks, which can slow program execution, and providing extra information that can be used in compiler optimizations.
One possible disadvantage of static typing is that because static type checkers are necessarily conservative, a static type checker for a programming language may disallow a program that would in fact execute without error. Thus statically typed programming languages may be less expressive than dynamically typed languages.
Procedural languages like Pascal [Wir71], CLU [L+ 81], Modula-2 [Wir85], and Ada 83 [US 80], and functional languages like ML [HMM86] and Haskell [HJW92] have reasonably safe static typing systems. While some of these languages have a few minor holes in the type system (e.g., variant records in Pascal), languages like CLU and Ada provide fairly secure type systems. Moreover, support for polymorphism has been very helpful in increasing the expressiveness in statically typed imperative and functional programming languages like CLU, Ada, ML, and Haskell.
In object-oriented programming languages, typing issues are more focused on whether a message can be sent to a particular object (i.e., whether the receiver has a method which can be executed in response to the message). Nevertheless the basic issues are very similar. However, extra complications arise from the presence of subtyping and the use of a pseudo-variable (usually written as self or this) to stand for the object executing the method. Because of subtyping, actual parameters to methods (or functions or procedures) can be of a type different from that specified in the declaration of the corresponding formal parameters. Because of inheritance, method bodies which are compiled for one class can be reused in subclasses. We must make sure that these features which support reuse do not cause holes in the typing system.
Unfortunately the situation for static type checking in object-oriented languages is not as good as for procedural languages. The following is a list of some properties of type-checking systems of some of the more popular object-oriented languages (or the object-oriented portions of hybrid languages).
• Some show little or no regard for static typing (e.g., Smalltalk [GR83]).
• Some have relatively inflexible static type systems, which require type casts to overcome deficiencies of the type system. These type casts may be unchecked, as in C++ [Str86] and Object Pascal [Tes85], or checked at run-time, as in Java [AG96].
• Some provide mechanisms like "typecase" statements to allow the programmer to instruct the system to check for more refined types than can be determined by the type system (e.g., Modula-3 [CDG+ 88], Simula-67 [BDMN73], and Beta [KMMPN87]).
• Some allow "reverse" assignments from superclasses to subclasses, which require run-time checks (e.g., Beta, Eiffel [Mey92]).
• Some require that parameters of methods overridden in subclasses have exactly the same types as in the superclasses (e.g., C++, Java, Object Pascal, and Modula-3), resulting in more inflexibility than would be desirable, while others allow too much flexibility in changing the types of parameters or instance variables, requiring extra run-time or link-time checks to catch the remaining type errors (e.g., Eiffel and Beta).
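To make two of the mechanisms listed above concrete, the following sketch imitates a run-time checked cast and a typecase-style dispatch. It is written in Python purely for illustration; the classes `Point` and `ColorPoint` and the helpers `checked_cast` and `describe` are hypothetical names, and none of this is the syntax of the languages cited above.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class ColorPoint(Point):
    def __init__(self, x, y, color):
        super().__init__(x, y)
        self.color = color


def checked_cast(value, cls):
    # Run-time checked cast: hand back the value only if the check succeeds.
    if not isinstance(value, cls):
        raise TypeError(f"expected {cls.__name__}, got {type(value).__name__}")
    return value


def describe(p):
    # Typecase-style dispatch: test for the more refined type first.
    if isinstance(p, ColorPoint):
        return f"{p.color} point at ({p.x}, {p.y})"
    if isinstance(p, Point):
        return f"point at ({p.x}, {p.y})"
    raise TypeError("not a point")
```

In both helpers the type error, if any, surfaces only when the program runs, which is exactly the cost that a more expressive static type system is meant to avoid.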
Thus all of these languages either require programmers to program around deficiencies of the type system, require run-time type-checking, or allow run-time type errors to occur. While features like typecase statements and run-time checked casts or reverse assignments may occasionally be necessary to handle difficult problems with heterogeneous data structures, we would prefer to have type systems which allow us to program as naturally as possible, while catching as many type errors as possible at compile time. As we shall see later, many problems arise because of the conflation of type with class and with the mismatch of the inheritance hierarchy with subtyping. Whatever the cause, there appears to be a lot of room for improvement in moving toward a combination of better security and greater expressiveness in the type systems.
1.2 Plan of the paper
In the rest of this paper we discuss the complications that arise in designing static type-checking systems for object-oriented languages, and sketch some ways of avoiding these problems by providing more flexible and expressive type systems. Of course we wish to ensure that the resulting systems are type safe.
We begin in section 2 by reviewing briefly the definitions of types, classes, subtypes, and subclasses, and illustrating their uses in object-oriented languages. In section 3 we discuss relatively simple type-checking systems like those in C++, Object Pascal, and Modula-3, in order to see what problems arise with the most obvious type systems. In section 4 we see that we can easily add more flexibility by allowing programmers to replace methods in subclasses by new ones whose types are subtypes of the original. In many cases this still does not provide enough expressiveness for the type system to capture the programmer's intentions. Thus in section 5 we introduce the type expression MyType which is used to provide a flexible type for self, the receiver of a message. In the following section we introduce the very important notion of matching, a relation similar to, but distinct from, subtyping. The importance of matching will result from the fact that it is much closer to the inheritance ordering than subtyping is, allowing us to type check methods in such a way that they remain type safe when inherited. We discuss problems that arise with subtyping and so-called binary methods in section 7. We then evaluate what the addition of MyType means for the type system in section 8. In section 9 we introduce a kind of constrained polymorphism called match-bounded polymorphism that allows us to write reusable code that is more flexible in handling objects of different types. In the following section we show how to use this to provide a solution to Eiffel's covariant typing problems. We also briefly discuss Eiffel's link-time system validity check and Bertrand Meyer's recent "no polymorphic catcalls" proposal to deal with these problems. In section 11 we step back to look at the constructs we have introduced with the view of creating a simpler system. This reexamination will lead us to consider the rather radical step of designing a system that dispenses with the notion of subtyping. We close with a summary and discuss related work.
2 Types and Subtypes, Classes and Subclasses
The notions of type and class are often confounded in object-oriented programming languages. In fact they play quite distinct roles, and can be usefully distinguished from each other. Types provide interface information that determines when certain operations are legal, while classes provide implementation information including the names and initial values of instance variables and the names and bodies of methods. In the first subsection we discuss types and subtypes, while in the following subsection we discuss the notions of class and subclass.
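The separation of type from class can be sketched as follows. This is only an approximation under stated assumptions: Python's `typing.Protocol` stands in for an object type, and the names `PointType`, `PointClass`, `HistoryPoint`, and `shift_twice` are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PointType(Protocol):
    """A type: interface information only (which messages are legal)."""

    def get_x(self) -> int: ...

    def move(self, dx: int) -> None: ...


class PointClass:
    """A class: instance variables, initial values, and method bodies."""

    def __init__(self) -> None:
        self.x = 0  # instance variable with its initial value

    def get_x(self) -> int:
        return self.x

    def move(self, dx: int) -> None:
        self.x += dx


class HistoryPoint:
    """A second implementation, unrelated by inheritance, with the same interface."""

    def __init__(self) -> None:
        self.moves: list[int] = []

    def get_x(self) -> int:
        return sum(self.moves)

    def move(self, dx: int) -> None:
        self.moves.append(dx)


def shift_twice(p: PointType) -> int:
    # Depends only on the interface, not on any particular class.
    p.move(1)
    p.move(1)
    return p.get_x()
```

Both classes can be passed to `shift_twice` even though neither inherits from the other, which suggests why a single type may have several unrelated implementing classes.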
2.1 Types and subtypes
A type in a programming language represents a set of values and the operations and relations that are applicable to them. For example the Integer type represents the set of whole numbers and the usual integer operations and relations, including =, <, +, -, *, div, and mod. An object type representing a point represents a collection of points and the messages that can be sent to point objects.
Strongly typed languages provide type-checking mechanisms to ensure that nonsensical operations are not applied to values. Such a language is sometimes said to be type safe. For example a strongly typed language will ensure that two characters are not added together as though they were integers. While these type checks can be provided at run-time, in this paper we will concentrate on static or compile-time type-checking mechanisms.
In statically-typed programming languages like Pascal and C, expressions are assigned types by the type checker. A set of typing rules based on the structure of expressions is used to build up a static type for each expression. The type checker, if correct, guarantees that if an expression has static type T, then, when that expression is evaluated at run time, the result will be a value of type T. In particular, then, a static type-checking system determines in which contexts an expression may legitimately occur.
In object-oriented languages, the most important operation is sending a message to an object. In this case one goal of type checking is to ensure that inappropriate messages are not sent to objects. In particular if a message is sent to an object, the type system should ensure that the object has a method with the same name, and that the formal parameter and return types are compatible with those of the call. From a typing point of view, then, a message send to an object is similar to extracting a field from a record in which the field happens to be a function.
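The analogy between a message send and field extraction can be sketched directly. In this illustrative Python fragment an "object" is simply a record (a dictionary) whose fields are functions; the names `make_point` and `send` are hypothetical, and the sketch deliberately ignores self, which is where the extra complications discussed in this paper arise.

```python
def make_point(x):
    # Hidden state shared by the function-valued fields below.
    state = {"x": x}

    def get_x():
        return state["x"]

    def move(dx):
        state["x"] += dx

    # The "object" is just a record whose fields are functions.
    return {"get_x": get_x, "move": move}


def send(obj, message, *args):
    # A message send: extract the named field, then apply it to the arguments.
    if message not in obj:
        raise AttributeError(f"message not understood: {message}")
    return obj[message](*args)
```

Here the check that the receiver has a suitable field is made at run time inside `send`; a static type system would establish the same fact from the record's type before the program runs.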
Types in imperative or functional programming languages include base types like Integer, Real, Character, etc., as well as operators which can be used to build new types. These operators can be used to build record types, array types, and function and procedure types, among others. Object-oriented languages replace most or all of these complex types with object types. Object types provide information on the names and types of the methods supported by objects of that type. A type S is a subtype of a type T (written S