TYPE SYSTEMS FOR OBJECT-ORIENTED PROGRAMMING LANGUAGES
A dissertation submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy
By Kathleen Shanahan Fisher
August 1996
© Copyright 1996 by Kathleen Shanahan Fisher
All Rights Reserved
I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy. John C. Mitchell (Principal Advisor)
I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy. Martín Abadi
I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and in quality, as a dissertation for the degree of Doctor of Philosophy. Vaughan R. Pratt
Approved for the University Committee on Graduate Studies:
Abstract
Object-oriented programming languages (oopl's) provide important support for today's large-scale software development projects. Unfortunately, the typing issues arising from the object-oriented features that provide this support are substantially different from those that arise in typing procedural languages. Attempts to adapt procedural type systems to object-oriented languages have resulted in languages like Simula, C++, and Object Pascal, which have overly restrictive type systems. Among other things, the rigidity of these systems frequently forces programmers to use type casts, which are a notorious source of hard-to-find bugs. These restrictive type systems also mean that many programming idioms common to untyped oopl's such as Smalltalk are not typeable. One source of this lack of flexibility is the conflation of subtyping and inheritance. Briefly, inheritance is an implementation technique in which new object definitions may be given as incremental modifications to existing definitions. Subtyping concerns substitutivity: when can one object safely replace another? By tying subtyping to inheritance, existing oopl's greatly reduce the number of legal substitutions in a system, and hence their degree of polymorphism. Attempts to fix this rigidity have resulted in unsound type systems, most notably Eiffel's. This thesis develops a sound type system for a model object-oriented language that addresses this lack of flexibility. It separates the notions of subtyping and inheritance, producing a more flexible language. It also supports method specialization, which means that the types of methods may be specialized in certain ways during inheritance. The lack of such a mechanism is one of the key sources of type casts in languages like C++ and Object Pascal. The thesis then extends this core object calculus with abstraction primitives that support a class construct similar to the one found in languages such as C++, Eiffel, and Java. This formal study explains the link between inheritance and subtyping: object types that include implementation information are a form of abstract type, and the only way to get a subtype of an abstract type is by extension (i.e., by inheritance). The study also suggests that object primitives and encapsulation are orthogonal language features that together produce object-oriented programming. Hence, adding object primitives to a language that already supports encapsulation (such as ML) should be sufficient to create an object-oriented language. Formally, the language is presented as an object calculus and a type system with row variables, variance annotations, method absence annotations, and abstract types. The thesis proves type soundness via an operational semantics and an analysis of typing rules.
Acknowledgements
I would like to thank my advisor John Mitchell for encouraging me to look at formal models of object-oriented programming and for believing in me. His confidence inspired my own. I would also like to acknowledge the other members of my reading committee, Vaughan Pratt and Martín Abadi. While I was doing the research described here, many people provided significant moral and emotional support. In particular, I would like to thank my various officemates, including Ofer Matan, Liz Wolf, and Anna Patterson, for the time they took to listen to my troubles and to show their faith in me. Their support meant the world to me. I would also like to thank Amy Smith and Daphne Koller for being the best of friends through some very difficult times. My family, of course, also provided wonderful support. Finally, I am grateful for the external financial support I received during my tenure as a Ph.D. student, in particular, a National Science Foundation Graduate Fellowship and a Hertz Foundation Fellowship. Without their generous support, I could not have completed this work.
Contents
Abstract
Acknowledgements
1 Overview
  1.1 An Object Calculus
  1.2 Adding Subtyping
  1.3 Adding Classes
  1.4 Road Map
2 Introduction to Object-Oriented Concepts
  2.1 Basic Concepts
    2.1.1 Dynamic Lookup
    2.1.2 Subtyping
    2.1.3 Inheritance
    2.1.4 Encapsulation
  2.2 ADT's vs. Objects
  2.3 Object-Oriented vs. Conventional Organization
  2.4 Advanced Topics
    2.4.1 Inheritance Is Not Subtyping
    2.4.2 Object Types
    2.4.3 Method Specialization
3 Object Calculus
  3.1 Method Specialization
  3.2 Untyped Objects and Object-Based Inheritance
    3.2.1 Untyped Calculus of Objects
    3.2.2 Examples of Objects, Inheritance, and Method Specialization
  3.3 Operational Semantics
  3.4 Static Type System
    3.4.1 Pro Types and Message Send
    3.4.2 Types, Rows, and Kinds
    3.4.3 Typing Rules
    3.4.4 Example Typing Derivations
  3.5 Related Object Models
  3.6 Summary
4 Adding Subtypes
  4.1 Static Type System
    4.1.1 Subtyping and Delegation
    4.1.2 Adding Subtyping
    4.1.3 Rows, Types, and Kinds
    4.1.4 Typing Rules
    4.1.5 Variance Analysis
    4.1.6 Soundness
  4.2 Examples
    4.2.1 Example Typing Derivation
    4.2.2 Subtyping Examples
      4.2.2.1 Subtyping for Code Reuse
      4.2.2.2 Subtyping for Encapsulation
  4.3 Related Work: Adding Subtyping to a Language with Object Extension
  4.4 Conclusion
5 Classes
  5.1 Properties of Classes
  5.2 The Connection between Subtyping and Inheritance
    5.2.1 Subtyping and Inheritance
    5.2.2 Interface Types
    5.2.3 Implementation Types
    5.2.4 Interface Types vs. Implementation Types
    5.2.5 Inheritance Necessary for Subtyping Implementation Types
  5.3 Overview of Formal System by Example
    5.3.1 Interface Type Expressions
    5.3.2 Implementations
    5.3.3 Constructor Implementations
    5.3.4 Class Declarations
    5.3.5 Class Hierarchies as Nested Abstract Types
  5.4 The Protected Visibility Level
    5.4.1 Protection Levels
    5.4.2 Point and Color Point Classes
    5.4.3 Interface Type Expressions
    5.4.4 Constructor Implementations
  5.5 Formal Language and Type System
    5.5.1 The Language
    5.5.2 Operational Semantics
    5.5.3 Static Type System
    5.5.4 Typing Rules
    5.5.5 Type Soundness
  5.6 Related Work: Modeling Classes
    5.6.1 Classes as Records of Pre-Methods
    5.6.2 Example Classes
    5.6.3 Analysis
    5.6.4 Other Approaches
  5.7 Conclusion
    5.7.1 Representation Independence for Classes
    5.7.2 Subtyping and Inheritance
6 Formal System
  6.1 Formal System
    6.1.1 Subtyping Annotation
    6.1.2 Judgment Forms
    6.1.3 Judgment Shorthands
    6.1.4 Context Access Functions
    6.1.5 Ordering on Variance Annotations
    6.1.6 Operations on Variance Sets
    6.1.7 Generalized Variance
    6.1.8 Variance Shorthand
    6.1.9 Variance Substitutions
    6.1.10 Ordering on Kinds
  6.2 Typing Rules
    6.2.1 Context Rules
    6.2.2 Rules for Type Expressions
    6.2.3 Rules for Row Expressions
    6.2.4 Subtyping Rules for Types
    6.2.5 Subtyping Rules for Rows
    6.2.6 Type and Row Equality Rules
    6.2.7 Rules for Assigning Types to Terms
7 Proofs
  7.1 Variance Lemmas
  7.2 Context Lemmas
  7.3 Type Normal Forms
  7.4 Remaining Context Lemmas
  7.5 Type Substitution Lemmas
  7.6 Derived Equality Lemmas
  7.7 Normal Form Lemma
  7.8 Properties of Kinding
  7.9 Subtyping Characterization Lemmas
  7.10 Subtyping Implies Component Subtyping Lemmas
  7.11 Properties of Subtyping
  7.12 Method Extraction Lemmas
  7.13 Row Substitution
  7.14 Expression Lemmas
  7.15 Type Soundness
A Shape Program: Typecase Version
B Shape Program: Object-Oriented Version
C Full Formal System
  C.1 Subtyping Annotation
  C.2 Judgment Forms
  C.3 Judgment Shorthands
  C.4 Context Access Functions
  C.5 Ordering on Variance Annotations
  C.6 Operations on Variance Sets
  C.7 Generalized Variance
  C.8 Variance Shorthand
  C.9 Variance Substitutions
  C.10 Ordering on Kinds
D Typing Rules
  D.1 Context Rules
  D.2 Rules for Type Expressions
  D.3 Rules for Row Expressions
  D.4 Subtyping Rules for Types
  D.5 Subtyping Rules for Rows
  D.6 Type and Row Equality Rules
  D.7 Rules for Assigning Types to Terms
E Operational Semantics
F Definition of Evaluation Strategy
Bibliography
Chapter 1
Overview
This thesis develops a sound type system for a model object-oriented language that is more flexible than the type systems found in common practical languages. This flexibility partially stems from separating the notions of subtyping, a code-substitution principle, and inheritance, an object-definition reuse mechanism. Another source of its flexibility is its support for method specialization, a mechanism whereby the types of methods may be refined in certain ways during inheritance. The lack of such a mechanism is one of the key sources of type casts in languages like C++ and Object Pascal. The formal language presented in this thesis is developed in three stages: (i) an object calculus to model inheritance, (ii) an extension to this calculus that incorporates subtyping, and (iii) a further extension to model classes. We prove the soundness of the resulting type system via a subject reduction theorem and an analysis of typing derivations.
1.1 An Object Calculus
There are several forms of object-oriented languages. One of the major lines of difference is between class-based and object-based languages. In class-based languages such as Smalltalk [GR83] and C++ [ES90], each object is created by a class, and inheritance is determined by the class. In object-based languages such as Self [US91, CU89], an object may be created from another object, inheriting properties from the original. In this stage, we use an untyped lambda calculus of objects with a functional form of object-based inheritance as a tool for studying typing issues in object-oriented programming languages. Our main interests lie in (i) understanding how the functionality of a method may change as it is inherited, intuitively due to reinterpretation of the special symbol self (or this in C++), and (ii) providing a simple model of object-based inheritance that will serve as a building block for more complex systems. In our calculus, the main operations on objects are to send a message m to an object e, written e ⇐ m, and two forms of method definition. If expression e denotes an object without method m,
then ⟨e + m = e′⟩ denotes an object obtained from e by adding the method body e′ for m. When ⟨e + m = e′⟩ is sent the message m, the result is obtained by applying e′ to ⟨e + m = e′⟩. This form of "self-application" allows us to model the special symbol self of object-oriented languages directly by lambda abstraction. Intuitively, the method body e′ must be a function, and the first actual parameter of e′ will always be the "object itself." To reinforce this intuition, we often write method bodies in the form λself.(...). The final method operation on objects is to replace one method body by another. This provides a functional form of update. As in the language Self, we do not distinguish instance variables from methods, since this does not seem essential. The untyped lambda calculus we use bears a strong resemblance to the T object system [RA82, AR88] (although it was originally developed without prior knowledge of T) and to the untyped part of the calculus used in [Aba94] to model a fragment of Modula-3 [Nel91, CDJ+89].

The main goal of this stage is to develop a type system that allows methods to be specialized appropriately as they are inherited. Intuitively, the types of methods may need to become "more specific" when they are inherited, reflecting the fact that they belong to a more detailed object after the inheritance. This issue is perhaps best explained via an example. Briefly, suppose p is a two-dimensional point object with x and y methods returning the integer x- and y-coordinates of p, and a move method with type int × int → point. Method move has this type because if we send the message move to p, we obtain a function which, given distances to move in the x and y directions, returns a point identical to p but with updated x- and y-coordinates. If we create a colored point cp from p by the object-extension operation, then cp inherits the x, y, and move methods from p.
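To make the three operations concrete, here is a minimal Python sketch of the untyped calculus. It is an illustration, not the thesis's formal definition. The class `Obj`, its methods `send`, `extend`, and `override`, and the helpers `const` and `make_point` are hypothetical names chosen for this example. Objects are modeled as immutable maps from method names to bodies, and every body receives the object itself as its first argument. The point and colored-point objects follow the example above.

```python
from typing import Any, Callable, Dict, Optional

# A method body takes the receiving object ("self") as its first argument.
Method = Callable[["Obj"], Any]


class Obj:
    """Hypothetical model of an untyped object: a finite map from names to bodies."""

    def __init__(self, methods: Optional[Dict[str, Method]] = None) -> None:
        self._methods: Dict[str, Method] = dict(methods or {})

    def send(self, m: str) -> Any:
        # e <= m: apply the body of m to the object itself (self-application).
        return self._methods[m](self)

    def extend(self, m: str, body: Method) -> "Obj":
        # <e + m = body>: defined only when e has no method m yet.
        if m in self._methods:
            raise ValueError(f"method {m!r} is already defined")
        return Obj({**self._methods, m: body})

    def override(self, m: str, body: Method) -> "Obj":
        # Functional update: a new object with m's body replaced; e is unchanged.
        if m not in self._methods:
            raise ValueError(f"method {m!r} is not defined")
        return Obj({**self._methods, m: body})


def const(v: Any) -> Method:
    # A method body that ignores self and returns a fixed value.
    return lambda self: v


def make_point(x: int, y: int) -> Obj:
    # move returns a function of (dx, dy) that rebuilds *self* by overriding
    # x and y, so any extra methods self has (such as color) are kept.
    def move(self: Obj) -> Callable[[int, int], Obj]:
        return lambda dx, dy: (
            self.override("x", const(self.send("x") + dx))
                .override("y", const(self.send("y") + dy))
        )

    return Obj().extend("x", const(x)).extend("y", const(y)).extend("move", move)


p = make_point(1, 2)
cp = p.extend("color", const("red"))  # object-based inheritance
moved = cp.send("move")(3, 4)
print(moved.send("x"), moved.send("y"), moved.send("color"))  # 4 6 red
```

Because move is written in terms of self, the inherited method applied to cp returns an object that still has a color method. That is the behavior the specialized type int × int → color point is meant to describe.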
In an untyped object-oriented language such as Smalltalk, the inherited move method will change the position of a colored point, leaving the color unchanged. Therefore, in a typed language, we want the move method of cp to have the "specialized" type int × int → color point. If the inherited method had its original type int × int → point, then whenever we moved a colored point, we would obtain an ordinary point without color, making the inherited move function largely useless. An imperative version of move could bypass this difficulty by returning type unit (as it is called in ML, or void in C++). However, experience with imperative object-oriented languages such as Eiffel and C++ suggests that such self-returning methods are frequently convenient. Eiffel's like Current construct [Mey92], analyzed in [Coo89b], illustrates the value of specializing the type of a method in an imperative language. C++ did not originally include such a construct, but its widespread use is not counter-evidence to the usefulness of method specialization. In fact, it appears to be common for novice C++ programmers to attempt to specialize the types of methods in derived classes. More experienced C++ programmers appear to use "downcasts" to approximate the effects described here. Additionally, a recent change to C++ adds a form of method specialization that allows the return types of methods to be refined during inheritance.

Formally, in this stage we present an object calculus, an operational semantics, and a type system for the calculus, although we defer the explanation of some of the technical details regarding
"variance analysis" to the next stage. Subject reduction and type soundness theorems for this language follow from the corresponding theorems for the final language, which appear in Chapter 7.
1.2 Adding Subtyping
When this study was started, we initially regarded the object-inheritance-based approach, described above, as a technically simpler method for analyzing inheritance. This impression appeared correct for the study of method specialization carried out in [Mit90, FHM94], but in [FM95b] we observed that there appeared to be a fundamental trade-off between object-based inheritance and subtyping. Specifically, if an object may be extended with new methods, then it is important to know at compile time that certain methods have not been defined already. This requirement conflicts with the usual motivation for subtyping, which is to allow code to operate uniformly over all objects having some minimum set of required methods: once an object with method m is viewed through a supertype that does not mention m, the type system can no longer guarantee that m is absent. (Similar observations appear in [AC96c]; see Section 4.1.1.) In this second stage, we present one way of resolving this conflict. Intuitively, the main idea is to distinguish between objects that may be extended with additional methods (or have existing methods redefined) and those that cannot. This distinction is achieved by giving different uses of objects different types. In other words, an object may be created and then have new methods added or existing methods redefined. At this point, only trivial subtyping may be used, because the type system must keep track of exactly the set of methods associated with the object. However, such an object may be "converted" to a different kind of object, whose methods can no longer be altered. This conversion is done by changing the type of the object to a form that has the expected subtyping properties. In this way, we allow both object-based inheritance and subtyping, at the cost of some increase in the complexity of the type system. Technically, in this stage we present an extension to the type system already developed that supports object subtyping. We also explain the details deferred above regarding the so-called "variance analysis" that tracks how type variables may vary in type expressions.
Such information is crucial to determining subtyping relations between partially abstract object types, which will be very useful in modeling classes. As in the previous stage, subject reduction and type soundness theorems for this language follow from the corresponding theorems for the final language. A preliminary version of this system appeared in [FM95a].
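The conflict and its resolution can be illustrated with a small Python sketch that checks everything dynamically. It is an analogy built on assumptions, not the thesis's type system. The hypothetical class `Prototype` stands for extensible objects, and the hypothetical class `Sealed` stands for objects that have been "converted" to the non-extensible form. The run-time check in `extend` plays the role of the static check that a method is absent. The functions `add_color` and `manhattan` are likewise made up for the example.

```python
from typing import Any, Callable, Dict, Optional

Method = Callable[[Any], Any]


class Prototype:
    """Hypothetical extensible object; a type system would track its exact method set."""

    def __init__(self, methods: Optional[Dict[str, Method]] = None) -> None:
        self.methods: Dict[str, Method] = dict(methods or {})

    def extend(self, m: str, body: Method) -> "Prototype":
        # Safe only if m is known to be absent, hence no width subtyping here.
        if m in self.methods:
            raise ValueError(f"cannot add {m!r}: already defined")
        return Prototype({**self.methods, m: body})

    def seal(self) -> "Sealed":
        # The "conversion": afterwards, methods can be sent but never altered.
        return Sealed(self.methods)


class Sealed:
    """Hypothetical non-extensible object: message send is the only operation."""

    def __init__(self, methods: Dict[str, Method]) -> None:
        self._methods = dict(methods)

    def send(self, m: str) -> Any:
        return self._methods[m](self)


def add_color(pt: Prototype) -> Prototype:
    # Written with "any object having x and y" in mind.
    return pt.extend("color", lambda self: "red")


def manhattan(pt: Sealed) -> int:
    # Only sends x and y, so it is safe for any sealed object having at least those.
    return abs(pt.send("x")) + abs(pt.send("y"))


point = Prototype().extend("x", lambda self: 3).extend("y", lambda self: -4)
color_point = add_color(point)

print(manhattan(point.seal()), manhattan(color_point.seal()))  # 7 7
try:
    add_color(color_point)  # color_point has x and y, but color is not absent
except ValueError as err:
    print(err)  # cannot add 'color': already defined
```

The failing call to add_color is the situation a static type system has to rule out. Under width subtyping, color_point would be acceptable wherever an object with x and y is expected, yet extending it with color is unsafe. The sketch avoids the clash by allowing extension only on prototypes, whose exact method set is known, and allowing subtyping-style reuse (as in manhattan) only after sealing.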
1.3 Adding Classes
At this point, we have described an object calculus that models object-based inheritance and rich object subtyping. However, most practical object-oriented languages are class-based languages. This fact raises the question of how to appropriately model classes. Although a full discussion of the motivation for object-oriented languages is beyond the scope of this thesis, it is worth considering some
desirable language characteristics before proceeding. Generally speaking, objects provide a useful encapsulation mechanism, separating internal implementation details from externally observable behavior, and provide a uniform framework for identifying and specifying the interfaces of various data and system resources. Within this context, classes serve several functions.
Programming methodology: Classes provide a mechanism for declaring both a hierarchy of object types and object implementations.
Implementation considerations: In class-based languages, it is possible to have class-based protection mechanisms, where the methods of an object may access the private data of another object of the same class (or of related classes). This access can be statically checked: classes specify all or part of the implementations of their objects, so all objects of the same class can be guaranteed to share certain implementation characteristics. This guarantee is useful for binary operations, such as set union, and for optimizing method lookup (as in C++).
Static analysis: In comparison with prototype-based languages like Self, where delegation pointers may be set at run time, classes generally force inheritance to be a compile-time operation. This restriction greatly simplifies the static determination of the correctness of class declarations.
Based on these, and other considerations, we believe that in a typed setting, a class mechanism should have the following characteristics.
A class provides an extensible collection of object "parts." The parts may be methods, data, specifications of communication protocols, and so on. Extensibility means that a derived class can use the object parts defined in a base class, possibly adding other parts to be used by subsequent derived classes.
A class construct should include some static condition for guaranteeing that the object "parts" defined in the class are consistent with each other. For example, it should not be possible to define a class where one integer method f requires a string method g, but g is declared to be an integer method requiring a string method f.
A class should provide the ability to specify which "parts" are private (for use within the class implementation) and which are public (for use by client programs). It should be possible to distinguish private from public parts for the current class and for all derived classes.
A class should provide control over initialization of objects, both for the current class and all derived classes. This is essential for establishing invariants of private data structures, initializing system resources used by the object, and so on.
A class construct should support incremental changes to its definition. In particular, if a given class is modified, all derived classes should be updated automatically.
In this third stage, we consider a class construct which resembles the form of class found in C++, Eiffel, and Java, and has all of the above properties. This class construct may be written in our object calculus extended with a form of abstract data type declaration. One appeal of this interpretation is that it clearly shows how classes may be viewed as an orthogonal combination of pure operations on objects (providing aggregation but no encapsulation) and data abstraction (providing encapsulation but no aggregation). This analysis provides some insight into the suitability of using object-inheritance-based systems to support traditional class-based programming. To the best of our knowledge, this interpretation of classes provides the first type soundness proof for the form of class construct found in Eiffel, C++, and Java.

Furthermore, this analysis sheds light on the long-standing controversy over the proper relationship between subtyping and inheritance. An early and influential paper, [Sny86], argues that the two ideas are distinct. This point is reinforced in [Coo92], which shows that the subtyping and inheritance hierarchies used in the Smalltalk collection classes are essentially unrelated. We believe that the arguments in [Sny86, Coo92] are correct for interface types, which are types that specify the operations of their objects but not their implementations. Such types have been the focus of recent theoretical studies of object systems, such as [AC96c, Bru93, FHM94, PT94] and the earlier papers appearing in [GM94]. However, existing object-oriented languages such as Eiffel [Mey92] and C++ [ES90, Str86] use a form of implementation type that constrains both the interface and the implementations of objects. We argue that there is a connection between subtyping and inheritance for implementation types: the only way to produce a subtype of an implementation type, without violating basic principles of data abstraction, is via inheritance.
In addition, we show a connection between interface types and implementation types: every implementation type is a subtype of the interface type obtained by "forgetting" its implementation constraints. Our class construct will provide a mechanism for declaring implementation types, and the subtyping properties of our language will enable us to prove that implementation types are subtypes of the corresponding interface types (as long as a technical condition on the variance of the interface type is satisfied). Formally, we extend the language developed in the second stage with expressions to define and use a form of abstract data type. We introduce a new type, a form of bounded existential, to type abstract data type implementations, and we extend our operational semantics to allow evaluation of abstract data type uses. Chapter 6 presents the full language in formal detail. We prove that the resulting language is sound via a subject reduction theorem and an analysis of typing derivations.
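Python offers a rough analogy. It is meant as an illustration rather than a model of the thesis's class construct. A `typing.Protocol` behaves like an interface type because it is checked structurally. An ordinary class behaves like an implementation type because it is checked nominally, so its only subtypes are its subclasses. The class names below (`PointInterface`, `Point`, `ColorPoint`, `ListPoint`) are hypothetical. Python does not actually enforce privacy: the leading underscore on `_x` is only a convention.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PointInterface(Protocol):
    """Interface type: lists operations, says nothing about representation."""

    def get_x(self) -> int: ...


class Point:
    """Implementation type: every Point is guaranteed to carry the field _x."""

    def __init__(self, x: int) -> None:
        self._x = x

    def get_x(self) -> int:
        return self._x

    def same_x(self, other: "Point") -> bool:
        # Binary method that reads the other object's representation directly;
        # this is sound only because other is known to be a Point.
        return self._x == other._x


class ColorPoint(Point):
    """A subtype of Point obtained by inheritance, so it keeps Point's representation."""

    def __init__(self, x: int, color: str) -> None:
        super().__init__(x)
        self.color = color


class ListPoint:
    """Same public operation as Point, different representation; not a Point."""

    def __init__(self, x: int) -> None:
        self.coords = [x]

    def get_x(self) -> int:
        return self.coords[0]


p, cp, lp = Point(1), ColorPoint(1, "red"), ListPoint(1)
print(isinstance(cp, Point), isinstance(lp, Point))  # True False
print(all(isinstance(o, PointInterface) for o in (p, cp, lp)))  # True
print(p.same_x(cp))  # True
```

A static checker such as mypy would reject p.same_x(lp) because ListPoint is not a subclass of Point. At run time the call fails with AttributeError, since lp has no _x field. Every Point, on the other hand, can be used where a PointInterface is expected. This mirrors the claim that an implementation type is a subtype of the interface type obtained by forgetting its implementation constraints.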
1.4 Road Map
The rest of the thesis is organized as follows. Chapter 2 summarizes some of the major issues in object-oriented programming. Chapter 3 presents the first stage described above, a simple prototype-based calculus for modeling inheritance. In Chapter 4, we introduce subtyping. Chapter 5 extends
our calculus with an abstract data type mechanism to model classes of the form described above. Chapter 6 presents the full formal system needed to model this final language. Finally, Chapter 7 presents subject reduction and type soundness proofs for the final language in detail.
Chapter 2
Introduction to Object-Oriented Concepts
“Object orientation” is both a language feature and a design methodology. In general, object-oriented design is concerned with ways in which programs may be organized and constructed. Objects provide a program-structuring tool whose importance seems to increase with the size of the programs we build. Roughly speaking, an object consists of a set of operations on some hidden, or encapsulated, data. A characteristic of objects is that they provide a uniform interface to a variety of system components. For example, an object can be as small as a single integer or as large as a file system or output device. Regardless of its size, all interactions with an object occur via simple operations that are called “message sends” or “member function invocations.” The use of objects to hide implementation details and provide a “black box” interface is useful for the same reasons that data and procedural abstraction are useful.
Although this chapter is about language features, not methodology, we describe object-oriented design briefly since this design paradigm is one of the reasons for the success of object-oriented programming. The following programming methodology is taken from [Boo91], one of many current books on object-oriented design.
1. Identify the objects at a given level of abstraction.
2. Identify the semantics (intended behavior) of these objects.
3. Identify the relationships among the objects.
4. Implement the objects.
This is an iterative process based on associating objects with components or concepts in a system. The process is iterative because an object is typically implemented using a number of “sub-objects,”
just as in top-down programming a procedure is typically implemented by a number of finer-grained procedures. The data structures used in the early examples of top-down programming (see [Dij72]) were very simple and remained invariant under successive refinements of the program. Since these refinements involved simply replacing procedures with more detailed versions, older forms of structured programming languages, such as Algol, Pascal, and C, were adequate. When solving more complex tasks, however, it is often the case that both the procedures and the data structures of a program need to be refined in parallel. Object-oriented languages support this joint refinement of function and data.
2.1 Basic Concepts
Not surprisingly, all object-oriented languages have some notion of an “object,” which is essentially some data and a collection of methods that operate on that data. There are (at least) two flavors of object-oriented languages: class-based and object-based. These flavors correspond to two different ways of defining and creating objects. In class-based languages, such as Smalltalk [GR83] and C++ [ES90], the implementation of an object is specified by its class. In such languages, objects are created by instantiating their classes. In object-based languages, such as Self, objects are defined directly from other objects by adding new methods via method addition and replacing old methods via method override. In the remainder of the chapter, we will focus on the more common class-based languages.
Although there is some debate as to what exactly constitutes an object-oriented programming language (besides merely having objects), there seems to be general agreement that such a language should provide the following features: dynamic lookup, subtyping, inheritance, and encapsulation. Briefly, a language supports dynamic lookup if, when a message is sent to an object, the method body to execute is determined by the run-time type of the object, not its static type. Subtyping means that if some object ob1 has all of the functionality of another object ob2, then we may use ob1 in any context expecting ob2. Inheritance is the ability to use the definitions of simpler objects in the definitions of more complex ones. Encapsulation means that access to some portion of an object's data is restricted to that object (or perhaps to its descendants). We explore these features in more detail in the following subsections.
2.1.1 Dynamic Lookup
In any object-oriented language, there is some way to invoke the methods associated with an object. In Smalltalk, this process is called “sending a message to an object,” while in C++ it is “calling a member function of an object.” To give a neutral syntax, we write
$\mathit{receiver} \Leftarrow \mathit{operation}$
for invoking operation on the object receiver. For expositional clarity, we will use the Smalltalk terminology for the remainder of this section. Sending messages is a dynamic process: the method body corresponding to a given message is selected according to the run-time identity of the receiver object. The fact that this selection is dynamic is essential to object-oriented programming. Consider, for example, a simple graphics program that manipulates “pictures” containing many different kinds of shapes: squares, circles, triangles, etc. Each square object “knows” how to draw a square, each circle “knows” how to draw a circle, etc. When the program wants to display a given picture, it sends the draw message to each shape in the picture. At compile-time, the most we know about an object in the picture is that it is some kind of a shape and hence has some draw method. At run-time, we can find the appropriate draw method for each shape by querying that shape for its version of the draw method. If the shape is a square, it will have the square draw method, etc.¹
There are two main views for what sending a message means operationally. In the first view, each object contains a “method table” that associates a method body with each message defined for that object. When a message is sent to an object at run-time, the corresponding method is retrieved from that object's method table. As a result, sending the same message to different objects may result in the execution of different code. In the example above, a square shape draws a square in response to the draw message, while a circle draws a circle. This behavior is called dynamic lookup, or, variously, dynamic binding, dynamic dispatch, and run-time dispatch. Both C++ and Smalltalk support this model of message sending.
The second view of message sending treats each message name as an “overloaded” function. When a message m is sent to an object ob, ob is treated as the first argument to an overloaded function named m.
Unlike the traditional overloading of arithmetic operators, the appropriate code to execute when m is invoked is selected according to the run-time type of ob, not its static type. In this view, the methods of an object are not actually part of the object. Each object consists solely of its state. The methods from all the objects in a program are collected together by name. For example, the circle and square objects from above would simply contain their local state, i.e., the circle might contain its center and radius, the square its corner points. The draw methods from each would be collected into some “method repository.” If the draw message were sent to some object ob, the dynamic type of ob would be determined and the appropriate draw code selected from the repository. If ob were a circle, the circle draw method would be executed, etc. In this view, we again get the important characteristic that sending the same message to different objects can result in the execution of different code. Languages such as CLOS [Ste84] and Dylan [App92] support this model of message sending. A theoretical study appears in [CGL95].
¹ In C++, only member functions designated virtual are selected dynamically. Non-virtual member functions are selected according to the static type of the receiver object. Needless to say, this distinction is the source of some confusion.
In the second approach, it is possible to take more than the first argument into account in the selection of the appropriate method body to execute. For example, if we write
$\mathit{receiver} \Leftarrow \mathit{operation}(\mathit{arguments})$
for invoking an operation with a list of arguments, then the actual code invoked can depend on the receiver alone (as explained above), or on the receiver and one or more arguments. When the selection of code depends only on the receiver, it is called single dispatch; when it also depends on one or more arguments, it is called multiple dispatch. Multiple dispatch is useful for implementing operations such as equality, where the appropriate comparisons to use depend on the dynamic type of both the receiver object and the argument object. Although multiple dispatch is in some ways more general than the single dispatch found in C++ and Smalltalk, there seems to be some loss of encapsulation. This apparent loss arises because in order to define a function on different kinds of arguments, that function must typically have access to the internal data of each function argument. For example, suppose we wanted to define a same center method that compares the centers of any two shapes and returns true if they match. Using multiple dispatch, we can write such a function by giving one version of the method for each pair of shapes we wish to consider: circle and circle, circle and square, square and circle, etc. Notice that this same center method does not conceptually belong to any one of the shapes, and yet it must have access to the internal data of each shape object in order to do any meaningful comparisons. This external access of object internals violates the standard notions of encapsulation for object-oriented languages. It is not clear that this loss of encapsulation is inherent to multiple dispatch. However, current multiple dispatch systems do not seem to offer any reasonable encapsulation of private or local data for objects. Recent work addressing this issue appears in [CL94].
2.1.2 Subtyping
The basic principle associated with subtyping is substitutivity: if A is a subtype of B, then any expression of type A may be used without type error in any context that requires an expression of type B. We will write “A
    c->center = copyPt(cp);
    c->radius = r;
    c->tag = Circle;
    return c;
}
/*
 * The function deleteCircle frees resources used by a Circle.
 */
void deleteCircle(struct Circle* c)
{
    free(c->center);
    free(c);
}
/*
 * The following Rectangle struct is our representation of a rectangle.
 * The first field is a type tag to indicate that this struct
 * represents a rectangle. The next two fields store the rectangle's
 * top-left and bottom-right corner points.
 */
struct Rectangle {
    enum ShapeTag tag;
    struct Pt*    topleft;
    struct Pt*    botright;
};
/*
 * The function newRectangle creates a rectangle in the location
 * specified by parameters tl and br. It sets the type tag to
 * ``Rectangle.''
 */
struct Rectangle* newRectangle(struct Pt* tl, struct Pt* br)
{
    struct Rectangle* r = (struct Rectangle*)malloc(sizeof(struct Rectangle));
    r->topleft = copyPt(tl);
    r->botright = copyPt(br);
    r->tag = Rectangle;
    return r;
}

/*
 * The function deleteRectangle frees resources used by a Rectangle.
 */
void deleteRectangle(struct Rectangle* r)
{
    free(r->topleft);
    free(r->botright);
    free(r);
}
/*
 * The center function returns the center point of whatever shape
 * it is passed. Because the computation depends on whether the
 * shape is a Circle or a Rectangle, the function consists of a
 * switch statement that branches according to the type tag stored
 * in the shape s. If the tag is Circle, for instance, we know
 * the parameter is really a circle struct and hence that it has
 * a ``center'' component which we can return. Note that we need
 * to insert a typecast to instruct the compiler that we have a
 * circle and not just a shape. Note also that this program
 * organization assumes that the type tags in the struct are
 * set correctly. If some programmer incorrectly modifies a type tag
 * field, the program will no longer work and the problem cannot
 * be detected at compile time because of the typecasts.
 */
struct Pt* center(struct Shape* s)
{
    switch (s->tag) {
    case Circle: {
        struct Circle* c = (struct Circle*) s;
        return copyPt(c->center);
    }
    case Rectangle: {
        struct Rectangle* r = (struct Rectangle*) s;
        /* The center is the midpoint of the two corner points. */
        return newPt((r->topleft->x + r->botright->x)/2,
                     (r->topleft->y + r->botright->y)/2);
    }
    }
    return NULL; /* not reached if the type tag is valid */
}
/*
 * The move function receives a Shape parameter s and moves it
 * dx units in the x-direction and dy units in the y-direction.
 * Because the code to move a Shape depends on the kind of shape,
 * this function inspects the Shape's type tag field within a switch
 * statement. Within the individual cases, typecasts are used to
 * convert the generic shape parameter to a Circle or Rectangle as
 * appropriate.
 */
void move(struct Shape* s, float dx, float dy)
{
    switch (s->tag) {
    case Circle: {
        struct Circle* c = (struct Circle*) s;
        c->center->x += dx;
        c->center->y += dy;
        break;
    }
    case Rectangle: {
        struct Rectangle* r = (struct Rectangle*) s;
        r->topleft->x += dx;
        r->topleft->y += dy;
        r->botright->x += dx;
        r->botright->y += dy;
        break;
    }
    }
}
/*
 * The rotate function rotates the shape s ninety degrees. Like
 * the center and move functions, this code uses a switch statement
 * that checks the type of shape being manipulated.
 */
void rotate(struct Shape* s)
{
    switch (s->tag) {
    case Circle:
        /* Rotating a circle is not a very interesting operation! */
        break;
    case Rectangle: {
        struct Rectangle* r = (struct Rectangle*) s;
        /* d is half the difference between the width and the height. */
        float d = ((r->botright->x - r->topleft->x)
                   - (r->topleft->y - r->botright->y))/2.0;
        r->topleft->x += d;
        r->topleft->y += d;
        r->botright->x -= d;
        r->botright->y -= d;
        break;
    }
    }
}
/*
 * The print function outputs a description of its Shape parameter.
 * This function again selects its processing based on the type tag
 * stored in the Shape struct.
 */
void print(struct Shape* s)
{
    switch (s->tag) {
    case Circle: {
        struct Circle* c = (struct Circle*) s;
        printf("circle at %.1f %.1f radius %.1f \n",
               c->center->x, c->center->y, c->radius);
        break;
    }
    case Rectangle: {
        struct Rectangle* r = (struct Rectangle*) s;
        printf("rectangle at %.1f %.1f %.1f %.1f \n",
               r->topleft->x, r->topleft->y, r->botright->x, r->botright->y);
        break;
    }
    }
}
/*
 * The body of this program just tests some of the above functions.
 */
int main(void)
{
    struct Pt* origin = newPt(0,0);
    struct Pt* p1 = newPt(0,2);
    struct Pt* p2 = newPt(4,6);
    struct Shape* s1 = (struct Shape*)newCircle(origin,2);
    struct Shape* s2 = (struct Shape*)newRectangle(p1,p2);

    print(s1);
    print(s2);
    rotate(s1);
    rotate(s2);
    move(s1,1,1);
    move(s2,1,1);
    print(s1);
    print(s2);

    deleteCircle((struct Circle*)s1);
    deleteRectangle((struct Rectangle*)s2);
    free(origin);
    free(p1);
    free(p2);
    return 0;
}
Appendix B
Shape Program: Object-Oriented Version
#include <stdio.h>

// (The following is a running C++ program, but it does not represent
// an ideal C++ implementation. The code has been kept simple so
// that it can be understood by readers who are not well-versed in C++).

// The following class Pt is used by the shape objects below. Since
// Pt is a class in this version of the program, the ``newPt'' and
// ``copyPt'' functions may be implemented as class member functions.
// For readability, we have in-lined the function definitions and
// named both of these functions ``Pt''; these overloaded functions
// are differentiated by the types of their arguments.
class Pt {
public:
    Pt(float xval, float yval) { x = xval; y = yval; }
    Pt(Pt* p) { x = p->x; y = p->y; }
    float x;
    float y;
};
// Class Shape is an example of a ``pure abstract base class,''
// which means that it exists solely to provide an interface to
// classes derived from it. Since it provides no implementations
// for the methods center, move, rotate, and print, no ``shape''
// objects can be created. Instead, we use this class as a base
// class. Our circle and rectangle shapes will be derived from
// it. This class is useful because it allows us to write
// functions that expect ``shape'' objects as arguments. Since
// our circles and rectangles are subtypes of shape, we may pass
// them to such functions in a type-safe way.
class Shape {
public:
    // The destructor must be virtual so that deleting a derived
    // object through a Shape* runs the derived class's destructor.
    virtual ~Shape() {}
    virtual Pt* center()=0;
    virtual void move(float dx, float dy)=0;
    virtual void rotate()=0;
    virtual void print()=0;
};
// Class Circle consolidates the center, move, rotate, and print
// functions for circles. It also contains the object constructor
// ``Circle,'' corresponding to the function ``newCircle'' and the
// object destructor ``~Circle,'' corresponding to the function
// ``deleteCircle'' from the typecase version. Note that the
// compiler guarantees that the Circle's methods are only called on
// objects of type Circle. The programmer does not need to keep an
// explicit tag field in the object.
class Circle : public Shape {
public:
    Circle(Pt* cn, float r) {
        center_ = new Pt(cn);
        radius_ = r;
    }
    virtual ~Circle() { delete center_; }
    virtual Pt* center() { return new Pt(center_); }
    void move(float dx, float dy) {
        center_->x += dx;
        center_->y += dy;
    }
    void rotate() {
        /* Rotating a circle is not a very interesting operation! */
    }
    void print() {
        printf("circle at %.1f %.1f radius %.1f \n",
               center_->x, center_->y, radius_);
    }
private:
    Pt* center_;
    float radius_;
};
// Class Rectangle consolidates the center, move, rotate, and print
// functions for rectangles. It also contains the object constructor
// ``Rectangle,'' corresponding to the function ``newRectangle'' and the
// object destructor ``~Rectangle,'' corresponding to the function
// ``deleteRectangle'' from the typecase version. Note that the
// compiler guarantees that the Rectangle's methods are only called on
// objects of type Rectangle. The programmer does not need to keep an
// explicit tag field in the object.
class Rectangle : public Shape {
public:
    Rectangle(Pt* tl, Pt* br) {
        topleft_ = new Pt(tl);
        botright_ = new Pt(br);
    }
    virtual ~Rectangle() { delete topleft_; delete botright_; }
    Pt* center() {
        // The center is the midpoint of the two corner points.
        return new Pt((topleft_->x + botright_->x)/2,
                      (topleft_->y + botright_->y)/2);
    }
    void move(float dx, float dy) {
        topleft_->x += dx;
        topleft_->y += dy;
        botright_->x += dx;
        botright_->y += dy;
    }
    void rotate() {
        // d is half the difference between the width and the height.
        float d = ((botright_->x - topleft_->x)
                   - (topleft_->y - botright_->y))/2.0;
        topleft_->x += d;
        topleft_->y += d;
        botright_->x -= d;
        botright_->y -= d;
    }
    void print() {
        printf("rectangle coordinates %.1f %.1f %.1f %.1f \n",
               topleft_->x, topleft_->y, botright_->x, botright_->y);
    }
private:
    Pt* topleft_;
    Pt* botright_;
};
/*
 * The body of this program just tests some of the above functions.
 */
int main()
{
    Pt* origin = new Pt(0,0);
    Pt* p1 = new Pt(0,2);
    Pt* p2 = new Pt(4,6);
    Shape* s1 = new Circle(origin, 2);
    Shape* s2 = new Rectangle(p1, p2);
    s1->print();
    s2->print();
    s1->rotate();
    s2->rotate();
    s1->move(1,1);
    s2->move(1,1);
    s1->print();
    s2->print();
    delete s1;
    delete s2;
    delete origin;
    delete p1;
    delete p2;
    return 0;
}
Appendix C
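Full Formal System
In this appendix, we summarize the syntax of the formal system presented in Chapter 6.
Expressions
$e ::= x \mid c \mid \lambda x.\, e \mid e_1\, e_2 \mid \langle\,\rangle \mid e \Leftarrow m \mid \langle e_1 \leftarrow m = e_2 \rangle \mid \langle e_1 + m = e_2 \rangle \mid \ldots$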