Practical Type Inference for the GADT Type System
A Doctoral Dissertation Defense
Chuan-kai Lin
Advisor: Tim Sheard
Department of Computer Science, Portland State University
June 1, 2010
Practical Type Inference for the GADT Type System (Oh no, not one of those again!!)
Introduction What are types, and What are they good for?
Type Rules
[Figure: type rules relating expressions ([], [1,2,3], [True], [3,True], ...) to types ([Int], [Bool], Bool, ...).]
The ADT type system cannot describe fine-grained program properties. Hence: the GADT type system.
AVL Balance Factors
[Figure: an AVL tree whose nodes are labeled 5M, 2L, 1S, 7M, and 4M.]
Counting (Tree Height) with Types
  data Z        (Peano encoding of 0)
  data S n      (Peano encoding of 1+n)

  Z                    type-level encoding of 0
  (S Z)                type-level encoding of 1
  (S (S Z))            type-level encoding of 2
  (S (S (S Z)))        type-level encoding of 3
  ...                  type-level encoding of 4, and so on
Generalized Algebraic Data Types (GADTs)
  data Avl n where
    Tip   :: Avl Z
    MNode :: Avl (S n) → Int → Avl n → Avl (S (S n))
    SNode :: Avl n → Int → Avl n → Avl (S n)
    LNode :: Avl n → Int → Avl (S n) → Avl (S (S n))
The LNode data constructor combines an integer with
  a left subtree of height n and
  a right subtree of height 1+n = (S n)
to build an AVL tree of height 2+n = (S (S n)).
  tree :: Avl (S (S (S (S Z))))
  tree = MNode (LNode Tip 2 (MNode (SNode Tip 3 Tip) 4 Tip))
               5
               (MNode (SNode Tip 6 Tip) 7 Tip)
[Figure: the corresponding tree, with nodes labeled 5M, 2L, 7M, 4M, 6S, and 3S.]
I'm sorry, Dave. I'm afraid I can't do that.
(Rejected: LNode needs a left subtree of height 1 here, but Tip has height 0.)
  tree :: Avl (S (S (S (S Z))))
  tree = MNode (LNode (SNode Tip 1 Tip) 2 (MNode (SNode Tip 3 Tip) 4 Tip))
               5
               (MNode (SNode Tip 6 Tip) 7 Tip)
[Figure: the corresponding tree, with nodes labeled 5M, 2L, 7M, 1S, 4M, 6S, and 3S.]
Everything is going extremely well.
(Accepted: the left subtree SNode Tip 1 Tip has height 1.)
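The height-indexed AVL tree from the preceding slides can be written in GHC Haskell roughly as follows. This is a minimal sketch: the constructor types follow the slides, while the in-order traversal `toList` is a hypothetical helper added only to observe the tree.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level Peano numerals; they have no values.
data Z
data S n

-- AVL trees indexed by their height, as on the slides.
data Avl n where
  Tip   :: Avl Z
  MNode :: Avl (S n) -> Int -> Avl n     -> Avl (S (S n))  -- left subtree one taller
  SNode :: Avl n     -> Int -> Avl n     -> Avl (S n)      -- equal heights
  LNode :: Avl n     -> Int -> Avl (S n) -> Avl (S (S n))  -- right subtree one taller

-- The well-typed tree of height 4.
tree :: Avl (S (S (S (S Z))))
tree = MNode (LNode (SNode Tip 1 Tip) 2 (MNode (SNode Tip 3 Tip) 4 Tip))
             5
             (MNode (SNode Tip 6 Tip) 7 Tip)

-- Hypothetical helper (not from the slides): in-order list of keys.
toList :: Avl n -> [Int]
toList t = case t of
  Tip         -> []
  MNode l x r -> toList l ++ [x] ++ toList r
  SNode l x r -> toList l ++ [x] ++ toList r
  LNode l x r -> toList l ++ [x] ++ toList r
```

Replacing `SNode Tip 1 Tip` with `Tip` reproduces the rejected tree: the height indices no longer line up and the definition fails to type check.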
The GADT type system can also describe properties of functions over generalized algebraic data types.
[Figure: rotl, a left rotation of an AVL tree.]
  rotl :: forall n. Avl n → Int → Avl (S (S n)) → E (Avl (S (S n))) (Avl (S (S (S n))))
Type Checking for the rotl Function
  rotl :: forall n. Avl n → Int → Avl (S (S n)) → E (Avl (S (S n))) (Avl (S (S (S n))))
  rotl u v w = case w of
    SNode a x b → R (MNode (LNode u v a) x b)
    LNode a x b → L (SNode (SNode u v a) x b)
    MNode k y c → case k of
      SNode a x b → L (SNode (SNode u v a) x (SNode b y c))
      LNode a x b → L (SNode (MNode u v a) x (SNode b y c))
      MNode a x b → L (SNode (SNode u v a) x (LNode b y c))
Everything is going extremely well.
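As a compilable sketch of the type-checking claim, the rotation can be given to GHC with its signature. The slides use `E`, `L`, and `R` without defining them, so the two-way sum type below is an assumption, and `keys` is a hypothetical helper for inspecting results.

```haskell
{-# LANGUAGE GADTs #-}

data Z
data S n

data Avl n where
  Tip   :: Avl Z
  MNode :: Avl (S n) -> Int -> Avl n     -> Avl (S (S n))
  SNode :: Avl n     -> Int -> Avl n     -> Avl (S n)
  LNode :: Avl n     -> Int -> Avl (S n) -> Avl (S (S n))

-- Assumption: E is a plain two-way sum with constructors L and R.
data E a b = L a | R b

-- The result is either a tree of the same height as w (L) or one taller (R).
rotl :: Avl n -> Int -> Avl (S (S n)) -> E (Avl (S (S n))) (Avl (S (S (S n))))
rotl u v w = case w of
  SNode a x b -> R (MNode (LNode u v a) x b)
  LNode a x b -> L (SNode (SNode u v a) x b)
  MNode k y c -> case k of
    SNode a x b -> L (SNode (SNode u v a) x (SNode b y c))
    LNode a x b -> L (SNode (MNode u v a) x (SNode b y c))
    MNode a x b -> L (SNode (SNode u v a) x (LNode b y c))

-- Hypothetical helper: in-order list of keys.
keys :: Avl n -> [Int]
keys t = case t of
  Tip         -> []
  MNode l x r -> keys l ++ [x] ++ keys r
  SNode l x r -> keys l ++ [x] ++ keys r
  LNode l x r -> keys l ++ [x] ++ keys r
```

Every branch relies on the type refinement introduced by its pattern, for example `n ~ S m` in the inner `LNode` branch, which is why the signature matters to the checker.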
The introduction so far focuses on type checking: given a program, a context, and a type, determine whether the program has the given type in the context.
This dissertation is about type inference: given a program and a context, determine whether the program has a type in the context (and find that type if it exists).
Type inference is a lot harder than type checking.
Type Inference for the rotl Function
  rotl u v w = case w of
    SNode a x b → R (MNode (LNode u v a) x b)
    LNode a x b → L (SNode (SNode u v a) x b)
    MNode k y c → case k of
      SNode a x b → L (SNode (SNode u v a) x (SNode b y c))
      LNode a x b → L (SNode (MNode u v a) x (SNode b y c))
      MNode a x b → L (SNode (SNode u v a) x (LNode b y c))
Inferred type:
  Avl n → Int → Avl (S (S n)) → E (Avl (S (S n))) (Avl (S (S (S n))))
Everything is going extremely well.
Previous Work
The GADT type inference problem has been vigorously studied in the past five years, but without much progress.
  Pottier & Régis-Gianas, Stratified Inference, POPL 2006
  Peyton Jones et al., Wobbly Types, ICFP 2006
  Schrijvers et al., OutsideIn, ICFP 2009
  Stuckey & Sulzmann, GRDT Inference, 2005
  Sulzmann et al., Herbrand Constraint Abduction, 2008
Current state: sound and complete with type annotations, extremely ineffective without type annotations.
Why GADT Type Inference? — A (Flying) Car Analogy —
Because it is at times quite convenient
And it can be really educational
But we must resist the easy way out
Thesis Statement Designing a practical GADT type inference algorithm leads to new discoveries about the GADT type system. These discoveries, in turn, advance the state of the art in the design of GADT type inference algorithms.
A Crash Course on Generalized Algebraic Data Types (in two slides)
Example: Length-Indexed Lists
  data Z        (i.e., 0)
  data S n      (i.e., 1+n)
  data L n a where
    Nil  :: forall a. L Z a                        (length zero)
    Cons :: forall n a. a → L n a → L (S n) a      (length increment)
  Nil :: L Z Int                            (length 0)
  Cons 3 Nil :: L (S Z) Int                 (length 1)
  Cons 5 (Cons 3 Nil) :: L (S (S Z)) Int    (length 2)
Pattern-Matching Length-Indexed Lists
  data L n a where
    Cons :: forall n a. a → L n a → L (S n) a
  isNat :: forall m. L m Int → L m Bool          (case expression type)
  isNat xs = case xs of                          (scrutinee, type L m Int)
    Nil → Nil                                    (branch body, type L Z Bool)
    Cons y ys → Cons (y >= 0) (isNat ys)         (pattern, type L (S n) a; branch body, type L (S n) Bool)
GADT type refinement: U(L m Int ∼ L (S n) a) = [m ↦ S n, a ↦ Int]
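The refinement described above can be observed in GHC with a small sketch. The declarations follow the slides; `toList` is a hypothetical helper added so that results can be compared as ordinary lists.

```haskell
{-# LANGUAGE GADTs #-}

data Z
data S n

-- Lists indexed by their length.
data L n a where
  Nil  :: L Z a
  Cons :: a -> L n a -> L (S n) a

isNat :: L m Int -> L m Bool
isNat xs = case xs of
  Nil       -> Nil                       -- refinement: m ~ Z
  Cons y ys -> Cons (y >= 0) (isNat ys)  -- refinement: m ~ S n

-- Hypothetical helper (not from the slides): forget the length index.
toList :: L n a -> [a]
toList xs = case xs of
  Nil       -> []
  Cons y ys -> y : toList ys
```

In the `Cons` branch the body has type `L (S n) Bool`, which matches the declared result `L m Bool` only because the pattern match refines `m` to `S n`.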
Contribution #1 Finding a new answer to the question “What makes GADT type inference so hard?”
Background
Previous work points out three technical difficulties with the GADT type inference problem:
  1. Some programs lack principal types
  2. Different case branches may have different types
  3. Many programs use polymorphic recursion
Presumably, without these three technical difficulties, GADT type inference would be easy.
Methodology
Apply existing inference algorithms to programs that
  1. Contain GADT pattern-matching branches,
  2. Have no type annotations,
  3. Have principal (i.e., most-general) types,
  4. Do not require GADT type refinements, and
  5. Do not require polymorphic recursion.
Type inference failure indicates unforeseen difficulties.
GADT Type Inference Test #1
  tail :: forall n a. L (S n) a → L n a
  tail xs = case xs of
    Cons y ys → ys
  Stratified Type Inference          ✗
  Wobbly Types                       ✗
  OutsideIn                          ✗
  Herbrand Constraint Abduction      ✗
  Omega Implementation (Sheard)      ✓
GADT Type Inference Test #2
  null :: forall n a. L n a → Bool
  null xs = case xs of
    Nil → True
    Cons y ys → False
  Stratified Type Inference          ✗
  Wobbly Types                       ✗
  OutsideIn                          ✗
  Herbrand Constraint Abduction      ✓
  Omega Implementation (Sheard)      ✗
The Contribution
There is an unforeseen difficulty in the GADT type inference problem: case scrutinee type inference.
  tail xs = case xs of
    Cons y ys → ys
  xs :: L n a ?
  xs :: L (S n) a ?
  xs :: L (S Z) a ?
  ...
Contribution #2 Generalized Existential Types in GADT Pattern-Matching Branches
Background
Existential types (Läufer & Odersky) are type variables in a constructor type that are not in the range type.
  data F a where
    App :: forall a b. (b → a) → b → F a      (a: type; b: existential type)
  e :: F Int                                   (App (mod 3) 5, or App ord 'c'?)
  case e of
    App f x → x        ✗ (b escapes from the branch)
    App f x → f 3      ✗ (b is instantiated to Int)
    App f x → f x      ✓ (no escape, no instantiation)
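A short GHC sketch of the accepted branch follows. The function name `apply` and the example values `ex1` and `ex2` are assumptions introduced for illustration; the two rejected branches are left as comments because they do not compile.

```haskell
{-# LANGUAGE GADTs #-}
import Data.Char (ord)

-- b appears in the argument types but not in the result type F a,
-- so it is existential.
data F a where
  App :: (b -> a) -> b -> F a

-- Accepted: b neither escapes the branch nor is instantiated.
apply :: F a -> a
apply e = case e of
  App f x -> f x

-- Rejected by the type checker if uncommented:
--   escape e      = case e of App f x -> x              -- b would escape
--   instantiate e = case e of App f x -> f (3 :: Int)   -- b would become Int

-- Two values of type F Int that hide different choices of b.
ex1, ex2 :: F Int
ex1 = App (mod 3) 5   -- b is Int
ex2 = App ord 'c'     -- b is Char
```

Because `ex1` and `ex2` share the type `F Int`, a consumer cannot know which `b` it will receive, which is why only `f x` is safe.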
A GADT pattern can introduce type variables that behave like (but are not) existential types.
  data Term a where                              (no existential types)
    RepInt :: Int → Term Int
    RepPair :: forall u v. u → v → Term (u,v)
  inc1 :: forall a. Term a → Term a
  inc1 e = case e of
    RepInt i → RepInt i
    RepPair x y → RepPair (x+1) y                {x :: u, y :: v}, type refinement [a ↦ (u,v)]
Danger, Will Robinson! Accepting the RepPair branch allows inc1 (RepPair True 3).
The Contribution
Generalized existential types are pattern type variables that receive no information from the scrutinee type.
  data T a where
    C :: forall b. b → T [b]      (a: type; b: generalized existential type)
  case e :: T u of
    C x → x        ✗ (b escapes from the branch)
    C x → x+3      ✗ (b is instantiated to Int)
    C x → 3        ✓ (no escape, no instantiation)
The Contribution
Generalized existential types are evolutionary:
  An existential type is a generalized existential type.
Generalized existential types are also revolutionary:
  Existential types are intrinsic to a data constructor.
  Generalized existentials are extrinsic to a pattern (they depend on the type of the scrutinee, i.e., the context where the pattern appears).
Contribution #3 Inferring Valid Scrutinee Types: Avoiding Generalized-Existential-Type Violations
Background
Existing algorithms infer scrutinee types first and catch generalized-existential-type violations later.
  data T a where
    C :: forall b. b → T [b]      ① Looks like a GADT type argument
  case e of                        ② I should infer scrutinee type as T a
    C x → x+3                      ③ Generalized existential type instantiation
The Contribution
A type inference algorithm should work backward to avoid generalized-existential-type violations.
  data T a where
    C :: forall b. b → T [b]      ② b better not be generalized existential
  case e of                        ③ I should infer scrutinee type as T [Int]
    C x → x+3                      ① Type variable b is instantiated to Int
Contribution #4 Inferring Better Scrutinee Types: Beyond the Myth of Most-General Types
Background
How do programmers choose a type for a program?
  id x = x
  id :: forall a. a → a          (best, most-general)
  id :: forall a. [a] → [a]      (not as good)
  id :: [Int] → [Int]            (even worse)
A more-general type allows the program to appear in a wider range of contexts and is thus preferable.
The Contribution
A good scrutinee type should closely match all branch pattern types in the case expression.
  data L n a where
    Cons :: forall n a. a → L n a → L (S n) a      (pattern type for the Cons branch)
  head :: forall n. L n Int → Int          (bad)
  head :: forall n a. L n a → a            (good, most-general)
  head :: forall n a. L (S n) a → a        (even better! prevents divergence from head Nil)
  head xs = case xs of
    Cons y ys → y
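A minimal GHC sketch of the more specific type: with the scrutinee at `L (S n) a`, the application to `Nil` is rejected at compile time instead of failing at run time. The name `headL` is an assumption chosen to avoid clashing with the Prelude's `head`.

```haskell
{-# LANGUAGE GADTs #-}

data Z
data S n

data L n a where
  Nil  :: L Z a
  Cons :: a -> L n a -> L (S n) a

-- The scrutinee type L (S n) a matches the Cons pattern type exactly,
-- so no Nil branch is needed and none can be reached.
headL :: L (S n) a -> a
headL xs = case xs of
  Cons y _ -> y

-- Rejected by the type checker if uncommented, because Z does not match S n:
--   bad = headL Nil
```

With the most-general signature `L n a -> a` the same body still compiles, but `headL Nil` then becomes a run-time pattern-matching failure.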
The Contribution
Choosing scrutinee type specificity is a trade-off.
A more general scrutinee type provides:
  More reusable case expressions
  More opportunities for pattern-matching failures
A more specific scrutinee type provides:
  Less reusable case expressions
  Fewer opportunities for pattern-matching failures
The Contribution
A type inference algorithm should specialize a scrutinee type to match the pattern types in the case expression.
  data Vec n a where
    Vec0 :: Vec Z a
    Vec1 :: a → Vec (S Z) a
    Vec2 :: a → a → Vec (S (S Z)) a
  case vec of                ③ Specialize type to Vec (S n) Int
    Vec1 x → x               ① Infer scrutinee type Vec n a
    Vec2 x y → x+y           ② Infer scrutinee type Vec n Int
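The specialized scrutinee type can be written down and checked in GHC. This is only a sketch of the end result: the function name `sumVec` is hypothetical, and the signature is supplied by hand rather than inferred.

```haskell
{-# LANGUAGE GADTs #-}

data Z
data S n

data Vec n a where
  Vec0 :: Vec Z a
  Vec1 :: a -> Vec (S Z) a
  Vec2 :: a -> a -> Vec (S (S Z)) a

-- Scrutinee type Vec (S n) Int: both patterns fit, and Vec0 is excluded
-- because Z does not match S n.
sumVec :: Vec (S n) Int -> Int
sumVec vec = case vec of
  Vec1 x   -> x
  Vec2 x y -> x + y
```

Under the more general `Vec n Int`, the same case expression would accept `Vec0` and fail at run time.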
Contribution #5 Reconciling Types in Different Branches Using GADT Type Refinements
Background
Conflicting branch body types in a case expression are a major technical difficulty in GADT type inference.
  data L n a where
    Nil :: forall a. L Z a
    Cons :: forall n a. a → L n a → L (S n) a
  head3 :: forall m. L m Int → L m Int
  head3 xs = case xs of
    Nil → Nil                    [m ↦ Z], type L Z Int
    Cons y ys → Cons 3 ys        [m ↦ S n], type L (S n) Int
Background
Previously inferred type information:
  data L n a where
    Nil :: forall a. L Z a                         ④ Nil branch type refinement [m ↦ Z]
    Cons :: forall n a. a → L n a → L (S n) a      ⑤ Cons branch type refinement [m ↦ S n]
  head3 xs = case xs of          ③ Scrutinee type L m Int
    Nil → Nil                    ① Nil branch body type L Z b
    Cons y ys → Cons 3 ys        ② Cons branch body type L (S n) Int
  ⑥ Fresh type variable u

  Branch   Refinements (m)   Branch Types (u)
  Nil      Z                 L Z b
  Cons     S n               L (S n) Int
Goal: infer u using the type refinements on m
The Contribution
I developed two tactics that extract type information from inconsistent columns in the branch type table:
  1. Destruct common top-level type constructors
  2. Apply type refinements from the refinement table
These tactics enable type inference for case expressions whose pattern-matching branches have different types.
  Branch   Refinements (m)   Branch Types (u)
  Nil      Z                 L Z b
  Cons     S n               L (S n) Int

Tactic #1: destruct type constructor L
  ρ = [u ↦ L r s]
  Branch   Refinements (m)   Branch Types (r)   Branch Types (s)
  Nil      Z                 Z                  b
  Cons     S n               S n                Int

Tactic #2: apply GADT type refinement to r
  ρ = [u ↦ L m s, r ↦ m]
  Branch   Refinements (m)   Branch Types (s)
  Nil      Z                 b
  Cons     S n               Int

Last resort: unify each remaining branch type column
  ρ = [u ↦ L m Int, r ↦ m, s ↦ Int]
  head3 :: L m Int → L m Int
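The two tactics in the walkthrough above can be mimicked on a toy representation of types. This is a rough sketch under stated assumptions, not Algorithm P itself: the `Ty` data type and the functions `sameHead`, `destruct`, and `refine` are hypothetical, and `refine` only handles the simplest case, in which a column is identical to a refinement column.

```haskell
import Data.List (transpose)

-- Toy type representation (an assumption for this sketch).
data Ty = TVar String | TCon String [Ty]
  deriving (Eq, Show)

-- Does a type have constructor c applied to exactly n arguments?
sameHead :: String -> Int -> Ty -> Bool
sameHead c n (TCon c' xs) = c == c' && length xs == n
sameHead _ _ _            = False

-- Tactic #1 on one column of branch types: if every entry has the same
-- top-level constructor, return it with one new column per argument.
destruct :: [Ty] -> Maybe (String, [[Ty]])
destruct tys = case tys of
  TCon c args : rest
    | all (sameHead c (length args)) rest ->
        Just (c, transpose [xs | TCon _ xs <- tys])
  _ -> Nothing

-- Tactic #2, simplest case: a column equal to the refinement column of a
-- scrutinee type variable is replaced by that variable.
refine :: [(String, [Ty])] -> [Ty] -> Maybe Ty
refine table col = case [v | (v, rs) <- table, rs == col] of
  v : _ -> Just (TVar v)
  []    -> Nothing
```

For head3, `destruct` splits the column `[L Z b, L (S n) Int]` into the columns `[Z, S n]` and `[b, Int]`, and `refine` maps the first of these to `m` because it coincides with the refinements on `m`.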
Hold on a sec. . . Did we really infer a type for head3 without type annotations?!?
Summary of Contributions
  data L n a where
    Nil :: forall a. L Z a
    Cons :: forall n a. a → L n a → L (S n) a
  head3 xs = case xs of
    Nil → Nil
    Cons y ys → Cons 3 ys
Inferred type:
  head3 :: L m Int → L m Int
Evaluation Algorithm P for GADT Type Inference
The Plain GADT Type System
  Type polymorphism [Milner, 1978]
  Polymorphic recursion [Mycroft, 1984]
  Generalized Algebraic Data Types
  No support for type annotations
[Figure: Type Checking versus Type Inference, with markers for Previous Work and We Are Here.]
Algorithm P
Type inference for the plain GADT type system
  Type polymorphism [Milner, 1978]
  Polymorphic recursion [Mycroft, 1984]
  Generalized Algebraic Data Types (this work)
Haskell Implementation (848 LoC)
All systems are functional.
Benchmark
I collected a suite of 30 well-typed plain GADT programs from the following application domains:
  Dimensional types
  Generic N-way zip
  Functional reactive programming
  Type equality witnesses
  Shape-indexed binary-tree paths
  Color-indexed red-black trees
  Length-indexed lists
  Tagless term interpreters
  Monad libraries
  Integer ordering witnesses
  Balance-indexed AVL trees
AVL Tree Left-Rotation
  rotl u v w = case w of
    SNode a x b → R (MNode (LNode u v a) x b)
    LNode a x b → L (SNode (SNode u v a) x b)
    MNode k y c → case k of
      SNode a x b → L (SNode (SNode u v a) x (SNode b y c))
      LNode a x b → L (SNode (MNode u v a) x (SNode b y c))
      MNode a x b → L (SNode (SNode u v a) x (LNode b y c))
Inferred type:
  Avl n → Int → Avl (S (S n)) → E (Avl (S (S n))) (Avl (S (S (S n))))
Length-Indexed List Zip
  zipWith f a b = case a of
    Nil → case b of
      Nil → Nil
    Cons x xs → case b of
      Cons y ys → Cons (f x y) (zipWith f xs ys)
Inferred type:
  forall a b c d. (a → b → c) → L d a → L d b → L d c
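As a cross-check of the type reported for the zip benchmark, the program type checks in GHC when that type is supplied as a signature. The name `zipWithL` is an assumption to avoid the Prelude's `zipWith`, and `toList` is a hypothetical helper.

```haskell
{-# LANGUAGE GADTs #-}

data Z
data S n

data L n a where
  Nil  :: L Z a
  Cons :: a -> L n a -> L (S n) a

-- Both inputs share the length index d, so each inner case needs only
-- the one constructor that the outer match has already determined.
zipWithL :: (a -> b -> c) -> L d a -> L d b -> L d c
zipWithL f a b = case a of
  Nil -> case b of
    Nil -> Nil
  Cons x xs -> case b of
    Cons y ys -> Cons (f x y) (zipWithL f xs ys)

-- Hypothetical helper: forget the length index.
toList :: L n a -> [a]
toList xs = case xs of
  Nil       -> []
  Cons y ys -> y : toList ys
```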
Tagless Term Interpreter
  eval4 x = case x of
    RepInt i → i
    RepBool b → b
    RepCond u a b → case eval4 u of
      True → eval4 a
      False → eval4 b
    RepSnd u → case eval4 u of { (x, y) → y }
    RepPair a b → (eval4 a, eval4 b)
Inferred type:
  forall a. Term a → a
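The interpreter can be tried in GHC once a `Term` declaration is supplied. The slide for this benchmark does not show one, so the constructor types below are assumptions reconstructed from how `eval4` uses each constructor.

```haskell
{-# LANGUAGE GADTs #-}

-- Assumed constructor types, chosen to fit the uses in eval4.
data Term a where
  RepInt  :: Int -> Term Int
  RepBool :: Bool -> Term Bool
  RepCond :: Term Bool -> Term a -> Term a -> Term a
  RepSnd  :: Term (a, b) -> Term b
  RepPair :: Term a -> Term b -> Term (a, b)

-- Each branch returns a value of a different type; the refinement on a
-- makes every branch agree with the result type a.
eval4 :: Term a -> a
eval4 x = case x of
  RepInt i      -> i
  RepBool b     -> b
  RepCond u a b -> case eval4 u of
    True  -> eval4 a
    False -> eval4 b
  RepSnd u      -> case eval4 u of (_, y) -> y
  RepPair a b   -> (eval4 a, eval4 b)
```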
Trivial Arrow Evaluation
  data FunDesc a b where
    FDI :: forall a. FunDesc a a
    FDC :: forall a b. b → FunDesc a b
    FDG :: forall a b. (a → b) → FunDesc a b
  fdFun e = case e of
    FDI → λx → x
    FDC b → λx → b
    FDG f → f
Inferred type:
  forall a b. FunDesc a b → a → b
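The function-descriptor benchmark translates directly to GHC. In this sketch the signature is written by hand to match the type reported on the slide.

```haskell
{-# LANGUAGE GADTs #-}

data FunDesc a b where
  FDI :: FunDesc a a               -- identity
  FDC :: b -> FunDesc a b          -- constant function
  FDG :: (a -> b) -> FunDesc a b   -- general function

-- In the FDI branch the refinement a ~ b lets \x -> x have type a -> b.
fdFun :: FunDesc a b -> a -> b
fdFun e = case e of
  FDI   -> \x -> x
  FDC b -> \_ -> b
  FDG f -> f
```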
Integer Ordering Witness
  data Nat n where
    Zn :: Nat Z
    Sn :: Nat n → Nat (S n)
  data NatLeq m n where
    LeZ :: NatLeq Z b
    LeS :: NatLeq a b → NatLeq (S a) (S b)
  leq_o k = case k of
    Zn → LeZ
    Sn n → LeS (leq_o n)
Benchmark Results
  Algorithm P infers types for 25 out of 30 programs in the benchmark suite.
  The OutsideIn algorithm [1] infers types for 1 out of 30 programs in the benchmark suite.
  The Wobbly Types algorithm [2] infers types for 0 out of 30 programs in the benchmark suite.
  [1] Schrijvers et al., ICFP 2009
  [2] Peyton Jones et al., ICFP 2006
Closing Remarks
Topics Covered in This Talk
GADT Type System Properties:
  Generalized existential types
  Specificity criterion for scrutinee types
GADT Type Inference Techniques:
  Generalized existential type elimination
  Scrutinee type specialization
  Branch type reconciliation tactics
The Key Lesson
There are two reasons to specialize a scrutinee type:
  1. Eliminate generalized existential types to allow escape and instantiation of pattern-type variables, and
  2. Exclude data constructors to reduce opportunities for runtime pattern-matching failures.
A more general scrutinee type is not necessarily better.
From the Dissertation
Pointwise Type Information Flow in GADT Patterns:
  Characterizes the principle of orthogonal design
  Sufficiently expressive for most GADT applications
  Excludes a specific class of pathological programs
  Makes GADT programs easier to understand
  Formalized by pointwise unifiers and pointwise unification
From the Dissertation
The GADT Branch Reachability Requirement:
  Requires every branch to be potentially reachable
  Interacts with local let definitions in perplexing ways
  Causes GADT type systems to lose type preservation
  Requires type consistency constraints in Algorithm P
Lesson: restricted type system ⇝ simple type inference
Thesis Statement Designing a practical GADT type inference algorithm leads to new discoveries about the GADT type system. These discoveries, in turn, advance the state of the art in the design of GADT type inference algorithms.
Future Work
More work is needed to enable Algorithm P to:
  Infer types for more programs
  Support additional type system features
  Provide more useful error messages
More work is also needed to formally describe and to verify the soundness of Algorithm P.
I think you know what the problem is just as well as I do.
Acknowledgments
Research Advising:
  Tim Sheard & Andrew P. Black
  Faculty members at PSU Computer Science
Moral Support:
  Wife & Family
  Friends at PSU Computer Science
Financial Backing:
  National Science Foundation
Thank You! Chuan-kai Lin
