An Introduction to Erlang - Declarative Programming Group

Jun 16, 2010
Using Static Analysis to Detect Type Errors & Concurrency Defects in Erlang Programs

Kostis Sagonas
National Technical University of Athens, Greece
Uppsala University, Sweden

Erlang

- A concurrent programming language
  - syntax influenced by Prolog
  - functional: strict, higher-order, pattern matching, guards, list comprehensions
  - dynamically typed
- The main implementation of Erlang is the Erlang/OTP system from Ericsson
- Successfully used within telecommunications and in application domains that require a high degree of concurrency and/or fault tolerance

Kostis Sagonas

Dialyzer @ UCM 2010

Dynamically typed languages

- ... such as Erlang can be seen as unityped:
  - only one type: term() or any()
- However, some primitive functions are only defined on subtypes of all terms, and their arguments need to be checked at runtime
- Type safety is not an issue, as it is provided by the runtime system: all terms are tagged with their type, which is checked in primitive operations
- However, programmers often make mistakes that a statically typed language would normally catch
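The runtime check on primitive operations can be seen with a small sketch. The module name unityped_demo and the helper safe_add/2 are assumptions made for illustration; they are not part of the talk.

```erlang
-module(unityped_demo).
-export([safe_add/2]).

%% Every term carries a runtime tag, so '+' checks its operands when it runs.
%% A non-numeric operand raises a badarith error, which is caught here.
safe_add(X, Y) ->
    try
        {ok, X + Y}
    catch
        error:badarith ->
            {error, badarith}
    end.
```

Calling unityped_demo:safe_add(1, 2) returns {ok, 3}, while unityped_demo:safe_add(1, foo) returns {error, badarith}. The mistake is detected, but only when the call is executed.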

Type system to the rescue?

- Type system: an analysis that is
  - fast
  - sound for correctness: "well-typed programs do not go wrong"
    • w.r.t. some kinds of errors
  - pessimistic and inflexible: if it cannot prove safety, it rejects the program

Using Static Analysis for Detecting Type Errors

False error reports: show-stopper

- Flanagan et al. (ESC/Java people), 2002:
  - "[T]he tool has not reached the desired level of cost effectiveness. In particular, users complain about an annotation burden that is perceived to be heavy, and about excessive warnings about non-bugs, particularly on unannotated or partially-annotated programs."
- Rutar et al., 2004:
  - more than 9k null pointer exception (NPE) warnings in 170k non-commented source statements
  - "[T]here are too many warnings to be easily useful by themselves."

The problem statement

- Infer types for all functions in the program, without imposing any constraints that the programmer never intended

    foo([], L) -> L;
    foo([H|T], L) -> [H|foo(T, L)].

- To be sound for defect detection, the inferred types have to respect the operational semantics of the language

    foo(list(), any()) -> any()

- Should this type be different if the function were called append instead of foo?
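A short sketch shows why the second argument and the result of foo/2 have to be any(). The module name problem_statement is an assumption; the function is the one from the slide.

```erlang
-module(problem_statement).
-export([foo/2]).

%% Same definition as on the slide: structurally this is list append,
%% but nothing in the code forces the second argument to be a list.
foo([], L) -> L;
foo([H|T], L) -> [H|foo(T, L)].
```

problem_statement:foo([1, 2], [3]) returns [1, 2, 3], but problem_statement:foo([], 42) returns 42 and problem_statement:foo([1], 42) returns the improper list [1|42]. All three calls succeed, so a type that insisted on a list in the second position would exclude behavior the language allows.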

Women, Men, and Restroom Signs

Restroom signs in the U.S.A.


Restroom signs at NTUA, Greece


Tempting conclusion

- In the U.S., everything is forbidden unless it is allowed
- In Greece, everything is allowed unless it is forbidden
- That's not the point of the story...

More permissive?


Which is better?

+ More restrictive: no dogs allowed
− More permissive: what if both?
− Too restrictive: what if neither? (e.g., boy/girl)

Which is better?

+ More permissive: what if neither? ("just visiting your planet")
− Too permissive: dogs allowed
− Too restrictive: what if both?

All is a matter of desired specification

- No definition is better by default
  - all is a matter of desired behavior
  - usual precision and recall metrics for quality
    • what percentage of allowed behaviors are desired?
    • what percentage of desired behaviors are allowed?
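The two questions can be written as the usual precision and recall ratios. The symbols are assumptions introduced here, not notation from the talk: A is the set of behaviors the specification allows and D is the set of behaviors that are desired.

```latex
\[
\text{precision} = \frac{|A \cap D|}{|A|}
\qquad
\text{recall} = \frac{|A \cap D|}{|D|}
\]
```

Under this reading, a specification that is too permissive loses precision, and one that is too restrictive loses recall.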

Back to Detecting Type Errors by Static Analysis...

The moral of all this is...

- Being sound for correctness is not always a superior approach
  - restricts expressiveness
  - rejects valid programs
    • think of sound analyses for "no division by zero"
- All depends on desired behavior
  - for defect detection tools the goal is to find definite bugs, rather than to prove correctness
  - soundness for incorrectness is very valuable!

Type analysis for defect detection

- The inferred types should:
  - Be easy to interpret by the programmer
  - Never lie: capture all possible, however unintended, uses of functions
- The type inference algorithm should:
  - Be completely automatic
    • Not require any user annotations
    • Not require any type declarations
  - Handle cases where not all code is available
  - Be relatively fast

Dialyzer

"A DIscrepancy AnaLYZer for ERlang programs"

- Part of the Erlang/OTP distribution since 2007
- A lightweight static analysis tool for finding discrepancies in Erlang programs
  - Managed to uncover many bugs in large, well-tested, commercial applications
  - Heavily used in the Erlang community

Dialyzer

Characteristics of dialyzer (for type error detection):

- Sound for defect detection, not for correctness!
- Push-button technology, completely automatic
- Fast and scalable
- Very successful

An Erlang implementation of and

Erlang program:

    bool() ::= true | false

    and(true, true) -> true;
    and(false, _) -> false;
    and(_, false) -> false.

Trial runs:

    > and(true, true).
    true
    > and(false, true).
    false
    > and(false, gazonk).
    false
    > and(3.14, false).
    false

An Erlang implementation of and

HM-type signature:

    and(bool(), bool()) -> bool()

An Erlang implementation of and

Subtyping signature:

    and(any(), false) -> bool()

Typing inferred by the algorithm from S. Marlow and P. Wadler, "A practical subtyping system for Erlang"

A quick look at inferred domains

Dynamic typing domain

Static typing domain

We need to capture all of the dynamic domain!

Success typings

- Definition: A success typing for a function f is a type signature, α → β, such that whenever an application f(x) reduces to a value v, then x ∈ α and v ∈ β.
- Intuition:
  - If the arguments are in the domain of the function, the application might succeed, but
  - if they are not, the application will definitely fail.

Function domains revisited

Dynamic typing domain

Static typing domain

Success typing domain

Recap

Erlang program:

    and(true, true) -> true;
    and(false, _) -> false;
    and(_, false) -> false.

HM-type:        and(bool(), bool()) -> bool()
Subtyping:      and(any(), 'false') -> bool()
Success typing: and(any(), any()) -> bool()
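The difference between the signatures can be checked against a compilable version of the program. In this sketch the function is renamed and_/2, because and is a reserved word in Erlang and cannot be used unquoted as a function name. The module name and_demo and the helper probe/2 are assumptions added for illustration.

```erlang
-module(and_demo).
-export([and_/2, probe/2]).

%% The slide's function, renamed because 'and' is a reserved word.
and_(true, true) -> true;
and_(false, _) -> false;
and_(_, false) -> false.

%% Hypothetical helper (not on the slides): call and_/2 and report the outcome.
probe(X, Y) ->
    try and_(X, Y) of
        Result -> {ok, Result}
    catch
        error:function_clause -> {error, function_clause}
    end.
```

and_demo:probe(false, gazonk) and and_demo:probe(3.14, false) both return {ok, false}, although the HM-type signature would reject these calls. and_demo:probe(true, gazonk) returns {error, function_clause} even though its arguments lie inside the success typing's domain: membership in the domain only means the call might succeed.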

Two sides to the story

Static typing view:

- Well-typed programs do not go wrong!
- Pessimism: If we cannot prove type safety we must reject the program.

Success typing view:

- Ill-typed programs will surely fail!
- Optimism: If we cannot detect a type clash we need to accept the program as it might work.

Inferring success typings

- There is a most general success typing for all functions of a certain arity
  - (any()) -> any() for all functions of arity 1
  - (any(), any()) -> any() for all functions of arity 2
  - ...
- The aim of the inference algorithm is to reduce both the domain and the range of the success typing as much as possible without excluding any valid terms

The inference algorithm [PPDP'06]

- Constraint-based algorithm
  - Constraint generation
  - Constraint solving, bottom-up per strongly connected component (SCC) of the call graph
- Constraints are organized in disjunctions and conjunctions of subtype constraints

    C ::= (T1 ⊆ T2) | (C1 ∧ … ∧ Cn) | (C1 ∨ … ∨ Cn)

- Conjunctions come from straight-line code and disjunctions come from choices (case statements)

Some examples

    %% c(integer()) -> foo.
    c(X) ->
        case b(X) of
            42 -> foo;
            gazonk -> bar
        end.

    %% b(integer()) -> integer() | float().
    b(X) when is_integer(X) -> a(X).

    %% a(integer() | float()) -> integer() | float().
    a(X) -> X + 1.
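The inferred typing c(integer()) -> foo can be checked by running the functions. This sketch wraps the slide's code in a module; the module name inference_examples and the helper probe/1 are assumptions added for illustration.

```erlang
-module(inference_examples).
-export([c/1, probe/1]).

c(X) ->
    case b(X) of
        42 -> foo;
        gazonk -> bar      % b/1 never returns an atom, so this never matches
    end.

b(X) when is_integer(X) -> a(X).

a(X) -> X + 1.

%% Hypothetical helper (not on the slide): run c/1 and report how it ends.
probe(X) ->
    try c(X) of
        Result -> {ok, Result}
    catch
        error:Reason -> {error, Reason}
    end.
```

inference_examples:probe(41) returns {ok, foo}. Any other integer fails the case expression, for example probe(0) returns {error, {case_clause, 1}}, and a non-integer such as probe(gazonk) returns {error, function_clause}. No input produces bar, which is why the range of c/1 is just foo.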

More examples

    %% foo(integer() | atom()) -> integer() | list().
    foo(X) when is_integer(X) -> X + 1;
    foo(X) -> atom_to_list(X).

    %% gazonk(none()) -> none()
    gazonk(X) when is_atom(X) -> X + 42.

A higher-order example

    %% foo() -> none().
    foo() ->
        F = fun (X) when is_integer(X) -> 54 end,
        h(F).

    %% h((any()) -> any()) -> number()
    h(F) -> F(true) + 42.
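The typing foo() -> none() says that foo/0 can never return a value. A sketch that runs it makes this concrete; the module name higher_order_demo and the helper probe/0 are assumptions added for illustration.

```erlang
-module(higher_order_demo).
-export([foo/0, probe/0]).

foo() ->
    F = fun (X) when is_integer(X) -> 54 end,
    h(F).

%% h/1 applies its argument to the atom true, which F's guard rejects.
h(F) -> F(true) + 42.

%% Hypothetical helper (not on the slide): run foo/0 and report how it ends.
probe() ->
    try foo() of
        Result -> {ok, Result}
    catch
        error:Reason -> {error, Reason}
    end.
```

higher_order_demo:probe() returns {error, function_clause}: the fun has no clause for true, so the addition in h/1 is never reached.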

A slight disturbance...

    %% length_1(list()) -> non_neg_integer()
    length_1([]) -> 0;
    length_1([_|T]) -> length_1(T) + 1.

    %% length_2(list()) -> any()
    length_2(L) -> length_3(L, 0).

    %% length_3(list(), any()) -> any()
    length_3([], N) -> N;
    length_3([_|T], N) -> length_3(T, N+1).
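The any() in the typing of length_3/2 is not an imprecision of the analysis as long as length_3/2 can be called from anywhere. In the sketch below all three functions are exported, which is an assumption made to show that case; the module name length_demo is also an assumption.

```erlang
-module(length_demo).
-export([length_1/1, length_2/1, length_3/2]).

length_1([]) -> 0;
length_1([_|T]) -> length_1(T) + 1.

length_2(L) -> length_3(L, 0).

%% Exported on purpose in this sketch, so any caller may pick the accumulator.
length_3([], N) -> N;
length_3([_|T], N) -> length_3(T, N + 1).
```

length_demo:length_1([a, b, c]) and length_demo:length_2([a, b, c]) both return 3, but length_demo:length_3([], gazonk) returns gazonk. With an arbitrary caller, the accumulator and the result really can be any term.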

Module system to the rescue

- In Erlang, the module system cannot be bypassed
  - Code resides in modules
  - Modules have declared interfaces (exported functions)
- Since the module system protects local functions from arbitrary use, we can collect the types of the parameters of all call sites of these functions
- We can use this information to restrict the domains of module-local functions
  - "refined success typings"

The length example revisited

    -module(my_list_utils).
    -export([length_2/1]).

    %% length_2(list()) -> non_neg_integer().
    %%   (before refinement: length_2(list()) -> any())
    length_2(L) -> length_3(L, 0).

    %% length_3(list(), non_neg_integer()) -> non_neg_integer().
    %%   (before refinement: length_3(list(), any()) -> any())
    length_3([], N) -> N;
    length_3([_|T], N) -> length_3(T, N+1).

Adding function specifications

    -module(my_list_utils).
    -export([length_2/1]).

    -spec length_2(list()) -> non_neg_integer().
    length_2(L) -> length_3(L, 0).

    length_3([], N) -> N;
    length_3([_|T], N) -> length_3(T, N+1).

Adding contracts

    -module(my_list_utils).
    -export([length_2/1]).

    -spec length_2(list(atom())) -> integer().
    length_2(L) -> length_3(L, 0).

    length_3([], N) -> N;
    length_3([_|T], N) -> length_3(T, N+1).

What Erlang modules used to look like

How modern Erlang modules look


Using Static Analysis for Detecting Concurrency Defects

Concurrency

- A method to better structure programs
- A means to speed up their execution
- A necessity these days??

The catch:

- Concurrent programming is harder and more error-prone than its sequential counterpart

Data race detection in Erlang

- Erlang's concurrency model is based on user-level processes that communicate via asynchronous message passing
  - copying semantics ("shared-nothing")
- If there is nothing shared between processes, how can there be data races?
  - System built-ins allow processes to share data
  - Erlang currently provides no atomicity constructs

What is considered a data race?

- A process reads some variable and then decides to take some write action based on the value of that variable
- If it is possible for another process to succeed in changing the value stored in that variable in between the read and the action, in such a way that the action about to be taken is no longer appropriate, then we say that the program has a race condition

Data races in the process registry

    proc_reg(Name) ->
        ...
        case whereis(Name) of
            undefined ->
                Pid = spawn(...),
                register(Name, Pid);
            Pid ->          % already
                ok          % registered
        end,
        ...
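One way to live with the window between whereis/1 and register/2 is to treat a failing register/2 as "someone else got there first". The following is only a sketch of that idea, not the talk's proposal: the module name proc_reg_demo and the two-argument proc_reg/2, which takes the fun to spawn, are assumptions.

```erlang
-module(proc_reg_demo).
-export([proc_reg/2]).

%% Hypothetical variant of the slide's proc_reg/1 that takes the fun to spawn.
%% whereis/1 followed by register/2 is not atomic, so register/2 can raise
%% badarg when another process registered Name in between.
proc_reg(Name, Fun) ->
    case whereis(Name) of
        undefined ->
            Pid = spawn(Fun),
            try register(Name, Pid) of
                true -> {ok, Pid}
            catch
                error:badarg ->
                    %% Lost the race: drop our process, use the winner's.
                    exit(Pid, kill),
                    {ok, whereis(Name)}
            end;
        Pid ->
            {ok, Pid}
    end.
```

The sketch has limits that should be stated: register/2 also raises badarg when the spawned process has already exited, and the final whereis/1 can return undefined if the winning process died in the meantime. A caller would need to handle both cases.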


Data races in ETS

    run() ->
        Tab = ets:new(some_tab_name, [public]),
        Inc = compute_inc(),
        Fun = fun () -> ets_inc(Tab, Inc) end,
        spawn_some_processes(Fun).

    ets_inc(Tab, Inc) ->
        case ets:lookup(Tab, some_key) of
            [] ->
                ets:insert(Tab, {some_key, Inc});
            [{some_key, OldValue}] ->
                NewValue = OldValue + Inc,
                ets:insert(Tab, {some_key, NewValue})
        end.
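For this particular counter pattern, the lookup/insert pair can be replaced by a single atomic call. The sketch below uses ets:update_counter/4, which was added to Erlang/OTP after this talk; it is a swapped-in technique, not something the slides propose. The module name ets_inc_demo and the function run/2 are assumptions.

```erlang
-module(ets_inc_demo).
-export([run/2]).

%% Sketch: N processes each add Inc (an integer) to one counter in a public
%% table. ets:update_counter/4 performs the read-modify-write atomically and
%% inserts the default object {some_key, 0} first if the key is missing.
run(N, Inc) ->
    Tab = ets:new(some_tab_name, [public]),
    Parent = self(),
    Pids = [spawn(fun() ->
                      ets:update_counter(Tab, some_key, Inc, {some_key, 0}),
                      Parent ! {done, self()}
                  end) || _ <- lists:seq(1, N)],
    %% Wait until every worker has reported back.
    [receive {done, Pid} -> ok end || Pid <- Pids],
    [{some_key, Total}] = ets:lookup(Tab, some_key),
    ets:delete(Tab),
    Total.
```

With this version ets_inc_demo:run(100, 3) returns 300 regardless of scheduling, whereas the lookup-then-insert version on the slide can lose updates when two processes read the same old value.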

Data races in mnesia

    -export([table_func/2]).

    table_func(...) ->
        create_time_stamp_table(),
        ...

    create_time_stamp_table() ->
        Props = [{type, set}, ...],
        create_table(time_stamp, Props, ram_copies, false),
        NRef = case mnesia:dirty_read(time_stamp, ref_count) of
                   [] -> 1;
                   [#time_stamp{data = Ref}] -> Ref + 1
               end,
        mnesia:dirty_write(#time_stamp{data = NRef}).

Single-threaded Erlang

- A single scheduler picks up processes from a single ready queue
- The selected process gets assigned a number of reductions to execute
- Each time the process does a function call, a reduction is consumed
- A process gets suspended when the number of remaining reductions reaches zero, or when it gets stuck

Single-threaded Erlang

    proc_reg(Name) ->
        ...
        case whereis(Name) of
            undefined ->
                Pid = spawn(...),
                register(Name, Pid);
            Pid ->          % already
                ok          % registered
        end,
        ...

Being struck by lightning seems more likely!

Multi-threaded Erlang

- A multi-threaded version of the system has been available since May 2006; it is the default on multi-core architectures
- There are multiple schedulers, each having its own ready queue
- Since March 2009, the runtime system employs a redistribution scheme based on work stealing when some scheduler's run queue becomes empty

Race analysis in Dialyzer

Characteristics:

- Sound for either correctness or defect detection
- Completely automatic
- Fast and scalable
- Smoothly integrated into dialyzer

The analysis: a three-step process

1. Collecting information
   - Control-flow graphs of functions and closures
   - Escape analysis
   - Inter-modular call graph
   - Sharing/alias analysis
   - Fine-grained type information (singleton types)

The analysis: a three-step process

2. Determining all code points with possible race conditions
   - Find the root nodes in the forest of call graphs
   - Traverse their control-flow graphs (CFGs) using depth-first search (DFS)
   - Special cases:
     • Statically known function or closure calls
     • Unknown higher-order calls
     • Recursion

The analysis: a three-step process

3. Filtering false alarms
   - Variable sharing
   - Type information
   - Characteristics of race conditions

    foo(Fun, N, M) ->
        ...
        case whereis(N) of
            undefined -> ..., Fun(M);
            Pid -> ...
        end,
        ...

Some optimizations

- Control-flow graph minimization
- Avoiding repeated traversals and benefiting from temporal locality
- Making unknown function calls less unknown

Detecting data races

mod.erl:

     1: proc_reg(Name) ->
     2:     ...
     3:     case whereis(Name) of
     4:         undefined ->
     5:             Pid = spawn(...),
     6:             register(Name, Pid);
     7:         Pid ->          % already
     8:             ok          % registered
     9:     end,
    10:     ...

mod.erl:6: The call erlang:register(Name::atom(),Pid::pid()) might fail due to a possible race condition caused by its combination with the erlang:whereis(Name::atom()) in mod.erl on line 3

Effectiveness and Performance


Current status and impact

- The race analysis has been publicly released as part of the latest Erlang/OTP distribution (mid November 2009)

From: Bernard Duggan (Erlang developer)
Sent on 27 November 2009

"Our Erlang codebase comprises 5 applications and a few little ancillary bits and pieces on the side – it's about 40k lines. So far it's turned up three race conditions. … Thanks for a brilliant tool."

Race detection in Erlang (ICFP'09)

- QuickCheck: a property-based testing tool
- PULSE is a ProTest User Level Scheduler for Erlang that randomly schedules the test case processes and records a detailed trace
- A race condition is a possibility of nondeterministic execution that can make a program fail to meet its specification

Current and Future Work

- Extend dialyzer to detect:
  - more kinds of race conditions
  - more types of concurrency errors
  - violations in the requirements of Erlang behaviours (concurrency design patterns)
- Extend the language of -spec's to specify concurrency properties and requirements
- Generate tests from -spec's

Thank you!

What are types good for?

- Document programmers' intentions
- Can be used to prove properties of programs
  - e.g., type safety, ...
- Often help the compiler generate better code by avoiding some checks during runtime
- Detect some programmer errors

Erlang terms

- Primitive terms:
  - integers: 42, 1593405849584548049385
  - floats: 2.56, 3.14
  - atoms: foo, true
  - binaries
- Structured terms:
  - tuples: {foo, 42}, {1, 2, 3}
  - lists: [1, 2, 3.14]
- Higher-order terms:
  - funs: fun(X) when is_atom(X) -> X == a end

Refined success typings

- Definition:
  - Let f be a function with success typing α → β. A refined success typing for f is a typing of the form α' → β', such that
    • α' ⊆ α and β' ⊆ β, and
    • for all p ∈ α' for which the application f(p) reduces to a value, f(p) ∈ β'.