A C Type Checker: A First Step Towards Code-Level Verification (Work in progress)

Jim Alves-Foss, Chris Toshok, Luke Sheneman
Laboratory for Applied Logic
Department of Computer Science
University of Idaho, Moscow, Idaho, USA
[email protected]

Abstract

The correctness of code-level implementations of software requires, among other things, the proper use of typed constructs. The use of data types in high-level programming languages provides a mechanism for the partial specification of program correctness. Thus, a type checker fits well into the framework of a code verification system. This paper presents an outline of the denotational semantics of a C type checker and its implementation in HOL. If the type specifications of a program are verified, we can assume that no type error will occur during execution (or even compilation) of the program. Thus, if we separate the static type semantics from the dynamic semantics, we can verify this type checking portion of the specification separately. This makes the dynamic semantics of the language much simpler, since no run-time type checking is necessary.

1 Introduction

Software verification has been a topic of interest among computer scientists for over twenty years. This interest has been fed not only by the desire to develop methodical tools that give assurance that the system implementation performs according to the specification, but also by the importance we place on the correct performance of certain systems. With the continued proliferation of computer systems into our lives, human safety relies more than ever on the correct performance of these systems.

The most common techniques for software verification involve mapping the actual code into constructs that represent our understanding of what the code does. Axiomatic semantics [1, 2] (sometimes called Hoare-style semantics) define each statement in terms of triples. These triples define how the statement affects the relationship between predicates (preconditions and postconditions). Operational semantics [4] define each statement in terms of how that statement is operationally implemented. This approach also uses an axiomatic basis for the semantic definitions. Denotational semantics [3, 5, 6] (which is the approach we use in this research) define the software environment in terms of mathematical objects, and each statement in terms of functions on these objects. This approach permits us to precisely define the functionality of the programming language and its effects on the system. A system that uses a denotational approach can provide a rigorous analysis of the code-level implementation with respect to a formal specification.

Regardless of the approach used, it is essential to develop a set of tools to assist in the evaluation of the software. This evaluation consists of examining the code for correct use and validating correspondence with the specification.

The research presented in this paper is a first step toward the development of such a system using the HOL theorem proving system. Specifically, we have developed a denotational specification for the use of types in the C programming language. This specification is currently being implemented in the HOL system and applied to sections of C code.

In the remainder of this paper we present a little background on code-level verification. Following the background, we present our approach to a C type checker. The type checker is presented first in terms of the denotational semantics, and then in terms of the HOL implementation. Complete documentation of our efforts can be found in [7, 9].

2 Code-Level Verification

Programming languages come in many forms, styles and levels of abstraction. Regardless of whether a language is functional or imperative, low-level or high-level, the definition of the language can be divided into two parts:

• syntax: which consists of the structure and form of the language.
• semantics: which consists of the meaning we assign to the syntactic structures.



When verifying the correctness of software (whether through testing or formal methods), it is essential to examine both the syntactic and the semantic aspects of the software. These aspects can be defined in terms of syntactic correctness, semantic correctness and functional correctness.

2.1 Syntactic Correctness

Syntactic correctness consists of the proper use of variables, operators and constructs of the programming language. The first phase of code-level verification involves the verification of the syntactic structure and organization of the software. Fortunately, the rules for constructing syntactically correct programs are well known. In general, parsers and other programming tools are able to inform a user whether the code is syntactically correct.

The concrete syntax of the programming language is defined in terms of a set of strings over an alphabet. A parser can process a program (a collection of strings) and determine whether the program is in the language (i.e., is syntactically correct). Once a program is determined to be syntactically correct, it is translated into an abstract syntax. The abstract syntax of the programming language is defined in terms of a set of trees. A particular program is represented as one of these trees (e.g., a parse tree) that unambiguously represents the structure of the program.

We have developed an SML-based parser for the C programming language that performs these standard syntax checks [8] and translates the program into an abstract syntax parse tree. In this project we define the semantics for this tree representation of the program.

2.2 Semantic Correctness

Semantic correctness consists of verifying that the use of the programming language constructs and variables is consistent with their intended meaning. The denotational semantics of programming languages can be divided into two groups:

• static semantics: which is a mapping of the syntactic constructs to mathematical objects that define what a program means as a whole, without actually executing it.
• dynamic semantics: which is a mapping of the syntactic constructs to mathematical functions that define what a program means when it is executed. These functions operate on the statically defined mathematical objects.

Type specifications are part of the semantic description of a programming language. The static aspects of type specifications are used to determine whether programs are well-formed before execution. This includes defining proper coercions between types. The dynamic aspects of type specifications are used to indicate the amount of storage to allocate during execution. A more complete description of the dynamic semantics of type specifications can be found in [3]. We discuss how we define and verify the static aspects of type checking in Section 3.

2.3 Functional Correctness

A piece of code is considered functionally correct if it correctly performs the functions intended by the developer. This is the traditional scope of code-level verification that is discussed in the literature and is one of the goals of our research program. Testing for functional correctness requires the development of a test suite against which the code is evaluated. Formal verification of the functional correctness of code requires developing semantics that define the meaning of the code and verifying that the code's meaning satisfies the program's specification. Further exploration of this topic is left for future presentations.


3 Type Checking

The use of data types in high-level programming languages provides a mechanism for the partial specification of program correctness. Thus, a type checker fits well into the framework of a code verification system. Modern compilers and programming tools can often be used to check that programming constructs are well-typed (i.e., satisfy the semantic constraints of the programming language).

Type specifications are a part of the programming language semantics, whether defined implicitly or explicitly in the formal language semantics. These formal semantics are not completely defined for poorly typed statements unless they explicitly include type checking. However, if the type specifications of a program are verified, we can assume that no type error will occur during execution (or even compilation) of the program. Thus, if we separate the static type semantics from the dynamic semantics, we can verify this type checking portion of the specification separately. This makes the dynamic semantics of the language much simpler, since no run-time type checking is necessary.

In the remainder of this section we present a portion of the denotational semantics we have defined for type checking the C programming language and their implementation in the HOL theorem proving system. Using these definitions, it is straightforward to use the HOL rewriting system to determine whether code is well-typed.

3.1 Abstract Syntax Constructs

As mentioned previously, we evaluate the program as it is represented by an abstract syntax parse tree. The abstract syntax constructs that are the output of the parser are grouped into the following:

• Statements: normal statements such as while, do-while, if-else, switch, etc.
• Expressions: expressions such as the conditional expression and unary expressions.
• Declarations: assigning type to identifiers, and possibly initializing these "variables".
• Parameters: formal parameters for functions.
• Externals: definitions external to function bodies, including function definitions.



3.2 The Static Semantics and HOL

In order to offer a reliable and trustworthy verification system, and as a first step to a complete verification system, we have chosen to implement the type checker in the HOL automated theorem proving system. In this section we give an outline of the static semantics and their mapping into HOL.

Information about the types associated with identifiers is given in Type_Info. This information is made up of qualifiers, storage classes, and basic types, which are defined as follows:

    let TyQual_Axiom = define_type `Type_Qualifier`
      `Type_Qualifier = Volatile | Const`;;

    let TySC_Axiom = define_type `Storage_Class`
      `Storage_Class = Typedef | Extern | Static | Auto | Register`;;

    let BT_Axiom = define_type `Basic_type`
      `Basic_type = Void | Char | Short | Int | Long | Float
                  | Double | Signed | Unsigned`;;

Type_Info has the following representation:

    let Type_Info_Axiom = define_type `Type_Info`
      `Type_Info = TyDef Identifier
                 | TyNil
                 | Type_Info_List Type_Info Type_Info
                 | TyQual Type_Qualifier
                 | TySC Storage_Class
                 | TyBas Basic_type`;;

Notice that structures and unions are currently not supported in Type_Info. Adding them would require adding an element of a form similar to:

    TyStructUnion TypeEnv
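As an informal illustration only, the Type_Info definitions can be mirrored outside HOL. The Python sketch below is not part of the HOL development; the class names, the helper `type_info_from`, and the reading of `Type_Info_List` as a cons cell terminated by `TyNil` are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union


class TypeQualifier(Enum):
    VOLATILE = auto()
    CONST = auto()


class StorageClass(Enum):
    TYPEDEF = auto()
    EXTERN = auto()
    STATIC = auto()
    AUTO = auto()
    REGISTER = auto()


class BasicType(Enum):
    VOID = auto()
    CHAR = auto()
    SHORT = auto()
    INT = auto()
    LONG = auto()
    FLOAT = auto()
    DOUBLE = auto()
    SIGNED = auto()
    UNSIGNED = auto()


@dataclass(frozen=True)
class TyDef:
    name: str  # identifier introduced by a typedef


@dataclass(frozen=True)
class TyNil:
    pass


@dataclass(frozen=True)
class TypeInfoList:
    head: "TypeInfo"
    tail: "TypeInfo"


@dataclass(frozen=True)
class TyQual:
    qualifier: TypeQualifier


@dataclass(frozen=True)
class TySC:
    storage: StorageClass


@dataclass(frozen=True)
class TyBas:
    basic: BasicType


TypeInfo = Union[TyDef, TyNil, TypeInfoList, TyQual, TySC, TyBas]


def type_info_from(*items):
    """Hypothetical helper: fold items into a right-nested list ending in TyNil.

    Assumption: Type_Info_List is used like a cons cell. The paper does not
    say how the two Type_Info arguments are meant to be combined.
    """
    result = TyNil()
    for item in reversed(items):
        result = TypeInfoList(item, result)
    return result


# One possible encoding of the specifiers in `static const int`.
static_const_int = type_info_from(
    TySC(StorageClass.STATIC),
    TyQual(TypeQualifier.CONST),
    TyBas(BasicType.INT),
)
```

Because the dataclasses are frozen and compare structurally, two encodings of the same specifier sequence are equal, which is the property a type checker needs when it compares types.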

The basis of the type checker is the Type Environment, which returns the type of an identifier:

$$\mathit{TypeEnv}_{\bot} :: \mathit{Identifier} \to \mathit{TypeResult}$$

TypeResult is represented in HOL in the following way:

    let TypeResult_Axiom = define_type `TypeResult`
      `TypeResult = Type Type_Info Abstract Parameters
                  | Label
                  | Typeerror
                  | Welltyped`;;

and TypeEnv has the following form in HOL:

    let TypeEnv_Axiom = define_type `TypeEnv`
      `TypeEnv = typeenv (Identifier -> TypeResult) | EmptyEnv`;;

The constructor EmptyEnv is a syntactic representation of $\bot$. Pointer information associated with a type is contained in the type Abstract.

    let Abstract_Axiom = define_type `Abstract`
      `Abstract = NotPointer | Pointer | PointerPointingTo Abstract`;;

Parameter information associated with a type is contained in the type Parameters.

    let Parameters_Axiom = define_type `Parameters`
      `Parameters = Parameter Type_Info Abstract Identifier Parameters
                  | ParamList Parameters Parameters
                  | VarParamList Parameters
                  | NoParameters
                  | NotFunction`;;

Type environments admit three operations: access, to retrieve the type associated with an identifier; update, to add a mapping from a particular identifier to a type; and new_env, which is used to create a new environment.

    let NEWENV_DEF = new_definition
      (`NEWENV_DEF`, "(newenv:TypeEnv) = typeenv (\i.Typeerror)");;

    let ACCESS_DEF = new_recursive_definition false TypeEnv_Axiom `ACCESS_DEF`
      "(access i (typeenv e) = e i) /\
       (access i EmptyEnv = Typeerror)";;

    let UPDATE_DEF = new_recursive_definition false TypeEnv_Axiom `UPDATE_DEF`
      "(update i t (typeenv e) = typeenv (\j.(j=i) => t | e j)) /\
       (update i t EmptyEnv = EmptyEnv)";;
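The behaviour of these three operations can be sketched outside HOL as follows. This is an illustrative Python transliteration, not the HOL code itself; representing type results as plain strings and EmptyEnv as an environment with no lookup function are assumptions made for the example.

```python
TYPEERROR = "Typeerror"  # stand-in for the Typeerror constructor


class TypeEnv:
    """Either a wrapped lookup function (typeenv e) or EmptyEnv (lookup is None)."""

    def __init__(self, lookup=None):
        self.lookup = lookup


EMPTY_ENV = TypeEnv(None)


def newenv():
    # A fresh environment maps every identifier to Typeerror.
    return TypeEnv(lambda ident: TYPEERROR)


def access(ident, env):
    # access i (typeenv e) = e i ; access i EmptyEnv = Typeerror
    if env.lookup is None:
        return TYPEERROR
    return env.lookup(ident)


def update(ident, result, env):
    # update i t EmptyEnv = EmptyEnv: the empty environment absorbs updates.
    if env.lookup is None:
        return env
    old = env.lookup
    # update i t (typeenv e) = typeenv (\j. (j = i) => t | e j)
    return TypeEnv(lambda j: result if j == ident else old(j))


env1 = update("x", "int", newenv())
env2 = update("x", "char", env1)  # shadows x without changing env1
```

Since update builds a new function rather than mutating the old one, earlier environments remain valid, which matches the purely functional reading of the HOL definitions.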

3.3 Semantic Clauses for Statements

The semantic function that operates on statement constructs has the following type:

$$\mathit{ST} : \mathit{Statement} \to \mathit{TypeEnv} \to \mathit{TypeResult}$$

Inside HOL, this function is called STA. Some examples of semantic clauses for statements are given below along with their mapping in HOL. The if-statement has the following semantic clause:

$$
\begin{aligned}
\mathit{ST}[\![\,\texttt{if}\ E\ S_1\ \texttt{else}\ S_2\,]\!]\ e ={}& \mathit{ET}[\![E]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ST}[\![S_1]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ST}[\![S_2]\!]\ e
\end{aligned}
$$

In HOL this would be written as:

    (STA (If_Statement e1 s1 s2) e =
       ((EXPR e1 e) = Typeerror) => Typeerror
     | ((STA s1 e) = Typeerror) => Typeerror
     | (STA s2 e))

The for-statement has the following semantic clause:

$$
\begin{aligned}
\mathit{ST}[\![\,\texttt{for}\ (E_1; E_2; E_3)\ S\,]\!]\ e ={}& \mathit{ET}[\![E_1]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ET}[\![E_2]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ET}[\![E_3]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ST}[\![S]\!]\ e
\end{aligned}
$$

In HOL, this clause would have the following form:

    (STA (For_Statement e1 e2 e3 s) e =
       ((EXPR e1 e) = Typeerror) => Typeerror
     | ((EXPR e2 e) = Typeerror) => Typeerror
     | ((EXPR e3 e) = Typeerror) => Typeerror
     | (STA s e))

All the statement constructs have been given semantic definitions, all as straightforward as the two shown.
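To make the error-propagation pattern of the if and for clauses concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than the HOL implementation: the AST classes, the dictionary environment, the identifier-only `expr_type`, and the `ExprStmt` leaf that yields Welltyped are all hypothetical.

```python
from dataclasses import dataclass

TYPEERROR = "Typeerror"
WELLTYPED = "Welltyped"


@dataclass(frozen=True)
class Ident:
    name: str


@dataclass(frozen=True)
class ExprStmt:  # hypothetical leaf statement, not taken from the paper
    expr: object


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object


@dataclass(frozen=True)
class For:
    init: object
    test: object
    step: object
    body: object


def expr_type(expr, env):
    """Stand-in for ET that handles identifiers only (assumption)."""
    return env.get(expr.name, TYPEERROR)


def stmt_type(stmt, env):
    """Sketch of ST: the first Typeerror found is returned immediately."""
    if isinstance(stmt, ExprStmt):
        # Assumption: a well-typed leaf statement yields Welltyped.
        return TYPEERROR if expr_type(stmt.expr, env) == TYPEERROR else WELLTYPED
    if isinstance(stmt, If):
        if expr_type(stmt.cond, env) == TYPEERROR:
            return TYPEERROR
        if stmt_type(stmt.then, env) == TYPEERROR:
            return TYPEERROR
        return stmt_type(stmt.orelse, env)
    if isinstance(stmt, For):
        for e in (stmt.init, stmt.test, stmt.step):
            if expr_type(e, env) == TYPEERROR:
                return TYPEERROR
        return stmt_type(stmt.body, env)
    raise TypeError(f"unsupported statement: {stmt!r}")


env = {"i": "int", "n": "int"}
good = For(Ident("i"), Ident("n"), Ident("i"),
           If(Ident("n"), ExprStmt(Ident("i")), ExprStmt(Ident("n"))))
bad = For(Ident("i"), Ident("n"), Ident("i"),
          If(Ident("n"), ExprStmt(Ident("undeclared")), ExprStmt(Ident("n"))))
```

In this sketch `good` checks as Welltyped, while the undeclared identifier nested inside `bad` surfaces as Typeerror at the top level, mirroring how each clause passes errors upward.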

3.4 Semantic Clauses for Expressions

The semantic function that operates on expression constructs has the following type:

$$\mathit{ET} : \mathit{Expression} \to \mathit{TypeEnv} \to \mathit{TypeResult}$$

Inside HOL, this function is called EXPR. Some examples of semantic clauses for expressions are given below with their mapping in HOL. An identifier expression has the following semantic clause associated with it:

$$\mathit{ET}[\![I]\!]\ e = (e\ I)$$

and has the following form in HOL:

    (EXPR (Identifier_Expression i) e = (access i e))

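The conditional expression is handled by the following clause:

$$
\begin{aligned}
\mathit{ET}[\![\,E_1\ \texttt{?}\ E_2\ \texttt{:}\ E_3\,]\!]\ e ={}& \mathit{ET}[\![E_1]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ET}[\![E_2]\!]\ e = \mathit{Typeerror} \to \mathit{Typeerror}\ [\,] \\
& \mathit{ET}[\![E_2]\!]\ e = \mathit{ET}[\![E_3]\!]\ e \to \mathit{ET}[\![E_2]\!]\ e\ [\,] \\
& \mathit{Typeerror}
\end{aligned}
$$

This clause has the following form in HOL:

    (EXPR (Conditional_Expression e1 e2 e3) e =
       ((EXPR e1 e) = Typeerror) => Typeerror
     | ((EXPR e2 e) = Typeerror) => Typeerror
     | ((EXPR e2 e) = (EXPR e3 e)) => (EXPR e2 e)
     | Typeerror)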
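The two expression clauses can be read operationally as in the following Python sketch. It is only an illustration: the classes `Ident` and `Cond`, the string-valued type results, and the dictionary environment are assumptions made for the example, not the HOL representation.

```python
from dataclasses import dataclass

TYPEERROR = "Typeerror"


@dataclass(frozen=True)
class Ident:
    name: str


@dataclass(frozen=True)
class Cond:  # models E1 ? E2 : E3
    test: object
    if_true: object
    if_false: object


def expr_type(expr, env):
    """Sketch of ET for identifiers and conditional expressions."""
    if isinstance(expr, Ident):
        # Identifier clause: look the name up in the environment.
        return env.get(expr.name, TYPEERROR)
    if isinstance(expr, Cond):
        if expr_type(expr.test, env) == TYPEERROR:
            return TYPEERROR
        t2 = expr_type(expr.if_true, env)
        if t2 == TYPEERROR:
            return TYPEERROR
        # Both branches must have exactly the same type.
        return t2 if t2 == expr_type(expr.if_false, env) else TYPEERROR
    raise TypeError(f"unsupported expression: {expr!r}")


env = {"c": "int", "i": "int", "d": "double"}
same = expr_type(Cond(Ident("c"), Ident("i"), Ident("i")), env)
mixed = expr_type(Cond(Ident("c"), Ident("i"), Ident("d")), env)
```

Note that, read literally, the clause requires the two branch types to be equal, so the `int`/`double` case in `mixed` is rejected here even though a C compiler would apply an arithmetic conversion.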

3.5 Semantic Clauses for Declarations and Externals

The semantic functions that act upon declarations and externals are unlike their counterparts for statements and expressions in that they do not return a value of type TypeResult. Instead, they return a TypeEnv in which the new declaration or external is defined. The type of each of these functions is given below:

$$
\begin{aligned}
\mathit{DT} &: \mathit{Declaration} \to \mathit{TypeEnv} \to \mathit{TypeEnv} \\
\mathit{ExT} &: \mathit{External} \to \mathit{TypeEnv} \to \mathit{TypeEnv}
\end{aligned}
$$

The value that the parser returns and that the type checker operates on is of type External. Therefore the result of type checking a C program is simply the result of applying $\mathit{ExT}$ to the parse tree. If the result is EmptyEnv ($\bot$), the code has a type conflict within it. Otherwise, it can be considered to be well typed. One area of interest and of further research is the embedding of error messages within the HOL theorem prover to report the exact location of type errors.
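The environment-threading style of these functions can be sketched as follows. This Python fragment is illustrative only: the paper does not give the clauses for declarations, so the conflict rule used here (redeclaring an identifier with a different type yields the empty environment) is a hypothetical stand-in, as are the names `declare`, `check_program` and `is_well_typed`.

```python
EMPTY_ENV = None  # stands for EmptyEnv (bottom); real environments are dicts


def declare(name, type_name, env):
    """DT-like step: return an extended environment, or EMPTY_ENV on conflict."""
    if env is EMPTY_ENV:
        return EMPTY_ENV  # once a conflict is found, it persists
    if name in env and env[name] != type_name:
        return EMPTY_ENV  # hypothetical conflict rule, not from the paper
    extended = dict(env)
    extended[name] = type_name
    return extended


def check_program(declarations, env=None):
    """ExT-like driver: thread the environment through each declaration."""
    env = {} if env is None else env
    for name, type_name in declarations:
        env = declare(name, type_name, env)
    return env


def is_well_typed(env):
    return env is not EMPTY_ENV


ok = check_program([("x", "int"), ("y", "char")])
conflict = check_program([("x", "int"), ("x", "char"), ("y", "int")])
```

The final environment doubles as the verdict: any result other than the empty environment means every declaration was accepted, which is the check described in the text.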

4 Future Work

In this paper we discussed the denotational semantics of types for the C programming language, and an implementation of these semantics in the HOL system. Part of the HOL implementation is a type checker that evaluates whether the code is well-typed. The results of this type checker can be determined using the HOL rewrite rules.

We chose the C programming language because of its wide acceptance in the programming community, especially in the development of system control and application software. The C language is very powerful, and provides a large number of features that make it an interesting language to work with. Unfortunately, the C programming language is not a strongly typed language. Automatic type coercions (or casts) occur liberally in C, where all simple data types are either void, a floating point value or an integer. All pointers, enumerated types, characters and boolean expressions can be cast to integers. Formal specification of a program may require that all coercions be explicitly defined, or that coercions of certain variables or types be impermissible. A future focus of this research will be on permitting these additional restrictions to be easily evaluated within our type checker.

This research is a portion of the initial phases of a program verification research project. The short-term goals of this project are to develop the formal semantics of the C programming language, and to establish a set of theories in the HOL system to facilitate reasoning about programs. The project's longer-term goals involve the development of proof methodologies for C programs and other imperative programming languages.

References

[1] R.W. Floyd. Assigning meanings to programs. American Mathematical Society, pages 19–32, 1967.
[2] C.A.R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–583, 1969.
[3] P.D. Mosses. Denotational semantics. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 575–631. The MIT Press, Cambridge, 1990.
[4] G.D. Plotkin. A structural approach to operational semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, Denmark, 1981.
[5] D.S. Scott and C. Strachey. Towards a mathematical semantics for computer languages. In J. Fox, editor, Proceedings of the Symposium on Computers and Automata Systems, pages 19–46, New York, 1971. Polytechnic Institute of Brooklyn Press.
[6] J.E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. M.I.T. Press, 1977.
[7] C. Toshok, L. Sheneman, and J. Alves-Foss. The denotational semantics of a C type checker. University of Idaho Technical Report in Preparation, April 1993.
[8] C. Toshok, L. Sheneman, and J. Alves-Foss. Implementation of a C-parser in SML. University of Idaho Technical Report in Preparation, April 1993.
[9] C. Toshok, L. Sheneman, and J. Alves-Foss. Implementation of the denotational semantics of a C type checker in HOL. University of Idaho Technical Report in Preparation, April 1993.
