The Unix C shell interpreter language shall be used to explore an approach of .... quotient structure) and for extending a language (behaviour preserving ...
Modular, Behaviour Preserving Extensions of the Unix C-shell Interpreter Language Claus Pahl Department of Information Technology, Technical University of Denmark DK-2800 Lyngby, Denmark
Abstract The Unix C shell interpreter language shall be used to explore an approach of modular language descriptions and extensions. Based on a language core, language features are added stepwise on the core. Features can be described separated from each other in a self-contained, orthogonal way. Existing dependencies among the features can be represented by sequential ordering. The presented framework facilitates the stepwise development or extension of languages. Adapting basic constructs for an extended language is simplified by providing templates. These templates will also guarantee that properties of the basic language are preserved in the extension. Categories and Subject Descriptors: D.3.1 [Programming Languages] Formal Definitions and Theory (Semantics), F.3.2 [Logics and Meanings of Programs] Semantics of Programming Languages (Denotational Semantics), F.3.3 [Logics and Meanings of Programs] Study of Program Constructs Additional Key Words: Extension semantics, formal language engineering, language refinement, behaviour preservation, modular extensions
1 Introduction 1.1 Formal Language Engineering This is a case study in language extension. Starting from a simple language core which exhibits the main principles of the language, a number of features are added onto that core in a modular and property-preserving way. Language extension is presented as a refinement process similar to refinement approaches in software development. The property we want to preserve during the refinement process is behaviour, also called safety refinement elsewhere. Syntax (abstract and concrete) and semantics has to be dealt with in the given example. Languages and their extensions are presented in a denotational style. A key idea is to apply software engineering principles to language design: stepwise design, refinement approach, and separation of concerns. Then, a language can be formally developed and defined in the way it would be presented in a book — comprehensibility of language definitions is one of the objectives. The approach will allow us to define languages formally and modular. Our approach shall also enable one to derive adapted languages for particular purposes. The general purpose of this framework is to support the This work has been supported by the EU-HCM project De Stijl.
1
exploration of and experimentation with semantics. The development of a new or the description of an existing full-blown language is not our aim, but principles, features and their interaction can be studied with our approach. Danvy and Hatcliffe have used the notion of a programming language workbench [DH93]. The development of a whole language is an industrial process, but certain archetypical or prototypical aspects can be extracted in prior to be analysed formally in depth with appropriate tools. The presented approach to language extension is part of a larger framework of formal language engineering including other aspects of integrating, combining or restricting languages. This framework of formal language engineering is the aim of the EU-HCM project D E S TIJL [DeStijl]. The refinement approach to language definitions allows a modular development and presentation of languages. A language core and language features can be specified separately. Features will be specified as parameterised operators, thus allowing to be applied to a number of basic languages. Features are specified as independent constructs only referring by a formal parameter interface to the languages which should be extended by them. The preservation of behaviour is the central issue in extensions of languages. It guarantees that programs written in the basic language can be reused in the extended language. These programs are not only executable, they are also executed similarly to the execution in the base language. This similarity is constrained by an observation criterion. Applying this observation on the extended program, the original behaviour should be observable. We will provide a number of common templates for extensions which guarantee behaviour preservation. Behaviour preservation is in particular important since it guarantees orthogonality of the new feature with the basic language. A new feature can be added to a language such that there are no conflicts and, thus, the semantics of the basic constructs will remain. Extending by simply adding elements is easy and should not cause difficulties, but consider an extension in the sense of a refinement or an improvement where a certain notion has to be redefined. This might have side-effects on other constructs using the redefined one. We will concentrate in particular on this issue. In the remainder of this section, the C shell is briefly introduced and two core languages, one being the extension of the other, are presented. Even the extended core is restricted considering the extend of the full interpreter language, but in order to illustrate some of our ideas we have reduced and simplified. The second chapter contains some mathematical background necessary for the remainder. Section 3 contains an illustration of principles of extension in our approach. A more formal treatment of extension can be found in the subsequent section 4. A simple example applying these investigations is supplied subsequently in section 5. We continue with some more theoretical aspects in section 6. Section 7 then add other features onto the minimal core obtaining the extended core. More constructs are added in section 8. Each of the extension steps will be used to illustrate some particular aspects of extension.
1.2 The C Shell The Unix C shell is a shell command interpreter with a C-like syntax [Uni95]. The interactive shell reads commands from the terminal. Commands are distinguished into simple (echo, setenv, ..) and composed ones (pipe, sequence). Some features explained in the csh-manual [Uni95] are: history substitution, aliases, I/O-redirection, variable substitutions, command/filename substitutions, expressions and operators, control flow, command execution 2
Substitutions and aliases are introduced as convenient features. These features distinguish the C shell from more primitive ones. In general, the input is parsed into words, i.e. expressions on strings (see p. 1 in [Uni95]). We will use these ideas to develop a csh-interpreter language CSHL starting with a minimal core, extending this to a more complete core and then add those convenient features. Note that our language will only be a csh-like language. A definitive standard does not exist, all implementations might vary in details. We will make some more obvious deviations from existing implementations of the shell. get and put are not genuine commands. get is used here instead of cat fn, put instead of cat > fn. Those two commands will realize read and write operations on the file system. Pipes and sequences are expressed in the csh-language by the symbols j and ; (and jj and ; ; for the conditional variants depending on the success of the first argument). We will use pipe, seq and cond.
1.3 The Core Languages Two core languages for a csh-like language shall be presented: the first, called minimal core, describes the language in its principle ideas of processing commands on input- and output-streams and the second, called extended core, includes basic elements such as the underlying file system and an environment maintaining variables.
1.3.1 Minimal Core The minimal core shall answer the following questions:
what is processed?
by which medium is it processed?
how is it processed?
Streams are processed, possibly controlled by additional parameters. A basic command echo and two composed ones pipe; seq shall be introduced. The minimal core is presented in figure 1. Characters Chr are atomic tokens. A string expression StrExpr is a list of characters.
1.3.2 Extended Core To obtain a more complete description of a still elementary machine but including abstractions of components which form an operating system such as a file system or a variable environment which allows to configure system parameters, we present an extended core. This extended core (see figure 2) is supposed to be extended by features, which are called convenient in the manual, i.e. features which are not essential. Variables and characters in the extended core are atomic tokens. Variable is syntactic and semantic domain. The extended core will be developed from the minimal in the subsequent sections. 3
LANGUAGE DESCRIPTION syntax Cmd ::= ’echo’ StrExpr j ’pipe’ Cmd Cmd j ’seq’ Cmd Cmd j StrExpr ::= Chr semantic algebras
domain String domain Stream = String semantic functions S : StrExpr ! String S[ se] := if se = then else hd se ^ S[ tl se] C : Cmd ! Stream ! Stream C[ echo(s)]]i := S[ s] C[ pipe(c1; c2)]]i := let o1 = C[ c1] i in let o2 = C[ c2] o1 in o2 C[ seq(c1; c2)]]i := let o1 = C[ c1] i in let o2 = C[ c2] in o1 ^ o2 END
Figure 1: Minimal core
2 Mathematical Background Mathematical notions for our notations for describing a language (semantic algebra, congruence relation, quotient structure) and for extending a language (behaviour preserving morphism) and some properties are presented. The mathematical framework we are using is Universal Algebra [Coh65]. We are aiming at a denotational style. The basic idea is to structure semantics of a language into semantic algebras. This approach was already presented by Schmidt [Sch86]. Structured extensions of the language will be carried out based on this semantical structuring, rather than the classical viewpoint of obtaining structure by the syntactical structure. Definition 2.1 A semantic algebra is an algebra consisting of
(possibly compound) domains D1 ; : : :; Dn functions on these domains
Let Alg be the collection of all semantic algebras. For the extensions we will apply later on, we might identify a domain of interest S 2 fD1; : : :; Dng. This distinguished element is not part of the definition of a semantic algebra. It serves to denote that part of the basic language semantics that shall be redefined (note that redefinition is not always aimed at). Based on the 4
LANGUAGE DESCRIPTION syntax Cmd ::= ’echo’ StrExpr j ’setenv’ Variable StrExprj ’get’ StrExpr j ’put’ StrExpr j ’pipe’ Cmd Cmd j ’seq’ Cmd Cmd j ’cond’ Cmd Cmd StrExpr ::= (VariablejChr) semantic algebras domains String, Name, File, Exit, Variable Env = Variable ! String FileSys = Name ! File semantic functions
V : V ariable ! Env ! String V[ v] e := e(v) S : StrExpr ! Env ! String S[ se] e := if se = then else let s = case hd se in var(v) ) if v 2 dom(e) then V[ v] e else chr(c) ) < c > in s ^ S[ tl se] (e) C : Cmd ! (Env FileSys Stream) ! (Env FileSys Stream Exit) C[ echo(s)]](e; f; i) := (e; f; S[ s] e; true) C[ setenv(v; s)]](e; f; i) := (e y [v 7! S[ s] e]; f; ; true) C[ get(s)]](e; f; i) := if S[ s] e 2 dom(f ) then (e; f; f (n); true) else (e; f; ; false) C[ put(s)]](e; f; i) := (e; f y [S[ s] e 7! i]; ; true) C[ pipe(c1; c2)]](e; f; i) := let (e1; f1; o1; s1) = C[ c1] (e; f; i) in let (e2 ; f2; o2; s2) = C[ c2] (e; f1; o1) in (e2; f2; o2; s1 ^ s2) C[ seq(c1; c2)]](e; f; i) := let (e1; f1; o1; s1) = C[ c1] (e; f; i) in let (e2 ; f2; o2; s2) = C[ c2] (e1; f1; ) in (e2; f2; o1 ^ o2; s2) C[ cond(c1; c2)]](e; f; i) := let (e1; f1; o1; s1) = C[ c1] (e; f; i) in let (e2 ; f2; o2; s2) = C[ c2] (e1; f1; ) in
END
if s1
= true then (e2; f2; o1 ^ o2; s2) else (e1; f1; o1; s1)
Figure 2: Extended core
5
(possibly compound) domain S , a domain extension will be carried out. Also, we will distinguish function types1: 1) f : S ! X , 2) f : X ! S , 3) f : S ! S , and 4) f : X ! Y where X and Y are composed of the Di and S using domain constructors such as ; !; +; P . With each semantic algebra A we have associated a signature sig (A) = (S; Di(i = 1::n); fj (j = 1::m)) using the symbols for semantic entities as syntactic elements. Definition 2.2 Let A and B be semantic algebras with signature . B is subalgebra of A, if Bd Ad ; B 6= ; for each semantic domain d in and b 2 Bd ) opA(b) = opB (b) 2 B for each op with domain Bd. (opB is the restriction of opA to B). Equivalence relations will be used to define the property that certain elements are similar with respect to some observation. In particular, we will express behaviour preservation requirements by equivalence relations. A quotient algebra is obtained from applying the equivalence to a given algebra. Definition 2.3 A congruence relation on a set S is an equivalence relation (reflexive, symmetric, transitive) which satisfies the substitution property with respect to functions f on that set (let a; b; 2 S ):
ab ) f (a)f (b) Definition 2.4 Let A be an algebra and a congruence relation on the sets Ad in A. The quotient algebra A= is defined by
Ad= := f[a] j a 2 Ad g f ([a]) := [f (a)] A language is extended by refining the semantic algebras. Ingredients of a semantic algebra refinement are in general:
sets S and S 0 (the basic and the extended semantic domain of interest, respectively), a equivalence on S 0 describing the relevant or observable behaviour, an indexing function r
: S ! S 0= associating elements of domains.
The mapping associates domain and function names. The extension S 0 do not need to be constructed from S , i.e. the basic semantics do not need to remain directly. They are connected by a notion of observation or retrieval. Note that we are working with many-sorted algebras; is then a signature morphism. r : a 7! [b] with a 2 S and [b] 2 S 0= for b 2 S 0 is called a reinterpretation mapping. For associated semantic domains, each basic element a is associated with an equivalence class in the associated domain. The equivalence relation is uniquely identified by elements from A. These ingredients will be later on combined into templates of extension. The mapping r can be extended to a morphism from A to B= if A and B are algebras containing S and S 0, respectively, and is a congruence. The algebra for the extended language is B (not B=). The quotient structure is only used to capture the notion of behaviour preservation. A special morphism on algebras shall be introduced which formalises the idea of behaviour preservation. 1 The
fourth case of the function types is often omitted in further investigations since S does not appear on the outer level.
6
Definition 2.5 Let A and B be semantic algebras. Let B= be a quotient algebra with respect to a congruence relation on B . Let : A ! B be an association between semantic domains and operations, shall be a signature morphism from sig (A) to sig (B ). A morphism : A ! B is called behaviour preserving, if
is a domain-sorted set of mappings fd : Ad ! B(d) j Ad is a domain in Ag preserves the operation behaviour with respect to an equivalence , e.g. for op : d ! d0 and x 2 d it holds
d0 (op(x)) (op)(d(x)) The criterion for behaviour preservation can be graphically presented as:
0 (d) op - (d0)
6
6
d
op - d0
Note, that behaviour preservation requires only an equivalence of results on the extended level. As we will later see, this construction will allow us to define orthogonal language extensions. If two behaviour preserving morphisms and are composed ( ), then the definition of is relevant for . is based on the domain extension and . Proposition 2.1 If and are morphisms, then there exists a behaviour preserving morphism : such that the following diagram commutes:
A
A!B
-B
@@ @@ R@ ?
B=
?1 can always be represented by a composition ?1 where is onto and ?1 is 1-1. can be constructed by (s) := s0 if (s) = [s0 ] with s0 2 S 0 such that 0 (s ) = [s0 ] . is a morphism since is a congruence and a morphism to B=. Proof:
= since
(f (s)) (f )((s)) ) (f )((s)) = (f )([s0] ) = [ (f )(s0)] = [ (f )((s))] Therefore, preserves the operations as well as . See Corollary II.3.8 [Coh65] for the existence of ?1 . 7
ut
The mapping on function symbols : OP ! OP 0 associates functions for sets of functions OP and OP 0 . has to depend on if the behaviour preservation criterion is to be satisfied. Functions f 0 2 OP 0 in the extension have to be defined regarding the definition of in this case. Appropriate templates will be presented later on. Usually associated with a set of operations is a set of operation combinators (higher order operators), e.g. the conditional if [ b] : OP ! OP or the sequence ; : OP OP ! OP are command combinators. The definition of semantic algebras and behaviour preservation given above does not involve these combinators. A sensible requirement would be the following. Let f1 ; f2 be composable functions in algebra A and f10 ; f20 be composable functions in the extension B . If f10 preserves the behaviour of f1 and f20 preserves the behaviour of f2 , then f10 0 f20 should preserve the behaviour of f1 f2 (or vice versa) with higher order functions in the basic language and 0 in the extension. The definition of OP can be extended in a way such that the case of composition is covered. We will argue later on that the separated treatment of composition operators will lead to an improvement of the presented extension techniques.
3 Extending the Languages We will extend the minimal core (section 1.3.1) to the extended core (section 1.3.2) in a number of steps and we will also add some features to the extended core. Modularity is a key objective, thus complete features are added in each step. As in similar refinement approaches the preservation of properties is essential. Behaviour (of programs) is the property we are interested in when languages are extended. Some of the principal ideas shall be presented using the well-known assignment statement from an imperative language as a very primitive program whose behaviour has to be preserved in an extension of the notion of state.
C[ x := e] := s:update[ x] (E[ e] s)s x := e is a phrase of the syntactic domain Cmd. The semantic function C maps from Cmd to Store ! Store where Store is a semantic domain. A store s 2 Store is updated when the assignment command is executed. Let us now consider an extension of the language containing the assignment. States are stores which were extended somehow. Thus, the semantic function for the assignment has to be adapted. This expressed using a mapping .
: Store 7! State; : CA[ x := e] 7! CB[ x := e] is an association of domains and functions of a semantic algebra. A State-based algebra preserves the behaviour of the assignment on Store (defined by the mathematical model of assignment A [ x := e] ), if the following diagram commutes with respect to some equivalence:
C
State CB [ x := e]- State
6
6
Store
Store
Store CA[ x := e]- Store 8
As an example, an injection shall be used to extend the domain Store to State where a state consists of a store, an input buffer Input and an output buffer Output. INJECT
Store
into
State = Store Input Output
Stores s are refined expressed by a mapping r:
r : s 7! [(s; i; o)]
[(s; i; o)] := f(a; b; c) j a = sg Store Input Output. We have used an equivalence on State (which shall be called ) to define relevant behaviour. Extended functions are only expected to behave where
correctly with respect to these classes, i.e.
Store (CA[ x := e] (s)) CB[ x := e] (Store (s)) formally explains the commuting diagram. The mappings between the elements can be illustrated as follows:
s
p - (s; i; o)
@@ @r@ c R@ ?
[(s; i; o)]
c does exist always, p can be constructed from r and c. These set-based mappings r; c; p can be extended to morphisms ; ; on algebras by using the constructs congruence and quotient algebra State=: Store - State
@@ @@ R@ ?
State=
is a behaviour preserving morphism and is a canonical morphism onto the quotient algebra. The formal background was provided in the previous section. We can say that the quotient structure is an abstract interpretation abstracting from specific implementation details of the extension. The quotient structure focuses on those properties which express the behaviour preservation. The abstract interpretation is a homomorphic image of the extended algebra. 4 A Deeper Look at Extensions A notation for defining languages has been implicitly and informally introduced by presenting the two core languages. This section contains notation and background for extensions of languages. Before we dive into 9
technical details, let us sketch the outline of our extension approach. A language description consists of syntax, semantic algebras and semantic functions. We see semantics as the main aspect on which a language definition should be based and on which also a language extension should be based. Thus, we will provide two major constructs for language definition: semantic algebras and language descriptions, and also two constructs for extension: semantics extensions on semantic algebras and language extensions on language descriptions. Both extension constructs will allow us to derive operators. They shall be presented subsequently.
lifted base language language extension (new and redefined elements)
base language
semantics extension (based on domain extension, equivalence)
A language extension is the construct which serves to specify a new language feature in its semantics, but also in its syntactical interface. A language extension is a self-contained specification of a language feature. It refers only indirectly to a basic language. A basic semantic domain of interest S might be identified which is to be extended by some domain construction (S ). Before any new elements are added or existing elements are redefined, the semantics extension can be applied. Based on the domain construction and an equivalence relation specifying how behaviour has to be preserved, the semantics extension lifts the semantics of the basic language such that the domain constructions are adapted and behaviour is preserved. A semantics extension works in an untyped framework, i.e. it can be applied to any semantic algebra as long as a domain of interest can be identified. The semantics extension is a construct which does not depend on or refers directly to the basic language semantics. Extension templates will be introduced to facilitate the definition of such semantics extensions. The language extensions are based on a semantics extension which would be executed first if applied to a basic language. Then additions and redefinitions on the lifted base language can be carried out. We will summarise these ideas at the end of the example in section 5.3.
4.1 Semantics Extensions A language description, based semantically on semantic algebras, can be extended to another language defined by other semantic algebras. Behaviour might be one of the properties to be preserved. This involves defining the extending morphism, describing the relevant behaviour and proving the proof obligation that the defined morphism is behaviour preserving with respect to the relevant (or observable) behaviour expressed by the equivalence. This procedure shall be facilitated for the language designer by providing operators working on semantic algebras which adapt the basic semantic algebra to some extended domain construction. These operators will take some basic information about the intended extension as argument. They will allow the construction of an extension morphism from this information. We will provide operators which automatically discharge 10
the proof obligation of behaviour preservation. Experience in language extension shows that such common operators exist. A semantics extension is a construct to extend semantic algebras. The construct has to satisfy some properties in order to allow to construct a behaviour preserving morphism. We will present general properties in this section, concrete ones are introduced together with particular extension problems in the subsequent sections. For each operator it will be shown that the required properties are satisfied. The general form allows us to establish a library of semantics extensions which can be extended for particular purposes by a language designer. The general form of such operators facilitating extension shall be investigated now. Let : A 7! B denote a morphism on algebras where nothing is said about the algebras A and B . Let S and T be semantic domains of interest of A and B , respectively. Note that each domain identifier S or T might be defined by a domain equation. A semantics extension shall allow us to provide a morphism on algebras which lifts a basic algebra according to a domain extension, e.g. from Stream to FileSys Stream including a lifting of existing functions. New functionality, e.g. operations on FileSys, is not added by this construct. Definition 4.1 A semantics extension on semantic algebras A and B is a 4-tuple (; ; ; d) where
: sig(A) 7! sig(B) is a signature morphism on semantic algebras, is a binary relation on (S ) if S is the domain of interest of the source algebra, : Alg ! Alg= is a mapping on algebras, d is a collection of default elements s0[s0 ] for each equivalence class [s0] in (S ).
For domains of interest S , there might already be a behavioural equivalence S defined from previous extensions which has to substitute the equality =S on S . For all domains D not equal to S , D is assumed to be the equality (i.e. a congruence) and D is assumed to be the identity mapping. Note, that a semantics extension can be applied to an empty algebra. can be the empty mapping, as well. can also be defined as identity with the singleton class elements as defaults. Semantics extensions are operators to preserve and redefine semantics from the basic language. If their argument is empty, then there is nothing to preserve or redefine. The domain of interest, if chosen, denotes the part to be redefined. Language extensions provide the necessary means to define an extended language on an empty base language. A semantics extension does not in general allow us to derive a behaviour preserving morphism. The following proof obligations arise, if a behaviour preserving morphism has to be derived from a semantics extension: 1. 2. 3.
is an equivalence relation on S (S ) = T . The substitution property holds for , i.e. is a congruence and B= is a quotient algebra. Behaviour preservation is guaranteed by a derived .
Definition 4.2 Let A be a semantic algebra with a domain of interest S and be a signature morphism with domain sig (A). Let be an equivalence on a domain (S ). A behaviour preserving extension is every algebra B which extends A syntactically by such that is a congruence on B and functions (f ) preserve the behaviour of functions f . 11
4.2 Extension Templates In this subsection, some properties shall be investigated allowing us to derive behaviour preserving semantics extensions from a reduced amount of information in some particular situations. It will be proven for each particular case, that using a canonical construction for and for the functions fiB , a mapping can be derived such that behaviour preservation is guaranteed. In general, all elements of a semantics extension are necessary to define an extension on semantic algebras. Let S 7! (S ) the domain extension. An equivalence for (S ) can not be automatically derived from S . Only if there is such a possibility, the mapping can also be derived. This is in particular possible in the following examples. Example 4.1 Common templates based on a particular domain extension are the following (we will give
; ; d for each on domains, on functions will be considered later on): 1. 2. 3. 4.
: S 7! S T : (s; t)(s0 ; t0) iff s = s02 ; : s 7! [(s; t)] ; (s; t0) is the default element for [(s; t0)] : S 7! S + T : xx0 iff x = x0 ; : s 7! [s] ; s is the default element for [s] : S 7! (I ! S ) : tt0 iff 8i 2 I:t(i) = t0 (i) ; : s 7! t with t(i) = s for all i 2 I ; f0 with f0(i) = s is the default element for [f0] : S 7! P(S ) : pp0 iff p = p0 ; : s 7! [fsg] ; fsg is the default element for [fsg]
In the following we will provide three lemmata which generalise some of the ideas just presented. Conditions for obtaining a template from a domain construction will be elaborated. Lemma 4.1 Let s; s0 2 S be elements of a domain of interest and e[x] is an expression of type (S ) where is a signature morphism and x is a variable of type S . Assume e[x=s] e[x=s0 ] iff s = s0 . Then, is an equivalence on (S ). Proof: e depends only on the argument, thus an equivalence is formed. is an equivalence relation since = is one. ut Extended functions (f ) have not been considered so far. Functions (fi) in B can be derived from fi in A which satisfy the substitution property. This shall be presented now. The substitution property xy ) f (x)f (y) shall be investigated for the different kinds of function signatures.
: S ! X : requires f1B (x) = f1B (y) if xy for f1B : (S ) ! X f2A : X ! S : requires f2B (x)f2B (y) if x = y for f2B : X ! (S ) f3A : S ! S : requires f3B (x)f3B (y) if xy for f3B : (S ) ! (S ) f4A : X ! Y : requires f4B (x) = f4B (y) if x = y for f4B : X ! Y
1. f1A 2. 3. 4. 2 The
symbol
= might denote behavioural equivalence on S , see above. 12
Lemma 4.2 The following function definitions fiB in algebra B satisfy the substitution property ab ) fiB (a)fiB (b) for an extension of function fiA of a semantic algebra A, an equivalence and a morphism . 1. f1B ((s)) := f1A (s)
2. f2B (x) := (f2A (x))
3. f3B ((s)) := (f3A (s)) We will refer to this construction by using the mapping fiA on.
7 c fiB . The fourth case will be investigated later !
Proof: Consider the types: 1. 2. 3.
(s1)(s2) ) s1 = s2 ) f1A (s1) = f1A (s2) ) f1B ((s1)) = f1B ((s2)) (x1) = (x2) ) x1 = x2 ) f2A (x1) = f2A (x2) ) (f2A(x1))(f2A (x2)) ) f2B (x1 )f2B (x2) (s1)(s2) ) s1 = s2 ) f3A(s1 ) = f3A (s2) ) (f3A((s1 )))(f3A ((s1))) ) f3B ((s1)) f3B ((s2)) ut
The construction for f4B (x), based on f4A (x), follows from the previous three constructions and a canonical construction if S in involved in one of the domains X and Y , or is the identity otherwise. This shall be explained by a short example. Let f : X ! (S R) with X not involving S . Then we define f 0 : X ! (S 0 R) by f 0(x) := ((1(f (x))); 2(f (x))) if S is extended to S 0. More complicated definitions can always be derived in a compositional way. The mapping and the functions fiB (in particular f1B and f3B ) might be partial. Together they form a subalgebra of the full algebra in which they are mapped. Corollary 4.1 Let A = (S; Di(i = 1::n); fj (j = 1::m)) be a base algebra and E = (S 7! T; ; ; d) a semantics extension with T (S ). The algebra defined by (T; Di(i = 1::n); fj0 (j = 1::m) where and fj0 are constructed according to E , is a subalgebra of ( (S ); Di(i = 1::n); fj0 (j = 1::m)). Proof: T = construction.
fs0 j s0 = (s) for all s 2 S g (S ). Functions fj0 are closed on that subset due to their ut
Lemma 4.3 If is a congruence on (S ) and is an injective mapping and d is the collection of default values, then a behaviour preserving mapping : Alg ! Alg and a semantics extension E = (; ; ; d) can be constructed. Proof: Use the defaults d from the equivalence classes to form with : s 7! s0[s0 ] where Construction is guaranteed by Proposition 2.1.
: s 7! [s0]. ut
The ideas formulated in the three lemmata shall now be summarised in the definition of an extension template summarising that a semantics extension can be derived from a reduced set of information. 13
Definition 4.3 An extension template (S 7! T; ; ; d) consists of a domain maplet S alence on T such that Lemma 4.3 is satisfied, a mapping : S ! T= and defaults d .
7! T , an equiv-
The four examples 4.1 are extension templates. These (and others) will be used in the remainder of this paper to formulate behaviour preserving extensions. Note, that , , and d are derived from S 7! T .
Remark 4.1 The templates we have mentioned so far are based on domain extensions S 7! T where T is constructed on S . In principle, we do not need this occurrence of S in the domain expression T . Instead of associating Store with Store Input Output (as in section 3) we could also modify stores as well and use an association with Store0 Input Output. This allows a higher flexibility in defining behaviour preservation.
4.3 Language Extensions Different notations are used in our approach:
a description notation for describing languages, which is essentially denotational semantics,
a feature notation which is an adaption and modification of the description notation for the purpose of describing features as extensions on some basic language,
an extension notation which is used to describe the application of features as operators to basic languages.
The feature notation will be explained now. In section 4.1 we have introduced semantics extensions on semantic algebras. Now, we will introduce operators on whole language descriptions, called language extensions, including syntax, semantic algebras and semantic functions. Figure 3 contains an example. Another concrete, but simpler example of a specification of a language extension can be found in section 5.2 in figure 5. There, Core is a formal parameter denoting a semantic algebra. In the given example, the actual parameter algebra should contain a semantic function C on a domain Stream. We use a style of implicitly specifying parameter requirements. All identifiers (including Exit, Cmd1 or Cmd2) can be seen as variables which are instantiated when the operator is applied to a concrete semantic algebra. The following diagram shows the main constituent parts and their dependencies. 14
LANGUAGE EXTENSION A BSTRACTION := Extension syntax semantic algebras ( for : StrExpr ! String do UNION StrExpr = Chr into StrExpr = (V ariable + Chr) ; for : StrExpr ! String do INDEX String by Env ! String ) ; for : Cmd ! FileSys Stream ! FileSys Stream Exit do MAP INJECT Env into FileSys Stream ! FileSys Stream Exit composition use STANDARD SEQUENCING for argument [ seq ] ; [ cond] ; use STANDARD DISTRIBUTION for argument [ pipe] ; use STANDARD LAST RESULT for result [ pipe] ; [ seq ] ; [ cond] New Feature syntax StrExpr ::= (Chr j Variable) Cmd ::= ’setenv’ Variable StrExpr semantic algebras domain Variable; domain Env = Variable ! String semantic functions
S S C
C
C C
C
C
V : V ariable ! Env ! String V[ v] e := e(v) ; S[ se] e := if se = then else let s = case hd se in var(v) ) if v 2 dom(e) then V[ v] e else chr(c) ) < c > in s ^ S[ tl se] (e) ; C[ setenv(v; s)]](e; f; i) := (e y [v 7! S[ s] e]; f; ; true)
END Figure 3: Extension by environments
15
C
syntax
semantic algebra
Syntax
composition
Semantics
Semantic Functions
Two orthogonal operators will be provided for the notation: ’,’ and ’;’. ’,’ is a separator for denoting independent specifications and its semantics is union. ’;’ is a sequencing operator and its semantics is override. The override semantics allows elements to be redefined. We have not used add and redefine operations explicitly since we want to emphasise the self-containedness of language feature descriptions. Even the application of templates (or semantics extensions) can be seen as type definitions for the feature definition (similar to defining a compound type by a type expression on its components). A language extension is divided into two parts. The first part deals with a potential base language and their adaptation to the new requirements. The second part contains the definition of the new feature. The first part is divided into three subsections:
Syntax: A renaming operator for syntax identifiers might be applied. Semantic algebra: Semantic algebras of a basic language can be lifted to the extension level. Normally, templates are applied here. Templates can be used in a constrained form for
S : Expr ! String do INDEX
String by Env ! String
and in an unconstrained form: INJECT
Env into FileSys Stream
Composition: Specific extension templates can be applied for higher-order operators.
The second part is as well structured into three, but slightly different parts:
Syntax: Syntactical constructions for the new feature can be specified. When the extension is applied to a basic language, these elements will be added. 16
Semantics: Semantic algebras can be provided. A simple semantic domain is considered as a semantic algebra without functions. Note, that domains can always be named, i.e. specified by domain equations. Semantic functions: New semantic functions can be defined, existing ones can be redefined. Semantic functions can be e.g. structured by syntactic domains.
There is a well-formedness condition when such a language extension is applied to a language description: 1. Syntax: added and redefined rules should not exist, the left-hand side of a redefined rule should exist before. 2. Semantic algebras: newly introduced ones (or domains) should not exist before. 3. Semantic functions: should be well-formed, e.g. a syntactic argument is well-formed. 4. Templates: template applications should be syntactically well-formed. These criteria are syntactical. The syntactic conditions (membership, well-formedness) can be checked automatically. They are either naming problems or well-formedness conditions. The explicit application of the language extension to a concrete language description will normally be omitted in the remainder. We will assume a one-one matching between names of actual and formal parameters. A general aim of our approach is to allow language features to be specified as entities of their own which can be reused in different contexts3. The explicit formal interface can be derived when required (for application). The operator needed to map a basic language to an extension is only derived at the time of application. Some of the identifiers will be associated with identifiers in the basic language, thus forming formal parameters of the operator. The question what a formal parameter is in the specification of a language feature does not need to be decided until the time of application. One of the objectives of our approach is a modular presentation of semantics. Separating core and extensions is one way of achieving this. We have here increased the degree of modularity by introducing a parameterisation concept. This allows the definition of language features independently of concrete core languages. David Schmidt uses the notion of orthogonality of language features to indicate the aim of having standardised language parts that can be assembled to a larger language in a predictable way. The design of language features should result in orthogonal, self-contained units. We expect language features always to be added on top of basic languages. Their semantics is thus always that of an operator working on language descriptions. To support the aim of self-containedness we have tried to provide a notation which does not refer too explicitly to a possible basic language (e.g. by an explicit interface with formal parameters). In general, there will be elements which are expected to be imported (e.g. a notion of values) and also elements which might have to be shared with other features (e.g. identifier). The awareness of these issues shall be considered as part of the methodology of defining features. 3 We have chosen very concrete names directly referring to the core or more basic languages in the examples, since the aim here in this paper is the structured development and representation of one language. Reusing the feature definitions is possible, but not investigated here.
17
5 A Simple Example A simple example of an extension shall be looked at. Exit values shall be added indicating whether a command was executed successfully or not. We will use boolean values instead of numbers as in the original C shell. We will begin with the definition of the INJECT template (already being used for introducing the notion of behaviour preservation) which will be used as the main construct of extension in the subsequent application. In this example the minimal core (figure 1) will be extended by exit values.
5.1 Injection An algebra A shall be extended to B by using the technique of injection on the semantic domains:
S 7! S T Thus, we have the following functions to consider4:
: S ! X and f1B : S T ! X f2A : X ! S and f2B : X ! S T f3A : S ! S and f3B : S T ! S T
1. f1A 2. 3.
Let us define the equivalence by (S T ) (S T ):
(s; t)(s0; t0)
iff
s = s0
is an equivalence relation on S (S ) = S T according to Lemma 4.1. Let S : S ! (S T )= be B a mapping defined by S (s) := [(s; t0)] (for some default element t0 2 T ). The functions fi shall be B A B A B A defined as follows: f1 (s; t) := f1 (s), f2 (x) := (f2 (x); t0), and f3 (y ) := (f3 (s); t0) for y = (s). Note, that rng () = (S T )=, i.e. is onto. The substitution property has been shown in Lemma 4.2. The substitution property holds for with respect to the functions fiB , i.e. is a congruence and B= is a quotient algebra. Behaviour preservation is guaranteed due to Lemma 4.3. The injection template INJECT summarising these definitions is presented in figure 4.
5.2 Applying Injection in a Language Extension Applying the template INJECT as part of a language extension is presented in figure 5. A new domain Exit is injected by INJECT into the result domain of commands, we have numbered the syntactical occurrences of Stream – without any semantical relevance). As explained above in section 4.3, identifiers in the operator are only formal parameters which have to be substituted by actual ones when the language extension is applied to a language description. For the sake of simplicity, we will omit the explicit application of the extension operator and assume an application with a one-one correspondence between formal and actual parameters. The application would be well-formed. 4 Function type
f4 neglected. 18
EXTENSION TEMPLATE INJECT S into S T
:= (
)
: S 7! S T; : fiA 7!c fiB (s; t)(s0; t0) iff s = s0 S (s) := [(s; t0)] (s; t0) for each [(s; t0)] Figure 4: Extension template INJECT
LANGUAGE EXTENSION E XIT VALUES := Extension syntax semantic algebras for : Cmd ! Stream1 ! Stream2 do INJECT Exit into Stream2 Exit with t0 := true composition New Feature syntax Cmd ::= ’cond’ Cmd Cmd semantic algebras domain Exit semantic functions [ cond(c1; c2)]](i) := let (o1; s1) = [ c1] (i) in let (o2; s2) = [ c2] () in if s1 = true then (o1 ^ o2; s2 ) else (o1; s1) END
C
C
C
C
Figure 5: Extension by exit values
19
The INJECT template can be used for all simple commands. It has been mentioned earlier that composed commands are not captured by the simple definition. Some remarks on that issue will be made in the next section 6. The conditional command cond is newly introduced, thus, there is no proof obligation. The other proof obligations concerning existing commands are discharged by using the template which guarantees behaviour preservation. For instance, the resulting definition for echo after applying the template would be:
C[ echo(s)]](i) := (S[ s] e; true) The behaviour is obviously preserved (cf. figure 1).
5.3 Semantics Extension and Language Extension The feature specified here by a language extension is an exit value concept with one operator cond. This operator is an operation combinator whose final result depends on the exit value of its first argument. A new domain Exit is introduced. Using the INJECT template (which allows to derive a semantics extension), the semantics of a basic language to which the language extension might be applied is lifted according to the pattern Stream2 7! Stream2 Exit such that behaviour is preserved. The identifier Stream2 is a formal parameter when applied to a basic language, thus it might be matched with a domain State in a basic language. The semantics extension derived from the templates adapts any argument semantics to the extended domain construction specified by the language designer. The base language is now available in the extended language. Their extended definition is consistent and behaviour preserving. On top of this lifted base language, the language designer can specify a command cond with syntax and semantics. If such a command already exists in the base language, it is overridden, otherwise it is a new definition. We have provide two distinct extension constructs, the one, language extension, is dedicated to the full specification of the properties of the new feature, the other, semantics extension, is dedicated to the behaviour preserving lifting of the basic language to some extended domain construction necessary for the new feature. The language designer shall be freed from adapting the definitions of the basic language and should instead be allowed to focus on the specification of the new feature.
6 Syntactical Substitution, Operation Combinators, and Recursive Domains Some more conceptual issues shall be looked at in this section. The first one concerns the adaption of function calls whose types have been modified. Then, we address a particular, but very important kind of functions: higher-order functions which in our approach will appear in the form of operation combinators. At the end of this section, recursive domain equations are briefly mentioned.
6.1 Syntactical Substitution An example of an operation definition containing calls of semantic functions is the semantic function for expressions in a stateless setting:
E[ e1 + e2] := E[ e1] + E[ e2] 20
In an extension by state we would expect the semantic definition to be parameterised by a state variable s
State.
:
E0[ e1 + e2] := s:E0[ e1] s + E0[ e2] s Each occurrence of a modified function in the body – the call extension:
E[ e] – has to be substituted by a call of the
E[ e] 7! E0[ e] s This applies to every function (not only the semantic functions). The form of the substitution depends on the domain extension, here Expr ! V al is extended to Expr ! Store ! V al by indexing V al with Store. Let in the following be the function which substitutes syntactically uses of other (semantic) functions which have changed their types. Either more parameters (or constructions on parameters) have to be added or a modified result has to be transformed by certain observations to the desired form. In general:
X ! Y to X 0 ! Y : provide more or other parameters X ! Y to X ! Y 0 : adapt result Abstracted, the example presented above would look like f A : S ! V 7! f B : S ! I ! V . Let f A = s:F (s) with a recursive call of f A in F (s). We define f B = s:i:(F (s)[f A(s)=f B (s)(i)]). A canonical construction of based on the domain extension which is applied is possible. The canonical funcc tion extension fiA 7! fiB has to be adapted by this substitution, e.g. f1B ((s)) := f1A (s). The mappings X : X ! X 0 and Y : Y ! Y 0 have to be used to express the adaption for the general case. X is applied to arguments, Y to results. The mapping should express these applications. Behaviour preservation can be proven inductively on the calls.
Note that recursion is here not a problem similar to recursive functions in denotational semantics. Here, the problem can be solved syntactically by appropriate substitutions. There is no problem, if the function itself is called only once in its body. Only if there are two or more calls a problem arises. The question is whether the argument has to be given to all of the subordinated calls unmodified. The example could be extended in another way, e.g. by adding side-effects in expressions requiring different states to be used by [ e1 ] and [ e2] . This imposes a new evaluation order on a binary expression. This issue is investigated in section 6.2 below.
E
E
6.2 Operation Combinators Templates describe transformations on domains and on functions on these domains. Higher order functions define the composition of functions, they depend on the structure of their argument functions. Operators of the language are combined to non-primitive ones. If basic functions are extended in their arguments or results, composition has also to be adapted. It will turn out that there are certain variants in which operation combinators are defined. These variants will lead to some templates for operator combination. These templates are based on variants. Let us assume a combination operator on two basic operators c1 and c2 and a semantic function C. An abstract form of a definition for C is:
C[ (c1; c2)]]x := g(C[ c1] f1(x); C[ c2] f2(x)) 21
Arguments and results to the functions shall be looked at. Argument: Arguments to the argument functions of a composition , which are given firstly to , have to be assigned to the argument functions. There are two common possibilities:
DISTRIBUTION: f1 (x) = f2 (x) = x, i.e. f1 SEQUENCING: f1 (x) = x and f2 (x) = g 0(
= f2 = id. C[ c1] f1(x)) for some function g0.
Result: The result of applying a function composition has to be a value of the result type of each of the functions.
COMPOSE: the result r is a composition r = g (r1; r2) of the results r1 and r2 of both argument functions (where g is an arbitrary expression on the arguments). The composition operator has to be explicitly specified. It is a default similar to default values for the extension . LAST RESULT: often, only the result of the second argument function is taken (if the composition should implement a form of sequencing), i.e. r = r2.
These variants should be preserved, if an operation combinator is extended. This shall be referred to by the template name STANDARD. The application of this template can be implicitly assumed. STANDARD is a higher-order template like operation combinators are higher-order functions. For the command sequence ; as the combination operator, we have the following definitions for f1 , f2 and g :
f1 := id;
f2 := C[ c1] f1 (x);
g := C[ c2] f2 (x)
A template for operation combinators is a combination of an argument-variant and a result-variant. For instance, the sequencing command ; is a combination of SEQUENCING for the arguments and LAST RESULT for the result. The example presented in section 6.1 is a combination of DISTRIBUTION and COMPOSE. These two combinations are the most common ones. For these templates for operation combinators, behaviour preservation has to be proven. It has to be shown:
((c1; c2)) = 0 ((c1); (c2)) where and 0 are constructed using the same variants. Since is part of a behaviour preserving mapping, and the structure is unchanged, behaviour preservation is guaranteed. We have already seen the INJECT template which would allow us to add a new argument or a new result domain to functions. If new arguments and/or results are added, we do not need to stick to the given variants, since the new component is irrelevant for behaviour preservation. Let us summarise this section on operation combinators:
Operation combinators are automatically extended by using the same variants as for the basic definition. This will be referred to by using STANDARD as the template name. The behaviour of operation combinators with respect to new components added by the extension has to be specified explicitly, possibly by using a combination of variants. 22
We could extend templates such that operation combinators can be automatically dealt with even for the new components. For instance, for INJECT we could determine that the new component is dealt with by a SEQUENCING/LAST RESULT combination. This choice would be completely at random. Thus, we prefer the more explicit solution. Peter Mosses [Mos92] introduces several operation combinators, called action combinators in Action Semantics. These combinators serve to reduce overloading generally applied to operators such as for function composition or the sequence ; for composition of program constructs as commands or declarations. As an exercise, his definitions could be analysed using our templates (see [Mos92] p. 275).
6.3 Recursive Domain Equations We will not consider problems of recursive domain equations D = F (D) in particular (see [Sch86], [AJ94]). Elements in our approach are assumed to be denotable to make associations. There must be a solution to a particular recursive domain equation. This lies in the responsibility of the language designer. Any problems resulting from this are not considered (see [Sch86] Chapter 11).
7 Adding Constructs of the Environment We will add a file system and an environment for maintaining system parameters and other variables. Both can be considered as independent language features as they consist of new domains and operations.
7.1 File System The file system as a semantic domain and operations reading and writing files shall be presented. The file system can be read and modified by commands. An appropriate injection template shall be used to model the semantics extension. 7.1.1 Injection into Maps A quite common special case of injection shall be investigated now. The domain S ! S 0 shall be extended by using a variant of injection to S T ! S 0 T , i.e. a component T is injected into both sides of a function space (added as parameter and result). Normally, a value of component T is unchanged. Modifications will only happen in particular cases. Thus, the introduction of an appropriate template seems to be reasonable. Let us consider functions f A defined by
(x; t)X (x0; t0)
iff
: S ! S 0 and f B : S T ! S 0 T . An equivalence X on X T shall be
x = x0
for any domain X (e.g. S or S 0). X is an equivalence relation (see Lemma 4.1). The extended function f B shall be defined as follows assuming a morphism :
f B ((s)) := (1(S0 (f A (s))); 2(S (s))) 23
This construction, using the projection 2 (S (s)) for the second component of the result, propagates the former value, i.e. the value remains unchanged. This is a special case of the general injection presented in the previous section. Since we deviate from the canonical function extension scheme (the new scheme shall c be referred to by fiA 7! fiB ), congruence has to be shown. Behaviour preservation is guaranteed (see Lemma 4.3). Lemma 7.1 The substitution property holds for S ; S 0 with respect to the functions fiB constructed as described above, i.e. is a congruence and B= is a quotient algebra. Proof: By definition of only the first component of pairs in template INJECT.
X T is relevant. This is analogously to ut
EXTENSION TEMPLATE MAP INJECT S
! S 0 into S T ! S 0 T := (
)
c fB : S ! S 0 7! S T ! S 0 T; : fiA 7! i 0 0 0 (s; t)(s ; t ) iff s = s S (s) := [(s; t0)] (s; t0) for each [(s; t0)]
Figure 6: Extension template MAP INJECT.
7.1.2 Applying Map Injection in a Language Extension The file system is injected on both sides of the map defining commands (see figure 7). MAP INJECT can be used for all commands; pipe, sequence and the conditional command use the variants LAST RESULT and SEQUENCING for operation combinators (see section 6.2). These variants only describe the behaviour for file systems as new arguments and results. The behaviour with respect to streams is automatically extended by STANDARD using the original variants. There are no proof obligations. get and put are new commands.
7.2 Environment Introducing an environment is a more demanding task, but nevertheless very common in language design. We will need three templates, two of them have not been presented so far. 7.2.1 Union The domain S shall be extended to S + T by using disjoint union. The relation is defined by xy , x = y for x; y 2 S + T . is an equivalence relation on (S ) = S + T . B= is a quotient algebra. : S ! S + T can be defined by an embedding: : s 7! s. Note, that is not onto. The following functions have to be considered: f1A : S ! X and f1B : S + T X , f2A : X ! S and f2B : X ! S + T , and finally f3A : S ! S and f3B : S + T ! S + T . The functions fiB shall be defined as follows: f1B (x) := f1A (x), 24
LANGUAGE EXTENSION F ILE S YSTEM := Extension syntax semantic algebras for : Cmd ! Stream ! Stream Exit do MAP INJECT FileSys into FileSys Stream ! FileSys Stream Exit composition use STANDARD SEQUENCING/LAST RESULT for [ pipe] ; [ seq ] ; [ cond] New Feature syntax Cmd ::= ’get’ StrExpr Cmd ::= ’put’ StrExpr semantic algebras domain Name, domain File, domain FileSys = Name ! File semantic functions [ get(s)]](f; i) := if [ s] 2 dom(f ) then (f; f (n); true) else (f; ; false)
C
C
END
C S C[ put(s)]](f; i) := (f y [S[ s] 7! i]; ; true)
Figure 7: Extension by File System
25
C
C
f2B (x) := f2A (x), and f3B (x) := f3A (x). The function definitions are trivial, some functions are partial. The substitution property holds for with respect to the functions fiB , i.e. is a congruence and behaviour preservation is guaranteed. The template is presented in figure 8. EXTENSION TEMPLATE UNION S into S + T
:= (
)
: S 7! S + T; : fiA 7!c fiB xy , x = y S (s) := [s] s0 for each [s0] Figure 8: Extension template UNION
7.2.2 Indexing The domain S shall be indexed by I resulting in I ! S . The relation on the function domain I ! S shall be defined by (let f; f 0 2 I ! S ) f f 0 , f (i) = f 0(i) for all i. is an equivalence relation on S (S ) = I ! S . The functions to be considered are f1A : S ! X and f1B : (I ! S ) X , f2A : X ! S and f2B : X ! (I ! S ), and finally f3A : S ! S and f3B : (I ! S ) ! (I ! S ). Let S : S ! (I ! S )= s be a mapping defined by S (s) := [f ] with f (i) = s for all i 2 I . S is not onto since it maps only to functions which map all their arguments to s, i.e. S (S ) I ! S . The functions fiB shall be defined as follows: f1B (f s ) := f1A (s), f2B (x) := (f2A (x)), and f3B (y ) := (f3A (s)) for y = (s). Some of the functions are partial (since they only take functions as arguments which are constant). Proposition 7.1 The image of and the collection of functions fiB form a subalgebra of (I Proof: The functions are defined such that they are closed with respect to the image of .
! S; fiB). ut
The substitution property holds for with respect to the functions fiB , i.e. is a congruence and B= is a quotient algebra. Behaviour preservation is guaranteed. The template is presented in figure 9. EXTENSION TEMPLATE INDEX S by I
! S := (
)
: S 7! I ! S; : fiA 7!c fiB f f 0 , f (i) = f 0(i) for all i S (s) := [f s ] with f (i) = s for all i 2 I f0 for each [f0 ] Figure 9: Extension template INDEX
26
7.2.3 Applying Map Inject, Union and Indexing in a Language Extension With the following extension (figure 3) we obtain the extended core presented at the beginning (figure 2). Variable is syntactic and semantic domain. Elements of StrExpr may now also be variables (achieved by using union for Chr and V ariable). It can be easily seen that using the union template, the behaviour of S with respect to arguments se without variables is preserved. Then, a second template is applied to the first extension by union on strings String and string expressions StrExpr. String shall be indexed by Env . There is only one semantic function which is to be redefined with respect to the structure of its argument se. A template offering help for such a constellation would be a rather uncommon one. Thus, we have decided here to redefine the function S explicitly. Map inject works for all commands; pipe is extended by argument distribution, seq by argument sequence for the arguments, both are extended by last result for the results. The conditional command uses argument sequence and last result. The variants for the operation combinators pipe, seq and cond only describe the behaviour with respect to the new component Env. There is a proof obligation for S: the obligation is easily discharged, since the union step is based on injection of original strings (without variables) into the union domain. The environment is only relevant, if variables are involved (which is not subject to behaviour preservation).
8 Adding Convenient Features Some particular issues (map injection and composition, syntactical issues including concrete syntax and parsing) will be looked at in the subsequent sections in the context of some more extensions on the extended core. These extensions will include error streams, I/O-redirection, aliases and command substitution, features that have been named convenient in the csh-manual.
8.1 Error Streams We will now have a look at operation combinators on S ! S 0 and on the extended domain S T ! S 0 T obtained by applying map injection. Some variants have already been introduced in section 6.2 which were behaviour preserving. The semantics for a composition operator : (S T ! S 0 T ) (S T ! S 0 T ) ! (S T ! S 0 T ) is usually defined on the semantics of its arguments, e.g. sequencing of commands. Arguments (of types S and T ) to the argument functions of are given firstly to from where they have to be assigned to the argument functions.
C[ (c1; c2)]](s; t) := g(C[ c1] f1(s; t); C[ c2] f2(s; t)) The two common possibilities for treatment of arguments are distribution f1 (s; t) = f2 (s; t) = (s; t) and sequencing f1(s; t) = (s; t) and f2 (s; t) = g 0( [ c2] f1 (s; t)). For the particular problem of behaviour preservation neither variant causes problems if the behaviour on S is preserved by the new semantics. This can be achieved by using the STANDARD template. The result of applying a function composition has to be a value of type S 0 T . Considering only T , we have again the possibilities compose (the result t is a composition t1 op t2 of the results of both argument functions where op is an arbitrary operator) and last result (often, only the result of the second argument function is taken – if the composition should implement sequencing –, i.e. t = t2 ). Again, both forms are independent of the choice for T , the results will be equivalent with respect to . Behaviour preservation is guaranteed. The use of templates for the new component T facilitates the
C
27
extension of : (S ! S ) (S between common pattern.
! S ) ! (S ! S ) to the extension. The language designer can choose
Figure 10 contains the introduction of error streams by using INJECT. The only proof obligation for beLANGUAGE EXTENSION S TD E RROR := Extension syntax semantic algebras for : Cmd ! (Env FileSys Stream) ! (Env FileSys Stream Exit) do INJECT Stream into (Env FileSys Stream Exit Stream) with t0 := composition use STANDARD COMPOSE by ^ for argument [ pipe] ; [ seq ] ; [ cond] New Feature syntax semantic algebras semantic functions
C
C
END
C
C
C[ get(s)]](e; f; i) := if S[ s] e 2 dom(f ) then (e; f; f (n); true; ) else (e; f; ; false; file not found) Figure 10: Extend the extended core by error streams
haviour preservation for get is easily discharged. Only a fifth result value is added. pipe; seq; cond can be extended by using the composition variants. Their (re)specification is not necessary, but for the sake of insight into how the templates work, we will give the resulting specifications here:
C[ pipe(c1; c2)]](e; f; i) := let (e1; f1; o1; s1; err1) = C[ c1] (e; f; i) in let (e2; f2; o2; s2; err2) = C[ c2] (e; f1; o1) in (e2; f2; o2; s1 ^ s2; err1 ^ err2) C[ seq(c1; c2)]](e; f; i) := let (e1; f1; o1; s1; err1) = C[ c1] (e; f; i) in let (e2; f2; o2; s2; err2) = C[ c2] (e1; f1; ) in (e2; f2; o1 ^ o2; s2; err1 ^ err2) C[ cond(c1; c2)]](e; f; i) := let (e1; f1; o1; s1; err1) = C[ c1] (e; f; i) in let (e2; f2; o2; s2; err2) = C[ c2] (e1; f1; ) in if s1
= true then (e2; f2; o1 ^ o2; s2; err1 ^ err2) else (e1; f1; o1; s1; err1 ^ err2)
Since only a new result domain was added, we only need to apply a result template. In this case it is COMPOSE by ^ where ^ is concatenation. For pipe; seq; cond we have
(:; :; :; :; err1) (:; :; :; :; err2) (:; :; :; :; err1 ^ err2) The other domains are extended by STANDARD. 28
8.2 Redirection Command redirection is a simple exercise (see figure 11, we only presented an incomplete solution here). Only a new command (requiring some syntactical extensions and the definition of the respective semantic function) is needed. There are no proof obligations! LANGUAGE EXTENSION R EDIRECTION := Extension syntax semantic algebras composition New Feature syntax
RedirCmd ::= 0redir0 RedirKind Cmd StrExpr Append RedirKind ::= 0 in0j0out0 j0err0j0err out0
semantic algebras semantic functions
R : RedirCmd ! (Env FileSys Stream) ! (Env FileSys Stream Exit Stream) R[ redir(s1; c; s2; b)]](e; f; i) := case s1 in 0 in0 ) C[ pipe(get(s2); b)]](e; f; i) 0 out0
:::
) :::
END Figure 11: Extension by redirection
8.3 Aliases Figure 12 contains the result of adding mechanisms for maintaining aliases. This kind of problem is new since modifications in the syntax including concrete syntax and parsing is necessary. Cmd and Word are semantic and also syntactic domains. Remember, that we have already used this technique e.g. for variables. Proof obligations are automatically discharged by using a template. The map inject templates can be applied for all commands echo; setenv; tr; get; put and redir, the combinators pipe; seq; cond are extended by using the STANDARD template for the existing domains. Aliases can be dealt with by sequencing and last result. The treatment of an added alias table a is not relevant with respect to preservation of properties. alias and unalias are not subject to behaviour preservation.
Concrete syntax and parsing have been introduced through using Word as a syntactical and semantical domain and implementing parsing functions on that domain as a semantic algebra WORD. The presented parsing functions only parse words from Word. These functions were expressed as semantic functions, blurring the distinction between syntactical and semantical domains. Syntactical entities have been emphasised in 29
LANGUAGE EXTENSION A LIAS := Extension syntax semantic algebras for : Cmd ! (Env FileSys Stream) ! (Env FileSys Stream Exit Stream) do MAP INJECT Aliases into
C
(Env FileSys Stream) ! (Env FileSys Stream Exit Stream) with t0 :=
composition use STANDARD SEQUENCING/LAST RESULT for New Feature syntax
C[ pipe] ; C[ seq] ; C[ cond]
Cmd ::= 0alias0 Word Word Cmd ::= 0unalias0 Word
semantic algebras
domain Cmd domain Word asub parse cmd : Word (Env FileSys Stream Aliases) ! (Env FileSys Stream Exit Stream Aliases) asub parse cmd[ wl] (e; f; i; a) := C[ parse[ alias subst[ wl] (a)]]] (e; f; i; a) parse : Word ! Cmd parsable : Word ! Bool domain Aliases = Word ! Word alias subst : Word Aliases ! Word alias subst[ wl] (a) := if not(has cycle(a)) then = dom a then wl else alias subst[ a(hd wl) ^ tl wl] (a) if wl = _hd wl 2 has cycle : Aliases ! Bool
semantic functions
END
C[ alias(w; wl)]](e; f; i; a) := (e; f; ; true; ; ay[w 7! wl]) C[ unalias(w)]](e; f; i; a) := (e; f; ; true; ; a=fwg) Figure 12: Extension by aliases
30
figure 12 by using square brackets [ :] when passed as an argument to another function and have been typesetted in this font when appearing in expressions. Blurring the distinction has already been done for the sake of simplicity, e.g. when the domain V ariables was considered as syntax and also semantics.
8.4 Command Substitution A command enclosed by backquotes (‘...‘) in the C shell is executed in a subshell. The standard output of this execution replaces the backquoted string in the command line. In figure 13, an extension by command substitution is presented. Again, syntax is the major problem. One of the particular problems is that evaluations in the language yield concrete syntax, which then has to be reparsed and evaluated again. This problem is solved in the same way as introducing aliases. Token and Chr are semantic and also syntactic domains. There are no proof obligations since only new syntax and corresponding semantic functions are introduced.
cqscan cmdsub scan is the semantic function for the main program, subshells are executed by sub cmd and the result is a word list which can be passed to functions parsing with respect to aliases. sub cmd works recursively on a list of tokens, where each token is either quoted or non-quoted. Quoted ones are executed in a subshell. The result is a string which is subsequently parsed into a word list.
9 Conclusions We have presented a language description in form of a stepwise development by refinement. Based on a language core exhibiting the basic ideas of language, language features were added onto that core step by step. The language features can be specified without referring directly to the core to which they should be applied. This guarantees a high degree of modularity in language design and language presentation. Language features can also be investigated as self-contained constructs of their own. We have provided an operator for language extension which describes the extension of syntax and semantics (or interface and functionality) in one construct even though we have also argued that semantics is the starting point. The idea that the whole extension is specified using semantics only (and adding syntactic representations later) could also be pursued. We have presented a two-level approach using two different kinds of extension operators. The first adapts the semantics of the basic language to an extended domain construction specified by the language designer. Using this operator was simplified and supported by a notion of extension templates. This support was given in order to allow the language designer to focus on the specification of the new features to be added. The technical support by semantics extensions is crucial for the creative part of specifying the new feature. This technical support adapts basic language constructs automatically preserving the behaviour. This support is essential for the feasibility of the extension approach. Since in each step only a few concepts are added or redefined, normally a large amount of rewriting would be necessary. We have facilitated extensions by providing templates which can be applied for a number of standard cases. Applying these templates also allows properties to be preserved. Respective proof obligations are automatically discharged. We would now like to refer to the work of David Schmidt. Most of the ideas presented here have been developed on his text book on denotational semantics [Sch86]. Let us cite here some sentences from his recent book [Sch94] which support our claims. 31
LANGUAGE EXTENSION C OMMAND S UBSTITUTION := Extension syntax semantic algebras composition New Feature syntax
Prgm ::= Token Token ::= quoted j uninterpreted quoted ::= Chr uninterpreted ::= Chr
semantic algebras
domain Chr domain Token = Chr cqscan cmdsub scan : Chr (Env FileSys Stream Aliases) ! (Word (Env FileSys Stream Exit Stream Aliases) cqscan cmdsub scan[ cl] (e; f; i; a) := let (cl0; (e0; f 0; i0; a0)) = cmd sub[ cmd quote scan[ cl] (e; f; i; a)]] in (word scan[ cl0] ; (e0; f 0; i0; a0)) cmd sub : Token (Env FileSys Stream Aliases) ! Chr (Env FileSys Stream Exit Stream Aliases) cmd sub[ l] (e; f; i; a)) := if l = then (; (e; f; i; a)) else case hd l in uninterpreted(cl1 ) ) let (cl2; (e0; f 0; i0; a0)) = cmd sub[ tl l] (e; f; i; a) in (cl1 ^ cl2; (e0; f 0; i0; a0)) quoted(cl1) ) let (e0; f 0; o0; err0; a0) = asub parse cmd[ word scan[ cl1] ] (e; f; i; a); (cl2; (e00; f 00; i00; a00)) = cmd sub[ tl l] cmd sub sequencing((e; f; i; a); (e0; f 0; o0; err0; a0)) in (o0 ^ cl2; (e00; f 00; i00; a00)) cmd sub sequencing : (Env FileSys Stream Aliases) (Env FileSys Stream Exit Stream Aliases) ! (Env FileSys Stream Aliases) cmd sub sequencing((e; f; i; a); (e1; f1; o1; s1; err1; a1)) := (e; f1; ; a) word scan : Chr ! Word word scanable : Chr ! Bool cmd quote scan : Chr ! Token cmd quote scanable : Chr ! Bool
semantic functions END Figure 13: Extension by command substitution
32
Every programming language possesses a ’core’ of values and operations that establish the fundamental capabilities of the language. [...] the concept [of a core] is helpful, because it gives a language designer a starting point. [...] Once the core meets the designer’s expectations, it can be extended by conveniences like subroutines, modules, and the like. Schmidt uses the notion of orthogonal language features to point out that features should be designed as selfcontained units understandable without reference to other language features (and the core). Language features should preferably not conflict. This idea was realised in our approach be the construct of language extensions, i.e. operators with parameters. In particular, the semantics extensions guarantee that the behaviour of the basic language is preserved, which means that the new feature does not conflict with the basic language. In contrast to Schmidt, we do not consider every conflict or interaction between features as harmful and thus to be avoided. This issue is discussed elsewhere [Pah97]. The approach could be further improved if we consider parallel extensions. Instead of extending step by step sequentially, we could extend on a common core as long as there are no dependencies between extensions in parallel. This is be investigated in [Pah97].
Acknowledgements The author would like to thank Bo Stig Hansen for suggestions and comments. Parts of the material presented here have been given as exercises in project work at the Technical University of Denmark, most of the function implementations are based on those project results.
References [AJ94] S. Abramsky and A. Jung. Domain Theory. In S. Abramsky, D. Gabbay, and T. Maibaum, editors, Handbook of Logic in Computer Science, Vol. III, pages 1–168. Oxford University Press, 1994. [Coh65] P.M. Cohn. Universal Algebra. Harper and Row Publishers, 1965. [DeStijl] DeStijl. Design and Specification through Interfacing and Joining Languages, EU-HCM project, http://www.cs.tcd.ie/DeStijl. [DH93] O. Danvy and J. Hatcliffe. On the Transformation Between Direct and Continuation Semantics. In S. Brookes, M. Main, A. Melton, M. Mislove, and D. Schmidt, editors, Proceedings Mathematical Foundations of Programming Semantics MFPS’93. Springer-Verlag, LNCS 802, 1993. [Mos92] P.D. Mosses. Action Semantics. Cambridge University Press, 1992. [Pah97] C. Pahl. An Investigation into Parallel Extensions of the Unix C-shell Interpreter Language. Technical Report IT-TR:1997-015, Department of Information Technology, Technical University of Denmark, 1997. [Sch86] D.A. Schmidt. Denotational Semantics: A Methodology for Language Development. Wm.C. Brown Publishers, 1986. [Sch94] D.A. Schmidt. The Structure of Typed Programming Languages. MIT Press, 1994. [Uni95] Unix Manual Pages. csh - shell command interpreter with a C-like syntax, 1995.
33