Integrating EXPRESS and SGML for Document Modelling in Control ...

0 downloads 0 Views 281KB Size Report
Oct 19, 1995 - ... gives a brief introduction to some of the features of SGML via an ex- .... a pair of elements one of which is a list title, element L1, and the other ...
Integrating EXPRESS and SGML for Document Modelling in Control Systems Design Juan Bicarregui and Brian Matthews Systems Engineering Rutherford Appleton Laboratory Chilton, Didcot, OXON, U.K. fjcb,[email protected] October 19, 1995 Abstract This paper considers the integration of documents written using the Standard Generalized Markup Language (SGML) into an information modelling context using EXPRESS. An architecture is presented which allows storage and retrieval of SGML Document Type De nitions (DTDs), and documents which conform to those DTDs. Also considered is the extraction of documents from the EXPRESS information model which incorporate values from other, non-documentary information represented in EXPRESS.

1 Introduction The computing systems for industrial process automation projects are typically complex, distributed, real-time systems. Several di erent types of information are used in such a project, including data for the components, project documents and process models. A signi cant proportion of the total design e ort in control systems design is in the production of documentation both to accompany the product, and to document the design process. Many of these documents are of standard forms tailored to the details of a particular design. In order to streamline the production and reuse of such documents, it is widely accepted that they should be developed in a structured format so that generic document schemas and sections of documents can be classi ed, stored and retrieved by structure, without recourse to an analysis of semantic content. Systematic reuse of this sort would greatly improve productivity. To this end, the ISO sponsored Standard Generalised Markup Language (SGML) [2] is available as a means of developing structured documents.

1

Nevertheless, to eciently store and retrieve information used in the design of a control system, and to smoothly integrate one information type with another, it is desirable to use one common representation to store all information types. The EXPRESS language [3] is a widely accepted standard for representing engineering data, and in this paper we describe the use of EXPRESS as the common medium for storage and exchange of all data, including documents. However, it would also be desirable to handle documents in SGML to allow the ease of expression in structuring and writing documents and also specialised document generation tools to be used. To facilitate this, we describe the the provision of translators between EXPRESS and SGML. This paper describes how document modelling can be integrated into an EXPRESS system. The work has been undertaken in a larger exercise in information structuring in the context of the reuse of information, in the Esprit III project TORUS1. In a companion paper in this conference, we discuss how process modelling can be integrated into the same framework [4]. Related work in this area is described in[1, 6]. The next section gives a brief introduction to some of the features of SGML via an example. The section 3 discusses architectural issues arising in integrating EXPRESS and SGML. We then discuss some of the details of these translations: in section 4, that of representing SGML Document Data Types in EXPRESS; in section 5, generating EXPRESS schemas from SGML Document Data Types; and in section 6, extracting SGML documents from EXPRESS whilst incorporating EXPRESS data. Finally we report on the ongoing implementation of this architecture and mention some further areas of investigation.

2 The Standard Generalised Markup Language SGML provides a standard method of structuring and exchanging documents. It provides a mechanism for de ning the textual conventions (character set, language etc) within which a document is written and also allows the de nition of a \markup". It is this markup which de nes the structure of documents by identifying generic sections of the text of a document, and by specifying which sections can lie within the scope of others. Each document compatible with SGML provides a reference to a Document Type De nition (DTD) which de nes the markup. DTDs provide the basis of the document structuring which we wish to relate to the information structuring provided by EXPRESS. Each kind of document used in an organisation can be given as a DTD. These are then interpreted in standard ways to generate the document in the desired format. As an example to illustrate the restricted subset of SGML considered in this paper, we describe the structure of a simple report. The Document Type De nition of such a document is given in Figure 2. Line numbers have been added to left of each line. DTDs comprise a 1 ESPRIT III project 8080 - TORUS: Tools for Object Based Large Scale Reuse for Industrial Systems Design, a collaboration between Cegelec Projects Ltd (UK), FLS Automation A/S (Denmark), Rutherford Appleton Laboratory (UK) and Alcatel-ISR (France). TORUS seeks to raise the level of reuse in industrial process control projects.

2

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23)


DTD for simple Report -RAL ELEMENTS MIN CONTENTS -Report - (Head, Body, Appendix?) Head - (Title, Author) Title - O (#PCDATA) Author - O (#PCDATA) Body - (Section+) Section - (H1,P*) H1 - O (#PCDATA) P - O (#PCDATA | List | Subsect) Subsect - (H2,Q*) H2 - O (#PCDATA) Q - O (#PCDATA|List) List - (L1 & Item+) L1 - O (#PCDATA) Item - O (#PCDATA) Appendix - (A1,P*) A1 - O (#PCDATA) ELEMENTS NAME VALUE DEFAULT -Report security (confiden|public) public status (draft|final) draft refid CDATA #REQUIRED

Rutherford Appleton Laboratory

> > > > > > > > > > > > > > > > > > > >

>

Figure 1: Document Type De nition of a Simple Report series of declarations which begin2 with < and end with >. There are three major types of declaration in SGML: elements, attributes, and entities3 . Text, such as lines 1, 3, and 20, contained within a pair of -- is ignored as comments.

2.1 Elements Structure is de ned in a Document Type De nition by means of element declarations, distinguished by the keyword !ELEMENT. Such an element declaration consists of an element name, a minimisation, and a group. The minimisation declares whether the user can omit begin or end tags for this element in documents; it has two characters, each of which can be either \-" for retain or \O" for omit. The group declaration de nes how the element is constructed. This can be simply text, such as the element Title in line 6, denoted by the keyword #PCDATA (for Parsed Character DATA), or it can have a more complicated structure referring to other elements. This is de ned using a notation similar to that of regular expressions, as described below. At the top level, a Report (line 4) has three components in sequence: a Head, a Body and an optional Appendix, optional since it has a ? sux. In line 5, a Head is declared to be a Title followed by an Author, both of which are bottom level #PCDATA elements. A The syntax of SGML is rede nable. In this paper we shall follow the standard reference syntax It is unfortunate that there is some overlap in the terminology of SGML and EXPRESS, especially in the terms \attribute" and \entity". Where confusion may be caused, we shall refer to \SGML attributes" and \SGML entity", whilst reserving \attributes" and \entity" for EXPRESS. 2 3

3

(line 8) consists of a non-empty recurrence of Sections, de ned using the + sux, and in line 9, a Section is declared to have a heading H1 followed by a P* representing a recurrence of paragraphs, which could be empty. A paragraph element P (line 11) is formed by a choice of elements, separated by |s, either plain text (#PCDATA), a List element, or a subsection, de ned by the Subsect element. The List (line 15) element is a pair of elements one of which is a list title, element L1, and the other is a non-empty recurrence of list items Item, but these can occur in either order, as denoted by the & separating them. Other elements in this DTD follow a similar pattern. Body

2.2 Attributes Lines 21-23 declare attributes. Attributes are additional non-structural information which can be associated with an element. Such information is not used for structuring by SGML, but can a ect the appearance of the document. In this example, three attributes are declared all of which pertain to the element Report: security which can take the alternative values confiden or public, so the element can be marked with the level of security of the document; status which can take the alternative values draft or final; and refid which is a value of type CDATA, that is Character DATA, the SGML base type for text which does not need further processing. Thus this attribute represents an identi er for the document. An attribute also has a default value for cases in which the explicit setting of the attribute is omitted in a document instance. Thus security has the default public, and status has default draft, whereas refid has the keyword #REQUIRED denoting that it always has to be explicitly set by the document writer.

2.3 Entities Line 2 gives an example of an SGML entity. At their most straightforward, entities provide \macro substitution"; the text assigned to an entity in the declaration is expanded in the text when the entity is referred to. Thus we have an entity RAL which will expand out into the text \Rutherford Appleton Laboratory" when an entity reference to RAL is placed in the text of a document. Entities can be used in a more sophisticated manner, instance for de ning non-ASCII characters and alphabets, and also can contain markup which is interpreted on use. For the purposes of this paper, we restrict their use to the one described above.

2.4 Document Instances The DTD thus de nes a template for marking up documents, and any document which conforms to this DTD can only use markup in the way speci ed by the DTD. A small document which conforms to the example DTD is given in Figure 2.3. The rst line uses a !DOCTYPE declaration to declare which DTD the document conforms to: this can either be a local declaration, or in this case a reference to a le, report.dtd, where the DTD is stored. Each instance of an element is enclosed by a begin tag and an 4

A Short Report on the Laboratory Brian Matthews

Introduction

The &RAL; is a research laboratory which forms one of the constituent sites of the Central Laboratory of the Research Council ...

...

Systems Engineering

...



Figure 2: A Simple Report Document end tag . Tags can be omitted if the relevant tag minimisation allows. Attributes are also set within the begin tag, so the tag contains values for two of its attributes. The missing attribute, security is assumed to take its default value, public. The entity RAL is referred to in the text by an entity reference &RAL;. This will thus expand in place to its value. Note that SGML only de nes the structure of documents: it is not concerned with de ning their appearance. In order to produce documents in a given document preparation system, a mapping from the DTD to that document preparation system must be de ned which speci es the appearance. For example, it is at this stage that it may be decided to generate headings in a larger, bold font, with more line separation.

3 Architecture for Integrating SGML and EXPRESS To enable the use of SGML to describe and manipulate documents, yet to store, search and retrieve those same documents using EXPRESS, requires an architecture for integrating documents and document models within an EXPRESS framework. In a paper in the 1994 EXPRESS User Group conference [1], a mapping from SGML to EXPRESS was proposed which generates EXPRESS schema from SGML Document Type De nitions (DTDs). This mapping forms the basis of instance translation functions used to store documents which conform to the DTD into EXPRESS instances conforming to the generated schema. In this paper we extend this approach by also de ning DTDs in terms of generic EXPRESS schema, and by a mechanism to insert data from the information model into documents. 5

We thus integrate the SGML document model and the EXPRESS information model. An architecture which achieves this integration by a series of translations between the languages, is depicted in Figure 3. a DTD

a Doc

6 #

?

#

DTD to instance "

?

??

#

Document Translator

!

DTD to EXPRESS "

!

?

"

6

!

?

Instance a DTD a Doc Data ...................................................... Meta - DTD in Exp SGML in Exp data EXPRESS DataBase

Figure 3: The Architecture of Language Translators. At the core of the architecture is an EXPRESS database which stores all components of the information model, be they component data, documents, or process models. In order to store documents in this database and also to store the formats to which they conform, we propose a series of Translators to convert documents and document formats into EXPRESS instance data and meta-data and also extract documents from the database. The translations are as follows:

(1). SGML in EXPRESS. The DTDs themselves are data items which have to be

retrieved and reused just like any other. To store DTDs, an EXPRESS schema is prede ned in the EXPRESS database which captures the SGML language metadata. DTDs are then translated and stored as instances of this prede ned EXPRESS model. (2). DTD to EXPRESS. An SGML to EXPRESS Translator takes a DTD, and generates an EXPRESS schema which has a similar information structure and content. Instances of this generated schema will thus be documents conforming to the DTD, stored as EXPRESS data.

6

(3). Document to EXPRESS Translator. A document translator is used to trans-

late documents conforming to the given DTD, which has been analysed by the DTD-to-EXPRESS translator above. Note that this will require the speci c DTD as input as well as the document. This will convert the document into an EXPRESS instance of the translated DTD's schema. (4). EXPRESS to Document Translator. Retrieval of document components from the EXPRESS database requires a reverse translator from EXPRESS to SGML also to be generated from the SGML to EXPRESS mapping so that documents stored in the database can be extracted in a manner corresponding to their SGML de nitions. We give some of the details of these translations in subsequent sections of this document.

4 Representing DTDs in EXPRESS In this section we discuss the translation of SGML DTDs into instances of the \SGML in EXPRESS" described above. The purpose of this translation is so that DTDs can be handled in the same manner as the rest of the data in the information model. A generic EXPRESS schema is de ned capturing the structure of DTDs. DTDs are then de ned as instances of this schema. The EXPRESS schema and the translation are here de ned step by step, using the previously de ned example DTD. A DTD is a sequence of SGML declarations, stored in the following entity:

ENTITY dtd; lines : LIST [0:?] OF sgml decl; END ENTITY ;

Thus in our example, we generate the following instance. Note that the instance reference identi ers are automatically generated by the implementation and have do not refer to anything in this paper. report inst = dtd([#1,#2,#3,#4,#5, ,#17,#18]) :::

The three kinds of SGML declaration we are using in our example are represented via an ABSTRACT supertype sgml decl which can be one of three subtypes, entity decl, element decl, or attlist decl.

ENTITY sgml decl ABSTRACT SUPERTYPE OF (ONEOF(entity decl, element decl, attlist decl)) ; END ENTITY ;

4.1 Elements Elements become instances of the following EXPRESS entity, with a name, a tag minimisation and a group. 7

ENTITY element decl; element name : STRING ; END

tag min : tag min pair ; content : group ; ENTITY ;

The tag minimisation ags are described via the following entity and enumeration type:

ENTITY tag min pair ;

begin min : TagMin ; end min : TagMin ; ENTITY;

END TYPE TagMin = ENUMERATION OF (KEEP, OMIT); END TYPE ; The groups are represented by the following type:

TYPE group = SELECT(symbol group, base group, seq group, and group, or group, opt group, closure, ne closure); END TYPE; A symbol group is a reference to another SGML element and is thus just a string, while other base groups are references to the base types of SGML and thus are given by an enumeration of names. The other group types store the complex group declarations.

TYPE symbol group = STRING; END TYPE; TYPE base group = ENUMERATION OF (PCDATA, CDATA, EMPTY, ) ; END TYPE ; ENTITY seq group ; s : LIST [0:?] OF group; END ENTITY; :::

:::

(* and group, and or group as seq group - omitted for space *)

ENTITY opt group; opt : group; END ENTITY; :::

(* closure and ne closure as opt group - omitted for space *)

Thus in our example, the Title element translates to: #4 = element decl('Title',#23,PCDATA) #23 = tag min pair(KEEP,OMIT)

Whilst the Head element, Body element and paragraph element P translate to the following more complex set of entity instances. 8

#3 = element decl('Head',#21,#22) #21 = tag min pair(KEEP,KEEP) #22 = seq group(['Title','Author']) #6 = element decl('Body',#25,#26) #25 = tag min pair(KEEP,KEEP) #26 = ne closure('Sections') #9 = element decl('P',#30,#31) #30 = tag min pair(KEEP,OMIT) #31 = or group([PCDATA,'List','Subsect'])

4.2 Attributes Attributes are recorded using two EXPRESS entities. attlist decl records the element which the attributes refer to and each sgml attribute records the form of an attribute, with a name, a range of values for the attribute and a default.

ENTITY attlist decl; element ref : STRING ; attrs : LIST [1:?] OF sgml attribute ; END ENTITY ; ENTITY sgml attribute; attribute name : STRING ; END

declared value : attribute value ; default value : attribute default ; ENTITY ;

In this restricted subset of SGML, attribute value can either be a keyword, such as CDATA, represented as an enumerated type enum attr, or can be a group attribute, for a list of alternate values, given by the tok grp attr.

TYPE attribute value = SELECT(enum attr, tok grp attr); END TYPE ; TYPE enum attr = ENUMERATION OF (CDATA ATT, ); END TYPE ; ENTITY tok grp attr ; tok grp : namegroup ; END ENTITY ; TYPE namegroup = SELECT(base attr,or namegroup); END TYPE ; TYPE base attr = STRING; :::

9

END TYPE; ENTITY or namegroup ; END

o1 : namegroup ; o2 : namegroup ; ENTITY;

A similar selection occurs with attribute defaults, which in the restricted subset of SGML being presented here, can either be a given string, or a keyword such as REQUIRED.

TYPE attribute default = SELECT(enum default, default attr); END TYPE ; ENTITY default attr; def : STRING ; END ENTITY; TYPE enum default = ENUMERATION OF (REQUIRED, ); END TYPE ; :::

In our example, the following EXPRESS instances represent the attribute declarations. #18 = attlist decl('Report',[#44,#45,#46]) #44 = sgml attribute('security',#53,#54) #53 = tok grp attr(#57) #54 = default attr('public') #57 = or namegroup(['con den','public']) #45 = sgml attribute('status',#55,#56) #55 = tok grp attr(#58) #56 = default attr('draft') #58 = or namegroup(['draft',' nal']) #46 = sgml attribute('re d',CDATA ATT,REQUIRED)

4.3 Entities SGML entities4are represented by the following EXPRESS entities.

ENTITY entity decl ABSTRACT SUPERTYPE OF (ONEOF(external entity,parameter entity)); entity name : STRING ; entity value : STRING ; END ENTITY; This entity allows us to distinguish between general entities, which have been discussed and parameter entities, which we will not discuss in the current paper. 4

10

ENTITY external entity SUBTYPE OF (entity decl) ; END ENTITY; ENTITY parameter entity SUBTYPE OF (entity decl) ; END ENTITY; Thus in our example, the entity RAL represented using the following EXPRESS instance. #1 = external entity('RAL','Rutherford Appleton Laboratory')

5 Generating EXPRESS Schemas The second translation required takes SGML DTDs and generates EXPRESS schemas. Thus documents which are written to conform to the DTD can be stored as EXPRESS instances of the resulting schema. This section follows and extends the ideas presented in [1]. To do this translation we establish a correspondence between objects in an SGML DTD and elements in an EXPRESS schema de nition. There are several ways in which this may be done: we present one which is reasonably straightforward. This translation uses some general types, which included in every generated schema. These are de ned in the next two subsections.

5.1 General Supertypes Supertypes are de ned for the major syntactic categories of SGML.

ENTITY Element SGML ABSTRACT SUPERTYPE ; END ENTITY ; ENTITY Entity SGML ABSTRACT SUPERTYPE ; END ENTITY ; ENTITY Attribute SGML ABSTRACT SUPERTYPE ; END ENTITY ; The default method for building subtypes of a supertype is to use ANDOR. However, in our example this would generate many super uous subtypes. Hence, we shall assume that the ONEOF subtype constructor is adopted as the default here.

11

5.2 SGML Base Data Types. SGML provides base types for blocks of text. These are represented using the following EXPRESS types. A typical block of text in an SGML document, represented as a PCDATA block, will contain both standard text and entity references. An EXPRESS type which models this is a list of base elements, each of which can be an string or an entity of supertype Entity SGML.

TYPE PCDATA = LIST OF SGMLBaseDataType ; END TYPE ; TYPE SGMLBaseDataType = SELECT OF (STRING, Entity SGML) ; END TYPE ;

The unparsed character data type CDATA is interpreted as STRING type, and the EMPTY datatype as the enumerated type with the single element empty.

TYPE CDATA = STRING ; END TYPE ; TYPE EMPTY = ENUMERATION OF (empty); END TYPE ;

An alternative approach to handling SGML entity references might be to replace them in situ with their de ned text before translation. However, since the de nitions of entity references may be changed without the text being changed, we choose to leave entity references in the translated form.

5.3 Translating Document Type De nitions An SGML Document Type De nition is translated to an EXPRESS schema. In our example, we generate the following schema skeleton.

SCHEMA Report dtd ; END SCHEMA ; :::

The list of SGML declarations within the DTD are translated in turn to form the entities and types of the generated schema. There is a problem however, in naming schema. EXPRESS Schemas have a name, however, there is no \general" name for a DTD. DTD's are referred to in documents using the is as follows.

name tagmin group >

ENTITY element name type SUBTYPE OF Element SGML ; END ENTITY ; :::

The name of the element is translated to the name of the entity by appending the string \ type" and declare the entity to be a subtype of Element SGML. Note that the tag minimisation ags are omitted in this translation process. We assume that the tag minimisation of the document has been handled by the SGML parsing mechanism, so when the document is translated such information has been lost. When documents are retranslated back from EXPRESS to SGML all tags are included. The form of the groups of the element determine the attributes of the EXPRESS entity. The simplest element group is a base type, such as #PCDATA. This translates into a single attribute of type PCDATA. This attribute is assigned an arbitrary name, unique within the entity. Thus, in our example, the Title (line 6) element translates into the following EXPRESS entity.

ENTITY Title type SUBTYPE OF (Element SGML); a1 : PCDATA; END ENTITY ; Elements Author, H1, H2, L1, Item and A1 are similar. If the element group has two or more elements in sequence, then each sub-element generates an attribute, the order of the attributes re ecting the order of the sequence. Thus the Head element (line 5) is translated into the following entity.

ENTITY Head type SUBTYPE OF (Element SGML); END

Title1 : Title type; Author2 : Author type; ENTITY ;

13

Element Title will be translated to entity Title type so we used that type name and the type for the rst attribute: similarly for the Author element. Note that we have used the name of the element reference for attribute names, appended with a number to make the attribute unique within the entity. This naming convention makes comprehension easier and is used where possible. Uniqueness of attribute names is important since the same element reference could be used more than once in the same group. For example, a heading with a subtitle may have the element group: (Title, Title, Author)

so we would generate the attributes: Title1 : Title type; Title2 : Title type; Author3 : Author type;

The de nition of Body (line 8) speci es a non-empty recurrence of Sections. The translation uses the EXPRESS list type, with minimum bound of 1, as the type of a single attribute, again with a dummy name, in the entity generated for Body.

ENTITY Body type SUBTYPE OF (Element SGML); a1 : LIST [1:?] OF Section type ; END ENTITY ; Similar translations apply for the possibly empty recurrence. This translates to the EXPRESS list type, with minimum bound of 0. Thus Section (line 9) translates to:

ENTITY Section type SUBTYPE OF (Element SGML); H11 : H1 type; a2 : LIST [0:?] OF P type ; END ENTITY ; The Report (line 4) element translates to an entity including an optional valued attribute representing optional element reference, as follows.

ENTITY Report type SUBTYPE OF (Element SGML); Head1 : Head type; Body2 : Body type; a3 : OPTIONAL Appendix type; :::

END ENTITY ; The element group which speci es a choice of elements translates to an EXPRESS SELECT type. However, we cannot place such a type in line for an attribute type, and we have to generate an auxiliary SELECT type, with a unique dummy name. Thus the element P translates to the following: 14

ENTITY P type SUBTYPE OF (Element SGML); a1 : or1 type; END ENTITY ; TYPE or1 type = SELECT OF (PCDATA,List type,Subsect type); END TYPE ; This method of using auxiliary types could, for consistency, be used for all groups, but we chose the approach presented above for clarity. However, there are occasions such as nested groups, when it may be necessary to generate auxiliary types for other groups. Perhaps the most awkward of all the element structuring operators in SGML to translate satisfactorily into EXPRESS is the \&" operator. This speci es that the elements of the group can occur in any order. We translate the group in a similar way to a sequence of elements. Thus the element List (partially) translates into:

ENTITY List type SUBTYPE OF (Element SGML); L11 : L1 type; a2 : LIST [1:?] OF Item type ; END ENTITY ; :::

However, the attributes of an EXPRESS entity occur in a xed order. Thus if we translate the elements in order, we lose information; when the document is regenerated from the EXPRESS schema, the correct ordering of the elements may be lost. Thus we need to record the order in which sub-elements occur. We do this by adding another attribute to the entity, which is a list of numbers recording the order. Thus we add to the List type entity: and ordering 3 : LIST [2:2] OF UNIQUE INTEGER ; WHERE and ordering condition3: is permutation(and ordering 3, [1,2]) ;

This attribute and ordering 3 is a non-repeating list of length two, which is the number of elements in the group. We also add a WHERE clause using an auxiliary function is permutation to assert that and ordering 3 is a permutation of the list [1 2]. ;

5.5 Translating SGML Attributes Each SGML attribute translates to an entity which is a subtype of the abstract supertype Attribute SGML. An auxiliary enumeration type is used to represent any alternative values of the attribute. Thus the three attributes in the example are represented as follows:

15

ENTITY security type SUBTYPE OF (Attribute SGML); a1 : attr9 type; END ENTITY ; TYPE attr9 type = ENUMERATION OF (con den,public); END TYPE ; ENTITY status type SUBTYPE OF (Attribute SGML); a1 : attr10 type; END ENTITY ; TYPE attr10 type = ENUMERATION OF (draft, nal); END TYPE ; ENTITY re d type SUBTYPE OF (Attribute SGML); a1 : CDATA; END ENTITY ; Each SGML attribute is associated with an element. In order to make the association explicit in the EXPRESS model, attributes are added to the translation of the relevant SGML element. In our example, the SGML attributes generate three further EXPRESS attributes of the Report type entity as follows.

ENTITY Report type SUBTYPE OF (Element SGML);

END

Head1 : Head type; Body2 : Body type; a3 : OPTIONAL Appendix type; security4 : security type; status5 : status type; re d6 : re d type; ENTITY ;

We do not cover the translation of attribute defaults in this paper.

5.6 Translating SGML Entities SGML entities are translated into EXPRESS entities which are subtypes of the abstract supertype Entity SGML. However, the value of a SGML entity is set to a constant in the DTD. We thus represent the value of the SGML entity by de ning a DERIVED attribute, with a dummy name, and declaring that the value of that attribute is set to a constant value. Thus the entity RAL in the example (line 2) is translated to the following. 16

ENTITY RAL SUBTYPE OF (Entity SGML); DERIVED d: STRING : 'Rutherford Appleton Laboratory' END ENTITY ; This may not be the best way to translate entities. For example, it may be more fruitful to translate them as constants.

6 Generating Documents from EXPRESS Having generated the above EXPRESS model for any given DTD, it is straightforward to translate document instances and store them in the database. We then have the task of extracting documents from the database and converting them into a form that can be processed by a document generation system. This is also straightforward and will not be discussed further. Extraction is more dicult when we wish to use data from the EXPRESS model, but not originating from a document. For example, reports documenting technical data on the components of a control system will contain data on those components extracted from an EXPRESS representation of the component, rather than from the representation of the report. In order to construct documents which can refer to other data represented in EXPRESS, we have developed the concept of \prototype documents". These are documents with `holes' embedded in their text which will be lled in by data extracted from EXPRESS entities when the document is transformed into its SGML form. These holes specify which EXPRESS entity is referred to and which attribute of the entity is to be displayed: thus the holes are more properly described as \EXPRESS references". As documents are being represented in SGML, these EXPRESS references are given an SGML syntax as follows which can be added to each DTD.
ExpRef ExpRef

- O ent expid att

EMPTY CDATA #REQUIRED CDATA #REQUIRED CDATA #REQUIRED

>

>

We allow any text in the document to contain EXPRESS references in addition to standard SGML text.
Text

O O

(#PCDATA | ExpRef)*

>

Thus at any point in the text of the document we can include an EXPRESS reference, which is just a start tag with no body (as the EMPTY group declaration is used), and no end tag. Each EXPRESS reference has three attributes which are used to refer into the EXPRESS instance. The rst ent is an entity name giving a type of entity; the second 17

expid is an internal object identi er so that the document can refer to a speci c instance in the data base5, and the third att is an EXPRESS attribute name. All three are required attributes in the SGML tag. Thus for example, in a typical control system, we might have delays, which may be represented by the following partial EXPRESS entity.

ENTITY Delay ; ref num : INTEGER ; timeout : INTEGER ; END ENTITY ; :::

And there maybe several instances of this delay used in the system: delay1 = Delay(1, 30) ; delay2 = Delay(2, 10) ; :::

The technical description of this delay might include the following sentence in a prototype document: Delay number has a time out of seconds.

On generation of the nal document, these EXPRESS references will be expanded out to generate the desired text: Delay number 2 has a time out of 10 seconds.

The expid attribute itself could be a parameter to the document. The document could then be parameterised over any EXPRESS database so that we can have reusable documents. This approach is the rst step on integrating EXPRESS data into reusable documents, and still under development. These problems are discussed further in a draft paper [5].

7 Conclusions In this paper we have presented an architecture for integrating document modelling in SGML with data modelling and storage in EXPRESS. The central mechanism of this integration is a series of translation functions between the languages which allow the representation of one form of data in another. A sketch of these translations has been given in respect of an example document. The translation functions sketched in this paper cover only a subset of the full SGML language. There are further aspects of the language which have yet to consider. These include: 5

We assume that there is a mechanism to make such identi ers available

18

 Attribute Defaults. These are only partially covered in the representation of

DTDs and not at all in the generation of schemas from EXPRESS. More consideration is needed to cover defaults comprehensively.  Entities. We only give the most simple use of SGML entities. There are more sophisticated uses of entities which we do not consider here.  Inclusions and Exclusions. SGML provides a mechanism whereby elements can be freely included within other elements (inclusions), or excluded from elements in given circumstances (exclusions). The above translations do not cover this aspect of the language. Also, as mentioned above, further work is required in the area of extracting data from an EXPRESS model into documents. A prototype implementation of the translation functions described above has been begun in the functional language Standard ML. Work is continuing to extend and improve this implementation The TORUS project is primarily interested in reuse of documents and data. This involves combination, parameterisation and inheritance of documents and sub-documents. This is poorly covered by the current SGML standard and will require further investigation.

Acknowledgements We would like to thank our partners on the TORUS project, and also our colleagues at RAL for help and advise on this paper, particularly Brian Ritchie for his work on extracting documents from EXPRESS data.

References [1] Wey-tyng Chang \Comparison of Two Information Modeling Languages: SGML and EXPRESS." EXPRESS User Group Conference, 1994. [2] \Information Processing - Text and Oce Systems - Standard Generalised Markup Language (SGML)", ISO Standard 8807, 1988. [3] "Product Data Representation and Exchange - Part 11: The EXPRESS Language Reference manual", ISO Standard 10303-11 (TC 184/SC4), 1994. [4] B.M.Matthews and J.C.Bicarregui, \Process Modelling in Control Systems Design" to appear EUG'95, Express User Group Conference, Grenoble, France, October 1995. [5] B. Ritchie \Issues in the Reuse of Documentation through Prototypes." ESPRIT 8080 TORUS Project draft document TORUS/RAL/20/1, April 1995. 19

[6] Wenzel, B., et. al., \Interoperability between STEP and SGML" Swedish Association for CALS White Paper, appeared in Engineering Data Newsletter April and June 1995.

Juan Bicarregui.

Juan Bicarregui has a BSc and MSc in Mathematics from Imperial College and Queen Mary College, London and a PhD in Computer Science from the University of Manchester under the supervision of Prof. C. B. Jones. His research interests are in the application of formal methods and formal notations to software engineering. This work has included the development of a speci cation and proof assistant, Mural, and case studies in the use of VDM and B. He recently co-authored a practitioners guide to proof in VDM. His current interests are in the application and integration of formal notations including EXPRESS, SGML.

Brian Matthews.

Brian Matthews has a BSc in Mathematics and a MSc in the Foundations of Advanced Information Technology. He is currently completing a PhD in equational reasoning at the University of Glasgow. He has been working in the Software Engineering Group of the Rutherford Appleton Laboratory since 1986. He is interested in advanced techniques for software engineering, especially in the areas of formal methods and tools, using both the B notation and equational reasoning, and also in the integration of methods, notations and standards.

20

DTD[[ ] : Document Type De nition -! EXPRESS Schema DTD[[ dtd ] = SCHEMA \dtd-name" ; D[[ dtd ] built-ins END SCHEMA ; D[[ ] : SGML Declarations  EXPRESS Declarations -! EXPRESS Declarations D[[ dec decs ] typs = D[[ decs ] ( L[ dec ] typs ) L[[ ] : SGML Element Declaration  EXPRESS Declarations -! EXPRESS Declarations L[[ ] typs = ENTITY element name type SUBTYPE OF Element SGML ; as END ENTITY ; :: typs where (as,typs ) = G[[ group ] typs 0

0

G[[ ] : SGML Element Group  EXPRESS Declarations -! EXPRESS Attributes  EXPRESS Declarations

21

G[[ elem name ] typs G[[ base type ] typs

= ([elem name : elem name type ;],typs) = ([a name : base type ;],typs) where a name = NewName () G[[ g 1 g 2 g n ] typs = (a::as, typs ) where ([a], typs ) = G[[ g 1 ] typs (as, typs ) = G[[ g 2 g n ] typs G[[ g 1 j j g n ] typs = ([a name : group type],typs) where a name = NewName () (group type,typs ) = T[[ g 1 j j g n ] typs G[[ g 1 & & g n ] typs = ([a name : group type],typs) where a name = NewName () (group type,typs ) = T[[ g 1 & & g n ] typs G[[ g? ] typs = ([a name : OPTIONAL group type],typs ) where a name = NewName () (group type,typs ) = T[[ g ] typs G[[ g* ] typs = ([a name : group type],typs ) where a name = NewName () (group type,typs ) = T[[ g* ] typs G[[ g+ ] typs = ([a name : group type],typs ) where a name = NewName () (group type,typs ) = T[[ g+ ] typs ;

00

; :::;

0

00

0

; :::;

:::

0

:::

:::

0

:::

0

0

0

0

0

0

T[[ ] : SGML Element Group  EXPRESS Declarations -! EXPRESS Type Name  EXPRESS Declarations

22

T[[ elem name ] typs T[[ base type ] typs T[[ g 1 g n ] typs

= (elem name type,typs) = (base type,typs) = (t name, typs ) where t name = NewName () ([at 1 at n ],typs ) = fold ( G[[ ] ) typs [g 1 g n ] typs = typs :: ENTITY t name ; [at 1 at n ] END ENTITY ; T[[ g 1 j j g n ] typs = (t name typs ) where t name = NewName () ([g 1 name g n name], typs ) = fold ( T[[ ] ) typs [g 1 typs = typs :: TYPE t name = SELECT [g 1 name g n name] ; END TYPE ; T[[ g 1 & & g n ] typs = (t name typs ) where t name = NewName () s name = NewName () ([g 1 name g n name], typs ) = fold ( T[[ ] ) typs [g 1 n = # [g 1 name g n name] typs = typs :: TYPE t name = LIST [n :n ] OF s name ; 00

; :::;

0

:::

00

:::

0

:::

00

:::

0

:::

00

0

:::

00

:::

0

:::

:::

00

0

WHERE SIZEOF(QUERY(x SELFjTYPEOF(x ) = g 1 :::

0

0

:::

00

0

0

B[[ ] : SGML AttDec  EXPRESS Declarations -! EXPRESS Attribute  EXPRESS Declarations B[[ attname attvals attdef ] typs = (attname:attname type; , typs ) where ([at 1 at n ],typs = V[[ attvals ] typs typs = typs :: ENTITY attname type ; SUBTYPE OF Attribute SGML ; [at 1 at n ] END ENTITY ; 00

0

:::

00

0

:::

V[[ ] : SGML Attribute Value Group  EXPRESS Declarations -! EXPRESS Attributes  EXPRESS Declarations This is similar to G[[ ] and is omitted here for the time being. N[[ ] : SGML Entity Dec  EXPRESS Declarations -! EXPRESS Declarations N[[ ] typs = typs where typs = typs :: ENTITY entity name ; SUBTYPE OF Entity SGML ; DERIVED d :STRING := 'entity value END ENTITY ; 0

0

24

0