The specification language Z ("zed'), based on set theory, has been used to define a microprocessor based system in a formal notation. The 8-bit Motorola 6800 ...
North-Holland Microprocessing and Microprogramming 21 (1987) 223-230
223
F o r m a l Specification and D o c u m e n t a t i o n of Microprocessor Instruction Sets Jonathan P. Bowen Oxford University Computing Laboratory, Programming Research Group 8-11 Keble Road, Oxford OXl 3QD, England bowen~uk,ac. oxford, prg (JANET) The specification language Z ("zed'), based on set theory, has been used to define a microprocessor based system in a formal notation. The 8-bit Motorola 6800 was chosen as an example because of its simplicity. Memory configuration and interrupts were included. This paper presents parts of this specification. The use of a formal description language allows the possibility of verification of the instruction set. Z could also be used at the design stage. Additionally, the use of Z combined with informal text is sufficently readable for the specification to be used for documentation purposes. With the experience gained, a more complicated or a completely new microprocessor could now be attempted.
1. Introduction 1.1. State of the art Currently, computer instruction sets are normally documented using tables, semi-formal formulae and informal text. This is designed to be easily assimilated by the reader. However it is often incomplete or ambiguous because the informality of such methods does not help to check the correctness or completeness of the description. Instruction sets have already been verified using a formal description [1,2]. However these are normally not particularly readable, but they do allow formal reasoning about the operation of the processor. Attempts to use a more formal description in a readable form have often resorted to informality in the more difficult areas [3]. 1.2. T h e p r o b l e m Having two (or more) incompatable descriptions of an instruction set (or anything else for that matter) is bound to lead to the possibility of errors. This paper attempts to show that an instruction set may be described formally in a manner which is suitable for design, verification and documentation. This means that only one description need be produced. 1.3. A solution The specification language Z [4-8], based on set theory and developed at the Programming Research Group
at Oxford University, has been used to define a microprocessor based system in a formal notation. The complete specification has been published elsewhere [9]. The Motorola 6800 8-bit microprocessor [10,111 was chosen for this exercise because its relative simplicity allowed the entire instruction set to be covered in a short period of time. Memory configuration and interrupts were included in the specification. It was felt that a complete microprocessor instruction set should be attempted in order to detect any possible weaknesses in the use of Z for such a task. A processor such as one of the 68000 family was deliberately not selected for an initial attempt at such a specification since its greater complexity would either require a good deal more work, or for many features not to be included. Some of the material developed would be generally useful for the specification of any microprocessor based system. Hence any subsequent specifications could draw on this groundwork. The strategy used is to first consider the state of a microprocessor based system in several parts (e.g. system clock, memory, reglsters), covering static conditions and then to consider changes in state in each case. These are combined together so that changes in state of the entire system when an instruction is executed or an interrupt occurs can be specified. Finally, all the possible changes of state may be combined together to produce a complete specification.
224
J.P. Bowen / Formal Documentation o f Microprocessor Instruction Sets
1.4. Formal notation
operations on words. We can overload these bitwise operators to apply to words as well as bits.
The rest of this paper makes use of the Z specification language to illustrate the way microprocessors may be formally documented at the instruction level. Readers with a knowledge of set theory and predicate calculus should find the notation reasonably easy to follow. Others may prefer to do some initial background reading. [8] is the first generally available book on Z. The Z language has two levels: the mathema$icai and the schema notation. The latter allows structuring of the former. This is necessary if specifications of any size are to remain understandable. Declarations and predicates may be combined into schema boxes which in turn may be combined using schema operators. A brief glossary of the Z notation is included as an appendix at the end of the paper.
2.3. S h i f t functions A word may be shifted left (__). In this case, the bottom (LSB) or top (MSB) bit of the word can attain a certain value, depending on the type of shift (e.g. arithmetic, logical or rotation). For example, _>>_ adds a supplied blt value to the top bit position and shifts the rest of the bits in a word down by one position.
_>>_
: (Bit
x Word) ~
Word
V b : B i t ; w:Word • b >> w = {#w-l~-~b}U(succlw)
2.4. A r i t h m e t i c functions 2. Basic Concepts 2.1. Word organisation Microprocessors manipulate bits which are organised into words. By convention, bit positions are numbered from zero up. We make the following syntactic definitions of two sets:
Microprocessors normally allow arithmetic operations. For example, a word m a y be incremented or decremented. The result wraps around if there is overflow or underflow in each case. This m a y be generallsed for addition ( + _ ) and subtraction (_-_) by repeatedly incrementing or decrementing a word.
2.5. Test conditions
Bit
-:
{0, I}
Word
Q
{w:N-w~BitlwCQAdomw=O..#w-l}
A bit is just a 0 or a 1. A word is modelled as a finite function from natural numbers to bits which is nonempty and has a domain ranging contiguously from zero up to the top bit position. 2.2. Bitwise functions Bitwise logical functions involve individual bits. A bit may be complemented. We can also AND, (inclusive) OR and (exclusive) XOR pairs of bits by providing the relevant truth table in each case. For the infix AND operator ( _ * ) , we can make a globally available definition using an ax/omatic schema:
_*_ •
: (Bit x Bit)
= {(0,0)~0,
"-) B i t
(0, I)~0,
Often we wish to test whether a word has a zero value, returning a '1' if it has and a '0' if not: z e r o : Word --> B i t
V w : Word
*
ran w = {0}
==~ z e r o w = I ^
ran w ~ {0)
~
zero w = 0
We can define similar functions for testing of negative values and so on. 2.6. Hexadecimal notation Most microprocessor documentation uses hexadecimal values for op-codes, addresses, etc., since this notation m a y easily be converted to the corresponding bit pattern. Each hexadecimal digit drawn from a set of characters (CHAR) maps to a numerical value:
( I, 0)~--)0, (I, 1 )~---)I} [CHAR]
[The underlines indicate the position of operands.] NOT (~), OR (_~._) and XOR ( ® _ ) may be defined similarly. Most microprocessors allow bltwise logical
hex : CHAR -H~ N hex = { ' 0 " ~
0,'i'~->
l,...etc.,'F'~IS}
225
J.P. Bowen /Formal Documentation o f Microprocessor Instruction Sets
We can define a recursive function which handles a sequence of hexadecimal digits (i.e. a hexadecimal number). We shall employ the widely used notation of prefixing 0x to the hexadecimal string. Ox : (seq CHAR) - ~ N
that ROM (Read Only Memory) and RAM (Random Access Memory) make up the available real memory. These two areas do not overlap. I Memory Mem : Address ~ Byte ROM, RAM : E Address
Ox = 0
V s : s e q l CHAR I ren s • dom hex • Oxs = 1 6 * O x ( f r o n t s) + h e x ( l a s t s)
3. System State The 6800 operates on 8-bit bytes of data and 16-bit addresses: Byte Address
~ Q
{ w:Word I #w = 8 } { w:Word I #w = 16 }
3.1. S y s t e m clock The system contains a clock which controls its timing. This consists of a sequence of pulses which may be modelled as the number of clock pulses which have occurred since power-up. Clock
~
[ Clk : N ]
An operation such as an instructlon o r i n t e r r u p t t a k e s a certain number ofclock cyclesto execute: AClock Clock Clock' Cycles : N
I
Clk' = Clk+Cycles I
[Note t h a t / l is used to indicate a change of state by convention. The before state is undashed (C1 ock) and the a f t e r state is dashed (C1 o c k ' ). Cucl e s is left undefined and this stage in the specification. Its value will depend on the particular operation being executed. Schemas with before and after states may be considered to denote atomic operations.]
,RAM n ROM = 0
i
The memory may be updated by operations such as instructions and interrupts. In this case, the ROM and RAM areas do not change. The RAM contents may be partially updated by an instruction or interrupt. Areas outside valid ROM and RAM may vary unpredictably and are thus left undefined by this specification. The values in ROM do not vary. Only values in RAM may be updated reliably. AMemory Memory Memory' 8Mem : Address -~ Byte ROM' = ROM RAM' = RAM ROM ~ Mem' = ROM 4 Mem RAM 4 Mem' = RAM 4 (Mem • SMem)
The assumptions about the memory are not strictly true in all cases. For example, it is possible to have software switchable banked memory. However they hold for the majority of simple systems. In practice areas outside ROM and RAM may be used for memory mapped I/O. This is not covered here since it is very system dependent, but it could be considered at this point if required.
3.3. Registers The 6800 has a number of registers which we shall denote as a Z data type. These registers are either 16bit or 8-blt, and two of them are accumulators. Regs : : = A I B I CCR I PCH I PCL t PC I SP H
I
SP L
I
SP t X H I X L I X
3.2. M e m o r y The address space of the 6800 (and many other microprocessors) may be considered as a total function from A d d r e s s e s to Bytes. We shall assume
Regs16 ~ { PC, SP, X } Regs 8 ~ Regs \ RegslG Accumulator Q { A, B }
226
,1.,o. Bowen / Formal Documentation of Microprocessor Instruction Sets
The Condition Code Register (CCR) holds various single bit codes at different bit positions. These are the carry (C ~ 0), overflow (Y ~ 1), zero (Z ~ 2), negative (N ~ 3), interrupt mask (I ~ 4) and half-carry (H ~ 51 bits. The 8-bit registers always contain byte values and the 16-bit registers always contain address values. The low and high bytes of the PC, SP and X registers concatenate to form 16-bit registers. The top two bits of the CCR are unused and are always set to 1. Registers Reg : Regs ~
Word
Reg~RegssD c Byte Reg~Regal6D c Address Reg(PC) = Reg(PCL)^Reg(PC H) Reg(SP) = Reg(SPL)AReg(SPH) Reg(X) = Reg(XL)AReg(XH) {6~--)I,7~I} c Reg(CCR)
There are various types of 6800 addressing modes. Additionally, the 6800 may respond to an external I n t e r r u p t or execute an I11 egal instruction.
Immediate I Direct I Indexed I Extended I Inherent I Relative I Interrupt I Illegal
Modes ::=
When an instruction is executed, the op-code is read from the memory location indicated by the current value of the Program Counter. The instruction will have a particular addressing mode. Note that this is actually redundant information, but can aid in the understanding of the specification. The state of the system will change when the instruction has executed: ASystem AMemory AReg i stera
I
AClock Op • O..255 Mode
: Modes
Op = val Mem(Reg(PC)) Any of the registers may be updated by an instruction (or interrupt). Every instruction consists of a number of bytes (NBytes). Normally the next instruction to be executed is the instruction following the current instructlon. Thls may be overridden, for example by a branch instruction. Individual CCR bits (apart from the top two bits) may be updated by the instruction depending on the result of the operation. .
&Registers Registers Registers' NBytes : N Next : Address SReg : Regs "4-> Word $CCR : ( 0 . . 7 ) -~ Bit
Next = Reg(PC) +NBytes Reg' = Reg • {PC~--)Next} $ SReg • {CCR~-~Reg(CCR)®({B,7}~$CCR)}
I
(val converts a word to a numerical value.) 3.5. Power-up The clock starts from zero for convenience in this model, when the system is initialised (i.e. powered up). It is assumed that the ROM already holds the program to be executed. Interrupts are disabled and the Program Counter is loaded from the top two locations in memory (normally in ROM).
InitSystem System'
I
Clk' = 0 I~-~l • Reg'(CCR) Reg'(PCH) = Mem'(addr OxFFFE) Reg'(PCL) = Mem'(addr OxFFFF) l
(addr convertsa numericalvalueto an address.) 3.4. Complete system 4. I n t e r r u p t s The system state consists of the clock, memory and registers. We can shnply combine the previously defined state schemas to specify the complete state:
System ~ Clock A Memory A Registers
When an interrupt occurs all the 6800 registers are saved on the stack. Program control is transferred to a new address specified by the contents of memory at a particular vector address. The interrupt mask bit is
J.P. Bowen / Formal Documentation of Microprocessor Instruction Sets
set in the Condition Code Register. This information is defined by a framing schema (denoted by ~) which may be used in the subsequent definitions of particular interrupts:
of cycles (Cgc1esBase). The information above may be combined together in a framing schema for use when defining addressing modes.
~Interrupt ASgstem Vector : 0..65535
~AddrMode ASyetem M : Address
SMem = { Mem(Reg(SP)-B)~-~Reg(CCR), ...etc. for allthe otherregisters} SReg = { PCH~-~Mem(addr(Vector)), PCL~-)Mem(addr(Vector+l)), SP ~-)Reg(SP)-7 }
8CCR = { I~-~l }
The hardware interrupt (IRQ) can only be activated if the CCR interrupt mask bit is clear: IRQ ~Interrupt
227
I
I~-~0 • Reg(CCR) Vector = OxFFF8 NBgtes = 0 Mode = Interrupt I
Note that the number of clock cycles taken to execute the interrupt has been left unspecified since this was not found in the documentation used [10,111. 5. Instructions
All microprocessors have a set of instructions which can affect the registers and/or the memory using a variety of addressing modes.
I
OpBese : 0 . . 2 5 5 CgclesBase : N
'
I
For example, immediate mode addressing gives the address of the byte immediately following the instruction op-code byte: ~Immediate ~AddrMode Mode = Immediate Op = OpBase- Ox30 M = Reg(PC)+I NBytes = 3 CyclesBase = 2
Framing schemas for other modes may be defined ci~;larly and then combined for families of b~::/'ructions which use similar addressing modes. 6.2. E x a m p l e instruction
As an example of the final specification of an individual instruction, the LDA instruction loads an accumulator from an address in memory: _ LDA ¢/X)oubI e
5.1. Addressing m o d e s
Many of the 6800 instructions use a selection of memory addressing modes. Each has a memory address (M) of an operand calculated in a manner depending on the addressing mode. The op-code for a given addressing mode is always a constant offset from some base op-code (0pBase) for a particular instruction. The value of 0pease is specified in subsequent schemas defining specific instructions. The number of clock cycles which an instruction takes to execute also depends on the addressing mode. Again this is easily calculated from some base number
OpBaseA = OxB6 Result = Mem(M) $CCR = { N~eMSB(Result), Z~zero(Result),
V~O }
The base op-code using accumulator A is 0xB6. The result is taken from the addressed location in memory. The N, Z and V condition codes are set appropriately. All the other instructions are defined similarly.
228
J.P. Bowen / Formal Documentation of Microprocessor Instruction Sets
6. Overall Operation
Op-codes which are not explicitly specified are considered illegal. The state of the system after the execution of such an op-code is undefined. 111egalOp
9
ASgstem J Mode = 111egal
This specification could be tightened if more were known about an illegal instruction. For example, at present this specification allows the contents of the registers and RAM to be entirely changed after an illegal instruction. If more information were available, extra predicates could be added here. All the legal instructions and interrupts may be combined. LegalOp Extlnterrupt
Q LDA V ...etc. Q IRQ V ...etc.
Each change of state of the 6800 consists of the execution of an instruction (legal or otherwise) or an external interrupt: Exec ~ (lllegalOpeLegalOp)VExtlnterrupt When the 6800 is started, a sequence of such operations is executed depending on the contents of memory and (non-determlnistically in this specification) on the occurence of external interrupts. If required we could add inputs to the specification (corresponding to interrupt lines} to determine when interrupts occur. Given the specification of each of the instructions, it is possible to consider sequences of instructions and prove (in the absence of any external interrupts) properties of such sequences. At present thls must be done manually in Z, but computer based tools are currently being developed at the PRG which should aid such proofs.
7. Conclusion
The entire instruction set of the Motorola 6800 microprocessor has been formally specified in a readable form [9]. Enough experience has been gained so that more complicated modern microprocessors could be specified in a similar manner. Such processors would require a larger document and more work in order to cover them fully. However the extra work should be no more than that involved if more informal methods were used.
During the summer of 1087, I shall be supervising two MSc projects to specify parts of the 68000 and the INMOS transputer respectively in Z. These should help to give an indication of the suitability of Z for larger processors. A semi-formal specification of the transputer, ]n a style similar to Z, is already available from INMOS [3]. One of the problems with producing a correct and complete description of a microprocessor is the possible complexity of the description. Therefore it is important to introduce easily assimilated concepts which may be used succinctly in subsequent parts of the speclfication. Each piece of formal text (usually a schema in the Z notation} should be readily understandable (assuming that the reader is conversant with Z). In this way, the specification may be layered to aid readability of the final specification. Z is an excellent tool for specifying state based systems. However it is not so suitable for specifying concurrency and communication protocols. Other specifications languages, such as CSP [121, are better for such tasks. In the future, it is possible that features of CSP may be combined with Z to build on the strengths of each. The initial specification of the instructions have been compacted as much as is compatible with readability to reduce the overall length of the specification. If Z where to be used to present an instruction set in the form of a manual, it is anticipated that each instruction would be allocated about a page with an expanded specification allowing easy reference for the instruction on that page alone. A possible example layout has been devised, and pages for some instructions have been produced. An example is given in Appendix A. Z has proved to be a good notation for specifying a microprocessor instruction set. The length of the specification is very favourable with the more informal methods currently used for instruction set documentation in industry and elsewhere. Not only that, but we also galn a means of formally reasoning about the properties of the instruction set. This could prove to be invaluable, especially at the design stage. In the future, computer-based tools should be available to check consistency and give assistance with proofs. A type checker, which should be available soon, would help to remove many of the inevitable minor errors in a specification of any size. It is hoped that industry will adopt Z and/or similar methods in due course.
J.P. Bowen /Formal Documentation of Microprocessor Instruction Sets
Acknowledgements
Appendix A
I would like to thank R. Gimson, T. Gleeson, B. Monahan, C. Morgan, K. Paliwoda, B. Sufrin and S. Topp-Jorgensen at the PRG, who checked the original formal specification at various stages. R. Chandra (Intel, California), S. Heath (Motorola, UK), R. Macdonald (RSRE, Malvern) and S. Murrell (University of Miami) also provided useful comments.
Example manual page
The financial support of the Science and Engineering Research Council is gratefully acknowledged.
229
An example layout for an instruction in a 6800 microprocessor instruction set manual is given below. It is suggested that each instruction should be given a page like this in such a manual to allow quick reference for a particular instruction without the necessity for cross reference, once the framework of the specification has been assimilated by the reader.
Branch ffGreater Than zero References [1]
Hunt, W. A., FM8501: A Verified Microprocessor, Technical Report 47 (Institute for Computing Science, The University of Texas at Austin, 1986).
[2]
Geser, A., A Specification of the Intel 8085 Microprocessor: A Case Study, MIP-8608
Operation BGT AM6800 Cond : Bit Op
Sufrin, B. A. (ed.), Z Handbook, Draft 1.1 (PRG, Oxford University, 1986).
[5]
Spivey, J. M., Understanding Z: A Specification Language and its Formal Semantics, DPhil Thesis (PRG, Oxford University, 1986).
[6]
Woodcock, J., Structuring Specifications - Notes on the Schema Notation (PRG, Oxford University, 1986).
[7]
King, S., Scrensen, I. and Woodcock, J., Z: Concrete and Abstract Syntaxes, Version 1.0 (PRG, Oxford University, 1987).
[8] [9]
Hayes, I. J. (ed.), Specification Case Studies (Prentice-Hall, 1987). Bowen, J. P., The Formal Specification of a Microprocessor Instruction Set, PRG-60 monograph (PRG, Oxford University, 1987).
[10] M6800 Microprocessor Programming Manual (Motorola, 1975). [11] M6800 Microprocessor Instruction Set Summary
(Motorola). [12] Hoare, C. A. R., Communicating Sequential Processes (Prentice-Hall, 1985).
2
NBytes
=
Cycles
= 4
Cond = Zcc~(NcceVcc)
Shepherd, D., Register level specification of the transputer instruction set, in: The transputer instruction set - a compiler writer's guide (INMOS, 1987)pp. 115-153.
[4]
= Ox2E
Mode = R e l a t i v e
(Universitat Passau, Fakultat fur Ma~hematik und Informatik, 1980). {3]
BGT
Cond = 0 =~ t R e g = 0 Cond = 1 ==~ t R e g = {PC~Next!Mem(Reg(PC)+l)}
8CCR = 0 6Mem = 0
Description Causes a branch if Z is set or one of N and Y (but not both) is set....etc.
Appendix B
Brief glossary of Z a. b
identifiers
f, g
X, y
terms
m, n
p, q A, B Q, R d, e
predicates s, t sets S, T relations N declarations (e.g. e:
functions numbers
sequences schemas set of numbers A ;b, ...: B...)
Definitions x ~ g Syntactic definition v : : =b I... Data type definition [a ] Introduction of a given set _e _ Infix operator (_ s for postfix, etc.)
230
J.P. Bowen /Formal Documentation of Microprocessor Instruction Sets
Logic
A -~B
"p pAq pvq p==)q p(=)q
f(x)
Vdlp.q 3dlp.q 31dlp'q
Logical negation Logical conjunction Logical disjunction Logical implication (~p v q) Logical equivalence (p ~ q ^ q =) p) Universal quantification (or Vd • q) Existential quantification (or 3 d • q) Unique existential quantification
Sets
f®g Numbers N SUCC
n
m + n m
-
N
m
~
N
m~n
xffiy x:gg xGA yea il Ae_B
AcB
(x, y....} (dip.e} (x, y,...) AxBx... P^ FA ^riB AuB A\B #A
Equality of terms Inequality (~ ( x = y ) ) Set membership Non-membership (- (x e y) ) Empty set Set inclusion Strictset inclusion(Ac_B ^ A#B) Set of elements Set comprehension (or (d Ip)) Ordered tuple Cartesian product Power set (setof subsets) Set of finitesubsets Set intersection Set union Set difference Size of a finite set
m,.n
a ~--~ b
dora R ran
R
ida QSR ^~R A~R ^bR AI~R R~A} R-! R" R+
Rn
seq A seql A
B
Set of finite sequences Set of non-empty finite sequences Empty sequence Sequence (l~--)x, Y~--)g, ...) Sequence concatenation First element of sequence (s ( 1 ) ) All but head of a sequence Last element of sequence (s (#s) ) All but last element of a sequence Sequence restriction
S c h e m a notation
Vertical schema definition. New linesdenote ~;" and "^". The schema name and predicatesare optional.The schema may subsequently be referenced by name.
Relation (Q P AxB) Maplet (Q (a, b)) Domain of relation Range of relation Identity function Foward relationalcomposition Domain restriction Domain corestriction Range restriction Range corestriction Relational image Inverse of relation Reflexive transitiveclosure Transitive closure Relation composed n times
S;d S[a/b, ...] SAT SVT
Partial function Total function
pre S SeT SIT
P S~[dlp]
[T;...I...] z.a
OS S'
Sip
s\(a .... )
Functions
Set of natural numbers {0, 1, 2, ...) Successor function { 0 ~ 1, 1,--#2, ...) Addition (succ n m) Subtraction ((succ -1) n m) Multiplication ((_+m) n O) Less than or equal (_