DATA STRUCTURES AND GRAPH GRAMMARS

P.L. Della Vigna, C. Ghezzi
Istituto di Elettrotecnica ed Elettronica
Politecnico di Milano - Piazza L. da Vinci 32, 20133 Milano - Italy
ABSTRACT

This paper is concerned with a formal model for data structure definition: data graph grammars (DGG's). The model is claimed to give a rigorous documentation of data structures and to suit very properly program design via stepwise refinement. Moreover it is possible to verify data structure correctness, with regard to their formal definition. Last, attribute context-free data graph grammars (A-CF-DGG's) are introduced. A-CF-DGG's not only give a complete and clean description of data structures and algorithms running along data structures, but also can support an automatic synthesis of such algorithms.

KEY WORDS AND PHRASES

Data structure, correctness, abstraction, stepwise refinement, program synthesis, software reliability, context-free grammars, attribute grammars, parsing.
1. INTRODUCTION

Programming methodologies which can help in designing correct, easily modifiable, readable and portable software have become an important topic in computer science. A widely accepted principle is that the quality of software can be considerably improved if the programmer can express his tasks in a free and natural way, without being concerned with details of the machine, which could force him to tailor his solution to some unnatural or unessential features. Very high-level languages are an ambitious answer to these problems, but it has been argued that they cannot exhaust all the needs of programmers. Moreover the serious problems of optimization which arise have not yet received a solution which yields code of good quality.

Another attractive attack on this problem consists in successively composing a solution through "levels of abstraction". This means that the solution is initially specified by using an abstract machine whose operations and data are tailored to the problem to be solved. Whenever an abstraction is not directly supported by the language, it is recursively detailed until a level is reached which is directly supported by the system.

We feel that programming through levels of abstraction should not only be considered as a general philosophy to be divulged to non-believers, but should also inspire the design of computer-aided program development systems which allow one to test, measure and modify programs at each stage of their stepwise refinement.
Our research effort is presently in this area.

Quoting Liskov /1/, two kinds of abstraction are recognized to be useful in writing programs: "abstract operations and abstract data types. Abstract operations are naturally represented by subroutines or procedures, which permits them to be used abstractly (without knowledge of details of implementation). However, a program representation for abstract data types is not so obvious; the ordinary representation, a description of the way the objects of the type will occupy storage, forces the user of the type to be aware of implementation information".

These principles have inspired the definition and implementation of the CLU programming language/system, which is one of the most interesting research efforts towards the definition of a programming system supporting structured programming /2/ and modularity /3/.
We present here another model for data structures definition, abstraction and refinement, which is based on graph grammars. In particular, we will show how the model can be used for clean documentation of the project and how it can support a computer assisted design of data structures, resulting in a considerable improvement of program reliability. The reader who is interested in this topic is invited to read some related works which have appeared in the literature (/4/, /5/, /6/).
2. DATA GRAPH GRAMMARS

A data structure can be viewed abstractly as a set of objects connected by a network of access paths. Thus we can formally define a data graph over $\Sigma$ (the node alphabet) and $\Delta$ (the link alphabet) as a triplet $D = (N, \varphi, \psi)$, where $N$ is the set of nodes, and $\varphi : N \to \Sigma$ and $\psi \subseteq N \times \Delta \times N$ are the node and link labelling functions respectively. Let $\mathcal{D} = \{D \mid D \text{ is a data graph over } \Sigma, \Delta\}$; a data graph language over $\Sigma$, $\Delta$ is a subset of $\mathcal{D}$. Two data graphs $D = (N_D, \varphi_D, \psi_D)$ and $F = (N_F, \varphi_F, \psi_F)$ are equivalent ($D \equiv F$) if a one-to-one equivalence function $e : N_D \to N_F$ can be found such that
1) $\varphi_D(n) = \varphi_F(e(n))$, $\forall n \in N_D$
2) $(n_1, a, n_2) \in \psi_D$ iff $(e(n_1), a, e(n_2)) \in \psi_F$.

Languages of graphs, as an extension of the well-known languages of strings, have been studied by researchers in pattern recognition in a number of papers (/7/, /8/, /9/, /10/, /11/). Applications to language definition and translation are explained in /12/ by Pratt, from whom we borrow some formalism. Even if it appears that the theory of graph grammars may be developed along lines similar to the theory of string grammars, many problems remain yet to be studied; for example:
a) connections with graph-automata;
b) parsing;
c) definition of meaningful restricted classes of grammars.
We shall consider here mainly context-free graph grammars, without trying to give any answers to the questions above, for which we refer to /10/, /11/. We shall rather restrict our attention to their use as a tool for data definition.

Let $D = (N, \varphi, \psi)$ and $n_i, n_j \in N$; the interpretation of $\varphi(n_i) = X_i$ and $\varphi(n_j) = X_j$ is that objects $n_i$ and $n_j$ are of type $X_i$ and $X_j$ respectively. $(n_i, Y, n_j) \in \psi$ means that object $n_j$ can be accessed by $n_i$ following the access link $Y$. Links should not be considered as pointers, just as nodes do not represent memory locations; rather they are abstract ways of referencing objects whose definition can be recursively given in terms of other links. In practice, for example, links could represent a simple reference or even a search algorithm.

A top-down design of a data structure should be considered as a set of operations which recursively detail the description of types and links. In particular, we shall concentrate here on data type refinements: link refinements could also be taken into account with minor changes to the model. Node type refinements are represented here as production rules which describe the structure of a type in terms of lower level component data types.

Formally, a data graph grammar DGG is a 5-tuple $G = (\Sigma_n, \Sigma_t, \Delta, S, R)$, where the nonterminal node alphabet $\Sigma_n$, the terminal node alphabet $\Sigma_t$ ($\Sigma = \Sigma_t \cup \Sigma_n$ is the total alphabet) and the link alphabet $\Delta$ are finite non-empty mutually disjoint sets, $S \in \Sigma_n$ is the starter (the axiom) and $R$ is the set of production rules. Each element $r \in R$ is a 5-tuple $r = (A, D, I, O, W)$ such that
1) $A \in \Sigma_n$
2) $D = (N, \varphi, \psi)$ is a connected (o) graph over $\Sigma$ and $\Delta$
3) $I \in N$ is the input node
4) $O \in N$ is the output node
5) $W \subseteq N$.
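The definitions of data graph, connectedness and equivalence given in this section can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class name `DataGraph`, the method `is_connected` and the function `equivalent` are hypothetical names that do not come from the paper, and the equivalence test simply tries every one-to-one mapping, which is adequate only for very small graphs.

```python
from itertools import permutations


class DataGraph:
    """A data graph D = (N, phi, psi): labelled nodes and labelled links."""

    def __init__(self, labels, links):
        self.labels = dict(labels)   # phi: node -> node label
        self.links = set(links)      # psi: set of (source, link label, target)
        for n1, _, n2 in self.links:
            if n1 not in self.labels or n2 not in self.labels:
                raise ValueError("link endpoint is not a node")

    def is_connected(self):
        """True if the associated undirected graph is connected."""
        if not self.labels:
            return True
        neighbours = {n: set() for n in self.labels}
        for n1, _, n2 in self.links:
            neighbours[n1].add(n2)
            neighbours[n2].add(n1)
        seen, stack = set(), [next(iter(self.labels))]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(neighbours[n] - seen)
        return len(seen) == len(self.labels)


def equivalent(d, f):
    """Brute-force search for an equivalence function e : N_D -> N_F."""
    if len(d.labels) != len(f.labels):
        return False
    d_nodes = list(d.labels)
    for image in permutations(f.labels):
        e = dict(zip(d_nodes, image))
        same_labels = all(d.labels[n] == f.labels[e[n]] for n in d_nodes)
        same_links = {(e[a], x, e[b]) for a, x, b in d.links} == f.links
        if same_labels and same_links:
            return True
    return False
```

Because `e` is one-to-one, comparing the mapped link set with `f.links` for equality checks both directions of condition 2 at once.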
Before defining how productions are used to derive data graphs, we introduce the operation join which, applied to two graphs, gives a set of graphs as result. Let $D_1 = (N_1, \varphi_1, \psi_1)$, $D_2 = (N_2, \varphi_2, \psi_2)$ be graphs over $\Sigma$, $\Delta$ and $\Omega_2$ a (possibly empty) subset of $N_2$. $D' \in join(D_1, D_2, \Omega_2)$ if $D'$ is equivalent to a graph $D = (N, \varphi, \psi)$ such that:

1) $N = M_1 \cup M_2 \cup M_3$, where $M_1$, $M_2$, $M_3$ and $M_4$ are mutually disjoint sets such that $N_1 = M_1 \cup M_2$, $N_2 = M_3 \cup M_4$, $\Omega_2 \subseteq M_3$, and $q : N_2 \to M_2 \cup M_3$ is a surjective application such that
   a) $q(n) = n$, $\forall n \in M_3$
   b) $q(n) \in M_2$ and $\varphi_2(n) = \varphi_1(q(n))$, $\forall n \in M_4$
2) a) $\varphi(n) = \varphi_1(n)$, $\forall n \in N_1$
   b) $\varphi(n) = \varphi_2(n)$, $\forall n \in M_3$
3) $(n, a, m) \in \psi$ iff $n, m \in N_1$ and $(n, a, m) \in \psi_1$, or $n', m' \in N_2$ can be found such that $n = q(n')$, $m = q(m')$ and $(n', a, m') \in \psi_2$.

Intuitively, $join(D_1, D_2, \Omega_2)$ can be viewed as a juxtaposition of $D_1$ and $D_2$ where nodes of $D_2$ not in $\Omega_2$ can be identified with nodes in $D_1$ with the same label.

For each resulting graph $D'$ of join, if $e : N \to N'$ is the equivalence function satisfying the conditions of graph equivalence, the joint function $j : N_1 \cup N_2 \to N'$ is defined as follows:
1) $j(n) = e(n)$, $\forall n \in N_1$
2) $j(n) = e(q(n))$, $\forall n \in N_2$.

The operation join is [...] ([...]-join) if $\Omega_2 = N_2$. In such a case the result of the [...]-join operation is a single graph formed by the pair of graphs $D_1$ and $D_2$.

(o) Let $D' = (N, \varphi, \psi')$ be the undirected graph associated to $D$, such that $\psi' = \{(n_1, a, n_2) \mid (n_1, a, n_2) \in \psi \lor (n_2, a, n_1) \in \psi\}$: $D$ is connected if $D'$ is connected.
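As an illustration of the join operation, the following sketch enumerates one representative graph for every admissible choice of the application q. It is an assumption-laden reading of the definition, not the authors' procedure: graphs are plain `(labels, links)` pairs, the function name `join` and the parameter name `omega2` are chosen here for illustration, nodes are tagged with 1 or 2 to keep the two node sets disjoint, and the enumeration may yield several graphs that are equivalent to each other.

```python
from itertools import product


def join(d1, d2, omega2):
    """Yield (labels, links) representatives of join(D1, D2, omega2).

    d1, d2 : (labels, links) pairs, labels being a dict node -> label and
             links a set of (source, link label, target) triples.
    omega2 : nodes of D2 that must NOT be identified with nodes of D1.
    """
    labels1, links1 = d1
    labels2, links2 = d2
    # Nodes of D2 outside omega2 may be identified with nodes of D1.
    free = [n for n in labels2 if n not in omega2]
    # Each free node either stays a new node, tagged (2, n), or is mapped
    # onto a node of D1, tagged (1, m), that carries the same label.
    options = [
        [(2, n)] + [(1, m) for m in labels1 if labels1[m] == labels2[n]]
        for n in free
    ]
    for choice in product(*options):
        q = {n: (2, n) for n in labels2}      # default: q(n) = n (set M3)
        q.update(zip(free, choice))           # identified nodes (set M4)
        labels = {(1, n): lab for n, lab in labels1.items()}
        labels.update({q[n]: labels2[n] for n in labels2})
        links = {((1, a), x, (1, b)) for a, x, b in links1}
        links |= {(q[a], x, q[b]) for a, x, b in links2}
        yield labels, links
```

With `omega2` equal to the whole node set of D2 no identification is possible and exactly one graph is produced, the juxtaposition of the two operands.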
The derivation set $Y(G)$ defined by the data graph grammar $G$ is a set of graphs over $\Sigma$ and $\Delta$ which can be recursively defined as follows:

i. $Y(G)$ contains all the graphs equivalent to $D_0$ (the start graph) $= (\{n\}, \varphi, \varepsilon)$, where $\varphi(n) = S$ and $\varepsilon$ is the empty link labelling function.

ii. Let $D_1 = (N_1, \varphi_1, \psi_1) \in Y(G)$, $\bar{n} \in N_1$, $\varphi_1(\bar{n}) = A \in \Sigma_n$, $(A, D_2, I, O, W) \in R$, $D_2 = (N_2, \varphi_2, \psi_2)$. $Y(G)$ contains also the graphs $D'$ equivalent to $D = (N, \varphi, \psi)$ constructed as follows:

1. let $D_1' = (N_1', \varphi_1', \psi_1')$, where
   a) $N_1' = (N_1 - \{\bar{n}\}) \cup \{n_I, n_O\}$, $n_I, n_O \notin N_1$
   b) - $\varphi_1'(n) = \varphi_1(n)$, $\forall n \in N_1 - \{\bar{n}\}$
      - $\varphi_1'(n_I) = X$, $X \notin \Sigma$
      - $\varphi_1'(n_O) = \bar{X}$, $\bar{X} \notin \Sigma$
   c) - $(n_1, a, n_2) \in \psi_1'$, $\forall n_1, n_2 \in N_1 - \{\bar{n}\}$ such that $(n_1, a, n_2) \in \psi_1$
      - $(n_O, a, n_I) \in \psi_1'$ if $(\bar{n}, a, \bar{n}) \in \psi_1$
      - $(n, a, n_I) \in \psi_1'$, $\forall n \in N_1 - \{\bar{n}\}$ such that $(n, a, \bar{n}) \in \psi_1$
      - $(n_O, a, n) \in \psi_1'$, $\forall n \in N_1 - \{\bar{n}\}$ such that $(\bar{n}, a, n) \in \psi_1$

2. let $D_1'' = (N_1'', \varphi_1'', \psi_1'') \in join(D_1', D_2, W)$

3. if $j$ is the joint function $j : N_1' \cup N_2 \to N_1''$, then
   a) $N = N_1'' - \{j(n_I), j(n_O)\}$
   b) $\varphi(n) = \varphi_1''(n)$, $\forall n \in N_1'' - \{j(n_I), j(n_O)\}$
   c) - $(n_1, a, n_2) \in \psi$, $\forall n_1, n_2 \in N_1'' - \{j(n_I), j(n_O)\}$ such that $(n_1, a, n_2) \in \psi_1''$
      - $(n, a, j(I)) \in \psi$, $\forall n \in N_1'' - \{j(n_I), j(n_O)\}$ such that $(n, a, j(n_I)) \in \psi_1''$
      - $(j(O), a, n) \in \psi$, $\forall n \in N_1'' - \{j(n_I), j(n_O)\}$ such that $(j(n_O), a, n) \in \psi_1''$
      - $(j(O), a, j(I)) \in \psi$ if $(j(n_O), a, j(n_I)) \in \psi_1''$
In general, the application of a rule to a graph $D$ in $Y(G)$ gives a result which depends on $D$, i.e. the operation is context dependent. A DGG is a context-free data graph grammar (CF-DGG) if all the rules $(A, D, I, O, W)$, where $D = (N, \varphi, \psi)$, are such that $W = N$. The data graph language (DGL) defined by a grammar $G$ is:

$L(G) = \{H \mid H = (N_H, \varphi_H, \psi_H) \in Y(G) \land \varphi_H(n) \in \Sigma_t \ \forall n \in N_H\}$
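For the context-free case ($W = N$) no node of the right-hand side can be identified with an existing node, so a derivation step reduces to replacing the rewritten node by a fresh copy of the right-hand side, sending the links that entered the node to the input node and letting the links that left it leave from the output node. The sketch below illustrates this reading; the names `apply_cf_rule` and `is_terminal`, the `(labels, links)` representation and the way fresh node names are generated are all assumptions made for the example.

```python
from itertools import count

_fresh = count()  # used only to build unused node names


def apply_cf_rule(labels, links, target, rule):
    """Rewrite node `target` with a context-free rule (A, D, I, O).

    labels : dict node -> label          (phi)
    links  : set of (source, a, target)  (psi)
    rule   : (A, (rhs_labels, rhs_links), input_node, output_node)
    """
    a, (rhs_labels, rhs_links), inp, out = rule
    if labels[target] != a:
        raise ValueError("rule does not apply to this node")
    k = next(_fresh)
    fresh = {n: ("rhs", k, n) for n in rhs_labels}   # disjoint copy of D
    new_labels = {n: lab for n, lab in labels.items() if n != target}
    new_labels.update({fresh[n]: lab for n, lab in rhs_labels.items()})
    new_links = {(fresh[s], x, fresh[t]) for s, x, t in rhs_links}
    for s, x, t in links:
        s2 = fresh[out] if s == target else s   # links leaving the node leave O
        t2 = fresh[inp] if t == target else t   # links entering the node enter I
        new_links.add((s2, x, t2))
    return new_labels, new_links


def is_terminal(labels, terminals):
    """True if every node label is terminal, i.e. the graph belongs to L(G)
    provided it was obtained by derivation steps from the start graph."""
    return all(lab in terminals for lab in labels.values())
```

A link from the rewritten node to itself becomes a link from the output node to the input node, as in item 1.c of the definition.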
Example 1 (o)

The following grammar defines the set of binary directed acyclic graphs over $\Sigma = \{a\}$ and $\Delta = \{x_1, x_2\}$.

[Figure: the two productions for the nonterminal BDAG, with links x1 and x2 and bracketed node sets; the graphs are not recoverable.]

(o) The rule $(A, (N, \varphi, \psi), I, O, W)$ is represented as $A \to (N, \varphi, \psi)$, where the input node is marked by an arrow, and the output node by a double circle. The set of nodes $W$ is bracketed by [ and ]. In the sequel, if no set $W$ is listed, $W = N$ is assumed.
Example 2

The following grammar generates the data structure shown in fig. 1, representing the employee file of a firm. Employees are grouped according to their sex. If an employee is married, the name of the wife must be known; moreover, the system should record married couples of employees.

[Figure: productions of the employee file grammar; the graphs are not recoverable. Surviving labels: file, man list, woman list, man, woman, next man, husbandof, woman name.]

[Figure 1: an instance of the employee file; the graph is not recoverable. Surviving labels: first man, first woman, next man, next woman, husbandof, woman name, end.]
3. THE PARSING PROBLEM FOR DATA STRUCTURES

The formalism of DGG's should be viewed as a tool for describing data structures in a clean and rigorous way. This is a very important property, because it is well known that a clean documentation can greatly increase software reliability, as it becomes much easier to prove program correctness and to maintain programs.

A number of questions naturally arise concerning the formal properties of DGG's. One of them regards the parsing problem for DGG's, i.e. the possibility of deciding whether a data structure is correct according to its formal definition. Several results on such problems are given in /11/, where suitable subclasses of CF-DGG's supporting efficient parsing algorithms are also studied. As for the models described here it is possible to prove the following:

Proposition 1 - The parsing problem is undecidable for DGG's.

Proposition 2 - The parsing problem is decidable for DGG's having rules $(A, D, I, O, W)$, where $D = (N, \varphi, \psi)$, such that cardinality$(W) \geq 1$.

In particular, it is decidable for CF-DGG's. Moreover, given a CF-DGG, programs (data structure parsers) which test data structure correctness can be automatically constructed. In what follows we shall restrict our attention to CF-DGG's.

4. DATA GRAPH GRAMMARS AND TOP-DOWN PROGRAM DESIGN: AN EXAMPLE

In this section we give an example showing how DGG's can be used in the stepwise refinement of program construction.
Given a library organized in sections of different matters we develop an algorithm which computes $\varepsilon$, the set of empty sections. The data structure will be developed in parallel with the refinement of the search algorithm. The program is written in an Algol-like language, with the following conventions for operations on the data structure: if A is an object of type $\alpha$ and a is a link exiting A, then
1. B := a(A) means that the data structure control leaves object A following link a and the object reached by A under a is denoted by B;
2. is-link(A, a) is a boolean function which is true iff a link labelled a leaves A;
3. if A denotes an object at step $i$ whose type $\alpha$ is detailed by the rule $\alpha \to D$ at step $i+k$ ($k \geq 1$), then A denotes the input node of graph D at step $i+k$.
Data structure / Program

[Data structure column: productions for the nonterminals Data structure and Library; the graphs are not recoverable.]

/initially the current object is START/
Sect := init(START);
/successive integer numbers are associated to successive sections/
i ← 0; ε ← ∅; scanned ← false;
repeat
    i ← i + 1;
    if empty(Sect) then ε ← ε ∪ {i};
    if is-link(Sect, next)
        then Sect ← next(Sect)
        else scanned ← true
until scanned

We detail empty(Sect):

[Data structure column: productions for the nonterminal Section, with links first and back; the graphs are not recoverable.]

Head ← Sect;
if is-link(Head, first)
    then empty ← false
    else empty ← true
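The Algol-like program above can be transcribed almost line by line into a runnable sketch, under the simplest of the interpretations discussed in this section: every link is a plain reference. The class `Node` and the helpers `is_link`, `follow`, `is_empty`, `empty_sections` and `build_library` are hypothetical names introduced for the example; only the link names init, next and first come from the program text.

```python
class Node:
    """An object of the data structure; links are stored as plain references."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.links = {}              # link label -> object reached


def is_link(obj, label):
    """is-link(A, a): true iff a link labelled `label` leaves `obj`."""
    return label in obj.links


def follow(obj, label):
    """B := a(A): the object reached from `obj` under link `label`."""
    return obj.links[label]


def is_empty(sect):
    head = sect                      # Sect now denotes the input node Head
    return not is_link(head, "first")


def empty_sections(start):
    """Return the set of numbers of the empty sections."""
    sect = follow(start, "init")     # assumes the library has at least one section
    i, eps = 0, set()
    scanned = False
    while not scanned:
        i += 1
        if is_empty(sect):
            eps.add(i)
        if is_link(sect, "next"):
            sect = follow(sect, "next")
        else:
            scanned = True
    return eps


def build_library(has_volumes):
    """Build a chain of sections; has_volumes[k] says if section k+1 has volumes."""
    start = Node("START")
    previous, link = start, "init"
    for full in has_volumes:
        section = Node("Section")
        if full:
            section.links["first"] = Node("Vols")
        previous.links[link] = section
        previous, link = section, "next"
    return start
```

For instance, `empty_sections(build_library([True, False, True, False]))` evaluates to the set containing 2 and 4. Replacing `follow` by a function that fetches the next section from secondary storage would correspond to the other interpretation of links, without touching `empty_sections`.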
The reader should note that further refinement is required to detail step 4. The refinement implies:
1) definition and possible refinement of links;
2) concrete implementation of the data structure.

If we consider each link as a simple reference, no further refinement is required and we must simply map the abstract data structure onto the structures supported by the programming language. On the other hand, we could consider links as invocations of algorithms yet to be detailed. For example, link next could extract from a secondary storage the file containing the next section.

Furthermore, even if the algorithm which computes $\varepsilon$ does not require further refinements of the data structure, other queries about the data structure, such as the list of books written by an author all over the library, would require detailing the nonterminal Vols by means of the following productions:

[Figure: two productions for Vols (with link suc), one for Book and two for Authlist; the graphs are not recoverable.]
5. DATA GRAPH GRAMMARS AND PROGRAM SYNTHESIS

In this section we show how data graph grammars can be used for automatically synthesizing algorithms which perform computations running along the data structure.
We introduce here the formalism of Attribute-CF-DGG's, which can be considered as an extension of similar concepts of /13/, /14/. For each symbol $X \in \Sigma$ there is a set $I(X)$ of inherited attributes and a set $S(X)$ of synthesized attributes. The evaluation of the attributes is defined within the scope of a single production, by means of attribute rules. Within a production, the attribute rules define the synthesized attributes of the lefthand side nonterminal and the inherited attributes of the righthand side elements; they specify how a given attribute can be computed in terms of attributes of other elements in the same production.

As to the example described in section 4, we introduce the following synthesized attributes:
- $\varepsilon$, giving the set of empty sections;
- [...], giving the set of books written by a given AUTHOR;
- [...], which is true iff AUTHOR has at least one book in the library;
- in, which is true iff AUTHOR is in the authorlist of a book;
and the inherited attribute
- n, which numbers each section of the library.

The Attribute-CF-DGG which represents the example is shown in fig. 2.
The indices which appear in the attribute rules relate attributes to the elements of the productions. Attributes can be evaluated by an algorithm which runs along the parse structure of the data structure; the values computed for the attributes of the starter of the grammar are the result of the computation on the data structure. In our example the evaluation of attribute $\varepsilon$ of the nonterminal "Data structure" gives the same result as the program described in section 4.

The reader should note that using the formalism of Attribute-CF-DGG we simply specify, for each rule, how to compute an attribute in terms of other attributes. In other words we do not give the formal specification of an algorithm, because the evaluation sequence of the attributes is not specified. The only constraint which must be satisfied by an effective algorithm is that an attribute can be evaluated only if the values of the attributes on which it depends are known. It is possible to design an algorithm which, given the Attribute-CF-DGG and a data structure satisfying the grammar, is able to find a suitable evaluation sequence (if it exists /13/) which allows:
a) to compute all the attributes in an interpretative scheme, or
b) to generate an object program which computes the attributes.

In both cases, data types and operators used in attribute rules must be directly supported by the interpreter or by the programming language in which the object program is written. In the example, we have supposed that the object language supports data of type integer and boolean. Even if we do not have a computer aided program design system which is able to automatically construct a program which evaluates attributes, Attribute-CF-DGG's seem to play a useful role in giving a complete and clean documentation of data structures and algorithms which run along data structures.

It must be emphasized that this model is not suitable to represent operations which dynamically change data structures. Therefore whenever a data structure is modified it is necessary to re-parse the structure in order to obtain the new values of its attributes.

Attributes can also be used to impose restrictions on the class of data structures defined by a CF-DGG which cannot be specified by a CF-DGG, or could be only with a rather complicated grammar.

In the sequel we present an Attribute-CF-DGG for the example in Sec. 4.

[Figure 2: Attribute-CF-DGG for the example of Sec. 4, with productions for Data Structure, Library (two), Section (two), Vols (two), Book and Authlist (two), each accompanied by its attribute rules. The graphs and most of the attribute rules are not recoverable. Legible rules: for Library, n2 ← n1, n3 ← n1 + 1 and ε1 ← ε2 ∪ ε3; for Book, a rule of the form "if in3 then {val(Title)} (o) else ..."; for Authlist, in1 ← (if AUTHOR = val(Author) then true else false) ∨ in3 and in1 ← if AUTHOR = val(Author) then true else false.]

(o) val(a) gives the value of the terminal a.
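The evaluation constraint stated in this section (an attribute can be evaluated only once the attributes it depends on are known) amounts to evaluating attribute instances in a topological order of their dependencies, and failing when no such order exists. The sketch below illustrates the interpretative scheme on the library example, using the standard `graphlib` module (Python 3.9 or later). It is not the authors' algorithm: the function names `evaluate` and `library_rules`, the encoding of attribute instances as `(name, index)` pairs and the flattening of the parse structure into a dictionary of rules are assumptions made for the example.

```python
from graphlib import TopologicalSorter, CycleError


def evaluate(rules):
    """Evaluate attribute instances in an order compatible with their dependencies.

    rules maps each attribute instance to (dependencies, function); every
    dependency must itself have an entry in rules. CycleError is raised
    when the dependencies are circular, i.e. no evaluation sequence exists.
    """
    order = TopologicalSorter({attr: deps for attr, (deps, _) in rules.items()})
    values = {}
    for attr in order.static_order():
        deps, fn = rules[attr]
        values[attr] = fn(*(values[d] for d in deps))
    return values


def library_rules(has_volumes):
    """Attribute instances for a library whose k-th section has volumes
    iff has_volumes[k-1] is true (a hypothetical flattening of the parse)."""
    rules = {}
    last = len(has_volumes)
    for k, full in enumerate(has_volumes, start=1):
        # Inherited attribute n: the first section gets 1, the next one n + 1.
        if k == 1:
            rules[("n", k)] = ((), lambda: 1)
        else:
            rules[("n", k)] = ((("n", k - 1),), lambda prev: prev + 1)
        # Synthesized attribute of a section: {n} if it is empty, else {}.
        section_rule = (lambda n: set()) if full else (lambda n: {n})
        rules[("eps_section", k)] = ((("n", k),), section_rule)
        # Synthesized attribute of the library: union over the remaining sections.
        deps = (("eps_section", k),)
        if k < last:
            deps += (("eps", k + 1),)
        rules[("eps", k)] = (deps, lambda *parts: set().union(*parts))
    return rules
```

Evaluating `evaluate(library_rules([True, False, True, False]))[("eps", 1)]` gives the set containing 2 and 4, the same result as the program of section 4. Generating an object program instead of interpreting the rules would use the same ordering, emitting one assignment per attribute instance.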
6. CONCLUSION

In this paper we have given a formal definition of data graph grammars and we have discussed their relevance to data structure design. In particular, we have restricted our attention to context-free data graph grammars, and we have shown that:
1) they give a complete and rigorous documentation of a data structure;
2) they describe in a clean and natural way stepwise refinements of data structures;
3) it is possible to verify data structure correctness, with regard to their formal (syntactic) definition;
4) it is possible to associate attribute rules to each production, so that algorithms which walk along a data structure can be automatically synthesized.

Further investigations are currently going on with regard to the following points:
1) dynamic change of data structures;
2) data graph realization in a computer memory, with respect both to the automatic choice of efficient storage structures and to restrictions on CF-DGG's which derive graphs more easily implementable /6/.

These points and a deeper insight into the practical relevance of the model are worth studying to support our belief that attribute data graph grammars can play a useful role in computer assisted program design.
REFERENCES

/1/ Liskov, B. "An introduction to CLU", Computation Structures Group Memo 136, MIT Project MAC, 1976.
/2/ Dahl, O.J., Dijkstra, E.W., Hoare, C.A.R. "Structured programming", Academic Press, New York, 1972.
/3/ Parnas, D.L. "On the criteria to be used in decomposing systems into modules", CACM 15, 12, 1053-58, 1972.
/4/ Earley, J. "Toward an understanding of data structures", CACM 14, 617-626, 1971.
/5/ Shneiderman, B., Scheuermann, P. "Structured data structures", CACM 17, 10, 583-587, 1974.
/6/ Rosenberg, A.L. "Addressable data graphs", JACM 19, 2, 309-340, 1972.
/7/ Pfaltz, J.L., Rosenfeld, A. "Web grammars", Proc. 1st Intl. Joint Conference on Artificial Intelligence, Washington, 609-19, 1969.
/8/ Montanari, U.C. "Separable graphs, planar graphs and web grammars", Information and Control, 16, 243-67, 1970.
/9/ Pavlidis, T. "Linear and context-free graph grammars", JACM 19, 11-22, 1972.
/10/ Milgram, D.L. "Web automata", University of Maryland, Computer Science Center Technical rep. 271, 1973.
/11/ Della Vigna, P., Ghezzi, C. "Context-free graph grammars", Internal rep. 76-1, Istituto di Elettrotecnica ed Elettronica, Politecnico di Milano, IEEPM, 1976.
/12/ Pratt, T.W. "Pair grammars, graph languages and string to graph translations", JCSS 5, 580-595, 1971.
/13/ Knuth, D. "Semantics of context-free languages", Math. Systems Theory, 2, 127-145, 1968; Correction: Math. Systems Theory 5, 95-96, 1971.
/14/ Bochmann, G.V. "Semantic evaluation from left to right", CACM 19, 2, 55-63, 1976.