SYNTAX: A table driven parser for the recognition of behavioural patterns
Jacques P. Beaugrand and Robert Proulx
Université du Québec à Montréal, Québec, Canada
Corresponding author: [email protected]
Abstract
A table driven parser for the recognition of behavioural patterns is presented. SYNTAX is able to perform basic syntactic analysis. As a general automaton, it can check whether a given corpus of data conforms to a specific behavioural grammar. To do this, the Pascal program reads a sequence of EBNF productions followed by data, and transforms these productions into an internal data structure upon which the behavioural parser can operate. Marshall's recursive pigeon grammar is tested to illustrate the possibilities of the program.
Key words: Syntactic analysis; Parsing; Structural analysis; Behaviour description; Behavioural grammar
Introduction
Sequential machines or automata decide whether a string of terminal symbols belongs to the language generated by a particular grammar. The program SYNTAX that is presented here is a general automaton. Since it is a table driven parsing program, individual language grammars are first fed to the general automaton in the form of initial data that SYNTAX uses to process the sentences to be parsed. SYNTAX strictly follows the rules of the simple top-down parsing method described by Wirth (1976). However, parsing programs developed to process computer languages are generally designed to scan sentences and to report illegal ones. In contrast, the present parser is built to locate occurrences of constructs that conform to a given formal definition. In the program, this is accomplished by a special function that calculates the set of terminal symbols that can appear at the beginning of every legal sentence generated by a given construct. The program then scans the data file until it finds a symbol that belongs to such a set, and parsing begins from there. Parsing is straightforward when the underlying syntax is deterministic, that is, when the grammar used enables sentences to be parsed with one symbol of look-ahead and without backtracking. As a language translator or processor, SYNTAX reconstructs the generating steps, which in general form a structural tree, from the start symbol down to the final sentence composed of the basic vocabulary.
SYNTAX was developed for the description of sentences composed of behaviour units as basic vocabulary, but many other applications requiring categorisation and structuring could use the program (e.g., taxonomy, general pattern recognition). Behaviour units that are coded by the observer are, as usual, considered as atomic elements, i.e., as primitive symbols. The task of the behaviourist is to structure these symbols into a meaningful tree of larger constructed behavioural categories. 
The program can aid this conceptual task if it is relatively simple. For instance, SYNTAX can make frequency counts of specific symbols and metasymbols, and counts of nth order transitions between them. However, rudimentary tasks such as these can be carried out more efficiently by conventional programs that do sequential analysis. The main interest in using SYNTAX therefore resides in its ability to perform basic syntactic analysis. As a general automaton, it can check whether a given corpus of data conforms to specific behavioural grammars.
General background on language processing
Every language is based on a vocabulary whose elements are called words, or behaviour units as in the present application; in the realm of formal languages, however, they are called atoms. It is characteristic of languages that some sequences of words are recognized as correct, well-formed sentences belonging to the language, whereas others are ill-formed. It is the grammar, syntax, or structure of the language that determines whether a sequence of words is correct or not. We define the syntax as the set of rules or formulas that, given a start symbol, can generate the set of formally correct sentences. Such a set of rules not only enables us to decide whether or not a given sequence of words is a correct sentence, but it also provides the sentences with a structure that is used in the recognition of the meaning (semantic aspects) of a given sentence. Although the issues of meaning, interpretation and biological function cannot be ignored when the user formulates hypotheses concerning sentence or behaviour structure, the program that is presented here is concerned exclusively with structural aspects. For a more formal treatment of language structures see Wirth (1976). For their relevance in the description of behaviour, see Westman (1977).
Take, for example, the sentence «They are flying planes», which has two meanings depending on the phrase structure. This sentence belongs to the language that may, for instance, be defined by the following syntax (corresponding to one of the meanings only).
PRODUCTION RULES
S = NP VP.
NP = [A] N.
A = Adj [A].
VP = V NP.
N = they | planes.
V = are.
Adj = flying | big | red.
KEY FOR THEORETICAL CONSTRUCTS USED ABOVE
S: sentence
NP: noun phrase
VP: verb phrase
V: verb
A: quality
N: noun
Adj: adjective
The idea then is that a sentence may be derived from a start symbol S by repeated application of replacement rules. The formalism or notation in which the rules given above are written is called extended Backus-Naur Form (EBNF). It is extended BNF because it incorporates symbols that were not part of the initial BNF used by Naur (1963) to define ALGOL-60. The constructs S, NP, A, VP, N, V, Adj of the phrase-structure example are called non-terminal symbols; the words they, are, flying, and planes are called terminal symbols or atoms, and the rules are called productions. Each replacement rule must end with a full stop (period). The symbols =, |, [ and ] are called metasymbols in EBNF notation; in the rest of this paper, the term metasymbol is also applied to the non-terminal symbols themselves. The syntax of the EBNF that is significant to SYNTAX follows that described by Wirth (1976).
On the left of the equal sign there is exactly one metasymbol, which is replaceable by what is on the right side of the equal sign. Within the right-hand term of each expression, the juxtaposition of two symbols or metasymbols represents a succession within a text or in time. In the sentence analysis example given above, the expression «VP = V NP.» indicates that a VP is interchangeable with the succession of metasymbols V and NP. The sign «|» stands for logical OR (exclusive). For instance, the expression «N = they | planes.» means that the metasymbol N is interchangeable with either of the terms they OR planes. In the present case these terms happen to be atoms, but they could be metasymbols as well. The enclosing square brackets «[» and «]» define zero or more repetitions of the enclosed atom or metasymbol. Thus, the adjective flying could be absent, as in the phrase «they are planes»; or [A] could stand for two or more adjectives, as in the expression «big red flying planes».
A production rule can also be recursive. When the same metasymbol is used on both sides of the replacement sign, the structure that is defined is used in its own definition. 
For instance, expression «A = Adj [A].» indicates that metasymbol A can be replaced by metasymbol Adj, followed by any repetition of A (including zero repetition). In other words, one «Adj» would satisfy the definition of A, as well as any repetition of it, e.g., «Adj ...». A given metasymbol can also serve in the definition of any other metasymbol.
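To make the notation concrete, the example productions can be written as a small table and checked against a few sentences. The Python sketch below is only an illustration of the reading of EBNF given above, not the SYNTAX parser itself: the table layout, the `("rep", [...])` marker for a bracketed group, and the names `match_items`, `match_symbol` and `is_sentence` are hypothetical. Brackets are treated as zero or more repetitions, and alternatives are tried in the order in which they are written.

```python
# Hypothetical table for the example productions; ("rep", [...]) stands for
# a bracketed group, read as zero or more repetitions of its content.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [[("rep", ["A"]), "N"]],
    "A": [["Adj", ("rep", ["A"])]],
    "VP": [["V", "NP"]],
    "N": [["they"], ["planes"]],
    "V": [["are"]],
    "Adj": [["flying"], ["big"], ["red"]],
}


def match_items(items, words, pos):
    """Match a succession of items from pos; return the new position or None."""
    for item in items:
        if isinstance(item, tuple):
            # Bracketed group: repeat while it matches and consumes words.
            while True:
                after = match_items(item[1], words, pos)
                if after is None or after == pos:
                    break
                pos = after
        elif item in GRAMMAR:
            pos = match_symbol(item, words, pos)
            if pos is None:
                return None
        elif pos < len(words) and words[pos] == item:
            pos += 1                      # atom found in the text
        else:
            return None
    return pos


def match_symbol(name, words, pos):
    """Try each alternative of a metasymbol in turn (logical OR)."""
    for alternative in GRAMMAR[name]:
        after = match_items(alternative, words, pos)
        if after is not None:
            return after
    return None


def is_sentence(text):
    words = text.split()
    return match_symbol("S", words, 0) == len(words)


print(is_sentence("they are flying planes"))          # True
print(is_sentence("they are big red flying planes"))  # True
print(is_sentence("they are flying"))                 # False
```

With the recursive rule «A = Adj [A].», a run of adjectives of any length is absorbed by A, which is why «big red flying planes» is accepted as a noun phrase.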
Execution of the program
Program input
Since SYNTAX proceeds in a top-down approach, the user must first formulate structural hypotheses concerning the syntactic aspects of the data; the data structure that drives the parser must be constructed before parsing can start. To do the job, SYNTAX reads a sequence of EBNF productions followed by data, and transforms them into an internal data structure upon which the behavioural parser can operate. The program reads two disk files: a command file and a data file. The names of these two files have to be supplied by the user when execution begins.
Command file «MARSHALL.BNF» in Appendix A illustrates a typical command file. It contains two sections separated by a dollar ($) sign. The first section is reserved for production rules written in EBNF. These rules will be applied to the data to be read later. The order in which the production rules are arranged in the command file is of no importance. The second section is the report section, in which the user lists the metasymbols for which a frequency count is desired. In this section, each line consists of an already defined metasymbol followed by a full stop (e.g., PREP., SBSEQ.). Symbols corresponding to atoms are not legal in the report section. However, their counts can be obtained by defining a corresponding dummy metasymbol in the section reserved for production rules and requesting a report on it. First order and nth order transition frequencies can also be obtained in a similar way.
The command file is read, parsed and checked for minimal coherence. If any incoherence is found, it is reported on the screen and the program exits. Upon successful parsing of the EBNF code, the data file is also read and minimally checked for the coherence of the symbols read. Again, detected errors are reported. Upon successful compilation, the metasymbols requested by the user in the report section are searched for in the data and reported. 
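The layout of the command file described above (production rules, a dollar sign, then the report section) can be read with a few lines of code. The following Python sketch is hypothetical and is not the reader used by SYNTAX: the function name `read_command_text`, the token pattern and the error messages are assumptions. It treats text between braces as comments, as in the appendix, and rejects report requests that do not name a defined metasymbol.

```python
import re

# Illustrative command text following the layout of MARSHALL.BNF.
COMMAND_TEXT = """
{SECTION FOR PRODUCTION RULES }
SBSEQ = PREP CON.
PREP = INT WA [PREP].
INT = BW [AGG] [INT].
AGG = DR [AGG] | A [AGG].
WA = D BI [WA].
CON = M CO.
$
{REPORT SECTION }
CON.
SBSEQ.
"""


def read_command_text(text):
    """Split a command file into production rules and report requests."""
    text = re.sub(r"\{[^}]*\}", " ", text)          # drop {comments}
    rules_part, dollar, report_part = text.partition("$")
    if not dollar:
        raise ValueError('"$" separator expected')
    rules = {}
    for statement in rules_part.split("."):         # each rule ends with "."
        if not statement.strip():
            continue
        name, equals, body = statement.partition("=")
        if not equals:
            raise ValueError(f'"=" expected in {statement.strip()!r}')
        # Tokens: identifiers and the symbols [ ] |
        rules[name.strip()] = re.findall(r"[A-Za-z][A-Za-z0-9]*|[\[\]|]", body)
    requested = [r.strip() for r in report_part.split(".") if r.strip()]
    unknown = [r for r in requested if r not in rules]
    if unknown:                                     # atoms are not legal here
        raise ValueError(f"not a defined metasymbol: {unknown}")
    return rules, requested


rules, requested = read_command_text(COMMAND_TEXT)
print(rules["WA"])     # ['D', 'BI', '[', 'WA', ']']
print(requested)       # ['CON', 'SBSEQ']
```

Because the rules are stored in a dictionary keyed by metasymbol, the order in which they appear in the file does not matter, in line with the description above.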
File «PIGEON.DAT» in Appendix A illustrates such a data file. A data file includes one or more cases. Each case is composed of an identification line followed by a sequence (or sentence) spread over one or more lines of at most 80 characters each. The case is the basic unit of analysis, and the sequence or sentence is the chain of codes derived from observing that case, which typically would be a specific pair of interacting individuals. Each sentence must begin on a new line and must be terminated by a full stop. The program does not parse the title line; its first five characters are used in reports as a header for identification purposes. Data proper begins on the second line of each case. Data must be a succession of atoms (primitives, here behaviour units). File «PIGEON.DAT» in Appendix A shows 10 well formed sentences.
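The case layout just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the reader implemented in SYNTAX: the function name `read_cases` is hypothetical, and the line breaks in the sample text are assumed, since only the sequence of symbols can be taken from the appendix listing.

```python
# Two cases laid out as described: a title line, then the sentence on one
# or more lines, terminated by a full stop (line breaks are assumed here).
SAMPLE = """#7
BW D BI M CO.
#4
BW DR BW DR BW D BI D BI D BI D BW A BW A BW A BW DR
BW D BI D M CO.
"""


def read_cases(text):
    """Return a list of (label, atoms) pairs, one per case."""
    cases, label, atoms = [], None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if label is None:
            # Title line: not parsed, only its first five characters are kept.
            label = line[:5]
            continue
        stripped = line.strip()
        finished = stripped.endswith(".")      # a full stop closes the sentence
        atoms.extend(stripped.rstrip(".").split())
        if finished:
            cases.append((label, atoms))
            label, atoms = None, []
    return cases


for label, atoms in read_cases(SAMPLE):
    print(label, len(atoms), atoms[0], atoms[-1])
# #7 5 BW CO
# #4 26 BW CO
```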
Program output
In its current version, and only upon request in the report section of the command file, SYNTAX simply counts the occurrences of the metasymbols found in the text submitted for analysis. It does not calculate conditional or transitional probabilities. This aspect is left to the user, and for future developments. Counts are written to the screen and to a report disk file having the same prefix as the command file but the extension «PRN». Total counts are shown on the screen for each metasymbol requested, and counts for each sentence, as well as totals, are written to the report file on disk. The report file is formatted in such a way as to be readily importable by a database or spreadsheet (see output disk file «MARSHALL.PRN» in Appendix A).
EXAMPLE
Marshall's pigeon grammar revisited
Syntactic methods, i.e., methods that analyze behavioural sequences as if they were sentences in a language and could therefore be represented by grammars, have been proposed and used (e.g., Lashley, 1951; Altmann, 1965; Hutt and Hutt, 1970; Westman, 1977). The first user seems to have been Marshall (1965) in an unpublished paper. His grammar was proposed to generate the sequence of pigeon courtship described by Fabricius and Jansson (1963). Dawkins (1976) presents a recursive version of Marshall's generative grammar in the form of an ALGOL-60 program. We found inconsistencies between his algorithmic interpretation and those published by Hutt and Hutt (1970) and Vowles (1970); we finally settled for Dawkins' (1976) interpretation. File «MARSHALL.BNF» of Appendix A formalizes the pigeon grammar as rules expressed in EBNF parsable by SYNTAX. To illustrate the possibilities of SYNTAX in testing a grammar, we have generated a corpus containing 100 courtship sequences respecting, in terms of transition probabilities, Fabricius and Jansson's (1963) original transition matrix between pairs of behaviours. 
Auto-transitions were not allowed, the original matrix having zeros in all cells of the diagonal. Each behavioural sentence had to start with a bow (BW) and had to conclude with a copulation (CO). Ten of the 100 sentences thus generated are presented in file «PIGEON.DAT» in Appendix A. SYNTAX is then used to verify the application of the various rules of Marshall's grammar. The details of the grammar analysis are presented in Appendix A. SYNTAX reports that Marshall's highest metaconcept SBSEQ is verified in 79 of the 100 cases (see the Screen I/O, Appendix A). Examination of Fabricius and Jansson's (1963) original data suggested that displacement preening (D) and billing (BI) were equivalent units with respect to their capacity to be followed by mounting (M) and copulation (CO). This equivalence could be better rendered by modifying Marshall's fifth rule from «WA = D BI [WA].» to «WA = D [WA] | BI [WA].»
Following this modification, metaconcept SBSEQ was recognized in 96 of the 100 simulated sentences (results not illustrated). The reasons that seem to account for the failure to recognize SBSEQ in the four non-grammatical sentences are that (1) the male pigeons consummated without warm up (cases #11 and #43, illustrated in file «PIGEON.DAT», Appendix A), or (2) they mounted (without subsequent copulation) during the preparatory period, a behaviour that is clearly ungrammatical (cases #17 and #79 of file «PIGEON.DAT»).
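The effect of the modified fifth rule can be explored with a small hand-written recursive-descent recognizer using one symbol of look-ahead. The Python sketch below is an illustration only: the function name `derives_sbseq` and its `modified_wa` flag are hypothetical, and the test it applies (does the whole sentence derive from SBSEQ?) is stricter than the occurrence counting performed by SYNTAX, so it is not expected to reproduce the counts reported above. The three sentences are cases #7, #13 and #15 of the PIGEON.DAT listing.

```python
def derives_sbseq(atoms, modified_wa=False):
    """Return True when the whole list of atoms derives from SBSEQ."""
    pos = 0

    def peek():
        return atoms[pos] if pos < len(atoms) else None

    def take(*allowed):
        nonlocal pos
        if peek() not in allowed:
            raise ValueError(f"one of {allowed} expected at position {pos}")
        pos += 1

    def agg():                      # AGG = DR [AGG] | A [AGG].
        take("DR", "A")
        if peek() in ("DR", "A"):
            agg()

    def intro():                    # INT = BW [AGG] [INT].
        take("BW")
        if peek() in ("DR", "A"):
            agg()
        if peek() == "BW":
            intro()

    def wa():
        if modified_wa:             # WA = D [WA] | BI [WA].
            take("D", "BI")
            if peek() in ("D", "BI"):
                wa()
        else:                       # WA = D BI [WA].
            take("D")
            take("BI")
            if peek() == "D":
                wa()

    def prep():                     # PREP = INT WA [PREP].
        intro()
        wa()
        if peek() == "BW":
            prep()

    def con():                      # CON = M CO.
        take("M")
        take("CO")

    try:
        prep()                      # SBSEQ = PREP CON.
        con()
    except ValueError:
        return False
    return pos == len(atoms)


for sentence in ("BW D BI M CO", "BW D M CO", "BW DR BW DR BW DR M CO"):
    atoms = sentence.split()
    print(sentence, derives_sbseq(atoms), derives_sbseq(atoms, modified_wa=True))
# BW D BI M CO True True
# BW D M CO False True
# BW DR BW DR BW DR M CO False False
```

In this sketch the second sentence is accepted only with the modified rule, because a single displacement preening (D) then counts as a complete warm up, while the third sentence is rejected by both versions because no warm up precedes mounting.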
System requirements
SYNTAX is written in Turbo Pascal 5.5 (Borland International) for PC compatible computers. The amount of memory required to run the program depends on the number of variables used, and on the complexity of the structures searched for. Recursive definitions are especially costly in memory and should be tested independently before being incorporated into a larger system of production rules. The example with pigeons ran in less than 400 KB of memory, and the program occupies only 12 KB of disk space. The source of program SYNTAX is listed at the end of this paper.
Acknowledgment
Software development was made possible by grants from the Canadian NSERC to the first author.
References
Altmann, S.S., 1965. Sociobiology of Rhesus: II. Stochastics of social communication. J. Theor. Biol., 8: 490-522.
Dawkins, R., 1976. Hierarchical organization: a candidate principle for ethology. In: P.P.G. Bateson and R.A. Hinde (Editors), Growing points in Ethology, Cambridge University Press, NY, pp. 7-54.
Fabricius, E. & Jansson, A., 1963. Laboratory observations on the reproductive behaviour of the pigeon (Columba livia) during the pre-incubation phase of the breeding cycle. Anim. Behav., 11: 534-547.
Hutt, S.J. & Hutt, C., 1970. Direct observation and measurement of behavior. Charles C. Thomas Pub., Springfield (Ill.).
Lashley, K.S., 1951. The problem of serial order in behavior. In: L.A. Jeffress (Editor), Cerebral Mechanisms in Behaviour, Wiley, NY, pp. 112-146.
Marshall, J.C., 1965. The syntax of reproductive behaviour in the male pigeon. Medical Research Council Psycholinguistics Unit Report, Oxford.
Naur, P. (Editor), 1963. Report on the Algorithmic Language ALGOL-60. Association for Computing Machinery (Computing Surveys) 6, pp. 1-17.
Vowles, D.M., 1970. Neuroethology, evolution, and grammar. In: L.R. Aronson, E. Tobach, D.S. Lehrman & J.S. Rosenblatt (Editors), Development and evolution of behavior, essays in memory of T.C. Schneirla, Freeman, San Francisco, pp. 194-215.
Westman, R.S., 1977. Environmental languages and the functional bases of animal behavior. In: B.A. Hazlett (Editor), Quantitative methods in the study of animal behavior, Academic Press, NY, pp. 145-202.
Wirth, N., 1976. Algorithms + Data Structures = Programs, Prentice-Hall, Englewood Cliffs (NJ).
APPENDIX A
EBNF transcription of Marshall's recursive version of the generative grammar as published in an ALGOL-60 program by Dawkins (1976). Comments are presented between accolades.

Input file: MARSHALL.BNF
{SECTION FOR PRODUCTION RULES }
SBSEQ = PREP CON.
PREP = INT WA [PREP].
INT = BW [AGG] [INT].
AGG = DR [AGG] | A [AGG].
WA = D BI [WA].
CON = M CO.
$
{REPORT SECTION }
CON.
WA.
AGG.
INT.
PREP.
SBSEQ.

KEY
BEHAVIOURAL CONSTRUCTS
SBSEQ: Sexual behaviour sequence
PREP: Preparatory behaviour
CON: Consummatory behaviour
INT: Introduce
WA: Warm up
AGG: Aggressive behaviour
ATOMS (BEHAVIOUR UNITS)
BW: Bowing
DR: Driving
A: Attacking
D: Displacement preening
BI: Billing
M: Mounting
CO: Copulation
Control file: MARSHALL.BNF
SBSEQ = PREP CON.
PREP = INT WA [PREP].
INT = BW [AGG] [INT].
AGG = DR [AGG] | A [AGG].
WA = D [WA] | BI [WA].
CON = M CO.
$
CON.
WA.
AGG.
INT.
PREP.
SBSEQ.
One hundred simulated sentences respecting (in terms of transition probabilities) Fabricius & Jansson's (1963) original data on the courting behaviour of the male pigeon. Data file: PIGEON.DAT #1 BW D BI D BI D BI D BI D BI BW DR BW DR BW DR BW DR BW D M CO. #2 BW DR BW A BW D BI D M CO. #3 BW D BI D BI M CO. #4 BW DR BW DR BW D BI D BI D BI D BW A BW A BW A BW DR BW D BI D M CO. #5 BW DR BW DR BW DR BW DR BW DR BW A BW DR BW D BI D BI D BI D BI D BI A DR BW DR BW DR BW A BW D BW A BW DR BW D BI D M CO. #6 BW DR BW DR BW D BW DR BW A BW D BI D M CO. #7 BW D BI M CO. #8 BW DR BW DR BW D BI D M CO. #9 BW DR BW DR BW D BI D BI A BW DR BW BI D BI D BI D BW D BI BW D M CO. #10 BW D BW D M CO. #11 BW D BI D BI M CO. #12 BW DR BW DR BW A BW DR BW DR BW DR BW D M CO. #13 BW D M CO. #14 BW DR BW DR BW DR BW D BI D BI D BI D BI D BI D M CO. #15 BW DR BW DR BW DR M CO. #16 BW D BI D M CO. #17 BW DR BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI BW A BW DR BW DR BW DR BW D M CO. #18 BW DR BW D BI A BW D BI D BI D BI A BW DR BW D BI D BW DR BW DR BW D BI D BI D BW D BI D BI D BW A BW DR BW DR BW D BW D BI D BI BW DR BW A BW DR BW DR BW DR BW DR BW DR BW DR BW D M CO. #19 BW DR A BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW D BI M CO. #20 BW D BI BW D BW DR BW DR BW DR BW A BW DR BW DR BW DR BW DR BW A BW D BI D BI D BI D BI D BI D BI BW DR BW DR D BI D M CO. #21 BW D BI BW DR BW DR BW A BW DR BW DR BW D BI D BI D BI D BI A BW D BI D BI M CO. #22 BW D BW DR BW D M CO. #23 BW D BI D BI D BW A BW D BI D BI D BI D BI D BI D BI D BW DR BW DR BW DR BW D BI D M CO. #24 BW DR BW A BW A BW D BI D BI D BI D BI D M CO. #25 BW DR BW D BI M CO. #26 BW DR BW DR A BW DR BW D BW D BI D BI M CO. #27 BW A BW D BI D BI D BI BW D BI D BI A DR BW DR BW DR BW D M CO. #28 BW DR BW DR BW DR BW D BI D BI D BI D BI D BI D BW D BI M CO. #29 BW D BI D BI D BI D BI D BI D BI M CO. #30 BW DR BW DR BW DR BW DR BW D M CO. #31 BW D BI D BW D BI D BI D BI D BI D M CO. #32 BW DR BW DR BW D BI D BI D BI M CO. 
#33 BW DR BW DR BW DR BW DR A BW D BI D BI M CO. #34 BW DR BW D BI D BI D BI D BI D BI D BW DR BW DR BW DR A BW DR BW DR BW D BI D BI D M CO. #35 BW DR BW D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D M CO. #36 BW DR BW DR BW D M CO. #37 BW DR BW DR BW DR BW DR BW DR A BW D BI D BI D BI D M CO. #38 BW DR BW DR BW D BI D BI D BI D BI D BI D BI BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW D M CO.
#39 BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW D BI D M CO. #40 BW DR BW DR BW A BW D BI M CO. #41 BW DR BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI BW D BI D BI D BI D BI M CO. #42 BW DR A BW A BW DR BW DR BW DR BW DR BW A BW A BW DR BW DR BW D BI D BI M CO. #43 BW DR BW D BI D BI D M CO. #44 BW A BW D BI D BI M CO. #45 BW DR A BW D BI D BW D M CO. #46 BW DR BW A BW D BI D BI M CO. #47 BW D BI D BI D BI D BI D BI D BI D BI D BI D BW D M CO. #48 BW DR BW DR BW A BW DR BW DR BW D BI D BI D BW DR BW DR BW D BI D BI D BI D BI D BW D BI D BI D BI D BI D BI D BI D BI M CO. #49 BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI D BI D BW D BI M CO. #50 BW DR BW D BI D BI D M CO. #51 BW D BI M CO. #52 BW DR BW DR BW D BI D M CO. #53 BW DR BW DR BW A BW DR BW D BI A BW D BW DR BW DR A BW A BW D BI D BI M CO. #54 BW DR BW DR BW A D BI M CO. #55 BW D M CO. #56 BW D M CO. #57 BW D M CO. #58 BW DR BW D BI D BI D BI BW D BI D BW D BI D BI D BI D BI D BI D BW DR BW D BI D BI D M CO. #59 BW D BI D BI D BI D BI M CO. #60 BW DR BW D BI D M CO. #61 BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI D BI D BI D BI M CO. #62 BW D BI D BI BW D BI D BI M CO. #63 BW DR BW A BW D M CO. #64 BW DR BW D BI D BI D BI D BI BW DR BW DR BW DR BW D BI D BI D BI D BI D BI A DR BW DR BW DR BW DR BW DR BW DR A BW DR BW A BW D BI D BW DR BW D BI D BI D M CO. #65 BW D BI D BI D BI D BI M CO. #66 BW DR BW A BW DR BW D BI D BI M CO. #67 BW DR BW DR BW DR BW DR BW DR BW D BI D BI D BW D BW D M CO. #68 BW DR BW DR BW DR BW DR BW DR BW DR BW D BI D BI D BI M CO. #69 BW DR BW D BI D BI D BI D BW DR BW DR BW DR BW DR BW D BW DR BW DR BW D BI D BI D BI D BI D BI M CO. #70 BW D BI D BW D BI D BI M CO. #71 BW DR BW DR BW D BW D BI D BI D BI D BI D BI D BI D BW D BI D BI D BI D M CO. #72 BW D BI BW DR BW DR BW DR BW DR BW DR BW D BI M CO. #73 BW DR BW A D BI D BI BW D BI D BI D BI D BI D BI D BI BW DR BW DR BW DR A BW A BW DR BW D A BW DR BW D BI M CO. 
#74 BW D BW DR D BI D BI D BI M CO. #75 BW DR BW D BI D M CO. #76 BW DR BW DR BW D BI D BI D M CO. #77 BW DR BW D BI D BI M CO. #78 BW DR BW DR A DR BW DR A BW D BI M CO. #79 BW DR BW A BW D M CO. #80 BW D BW DR BW D BI D BI D BI D BI D BI BW DR BW A BW DR BW D BI D BI D BI D BI D BI D BI D BI D BI D BI D M CO. #81 BW D BI D BI D BI D BI D BI D BI D M CO. #82 BW DR BW D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI D BI BW DR BW DR BW D BI D BW D BI D M CO.
#83 BW D BI D BI BW DR BW DR BW DR BW DR BW D BI D BI D M CO. #84 BW D BI BW DR BW A BW D BI D BI D BI D BI D BW DR BW D M CO. #85 BW BI D BI D BI D BI D BI M CO. #86 BW D BI D BI D BI D BI D BI BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW DR BW D BI D BI BW DR BW A DR BW DR BW DR BW D BI D BW D BI D BI D M CO. #87 BW D BI D BI D BW DR A BW D BW DR BW DR BW DR BW DR BW D M CO. #88 BW D BW DR A BW DR A BW DR BW D M CO. #89 BW D BI D BI D BI D BI D BW D BW DR A BW D BI D BI D BI D BI D BI D BI D BW DR BW A BW DR BW DR BW DR BW D BI D BI D BI D BI D BI D BI D BW DR BW D BI D BI D M CO. #90 BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI D BI D BI D BI D BI D BI A DR BW DR BW DR BW DR BW D BW D BI D BI D BI D BI D BI D BI D BW D M CO. #91 BW DR BW D M CO. #92 BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI D BI M CO. #93 BW DR BW DR BW DR BW DR BW A D M CO. #94 BW D BI D BI D A BW DR BW D BI A BW DR BW DR BW DR BW DR BW DR BW DR A BW D BI D BI D M CO. #95 BW D BI D M CO. #96 BW DR BW DR BW D M CO. #97 BW D M CO. #98 BW D BI BW DR BW DR BW DR BW DR BW D M CO. #99 BW DR BW DR BW A BW DR BW A BW A BW D BI D BI D BI D BW DR BW D BI D BW D BW DR BW DR BW DR BW DR BW D BI D BI D BI D BI D BI BW A BW A BW DR BW D BI D M CO. #100 BW D BI D BI D BI D BI D BW DR BW DR BW D BW D BW DR BW D BW A BW D BI M CO.
Results of the application of Marshall's grammar and of its augmented version onto the simulated data.

Screen I/O
Program SYNTAX-PC -- Version 6.0 (95/02)
Name of the BNF file (ex: Marshall.BNF) ?
Marshall { supplied by the user}
Name of the Data file (ex: Pigeon.DAT) ?
Pigeon { supplied by the user}
BNF Analysis of file Marshall
Undefined Object(s): none
"PREP" in file PIGEON = 85
"INT" in file PIGEON = 142
"AGG" in file PIGEON = 267
"WA" in file PIGEON = 113
"CON" in file PIGEON = 100
"SBSEQ" in file PIGEON = 79
Job done...

Program SYNTAX-PC -- Version 6.0 (95/02)
Name of the BNF file (ex: Marshall.BNF) ?
Augmen
Name of the Data file (ex: Pigeon.DAT) ?
Pigeon
BNF Analysis of file Augmen
Undefined Object(s): none
"PREP" in file pigeon = 102
"INT" in file pigeon = 142
"AGG" in file pigeon = 267
"WA" in file pigeon = 142
"CON" in file pigeon = 100
"SBSEQ" in file pigeon = 96
Job done...
Results from the application of Marshall's grammar (recursive version) onto the simulated data. Detailed output for each sentence. The format is «Comma and quotes delimited» and can be imported by a database or spreadsheet. To save space, only a partial report is illustrated.

Output disk file: MARSHALL.PRN
"Marshall.BNF on PIGEON", "# 1 ", "PREP ", 1
{..}
"Marshall.BNF on PIGEON", "Total", "PREP ", 85
"Marshall.BNF on PIGEON", "# 1 ", "INT ", 1
{..}
"Marshall.BNF on PIGEON", "#100 ", "INT ", 1
"Marshall.BNF on PIGEON", "Total", "INT ", 142
{..}
"Marshall.BNF on PIGEON", "# 10 ", "WA ", 0
"Marshall.BNF on PIGEON", "# 11 ", "WA ", 0
"Marshall.BNF on PIGEON", "# 12 ", "WA ", 0
{..}
"Marshall.BNF on PIGEON", "# 98 ", "SBSEQ", 1
"Marshall.BNF on PIGEON", "# 99 ", "SBSEQ", 1
"Marshall.BNF on PIGEON", "#100 ", "SBSEQ", 1
"Marshall.BNF on PIGEON", "Total", "SBSEQ", 79
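A report in a similar comma and quotes delimited layout can be produced with the standard `csv` module of Python, as in the hedged sketch below. The function name `write_report` and its arguments are hypothetical, the per-case counts are invented for the illustration, and the sketch does not reproduce the padding of labels or the space after each comma seen in the listing above.

```python
import csv
import io


def write_report(out, title, metasymbol, counts):
    """Write one line per case plus a total line; return the total."""
    # QUOTE_NONNUMERIC quotes every text field and leaves numbers bare.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
    total = 0
    for label, count in counts:
        writer.writerow([title, label, metasymbol, count])
        total += count
    writer.writerow([title, "Total", metasymbol, total])
    return total


# Invented counts for three cases, written to an in-memory file.
buffer = io.StringIO()
write_report(buffer, "Marshall.BNF on PIGEON", "SBSEQ",
             [("#98", 1), ("#99", 1), ("#100", 1)])
print(buffer.getvalue())
```

A file written this way keeps text fields quoted and counts unquoted, which is the property that makes the report easy to import into a spreadsheet or database.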
Program listing

{SYNTAX.PAS A table driven parser for the recognition of behavioural patterns }
Program SYNTAX;
Uses Crt;

Const
  MaxL= 81;   { LineLength +1 space at the end }
  LMess= 60;  { error messages length }
  NMess= 8;   { Number of messages }

Type
  NomSymb= String[16];
  Numerr= 1..NMess;
  {Change the content of Symbole to declare your atoms }
  {atoms BW,DR,A,BI,D,M,CO are for the pigeon grammar study }
  Symbole=(Vide,AIW,ADE,ADW,EIA,EIW,EDA,EDW,WDA,WDE,
           Ident,AccolGauche,AccolDroite,Barre,Point,
           Dollar,Est);
  SymSet= Set of Symbole;
  TabSymb= Array [Vide..Ident] of NomSymb;
  TabMess= Array [1..NMess] of String[LMess];
  Pointeur= ^Noeud;
  Pointe_Liste= ^Item;
  Noeud= Record
           Alt,Suc: Pointeur;
           Case Terminal: Boolean of
             True: (Cible: Symbole);
             False: (NewObj: Pointe_Liste)
         End; { Noeud }
  Item= Record
          Nom: NomSymb;
          Entree: Pointeur;
          Suivant: Pointe_Liste
        End; { Item }

Var
  LL: 1..MaxL;
  Nfois,Ntot,IERR: Integer;
  TraceOn,Trouve: Boolean;
  Sym: Symbole;
  IdSymb,IIdSymb: NomSymb;
  Prem: SymSet;
  Liste,Sentinel,T: Pointe_Liste;
  Pt: Pointeur;
  S: String[MaxL];
  NF,NF2: String[8];
  BNF,Donnees,Results: Text;

Const
  Message: TabMess =
    ('End-of-file encountered while searching for a symbol',
     'Illegal character','"}" expected','Symbol or "{" expected',
     'Illegal or undefined symbol','"=" expected',
     '"." expected','Undefined Symbol');
  {Change the content of TabSym to }
  {declare your own atoms }
  Table : TabSymb = ('Vide','AIW','ADE','ADW',
                     'EIA','EIW','EDA','EDW','WDA','WDE','Ident');

Procedure Erreur (N: Numerr);
Begin
  If TraceON Then Writeln ('^': LL-Length(S));
  Writeln (' ** Error:',N:2,' ',Message[N]);
  Halt
End; { Erreur }

Procedure Nouvelle_Ligne (Var Fich: Text);
Begin
  If Eof (Fich) Then Erreur (1)
  Else
  Begin
    Readln (Fich,S);
    While (Length(S) > 0) AND (S[Length(S)] = ' ') do
      Delete (S,Length(S),1);
    { If TraceON Then Writeln (S);}
    S:= S + ' ';
    LL:= Length (S);
    While (Length(S) > 0) AND (S[1] = ' ') do
      Delete (S,1,1)
  End
End; { New Line }

Procedure LireSymbole (Var Fich: Text);
Var
  Ch: Char;
  I: 1..MaxL+1;
  I2: Integer;
Begin
  While S = '' do Nouvelle_Ligne (Fich);
  I:=1;
  Ch:= S[I];
  If Ch in ['A'..'Z','a'..'z'] Then { Terminal Symb. or Ident }
  Begin
    While S[I+1] in ['A'..'Z','a'..'z','0'..'9'] do I:=I+1;
    I2:= 16;
    If I 0) AND (S[1] = ' ') do Delete (S,1,1)
End; { LireSymbole }

Procedure Localiser (X: NomSymb; Var T: Pointe_Liste);
Var
  Tprim: Pointe_Liste;
Begin
  Tprim:= Liste;
  Sentinel^.Nom:= X;
  While Tprim^.Nom <> X do Tprim:= Tprim^.Suivant;
  If Tprim = Sentinel Then { Nonexistent }
  Begin { Insert the new object }
    New (Sentinel);
    Tprim^.Suivant:= Sentinel;
    Tprim^.Entree:= Nil
  End;
  T:= Tprim
End; { Localiser }

Procedure Terme (Var P,Q,R: Pointeur);
Var
  A,B,C: Pointeur;

  Procedure Facteur (Var P,Q: Pointeur);
  Var
    A,B: Pointeur;
    T: Pointe_Liste;
  Begin
    If Sym in [Vide..Ident] Then { Symbol }
    Begin
      New (A);
      If Sym = Ident Then { Non terminal Symbol }
      Begin
        A^.Terminal:= False;
        Localiser (IdSymb, T);
        A^.NewObj:= T
      End
      Else { Terminal Symbol }
      Begin
        A^.Terminal:= True;
        A^.Cible:= Sym
      End;
      P:= A;
      Q:= A;
      LireSymbole (BNF)
    End
    Else { Verify for a term between accolades}
    If Sym = AccolGauche Then
    Begin { OK }
      LireSymbole (BNF);
      Terme (P,A,B);
      B^.Suc:= P; { Closing the loop }
      New (B); { Insert an empty symbol }
      B^.Cible:= Vide;
      B^.Terminal:= True;
      A^.Alt:= B;
      Q:= B;
      If Sym = AccolDroite Then LireSymbole (BNF)
      Else Erreur (3)
    End
    Else Erreur (4)
  End; { Facteur }

Begin { Procedure Terme }
  Facteur (P,A);
  Q:= A;
  While (Sym in [Vide..Ident]) or (Sym = AccolGauche) do
  Begin
    Facteur (A^.Suc, B);
    B^.Alt:= Nil;
    A:= B
  End;
  R:= A
End; { Terme }

Procedure Expression (Var P,Q: Pointeur);
Var
  A,B,C: Pointeur;
Begin
  Terme (P,A,C);
  C^.Suc:= Nil;
  While Sym = Barre do
  Begin
    LireSymbole (BNF);
    Terme (A^.Alt, B,C);
    C^.Suc:= Nil;
    A:=B
  End;
  Q:= A
End; { Expression }

Procedure Premier (T: Pointe_Liste; Var SS: SymSet);
Var
  S: Pointeur;
  S2: SymSet;
Begin
  S:= T^.Entree;
  SS:= [];
  Repeat
    With S^ do
      If Terminal Then SS:= SS + [Cible]
      Else
      Begin
        Premier(NewObj,S2);
        SS:= SS + S2;
      End;
    S:= S^.Alt
  Until S = Nil;
End; { Premier }

Function Existe (Groupe: SymSet; Var Fich: Text): Boolean;
Begin
  While Not (Sym in (Groupe + [Point])) do LireSymbole (Fich);
  Existe:= Sym <> Point
End; { Existe }

Procedure Analyser (Obj: Pointe_Liste; Var Correct: Boolean);
Var
  S: Pointeur;
Begin
  S:= Obj^.Entree;
  Repeat
    With S^ do
      If Terminal Then
        If Sym = Cible Then { whose next... }
        Begin
          Correct:= True;
          LireSymbole (Donnees)
        End
        Else Correct:= (Cible = Vide)
      Else Analyser (NewObj, Correct);
    If Correct Then S:= S^.Suc
    Else S:= S^.Alt
  Until S = Nil
End; { Analyser }

Begin { MAIN }
  ClrScr;
  Writeln ('Program SYNTAX-PC -- Version 5.0 (91/06)');
  Writeln;
  Writeln ('Name of the BNF file (ex: Marshall.BNF) ?');
  Readln (NF);
  Writeln ('Name of the Data file (ex: Pigeon.DAT) ?');
  Readln (NF2);
  Writeln;
  Assign (Results, NF + '.PRN');
  Rewrite (Results);
  Writeln ('BNF Analysis of file ', NF);
  Writeln;
  S:= '';
  LL:=1;
  TraceOn:= True;
  New (Sentinel);
  Liste:= Sentinel;
  Assign (BNF, NF + '.BNF');
  Reset (BNF);
  LireSymbole (BNF);
  While Sym <> Dollar do
  Begin
    If Sym <> Ident Then Erreur (5);
    Localiser (IdSymb, T);
    LireSymbole (BNF);
    If Sym <> Est Then Erreur (6);
    LireSymbole (BNF);
    Expression (T^.Entree, Pt);
    Pt^.Alt:= Nil;
    If Sym <> Point Then Erreur (7);
    LireSymbole (BNF)
  End;
  Writeln;
  Write ('Undefined Object(s):');
  T:= Liste;
  Trouve:= True;
  While T <> Sentinel do
  Begin
    With T^ do
      If Entree = Nil Then
      Begin
        Write (' ', Nom);
        Trouve:= False
      End;
    T:= T^.Suivant
  End;
  If Trouve Then Writeln (' none')
  Else
  Begin
    Writeln;
    Halt
  End;
  While Not Eof (BNF) do
  Begin
    Write ('"');
    LireSymbole (BNF);
    Write (IdSymb, '" in file ' ,NF2, ' = ');
    TraceOn:= False;
    IIdSymb := IdSymb;
    If Sym <> Ident Then Erreur (5);
    Localiser (IdSymb, T);
    If T^.Entree = Nil Then Erreur (8);
    Assign(Donnees, NF2 + '.DAT');
    Reset(Donnees);
    Ntot:= 0;
    Premier (T,Prem);
    While Not Eof (Donnees) do
    Begin
      Nouvelle_Ligne (Donnees);
      While (Length(IIdSymb) > 0) AND (IIdSymb[1] = ' ') do
        Delete(IIdSymb,1,1);
      Write (Results, '"',NF,'.BNF on ',NF2,'", "',Copy(S,1,6),'", "',
             IIdSymb,'",');
      Nouvelle_Ligne (Donnees);
      LireSymbole (Donnees);
      Nfois:= 0;
      While Existe (Prem, Donnees) do
      Begin
        Analyser (T, Trouve);
        If Trouve Then Nfois:= Nfois + 1
      End;
      Writeln (Results, Nfois:5);
      Ntot:= Ntot + Nfois
    End;
    Writeln (Ntot:4);
    Writeln (Results, '"',NF,'.BNF on ',NF2,'", "','Total','", "',
             IIdSymb,'",', Ntot:5);
  End; {Job to do on BNF}
  Close (BNF);
  Close (Donnees);
  Close (Results);
  Writeln ('Job done...');
  Writeln;
  ReadLn(NF);
End. { of SYNTAX}