Building on Success in Genetic Programming: Adaptive Variation & Developmental Evaluation

Tuan-Hao Hoang¹, Daryl Essam¹, Bob McKay², and Hoai Nguyen-Xuan³

¹ Australian Defence Force Academy, Canberra, Australia
² Seoul National University, Seoul, Korea
³ Vietnam Military Technical Academy, Hanoi, Vietnam

Abstract. We investigate a developmental tree-adjoining grammar guided genetic programming system (DTAG3P+) in which genetic operator application rates are adapted during evolution. We previously showed that developmental evaluation could promote structured solutions and improve performance on symbolic regression problems. However, testing on parity problems revealed an unanticipated problem: good building blocks found in early developmental stages might be lost in later stages of evolution. The adaptive variation rate in DTAG3P+ preserves good building blocks found early in the search for use in later stages. It gives both good performance on small k-parity problems and good scaling to larger problems.

Key words: Genetic Programming, Developmental, Incremental Learning, Adaptive Mutation

1 Introduction

Developmental tree adjoining grammar guided GP (DTAG3P) is a grammar guided GP system using L-systems [1] to encode tree adjoining grammar (TAG) derivation trees [2]. In [3, 4], we introduced developmental evaluation, in which individuals are evaluated throughout development on problems of increasing difficulty. We demonstrated that it could solve some difficult problems, and in [5], that this good performance was associated with increased genotypic regularity. However, we also noted that it might find, but subsequently lose, structures which had been successful in early developmental phases. This differs from natural evolution, in which archaic structures are highly conserved. To validate our hypothesis that adaptive variation rates might ameliorate this, we compare DTAG3P+ with both Koza-style GP [6] and the original TAG3P [2] on a family of k-parity problems of increasing difficulty. We also observe DTAG3P+ results on a family of symbolic regression problems previously studied in [4], to provide comparison with the original DTAG3P representation. The paper is organised as follows. The next section briefly reviews the literature on adaptive GP, surveys L-systems, and concludes with k-parity problems, emphasising their scaling properties. Section 3 discusses the interaction of


evolution, development and evaluation, and describes our L-system-based developmental evolutionary system with adaptive variation. Experimental setups are described in Section 4. Section 5 provides the results, with discussion in Section 6. Conclusions and future work are laid out in the final section.

2 Background and Previous Work

2.1 Adaptive Evolutionary Parameters

We call an evolutionary algorithm "adaptive" if it modifies or updates some aspect of the method based on its behaviour in solving a problem. Adaptation has been an important theme in evolutionary and GP research over the years [7–14], generally involving either a heuristic algorithm that determines the adaptation from previous behaviour, or the incorporation of parameters controlling the evolution into the evolutionary genotype (self-adaptation).

2.2 L-systems and D0L-systems

L-systems were introduced by Lindenmayer in 1968 [1] to simulate the developmental processes of natural organisms; for more details see [15]. We use the simplest form, a deterministic L-system with 0 (zero-sided) interactions (D0L-system), corresponding to context-free grammars (CFGs). A D0L-system is an ordered triplet G = (V, ω, P) where:

– V is the alphabet of the system, and V* is the set of all words over V.
– ω ∈ V* is a nonempty word called the axiom.
– P ⊂ V × V* is a finite set of productions. A production (p, s) ∈ P is written p → s; p and s are the predecessor and successor of this production.
– Whenever there is no explicit production for a symbol p, the identity mapping p → p is assumed.
– There is at most one production rule for each symbol p ∈ V.

Let p = p1 p2 ... pm be an arbitrary word over V. The word s = s1 s2 ... sm ∈ V* is directly derived from p, denoted p ⇒ s, iff pi → si for all i ∈ {1, ..., m}. If there is a developmental sequence p0, p1, ..., pn with p0 = ω, pn = s, and p0 ⇒ p1 ⇒ ... ⇒ pn, we say that G generates s in a derivation of length n.
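As a concrete illustration (our own sketch, not from the paper), the following Python fragment implements direct derivation in a D0L-system as parallel string rewriting, with the identity mapping assumed for symbols that have no explicit production; the example rules a → ab, b → a are Lindenmayer's classic development model.

```python
def derive_step(word, productions):
    """One direct derivation p => s: rewrite every symbol in parallel,
    falling back to the identity mapping for symbols without a rule."""
    return "".join(productions.get(symbol, symbol) for symbol in word)

def derive(axiom, productions, n):
    """Generate the developmental sequence p0, p1, ..., pn of length n."""
    word = axiom
    sequence = [word]
    for _ in range(n):
        word = derive_step(word, productions)
        sequence.append(word)
    return sequence

# Lindenmayer's classic example: V = {a, b}, axiom a, rules a -> ab, b -> a.
print(derive("a", {"a": "ab", "b": "a"}, 4))
# ['a', 'ab', 'aba', 'abaab', 'abaababa']
```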

2.3 Parity Problems

The developmental evaluation approach aims to handle the scaling of GP problems by incrementally solving a family of problems of increasing difficulty during the developmental process. The k-parity problems constitute a long-studied family of difficult GP benchmarks. The even (odd) task is to evolve a function returning 1 if an even (odd) number of the inputs evaluate to 1, and 0 otherwise. Langdon and Poli observed in [16] that the task is extremely sensitive to changes in the values of its inputs, and that the commonly-used function set {OR, AND, NOR, NAND} omits the useful XOR and EQ building blocks of this problem. Inspired by Poli and Page [17], we chose the function set {AND, OR, XOR, NOT} as a suitable compromise for comparisons on the family of odd-k-parity problems – it contains the XOR building block, unlike the first function set, but is tougher than Poli and Page's set (which contained all binary Boolean functions).
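To make the benchmark concrete, here is a minimal sketch (our own illustration, not the authors' implementation) of odd-k-parity fitness with the chosen function set: a candidate expression is evaluated on all 2^k Boolean input combinations and its errors are counted. The nested-tuple expression encoding and the function names are assumptions made for this example.

```python
from itertools import product

# Illustrative encoding: an expression is either an integer i (the variable x_i),
# a tuple ("NOT", arg), or a tuple (op, left, right) with op in AND/OR/XOR.
def evaluate(expr, inputs):
    if isinstance(expr, int):                    # terminal x_i
        return inputs[expr]
    op = expr[0]
    if op == "NOT":
        return 1 - evaluate(expr[1], inputs)
    a = evaluate(expr[1], inputs)
    b = evaluate(expr[2], inputs)
    return {"AND": a & b, "OR": a | b, "XOR": a ^ b}[op]

def odd_parity_errors(expr, k):
    """Number of fitness cases (out of 2**k) the expression gets wrong."""
    errors = 0
    for bits in product((0, 1), repeat=k):
        target = sum(bits) % 2                   # odd parity of the inputs
        errors += evaluate(expr, bits) != target
    return errors

# Example: x0 XOR x1 solves odd-2-parity exactly.
print(odd_parity_errors(("XOR", 0, 1), 2))       # -> 0
```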

3 Evolution, Development and Evaluation

3.1 Genotype and Development: TAG-Based D0L-systems

For our purposes, D0L-systems must not only represent development, but must generate at each stage an individual which can be evaluated. We use the TAG representation introduced in [2]⁴. Briefly, a TAG representation consists of an α tree ('The cat sat on the mat') and instructions for adjoining β trees ('black' or 'which it liked') to form a more complex whole ('The black cat sat on the mat which it liked'). In our representation, the D0L triple G = (V, ω, P) is mapped to TAG representation by defining ω to consist of an α tree together with a predecessor from P, and each letter {L1, L2, L3, ...} ∈ V to be either a predecessor from P or a β tree. Thus the D0L rewriting of ω corresponds to adjunction of successive β trees into the initial α tree. For example, assume an L-system G′ = (V′, ω′, P′) with V′ = {L1, L2, L3, L4}, ω′ = (α1, L1), and P′ consisting of

P1′: L1 → β1 β2 β3 L4 β1 L2
P2′: L2 → β2 β1 β4 β3 L2 L3
P3′: L3 → β5 β6 L4 β7 β8 L1
P4′: L4 → β1 L2 β4 β6 β7 L3

Figure 1 shows tree representations of these productions, together with three stages of the expansion of this system into a TAG derivation tree. The expansion starts with the TAG initial tree α1 together with the predecessor L1. In the stage 1 expansion, L1 is replaced by its successor in production rule P1′. This successor has two predecessors, L2 and L4. In the stage 2 expansion, these are replaced by their successors using the corresponding production rules P2′ and P4′. This leaves us with four available predecessors – two occurrences each of L2 and L3. This process continues until predefined limits on the number of stages are reached.
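The staged expansion can be sketched as follows (our own simplified encoding, not the authors' code): each rule's right-hand side is a list of tokens, the β trees are treated as opaque terminals, and every outstanding predecessor is replaced by its successor at each developmental stage; the adjunction of the β trees into the α tree is abstracted away here.

```python
# The example ruleset G' from the text: successors mix beta-tree tokens
# (terminal in this sketch) with predecessors L1..L4 expanded at the next stage.
RULES = {
    "L1": ["b1", "b2", "b3", "L4", "b1", "L2"],
    "L2": ["b2", "b1", "b4", "b3", "L2", "L3"],
    "L3": ["b5", "b6", "L4", "b7", "b8", "L1"],
    "L4": ["b1", "L2", "b4", "b6", "b7", "L3"],
}

def expand(axiom_letter, rules, stages):
    """Token sequence after the given number of developmental stages."""
    word = [axiom_letter]
    for _ in range(stages):
        word = [tok for sym in word
                for tok in (rules[sym] if sym in rules else [sym])]
    return word

print(expand("L1", RULES, 1))
# ['b1', 'b2', 'b3', 'L4', 'b1', 'L2']
print(len(expand("L1", RULES, 2)))   # -> 16: both L4 and L2 were expanded
```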

3.2 Developmental TAG GP (DTAG3P)

DTAG3P uses TAG-based D0L-systems to encode tree adjoining grammars, so delimiting the language of the genetic programming system. It is a developmental form of the earlier TAG3P system, and shares many aspects with it. We describe these briefly, but refer readers to [2] for detail. We assume a lexicalised TAG (LTAG) grammar⁵ Glex defining the set A of α trees and the set B of β trees.

⁴ Any rooted subtree of a valid TAG derivation tree is also valid, as is any extension [2]. This property is not shared by other tree-based GP representations.
⁵ Schabes [18] showed that any CFG G generates an equivalent lexicalised TAG Glex, and an inverse transformation converts a Glex derivation tree to a G derivation tree. We use grammars G which generate the expression trees for the problem domain.


Fig. 1. TAG-Based L-Systems Example, Left: representations of individual rules, Right: stages of development

We evolve D0L rulesets; each ruleset is an evolutionary individual. The ruleset specifies the development of the individual, generating a TAG derivation tree at each stage s of development. This tree is fitness-evaluated against the corresponding problem Ps from our target family of problems. Evaluation uses the standard conversion, first to the corresponding CFG derivation tree, and then to the expression tree [2]. We follow Koza's specification scheme [6], adapting it to incorporate developmental evaluation:

1. Initialisation: We randomly generate maxpop D0L-systems, each containing nrules rules R = {R1, R2, ..., Rnrules}. We denote the predecessors of these rules by Λ = {L1, L2, ..., Lnrules}, so that V = Λ ∪ A ∪ B. We randomly select ω = (α, L), α ∈ A, L ∈ Λ. We construct the successor (RHS) of Ri by first randomly drawing β trees from B and assigning them to the RHS of Ri, up to a random limit between minbetas and maxbetas, and then randomly drawing numletter predecessors from V and inserting them into the RHS.

2. Development and Fitness Evaluation: Each individual undergoes a fixed number maxlife of developmental stages (corresponding to the size of the problem family)⁶. Each individual I is expanded through its development stages (see section 3.1). At stage s, this generates a TAG derivation tree Is of Glex, the corresponding CFG derivation tree⁴ CF(Is) of G, and the expression exp(Is). We evaluate exp(Is) against the corresponding problem Ps to get a fitness value fit(Is).

3. Selection uses a developmental form of tournament selection of size sizetourn. We first compare the individuals on stage 1 fitness fit(I1). The fittest individuals are carried to stage 2. This is repeated as necessary with I2, ..., Imaxlife; if more than one individual reaches stage maxlife, we use random choice⁶. Two individuals I, J are considered equal iff |fit(Is) − fit(Js)| ≤ δ. We use an elite of 1.

4. Genetic operators: individuals in the next generation are produced with probability pX by recombination and 1 − pX by alteration (a sketch of selection and these operators follows this list).
   – Recombination takes two individuals {P1, P2} and creates two offspring {C1, C2} by uniform crossover on rules: a rule in C1 (C2) is, with probability pcopy, copied from the corresponding rule of P1 (P2), and otherwise from the corresponding rule of P2 (P1).
   – Alteration consists of three sub-operators⁷ acting on the RHSs of rules:
     • internal crossover⁸: subtree crossover is performed between rules
     • subtree mutation: subtrees are replaced by randomly generated subtrees
     • lexical mutation: the symbol in a node is randomly substituted
   The probability of alteration uses an adaptive alteration rate padapt, initially set to a high value pbad. When a rule is used in a developmental stage which was used to select the parent, its rate is reset to a lower value pgood. Thus the child is more likely to inherit this rule unchanged.

5. Parameters: The maximum number of generations maxgen, population size maxpop, and recombination (pX, pcopy) and alteration (pbad, pgood) rates specify the evolutionary system; the number of rules nrules and minbetas, maxbetas, numletter – respectively the minimum and maximum number of β trees and the number of predecessors in a rule RHS – together with the maximum lifetime maxlife and minimum difference δ, specify the developmental system.

⁶ Many fitness evaluations from later life stages are not used in selection. Lazy evaluation would eliminate their computational cost. For analysis, we perform these evaluations, but also report the computational cost had we avoided them.
⁷ These are operators of the TAG3P GP system, described in full detail in [2].
⁸ We emphasise that this is an exchange of information between components of an individual, not a recombination operator.
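A minimal sketch of how the developmental tournament selection, rule-wise recombination and adaptive alteration rate described above might fit together, assuming error-based (lower-is-better) fitness, rulesets stored as plain lists of rules, and a per-rule flag recording whether the rule was used at the stage where its parent won selection; the data structures and helper names are our own, not the authors' implementation.

```python
import random

def developmental_tournament(population, fitnesses, tourn_size, max_life, delta):
    """Pick one parent. fitnesses[i][s] is individual i's error at stage s
    (lower is better); candidates are compared stage by stage, keeping only
    those within delta of the best, with any final tie broken randomly."""
    candidates = random.sample(range(len(population)), tourn_size)
    for stage in range(max_life):
        best = min(fitnesses[i][stage] for i in candidates)
        candidates = [i for i in candidates
                      if abs(fitnesses[i][stage] - best) <= delta]
        if len(candidates) == 1:
            break
    return population[random.choice(candidates)]

def recombine(parent1, parent2, p_copy=0.8):
    """Uniform crossover on rules: child 1 takes each rule from parent 1 with
    probability p_copy, otherwise from parent 2 (and symmetrically for child 2)."""
    child1, child2 = [], []
    for r1, r2 in zip(parent1, parent2):
        if random.random() < p_copy:
            child1.append(r1)
            child2.append(r2)
        else:
            child1.append(r2)
            child2.append(r1)
    return child1, child2

def alteration_rate(rule_used_in_selecting_stage, p_bad=0.8, p_good=0.05):
    """Adaptive variation rate: a rule that contributed to the developmental
    stage at which its parent won selection is altered with the low rate
    p_good, so successful early building blocks tend to survive unchanged."""
    return p_good if rule_used_in_selecting_stage else p_bad
```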

3.3 Domain Variables

Our previous work [3, 4] used a family {F1, ..., F9} of symbolic regression problems, with Fi(x) = 1 + x + ... + x^i. In these problems, F1, ..., F9 all have the same domain variable, x. By contrast, each new (k+1)-parity problem introduces a new variable xk+1 not required for k-parity. But the learning system does not know this. How should it handle an expression containing xj when it is solving a problem Pk, k < j, where xj has no meaning⁹? We can see three different ways of dealing with such 'undefined' variables (denoted undef) during evaluation; we chose alternative 2 in these experiments after preliminary testing (a sketch of this treatment follows the list).

1. Always replace undef with 1 (or, equivalently, with 0).
2. (undef OP xi|undef) = (xi|undef OP undef) = undef; when an expression evaluates to undef, its error is calculated as 0.5.
3. undef OP xi and xi OP undef depend on OP (for example, undef AND 0 = 0; undef AND 1 = undef), giving a Lukasiewicz logic [19].

⁹ The developmental process could prevent this, but we regard it as cheating.
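A sketch of how alternative 2 might be implemented (our own illustration, with an assumed sentinel value and helper names): undef is absorbing under every operator, and a fitness case whose output is undef contributes an error of 0.5.

```python
UNDEF = object()  # sentinel: the variable does not yet exist at this stage

def apply_op(op, a, b=None):
    """Alternative 2: any operation with an undef argument yields undef."""
    if a is UNDEF or b is UNDEF:
        return UNDEF
    if op == "NOT":
        return 1 - a
    return {"AND": a & b, "OR": a | b, "XOR": a ^ b}[op]

def case_error(output, target):
    """Error contribution of one fitness case: an undef output counts as 0.5."""
    if output is UNDEF:
        return 0.5
    return 0.0 if output == target else 1.0

# Example: x3 has no meaning when solving 2-parity, so (x1 XOR x3) is undef.
print(case_error(apply_op("XOR", 1, UNDEF), 1))  # -> 0.5
```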

4 Experimental Details

The first experiments compare the adaptive variation system DTAG3P+ with the original DTAG3P on two benchmark problems: the structured family of symbolic regression problems {F1, ..., F9}, and the family of odd-k-parity problems (k = 2, ..., 10). The context-free grammars G1 and G2 for symbolic regression and parity are given in Table 1. The corresponding TAGs are Gilex = {Vi, Ti, Ii, Ai}, i = 1, 2, with I1 ∪ A1 for symbolic regression as in [3], and I2 ∪ A2 for parity as shown in Figure 2.

Table 1. Grammars for Solution Spaces: (top) Symbolic Regression, (bottom) Parity

G1 = (V1, T1, P1, S1)
  S1 = EXP
  V1 = {EXP, PRE, OP, VAR}
  T1 = {x, sin, cos, lg, ep, +, −, ∗, /}
  P1: EXP → EXP OP EXP | PRE EXP | VAR
      OP  → + | − | ∗ | /
      PRE → sin | cos | lg | ep
      VAR → x

G2 = (V2, T2, P2, S2)
  S2 = BOOL
  V2 = {BOOL, PRE, OP, VAR}
  T2 = {AND, OR, XOR, NOT, xi, i = 1, ..., k}
  P2: BOOL → BOOL OP BOOL | PRE BOOL | VAR
      OP  → AND | OR | XOR
      PRE → NOT
      VAR → xi, i = 1, ..., k

All individuals are composed of instances of these α- and β-trees.

Fig. 2. Lexicalized TAG (Glex ) elementary trees for k-parity problem

Detailed parameters for the experiments on the two problem families are given in Table 2. For the parity experiments, three experimental settings were used, with different target k-parity problems (k = 8, 10, 12). We start development with the 2-parity problem, giving lifetimes maxlife of 7, 9 and 11. The second set of experiments addressed scaling issues, testing the performance of DTAG3P+, TAG3P and GP on k-parity (k = 8, 10, 12). Counting generations gives a distorted view of DTAG3P+'s computational cost: not all evaluations are used⁶, and the parsimony of DTAG3P+ leads to cheaper evaluation. We use evaluations of nodes in the expression tree as the primary measure, with a budget of 1.25 × 10^8 node evaluations, but also report function evaluations and generations in the results. Parameters were otherwise as in Table 2.
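Since node evaluations are the primary cost measure, the sketch below shows one way they might be counted: every application of AND, OR, XOR or NOT during evaluation of an expression tree increments a counter. The expression encoding and the counting scheme are our own illustrative assumptions, not the authors' instrumentation.

```python
def evaluate_counting(expr, inputs, counter):
    """Evaluate a Boolean expression tree, counting one 'node evaluation'
    per operator (AND, OR, XOR, NOT) applied; terminals cost nothing."""
    if isinstance(expr, int):                    # terminal x_i
        return inputs[expr]
    counter[0] += 1                              # one node evaluation
    op = expr[0]
    if op == "NOT":
        return 1 - evaluate_counting(expr[1], inputs, counter)
    a = evaluate_counting(expr[1], inputs, counter)
    b = evaluate_counting(expr[2], inputs, counter)
    return {"AND": a & b, "OR": a | b, "XOR": a ^ b}[op]

count = [0]
evaluate_counting(("XOR", ("AND", 0, 1), 2), (1, 0, 1), count)
print(count[0])  # -> 2 node evaluations for this expression and input
```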


Table 2. Evolutionary & Developmental Parameter Settings

Parameter              Symbolic Regression        Parity
Success Predicate      Error Sum < ε = 0.01       Zero Error
Fitness Cases          20 points in [−1, +1]       All Boolean Combinations
Fitness                Error Sum                  # of Errors (undef counts 0.5)
Genetic Operators      Tournament selection (3); Recombination, internal crossover,
                       subtree mutation, lexical mutation
Elite Size             1
# of Runs              100                        30
maxlife                8                          7, 9, 11
maxgen                 51                         100, 100, 100
maxpop                 250                        500
pX                     0.9
pcopy                  0.8
pbad                   0.8
pgood                  0.05
nrules                 15                         20
(minbetas, maxbetas)   (1, 2)
numletter              1
δ                      0.01                       0

5 Results

Table 3. Percentage Success Rates. Left: Symbolic Regression (F9); Right: Parity

            GP    TAG3P   DTAG3P   DTAG3P+
  F9        0%    8%      73%      100%

            6-Parity   8-Parity   10-Parity
  DTAG3P    0%         0%         0%
  DTAG3P+   93%        80%        53%

Table 3 and the top row of Figure 3 summarise the results of the first set of experiments, giving some impression of the overall performance of DTAG3P+. The first part of the table shows the percentage of successful runs for GP, TAG3P, DTAG3P and DTAG3P+ on F9 from the symbolic regression problem family, with the second showing the same for DTAG3P and DTAG3P+ on the 6, 8 and 10-parity problems. The left figure shows the cumulative probability of success by generation for these systems on symbolic regression, while the middle figure compares the median best fitnesses by generation for the 6, 8 and 10-parity problems. In the latter, the combination of our treatment of 'undefined' values, under which a fitness of 0.5 is easily attained, and the use of elitism probably explains the 'blocky' look of the plots.

Fig. 3. Top: Overall performance of DTAG3P+ (Left: cumulative success rate vs generation (F9, symbolic regression); Middle: median best fitness vs generation (8-parity); Right: cumulative success rate vs generation on the family of k-parity problems). Bottom: Median of the best fitness of GP, TAG3P and DTAG3P+ on the 12-parity problem vs number of evaluations (left: all functions, middle: necessary functions, right: nodes).

From the right figure we gain an impression of the scaling of DTAG3P+ on the parity problems, showing as it does the cumulative probability of success on each of the 6, 8, 10 and 12-parity problems.

Table 4. Success Rates after 1.25 × 10^8 Node Evaluations: 8, 10 and 12-Parity, 30 runs

            8-Parity   10-Parity   12-Parity
  GP        23%        0%          0%
  TAG3P     23%        0%          0%
  DTAG3P+   100%       100%        100%

The bottom row of Figure 3 gives a more detailed insight into the relative behaviour of GP, TAG3P and DTAG3P+ on the difficult 12-parity problem. The left figure gives the conventional view of the relative computational complexity, using the number of function evaluations as the X-axis. The middle figure takes lazy evaluation of fitnesses into account, and counts only the evaluations which would actually be necessary. The right figure gives the most accurate comparison of computational cost, showing the actual number of node evaluations (AND, OR, XOR and NOT operations) required. The Y-axis, in all cases, is the median best fitness (calculated over 30 runs). Using this metric, Table 4 shows the relative rates of success of GP, TAG3P and DTAG3P+ on 8, 10 and 12-parity.

6 Discussion

The results in Table 3 and Figure 3 (top) confirm that the good performance of DTAG3P on symbolic regression is not damaged by the use of adaptive variation rates – it appears to be substantially enhanced – and that this performance improvement extends to parity problems. They show that DTAG3P+ scales well with the difficulty of this problem, and that reasonable computational resources can handle this difficult problem well. Figure 3 (bottom) shows that the results further favour DTAG3P+ when more realistic measures of computational cost are used: DTAG3P+ is able to solve the difficult 12-parity problem reliably, with a computational effort equivalent to only 2–3 generations of standard GP – by which point the latter has made virtually no progress on the problem. Table 4 shows that, while GP and TAG3P scale very poorly on parity problems, DTAG3P+ scales well. Finally, we note that the developmental methods not only solve one problem (F9 or 12-parity) at lower computational cost than conventional approaches, but also solve the smaller problems (Fi, i < 9, and k-parity, k < 12) for free.

7 Conclusions and Future Work

We have demonstrated that developmental evaluation, combined with adaptive variation, performs well on both symbolic regression and parity problems, and scales well. We believe we have demonstrated a general purpose problem decomposition engine for families of related problems, applicable to a range of domains and scaling well. Future work will include:

– replacing adaptive variation with self-adaptive variation, to reduce the number of system parameters and further improve scalability
– further investigating the role of replicated building blocks in DTAG3P+ with compression methods
– more detailed investigation of the scalability of DTAG3P+
– extension to a range of new problems
– investigating alternative ways to handle undefined variables

Acknowledgements. We gratefully acknowledge the benefit of discussions with Naoki Mori. This research was partially financially supported by a Seoul National University support grant for new faculty.

References

1. Lindenmayer, A.: Mathematical models for cellular interaction in development, parts I and II. Journal of Theoretical Biology 18 (1968) 280–299 and 300–315
2. Hoai, N.X., McKay, R.I.B., Essam, D.: Representation and structural difficulty in genetic programming. IEEE Transactions on Evolutionary Computation 10(2) (April 2006) 157–166


3. McKay, R.I., Hoang, T.H., Essam, D.L., Nguyen, X.H.: Developmental evaluation in genetic programming: the preliminary results. In Collet, P., Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A., eds.: Proceedings of the 9th European Conference on Genetic Programming. Volume 3905 of Lecture Notes in Computer Science, Budapest, Hungary, Springer (10–12 April 2006) 280–289
4. Hao, H.T., Essam, D., McKay, R.I., Nguyen, X.H.: Developmental evaluation in genetic programming: A TAG-based framework. In Pham, T.L., Le, H.K., Nguyen, X.H., eds.: Proceedings of the Third Asian-Pacific Workshop on Genetic Programming, Military Technical Academy, Hanoi, Vietnam (2006) 86–97
5. Shin, J., Kang, M., McKay, R.I.B., Nguyen, X., Hoang, T.H., Mori, N., Essam, D.: Analysing the regularity of genomes using compression and expression simplification. In: Proceedings of the 10th European Conference on Genetic Programming (EuroGP 2007, Valencia, Spain). Volume 4445 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, Germany (April 2007) 251–260
6. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA (1992)
7. Schwefel, H.P.: Numerical Optimization of Computer Models. John Wiley & Sons, Inc., New York, NY, USA (1981)
8. Bäck, T., Schwefel, H.P.: An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation 1(1) (1993) 1–23
9. Angeline, P.J., Pollack, J.B.: Coevolving high-level representations. In Langton, C.G., ed.: Artificial Life III. Volume XVII of SFI Studies in the Sciences of Complexity, Santa Fe, New Mexico, Addison-Wesley (15–19 June 1992, 1994) 55–71
10. Rosca, J.P., Ballard, D.H.: Hierarchical self-organization in genetic programming. In: Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann (1994)
11. Angeline, P.J.: Two self-adaptive crossover operators for genetic programming. In Angeline, P.J., Kinnear, Jr., K.E., eds.: Advances in Genetic Programming 2. MIT Press, Cambridge, MA, USA (1996) 89–110
12. Teller, A.: Evolving programmers: The co-evolution of intelligent recombination operators. In Angeline, P.J., Kinnear, Jr., K.E., eds.: Advances in Genetic Programming 2. MIT Press, Cambridge, MA, USA (1996) 45–68
13. Iba, H., de Garis, H.: Extending genetic programming with recombinative guidance. In Angeline, P.J., Kinnear, Jr., K.E., eds.: Advances in Genetic Programming 2. MIT Press, Cambridge, MA, USA (1996) 69–88
14. Angeline, P.J.: Adaptive and self-adaptive evolutionary computations. In Palaniswami, M., Attikiouzel, Y., eds.: Computational Intelligence: A Dynamic Systems Perspective. IEEE Press (1995) 152–163
15. Prusinkiewicz, P., Lindenmayer, A.: The Algorithmic Beauty of Plants. Springer-Verlag New York, Inc., New York, NY, USA (1996)
16. Langdon, W.B., Poli, R.: Why "building blocks" don't work on parity problems. Technical Report CSRP-98-17, University of Birmingham, School of Computer Science (13 July 1998)
17. Poli, R., Page, J.: Solving high-order Boolean parity problems with smooth uniform crossover, sub-machine code GP and demes. Genetic Programming and Evolvable Machines 1(1/2) (April 2000) 37–56
18. Schabes, Y., Waters, R.: Tree insertion grammar: A cubic-time parsable formalism that lexicalizes context-free grammar without changing the trees produced. Computational Linguistics 20(1) (1995) 479–513
19. Lukasiewicz, J.: On Three-Valued Logic. Clarendon Press, Oxford, UK (1967) 16–18