Database Technologies for L-system Simulations in Virtual Plant Applications on Bioinformatics
Yi-Ping Phoebe Chen (1) and Robert M. Colomb (2)
(1) Faculty of Information Technology, Queensland University of Technology, QLD 4001, Australia. Email: [email protected]
(2) School of Computer Science and Electrical Engineering, The University of Queensland, QLD 4072, Australia. Email: [email protected]
Abstract
One of the most important advantages of database systems is that the underlying mathematics is rich enough to specify very complex operations with a small number of statements in the database language. This research covers an aspect of biological informatics, that is, the marriage of information technology and biology, involving the study of real-world phenomena using virtual plants derived from L-system simulation. L-systems were introduced by Aristid Lindenmayer as a mathematical model of multicellular organisms. Little consideration has been given to the problem of persistent storage for these simulations, and current procedures for querying data generated by L-systems for scientific experiments, simulations and measurements are also inadequate. To address these problems, this paper presents a generic process for data modelling tools (L-DBM) between L-systems and database systems. It shows how L-system productions can be generically and automatically represented in a database schema and how a database can be populated from the L-system strings. It further describes the idea of pre-computing recursive structures in the data into derived attributes using compiler generation. A method that establishes a correspondence between biologists' terms and compiler-generated terms in a biologist computing environment is supplied. Once the L-DBM is given any specific set of L-system productions and its declarations, it can generate the specific schema for both simple correspondence terminology and complex recursive structure data attributes and relationships.
Key words - Database Models, Bioinformatics, Database System Development, L-Systems, Advanced Application and Branching Systems, Knowledge Representation.
1.0 Introduction
Keeping track of and querying data generated by scientific experiments, simulations and measurements requires database management facilities (Chen and Markowitz, 1995; Markowitz and Ritter, 1995). One of the most important advantages of database systems is that the underlying mathematics is rich enough to specify very complex operations with a small number of statements in the database language (Chen and Colomb, 1998). The ability to store and query information simply is important from a software engineering point of view (Leebaert, 1995), since the cost of developing an application is closely related to the number of programming language constructs needed to implement it. This research covers an aspect of biological informatics (Bioinformatics), that is the marriage of information technology and biology, involving the study of real world phenomena using virtual plants. Virtual plants are computer simulations of the structural dynamics of individual plants in 3-D space (Room, Hanan and
Prusinkiewicz, 1996). One approach is to use L-systems. In 1968 a biologist, Aristid Lindenmayer (Lindenmayer, 1968), introduced a new type of string rewriting mechanism, subsequently termed L-systems. L-systems are context-sensitive parallel grammars. The main difference between Chomsky grammars (1950, cited in Pittman and Peters, 1992) and L-systems lies in the method of applying productions. Chomsky grammar productions are applied sequentially, one at a time. L-system productions, however, are applied in parallel and simultaneously replace all letters in a given word (Prusinkiewicz and Hanan, 1989). Further, there are no terminals in L-systems. L-systems have been used in many scientific areas related to branching (plant-like) structures. L-systems can be used to simulate rather than just deliver the productions. For example, L-system productions can simulate morphogenesis (see footnote 1) of shoots and roots of single or multiple individuals of any part of a plant, so long as they are given the appropriate growth rules or productions. Particular plant parts in particular states are represented by symbols and associated parameters. The process of transformation is expressed as rules or productions (Room, Hanan and Prusinkiewicz, 1996; Prusinkiewicz et al., 2000). The Commonwealth Scientific and Industrial Research Organisation (CSIRO) research program traced a process for measuring plant structure over time, deriving statistics for development and growth from measurements, and modelling morphology (see footnote 2) as a set of growth rules expressed in the L-system formalism (Hanan and Room, 1997). At any point during development, a virtual plant’s architecture is defined by a string of symbols representing its constituent modules. The string can be processed to generate statistical summaries of plant attributes or visualised as schematic or realistic computer graphic images seen from any angle (Room et al., 1994). The example shown in Fig. 1 describes the modular nature of a plant through an L-system string.
[Figure 1 (image not reproduced): a schematic plant whose parts are labelled I, L, B and A. L-system String: I[L][B]I[L][B]A]
FIGURE 1. L-system String can be Visualised

This project is based on cooperation with a major Australian government research laboratory, the CSIRO Division of Entomology. Their actual research data and their branching system are used as an application domain. L-systems allow specification of how an object transforms from one state to another. Particular plant parts in particular states are represented by symbols. At any point during development, a virtual plant’s structure is defined by a string of symbols representing its constituent parts. The L-system approach to plant ecology (see footnote 3) builds on the treatment of plants as multiplication of parts (Room, Hanan and Prusinkiewicz, 1996).
1. Morphogenesis means the formation of the structure of plants.
2. Morphology means the branch of biology that deals with the form and structure of plants.
3. Ecology means the branch of biology that deals with the relations between organisms and their environment.

1.1 The Main Problem
The major motivation for this research is to develop tools for data analysis. Tracking the data collection is a most important task for an analytical scientist. The database application tools investigate and improve processing efficiency for data and simulation outputs. In particular, simulations of even small stands of plants quickly generate massive amounts of data. For example, a scientist may choose to run a series of small field-scale simulations, stepping a number of values through a few parameters (for example, 5 parameters with 5 values each gives 5^5 = 3125 runs). If each run has 100 plants and 100 time-steps, and each plant has 100 parts, then each run generates about 10^6 data values. The total experiment generates 3.125 x 10^9 values in a complex organisation. If such a simulation were performed, it would seem reasonable that the investigators would spend considerable time looking at the results in a variety of ways. For example:
• selecting a subset of values for visualisation
• performing aggregations (counts, totals, averages) on selected subsets
This example indicates the size of the data sets we will be dealing with. In fact, there could be multiple sets of data derived from the original simulation results.
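The size estimate above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and the figures are the ones quoted in the example, not measurements from an actual experiment.

```python
# Hypothetical experiment dimensions, taken from the example in the text.
parameters = 5
values_per_parameter = 5
plants_per_run = 100
time_steps_per_run = 100
parts_per_plant = 100

# Every combination of parameter values is one simulation run.
runs = values_per_parameter ** parameters
values_per_run = plants_per_run * time_steps_per_run * parts_per_plant
total_values = runs * values_per_run

print(runs)            # 3125
print(values_per_run)  # 1000000
print(total_values)    # 3125000000
```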
1.2 Persistent Storage in Bioinformatics
Persistent storage for L-system simulations has limitations and difficulties, and little consideration has been given to the problem. To address it, this paper presents a generic process for data modelling (Elmasri and Navathe, 1989) tools between L-systems and database systems. It shows how L-system productions can be represented in a database schema and how a database can be populated from the L-system strings. The relational data model (Codd, 1970) is supported by standards for data description and manipulation, allowing development of many different kinds of tools for storage, retrieval, processing and presentation of data on a broad range of platforms (Colomb, 1998). The main purpose of this paper is to show how a database system can be used to capture and query the results of L-system simulations. Specifically, this research applies relational database technology to a virtual plant environment and a process for creating data modelling tools. The generic Data Based Model for L-Systems (L-DBM) prototype developed for this paper for the persistent storage problem includes:
• The Schema Generator, which maps L-system productions to a database schema.
• The Population Translator, which converts L-system strings to tuples of the corresponding schema.
1.3 The Biologist Querying Environment
Because current procedures for querying data generated by L-systems for scientific experiments, simulations and measurements are inadequate, scientists currently rerun the simulations and write out a new flat file or files of information each time they want to look at a new set of questions. The L-DBM contributes a query builder, based on an Object-Relational DBMS, that provides a biologist computing environment and extends its functionality to meet the scientific requirements of bioinformatics research. As a result of this research, decision-support systems will be improved so that, for example, an agricultural researcher can see how plants will develop if any conditions change.
1.4 Research Focus
This research addresses a novel application area for databases and highlights the interesting database issues it raises. It also discusses what makes L-systems problematic for databases and what challenges virtual plants pose to database researchers. Compiler technology is used in combination with database systems in this research. Once a relational database is created, the work also shows how it is used. The approach uses examples from the most widely-based
domain for L-systems; that is, virtual plants. Beyond this, the project also intends to investigate the possibilities of generalising this research beyond the virtual plant system to other uses of L-systems, in future work.
1.5 Principal Contributions and Outline of the Paper
The following are the major contributions of this paper to the extension of database technologies to L-system simulation in the bioinformatics area. We list them in bottom-up order, that is, each one supports its successors, which runs almost parallel to the contents of Sections 3 to 6. The principal contributions of this paper are the following:
• First, showing how L-system simulation data can be made accessible to the “utility”: how the branching structure defined by a set of L-systems can be represented in a database schema, and how a database can be populated from the L-system strings.
• Second, the creation of a generic solution to persistent storage for L-system simulation results. The significance of generic is that the solution can accept general L-systems as input and is not designed only for a specific L-system. Once the L-DBM is given any specific set of L-system productions and its declarations, it can generate the specific schema for both simple correspondence terminology and complex recursive structure data attributes and relationships. The same correspondence applies to any L-system using the same vocabulary. Once established it can be used to support an entire research program.
• Third, the idea of pre-computing recursive structured data into derived attributes using compiler generation. The results show that recursive structure can be queried without recursive queries.
• Fourth, a method that establishes a correspondence between biologists’ terms and compiler-generated terms, helping to make a biologist-friendly computing environment.
After Section 1, the remaining sections of this paper present in detail our ideas and solutions to attain the above-stated objectives. The organization of these sections is as follows.
The L-systems: Section 2 introduces the basic concept of L-systems (Lindenmayer, 1968 & 1974) and, in particular, the research on branching systems. Three kinds of L-systems are described: basic L-systems, stochastic L-systems (Prusinkiewicz and Hanan, 1989; Hanan, 1992), and context-sensitive L-systems.
L-systems Problematic for Databases: Section 3 defines the research problem between L-systems and database systems.
Generic Solution for Database Schema and A Generic Approach to Population: Section 4 demonstrates the generic process solution between L-systems and database systems and how compiler technologies can be applied to L-system simulation. The section examines how to use the compiler technology, specifically how the L-system information can be processed to produce the data schema, that is, relational tables. We begin by presenting the L-DBM (database modelling for L-systems) framework and the relationships between its parts. Finally, the Population Interpreter is introduced and its production by the Schema Translator is explained. The Population Interpreter itself is then described: we begin by discussing the data transformation from L-system strings to database population and then describe the architecture of the Population Interpreter.
Biologist Computing Environment: Section 5 introduces the correspondence problem between L-systems and database systems. It also
discusses how recursive structure works in the conceptual schema for biological terms and demonstrates how to deal with recursive structure without recursive queries. In this section we also present a visualization browser that visualizes the biological terms in database systems to give a biologist computing environment.
Further Extensions: Section 6 summarises the work and outlines further extensions of this research.
2.0 L-System
L-systems allow specification of how an object transforms from one state to another, sometimes adding new parts, during an interval of time. Particular plant parts in particular states are represented by symbols and the process of transformation is expressed as morphological rules or ‘productions’. An L-system consists of an alphabet, an axiom and a set of productions. Each production consists of two components, a predecessor and a successor. The format is: predecessor --> successor. During a derivation step, the predecessor (identified by its label) is replaced by the successor. L-systems use a string notation that makes it easy to specify productions and carry out simulations. Fig. 2 shows a simple example of an L-system production. In L-systems different types and parts of plants are represented by different letters of the alphabet. “A” is the predecessor and “B[C]D[E]F” is the successor in this production. A-->B[C]D[E]F means A transforms to B[C]D[E]F. In this example the C and E symbols are enclosed in “[ ]”. The square brackets “[ ]” allow the string of symbols to represent a branching structure. The above production A-->B[C]D[E]F is without attributes. Attributes carry additional information, normally 3D position, color and angle; without them the structure could not be visualized. The production A-->B[+C]D[-E]F has the attributes “+” and “-”: in this case “-” indicates the right-hand side of the plant and “+” the left. From this example, we can see that the plant grows from the predecessor to its successor.
[Figure 2 (image not reproduced): the predecessor A is transformed into the successor B[C]D[E]F; a second panel shows the production with attributes, A-->B[+C]D[-E]F.]
FIGURE 2. Example of an L-system Production
2.1 Virtual Plant
Virtual plants are assembled by software which interprets rules of growth worked out from measurements of real plants (Room and Hanan, 1995), and virtual plants describe the shapes and three-dimensional positions of plant parts. L-systems can be used to simulate morphogenesis of shoots and roots of single or multiple individuals of any plant. A virtual plant exists as a string which is updated according to rules at each time-step of a simulation. The string can be used to construct images (Room, Hanan and Prusinkiewicz, 1996; Prusinkiewicz et al., 2000). For a very simple example, the rules used to generate the virtual plant of Fig. 4 use the following symbols: ‘A’ and ‘D’ represent an apex, ‘B’ and ‘C’ a bud, ‘L’ a leaf, ‘F’ a flower, ‘E’ a cotyledon (see footnote 4), ‘I’ an internode, and ‘[ ]’ encloses a branch. In general the start symbol(s) (called the axiom) must be specified separately. Even very simple L-systems can produce plant-like structures. It does not matter what symbols the user employs to represent the plant architecture as long as they describe the tree structure. We have provided a facility for documenting these symbols in the L-DBM. Fig. 3 shows the definition of the symbols of the Fig. 4 L-system. We recall that, in general, the builder of the L-system can use any symbol to represent anything except the branch control symbols “[ ]”.
S: Stem
I: Internode
E: Cotyledon
A: Apex
L: Leaf
B: Young Bud
C: Old Bud
D: Reproductive Apex
F: Flower
FIGURE 3. The Definition of Symbols

For example, Fig. 4 presents an L-system with five productions. The main stem is represented as a series of internodes: the Apex adds them one after another according to production R2. If the I’s and A’s had [ ] around them they wouldn’t represent a stem, but a number of stems coming from the same point. The [ ] controls the branching topology. The productions control how a plant changes from time-step to time-step. So B, C and D describe the development of “laterals” while A describes the development of the main stem.
We start with a stem: S
L-system Axiom: S
with productions:
R1. S --> I[E][E]A
R2. A --> I[L][B]A
R3. B --> C
R4. C --> I[L][C]D
R5. D --> [L]F
FIGURE 4. Example of L-system Productions
4. Cotyledon means an embryonic plant leaf, the first to appear from a sprouting seed.
The axiom describes the starting configuration of the L-system. In this case, ‘S’ represents the plant’s meristem. The five productions are growth rules that describe how the plant components develop over time. The first production captures the emergence of the cotyledons, represented by ‘E’. The vegetative stage of growth is controlled by the second production. The third production specifies that the bud develops, and the fourth production then creates a branching substructure (sub-branch). Both the second and the fourth productions extend the stem by adding another internode length. The fifth production captures the flower. In successive time-steps, the strings in Fig. 5 capture a few stages of development for this simple L-system:
S (0)
I[E][E]A (1)
I[E][E]I[L][B]A (2)
I[E][E]I[L][C]I[L][B]A (3)
I[E][E]I[L][I[L][C]D]I[L][C]I[L][B]A (4)
I[E][E]I[L][I[L][I[L][C]D][L]F]I[L][I[L][C]D]I[L][C]I[L][B]A (5)
FIGURE 5. Five Time-Steps (Steps 0, 1, 2, 3, 4 and 5) for the L-systems Strings

The corresponding simulated plants for the L-system strings of Fig. 5 are shown in Fig. 6. Fig. 6 illustrates the results of time-steps 3, 4 and 5 of the five L-system production rules given in Fig. 4. The figure is annotated with additional information needed to query the set of virtual plants. The major information includes the branch identifier, internode identifier and leaf identifier in the virtual plant. In Fig. 6:
• The branch identifier labels each branch of the plant. For example, the flower (F) belongs to branch number 2.
• The internode identifier is the number attached to each I. For example, the flower (F) belongs to internode number 3 (I3).
• The leaf identifier is the number attached to each L or E.
• The order of branch is the depth of nesting in branches. For example, in the third plant (5) in Fig. 6, the first and third leaves (E1 and L3) are on the main stem or first-order “branch”. The fourth and sixth leaves (L4 and L6) are on a second-order branch. The fifth leaf (L5) is on a third-order branch. So this is a topological property that must be derived from the string, simply by keeping track of the depth of nesting of [ ]’s.
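The derivation listed in Fig. 5 can be reproduced with a short program. The sketch below, in Python, is only an illustration of parallel rewriting for the productions of Fig. 4; the function names `derive` and `derivation` are hypothetical and are not part of the L-DBM or of any L-system simulator.

```python
# Productions R1-R5 of Fig. 4; symbols without a production are copied unchanged.
PRODUCTIONS = {
    "S": "I[E][E]A",
    "A": "I[L][B]A",
    "B": "C",
    "C": "I[L][C]D",
    "D": "[L]F",
}

def derive(string, productions=PRODUCTIONS):
    """Apply one parallel derivation step: every symbol is rewritten at once."""
    return "".join(productions.get(symbol, symbol) for symbol in string)

def derivation(axiom, steps, productions=PRODUCTIONS):
    """Return the strings for time-steps 0..steps, starting from the axiom."""
    strings = [axiom]
    for _ in range(steps):
        strings.append(derive(strings[-1], productions))
    return strings

for step, string in enumerate(derivation("S", 5)):
    print(f"({step}) {string}")
```

Because every symbol of the old string is rewritten in the same pass, the brackets (which have no production) are copied through unchanged and the nesting structure of the plant is preserved from one time-step to the next.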
[Figure 6 (image not reproduced): three virtual plants for time-steps (3), (4) and (5), annotated with branch identifiers, orders of branch, internode identifiers (I1, I2, ...) and leaf identifiers (E1, E2, L3, ...).]
FIGURE 6. Visualization of Result for Time-Steps 3, 4 and 5 of Implementing the Five Productions Given in Fig. 4
2.2 Stochastic L-systems
All plants generated by the same deterministic L-system are identical (Hanan, 1992), but of course real plants are not all the same. The development of a plant part depends on its environment. Stochastic (see footnote 5) techniques provide a convenient method to simulate this. Data can be collected from a number of plants and analysed statistically to determine the appropriate probabilities (for example, R1: 50%; R2: 50%) for branching and growth processes. The format of a stochastic L-system production is:
predecessor --> successor : P
For example:
R1: A1 --> I[L][B1]A1 : 0.5
R2: A1 --> I[L]A2 : 0.5
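As an illustration of the stochastic format, the following Python sketch samples one of the two example productions for every occurrence of A1. It is a minimal sketch under stated assumptions: symbols are held as a list of tokens (so that two-character names such as A1 stay intact), and the names `STOCHASTIC` and `stochastic_step` are hypothetical.

```python
import random

# Stochastic productions from the example: each predecessor maps to a list of
# (successor, probability) pairs. Symbols are tokens such as "A1" or "[".
STOCHASTIC = {
    "A1": [(["I", "[", "L", "]", "[", "B1", "]", "A1"], 0.5),
           (["I", "[", "L", "]", "A2"], 0.5)],
}

def stochastic_step(tokens, productions, rng):
    """Rewrite every token in parallel, sampling one successor per occurrence."""
    result = []
    for token in tokens:
        options = productions.get(token)
        if options is None:
            result.append(token)  # implied identity production
            continue
        successors = [succ for succ, _ in options]
        weights = [p for _, p in options]
        result.extend(rng.choices(successors, weights=weights, k=1)[0])
    return result

rng = random.Random(0)  # fixed seed so that a run can be repeated
plant = ["A1"]
for _ in range(3):
    plant = stochastic_step(plant, STOCHASTIC, rng)
print("".join(plant))
```

Running the same productions with different seeds gives plants that differ in structure, which is the variation between individuals that the probabilities are meant to capture.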
2.3 Context-Sensitive L-systems
A context-sensitive grammar provides a useful means of modelling the environment in which a plant develops. In the general case L-systems are context-sensitive parallel grammars, but they can be context free. We use context sensitivity in L-systems to represent the flow of photosynthates (see footnote 6), hormones (see footnote 7) and other signals in the plant. For context-sensitive L-system productions, the format is:
left context < predecessor > right context : condition --> successor
So “<” and “>” separate the predecessor from the contexts, and --> should be read as “goes to” or “becomes”. If there is no left context it may be represented as * < or it could just be left out; the same holds for the right context. So, for a particular predecessor to match, the predecessor itself must match, the characters to the left and right must match the left and right contexts, and the condition must be true. Then the predecessor is replaced by the successor in the new string.
To give a simple example of context: say we have a string ABCB and the productions are A < B --> D, C < B --> E, and A --> F. Note that only the first two are context-sensitive and that if there is no production for a symbol there is an implied identity production, in this case C --> C. The next string would be FDCE. So the B with an A to the left becomes a D while a B with a C to the left becomes an E.
As before (section 2.2), the basic production format “predecessor --> successor” carries the major part of the rule for the state of the data. In the L-DBM we can ignore the context-sensitivity; that is, we ignore context-sensitive information when producing the L-DBM Schema Translator and the L-DBM Population Interpreter. The major reason is that, although the L-system strings may have resulted from context-sensitive productions, the L-system string itself is interpreted by a context-free grammar. The context has already been taken into account when the L-system string is generated by the simulator.
5. Stochastic means based on the probability distribution of an ordered set of observations.
6. Photosynthates are the chemical products of photosynthesis, the process by which light energy is converted to chemical energy.
7. Hormones are any of various internally secreted compounds that affect the functions of specifically receptive organs.
In this paper, we focus on how to record strings as a database population; we are not concerned with how they were produced by context-sensitive productions, so we do not need to examine context sensitivity. Context sensitivity matters when the simulation is produced. Currently, we disregard context information when deriving the schema, as strings do not actually reflect the context information that was used in their derivation. It is possible, however, that we generate tables that will never hold any data, because certain patterns that can be derived in the context-free form do not arise in the context-sensitive form.
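The ABCB example of this section can be checked with a small program. The Python sketch below handles only left contexts of a single symbol and no conditions, which is all the example needs; it is a simplification for illustration (real simulators also handle right contexts, conditions and brackets), and the names `CONTEXT_RULES`, `FREE_RULES` and `context_step` are hypothetical.

```python
# Left-context productions from the example: (left context, predecessor) -> successor.
CONTEXT_RULES = {("A", "B"): "D", ("C", "B"): "E"}
# Context-free productions, used when no context-sensitive rule matches.
FREE_RULES = {"A": "F"}

def context_step(string):
    """One parallel step; contexts are read from the old string, not the new one."""
    out = []
    for i, symbol in enumerate(string):
        left = string[i - 1] if i > 0 else None
        if (left, symbol) in CONTEXT_RULES:
            out.append(CONTEXT_RULES[(left, symbol)])
        else:
            out.append(FREE_RULES.get(symbol, symbol))  # identity if no rule
    return "".join(out)

print(context_step("ABCB"))  # FDCE
```

The result FDCE is an ordinary string over the same alphabet, which is why the schema and population steps can treat it with a context-free grammar regardless of how it was derived.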
3.0 Branching Structure Can Be Represented in Tables
This section describes the relationship between L-systems and database systems. We begin by introducing the relationship between an L-system string and its expression in relational tables. Fig. 7 is a visualization of time-step 5 of the L-system string. As mentioned in the previous section, this figure is annotated with additional information needed to query the set of virtual plants: branch identifiers, internode identifiers and leaf identifiers. These identifiers are used to keep track of parts of strings, and to find and compute the values of queries.
[Figure 7 (image not reproduced): the virtual plant at time-step (5), annotated with branch identifiers, orders of branch, internode identifiers (I1 to I8) and leaf identifiers (E1, E2, L3 to L10).]
FIGURE 7. (Same as Fig. 6 (5)), Result in Time-Step 5 of Implementing the Five Productions Given in Fig. 4.
Tables 1 and 2 show the leaf and branch tables capturing information about the simulated structure at time-step 5, corresponding to the string labelled (5) in Fig. 5. The branch table shows Branch_ID, this branch’s Super_Branch (its parent’s branch number), its Super_Internode (the internode the branch belongs to) and its related attributes, for example Depth_Number. Depth_Number is the branch depth. Details of the branch structures are discussed in the following section. The leaf table shows Leaf_ID, this leaf’s Branch_ID and Internode_ID and its related attributes, for example Leaf_Type. In this example, Leaf_Type could be “E” or “L”. This kind of result allows easy collection of statistics from virtual plants, and data analysis, using existing database techniques. Representation of this particular L-system in tables is fairly straightforward, but for more complicated structures, such as a tree, it may not be so easy. How can we use generic L-system productions and strings to produce the generic database schema and populations automatically? A solution will be presented in the following sections. The main interest is to capture information about the differences and similarities in plant components related to their position in the branching structure. For example, do leaves at the second order of a branch have the same properties as those at the third order of a branch? Or do leaves at the first position on a branch have similar or different properties?
Leaf_ID  Branch_ID  Internode_ID  Leaf_Type  Attributes...
1        1          1             ‘E’
2        1          1             ‘E’
3        1          2             ‘L’
4        2          3             ‘L’
5        3          4             ‘L’
6        2          3             ‘L’
7        1          5             ‘L’
8        4          6             ‘L’
9        1          7             ‘L’
10       1          8             ‘L’
TABLE 1. Example for Leaf Table
Branch_ID  Super_Branch  Super_Internode  Depth_Number  Attributes...
1          NULL          NULL             1
2          1             2                2
3          2             3                3
4          1             5                2
TABLE 2. Example for Branch Table
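Once the leaf and branch rows are held in a relational database, questions about position in the branching structure become ordinary queries. The following Python sketch uses the standard `sqlite3` module as a stand-in for the DBMS (an assumption; the paper does not name SQLite) and assumes that the branch identifiers and depth numbers of Tables 1 and 2 are the integers 1 to 4 and 1 to 3. The column types are likewise illustrative. It counts the leaves at each order of branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (
    Branch_ID       INTEGER PRIMARY KEY,
    Super_Branch    INTEGER REFERENCES Branch(Branch_ID),
    Super_Internode INTEGER,
    Depth_Number    INTEGER
);
CREATE TABLE Leaf (
    Leaf_ID      INTEGER PRIMARY KEY,
    Branch_ID    INTEGER REFERENCES Branch(Branch_ID),
    Internode_ID INTEGER,
    Leaf_Type    TEXT
);
""")
# Rows of Table 2 (branch) and Table 1 (leaf).
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?, ?)",
                 [(1, None, None, 1), (2, 1, 2, 2), (3, 2, 3, 3), (4, 1, 5, 2)])
conn.executemany("INSERT INTO Leaf VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "E"), (2, 1, 1, "E"), (3, 1, 2, "L"), (4, 2, 3, "L"),
                  (5, 3, 4, "L"), (6, 2, 3, "L"), (7, 1, 5, "L"), (8, 4, 6, "L"),
                  (9, 1, 7, "L"), (10, 1, 8, "L")])

# Number of leaves at each order of branch, without any recursive query:
# the depth is already stored as the derived attribute Depth_Number.
leaves_per_order = conn.execute("""
    SELECT b.Depth_Number, COUNT(*)
    FROM Leaf AS l JOIN Branch AS b ON b.Branch_ID = l.Branch_ID
    GROUP BY b.Depth_Number
    ORDER BY b.Depth_Number
""").fetchall()
print(leaves_per_order)  # [(1, 6), (2, 3), (3, 1)]
```

Storing Depth_Number as a column is what keeps the query to a single join and aggregate; without it, the order of a branch would have to be computed by following Super_Branch links.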
3.1 Using the Grammar to Derive the Schema and to Populate
The data structure of the general branching system includes the branch, internode, leaf, bud and flower. Fig. 8 and 11 show an extract of an Entity Relationship (ER) conceptual schema (Chen, 1976) to describe an L-system. This conceptual schema shows how branch growth expresses the recursive structure in the database of a virtual plant. Lindenmayer’s concept of bracketed strings can be applied to parametric words in an elementary way. The formalism of parametric L-systems is extended to branching structures using the branch delimiters “[” and “]” as in non-parametric bracketed L-systems (Hanan, 1992 & 1995). Left and right brackets, “[” and “]”, must occur in matching pairs in the same way as parentheses in an arithmetic expression. By considering the brackets as delimiters of branches, the bracketed strings can be interpreted as branching structures. In particular, a left bracket indicates the node of the parent branch (Super Branch in the database system) to which the child branch (Branch in the database system) is to be attached, while the matching right bracket terminates the branch specification. The brackets may be nested, indicating higher-order branches. If more than one branch occurs at a site, the bracketed substrings representing each one are listed sequentially, in order.
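The bracket interpretation just described can be turned into a small population routine. The Python sketch below is not the L-DBM Population Interpreter; it is a hypothetical scanner written to reproduce Tables 1 and 2 from the time-step 5 string. It assumes that a bracket becomes a Branch row only when it contains an internode (so that a bracketed leaf or bud stays attached to the enclosing branch), that internodes and leaves are numbered in string order, and that the function name `populate` and its parameters are illustrative.

```python
def populate(string, leaf_symbols=("E", "L"), internode_symbol="I"):
    """Scan one bracketed L-system string and return (branch_rows, leaf_rows).

    branch_rows: (Branch_ID, Super_Branch, Super_Internode, Depth_Number)
    leaf_rows:   (Leaf_ID, Branch_ID, Internode_ID, Leaf_Type)
    """
    branches = [(1, None, None, 1)]  # the main stem has no super branch
    leaves = []
    internode_count = 0
    # Stack frame: [branch_id, current_internode, depth, has_own_branch]
    stack = [[1, None, 1, True]]
    for symbol in string:
        frame = stack[-1]
        if symbol == "[":
            # Inherit the attachment point; a Branch row is created lazily.
            stack.append([frame[0], frame[1], frame[2], False])
        elif symbol == "]":
            stack.pop()
        elif symbol == internode_symbol:
            if not frame[3]:
                # First internode inside a bracket: this bracket is a new branch.
                new_id = len(branches) + 1
                branches.append((new_id, frame[0], frame[1], frame[2] + 1))
                frame[0], frame[2], frame[3] = new_id, frame[2] + 1, True
            internode_count += 1
            frame[1] = internode_count
        elif symbol in leaf_symbols:
            leaves.append((len(leaves) + 1, frame[0], frame[1], symbol))
    return branches, leaves

STEP_5 = "I[E][E]I[L][I[L][I[L][C]D][L]F]I[L][I[L][C]D]I[L][C]I[L][B]A"
branch_rows, leaf_rows = populate(STEP_5)
for row in branch_rows:
    print(row)
```

The stack mirrors the nesting of brackets: its height gives the depth of nesting, and the frame below the top always holds the super branch and the internode at which the new branch is attached.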
From the L-system productions, we can see the relationship between these productions and the schema. Using the previous example (Fig. 4): ‘A’ gives a stem structure to start with, ‘B’ and ‘C’ give branch structure, ‘I’ expresses internode structure, ‘E’ and ‘L’ express different kinds of leaf structure, ‘F’ expresses the flower structure and ‘D’ is a sub-leaf structure, that is, a reproductive apex. The model defines the concrete syntax for a particular L-system example.

We start with a stem: S
L-system Axiom: S
with productions:
R1. S --> I[E][E]A
R2. A --> I[L][B]A
R3. B --> C
R4. C --> I[L][C]D
R5. D --> [L]F
S: Stem, I: Internode, E: Cotyledon, A: Apex, L: Leaf, B: Young Bud, C: Old Bud, D: Reproductive Apex, F: Flower
FIGURE 8. L-system Production and its Definition for its Symbol

These productions are analysed to create an abstract syntax tree and to manipulate static semantic attributes and static semantic conditions in order to compute branch_identifier, internode_identifier and leaf_identifier, as well as structured parameters such as Depth_Number. The basic schema in Fig. 9 (using ER for the L-system productions of Fig. 8) shows that the Stem, Cotyledon, Leaf and Flower and their attributes are generated from their corresponding symbols and attributes. The techniques described in this paper map directly from the grammar to the relational schema. The ER diagrams are provided as a means of explaining the process of converting from an L-system’s axiom and productions to a relational database schema. The schemas inside the dashed line are generated from L-system production structures in which the left-hand side symbols also appear on the right-hand side (e.g. ‘A’, ‘B’, ‘C’, ‘D’). The attachment of these symbols to the branch structure symbols ‘[’ and ‘]’ (e.g. branches ‘B’ and ‘C’) also shows their relation with the internode (‘I’ in Fig. 8). Basically the entity Branch is generated by the L-system productions (in this example ‘A’, ‘B’, ‘C’ and ‘D’). The relationships represented by the structure of the L-system productions can be generated. More specifically, in this simple example, the first L-system production R1: S --> I[E][E]A shows the main stem structure in the database. The second, R2: A --> I[L][B]A, and the fourth, R4: C --> I[L][C]D, express the branching structure. To be a recursive structure (in the database) the productions must satisfy the following condition: a non-terminal occurs within [ ] on the right-hand side of a production and also has a branching production for its own expansion. In the second production R2: A --> I[L][B]A, “B” has the “[” and “]” branch structure tokens around it, and the third production R3: B --> C just shows the delayed development in the plant structure.
The fourth production (4) C --> I[L][C]D actually represents the branching structure (the recursive structure in the database) for the second production (2) A --> I[L][B]A. From these structures, the L-DBM constructs the Branch and Super_Branch database schema. The relationships between Flower, Branch and Internode come from the fourth production (4) C --> I[L][C]D and the fifth production (5) D --> [L]F. Now let us focus on the relationship "Super" between branch and branch. The most interesting part of the schema in Fig. 9 lies inside the dashed line, "branch belongs to its super branch", which is produced by production rules (2), (3) and (4). From the first production (1) S --> I[E][E]A, we find the relationship between "Stem" S and "Branch" A (the branch type in this case is "Apex"). It establishes the main stem branch structure, "stem belongs to main branch". The main branch does not have a super branch. The "A" (Apex) later becomes the growing branch structure when it is replaced by "I[L][B]A" ("B" is Young_Bud). In the second production (2) A --> I[L][B]A, the symbol B on the right-hand side is enclosed in "[ ]", and (3) B --> C is a delay process describing the development from Young_Bud to Old_Bud. This production does not itself describe the branch and Super_Branch relationship, but it provides the connection to the next production. Production (4) C --> I[L][C]D does support the branch structure, because "B" is replaced by "C" through production (3), and "C" is then replaced by "I[L][C]D" through production (4), giving [I[L][C]D]. Through this production we can see the relationship of the second branch with its super branch (the first branch), and hence where the schema inside the dashed line comes from. We can also see that the first, second and fourth L-system productions all contain internode data structures; these appear in the extract of the conceptual schema showing that each internode belongs to its branch and its Super_Branch.
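The recursion condition stated above can be illustrated with a short sketch. It is written in Python purely for illustration (the L-DBM prototype itself uses C and the GMD toolbox), and the function names and the dictionary encoding of Fig. 8 are assumptions of this sketch rather than part of L-DBM.

```python
import re

# Productions of Fig. 8, keyed by left-hand side symbol (encoding assumed here).
PRODUCTIONS = {
    "S": "I[E][E]A",
    "A": "I[L][B]A",
    "B": "C",
    "C": "I[L][C]D",
    "D": "[L]F",
}

def bracketed_symbols(rhs):
    """Symbols that stand alone inside [ ] on a right-hand side."""
    return re.findall(r"\[([A-Za-z])\]", rhs)

def resolve_delay(symbol, productions):
    """Follow delay productions such as B --> C until a longer rule is reached."""
    seen = set()
    while (symbol in productions and len(productions[symbol]) == 1
           and symbol not in seen):
        seen.add(symbol)
        symbol = productions[symbol]
    return symbol

def is_branching(symbol, productions):
    """A symbol has a branching production if its right-hand side uses [ ]."""
    return symbol in productions and "[" in productions[symbol]

def recursive_branch_symbols(productions):
    """Map each bracketed non-terminal to the branching production it expands to."""
    found = {}
    for rhs in productions.values():
        for symbol in bracketed_symbols(rhs):
            target = resolve_delay(symbol, productions)
            if is_branching(target, productions):
                found[symbol] = target
    return found

print(recursive_branch_symbols(PRODUCTIONS))
```

For the productions of Fig. 8 the sketch reports that the bracketed symbols B (through the delay production B --> C) and C both lead to the branching production for C, which is the single recursive structure discussed in the text.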
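[Figure 9 diagram: ER schema with the entities Stem, Branch, Internode, Leaf and Flower, their identifiers (Stem_ID, Branch_ID, Internode_ID, Leaf_ID, Flower_ID), the attribute Branch_Type and 1:N relationships between them. The recursive relationship "Super" on Branch lies inside the dashed line. Each part of the schema is labelled with the numbers of the productions that generate it.]
FIGURE 9. Conceptual Schema in ER for Plant-Like Data Structure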
4.0 The Map between L-systems and Database Systems
An important subsidiary problem is how to represent L-system productions and strings as relational tables. The relationship between L-systems and the data model for L-systems (L-DBM) is shown in Fig. 10. The left-hand side of Fig. 10 (L-system productions to L-system strings) produces the virtual plant that simulates measurement data. This part has been done by the CSIRO research centre.
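[Figure 10 diagram: on the L-systems side, the L-system Productions and Declaration feed the L-system Interpreter, which produces L-system Strings and a Visualization. On the database side, within the Biological Computing Environment, the L-DBM Schema Translator produces the Schema, the L-DBM Population Interpreter produces the Population, and the L-DBM Query Builder answers User Queries against the Database Representation and offers a Visualization.]
FIGURE 10. The Architecture of L-DBM between L_System and Database System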
Our research aims to:
• Build a generator (L-DBM: Schema Translator) to convert L-system productions into an appropriate database schema.
• Construct an interpreter (L-DBM: Population Interpreter) to populate the database automatically from L-system strings according to their corresponding L-system productions.
• Establish a query builder (L-DBM: Query Builder) suited to the biological computing environment.
The top of Fig. 10 shows the two first level input files: the L-system declaration and the L-system productions. From these, the L-DBM Schema Translator automatically produces a database schema and also an L-DBM Population Interpreter for this particular L-system. The second level input, to the L-DBM Population Interpreter, is the set of L-system strings produced by running the same L-system. The second level output of L-DBM is the population of the database whose schema the L-DBM Schema Translator has specified.
4.1 Database Development of L-DBM
The architecture of the L-DBM tools is shown in Fig. 11. These tools have been implemented as a prototype and have been successfully tested on real data sets.
[Figure 11 diagram: the L-system Declaration and L-system Productions are input to the L-DBM Schema Translator, which produces the Database Schema; the L-system Interpreter produces L-system Strings, which are input to the L-DBM Population Interpreter, which produces the Database Population.]
FIGURE 11. The Architecture of L-DBM Tools
The L-DBM tools accept as input a specification written in the L-systems language and produce modules in a database data definition and manipulation language. The tools can therefore be seen as a generic solution to the problem of representing L-systems as databases. This means that the large data sets produced by L-systems can be analysed using current database management technology. L-DBM includes two levels of compiler. The higher level compiler (the Schema Translator) is built from the meta-grammar for L-systems. It accepts any specific set of L-system productions (a grammar) and produces a schema in the database language, in the form of a collection of SQL "Create Table" statements. The meta-grammar is fixed: it describes the data structure of L-system productions, and therefore this compiler is fixed. The lower level compiler (the Population Interpreter) is generated by the Schema Translator and therefore varies with the particular L-system grammar. The Population Interpreter's input is an L-system string, and its output is a population of the database generated by the Schema Translator, in the form of a set of DML (Data Manipulation Language) statements of the kind "Insert data record into the table". In Fig. 12, (1) marks the first process, produced by the Schema Translator, and (2) marks the second process.
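To make the form of the Schema Translator output concrete, the following is a minimal sketch in Python of how a declaration could be turned into SQL "Create Table" statements. It is an illustration only: the declaration format, the column types, the Branch_ID column in every organ table and the helper names are assumptions of this sketch and are not taken from the L-DBM implementation.

```python
import sqlite3

# Hypothetical declaration: L-system symbol -> (entity name, extra attribute names).
DECLARATION = {
    "S": ("Stem", []),
    "I": ("Internode", []),
    "L": ("Leaf", []),
    "F": ("Flower", []),
}

def create_table_statements(declaration):
    """Emit one CREATE TABLE statement for Branch and one per declared entity."""
    statements = [
        "CREATE TABLE Branch (Branch_ID INTEGER PRIMARY KEY, "
        "Super_Branch INTEGER, Branch_type TEXT);"
    ]
    for entity, attributes in declaration.values():
        columns = [
            f"{entity}_ID INTEGER PRIMARY KEY",
            # Assumed here: every organ records the branch it belongs to.
            "Branch_ID INTEGER REFERENCES Branch(Branch_ID)",
        ]
        columns += [f"{name} REAL" for name in attributes]
        statements.append(f"CREATE TABLE {entity} ({', '.join(columns)});")
    return statements

def table_names(statements):
    """Run the statements in an in-memory SQLite database and list the tables."""
    conn = sqlite3.connect(":memory:")
    for statement in statements:
        conn.execute(statement)
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)

for statement in create_table_statements(DECLARATION):
    print(statement)
```

Running the generated statements against SQLite is only a convenient way to check that they are well formed; any relational system accepting standard DDL would serve.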
[Figure 12 diagram: the fixed Meta-Grammar defines the Schema Translator, which takes a particular L-system and produces the Schema (DDL) and the Population Interpreter; the Population Interpreter, which can be varied, takes Strings (instances) and produces the Population (DML). The two processes are marked (1) and (2).]
FIGURE 12. Compiler Structure in the L-DBM
Fig. 13 gives another view of Fig. 12, in which ST stands for Schema Translator and PI for Population Interpreter. Fig. 13 presents program structures rather than instance structures.
[Figure 13 diagram: a Compiler Generator turns the Meta-grammar into ST; ST takes a particular L-system and produces the Schema (DDL) and a grammar in the notation of the tools; a Compiler Generator turns that grammar into PI; PI takes L-system Strings and produces the Population (DML).]
FIGURE 13. The Compiler Structure in the L-DBM (2)
A compiler is composed of four essential components: the scanner, the parser, the constrainer and the code generator (Pittman and Peters, 1992). The scanner reads the source file as a string of characters and identifies in it a stream of words and symbols called tokens. The tokens output by the scanner are input to the parser, which identifies the phrase structure of the source language and builds an abstract syntax tree. The constrainer checks the static semantic conditions, using the symbol table, which is the repository of semantic information attached by the compiler to individual identifiers in the program being compiled. The code generator is the part of the compiler that generates the output code from the internal representation of the source program. The relationships between the Scanner, Parser, Constrainer and Code Generator are shown in Figure 14. Both the L-DBM Schema Translator and the Population Interpreter have this structure, as shown in Figure 15. One benefit of the Schema Translator and the Population Interpreter is automation.
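The scanner and parser stages described above can be sketched for a single L-system production. The sketch below is hand-written Python for illustration; L-DBM generates its scanner and parser with Rex and Lalr, and the token pattern, the nested-list tree shape and the function names used here are assumptions of this sketch.

```python
import re

# One token is an arrow, a bracket or a single-letter symbol (assumed token set).
TOKEN_RE = re.compile(r"\s*(-->|\[|\]|[A-Za-z])")

def scan(text):
    """Scanner: turn a production such as 'S --> I[E][E]A' into a token list."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_structure_list(tokens, pos, attachment):
    """Parser: read components left to right with one token of lookahead.

    A component becomes (symbol, Attachment); a [ ] group becomes a nested list
    whose components carry Attachment = 1.
    """
    items = []
    while pos < len(tokens) and tokens[pos] != "]":
        if tokens[pos] == "[":
            group, pos = parse_structure_list(tokens, pos + 1, 1)
            if pos >= len(tokens) or tokens[pos] != "]":
                raise SyntaxError("missing ]")
            pos += 1
            items.append(group)
        else:
            items.append((tokens[pos], attachment))
            pos += 1
    return items, pos

def parse_rule(text):
    """Parse 'LHS --> structure list' into (LHS symbol, tree of components)."""
    tokens = scan(text)
    if len(tokens) < 3 or tokens[1] != "-->":
        raise SyntaxError("expected: SYMBOL --> structure list")
    tree, pos = parse_structure_list(tokens, 2, 0)
    if pos != len(tokens):
        raise SyntaxError("unbalanced ]")
    return tokens[0], tree

print(parse_rule("S --> I[E][E]A"))
```

The tree is kept as plain nested lists so that the Attachment value of each component is visible directly; a generated parser would normally build typed tree nodes instead.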
[Figure 14 diagram: Main drives the Parser, Constrainer and Code Generator; the Scanner supplies tokens to the Parser, which builds the abstract syntax tree; the Symbol Table and String Table are shared data structures.]
FIGURE 14. The Structure of a Compiler (Pittman and Peters, 1992)
[Figure 15 diagram: Level One, the Schema Translator. The specification files l_tree.scan, l_tree.pars, l_tree.cg, Ident_Table.cg, LHS_Table.cg, Ident_Table_Access.puma, LHS_Table_Access.puma and Print_Schema.puma are processed by the tools rpp, rex, lalr, cg and puma into C sources (Scanner, ScannerSource, Parser, Tree, Smantic, Ident_Table, LHS_Table, Ident_Table_Access, LHS_Table_Access, Print_Schema) that form the executable l_tree. Level One Input One is an L-system; Level One Output One is the Population Interpreter and Level One Output Two is the database schema for the specific L-system. The Population Interpreter specification (Gen_LTree.scan, Gen_LTree.pars, Gen_LTree.cg, Print_LTree.puma) is processed by the same tools into C sources (Scanner, ScannerSource, Parser, Tree, Smantic, DataBase) that form the executable Gen_LTree. Level Two Input Two is an L-system string and Level Two Output Three is the database population. The legend distinguishes input files, output files, executable files produced by the author, and input, output and executable file data flows.]
FIGURE 15. The Implementation Relationship between Schema Translator and Population Interpreter
Figure 15 illustrates the implementation relationship between the Schema Translator and the Population Interpreter, and shows how their inputs and outputs are processed. The Schema Translator is built as the executable program l_tree, which accepts as Input One (level one) any conventional L-system. The Schema Translator produces two level one outputs: the Population Interpreter and a database schema for the particular level one input L-system. The Population Interpreter is in turn built as the executable program Gen_LTree. Gen_LTree accepts the second level input, an L-system string (which must be produced by the level one input L-system). Finally, Gen_LTree produces the second level output, a population for the relational tables of the database schema produced at level one. The L-DBM has been implemented in C using scanner and parser tools from the GMD toolbox: Rex, Lalr, Cg and Puma (Grosch, 1992).
4.2 Schema Translator
[Figure 16 diagram: steps (a) to (k) of the top-down construction of the parse tree for RULE S --> I[E][E]A, from Rules through RuleList, One_Rule, FirstComponent, StructureList and Structure down to the Components S, I, E, E and A, annotated with the meta-grammar rules used (G1 to G5). Components I and A are marked Attachment = 0, and Recursive_NO = 1.]
FIGURE 16. The Parse Tree for the Rule S --> I[E][E]A
In considering these compilers, it is helpful to think of the parse tree being constructed in the Schema Translator. The parser makes a single left-to-right scan of the input, looking ahead one token at a time. Figure 16 illustrates the parsing of the string RULE S --> I[E][E]A. The left-hand side of Figure 16 illustrates the steps in the top-down construction of the parse tree. The right-hand side of Figure 16 shows which meta-grammar rule is used for the L-system production. The arrow at the bottom of Figure 16 indicates top-down parsing while scanning the input from left to right. The goal of the code generator of the Schema Translator is to accept the grammar and produce the schema as output, and also to produce the grammar rules for the Population Interpreter, which accepts the strings. The mapping of the abstract syntax tree to the target language is carried out in two steps. The following sections present the working of the code generator. The L-DBM Schema Translator includes two symbol tables for identifiers: the Ident_Table with its Ident_Table_Access module, and the LHS_Table with its LHS_Table_Access module. The LHS_Table has a similar structure to the Ident_Table, but it stores the identifiers that occur on the left-hand side of L-system productions. Each node of the abstract syntax tree has a Nodetype, whose value is derived from the corresponding entry in the symbol tables (Ident_Table, LHS_Table). In this
example, there is only one recursive structure, so there is only one branching structure. One important property of each component is Attachment (stored in the abstract syntax tree), which records whether or not the component has square brackets around it. If it does, Attachment is equal to 1; if it does not, Attachment is equal to 0 (see Figure 16). The other important property is Recursive_NO (also stored in the abstract syntax tree), which gives the number of recursive structures in the particular L-system. If the right-hand side of the first rule contains two pairs of square brackets (both with Attachment equal to 1) around two different identifiers, and the related rules that expand these two identifiers involve recursive structures, then Recursive_NO is equal to 2, and so on. Recursive_NO thus counts the number of recursive structures. A Branch_type component is a component that appears on both the right- and left-hand sides and has square brackets around it, so that its Attachment is equal to 1. Internode_type is an abstract syntax attribute describing a component that appears as the first component of a right-hand side, preceding "[" and a Branch_type component in the production. In that case the Internode_type value is equal to 1; otherwise it is equal to 0. One of the procedures of the code generator in the Schema Translator is as follows.
Procedure 4.1. Building a branch table as in Figures 8 and 16
Input: An abstract syntax tree (Tree) and symbol tables (Ident_Table, LHS_Table)
Output: The database schema of one or more branch tables
Method:
1. If the value of Recursive_NO for the root (Rules) is equal to 1, create only one branch table. Otherwise create more than one branch table.
2. Start at the Axiom (FirstComponent) and traverse the entire abstract syntax tree.
3. If any Component in LHS_Table has Attachment equal to 1, add the Branch_type column to the branch table.
4. The branch table thus has a Branch_type attribute when necessary, that is, when a component appears on both the right- and left-hand sides and has square brackets around it. In Figure 16 Recursive_NO is equal to 1.
5. Suppose there is a geometrical attribute with primary attribute 'b', and 'A' has its own primary attributes 't' and 'l', where A belongs to LHS_Table and has Attachment equal to 1. Then the relational table (see footnote 8) is constructed as follows:
Branch ( Branch_ID, Super_Branch (see footnote 9), Branch_type, down.b, t, l )
In the case without geometrical or primary attributes and with Attachment equal to 0, the branch table is:
Branch ( Branch_ID, Super_Branch )
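The column selection in Procedure 4.1 can be sketched as a small function. This is an illustrative Python sketch under stated assumptions, not the Cg and Puma code generator of L-DBM: the function name, its parameters and the names Branch_1, Branch_2, ... used when Recursive_NO exceeds 1 are hypothetical.

```python
def branch_schemas(recursive_no, has_branch_type=False, attributes=()):
    """Return the branch table schema(s) in the A(B, C, ...) notation of the text.

    recursive_no    -- value of Recursive_NO at the root of the syntax tree
    has_branch_type -- True if some LHS_Table component has Attachment = 1
    attributes      -- geometrical and primary attribute columns, if any
    """
    columns = ["Branch_ID", "Super_Branch"]
    if has_branch_type:
        columns.append("Branch_type")
    columns.extend(attributes)
    if recursive_no == 1:
        names = ["Branch"]
    else:
        # Hypothetical naming scheme for several branch tables.
        names = [f"Branch_{i}" for i in range(1, recursive_no + 1)]
    return [f"{name} ( {', '.join(columns)} )" for name in names]

print(branch_schemas(1))
print(branch_schemas(1, True, ("down.b", "t", "l")))
```

The two calls reproduce the two schemas given in step 5: the minimal table with Branch_ID and Super_Branch only, and the table extended with Branch_type and the attribute columns.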
8. We use the relational table representation A(B, C, ...), where A is the schema name, that is the table name, and B and C are the names of columns belonging to the table.
9. Super_Branch is the parent branch of the particular branch.
4.3 Population Interpreter
In principle we could populate the tables directly by running the L-system once the database schema has been created, which would mean rewriting the simulator in a database language. Part of the reason for our approach is that a mechanism for the simulation already exists, and it is not economic to rewrite it. Further, the simulation makes great use of context-sensitivity, which is an important part of simulation but is not relevant to the database population, so translating the simulation output into populations of tables is more appropriate to our purpose. The simulators are quite complex to implement, whereas interpreting the population from L-system strings is easier. The compiler can generate the program that parses the strings, so we do not need to concern ourselves with the simulator. This strategy is good software-engineering practice.
[Figure 17 diagram: abstract syntax tree of an L-system string built from the symbols S, I, E, L, A, B, C, D and F, annotated with identifiers. Branch 1 (the main branch, A) has Branch_IDIn = 0, Branch_IDOut = 1 and Super_Branch = NULL; branch 2 has Branch_IDIn = 1, Branch_IDOut = 2 and Super_Branch = 1; branch 3 has Branch_IDIn = 2, Branch_IDOut = 3 and Super_Branch = 2; branch 4 has Branch_IDIn = 1, Branch_IDOut = 4 and Super_Branch = 1. Separate number markers denote Branch IDs, Internode IDs, Leaf IDs and Flower IDs.]
FIGURE 17. The Abstract Syntax Tree as Input of Code Generator
To give a systematic description of the Population Interpreter, it is useful to look at the abstract syntax tree that is the input of its code generator. Figure 17 shows some attributes (such as Super_Branch) and explains how they work. All these attributes are associated with nodes of the abstract syntax tree. Figure 17 presents the computation of Super_Branch as an inherited attribute. Super_Branch has been defined as the parent branch of a branch. The Super_Branch column of the branch table is produced by an attribute computation (pre-computing) that traverses all nodes of the abstract syntax tree. The root of the abstract syntax tree starts from branch number 0, which has no super branch. Each node in the abstract syntax tree has a Super_Branch attribute, whose value is taken from Branch_IDIn; that is, Super_Branch is assigned the Branch_ID of the parent in the abstract syntax tree. For example, branch number 4 has a parent branch (Super_Branch) equal to 1: its Branch_IDIn is equal to 1, and its Branch_IDOut is equal to 4 once the traversal of the abstract syntax tree reaches this node. In summary, Branch_IDIn is the value of the parent branch and Branch_IDOut is the value of the current branch. The Schema Translator similarly uses the attributes Recursive_NO, Attachment and Branch_type introduced earlier. One of the procedures producing a population of the relational tables is shown below. These procedures are generated by the Schema Translator and therefore contain all the necessary information about schema names and attributes. Figure 17 shows the population values in the abstract syntax tree that are used in the procedure.
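Before turning to the procedure, the inherited computation of Super_Branch described above can be sketched directly on an L-system string. The Python sketch below is illustrative only. It assumes that a [ ] group starts a new branch unless its first symbol is one of K, L, E or B (organs, or a Young_Bud that has not yet developed), and it uses 0 as the Branch_IDIn of the main branch, as in the text; both choices are assumptions of this sketch, not rules taken from L-DBM.

```python
def branch_rows(string, not_branch=frozenset("KLEB")):
    """Return (Branch_ID, Super_Branch) pairs for an L-system string.

    The stack holds the Branch_IDIn value inherited by nested groups; each new
    branch receives the next Branch_IDOut in left-to-right (pre-order) order.
    """
    symbols = string.replace(" ", "")
    rows = [(1, 0)]      # main branch: Branch_IDOut = 1, Branch_IDIn = 0
    stack = [1]
    next_id = 2
    for index, symbol in enumerate(symbols):
        if symbol == "[":
            first = symbols[index + 1] if index + 1 < len(symbols) else ""
            if first in not_branch:
                # Organ group: it stays on the enclosing branch.
                stack.append(stack[-1])
            else:
                rows.append((next_id, stack[-1]))
                stack.append(next_id)
                next_id += 1
        elif symbol == "]":
            stack.pop()
    return rows

# Stage 4 string of the sample L-system, as it appears with Figure 21.
STAGE_4 = "I[K][K]I[L][I[L][C]D]I[L][C]I[L][B]A"
print(branch_rows(STAGE_4))
```

Under these assumptions the stage 4 string yields the pairs (1, 0), (2, 1), (3, 2) and (4, 1), which agree with the Branch_ID and Super_Branch values annotated in Figure 17.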
There is one procedure for each relational table schema produced by the Schema Translator.
Procedure 4.2. Populate a branch table
Input: An abstract syntax tree
Output: Population of one or more branch tables
Method:
1. This procedure has been produced by the Schema Translator from the L-system productions given to it as input. The branch relational table has already been produced by the Schema Translator.
2. Start at the branch of the abstract syntax tree and put the necessary data into the branch relational table.
3. In this example, the branch table has three columns: Branch_ID, Super_Branch and Branch_type. The program finds each branch component (Nodetype = "Branch") in the abstract syntax tree (the branch numbers in Figure 17); derived information such as Branch_ID and Super_Branch (whose values are shown in Figure 17) has already been pre-computed by the attribute evaluator.
4. The program traverses the abstract syntax tree and inserts the values for each branch into the Branch table, moving on to the next branch until the end of the abstract syntax tree is reached. Cases with more than one branch table use similar methods.
The branch table has the following population in this example:
TABLE 3. Branch Table and its Tuples
Branch_ID   Super_Branch   Branch_type
1           0              A
2           1              C
3           2              C
4           1              C
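The DML output of Procedure 4.2 for the tuples of Table 3 might look as follows. This is a minimal Python sketch, assuming a simple textual rendering of "Insert" statements and an in-memory SQLite database for checking them; the helper names and the column types are hypothetical and do not come from the generated Population Interpreter.

```python
import sqlite3

# Tuples of Table 3: (Branch_ID, Super_Branch, Branch_type).
BRANCH_ROWS = [(1, 0, "A"), (2, 1, "C"), (3, 2, "C"), (4, 1, "C")]

def sql_literal(value):
    """Render a value as an SQL literal (integers and simple strings only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def insert_statements(table, columns, rows):
    """Render one INSERT statement per tuple."""
    column_list = ", ".join(columns)
    return [
        f"INSERT INTO {table} ({column_list}) "
        f"VALUES ({', '.join(sql_literal(v) for v in row)});"
        for row in rows
    ]

def load_branch_table(rows):
    """Create the Branch table in memory and run the generated statements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Branch (Branch_ID INTEGER PRIMARY KEY, "
        "Super_Branch INTEGER, Branch_type TEXT)"
    )
    columns = ["Branch_ID", "Super_Branch", "Branch_type"]
    for statement in insert_statements("Branch", columns, rows):
        conn.execute(statement)
    return conn

conn = load_branch_table(BRANCH_ROWS)
children = conn.execute(
    "SELECT Branch_ID FROM Branch WHERE Super_Branch = 1 ORDER BY Branch_ID"
).fetchall()
print(children)
```

The final query lists the branches whose super branch is branch 1, which for Table 3 are branches 2 and 4.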
5.0 The Correspondence Problem between L-systems and Database Systems
Even very simple L-systems can produce plant-like structures. It does not matter what symbols the user employs to represent the plant architecture so long as they describe the tree structure. In general the builder of the L-system can use any symbol to represent anything, except the branch control symbols "[" and "]". In the basic ER schema of Figure 18 for the L-system productions of Figure 8 (compare the conceptual schema of Figure 9), Stem, Cotyledon, Leaf and Flower and their attributes are generated from their corresponding symbols and attributes. The correspondence is established in a declaration. Figure 8 shows the L-system productions and their declarations that are used as the first level input to the L-DBM schema generator. The schema in Figure 18 is one of the first level outputs of the Schema Translator; the other output is the Population Interpreter.
[Figure 18 diagram: ER schema with the entities Stem, Branch, Internode, Leaf and Flower, their identifiers (Stem_ID, Branch_Id, Internode_Id, Leaf_Id, Flower_ID), the attribute Depth_Number and 1:N relationships between them, with the recursive relationship "Super" on Branch. Each part of the schema is labelled with the numbers of the productions that generate it.]
FIGURE 18. L-systems Production, its declarations and corresponding schema
5.1 Recursive Structure in Conceptual Schema for Biological Terms
So far, we have established the correspondence between L-systems and database systems at the level of productions and schema. The key issue of this paper is how the biologist can talk about and query this result. The biologist might ask: How many branches are there in the plant? or How many leaves are there on each level of branching? The query itself is not expressed recursively, but the concepts used are recursive properties of the data.
SELECT count(Leaf_No), Depth_Number
FROM Leaf, Branch
WHERE Branch.Branch_Id = Leaf.Branch_Id
GROUP BY Depth_Number;
FIGURE 19. Example Query
The following information comes from the derived attributes:
• Depth_Number: the number of steps in the branch structure from the main branch
• Super_Branch: the branch's super branch (parent branch)
• Super_Internode: the internode of the branch's super branch
Consider the example: How many leaves are there at each level of branching? Using SQL in the L-DBM Query Builder (Figure 19), the biologist does not need a complex recursive query to obtain the answer. The Depth_Number and Branch_Id in this example are computed by the compiler from the recursive structure; before the biologist queries these terms (attributes), the L-DBM has already derived them. In other words, we use compiler technology to generate complex recursive attributes. The recursive structure is buried in the schema (Figure 20, inside the dashed line) and does not appear directly in the schema that users employ to describe the plant structures (the query example of Figure 19 corresponds to the grey part of the schema in Figure 20). However, these derived attributes are not named in the correspondence declaration. How the user assigns names to these derived attributes is shown below.
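The effect of pre-computing Depth_Number can be illustrated end to end. The Python sketch below derives Depth_Number from the Super_Branch values of Table 3 and then runs the query of Figure 19 in an in-memory SQLite database. It is a sketch under stated assumptions: the leaf rows are illustrative sample data, the main branch is given Depth_Number 0, the column types are hypothetical, and an ORDER BY clause is added to the query only to make the printed result order deterministic.

```python
import sqlite3

SUPER_BRANCH = {1: 0, 2: 1, 3: 2, 4: 1}      # Table 3: Branch_ID -> Super_Branch
LEAVES = [(1, 1), (2, 2), (3, 1), (4, 1)]    # illustrative (Leaf_No, Branch_Id) rows

def depth_numbers(super_branch):
    """Pre-compute Depth_Number by following Super_Branch links to the main branch."""
    depths = {}

    def depth(branch_id):
        if branch_id not in depths:
            parent = super_branch[branch_id]
            # A parent that is not itself a branch (here 0) marks the main branch.
            depths[branch_id] = 0 if parent not in super_branch else depth(parent) + 1
        return depths[branch_id]

    for branch_id in super_branch:
        depth(branch_id)
    return depths

def leaves_per_depth(super_branch, leaves):
    """Store the derived attribute, then answer the Figure 19 query without recursion."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Branch (Branch_Id INTEGER PRIMARY KEY, "
        "Super_Branch INTEGER, Depth_Number INTEGER)"
    )
    conn.execute("CREATE TABLE Leaf (Leaf_No INTEGER PRIMARY KEY, Branch_Id INTEGER)")
    depths = depth_numbers(super_branch)
    conn.executemany(
        "INSERT INTO Branch VALUES (?, ?, ?)",
        [(b, s, depths[b]) for b, s in super_branch.items()],
    )
    conn.executemany("INSERT INTO Leaf VALUES (?, ?)", leaves)
    query = (
        "SELECT count(Leaf_No), Depth_Number FROM Leaf, Branch "
        "WHERE Branch.Branch_Id = Leaf.Branch_Id "
        "GROUP BY Depth_Number ORDER BY Depth_Number"
    )
    return conn.execute(query).fetchall()

print(leaves_per_depth(SUPER_BRANCH, LEAVES))
```

The recursion is confined to depth_numbers, which runs once at population time; the query itself is an ordinary join with grouping, which is the point made in the text.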
[Figure 20 diagram: the ER schema of Figure 18 (Stem, Branch, Internode, Leaf, Flower and their identifiers) extended with the derived attributes Depth_Number, Super_Branch and Super_Internode on Branch. A dashed line marks the boundary of derived information, and a grey region marks the part of the schema used by the example query of Figure 19.]
FIGURE 20. The Specific ER Conceptual Schema for sample L-systems Productions
5.2 Visualization for Biological Terms in Database Systems
A secondary issue of this approach is the correspondence problem. The correspondence for most terms in generic L-system productions can be made using declarations, but this cannot be done for complex recursive derived data (Branch_ID and Depth_Number). How the user assigns names to these derived attributes is shown below. Each term has an image which presents the concept to the user, and the user assigns a corresponding term in his or her own vocabulary to the image. This, combined with the declaration of Figure 8, allows all the concepts used to describe the simulated plants to be customised. In this section, we show a visualization tool (VT) in the L-DBM query builder. The VT uses a virtual plant to present derived terms such as Depth_Number. This gives a more biologist-friendly computing environment.
L-system strings are used to present images of the plant. If, at the time the schema is generated, the user provides not only the L-system grammar but also a string exhibiting the interesting properties of the grammar, then a simple mechanism that we have developed permits the user to provide domain-specific names for the recursive derived attributes. Figure 21 shows an L-system string and the image constructed from this sample string. In Figure 21, branch, internode and leaf identifiers are labelled with separate number markers. In this figure, the dashed circle around the second branch corresponds to the part of the L-system string between the arrows.
[Figure 21 diagram: the stage 4 string I[K][K]I[L][I[L][C]D]I[L][C]I[L][B]A and the corresponding image of the virtual plant, with its internodes, cotyledons (K), leaves (L3 to L6), buds (B, C), D and apex (A) labelled. Legend: branch identifier, internode identifier, leaf identifier.]
FIGURE 21. The Image of Virtual Plant
The basic user interface of the L-DBM VT is shown in Figure 22. The L-DBM VT includes all objects for the special terms that come from the L-system productions and are built by the L-DBM schema generator and the L-DBM population interpreter. The L-DBM VT also links to the L-system strings to give the corresponding visualization of a particular virtual plant. When the user clicks on the term of interest, the VT opens another window which provides three options: "Term Corresponding", "Visualization: Virtual Plant" and "Help!". Term Corresponding presents the compiler generated term and the corresponding user term, which can be entered at this stage. "Visualization: Virtual Plant" presents a visualization of the virtual plant corresponding to the L-system string, as in Figure 21. As shown in Figure 22, the user can use the interface to understand the terminology. The "Help!" option gives more details about how the VT works.
[Figure 22 diagram: the "L-DBM: VT" window lists the objects Branch_Id, Internode_Id, Depth_Number, Super_Branch and Super_Internode; the window "VT: Depth_Number" offers the options Term Corresponding, Visualization: Virtual Plant and Help!.]
FIGURE 22. The L-DBM:VT user interface
The next level of the user interface is shown in Figure 23. In this figure, Depth_Number was chosen. The visualization of the virtual plant representing Depth_Number appears in the "Virtual plant: Depth_Number" window. The numbers correspond to the L-system strings that were chosen by the user as typical strings. The middle of Figure 23 shows the correspondence between the compiler generated term and the biologist's term for Depth_Number, which in this case is "level of branching". The bottom of Figure 23 shows the Help information for Internode_Id; it can also give more standard details for special terms. A research program studying a particular plant will typically produce many L-systems using the same set of symbols with the same interpretations, so this correspondence, once established, will serve the ongoing program.
FIGURE 23. The Components of VT of L-DBM
6.0 Conclusions and Future Work
This paper has reported four main contributions. Firstly, the L-DBM is a generic procedure. Once the L-DBM is given any specific L-system productions and their declarations, it can generate the specific schema both for simple correspondence terminology and for complex recursive data attributes and their relationships. The same correspondence applies to any L-system using the same vocabulary; once established, it can be used to support an entire research program. The research thus contributes a generic solution for all kinds of L-systems. Secondly, this research contributes the idea of pre-computing recursive structures in the data into derived attributes using compiler generation. Thirdly, we presented a visualization tool (VT) to represent the different data structures. Fourthly, we supplied a method to allow a correspondence between the biologist's terms and compiler generated terms in a biological computing environment. Taken as a whole, the L-DBM supplies a valuable package for biological scientific database management.
In future work, we will look at querying the results of L-system simulations on other issues, for example plot level queries and functionality that extends existing query technologies with aggregation functions and quantification over attributes. We are also currently working on biological queries in the L-DBM query builder, on the extraction and analysis of biological data samples, and on extending the current query builder facilities by studying scientists' requirements. For example:
• write a query to extract sequences of data, say leaf length along a branch, so that the sequences can be compared;
• given a set of strings representing a number of plants at the same stage of development, with stochastic variation between them;
• querying results on plot level and turtle functionality; the 3-D structure could be extended and improved in both technologies and studies.
The implementation of the extra functionality of the query builder will also be worth doing. Some additional efficiency issues should be taken into account in future research within this framework. For example, a plant at different time steps appears to have many parts in common, and the questions are how these can be stored more efficiently and how the space required for strings changes when they are put in the database. An interesting question is whether one wants to record every step of the experiment. Since parts of the experiment appear to be deterministic, why not treat the appropriate productions as unmaterialized views? This may result in a saving of space, and space-time tradeoffs are always important in databases. Since L-systems are one kind of graph grammar, we could also extend this approach to other graph grammars in scientific research work and use the results in other applications. Beyond this study, there is potential to cross-reference databases of morphogenetic specifications with databases of gene sequences, taxonomic characteristics and ecological attributes. The construction of gene and taxonomic databases has been underway internationally for some time. It is intended that this project will be extended to explore the extent to which protocols in those fields can be used in the construction of L-system databases and cross-referencing systems. Our work is applicable to any field involving branching structures, such as nerve development, vascular growth and crystal evolution (Fleury, Gouyet and Leonetti, 2001).
References
Bloesch, A. C. and Halpin, T. A. (1996), ConQuer: a Conceptual Query Language, ER'96 Conference on Conceptual Modelling, Springer LNCS, no. 1157, pp. 121-133.
Bloesch, A. C. and Halpin, T. A. (1997), Conceptual Queries using ConQuer-II, ER'97, Proc. of 16th International Conference on Conceptual Modeling, Springer LNCS, no. 1331, pp. 113-126.
Chen, I. A. and Markowitz, V. M. (1995), "An Overview of the Object-Protocol Model (OPM) and OPM Data Management Tools", Information Systems, Vol. 20, No. 5.
Chen, P. Y. and Colomb, R. M. (1998), "Design: A Visual Database Development System", in Verma B., Liu Z., Sattar Z., Surawski R. and You J. (eds), Proceedings of 2nd IEEE International Conference on Intelligent Processing Systems, August 4-7, Gold Coast, Australia, IEEE CS Press, pp. 198-202.
Chen, P. Y. and Colomb, R. M. (1999), "Decision Support in Bioinformatics Research", in Burstein F. (ed.), Proceedings of the 5th International Conference of the International Society for Decision Support Systems, International Society for Decision Support Systems.
Chen, P. Y. and Colomb, R. M. (2000), "Querying Recursive Structures without Recursive Queries", in Orlowska M. E. (ed.), Proceedings of the 11th Australian Database Conference, ADC2K, Vol. 22 No. 2 of Australian Computer Science Communication, Jan 31-Feb 3, Canberra, Australia, IEEE CS Press, pp. 21-27.
Codd, E. F. (1970), "A Relational Model of Data for Large Shared Data Banks", CACM 13, 6.
Colomb, R. M. (1998), Deductive Databases and their Applications, Taylor & Francis Ltd.
Elmasri R. and Navathe S. B. (1989), Fundamentals of Database Systems, The Benjamin/Cummings Publishing Company, Inc.
Fleury V., Gouyet J.-F. and Leonetti M. (2001), Branching in Nature: Dynamics and Morphogenesis of Branching Structures From Cell to River Networks, Springer-Verlag.
Grosch J. (1992), Compiler generation - a toolbox for compiler construction, Technical Report 16, Gesellschaft für Mathematik und Datenverarbeitung mbH, Forschungsstelle an der Universität Karlsruhe, 1992.
Hanan J. (1992), Parametric L-Systems and Their Application to The Modelling and Visualization of Plants, PhD thesis, Department of Computer Science, University of Regina.
Hanan J. (1995), Virtual plants: Integrating architectural and physiological plant models, in Binning P., Bridgman H. and Williams B. (eds), MODSIM 95 International Congress on Modelling and Simulation Proceedings, Vol. 1, pp. 44-50.
Hanan J. and Room P. (1997), Practical aspects of Virtual Plant Research, in Michalewicz M. (ed.), Plants to Ecosystems, Advances in Computational Life Sciences, Vol. 1, CSIRO Publishing, Melbourne, pp. 28-44.
Leebaert D. (1995), The Future of Software, Massachusetts Institute of Technology, Graphic Composition Inc., USA.
Lindenmayer A. (1968), Mathematical models for cellular interactions in development, Parts I and II, Journal of Theoretical Biology, 18:280-315.
Lindenmayer A. (1974), Adding continuous components to L-systems, in Rozenberg G. and Salomaa A. (eds), L Systems, Lecture Notes in Computer Science 15, pp. 53-68, Springer-Verlag, Berlin.
Markowitz, V. M. and Ritter, O. (1995), "Characterizing Heterogeneous Molecular Biology Database Systems", Journal of Computational Biology, Vol. 2, No. 4.
Pittman T. and Peters J. (1992), The ART of Compiler Design, Prentice-Hall.
Prusinkiewicz P. and Hanan J. S. (1989), Lindenmayer Systems, Fractals, and Plants, Springer-Verlag, Lecture Notes in Biomathematics.
Prusinkiewicz P. and Lindenmayer A. (1990), The Algorithmic Beauty of Plants, Springer-Verlag, New York.
Prusinkiewicz P., Hanan J. S. and Mech R. (2000), An L-systems-based plant modelling language, in Nagl M., Schürr A. and Münch M. (eds), Applications of graph transformation with industrial relevance, Springer-Verlag, Lecture Notes in Computer Science 1779, pp. 295-410.
Room P., Maillette L. and Hanan J. (1994), Module and metamer dynamics and virtual plants, Adv. Ecol. Res. 25, 105-157.
Room P. and Hanan J. (1995), Challenging the Future, Proceedings of the World Cotton Research Conference-1, Constable G. A. and Forrester N. W. (eds), CSIRO Australia, pp. 40-44.
Room P., Hanan J. and Prusinkiewicz P. (1996), Virtual plants: new perspectives for ecologists, pathologists and agricultural scientists, Trends in Plant Science, Vol. 1, No. 1, pp. 33-38, Elsevier Science Ltd.