Eugene A Domain Specific Language for Specifying

0 downloads 0 Views 974KB Size Report
GenoCAD [8] is a Web-based visual application guiding users through the design of ..... grammar with different semantic actions and the creation of parsers in ...
Eugene A Domain Specific Language for Specifying Biological Constructs at Higher Levels of Abstraction Lesia Bilitchenko

Adam Liu

Douglas Densmore

California State Polytechnic University, Pomona Computer Science Department Pomona, CA 91768 +1 626 423 2711

University of California, Berkeley DOP Center Cory Hall Berkeley, CA 94720-1770 +1 925 895 2305

University of California, Berkeley 545P Cory Hall Berkeley, CA 94720-1770 +1 510 434 4978

[email protected]

[email protected]

[email protected] ABSTRACT Synthetic biological systems undergo an iterative process of design, simulation and assembly. As a result, the creation of robust and efficient design flows and tools is imperative in the development and specification of new biological constructs. Currently human readable languages neither exist which allow for these tasks nor do languages allow for the exchange of design metadata. Eugene is a domain specific, human readable language for the description of biological constructs which explicitly addresses both of these needs. The key ideas in Eugene are its structural representation of abstract biological parts, part instances which tie the abstract representations to physical implementations, and the control of composite Part creation through user specified rules. Eugene is intended for forward engineering of DNA-based devices and through its data types and its implementation reflects the desired abstraction hierarchy in synthetic biology. Eugene has the capability of creating new data types, allowing a richness of expression in the language as the synthetic biology community continues to evolve.

the most basic unit on which everything else is built. This is followed by Parts, which are well-defined sequences with specific biological functions. Devices, which can contain one or more Parts, are the next level. Finally Devices are followed by a system view which will contain collections of Devices. The traversal of each level will require a refinement and abstraction process.

Keywords Abstraction, Parts, Devices, Properties, Rules, Synthetic Biology, Composition, Hierarchy, Data Exchange, Standardization

1. INTRODUCTION In its development as an engineering field, synthetic biology is at a stage where standardization has been identified as a fundamental challenge [1], [2], [3]. DNA sequence information can be, encapsulated as a Part, and then ideally reused in various designs, thus encouraging the development of new larger constructs for the community. The process of developing standardized and wellcharacterized Parts is a key challenge and community efforts in this direction have been undertaken through the BioBricks Foundation (http://bbf.openwetware.org/), OpenWetWare initiative (http://openwetware.org/), and the International Genetically Engineered Machine (iGEM) competition (http://www.igem.org). The development of standard Parts is just one piece in the vision of a larger abstraction hierarchy as shown in Figure 1. Levels of abstraction are essential in the management and creation of large systems, as they undergo an iterative process of design, simulation and assembly. Figure 1 demonstrates four levels in the hierarchy where the actual DNA information forms

Figure 1: Desired Abstraction Model for Synthetic Biological Systems As Figure 1 shows, Eugene can bridge Device and Part levels by specifically addressing the different levels of the hierarchy. Eugene provides a human readable specification, which reflects the creation of systems by defining, specifying, and combining collections of Parts. This is explicitly reflected in Eugene’s data

types where Devices and Parts hide low-level implementation details, yet there exists the capability to address the bottom level through the encapsulation of specific design information as a Part Instance. Furthermore, Eugene can be carried across platforms and used with various visual tools, hence stimulating the interchange and ease of designs. Another important aspect in Eugene is the creation and assertion of rules. Rules should directly apply to the relationship between various Parts in a Device and provide the functionality of validation mechanisms in order to ensure the successful completion of a construct. These are not predefined in the language. Such flexibility allows users to define as well as assert numerous combinations of rules. The rest of this paper is organized as follows: Section 2 discusses the background related to the development of languages for synthetic biology. Section 3 introduces the core concepts of Eugene as a language. Section 4 details the data structure used to realize Eugene. Section 5 provides examples of biological constructs and how they can be easily specified in Eugene. Section 6 provides results which reflect the power of our approach and finally Section 7 discusses future work and conclusions.

2. BACKGROUND In the attempt to bring engineering principles to biology, the requirement of modeling and analysis has become essential in the creation of the design. Various programming languages currently exist that address this aspect. However, the current standards are mostly directed towards mathematical modeling [1] and do not easily support synthetic biology constructs.

the form of mathematical rules. However, the concept of the relationship between Parts in a Device, an ordered composite of one or more Parts, cannot be expressed in such rules as Antimony and similar languages address the simulation and computational aspect in the design. GenoCAD [8] is a Web-based visual application guiding users through the design of Part-based genetic systems and derives the construct sequence from its libraries of Parts. Rules on the construction of Parts are incorporated into the language and validation based on the predefined rules in the language can be performed. Yet, with any prepackaged interface, there exists the risk of limiting the tool’s user level customization. As the field of synthetic biology matures, a language should have the capability to grow with the science and at the same time give users a certain amount of expressiveness. As GenoCAD is a web based tool, we foresee the ability to extend it to work with Eugene.

3. LANGUAGE DEFINITION In this section we describe the elements in the language. These involve: primitive data types, properties, parts, devices, rules, and conditional execution. The relationships between these language elements are shown in Figure 2. Here you can see that each subsequent category is built upon the previous category.

Systems Biology Markup Language [4] is a computer-readable language to represent models of biological processes. It is applied to simulation of metabolism, biochemical reactions, gene regulation and other processes where biological entities are involved and modified. SBML allows models to be encoded in XML. Although different software tools can directly communicate and store the same representations and there are numerous software tools providing an interface, a human writable script representing biological constructs, such as biological parts or protein coding sequences, cannot be created for several reasons. Firstly, there is no direct relationship between the syntax and the standardized sequences created by the synthetic biology community. Secondly, the complex syntax of XML prevents simple and brief textual descriptions. CellML [5] is based on XML. Its purpose is to store and exchange biological models by including information about the model structure, the mathematics describing the processes and the metadata. CellML combined with FieldML [6], a declarative language for building hierarchical models, can describe biological information at different levels. Yet, these languages cannot express the abstraction level, which is reflected through the standardization effort in the synthetic biology community through the specification of Parts and Devices. Also the use of XML syntax prevents raw CellML descriptions from being human writable and limits the expression of models to visual tools only. Antimony [7] is a modular model definition language that allows scientists to define and use reaction networks. It is designed to be human-writable, unlike the previously described languages, and acts as an extension by translating the model to SBML. The notion of rules is addressed in Antimony similarly as in SMBL in

Figure 2: Relationship between Eugene Categories

3.1 Primitives The language supports five predefined primitives. These are txt, num, boolean, txt[], and num[]. Strings (sequences of characters) are represented through the data type “txt”, where the actual text is specified in double quotes. Real numbers and integers are supported by the data type “num” and logical values by the data type “boolean”. Ordered lists of num and txt values can be created and individual members inside a list accessed by specifying an integer in the range from 0 to |list| - 1. Examples (1) and (2) are two real code snippets of how primitives can be specified in Eugene. “listOfSequences” is simply a list of 3 arbitrary DNA sequences. “specificSequence” is the last element

of “listOfSequences” (i.e. “ATCG”). Examples (3) and (4) show how the data type “num” can support integers and decimals. txt[] listOfSequences = [ “ATG”, “TCG”, “ATCG”]; txt specificSequence = listOfSequences[2]; num[] listOfNumbers = [ 2.5, 10, 3.4, 6]; num one = listOfNumbers[0];

(1) (2) (3) (4)

3.2 Properties Properties represent characteristics of interest and are defined by primitives and associated with Parts. For example a user could define a property “Sequence” (the DNA sequence), ID (the uuid for a relational database which may hold the part), or Orientation (e.g. a forward or backward promoter). Examples 3-5 show how such properties would be defined. Property definitions must be defined by the five primitive types. In Part definitions (section 3.3.1) properties will be bound to that Part as placeholders for the instantiation of values in Part declarations. Properties have to be defined before Parts can use them. The user can create new Property labels or use those created by other users and captured in “header files” (Section 3.8). For example, the following Properties are predefined in the header file PropertyDefinition.h and do not need to be defined again if the header file is included in the main program: Property Property Property Property

ID(txt); // in header Sequence(txt); // in header Orientation(txt); // in header RelativeStrength(num); //custom

file file file Property

3.3.2 Part Declaration Part declarations make instances of predefined Parts and assign values to their properties. If the declaration specifies a list of values, it is assumed that every property will be assigned a value, where the order of the values corresponds to the order of the properties in the Part Definition as shown in example (17). Otherwise, a “dot notation“ followed by the name of the property can be employed, where the order becomes irrelevant as specified in the example below (16). The Part instance BBa_K112234_rbs has three properties associated with the Part RBS. These are ID, Sequence and Orientation. The identification label of a particular part from a database is stored in the ID placeholder to allow future access to the database. Sequence stores the DNA of a Part, while Orientation specifies the direction of the Part. Since dot notation is used, the ID value instantiation can be left out from the statement. Part declarations can be found in the header file PartDeclarations.h and are predefined if the header files are included in the main program. RBS BBa_K112234_rbs (.Sequence("GATCTtaattgcggagacttt"), .Orientation("Forward")); (16) RBS BBa_K112234_rbs (“BBa_K112234_rbs”, "GATCTtaattgcggagacttt", “Forward”);

(17)

(5) (6) (7) (8)

3.3 Parts The data type Part represents a standard biological Part. A Part can be defined empty intially and then property labels can be added through the function addProperties() or properties can be bound to a Part during the definition.

Figure 3a: Device BBa_K112133, consisting of one Part Promoter BBa_K11212 and one Device BBa_K112234

3.3.1 Part Definition Part definitions do not construct any Parts, but rather specify which Parts can be constructed. This can be done in the header file or in the main program. When the header file PartDefintion.h and PropertyDefintion.h are included the following Parts and their corresponding property labels are predefined. For instance, the Part “Promoter” will have three properties associated with it and all instances of Promoter will inherit ID, Sequence and Orientation: Part Part Part Part Part Part

Promoter(ID, Sequence, Orientation); ORF(ID, Sequence, Orientation); RBS(ID, Sequence, Orientation); Terminator(ID, Sequence, Orientation); RestrictionSite(ID, Sequence, Orientation); PrimerSite(ID, Sequence Orientation);

(9) (10) (11) (12) (13) (14)

If the properties are unknown during Part Definition process, the Part can be defined either empty or with the known properties. Later property labels can be added through the function addProperties() provided the property labels have been created beforehand. RBS will have four property labels after the following statement: RBS.addProperties(RelativeStrength);

(15)

Figure 3b: Device BBa_K112234, consisting of one Part Ribosome Binding Site and one Part Open Reading Frame

3.4 Devices Devices represent a composite of standard biological Parts and/or other Devices. In a Device declaration, the same Part and/or device can be used more than once. Property values of devices can be accessed with the dot operator; however, the value is the union of the property values of its members returned as a list. If the property is a txt or num, a txt[] or a num[] is returned. If the property is a txt[] or a num[], a txt[] or a num[] is also returned that consists of the lists appended together. For example the sequence of Device BBa_K112133 is the ordered union of the sequence of Part BBa_K112126 and the Device BBa_K112234. These two Devices are shown in Figures 3a and 3b, where the icon figures use true Visual BioBrick Open Language symbols (vBOL) [9] icon graphic.

Device BBa_K112234(BBa_K112234_rbs, BBa_K112234_orf); (13) Device BBa_K112133(BBa_K112126, BBa_K112234); (18)

Individual Parts can be accessed through the use of square brackets and an index. The first member is indexed at zero. Square brackets can be stacked in the case of devices within devices. To access the first element BBa_K112234_rbs of Device BBa_K112234 through Device BBa_K112133, the following notation is supported: BBa_K112133[1][0]

//references BBa_K112234_rbs

(19)

3.5 Rules The specification of rules provides the ability to validate Device declarations. Rule declarations in themselves do not perform the validation. They have to be “noted”, “asserted” or used as expressions inside an if-statement to give meaning. Rule declarations are single statements consisting of a left and right operand and one rule operator. The rule operators BEFORE, AFTER, WITH, NOTWITH, NEXTTO can be applied to Part instances or Device instances. Property values of Part/Device instances or primitives in relation with one Part/Device can be operators in rule declarations when using the relational operators =, !=, ==. These operators are overloaded when evaluating text and the text is compared according to alphabetical meaning. Table 1 provides a summary of the operators for Eugene rules. Example (20) illustrates a rule where all Parts BBa_K112234_rbs have to come before all Parts BBa_K11223_orf. Example (21) illustrates a rule where the Part BBa_K112234_rbs has to be contained together with BBa_K112234_orf inside a Device. Example (22) illustrates a rule where the Part BBa_K112126 has to be next to BBa_K112234 when a Device is declared. Example (23) illustrates a rule that checks whether the sequence of BBa_K112234_rbs is equivalent to the sequence of BBa_K112234_orf. Example (24) illustrates the comparison of Property values of Parts, where the “RelativeStrength” Property value for Part BBa_K112234_rbs has to be greater than the “RelativeStrength” Property value for Part BBa_B0032. Example (25) shows a similar comparison but uses the variable “relativeS” for comparison.

Compositional Operators BEFORE AFTER WITH NOTWITH NEXTTO

operand 1 appears before operand 2 on devices operand 1 appears after operand 2 on devices operand 1 appears with operand 2 on devices operand 1 does not appear with operand 2 on devices operand 1 is adjacent to operand 2 on devices

Comparison Operators < less than greater than >= greater than or equal to != not equal to == equal to Boolean Operators AND operand 1 AND operand 2 OR operand 1 OR operand 2 NOT NOT operand Table 1: Eugene Operators for Specifying Rules

3.6 Asserting and Noting Rules In order to take effect, rules need to be “asserted” or “noted”, once they are declared. The scopes of all assert or note statements encompass every new Device. Every time a new Device is declared and provided “Assertions” and “Note” statements exist, the validation process is performed on the newly created Device. Rule instances can be combined with each other through the use of the logical operators AND, OR, NOT in the statements. The difference between rule assertions and rule notes lies in the strength of the consequence once a violation is found. If no violation is found the program continues running.

3.6.1 Rule Assertion These statements are strong assertions and the program terminates with an error once a Device composition violates the statement. The following statement will check if BBa_K112234_rbs is not contained together with BBa_K112234_orf in the Device and their sequences should not be equal. In this case an error will terminate the program since both parts are components of the device, therefore violating the Assert statement. Assert ((NOT r4) AND (NOT r2)); Device BBa_K112234(BBa_K112234_rbs, BBa_K112234_orf);

3.6.2 Rule Notes Rule r1(BBa_K112234_rbs BEFORE BBa_K11223_orf); Rule r2(BBa_K112234_rbs WITH BBa_K112234_orf); Rule r3(BBa_K112126 NEXTTO BBa_K112234);

(20) (21) (22)

Rule r4(BBa_K112234_rbs.Sequence != BBa_K112234_orf.Sequence);

(23)

Rule r5(BBa_K112234_rbs.RelativeStrength > BBa_B0032.RelativeStrength);

(24)

num relativeS = BBa_B0032.RelativeStrength; Rule r6(p.RelativeStrength > relativeS);

(25)

Notes issue warnings in the output when the violation occurs. But the program continues running. In the following example the Device BBa_K112133 meets the first note’s condition. However, the next note is violated and the program will issue a warning. Note (r2 AND r3); Note (NOT r1); Device BBa_K112133(BBa_K112126, BBa_K112234);

3.7 Conditional Statements The use of conditional statements breaks up the flow of execution and allows certain blocks of code to be executed. Eugene

supports two kinds of if-statements to achieve this: Rule validating if-statement and standard if-statement. The three logical operators AND, OR, NOT can combine statements of each type but not together.

4. IMPLEMENTATION

3.7.1 Rule validating if-Statement Rules can be checked not just through Assert and Note statements but also in an if-statement. In this approach only specific rules will be considered, as they might not apply to all Devices. The notation should specify a list of Devices and a logical combination of rule instances pertaining to that list. Suppose we would like to test a rule only on the specific Device instance BBa_K112133, where the Promoter BBa_K112126 comes before the Ribosome Binding Site BBa_K112234_rbs. Then the following conditional statement can achieve such conditional evaluation. In this case, the if statement will evaluate to true: Rule r7(BBa_K112126 BEFORE BBa_K112234_rbs); if(on (BBa_K112133) r7) { Block statement, in case of true evaluation } else { Block statement, in case of false evaluation }

3.7.2 Standard if-Statement Expressions not pertaining to rules and Devices can be evaluated by the standard if-statement which supports the relational operators =, !=, == as well as the logical operators AND, OR, NOT. boolean test = true; if(test) { Assert(ruleWith); } else { Assert(NOT ruleWith); } Device BBa_K112133(BBa_K112126, BBa_K112234);

3.8 Header Files The inclusion of header files allows the use of predefined Properties, Parts and Part Instances in the program. The manageability of code in the main file is more efficient by hiding the low level implementation of sequence and Parts. The user needs only to define Devices in the main file. On such a level the program can be written quickly and it is less error prone. Also, this allows each lab to have their own header file libraries. At the same time the option to change or declare other Properties, Parts and Part instances exists in the language.

Figure 4: Complete Design Flow Using Eugene

4.1.1 Header File creation Header files give the language the functionality to access many already predefined Parts in the databases. For the purpose of convenient data exchange over the Internet, XML could be used to read information from a database. Then the data is converted into Eugene syntax to represent the header files. As a result the language definitions are not just abstract statements but are tied to existing designs. There are three main header files: PropertyDefintion.h, PartDefiniton.h and PartDeclaration.h shown in Figure 4.

4.1.2 Eugene Main File The main .eug file can include the header files, which need to be specified at the top: include PropertyDefintion.h, PartDefinition.h, PartDeclaration.h;

(26)

The main file will generally consist of custom Part definitions/declarations, Device constructs, rule implementations and control statements.

4.1.3 ANTLR ANTLR is a LL(*) recursive-descent parser generator that accepts lexer, parser, and tree grammars [10]. It is used as the parser generator for Eugene code, since ANTLR allows the reuse of grammar with different semantic actions and the creation of parsers in another language, which can be useful in implementing Eugene with other tools. After some preprocessing of the header files a data structure is created, which can be applied either directly to Spectacles (a vBOL [9] viewer in Clotho [10]) or to other visual tools after conversion to XML from our internal Data Structure (section 4.1.4).

4.1.4 Data Structure The data structure consists of four main classes, which directly relate to the Eugene syntax. Each instance of these classes is referenced by the user-defined name from the Eugene files and stored in global hash maps according to the class type.

The Device class stores the instance of a Device and the names of the ordered list of components. An image path can also be associated with a specific instance. The Rule class stores the instance of a Rule definition, where the rule statement is broken into three components: the left and right operand and the operator.

4.1.4.2 Global Data Structure Figure 5a: Global Data Structure

4.1.4.1 Classes

The data structure is divided between the storage of Part and Property definitions and the actual instances corresponding to their classes for efficient and immediate access to the data. Every instance is referenced by name and stored in a hash map according to the class it belongs. Parts, Devices, rules and primitives are kept in separate hash maps The hash map propertyDefinitions stores the defined Property labels and their type. For instance the sequence property will be of type txt as shown in Figure 6a. The hash map partDefinitions in Figure 6a stores the defined Parts and the property labels as well as any image associated with the specific Part. For example the Part Promoter will have the properties ID, Sequence and Orientation. The hash map partDeclarations contains the declared Part instances. Each instance contains a list of Property values. For example, the Part instance BBa_I0500 is of type Promoter having the properties ID, Sequence and Orientation where Sequence is of type “txt” and has “GATCTtta…” as its value as shown in Figure 6b. Similarly, deviceDeclarations, ruleDeclarations and primitiveDeclarations store the instance names referring to class instances of Device, Rule and Primitive respectively.

Figure 5b: Class Organization for Eugene The Primitive class acts as a container for numbers, text, lists and Boolean values. The Part class stores the instance of a Part and its Property values as a Hash map of Property labels referencing Primitive data types. Each Part instance will point towards the Part definition it came from through its type field. An image path can be bound to a Part where Part instances can have different images if the user specifies accordingly.

The hash maps ruleAsssertions and ruleNotes in Figure 6c store the assert and note statements as keys which point towards lists containing the individual elements of the statement in reverse Polish notation. Postfix notation is used to help evaluate the truthvalue of each assert or note statement. The statements have a global scope. Therefore, every time a Device is created the program goes through each list and applies these statements to the Device.

Figure 6a: Populating Global Data Structures with Property and Part Definitions

Figure 6b: Populating Global Data Structures with Primitive, Part, and Device Declarations

Figure 6c: Populating Global Data Structures with Rule Declarations, Assertions, and Notes

5. EXAMPLES Several Device constructs have been selected from the Registry of Standard Biological Parts [11] to show the creation of visual and textual designs. The visual representation has been implemented using Visual BioBrick Open Language symbols [9] and Spectacles [10], a visual tool editor, while Eugene was employed to demonstrate a textual representation. Due to the one to one relationship between standard biological parts and the Eugene syntax, the conversion could easily be achieved in both directions. BBa_K112809 is T4 Lysis Device with Pbad as the inducible Promoter. The Lysis device allows for the easy release of any product produced in the cell. The main file in Figure 7b displays the construct as an ordered list of Parts, while the header files are imported to hide the details of the Part declarations. With only one line of declaration, a complicated Device construct can be reproduced while the detailed information can still be accessed.

Figure 9a: Visual representation of BBa_K118021

Figure 9b: Textual representation of BBa_K118021 BBa_I8510 in Figure 8a is an Inverter that takes 3OC6HSL as input and produces lacZalpha as output. It also creates an orthogonal GFP protein generator.

Figure 7a: Visual representation of BBa_K112809

Figure 7b: Textual representation of BBa_K112809 BBa_E7104 is a GFP Reporter Device. Again, each new Device can be created using very few lines of code as Figure 8b demonstrates.

Figure 10a: Visual representation of BBa_I8510 The textual representation in Figure 10b shows the compactness and readability of a construct with twenty Parts.

Figure 8a: Visual representation of BBa_E7104

Figure 8b: Textual representation of BBa_E7104 BBa_K118021 in Figure 9a is a Promoter characterization Device. The second Part in BBa_K118021 is a Reporter as specified by the Standard Registry of Parts [11]. It has no image associated, since technically speaking BBa_J33204 is a Device. However, currently with the available standards and specifications, there does not exist a way of breaking this Device cleanly into separate existing Parts. For example, there needs to be a way of specifying point mutation, which could be a feature of Devices, currently not supported by Eugene but is considered for future releases. Therefore, BBa_K118021 was specified as the Part Reporter rather than Device in the code. Figure 10b: Textual representation of BBa_I8510

Table 2: Examination of Experimental Designs Specified Using Eugene

6. RESULTS Table 2 is an examination of actual experimentally created devices from a variety of sources. The first two devices were created in the summer of 2008 in J.C Anderson’s lab at UC Berkeley for the International Genetically Engineering Machine Competition. These can be found in MIT’s registry of standard biological parts. The third device is from Drew Endy’s lab at Stanford. The fourth part is from Edinburgh’s 2008 iGEM team. The fifth part is from Tom Knight’s lab at MIT. The description of these devices as well as their registry id is provided. In addition we detail the number of properties, parts, and devices used by Eugene to specify them. For running the examples a MacBook Pro was used with a 2.4 GHz processor speed and 2 GB of memory. The purpose of this investigation was to display the significance in the separation of Part and lower level property information, which is hidden in the header files from the Device level construction in the main file. As a result about an average of 85% less code is utilized in the main file. The real strength of the language becomes apparent with the introduced level of abstraction through the transfer of Part designs from databases to the header files. Such functionality allows the augmentation of data from other sources to the main design. The focus shifts from the low level details of the system to design validation through rule constructs and control flow implementation. The program becomes easier to manage. At the same time, the ratio of DNA base pairs to total lines of code implies the portability of very complex designs with large base pairs to other tools or systems. Sharing designs becomes much easier, especially, since the creation of the data structure is quickly achieved. The compilation times are very reasonable as the column in the table above shows. We have confidence that as designs move to encompass 10’s or 100’s of devices, the compilation time will remain very reasonable.

7. CONCLUSION AND FUTURE WORK Overall, Eugene is a human writeable language that demonstrates both efficient textual representation of biological systems but also the capability of validating constructs. It presents the ability to work both on lower level property information but also on a Device level design. Eugene proves to be a portable solution to large system designs through its connection to different databases. Furthermore, based on its direct one to one relationship to standard biological Parts, Eugene can be translated to visual designs and vice versa. For future releases, it will be important to incorporate other control statements. The language will require a notion of systematically going through lists, which can be achieved through loops. This can be useful when different combinations of Parts or Devices need to be traversed and some operations on them performed. At the moment the highest level in the Parts Hierarchy are Devices. Ideally, we would like to extend Eugene to contain Systems and the ability to operate on such a level by providing built-in functions, which will depend on new assembly standards. At the same time the user should have the ability to create functions. This introduces the importance of scope in variables and instances, since functions should only apply to specific instances. Currently, all instances in a file can be accessed globally. Another aspect in the language is the direct access to a database. By providing a function to connect to a specified database would certainly give more expressional power to the language. Currently, data base access is performed outside of Eugene by translating XML information from the database to Eugene code.

ACKNOWLEDGMENTS We want to acknowledge the following people who were helpful in the development/discussions of Eugene: Cesar Rodriguez, Drew Endy, Joanna Chen, Richard Mar, Thien Nguyen, Nina Revko, Bing Xia, Evan Yang, Robert Ovadia, Akshay Maheshwari, J. Christopher Anderson, Armen Khodaverdian, Nade Sritanyaratana, Paul Hilfinger. Also we would like to thank the Center for Hybrid Embedded System Software (CHESS) as well as the Summer Undergraduate Program in Engineering Research at Berkeley (SUPERB).

8. REFERENCES [1] Matsuoka, Y., Ghosh, S. and Kitano, H. 2009. Consistent design schematics for biological systems: standardization of representation in biological engineering. J. R. Soc. Interface. published online 3 June 2009. DOI= http://rsif.royalsocietypublishing.org/10.1098/rsif.2009.004 6.focus [2] Endy, D., Foundations for engineering biology. Nature, 2005. 438(7067): p. 449-53. [3] Arkin, A., Setting the standard in synthetic biology. Nat Biotech, 2008. 26(7): p. 771-774.

[4] Hucka, M. et al. 2002. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, Mar 2003; 19: 524 - 531. DOI = 10.1093/bioinformatics/btg015 [5] http://www.cellml.org/ [6] http://www.physiome.org.nz/xml_languages/fieldml [7] Smith LP, Bergmann FT, Chandran D and Herbert M. Sauro. Antimony: A modular model definition language. Bioinformatics. July 2009. DOI = 10.1093/bioinformatics/btp401 [8] Cai Y, Hartnett B, Gustafsson C, Peccoud J (2007). A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts. Bioinformatics 23(20): 2760-2767. [PubMed ID: 17804435]. [9] http://openwetware.org/wiki/Endy:Notebook/BioBrick_Ope n_Graphical_Languagehttp://www.antlr.org/ [10] http://biocad-server.eecs.berkeley.edu/wiki/index.php/Tools [11] http://partsregistry.org/