1999 Oxford University Press
Nucleic Acids Research, 1999, Vol. 27, No. 2
601–607
‘LABNOTE’, a laboratory notebook system designed for academic genomics groups
Marie-Christine Imbert1, Van Khanh Nguyen, Samuel Granjeaud, Catherine Nguyen and Bertrand R. Jordan*
TAGC group, ICIM, Centre d’Immunologie INSERM/CNRS de Marseille-Luminy, Case 906, 13288 Marseille Cedex 9, France and 1Centre de Thérapie Génique, IPC, 232 Boulevard de Ste Marguerite, 13273 Marseille Cedex 9, France
Received August 31, 1998; Revised and Accepted November 20, 1998
ABSTRACT
We have developed a relational laboratory database system, adapted to the daily book-keeping needs of laboratories that must keep track of information acquired on hundreds or thousands of clones in an effective and user-friendly fashion. Data, whether final or related to experiments in progress, can be accessed in many different ways, e.g. by clone name, by gene, by experiment or through DNA sequence. Updating, import and export of results are made easier by specially developed tools. This system, in its network version, serves several groups in our Institute and (over the Internet) elsewhere, and is instrumental in collaborative studies based on expression profiling. It can be used in many similar situations involving progressive accumulation of information on sets of clones or related objects.
INTRODUCTION
Many public databases have been established to store and make available all kinds of genomic data, from maps to sequences through catalogues of mutants and protein motifs (1). Recent efforts have been aimed, in particular, at making gene expression data publicly available and at the same time providing users with data analysis tools (2). In addition, large laboratories, such as Genome Centers involved in intensive genome mapping or sequencing, have set up their own database systems to store their results and prepare the data for distribution to the scientific community. Being geared to a particular operation, such systems are rarely made available or even published. One of them, the ‘Genome Notebook’, developed primarily to handle results from the human chromosome 11 mapping project, has however been described in some detail (3).
Increasingly, ‘conventional’ laboratories (i.e. groups of relatively small size operating in an academic environment) are interfacing with the genome project and making use of its results (e.g.
sequence or mapping data), but also of resources such as the IMAGE cDNA clone set (4), and of semi-automated procedures that boost throughput by one or two orders of magnitude. This trend results in a large increase in the number of objects and in the amount of
information, requiring efficient archiving and easy retrieval of experiments in progress, intermediate results and ‘final’ data. The traditional approach, i.e. manual notebooks supplemented by a number of computer files in spreadsheet software, can no longer cope with this data flow, and a proper laboratory database becomes necessary.
A number of projects developed in our Institute are centred around genes expressed in the mouse thymus. We use organised cDNA libraries, measure expression levels by hybridisation of DNA arrays with complex probes, and obtain additional information (tag sequence, genome mapping, etc.) for sets of clones selected according to their expression pattern (5–8). Thus the information we wish to store in a laboratory database is largely organised around a list of clones (expression data, sequences, results from Southerns and northerns, etc.) but also includes the description of libraries, the make-up of specific arrays, as well as protocols or publication references. Other ongoing research programmes use sets of a few hundred IMAGE cDNA clones as reagents for expression profiling in various situations; again, good book-keeping is essential to keep track of clone choice, procurement, verification and of expression data. Ready-made membranes provided by several suppliers (http://www.clontech.com/clontech/Catalog/Hybridization/Atlas.html, http://www.genomesystems.com/GDA/ and http://www.resgen.com/) as well as by resource centres are also used in some projects and generate a need for data archiving.
To be really useful in the context of a biological laboratory, a notebook system must be extremely user-friendly: it should be used daily by each member of the group, and the interface must be designed with this in mind. It should run well on affordable machines that the prospective users are familiar with, i.e. in most cases on PC or Macintosh microcomputers.
The system must be extremely flexible and allow additions and changes to be made without loss of previously entered data, so that new experimental approaches or new ways of analysing existing information can be accommodated easily; data security and access privileges should also be well organised. We have used the 4th Dimension (ACI) relational database management system (http://www.aci-4D.com/) to develop a laboratory database, LABNOTE, that aims to fulfil this need. 4th Dimension (4D) has been used previously for biological databases (9,10), for a large number of (unpublished) medical databases,
*To whom correspondence should be addressed. Tel: +33 4 91 26 94 96; Fax: +33 4 91 26 94 30; Email:
[email protected]
and for at least one genome mapping database (3). Although some aspects of our implementation are tailored to the specific needs of our project, LABNOTE has proven readily adaptable to other laboratories and we believe it has general applicability in groups performing various genomic and expression studies.
SOFTWARE AND DATABASE
Database management system
LABNOTE was developed in the ‘ACI 4th Dimension’ Relational DataBase Management System (RDBMS) (http://www.aci-4D.com/). Dr G. A. Evans provided us with a version of the ‘Genome Notebook’ (3) constructed in the same system; while this is geared to a different aim (data handling for a large-scale genome mapping project), it was very helpful in terms of defining data architecture and important relationships.
4th Dimension (4D) is an RDBMS whose relational technology allows modular application development and good control of the appearance of corresponding views. Its graphic module allows the definition of database structure by drawing entities and links. The 4D RDBMS is platform-independent: applications developed on, e.g., Windows 95, Windows NT, Mac OS or Power PC can be deployed, without changes, on all the other platforms. The query language is specific to 4th Dimension. Referential integrity is automatically implemented (http://www.aci-4D.com/).
For the complete development of LABNOTE, we used various procedural packs (e.g. ACI-Pack v1.9.2c, Button Package v3.0.2, File Pack v3.0.5), and a number of development tools (e.g. 4D Insider, 4D Transporter, External Mover). LABNOTE was originally developed in ACI 4th Dimension v5.0.2; it presently uses 4D Server v1.5.4, the multi-user version of 4th Dimension. The unified client/server architecture optimises database performance and provides a transparent interface in a heterogeneous hardware environment (PC or Macintosh Client). 4D Server sends each client the requested data in the format adapted to its deployment platform.
4D makes available a number of functionalities that are very useful in an experimental approach such as ours: ‘Format List’ (allowing work on sets of clones), ‘Enumerations’ and ‘Pull-down menus’ (allowing guided data entry with choice limited to a subset of a previously defined list). In addition, various import/export formats are available, such as 4D Format, used to save sub-selections of clones or hybridisation results, and Text Format, used to create files compatible with a robot that physically rearrays selected clones.
Implementation
An overview of the kind of data stored, the links used and the structural organisation of LABNOTE is given in Figure 1. The database is primarily organised around four entities: libraries, clones, experiments and sequences. The network version of LABNOTE runs on the Windows NT version of 4D Server and supports connections (via Internet or Intranet networks) to PC and Macintosh platforms. The 4D Server license is based on the maximum number of concurrently connected users (four in our case). The client software can be distributed freely; only the licenses actually in use need to be purchased, and expansion packs in increments of 1, 5 or 10 are available for additional users. Although the 4D system provides client/server architecture, significant programming efforts were necessary to ensure security of data and consistency while remaining user-friendly. Different access levels were
defined, ranging from ‘Consultant’ (read-only access to a data subset) to ‘Manager’ (full read/write access). Referenced users are mainly allowed to enter new data, and can only modify or delete items that they entered themselves. With such a security scheme, for example, many users can add comments or data to a clone, but only the Manager can decide to delete it from the set (11).
Description of database features
We primarily use LABNOTE via its main menu toolbar, which is divided into three submenus: Tools, Experiments and Results (Figure 2). All the data stored in LABNOTE are accessible from this menu toolbar, but there are also other links between these pieces of information.
The Tools menu leads to all information concerning cDNA libraries (Library), High Density filters (HD Filters) prepared from these, experimental protocols (Protocol) used for all experiments stored in LABNOTE, and some bibliographic references (Biblio Reference).
The Experiments menu leads to results from High Density membrane hybridisations (HD hybridisations), northern blots (Northern Blots) and Polymerase Chain Reaction (PCR). For these experiments, some experimental conditions, an author name, a creation date, results and free comments are stored.
The Results menu leads to all information concerning any clone (Clones) and any sequence (Sequences), which LABNOTE stores as ‘results’. Entry of these results is restricted, and modifications can only be made with specific authorisation.
The two tables harbouring the most significant data (in terms of quantity and of relevance to the analysis and synthesis of the results) are ‘Hybrid HD’ and ‘Clones’. ‘Hybrid HD’ contains in particular: the name of the filter hybridised (FilterName), the name of the probe used (Probe), the name of the filter image file (FileImage), the name of the author of the experiment (Author) and the result of the quantification (Quantification; more than 20 000 currently).
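The access levels defined under Implementation (Consultant, referenced user, Manager) can be illustrated with a short sketch. This is not LABNOTE code, which is written in 4th Dimension; it is a minimal Python illustration, and the names `Role`, `User`, `Item` and the `can_*` functions are hypothetical. Only the rules stated in the text are modelled; the restriction of Consultants to a data subset is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CONSULTANT = "consultant"        # read-only access
    REFERENCED_USER = "referenced"   # may add data; may change only own items
    MANAGER = "manager"              # full read/write access


@dataclass(frozen=True)
class User:
    name: str
    role: Role


@dataclass
class Item:
    """Any stored entry (comment, result, ...); `author` is who entered it."""
    author: str
    content: str


def can_add(user: User) -> bool:
    # Consultants are read-only; everyone else may enter new data.
    return user.role in (Role.REFERENCED_USER, Role.MANAGER)


def can_modify_or_delete(user: User, item: Item) -> bool:
    # The Manager may change anything; referenced users only their own entries.
    if user.role is Role.MANAGER:
        return True
    if user.role is Role.REFERENCED_USER:
        return item.author == user.name
    return False


def can_delete_clone(user: User) -> bool:
    # Removing a clone from the set is reserved for the Manager.
    return user.role is Role.MANAGER
```

With these rules, a referenced user can attach a comment to any clone and later edit that comment, but cannot touch a colleague's comment or remove the clone itself.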
Examples of data access procedures
To highlight the main features of this database, we present here some examples of its use.
One clone and the corresponding experiments (Fig. 3). From the view of a clone, it is possible to see all experiments in which it has been involved: HD hybridisations where the clone was present on a given HD membrane, any northern blot in which it was used as probe, the set of PCR reactions (not shown) in which the clone was amplified; and all available information concerning its sequence.
One sequence and its functionalities (not shown). After (partial or complete) sequencing of a clone, its sequence is stored in LABNOTE; vector sequences can be removed, and it can be used for comparison using Internet tools. Comparison results can be directly stored in LABNOTE.
Northern blot: values and image (not shown). In LABNOTE, each northern blot is stored with the corresponding experimental conditions, the results of quantification and also its image (200 kb maximum file size; larger or very numerous images would require external storage to avoid slowing down the system).
Figure 1. General structure of the LABNOTE database. The four main tables around which the database is organised are Library, Clones, Experiments and Sequences, each with a number of fields and a few small subtables (thin black links). Active links between the files are indicated in blue. Experiments are an open and extensible category; they are defined by ‘type’ and ‘IdExp’ and are accessed ad hoc without automatic links (see the three types of experiments displayed, i.e. ‘PCR’, ‘Hybrid HD’ and ‘BlotFilter’). This makes it possible to add further kinds of experiments without modifying the database structure.
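The open experiment category of Figure 1 can be pictured with a small sketch. The Python below is an illustration only, not the 4D implementation: the class and function names (`ExperimentRef`, `Clone`, `ExperimentStore`, `experiments_for`) are hypothetical, and all field values in the usage lines are invented placeholders. The point it tries to show is that a clone holds only (‘type’, ‘IdExp’) pairs, so a new kind of experiment adds a table without changing the clone structure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ExperimentRef:
    """Pointer to one experiment: the ('type', 'IdExp') pair of Figure 1."""
    exp_type: str   # e.g. "PCR", "Hybrid HD", "BlotFilter"
    id_exp: int


@dataclass
class Clone:
    name: str
    library: str
    experiments: List[ExperimentRef] = field(default_factory=list)


class ExperimentStore:
    """One table per experiment type, looked up ad hoc by (type, IdExp)."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[int, Dict[str, Any]]] = {}

    def register_type(self, exp_type: str) -> None:
        # A new kind of experiment only adds a table; Clone is untouched.
        self._tables.setdefault(exp_type, {})

    def add(self, exp_type: str, id_exp: int,
            record: Dict[str, Any]) -> ExperimentRef:
        if exp_type not in self._tables:
            raise KeyError(f"unknown experiment type: {exp_type!r}")
        self._tables[exp_type][id_exp] = record
        return ExperimentRef(exp_type, id_exp)

    def resolve(self, ref: ExperimentRef) -> Dict[str, Any]:
        return self._tables[ref.exp_type][ref.id_exp]


def experiments_for(clone: Clone, store: ExperimentStore,
                    exp_type: str) -> List[Dict[str, Any]]:
    """All experiments of one type in which the clone was involved."""
    return [store.resolve(r) for r in clone.experiments
            if r.exp_type == exp_type]


# Usage with placeholder values (hypothetical filter, probe and library names).
store = ExperimentStore()
for exp_type in ("PCR", "Hybrid HD", "BlotFilter"):
    store.register_type(exp_type)

clone = Clone(name="MTA.F02.091", library="example-library")
clone.experiments.append(store.add("Hybrid HD", 1, {
    "FilterName": "filter-A", "Probe": "probe-X", "FileImage": "img-001",
    "Author": "someone", "Quantification": 1.0,
}))

# A further experiment type is added later without touching Clone.
store.register_type("Southern")
```

Resolving references on demand mirrors the "ad hoc, without automatic links" access described in the caption, at the cost of having no referential check between a clone and its experiments in this sketch.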
Subsets of selected clones (not shown). At present nearly 50 000 clones from five different cDNA libraries are stored in LABNOTE. Most users work with subsets of these. Accordingly, it is possible
to save a set of clone names in a file (text or 4D format) and to reload this selection at any time. This will bring up the corresponding clones with all the information that is attached to
Figure 2. Submenus, directly accessible items and links between them.
them. Users can refine selections by use of filtering queries or Boolean operators. The resulting text files are used, for example, as input to a robot for rearraying of sets of clones from a number of original plates to one or a few new ones.
Assembly of hybridisation results for a set of clones (Fig. 4). This functionality allows the user to specify a list of clones; the probes that have been used for them are then displayed, and the user can output an Excel table with the hybridisation quantification value for each of the chosen probes for each clone of the list. Such a table can then be numerically analysed, correlated to new results or displayed as grey levels using Excel macros more easily than within 4D.
DISCUSSION
The definition of the basic database structure for LABNOTE was a long, interactive and iterative process that began at the start of the experimental project. At the outset, the members of the group had little idea of what a laboratory database could achieve and how it should be organised; the example of the Genome Notebook (3) was very useful to give a feeling for what was possible. After a number of meetings and discussions with the developer (who was not familiar with our experimental approaches), a first working database was constructed and tested; successive versions were produced until a reasonably adapted (but still evolving) system was in operation.
LABNOTE has proven absolutely essential to our research. The attachment to any clone of virtually all the information ever obtained is extremely useful, has avoided many unnecessary experiments and helped tremendously to achieve close collaboration between several groups that are interested in different aspects of thymus function but use the same technology and the same libraries to find ‘new’ genes relevant to their interests.
The flexibility of the system has been amply demonstrated as new features demanded by the users were incorporated into successive versions without loss of previously entered data. Users can
provide comments on their experiments or analyses, and data is available for reanalysis at any subsequent time in the light of newly acquired information.
In our operation, expression data is first acquired as text files generated by image analysis of phosphor plate data. These results are then imported into Excel spreadsheets where a number of correction and normalisation procedures are carried out by macros. The verified expression data is then transferred to the database, from which it can be later exported for specific subsets of clones as indicated in the examples (Fig. 4).
The ease with which information in many different formats can be imported thanks to the special tools available is impressive. In addition, it is quite practical to run LABNOTE and a WWW browser simultaneously and to alternate between them. For example, the partial sequence of a clone accessed in LABNOTE can be pasted into a WWW BLAST form for comparison with newly archived ESTs; BLAST results can then be saved in LABNOTE, replacing the previous (and now obsolete) sequence comparison.
The system as described in this paper handles many different kinds of information, and can be used with little adaptation in different projects; variations on this theme can be produced fairly easily thanks to the powerful tools provided by the 4th Dimension system, although this requires additional software and some programming expertise. Hard disk requirements are modest: the whole database structure occupies 1.5 Mb, and all our present data (including more than 43 000 clones and 10 000 tag sequences) occupy 43 Mb. The new version 6 of 4D provides Web support; this will allow any Web browser to interact with the database.
In conclusion, we believe that this system represents a very practical approach to better laboratory management in an academic environment.
AVAILABILITY OF SOFTWARE
The current version of LABNOTE is freely available to academic users; the only necessary commercial software is the relatively
Figure 3. Example of information accessible from a clone name. Top: ‘Clone’ view, accessed e.g. by giving the clone name (MTA.F02.091) in the ‘Clone List’ accessible from the ‘Results’ submenu, and a few of the views that can be accessed from there: hybridisation experiment (middle left), northern image (bottom left, quantified data is also stored as well as the makeup of the northern blot), interpreted sequence summary (middle right), actual sequence and comparison results (bottom right). More information is accessible from the ‘Clone’ view e.g. rearrayed sets (‘Plate’), PCR data, library of origin etc.
inexpensive ‘4D runtime’ package. It can be provided in single-user version for PC or Macintosh machines; a Windows NT version is also available. Adaptation to the specific needs of a given laboratory may require programming additions or changes that we cannot
undertake to perform. Please contact
[email protected] for details. The Resource Centre of the British Human Genome Mapping Programme will shortly make a version of this system available to its users.
Figure 4. Extraction of multiple hybridisation data for a defined set of clones. A list of clones is imported into LABNOTE or defined within the ‘Clone List’ using the sorting tools available (tool bar). The ‘Prob.’ button calls up a list of probes that have ‘seen’ any of these clones. The user indicates which probes are of interest, and an Excel table giving the data is generated (a value of 1 corresponds to an mRNA abundance of 0.1% after correction and normalisation; see ref. 7). Blanks correspond to absence of data, i.e. clones that have not been hybridised with this particular probe.
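The extraction shown in Figure 4 amounts to building a clone-by-probe table from individual quantification results. The sketch below is one possible way to express this in Python, not the LABNOTE procedure itself: the function names and the (clone, probe, value) record layout are assumptions, and tab-separated text stands in for the Excel table that LABNOTE produces.

```python
import csv
import io
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, str, float]  # (clone name, probe name, quantification)


def probes_seen(records: Iterable[Record], clones: Sequence[str]) -> List[str]:
    """Probes that have 'seen' at least one clone of the list."""
    wanted = set(clones)
    return sorted({probe for clone, probe, _ in records if clone in wanted})


def hybridisation_table(records: Iterable[Record], clones: Sequence[str],
                        probes: Sequence[str]) -> List[list]:
    """One row per clone, one column per chosen probe.

    A blank ("") marks a clone never hybridised with that probe. If the same
    (clone, probe) pair occurs twice, the last value wins in this sketch.
    """
    values = {(clone, probe): value for clone, probe, value in records}
    rows: List[list] = [["Clone", *probes]]
    for clone in clones:
        rows.append([clone, *(values.get((clone, p), "") for p in probes)])
    return rows


def to_tsv(rows: Iterable[Sequence]) -> str:
    """Tab-separated text that a spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

A typical call would first use `probes_seen` to list the candidate probes for a clone set, let the user pick a subset, and then pass that subset to `hybridisation_table` and `to_tsv`.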
ACKNOWLEDGEMENTS
We thank Dr G. Evans for communication of the (then unpublished) Genome Notebook database (3). This work was supported by
institutional grants from INSERM and CNRS to our Institute, as well as by specific grants to the TAGC group from the French Muscular Dystrophy Foundation (AFM) and GREG (Groupement de Recherches et d’Etudes sur les Génomes).
REFERENCES
1 Database issue (1998) Nucleic Acids Res., 26, 1–390.
2 Ermolaeva,O., Rastogi,M., Pruitt,K.D., Schuler,G.D., Bittner,M.L., Chen,Y., Simon,R., Meltzer,P., Trent,J.M. and Boguski,M.S. (1998) Nature Genet., 20, 19–23.
3 Clark,S., Evans,G. and Garner,H. (1994) In Smith,D.W. (ed.), Biocomputing: Informatics and Genome Projects. Academic Press, San Diego, CA, pp. 13–24.
4 Lennon,G., Auffray,C., Polymeropoulos,M. and Soares,M.B. (1996) Genomics, 33, 151–152.
5 Nguyen,C., Rocha,D., Granjeaud,S., Baldit,M., Bernard,K., Naquet,P. and Jordan,B.R. (1995) Genomics, 29, 207–216.
6 Granjeaud,S., Nguyen,C., Rocha,D., Luton,R. and Jordan,B.R. (1996) Genet. Anal., 12, 151–162.
7 Bernard,K., Auphan,N., Granjeaud,S., Victorero,G., Schmitt-Verhulst,A.M., Jordan,B.R. and Nguyen,C. (1996) Nucleic Acids Res., 24, 1435–1442.
8 Rocha,D., Carrier,A., Naspetti,M., Victorero,G., Anderson,E., Botcherby,M., Guenet,J.L., Nguyen,C., Naquet,P. and Jordan,B.R. (1997) Immunogenetics, 46, 142–151.
9 Mohr,E., Horn,F., Janody,F., Sanchez,C., Pillet,V., Bellon,B., Roder,L. and Jacq,B. (1998) Nucleic Acids Res., 26, 89–93.
10 Klainguti,G., Chamero,J. and Presset,C. (1995) Klin. Monatsbl. Augenheilkd., 206, 397–400.
11 Imbert,M.C. (1997) DESS Report, Institut National Polytechnique Toulouse III.