Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
A Reliable and Distributed LIMS for Efficient Management of the Microarray Experiment Environment
Hee-Jeong Jin BK Center for U-Port IT Research Education, Pusan National University, Busan, South Korea,
[email protected] Jeong-Won Lee and Hwan-Gue Cho Dept.of Computer Science and Engineering, Pusan National University, Busan, South Korea, {jwlee,hgcho}@pusan.ac.kr Summary A microarray is a principal technology in molecular biology. It generates thousands of expressions of genotypes at once. Typically, a microarray experiment contains many kinds of information, such as gene names, sequences, expression profiles, scanned images, and annotation. So, the organization and analysis of vast amounts of data are required. Microarray LIMS (Laboratory Information Management System) provides data management, search, and basic analysis. Recently, microarray joint researches, such as the skeletal system disease and anti-cancer medicine have been widely conducted. This research requires data sharing among laboratories within the joint research group. In this paper, we introduce a web based microarray LIMS, SMILE (Small and solid MIcroarray Lims for Experimenters), especially for shared data management. The data sharing function of SMILE is based on Friend-to-Friend (F2F), which is based on anonymous P2P (Peer-to-Peer), in which people connect directly with their “friends”. It only allows its friends to exchange data directly using IP addresses or digital signatures you trust. In SMILE, there are two types of friends: “service provider”, which provides data, and “client”, which is provided with data. So, the service provider provides shared data only to its clients. SMILE provides useful functions for microarray experiments, such as variant data management, image analysis, normalization, system management, project schedule management, and shared data management. Moreover, it connections with two systems: ArrayMall for analyzing microarray images and GENAW for constructing a genetic network. SMILE is available on http://neobio.cs.pusan.ac.kr:8080/smile.
1 Introduction Laboratory Information Management System (LIMS) is a system that allows laboratories to trace and track all samples or specimens that are received into the laboratory, as well as all tests that are performed. It can also track the analyst, date and time of each step in the analysis process. Weinberg introduced three major functional areas of the LIMS that illustrate the relationship between the information management system and the other management systems in the laboratory [1]. The first is sample tracking. It determins that the appropriate work is
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 1: Life cycle of experimental data: Experimental data is handled at many steps until it is destroyed. The LIMS basically handels all the data of the life cycle.
properly completed and that the workload is properly managed. The second is sample analysis. It must be documented, and the sufficient raw data must be maintained to reconstruct and defend the result. Finally, these results must be organized and reported in a manner that is understandable and meets the requirements of the end user. Figure 1 shows the typical life cycle of experimental data. Several researchers and companies have developed the LIMS for microarray experiments. A microarray is a principal technology in molecular biology. It results in thousands of expressions of genotypes at once[2, 3]. The microarrays are queried in a co-hybridization assay using two or more fluorescently labeled probes prepared from the mRNA from the cellular phenotypes of interest[4]. The hybridization allows expression levels to be determined relative to the ratio with which each probe hybridizes to an individual array element. So, the cDNA microarray method does not measure the absolute expression of the genes, but it is used for a measure of relative expression levels compared to a sample tissue. Hybridization is assayed using a confocal laser scanner to measure fluorescence intensities, which allow the simultaneous determination of the relative level of expression of all the genes represented in the array. The researchers learn the relationships between the genes and genes involved with diseases using this experiment. Generally, various files, such as a microarray scanned image, analysis file and normalization file are generated by microarray experiments. In addition, many steps are needed for getting information, such as the relationships between genes or highly expressed genes at specific conditions from microarray experiments. The first step of the microarray experiment is to design an experiment and perform a hybridization experiment. The second step is to generate a raw image from a scanner, which consists of spots (genes) that form regular arrays (blocks). Next, in order to measure the amount of RNA bound to each spot, the location of each block and spot must be identified in a process, called “gridding”. After the analysis of the microarray image, the normalization is performed to reduce variations of the experiment. The various methods, such as clustering and estimation, are used for getting biological verification and interpretation. So, typically, various and vast information is generated, such as microarray scanned images and various analysis files. Therefore, it requires an efficient management system of vast amounts of data. Since it is difficult to manage these files for the size and variation, it is necessary to use LIMS for the microarray experiment. Recently, joint research collaboration with microarray has been widely conducted for largescale research projects. This research requires the data sharing among laboratories within a joint research group with security. However, as far as we know, there are few LIMS for microarray data sharing within the joint research group. In this paper, we introduce a web based microarray LIMS, SMILE (Small and solid MIcroarray Lims for Experimenters), especially for managing shared data. In SMILE, the data sharing is based on Friend-to-Friend (F2F) which is a particular type of anonymous P2P (Peer-to-Peer) in which people can connect directly with their “friends”. F2F only allows its friends to exchange data directly, using IP addresses or digital signatures.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
SMILE is based on LINUX and it provides useful functions such as data management, image analysis, normalization, system management, project schedule management, and shared data management. Moreover, it works together with the two systems for analyzing the microarray image (ArrayMall [5]) and constructing the genetic network (GENAW [6]).
2 Previous Work There are several microarray LIMS for the management and analysis of microarray data. BASE is a comprehensive database server used to manage massive amounts of data generated by microarray analysis [7]. It manages biomaterial information, raw data and images, and provides integrated and “plug-in”-able normalization, data viewing and analysis tools. Additionally, for labs that prepare their own in-house arrays or for labs that wish to track probe information, the system also has array production LIMS features which can be integrated with the data analysis. The organization and interface of BASE was designed to closely follow the natural work-flow of the microarray biologist, and is compatible with most types of array platforms and data types. ACUITY is a complete enterprize microarray informatics platform for microarray data storage, data filtering and data analysis [8]. It has several features like clustering, normalization, visualizations, data management and analysis of data. As for clustering, this system includes both hierarchical and non-hierarchical clustering methods for different experimental tasks. It is composed of server and client applications. The server manages and runs the database. The client has many analysis methods for microarray data. ARGUS is a microarray database software system designed to process, analyze, manage, and publish microarray data [9]. In particular, it imports the intensities and images of externally quantified microarray spots, performs normalization, and calculates ratios of gene expressions between conditions. Also, searches for regulated genes can be conducted across multiple experiments, and the integrated results incorporate images of the actual hybridization spots for artifact screening. ARGUS uses the built-in database and microsoft windows server for the software platform. Therefore, they form a stand-alone database that can search locally or over the web by using a standard web browser. Basically, LIMS should provide several basic functions. The first function is the management of microarray data. In order to understand biological meaning from microarray experiments, researchers analyze many microarray images, annotation information and analysis files. So, various and large amounts of microarray files must be managed to be well structured. The second is the data search function. There is a lot of information in stored files within LIMS, so the search tools are useful to find information which the user wants. The third is the management of the users. There are many users that use and analyze microarray data by LIMS, so the system should manage the users for data security and management. The fourth is the security function. Microarray data is not opened until researchers find the meaning. Therefore, the security of the data has to be be guaranteed in LIMS. The last is the accessibility of easy link to external DB. Since there are many databases for this information, LIMS should provide the web link to external DB to get more information of genes. Table 1 shows a comparison of SMILE and the previous systems.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Table 1: The comparison of SMILE and previous systems, BASE [7], ACUITY [8] and ARGUS [9]. SMILE provides several functions comparing with the previous systems additionally, management of project schedule and sharing data, analysis of genetic network and communication among users.
Function hierarchical management management of duplicated experiment management of project schedule management of users management of sharing data search link to external DB reporting backup/restore metafile creation analysis of microarray image analysis of genetic network normalization communication among users MAGE-ML export MAGE-OM import
BASE
Argus
Acuity
SMILE
O O X O X O O X X O O X O X O O
O O X O X O O O X X X X O X X X
O O X O X O O O O X O X O X O X
O O O O O O O O O O O O O O O X
3 System Architecture SMILE was designed for sharing data among joint research group, management and analysis of various microarray data based on the distribution environment because it is more efficient than a centralized LIMS. Since a manager has power to access any data stored in the centralized LIMS, it is a problem of security. In addition to the security problem, vast amounts of storage are needed in the system. However, since each lab stores only its own data and other labs cannot access the data without specific permission, there are no security and storage problems in the distribution of LIMS. So, the distributed LIMS is more useful than the centralized LIMS. Figure 2 shows the architecture of SMILE based on the distributed environment. In SMILE, analysis files of the microarray experiment from its analysis system are otained individually in each laboratory and they can be stored in SMILE. Thus, a user can apply SMILE integrating with the analysis system ArrayMall [5] which is provided with SMILE, or individual analysis systems. Figure 2 shows three different views of SMILE. the first type is SMILE with ArrayMall [5]. In this case, the user can store experiment and analysis files from ArrayMall to SMILE. In the second and third types, users can get microarray data from their private analysis system, so they can store the data in their file system or local LIMS, and then manage them using SMILE.
4 Basic Functions of SMILE In this section, we will introduce several tools in SMILE.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 2: SMILE architecture based on the distributed environment: Scanned images and analysis files of the microarray experiment from its analysis system are obtained individually in each laboratory and they can be stored in SMILE.
4.1 Management of Experimental Data In order to manage and share microarray data, all microarray data are managed in a hierarchical architecture in SMILE. Figure 3 shows a hierarchical structure of microarray data, where it consists of 4 parts = {project, experiment, work and shot}. Project is a unit which contains total experimental steps and includes several experiments. Experiment is a sub-part of the project unit that is divided by experimental conditions. Consequently, it is a unit with the same experiment target and microarray design. The experiment unit includes several works according to the variation of conditions. And shot as a basic unit that consists of images and its analysis files. For example, suppose that we accompany a research for identifying factors that influence the growth of rice. In this case, a project unit is the object and whole experiment. If we select and experiment with the “light” factor, an experiment would set the “light”. And then, if it is an experiment with light according to the amount of light, the shot can set “the amount of light”, such as 10 or 20 minutes. Finally, the shot should be a microarray image from each experiment according to the amount of light or its analysis file. Figure 4 shows the hierarchical structure of the microarray experiment concerning the factors of rice growth. 4.2 Management of LIMS System It is important to manage the microarray data security. For management of the SMILE system, it provides several functions, such as establishing the user level and managing log files for data update and system access. • Management of LIMS user The level of users in SMILE consists of 4 parts: administrator, project manager, researcher and visitor. Users have different access rights according to their level. Administrators manage the system and all the users. Only a project manager can add/delete users and change the user level to manage the access to microarray data. Each researcher can
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 3: The structure of microarray data in SMILE. It consists of 4 parts = {project, experiment, work and shot}, and these are of hierarchical form.
Figure 4: The hierarchical structure of the microarray experiment concerning the factors of rice growth.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 5: Snapshot of managing users: In SMILE, there are users on 4 levels: administrator, project manager, researcher and visitor. Administrator can change the user level and delete the user on a web page for the management of the user.
upload their analysis files and use various analysis functions provided by SMILE. Visitors can use the test project and restricted functions of SMILE. Figure 5 shows a snapshot of a web page for changing the user level and deleting users. • Management of system log file There are several log files for inspecting the system in SMILE. The log files can be created and modified whenever data is uploaded, modified, and deleted with its working time. • Data backup/restore SMILE provides many database management tools, such as backup and restore. For this, all the information within the microarray data structure, such as a project or an experiment, is stored. Since there are large data which is stored in SMILE, we need to backup the data of each project regularly and separately. Figure 6 shows a snapshot of a web page for backup and restore. 4.3 Analysis of Experimental Data One of the important functions for microarray LIMS is the analysis of the microarray data. SMILE provides the following features for data analysis.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 6: Snapshot of a web page for backup and restore: This web page shows the information of each project on this web page. The user can download a backup file of a selected project by the user and restore it using a backup file.
1. 2D/3D Visualization Typically, a single microarray can be scanned in more than one scanner under different settings, so this information is important because the user can use different imageanalysis systems according to these parameters. SMILE provides several 2D/3D visualization tools with various information of the scanned microarray image and spots. Figure 7 shows visualization tools of SMILE. Figure 7 (a) shows a visualization of a microarray image with its information of a scanner, analysis system and microarray design. Figure 7 (b) shows 2D/3D plots of a spot. We can easily detect the distribution of its intensity. Figure 8 shows a view of a microarray image. In a view, a user can see the intensity and shape of each spot with 2D/3D plots, and compare multi-spots by using a multi-spot viewer (Figure 9). SMILE provides other visualization tools, such as a heatmap and multi-spot viewer. Since a microarray experiment is performed in a 6-fold duplication, we need to understand whether the state of the current experiment is successful or not. A heatmap, which uses a gradient of colors to represent data values, has been successfully applied to represent microarray experiment quality. It is designed to view any high throughput data including microarray data. Figure 10 shows a heatmap visualization of SMILE. Spots with green color mean the spot has small variations within the duplicated experiments. Otherwise, spots with red color have large variations within the duplicated experiments. 2. Normalization SMILE provides several normalization methods to users. Normalization is a procedure that identifies and removes systematic sources of variation, such as different labeling efficiencies and scanning properties of the dyes, print-tip or spatial effects. SMILE not only
Journal of Integrative Bioinformatics 2007
(a) a visualization of a microarray image
http://journal.imbio.de/
(b) 2D/3D plots of a spot
Figure 7: The 2D/3D visualization tools of SMILE: (a) a visualization of a microarray image with its scanner, analysis, microarray design information. (b) 2D/3D plot of a spot. We can get the distribution of intensity of each spot.
Figure 8: The viewer of microarray image: user can see the intensity and shape of each spot with 2D/3D plots.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 9: The multi-spot viewer: This viewer shows an intensity and a shape of selected spots from all the spots by a user.
provides a normalization method obtained from the BioConducter [10], but also several normalization methods related previously, which are written by R-language. Figure 11 shows a snapshot of the normalization method on SMILE. 3. Metafile Processing The metafile is a file composed of user-selected columns from the microarray analysis data sets that include several data fields of independent files that we want to put together into a single file. Figure 12 shows a snapshot of metafile processing on SMILE. In the metafile processing, selected image analysis files are combined into one metafile with selected fields. 4.4 Groupware System When we are included in a specific project group, we need a communication facility to check the progress of the project. SMILE provides a groupware tool which is called “collaborative features”. It provides a mechanism that helps users coordinate and keep track of ongoing projects. It also gives document sharing, group calendaring and scheduling, group meeting and task management. We can communicate with our own colleagues using memo, mailing or bulletin functions in SMILE. Figure 13 shows the schedule system in SMILE. Users can check their working plan and a project manager can allocate tasks to researchers by assigning them in a schedule sheet in SMILE.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 10: Snapshot of heatmap visualization on SMILE: In a heatmap, spots with green color mean the spot has small variations within the duplicated experiments. Otherwise, spots with red color have large variations within the duplicated experiments. So, a microarray image including large green spots on a heatmap is well performed.
Figure 11: Snapshot of the normalization procedure on SMILE: SMILE not only provides a normalization method obtained from the BioConducter [10], but also several normalization methods related previously, which are written by R-language.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 12: Snapshot of a web page for metafile processing in SMILE.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 13: The schedule system of SMILE. Researchers can check their work from the schedule system and project manager can allocate researchers to their work.
5 Distributed LIMS for the Joint Research Group In joint experiments of microarray, data sharing among laboratories is a crucial function, since researchers are required to send data to others by e-mail, FTP or personal visiting, which is not secure and time-consuming. SMILE is especially designed for the data sharing of a joint research group to solve these problems. The data sharing function of SMILE is based on Friend-to-Friend (F2F), which is a particular type of anonymous P2P (Peer-to-Peer) in which people are directly connected to their “friends”. F2F only allows its friends to exchange data directly using IP addresses or digital signatures you trust. In SMILE, there are two types of friends, “service provider” and “client”. The service provider sends data to the “client” and the “client” is provided the data. In this case, the service provider should control shared data to its clients only. In order to permit clients, the administrator of the service provider should set the ID and password of client and finally determine which data is to be shared. The ID and password of the service provider also can be controlled and set on the client side. Throughout this procedure, the data sharing is performed between two or more SMILEs in terms of the service provider and its client’s concepts. Figure 14 shows the data sharing procedure control within SMILEs with “friends” relations. Figure 14 shows one service provider, SMILE1, and two clients, SMILE2 and SMILE3. There are “friend” relations between two pairs: SMILE1 and SMILE2, SMILE1 and SMILE3. SMILE1 provides Prj.1(Project 1) to SMILE2, so SMILE2 can access all the data within Prj.1. We know that SMILE1 only provides Work2 to SMILE3, while SMILE3 can access only data within Work2. Therefore, SMILE2 can access the Work1 data, while SMILE3 cannot touch Work1.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 14: The data sharing mechanism in SMILEs: Data sharing is enabled within SMILEs only if a friend relation is established. One SMILE which provides data is called a “service provider” and another SMILE which is provided data is called a “client”. The SMILE1 provides Prj.1(Project 1) to SMILE2, so SMILE2 can access all the data within Prj.1. However, SMILE1 provides only a Work2 to SMILE3, so SMILE3 can access only data within Work2.
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
6 Integration with other tools SMILE can be integrated to ArrayMall [5] and GENAW(GEnetic Network Analysis Workbench) [6]. Figure 15 shows the structure of SMILE integrated with ArrayMall and GENAW. SMILE is connected with ArrayMall through the backup file of the microarray project, and is also connected with GENAW through metafile, which has a unified file format.
Figure 15: Integration of SMILE for other bioinformatics tools
• ArrayMall ArrayMall is a stand-alone system for managing and analyzing a microarray image file. ArrayMall consists of four major parts: a) DataShop for managing microarray data, b) ArrayShop for microarray image analysis, c) NormalShop for normalizing microarray data and d) the last ExpreView for analyzing gene expression patterns. • GENAW(GEnetic Network Analysis Workbench) Genetic network analysis is important to study the behavior of genes in a holistic rather than in an individual manner because the expression and activities of genes are not isolated or independent of each other. GENAW is a system for constructing a genetic network with microarray expression data. We can make input files of GENAW to generate metafiles on SMILE. After preprocessing, GENAW produces a gene regulatory network based on the Boolean network model [11], Bayesian network model and Differential equations model. A gene network model obtained by GENAW is visualized in Figure 16.
7 Conclusion There is various and vast information that identify biological meaning from microarray experiments. And collaboration experiments with microarray have been widely conducted recently. In this paper, we have designed and implemented SMILE (Small and solid MIcroarray Lims for Experimenters) as LIMS especially for microarray experiments. The main features of the SMILE system are as follows:
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
Figure 16: Snapshot of the GENAW: GENAW constructs a genetic network from microarray expression data.
• SMILE is developed especially for shared data management. The data sharing function of SMILE is based on Friend-to-Friend (F2F). For this, in SMILE, there are two types of friends: “service provider” which provides data and “client”. • In order to manage and share various microarray data, all microarray data are managed with a hierarchical architecture in SMILE. • SMILE provides several features for microarray analysis and data visualization, which give insights to users. • SMILE classifies four user levels: administrator, project manager, researcher and visitor. And each group is only permitted for functions according to their level. • SMILE can be easily integrated with the previously developed tools, including ArrayMall and GENAW.
Acknowledgments This research was supported by a grant (M10529000001-06N2900-00110) from the Korea Science and Engineering Foundation.
References [1] Weinberg, Spelton, and Sax,Inc.: GALP Regulatory Handbook. Inc. CRC Press (1994)
Journal of Integrative Bioinformatics 2007
http://journal.imbio.de/
[2] J.Duggan, D., Bittner, M., Chen, Y., Meltzer, P. and M.Trent, J.: Expression profiling using cDNA microarray. Nature genetics 21 (1999) 10-14, 1999. [3] Legrain, P. and Selig, L.: Navigating gene expression using microarrays - a technology review. Nature Cell Biology 3 (2000) E:190-E195 [4] Shalon, D., S.J. Smith and P.O. Brown.: DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Research 6 (1996) 639-645 [5] Chun, B.k., Lee, P.Y., Jin, H.J., Jun, M.J., Yoon, J.H., Jang, C.J., Lee, K.S., Kim, H.J. and Cho, H.G.: ToMAS : software development Toolkits fot Microarray Analysis System. Proc. of ISMB, Poster (2004) [6] Kim, J.H., Lee, K.S., Kim, P.G., Cho, H.G.: Development of An Integrated System for Genetic Network Analysis and Microarray Data Management. Proc. of GIW, Poster (2003) [7] Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C.: Bioarray software enviroment: A platform for comprehensive management and analusis of microarray data. Genome Biology 3 (2002) 0003.1-0003.6 [8] Acuity Microarray Analysis, http://www.moleculardevices.com
Visualization
and
Database
Software,
[9] Comander, J., Weber, G.M., Gimbrone, J.M.A. and Carcia-Cardena, G.: Argus-A New Database System for Web-Baseed Analysis of Multiple Microarray Data Sets. Genome Research 11 (2001) 1603-1610 [10] BioConductor, http://www.bioconductor.org/ [11] H. D. Jong. : Modeling and simulation of genetic regulatory systems a literature review. Journal of Computational Biology, 9(1) (2002) 67-103