BIOINFORMATICS APPLICATIONS NOTE
Vol. 25 no. 24 2009, pages 3323–3324 doi:10.1093/bioinformatics/btp577
Gene expression
Exon Array Analyzer: a web interface for Affymetrix exon array analysis
Pascal Gellert, Shizuka Uchida and Thomas Braun∗
Max-Planck-Institute for Heart and Lung Research, Parkstrasse 1, 61231 Bad Nauheim, Germany
Received on July 27, 2009; revised on September 4, 2009; accepted on September 28, 2009
Advance Access publication October 12, 2009
Associate Editor: Trey Ideker
1 INTRODUCTION
Alternative splicing is a post-transcriptional mechanism that increases the number of functionally distinct proteins by excluding or including exons. It was predicted that >70% of human genes undergo alternative splicing (Johnson et al., 2003), and more recent reports estimate this number to be >90% (Wang and Cooper, 2007). Conventional 3′ microarrays are only suitable for analyzing expression levels of entire genes, whereas GeneChip Exon 1.0 ST Arrays (exon arrays) are able to detect expression of individual putative exonic regions. The analysis of exon arrays, however, is more difficult and requires manual inspection of each potential candidate. The Exon Array Analyzer (EAA) is an all-in-one tool for analyzing exon arrays, which includes preprocessing of raw CEL files, detection of alternatively spliced exons and a genome viewer. Furthermore, our web tool displays sequence information to facilitate primer design and includes extensive filters, which are not offered by other applications. Although other tools such as Bioconductor allow more flexible control of the analysis, considerable computer skills are needed to exploit these features, which limits their use for the average biologist. Moreover, tools such as R, X:map and oneChannelGUI require an expensive setup and advanced programming skills for a flexible analysis, which is not the case for the EAA. Others, such as ExonMiner and easyExon, are free and user-friendly tools but do not provide sequence information and are unable to filter potential false positive results using advanced settings. We reason that its unique features, together with its simple interface and ease of use, make the EAA a valuable and versatile tool for the analysis of exon arrays.
∗ To whom correspondence should be addressed.
2 IMPLEMENTATION
The EAA is a web interface written in PHP5 that uses a MySQL database. The tool runs on a server with Linux and Apache2. Because the analyses, especially the preprocessing step, are computationally demanding, the EAA runs on a 2 GHz dual-core processor with 4 GB RAM. An analysis (‘core’ exon and gene set, default settings) takes ∼3 h. A queue ensures that only a limited number of analyses are executed at the same time, so that enough computational power is left for displaying results. The workflow of a typical analysis is shown in Figure 1. First, after the user has uploaded the binary CEL files to the server, preprocessing is carried out by the Affymetrix Power Tools (http://www.affymetrix.com). These command line tools are responsible for background correction, normalization and summarization of raw signals. They implement RMA (Irizarry et al., 2003) and Iter-PLIER (Affymetrix Inc., 2005) for exon- and gene-level processing as well as the Detection Above Background (DABG) method. DABG calculates a P-value for each probe set signal, which can be used as an absence/presence call by the filters in the next steps. The preprocessed results are stored on the server and can be downloaded as flat files after the analysis is completed. In the next step, up to five different filters are applied to reduce the number of false positive results. Afterwards, gene-level normalized intensities and the Splice Index (Clark et al., 2002) are calculated. All filters and the Splice Index calculation are implemented as Perl scripts. To obtain statistical evidence, a Student’s t-test on the gene-level normalized intensities is calculated and corrected for multiple comparisons (Benjamini and Hochberg, 1995) using R (www.R-project.org). The Splice Index, gene-level normalized intensities and P-values are imported into MySQL tables, which are used by the web interface to display results and generate graphics.
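To make the calculation step above more concrete, the following Python sketch shows one common formulation of the gene-level normalized intensity, the Splice Index and the Benjamini-Hochberg correction. It is an illustration only and not the EAA code, which is implemented in Perl and R. The function names are hypothetical, and the sketch assumes that probe-set and gene-level signals are already on the log2 scale, so that dividing raw signals becomes a subtraction.

```python
from statistics import mean


def normalized_intensities(exon_log2, gene_log2):
    """Gene-level normalized intensity per array, on the log2 scale.

    Assumption: both inputs are log2 signals, one value per array, so
    subtracting the gene signal equals dividing the raw signals.
    """
    if len(exon_log2) != len(gene_log2):
        raise ValueError("need one gene-level value per probe-set value")
    return [e - g for e, g in zip(exon_log2, gene_log2)]


def splice_index(ni_group_a, ni_group_b):
    """Splice Index: difference of the mean log2 normalized intensities."""
    return mean(ni_group_a) - mean(ni_group_b)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that the adjusted values
    # never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical log2 signals: one probe set and its gene, three arrays per group.
    heart = normalized_intensities([9.0, 9.2, 9.1], [8.0, 8.1, 8.0])
    other = normalized_intensities([7.0, 7.1, 6.9], [8.0, 8.0, 8.1])
    print("Splice Index:", splice_index(heart, other))
    print("Adjusted P-values:", benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A positive Splice Index in this formulation indicates that the probe set is relatively more included in the first group than in the second, independent of the overall expression level of the gene.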
A more detailed description of the implementation can be found on the EAA web site. The uploaded data and generated results are protected by a password, which is set during the first step, so that all data are protected against unauthorized access. The CEL files themselves are deleted immediately after the analyses have finished. All other files, including MySQL tables, are deleted 72 h after the processing has finished. The user can extend this period in the options menu as often as required.
© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
Downloaded from http://bioinformatics.oxfordjournals.org/ at Max Planck Institut on March 2, 2013
ABSTRACT
Summary: The Exon Array Analyzer (EAA) is a web server that provides a user-friendly interface to identify alternative splicing events using Affymetrix Exon Arrays. The EAA implements the Splice Index algorithm to identify differentially expressed exons. The use of various filters allows reduction of the number of false positive hits. Results are presented with detailed annotation information and graphics to identify splice events and to facilitate biological validations. To demonstrate the versatility of the EAA, we analyzed exon arrays of 11 different murine tissues using sample data provided by Affymetrix (http://www.affymetrix.com). Data from the heart were compared with other tissues to identify exons that undergo heart-specific alternative splicing, resulting in the identification of 885 differentially expressed probe sets in 649 genes.
Availability: The web interface is available at http://EAA.mpi-bn.mpg.de/. Detailed documentation is available on the EAA web site (http://EAA.mpi-bn.mpg.de/supp.php) including screen shots, example analyses and step-by-step instructions.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Fig. 1. Workflow of the EAA. Once the CEL files are uploaded to the server, preprocessing is accomplished by the Affymetrix Power Tools (red boxes). Afterwards, potential false positive results are filtered, and gene level normalized intensities, Splice Index and P-values are calculated (green boxes). Results are imported into a database and can be queried by the user.
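As an illustration of how DABG P-values might serve as absence/presence calls in the filtering step of the workflow, the Python sketch below keeps a probe set only if it is called present in a sufficient fraction of the arrays of at least one group. The text does not specify the actual rules or defaults of the EAA filters, so the function names, the significance level `alpha` and the `min_fraction` cutoff are all assumptions made for this example.

```python
def present_fraction(dabg_p_values, alpha=0.05):
    """Fraction of arrays in which a probe set is called present.

    Assumption: a probe set counts as present on an array when its
    DABG P-value is below alpha (a hypothetical default).
    """
    if not dabg_p_values:
        raise ValueError("no DABG P-values supplied")
    return sum(p < alpha for p in dabg_p_values) / len(dabg_p_values)


def passes_presence_filter(group_a, group_b, alpha=0.05, min_fraction=0.5):
    """Keep a probe set if it is present in enough arrays of either group."""
    return (present_fraction(group_a, alpha) >= min_fraction
            or present_fraction(group_b, alpha) >= min_fraction)


if __name__ == "__main__":
    # Hypothetical DABG P-values for one probe set, three arrays per group.
    print(passes_presence_filter([0.001, 0.2, 0.01], [0.5, 0.6, 0.7]))
```

Requiring presence in only one of the two groups is a deliberate choice in this sketch: an exon that is expressed in one condition and absent in the other is exactly the kind of candidate a splicing analysis should retain.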
3 PROGRAM FEATURES
The EAA enables analysis of exon arrays in three simple steps without any knowledge of programming or algorithms. Detailed step-by-step instructions on how to set up an analysis can be found in the EAA help section. One main feature of the program is its filters, which can be set using different parameters. Detection of non-existing splice events might be due to the lack of gene expression, absence of expression of individual exons in different analyzed conditions and cross hybridization of labeled material as described in the Affymetrix Technical Note. Up to five different filters can be selected, and their stringency can be changed according to the user's needs. Results of the analyses can be filtered by thresholds for Splice Index and P-value. For each gene, several annotations are provided. For further information about the genes and their isoforms, links to NCBI, ENSEMBL, AceView (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly) and ASTD (http://www.ebi.ac.uk/astd) are provided. The EAA generates box plots of the gene-level normalized signals, which make it possible to visually inspect all expression signals of a gene at once. Exon arrays can only reveal splice events if exons are differentially expressed; unlike exon junction arrays, they do not include probe sets that detect splice events directly. To identify more complex splice events such as exon inclusion or exclusion, the interface provides an additional output graphic. It enables the user to compare detected sequences with isoforms from the ENSEMBL database and thus with known splice events. The probe sets of a gene are shown in their genomic position beside the exons of the known transcripts of the gene. Probe sets with high or low Splice Index are colored for fast and easy identification. The graphics are zoomable and, for genes with many exons, can be enlarged by clicking on them. Comparison to known splice events makes it possible to identify internal starts or stops and mutually exclusive exons, which otherwise might lead to false negative results during RT-PCR validation. To simplify the primer design process, the web interface implements a unique feature that provides sequence information of exons and probe sets. A click on a probe set in the ENSEMBL comparison graphic opens a new window, which displays the sequence of the selected exon together with the sequences of the surrounding exons. This output can be directly imported into the Primer3 (http://frodo.wi.mit.edu/primer3) web interface to allow identification of suitable primers around the selected exon. A brief overview of the main features compared to other currently available tools can be found in Supplementary Table 2.
4 EXAMPLE ANALYSES
We analyzed exon arrays of 11 different murine tissues using sample data provided by Affymetrix (http://www.affymetrix.com). Data from the heart were compared with other tissues to identify exons that undergo heart-specific alternative splicing, resulting in the identification of 885 differentially expressed probe sets (P < 0.01 and Splice Index greater than 1 or less than −1) in 649 genes (Supplementary Table 1). In addition, we used our Database-dependent Gene Selection and Analysis approach (Uchida et al., 2009) to select heart-enriched genes within the EAA result. We manually selected several genes for validation by RT-PCR using cDNA from 15 different murine tissues. Supplementary Figure 1 shows eight validated splice events as predicted by the EAA. Along with other examples, the results of the analyses are available via the web interface.
Funding: Max-Planck-Society; the DFG (Br1416); the EU Commission (MYORES network of excellence); the Kerckhoff-Foundation and the Excellence Initiative ‘Cardiopulmonary System’ (T.B.).
Conflict of Interest: none declared.
REFERENCES
Affymetrix Inc. (2005) Guide to probe logarithmic intensity error (PLIER) estimation. Available at http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf
Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.), 57, 289–300.
Clark,T.A. et al. (2002) Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science, 296, 907–910.
Irizarry,R.A. et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostat., 4, 249–264.
Johnson,J.M. et al. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.
Uchida,S. et al. (2009) An integrated approach for the systematic identification and characterization of heart-enriched genes with unknown functions. BMC Genomics, 10, 100. PMC2657154.
Wang,G. and Cooper,T.A. (2007) Splicing in disease: disruption of the splicing code and the decoding machinery. Nat. Rev. Genet., 8, 749–761.