Neurogenetics (2003) 4:163–171 DOI 10.1007/s10048-003-0148-x

ORIGINAL ARTICLE

Judith N. Haslett · Despina Sanoudou · Alvin T. Kho · Mei Han · Richard R. Bennett · Isaac S. Kohane · Alan H. Beggs · Louis M. Kunkel

Gene expression profiling of Duchenne muscular dystrophy skeletal muscle

Received: 27 January 2003 / Accepted: 3 March 2003 / Published online: 16 April 2003
© Springer-Verlag 2003

Abstract The primary cause of Duchenne muscular dystrophy (DMD) is a mutation in the dystrophin gene, leading to absence of the corresponding protein, disruption of the dystrophin-associated protein complex, and substantial changes in skeletal muscle pathology. Although the primary defect is known and the histological pathology well documented, the underlying molecular pathways remain in question. To clarify these pathways, we used expression microarrays to compare individual gene expression profiles for skeletal muscle biopsies from DMD patients and unaffected controls. We have previously published expression data for the 12,500 known genes and full-length expressed sequence tags (ESTs) on the Affymetrix HG-U95Av2 chips. Here we present comparative expression analysis of the 50,000 EST clusters represented on the remainder of the Affymetrix HG-U95 set. Individual expression profiles were generated for biopsies from 10 DMD patients and 10 unaffected control patients. Two methods of statistical analysis were used to interpret the resulting data (t-test analysis to determine the statistical significance of differential expression and geometric fold change analysis to determine the extent of differential expression). These analyses identified 183 probe sets (59 of which represent known genes) that differ significantly in expression level between unaffected and disease muscle. This study adds to our knowledge of the molecular pathways that are altered in the dystrophic state. In particular, it suggests that signaling pathways might be substantially involved in the disease process. It also highlights a large number of unknown genes whose expression is altered and whose identity therefore becomes important in understanding the pathogenesis of muscular dystrophy.

Electronic Supplementary Material Supplementary material is available for this article if you access the article at http://dx.doi.org/10.1007/s10048-003-0148-x. A link in the frame on the left on that page takes you directly to the supplementary material.

Keywords Duchenne muscular dystrophy · Expression oligonucleotide microarrays · Pathogenesis

J. N. Haslett · D. Sanoudou · A. H. Beggs · L. M. Kunkel
Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA

D. Sanoudou · M. Han · R. R. Bennett · A. H. Beggs · L. M. Kunkel
Division of Genetics, Children’s Hospital, Boston, Massachusetts, USA

A. T. Kho · I. S. Kohane
Children’s Hospital Informatics Program, Boston, Massachusetts, USA

I. S. Kohane
Division of Endocrinology, Children’s Hospital, Boston, Massachusetts, USA

L. M. Kunkel
Howard Hughes Medical Institute, Boston, USA

L. M. Kunkel (✉)
Division of Genetics, Children’s Hospital, 320 Longwood Avenue, Boston, MA 02115, USA
e-mail: [email protected]
Tel.: +1-617-3557576
Fax: +1-617-2770496

Introduction

The muscular dystrophies are a group of clinically and genetically heterogeneous disorders characterized by progressive degenerative changes in skeletal muscle fibers. In Duchenne muscular dystrophy (DMD), the most common form of muscular dystrophy, the muscle shows fiber size variation, fiber necrosis and regeneration, centralization of nuclei, proliferation of connective and adipose tissues, and infiltration of immune cells, amongst other processes [1]. DMD is caused by mutations in the dystrophin gene, often leading to reduction of mRNA transcript levels and invariably to absence of the protein. Dystrophin is a large cytoskeletal protein associated with a large complex (known as the dystrophin-associated protein complex or DAPC) that links the cytoskeleton to the extracellular matrix. This complex is disrupted in many forms of muscular dystrophy, and mutations in genes encoding some DAPC components other than dystrophin are associated with other limb-girdle muscular dystrophies (LGMDs) [2, 3].

The DAPC spans the sarcolemma of skeletal muscle fibers, forming a structural link between the extracellular matrix and the cytoskeleton. The DAPC has long been thought to stabilize the membrane against contraction-induced damage. Disruption of the DAPC in DMD is proposed to break a mechanical linkage crucial for sarcolemmal integrity, leading to sarcolemmal disruption. This in turn causes elevation of intracellular free calcium, triggering calcium-activated proteases and fiber necrosis [4]. However, the DAPC has also been implicated in signaling [5], and mutations in non-DAPC protein-encoding genes have been shown to cause the muscular dystrophy phenotype, contributing to the idea that more than one molecular pathway is implicated in the disease [6, 7]. Thus, despite knowledge of the primary genetic defects and a well-documented histological pathology, the molecular pathology leading to muscle degeneration in the muscular dystrophies is poorly understood.

Large-scale gene expression profiling across diseased and unaffected states allows the molecular pathophysiological pathways to be examined in a physiological context, perhaps illuminating a previously unsuspected molecular pathway. This technique has been used in a number of recent publications in which dystrophic muscle has been compared with unaffected muscle. Most of these studies have used a mouse model of muscular dystrophy [8, 9, 10, 11, 12], but comparisons of gene expression differences in DMD patients have also been published [13, 14, 15].

As this is a developing field, for which standard experimental procedures have not yet been established, and in which approaches to data analysis are constantly evolving, each of these studies applies a different approach and generates a unique, although partially overlapping, list of genes whose expression is altered. The significance of these differences in approach and interpretation and the optimum experimental and analytical design for microarray projects are questions that have yet to be fully addressed.

The study presented here is an extension of previous work, in which we reported differential expression in DMD and unaffected skeletal muscle biopsies as measured on the Affymetrix HG-U95Av2 chips [which represent 12,500 known genes and full-length expressed sequence tags (ESTs)] [15]. Here we extend this approach to analyze the expression of 50,000 additional EST clusters, represented on the remainder of the HG-U95 GeneChip set. In total, we have identified almost 300 probe sets from the HG-U95 GeneChip set that are differentially expressed in DMD patients compared with unaffected control patients; 110 of these differentially expressed probe sets are on the HG-U95Av2 chip and an additional 183 probe sets were identified from the remainder of the HG-U95 GeneChip set. This contrasts with the recently reported study of Bakay et al. [13] in which almost 1,500 U95 GeneChip set probe sets were identified as differentially expressed in DMD skeletal muscle [14]. Although it is possible that this difference is due to the patients used in each of the studies, it seems more likely that it is due to experimental and analytical differences between the two studies.

Materials and methods

Patient samples

Ten quadriceps biopsies from DMD patients were compared with 10 quadriceps biopsies from unaffected controls. The 10 DMD biopsies were from young (5–7 years old) males. The unaffected biopsies were primarily from young (1–10 years old) males (7 biopsies), but included 3 biopsies from adult males. The unaffected controls were biopsied because of suspected muscle weakness, but showed no signs of pathology upon histological examination and expressed normal levels of dystrophin. The DMD patients showed clinical symptoms consistent with a DMD diagnosis and the biopsies were shown to be dystrophin deficient by immunofluorescence and/or western blotting [15]. All biopsies were obtained under institutionally approved protocols.

Target preparation and array hybridization

Total RNA was extracted from muscle biopsies (70–120 mg) using Trizol (Invitrogen Life Technologies). Following resuspension in DEPC (diethyl pyrocarbonate)-treated water, RNA was prepared for hybridization to Affymetrix GeneChips according to the manufacturer's instructions. Double-stranded cDNA was synthesized (Superscript Double-Stranded cDNA Synthesis kit, Invitrogen Life Technologies) and used in an in vitro transcription (IVT) reaction with biotin-labeled nucleotides (Enzo BioArray High Yield RNA Transcript Labeling kit, Affymetrix). Purified (RNeasy kit, Qiagen), fragmented (35–200 nucleotides) biotinylated cRNA, together with IVT controls (according to Affymetrix recommendations), was hybridized to HG-U95B, HG-U95C, HG-U95D, and HG-U95E GeneChips for 16–18 h at 45 °C. An Affymetrix Fluidics Station 400 was used to perform standard post-hybridization washing and double-staining protocols. The GeneChips were scanned in an Affymetrix/Hewlett-Packard G2500A Gene Array Scanner and the resulting signals quantified and stored.

Data processing

Affymetrix GeneChip version 5.0 software (MAS5.0) was used for raw data processing.
Custom software was developed for additional noise analysis and quality control as previously described (http://db.chip.org) [15]. Duplicate experiments were performed periodically for reproducibility assessment. Correlation coefficients (r) between signals from replicate arrays were typically about 0.99 when hybridized with the same target and about 0.97 when hybridized with independently isolated and labeled RNA samples. The raw data sets are available on the Harvard Neuromuscular Disease Project web site (http://tch-genomics.org/hndp/publications_index.php).

Statistical analysis

Affymetrix GeneChip version 5.0 software (MAS5.0) was used for initial data processing; signal values, the Affymetrix computed measure of expression levels, and detection calls, the Affymetrix computed measure representing confidence in gene expression presence, were determined for each probe set. A Present detection call indicates that the particular transcript interrogated by that probe set is present in sufficient quantities to be detected and quantified accurately. The signal values were not normalized with respect to one another at this point, although they were scaled to give an overall average target intensity of 1,500. The signal intensities were normalized, prior to any analysis, via a linear regression technique [16, 17, 18]. With this method the signal intensities in each experiment are transformed so that their scatter plots have a slope of 1 through the origin with respect to a reference data set. The reference data set comprises the average probe-by-probe signal intensities from an unaffected control experiment and a DMD experiment that have maximal average correlation coefficients against all other experiments within their disease class. Normalization corrects any uniform linear aberrations of the reported signal intensities between any two replicate measurements.

In order to identify probe sets with significant intensity differences between disease classes, a standard two-tail unequal variance t-test was applied to the unaffected control data set versus the DMD data set [19]. The threshold was set at P8 data sets per group), appropriate experimental approach, and stringent statistical analysis [15]. Two statistical methods were applied in order to distinguish significant and substantial differential expression from noise and variation due to either genetic heterogeneity or experimental idiosyncrasies.

Method 1: differences

A standard two-tail unequal variance t-test was applied to the two data sets, the threshold set at P
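The normalization and the unequal variance t-test described under Statistical analysis can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes that the regression is an ordinary least-squares fit of each experiment against the reference, forced through the origin, and that the t-test is computed in the usual Welch form; the function names and the toy intensities are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def slope_through_origin(reference, signals):
    """Least-squares slope of signals against reference for a line through the origin."""
    num = sum(r * s for r, s in zip(reference, signals))
    den = sum(r * r for r in reference)
    return num / den


def normalize_to_reference(reference, signals):
    """Rescale signals so that their fit against the reference has slope 1."""
    b = slope_through_origin(reference, signals)
    return [s / b for s in signals]


def welch_t(group_a, group_b):
    """Unequal variance (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error of the mean, group A
    vb = variance(group_b) / nb  # squared standard error of the mean, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical toy values for one array against a reference (four probe sets).
reference = [100.0, 200.0, 400.0, 800.0]
experiment = [210.0, 390.0, 820.0, 1580.0]
normalized = normalize_to_reference(reference, experiment)

# Hypothetical toy values for one probe set across five arrays per class.
control = [1500.0, 1620.0, 1480.0, 1550.0, 1590.0]
dmd = [2900.0, 3300.0, 2500.0, 3600.0, 3100.0]
t_stat, dof = welch_t(control, dmd)
```

The two-tail P value follows from the t distribution with the computed degrees of freedom; where SciPy is available, `scipy.stats.ttest_ind(control, dmd, equal_var=False)` returns the same statistic together with that P value.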