Combining RNA and Protein Profiling Data with Network Interactions ...

2 downloads 55988 Views 6MB Size Report
Jan 21, 2015 - Downloaded from www .biolreprod.org. ... decapsulated and fixed for 10 min in 1% formaldehyde .... was edited using Illustrator 15.0.0 (Adobe).
BIOLOGY OF REPRODUCTION (2015) 92(3):71, 1–18 Published online before print 21 January 2015. DOI 10.1095/biolreprod.114.126250

Combining RNA and Protein Profiling Data with Network Interactions Identifies Genes Associated with Spermatogenesis in Mouse and Human1 Fabrice G. Petit,3,4 Christine Kervarrec,3,4 Soazik P. Jamin,3,4 Fatima Smagulova,4 Chunxiang Hao,4 Emmanuelle Becker,4 Bernard Je´gou,4,5 Fre´de´ric Chalmel,4 and Michael Primig2 4 5

Inserm U1085-IRSET, Universite´ de Rennes 1, Rennes, France EHESP-School of Public Health, Rennes, France

Genome-wide RNA profiling studies have identified hundreds of transcripts that are highly expressed in mammalian male germ cells, including many that are undetectable in somatic control tissues. Among them, genes important for spermatogenesis are significantly enriched. Information about mRNAs and their cognate proteins facilitates the identification of novel conserved target genes for functional studies in the mouse. By inspecting genome-wide RNA profiling data, we manually selected 81 genes for which RNA is detected almost exclusively in the human male germline and, in most cases, in rodent testicular germ cells. We observed corresponding mRNA/protein patterns in 43 cases using immunohistochemical data from the Human Protein Atlas and large-scale human protein profiling data obtained via mass spectroscopy. Protein network information enabled us to establish an interaction map of 38 proteins that points to potentially important testicular roles for some of them. We further characterized six candidate genes at the protein level in the mouse. We conclude that conserved genes induced in testis tend to show similar mRNA/ protein expression patterns across species. Specifically, our results suggest roles during embryogenesis and adult spermatogenesis for Foxr1 and Sox30 and during spermiogenesis and fertility for Fam71b, 1700019N19Rik, Hmgb4, and Zfp597. epigenetics, gene expression, genomics, interactome, male fertility, microarray, proteome, RNA-Seq, spermatogenesis

INTRODUCTION Meiosis and gametogenesis is a highly conserved developmental pathway critical for sexually reproducing organisms. Genetics, molecular biology, and functional genomics experiments carried out with yeast, worm, and mouse model organisms have yielded information about the roles that conserved protein-coding genes play during meiotic development in males and females [1–6]. It is noteworthy that more genes encoded by mammalian genomes appear to be expressed in gonads than in most somatic tissues, including many loci that seem to exert their 1 This work was supported by Inserm Avenir grant R07216NS and Region de Bretagne CREATE grant NR11016NN to M.P. and by funds from Inserm, Universite´ de Rennes 1, and EHESP-School of Public Health. 2 Correspondence: Michael Primig, Inserm U1085 IRSET, Universite´ de Rennes 1, F-35042, Rennes, France. E-mail: [email protected] 3 These authors contributed equally to this work.

Received: 28 October 2014. First decision: 1 December 2014. Accepted: 13 January 2015. Ó 2015 by the Society for the Study of Reproduction, Inc. eISSN: 1529-7268 http://www.biolreprod.org ISSN: 0006-3363

1

Article 71

Downloaded from www.biolreprod.org.

roles exclusively in the germline [7–10]. RNA profiling analyses of human samples have recently been complemented by high-throughput protein profiling studies of human embryonic samples and adult tissues based on mass spectrometry (www.humanproteomemap.org) [11] and large-scale protein localization assays using immunohistochemistry (IHC) and immunofluorescence (IF) (Human Protein Atlas [HPA]; www.proteinatlas.org) [12]. Data on the patterns of mRNA and protein detection are completed by a plethora of epigenetic marks associated with the regulation of gene expression, including the activation mark histone H3 lysine 4 trimethylation (H3K4me3), which have been identified in epigenomics experiments using chromatin immunoprecipitation and DNA sequencing (ChIP-Seq) [13–15]. While functional genomics—that is, the large-scale analysis of gene function by deleting genes or via RNA interference—is feasible in simple organisms, it remains a challenge in mammals, where high-throughput functional data are typically obtained in cultured cell lines [16]. Therefore, genome-wide RNA expression studies using microarrays, RNA sequencing (RNA-Seq), and protein profiling experiments based on mass spectrometry have been employed to associate genes with biological processes during various developmental stages in distinct tissues and organs [17–19]. Moreover, protein network data obtained with assays detecting direct binding (yeast twohybrid system) or indirect interactions (coimmunoprecipitation) help identify genes relevant for growth, development, and disease [20–22]. Such genomics experiments set the stage for further molecular and cell biological work to determine the roles of individual loci. Budding yeast was the first organism for which genomics yielded a comprehensive picture of gene function during growth and development [23]. The mouse field is now entering this phase by establishing the International Mouse Phenotyping Consortium (IMPC), which aims to create knockout strains for all known mouse genes over the course of several years [24]. The experimental approach is to first inactivate a target gene in all cells and then, whenever it is necessary, to delete it in specific tissues using the Cre recombinase (Cre-Lox system) [25, 26]. The production of transgenic mice is followed by an array of phenotypic assays that broadly cover a wide range of anatomical, morphological, and physiological features, including male and female fertility [27]. However, elaborate mechanistic studies and detailed analyses of more or less subtle effects or unexpected phenotypes will have to be carried out by members of the respective fields. In the present study, we integrate data on DNA, RNA, proteins, and interaction networks to identify novel proteincoding genes likely to be important for male germline development and fertility. In addition, we find that an epigenetic activation mark and peak concentrations of mRNAs in germ cells correspond to the presence of their cognate

ABSTRACT

PETIT ET AL. The complete data set will be published elsewhere (Hao and Smagulova, unpublished data).

proteins in mouse and human testicular sections in the cases of six selected target genes. Our data facilitate efforts to gain insight regarding the genetics of male gamete formation and gamete function and therefore may help find molecular causes for certain cases of unexplained male infertility in humans [28].

RNA-Seq Expression Data Processing We obtained RNA-Seq data (not DNA strand specific) generated with a HiSeq 2000 system (Illumina) and enriched testicular cells purified from wildtype mice (F1 of a cross between DBA/2NCrlVr females and 129S2/ SvPasCrlVr males) [31]. For each cell type, DNA sequence reads were mapped to the mouse genome (version mm9) using TopHat 2.0 set at default parameters [32]. Reads were assembled, and differential expression analysis was performed using Cufflinks [33]. For each transcript, normalized values for fragments per kilobase per million reads were extracted. To identify transcripts specifically induced in round spermatids and elongated spermatids, we first selected the cases for which we obtained values greater than the 20th quantile of all values in at least one condition, and then we filtered transcripts showing more than a 3-fold change between two conditions. Finally, we employed a statistical test implemented in the R package limma with the false-discovery rate set at less than 5% [34]. RNA-Seq data were visualized using the Integrated Genomics Viewer (version 2.3.32) [35] and scripts written in R (www. R-project.org/) [36].

MATERIALS AND METHODS Ethics Statement Regarding human samples, the local ethics committee approved the experimental protocol ‘‘Study of Normal and Pathological Human Spermatogenesis’’ (registration PFS09-015) at the French Biomedicine Agency; informed consent was obtained from all donors. Studies involving animals, including housing and care, method of euthanasia, and experimental protocols, were conducted in accordance with the recommendations of the French Accreditation of Laboratory Animal Care. We note that current French legislation (regulation 2013-118, article R.214-89) stipulates that euthanizing laboratory animals for the sole purpose of investigating their tissues does not require prior authorization by an ethics committee. The French Ministry of Agriculture licensed the animal facility (agreement C35-238-19). Animals were killed using CO2, whereby all efforts were made to minimize their suffering.

HPA Data Integration To validate RNA profiles, the HPA (proteinatlas.org) [12] was used as previously described [7]. Briefly, data from IHC assays for normal tissues were retrieved from www.proteinatlas.org/about/download (version 12.0). To integrate IHC and GeneChip data, Ensembl gene identifiers provided by the HPA were first converted into National Center for Biotechnology Information Entrez Gene identifiers and then U133 Plus 2.0 probe set identifiers using Annotation, Mapping, Expression and Network (AMEN) [37]. To combine GeneChip data across species, probe sets for mouse and rat transcripts were linked to the corresponding human probe sets via HomoloGene identifiers as previously published [8, 38].

Publically available protein network data and expression data obtained with Affymetrix GeneChips for human (U133 Plus 2.0), mouse (Mouse Genome 430 2.0), and rat (Rat Genome 230 2.0) samples were processed and visualized as previously described [7, 8].

ChIP-Seq Data Production Testes from a 3-mo-old C57Bl/6JRj mouse (Janvier Labs) were decapsulated and fixed for 10 min in 1% formaldehyde (Sigma-Aldrich). To prepare cell nuclei, the fixed tissue was homogenized, filtered through a 40-lm Falcon Cell Strainer (Fisher Scientific), and washed twice for 5 min in 13 PBS and once each in 0.25% Triton X-100, 10 mM ethylenediaminetetra-acetic acid (EDTA), 0.5 mM ethyleneglycoltetra-acetic acid (EGTA), and 10 mM Tris (pH 8) and in 0.2 M NaCl, 1 mM EDTA, 0.5 mM EGTA, and 10 mM Tris (pH 8). Nuclei were lysed in 1.5 ml of lysis buffer (1% SDS, 10 mM EDTA, and 50 mM Tris [pH 8]) and Complete Protein Inhibitor Cocktail (Roche). The chromatin was sheared to fragments of approximately 500 bp by six rounds of sonication (10 sec on, 30 sec off) using a Branson Sonifier (Diagenode) set at medium strength. The sample was diluted and dialyzed for 4 h against chromatin immunoprecipitation buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris [pH 8], and 167 mM NaCl). As the input control, 1/ 2000 of the chromatin sample was kept. The remaining material was incubated with an anti-H3K4me3 antibody (Millipore) overnight at 48C with 50 ll of Dynabeads (Invitrogen). Beads were washed consecutively five times for 5 min each in three buffers—1) 20 mM Tris (pH 8), 150 mM NaCl, 2 mM EDTA, 0.1% SDS, and 1% Triton X-100; 2) 20 mM Tris (pH 8), 500 mM NaCl, 0.25 M LiCl, 1% IGEPAL CA-630 (Sigma-Aldrich), and 1 mM EDTA, and 3) 10 mM Tris (pH 8), 1% deoxycholic acid, 2 mM EDTA, 0.1% SDS, and 1% Triton X-100—and twice for 5 min each in 10 mM Tris (pH 8) and 1 mM EDTA. The chromatin was eluted in 1% SDS and 0.1 M NaHCO3 (pH 9) at 658C, and cross-linking was reversed by incubation at 658C for 5 h. Proteins were removed from DNA for 1 h at 458C using 40 lg of Proteinase K (New England Biolabs), and DNA was purified with a MinElute Reaction Cleanup Kit (Qiagen). The DNA concentration was measured with a Quantus Fluorometer (Promega) using the Quantiflor dsDNA system (Promega). The DNA sequencing library was prepared from 50 ng of immunoprecipitated material and input material using the Next Ultra DNA library Prep Kit for Illumina (New England Biolabs). Samples were run on a 2% agarose gel containing GelGreen stain (Gentaur), and 180- to 400-bp fragments were excised on a Dark Reader transilluminator (Clare Chemical Research). The HiSeq2000 System (Illumina) was used to perform RNA sequencing.

Human Proteome Data Visualization We uploaded 81 gene names to the online database of the Human Proteome Map project (www.humanproteomemap.org) [11] and retrieved a graphical display of available protein detection patterns across the sample set. The output was edited using Illustrator 15.0.0 (Adobe).

Isolation of Embryonic and Cellular Samples and RT-PCR GC1 (ATTC-CRL-2053) and GC2 (ATTC-CRL-2196) cells were cultured in Dulbecco Modified Eagle Medium (DMEM, Invitrogen) supplemented with 10% fetal bovine serum (FBS, Invitrogen). Cell were grown at 378C in 5% CO2. Fetal tissues were disrupted in RLT Plus buffer (Qiagen) using the Tissue Lyser II (Qiagen). Total RNAs were isolated using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. Reverse transcription was performed with the High-Capacity RNA-to-cDNA Kit (Life Technologies) using 125 ng of total RNA in a total volume of 20 ll. The samples were incubated for 1 h at 378C and 5 min at 958C to inactivate the enzyme. The primer pairs used are shown in Table 1. PCR was performed using HotStarTaq Plus DNA Polymerase (Qiagen) with 2 ll of cDNA in a total volume of 15 ll. The amplification conditions were 958C for 5 min, followed by 35 cycles of 948C for 1 min, 588C for 1 min, and 728C for 1 min, with a final extension at 728C for 10 min. The amplified PCR fragments were loaded on 2% agarose gels. The O’GeneRuler 1 kb DNA ladder (Fermentas) was used as a molecular weight marker.

Protein Detection by IF C57Bl6/6JRj mice were provided by Janvier Labs and used to isolate adult testes, ovaries, uterus, brain, kidney, liver, muscle, and mouse embryos (at Embryonic Day [E] 12.5). Samples were dissected in 13 PBS, fixed in 4% paraformaldehyde at 48C, washed with 13 PBS, dehydrated in graded ethanol (from 70% to 100%), embedded in paraffin, and cut into sections (thickness, 5 lm). Sections were deparaffinized using the aliphatic hydrocarbon-based reagent Ottix Plus (MM France Microm) and rehydrated in 100% ethanol, 95% ethanol, and distilled water for 5 min each. Antigen retrieval was performed in antigen-unmasking solution (Vector Laboratories) for 20 min at 908C. Specimens were saturated in blocking buffer (13 PBS, 5% normal serum, and 0.3% Triton X-100) for 60 min. Primary antibodies (SOX30, 1:200 [ab26024; Abcam]; FOXR1, 1:100 [ab74578; Abcam]; C10orf82, 1:50 [ab117935; Abcam]; HMGB4, 1:200 [SAB2104592; Sigma-Aldrich];

ChIP-Seq Data Processing Tags from duplicate mouse testicular samples were sequenced. Next, reads at a length of 50 bases were aligned to the mouse genome (version mm9) using Bowtie 1.0.0 [29]. Only tags that passed the quality filter and mapped only once to the genome were included in the subsequent steps of analysis. Peaks corresponding to H3K4me3 marks were identified using MACS 2.0.10 [30].

2

Article 71

Downloaded from www.biolreprod.org.

Microarray Expression Data and Protein Network Data Processing

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS TABLE 1. Oligonucleotides used for RT-PCR. Oligonucleotide name Q-m-HmgB4-374-s Q-m-HmgB4-476-a Q-m-Fam71B-676-s Q-m-Fam71B-804-a Q-m-Zfp597-2468-s Q-m-Zfp597-2595-a Q-m-FoxR1-403-s Q-m-FoxR1-532-a Q-m-1700019N19Rik-563-s Q-m-1700019N19Rik-693-a Q-m-Sox30-966-s Q-m-Sox30-1092-a

5 0 –3 0 Sequence

Bases

PCR product size

GAG-AAG-AAA-GAG-GGA-CCC-AAA-G CCA-GTC-GGG-GTT-CTC-TTG CGT-TGA-CAC-CAA-AAG-CAC-AC GGT-TGT-GTC-CTC-TCT-CCT-CCT TCC-AGA-TCC-CAC-TGA-AGC-A CAT-GGA-CAG-TCG-CAG-ATT-TTT GTG-CCT-CTT-CCC-CTA-GCA-C CCA-AGG-CAA-TCA-GAT-GGA-AG GGT-CTC-TCA-CCC-GCA-TTT-C TCC-CCA-TAC-ACA-TTC-CTG-CT CCC-TGA-CAC-CAG-TGC-CTA-TT TCC-TTA-CTG-AAG-GGA-GTA-TCA-GGT

22 18 20 21 19 21 19 20 19 20 20 24

103 bp

128 bp 130 bp 131 bp 127 bp

selection criteria included that the mRNA profiles were 1) specifically expressed in testis or preferentially expressed in testis (see [7, 8]) and 2) the mRNA expression profiles were confirmed at the protein level in at least one species (mouse or human). For 53 genes, microarray expression data from mouse or rat (or both) reiterated the patterns found in human, whereas Fam81b and Ddi1 yielded different patterns across species (Fig. 1) [8]. We next complemented our GeneChip data by integrating a published RNA-Seq data set covering enriched mouse testicular cells [31]; RNA-Seq experiments yield data on expression levels and transcript architecture [40]. As expected, for 66 mouse orthologs of 81 human target genes, we found peak expression in spermatocytes and spermatids (Fig. 2). Mouse RNA-Seq data rectify the results obtained with GeneChips in the cases of Ddi1, Fbxo39, Mll10, Tmco5, and Tssk4, which are indeed induced in spermatids, and they confirm that Asxl3 is not detectable using high-throughput methods in the male rodent germline. We conclude that highthroughput RNA profiling experiments using GeneChips or RNA-Seq yield reproducible patterns across methods and species in the majority of the cases.

Meiotic Chromosome Spreads Samples were processed as previously described [39]. Briefly, adult male seminiferous tubules isolated were macerated in 13 PBS. The cells were released from the tubules by pipetting and filtered through a 40-lm Falcon cell strainer (Fisher Scientific). The cells were pelleted, washed with 13 PBS, resuspended in 0.5% NaCl, and allowed to adhere on glass slides for 15 min. The slides were fixed in 2% paraformaldehyde with 0.03% SDS for 3 min and 2% paraformaldehyde for 3 min. The slides were washed three times in 0.4% Photo-Flo 200 (Kodak) for 1 min and then air-dried at room temperature. The slides were incubated with blocking solution (1% donkey serum, 0.3% BSA, and 0.005% X-100 in 13 PBS) for 20 min at 378C in a humidity chamber. Primary antibodies against HMGB4 and SCP3 (SC-74569, Santa Cruz Biotechnology) were diluted 1:100 in blocking buffer and incubated under the same conditions for 1 h. Slides were washed twice for 5 minutes each time in 0.4% Photo-Flo in 13 PBS solution, treated with blocking solution for 5 min, and incubated with an fluorescein isothiocyanateconjugated anti-mouse secondary antibody (Invitrogen) and Alexa 594-conjugated anti-rabbit secondary antibody (Invitrogen) at a dilution of 1:100 for 20 min at room temperature. Finally, slides were washed twice with 0.4% Photo-Flo in 13 PBS, rinsed twice with 0.4% Photo-Flo, and allowed to air-dry. UV microscopy was done with an AxioImager microscope (Zeiss).

Messenger RNA Expression Profiles Are Confirmed by Protein Detection Patterns in Male Germ Cells We next used published protein profiling data from a recent mass spectrometry study of fetal tissues, adult organs, and cultured cell lines to validate our RNA data at the protein level [11]. Interpretable information was retrieved for 62 human proteins, including 41 (66%) that were found in testis (albeit not always exclusively) and 21 (34%) that were detected in tissues other than adult male gonads (Fig. 3). We then compared these results with the output of the HPA’s high-throughput IHC assays. Such large-scale approaches inevitably include false-positive and false-negative signals; however, this was not a major issue here because we used HPA data to identify totally or partially corresponding mRNA/protein patterns in the male germline [12]. For the majority of 81 target proteins currently covered by the database, we indeed found staining patterns annotated as testicular cells within seminiferous ducts, indicating that these genes are likely transcribed and translated predominantly (if not specifically) in the male germline (Fig. 4). In the cases of 30 genes, we observed overlapping signals, whereby the proteins are detected not only in testis but in more than three somatic tissues as well. For SLC6A16, TSGA13, C2orf77, C4orf17, and C10orf82, no testicular signal is reported by the HPA. We thus observed similar patterns of peak expression for mRNA and their cognate proteins in the majority of the cases analyzed.

Protein Detection by IHC The sections were incubated overnight at 48C with the goat polyclonal antiFAM71B antibody (SC-246744; Santa Cruz Biotechnology) used at 1:500 in 13 PBS, 0.1% (v/v) Tween-20, and 2% BSA (PBST-BSA). After several washing steps in 13 PBS, sections were incubated for 1 h with a biotinylated rabbit anti-goat antibody (DakoFrance) at 1:500 in 13 PBST-BSA. Samples were subsequently washed in 13 PBS and incubated for 30 min with a streptavidin–peroxidase complex (Dako) at 1:500 in 13 PBS. Immunostaining was revealed with a diaminobenzidine solution (Sigma-Aldrich). Finally, sections were counterstained with hematoxylin and mounted in Eukitt. Slides were examined using an AxioImager microscope (Zeiss).

RESULTS Candidate Genes Show Conserved Patterns of mRNA Induction in the Male Germline In earlier work, we identified human transcripts predominantly or specifically expressed in testicular cells [7]. From these transcripts, we manually selected 81 typically poorly characterized target genes for further analysis. The major 3

Article 71

Downloaded from www.biolreprod.org.

ZNF597, 1:200 [SC-132575; Santa Cruz Biotechnology]) were incubated overnight at 48C in dilution buffer (13 PBS, 1% bovine serum albumin [BSA], and 0.3% Triton X-100). Sections were washed three times in 13 PBS and incubated for 1 h at room temperature with secondary antibodies (Alexa Fluor 594 chicken anti-rabbit [A21442; Invtrogen], Alexa Fluor 594 goat anti-mouse [A11001; Invitrogen], and Alexa Fluor 594 donkey anti-goat [A11058; Invitrogen]) diluted at 1:500 in 13 REAL Antibody Diluent (Dako). After three washing steps in 13 PBS, Vectashield containing 1.5 lg/ml of 4 0 ,6diamidino-2-phenylindole (DAPI; Vector Laboratories) was added, and the slides were mounted in Eukitt (Labnord). Ultraviolet (UV) microscopy was done with an AxioImager microscope equipped with an AxioCam MRc5 camera and AxioVision software (version 4.8.2; Zeiss).

129 bp

PETIT ET AL.

Downloaded from www.biolreprod.org. FIG. 1. Cross-species RNA profiling using GeneChips. Expression data are shown for human, mouse, and rat. Each column corresponds to a sample and each row to a probe set; some transcripts are covered by multiple probe sets that do not necessarily yield similar patterns. Columns showing human, mouse, and rat data are marked at the top. Gene symbols are given to the right. Human sample names are given at the bottom for prepubertal biopsies from patients with high, intermediate, and low risk for infertility (HIR, IIR, and LIR, respectively) and from adult patients with seven different Johnson Scores (JS) as indicated and 45 nontesticular controls as well as duplicate chondrocyte (CC), vascular smooth muscle (VSM), spermatocytes (SC), spermatogonia (SG), seminiferous tubules (TU), and total testis (TT). Mouse and rat duplicate samples are Sertoli cells (SE), spermatogonia (SG), spermatocytes (SC), spermatids (ST), and total testis (TT). Human gene symbols are shown at the right for each probe set. A color scale for log2-transformed data is given.

4

Article 71

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

Downloaded from www.biolreprod.org.

FIG. 2. Transcript profiling using RNA-Seq. The output of an RNA-Seq experiment analyzing mouse Sertoli cells (Sertoli), primitive spermatogonia (primSG), type A spermatogonia (SGA), type B spermatogonia (SGB), leptotene spermatocytes (lepSC), pachytene spermatocytes (pachySC), round spermatids (roundST) and elongated spermatids (elongatedST) is represented as a heatmap. The genes are indicated to the right; symbols for human orthologs are shown when they differ from the mouse symbols. Mouse gene symbols are shown. Human gene symbols are indicated when they differ from the mouse gene names. Color scale shows the fragments per kilobase per million reads (FPKM) values and the percentiles.

FIG. 3. Protein profiling using mass spectrometry. The pattern of detection is summarized for human proteins shown to the left in fetal and adult tissues and cell lines as indicated. A color code for protein signal intensity is given. Ovary and testis samples from fetuses and adults are annotated in blue and red, respectively.

cases that are potential hub proteins (having more than 10 interactors): ASB9, DMRTB1, MLLT10, PDCL2, RNF168, SOX30, and ZNF165. Furthermore, we found that seven proteins—HMGB4, C2orf57, C6orf201, PASD1, PDZD9, SPATA7, and SUN5—interact with APP (amyloid beta protein precursor), a regulator of neuronal development associated with spermatogenesis [41] (for review, see [42]). The network also contains 11 smallest isolated connected components—C2orf73, CCDC36, CXorf48, HSPB9, KIAA1683, STK31, TEX28, TEX36, ZNF576, ZCCHC13, and ZNHIT2—that bind up to six interactors. Note that the latter 11 components appear not to be linked to the principal component. The results concerning hub proteins underline that network interactions, which are available for nearly half of the 81 target proteins, point to promising

Protein Network Interactions Identify Hub Proteins Potentially Important for Spermatogenesis To collect further evidence revealing testicular proteins potentially important for male fertility, we interpreted protein network data and found that 38 of our targets bind other proteins (Fig. 5) (see Supplemental Data S1 for annotation and expression data associated with their interactors; Supplemental Data are available online at www.biolreprod.org). We identified one principal connected component (within which all proteins are linked) containing 27 of our target proteins. This includes seven 5

Article 71

PETIT ET AL.

Downloaded from www.biolreprod.org. FIG. 4. IHC profiling of human proteins. A color-coded heatmap shows GeneChip expression data as in Figure 2. For the human proteins shown to the right, the IHC signals provided by the HPA are given for testicular cells (red), Leydig cells, and nontesticular samples as indicated at the bottom. White space indicates that the protein is not yet covered by the HPA.

candidates for further study (see Discussion for RNF168 and DMRTB1/DMRT6).

and the testicular protein localization pattern of six candidate genes in the mouse possibly involved in gene regulation, sex body formation, and RNA processing. Loci were manually selected based on their mRNA/protein patterns, testis specificity, and/or network interactions. We emphasize that all commercially available antibodies used were certified for IHC/IF experiments using human and/or mouse samples by the supplier and the HPA. The HPA carries out extensive tests

mRNA/Protein Profiles of Selected Human Candidates Are Confirmed by the H3K4me3 Activation Mark and Protein Localization in Mouse Subsequently, we characterized the epigenetic activation mark (H3K4me3), the mRNA level and transcript architecture, 6

Article 71

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

the protein localizes to round spermatids and elongated spermatids (Figs. 3 and 4). We complemented this data set with an IHC assay using adult mouse testicular sections and found strong staining of elongated spermatids (Fig. 6C) and a weaker signal in Leydig cell nuclei, but no equivalent signals in adult brain, kidney, liver, and muscle controls and in an E12.5 embryo (Fig. 6, D and E). We conclude that RNA profiles and IHC protein localization patterns are similar for human and mouse FAM71B. 1700019N19Rik was selected because it interacts with CCNB1IP1 and TEX11, two proteins important for RNA processing in the germline [44–46]. We found that 1700019N19Rik is expressed in pachytene spermatocytes, round spermatids, and elongated spermatids. A strong H3K4me3 signal and RNA-Seq data confirmed the current exon annotation but appeared to show two additional exons in the gene’s 5 0 region and a hugely extended 3 0 untranslated region (UTR) (Fig. 7A). RT-PCR data validated highthroughput signals in adult germ cells, and we also found a weak signal in GC2 cells (derived from spermatocytes) and GC1 cells (derived from spermatogonia). The signal in GC1/2 cells includes an additional slow-migrating band; we obtained a similar result with embryonic muscle and lung samples (Fig. 7B). RNA data and protein profiling information for human C10orf82 published in the HPA do not confirm each other because the mRNA is found only in male post-meiotic germ cells (Figs. 1, 2, and 7A), whereas the protein is absent in seminiferous tubules but present in at least 26 somatic cell

before a reagent is validated and used for IHC/IF experiments (for more details, see www.proteinatlas.org/about/ quality þscoring) [12]. Moreover, we include here only those cases for which the mRNA signals in mouse and human and the human protein detection patterns at least partially confirm each other. This approach minimizes the risk of errors due to antibodies binding to proteins other than the intended targets in the mouse. Fam71b is interesting because it encodes a protein that interacts with FRG1, which is possibly involved in RNA processing in the germline [43]. Frg1 bears a strong testicular H3K4me3 mark and is expressed in male germ cells (Supplemental Fig. S1). We found Fam71b mRNA levels in germ cells (determined by GeneChips) to be highly reproducible between human and mouse (Fig. 1). A testicular ChIP-Seq experiment showed that the entire mouse locus bears the H3K4me3 mark associated with gene activation, and RNA-Seq data confirmed that the mouse mRNA peaks in round spermatids and elongated spermatids (Figs. 2 and 6A). RTPCR assays revealed Fam71b mRNAs in enriched adult spermatocytes and spermatids but not in GC1 and GC2 cell lines derived from immortalized germ cells (Fig. 6B). This indicates that the Fam71b promoter requires signals present only in bona fide germ cells or that the mRNA is unstable in cell lines. We obtained no signals in isolated heart, muscle, lung, and gonad samples from an embryo at Day 12.5 (E12.5), but we observed a faint band in stomach tissue (Fig. 6B). Human FAM71B protein is present in testicular extracts, and 7

Article 71

Downloaded from www.biolreprod.org.

FIG. 5. A testicular protein interaction network. Nodes represent protein, and edges represent protein-protein interactions. Proteins in our target list are given in bold. Proteins with a known function or for which no role was found in a gene-deletion experiment are given in green and blue, respectively. Proteins that were further characterized are shown in red.

PETIT ET AL.

Downloaded from www.biolreprod.org. FIG. 6. Characterization of mouse Fam71b expression at the mRNA and protein level by IHC using adult testicular sections. A) Color-coded RNA-Seq data are displayed as histograms in the context of gene annotation (RefSeq, blue bars; arrowheads indicate the direction of transcription), an epigenetic mark (H3K4me3, grey) for enriched Sertoli cells (Sertoli), primitive spermatogonia (primSG), type A and type B spermatogonia (SGA and SGB, respectively), leptotene and pachytene spermatocytes (lepSC and pachySC, respectively), and round and elongated spermatids (rST and eST, respectively; shades of green). B) RT-PCR data are shown for embryonic samples (E12.5; left) versus adult germ cells (right) and cell lines as indicated at the top. MW, molecular weight. C) IHC data using testicular sections from adult males (Mus musculus; age, 30 days) in the presence (FAM71B; left) and absence (no antibody; right) of an antibody. Bar¼50 lm. D) IHC data for somatic controls. E) IHC data for sections of an embryo at E12.5 in the presence (FAM71B) or absence (no antibody) of an antibody. Bar ¼ 100 lm.

8

Article 71

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

Downloaded from www.biolreprod.org. FIG. 7. Analysis of 1700019N19Rik/C10orf82 mRNA and protein expression in mouse and human. A) RNA-Seq and ChIP-Seq data are displayed as in Figure 6A. B) RT-PCR data are shown as in Figure 6B. C) The protein sequences of three human C10orf82 isoforms (iso1–3) are aligned. The epitope recognized by the polyclonal HPA antibody we used is shown in red. D) IF data using testicular sections from adult males (Mus musculus; age, 30 days) to reveal DNA (DAPI) and the protein (1700019N19Rik) are shown as in Figure 6C. A white rectangle marks a magnified and color-enhanced region shown to the right. The rightmost panel is split into the marked region, which was enlarged and color-enhanced, and the negative control (no antibody). E) IF patterns focusing on round spermatids and elongated spermatids. F) IF data from four somatic controls. G) IF data for the human protein on human testis. A white rectangle marks a magnified and colorenhanced region shown in the rightmost panel. Bar ¼ 50 lm.

9

Article 71

PETIT ET AL.

10

Article 71

Downloaded from www.biolreprod.org.

mouse is partially confirmed in spermatocytes (Fig. 9E). Taken together, the results show that mammalian Foxr1 is predominantly transcribed in mitotic spermatogonia and meiotic spermatocytes but that the mouse protein peaks in postmeiotic elongated spermatids. Hmgb4 is an interesting candidate because it shows an extremely conserved and highly specific pattern of induction in the male germline, and it was proposed to be important for gene regulation in spermatogenesis [54]. We first examined the testicular profile and the exon composition of Hmgb4’s mRNA using RNA-Seq data and found the transcript to be specifically and strongly expressed in round spermatids and elongated spermatids. We observed an as-yet-unannotated transcript just upstream of Hmgb4’s currently annotated single exon and found that the entire locus and 5 0 upstream region including the new transcript is covered by the H3K4me3 mark (Fig. 10A). RT-PCR data showed strong signals in adult germ cells and extremely weak signals in GC1 and GC2 cells as well as embryonic gonads and the muscle sample (Fig. 10B). We verified the HMGB4’s subcellular localization pattern and found the protein to be in a spot in the nuclei of spermatocytes, in the nuclei of round spermatids, and in elongated spermatids (Fig. 10C), which confirms and extends previous observations [54]. As expected, no signals were obtained on sections from adult brain, kidney, liver, and skeletal muscle samples (Fig. 10D). We next sought to further study HMGB4’s localization in meiotic germ cells. To this end, chromosomal spreads from spermatocytes at meiotic prophase I were stained with antibodies against SCP3, a structural component of the synaptonemal complex, and HMGB4. We found that the protein begins to accumulate during the leptotene stage, before its concentration increases further during the pachytene and diplotene stages, where it localizes to the sex body (Fig. 10E). Taken together, the results reveal conserved mRNA/protein patterns in human and mouse testis and an as-yet-unknown interaction of HMGB4 with the sex chromosomes. Finally, we further characterized Sox30, which encodes a DNA-binding transcription factor previously proposed to play an important role in spermatogenesis on the basis of its mRNA profile [55]. Testicular Sox30 induction coincides with peak H3k4me3 signals, and via RNA-Seq, we found that the gene is expressed very weakly in Sertoli cells, whereas its transcript strongly accumulates as germ cells progress through meiosis and postmeiotic stages. We also observed that spermatids express an extended 5 0 UTR (Fig. 11A). RTPCR data showed that Sox30 is strongly induced in enriched germ cells but undetectable in GC1 and GC2 cell lines (Fig. 11B). RNA concentrations and protein levels correspond to each other because we found the protein in the cytoplasm of adult spermatocytes and the nuclei of round spermatids, which resembles the pattern detected for the human protein by the HPA (Fig. 11C). We observed intense cytoplasmic signals in the brain and strong signals in the cytoplasm and nuclei of epithelial cells in the uterus (Fig. 11, D and E); the biological relevance of cytoplasmic signals is unclear, however, because the protein a priori is a nuclear DNA-binding regulator. As expected, SOX30 was not detected in kidney, liver, and muscle controls (Fig. 11F). In summary, Sox30 is similarly transcribed and translated in the human and mouse adult male germline (Figs. 1–3) and thus clearly associated with spermatogenesis.

types (Fig. 4) (see also www.proteinatlas.org). To address this issue, we employed an antibody recognizing human C10orf82 to analyze mouse testicular sections (the epitope present in three isoforms is shown in Fig. 7C). We detected the protein by IF in the cytoplasm of spermatocytes and the nuclei of round spermatids and elongated spermatids (Fig. 7, D and E) but not in adult somatic brain, kidney, liver, and muscle controls (Fig. 7F). In mouse testis, we also observed cytoplasmic staining of peritubular cells and cytoplasmic staining of interstitial Leydig cells, in keeping with the HPA’s IHC data. One clear difference between testicular tissue from mouse and human is the strong signal we found only in mouse spermatids. We therefore carried out an IF assay of C10orf82 in human testis and observed strong signals in peritubular cells and Leydig cells, confirming the HPA’s IHC data; weak signals were found in the cytoplasm of spermatocytes, consistent with our observations in mouse (Fig. 7G). We were, however, unable to detect the protein in human spermatids, which might be due to the expression of an isoform that is folded differently or that entirely lacks the epitope bound by the antibody (Fig. 7C). The corollary is that data on the transcript levels and protein localization are tightly correlated in mouse testis. Next, we selected Zfp597 because it encodes a Kru¨ppel C2H2-type zinc-finger protein possibly involved in transcriptional regulation; the mouse transcript is expressed from a complex locus via alternative exon usage [47]. GeneChip data confirmed that mouse Zfp597 encodes different isoforms, which peak either in the germline or in somatic cells, including brain, kidney, liver, and muscle (see www.germonline.org) [7, 8]. ChIP-Seq data revealed the expected peak in H3K4me3 signals at the mouse gene’s 5 0 region. Interestingly, RNA-Seq revealed that Zfp597 expresses a postmeiotic isoform with an extended 5 0 UTR in round and elongated spermatids, whereas expression of its extensive 3 0 UTR appears to diminish at the same time (Fig. 8A). An RT-PCR analysis using a primer pair located within the extreme 3 0 end of the third exon yielded signals in all embryonic and adult samples (Fig. 8B). These results are confirmed by data available via Genevestigator showing that mouse Zfp597 is expressed in embryonic tissue (www.genevestigator.com) [48]. Because flexible 5 0 UTRs are known to affect protein translation, we reasoned that ZFP597 might be regulated at the posttranslational level [49–51]. We therefore monitored the protein by IF using histological sections and found signals only in elongated spermatids but not in brain, kidney, liver, and muscle (Fig. 8, C and D). The mRNA/protein patterns are thus similar in testis from human (Figs. 1–3) and testis from mouse. Foxr1 was further studied because it belongs to the forkhead-box family of DNA-binding proteins involved in development, somatic cancer, and reproduction [52, 53]. ChIPSeq signals for the H3K4me3 activation mark confirm testicular gene activation, whereas GeneChip and RNA-Seq data show that the mRNA peaks in spermatogonia at different stages of development and leptotene spermatocytes (Figs. 1, 2, and 9A). An RT-PCR assay confirmed this pattern in adult germ cells but also revealed a faint signal in embryonic gonads (Fig. 9B). The protein is detected by mass spectrometry in adult testis and in the retina, and it also appears to be present in fetal gut and fetal liver (Fig. 3) [11]. To learn more about FOXR1’s cellular localization, we performed an IF assay and found that the protein is detectable in the cytoplasm of spermatocytes and strongly accumulates at the perinuclear region in elongated spermatids, whereas no signal was observed in adult brain, kidney, liver, and muscle controls (Fig. 9, C and D). Because the HPA currently does not cover FOXR1, we analyzed human testicular sections and found that the staining pattern in the

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

Downloaded from www.biolreprod.org. FIG. 8. Comparison of mouse Zfp597 mRNA and protein expression. A) RNA-Seq and ChIP-Seq data are displayed as in Figure 6A. The color code is red for Sertoli cells (Sertoli); shades of blue primitive spermatogonia, type A spermatogonia, and type B spermatogonia (primSG, SGA, and SGB, respectively); orange for leptotene spermatocytes (lepSC); brown for pachytene spermatocytes (pachySC); and shades of green for round spermatids and elongated spermatids (rST and eST, respectively). A red bar marks the 5 0 UTR extension observed in postmeiotic germ cells. B) RT-PCR data are shown as in Figure 6B for two isoforms as indicated at the bottom. C) IF data using testicular sections from adult males (Mus musculus; age, 30 days) and an antibody (ZFP597). DNA is stained with DAPI. A control in which the primary antibody was omitted is shown (no antibody). D) IHC data for somatic controls. Bar ¼ 50 lm.

11

Article 71

PETIT ET AL.

Downloaded from www.biolreprod.org. FIG. 9. Characterization of mouse Foxr1 mRNA/protein expression. A) RNA-Seq and ChIP-Seq data are displayed like in Figure 6A. B) RT-PCR data are shown like in Figure 6B. C) IF data using testicular sections from adult males (Mus musculus 30 dpp) and an antibody (FOXR1) are shown like in Figure 7C. White rectangles mark magnified and color-enhanced regions shown in the panels to the right, next to the control lacking the antibody (no antibody). White bars correspond to 50 lm. D) IF data are given for somatic controls as indicated. E) IF data are shown for human testis as indicated. A white rectangle marks a magnified and color-enhanced region shown together with the no antibody control.

12

Article 71

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

Downloaded from www.biolreprod.org. FIG. 10. Analysis of mouse Hmgb4 mRNA/protein expression. A) RNA-Seq and ChIP-Seq data are displayed as in Figure 6A. B) RT-PCR data are shown as in Figure 6B. C) IF data using testicular sections from adult males (Mus musculus; age, 30 days) and an antibody (HMGB4) are shown as in Figure 7C. D) IF data for somatic controls. E) IF data with two antibodies (HMGB4 and SCP3) for chromosomal spreads produced with spermatocytes at different stages of prophase I. Bar ¼ 50 lm (C and D); original magnification 363 (E).

13

Article 71

PETIT ET AL.

Downloaded from www.biolreprod.org. FIG. 11. Assaying Sox30 mRNA/protein expression in gonads and somatic tissues. A) RNA-Seq and ChIP-Seq data are displayed as in Figure 6A. B) RTPCR data are shown as in Figure 6B. C) IF data using testicular sections from adult males (Mus musculus; 30 days) and an antibody (SOX30) are shown as in Figure 7C. D and E) IF data for brain (D) and uterus (E). F) IF data for somatic controls. Bar ¼ 50 lm.

14

Article 71

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

DISCUSSION

Using Correlated mRNA/Protein Patterns to Infer Function in the Mouse Male Germline

Conserved Cross-Species RNA and Protein Expression Patterns Provide Clues for Gene Function Combining information on RNAs, proteins, and networks marks out genes such as Rnf168, Dmrtb1/Dmrt6, Znf165, and Sox30 as important for spermatogenesis because of their conserved testicular expression patterns across species and their positions as hub proteins. Our candidates were initially selected because, among other criteria, no phenotypic information was available in the major databases, neXtProt and the Mouse Genome Database [56, 57]. A detailed query of the scientific literature in PubMed, however, revealed that Rnf168 actually is important for spermatogenesis, but this information was not included into the gene’s annotation [38, 58]. Additionally, in the course of our study, C16orf73, for which human and mouse expression data suggested a role in spermatocytes or spermatids, was renamed MEIOB because the mouse homolog was found to be essential for meiotic recombination [59, 60]. Furthermore, DMRTB1/Dmrt6 was reported to be critical for the transition between mitosis and meiosis in the mouse germline [61]. These recent examples join the growing number of genes showing corresponding germlinespecific mRNA/protein patterns conserved across rodents and human, among loci important for mammalian fertility [5, 62]. The corollary is that tightly correlated mRNA/protein expression patterns are typically a good indicator of an important gene function in a given tissue, but some pitfalls remain. For example, a recent report by members of the IMPC consortium has, unsurprisingly, underlined that genes with paralogs tend not to be essential [27]. Indeed, a group that recently failed to detect a phenotype in mice lacking functional receptor kinase STK31 has explained the result by invoking genetic redundancy [63, 64]. IMPC Gene Deletion Pipeline Ongoing work by the IMPC will generate mouse genedeletion models and initial information on phenotypes for most of their mouse orthologs via classical gene-deletion experiments or cell type-specific deletions using the Cre-Lox system. In the near future, the consortium may well harness the new CRISPR/Cas9 method for fast and cost-effective gene deletion studies [65, 66]. However, deleting thousands of mouse genes will remain a complex and time-consuming task. The IMPC currently plans to analyze 59 of 74 mouse orthologs among the human target loci; our data indicate that the remaining 15 genes are worthwhile targets as well. The consortium recently reported via its database a homeostasis/ metabolism phenotype in males and females lacking 1700003F12Rik (mousephenotype.org/data/genes/ MGI:1922730). Because this result was obtained with a straightforward fertility assay (www.mousephenotype.org/ impress/protocol/105/7), it seems justified to further explore the mutant’s testicular architecture and physiological parameters, such as the levels of sex hormones. In addition, 58 loci scheduled for the generation of transgenic mice are at early stages of embryonic stem cell production in progress (6 loci), mouse production planned (35 loci), or mouse production in progress (11 loci). Transgenic mice have already been generated for Fam209 (C20orf107), Sox30, Spata33, Stk31, 1700003F12Rik (C20orf144), and 4930505A04Rik (C2orf73). It should therefore be possible to determine their reproductive phenotypes in great detail (apart from Stk31 [63, 64]) once homozygous mutant strains become available in the not-toodistant future. 15

Article 71

Downloaded from www.biolreprod.org.

FAM71B is a striking example of a case where testicular mRNA peak signals and protein detection profiles are correlated in human tissue. Given that FAM71B interacts with FRG1, a protein involved in transcript processing, we speculate that FAM71B is playing a role in RNA modification during spermiogenesis [67]. Our results show that mRNA and protein patterns are also highly correlated in mouse germ cells. The observed staining of interstitial Leydig cells was unexpected because we were unable to detect the mRNA in testicular biopsies from infertile patients lacking most or all types of germ cells [7] and in purified Leydig cells from rat and human (F. Chalmel, M. Primig, and B. Je´gou, unpublished results). It is conceivable, however, that the transcript is present in Leydig cells only at a very low level or that it is degraded very efficiently when Leydig cells are purified. The function of 1700019N19Rik/C10orf82 is unknown, but it interacts with two proteins involved in meiotic recombination [44, 46]. Our results in mouse testis are in keeping with a role for the protein during or after this process. IHC data from the HPA and our IF assay on a human testicular section do not confirm RNA profiling measurements that yield a peak in human spermatids because, contrary to the mouse protein, we have not detected human C10orf82 in postmeiotic germ cells. We cannot rule out that the human protein accumulates at a stage not included in our testicular sections (and those analyzed by the HPA) or that C10orf82 is unstable in spermatids. Alternatively, it is possible that an mRNA isoform expressed in postmeiotic germ cells lacks the exon, which covers the peptide against which the antibody was raised. It is known that C10orf82 encodes at least four splice variants, including two that retain an intron and are predicted to yield no protein product (www.ensembl.org) [68]. The neXtProt data for human proteins provides data for three protein variants, whereby isoforms 1 and 2 contain the complete epitope (www.nextprot. org) [56]. Isoform 2 may not fold up the same way as the longer protein, which may mask the epitope, and isoform 3 lacks the antibody-binding site altogether. Indeed, the mouse ortholog encodes two isoforms that appear to differ only in the first exon and the 3 0 UTR (www.ensembl.org). Zfp597, the ortholog of human ZNF597, is a mouse locus that expresses two isoforms using alternative 5 0 exons. The Hit4 isoform was shown to be important for embryonic development in a homozygous mutant and brain formation in a heterozygous mutant [47]. It is noteworthy that round spermatids and elongated spermatids transcribe an mRNA with an extended 5 0 UTR. Because such a phenomenon has been implicated in translational control, our analysis points to a possible posttranscriptional mechanism regulating ZFP597 in the male germline [69, 70]. We report data for RNA and the mouse ZFP597 protein that are consistent with the idea of postmeiotic germ cells expressing a testicular isoform important for gene regulation or chromatin organization during spermiogenesis. Foxr1 encodes a putative transcription factor that belongs to a conserved family of regulators involved in chromatin remodeling, including some that are important for reproduction. For example, Foxo1 is required for spermatogonial stem cells, and Foxl2 is involved in ovarian development [71–73] (for review, see [74]). In humans, FOXR1 fusion genes are known to act as oncogenes, which repress loci activated by FOXO regulators [52]. Our data reveal a posttranscriptional mechanism of regulation for Foxr1 in the mouse male germline because the protein first appears in the cytoplasm in meiotic

PETIT ET AL.

genes important for spermatogenesis are discovered and might thus help us to gain insight regarding the molecular cause underlying certain cases of unexplained male infertility.

germ cells and later accumulates in the perinuclear region of postmeiotic elongated spermatids. The weaker staining pattern in human samples might be a consequence of suboptimal tissue preparation due to the constraints that limit the recovery and processing of human testicular samples. Given the known roles of Forkhead box proteins and the correlated mRNA/protein patterns we observe across species, we propose that FOXR1 plays a role in spermiogenesis. Alternatively, the protein could also exert a function during early stages of embryonic development after fertilization as part of the sperm proteome delivered to the egg. Such a function was proposed for the transcription factor Stat4, which is present in the perinuclear theca of spermatozoa [75]. In this context, we note that almost 6200 proteins are known to be present in sperm, including, for example, human FAM71B and SOX30 [76] (for review, see [77]). Hmgb4 is a member of a family of high-mobility group proteins known to interact with chromatin. The gene is a paralog of Hmbg1, Hmbg2, and Hmbg3, but it is distinguished by lacking a C-terminal stretch of acidic amino acids [78, 79]. Interestingly, the protein binds DNA modified by cisplatin, thereby preventing DNA damage-response enzymes from repairing the lesions, which explains why testicular tumors are hypersensitive to cisplatin-based chemotherapy [80]. Hmgb4 in the mouse was proposed to be important for spermiogenesis because of its presence in spermatocytes and spermatids [54]. We confirm and extend this previous report by assembling new, large-scale RNA and protein profiling data with our own histological analysis. Importantly, based on our IF data with spread nuclei from spermatocytes at different stages of meiotic prophase I, we propose that HMGB4 plays an as-yet-unanticipated role in the chromatin organization of X and Y chromosomes. Sox30 belongs to the large family of DNA-binding SRYbox regulators, many of which are involved in developmental processes and diseases [55, 80]. The gene is under epigenetic control and strongly induced in male germ cells and oocytes [7, 8, 81–83]. SOX30 is a hub protein that binds at least 13 proteins, including two that are essential for spermatogenesis (BRCA1 [84] and PAXIP1/PTIP [85]) and two that are expressed in testis (BAMBI [86] and C6orf165/ 1700003M02Rik [7, 8]). We note that Sox30 is also expressed in the developing mouse pancreas, and fly data, together with our results in the mouse, hint at a possible function for Sox30 during embryogenesis and in adult neuronal tissue (Fig. 11, B and D) [87, 88]. However, because we observe cytoplasmic signal in brain cells and the adult uterus, the protein would have to fulfill a novel role unrelated to DNA bindingdependent gene activation (Fig. 11, D and E). Very recent work shows that human SOX30 acts as a tumor suppressor gene in lung cancer and directly activates the major cell-cycle regulator TP53 (p53) [89]. The p53 protein belongs to a family of three DNA-binding transcription factors that have been implicated in tumor suppression and fertility in males and females [90–92]. This raises the intriguing possibility that SOX30 may participate in the transcriptional activation of TP53 during meiotic and postmeiotic stages of mammalian male germline development. A combined, large-scale RNA-Seq transcript profiling and IHC protein profiling study identified more than 1000 proteins enriched in testis, revealing this organ as expressing the largest set of tissue-restricted proteins [93]. The present study specifically targeted genes conserved between rodents and primates, which also show similar testicular mRNA and protein expression profiles, suggesting potential roles in sexual reproduction. These data will accelerate the pace at which

ACKNOWLEDGMENT We thank Pierre-Yves Kernanec for mouse husbandry and Olivier Collin for GenOuest’s bioinformatics infrastructure support. We gratefully acknowledge Chunsheng Han and Fuchou Tang for providing RNA-Seq raw data files.

REFERENCES

16

Article 71

Downloaded from www.biolreprod.org.

1. Deutschbauer AM, Williams RM, Chu AM, Davis RW. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 2002; 99: 15530–15535. 2. Enyenihi AH, Saunders WS. Large-scale functional genomic analysis of sporulation and meiosis in Saccharomyces cerevisiae. Genetics 2003; 163: 47–54. 3. Colaiacovo MP, Stanfield GM, Reddy KC, Reinke V, Kim SK, Villeneuve AM. A targeted RNAi screen for genes involved in chromosome morphogenesis and nuclear organization in the Caenorhabditis elegans germline. Genetics 2002; 162:113–128. 4. Handel MA, Lessard C, Reinholdt L, Schimenti J, Eppig JJ. Mutagenesis as an unbiased approach to identify novel contraceptive targets. Mol Cell Endocrinol 2006; 250:201–205. 5. Jamsai D, O’Bryan MK. Mouse models in male fertility research. Asian J Androl 2011; 13:139–151. 6. O’Bryan MK, de Kretser D. Mouse models for genes involved in impaired spermatogenesis. Int J Androl 2006; 29:76–89 (discussion 105–108). 7. Chalmel F, Lardenois A, Evrard B, Mathieu R, Feig C, Demougin P, Gattiker A, Schulze W, Jegou B, Kirchhoff C, Primig M. Global human tissue profiling and protein network analysis reveals distinct levels of transcriptional germline-specificity and identifies target genes for male infertility. Hum Reprod 2012; 27:3233–3248. 8. Chalmel F, Rolland AD, Niederhauser-Wiederkehr C, Chung SS, Demougin P, Gattiker A, Moore J, Patard JJ, Wolgemuth DJ, Jegou B, Primig M. The conserved transcriptome in human and rodent male gametogenesis. Proc Natl Acad Sci U S A 2007; 104:8346–8351. 9. Schultz N, Hamra FK, Garbers DL. A multitude of genes expressed solely in meiotic or postmeiotic spermatogenic cells offers a myriad of contraceptive targets. Proc Natl Acad Sci U S A 2003; 100:12201–12206. 10. Ellis PJ, Furlong RA, Wilson A, Morris S, Carter D, Oliver G, Print C, Burgoyne PS, Loveland KL, Affara NA. Modulation of the mouse testis transcriptome during postnatal development and in selected models of male infertility. Mol Hum Reprod 2004; 10:271–281. 11. Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, Thomas JK, Muthusamy B, et al. A draft map of the human proteome. Nature 2014; 509:575–581. 12. Fagerberg L, Hallstrom BM, Oksvold P, Kampf C, Djureinovic D, Odeberg J, Habuka M, Tahmasebpoor S, Danielsson A, Edlund K, Asplund A, Sjostedt E, et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibodybased proteomics. Mol Cell Proteomics 2014; 13:397–406. 13. Ong CT, Corces VG. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet 2011; 12:283–293. 14. Vermeulen M, Timmers HT. Grasping trimethylation of histone H3 at lysine 4. Epigenomics 2010; 2:395–406. 15. Furey TS. ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nat Rev Genet 2012; 13:840–852. 16. Kittler R, Pelletier L, Buchholz F. Systems biology of mammalian cell division. Cell Cycle 2008; 7:2123–2128. 17. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet 1999; 21:20–24. 18. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009; 10:57–63. 19. Cox J, Mann M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu Rev Biochem 2011; 80:273–299. 20. Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. Analyzing protein-protein interaction networks. J Proteome Res 2012; 11:2014–2031. 21. Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol 2012; 8: 565. 22. Vidal M, Cusick ME, Barabasi AL. Interactome networks and human disease. Cell 2011; 144:986–998.

IDENTIFYING GENES EXPRESSED IN SPERMATOGENESIS

48.

49.

50. 51. 52.

53. 54.

55.

56.

57.

58.

59.

60.

61.

62. 63.

64.

65. 66. 67.

68.

69. 70. 71.

17

Ichikawa T, Kumanishi T, Aizawa Y, Takahashi H, Kakita A, Nawa H. Molecular characterization and gene disruption of a novel zinc-finger protein, HIT-4, expressed in rodent brain. J Neurochem 2010; 112: 1035–1044. Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P. Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics 2008; 2008:420747. Chowdhury TA, Kleene KC. Identification of potential regulatory elements in the 5 0 and 3 0 UTRs of 12 translationally regulated mRNAs in mammalian spermatids by comparative genomics. J Androl 2012; 33: 244–256. Scheper GC, van der Knaap MS, Proud CG. Translation matters: protein synthesis defects in inherited disease. Nat Rev Genet 2007; 8:711–723. Barbosa C, Peixeiro I, Romao L. Gene expression regulation by upstream open reading frames and human disease. PLoS Genet 2013; 9:e1003529. Katoh M, Igarashi M, Fukuda H, Nakagama H, Katoh M. Cancer genetics and genomics of human FOX family genes. Cancer Lett 2013; 328: 198–206. Thackray VG. Fox tales: regulation of gonadotropin gene expression by forkhead transcription factors. Mol Cell Endocrinol 2014; 385:62–70. Catena R, Escoffier E, Caron C, Khochbin S, Martianov I, Davidson I. HMGB4, a novel member of the HMGB family, is preferentially expressed in the mouse testis and localizes to the basal pole of elongating spermatids. Biol Reprod 2009; 80:358–366. Osaki E, Nishina Y, Inazawa J, Copeland NG, Gilbert DJ, Jenkins NA, Ohsugi M, Tezuka T, Yoshida M, Semba K. Identification of a novel Sryrelated gene and its germ cell-specific expression. Nucleic Acids Res 1999; 27:2503–2510. Lane L, Argoud-Puy G, Britan A, Cusin I, Duek PD, Evalet O, Gateau A, Gaudet P, Gleizes A, Masselot A, Zwahlen C, Bairoch A. neXtProt: a knowledge platform for human proteins. Nucleic Acids Res 2012; 40: D76–D83. Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE, Mouse Genome Database Group. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic Acids Res 2014; 42: D810–D817. Bohgaki T, Bohgaki M, Cardoso R, Panier S, Zeegers D, Li L, Stewart GS, Sanchez O, Hande MP, Durocher D, Hakem A, Hakem R. Genomic instability, defective spermatogenesis, immunodeficiency, and cancer in a mouse model of the RIDDLE syndrome. PLoS Genet 2011; 7:e1001381. Joyce EF, Tanneti SN, McKim KS. Drosophila hold’em is required for a subset of meiotic crossovers and interacts with the DNA repair endonuclease complex subunits MEI-9 and ERCC1. Genetics 2009; 181:335–340. Souquet B, Abby E, Herve R, Finsterbusch F, Tourpin S, Le Bouffant R, Duquenne C, Messiaen S, Martini E, Bernardino-Sgherri J, Toth A, Habert R, et al. MEIOB targets single-strand DNA and is necessary for meiotic recombination. PLoS Genet 2013; 9:e1003784. Zhang T, Murphy MW, Gearhart MD, Bardwell VJ, Zarkower D. The mammalian Doublesex homolog DMRT6 coordinates the transition between mitotic and meiotic developmental programs during spermatogenesis. Development 2014; 141:3662–3671. Matzuk MM, Lamb DJ. The biology of infertility: research advances and clinical challenges. Nat Med 2008; 14:1197–1213. Bao J, Yuan S, Maestas A, Bhetwal BP, Schuster A, Yan W. Stk31 is dispensable for embryonic development and spermatogenesis in mice. Mol Reprod Dev 2013; 80:786. Zhou J, Leu NA, Eckardt S, McLaughlin KJ, Wang PJ. STK31/TDRD8, a germ cell-specific factor, is dispensable for reproduction in mice. PLoS ONE 2014; 9:e89471. Hsu PD, Lander ES, Zhang F. Development and applications of CRISPRCas9 for genome engineering. Cell 2014; 157:1262–1278. Kim H, Kim JS. A guide to genome engineering with programmable nucleases. Nat Rev Genet 2014; 15:321–334. van Koningsbruggen S, Straasheijm KR, Sterrenburg E, de Graaf N, Dauwerse HG, Frants RR, van der Maarel SM. FRG1P-mediated aggregation of proteins involved in pre-mRNA processing. Chromosoma 2007; 116:53–64. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG, et al. Ensembl 2014. Nucleic Acids Res 2014; 42:D749–D755. Schafer M, Nayernia K, Engel W, Schafer U. Translational control in spermatogenesis. Dev Biol 1995; 172:344–352. Pickering BM, Willis AE. The implications of structured 5 0 untranslated regions on translation and disease. Semin Cell Dev Biol 2005; 16:39–47. Lalmansingh AS, Karmakar S, Jin Y, Nagaich AK. Multiple modes of

Article 71

Downloaded from www.biolreprod.org.

23. Costanzo MC, Engel SR, Wong ED, Lloyd P, Karra K, Chan ET, Weng S, Paskov KM, Roe GR, Binkley G, Hitz BC, Cherry JM. Saccharomyces genome database provides new regulation data. Nucleic Acids Res 2014; 42:D717–D725. 24. Koscielny G, Yaikhom G, Iyer V, Meehan TF, Morgan H, AtienzaHerrero J, Blake A, Chen CK, Easty R, Di Fenza A, Fiegel T, Grifiths M, et al. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res 2014; 42:D802–D809. 25. Brown SD, Moore MW. The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping. Mamm Genome 2012; 23:632–640. 26. Skarnes WC, Rosen B, West AP, Koutsourakis M, Bushell W, Iyer V, Mujica AO, Thomas M, Harrow J, Cox T, Jackson D, Severin J, et al. A conditional knockout resource for the genome-wide study of mouse gene function. Nature 2011; 474:337–342. 27. White JK, Gerdin AK, Karp NA, Ryder E, Buljan M, Bussell JN, Salisbury J, Clare S, Ingham NJ, Podrini C, Houghton R, Estabel J, et al. Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell 2013; 154:452–464. 28. Schofield PN, Hoehndorf R, Gkoutos GV. Mouse genetic and phenotypic resources for human genetics. Hum Mutat 2012; 33:826–836. 29. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memoryefficient alignment of short DNA sequences to the human genome. Genome Biol 2009; 10:R25. 30. Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nat Protoc 2012; 7:1728–1740. 31. Gan H, Wen L, Liao S, Lin X, Ma T, Liu J, Song CX, Wang M, He C, Han C, Tang F. Dynamics of 5-hydroxymethylcytosine during mouse spermatogenesis. Nat Commun 2013; 4:1995. 32. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013; 14:R36. 33. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010; 28:511–515. 34. Smyth GK. limma: linear models for microarray data. In: Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (eds.), Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer; 2005:397–420. 35. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2013; 14:178–192. 36. R Foundation for Statistical Computing [Internet]. Vienna: R Foundation. http://www.R-project.org. Accessed 1 January 2014. 37. Chalmel F, Primig M. The Annotation, Mapping, Expression and Network (AMEN) suite of tools for molecular systems biology. BMC Bioinformatics 2008; 9:86. 38. Coordinators NR. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2014; 42:D7–D17. 39. Smagulova F, Brick K, Pu Y, Sengupta U, Camerini-Otero RD, Petukhova GV. Suppression of genetic recombination in the pseudoautosomal region and at subtelomeres in mice with a hypomorphic Spo11 allele. BMC Genomics 2013; 14:493. 40. van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C. Ten years of nextgeneration sequencing technology. Trends Genet 2014; 30:418–426. 41. Shoji M, Kawarabayashi T, Harigaya Y, Yamaguchi H, Hirai S, Kamimura T, Sugiyama T. Alzheimer amyloid beta-protein precursor in sperm development. Am J Pathol 1990; 137:1027–1032. 42. Guo Q, Wang Z, Li H, Wiese M, Zheng H. APP physiological and pathophysiological functions: insights from animal models. Cell Res 2012; 22:78–89. 43. van Koningsbruggen S, Dirks RW, Mommaas AM, Onderwater JJ, Deidda G, Padberg GW, Frants RR, van der Maarel SM. FRG1P is localized in the nucleolus, Cajal bodies, and speckles. J Med Genet 2004; 41:e46. 44. Adelman CA, Petrini JH. ZIP4H (TEX11) deficiency in the mouse impairs meiotic double strand break repair and the regulation of crossing over. PLoS Genet 2008; 4:e1000042. 45. Yang F, Gell K, van der Heijden GW, Eckardt S, Leu NA, Page DC, Benavente R, Her C, Hoog C, McLaughlin KJ, Wang PJ. Meiotic failure in male mice lacking an X-linked factor. Genes Dev 2008; 22:682–691. 46. Ward JO, Reinholdt LG, Motley WW, Niswander LM, Deacon DC, Griffin LB, Langlais KK, Backus VL, Schimenti KJ, O’Brien MJ, Eppig JJ, Schimenti JC. Mutation in mouse Hei10, an E3 ubiquitin ligase, disrupts meiotic crossing over. PLoS Genet 2007; 3:e139. 47. Tanabe Y, Hirano A, Iwasato T, Itohara S, Araki K, Yamaguchi T,

PETIT ET AL.

72. 73.

74. 75. 76.

77. 78. 79. 80.

82.

83.

84. Cressman VL, Backlund DC, Avrutskaya AV, Leadon SA, Godfrey V, Koller BH. Growth retardation, DNA repair defects, and lack of spermatogenesis in BRCA1-deficient mice. Mol Cell Biol 1999; 19: 7061–7075. 85. Schwab KR, Smith GD, Dressler GR. Arrested spermatogenesis and evidence for DNA damage in PTIP mutant testes. Dev Biol 2013; 373: 64–71. 86. Loveland KL, Bakker M, Meehan T, Christy E, von Schonfeldt V, Drummond A, de Kretser D. Expression of Bambi is widespread in juvenile and adult rat tissues and is regulated in male germ cells. Endocrinology 2003; 144:4180–4186. 87. De Martino SP, Errington F, Ashworth A, Jowett T, Austin CA. sox30: a novel zebrafish sox gene expressed in a restricted manner at the midbrainhindbrain boundary during neurogenesis. Dev Genes Evol 1999; 209: 357–362. 88. Lioubinski O, Muller M, Wegner M, Sander M. Expression of Sox transcription factors in the developing mouse pancreas. Dev Dyn 2003; 227:402–408. 89. Han F, Liu W, Jiang X, Shi X, Yin L, Ao L, Cui Z, Li Y, Huang C, Cao J, Liu J. SOX30, a novel epigenetic silenced tumor suppressor, promotes tumor cell apoptosis by transcriptional activating p53 in lung cancer. Oncogene (in press). Published online ahead of print 1 December 2014; DOI: 10.1038/onc.2014.370. 90. Hu W, Zheng T, Wang J. Regulation of fertility by the p53 family members. Genes Cancer 2011; 2:420–430. 91. Feng Z, Zhang C, Kang HJ, Sun Y, Wang H, Naqvi A, Frank AK, Rosenwaks Z, Murphy ME, Levine AJ, Hu W. Regulation of female reproduction by p53 and its family members. FASEB J 2011; 25: 2245–2255. 92. Liu J, Zhang C, Hu W, Feng Z. Tumor suppressor p53 and its mutants in cancer metabolism. Cancer Lett 2015; 356:197–203. 93. Djureinovic D, Fagerberg L, Hallstrom B, Danielsson A, Lindskog C, Uhlen M, Ponten F. The human testis-specific proteome defined by transcriptomics and antibody-based profiling. Mol Hum Reprod 2014; 20: 476–488.

18

Article 71

Downloaded from www.biolreprod.org.

81.

chromatin remodeling by Forkhead box proteins. Biochim Biophys Acta 2012; 1819:707–715. Goertz MJ, Wu Z, Gallardo TD, Hamra FK, Castrillon DH. Foxo1 is required in mouse spermatogonial stem cells for their maintenance and the initiation of spermatogenesis. J Clin Invest 2011; 121:3456–3466. Schmidt D, Ovitt CE, Anlag K, Fehsenfeld S, Gredsted L, Treier AC, Treier M. The murine winged-helix transcription factor Foxl2 is required for granulosa cell differentiation and ovary maintenance. Development 2004; 131:933–942. Uhlenhaut NH, Treier M. Forkhead transcription factors in ovarian function. Reproduction 2011; 142:489–495. Herrada G, Wolgemuth DJ. The mouse transcription factor Stat4 is expressed in haploid male germ cells and is present in the perinuclear theca of spermatozoa. J Cell Sci 1997; 110(Pt 14):1543–1553. Wang G, Guo Y, Zhou T, Shi X, Yu J, Yang Y, Wu Y, Wang J, Liu M, Chen X, Tu W, Zeng Y, et al. In-depth proteomic analysis of the human sperm reveals complex protein compositions. J Proteomics 2013; 79: 114–122. Amaral A, Castillo J, Ramalho-Santos J, Oliva R. The combined human sperm proteome: cellular pathways and implications for basic and clinical science. Hum Reprod Update 2014; 20:40–62. Stros M. HMGB proteins: interactions with DNA and chromatin. Biochim Biophys Acta 2010; 1799:101–113. Malarkey CS, Churchill ME. The high mobility group box: the ultimate utility player of a cell. Trends Biochem Sci 2012; 37:553–562. Park S, Lippard SJ. Binding interaction of HMGB4 with cisplatinmodified DNA. Biochemistry 2012; 51:6728–6737. Han F, Dong Y, Liu W, Ma X, Shi R, Chen H, Cui Z, Ao L, Zhang H, Cao J, Liu J. Epigenetic regulation of Sox30 is associated with testis development in mice. PLoS ONE 2014; 9:e97203. Assou S, Anahory T, Pantesco V, Le Carrour T, Pellestor F, Klein B, Reyftmann L, Dechaud H, De Vos J, Hamamah S. The human cumulus– oocyte complex gene-expression profile. Hum Reprod 2006; 21: 1705–1719. Kocabas AM, Crosby J, Ross PJ, Otu HH, Beyhan Z, Can H, Tam WL, Rosa GJ, Halgren RG, Lim B, Fernandez E, Cibelli JB. The transcriptome of human oocytes. Proc Natl Acad Sci U S A 2006; 103:14027–14032.

Suggest Documents