The EMBO Journal vol.3 no.11 pp.2635-2640, 1984

A glyceraldehyde-3-phosphate dehydrogenase pseudogene on the short arm of the human X chromosomes defines a multigene family

F.J.Benham¹,², S.Hodgkinson¹ and K.E.Davies¹

¹Biochemistry Department, St. Mary's Hospital Medical School, London, W2, and ²Imperial Cancer Research Fund, Lincoln's Inn Fields, London, WC2A 3PX, UK

Communicated by R.Williamson

A human X chromosome-derived gene sequence which recognises an abundant 1.2-kb mRNA in several cell types was previously isolated during a study to identify expressed sequences from an X chromosome recombinant library. Further characterisation of this clone, acronym OA1, has shown that it maps to the short arm of the X, at Xp21 to Xp22. A 777-bp fragment of the clone which hybridises to the 1.2-kb mRNA has been sequenced, and the inferred amino acid sequence shows 80% homology with the published protein sequence for human muscle glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The fragment shows even higher homology (87%) with pig muscle GAPDH. The OA1 clone selects an mRNA which translates in vitro into a polypeptide of 36 kd, the subunit size of GAPDH. However, the X sequence is most probably a pseudogene whose structure is consistent with it having arisen by reverse transcription of a GAPDH or GAPDH-related mRNA followed by insertion into the X chromosome. The GAPDH-related portion of OA1 hybridises to several DNA fragments in human and mouse DNA, and six fibroblast cDNA clones which cross-hybridise to OA1 identify the same genomic fragments as OA1. This series of clones identifies a new, conserved GAPDH-related multigene family.

Key words: glyceraldehyde-3-phosphate dehydrogenase/pseudogene/X chromosome/multigene family

Introduction
The identification and characterisation of sequences on the X chromosome which are expressed, or are related to expressed genes, is important for an understanding of the role of the X chromosome in development, sex determination and X-linked genetic disease (Davies et al., 1983). We previously isolated a series of X-coded genes from a human X chromosome recombinant library which recognise a single or a few mRNA species (Benham and Davies, in preparation). One of these sequences, OA1, which recognises an abundant 1.2-kb transcript in mRNA from all human cells examined, was chosen for further analysis.
We now show that part of the OA1 sequence, which maps to the short arm of the X chromosome, shows close but not complete homology with the peptide sequence of the enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH; EC 1.2.1.12) purified from human muscle. Only one gene locus for GAPDH, an enzyme which occupies a central role in the glycolytic pathway, has been identified in man, and it maps to chromosome 12 (Bruns and Gerald, 1976). The OA1 sequence identifies a new multigene family, and this X-linked member is shown by structural evidence to be a pseudogene.

IRL Press Limited, Oxford, England.

Results
The OA1 clone was isolated from an X chromosome recombinant lambda library as a sequence which recognises a major mRNA species of 1.2 kb in human cells and in human/mouse hybrid cells which contain a human X chromosome (Benham and Davies, in preparation). The recombinant clone contains a 6.3-kb EcoRI genomic fragment. Figure 1 shows the transcripts recognised by the cloned genomic insert of OA1 in Northern blot analysis of a variety of tissues and cells. The 1.2-kb mRNA was present in all cell types examined. In rapidly dividing tissue culture cells, this mRNA was as abundant as actin mRNA, as shown by analogous experiments using cloned γ-actin as probe (data not shown). In post-mortem tissues the signal from the 1.2-kb mRNA was ~2- to 5-fold weaker. Cultured mouse cells also showed an abundant mRNA of the same mol. wt., even at high stringency (0.1 × SSC, 50°C), indicating the presence of a homologous rodent mRNA species. In all human samples, a second, 0.4-kb mRNA gave a weak signal with the OA1 probe. The signal from this small mRNA was stronger in total RNA than in poly(A)+ mRNA, in contrast to the 1.2-kb mRNA, which was enriched in poly(A)+ populations compared with total RNA. A 300-bp subclone of OA1 which recognised only this small mRNA was shown to have weak sequence homology with the Alu repeat family (data not shown). The OA1 sequence hybrid-selected an mRNA species from human B cells which translated into a polypeptide of 36 000 daltons in the rabbit reticulocyte lysate system (Figure 2); 36 kd is the known subunit size of GAPDH. The OA1-selected polypeptide was easily detected after a short autoradiographic exposure (1-2 days), whereas an HLA class I gene-selected polypeptide required at least 5 days of exposure to give a signal under similar conditions. HLA class I gene transcripts are present at 0.05-0.1% of poly(A)+ mRNA (Ploegh et al., 1980).
X chromosome linkage of the OA1 clone was confirmed by Southern blot analysis of DNA from human cells, mouse cells and a human X-only somatic cell hybrid line, HORL9X. Figure 3 shows that OA1 hybridises to a series of bands in both human and mouse DNA, even under conditions of high stringency (0.1 × SSC, 65°C). The most intensely hybridising band in EcoRI-cut human DNA was identical in size to the OA1 insert (6.3 kb), and this 6.3-kb fragment is located on the X chromosome, as indicated by its presence in the HORL9X hybrid but not in the mouse parental cell line IR. Southern analysis of DNA from two additional somatic cell hybrid lines, one which contains only Xq (1W1-5; Hope et al., 1982) and one which contains several human autosomes and Xp21 to Xqter as its only human X material (Wieacker et al., 1984), showed that the cognate 6.3-kb band maps to the short arm of the X chromosome at or above Xp21: the 6.3-kb band is absent from both of these hybrids (Figure 3). Analysis of a hybrid line which contains an X chromosome with a deletion between Xp11 and Xp22 has further localised the clone to Xp21 to Xp22 (C.Ingle, R.R.Herva, R.Williamson, A.de la Chapelle and K.E.Davies, unpublished). In addition to the 6.3-kb X-linked fragment, several other DNA fragments cross-hybridised to the OA1 clone (Figure 3). At least some of these fragments are located on the autosomes, as shown by their absence from human X-only hybrid cells.

Fig. 1. Northern blot showing mRNAs recognised by OA1 and two cross-hybridising cDNA clones in human and mouse cells and tissues. Tracks contain 1.5 µg of poly(A)+ mRNA. (A) OA1 as probe; gel exposed for 2 days. 1, fetal muscle; 2, placenta; 3, human embryonal carcinoma line 2102Ep; 4, fibroblast; 5, T-cell line MOLT4; 6, B-cell line G3.32.2; 7, mouse embryonal carcinoma cell line PCC4. (B) All tracks contain poly(A)+ mRNA from the B-cell line G3.32.2; gel exposed for 5 days. 1, OA1 as probe; 2 and 3, two fibroblast cDNA clones as probes.

Fig. 2. SDS-polyacrylamide gels showing polypeptides translated in the rabbit reticulocyte lysate system. 1, mRNA selected by OA1; 2, control: no mRNA added; 3, control: mRNA selected by lambda DNA; 4, poly(A)+ mRNA from B-cell line G3.32.2.

Fig. 3. Southern blots showing EcoRI bands recognised by OA1 in human, mouse and hybrid cell DNA (10 µg DNA per track). 1, HORL9X (human X-only hybrid); 2, human female B-cell line; 3, IR (mouse parent); 4, 697X175K27 (hybrid with Xp21 to Xqter translocation); 5, HORL9X; 6, RAG (mouse parent of 697X175K27); 7, 1W1-5 (hybrid with long arm of human X).

Sequence data
The restriction endonuclease MspI digested the 6.3-kb OA1 genomic clone into six resolvable fragments. Two of the
fragments, a 777-bp and a 300-bp fragment, gave signals when probed on Southern blots with in vitro synthesised cDNA (average size 600 bases). These two fragments were subcloned into the AccI site of the M13 vector mp8 and sequenced by the chain termination method of Sanger et al. (1977). An additional 1400-bp MspI fragment contiguous with the 777-bp fragment gave a signal when cDNA of longer average length (≥1000 bp) was synthesised and used as probe, but this fragment has not yet been analysed. The sequence of MspL2 (the 777-bp MspI fragment of OA1) is shown in Figure 4. One reading frame is open until nucleotide 426, which is followed by a STOP codon. The translated sequence of the first 426 bp, which encodes 142 amino acids, shows 80% homology with GAPDH purified from human skeletal muscle (115 out of 142 amino acids; Table I). Even higher homology was seen with pig muscle GAPDH: 124 out of 142 amino acids, or 87%. The start of the MspL2 clone corresponds to amino acid 190 of the human muscle enzyme, and the translated sequence continues to the end of the protein at amino acid 332. The STOP codon is followed by 175 bp, at which point there is the poly(A) addition signal AATAAA (Proudfoot and Brownlee, 1976), followed by 17 bases and a run of 14 As. This 37-base stretch follows a pattern shared by most known polyadenylated mRNAs (Darnell et al., 1973). This feature, together with the homology of the coding sequence to GAPDH, strongly suggests that the X-derived OA1 sequence is a pseudogene which has arisen by reverse transcription of a processed, polyadenylated mRNA (Hollis et al., 1982). The lack of introns would support this interpretation, although it is not known whether the expressed genomic sequences for GAPDH in fact have introns. Immediately following the run of As there is a simple repeat sequence, GAGAGAGA, followed by 127 bp. A 26-base stretch at the 3' end of the sequence, between nucleotides 739 and 765, shows high homology with the consensus sequence of the human Alu interspersed repeat family (3 mismatches in 26 bases; Deininger et al., 1981).

Approximately 60% of the amino acids which differ between the translated MspL2 sequence and the human muscle GAPDH sequence, and between MspL2 and pig muscle GAPDH, represent semi-conservative changes, e.g., Ile to Leu, Ala to Leu, Gln to Glu, Asp to Asn, Val to Leu. However, at position 231 in the GAPDH protein, where there is normally an Arg residue, the MspL2 sequence would have a Cys. Arg 231 is conserved in all GAPDH proteins sequenced (including those of man, pig, lobster, yeast and bacteria) and is involved in binding inorganic phosphate during phosphorolysis of the glyceraldehyde-3-phosphate substrate (Harris and Waters, 1976). Substitution of a Cys residue at position 231 would be likely to seriously impair the binding of the phosphate group of the substrate and to disrupt the normal catalytic properties of the enzyme (T.Blundell and Lynn Sibinda, personal communication).

Fig. 4. Nucleotide sequence of MspL2 (a 777-bp subclone of the X-derived OA1 sequence) and the corresponding amino acid sequence. The Alu-like sequence is underlined by dashes.

cDNA clones
A fibroblast cDNA library (Woods and Clarke, St. Mary's, unpublished) was screened for recombinants which hybridised to the OA1 sequence. Six recombinant clones were shown to cross-hybridise on Southern blots to the MspL2 fragment of OA1, which contains GAPDH-related coding sequence. All six clones recognised a unique, abundant 1.2-kb mRNA in a variety of human cells in Northern blot analysis (e.g.,
Figure 1). The MspL2 subclone of OA1 and the two cDNA clones were used to probe human, mouse and human X-only hybrid DNA (Figure 5). The Southern blot patterns given by the MspL2 subclone and the six cDNA clones were nearly identical to that of the whole OA1 clone, except that with the genomic clones the strongest band was the 6.3-kb X-linked band, whereas with the cDNA clones as probes this band, though present, gave a signal roughly equivalent to those of the other bands. No extra bands were seen in the presence of the Y chromosome. These patterns suggest that the cDNA clones may not be derived from the X chromosome, although further analysis using gene-specific sequences is necessary to determine the chromosomal location of the expressed genes. These data indicate that there is a family of highly conserved GAPDH-related gene sequences in the human and mouse genomes. Thirteen or more bands are resolved in human DNA, ranging in size from ~1.5 to 20 kb. An estimate of the number of gene sequences that these bands represent requires further knowledge of the cDNA clones and of the intron/exon structure of the genomic clones. Partial sequence analysis of two of the cDNA clones has not yet revealed homology with MspL2, suggesting that the 3'-untranslated region of the cDNA clones differs from the corresponding part of the genomic clone MspL2 (data not shown).

Discussion
The OA1 clone, located on the short arm of the human X chromosome, has defined a multigene family of GAPDH-related sequences. Only one gene locus for GAPDH has been identified in man by isozyme analysis in an extensive survey of human tissues (Edwards et al., 1976), and this has been mapped to chromosome 12 (Bruns and Gerald, 1976). However, three functional genes for GAPDH have been identified in yeast (Holland et al., 1979; Musti et al., 1983), and in man and other mammals other enzymes of the glycolytic pathway, e.g., hexokinase, enolase and aldolase (Grossbard and Schimke, 1968; Cory and Wold, 1966; Penhoet et al., 1966), are encoded by more than one gene locus. Sequence analysis of the fibroblast and other cDNA clones may reveal whether more than one gene locus for GAPDH is expressed in human tissues. The GAPDH-related gene family has several members whose sequences appear to be highly conserved both within species and between species as divergent as man and mouse. The structure of the MspL2 subclone of OA1 corresponds to the general structure of a processed, polyadenylated mRNA rather than to that of an expressed, coding genomic sequence (Darnell et al., 1973). The structural features of processed mRNAs, i.e., a polyadenylation site followed 17-20
nucleotides later by a run of As, and the lack of introns, have previously been observed for members of other gene families (Wilde et al., 1982; Chen et al., 1982; Lemishka and Sharp, 1982). It is generally accepted that these processed gene sequences represent unexpressed pseudogenes which have arisen by integration into the genome of a reverse-transcribed, mature mRNA, although the mechanism by which the insertion occurs is unknown (Hollis et al., 1982). By analogy, the X-derived OA1 sequence represents a pseudogene. Comparison of its complete sequence with the genomic sequence of the expressed gene from which it arose, and with the corresponding mRNA, will reveal the extent to which it has evolved without selective pressure following its integration into the X chromosome.

Table I. A comparison of the inferred amino acid sequence of MspL2 with the published protein sequence of GAPDH purified from human muscle (Nowak et al., 1981) and pig muscle (Harris and Perham, 1968)
[Table body: residue-by-residue alignment of the OA1, human (MAN) and pig (PIG) sequences over residues 190-332, in rows beginning at residues 190, 210, 230, 250, 270, 290, 310 and 330.]

Fig. 5. Southern blots showing EcoRI bands recognised by two fibroblast cDNA clones and the MspL2 subclone of OA1. Probes were: (A) MspL2; (B) OA1 complete; (C) cDNA clone 6A; (D) cDNA clone 8A. 1, HORL9X (human X-only hybrid); 2, human female B cell; 3, IR (mouse parent).

The inferred amino acid sequence of MspL2 showed closer homology with the pig muscle GAPDH sequence than with the human muscle GAPDH sequence: 18 out of 142 amino acids differed between MspL2 and pig GAPDH, compared with 27 out of 142 between MspL2 and human GAPDH. In addition, pig and human muscle GAPDH were more similar to each other over the same 142 amino acids (14 differences) than either was to MspL2. More than 50% of the amino acid differences between OA1 and the known GAPDHs represent semi-conservative changes. These observations suggest that
the OA1 pseudogene sequence was not recently derived from the human GAPDH sequence expressed in muscle, and a possible interpretation of the data is that OA1 is a pseudogene derived from a different, but closely related, GAPDH sequence. The GAPDH-related gene family may encode enzymes with similar sequences but different functions or substrate specificities, e.g., a glyceraldehyde-binding enzyme rather than a glyceraldehyde-3-phosphate-binding enzyme. Three-dimensional modelling of closely related members of this gene family at the protein level may reveal functional differences between them.

An alternative explanation for the observed discrepancies between the sequences is that the published amino acid sequence of human GAPDH may contain some errors. A direct comparison of the protein sequence with the nucleic acid sequence of cloned muscle cDNAs for GAPDH will resolve this point. Five of the amino acid differences between OA1 and human GAPDH are either Asn to Asp or Gln to Glu, differences which may represent deamidation of the primary protein product. Deamidation of Asn and Gln residues may be a mechanism for generating the secondary isozymes of GAPDH observed in muscle extracts (Edwards et al., 1976).

The Northern blot data showed that the OA1 sequence is sufficiently homologous to a 1.2-kb transcript in all cells examined to give a strong signal. The cloned sequence also hybrid-selects an mRNA which translates into a polypeptide of mol. wt. 36 000, the subunit size of GAPDH. Only one size class of mRNA was observed in the range of human and mouse tissues studied, so further analysis is required to determine whether the 1.2-kb transcripts are the products of one or more similar GAPDH genes. Southern analysis suggests that there are several, and perhaps up to 20 or so, GAPDH-related sequences in both the human and the mouse genome.
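The homology figures in the Results and Discussion are simple counts over the 142 aligned residues of MspL2 and muscle GAPDH. A minimal Python sketch of that arithmetic follows. The function names and the deamidation-pair check are illustrative assumptions, not software used in this study, and the sketch assumes a gap-free alignment of equal-length sequences in one-letter codes. With 27, 18 and 14 differences, the identities come out at about 81% (MspL2 vs human, quoted as 80% in the Results), 87% (MspL2 vs pig) and 90% (human vs pig).

```python
"""Percent identity and simple substitution checks for gap-free alignments.

Illustrative sketch: helper names are hypothetical, and sequences are assumed
to be pre-aligned, equal in length and written in one-letter codes.
"""

# Asn/Asp and Gln/Glu: the substitution pairs the Discussion attributes to
# possible deamidation of the primary protein product.
DEAMIDATION_PAIRS = {frozenset({"N", "D"}), frozenset({"Q", "E"})}


def percent_identity(differences: int, length: int) -> float:
    """Percentage of aligned positions that are identical."""
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("need length > 0 and 0 <= differences <= length")
    return 100.0 * (length - differences) / length


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def deamidation_like(a: str, b: str) -> bool:
    """True if a residue pair is Asn/Asp or Gln/Glu, in either direction."""
    return frozenset({a.upper(), b.upper()}) in DEAMIDATION_PAIRS


if __name__ == "__main__":
    # Difference counts over 142 aligned residues, as reported in the text.
    for label, diffs in [("MspL2 vs human", 27), ("MspL2 vs pig", 18), ("human vs pig", 14)]:
        print(f"{label}: {percent_identity(diffs, 142):.1f}% identical")
    # The Alu match (3 mismatches in 26 bases) is the same calculation on DNA.
    print(f"Alu stretch: {percent_identity(3, 26):.1f}% identical")
```

Deciding which differences count as semi-conservative would need an explicit substitution classification, which the text illustrates only by example; the sketch does not attempt it.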
This number of gene sequences is characteristic of several gene families now identified, e.g., actins, tubulins, argininosuccinate synthetase (ASS) and tropomyosins. Some of these gene families, such as the tubulins (Wilde et al., 1982) and the tropomyosins (MacLeod and Talbot, 1983), have been shown by structural analysis to contain pseudogenes as well as transcribed, expressed genes. The proportion of the sequences within these gene families which represent pseudogenes has not been determined. The actin (Hanauer et al., 1984), tubulin (Darlington et al., 1982) and ASS (Beaudet et al., 1982) gene families, as well as the small oncogene family of Harvey ras sequences (O'Brien et al., 1983), all have a member on the X chromosome. It has been postulated that these X sequences represent pseudogenes. This report provides the first structural evidence that an X-linked member of a dispersed gene family is a pseudogene. The mechanism by which OA1 arrived on the X chromosome is not known. The simple repeat sequence immediately next to the run of As may have been involved in and/or created by the insertion of a reverse-transcribed GAPDH-related mRNA (Schmid and Jelinek, 1982). Other pseudogenes are flanked by direct repeats at the precise 3' and 5' ends of the reverse-transcribed mRNA (Chen et al., 1982; Hollis et al., 1982; Lemishka and Sharp, 1982). Further structural analysis of this and other pseudogenes on the X, in relation to the corresponding expressed sequences, may reveal properties common to dispersed members of gene families and clues to transposition mechanisms.

Materials and methods

Cell lines
Human cell lines used for mRNA and/or DNA preparations were the B-cell lymphoblastoid lines G3.32.3 and Maja (Povey et al., 1973), the testicular teratocarcinoma-derived line 2102Ep (Andrews et al., 1980), the ALL-derived T-cell line MOLT4 (Minowada et al., 1972), and a male diploid fibroblast line. Human/mouse somatic cell hybrid lines were HORL9X (Goodfellow et al., 1980), which contains a whole X as its sole human genetic component; 1W1-5 (Hope et al., 1982), which contains only the long arm of the human X; and 697X175K27, which contains several human chromosomes plus a translocation carrying Xp21 to Xqter as its only human X component (Wieacker et al., 1984). Mouse cell lines were PCC4AO (referred to as PCC4), a near-diploid embryonal carcinoma line (Jakob et al., 1973); IR, an L cell (Nabholz et al., 1969); and RAG, an adenocarcinoma (Klebe et al., 1970). IR is the mouse parent of HORL9X and 1W1-5; RAG is the mouse parent of 697X175K27.

Cell culture
Cells were grown at 37°C in Dulbecco's modified Eagle's medium or RPMI 1640, supplemented with 10% foetal bovine serum, penicillin and streptomycin. Where appropriate, retention of the human HPRT gene was ensured by supplementing the medium with 100 µM hypoxanthine, 10 µM methotrexate and 16 µM thymidine. Cells which grew in suspension were subcultured by dilution.

RNA isolation
Total cell RNA was obtained by homogenising freshly harvested cells or tissues in 4 M guanidinium isothiocyanate and sedimenting the RNA through a 5.7 M CsCl gradient (Chirgwin et al., 1979). The polyadenylated mRNA fraction was separated by oligo(dT)-cellulose chromatography of the total RNA (Efstratiadis and Kafatos, 1976).

Labelling of nucleic acids
³²P-labelled cDNA was prepared in vitro as described by Woods et al. (1980), using oligo(dT) to prime synthesis on poly(A)+ mRNA. The specific activity achieved was 1.5-2.5 × 10⁸ c.p.m./µg poly(A)+ mRNA. Recombinant clones were nick-translated using [³²P]dCTP to a specific activity of ~5 × 10⁸ c.p.m./µg (Rigby et al., 1977).

Northern blotting
Poly(A)+ mRNA was denatured with 1 M glyoxal and separated by electrophoresis through a 1.1% agarose gel as described by Thomas (1980). The RNA was transferred to nitrocellulose, and the blots were hybridised with the ³²P-labelled nucleic acid probes essentially as described by Thomas (1980), but without the addition of dextran sulphate. Non-hybridised radioactivity was removed by washing the filters to 0.1 × SSC with 0.1% SDS. Filters were exposed to Fuji X-ray film at -70°C with intensifying screens. The sizes of the mRNA species recognised by the labelled probes were determined from a calibration curve constructed from the mobilities of rRNA fragments of known mol. wt. from 2-5A (ppp(A2'p)2A)-treated mouse cell lysates (gift of R.Silverman).

Southern blotting
DNA (10 µg) was digested with EcoRI, electrophoresed through a 0.8% agarose gel and transferred to nitrocellulose. Blots were hybridised as described by Jeffreys and Flavell (1977) with ³²P-labelled nucleic acid probes and then washed to 0.1 × SSC with 0.1% SDS at 65°C. Filters were exposed to X-ray film as for Northern blots.

mRNA selection and cell-free translation
5.0 µg of DNA from the OA1 clones and from a cDNA clone for HLA class I, pHLA-A (a gift of J.Trowsdale, I.C.R.F., London), was denatured and immobilised on nitrocellulose filters. The filters were hybridised to poly(A)+ mRNA isolated from human B lymphoid cells according to the method described by Maniatis et al. (1982). mRNA which hybridised specifically to the cloned DNA was melted off the filters by boiling and subsequently translated in the reticulocyte lysate cell-free system (Pelham and Jackson, 1976). ³⁵S-labelled proteins synthesised in vitro were analysed by one-dimensional SDS-polyacrylamide gel electrophoresis (Laemmli, 1970). Filters with lambda DNA and plasmid DNA were used as negative controls for the mRNA selection.
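In the Northern blotting procedure, mRNA sizes were read off a calibration curve built from rRNA fragments of known size. The text does not say how the curve was fitted; a common approach, assumed here, is a straight-line least-squares fit of log10(size) against migration distance. The minimal Python sketch below shows that calculation; the function names are hypothetical, and the marker distances and sizes in the demo are invented for illustration, not taken from these gels.

```python
import math
from typing import Sequence, Tuple


def fit_log_size_calibration(distances_mm: Sequence[float],
                             sizes_kb: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(size_kb) = slope * distance_mm + intercept."""
    if len(distances_mm) != len(sizes_kb) or len(distances_mm) < 2:
        raise ValueError("need at least two markers, each with a size")
    logs = [math.log10(s) for s in sizes_kb]
    n = len(distances_mm)
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    if sxx == 0:
        raise ValueError("marker distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_kb(distance_mm: float, slope: float, intercept: float) -> float:
    """Interpolate a transcript size (kb) from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Hypothetical markers: migration distance (mm) and size (kb).
    slope, intercept = fit_log_size_calibration([10, 20, 30], [4.0, 2.0, 1.0])
    print(f"band at 25 mm is about {estimate_size_kb(25, slope, intercept):.2f} kb")
```

Interpolation between markers is the safer use of such a curve; extrapolating beyond the largest or smallest marker is less reliable because the log-linear relationship holds only approximately.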

cDNA library screening
The fibroblast cDNA library constructed by D.Woods and B.Clarke (St Mary's, unpublished) was screened by the procedure of Grunstein and Hogness (1975) with the X-derived OA1 lambda clone as probe. DNA from positive recombinants was checked by Southern blot analysis for cross-hybridisation with the MspI fragment of OA1 (MspL2) which contains GAPDH-related coding sequence.


Sequencing
Two MspI fragments of OA1 which hybridised to in vitro synthesised cDNA were cloned into the AccI site of the M13 vector mp8 (Messing and Vieira, 1982). The M13 clones were sequenced using the dideoxy chain termination procedure (Sanger et al., 1977). The sequence of the central 250 bp of the 777-bp clone was confirmed by sequencing HaeIII-generated subclones. The sequence of the OA1 subclone MspL2 was translated into the corresponding protein sequence, and the National Biomedical Research Foundation Protein Data Bank (USA) was searched for homology with known protein sequences using the programmes of Wilbur and Lipman (1983).
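The Results describe two features read directly from the MspL2 sequence: a reading frame open to nucleotide 426 (426/3 = 142 codons) before a STOP codon, and a downstream AATAAA signal followed by 17 bases and a run of 14 As (6 + 17 + 14 = 37 bases). A minimal Python sketch of both scans follows, assuming the standard genetic code. The helper names are hypothetical, and the demo sequence is invented for illustration; it is not taken from Figure 4.

```python
import re
from typing import Optional

# Standard genetic code, codons indexed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons from the first base; unknown codons give 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, len(seq) - 2, 3))


def open_frame_length(seq: str, frame: int = 0) -> Optional[int]:
    """Nucleotides read in `frame` before the first in-frame stop; None if no stop."""
    seq = seq.upper()
    for pos in range(frame, len(seq) - 2, 3):
        if CODON_TABLE.get(seq[pos:pos + 3]) == "*":
            return pos - frame
    return None


def polya_features(seq: str, signal: str = "AATAAA", min_run: int = 10) -> Optional[dict]:
    """Locate the first poly(A) signal and the next run of >= min_run As after it."""
    seq = seq.upper()
    start = seq.find(signal)
    if start < 0:
        return None
    after = start + len(signal)
    run = re.compile(f"A{{{min_run},}}").search(seq, after)
    if run is None:
        return None
    return {"signal_at": start, "spacer": run.start() - after,
            "tail_length": run.end() - run.start()}


if __name__ == "__main__":
    # Invented demo: three codons, a TAA stop, 4 bases, AATAAA, a 17-base spacer, 14 As.
    demo = "GGTAAACTG" + "TAA" + "CCTG" + "AATAAA" + "GCTTCCTGCTTGCTCCT" + "A" * 14 + "GAGAGAGA"
    orf = open_frame_length(demo)
    print(orf, translate(demo[:orf]), polya_features(demo))
```

The signal search scans the whole sequence; restricting it to the region after the stop codon would avoid picking up an AATAAA that happens to lie inside a coding region, which this sketch does not do.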

Acknowledgements
We thank B.Pym, S.John and S.McGlade for excellent technical assistance. A.Kelly, N.Stoker, K.Cheah, Y.Edwards, P.Goodfellow and R.Williamson provided helpful advice and discussion. This work was funded in part by the Medical Research Council and the Muscular Dystrophy Group of Great Britain.

References
Andrews,P.W., Bronson,D.L., Benham,F.J., Strickland,S. and Knowles,B.B. (1980) Int. J. Cancer, 26, 269-280.
Beaudet,A.L., Su,T.S. and O'Brien,W.E. (1982) Cell, 30, 287-293.
Bruns,G.A. and Gerald,P.S. (1976) Science (Wash.), 192, 54-56.
Chen,M.J., Shemada,T., Moulton,A.D., Harrison,M. and Neinhus,A.W. (1982) Proc. Natl. Acad. Sci. USA, 79, 7435-7439.
Chirgwin,J.M., Przybyla,A.E., MacDonald,R.J. and Rutter,J. (1979) Biochemistry (Wash.), 18, 5294-5295.
Cory,R.P. and Wold,F. (1966) Biochemistry (Wash.), 5, 3131-3137.
Darlington,G.J., Maraia,R.J. and Cowan,N.J. (1982) Am. J. Hum. Genet., 34, 170A.
Darnell,J.E., Jelinek,W.R. and Molloy,G.R. (1973) Science (Wash.), 181, 1215-1221.
Davies,K.E., Taylor,P. and Miller,C.R. (1983) Differentiation, 23 (Suppl.), 44-47.
Deininger,P.L., Jolly,D.J., Rubin,C.M., Friedman,T. and Schmid,C.W. (1981) J. Mol. Biol., 151, 17-33.
Edwards,Y.H., Clark,P. and Harris,H. (1976) Ann. Hum. Genet. Lond., 40, 67-77.
Efstratiadis,A. and Kafatos,F.C. (1976) Methods Mol. Biol., 8, 1-124.
Goodfellow,P.N., Banting,G., Levy,R., Povey,S. and McMichael,A. (1980) Somat. Cell Genet., 6, 777-787.
Grossbard,L. and Schimke,R.T. (1968) J. Biol. Chem., 241, 3546-3560.
Grunstein,M. and Hogness,D.S. (1975) Proc. Natl. Acad. Sci. USA, 72, 3961-3965.
Hanauer,A., Heilig,R., Levin,M., Moisan,J.P., Grzechik,K.N. and Mandel,J.L. (1984) Cytogenet. Cell Genet., 37, 487-488.
Harris,J.I. and Perham,R.N. (1968) Nature, 219, 1025-1028.
Harris,J.I. and Waters,M. (1976) in Boyer,P.D. (ed.), The Enzymes, Vol. XIII, Academic Press, NY, pp. 1-50.
Holland,M.J., Holland,J.P. and Jackson,K.A. (1979) Methods Enzymol., 68, 408-419.
Hollis,G.F., Hieter,P.A., McBride,O.W., Swan,D. and Leder,P. (1982) Nature, 296, 321-325.
Hope,R.M., Goodfellow,P.N., Solomon,E. and Bodmer,W.F. (1982) Cytogenet. Cell Genet., 33, 204-212.
Jakob,H., Boon,T.I., Galliard,J., Nicolas,J.F. and Jacob,F. (1973) Ann. Microbiol., 124B, 269-282.
Jeffreys,A.J. and Flavell,R.A. (1977) Cell, 12, 429-439.
Klebe,R.J., Chen,T.R. and Ruddle,F.H. (1970) J. Cell Biol., 45, 74-82.
Laemmli,U.K. (1970) Nature, 227, 680-685.
Lemishka,I. and Sharp,P.A. (1982) Nature, 300, 330-335.
MacLeod,A.R. and Talbot,K. (1983) J. Mol. Biol., 167, 523-537.
Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, NY.
Messing,J. and Vieira,J. (1982) Gene, 19, 269-276.
Minowada,J., Ohnama,T. and Moore,G.E. (1972) J. Natl. Cancer Inst., 49, 891-895.
Musti,A.M., Zehner,Z., Bostian,K.A., Paterson,B.M. and Kramer,R.A. (1983) Gene, 25, 133-143.
Nabholz,M., Miggiano,V. and Bodmer,W.F. (1969) Nature, 223, 358-363.
Nowak,K., Wolny,M. and Banas,T. (1981) FEBS Lett., 134, 143-146.
O'Brien,S.J., Nash,W.F., Goodwin,J.L., Lowy,D.R. and Chang,E.H. (1983) Nature, 302, 839-842.
Pelham,H.R.B. and Jackson,R.J. (1976) Eur. J. Biochem., 67, 247-256.
Penhoet,E., Rajkumar,T. and Rutter,W.J. (1966) Proc. Natl. Acad. Sci. USA, 56, 1275-1282.
Ploegh,H.L., Orr,H.T. and Strominger,J.L. (1980) Proc. Natl. Acad. Sci. USA, 77, 6081-6085.
Povey,S., Gardner,S.E., Watson,B., Mowbray,S., Harris,H., Arthur,E., Steel,C.M., Blenkinsop,C. and Evans,H.J. (1973) Ann. Hum. Genet., 36, 247-266.
Proudfoot,N.J. and Brownlee,G.G. (1976) Nature, 263, 211-214.
Rigby,P.W.J., Dieckmann,M., Rhodes,C.C. and Berg,P. (1977) J. Mol. Biol., 113, 237-251.
Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
Schmid,C.W. and Jelinek,W.R. (1982) Science (Wash.), 216, 1065-1070.
Thomas,P.S. (1980) Proc. Natl. Acad. Sci. USA, 77, 5201-5205.
Wieacker,P., Davies,K.E., Cooke,H.J., Pearson,P.L., Williamson,R., Southern,E., Zimmer,J. and Rogers,H.H. (1984) Am. J. Hum. Genet., in press.
Wilbur,W.J. and Lipman,D.J. (1983) Proc. Natl. Acad. Sci. USA, 80, 726-730.
Wilde,C.D., Crowther,C.E., Cripe,T.P., Gwo-Shu,M. and Cowan,N.J. (1982) Nature, 297, 83-84.
Woods,D., Crampton,J., Clarke,B. and Williamson,R. (1980) Nucleic Acids Res., 8, 5157-5168.

Received on 16 April 1984; revised on 30 July 1984