Sequence heterogeneity within the human alphoid repetitive DNA family

Volume 14 Number 5 1986

Nucleic Acids Research


P.Devilee, P.Slagboom, C.J.Cornelisse¹ and P.L.Pearson

Department of Human Genetics and ¹Department of Pathology, University Medical Center, Leiden, The Netherlands

Received 26 November 1985; Revised and Accepted 13 February 1986

ABSTRACT. We have cloned and determined the base-sequence and genome organization of two human chromosome-specific alphoid DNA fragments, designated L1.26, mapping principally to chromosomes 13 and 21, and L1.84, mapping to chromosome 18. Their copy number is estimated to be approximately 2,000 per haploid genome. L1.84 has a double-dimer organization, whereas L1.26 has a much less defined higher order tandem organization. Further, we present evidence that the restriction-site spacing within the alphoid DNA family is chromosome specific. From sequence analysis, clones L1.26 and L1.84 are found to consist of 5 and 4 tandemly duplicated 170 bp monomers. Cross-homology between the various monomers is 65-85%. The analysis suggests that the evolution of tandem-arrays does not take place via a defined 340 bp unit, as was inferred by others, but via circularly permutated monomers or multimers of the 170 bp unit.

© IRL Press Limited, Oxford, England.

INTRODUCTION. Restriction enzyme analysis has shown that human DNA contains many families of repeated DNAs (1-4). These differ from one another with respect to genomic organization, repeat-lengths and copy number. The AluI-family, for example, comprises approximately 300,000 copies of a short (300 bp) sequence, interspersed among stretches of unique sequences or genes (4). In contrast, the KpnI family is interspersed throughout the genome in longer fragments with a lower repetition frequency (3). The alphoid DNA family (5), so termed because of its homology to the alpha component in the African Green Monkey (6), is an example of a different type of organization, characterized by long arrays of tandemly repeated 170 bp units. It is commonly referred to as "satellite"-DNA, although it is distinct in sequence from any of the human satellite DNA peaks obtained after isopycnic centrifugations (7). In man, the alphoid family forms a pronounced 340 bp and 680 bp band in ethidium bromide stained gels of EcoRI digested genomic DNA. When partial digests are analyzed by Southern blotting, using the 340 bp fragment as probe, a "ladder" of bands is observed, the steps of which correspond to multiples of 170 bp (8). Densitometer scanning suggests that the 340 bp band represents 0.75% of the genome, corresponding to about 55,000 copies (8).

Recently, it has become apparent that the alphoid DNA family can be divided into subfamilies, some of which may be characteristic of specific chromosomes. Thus, the EcoRI-dimer, described by Manuelidis (9), is located predominantly in the centromere regions of chromosomes 1, 3, 7, 10 and 19. A 2.0 kb BamHI-fragment is similarly specific for the X-chromosome (10), while a 5.5 kb EcoRI-fragment characterizes the Y-chromosome (11).

We have isolated two alphoid sequences, designated L1.26 and L1.84, and have found them to be principally localized to the pericentric regions of chromosomes 13 and 21 (L1.26) and chromosome 18 (L1.84) (12). In this article we present the sequence analyses of L1.26 and L1.84 which show that they consist of 5 and 4 tandemly organized alphoid subunits respectively, each approximately 170 bp long. Within the 170 bp units, some regions appear conserved while others are more variable. Comparison of chromosome specific members of the alphoid DNA family will give insight into the evolutionary constraints imposed on DNA sequences adjacent to the centromere.
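The densitometric estimate quoted in the Introduction (0.75% of the genome, about 55,000 copies of the 340 bp band) can be checked with simple arithmetic. The sketch below is only an illustration: the haploid genome size is not stated in the text, so both values tried here are assumptions, and the function name is hypothetical.

```python
def copies_from_fraction(genome_fraction, fragment_bp, haploid_genome_bp):
    """Copies of a fragment implied by the fraction of the genome it occupies."""
    return genome_fraction * haploid_genome_bp / fragment_bp


# Assumed haploid genome sizes in bp (not given in the source).
for genome_bp in (2.5e9, 3.0e9):
    n = copies_from_fraction(0.0075, 340, genome_bp)
    print(f"{genome_bp:.1e} bp genome: about {n:,.0f} copies of the 340 bp fragment")
```

Under these assumptions, a genome size near 2.5 x 10^9 bp gives a figure close to the quoted 55,000 copies, while 3.0 x 10^9 bp gives a somewhat higher one.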

MATERIALS AND METHODS. DNA sources and preparations. Genomic DNA was isolated from cell lines or lymphocytes as described (13). Recombinants L1.26 and L1.84 were selected from a random human recombinant DNA-library (14), containing EcoRI-inserts from DNA restricted to completion cloned in plasmid pAT153. Plasmid-DNA was prepared according to the methods of Maniatis et al. (15).

Cell lines. Human-rodent somatic cell hybrids were obtained as described earlier (16). A hamster hybrid cell line with the X-chromosome as the only retained human material was a kind gift of Dr. S. Coss, Dunn School of Pathology, Oxford.

Southern blotting and hybridizations. Genomic DNA was digested with restriction enzymes as recommended by the supplier, in a final volume of 20 µl. To ensure complete restriction, a threefold excess of enzyme-units was added. Digestion was monitored in a parallel aliquot with phage lambda-DNA as internal marker. After three hours of incubation, the samples were incubated 10 minutes at 65°C, and loaded and electrophoresed on 0.8% agarose gels in Tris-acetate. The separated DNA was blotted onto nylon filters (Gene Screen, New England Nuclear) using standard procedures (17). Overnight hybridization and subsequent washing of the filters was performed at 65°C as described by Jeffreys and Flavell (18). The hybridization-mixture contained 20 mM Tris-HCl pH 7.5, 2 mM EDTA, 3x SSC (0.45 M NaCl, 0.045 M Na-citrate), 0.1 mg/ml salmon sperm DNA, 10x Denhardt's solution (0.2% Ficoll, 0.2% BSA, 0.2% polyvinylpyrrolidone), 0.1% SDS, 5% dextran-sulphate and 5 ng/ml 32P-nick-translated probe (19). Exposure was at -70°C on Sakura film backed by an intensifying screen.

Sequencing-strategy. Both recombinants L1.26 and L1.84 were sequenced using the dideoxy chain termination method (20). The inserts were recloned into the EcoRI-site of M13mp8 (21), and single-strand recombinant phages cultured, containing opposite insert strands. Thus, 250 bp from each end could be sequenced. The inner 200 bp of L1.84 were sequenced from an EcoRI-RsaI fragment (fig.5) subcloned in M13mp10. Subcloning of L1.26 was as follows. The recombinant pAT153 plasmid was linearized at the HindIII site. The insert contains no HindIII sites. In the resulting linear fragment, the insert is at one extreme end. This was treated for various times with the exonuclease Bal-31 (Boehringer Mannheim), and the DNA-fragments blunt-ended with DNA polymerase I (Klenow-fragment, Boehringer). Subsequent digestion with EcoRI yielded EcoRI-blunt insert fragments progressively shortened from one EcoRI-site by approximately 150 bp. These were also subcloned in M13mp10 and single-stranded phages isolated (21). The sequencing reactions were carried out with 35S-dATP (Amersham, 600 Ci/mmol) as label (22), using the New England Nuclear protocols. A 17-base primer was the kind gift of Dr. Van Boom, University of Leiden, The Netherlands. Deoxy- and dideoxy nucleotides were supplied by Boehringer Mannheim. Sequencing-gels were dried on a BioRad slabgel-dryer and exposed overnight at room-temperature on Sakura X-ray film.
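The logic of the Bal-31 strategy is that reads of about 250 bp, started from end points spaced about 150 bp apart, overlap and together span the 849 bp insert of L1.26. The sketch below illustrates this with idealized, evenly spaced deletions. The even spacing and the function names are assumptions made for illustration; real Bal-31 time points give only approximate end points.

```python
def nested_reads(insert_bp, step_bp, read_bp):
    """Intervals (0-based, half-open) read from successive deletion end points."""
    reads = []
    start = 0
    while start < insert_bp:
        reads.append((start, min(start + read_bp, insert_bp)))
        start += step_bp
    return reads


def uncovered_bases(insert_bp, reads):
    """Count positions of the insert not covered by any read."""
    covered = [False] * insert_bp
    for start, end in reads:
        for i in range(start, end):
            covered[i] = True
    return covered.count(False)


reads = nested_reads(insert_bp=849, step_bp=150, read_bp=250)
print(reads)
print("uncovered bases:", uncovered_bases(849, reads))
```

With these idealized numbers, consecutive reads overlap by about 100 bp, which leaves room for the variation expected in actual deletion end points.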

RESULTS. L1.26 and L1.84 belong to the alphoid DNA family. Two independent dot-blot experiments are shown in fig.1, from which the copy number of L1.26 is estimated to be approximately 2,000 per haploid genome. It is unlikely that sequences with less than 95% homology will be detected under the applied hybridization conditions (see Materials and Methods; final washing of the filter in 0.1x SSC). Densitometer scans (not shown) indicate that approximately 60% of hybridizing signal is contained in EcoRI fragments localized on chromosomes 13 and 21 (fig.4, and ref.12). The results with L1.84 (not shown) were similar.

When total human genomic DNA is partially digested with EcoRI, blotted and hybridized with either L1.26 (fig.2, panel A) or L1.84 (panel B), a ladder of bands is formed early during digestion. The lengths of these bands correspond to multiples of approximately 170 bp, indicating that L1.26 and L1.84 belong to the tandemly organized alphoid DNA family. In completely digested samples, L1.26 and L1.84 hybridize to the same ladder-pattern, but with different intensities per band. The largest detectable multimer is a 16-mer in both instances; a fraction of alphoid DNA remains resistant to EcoRI-restriction. Although the 340 bp band is very pronounced in ethidium bromide stained gels, it hybridizes weakly with either probe. Apparently, L1.84 is organized predominantly as a tetramer; multiples thereof (8-, 12-, 16-, 20-mer) are detected early in the course of digestion (panel B, lane b) and the 4-mer and 8-mer are major bands in completely digested samples (panel B, lane f). L1.26 does not cross-hybridize significantly with the tetrameric higher order multimers of L1.84 (compare lanes b, both panels). Its organization is more complex: several longer multimers appear simultaneously (panel A, lane c) and are converted with different kinetics into 8-, 5- and 4-mers. L1.26 contains a 5-mer, which is 0.85 kb in length, and L1.84 contains a 4-mer, 0.68 kb in length (see below). Digestion of genomic DNA with other endonucleases further supports the tandem organization: most enzymes produce ladders with the same fragment-lengths as EcoRI (fig.3).

Figure 1. Dot-blot experiment using L1.26, recloned in M13mp8, as probe. The equivalent of 100 (1); 250 (2); 500 (3); 1,000 (4); 2,500 (5); 5,000 (6); 7,500 (7) and 10,000 (8) copies of the insert-fragment of L1.26 per haploid genome was spotted in duplicate (a,b) on nylon Gene Screen membrane. One µg of total genomic DNA (46,XX) was spotted eight times as a reference (c). Probe was labeled by primer extension (ref.21). Filter was exposed for 4 hr.

Figure 2. Southern analysis of partial digests of 5 µg/lane total genomic DNA obtained with EcoRI, using L1.26 (panel A) or L1.84 (panel B) as probe. Numbers between the panels indicate multiples of 170 bp. Extent of digestion increases progressively from lane a (1/16 x complete) to f (2 x complete) in both panels. Exposure time was two days.

Figure 3. Southern blot of total genomic DNA hybridized with L1.26. Restriction enzymes used are: TaqI (a); XbaI (b); KpnI (c); HaeIII (d); EcoRI (e); BamHI (f); HindIII (g) and Bgl II (h). Exposure was overnight.

Figure 4. Southern analysis of 10 µg of hybrids Cl 2D (panel A) and 34-2-3 B3 (panel B), containing chromosome X and chromosome 13 as their only retained human material respectively, using L1.26 as probe. Digestions were with EcoRI (panel A, lane a; panel B, lane c), BamHI (panel A, lane c; panel B, lane a) or with both (panels A and B, lanes b). 'M' is marker. Panel C shows partial digestion of total genomic DNA with BamHI (5 µg/lane). Extent of digestion increases progressively from lane a (1/16 x complete) to lane f (2 x complete). Exposure times: Panel A six days, Panel B three days, Panel C two days.
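The band sizes quoted for the EcoRI ladder follow directly from the 170 bp unit. A minimal sketch of that arithmetic is given below; the helper names are hypothetical, and the exact 170 bp unit is an idealization, since the text describes the spacing as approximate.

```python
UNIT_BP = 170  # approximate alphoid monomer length quoted in the text


def multimer_kb(n_units):
    """Expected length in kb of an n-unit multimer."""
    return n_units * UNIT_BP / 1000


def multimer_order(fragment_bp):
    """Nearest whole number of 170 bp units in a fragment."""
    return round(fragment_bp / UNIT_BP)


for n in (4, 5, 8, 16):
    print(f"{n}-mer: {multimer_kb(n):.2f} kb")

print(multimer_order(850), multimer_order(680))
```

The 4-mer and 5-mer come out at 0.68 kb and 0.85 kb, the sizes assigned to L1.84 and L1.26, and the largest detectable multimer (16-mer) at 2.72 kb.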

Specific repeat-lengths reside on specific chromosomes. Although L1.26 is mainly restricted to chromosomes 13 and 21 (12), it also detects homologous alphoid sequences on the X-chromosome. The organization, however, of alphoid sequences on chromosomes X and 13 is clearly different and indicates that they followed a distinct evolutionary history. When L1.26 is hybridized to a hybrid cell line containing a single human X-chromosome, it is found that the tandem-structures are organized in large EcoRI fragments of 2-10 kb (fig.4, panel A, lane a). In contrast, in a chromosome 13 hybrid, EcoRI-fragments are mostly 0.68 and 0.85 kb in length, with a few multimers up to 1.7 kb (panel B, lane c), consistent with the overall organization of L1.26 (fig.2). Similarly, BamHI sites are almost absent from L1.26 and its homologs on chromosome 13 (fig.4, panel B, lane a), while the homologs on the X-chromosome all show up as BamHI-multimers of 2.1, 3.0 and 4.0 kb (panel A, lane c). Subsequent digestion with EcoRI reduces these multimers somewhat and yields a 1.6 kb fragment (panel A, lane b). Partial digests of genomic DNA with BamHI (fig.4, panel C) show that the organization of L1.26 homologs in the total genome is similar to their organization on chromosome 13 or the X-chromosome: either in very large fragments or in tandems of approximately 2 kb units.

Heterogeneity at the sequence level. 1. Organization. The base sequence of both L1.26 and L1.84 is presented in fig.5; their respective lengths are 849 bp and 684 bp. Both sequences reflect the tandemly repeated organization of alphoid DNA. Most restriction-sites appear with a 170 bp spacing: HinfI at positions 288, 629 and 798 in L1.26; DdeI at positions 43, 213, 379 and 547 in L1.84. A comparison with two monomeric EcoRI-units reported by Wu and Manuelidis (1), termed here α-1 and α-2, shows that the homology with these monomers starts at an EcoRI-like recognition sequence at position 26 in L1.26 and position 39 in L1.84 (elaborated in fig.6B). This shift in EcoRI restriction sites results in the last 142 bp of both sequences being homologous to the first 142 bp of α-1 and α-2. Similarly, the first 25 bp of L1.26 and L1.84 are homologous to the last 25 bp in both the α-repeats. Between these regions lie several complete monomers, 4 in L1.26, 3 in L1.84. Their lengths are 171 bp, or slightly less, the shortest unit being 166 bp within L1.84 (position 209-374).
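The site positions quoted above can be compared with the 170 bp periodicity by taking the distance between neighbouring sites and expressing it as a whole number of units plus a remainder. The sketch below uses only the positions given in the text; the function names are hypothetical.

```python
UNIT_BP = 170

# Site positions as quoted in the text (1-based, numbering of fig.5).
SITES = {
    "HinfI in L1.26": [288, 629, 798],
    "DdeI in L1.84": [43, 213, 379, 547],
}


def spacings(positions):
    """Distances between consecutive sites."""
    return [b - a for a, b in zip(positions, positions[1:])]


def units_and_remainder(distance, unit=UNIT_BP):
    """Nearest number of units and the deviation from that multiple, in bp."""
    n = round(distance / unit)
    return n, distance - n * unit


for name, positions in SITES.items():
    for d in spacings(positions):
        n, rest = units_and_remainder(d)
        print(f"{name}: {d} bp = {n} x {UNIT_BP} bp {rest:+d} bp")
```

Every spacing falls within a few base pairs of one or two monomer lengths, in line with the monomer lengths of 166 to 171 bp reported for the two clones.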
However, the new phase of EcoRI sites with respect to the α-1 and α-2 units does not break up the typical 170 bp spacing of EcoRI sites in alphoid DNA. Fig.6A shows the position of average nucleotide homology relative to EcoRI restriction sites in L1.26 and L1.84 relative to a hypothetical tandem-structure of α-1 units. An interesting feature of the sequence of L1.84 is the presence of a 14 bp direct repeat at position 16 (arrows fig.5). If α-1 and/or α-2 were tandemly repeated within L1.84, this small perfect direct repeat would be located at the border between two of the repeated units, thereby disturbing their contiguous organization. This direct repeat is the reason why L1.84 itself is somewhat longer than 4 times 170 bp.
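The 14 bp direct repeat can be located by a simple scan for two identical, adjacent 14 bp windows. The sketch below runs such a scan on the first 60 bases of L1.84 as printed in fig.5. It is an illustration rather than the authors' procedure, and the function name is hypothetical.

```python
# First 60 bases of L1.84, transcribed from fig.5.
L1_84_PREFIX = (
    "AATTCATCAA" "ATTGCAGACT" "GCAGCGTTCA"
    "GACTGCAGCG" "TTCTGAGAAA" "CATCTTTGTG"
)


def tandem_repeat_starts(seq, unit_len):
    """1-based start positions where a unit is immediately followed by an identical copy."""
    starts = []
    for i in range(len(seq) - 2 * unit_len + 1):
        if seq[i:i + unit_len] == seq[i + unit_len:i + 2 * unit_len]:
            starts.append(i + 1)
    return starts


starts = tandem_repeat_starts(L1_84_PREFIX, 14)
print(starts)
print(L1_84_PREFIX[15:29], L1_84_PREFIX[29:43])
```

The scan reports two overlapping start positions, 15 and 16, because the base preceding the repeat happens to match the last base of each copy. The window starting at position 16 is the one referred to in the text.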


L1.26 (849 bp)
  1 AATTCAAATA AAAGGTAGAC AGCAGCATTC TCAGAAATTG CTTTCTGATG TCTGCATTCA ACTCATAGAG TTGAAGATTC CCTTTCATAG
 91 AGCAGGTTTG AAACACTCTT TCTGGAGTAT CTGGATGTGG ACATTTGGAG CGCTTTGATG CCTACGGTGG AAAAGTAAAT ATCTTCCCAT
181 AAAAACGAGA CAGAAGGATT CTCAGAAACA AGTTTGTGAT GTGTGTACTC AGCTAACAGA GTGGAACCTT TCTTTTTACA GAGCAGCTTT
271 GAAACTCTAT TTTTGTGGAT TCTGCAAATT GATATTTAGA TTGCTTTAAC GATATCGTTG GAAAAGGGAA TATCGTCATA CAAAATCTAG
361 ACAGAAGCAT TCTCACAAAC TTCTTTGTGA TGTGTGTCCT CAACTAACAG AGTTGAACCT TTCTTTTGAT GCAGCAATTT GGAAACACCT
451 TTTGGTAGAA AATGTAAGTG GATATTTGGA TAGCTTAACG ATTTCGTTGG AAACGGGAAT ATCATCATCT AAAATCTAGA CAGAAGCACT
541 ATTAAGAAAC TACTTGGTGA TATCTGCATT CAAGTCACAG AGTTGAACAT TCCCTTACTT TGAGCACGTT TGAAACACTC TTTTGGAAGA
631 ATCTGGAAGT GGACATTTGG AGCGCTTTGA TGCCTTTGGT GAAAAGGAAA CGTCTTCCAA TAAAAGCCAG ACAGAAGCAT TCTCAGAAAC
721 TTGTTCGTGA TGTGTGTACT CAACTAAAAG AGTTGAACCT TTCTATTGAT AGAGCAGTTT TGAAACACTC TTTTTGTCGA TTCTGCAAGT
811 GGATATTTGG ATTGCTTTGA GGATTTCGTT GGAAGCGGG

L1.84 (684 bp)
  1 AATTCATCAA ATTGCAGACT GCAGCGTTCA GACTGCAGCG TTCTGAGAAA CATCTTTGTG ATGTTTGTAT TCAGGACACC AGAGTTGAAC
 91 ATTCCCTATC ATAGAGCAGG TTTGAATCAC TCCTTTTGTA GTATCTGGAA GTGGACATTT GGAGGCTTTC AGGCCTATGT TGGAAAAGGA
181 AATATCTTCC ATAACAACTA GACAGAAGCA TTCTCAGAAC TTATTTGAGA TGTGTGTACT CACACTAAGA GAATTGAACC ACCGTTTTGA
271 AGGAGCAGTT TTGAAACACT CTTTTTCTGG AATCTGCAAA GTGGATATTT GGCTAGCTTT GGGGATTTCG CTGGAACGGA ATACATATAA
361 AAAGCACACA GCAGCGTTCT GAGAAACTGC TTTCTGATGT TTGCATTCAA GTCAAAAGTT GAACACTCCC TTTCATAGAG CAGTCCTGAA
451 ACACTCTTTT GTAGTATCTG GAACTGGACT TTTGGAGCGC TTTCAGGGCT AAGGTGAAAA AGAAATATCT TCCCATAAAA ACTGGACAGA
541 ATCATTCTCA GAAACTTGTT TATGCTGTAT CTACTCAACT AACAAAGTTG AACCTTTCTT TTGATAGAGC AGTTTTGAAA TGCTCTTTTT
631 GTGGAATCTG CAAGTGGATA TTTGGTTAGT TTTGAGGATT TCGTTGGAAG CGGG

Figure 5. Complete nucleotide sequences of L1.26 and L1.84. Numbered arrows above the sequences indicate the start of homology to the reported 171 bp alphoid consensus sequence (ref.1). 'END' marks the end of homology to the last 25 bp of the latter. Restriction-sites are indicated by filled triangles. These are: AccI (A); DdeI (D); HinfI (H); RsaI (R) and XbaI (X). A 14 bp direct repeat at position 16 in L1.84 is indicated by horizontal arrows.

2. Cross-homologies. We have aligned the monomer units of L1.26 and L1.84 and compared them to several reported alphoid units (fig.6B and Table I). These are α-1 and α-2 (1); α-X, the consensus sequence from the human X-chromosome (23); α-Y, a monomer originating from the Y-chromosome (11); and SPC-1, a monomer detected on small polydisperse circular DNA (24). The cross-homology between the units of our probes and α-1 and α-2 was found to vary between 68% and 82% with an average of 75% and 73% for L1.26 and L1.84 respectively (Table I). Slightly lower homologies were noted between our probes and SPC-1 (72% and 70% resp.), and α-Y (71% and 72% resp.), whereas the α-X sequence seems somewhat more related at approximately 80% homology in both instances. Between L1.26 and L1.84 there exists an overall cross-homology of 75%, although much higher homologies can be detected when smaller regions are compared (e.g. 92% between the last 100 bp of both sequences, not shown). Thus, after comparing overall homologies, it appears that no specific tandem-sequence reported so far is significantly more related to any of the others. This may be explained by the scattered distribution of base substitutions among the fourteen compared monomer sequences (fig.6B). About 70% of all positions underwent two or fewer base changes.

Within L1.84, the first full length monomer is 82% homologous to the third and 67% and 71% to the second and fourth respectively. The second monomer shares 82% homology to the fourth monomer. This distribution of homologies is suggestive of a basic 340 bp homology as proposed by Wu and Manuelidis (1) for the consensus alphoid structure. In L1.26, however, the various cross-homologies do not show such a spacing pattern; the first full length monomer is 81% homologous to the fourth monomer, while the second monomer is 80% homologous to the third. These two units are also closely related to the fifth (not full length) monomer. If homologies of more than 80% are grouped, another kind of spacing, represented by 'a-b-b-a-b', may be proposed.

Figure 6. A. Position of average nucleotide homology relative to restriction sites in L1.26, L1.84 and α-X relative to a hypothetical tandem-structure of α-1 units. B. Comparison of the monomer sequences of L1.26, L1.84, the human α-dimer (ref.1), α-X (ref.23), α-Y (ref.11) and SPC-1 (ref.24). Comparison is made relative to α-2. Only bases which differ from this sequence are shown. Deletions are indicated by (-), positions where more than three base changes occur by (●). For maximum alignment, bases 18, 80, 243, 310 and the 14 bp repeat were deleted from L1.84, base 14 was deleted from L1.26. Monomer numbers of L1.26 and L1.84 correspond to numbering in fig.5.
Monomer positions (fig.5): L1.26 start 1-25; L1.26-1 26-196; L1.26-2 197-367; L1.26-3 368-536; L1.26-4 537-707; L1.26-5 708-849. L1.84 start 1-27; L1.84-1 39-208; L1.84-2 209-374; L1.84-3 375-542; L1.84-4 543-684.

Table I. Sequence homologies between the monomers of L1.26 and L1.84, both monomers of the human α-dimer (ref.1), the consensus α-X monomer (ref.23), a monomer found on small polydisperse circular DNA (ref.24), and α-Y, a monomer derived from the human Y-chromosome (ref.11). Numbers above the diagonal represent percent identity of the two compared sequences. Numbers below the diagonal represent mean cross-homology of the sequences that fall within the boxed region. Monomer numbers of L1.26 and L1.84 correspond to numbering in fig.5.
L1.84 L1.26 4 α-1 α-2 α-X α-Y SPC-1 3 2 4 1 5 2 3 1 74 82 74 72 76 75 84 82 69 81 72 75 L1.26/1 - 71 70 69 76 77 73 80 80 65 85 72 73 67 L1.26/2 73 70 77 81 70 82 68 71 75 69 85 L1.26/3 70 70 73 69 75 75 66 77 65 69 L1.26/4 73 85 73 90 75 82 73 85 71 L1.26/5 72 80 75 77 73 71 67 82 L1.84/1 67 68 76 70 71 82 65 75 ± 10 L1.84/2 71 79 73 71 68 75 L1.84/3 70 77 82 70 73 L1.84/4 69 73 78 75 74 ± 3 α-1 73 ± 3 73 71 84 72 ± 5 76 ± 6 α-2 78 81 79 ± 3 80 ± 5 α-X 76 72 ± 2 71 ± 2 α-Y - SPC-1 72 ± 2 70 ± 3

3. Sequence-conservation. Because of the scattered distribution of base substitutions, conserved regions are not easily defined when all alphoid sequences are compared. Only when more variable positions are first defined (dots in fig.6B), do some small relatively more conserved regions become apparent. These include positions 3-13, 17-26, 42-51, 75-89, 95-111 and 140-148 and show a clustering of positions where no base change occurred at all.

DISCUSSION. We have examined the genomic organization of two human alpha-satellite DNA-sequences, designated L1.26 and L1.84. They were previously shown to map predominantly to chromosomes 13, 21 and 18 respectively (12). Several lines of evidence suggest that both sequences represent distinct subgroups of the alphoid DNA family: (a). Under our hybridization conditions, the copy-number of L1.26 and L1.84 is about 2,000 per haploid genome each. Since the 340 bp EcoRI-fragment is estimated to be present in about 55,000 copies (8), this indicates that each probe detects a small subset of alphoid DNA sequences. (b). Southern hybridizations to EcoRI-digested genomic DNA show that both probes hybridize to the same series of 170 bp multimers, but each produces a signal of different intensity per band. Further, L1.84 is largely organized into tetrameric units whereas L1.26 has a more complex organization. (c). Sequence analysis of L1.26 and L1.84 shows that they each have diverged about 25% from the 340 bp EcoRI-fragment reported by Wu and Manuelidis (1), which is the reason for their poor hybridization to this band. L1.26 and L1.84 also show a 25% sequence divergence between one another.

It seems, therefore, that this family is a highly heterogeneous collection of sequences, all variations on a 170 bp motif. The members diverge in sequence composition, but remain clearly related (fig.6). Digestion with EcoRI (or several other enzymes) results in a distribution of the heterogeneous units among the bands that form the ladder. Consequently, each step in the ladder consists of several DNA fragments of similar length, but with different base sequences.
According to our results, this variation may amount to 35% (Table I). Hybridization with a representative of a specific alphoid subfamily under stringent conditions lights up only the most homologous multimers. Thus, probe representatives of two different alphoid subfamilies may hybridize to the same band in the ladder, though not necessarily because of cross-homology to each other, but due to comigration of different genomic alphoid sequences to the same position. However, some hybridization may also occur from cross-homology since closely related monomers are scattered throughout the various tandem-structures (Table I).

Since some monomers of L1.26 are over 80% homologous to the X-chromosomal consensus unit of Waye and Willard (23), it is not surprising to find L1.26 hybridizing to DNA of a hybrid cell line containing the X-chromosome as its only retained human material. Although longer exposure times are needed (see legend fig.4), the obtained restriction pattern closely resembles the reported one obtained with a chromosome X-derived alphoid sequence (10). This suggests that virtually all X-chromosomal alphoid DNA is organized in 2.0 kb BamHI units as described, although we cannot fully exclude the presence of distinctly organized divergent sequences left undetected by both probes. Further, we showed that chromosome 13 specific alphoid DNA is distinct from the X-chromosome in its organization of restriction sites. Thus, the sequence-heterogeneity within the alphoid family is distributed in a chromosome-specific manner with a characteristic restriction site spacing for each enzyme and chromosome. The speculation that these subfamilies play a role in discriminating chromosomes from one another (25,26) is therefore attractive.

The survival of alphoid DNA in the genome during evolution has led to suggestions that it may serve in chromosome structure (25); nucleosome arrangement (27); and homologous chromosome recognition (reviewed in 28). As yet, none of these alleged functions has been confirmed. Alternatively, it may be an evolutionary "hitchhiker" with no special function (29).

Sequence analysis reveals some aspects of alphoid DNA evolution. We have found a 14 bp direct repeat within L1.84 at the border of two units defined by Wu and Manuelidis. This 14 bp repeat may be a remnant of an unequal cross-over event. A recently proposed model (30) explains how short repeats or deletions of this type may have originated. Within a Holliday-structure, mispairing may occur, the result of neighbouring sequence homology, or because of hairpin structures within a single DNA strand. An imperfect 14 bp stem-structure (AGAAACATCTTTGT at 46) downstream of the 14 bp direct repeat may, by folding back, have been the cause of the duplication.

However, both investigated sequences are approximately 25 bp out of phase compared to the 170 bp unit described by Wu and Manuelidis (1). The question arises what the borders of the amplification unit are within L1.26 and L1.84 related sequences. It may be a sequence related to the consensus 340 bp α-1/α-2 dimer (1). This would explain the remarkable coincidence of the cross-over event in L1.84 with the α-1/α-2 unit-boundaries. It would, however, not explain the tetrameric organization of L1.84 (fig.2), which suggests that L1.84 as a whole is an amplification unit. It would also be inconsistent with data of L1.26, which is suggestive of an a-b-b type of suborganization, and those of Waye and Willard (23) who noticed a similar 79 bp out of phase phenomenon in their BamHI-defined 2.0 kb multimer (fig.6A), but clearly demonstrated their sequence to be the amplification unit. In a tandem array of 170 bp sequences, the start-point of any unit is, of course, arbitrary. A unit-definition based on restriction sites is thus inappropriate. Given the chromosome specific nature of the discussed sequences, it is reasonable to propose that different chromosomes carry distinct amplification-units. The out of phase phenomenon may be explained by the existence of extra-chromosomal circular satellite DNA (24). Formation and integration of these circles through random homologous recombination events can explain both circular permutations and conservation of the 170 bp unit.

Although many sequences within L1.26 and L1.84 resemble restriction sites in that they contain one or two base changes relative to the true recognition site, most restriction sites occur with an n(170) bp spacing (fig.5). Assuming random mutation, this suggests that considerable homogenization of sequences is taking place continuously within the array, perhaps through gene conversion or unequal crossing over, both meiotic and mitotic (31). Irrespective of the nature of the basic unit of alphoid DNA amplification, the 170 bp regularity remains conserved. The existence of more highly conserved regions within each monomer (fig.6) may either be a cause of the regularity, or, alternatively, a consequence of it. Two of the conserved regions we observed, notably positions 103-111 and 140-148, coincide with the binding sites II and III of African Green Monkey alpha protein (32). It has been suggested that "nucleosome phasing" may play a role in conservation of the 170 bp structure in alphoid DNA (27), although other data conflict with this opinion (33). However, the observation that the alphoid sequences characterized to date demonstrate approximately 75% homology to each other, irrespective of their chromosomal location, including sequences from the same chromosome, suggests a restriction in the degree of divergence permitted, which remains unexplained at present. Although we have shown in this study that certain alphoid sequences have evolved in ways resulting in chromosome specific attributes, it is clear that other chromosome specific alphoid sequences should be analyzed in order to establish a general model of alphoid DNA evolution.
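The extent of the conserved regions, and the placement within them of the two stretches that coincide with the alpha protein binding sites, can be tallied from the position ranges given in the Results and Discussion. The sketch below assumes a 171 bp monomer for the fraction and treats all ranges as inclusive. The variable and function names are hypothetical.

```python
MONOMER_BP = 171  # assumed reference monomer length

# Conserved regions listed in the Results (1-based, inclusive).
CONSERVED = [(3, 13), (17, 26), (42, 51), (75, 89), (95, 111), (140, 148)]

# Stretches said to coincide with binding sites II and III (1-based, inclusive).
SITE_STRETCHES = [(103, 111), (140, 148)]


def length(interval):
    """Length of an inclusive interval."""
    return interval[1] - interval[0] + 1


def contained(inner, regions):
    """True if the inner interval lies entirely within one of the regions."""
    return any(start <= inner[0] and inner[1] <= end for start, end in regions)


total = sum(length(iv) for iv in CONSERVED)
print(f"conserved positions: {total} of {MONOMER_BP} ({total / MONOMER_BP:.0%})")
for stretch in SITE_STRETCHES:
    print(stretch, "within a conserved region:", contained(stretch, CONSERVED))
```

On these figures the relatively conserved regions cover 72 positions, somewhat under half of the monomer, and both stretches named in the Discussion fall inside them.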


ACKNOWLEDGEMENTS. The authors would like to thank Dr. A.M. Millington Ward and Dr. G.-J.B. van Ommen for helpful discussions and reviewing the manuscript and Dr. F. Baas and Dr. H. van Ormondt for technical assistance during sequencing procedures and computer analyses. This work was supported by the Netherlands Cancer Foundation (Koningin Wilhelmina Fonds Grant nr. A83.21).

REFERENCES.
1. Wu, J.C. and Manuelidis, L. (1980) J. Mol. Biol. 142, 363-386.
2. Shimizu, Y., Yoshida, K., Ren, Ch., Fujinaga, K., Rajagopalan and Chinnadurai, G. (1983) Nature (Lond.) 302, 587-591.
3. Shafit-Zagardo, B., Maio, J.J. and Brown, F.L. (1982) Nucl. Acids Res. 10, 3175-3193.
4. Houck, C.M., Rinehart, F.P. and Schmid, C.W. (1979) J. Mol. Biol. 132, 289-306.
5. Maio, J.J., Brown, F.L. and Musich, P.R. (1981) Chromosoma (Berl.) 83, 103-125.
6. Manuelidis, L. and Wu, J.C. (1978) Nature (Lond.) 276, 92-94.
7. Manuelidis, L. (1978) Chromosoma (Berl.) 66, 1-21.
8. Darling, S.M., Crampton, J.M. and Williamson, R. (1982) J. Mol. Biol. 154, 51-63.
9. Manuelidis, L. (1978) Chromosoma (Berl.) 66, 23-32.
10. Willard, H.F., Smith, K.D. and Sutherland, J. (1983) Nucl. Acids Res. 11, 2017.
11. Wolfe, J., Darling, S.M., Erickson, R.P., Craig, I.W., Buckle, V.J., Rigby, P.W.J., Willard, H.F. and Goodfellow, P.N. (1985) J. Mol. Biol. 182, 477-485.
12. Devilee, P., Cremer, T., Slagboom, P., Bakker, E., Scholl, H.P., Hager, H.D., Stevenson, A.F.G., Cornelisse, C.J. and Pearson, P.L. Cytogen. Cell Gen., in press.
13. Hofker, M.H., Wapenaar, M.C., Goor, N., Bakker, B., Van Ommen, G.J.B. and Pearson, P.L. (1985) Hum. Genet. 70, 148-156.
14. Pearson, P.L., Bakker, E. and Flavell, P.A. (1982) Cytogen. Cell Gen. 32, 308.
15. Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning, a laboratory manual. Cold Spring Harbor Laboratory.
16. Herbschleb-Voogt, E., Grzeschik, K.-H., Pearson, P.L. and Meera Khan, P. (1981) Hum. Genet. 59, 317-323.
17. Southern, E.M. (1975) J. Mol. Biol. 98, 503-517.
18. Jeffreys, A.J. and Flavell, R.A. (1977) Cell 12, 429-439.
19. Rigby, P.W.J. et al. (1977) J. Mol. Biol. 113, 237-251.
20. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
21. Messing, J. (1983) Meth. Enzymol. 101, 20-78.
22. Biggin, M.D., Gibson, T.J. and Hong, G.F. (1983) Proc. Natl. Acad. Sci. USA 80, 3963-3965.
23. Waye, J.S. and Willard, H.F. (1985) Nucl. Acids Res. 13, 2731-2743.
24. Jones, R.S. and Potter, S.S. (1985) Nucl. Acids Res. 13, 1027-1042.
25. Manuelidis, L. (1982) In: Genome Evolution, eds. G.A. Dover and R.B. Flavell, Academic Press, New York, p. 263-285.
26. Lee, T.N.H. and Singer, M.F. (1982) J. Mol. Biol. 161, 323-342.
27. Wu, K.C., Strauss, F. and Varshavsky, A. (1983) J. Mol. Biol. 170, 93-117.
28. Brutlag, D.L. (1980) Ann. Rev. Genet. 14, 121-144.
29. Orgel, L.E., Crick, F.H.C. and Sapienza, C. (1980) Nature (Lond.) 288, 645-646.
30. Millington Ward, A.M., Reuser, J.A.M., Scheele, J.Y., Van Lohuizen, E.J., Van Gorkum Van Diepen, I.R.M.C., Klasen, E.A. and Bresser, M. (1984) Mol. Gen. Genet. 193, 332-339.
31. Dover, G. (1982) Nature (Lond.) 299, 111-117.
32. Strauss, F. and Varshavsky, A. (1984) Cell 37, 889-901.
33. Smith, M.R. and Lieberman, M.W. (1984) Nucl. Acids Res. 12, 6493.