Repetitive DNA and its evolution

14 downloads 148 Views 2MB Size Report
Jul 28, 2011 - S.vav147!259pt1. Pet w 25/208! ... S.vav106/208!215pt1. S.vav106/208! .... internal structure with 370-bp
rapid evolution in copy number, location and sequence, with diverse turnover mechanisms. often mark the major differences between closely related species

Sym038: Plant genomes: not just for models anymore – 38A: 28 July 2011

Repetitive DNA and its evolution: genomic strategies to understand the processes and exploit biodiversity

Pat Heslop-Harrison and Trude Schwarzacher Farah Badakshi, Ana Claudia Guerra, Guto Kuhn, Faisal Nouroz, Mateus Mondin

[email protected]

www.molcyt.com

Twitter: #IBC18

Peanut, Arachis hypogaea, 2n=4x=40 AABB genome constitution Ana Claudia Guerra Araujo, EMBRAPA, Brazil

BAC in situ – individual BACs have various repetitive sequences with

Many of the repetitive sequences are retrotransposons and DNA transposons Some are microsatellite motifs Some are satellites – including the most rapidly evolving sequences

ancestral

High-copy number

High-copy number

High-copy number Low-copy number

Low-copy number

High copy spp: homogenized, amplification from a limited number of master copies Low copy spp: much variation

Afacer1 T.mono106/42!133pt1 T.mono106/42!155pt1 Ae.umb106/208!1911 996 Ae.umb106/208!1810 1000 Ae.umb106/208!2012 S.vav147!259pt1 Pet w 25/208!088 578 Pet w 25/208!077 S.vav42!248 S.vav106/208!215pt1 537 S.vav106/208!204pt2 915 S.vav106/208!215pt2 S.vav25/208!182 S.mon42!136 Pet 22594 25/42!3324 Pet 22594 25/42!3425 623 CS/325/208!1820 Pet 2259425!315 S.vav25/208!193 981 T.tau25/147!2524pt2 CS/325/147!2322pt2 L.moll25/42!156 Pet w25/147!123 CS/325/208!1719 CS/325/147!2120 998 Pet w25/147!3231 Ae.umb25/147!124 Ae.umb25/208!168 L.moll25/42!167 119Repeat2 Pet w 25/208!099 S.mon147!1811 S.vav106/208!204pt1 119Repeat3 S.vav147!2711pt1 942 S.mon25/147!081 T.mono25/208!1212 616 119Repeat1 882 S.vav42!237pt1 S.vav42!237pt2 T.tau25/147!2625 580 T.tau25/147!2726 Ae.umb25/208!157 T.tau25/208!1517 T.tau106/42!177 Ae.umb25/42!091 L.moll25/42!189 CS/325/208!1618 1000 T.tau25/147!2524pt1 CS/325/147!2322pt1 Pet 22594 25/208!2325 564 Pet 22594 25/147!1918 Pet 22594 25/208!2426 T.tau25/208!1315 725 T.tau25/208!1416 Ae.umb25/147!135 985 Ae.umb25/147!146 646 Ae.umb25/208!179

1000 749

120bp repeat unit family in Triticum, Aegilops and Secale species

Homology between sequences 70-100% in Triticum/Aegilops and Secale species

0.1

Ae. tauschii (D genome)

Irrespective of copy number in the genome

S. cereale (R genome)

ancestral

High-copy number

High-copy number

High-copy number

Low-copy number

Low-copy number

High-copy number

High copy and low copy spp: no significant homogenization

www.molcyt.com

Brassica nigra (BB) 2n=16

Brassica carinata (BBCC) 2n=34

Brassica oleracea (CC) 2n=18

Brassica juncea (AABB) 2n=36

Brassica napus (AACC) 2n=38

Brassica rapa (AA) 2n=20

10

Genome Specificity of a CACTA (En/Spm) Transposon B. napus (AACC, 2n=4x=38) – hybridized with C-genome CACTA element red B. oleracea (CC, 2n=2x=18) B. rapa (AA, 2n=2x=20)

Alix et al. The CACTA transposon Bot1 played a major role in Brassica genome divergence and gene proliferation. Plant Journal

Genome Specificity of a CACTA (En/Spm) Transposon

B. napus

AJ 245479

AC 189496

B. rapa

AC 189446

AC 189655

AC 189480

B ot1-1

large insertion specific of Bot1-1

B. oleracea

large insertion in common between Bot1-2 and Bot1-3

B ot1-2

B ot1-3

Bo6L1-15 1010bp

Rearrangement specific of Bot1-3

Sequence level at scale of 100s to 10000s of bp – analysis by dotplot …

Faisal Nouroz

Dot-plots of genomic sequence from homologous pairs of BACs

kb

Brassica rapa (A genome) sequence

Region of high homology between A and C sequence

Region of low homology 4kb Insertion-gap pair: present in C genome 500bp Insertion-gap pair: present in A Microsatellite Transposed (moved) sequence

• Sequence level at scale of 100s to 1000s of bp – analysis by dotplot An inversion Dotter plot of Brassica oleracea var. alboglabra clone BoB028L01 x Brassica rapa subsp. pekinensis clone KBrB073F16 with transposable elements. 28/07/2011

gi 195970379 vs. gi 199580153

Brassica oleracea (C genome) sequence

14

28/07/2011

AAGTGAATGGATGCTCGCATTAGTTACTATGAGCCGATTCTCGCTCTTGCGAAAGCTAAAGAGGAAAAGGCCTTCGCATTGCAGAAG AGCTGGCTGCCAGCGAGCAAGAGGTTTTCAATATTGGCTTGTGGAAAATTTGTTGCCACTTTTGCTTTACTAAGGAATGAAATAATAC TTGTTTTTTTTTTTCATGGTTAATATTAGAAGATATAATTTCCTTTGAAGTTAGATTACGTTTCTTTATGTCGACGAAGTGAAGAAATATT GTCTTGTTTATGGTTCCTTCTAGTCCCAACCTTTTTTCAAGAAGGTACAGTACGTGTCAGGATTTATATGGATATACACA TATCCTATTGCGCAATTGTCAATAATAGCACTTTTTGAAGTTTATGTCTCAAAATAGCACTAGAAGGAGAAAGTCACAAAAATGATATT CATTAAAGGGTAAAATATCTCTTATATCCTTGGTTTAAAATTAAATAAACAAACAAAAATAAATAAAAATAAATAAAAAAAATGAAAAAA AAGAAATTTTTTTTATAGTTTCAGATTATATGTTTTCAGATTCGATTTTTTTTTTATTTTTTTATTTTTTTCGAAATTTTTTTTTTATTTTTTTTCA AATTTTCTTTTTATAATTTAAAAATACTTTTTGAAACTGTTTTTTTAATTTTTATTTTTTATTTTAGTATTTATTTTTTATAAAATTTTAAACCCT AATTCCTAAACCCCCACCCCTTAACTCTAAACCCTAAGGTTTGGATTAATTAACCCAATGGATATAAGTGTATATTTACCTCTTTAATGA AACCTATTTTTGTGACTTTGAATCTTGAGTGCTACTTTGGGAACAAAAACTTGGTTTGGTGCTATCCTAGTCTTTTTCTCTATCCTATT TACCACCCTTCTTTGTTCAATACTTTTTACAGTTTTTGGAAAGGACATGTTTCTTCTATCATCACTTAATGGTTATATATGTATGAGAAG TTTGAAAGAGATTACACTGTTTTGGAATATTAAAAAAAAAAGATATTACAAGATCTGATTTTGTTTGTATTTTAAAATTCTACCAAATC TCTCCTCAAAATCTTGGTCAAAGTCCAAAAATCCAAATATCTCAGTTAAATTCCACCAAATATGAAATCCTAAAACTTTTCCAAAATA GTTCAATAAGCCCTTAGTGTTTGGTG

551-bp BART1 TE 9-bp TSD (TATCCTATT) 6-bp TIR and 66-bp imperfect sub-TIR TSD TIR

TIR TSD

16

Brassica rapa with inserted 542bp sequence not present in B. oleracea. 9bp TSD (red letters and arrow) and TIR (blue). Flanking primers used in PCR as blue arrows on sequence Faisal Nouroz 2011

B. rapa AA 2n=20 B. juncea AABB 2n=36 B. nigra BB 2n=16 B. carinata BBCC 2n=34 B. oleracea CC 2n=18 B. napus AACC 2n=38

28/07/2011

GACACTCTTCCCAATCGTTCATTCCTGACGTCATTAGGCAACCACCTCTGTTTTTCCCCACCACAAACAGTGAATACATCTCTCCTATCTCTC TCAGAATCGTCAGTGTTTGCTCTCCGTTGCTTACTCGCTTCTCTATGAATCCAACTTGCCCCGTCGTTACAAATCTGCCAAAAATAAACCAAA ACCAGTCCGGTCAATGAAAAAAATGCCAATGTTTCAGGTCTAGAAATTATCCACAACCCTAGTACTAAGATCTGAAATTTATGAGGGAGATAA ACATTTTTAGGTTAATTGTAAGAAAAAATATTTATAATTTTTGGGCCATGCAGCAAATACATAATATTTCCTTAAAATTTGGATTGTAAGAC TAATAGTGTTTGAGTATTTGATATTTGATATCTTTTAAAAAAGGAAACAAAATTGAATTTCTAAATAAGATTATATTTTTAAAATAAAACAAT AAAAATACATAAAAATAGTTACAAAAAAAAATATATATATTGTTAAACCGTTAGCAAATTAAATACTAAATCCTATACCCTAAATCCTAAACT CCAAACCCTAAATGATAAACCTTAAATCTTGGATAAACCGTAAACCATTGGAAAATTTTAAAACCTAATCATACATTAAAAACTAAAATTTAA TAACACTAAACCCTAAACCCTAATCACTAAACCCTAAACCCTTAGATAAATCATGAACCCTTGGATAAATCATAAACTCTAAATCAAAAATAT TTAAAATTAAACCCTAAAATATATAATTTATCCAAGGGCTCAGAGTTTACCCAAGGGTTTAGGGTTTAGTGATTAGGATTTAGGGTTTAGTGT TATTAAAATTTAGTTTTTAATGTATGATCTAAGGTTTAAGAGTTTCCAATGGTTTAGAGTTTATCCAAAGTTTAAGGTTTAACGTTTAGGGTT TAGGATTTAGGATTTAAGGTATAGGGTTTAGTATTTTGCTGAAGATTTAACAATATTAATTAATTTATTTTTTGTAACTATTTTTATATATTT TTATTATTTTATTTTTAAAATATAATATAATTTGGATATTCAATTTTATTTTCTTTTTTAAAAAATATCAAATATCAAATACTCAAACACTAT

GGTTGGTGAACTTCTAGGTGTGAACCCAAGAATTACTCTTAATGTTTCATCCGATTGTGCTCAAAACCTTTCATGAACTGGCTAAAGCTGGAA ACATAGGATTAGTAAGAAGTAGAATCTTGTAAAGTACCTGTTATAGTATTCCTCTAAGAAAGTTCGATCAGTTTCGTCGTTTGTCTGATCGTT ACCAACAATCTCCATCAAAACATCGTTGTTTTCTTTGGTCACCGCGTCTCCGACAAGATTCTCTGTCTCCGAGCCATAAGCGACAAACTGTAT GATAGTGAGGTGAATCTGAGAGTTATTGATAAGCCACTGGCACAAGGACAGAGCCTCTCGATCATCAGGACCACCAAAGAACAATGCAGCGAC GTGTTGTACCGACTCAAACCCGTGAAGCTGGTGGAACCCGGTTATGTTTCTATCCACATAGATACCGATCG

790-bp TE TAAT, 4-bp TSD AGTGT/ACTCT, 5-bp TIRs & 370-bp IR

18

Insertion sequence present in B. oleracea, missing from B. rapa. TSD (red); green and blue boxes shows remarkable internal structure with 370-bp inverted repeat near-filling the insert.

1

Rapa8002

246 TSD TIR

542-bp TE

B. rapa (4718648200)

TIR TSD 790

Rapa177R

1000

28/07/2011

Rapa141F Rapa185F

………………..……………….

B. oleracea (66,350-66750) B. rapa (AA) B. juncea (AABB) B. napus (AACC)

Hexaploid Brassica (carinata x rapa)

………………..……………….

B. nigra (BB)

………………..……………….

B. oleracea (CC)

………………..……………….

B. carinata (BBCC)

………………..……………….

B. oleracea (GK97361) =A

=T

=C

=G

Schematic representation of insertion in Brassica rapa and other Brassica genomes. Green, red, blue and black boxes showing DNA motifs. 19

28/07/2011

Dotplot of 790bp insertion element showing inverted repeat structure. 20

Terminal LTR retrotransposon in B.oleracea and b) B.napus.

B. juncea with mariner like transposon

e)

524 bp SINE 670 bp hAT 216 bp SINE 2781 bp MuDR Like

Brassica oleracea (C genome; EU642504.1)

TEs in B. oleracea

Brassica rapa (A genome; AC189298.1)

505 bp Unknown

1655 bp in B.rapa Microsatellite

1189 bp Mariner Like 700 bp hAT

380 in B.rapa

715 bp hAT 914 bp LINE 1016 bp in LINE 3869 bp TE 402 bp hAT 1080 bp LINE 1284 bp 258 MITE Unknown 819 bp Unknown 237 bp Mariner Like Inversion 242, 644 & 274 bp SINEs

Whole genome analysis From morphology, cytology, repeats, genes and whole genome sequences

“The most abundant transposable elements identified in XXX, LTR retrotransposons of the 50-fold and 25-fold less frequently in assembled reads than in shotgun reads, respectively ... Because only predicted protein homologies were used to identify Copia and Gypsy superfamilies, were found to occur

transposable elements and because all transposable elements contain extensive noncoding

the vast majority of the transposable element–related DNA in the XXX genome assembly was missed by this approach”

DNA, we expect that

Repeats are the greatest part of most genomes Repeats amplify, homogenize, move and deleted They are the most rapidly changing genome component

They drive genome evolution, diversification and chromatin structure

Sym038: Plant genomes: not just for models anymore – 38A: 28 July 2011

Repetitive DNA and its evolution: genomic strategies to understand the processes and exploit biodiversity

Pat Heslop-Harrison and Trude Schwarzacher Farah Badakshi, Ana Claudia Guerra, Guto Kuhn, Faisal Nouroz, Mateus Mondin

[email protected]

www.molcyt.com