Jul 28, 2011 - S.vav147!259pt1. Pet w 25/208! ... S.vav106/208!215pt1. S.vav106/208! .... internal structure with 370-bp
rapid evolution in copy number, location and sequence, with diverse turnover mechanisms. often mark the major differences between closely related species
Sym038: Plant genomes: not just for models anymore – 38A: 28 July 2011
Repetitive DNA and its evolution: genomic strategies to understand the processes and exploit biodiversity
Pat Heslop-Harrison and Trude Schwarzacher Farah Badakshi, Ana Claudia Guerra, Guto Kuhn, Faisal Nouroz, Mateus Mondin
[email protected]
www.molcyt.com
Twitter: #IBC18
Peanut, Arachis hypogaea, 2n=4x=40 AABB genome constitution Ana Claudia Guerra Araujo, EMBRAPA, Brazil
BAC in situ – individual BACs have various repetitive sequences with
Many of the repetitive sequences are retrotransposons and DNA transposons Some are microsatellite motifs Some are satellites – including the most rapidly evolving sequences
ancestral
High-copy number
High-copy number
High-copy number Low-copy number
Low-copy number
High copy spp: homogenized, amplification from a limited number of master copies Low copy spp: much variation
Afacer1 T.mono106/42!133pt1 T.mono106/42!155pt1 Ae.umb106/208!1911 996 Ae.umb106/208!1810 1000 Ae.umb106/208!2012 S.vav147!259pt1 Pet w 25/208!088 578 Pet w 25/208!077 S.vav42!248 S.vav106/208!215pt1 537 S.vav106/208!204pt2 915 S.vav106/208!215pt2 S.vav25/208!182 S.mon42!136 Pet 22594 25/42!3324 Pet 22594 25/42!3425 623 CS/325/208!1820 Pet 2259425!315 S.vav25/208!193 981 T.tau25/147!2524pt2 CS/325/147!2322pt2 L.moll25/42!156 Pet w25/147!123 CS/325/208!1719 CS/325/147!2120 998 Pet w25/147!3231 Ae.umb25/147!124 Ae.umb25/208!168 L.moll25/42!167 119Repeat2 Pet w 25/208!099 S.mon147!1811 S.vav106/208!204pt1 119Repeat3 S.vav147!2711pt1 942 S.mon25/147!081 T.mono25/208!1212 616 119Repeat1 882 S.vav42!237pt1 S.vav42!237pt2 T.tau25/147!2625 580 T.tau25/147!2726 Ae.umb25/208!157 T.tau25/208!1517 T.tau106/42!177 Ae.umb25/42!091 L.moll25/42!189 CS/325/208!1618 1000 T.tau25/147!2524pt1 CS/325/147!2322pt1 Pet 22594 25/208!2325 564 Pet 22594 25/147!1918 Pet 22594 25/208!2426 T.tau25/208!1315 725 T.tau25/208!1416 Ae.umb25/147!135 985 Ae.umb25/147!146 646 Ae.umb25/208!179
1000 749
120bp repeat unit family in Triticum, Aegilops and Secale species
Homology between sequences 70-100% in Triticum/Aegilops and Secale species
0.1
Ae. tauschii (D genome)
Irrespective of copy number in the genome
S. cereale (R genome)
ancestral
High-copy number
High-copy number
High-copy number
Low-copy number
Low-copy number
High-copy number
High copy and low copy spp: no significant homogenization
www.molcyt.com
Brassica nigra (BB) 2n=16
Brassica carinata (BBCC) 2n=34
Brassica oleracea (CC) 2n=18
Brassica juncea (AABB) 2n=36
Brassica napus (AACC) 2n=38
Brassica rapa (AA) 2n=20
10
Genome Specificity of a CACTA (En/Spm) Transposon B. napus (AACC, 2n=4x=38) – hybridized with C-genome CACTA element red B. oleracea (CC, 2n=2x=18) B. rapa (AA, 2n=2x=20)
Alix et al. The CACTA transposon Bot1 played a major role in Brassica genome divergence and gene proliferation. Plant Journal
Genome Specificity of a CACTA (En/Spm) Transposon
B. napus
AJ 245479
AC 189496
B. rapa
AC 189446
AC 189655
AC 189480
B ot1-1
large insertion specific of Bot1-1
B. oleracea
large insertion in common between Bot1-2 and Bot1-3
B ot1-2
B ot1-3
Bo6L1-15 1010bp
Rearrangement specific of Bot1-3
Sequence level at scale of 100s to 10000s of bp – analysis by dotplot …
Faisal Nouroz
Dot-plots of genomic sequence from homologous pairs of BACs
kb
Brassica rapa (A genome) sequence
Region of high homology between A and C sequence
Region of low homology 4kb Insertion-gap pair: present in C genome 500bp Insertion-gap pair: present in A Microsatellite Transposed (moved) sequence
• Sequence level at scale of 100s to 1000s of bp – analysis by dotplot An inversion Dotter plot of Brassica oleracea var. alboglabra clone BoB028L01 x Brassica rapa subsp. pekinensis clone KBrB073F16 with transposable elements. 28/07/2011
gi 195970379 vs. gi 199580153
Brassica oleracea (C genome) sequence
14
28/07/2011
AAGTGAATGGATGCTCGCATTAGTTACTATGAGCCGATTCTCGCTCTTGCGAAAGCTAAAGAGGAAAAGGCCTTCGCATTGCAGAAG AGCTGGCTGCCAGCGAGCAAGAGGTTTTCAATATTGGCTTGTGGAAAATTTGTTGCCACTTTTGCTTTACTAAGGAATGAAATAATAC TTGTTTTTTTTTTTCATGGTTAATATTAGAAGATATAATTTCCTTTGAAGTTAGATTACGTTTCTTTATGTCGACGAAGTGAAGAAATATT GTCTTGTTTATGGTTCCTTCTAGTCCCAACCTTTTTTCAAGAAGGTACAGTACGTGTCAGGATTTATATGGATATACACA TATCCTATTGCGCAATTGTCAATAATAGCACTTTTTGAAGTTTATGTCTCAAAATAGCACTAGAAGGAGAAAGTCACAAAAATGATATT CATTAAAGGGTAAAATATCTCTTATATCCTTGGTTTAAAATTAAATAAACAAACAAAAATAAATAAAAATAAATAAAAAAAATGAAAAAA AAGAAATTTTTTTTATAGTTTCAGATTATATGTTTTCAGATTCGATTTTTTTTTTATTTTTTTATTTTTTTCGAAATTTTTTTTTTATTTTTTTTCA AATTTTCTTTTTATAATTTAAAAATACTTTTTGAAACTGTTTTTTTAATTTTTATTTTTTATTTTAGTATTTATTTTTTATAAAATTTTAAACCCT AATTCCTAAACCCCCACCCCTTAACTCTAAACCCTAAGGTTTGGATTAATTAACCCAATGGATATAAGTGTATATTTACCTCTTTAATGA AACCTATTTTTGTGACTTTGAATCTTGAGTGCTACTTTGGGAACAAAAACTTGGTTTGGTGCTATCCTAGTCTTTTTCTCTATCCTATT TACCACCCTTCTTTGTTCAATACTTTTTACAGTTTTTGGAAAGGACATGTTTCTTCTATCATCACTTAATGGTTATATATGTATGAGAAG TTTGAAAGAGATTACACTGTTTTGGAATATTAAAAAAAAAAGATATTACAAGATCTGATTTTGTTTGTATTTTAAAATTCTACCAAATC TCTCCTCAAAATCTTGGTCAAAGTCCAAAAATCCAAATATCTCAGTTAAATTCCACCAAATATGAAATCCTAAAACTTTTCCAAAATA GTTCAATAAGCCCTTAGTGTTTGGTG
551-bp BART1 TE 9-bp TSD (TATCCTATT) 6-bp TIR and 66-bp imperfect sub-TIR TSD TIR
TIR TSD
16
Brassica rapa with inserted 542bp sequence not present in B. oleracea. 9bp TSD (red letters and arrow) and TIR (blue). Flanking primers used in PCR as blue arrows on sequence Faisal Nouroz 2011
B. rapa AA 2n=20 B. juncea AABB 2n=36 B. nigra BB 2n=16 B. carinata BBCC 2n=34 B. oleracea CC 2n=18 B. napus AACC 2n=38
28/07/2011
GACACTCTTCCCAATCGTTCATTCCTGACGTCATTAGGCAACCACCTCTGTTTTTCCCCACCACAAACAGTGAATACATCTCTCCTATCTCTC TCAGAATCGTCAGTGTTTGCTCTCCGTTGCTTACTCGCTTCTCTATGAATCCAACTTGCCCCGTCGTTACAAATCTGCCAAAAATAAACCAAA ACCAGTCCGGTCAATGAAAAAAATGCCAATGTTTCAGGTCTAGAAATTATCCACAACCCTAGTACTAAGATCTGAAATTTATGAGGGAGATAA ACATTTTTAGGTTAATTGTAAGAAAAAATATTTATAATTTTTGGGCCATGCAGCAAATACATAATATTTCCTTAAAATTTGGATTGTAAGAC TAATAGTGTTTGAGTATTTGATATTTGATATCTTTTAAAAAAGGAAACAAAATTGAATTTCTAAATAAGATTATATTTTTAAAATAAAACAAT AAAAATACATAAAAATAGTTACAAAAAAAAATATATATATTGTTAAACCGTTAGCAAATTAAATACTAAATCCTATACCCTAAATCCTAAACT CCAAACCCTAAATGATAAACCTTAAATCTTGGATAAACCGTAAACCATTGGAAAATTTTAAAACCTAATCATACATTAAAAACTAAAATTTAA TAACACTAAACCCTAAACCCTAATCACTAAACCCTAAACCCTTAGATAAATCATGAACCCTTGGATAAATCATAAACTCTAAATCAAAAATAT TTAAAATTAAACCCTAAAATATATAATTTATCCAAGGGCTCAGAGTTTACCCAAGGGTTTAGGGTTTAGTGATTAGGATTTAGGGTTTAGTGT TATTAAAATTTAGTTTTTAATGTATGATCTAAGGTTTAAGAGTTTCCAATGGTTTAGAGTTTATCCAAAGTTTAAGGTTTAACGTTTAGGGTT TAGGATTTAGGATTTAAGGTATAGGGTTTAGTATTTTGCTGAAGATTTAACAATATTAATTAATTTATTTTTTGTAACTATTTTTATATATTT TTATTATTTTATTTTTAAAATATAATATAATTTGGATATTCAATTTTATTTTCTTTTTTAAAAAATATCAAATATCAAATACTCAAACACTAT
GGTTGGTGAACTTCTAGGTGTGAACCCAAGAATTACTCTTAATGTTTCATCCGATTGTGCTCAAAACCTTTCATGAACTGGCTAAAGCTGGAA ACATAGGATTAGTAAGAAGTAGAATCTTGTAAAGTACCTGTTATAGTATTCCTCTAAGAAAGTTCGATCAGTTTCGTCGTTTGTCTGATCGTT ACCAACAATCTCCATCAAAACATCGTTGTTTTCTTTGGTCACCGCGTCTCCGACAAGATTCTCTGTCTCCGAGCCATAAGCGACAAACTGTAT GATAGTGAGGTGAATCTGAGAGTTATTGATAAGCCACTGGCACAAGGACAGAGCCTCTCGATCATCAGGACCACCAAAGAACAATGCAGCGAC GTGTTGTACCGACTCAAACCCGTGAAGCTGGTGGAACCCGGTTATGTTTCTATCCACATAGATACCGATCG
790-bp TE TAAT, 4-bp TSD AGTGT/ACTCT, 5-bp TIRs & 370-bp IR
18
Insertion sequence present in B. oleracea, missing from B. rapa. TSD (red); green and blue boxes shows remarkable internal structure with 370-bp inverted repeat near-filling the insert.
1
Rapa8002
246 TSD TIR
542-bp TE
B. rapa (4718648200)
TIR TSD 790
Rapa177R
1000
28/07/2011
Rapa141F Rapa185F
………………..……………….
B. oleracea (66,350-66750) B. rapa (AA) B. juncea (AABB) B. napus (AACC)
Hexaploid Brassica (carinata x rapa)
………………..……………….
B. nigra (BB)
………………..……………….
B. oleracea (CC)
………………..……………….
B. carinata (BBCC)
………………..……………….
B. oleracea (GK97361) =A
=T
=C
=G
Schematic representation of insertion in Brassica rapa and other Brassica genomes. Green, red, blue and black boxes showing DNA motifs. 19
28/07/2011
Dotplot of 790bp insertion element showing inverted repeat structure. 20
Terminal LTR retrotransposon in B.oleracea and b) B.napus.
B. juncea with mariner like transposon
e)
524 bp SINE 670 bp hAT 216 bp SINE 2781 bp MuDR Like
Brassica oleracea (C genome; EU642504.1)
TEs in B. oleracea
Brassica rapa (A genome; AC189298.1)
505 bp Unknown
1655 bp in B.rapa Microsatellite
1189 bp Mariner Like 700 bp hAT
380 in B.rapa
715 bp hAT 914 bp LINE 1016 bp in LINE 3869 bp TE 402 bp hAT 1080 bp LINE 1284 bp 258 MITE Unknown 819 bp Unknown 237 bp Mariner Like Inversion 242, 644 & 274 bp SINEs
Whole genome analysis From morphology, cytology, repeats, genes and whole genome sequences
“The most abundant transposable elements identified in XXX, LTR retrotransposons of the 50-fold and 25-fold less frequently in assembled reads than in shotgun reads, respectively ... Because only predicted protein homologies were used to identify Copia and Gypsy superfamilies, were found to occur
transposable elements and because all transposable elements contain extensive noncoding
the vast majority of the transposable element–related DNA in the XXX genome assembly was missed by this approach”
DNA, we expect that
Repeats are the greatest part of most genomes Repeats amplify, homogenize, move and deleted They are the most rapidly changing genome component
They drive genome evolution, diversification and chromatin structure
Sym038: Plant genomes: not just for models anymore – 38A: 28 July 2011
Repetitive DNA and its evolution: genomic strategies to understand the processes and exploit biodiversity
Pat Heslop-Harrison and Trude Schwarzacher Farah Badakshi, Ana Claudia Guerra, Guto Kuhn, Faisal Nouroz, Mateus Mondin
[email protected]
www.molcyt.com