Figure S1. Annotated alignment of unc-47 upstream ... - PLOS

Figure S1. Annotated alignment of unc-47 upstream sequences from 4 nematode species.

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
               10         20         30         40         50         60         70         80         90        100
C.ele  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~ATCCC GGAACAGTCG AAAGTCGG--
C.bri  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~GGG
C.rem  AAGCTTGGGA AAGATCGGAA AAGATGATAG GATATATTGA AAAGGATCTG TTTCGTTAAA GCAACTCATC CCAGTAAATT GTGAACTTTT CAA-------
C.bre  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~
start of full-length promoter sequences in red

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              110        120        130        140        150        160        170        180        190        200
C.ele  -TGGCAAGCG CCGAACTGCT GACGGTCTAA CCGGGG---- -CACAAATCA GGGGTGAGCG GCAAACGATT TTTCCGGCAA AT--CGGCAA ATCGGCAAAT
C.bri  ATTCGGAGAA CAGTAACTCA AAAGCTCGAA ATATGATTCT TCAACTTTTC AAAGTTTTCT TTGATATACA CAGGTAGAGG GGCATTAAAC TGCTAAATGA
C.rem  -TGATAAACC GTGCAAC-TG ATCGAGTGAA A-ATGG---- -CATAG---T AAAGGATACA GTAATCTCCA ACAACAAAAA AAAATCAGAA ATCCAGGAAT
C.bre  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              210        220        230        240        250        260        270        280        290        300
C.ele  TGCCAATATT GAAATACCCG GCAAATCGGT AAATAGCCGG AA--TTGAAA ATTTCCGGCA AACTGGTA-- -AACCGCAAA TTGCTGATTT G-----CC--
C.bri  CAGCCAAAGA GGAGCAAAAT GGCGAATGAC TGCTAGTTGG AAGCCGAAGG AGGACCGATA GACTGTCAAC GGACTGTGGA CGGACAACCG GAACCTCTTG
C.rem  CAATATTAAG AAAATAACAT AT--AATCCC AGTT--TTCC TA-CCGTATA TTCACCATAA TAATGAC--- --ATCATTGA GTTTCTACGA GAAACACT--
C.bre  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              310        320        330        340        350        360        370        380        390        400
C.ele  -GAATTTGCC GGGAAGACGG CAATTG---- ----CCAAAC ATATTC---- ---------- --GGCAAATT GTGGTTTT-G CACTTTTTGG AAATTTCAGA
C.bri  GAAAACCGAC TTGAAAGTTT TGATTGAAAA TCATCCAGAA AAGATTATGT TATGCTTCCA CTAGATTATT CTGATGATAG AAACTACTTT TTCTCACATA
C.rem  GAAAGCTGAA G-TCAAAATT TTATTAAG-- --AGCCATAA AGATT----- ---------- --AAAAAAGT ATTTTGTCAG GAGTTTGTGT --CTTACAAA
C.bre  ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~~~~~~~~~ ~~CGCATGCT CCAGACTTAG GAATATATGG TACTCATCTG

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              410        420        430        440        450        460        470        480        490        500
C.ele  A-----TTTC AATCGGCAAA TTGT------ ----GCACAT -----CCTAT GAATTTTCCT ACATCTATTT TGAAAAGTAA GC-----AAA TTCTATGAAA
C.bri  AGACAACCTC AAAAAGTATA TTGTTGAACG ACGGCTTGGT GACAGTCGAA AGGCACTGAC TAGTCGATTA CCGGTGGTTG GTGGACCAAA TTTCTGGAGA
C.rem  A------CTC AGAAA---AA TTTT-----G TTTTTTAATT -----TCTAA AAAATTTAAT TGG---ATCT TGAGTTCTCA GT----CGAA CTTA-AAGAA
C.bre  G------TGC AAATG---TA TTCCTATCAG ATCTCCAGAT -----CAGAG AAGTTTCGAA TGC---ATCA AGACTTTTCA CT----TACA GTTC-GGATA

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              510        520        530        540        550        560        570        580        590        600
C.ele  ATATCTAAAG AAAAATGGAA AAAATTT--- --TCA----- ---AAAAGGC ACA----GTT TTAAGTGTTT CC----GTCT A---ATAAAA AAATCCCCCT
C.bri  ATGGAACATT TTGTCAAAAA TTACTTTTGG AATTATCACA CTATAAAAGT CAAAGAGGAC TCTGGGAATC TTCAAAATTT ATTATTAGAA GTATGCGGAC
C.rem  CTGGTACATT TGAGG--AAG CGAGACC--- -ATTG----- ---GAAAGGA GTA---TGAG CCACGTCATT ---AAATTGT G---GTAAGA -TACTTACTC
C.bre  TCATTACAGG ACAAGTTGAA CGAATCC--- -TCAGTCTCA CACGGAAGGT GTAGTCCAAT TGATATCATC CA-AAACCGT ATTTGTAGTG -AAACTAAGT

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              610        620        630        640        650        660        670        680        690        700
C.ele  AAACACTTCC GGC--AAATT GATGTTCGGC AAATGGCAAA TC-------- GGAAACTTGC CGAAAAT--- ---------- TACAGTTTCC GGTAA-----
C.bri  AAAAATGACC CATTTATATT CAGAAGAAAT CTCGAAAACG TTTTTTTTTT GAAAACCAAG AGAAGAATGC GCCTGAACAG TGATGCTTAT CGAGGTAATA
C.rem  AATGAAGATT C----ACATC CAAAAACTGA TCACAAGGCG TA-------- GAAAGAACTG TGAACTG--- ---------- TAATTCTTTC GGACG-----
C.bre  AGAGAAGTTT CG---AGATC TCGTTGCCAT CCATAAAAAA AC-------- GAAAACGCTG TCATGTAAAT CCGGGA---- TGCTGCTTGC AGACATAAAT

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              710        720        730        740        750        760        770        780        790        800
C.ele  -ATCGGCAAA CCGGCAAACT --------GC CTGAATTGAA AAGTTCCGTC AAATCGGCAA ACCGACAACA CCCCTGGCAC AAATGATGGA CATACTGAGG
C.bri  TAAAGACTTC AGGACTTAAC AATAAAACAA ATGCGTCCAG GAATTTTTGA AGCGGGTCTA GCCAATGTTT TTTTTCTTTT TCAAAATCCC TTTCTGTGAA
C.rem  -ATTGGATAC TAG-CGGAAT ---------- ACTC-T--AC AAAT-----A ATGAG---AA ACCAAGACCT GATTTCACCT GAAAAGCTCT TCTCCGTGAA
C.bre  AATCTATTTT TAA-CAGATT TGAGAGTGGA ATTCGT--AA GAAT-----C AGGAGCGAAA ATTAATGACA AATGACAGGT AGTGAGATGG AAAACTAGGA

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              810        820        830        840        850        860        870        880        890        900
C.ele  CAATTTGCCG GT-------- --------TT TCCAATTGCA GGAAATTTT- ---CAATTCC GGCAGTGTGC CGA------- TTTGCCGGAA ATTTTAATTC
C.bri  AAATGATGCC GTGAGATGAA GTCCCAAATT TAGAAAAGTT TCAAACATTG AAACAACACA TTTGAGGAAT AGGCTCTTCT TATTTTAGTA GCCAAGCAAC
C.rem  GAAA------ CT-------- ---------- ---AAAAACA ACTGTCTT-- ---CACTTGA AATGAAG--- ---------- -----TGAAA ACAAAAAAGA
C.bre  GAAGGTTCCT CT-------- ---------- ---CAAAGTT TTGACCTT-- ---CAGTCCA ACTGAAGAGT ---------T TTCGATGGAA GTTGAAACAA

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
              910        920        930        940        950        960        970        980        990       1000
C.ele  AGGCAAATTG CCGATTTCCC GATTTCCCGA TTTGCCGGAA AAAATCGTTT GCCGCCCACC -------CCT ---------- ---------- --GGGT----
C.bri  AAACGAATTT CCATGCAACC CATCCGTAGA CATCAAAACG ATGCTCATCT CACAGTCGTC ACGTTTTTCT GAGAATGAAG AGAAACAAAA AAGGGGTTGG
C.rem  AGACGGA--- --AGAAAATC CAATTGAAAA CA--TAGG-- ----TTTACT CAAATTCACC --------CC ---------- ---------- --GAGA----
C.bre  AAAAGGA--- --GAAAAAGT GAATAGAAAA AAAGTGAA-- ----TTACCC CTCATTCAAT --------CT TTTGAAGAAA AAAGAACAGA ATGAGA----

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
             1010       1020       1030       1040       1050       1060       1070       1080       1090       1100
C.ele  --CTGAACCT TGATTGTTAC AAAACATT-T TTAGCTCTTT GGAGAAATAA AATGAATCTC GTAAAATTT- -AATTGACGA GGACGATATT AGCT----GT
C.bri  AAAGAAATCG AATAGGAAAT AGGGGATTAC TCAATTCCAA TCCCAACCGA ATCTCATTTC AAAAAACTCT CAACCAACTA ATCCCACATT GCCTCGAGAT
C.rem  AAATGTATTT AATTTCGATT TGAAAACCCC TCAGAATTTG AATTTTCTCA AGAAAAAGTC GGTGAATTTT CTAATGGACT GTACGA-ATG ACAT-----T
C.bre  AAAAGGATCT AATTTCAGAT GGTTCCTTTT TCAAAAGACT CCTAATCTAA TTCTTCTTTT AATAGTTTTC TTAATGGACT GTACAA-TTT CCCTCTGTGT

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
             1110       1120       1130       1140       1150       1160       1170       1180       1190       1200
C.ele  CTCTTT---- AGACCAAATT CAGAAAAAAA GAAAGAA--- ------TACT TCC~~~~~~~ ~~~~~~~~~~ ~~~~~~~~CA AATTTCCGG~ ~~~~~~~TCC
C.bri  TCTTTTTTCT GTAATGTGTT TCTGAATGGA CTGTAAGATT GATGTTCTCA TTTCCTTTTT CTCTAAAAAA CCAGCCTTCA AATTTCCGGA GCACAAGTCC
C.rem  TCCCCT---- -------GTT C----AAAAG CAATTA---- ------CCCA TTC---T--- --------CA TTTTTCTCCA AATTTCCGGA --ATTGGTCC
C.bre  TCTTCC---- -------ATT CCATAAAAAA CCATTA---- ------CCCA TTC---TTTT AACAAGCCCA TTTTTCTTTG AATTTCCGTC --ATCGGTCC

start of C. briggsae extended conservation promoter in blue
distal blocks of conservation in yellow
       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
             1210       1220       1230       1240       1250       1260       1270       1280       1290       1300
C.ele  CTCTCTCGTT TTTTTTGCCA ATAAACTCAC TATAGTCGCT GGTTCCCCCC TATTCACATT TATTCTACCA ATCCATCAGT GG-------- --AACCAGA-
C.bri  CTCAACCCCA ATAGGTTCGC TCCTTTCCTC TTTT~~~~~~ ~~~~~~~~~~ ~~~CCTTATT CATTCCACTC ATCCATCAGG GAACCAAAAA GAAAACGAAT
C.rem  CTCTCCAT-- --~~~~~~~~ ~~AAAGTCTT CCCA----TT CATT~~~~~~ ~~~CCTT-TT CATTCCACTC ATCCATCAGT GGT------- GGAACCAAA-
C.bre  ~TCCCAT--- ---AAACTCG CTATAGTCCC TCTTT~~~~~ ~~~~~~~~~~ ~~~TCTT-TT CATTCCACTC ATCCATCAGT GGT------- GGAACCAAA-
start of proximal promoter sequences in gray

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
             1310       1320       1330       1340       1350       1360       1370       1380       1390       1400
C.ele  AAAAAGAAGA GCCTTTCGGT TTGGAGAGTA GGGTCTAATA ATCCCCCGTG CTCTTCAAAT CATTGTGCCA ACACACAGAC ACACTTTATG TGTGCTCACA
C.bri  AAGAAGAAGA GTCTTC---- --GGAGAAGA GCGTCTAATA ATCCCTG--- --CTTCAAAT CATTGTGCCA ACACA~~GAC ACACTTTATG GG~~CCAGAA
C.rem  ---AAGAAGA GCCTC----- -------GGA GCGTCTAATA ATCCCTG--- --CTTCAAAT CATTGTGCCA ACACA~~~~C ACACTTTATG TG--CCCA--
C.bre  ---AAGAAGA GCCTC----- -------GGA GCGTCTAATA ATCCCTG--- --CTTCAAAT CATTGTGCCA ACACA~~~~C ACACTTTATG TGTGCCCA--

       ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....| ....|....|
             1410       1420       1430       1440       1450       1460       1470       1480       1490       1500
C.ele  CACACACGCT ATTTGAAGAG CGAAGACGAC GAC------- ---------G ACGCATTCAG A-GCTCTTTT CCACGAAATT TGCTCCATCT TTCCACAATC
C.bri  C---CACGCT ATTTGAAGAG CAACGACGAC GAT-GACG-- AGCGCCCAAG AGGTCTCCAG A-GCT-CTTT TCACAAATTC ----TCTTCT TTCAA-AACC
C.rem  ----CACGCT ATTTGAAGAG CGAAGAAGAC GACCGA---T GACGACCTCG ACGTCTTCAG ACGCT-CTTT TCAAAAAATT ----CTTTCT TTTCA-AATC
C.bre  ----CACGCT ATTTGAAGAG CGAAGAAGAC GTCCGACGAT GACGACGACG ACGTCTTCAG ACCCTTCTTT TCACGAATTT ----CTTATT TTTCG-AATT

       ....|....| ....|....| ....|....| ....|....| ....|..
             1510       1520       1530       1540
C.ele  TGTCTTTCCT GTGAGACGAC AGCGTCACAT TTATTTCATT ACAGATG
C.bri  GGTGGTTCCT TTCAAGT--- --TTGT--GT TTCCT-TACA G-ACATG
C.rem  GGTCTTCCTG CT-ACAT--- --TT----AT TTCGT-TACA GAAGATG
C.bre  TGTCTTACTT CTCATAC--- --CTCTCAAT TTCGTATTTT ACAGATG

C.ele = C. elegans; C.bri = C. briggsae; C.rem = C. remanei; C.bre = C. brenneri. All genomic sequences are available from WormBase (http://wormbase.org/). The alignment was generated with the ClustalW Multiple Alignment function of BioEdit (Hall, 1999) and then edited manually. All sequences terminate with the start codon of the unc-47 ortholog.

1. Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. Ser. 41:95-98.
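
As a reading aid only, the following Python sketch shows one way to flag alignment columns in which all four species carry the same base. It is not part of the original analysis. The function name `conserved_columns`, the treatment of both `-` and `~` as gap symbols, and the assignment of the four ten-base cells to C.ele, C.bri, C.rem and C.bre in that order are assumptions made for illustration. The example block is one ten-column cell group taken from the 1301-1400 segment of the alignment.

```python
from typing import Dict, List

# Assumption: both symbols mark positions without sequence in the figure.
GAP_CHARS = {"-", "~"}


def conserved_columns(rows: Dict[str, str]) -> List[bool]:
    """Return one flag per column: True when every row has the same non-gap base."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned rows must have the same length")
    flags = []
    for column in zip(*rows.values()):
        bases = set(column)
        # Conserved only if a single symbol is present and it is not a gap.
        flags.append(len(bases) == 1 and not (bases & GAP_CHARS))
    return flags


# One ten-column block from the figure (assumed row order: C.ele, C.bri, C.rem, C.bre).
block = {
    "C.ele": "GGGTCTAATA",
    "C.bri": "GCGTCTAATA",
    "C.rem": "GCGTCTAATA",
    "C.bre": "GCGTCTAATA",
}

flags = conserved_columns(block)
print("".join("*" if f else "." for f in flags))  # prints "*.********"
print(sum(flags) / len(flags))  # prints 0.9
```

In this block only the second column differs (G in C.ele, C in the other three species), so nine of the ten columns are flagged as identical.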