AP Biology
Student Worksheet
Learning Evolution Using Phylogenetic Analysis
The purpose of this hands-on practice is to learn how to use bioinformatics tools to study evolution.
Prerequisites: You should have completed the evolution review homework exercises provided by your teacher prior to this lesson.
To begin the worksheet, skip to Part A. The Big Picture section below is for your information and gives you a better understanding of what this worksheet is helping you learn.
Big Picture
When we perform phylogenetic analysis, we follow these main steps:
1. Gather the characteristics on which you want to compare the species/entities of interest. In bioinformatics we generally use genomic or protein sequences. NCBI's Nucleotide (http://www.ncbi.nlm.nih.gov/nucleotide/) and Protein (http://www.ncbi.nlm.nih.gov/protein/) databases are good sources of genomic and proteomic sequences. For this lesson, the sequences are provided to you at http://compbio.soe.ucsc.edu/binf-in-AP/
2. Perform multiple sequence alignment on the selected sequences. There are many online tools that produce multiple sequence alignments. In this lesson we will use the Clustal Omega site http://www.ebi.ac.uk/Tools/msa/clustalo/
3. Calculate the distance matrix from the multiple sequence alignment. A distance matrix is a square matrix that gives the distance between every pair of species in your list. In this lesson we will use the Protdist tool in the Phylip package on the Institut Pasteur's web site http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::protdist
Note: There are many different computational methods for building a phylogenetic tree. In this lesson we will use distance-based tree construction, which is why we need to calculate a distance matrix. Other methods are not based on evolutionary distance and do not require a distance matrix. Some of them are regarded more highly than distance-based methods; however, they are often quite slow to run to an optimal result. In this worksheet we are going to use the neighbor-joining method, which belongs to the family of distance-based methods. It is a good compromise between speed and accuracy when producing phylogenies.
4. Build a phylogenetic tree. There are many tools available for building phylogenetic trees. Some are web tools and some have to be downloaded to your computer. In this lesson we will use a web version of the Phylip package located on the Institut Pasteur web site. We will specifically use the Neighbor-Joining and UPGMA methods tool at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::neighbor
The Institut Pasteur is located in Paris, France, and has many useful bioinformatics tools on its web site. We encourage you to explore those tools on your own outside of this lesson. Just navigate to http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome and investigate the Programs list in the top right corner of the web page. This web site provides a workbench in which you can click back and forth between the different tools/forms you are using and the jobs that those tools have completed. Some other tools you might want to explore on your own are:
• Phylogeny.fr at www.phylogeny.fr/version2_cgi/index.cgi
• TreeTop at http://www.genebee.msu.su/services/phtree_reduced.html
5. Visualize the tree as graphical output from the text representation of the tree produced by the previous step. In this lesson we will use Newicktops, Drawtree, and Drawgram in the Phylip package at
http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::newicktops
http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawtree
http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawgram
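For readers curious about what a neighbor-joining tool does with a distance matrix, here is a minimal Python sketch of the general algorithm. It is not the Phylip implementation, which may differ in details such as how ties are broken. The function name neighbor_joining, the five labels, and the toy distance matrix are all assumptions made up for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree from pairwise distances by neighbor joining.

    names: list of at least three leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b.
    Returns the tree as a NEWICK string with branch lengths.
    """
    nodes = list(names)   # active nodes, each stored as a NEWICK fragment
    d = dict(dist)        # working copy of the pairwise distances

    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each active node to all other active nodes.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
                 for a in nodes}

        def q(pair):
            # Neighbor-joining criterion: the pair with the smallest Q is joined.
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - total[a] - total[b]

        a, b = min(combinations(nodes, 2), key=q)
        dab = d[frozenset((a, b))]
        # Branch lengths from a and b to the new internal node.
        len_a = dab / 2 + (total[a] - total[b]) / (2 * (n - 2))
        len_b = dab - len_a
        new = f"({a}:{len_a:g},{b}:{len_b:g})"
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - dab) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    # Three nodes remain: connect them to one central node (no root).
    x, y, z = nodes
    dxy, dxz, dyz = (d[frozenset(p)] for p in ((x, y), (x, z), (y, z)))
    lx, ly, lz = (dxy + dxz - dyz) / 2, (dxy + dyz - dxz) / 2, (dxz + dyz - dxy) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Toy distance matrix for five made-up taxa (illustration only).
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {frozenset((labels[i], labels[j])): matrix[i][j]
        for i in range(len(labels)) for j in range(i + 1, len(labels))}
print(neighbor_joining(labels, dist))
```

The result has three branches meeting at the outermost level, which is how an unrooted tree is usually written in text form.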
Part A
Exercise 1: Draw a line between the two animals that are more closely related.
Quagga
Zebra
Horse
Exercise 2: Draw a line between the two plants that are more closely related.
Banksia
Pine
Hakea
Exercise 3: Draw a line between the two animals that are more closely related.
Horseshoe crab
Stone crab
Aquatic spider
Exercise 4: Draw a line between the two animals that are more closely related.
Barnacle
Shrimp
Limpet
Exercise 5: When using computational methods to determine phylogeny, we can use either genomic (DNA) or protein (amino acid) sequences. Can you think of reasons why it might be advantageous to use genomic rather than protein sequences, and vice versa?
Part B
Exercise 1: In order to calculate the distance between two or more species/entities based on protein sequences, we first need to know how similar those sequences are. Bioinformaticians use a technique called multiple sequence alignment for that purpose. A discussion of how multiple sequence alignment algorithms work is outside the scope of this course. Let’s align some sequences.
1. Download partB_1.txt from http://compbio.soe.ucsc.edu/binf-in-AP/evolution/
2. This file contains sequences for beta globin from these species: Homo sapiens (human), Bos taurus (cow), Salmo salar (Atlantic salmon), Mus musculus (mouse), Gorilla gorilla (gorilla), Otolemur crassicaudatus (galago), Gallus gallus (rooster), Tarsius syrichta (lemur), Dasypus novemcinctus (armadillo), Bradypus tridactylus (sloth), Rattus norvegicus (rat), Pan troglodytes (chimp)
3. This file contains protein sequences in a format called FASTA. It is just one of many file formats bioinformaticians use. (If you wish to know more about the FASTA format, read about it on Wikipedia http://en.wikipedia.org/wiki/FASTA_format)
4. Go to the Clustal Omega site at http://www.ebi.ac.uk/Tools/msa/clustalo/
5. Either
a. Open the downloaded file in any text editor (e.g. TextEdit on MacOS or TextPad on Windows)
b. Highlight all sequences and copy the text
c. Paste the copied sequences into the Step1 – Enter your input sequences field
6. Or
a. In Clustal Omega click on Choose File, navigate to the file you downloaded in step 1 and click Open
7. Make sure that it says PROTEIN in the drop down box above the data input field
8. Click on the Submit button
9. Once the alignments appear, click on the Show Colors button. You should see a screen like this:
10. Click on FAQ to the left of the alignment section of the screen. A new window will open with a list of questions as links. Click on the question “What do the colours mean when I show them on the alignment?” What does each color of the alignment mean?
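To make the FASTA format from step 3 more concrete, here is a small Python sketch that reads FASTA text into a dictionary. Each record starts with a header line beginning with ">", followed by one or more lines of sequence. The function name parse_fasta and the two toy records are assumptions for illustration; they are not taken from partB_1.txt.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without the '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            header = line[1:]             # a new record begins
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Two toy records (made up for illustration).
example = """>toy_protein_1
MKLVTAGW
ERDNQ
>toy_protein_2
MKLITAGW
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence), sequence)
```

Note that the sequence of the first record is split over two lines in the text but is joined into one string, which is what alignment tools do when they read such a file.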
You have now completed the following steps in the phylogenetic analysis pipeline:
Exercise 2: Let’s calculate the distance matrix for the multiple alignment we created.
1. Click on the Download Alignment File button. The alignment file is a text file containing the alignment. Either save the file and open it in a text editor (e.g. TextEdit) or choose to open it in a text editor without saving it first.
2. Highlight the contents of the text file and copy it.
3. Go to the Protdist tool in the Phylip package at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::protdist
4. Paste the alignment into the data input box.
5. Click on the Run button.
6. Enter your email address into the Your email field and click OK
7. Enter the captcha text into the validation box and click OK
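To get a feel for what a distance matrix contains, here is a Python sketch that computes the simplest possible distance, the fraction of aligned positions at which two sequences differ (often called the p-distance). This is an assumption chosen for simplicity: Protdist estimates distances with a model of amino acid substitution, so its numbers will differ. The names p_distance and distance_matrix and the three toy aligned sequences are made up for illustration.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of compared positions at which two aligned sequences differ.

    Positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Return (names, square matrix of pairwise p-distances)."""
    names = list(alignment)
    matrix = [[p_distance(alignment[a], alignment[b]) for b in names]
              for a in names]
    return names, matrix


# Toy alignment of three made-up sequences (illustration only).
alignment = {
    "taxon1": "MVHLTPEEKS",
    "taxon2": "MVHLTGEEKS",
    "taxon3": "MV-LSGDEKA",
}

names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name.ljust(8), " ".join(f"{value:.3f}" for value in row))
```

The matrix is square and symmetric, with zeros on the diagonal, because every sequence is at distance zero from itself.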
You have now completed the following steps in the phylogenetic analysis pipeline:
Exercise 3: Let’s now build a phylogenetic tree from the distance matrix produced in Exercise 2.
1. Highlight all the text in the Outfile (PhylipDistanceMatrix) and copy it.
2. On the top left of the page you can see the list of programs provided by the Pasteur web site:
3. Click on the plus sign next to phylogeny to expand it. Click on the plus sign next to distance to expand it.
4. Click on the neighbor program. You will see the Neighbor-Joining and UPGMA methods tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::neighbor). Notice how you can click back and forth between the Forms and Jobs tabs as well as between the subtabs within them. This workbench makes it easy to use multiple tools.
5. Paste the distance matrix into the data input text field.
6. Click on the Run button.
7. Click on the full screen view button under the Neighbor output file text box. Explore the output tree in the new window/tab. Please note that it says “remember: this is an unrooted tree!” in the Neighbor output file field. This is important. Even if the tree looks like it is drawn as rooted, there is no root in this tree.
8. Now, look at the Neighbor output tree file text box. What you see there is the output tree in NEWICK format, which is one of the text formats for representing trees. In this format, each subtree is enclosed in a set of matching parentheses. Two leaf nodes or subtrees are separated by a comma. Every opening parenthesis must be matched by a closing parenthesis. The tree ends with a semicolon.
9. Can you draw a tree from the following NEWICK formatted tree? (((A, B), (C, D)), (E, F));
10. Click on the view with archaeopteryx button under the Neighbor output tree file field. Does the tree you see look the same as or different from the one in step 7?
Troubleshooting: Archaeopteryx is a Java application. If Java is disabled on your computer then skip this step and go to Exercise 4 below.
11. Notice the menu at the top of the popup screen. Play around with the different types of tree under the Type menu option. Notice that the types available here do not change the topology of the tree. They only change the way edges are displayed.
12. Click on X in the upper right corner to close the archaeopteryx screen.
You have now completed the following steps in the phylogenetic analysis pipeline:
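The NEWICK rules described in step 8 (matching parentheses, commas between siblings, a closing semicolon) can be turned into a short parser. The Python sketch below is a simplified illustration, not a full NEWICK reader: it ignores branch lengths and internal labels and does not support leaf names containing spaces or punctuation. The names parse_newick and leaves and the example trees are assumptions for illustration.

```python
def parse_newick(text):
    """Parse a NEWICK string into nested lists of leaf names.

    Branch lengths (":0.12") and internal node labels are skipped.
    """
    text = "".join(text.split())          # drop spaces and line breaks
    if not text.endswith(";"):
        raise ValueError("a NEWICK tree must end with a semicolon")
    pos = 0

    def read_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    def subtree():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [subtree()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(subtree())
            if text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                      # consume ")"
            read_label()                  # skip optional label / branch length
            return children
        return read_label().split(":")[0]  # leaf name without branch length

    tree = subtree()
    if text[pos] != ";":
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(tree):
    """List the leaf names of a parsed tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("((W, X), (Y, Z));")
print(tree)
print(leaves(parse_newick("((W:0.1,X:0.2):0.05,Y:0.3);")))
```

Each pair of parentheses becomes one nested list, so the nesting of the lists mirrors the branching of the tree.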
Exercise 4: Let’s now learn how to visualize the tree using tools provided by Phylip.
Part A
Let’s learn the newicktops tool.
1. Highlight and copy the text in the Neighbor output tree file text box.
2. On the top right of the page, under phylogeny, click on the plus sign next to display to expand it. Click on the newicktops program. You will see the Newicktops tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::newicktops)
3. Paste the tree in NEWICK format into the data input box.
Note: If you are using a Windows machine then skip to Part B below. Newicktops outputs a tree layout in a format that Windows cannot read.
4. Click on the Run button.
5. Click on full screen view under Graphic tree file:
6. The file should open in Preview or in another browser tab.
Troubleshooting: If you are having trouble with this step then:
• If you are on MacOS and the file does not open in Preview, check the bottom of your browser to see if a .ps file has been downloaded, then click on that file. Depending on your browser, the downloaded file may not be at the bottom of the browser window. Check your downloads directory.
• If all else fails, skip to Part B below.
7. You should see an output similar to this:
8. Does the tree you see look the same as or different from the one viewed in archaeopteryx?
9. From this tree, which species’ beta globin is the human beta globin most closely related to?
10. From this tree, which species’ beta globin is the human beta globin second most closely related to?
Part B
Let’s learn the drawgram tool.
1. Go back to the Jobs workbench tab.
2. Click on the last subtab that says “neighbor – XXX” where XXX is a timestamp (the job subtabs run from top to bottom and from left to right).
3. Highlight and copy the text in the Neighbor output tree file text box.
4. On the top right of the page, under phylogeny -> display, click on the drawgram program. You will see the Drawgram tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawgram)
5. Paste the tree in NEWICK format into the data input box.
6. Click on Advanced Options and in the field Which plotter or printer will the tree be drawn on select Postscript for MacOS or Bitmap for Windows.
7. Click on the Run button.
8. Examine the Standard output field.
9. Does this tool output a rooted or an unrooted tree? (Hint: the output text should state this.)
10. Click on full screen view under Graphic tree file.
Troubleshooting: If you are having trouble with this step then:
• If you are on Windows, did you remember to select Bitmap format for your output in step 6?
• Check the bottom of your browser to see if a .ps (or .bmp on Windows) file has been downloaded, then click on that file. Depending on your browser, the downloaded file may not be at the bottom of the browser window. Check your downloads directory.
• If all else fails, skip to Part C below.
11. The file should open in Preview.
12. You should see an output similar to this:
13. Which species does the chicken appear to be closest to in this tree?
14. Do you believe that is a correct relationship? If not, how would you suggest fixing it?
15. Go back to the Forms workspace tab.
16. You should be back in the drawgram subtab.
17. Scroll down to the Tree grows … option and pick Horizontally in the drop down list.
18. Right below it, select Circular tree (O) in the Tree style drop down list.
19. Click on the Run button.
20. Click on full screen view under Graphic tree file.
21. You should see an output similar to this:
22. Where in this tree would you place a root? (You can just draw it on the figure above.)
Part C
Let’s learn the drawtree tool.
1. Go back to the Jobs workbench tab.
2. Click on the last subtab that says “neighbor – XXX” where XXX is a timestamp (the job subtabs run from top to bottom and from left to right).
3. Highlight and copy the text in the Neighbor output tree file text box.
4. On the top right of the page, under phylogeny -> display, click on the drawtree program. You will see the Drawtree tool in the Phylip package (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::drawtree)
5. Paste the tree in NEWICK format into the data input box.
6. Scroll down and select Yes in the drop down list for the Try to avoid label overlap option.
7. Click on Advanced Options and in the field Which plotter or printer will the tree be drawn on select Postscript for MacOS or Bitmap for Windows.
8. Click on the Run button.
9. Examine the Standard output field.
10. Does this tool output a rooted or an unrooted tree? (Hint: the output text should state this.)
11. Click on full screen view under Graphic tree file.
Troubleshooting: If you are having trouble with this step then:
• If you are on Windows, did you remember to select Bitmap format for your output in step 7?
• Check the bottom of your browser to see if a .ps (or .bmp on Windows) file has been downloaded, then click on that file. Depending on your browser, the downloaded file may not be at the bottom of the browser window. Check your downloads directory.
• If all else fails, skip to Exercise 5 below.
12. The file should open in Preview.
13. You should see an output similar to this:
14. Where in this tree would you place a root? (You can just draw it on the figure above.)
15. Go back to the Forms workspace tab.
16. You should be back in the drawtree subtab.
17. Scroll down to the Use branch lengths option and pick No in the drop down list.
18. Click on the Run button.
19. Click on full screen view under Graphic tree file.
20. You should see an output similar to this:
21. As a biologist, how would you interpret the difference between the trees produced in steps 8 and 18? What does one tree tell you that the other tree does not?
You have now completed the following steps in the phylogenetic analysis pipeline:
Exercise 5: As you can see from Exercise 4, without an outgroup a tree can show misleading relationships. The trees produced in Exercise 4 led you to believe that the chicken is more closely related to the salmon than to any other species. Let’s now add an outgroup. We are going to add the protein sequence of human myoglobin. An outgroup should be a related sequence whose divergence is known to be older than the relationships you are interested in. This could be a paralog that predates the speciation. We know that the myoglobin and beta globin proteins split from their common ancestor before the speciation of the species included in our analysis.
1. Download partB_2.txt from http://compbio.soe.ucsc.edu/binf-in-AP/ This file contains the same sequences we used in Exercise 1 plus the outgroup sequence.
2. Open the downloaded file in any text editor (e.g. TextEdit)
3. Highlight all sequences and copy the text
4. Perform multiple sequence alignment as described in Exercise 1 and download the alignment file.
5. Compute the distance matrix for the alignment as described in Exercise 2.
6. Build the phylogenetic tree as described in Exercise 3.
a. When in the Neighbor-Joining and UPGMA methods tool, scroll down to the Outgroup species (default, use as outgroup species 1) option and type in 13 (this means that the 13th sequence is the outgroup)
7. Visualize the tree using one or more of the methods described in Exercise 4.
Troubleshooting: If you were unable to succeed with any of the visualization methods in Exercise 4 then skip ahead to Part C.
8. In the output tree, which species/group is the chicken most closely related to?
Part C
In this section we will use phylogenetic analysis to find out which strain of the Simian immunodeficiency virus (SIV) the Human immunodeficiency virus (HIV) evolved from. SIV is able to infect at least 33 species of African primates. Different strains of this virus have been extracted from different species. We will use the protein sequence of the Group-specific antigen (GAG) protein from 4 different strains of SIV and from 1 strain of HIV to build a phylogenetic tree. The GAG gene is a characteristic component of retroviruses. Retroviruses carry an RNA genome rather than a DNA genome. They use an enzyme called reverse transcriptase to produce DNA from their RNA genome, then incorporate that DNA into the host’s genome. Both SIV and HIV are retroviruses. Remember that in order to produce a phylogenetic tree you follow the steps you learned in Part B:
Exercise 1: Let’s use GAG protein sequences from different strains of SIV and HIV to build a phylogenetic tree.
1. Download partC.txt from http://compbio.soe.ucsc.edu/binf-in-AP/evolution/ This file contains GAG sequences for immunodeficiency viruses found in humans, African green monkeys, sooty mangabey monkeys, chimpanzees, and macaques.
2. Open the downloaded file in any text editor (e.g. TextEdit)
3. Following the steps you learned in Part B, analyze the phylogeny of the provided SIV and HIV viruses and answer the following question.
4. Which strain of SIV is human HIV most closely related to?
Part D
We will now perform phylogenetic analysis to determine the phylogeny of marsupials based on Retinol Binding Protein 3 (RBP3). This protein is a large extracellular glycoprotein that binds retinol to the contiguous layer of pigment epithelium cells. It is a well-known phylogenetic marker in mammal evolution and has been used before in scientific studies of the phylogeny of mammals. However, this protein is not specific to mammals and is present in other animals. Remember that in order to produce a phylogenetic tree you follow the steps you learned in Part B:
Exercise 1: Let’s use RBP3 sequences from various animals to see the evolutionary relationships between marsupials, placentals, and other animals.
2. Download partD.txt from http://compbio.soe.ucsc.edu/binf-in-AP/evolution/ This file contains sequences for the following species: Marmosops noctivagus (neotropic opossum), Ornithorhynchus anatinus (platypus), Dipodomys merriami (Merriam’s kangaroo rat), Dipodomys ordii (Ord’s kangaroo rat), Dipodomys spectabilis (banner-tailed kangaroo rat), Wallabia bicolor (swamp wallaby), Setonix brachyurus (quokka), Castor canadensis (beaver), Petrogale lateralis (rock wallaby), Onychogalea unguifera (nail-tail wallaby), Peromyscus maniculatus (deer mouse), Uranomys ruddi (white-bellied brush-furred rat), Notomys fuscus (dusky hopping mouse), Perognathus flavus (silky pocket mouse), Liomys pictus (painted spiny pocket mouse), Drosophila melanogaster (fly).
4. Which of the listed species should be the outgroup?
5. Open the downloaded file in any text editor (e.g. TextEdit)
6. Following the steps you learned in Part B, analyze the phylogeny of the provided species and answer the following questions.
7. Which other species is the platypus most closely related to?
8. Do the kangaroo rats belong to the same clade?
9. Which other species are the kangaroo rats most closely related to?