Bioinformatics Practical for Biochemists

Andrei Lupas, Birte Höcker, Steffen Schmidt
WS 2012/2013
01. DNA & Genomics

Description



• Lectures about general topics in bioinformatics and its history
• Tutorials will provide you with a toolbox of bioinformatics programs to analyze data
• Hands-on sessions will give you the opportunity to use these tools

Course Outline

• Mon – DNA & Genomics
• Tue – Introduction to Proteins
• Wed – Annotation of Sequence Features
• Thu – Protein Classification
• Fri – Evolution & Design

Course Material: eb.mpg.de/research/departments/protein-evolution/teaching

Course Outline

• 13:00-14:00 Presentation
• 14:15-17:30 Tutorial (2 x 30 min) & hands-on practical
• You will need to keep an electronic lab notebook
• Fri afternoon: Test Exercises

Software Requirements

• Browser (e.g. Firefox)
• “Advanced” Word Processor
• PyMOL (www.pymol.org – free for teaching)

DNA & Genomics

1953 Model of DNA (F. Crick)

What is the “genetic material”?



• 1865 Gregor Mendel
  – basic rules of heredity
• 1869 Friedrich Miescher
  – discovery of ‘nuclein’ (DNA); Hoppe-Seyler repeated all experiments
• 1881 Edward Zacharias
  – chromosomes are composed of nuclein
• 1889 Richard Altmann
  – renaming nuclein to nucleic acid

wikipedia.org

DNA is the “transforming material”



• 1928 Frederick Griffith
  – “transforming principle”, Str. pneumoniae experiment
• 1944 Avery, MacLeod & McCarty
  – Griffith’s “transforming principle” is DNA

history.nih.gov / wikipedia.org

DNA is the genetic material



• 1950 Erwin Chargaff
  – A and T, and C and G, occur in equal amounts; the composition is the same in different tissues
• 1952 Hershey & Chase
  – DNA is the genetic material, shown with the 32P/35S phage/E. coli experiment

bacteriophagetherapy.info / www.lifesciencesfoundation.org
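Chargaff's observation follows directly from base pairing: every A on one strand pairs with a T on the other, so counting over both strands of a double helix always gives equal amounts. A minimal Python sketch of this, using a made-up example sequence (the sequence and the function names are illustrative assumptions, not part of the course material):

```python
from collections import Counter

# Watson-Crick pairing partners
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the double helix."""
    return Counter(strand) + Counter(reverse_complement(strand))


# Hypothetical example sequence, chosen only for illustration
example = "ATGGCGTACGCTTAA"
counts = duplex_base_counts(example)
print(counts["A"] == counts["T"], counts["C"] == counts["G"])
```

The equality holds for any input strand, whereas the ratio of A+T to C+G depends on the sequence.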

Solving the DNA structure



• 1952/53 Linus Pauling
  – beat Cavendish Lab in the discovery of the α-helix
  – Cavendish Lab (Cambridge): Watson & Crick allowed to work full-time on DNA
  – Pauling shared manuscript with Cavendish Lab before publication (via his son Peter Pauling)

http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/notes/1952a.22-ms-01.html

Solving the DNA structure



• 1952 Franklin & Wilkins
  – X-ray of B-DNA; Wilkins showed results to Watson & Crick
  – periodicity, phosphates are outside
• 1953 Crick & Watson
  – model of B-DNA

Nature, Vol. 421, 23 January 2003 (www.nature.com/nature)

Solving the DNA structure

Nature, 1953


DNA structure


Getting the “code”



• 1953 George E. Palade
  – “RNA organelles” (ribosomes)
• 1957 Crick et al.
  – suggest non-overlapping triplets
  – only 20 out of 64 triplets code for an amino acid
  – “comma-free code”
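The figure of 20 in the comma-free proposal can be recovered by counting. Of the 4^3 = 64 triplets, the four homopolymers (AAA, CCC, GGG, UUU) cannot be code words, and from each group of three cyclic shifts (e.g. ACG, CGA, GAC) at most one triplet can be used. A small Python sketch of that count (the function and variable names are illustrative):

```python
from itertools import product

BASES = "ACGU"


def cyclic_class(triplet: str) -> frozenset:
    """All cyclic shifts of a triplet, e.g. ACG -> {ACG, CGA, GAC}."""
    return frozenset(triplet[i:] + triplet[:i] for i in range(3))


triplets = ["".join(p) for p in product(BASES, repeat=3)]
classes = {cyclic_class(t) for t in triplets}

# Homopolymers such as AAA form classes of size 1 and are excluded;
# every other class has 3 members and can contribute one code word.
usable_classes = [c for c in classes if len(c) == 3]

print(len(triplets))        # 64 possible triplets
print(len(usable_classes))  # 20, the upper bound on comma-free code words
```

This only reproduces the counting argument: (64 - 4) / 3 = 20. It does not construct an actual comma-free code.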

Getting the “code”

• 1961 Nirenberg & Matthaei
  – polyU mRNA produces polyF protein
  – complete genetic code
• 1961 Sydney Brenner
  – no overlapping codes
  – concept of mRNA
  – triplet code (Crick, Brenner, Barnett, Watts-Tobin)

[Figure: excerpt from the 1961 paper, including Fig. 1, “To show the difference between an overlapping code and a non-overlapping code”, and the statement “(d) The code is probably ‘degenerate’; that is, in general, one particular amino-acid can be coded by one of several triplets of bases.”]

Getting the “code” – incl. start & stop codons



• Alternative start codons
  – AUG (83%)
  – GUG (14%)
  – UUG (3%)
• Alternative stops
  – UAA (63%, ‘ochre’)
  – UGA (29%, ‘opal’) / or Sec (selenocysteine)
  – UAG (8%, ‘amber’)

Frequencies for E. coli
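How start and stop codons delimit a reading frame can be sketched in a few lines of Python. The function below scans one frame of an mRNA for the first of the start codons listed above and reads triplets until an in-frame stop codon. The example sequence and the name `first_orf` are hypothetical, and the special case of UGA coding for selenocysteine is ignored.

```python
START_CODONS = {"AUG", "GUG", "UUG"}   # start codons listed for E. coli
STOP_CODONS = {"UAA", "UGA", "UAG"}    # ochre, opal, amber


def first_orf(mrna: str, frame: int = 0):
    """Return the codons of the first open reading frame in one frame.

    The result runs from the first start codon up to, but not including,
    the next in-frame stop codon. Returns None if the frame contains no
    start codon or no stop codon after it.
    """
    codons = [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]
    for start, codon in enumerate(codons):
        if codon in START_CODONS:
            for end in range(start + 1, len(codons)):
                if codons[end] in STOP_CODONS:
                    return codons[start:end]
            return None  # start codon found, but no in-frame stop
    return None


# Hypothetical example sequence
print(first_orf("AUGGCUAAAUAAGG"))  # ['AUG', 'GCU', 'AAA']
```

A real gene finder would scan all three frames on both strands and apply a minimum length; this sketch only shows the role of the two codon sets.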

Gene Structure



• 1977 Sharp & Roberts
  – pre-mRNA is processed
• 1982 Cech
  – ribo(nucleic en)zymes
• 1980 Joan A. Steitz
  – role of snRNPs in splicing

wikipedia.org / yale.edu

Gene Structure – Eukaryotes / Prokaryotes

Figure: lac operon. 1: Regulatory gene; 3: β-galactosidase; 4: β-gal permease; 8: β-gal transacetylase; promoter region

Gene structure – Polysomes in Prokaryotes



• EM picture of polysomes on a chromosome

Figure labels: mRNA with ribosomes; transcription initiation; DNA

Miller, O. L. et al. Visualization of bacterial genes in action. Science 169, 392–395

Gene Structure – Prokaryotic Operons

Figure: lac operon. 1: Regulatory gene; 3: β-galactosidase; 4: β-gal permease; 8: β-gal transacetylase; promoter region

Griswold, A. (2008) Nature Education 1(1)
Understanding Bioinformatics, Zvelebil & Baum, 2007

Gene Structure – Prokaryotes

u-tokyo.ac.jp


Gene Structure – Eukaryotes

zazzle.com


Gene Structure – Comparison

Genes
• Eukaryote
  – Often have introns
  – Intraspecific gene order and number generally relatively stable
  – Many non-coding (RNA) genes
  – There is NOT generally a relationship between organism complexity and gene number
• Prokaryote
  – No introns
  – Gene order and number may vary between strains of a species

Gene regulation
• Eukaryote
  – Promoters, often with distal long range enhancers/silencers, MARS, transcriptional domains
  – Generally mono-cistronic
• Prokaryote
  – Promoters
  – Enhancers/silencers rare
  – Genes often regulated as polycistronic operons

Repetitive sequences
• Eukaryote
  – Generally highly repetitive with genome wide families from transposable element propagation
• Prokaryote
  – Generally few repeated sequences
  – Relatively few transposons

Organelle (subgenomes)
• Eukaryote
  – Mitochondrial (all)
  – Chloroplasts (in plants)
• Prokaryote
  – Absent

Genomic era



• 1977 Frederick Sanger
  – dideoxy sequencing

• 1986 Human Genome Initiative
• Genomes
  – 1995  H. influenzae     1.8 Mb    1.7k genes
  – 1997  E. coli           4.6 Mb    4.3k genes
  – 1996  S. cerevisiae     12.5 Mb   5.7k genes
  – 1998  C. elegans        100 Mb    21.7k genes
  – 2000  D. melanogaster   121 Mb    17k genes
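The genome sizes and gene counts listed above can be turned into a gene density (genes per Mb), which makes the difference between the compact prokaryotic genomes and the larger eukaryotic ones explicit. A short Python sketch using only the numbers from the list (the function name `gene_density` is illustrative):

```python
# (year, organism, genome size in Mb, approximate gene count), as listed above
GENOMES = [
    (1995, "H. influenzae", 1.8, 1_700),
    (1997, "E. coli", 4.6, 4_300),
    (1996, "S. cerevisiae", 12.5, 5_700),
    (1998, "C. elegans", 100.0, 21_700),
    (2000, "D. melanogaster", 121.0, 17_000),
]


def gene_density(size_mb: float, genes: int) -> float:
    """Genes per megabase of genome sequence."""
    return genes / size_mb


for year, organism, size_mb, genes in GENOMES:
    print(f"{organism:16s} {gene_density(size_mb, genes):7.0f} genes/Mb")
```

With these figures the two bacteria come out at roughly 900 genes/Mb, yeast at about half that, and the two animals at around 140 to 220 genes/Mb.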

Prokaryotic Genome



• E. coli
  – 4.6 Mbp
  – 1 by 2 µm cell size

Kavanoff, Nature Education: Supercoiled chromosome of E. coli.

The human genome

• 2001  Draft H. sapiens  2.9 Gb  20-30k genes

Science (2001), Nature (2001)

The human genome


Gene content


Genome Structure – Comparison

Size
• Eukaryote
  – Large (10 Mb – 100,000 Mb)
  – There is not generally a relationship between organism complexity and its genome size (many plants have larger genomes than human!)
• Prokaryote
  – Generally small (
  – Complexity (as measured by # of genes and metabolism) generally proportional to genome size

Content
• Eukaryote
  – Most DNA is non-coding
• Prokaryote
  – DNA is “coding gene dense”

Telomeres/Centromeres
• Eukaryote
  – Present (linear DNA)
• Prokaryote
  – Circular DNA, doesn't need telomeres

Number of chromosomes
• Eukaryote
  – More than one, (often) including those discriminating sexual identity

Chromatin
