Bioinformatics Practical for Biochemists Andrei Lupas, Birte Höcker, Steffen Schmidt WS 2012/2013 01. DNA & Genomics
Description
• Lectures about general topics in bioinformatics and its history
• Tutorials will provide you with a toolbox of bioinformatics programs to analyze data
• Hands-on sessions will give you the opportunity to use these tools
Course Outline
• Mon – DNA & Genomics
• Tue – Introduction to Proteins
• Wed – Annotation of Sequence Features
• Thu – Protein Classification
• Fri – Evolution & Design
Course Material: eb.mpg.de/research/departments/protein-evolution/teaching
Course Outline
• 13:00-14:00 Presentation
• 14:15-17:30 Tutorial (2 x 30 min) & hands-on practical
• You will need to keep an electronic lab notebook
• Fri afternoon: Test Exercises
Software Requirements
• Browser (e.g. Firefox)
• “Advanced” Word Processor
• PyMOL (www.pymol.org – free for teaching)
DNA & Genomics
1953 Model of DNA (F. Crick)
What is the “genetic material”?
• 1865 Gregor Mendel
  • basic rules of heredity
• 1869 Friedrich Miescher
  • discovery of ‘nuclein’ (DNA); Hoppe-Seyler repeated all experiments
• 1881 Edward Zacharias
  • chromosomes are composed of nuclein
• 1889 Richard Altmann
  • renamed nuclein to nucleic acid
wikipedia.org
DNA is the “transforming material”
• 1928 Frederick Griffith
  • “transforming principle”
  • Str. pneumoniae experiment
• 1944 Avery, MacLeod & McCarty
  • Griffith’s “transforming principle” is DNA
history.nih.gov / wikipedia.org
DNA is the genetic material
• 1950 Erwin Chargaff
  • A/T and C/G occur in equal amounts
  • same composition in different tissues
• 1952 Hershey & Chase
  • DNA is the genetic material, shown with the 32P/35S phage/E. coli experiment
bacteriophagetherapy.info / www.lifesciencesfoundation.org
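Chargaff's observation can be checked on any double-stranded sequence. The following is a minimal sketch in Python; the sequence `strand` is a made-up example and the helper names are not from the course material. It builds both strands of a duplex and compares the base fractions.

```python
from collections import Counter

def base_composition(seq):
    """Return the fraction of each base (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[b] for b in reversed(seq.upper()))

# Hypothetical single strand, chosen only for illustration
strand = "ATGCGCGTTAACGGATCCA"
# Both strands together, as they would be measured in double-stranded DNA
duplex = strand + reverse_complement(strand)
comp = base_composition(duplex)

for base in "ACGT":
    print(base, round(comp[base], 3))
```

In the duplex the fractions of A and T are equal, as are those of C and G, because every base is paired with its complement. A single strand on its own does not have to follow this rule.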
Solving the DNA structure
• 1952/53 Linus Pauling
  • beat Cavendish Lab in the discovery of the α-helix
• Cavendish Lab (Cambridge)
  • Watson & Crick allowed to work full-time on DNA
  • Pauling shared his manuscript with the Cavendish Lab before publication (via his son Peter Pauling)
http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/notes/1952a.22-ms-01.html
Solving the DNA structure
• 1952 Franklin & Wilkins
  • X-ray of B-DNA; Wilkins showed the results to Watson & Crick
  • periodicity, phosphates are outside
• 1953 Crick & Watson
  • model of B-DNA
Nature, Vol. 421, 23 January 2003
Solving the DNA structure
Nature, 1953
DNA structure
Getting the “code”
• 1953 George E. Palade
  • “RNA organelles” (ribosomes)
• 1957 Crick et al.
  • suggest non-overlapping triplets
  • only 20 out of 64 triplets code for an amino acid
  • “comma-free code”
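The figure of 20 out of 64 triplets can be reproduced with a short counting argument. The sketch below is illustrative Python, not taken from the lecture, and the function names are assumptions. A comma-free code must exclude triplets such as AAA, because AAAAAA can be read out of frame. It can also use at most one triplet from each group of cyclic rotations, because ACGACG contains CGA and GAC. The count gives the upper bound only and does not construct an actual code.

```python
from itertools import product

# All 4^3 = 64 possible RNA triplets
triplets = ["".join(p) for p in product("ACGU", repeat=3)]

def rotations(t):
    """Return the set of cyclic rotations of a triplet."""
    return {t, t[1:] + t[:1], t[2:] + t[:2]}

def is_comma_free(code):
    """True if no codeword appears out of frame in any pair of adjacent codewords."""
    code = set(code)
    for a in code:
        for b in code:
            pair = a + b
            if pair[1:4] in code or pair[2:5] in code:
                return False
    return True

# Group the 64 triplets into classes of cyclic rotations
classes = {frozenset(rotations(t)) for t in triplets}
homopolymers = [c for c in classes if len(c) == 1]  # AAA, CCC, GGG, UUU
usable = [c for c in classes if len(c) == 3]        # at most one member each

print(len(triplets), len(homopolymers), len(usable))
```

The 60 triplets that are not homopolymers fall into 20 rotation classes, which matches the number of amino acids. The comma-free idea was attractive for that reason, although the real code turned out to work differently.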
Getting the “code”
• 1961 Nirenberg & Matthaei
  • polyU mRNA
  • polyF (polyphenylalanine) protein
  • complete genetic code
• 1961 Sydney Brenner
  • no overlapping codes
  • concept of mRNA
  • triplet code (Crick, Brenner, Barnett, Watts-Tobin)
[Background: scanned excerpt from Crick, Barnett, Brenner & Watts-Tobin, Nature (1961), including the statement “(d) The code is probably ‘degenerate’; that is, in general, one particular amino-acid can be coded by one of several triplets of bases.” and Fig. 1, “To show the difference between an overlapping code and a non-overlapping code.”]
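The argument against overlapping codes rests on how many triplets a single base change affects. A small sketch in Python makes this concrete; the message `rna` and the function names are hypothetical. It reads a sequence in both ways and counts the triplets altered by one substitution.

```python
def read_triplets(seq, overlapping):
    """Split a sequence into triplets, stepping by 1 (overlapping) or 3 (non-overlapping)."""
    step = 1 if overlapping else 3
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]

def changed_triplets(seq, pos, new_base, overlapping):
    """Count the triplets that differ after substituting one base at index pos."""
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = read_triplets(seq, overlapping)
    after = read_triplets(mutated, overlapping)
    return sum(1 for a, b in zip(before, after) if a != b)

# Hypothetical 15-base message, chosen only for illustration
rna = "AUGGCUUCAGGAUAA"

n_overlapping = changed_triplets(rna, 7, "G", overlapping=True)
n_non_overlapping = changed_triplets(rna, 7, "G", overlapping=False)
print(n_overlapping, n_non_overlapping)
```

With an overlapping code, one substitution changes three adjacent triplets and would usually alter three neighbouring amino acids. With a non-overlapping code it changes one triplet, which fits the observation that mutants usually show a single amino-acid change.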
Getting the “code” – incl. start & stop codons (E. coli)
• Alternative start codons
  • AUG (83%)
  • GUG (14%)
  • UUG (3%)
• Alternative stop codons
  • UAA (63%, ‘ochre’)
  • UGA (29%, ‘opal’), or Sec (selenocysteine)
  • UAG (8%, ‘amber’)
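The start and stop codons listed for E. coli are enough to sketch a naive open reading frame (ORF) search. The Python below is a simplified illustration, not one of the tools used in the tutorials. The function name, the `min_codons` parameter and the test sequence are assumptions. The search scans only the three forward frames and always treats UGA as a stop, ignoring its use for selenocysteine.

```python
STARTS = {"AUG", "GUG", "UUG"}  # start codons used in E. coli
STOPS = {"UAA", "UGA", "UAG"}   # ochre, opal, amber

def find_orfs(rna, min_codons=2):
    """Return (start, end, frame) tuples for ORFs in the three forward frames.

    start is the index of the first base of the start codon,
    end is the index just past the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                # count coding codons between start and stop, start included
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Hypothetical message: CC + GUG AAA CCC UAA + GG
rna = "CCGUGAAACCCUAAGG"
orfs = find_orfs(rna)
print(orfs)
```

A scan that accepts only AUG would miss this ORF, which begins with GUG. Real gene finders add further evidence such as ribosome binding sites and codon usage.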
Gene Structure
• 1977 Sharp & Roberts
  • pre-mRNA is processed
• 1982 Cech
  • ribo(nucleic en)zymes
• 1980 Joan A. Steitz
  • role of snRNPs in splicing
wikipedia.org / yale.edu
Gene Structure – Eukaryotes / Prokaryotes
lac operon
1: Regulatory gene
3: β-galactosidase
4: β-gal permease
8: β-gal transacetylase
Promoter region
Gene Structure – Polysomes in Prokaryotes
• EM picture of polysomes on a chromosome
Figure labels: mRNA with ribosomes; DNA; transcription initiation
Miller, O. L. et al. Visualization of bacterial genes in action. Science 169, 392–395 (1970)
Gene Structure – Prokaryotic Operons
lac operon
1: Regulatory gene
3: β-galactosidase
4: β-gal permease
8: β-gal transacetylase
Promoter region
Griswold, A. (2008) Nature Education 1(1); Understanding Bioinformatics, Zvelebil & Baum, 2007
Gene Structure – Prokaryotes
u-tokyo.ac.jp
Gene Structure – Eukaryotes
zazzle.com
Gene Structure – Comparison (Eukaryote vs. Prokaryote)
Genes
• Eukaryote
  • Often have introns
  • Intraspecific gene order and number generally relatively stable
  • Many non-coding (RNA) genes
  • There is NOT generally a relationship between organism complexity and gene number
• Prokaryote
  • No introns
  • Gene order and number may vary between strains of a species
Gene regulation
• Eukaryote
  • Promoters, often with distal long-range enhancers/silencers, MARs, transcriptional domains
  • Generally mono-cistronic
• Prokaryote
  • Promoters
  • Enhancers/silencers rare
  • Genes often regulated as polycistronic operons
Repetitive sequences
• Eukaryote
  • Generally highly repetitive, with genome-wide families from transposable element propagation
• Prokaryote
  • Generally few repeated sequences
  • Relatively few transposons
Organelle (subgenomes)
• Eukaryote
  • Mitochondrial (all)
  • Chloroplasts (in plants)
• Prokaryote
  • Absent
Genomic era
• 1977 Frederick Sanger
  • dideoxy sequencing
• 1986 Human Genome Initiative
Genomes
• 1995  H. influenzae    1.8 Mb   1.7k genes
• 1997  E. coli          4.6 Mb   4.3k genes
• 1996  S. cerevisiae    12.5 Mb  5.7k genes
• 1998  C. elegans       100 Mb   21.7k genes
• 2000  D. melanogaster  121 Mb   17k genes
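The genome sizes and gene counts above allow a quick comparison of gene density. The following Python sketch uses only the rounded figures from this slide; the variable and function names are illustrative.

```python
# (organism, year, genome size in Mb, approximate gene count) as listed on the slide
genomes = [
    ("H. influenzae", 1995, 1.8, 1700),
    ("E. coli", 1997, 4.6, 4300),
    ("S. cerevisiae", 1996, 12.5, 5700),
    ("C. elegans", 1998, 100.0, 21700),
    ("D. melanogaster", 2000, 121.0, 17000),
]

def genes_per_mb(size_mb, genes):
    """Average number of genes per megabase of genome."""
    return genes / size_mb

density = {name: genes_per_mb(size, genes) for name, _, size, genes in genomes}

# Print organisms from highest to lowest gene density
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {value:7.1f} genes/Mb")
```

With these rounded numbers the two bacteria come out at roughly 900 genes per Mb, while the animal genomes are several times less dense.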
Prokaryotic Genome
• E. coli
  • 6 Mbp
  • 1 by 2 µm cell size
Kavanoff, Nature Education: Supercoiled chromosome of E. coli.
The human genome
• 2001 Draft H. sapiens
  • 2.9 Gb
  • 20-30k genes
Science (2001), Nature (2001)
The human genome
Gene content
Genome Structure – Comparison (Eukaryote vs. Prokaryote)
Size
• Eukaryote
  • Large (10 Mb – 100,000 Mb)
  • There is not generally a relationship between organism complexity and its genome size (many plants have larger genomes than human!)
• Prokaryote
  • Complexity (as measured by number of genes and metabolism) generally proportional to genome size
Content
• Eukaryote
  • Most DNA is non-coding
• Prokaryote
  • DNA is “coding gene dense”
Telomeres/Centromeres
• Eukaryote
  • Present (linear DNA)
• Prokaryote
  • Circular DNA, doesn't need telomeres
Number of chromosomes
• Eukaryote
  • More than one, (often) including those discriminating sexual identity
Chromatin
• Generally small (