Introduction to Bioinformatics for Medical Research
Gideon Greenspan
TA: Oleg Rokhlenko
Lecture 1: Introduction to Bioinformatics
Introduction to Bioinformatics
• What is Bioinformatics?
• Why do we need it?
• Development timeline
• Journals, books, websites
• How to access bioinformatics tools?
• Why is bioinformatics hard?
• PubMed and OMIM databases
Bioinformatics: What?
• NCBI: “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”
• Lincoln Stein: “Biologists using computers, or the other way around.”
• Martin Gerstel (Compugen): “Bioinformatics is a name which will probably disappear with time.”
Bioinformatics: Why?
• Storing large quantities of data
  – Sequencing
  – Crystallography
  – DNA chips
• Enabling fast retrieval
  – Database searching
• Data mining and analysis
  – Integrating diverse sources
Human Genome Project
• Initiated in 1988, declared ‘complete’ in 2003
• Major goals
  – Determine 3×10^9 base pairs
  – Identify ~30,000 genes
• Computational tasks
  – Storage and indexing
  – Building contigs
  – Scanning for genes
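To give a feel for the "storage and indexing" task, here is a rough back-of-the-envelope sketch in Python. It assumes the 3×10^9 base pair figure from the slide and a simple 2-bit encoding (A, C, G, T only, no ambiguity codes or quality scores), which is an illustrative choice and not something the lecture prescribes. The function names are hypothetical.

```python
# Rough storage estimate for a genome, assuming 2 bits per base.
GENOME_BASES = 3 * 10**9   # figure quoted on the slide
BITS_PER_BASE = 2          # assumption: only A, C, G, T are stored

# Hypothetical 2-bit code for each base.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def packed_size_bytes(n_bases, bits_per_base=BITS_PER_BASE):
    """Bytes needed to hold n_bases at a fixed number of bits per base."""
    return (n_bases * bits_per_base + 7) // 8


def pack(seq):
    """Pack a DNA string into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODES[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from a packed integer."""
    out = []
    for _ in range(length):
        out.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(out))


if __name__ == "__main__":
    size = packed_size_bytes(GENOME_BASES)
    print(f"{size:,} bytes, about {size / 10**6:.0f} MB")
    print(unpack(pack("GATTACA"), 7))
```

Under these assumptions the raw sequence fits in well under a gigabyte; the harder parts of the task are indexing, annotation, and the much larger volume of raw reads behind the assembled sequence.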
Human Genome Progress
[Figure. Source: EMBL Genome Monitoring Table]
IBM’s Blue Gene
• Task: in-silico protein folding
• Announced 1999
  – Expanded in 2001
• 500,000 times faster than Pentium IV
• Aim: Fold one protein per year
Bioinformatics: When?
[Timeline figure, axis 1955 to 1985 in 5-year steps. Events shown:]
• Watson and Crick DNA model
• Sanger sequences insulin protein
• Needleman-Wunsch (N-W) sequence alignment
• PDB (Protein Data Bank)
• GenBank database
• ARPANET (early Internet)
• Sanger dideoxy DNA sequencing
• PCR (Polymerase Chain Reaction)
[Timeline figure, continued, axis ticks at 1990, 1995, 2000. Events shown:]
• SWISS-PROT database
• USA’s NCBI
• FASTA algorithm
• BLAST algorithm
• Human Genome Initiative
• Israel’s INN
• WWW (World Wide Web)
• Celera Genomics
• Europe’s EBI
• First human genome draft
GenBank Growth
[Figure. Source: NCBI]
PubMed Growth
[Figure: articles in database by year, 1959 to 2001; vertical axis from 0 to 14,000,000]
Bioinformatics: Where?
• Journals
• Books
  – David W. Mount, Bioinformatics: Sequence and Genome Analysis
  – Cynthia Gibas, Developing Bioinformatics Computer Skills
  – Bryan P. Bergeron, Bioinformatics Computing
• World Wide Web
  – USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
  – European Bioinformatics Institute: www.ebi.ac.uk
  – ExPASy Molecular Biology Server: www.expasy.org
  – Israeli National Node: inn.org.il
  – Open source news: bioinformatics.org
  – German directory: bioinformatik.de
Bioinformatics: How?
• Pre-packaged tools
  – Majority on World Wide Web
  – Some require downloading
  – Most are free to use
• Beginning development
  – Mostly Unix environment
  – Perl programming language
The Trouble with Nature
• Hard to represent
• Understanding still incomplete
• Some problems insoluble?
The Trouble with Man
• Confusing choice of tools
• Developed independently
• Written by and for nerds
Making it Simpler
PubMed
• MEDLINE publication database
  – Over 17,000 journals
  – Some other citations
• Papers from 1960s
  – Over 12,000,000 entries
• Alerting services
  – http://www.pubcrawler.ie/
  – http://www.biomail.org/
A PubMed Entry
• Journal reference
  – Volume, number, date, pages
  – Title, authors, affiliation
  – Abstract
• Links
  – Related articles
  – Full text (sometimes)
  – Database entries
Example entry:
  Cancer 2003 May 1;97(9):2248-53
  Pregnancy and early-stage melanoma.
  Daryanani D, Plukker JT, De Hullu JA, Kuiper H, Nap RE, Hoekstra HJ.
  Division of Surgical Oncology, University Medical Center, Groningen, The Netherlands.
  BACKGROUND: Cutaneous melanomas are aggressive tumors with an unpredictable…
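The journal reference line in the example entry packs the journal, date, volume, number, and pages into one compact string. The sketch below shows one way to split it apart in Python. The regular expression and the function name parse_citation are illustrative assumptions that cover only the "Journal YYYY Mon D;vol(no):first-last" layout shown above; real PubMed records come in more variants than this handles.

```python
import re

# Assumed layout: "Journal YYYY [Mon [D]];volume(number):first[-last]"
CITATION = re.compile(
    r"^(?P<journal>.+?) "
    r"(?P<date>\d{4}(?: [A-Z][a-z]{2}(?: \d{1,2})?)?);"
    r"(?P<volume>\d+)\((?P<issue>\d+)\):"
    r"(?P<first>\d+)(?:-(?P<last>\d+))?$"
)


def parse_citation(line):
    """Split a compact journal reference into its named parts."""
    match = CITATION.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized citation layout: {line!r}")
    first = match["first"]
    last = match["last"] or first
    # Abbreviated last pages ("2248-53") reuse the leading digits of the first.
    if len(last) < len(first):
        last = first[: len(first) - len(last)] + last
    return {
        "journal": match["journal"],
        "date": match["date"],
        "volume": match["volume"],
        "issue": match["issue"],
        "first_page": first,
        "last_page": last,
    }


if __name__ == "__main__":
    print(parse_citation("Cancer 2003 May 1;97(9):2248-53"))
```

Expanding the abbreviated last page is a small convenience for sorting or counting pages; the other fields are returned as strings exactly as they appear.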
Searching PubMed
• Unstructured searches
  – Automatic term mapping
• Structured searches
  – Field names, e.g. [au], [ta], [dp], [ti]
  – Boolean operators, e.g. AND, OR, NOT, ()
• Additional features
  – Subsets, limits
  – Clipboard, history
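A structured search combines field-tagged terms with Boolean operators and parentheses. The Python sketch below assembles such a query string and, as an assumption beyond what the slides cover, wraps it in a URL for NCBI's E-utilities ESearch endpoint (parameters db, term, retmax) for anyone who wants to run it from a script instead of the web form. The helper names field, all_of, any_of, and esearch_url are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(term, tag):
    """Attach a PubMed field tag such as au, ta, dp, or ti to a term."""
    return f"{term}[{tag}]"


def all_of(*parts):
    """Combine sub-queries with AND, wrapped in parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts):
    """Combine sub-queries with OR, wrapped in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def esearch_url(query, retmax=20):
    """Build an ESearch URL for the given PubMed query string."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    query = all_of(
        field("Hoekstra HJ", "au"),   # author
        field("Cancer", "ta"),        # journal title abbreviation
        field("2003", "dp"),          # date of publication
        any_of(field("melanoma", "ti"), field("pregnancy", "ti")),
    )
    print(query)
    print(esearch_url(query))
```

The same query string can be pasted directly into the PubMed search box; the URL form is only needed for scripted access.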
OMIM
• Online Mendelian Inheritance in Man
  – Genes and genetic disorders
  – Edited by team at Johns Hopkins
  – Updated daily
• Entries
  – 10,670 single-locus phenotypes (*)
  – 1,294 multi-locus phenotypes (#)
  – 2,415 unclassified phenotypes
An OMIM Entry
• Phenotype description
  – Clinical features
  – Diagnosis and treatment
  – Molecular genetics
• Inheritance Model
  – Mapping history
  – Genetic locus/loci
• References
Example entry:
  CYSTIC FIBROSIS; CF
  Alternative titles; symbols: MUCOVISCIDOSIS
  Gene map locus: 7q31.2
  DESCRIPTION: Manifestations relate not only to the disruption of exocrine function of the pancreas…
Searching OMIM
• Search Fields
  – Disease name, e.g. hypertension
  – Cytogenetic location, e.g. 1p31.6
  – Inheritance, e.g. autosomal dominant
• Browsing Interfaces
  – Alphabetical by disease
  – Genetic map
• Additional features like PubMed