Introduction to Bioinformatics - Laboratory of Computational Biology

Introduction to Bioinformatics for Medical Research
Gideon Greenspan
TA: Oleg Rokhlenko

Lecture 1 Introduction to Bioinformatics

Introduction to Bioinformatics
• What is Bioinformatics?
• Why do we need it?
• Development timeline
• Journals, books, websites
• How to access bioinformatics tools?
• Why is bioinformatics hard?
• PubMed and OMIM databases

Bioinformatics: What?
• NCBI: “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”
• Lincoln Stein: “Biologists using computers, or the other way around.”
• Martin Gerstel (Compugen): “Bioinformatics is a name which will probably disappear with time.”

Bioinformatics: Why?
• Storing large quantities of data
  – Sequencing
  – Crystallography
  – DNA chips
• Enabling fast retrieval
  – Database searching
• Data mining and analysis
  – Integrating diverse sources

Human Genome Project
• Initiated in 1988, declared ‘complete’ in 2003
• Major goals
  – Determine 3×10⁹ base pairs
  – Identify ~30,000 genes
• Computational tasks
  – Storage and indexing
  – Building contigs
  – Scanning for genes
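To give a feel for the "storage and indexing" task, here is a back-of-the-envelope sketch of how much space 3×10⁹ base pairs would take. The two encodings compared (2 bits per base, and one ASCII character per base) and the helper name `storage_bytes` are assumptions chosen for illustration, not something stated on the slide.

```python
# Rough storage estimate for a genome of 3 x 10^9 base pairs.
# Both encodings below are illustrative assumptions.
GENOME_BASES = 3 * 10**9  # figure taken from the slide


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed for n_bases at a fixed number of bits per base, rounded up."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8


# A, C, G, T fit in 2 bits each; plain text uses 8 bits per base.
packed = storage_bytes(GENOME_BASES, 2)
ascii_text = storage_bytes(GENOME_BASES, 8)

print(f"2-bit packed: {packed / 10**6:.0f} MB")     # 750 MB
print(f"ASCII text:   {ascii_text / 10**9:.0f} GB")  # 3 GB
```

The raw sequence is therefore modest in size; the harder parts are indexing it for fast lookup and storing the annotation that accompanies it.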

Human Genome Progress

Source: EMBL Genome Monitoring Table


IBM’s Blue Gene
• Task: in silico protein folding
• Announced 1999
  – Expanded in 2001
• 500,000 times faster than Pentium IV
• Aim: Fold one protein per year

Bioinformatics: When?

Timeline, 1955 to 1985
• Watson and Crick DNA model
• Sanger sequences insulin protein
• ARPANET (early Internet)
• N-W (Needleman-Wunsch) sequence alignment
• PDB (Protein Data Bank)
• Sanger dideoxy DNA sequencing
• GenBank database
• PCR (Polymerase Chain Reaction)

Timeline, 1985 to 2000
• SWISS-PROT database
• USA’s NCBI
• FASTA algorithm
• Human Genome Initiative
• Israel’s INN
• BLAST algorithm
• WWW (World Wide Web)
• Europe’s EBI
• Celera Genomics
• First human genome draft

GenBank Growth

Source: NCBI

PubMed Growth
[Chart: Articles in Database by year, 1959 to 2001; vertical axis from 0 to 14,000,000]

Bioinformatics: Where?
• Journals
• Books
  – David W. Mount, Bioinformatics: Sequence and Genome Analysis
  – Cynthia Gibas, Developing Bioinformatics Computer Skills
  – Bryan P. Bergeron, Bioinformatics Computing
• World Wide Web
  – USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
  – European Bioinformatics Institute: www.ebi.ac.uk
  – ExPASy Molecular Biology Server: www.expasy.org
  – Israeli National Node: inn.org.il
  – Open source news: bioinformatics.org
  – German directory: bioinformatik.de

Bioinformatics: How?
• Pre-packaged tools
  – Majority on World Wide Web
  – Some require downloading
  – Most are free to use
• Beginning your own development
  – Mostly Unix environment
  – Perl programming language

The Trouble with Nature
• Hard to represent
• Understanding still incomplete
• Some problems insoluble?

The Trouble with Man
• Confusing choice of tools
• Developed independently
• Written by and for nerds

Making it Simpler


PubMed
• MEDLINE publication database
  – Over 17,000 journals
  – Some other citations
• Papers from the 1960s onward
  – Over 12,000,000 entries
• Alerting services
  – http://www.pubcrawler.ie/
  – http://www.biomail.org/

A PubMed Entry
• Journal reference
  – Volume, number, date, pages
  – Title, authors, affiliation
  – Abstract
• Links
  – Related articles
  – Full text (sometimes)
  – Database entries

Example entry:
Cancer 2003 May 1;97(9):2248-53
Pregnancy and early-stage melanoma.
Daryanani D, Plukker JT, De Hullu JA, Kuiper H, Nap RE, Hoekstra HJ.
Division of Surgical Oncology, University Medical Center, Groningen, The Netherlands.
BACKGROUND: Cutaneous melanomas are aggressive tumors with an unpredictable…
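The journal reference line in the example entry packs journal, date, volume, number and pages into one compact string. The sketch below shows one way to pull those fields apart. The regular expression assumes the layout "Journal YYYY Mon D;volume(issue):first-last" seen in that single example, and `Citation` and `parse_citation` are hypothetical names, so other citation layouts would need their own handling.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    journal: str
    date: str
    volume: str
    issue: str
    first_page: int
    last_page: int


# Assumed layout: "Journal YYYY Mon D;volume(issue):first-last"
CITATION_RE = re.compile(
    r"^(?P<journal>.+?)\s+"
    r"(?P<date>\d{4}(?:\s+[A-Za-z]+(?:\s+\d{1,2})?)?);"
    r"(?P<volume>\d+)\((?P<issue>[^)]+)\):"
    r"(?P<first>\d+)-(?P<last>\d+)$"
)


def parse_citation(text: str) -> Citation:
    """Split a compact journal reference into its fields."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized citation layout: {text!r}")
    first, last = match["first"], match["last"]
    # Page ranges are abbreviated: "2248-53" means pages 2248 to 2253.
    if len(last) < len(first):
        last = first[: len(first) - len(last)] + last
    return Citation(
        journal=match["journal"],
        date=match["date"],
        volume=match["volume"],
        issue=match["issue"],
        first_page=int(first),
        last_page=int(last),
    )


citation = parse_citation("Cancer 2003 May 1;97(9):2248-53")
print(citation)
```

For the example entry this yields journal "Cancer", date "2003 May 1", volume 97, number 9, and pages 2248 to 2253.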

Searching PubMed
• Unstructured searches
  – Automatic term mapping
• Structured searches
  – Field names, e.g. [au] (author), [ta] (journal title abbreviation), [dp] (date of publication), [ti] (title)
  – Boolean operators, e.g. AND, OR, NOT, ()
• Additional features
  – Subsets, limits
  – Clipboard, history
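A structured search combines field tags with Boolean operators and parentheses. The sketch below assembles such a query as a string and wraps it in a URL for NCBI's E-utilities `esearch` endpoint. The helper names (`field`, `all_of`, `any_of`) are hypothetical, the search terms are borrowed from the example entry purely for illustration, and no request is sent, so nothing here shows what PubMed would actually return.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (the URL is built here but not fetched).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(term: str, tag: str) -> str:
    """Attach a PubMed field tag to a term, e.g. melanoma[ti]."""
    return f"{term}[{tag}]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND, wrapped in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR, wrapped in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


query = all_of(
    field("melanoma", "ti"),                 # word in the title
    field("Cancer", "ta"),                   # journal title abbreviation
    any_of(field("Daryanani D", "au"),       # either author
           field("Hoekstra HJ", "au")),
    field("2003", "dp"),                     # publication date
)

url = ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": query})

print(query)
print(url)
```

The same query string can be pasted directly into the PubMed search box; the URL form is only needed when searching from a script.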

OMIM
• Online Mendelian Inheritance in Man
  – Genes and genetic disorders
  – Edited by a team at Johns Hopkins
  – Updated daily
• Entries
  – 10,670 single-locus phenotypes (*)
  – 1,294 multi-locus phenotypes (#)
  – 2,415 unclassified phenotypes

An OMIM Entry
• Phenotype description
  – Clinical features
  – Diagnosis and treatment
  – Molecular genetics
• Inheritance model
  – Mapping history
  – Genetic locus/loci
• References

Example entry:
CYSTIC FIBROSIS; CF
Alternative titles; symbols: MUCOVISCIDOSIS
Gene map locus: 7q31.2
DESCRIPTION: Manifestations relate not only to the disruption of exocrine function of the pancreas…

Searching OMIM
• Search fields
  – Disease name, e.g. hypertension
  – Cytogenetic location, e.g. 1p31.6
  – Inheritance, e.g. autosomal dominant
• Browsing interfaces
  – Alphabetical by disease
  – Genetic map
• Additional features similar to PubMed's
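Cytogenetic locations such as 1p31.6 or 7q31.2 follow a compact notation: chromosome, arm (p for short, q for long), then region, band and an optional sub-band after the dot. The sketch below splits a single location into those parts. It is a simplified illustration with hypothetical names (`CytoLocation`, `parse_location`); it only checks the shape of the string, not whether the band really exists, and it does not handle ranges or terminal notations such as "pter".

```python
import re
from typing import NamedTuple, Optional


class CytoLocation(NamedTuple):
    chromosome: str          # "1" to "22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: Optional[str]
    band: Optional[str]
    sub_band: Optional[str]


# Simplified pattern: chromosome, arm, optional region+band, optional sub-band.
CYTO_RE = re.compile(
    r"^(?P<chrom>[1-9]|1\d|2[0-2]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?:(?P<region>\d)(?P<band>\d)(?:\.(?P<sub>\d+))?)?$"
)


def parse_location(text: str) -> CytoLocation:
    """Split a single cytogenetic location such as '7q31.2' into its parts."""
    match = CYTO_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized cytogenetic location: {text!r}")
    return CytoLocation(
        chromosome=match["chrom"],
        arm=match["arm"],
        region=match["region"],
        band=match["band"],
        sub_band=match["sub"],
    )


cf_locus = parse_location("7q31.2")
print(cf_locus)
```

For the cystic fibrosis locus 7q31.2 this gives chromosome 7, long arm (q), region 3, band 1, sub-band 2.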