Biotechnology & Bioinformatics - Biotechnology Program

Biotechnology & Bioinformatics

Denneal Jamison-McClung, PhD

Associate Director, UC Davis Biotechnology Program (www.biotech.ucdavis.edu)

Program Coordinator, NSF CREATE-REU (create-reu.ucdavis.edu)

The UCD Biotech Program works to bring all members of the life science community together to promote biotechnology education and workforce development.

Programs:
• ADP Graduate Program
• DEB Graduate Program
• NIH & NSF Training Grants
• Short Technical Courses
• BioTech SYSTEM K-14 Outreach

Life science community:
• Industry
• Academia
• Government
• Community Groups

What is Biotechnology? The use of living organisms, or parts thereof, to provide useful products, processes and services.

Cells are the basic building blocks of living organisms

Scientists in the field of biotechnology modify cellular DNA in order to produce useful proteins & other molecules.

DNA → (Transcription & Translation) → Protein
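The DNA → Protein flow above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard genetic code and a made-up nine-base toy gene; the function names and the example sequence are hypothetical and not part of the original slides.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene (not from the slides): Met, Ala, stop.
example_dna = "ATGGCCTAA"
example_mrna = transcribe(example_dna)
example_protein = translate(example_mrna)
print(example_mrna, example_protein)
```

Real genes also involve promoters, introns and post-translational processing, none of which this sketch models.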

Genetic Engineering

• Cutting and moving specific functional sections of DNA (genes for specific desirable traits) from one plant, animal or microbe to another.

• Gene-encoded desirable traits include:
  • Vitamin or nutrient production
  • Disease resistance
  • Insect resistance
  • Stress tolerance (ability to deal with environmental conditions, such as heat, drought, saline soil, flooding, etc.)
  • Toxin breakdown
  • Drug, vaccine or “useful chemical” production

Feeding Our People

• Genetically Modified Crops Help Farmers:
  • Use fewer pesticides & less fertilizer
  • Grow more food per acre
  • Use poor soils to grow crops
  • Produce food with more nutrients

GM crops can help farmers save money and reduce environmental impact!

Preserving Our Planet

• Biofuels
  • Using plants, algae and microbes to make fuel

• Bioremediation
  • Cleaning up toxins (heavy metals, poisons, etc.) in soil and water with genetically modified plants and microbes

• Green Chemistry
  • Making industrial products in genetically modified plants and microbes (plastics, solvents, pigments, etc.)

Treating Diseases & Disorders

• Stem Cells & Tissue Regeneration
  • Repairing or growing new organs with your own cells!

• Pharmacogenomics
  • Matching your genetic background to the best drug treatments
  • Developing new drugs

Other Great Things Scientists Can Do with Biotech Tools…

• Use gene therapy to treat heritable disorders

• Determine the evolutionary relationship between all living things

• Track the migration of modern humans across the globe (past ~100,000 to 300,000 years)

Biotech advances are dependent on bioinformatics…

Bioinformatics is the use of information technology in storage, curation, retrieval, and analysis of biological data

Bioinformatics by definition…

• “The use of computer science, mathematics, and information theory to model and analyze biological systems, especially systems involving genetic material.” http://www.answers.com/topic/bioinformatics

• “Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve formal and practical problems arising from the management and analysis of biological data.”

Computationally intense:
  • Pattern recognition
  • Data mining
  • Machine learning algorithms
  • Visualization tools

http://en.wikipedia.org/wiki/Bioinformatics

How is bioinformatics related to molecular biology?

• Genome analysis
• Transcriptome analysis
• Proteome & metabolome analysis

Types of Biological Data

Biological sequence data
• Nucleic acid sequences (DNA, RNA): CCTGCCTAAACCTCCCAAGTA
• Amino acid sequences: MFLAVSQNKDAVRS

Gene expression data
• Real-time PCR
• Expressed Sequence Tags (ESTs): mRNA → cDNA
• Expression microarrays
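To show how such sequence data can be handled computationally, here is a small sketch that treats the two example sequences from this slide as plain strings and computes a few basic properties. The function names are illustrative assumptions, not part of any particular bioinformatics library.

```python
# Example sequences taken from the slide above.
DNA_EXAMPLE = "CCTGCCTAAACCTCCCAAGTA"
PROTEIN_EXAMPLE = "MFLAVSQNKDAVRS"

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(dna: str) -> float:
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


print(len(DNA_EXAMPLE), "bases,", len(PROTEIN_EXAMPLE), "amino acids")
print("GC content:", round(gc_content(DNA_EXAMPLE), 3))
print("Reverse complement:", reverse_complement(DNA_EXAMPLE))
```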

Types of Biological Data, cont’d.

Molecular structure data
• Glucocorticoid receptor (http://www.ncbi.nlm.nih.gov//Structure/MMDB/mmdb.shtml)

Metabolic pathway data
• D-Alanine metabolism pathway (http://www.genome.ad.jp/kegg/pathway/map/map00473.html)

Types of Biological Data, cont’d.

Physical and genetic chromosome maps
• Browse the Human Genome! (http://www.ncbi.nlm.nih.gov/genome/guide/human/)

Phylogenetic data
• Illustrate Evolutionary Relationships with Phylograms! (Hayasaka, K., T. Gojobori, and S. Horai. 1988. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol. Biol. Evol., 5:626-644.)

Types of Biological Data, cont’d.

Single Nucleotide Polymorphisms (SNPs)
• Correlate Phenotypes to Genotypes!
• ~1 SNP/1000 bp in humans
• SNPs underlie many phenotypic differences between individuals
• http://www.sequenom.com

STR allele profiles (13 CODIS loci)
• D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, vWA, TPOX, D18S51, D5S818, FGA
• Also shown: D2S1338 and D19S433 (additional STR loci) and Amelogenin (sex-determining marker)
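As a rough illustration of how candidate SNPs are found, the sketch below compares a sample sequence against a reference, position by position. It assumes the two sequences are already aligned and of equal length; the sample sequence is a hypothetical individual invented for this example, and a single mismatch on its own cannot tell a true SNP from a rare mutation or a sequencing error.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are pre-aligned and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "CCTGCCTAAACCTCCCAAGTA"  # example DNA sequence from an earlier slide
sample = "CCTGCCTAAGCCTCCCAAGTA"     # hypothetical individual with A>G at position 10

candidate_snps = find_snps(reference, sample)
print(candidate_snps)
```

In practice, candidates like these are validated across many reads and many individuals before being cataloged.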

Related Types of Informatics Data

Geospatial coordinates
1. Population studies: tracking wildlife migration/movements
2. Monitoring agricultural field parameters (soil temp, moisture, salinity, etc.)

Medical informatics
1. Patient data
2. Drug dispensation/pharmaceutical tracking

Life science patents
1. Searchable text:
   1. “novel and nonobvious” invention disclosure
   2. inventor names, etc.

Where is biological data generated?

In hospitals, laboratories, biotech companies and university research facilities… We will tour two UCD research facilities using bioinformatics to address biological questions.

Today… Robert Mondavi Institute for Wine & Food Science

Thursday Afternoon UC Davis Genome Center

Where is biological data stored?

Primary or “Archival” Databases (few):

• GenBank (National Center for Biotechnology Information) http://www.ncbi.nlm.nih.gov/

• EMBL (European Molecular Biology Laboratory) http://www.ebi.ac.uk/

• DDBJ (The DNA Data Bank of Japan) http://www.ddbj.nig.ac.jp/

GenBank, EMBL and DDBJ share data with one another every 24 hours.

GenBank is currently at ~99 billion base pairs, with about 98 million submitted sequences!

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Primary Sequences in GenBank are given unique identifiers upon submission:

• Accession.version

• GI number: arbitrary, consecutive assignment

“The two systems of identifiers run in parallel to each other. That is, when any change is made to a sequence, it receives a new GI number AND an increase to its version number.” -NCBI website http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html
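The Accession.version format can be split programmatically, which makes it easy to check whether one record is a later revision of another. The sketch below is a minimal illustration; the identifier "AB123456" is a made-up placeholder rather than a real GenBank record, and the helper names are hypothetical.

```python
from typing import NamedTuple


class SequenceId(NamedTuple):
    accession: str
    version: int


def parse_accession_version(identifier: str) -> SequenceId:
    """Split an 'Accession.version' string into its two parts."""
    accession, separator, version = identifier.strip().rpartition(".")
    if not separator or not accession or not version.isdigit():
        raise ValueError(f"not in Accession.version form: {identifier!r}")
    return SequenceId(accession, int(version))


def is_newer_version(candidate: str, baseline: str) -> bool:
    """True if both IDs share an accession and candidate has a higher version."""
    new, old = parse_accession_version(candidate), parse_accession_version(baseline)
    return new.accession == old.accession and new.version > old.version


# Hypothetical identifiers, used only to show the format.
print(parse_accession_version("AB123456.2"))
print(is_newer_version("AB123456.2", "AB123456.1"))
```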

Secondary or “curated” databases (there are many, here are a few examples):

• HomoloGene (NCBI)
• KEGG (The Kyoto Encyclopedia of Genes and Genomes)
• PDB (Protein Data Bank)
• TAIR (The Arabidopsis Information Resource)
• JCVI (J. Craig Venter Institute), formerly known as TIGR (The Institute for Genomic Research)

NCBI houses a variety of primary and secondary databases…

http://www.ncbi.nlm.nih.gov

…navigable using the integrated, text-based Entrez search engine

http://www.ncbi.nlm.nih.gov/Database/datamodel/index.html
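Entrez can also be queried from a program through NCBI's E-utilities web interface, which the slides do not mention. The sketch below only builds an ESearch request URL and does not send it; the helper name is hypothetical, and the parameter choices (database "pubmed", five records, JSON output) are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database: str, term: str, max_records: int = 20) -> str:
    """Build (but do not send) an ESearch query URL for one Entrez database."""
    query = urlencode(
        {
            "db": database,         # which Entrez database to search
            "term": term,           # free-text search term
            "retmax": max_records,  # maximum number of record IDs to return
            "retmode": "json",      # ask for a JSON response
        }
    )
    return f"{EUTILS_ESEARCH}?{query}"


url = build_esearch_url("pubmed", "resveratrol", max_records=5)
print(url)
```

Anyone sending such requests should first check NCBI's current usage guidelines, which this sketch does not cover.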

Entrez records for “resveratrol” (June 2009)

Revised April 15, 2009

Month-Yr.    Interactive Searches    Record Views
Mar-09       74,518,774              108,665,470

http://www.ncbi.nlm.nih.gov/About/tools/restable_stat_pubmed.html

Historical perspective…from Jan 1997 to Mar 2007

What types of questions may be answered using bioinformatics tools?

Predicting protein structure and function ab initio… How accurate are predictions based solely on molecular sequence data?

Gene prediction: Where do genes start and stop? Where are the introns, exons, poly-A sites, UTRs, etc.? Gene sizes vary dramatically: the average human gene is about 3000 bp and the largest is dystrophin, at 2.4 million base pairs!

Dystrophin Gene

Figures 2-27, 2-28
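The simplest form of the "where do genes start and stop?" question is an open reading frame (ORF) scan: look for a start codon followed, in the same reading frame, by a stop codon. The sketch below is a deliberately naive illustration on the forward strand only. It ignores introns, UTRs, poly-A sites and the reverse strand, so it is not a real gene predictor; the function name and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Find forward-strand ORFs as (start, end) pairs, 0-based and end-exclusive.

    An ORF here runs from an ATG to the first in-frame stop codon (inclusive).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] == "ATG":
                end = pos + 3
                # Walk codon by codon until a stop codon or the sequence end.
                while end + 3 <= len(dna) and dna[end:end + 3] not in STOP_CODONS:
                    end += 3
                if end + 3 <= len(dna):  # an in-frame stop codon was found
                    stop_end = end + 3
                    if (stop_end - pos) // 3 >= min_codons:
                        orfs.append((pos, stop_end))
                    pos = stop_end
                    continue
            pos += 3
    return sorted(orfs)


toy_sequence = "CCATGGCCTAAGG"  # hypothetical sequence with one short ORF
orfs = find_orfs(toy_sequence)
print(orfs, [toy_sequence[start:end] for start, end in orfs])
```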

Characterizing splice variants— How many proteins may result from the transcription of one gene? What is the structure and function of these protein variants?

Validating SNPs and correlating to phenotypes: Is it a SNP, a mutation or a sequencing error? Does every person with a specific SNP have the same phenotype for a given trait?

• There are now over 87 million SNPs cataloged in NCBI, and the number is growing rapidly.
• SNP discovery & validation is the foundation of “personalized medicine”.
• Most SNPs are not disease-causing, but are linked to a genomic region associated with disease.

Determining the function(s) of non-protein coding genomic regions:

• What are the physical and functional characteristics of non-coding genomic regions?
• Why is ~95% (intergenic and intragenic) of the human genome unexpressed?

Figure 2-8

Predictive modeling of gene regulation— Where are the DNA and protein-binding domains located in a genome? Are there consensus sequences indicative of specific regulatory roles?
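Searching for a consensus sequence can be sketched by converting IUPAC nucleotide codes into a regular expression. The example below uses the commonly cited TATA-box consensus TATAWAW (W = A or T) purely as an illustration; the motif choice, the promoter fragment and the function names are assumptions, not content from the slides.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regex pattern."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def find_motif(dna: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every hit, including overlaps."""
    # The lookahead lets overlapping matches be reported.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(match.start(), match.group(1)) for match in pattern.finditer(dna.upper())]


promoter = "GGCTATAAAAGGCGTATATATCC"  # hypothetical promoter fragment
hits = find_motif(promoter, "TATAWAW")
print(hits)
```

A match to a consensus sequence only suggests a possible regulatory site; confirming a regulatory role needs experimental evidence.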

Modeling gene and protein regulatory networks: How is gene expression regulated in complex networks? Can we model these interactions in silico?

Developing effective gene ontology: How do we provide all scientists with a consistent and controlled vocabulary for describing genes? The Gene Ontology project (GO) http://www.geneontology.org/

Using specific nomenclature to define genes and their products:
• Biological or physiological process: Describes the overall pathway in which the protein participates
• Molecular function: Describes the specific biochemical activity of the protein
• Cellular component: Describes the cellular location of the protein

Ex: Crystal structure of human quinone reductase 2 in complex with resveratrol.
• Physiological process: oxidative stress response
• Molecular function: 2 e- reduction of quinones to hydroquinones (without forming a reactive oxygen species)
• Cellular component: cytosol

“…the chemopreventive and cardioprotective properties of resveratrol are possibly the results of QR2 activity inhibition, which in turn, up-regulates the expression of cellular antioxidant enzymes and cellular resistance to oxidative stress.” http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbimage.fcgi?small=f&id=31231
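The three-part GO description of quinone reductase 2 given above maps naturally onto a small record type. The sketch below is only an illustration of why a controlled vocabulary helps: when two gene products use the same terms, they can be compared directly. The class, the helper function and the second gene product are hypothetical, and real GO annotations use stable term identifiers that are not reproduced here.

```python
from dataclasses import dataclass

# The three GO aspects named in the slides.
ASPECTS = ("biological_process", "molecular_function", "cellular_component")


@dataclass(frozen=True)
class GeneAnnotation:
    gene_product: str
    biological_process: str
    molecular_function: str
    cellular_component: str


def shared_aspects(first: GeneAnnotation, second: GeneAnnotation) -> list[str]:
    """List the GO aspects for which two gene products carry the same term."""
    return [a for a in ASPECTS if getattr(first, a) == getattr(second, a)]


# Annotation taken from the QR2 example in the slides.
qr2 = GeneAnnotation(
    gene_product="human quinone reductase 2 (QR2)",
    biological_process="oxidative stress response",
    molecular_function="2 e- reduction of quinones to hydroquinones",
    cellular_component="cytosol",
)

# Hypothetical second gene product, invented for comparison only.
other = GeneAnnotation(
    gene_product="hypothetical enzyme X",
    biological_process="oxidative stress response",
    molecular_function="hypothetical activity",
    cellular_component="cytosol",
)

print(shared_aspects(qr2, other))
```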