Biotechnology & Bioinformatics
Denneal Jamison-McClung, PhD
Associate Director, UC Davis Biotechnology Program (www.biotech.ucdavis.edu)
Program Coordinator, NSF CREATE-REU (create-reu.ucdavis.edu)
The UCD Biotech Program works to bring all members of the life science community together to promote biotechnology education and workforce development.
Program activities:
• ADP Graduate Program
• DEB Graduate Program
• NIH & NSF Training Grants
• Short Technical Courses
• BioTech SYSTEM K-14 Outreach
Community members:
• Industry
• Academia
• Government
• Community Groups
What is Biotechnology?
The use of living organisms, or parts thereof, to provide useful products, processes and services.
Cells are the basic building blocks of living organisms
Scientists in the field of biotechnology modify cellular DNA in order to produce useful proteins & other molecules.
DNA → (Transcription & Translation) → Protein
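The DNA-to-protein step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a coding-strand input; the example gene and the names `transcribe` and `translate` are made up for the sketch and are not from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTCTGGCTTAA"  # made-up example gene, not from the slides
mrna = transcribe(dna)
print(mrna)             # AUGUUUCUGGCUUAA
print(translate(mrna))  # MFLA
```

Real cells also process the transcript (splicing, capping, poly-A tailing) before translation; the sketch skips all of that.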
Genetic Engineering
Cutting and moving specific functional sections of DNA (genes for specific desirable traits) from one plant, animal or microbe to another.
Gene-encoded desirable traits include:
• Vitamin or nutrient production
• Disease resistance
• Insect resistance
• Stress tolerance (ability to deal with environmental conditions, such as heat, drought, saline soil, flooding, etc…)
• Toxin breakdown
• Drug, vaccine or “useful chemical” production
Feeding Our People
Genetically Modified Crops Help Farmers:
• Use fewer pesticides & fertilizers
• Grow more food per acre
• Use poor soils to grow crops
• Produce food with more nutrients
GM crops help farmers save money and are better for the environment!
Preserving Our Planet
• Biofuels: using plants, algae and microbes to make fuel
• Bioremediation: cleaning up toxins (heavy metals, poisons, etc…) in soil and water with genetically modified plants and microbes
• Green Chemistry: making industrial products in genetically modified plants and microbes (plastics, solvents, pigments, etc…)
Treating Diseases & Disorders
• Stem Cells & Tissue Regeneration: repairing or growing new organs with your own cells!
• Pharmacogenomics: matching your genetic background to the best drug treatments, and developing new drugs
Other Great Things Scientists Can Do with Biotech Tools…
• Use gene therapy to treat heritable disorders
• Determine the evolutionary relationship between all living things
• Track the migration of modern humans across the globe (past ~100,000 to 300,000 years)
Biotech advances are dependent on bioinformatics…
Bioinformatics is the use of information technology in the storage, curation, retrieval, and analysis of biological data.
Bioinformatics by definition…
“The use of computer science, mathematics, and information theory to model and analyze biological systems, especially systems involving genetic material.” (http://www.answers.com/topic/bioinformatics)
“Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve formal and practical problems arising from the management and analysis of biological data.” (http://en.wikipedia.org/wiki/Bioinformatics)
Computationally intense:
• Pattern recognition
• Data mining
• Machine learning algorithms
• Visualization tools
How is bioinformatics related to molecular biology?
• Genome analysis
• Transcriptome analysis
• Proteome & metabolome analysis
Types of Biological Data
Biological sequence data
• Nucleic acid sequences (DNA, RNA), e.g. CCTGCCTAAACCTCCCAAGTA
• Amino acid sequences, e.g. MFLAVSQNKDAVRS
Gene expression data
• Real-Time PCR
• Expressed Sequence Tags (ESTs): mRNA → cDNA
• Expression Microarrays
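Sequence data like the nucleic acid example above is just text, which is why it is so easy to analyze by computer. As a small sketch (the function names are illustrative, not from any particular library), two of the most basic operations are computing the reverse complement and the GC content:

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


dna = "CCTGCCTAAACCTCCCAAGTA"  # the example sequence from the slide
print(len(dna))                   # 21
print(reverse_complement(dna))    # TACTTGGGAGGTTTAGGCAGG
print(round(gc_content(dna), 3))  # 0.524
```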
Types of Biological Data, cont’d.
Molecular structure data
• Glucocorticoid receptor (http://www.ncbi.nlm.nih.gov//Structure/MMDB/mmdb.shtml)
Metabolic pathway data
• D-Alanine metabolism pathway (http://www.genome.ad.jp/kegg/pathway/map/map00473.html)
Types of Biological Data, cont’d.
Physical and genetic chromosome maps
• Browse the Human Genome! (http://www.ncbi.nlm.nih.gov/genome/guide/human/)
Phylogenetic data
• Illustrate Evolutionary Relationships with Phylograms! (Hayasaka, K., T. Gojobori, and S. Horai. 1988. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol. Biol. Evol., 5:626-644.)
Types of Biological Data, cont’d.
Single Nucleotide Polymorphisms (SNPs)
• Correlate Phenotypes to Genotypes!
• ~1 SNP/1000 bps in humans
• SNPs form the basis for phenotypic differences between individuals
• (http://www.sequenom.com)
STR allele profiles (13 CODIS loci)
• D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, FGA, Amelogenin
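To make the SNP idea concrete: once two sequences are aligned, candidate single-nucleotide differences are simply the positions where the bases disagree. The sketch below assumes the sequences are already aligned and of equal length; the "sample" sequence is a hypothetical variant invented for illustration, and `find_mismatches` is not a function from any real genotyping tool.

```python
def find_mismatches(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


reference = "CCTGCCTAAACCTCCCAAGTA"
sample = "CCTGCCTAAGCCTCCCAAGTA"  # hypothetical individual with one A>G change
print(find_mismatches(reference, sample))  # [(10, 'A', 'G')]
```

A mismatch found this way is only a candidate: deciding whether it is a true SNP, a rare mutation or a sequencing error takes many more samples.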
Related Types of Informatics Data
Geospatial coordinates
1. Population studies: tracking wildlife migration/movements
2. Monitoring agricultural field parameters (soil temp, moisture, salinity, etc…)
Medical informatics
1. Patient data
2. Drug dispensation/pharmaceutical tracking
Life science patents
1. Searchable text:
   1. “novel and nonobvious” invention disclosure
   2. inventor names, etc…
Where is biological data generated?
In hospitals, laboratories, biotech companies and university research facilities… We will tour two UCD research facilities using bioinformatics to address biological questions.
Today… Robert Mondavi Institute for Wine & Food Science
Thursday Afternoon UC Davis Genome Center
Where is biological data stored?
Primary or “Archival” Databases (few):
• GenBank (National Center for Biotechnology Information, NCBI): http://www.ncbi.nlm.nih.gov/
• EMBL (European Molecular Biology Laboratory): http://www.ebi.ac.uk/
• DDBJ (DNA Data Bank of Japan): http://www.ddbj.nig.ac.jp/
GenBank, EMBL and DDBJ share data with one another every 24 hours.
GenBank is currently at ~99 billion base pairs, with about 98 million submitted sequences!
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
Primary sequences in GenBank are given unique identifiers upon submission:
• Accession.version
• GI number (GenInfo Identifier): an arbitrary number, assigned consecutively
“The two systems of identifiers run in parallel to each other. That is, when any change is made to a sequence, it receives a new GI number AND an increase to its version number.” (NCBI website, http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html)
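The Accession.version format is easy to handle in code. The following is a minimal sketch, assuming the identifier is a plain string with the version after the last dot; the identifier used here is made up for illustration, and the helper names are hypothetical.

```python
from typing import NamedTuple


class SequenceId(NamedTuple):
    accession: str  # stable part, stays the same across revisions
    version: int    # increases each time the sequence itself changes


def parse_accession_version(identifier: str) -> SequenceId:
    """Split an 'Accession.version' string into its two parts."""
    accession, separator, version = identifier.rpartition(".")
    if not separator or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return SequenceId(accession, int(version))


def next_version(seq_id: SequenceId) -> str:
    """Identifier the record would carry after one sequence update."""
    return f"{seq_id.accession}.{seq_id.version + 1}"


seq_id = parse_accession_version("AB000123.1")  # made-up identifier
print(seq_id.accession, seq_id.version)  # AB000123 1
print(next_version(seq_id))              # AB000123.2
```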
Secondary or “curated” databases (there are many, here are a few examples):
• HomoloGene (NCBI)
• KEGG (The Kyoto Encyclopedia of Genes and Genomes)
• PDB (Protein Data Bank)
• TAIR (The Arabidopsis Information Resource)
• JCVI (J. Craig Venter Institute), formerly known as “TIGR (The Institute for Genomic Research)”
NCBI houses a variety of primary and secondary databases…
http://www.ncbi.nlm.nih.gov
…navigable using the integrated, text-based Entrez search engine
http://www.ncbi.nlm.nih.gov/Database/datamodel/index.html
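Entrez can also be queried from a program rather than a browser. The slides do not cover this, so treat the following as a sketch under assumptions: it uses NCBI's public E-utilities `esearch` endpoint and only builds the query URL (it does not send the request); `build_esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities text search service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database: str, term: str, max_results: int = 20) -> str:
    """Build an Entrez text-search URL for one database and one search term."""
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return f"{EUTILS_ESEARCH}?{query}"


url = build_esearch_url("pubmed", "resveratrol")
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=resveratrol&retmax=20
```

Opening the printed URL returns the matching record IDs as XML; NCBI asks heavy users to follow its usage guidelines.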
Entrez records for “resveratrol” (June 2009)
Revised April 15, 2009

Month-Yr.    Interactive Searches    Record Views
Mar-09       74,518,774              108,665,470

http://www.ncbi.nlm.nih.gov/About/tools/restable_stat_pubmed.html
Historical perspective…from Jan 1997 to Mar 2007
What types of questions may be answered using bioinformatics tools?
Predicting protein structure and function ab initio… How accurate are predictions based solely on molecular sequence data?
Gene prediction:
• Where do genes start and stop?
• Where are the introns, exons, poly-A sites, UTRs, etc…?
• Gene sizes vary dramatically: the average human gene is about 3000 bps and the largest is dystrophin, at 2.4 million base pairs!
Dystrophin Gene
Figures 2-27, 2-28
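The simplest form of the "where do genes start and stop?" question is an open reading frame (ORF) scan: look for a start codon followed, in the same reading frame, by a stop codon. The sketch below does only that, on the forward strand, and ignores introns, UTRs and everything else real gene predictors must handle; the input sequence is made up and `find_orfs` is an illustrative name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) indices of forward-strand ORFs.

    start is the 0-based index of the ATG; end is exclusive and includes
    the stop codon. min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "GGATGGCTAAATAACC"  # made-up sequence
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGGCTAAATAA
```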
Characterizing splice variants:
• How many proteins may result from the transcription of one gene?
• What is the structure and function of these protein variants?
Validating SNPs and correlating to phenotypes:
• Is it a SNP, a mutation or a sequencing error?
• Does every person with a specific SNP have the same phenotype for a given trait?
• There are now over 87 million SNPs cataloged in NCBI, and the number is growing rapidly.
• SNP discovery & validation is the foundation of “personalized medicine”.
• Most SNPs are not disease-causing, but are linked to a genomic region associated with disease.
Determining the function(s) of non-protein coding genomic regions:
• What are the physical and functional characteristics of non-coding genomic regions?
• Why is ~95% (intergenic and intragenic) of the human genome unexpressed?
Figure 2-8
Predictive modeling of gene regulation:
• Where are the DNA and protein-binding domains located in a genome?
• Are there consensus sequences indicative of specific regulatory roles?
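Searching for a consensus sequence is a pattern-matching problem. As a hedged illustration, the sketch below writes a TATA-box-like consensus, TATA(A/T)A(A/T), as a regular expression and scans a made-up sequence for it; the pattern is a textbook simplification chosen for this example, and real regulatory-site prediction generally uses weighted models rather than exact patterns.

```python
import re

# TATA-box-like consensus: TATA, then A or T, then A, then A or T.
# Illustrative pattern only.
TATA_LIKE = re.compile(r"TATA[AT]A[AT]")


def find_motif(dna: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(match.start(), match.group()) for match in pattern.finditer(dna.upper())]


dna = "GGCGTATAAAAGGCTATATATGC"  # made-up sequence
print(find_motif(dna, TATA_LIKE))  # [(4, 'TATAAAA'), (14, 'TATATAT')]
```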
Modeling gene and protein regulatory networks:
• How is gene expression regulated in complex networks?
• Can we model these interactions in silico?
Developing effective gene ontology:
How do we provide all scientists with a consistent and controlled vocabulary for describing genes?
The Gene Ontology project (GO): http://www.geneontology.org/
Using specific nomenclature to define genes and their products:
• Biological or physiological process: describes the overall pathway in which the protein participates
• Molecular function: describes the specific biochemical activity of the protein
• Cellular component: describes the cellular location of the protein
Ex: Crystal structure of human quinone reductase 2 in complex with resveratrol.
• Physiological process: oxidative stress response
• Molecular function: two-electron reduction of quinones to hydroquinones (without forming a reactive oxygen species)
• Cellular component: cytosol
“…the chemopreventive and cardioprotective properties of resveratrol are possibly the results of QR2 activity inhibition, which in turn, up-regulates the expression of cellular antioxidant enzymes and cellular resistance to oxidative stress.” http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbimage.fcgi?small=f&id=31231
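The value of a controlled vocabulary shows up as soon as annotations are stored as structured records. The sketch below is one possible way to hold the three annotation categories for the quinone reductase 2 example; the field values are free-text descriptions taken from the slide rather than official GO term identifiers, and the second record is a hypothetical placeholder added only so the lookup has something to filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneAnnotation:
    gene_product: str
    biological_process: str   # overall pathway the protein participates in
    molecular_function: str   # specific biochemical activity
    cellular_component: str   # where in the cell the protein is found


def find_by_component(annotations, component):
    """Return the names of gene products annotated to one cellular location."""
    return [a.gene_product for a in annotations if a.cellular_component == component]


qr2 = GeneAnnotation(
    gene_product="human quinone reductase 2 (QR2)",
    biological_process="oxidative stress response",
    molecular_function="two-electron reduction of quinones to hydroquinones",
    cellular_component="cytosol",
)
# Hypothetical second record, for illustration only.
example_receptor = GeneAnnotation(
    gene_product="example membrane receptor",
    biological_process="signal transduction",
    molecular_function="ligand binding",
    cellular_component="plasma membrane",
)

print(find_by_component([qr2, example_receptor], "cytosol"))
# ['human quinone reductase 2 (QR2)']
```

Because every record uses the same vocabulary for the same field, a simple equality test is enough to group proteins by location, process or function.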