Technology development of database integration to make re-use of public biological data
Hiromasa Ono, Shin Kawano, Yuki Naito, Tazro Ohta, Takeru Nakazato, Hidemasa Bono
Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), Bunkyo-ku, Tokyo 113-0032, Japan
Search by nucleotide sequence
Current projects in technology development for DB integration
1. DB integration with RDF
2. Development and maintenance of a research environment for accessing DBs
3. Technology development for the integrated DB search
4. Maintenance and standardization of ontologies, dictionaries, and corpora
5. Technology development for utilizing the huge amount of public biological data
6. Development and distribution of a system for manual curation
7. Development and maintenance of content for the integrated DB
•Google-like full-text search for RefSeq transcripts.
•Quickly finds nucleotide sequences as well as other fields of the GenBank File Format.
About us
National research center in Japan since 2007
All tools and services are freely available
URL links to the services are given below
http://lifesciencedb.jp/lab

•Paper published:
Yuki Naito and Hidemasa Bono. GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts. Nucleic Acids Res. (2012) 40: W592-W596. doi: 10.1093/nar/gks448

Figure 1
http://GGRNA.DBCLS.jp/
A
B
input: arbitrary words & phrases (gene names, IDs, sequences, etc.)
output: transcripts stored in RefSeq
Search: RNase "PAZ domain"
Results: Matches are highlighted with a green background. Overlapping matches are dark colored.
Creating online tutorial videos for bioinformatics
•Third-party site providing tutorial videos since 2007.
•Number of videos: 613 (as of 2012-07-11, including the domestic version)
Homo sapiens (human)
Google-like full text search engine for genes and transcripts.
Search examples:
[ homeobox ] [ claudin ] ...... Simple search
[ "RNA interference" ] ...... Phrase search by enclosing terms with double quotes
[ RNase "PAZ domain" ] ...... Return results that contain [ RNase ] AND [ "PAZ domain" ]
[ NM_001518 ] [ 10579 ] ...... Search for IDs such as RefSeq ID or Gene ID
[ symbol:VIM ] ...... Search by gene name (symbol or synonym)
[ ref:Naito ] ...... Search within cited references
[ 1552311_a_at ] ...... Search for nucleotide sequences by microarray probe IDs
[ aa:KDEL ] ...... Search for amino-acid sequences
[ caagaagagattg ] ...... Search for nucleotide sequences
[ seq2:caagaagagattgcc ] ...... Search for nucleotide sequences with up to 2 mismatches
[ comp:caagaagagattg ] ...... Search for complementary sequences
[ iub:aggtcannntgacct ] ...... Search for nucleotide sequences containing degenerate nucleotides (e.g. N,R,Y,...)
How to use in detail ...
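To make the meaning of the `iub:` and `seq2:` examples concrete, the following is a minimal local sketch of degenerate-base matching and mismatch-tolerant matching on a plain string. It is not GGRNA's implementation, which the poster does not describe. The function names, the toy target sequences, and the choice of 1-based positions are assumptions made for illustration only.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t", "u": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def degenerate_to_regex(pattern):
    """Translate a degenerate nucleotide pattern into a compiled regex."""
    parts = []
    for code in pattern.lower():
        bases = IUPAC[code]  # KeyError for characters outside the IUPAC alphabet
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def find_degenerate(pattern, target):
    """Return 1-based start positions of all (possibly overlapping) matches."""
    regex = degenerate_to_regex(pattern)
    target = target.lower()
    return [i + 1 for i in range(len(target)) if regex.match(target, i)]


def find_with_mismatches(query, target, max_mismatches):
    """Return 1-based start positions where query matches with few mismatches."""
    query, target = query.lower(), target.lower()
    hits = []
    for i in range(len(target) - len(query) + 1):
        window = target[i:i + len(query)]
        mismatches = sum(a != b for a, b in zip(query, window))
        if mismatches <= max_mismatches:
            hits.append(i + 1)
    return hits


# Toy targets (hypothetical, not taken from RefSeq).
print(find_degenerate("aggtcannntgacct", "ttaggtcagcatgacctaa"))           # [3]
print(find_with_mismatches("caagaagagattgcc", "ggcaagatgagactgcctt", 2))   # [3]
```

The brute-force scan is only meant to show what each query type asks for; it would be far too slow for searching all RefSeq transcripts.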
http://togotv.DBCLS.jp/en
•Movies are also available from YouTube and the iTunes Store for free.
•Distributed under the CC-BY 2.1 license.
Homo sapiens dicer 1, ribonuclease type III (DICER1), transcript variant 2, mRNA. (10267 bp) ..... RNA-specific endoribonuclease; helicase with RNAse motif; endoribonuclease Dicer; helicase MOI; ..... /gene_synonym="DCR1; Dicer; HERNA; MNG1" /note="PAZ domain, dicer_like subfamily. Dicer is an ..... RNAse involved in cleaving dsRNA in the RNA ..... in turn target hydrolysis of homologous RNAs. PAZ domains are named...; Region: PAZ_dicer_like; cd02843" ..... bacterial and archeal ribonuclease III (RNAse III) proteins. ..... RNAse III is a double stranded RNA-specific ..... endonuclease. Prokaryotic RNAse III is important in...; Region: RIBOc; cd00593" ..... bacterial and archeal ribonuclease III (RNAse III) proteins. ..... Synonym: DCR1; Dicer; HERNA; MNG1 NM_030621.3 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
search results with highlighted keywords
Homo sapiens dicer 1, ribonuclease type III (DICER1), transcript variant 1, mRNA. (10323 bp)
C
Homo sapiens ATPase, Cu++ transporting, beta polypeptide (ATP7B), transcript variant 1, mRNA. (6655 bp) ..... GLDGLGPSSQVATSTVRILGMTCQSCVKSIEDRISNLKGIISMKVS ..... SWPSRSLPAQEAVVKLRVEGMTCQSCVSSIEGKVRKLQGVVRVKVS ..... SETLGHQGSHVVTLQLRIDGMHCKSCVLNIEENIGQLLGVQSIQVS ..... SPPRNQVQGTCSTTLIAIAGMTCASCVHSIEGMISQLEGVQQISVS ..... KSPQSTRAVAPQKCFLQIKGMTCASCVSNIERNLQKEAGVLSVLVA ..... AAVMEDYAGSDGNIELTITGMTCASCVHNIESKLTRTNGITYASVA ..... QTEVIIRFAFQTSITVLCIACPCSLGLATPTAVMVGTGVAAQN ..... ILIKGGKPLEMAHKIKTVMFDKTGTITHGVPRVMRVLLLGDVATLPLRKVLAVVGTAEASSEHPLGVAVTKYCKEELGTETLGYC ..... SHKVAKVQELQNKGKKVAMVGDGVNDSPALAQADMGVAIGTGTDVA ..... AA_position 67 152 266 368 497 573 983 1027 1067 1266 Synonym: PWD; WC1; WD; WND NM_000053.3 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
original website
Shin Kawano, Hiromasa Ono, Toshihisa Takagi, and Hidemasa Bono. Tutorial videos of bioinformatics resources: online distribution trial in Japan named TogoTV. Brief Bioinform (2012) 13(2): 258-268. doi:10.1093/bib/bbr039. Open Access.
Homo sapiens ATPase, Cu++ transporting, beta polypeptide (ATP7B), transcript variant 2, mRNA. (6034 bp) ..... GLDGLGPSSQVATSTVRILGMTCQSCVKSIEDRISNLKGIISMKVS ..... SWPSRSLPAQEAVVKLRVEGMTCQSCVSSIEGKVRKLQGVVRVKVS ..... SETLGHQGSHVVTLQLRIDGMHCKSCVLNIEENIGQLLGVQSIQVS ..... SPPRNQVQGTCSTTLIAIAGMTCASCVHSIEGMISQLEGVQQISVS ..... KSPQSTRAVAPQKCFLQIKGMTCASCVSNIERNLQKEAGVLSVLVA ..... AAVMEDYAGSDGNIELTITGMTCASCVHNIESKLTRTNGITYASVA ..... QTEVIIRFAFQTSITVLCIACPCSLGLATPTAVMVGTGVAAQN ..... ILIKGGKPLEMAHKIKTVMFDKTGTITHGVPRVMRVLLLGDVATLPLRKVLAVVGTAEASSEHPLGVAVTKYCKEELGTETLGYC ..... SHKVAKVQELQNKGKKVAMVGDGVNDSPALAQADMGVAIGTGTDVA ..... AA_position 67 152 266 368 497 573 776 820 860 1059 Synonym: PWD; WC1; WD; WND NM_001005918.2 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
matched positions for nucleotide/amino-acid sequences
Search form fields (each followed by "sample"):
Contain all of these words:
Search for phrases:
Search for nucleotide sequences:
seq 1: CTAGCTGCCAAAGAAGGACAT (options: search for this seq; mismatch allowed: 0 nt; complementary; degenerate bases (N,R,Y,...))
Homo sapiens retina and anterior neural fold homeobox 2 (RAX2), mRNA. (2190 bp) ..... cagcaggagggggccctgaggcatgggatgggacagtctgggccagcgccacctcccgggacagaagtgcggcaccagggcaggagctgc agtagctaccctccccgtctccagcctgggctccccagatcactcccagatcaccaggtcaccccatctctaggcggcacctcacacaccagtcct gtggtccaacgccccgccatcacccaatgtcaccgcacaccaggcagtggggacacggcagtaagcacaagaaagatttttttttttaaagcta aaccaggccaggtgcggtggctcatgcctgtaatcccagtgctttgggaggctgaggtgggaggattgcttgagaccagcctgggtgacacagcaagac cccatctccacaaacgtttttaaaatgtgccgggtgtactggtgcacacctgtcatcccagctacccaa ..... position 1592 1634 1650 1698 1717 1783 1807 1812 1955 1972 1975 (CDS: 69 - 623) Synonym: ARMD6; CORD11; QRX; RAXL1 NM_032753.3 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
seq 2: (options: search for this seq; mismatch allowed: 0 nt; complementary; degenerate bases (N,R,Y,...))
Search for amino acid sequences:
Search by RefSeq ID/Gene ID:
Search by gene name:
Search within cited references:
Search for annotation:
Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1), transcript variant 1, mRNA. (4093 bp)
..... tgggtaactctgttttgcacctagctgccaaagaaggacatgataaagttctcagtatctt ..... cagcactgcacctggctgtggagcacgacaacatctcattggcaggctgcctgctcctgga ..... position 2331 2550 (CDS: 468 - 3377) Synonym: EBP-1; KBF1; NF-kappa-B; NF-kappaB; NFkappaB; NFKB-p105; NFKB-p50; p105; p50 NM_003998.3 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1), transcript variant 2, mRNA. (4090 bp)
..... tgggtaactctgttttgcacctagctgccaaagaaggacatgataaagttctcagtatctt ..... cagcactgcacctggctgtggagcacgacaacatctcattggcaggctgcctgctcctgga ..... position 2328 2547 (CDS: 468 - 3374) Synonym: EBP-1; KBF1; NF-kappa-B; NF-kappaB; NFkappaB; NFKB-p105; NFKB-p50; p105; p50 NM_001165412.1 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
seq 2: CAATGAGATGTTGTCGTGCTC
n/a when containing degenerate bases
Search for nucleotide sequences by microarray probe IDs:
http://RefEx.DBCLS.jp/
D
Homo sapiens (human)
2. Enter keywords:
Reference Expression dataset
..... RNA-specific endoribonuclease; helicase with RNAse motif; endoribonuclease Dicer; helicase MOI; ..... /gene_synonym="DCR1; Dicer; HERNA; MNG1" /note="PAZ domain, dicer_like subfamily. Dicer is an ..... RNAse involved in cleaving dsRNA in the RNA ..... in turn target hydrolysis of homologous RNAs. PAZ domains are named...; Region: PAZ_dicer_like; cd02843" ..... bacterial and archeal ribonuclease III (RNAse III) proteins. ..... RNAse III is a double stranded RNA-specific ..... endonuclease. Prokaryotic RNAse III is important in...; Region: RIBOc; cd00593" ..... bacterial and archeal ribonuclease III (RNAse III) proteins. ..... Synonym: DCR1; Dicer; HERNA; MNG1 NM_177438.2 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
F
E
1. Select an organism:
Visit poster by Ono, H. for RefEx (Reference Expression dataset)!
NM_001195573.1 - Homo sapiens (human) - NCBI - UCSC - Reference Expression
Generated query string
LOCUS DEFINITION ACCESSION VERSION KEYWORDS SOURCE ORGANISM
sample sample
Generated query string: seq:CTAGCTGCCAAAGAAGGACAT comp:CAATGAGATGTTGTCGTGCTC (Same results will be retrieved by pasting the above text on GGRNA regular search box.)
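The generated query string above can be reproduced, and checked against the NFKB1 hit shown on this poster, with a short sketch. The `seq:` and `comp:` prefixes are taken from the poster; the helper names `build_query` and `reverse_complement` are hypothetical, and the code only composes the text to paste into the search box, it does not contact GGRNA.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_query(sequences):
    """Compose a query string from (sequence, complementary) pairs.

    Uses the 'seq:' and 'comp:' prefixes shown in the generated query string.
    """
    terms = []
    for seq, complementary in sequences:
        prefix = "comp:" if complementary else "seq:"
        terms.append(prefix + seq)
    return " ".join(terms)


seq1 = "CTAGCTGCCAAAGAAGGACAT"
seq2 = "CAATGAGATGTTGTCGTGCTC"
query = build_query([(seq1, False), (seq2, True)])
print(query)

# The two matched snippets of the NFKB1 hit, copied from the result listing.
snippet1 = "tgggtaactctgttttgcacctagctgccaaagaaggacatgataaagttctcagtatctt"
snippet2 = "cagcactgcacctggctgtggagcacgacaacatctcattggcaggctgcctgctcctgga"

# seq 1 occurs as typed; seq 2 occurs as its reverse complement.
print(seq1.lower() in snippet1)
print(reverse_complement(seq2).lower() in snippet2)
```

Both membership checks succeed for the snippets shown, which is consistent with `comp:` matching the reverse complement of the typed sequence, as one would expect for a primer pair.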
Statistics of NGS data
Search
★metadata ★expression values SRA
GEO
DDBJ
LOCUS NM_003998 4093 bp mRNA linear PRI 01-JAN-2012
DEFINITION Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1), transcript variant 1, mRNA.
ACCESSION NM_003998
VERSION NM_003998.3 GI:259155300
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
RefSeq complete record in GBFF
tccaccttcattctcaacttgtgagggatctactagaagtcacatctggtttgatttct gatgacattatcaacatgagaaatgatctgtaccagacgcccttgcacttggcagtgat cactaagcaggaagatgtggtggaggatttgctgagggctggggccgacctgagccttc tggaccgcttgggtaactctgttttgcacctagctgccaaagaaggacatgataaagtt ctcagtatcttactcaagcacaaaaaggcagcactacttcttgaccaccccaacgggga cggtctgaatgccattcatctagccatgatgagcaatagcctgccatgtttgctgctgc tggtggccgctggggctgacgtcaatgctcaggagcagaagtccgggcgcacagcactg cacctggctgtggagcacgacaacatctcattggcaggctgcctgctcctggagggtga tgcccatgtggacagtactacctacgatggaaccacacccctgcatatagcagctggga gagggtccaccaggctggcagctcttctcaaagcagcaggagcagatcccctggtggag
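Since GGRNA searches the fields of the complete GBFF record, it may help to see how the header of such a record breaks down into fields. The sketch below is a simplified parser written for illustration, not the tool's code. The function name `parse_gbff_header` is hypothetical, and the line breaks and column layout of the sample record are assumptions, because the poster text lost the original layout.

```python
def parse_gbff_header(text):
    """Collect header fields of a GenBank flat file record into a dict.

    Simplified rule (an assumption for this sketch): a line indented by
    12 or more spaces continues the previous field; any other line starts
    a new field whose first word is the keyword.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent >= 12 and current is not None:
            fields[current] += " " + line.strip()
            continue
        parts = line.split(None, 1)
        current = parts[0]
        if current in ("FEATURES", "ORIGIN"):
            break  # the header ends where the feature table or sequence starts
        fields[current] = parts[1].strip() if len(parts) > 1 else ""
    return fields


PAD = " " * 12  # continuation-line indent
RECORD = "\n".join([
    "LOCUS       NM_003998 4093 bp mRNA linear PRI 01-JAN-2012",
    "DEFINITION  Homo sapiens nuclear factor of kappa light polypeptide gene",
    PAD + "enhancer in B-cells 1 (NFKB1), transcript variant 1, mRNA.",
    "ACCESSION   NM_003998",
    "VERSION     NM_003998.3 GI:259155300",
    "KEYWORDS    .",
    "SOURCE      Homo sapiens (human)",
    "  ORGANISM  Homo sapiens",
    PAD + "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;",
    "ORIGIN",
])

header = parse_gbff_header(RECORD)
for key in ("LOCUS", "ACCESSION", "VERSION", "ORGANISM"):
    print(key, "=>", header[key])
```

For real data, an established parser such as the one in Biopython would be the safer choice; this sketch only shows why a full-text index over these fields can answer queries by ID, definition, or organism.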
PDBj
★metadata ★nucleotide seqs
DB digest = overview + index
http://SRA.DBCLS.jp/
Search NGS data by disease
Search NGS data by publication
kusarinoko: Search SRA Entries with PubMed Article
鎖鋸 (kusarinoko) http://g86.DBCLS.jp/kusarinoko/
Database Center for Life Science (DBCLS)
Research Organization of Information and Systems (ROIS)
Faculty of Engineering Bldg. 12, The University of Tokyo
2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032 JAPAN
http://DBCLS.rois.ac.jp/
mailto: [email protected]
"Copyright © 2011 DBCLS licensed by Creative Commons Attribution 2.1 Japan License."