http://www.nesc.ac.uk/training
http://www.ngs.ac.uk
Introduction to Oracle on the NGS
Keir Hawker – DBA – Rutherford Laboratories
Simon Collins – Data Consultant – Manchester University
Goals of today
• Introduce concepts of relational databases
  – Focus on Oracle as it is the NGS standard database
    • We can support other databases
• Oracle install on the NGS
  – Who’s using the service now?
  – Benefits?
  – How do I apply and get the most out of the service?
• Functionality specific to the life science community
• Get feedback on your specific requirements for data storage
Agenda
• Introduction to relational databases
• Oracle on the NGS
• Demonstration
  – Connectivity to Oracle on the NGS
• Oracle tools for life sciences
• Designing database systems
• Practical
  – BLAST queries within Oracle
• Q&A
Relational Databases
• Simply: databases that are built upon a relational model
  – Collection of designed tables that represent a data model
    • Data types described by columns
    • Data represented by rows
    • Tables may be joined by common columns (keys)
  – Provide an elegant and quick way to safely store, retrieve, mine and manipulate large volumes of data
• Examples include Oracle, MySQL, SQL Server, Ingres, IBM DB2
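The table, row, column and key ideas above can be tried out without an Oracle account. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the `organism` and `sequence` tables, their columns and their contents are hypothetical, not taken from any NGS database.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration).
# All table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (
        organism_id INTEGER PRIMARY KEY,   -- key column
        name        TEXT NOT NULL
    );
    CREATE TABLE sequence (
        seq_id      INTEGER PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
        residues    TEXT NOT NULL
    );
    INSERT INTO organism VALUES (1, 'E. coli'), (2, 'S. cerevisiae');
    INSERT INTO sequence VALUES
        (10, 1, 'ATCGCGTT'),
        (11, 1, 'GGCATTAC'),
        (12, 2, 'TTAACCGG');
""")

# Join the two tables on their common column (the key).
rows = conn.execute("""
    SELECT o.name, s.seq_id, s.residues
    FROM sequence s
    JOIN organism o ON o.organism_id = s.organism_id
    ORDER BY s.seq_id
""").fetchall()

for row in rows:
    print(row)
```

Each column has a declared type, each row is one data item, and the shared `organism_id` column is what lets the two tables be joined.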
Typical view of a database (shown through SQLDeveloper)
Schema – synonymous with user. A design space for an application where tables / views etc. are created
Tables – where base data is stored
Views – defined SQL queries on the table data to represent data in a more meaningful way
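The difference between a table and a view can be shown with a small sketch. As an assumption for illustration, it uses Python's built-in sqlite3 module rather than Oracle, and the `observation` table and `object_summary` view are hypothetical names. The point is that a view stores a query, not a copy of the data, so it reflects later changes to the base table.

```python
import sqlite3

# SQLite in memory is used as a stand-in for Oracle (illustrative assumption).
# The table, view and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observation (          -- base data lives in the table
        obs_id    INTEGER PRIMARY KEY,
        object    TEXT NOT NULL,
        magnitude REAL NOT NULL
    );
    INSERT INTO observation VALUES
        (1, 'M31', 3.0), (2, 'M31', 4.0), (3, 'M42', 4.0);

    CREATE VIEW object_summary AS       -- the view is a stored query
        SELECT object,
               COUNT(*)       AS n_obs,
               AVG(magnitude) AS mean_magnitude
        FROM observation
        GROUP BY object;
""")

def read_summary(connection):
    """Return the view's rows, ordered by object name."""
    return connection.execute(
        "SELECT object, n_obs, mean_magnitude FROM object_summary ORDER BY object"
    ).fetchall()

before = read_summary(conn)

# Adding a row to the base table changes what the view returns.
conn.execute("INSERT INTO observation VALUES (4, 'M42', 5.0)")
after = read_summary(conn)

print(before)
print(after)
```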
Typical view of a database (shown through SQLDeveloper)
SQL Statement
Oracle PL/SQL Types of code for manipulating data
Other Schemas in the database
SQL Output
Databases on the NGS
• NGS supports Oracle 10g database as standard
  – MySQL databases
• Databases are installed at RAL and Manchester
• Databases are administered internally
• Any NGS user is entitled to an Oracle account
  – Full access to all Oracle functionality / tools
  – Unrestricted access to database development within their own partition (schema)
Who’s using these databases?
• Portsmouth University – Astronomy data studies
  – Vast quantities of astronomy data are now being produced
    • Envisaged that such data will in future have to be kept on heterogeneous data sources (multiple databases)
  – NGS Manchester has 3 terabytes of SDSS digital data
  – Portsmouth conducting performance tests in querying distributed databases
  – Use OGSA-DAI and OGSA-DQP on the NGS to
    • Query the Oracle database
    • Query both the Oracle database and the SQL Server database located in Portsmouth
Example databases on the NGS
• MIMAS sponsored projects to Grid enable data
  – GEMS Projects (Grid Enabled Mimas Services)
    • GEMS 1 project – access to 2001 census data via OGSA-DAI
    • GEMS 2 project – access to MIMAS satellite image data
  – ConvertGrid
    • Creation of large grid enabled geographic datasets
• Developed web front ends
  – Allow social scientists to filter data
  – Allow statistical analysis of the data through other resources on the Grid
  – Access through OGSA-DAI developed services to maintain Grid security
Why put your data on the NGS?
• Fully administered
  – Databases are supported and maintained by NGS staff
  – Databases are backed up daily
• Advice
  – Help on how to access and develop your database
• Data Storage
  – Significant storage space available on the NGS for large datasets
• Data Integration
  – Oracle may be beneficially used in conjunction with other services / programs installed on the NGS
Why put your data on the NGS?
• Computational Power
  – Potential for data analysis that may be prohibitive on other machines
• Data Protection
  – Databases configured on RAID 5/10 disks and fully backed up
• Oracle Functionality
  – NGS has fully licensed Oracle Enterprise suite
  – Lot of potentially useful functionality
  – It’s all free
How does Oracle fit in with “The Grid”?
• Oracle software is not naturally part of the Globus software stack
• “Grid enabling” data can be done through
  – OGSA-DAI
• Do I have to use OGSA-DAI to utilize the Oracle databases on the NGS?
How do I apply for access?
• Through NGS website
  – http://www.ngs.ac.uk
  – http://www.grid-support.ac.uk/content/view/221/171/
• Provide following information
  – Storage requirements
  – Type of access needed
  – Where you’d like the database to be hosted (RAL or Manchester)
  – What kind of help you need in setting it up
Demonstration – Connecting to Oracle
• Many tools for connecting to Oracle for administration or development
• We will specifically look at
  – Sql Login (Script allowing SQL*Login on the NGS)
Oracle functionality for life sciences
BLAST Searches
• For the query sequence “ATCGCGTT”, find the top 3 matches above a similarity threshold from each organism

  select seq_id, organism, score, expect
  from (select t.seq_id, t.score, t.expect, g.organism,
               RANK() OVER (PARTITION BY organism ORDER BY score DESC) as o_rank
        from SwissProt_DB g,
             Table(SYS_BLASTP_MATCH('ATCGCGTT',
                                    cursor (select seq_id, sequence from SwissProt_DB),
                                    5)) t /* expect_value */
        where t.seq_id = g.seq_id)
  where o_rank
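
The ranking pattern in this query (RANK() OVER (PARTITION BY ... ORDER BY ...), then a filter on the rank in the outer query) is ordinary SQL and can be tried separately from the BLAST call. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of Oracle (window functions need SQLite 3.25 or later), the `hits` table and its scores are made up and stand in for BLAST output, and the cut-off of 3 follows the "top 3 matches" wording. It does not call or emulate SYS_BLASTP_MATCH.

```python
import sqlite3

# SQLite in memory stands in for Oracle (illustrative assumption).
# The hits table and its scores are invented; they stand in for BLAST output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hits (
        seq_id   INTEGER PRIMARY KEY,
        organism TEXT    NOT NULL,
        score    INTEGER NOT NULL
    );
    INSERT INTO hits VALUES
        (1, 'human', 95), (2, 'human', 90), (3, 'human', 85), (4, 'human', 80),
        (5, 'mouse', 70), (6, 'mouse', 60);
""")

TOP_N = 3  # hypothetical parameter: how many matches to keep per organism

# The inner query ranks hits within each organism by descending score;
# the outer query keeps only the best TOP_N ranks per organism.
top = conn.execute("""
    SELECT seq_id, organism, score
    FROM (
        SELECT seq_id, organism, score,
               RANK() OVER (PARTITION BY organism ORDER BY score DESC) AS o_rank
        FROM hits
    )
    WHERE o_rank <= ?
    ORDER BY organism, score DESC
""", (TOP_N,)).fetchall()

for row in top:
    print(row)
```

The rank has to be computed in a subquery because a window function's result cannot be filtered in the WHERE clause of the same SELECT. Note that RANK() gives tied scores the same rank, so a tie at the cut-off can return more than TOP_N rows for an organism.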