Introduction to Oracle on the NGS

http://www.nesc.ac.uk/training

http://www.ngs.ac.uk

Keir Hawker, DBA, Rutherford Laboratories
Simon Collins, Data Consultant, Manchester University

Goals of today
• Introduce concepts of relational databases
  – Focus on Oracle as it is the NGS standard database
    • We can support other databases
• Oracle install on the NGS
  – Who’s using the service now?
  – Benefits?
  – How do I apply and get the most out of the service?
• Functionality specific to the life science community
• Get feedback on your specific requirements for data storage

Agenda
• Introduction to relational databases
• Oracle on the NGS
• Demonstration
  – Connectivity to Oracle on the NGS
• Oracle tools for life sciences
• Designing database systems
• Practical
  – BLAST queries within Oracle
• Q&A

Relational Databases
• Simply: databases that are built upon a relational model
  – A collection of designed tables that represent a data model
    • Data types are described by columns
    • Data is represented by rows
    • Tables may be joined by common columns (keys)
  – Provide an elegant and quick way to safely store, retrieve, mine and manipulate large volumes of data
• Examples include Oracle, MySQL, SQL Server, Ingres and IBM DB2
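The ideas of columns, rows and keys can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle, purely so that it runs without a server; the table names, column names and sample data are invented for illustration and are not part of the NGS service.

```python
import sqlite3

# In-memory SQLite stands in for Oracle so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

# Two designed tables: columns describe the data types, rows hold the data.
conn.executescript("""
    CREATE TABLE organisms (
        organism_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE sequences (
        seq_id      INTEGER PRIMARY KEY,
        organism_id INTEGER NOT NULL REFERENCES organisms(organism_id),
        residues    TEXT NOT NULL
    );
    INSERT INTO organisms VALUES (1, 'Homo sapiens'), (2, 'Mus musculus');
    INSERT INTO sequences VALUES
        (10, 1, 'ATCGCGTT'),
        (11, 1, 'GGCATTAC'),
        (12, 2, 'ATCGAGTT');
""")

# Join the two tables on their common key column, organism_id.
rows = conn.execute("""
    SELECT s.seq_id, o.name, s.residues
    FROM sequences s
    JOIN organisms o ON o.organism_id = s.organism_id
    ORDER BY s.seq_id
""").fetchall()

for row in rows:
    print(row)
```

The same CREATE TABLE and JOIN statements would need only minor data type changes (for example NUMBER and VARCHAR2) to run against an Oracle schema.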

Typical view of a database (shown through SQLDeveloper)
• Schema: synonymous with user. A design space for an application where tables, views, etc. are created
• Tables: where base data is stored
• Views: defined SQL queries on the table data that represent the data in a more meaningful way
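The distinction between a table and a view can be sketched as follows. This again uses an in-memory SQLite database from Python as a stand-in for Oracle; the blast_hits table, the view name and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Oracle; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: where the data is actually stored.
    CREATE TABLE blast_hits (
        seq_id   INTEGER,
        organism TEXT,
        score    REAL
    );
    INSERT INTO blast_hits VALUES
        (1, 'Homo sapiens', 52.0),
        (2, 'Homo sapiens', 31.5),
        (3, 'Mus musculus', 47.0);

    -- View: a stored query that presents the table data in summarised form.
    CREATE VIEW best_score_per_organism AS
        SELECT organism, MAX(score) AS best_score, COUNT(*) AS n_hits
        FROM blast_hits
        GROUP BY organism;
""")

query = ("SELECT organism, best_score, n_hits "
         "FROM best_score_per_organism ORDER BY organism")
before = conn.execute(query).fetchall()

# A view holds no data of its own, so new rows in the table show up at once.
conn.execute("INSERT INTO blast_hits VALUES (4, 'Mus musculus', 60.0)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

Because the view is only a named query, it never needs refreshing: each SELECT against it reads the current contents of the base table.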


Typical view of a database (shown through SQLDeveloper)

SQL Statement

Oracle PL/SQL: types of code for manipulating data

Other Schemas in the database

SQL Output


Databases on the NGS
• NGS supports Oracle 10g database as standard
  – MySQL databases
• Databases are installed at RAL and Manchester
• Databases are administered internally
• Any NGS user is entitled to an Oracle account
  – Full access to all Oracle functionality / tools
  – Unrestricted access to database development within their own partition (schema)

Who’s using these databases?
• Portsmouth University: astronomy data studies
  – Vast quantities of astronomy data are now being produced
    • It is envisaged that in the future such data will have to be kept on heterogeneous data sources (multiple databases)
  – NGS Manchester has 3 terabytes of SDSS digital data
  – Portsmouth is conducting performance tests in querying distributed databases
  – Uses OGSA-DAI and OGSA-DQP on the NGS to
    • Query the Oracle database
    • Query both the Oracle database and the SQL Server database located in Portsmouth

Example databases on the NGS
• MIMAS-sponsored projects to Grid-enable data
  – GEMS projects (Grid Enabled Mimas Services)
    • GEMS 1 project: access to 2001 census data via OGSA-DAI
    • GEMS 2 project: access to MIMAS satellite image data
  – ConvertGrid
    • Creation of large grid-enabled geographic datasets
• Developed web front ends
  – Allow social scientists to filter data
  – Allow statistical analysis of the data through other resources on the Grid
  – Access through OGSA-DAI developed services to maintain Grid security

Why put your data on the NGS?
• Fully administered
  – Databases are supported and maintained by NGS staff
  – Databases are backed up daily
• Advice
  – Help on how to access and develop your database
• Data Storage
  – Significant storage space available on the NGS for large datasets
• Data Integration
  – Oracle may be beneficially used in conjunction with other services / programs installed on the NGS

Why put your data on the NGS?
• Computational Power
  – Potential for data analysis that may be prohibitive on other machines
• Data Protection
  – Databases configured on RAID 5/10 disks and fully backed up
• Oracle Functionality
  – NGS has a fully licensed Oracle Enterprise suite
  – A lot of potentially useful functionality
  – It’s all free

How does Oracle fit in with “The Grid”?
• Oracle software is not naturally part of the Globus software stack
• “Grid enabling” data can be done through
  – OGSA-DAI
• Do I have to use OGSA-DAI to utilize the Oracle databases on the NGS?

How do I apply for access?
• Through the NGS website
  – http://www.ngs.ac.uk
  – http://www.grid-support.ac.uk/content/view/221/171/
• Provide the following information
  – Storage requirements
  – Type of access needed
  – Where you’d like the database to be hosted (RAL or Manchester)
  – What kind of help you need in setting it up

Demonstration – Connecting to Oracle
• Many tools for connecting to Oracle for administration or development
  – SQL*Plus (Shell or GUI)
  – SQLDeveloper (GUI)
  – iSQL*Plus (Web)
  – Oracle Enterprise Manager (Web)
• We will specifically look at
  – Sql Login (Script allowing SQL*Login on the NGS)
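Whichever client is used, it needs to be told where the database lives. As a rough sketch, the helper below builds an Oracle Easy Connect identifier of the form //host:port/service_name, which clients such as SQL*Plus accept after the "@" in a connect string when Easy Connect naming is enabled. The function name, host and service name are hypothetical, and 1521 is only the conventional default listener port; the real NGS connection details, and how the Sql Login script uses them, are not given here.

```python
def ez_connect_identifier(host: str, service_name: str, port: int = 1521) -> str:
    """Build an Oracle Easy Connect identifier: //host:port/service_name."""
    if not host or not service_name:
        raise ValueError("host and service_name are required")
    return f"//{host}:{port}/{service_name}"


# Hypothetical host and service name, for illustration only.
identifier = ez_connect_identifier("db.example.org", "ngsdb")
print(identifier)
```

A client would then be pointed at this identifier together with a username, with the password entered at the prompt rather than stored in a script.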


Oracle functionality for life sciences
BLAST searches
• For the query sequence “ATCGCGTT”, find the top 3 matches above a similarity threshold from each organism

    select seq_id, organism, score, expect
    from (select t.seq_id, t.score, t.expect, g.organism,
                 RANK() OVER (PARTITION BY organism ORDER BY score DESC) as o_rank
          from SwissProt_DB g,
               Table(SYS_BLASTP_MATCH('ATCGCGTT',
                                      cursor(select seq_id, sequence from SwissProt_DB),
                                      5)) t /* expect_value */
          where t.seq_id = g.seq_id)
    where o_rank