Database extensions for PostgreSQL
Tutorial 2/2

Luis Carvalho
Division of Applied Math
[email protected]
February, 2008

DB extensions for PostgreSQL

Tutorial 2/2 – 1 / 10


PostBio

- Bioinformatics extensions for PostgreSQL
  - int_interval data type for integer ranges
    - Topological and set operators
    - Can be indexed with GiST
    - Sequence feature: (id, orient, int_interval)
  - stree data type for suffix trees
    - Routines for maximal matches and match counts, based on MUMmer
    - Utility routines: revcomp
- Documentation: http://postbio.projects.postgresql.org
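revcomp computes the reverse complement of a nucleotide sequence. Its exact PostBio signature is not shown on this slide, so the Python sketch below only illustrates the operation itself. It assumes lowercase sequences over the alphabet acgt, as in the nucseq domain used in the PL/Lua examples.

```python
# Complement of each base (lowercase acgt alphabet assumed, as in nucseq).
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement: complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(revcomp("acgtacgt"))  # acgtacgt (this sequence is its own reverse complement)
    print(revcomp("aacg"))      # cgtt
```

Applying revcomp twice returns the original sequence, which makes a convenient sanity check.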

int_interval SQL snippets

SELECT name, #region AS length, #region AS right FROM refseq;


SELECT name, region 100 FROM refseq
  WHERE region @ '(10000, 100000)';

SELECT r.name, p.id, p.region
  FROM refseq r JOIN probeset p
    ON r.seq_id = p.seq_id
   AND r.same_orient = p.same_orient
   AND r.region && p.region;

SELECT name, int_cover(region) FROM refseq_pset GROUP BY name;
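These snippets rely on an overlap operator (&&), a containment operator (@) and a length operator (#). The Python sketch below makes those predicates concrete, together with the refseq/probeset overlap join. Several details are assumptions for illustration, not PostBio's definitions: closed integer endpoints, length computed as high - low + 1, and the class and field names. The join is a naive nested loop. Because int_interval can be indexed with GiST, the database is not limited to that strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntInterval:
    """Closed integer range [low, high] (closed endpoints are an assumption)."""
    low: int
    high: int

    def length(self) -> int:
        # Number of integer positions covered; assumed analogue of #region.
        return self.high - self.low + 1

    def overlaps(self, other: "IntInterval") -> bool:
        # Analogue of &&: the two ranges share at least one position.
        return self.low <= other.high and other.low <= self.high

    def contained_in(self, other: "IntInterval") -> bool:
        # Analogue of region @ '(10000, 100000)': self lies inside other.
        return other.low <= self.low and self.high <= other.high


@dataclass(frozen=True)
class Feature:
    """Hypothetical sequence feature: (id, orient, int_interval)."""
    name: str
    seq_id: int
    same_orient: bool
    region: IntInterval


def overlap_join(refseqs, probesets):
    """Nested-loop version of the refseq/probeset join above."""
    return [
        (r.name, p.name, p.region)
        for r in refseqs
        for p in probesets
        if r.seq_id == p.seq_id
        and r.same_orient == p.same_orient
        and r.region.overlaps(p.region)
    ]
```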


stree SQL snippets

SELECT maxmatch('acgtacgt', 'cgta', false, 2);
 maxmatch
----------
 (4,2,1)
 (3,6,1)

CREATE TABLE rseq (id integer, st stree);
-- fill table
SELECT id, maxmatchcount(st, 'cgta', false, 3, false) FROM rseq;
 id | maxmatchcount
----+---------------
  1 |             2
  2 |             2
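maxmatch is based on MUMmer and works on a suffix tree. To check the first result by hand, the Python sketch below enumerates maximal exact matches of at least min_len bases by brute force. It runs in O(nm) time and uses no suffix tree. Each tuple is read as (length, position in text, position in query), 1-based. That reading is an assumption, but it is consistent with the two rows shown. The third maxmatch argument (false) is not modelled.

```python
def maximal_matches(text: str, query: str, min_len: int):
    """Brute-force maximal exact matches as (length, text_pos, query_pos), 1-based."""
    matches = []
    n, m = len(text), len(query)
    for i in range(n):
        for j in range(m):
            # Not left-maximal: the match could be extended one base to the left.
            if i > 0 and j > 0 and text[i - 1] == query[j - 1]:
                continue
            # Extend to the right as far as possible, which makes it right-maximal.
            k = 0
            while i + k < n and j + k < m and text[i + k] == query[j + k]:
                k += 1
            if k >= min_len:
                matches.append((k, i + 1, j + 1))
    return matches


print(maximal_matches("acgtacgt", "cgta", 2))  # [(4, 2, 1), (3, 6, 1)]
```

A suffix tree finds the same matches without trying every (i, j) pair. That efficiency is why PostBio stores sequences as stree values.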



PostStat

- Statistics extensions for PostgreSQL
  - Cumulative probability distributions: binomial, hypergeometric, negative binomial, Poisson, exponential, gamma, normal, chi-square, t, F, Wilcoxon signed rank and rank sum
  - Linear regression: vector and matrix accumulators, linear fit and F-statistic
  - Statistical tests: Fisher, Shapiro, and [G]SEA
- Motivation: test statistical hypotheses!
- Documentation: http://poststat.projects.postgresql.org
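Accumulators let an aggregate fit a regression in a single pass over the rows. The Python sketch below shows that idea for simple regression with one predictor. The state keeps running sums. The intercept, the slope and the F-statistic (with 1 and n - 2 degrees of freedom) are derived from those sums at the end. The class and method names are illustrative, not PostStat's API.

```python
from dataclasses import dataclass


@dataclass
class RegressionState:
    """Running sums for y = a + b*x, like an aggregate's transition state."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def accumulate(self, x: float, y: float) -> None:
        # Transition step: fold one (x, y) row into the sums.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y

    def fit(self):
        """Final step: return (intercept, slope, F) for the accumulated rows."""
        if self.n < 3:
            raise ValueError("need at least 3 points for the F-statistic")
        cxx = self.sxx - self.sx * self.sx / self.n  # centred sum of squares of x
        cxy = self.sxy - self.sx * self.sy / self.n  # centred cross-products
        cyy = self.syy - self.sy * self.sy / self.n  # total sum of squares
        slope = cxy / cxx
        intercept = (self.sy - slope * self.sx) / self.n
        ss_reg = slope * cxy   # regression sum of squares (1 degree of freedom)
        ss_res = cyy - ss_reg  # residual sum of squares (n - 2 degrees of freedom)
        f_stat = ss_reg / (ss_res / (self.n - 2))
        return intercept, slope, f_stat
```

Keeping raw sums is the simplest state to maintain. It can lose precision when the values are large, and a centred, Welford-style update is the safer choice in that case.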

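Fisher's exact test is built on the hypergeometric distribution, which is also in PostStat's list of distributions. For a 2x2 table [[a, b], [c, d]] with fixed margins, the one-sided p-value is the hypergeometric upper tail P(X >= a). The Python sketch below computes it; the function names are illustrative, not PostStat's. For the table [[3, 1], [1, 3]], the result is (16 + 1)/70 = 17/70, about 0.243.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) when drawing `draws` items without replacement from `total`
    items, `successes` of which count as successes."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test for [[a, b], [c, d]]: P(X >= a)."""
    total = a + b + c + d
    col1 = a + c  # successes in the population
    row1 = a + b  # number of draws
    top = min(col1, row1)
    return sum(hypergeom_pmf(k, total, col1, row1) for k in range(a, top + 1))
```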

PostStat examples

CREATE VIEW ngseq AS
  SELECT s.id, count(*) AS ngenes
  FROM sequence s JOIN refseq r ON r.seq_id = s.id
  GROUP BY s.id;

-- array_accum is the user-defined aggregate from the PostgreSQL manual
-- (array_agg is built in from 8.4 on)
SELECT shapiro(array_accum(ngenes)) FROM ngseq;

\set alpha 0.05
SELECT id, ngenes, 1 - pnorm(ngenes, m, s) AS pvalue
  FROM ngseq,
       (SELECT avg(ngenes) AS m, stddev(ngenes) AS s FROM ngseq) AS q
  WHERE 1 - pnorm(ngenes, m, s) < :alpha;
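The last query flags sequences whose gene count is unusually high under a normal approximation. With m and s the mean and standard deviation of ngenes, the p-value of a count x is 1 - pnorm(x, m, s). The shapiro() call is a normality test, so it checks whether that approximation is reasonable. The Python sketch below repeats the computation outside the database. It assumes that pnorm(x, m, s) is the normal CDF with mean m and standard deviation s. statistics.stdev is the sample standard deviation, matching PostgreSQL's stddev.

```python
import math
import statistics


def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """1 - Phi((x - mean) / sd), computed with erfc to avoid cancellation."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def flag_high_counts(ngenes: dict, alpha: float = 0.05):
    """Return (id, count, p-value) for counts whose upper-tail p-value < alpha."""
    counts = list(ngenes.values())
    m = statistics.mean(counts)
    s = statistics.stdev(counts)  # sample standard deviation, like SQL stddev()
    flagged = []
    for seq_id, x in ngenes.items():
        p = normal_upper_tail(x, m, s)
        if p < alpha:
            flagged.append((seq_id, x, p))
    return flagged
```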


PL/Lua

- Lua as a procedural language in PostgreSQL
  - Trusted and untrusted versions
  - Same facilities as in Lua, plus server extensions (SPI)
  - Each function has a “local namespace”, called upvalue, and can be recursive
  - Set-returning functions (SRFs) use coroutines
- Prototype:

  CREATE FUNCTION func (args) RETURNS rettype AS $$
    -- Lua function body
  $$ LANGUAGE [ pllua | plluau ];

- Documentation: http://pllua.projects.postgresql.org
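In PL/Lua, a set-returning function runs as a coroutine. It yields one row at a time, and the executor resumes it when it needs the next row. Python generators follow the same suspend-and-resume pattern. The sketch below is therefore only an analogy, not PL/Lua code: gen_series plays the role of the SRF body, and the hypothetical fetch helper plays the executor pulling rows.

```python
from typing import Iterator


def gen_series(start: int, stop: int) -> Iterator[int]:
    """Analogue of an SRF body: each yield hands back one row and suspends."""
    for i in range(start, stop + 1):
        yield i


def fetch(rows: Iterator[int], limit: int) -> list:
    """Analogue of the executor: resume the coroutine until it ends or limit rows."""
    result = []
    while len(result) < limit:
        try:
            result.append(next(rows))  # resume the generator for one more row
        except StopIteration:
            break  # the function body finished: no more rows
    return result
```

Rows past the limit are never computed, because the generator is only resumed when another row is requested.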

PL/Lua examples

CREATE DOMAIN nucseq AS text CHECK (value ~ '^[acgt]*$');

CREATE FUNCTION gccontent (s nucseq) RETURNS double precision AS $$
  local c = 0  -- number of c or g bases seen so far
  for i = 1, #s do
    local l = s:sub(i, i)
    if l == "c" or l == "g" then c = c + 1 end
  end
  return c / #s  -- fraction of GC bases
$$ LANGUAGE pllua;


PL/Lua examples

CREATE FUNCTION randseq (n integer) RETURNS nucseq AS $$
  local s = {}
  for i = 1, n do s[i] = upvalue() end
  return table.concat(s, "")
end
do
  -- the "end" above closes the function body early; the code below sets
  -- upvalue to a helper that draws one base uniformly from a, c, g, t
  upvalue = function ()
    local u = math.random()
    if u < 0.25 then return "a"
    elseif u < 0.5 then return "c"
    elseif u < 0.75 then return "g"
    else return "t" end
  end
$$ LANGUAGE pllua;
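randseq draws each base uniformly, so every position is c or g with probability 1/2. The gccontent of a long random sequence should therefore be close to 0.5. The Python sketch below mirrors the two PL/Lua functions and the nucseq CHECK constraint to illustrate that. The seeded random.Random generator and the function names are assumptions made for the illustration.

```python
import random
import re

# Same pattern as the nucseq domain's CHECK constraint.
NUCSEQ = re.compile(r"[acgt]*")


def is_nucseq(s: str) -> bool:
    return NUCSEQ.fullmatch(s) is not None


def gccontent(s: str) -> float:
    """Fraction of positions that are g or c (raises ZeroDivisionError if s is empty)."""
    return sum(1 for base in s if base in "cg") / len(s)


def randseq(n: int, rng: random.Random) -> str:
    """Draw n bases uniformly from acgt, like randseq's upvalue helper."""
    return "".join(rng.choice("acgt") for _ in range(n))
```

With n = 10000, the standard deviation of the GC fraction is sqrt(0.25 / 10000) = 0.005, so values far from 0.5 would point to a bug.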


Resources

- Wiki: http://postbio.wikidot.com
