Database extensions for PostgreSQL
Tutorial 2/2
Luis Carvalho
Division of Applied Math
February, 2008
DB extensions for PostgreSQL
Tutorial 2/2 – 1 / 10
PostBio
- Bioinformatics extensions for PostgreSQL
  - int_interval data type for integer ranges
    - Topological and set operators
    - Can be indexed with GiST
    - Sequence feature: (id, orient, int_interval)
  - stree data type for suffix trees
    - Routines for maximal matches and match counts, based on MUMmer
  - Utility routines: revcomp
- Documentation: http://postbio.projects.postgresql.org
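revcomp returns the reverse complement of a nucleotide sequence, i.e. the opposite strand read in its own 5' to 3' direction. The slide does not give its signature, so the following is only a Python sketch of the operation itself, assuming the lowercase a/c/g/t alphabet used in the other examples; the name `revcomp` here is a stand-in, and characters outside that alphabet are passed through unchanged.

```python
# Complement table for the lowercase a/c/g/t alphabet (an assumption of this sketch).
_COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement: complement every base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


print(revcomp("aacg"))      # cgtt
print(revcomp("acgtacgt"))  # acgtacgt (this sequence is its own reverse complement)
```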
int_interval SQL snippets

SELECT name, #region AS length, #region AS right FROM refseq;

SELECT name, region 100 FROM refseq
WHERE region @ '(10000, 100000)';

SELECT r.name, p.id, p.region
FROM refseq r JOIN probeset p
  ON r.seq_id = p.seq_id
  AND r.same_orient = p.same_orient
  AND r.region && p.region;

SELECT name, int_cover(region) FROM refseq_pset GROUP BY name;
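To make the predicates in these snippets concrete, here is a small Python model of integer ranges. It is a sketch under stated assumptions, not PostBio's implementation: the class `IntInterval` and its methods are hypothetical, ranges are taken as closed (so the length counts both endpoints), `&&` is read as "overlaps", and `@` as "is contained in", following PostgreSQL's older geometric-operator convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntInterval:
    """Hypothetical stand-in for int_interval: the closed range [lo, hi]."""
    lo: int
    hi: int

    def __post_init__(self) -> None:
        if self.lo > self.hi:
            raise ValueError("lower bound must not exceed upper bound")

    def length(self) -> int:
        # Assumes both endpoints belong to the range.
        return self.hi - self.lo + 1

    def overlaps(self, other: "IntInterval") -> bool:
        # Counterpart of the && predicate in the join.
        return self.lo <= other.hi and other.lo <= self.hi

    def contained_in(self, other: "IntInterval") -> bool:
        # Counterpart of region @ '(a, b)', read as "region lies inside (a, b)".
        return other.lo <= self.lo and self.hi <= other.hi

    def intersection(self, other: "IntInterval") -> Optional["IntInterval"]:
        # Set operator: the shared sub-range, or None when the ranges are disjoint.
        if not self.overlaps(other):
            return None
        return IntInterval(max(self.lo, other.lo), min(self.hi, other.hi))


gene = IntInterval(10_000, 12_500)
probe = IntInterval(12_000, 12_060)
print(gene.length())                                    # 2501
print(gene.overlaps(probe))                             # True
print(gene.contained_in(IntInterval(10_000, 100_000)))  # True
print(gene.intersection(probe))                         # IntInterval(lo=12000, hi=12060)
```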
stree SQL snippets

SELECT maxmatch('acgtacgt', 'cgta', false, 2);
 maxmatch
----------
 (4,2,1)
 (3,6,1)

CREATE TABLE rseq (id integer, st stree);
-- fill table
SELECT id, maxmatchcount(st, 'cgta', false, 3, false) FROM rseq;
 id | maxmatchcount
----+---------------
  1 |             2
  2 |             2
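The tuples returned by maxmatch are consistent with reading each one as (match length, 1-based start in the first sequence, 1-based start in the query): 'cgta' occurs at position 2 of 'acgtacgt', and 'cgt' at position 6. Assuming that reading, and leaving the boolean third argument unmodeled, a brute-force Python sketch of maximal exact matches reproduces the same output. The function name is hypothetical, and the quadratic scan is only illustrative; PostBio's routines work on suffix trees, following MUMmer.

```python
def maximal_matches(seq: str, query: str, min_len: int) -> list[tuple[int, int, int]]:
    """Brute-force maximal exact matches between seq and query (min_len >= 1).

    Returns (length, seq_pos, query_pos) with 1-based positions, ordered by
    position in seq. A match is maximal when it cannot be extended by one
    character to the left or to the right.
    """
    matches = []
    for i in range(len(seq)):
        for j in range(len(query)):
            # Skip starting points that could be extended to the left.
            if i > 0 and j > 0 and seq[i - 1] == query[j - 1]:
                continue
            # Extend to the right while the characters agree (right-maximal).
            length = 0
            while (i + length < len(seq) and j + length < len(query)
                   and seq[i + length] == query[j + length]):
                length += 1
            if length >= min_len:
                matches.append((length, i + 1, j + 1))
    return matches


print(maximal_matches("acgtacgt", "cgta", 2))  # [(4, 2, 1), (3, 6, 1)]
```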
PostStat
- Statistics extensions for PostgreSQL
  - Cumulative probability distributions: binomial, hypergeometric, negative binomial, Poisson, exponential, gamma, normal, chi-square, t, F, Wilcoxon signed rank and rank sum
  - Linear regression: vector and matrix accumulators, linear fit and F-statistic
  - Statistical tests: Fisher, Shapiro, and [G]SEA
- Motivation: test statistical hypotheses!
- Documentation: http://poststat.projects.postgresql.org
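Accumulator-based regression suits an aggregate: each row only updates a small running state, and the fit is computed once at the end. PostStat's actual accumulator types and function names are not shown in this tutorial, so the following Python sketch is an illustration under assumptions, restricted to simple regression (intercept plus one predictor). The running sums n, Σx, Σx² form the X'X matrix and Σy, Σxy the X'y vector; the F-statistic tests the slope against zero with 1 and n - 2 degrees of freedom. The class `LinRegState` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinRegState:
    """Running sums for y = a + b*x, playing the role of an aggregate's state."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def accumulate(self, x: float, y: float) -> None:
        # Transition step: fold one (x, y) row into the state.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y

    def fit(self) -> tuple[float, float, float]:
        """Final step: return (intercept, slope, F statistic); needs n >= 3."""
        n = self.n
        cxx = self.sxx - self.sx * self.sx / n  # centred sum of squares of x
        cxy = self.sxy - self.sx * self.sy / n  # centred cross-product
        cyy = self.syy - self.sy * self.sy / n  # total sum of squares of y
        slope = cxy / cxx
        intercept = (self.sy - slope * self.sx) / n
        ssr = slope * cxy                       # regression sum of squares
        sse = cyy - ssr                         # residual sum of squares
        f_stat = ssr / (sse / (n - 2))
        return intercept, slope, f_stat


state = LinRegState()
for x, y in zip([1, 2, 3, 4], [2, 4, 5, 8]):
    state.accumulate(x, y)
a, b, f = state.fit()
print(f"{b:.2f} {a:.2f} {f:.2f}")  # 1.90 0.00 51.57
```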
PostStat examples

CREATE VIEW ngseq AS
  SELECT s.id, count(*) AS ngenes
  FROM sequence s JOIN refseq r ON r.seq_id = s.id
  GROUP BY s.id;

SELECT shapiro(array_accum(ngenes)) FROM ngseq;

\set alpha 0.05
SELECT id, ngenes, 1 - pnorm(ngenes, m, s) AS pvalue
FROM ngseq,
     (SELECT avg(ngenes) AS m, stddev(ngenes) AS s FROM ngseq) AS q
WHERE 1 - pnorm(ngenes, m, s) < :alpha;
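The p-value in the last query is the upper tail of a normal distribution fitted with the sample mean and standard deviation of the per-sequence gene counts. Assuming pnorm(x, mean, sd) is the normal cumulative distribution function, as its use in the query suggests, the same screen can be sketched in Python with `math.erf`. The function names and the counts in the usage line are made up for illustration; `statistics.stdev` is the sample standard deviation, matching PostgreSQL's `stddev` (an alias of `stddev_samp`).

```python
import math
import statistics


def pnorm(x: float, mean: float, sd: float) -> float:
    """Normal CDF: P(X <= x) for X ~ N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def upper_tail_outliers(ngenes: dict[int, int], alpha: float = 0.05) -> list[tuple[int, int, float]]:
    """Keep (id, count, p-value) rows whose upper-tail p-value is below alpha."""
    counts = list(ngenes.values())
    m = statistics.mean(counts)
    s = statistics.stdev(counts)  # sample SD, like PostgreSQL's stddev()
    result = []
    for seq_id, n in ngenes.items():
        pvalue = 1.0 - pnorm(n, m, s)
        if pvalue < alpha:
            result.append((seq_id, n, pvalue))
    return result


ngenes = {1: 10, 2: 11, 3: 9, 4: 10, 5: 30}  # hypothetical counts
print([row[0] for row in upper_tail_outliers(ngenes)])  # [5]
```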
PL/Lua
- Lua as a procedural language in PostgreSQL
  - Trusted (pllua) and untrusted (plluau) versions
  - Same facilities as Lua, plus server extensions (SPI)
  - Each function has a "local namespace", called upvalue, and can be recursive
  - Set-returning functions (SRFs) use coroutines
- Prototype:
    CREATE FUNCTION func (args) RETURNS rettype AS $$
      -- Lua function body
    $$ LANGUAGE [ pllua | plluau ];
- Documentation: http://pllua.projects.postgresql.org
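The remark that set-returning functions use coroutines can be illustrated by analogy: the function produces one row, suspends, and is resumed when the caller asks for the next row. Python generators behave the same way, so the sketch below models that protocol. It is not PL/Lua code, and `kmers` and `fetch_all` are hypothetical names chosen for the example.

```python
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[tuple[int, str]]:
    """Row producer: yields (position, k-mer) rows, suspending after each one."""
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]


def fetch_all(rows: Iterator[tuple[int, str]]) -> list[tuple[int, str]]:
    """Consumer: resumes the producer once per row until it is exhausted."""
    result = []
    while True:
        try:
            result.append(next(rows))  # resume the producer for one more row
        except StopIteration:          # producer finished: no more rows
            return result


print(fetch_all(kmers("acgta", 3)))  # [(1, 'acg'), (2, 'cgt'), (3, 'gta')]
```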
PL/Lua examples

CREATE DOMAIN nucseq AS text CHECK (value ~ '^[acgt]*$');

CREATE FUNCTION gccontent (s nucseq) RETURNS double precision AS $$
  local c = 0
  for i = 1, #s do
    local l = s:sub(i, i)
    if l == "c" or l == "g" then c = c + 1 end
  end
  return c / #s
$$ LANGUAGE pllua;
PL/Lua examples

CREATE FUNCTION randseq (n integer) RETURNS nucseq AS $$
  local s = {}
  for i = 1, n do s[i] = upvalue() end
  return table.concat(s, "")
end
do
  -- upvalue: helper that returns a, c, g, or t with equal probability
  upvalue = function ()
    local u = math.random()
    if u < 0.25 then return "a"
    elseif u < 0.5 then return "c"
    elseif u < 0.75 then return "g"
    else return "t" end
  end
$$ LANGUAGE pllua;
Resources
- Wiki: http://postbio.wikidot.com