Statistical Estimators for Relational Algebra Expressions - UF CISE
Recommend Documents
support negation (e.g., FOIL [11] and TILDE [12]). It is quite conceivable that by combining various of these earlier ILP techniques, relational algebra queries can.
disk protection model that confines operating systems stored on the same disk ..... The token manager is used to construct capabilities and instruction queues and ..... We installed four different guest OSes on SwitchBlade, including Windows XP, ....
N 1/2 X N 1/2 mesh-connected computer and @(log4N) time on both a cube connected and a ..... To get input 2 to its final destination, we must have S (2, 4) = 0.
In an earlier paper we presented a linear time algorithm for computing the Kolmogorov-Smirnov and Lilliefors test statistics. In this paper we present a linear time ...
two bitonic sort algorithms and one quicksort algorithm. The two bitonic .... tion, the Gray code scheme of Chan and Saad [1986] may be used. Figure 4 shows.
IV-like nxn mesh-connected processor array is presented. This. Each selected .... can be sorted with respect to any other index function using to Pk/2 + 1, ***, PK.
Conference Chairs: Mark S. Schmalz, Univ. of Florida; Gerhard X. Ritter, Univ. of Florida; Junior Barrera, ... Paulo (Brazil); Jaakko T. Astola, Tampere Univ. of Technology (Finland) ... Scholl, College of Optical Sciences/The Univ. of Arizona.
Anand Rangarajan1, Eric Mjolsness2, Suguna Pappu1, 3, Lila Davachi4, Patricia S. ... University of California San Diego (UCSD). Abstract .... minutes, the primate was sacrificed, perfused and the brain was frozen at -70 C. 20¡ m thick frozen.
or parenthesis is the priority associated with the operator or parenthesis when ... Note that, since parentheses do not appear in the postfix form, an AFTER value.
Abstract. The implementation of Lee's maze routing algorithm on an MIMD hypercube .... k is a power of 2. k = 24 where d is called the dimension of the hypercube. ... Each partition is labeled with the processor to which it is assigned. As ... the fr
Each hypercube processor (called node) has its own local memory. .... that an unlimited number of processors are available, a parallel algorithm developed for a.
Abstract-An O(n) algorithm to sort nº elements on an Illiac. 3) A REGİSTER INTERCHANGE instruction with time = T1. IV-like nxn mesh-connected processor ...
algebra C++ object library, iac++.1 The techniques and algorithms presented in a given chapter follow .... function, and denotes the absolute value (or ...... that the ability of replacing X with Z greatly simplifies the issue of template implementat
Data Structures for Databases 60-3 area called bufier. A bufier may be a bufier pool and may comprise several smaller bufiers. Components at higher architecture ...
DAIS/ROSIS Imaging Spectrometers at DLR. In Proceed- ings of the 3rd EARSeL Workshop on Imaging Spectroscopy, pages 3â14, Herrsching, Germany, 2003.
S. Sahni is with the Computer Science Department, University of Min- nesota .... NAHAR AND SAHNI: FAST ALGORITHM FOR POLYGON DECOMPOSITION. 475. Step 1 Let the given ...... nology degree in electrical engineering from the.
Conference Chairs: Mark S. Schmalz, Univ. of Florida; Gerhard X. Ritter, Univ. of ... Paulo (Brazil); Jaakko T. Astola, Tampere Univ. of Technology (Finland). Program Committee: Stefano Baronti, Istituto di Fisica Applicata Nello Carrara ... 11:00 am
This is a tutorial on general techniques for combinatorial approxima- tion. .... In the terminology of Garey and Johnson [2, 3] such an algorithm is called.
Relational Algebra: More operational, very useful for ... Understanding Algebra &
Calculus is key to ... Intersection, join, division, renaming: Not essential, but.
Jan 7, 2005 - Many real-world systems today rely on password authentication to verify the .... authenticated key exchange system and prove it secure in the random ..... Let (x, y) be the servers' global key pair such that y = g .... run DistVerify(Ï
has the advantage that the switches can be set in 0(k log N) time by either a cube or perfect shufï¬e ...... 2 for s := 1 to l'n/m'l do //[n/m'| phases of multiway merge//.
Mobile Networking Concepts and Protocols CNT 5517 Dr. Sumi Helal, Ph.D. Professor Computer & Information Science & Engineering Department University of Florida ...
function Position( i );. /* find the position of element i in t[1..m]. < n < k */ begin j := k + i â 1 .... I.e., it is determined by the position of the rightmost zero in the binary.
TV and tTV we see that good speedups can be expected when .... [GOPA85] P.S. Gopalakrishnan, I.V. Ramakrishnan, and L.N. Kanal, "Computing Tree Func-.
Statistical Estimators for Relational Algebra Expressions - UF CISE
Jan 26, 2004 - All operators have set semantics (no duplicates). ⢠No knowledge about the distribution of the data is assumed. Statistical Estimators for ...
Statistical Estimators for Relational Algebra Expressions Wen-Chu Hou, Gultekin Ozsoyoglu and Baledo K Taneja Presenter: Alin Dobra January 26, 2004
The Need for Approximation in Relational Databases • Relational database technology not directly applicable to real-time or time-constrained data processing environments • Problems with traditional relational databases: – cannot be fed real-time data because of monolithic nature – software is very large with lots of components: concurrency control, optimization – heavily utilizes secondary storage • Possible solution: main memory databases – need to fit all data in memory to avoid secondary storage altogether – even with all data in memory exact query processing is still expensive
Statistical Estimators for Relational Algebra Expressions
2
Approximate Query Processing: Sampling Approach • Select a sample from the database • Use the sample to construct a synthetic response to the requested query – construct a statistical approximation of query responses • Focus of the paper: – use sampling to determine consistent and unbiased estimators for the query results – focus only on queries of the form COUNT(E) with E an relational algebra expression – E is allowed to contain o n, ∩, ∪, −, σ and π ∗ All operators have set semantics (no duplicates) • No knowledge about the distribution of the data is assumed
Statistical Estimators for Relational Algebra Expressions
3
Statistical Estimators Notation: • Ψ: parameter of interest – population mean ˆ estimate of parameter Ψ computed from the sample – guess for the real Ψ • Ψ: Unbiased Estimator: ˆ is called an unbiased estimator of Ψ if E[Ψ] ˆ = Ψ for all values of Ψ •Ψ ˆ is not unbiased, • If Ψ
ˆ = E[Ψ] ˆ −Ψ bias(Ψ)
ˆ is defined as: • The Mean Square Error of estimator Ψ ˆ = E(Ψ ˆ − Ψ)2 ) MSE(Ψ) ˆ + (bias(Ψ)) ˆ 2 = Var(Ψ) ˆ ˆ unbiased = Var(Ψ) if Ψ ˆ is consistent if Ψ ˆ → Ψ when the number of samples goes Consistent Estimators: Ψ to infinity (or all tuples in database here) Statistical Estimators for Relational Algebra Expressions
4
Statistical Estimators (cont) • All estimates have error – sample size is finite – have to estimate the error of the estimator – confidence bounds: interval around estimate in which the true value lies with high (prescribed) probability Determining Confidence Intervals: ˆ is usually an average or sum of averages • Idea: Ψ ˆ is normal – Central limit theorem ⇒ distribution of Ψ ˆ is unbiased and Var(Ψ) ˆ is known than the confidence interval is – if Ψ q ˆ ± z × Var(Ψ) ˆ E[Ψ] where z is the value for N(0, 1) that corresponds to the desired confident interval ˆ is not normally distributed • If Ψ – can use Chebyshev’s theorem to give pessimistic, distribution independent bounds
Statistical Estimators for Relational Algebra Expressions
5
ESTIMATE COUNT(E) Algorithm • Input: an arbitrary relational algebra expression E • Output: an estimate of COUNT(E) 1. Push projection inside union. For term Ei π (∪m Eim ) = ∪m π(Eim ) π(Eim ) considered a relation 2. Transform E into E1 φ1 . . . φn−1 En with φi ∈ {∪, −} with Ei not containing these type of operators. 3. Compute estimator Cˆ j of COUNT(E) = ∑ j (±)COUNT(E 0j ) using the inclusion exclusion principle. For each COUNT(E 0j ) chose the appropriate estimator depending if E 0j contains or not π Return ∑ j (±)Cˆ j
Statistical Estimators for Relational Algebra Expressions
6
Example 1: Overall Algorithm Estimate: COUNT(E) = COUNT(R1 o n (R2 − π(R3 ∪ R4 − R5 ))) 1. R1 o n (R2 − π(R3 ∪ R4 − R5 )) = R1 o n (R2 − ((π(R3 − R5 )) ∪ (π(R4 − R5 )))) 2. Notation: R∗3 = π(R3 − R5 ) R∗4 = π(R4 − R5 ) R1 o n (R2 − π(R3 ∪ R4 − R5 )) = (R1 o n R2 ) − ((R1 o n R∗3 ) ∪ (R1 o n R∗4 )) 3. COUNT(E) = COUNT(R1 o n R2 ) − COUNT((R1 o n R2 ) ∩ ((R1 o n R∗3 ) ∪ (R1 o n R∗4 ))) = COUNT(R1 o n R2 ) − COUNT(R1 o n (R2 ∩ R∗3 )) − COUNT(R1 o n (R2 ∩ R∗4 )) + COUNT(R1 o n (R2 ∩ R∗3 ∩ R∗4 )
Statistical Estimators for Relational Algebra Expressions
7
Estimating COUNT(R1 o n ··· o n Rn) Idea: • Relation R with k tuples can be mapped to a set of k points in one-dimensional space. • Crossproducts R1 × · · · × Rn can be represented as a point in an n-dimensional space with d1 , . . . , dn projections in each direction. Call the mappings fi ()˙ • Natural join and intersection (particular form or natural join) can be represented as a subset of all the possible points in this space. • Alternative: assign value 1 to each point p( f1 (t1 ), . . . , fn (tn )) if (t1 , . . . ,tn ) ∈ R1 o n ··· o n Rn and 0 otherwise. COUNT(E) = number of 1s in the mapping To solve this subproblem it is enough to estimate the number of 1s.
Statistical Estimators for Relational Algebra Expressions
8
Estimating number of 1s Notation: • Ni number of tuples in Ri • N = N1 × · · · × Nn • Assume points are numbered p1 . . . pN (any enumeration is fine) • yi value 0 or 1 for point pi • Y (E) = y1 + · · · + yN is the total number of 1s – COUNT(E) Estimator for Y (E): With S an uniform random sample of points in R1 × · · · × Rn ∑ p ∈S yi Yˆ (E) = N i |S| Can show: • Yˆ (E) is an unbiased estimator of Y (E) •
2 2 N − |S| ∑ pi ∈S (yi − y) ˆ Var(Y (E)) = N , N −1 |S|2
Statistical Estimators for Relational Algebra Expressions
y=
∑ pi ∈S yi |S| 9
Incorporating ∪,−,π Operators ∪ and −: Use inclusion exclusion principle. This can be achieved by applying transformation rules that push the operators ∩ and − outside and then using properties of COUNT(). COUNT(R1 − R2 ) = COUNT(R1 ) − COUNT(R1 ∩ R2 ) Operator π: • Difficulty is in eliminating duplicates • Must not count contribution of a tuple twice • Goodman estimator: estimates number of distinct values (groups) in a relation
∑ A i xi i
with xi the number of sample points with value i and (i) i [N − |S| + i − 1] Ai = 1 − (−1) |S|(i)
n(i) = n(n − 1)(n − i + 1) if i > 0, 1 otherwise Statistical Estimators for Relational Algebra Expressions
10
Sampling For Aggregate Estimation • Just need uniform samples from crossproduct spaces for the most part • Can obtain such samples by picking random tuples in each of the participating relations • Samples can be reused form multiple estimations • Use nonuniform samples: make all combinations from samples from multiple relations – Computation of variance more difficult since the samples are not iid • Clustered sampling: sample blocks instead of tuples
Statistical Estimators for Relational Algebra Expressions