B-term Approximation Using Tree-Structured Haar Transforms

Hsin-Han Ho†, Karen O. Egiazarian*, and Sanjit K. Mitra†

† Dept. of ECE, University of California, Santa Barbara, CA 93106, U.S.A.
* Dept. of Signal Processing, Tampere University of Technology, P.O. Box 553, 33101, Tampere, Finland

ABSTRACT
We present a heuristic solution for B-term approximation of 1-D discrete signals using Tree-Structured Haar (TSH) transforms. Our solution consists of two main stages: best basis selection and greedy approximation. In addition, when approximating the same signal under different B constraints or error metrics, our solution provides the flexibility of reducing the overall computation time of approximation by increasing the overall storage space. We adopt a lattice structure to index basis vectors, so that one index value can fully specify a basis vector. Based on the concept of fast computation of the TSH transform by a butterfly network, we also develop an algorithm for directly deriving the butterfly parameters and incorporate it into our solution. Results show that, when the error metric is either the normalized ℓ1-norm or the normalized ℓ2-norm, our solution has comparable (sometimes better) approximation quality compared with prior data synopsis algorithms.

Keywords: Haar transform, approximation, data synopsis, best basis algorithm
1. INTRODUCTION

Approximation techniques are of fundamental importance due to their wide applicability in science and engineering. In particular, in many database applications it is often desirable to have a quality approximation of target signals, called a synopsis, given a space constraint.1, 2 Due to this demand many synopsis algorithms have been developed. Although some of them are developed for higher dimensions, here we focus on approximating 1-D discrete signals. More specifically, our objective is, given an original signal f of length N, to construct its approximation $\hat{f}$ with B (≪ N) non-zero coefficients and B indices, such that a pre-defined error metric is minimized. The error metric we use here is the normalized ℓp-norm:

$$\left( \frac{1}{N} \sum_i |f_i - \hat{f}_i|^p \right)^{1/p}, \qquad p \in [1, \infty].$$

Synopsis algorithms can be generally classified into two principal approaches: 1) histogram-based approaches,3–9 and 2) hierarchical approaches utilizing dyadic Haar wavelets.8, 10–14 Very recently, combined approaches have also emerged, such as Haar+,15 Compact Hierarchical Histogram (CHH)16 and Lattice Histogram (LH).17 Among these algorithms, LH has particularly attracted our attention. In the LH paper, an optimal approach and one heuristic approach are proposed. Both approaches run under a space constraint B and a resolution parameter δ. In the general case of a monotonic distributive error metric, defined in Ref. 15, the optimal approach minimizes error but is space expensive. In the special case of the maximum error metric, the optimal approach requires less space than in the general case by solving the dual error-bounded problem followed by binary search in the error domain. To reduce space usage in the general error metric case, one heuristic approach is proposed. It consists of a two-stage optimization: 1) obtaining the optimal solution, namely node locations and node values, under an appropriate maximum error metric, and 2) adjusting node values following the spirit of Ref. 16.

Further author information: (Send correspondence to H.H.)
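The normalized ℓp-norm error above can be written as a small helper (a minimal sketch; the function name and NumPy usage are ours, not from the paper):

```python
import numpy as np

def normalized_lp_error(f, f_hat, p):
    """Normalized l_p-norm error: (1/N * sum_i |f_i - f_hat_i|^p)^(1/p),
    with the maximum error as the p -> infinity limit."""
    diff = np.abs(np.asarray(f, dtype=float) - np.asarray(f_hat, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float(np.mean(diff ** p) ** (1.0 / p))
```

For p = 2 this is the root-mean-square error; for p → ∞ it reduces to the maximum absolute error.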
H.H.: E-mail: [email protected]
K.O.E.: E-mail: [email protected]
S.K.M.: E-mail: [email protected]

Image Processing: Algorithms and Systems VII, edited by Jaakko T. Astola, Karen O. Egiazarian, Nasser M. Nasrabadi, Syed A. Rizvi, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7245, 724505
© 2009 SPIE-IS&T · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.816680
SPIE-IS&T/ Vol. 7245 724505-1
When the value of the resolution parameter δ is set properly, both approaches of LH significantly outperform all earlier algorithms in terms of approximation quality. The advantage of LH lies in its good mix of local and non-local approximation, achieved by adopting the lattice data structure. Local approximation is carried out using a single coefficient to represent multiple consecutive data points, whereas non-local approximation is naturally embedded in the hierarchical structure of the lattice. When the error metric is the normalized ℓ1-norm, optimal LH requires $O(\frac{\Delta}{\delta} N^3 B^2)$ time complexity and $O(\frac{\Delta}{\delta} N^2 B)$ space complexity. However, before obtaining the final satisfactory approximation, it seems necessary to spend additional time/space resources to obtain a moderate δ value. Based on this observation, we feel it is of interest to develop an approximation algorithm with lower time and space complexity than both optimal LH and heuristic LH, namely one that keeps the advantage of the lattice data structure while removing the resolution parameter δ. To this end, we have chosen Tree-Structured Haar (TSH) transforms18 for developing our approximation algorithm. TSH transforms are a family of generalized Haar transforms defined by binary interval splitting trees (BISTs). Each tree splitting node specifies the support of the corresponding basis vector. It is straightforward to embed basis vector information into the lattice data structure, where each lattice node specifies a basis vector φi and its coefficient ci conveys the information for constructing $\hat{f} = \sum_i c_i \phi_i$. In addition, we have chosen $c_i = \langle f, \phi_i \rangle$ to avoid the use of the resolution parameter. This is in contrast to the unrestricted Haar wavelets13 in which ci ∈ ℝ. We have taken a heuristic approach to determining the B non-zero coefficients of TSH transforms.
More specifically, our solution consists of two main stages: 1) selecting the best basis, i.e., the basis whose coefficients $c_i = \langle f, \phi_i \rangle$ have the minimum cost with respect to a pre-chosen additive cost function such as the ℓp-norm or entropy, and 2) greedy approximation using the best basis. In stage one, we select the minimum cost basis from a large library of piecewise constant orthonormal bases by varying the structure of the BIST. The best basis should capture the signal's main features in just a few basis vectors, giving low approximation error after properly choosing B non-zero coefficients. In stage two, we use the greedy algorithm of Guha and Harb.13 This algorithm supports error metrics in all ℓp-norms, p ∈ [1, ∞], and guarantees that the final approximation error is within a finite distance of the minimum approximation error of the optimal solution. Although it was developed for dyadic compactly supported wavelet bases, it also works for TSH transforms. Our solution has lower time/space complexity than, and comparable approximation quality with, heuristic LH for the case of the normalized ℓ1-norm and normalized ℓ2-norm error metrics. It can be immediately seen that this approach belongs to B-term approximation in approximation theory. B-term approximation is defined as selecting B non-zero coefficients for a pre-defined orthonormal basis such that the approximation error is minimized.19 Often the best basis is first selected from a large library according to some cost criterion. For example, Coifman and Wickerhauser provide an O(N log N) time algorithm to select the basis with minimum entropy cost from a library of O(2^N) bases.20 Due to the double stage of optimization, this type of approximation is highly nonlinear.19
2. TREE-STRUCTURED HAAR TRANSFORMS18
TSH transforms are generalizations of the classical dyadic Haar transform obtained by allowing the support of each basis vector (row in the transform matrix) to be split arbitrarily into one part for positive values and another part for negative values, except for the flat (DC in circuit theory) basis vector. Among all non-flat basis vectors, a hierarchical dependency exists between parent (longer) and child (shorter) basis vectors. More specifically, the support for positive (or negative) values of a parent vector is equal to the support of the child vector. All basis vectors are defined by a pre-chosen binary tree, called a BIST, each of whose splitting nodes is associated with a piecewise constant function, called a TSH function, with one positive part next to one negative part. The BIST root node is associated with one TSH function and one flat (DC) function. A TSH basis vector of length N is then sampled from each TSH function at N equally spaced points. For example, given the BIST in Figure 1(a), the associated TSH transform matrix H is given in Eq. (1).
$$
H = \begin{pmatrix}
\frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{1}{\sqrt{5}} \\[4pt]
\sqrt{\frac{2}{3\cdot 5}} & \sqrt{\frac{2}{3\cdot 5}} & \sqrt{\frac{2}{3\cdot 5}} & -\sqrt{\frac{3}{2\cdot 5}} & -\sqrt{\frac{3}{2\cdot 5}} \\[4pt]
\frac{1}{\sqrt{2\cdot 3}} & \frac{1}{\sqrt{2\cdot 3}} & -\sqrt{\frac{2}{1\cdot 3}} & 0 & 0 \\[4pt]
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\[4pt]
0 & 0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
\end{pmatrix} \qquad (1)
$$
Recently, Fryzlewicz also generalized the classical dyadic Haar transform, naming the result the Discrete Unbalanced Haar Transform (DUHT).21 DUHT does not explicitly take any tree structure for constructing the transform matrix. Instead, a set of N − 1 breakpoints is used to specify the N × N transform matrix. For example, H in Eq. (1) can be specified by the breakpoint set {3, 2, 1, 4}. It should be noted that a breakpoint together with its location in the set specifies a basis vector; thus a breakpoint set may not be suitable for specifying an arbitrary set of basis vectors. In the scenario of B-term approximation, we need a method to index basis vectors, so that one index value specifies one basis vector.
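As an illustration of how a breakpoint set specifies a transform matrix, the following sketch rebuilds H of Eq. (1) from the set {3, 2, 1, 4}. The function name and the interval bookkeeping are our own; the paper itself defines the matrix via the BIST:

```python
import numpy as np

def tsh_matrix(bs, n):
    """Rebuild the N x N TSH transform matrix from a breakpoint set.

    Each breakpoint b splits the interval [lo, hi] containing it into a
    positive part [lo, b] and a negative part [b+1, hi]; the corresponding
    basis vector is constant on each part and has unit l2 norm."""
    rows = [np.full(n, 1.0 / np.sqrt(n))]          # flat (DC) basis vector
    intervals = [(1, n)]                           # unsplit supports, 1-based
    for b in bs:
        i = next(j for j, (lo, hi) in enumerate(intervals) if lo <= b < hi)
        lo, hi = intervals[i]
        m, k = b - lo + 1, hi - b                  # positive / negative part lengths
        v = np.zeros(n)
        v[lo - 1:b] = np.sqrt(k / (m * (m + k)))   # positive part
        v[b:hi] = -np.sqrt(m / (k * (m + k)))      # negative part
        rows.append(v)
        intervals[i:i + 1] = [(lo, b), (b + 1, hi)]
    return np.vstack(rows)

# The breakpoint set {3, 2, 1, 4} yields the matrix H of Eq. (1);
# its rows are orthonormal, i.e. H @ H.T is the identity.
H = tsh_matrix([3, 2, 1, 4], 5)
```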
2.1. Indexing Basis Vectors

Inspired by LH,17 we use a similar lattice structure to index basis vectors. Figure 1(b) illustrates the lattice structure and the indices associated with the TSH basis defined by the BIST in Figure 1(a). The original signal {f1, f2, f3, f4, f5} is placed under the lattice in order to illustrate the support range of each lattice node. The TSH basis is indexed by the set of non-negative integers {0, 3, 12, 17, 20}. The detailed indexing scheme is as follows. For a TSH basis of size N, the associated lattice consists of N(N + 1)(N − 1)/6 nodes fully specifying the dictionary of non-flat basis vectors, among which N − 1 orthonormal vectors are included in the basis. Each included vector is denoted by a black node (•). Positive integers are used to index lattice nodes, whereas 0 denotes the flat (DC) basis vector. All basis vectors with the same support correspond to lattice nodes stacked at the same location.
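A quick way to see where the node count N(N + 1)(N − 1)/6 comes from is to enumerate every candidate vector as a (support, breakpoint) triple. The enumeration below is our own illustration; the paper's exact index assignment follows its Figure 1(b):

```python
def lattice_nodes(n):
    """Enumerate every non-flat TSH dictionary vector as a (P, Q, b) triple:
    support [P, Q] with breakpoint b, where 1 <= P <= b < Q <= n."""
    return [(p, q, b)
            for p in range(1, n)
            for q in range(p + 1, n + 1)
            for b in range(p, q)]

# For n = 5 the dictionary has 5*6*4/6 = 20 nodes, matching N(N+1)(N-1)/6,
# which is why the basis in Figure 1(b) is indexed with values in {0, ..., 20}.
nodes = lattice_nodes(5)
```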
Figure 1. (a) Example of a binary interval splitting tree, and (b) the corresponding lattice structure and indices. Original signal {f1 , f2 , f3 , f4 , f5 } is placed under the lattice.
2.2. Butterfly Network Computation by Parameter Table

In Ref. 18, a fast algorithm for computing the TSH transform in O(N) operations is proposed. Fast computation is achieved by a series of high-pass and low-pass operations, called a butterfly network. Based on an input BIST, a set of sparse matrices is constructed such that their product is equal to the TSH transform matrix. Each sparse matrix specifies one layer of the butterfly network. In this section, we propose an algorithm to compute the butterfly parameters directly, without using any sparse matrices. This algorithm takes a breakpoint set BS as input, and then outputs a table of butterfly parameters T B and a reordered set of breakpoints RBS. Each table
cell corresponds to a layer of the butterfly network. RBS is created by permuting BS according to the order of the sorted parameters in T B, and is converted to lattice indices in our proposed heuristic solution. Our algorithm for building the parameter table for a size-N TSH transform is as follows:

Algorithm: BuildParameterTable(BS)
Input: Breakpoint set BS = {b1, b2, ..., bN−1}, which specifies a TSH basis in R^N.
Output: A table T B (with O(N) space) consisting of butterfly parameters, and a reordered set of breakpoints RBS.

Step 1. Initialize an array {a1, a2} = {1, N}. For each bk ∈ BS, iteratively split the array at al−1 ≤ bk < al and insert the values {bk, bk + 1}. Extract 4 butterfly parameters at each split: 1) two parameters hk,1 = al − bk and hk,2 = −(bk − al−1 + 1) for the high-pass operation, and 2) two indices {al−1, bk + 1} for the low-pass and high-pass coefficients. The parameters of the low-pass operation are set to {1, 1} throughout this step.

Step 2. Use stable sorting to sort the butterfly parameters twice, the first time by |hk,2| and the second time by hk,1 + |hk,2|. Permute BS by the order of the sorted parameters and store the result in a new set RBS. Group parameters with the same high-pass operation into one cell of T B.

Time and Space Complexity: Since we use an array data structure in Step 1, it takes O(N²) time to process the N − 1 breakpoints. Step 2 takes O(N log N) time due to sorting. Space complexity is O(N) for the growing array and the parameter table.

For example, given the input breakpoint set BS = {4, 2, 1, 3, 6, 5, 9, 8, 7}, Step 1 is illustrated in Figure 2. Figure 3(a) shows the output parameter table, which specifies the butterfly network in Figure 3(b). The output is RBS = {4, 6, 2, 9, 8, 1, 3, 5, 7}.
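The two steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; in particular, the sort directions (ascending by |hk,2|, then a stable descending sort by hk,1 + |hk,2|) are inferred so as to reproduce the RBS of the worked example:

```python
def build_parameter_table(bs, n):
    """Step 1: split intervals at each breakpoint and collect butterfly
    parameters; Step 2: two stable sorts, then permute BS into RBS.
    Returns (params, rbs); each params entry is (b_k, h_k1, h_k2, (lo, b_k+1))."""
    intervals = [(1, n)]
    params = []
    for bk in bs:
        # find the unsplit interval [lo, hi] with lo <= b_k < hi
        i = next(j for j, (lo, hi) in enumerate(intervals) if lo <= bk < hi)
        lo, hi = intervals[i]
        params.append((bk, hi - bk, -(bk - lo + 1), (lo, bk + 1)))
        intervals[i:i + 1] = [(lo, bk), (bk + 1, hi)]
    # Stable sorts; directions are inferred to match the paper's example.
    params.sort(key=lambda t: abs(t[2]))                       # 1st: by |h_k2|, ascending
    params.sort(key=lambda t: t[1] + abs(t[2]), reverse=True)  # 2nd: by h_k1 + |h_k2|, descending
    rbs = [t[0] for t in params]
    return params, rbs

params, rbs = build_parameter_table([4, 2, 1, 3, 6, 5, 9, 8, 7], 10)
# rbs == [4, 6, 2, 9, 8, 1, 3, 5, 7], matching RBS in the worked example
```

Python's `list.sort` is stable, so the second sort preserves the relative order produced by the first, as Step 2 requires.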
Figure 2. Illustration of array splitting and extracting butterfly parameters. Input breakpoint set is {4, 2, 1, 3, 6, 5, 9, 8, 7}.
Figure 3. (a) Example of parameter table, and (b) its associated butterfly network. Only parameter values of high-pass operations are shown.
3. PROPOSED HEURISTIC SOLUTION

Step 1. Select the best TSH basis and represent it by a set of breakpoints BS.

Step 2. Run algorithm BuildParameterTable(BS) to obtain the parameter table T B. Calculate the transform coefficients $c_i = \langle f, \phi_i \rangle$ via T B, and store them into an array in the order of the sorted parameters in T B.

Step 3. Convert the reordered breakpoint set RBS to lattice indices. Store 4 arrays (transform coefficients, lattice indices, hk,1, |hk,2|) of length N for greedy approximation with input parameters (p, B). B specifies the number of non-zero terms, and p denotes the normalized ℓp-norm error metric.

Step 4. Load the previously stored 4 arrays. Based on hk,1, |hk,2| and p, calculate $p' = \frac{p}{p-1}$ and $\|\phi_i\|_{p'}$ for each basis vector φi.
Step 5. Apply the greedy approximation of Ref. 13, namely select the B largest terms of $|c_i| / \|\phi_i\|_{p'}$.
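Steps 4 and 5 can be sketched as follows. This is a hedged illustration (function names are ours, and the flat DC vector is omitted): the closed form for $\|\phi_i\|_{p'}$ follows from the unit-ℓ2 normalization used in Eq. (1), where the positive part of φi has length |hk,2| and the negative part has length hk,1:

```python
def dual_norm(h1, h2_abs, p):
    """l_{p'}-norm (p' = p/(p-1)) of a TSH basis vector whose positive part
    has length |h_k2| = h2_abs and negative part has length h_k1 = h1.
    The two constant values follow the unit-l2 normalization of Eq. (1)."""
    m, k = h2_abs, h1
    a = (k / (m * (m + k))) ** 0.5        # positive-part value
    b = (m / (k * (m + k))) ** 0.5        # negative-part magnitude
    if p == 1:                            # p' = infinity: largest magnitude
        return max(a, b)
    pp = 1.0 if p == float('inf') else p / (p - 1.0)
    return (m * a ** pp + k * b ** pp) ** (1.0 / pp)

def greedy_select(coeffs, h1s, h2s_abs, p, B):
    """Indices of the B terms with largest |c_i| / ||phi_i||_{p'} (Step 5)."""
    scores = [abs(c) / dual_norm(h1, h2, p)
              for c, h1, h2 in zip(coeffs, h1s, h2s_abs)]
    return sorted(range(len(coeffs)), key=lambda i: -scores[i])[:B]
```

Note that for p = 2 the dual norm p′ = 2 as well, so every ‖φi‖p′ equals 1 and the greedy step reduces to keeping the B largest coefficient magnitudes.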
3.1. Best Basis Algorithm for TSH Transforms

We have adopted dynamic programming to select the best basis, i.e., the basis having minimum cost. Because of the distributive nature of dynamic programming, we only use additive cost functions, such as the entropy or ℓp-norm of the transform coefficients. Since the flat (DC) basis vector is always included in the best basis, the best basis search essentially selects N − 1 orthonormal vectors from the dictionary of N(N + 1)(N − 1)/6 non-flat TSH basis vectors, where N denotes the length of the input signal. The total number of orthonormal bases, CN, in TSH transforms can be derived by the following recursion:

$$C_N = \sum_{k=1}^{N-1} C_k \cdot C_{N-k}, \qquad C_1 = 1. \qquad (2)$$
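The recursion in Eq. (2) is the Catalan-number recursion and can be checked directly (a small sketch; the memoized function name is ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_tsh_bases(n):
    """C_N of Eq. (2): the number of distinct TSH orthonormal bases for
    length-n signals, i.e. the number of binary interval splitting trees."""
    if n == 1:
        return 1
    return sum(num_tsh_bases(k) * num_tsh_bases(n - k) for k in range(1, n))
```

For example, a length-5 signal already admits C5 = 14 candidate bases, which is why an exhaustive search over the library is impractical and dynamic programming is used instead.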
Let J(P, Q) be the minimum cost of the Q − P non-flat orthonormal basis vectors having support within [P, Q]. The following recursive equation computes J(P, Q):

$$J(P, Q) = \min_{P \le b \le Q-1} \Big( J(P, b) + J(b+1, Q) + \text{cost of basis vector } \phi(P, Q, b) \Big), \qquad (3)$$
where basis vector φ(P, Q, b) has support [P, Q] and its breakpoint located at b. For a 1-D signal defined on [1, N], the cost of the best basis is thus equal to J(1, N) plus the cost of the flat (DC) basis vector. Figure 4 gives the pseudocode of our dynamic programming implementation of the cost minimization in Eq. (3). During minimization, two 2-D tables are used to record the solutions of sub-problems: minimum costs (table_J) and breakpoints
(table_brk_pt). Computation of each inner product in O(1) time is achieved using partial sums of the input signal stored in a separate table, table_psum. After cost minimization, the breakpoints specifying the best basis are derived by backtracking through table_brk_pt. Our best basis algorithm is related to the fast recursive tiling algorithm by Huang et al.22 for finding the optimal tiling of a 2-D discrete image. Our algorithm differs in two aspects: 1) it does not conduct any tree pruning, and 2) the original signal is placed at the tree leaf nodes. In addition, our implementation does not require the use of recursive functions.
MinCost_TSHaarTransform(sig_in)
    N = length(sig_in);
    for k = 1:N
        table_J(k, k) = 0;
        table_psum(k, k) = sig_in(k);
    end
    for k = 1:N-1                  // LOOP: (support length)-1 of each basis vector
        for p = 1:N-k              // LOOP: starting location of each support
            table_J(p, p+k) = inf;
            min_brk_pt = [];
            table_psum(p, p+k) = table_psum(p, p+k-1) + table_psum(p+k, p+k);
            for b = p:p+k-1        // LOOP: all possible breakpoints of basis vector phi[p, p+k, b]
                // calculate inner product of basis vector phi[p, p+k, b]
                tmp_innpd = ((p+k-b)*table_psum(p, b) - (b-p+1)*table_psum(b+1, p+k)) ...
                            / sqrt((p+k-b)*(b-p+1)*(k+1));
                // calculate cost
                tmp_cost = table_J(p, b) + table_J(b+1, p+k) + (cost of tmp_innpd);
                if tmp_cost < table_J(p, p+k)
                    table_J(p, p+k) = tmp_cost;
                    min_brk_pt = b;
                end
            end
            table_brk_pt(p, p+k) = min_brk_pt;
        end
    end