SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
1
Novel Block-Formulation and Area-Delay-Efficient Reconfigurable Interpolation Filter Architecture for Multi-Standard SDR Applications Basant Kumar Mohanty, Senior Member, IEEE Abstract—A poly-phase based interpolation filter computation involves an input-matrix and coefficient-matrix of size (P × M ) each, where P is the up-sampling factor and M = N/P , N is the filter length. The input-matrix and the coefficient-matrix resizes when P changes. In this paper, we made an analysis of interpolation filter computation for different up-sampling factors to identify redundant computations and removed those by reusing partial results. Reuse of partial results eliminates the necessity of matrix resizing in interpolation filter computation. A novel blockformulation is presented to share the partial results for parallel computation of filter outputs of different up-sampling factors. Using the proposed block formulation, a parallel multiplier-based reconfigurable architecture is derived for interpolation filter. The most remarkable aspect of the proposed architecture is that, it does not require reconfiguration to compute filter outputs of an interpolation filter for different up-sampling factor. The proposed structure has regular data-flow and it has no overhead complexity for its reconfigurable feature unlike the existing structures. Besides, the proposed structure has significantly less register complexity than the existing structure and its register complexity is independent of the block-size. Moreover, the proposed structure can support higher input-sampling frequency than the existing structure. ASIC synthesis result shows that the proposed structure for block-size 4, filter length 32, and upsampling factor 8, involves 13.6 times more area and offers 245 times higher maximum input-sampling frequency compared with the existing multiplier-less structure. It involves 18.6 times less area-delay-product (ADP) and 9.5 times less energy per output (EPO) than the existing multiplier-less structure. Index Terms—Interpolation filter, reconfigurable, digital-up converter, architecture, VLSI.
I. I NTRODUCTION
I
NTERPOLATOR is used in digital signal processing (DSP) systems to increase the sampling rate digitally, and it comprises of an up-sampler and an anti-imaging (interpolation) filter. The up-sampler change the sampling rate of base-band signal, while the interpolation filter suppress the undesired interference effect resulted due to up-sampling [1] the baseband signal. Pulse shaping filters like root raised cosine (RRC) filter is commonly used as interpolation filter due to its high inter-symbol interference (ISI) rejection ratio and high bandwidth limitation criteria [2]. Interpolation filter has a different coefficient-vector for different up-sampling factors of a base-band signal. Software defined radio (SDR) technology enables for digital implementation of wide band trans-receivers of multi-standard wireless communications [3]. A multi-standard SDR system involve interpolators with interpolation filters with different B. K. Mohanty with the Dept. of Electronics and Communication Engineering, Jaypee University of Engineering and Technology, Raghogarh, Guna, Madhya Pradesh, India-473226, (email:
[email protected]).
coefficients, filter length and up-sampling factors to meet the stringent specifications of different communication standards [3]. For example: Universal Mobile Telecommunication Standard (UMTS) uses interpolators with interpolation factors (4, 8 and 16), and filter lengths (25, 49, 97), respectively. When these interpolators are implemented individually in a hardwired circuit, then a huge amount of resource is spent. A reconfigurable finite impulse response (FIR) interpolation filter is the most appropriate for a resource and power constrained multi-standard SDR receiver which could support different upsampling factors as well as filter specifications. During last decade, several multiplier and multiplier-less designs have been suggested for efficient hardware realization of reconfigurable FIR filters and filter-banks for SDR channelization [4]–[16]. But, we do not find much work on reconfigurable interpolation filter architecture except a few. A single rate (fixed up-sampling factor) FIR interpolation filter can be implementation using a FIR filter. This could be the reason for non availability of any specific design in the literature for reconfigurable FIR interpolation filter. However, a single rate interpolation filter operate at P times higher sampling rate than the input sampling frequency and requires N filter parameters to compute each output, where P is the up-sampling factor. On the other hand, a poly-phase based multi-rate interpolation filter operates at the input sampling rate and compute P outputs using P sub-filters each having dN/P e filter parameters [1]. Therefore, a multi-rate interpolation filter structure is more hardware efficient than the single rate interpolation filter structure. The existing reconfigurable FIR filter structures are efficient for channelizer, but they do not offer an efficient computing structure for reconfigurable interpolation filter. A few multiplier-less designs are proposed for interpolation filter [17], [18], [20]. Symmetric property of pulse shaping filter and a LUT decomposition scheme are used in [17] to reduce the area complexity of 1:4 interpolation filter. In [18], symmetries of coefficient and LUT are used along with LUT sharing of in-phase and quadrature-phase filters to save LUT words which offers a significant saving in area complexity of the interpolation filter. Both these designs implement the in-phase and quadrature-phase pulse shaping filters of QPSK modulator. They can not be reconfigured for up-sampling factor other than 4 and for different filter specifications. A distributed arithmetic (DA)-based reconfigurable FIR interpolation filter architecture is proposed in [19]. The DALUT stores partial results of all the sub-filter outputs of interpolation filter with three different interpolation factors. Therefore, the structure requires a large size DA-LUT which
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
is not favorable for single chip realization. Recently, Hatai et al [20] have proposed a reconfigurable FIR interpolation filter design similar to [19] using LUT-less DA technique to reduce the area complexity. Coefficient-vector of the desired interpolation filter are selected using an array of multiplexers. The structure uses AND-gates, multiplexer and adders to implement the DA-LUT and computes a sub-filter output of the interpolation filter in bit-serial manner. It involves less area than the previous structures and supports base-band signal of low-sampling rates. Besides, the structure has a large overhead complexity (in terms of multiplexer and registers) for its reconfigurable feature. An interpolation filter of up-sampling factor P and filter length N involves an input-matrix of size (P × M ), where dM = N/P e. Input-matrix size of the interpolation filter changes for different filter-lengths and up-sampling factors. As a consequence of this, the reconfigurable interpolation filter architecture has irregular data-flow and the structure is not suitable for hardware realization. This problem is partially solved by assuming a constant filter length (equal to the length of the largest size filter) for the reconfigurable interpolation filter, where smaller size filters are realized using the same structure by zero padding. With this assumption, the inputmatrix of an reconfigurable interpolation filter only re-sizes when P changes. For example: the size of the input-matrix of an interpolation filter for filter length N = 16 and upsampling factors P = 2, 4 and 8, are respectively, (2 × 8), (4 × 4), and (8 × 2). Due to re-size of the input-matrix, the number of sub-filters of an interpolation filter changes with the column size (P ) of the input-matrix where the length of each sub-filter is equal to the size of the row (N/P ). The number of sub-filters required for reconfigurable interpolation filter for full-parallel realization is equal to the highest up-sampling factor. When a full-parallel reconfigurable interpolation structure is configured for lowest up-sampling factor, then maximum number of sub-filters remain unused in the parallel structure. Similarly, when the full-parallel structure is configured for highest up-sampling factor, then hardware resource of each sub-filter are partially utilized. Therefore, a full-parallel reconfigurable interpolation filter structure has a low hardware utilization efficiency (HUE). Computation of P sub-filters can be folded into a single sub-filter since they have identical computation. A folded reconfigurable interpolation filter structure has better HUE than the full-parallel structure. It involves only one sub-filter irrespective of up-sampling factor and computes one sub-filter output per cycle [20]. Therefore, the existing reconfigurable interpolation filter architectures uses a folded structure instead of a full-parallel structure. But, the folded interpolation filter structure has one major problem. Its output sampling frequency remain constant irrespective of the up-sampling factor and the input sampling frequency decreases inversely with the up-sampling factor. In other word, the folded structure does not increase the output sampling rate for higher up-sampling factor. Instead of that it decreases the input sampling rate by P times. Therefore, the folded reconfigurable interpolation filter architectures are suitable for base-band signals of low sampling frequency. On the other hand, the parallel interpolation filter structure increase the
2
output sampling rate proportionately with P , but the structure is not hardware efficient. We observe that the existing reconfigurable interpolation filter architecture is derived using a straight-forward FIR filter design of [11]. Since, the reconfigurable interpolation filter has a different data-flow than the reconfigurable FIR filter, a different design approach need to be considered for interpolation filter. We do not find any such design approach in the literature. The interpolation filter algorithm need to be reformulated taking the data-flow of reconfigurable filter into consideration. Keeping this in mind, in Section II, we made an analysis on interpolation filter computation for different upsampling factors to identify redundant computations. the key contribution of this paper are: • Reuse of partial results in reconfigurable interpolation filter. • A novel block-formulation is presented for efficient realization of reconfigurable interpolation filter. • Parallel inner-product computations are performed using array-multiplication and addition to facilitate reuse of partial results in an interpolation filter. • A parallel reconfigurable architecture is presented for area-delay and power efficient realization of interpolation filters. The rest of this paper is organized as follows: Analysis of interpolation filter computation is presented in Section II. Block formulation of reconfigurable interpolation filter architecture is presented in Section III. Proposed architecture is presented in Section IV. Hardware complexity and performance comparison is presented in Section V. Conclusion is presented in Section VI. II. A NALYSIS OF I NTERPOLATION F ILTER C OMPUTATION The interpolation filter is comprised of an up-sampler and a pulse shaping filter (PSF). Suppose an input signal x(n) is upsampled by factor P . The up-sampled sequence {u(m) = x(n) for m = nP , otherwise u(m) = 0} is filtered by the PSF of length N . The output of an interpolation filter is computed by the relation
y(m) =
N −1 X
h(i) · x(m − i)
(1)
i=0
Using poly-phase decomposition, (1) can be expressed as
y(P n − p) =
M −1 X
h(P i + p) · x(n − P i − p)
(2)
i=0
where m = P n, 0 ≤ p ≤ P − 1, and N = M P . An interpolation filter of up-sampling factor P produce P outputs for each sample of x(n). Using poly-phase decomposition of (1), P outputs of the interpolation filter are computed by P parallel sub-filters of length M each. Computation of P parallel sub-filters can be expressed in matrix form as y n = Xn ⊕ H
(3)
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
where, ⊕ denotes the inner-product of i-th row of Xn with i-th row of H. The input-matrix Xn , weight-matrix H and output-vector yn are defined as Xn = x(n) x(n − P ) . . x(n − N + P ) x(n − 1) x(n − P − 1) . . x(n − N + P − 1) . . . . . . . . . . x(n − P + 1) x(n − 2P + 1) . . x(n − N + 1) H=
h(0) h(1) · · h(P − 1)
3
factors. Therefore, reuse of partial results not only makes the reconfigurable interpolation filter architecture simple, it also favors parallel computation of interpolation filter outputs of different up-sampling factors without performing extra computation. u0 u2 u4 u6 u1 u3 u5 u7
x(n-4) x(n-6) x(n-5) x(n-7)
h(0) h(2) h(4) h(6) h(1) h(3) h(5) h(7)
(a)
h(P ) · · h(N − P ) h(P + 1) · · h(N − P + 1) · · · · · · · · h(2P − 1) · · h(N − 1)
yn = [y(P n), y(P n − 1), · · ·, y(P n − P + 1)]
x(n) x(n-2) x(n-1) x(n-3)
=
u0
u4
u1 u2 u3
u5 u6 u7
=
x(n) x(n-1)
x(n-4) x(n-5)
x(n-2) x(n-3)
x(n-6) x(n-7)
h(0) h(1) h(2) h(3)
h(4) h(5) h(6) h(7)
(b)
Fig. 1. (a) The n-th cycle input-matrix (X), coefficient-matrix (H) and partial values (ui ) of (X H) for P = 2 and N = 8, where ui = x(n − i) · h(i) for (0 ≤ i ≤ 7). (b) X, H and partial values (ui ) of (X H) for P = 4 and N = 8.
T
Matrix-vector product of (3) can otherwise be expressed as: H
yn =
sn,0
sn,1 . . sn,M −1 ⊗
c0
c1 . . cM −1
N
N
X
Partial result generation unit
(4) i
where, ⊗ denotes the vector operation i=0 sn,i ci , and denotes the array-multiplication. sn,i and ci are the i-th column-vector of Xn and H, respectively. The array-multiplication of X with H i.e., (X H) produces N partial results (ui ), and they are added separately to compute P sub-filter outputs, where ui = x(n − i) · h(i) for (0 ≤ i ≤ N − 1). In a particular computation cycle, X does not change its value, but change its size for different P . Similarly, H does not change its value for a particular PSF but change its size for different P . Interestingly, both X and H re-sizes in the same manner when P changes. Therefore, N partial values obtained from (X H) are remain unchanged when P changes. However, these N partial results are added in different sets to produce the sub-filter outputs of interpolation filter of different P . To illustrate this we have considered a PSF of length N = 8 for two different up-sampling factors P = 2, and 4. The input-matrix and coefficient-matrix of the PSF for P = 2 and 4, and the corresponding partial results of (X H) are shown in Fig.1. As shown in Fig.1, (X H) for P = 2 and 4, produce two sets of 8 partial results each, and these two sets are identical. Out of the two sets, one set of partial values form a redundant set. Computation of the redundant set could be avoided and partial results of one set can be reused for other. In general, out of q sets of partial results corresponding to q different up-sampling factors, (q − 1) sets are redundant. These redundant sets can be avoided by reusing partial values of one set for (P − 1) times. Generation of redundant sets (of partial results) involve matrix resizing and array multiplication of re-sized matrices. Resizing the matrices X and H in a hardware structure is resource and time consuming. Reuse of partial values eliminates the necessity of matrix resizing. Also, it offers a saving of (q − 1)N multiplications when filter computation is performed for q different up-sampling
N Control-unit
hP M −1
Data-selector unit M
M
M
M
Reconfigurable adder unit P Parallel sub-filter outputs
Fig. 2. Full-parallel reconfigurable interpolation filter structure where partial results are shared
A reconfigurable interpolation filter structure is shown in Fig.2, where the partial results are shared for different P . The partial result generation unit produces N partial values of (X H). These partial results are distributed into P groups of M partial results each, where M = N/P , and re-group these partial results when P changes. The data-selector unit select partial results from the partial result generation unit and distribute them into P groups. Finally, each group of partial results are added in separate adder tree (AT) to compute a sub-filter output, and P ATs are required to compute P sub-filter outputs corresponding to up-sampling factor P . The reconfigurable adder unit uses separate sets of ATs for different values of P . The reconfigurable adder unit is configured to select a set of ATs corresponding to a particular P . The complexity of the reconfigurable adder unit is (qN − β) and the complexity of data selector unit is N (q − 1) 2:1 MUXes, where β = p1 +p2 +p3 +· · ·+pq , pi (for 1 ≤ i ≤ q) represent q different up-sampling factors of an interpolation filter. For example: N = 16, p1 = 2, p2 = 4 and p3 = 8, the adder unit involves 34 adders and the data selector unit involves 32 number of 2:1 MUXes. Since, reuse of partial result favors parallel computation of interpolation filter outputs for different up-sampling factors, the data-selector unit can be avoided in the reconfigurable architecture without any extra cost. Overall,
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
a parallel reconfigurable architecture can be designed using the partial result generation unit and the reconfigurable adder unit. We find that using the block-processing scheme the reconfigurable adder unit can be replaced by a fixed adderunit comprising of N adders. Besides, the block-processing scheme offers register sharing as well [22]. Therefore, the area delay efficiency of the reconfigurable interpolation filter could be improved significantly using the block-processing scheme. Keeping this in mind, we have presented a novel block processing scheme in Section III for the reconfigurable interpolation filter. III. B LOCK -F ORMULATION OF FIR I NTERPOLATION F ILTER
where T slk,i = [x(kL − l − P i), · · · , x(kL − l − P i − P + 1)] T
A. Analysis of Block Computation of Interpolation Filter Substituting block-size L = 4, up-sampling factor P = 2 and filter-length N = 8 in (5), we have 0 0 sk,0 s0k,1 s0k,2 s0k,3 yk c0 c1 c2 c3 y1 s1k,0 s1k,1 s1k,2 s1k,3 c0 c1 c2 c3 k ⊗ 2 = 2 2 2 2 yk s s s s c c c c 0 1 2 3 k,0 k,1 k,2 k,3 3 3 3 3 3 yk c0 c1 c2 c3 sk,0 sk,1 sk,2 sk,3 (6) where slk,i = [x(4k − l − 2i), x(4k − l − 2i − 1)]
(7a)
ykl
(7b) (7c)
Similarly, substituting L = 4, P = 4 and N = 8 in (5), we have
yk0
r0k,0
y1 r1k,0 k 2 = 2 yk rk,0 yk3 r3k,0
r0k,1
d0
r1k,1 d0 ⊗ 2 rk,1 d0 d0 r3k,1
d1
d1 d1 d1
T
rlk,i = [x(4k − l − 4i), · · · x(4k − l − 4i − 3)] ykl
T
= [y(16k − 4l), · · · , y(16k − 4l − 3)]
di = [h(2i), h(2i + 1)]
(8)
(9a) (9b)
T
(9c)
Substituting (7a) in (9a), and (7c) in (9c), we have T rlk,i = slk,2i , slk,2i+1 di = [c2i , c2i+1 ]
(10a)
T
(10b)
Rewriting (9b) in split form as (11a)
where T
yklj = [y(16k − 4l − 2j), y(16k − 4l − 2j − 1)] (11b) for j = 0, 1 and l = 0, 1, 2, 3. Substituting (10) and (11) in (8), we have
yk00
s0k,0
s0k,2
c0
c2
yk01 yk10 yk11 = yk20 21 yk yk30
s0k,1
s0k,3 s1k,2 s1k,3 ⊗ s2k,2 s2k,3 s3k,2 s3k,3
c1
c0
c3 c2 c3 c2 c3 c2
c1
c3
yk31
ykl = [y(P kL − lP ), · · · , y(P kL − lP − 1)]
ci = [h(2i), h(2i + 1)]
where
T ykl = ykl0 , ykl1
Let us consider an FIR interpolation filter of up-sampling factor (P ) processes a block of L input samples and generates P filter output blocks of size L each in every cycle. Computation of k-th cycle filter outputs is represented in matrix form as: 0 1 T yk , yk , · · ·, ykL−1 = 0 sk,0 s0k,1 · · s0k,M −1 c0 c1 · · cM −1 1 sk,0 s1k,1 · · s1k,M −1 c0 c1 · · cM −1 ⊗ · · · · · · · · · · L−1 L−1 L−1 c0 c1 · · cM −1 sk,0 sk,1 · · sk,M −1 (5)
= [y(8k − 2l), y(8k − 2l − 1)]
4
s1k,0 s1k,1 s2k,0 s2k,1 s3k,0 s3k,1
c0 c1 c0 c1
(12)
As shown in (6) and (12), two interpolation filters of the same filter-length (N = 8), and up-sampling factors P = 2, and 4, involve identical input-vectors (slk,i ) and coefficientvectors (ci ) for computation of filter outputs. Partial results of (12) form the redundant set. The partial results of (6) can be reused for the partial results of (12). However, the inputvectors and coefficient-vectors of (6) appear at different row and column location of (12), but they occures in a specific pattern. The input-matrix and coefficient-matrix of (6) need to be decomposed to match those of (12). Therefore, we present a modified block formulation to facilitate reuse of partial results in a reconfigurable architecture of interpolation filter. B. Modified Block Formulation for Reconfigurable Interpolation Filter Architecture Column-vectors of input-matrix of (6) are derived from two input data-blocks xk = {x(4k), x(4k − 1), x(4k − 2), x(4k − 3)} and xk−1 = {x(4k − 4), x(4k − 5), x(4k − 6), x(4k − 7)} corresponding to clock cycles k and (k − 1), respectively. From the current input-block xk , a pair of input-vectors slk,0 and slk,1 are derived, while slk,2 and slk,3 are derived from the most recent past input-block xk−1 . To incorporate these information, a different notation is used to represent the inputvectors of (6). The input-vector slk,2j+1 (for L = 4, P = 2 and 0 ≤ l ≤ L − 1, 0 ≤ j ≤ (N/4) − 1, i = 0, 1) is denoted as: l vk−j,i = slk,2j+i
(13)
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
Using (13), input-vectors {slk,0 , slk,1 , slk,2 , slk,3 } are reprel l l l sented as {vk,0 , vk,1 , vk−1,0 , vk−1,1 }, respectively.
5
having up-sampling factors (P = 2, 4 and 8) for block sizes L = 4 are given here.
Substituting (13) in (6) and expressed in decomposed form as:
yk0
0 vk,0
1 1 yk vk,0 2 = y v2 k k,0 3 yk3 vk,0 0 0 vk,1 vk−1,1 1 1 vk,1 vk−1,1 + v2 2 k,1 vk−1,1 3 vk,1
3 vk−1,1
0 vk−1,0
c0
1 c0 vk−1,0 ⊗ 2 vk−1,0 c0 3 c0 vk−1,0 c1 c3 c1 c3 ⊗ c c 1 3 c1
c2
C. For Interpolation Filter P = 2, N = 16 and L = 4
c2 c2
c2
(14)
c3
Similarly, substituting (13) in (12) and expressed in split form as:
yk00
10 yk 20 y k yk30 01 yk 11 yk 21 y k yk31
0 vk,0
1 vk,0 = v2 k,0 3 vk,0 0 vk,1 1 vk,1 = v2 k,1 3 vk,1
0 vk−1,0
c0
c2
1 vk−1,0 ⊗ 2 vk−1,0 3 vk−1,0 0 vk−1,1 1 vk−1,1 ⊗ 2 vk−1,1 3 vk−1,1
c0 c0
c2 (15a) c2
c0
c2
c1
c3
c1 c1
c3 (15b) c3
c1
c3
Using the proposed block formulation, partial results of a set of interpolation filters with selected up-sampling factors can be reused. A set of interpolation filters must have a minimum filter length to take advantage of the proposed scheme. However, the minimum filter length is not a binding factor and it could be relaxed by zero padding. Table I lists different sets of interpolation filters according to their upsampling factors and the minimum required filter length for each set. The input block size could be equal to the upsampling factor of a given set. TABLE I D IFFERENT S ETS OF U P - SAMPLING FACTORS FOR R EUSING PARTIAL R ESULTS Set of up-sampling factors
Minimum filter length
{2,4}
8
{2,4,8}
16
{2,4,8,16}
32
{3,6}
12
{3,6,12}
24
{5,10}
32
{5,10,20}
40
{7,14,28}
56
{9,18,36}
72
{11,22,44}
88
Modified block formulation for a set of interpolation filters
yk0
0 vk,0
0 vk−2,0
c0
c4
1 1 1 c0 c4 yk vk,0 vk−2,0 ⊗ 2 = y v2 2 c0 c4 k k,0 vk−2,0 3 3 yk3 c0 c4 vk,0 vk−2,0 0 0 vk−1,0 vk−3,0 c2 c6 1 1 c2 c6 vk−1,0 vk−3,0 ⊗ + c c v2 2 v 2 6 k−1,0 k−3,0 3 3 c2 c6 vk−1,0 vk−3,0 0 0 vk,1 vk−2,1 c1 c5 1 1 vk,1 vk−2,1 c1 c5 ⊗ + v2 c c 2 v 1 5 k,1 k−2,1 3 3 c1 c5 vk,1 vk−2,1 0 0 vk−1,1 vk−3,1 c3 c7 1 1 vk−1,1 vk−3,1 c3 c7 ⊗ + v2 2 k−1,1 vk−3,1 c3 c7 3 vk−1,1
3 vk−3,1
c3
(16)
c7
D. For Interpolation Filter P = 4, N = 16 and L = 4
0 yk00 vk,0 10 1 yk vk,0 20 = y v2 k k,0 3 yk30 vk,0 0 0 vk−1,0 vk−3,0 1 1 vk−1,0 vk−3,0 + v2 2 k−1,0 vk−3,0
0 vk−2,0
1 c0 vk−2,0 ⊗ 2 vk−2,0 c0 3 c0 vk−2,0 c2 c6 c2 c6 ⊗ c c 2 6
c4 c4
0 vk−2,1 1 vk−2,1 2 vk−2,1 3 vk−2,1
c5
3 3 vk−1,0 vk−3,0 0 yk01 vk,1 11 1 yk vk,1 21 = y v2 k k,1 3 yk31 vk,1 0 0 vk−1,1 vk−3,1 1 1 vk−1,1 vk−3,1 + v2 2 k−1,1 vk−3,1 3 3 vk−1,1 vk−3,1
⊗
c2
c6
c0
c1
c1 ⊗ c 1 c1 c3 c7 c3 c7 c3 c7 c3
c4
c4
(17a)
c5 c5 c5
(17b)
c7
where, yklj = {y(16k − 8l − 2j), y(16k − 8l − 2j − 1)} for 0 ≤ l ≤ 3, and 0 ≤ j ≤ 1.
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
E. For Interpolation Filter P = 8, N = 16 and L = 4 yk000
100 yk 200 y k yk300 010 yk 110 yk 210 y k yk310 001 yk 101 yk 201 y k yk301 011 yk 111 yk 211 y k yk311
0 vk,0
0 vk−2,0
c0
c4
1 1 c0 c4 vk,0 vk−2,0 ⊗ = c c v2 2 v 0 4 k,0 k−2,0 3 3 c0 c4 vk,0 vk−2,0 0 0 vk−1,0 vk−3,0 c2 c6 1 1 vk−1,0 vk−3,0 c2 c6 ⊗ = c c v2 2 v 2 6 k−1,0 k−3,0 3 3 c2 c6 vk−1,0 vk−3,0 0 0 vk,1 vk−2,1 c1 c5 1 1 vk,1 vk−2,1 c1 c5 ⊗ = v2 2 c1 c5 k,1 vk−2,1 3 vk,1
=
0 vk−1,1 1 vk−1,1 2 vk−1,1 3 vk−1,1
3 vk−2,1
0 vk−3,1 1 vk−3,1 2 vk−3,1 3 vk−3,1
c1
c3 c3 ⊗ c 3 c3
(18a)
Input-block
Input-vector generation unit
(18b)
Arithmetic unit
Outputblocks of IF8
(18c)
c5
c7
architecture. The MUX-based CSU is used for J ≤ 4 to avoid longer critical path delay due to MUX, and the ROM-based CSU is preferred for J > 4. The required coefficient-vector of a particular interpolation filter is selected in one cycle from the CSU.
Coefficient selection unit
6
c7 (18d) c7 c7
where, yklij = {y(32k−8l−4i−2j), y(32k−8l−4i−2j −1)} for 0 ≤ l ≤ 3, 0 ≤ i ≤ 1 and 0 ≤ j ≤ 1. Block formulations for other sets of up-sampling factors of Table I can also be derived in the similar form as (16), (17) and (18). Using the modified block formulation, we have derived a parallel reconfigurable architecture for interpolation filter. The proposed architecture has many interesting features. These are summarized here. 1) It computes filter output of multiple up-sampling factors in parallel without using any additional resource or computation time. 2) It does not require reconfiguration to compute filter outputs of the interpolation filter for different up-sampling factors. A different coefficient-vector is selected from the coefficient storage unit to configure the proposed architecture for a different filter specification. 3) It does not involve any overhead complexity for its reconfigurable feature unlike the existing structures. 4) It has the same register complexity as the conventional parallel interpolation filter structure and its register complexity is independent of the block-size. IV. P ROPOSED R ECONFIGURABLE A RCHITECTURE Computation of (16), (17) and (18) are mapped to a parallel architecture for realization of an interpolation filter of upsampling factors p1 = 2, p2 = 4 and p3 = 8 (IF2 , IF4 and IF8 ). The proposed reconfigurable architecture is shown in Fig.3. It has three main units, (i) coefficient selection unit (CSU), (ii) input-vector generation unit (VGU), and (iii) arithmetic-unit. The CSU is comprised of N number of J:1 MUXes or N number of ROM LUTs of depth J words each, where N is the filter length and J is the number of interpolation filters of different coefficient vector to be realized in the reconfigurable
Outputblocks of IF4
Outputblocks of IF2
Fig. 3. Proposed reconfigurable architecture for a set of interpolation filters of up-sampling factors p1 = 2, p2 = 4, p3 = 8, filter length N = 16 and input-block size L = 4.
The VGU receives one input-block in each cycle and generates (N/p1 ) input-vectors of size (Lp1 ) each in parallel, where p1 is the smallest up-sampling factor from a set of q different up-sampling factors to be realized in the reconfigurable architecture. Internal structure of the VGU is shown in Fig.4 for p1 = 2, N = 16 and L = 4. It is comprised of (N − 1) registers. The VGU receives a block of input in every cycle and produces 8 data-vectors {Sk,0 , Sk,1 , Sk−1,0 h , Sk−1,1 , Sk−2,0 , Sk−2,1 , Sk−3,0 i , Sk−3,1 }, 0 1 2 3 l where Sk−i,j = vk−i,j , vk−i,j , vk−i,j , vk−i,j , and vk−j,i = l sk,2j+i = {x(4k − l − 4j − 2i), x(4k − l − 4j − 2i − 1)}. Inputsamples {x(4k), x(4k−1), x(4k−2), x(4k−3)} and 15 recent past samples {x(4k−4), x(4k−5), · · · x(4k−17), x(4k−18)} are distributed in overlap manner to obtain all the required data-vectors. xk x(4k)
R1
R5
R9
R13
x(4k-1)
R2
R6
R10
R14
x(4k-2)
R3
R7
R11
R15
x(4k-3)
R4
R8
R12
Sk,0
Sk,1
Sk-1,0
Sk-1,1
Sk-2,0
Sk-2,1
Sk-3,0
Sk-3,1
Fig. 4. Internal structure of the vector generation unit (VGU) for a set of up-sampling factors {p1 = 2, p2 = 4, p3 = 8}, filter length N = 16, and block-size L = 4.
The structure of AU is shown in Fig.5 for a set of interpolation filters IF2 , IF4 and IF8 with up-sampling factor p1 = 2, p2 = 4, p3 = 8, block-size L = 4 and filter length N = 16. It is comprised of [(N/p1 ) = 8] multiplier units (MUs) and [((N/p1 ) − 1) = 7] adder-units (ADU). Each MU receives an (Lp1 )-point input-vector Sk−j,i from the VGU and a short p1 -point coefficient-vector cm from the CSU, and calculates one partial filter output-vector zk,m
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
of size (N/p1 ), for 0 ≤ j ≤ L − 1, 0 ≤ i ≤ p1 − 1, and 0 ≤ m ≤ Lp1 − 1. Internal structure of the MU and ADU are shown in Fig.6. As shown in Fig.5, partial outputvectors (zk,0 , zk,4 ), (zk,1 , zk,5 ), (zk,2 , zk,6 ) and (zk,0 , zk,4 ) are added in four separate ADUs (ADU1 , ADU2 , ADU3 , ADU4 ) to compute filter output-blocks (Yk00 , Yk01 , Yk10 , Yk11 ) of IF8 , where Ykij = [yk0ij , yk1ij , yk2ij , yk3ij ], and yklij = {y(32k − 8l − 4i − 2j), y(32k − 8l − 4i − 2j − 1)}, for 0 ≤ l ≤ 3, i = 0, 1 and j = 0, 1. Interestingly, the outputvectors (Yk00 , Yk01 , Yk10 , Yk11 ) also represent the partial filter output of IF4 , and these partial output-vectors are added in ADU5 and ADU6 to obtain the complete filter output vectors (Yk0 , Yk1 ) of IF4 , where Ykj = [yk0j , yk1j , yk2j , yk3j ], and yklj = {y(16k − 8m − 2j), y(32k − 8l − 2j − 1)}, for 0 ≤ l ≤ 3 and j = 0, 1. Similarly, the output-vectors (Yk0 , Yk1 ) also represent the partial filter outputs of IF2 , and they are added in ADU7 to obtain the complete filter output vector (Yk ) of IF2 , where Yk = [yk0 , yk1 , yk2 , yk3 ], and ykl = {y(8k − 2l), y(8k − 2l − 1)}, for 0 ≤ l ≤ 3. From coefficient selection unit c0 Sk,0
2
c1 Sk,1
8
2
c2 Sk-1,0
8
MU-1 8 zk,0
8
MU-2
c3 Sk-1,1
zk,2
c4 Sk-2,0
2
c5 Sk-2,1
8
MU-4
8
zk,1
2
8
MU-3
8
2
8
MU-5
8
8
c6 Sk-3,0
2
8
MU-6 zk,5
8 zk,6
ADU1
c7 Sk-3,1
2
8
MU-7
8
zk,4
zk,3
2
MU-8 8 zk,7
ADU3
ADU2
ADU4
ADU5 ADU6 8 00 Yk
8 01 Yk
8 0 Yk
ADU7 8 Yk
8
8 1 Yk
Yk10
8 11 Yk
Fig. 5. Structure of the arithmetic unit (AU) for a set of up-sampling factors {p1 = 2, p2 = 4, p3 = 8}, filter length N = 16, and block-size L = 4. c0 h(0)
h(1)
vk0,0(0) vk0,0 (1)
Sk,0
Sk,0 c0
vk1,0 (0) vk1,0 (1) vk2,0 (0) vk2,0 (1) vk3,0 (0) vk3,0 (1)
a1 a2 a3 a4 a5 a6 a7 a8
b1 b2 b3 b4 b5 b6 b7 b8 z1 z2 z3 z4 z5 z6 z7 z8
(a)
(b)
Fig. 6. (a) Internal structure of first multiplier unit (MU). (b) Internal structure of adder-unit (ADU)
The proposed architecture reuse the partial results for parallel computation of filter outputs of different up-sampling factors. It does not require reconfiguration to compute filter outputs of a particular interpolation filter for different upsampling factors, and configured only when there is a need to change the filter specification. In that case, a coefficient-vector of the desired filter is selected from the CSU and fed to the AU to perform the filter computation. The VGU and AU form the core of the proposed structure and they do not require any reconfiguration to change the filter computation. Therefore, the proposed architecture offers reconfigurabilty without using any overhead complexity unlike the existing reconfigurable
7
architectures. As shown in Fig.5, the filter outputs of IF8 (up-sampling factor 8) are added separately to obtain the filter outputs of IF4 (up-sampling factor 4). Similarly, the outputs of of IF4 are further added to obtain the filter outputs of IF2 . Therefore, it is always an advantage to realize the proposed architecture for the lowest up-sampling factor (IF2 ), and filter outputs of higher up-sampling factors of a given set of up-sampling factors can be obtained in parallel without performing any extra computation. This is an unique feature of the proposed structure. When the structure is implemented to compute filter output of any one up-sampling factor of a given set of up-sampling factors, then it also provides filter outputs of all other up-sampling factors. The structure produces filter outputs at multiple sampling frequency of an input sampling frequency. One has to select the filter output of desired sampling rate from the output port. Besides, the complexity of the VGU is independent of input block-size, while the complexity of AU increases proportionately with the blocksize. Overall, the complexity of the proposed architecture is independent of up-sampling factor and it does not increase proportionately with the blocks-size. Therefore the area-delay efficiency of the proposed architecture is expected to be better for higher block-sizes. V. H ARDWARE -T IME C OMPLEXITIES The proposed architecture consists of one CSU, one VGU and one AU. The CSU is comprised of either N number of J : 1 MUXes or the same number of ROM LUTs of depth J words each. The CSU complexity of all the reconfigurable FIR structures is the same for a given value of J. Therefore, we have excluded CSU of all the competing designs for performance comparison. The VGU is comprised of (N − 1) registers, where the AU is comprised of (N/p1 ) MUs and [(N/p1 ) − 1] ADUs. Each MU is further comprised of (Lp1 ) multipliers where each ADU is comprised of (Lp1 ) adders. The AU involves N L multipliers and L(N − p1 ) adders. Therefore, the core of the proposed reconfigurable interpolation filter architecture involves N L multipliers, L(N − p1 ) adders and (N − 1) registers. The proposed architecture is configured for a specific pulse shaping filter and compute filter outputs corresponding to all the up-sampling factors of a given set irrespective of the choice. In every cycle, it compute Lp1 , Lp2 , Lp3 parallel filter outputs corresponding to the up-sampling factors p1 , p2 , p3 , respectively. Therefore, it computes L(p1 + p2 + p3 ) filter outputs in every cycle, where a cycle period T = TM + TA + TF A (log2 (N/p1 ) − 1), where TM , TA , and TF A are respectively, one multiplier delay, one adder delay, and one full-adder delay. It is interesting to note that, the hardware complexity of the proposed structure is independent of up-sampling factor and compute filter outputs of all up-sampling factors. A. Performance Comparison A few reconfigurable interpolation filter architectures are proposed in the literature. Recently, a multiplier-less reconfigurable architecture is proposed in [20]. We have extracted an equivalent multiplier-based design from the multiplier-less
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
8
TABLE II G ENERAL C OMPARISON OF H ARDWARE AND T IME C OMPLEXITIES OF THE P ROPOSED S TRUCTURE AND THE S TRUCTURE OF [20] (p1 , p2 , p3 : U P - SAMPLING FACTORS OF RECONFIGURABLE INTERPOLATION FILTER , P : UP - SAMPLING FACTOR OF FIXED INTERPOLATION FILTER , N : FILTER LENGTH , L: INPUT BLOCK SIZE , w: INPUT BIT- WIDTH ) Structures
Multiplier
Adder
ROM (word)
Register
MUX/DMUX
AND-gate (word)
CP
Throughput
Son et al [18]
0
N/P
2(N/2P )−1
N +1
P
0
T1
2/(wT1 )
0
N/p1
0
N (α/p3 + 1/p1 )
N (α − 4)/p3
N/p1
N/p1
(N/p1 ) − 1
0
N α/p3
N (α − 4)/p3
0
Hatai et al [20] (multiplier-less) Hatai et al [20] (multiplier)
T2 T3
Proposed (p3 )
1/T2 Lp1 /T3
Proposed (p1 ) Proposed (p2 )
1/(wT1 )
NL
L(N − p1 )
0
N −1
0
0
T4
Lp2 /T3 Lp3 /T3
α = 2p1 + p2 + p3 , T1 = TA + 3TM X + TM R , T2 = 2TA + TM X + TAN D , T3 = TM + TA + TF A (log2 (N/p1 ) − 1) + TM X , T4 = TM + TA + TF A (log2 (N/p1 ) − 1). TM : multiplier delay, TA : adder delay, TM R : memory-read time, TM X : 2-1 multiplexer delay, TF A : full-adder delay and TAN D : 2-input AND-gate delay.
design of [20] for uniform comparison of area and delay complexities. Hardware and time complexities of DA-based fixed interpolation filter structure of [18], and the extracted multiplier-based design and the existing multiplier-less design of [20] along with those of the proposed structure are listed in Table II for comparison. As shown in Table II, the proposed architecture involves nearly (α/p3 ) times less registers, (L/p1 ) times more multipliers and adders than those of the multiplierbased design of [20], and offers more than L times higher throughput, where α = 2p1 + p2 + p3 for (p1 < p2 < p3 ), and p1 , p2 and p3 are the up-sampling factors. Compared with the multiplier-less design of [20], the proposed design involves N L multipliers instead of (N/p1 ) AND-gates (wordlevel), nearly (L/p1 ) times more adders and (α/p3 ) times less registers than those of the multiplier-based design of [20], and it offers more than wL times higher throughput, where w is the input bit-width. Compared with [18], the proposed structure involves nearly N L multipliers and adders instead of (2(N/2P )−1 ) ROM words and offers (wL/2) times more throughput rate. However, the proposed structure can be reconfigured for three different up-sampling factors where the structure of [18] is for up-sampling factor 4 only. We can find from Table II that the existing reconfigurable interpolation filter architectures have constant throughput rate (output sampling rate) where the throughput rate of the proposed design increases proportionately with the up-sampling factor. This is mainly due to the fact that the existing design [20] calculates only one sub-filter output in each cycles. A poly-phase structure of an interpolation filter (with upsampling factor P ) has P number of sub-filters. Therefore, the multiplier-based design of [20] takes P cycles to compute a set of P sub-filter outputs corresponding to each input sample, where the multiplier-less design takes P w cycles to compute the same set of P sub-filter outputs. Therefore, the maximum input sampling rate (MISR) and output sampling rate (OSR) of multiplier-less and multiplier-based structures of [20] are {(1/wP T ), (1/wT )}, and {(1/P T ), (1/T )}, respectively, where T is the cycle period. However, the proposed design computes LP sub-filter outputs corresponding to a block of L input samples in each cycle. The MISR and OSR of the proposed design is (L/T ) and (LP/T ), respectively. In
other word, the input sampling rate of the proposed design is independent of P , while it decreases proportionately with P in case of existing structure [20]. The MISR of the proposed design can be increased flexibly by changing the input-block size where in case of [20], it is solely depend on the cycle period. The proposed design has higher MISR than the existing design [20]. It does not involves multiplexers unlike the existing design to reconfigure the structure for different upsampling factors. The register complexity of the proposed structure is independent of input blocks-size and up-sampling factor where it increases proportionately with α in existing design. The register and MUX saving offers a significant reduction of area and power consumption in the proposed design. We have made a theoretical estimate of hardware and time complexities of the proposed design and the similar reconfigurable architectures of [20] for filter length N = 32, up-sampling factors (p1 = 2, p2 = 4, and p3 = 8), and input bit-width w = 12. The estimated values are listed in Table III for comparison. As shown in Table III, the proposed structure for block-size 4 involves 8 times more multipliers and adders, nearly 2 times less registers than those of the multiplierbased design of [20] and it supports 8 times, 16 times and 32 times higher input sampling rate than the design of [20] for p1 = 2, p2 = 4, and p3 = 8, respectively. Compared with the multiplier-less design of [20], the proposed one involves 128 multipliers against 16 AND-gates, 7 times more adders, nearly 2.6 times less registers, and it has 96 times, 192 times and 384 times higher input sampling rate than the multiplier-less design of [20] for p1 = 2, p2 = 4, and p3 = 8, respectively. B. Synthesis Results To validate the proposed design, we have coded it in VHDL for filter length N = 16 and 32 with block-size 4. We have also coded both the multiplier and multiplier-less designs of [20] for the same filter lengths. We have considered input signal width w = 12 and used a post-truncated fixed-width Booth multiplier. All the designs are synthesized by Synopsys Design Compiler using TMSC 65nm CMOS library. The results obtained from synthesis reports generated by Design Compiler are listed in Table IV.
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
9
TABLE III C OMPARISON OF H ARDWARE AND T IME C OMPLEXITIES OF P ROPOSED S TRUCTURE AND THE S TRUCTURE OF [20] FOR INPUT BIT- WIDTH w = 12 Structures
Filter
up-sampling
length (N )
(P )
MULT
ADD
REG
MUX
AND
2
Hatai et al [20] 32
(multiplier-less)
CP
MISR
MOSR
(fi )
(fo )
T1 = 2TA + 3TF A
1/(24T1 )
1/(12T1 )
+TM X + TAN D
1/(48T1 )
1/(12T1 )
1/(96T1 )
1/(12T1 )
T2 = TM + TA
1/(2T2 )
1/T2
+3TF A + TM X
1/(4T2 )
1/T2
1/8T2
1/T2
gate 0
4
16
80
48
16
8 2
Hatai et al [20] 32
(multiplier)
16
4
15
64
48
0
8 Proposed
2
(L = 4)
4
32
128
8
112
31
0
0
4/T3
8/T3
T3 = TM + TA
4/T3
16/T3
+3TF A
4/T3
32/T3
LEGEND: MULT: multiplier, ADD: adder, REG: register, CP: cycle period, MISR: maximum input sampling frequency, MOSR: maximum output sampling frequency. TABLE IV S YNTHESIS R ESULT OF P ROPOSED S TRUCTURE AND S TRUCTURES OF [20] USING TMSC 65 NM CMOS S TANDARD C ELL L IBRARY Filter
MCP
Area
Power
(length N )
(ns)
(um2 )
(mW)
Hatai [20]
16
0.76
8810.28
2.8519
(multiplier-less)
32
0.85
18175.68
4.8844
Hatai [20]
16
1.20
20893.68
7.9023
(multiplier)
32
1.40
39803.4
14.3661
Proposed
16
1.12
127936.8
89.6727
(L = 4)
32
1.38
230512.68
121.3106
TABLE V M AXIMUM I NPUT- SAMPLING RATES OF P ROPOSED S TRUCTURE AND S TRUCTURES OF [20] Structures
Filter
Max. input-sampling frq. (MHz)
(length N )
p1 = 2
p2 = 4
p3 = 8
Hatai [20]
16
54.82
27.41
13.75
(multiplier-less)
32
49.01
24.50
12.25
Hatai [20]
16
416.67
208.38
104.19
(multiplier)
32
357.14
178.57
89.28
Proposed
16
3571.43
3571.43
3571.43
(L = 4)
32
2898.55
2898.55
Hatai (Mult) [20]
Hatai (Mult-less) [20]
Proposed 1483.1
1500.0 ADP in sq.um.us
Structures
V. As shown in Table V, the MISR of [20] is decreasing with the up-sampling factor, while in case of the proposed structure, it is independent of up-sampling factor and increases proportionately with the input block-size. The multiplier-less design of [20] performs filter computation bit-serially and it has the lowest MISR than others. From Table III and Table V, we can find that the proposed structure involves 5.95 times more area, and it has 8.3 times, 16.7 times and 33 times higher MISR than those of the multiplier-based design of [20] for up-sampling factor 2, 4 and 8, respectively, on average for different filter sizes. Compared with the multiplier-less design of [20], the proposed one involves 13.6 more area and offers 62 times, 124 times and 245 times more MISR for up-sampling factor 2, 3 and 8 on average for different filter sizes.
1000.0
741.6 445.8
370.8
500.0
222.9
111.5
79.5
79.5
79.5
0.0
2898.55
P=2
P=4
P=8
Up-sampling factor (P)
Fig. 7.
Comparison of area-delay-product (ADP). Hatai (Mult) [20] Eenery per output (EPO) in nJ
As shown Table IV, multiplier-less structure of [20] has the lowest minimum clock period (MCP) than the multiplier-based structures due to absence of multiplier in the critical path. The MCP of proposed structure and the multiplier-based structure of [20] are equal or marginally differ due to TM X . In-spite of 8 times higher multiplier complexity than the multiplierbased structure of [20] the proposed structure involves 5.95 times more area on average for different filter lengths. This is mainly due to the register saving. The propose structure dissipates nearly 31 times and 24 times more power than the multiplier-based structure of [20] for filter length 16 and 32, respectively. The higher power dissipation is due to extra arithmetic (multiplier and adders) complexity and higher input sampling frequency than the existing structure of of [20]. We have estimated the MISR using MCP and the formula given in Table III. The estimated values are listed in Table
Fig. 8.
49.8
Hatai (Mult-less) [20] 49.8
Proposed 49.8
50.0 40.0 30.0
20.9
20.1
20.0
20.1
20.1 10.5
5.2
10.0 0.0 P=2
P=4 Up-sampling factor (P) Comparison of energy per output (EPO).
P=8
We have estimated area-delay-product (ADP1 ) and energy 1 ADP=Area/Maximum
input-sampling frequency
SUBMITTED FOR IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, REGULAR PAPERS
per output (EPO2 ) of all the designs. The estimated ADP and EPO values are shown in Fig.7 and Fig.8 for N = 32. As shown in these figures, the proposed structure has significantly less ADP and EPO than the existing structure. It has 5.6 times less ADP and 3.9 times less EPO than those of multiplierbased design of [20] for up-sampling factor 8. Compared with the multiplier-less design of [20], the proposed design has 18.6 times less ADP and 9.5 times less EPO. VI. C ONCLUSION In this paper, we made an analysis of interpolation filter computation for different up-sampling factors to identify redundant computations and removed those by reusing partial results. Reuse of partial results eliminates the necessity of matrix resizing in interpolation filter computation. A novel block-formulation is presented to share the partial results for parallel computation of filter outputs of different upsampling factors. Using the proposed block formulation, a parallel reconfigurable architecture is derived for interpolation filter. The most remarkable aspect of the proposed architecture is that, it does not require reconfiguration to compute filter outputs of an interpolation filter for different up-sampling factors. The proposed structure has regular data-flow and it has no overhead complexity for its reconfigurable feature unlike the existing structures. Besides, the proposed structure has significantly less register complexity than the existing structure and its register complexity is independent of the block-size. Moreover, the proposed structure can support higher inputsampling frequency than the existing structure. ASIC synthesis result shows that the proposed structure for block-size 4, filter length 32, and up-sampling factor 8, involves 13.6 times more area and offers 245 times higher maximum input-sampling frequency compared with the existing multiplier-less structure [20]. It involves 18.6 times less ADP and 9.5 times less EPO than the existing structure of [20]. Due to less ADP, EPO and high input sampling frequency, the proposed reconfigurable architecture is a good candidate for efficient realization of digitally up-converters for multi-standard SDR applications.
10
[8] S. S. Demirsoy, I. Kale, and A. G. Dempster, ”Efficient implementation of digital filters using novel reconfigurable multiplier blocks,” in Proc. 38th Asilomar Conf. Signals Syst. Comput., vol. 1. Nov. 2004, pp. 461– 464. [9] K. Muhammad and K. Roy, ”Reduced computational redundancy implementation of DSP algorithms using computation sharing vector scaling,” IEEE Trans. Very Large Scale Integr. Syst., vol. 10, no. 3, pp. 292–300, Jun. 2002. [10] J. Park, W. Jeong, H. Mahmoodi-Meimand, Y. Wang, H. Choo, and K. Roy, ”Computation sharing programmable FIR filter for low-power and high-performance applications,” IEEE J. Solid State Circuits, vol. 39, no. 2, pp. 348–357, Feb. 2004. [11] R. Mahesh and A. P. Vinod, ”New reconfigurable architectures for implementing FIR filters with low complexity,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 2, pp. 275–288, Feb. 2010. [12] W. A. Abu-Al-Saud and G. L. Stuber, ”Efficient wideband channelizer for software radio systems using modulated PR filter banks,” IEEE Trans. Signal Process., vol. 52, no. 10, pp. 2807–2820, Oct. 2004. [13] R. Mahesh and A. P. Vinod, ”Reconfigurable frequency response masking filters for software radio channelization,” IEEE Trans. Circuits Syst. II, vol. 55, no. 3, pp. 274–278, Mar. 2008. [14] R. Mahesh and A. P. Vinod, ”Low complexity flexible filter bank for uniform and non-uniform channelisation in software radios using coefficent decimation,” IET Circuits Devices Syst.,, vol. 5, no. 3, pp. 232–242, Feb. 2011. [15] R. Mahesh and A. P. Vinod, E. M.-K. Lai and A. Omondi, ”Filter bank channelizers for multi-standard software defined radio receivers,” J. Sign. Process. Syst., Spinger, vol. 62,pp. 157–171, 2011. [16] R. Mahesh and A. P. Vinod, ”Reconfigurable low area complexity filter bank architecture based on frequency response masking for nonuniform Channelization in software defined radio,” IEEE Trans. Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1241–1254, April 2011. [17] J.-K. Choi and S.-Y Hwang, ”Area-efficient pulse-shaping 1:4 interpolated FIR filter based on LUT partitioning”, IEEE Electronics Letters, vol. 35, no. 18, pp. 1504-1505, Sept. 1999. [18] Y. Son, K. Ryoo and Y. Kim, ”1:4 interpolation FIR filter,” IEEE Electronics Letters, vol. 40, no. 25, pp. 1570-1572, Dec., 2004. [19] G. C. Cardarilli, A. D. Re, M. Re, L. Simone, Optimized QPSK Modulator for DVB-S Application, Proc. of IEEE International Symposium on Circuits and Systems, ISCAS-2006, pp. 1571-1574, 2006. [20] I. Hatai and I. Chakrabarti and S. Banerjee, ”Reconfigurable architecture of RRC FIR interpolator for multi-standard digital up converter,” In Proc. IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp. 247–251, 2013. [21] Y.-T. Kuo, T.-J. Lin, Y.-T. Li, and C. W. Liu, ”Design and implementation of low-power ANSI S1.11 filter bank for digital hearing ,” IEEE Trans. Circuits Syst. I, Regular Papers, vol. 57, no. 7, pp. 1684–1696, April 2010. [22] B. K. Mohanty, P. K. Meher, S. A-Madeed A. Amira ”Memory footprint reduction for power-efficient realization of 2-D finite impulse response filters”, IEEE Transactions on Circuits and Systems-I, Regular Papers, vol. 61, no. 1, pp.120-133, Jan. 2014.
R EFERENCES [1] P. P. Vaidyanathan, ”Multirate systems and filter banks”, Prentice Hall Signal Processing Series, 1993. [2] N. A. Sheikholeslami, and P. Kabal, ”Generalized raised cosine filters,” IEEE Trans. on Communications, vol. 47, no. 7, pp. 989–997, 1999. [3] E. Buracchini, ”The software radio concept,” IEEE Commun. Mag.,, vol. 38, pp. 138–143, Sept. 2000. [4] T. Solla, R. Makela, M. Liljeroos, and O. Vainio, ”Application-specific filter processor for flexible receivers,” in Proc. 19th NORCHIP Conf., Kista, Sweden, pp. 53–58, Nov. 2001. [5] T. Zhangwen, J. Zhang, and H. Min, ”A high-speed, programmable, CSD coefficient FIR filter,” IEEE Trans. Consumer Electron., vol. 48, no. 4, pp. 834–837, Nov. 2002. [6] K. H. Chen and T. D. Chiueh, ”A low-power digit-based reconfigurable FIR filter,” IEEE Trans. Circuits Syst. II, vol. 53, no. 8, pp. 617–621, Aug. 2006. [7] X. Chenghuan, C. He, Z. Shunan, and W. Hua, ”Design and implementation of a high-speed programmable polyphase FIR filter,” in Proc. 5th Int. Conf. Applicat.-Specific Integr. Circuit, vol. 2, pp. 783–787, Oct. 2003. 2 EPO=Power/(Maximum
input sampling frequency× Throughput
Basant K Mohanty (M’06, SM’11) received M.Sc degree in Physics from Sambalpur University, India, in 1989 and received Ph.D degree in the field of VLSI for Digital Signal Processing from Berhampur University, Orissa in 2000. In 2001 he joined as Lecturer in EEE Department, BITS Pilani, Rajasthan. Then he joined as an Assistant Professor in the Department of ECE, Mody Institute of Education Research (Deemed University), Rajasthan. In 2003 he joined Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, where he becomes Associate Professor in 2005 and full Professor in 2007. His research interest includes design and implementation of low-power and high-performance systems for adaptive filters, image and video-processing applications, secured communication and reconfigurable architectures. He has published nearly 50 technical papers. Dr.Mohanty is serving as Associate Editor for the Journal of Circuits, Systems, and Signal Processing. He is a life time member of the Institution of Electronics and Telecommunication Engineering, New Delhi, India, and he was the recipient of the Rashtriya Gaurav Award conferred by India International friendship Society, New Delhi for the year 2012.