This article has been accepted. The official version will be published in April, 2018 issue of IEICE Trans. on Fundamental. IEICE TRANS. ??, VOL.Exx–??, NO.xx XXXX 200x
1
PAPER
Approximate-DCT-Derived Measurement Matrices with Row-Operation-Based Measurement Compression and its VLSI Architecture for Compressed Sensing Jianbin ZHOU†a) , Nonmember, Dajiang ZHOU, Takeshi YOSHIMURA† , Members, and Satoshi GOTO† , Fellow
SUMMARY Compressed Sensing based CMOS image sensor (CS-CIS) is a new generation of CMOS image sensor that significantly reduces the power consumption. For CS-CIS, the image quality and data volume of output are two important issues to concern. In this paper, we first proposed an algorithm to generate a series of deterministic and ternary matrices, which improves the image quality, reduces the data volume and are compatible with CS-CIS. Proposed matrices are derived from the approximate DCT and trimmed in 2D-zigzag order, thus preserving the energy compaction property as DCT does. Moreover, we proposed matrix row operations adaptive to the proposed matrix to further compress data (measurements) without any image quality loss. At last, a low-cost VLSI architecture of measurements compression with proposed matrix row operations is implemented. Experiment results show our proposed matrix significantly improve the coding efficiency by BD-PSNR increase of 4.2 dB, comparing with the random binary matrix used in the-state-of-art CS-CIS. The proposed matrix row operations for measurement compression further increases the coding efficiency by 0.24 dB BD-PSNR (4.8% BD-rate reduction). The VLSI architecture is only 4.3 K gates in area and 0.3 mW in power consumption. key words: Compressed sensing, Approximate DCT, Measurement coding, Measurement matrix, Intra prediction, VLSI architecture, CMOS Image Sensor
1.
Introduction
CMOS image sensor (CIS) has attracted a huge amount of researches for the last decades. As most of the CIS applied in the mobile systems, the power consumption becomes a main concern. CIS first converts the analog luminance signal acquired into a digital one pixel by pixel, then compresses the image to reduce the data amount for the storage or for the further transmission, which is a capture (pixel)→compress (pixel) process. With the increase of resolution and frame rate in the recent years, however, the low-power design becomes a challenge. Since it is found that the Analog-toDigital (A/D) conversions followed by the output readout is the main power consumption in CMOS image sensor, increases linearly as least in resolution and frame rate. [1] With the advent of a recently proposed sampling theory, Compressed Sensing (CS) [2], an image could be acquired by capturing a significantly reduced number of measurements (the linear combination of pixels), instead of by capturing pixel by pixel. Such image sensors are called CS-based CIS (CS-CIS). The luminance signals are linearly Manuscript received January 1, 2017. Manuscript revised January 1, 2017. † The author is with the Graduate School of Information Production and Systems, Waseda University, Kitakyushu,808-0135 Japan a) E-mail:
[email protected] DOI: 10.1587/trans.E0.??.1
combined into a measurement in analog domain, followed by the A/D conversion. Thus, the throughput of A/D conversion could be reduced, which results into a significant reduction in power consumption, as shown in the recent CSCIS [1, 3, 4]. The output of CS-CIS – measurements are further compressed before the transmission. This is a capture (measurements)→compress (measurements) process. In this measurement-based process, there’re two issues to concern: how to increase the image quality and how to reduce the size of measurements. For the image quality, the measurement matrix plays a major role. It decides how pixels get combined into measurements. The binary/ternary measurement matrix Φ is frequently used due to its simplicity in controlling the linear combination, which is achieved by the sum of current in in analog domain, as shown in Fig. 1. The binary/ternary matrix controls the switches to tell whether a pixel to be added or subtracted so that measurements are calculated by analog addition and differential integration [1, 3, 4]. However, the image quality of the binary/ternary measurement matrix being used is not satisfied, comparing with the Gaussian matrix. But the Gaussian matrix is not suitable for real implementation, because floating point elements in the matrix makes linear combinations hard to implement, requiring complex and energy-consuming analog multiplier. Moreover, several binary/ternary measurement matrices proposed in [5] could outperform the Gaussian matrix a little bit, however, they can only be applied to sparse signals instead of directly to natural images, which means extra transform in CIS is required. Thus, a binary/ternary matrix could achieve high image quality is wanted. To reduce the size of measurements, several previous works exploited the spatial redundancy to compress measurements. In [6], measurements in previous blocks were directly subtracted and used for prediction. Nevertheless, it only partly utilizes the horizontal correlation. In [7, 8], the intra prediction occurs by the measurement-wise subtraction from the neighboring measurements, similar to pixel-wise subtraction inter prediction. In these works, however, the measurements for prediction contain irrelevant information, such as the nonadjacent pixels, so that the prediction precision decreases. In [9], a local structural measurement matrix providing more precise prediction is proposed for the measurement-domain prediction by extracting the local features within a block. However, it has high compu-
Copyright © 200x The Institute of Electronics, Information and Communication Engineers
IEICE TRANS. ??, VOL.Exx–??, NO.xx XXXX 200x
2
...
...
... ...
𝜙𝑖
+
-
sum
4T-APS 1 pixel
+ -
sum
Diffrential Integrator ADCs Y = 𝑦1 … 𝑦𝑖
𝑇
Fig. 1 A brief architecture of a processing block in CS-CIS [3] (Component A in Fig. 2). Outputs are digital measurements Y.
tational complexity for a brute-force search among all the local predictor candidates, and it requires a floating point measurement matrix that cannot be applied to the image sensor. Another issue in [7–9] is that they require all the measurements of a block for prediction. It requires a large memory bandwidth to fetch / load the data, as well as a large memory storage used for line buffer to store all the measurements of neighboring blocks, which would be a problem for a power-limited and a storage-limited wireless camera system. In [10, 11], an intra prediction framework for measurement compression is proposed. It requires few memory storage and bandwidth, however, it needs to modify two rows of a random matrix, which might be not suitable to all matrices. Overall, these works could improve the coding efficiency, but the image quality still has spaces to improve. In this paper, we therefore propose ternary measurement matrices to improve image quality and measurements compression algorithm as well as VLSI architecture to reduce the size of measurements. Main contributions of the paper are outlined as follows. 1) We proposed an algorithm to generate a series of deterministic and ternary measurement matrices, compatible to the current CS-CIS architecture [1,3,4]. The proposed matrices are derived from approximated DCT and capable to preserve the energy compact property as DCT. Comparing with random binary/ternary matrix, the proposed matrices achieve a significant improvement in image recovery quality and certain degree of bit rate saving.
This paper is the extension (contribution 2 and 3) of our previous work [12]. Our previous work [12]’s contribution is the proposed measurement matrix. Based on this matrix, we proposed measurement-based intra prediction able to perform on matrix in [12], which is contribution 2. The low cost and low power VLSI architecture that implements the measurement-based intra prediction, which is contribution 3. Both contribution 2 and 3 are totally new to [12]. Experiment results show that our proposed matrix contributes a significant improvement in coding efficiency, by BD-PSNR increase of 4.2 dB on average comparing with the random binary matrix used in the-state-of-art CS-CIS. The proposed prediction could further improve coding efficiency by increasing BD-PSNR of 0.24 dB. The VLSI architecture of our proposed prediction is 4.3 K gates in area with 1 KB dualport SRAM memory and 0.3 mW in power consumption. It could support the real-time encoding of 2160p@240fps. The rest of this paper is organized as follows. Section 2 introduce the approximated DCT and Compressed Sensing. Section 3 presents how we derived the proposed measurement matrix from the approximated DCT. Section 4 presents how our proposed matrix row operation generates the intra prediction pattern to compress measurements. The VLSI architecture of the proposed method is also presented. Section 5 shows experiment results and analyze the performance of the proposed algorithm and architecture, followed by the conclusion in Section 6. 2. 2.1
Approximate DCT and Compressed Sensing (CS) Approximate DCT:
DCT is a tool widely used in image compression due to its strong energy compaction property. However, it requires fast algorithm to reduce the computational complexity. Approximate DCT is one of the fast algorithm that offers a close result to exact DCT with hardware-friendly implementation. The 2D-DCT of an image Ro = CRi CT is approximated ˆ i CˆT where Cˆ = SP is an approximate matrix by Rˆo = CR p ˆ S, P ∈ R N ×N ). S = (PPT )−1 is a diagonal (Ro, Ri, C, C, ˆ P is a coarse approximate DCT matrix to orthogonalize C. matrix with low-complexity that can even only consists of ternary numbers as 0/1/-1. Several works relate to the design of 4/8/16-point approximate DCT in [13–15]. A 4-point approximate matrix (N = 4) in [13] is shown as an example.
2) We propose matrix row operations adaptive to the proposed matrix above for measurements compression. It is able to generate the intra prediction pattern as our previous work [10,11], without constructing new rows, so that measurements could be further compressed without any image quality loss.
2.2
3) We implement hardware architecture of the proposed intra prediction for measurements compression presented above. The proposed matrices could simplified the architecture resulting into low hardware cost and low power consumption.
The CS theory [2] asserts that only a few measurements are enough to recover the signals, as long as the signals are sparse in some transform domain. Suppose the image signal X = [x1 ... xn ]T can also be represented in the transform domain Ψ, as
1 1 P = 1 0
1 0 −1 −1
1 0 −1 1
1 −1 1 0
(1)
Concept of CS:
ZHOU et al.: APPROXIMATE-DCT-DERIVED MEASUREMENT MATRICES WITH ROW-OPERATION-BASED MEASUREMENT COMPRESSION AND ITS VLSI ARCHITECTURE FOR C
X = ΨS
(2)
where S = [s1 ... sn ]T is the signal represented in Ψ transform domain and Ψ is an n × n transform matrix. The signal X is said to be k-sparse if it has only k non-zero coefficients. We would like to recover signals X = [x1 ... xn ]T from m n linear and non-adaptive measurements Y = [y1 ... ym ]T , which are taken from the random projection as Y = ΦX
(3)
where Φ is an m × n measurement matrix. We know that the system is under-determined since m < n. The CS theory asserts that the signal S 0 can be recovered with high probability using only m = cklog(n/k) measurements for some constant C, by solving the L1 -norm minimization problem (4) 0
min kS k 1
s.t Y = ΘS
0
(4)
where Θ = ΦΨ and the measurement matrix Φ must be incoherent with transform matrix Ψ to preserve the Restricted isometry property (RIP) [2]. The CS theory shows that Φ can even be a random 1/−1 or 0/1 matrix, while Ψ could be a discrete cosine transform (DCT), discrete wavelet transform (DWT), contourlet transform and so forth. The problem (4) can be solved by basis pursuit [2]. To a noise environment, (3) can be extended to Y 0 = ΘS 0 + Z, where Z represents the noise. It could be solved by basis pursuit denoising [2]. After the recovery of S 0, the signal X 0 can thus be calculated by (2). 2.3
Process of CS image sensing and image reconstruction:
Considering the infeasibility and scalability of the image sensor implementation and the complexity of image recovery, the pixel array are divided into blocks as Fig. 3 (a) to perform the sampling [1, 4]. For each block: 1) The analog signals of n pixels are acquired by the pixel array inside the sensor, X = [x1 ... xn ]T . The measurements are calculated through (3) in the analog domain and digitalized into Y = [y1 ... ym ]T . 2) Measurements are predicted, quantized, followed by the entropy coding and the transmission 3) The bitstream obtained from the channel are dequantized and then reconstructed into measurements. 4) The reconstructed signal X’ are recovered from the decoded measurements Y’ by solving (4) and (2). The process is shown in Fig. 2. 3. 3.1
Proposed measurement matrix The main idea
Because of the energy compact property of DCT, we propose a measurement matrix Φt that performs an approximated DCT to generate the measurements in (3). Measurements generated by the proposed matrix represent the frequency response of input signals, unlike the usual case that measurements are linear combination of input signals randomly
A
CS-CIS (Fig. 1) Pixel Sensing
𝑋 = 𝑥1 𝑥2 … 𝑥𝑛
Entropy decoding
𝑇
𝑌 = Φ𝑋 𝑌𝑟′ = 𝑌𝑄𝑟 ′ ≪ 𝑄𝑠𝑡𝑒𝑝
A/D conversion
𝑌 = 𝑦1 𝑦2 … 𝑦𝑚
B
𝑇
Measurement Intra Prediction
Channel
𝑌′ = 𝑦1′ 𝑦2′ … 𝑦𝑚′
Yr
C
Measurement Reconstruction 𝑇
Image Reconstruction
𝑌𝑄𝑟 = 𝑌𝑟 ≫ 𝑄𝑠𝑡𝑒𝑝 YQr
𝑋′ = 𝑥1′ 𝑥2′ … 𝑥𝑛′
Entropy coding
𝑇
Fig. 2 The process of CS based image sensing, encoding and decoding. The proposed measurement matrix is the gray box in component A. The proposed measurements intra prediction and its architecture is in component B, C, shown in Fig. 8
taken. As is known, the low-frequency components in an image play the majority role in the determining the image quality. It gives us an intuition that measurements representing the lower frequency response would be more important than the ones representing the higher frequency response, if m N 2 measurements are taken for image reconstruction. 2 Given any image signal X ∈ R N , it could be represented by a 2D matrix X2 ∈ R N ×N or an 1D matrix 2 X1 ∈ R N ×1 . To perform the DCT on X, it can be a 2DDCT as (5) in Fig. 3 (b), that sequentially performs 1D-DCT twice (vertically and horizontally), or can be a the projection to Φd , as (6) in Fig. 3 (c), such that Y1 , Y2 represents the same output. Y2 = PX2 PT Y1 = Φd X1
(5) (6) 2
2
2
where P, Y2 ∈ R N ×N , Y1 ∈ R N ×1 and Φd ∈ R N ×N . The relationship between X1, X2 and Y1, Y2 can be represented by X1 = f (X2 ) and Y1 = f (Y2 ). The function f denotes a mapping from a 2D matrix M2 to a 1D matrix M1 , f : M2i, j − > M1(i−1) N + j . In matrix Y2 in Fig. 3 (b), the low frequency components are on the upper-left corner and the frequency increases according to the zigzag scan (Z-scan) order (blue dash line). Taking the lowest 4 frequency components (deep gray to light gray) as example, their corresponding locations in Y1 are shown in Fig. 3 (c). Obviously, given any frequency component in Y2 , the corresponding measurements in Y1 can always be located. Whenever to take m N 2 measurements for image reconstruction, e.g m = 4 in this case, we can find the lowest m frequency components in Y2 according the Z-scan order, and know which measurements to be kept in Y1 . Since each measurement in Y1 is determined by the corresponding row in Φd , as the gray area in 3 (c), we know which row to keep or given the number of measurement m.
IEICE TRANS. ??, VOL.Exx–??, NO.xx XXXX 200x
4
N
1
2 3
4
5
6
8
7
N Õ
Yi, j =
N
By observing some of the terms in (8) as follows, we expand (9)
9 10 11 12
13 14 15 16
𝑿𝟐 ∈ 𝑹𝑵×𝑵
gi,1 =
(a)
N
𝐏𝐗 𝟐 𝐏𝐓 = 𝐘𝟐
N
N
𝐏
1
2 3
4
5
6
8
7
𝐏𝐓
9 10 11 12
=
13 14 15 16
1
2 3
4
5
6
8
7
gi,2 =
9 10 11 12
𝚽𝐝 𝑿𝟏 = 𝒀𝟏
5
𝑁2
..
1 2
Yi, j =
.. ...
... ...
16
16
(c)
1 2 5 9
1 .. 5
𝑁2
..
=
16
Fig. 3 (a) An image is separated block by block. (b) A 2D-DCT transform, N=4. The blue dashed line shows the Z-scan order, ascending frequency response. (c) The 1D representation of (b). (d) An example of N = 4, m = 4 measurements are taken in the Z-scan order.
The final measurement matrix is shown in 3 (d). Overall, there are two steps to generate the measurement matrix Φt : 1) Calculate the matrix Φd according to (7), where i, j, k, q ∈ [1, N], p is an element of matrix P. 2) Trim the matrix Φd into the matrix Φt , by keeping the lowest m frequency components, which are the first m element in Zscan order. It is noted that Φd is a ternary matrix, since the element p is ternary number.
3.2
(7)
Derivation of proposed matrix Φd
Suppose there are matrices D, X, E, G, Y ∈ R N ×N , such that Gi, j = (DX)i, j and Yi, j = (GE)i, j = (DX E)i, j . According to the definition of matrix multiplication, Gi, j and Yi, j can be expanded as (8) and (9) Gi, j =
N Õ r=1
r=1 N Õ
di,r xr, N = di,1 x1, N + di,2 x2, N + ... + di, N x N, N gi,r er, j = gi,1 e1, j + gi,2 e2, j + ... + gi, N e N, j
k,q=1
..
(d)
φ(i−1)N +j,(k−1)N +q = pi,k p j,q
N Õ
By observing (10), we know each element in Y is a linear combination of x1,1 .. x N, N . Thus (10) could be simplied into N Õ (di,k eq, j )xk,q (11) Yi, j =
𝚽𝐭 𝑿𝟏 = 𝒀𝒕 𝑚
di,r xr,2 = di,1 x1,2 + di,2 x2,2 + ... + di, N x N, N
=(di,1 x1,1 + di,2 x2,1 + ... + di, N x N,1 )e1, j + (di,1 x1,2 + di,2 x2,2 + ... + di, N x N,2 )e2, j + ...+ (di,1 x1, N + di,2 x2, N + ... + di, N x N, N )e N, j (10)
...
9
..
di,r xr,1 = di,1 x1,1 + di,2 x2,1 + ... + di, N x N,1
r=1
5
=
r=1 N Õ
...
13 14 15 16
gi, N =
..
N Õ
r=1
(b)
1
gi,r er, j = gi,1 e1, j + gi,2 e2, j + ... + gi, N e N, j (9)
r=1
di,r xr, j = di,1 x1, j + di,2 x2, j + ... + di, N x N, j (8)
˚ Y˚ ∈ R N 2 ×1 represent Xi, j and Yi, j where X˚ = Let matrix X, f (X) and Y˚ = f (Y ). Suppose Y˚ = Φ X˚
(12)
The element y˚(i−1)N +j is determined by the [(i − 1)N + j]th ˚ For a given row in Φ, the element in row of Φ and X. ˚ which each column determines the linear combination of X, is di,k eq, j in (11). Therefore, (11) could be represented as follows N Õ Y˚(i−1)N +j = (φ(i−1)N +j,(k−1)N +q ) x˚(k−1)N +q (13) k,q=1
φ(i−1)N +j,(k−1)N +q = di,k eq, j
(14)
where φ a,b represents the element in row a and column b in Φ. By replacing D, E with P, PT , (14) becomes (7), and ˚ Y˚ with Φd is the proposed matrix we want. By replacing X, X1, Y1 , (12) becomes (6). We show Φd, Φt generated from (1) as an example. When m = 4, the 1st , 2nd , 5th and 9th rows (bold) are kept.
Φd =
1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0
1 0 −1 −1 1 0 −1 −1 1 0 −1 −1 0 0 0 0
1 0 −1 1 1 0 −1 1 1 0 −1 1 0 0 0 0
1 -1 1 0 1 −1 1 0 1 −1 1 0 0 0 0 0
1 1 1 0 0 0 0 0 -1 −1 −1 0 −1 −1 −1 0
1 1 0 0 −1 −1 −1 1 0 0 0 0 0 0 0 0 -1 -1 0 0 1 1 1 −1 −1 −1 0 0 1 1 1 −1
1 -1 1 0 0 0 0 0 -1 1 −1 0 −1 1 −1 0
1 1 1 0 0 0 0 0 -1 −1 −1 0 1 1 1 0
1 0 −1 −1 0 0 0 0 -1 0 1 1 1 0 −1 −1
1 1 0 -1 −1 1 1 0 0 0 0 0 0 0 0 0 -1 -1 0 1 1 −1 −1 0 1 1 0 −1 −1 1 1 0
1 1 1 0 -1 −1 −1 0 1 1 1 0 0 0 0 0
1 0 −1 −1 -1 0 1 1 1 0 −1 −1 0 0 0 0
1 0 −1 1 -1 0 1 −1 1 0 −1 1 0 0 0 0
1 -1 1 0 -1 1 −1 0 1 −1 1 0 0 0 0 0
(15)
ZHOU et al.: APPROXIMATE-DCT-DERIVED MEASUREMENT MATRICES WITH ROW-OPERATION-BASED MEASUREMENT COMPRESSION AND ITS VLSI ARCHITECTURE FOR C
27
30
PSNR(dB)
26
SR=0.25
28 25 27 24
1
1.5 bits(bpp)
32
2
25 0.5
2.5
PSNR(dB)
1
1.5
2
bits(bpp)
36
31
32
29 28 27
28
26
26
1
2
3
4
1 1 1 1 0 0 1 1 1 1 1 1
1 −1 1 1
1 1 0 −1
1 0 0 −1
1 0 0 −1
1 −1 0 −1
1 1 0 −1
1 0 0 −1
1 0 0 −1
4.1
𝚽𝑿 = 𝒀
𝒓𝑹𝑪
4 1
1.5
2 2.5 bits(bpp)
3
3.5
1 −1 0 −1
1 1 −1 1
1 0 −1 1
1 0 −1 1
1 −1 −1 1
(16)
The performance comparison
We compare the performance of two methods of trimming the measurement matrix, between Z-scan order (measurement index: 1,2,5,9 ...) as Fig. 3(b) and normal-scan order (measurement index: 1,2,3,4 ...). From both graphs in Fig. 4, we can find that green curves occur in the lefter and upper region than the blue curves, showing that Z-scan order outperforms the N-scan order in different sizes of approximate matrix. The results verify our intuition that the preference for measurements of lowest m frequency response improves the image quality. 4.
𝒓𝑩𝑹
4
30
Fig. 4 The BD curve comparison between proposed measurement matrix with Z-scan and N-scan, using two images, mandrill (left) and F16 (right). Sample rate (SR) of 0.25 (first row) and 0.50 (second row) are evaluated. Matrices are generated the approximate DCT from several previous work (N=4: [13], 8: [14] and 16: [15]). The marker Circle, Plus and Square represent N=4,8 and 16 respectively. Blue dash line represents N-scan and green solid line Z-scan.
3.3
Fig. 5 The predictor candidates (Blue: Bottom row of upper block and red: rightmost column of left block)
34
30
bits(bpp)
Φt =
Upper/ Left/None
Left block
26
23 0.5
SR=0.50
Upper block
29
Proposed matrix row operation for measurementbased intra prediction and its VLSI architecture Existing measurement-based intra prediction
To reduce the data volume for storage and transmission, the measurements are further compressed. However, measurements, unlike pixels, could not be compressed by traditional intra coding methods, because the spatially correlation between adjacent pixels is corrupted during the generation of measurements. Thus, some works for measurements compression are proposed [7, 8, 10]. Among these works, [10] achieved the best coding efficiency of 7% BD-rate reduction. The basic idea of [10,11] is to extract the information of neighbouring blocks (bottom row in upper block and rightmost column in left block) for predicting the current block, as
Input signal 𝑋 = 𝑥1 𝑥2 … 𝑥16
𝒚𝟏 = 𝐒𝐮𝐦𝐁𝐑 𝒚𝟐 = 𝐒𝐮𝐦𝐑𝐂
=
Random binary matrix 𝚽
𝑇
(a)
(b)
Fig. 6 (a) A 4×4 block as input signal X. The target is to extract the sum of the blue and red part. (b) In the existing method [10, 11], the first two rows in the random binary matrix are modified as r B R and r RC to extract the information (sum of bottom row and sum of rightmost column)
shown in Fig. 5. Comparing with using pixels for prediction far away, using pixels nearby could improve the prediction accuracy. Thus, as shown in Fig. 6 (a), local information of a block (the sum of bottom row and sum of rightmost column) is extracted and stored when processing this block, so that next block could use them for prediction. To extract these local information, the first two rows of the random binary matrix are modified as rBR and rRC are shown in (17) and (18), when block size N = 4. In rBR , the last N values are set to 1’s, while the rest are set to 0’s. In rRC every N th values are set to 1’s, while the rest are set to 0’s. rBR = [0 rRC = [0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1]
0
0
1
0
0
0
1
0
0
0
1
0
0
0
1]
(17) (18)
When input signal X projects on these two modified row (the 1st and 2nd row), the resulted first two measurements y1, y2 would represent the sum of bottom row (SumBR ) and sum of rightmost column (SumRC ), which are the key information to be stored for future prediction. 4.2
Proposed matrix row operation for measurement-based intra prediction
However, the above approach [10, 11] could not be applied to the proposed matrices in Section 3 without degrading the image quality, because it requires to modify two rows in the matrix to extract the information. As shown in (15) and (16), each row in the matrix is special, selected according to the frequency response. Changing a row could significantly degrades the image quality. Thus, our intuition is to find a way to extract the information of neighbouring blocks as
IEICE TRANS. ??, VOL.Exx–??, NO.xx XXXX 200x
6
𝒄𝟏 𝒓𝟏 + ⋯ + 𝒄𝒌 𝒓𝒌 = 𝐫𝐁𝐑 𝒄𝟏 𝒓𝟏 + ⋯ + 𝒄𝒋 𝒓𝒋 = 𝐫𝐑𝐂
𝒄𝟏 𝒚𝟏 + ⋯ + 𝒄𝒌 𝒚𝒌 = 𝐒𝐮𝐦𝐁𝐑 𝒄𝟏 𝒚𝟏 + ⋯ + 𝒄𝒋 𝒚𝒋 = 𝐒𝐮𝐦𝐑𝐂
=
Proposed ternary matrix 𝚽𝐭
𝚽𝑿 = 𝒀
4.3
Fig. 7 The proposed matrix row operation performed on the proposed matrix Φt to generate r B R and r RC to extract the sum of bottom row and the sum of rightmost column
[10,11] without modifying proposed matrices. Two rows rBR and rRC are the keys to extract information of the bottom row and the rightmost column. According to our observation, we propose to use matrix row operation in the proposed matrices to calculate the target rows, as shown in Fig. 7. Still using N=4 as an example, we generate a proposed matrix Φt6 with m=6 in Z-scan order from Φd in (15), by taking the 1st , 2nd , 5th , 9th 6th and 3rd rows of Φd as the followings. Φt6 =
1 1 1 1 1 1
1 0 1 1 0 −1
1 0 1 1 0 −1
1 −1 1 1 −1 1
1 1 0 −1 0 1
1 0 0 −1 0 −1
1 0 0 −1 0 −1
1 −1 0 −1 0 1
1 1 1 0 0 0 −1 −1 0 0 1 −1
1 0 0 −1 0 −1
1 −1 0 −1 0 1
1 1 −1 1 −1 1
1 0 −1 1 0 −1
1 0 −1 1 0 −1
1 −1 −1 1 1 1
(19)
We observe that the key row rBR in (17) could be obtained by matrix row operation from three rows, r1 , r3 , r4 in the measurement matrix Φt6 , as (20). r1 = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] r3 = [1 1 1 1 0 0 0 0 0 0 0 0 −1 −1 −1 −1] r4 = [1 1 1 1 −1 −1 −1 −1 −1 −1 −1 −1 1 1 1 1] (r1 + r4 − 2 ∗ r3 ) = 4 ∗ rBR rBR = (r1 + r4 − 2r3 ) >> 2 (20) Similarly, from matrix row operation from three rows, r1 , r2 , r6 in the measurement matrix Φt6 , the key row rRC in (18) could be obtained as (21) r2 = [1 0 0 −1 1 0 0 −1 r6 = [1 −1 −1 1 1 −1 −1 1 (r1 + r6 − 2 ∗ r2 ) = 4 ∗ rRC rRC = (r1 + r6 − 2r2 ) >> 2
−1
1
0
0
1
−1
−1 1 1 −1
1
0 0
−1]
−1 1]
(21)
The discover above makes it possible to construct the key rows rBR and rRC without modifying the proposed measurement matrix Φt . Since each measurement is the projection of one row in the measurement matrix on the input signal, as mentioned in Section 4.1, the same matrix row operation performing on the measurements could obtain the neighboring information (the sum of bottom row and rightmost column) as the followings. SumBR = (y1 + y4 − 2y3 ) >> 2
(23)
We have verified that the proposed matrix row operation could be applied to other approximate-DCT measurement matrices deriving from 8/16-point approximate-DCT matrix [14, 15]. Since the proposed measurement matrix is determined, which rows to combine and how to combine to get the rows rBR , rRC could be obtained by offline calculation.
𝒚𝟏
SumRC = (y1 + y6 − 2y2 ) >> 2
(22)
Measurement-based intra prediction framework and its VLSI implementation
After getting the SumBR and SumRC , the process of measurement-based prediction is the same as the one in [10, 11], which could be divided into 4 steps. 1) Obtain the average of bottom row of upper block, aveBRu p and obtain the average of rightmost column of left bloc, aveRCl e , from SumBR and SumRC , respectively. 2) Use aveBRu p , aveRCl e and DC (eg. DC=128) to generate the predictor candidates Yup , Yle , YDC respectively, by performing the projection to the measurement matrix as (24), (25). Xup = [aveBRu p ... aveBRu p ]T Yup = Φt Xup Xle = [aveRCl e ... aveRCl e ] Yle = Φt Xle YDC = Φt [DC ... DC]T
(24)
T
(25)
3) Compare the sum and difference (SAD) of the original measurements Y and predictor candidates Yup , Yle , YDC , find the one with the minimum SAD as the predictor as (26) and use the original measurements to subtract it to get the residual as (27). p = arg min S AD(Y, YC AN D )
(26)
C AN D
Yr = Y − Yp
(27)
4) Quantize the residual by YQr = Yr >>Q step , followed by the entropy coding. The VLSI architecture of the above measurement prediction is shown in Fig. 8. It is to compress measurements from CS-CIS (component A in Fig. 2) and then produces the quantized residuals YQr to entropy coder. It takes measurements of the block size N = 4 [4] as input, with sample rate (SR) of 0.25, 0.5, and 0.75 (numbers of measurements m are 4, 8 and 12, respectively). A 4 × 4 block is processed every cycle. The major difference between [10] and the proposed architecture is that the matrix multiplication (the blue dash boxes) in this work has a significant lower hardware cost, because of the property of the proposed matrix Φd and Φt in (15) and (16). It could be observed that, the sum of element in each row of Φt or Φd is zero, expect for the first row (sum is 16, with sixteen 1’s). Because the elements are identical in Xup and Xle respectively, when Φt multiplies the signal Xup , Xle , this property makes all measurements (except for the first one) equal to zeros. Thus, the matrix multiplication
ZHOU et al.: APPROXIMATE-DCT-DERIVED MEASUREMENT MATRICES WITH ROW-OPERATION-BASED MEASUREMENT COMPRESSION AND ITS VLSI ARCHITECTURE FOR C
Q 2
(22,23)
𝑎𝑣𝑒𝑅𝐶𝑙𝑒 >>2
𝑆𝑢𝑚𝐵𝑅
𝑆𝑢𝑚𝑅𝐶
>>2
2𝑦3
-
𝑦4
+
x2
Left buf.
>>2
𝑦1
-
2𝑦2
+
𝑦6
PSNR:33.24 dB bits:1.9 bpp
PSNR:28.61 dB bits::3.48 bpp
PSNR:27.25 dB bits:2.17 bpp
PSNR:27.02 bits:3.57 bpp
Fig. 8 VLSI architecture (Component B and C in Fig. 2) for the measurement-based intra prediction with proposed matrix row operation. The dashed boxes are marked by numbers in parentheses, corresponding to the equations (22)–(27) in Section 4. The matrix multiplication in the gray solid box could be simplified into the shift operation.
Prop. 35
30 25 1
2 PSNR(dB)
2
3
25 2
3 6
30 25 0
1
2
4
2 5
36
4
5
6
25
36
4
5
6
25
36
4
5
6
2 5
36 4 bits(bpp)
5
6
26 25 24 4 1 28
1 4
2 5
3 6
35 PSNR(dB)
PSNR(dB)
2 5
25
3
35
20
1 4
30
20 1
Fig. 10
27
PSNR(dB)
PSNR(dB)
PSNR(dB)
30
MIP MEAP
30 25
20 3 0 bits(bpp)
26 24 22 4 1 35
4 1
5 2
N=4
30
25 6 3 1 bits(bpp)
4
N=8
Fig. 9 The BD curve of lena,barbara,mandrill and F16 (top to bottom) with N=4 and N=8, SR =0.5, with various Q s t e p ∈ [0, 6]. The proposed measurement matrix in Prop. and Prop.+ are trimmed by Z-scan.
in (24) and (25) has not any computation, except for the first measurements (equal to 16*aves) requiring shift operation. Thus the matrix multiplication nearly has no hardware cost, not even an adder. Besides, the property also reduces the calculations for the residual in (27), since only the first measurement needs subtraction, rather than all measurements. Though the proposed matrix row operation introduces the overhead in the red box at the bottom of Fig. 8, it has a low hardware cost, consisting of only 4 adders. 5.
The visual quality comparison. Left :Prop.+ Right: Dir.
30
25 6 3 1
35
35
20
5 2
25 20
1
4 1
PSNR(dB)
PSNR(dB)
25 20
25
PSNR(dB)
0
30
20 3 0 30
30
Dir. 35
PSNR(dB)
PSNR(dB)
PSNR(dB)
35
20
Prop.+
Experiment Results
The direct (Dir.) way [1, 4] and MIP [10], which use the random binary matrix (RBM), are compared with Prop. and Prop.+ in this work. In Dir., measurements are not compressed by any prediction method. In MIP, measurements are compressed by intra prediction. We define Prop. as the
proposed matrix with Z-scan order in Section 3 but without any prediction. We define Prop.+ as the algorithm combining Prop. with the proposed matrix row operation for measurement intra prediction in Section 4.2). The performance of each method is evaluated under fourteen gray-scale test images (512×512), which are reconstructed by the algorithm, L1 primal-dual (PD) interior-point [2] with DCT. First, the mean square error (MSE) is evaluated, as shown in Fig. 11. The results of block size N = 4 and N = 8 are in the left three columns and right three columns, respectively. The MSE of each processing block in Prop. are smaller and closer to zero than that in Dir. Prop.+ further decreases the MSE from Prop.. Moreover, the reduction in MSE between Prop. and Dir. more significant for N = 4 than that for N = 8. Next, we evaluate the BD-PSNR and BD-curve. Since the occurrence of 1’s in the RBM used in MIP and Dir. influences the reconstruction quality. We find the optimal image quality can be achieved when occurrences of 1’s are 74% when N=4 and 23% when N=8. From Fig. 9 , we find that Prop. could significantly improve the image quality and reduce the size of measurements comparing with Dir., which is consistent with the result shown in 11. Prop.+ could reduce further reduce bit rate without introducing any image quality loss comparing with Prop. The result in Table 1 shows that Prop. could increase the BD-PSNR by 4.2 dB at average comparing with Dir., and 2.2 dB comparing with MIP. Prop.+ could further increase the BD-PSNR by 0.24 dB at average (equivalent to 5% BD-rate reduction) comparing with Prop. Finally, Fig. 10 shows an obvious improvement in image quality as well as in bit saving in Prop.+ on the left. The performance of the hardware of Prop.+ (Component B in Fig. 2) and MIP are compared in Table. 2. The total area of Prop.+ is 4.3 K gates, which includes 2.8 K
IEICE TRANS. ??, VOL.Exx–??, NO.xx XXXX 200x
8 Table 1 BD-PSNR Comparison. (Anchor is Dir., with Q s t e p ∈ [0,6], N=4 [13] and N=8 [14]. Reconstruction algorithm is L1-PD with DCT) BD-PSNR N=4 N=8 SR=0.75 SR=0.50 SR=0.25 SR=0.75 SR=0.50 Prop.+ Prop. MIP Prop.+ Prop. MIP Prop.+ Prop. MIP Prop.+ Prop. MIP Prop.+ Prop. MIP 11.58 10.75 6.12 5.73 5.24 2.70 0.92 0.71 0.47 7.40 7.28 2.81 3.70 3.65 1.18 4.86 4.40 2.99 2.85 2.65 1.12 0.26 0.17 0.08 3.27 3.19 1.92 1.11 1.09 0.80 4.02 3.98 1.10 1.56 1.55 0.71 7.08 6.79 2.59 4.75 4.64 0.95 2.06 2.02 0.99 7.47 6.69 2.45 4.82 4.63 1.44 0.44 0.37 0.03 5.37 5.28 1.46 3.04 3.01 0.59 14.79 14.47 6.71 5.72 5.53 2.40 18.37 15.86 6.94 7.76 6.51 4.78 0.24 0.04 0.09 7.61 7.49 2.81 3.43 3.39 1.11 11.56 10.85 5.21 6.16 5.79 2.82 1.48 1.36 0.93 10.58 9.87 5.52 5.60 5.24 2.48 1.14 1.01 0.75 6.28 6.19 2.73 2.96 2.92 1.11 3.46 3.44 1.16 1.91 1.89 0.62 6.38 6.15 2.66 3.47 3.37 1.31 1.24 1.19 0.61 5.33 5.26 1.98 3.02 3.00 0.91 8.75 8.25 4.16 5.19 4.97 2.03 0.61 0.53 0.47 4.68 4.34 1.51 2.98 2.88 0.85 0.76 0.73 0.30 2.57 2.53 0.88 1.39 1.38 0.44 5.16 5.09 1.68 2.95 2.93 0.82 8.09 7.56 3.77 4.85 4.64 1.91 1.34 1.26 0.77 15.98 14.68 8.76 9.35 8.58 4.08 0.84 0.45 1.13 9.96 9.71 4.91 4.52 4.39 1.32 5.78 5.69 2.64 3.20 3.16 1.15 8.89 8.11 6.19 4.93 4.43 3.16 1.77 1.54 1.33 9.56 8.79 4.53 5.27 4.89 2.28 1.01 0.88 0.61 6.23 6.12 2.52 2.96 2.91 1.01
Test Images Lena Barbara Mandrill Peppers house F16 goldhill pentagon boat bike sailboat milkdrop elaine Aver. Aver. in all sample rate
2.4
x 10
4
Prop.+ 5.28
Dir. 2.4
Prop. 4.85
x 10
4
Prop. 2.4
1.6
1.6
0.8
0.8
0.8
x 10
4
0
2.4
0
8192 16384 block number 4 x 10
0
2.4
0
8192 16384 block number 4 x 10
0
2.4
1.6
1.6
0.8
0.8
0.8
x 10
2.4
4
Prop. 3.47
Dir. 2.4
x 10
4
MIP 0.94 0.51 0.33 0.89 1.45 0.73 0.76 0.40 0.76 0.34 0.65 1.06 0.76 0.74
MIP 1.42
Prop. 2.4
1.6
1.6
0.8
0.8
0.8
x 10
4
Prop.+
MSE
1.6
0
0
8192 16384 block number 4 x 10
0
2048 block number
x 10
2.4
4096
0
4
2.4
0 x 10
2048 block number
4096
0
4
2.4
1.6
1.6
1.6
0.8
0.8
0.8
0 x 10
2048 block number
4096
2048
4096
4
MSE
MSE
1.6
Prop.+ 3.53
Prop.+
MSE
1.6
MIP 2.47
SR=0.25 Prop.+ Prop. 1.54 1.53 0.02 0.01 1.24 1.23 1.58 1.57 1.80 1.74 1.69 1.68 1.56 1.55 1.24 1.22 1.73 1.72 0.92 0.92 1.64 1.63 1.69 1.64 1.32 1.31 1.38 1.36
0
0
8192
16384
0
0
8192
16384
0
0
8192
16384
0
0
2048
4096
N=4
0
0
2048
4096
0
0
N=8
Fig. 11 Comparison of MSE of residuals in each block among three algorithms, with N = 4 and 8 , SR = 0.5, and Q s t e p =4 of lena (first row) and mandrill(second row) . Table 2
Performance of VLSI architecture. Prop.+ MIP [10] Process SMIC 40nm Area (NAND Gates) 4.3 K 9.1 K Specification 2160p@240fps Freq. (MHz) 200 SRAM 1 KB Throughput (measurements/Cyc.) 12 Power Consumption 0.3 mW 0.6 mW
gates for SAD modules, 1.2 K gates for intra prediction, 0.3 K gates for the finite state machine and the memory of 1 KB is for storing predictors. The area reduction is contributed by property of proposed matrix. It makes the matrix multiplication in intra prediction simpler. Its power consumption is 0.3 mW at 200MHz in typical condition (1.1 V, 25 °C). It is omittable (only 1% ) comparing with the power consumption of the current CS-CIS [1, 3] (28 mW to 100 mW). The throughput of our design is processing 12 measurements per cycle (a 4 × 4 block with SR = 0.75 has 12 measurements). The architecture could support 2160p@240fps. Because of the proposed measurement matrix and
measurement-based intra prediction, Prop.+ compresses measurements by 88% BD-Rate reduction comparing with Dir., with extra area of 4.3K gate and power consumption of 0.3 mW from the architecture. Comparing with MIP, Prop.+ also achieves compression of 49% BD-Rate reduction. Because we exploited the property of matrix to optimize the architecture, the area and power consumption of Prop.+ is 52% and 50% less than MIP, respectively. 6.
Conclusion
We proposed an algorithm to generate a series of deterministic and ternary matrices, which are compatible with the CS-CIS. The proposed matrices are derived from the approximate DCT, hence preserving the energy compaction property as DCT does. The proposed measurement matrix significantly improve the coding efficiency by BD-PSNR increase of 4.2 dB, comparing with the random binary matrix used in the-state-of-art CS-CIS. We further proposed matrix row operations adaptive to the proposed matrix to compress
ZHOU et al.: APPROXIMATE-DCT-DERIVED MEASUREMENT MATRICES WITH ROW-OPERATION-BASED MEASUREMENT COMPRESSION AND ITS VLSI ARCHITECTURE FOR C
measurement by 4.8% BD-rate without any image quality loss. Lastly, a low-cost and low-power VLSI architecture of the proposed measurements intra prediction is implemented, with only 4.3 K gates in area, 0.3 mW in power consumption and supporting 2160p@240fps.
approximate DCT for image and video compression,” Multidimensional Systems and Signal Processing, vol.27, no.1, pp.87–104, jan 2016.
Acknowledgment The present work was supported in part by the Program for Leading Graduate Schools, “Graduate Program for Embodiment Informatics” of the Ministry of Education, Culture, Sports, Science and Technology. References [1] Y. Oike and A. El Gamal, “CMOS Image Sensor With Per-Column Σ∆ ADC and Programmable Compressed Sensing,” IEEE Journal of Solid-State Circuits, vol.48, no.1, pp.318–328, jan 2013. [2] D. Donoho, “Compressed sensing,” IEEE TIT, vol.52, no.4, pp.1289–1306, apr 2006. [3] N. Katic, M.H. Kamal, M. Kilic, A. Schmid, P. Vandergheynst, and Y. Leblebici, “Power-efficient CMOS image acquisition system based on compressive sampling,” 2013 IEEE 56th MWSCAS, pp.1367– 1370, IEEE, aug 2013. [4] M. Dadkhah, M.J. Deen, and S. Shirani, “CMOS Image Sensor With Area-Efficient Block-Based Compressive Sensing,” IEEE Sensors Journal, vol.15, no.7, pp.3699–3710, jul 2015. [5] X.J. Liu, S.T. Xia, and T. Dai, “Deterministic constructions of binary measurement matrices with various sizes,” 2015 ICASSP, pp.3641– 3645, IEEE, apr 2015. [6] S. Mun and J.E. Fowler, “Dpcm for quantized block-based compressed sensing of images.,” pp.1424–1428, 2012. [7] J. Zhang, D. Zhao, and F. Jiang, “Spatially directional predictive coding for block-based compressive sensing of natural images,” 2013 ICIP, pp.1021–1025, IEEE, sep 2013. [8] K.Q. Dinh, C.V. Trinh, V.A. Nguyen, Y. Park, and B. Jeon, “Measurement Coding for Compressive Sensing of Color Images,” IEIE Transactions on Smart Processing & Computing, vol.3, no.1, pp.10– 18, 2014. [9] X. Gao, J. Zhang, W. Che, X. Fan, and D. Zhao, “Block-Based Compressive Sensing Coding of Natural Images by Local Structural Measurement Matrix,” pp.133–142, apr 2015. [10] J. Zhou, D. Zhou, L. Guo, T. Yoshimura, and S. Goto, “Framework and VLSI architecture of Measurement-Domain Intra Prediction for Compressively Sensed Visual Contents,” IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences, vol.E100-A, no.12, pp.2869–2877, 2017. [11] J. Zhou, D. Zhou, L. Guo, T. Yoshimura, and S. Goto, “Measurementdomain Intra Prediction Framework for Compressively Sensed Images,” 2017 IEEE International Symposium on Circuits and Systems, pp.168–171, IEEE, May 2017. [12] J. Zhou, D. Zhou, L. Guo, T. Yoshimura, and S. Goto, “ApproximateDCT-Derived Measurement Matrices for Compressed Sensing,” 2017 IEEE International Symposium on Circuits and Systems, pp.1123–1126, IEEE, May 2017. [13] F. Bayer, R. Cintra, A. Madanayake, and U. Potluri, “Multiplierless approximate 4-point DCT VLSI architectures for transform block coding,” Electronics Letters, vol.49, no.24, pp.1532–1534, 2013. [14] U.S. Potluri, A. Madanayake, R.J. Cintra, F.M. Bayer, S. Kulasekera, and A. Edirisuriya, “Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions,” IEEE TCASI: Regular Papers, vol.61, no.6, pp.1727–1740, jun 2014. [15] T.L.T. da Silveira, F.M. Bayer, R.J. Cintra, S. Kulasekera, A. Madanayake, and A.J. Kozakevicius, “An orthogonal 16-point
Jianbin Zhou received the B.E degree in Software Engineering from South China University of Technology, China, in 2012 and the M.E degree in Electronic Engineering from Waseda University, Japan, in 2014. He is currently pursuing the Ph.D degree in engineering in Waseda University, Japan. His interests are in algorithm and VLSI for video coding. He received scholarship from Ting Hsin Group and from Graduate Program of Embodiment Informatics.
Dajiang Zhou received B.E. and M.E. degrees from Shanghai Jiao Tong University, China. He earned his Ph.D. degree in engineering from Waseda University, Japan, in 2010. His interests are in algorithms and VLSI architecture for multimedia and communications signal processing. Dr. Zhou received the Best Student Paper Award of VLSI Circuits Symposium 2010, the International Low Power Design Contest Award of ACM ISLPED 2010, the 2013 Kenjiro Takayanagi Young Researcher Award, and the Chinese Government Award for Excellent Students Abroad of 2010. His work on an 8K UHDTV video decoder VLSI chip was granted the 2012 Semiconductor of the Year Award of Japan.
Takeshi Yoshimura received B.E., M.E., and Dr. Eng. degrees from Osaka University, Osaka, Japan, in 1972, 1974, and 1997. He joined the NEC Corporation, Kawasaki, Japan, in 1974. He received Best Paper Awards from the Institute of Electronics, Information and Communication Engineers of Japan (IEICE) and the IEEE CAS Society. Dr. Yoshimura is a professor of Waseda University and a member of IPSJ (the Information Processing Society of Japan), and IEEE.
Satoshi Goto received B.E. and M.E. degrees in electronics and communication engineering from Waseda University in 1968 and 1970, respectively. He received the Doctor of Engineering degree from Waseda University in 1981. He joined NEC Laboratories in 1970. Since 2003, he had been a professor with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Japan and now Professor Emeritus, Waseda University. Main interests include VLSI design methodologies for multimedia and mobile applications. He served as GC of ICCAD and ASPDAC and was a board member of IEEE CAS Society. He is a Life Fellow of IEEE, IEICE, and a member of Science Council of Japan.