Exploiting Spline and Quadtree for Lossy Temporal Video Data and Lossless Synthetic Image Data Compression
March 2008
Murtaza Ali Khan
Dedicated to
My Parents, Wife, & Children
ABSTRACT

Digital image and video compression techniques play an important role in telecommunication and multimedia systems, where bandwidth is a valuable resource. Digital data requires high data rates: the better the picture, the more bandwidth is needed, which in turn demands powerful hardware and/or software at the transmitting and receiving sides. In this thesis we present efficient techniques for lossy temporal video data compression and lossless synthetic image data compression. The size of raw video data is very large, but much of the data in a video is not necessary for achieving good perceptual quality, because it can easily be predicted; for example, successive frames in a video rarely change much from one to the next. This is what makes data compression work well for video. The main goal of video coding techniques is to reduce the amount of information needed for a sequence of pictures without losing much of its quality. We present two new and efficient techniques for temporal video data compression, both based on spline fitting of temporal video data. The first technique approximates the intensity or color variations of each pixel individually, at fixed spatial locations, in the sequence of frames. It yields very precise control over the accuracy of the approximated (compressed) video, and because the video data of each pixel is fitted individually, the output video is also free from blocking artifacts. Subjective and objective analysis shows that this technique performs better than existing temporal video data compression methods. However, applying spatial compression to the output data (the control points of the spline) needs special care. The second technique gives more consideration to achieving a higher compression ratio and applies the fitting to groups of pixels. It is based on the assumption that in most video sequences neighboring pixels have high temporal correlation: rather than fitting the data of each spatial location individually, we fit the spline to the intensity or color variations of a group of pixels at correlated spatial locations in the sequence of frames. The second technique can easily be incorporated as the temporal compression stage (motion compensation) of existing video coding schemes based on the Discrete Cosine Transform (DCT), e.g., MPEG-1/2. We incorporated this technique into MPEG-1/2 coding; in other words, rather than using conventional block-matching techniques for temporal video data compression, we used group-of-pixel-based fitting. Subjective and objective comparison shows that the proposed method yields competitive and better results than conventional MPEG-1/2, where block matching is used for temporal video compression. For the compression of synthetic image data, we present a new hybrid scheme based on quadtree decomposition and parametric line fitting. In this scheme the encoding consists of two phases. In the first phase, the input image is partitioned into quadrants using quadtree decomposition; to prevent very small quadrants, a constraint of minimal block size is imposed during the decomposition, and homogeneous quadrants are separated from non-homogeneous quadrants. In the second phase, the non-homogeneous quadrants are scanned row-wise, and the luminance variation of each scanned row is fitted with a parametric line at a specified level of tolerance. Experimental results show that the proposed scheme performs better than well-known lossless image compression techniques such as PNG and GIF for several types of synthetic images. For image and video compression we used several kinds of splines, namely the Natural cubic spline, the cubic Cardinal spline, and linear, quadratic and cubic Bézier curves. The linear Bézier curve (parametric line) performs best for both image and video data fitting: it not only has the least storage requirements but is also the most computationally efficient among all the splines/curves considered. Therefore, the proposed methods not only yield higher compression ratios but are also computationally fast enough to be used in practical image and video coding methods.
Acknowledgements

All praise and glory to Almighty Allah (SWT), who gave me the courage and patience to carry out this work. Peace and blessings of Allah be upon the last Prophet, Muhammad (PBUH). My deep appreciation goes to my thesis supervisor, Dr. Yoshio Ohno, for his constant help, guidance, and the countless hours of attention he devoted throughout the course of this research work. His priceless suggestions made this work interesting and instructive for me. Apart from the research work, I express my sincere gratitude to Dr. Ohno, whom I always found cooperative and helpful. Acknowledgement is due to my thesis committee members, Dr. Hideo Saito, Dr. Hiroaki Saito and Dr. Hiroshi Shigeno, for their interest, invaluable cooperation and support. Acknowledgement is due to Keio University for providing the environment and facilities for this work; Yagami campus, a never-sleeping campus, is nurturing many eminent scientists of the future. I am thankful to the Yoshida Scholarship Foundation for providing not only financial support but also extracurricular activities each semester for its scholars. Acknowledgement is due to every member of Ohno lab, those who graduated and those who are still registered; special thanks to Yokota and Sindharta. My stay in Ohno lab as a doctoral student has been a valuable life-long experience. Due to my lack of Japanese language skill I could not fully follow the discussions and refreshing comments during lab meetings, but I felt the sense of them and learned a lot from all the members of Ohno lab. Acknowledgement is due to all the people of NAMCO; the meetings with them were very valuable for my research work, and their comments helped a lot. And finally, my heartfelt thanks to my parents and other family members, whose prayers and support are always with me.
Contents

Chapter 1 INTRODUCTION
  1.1 Objectives
  1.2 Notations and Conventions
  1.3 Quality
    1.3.1 Objective Quality
    1.3.2 Subjective Quality
  1.4 Digital Image
  1.5 Digital Video
    1.5.1 Video Signal
    1.5.2 Analog Video Raster
    1.5.3 Digital Video Data Formats
    1.5.4 Dimension of Digital Video Data
  1.6 Organization of Thesis

Chapter 2 RELATED WORK
  2.1 Spline
    2.1.1 Natural Cubic Spline
    2.1.2 Cubic Cardinal Spline
    2.1.3 Bézier Curve
    2.1.4 Parameterization
  2.2 Spline and Data Fitting
    2.2.1 Compression of Temporal Video Data by Motion Compensation
    2.2.2 Spline Fitting vs Block Matching
  2.3 Image Compression

Chapter 3 DATA FITTING SCHEMES
  3.1 Fitting and Compression
  3.2 Fitting Terminologies and Notations
  3.3 Fitting Strategy
    3.3.1 Natural Cubic Spline Fitting
    3.3.2 Cubic Cardinal Spline Fitting (Fixed Tension)
    3.3.3 Cubic Cardinal Spline Least Square Fitting
    3.3.4 Bézier Curve Fitting
    3.3.5 Fitting Algorithms Revisited
    3.3.6 Experiments and Discussion
    3.3.7 Comparative Results and Discussion

Chapter 4 FITTING GROUP OF PIXELS
  4.1 Individual Pixel Level Fitting
  4.2 Group Level Fitting
  4.3 Algorithm
  4.4 Simulation Results
  4.5 Discussion
  4.6 Choosing Parameters

Chapter 5 IMAGE COMPRESSION
  5.1 Framework
    5.1.1 Quadtree
    5.1.2 Parametric Line
  5.2 Algorithm
    5.2.1 Encoding
    5.2.2 Decoding
  5.3 Experiments and Results
  5.4 Discussion
  5.5 Conclusion

Chapter 6 CONCLUSION AND FUTURE WORK
  6.1 Objectives and Overview
  6.2 Applications and Advantages
  6.3 Future Work

Appendix A Test Video Sequences
Appendix B Software
Appendix C MATLAB CODE
Bibliography

List of Algorithms
  1 Break-and-fit spline fitting, pixel level error bound
  2 Break-and-fit spline fitting, segment level error bound
  3 Break-and-fit spline fitting, pixel level error bound
  4 Break-and-fit spline fitting, segment level error bound
Chapter 1 INTRODUCTION

Uncompressed multimedia (image, graphics, audio and video) data requires considerable storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor speeds, and digital communication system performance, demand for data storage capacity and data-transmission bandwidth continues to outstrip the capabilities of available technologies. The recent growth of data-intensive multimedia-based web applications has not only sustained the need for more efficient ways to encode image and video signals but has made compression of such signals central to storage and communication technology. Image and video data compression techniques are closely related: video data is a sequence of images. Image data contains only spatial redundancy, while video data contains both spatial and temporal redundancy, and spatial compression of video data is achieved by image compression techniques. Therefore, in this research work we investigated techniques for both image and video data compression. In this thesis, we propose methods for temporal video data compression using spline fitting, and for image data compression using quadtree decomposition and spline fitting.
1.1 Objectives
The research work has the following objectives:

1. Investigate curves/splines to approximate and compress video and synthetic image data.
2. Find suitable spline(s) for image/video data approximation.
3. Optimize subjective quality (human visual acceptance), objective quality (PSNR), and compression factor.
4. Automate the encoding and decoding processes.
5. Design and implement computationally efficient fitting algorithms.
6. Find suitable values of the parameters that control the bitrate and quality of the output video and image.
7. Investigate how to incorporate the proposed methods into existing coding standards.
8. Compare the performance of the proposed methods with existing coding methods.
1.2 Notations and Conventions
This section defines fundamental notations and conventions that are frequently used in the video and image coding and compression literature.

Bitrate: Average number of bits per unit sample or unit time. In digital multimedia, bitrate is quantified in bits per pixel (bpp) or bits per second (bps). For a source outputting fs samples per second and representing each sample with R bits, the bitrate in bps is fs·R [68]. For large bitrates the following prefixes are used:

• 1,000 bps = 1 kbps (one kilobit or one thousand bits per second)
• 1,000,000 bps = 1 Mbps (one megabit or one million bits per second)
• 1,000,000,000 bps = 1 Gbps (one gigabit or one billion bits per second)

Framerate: The number of complete screens or frames drawn per unit time, most often expressed in frames per second (fps) or simply hertz (Hz).
Entropy: The entropy of a source, in bits per symbol, can be expressed as:

\[
\mathrm{Entropy} = -\sum_{j=1}^{J} P(a_j)\,\log_2 P(a_j), \tag{1.1}
\]

where J is the number of unique symbols in the source and P(a_j) is the probability of occurrence of symbol a_j. For image/video data, entropy is measured in bits per pixel.

Distortion Measure: A criterion to measure the closeness or fidelity of reconstructed data to the original data. In lossy video compression schemes the most frequently used distortion measures are mean squared error (MSE), mean absolute difference (MAD), signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR). Let I be an original video sequence of P monochrome images/frames, each of size M × N, and let K be the compressed video of the same size. The distortion measures are then defined as follows.

Mean squared error (MSE): The average of the squared error. For each video frame, the MSE can be computed as:

\[
\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left\| I_{i,j} - K_{i,j} \right\|^2. \tag{1.2}
\]
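As a quick illustration of the entropy definition in Eq. (1.1), the sketch below computes the empirical entropy of an 8-bit image (a minimal example; the helper name is ours, not from the thesis):

```python
import numpy as np

def entropy_bpp(image):
    """Empirical entropy of an image in bits per pixel (Eq. 1.1)."""
    values, counts = np.unique(np.asarray(image), return_counts=True)
    p = counts / counts.sum()                 # P(a_j) for each unique symbol
    return float(-np.sum(p * np.log2(p)))     # -sum P(a_j) log2 P(a_j)

# A two-symbol image with equal symbol frequencies has entropy of exactly 1 bpp.
img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
print(entropy_bpp(img))  # 1.0
```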
Mean absolute difference (MAD): A distortion measure that is sometimes used in place of the MSE, mainly for reduced computation. For each video frame, the MAD can be computed as:

\[
\mathrm{MAD} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} \left\| I_{i,j} - K_{i,j} \right\|. \tag{1.3}
\]
Signal-to-noise ratio (SNR): The ratio of the average squared value of the original data to the MSE. The SNR is often measured on a logarithmic scale with units of decibels (dB). For each video frame, the SNR can be computed as:

\[
\mathrm{SNR} = \frac{\dfrac{1}{MN}\displaystyle\sum_{i=1}^{M}\sum_{j=1}^{N}\left\| I_{i,j} \right\|^2}{\mathrm{MSE}}. \tag{1.4}
\]
Peak signal-to-noise ratio (PSNR): The size of the error relative to the peak value of the original data, measured in decibels. Mathematically, the PSNR is:

\[
\mathrm{PSNR} = 10\log_{10}\!\left(\frac{I_{\max}^2}{\mathrm{MSE}}\right), \tag{1.5}
\]

where I_max is the maximum pixel value of the original video sequence; when pixels are represented using 8 bits per sample, I_max = 255. To compute the PSNR between two sequences, first compute the MSE between corresponding frames, then average the resulting MSE values over all frames, and finally convert the mean MSE value to PSNR using expression (1.5). For color sequences with three RGB values per pixel, the definition of PSNR is the same except that the MSE is the sum of all squared value differences divided by the image size and by three. Typical values of the PSNR are between 20 and 40 dB.

YUV (luminance and chrominance): Y stands for the luminance component (the brightness); U and V are the chrominance (color) components.

Image aspect ratio (IAR): The ratio of image width to height.
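The PSNR procedure just described (per-frame MSE, averaged over frames, then Eq. (1.5)) can be sketched as follows; the function name and array layout are our own illustrative choices:

```python
import numpy as np

def psnr(frames_orig, frames_comp, peak=255.0):
    """PSNR between two monochrome sequences of shape (P, M, N):
    per-frame MSE, averaged over frames, converted via Eq. (1.5)."""
    a = np.asarray(frames_orig, dtype=np.float64)
    b = np.asarray(frames_comp, dtype=np.float64)
    mse_per_frame = np.mean((a - b) ** 2, axis=(1, 2))  # Eq. (1.2) per frame
    mean_mse = mse_per_frame.mean()
    return 10.0 * np.log10(peak ** 2 / mean_mse)

# A uniform error of 1 gray level gives MSE = 1, hence PSNR = 20 log10(255).
print(psnr(np.zeros((2, 4, 4)), np.ones((2, 4, 4))))  # about 48.13 dB
```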
1.3 Quality

1.3.1 Objective Quality

Objective evaluation techniques are mathematical models that emulate subjective quality assessment results, based on criteria and metrics that can be measured objectively. Objective methods are classified according to the availability of the original image/video signal, which is considered to be of high quality; thus they fall into Full-Reference, Reduced-Reference and No-Reference methods. The most frequently used measures for evaluating the quality of compressed image and video are SNR and PSNR. PSNR is an objective quality metric that can be computed automatically by a computer program, but a good PSNR does not always guarantee good visual quality, due to the non-linear behaviour of the human visual system.
1.3.2 Subjective Quality

Human visual acceptance is the subjective measure of image/video quality. Subjective quality can be challenging to assess, however, because it may require a trained expert to judge it. Typically, image or video sequences are shown to a group of viewers and their opinions are averaged to evaluate the quality, though the details of testing may vary greatly. Other methods have been proposed for subjective video quality assessment [60, 61, 20], but no method can perfectly relate objective and subjective video quality. By considering the characteristics of the human visual system (HVS), the subjective quality of compressed image/video data can be improved; for example, the JPEG/MPEG family of image/video coding uses a quantization matrix that is determined experimentally, based on the threshold of visibility of distortion [62, 1, 40]. Since the HVS is more sensitive to edges [68, 50], any image/video compression scheme that distorts edges suffers from low subjective quality. Unfortunately, the JPEG/MPEG family of coding, which is based on dividing frames into blocks and then performing block matching and spatial compression of the blocks, also causes blocking artifacts at low bitrates.
1.4 Digital Image

A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. A digital image contains a fixed number of rows and columns of pixels. Pixels are the smallest individual elements in an image, holding quantized values that represent the brightness of a given colour at a specific point. Typically, the pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers. These values are often transmitted or stored in a compressed form. Digital images can be created by a variety of input devices and techniques, such as digital cameras, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. They can also be synthesised from arbitrary non-image data, such as mathematical functions or three-dimensional geometric models; the latter is a major sub-area of computer graphics. The field of digital image processing is the study of algorithms for their transformation.
1.5 Digital Video

1.5.1 Video Signal

A video signal is a sequence of two-dimensional (2-D) images, projected from a dynamic three-dimensional (3-D) scene onto the image plane of a video camera. The color value at any point in a video frame records the emitted or reflected light at a particular 3-D point in the observed scene.

1.5.2 Analog Video Raster

Video technology is based on a raster-scan display refreshing format. Raster scan refers to the pattern used to scan out the image: top to bottom a line at a time, left to right along a line. A line is called a scanline. The image is drawn by an electron beam which strikes a phosphor-coated screen that emits photons in the form of light. After a scan of an individual scanline, the electron beam is turned off and repositioned at the beginning of the next scanline. The time it takes to do this is called the horizontal retrace interval, and the signal which notifies the electronics of this is called horizontal blanking or horizontal sync. When the beam gets to the bottom of the image, it is turned off and returned to the top left of the screen. The time it takes to do this is called the vertical retrace interval, and the signal which notifies the electronics of this is called vertical blanking or vertical sync. A complete scan of all the scanlines of an image is called a frame [10].

Progressive scan: All of the scanlines are drawn in one pass from top to bottom.

Interlaced scan: Every odd-numbered scanline is drawn on one pass and every even-numbered scanline on the next pass. Each pass is called a field (two fields per frame).

A digital video can be obtained either by sampling a raster scan or directly using a digital video camera. A digital video is defined by its frame width W, frame height H, framerate f_{s,t}, number of lines per frame f_{s,h}, and number of samples per line f_{s,w}. The temporal sampling interval (frame interval) Δt, horizontal sampling interval Δw, vertical sampling interval Δh, and sampling rate fs are obtained by:

\[ \Delta_t = 1/f_{s,t}, \tag{1.6} \]
\[ \Delta_w = W/f_{s,w}, \tag{1.7} \]
\[ \Delta_h = H/f_{s,h}, \tag{1.8} \]
\[ f_s = f_{s,t}\, f_{s,w}\, f_{s,h}. \tag{1.9} \]
Another important parameter of digital video is the number of bits per pixel, Nb (for luminance and/or chrominance data). Conventionally each color component is assigned 8 bits, i.e., 2^8 = 256 levels from 0 to 255; therefore Nb = 8 for gray-level video and Nb = 24 for color video. The data rate R in bits per second (bps) is determined by:

\[ R = f_{s,t}\, f_{s,w}\, f_{s,h}\, N_b. \tag{1.10} \]

Often the spatial and temporal sampling rates differ between the luminance and chrominance components. In this case, Nb should reflect the equivalent number of bits used for each pixel at the luminance sampling resolution. For example, if the horizontal and vertical sampling rates for each chrominance component (e.g., U and V) are both half of those for the luminance component (e.g., Y), then there are two chrominance samples for every four luminance samples. If each sample is represented by 8 bits, then Nb = (4 × 8 + 2 × 8)/4 = 12 bits. When displaying digital video on a computer monitor, each pixel is rendered as a rectangular region with a constant color. The ratio of the width to the height of a pixel is known as the pixel aspect ratio (PAR). It is related to IAR by:

\[ PAR = IAR\,\frac{f_{s,h}}{f_{s,w}}. \tag{1.11} \]
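Plugging numbers into Eqs. (1.9)–(1.10) gives raw data rates directly; the sketch below is a minimal illustration (the 720×480 at 30 fps frame parameters are only an example, not a thesis result):

```python
def data_rate_bps(fst, fsw, fsh, nb):
    """Raw data rate R = f_{s,t} * f_{s,w} * f_{s,h} * N_b (Eq. 1.10)."""
    return fst * fsw * fsh * nb

# With 4:2:0 chrominance subsampling: Nb = (4*8 + 2*8)/4 = 12 bits per pixel.
nb_420 = (4 * 8 + 2 * 8) // 4
# Illustrative 720x480 frame at 30 frames per second.
r = data_rate_bps(30, 720, 480, nb_420)
print(r / 1e6, "Mbps")  # 124.416 Mbps
```

The result is in line with the ~124 Mbps raw bitrate listed for BT.601 4:2:0 in Table 1.2.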
1.5.3 Digital Video Data Formats

In an attempt to standardize digital video formats, the International Telecommunication Union, Radiocommunication Sector (ITU-R) developed the BT.601 recommendation. In the BT.601 standard, the sampling rate fs is chosen to satisfy two constraints: (1) the horizontal sampling resolution should match the vertical sampling resolution as closely as possible, i.e., Δw ≈ Δh; and (2) the same sampling rate should be used for the NTSC and PAL/SECAM systems, and it should be a multiple of the respective line rates of these systems. In a BT.601 signal the pixel width-to-height ratio is not 1:

\[ PAR = \frac{\Delta_w}{\Delta_h} = IAR\,\frac{f_{s,h}}{f_{s,w}} = 8/9 \ \text{for 525/60 and}\ 16/15 \ \text{for 625/50 signals}. \]
Color Coordinate and Chrominance Subsampling: Along with the image resolution, the BT.601 recommendation also defines a digital color coordinate, known as YCbCr. In the YCbCr color space, scaling and shifting operations are introduced so that the resulting components take values in the range 0–255. The following equations transform RGB to YCbCr and vice versa:

\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} =
\begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}, \tag{1.12}
\]

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 1.164 & 0.000 & 1.596 \\ 1.164 & -0.392 & -0.813 \\ 1.164 & 2.017 & 0.000 \end{bmatrix}
\begin{bmatrix} Y - 16 \\ C_b - 128 \\ C_r - 128 \end{bmatrix}. \tag{1.13}
\]
In the YCbCr color space, Y reflects the luminance (range 16–235); Cb and Cr are scaled versions of the color differences B − Y and R − Y, respectively (range 16–240).

Table 1.1: BT.601 subsampling formats
Sampling   Y pixels   Cb pixels   Cr pixels   Comments
4:4:4      2×2        2×2         2×2         no subsampling
4:2:2      2×2        1×2         1×2         subsampled by 2 horizontally
4:2:0      2×2        1×1         1×1         subsampled by 2 horizontally and vertically
Table 1.2: Digital video resolution formats
Video Format   Y size         Color Sampling   Framerate   Raw Bitrate   Remark
BT.601         720×480/576    4:4:4            60I/50I     249 Mbps
BT.601         720×480/576    4:2:2            60I/50I     166 Mbps      MPEG-2, 15-50 Mbps
BT.601         720×480/576    4:2:0            60I/50I     124 Mbps      MPEG-2, 4-8 Mbps
SIF            352×240/288    4:2:0            30P/25P     30 Mbps       MPEG-1, 1.5 Mbps
CIF            352×288        4:2:0            30P         37 Mbps       H.261, H.263, 128-384 kbps
QCIF           176×144        4:2:0            30P         9.1 Mbps      H.263, 20-64 kbps

1.5.4 Dimension of Digital Video Data
In video data, the value of each pixel is referenced by its spatial and temporal locations. The value of a pixel at spatial location (h, w) and temporal location t can be written as p_{(h,w,t)}; for example, the pixel in the second row and third column of the fifth frame is referred to as p_{(2,3,5)}. The value itself can be 1-D (e.g., gray level) or 3-D (e.g., RGB space); for example, p_{(2,3,5)} = 230 for gray-level data and p_{(2,3,5)} = (210, 35, 80) for RGB data. For spline-based temporal video approximation methods we are interested mainly in the temporal variation of pixel values. Therefore, most of the time we will use only the temporal index with the pixel, i.e., p_t means the value of a pixel in the t-th frame. Similarly, {p_1, p_2, ..., p_n} refers to the values of a pixel in n frames of a video.
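With a video stored as an array, the temporal sequence {p_1, ..., p_n} of one pixel is a single slice; a minimal sketch, where the (frames, height, width) array layout is our own assumption:

```python
import numpy as np

# Assume a gray-level video as an array of shape (n_frames, height, width).
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(30, 4, 4), dtype=np.uint8)

# Temporal trajectory of the pixel at row 2, column 3 across all frames:
# the sequence {p_1, p_2, ..., p_n} that a spline would be fitted to.
trajectory = video[:, 2, 3]
print(trajectory.shape)  # (30,)
```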
1.6 Organization of Thesis

Chapter 2 overviews the related work. Chapter 3 describes the algorithms of our fitting techniques. Chapter 4 describes the fitting method for groups of pixels. Chapter 5 is about image compression using spline and quadtree decomposition. Chapter 6 concludes our work and gives suggestions for future work.
Chapter 2 RELATED WORK

The main focus of our research is the compression of temporal video data using splines (parametric curves). In this chapter we describe the theory of various kinds of splines and then how we used splines to compress (approximate) video data. The main references for spline theory are [15], [53] and [70]. In this chapter we also describe non-spline-based temporal compression methods based on motion estimation.
2.1 Spline

A spline is a piecewise polynomial (parametric) curve. Splines are widely used in computer-aided design and computer graphics because of the simplicity of their construction, their ease and accuracy of evaluation, and their ability to approximate complex shapes through curve fitting and interactive curve design. A Practical Guide to Splines by Carl de Boor [11] is a classic reference on spline theory; it covers spline interpolation in detail with rigorous mathematical analysis. TSPACK [51] is a well-known tension-spline curve-fitting package. Its primary purpose is to construct a smooth function which interpolates a discrete set of data points. The fitting methods in TSPACK are designed to avoid extraneous inflection points and to preserve local shape properties of the data (monotonicity and convexity), or to satisfy the more general constraints of bounds on function values or first derivatives. The package also provides a parametric representation for constructing general planar curves and space curves. The code of TSPACK is written in ANSI-standard FORTRAN.
2.1.1 Natural Cubic Spline

A Natural cubic spline is a spline constructed of piecewise third-order polynomials which pass through a set of control points. The Natural cubic spline provides a very smooth, C²-continuous curve. To generate a cubic spline that interpolates k + 1 points, k cubic spline segments are used. The generalized cubic spline equation for a single parametric cubic spline segment can be written as:

\[
Q(t) = \sum_{i=1}^{4} B_i t^{i-1} = B_1 + B_2 t + B_3 t^2 + B_4 t^3, \qquad t_1 \le t \le t_2, \tag{2.1}
\]
where t₁ and t₂ are the parameter values at the beginning and end of the segment, and Q(t) is the position vector of any point on the cubic spline segment. The constant coefficients B_i are determined by specifying four boundary conditions for the spline segment. Let P₁ and P₂ be the control points (position vectors) at the ends of the spline segment, and let P₁′ and P₂′, the derivatives with respect to t, be the tangents (tangent vectors) at the ends of the segment. Differentiating Eq. (2.1) yields:

\[
Q'(t) = \sum_{i=1}^{4} (i-1) B_i t^{i-2} = B_2 + 2B_3 t + 3B_4 t^2. \tag{2.2}
\]
Assuming that t₁ = 0, and applying the four boundary conditions

\[ Q(0) = P_1, \tag{2.3} \]
\[ Q(t_2) = P_2, \tag{2.4} \]
\[ Q'(0) = P_1', \tag{2.5} \]
\[ Q'(t_2) = P_2', \tag{2.6} \]
yields four equations for the unknown coefficients B_i. Specifically,

\[ Q(0) = B_1 = P_1, \tag{2.7} \]
\[ Q'(0) = \sum_{i=1}^{4}(i-1)B_i t^{i-2}\Big|_{t=0} = B_2 = P_1', \tag{2.8} \]
\[ Q(t_2) = \sum_{i=1}^{4} B_i t^{i-1}\Big|_{t=t_2} = B_1 + B_2 t_2 + B_3 t_2^2 + B_4 t_2^3 = P_2, \tag{2.9} \]
\[ Q'(t_2) = \sum_{i=1}^{4}(i-1)B_i t^{i-2}\Big|_{t=t_2} = B_2 + 2B_3 t_2 + 3B_4 t_2^2 = P_2'. \tag{2.10} \]
Solving for B₃ and B₄ yields:

\[ B_3 = \frac{3(P_2 - P_1)}{t_2^2} - \frac{2P_1'}{t_2} - \frac{P_2'}{t_2}, \tag{2.11} \]
\[ B_4 = \frac{2(P_1 - P_2)}{t_2^3} + \frac{P_1'}{t_2^2} + \frac{P_2'}{t_2^2}. \tag{2.12} \]
These values of B₁, B₂, B₃ and B₄ determine the cubic spline segment. The shape of the segment depends on the position and tangent vectors at the ends of the segment. Note that the value of the parameter t = t₂ at the end of the segment occurs in the results. Substituting the values of B₁, B₂, B₃ and B₄ into Eq. (2.1), we obtain:

\[
Q(t) = P_1 + P_1' t + \left[\frac{3(P_2 - P_1)}{t_2^2} - \frac{2P_1'}{t_2} - \frac{P_2'}{t_2}\right] t^2 + \left[\frac{2(P_1 - P_2)}{t_2^3} + \frac{P_1'}{t_2^2} + \frac{P_2'}{t_2^2}\right] t^3. \tag{2.13}
\]
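Eq. (2.13) is straightforward to evaluate numerically. The sketch below (our own helper, not thesis code) evaluates one segment from its endpoint positions and tangents:

```python
import numpy as np

def cubic_segment(p1, p2, d1, d2, t2, t):
    """Evaluate Eq. (2.13): the cubic segment with endpoints p1, p2,
    end tangents d1, d2, and parameter range 0 <= t <= t2."""
    p1, p2, d1, d2 = map(np.asarray, (p1, p2, d1, d2))
    b3 = 3 * (p2 - p1) / t2**2 - 2 * d1 / t2 - d2 / t2   # Eq. (2.11)
    b4 = 2 * (p1 - p2) / t2**3 + d1 / t2**2 + d2 / t2**2 # Eq. (2.12)
    return p1 + d1 * t + b3 * t**2 + b4 * t**3

# The segment interpolates its endpoints: Q(0) = P1 and Q(t2) = P2.
p = cubic_segment([0.0, 0.0], [1.0, 2.0], [1.0, 0.0], [0.0, 1.0], 1.0, 1.0)
print(p)  # [1. 2.] up to floating-point error
```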
Equation (2.13) is for a single cubic spline segment. To represent a complete curve, adjacent segments are joined together, so the internal tangent vector P₂′ at the joint between two adjacent segments needs to be known. It can be found by imposing a continuity condition at the internal joint. A piecewise spline of degree K has continuity of order K − 1 at the internal joints; thus the cubic spline has second-order continuity. This means that the second derivative P₂′′ is continuous across the joint, i.e., the curvature is continuous at the joint.
Differentiating Eq. (2.1) twice yields:

\[ Q''(t) = \sum_{i=1}^{4}(i-1)(i-2)B_i t^{i-3}, \qquad t_1 \le t \le t_2. \tag{2.14} \]

For the first segment the parameter range is 0 ≤ t ≤ t₂. Evaluating Eq. (2.14) at the end of the segment, where t = t₂, gives:

\[ Q''(t_2) = 6B_4 t_2 + 2B_3. \tag{2.15} \]

For the second segment the parameter range is 0 ≤ t ≤ t₃. Evaluating Eq. (2.14) at the beginning of the second segment, where t = 0, gives:

\[ Q''(0) = 2B_3. \tag{2.16} \]

Equating these two results and using Eqs. (2.7, 2.8) and Eqs. (2.11, 2.12) yields:

\[
6t_2\left[\frac{2(P_1 - P_2)}{t_2^3} + \frac{P_1'}{t_2^2} + \frac{P_2'}{t_2^2}\right]
+ 2\left[\frac{3(P_2 - P_1)}{t_2^2} - \frac{2P_1'}{t_2} - \frac{P_2'}{t_2}\right]
= 2\left[\frac{3(P_3 - P_2)}{t_3^2} - \frac{2P_2'}{t_3} - \frac{P_3'}{t_3}\right]. \tag{2.17}
\]

The left-hand side of Eq. (2.17) represents the curvature at the end of the first segment, and the right-hand side represents the curvature at the beginning of the second segment. Multiplying Eq. (2.17) by t₂t₃ and collecting terms gives:

\[
t_3 P_1' + 2(t_3 + t_2)P_2' + t_2 P_3' = \frac{3}{t_2 t_3}\left[t_2^2(P_3 - P_2) + t_3^2(P_2 - P_1)\right]. \tag{2.18}
\]
Equation (2.18) can be solved for P₂′, the unknown tangent vector at the internal joint. Note that the end values of the parameter t, i.e., t₂ and t₃, occur in Eq. (2.18). These results can be generalized for n data points to give n − 1 piecewise cubic spline segments with position, slope and curvature continuity, i.e., C² continuity at the joints. The generalized equations for any two adjacent segments Q_k(t) and Q_{k+1}(t) are:
Figure 2.1: Two segments of Natural cubic spline; interpolation is done in Euclidean space R².

Figure 2.2: Two segments of Natural cubic spline; interpolation is done in Euclidean space R³.
\[
Q_k(t) = P_k + P_k' t + \left[\frac{3(P_{k+1} - P_k)}{t_{k+1}^2} - \frac{2P_k'}{t_{k+1}} - \frac{P_{k+1}'}{t_{k+1}}\right] t^2 + \left[\frac{2(P_k - P_{k+1})}{t_{k+1}^3} + \frac{P_k'}{t_{k+1}^2} + \frac{P_{k+1}'}{t_{k+1}^2}\right] t^3, \tag{2.19}
\]

is for the first segment, and

\[
Q_{k+1}(t) = P_{k+1} + P_{k+1}' t + \left[\frac{3(P_{k+2} - P_{k+1})}{t_{k+2}^2} - \frac{2P_{k+1}'}{t_{k+2}} - \frac{P_{k+2}'}{t_{k+2}}\right] t^2 + \left[\frac{2(P_{k+1} - P_{k+2})}{t_{k+2}^3} + \frac{P_{k+1}'}{t_{k+2}^2} + \frac{P_{k+2}'}{t_{k+2}^2}\right] t^3, \tag{2.20}
\]
is for the second segment. The parameter t range begins at zero for each segment: for the first segment 0 ≤ t ≤ tk+1 and for the second segment 0 ≤ t ≤ tk+2. For any two adjacent spline segments, equating the second derivatives at the common internal joint, i.e., letting Q′′k(tk+1) = Q′′k+1(0), yields the generalized result, equivalent to Eq. (2.18), i.e.,
tk+2 Pk′ + 2(tk+1 + tk+2) P′k+1 + tk+1 P′k+2
    = (3/(tk+1 tk+2)) [ tk+1²(Pk+2 − Pk+1) + tk+2²(Pk+1 − Pk) ],   1 ≤ k ≤ (n − 2),   (2.21)
is for determining the tangent vector at the internal joint between any two adjacent spline segments Qk(t) and Qk+1(t). Applying Eq. (2.21) recursively over all the spline segments yields (n − 2) equations for the tangents P′k, 2 ≤ k ≤ (n − 1). In matrix form the result is:

[M∗] [P′] = [R],   (2.22)
where

M∗ = | t3   2(t2 + t3)   t2           0            . . .              |
     | 0    t4           2(t3 + t4)   t3           0     . . .        |
     | 0    0            t5           2(t4 + t5)   t4    0    . . .   |
     | ..   ..           ..           ..           ..    ..   ..      |
     | . . .             0            tn    2(tn + tn−1)     tn−1     |,

P′ = [ P1′  P2′  P3′  · · ·  Pn′ ]ᵀ,

R = | (3/(t2 t3)) [ t2²(P3 − P2) + t3²(P2 − P1) ]            |
    | (3/(t3 t4)) [ t3²(P4 − P3) + t4²(P3 − P2) ]            |
    | ..                                                      |
    | (3/(tn−1 tn)) [ tn−1²(Pn − Pn−1) + tn²(Pn−1 − Pn−2) ]  |.
Since there are only (n − 2) equations for the n tangents, [M∗] is not square and thus cannot be inverted to obtain the solution for [P′]; i.e., the problem is indeterminate. By assuming that the end tangents P1′ and Pn′ are known, the problem becomes determinate. The matrix form is now:

[M] [P′] = [R],   (2.23)
where

M = | 1    0            0            . . .                       0    |
    | t3   2(t2 + t3)   t2           0            . . .               |
    | 0    t4           2(t3 + t4)   t3           0     . . .         |
    | ..   ..           ..           ..           ..    ..            |
    | . . .             0            tn    2(tn + tn−1)     tn−1      |
    | 0    . . .                                  0          1        |,

P′ = [ P1′  P2′  P3′  · · ·  Pn′ ]ᵀ,

R = | P1′                                                     |
    | (3/(t2 t3)) [ t2²(P3 − P2) + t3²(P2 − P1) ]            |
    | (3/(t3 t4)) [ t3²(P4 − P3) + t4²(P3 − P2) ]            |
    | ..                                                      |
    | (3/(tn−1 tn)) [ tn−1²(Pn − Pn−1) + tn²(Pn−1 − Pn−2) ]  |
    | Pn′                                                     |.

[M] in Eq. (2.23) is now square and invertible. Fortunately [M] is also tridiagonal,1 which reduces the computational load of inverting it. Further, [M] is diagonally dominant,2 and consequently nonsingular; inversion yields a unique solution. The solution for [P′] is thus:

[P′] = [M]⁻¹ [R].
(2.24)
Once the Pk′ are known, the Bi coefficients for each spline segment can be determined. Generalizing Eqs. (2.7, 2.8) and (2.11, 2.12) yields:

B1k = Pk,   (2.25)
B2k = Pk′,   (2.26)
B3k = 3(Pk+1 − Pk)/tk+1² − 2Pk′/tk+1 − P′k+1/tk+1,   (2.27)
B4k = 2(Pk − Pk+1)/tk+1³ + Pk′/tk+1² + P′k+1/tk+1².   (2.28)
Note that if Pk and Pk′ are vector-valued, e.g., Pk and Pk′ have (x, y, z) components (3-dimensional data), then the coefficients Bi are also vector-valued with the same dimension.
1 A tridiagonal matrix is one in which nonzero coefficients appear only on the main, first upper, and first lower diagonals.
2 In a diagonally dominant matrix the magnitude of the term on the main diagonal exceeds the sum of the magnitudes of the off-diagonal terms in the same row.
In matrix form the equation for the coefficients Bi can be written as follows:

| B1k |   |  1          0          0          0         | | Pk    |
| B2k | = |  0          1          0          0         | | Pk′   |
| B3k |   | −3/tk+1²   −2/tk+1     3/tk+1²   −1/tk+1    | | Pk+1  |
| B4k |   |  2/tk+1³    1/tk+1²   −2/tk+1³    1/tk+1²   | | P′k+1 | .   (2.29)
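Computationally, the heart of this construction is the tridiagonal solve of Eq. (2.24). The sketch below (an illustration, not taken from the thesis; scalar points, end tangents given, function name ours) builds the rows of Eq. (2.21) and solves them with the Thomas algorithm, which is valid here because [M] is tridiagonal and diagonally dominant:

```python
def internal_tangents(P, t, p1_prime, pn_prime):
    """P: n data points; t: n-1 parameter spans (t[k] is the span of
    segment k); end tangents P1', Pn' are assumed known (Eq. 2.23)."""
    n = len(P)
    a = [0.0] * n  # sub-diagonal
    b = [1.0] * n  # main diagonal (rows 0 and n-1 fix the end tangents)
    c = [0.0] * n  # super-diagonal
    r = [0.0] * n
    r[0], r[n - 1] = p1_prime, pn_prime
    for k in range(1, n - 1):
        tk1, tk2 = t[k - 1], t[k]        # t_{k+1}, t_{k+2} of Eq. (2.21)
        a[k] = tk2
        b[k] = 2.0 * (tk1 + tk2)
        c[k] = tk1
        r[k] = (3.0 / (tk1 * tk2)) * (tk1**2 * (P[k + 1] - P[k])
                                      + tk2**2 * (P[k] - P[k - 1]))
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n):
        m = a[k] / b[k - 1]
        b[k] -= m * c[k - 1]
        r[k] -= m * r[k - 1]
    x = [0.0] * n
    x[n - 1] = r[n - 1] / b[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = (r[k] - c[k] * x[k + 1]) / b[k]
    return x
```

For vector-valued points the same solve is applied per component.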
To generate a piecewise cubic spline through n given control points Pk, 1 ≤ k ≤ n, with end tangents P1′ and Pn′, Eq. (2.24) is used to determine the internal tangents Pk′, 2 ≤ k ≤ (n − 1). Then for each piecewise cubic spline segment, the end positions and tangents of that segment are used to determine its coefficients Bi, 1 ≤ i ≤ 4, using Eq. (2.29). Finally the general form of Eq. (2.1) is:
Qk(t) = Σ_{i=1}^{4} Bik t^(i−1),   0 ≤ t ≤ tk+1,   1 ≤ k ≤ (n − 1).   (2.30)

In matrix form Eq. (2.30) can be written as:

Qk(t) = [ 1  t  t²  t³ ] [ B1k  B2k  B3k  B4k ]ᵀ,   0 ≤ t ≤ tk+1.   (2.31)

2.1.2 Cubic Cardinal Spline
In this thesis the term Cardinal spline always refers to the cubic Cardinal spline. A Cardinal spline is a cubic Hermite spline whose tangents are defined by the neighboring points and a tension parameter. This spline creates a curve from one point to another, taking into account the points before and after the current segment, so that successive segments appear to join into one seamless curve [71]. A Cardinal spline segment is defined by four control points. In order to define the boundary conditions of the kth segment, as shown in Fig. 2.3, we use the following conventions:
• Pk−1: beginning point of the (k − 1)th segment.
• Pk: beginning point of the kth segment / ending point of the (k − 1)th segment.
• Pk+1: ending point of the kth segment / beginning point of the (k + 1)th segment.
• Pk+2: ending point of the (k + 1)th segment.
Pk and Pk+1 are the beginning and ending points of the kth segment; Pk−1 and Pk+2 are used to calculate the tangents at Pk and Pk+1. The equations for the boundary conditions of the kth segment can now be written as:
Figure 2.3: A kth Cardinal spline segment, Tension = 0.
Pk′ = (1/2)(1 − T)(Pk+1 − Pk−1),   (2.32)
P′k+1 = (1/2)(1 − T)(Pk+2 − Pk),   (2.33)
where the parameter T is the Tension; it controls how loosely or tightly the spline fits the control points. When T = 0, the Cardinal spline is known as the Catmull-Rom spline, or Overhauser spline [6, 12, 7]. A recursive evaluation algorithm for Catmull-Rom splines is presented in [43]. For k joined segments, there are 2k conditions for continuity of functions and 2k conditions for continuity of slopes. Finally the equation of the Cardinal spline is written as follows:

Qk(t) = (−st³ + 2st² − st)Pk−1 + [(2 − s)t³ + (s − 3)t² + 1]Pk
      + [(s − 2)t³ + (3 − 2s)t² + st]Pk+1 + (st³ − st²)Pk+2,   (2.34)

where t is the parameter of interpolation, 0 ≤ t ≤ 1, and s is related to Tension by s = (1 − T)/2. In order to generate n points between Pk and Pk+1 inclusive, the parameter t is divided into (n − 1) intervals between 0 and 1 inclusive, and Qk(t) is evaluated (interpolated) at the n values of t. Note that a point may be vector-valued, e.g., a point can have (x, y, z) components for 3-dimensional data, but s and T are scalar quantities, i.e., 1-dimensional data. In matrix form Eq. (2.34) can be written as:
Qk(t) = [ t³  t²  t  1 ] MC [ Pk−1  Pk  Pk+1  Pk+2 ]ᵀ,   0 ≤ t ≤ 1,   (2.35)

where MC is the Cardinal matrix:

MC = | −s    2 − s    s − 2    s  |
     | 2s    s − 3    3 − 2s  −s  |
     | −s    0        s        0  |
     |  0    1        0        0  | .   (2.36)
Fig. 2.4 and Fig. 2.5 show multi-segment Cardinal splines in R2 and R3, respectively.
Figure 2.4: Multi-segment cubic Cardinal spline, Tension = 0; interpolation is done in Euclidean space R2.
Figure 2.5: Multi-segment cubic Cardinal spline, Tension = 0; interpolation is done in Euclidean space R3.
2.1.3 Bézier Curve
Bézier curves are named after the French engineer Pierre Bézier, who used them to design the body of Renault cars in the 1970s [41, 42]. Paul de Casteljau developed a robust and numerically stable algorithm for the computation of a Bézier curve [72, 8]. An integer version of the de Casteljau algorithm for NURBS curves is presented in [2]. A Bézier curve is a parametric curve that passes through its end control points, while the shape of the curve is controlled by its middle control points (except for the linear Bézier, which has no middle control points). The continuity of the cubic Bézier is C0; therefore, for multiple segments, each segment can be evaluated independently of its previous and next segments. A continuous multi-segment curve is obtained by taking the first control point of the current segment to be the last control point of the previous segment. A Bézier curve of degree m can be generalized as follows:

Q(t) = Σ_{j=0}^{m} (m choose j) Pj (1 − t)^(m−j) t^j,   0 ≤ t ≤ 1,   (2.37)
where Q(t) is an interpolated point at parameter value t, m is the degree of the Bézier curve, and Pj is the jth control point. To generate n points between the first and last control points inclusive, the parameter t is divided into n − 1 intervals between 0 and 1 inclusive. The equations of linear, quadratic, and cubic Bézier curves can be derived from (2.37) as follows:

Q(t) = (1 − t)P0 + tP1,
(2.38)
Q(t) = (1 − t)² P0 + 2t(1 − t) P1 + t² P2,
(2.39)
Q(t) = (1 − t)³ P0 + 3t(1 − t)² P1 + 3t²(1 − t) P2 + t³ P3.
(2.40)
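De Casteljau's algorithm, mentioned above as the numerically stable way to evaluate Eq. (2.37), reduces to repeated linear interpolation of the control points. A sketch for any degree (illustrative, scalar points for brevity):

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve of degree len(points)-1 at parameter t
    by repeatedly lerping adjacent control points."""
    pts = list(points)
    while len(pts) > 1:
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]
```

With four control points this reproduces the cubic form of Eq. (2.40) without explicitly expanding the Bernstein polynomials.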
A Bézier curve passes through its end control points (ECP), i.e., P0 and P1 for the linear Bézier, P0 and P2 for the quadratic Bézier, and P0 and P3 for the cubic Bézier. The middle control points (MCP), i.e., P1 for the quadratic Bézier and P1 and P2 for the cubic Bézier, determine the shape of the curve (no middle control point exists for the linear Bézier). Figures 2.6 and 2.7 show quadratic Bézier curves, with each point of the curve in Euclidean space R2 and R3, respectively. Figures 2.8 and 2.9 show cubic Bézier curves, with each point of the curve in Euclidean space R2 and R3, respectively. Figure 2.10 shows two connected segments of a cubic Bézier curve, with each point of the curve in Euclidean space R3.
Figure 2.6: Quadratic Bézier curve; interpolation is done in Euclidean space R2.
Figure 2.7: Quadratic Bézier curve; interpolation is done in Euclidean space R3.

Figure 2.8: Cubic Bézier curve; interpolation is done in Euclidean space R2.
Figure 2.9: Cubic Bézier curve; interpolation is done in Euclidean space R3.
Figure 2.10: Two connected cubic Bézier curves; interpolation is done in Euclidean space R3.
2.1.4 Parameterization
Splines are parametric curves, i.e., in order to interpolate the data, the spline is evaluated at given values of the parameter t. Many types of parameterization are used for spline interpolation [14]. We describe only uniform and chord-length parameterization, which are the most widely used.

Uniform unit parameterization: Uniform unit parameterization is the simplest form of parameterization. In order to generate n values of the parameter t, the interval between 0 and 1 inclusive is divided equally into n − 1 subintervals:

ti = (i − 1)/(n − 1),   1 ≤ i ≤ n,   (2.41)
or, computationally more efficiently:

ti = 0,                                  i = 1;
ti = ti−1 + ∆,  where ∆ = 1/(n − 1),     2 ≤ i ≤ n − 1;   (2.42)
ti = 1,                                  i = n.

Uniform unit parameterization is simple and computationally efficient, and there is no need to store the values of t: a single value, the count of interpolating points, is sufficient to generate the n values of t. Therefore we used uniform unit parameterization for video data compression.

Chord-length parameterization: In chord-length parameterization the value of the parameter t is based on the Euclidean distance between points:
ti = 0,                                                              i = 1;
ti = (|p1 p2| + |p2 p3| + . . . + |pi−1 pi|) / (|p1 p2| + |p2 p3| + . . . + |pn−1 pn|),   2 ≤ i ≤ n − 1;   (2.43)
ti = 1,                                                              i = n.
Chord-length parameterization is more accurate than uniform parameterization, but it is computationally expensive and requires storing the values of t (or the original data points), which is not feasible for our video data compression purpose.
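Both parameterizations follow directly from Eqs. (2.41)-(2.43); an illustrative Python sketch (function names are ours):

```python
import math

def uniform_params(n):
    """Eq. (2.41): n equally spaced parameter values in [0, 1]."""
    return [(i - 1) / (n - 1) for i in range(1, n + 1)]

def chord_length_params(points):
    """Eq. (2.43): parameter values proportional to accumulated
    chord length; points are tuples in R^2 or R^3."""
    d = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(d)
    acc, t = 0.0, [0.0]
    for di in d:
        acc += di
        t.append(acc / total)
    return t
```

The contrast in storage cost is visible here: `uniform_params` needs only the count n, whereas `chord_length_params` needs the data points themselves.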
2.2 Spline and Data Fitting
Approximation of data using parametric curves, particularly cubic splines, has been explored by many authors [45]. An algorithm to automatically generate sketches from stereo images using cubic Bézier curves is discussed in [36]. Curve fitting algorithms for approximating the boundaries of characters/fonts are proposed in [24, 55]. Least squares fitting using cubic Bézier curves for font design is proposed by [55]. [24] also uses least squares fitting, but the form of cubic curve used in [24] is not the conventional cubic Bézier, where the curve passes through the first and last control points; [24] solves the least squares equations for all four control points and uses Bézier clipping to find the intersection of joining segments. In both [24, 55], re-parameterization is used to find a better value of the parameter t. Re-parameterization is a computationally expensive process that involves equations of degree 5; given the huge size of video data, using re-parameterization is out of the question.

In traditional curve fitting methods, particularly for font design, a very important factor is generating smooth curves that are visually pleasing. In the approximation of video data, smoothness is not an important factor, because the approximated curve is not the final output to be visually judged. Rather, it is the color or gray level data of a pixel that is to be matched with the original color or gray level data of that pixel. Therefore closeness of fit is desired, so that the approximated video is as close to the original video as possible. Consequently, curve fitting techniques based on smoothness or high-order continuity are in general not very suitable for video data approximation.

Curve fitting algorithms in the domain of image compression have also been investigated by many authors, e.g., [33, 67, 63]. [33] presented a medical image compression algorithm using cubic spline interpolation. A contour data compression method using the Curvature Scale Space technique and Hermite curves is proposed in [67]. [63] proposed a method for compression of 1-D and 2-D image signals using cubic convolution spline interpolation. [46] considers a video as two parts: a space of possible images and a trajectory through that space. [46] assumes that m × n pixel images exist in an mn-dimensional space where each dimension is the intensity at a particular pixel. The method of [46] uses Isomap, a non-linear dimensionality reduction technique, to represent a set of images as a set of points in a low dimensional space, and uses splines to connect the points in the reduced dimension space. We also used splines, but without dimension reduction; unlike [46], we used splines/lines to approximate pixel values in Euclidean space R1 or R3.
Spline representation of the boundaries of objects in an image/video not only gives a compact, compressed representation of the objects but can also be used to segment objects from the scene. [65] proposed a method for automatic segmentation of moving objects in image sequences. The method uses graph labeling over a region adjacency graph (RAG) and classifies regions obtained in an initial partition as foreground or background, based on motion information. The label field is modeled as a Markov random field (MRF). An initial spatial partition of each frame is obtained by a watershed algorithm. The motion of each region is estimated by hierarchical region matching, and each region is tracked to the next frame in the sequence. The algorithm of [65] is quite useful for finding objects in MPEG-4, which relies on a decomposition of each frame of an image sequence into video object planes (VOPs). Another method, by [13], describes an interactive approach to object segmentation in a video scene by fitting splines to graph cuts. In order to extract an object from an input sequence, the user first marks a point on the object and an area around it in a single frame. The object position is tracked forward and backward in time, and the result is a set of discrete points (x, y, t). [13] fits a 3-D spline to this set of points by a greedy algorithm. The spline representation provides not only segmentation but also more control to the user through manipulation of the control points.
2.2.1 Compression of Temporal Video Data by Motion Compensation
Non-spline based methods to approximate/compress temporal video data are called motion estimation (ME) or motion compensation (MC) methods. Motion compensation is an important part of any video compression system [17, 59]. Motion estimation algorithms are based on temporal changes in the intensities of a sequence of frames. In fact, the observed 2-D motion based on intensity changes may not be the same as the actual 2-D motion. To be more precise, the velocity of the observed motion is referred to as optical flow. Optical flow can be caused not only by object motion, but also by camera movement or changes in illumination conditions, so it is quite possible that intensities change without actual motion. As we will see later, our spline based intensity approximation methods are more robust because they work in both situations, i.e., changes in intensities with or without actual motion, whereas conventional motion compensation methods based on block matching depend on the actual motion of an object (block) to find the matching block.

Most of the prevalent motion compensation methods use a translating block matching technique. In this approach, the frame being encoded is divided into rectangular blocks called macro blocks, or simply blocks. For each block, a search is made in the previously reconstructed frame for a block of the same size that closely matches the block being encoded. If the match is successful, the block is encoded using a motion vector (MV), the relative location of the block with respect to the matched block in the previously encoded frame; otherwise the block is encoded without reference to previously encoded blocks [56, 68, 5]. The search area for a macro block match is constrained to p pixels on all four sides of the corresponding macro block in the previous frame; p is called the search parameter. Larger motions require a larger p, and the larger the search parameter, the more computationally expensive motion estimation becomes. Usually the macro block is taken as a square of side 16 pixels, and the search parameter p is 7 or 15 pixels. The matching of one macro block with another is based on the output of a cost function; the macro block that yields the least cost is the one that matches the current block most closely. There are various cost functions, of which the most popular and least computationally expensive is the Mean Absolute Difference (MAD) given by Eq. (1.3); another cost function is the Mean Squared Error (MSE) given by Eq. (1.2). Since our proposed methods are essentially alternatives to conventional motion estimation algorithms, we describe the most important and prevalent MC algorithms below.

Exhaustive Search (ES), or Full Search (FS), is the most computationally expensive block matching algorithm. It calculates the cost function at each possible location in the search window; it therefore finds the best possible match and gives the highest PSNR of any block matching algorithm. The cost of ES can be appreciated from an example in which the frame size is 512 × 512, the block size is 16 × 16, and the search range is 16: the frame contains (512/16)² = 1024 blocks, each compared at (2 × 16 + 1)² = 1089 candidate locations over 16 × 16 = 256 pixels, so for a video sequence with a frame rate of 30 fps the number of operations required per second is roughly 1024 × 1089 × 256 × 30 ≈ 8.55 × 10⁹, an astronomical number. This shows that ES requires intense computation, which poses a challenge to applications using software-only solutions. There have been many research efforts on efficient realization of ES using VLSI chips [27, 4, 44, 21].

Three Step Search (TSS) [26, 29] was proposed to overcome the slowness of ES.
The idea behind TSS is that the error surface due to motion in every macro block is unimodal: a bowl shaped surface such that the weights generated by the cost function increase monotonically away from the global minimum. TSS starts with the search location at the center and sets the step size S = 4 for p = 7. It then searches at eight locations ±S pixels around location (0, 0). From these nine locations searched so far, it picks the one giving the least cost and makes it the new search origin. It then sets the new step size S = S/2 and repeats a similar search for two more iterations until S = 1, at which point the location with the least cost function is found and the macro block at that location is the best match. The concept is explained in Fig. 2.11. The calculated motion vector is then saved for transmission. TSS gives a flat reduction in computation by a factor of 9: for p = 7, ES computes the cost at 225 candidate locations per macro block whereas TSS computes it at 25.
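The TSS procedure with the MAD cost can be sketched as follows (an illustration under assumed conventions, not a reference implementation: grayscale frames as 2-D lists, names ours):

```python
def mad(ref, cur, rx, ry, cx, cy, bs):
    """Mean absolute difference between the bs x bs block of `cur`
    at (cx, cy) and the bs x bs block of `ref` at (rx, ry)."""
    total = 0
    for i in range(bs):
        for j in range(bs):
            total += abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
    return total / (bs * bs)

def three_step_search(ref, cur, bx, by, bs=16, p=7):
    """Return the motion vector for the block of `cur` at (bx, by)."""
    h, w = len(ref), len(ref[0])
    ox, oy = bx, by              # search origin in the reference frame
    step = (p + 1) // 2          # S = 4 when p = 7
    best = mad(ref, cur, ox, oy, bx, by, bs)
    while step >= 1:
        # Nine locations: the origin plus eight at +/- step pixels.
        for x, y in [(ox + dx, oy + dy)
                     for dx in (-step, 0, step) for dy in (-step, 0, step)]:
            if 0 <= x <= w - bs and 0 <= y <= h - bs:
                cost = mad(ref, cur, x, y, bx, by, bs)
                if cost < best:
                    best, ox, oy = cost, x, y
        step //= 2               # S = 4, 2, 1
    return ox - bx, oy - by
```

Ties keep the current origin, so an unchanged block yields the zero motion vector.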
Figure 2.11: Three Step Search procedure. The motion vector is (5, −3).

New Three Step Search (NTSS) [31] improves on TSS by providing a center-biased search scheme and a provision for a halfway stop to reduce computational cost. NTSS was also used for implementing earlier video coding standards like MPEG-1 and H.261. TSS uses a uniformly allocated checking pattern for motion detection and is prone to missing small motions. The NTSS process is illustrated in Fig. 2.12. In the first step, 16 points are checked in addition to the search origin for the lowest weight using a cost function. Of these additional search locations, 8 are at a distance of S = 4 (as in TSS) and the other 8 at S = 1 from the search origin. If the lowest cost is at the origin, the search stops right there and the motion vector is set to (0, 0). If the lowest weight is at any of the 8 locations at S = 1, the origin of the search moves to that point and the weights adjacent to it are checked; depending on which point it is, either 5 or 3 additional points are checked. The location that gives the lowest weight is the closest match, and the motion vector is set to that location. On the other hand, if the lowest weight after the first step was at one of the 8 locations at S = 4, the normal TSS procedure is followed. Hence, although this process needs a minimum of 17 points to check for every macro block, it has a worst case of 33 locations to check.
Figure 2.12: New Three Step Search procedure. Big circles are checking points in the first step of TSS and squares are the extra 8 points added in the first step of NTSS. Triangles and diamonds are the second step of NTSS, showing 3 points and 5 points being checked when the least weight in the first step is at one of the 8 neighbors of the window center.

Four Step Search (4SS) [48, 75], similar to NTSS, employs center-biased searching and has a halfway stop provision. 4SS sets a fixed pattern size of S = 2 for the first step, regardless of the value of the search parameter p; it thus looks at 9 locations in a 5 × 5 window. If the least weight is found at the center of the search window, the search jumps to the fourth step. If the least weight is at one of the eight non-center locations, that location becomes the new search origin and the algorithm moves to the second step, with the search window still maintained at 5 × 5 pixels. Depending on where the least-weight location was, 4SS checks weights at either 3 or 5 locations. Once again, if the least weight location is at the center of the 5 × 5 search window, 4SS jumps to the fourth step; otherwise it moves on to the third step, which is exactly the same as the second. In the fourth step the window size is dropped to 3 × 3, i.e., S = 1; the location with the least weight is the best matching macro block, and the motion vector is set to point to that location. A sample procedure is shown in Fig. 2.13. This search algorithm has a best case of 17 checking points and a worst case of 27.
Figure 2.13: Four Step Search procedure. The motion vector is (3, −7).

Diamond Search (DS) [77] is exactly the same as 4SS, except that the search point pattern is changed from a square to a diamond and there is no limit on the number of steps the algorithm can take. DS uses two fixed patterns: the Large Diamond Search Pattern (LDSP) and the Small Diamond Search Pattern (SDSP). These two patterns and the DS procedure are illustrated in Fig. 2.14. As in 4SS, the first step uses the LDSP, and if the least weight is at the center location the algorithm jumps to the last step. The subsequent steps, except the last, are similar and also use the LDSP, with the cost function checked at either 3 or 5 points, as illustrated in the second and third steps of the procedure shown in Fig. 2.14. The last step uses the SDSP around the new search origin, and the location with the least weight is the best match. Because the search pattern is neither too small nor too big, and because there is no limit on the number of steps, this algorithm can find the global minimum very accurately; the end result is a PSNR close to that of ES at a significantly lower computational expense.
Figure 2.14: Diamond Search procedure, showing the large diamond search pattern and the small diamond search pattern, and an example path to motion vector (−4, −2) in five search steps: four applications of the LDSP and one of the SDSP.

Hierarchical/Multi-resolution Block Matching Algorithm (HBMA). The use of multi-resolution representation for image processing was first introduced by [9]. Various multi-resolution algorithms have been proposed for video coding [28, 57, 30, 66, 35]. Most of them target DCT-based video coding, but there are also algorithms for wavelet-based video coding, such as [34]. A multi-resolution block matching algorithm subsamples the image to smaller sizes at various resolutions by filtering and sub-sampling, as shown in Fig. 2.15. The image at each resolution can be considered a level of a multilevel pyramid. A conventional block matching algorithm, such as TSS or 4SS, is first applied at the highest level of the pyramid. The obtained motion vector is then doubled in size, and further refinement is carried out at the next level. The process is repeated down to the last level. Therefore, with an n-level pyramid, the maximum motion speed w at the highest level is reduced to w/2^(n−1).
Figure 2.15: A three level multi-resolution image for HBMA
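The pyramid construction can be sketched as follows (an illustration, with simple 2 × 2 averaging standing in for the filter-and-subsample step; names are ours):

```python
def build_pyramid(frame, levels):
    """Build a multi-resolution pyramid for HBMA. `frame` is a 2-D
    list whose dimensions stay even at every level; each level halves
    the resolution by averaging 2x2 blocks."""
    pyramid = [frame]
    for _ in range(levels - 1):
        f = pyramid[-1]
        h, w = len(f) // 2, len(f[0]) // 2
        pyramid.append([[(f[2 * y][2 * x] + f[2 * y][2 * x + 1]
                          + f[2 * y + 1][2 * x] + f[2 * y + 1][2 * x + 1]) / 4.0
                         for x in range(w)] for y in range(h)])
    return pyramid
```

Block matching would then run on `pyramid[-1]` first, with each motion vector doubled before refinement at the next finer level.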
2.2.2 Spline Fitting vs. Block Matching
The proposed spline based methods differ from traditional MC methods based on block matching in the following aspects:
1. Our methods work in R1 or R3 space. We use a break-and-fit approach to approximate the change in intensity/color using splines. The traditional methods are based on 2-D block matching and do not use any data fitting model.
2. In our methods the translation of a pixel or block is not considered, which is an essential part of existing methods.
3. In the block matching technique a missing block is reconstructed by reference to existing blocks, while in our methods missing pixel values are obtained by interpolation.
4. Unlike existing methods, the proposed methods do not need to specify 2-D motion vectors; rather, they use 1-D keyframe indices or 1-D counts of interpolating points.
5. Our methods do not need to encode the error between the original data and the fitted data; only the control points of the spline are encoded. Block matching algorithms must encode the block error in addition to the reference block and its motion vector; if the error is not coded, the decoded frame has very poor subjective quality. BMA works at the block level and may cause blocking artifacts [68, 22]. Pixel level fitting does not cause blocking artifacts, but it may cause splining or lining artifacts.
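Point 3 can be illustrated with a toy sketch (ours, not the thesis algorithm itself, which uses splines rather than straight lines): a pixel's intensity over dropped frames is regenerated by interpolating between the stored keyframe values instead of copying blocks via motion vectors:

```python
def reconstruct_pixel(keyframe_idx, keyframe_val, n_frames):
    """Rebuild a pixel's per-frame values from keyframe samples by
    linear interpolation; a spline would replace the line segments."""
    out = [0.0] * n_frames
    pairs = list(zip(keyframe_idx, keyframe_val))
    for (i0, v0), (i1, v1) in zip(pairs, pairs[1:]):
        for f in range(i0, i1 + 1):
            out[f] = v0 + (v1 - v0) * (f - i0) / (i1 - i0)
    return out
```

Only the keyframe indices and values need to be stored; every in-between frame is synthesized at the decoder.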
2.3 Image Compression
The two broad classes of images are natural and synthetic. Natural images occur in nature and are captured by digital devices such as cameras. The pixel intensities of natural images vary smoothly, and there is correlation between neighboring pixel values. Synthetic images are computer generated, and include animation, clip-art, cartoons, medical images, text images, maps, etc. The pixel intensities of synthetic images do not vary smoothly; they contain a small discrete set of values, with large areas (quadrants) of uniform color and sharp changes in color. Due to extensive research on lossy compression of natural images, several techniques based on the discrete cosine transform (JPEG), the wavelet transform (JPEG2000), and fractal coding have been developed. The advancement and availability of new software tools to create and generate synthetic images has raised interest in the research community in investigating and devising better techniques for compression of synthetic images, since synthetic images cannot be compressed well using lossy compression techniques [19]. The two most widely used formats for lossless compression are GIF and PNG; a concise overview of each follows.

Graphics Interchange Format. The Graphics Interchange Format (GIF) is a lossless 8-bit-per-pixel bitmap image format that was introduced by CompuServe in 1987 [73, 19]. GIF images are compressed using Lempel-Ziv-Welch (LZW) coding [69], a dictionary-based technique that exploits redundancy. The initial size of the dictionary is 2⁹; when this fills up, the dictionary size is doubled, until the maximum dictionary size of 4096 is reached, after which the compression algorithm behaves like a static dictionary algorithm. A more detailed description can be found in [37, 18]. GIF is widely used for lossless compression of both natural and synthetic images. While GIF works well with synthetic images and pseudo-color or color-mapped images, it is generally not the most efficient way to compress natural images, photographs, or satellite images [56]. LZW coding, as used in GIF, scans pixels from left to right, top to bottom; therefore horizontal patterns are compressed effectively but vertical patterns are not [64].

Portable Network Graphics. Portable Network Graphics (PNG) is a bitmap image format that employs lossless data compression. PNG was created to improve upon and replace the GIF format with an image-file format not requiring a patent license [74]. The compression algorithm used in PNG is based on LZ77 [78], a dictionary-based compression technique; PNG uses the deflate [52] implementation of LZ77. At each step the encoder examines three bytes; if it cannot find a match of three bytes, it emits the first byte and examines the next three. So at each step it emits either a single literal byte or a pair ⟨match length, offset⟩. The alphabets of the literals and match lengths are combined to form an alphabet of size 286 (indexed 0-285): indices 0-255 represent literal bytes, index 256 is an end-of-block symbol, and the remaining 29 indices represent codes for ranges of match lengths between 3 and 258. A more detailed description, along with the standard tables for representing match lengths and the Huffman codes, can be found in [37, 47]. For most images, PNG achieves greater compression than GIF.

A quadtree based image compression method for cartoon images is presented by [64]. In this method an RGB image is first converted into an indexed image; an indexed image consists of a color palette and indices that refer to the palette.
[64] then applies quadtree decomposition to the image. The method relies on the quadtree alone and therefore suffers when the block size is very small. In addition, due to the limited size of the color palette, it is only applicable to images with at most 256 colors, such as cartoon images. Our method differs from conventional quadtree based image compression in several ways: (1) we impose a minimum block size constraint; (2) we separate homogeneous and non-homogeneous quadrants; (3) in addition to the quadtree, we also use parametric lines.
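The quadrant logic can be sketched as follows (our illustration, not the exact method of [64] or of this thesis: a recursive decomposition with a homogeneity test and a minimum block size):

```python
def quadtree(img, x, y, size, min_size, out):
    """Append (x, y, size, value-or-None) quadrant records to `out`.
    Homogeneous quadrants store their single value; non-homogeneous
    quadrants at the minimum size are flagged for separate coding."""
    block = [img[y + i][x + j] for i in range(size) for j in range(size)]
    if all(v == block[0] for v in block):
        out.append((x, y, size, block[0]))   # homogeneous quadrant
    elif size <= min_size:
        out.append((x, y, size, None))       # non-homogeneous, kept whole
    else:
        h = size // 2
        for dy in (0, h):                    # recurse into four quadrants
            for dx in (0, h):
                quadtree(img, x + dx, y + dy, h, min_size, out)
    return out
```

Separating the two kinds of records is what allows the homogeneous quadrants to be stored as a single palette index each.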
Chapter 3
DATA FITTING SCHEMES

This chapter describes the proposed data fitting algorithms using several kinds of spline. The organization of this chapter is as follows. Section 3.1 addresses the importance of fitting from a data compression perspective. Section 3.2 defines the basic terminology of the fitting process. A general strategy for fitting given data with a parametric spline is described in Section 3.3. The sections that follow discuss the specific issues related to each kind of spline we used in the fitting of video data.
3.1 Fitting and Compression
Our objective in fitting video data with splines is to compress the video data. The basic principle of fitting via a parametric spline is to approximate a large number of points via spline interpolation using only a few control points. Once a segment is approximated by a spline, we have to store the control points and the number of points in the segment. Saving the number of points in the original data ensures that later we can regenerate, via spline interpolation, exactly as many points as in the original data. Interpolation could generate fewer or more points than the original data contains, but for video data we want to regenerate an equal number of frames, so this count is essential to store. In general, the number of control points is far smaller than the number of points in the original data; therefore we compress the data. Figure 3.1 shows how a spline (Catmull-Rom) fits data. This compression comes at the cost of accuracy. Since the interpolated data depends on the control points, it is very important to find or select
appropriate control points that minimize the approximation error. In addition, our objective is to find the minimal number of control points, so as to achieve higher compression.

Figure 3.1: Fitting of data by a cubic Catmull-Rom spline. (The plot shows the original data, the fitted data and the control points over 80 frames.)
3.2 Fitting Terminologies and Notations
Original Data: A set of points O = {p1, p2, . . . , pn}, for example the intensity or RGB values of a pixel in a sequence of n frames. This is the data to be approximated by a spline.

Interpolated/Fitted/Approximated Data: A set of points Q = {q1, q2, . . . , qn} obtained from spline interpolation (fitting, approximation), such that Q(ti) = qi. In our fitting schemes the numbers of points in O and Q are always equal, and qi is the approximated point corresponding to pi. An important goal of fitting is to minimize the MSE between O and Q.

Breakpoints: A set of points BP = {bp1, bp2, . . . , bpm} taken from O, i.e., BP ⊆ O; usually m ≪ n (n is the number of points in the original data, m is the number
of breakpoints). We always order breakpoints in increasing order with respect to their position/index in O. The interpolated curve must pass through the breakpoints. The first and last points of the original data are always taken as breakpoints. In addition, we often take breakpoints at regular intervals; these breakpoints make the fitting process more efficient and accurate. In video data, a breakpoint is simply the value of a pixel of the original video data at a certain spatial location and a certain frame. A control point may or may not be a breakpoint: in a Bézier curve, for example, the end control points are breakpoints but the middle control points are not.

Break-indices: Break-indices BI = {bi1, bi2, . . . , bim} are the indices of the breakpoints with respect to their positions in O. For example, if there are 50 points in O = {p1, p2, . . . , p50} and we take a breakpoint after every 15th point (plus the last point, even if it is not the 15th point from its predecessor), then BP = {p1, p16, p31, p46, p50} and BI = {1, 16, 31, 46, 50}.

Count of points in a segment: Simply the count of points in a segment, i.e., ck = (bik+1 − bik) + 1.

Segment: A segment is the set of points (original or interpolated) between two consecutive breakpoints. If the data is divided into l segments then SO = {SO1, SO2, . . . , SOl} and SQ = {SQ1, SQ2, . . . , SQl} are the sets of original and interpolated segments respectively. SOk = {pbik, pbik+1, . . . , pbik+ck−1} and SQk = {qbik, qbik+1, . . . , qbik+ck−1} are the kth segments of the original and interpolated data respectively. For example, SO1 = {p1, p2, . . . , p16} and SO2 = {p16, p17, . . . , p31} are the first and second segments of the original data; similarly SQ1 = {q1, q2, . . . , q16} and SQ2 = {q16, q17, . . . , q31} are the first and second segments of the approximated data. Note that in this notation bik+1 (the next break-index) and bik + 1 (the next index in O) are different, and that bik + ck − 1 = bik+1, so pbik+ck−1 = pbik+1 and qbik+ck−1 = qbik+1.
In order to get a continuous curve we take the first point of the (k + 1)th segment to be the same as the last point of the kth segment. The numbers of segments in the original and approximated data are equal, and the number of segments is one less than the number of breakpoints.

Error: The error between the original data O and the fitted data Q. Error is measured at the pixel level
and the segment level.

1. Pixel level error (ξ): Error is measured between each pixel of the original data and its corresponding point (pixel) of the approximated data. Squared distance (SD) or absolute distance (AD) can be used to measure pixel level error. The error between the value of the original pixel pi and the value of the approximated pixel qi is represented as:

ξi = Error(pi, qi),  1 ≤ i ≤ n,  (3.1)

where n is the number of pixels in the data. The error between O and Q at the breakpoints is zero because the approximated data always passes through the breakpoints.

2. Segment level error (λ): Error is computed for each segment independently. Segment level error provides information about the error of the segment; informally, it is the average of all pixel errors in a segment. Error can be measured as mean squared error (MSE), mean absolute error (MAE), etc. The segment level error between the kth original segment SOk and the kth approximated segment SQk is represented as:

λk = Error(SOk, SQk),  1 ≤ k ≤ l,  (3.2)

where l is the number of segments in the data. For computational efficiency, the pixel level error calculations are reused to compute the segment level errors; if a segment splits, the error is recomputed only for those approximated pixels/segments that are modified.

Maximum error: The maximum value of the error between the original data and the fitted data. Maximum error is measured at the pixel level and the segment level.

1. Pixel level maximum error (ξmax): The maximum error at the pixel level among the errors of all pixels (complete data):

ξmax = Max(ξ1, ξ2, . . . , ξn).  (3.3)

2. Segment level maximum error (λmax): The maximum error at the segment level among the errors of all segments:

λmax = Max(λ1, λ2, . . . , λl).  (3.4)
3. Pixel level maximum error for the segment of maximum error (ξmax_λmax): The maximum error at the pixel level for the segment that produces the maximum segment error. If λmax = λk = Error(SOk, SQk), where

SOk = {pu, pu+1, . . . , pu+ck−1},  SQk = {qu, qu+1, . . . , qu+ck−1},  u = bik,  (3.5)

then

ξi = Error(pi, qi),  u ≤ i ≤ u + ck − 1,  (3.6)

ξmax_λmax = Max(ξu, ξu+1, . . . , ξu+ck−1).  (3.7)
Limit/Tolerance/Threshold of error: The maximum limit, or upper bound, of the error between the original data and the fitted data; the error between the original and fitted data must be less than or equal to this limit. The limit of error can be applied at the pixel level or at the segment level.

1. Limit of error at pixel level (ξlmt): The error constraint is applied at the pixel level; for example, the error of each approximated pixel is bounded by a maximum allowed SD or AD.

2. Limit of error at segment level (λlmt): The error constraint is applied at the segment level; for example, the error of each approximated segment is bounded by a maximum allowed MSE or MAE.
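The bookkeeping behind this notation can be made concrete with a small sketch. The helper names below are ours, and break-indices are 0-based here rather than 1-based as in the text.

```python
# Illustrative bookkeeping for the fitting notation (hypothetical helpers;
# 0-based break-indices).
def segment_counts(BI):
    """c_k = (bi_{k+1} - bi_k) + 1 for each of the l = m - 1 segments."""
    return [BI[k + 1] - BI[k] + 1 for k in range(len(BI) - 1)]

def fitting_errors(O, Q, BI):
    """Pixel level squared errors xi_i and per-segment MSE lambda_k."""
    xi = [(p - q) ** 2 for p, q in zip(O, Q)]
    lam = []
    for k in range(len(BI) - 1):
        seg = xi[BI[k]:BI[k + 1] + 1]      # consecutive segments share a point
        lam.append(sum(seg) / len(seg))
    return xi, lam
```

With BI = {0, 15, 30, 45, 49} (the 0-based version of the example above), the segment counts come out as 16, 16, 16 and 5, matching ck = (bik+1 − bik) + 1.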
3.3 Fitting Strategy
Suppose we have a set of points (original data) O = {p1, p2, . . . , pn} and we want to approximate O using a spline. As input, we specify the value of the limit of error and provide an initial set of breakpoints. At least two breakpoints are required, namely the first and last points of the original data. The fitting process then begins. We generate n points (approximated data) Q = {q1, q2, . . . , qn} using spline interpolation such that the spline passes through the breakpoints. Then we measure the error between the original and approximated (fitted) data. If the approximated data is not close enough to the original data, i.e., the limit of error is violated, then an existing segment splits (breaks) into two segments at a point called the new breakpoint. After splitting, the number of segments increases by one (the split segment is replaced by two new segments) and the number of breakpoints also increases by one (the new breakpoint is added to the set of existing breakpoints). The selection of the splitting segment and the new breakpoint depends on where the limit-of-error constraint is applied. If the constraint is applied at the pixel level, it is straightforward to select the pixel where the error between the original and approximated pixels is maximum, and this pixel is added to the set of breakpoints. If the constraint is applied at the segment level, we first find the segment where the error between the original and approximated segments is maximum; within this maximum-error segment we then select as breakpoint the pixel with the locally maximum error. Note that segments other than the maximum-error segment may contain individual pixels with higher error than this locally maximum-error pixel. Let bpnew be the new breakpoint, and let the kth segment split into two new segments k1 and k2.
There are now two cases:
1. If the limit of error is at the pixel level:

ξmax = ξi = Error(pi, qi),  1 ≤ i ≤ n,  pi ∈ SOk,  qi ∈ SQk,  (3.8)

bpnew = pi.

2. If the limit of error is at the segment level:

λmax = λk = Error(SOk, SQk),  SOk = {pu, pu+1, . . . , pu+ck−1},  SQk = {qu, qu+1, . . . , qu+ck−1},  u = bik,  (3.9)

ξmax_λmax = ξi = Error(pi, qi),  u ≤ i ≤ u + ck − 1,

bpnew = pi.

After splitting, the same fitting procedure is repeated with the updated set of segments and breakpoints until the error is less than or equal to the predefined limit of error. We call this fitting technique the break-and-fit strategy. Figure 3.2 shows the flow chart of the fitting technique. Algorithm 1 and Algorithm 2 give the pseudocode of the fitting technique for error bounds at the pixel level and the segment level respectively.

Algorithm 1 Break-and-fit spline fitting, pixel level error bound
Require: O = {p1, p2, . . . , pn}, BP = {bp1, bp2, . . . , bpm}, ξlmt.
1: Fit the spline using BP
2: Find ξmax
3: while ξmax > ξlmt do
4:   Find bpnew ∈ kth segment
5:   Add bpnew to BP
6:   Split the kth segment into segments k1 and k2
7:   Fit the spline using the updated BP
8:   Find the new ξmax
9: end while
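Algorithm 1 can be sketched in a few lines of code. Here linear interpolation stands in for the spline (any interpolant that passes through the breakpoints can be plugged into the same loop); `np.interp` and the function name are our assumptions, not the thesis implementation.

```python
import numpy as np

# Sketch of Algorithm 1 (break-and-fit, pixel level error bound), with linear
# interpolation as a stand-in for the spline.
def break_and_fit(O, xi_lmt):
    O = np.asarray(O, dtype=float)
    n = len(O)
    BI = [0, n - 1]                              # first and last points
    while True:
        Q = np.interp(np.arange(n), BI, O[BI])   # fit through the breakpoints
        xi = (O - Q) ** 2                        # pixel level squared error
        i = int(np.argmax(xi))
        if xi[i] <= xi_lmt:                      # xi_max within the limit
            return BI, Q
        BI = sorted(BI + [i])                    # split: insert new breakpoint
```

The returned break-indices and the per-segment point counts derived from them are exactly the data that must be stored for decoding.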
Figure 3.2: Flow chart of the fitting algorithm. (Inputs: error threshold T1, breakpoint interval, and the temporal data of a pixel. Steps: divide the data into segments using breakpoints; fit the spline to the (updated) segments; compute the maximum error of fit T2; if T2 > T1, insert a new breakpoint at the point of maximum error and refit; otherwise stop.)

Algorithm 2 Break-and-fit spline fitting, segment level error bound
Require: O = {p1, p2, . . . , pn}, BP = {bp1, bp2, . . . , bpm}, λlmt.
1: Fit the spline using BP
2: Find λmax
3: while λmax > λlmt do
4:   Find ξmax_λmax and bpnew ∈ kth segment
5:   Add bpnew to BP
6:   Split the kth segment into segments k1 and k2
7:   Fit the spline using the updated BP
8:   Find the new λmax
9: end while
3.3.1 Natural Cubic Spline Fitting
To fit data by a Natural cubic spline (NCS), the breakpoints are taken as the control points at the beginning and end of each segment. For l segments, the 4l unknown coefficients B1j, B2j, B3j, B4j, 1 ≤ j ≤ l, are solved using 2l conditions of continuity of the functions, (l − 1) conditions of continuity of slopes, and (l − 1) conditions of continuity of second derivatives. The two remaining conditions come from the end tangent vectors of the spline, P′1 and P′l, which are assumed known and taken as 0.

Split of a segment: A Natural cubic spline has second-order continuity C2, i.e., the curvature is continuous at the joints. Consequently, splitting a segment of an NCS requires recomputing the coefficient values of all segments, which is computationally very expensive. An NCS has no middle control points, so the data to be stored consists of the breakpoints and the counts of interpolating points.
Data to be stored
• Breakpoints.
• Count of interpolating points for each segment.
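A minimal sketch of NCS fitting through a set of breakpoints, assuming SciPy is available (the thesis instead solves the 4l-coefficient system directly). The zero end tangents described above correspond to SciPy's `bc_type='clamped'`.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sketch of NCS fitting through breakpoints (assumes SciPy; illustrative name).
# bc_type='clamped' imposes the zero end tangents P'_1 = P'_l = 0.
def ncs_fit(BI, bp_values, n):
    """Interpolate n points through breakpoints at indices BI."""
    spline = CubicSpline(BI, bp_values, bc_type='clamped')  # zero end slopes
    return spline(np.arange(n))
```

Because every segment's coefficients depend on all breakpoints, the whole spline must be rebuilt after each split, which is exactly the cost discussed above.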
3.3.2 Cubic Cardinal Spline Fitting (Fixed Tension)
To fit data by a Cardinal spline, the breakpoints are taken as the control points of the Cardinal spline. In addition, a fixed value of Tension (a value between 0 and 1) can be given as an initial input. A separate value of Tension for each segment could also be used, but this requires some criterion for finding the optimal Tension of each segment, and the Tension values must then be saved. In fixed-Tension Cardinal spline approximation of the kth segment, we take four consecutive breakpoints bpk−1, bpk, bpk+1 and bpk+2 as the four control points of the kth Cardinal spline segment (see Fig. 2.3). Recall that a Cardinal spline interpolates between the two middle control points; therefore, for the kth segment, the in-between points are generated between bpk and bpk+1.

Split of a segment: For three consecutive segments k − 1, k and k + 1, the last three control points of the (k − 1)th segment, i.e., bpk−1, bpk and bpk+1, are the same as the
first three control points of the kth segment, and the first control point of the (k − 1)th segment, i.e., bpk−2, is not shared with the kth segment. Similarly, the first three control points of the (k + 1)th segment, i.e., bpk, bpk+1 and bpk+2, are the same as the last three control points of the kth segment, and the last control point of the (k + 1)th segment, i.e., bpk+3, is not shared with the kth segment. Thus a segment shares control points with its previous and next segments. Consequently, adding a new breakpoint (splitting) in a segment requires recomputing the interpolated values for the two new segments obtained from the split; in addition, the previous and next segments of the split segment also need to be interpolated again. Table 3.1 gives the details of the control points in the splitting process. From a computational point of view, splitting a segment of a Cardinal spline is far less expensive than splitting a segment of a Natural cubic spline, but a little more expensive than for a Bézier curve. One further issue is how to obtain four control points for the first and last segments, since they have no previous and next segments respectively. We opted to repeat the first and last breakpoints in the set of breakpoints; in this way we can automatically generate in-between points for all breakpoints from the available set of breakpoints.

Table 3.1: Splitting (Breaking) of a Segment in Cardinal Spline

Before split:
Segment k − 1:  bpk−2  bpk−1  bpk    bpk+1
Segment k:      bpk−1  bpk    bpk+1  bpk+2
Segment k + 1:  bpk    bpk+1  bpk+2  bpk+3

After the kth segment splits into k1 and k2 (bpnew is created between bpk and bpk+1):
Segment k − 1:  bpk−2  bpk−1  bpk    bpnew
Segment k1:     bpk−1  bpk    bpnew  bpk+1
Segment k2:     bpk    bpnew  bpk+1  bpk+2
Segment k + 1:  bpnew  bpk+1  bpk+2  bpk+3
Data to be stored
• Breakpoints.
• Count of interpolating points for each segment.
• Single value of Tension for all segments.
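The fixed-Tension interpolation step can be sketched as follows, using the standard cubic Cardinal basis (written out as Eq. (3.10) in the next subsection) and the endpoint-duplication trick described above. Function names are illustrative.

```python
import numpy as np

# Sketch of fixed-Tension Cardinal interpolation (s is the tension coefficient).
def cardinal_segment(P0, P1, P2, P3, s, count):
    """Generate `count` points from P1 to P2 inclusive."""
    t = np.linspace(0.0, 1.0, count)
    t2, t3 = t * t, t * t * t
    return (s * (-t3 + 2 * t2 - t) * P0
            + ((2 - s) * t3 + (s - 3) * t2 + 1) * P1
            + ((s - 2) * t3 + (3 - 2 * s) * t2 + s * t) * P2
            + (s * t3 - s * t2) * P3)

def cardinal_fit(bp, counts, s=0.5):
    """Chain all segments; the first/last breakpoints are duplicated, as above."""
    ext = [bp[0]] + list(bp) + [bp[-1]]
    out = []
    for k in range(len(bp) - 1):
        seg = cardinal_segment(ext[k], ext[k + 1], ext[k + 2], ext[k + 3],
                               s, counts[k])
        out.extend(seg if k == 0 else seg[1:])   # segments share a joint point
    return out
```

Because segment k reuses control points of segments k − 1 and k + 1, a split forces those neighbors to be re-interpolated as well, matching Table 3.1.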
3.3.3 Cubic Cardinal Spline Least Square Fitting
The fitting of data by least square cubic Cardinal spline is similar to fixed-Tension Cardinal spline fitting, except that the value of Tension is determined for each segment by a least square solution.

Cardinal Spline Least Square Solution

Q(t_i) = [-s t_i^3 + 2s t_i^2 - s t_i] P_0 + [(2 - s) t_i^3 + (s - 3) t_i^2 + 1] P_1 + [(s - 2) t_i^3 + (3 - 2s) t_i^2 + s t_i] P_2 + [s t_i^3 - s t_i^2] P_3
       = [s(-t_i^3 + 2t_i^2 - t_i)] P_0 + [s(-t_i^3 + t_i^2)] P_1 + [2t_i^3 - 3t_i^2 + 1] P_1 + [s(t_i^3 - 2t_i^2 + t_i)] P_2 + [-2t_i^3 + 3t_i^2] P_2 + [s(t_i^3 - t_i^2)] P_3.  (3.10)

Let

A_i = t_i^3 - 2t_i^2 + t_i,  (3.11)
B_i = t_i^3 - t_i^2,  (3.12)
C_i = 2t_i^3 - 3t_i^2.  (3.13)

Then Eq. (3.10) can be written as:

Q(t_i) = -s A_i P_0 - s B_i P_1 + (C_i + 1) P_1 + s A_i P_2 - C_i P_2 + s B_i P_3
       = s[(P_2 - P_0) A_i + (P_3 - P_1) B_i] + [(P_1 - P_2) C_i + P_1].  (3.14)

Let

D_i = (P_2 - P_0) A_i + (P_3 - P_1) B_i,  (3.15)
E_i = (P_1 - P_2) C_i + P_1.  (3.16)

Then Eq. (3.14) can be written as:

Q(t_i) = D_i s + E_i.  (3.17)

The sum of squared distances between the original data and the approximated data is:

X = Σ_{i=1}^{n} [Q(t_i) - p_i]^2.  (3.18)

Substituting the value of Q(t_i) from Eq. (3.17) in Eq. (3.18) gives

X = Σ_{i=1}^{n} [(D_i s + E_i) - p_i]^2.  (3.19)

The least square solution for s follows from

∂X/∂s = 0,  (3.20)

∂/∂s Σ_{i=1}^{n} [(D_i s + E_i) - p_i]^2 = 0,  (3.21)
2 Σ_{i=1}^{n} [(D_i s + E_i) - p_i] ∂/∂s [(D_i s + E_i) - p_i] = 0,  (3.22)
2 Σ_{i=1}^{n} [(D_i s + E_i) - p_i] D_i = 0,  (3.23)
Σ_{i=1}^{n} [(D_i s + E_i) - p_i] D_i = 0,  (3.24)
Σ_{i=1}^{n} [D_i^2 s + D_i E_i - D_i p_i] = 0,  (3.25)

s = ( Σ_{i=1}^{n} D_i p_i - Σ_{i=1}^{n} D_i E_i ) / Σ_{i=1}^{n} D_i^2.  (3.26)
Data to be stored
• Breakpoints.
• Count of interpolating points for each segment.
• Separate value of Tension for each segment.
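The per-segment least-square Tension of Eq. (3.26) can be computed directly from the definitions above. The function name is illustrative.

```python
import numpy as np

# Sketch of the per-segment least-square Tension, following Eqs. (3.11)-(3.16)
# and (3.26).
def lsq_tension(P0, P1, P2, P3, p, t):
    """Return s minimizing sum_i [Q(t_i) - p_i]^2 over one segment."""
    A = t**3 - 2 * t**2 + t                  # Eq. (3.11)
    B = t**3 - t**2                          # Eq. (3.12)
    C = 2 * t**3 - 3 * t**2                  # Eq. (3.13)
    D = (P2 - P0) * A + (P3 - P1) * B        # Eq. (3.15)
    E = (P1 - P2) * C + P1                   # Eq. (3.16)
    return float((np.sum(D * p) - np.sum(D * E)) / np.sum(D * D))  # Eq. (3.26)
```

For data sampled exactly from a Cardinal segment with some tension s, the formula recovers that s exactly, since X is quadratic in s with a unique minimum.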
3.3.4 Bézier Curve Fitting
To fit data by a Bézier curve, adjacent breakpoints are taken as the end control points at the beginning and end of each segment. If there are n points in a segment then the parameter t is divided into n − 1 uniform intervals in 0 ≤ t ≤ 1. A Bézier curve has C0 continuity, therefore each segment is fitted independently of the other segments. A linear Bézier curve consists of only its two end control points, so no further control points are required in fitting; however, the middle control points, i.e., P1 for a quadratic Bézier and P1 and P2 for a cubic Bézier, must be determined.

Least-square fitting: We used the least square method to find the middle control points. The least square method gives the values of the middle control points that minimize the squared distance between the original and fitted data. If p_i and P(t_i) are the values of the original and approximated (interpolated) points respectively, such that 0 ≤ t_i ≤ 1, 1 ≤ i ≤ n, P(0) = p_1 and P(1) = p_n, then the least square objective is:

S = Σ_{i=1}^{n} [p_i - P(t_i)]^2.  (3.27)

For a quadratic Bézier, Eq. (3.27) becomes:

S = Σ_{i=1}^{n} [p_i - (1 - t_i)^2 P_0 - 2t_i(1 - t_i) P_1 - t_i^2 P_2]^2.  (3.28)

For a quadratic Bézier, P_1 is determined by

∂S/∂P_1 = 0;  (3.29)

solving for P_1 gives:

P_1 = Σ_{i=1}^{n} 2t_i(1 - t_i) [p_i - (1 - t_i)^2 P_0 - t_i^2 P_2] / Σ_{i=1}^{n} 4t_i^2 (1 - t_i)^2.  (3.30)

For a cubic Bézier, Eq. (3.27) becomes:

S = Σ_{i=1}^{n} [p_i - (1 - t_i)^3 P_0 - 3t_i(1 - t_i)^2 P_1 - 3t_i^2(1 - t_i) P_2 - t_i^3 P_3]^2.  (3.31)

For a cubic Bézier, P_1 and P_2 are determined by

∂S/∂P_1 = 0,  (3.32)
∂S/∂P_2 = 0.  (3.33)

Solving for P_1 and P_2 gives:

P_1 = (A_2 C_1 - A_12 C_2)/(A_1 A_2 - A_12 A_12),  (3.34)
P_2 = (A_1 C_2 - A_12 C_1)/(A_1 A_2 - A_12 A_12),  (3.35)

where

A_1 = 9 Σ_{i=1}^{n} t_i^2 (1 - t_i)^4,  (3.36)
A_2 = 9 Σ_{i=1}^{n} t_i^4 (1 - t_i)^2,  (3.37)
A_12 = 9 Σ_{i=1}^{n} t_i^3 (1 - t_i)^3,  (3.38)
C_1 = Σ_{i=1}^{n} 3t_i (1 - t_i)^2 [p_i - (1 - t_i)^3 P_0 - t_i^3 P_3],  (3.39)
C_2 = Σ_{i=1}^{n} 3t_i^2 (1 - t_i) [p_i - (1 - t_i)^3 P_0 - t_i^3 P_3].  (3.40)
Split of a segment: In a multi-segment fitting, each Bézier curve is fitted independently. Therefore, splitting a segment of a Bézier curve is local: when a segment is split, it is replaced by the two newly created segments, which are fitted like the other segments without affecting any other segment. Because splitting is local, it is computationally more efficient for a Bézier curve than for a Natural cubic spline or a Cardinal spline.
Data to be stored
• Breakpoints (End Control Points).
• Middle Control Points (for quadratic and cubic Bézier).
• Count of interpolating points for each segment.
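The cubic middle control points of Eqs. (3.34)-(3.40) can be computed in a few lines (a sketch with illustrative names; P0 and P3 are the end control points, i.e., the breakpoints of the segment).

```python
import numpy as np

# Sketch of the cubic Bézier middle control points via Eqs. (3.34)-(3.40).
def cubic_bezier_mcp(P0, P3, p, t):
    """Return (P1, P2) minimizing the squared fitting error S of Eq. (3.31)."""
    u = 1.0 - t
    A1 = 9 * np.sum(t**2 * u**4)             # Eq. (3.36)
    A2 = 9 * np.sum(t**4 * u**2)             # Eq. (3.37)
    A12 = 9 * np.sum(t**3 * u**3)            # Eq. (3.38)
    r = p - u**3 * P0 - t**3 * P3            # residual w.r.t. the end points
    C1 = np.sum(3 * t * u**2 * r)            # Eq. (3.39)
    C2 = np.sum(3 * t**2 * u * r)            # Eq. (3.40)
    den = A1 * A2 - A12 * A12
    return (A2 * C1 - A12 * C2) / den, (A1 * C2 - A12 * C1) / den
```

For data sampled exactly from a cubic Bézier curve, these normal equations recover the middle control points exactly.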
3.3.5 Fitting Algorithms Revisited
Having explained the splitting of a segment for the Natural cubic spline, Cardinal spline and Bézier curve, we can now refine the "Fit spline using updated BP" step. The modified algorithms are as follows:

Algorithm 3 Break-and-fit spline fitting, pixel level error bound
Require: O = {p1, p2, . . . , pn}, BP = {bp1, bp2, . . . , bpm}, ξlmt.
1: Fit the spline using BP
2: Find ξmax
3: while ξmax > ξlmt do
4:   Find bpnew ∈ kth segment
5:   Add bpnew to BP
6:   Split the kth segment into segments k1 and k2
7:   Using the updated BP: fit all segments (Natural cubic spline); fit segments k − 1, k1, k2, k + 1 (Cardinal spline); fit segments k1, k2 (Bézier curve)
8:   Find the new ξmax
9: end while
3.3.6 Experiments and Discussion
We now take an example from real video data and apply the algorithm to the variation of the luminance data of a pixel over a sequence of frames. Table 3.2 gives the details of the input data.

Algorithm 4 Break-and-fit spline fitting, segment level error bound
Require: O = {p1, p2, . . . , pn}, BP = {bp1, bp2, . . . , bpm}, λlmt.
1: Fit the spline using BP
2: Find λmax
3: while λmax > λlmt do
4:   Find ξmax_λmax and bpnew ∈ kth segment
5:   Add bpnew to BP
6:   Split the kth segment into segments k1 and k2
7:   Using the updated BP: fit all segments (Natural cubic spline); fit segments k − 1, k1, k2, k + 1 (Cardinal spline); fit segments k1, k2 (Bézier curve)
8:   Find the new λmax
9: end while

Table 3.2: Details of a pixel data
Video Name:              Highway
No. of Frames:           80
Pixel Spatial Location:  (50,50)
Format:                  YUV
Sampling:                4:2:0

Figures 3.3, 3.4, 3.6, 3.7 and 3.8 show the fitting of the input data with the specified parameters using Natural cubic spline, Cardinal spline, linear Bézier, quadratic Bézier and cubic Bézier respectively. The squared distance (SD) measure is used to measure the accuracy of the fit. Initially only the first and last pixels are taken as breakpoints; the remaining breakpoints are added during the splitting of segments. Table 3.3 shows the comparative performance.

From the perspective of compression, it is desirable that the spline fits the data with a minimal number of control points. From Tab. 3.3 we can observe that the Natural cubic spline required the maximal number of control points while the linear Bézier needed the minimal number. This is not surprising: the NCS is the smoothest form of spline (continuity C2), and this continuity constraint forces the NCS to split again and again. The linear Bézier is the least smooth spline (in fact it is a straight line), but this is not the only reason for its small number of control points: unlike quadratic and cubic Bézier curves, the linear Bézier has no middle control points. Being free of the continuity constraint and of the burden of storing middle control points is what lets the linear Bézier achieve the best performance, even though quadratic and cubic Bézier curves have the advantage of finding middle control points by the least square method, which gives a good initial estimate of the original data. In this example the data fluctuates at many points, so the least square method could not help the quadratic and cubic Bézier curves perform better than the linear Bézier. This example does not represent all kinds of variation a pixel may have; for other variations of pixel values the result might differ. The important finding in this experiment is that most of the time the linear Bézier performs best and the Natural cubic spline performs worst. The competition between the quadratic Bézier and the Cardinal spline is always tough: in this example the Cardinal spline won, but the quadratic Bézier is not far behind, and in general we can never decide which will perform better. In fact, even in this example, although the number of control points for the Cardinal spline is smaller than for the quadratic Bézier, ξmax is lower for the quadratic Bézier than for the Cardinal spline (the lower the ξmax, the better the fitting).

Figure 3.3: Pixel level error bound Natural cubic spline fitting to Y component of Tab. 3.2 data. n = 80, initial BP = {p1, p80}, ξlmt = 10.
Figure 3.4: Pixel level error bound Cardinal spline fitting to Y component of Tab. 3.2 data. n = 80, initial BP = {p1, p80}, ξlmt = 10.
3.3.7 Comparative Results and Discussion
We have applied our fitting schemes to several types of videos, from low motion activity to high motion activity. The linear Bézier always performs best in terms of computational efficiency and storage requirements. Therefore, we compared the performance of linear Bézier fitting (LBF) with existing temporal compression methods that are based on block matching. The performance of the proposed method is evaluated in terms of the Entropy, measured in bits per pixel (bpp), and the PSNR, measured in decibels (dB), of the output data. From a compression perspective, lower Entropy and higher PSNR are desirable. Table 3.4 gives the details of the input video sequences used in the simulation: the Foreman sequence has medium temporal activity and the Football sequence has high temporal activity. Table 3.5 compares the performance of the linear Bézier fitting (LBF) algorithm with the block matching algorithms Three Step Search (TSS), Diamond Search (DS) and Adaptive Rood Pattern Search (ARPS). Figures 3.9 and 3.10 show the variation of Entropy and PSNR at various values of λlmt for the Foreman and Football video sequences respectively. Figures 3.11, 3.12 and 3.13 show the 16th frame of the original, the DS-BMA approximation and the LBF approximation for the Foreman sequence. Figures 3.14, 3.15 and 3.16 show the same for the Football sequence.

Figure 3.5: Pixel level error bound Cardinal spline least square Tension fitting to Y component of Tab. 3.2 data. n = 80, initial BP = {p1, p80}, ξlmt = 10.

LBF works at the pixel level and, due to the break-and-fit strategy, chooses breakpoints (keypixels) from the most appropriate frames; it therefore yields better objective performance in terms of Entropy and PSNR than the BMAs. The human visual system (HVS) is more sensitive to edges [68, 50] than to arbitrary noise. Error due to the block matching of a BMA produces edges, while error due to pixel level LBF is arbitrary, i.e., randomly distributed. Therefore, approximated images of a BMA tend to have lower subjective quality than approximated images of LBF, even though both contain error. The Foreman sequence has motion activity in the face while the background is fixed, so the error in the approximated images/videos lies on the face. The Football sequence has a lot of motion activity, i.e., the change in pixel intensity is rapid in both background and foreground; matching blocks are scarce in the reference frame and the error between blocks of the predicted and reference frames is higher, so the BMA causes quite noticeable blocking artifacts.
Figure 3.6: Pixel level error bound linear Bézier fitting to Y component of Tab. 3.2 data. n = 80, initial BP = {p1, p80}, ξlmt = 10.

Due to the pixel level fitting and the break-and-fit strategy, LBF approximates the change in pixel intensity relatively better, even though it also contains error. From Figure 3.17 and Figure 3.18, where non-keypixels are shown in white, it is evident that LBF successfully approximates the majority of the pixels: almost all of the background and considerable portions of the foreground in both videos are approximated by fitting (i.e., no data need be saved for non-keypixels).
Figure 3.7: Pixel level error bound quadratic Bézier fitting to Y component of Tab. 3.2 data. n = 80, initial BP = {p1, p80}, ξlmt = 10.

Table 3.3: Comparative performance of pixel level error bound spline fitting to Y component of Tab. 3.2 data. n = 80, BP = {p1, p80}, ξlmt = 10.

Spline                         No. of Points           Max. Sq. distance (ξmax)
Natural Cubic Spline           40                      7.20877
Cardinal Spline                24 (23+1)               9.71686
Cardinal Spline Least Square   37 (20+17)              9.33035
Linear Bézier Curve            19                      9
Quadratic Bézier Curve         27 (14 ECP + 13 MCP)    8.41
Cubic Bézier Curve             31 (11 ECP + 20 MCP)    8.4508
Table 3.4: Details of input video sequences used in simulation

Video Name             Format   Frame Size   Number of Frames
Foreman (intensity)    SIF      352 × 288    44
Football (intensity)   SIF      352 × 240    44
Figure 3.8: Pixel level error bound cubic Bézier fitting to Y component of Tab. 3.2 data. n = 80, initial BP = {p1, p80}, ξlmt = 10.
Table 3.5: Performance comparison of the proposed linear Bézier fitting (LBF) method with TSS, DS and ARPS. For LBF: ∆ = 12, δ = 36 for Foreman, δ = 256 for Football. For BMAs: block size = 16 × 16, w = ±15 (horizontal and vertical).

Method   Foreman Entropy   Foreman PSNR   Football Entropy   Football PSNR
TSS      2.63249           35.8008        2.91464            25.9021
DS       2.6307            36.5367        2.91348            25.6388
ARPS     2.62998           36.4523        2.91347            25.6673
LBF      2.2495            37.7122        2.6742             29.7645
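The two reported metrics can be computed as follows. This is a sketch using the standard definitions of first-order entropy (bits per pixel) and PSNR for 8-bit data, not necessarily the exact procedure used in the experiments.

```python
import numpy as np

# Standard first-order entropy (bits/pixel) and PSNR (dB, 8-bit peak of 255).
def entropy_bpp(values):
    _, counts = np.unique(np.asarray(values), return_counts=True)
    prob = counts / counts.sum()
    return float(-np.sum(prob * np.log2(prob)))

def psnr_db(original, approximated):
    o = np.asarray(original, dtype=float)
    q = np.asarray(approximated, dtype=float)
    mse = np.mean((o - q) ** 2)
    return float('inf') if mse == 0 else float(10 * np.log10(255.0**2 / mse))
```

Lower entropy of the output data (breakpoints and counts) means fewer bits after entropy coding, while higher PSNR means a more faithful reconstruction, which is the trade-off Table 3.5 summarizes.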
Figure 3.9: Entropy and PSNR performance of the Foreman sequence at varying values of the maximum allowed mean squared error (λlmt).
Figure 3.10: Entropy and PSNR performance of the Football sequence at varying values of the maximum allowed mean squared error (λlmt).
Figure 3.11: 16th frame of original Foreman video sequence.
Figure 3.12: 16th frame of Foreman video sequence, DS-BMA approximation.
Figure 3.13: 16th frame of Foreman video sequence, LBF approximation.
Figure 3.14: 16th frame of original Football video sequence.
Figure 3.15: 16th frame of Football video sequence, DS-BMA approximation.
Figure 3.16: 16th frame of Football video sequence, LBF approximation.
Figure 3.17: 16th frame of approximated Foreman video sequence by LBF. Non-keypixels (approximated pixels) are shown in white.
Figure 3.18: 16th frame of approximated Football video sequence by LBF. Non-keypixels (approximated pixels) are shown in white.
Chapter 4

FITTING GROUP OF PIXELS

4.1 Individual Pixel Level Fitting
To better understand the fitting of a group of pixels, let us first review individual pixel level fitting. We can approximate a set of points P = {pi, . . . , pk} by taking the first and last points, pi and pk, as breakpoints and approximating the in-between points using spline interpolation. If the maximum error (e.g., the maximum distance dmax) between any point in the original data and its corresponding point in the interpolated data is greater than some predefined threshold, we insert into the set of breakpoints a new point taken from the original data at the location (index) of maximum error, and repeat the fitting process with the updated set of breakpoints. For video data, we can think of P as the set of luminance or color values associated with a pixel between frames i and k, while breakpoints can be considered keyframes. The fitting process described above can be applied to all pixels separately. In order to regenerate (decode) the approximated data later, we have to save: (1) the breakpoints, and (2) the count of interpolating points between every pair of consecutive breakpoints, for all pixels. This saved information ensures that we generate the same number of points (frames) between any two consecutive breakpoints (keyframes) as in the original video data. Figure 4.1 shows the original data, line interpolated data, and the maximum error (dmax) between the original and interpolated data for four different data sets. Although in this example the data is taken from a non-video source to explain the concept, it can be thought of as a sequence of frames along the horizontal axis and the corresponding variation of 1-D pixel values (e.g., luminance) along the vertical axis. In Fig. 4.2, four new breakpoints
are inserted at different indices for each data set, where distance between original data and fitted data is maximum for each data set. This is how fitting model works at pixel level, that is fitting of data is accomplished separately for each pixel. org. data
Figure 4.1: Original data, line interpolated data and maximum error (dmax) at any point between original and interpolated data.
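The pixel-level break-and-fit loop described above can be sketched as follows. This is an illustrative Python sketch using linear interpolation (the parametric line case), not the thesis implementation; the function name and signature are our own.

```python
import numpy as np

def fit_pixel(values, tol):
    """Break-and-fit for one pixel's temporal data: start with the first and
    last frames as breakpoints, then repeatedly insert a breakpoint at the
    frame of maximum error until the error is within `tol` everywhere."""
    n = len(values)
    breaks = [0, n - 1]                     # first and last frames as breakpoints
    while True:
        # piecewise-linear interpolation through the current breakpoints
        approx = np.interp(np.arange(n), breaks, [values[b] for b in breaks])
        errors = np.abs(np.asarray(values, dtype=float) - approx)
        worst = int(np.argmax(errors))
        if errors[worst] <= tol:
            return breaks, approx           # breakpoints play the role of keyframes
        breaks.append(worst)                # insert breakpoint at max-error frame
        breaks.sort()
```

For video, `values` would be the luminance of one pixel across the frame sequence; the returned breakpoint indices are the keyframes to be stored, together with the counts of interpolating points between them.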
4.2 Group Level Fitting
In video data there is spatial correlation: pixels in close proximity have values equal or close to each other. There is also temporal correlation: the variation of values of nearby pixels in consecutive video frames is similar. Figure 4.3 shows four neighboring pixels of an actual video sequence in 24 consecutive frames. It can be observed from the figure that the variation of neighboring pixels along the temporal direction is correlated. Therefore, it is beneficial to take neighboring pixels together as a group and approximate them together using a spline. In group-level approximation, we insert new breakpoints at the same frame for all pixels in a group; in other words, if new breakpoints need to be inserted, they are inserted at the same index for all points in the group. Although the indices of the maximum-error points between original and interpolated data would not be the same for all points in a group, due to correlation they would not be far from each other, and by inserting breakpoints at the same index/frame, fitting convergence is still achieved. In Fig. 4.4 four new breakpoints are inserted at the same index for each data set, at the frame where the sum of distances between original and fitted data is maximum. This is how the fitting model works at the group level.

Figure 4.2: Pixel level fit model: Breakpoints are inserted at different indices/frames (108, 83, 158, 124) for each data set.
This group-level approximation not only reduces the amount of data to be saved but also yields an equal number of breakpoints at the same indices for each group, i.e., rectangular data. Spatial compression techniques such as DCT or DWT are much easier to apply to rectangular data than to arbitrarily shaped data. The above discussion provides the basis of group-level approximation; in the following section we formally describe our proposed algorithm.
4.3 Algorithm
In this section, we describe our compression algorithm. Suppose that a sequence of M frames is given, and each frame consists of wFr × hFr pixels, where wFr and hFr are the width and height of a frame respectively. Each pixel value is in R^1 or R^3.

1. Specify the block width wBlk, the block height hBlk, the keyblock interval ∆, and the tolerance of fit ξ^lmt (maximum allowed distance between original and fitted data) for one point. Compute the block tolerance ξ^lmt_Blk by:

ξ^lmt_Blk = ξ^lmt × wBlk × hBlk.    (4.1)

Figure 4.3: RGB variations of 4 neighboring pixels in 24 consecutive frames. Spatial locations from left to right, top to bottom are: (2,16), (2,17), (3,16), (3,17).

Figure 4.4: Block level fit model: Breakpoints are inserted at the same index/frame (124) for each data set.
2. Divide each video frame spatially into a non-overlapping set of 2-D rectangular blocks of equal size. Each block consists of wBlk × hBlk pixels:

B = {B^1_{1,1}, B^1_{1,2}, ..., B^M_{nw,nh}},    (4.2)

B^{k3}_{k1_u,k2_u} ∩ B^{k3}_{k1_v,k2_v} = ∅,  (k1_u, k2_u) ≠ (k1_v, k2_v),    (4.3)

where B^{k3}_{k1,k2} is the (k1, k2)-th block of the k3-th frame, 1 ≤ k1 ≤ nw, 1 ≤ k2 ≤ nh, 1 ≤ k3 ≤ M; nw = wFr/wBlk and nh = hFr/hBlk are the numbers of blocks of a frame in the horizontal and vertical directions respectively. Each block B^{k3}_{k1,k2} contains a local set of pixels P_{B_{k1,k2}} = {p_11, ..., p_ij}, where 1 ≤ i ≤ wBlk and 1 ≤ j ≤ hBlk.

3. From each set of blocks B_{k1,k2}, take the first block B^1_{k1,k2}, the last block B^M_{k1,k2}, and every B^I_{k1,k2}-th block into the set of initial keyblocks:

KB_{k1,k2} = {B^1_{k1,k2}, B^{I_1}_{k1,k2}, B^{I_2}_{k1,k2}, ..., B^{I_{l+1}}_{k1,k2}, B^M_{k1,k2}},    (4.4)

B_{k1,k2} = {B^1_{k1,k2}, B^2_{k1,k2}, ..., B^M_{k1,k2}},    (4.5)

where KB_{k1,k2} ⊆ B_{k1,k2}, 1 ≤ I ≤ l + 1, 0 ≤ l ≤ ⌊M/∆⌋ − 1, I_{l+1} = I_l + ∆, I_0 = 0, ∆ ≥ 2.

4. Apply the break-and-fit process between each pair of consecutive keyblocks as follows:

(a) If B^{k3_u}_{k1,k2} and B^{k3_{u+1}}_{k1,k2} are two consecutive keyblocks, the number of blocks between them is computed by:

m = k3_{u+1} − k3_u + 1.    (4.6)

From here on we use local indices 1 ≤ k3 ≤ m for the m blocks between two adjacent keyblocks, inclusive, i.e., k3_u = 1 and k3_{u+1} = m.

(b) Generate the interpolation points Q^{k3}_{B_{k1,k2}} by fitting the line/cubic spline between corresponding points of the keyblocks. The number of interpolated points between every pair of keyblocks equals the number of points between the corresponding keyblocks of the original video, i.e., |P^{k3}_{B_{k1,k2}}| = |Q^{k3}_{B_{k1,k2}}|. (|A| denotes the cardinality of set A.)

(c) Find the sum of distances for each frame between corresponding block pixels of the original data and the fitted (interpolated) data by:

(sd^{k3})^N = Σ_{j=1}^{hBlk} Σ_{i=1}^{wBlk} (d^{k3}_{ij})^N,    (4.7)

i.e., (sd^{k3})^N is the distance between corresponding blocks of the k3-th frames of original and fitted data, where (d^{k3}_{ij})^N is defined as follows:

(d^{k3}_{ij})^N = ‖p^{k3}_{ij} − q^{k3}_{ij}‖^N,    (4.8)

where p^{k3}_{ij} and q^{k3}_{ij} are the luminance or color values of the (i, j)-th pixel of the k3-th frame of the original and interpolated data respectively, and (d^{k3}_{ij})^N is the distance between p^{k3}_{ij} and q^{k3}_{ij}. For N = 1 it is the absolute difference measure; for N = 2 it is the squared distance measure. In our experiments we use the latter.

(d) Find the frame of maximum distance, i.e., the frame where the sum of pixel distances between corresponding original and fitted blocks is maximum, using:

(dmax)^N = Max{(sd^1)^N, (sd^2)^N, ..., (sd^m)^N},    (4.9)

(dmax)^N = (sd^J)^N,    (4.10)

i.e., at the J-th frame the total distance between the original video data block and the fitted data block is maximum.

(e) If (dmax)^N > ξ^lmt_Blk, insert a new keyblock at the J-th frame (i.e., at the L-th global frame index) into the set of keyblocks:

KB_{k1,k2} = KB_{k1,k2} ∪ {B^L_{k1,k2}},    (4.11)

where L = k3_u + J − 1.

5. Repeat the fitting process until the maximum distance between any two consecutive original-data keyblocks and fitted-data keyblocks is less than or equal to ξ^lmt_Blk.
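Steps 1–5 above can be sketched for a single block position as follows. This is an illustrative Python sketch under our own naming, using linear interpolation between keyblocks (an NCS variant would replace the interpolation step); it inserts the single globally worst keyblock per pass, which is a simplification of the per-segment process described above.

```python
import numpy as np

def fit_block(block_seq, xi_lmt, delta=12, N=2):
    """Group-level break-and-fit for one block position (k1, k2).

    block_seq : array of shape (M, hBlk, wBlk), the block's pixels in M frames.
    xi_lmt    : per-point tolerance; the block tolerance is
                xi_blk = xi_lmt * wBlk * hBlk  (Eq. 4.1).
    Returns the sorted list of keyblock frame indices."""
    M, hB, wB = block_seq.shape
    xi_blk = xi_lmt * wB * hB
    # initial keyblocks: first, last, and every delta-th frame (step 3)
    keys = sorted(set(range(0, M, delta)) | {M - 1})
    while True:
        worst_frame, worst_sd = -1, xi_blk
        for a, b in zip(keys, keys[1:]):
            # linear interpolation between the two keyblocks (step 4b)
            t = (np.arange(a, b + 1) - a) / max(b - a, 1)
            interp = (1 - t)[:, None, None] * block_seq[a] + \
                     t[:, None, None] * block_seq[b]
            # per-frame sum of pixel distances (Eqs. 4.7-4.8)
            sd = np.abs(block_seq[a:b + 1] - interp) ** N
            sd = sd.reshape(b - a + 1, -1).sum(axis=1)
            j = int(sd.argmax())
            if sd[j] > worst_sd:
                worst_sd, worst_frame = sd[j], a + j   # global index (Eq. 4.11)
        if worst_frame < 0:
            return keys                                # all segments within tolerance
        keys = sorted(set(keys) | {worst_frame})       # insert new keyblock (step 4e)
```

A constant block sequence keeps only the initial keyblocks, while an abrupt change forces a keyblock at (or near) the frame where it occurs.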
4.4 Simulation Results
In order to compare the performance of the proposed method with MPEG, we applied the JPEG standard as spatial coding to the input test sequences and applied the proposed method as temporal coding. Figure 4.5 shows a simplified flow chart of the video coding system where temporal compression is done using block-based fitting. Figure 4.6 shows a simplified flow chart of the video coding system where temporal compression is done using block matching (motion estimation). The same test sequences are coded using MPEG-2 encoding with default parameters. Finally, a rate-distortion measure, i.e., peak signal-to-noise ratio (PSNR) vs. bitrate, is used to compare the performance of the proposed method and MPEG-2. Table 4.1 gives the details of the input video sequences used in the simulation. Table 4.2 gives the details of the parameters used for MPEG-2 encoding of the input sequences. Figure 4.7 and Fig. 4.8 show the variation of PSNR at various bitrates for the gray-level Football video sequence in SIF and BT.601 format respectively at the given parameter values. Fig. 4.9 shows the 20th frame of the MPEG-2, line, and NCS approximated videos for the Football sequence in SIF format at the given parameter values. Fig. 4.10, Fig. 4.11 and Fig. 4.12 show the MPEG, line and NCS approximated sequences respectively for the BT.601 format.
[Flow chart stages: Input: raw video data → temporal data compression by spline fitting (output: keyblocks) → spatial data compression by DCT/DWT (output: DCT/DWT coefficients) → quantization (output: quantized DCT/DWT coefficients) → entropy encoding, Huffman or arithmetic (output: coded bit-stream).]

Figure 4.5: Flow chart of video coding system; temporal compression is done by block-based fitting.
4.5 Discussion
We applied the proposed group-level fitting using the parametric line (linear Bézier) and the Natural cubic spline (NCS). For the parametric line, once the distance between the original points P^{k3}_{B_{k1,k2}} and the corresponding interpolating points Q^{k3}_{B_{k1,k2}} has been computed for all blocks between consecutive keyblocks and it is within tolerance, no further distance calculation is required for the blocks within these keyblocks, inclusive. When a new keyblock is inserted, recomputation of the distance measure is required only for the blocks that lie within the three keyblocks involved (the two old keyblocks and the new keyblock at the maximum-distance frame). For NCS, however, during the break-and-fit loop of a block B_{k1,k2} the insertion of every new keyblock requires recomputation of the fitting parameters for all blocks, which is computationally quite expensive.

[Flow chart stages: Input: raw video data → temporal data compression by motion estimation (output: motion-compensated predicted frames) → spatial data compression by DCT/DWT (output: DCT/DWT coefficients) → quantization (output: quantized DCT/DWT coefficients) → entropy encoding, Huffman or arithmetic (output: coded bit-stream).]

Figure 4.6: Flow chart of video coding system; temporal compression is done by motion estimation.

We take ∆ = 12, which means a keyblock exists after every 12th frame. Increasing the value of ∆ approximates a larger block sequence with a smaller number of keyblocks. This is analogous to the curve fitting situation of fitting a curve to a very large data set by taking very few keypoints initially and inserting new keypoints only when needed. Such fitting usually requires fewer keypoints than fitting with a large number of initial keypoints, because some of the initial keypoints may not be needed. But if a lot of splitting occurs due to high variation of pixel values, i.e., many intermediate keyblocks have to be inserted, the fitting process slows down, because each split requires a lot of recomputation. Therefore, the break-and-fit strategy of the algorithm, in conjunction with an initial set of breaks at certain intervals, requires a balance between computational efficiency and the addition of a minimal number of keyblocks.

For constant or linear variation in the pixel values of a block, the proposed line model can approximate the block with no error. The proposed algorithm ensures that all pixels of a block B_{k1,k2} use the same set of keyframes during interpolation, which results in a smaller storage requirement compared to the individual-pixel fitting process. It also lets us save 1-D indices of keyblocks (keyframes) rather than 2-D/3-D motion vectors. Our approach guarantees the non-overlapping nature of blocks in any frame, which gives us a very useful property: the pixels of any block can be encoded/decoded for all video frames independently of other blocks, which would not be possible if translation of blocks had to be taken into account. In both space and time complexity, the parametric line model performs better than the NCS model. Whether a pixel value remains constant, changes linearly over many consecutive frames, or changes suddenly at some point, in all these cases the linear model approximates it better. The cubic spline model produces oscillation for constant data and does not change direction abruptly. This is illustrated in Fig. 4.13, where the NCS fit produces oscillation and requires more breakpoints than the line fit, even though the NCS curve looks smoother. This smoothness does not improve the quality of the reconstructed video, because quality depends on the closeness of the fit to the original data rather than the smoothness of the fit.

The proposed method can be applied to any 3-D color space such as RGB, YCbCr or HSV, or to a 1-D space such as the luminance or chrominance components separately. Optionally, the input data can be downsampled, preferably in the YCbCr color space, because there the Cb and Cr components can be downsampled at a higher ratio than the Y component by utilizing the characteristics of the human visual system.

Table 4.1: Details of input video sequences.

Video Name              Format   Frame Size   Number of Frames
Football (gray level)   SIF      352 × 240    36
Football (gray level)   BT.601   720 × 480    37

Table 4.2: Details of MPEG-2 coding parameters.

Macroblock Size                   Four 8 × 8 luminance blocks
Block Matching Search Algorithm   Standard Hierarchical
Range of Block Search Window      ±15
GOP arrangement                   IBBPBBPBBPBBI
Frame Rate (fps)                  30
Scan Format                       Progressive

Figure 4.7: Rate distortion performance of MPEG-2, Line and NCS coding for gray-level Football sequence, SIF format, 30 fps, wBlk × hBlk = 4 × 4, ∆ = 12, M = 36, ξ^lmt_Blk = 64.
4.6 Choosing Parameters
Apparently the described method has many parameters, but fortunately they are interrelated and the end user needs to control only three of them: (1) the block size, i.e., wBlk × hBlk; (2) the initial keyblock interval, i.e., ∆; and (3) the tolerance of fit, i.e., ξ^lmt. For architectural reasons, block sizes that are integer powers of 2 are preferred, and we recommend block sizes of 4, 8 or 16. As the MPEG scheme [23, 25] usually takes a keyframe after every 12th frame, we used the same interval in our experiments; this interval can be increased if a higher compression ratio (CR) is desirable at the cost of more computation. The level of tolerance ξ^lmt controls the bitrate and PSNR of the approximated video. The user can choose the value of ξ^lmt depending on the trade-off between bitrate and quality. Empirically we found that ξ^lmt is not very sensitive to small variations and can be chosen flexibly.
Figure 4.8: Rate distortion performance of MPEG-2, Line and NCS coding for gray-level Football sequence, BT.601 format, 30 fps, wBlk × hBlk = 4 × 4, ∆ = 12, M = 37, ξ^lmt_Blk = 100.
Figure 4.9: Luminance frames: top to bottom, 20th frame of the MPEG-2, line, and NCS approximated videos, bitrate = 2.5 Mbps, ξ^lmt_Blk = 64, wBlk × hBlk = 4 × 4, ∆ = 12, M = 36. Gray-level Football video sequence, SIF format.
Figure 4.10: Luminance frame of the MPEG-2 approximated video, bitrate = 8 Mbps, ξ^lmt_Blk = 100, wBlk × hBlk = 4 × 4, ∆ = 12, M = 37. Gray-level Football video sequence, BT.601 format.
Figure 4.11: Luminance frame of the line approximated video, bitrate = 8 Mbps, ξ^lmt_Blk = 100, wBlk × hBlk = 4 × 4, ∆ = 12, M = 37. Gray-level Football video sequence, BT.601 format.
Figure 4.12: Luminance frame of the NCS approximated video, bitrate = 8 Mbps, ξ^lmt_Blk = 100, wBlk × hBlk = 4 × 4, ∆ = 12, M = 37. Gray-level Football video sequence, BT.601 format.
Figure 4.13: Line and NCS approximation of data (keyframes marked). The NCS fit shows oscillation.
Chapter 5 IMAGE COMPRESSION

In this chapter we present a new technique for synthetic image compression using quadtree decomposition and parametric line (linear spline, linear Bézier) fitting. Since our method is based on the Quadtree and the Parametric Line, we call it QPL. We compare the performance of our technique with the well-known lossless image compression techniques GIF and PNG.
5.1 Framework
The subsequent two sections, 5.1.1 and 5.1.2, describe the framework of our system, which is based on the quadtree and the parametric line.
5.1.1 Quadtree
The quadtree is a data structure that is widely used for image storage, representation and processing [16, 54]. It is most often used to partition a 2-D space by recursively subdividing it into four quadrants or blocks until each quadrant contains only pixels of one color or luminance. Recursive subdivision may result in a quadrant that contains only a single pixel. This conventional quadtree decomposition has the following drawbacks: (1) The overhead of representing a single pixel by a quadtree is undesirable for image compression; it may take more space to represent a single pixel with a quadtree than without it. (2) Due to the subdivision criterion, even if a single pixel in a quadrant has a different color or luminance, the quadtree decomposition divides that quadrant into four quadrants. As a consequence, there may be three quadrants with the same luminance value; in other words, the boundaries between quadrants do not necessarily separate quadrants of different luminance.

To overcome the first drawback, our method imposes a minimum block size constraint on the quadtree decomposition: a quadrant is not divided further into four quadrants if its size equals the predefined minimum block size. This constraint safeguards our method from the overhead of representing very small quadrants (e.g., quadrants smaller than 4 × 4) by a quadtree. The constraint-based quadtree decomposition results in two types of quadrants: (a) homogeneous quadrants, i.e., quadrants that contain only pixels of one color or luminance, and (b) non-homogeneous quadrants, i.e., quadrants that contain pixels of more than one color or luminance. We represent only the homogeneous quadrants using the quadtree; non-homogeneous quadrants are represented by the parametric line as described in Section 5.1.2.
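The constrained decomposition can be sketched as follows. This is a minimal illustration under our own naming, assuming a square image whose side is a power of 2; it is not the thesis implementation.

```python
def quadtree(img, x, y, size, min_size, out):
    """Constrained quadtree decomposition: subdivide a size x size square
    until it is homogeneous or the minimum block size is reached.
    `img` is a 2-D list of luminance values; homogeneous quadrants are
    appended to `out` as (x, y, size, value), non-homogeneous minimum-size
    quadrants as (x, y, size, None) for later parametric line fitting."""
    vals = {img[j][i] for j in range(y, y + size) for i in range(x, x + size)}
    if len(vals) == 1:                       # homogeneous: stop subdividing
        out.append((x, y, size, vals.pop()))
    elif size <= min_size:                   # constraint: keep for line fitting
        out.append((x, y, size, None))
    else:                                    # split into four quadrants
        h = size // 2
        for dy in (0, h):
            for dx in (0, h):
                quadtree(img, x + dx, y + dy, h, min_size, out)
    return out
```

On an 8 × 8 image whose left half is 0 and right half is 1, with minimum block size 4, this produces exactly four homogeneous 4 × 4 quadrants.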
5.1.2 Parametric Line
In the chapter on splines we described the theory of the linear Bézier curve; here we briefly review it. A parametric line is essentially a straight line obtained by linear interpolation between two points (control points). To generate a parametric line that interpolates k + 1 points, k line segments are used. The equation of the j-th segment between points p_j and p_{j+1} can be written as follows:

q_j(t) = (1 − t) p_j + t p_{j+1},  t ∈ [0, 1],  1 ≤ j ≤ k,    (5.1)

where q_j(t) is an interpolated point between control points p_j and p_{j+1} at parameter value t. To generate n points between p_j and p_{j+1} inclusive, the parameter t is divided into n − 1 intervals between 0 and 1 inclusive, such that q_j(0) = p_j and q_j(1) = p_{j+1}.

In order to represent the non-homogeneous quadrants, we scan the image data row-wise and fit the parametric line to the pixels of non-homogeneous quadrants. Parametric line fitting helps to further reduce the data size in two ways. First, it represents runs of pixels of one color/luminance with a smaller data set. Second, it merges the data of a row belonging to more than one non-homogeneous quadrant into a single data set. This merged row removes the artificial boundaries between quadrants imposed by the quadtree decomposition; it is very likely that at the boundary of two adjacent non-homogeneous quadrants the pixels have the same luminance. By merging quadrants, a large number of pixels can be represented by the small output data obtained from parametric line fitting. This also solves the second drawback of the conventional quadtree representation described in Section 5.1.1.
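Eq. 5.1 can be written directly in code. The sketch below (function name our own) generates n points on one segment, with t divided into n − 1 equal intervals as described above.

```python
def line_segment(p_j, p_j1, n):
    """Generate n points on the parametric line q(t) = (1 - t) p_j + t p_j1
    (Eq. 5.1), with t divided into n - 1 equal intervals so that
    q(0) = p_j and q(1) = p_j1."""
    return [(1 - t) * p_j + t * p_j1
            for t in (i / (n - 1) for i in range(n))]
```

For example, interpolating 5 points between luminance values 0 and 10 yields 0, 2.5, 5, 7.5, 10.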
5.2 Algorithm
This section presents the details of our algorithm for image compression. The encoding part of the algorithm consists of two main phases: the first phase is quadtree decomposition and the second phase is parametric line fitting. Decoding is a relatively simple process and we describe it in a single phase. Let I be a gray-scale image of size w × h, as shown in Fig. 5.1, and let p_{i,j} be the luminance of the pixel at spatial location (i, j), where 0 ≤ p_{i,j} ≤ 255, 1 ≤ i ≤ h and 1 ≤ j ≤ w.
5.2.1 Encoding
Phase 1: Quadtree Encoding

1. If the size of image I is not a power of 2, pad the right and bottom borders of I with −1s. Figure 5.2 shows the padded image, where the padded area is shown in black (internally the image matrix has the value −1 in the padded area). Let J be the padded image of size w′ × h′. For example, if the size of I is 509 × 486, then after padding the size of J is 512 × 512.

2. Specify the threshold of minimum block size B^lmt for the quadtree decomposition. For example, when B^lmt = 4 the minimum block size is 4 × 4.

3. Apply the quadtree decomposition to image J. This yields homogeneous quadrants and non-homogeneous quadrants. Figure 5.3 shows the homogeneous and non-homogeneous quadrants of a quadtree-decomposed image with minimum block size 4 × 4. Non-homogeneous quadrants have pixels of more than one luminance value; therefore it is not worthwhile to save each pixel of a non-homogeneous quadrant individually (a minimum-size quadrant has 16 pixels).

4. Entropy encode the data of the homogeneous quadrants (blocks). Each homogeneous quadrant is identified by (i) the size of the quadrant (since the width and height of a quadrant are always equal, only a single value needs to be saved), (ii) the (x, y) coordinates of the upper-left corner of the quadrant, and (iii) the luminance of the quadrant. Since there are multiple quadrants of the same size, for efficient storage we create a list of sizes, and each element of the list references (ii) and (iii) of all quadrants whose size equals that element. There is no need to save the data of any quadrant of J that lies completely outside the original image I; this is easily determined by comparing the upper-left corner coordinates of the quadrant with the values of w and h.

5. Replace the luminance values of the pixels of homogeneous quadrants in J with −1. In Fig. 5.4 homogeneous quadrants are shown in green, while non-homogeneous quadrants are shown with their actual luminance values. Let K be the image obtained after replacing the pixels of homogeneous quadrants in J with −1.

Figure 5.1: Original image of size 509 × 486. The image size is not a power of 2.

Figure 5.2: The image of Fig. 5.1 after padding. The size of the padded image is 512 × 512; the padded area on the right and bottom is shown in black.
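The storage layout of step 4 (one stored size value per group of quadrants) can be sketched as follows; the function name and tuple layout are our own illustration, not the thesis format.

```python
from collections import defaultdict

def group_by_size(quadrants):
    """Group homogeneous quadrants by size for entropy coding: since width
    equals height, each quadrant is (size, x, y, luminance); keeping one
    list per size stores each size value once instead of per quadrant."""
    table = defaultdict(list)
    for size, x, y, lum in quadrants:
        table[size].append((x, y, lum))
    return dict(table)
```

For example, two 4 × 4 quadrants and one 8 × 8 quadrant produce a table with two entries, keyed by 4 and 8.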
Figure 5.3: A quadtree-decomposed image; minimum block size is 4 × 4. Boundaries of homogeneous and non-homogeneous quadrants are shown in blue and red respectively.
Figure 5.4: A quadtree-decomposed image, minimum block size 4 × 4. Homogeneous quadrants are filled with −1, shown in green. Non-homogeneous quadrants are shown with their actual luminance values.
Phase 2: Parametric Line Encoding

1. Specify the threshold of fitting λ^lmt. The value of λ^lmt is 0 for lossless fitting (compression), while λ^lmt > 0 gives lossy compression.

2. Scan the pixels of image K row by row. During scanning, skip pixels of value −1; in other words, skip the pixels of homogeneous quadrants, because they are already represented by the quadtree. Let R_i be the set of pixels in the i-th row of image K, and let R′_i be the set of pixels in the i-th row of K excluding pixels of value −1, so |R′_i| ≤ |R_i| (|A| denotes the cardinality of set A).

3. Apply the fitting process to the pixel values of each row of image K separately, as follows:

(a) R′_i = {p_{i,1}, p_{i,2}, ..., p_{i,n}} consists of the luminance values of all pixels in the i-th row (excluding pixels of homogeneous quadrants). Each element of R′_i can be considered a point in 1-dimensional Euclidean space. We call the set of points in R′_i the original data.

(b) Take the first and the last pixels of R′_i as breakpoints, i.e., BP = {p_{(i,1)}, p_{(i,n)}}. For example, BP = {p_{(i,1)}, p_{(i,105)}}, assuming there are 105 pixels in the i-th row excluding pixels of value −1.

(c) Divide R′_i into segments. A segment is the set of all points (pixels) between two adjacent breakpoints. In the first iteration there is only one segment, between p_{(i,1)} and p_{(i,n)}; as new breakpoints are added in subsequent iterations, the number of segments increases. For example, suppose there are four breakpoints, BP = {p_{(i,1)}, p_{(i,55)}, p_{(i,75)}, p_{(i,105)}}. Then there are three segments: S1 = {p_{(i,1)}, ..., p_{(i,55)}}, S2 = {p_{(i,55)}, ..., p_{(i,75)}}, and S3 = {p_{(i,75)}, ..., p_{(i,105)}}.

(d) Apply the fitting process to each segment by taking each pair of adjacent breakpoints as the first and last points of a parametric line, and obtain the fitted data by Eq. 5.1. The numbers of segments and of points in the fitted data equal the numbers of segments and points in the original data respectively. Let Q_i be the set of points in the fitted data; then Q_i = {q_{i,1}, q_{i,2}, ..., q_{i,n}}.

(e) Compute the squared distance for each point between R′_i and Q_i, i.e., d²_j = ‖p_{(i,j)} − q_{(i,j)}‖², 1 ≤ j ≤ n.

(f) Find λmax = Max(d²_1, d²_2, ..., d²_n); suppose λmax lies in the k-th segment.

(g) If λmax > λ^lmt, replace the k-th segment by two new segments split at the point where the squared distance is maximum. For example, if λmax = d²_37, the squared distance between p_{(i,37)} and q_{(i,37)} is maximum. Since p_{(i,37)} is in segment S1 = {p_{(i,1)}, ..., p_{(i,55)}}, split S1 at p_{(i,37)} and replace S1 by S1a = {p_{(i,1)}, ..., p_{(i,37)}} and S1b = {p_{(i,37)}, ..., p_{(i,55)}}.

(h) Add the new breakpoint bp_new to the set of breakpoints, i.e., BP = BP ∪ {bp_new}. For example, if before splitting BP = {p_{(i,1)}, p_{(i,55)}, p_{(i,75)}, p_{(i,105)}} and bp_new = p_{(i,37)}, then after splitting BP = {p_{(i,1)}, p_{(i,37)}, p_{(i,55)}, p_{(i,75)}, p_{(i,105)}}.
(i) Go to step 3c and repeat the fitting process with the new set of breakpoints, until λmax between any two points of the original and fitted data is less than or equal to λ^lmt. Figures 5.5–5.11 show how the fitting process works for one row of image data. Pixels belonging to the third row of an image are scanned, while pixels belonging to homogeneous quadrants are skipped (they are already represented by the quadtree). Then the parametric line is fitted to the pixel values. Initially the first and last pixels are taken as breakpoints; due to the break-and-fit strategy, other pixels are added as new breakpoints during the fitting process. Eventually all data is fitted with zero error (lossless coding) using very few breakpoints. It is also worth noting that with only 3 breakpoints, as shown in Figure 5.6, sufficient fitting accuracy is already achieved. We found that the method is also suitable for lossy coding, yielding higher compression (fewer breakpoints give higher compression) with very little distortion.

(j) Entropy encode the final set of breakpoints (BP) and the counts of interpolation points (C) between breakpoints; these are required to decode the spline-fitted data. For example, if the final set of breakpoints is BP = {p_{(i,1)}, p_{(i,37)}, p_{(i,55)}, p_{(i,75)}, p_{(i,105)}}, then the counts of interpolating points are C = {37 − 1 + 1, 55 − 37 + 1, 75 − 55 + 1, 105 − 75 + 1}, i.e., C = {37, 19, 21, 31}. Figure 5.12 shows the flow chart of the compression technique.

Figure 5.5: Fitting with 2 breakpoints.
Figure 5.6: Fitting with 3 breakpoints.
Figure 5.7: Fitting with 4 breakpoints.
Figure 5.8: Fitting with 5 breakpoints.
Figure 5.9: Fitting with 6 breakpoints.
Figure 5.10: Fitting with 7 breakpoints.
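Steps 3a–3j above can be sketched for one scan row as follows. This is an illustrative Python sketch of the break-and-fit loop (function name and return layout are our own), using 0-based indices; counts are endpoint-inclusive as in the worked example.

```python
def fit_row(values, lam_lmt=0.0):
    """Break-and-fit for one scan row: returns the breakpoint indices BP
    and the counts C of interpolating points per segment (endpoints
    inclusive). lam_lmt = 0 gives lossless fitting; lam_lmt > 0 is lossy."""
    n = len(values)
    bp = [0, n - 1]                          # step 3b: first and last pixels
    while True:
        worst_i, worst_d = -1, lam_lmt
        for a, b in zip(bp, bp[1:]):         # step 3c: one pass per segment
            for i in range(a + 1, b):
                t = (i - a) / (b - a)        # fitted point on segment (Eq. 5.1)
                q = (1 - t) * values[a] + t * values[b]
                d = (values[i] - q) ** 2     # squared distance (step 3e)
                if d > worst_d:
                    worst_d, worst_i = d, i
        if worst_i < 0:                      # lambda_max within tolerance: done
            counts = [b - a + 1 for a, b in zip(bp, bp[1:])]
            return bp, counts
        bp.append(worst_i)                   # steps 3g-3h: split at max error
        bp.sort()
```

For a row that steps from 0 to 5, lossless fitting converges with breakpoints on both sides of the step, and the segment counts sum to the row length plus the number of interior breakpoints (shared endpoints are counted twice, as in the example above).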
5.2.2 Decoding
1. Create an empty image L (all pixels of value 0) of size w′ × h′, equal to the size of the padded image J.

2. Assign the value −1 to each pixel of the homogeneous quadrants of L. We know the coordinates of the homogeneous quadrants because we saved their sizes and coordinates in step 4 of the quadtree encoding process.

3. Using BP and C, obtained in step 3j of the parametric line encoding process, perform the parametric line interpolation. This yields the parametric line data Q_pl.

4. Fill L with Q_pl, skipping pixels of value −1; in other words, skip the homogeneous quadrants. This yields the portion of the image decoded by the parametric line.

5. Fill the homogeneous quadrants of L with their actual values; we saved the value of each homogeneous quadrant in step 4 of the encoding process, besides its size and coordinates.

6. Trim L to size w × h, i.e., to the size of the original image I. This is the decoded image.
5.3 Experiments and Results
We tested our method on various types of synthetic images such as clip-art, animation, cartoon, medical images, scientific plots, wavelet-transformed images, etc. Table 5.1 gives the details of the input images. All input images are bitmap images with 256 possible gray levels (0–255), stored at 8 bits per pixel. Figure 5.1 and Figures 5.14–5.18 show the input images. In figures with a white background, rectangular boxes are drawn around the images to visualize their boundaries. Table 5.2 shows the bit-rate performance of GIF, PNG and our method QPL. QPL performed better than GIF and PNG for all images except the Text image, where PNG performs slightly better. The good performance of QPL is due to the fact that in the first phase it exploits both horizontal and vertical redundancy using the quadtree structure; GIF and PNG do not exploit horizontal and vertical redundancy simultaneously. In the second phase, QPL further reduces the redundancy of pixels showing constant or linear luminance variation by parametric line fitting. By imposing the minimum block size constraint, our quadtree decomposition does not fall into the trap of very small blocks.

Table 5.1: Details of input images used in the simulation.

Image Name   Type                 w × h
Planter      Clip-art             509 × 486
Ball         Clip-art             500 × 500
Play         Cartoon              420 × 315
Text         Text                 704 × 395
Style        Computer Animation   250 × 314
Bone         Medical (X-ray)      560 × 420
Wavelet      Wavelet transform    400 × 352
Meshgrid     Scientific plot      560 × 420
Piegray      Pie chart            560 × 420

5.4 Discussion
Encoding/decoding of RGB images: We described the algorithm for gray-scale images. To apply the algorithm to a true-color (RGB) or HSV image, each dimension (channel) is processed separately.

Lossless and lossy compression: The proposed method can be used for both lossless and lossy compression, while GIF and PNG are inherently lossless compression schemes and cannot be used for lossy compression. In our method, the only difference between lossless and lossy compression is the value of the parameter λ^lmt, i.e., the threshold of fitting: λ^lmt is zero for lossless compression and greater than zero for lossy compression.

Using higher-degree Bézier curves: The parametric line is a linear Bézier curve, and its mathematical model is analogous to higher-degree Bézier curves such as quadratic or cubic Bézier curves. After experiments, we found that the parametric line is more suitable than higher-degree Bézier curves, from both the compression and the computational perspectives. Fitting by any form of straight line, e.g. a polyline [3, 39], would yield the same results as the parametric line.

Figure 5.11: Fitting with 8 breakpoints.

Table 5.2: Bitrate (bits per pixel, bpp) performance of GIF, PNG and QPL for lossless compression (λ^lmt = 0). For QPL the minimum block size is 4 × 4.

Image Name   GIF (bpp)   PNG (bpp)   QPL (bpp)
Planter      0.6633      0.7965      0.6180
Ball         0.8657      0.7833      0.7219
Play         1.5689      1.5618      1.3497
Text         1.7398      1.5150      1.5220
Style        0.8728      1.0517      0.8204
Bone         0.5452      0.5113      0.4389
Wavelet      0.3891      0.4497      0.3587
Meshgrid     0.3866      0.4215      0.3349
Piegray      0.1637      0.1214      0.1167

Figure 5.12: Flow chart of the compression technique. [Stages: image and minimum block size → pad image (if needed) → apply quadtree decomposition → encode homogeneous quadrants; for non-homogeneous quadrants, with error threshold T1 and breakpoint interval: divide data into segments using breakpoints → fit spline to (updated) segments → compute maximum error of fit T2 → if T2 > T1, insert a new breakpoint at the point of maximum error and repeat; otherwise stop.]

Figure 5.13: Ball, a clip-art image.

Figure 5.14: Play, a cartoon image.
5.5
Conclusion
We presented a new hybrid scheme for image data compression using quadtree decomposition and parametric line fitting, and described the encoding and decoding steps of the algorithm. The encoding is composed of two main phases. In the first phase, the method finds the homogeneous and non-homogeneous quadrants using quadtree decomposition with a constraint of minimum block size. Homogeneous quadrants are represented
Figure 5.15: Text, a text image. The original image is rotated 90 degrees counter-clockwise.
Figure 5.16: Style, an animation image.
Figure 5.17: Bone, a medical (x-ray) image.
Figure 5.18: Wavelet, a wavelet transform image.
Figure 5.19: Meshgrid, a scientific plot.
Figure 5.20: Piegray, a pie chart.
by a quadtree. In the second phase, the method fits parametric lines to the non-homogeneous pixels of each row. Experimental results show that the proposed scheme performs better than well-known lossless image compression techniques for several types of synthetic images.
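The first phase can be sketched as a recursive homogeneity test (an illustrative Python sketch; here a quadrant counts as homogeneous when all of its pixels share one value, and splitting stops at the minimum block size):

```python
import numpy as np

def quadtree(block, x, y, min_size, out):
    """Recursively split `block` until each quadrant is homogeneous
    (a single gray value) or the minimum block size is reached."""
    h, w = block.shape
    if block.min() == block.max() or h <= min_size or w <= min_size:
        out.append((x, y, w, h))           # record this quadrant
        return
    h2, w2 = h // 2, w // 2
    quadtree(block[:h2, :w2], x,      y,      min_size, out)
    quadtree(block[:h2, w2:], x + w2, y,      min_size, out)
    quadtree(block[h2:, :w2], x,      y + h2, min_size, out)
    quadtree(block[h2:, w2:], x + w2, y + h2, min_size, out)

img = np.zeros((8, 8), dtype=np.uint8)
img[:4, :4] = 1                            # one distinct region forces a split
leaves = []
quadtree(img, 0, 0, 4, leaves)
print(len(leaves))  # 4 quadrants of 4x4
```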
Chapter 6

CONCLUSION AND FUTURE WORK

We conclude this thesis with a review of the objectives of our research, a review of our research outcomes, and suggestions for future work.
6.1
Objectives and Overview
The main objective of our research was to compress temporal video data and synthetic image data. We used splines to compress the temporal video data, and a hybrid scheme based on quadtree decomposition and parametric lines to compress the image data. We investigated linear Bézier, quadratic Bézier, cubic Bézier, Natural cubic spline and cubic Cardinal spline fitting to approximate/compress the temporal variations of video data and the spatial variations of image data. Since the size of multimedia data, especially video data, is very large (a few seconds of video requires storing millions of pixels), the approximating methods must be automated and computationally efficient. All the fitting methods we applied are fully automated: the user initially sets a few parameters, and the rest of the encoding/decoding process requires no human intervention. For pixel-level video data fitting, the user has to specify the limit of error (tolerance of fit) and the initial keyframe (breakpoint) interval. For block-level fitting, an additional parameter is the block size. In fact, 12 is the default value for the initial keyframe interval and 4 × 4 is the default block size. This means only one parameter, the limit of
error, needs to be explicitly specified by the user, and it is the most important parameter: it controls the quality of the approximated video (PSNR) and the quantity of the approximated data (bitrate). Our default initial breakpoint interval of 12 is quite small, which causes less breaking of segments and results in fast computation. For quadratic Bézier curves, cubic Bézier curves and cubic Cardinal splines, we used least-squares fitting. Least-squares fitting usually causes less breaking of segments and yields a better fit than fitting that depends solely on break-and-fit. Consequently, our least-squares fitting schemes are also good candidates for video data approximation. For image data, the user has to specify the minimum quadrant (block) size for homogeneous quadrants and the limit of error for parametric line fitting of non-homogeneous quadrants. Both quadtree decomposition and parametric line fitting are very fast, which makes our method very practical for image compression. Among the proposed methods of video data compression, two schemes perform better than or comparably to existing temporal video data compression methods based on block matching. These notable methods are linear fitting and group-of-pixel-based fitting. The block-based fitting method groups neighboring pixels into blocks and approximates them together using parametric line and Natural cubic spline break-and-fit models. The proposed method can be incorporated in the motion compensation (MC) step of existing video coding techniques based on the discrete cosine transform (DCT) or discrete wavelet transform (DWT). Approximating pixel values within block boundaries also makes it possible to encode/decode each group of pixels independently. Decoding in the temporal direction is a simple interpolation of pixels inside keyblocks.
Experimental results show that the parametric line yields performance comparable to the MC algorithm of MPEG coding in terms of both objective and subjective quality measures, i.e., PSNR and human visual acceptance. The second scheme that is comparable to contemporary MC methods is pixel-level linear fitting. Because individual pixel data is fitted, the output video is also free from blocking artifacts. Subjective and objective analysis shows that this technique performs better than existing temporal video data compression methods. We compared the performance of our image compression method with prevalent lossless synthetic image standards, i.e., PNG and GIF. The bitrate performance of the proposed method is better than that of PNG and GIF for most of the synthetic images.
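PSNR, the objective measure used in these comparisons, is derived from the mean squared error between the original and decoded frames; a standard sketch for 8-bit data:

```python
import math
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    diff = np.asarray(original, float) - np.asarray(decoded, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')                # identical frames
    return 10 * math.log10(peak ** 2 / mse)

a = np.full((4, 4), 100.0)
b = np.full((4, 4), 105.0)                 # uniform error of 5 -> MSE = 25
print(round(psnr(a, b), 2))  # 34.15
```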
6.2
Applications and Advantages
The following are a few applications and advantages of the proposed video compression methods.
• Compression of video data for high-bitrate media such as CD, DVD, VCD.
• Compression of video data for low-bitrate media such as ISDN and the Internet (using the parametric line only).
• Creating spatially and temporally scalable videos.
• Compression/transmission of selected areas of video frames.
• Precise control of pixel/block-level accuracy.
• Simple, efficient and robust software and hardware implementation.
• Patent-free video compression method.
The following are a few applications and advantages of the proposed image compression method.
• Lossless compression of several types of synthetic images, e.g., cartoons, scientific plots, animations, medical images, etc.
• Lossy compression of natural images.
• Simple, efficient and robust software and hardware implementation.
• Patent-free image compression method.
6.3
Future Work
Distributed/parallel processing: The nature of our proposed video compression methods makes them very suitable for distributed/parallel processing. Video frames can be divided into blocks, and the fitting process can be applied to two or more blocks in parallel; for example, Java threads are lightweight processes that can be used to process blocks.
Similarly, distributed/parallel processing can also be applied during decoding of blocks. Shape-adaptive transforms: We did spatial coding of the spline-approximated data using the conventional JPEG standard, which is essentially the Discrete Cosine Transform (the same approach is used in the MPEG-1 and MPEG-2 standards). A better coding gain can be achieved if spatial coding of the spline-approximated data is done using the Shape-Adaptive Discrete Cosine Transform [58] or a Shape-Adaptive Wavelet Transform [32]. The future object-based coding standard, i.e., MPEG-7, also proposes shape-adaptive transforms. Object coding using splines: In the MPEG-4 standard [76, 38, 49], a scene consists of several layers of Video Object Planes (VOPs). Each VOP is a video frame of a specific object of interest to be coded. VOPs are coded independently, using motion, texture and shape. At the decoder, the different objects are composed into a scene and displayed. MPEG-4 does not standardize the method of defining the VOPs. Splines can be used to model the shape of objects in MPEG-4 videos. This is useful not only from a compression perspective but also makes it possible to apply operations such as scaling, translation and rotation to objects. Lossy compression of natural images: Our proposed method is suitable for lossless synthetic image compression. Further research is required on lossy compression of natural images by spline fitting.
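The block-parallel idea can be sketched with a thread pool (Python is used here for illustration instead of Java threads; `fit_block` is a hypothetical stand-in for the per-block spline fitter):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fit_block(block):
    # Stand-in for per-block spline fitting; returns the block mean here.
    return float(np.mean(block))

def encode_blocks_parallel(frame, bs):
    """Split a frame into bs x bs blocks and fit them in parallel."""
    h, w = frame.shape
    blocks = [frame[r:r + bs, c:c + bs]
              for r in range(0, h, bs) for c in range(0, w, bs)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_block, blocks))

frame = np.arange(64, dtype=float).reshape(8, 8)
results = encode_blocks_parallel(frame, 4)
print(len(results))  # 4 blocks of 4x4
```

Because every block is fitted independently, decoding can be parallelized the same way.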
Appendix A

Test Video Sequences

Test video sequences can be downloaded from the following websites.
1. Xiph.org Test Media: a repository of freely redistributable test sequences. Available formats/resolutions: CIF, QCIF, SIF. http://media.xiph.org/video/derf/
2. Center for Image Processing Research (CIPR): a repository of a wide variety of test sequences. Available formats/resolutions: CIF, QCIF, SIF, Sun Raster file, img. Supports both HTTP and FTP download. http://www.cipr.rpi.edu/resource/sequences/index.html
3. Encoded sequences with SP and SI frames. Available formats/resolutions: H.264 output file, the encoded file in YUV format. http://ivms.stanford.edu/∼esetton/sequences.htm
4. CityU Image Database: a well-organized archive of both images and video sequences. Available formats/resolutions: YUV in various resolutions. http://eelmpo.cityu.edu.hk/imagedb/cgi-bin/ibrowser/ibrowser.cgi?folder=/
5. Test sequences in up to very high resolution (1080) available via anonymous FTP. http://www.ldv.ei.tum.de/liquid.php?page=70
6. Video trace research group at Arizona State University. Available format/resolution: 4:2:0 YUV at CIF and QCIF resolution. http://trace.eas.asu.edu/yuv/index.html
Appendix B

Software

An abundance of software is available for image processing, but very little software is available for video coding with user control over input parameters.
1. VcDemo: VcDemo is an interactive image and video compression freeware software package for Windows. It includes Image Subsampling, Pulse Code Modulation, Differential Pulse Code Modulation, Vector Quantization, Fractal Compression, Discrete Cosine Transform (DCT) Compression, JPEG Compression Standard (lossy DCT version), Subband Compression, Embedded Zerotree Wavelet Compression, Wavelet-SPIHT (set partitioning in hierarchical trees) Compression, JPEG2000 Compression, a Video Player, Block-based Motion Estimation, MPEG-1 and MPEG-2 Video Encoder/Decoder, and a Wireless Channel Simulator (in the MPEG decoder). http://ict.ewi.tudelft.nl/∼inald/vcdemo/VcDemo
2. TMPGEnc: TMPGEnc (formerly known as Tsunami MPEG Encoder) is an MPEG encoder with basic video editing capabilities. It runs on Microsoft Windows. The free version has a 30-day time limit on MPEG-2 encoding capability; MPEG-1 encoding is free. There is also a commercial version with no limitations, called TMPGEnc Plus. TMPGEnc can read most video formats, as long as the appropriate DirectShow filters are installed in the system. http://www.tmpgenc.net/
Figure B.1: VcDemo
Figure B.2: TMPGEnc
Appendix C

MATLAB CODE

Natural cubic spline fitting, point/pixel-level error bound
%%% Approximation of data by Natural cubic spline.
%%% Each data point can be in N-dimensional vector space.
%%
%% INPUT
%% Mat: M x N data matrix; each row is a point in N-D vector space, i.e.
%%      Mat=[x1, y1, ..., z1;
%%           x2, y2, ..., z2;
%%           ...
%%           xm, ym, ..., zm];
%% MxAllowSqD (optional): max. allowed distance b/w original and fitted data
%% breaksIndex (optional): indices of initial break points w.r.t. Mat
%% cmd (optional): a string variable; if its value is 'stop' then fit the
%%      spline to only the initial break indices
%%
%% OUTPUT
%% argout: structure with the following fields
%%      breaksPoint: final break points
%%      breaksIndex: final break indices with respect to Mat
%%      MaxSqDist: max. distance between original data and parametric data;
%%                 it is <= MxAllowSqD
%%      MaxSqDistIndex: index of MaxSqDist with respect to input Mat
%%      MatInterp: interpolated values at final breaksIndex
function argout=ncsaproxu(Mat,varargin)
%% -----------------------------------------------------------
if (size(Mat,1) < 2)
error('At least two rows are required in Data Matrix');
end
while(dsmax > MxAllowSqD)
%% index w.r.t. Mat where sq. dist. is max among all segments
dsmaxGlbInd=MxSqDGlbIndTen(dsmaxLocInd,2);
fbi(length(fbi)+1)=dsmaxGlbInd;   % append
fbi=sort(fbi);                    % sort
%% Cardinal interpolation b/w Pj and Pj+1 requires Pj-1 and Pj+2.
%% Therefore if the new break index is k, the affected breaks are k-3 to k+3,
%% and the affected segments lie b/w the k-2 to k+2 breaks.

%% Finding the range of fbi that would be affected by adding a new
%% point at the max-square-distance position.
%% If the kth row matches then get at most rows k-3 to k+3 of fbi.
[EffbreaksIndex]=FindGivenRangeMatchedMat([fbi],[1 ; dsmaxGlbInd], 3);
si=EffbreaksIndex(2);        % intrp. values for fbi(1:si) are already computed
SL=length(EffbreaksIndex)-1; % second-last position of EffbreaksIndex array
ei=EffbreaksIndex(SL);       % intrp. values for fbi(ei:end,:) are already computed
MatEff=Mat(EffbreaksIndex,:);% values of Mat at EffbreaksIndex

%% Finding tension of affected segments
TVecEff=crdMatfindtensionleastsq(Mat,EffbreaksIndex);

%% Now we do Cardinal interpolation of the affected segments only
[MatInterpNew]=crdspinterp( MatEff, EffbreaksIndex,TVecEff ); % new values

%% Combining new and old values (old + new + old),
%% not taking the common point b/w old-new and b/w new-old.
MatI=[MatI(1:si-1,:); MatInterpNew; MatI(ei+1:end,:)];

%% Now we find the max square distance of the affected segments only
[sqDistAryN,indexAryGlobalN]=MaxSqDistAndInd4EachSegbw2Mat(Mat,...
MatI, EffbreaksIndex(2:SL) ); % new
sqDistMatN=[sqDistAryN',indexAryGlobalN', TVecEff]; % new mat

MxSqDGlbIndTen=[MxSqDGlbIndTen(1:dsmaxLocInd-2,:);...
sqDistMatN;...
MxSqDGlbIndTen(dsmaxLocInd+2:end,:)];
[dsmax, dsmaxLocInd]=max(MxSqDGlbIndTen(:,1));
end
%% ---------------------------------------------------------------
fbp=Mat(fbi,:);                   % break points
fbp=converttoclass(fbp,datatype); % convert to class of original Mat
MxSqD=dsmax;                      % MxSqD <= MxAllowSqD
TVec=MxSqDGlbIndTen(:,3);         % tension values for each segment
Linear Bézier fitting, point/pixel-level error bound
%% Approximation of data by a parametric line.
%% Finds break points in the given data set that approximate the
%% given data up to a specified squared distance limit.
%% Uniform parameterization is used.
%%
%% INPUT
%% Mat: original data (e.g. boundary data) to be approximated
%% ibi: initial break point indices. For each point (row) in Mat,
%%      ibi has its corresponding index (row) in Mat
%% MxAllowSqD: tolerance limit b/w original and fitted spline.
%%      If the distance b/w original data and fitted data
%%      is greater than MxAllowSqD then the segment is broken at the
%%      point of max. squared distance
%% cmd: a string variable; if its value is 'stop' then
%%      fit a line to only the initial break indices (DOES NOT BREAK)
%%
%% OUTPUT
%% fbp:    final set of break points
%% fbi:    final break point indices (optional)
%% MxSqD:  max squared distance between original data and parametric
%%         values; MxSqD <= MxAllowSqD (optional)
%% MxSqDInd: index of max. squared distance with respect to Mat
%%
%% One can think of each row of Mat & fbp as [w, x, y, z, ...],
%% i.e. each row has one point P=(w,x,y,z,...), so the format of Mat is
%%      [P1;
%%       P2;
%%       P3;
%%       P4;
%%       ...
%%       PN];
function [fbp,fbi,MxSqD,MxSqDInd]=lineapproxu(Mat,varargin)
MxSqD=0;
if (size(Mat,1) < 2)
error('At least two rows (two points) are required in Data Matrix');
end
%%% Default values
MxAllowSqD=1;
ibi=[1; size(Mat,1)]; % first & last
cmd='';
defaultValues = {MxAllowSqD,ibi,cmd};
%%% Assign values
nonemptyIdx = ~cellfun('isempty',varargin);
defaultValues(nonemptyIdx) = varargin(nonemptyIdx);
[MxAllowSqD,ibi,cmd] = deal(defaultValues{:});
%%% ----------------------------------------
datatype=class(Mat);       % original data type of Mat
Mat=double(Mat);           % convert to double (necessary for computation)
MxAllowSqD=double(MxAllowSqD);
ibi=getcolvector(ibi);
ibi=[ibi; 1; size(Mat,1)]; % make sure first & last are included
ibi=unique(ibi);           % sort and remove duplicates if any
fbi=ibi; clear ibi;
fbp=[];
%% fbi is going to be updated incrementally based on the
%% break-and-fit algorithm. At the end, the values at fbi
%% will be the values of fbp.

while( MxSqD > MxAllowSqD )
%% appending index of new segmentation into fbi
MxSqDInd=sqDistMat(localIndex,2); % index w.r.t. Mat where sq. dist. is max among all segments
fbi(length(fbi)+1)=MxSqDInd;      % append
fbi=sort(fbi);                    % sort

%% EfffinalbreaksIndex is the range of fbi that would be
%% affected by adding a new point at the max-square-distance position.
%% If the kth row matches then get at most rows k-1 to k+1 of fbi.
[EfffinalbreaksIndex]=FindGivenRangeMatchedMat([fbi],[1 ; MxSqDInd], 1);
EffVal=Mat(EfffinalbreaksIndex,:); % values of affected breaks

%% Line interpolation of new segments using effective break points
[MatINew]=ndlineinterpuseg(EffVal,EfffinalbreaksIndex);
%% A newly inserted segment shares common break points with
%% its previous and next segments.
si=EfffinalbreaksIndex(1);   % interpolated values of fbi(1:si) are already computed
ei=EfffinalbreaksIndex(end); % interpolated values of fbi(ei:end,:) are already computed

%% Combining new and old interpolation values (old + new + old),
%% skipping the common point.
MatI=[MatI(1:si-1,:); MatINew; MatI(ei+1:end,:)];

%% Now we find the max-square-distance value
%% and index of the affected segments only
[sqDistAryN,indexAryGlobalN]=MaxSqDistAndInd4EachSegbw2Mat(...
Mat,MatI, EfffinalbreaksIndex );
sqDistMatN=[sqDistAryN',indexAryGlobalN']; % new mat

%% when initially there was one segment b/w first and last breaks
if( size(sqDistMat,1)==1 )
sqDistMat=sqDistMatN;
else
%% combining new and old sqDistMat values (old + new + old)
sqDistMat=[sqDistMat(1:localIndex-1,:);...
sqDistMatN;...
sqDistMat(localIndex+1:length(sqDistMat),:)];
end
[MxSqD, localIndex]=max(sqDistMat(:,1));
end
%% no break at this index because at this point MxSqD <= MxAllowSqD

while( MxSqD > MxAllowSqD )
%% appending index of new segmentation into ibi
%% index w.r.t. Mat where sq. dist. is max among all segments
MxSqDInd=sqDistMat(localIndex,2);
ibi(length(ibi)+1)=MxSqDInd;      % append
ibi=sort(ibi);                    % sort
%% Finding the range of ibi that would be affected by adding a new
%% point at the max-square-distance position.
%% If the kth row matches then get at most rows k-1 to k+1 of ibi.
[EffinitialbreaksIndex]=FindGivenRangeMatchedMat([ibi],[1 ; MxSqDInd], 1);

%% Finding control points of the two new segments (obtained by breaking a segment).
%% Since we are passing EffinitialbreaksIndex, findqbzcplsallseg will only take
%% the relevant segments' data from Mat.
[p0matN,p1matN,p2matN,tiN]=findqbzcplsallseg(Mat,EffinitialbreaksIndex,'u');

%% Combining new and old control point values (old + new + old).
%% If only one row in sqDistMat (case when initially there were only two
%% breakpoints)
if( size(sqDistMat,1)==1 )
p0mat=p0matN; p1mat=p1matN; p2mat=p2matN;
else
p0mat=[p0mat(1:localIndex-1,:); p0matN; p0mat(localIndex+1:end,:)];
p1mat=[p1mat(1:localIndex-1,:); p1matN; p1mat(localIndex+1:end,:)];
p2mat=[p2mat(1:localIndex-1,:); p2matN; p2mat(localIndex+1:end,:)];
end

%% Bezier interpolation of new segments
[MatINew]=qbzIntrpcpmatsegvec(p0matN,p1matN,p2matN,EffinitialbreaksIndex,tiN);

si=EffinitialbreaksIndex(1); % intrp. values ibi(1:si) are already computed
ei=EffinitialbreaksIndex(end); % intrp. values ibi(ei:end,:) are already computed

%% Combining new and old interpolation values (old + new + old),
%% not taking the common point b/w old-new and b/w new-old
MatI=[MatI(1:si-1,:); MatINew; MatI(ei+1:end,:)];

%% Now we find the max-square-distance of the affected segments only
[sqDistAryN,indexAryGlobalN]=MaxSqDistAndInd4EachSegbw2Mat(...
Mat,MatI, EffinitialbreaksIndex ); % new
sqDistMatN=[sqDistAryN',indexAryGlobalN']; % new mat

%% If only one row in sqDistMat (case when initially there were only two
%% breakpoints)
if( size(sqDistMat,1)==1 )
sqDistMat=sqDistMatN;
else
%% combining new and old sqDistMat values (old + new + old)
sqDistMat=[sqDistMat(1:localIndex-1,:);...
sqDistMatN;...
sqDistMat(localIndex+1:length(sqDistMat),:)];
end
[MxSqD, localIndex]=max(sqDistMat(:,1));
end
fbi=ibi;
p0mat=converttoclass(p0mat,datatype); % convert to class of original Mat
p1mat=converttoclass(p1mat,datatype);
p2mat=converttoclass(p2mat,datatype);
%%%%% NOTES ===============
%% Alternatively, combining new and old sqDistMat values
%% can be done as follows (slightly less efficient):
%   sqDistMat(localIndex,:)=[];       % remove previous local row
%   sqDistMat=[sqDistMat;sqDistMatN]; % append two new rows
%   sqDistMat=sortrows(sqDistMat,2)   % sort by index (second column)
Cubic Bézier fitting, point/pixel-level error bound
% Approximation of data by cubic Bezier curves.
% Based on least-squares fit, uniform parameterization.
% Finds control points of the Bezier curve that approximates the
% given data up to a specified squared distance limit.
%
% INPUT
% Mat: original data (e.g. boundary data) to be approximated
% ibi: initial break point indices. For each point (row) in Mat,
%      ibi has its corresponding index (row) in Mat
% MxAllowSqD: tolerance limit b/w original and fitted spline.
%      If the distance b/w original data and fitted data
%      is greater than MxAllowSqD then the segment is broken at the
%      point of max. squared distance
%
% OUTPUT
% p0mat,p1mat,p2mat,p3mat: final set of control points
% fbi: final break point indices (optional)
% MxSqD: max squared distance between original data and
%        parametric values; MxSqD <= MxAllowSqD (optional)
%
% One can think of each row of Mat, p0mat, p1mat, p2mat, p3mat as
% [w, x, y, z, ...], i.e. each row has one point P=(w,x,y,z,...),
% so the format of Mat is
%      [P1;
%       P2;
%       P3;
%       P4;
%       ...
%       PN];
function [p0mat,p1mat,p2mat,p3mat,fbi,MxSqD]=bzapproxu(Mat,varargin)
p0mat=[];
p1mat=[];
p2mat=[];
p3mat=[];
fbi=[]; MxSqD=0;
if (size(Mat,1) < 4)
error('At least four points are required in Data Matrix');
end
%%% Default values %%%
MxAllowSqD=1;
ibi=[1; size(Mat,1)]; % first & last
defaultValues = {MxAllowSqD ibi};
%%% Assign values %%%
nonemptyIdx = ~cellfun('isempty',varargin);
defaultValues(nonemptyIdx) = varargin(nonemptyIdx);
[MxAllowSqD ibi] = deal(defaultValues{:});
%%% ----------------------------------------
datatype=class(Mat); % original data type
Mat=double(Mat);     % convert to double (necessary for computation)
MxAllowSqD=double(MxAllowSqD);
%%% ----------------------------------------
while( MxSqD > MxAllowSqD )
%% appending index of new segmentation into ibi
%% index w.r.t. Mat where sq. dist. is max among all segments
MaxSqDistIndex=sqDistMat(localIndex,2);
ibi(length(ibi)+1)=MaxSqDistIndex; % append
ibi=sort(ibi);                     % sort

%% Finding the range of ibi that would be affected by adding a new
%% point at the max-square-distance position.
%% If the kth row matches then get at most rows k-1 to k+1 of ibi.
[EffinitialbreaksIndex]=FindGivenRangeMatchedMat([ibi],[1 ; MaxSqDistIndex], 1);

%% Finding control points of the two new segments (obtained by breaking a segment).
%% Since we are passing EffinitialbreaksIndex, FindBzCP4AllSeg will only take
%% the relevant segments' data from Mat.
[p0matN,p1matN,p2matN,p3matN,tiN]=FindBzCP4AllSeg(...
Mat,EffinitialbreaksIndex,'u');

%% Combining new and old control point values (old + new + old).
%% If only one row in sqDistMat (case when initially there were only two
%% breakpoints)
if( size(sqDistMat,1)==1 )
p0mat=p0matN; p1mat=p1matN; p2mat=p2matN; p3mat=p3matN;
else
p0mat=[p0mat(1:localIndex-1,:); p0matN; p0mat(localIndex+1:end,:)];
p1mat=[p1mat(1:localIndex-1,:); p1matN; p1mat(localIndex+1:end,:)];
p2mat=[p2mat(1:localIndex-1,:); p2matN; p2mat(localIndex+1:end,:)];
p3mat=[p3mat(1:localIndex-1,:); p3matN; p3mat(localIndex+1:end,:)];
end

%% Bezier interpolation of new segments
[MatINew]=BezierInterpCPMatSegVec(p0matN,p1matN,p2matN,p3matN,...
EffinitialbreaksIndex,tiN);
si=EffinitialbreaksIndex(1);   % intrp. values ibi(1:si) are already computed
ei=EffinitialbreaksIndex(end); % intrp. values ibi(ei:end,:) are already computed

%% Combining new and old interpolation values (old + new + old),
%% not taking the common point b/w old-new and b/w new-old
MatI=[MatI(1:si-1,:); MatINew; MatI(ei+1:end,:)];

%% Now we find the max-square-distance of the affected segments only
[sqDistAryN,indexAryGlobalN]=MaxSqDistAndInd4EachSegbw2Mat(...
Mat,MatI, EffinitialbreaksIndex ); % new
sqDistMatN=[sqDistAryN',indexAryGlobalN']; % new mat

%% If only one row in sqDistMat (case when initially
%% there were only two breakpoints)
if( size(sqDistMat,1)==1 )
sqDistMat=sqDistMatN;
else
%% combining new and old sqDistMat values (old + new + old)
sqDistMat=[sqDistMat(1:localIndex-1,:);...
sqDistMatN;...
sqDistMat(localIndex+1:length(sqDistMat),:)];
end
[MxSqD, localIndex]=max(sqDistMat(:,1));
end
fbi=ibi;
%% Convert to class of original Mat
p0mat=converttoclass(p0mat,datatype);
p1mat=converttoclass(p1mat,datatype);
p2mat=converttoclass(p2mat,datatype);
p3mat=converttoclass(p3mat,datatype);
%%%%% NOTES ===============
%% Alternatively, combining new and old sqDistMat values
%% can be done as follows (slightly less efficient):
%   sqDistMat(localIndex,:)=[];       % remove previous local row
%   sqDistMat=[sqDistMat;sqDistMatN]; % append two new rows
%   sqDistMat=sortrows(sqDistMat,2)   % sort by index (second column)
Linear Bézier fitting, segment-level error bound
%% Approximation of data by linear Bezier curves.
%% Based on least-squares fit, uniform parameterization.
%% Finds control points of the Bezier curve that approximates the given
%% data up to a specified mean squared error (MSE) limit for each segment.
%%
%% INPUT
%% Mat: original data
%% ibi: initial break indices. For each point (row) in Mat,
%%      ibi has its corresponding index (row) in Mat
%% MxAllowMSE: max. allowed MSE b/w original and fitted data
%%
%% OUTPUT
%% MatI: interpolated (approximated) data
%% fbp: final set of break points
%% fbi: final break point indices (optional)
%% msesqdsqdInd_segAry(k,:): holds the MSE, max. sq. dist., and
%%      max. sq. dist. index for the kth segment
%%
%% One can think of each row of Mat, p0mat, p1mat, p2mat, p3mat as
%% [w, x, y, z, ...], i.e. each row has one point P=(w,x,y,z,...)
%%      [P1;
%%       P2;
%%       P3;
%%       ...
%%       PN];
function [MatI,fbp,fbi,msesqdsqdInd_segAry]=lbzmseaprxu(Mat,varargin)
fbi=[];
maxmse=0;
%%% Default values
MxAllowMSE=1;
ibi=[1; size(Mat,1)]; % first & last
defaultValues = {MxAllowMSE ibi};
%%% Assign values
nonemptyIdx = ~cellfun('isempty',varargin);
defaultValues(nonemptyIdx) = varargin(nonemptyIdx);
[MxAllowMSE ibi] = deal(defaultValues{:});
%%% ----------------------------------------
datatype=class(Mat); % original data type
Mat=double(Mat);     % convert to double (necessary for computation)
MxAllowMSE=double(MxAllowMSE);
%%% ----------------------------------------
if(MxAllowMSE