Proc. 2nd International Conference on Mathematical Sciences Pros. Persidangan Antarabangsa Sains Matematik Kedua
ICMS2 2010
COMPRESSION OF TEMPERATURE DATA BY USING DAUBECHIES WAVELETS
SAMSUL ARIFFIN ABDUL KARIM, BAKRI ABDUL KARIM, MOHD TAHIR ISMAIL, MOHAMMAD KHATIM HASAN & JUMAT SULAIMAN
ABSTRACT
The wavelet transform is used effectively in data compression, for example in signal and image compression. To compress data, we apply the wavelet transform to the data and then set to zero all transform coefficients that fall below a chosen threshold value, which yields the compressed form of the original data. The quality of the compression is assessed with two measures, the compression ratio (CR) and the root mean square error (RMSE). In this paper we compress the temperature data for Kuala Lumpur from January 1948 until July 2010 by using Daubechies 4 (D4) wavelets. Numerical results are presented using Matlab.
Keywords: Daubechies wavelet; wavelet transform; temperature data; filters; data compression.
1. Introduction
Almost every day we transmit data over the internet or use data such as music, video and photographs. Suppose, for example, that we would like to share a 100MB video with friends around the world; it is difficult to send a file of that size by email. Therefore, we need to compress the data by using a suitable method. Compression reduces the size of the original data to an acceptable compression ratio (CR). Some data may be lost, but in general the loss does not significantly affect the original data. This is why compression is important in daily life as well as in business.
Given an original signal (1D) or image (2D), several transform methods can be applied, such as the Fourier, wavelet, Gabor, Walsh-Hadamard, Karhunen-Loeve (KLT) and Discrete Cosine (DCT, which is used in JPEG and MPEG) transforms, among others. Each transform method has its own capabilities and may be suitable for different applications. The wavelet transform (WT) has been used successfully in image compression, notably for fingerprint compression at the FBI in the USA and in JPEG2000. Both applications use biorthogonal wavelets, which have two scaling functions and two wavelet functions. Orthogonal wavelets also offer many options for compressing a signal or an image.
What is a wavelet? Basically, wavelets come from a scaling function (also known as the father wavelet). The wavelet function (known as the mother wavelet) is generated from the father wavelet, and all of its children are constructed by applying shift and dilation operators to the mother wavelet. One question remains to be answered: how do we choose the correct wavelet basis functions? In our opinion, the answer depends solely on the application at hand.
Many wavelet basis functions are available, such as Daubechies wavelets, symlets, Haar (the simplest wavelet), coiflets, spline wavelets and biorthogonal wavelets, and the most recent ones are post-wavelets such as bandlets, edgelets, grouplets and curvelets. In this paper we use Daubechies 4 (D4) wavelets since, based on our experiments, these wavelets are sufficient to compress the temperature data, and the transform keeps all important features such as spikes, structural breaks and anomalies.
S. A. Abdul Karim, B. Abdul Karim, M. T. Ismail, M. K. Hasan & J. Sulaiman
Lau and Weng (1995) discussed the application of wavelets to the detection of climate signals. The authors used Morlet wavelets (continuous wavelet transform, CWT) and concluded that the WT provides an understanding of the importance of local versus global climate signals via its time-frequency localization. Janicke et al. (2009) utilized the WT for visual exploration of climate variability changes in seasonally averaged northern hemisphere winter and summer data. Torrence and Compo (1998) explain wavelet analysis in detail, together with its applications in atmospheric and oceanic science. In this paper we utilize the wavelet transform (D4) to compress the temperature data. The results indicate that D4 is capable of compressing the original data with good quality.
2. Wavelet Transform
Wavelet analysis is a mathematical tool that transforms the original signal (typically given in the time domain) into a different domain for analysis and processing. It is well suited to non-stationary data, i.e. data whose mean and autocorrelation are not constant over time. There are various choices of wavelet basis functions such as Haar, Daubechies, symlet, Meyer and biorthogonal wavelets. Basically, a wavelet is defined from the scaling function, also known as the father wavelet, and the wavelet function, also known as the mother wavelet (Chui, 1992; Daubechies, 1992; Van Fleet, 2008). Following Karim et al. (2008), suppose that there exists a function $\phi(t) \in L^2(\mathbb{R})$ such that the family of functions
$$\phi_{jk}(t) = 2^{j/2}\,\phi(2^j t - k), \quad j, k \in \mathbb{Z}, \qquad (1)$$
forms, for each fixed $j$, an orthonormal system. We can define the wavelet series as follows:
$$f(x) = \sum_{k} \alpha_k \phi_{0k}(x) + \sum_{j=0}^{\infty} \sum_{k} \beta_{jk} \psi_{jk}(x), \qquad (2)$$
where $\alpha_k$ and $\beta_{jk}$ are the coefficients defined in Eq. (4), and $\{\psi_{jk}\}$, $k \in \mathbb{Z}$, is a basis for $W_j$. The relation in (2) is called a multiresolution expansion of $f$. To turn (2) into a wavelet expansion we use the following expression:
$$\psi_{jk}(x) = 2^{j/2}\,\psi(2^j x - k), \quad j, k \in \mathbb{Z}. \qquad (3)$$
The functions $\phi_{jk}(x)$ and $\psi_{jk}(x)$ are generated from the scaling function (father wavelet) and the mother wavelet, respectively. The coefficients are given by
$$\alpha_k = \int f(x)\,\phi_{0k}(x)\,dx, \qquad \beta_{jk} = \int f(x)\,\psi_{jk}(x)\,dx, \qquad (4)$$
where $\alpha_k$ are called the approximation (coarse) coefficients and $\beta_{jk}$ are called the detail coefficients. Since the development of multiresolution analysis by Mallat (1989), various wavelet functions have been constructed, e.g. the Daubechies wavelets, symlets and coiflets (Daubechies, 1992). Figure 1 shows the scaling and wavelet functions for D4 together with the corresponding lowpass (LP) and highpass (HP) filters.
Compression of Temperature Data by using Daubechies Wavelets
Figure 1: Scaling and Wavelet Functions for Daubechies 4 and its LP filter and HP filter
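As an illustration of the LP and HP filters shown in Figure 1, the following is a minimal Python sketch, not the Matlab code used for this paper. It assumes that D4 denotes the four-tap Daubechies filter (the convention of Van Fleet; Matlab's 'db4' is an eight-tap filter, so the coefficients would differ if that wavelet was meant) and that the signal is extended periodically at the boundary. The names LP, HP and dwt_step are introduced here for illustration only.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Low-pass (scaling) filter, assuming "D4" means the four-tap Daubechies filter.
LP = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# High-pass (wavelet) filter: reverse the low-pass filter and alternate the signs.
HP = [LP[3], -LP[2], LP[1], -LP[0]]


def dwt_step(x):
    """One level of the periodic DWT: returns (approximation, detail)."""
    n = len(x)
    if n % 2 != 0:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for i in range(n // 2):
        # Four consecutive samples starting at 2*i, wrapping around the end.
        window = [x[(2 * i + k) % n] for k in range(4)]
        approx.append(sum(h * v for h, v in zip(LP, window)))
        detail.append(sum(g * v for g, v in zip(HP, window)))
    return approx, detail


# A linear ramp: a smooth trend with no local features.
ramp = [float(t) for t in range(16)]
ramp_approx, ramp_detail = dwt_step(ramp)
print([round(v, 6) for v in ramp_detail])
```

For the ramp, every detail coefficient vanishes except the last one, whose window wraps around the end of the signal. The four-tap filter has two vanishing moments, so constant and linear trends are carried entirely by the approximation coefficients, while jumps and spikes show up in the details.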
3. Thresholding Approach
In order to compress the data, we need a thresholding method. Various thresholding methods exist in the literature, such as hard thresholding, soft thresholding, garrote thresholding and firm thresholding. In the statistics literature, thresholding is known as the shrinkage approach. Refer to Antoniadis (2007) for more details on the various thresholding methods. In data compression, one of the main objectives is to cut off the data at certain values and then reconstruct the original signal or image with an acceptable compression ratio. Normally, hard thresholding is used in data compression. Below, we list an algorithm that can be used to perform data compression for a 1D problem.
3.1 Compression algorithm
1. Input a signal of length N.
2. Apply the DWT with D4 to the data (perform the wavelet transform of the data).
3. Find the hard threshold value. All transform values that are insignificant (i.e. whose magnitude lies below the threshold value) are set equal to zero.
4. Keep only the non-zero (significant) values obtained from the transformation in Step 3. Then apply wavelet compression to the original signal with the threshold value from Step 3.
5. Finally, perform the inverse discrete wavelet transform (IDWT) on the data from Step 4. This decompression step produces an approximation of the original data (an example can be seen in Figure 6).
The details of hard thresholding are as follows. Given a wavelet coefficient $w$ and a threshold value $\lambda$, the hard-thresholded coefficient can be written as
$$\eta_{\mathrm{hard}}(w, \lambda) = w\,I(|w| > \lambda), \qquad (5)$$
where $I$ is the usual indicator function. See Donoho and Johnstone (1994, 1995), Antoniadis (1997, 2007), Hardle et al. (1998), Karim et al. (2010a, 2010b), Strang and Nguyen (1996), Van Fleet (2008) and Karim and Ismail (2008) for more details on denoising techniques. In this paper we use the global threshold value suggested by Donoho and Johnstone (1994, 1995). Figure 2 shows an example of soft thresholding and hard thresholding applied to a function on [-1,1]. Soft thresholding is more suitable for other operations such as denoising, whereas hard thresholding is suitable for data compression since, in compression, we set to zero all the coefficients that lie below the threshold value.
Figure 2: Threshold Selection.
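The hard-thresholding rule of Eq. (5) and its soft-thresholding counterpart can be sketched in a few lines of Python. The functions below are illustrative and their names are hypothetical. The global threshold is assumed here to be the universal threshold sigma * sqrt(2 ln N) of Donoho and Johnstone (1994), with sigma estimated as the median absolute value of the finest-level detail coefficients divided by 0.6745; the exact formula used for the results in this paper is not stated, so this choice is an assumption.

```python
import math


def hard_threshold(w, lam):
    """Eq. (5): keep w unchanged if |w| > lam, otherwise set it to zero."""
    return w if abs(w) > lam else 0.0


def soft_threshold(w, lam):
    """Zero small coefficients and shrink the surviving ones toward zero by lam."""
    if abs(w) <= lam:
        return 0.0
    return math.copysign(abs(w) - lam, w)


def universal_threshold(finest_details, n_samples):
    """Assumed global threshold: sigma * sqrt(2 ln N), sigma from the median |d|."""
    ordered = sorted(abs(v) for v in finest_details)
    m = len(ordered)
    mid = m // 2
    median = ordered[mid] if m % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n_samples))


coeffs = [0.5, -2.0, 1.0, -0.2]
hard = [hard_threshold(w, 0.8) for w in coeffs]
soft = [soft_threshold(w, 0.8) for w in coeffs]
print(hard)
print(soft)
```

Hard thresholding leaves the surviving coefficients untouched, so the retained part of the signal is reproduced exactly. Soft thresholding also reduces the magnitude of every surviving coefficient by the threshold, which is useful for noise removal but adds an avoidable error when the goal is compression.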
4. Results and Discussion
In this section we discuss the application of the Daubechies 4 (D4) wavelet transform to compress time series data (temperature data). The data are monthly temperature data for Kuala Lumpur from January 1948 until July 2010, giving a total of 751 observations. We collected the data from NASA. Figure 3 shows a plot of the original series.
[Plot: original series; vertical axis from 25 to 30, horizontal axis from 0 to 800]
Figure 3: Original Series
Since the Discrete Wavelet Transform (DWT) requires data of dyadic length, we can use any suitable method such as zero padding (see Van Fleet, 2008, for details on this approach). For the analysis, however, we need to consider only the size of the original data. We use this wavelet function because its shape is similar to that of the original data: the series clearly has a fractal-like shape, so D4 seems suitable for this type of data set (of course, other wavelets can also be used). Before the compression, we decompose the original time series (signal) by using D4 at level 5. From Figure 4, we can see that at level 5 the approximation (a5) and detail (d5) clearly show the shape and characteristics of the original data. In other words, the characteristics of the data are captured by the MRA (multiresolution analysis) or wavelet decomposition. This is why wavelets are so efficient for time series analysis. From level 1 to level 5, the detail coefficients filter out the high frequencies, so that we finally obtain a smooth version of the original signal. We can reconstruct the original signal from the level 5 decomposition by using the summation d1+d2+d3+d4+d5+a5. This is shown in Figure 5 (we obtain exactly the same signal as before the decomposition).
How, then, do we compress the data? Once we have decomposed the original data at level 5, we would need to calculate a threshold value for every level. In this paper, however, we calculate the threshold value only once since we apply a global threshold (all levels have the same threshold value). With the hard threshold value, we apply compression to the original data. Figure 6 shows the result. The compressed signal resembles the original signal. Indeed, based on the statistical results in Table 1, the RMSE is $1.23 \times 10^{-3}$ and the CR is 1:10, which indicates that the compressed signal is quite good in terms of compression quality. The percentage of zeros in the detail coefficients (84.95%) and the retained energy (99.99%) are also very high. Therefore, by using the D4 wavelet and applying level 5 compression with a hard threshold value of 0.8166, we obtain good compressed data.
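The whole procedure (zero padding, level 5 decomposition, hard thresholding of the details with one global threshold, and reconstruction) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Matlab computation behind the reported results: D4 is taken to be the four-tap Daubechies filter with periodic boundary handling, the series is synthetic rather than the Kuala Lumpur record, and all function names are hypothetical.

```python
import math

S3, NORM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
# Assumption: "D4" is the four-tap Daubechies filter.
LP = [(1 + S3) / NORM, (3 + S3) / NORM, (3 - S3) / NORM, (1 - S3) / NORM]
HP = [LP[3], -LP[2], LP[1], -LP[0]]


def pad_to_dyadic(x):
    """Append zeros until the length is a power of two."""
    n = 1
    while n < len(x):
        n *= 2
    return list(x) + [0.0] * (n - len(x))


def dwt_step(x):
    """One periodic analysis step: returns (approximation, detail)."""
    n = len(x)
    half = range(n // 2)
    a = [sum(LP[k] * x[(2 * i + k) % n] for k in range(4)) for i in half]
    d = [sum(HP[k] * x[(2 * i + k) % n] for k in range(4)) for i in half]
    return a, d


def idwt_step(a, d):
    """Inverse of dwt_step; the transform is orthogonal, so this is its transpose."""
    n = 2 * len(a)
    x = [0.0] * n
    for i in range(len(a)):
        for k in range(4):
            x[(2 * i + k) % n] += LP[k] * a[i] + HP[k] * d[i]
    return x


def compress(signal, levels, lam):
    """Decompose, hard-threshold every detail level with lam, and reconstruct."""
    approx = pad_to_dyadic(signal)
    details = []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append([w if abs(w) > lam else 0.0 for w in d])
    for d in reversed(details):
        approx = idwt_step(approx, d)
    return approx[: len(signal)]  # drop the padding again


# Synthetic stand-in for the temperature series: 751 values with a 12-sample cycle.
series = [27.0 + math.sin(2 * math.pi * t / 12) for t in range(751)]
restored = compress(series, levels=5, lam=0.8166)
rmse = math.sqrt(sum((s - r) ** 2 for s, r in zip(series, restored)) / len(series))
print(len(restored), round(rmse, 4))
```

With a threshold of zero the function returns the input up to rounding error, which corresponds to the exact reconstruction d1+d2+d3+d4+d5+a5. Note that zero padding introduces an artificial jump from about 27 to 0 at the end of the series, which produces a few large detail coefficients there; other extension methods may behave better near the boundary.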
Figure 4: Wavelet decomposition at level 5 for original temperature data in Kuala Lumpur by using D4- left (approximation) and right (detail)
Figure 5: Reconstructed signal d1+d2+d3+d4+d5+a5
Figure 4 shows the wavelet decomposition of the original temperature data. We can reconstruct the signal by adding the details from level 1 to level 5 and the approximation at level 5 (Figure 5). After that, we select a suitable threshold value (in this paper we used a global threshold) and then apply compression to the data by using D4. Overall, the level 5 approximation (a5) and detail (d5) indicate an increasing trend in the temperature. We attribute this to global warming caused by the greenhouse effect, which is getting worse year by year.
Table 1: Statistical Analysis for Compression of the Time Series by Using the D4 Wavelet
Wavelet  Level  Retained energy (%)  Zeroes in details (%)  RMSE          SD      MEAD    MAD     CR
D4       5      99.99                84.95                  1.23 × 10^-3  0.3312  0.2205  0.2667  1:10
SD: standard deviation; MEAD: median absolute deviation; MAD: mean absolute deviation; CR: compression ratio.
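For readers who wish to compute quantities of the kind reported in Table 1, the sketch below gives commonly used definitions in Python. These definitions are assumptions: the paper does not state the formulas behind its figures, and in particular the compression ratio shown here (total coefficients divided by non-zero coefficients) may differ from the definition that yields the reported 1:10. The function names and the four example coefficients are hypothetical.

```python
import math


def rmse(original, reconstructed):
    """Root mean square error between two equally long sequences."""
    n = len(original)
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n)


def retained_energy_percent(coeffs, kept):
    """Energy of the thresholded coefficients as a percentage of the original energy."""
    total = sum(c * c for c in coeffs)
    return 100.0 * sum(c * c for c in kept) / total


def zeros_percent(kept):
    """Percentage of coefficients that were set to zero."""
    return 100.0 * sum(1 for c in kept if c == 0.0) / len(kept)


def compression_ratio(kept):
    """Assumed definition: number of coefficients per non-zero coefficient."""
    nonzero = sum(1 for c in kept if c != 0.0)
    return len(kept) / nonzero


# Hypothetical wavelet coefficients, hard-thresholded at 0.8.
coeffs = [10.0, 0.5, -3.0, 0.1]
kept = [c if abs(c) > 0.8 else 0.0 for c in coeffs]
print(round(retained_energy_percent(coeffs, kept), 2), zeros_percent(kept), compression_ratio(kept))
```

The example shows why a high percentage of zeros and a high retained energy can coexist: the two discarded coefficients are small, so half of the coefficients are removed while less than one percent of the energy is lost.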
One might ask why we did not use the Fourier transform (FT) to compress the data. To answer this question, we should note that the basis functions of the FT are cosines and sines, which are smooth in nature, whereas the original signal has a fractal-like shape. The FT is therefore not expected to achieve as good a result as D4 wavelets: the FT smooths the data, so spikes and anomalies in the data are removed when we compress with the FT. This does not happen when we compress with the DWT, since the DWT preserves spikes and similar features. To support this point, Figure 7 shows the periodogram, i.e. the plot of the estimated power spectrum versus frequency. It appears from Figure 7 that the fast Fourier transform (FFT) is not sufficient to capture the behavior of the series, since it represents the data only as a function of frequency and carries no information about position in time. Moreover, the plot of the FFT of this signal (Figure 7) shows nothing particularly interesting. With wavelets we can do much more analysis than with the FFT.
Figure 6: Compressed original signal (temperature data) at level 5 using D4.
[Plot: Periodogram; vertical axis from 0 to 4.5 x 10^8, horizontal axis from 0 to 0.5]
Figure 7: The periodogram or the Power Spectrum Estimation
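A periodogram of the kind shown in Figure 7 can be sketched with a direct discrete Fourier transform. The Python code below is a minimal illustration on a synthetic series (a 12-sample cycle plus one isolated spike), not the computation used for Figure 7. The normalization by the series length, the removal of the mean and the function name are assumptions made here.

```python
import cmath
import math


def periodogram(x):
    """Return (frequencies, power) for frequencies 0 to 0.5 cycles per sample."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]  # remove the mean so frequency 0 does not dominate
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        freqs.append(k / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power


# Synthetic series: a 12-sample cycle with an isolated spike at position 60.
series = [27.0 + math.cos(2 * math.pi * t / 12) for t in range(120)]
series[60] += 3.0
freqs, power = periodogram(series)
peak = max(range(len(power)), key=power.__getitem__)
print(freqs[peak])
```

The cycle appears as a single peak at frequency 1/12, whereas the spike contributes the same small amount of power to every frequency. Nothing in the periodogram indicates where the spike occurred, which illustrates the lack of time localization discussed above.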
Conclusions
In this preliminary study on the use of the wavelet transform in analysing temperature data in Malaysia, we have discussed the ability of Daubechies 4 (D4) wavelets. From the experiments we have done, D4 is capable of extracting all important features of the temperature time series. We also performed compression of the temperature data. Based on the results, even at level 5, D4 is capable of preserving all important features, and its decomposition inherits all important features of the time series such as spikes, anomalies and structural breaks. Based on the statistical results, D4 gives good results in compressing the temperature data, i.e. the RMSE and CR are acceptable and the results show good quality in the reconstruction of the temperature data, as can be seen from Table 1. In future research we will compress the temperature data by using various orthogonal wavelet functions, and we will apply standard statistical methods to the data such as Box-Jenkins and Weibull. We will report the results in our forthcoming papers.
Acknowledgement
The first author would like to thank Universiti Teknologi Petronas for providing financial support and computing facilities, including Matlab software. The fourth author would also like to thank UKMGGPM-ICT-110-2010 for sponsoring his attendance at this conference.
References
Antoniadis, A. 1997. Wavelets in Statistics: A Review. Journal of the Italian Statistical Society 6: 1-34.
Antoniadis, A. 2007. Wavelet Methods in Statistics: Some Recent Developments and Their Applications. Statistics Surveys 1: 16-55.
Chui, C.K. 1992. An Introduction to Wavelets. New York: Academic Press.
Daubechies, I. 1992. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 61. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM).
Donoho, D.L. & Johnstone, I.M. 1994. Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika 81: 425-455.
Donoho, D.L. & Johnstone, I.M. 1995. Adapting to Unknown Smoothness via Wavelet Shrinkage. Journal of the American Statistical Association 90: 1200-1224.
Hardle, W., Kerkyacharian, G., Picard, D. & Tsybakov, A. 1998. Wavelets, Approximation and Statistical Applications. Lecture Notes in Statistics, Vol. 129. New York: Springer.
Janicke, H., Bottinger, M., Mikolajewicz, U. & Scheuermann, G. 2009. Visual Exploration of Climate Variability Changes Using Wavelet Analysis. IEEE Transactions on Visualization and Computer Graphics 15(6): 1384-1391.
Karim, S.A.A., Ismail, M.T. & Karim, B. 2008. Denoising Non-Stationary Time Series using Daubechies Wavelets. Seminar Kebangsaan Matematik & Masyarakat (SKMM), UMT Terengganu, 13-14 February 2008, Grand Continental Hotel, Kuala Terengganu (In CD).
Karim, S.A.A. & Ismail, M.T. 2008. Wavelet Method in Statistics. In Proceedings of the Sixteenth National Symposium on Mathematical Sciences, 3-5 June 2008, Hotel Renaissance, Kota Baharu.
Karim, S.A.A., Karim, B.A., Ismail, M.T., Hasan, M.K. & Sulaiman, J. 2010a. Applications of Wavelet Method in Stock Exchange Problem. Proceedings of International Conference on Fundamental and Applied Sciences (ICFAS 2010), 15-17 June 2010, Kuala Lumpur Convention Centre.
Karim, S.A.A., Ismail, M.T., Karim, B.A., Hasan, M.K. & Sulaiman, J. 2010b. Compression KLCI Time Series Data Using Wavelet Transform. World Engineering Congress 2010, 2nd-5th August 2010, Kuching, Sarawak, Malaysia, Conference on Engineering and Technology Education (In CD).
Lau, K.-M. & Weng, H. 1995. Climate Signal Detection Using Wavelet Transform: How to Make a Time Series Sing. Bulletin of the American Meteorological Society 76: 2391-2402.
Mallat, S. 1989. A theory of multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11(9): 674-693.
Mallat, S. 1998. A Wavelet Tour of Signal Processing. San Diego: Academic Press.
Strang, G. & Nguyen, T. 1996. Wavelets and Filter Banks. Wellesley, MA: Wellesley-Cambridge Press.
Torrence, C. & Compo, G.P. 1998. A Practical Guide to Wavelet Analysis. Bulletin of the American Meteorological Society 79: 61-78.
Van Fleet, P.J. 2008. Discrete Wavelet Transformations: An Elementary Approach with Applications. New Jersey: John Wiley & Sons.
Fundamental and Applied Sciences Department, Universiti Teknologi Petronas, Bandar Seri Iskandar, 31750 Tronoh, Perak Darul Ridzuan, Malaysia. E-mail: [email protected] *
Fakulti Ekonomi dan Perniagaan, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia. E-mail: [email protected]
Pusat Pengajian Sains Matematik, Universiti Sains Malaysia, 11800 Minden, Pulau Pinang, Malaysia. E-mail: [email protected]
Program Informatik Industri, Pusat Pengajian Teknologi Maklumat, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia. E-mail: [email protected]
Program Matematik dengan Ekonomi, Sekolah Sains dan Teknologi, Universiti Malaysia Sabah, Bg Berkunci 2073, 88999 Kota Kinabalu, Sabah, Malaysia. E-mail: [email protected]
* Corresponding author