IEEE TRANSACTIONS ON SMART GRID 2016
Gaussian Approximation and Burrows-Wheeler Transform Based Lossless Compression of Smart Grid Readings

Abstract—Smart grids have recently attracted attention because of their high efficiency and throughput performance. They transmit a huge amount of periodically collected waveform readings (e.g. for monitoring). Given the size of these streams, the required bandwidth and storage space become a unique challenge, and compression of these streams offers a significant opportunity to address it. Therefore, this paper proposes a new lossless compression algorithm for smart grid readings. Its uniqueness lies in representing smart grid waveforms using few parameters, which is effectively achieved using Gaussian approximation based on a dynamic nonlinear least-square-error technique; this means that our algorithm can work on any type of waveform. The margin space between the approximated and the actual readings is measured, so that compression is applied only to the limited margin-space points rather than the entire stream of readings. The margin space values are then encoded using the Burrows-Wheeler Transform followed by Move-To-Front and Run-Length encoding to eliminate redundancy. Entropy encoding is finally applied. Both mathematical and empirical experiments have been thoroughly conducted to prove the significant enhancement of the entropy (i.e. almost reduced by half) and the resultant compression ratio (i.e. 3.8:1), which is higher than any known lossless algorithm in this domain.

Index Terms—Smart Grid, Compression, Gaussian, Entropy.
I. INTRODUCTION
The traditional power grid of the past century is now regarded as unsuitable for 21st-century requirements for many reasons, such as lack of automation, deficient outage management and poor real-time analysis [1]. Consequently, a new infrastructure called the smart grid is currently being investigated and deployed, which can automatically gather periodic waveform readings (i.e. using Phasor Measurement Units (PMUs)) every second (e.g. the power consumption of a premise) and transmit them to operation centers using various techniques [2]. The considerable advantages are improved accuracy (e.g. continuous, dynamic electricity distribution and billing), efficiency (e.g. outage management automation), and sustainability (e.g. climate change mitigation). To that end, nations around the globe, standardization bodies, companies and research institutes are working to regulate this field. However, the unusual amount of continuously transmitted data (i.e. from millions of premises) is posing unexpected bandwidth and storage challenges. For example, consider the waveform data collected by the recently implemented smart grid project called the Western Interconnection Synchrophasor Program (WISP), where 300 PMUs were distributed over the US Western states for 15 months. The resultant data was 100 Terabytes, i.e. more than 220 Gigabytes per day [3]. Therefore, compression of smart grid data has
a strong impact on minimizing the required bandwidth and storage space, while ensuring that the available data management and utility communication infrastructure are not overburdened. The proposed compression methods for smart grid waveform readings can be classified into two groups: lossy and lossless [4]. Lossy compression accepts losing some information while trying to maintain the main features of the waveform signal; therefore, the decompressed signal is somewhat different from the original. This kind of compression was acceptable in the traditional grid model, so much research has been done along this path, which can be classified into transformation techniques [5], [6], [7], parametric coding [8] and mixed approaches [9] (see summary in Table I). This is because lossy compression can easily achieve a higher compression ratio at the cost of losing some data. However, lossy compression has recently been discouraged for two reasons: (1) with the emergence of smart grids, the collected readings are potentially used remotely for billing and other purposes, and (2) to maintain the privacy and authenticity of the transmitted readings, current models use steganography to hide secret information randomly inside these readings [10], [11]. Consequently, losing any bit of these readings will no longer be tolerated. On the other hand, lossless compression is obligated to reconstruct the exact waveform signal with zero loss. Due to these constraints, little research has been done in this category, e.g. [12], [13], [14]. Moreover, according to a recent state-of-the-art survey [4], this path is far from being as mature as lossless image, voice and video compression. Therefore, we are compelled to look for better lossless compression mechanisms that achieve a higher compression ratio while preserving all features of the transmitted readings.
One of the main constraints for any lossless compression is that less similarity in the waveform readings means less likelihood of compression. The mathematical measurement of this in information theory is called entropy (i.e. the minimum number of bits required to represent a value after compression; see Section IV). This becomes worse for smart grid waveform readings because of the precision required to reconstruct the signal (i.e. floating-point values). Therefore, the main question driving this paper is: how can waveform smart grid readings be preprocessed to significantly reduce the entropy? In this paper, a new lossless compression algorithm for waveform smart grid readings is proposed. The main target is representing smart grid waveform readings with few parameters. This is successfully accomplished using Gaussian approximation
Fig. 1. The main scenario of our proposed model, where household power consumption activities are collected as waveform readings and compressed before transmission to operation centers.
[25]. The difference between the approximated and the actual waveform readings is calculated; therefore, the compression is applied only to the margin space rather than the entire stream of waveform readings. The margin space values are encoded using the Burrows-Wheeler Transform (BWT) followed by Move-To-Front and Run-Length Encoding (RLE) to rearrange the values and eliminate redundancy. Entropy coding is finally applied. The rest of this paper is organized as follows. Section II summarizes the relevant work. Section III introduces our algorithm in its different stages. Then, evaluation of different characteristics of the proposed technique is introduced in Section IV. Section V discusses our experiments and the obtained results. We finally draw our conclusions in Section VI.

II. RELATED WORK
Most of the studies conducted on collected waveform readings have focused on lossy compression. This is due to (1) the readings not being directly transmitted and used for crucial purposes such as billing and real-time analysis in the traditional grid system, and (2) the effectiveness of a transformation technique called the wavelet transform, which helps represent waveform signals in few values (i.e. while losing some bits from every reading). The lossy compression work can be grouped by technique into transformation, parametric coding and mixed models. Firstly, transformation models, such as the work of Surya Santoso et al. [5], where the discrete wavelet transform was applied to concentrate most of the signal energy in low-frequency coefficients (i.e. using dbX), allowing the others to be removed. Additional work has been conducted using different wavelet families such as Slantlet [15] and B-Spline [16]. Secondly, parametric coding models, such as the work of Tcheou Michel et al. [8], where damped sinusoid models were used to extract signal features before compression. Finally, mixed transformation and parametric models, such as the work proposed by Ribeiro Moises et al.
[9] where they used fundamental harmonic and transient coding together.
On the other hand, little work has been conducted on lossless compression, due to its restrictions and the nature of waveform readings. Lossless compression can be classified by technique into dictionary-, entropy- and mixed-based models. Dictionary-based algorithms rely mainly on general compressors (e.g. ZIP, GZIP and LZO), where a dictionary is built and more frequent samples are represented in fewer bits, whereas more bits are allocated to less frequent samples. An example is the work of Gerek Omer and Ece Dogan [17], where Lempel-Ziv was used to compress a stream of waveform readings; the achieved compression ratio was 2.5:1 bin-to-bin. However, dictionary algorithms are fundamentally designed for letters (e.g. English characters), where the number of options is limited. This is ill-suited for waveform signals because of their floating-point nature: every integer value has thousands of variants due to its fractional part. Entropy-based algorithms are statistical models designed around measuring the probability of every symbol within a stream and allocating fewer bits for higher probabilities and vice versa. An example is the work of Kraus Jan et al. [13], where arithmetic coding was used to replace the input symbols with a single floating-point value; the achieved compression ratio was 2.6:1. Zhang Dahai et al. [12] also proposed a model that improves Huffman coding by preprocessing the data using higher-order delta modulation; the improvement was from 1.7:1 to 2.3:1. Additionally, Joseph Tate [14] recently introduced a model that uses Golomb-Rice coding after preprocessing the data with several methods such as frequency-compensated differencing; the achieved compression ratio was 2.8:1. Mixed algorithms are more sophisticated techniques using both dictionary and statistical algorithms, which allows exploiting both the frequency of repetition and its probability within a stream of values. For example, Kraus Jan et al.
[19] introduced a model that improves the LZMA algorithm to reduce the redundancy in waveform readings. This is accomplished by using prediction models based on interval selection optimization and differential encoding. The achieved compression was 2.6:1. Kraus Jan and Tobiska Tomas [13] also proposed a model
TABLE I
RELATED WORK SUMMARY

Group    | Category          | Main Tech                 | Reported CR  | Measured CR   | Distortion Metric | Distortion Value | Comments                                           | Ref
---------|-------------------|---------------------------|--------------|---------------|-------------------|------------------|----------------------------------------------------|-----------
Lossless | Dictionary        | Lempel-Ziv                | 5:1          | 2.5:1         | -                 | -                | Measurement should be bin-to-bin, not text-to-text | [17]
Lossless | Entropy           | Huffman & Delta-Huffman   | 1.7:1 & 2:1  | 1.8:1 & 2.2:1 | -                 | -                | Accurate numbers based on bin-to-bin               | [12]
Lossless | Entropy           | Arithmetic Code           | 2.5:1        | 2.4:1         | -                 | -                | Better than Huffman                                | [18], [13]
Lossless | Entropy           | Invert-Trans Golomb       | 2.8:1        | 2.8:1         | -                 | -                | Accurate results & less overhead                   | [14]
Lossless | Mixed             | Prediction & LZMA         | 2.5:1        | 2.5:1         | -                 | -                | Improves 10% over LZMA                             | [19]
Lossless | Mixed             | Bzip2 & Delta-Bzip2       | 2.8:1 & 3:1  | 2.7:1 & 2.9:1 | -                 | -                | The best existing lossless model                   | [13]
Lossy    | Transform         | Daubechies DWT            | 3:1          | -             | NMSE              | 10^-5            | Losing info                                        | [5]
Lossy    | Transform         | Daubechies DWT            | 3.4:1        | -             | NMSE              | 10^-4            | Better. Still losing info                          | [7]
Lossy    | Transform         | Daubechies DWT            | 5:1          | -             | SNR & RMS         | -                | Better. Still losing info                          | [6]
Lossy    | Transform         | Slantlet DWT              | 10:1         | -             | MSE               | 27dB & 10^-3     | Serious loss                                       | [15]
Lossy    | Transform         | B-Spline DWT              | 15:1         | -             | MSE               | -19dB            | Very high distortion                               | [16]
Lossy    | Transform         | WPT & LZW                 | 10:1         | -             | PRD               | -25dB            | Losing crucial readings                            | [20]
Lossy    | Transform         | WPT & AC                  | 6:1          | -             | NMSE              | 10%              | Losing info                                        | [21]
Lossy    | Transform         | EZW                       | 10-16:1      | -             | NMSE              | 10^-5            | Losing info                                        | [18], [22]
Lossy    | Transform         | Lifting WT & Huffman      | 20:1         | -             | SNR               | > 25             | Huge loss                                          | [23]
Lossy    | Transform         | Singular Value            | > 20         | -             | -                 | -                | Very high distortion                               | [24]
Lossy    | Parametric Coding | Damped sinusoids modeling | 16:1         | -             | MAE & MPE         | 4.1% & 0.036     | Losing many bits                                   | [8]
Lossy    | Mixed Transform & Parametric | Fundamental, Harmonic & Transient coding | 16:1 | - | SNR       | > 30dB           | Higher distortion                                  | [9]
that improves BZIP2 by applying delta modulation after an efficient block-sorting Burrows-Wheeler algorithm. The achieved compression ratio was 2.9:1. Table I summarizes most of the related work by categorizing it into groups.

III. METHODOLOGY
The principle behind the proposed technique is representing smart grid readings using few parameters. This has been successfully achieved using Gaussian approximation. The margin space between the approximated and the actual readings is calculated before encoding with the Burrows-Wheeler Transform (BWT), followed by Move-To-Front and RLE to eliminate the repetition. Entropy coding is finally applied. The core challenge of our model is the precision of the Gaussian model and the selection of its appropriate parameters to produce accurately approximated smart grid readings.

A. Gaussian Approximation
In probability theory, the Gaussian distribution is a very well-known continuous probability distribution. Its significance in statistics comes from its ability to represent real-valued randomly fluctuating signals whose distribution is not known [25]. This inspired us to use Gaussian distribution functions in our compression algorithm to approximate the
smart grid readings. A simple Gaussian function is depicted in Fig. 2 and shown in Eq. 1:

f(x) = a e^(−((x−b)/c)^2)    (1)
where x is a discrete variable and the Gaussian function parameters are a, the amplitude of the highest peak; b, the centroid of the model; and c, the peak's width. To depict a multi-peak signal, the same Gaussian equation can be reformulated as shown in Eq. 2.
f(x) = Σ_{i=1}^{n} a_i e^(−((x−b_i)/c_i)^2)    (2)
where n reflects the number of Gaussian functions (i.e. the peaks required to fit) and f(x) represents the smart grid readings. The crucial question is how the parameter values are selected. Therefore, in our model, a suitable optimization method called the trust region algorithm is used to calculate these parameters, due to its robust behavior on ill-conditioned problems while maintaining very strong convergence properties. This minimizes the difference between the approximated and the actual values of the smart meter readings. The variation between the original readings (y) and the Gaussian-approximated readings signal (ŷ) is carefully
Fig. 2. Three examples of Gaussian approximation optimization. (a) Plots of more than 1500 smart grid power readings and their Gaussian approximations, and (b) plots of the resultant residuals after calculating the margin (i.e. highlighted in blue). The third is obviously better than the first and second due to its very low residual space.
measured using a non-linear least-squares equation, as shown in Eq. 3:

f = Σ_{i=1}^{n} (y_i − ŷ_i)^2    (3)
where f demonstrates the residual sum of squares that has to be reduced. Therefore, the trust region algorithm [25] is used to find the optimum Gaussian parameters x that minimize the objective function f. This is done by approximating the original function f with a quadratic model m_δ(p) to find the optimum step p by which the parameter values x have to be scaled up or down. The step at iteration δ is specified by solving this quadratic model [26], as shown in Eq. 4:

m_δ(p) = f_δ + p^T g_δ + (1/2) p^T B_δ p    (4)
where f_δ = f(x_δ) is the value of the objective function at the current parameter values x_δ, B_δ is the Hessian of f, and g_δ is the gradient of f at x_δ. The step p (i.e. the solution of this subproblem) is restricted to a particular region Δ_δ called the trust region, as depicted in Eq. 5:

‖p‖ ≤ Δ_δ    (5)
The trust region can be scaled up or down based on how accurately the quadratic model approximates the original objective function. For this purpose, a reduction factor r_δ is introduced to examine the performance of the quadratic approximation, as formulated in Eq. 6:

r_δ = (f(x_δ) − f(x_δ + p_δ)) / (m_δ(0) − m_δ(p_δ))    (6)
The trust region Δ_δ is changed based on the value of r_δ as follows:

Δ_δ ↑ if r_δ > 3/4;  Δ_δ ↓ if r_δ < 1/4;  Δ_δ unchanged otherwise    (7)

The parameters are finally updated using the determined step p_δ, and the operation is repeated until the stop condition is reached. The steps shown in Algorithm 1 can be used to calculate the Gaussian approximation parameters. For better understanding, Fig. 2 shows three examples of Gaussian approximation optimization, which clearly show that the more accurate the Gaussian approximation is, the smaller the remaining margin values are. In this process alone, more than 50% of the values have been zeroed and the others have been significantly reduced. The advantage is achieving a higher compression ratio while keeping just a few parameters to reproduce the Gaussian approximation and recover the readings.
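As a sketch of this fitting stage (our illustration; the paper does not name an implementation), SciPy's `least_squares` with the trust-region reflective method (`method="trf"`) can minimize the residuals of Eq. 3 for the multi-peak model of Eq. 2. All function and variable names here are ours:

```python
import numpy as np
from scipy.optimize import least_squares

def gaussian_sum(x, params):
    # Eq. 2: params = [a1, b1, c1, a2, b2, c2, ...], one triple per peak.
    y = np.zeros_like(x, dtype=float)
    for a, b, c in np.reshape(params, (-1, 3)):
        y += a * np.exp(-(((x - b) / c) ** 2))
    return y

def fit_gaussians(x, y, init_params):
    # Minimize the least-squares residuals (Eq. 3) with a trust-region solver.
    res = least_squares(lambda p: gaussian_sum(x, p) - y,
                        init_params, method="trf")
    return res.x

# Synthetic two-peak example (illustrative values only).
x = np.linspace(0, 1500, 1500)
true_params = [300, 400, 120, 250, 1000, 200]
y = gaussian_sum(x, true_params)
fitted = fit_gaussians(x, y, init_params=[250, 350, 100, 200, 900, 150])
residuals = y - gaussian_sum(x, fitted)   # the "margin space" of Sec. III-B
```

With a noiseless signal and a reasonable initial guess, the residual margin collapses toward zero, which is exactly what makes the later BWT/MTF/RLE stages effective.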
B. Margin Calculation
Gaussian distribution optimizations have been thoroughly examined to achieve the best possible generic distribution. To ensure unbiased results, the final optimum collection of parameters has been used in all our experiments. Next, the difference between the smart grid readings and their Gaussian approximation is calculated (see Eq. 8). The significance is that the compression will only be applied to the calculated margin space rather than the entire set of readings. Also, to avoid
Algorithm 1 Nonlinear fitting algorithm
1: Initialize the Gaussian parameter values x_0.
2: Reset the maximum step length Δ̂.
3: Initialize the trust region size Δ_0 ∈ [0, 1/4].
4: f is the least-squares error to be reduced (see Eq. 3).
5: δ = 1
6: while x_δ is not ideal do
7:   Solve Eqs. 4 and 5 to find the step p_δ.
8:   Calculate the reduction factor r_δ (see Eq. 6).
9:   if r_δ > 3/4 then
10:    x_{δ+1} = x_δ + p_δ
11:  else
12:    x_{δ+1} = x_δ
13:  end if
14:  if r_δ < 1/4 then
15:    Δ_{δ+1} = (1/4) Δ_δ
16:  else if r_δ > 3/4 and ‖p‖ = Δ_δ then
17:    Δ_{δ+1} = min(2Δ_δ, Δ̂)
18:  else
19:    Δ_{δ+1} = Δ_δ
20:  end if
21:  δ = δ + 1
22: end while
large differences in consecutive calculated margin values, the first derivative is applied, as shown in Eq. 9.

φ = [ y_i − ŷ_i ],  i = 1, …, n    (8)

D = [φ(2) − φ(1), φ(3) − φ(2), …, φ(Ω) − φ(Ω − 1)]    (9)

where φ is the calculated margin, Ω is the length of φ, and D is the resultant derivative vector.

Fig. 3. Comparison between the calculated margin space numbers and their unique values.

C. Burrows-Wheeler Transform
After calculating the first derivative on the margin space, the observation was that less than 10% of the values are unique (see Fig. 3). However, they are scattered, which minimizes their compression effectiveness. Therefore, the BWT is used to rearrange the values so as to gather them into long consecutive sequences of identical symbols. The BWT was originally introduced by Michael Burrows and David Wheeler [27] to transform text into another format that increases its compressibility by follow-up techniques such as MTF and RLE. The significance of this algorithm is that it is reversible with zero additional information. The general idea is rotating and sorting the data (i.e. 1 to n blocks) lexicographically. Let Λ be the block of symbols, in textual or numerical form (numerical in our algorithm), to be compressed:

Λ = Λ_1, Λ_2, …, Λ_n    (10)

The BWT begins by rotating the vector Λ in an iterative manner. This generates a new 2D matrix called ω, as depicted in Eq. 11:

    | Λ_1  Λ_2  Λ_3  ⋯   Λ_n     |
    | Λ_2  Λ_3  ⋯   Λ_n  Λ_1     |
ω = | Λ_3  ⋯   Λ_n  Λ_1  Λ_2     |    (11)
    |  ⋮    ⋮    ⋮    ⋮    ⋮     |
    | Λ_n  Λ_1  Λ_2  ⋯   Λ_{n−1} |

It is clear from Eq. 11 that the rows of ω represent the various rotations of Λ. A new version of ω, called ω̃, is generated by sorting its rows in ascending order. The last column L of ω̃ is retrieved together with the index I that points to the original block Λ. For better understanding, assume the resultant values after calculating the margin and the first derivative are as shown in Table II; for simplicity, they have been converted to characters (i.e. representing Λ). Λ is rotated n times (n being the number of elements) to generate ω, as shown in Eq. 12:

    | ∧ b a n a n a | |
    | | ∧ b a n a n a |
    | a | ∧ b a n a n |
ω = | n a | ∧ b a n a |    (12)
    | a n a | ∧ b a n |
    | n a n a | ∧ b a |
    | a n a n a | ∧ b |
    | b a n a n a | ∧ |

where | represents the end of the data. ω is then sorted to generate ω̃, as in Eq. 13:

    | a n a n a | ∧ b |
    | a n a | ∧ b a n |
    | a | ∧ b a n a n |
ω̃ = | b a n a n a | ∧ |    (13)
    | n a n a | ∧ b a |
    | n a | ∧ b a n a |
    | ∧ b a n a n a | |
    | | ∧ b a n a n a |

L (i.e. the last column) and I (i.e. the index, 7 in this example) are the final output of this stage.
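The forward and inverse transforms described above can be sketched as follows (a minimal illustration with helper names of our own; the real implementation operates on numeric margin-space blocks):

```python
def bwt_encode(block):
    # Build all rotations of the block (Eq. 11), sort them (Eq. 13), and
    # emit the last column L plus the index I of the original rotation.
    n = len(block)
    rotations = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = [block[(i - 1) % n] for i in rotations]
    return last, rotations.index(0)

def bwt_decode(last, index):
    # Reversible with zero extra information: repeatedly prepend L to the
    # table and re-sort, rebuilding the sorted rotation matrix row by row.
    table = [[] for _ in last]
    for _ in last:
        table = sorted([s] + t for s, t in zip(last, table))
    return table[index]

# Round-trip on the worked example (ASCII collation, so the exact sorted
# order and index may differ from the paper's table).
L, I = bwt_encode(list("^banana|"))
assert bwt_decode(L, I) == list("^banana|")
```

The encoder shown is the quadratic textbook version; production BWT implementations use suffix arrays, but the clustering effect on the output column is the same.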
TABLE II
CONVERSION FROM NUMERICAL TO CHARACTER VALUES

Λ:  0.0999 | 0.2000 | 0.1000 | 0.2000 | 0.1000 | 0.2000
Ch: b      | a      | n      | a      | n      | a

The decoder relies on L and I to retrieve the original form. This is done by inserting L as the last column of a temporary 2D matrix of size n × n (n being the number of elements). This column is sorted to recover the first column. The first and last columns (of each row) together then give all pairs of successive characters. Finally, the resultant matrix is identical to ω̃ (see Eq. 13), so the original form of the data is easily accessible using the parameter I.

D. Move-To-Front
Although the output of BWT perfectly clusters similar symbols in long runs, in the case of smart meter readings these symbols vary from very small values (e.g. 11 and 10) to large values (e.g. 3000 and 5000). Therefore, to increase the compression effectiveness of any entropy encoder such as arithmetic coding, the MTF transform is applied. MTF is a lightweight algorithm proposed by Ryabko [28] that increases the probability of small numbers near zero while decreasing that of large numbers in a data list. The general idea is that each symbol in the data is replaced by its index in the list of currently used symbols. Therefore, long sequences of identical symbols are replaced by as many zeros, while a rare symbol that has not been used for a long time is replaced by a larger number. Let Υ be the list of all distinct symbols in the list L obtained from the BWT stage, shared between encoder and decoder. The MTF algorithm can then be summarized in three steps. (1) Υ is initialized from L. (2) Each L_x in the list L is encoded as the number of symbols preceding it in Υ, and that symbol is then moved to the front of the distinct list Υ. (3) The final output is constructed in a list ∂ by combining the codes of step 2. The decoding process is the inverse of these steps. For better clarification (see Table III), assume L = [b, n, n, a, a, a], which is the output of BWT, and its distinct symbols Υ = [a, b, n]. The first symbol L_0 is b, and the number of symbols preceding it in Υ is 1. Consequently, the encoder outputs 1 into ∂ and moves b to the front: Υ = [b, a, n]. The second symbol L_1 is n, which is preceded by two symbols, so the encoder outputs 2 and updates Υ = [n, b, a]. This continues until L_last, and the final output of this stage is ∂ = [1, 2, 0, 2, 0, 0, 0, 0]. Note that two zeros have been appended to the end of L to emphasize the significance of the MTF technique.

TABLE III
MTF OF L = [b, n, n, a, a, a] AND Υ = [a, b, n]

L_x | Υ       | ∂
b   | a, b, n | 1
n   | b, a, n | 2
n   | n, b, a | 0
a   | n, b, a | 2
a   | a, n, b | 0
a   | a, n, b | 0
a   | a, n, b | 0
a   | a, n, b | 0

E. Run Length
The output of MTF contains many identical consecutive symbols; therefore, a simple technique called run-length encoding [29] is applied before the entropy coder. The general idea behind this approach is that if a data item d occurs n consecutive times in the list of values, the n occurrences of this item will be replaced with the
single pair nd. The n consecutive occurrences of the item are called a run length of n. For example, the consecutive zeros in ∂ = [1, 2, 0, 2, 0, 0, 0, 0] become ∂ = [1, 2, 0, 2, 0#4].

F. Arithmetic Coding
An entropy encoding technique called arithmetic coding (AC) is finally applied in our algorithm to achieve the best possible compression ratio. AC is a statistical variable-length coding by which frequently occurring numbers are stored with fewer bits and less frequently occurring symbols are represented with more bits [29]. It is superior in most respects to better-known entropy coders such as the Huffman method. This is because, rather than segmenting the input into component symbols and replacing each with a code, it encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0. The main idea is to start with a certain interval, read the input list symbol by symbol, and use the probability of each number to narrow the interval. AC can be summarized in the following steps.
1) Begin by defining the current interval as (0, 1).
2) Repeat the following two steps for each symbol S_i in the data list:
a) Divide the present interval into subintervals whose sizes are proportional to the symbols' probabilities.
b) Choose the subinterval for S_i and define it as the new present interval.
3) When the entire data list has been processed this way, the output is any number that uniquely distinguishes the current interval (i.e. any number in the present interval).
The present interval gets smaller with each symbol processed. The final output is a single number (called the tag value) and does not consist of codes for the individual symbols. To illustrate AC code construction, consider encoding a portion of the message M̃ = (b, n, n, a, x). The full frequency distribution of that message is shown in Table IV. The default probability interval is (0, 1). First, when b occurs, the tag value has to lie in (0, 0.4).
Next, n is detected, so the current interval (0, 0.4) should be divided into subintervals using the lower- and upper-limit equations shown in Eqs. 14 and 15:

l_n = l_{n−1} + (u_{n−1} − l_{n−1}) × F_x(x_{n−1})    (14)

u_n = l_{n−1} + (u_{n−1} − l_{n−1}) × F_x(x_n)    (15)
Fig. 4. Graphical representation for Arithmetic encoding process for a message ”bnnax”.
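The interval narrowing of Eqs. 14 and 15 for the message "bnnax" can be reproduced numerically as a sketch (cumulative frequency bounds taken from Table IV; function names are ours):

```python
# Per-symbol cumulative frequency bounds [F(previous symbol), F(symbol)).
cum = {"b": (0.0, 0.4), "n": (0.4, 0.7), "a": (0.7, 0.8), "x": (0.8, 1.0)}

def narrow(message):
    # Apply Eqs. 14 and 15 symbol by symbol, starting from the interval (0, 1).
    low, high = 0.0, 1.0
    for sym in message:
        f_lo, f_hi = cum[sym]
        low, high = (low + (high - low) * f_lo,
                     low + (high - low) * f_hi)
    return low, high

low, high = narrow("bnnax")   # low ≈ 0.23608, high ≈ 0.2368, matching Fig. 4
tag = (low + high) / 2        # the single transmitted number, ≈ 0.2364
```

Any number inside the final interval identifies the message; the midpoint is one convenient choice of tag.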
TABLE IV
FREQUENCY DISTRIBUTION IN MESSAGE M̃

Symbol | Probability | Cumulative Freq
b      | 0.4         | 0.4
n      | 0.3         | 0.7
a      | 0.1         | 0.8
x      | 0.2         | 1.0
where l_n and u_n are the lower and upper limits for the nth symbol and F_x represents its cumulative frequency. After substituting into Eqs. 14 and 15, the tag interval of the sequence b, n is (0.16, 0.28). This is repeated accumulatively for the entire message. All tag values are graphically summarized in Fig. 4. The final compressed value is the average of the lower and upper tag values, (0.2360 + 0.2368)/2 = 0.2364, which is then converted into binary.
At the decoder side, this tag value is received, and the probabilities of the message must be known. The steps are then identical but inverted, where the letters or numbers are found through their cumulative probabilities.

G. Decompression Algorithm
The decompression process is identical to the stated compression steps but in inverse order. The algorithm begins with arithmetic decoding. Next, the run-length values are expanded, followed by MTF decoding. The inverse Burrows-Wheeler transform is then applied, which regenerates the calculated margin space values. After that, the first-derivative
inverse is conducted to retrieve the actual margin space values. The stored Gaussian parameters are then used to calculate the approximated waveform readings. The sum of the approximated readings and the margin space values is finally computed to reconstruct the exact lossless smart meter readings. Fig. 6 in Section V shows a few examples of the original smart grid readings and the decompressed versions, which proves that the readings are fully recoverable in our approach with zero loss.
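The MTF and run-length stages and their inverses used during decompression can be sketched as follows (helper names are ours; the input reproduces the worked example of Table III):

```python
def mtf_encode(data, alphabet):
    # Replace each symbol by its index in the working list, then move
    # that symbol to the front (Sec. III-D).
    table, out = list(alphabet), []
    for s in data:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(codes, alphabet):
    table, out = list(alphabet), []
    for i in codes:
        out.append(table[i])
        table.insert(0, table.pop(i))
    return out

def rle_encode(codes):
    # Collapse each run of identical values into a [value, count] pair.
    out = []
    for v in codes:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

L = ["b", "n", "n", "a", "a", "a", "a", "a"]
codes = mtf_encode(L, ["a", "b", "n"])   # [1, 2, 0, 2, 0, 0, 0, 0]
runs = rle_encode(codes)                 # [[1,1], [2,1], [0,1], [2,1], [0,4]]
assert mtf_decode(rle_decode(runs), ["a", "b", "n"]) == L
```

The round-trip assertion mirrors the decompression path: run-length expansion followed by MTF decoding restores the BWT output exactly.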
IV. COMPRESSION PERFORMANCE METRICS
This section discusses the theoretical and empirical metrics used to evaluate our proposed algorithm.
A. Theoretical Entropy
In information theory, the entropy of a signal represents the minimum bitrate (i.e. assuming the best possible compression) required to transmit this signal [30]. Therefore, to prove the effectiveness of the preprocessing in our algorithm, the theoretical entropy is calculated for every data list of smart grid readings before our algorithm starts. Then, a quantitative comparison is conducted between the theoretical entropy and the achieved compression ratio. Let's assume a smart grid readings list consisting of the data
Average Entropy of unprocessed readings Vs gaussian approximated readings 8.5 8
9
7.5
6.5 6 5.5
8
Entropy Level
7
7
V. I MPLEMENTATIONS
6 5
A. Datasets
5 4.5 4 3.5
been used to accurately measure the original and compressed size of the readings in bits. Every Reading represented as 16 bit. The typical block size suggested is around 1500 readings.
4 3 20 15
2 1.8
10
1.6
Case Number
1.4
5 0
1.2 1
Fig. 5. Comparison between the average entropy calculated from the original readings and margin space values after Gaussian approximation.
points d[1], d[2], ..., d[N ] the maximum likelihood entropy (i.e. in bits) is measured as X H(d) , − pb(v) log2 (b p(v)) (16) v∈R(d) N
1 X δv (d[n]) pb(v) , N n=1 ( 1 if d[n] = v δv (d[n]) , 0 else
(17)
B. Experiments and Results (18)
Where R(d) is the range of d and pb(v) is the empirical probability of v ∈ R. The worst-case (i.e. highest entropy) occurs when each value in R appears at the identical frequency 1/|R|, where |R| represents the elements in original R (See Eq. 19). Hmax
X 1 1 log2 ( ) = log2 |R|. =− |R| |R|
(19)
v∈R
In contrary, the lowest entropy (i.e. best-case) occurs when all values of d are similar, which leads to Hmin = −1 log2 (1) = 0. Fig. 5 emphasizes the average of improvement in the entropy before and after the major step which is applying Gaussian approximation and utilizing only the margin space. The entropy has been reduced by almost half. B. Empirical Ratio The compression ratio (CR) is the main benchmark to practically measure any proposed compression algorithm performance. Let’s denote the original smart meter readings block O (i.e. its unit in byte or bit) and the resultant compressed readings C. Therefore, the empirical CR in results section is calculated as shown in Eq. 20. O (20) C The well-known leading power quality storage format for electric power system waveforms used in most of smart grids called PQDIF (Power Quality Data Interchange Format), which is defined by the IEEE1159 working group [31], has CR =
Various smart meters readings datasets have been utilized in our experiments which are collected and published by Laboratory for Advanced System Software as a part of project named ”Smart*” [32], [33]. The dataset includes continuous readings -every minute- from three homes for three months. The readings’ types can be classified into (1) power usage such as watts consumption and heat-index, and (2) environmental characteristics such as inside/outside temperature, inside/outside humidity and wind-chill. The dataset also provides periodical electricity power consumption -every minute- from around 400 anonymous homes for (3 × 30 × 24) hours. According to the definition of spatial and temporal aggregations, these readings are temporal because they are gathered separately from every single house after equipping it with a smart meter that collects its readings periodically.
Our performed experiments can be classified into two main parts: (1) compression, which is performed by aggregators that receive an overwhelming amount of collected readings from various entities (e.g. homes), and (2) decompression, which is performed at the operation centers or cloud level. To avoid biased results, all records in the aforementioned datasets have been used in our algorithm. In other words, the results shown in both Table V and Fig. 7 are from continuous blocks (i.e. from 15/Apr/2012 to 1/Jul/2012). Similarly, the existing lossless compression models in this domain have been implemented to provide a precise, clear comparison. For brevity, the results have been summarized in Table V. Every column represents results from a different model, as follows. (1) The second and third columns show the CR of the techniques in [12], which mainly rely on Huffman and delta-Huffman. (2) The fourth column shows the results of the model in [13] based on Arithmetic coding. (3) The results in the fifth column are based on the model in [17] that uses Lempel-Ziv. (4) The sixth column presents results obtained by the technique in [19], which is based on a linear prediction model and LZMA. (5) The seventh and ninth columns highlight results using the technique in [13], which relies on bzip2 and delta-bzip2 (i.e. BWT, MTF and AC). (6) The eighth column shows the results of the invertible transformation preprocessing followed by Golomb-Rice encoding [14]. (7) The last column shows the results of our algorithm. Fig. 7 plots the CR obtained by our algorithm against the aforementioned models over a larger number of readings blocks. Additionally, Fig. 6 shows example plots of six smart meter readings streams before compression and after recovery (i.e. decompression).
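As a rough point of reference, the empirical CR of general-purpose codecs on a readings block can be reproduced with off-the-shelf libraries. The sketch below uses synthetic per-minute readings as a stand-in for the Smart* traces (which are not reproduced here), and Python's `bz2`/`zlib` only approximate the bzip2 and dictionary-coding baselines of [13] and [17]:

```python
import bz2
import struct
import zlib

# Hypothetical one day of per-minute watts readings (synthetic stand-in data)
readings = [300 + (i % 7) - 3 for i in range(1440)]

# Pack each reading as a little-endian 16-bit integer: the raw size O
raw = struct.pack(f"<{len(readings)}h", *readings)

# Delta transform first (as in the delta+bzip2 baseline): successive differences
deltas = [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]
raw_delta = struct.pack(f"<{len(deltas)}h", *deltas)

for name, payload in [("bzip2", bz2.compress(raw)),
                      ("delta+bzip2", bz2.compress(raw_delta)),
                      ("deflate", zlib.compress(raw, 9))]:
    print(f"{name:12s} CR = {len(raw) / len(payload):.2f}")
```

The absolute ratios depend entirely on the data; the point is the measurement procedure (CR = O/C on the packed block), not the particular numbers.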
[Figure: panel (a) "Original Smart Meter Readings" and panel (b) "Decompressed Smart Meter Readings", watts plotted per readings block]
Fig. 6. Three examples of watts consumption readings collected from different homes: (a) direct plot of the original readings, and (b) plot of the readings after the decompression process.
[Figure: "Compression Ratio Comparison between 6 models"; CR (y-axis, 1.0-4.5) plotted over 60 readings blocks (x-axis)]
Fig. 7. Compression ratio comparison results between 6 models: (1) Huffman [12], (2) delta-Huffman [12], (3) Lempel-Ziv [17], (4) Invert-Trans Golomb [14], (5) delta-Bzip2 [13], and (6) our Gaussian based approach.
C. Discussion

The experiments clearly emphasize the effectiveness of our Gaussian approximation based lossless compression of smart grid readings compared to the other existing algorithms. This is because of the advantage of excluding many data points and compressing only the margin space. This has been proved theoretically by comparing the entropy before and after our approach (see Fig. 5) and experimentally as shown in Fig. 7.

VI. CONCLUSION

In this paper, a novel lossless waveform smart grid readings compression algorithm has been introduced. The main target is representing smart grid waveform readings with few parameters.
This is successfully achieved using Gaussian approximation based on a dynamic-nonlinear least square error technique, which means that our algorithm can work on any type of waveform readings. The margin space between the approximated and the actual waveform readings is calculated. The significance is that the compression is applied only to the limited margin space points rather than to the entire stream of waveform readings. The margin space values are then encoded using BWT followed by MTF and RLE to eliminate the redundancy. AC is finally applied. After a thorough evaluation, the proposed technique proved superior to existing models both theoretically (i.e. the entropy was almost reduced by half) and empirically (i.e. the achieved compression ratio was 3.8:1).
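The BWT → MTF → RLE stage recapped above can be sketched as follows. This is our own minimal illustration, not the paper's implementation: block sizes, sentinel handling and the final arithmetic coding stage are omitted, and the naive rotation sort is for exposition only.

```python
def bwt(data, eof=0):
    """Naive Burrows-Wheeler transform over a byte sequence [27].
    Appends a sentinel so the transform is invertible; assumes the
    sentinel value does not otherwise occur in the data. O(n^2 log n)
    sorting -- fine for a sketch, not for production-sized blocks."""
    s = bytes(data) + bytes([eof])
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(r[-1] for r in rotations)

def mtf(data):
    """Move-to-front [28]: recently seen symbols map to small indices,
    so the runs produced by BWT become runs of small numbers."""
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)
        out.append(i)
        alphabet.pop(i)
        alphabet.insert(0, b)
    return out

def rle(data):
    """Run-length encode as [symbol, count] pairs."""
    out = []
    for b in data:
        if out and out[-1][0] == b:
            out[-1][1] += 1
        else:
            out.append([b, 1])
    return out

# Toy margin-space residuals; 255 is a safe sentinel for these values
margin = bytes([2, 2, 2, 1, 1, 1, 3, 3])
stream = rle(mtf(bwt(margin, eof=255)))
```

The resulting `stream` of small, repetitive indices is exactly the kind of input on which the final entropy coder (AC) performs best.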
TABLE V
COMPRESSION RATIO

Record  | Huffman [12] | Delta+Huffman [12] | AC [13] | Lempel-Ziv [17] | Prediction+LZMA [19] | Bzip2 [13] | Invert Trans+Golomb [14] | Delta+Bzip2 [13] | Our approach
1       | 1.72 | 1.97 | 2.43 | 2.47 | 2.44 | 2.60 | 3.07 | 3.25 | 3.90
2       | 1.08 | 1.71 | 2.41 | 2.21 | 2.54 | 2.53 | 2.81 | 3.11 | 3.91
3       | 1.14 | 1.63 | 2.39 | 2.13 | 2.53 | 2.58 | 2.73 | 3.01 | 3.90
4       | 1.29 | 1.64 | 2.37 | 2.14 | 2.51 | 2.54 | 2.74 | 2.97 | 3.89
5       | 1.78 | 2.01 | 2.36 | 2.01 | 2.26 | 2.47 | 2.61 | 2.90 | 3.87
6       | 1.81 | 1.78 | 2.36 | 2.28 | 2.48 | 2.52 | 2.88 | 2.83 | 3.85
7       | 1.34 | 1.70 | 2.36 | 2.20 | 2.50 | 2.56 | 2.80 | 2.88 | 3.83
8       | 0.91 | 1.62 | 2.37 | 2.12 | 2.54 | 2.54 | 2.72 | 2.84 | 3.82
9       | 1.18 | 1.58 | 2.38 | 2.08 | 2.53 | 2.61 | 2.68 | 2.77 | 3.80
10      | 1.03 | 1.80 | 2.41 | 2.30 | 2.36 | 2.65 | 2.90 | 2.82 | 3.78
11      | 1.32 | 1.67 | 2.42 | 2.17 | 2.33 | 2.66 | 2.77 | 2.86 | 3.76
12      | 1.52 | 1.74 | 2.44 | 2.24 | 2.22 | 2.67 | 2.84 | 2.84 | 3.73
13      | 1.03 | 1.84 | 2.46 | 2.34 | 2.53 | 2.67 | 2.94 | 2.91 | 3.70
14      | 1.47 | 1.92 | 2.46 | 2.42 | 2.48 | 2.69 | 3.02 | 2.95 | 3.65
15      | 0.95 | 2.09 | 2.44 | 2.09 | 2.22 | 2.69 | 2.69 | 2.96 | 3.60
16      | 0.91 | 1.74 | 2.44 | 2.24 | 2.50 | 2.68 | 2.84 | 2.97 | 3.54
17      | 1.09 | 1.80 | 2.43 | 2.30 | 2.61 | 2.68 | 2.90 | 2.97 | 3.97
18      | 0.79 | 2.06 | 2.42 | 2.06 | 2.55 | 2.67 | 2.66 | 2.99 | 3.91
19      | 0.68 | 1.74 | 2.42 | 2.24 | 2.64 | 2.63 | 2.84 | 2.99 | 3.85
20      | 0.67 | 1.81 | 2.44 | 2.31 | 2.30 | 2.63 | 2.91 | 2.98 | 3.80
21      | 0.63 | 1.76 | 2.46 | 2.26 | 2.24 | 2.65 | 2.86 | 2.98 | 3.75
22      | 1.33 | 1.71 | 2.47 | 2.21 | 2.39 | 2.66 | 2.81 | 2.97 | 3.72
23      | 1.60 | 1.70 | 2.48 | 2.20 | 2.65 | 2.67 | 2.80 | 2.93 | 3.70
24      | 0.70 | 1.95 | 2.50 | 2.45 | 2.23 | 2.67 | 3.05 | 2.93 | 3.69
25      | 1.00 | 1.89 | 2.51 | 2.39 | 2.64 | 2.68 | 2.99 | 2.95 | 3.69
26      | 1.43 | 1.65 | 2.51 | 2.15 | 2.45 | 2.63 | 2.75 | 2.96 | 3.70
27      | 1.15 | 1.81 | 2.52 | 2.31 | 2.22 | 2.63 | 2.91 | 2.97 | 3.71
28      | 1.20 | 1.88 | 2.52 | 2.38 | 2.23 | 2.65 | 2.98 | 2.97 | 3.72
29      | 1.00 | 1.98 | 2.50 | 2.48 | 2.47 | 2.63 | 3.08 | 2.98 | 3.74
30      | 1.10 | 1.77 | 2.47 | 2.27 | 2.55 | 2.66 | 2.87 | 2.93 | 3.75
31      | 1.09 | 1.73 | 2.46 | 2.23 | 2.20 | 2.69 | 2.83 | 2.93 | 3.76
32      | 1.17 | 2.02 | 2.44 | 2.02 | 2.56 | 2.67 | 2.62 | 2.95 | 3.78
33      | 0.77 | 1.72 | 2.44 | 2.22 | 2.50 | 2.71 | 2.82 | 2.93 | 3.80
34      | 1.14 | 1.78 | 2.46 | 2.28 | 2.21 | 2.68 | 2.88 | 2.96 | 3.83
35      | 1.27 | 1.96 | 2.48 | 2.46 | 2.47 | 2.65 | 3.06 | 2.99 | 3.87
36      | 1.04 | 1.70 | 2.48 | 2.20 | 2.52 | 2.64 | 2.80 | 2.97 | 3.92
37      | 1.62 | 1.83 | 2.49 | 2.33 | 2.64 | 2.65 | 2.93 | 3.01 | 3.97
38      | 1.16 | 1.91 | 2.48 | 2.41 | 2.59 | 2.60 | 3.01 | 2.98 | 3.53
39      | 0.95 | 2.05 | 2.45 | 2.05 | 2.67 | 2.62 | 2.65 | 2.95 | 3.58
40      | 0.73 | 1.80 | 2.43 | 2.30 | 2.43 | 2.63 | 2.90 | 2.94 | 3.64
Average | 1.14 | 1.81 | 2.44 | 2.25 | 2.45 | 2.63 | 2.85 | 2.95 | 3.80

REFERENCES

[1] Vehbi C Gungor, Bin Lu, and Gerhard P Hancke. Opportunities and challenges of wireless sensor networks in smart grid. Industrial Electronics, IEEE Transactions on, 57(10):3557–3564, 2010.
[2] Vehbi C Güngör, Dilan Sahin, Taskin Kocak, Salih Ergüt, Concettina Buccella, Carlo Cecati, and Gerhard P Hancke. Smart grid technologies: communication technologies and standards. Industrial Informatics, IEEE Transactions on, 7(4):529–539, 2011.
[3] Western Electricity Coordinating Council. The western interconnection synchrophasor project.
[4] Michel P Tcheou, Lisandro Lovisolo, Moisés Vidal Ribeiro, Eduardo AB da Silva, Marco AM Rodrigues, João Marcos Travassos Romano, and Paulo SR Diniz. The compression of electric signal waveforms for smart grids: state of the art and future trends. Smart Grid, IEEE Transactions on, 5(1):291–302, 2014.
[5] Surya Santoso, Edward J Powers, and WM Grady. Power quality disturbance data compression using wavelet transform methods. Power Delivery, IEEE Transactions on, 12(3):1250–1257, 1997.
[6] Jiaxin Ning, Jianhui Wang, Wenzhong Gao, and Cong Liu. A wavelet-based data compression technique for smart grid. Smart Grid, IEEE Transactions on, 2(1):212–218, 2011.
[7] CT Hsieh and SJ Huang. Disturbance data compression of a power system using the huffman coding approach with wavelet transform enhancement. In Generation, Transmission and Distribution, IEE Proceedings-, volume 150, pages 7–14. IET, 2003.
[8] Michel P Tcheou, Lisandro Lovisolo, Eduardo AB da Silva, Marco AM Rodrigues, and Paulo SR Diniz. Optimum rate-distortion dictionary selection for compression of atomic decompositions of electric disturbance signals. Signal Processing Letters, IEEE, 14(2):81–84, 2007.
[9] Moisés V Ribeiro, Seop Hyeong Park, João Marcos T Romano, and Sanjit K Mitra. A novel mdl-based compression method for power quality applications. Power Delivery, IEEE Transactions on, 22(1):27–36, 2007.
[10] Alsharif Abuadbba and Ibrahim Khalil. Wavelet based steganographic technique to protect household confidential information and seal the transmitted smart grid readings. Information Systems, 2014.
[11] Alsharif Abuadbba, Ibrahim Khalil, and Mohammed Atiquzzaman. Robust privacy preservation and authenticity of the collected data in cognitive radio network: Walsh-Hadamard based steganographic approach. Pervasive and Mobile Computing, 2015.
[12] Dahai Zhang, Yanqiu Bi, and Jianguo Zhao. A new data compression algorithm for power quality online monitoring. In Sustainable Power Generation and Supply, 2009. SUPERGEN'09. International Conference on, pages 1–4. IEEE, 2009.
[13] Jan Kraus, Tomas Tobiska, and Viktor Bubla. Lossless encodings and compression algorithms applied on power quality datasets. In Electricity Distribution-Part 1, 2009. CIRED 2009. 20th International Conference
IEEE TRANSACTIONS ON SMART GRID 2016
and Exhibition on, pages 1–4. IET, 2009.
[14] J. E. Tate. Preprocessing and Golomb-Rice encoding for lossless compression of phasor angle data. IEEE Transactions on Smart Grid, 7(2):718–729, March 2016.
[15] Ganapati Panda, PK Dash, Ashok Kumar Pradhan, and Saroj K Meher. Data compression of power quality events using the slantlet transform. Power Delivery, IEEE Transactions on, 17(2):662–667, 2002.
[16] Saroj K Meher, AK Pradhan, and G Panda. An integrated data compression scheme for power quality events using spline wavelet and neural network. Electric Power Systems Research, 69(2):213–220, 2004.
[17] Ömer Nezih Gerek and Dogan Gökhan Ece. Compression of power quality event data using 2d representation. Electric Power Systems Research, 78(6):1047–1052, 2008.
[18] F Lorio and F Magnago. Analysis of data compression methods for power quality events. In Power Engineering Society General Meeting, 2004. IEEE, pages 504–509. IEEE, 2004.
[19] Jan Kraus, Pavel Štěpán, and Leoš Kukačka. Optimal data compression techniques for smart grid and power quality trend data. In Harmonics and Quality of Power (ICHQP), 2012 IEEE 15th International Conference on, pages 707–712. IEEE, 2012.
[20] Timothy Brian Littler and DJ Morrow. Wavelets for the analysis and compression of power system disturbances. Power Delivery, IEEE Transactions on, 14(2):358–364, 1999.
[21] Shyh-Jier Huang and Ming-Jong Jou. Application of arithmetic coding for electric power disturbance data compression with wavelet packet enhancement. Power Systems, IEEE Transactions on, 19(3):1334–1341, 2004.
[22] Jahangir Khan, Shoaib Bhuiyan, Gail C Murphy, and Morgan Arline. Embedded zerotree wavelet based data compression for smart grid. In Industry Applications Society Annual Meeting, 2013 IEEE, pages 1–8. IEEE, 2013.
[23] Norman CF Tse, John YC Chan, Wing-Hong Lau, Jone TY Poon, and LL Lai. Real-time power-quality monitoring with hybrid sinusoidal and lifting wavelet compression algorithm. Power Delivery, IEEE Transactions on, 27(4):1718–1726, 2012.
[24] J. C. S. de Souza, T. M. L. Assis, and B. C. Pal. Data compression in smart distribution systems via singular value decomposition. IEEE Transactions on Smart Grid, (99):1–1, 2015.
[25] Raj Chhikara. The Inverse Gaussian Distribution: Theory, Methodology, and Applications, volume 95. CRC Press, 1988.
[26] Sae-Young Chung, Thomas J Richardson, and Rüdiger L Urbanke. Analysis of sum-product decoding of low-density parity-check codes using a gaussian approximation. Information Theory, IEEE Transactions on, 47(2):657–670, 2001.
[27] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Digital SRC Research Report, (124), 1994.
[28] B Ryabko. Data compression by means of a "book stack". Problems of Information Transmission, 16(4):265–269, 1980.
[29] David Salomon. Data Compression: The Complete Reference. Springer-Verlag New York, 2004.
[30] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(4):623–656, Oct 1948.
[31] IEEE recommended practice for the transfer of power quality data. IEEE Std 1159.3-2003, pages 1–119, 2004.
[32] Sean Barker, Aditya Mishra, David Irwin, Emmanuel Cecchet, Prashant Shenoy, and Jeannie Albrecht. Smart*: An open data set and tools for enabling research in sustainable homes. SustKDD, August, 2012.
[33] Sean Barker, Aditya Mishra, David Irwin, Emmanuel Cecchet, Prashant Shenoy, and Jeannie Albrecht. Smart project. http://traces.cs.umass.edu/index.php/Smart/Smart, 2012.