CSE 390 Introduction to Data Compression
Fall 2004
Lecture 4: Golomb Codes, Tunstall Codes

Run-Length Coding
• Lots of 0's and not too many 1's:
  – Fax of letters
  – Graphics
• Simple run-length code:
  – 00000010000000001000000000010001001.....
  – 6 9 10 3 2 ...
  – Code the bits as a sequence of integers.
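A minimal sketch in Python of that last step, turning the bit string into its run lengths; the function name is illustrative, not from the slides:

    def zero_run_lengths(bits: str) -> list[int]:
        """Return the number of 0's preceding each 1 in a bit string."""
        runs, count = [], 0
        for b in bits:
            if b == '0':
                count += 1
            else:                 # a 1 ends the current run of 0's
                runs.append(count)
                count = 0
        return runs

    print(zero_run_lengths("00000010000000001000000000010001001"))
    # -> [6, 9, 10, 3, 2], matching the "6 9 10 3 2 ..." above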
Golomb Code of Order m
• Let n = qm + r where 0 ≤ r < m.
  – Divide m into n to get the quotient q and remainder r.
• Code for n has two parts:
  1. q is coded in unary (q 1's followed by a 0).
  2. r is coded as a fixed prefix code r̂.
• So n = qm + r is represented by q 1's, then a 0, then r̂, where r̂ is the fixed prefix code for r.
• Example (m = 5), fixed prefix code for r:

    r :  0   1   2    3    4
    r̂ : 00  01  10  110  111

Example
• Golomb codes of order m = 5:

    n    :   2     6      9     10
    code : 010  1001  10111  11000
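The slides do not spell out the fixed prefix code for r in general; the m = 5 table above matches the usual truncated binary code, so the sketch below assumes that convention (function names are illustrative):

    import math

    def truncated_binary(r: int, m: int) -> str:
        """Fixed prefix code for the remainder r, 0 <= r < m (truncated binary).
        For m = 5 this gives 0->00, 1->01, 2->10, 3->110, 4->111, as in the table above."""
        b = math.ceil(math.log2(m))
        short = 2 ** b - m                 # this many remainders get only b-1 bits
        if r < short:
            return format(r, f'0{b - 1}b')
        return format(r + short, f'0{b}b')

    def golomb(n: int, m: int) -> str:
        """Order-m Golomb code: q = n // m in unary (q 1's then a 0), then the code for r."""
        q, r = divmod(n, m)
        return '1' * q + '0' + truncated_binary(r, m)

    for n in (2, 6, 9, 10, 27):
        print(n, golomb(n, 5))
    # 2 -> 010, 6 -> 1001, 9 -> 10111, 10 -> 11000, 27 -> 11111010 (as in the slides)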
Alternative Explanation of the Golomb Code (m = 5)
• View the order-5 code as a variable-to-variable length code on runs of 0's:

    input    output
    00000    1
    1        000
    01       001
    001      010
    0001     0110
    00001    0111

• Example: a run of 27 zeros followed by a 1 (that is, n = 27) is coded as 11111010.
Run Length Example (m = 5)

Input: 00000010000000001000000000010001001.....

    00000 → 1
    01    → 001
    00000 → 1
    00001 → 0111
    ...

In this example we coded the first 17 bits in only 9 bits.
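A sketch of the same encoder written directly from the variable-to-variable table above, with m = 5 hard-coded for brevity; it reproduces the 17-bits-to-9-bits figure for the first two runs (names are illustrative):

    # Fixed prefix code for the run remainder, from the m = 5 table above.
    R_CODE = {0: '00', 1: '01', 2: '10', 3: '110', 4: '111'}

    def golomb_runs_m5(bits: str) -> str:
        """Order-5 Golomb run-length coding: every full block of 5 zeros emits '1';
        a 1 ending a run of j < 5 zeros emits '0' followed by the code for j."""
        out, zeros = [], 0
        for b in bits:
            if b == '0':
                zeros += 1
                if zeros == 5:          # a whole block of 5 zeros
                    out.append('1')
                    zeros = 0
            else:                       # a 1 closes the current run
                out.append('0' + R_CODE[zeros])
                zeros = 0
        return ''.join(out)

    # The first two runs of the slide's example: 17 input bits -> 9 output bits.
    encoded = golomb_runs_m5('00000010000000001')
    print(encoded, len(encoded))        # 100110111 9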
Choosing m
• Suppose that 0 has probability p and 1 has probability 1 - p.
• The probability of the run 0^n 1 is p^n (1 - p).
• The Golomb code of order m = ⌈-1 / log2(p)⌉ (so that p^m ≈ 1/2) is optimal.

Average Bit Rate for Golomb Code
• Average Bit Rate (ABR) = average output code length / average input code length
• Example: m = 4, with p the probability of 0.
[Plot: average bit rate of the order-4 Golomb code and the entropy, as functions of p]

Notes on Golomb Codes
• Useful for binary compression when one symbol is much more likely than another:
  – binary images
  – fax documents
  – bit planes for wavelet image compression
• Need a parameter (the order):
  – training
  – adaptively learn the right parameter
• Variable-to-variable length code
• Last symbol needs to be a 1:
  – coder always adds a 1
  – decoder always removes a 1
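A numeric sketch of the "Choosing m" rule and the average bit rate comparison. The word probabilities below come from treating the order-m code as the variable-to-variable code on runs, as in the earlier slides; this particular derivation and the helper names are mine, not from the slides:

    import math

    def r_code_len(j: int, m: int) -> int:
        """Length of the truncated-binary code for remainder j, 0 <= j < m."""
        b = math.ceil(math.log2(m))
        return b - 1 if j < 2 ** b - m else b

    def golomb_abr(p: float, m: int) -> float:
        """Average output bits per input bit of the order-m Golomb run-length code,
        with p the probability of a 0 (variable-to-variable view of the code)."""
        # Input words: a block of m zeros (prob p^m, output '1'),
        # or j zeros followed by a 1 (prob p^j (1-p), output '0' + code for j).
        probs   = [p ** m] + [p ** j * (1 - p)     for j in range(m)]
        in_len  = [m]      + [j + 1                for j in range(m)]
        out_len = [1]      + [1 + r_code_len(j, m) for j in range(m)]
        avg_in  = sum(pr * l for pr, l in zip(probs, in_len))
        avg_out = sum(pr * l for pr, l in zip(probs, out_len))
        return avg_out / avg_in

    p = 0.8
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    m_opt = math.ceil(-1 / math.log2(p))        # the "Choosing m" rule above
    print(m_opt, golomb_abr(p, 4), entropy)     # 4, ~0.74, ~0.72

For p = 0.8 the rule picks order 4, and the order-4 code spends about 0.74 bits per input bit against an entropy of about 0.72, illustrating how close the Golomb code gets when the order is chosen well.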
Tunstall Codes
• Variable-to-fixed length code
• Example:

    input  output
    a      000
    b      001
    ca     010
    cb     011
    cca    100
    ccb    101
    ccc    110

  a b cca cb ccc ...  →  000 001 100 011 110 ...

Tunstall Code Properties
1. No input code is a prefix of another, to assure unique encodability.
2. Minimize the number of bits per symbol.
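A sketch of the variable-to-fixed encoder using the table above. Because no input code is a prefix of another (property 1), the encoder can simply read symbols until its buffer matches a dictionary entry; handling a leftover partial word is deferred to the next slide. Names are illustrative:

    CODE = {'a': '000', 'b': '001', 'ca': '010', 'cb': '011',
            'cca': '100', 'ccb': '101', 'ccc': '110'}

    def tunstall_encode(text: str) -> list[str]:
        """Read symbols until the buffer is a complete input word, then emit its 3-bit code."""
        out, buf = [], ''
        for ch in text:
            buf += ch
            if buf in CODE:
                out.append(CODE[buf])
                buf = ''
        if buf:                      # input ended inside a word ('c' or 'cc')
            raise ValueError(f"leftover partial word: {buf!r}")
        return out

    print(tunstall_encode('abccacbccc'))
    # -> ['000', '001', '100', '011', '110'], i.e. the example a b cca cb ccc above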
Prefix Code Property

[Prefix tree for the code above: root → a (000), b (001), c;
 c → a (010), b (011), c; cc → a (100), b (101), c (110)]

• Consider the string "cc". It does not have a code.
• Send the unused output code and some fixed code for the "cc".
• Generally, if there are k internal nodes in the prefix tree then there is a need for k - 1 fixed codes.
• The unused output code is 111; use it to signal the leftover input.
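The slide leaves the "fixed code" for the leftover string unspecified; the sketch below picks one concrete convention purely for illustration (111 followed by a single bit naming which internal node the input stopped at), so the exact bits are an assumption, not part of the slides:

    UNUSED = '111'                       # the unused output codeword from the slide
    FLUSH = {'c': '0', 'cc': '1'}        # k - 1 = 2 internal nodes besides the root

    def tunstall_flush(leftover: str) -> str:
        """Bits emitted when the input ends at an internal node ('c' or 'cc')."""
        return UNUSED + FLUSH[leftover] if leftover else ''

    print(tunstall_flush('cc'))          # -> '1111'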
Designing a Tunstall Code
• Suppose there are m initial symbols.
• Choose a target output length n where 2^n > m.
1. Form a tree with a root and m children, with edges labeled by the symbols.
2. If the number of leaves is > 2^n - m, then halt.*
3. Find the leaf with highest probability** and expand it to have m children. Go to 2.

* In the next step we will add m - 1 more leaves, so the number of leaves never exceeds 2^n - 1 and at least one output code stays free.
** The probability of a leaf is the product of the probabilities of the symbols on the root-to-leaf path.

Example
• P(a) = .7, P(b) = .2, P(c) = .1
• n = 3
[Tree after step 1: leaves a (.7), b (.2), c (.1)]
Example (continued)
• P(a) = .7, P(b) = .2, P(c) = .1; n = 3
[Tree after expanding a: leaves aa (.49), ab (.14), ac (.07), b (.2), c (.1)]

Example (continued)
• P(a) = .7, P(b) = .2, P(c) = .1; n = 3
[Tree after expanding aa: leaves aaa (.343), aab (.098), aac (.049), ab (.14), ac (.07), b (.2), c (.1)]
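A sketch of the design procedure from the "Designing a Tunstall Code" slide, run on this example; function and variable names are illustrative:

    def build_tunstall(probs: dict[str, float], n: int) -> dict[str, float]:
        """Tunstall design from the slides: start with the m single symbols as leaves;
        while the number of leaves is <= 2^n - m, expand the most probable leaf
        into m children.  Returns the final map from leaf word to probability."""
        m = len(probs)
        leaves = dict(probs)                        # step 1: root with m children
        while len(leaves) <= 2 ** n - m:            # step 2 (halt test, negated)
            best = max(leaves, key=leaves.get)      # step 3: most probable leaf
            p = leaves.pop(best)
            for sym, q in probs.items():
                leaves[best + sym] = p * q          # leaf prob = product along the path
        return leaves

    leaves = build_tunstall({'a': 0.7, 'b': 0.2, 'c': 0.1}, 3)
    print(leaves)
    # 7 leaves: b .2, c .1, ab .14, ac .07, aaa .343, aab .098, aac .049
    # (up to floating-point rounding); the same tree as on the slides.
    # The leaves then receive the fixed codes 000 ... 110, with 111 left unused.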
Bit Rate of Tunstall
• The average bit rate is the length of the output code divided by the average length of the input code.
• Let p_i be the probability of input code i and r_i the length (in symbols) of input code i (1 ≤ i ≤ s), and let n be the length of the output code.
• Average bit rate = n / ( Σ_{i=1..s} p_i r_i )

Example
[Final Tunstall tree: leaves aaa (.343), aab (.098), aac (.049), ab (.14), ac (.07), b (.2), c (.1)]

ABR = 3 / [3(.343 + .098 + .049) + 2(.14 + .07) + .2 + .1] = 1.37 bits per symbol
Entropy = 1.16 bits per symbol
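A quick check of the two numbers on this slide, computed from the leaf probabilities and word lengths of the final tree (names are illustrative):

    import math

    leaves = {'aaa': .343, 'aab': .098, 'aac': .049,
              'ab': .14, 'ac': .07, 'b': .2, 'c': .1}
    n = 3                                           # fixed output codeword length

    avg_in = sum(p * len(word) for word, p in leaves.items())
    print(round(n / avg_in, 2))                     # -> 1.37 bits per symbol

    source = {'a': .7, 'b': .2, 'c': .1}
    entropy = -sum(p * math.log2(p) for p in source.values())
    print(round(entropy, 2))                        # -> 1.16 bits per symbol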
Notes on Tunstall Codes
• Variable-to-fixed length code
• Error resilient:
  – A flipped bit will introduce just one error in the decoded output.
  – Huffman is not error resilient: a single bit flip can destroy the rest of the code.