To appear in Proc. International Conference on Image Processing, ICIP-98, Oct. 4-7, 1998, Chicago, IL, USA

ARITHMETIC CODING OF A LOSSLESS CONTOUR-BASED REPRESENTATION OF LABEL IMAGES

Lilian Labelle, Daniel Lauzon, Janusz Konrad, Eric Dubois

INRS-Télécommunications, Institut National de la Recherche Scientifique
16 Place du Commerce, Verdun, Québec H3E 1H6, Canada
[email protected]

ABSTRACT

We propose a new method for the encoding of label images (also known as segmentation maps or alpha planes) that are often used to identify object location in region-based image and video coders. The method is contour-based and lossless with a contour model composed of two parts: a contour graph describing the topology of the contour network and a directional chain code to deal with the geometric part of the label image (internal contour points). The graph-based description of the topology is designed to minimize the cost of encoding the nodes, while the directional chain codes are compressed by arithmetic coding. The approach is flexible since separating the contour network into topological and geometrical parts allows the use of other lossless or lossy methods to encode the geometric part without changing the graph representation. The proposed method has been compared with an arithmetic encoder used in MPEG-4.

1. INTRODUCTION

Region-based image coding is a viable alternative to the traditional block-based coding in terms of both manipulation flexibility and compression efficiency. While evaluating the performance of region-based schemes it has been found that the encoding of label images (segmentation maps, alpha planes) is particularly critical. Two approaches are typically used to encode such images: a contour-based approach and a label-based approach.

In a contour-based approach, the contour network is usually divided into several contour chains which are encoded separately. Each chain is composed of the nodes, which identify the start and end of a chain (as well as intersection points of a contour network), and of the internal points. In general, the encoding of the nodes is costly, especially when the contour chains are numerous.

This work was supported by the Natural Sciences and Engineering Research Council of Canada and by the Fonds pour la formation de chercheurs et l'aide à la recherche (Fonds FCAR).

The first author is now with Canon Research Center France, Rue de la Touche-Lambert, 35517 Cesson-Sévigné, France.

The internal points of the chains can be described using either a lossless or a lossy method. A lossless approach has the obvious advantage of distortion-free representation but provides a limited degree of compression. A lossy approach assures more efficient coded representations [1], but problems can occur, such as loss of region connectivity. Also, the degree of degradation must be controlled [2].

In this paper, a lossless contour-based method for the compression of label images is developed and compared experimentally with a typical arithmetic coder. The proposed contour-based model is composed of a contour graph (nodes), which describes the topology of the contour network, and of a directional chain code (DCC), describing the geometrical part of the label image (internal points of the contour network), followed by arithmetic coding. The graph-based description of the topology minimizes the cost of encoding the nodes, while the DCC is compressed by arithmetic coding. Note that separating the topological and geometrical parts of the contour network provides a general model because other lossless or lossy methods can be used to encode the geometric part without changing the graph representation.

We compare the proposed method with a typical arithmetic encoder, a well-known approach that may yield varying gains depending on the statistical model used. Here, an adaptive model (Q-coder in intra mode) defined in the verification model of the MPEG-4 standard is used [3].

2. PROPOSED APPROACH

A typical contour-based coding method consists of two steps: extraction of the contour graph from the label image and encoding of the contour graph. Before explaining each step, it is important to define a contour point. Usually, a contour point is defined as a grid point in the label image with at least one of its eight nearest neighbors labelled differently from the point itself. The main problem with such an approach is that contour points belong to the regions they are supposed to separate and the resulting contours are two pixels thick.
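The two-pixel thickness of the conventional definition can be seen on a toy image. The following is a minimal sketch, assuming a label image stored as a list of rows; the function name `pixel_contour_points` and the test image are illustrative and not part of the original method.

```python
def pixel_contour_points(labels):
    """Conventional definition: a pixel is a contour point if at least one
    of its eight nearest neighbors carries a different label."""
    h, w = len(labels), len(labels[0])
    points = set()
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < h and 0 <= cc < w
                    if (dr or dc) and inside and labels[rr][cc] != labels[r][c]:
                        points.add((r, c))
    return points


# Hypothetical 3x6 label image: region 0 on the left, region 1 on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(3)]
contour = pixel_contour_points(image)
columns = sorted({c for _, c in contour})   # columns touched by the contour
```

For this image the contour occupies columns 2 and 3, i.e. one column inside each of the two regions, which is the two-pixel-thick behavior described above.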

© 1998 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

To overcome this shortcoming, we define contour elements on an interpixel grid [2]. Each contour point is considered to be in the center of a line joining two adjacent pixels that have different labels, i.e., that belong to different regions. Points defined in this manner constitute a hexagonal lattice shifted with respect to the lattice of image pixels. The corresponding connectivity is the 6-connectivity, but with two separate neighborhood systems for horizontal and vertical contour points (Fig. 1). Such a definition naturally leads to one-pixel thick contours and hence even narrow objects can be uniquely represented.

Figure 1: Example of a contour on an interpixel grid, with panels showing a 6-connected contour and the neighborhood systems (+ denotes an element of the 6-connected interpixel grid; pixels are drawn with a separate marker).
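The interpixel definition can be sketched in a few lines. This is only an illustration under stated assumptions: the doubled-coordinate convention, the function names `contour_points` and `neighbors`, and the use of `None` as an outside-the-image label are all choices made here, and the six-neighbor layout is one reading of the two neighborhood systems of Fig. 1 (two interpixel points are taken as neighbors when the pixel edges they sit on share a corner).

```python
def contour_points(labels):
    """Interpixel contour points of a label image (list of rows).

    Assumed convention: pixel (r, c) sits at (2r, 2c); the point between
    pixels (r, c) and (r, c+1) sits at (2r, 2c+1); the point between
    pixels (r, c) and (r+1, c) sits at (2r+1, 2c).
    """
    h, w = len(labels), len(labels[0])

    def lab(r, c):
        # Outside the image a sentinel label is returned, so that every
        # point on the image border is declared a contour point.
        return labels[r][c] if 0 <= r < h and 0 <= c < w else None

    points = set()
    for r in range(h):                      # horizontal scan
        for c in range(-1, w):
            if lab(r, c) != lab(r, c + 1):
                points.add((2 * r, 2 * c + 1))
    for r in range(-1, h):                  # vertical scan
        for c in range(w):
            if lab(r, c) != lab(r + 1, c):
                points.add((2 * r + 1, 2 * c))
    return points


def neighbors(p):
    """Six neighbors of an interpixel point; the layout depends on whether
    the point lies between horizontally or vertically adjacent pixels."""
    y, x = p
    diagonal = [(y - 1, x - 1), (y - 1, x + 1), (y + 1, x - 1), (y + 1, x + 1)]
    if x % 2:                               # between left/right pixels
        return [(y - 2, x), (y + 2, x)] + diagonal
    return [(y, x - 2), (y, x + 2)] + diagonal


# Hypothetical 2x2 label image with a single pixel of region 1.
pts = contour_points([[0, 0], [0, 1]])
```

For this image the sketch finds the eight border points plus the two interior points (2, 1) and (1, 2), which are neighbors of each other, so the interior contour is one element thick.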

2.1. Definition of the contour graph

The contour points on the 6-connected grid are identified by raster scanning the label image; when two consecutive pixels do not have the same label, a contour point is declared between them. This process is done both horizontally and vertically to find all contour points. All the points around the border of the image are assumed to be contour points.

The contour network can be treated as a graph, where the vertices or nodes are junction points of contours (i.e., contour points with three or more contour points in their neighborhood) and the arcs are the contours connecting these nodes. The nodes are identified by scanning the image of contour points from top to bottom; a contour point is a node if it has three or more neighbors that are contour points and none of its neighbors is a previously identified node. This avoids clusters of nodes at a junction point. In the case of a single closed contour, any point can be selected as the single node.

The graph is composed of one or more connected components in which each contour point can be reached from any other by following contours. Each such connected component will be encoded independently, so we need only address the representation of a single connected component. Fig. 2(a) illustrates a simple connected component with 4 nodes and 6 arcs.

In this work, we represent the contour graph by specifying each of the arcs in a certain order. We identify the starting point (a node) and the positions of all the points on the arc relative to the starting node by a geometric description such as a chain code. The final point of the arc is known to be a node. Explicitly specifying each start node would be costly. In our approach the arcs are transmitted in such an order that each start node can be chosen from among previously transmitted nodes and identified by an efficient code.

The algorithm for selecting the order in which arcs are transmitted and the starting node for each arc for a connected component works as follows. A starting or root node is selected and one of its neighboring contour points is selected to define the first arc. As each arc is traversed, the contour points are tagged as visited. Also a list of previously visited nodes is maintained; the receiver maintains the same list. The encoder also generates an associated tree used to determine which previously visited node is the start node for the next arc to transmit. The receiver also maintains a copy of this tree. Each time an arc is transmitted, a branch is added to the associated tree. The algorithm for a connected component is thus:

0. Choose a root node and starting arc. Initialize the list of visited nodes with this root node, and put it in correspondence with the root of the associated tree.
1. Given a start node and an unvisited neighboring contour point, encode the corresponding arc, tagging each contour point as visited, and return the coordinates of the terminal node. Add the corresponding branch to the associated tree.
2. If the terminal node is not tagged as previously visited, add it to the list of visited nodes, output a 1 and go to 5.
3. Go back in the associated tree to the previously visited start node. If this node has unvisited neighboring contour points, output a 1 and go to 5.
4. If the current node is the root, stop. Otherwise output a 0 and go to 3.
5. Set the current node as a start node. Choose the first unvisited neighboring contour point after a visited neighbor when searching in a counterclockwise direction to specify the next arc to follow. Go to 1.

The associated tree for the contour graph in Fig. 2(a) is shown in Fig. 2(b). The sequence of bits transmitted to identify the start nodes (also called selection decisions) is 111101.

Based on such a contour graph, each boundary of a region in the label image can be defined by a closed list of consecutive arcs; each arc separates a right and a left region according to the arc covering. At the decoder side, from the reconstructed network of contour points, this list is computed as follows. A start arc is selected. From its end node, the rightmost (or the leftmost) arc is added to the list of arcs until the start arc is found, i.e., the list is closed. Based on the reconstructed arcs the 8-connected internal points (pixels of a region) are reconstructed using a filling algorithm.
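The arc-ordering steps of this section can be sketched on an abstract graph. The sketch below rests on several assumptions that are not in the paper: the graph is given as an adjacency list whose order stands in for the counterclockwise search rule, the traversal stops as soon as every arc has been transmitted (taken here to correspond to the end of the connected component), every newly reached node has at least one untransmitted arc, and the node and arc labels of the example are hypothetical rather than those of Fig. 2.

```python
def arc_order(adj, root):
    """Return (arcs in transmission order, selection decisions).

    adj maps each node to an ordered list of (arc_id, other_node) pairs;
    the list order is an assumed stand-in for the counterclockwise rule.
    """
    total = len({arc for arcs in adj.values() for arc, _ in arcs})
    visited_arcs = set()
    visited_nodes = [root]        # step 0: list of visited nodes
    path = [root]                 # path from the tree root to the current node
    order, bits = [], []

    def next_arc(node):
        # First incident arc of `node` that has not been transmitted yet.
        for arc, other in adj[node]:
            if arc not in visited_arcs:
                return arc, other
        return None

    while True:
        start = path[-1]                      # step 5: current start node
        arc, end = next_arc(start)            # step 1: transmit one arc
        visited_arcs.add(arc)
        order.append((arc, start, end))
        if len(visited_arcs) == total:        # assumption: component finished
            return order, bits
        if end not in visited_nodes:          # step 2: new terminal node
            visited_nodes.append(end)
            path.append(end)
            bits.append(1)
            continue
        while next_arc(path[-1]) is None:     # steps 3 and 4: climb the tree
            if len(path) == 1:                # back at the root: stop
                return order, bits
            bits.append(0)
            path.pop()
        bits.append(1)                        # step 3: start node found


# Hypothetical component with 4 nodes and 6 arcs (every pair of nodes joined).
adj = {
    "n1": [("a1", "n2"), ("a4", "n4"), ("a6", "n3")],
    "n2": [("a1", "n1"), ("a2", "n3"), ("a5", "n4")],
    "n3": [("a2", "n2"), ("a3", "n4"), ("a6", "n1")],
    "n4": [("a3", "n3"), ("a4", "n1"), ("a5", "n2")],
}
order, bits = arc_order(adj, "n1")
```

With this labelling the arcs are transmitted in the order a1 to a6 and the selection decisions are 1, 1, 1, 1, 0, 1. The `path` list plays the role of the associated tree: only the branch from the root to the current start node is needed to go back to a previously visited start node.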

Figure 2: Example of a contour graph and its associated tree: a) contour graph with nodes n1 to n4 and arcs a1 to a6, b) associated tree.

2.2. Encoding of the arcs

The usual approach to obtain a lossless representation of a contour chain is to use Freeman codes (with 8, 16, 32 or more states) [4] or chain codes [5, 6, 7]. We will use a directional chain code (DCC) method with 7 states defined on the hexagonal lattice shown in Fig. 1. Let $p_p$ and $p_c$ denote the two contour points last encoded (second-previous and previous), and let $p_a$ be the point to be encoded next. If $p_a$ is neither the first nor the second point of a chain, then all three points ($p_p$, $p_c$, $p_a$) belong to the same chain and the encoding is straightforward. However, if $p_a$ is either the first or the second point of a chain, then special selection rules apply; this will be discussed later. Finally, when the very first chain is encoded, $p_p$ is the root of a connected component and $p_c$ is the second point of the chain; the relative coordinates of $p_c$, computed with respect to $p_p$, are transmitted.

Given the positions of $p_p$ and $p_c$, we propose to construct the codes according to the following rules (Fig. 3):

- the 5 nearest positions with respect to $p_c$ on the 6-connected grid (all elements of the 6-connected neighborhood except for $p_p$), read counter-clockwise starting at $p_p$, have the codes 0, 1, 2, 3 and 4, respectively,
- the two points $p$ collinear with $p_c$ and $p_p$ such that $d(p, p_p) = 3\,d(p_c, p_p)$ or $d(p, p_p) = 4\,d(p_c, p_p)$, where $d(\cdot,\cdot)$ is the Euclidean distance, have the codes 5 and 6, respectively.

The set of positions $\{0, 1, 2, 3, 4, 5, 6\}$ is called a pattern and its orientation depends on the vector $\overrightarrow{p_p p_c}$ (Fig. 3). Having defined the pattern, the code to be assigned to the next point $p_a$ is determined as follows. If the points corresponding to the codes 2, 5 and 6 of the pattern belong to the contour chain, the code is 6. If points 2 and 5 of the pattern belong to the chain, the code is 5. Otherwise, the code is 0, 1, 2, 3 or 4

depending on the position of $p_a$. Fig. 3 shows an example of such an encoding method. Since each chain code assumes one of 7 possible states, 3 bits are needed for its representation. If the code 5 (respectively 6) is chosen, 2 (respectively 3) contour points are encoded using only 1 code, i.e., 3 bits. Thus, we can say in this case that 1.5 (respectively 1.0) bits are used to encode each contour point. Other patterns can be chosen as well; using our definition fewer codes than contour points will be used without loss of information if contour chains define lines of 2, 3 or more points.

Figure 3: Example of directional chain codes: a) contour to be encoded, b-h) construction of codes for the 7 points of the contour (first to seventh point; in panel d the third point receives code 5, not 2). $p_a$ is being encoded whereas $p_p$ and $p_c$ are already encoded points. The final chain code is {3,3,5,3,4,3,2}.

Note that the label value of each region must be transmitted in order to assure a lossless representation of the original label image.

In the DCC method, a set of chain codes is computed for every contour chain. To define the first code of each set, the DCC method must be initialized, i.e., the points $p_p$ and $p_c$ must be defined from a previously-encoded chain. To explain this initialization process, we denote by $c_i$ the chain associated with the arc $a_i$. $c_i$ is an ordered set of contour points $p_{i,j}$, such that $i$ is the chain number and $j$ is the number of a contour point in $c_i$. $c_i$ starts at $p_{i,0}$ and ends at $p_{i,N_i}$. Thus, $c_i$ is composed of $N_i + 1$ contour points. Note that the termination points $p_{i,0}$ and $p_{i,N_i}$ are nodes of

the network. In Fig. 2, $n_1$ is the root, $p_p = p_{1,0} = n_1$ and $p_c = p_{1,1}$. The relative coordinates of $p_{1,1}$, computed with respect to $n_1$, are separately transmitted. $c_1$ is encoded using the DCC method, the last two points being $p_p = p_{1,N_1-1}$ and $p_c = p_{1,N_1} = n_2$.

To encode a contour graph, its geometric and topological data parts are treated separately. The geometric data stream is composed of all the sets of chain codes defined above. Note that in order to help the decoder reconstruct the contour graph, some codes need to be added: 1) the end-chain code, included just after the last chain code of each chain, to mark an end of arc, 2) the optional leaf code to specify that the current node is a leaf, i.e., that the selection decisions must be read to define the next start node (must be inserted just after the end-chain code; if not inserted, then the last reconstructed node is the new start node), and 3) the end-cc code to mark the end of a connected component. Thus, the stream of chain codes is a set of $N$ realizations $x_i \in E = \{0, 1, \ldots, 6, \text{end-chain}, \text{leaf}, \text{end-cc}\} = \{e_i\}$ of a random variable $X_i$. This set has statistical redundancies which can be exploited using an arithmetic coding method.

We have determined experimentally that a first order Markov model using the matrix of transition probabilities $[P]_{kj} = \Pr(X_i = e_j \mid X_{i-1} = e_k)$ provides a good probabilistic model of the chain. Thus, arithmetic encoding of events in a chain effectively assigns a codeword of length $-\log_2(tr_{i,i-1})$ to the event $x_i$, where $tr_{i,i-1} = \Pr(x_i \mid x_{i-1})$. The decoder must be able to access the matrix elements in order to decode the binary stream and so the matrix must be transmitted. This matrix of $\mathrm{Card}(E) \times \mathrm{Card}(E)$ elements is usually sparse. Thus, encoding it using an arithmetic coding approach should be very efficient.
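The code-length statement can be illustrated by estimating a transition matrix from a stream of events and summing $-\log_2 \Pr(x_i \mid x_{i-1})$ over the stream. This is a minimal sketch rather than an arithmetic coder: it only computes the ideal cost that such a coder approaches, it ignores the cost of transmitting the matrix, and the choice of `"end-cc"` as the context of the first event as well as the function names are assumptions made here.

```python
import math
from collections import Counter, defaultdict


def transition_matrix(stream, first="end-cc"):
    """Estimate P[k][j] = Pr(X_i = j | X_{i-1} = k) from event counts.
    `first` is the assumed context of the very first event."""
    counts = defaultdict(Counter)
    prev = first
    for sym in stream:
        counts[prev][sym] += 1
        prev = sym
    return {k: {j: n / sum(row.values()) for j, n in row.items()}
            for k, row in counts.items()}


def ideal_cost_bits(stream, P, first="end-cc"):
    """Sum of -log2 Pr(x_i | x_{i-1}): the code length an ideal
    arithmetic coder would spend under the first order Markov model."""
    bits, prev = 0.0, first
    for sym in stream:
        bits += -math.log2(P[prev][sym])
        prev = sym
    return bits


# Chain code of Fig. 3 followed by the two markers that close the chain
# and the connected component.
stream = [3, 3, 5, 3, 4, 3, 2, "end-chain", "end-cc"]
P = transition_matrix(stream)
cost = ideal_cost_bits(stream, P)
```

For this short stream the only uncertain context is the code 3, which is followed by four different events with probability 0.25 each, so the ideal cost is 8 bits. A sample this small makes the estimate optimistic; the point is only how the matrix $[P]_{kj}$ turns into code lengths.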
To encode the transition matrix efficiently, its elements are quantized to $Q$ real values $q_j \in [0, 1]$, known at the encoder and decoder; we used {0, 0.01, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 0.99, 1}. The frequencies of occurrence of these values were estimated and used to arithmetically code the transition matrix for transmission.

With the above encoding process, the geometrical data are represented by bitstreams generated for the chain codes and for the transition matrix entries. The topological data are the absolute coordinates of the root of each component, the relative coordinates of the first neighbor of each root (used to initialize the DCC method) and the selection decisions. All the topological data are transmitted with no statistical encoding. The global encoding cost (GC) of our model includes the cost of the topological and geometric data, and the transmission cost of each region's label.

3. REFERENCE ALGORITHM

To evaluate the performance of the proposed approach we have used the following reference algorithm. The whole

label image is scanned to compute a stream of events. An event $x_i$ is the label of a pixel in the label image. To encode this stream, an arithmetic encoding is used that assigns a codeword of length $-\log_2(\Pr(x_i))$ to the event $x_i$. A Q-coder model, similar to that implemented in the verification model of MPEG-4 [3], defines the probabilistic model. In MPEG-4, the binary label images are divided into 16×16-pixel blocks and each block is encoded using the Q-coder. Here, however, the Q-coder is applied to the whole label image. The Q-coder defines an index in a table of probabilities for each pixel. This index is data-dependent; it is computed within a 10-pixel neighborhood and the MPEG-4 table of probabilities is used. The table is composed of 1024 probabilities defined experimentally and known at the decoder (see [3]). The resulting encoding cost will be called the Q cost.

Figure 4: MPEG-4 test label images (panels: image 1; image 2; images 3, 4; images 5, 6).

4. RESULTS AND DISCUSSION

Although the contour-based approach presented here can be applied to any label images, to compare the proposed method with the Q-coder we show the results only for binary label images (MPEG-4). The selected test images have varying complexities and different formats to evaluate the potential of the two approaches. Three sequences are used: “News” (304×244), “Coastguard” (CIF, i.e., 352×288 pixels) and “Stefan” (CIF, derivative of “Stefan” but with modified boundaries). Fig. 4 shows the label images extracted from these three sequences. The images 1 and 2 are extracted from the “News” sequence, while images 3 and 5 come from “Coastguard” and “Stefan”, respectively. The QCIF format (144×176) has also been used for the last 2 images to define the same label images at lower resolutions (images 4 and 6). The coding results are shown in Table 1, where npt is the number of contour points, nch is the number of chains, ncc is the number of connected components, nr is the number of regions and events is the total number of chain codes. The encoding costs are in bits (b) and bits per pixel (bpp). Compression ratio is also given.

image   npt  nch  ncc   nr  events  GC cost  GC cost  GC comp.  Q cost  Q cost  Q comp.
                                        (b)    (bpp)     ratio     (b)   (bpp)    ratio
    1   878    2    2    3     677     1332    0.017      55.6    1719   0.023     43.1
    2   622    1    1    2     371      682    0.009     108.7    1322   0.017     56.1
    3  1596   11   11   12     984     2255    0.022      44.9    2627   0.025     38.6
    4   761   11   11   12     472     1316    0.051      19.3    1157   0.045     21.9
    5   762    1    1    2     575     1168    0.011      86.8    3063   0.030     33.1
    6   380    1    1    2     292      670    0.026      37.8    1425   0.056     17.8

Table 1: Description of test images and comparison of compression results: encoding cost and compression ratio. GC = global cost for the proposed method, Q = cost for the Q-coder.

The results show that the contour-based method provides, in most cases, significantly lower encoding cost than the Q-coder. The DCC method is well-adapted to contour point encoding since the number of events is always smaller than the number of contour points and represents about 60-80% of the number of contour points. This difference is due to the use of codes 5 and 6 which encode 2 and 3 aligned points, respectively. Thus, our encoding method is more efficient when the contours are smooth.

These results also show that the two encoders have different characteristics. Between the QCIF and CIF formats the number of events for the Q-coder is multiplied by 4 and we can say that the volume of data increases as $n^2$, where $n$ is the linear dimension of the image. In the case of the contour-based approach, the number of contour points does not increase as $n^2$ but as $n$. Moreover, in the case of “Coastguard”, the Q-coder provides better results than the proposed method for the QCIF image (image 4) and worse results for the CIF image (image 3). The Q-coder uses a 10-point neighborhood to assign to each event a probability from the table of transition probabilities. This table is previously defined based on test images and since the size of the image changes, so does the context. Therefore, the probability assignments may or may not be well adapted to that particular table. In the case of the contour-based method, the statistics of the chain codes are nearly independent of the image format, and the efficiency of the encoding is about constant.

5. CONCLUSIONS

We have presented a comparative study of two lossless representations of binary map images.
It shows that the proposed contour-based approach outperforms the arithmetic coding method based on the probabilistic model used in MPEG-4. Moreover, the contour-based approach is more general than the Q-coder and can be applied to any label image.

We believe that the encoding of the chain codes can be improved. Other patterns can be defined with numerous states, and the statistical encoding method of the chain codes can be better optimized. For example, to encode the transition matrix, an optimization process could be developed to define the quantization values according to the original probabilities. Overall, the proposed approach seems to be a promising method for the compression of label images in the context of region-based coding.

6. REFERENCES

[1] L. Labelle, Représentation adaptative d'images appliquée à un schéma de codage orienté régions de séquences TV. PhD thesis, no. 1677, Université de Rennes, France, 1996.
[2] H. Sanson and L. Labelle, “Partition encoding based on graph representation and geometric approximation of contours for regions-based video coding,” Proc. SPIE Visual Communications and Image Process., vol. 3024, pp. 1417–1428, 1997.
[3] ISO/IEC JTC1/SC29/WG11, “MPEG-4 Video Verification Model Version 8.0,” MPEG97/N1796, July 1997.
[4] H. Freeman, “Application of the generalized chain coding scheme to map data processing,” Proc. IEEE Comp. Soc. Conf. Pattern Recognition Map Data Processing, May 31-June 2, 1978.
[5] T. Minami and K. Shinohara, “Encoding of Line Drawings with a Multiple Grid Chain Code,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. PAMI-8, pp. 269–276, 1986.
[6] C. C. Lu and J. G. Dunham, “Highly efficient coding schemes for contour lines based on chain code representations,” IEEE Trans. Commun., vol. 39, pp. 1511–1514, Oct. 1991.
[7] T. Kaneko and M. Okudaira, “Encoding of arbitrary curves based on the chain code representation,” IEEE Trans. Commun., vol. COM-33, pp. 697–707, July 1985.