Discrete Straight Line Segments: Parameters, Primitives and Properties
Leo Dorst¹, Philips Laboratories, North American Philips Corporation, Briarcliff Manor, NY, USA, and Arnold W.M. Smeulders¹, Dept. of Computer Systems, University of Amsterdam, Amsterdam, The Netherlands.
Abstract
Digitizing the continuous world unavoidably loses information; as a consequence, geometrical properties of real-world objects can only be estimated from the digital data. For continuous straight lines and straight object boundaries, we show the best accuracy that can be reached in the estimation of length (and other properties), in the absence of noise. In the process, we give an analytical expression for the set of continuous pre-images of a digitized straight line segment. That fundamental result has found many applications in analyses of geometrical digitization processes; several of these are indicated, as well as recent extensions of the method to arbitrary objects.
1 Introduction
In digital image analysis the aim is to measure some aspects of the continuous world on the basis of digital images. In many applications, geometrical properties are foremost among the aspects to be measured. Therefore it is of interest to study how accurately these continuous properties can still be determined from the digital data. The issues are non-trivial; this paper gives an overview of some of the first results for discretized straight lines and straight object boundaries.
A typical problem is the following. Assume we have a 2-dimensional disk in the world, black against a white background, described by a function $f(x, y)$ which is 0 inside the object and 1 outside. The goal is to measure the length of the perimeter of the disk. In microscopy, one might do so by projecting a grid onto the disk, in several directions, and counting the intersections of the contour with the grid. Moran [23] gives the procedure to convert this to an unbiased estimate of the perimeter length. In digital image analysis, one digitizes the image before measuring; by means of a camera and a digitizer, the object is represented in a discrete array as a function $g(i, j)$, which may assume values in the continuous range $[0, 1]$ (when properly scaled) due to the imperfections of optics and digitization. From this function, we have to derive a good estimate of the circumference; exact measurement has become impossible. The estimation could be performed by first estimating the original object $f(x, y)$ on the basis of $g(i, j)$, and measuring the circumference of that estimate $\hat{f}(x, y)$. Alternatively, it could be performed by finding a good estimate of the circumference based on the discrete data directly.
For instance, one could threshold $g(i, j)$ at value $\frac{1}{2}$ and represent the pairs of discrete boundary points by a chaincode string (a way of encoding coordinate differences between neighboring points, see Fig.1); then count the number of even chaincode elements $n_e$ (representing 'straight' steps parallel to the grid coordinate axes) and odd chaincode elements $n_o$ (representing 'diagonal' steps), and estimate the circumference by the estimator $L = n_e + \sqrt{2}\, n_o$. The former method of measurement requires the estimation of a continuous function $f(x, y)$ on the basis of a deformed and sampled version $g(x, y)$. This estimation is amenable to classical image reconstruction techniques. However, the circumference of the best reconstruction is not necessarily the best reconstructed circumference. The latter estimation technique is more commonly used in image processing, mainly because it is simpler and faster. But it turns out to be far from trivial to find 'proper' estimators for the circumference.
¹ This paper mainly reports work done when the authors were with the Pattern Recognition Group, Applied Physics, Delft University of Technology, Delft, The Netherlands (1980-1986).
In this paper we study the problem of measuring properties such as length, slope, angle, etc. of a straight fragment of a contour, discretized ideally (no distortion, no noise). This is admittedly a long way off from the general geometric parameter estimation problem outlined before. But we will see that the solution of this problem points to ways to solve the general problem. We focus mainly on the calculation of estimators for the length of a digitized straight line segment.
Prior to our study, various length measures had been published. Those of [11], [12], [19] were of the form $L = a n_e + b n_o$. Freeman [11] used the intuitive but inaccurate $(a, b) = (1, \sqrt{2})$; Groen and Verbeek [12] and Kulpa [19] calculated the coefficients by a statistical analysis over all digitized straight lines. Later, Koplowitz [16] computed the probability of odd and even codes for arbitrary digitization schemes and curving contours, leading to generalized coefficients. However, Vossepoel and Smeulders [26] realized that the number of odd and even codes is not a very accurate way of characterizing strings, and that one may not expect very accurate estimators from such characterizations. Therefore they extended the characterization by the 'corner count' (the number of odd-even chaincode transitions in the string). They computed the best linear unbiased estimator for the length of strings in this characterization, and gave the simple but accurate estimator $L = 0.980\, n_e + 1.406\, n_o - 0.091\, n_c$, based on a statistical analysis of digitized straight lines.
We realized that the most accurate length estimators for straight lines would be found by carrying through the main idea of [26]: find an invertible characterization of straight strings; then 'invert the digitization process', computing the set of continuous straight lines that digitize to a given string; finally, use estimation theory to compute the optimal value for an estimator of the desired property (such as length) over that set.
The results of the separate steps in this study have been published in [5]-[8], and an overview for length estimation has been given in [9]. In the present paper we explain our main results and their relationships, and discuss extensions of the work given since by others. An alternative way of extending Vossepoel and Smeulders [26] was followed by Koplowitz and Bruckstein [17], who computed length estimators based on linear combinations of arbitrary local properties. We will not discuss that extension here; the interested reader is referred to [17].
This paper is organized as follows. Sections 2 and 3 describe digitized line segments, and develop several characterizations of them by means of parameters. Section 4 describes the inverse process, of finding the primitives, given the parameters. Section 5 defines estimators for properties of the continuous line segment, taking these characterizations as input. Various criteria and characterizations are studied. Section 6 develops estimators for the important property 'length', and compares the results of the different estimators; this shows that the best-known estimators are not always the best. Section 7 investigates the completeness of our present knowledge of the metrology of digitized straight lines and other digitized shapes.
2 Digitization
Suppose we have a continuous straight boundary with equation
$$y = ax + b, \tag{1}$$
with the object lying in the direction of negative $y$. Without loss of generality, we can assume
$$0 \le a \le 1, \qquad 0 \le b < 1. \tag{2}$$
Overlay on this boundary a grid of discretization points parallel to the axes. The boundary is digitized by the procedure sketched in the introduction: integration over the sensitivity region of a discretization point, then thresholding. It can be shown [7] that for symmetrical sensitivity regions this leads to the discrete boundary points given by:
$$(i, y_i) = (i, \lfloor ai + b \rfloor). \tag{3}$$
Figure 1: a: OBQ digitization, b: Chaincodes, c: GIQ digitization.
Here $\lfloor \cdot \rfloor$ indicates the floor function, or 'rounding down'. This method is also known as 'object boundary quantization', or OBQ. Another common method of digitization, used for lines rather than for boundaries, is 'grid intersection quantization' (GIQ) [12], which codes the grid points closest to intersections of the boundary line with the grid. This gives the points
$$(i, y_i') = (i, [ai + b]), \tag{4}$$
where $[\cdot]$ indicates the rounding-off function. Since we have $[x] = \lfloor x + \frac{1}{2} \rfloor$, we need only study OBQ, noting that results apply to GIQ when we rewrite $b \to b + \frac{1}{2}$.
Discrete straight lines are commonly represented by their chaincode string $c$, of which the $i$-th element $c_i$ is
$$c_i \equiv y_i - y_{i-1}. \tag{5}$$
This is the representation we use. Straight strings of lines with the limitations of eq.(2) consist only of code elements 0 and 1 (see Fig.1). For the remainder of this paper, we consider finite strings, consisting of $n$ elements, and we consider such a string as a finite part of an infinitely long string. We do not consider a finite string as the digitization of a continuous line segment; that would require special treatment of the end points. Our motivation is that in the application we have in mind, lines occur as piecewise linear approximations of closed contours, and hence do not have end points which are objectively well-defined.
The chaincode representation of straight lines has interesting properties, and it can be determined whether a given chaincode string could have been the discretization of a straight line by checking the so-called 'linearity conditions'. These are directly related to number-theoretical issues in the approximation of real numbers by rational numbers, see [6, 28]. These linearity conditions can be checked incrementally, leading to an $O(n)$ decomposition of arbitrary strings into straight substrings [24]. Another way of checking straightness is by means of the 'chord property', see for instance [1].
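The two digitization schemes and the chaincode of eq.(5) can be illustrated with a minimal Python sketch. The function names (`obq_points`, `giq_points`, `chaincode`) are hypothetical and chosen only for this illustration; floating-point input is assumed to stay away from exact ties.

```python
import math

def obq_points(a, b, n):
    """Object boundary quantization, eq.(3): y_i = floor(a*i + b) for i = 0..n."""
    return [math.floor(a * i + b) for i in range(n + 1)]

def giq_points(a, b, n):
    """Grid intersection quantization, eq.(4), using [x] = floor(x + 1/2)."""
    return [math.floor(a * i + b + 0.5) for i in range(n + 1)]

def chaincode(ys):
    """Chaincode string of eq.(5): differences of successive y values."""
    return [y1 - y0 for y0, y1 in zip(ys, ys[1:])]

# Example line y = 0.3 x + 0.25, digitized over n = 6 columns.
obq_string = chaincode(obq_points(0.3, 0.25, 6))
giq_string = chaincode(giq_points(0.3, 0.25, 6))
```

For this line the OBQ string is 001001 and the GIQ string is 100010; the latter equals the OBQ string of the same line with $b$ replaced by $b + \frac{1}{2}$.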
3 Parameters
The chaincode string representation is not convenient for use in estimation; formulas for estimators do not take strings as input, but numbers characterizing the string. As an example, the length estimator given in the introduction was based on the number of odd and even chaincodes. Four characterizations are studied.
3.1 $(n)$-characterization
In this characterization, the string $c$ is characterized by the number of chaincode elements $n$. All strings with the same number of elements are thus characterized by the same 1-tuple $n$. Among these, there are many non-straight strings.
3.2 $(n_e, n_o)$-characterization
The string $c$ is characterized by the total number of even code elements $n_e$ and the number of odd code elements $n_o$. The endpoint of the discrete line segment corresponding to a straight string $c$ starting at $(0, 0)$ is $(n_e + n_o, n_o)$. Thus all discrete line segments with the same endpoints have the same tuple $(n_e, n_o)$. Moreover, there are non-straight strings between the endpoints that also have the same tuple (namely all strings with only chaincode elements 0 and 1). This characterization avant-la-lettre was used in [11, 12, 19].
3.3 $(n_e, n_o, n_c)$-characterization
The parameters $n_e$ and $n_o$ have the same meaning as before. The parameter $n_c$ is the 'corner count', defined as the number of occurrences in the string where $c_i \ne c_{i-1}$. This characterization was first introduced by Vossepoel and Smeulders [26], who also showed that straight strings have the same tuple $(n_e, n_o, n_c)$ if and only if they have the same discrete points at the four columns $i = 0, 1, (n-1), n$.
3.4 $(n, q, p, s)$-characterization
The $(n, q, p, s)$-characterization is a 4-tuple of parameters first introduced in [5]. It is defined by
$$\begin{cases}
n = \text{the number of elements of } c \\
q = \min_k \{\, k \in \{1, 2, \ldots, n\} \mid k = n \,\vee\, \forall i \in \{1, 2, \ldots, n-k\} : c_{i+k} = c_i \,\} \\
p = \sum_{i=1}^{q} c_i \\
s : s \in \{0, 1, \ldots, q-1\} \,\wedge\, \forall i \in \{1, 2, \ldots, q\} : c_i = \lfloor \frac{p}{q}(i-s) \rfloor - \lfloor \frac{p}{q}(i-s-1) \rfloor
\end{cases} \tag{6}$$
This 4-tuple bears an invertible relation to the chaincode string $c$. In terms of $(n, q, p, s)$, the string $c$ is given by:
$$c_i = \lfloor \tfrac{p}{q}(i-s) \rfloor - \lfloor \tfrac{p}{q}(i-s-1) \rfloor, \qquad i = 1, \ldots, n. \tag{7}$$
This equation shows how the parameters $p$, $q$ and $s$ should be interpreted. The fraction $\frac{p}{q}$ is a rational approximation to $a$ (in fact it is the simplest approximation, in terms of having minimum denominator). This fraction is not sufficient to determine the chaincode string, since it is only related to $a$. The parameter $s$ is a 'phase shift' of the standard string of the fraction $\frac{p}{q}$, to account for the effect of the intercept $b$. For a proof, see [5, 7]. By eq.(7), strings with the same tuple $(n, q, p, s)$ are necessarily the same string. Such a characterization with an invertible correspondence between tuple and string is called a faithful characterization. Also by eq.(7), strings for which a tuple $(n, q, p, s)$ exists are necessarily straight; one continuous line of which it is the digitization is $y = \frac{p}{q}(x - s)$.²
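As an illustration of the faithful characterization, the following Python sketch computes the tuple from a 0/1 chaincode string by a direct search following eq.(6), and regenerates the string following eq.(7). The function names are hypothetical, and the brute-force search is chosen for clarity rather than efficiency; it is not the procedure of [5].

```python
def chaincode_parameters(c):
    """Return (n, q, p, s) for a list c of codes 0/1, by direct search as in eq.(6).

    Raises ValueError when no phase shift s exists, i.e. the string is not straight.
    """
    n = len(c)
    # q: smallest k such that c repeats with period k (k = n always qualifies).
    q = next(k for k in range(1, n + 1)
             if all(c[i + k] == c[i] for i in range(n - k)))
    p = sum(c[:q])
    for s in range(q):
        # Integer floor division implements floor((p/q) * x), also for negative x.
        if all(c[i - 1] == (p * (i - s)) // q - (p * (i - s - 1)) // q
               for i in range(1, q + 1)):
            return n, q, p, s
    raise ValueError("no phase shift s exists: the string is not straight")

def string_from_parameters(n, q, p, s):
    """Regenerate the chaincode string from its tuple, as in eq.(7)."""
    return [(p * (i - s)) // q - (p * (i - s - 1)) // q for i in range(1, n + 1)]
```

For example, the string 101001 yields the tuple (6, 5, 2, 3), and the tuple regenerates the same string; the non-straight string 0011 is rejected.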
4 Primitives
Let us denote a continuous line by $\ell$; its digitization (in the form of a chaincode string) by $c = C\ell$; the characterization of $c$ by $t = Tc = TC\ell$. Denote the set of strings with a given $t$ by $T^{-1}t$, and the set of lines with a given $c$ by $C^{-1}c$. The set $C^{-1}T^{-1}t$ for a given tuple $t$ is thus by definition the set of continuous straight lines $\ell$ whose tuple $TC\ell$ equals $t$. These sets are conveniently expressed in the parametrization $\ell = (b, a)$ for a given $n$. Expressions for the sets $C^{-1}T^{-1}t$ (also called 'regions') are given in [7]. In $(b, a)$-space, they are collections of quadrilaterals. For the $(n)$-characterization and the $(n_e, n_o)$-characterization these can be found easily. The expressions for $C^{-1}T^{-1}(n_e, n_o, n_c)$ were first given in [26]. The expressions for $C^{-1}T^{-1}(n, q, p, s)$ were first given in [5] and are derived more directly in [22], where they are called 'facets'.
Fig.2 plots the regions for all strings with $n = 6$ consisting of codes 0 and 1, in the four characterizations introduced in Section 3. It is seen from this figure that the finer the characterization is, the finer is the tessellation of $(b, a)$-space; more continuous lines have become distinguishable by the characterization of their digitization. Since the $(n, q, p, s)$-characterization is a faithful characterization, the regions of this characterization are the equivalence classes of lines still distinguishable after digitization. We call these equivalence classes domains (McIlroy [22] calls them 'facets', Havelock [14, 15] 'locales'). Each string has its own domain. These domains are so fundamental in the study of discrete straight lines that it is worth repeating the theorem derived in [5], giving their analytical expressions:
The Domain Theorem
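In object boundary quantization, $C^{-1}T^{-1}(n, q, p, s)$ is given by
$$\begin{array}{ll}
1) & \dfrac{p_-}{q_-} < a < \dfrac{p_+}{q_+} \\[2ex]
2) & \begin{cases}
\lceil \frac{sp}{q} \rceil - sa \;\le\; b \;<\; \lceil \frac{L(t)p + 1}{q} \rceil - L(t)a & \text{if } \frac{p}{q} \le a < \frac{p_+}{q_+} \\[1ex]
\lceil \frac{L(s)p}{q} \rceil - L(s)a \;\le\; b \;<\; \lceil \frac{tp + 1}{q} \rceil - ta & \text{if } \frac{p_-}{q_-} < a \le \frac{p}{q}
\end{cases}
\end{array} \tag{8}$$
where $t$ is implicitly defined by $tp = (sp - 1) \bmod q$, and where $L(x) = x + \lfloor \frac{n-x}{q} \rfloor q$, $p_+ = \lceil \frac{pL(t)+1}{q} \rceil - \lceil \frac{sp}{q} \rceil$, $p_- = \lceil \frac{pL(s)}{q} \rceil - \lceil \frac{tp+1}{q} \rceil$, $q_+ = L(t) - s$, and $q_- = L(s) - t$.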
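The theorem can be exercised numerically. The sketch below, in Python with exact rational arithmetic, evaluates condition 2) of eq.(8) for a given slope and checks on sample lines that a line lies in the domain of a string exactly when it digitizes to that string. It rests on assumptions of ours: $t$ is read as the unique value in $\{0, \ldots, q-1\}$ with $tp \equiv sp - 1 \pmod q$, $L(x)$ is taken as $x + \lfloor (n-x)/q \rfloor q$, and all function names are hypothetical. Condition 1) is not coded separately, since for positive $q_+$ and $q_-$ it is equivalent to the $b$-interval of condition 2) being non-empty.

```python
from fractions import Fraction
from math import floor

def ceil_div(x, q):
    """Integer ceiling of x / q for q > 0."""
    return -((-x) // q)

def chaincode_parameters(c):
    """(n, q, p, s) of a straight 0/1 string, by direct search as in eq.(6)."""
    n = len(c)
    q = next(k for k in range(1, n + 1)
             if all(c[i + k] == c[i] for i in range(n - k)))
    p = sum(c[:q])
    for s in range(q):
        if all(c[i - 1] == (p * (i - s)) // q - (p * (i - s - 1)) // q
               for i in range(1, q + 1)):
            return n, q, p, s
    raise ValueError("string is not straight")

def domain_b_interval(n, q, p, s, a):
    """Return (lo, hi) such that the domain at slope a is lo <= b < hi (eq.(8), part 2)."""
    # Assumed reading of the implicit definition: t*p = s*p - 1 modulo q, 0 <= t < q.
    t = next(k for k in range(q) if (k * p - s * p + 1) % q == 0)

    def L(x):
        return x + ((n - x) // q) * q

    if a * q >= p:  # branch p/q <= a
        lo = ceil_div(s * p, q) - s * a
        hi = ceil_div(L(t) * p + 1, q) - L(t) * a
    else:           # branch a < p/q
        lo = ceil_div(L(s) * p, q) - L(s) * a
        hi = ceil_div(t * p + 1, q) - t * a
    return lo, hi

def in_domain(c, a, b):
    """True if the line y = a x + b lies in the domain of the straight string c."""
    lo, hi = domain_b_interval(*chaincode_parameters(c), a)
    return lo <= b < hi

def obq_chaincode(a, b, n):
    """Exact OBQ chaincode of y = a x + b over n columns (a, b given as Fractions)."""
    ys = [floor(a * i + b) for i in range(n + 1)]
    return [y1 - y0 for y0, y1 in zip(ys, ys[1:])]
```

For the string 101001, i.e. the tuple (6, 5, 2, 3), this gives $t = 0$, $L(t) = 5$ and $L(s) = 3$, so that the domain is $\frac{1}{3} < a < \frac{1}{2}$ with $2 - 3a \le b < \min(1, 3 - 5a)$.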
The first proof of this theorem, published in [5], was performed in $(x, y)$-space, and quite lengthy; McIlroy [22] gave a much shorter proof in $(b, a)$-space, using 'Farey-fans'; this inspired an improved proof in [7]; yet another proof is given in [3]. Anderson and Kim [1] give a chord-property-based algorithm that computes the 'pre-images' of discretized straight lines; these pre-images are, of course, the domains. Their algorithm uses a convex hull determination as one of its subroutines; this is an $O(n)$ algorithm. Koplowitz and Sundar Raj [18] use the convex hull construction in a non-linear filtering algorithm on chaincode strings, for accurate reconstruction of arbitrary curves. The domains for all strings of 6 or fewer elements are depicted in Fig.3; compare the figure for $n = 6$ with Fig.2d.³
² Lindenbaum and Koplowitz [21] later derived a set of 4 characterizing parameters on the basis of nontrivial linear dichotomies of the discrete plane; they proved the equivalence to $(n, q, p, s)$.
³ There are interesting relations between these domains and the approximation of reals by rationals (see [6, 22]).
Figure 2: Regions in $(b, a)$-space of all strings with $n = 6$, for the four characterizations: a) $(n)$-characterization, b) $(n_e, n_o)$-characterization, c) $(n_e, n_o, n_c)$-characterization, d) $(n, q, p, s)$-characterization. (In each panel the axes $b$ and $a$ run from 0 to 1, and each region is labelled with its tuple.)
Figure 3: Domains in $(b, a)$-space of all strings of codes 0 and/or 1 with $n \le 6$. (One panel for each of $n = 1, \ldots, 6$, with axes $b$ and $a$ running from 0 to 1; each domain is labelled with its string.)
5 Properties
Denote the property of $\ell$ to be estimated by $f(\ell)$, and the estimator by $g(t)$. All lines $\ell$ in $C^{-1}T^{-1}t$ have the same value for $t$, and hence the property $f(\ell)$ for all these lines is estimated by the same value $g(t)$. The art of designing good estimators, given $C$ and $T$, is to minimize a measure of error between $f(\ell)$ and $g(t)$, for all $\ell \in C^{-1}T^{-1}t$. Two types of estimator are considered here: Best Linear Unbiased Estimators (BLUEstimators) and Most Probable Original (MPO) Estimators.
The BLUE estimator is the linear estimator⁴ that minimizes the mean square error (MSE) between $f(\ell)$ and $g(TC\ell)$ over all $\ell$ considered. This is equivalent to (proof in [8])
$$g_{BLUE}(t) = \int_{C^{-1}T^{-1}t} f(\ell)\, p(\ell)\, d\ell, \tag{9}$$
where $p(\ell)$ is the probability density of lines. In words, the BLUEstimator for $f$ given $t$ is the average of $f$ over the region corresponding to $t$.
The MPO estimator for a property $f$ is the most probable original value for the property, given $t$. In formula
$$g_{MPO}(t) = f(\operatorname{argmax}\{\, p(\ell) \mid \ell \in C^{-1}T^{-1}t \,\}), \tag{10}$$
where $\operatorname{argmax}\{p(\ell)\}$ indicates the value of $\ell$ maximizing $p(\ell)$ in the given range.
In [7] different types of estimators are treated, obeying different optimality criteria. In [8], BLUE and MPO estimators are developed for arbitrary properties of straight line segments. Here we focus on length estimators as an important example.
6 Length Estimators
The property 'length' of the continuous straight line segments considered (those spanning $n$ columns of the grid) is given by
$$f(b, a, n) = n\sqrt{1 + a^2}. \tag{11}$$
Note that this expression is independent of $b$. The probability density of random lines in $\mathbb{R}^2$ is uniform in polar parameters⁵, which is transformed to the parameters $a$ and $b$ as (see [7])
$$p(b, a, n) = \sqrt{2}\,(1 + a^2)^{-3/2}. \tag{12}$$
In this Section, we develop estimators for this property. As a measure for the accuracy of a length estimator $g$ we use $\mathrm{RDEV}(g, n)$, defined as the square root of the mean square error of $g$, averaged over all straight strings of $n$ elements 0 and/or 1, and divided by $n$:
$$\mathrm{RDEV}(g(t), n) = \frac{1}{n} \sqrt{\sum_t \int_{C^{-1}T^{-1}t} (g(t) - f(\ell))^2\, p(\ell)\, d\ell}. \tag{13}$$
The normalization by $n$ permits interpretation of $\mathrm{RDEV}(g, n)$ as the relative error in the length measurement of all line segments with a projected length of unity, when the sampling density is $n^2$ per square unit. Many length estimators can be developed (see [7]); here we consider the MPO- and BLUE-estimators, ordered by characterization. Then we introduce and discuss a different type of estimator, important for fast measurement: the simple length estimators.
⁴ A linear estimator is a linear combination of values of the function $f(\ell)$ over some range of $\ell$.
⁵ In [2], the distribution in polar parameters is derived from first principles, for lines in $\mathbb{R}^2$ and $\mathbb{R}^3$.
6.1 $(n)$-characterization
For this characterization, the MPO length estimator is given by
$$g_{MPO}(n) = n. \tag{14}$$
The BLUEstimator is
$$g_{BLUE}(n) = \frac{\pi}{4}\sqrt{2}\, n = 1.111\, n \tag{15}$$
(see [9]). For both estimators, the error RDEV tends to a constant as $n \to \infty$. This implies that the estimation does not get more accurate than some fixed accuracy, even when the sampling density becomes infinite. For $g_{MPO}(n)$, the asymptotic error is 16%, for $g_{BLUE}(n)$ it is 11%.
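The two percentages can be checked numerically from eqs.(11)-(13). In the $(n)$-characterization there is a single region, the whole unit square in $(b, a)$-space, so for an estimator of the form $g = k\,n$ the RDEV reduces to a one-dimensional integral over $a$ that does not depend on $n$. The Python sketch below uses a plain midpoint rule; the function name and the step count are our own choices.

```python
import math

def rdev_n_characterization(coeff, steps=20000):
    """RDEV of g(n) = coeff * n under eqs.(11)-(13), with the whole unit square
    as the only region: sqrt( int_0^1 (coeff - sqrt(1+a^2))^2 * p(a) da )."""
    h = 1.0 / steps
    mse = 0.0
    for k in range(steps):
        a = (k + 0.5) * h                                  # midpoint rule
        density = math.sqrt(2.0) * (1.0 + a * a) ** -1.5   # eq.(12)
        err = coeff - math.sqrt(1.0 + a * a)               # (g - f) / n
        mse += err * err * density * h
    return math.sqrt(mse)

rdev_mpo = rdev_n_characterization(1.0)                              # eq.(14)
rdev_blue = rdev_n_characterization(math.pi / 4 * math.sqrt(2.0))    # eq.(15)
```

Under these assumptions the integration gives about 0.158 for $g_{MPO}(n)$ and about 0.113 for $g_{BLUE}(n)$, in line with the 16% and 11% quoted above.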
6.2 $(n_e, n_o)$-characterization
For the $(n_e, n_o)$-characterization, the MPO length estimator is given by
$$g_{MPO}(n_e, n_o) = \begin{cases} \frac{1}{2}\sqrt{3} & \text{if } (n_e, n_o) = (0, 1) \\ \sqrt{(n_e + n_o)^2 + n_o^2} & \text{elsewhere} \end{cases} \tag{16}$$
Notice that this is in fact the Euclidean distance between the endpoints of the discrete line segment (except for the case $(n_e, n_o) = (0, 1)$). This seems a very reasonable estimator of the length of the discrete line segment. Surprisingly, and as we will see in Section 6.4, it is not the most accurate length estimator! It can be shown that the asymptotic behavior of the accuracy as measured by RDEV is $O(n^{-1})$ (see [7]).
The BLUEstimator for the $(n_e, n_o)$-characterization is more complicated. For convenience, let us put $n = n_e + n_o$ and $m = n_o$. Then
$$g_{BLUE}(n_e, n_o) = \sqrt{2}\, n \left\{ \tfrac{m+1}{n} \operatorname{atan}(\tfrac{m+1}{n}) - 2\, \tfrac{m}{n} \operatorname{atan}(\tfrac{m}{n}) + \tfrac{m-1}{n} \operatorname{atan}(\tfrac{m-1}{n}) - \tfrac{1}{2} \log(1 + (\tfrac{m+1}{n})^2) + \log(1 + (\tfrac{m}{n})^2) - \tfrac{1}{2} \log(1 + (\tfrac{m-1}{n})^2) \right\} \tag{17}$$
The asymptotic RDEV of this estimator is also $O(n^{-1})$. A better order than this is impossible for any estimator based on the $(n_e, n_o)$-characterization (see [7]).
6.3 $(n_e, n_o, n_c)$-characterization
The MPO length estimator for the $(n_e, n_o, n_c)$-characterization has not been computed, due to the complex form of the regions. The asymptotic order can be computed, though, and is found to be $O(n^{-1})$ (see [7]). The BLUEstimator was first computed in [26]. The resulting formulas are rather involved and are not repeated here. The asymptotic accuracy is $O(n^{-1})$.
6.4 $(n, q, p, s)$-characterization
The MPO-estimator for the $(n, q, p, s)$-characterization is simply (see [7])
$$g_{MPO}(n, q, p, s) = n\sqrt{1 + (\tfrac{p}{q})^2}. \tag{18}$$
This is independent of $s$ (because the original $f$ in eq.(11) is independent of $b$).
Though similar in form to $g_{MPO}(n_e, n_o)$, this estimator is essentially more accurate; the asymptotic error is $O(n^{-3/2})$. The reason is that $\frac{p}{q}$ is a better estimate of the slope $a$ of the real line than $\frac{n_o}{n_e + n_o}$. The latter is an approximation by a fraction with fixed denominator $n = n_e + n_o$, of which there are only a total number of $n$. The former approximates $a$ by an irreducible fraction with denominator between 1 and $n$, of which there are $O(n^2)$ (see [13]).⁶
The BLUEstimator for the $(n, q, p, s)$-characterization was given in [8]. As with the BLUEstimator for the $(n_e, n_o, n_c)$-characterization, the expressions are involved (in fact very similar!) and are not repeated here. The BLUEstimator for the $(n, q, p, s)$-characterization is the most accurate estimator for the length of a discrete line that can be constructed (in terms of minimizing the mean square error). The asymptotic order is $O(n^{-3/2})$, and by curve fitting we have established the coefficient of proportionality to be 0.34.

estimator                                                     RDEV(g(t), ∞)
$L_0(n)$             $= 1.000\, n$                                 16 %
$L_1(n)$             $= 1.111\, n$                                 11 %
$L_G(n_e, n_o)$      $= 1.059\, n_e + 1.183\, n_o$                 8.0 %
$L_F(n_e, n_o)$      $= 1.000\, n_e + 1.414\, n_o$                 6.6 %
$L_K(n_e, n_o)$      $= 0.945\, n_e + 1.346\, n_o$                 2.6 %
$L_N(n_e, n_o, n_c)$ $= 1.000\, n_e + 1.414\, n_o - 0.089\, n_c$   1.1 %
$L_C(n_e, n_o, n_c)$ $= 0.980\, n_e + 1.406\, n_o - 0.091\, n_c$   0.8 %

Figure 4: The simple estimators.
6.5 Simple Length Estimators
We will call estimators that are linear in the parameters of the characterization simple estimators, indicated by $L(t)$. The optimal coefficients in the linear function are to be determined by an optimization criterion, for which we use minimization of the mean square error over all $\ell \in C^{-1}T^{-1}(t)$. The simple estimators considered are given in Fig.4. $L_F$ was first given by Freeman [11], $L_G$ by Groen [12], $L_K$ by Kulpa [19] and $L_C$ by Vossepoel and Smeulders [26]. More on the historical development can be found in [9]. Recently, Koplowitz and Bruckstein [17] have given a design method for simple estimators based on arbitrary local string properties.
Note that $L_0(n) = g_{MPO}(n)$ and $L_1(n) = g_{BLUE}(n)$. Of the estimators depicted in the table, $L_0(n)$, $L_F(n_e, n_o)$ and $L_N(n_e, n_o, n_c)$ are straightforward estimators, counting 'grid moves' (code element 0) as length 1, 'diagonal moves' (code element 1) as length $\sqrt{2}$, and 'knight's moves' (neighboring codes 0 and 1) as $\sqrt{5}$. They are biased, and therefore have a large RDEV. The estimators $L_1(n)$, $L_K(n_e, n_o)$ and $L_C(n_e, n_o, n_c)$ are the optimal simple estimators, i.e. simple estimators for which the coefficients are chosen to minimize the MSE. $L_G(n_e, n_o)$ minimizes the MSE for strings with $n = 1$, or for longer strings with uncorrelated elements.
Fig.5 plots RDEV for these estimators as a function of $n$. Note that these simple estimators have a limited accuracy; even with a sampling density that goes to infinity, they do not become more accurate, but the error converges to an asymptotically constant value. Therefore measuring the length of a contour with an accuracy of 1% by a simple estimator is only possible with $L_C$, and then only if one samples with a sampling density of about 40 points along the length to be measured.
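The simple estimators of Fig.4 are straightforward to evaluate once the counts $n_e$, $n_o$ and $n_c$ are known. The following Python sketch does so for a string of codes 0 and 1; the dictionary layout and function names are hypothetical, and the coefficients are copied from Fig.4.

```python
# Coefficients (for n_e, n_o, n_c) of the simple estimators listed in Fig.4.
SIMPLE_ESTIMATORS = {
    "L0": (1.000, 1.000, 0.000),   # 1.000 n, with n = n_e + n_o
    "L1": (1.111, 1.111, 0.000),   # 1.111 n
    "LG": (1.059, 1.183, 0.000),
    "LF": (1.000, 1.414, 0.000),
    "LK": (0.945, 1.346, 0.000),
    "LN": (1.000, 1.414, -0.089),
    "LC": (0.980, 1.406, -0.091),
}

def string_counts(c):
    """Return (n_e, n_o, n_c) for a list c of chaincodes 0 (even) and 1 (odd)."""
    n_e = sum(1 for x in c if x == 0)
    n_o = sum(1 for x in c if x == 1)
    n_c = sum(1 for x, y in zip(c, c[1:]) if x != y)   # corner count
    return n_e, n_o, n_c

def simple_length(name, c):
    """Evaluate the simple estimator `name` on the chaincode string c."""
    k_e, k_o, k_c = SIMPLE_ESTIMATORS[name]
    n_e, n_o, n_c = string_counts(c)
    return k_e * n_e + k_o * n_o + k_c * n_c
```

For the string 001001 the counts are $(n_e, n_o, n_c) = (4, 2, 3)$, giving $L_K = 6.472$ and $L_C = 6.459$. On a string this short the individual error can be well above the average RDEV values of Fig.4.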
Note that Fig.5 gives average values for the error (see eq.(13)); some directions have a worse error, specifically those along the grid axes.⁷
⁶ The distribution of these fractions is not uniform, and that is the reason that one does not reach an accuracy better than $O(n^{-3/2})$. The main contributors to RDEV are the simple directions $\frac{0}{1}$, $\frac{1}{1}$, etc.
⁷ It is interesting to note that the direction $a = \operatorname{atan}(\sqrt{2} - 1)$ is the most accurate; it pays to tilt the camera to that angle
Figure 5: RDEV as a function of $n$ for all estimators considered. Recommended estimators are indicated by drawn lines. (Logarithmic axes: $n$ from 1 to 100, RDEV from 0.0001 to 1. Curves: $L_0(n) = g_{MPO}(n)$, $L_1(n) = g_{BLUE}(n)$, $L_G(n_e, n_o)$, $L_F(n_e, n_o)$, $L_K(n_e, n_o)$, $L_C(n_e, n_o, n_c)$, $g_{MPO}(n_e, n_o)$, $g_{BLUE}(n_e, n_o)$, $g_{BLUE}(n_e, n_o, n_c)$, $g_{MPO}(n, q, p, s)$, $g_{BLUE}(n, q, p, s)$.)
6.6 Recommendations for Length Estimators
Fig.5 shows the accuracy of all length estimators treated, as a function of $n$. We discuss their relative importance for image analysis. The best estimator is $g_{BLUE}(n, q, p, s)$, as expected. Its asymptotic accuracy is $0.34\, n^{-3/2}$. The fact that this is the best estimator can be interpreted in terms of the required sampling density to reach a certain accuracy. This leads to the following:
Sampling Density vs. Accuracy Trade-off Theorem
For length measurement of discrete straight lines, the sampling density $d$ (per unit length) is related to the asymptotically best achievable estimation error percentage $p$ by
$$d \ge 10\, p^{-2/3}. \tag{19}$$
Equality is reached with the length estimator $g_{BLUE}(n, q, p, s)$ only. Thus for an error of 1% one needs $d > 10$, when using $g_{BLUE}(n, q, p, s)$; and for an error of 0.1% one needs $d > 50$.
On the basis of Fig.5 and considerations of computational efficiency, we can give some recommendations as to which estimator to use:
• $L_1(n)$ (see Fig.4) can be used in extremely time-critical situations, or in simple image analysis (count the contour pixels and multiply by 1.111). Accuracy up to 11%. Use instead of $L_0(n)$.
• $L_K(n_e, n_o)$ (see Fig.4) can be used in time-critical situations or when no high accuracy is required. Accuracy up to 2.6%. To be used instead of $L_G(n_e, n_o)$ and $L_F(n_e, n_o)$ (the latter is the familiar '$n_e + \sqrt{2}\, n_o$').
• $L_C(n_e, n_o, n_c)$ (see Fig.4) is the 'corner count' estimator. For normal use, since it is simple to compute and has a reasonable accuracy, up to 0.8%. This is comparable to the optimal result for short strings, with $n < 10$.
• $g_{MPO}(n_e, n_o)$ (see eq.(16)) is the Euclidean distance between the discrete end points of the discrete line segment. It is easy to compute. The accuracy is $0.17\, n^{-1}$.
• $g_{MPO}(n, q, p, s)$ (see eq.(18)) is for use when high accuracy is desired; it is only slightly less accurate than the much more involved optimal estimator $g_{BLUE}(n, q, p, s)$. Asymptotic accuracy $0.46\, n^{-3/2}$.
It should be emphasized that these results apply only to strings that are the digitization of straight lines (so-called 'straight strings'); applying them to arbitrary curves should be done with care (see [9]).
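The constant in the trade-off relation can be traced back to the fitted asymptotic accuracy $0.34\, n^{-3/2}$ of $g_{BLUE}(n, q, p, s)$. The short derivation below is our own reading, assuming that the sampling density $d$ per unit length may be identified with $n$ under the normalization of eq.(13), and that $p$ is the relative error expressed in percent.

```latex
\begin{align*}
\frac{p}{100} &\approx 0.34\, d^{-3/2} \\
d^{3/2} &\approx \frac{34}{p} \\
d &\approx \left( \frac{34}{p} \right)^{2/3} \approx 10.5\, p^{-2/3}
\end{align*}
```

With $p = 1$ this gives $d \approx 10.5$, and with $p = 0.1$ it gives $d \approx 49$, consistent with the rounded values 10 and 50 quoted for the theorem.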
7 Extensions of the Framework
The faithful characterization of straight strings, first published in [5], has been used by many to derive additional results on digitized straight lines. Also, our method of finding primitive 'domains' on which estimators are computed has been applied to different digitized curves. We discuss these new developments briefly.
⁷ (continued) relative to the contour to be measured! (see [6])
7.1 The Number of Digitized Straight Line Segments
The number of straight strings consisting of $n$ chaincodes 0 and 1 can be computed easily from the Domain Theorem of eq.(8) and some number theory. The result is
$$N = \frac{n^3}{\pi^2} + O(n^2 \log n). \tag{20}$$
This result is given in [7]; a different derivation and considerations on refined estimates can be found in Berenstein et al. [4]; Lindenbaum and Koplowitz [20] use their own derivation of the same result, based on a characterization in [21] which is equivalent to the faithful $(n, q, p, s)$-characterization.
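For small $n$ the count can be obtained by brute force, which gives a feeling for the size of the lower-order term in eq.(20). The Python sketch below digitizes a dense grid of lines with exact integer arithmetic and collects the distinct OBQ strings. This is an illustration of ours, not the derivation of [7]: the function name and grid size are arbitrary, and the approach relies on the grid being fine enough to hit every domain, which is reasonable only for small $n$.

```python
import math

def count_straight_strings(n, grid=240):
    """Count distinct OBQ chaincode strings of n codes 0/1 by sampling lines
    y = a x + b with a = (2i+1)/(2*grid) and b = (2j+1)/(2*grid)."""
    seen = set()
    for i in range(grid):
        for j in range(grid):
            # floor(a*k + b), evaluated exactly in integers
            ys = [((2 * i + 1) * k + (2 * j + 1)) // (2 * grid)
                  for k in range(n + 1)]
            seen.add(tuple(y1 - y0 for y0, y1 in zip(ys, ys[1:])))
    return len(seen)

def leading_term(n):
    """Leading term n^3 / pi^2 of eq.(20)."""
    return n ** 3 / math.pi ** 2
```

For $n = 4$ this finds the 14 strings shown in Fig.3, and for $n = 6$ the 36 strings whose tuples label Fig.2d, whereas the leading term $n^3/\pi^2$ is only about 22 at $n = 6$; for such short strings the $O(n^2 \log n)$ term is still significant.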
7.2 Data Compression and Conversion
We have used the domains to describe the digitization process, so as to find the pre-images of digitized straight lines. In an interesting inversion, McIlroy [22] uses the domains to rapidly determine the digitization of the line $y = ax + b$. Lindenbaum and Koplowitz [20] suggest coding arbitrary strings by straight substrings, whose representation is stored in a table. It follows from eq.(20) that the size of a table containing all strings of up to $n$ elements is $n^4/(4\pi^2) + O(n \log n)$.
7.3 Properties of Arbitrary Curves
To determine properties of arbitrary curves optimally, one should perform a procedure similar to that for straight lines: determine the pre-image (domain), and use as estimator some statistical measure of the property on that domain. It was argued in [7] that this could not be done in general without making some assumptions about the original curve. The recognition problem for the digitization of linear combinations of linearly independent functions is treated in Werman et al. [27], with special attention to polynomials. Smeulders and Worring [25] compute the pre-images of curves that are bounded in curvature; this obviously also solves the inversion of digitization for circular arcs. They plan to extend their work with the computation of the optimal estimators, which could be stored in lookup tables for practical use.⁸
Havelock [14] suggests that we may not want to determine the generalized domains (he calls them 'locales') exactly, but that a statistical knowledge of their properties is sufficient for metrology. These principles can be used to achieve subpixel accuracy in the geometric registration of arbitrary figures, see Havelock's thesis [15]. For digitized straight lines, those calculations have been done exactly in [6] and in more detail in Berenstein et al. [3]; the latter also contains an alternative proof of the Domain Theorem of [5] (eq.(8)).
A first analytical treatment of lines and planes in three dimensions is given by Forchhammer [10], along the lines of our method, with an interesting application to the processing of half-tone pictures.
There is an alternative to developing more accurate estimators for arbitrary strings through the domains, and that is to augment the simple estimators to take into account more local properties of the chaincode string. This approach is taken by [17] and leads to quite accurate and easily computable estimators.
Footnote 8: We have run some experiments to assess the performance of the length estimators designed for straight strings on non-straight strings. The results are reported in [7] and [9]. The surprising finding is that the simple length estimator $L_K(n_e, n_o)$, which was designed for straight lines, is a consistent estimator for the length of large circular arcs of $\pi/8$ radians. As a consequence, this estimator performs very well on the perimeter estimation of circles and circle-like objects.
8 Conclusions
The main result of our research is the inversion of the digitization process for straight lines. This result, given in eq.(8) and depicted in Fig.3, is central to the study of digitization of straight lines, and has led to an avalanche of further results. We have used it to derive optimal estimators for arbitrary properties [7, 8], and to compare length estimators ([9], Section 6). Others have used it for further theoretical results [21, 4], for data compression [20, 22], and to achieve subpixel accuracy [3, 14, 15]; currently, the methods are being applied to other digitized shapes [10, 25, 27].
The motivation of our research in digitized straight line segments was the need for accurate estimates of their properties, notably their length. We succeeded in computing those; they are given in Sections 5 and 6 (for properties other than length, results can be found in [7, 8]). The different asymptotic behaviors observed for the length estimators are explained by the characterization of the string that is used. For the $(n)$-characterization the asymptotic error is $O(1)$; for the $(n_e, n_o)$- and $(n_e, n_o, n_c)$-characterizations it is $O(n^{-1})$. For the $(n, q, p, s)$-characterization the optimal accuracy of $O(n^{-3/2})$ is reached. This can be interpreted as a sampling theorem for length measurement: $d \approx 10\,p^{-3/2}$, where $d$ is the sampling density and $p$ the percentage error. The usefulness of the $(n_e, n_o, n_c)$-characterization lies in the very acceptable accuracy of the simple estimator $L_C(n_e, n_o, n_c) = 0.980\,n_e + 1.406\,n_o - 0.091\,n_c$; this is the recommended length estimator for everyday use.
We believe that with the computation of the domains (pre-images) and their properties, noise-free digitized straight lines are now very well understood. The methods used give a direction for the treatment of arbitrary curves, as shown by recent work [10, 14, 15, 25]. For shapes with noise, Havelock's theory on `locales' (generalized domains) and their statistical properties looks promising.
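As an illustration of how little computation the recommended estimator requires, the following minimal sketch evaluates $L_C = 0.980\,n_e + 1.406\,n_o - 0.091\,n_c$ on an 8-connected chaincode string. It rests on assumptions that should be checked against the definitions given earlier: the string is a sequence of Freeman codes 0 to 7, $n_e$ and $n_o$ count the even and odd codes, and $n_c$ counts the pairs of consecutive unequal codes (the corner count). The function names are ours.

```python
def chain_counts(codes):
    """Return (n_e, n_o, n_c) for a sequence of Freeman codes 0..7."""
    codes = list(codes)
    n_e = sum(1 for c in codes if c % 2 == 0)  # even (axis-parallel) codes
    n_o = len(codes) - n_e                     # odd (diagonal) codes
    # Assumption: a corner is any pair of consecutive unequal codes.
    n_c = sum(1 for a, b in zip(codes, codes[1:]) if a != b)
    return n_e, n_o, n_c


def length_corner_count(codes):
    """Corner-count length estimate 0.980 n_e + 1.406 n_o - 0.091 n_c."""
    n_e, n_o, n_c = chain_counts(codes)
    return 0.980 * n_e + 1.406 * n_o - 0.091 * n_c


if __name__ == "__main__":
    # A string of 5 even and 3 odd codes, spanning 8 columns and 3 rows.
    string = [0, 1, 0, 0, 1, 0, 0, 1]
    print(chain_counts(string), length_corner_count(string))
```

The example string covers a displacement of 8 columns and 3 rows, so the estimate can be compared with the Euclidean distance $\sqrt{8^2 + 3^2}$ between the end points.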
References
[1] T.A. Anderson and C.E. Kim, Representation of Digital Line Segments and their Preimages, Computer Vision and Image Processing, Vol.30, pp.279-288, 1985.
[2] A.L.D. Beckers and A.W.M. Smeulders, The Probability of a Random Straight Line in Two and Three Dimensions, Pattern Recognition Letters, Vol.11, pp.233-240, 1990.
[3] C.A. Berenstein, L.N. Kanal and D. Lavine, A Geometric Approach to Subpixel Registration Accuracy, Computer Vision, Graphics and Image Processing, Vol.40, No.3, pp.334-360, 1987.
[4] C.A. Berenstein and D. Lavine, On the Number of Digital Straight Line Segments, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-10, No.6, pp.880-887, 1988.
[5] L. Dorst and A.W.M. Smeulders, Discrete Representation of Straight Lines, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-6, pp.460-462, 1984.
[6] L. Dorst and R.P.W. Duin, Spirograph Theory, a Framework for Calculations on Digitized Straight Lines, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-6, No.5, pp.632-639, 1984.
[7] L. Dorst, Discrete Straight Line Segments: Parameters, Primitives and Properties, Ph.D. Thesis, Technological University Delft, 1986.
[8] L. Dorst and A.W.M. Smeulders, Best Linear Unbiased Estimators for Properties of Digitized Straight Lines, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No.2, pp.276-282, 1986.
[9] L. Dorst and A.W.M. Smeulders, Length Estimators for Digitized Contours, Computer Vision, Graphics and Image Processing, Vol.40, pp.311-333, 1987.
[10] S. Forchhammer, Digital Plane and Grid Point Segments, Computer Vision, Graphics and Image Processing, Vol.47, pp.373-384, 1989.
[11] H. Freeman, Boundary Encoding and Processing, in: Picture Processing and Psychopictorics (B.S. Lipkin and A. Rosenfeld, eds.), Academic Press, New York, pp.241-266, 1970.
[12] F.C.A. Groen and P.W. Verbeek, Freeman Code Probabilities of Object Boundary Quantized Contours, Computer Graphics and Image Processing, Vol.7, pp.391-402, 1978.
[13] G.H. Hardy and E.M. Wright, An Introduction to the Theory of Numbers, Oxford, 5th edition, 1979.
[14] D.I. Havelock, Geometric Precision in Digital Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-11, No.10, pp.1065-1075.
[15] D.I. Havelock, High Precision Position Estimation in Digital Image Metrology, Ph.D. Thesis, Carleton Univ. Dept. Syst. Eng. Comput. Sci., Ottawa, Canada, 1989.
[16] J. Koplowitz, On the Performance of Chain Codes for Quantization of Line Drawings, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, No.2, pp.180-185, 1981.
[17] J. Koplowitz and A.M. Bruckstein, Design of Perimeter Estimators for Digitized Planar Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-11, No.6, pp.611-622, 1989.
[18] J. Koplowitz and A.P. Sundar Raj, A Robust Filtering Algorithm for Subpixel Reconstruction of Chain Coded Line Drawings, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No.3, pp.451-457, 1987.
[19] Z. Kulpa, Area and Perimeter Measurement of Blobs in Discrete Binary Pictures, Computer Vision and Image Processing, Vol.6, pp.434-454, 1977.
[20] M. Lindenbaum and J. Koplowitz, Compression of Chain Codes Using Digital Straight Line Sequences, Pattern Recognition Letters, Vol.9, pp.167-171, 1988.
[21] M. Lindenbaum and J. Koplowitz, A New Parametrization of Digital Straight Lines, EE report, Technion Israel Institute of Technology, Haifa, Israel, 1989.
[22] M.D. McIlroy, A Note on Discrete Representation of Lines, AT&T Technical Journal, Vol.64, No.2, 1985.
[23] P.A.P. Moran, Measuring the Length of a Curve, Biometrika, Vol.53, No.3 and 4, pp.359-364, 1966.
[24] A.W.M. Smeulders and L. Dorst, Decomposition of Discrete Curves into Piecewise Straight Segments in Linear Time, elsewhere in this volume.
[25] A.W.M. Smeulders and M. Worring, Accurate Measurement of Shape at Low Resolution, in: Pattern Recognition and Artificial Intelligence, E.S. Gelsema and L.N. Kanal, eds., Elsevier Science Publishers, 1988, pp.91-102. In preparation is: M. Worring and A.W.M. Smeulders, Optimal Curvature Estimation in Noise-Free Digital Images.
[26] A.M. Vossepoel and A.W.M. Smeulders, Vector Code Probability and Metrication Error in the Representation of Straight Lines of Finite Length, Computer Graphics and Image Processing, Vol.20, pp.347-364, 1982.
[27] M. Werman, A.Y. Wu and R.A. Melter, Recognition and Characterization of Digitized Curves, Pattern Recognition Letters, Vol.5, pp.207-213, 1987.
[28] L.-D. Wu, On the Chain Code of a Line, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-4, No.3, pp.347-353, 1982.