
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 7, JULY 2007

A Robust GM-Estimator for the Automated Detection of External Defects on Barked Hardwood Logs and Stems

Liya Thomas, Student Member, IEEE, and Lamine Mili, Senior Member, IEEE

Abstract—The ability to detect defects on hardwood trees and logs holds great promise for the hardwood forest products industry. At every stage of wood processing, there is a potential for improving value and recovery with knowledge of the location, size, shape, and type of log defects. This paper deals with a new method that processes hardwood laser-scanned surface data for defect detection. The detection method is based on robust circle fitting applied to scanned cross-section data sets recorded along the log length. It can be observed that these data sets have missing data and include large outliers induced by loose bark that dangles from the log trunk. Because of these data artifacts, and because of the nonlinearity of the circle model, which involves both additive and nonadditive errors, we introduce a new robust generalized M-estimator (GM-estimator) for which the residuals are standardized via scale estimates calculated by means of projection statistics and incorporated in the Huber objective function, yielding a bounded-influence method. Our projection statistics are based on the 2-D radial vectors instead of the row vectors of the Jacobian matrix advocated in the literature dealing with linear regression. These radial distances allow us to develop algorithms aimed at pinpointing large surface rises and depressions from the contour image levels, thereby locating severe external defects having a height of at least 0.5 in and a diameter of at least 5 in.

Index Terms—Generalized M-estimation, object detection, robust circle fitting, robust estimation.

Manuscript received January 28, 2006; revised November 8, 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Simon J. Godsill. This work was supported in part by the U.S. Department of Agriculture under Grants 01-CA-11242343-065 and 02-CA-11242343-083. L. Thomas is with the Department of Computer Science, Virginia Tech, Blacksburg, VA 24061 USA (e-mail: [email protected]). L. Mili is with the Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2007.894262

I. INTRODUCTION

One of the major activities of the wood industry is log sawing for producing lumber in various sizes and grades. Log quality, which has a direct impact on lumber grades, is inversely related to the presence of defects. High- and low-quality logs are distinguished by defect type, frequency, size, and location. Log defects refer to overgrown knots, sawn knots, and rotten or decayed regions in hard- or softwood logs and stems that are harvested from forests. They include both internal and external defects. Internal defects are usually composed of one or more types of damage, such as rotten wood, regions of decay and discoloration, knots, holes, and insect damage. External defect indicators consist of bumps, splits, holes, and circular distortions in the bark pattern.

Logs can be classified into softwood and hardwood. In general, most softwood trees have a fast growth rate and exhibit identical, clustered defects mostly caused by branch pruning. By contrast, hardwood trees generally grow more slowly and, thereby, are plagued by a wider variety of defects, but they yield more valuable products. Studies have demonstrated that the use of defect data improves cutting strategies that optimize log recovery or yield, i.e., that preserve the largest possible region of clear wood on a board face [1], [2]. This is a rather challenging task because the distribution, types, and sizes of hardwood defects are random and irregular.

Traditionally, before a hardwood log is sawn, an assessment of its quality is performed, typically via a mill operator's visual inspection. This includes the detection and classification of external defects because the latter reveal the presence of internal defects that stem from the pith of the log. On average, this inspection takes about 8 s; it is quite subjective and depends entirely on the operator's judgment and abilities. It is recognized that it is difficult to accurately and quickly detect and measure defects by manual methods [3]. This is the reason why the development of fast and reliable automated defect-detection methods has recently attracted a great deal of attention in the signal processing community.

Over the last few decades, various internal defect inspection methods have been proposed in the literature, including computed tomography/X-ray (CT/X-ray), X-ray tomosynthesis, magnetic resonance imaging (MRI), microwave scanning, ultrasound, and enhanced pattern recognition of regular X-ray images [4]–[7]. CT or MRI systems provide excellent-quality internal images of logs. For example, Li et al. [8] were able to accurately locate and describe internal defects of hardwood logs by applying computer vision algorithms, such as a feedforward artificial neural network classifier, to CT data. However, image acquisition with these techniques is slow and expensive. In addition, variable moisture content and log size can present problems for the CT scanning device [9]. Therefore, no commercial installation of these methods is known to exist at this time. Meanwhile, Tian and Murphy [3] developed a methodology for extracting features and defects from grayscale images of freshly harvested radiata pine, a type of softwood. This approach relies on the fact that the intensity of such defects is much stronger than that of the surrounding bark, yielding better contrast. For hardwood, this information does not exist for a large percentage of external defects, which are old and covered under bark.



Unlike the previously listed technologies, laser scanners are significantly less expensive and easier to operate. Methods based on these devices are being investigated by a few private companies. One of them is the Perceptron system [10], which is currently under development and is expected to be able to detect gross external defects, such as knots and bulges. Another system is discussed by Orbay and Brdicko [11]. While based on high-frequency lasers to generate a log surface scan, it is limited to discovering holes and overgrown knots on softwood logs. To overcome these weaknesses, we developed a new method that processes hardwood laser-scanned surface data. Our approach is fast while being able to detect severe external log defects that are at least 0.5 in in height and 5 in in diameter, with a detection probability of 97.5% and a false-alarm probability of 1.5%. It proceeds in three major steps. First, it determines an appropriate reference level by performing 2-D circle fits to scanned cross-section data sets recorded along the log length. Next, it obtains radial distances and determines from them contour image levels. Finally, it locates severe external defects pinpointed as large surface rises or depressions.

The log data we obtained comprise a large number of points (about 1000 2-D Cartesian coordinates per cross section) and contain a small percentage of outliers (less than 5%). Statistically, outliers are observations that deviate from the pattern formed by the majority of a data set; in our application, they are caused by loose bark or by the supporting structure of the scanning equipment. There are also missing data due to log sizes and scanner limitations and calibration. Many least squares (LS) 2-D curve-fitting methods have been proposed in the literature; see, for example, [12]–[15]. However, all these methods fail to provide a good fit to the log cross-section data because they assume that the data are complete and free of outliers. This is the reason why we resort to the theories and methods proposed in the field of robust statistics [16]–[19].

In signal processing, the M-estimators introduced by Huber in 1964 [16] and the least median of squares (LMS) estimator proposed by Rousseeuw and Leroy [18] have received a great deal of attention [20], [21]. In particular, the use of M-estimators has been advocated for a broad range of applications, such as spectrum estimation [22], multiuser detection in wireless communications [23], image filtering [24], and image modeling for log defect recognition [25] and classification [26]. It turns out that neither of these estimation methods meets the requirements of good resistance to outliers and low computational complexity for circle fitting. Indeed, M-estimators are not resistant to outliers in positions of leverage, while the LMS estimator must be solved via combinatorial optimization procedures [18]. Recall that a leverage point in linear regression is a data point whose projection on the design space is distant from the others, resulting in an unbounded influence of position for all the M-estimators. For example, it can be shown that the $L_1$-norm estimator passes right through a leverage point, hence its name [17]. In the late 1970s, a great deal of effort was devoted to devising new classes of estimators that are resistant to bad leverage points. This endeavor resulted in the development of the class of generalized M-estimators, or GM-estimators for short [27]. They include the Mallows-type


and the Schweppe-type GM-estimators, which bound the influence of position in different ways. Both approaches incorporate into the objective function a weight function that is inversely proportional to the relative distances of the points in the design space. Unlike the Mallows-type estimators, which downweight all the leverage points, the Schweppe-type methods downweight only bad leverage points, resulting in an enhanced statistical efficiency. In signal processing, the only applications of GM-estimators are in autoregressive moving average (ARMA) parameter estimation [19] and in electric power system state estimation [28]. Made primarily for linear regression, these proposals must be extended to nonlinear regression problems such as circle fitting. This need prompted us to develop a new Schweppe-type GM-estimator whose objective function makes use of a weight function calculated by means of projection statistics so as to bound its influence function. Our projection statistic algorithm utilizes the 2-D radial-vector coordinates instead of the row vectors of the Jacobian matrix, as proposed in [28] for power system state estimation. This nonlinear method proves effective in our application in that it successfully identifies severe outliers in the data.

We choose the Huber $\rho$-function rather than that of a redescending M-estimator, such as Tukey's biweight function [17], because the former is convex while the latter is not. Obviously, iterative algorithms based on convex functions are less prone to numerical problems. Furthermore, our choice fell on the circle rather than on the ellipse model for the following reasons. While each individual ellipse does generate radial distances that tend to reveal more surface details, the resulting surface contour map unfortunately contains more undesirable features, primarily due to the differences in axis orientation between neighboring ellipses. Obviously, this is a serious drawback that is not shared by circle fitting. The latter method has its own weakness as well, since it does cause a rolling effect in the contour map along the cross-section direction; however, this is a minor issue compared with the former one.

The paper is organized as follows. Section II is devoted to the new robust GM-estimator for circle fitting, along with the iteratively reweighted LS algorithm that implements it. Section III derives the influence function of the GM-estimator for circle fitting and shows that it is bounded. Section IV provides some simulation results carried out on real logs and describes a method that identifies the defects on the log surface based on the contour levels generated from the radial distance image. The algorithm that calculates the projection statistics is described in the Appendix.

II. ROBUST GENERALIZED M-ESTIMATOR FOR CIRCLE FITTING

To obtain a good circle fit to the recorded data for a given log cross section, we develop a new GM-estimator and propose an algorithm that implements it.

A. Robust Circle Fitting

The 3-D log surface data consist of a collection of 3-D range data points comprising circular-shaped cross sections from the scanner. Let $\{\mathbf{z}_i = [x_i\ y_i]^T,\ i = 1, \ldots, m\}$ denote the set of data points of a given cross section. Our intention is to fit a circle to these data points, which all lie on a


plane defined by a constant third coordinate $z$. On that plane, one can define a nonlinear regression model given by

$(x_i - e_{x_i} - x_c)^2 + (y_i - e_{y_i} - y_c)^2 - R^2 = \xi_i, \quad i = 1, \ldots, m \qquad (1)$

where $\boldsymbol{\theta} = [x_c\ y_c\ R]^T$ is the parameter vector containing the center coordinates $(x_c, y_c)$ and the radius $R$ of the circle, and $\mathbf{z}_i = [x_i\ y_i]^T$ is the 2-D measurement vector in the cross section under consideration. In (1), the measurement error vector is defined as $\mathbf{e}_i = [e_{x_i}\ e_{y_i}]^T$, while the model error is denoted by $\xi_i$ and accounts for the uncertainty in the assumed circle model. Note that this uncertainty exists even if the measurements are perfect. The model given by (1) can be written in a compact form as

$q(\mathbf{z}_i - \mathbf{e}_i, \boldsymbol{\theta}) = \xi_i, \quad \text{for } i = 1, \ldots, m. \qquad (2)$

The problem is hence to robustly estimate the parameter vector $\boldsymbol{\theta}$ in (2) from the 2-D measurement vectors $\mathbf{z}_i$. For this model, conventional M-estimators are not robust because their influence function is not bounded for the error vector $\mathbf{e}_i$, as is shown in Section III. A Schweppe-type GM-estimator is more appropriate here. This estimator minimizes an objective function of the form

$J(\boldsymbol{\theta}) = \sum_{i=1}^{m} w_i^2\, s^2\, \rho\!\left(\frac{r_i}{s\, w_i}\right). \qquad (3)$

Here, $\rho(\cdot)$ is the Huber function expressed as

$\rho(u) = \frac{u^2}{2} \ \text{ for } |u| \le b; \qquad \rho(u) = b\,|u| - \frac{b^2}{2} \ \text{ for } |u| > b \qquad (4)$

and the residual $r_i$ is defined as

$r_i = q(\mathbf{z}_i, \hat{\boldsymbol{\theta}}) \qquad (5)$

with

$q(\mathbf{z}_i, \boldsymbol{\theta}) = (x_i - x_c)^2 + (y_i - y_c)^2 - R^2. \qquad (6)$

Note that the only difference between the two functions in (2) and (6) is the presence of the measurement error vector $\mathbf{e}_i$ in the former. We pick the threshold $b$ in (4) so as to have a good statistical efficiency at the Gaussian distribution while not increasing too much the bias under contamination [16], [17]. Writing (5) in compact form for $i = 1, \ldots, m$, we get the $m$-dimensional residual vector $\mathbf{r} = \mathbf{q}(\mathbf{z}, \hat{\boldsymbol{\theta}})$, where $\mathbf{q}(\cdot)$ is an $m$-dimensional vector-valued function. In (3), $s$ is a robust estimator of scale of the residuals given by a normalized median of the absolute residuals, and $w_i = w(\mathbf{z}_i)$ is an appropriate weight function that makes the estimator robust against outliers in $\mathbf{z}_i$. The Huber $\rho$-function is chosen to bound the influence of the model errors $\xi_i$, that is, of the residuals $r_i$, while the weights $w_i$ are introduced to bound the influence of the measurement errors $\mathbf{e}_i$ in the model given by (1) and (2). The errors $\mathbf{e}_i$ are assumed to be distributed according to the $\epsilon$-contaminated model $(1 - \epsilon)\,\Phi + \epsilon\, H$, $0 \le \epsilon < 1$. It defines an $\epsilon$-contamination neighborhood of the multivariate Gaussian probability distribution $\Phi$, which includes asymmetric distributions. For small $\epsilon$, this model indicates that there is a large fraction $(1 - \epsilon)$ of the errors that follow $\Phi$, while the remaining fraction $\epsilon$ follow an unknown distribution $H$. Such a model will be used in Section III to derive the influence function of the GM-estimator. The estimator is a solution to

$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{0}. \qquad (7)$

Assuming that $s$ is constant in the neighborhood of the solution and defining the scalar function $\psi(u) = d\rho(u)/du$, we get

$\sum_{i=1}^{m} w_i\, \psi(r_{S_i})\, \mathbf{a}_i = \mathbf{0} \qquad (8)$

where $r_{S_i} = r_i/(s\, w_i)$ is the standardized residual. The vector $\mathbf{a}_i$ in (8) denotes the transpose of the $i$th row of the Jacobian matrix $\mathbf{A} = \partial \mathbf{q}/\partial \boldsymbol{\theta}$ given by

$\mathbf{A} = \begin{bmatrix} -2(x_1 - x_c) & -2(y_1 - y_c) & -2R \\ \vdots & \vdots & \vdots \\ -2(x_m - x_c) & -2(y_m - y_c) & -2R \end{bmatrix}. \qquad (9)$
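To make these definitions concrete, here is a minimal Python sketch of the building blocks just introduced: the algebraic residual (6), the Jacobian rows (9), and the Huber $\rho$- and $\psi$-functions (4). All names are ours, and the Huber threshold defaults to the conventional value 1.5, which the text leaves unspecified.

```python
import numpy as np

def circle_residuals(theta, x, y):
    """Algebraic circle residuals r_i = (x_i - xc)^2 + (y_i - yc)^2 - R^2, cf. (6)."""
    xc, yc, R = theta
    return (x - xc) ** 2 + (y - yc) ** 2 - R ** 2

def circle_jacobian(theta, x, y):
    """Rows a_i^T of the Jacobian (9) of the residual vector w.r.t. theta = (xc, yc, R)."""
    xc, yc, R = theta
    return np.column_stack([-2.0 * (x - xc), -2.0 * (y - yc),
                            -2.0 * R * np.ones_like(x)])

def huber_rho(u, b=1.5):
    """Huber rho-function (4); the threshold b = 1.5 is a conventional, assumed value."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= b, 0.5 * u ** 2, b * np.abs(u) - 0.5 * b ** 2)

def huber_psi(u, b=1.5):
    """psi = rho', the function appearing in the estimating equation (8)."""
    return np.clip(u, -b, b)
```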

The weight function $w(\mathbf{z}_i)$ is calculated based on the projection statistics defined in Section II-C. It is such that it equals one for a good measurement and decreases asymptotically to zero as the radial distance of $\mathbf{z}_i$ to the fitted circle increases beyond a given threshold. Consequently, the objective function given by (3) and (4) will not downweight a good measurement with a small standardized residual (i.e., $|r_{S_i}| \le b$), because in the quadratic region of the $\rho$-function, $w_i$ cancels out of the corresponding summation term in (3); but for an outlier, the $\rho$-function reduces to a term proportional to $w_i\,|r_i|$, downweighting it. Thus, our estimator is influence-bounded, a property that is shown more formally in Section III.

B. Iteratively Reweighted LS Algorithm

A solution to (8) is found through the iteratively reweighted least squares (IRLS) algorithm [16], [29]. To derive its expression, we first divide and multiply the $\psi$-function in (8) by the standardized residual $r_{S_i}$ to get

$\sum_{i=1}^{m} q(r_{S_i})\, r_i\, \mathbf{a}_i = \mathbf{0} \qquad (10)$

where $q(u) = \psi(u)/u$. Then, putting (10) in matrix form, we obtain

$\mathbf{A}^T \mathbf{Q}\, \mathbf{r} = \mathbf{0} \qquad (11)$

where $\mathbf{Q} = \mathrm{diag}\{q(r_{S_i})\}$ is a weight matrix. Performing a first-order Taylor series expansion of $\mathbf{r}(\boldsymbol{\theta})$ about the value $\boldsymbol{\theta}^{(k)}$ obtained at the $k$th iteration, we get

$\mathbf{r}(\boldsymbol{\theta}) \approx \mathbf{r}\big(\boldsymbol{\theta}^{(k)}\big) + \mathbf{A}\,\big(\boldsymbol{\theta} - \boldsymbol{\theta}^{(k)}\big). \qquad (12)$

THOMAS AND MILI: ROBUST GM-ESTIMATOR FOR THE AUTOMATED DETECTION OF EXTERNAL DEFECTS

Substituting (12) into (11) and putting $\boldsymbol{\theta} = \boldsymbol{\theta}^{(k+1)}$, we obtain

$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \big(\mathbf{A}^T \mathbf{Q}\, \mathbf{A}\big)^{-1} \mathbf{A}^T \mathbf{Q}\, \mathbf{r}\big(\boldsymbol{\theta}^{(k)}\big) \qquad (13)$

where $\mathbf{A}$ and $\mathbf{Q}$ are evaluated at $\boldsymbol{\theta}^{(k)}$ and the iterations are repeated until convergence.
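A minimal sketch of this iteration, reusing the helpers from the previous listing; the median-based scale (with the usual 1.4826 normalization) and the stopping tolerance are our own illustrative choices, not values taken from the paper.

```python
def irls_circle_fit(x, y, theta0, w, b=1.5, n_iter=50, tol=1e-8):
    """Schweppe-type GM circle fit via the IRLS update (13).

    w holds the per-point weights from the projection statistics
    (Section II-C); setting w = 1 everywhere yields the Huber M-estimator.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = circle_residuals(theta, x, y)
        s = 1.4826 * np.median(np.abs(r))        # robust scale (illustrative choice)
        r_s = r / (s * w)                        # standardized residuals
        safe = np.where(r_s == 0.0, 1.0, r_s)
        q = np.where(r_s == 0.0, 1.0, huber_psi(r_s, b) / safe)  # q(u) = psi(u)/u
        A = circle_jacobian(theta, x, y)
        AtQ = A.T * q                            # A^T Q, with Q = diag(q(r_S))
        step = np.linalg.solve(AtQ @ A, AtQ @ r)
        theta = theta - step                     # update (13)
        if np.linalg.norm(step) < tol:
            break
    return theta
```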

The initial conditions for the IRLS algorithm given by (13) cannot be obtained from the conventional LS method [12] because the latter provides a solution with a too large mean-squared error due to the action of severe outliers, especially those that stem from the supporting scanner structure under the log. One alternative would be to resort to the least median of squares estimator or any other high-breakdown estimator [18]. However, this class of estimators is typically implemented via computationally intensive algorithms that are inappropriate for this application. To circumvent this difficulty, we have developed a simple and very fast method based on the log data characteristics, which provides reasonably good initial conditions (see the sketch after this list). It consists of the following three steps.
1) Identify all the cross sections that have a sufficiently large number of data points, say larger than or equal to 80% of the average number of data points per cross section; the remaining cross sections are considered corrupted and are excluded from the computation in steps 2) and 3).
2) For each of these cross sections, pick as estimates of the x and y coordinates of its center the midpoints of the minimum and maximum values along the x and y axes, respectively; pick as an estimate of its radius the midpoint of the width and the height of the bounding rectangle.
3) Smooth out the center coordinates and radii by replacing each of them with the corresponding average taken over three consecutive cross sections, known as a box filter [30].
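The three initialization steps can be prototyped directly. The following sketch (our naming throughout) returns one bounding-box guess per cross section and applies the three-section box filter, treating corrupted sections as missing; halving the mean bounding-box diameter in step 2 is our reading of the radius rule.

```python
def initial_circle_guess(sections, min_frac=0.8):
    """Bounding-box initial estimates [xc, yc, R] per cross section (steps 1-3).

    sections: list of (x, y) coordinate arrays, one pair per cross section.
    """
    counts = np.array([len(x) for x, _ in sections])
    keep = counts >= min_frac * counts.mean()       # step 1: drop corrupted sections
    guesses = []
    for (x, y), ok in zip(sections, keep):
        if not ok:
            guesses.append([np.nan, np.nan, np.nan])
            continue
        xc = 0.5 * (x.min() + x.max())              # step 2: bounding-box midpoints
        yc = 0.5 * (y.min() + y.max())
        width, height = x.max() - x.min(), y.max() - y.min()
        R = 0.25 * (width + height)                 # mean bounding-box diameter, halved
        guesses.append([xc, yc, R])
    g = np.array(guesses)
    smoothed = g.copy()
    for i in range(len(g)):                          # step 3: 3-section box filter
        smoothed[i] = np.nanmean(g[max(0, i - 1): i + 2], axis=0)
    return smoothed
```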


C. Defining the Weight Function

Unlike the GM-estimators developed for linear regression, the weights $w_i$ in (3) are not calculated from the residuals given by (5), which are algebraic squared distances; they are rather determined from the radial distances between the data points and the fitted circle. Furthermore, they are evaluated in a robust manner by means of the projection statistics, which can be viewed as a robust version of the classical Mahalanobis distances of a collection of points in $p$ dimensions. The radial distances that we are referring to are defined as follows. Let $\mathbf{c} = [\hat{x}_c\ \hat{y}_c]^T$ denote the center of the circle and let $\mathbf{v}_i$ denote the radial vector between the point $\mathbf{z}_i$ and the circle with radius $\hat{R}$. The vector $\mathbf{v}_i$ is then given by $\mathbf{v}_i = \mathbf{z}_i - \mathbf{c} - \hat{R}\,(\mathbf{z}_i - \mathbf{c})/\|\mathbf{z}_i - \mathbf{c}\|$, where $\|\cdot\|$ stands for the norm of a vector. The vectors $\mathbf{v}_i$ identify a point cloud in a plane.

1) Classical Outlier Identification Methods Based on Mahalanobis Distances: The conventional method for identifying outliers among a collection of $m$ points in $p$ dimensions, $\{\mathbf{v}_i,\ i = 1, \ldots, m\}$, makes use of the Mahalanobis distances expressed as $\mathrm{MD}_i = \sqrt{(\mathbf{v}_i - \bar{\mathbf{v}})^T \mathbf{C}^{-1} (\mathbf{v}_i - \bar{\mathbf{v}})}$, where $\bar{\mathbf{v}}$ and $\mathbf{C}$ are the sample mean and the sample covariance matrix, respectively. A well-known result is that when the $\mathbf{v}_i$ are drawn from a multivariate normal distribution, the squared MD follows approximately a chi-squared distribution with $p$ degrees of freedom, $\chi^2_p$ [31]. Therefore, there is a probability of approximately 97.5% that a point $\mathbf{v}_i$ will fall inside the tolerance ellipsoid given by $\mathrm{MD}_i^2 \le \chi^2_{p,0.975}$. A sensible approach would then be to flag as deviant points, termed outliers, all the data points that fall outside that ellipsoid. While this method seems reasonable at first glance, it is unfortunately prone to the masking effect of multiple outliers, because the sample mean is attracted by them and the sample covariance matrix is inflated to the extent that some or all of them may fall inside the tolerance ellipsoid.

2) Robust Outlier Identification Based on Projection Statistics: Initiated independently by Stahel [32] and Donoho [33] in 1982, the projection method was inspired by the following equivalent expression of the Mahalanobis distance:

$\mathrm{MD}_i = \max_{\|\mathbf{u}\| = 1} \frac{\big|\mathbf{v}_i^T \mathbf{u} - \mu\big(\mathbf{v}_1^T \mathbf{u}, \ldots, \mathbf{v}_m^T \mathbf{u}\big)\big|}{\sigma\big(\mathbf{v}_1^T \mathbf{u}, \ldots, \mathbf{v}_m^T \mathbf{u}\big)} \qquad (14)$

where $\mu$ and $\sigma$ are, respectively, the sample mean and the sample standard deviation of the projections of the data points on the direction of the vector $\mathbf{u}$, and where the maximum is taken over all possible directions. A robust version of (14) is then obtained in a straightforward manner by replacing $\mu$ and $\sigma$ by robust statistics, for example, by the sample median and the median absolute deviation (MAD) from the median of the projections. A practical implementation of this method was advocated by Gasko and Donoho [34], who proposed to investigate only those directions originating from the coordinate-wise median $\mathbf{M}$ of the point cloud and passing through each of the data points, yielding a total of $m$ directions to be examined. Termed projection statistic, the resulting estimate for a data point, say the $i$th point, is indicative of the distance that it has with respect to the bulk of the point cloud in the worst 1-D projection. Formally, it is defined as

$\mathrm{PS}_i = \max_{\mathbf{u} \in U} \frac{\big|\mathbf{v}_i^T \mathbf{u} - \mathrm{median}_j\big(\mathbf{v}_j^T \mathbf{u}\big)\big|}{\mathrm{MAD}_j\big(\mathbf{v}_j^T \mathbf{u}\big)} \qquad (15)$

where $U$ is the set of directions $\big\{(\mathbf{v}_k - \mathbf{M})/\|\mathbf{v}_k - \mathbf{M}\|,\ k = 1, \ldots, m\big\}$. The algorithm that calculates the projection statistics can be found in the Appendix. Note that this estimator is different from the one proposed by Mili et al. [28] for power system state estimation, since here PS is determined based on the radial vectors, while in the latter it is based on the row vectors of the Jacobian matrix given by (9), which revealed themselves not to be robust in our application. Rousseeuw and Van Zomeren [35] showed through Monte Carlo simulations that when a collection of data points in $p$ dimensions is drawn from a multivariate Gaussian distribution, their squared projection statistics follow roughly a chi-squared distribution with $p$ degrees of freedom. Since in our case we are dealing with observations in 2-D, we apply a statistical test at a significance level of 97.5% to tag as an outlier any data point that has $\mathrm{PS}_i^2 > \chi^2_{2,0.975}$. This allows us to define a weight function as

$w(\mathbf{z}_i) = \min\!\left(1,\ \frac{\chi^2_{2,0.975}}{\mathrm{PS}_i^2}\right)$

which is used in the objective function of the GM-estimator defined by (3). Note that this weight function decreases as the squared $\mathrm{PS}_i$ when the latter gets larger than the threshold $\chi^2_{2,0.975}$.
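Under our reading of the cutoff rule above, the weight computation is a one-liner; `scipy.stats.chi2` supplies the 97.5% quantile.

```python
from scipy.stats import chi2

def ps_weights(ps):
    """Weights w_i = min(1, chi2_{2,0.975} / PS_i^2), our reading of the rule above."""
    cutoff = chi2.ppf(0.975, df=2)          # approx. 7.38
    return np.minimum(1.0, cutoff / np.asarray(ps) ** 2)
```

These weights, computed from the projection statistics of the radial vectors, are the `w` argument expected by `irls_circle_fit` above.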


3) Determining Confidence Rings of the Fitted Model: The extreme data points in the log data can be detected by determining the confidence ring of a fitted circle. Such points are composed of outliers as well as of data that are part of a log defect with a significant protrusion or depression. The 95% confidence ring is the region between two circles, both centered at $(\hat{x}_c, \hat{y}_c)$, with radii $\hat{R} - \delta$ and $\hat{R} + \delta$, respectively, where $\delta$ is a multiple of a median-based robust scale estimate of the radial distances. If a data point lies outside that confidence ring, it may belong either to loose bark or to a defect with a large protrusion or depression. Fig. 1 demonstrates such a method.

Fig. 1. GM-fitted circle (solid line) with the confidence ring (dashed line).

III. INFLUENCE FUNCTION OF THE GM-ESTIMATOR

Following Neugebauer and Mili [36], [37], we derive the asymptotic influence function of our GM-estimator and show that it is bounded. To this end, consider a set of 2-D measurements of size $m$. Suppose that the first $m - 1$ measurements are independent and identically distributed (i.i.d.) according to the Gaussian cumulative probability distribution function $\Phi$, while the last measurement point $\mathbf{z}_0$ takes on arbitrary values, yielding a fraction of contamination $\epsilon = 1/m$. Also, suppose that the measurement vector is independent of the $m$-dimensional model error vector $\boldsymbol{\xi}$, whose components are assumed to be i.i.d. according to a cumulative probability distribution function $G$. Let $F$ denote the joint probability distribution function of the measurements and of $\boldsymbol{\xi}$. By processing the measurements, our GM-estimator provides an estimate $\hat{\boldsymbol{\theta}}$ for $\boldsymbol{\theta}$ by seeking a solution to an implicit equation given by

$\sum_{i=1}^{m} \boldsymbol{\eta}\big(\mathbf{z}_i, \hat{\boldsymbol{\theta}}\big) = \mathbf{0} \qquad (16)$

where

$\boldsymbol{\eta}(\mathbf{z}_i, \boldsymbol{\theta}) = w(\mathbf{z}_i)\, \psi\!\left(\frac{r_i}{s\, w(\mathbf{z}_i)}\right) \mathbf{a}_i. \qquad (17)$

Now, let $m$ grow to infinity, yielding an infinitesimal fraction of contamination $\epsilon$. The cumulative probability distribution function of the random vectors $\mathbf{z}_i$ may then be expressed as the contaminated model given by

$F_\epsilon = (1 - \epsilon)\, F + \epsilon\, \Delta_{\mathbf{z}_0} \qquad (18)$

where $\Delta_{\mathbf{z}_0}$ is the unit probability mass at the point $\mathbf{z}_0$. If we let $T(F_\epsilon)$ denote the asymptotic functional form of $\hat{\boldsymbol{\theta}}$, (16) reduces to

$\int \boldsymbol{\eta}\big(\mathbf{z}, T(F_\epsilon)\big)\, dF_\epsilon(\mathbf{z}) = \mathbf{0}. \qquad (19)$

The asymptotic influence function of the estimator at $F$ is defined as the Gâteaux derivative [16] given by

$\mathrm{IF}(\mathbf{z}_0; T, F) = \lim_{\epsilon \downarrow 0} \frac{T(F_\epsilon) - T(F)}{\epsilon}. \qquad (20)$

It is the directional derivative of $T$ at $F$ in the direction of $\Delta_{\mathbf{z}_0}$. To derive it, let us first substitute (18) into (19) to get

$(1 - \epsilon) \int \boldsymbol{\eta}\big(\mathbf{z}, T(F_\epsilon)\big)\, dF(\mathbf{z}) + \epsilon\, \boldsymbol{\eta}\big(\mathbf{z}_0, T(F_\epsilon)\big) = \mathbf{0}. \qquad (21)$

Differentiating with respect to $\epsilon$, it follows that

$-\int \boldsymbol{\eta}\big(\mathbf{z}, T(F_\epsilon)\big)\, dF + (1 - \epsilon)\, \frac{d}{d\epsilon} \int \boldsymbol{\eta}\big(\mathbf{z}, T(F_\epsilon)\big)\, dF + \boldsymbol{\eta}\big(\mathbf{z}_0, T(F_\epsilon)\big) + \epsilon\, \frac{d}{d\epsilon}\, \boldsymbol{\eta}\big(\mathbf{z}_0, T(F_\epsilon)\big) = \mathbf{0}. \qquad (22)$

Evaluating (22) at $\epsilon = 0$, assuming Fisher consistency given by $T(F) = \boldsymbol{\theta}$, and interchanging differentiation and integration in the first term of the summation, we get

$-\int \boldsymbol{\eta}(\mathbf{z}, \boldsymbol{\theta})\, dF + \int \frac{d}{d\epsilon}\, \boldsymbol{\eta}\big(\mathbf{z}, T(F_\epsilon)\big)\Big|_{\epsilon=0}\, dF + \boldsymbol{\eta}(\mathbf{z}_0, \boldsymbol{\theta}) = \mathbf{0}. \qquad (23)$

Applying the chain rule to the kernel of the first integral, noting that the leading term of (23) vanishes by Fisher consistency, and using the sifting property of the Dirac measure, we obtain

$\int \frac{\partial \boldsymbol{\eta}(\mathbf{z}, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\, dF\ \, \mathrm{IF}(\mathbf{z}_0; T, F) + \boldsymbol{\eta}(\mathbf{z}_0, \boldsymbol{\theta}) = \mathbf{0}. \qquad (24)$

Solving for $\mathrm{IF}(\mathbf{z}_0; T, F)$, we get

$\mathrm{IF}(\mathbf{z}_0; T, F) = -\left[\int \frac{\partial \boldsymbol{\eta}(\mathbf{z}, \boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\, dF\right]^{-1} \boldsymbol{\eta}(\mathbf{z}_0, \boldsymbol{\theta}). \qquad (25)$

Deriving $\boldsymbol{\eta}$ given by (17) with respect to $\boldsymbol{\theta}$, while assuming that $w(\mathbf{z})$ and $s$ are independent of $\boldsymbol{\theta}$ over the neighborhood where the derivative is applied, it follows that

$\frac{\partial \boldsymbol{\eta}}{\partial \boldsymbol{\theta}^T} = w(\mathbf{z}) \left[\mathbf{a}\, \frac{\partial}{\partial \boldsymbol{\theta}^T}\, \psi\!\left(\frac{r}{s\, w(\mathbf{z})}\right) + \psi\!\left(\frac{r}{s\, w(\mathbf{z})}\right) \mathbf{H}\right] \qquad (26)$

where $\mathbf{H} = \partial \mathbf{a} / \partial \boldsymbol{\theta}^T$ is the Hessian matrix of $q(\mathbf{z}, \boldsymbol{\theta})$, which is equal to $\mathrm{diag}\{2, 2, -2\}$ for the circle model (6). Applying the chain rule to the derivative of $\psi$ with respect to $\boldsymbol{\theta}$ and using the fact that $\partial r / \partial \boldsymbol{\theta} = \mathbf{a}$, we obtain

$\frac{\partial}{\partial \boldsymbol{\theta}^T}\, \psi\!\left(\frac{r}{s\, w(\mathbf{z})}\right) = \frac{\psi'\big(r/(s\, w(\mathbf{z}))\big)}{s\, w(\mathbf{z})}\, \mathbf{a}^T \qquad (27)$

where $\psi'(u) = d\psi(u)/du$. Substituting (17) and (27) into the expression of $\mathrm{IF}(\mathbf{z}_0; T, F)$ given by (25), we get

$\mathrm{IF}(\mathbf{z}_0; T, F) = -\boldsymbol{\Lambda}^{-1}\, w(\mathbf{z}_0)\, \psi\!\left(\frac{r_0}{s\, w(\mathbf{z}_0)}\right) \mathbf{a}(\mathbf{z}_0, \boldsymbol{\theta}), \quad \boldsymbol{\Lambda} = \int w(\mathbf{z}) \left[\frac{\psi'\big(r/(s\, w(\mathbf{z}))\big)}{s}\, \mathbf{a}\, \mathbf{a}^T + \psi\!\left(\frac{r}{s\, w(\mathbf{z})}\right) \mathbf{H}\right] dF. \qquad (28)$

We observe that the influence function $\mathrm{IF}(\mathbf{z}_0; T, F)$ is bounded because the influence of the residuals is bounded via $\psi$, and because the weight function $w(\mathbf{z}_0)$ decreases from one to zero for an outlier $\mathbf{z}_0$, thereby bounding the influence of the column vector $\mathbf{a}(\mathbf{z}_0, \boldsymbol{\theta})$, which is just a function of $\mathbf{z}_0$, as seen in (9). Now, putting $w(\mathbf{z}_i) = 1$ for all the measurements, including for $\mathbf{z}_0$, turns our GM-estimator into a conventional M-estimator [16], [17], which obviously has an unbounded $\mathrm{IF}$. This indicates that a single outlying point may drive the bias of an M-estimator to infinity.

IV. SIMULATION RESULTS

Simulations of the developed robust estimators were performed using several complete log samples; some were executed on single data cross sections, while the rest were executed on the entire log data. They were carried out on a personal HP notebook computer with a 3.06-GHz Intel Pentium 4 processor with Hyper-Threading; high-end PCs of this kind are the type of computers for such an application that are affordable to sawmills. First, we discuss the results obtained using data cross sections. Then, we analyze graylevel images obtained from radial distance images, and finally, we generate a contour plot and identify the defects.

Fig. 2. (a) End points of radial vectors (originating from (0, 0)) of one data cross section. Outliers due to loose bark are marked in red and are separated from the good data. (b) Cross section of log data with a large segment of missing values along with outliers, together with three fitted circles superimposed. These circles have been fitted using our robust GM-estimator (solid circle), the Huber M-estimator (dashed circle), and the LS estimator (dashed–dotted circle). We observe that the robust GM-fitted circle passes through the good data points while the other two fitted circles are attracted by the outliers.

A. Circle Fitting Using the GM-Estimator

The simulation results depicted in Fig. 2 are those of log #480 at length 30.044 in, which has a cross section with 786 data points. Table I displays the projection statistics, denoted by PS, calculated from the radial distances. The square root of the 97.5th percentile of the chi-squared distribution with 2 degrees of freedom, $\sqrt{\chi^2_{2,0.975}} \approx 2.72$, is the threshold chosen for PS beyond which a point is flagged as an outlier. We observe that PS identifies all the outliers in the data. Fig. 2 further demonstrates the robustness of the GM-estimator over the Huber M-estimator and the conventional LS estimator. For all three estimators, the IRLS algorithm given by (13) is applied, with the weights $w_i$ set to one for the Huber M-estimator, and with $w_i = 1$ and the threshold $b$ set to a large value (so that $\rho$ stays quadratic) for the LS estimator. As a robust goodness-of-fit criterion, we calculate the sum of the squared radial distances (SRD) over the data points that have not been rejected, that is, that have been assigned weights larger than 0.1 (a minimal sketch follows).
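A sketch of this goodness-of-fit computation under the stated rejection rule; the function reuses NumPy from the earlier listings.

```python
def srd(theta, x, y, w, reject_below=0.1):
    """Sum of squared radial distances (SRD) over the non-rejected points."""
    xc, yc, R = theta
    d = np.hypot(x - xc, y - yc) - R        # signed radial distances
    return float(np.sum(d[w > reject_below] ** 2))
```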

TABLE I PROJECTION STATISTICS OF SOME LOG DATA

For the Huber and the LS estimators, none of the data points is rejected, yielding SRD values of 2373.11 and 2317.61, respectively. Regarding the GM-estimator, 178 bad points are rejected, yielding a much smaller SRD value of 342.64, which indicates a much better fit. As observed in Fig. 2, the robust fitted circle passes right through the good data points. Incidentally, the asymptotic bias under contamination of the GM-estimator remains reasonably small for small values of $\epsilon$. Simulations showed that our GM-estimator can handle no more than 33% contamination. For a larger percentage, it may break down when the outliers are placed at strategic locations, for example, when they are all located at the same point.

The experiments with the circle-fitting robust regression model brought insight into the detection of external defects on hardwood logs and stems. First, it is essential to perform a model fitting to the log data, because the fitted solutions help to sort the input data and provide a reference level of the log surface for defect detection, segmentation, and classification. The parameters of the fitted model, i.e., the center of a fitted circle, can be used to remove redundant data and to sort the data points by increasing angle of the radial vectors, measured from the circle center with respect to the horizontal axis (see the sketch below). Moreover, a robust 2-D circle fitting helps to amplify the variation on log surfaces that carries external-defect information. The criterion for a good fitting algorithm is that the solution minimizes the variance of the regular bark regions and maximizes that of the defect regions. This is achieved by fitting circles to the cross sections such that the radial distances are as small as possible in bark regions. To do so, the weight function of the data, defined in Section II-C, should give more weight to data in the bark region and less weight to data in the defect region. Typically, the bark region fluctuates around the fitted circle with small variations over the majority of a log data cross section, so a large protrusion or depression stands out as a significant departure from the fitted model.

Circle fitting is done individually for each data cross section. The initial conditions are estimated using the method discussed in Section II, where the cross sections are first analyzed independently and all the initial parameters are then smoothed across the entire log. The fitted circle parameters are also smoothed using a box filter. The case in Fig. 2 is special in that there are severe outliers and a large number of missing data; fewer than about 5% of the cross sections are corrupted in this way, and most are complete without severe outliers or missing data. For the 14 log samples we experimented with, the fitting results are satisfying in general. A total of 68 defects exist in the 14 samples; 63 were correctly identified, while the algorithm also falsely identified ten clear-wood regions. In our next phase, we will experiment with more log samples, and more performance evaluation metrics will be provided.
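The sorting step described above can be sketched as follows; the angle is measured about the fitted center with respect to the horizontal axis, as in the text, and the names are ours.

```python
def radial_profile(theta, x, y):
    """Signed radial distances sorted by polar angle about the fitted center."""
    xc, yc, R = theta
    dx, dy = x - xc, y - yc
    angles = np.arctan2(dy, dx)             # angle w.r.t. the horizontal axis
    d = np.hypot(dx, dy) - R                # signed radial distance to the circle
    order = np.argsort(angles)
    return angles[order], d[order]
```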


Fig. 3. (a) Radial distance grayscale image of a segment of log #493. (b) Corresponding contour plot automatically generated by the defect-detection Matlab programs. Dashed-crossed rectangles mark possible defective regions, and solid-crossed rectangles mark the observed defective regions overlaid for comparison.

B. Radial Distance Images

To create an image of the log surface, the radial distances are converted to grayscale values, as depicted in Fig. 3(a). Typical radial distances vary over a much narrower range than the grayscale values, which range from 0 to 255. Since the log data are not originally in a grid format, the corresponding radial distances are not in a grid format either. To form a grid, the radial distances are interpolated linearly to fill any gaps. This is carried out as follows. First, the position along the log's length in the 3-D data is mapped to the row number $u$ in the 2-D image. Second, the column number $v$ is calculated by scaling the angle $\alpha$ of a point about the center of the fitted circle so that one full revolution maps to the image width; if the desired image is to be 750 pixels wide, then the scaling factor would be $750/(2\pi)$ for $\alpha$ expressed in radians. Thus, a 3-D point with radial distance $d$ becomes the 2-D pixel $(u, v)$ with a grayscale value $g$. The radial distances are linearly interpolated so that each pixel is associated with a value; after such a conversion, they are referred to as gridified. To convert a gridified radial distance $d$ to a gray value $g$, the maximum $d_{\max}$ and minimum $d_{\min}$ of all the radial distances are first determined, and $g$ is calculated through

$g = 255\, \frac{d - d_{\min}}{d_{\max} - d_{\min}}. \qquad (29)$

Since the numbers of rows and columns are out of proportion, another linear interpolation is performed to insert rows between the original radial distance rows. This creates a radial distance image resembling the log surface.
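A rough sketch of the gridification and gray-level mapping (29); the paper does not specify its exact interpolation routine, so `scipy.interpolate.griddata` stands in for it, and the function name and signature are ours.

```python
from scipy.interpolate import griddata

def gridify_gray(z, alpha, d, n_rows, n_cols=750):
    """Resample scattered (z, alpha, d) samples onto an image grid and map the
    radial distances d to 0-255 gray levels via the min-max scaling (29)."""
    rows = (z - z.min()) / (z.max() - z.min()) * (n_rows - 1)
    cols = (alpha % (2 * np.pi)) / (2 * np.pi) * (n_cols - 1)
    gu, gv = np.mgrid[0:n_rows, 0:n_cols]
    grid = griddata((rows, cols), d, (gu, gv), method="linear")  # fill gaps linearly
    d_min, d_max = np.nanmin(grid), np.nanmax(grid)
    return 255.0 * (grid - d_min) / (d_max - d_min)              # equation (29)
```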


This is illustrated in Fig. 3(a), where the radial distance image generated by circle fitting is shown along with the log defect diagram for comparison.

C. Generating Contour Plots and Identifying the Defects

Using the graylevel image, we can generate a contour plot [Fig. 3(b)] on which it is possible to discern the regions containing likely defects based on height information alone. We developed an algorithm to generate rectangles that enclose regions bounded by contour curves at the highest level. Regions with a significant height change (at least 0.5 in) and sufficient size are selected as possible defect regions (see the sketch at the end of this section). Our simulations showed that the majority of the most protruding or depressed defects with a diameter of at least 3.5 in are detected. If a rectangle encloses only a portion, e.g., a corner, of an external defect, further analysis is applied and the boundary is adjusted to correctly map the exact defect region. This often occurs when the defect is a sawn knot, where a part of the defect region is roughly flat and the corresponding data cross-section segments are relatively straight. Thus, we apply a statistical method to determine the existence of straight-line segments for the boundary modification. This is achieved by checking all enclosed small regions of highest contours over a region of less than 25 in².

Defects with little surface variation are not delimited by the highest or lowest contour curves and, therefore, cannot be identified using the previous method. Such defects include relatively small overgrown knots and sound knots, and large or medium distortions. In general, these are less severe than big knots or deep cracks on the log surface. Presently, we are developing a method that applies a specific filter aimed at locating knobby defects of at least 5-in diameter. This required minimum size is mainly due to the limitation in data resolution of 0.8 in between cross sections. Such defects are severe degraders of logs and lumber and must be identified. Depending on location, tree grade, and species, this type of defect comprises approximately 42% of the total defect population [38].

Note that relatively small errors during sawing can make significant dollar-value differences in the lumber recovered [1], [2]. For example, a small mistake could result in more lower-value 2A Common boards and fewer high-value FAS and Select boards being sawn. Specifically, a FAS Appalachian red oak board (green) is worth $8.80, while the same size board graded at 2A Common is worth only $3.84 [39]. Just 50 sawing decision errors a day in a medium-sized mill could easily result in the production of 200 more low- versus high-quality boards. Over the course of a year, these types of errors could result in a loss of more than $400,000. In a recent study, six yellow-poplar logs were sawn, to the best of the operator's ability, to produce the highest-valued set of boards [40]. The defects on the boards were noted, and a 3-D shape model of the logs and internal defects was constructed. Using a sawing simulator on those six logs, the researcher was able to obtain 132 more boards in the highest-valued lumber grades (FAS, F1F, and Selects) and 47 fewer lower-valued 1 Common boards. Knowing the size, shape, and location of defects available from surface scanning would allow hardwood sawyers to make better decisions and reduce the occurrence of costly errors.
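One plausible realization of the rectangle-selection step described above thresholds the gridified height map and takes bounding boxes of the connected regions; `scipy.ndimage` and the pixel-area filter are our stand-ins for the authors' Matlab implementation, not their exact algorithm. Depressions can be handled symmetrically with `height_map < -rise`.

```python
from scipy import ndimage

def candidate_defect_boxes(height_map, rise=0.5, min_pixels=50):
    """Bounding boxes of connected regions whose surface rise exceeds `rise` inches.

    height_map: gridified radial distances (inches); `min_pixels` is an
    illustrative size filter standing in for the paper's size criterion.
    """
    mask = height_map > rise                              # large surface rises
    labels, n_regions = ndimage.label(mask)               # connected components
    boxes = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        if np.sum(labels[sl] == i) >= min_pixels:         # size filter
            boxes.append(sl)                              # (row_slice, col_slice)
    return boxes
```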


A mill operator spends on average 9 s to inspect one whole log. This is much faster than the current computing time of 1.5 min taken by our algorithm when executed on a 3.06-GHz PC with Hyper-Threading. We will investigate means of improving the algorithm's efficiency and execution time.

V. CONCLUSION AND FUTURE WORK

This paper introduces a new robust GM-estimator that performs 2-D circle fitting to detect external defects on hardwood logs and stems. Classical estimation methods based on the LS method revealed themselves to be unreliable because they generate strongly biased estimates in the presence of missing data and severe outliers. By contrast, our GM-estimator suppresses these outliers via weights calculated from projection statistics applied to the radial distances, thereby bounding the influence function of the estimator. Based on these robust circle fittings, the developed defect-detection programs transform the original log data into a sharper and cleaner graylevel image, determine contour levels of the radial distances, and further narrow down the potential defect regions. The generation and initial processing of the radial distance image is not the final step of this work. Not only have log unrolling and height-analysis methods been examined, but an algorithm has also been developed that determines whether a region of interest contains a sawn knot by locating the approximately straight line segment in a cross section. As future work, texture analysis methods will be investigated to detect defects with small height changes. Since such methods are much more computationally intensive than the contour approach, it is logical to apply the latter to detect defects with significant height changes. A preliminary study has been conducted to extract features of external defect types from randomly chosen defect samples. These features will be used to train a robust clustering and classification system for defect classification.

APPENDIX

The algorithm for the projection statistics consists of the following main steps (a sketch in code follows the list).
1) For each coordinate, compute the median over the points $\mathbf{v}_i$, yielding the coordinate-wise median $\mathbf{M}$.
2) Calculate the directions $\mathbf{d}_k = \mathbf{v}_k - \mathbf{M}$, $k = 1, \ldots, m$. Whenever $\|\mathbf{d}_k\| = 0$, disregard the corresponding direction in subsequent computation.
3) Normalize the remaining directions: $\mathbf{u}_k = \mathbf{d}_k / \|\mathbf{d}_k\|$.
4) Calculate the projections $z_{ik} = \mathbf{v}_i^T \mathbf{u}_k$ for all $i$ and $k$.
5) For each direction $\mathbf{u}_k$, calculate the median of the projections, $z_{\mathrm{med},k} = \mathrm{median}_i(z_{ik})$.
6) For each direction $\mathbf{u}_k$, calculate the median absolute deviation $\mathrm{MAD}_k = \mathrm{median}_i\,|z_{ik} - z_{\mathrm{med},k}|$ (up to a normalization constant).
7) For all $i$ and all $k$, calculate the standardized projections $|z_{ik} - z_{\mathrm{med},k}| / \mathrm{MAD}_k$.
8) For all $i$, calculate the projection statistics $\mathrm{PS}_i = \max_k\, |z_{ik} - z_{\mathrm{med},k}| / \mathrm{MAD}_k$.
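The Appendix steps translate directly into a vectorized sketch; the 1.4826 MAD normalization is a standard choice we assume rather than one stated in the text.

```python
def projection_statistics(v):
    """Projection statistics PS_i of 2-D radial vectors v (shape (m, 2)),
    following Appendix steps 1)-8)."""
    m_med = np.median(v, axis=0)                          # 1) coordinate-wise median
    d = v - m_med                                         # 2) directions through each point
    norms = np.linalg.norm(d, axis=1)
    u = d[norms > 0] / norms[norms > 0, None]             # 3) drop zero-length, normalize
    z = v @ u.T                                           # 4) projections z_ik = v_i^T u_k
    z_med = np.median(z, axis=0)                          # 5) per-direction medians
    mad = 1.4826 * np.median(np.abs(z - z_med), axis=0)   # 6) MAD (assumed constant)
    std_proj = np.abs(z - z_med) / mad                    # 7) standardized projections
    return std_proj.max(axis=1)                           # 8) PS_i = worst 1-D projection
```

The weights $w_i$ used in (3) then follow as `ps_weights(projection_statistics(v))`.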


ACKNOWLEDGMENT

The authors would like to thank E. Thomas, Dr. J. Baumgrass, and Dr. J. Wiedenbeck of the United States Department of Agriculture (USDA) Forest Research Lab of Princeton, WV, for their continuing collaboration and support of this research, and Perceptron Inc., Farmington Hills, MI, for providing the TriCam log scanning equipment that was used to collect the authors' sample data.

REFERENCES

[1] D. G. Hodges, W. C. Anderson, and C. W. McMillin, “The economic potential of CT scanners for hardwood sawmills,” Forest Products J., vol. 40, no. 3, pp. 65–69, 1990.
[2] P. H. Steele, T. E. G. Harless, F. G. Wagner, L. Kumar, and F. W. Taylor, “Increased lumber value from optimum orientation of internal defects with respect to sawing pattern in hardwood logs,” Forest Products J., vol. 44, no. 3, pp. 69–72, 1994.
[3] X. Tian and G. E. Murphy, “Detection of trimmed and occluded branches of harvested tree stems using texture analysis,” Int. J. Eng., vol. 8, no. 2, pp. 65–78, 1997.
[4] S. Guddanti and S. J. Chang, “Replicating sawmill sawing with TOPSAW using CT images of a full length hardwood log,” Forest Products J., vol. 48, no. 1, pp. 72–75, 1998.
[5] D. L. Schmoldt, “CT imaging, data reduction, and visualization of hardwood logs,” in Proc. Hardwood Res. Symp., Memphis, TN, 1996, pp. 103–114.
[6] F. G. Wagner, F. W. Taylor, D. S. Ladd, C. W. McMillin, and F. L. Roder, “Ultrafast CT scanning of an oak log for internal defects,” Forest Products J., vol. 39, no. 11/12, pp. 62–64, 1989.
[7] D. Zhu, R. Conners, F. Lamb, and P. Araman, “A computer vision system for locating and identifying internal log defects using CT imagery,” in Proc. 4th Int. Conf. Scanning Technol. Wood Ind., San Francisco, CA, 1991, pp. 1–13.
[8] P. Li, A. L. Abbott, and D. L. Schmoldt, “Automated analysis of CT images for the inspection of hardwood logs,” in Proc. Int. Conf. Neural Netw., Washington, DC, 1996, vol. 3, pp. 1744–1749.
[9] S. M. Bhandarkar, T. D. Faust, and M. Tang, “CATALOG: A system for detection and rendering of internal log defects using computer tomography,” Mach. Vis. Appl., no. 11, pp. 171–190, 1999.
[10] Forest Products Division, Perceptron Inc., “Mill wide scanning and optimization,” Farmington Hills, MI, 1999.
[11] L. Orbay and J. Brdicko, “Using external log characteristics for stem and log breakdown optimization,” in Proc. ScanTech, Seattle, WA, 2001, pp. 49–60.
[12] W. Gander, G. H. Golub, and R. Strebel, “Fitting of circles and ellipses—Least squares solution,” Institut für Wissenschaftliches Rechnen, ETH, Zurich, Switzerland, Tech. Rep. 217, 1994 [Online]. Available: ftp://ftp.inf.ethz.ch/doc/tech-reports/2xx/
[13] A. Fitzgibbon, M. Pilu, and R. B. Fisher, “Direct least square fitting of ellipses,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 5, pp. 476–480, May 1999.
[14] G. Taubin, “Estimation of planar curves, surfaces and nonplanar space curves defined by implicit equations with applications to edge and range image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 11, pp. 1115–1138, Nov. 1991.
[15] D. Eberly, “Least squares fitting of data,” 1999 [Online]. Available: http://www.magic-software.com
[16] P. J. Huber, Robust Statistics. New York: Wiley, 1981.
[17] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions. New York: Wiley, 1986.
[18] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection. New York: Wiley, 1987.
[19] R. A. Maronna, R. D. Martin, and V. J. Yohai, Robust Statistics: Theory and Methods. New York: Wiley, 2006.
[20] P. Meer, D. Mintz, and A. Rosenfeld, “Robust regression for computer vision: A review,” Int. J. Comput. Vis., vol. 6, no. 1, pp. 59–70, 1991.
[21] S. A. Kassam and H. V. Poor, “Robust techniques for signal processing: A survey,” Proc. IEEE, vol. 73, no. 3, pp. 433–481, Mar. 1985.
[22] R. D. Martin and D. J. Thomson, “Robust-resistant spectrum estimation,” Proc. IEEE, vol. 70, no. 9, pp. 1097–1115, Sep. 1982.

[23] X. Wang and H. V. Poor, “Robust multiuser detection in non-Gaussian channels,” IEEE Trans. Signal Process., vol. 47, no. 2, pp. 289–305, Feb. 1999.
[24] Y. Mainguy, J. B. Birch, and L. T. Watson, “A robust variable order facet model for image data,” Mach. Vis. Appl., vol. 8, pp. 141–162, 1995.
[25] D. Zhu and A. A. Beex, “Robust spatial modeling for hardwood log inspection,” J. Vis. Commun. Image Representation, vol. 5, pp. 41–51, 1994.
[26] A. J. Koivo and C. W. Kim, “Robust image modeling for classification of surface defects on wood boards,” IEEE Trans. Syst. Man Cybern., vol. 19, no. 6, pp. 1659–1666, Nov./Dec. 1989.
[27] W. S. Krasker and R. E. Welsch, “Efficient bounded-influence regression estimation,” J. Amer. Statist. Assoc., vol. 77, pp. 595–604, 1982.
[28] L. Mili, M. G. Cheniae, N. S. Vichare, and P. J. Rousseeuw, “Robust state estimation based on projection statistics,” IEEE Trans. Power Syst., vol. 11, no. 2, pp. 1118–1127, May 1996.
[29] P. W. Holland and R. E. Welsch, “Robust regression using iteratively reweighted least squares,” Commun. Statist.—Theory Methods, vol. A6, pp. 813–827, 1977.
[30] R. M. Haralick and L. Shapiro, Computer and Robot Vision. Reading, MA: Addison-Wesley, 1992, vol. 2.
[31] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd ed. New York: Wiley, 1984.
[32] W. A. Stahel, “Robuste Schätzungen von Kovarianzmatrizen,” Ph.D. dissertation, Seminar für Statistik, ETH, Zurich, Switzerland, 1981.
[33] D. L. Donoho, Breakdown Properties of Multivariate Location Estimators. Boston, MA: Harvard Univ. Press, 1982.
[34] M. Gasko and D. Donoho, “Influential observation in data analysis,” in Amer. Statist. Assoc. Proc. Business Econom. Statist. Sec., 1982, pp. 104–110.
[35] P. J. Rousseeuw and B. C. Van Zomeren, “Robust distances: Simulations and cutoff values,” in Directions in Robust Statistics and Diagnostics. New York: Springer-Verlag, 1991, pp. 195–203.
[36] S. P. Neugebauer, “Robust analysis of M-estimators of nonlinear models,” M.S. thesis, Dept. Electr. Comp. Eng., Virginia Tech, Blacksburg, VA, Aug. 1996 [Online]. Available: http://scholar.lib.vt.edu/theses/available/etd-22820699602791
[37] S. P. Neugebauer and L. Mili, “Local robustness of nonlinear regression M-estimators,” in Proc. IEEE/EURASIP Workshop Nonlinear Signal Image Process., Mackinac Island, MI, 1997.
[38] R. E. Thomas, “Data from the logger databank. A database of hardwood tree defect populations,” USDA Forest Service Databank, Northern Res. Station, Princeton, WV, 2006, unpublished.
[39] “Hardwood review,” Hardwood Review Weekly, vol. 22, no. 39, May 26, 2006.
[40] L. G. Occeña, “Hardwood log breakdown decision automation,” Wood Fiber Sci., vol. 24, no. 2, pp. 181–188, 1992.

Liya Thomas (S’04) received the B.S. degree in computer science from Fudan University, Shanghai, China, in 1988 and the M.S. degree in computer science from West Virginia University, Morgantown, in 1993. Currently, she is working towards the Ph.D. degree in computer science at Virginia Tech, Blacksburg. Her research interests are in robust statistics and image processing.

Lamine Mili (S’82–M’88–SM’93) received the B.S. degree from the Swiss Federal Institute of Technology, Lausanne, Switzerland, in 1976 and the Ph.D. degree from the University of Liège, Liège, Belgium, in 1987. Currently, he is a Professor of Electrical and Computer Engineering at Virginia Tech, Arlington. His research interests are in robust statistics, robust signal processing, image and speech processing, radar systems, bifurcation theory, and power systems analysis and control.
