Polynocular Image Set Consistency for Local Model Verification

1 downloads 0 Views 1MB Size Report
Here we present it as a statistical decision procedure based on the model ... texture the easier is the discrimination from a small number of images. .... In Figure 6 we can see that many local models were rejected in hair, since its appearance.
Polynocular Image Set Consistency for Local Model Verification ˇ ara V´ıt Z´ yka and Radim S´ Czech Technical University, Center for Machine Perception {zyka,sara}@cmp.felk.cvut.cz

Abstract: This work deals with verification of local 3D geometric model recovered from noisy data. Verification is understood as making decision about consistency of the model with a set of images. To evaluate the consistency, a statistical measure is chosen and a novel procedure for testing statistical dependence among a set of random variables is proposed. The approach is demonstrated on verification of disk-like local 3D surface model in the process of bottom-up surface reconstruction.

1

Introduction

In various computer vision tasks it is possible to encounter the problem of verification of a 3D model. The purpose of verification is to assess the consistency between the model and a set of images that are supposed to be views of the object that is modeled. For example, Baillard et al. [1] describe the reconstruction of 3D urban site models from aerial images. The local candidate models for each building roof are recovered from edge-based stereo. The roof regions are then verified in the input images based on the mutual consistency of image values. A similar problem has been reported in [4], where local planar surface models recovered in a bottom-up reconstruction process are verified for their pose. Models that are not consistent with a set of images are rejected. These practical problems prompted us to study the verification itself in greater detail. The general problem will be restricted to local planar models. Then, the verification decision is whether the model (a reconstruction hypothesis) is consistent or not with a set of images taken from general viewpoints. Here we present it as a statistical decision procedure based on the model pose parameters, the projection parameters, and a set of images. Good statistical decision allows to eliminate local models that are unlikely to correspond to real objects, which significantly improves the performance of subsequent steps in global 3D model reconstruction. This research was supported by the Czech Ministry of Education, grant VS96049, by the Grant Agency of the Czech Republic, grant 102/00/1679, by grant OCAMS No. 4/11/AIP CR, and by the Research Programme J04/98:210000012 Transdisciplinary Research in Biomedical Engineering.

2

Model Verification

A local planar model is verified if its projections to all possible images of the corresponding object are equivalent up to the class of transformations given by the physics of image formation. What is meant by local planar model is apparent from Figure 5. The intuition behind model verification is demonstrated in Figure 1.

The image formation introduces geometric distortion (perspective foreshortening) and radiometric distortion due to the surface non-Lambertianity. Both are shown in Figure 1. The former is apparent as the circle-to-ellipse distortion of the model rim and the latter as a highlight visible on the cornea, for instance. The class of distortions further includes image discretization and quantization, and the optical properties of each of the sensors.

Figure 1: Disc-like 3D model boundary projected to a stereo image pair. The model is imageconsistent, since its projections cover corresponding areas. If its pose parameters (orientation, position) were incorrect, the projections would differ from each other.

Since verification is defined in terms of projections of the model to images, we need a measure of mutual consistency among the images (given the model and projection parameters defining the pointwise image correspondences).

For practical considerations, the views used for verification must be reduced to a small number. The smallest possible number of views depends on the isotropy of surface texture, but it is always greater or equal to two, since the verification is based on the relation among the images and not on the relation model-image. The likelihood of successful rejection increases with the number of views involved and reaches certainty for their infinite number. The stronger the texture the easier is the discrimination from a small number of images. Note that any model is consistent with any number of images if the observed scene is completely texture-free and without any shading. This is not considered a failure.

Let the scene be static or all images be captured at the same instant. We assume that in practice the verification is done based on the following: 1) on a local planar surface model instance given by its pose and the size, 2) on a finite number (but at least two) intensity images from different viewpoints but with overlapping visual fields, and 3) on known model projection parameters (binding the model and image coordinate systems). 2

3

The Consistency Measure

Of all consistency measures, we consider the following three possibilities. They differ in the degree of invariance to the radiometric image distortion: The Sum of Square Differences averaged over image pairs assumes that only identity transformation of intensity values acts among the images. This may be too restrictive in certain applications. The Standard Correlation Coefficient [3] is invariant to a linear transformation acting on image values. It is computed as a mean of all pairwise correlation coefficients and has a range of [−1, 1]. It allows to suppress the influence of small deviations from surface non-Lambertianity. A Rank Correlation [2] is more general since it enables any monotonic transformation among the images. We use Spearman’s Rank Concordance [2]. Let Rij be the rank of point i among the values collected P in image j, the Rij ranges from 1 to the total number of measurements n. Let Ri = m j=1 Rij be the sum of ranks of the same point over all images. It is easy to prove that the mean value ¯ = 1 m(n + 1). Kendall’s rank concordance K is of Ri is R 2 K=

S Smax

,

where S =

n X

¯ 2 (Ri − R)

i=1

and Smax =

n X

¯ mi − R

2

.

i=1

The value S is the variance of Ri computed over all points and the Smax is the maximum possible value for S. The Spearman’s Rank Concordance we use, is calculated from K by re-normalization: mK − 1 C= . m−1 The range of C is [−1/(m − 1), 1]. Note that all image values calculated from all m images play a symmetric role, the C is not computed as a mere average over all image pairs. The invariance to monotonic image transformations is paid for by the loss of discriminability. More general transformations of image values are possible to assume; the corresponding consistency measure would be based either on relative mutual information [3] invariant to all 1:1 mappings or by explicit parametric models. These choices will not be discussed in this paper. As will be seen in the next section, however, the actual verification procedure can be made independent on the choice of the measure.

4

The Statistical Decision Procedure

The goal of verification is to decide for each model independently if its images are mutually consistent based on the selected consistency measure C. The problem is posed as a statistical test in which image brightness is considered as random variable of unknown distribution. Given an r-th local model Mr , the data for the statistical test is obtained as follows: First, a set of n points is randomly selected on the surface of Mr . Second, all the points are projected to m images using camera projection matrices. Third, the m × n image intensity values are collected by using bicubic image interpolation to obtain a data matrix from which consistency 3

Histograms and Cumulative Histograms

Model 3

1

Model 2

Model 1

Prior Consistency 0.85 Cumulative Histogram Percentile 10 %

0.8

0.6

0.4

0.2 10 %

0

0

0.029

0.2

0.4

PLIC3

0.6

0.65

0.8

0.85

0.92

(a) Typical histograms of consistency distribution of three local models.

1

(b) Local model collection.

Figure 2: Of the set of local models representing the observed object (a tea pot) shown in (b), the selected three have typical properties: Model 1 lies correctly on the object surface in the middle of the tea pot body, Model 2 is located near the self-occlusion boundary, and Model 3 is an outlier. Their respective consistency histograms are shown in (a). Based on them, Model 2 (Cα = 0.65) and Model 3 (Cα = 0.029) are rejected, since their α percentiles are smaller than the threshold (Cp = 0.85). The percentile is chosen as α = 0.1. The Model 1 is accepted by the test (Cα = 0.92).

C1 is computed. The three steps are repeated in k trials and the computed consistency values C1 , C2 , . . . , Ck are binned in a histogram, as in Figure 2(a). Using the histogram, the model is accepted based on a statistical test at a given confidence level α if the probability that the real (unknown) correlation C is smaller than the verification threshold Cp becomes smaller than α: P (C ≤ Cp ) ≤ α. If the cumulative histogram value Cα corresponding to the given confidence level α exceeds the verification threshold, i.e. if Cα ≥ Cp , the model is accepted, otherwise it is rejected. This approach does not require to know the statistical distribution of the consistency estimate. See Figure 2 for an explanation of the procedure. The decision procedure requires four parameters, all of them have a clear meaning, see Figure 3. The Cp is a design parameter whose value must be chosen based on factors like the image signal-to-noise ratio. The k and n are to be chosen too, but their values are not critical. Note that our procedure differs from the standard implementation of the statistical dependence test that works by inverting the decision of an independence test. The difference between the two approaches is illustrated in Figure 4. The dark histogram in Figure 4(a) corresponds to consistency measure histogrammed over a col-

Verification threshold Cp Confidence level α Number of measurements n Number of trials k Model pars Camera pars Images

Verification

Decision

Figure 3: Verification procedure.

4

n

+ −

lection of local models reconstructed from the images of a real object. Note the histogram mode is very close to the value of C = 1 and that it tails off as approaching the zero consistency. The verification threshold is selected based on the light-gray histogram, which corresponds to consistency computed under the hypothesis of independence by numerical procedure drawing k samples of size n from two independent random number generators. The width of this histogram is inversely proportional to n (we used Spearman’s concordance whose statistical distribution is independent of the distribution of the random variables under statistical independence). The histogram can be precomputed in advance for a given n and a very high k. Its upper α-percentile is supposed to serve as a threshold for the verification decision. But a very high value must be chosen (e.g. α = 0.99999) to obtain a reasonable threshold. Clearly, the confidence level that needs to be chosen for a given type of images fails to have an intuitive meaning. This is the reason why we did not follow this approach. ,93

6

,96 ,99

5

5

4

4

3

2

1

1

−0.2

0

0.2 0.4 Spearman Concordance

0.6

0.8

0

1

(a) Majority of models is consistent.

,96 ,99

Percentile Thresholds Independent Variables Real Data

3

2

0

,93

Percentile Thresholds Independent Variables Real Data

% of Data Points

% of Data Points

6

−0.2

0

0.2 0.4 Spearman Concordance

0.6

0.8

1

(b) Majority of models is inconsistent.

Figure 4: Inadequacy of the statistical independence test for the verification decision. The dark histograms are computed for a set with a majority of consistent (a) and inconsistent (b) models, respectively. The light gray histograms represent consistency estimate distributions under the hypothesis of independence. They are used to select the verification threshold Cp . But even if a very tolerant percentile is chosen, e.g. γ = 0.999 (here shown by the leftmost vertical line 0.93 ), the threshold obtained (Cp = 0.13) is of little practical use.

5

Experiments

To test our verification procedure we used data obtained from stereo reconstruction process. Four synchronized and calibrated cameras were used to capture a ceramic teapot from the distance of approximately 60 cm and human faces from the distance of about 110 cm. The scene was illuminated by random texture pattern from uncalibrated texture projector, see [4] for the setup description. We tested our verification algorithm on two different applications: outlying local model rejection and multiscale local model reconstruction. 5

(a) Before verification.

(b) After verification.

Figure 5: Local model verification, the teapot data. A collection of 2417 local models, most of which are outliers, was reconstructed from an unorganized point set obtained from stereo vision (a). Only the set of 410 local planar models consistent with the input images was accepted by the verification procedure (b).

5.1

Rejection of outlying local surface models

Surface model is obtained by fitting a local plane model to unorganized point cloud reconstructed from stereo vision. An example of the result is given in Figure 5(a). See [4] for a description of the stereo reconstruction process and the local model fitting procedure. Note that the local fitting process introduces orientation (up to a sign) to the model which is already a local interpretation of the point cloud structure and infers thus something not explicitly present in the data. The task is to verify whether the recovered model position and the orientation hypothesis are correct. In the first experiment we describe here, a tea pot was placed in front of a distant background. In the reconstructed collection of local planar models, many outliers were recovered due to stereo matching artifacts (mismatches). Any local model that does not correspond to a continuous surface patch of the observed object should be inconsistent with the input images and should therefore be excluded from the collection. The verification result is shown in Figure 5, where we can see that almost none of the outliers survived. The following verification parameters were set: Cp = 0.9, α = 0.1, n = 80, k = 100. In the second experiment, a polynocular set of human face images was processed by stereo. The result of local planar model reconstruction is shown in Figure 6. In this experiment there are fewer outliers in input data because the point cloud from which the local models were reconstructed underwent already pointwise verification based on error propagation [4]. 6

(a)

(b)

Figure 6: Local model verification, the faces data. Input data are shown left, the verified models are shown right.

In Figure 6 we can see that many local models were rejected in hair, since its appearance cannot be locally explained by a planar model. The local models recovered on the glasses frame are rejected too because they are larger than the frame width; avoiding this would require recovering the models at much smaller scale. The models on the side of the nose and around the self-occlusion boundary were rejected because of simple implementation of the verification procedure that does not correctly treat the case when the local model is invisible in some cameras. This is a topic for further development. 5.2

Multiscale reconstruction

In the experiments shown above, the local model reconstruction works by grouping unorganized point cloud to local structures of given size. The smallest recovered geometric detail, the image noise level, and the acceptable data set size must be considered when choosing the optimal value for the detail size, see Figure 7. This experiment Figure 7: The scale selection problem. goal was to investigate whether the local model The left model captures fine details even in reconstruction could be driven by the image con- highly curved regions but consists of 2336 losistency so as to reconstruct small local models cal elements (of size 4 mm). The right model consisting of only 266 elements (of size 10 mm) in highly curved regions and large models in less smoothes the details out. curved regions. This approach does not necessarily require any new parameters except for those used in the verification. The results of a simple experiment on multiscale local model reconstruction is shown in Figure 8.

7

Figure 8: Multiscale reconstruction. The results for the teapot, and for the faces data. The teapot, for example, consists of 4965 local models of the size varying from 1.1 mm to 37 mm. The holes in the body of the teapot are due to the presence of specular reflections.

6

Conclusion

We have studied the problem of verification of local surface model in a small set of intensity images. The verification is based on mutual consistency of the images of the model. Several image consistency measures were overviewed. Each of them is invariant to a different class of radiometric image distortions. An algorithm for testing statistical dependence among a set of random variables was proposed. The algorithm is independent on the choice of the consistency measure. The verification method was tested on real data to demonstrate its utility for 3D model reconstruction from stereo. It was shown that it is very efficient in detecting gross outliers in the recovered local model collection and it was suggested how it could be utilized in multiscale local model reconstruction. The most important problem that remains to be solved is the invisibility of the model in some of the verification images.

References [1] C. Baillard, C. Schmid, A. Zisserman, and A. Fitzgibbon. Automatic line matching and 3D reconstruction of buildings from multiple views. In ISPRS Conference on Automatic Extraction of GIS Objects from Digital Imagery, IAPRS Vol.32, Part 3-2W5, September 1999. [2] M. Kendall and J. D. Gibbons. Rank Correlation Methods. Edward Arnold, 1990. [3] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill International Editions, 3rd edition, 1991. ˇara. Accurate natural surface reconstruction from polynocular stereo. In Proceedings NATO Advanced [4] R. S´ Research Workshop: Confluence of Computer Vision and Computer Graphics, Ljubljana, Slovenia, August 1999. Kluwer Academic Publishers. To appear. [5] V. Z´ yka. Recovering accurate geometric surface model from passive stereo vision. Technical Report K335ˇ 98-153, CVUT FEL, dep. of Control Engineering, Praha, Czech Republic, January 1998.

8