The Influence of Radiometric Correction on Stereo Matching

Gerda Kamberova and Ruzena Bajcsy
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104
E-mail: [email protected], [email protected]
Abstract
We present preliminary results confirming the importance of radiometric calibration in multicamera applications. We review systematic and random noise in the video sensor (camera/frame grabber). The manifestation of background noise in dark images, and of fixed pattern noise in flat fields, is illustrated. Through a flat-field correction of a uniform-intensity stereo image pair, an improvement of 26% to 59% in stereo matching is demonstrated.
1 Introduction
The need for a methodology for characterizing vision algorithms has been recognized [7]. When multiple sensors are used in applications requiring hard performance guarantees, correcting for errors and obtaining objective confidence measures for the uncertainty of the results cannot be neglected. Empirical tests, validation, and analysis of the robustness of existing systems are necessary [18]. The use of vision algorithms with physically different sensors necessitates evaluating the performance of the algorithms given the parameters of the sensors. Our short-term goal is to understand the important characteristics of the video sensor, and to show how this knowledge can be used in a calibration procedure which significantly improves the performance of vision algorithms in multiple-camera applications. In the paper we: (i) present a parameterization of the vision acquisition system and the uncertainty in the digital image; (ii) give a procedure for a radiometric correction of a video sensor; (iii) use disparity map computation as an example of a multiple-camera algorithm where such calibration is important; and (iv) support this claim with test experiments and evaluation. We want to identify the minimal set of experiments and calibration steps one should use in order to "equalize" jointly multiple video sensors. Our final goal (and the object of current research) is to present a digital image model together with a camera error measurement model. We will use these models for performance evaluation and characterization of subsequent vision algorithms (in particular, stereo reconstruction).
1.1 Relevant literature
An in-depth treatment of video signal formats can be found in [6, 17]. A short and clear introduction is given in [12, 2]. Charge-coupled devices and solid-state imaging arrays have been analyzed by designers. Test procedures for their evaluation with the use of precise measuring equipment are discussed in [4, 22, 10, 11]. Various sources of geometric distortions in the video sensor are analyzed in [12], where the term videometrics is coined. In [13] a proposal for determining the horizontal scale parameter and the linejitter prior to calibration is presented. A comprehensive discussion of the linejitter problem, its sources and detection, is given in [12, 2, 3]. The computer vision community has long recognized the importance of geometric calibration in accounting for geometric distortions. The work on radiometric calibration and noise models is limited [8, 3].
1.2 The organization of the paper
In Section 2 we review the video sensor operation, and the geometric and radiometric distortions in the digital image. Systematic and random errors are illustrated in dark images, and spatial nonuniformities of the response to a constant stimulus are illustrated in flat fields. In Section 3 we give a procedure for a radiometric correction (flat-field correction) which we use to "equalize" physically different sensors. In Section 4 we give the problem of disparity map computation in stereo as an example of a multicamera application. Test results are presented which demonstrate a nontrivial improvement in the disparity map computation if the cameras are radiometrically calibrated. In the last section conclusions are drawn and future research directions specified.
2 The video sensor

The video sensor is a camera plus a frame grabber.

2.1 Principles of operation
The imaging process with a CCD camera is based on the physical principle of converting photons into a measurable quantity (voltage). The incoming light from the scene is collected by the lens and directed onto the photosensitive array of identical Sensing ELements (sels [12]). The photons generate charges. For each sel, the charges generated during the exposure time are collected into a charge packet. When suitable clock voltages are applied, a "moving array of potential wells" is created which stores and transfers the charge packets. During transport, first, the charge packets are transferred, in parallel, along vertical CCDs toward a horizontal CCD shift register. From this register, the charge packets are shifted to the output stage where they are measured (converted to voltages) and amplified. The image array content is transferred line-by-line to the horizontal shift register, and from there, a line is pumped out, sel-by-sel, at time periods set by the camera pixel clock. The camera output is an analog signal. A frame grabber works simultaneously with the CCD camera. It reads in the analog signal, usually in composite video format. This format consists of image content periods (corresponding to the content of a single line) separated by horizontal synchronization pulses (HSYNC). A single image is called a frame. It consists of two fields: one of all odd-numbered lines and one of all even-numbered lines. The transmission of the image from camera to frame grabber is in fields. The fields are separated by vertical synchronization pulses (VSYNC). Two consecutive lines in a digital image are actually temporally spaced; the image is interlaced. (Other modes of transmission are possible, but we mention only interlacing since it is most common for the general-purpose cameras we looked at.) The frame grabber samples the analog signal at a sampling frequency set by its A/D converter. The sampled voltage levels are converted to integer gray values. The frame grabber reassembles the 1D signal into a 2D digital image. Following again [12], we call an individual Picture ELement of the digital image a pel. In the text, we use pixel to denote a sel or a pel; which one will be clear from the context.
2.2 Parameterization of the vision system and its uncertainties

We review parameters and factors which characterize the video sensor (camera and digitizer) and limit its performance. Along with this, we point out geometric and radiometric uncertainties and discrepancies in the digital image. These are due to the optics, the CCD camera, the joint operation of the camera and the digitizer, and the discretization process. The geometric distortions related to optics have been analyzed extensively [20]. Discretization effects are at the center of signal processing research [9]. Both of these are important, but outside the scope of our paper. Any undesired factors which cause discrepancies in the output signal we consider noise. The noise may be deterministic, systematic, or random. We focus on systematic and random noise which originate in the CCD camera and the frame grabber. We want to study all noise components. This paper shows that not only random but also systematic errors have to be accounted for.

2.2.1 Camera related parameters and noise

The quantum efficiency (QE) characterizes the physics of the charge accumulation process. It is the ratio of collected electrons to incident photons. The QE depends on the technology used and on the wavelength of the light. A basic limitation of the performance of the CCD is the efficiency with which a charge packet can be transferred from one potential well to the next (with minimal addition or loss of charges). It is necessary that any charge packet pass through the CCD structure in a time period so short that the amount of additional charge picked up on the way is minimal. (This limits the size of the CCD registers and the clock frequencies.) On the other hand, the limited time for transport and the trapping of charges by "surface states" result in a loss of charge during transport. A way of minimizing the loss due to trapping is to keep the potential wells always semi-empty (a fat zero is introduced), so charges from passing packets will not be trapped. For high-performance contemporary CCDs the loss could be 7 electrons for 4096 transfers. A charge transfer efficiency (CTE) of 0.9999995 is reported [11]. The spatial resolution is the ability of the sensor to discriminate between closely spaced points in the image, and the temporal resolution the ability to resolve temporal variations in the incoming signal. The spatial resolution depends on the geometry of the imaging array (the number of sels, their shape, and their organization in the imaging array). The minimal time necessary for the CCD to collect enough charge so that the signal rises above the read noise (reviewed later in this section) puts a bound on the temporal resolution. The dynamic range is a derived parameter defined as the ratio of the full-well capacity (saturation level) to the read noise. It specifies the range between the brightest and darkest levels within a single image. The saturation level depends on the CCD technology (material properties and pixel size). The number of intensity levels in the dynamic range gives the intensity resolution of the camera. The total random noise of the CCD has three major components: photon (shot) noise, read noise, and fixed pattern noise. (In the literature, the term noise is often used to mean both the noise itself and the variance of the noise.) The total noise is dominated by the read noise at low illumination levels, by the fixed pattern noise at high levels, and by the photon shot noise in between [10]. The photon (shot) noise is related to the quantum nature of light, the natural variation of the incident photon flux. The total number of photons emitted by a steady light source over a time interval varies according to a Poisson distribution. Any "shot noise limited" process exhibits a Poisson distribution. An example of the derivation of the distribution can be found in [1]. Because the process is Poisson, the rms (root mean square) error of the photon shot noise in electrons is equal to the square root of the mean signal. The shot noise is always present in the data.
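As a quick numerical illustration of this property (a minimal simulation sketch of ours, not from the paper), one can draw Poisson samples around a mean signal level and check that the rms fluctuation approaches the square root of the mean:

    import numpy as np

    rng = np.random.default_rng(0)
    mean_signal = 10000.0                       # mean collected electrons per sel
    samples = rng.poisson(mean_signal, 100000)  # simulated exposures
    rms = samples.std()                         # shot-noise rms in electrons
    print(rms, np.sqrt(mean_signal))            # both approximately 100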
The read noise is one of the most important characteristics of the sensitivity of the CCD. Factors contributing to the read noise are: background noise, trapping noise, reset noise, charge transfer noise, and output amplifier noise. (In the literature there is no agreement on these names; we follow the definitions from [10], [4], [14], and [16].) The output amplifier noise is the definite lower bound on the read noise. Background noise and trapping noise are primarily technology-related, while reset noise and amplifier noise are much more related to the design of the device and the signal processing after the CCD [22]. The background noise has three main components: dark current, optical or electronic fat zero, and luminescence in the device. Random motions of atoms within the semiconductor caused by heat, even at normal room temperature, generate dark current. These thermal charges are added to the charge packets and cannot be distinguished from the photo-generated ones. In general, dark current doubles with every 8°C increase in temperature [4]. The generation of dark current is a random process modeled by a Poisson distribution. Fat zero is introduced to aid the charge transfer efficiency and the consistency of the quantum efficiency. Optically generated fat zero follows a Poisson distribution. If the fat zero is electrically introduced, on input, it is less than the shot noise. The internal luminescence in the CCD device may have a variety of sources. One source is the clocking of the voltages to the gates which control the potential well levels. It is manifested in an exponential decline in the average line intensity. (This is confirmed by our experiments with dark images.) The phenomenon is explained by the generation of long-wave photons by the clocking of the register; these photons get absorbed in the lines close to the register, thus increasing the background charges [10]. Statistically this noise is modeled by a Poisson distribution. A second source of luminescence is diffusion. It is related to input-output mechanisms. This phenomenon explains the "radiation" of light in the CCD dark image from the position of the output amplifier. The most damaging sources of luminescence are blemishes. These are single sels that saturate fast, as a result of defects in the sel gates [10]. Trapping noise is caused by random variations in the "trapping" states of the CCD: charges get trapped, and they are also kept trapped for some random period. This type of noise is also very much technology-dependent, and for the so-called buried-channel CCD it is very low, on the order of 5 electrons [14]. (One should keep in mind that in astronomy applications, where very low-level signals have to be detected, high-performance cooled digital cameras are usually used. We assume that the numbers cited are for such cameras.) The amplifier noise is associated entirely with the output stage. It may have two components: a white noise (due to thermal generation) and a component introduced by the interaction of charges and "traps" present in the transistor channel. With good manufacturing, this noise can be reduced substantially, "typically about 6 electrons or less" [14]. When a charge packet arrives at the output node, it produces a voltage change. To measure the voltage of a charge packet a reference voltage level is needed. The readout capacitors are reset to a nominal voltage level at each readout cycle. The reset noise relates to the uncertainty in this voltage level. This noise is effectively removed by an electronic processing technique called correlated double sampling [14]. Dark current can originate at different locations in the CCD but has, in all cases, to do with irregularities in the crystal structure of the silicon (metal impurities and crystal defects). This gives rise to the so-called fixed pattern noise. It contributes to the nonuniformity of the individual sel responses to a scene of uniform brightness (imaged over the whole chip). We describe some experiments which illustrate the background noise and the fixed pattern noise.

Figure 1: SONY XC-77RR/DT1451: Image A is a single dark intensity image, and B is a temporal average of 100 dark images. [Panels: the two images, with plots of average intensity versus column and versus row.]
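The 8°C doubling rule quoted above gives a simple scaling estimate of dark current with temperature. A minimal sketch (the reference rate and temperature below are made-up placeholders, not measured values):

    def dark_current(temp_c, ref_rate=25.0, ref_temp_c=20.0, doubling_c=8.0):
        """Dark current in electrons/sel/s, doubling every `doubling_c` degrees C."""
        return ref_rate * 2.0 ** ((temp_c - ref_temp_c) / doubling_c)

    print(dark_current(28.0))  # twice the 20 C rate
    print(dark_current(36.0))  # four times the 20 C rate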
2.2.2 Dark images and background noise
A manifestation of the background noise can be observed in dark images. A dark image is an image taken with no access of light to the video sensor. First the camera is warmed up; next, images are taken with a tight, opaque cap on the lens. Figure 1 shows a typical dark image for the configuration of the black-and-white camera SONY XC-77RR and the frame grabber DT1451. Specifications from the data sheet of the camera: 493(V)x768(H) sels; pixel clock frequency 14.318 MHz, within 1% error. For the frame grabber DT1451: effective digital image size 480(V)x512(H) pels; sampling frequency about 10 MHz. In image A, Figure 1, the dark current random noise is visible; the internal luminescence can also be observed. A systematic component of the background noise is prominent in image B. The vertical stripes are due to a mismatch between the camera pixel clock and the sampling frequencies, and to
"cross-talk" from the clocking of other electronic components. Figure 2 shows statistics for single dark images for the various configurations we looked at. In the left column the variable on the horizontal axis is position and on the vertical axis intensity. Two curves are plotted: the average intensity by column and the average intensity by row. The decline in the average intensity by column due to the internal luminescence is noticeable (another component which contributes to this decline is the DC-restoration problem which originates in the frame grabber, see Section 2.2.4). For configurations (i) and (ii) the average-intensity-by-column curve shows the periodicity due to the mismatch between the pixel clock frequencies and the sampling frequencies. For (iii) the average intensity by column is almost a horizontal line (slowly decreasing); for this configuration the camera pixel clock and sampling frequencies are almost the same. The high-frequency components in the horizontal direction for (i) and (ii) are clearly visible in the corresponding graphs in the right column, which show the power spectra of the average intensity by column. Regarding the behavior of the curve of average intensity by row: in cases (i) and (ii) it is represented by the "pepper like" trail going through the periodic stripes of the average-intensity-by-column curve; for case (iii) it is a nonmonotone graph going across the line representing the average intensity by column. In all three cases, the low-numbered lines, up to 80-100, have higher values, as explained earlier. At the moment we do not have an explanation for the nonmonotone behavior of the average intensity by row for case (iii) (in the dark image itself alternating wide horizontal stripes were observed). One possibility is the DC-restoration process in the frame grabber (see Section 2.2.4). In another test with the same configuration SONY XC-77RR/TIM-40 (but a physically different camera and a different digitizer from the ones presented here) a phase pattern could be observed (see Figure 3). In [3] a similar phase phenomenon is reported, but again the source is not located. Note that each configuration has its own dark image "signature" expressed by the systematic components in the background noise (the offset in average intensity from 0 varies for physically different sensors, even of the same configuration). Another phenomenon worth mentioning is the presence of "ghosts" in some of the dark images we gathered. In some of the temporally averaged dark images: for SONY XC-77RR/DT1451 a checker-board pattern was added to the periodic vertical stripe pattern; for SONY XC-77RR/TIM-40, a brighter rectangle could be noticed even in 10 temporally averaged images (see Figure 3). In the SONY XC-77RR/DT1451 case we recognized an image of the target which we usually use for geometric calibration. The image sensor had been exposed to the target at least an hour prior to the acquisition of the dark images! The brighter rectangle for SONY XC-77RR/TIM-40 results from a scene of a white piece of paper on a black background to which the camera was exposed prior to taking the dark images. When over-exposed, an image is "remembered" on the chip for hours: the quantum efficiency is raised artificially. In both cases the source of the ghost image was charges trapped during use of the sensor prior to taking the dark images. A "trapping state" may remain filled by a charge for a while: "the effective QE for subsequent images will be increased because less trapping of newly generated signal electrons will occur" [10]. To clear up such residual images, some cameras have a "clear" command; in our case simply switching off the camera removes the residual image. Observe also that in Figure 3 there are no high-frequency components in the image; in this configuration the camera pixel clock and sampling frequencies are almost equal (there is a one-to-one correspondence between sels and pels).

Figure 2: Top to bottom: (i) SONY XC-77/DT1451; (ii) SONY XC-77RR/DT1451; (iii) SONY XC-77RR/TIM-40. [Left column: average intensity by column and by row; right column: power spectra of the average intensity by column.]

Figure 3: SONY XC-77RR/TIM-40: a temporal average of 10 dark images. The residual image of the rectangle is visible in the lower left part of the images.
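The per-column and per-row statistics of Figure 2 are straightforward to reproduce. A minimal sketch of the analysis (assuming `dark` is a 2-D array holding an averaged dark image):

    import numpy as np

    def dark_image_stats(dark):
        """Column/row mean intensities and the power spectrum of the column means."""
        col_mean = dark.mean(axis=0)   # average intensity by column
        row_mean = dark.mean(axis=1)   # average intensity by row
        # Peaks in this spectrum reveal periodic vertical stripes caused by a
        # pixel-clock/sampling-frequency mismatch.
        spectrum = np.abs(np.fft.rfft(col_mean - col_mean.mean())) ** 2
        return col_mean, row_mean, spectrum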
2.2.3 Flat fields and fixed pattern noise

We will illustrate the intrinsic nonuniformities of the sels. An image which is the response of the sensor to a scene of uniform brightness (preferably close to the saturation level) is called a flat field. It is used in radiometric calibration. Obtaining a flat field in a standard engineering laboratory is a nontrivial task. We investigated different methods of approximating a flat field. Here we present two independent experiments regarding two instances of the configuration SONY XC-77RR/DT1451: in both cases we used the same physical frame grabber but two different cameras. The flat-field approximations for each video sensor were obtained without the lens, but with a diffusing white glass filter. The indirect illumination was provided by 3 distributed incandescent light bulbs. For the two instances of the camera/frame grabber configuration SONY XC-77RR/DT1451, Figure 4 shows the temporal average of 100 flat fields and their scaled histograms. The images have been transformed via histogram equalization solely for the purpose of enhancing the display. The flat-field approximations for each sensor were obtained independently, under a fixed illumination level. Acquiring the flat fields without the use of a lens removes any nonuniformity which could have arisen from lens vignetting (the fall-off in intensity from the center of the image to the boundary). On the other hand, a bright square frame can be noticed in the flat-field images. We believe that it is due to reflections from the square aperture in front of the sensor (a part of the camera architecture).

Figure 4: SONY XC-77RR/DT1451: averaged flat-field approximations and their histograms (scaled).

2.2.4 Frame grabber related noise

There are two possible types of geometric discrepancies which could arise due to the frame grabber: (i) the individual lines in the digitized image are not aligned properly, which is termed linejitter (it is a result of the failure of the frame grabber to detect the HSYNC); and (ii) the digitizer undersamples or oversamples the "locked" line (the sampling frequency is different from the camera pixel clock frequency), and thus a scaling factor for conversion between pels and sels is introduced. This parameter, the horizontal scale, is one of the objects of geometric calibration. A geometric distortion in the digital image which relates to VSYNC detection is that the top lines of the image, up to about 100, have relatively higher jitter. A radiometric distortion due to interlacing is the shift in gray level between the odd and even fields. Yet another source of radiometric distortion: in the process of digitization, the digitizer has to be able to restore the zero reference level from which to measure (this process is called DC-restoration); failure to detect the reference level correctly results in a shift of the gray values, and thus radiometric distortions. "The fall-off of the sample-and-hold mechanism used in many DC-restoration circuits" leads to a component in the background which is uniform for all images taken with a given configuration and can easily be removed [3]. A detailed study of different frame grabber architectures and synchronization mechanisms, and the related geometric and radiometric distortions, is given in [3]. In [2] three different methods for detecting and accommodating linejitter are proposed. Two of them require very specialized equipment and measuring procedures. The most appropriate method for general computer vision applications is the plumb line method [5], in which an image of vertical stripes is taken and the linejitter and the radial distortions due to the optics are estimated together. In [13] a method for estimating linejitter based on Fourier analysis is presented; it is appropriate for cases in which it is proven that the jitter is no more than a pixel. One of the most noticeable manifestations of distortions due to the camera/frame grabber interface is the systematic error component in the background noise of the dark images discussed earlier, due primarily to the mismatch between the frequency of the camera pixel clock and the digitizer sampling frequency.
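The interlace-related gray-level shift between fields can be estimated directly from any image of a smooth scene. A minimal sketch (our illustration; `img` is assumed to be a 2-D array of gray values):

    import numpy as np

    def field_shift(img):
        """Mean gray-level difference between the odd and even fields."""
        even_field = img[0::2, :]   # lines 0, 2, 4, ...
        odd_field = img[1::2, :]    # lines 1, 3, 5, ...
        return odd_field.mean() - even_field.mean()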
2.2.5 Discretization related noise

The effect of spatial discretization is a central topic in signal processing. Severe distortions (the Moire effect) occur when the sampling theorem is violated (the input signal can be fully reconstructed from the samples only if the input signal frequency, f, is at most half of the sampling frequency). The sel spacing puts a limit on the highest frequency in the input beyond which severe distortions occur, and the area of the CCD chip limits the lowest frequencies which can be detected. A type of radiometric uncertainty is captured by the quantization error. It results from the analog-to-digital conversion of the signal. There are different schemes for conversion. Usually, all gray values are considered equally probable, and the distribution of the quantization error is assumed to be uniform over the interval [-q/2, q/2], where q is the analog-to-digital quantization unit. Under this model the variance of the quantization error is q^2/12. The value of q depends on the dynamic range of the CCD camera and the bit depth of the digitizer.
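The q^2/12 figure follows from the uniform-error model and is easy to verify numerically. A minimal simulation sketch (ours, not from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    q = 1.0                                    # quantization unit (one gray level)
    signal = rng.uniform(0.0, 255.0, 1000000)  # analog values on an arbitrary scale
    error = np.round(signal / q) * q - signal  # quantization error in [-q/2, q/2]
    print(error.var(), q**2 / 12.0)            # ~0.0833 for both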
3 A radiometric correction procedure for a single sensor

The variation in dark images and in flat fields for physically different sensors clearly shows that sensors are not "equal", contrary to the assumptions made in many multisensor vision applications. Given even absolutely the same scene and illumination, due to all the factors we discussed in Section 2, physically different cameras "see" differently. The radiometric calibration procedure by which individual pixel values are corrected is called flat fielding or shading correction [16]. We use the following flat-field correction procedure to correct for nonuniformities due to systematic errors from background noise and fixed pattern noise: (1) Obtain an average dark image D. (2) Obtain an average flat field F at a level close to saturation. (3) Compute the pixel-wise difference A = F - D. (4) Compute the average intensity, m, of A, i.e., m = (1/n) * sum_{i=1}^{n} (f_i - d_i), where n is the total number of pixels in the digital image. Steps 1-4 are executed once for the given sensor. Images acquired with the sensor will be corrected using m and A. The correction steps are as follows: (5) For a specific image I, compute the pixel-wise difference B = I - D. (6) For every pixel i, the corrected value p_i^c is computed by p_i^c = (b_i / a_i) * m.
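A minimal sketch of steps (3)-(6) in array form (assuming the averaged dark image D, averaged flat field F, and input image I are float arrays of equal size; the guard against division by zero is our addition, not part of the procedure above):

    import numpy as np

    def flat_field_correct(I, D, F, eps=1e-6):
        """Flat-field (shading) correction following steps (3)-(6)."""
        A = F - D                           # step 3: pixel-wise sensitivity map
        m = A.mean()                        # step 4: average intensity of A
        B = I - D                           # step 5: dark-subtracted image
        return B / np.maximum(A, eps) * m   # step 6: p_i^c = (b_i / a_i) * m

Steps (1)-(2), the temporal averaging of dark and flat frames, produce D and F once per sensor; the function then normalizes each newly acquired image.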
4 Disparity map computation: an example of a multicamera algorithm

Here we present an example of a multiple-camera algorithm where radiometric correction is important. We study the influence of a radiometric correction of the digital images on a stereo matching algorithm. The disparity map is the first milestone in the algorithm, so we looked at comparisons at that level. For a given stereo pair of images we computed, first, the disparity map; second, the flat-field corrections for the original pair; third, the disparity maps based on the uncorrected and on the flat-field corrected pairs; last, we evaluated the disparity maps.
4.1 Experimental setup
The configuration SONY XC-77RR/DT1461 was used. The two cameras are connected to the same physical frame grabber, and each camera is equipped with a 25 mm lens. The pair of cameras are approximately 80 cm apart and at a verging angle of approximately 40°. The cameras were warmed up for a couple of hours; then they were geometrically calibrated and the intrinsic and extrinsic camera parameters were recovered (using least squares). Averaged dark images and flat fields were obtained several days prior to the experiment (this has to be done just once for a fixed physical sensor); this corresponds to steps 1-4 of the flat-field correction procedure. During flat-field acquisition care should be taken to eliminate the possibility of image residues. Figure 5 shows the temporal average of 100 dark images and the corresponding scaled histograms for the two averaged images. The flat-field approximations are the ones given in Figure 4. When the distributions of the noise sources for the individual sensors are empirically confirmed, the sample size of the images to be averaged can be adjusted accordingly. In any case, the averaged dark image and flat-field acquisition is a one-time, off-line process. The target was a planar white card, a Lambertian reflectance surface. The card is approximately parallel to the vertical image axes of each camera. The illumination was controlled so that no saturation occurs. A stereo pair of raw images was acquired. The matching uses normalized cross-correlation [15] and, for match selection, the forbidden zone constraint [23]. To account for the suppression of high frequencies due to the low-pass filtering of the image (in camera and digitizer), a constant gain k is used in the calculation of image derivatives during the subpixel disparity map computation (see below). The gain is constant for a fixed pair of cameras [19].

Figure 5: SONY XC-77RR/DT1461: averaged dark images and their histograms (scaled).
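For reference, a minimal sketch of the normalized cross-correlation score used by such matchers (our illustrative version; the actual implementation follows [15] and [19]):

    import numpy as np

    def ncc(patch_l, patch_r, eps=1e-9):
        """Normalized cross-correlation of two equally sized image patches."""
        a = patch_l - patch_l.mean()
        b = patch_r - patch_r.mean()
        return (a * b).sum() / (np.sqrt((a**2).sum() * (b**2).sum()) + eps)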
4.2 Tests conducted

First, a flat-field correction was performed on each of the raw images. Second, a subpixel disparity map based on the rectified original stereo pair was computed (we call this the "uncorrected" disparity map). The gain constant k was tuned to an optimal value, 0.71 in this case [19]. Third, using the same rectification and matching algorithms, we computed a subpixel disparity map on the rectified flat-field corrected stereo pair (we refer to this map as "corrected"). Here the gain was k = 1; no tuning was attempted. Figure 6 shows the disparity maps as 3D plots. The full size of the disparity maps is 256x256, since the rectification was done at half resolution. For the purpose of displaying the result, the 3D plots show the maps subsampled at a quarter of the resolution. Some "holes" can be observed in the disparity maps, where the algorithm has failed (matches were rejected). Spikes represent outliers which were erroneously accepted as valid. The "corrected" map has fewer holes and spikes.

Figure 6: Subpixel disparity maps shown as 3D plots. [Panels: map for the original image pair (with tuning) and map for the corrected image pair (no tuning).]
4.3 Evaluation

The two subpixel disparity maps were evaluated by fitting a plane model to each one of them. The comparison between the two disparity maps was then based on the statistics of the residuals. Figure 7 shows the histograms of the residuals in the two cases. The standard deviation of the residuals was 0.1668 for the "uncorrected" map and 0.0679 for the "corrected" one. The experiment was repeated with flat-field pairs at four different average levels, approximately 60, 110, 160, and 220. Subpixel disparity maps without and with flat-field correction were computed based on these, planes were fitted to each one, and the statistics of the residuals were compared. A monotone increase in the dominance of the matching algorithm with flat-field correction over the matching without flat-field correction is observed: from 26% for the low-intensity flat field (level 60) to 59% for the matching with flat-field correction at level 220.

Figure 7: Histograms of the residuals. [Panels: uncorrected and corrected.]
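The evaluation reduces to a least-squares plane fit. A minimal sketch (our illustration; `disp` is assumed to be a 2-D disparity map with NaN marking rejected matches):

    import numpy as np

    def plane_residual_std(disp):
        """Fit z = a*x + b*y + c to the valid disparities; return residual std."""
        ys, xs = np.nonzero(~np.isnan(disp))
        z = disp[ys, xs]
        A = np.column_stack([xs, ys, np.ones_like(xs)])
        coef, *_ = np.linalg.lstsq(A, z, rcond=None)
        residuals = z - A @ coef
        return residuals.std()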
5 Conclusions

We show that accounting for the radiometric variation in the measurements of multiple sensors and "normalizing" the digital images prior to matching significantly improves the results of the disparity map computation for a uniform-brightness scene. This confirms the importance of radiometric calibration, so we intend to address the problem in its full complexity. We plan to use the procedure to help stereo matching locally (at places where it fails due to the uniformity of the patches). We continue our work on deriving an image model for the radiometric distortions and error measurement models which we will use in multisensor vision applications for: (i) "equalizing" the output from different sensors; (ii) estimating the random noise error statistics in the digital images; (iii) propagating the errors through the vision algorithms and deriving a quantitative measure for their performance.

Acknowledgements

This work was supported, in whole or in part, by the U.S. Army Research Office under grant number DAAH04-96-1-0007, the Defense Advanced Research Projects Agency under grant number N00014-92-J-1647, and the National Science Foundation under grant SBR-8920230. We would like to thank Radim Sara for the useful discussions and comments. Thanks to all colleagues who provided dark images.
References
[1] Y. Beers, Introduction to the Theory of Error, Addison-Wesley, 1957.
[2] H. Beyer, "Linejitter and geometric calibration of CCD cameras", ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 45, pp. 17-32, 1990.
[3] H. Beyer, "Geometric and radiometric analysis of a CCD-camera based photogrammetric close-range system", Dissertation, Institut für Geodäsie und Photogrammetrie, Zürich, 1992.
[4] E. Beynon and D. Lamb, eds., Charge-Coupled Devices and Their Applications, McGraw-Hill, 1980.
[5] D. Brown, "Close-range camera calibration", Photogrammetric Engineering, Vol. 31, pp. 855-866, 1971.
[6] H. Ennes, Television Broadcasting: Equipment, Systems, and Operating Fundamentals, Howard W. Sams & Co., Inc., 1971.
[7] W. Förstner, "10 pros and cons against performance characterization of vision algorithms", Workshop on Performance Characterization of Vision Algorithms, Robinson College, Cambridge, 1996.
[8] G. Healey and R. Kondepudy, "Radiometric CCD camera calibration and noise estimation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 3, pp. 267-276, 1994.
[9] B. Jähne, Digital Image Processing: Concepts, Algorithms and Scientific Applications, Springer-Verlag, 1993.
[10] J. Janesick et al., "The future scientific CCD", Proc. SPIE State-of-the-Art Imaging Arrays and Their Applications, San Diego, California, Vol. 501, pp. 2-33, 1984.
[11] J. Janesick et al., "Sandbox CCDs", Proc. SPIE Charge-Coupled Devices and Solid State Optical Sensors V, San Jose, California, Vol. 2415, pp. 2-42, 1995.
[12] R. Lenz and D. Fritsch, "Accuracy of videometry with CCD sensors", ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 45, pp. 90-110, 1990.
[13] R. Lenz and R. Tsai, "Techniques for calibration of the scale factor and image center for high accuracy 3D machine vision metrology", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 5, pp. 713-720, 1988.
[14] I. McLean, Electronic and Computer-Aided Astronomy: From Eyes to Electronic Sensors, Ellis Horwood Limited, England, 1989.
[15] H. Moravec, Robot Rover Visual Navigation, Computer Science: Artificial Intelligence, UMI Research Press, 1980/1981.
[16] Photometrics, Photometrics High Performance CCD Imaging, http://www.photomet.com, 1996.
[17] C. Poynton, A Technical Introduction to Digital Video, John Wiley & Sons, Inc., 1996.
[18] K. Price, "Anything you can do, I can do better (No you can't)...", Computer Vision, Graphics, and Image Processing, Vol. 36, pp. 387-391, 1986.
[19] R. Sara and R. Bajcsy, "Reconstruction of 3-D geometry and topology from polynocular stereo", GRASP Laboratory, University of Pennsylvania, work in progress.
[20] C. Slama, ed., Manual of Photogrammetry, 4th edition, American Society of Photogrammetry, 1980.
[21] D. Snyder et al., "Compensation for readout noise in CCD images", Journal of the Optical Society of America A, Vol. 12, No. 2, pp. 272-283, 1995.
[22] A. Theuwissen, Solid-State Imaging with Charge-Coupled Devices, Kluwer Academic Publishers, 1995.
[23] A. Yuille and T. Poggio, "A generalized ordering constraint for stereo correspondence", MIT Artificial Intelligence Laboratory Memo No. 777, 1984.