Understanding the Systematic and Random Errors in Video Sensor Data

Gerda Kamberova
GRASP Laboratory
Department of Computer and Information Science
University of Pennsylvania
Abstract
The purpose of this report is to help computer vision researchers to understand video sensor data, and hence to better utilize the data in vision algorithms and to evaluate correctly and methodically the results of the algorithms. Vision sensors are complex electronic systems and as such exhibit systematic and random errors. Unfortunately, important specifications regarding cameras are not provided by manufacturers in a standard and unambiguous way. Such specifications are necessary for some scientific and engineering applications. In this report we present the major components of the imaging system, and give the main parameters and noise sources. There is no agreement in the literature (or in manufacturers' documentation) on the nomenclature, definitions, measurement units, and/or the conditions under which these parameters and noise levels are measured. As a general rule, video cameras are for qualitative imaging (this does not apply to scientific and special purpose cameras). In order to perform quantitative measurements, radiometric correction of the data has to be performed. We use a flat field correction procedure, [PH96], to account for systematic errors. We show the positive effect this procedure has on multicamera applications (in particular, on disparity map computation). In addition, we review the noise models and the radiometric correction procedure of Healey and Kondepudy [HK94].
Contents

1 Introduction
2 The Imaging System Overview
  2.1 The Camera
  2.2 The Image Transmission: from the Camera to the Framegrabber
  2.3 The Framegrabber and Analog-to-Digital Converter (ADC)
  2.4 A Camera Model
3 Parameterization of the Vision Sensor and Its Uncertainties
  3.1 Camera Related Noise
    3.1.1 Photon Shot Noise
    3.1.2 Read Noise
    3.1.3 Pattern Noise
  3.2 Camera Related Parameters
    3.2.1 Quantum Efficiency of the CCD
    3.2.2 Spectral Responsivity of the CCD
    3.2.3 Charge Transfer Efficiency of the CCD
    3.2.4 Minimal and Maximal Signals
    3.2.5 Spatial Resolution
    3.2.6 Temporal Resolution
    3.2.7 Signal-to-Noise Ratio (SNR)
    3.2.8 Dynamic Range of the Camera and the Video Sensor
  3.3 Framegrabber Related Parameters and Noise
    3.3.1 Geometric Distortions
    3.3.2 Radiometric Distortions
  3.4 Discretization Related Noise and Distortions
    3.4.1 Aliasing
    3.4.2 Quantization Error
4 Examples of Sensor Parameters and Pattern Noise
  4.1 Background and Pattern Noise in Dark Images
  4.2 Photoresponse Nonuniformities in Flat Fields
  4.3 Parameters Reported in Some Manufacturers' Data Sheets
5 Noise Models and Estimation Procedures of Healey and Kondepudy
  5.1 The Image Model of Theoretical Importance
  5.2 The Image Model Used in the Computations
  5.3 Estimation Procedures
    5.3.1 An Estimation of Total Noise Variance for a Flat Field of "Reasonably" High Level
    5.3.2 Estimation of the Amplifier Gain and the Signal-Independent Noise Variance
    5.3.3 Estimating FPN in Dark Images and Its Variance
    5.3.4 Estimating the Factor Modeling PRNU, K(a, b)
6 Radiometric Correction Procedures
  6.1 Flat-field Correction Used in the Disparity Map Experiment
  6.2 The Radiometric Correction of Healey and Kondepudy
  6.3 Radiometric Correction Methods of Beyer
7 A Disparity Map Computation: an Example of a Multicamera Algorithm
  7.1 Experimental Setup
  7.2 Tests Conducted
  7.3 Evaluation
8 Conclusions
1 Introduction

We study the video sensor as a measuring device, in particular its systematic and random measurement errors. The purpose of this report is to help computer vision researchers: (i) to understand video sensor data, and hence better utilize the data in vision algorithms; and (ii) to evaluate correctly and methodically the results of the algorithms. Our final goal is to present a digital image model accounting for both systematic and random measurement errors. These models will be used in the performance evaluation and characterization of subsequent vision algorithms. We will identify the calibration steps necessary for the joint "equalization" of multiple cameras. These experiments and procedures should be reproducible in a typical vision engineering laboratory.

In most engineering applications in robotics and machine vision it is desirable to guarantee satisfactory accuracy and precision in minimum (real) time at moderate cost. Thus, by defining an appropriate cost functional, a balance among precision/accuracy, monetary cost, and processing time has to be struck.

The most popular sensor in computer vision applications is the analog charge-coupled device camera (CCD camera/solid-state camera) paired with a framegrabber¹. The sensing surface of the camera is an array of photodetecting elements. The imaging process is based on the principle of converting photons into electric current. The main stages in CCD imaging are: (i) the generation and capture of charges induced by the incoming photons; (ii) the collection of these charges into charge packets; (iii) the transport of all charge packets to an output node; (iv) the conversion of the total charge of each packet into an amplified measurable quantity (voltage). The output of the CCD camera is an analog signal. This signal is first amplified by an off-chip amplifier, and then resampled and digitized by a framegrabber. The output of the framegrabber is a digital image², a discrete representation of the scene being imaged (discrete both spatially and in intensity). The digital image is stored in computer memory for subsequent processing by computer vision modules.

Digital images are input data to vision algorithms. When digital images are used as data for inferring accurate and precise information about a real scene, it is important that the video sensor (i.e., CCD camera and framegrabber) be geometrically and radiometrically calibrated. Algorithms for geometric calibration (recovering the intrinsic and extrinsic camera parameters, and their accuracy) have been the subject of extensive research in the computer vision community; see [HS93], [Fau93] for a textbook introduction, [DA89] and [WCH92] for reviews, and [GM96] for recent results. The work on radiometric calibration and noise models of CCD cameras is limited, [HK94]. Radiometric calibration procedures are studied in videometric applications [LF90], photogrammetry [Bey92], astronomy [SHL+95], [Mcl89], and video microscopy [Ino89]. Charge-coupled devices and solid-state imagers are analyzed by designers, and test procedures for their evaluation, with the use of precise measuring equipment, are discussed in [Bar75], [BL80], [JEC+84], [JKE85], [JEW+95]. For a recent, complete, and comprehensive introduction to solid-state imaging and related issues see [The95] and [Hol96]. An in-depth treatment of video signal formats can be found in [Enn71, Poy96]. A short and clear introduction is given in [LF90, Bey90].

¹ In the text we use frame grabber and digitizer as synonyms.
² The digital image is a two-dimensional array of numbers.
When multiple sensors are used in applications requiring hard performance guarantees, correcting for errors and obtaining objective confidence measures for the uncertainty of the results cannot be neglected. The use of vision algorithms with physically different sensors necessitates
the evaluation of the performance of the algorithms given the parameters of the sensors. Empirical tests, validation, and analysis of the robustness of existing systems are necessary [Pri86]. The need for a methodology for performance evaluation and characterization of vision algorithms has been recognized, [For96], [JB91], but the problem has not been addressed fully and with enough depth by researchers in computer vision. The transfer of technology necessitates the validation and robustness analysis of existing systems. The limitations of the algorithms should be specified too. The effect of camera/framegrabber selection on particular vision algorithms has been explored earlier to some degree at the GRASP Laboratory, University of Pennsylvania: the effect of hardware configurations on edge finding algorithms is studied in [And88], some appropriate CCD error models are proposed in [BKM86], and the effect of two particular types of framegrabbers on the accuracy of geometric calibration is analyzed in [Vez96].

This report is structured as follows. In Section 2 we give an overview of the vision system. In Section 3 we discuss major system parameters, including noise in the camera and the camera/framegrabber interface. In Section 4 we illustrate with examples some of the parameters and noise terms introduced in the previous sections. We also give examples of some camera specifications traditionally provided by manufacturers. From these specifications, it is evident that radiometric calibration is a necessary preprocessing step in applications requiring quantitative imaging. In Section 5 we review the image models and noise estimation procedures from the paper of Healey and Kondepudy, [HK94]. In Section 6 we present some radiometric correction procedures. In Section 7 we show that radiometric correction significantly improves results in multicamera applications (in particular, disparity map computation in stereo). In the last section we highlight and summarize the report.
2 The Imaging System Overview

We present the principles of organization and the mechanisms employed in the operation of the CCD camera. Figure 1 (page 3) represents schematically the major components of the imaging system. On the left side of each component, the different noise sources which originate in that component are listed.
2.1 The Camera
The incoming light from the scene is collected by the lens and directed onto the photosensitive elements of the camera. Following [LF90] we call each individual Sensing ELement a sel. All sels have the same geometry and are organized in a rectangular array. The raw data collected at each site is a charge packet which consists of all charges collected at the site during the exposure period. The amount of charge generated depends on the incoming photon flux (illumination) and the physical properties of the sels (quantum efficiency and spectral response, for example). The charge packet includes photoinduced charges and noise. The noise has systematic, deterministic components and also stochastic components.

After the charge packets are collected they have to be transferred to the output stage. There are different mechanisms for transporting the charge packets from the site at which they are collected to the output stage, where they are amplified and converted to voltage. Figure 2 (page 4) illustrates two different CCD chip organizations, interline and frame transfer. The chip organization is reported by the camera manufacturers. In the frame transfer organization the sels themselves are also transporting cells; after the collection of the charge packets, all packets
[Figure 1 (schematic): The imaging system: in, incoming light from a scene; out, a digitized image. The signal path is lens, CCD camera, A/D digitizer, computer memory, with loss of information and systematic and random noise accumulating along the way. Noise sources per component: optics: blur, geometric distortions. CCD camera: photon (shot) noise; fixed pattern noise (systematic); read noise (trapping, charge transfer, reset, output amplifier, background: dark current and internal luminescence). CCD camera and digitizer: horizontal scale error (systematic, due to mismatch of the camera pixel clock and digitizer clock frequencies); line jitter; quantization error; systematic noise introduced by the clocking of electronic components. The output is discretized spatially and in intensity, and noisy.]
[Figure 2: Charge transfer organizations: a) interline; b) frame. a) Interline transfer: next to each column of the photosensitive area is a vertical CCD shift register (shielded from light); charge packets are first moved from the pixels to the vertical shift registers, and then, in parallel, to readout through the horizontal CCD shift register. b) Frame transfer: charge packets are first moved from the image area to a storage area (shielded from light), and then from storage to readout.]
are shifted vertically until the contents of the imaging array are moved into the storage array (which is shielded from light), and then transported from the storage area to the output. In the interline transfer, next to each column of photosensitive sels there is a column of transporting cells (a vertical CCD shift register) capable of holding and shifting a charge packet.

Independent of the particular organization, all transport mechanisms are based on the capability of charge-coupled devices to efficiently move a charge packet from one site to a neighboring site. Physically, a CCD is essentially an array of closely spaced electrodes. When suitable clock voltages are applied to these electrodes, a moving array of potential wells is created which stores and transfers the signal information in the form of charge packets. First, the charge packets from the first (or the last) line of the array are transferred vertically, in parallel, from the vertical shift registers to a horizontal shift register, and the charge packets of all the remaining lines are shifted in parallel along the vertical shift registers. The horizontal shift register contains one line of the camera image at a time. From this register, the charge packets are shifted horizontally to the output stage (node). At the output the charges are converted to voltages and amplified by the on-chip amplifier. After all the packets from the horizontal shift register are pumped out, the next line from the vertical shift registers is moved to the horizontal shift register, and the process of shifting horizontally, converting, and amplifying is repeated. This way, the contents of the CCD array are transferred line-by-line to the horizontal shift register and from there a line is pumped out sel-by-sel, at time periods set by the pixel (output) clock.

Tradeoffs exist in all CCD structures between the desire to handle large signals (which are easy to detect) and the need to avoid the large sel dimensions and clock voltages that are required for large charge packets. Essentially, the amount of charge that can be handled by a CCD is determined by the device geometry and the clock voltages. Line-by-line, sel-by-sel (at time periods set by the pixel clock of the camera), the image is output in analog form³.

³ Some processing of the signal after it exits the CCD array and before it exits the camera is usually done (for example, low-pass filtering and Gamma control). This processing is not discussed in this report. Gamma introduces nonlinearity and is not desirable for quantitative imaging; it should be off.
2.2 The Image Transmission: from the Camera to the Framegrabber
A frame grabber works simultaneously with the CCD camera. It reads in the analog signal, usually in composite video format. This format consists of image content periods (each corresponding to the content of a single line) separated by horizontal synchronization pulses (HSYNC). A single image is called a frame. It consists of two fields: one of all odd-numbered lines and one of all even-numbered lines. The transmission of the image from camera to frame grabber is in fields. The fields are separated by vertical synchronization pulses (VSYNC). Two consecutive lines in a digital image are thus actually temporally spaced: the image is interlaced⁴.

⁴ Other modes of transmission are possible, but we mention only interlacing since it is the most common for the "general purpose" cameras we looked at.
2.3 The Framegrabber and Analog-to-Digital Converter (ADC)
The frame grabber samples the analog signal at a sampling frequency set by its A/D converter. The sampled voltage levels are converted to integer gray values. The frame grabber reassembles the 1D signal into a 2D digital image. Following again [LF90], we call an individual Picture ELement
of the digital image a pel. In the text we use pixel to denote sel or pel; which one will be clear from the context. A row of sels is mapped into a row of pels. Ideally, a pel should correspond to an exact sel. Often this is not the case. Clearly, if the camera pixel clock and the digitizer clock have different periods, the number of sels per line in the camera and the number of pels per row in the digitized image may differ; the resolution of the camera and of the digitized image will also differ (we explain this later in more detail). In digital cameras the A/D conversion is done in the camera system, so there is a one-to-one correspondence between sels and pels. In this report we will not discuss digital cameras (they are more reliable and advanced, but also expensive, and for that reason not yet very popular in computer vision applications).
2.4 A Camera Model
The camera output is analog; in [Hol96] the following model is given:

V_camera = n_e G G₁ q / C,

where q = 1.6×10⁻¹⁹ coulombs is the electron charge, G and G₁ denote the gains of the on-chip and off-chip amplifiers, respectively, n_e is the total number of electrons in the charge packet, and C is the capacitance of the output node. The device output conversion gain (OCG), Gq/C, is measured in units of V/electron. "Charge conversion values typically range from 0.1 µV/e⁻ to 10 µV/e⁻", [Hol96]. A charge packet contains photoelectrons and also "noise" electrons (thermally generated, induced by cross talk between electronic components, or present simply because of the physics of the device). The camera output depends on the illumination (the type of source and the wavelength, λ, of the photons), the optics (f-number, aperture, focal length, magnification), and the detector quantum efficiency (generated electrons per photon). Figure 3 (page 6) represents the transfer of the signal from the scene, through the detector (characterized by its quantum efficiency/spectral responsivity), the on-chip amplifier, the off-chip amplifier, and the A/D converter.

[Figure 3: From photon input to camera output: the number of electrons n_e at the detector, the voltage from the on-chip amplifier (gain G) or from the off-chip amplifier (gain G₁), and the digital number, DN, after digitization by the ADC.]

The signal is a measurable quantity, while the number of photoelectrons, n_pe, is calculated using the model [Hol96]

n_pe = ∫[λ₁,λ₂] R_q(λ) · (π A_D t_int L_q(λ, T)) / (4 F² (1 + M_optics)²) · τ_optics(λ) T_atm dλ,
where A_D is the area of the detector, L_q is the illumination/radiance, R_q is the quantum efficiency/spectral responsivity, t_int is the integration time, M_optics is the magnification of the optics⁵, F is the f-number⁶, T_atm is the transmittance of the atmosphere, τ_optics is the transmittance of the optical system, and T is the absolute temperature.

⁵ M_optics = the distance, R₂, from the detector to the lens divided by the distance, R₁, from the source to the lens.
⁶ For a circular aperture F = fl/D, where fl is the effective focal length, 1/fl = 1/R₁ + 1/R₂, and D is the diameter of the aperture.
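To make the model concrete, here is a minimal numerical sketch of the integral in Python. All spectral curves and parameter values below are invented placeholders for illustration; they are not values from [Hol96] or from any data sheet.

    import numpy as np

    # Hypothetical spectral data over 0.4-0.9 um (assumed values for illustration).
    wl = np.linspace(0.4e-6, 0.9e-6, 200)                # wavelength [m]
    Rq = 0.4 * np.exp(-((wl - 0.65e-6) / 0.15e-6) ** 2)  # quantum efficiency [e-/photon]
    Lq = 1.0e23 * np.ones_like(wl)       # photon radiance [photons/(s m^2 sr m)] (assumed)
    tau_optics = 0.9                     # transmittance of the optics (assumed)
    T_atm = 1.0                          # atmospheric transmittance (lab scene)

    A_D = (8e-6) ** 2                    # detector (sel) area [m^2] (assumed)
    t_int = 1 / 60.0                     # integration time [s]
    F = 5.6                              # f-number
    M = 0.01                             # optical magnification (assumed)

    integrand = Rq * (np.pi * A_D * t_int * Lq) / (4 * F**2 * (1 + M)**2) \
                * tau_optics * T_atm
    n_pe = np.trapz(integrand, wl)       # photoelectrons per sel for these inputs
    print(f"n_pe = {n_pe:.3g} electrons")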
3 Parameterization of the Vision Sensor and Its Uncertainties

We review parameters and factors which characterize the video sensor⁷ (camera and digitizer) and limit its performance. Along with this, we point out geometric and radiometric uncertainties and discrepancies in the digital image. These are due to the optics, the CCD camera, the joint operation of the camera and the digitizer, and the discretization process. The geometric distortions related to optics have been analyzed extensively [Sla80]. Discretization effects are at the center of signal processing research [Jah93]. Both of these are important, but outside the scope of our report. Any undesired factor which causes discrepancies in the output signal we consider noise. The noise may be deterministic (systematic) or random. We focus on systematic and random noise which originate in the CCD camera and the frame grabber. We want to study all noise components. This report shows that not only random but also systematic errors have to be accounted for. "The magnitude of each noise component must be quantified and its effect understood. ... predicted system performance may deviate significantly from actual performance if significant noise is present. It is essential to understand what limits the system performance, so that intelligent improvements are made." [Hol96], page 90.

⁷ The video sensor is a camera plus a frame grabber.
3.1 Camera Related Noise
The total random noise⁸ of the CCD has three major components: photon (shot) noise, read noise (which itself has many different components), and pattern noise. Often the total noise is modeled as the quadrature sum of shot noise, read noise, and pattern noise:

< n_camera > = √( < n²_shot > + < n²_read > + < n²_pattern > ),

where < n²_... > denotes the variance of the corresponding noise component, and < n_... > the standard deviation. A graph of the noise versus the signal level is called a photon transfer curve (introduced by Janesick). From it the noise levels and saturation levels can be derived and the dynamic range specified ([Hol96], p 115). The total noise is dominated by the read noise at low signal levels, by the pattern noise at levels close to saturation, and by the photon shot noise in between [JEC+84].

⁸ In the literature the noise reported is rms in electrons.
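As a minimal sketch, the quadrature sum above can be computed as follows; the rms electron counts are made-up values for illustration only.

    import math

    def total_noise(n_shot, n_read, n_pattern):
        """Combine independent noise components (rms electrons) in quadrature."""
        return math.sqrt(n_shot**2 + n_read**2 + n_pattern**2)

    # Illustrative values only: shot noise dominates at mid signal levels.
    print(total_noise(n_shot=100.0, n_read=30.0, n_pattern=20.0))  # ~106.3 e- rms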
3.1.1 Photon Shot Noise
This noise is related to the quantum nature of light: the natural variation of the incident photon flux. The total number of photons emitted by a steady light source over a time interval varies according to a Poisson distribution. Any "shot noise limited" process exhibits a Poisson distribution. An example of the derivation of the distribution can be found in [Bee57]. Because the process is Poisson, the rms (root mean square) error of the photon shot noise in electrons is equal to the square root of the mean signal. The shot noise is always present in the data.
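The square-root law is easy to confirm by simulation; a sketch follows (the mean level of 10,000 electrons is an arbitrary choice).

    import numpy as np

    rng = np.random.default_rng(0)
    mean_signal = 10_000                             # mean photoelectrons per sel
    samples = rng.poisson(mean_signal, size=100_000)
    print(samples.std())                             # ~100 = sqrt(10000) e- rms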
3.1.2 Read Noise
The read (floor) noise is one of the most important parameters of the sensitivity of the CCD. Factors contributing to the read noise are: background noise, trapping noise, reset noise, charge transfer noise, and output amplifier noise⁹. The output amplifier noise is the definite lower bound on the read noise. Background noise and trapping noise are primarily technology related, while reset noise and amplifier noise are much more related to the design of the device and the signal processing after the CCD [The95].

⁹ In the literature, there is no agreement on these names. We follow definitions from [JEC+84], [BL80], [Mcl89], [Hol96], and [PH96].
Background noise.
The background noise has three main components: dark current, optical or electronic fat zero, and luminescence in the device.

1. Dark current: Random motions of atoms within the semiconductor, caused by heat even at normal room temperature, generate dark current. These thermal charges are added to the charge packets and cannot be distinguished from the photo-generated ones. In [Hol96], pp 74-75, the following expression for computing the number of dark current electrons is given:

n_dark = J_D A_D t_int / q;

the dark current density, J_D, depends on the material and the temperature, and is approximately proportional to exp(−E_G / (G k T)), where k = 1.38×10⁻²³ J/K is the Boltzmann constant, E_G is the energy gap, 1 < G < 2 is the dark current factor, and T is the absolute temperature; A_D is the detector area, and q the electron charge. Generation of dark current is a random process modeled by a Poisson distribution. Under this model, the dark current shot noise in rms electrons is √n_dark. In general, dark current doubles with every 8°C increase in temperature [BL80].

Example 3.1 [Hol96]. Given J_D = 10 nA/cm², a cell size of 8 µm × 8 µm, and t_int = 0.0167 s, the formula gives 668 dark current electrons. The dark current noise is approximately 26 rms electrons. If we want to achieve 5 electrons rms dark current noise (i.e., 25 dark current electrons), the array has to be cooled:

25/668 = n_dark,cooled / n_dark,initial = 2^(−(T_initial − T_cool)/8),

and solving, T_initial − T_cool ≈ 38.1°C, so the array has to be cooled by about 38.1°C relative to the temperature at which the dark current was initially measured (usually 20°C).
In some cameras a certain number of pixels are shielded from light and are used to estimate the average dark current, which is then subtracted from the signal. This is not very precise since the dark current varies from pixel to pixel, and the average dark current over the shielded (so called dark) pixels differs from the average dark current over the "active" sels (removing the average does not remove the variability in dark current). What removing the average dark current does achieve is full use of the dynamic range of the sensor, but the shot noise in the dark current electrons cannot be removed; it is always present. Dark current generation is pixel dependent. By averaging dark images a systematic component, the so called fixed pattern in the dark images, is made prominent. This component can be subtracted from the images, thus removing the systematic component due to dark current generation, but the variation due to shot noise in the dark current cannot be removed; it is always present.

2. Fat zero: Fat zero is introduced to aid the charge transfer efficiency and the consistency of the quantum efficiency. Optically generated fat zero follows a Poisson distribution. If the fat zero is electrically introduced, on input, its noise is less than shot noise.

3. Internal luminescence: The internal luminescence in the CCD device may have a variety of sources. One source is the clocking of the voltages to the gates which control the potential well levels. It is manifested as an exponential decline in average line intensity. (This is confirmed by our experiments with dark images.) The phenomenon is explained by the generation of long wave photons by the clocking of the register; these photons are absorbed in the lines close to the register, thus increasing the background charges [JEC+84]. Statistically this noise is modeled by a Poisson distribution. A second source of luminescence is diffusion. It is related to input-output mechanisms. This phenomenon explains the "radiation" of light in the CCD dark image from the position of the output amplifier. To prevent the contribution to dark current from the output register and amplifier, some cameras have additional dark pixels along the horizontal shift register and the amplifier. The most damaging sources of luminescence are blemishes. These are single sels that saturate fast, as a result of defects in the sel gates [JEC+84].
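A short sketch reproducing the arithmetic of Example 3.1 above (values are those given in the example; the report's 38.1°C differs from the value printed below only by rounding):

    import math

    q = 1.6e-19          # electron charge [C]
    J_D = 10e-9          # dark current density [A/cm^2]
    A_D = (8e-4) ** 2    # (8 um)^2 cell expressed in cm^2
    t_int = 0.0167       # integration time [s]

    n_dark = J_D * A_D * t_int / q
    print(n_dark)                 # ~668 electrons
    print(math.sqrt(n_dark))      # ~26 e- rms dark current shot noise

    # Cooling needed for 5 e- rms (i.e. 25 dark electrons), doubling every 8 C:
    dT = 8 * math.log2(n_dark / 25)
    print(dT)                     # ~38 C of cooling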
Trapping noise.
Trapping noise is caused by random variations in the "trapping" states of the CCD: charges get trapped, and they are kept trapped for some random period. This type of noise is also very much technology dependent; for the so called buried-channel CCDs it is very low, on the order of 5 electrons [Mcl89]¹⁰.

¹⁰ One should keep in mind that in astronomy applications, where very low level signals have to be detected, high-performance cooled digital cameras are usually used. We assume that the numbers cited are for digital cameras.
Reset (kTC) noise.

When a charge packet arrives at the output node, it produces a voltage change. To measure the voltage of a charge packet, a reference voltage level is needed. The readout capacitors are reset to a nominal voltage level at each readout cycle. The reset noise relates to the uncertainty in this voltage level. The rms reset noise in electrons is

< n_reset > = √(kTC) / q  e⁻ rms,

where q is the electron charge, k is the Boltzmann constant, T is the absolute temperature, and C is the capacitance. The reset noise represents the uncertainty at the measuring stage. Reducing the capacitance reduces the reset noise (the output conversion gain, Gq/C, also increases). Cooling also reduces the reset noise, but it is not as effective as it is for dark current generation. The reset noise is effectively removed by an electronic processing technique called correlated double sampling [Mcl89].

Amplifier noise.

The amplifier noise is associated entirely with the output node. It may have two components: a white noise (due to thermal generation) and a component introduced by the interaction of charges and "traps" present in the transistor channel (called 1/f noise). An amplifier noise component is associated with the on-chip amplifier; there may also be a component associated with the off-chip amplifier (if such an amplifier is present). With good manufacturing, this noise can be reduced substantially, "typically about 6 electrons or less" [Mcl89].
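As a numeric illustration of the reset (kTC) noise formula above, evaluated at room temperature; the 0.1 pF node capacitance is an assumed, typical-order value, not a figure from this report:

    import math

    k = 1.38e-23     # Boltzmann constant [J/K]
    q = 1.6e-19      # electron charge [C]
    T = 293.0        # absolute temperature [K]
    C = 0.1e-12      # output node capacitance [F] (assumed value)

    n_reset = math.sqrt(k * T * C) / q
    print(n_reset)   # ~126 e- rms before correlated double sampling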
3.1.3 Pattern Noise
Both the fixed pattern noise in the dark current and the photoresponse nonuniformity are called pattern noise. They are manifested as spatial nonuniformities, most prominent in averaged dark images and averaged flat fields (see Sections 4.1 and 4.2). The pattern noise contributes to the nonuniformity of the individual sel responses to a scene of uniform brightness. We use a flat fielding procedure to correct for the pattern noise; this takes care of the systematic components in the noise (see the sketch after the items below).

1. Fixed Pattern Noise (FPN) of the Dark Current: Dark current can originate at different locations in the CCD but has, in all cases, to do with irregularities in the crystal structure of the silicon (metal impurities and crystal defects). These give rise to the so called fixed pattern noise in the dark current. This fixed pattern noise is signal-independent and additive.

2. Photoresponse Nonuniformities (PRNU) of the CCD Array: Differences in the responsivity of individual pixels in the presence of light lead to photoresponse nonuniformity. This is also a type of pattern noise, but it is signal dependent and is modeled as a multiplicative factor on the photoelectron shot noise ([Hol96], p 113),

< n_PRNU > = U n_pe,        (3.1)

where U varies with the pixel.
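As a preview of the flat-fielding idea detailed in Section 6, a minimal correction sketch: an averaged dark image estimates the additive fixed pattern, and an averaged flat field estimates the multiplicative PRNU gain. The array names are hypothetical, and this is only one common variant of the procedure.

    import numpy as np

    def flat_field_correct(raw, dark_avg, flat_avg):
        """Remove additive fixed pattern (dark) and multiplicative PRNU (flat)."""
        gain = flat_avg - dark_avg        # per-pixel responsivity pattern
        gain = gain / gain.mean()         # normalize to unit mean gain
        return (raw - dark_avg) / gain

    # raw, dark_avg, flat_avg are 2D arrays; the averages are temporal averages
    # of many dark frames and many flat field frames, respectively.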
3.2 Camera Related Parameters
The following parameters are commonly used to measure and compare the performance of CCD arrays: spectral response, minimum signal, maximum signal, dynamic range, pixel to pixel uniformity, and output conversion gain. Other parameters, for full characterization, include the spectral quantum efficiency, the charge transfer efficiency, full well capacity, linearity, pixel nonuniformity, and signal-to-noise ratio. Read noise, which determines the lowest level of detectable signal, is an important parameter and was discussed in Section 3.1.2. For consumer applications the read noise, full well capacity, and responsivity are usually of most interest, [Hol96]; still, even these are not reported by all manufacturers.
3.2.1 Quantum Efficiency of the CCD
The quantum efficiency (QE) characterizes the physics of the charge accumulation process. It is the ratio of collected electrons to incident photons. The QE depends on the technology used and the wavelength of the light: it depends on the gate voltages, the thickness of the material, and the wavelength. For an ideal material the quantum efficiency is one. In real cases it is between zero and one. For the same material thickness the quantum efficiency decreases with the wavelength. Typical quantum efficiency is around 0.3 to 0.6 for wavelengths between 0.6 µm and 0.9 µm. The so called back-side illuminated CCDs have higher quantum efficiency, but their manufacturing process is complicated and delicate, so they are used for scientific applications where high quantum efficiency is a requirement (for such devices a quantum efficiency of 0.85 is reported for wavelengths between 0.5 and 0.8 µm, [Hol96]).
3.2.2 Spectral Responsivity of the CCD
CCDs in general have good linearity (the number of photoelectrons generated is linear in exposure, i.e., illumination times integration time); thus a linear model is often a good approximation for the CCD charge accumulation process. The spectral responsivity of the CCD array is related to the quantum efficiency, and depends on the illumination and the quantum efficiency. The average responsivity, R_ave, is the slope of the transformation from input to output signal (under the linear model); see Figure 4 (page 11).

[Figure 4: Imaging transfer function (output signal versus exposure, with a dark current offset and a maximal signal level). SEE is the exposure which saturates the potential wells; it gives the maximal signal. R_ave = (Max signal)/SEE.]

Sometimes the peak average responsivity is reported, which occurs for the specific peak wavelength, λ_p: R_ave ≈ (G η λ_p A_D)/(1.24 C), where η is the quantum efficiency. Another feature of contemporary cameras is an anti-blooming drain. When a potential well fills, charges may spill into neighboring sels in the column (spills between columns are prevented by special shields). This process is called blooming. To prevent blooming, drains can be attached to each sel or to each column. When anti-blooming drains are used, the gain profile changes from linear to piecewise linear. The slope changes close to the saturation level (a gain knee appears), so the linearity model does not hold ([Hol96], pp 78-79).
Holst, [Hol96], cautions that in order to obtain the transformation curve, most often the exposure is varied (illumination times integration time); during these experiments the source should not be changed (the same light bulb should be used, for example, but neutral density filters can be used to change the source intensity). It is important not to blindly compare the average responsivity of different devices, since the spectral quantum efficiency differs among devices. Comparing arrays based on spectral responsivity makes sense if the illumination conditions are similar, and also similar to the calibration conditions; otherwise responsivity should be used only as a qualitative descriptor of array performance. A "sensitivity" parameter is mentioned in some camera data sheets. Unfortunately, there is no unique definition of sensitivity and its related units. It is generally understood that the sensitivity depends on the spectral quantum efficiency and the noise level. At high signal levels the sensitivity is proportional to the exposure (thus the quantum efficiency), and at low signal levels it is bounded by the noise. Thus "sensitivity", if not clarified, is also a qualitative parameter, not a reliable ground for comparing different cameras. An evaluation of a camera should be done with an application in mind.
3.2.3 Charge Transfer Efficiency of the CCD
A basic limitation on the performance of the CCD is the efficiency with which a charge packet can be transferred from one potential well to the next (with minimal addition or loss of charges). It is necessary that any charge packet pass through the CCD structure in a time period so short that the amount of additional charge picked up on the way is minimal. (This limits the size of the CCD registers and the clock frequencies.) On the other hand, the limited time for transport and the trapping of charges by "surface states" result in loss of charge during transport. A way of minimizing the loss due to trapping is to keep the potential wells always semi-empty (a fat zero is introduced), so charges from passing packets will not be trapped. Typical CTE for consumer-application CCDs is about 0.9999. For high-performance contemporary CCDs the loss can be as low as 7 electrons per 4096 transfers; a charge transfer efficiency (CTE) of 0.9999995 is reported [JEW+95]. The net efficiency is CTEⁿ, where n is the maximal number of transfers a packet may undergo. CTE depends on the charge packet size: it decreases for small packets (due to trapping), and it also decreases for near maximal size packets (due to spills) ([Hol96], page 81).
3.2.4 Minimal and Maximal Signals
Both the minimal and maximal signals are technology dependent and vary with the CCD architecture. The minimal signal is the one which produces a signal-to-noise ratio of one [Hol96], or, equivalently, a signal equal to the noise level in rms electrons. The minimal signal is sometimes reported as the noise equivalent exposure (NEE). The read noise puts a lower bound on the minimal signal. NEE should not be used for comparing arrays with different architectures [Hol96]. The minimal illumination implies a signal-to-noise ratio of one. The definition of minimal signal also varies with the author.

The maximal signal is the one which fills up the potential wells (saturates them). It is called the saturation equivalent exposure (SEE). The size of the well is directly related to the size of the sel and the CCD architecture (back illuminated CCDs have smaller wells). In [PH96], for the Photometrics digital camera, it is reported that the full well capacity is 800 times the pixel area in micrometers. In the presence of anti-blooming, the "white-clip" level is taken as the maximum signal. In the case of minimal dark current,

SEE = V_max / R_ave.

The saturation level depends on the CCD technology (material properties and pixel size); it is independent of the noise. The maximal signal for the camera corresponds to a full well,

V_max = G₁ G q N_well / C,

where G and G₁ are the gains of the on-chip and off-chip amplifiers, q is the electron charge, N_well is the full well capacity, and C is the capacitance of the measuring node. The number of electrons is inversely proportional to the square of the f-number of the optics, thus the f-number must be specified when reporting minimal or maximal signals. Usually the minimal and maximal signals are reported with different optics f-numbers: 5.6 for the maximum and 1.4 for the minimum, if not otherwise specified. For consumer/industrial cameras the integration time is selected to comply with the broadcasting standards (i.e., 1/60 s for the EIA 170 broadcasting standard). By varying the integration time or the iris, the signal levels may be changed. The minimal/maximal parameters depend on the illumination and the spectral response of the detector. It is important that the source color temperature be reported (most often an incandescent source of color temperature 2856 K is used, but this does not have to be the source used by the manufacturers). The reported parameters minimal signal/maximal signal/sensitivity/illumination are quantities whose definitions vary with the author; thus, cameras cannot be compared blindly based on these parameters.
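A small sketch evaluating the saturation voltage from the full-well model above; the gains, well capacity, and capacitance are assumed round numbers, not manufacturer figures.

    q = 1.6e-19          # electron charge [C]
    G, G1 = 1.0, 5.0     # on-chip and off-chip amplifier gains (assumed)
    N_well = 300_000     # full well capacity [electrons] (assumed)
    C = 0.1e-12          # output node capacitance [F] (assumed)

    V_max = G1 * G * q * N_well / C
    print(V_max)         # ~2.4 V at the camera output for these values

    R_ave = 2.0          # average responsivity [V per unit exposure] (assumed)
    SEE = V_max / R_ave  # saturation equivalent exposure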
3.2.5 Spatial Resolution
The resolution of an imaging sensor is its ability to resolve spatial variations in the incoming signal: the sensor's ability to discriminate between closely spaced points in the image. The spatial frequency of an image focused on the device and the modulation transfer function (MTF) of the output are standard ways of characterizing the resolution. Spatial resolution depends on the geometry of the sensor: the number of pixels, their shape, and the organization of the pixels in the imaging array. The resolution is specified by measuring the highest spatial frequency which can be distinguished by the device for a given contrast. One method employed for determining spatial resolution is the shrinking raster method [Mad96]. In this method, a pattern of high-contrast vertical bars is imaged on the CCD. The distance between the bars is gradually decreased until a human observer cannot distinguish the bars as separate entities in the image. The response of the imaging system to changing spatial frequencies is characterized by its modulation transfer function, defined as the response of the system to a sinusoidally changing spatial frequency f_sig of the input signal. The individual sels sample the optical signal. Let Δx denote the sel width, P_pix the horizontal pitch (i.e., the center to center horizontal distance between sels), and f_N the Nyquist frequency; then the geometric modulation transfer function is

MTF_G = sinc( (π/2) (Δx/P_pix) (f_sig/f_N) ),

where sinc(u) = sin(u)/u and f_sig is the signal frequency [The95]. A high MTF_G may not always be an advantage, though: it leads to aliasing effects, which are out of the scope of our discussion. For a treatment of modulation transfer function theory for the video sensor see [Hol96], Chapters 9 and 10.
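A sketch evaluating the geometric MTF up to the Nyquist frequency, assuming an illustrative pitch and sel width (not parameters of any camera discussed here):

    import numpy as np

    P_pix = 11e-6        # center-to-center sel pitch [m] (assumed)
    dx = 9e-6            # photosensitive sel width [m] (assumed)
    f_N = 1 / (2 * P_pix)                  # Nyquist frequency [cycles/m]

    f_sig = np.linspace(1.0, f_N, 100)     # signal frequencies up to Nyquist
    u = (np.pi / 2) * (dx / P_pix) * (f_sig / f_N)
    mtf_g = np.sin(u) / u
    print(mtf_g[-1])     # response at Nyquist, ~0.75 for these values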
3.2.6 Temporal Resolution
The temporal resolution of an imaging sensor is its ability to resolve temporal variations in the incoming signal, and thus relates to the sensitivity of the camera to motion. It depends on the time it takes for the CCD camera to collect a charge packet (with a high enough number of signal carriers), and the time it takes to transfer the charges to the output stage, convert them, and pump them out of the CCD. In addition, the temporal resolution depends crucially on the time it takes for the framegrabber to digitize the image and to store it in the computer memory for further processing. The bottleneck of the system is the memory transfer, which also limits the size of the image. The minimal time necessary for the CCD to collect enough charge so that the signal rises above the read noise is an ultimate limit on the temporal resolution of the camera output. The temporal resolution puts a bound on the reaction time of a robotic system relying on vision input. The temporal resolution is specified by the maximum number of frames per second which the system can acquire. Higher resolution means more frames, faster. Each CCD has an optimal operation range in which the readout speed is maximal and the read noise is minimal.
3.2.7 Signal-to-Noise ratio (SNR)
The array SNR is the ratio of the number of photoelectrons (signal dependent electrons) to the system noise:

SNR = n_pe / < n_sys >.

At low levels the SNR increases with the signal (constant read noise); at mid levels the photon shot noise limits the SNR (SNR = √n_pe, with maximum possible value SNR = √N_well); and at high levels the SNR approaches 1/U (see (3.1)). For low signal levels and small well capacity the SNR equals the dynamic range.
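A sketch of the three SNR regimes under the noise terms introduced above; the read noise floor and PRNU factor are assumed values.

    import numpy as np

    n_read = 30.0        # read noise floor [e- rms] (assumed)
    U = 0.01             # PRNU factor (assumed, 1% nonuniformity)

    n_pe = np.logspace(1, 5, 200)                         # signal [electrons]
    noise = np.sqrt(n_read**2 + n_pe + (U * n_pe)**2)     # read + shot + PRNU
    snr = n_pe / noise
    print(snr[-1])       # at high signal, snr approaches 1/U = 100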
3.2.8 Dynamic Range of the Camera and the Video Sensor
The dynamic range is a derived parameter defined, for the CCD array, as the ratio of the full-well capacity (saturation level) to the read noise in rms electrons (which is the ultimate lower bound on the noise). For the video sensor, the dynamic range specifies the range between the brightest and darkest levels within a single image. The dynamic range of the CCD array can thus be defined as

DR = SEE/NEE = V_max/V_noise = N_well / < n_read >,

where SEE is the saturation equivalent exposure and NEE is the noise equivalent exposure (see [Hol96], p 102); N_well is the full well capacity in electrons, and < n_read > is the read noise.

The dynamic range of the video sensor (camera and framegrabber) is the ratio of the full well capacity to √( < n²_read > + < n²_ADC > ) (read noise plus quantization (ADC) noise). Often the dynamic range is expressed in decibels,

DR = 20 log (SEE/NEE) dB.

The number of intensity levels in the dynamic range gives the intensity resolution of the camera. The dynamic range is used to restrict the bit depth of the ADC, so that low contrast targets are detectable. The number of intensity levels in the dynamic range of the whole system is the intensity resolution (or gain) of the video sensor. It is directly related to the saturation level (full well capacity) and the bit depth of the framegrabber. The saturation level divided by 2^N (where N is the bit depth of the ADC) gives the quantization step. The video sensor dynamic range is restricted by the quantization noise when the video sensor noise is dominated by the quantization noise. As will be discussed later, under a uniform distribution model the variance of the quantization noise equals the quantization step squared divided by 12. For an N-bit ADC, the maximum signal is 2^N times the quantization step; thus the video sensor dynamic range is 2^N √12, so an 8-bit ADC cannot have a dynamic range greater than 59 dB. There are different tradeoffs which relate spatial, temporal, and intensity resolution, which we will not discuss further here. Again, as with the other parameters, the dynamic range varies greatly with conditions: the optics (f-number of the lens, iris setting), the integration time, the spectral characteristics of the source of illumination, and the spectral response of the CCD array. To be conclusive, when the dynamic range is reported, the conditions under which it was measured should be made clear.
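The quantization-limited figure quoted above is easy to verify:

    import math

    N = 8
    dr_linear = 2**N * math.sqrt(12)     # max signal / quantization noise rms
    dr_db = 20 * math.log10(dr_linear)
    print(dr_db)                         # ~59 dB, as stated for an 8-bit ADC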
3.3 Framegrabber Related Parameters and Noise

3.3.1 Geometric Distortions
There are two possible types of geometric discrepancies which can arise due to the frame grabber: (i) the individual lines in the digitized image are not aligned properly, termed linejitter (a result of the failure of the frame grabber to detect the HSYNC); and (ii) the framegrabber undersamples or oversamples the "locked" line (the sampling frequency is different from the camera pixel clock frequency), and thus a scaling factor for conversion between pixels and sels is introduced. This parameter, the horizontal scale, is one of the objects of geometric calibration. A geometric distortion in the digital image which relates to VSYNC detection is that the top lines in the image (up to 100) have relatively higher jitter. A detailed study of different frame grabber architectures and synchronization mechanisms, and the related geometric and radiometric distortions, is given in [Bey92]. In [Bey90] three different methods for detecting and compensating for linejitter are proposed. Two of them require very specialized equipment and measuring procedures. The most appropriate method for general computer vision applications is the plumb line method [Bro71], in which an image of vertical stripes is taken and the linejitter and the radial distortions due to optics are estimated together. In [LT87] a method for estimating linejitter based on Fourier analysis is presented; it is appropriate for cases in which it is proven that the jitter is no more than a pixel.
3.3.2 Radiometric Distortions
One radiometric distortion effect of interlacing is the shift in gray level between the odd and even fields. Yet another source of radiometric distortion: in the process of digitization, the framegrabber has to be able to restore the zero reference level from which to measure (this process is called DC-restoration); failure to detect the reference level correctly results in a shift in the gray values, and thus radiometric distortions. "The fall-off of the sample-and-hold mechanism used in many DC-restoration circuits" leads to a component in the background which is uniform for all images with a given configuration, and which can easily be removed [Bey92].

One of the most noticeable manifestations of distortions due to the camera/frame grabber interface is a systematic error component in the background noise of the dark images, as discussed earlier, due primarily to the mismatch between the frequency of the camera pixel clock and the digitizer sampling frequency.
3.4 Discretization Related Noise and Distortions

3.4.1 Aliasing
The effect of spatial discretization is a central topic of signal processing. Severe distortions (Moiré effect) occur when the sampling theorem¹¹ is violated. The sel spacing puts a limitation on the highest frequency in the input beyond which severe distortions occur, and the area of the CCD chip limits the lowest frequencies which can be detected.

¹¹ The input signal can be fully reconstructed from the samples if the input signal frequency, f, is at most half of the sampling frequency.
3.4.2 Quantization error
A type of radiometric uncertainty is captured by the quantization error. It results from the analog-to-digital conversion of the signal. There are different schemes for conversion. Usually, all gray values are considered equally probable, and the distribution of the quantization error is assumed to be uniform over the interval [−q/2, q/2], where q is the analog-to-digital quantization unit. Under this model the variance of the quantization error is q²/12. The value of q depends on the dynamic range of the CCD camera and the bit depth of the digitizer; its magnitude is the voltage corresponding to the least significant bit in the ADC scale. For an ADC with N bits, q = V_max/2^N, and in terms of the array output

< n_ADC > = (C / (G G₁ q_e)) · q/√12  e⁻ rms,

where q_e denotes the electron charge. If we want to pick an ADC to match the dynamic range of the array, V_max corresponds to the full capacity of the well, and

< n_ADC > = N_well / (2^N √12).

It is desirable that the quantization noise be less than the read noise; for this an ADC with a large enough number of bits has to be selected.
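A sketch of the bit-depth selection implied by the last equation; the well capacity and read noise are assumed values.

    import math

    N_well = 300_000   # full well capacity [electrons] (assumed)
    n_read = 30.0      # read noise [e- rms] (assumed)

    for N in range(8, 15):
        n_adc = N_well / (2**N * math.sqrt(12))
        if n_adc < n_read:
            print(f"{N} bits: quantization noise {n_adc:.1f} e- rms < read noise")
            break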
4 Examples of Sensor Parameters and Pattern Noise
4.1 Background and Pattern Noise in Dark Images
Manifestations of the background noise can be observed in dark images. A dark image is an image taken with no light reaching the video sensor. First the camera is warmed up; then images are taken with a tight, opaque cap on the lens. Figure 5 (page 17) shows a typical dark image for the configuration of the black and white camera SONY XC-77RR and a frame grabber DT1451. Specifications from the data sheet of the camera: 493(V)x768(H) sels; pixel clock frequency 14.318 MHz, within 1% error. For the frame grabber DT1451: effective digital image size 480(V)x512(H) pels; sampling frequency about 10 MHz. In image A, Figure 5 (page 17), the dark current random noise is visible. Observe the internal luminescence. A systematic component in the background noise is prominent in image B.
[Figure 5: SONY XC-77RR/DT1451: Image A is a single dark intensity image, and B is a temporal average of 100 dark images. Below each image, the average intensity by column and by row is plotted.]
The vertical stripes are due to the mismatch between the camera pixel clock and the sampling frequencies, and to "cross-talk" from the clocking of other electronic components. Figure 6 (page 19) shows statistics for single dark images for the various configurations we looked at. In the left column the variable on the horizontal axis is position, and on the vertical axis intensity. Two curves are plotted: the average intensity by column, and the average intensity by row. The decline in the average intensity by column due to the internal luminescence is noticeable (another component which contributes to this decline is the DC-restoration problem which originates in the frame grabber, see Section 3.3). For configurations (i) and (ii) the average intensity by column curve shows the periodicity due to the mismatch between the pixel clock frequencies and the sampling frequencies. For (iii) the average intensity by column is almost a horizontal line (slowly decreasing); for this configuration the camera pixel clock and sampling frequencies are almost the same. The high frequency components in the horizontal direction for (i) and (ii) are clearly visible in the corresponding graphs in the right column, which show the power spectra of the average intensity by column. Regarding the behavior of the curve of average intensity by row: in cases (i) and (ii) it is represented by the "pepper like" trail going through the periodic stripes of the average intensity by column curve; for case (iii) it is a nonmonotone graph going across the line representing the average intensity by column. In all three cases, the low-numbered lines, up to 80-100, have higher values, as explained earlier. At the moment we do not have an explanation for the nonmonotone behavior of the average intensity by row for case (iii) (in the dark image itself alternating wide horizontal stripes were observed). One possibility is the DC-restoration process in the frame grabber (see Section 3.3). In another test with the same configuration SONY XC-77RR/TIM-40 (but a physically different camera and a different digitizer from the ones presented here) a phase pattern could be observed (see Figure 7 (page 19)). In [Bey92] a similar phase phenomenon is reported, but again the source is not located. Note that each configuration has its own dark image "signature" expressed by the systematic components in the background noise (the offset in average intensity from 0 varies for physically different sensors, even of the same configuration).

Another phenomenon worth mentioning is the presence of "ghosts" in some dark images we gathered. In some of the temporally averaged dark images: for SONY XC-77RR/DT1451 a checkerboard pattern was added to the periodic vertical stripe pattern; for SONY XC-77RR/TIM-40, a brighter rectangle could be noticed even in 10 temporally averaged images (see Figure 7 (page 19)). In the SONY XC-77RR/DT1451 case we recognized an image of the target which we usually use for geometric calibration. The image sensor had been exposed to the target at least an hour prior to the acquisition of the dark images! The brighter rectangle for SONY XC-77RR/TIM-40 results from a scene of a white piece of paper on a black background to which the camera was exposed prior to taking the dark images. When over-exposed, an image is "remembered" on the chip for hours: the quantum efficiency is raised artificially. In both cases the source of the ghost image was charges trapped during use of the sensor prior to taking the dark images. A "trapping state" may remain filled by a charge for a while: "the effective QE for subsequent images will be increased because less trapping of newly generated signal electrons will occur" [JEC+84]. To clear up such residual images, some cameras have a "clear" command; in our case simply switching off the camera removes the residual image. Observe also that in Figure 7 (page 19) there are no high frequency components in the image; in this configuration the camera pixel clock and sampling frequencies are almost equal (there is a one-to-one correspondence between sels and pels).
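The column and row statistics plotted in Figures 5 and 6 are straightforward to reproduce from a stack of captured dark frames. A minimal sketch; a synthetic stand-in keeps it runnable, where in practice dark would hold the acquired frames.

    import numpy as np

    # In practice `dark` would be a stack of captured dark frames of shape
    # (num_frames, rows, cols); a synthetic stand-in is used here.
    rng = np.random.default_rng(0)
    dark = 15 + rng.normal(0, 1, size=(100, 480, 512))

    avg_dark = dark.mean(axis=0)     # temporal average makes the fixed pattern prominent
    col_avg = avg_dark.mean(axis=0)  # average intensity by column
    row_avg = avg_dark.mean(axis=1)  # average intensity by row

    # The power spectrum of the column averages reveals periodic stripes caused
    # by a pixel-clock/sampling-frequency mismatch.
    spectrum = np.abs(np.fft.rfft(col_avg - col_avg.mean())) ** 2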
[Figure 6: Average intensity by column and by row (left) and power spectra of the average intensity by column (right) for single dark images. Top to bottom: (i) SONY XC-77/DT1451; (ii) SONY XC-77RR/DT1451; (iii) SONY XC-77RR/TIM-40.]

[Figure 7: SONY XC-77RR/TIM-40, a temporal average of 10 dark images: the residual image of the rectangle is visible in the lower left part of the image.]
[Figure 8: SONY XC-77RR/DT1451: averaged flat field approximations and their histograms (scaled); the histograms plot the percentage of data points versus intensity.]
4.2 Photoresponse Nonuniformities in Flat Fields
We now illustrate the intrinsic nonuniformities of the sels. An image which is the response of the sensor to a scene of uniform brightness (preferably close to the saturation level) is called a flat field. It is used in radiometric calibration. Obtaining a flat field in a standard engineering laboratory is a nontrivial task. We investigated different methods of approximating a flat field. Here we present two independent experiments regarding two instances of the configuration SONY XC-77RR/DT1451: we used in both cases the same physical frame grabber but two different cameras. The flat field approximations for each video sensor were obtained without the lens, but with a diffusing white glass filter. The indirect illumination was provided by 3 distributed incandescent light bulbs. For the two instances of the camera/frame grabber configuration SONY XC-77RR/DT1451, Figure 8 (page 20) shows the temporal average of 100 flat fields and their scaled histograms. The images have been transformed via histogram equalization solely for the purpose of enhancing the display. The flat field approximations for each sensor were obtained independently, under a fixed illumination level. The acquisition of the flat fields without the use of a lens removes any nonuniformity which could have arisen from lens vignetting (the fall-off in intensity from the center of the image to the boundary). On the other hand, a bright square frame can be noticed in the flat field images. We believe that it is due to reflections from the square aperture in front of the sensor (a part of the camera architecture).
4.3 Parameters Reported in Some Manufacturers' Data Sheets
In Figures 13 (page 38), 14 (page 39) and 15 (page 40) we show copies of standard specifications provided by camera manufacturers. We have marked by hand the lines relevant to the parameters and sources of noise discussed in this report. The only parameters we find are: the type of CCD device (interline), the size of the array, the pixel size (not reported in all cases), the pixel (output) clock (not reported in all cases), the sensing area, the scanning system (interlaced for our applications; number of video lines), "sensitivity"/luminance, SNR, shutter speed, and gamma (should be off). As mentioned in Section 3, for many of these parameters there are no unique, clear and unambiguous definitions, and the conditions under which the measurements were taken are important (and not generally reported). What is "sensitivity" in the context of these examples? Signal level? What is the exposure time (which shutter speed is used)? Under what conditions is the SNR measured: is it a ratio of rms electrons or something else? From these examples it is clear that, in order to perform quantitative measurements with video sensors, one has to calibrate them. In Section 6.1 we show the radiometric correction procedure we use.
5 Noise Models and Estimation Procedures of Healey and Kondepudy

In this section we review the radiometric model of Healey and Kondepudy, [HK94].
5.1 The Image Model of Theoretical Importance
The number of photo-electrons collected at pixel (a,b) in an ideal CCD is denoted by $I(a,b)$. To account for the photoresponse nonuniformity (PRNU) of pixels over the array, $I(a,b)$ is scaled by a factor $K(a,b)$. Ideally, $K(a,b)$ should be wavelength dependent, but the dependence is ignored to simplify the model. The charge transfer efficiency is assumed to be 1. Charge collection is independent from site to site, so blooming and diffusion are ignored. The noise components (dark current, shot noise of the photoelectrons and shot noise in the dark current, and the amplifier noise) are additive. The output analog signal from a pixel (a,b) is
$$V(a,b) = A\bigl(K(a,b)\,I(a,b) + N_{DC}(a,b) + N_S(a,b) + N_R(a,b)\bigr). \qquad (5.1)$$
$N_{DC}(a,b)$ is the dark current noise; $E_{DC}(a,b)$ denotes the expected dark current $E(N_{DC}(a,b))$. $N_S(a,b)$ denotes the Poisson shot noise at (a,b) from which the mean, $K(a,b)I(a,b) + E_{DC}(a,b)$, was subtracted (i.e., it is called zero-mean Poisson). Thus $E(N_S(a,b)) = 0$ and $\mathrm{Var}(N_S(a,b)) = K(a,b)I(a,b) + E_{DC}(a,b)$. $N_R$ is called read noise; it is the on-chip amplifier noise and is assumed to be zero-mean. $A$ is the amplifier gain. The camera pixel clock and the digitizer clock have the same frequencies, thus there is a one-to-one correspondence between sels and pels.
The gray value of a pixel (a,b) is
$$D(a,b) = A\bigl(K(a,b)\,I(a,b) + N_{DC}(a,b) + N_S(a,b) + N_R(a,b)\bigr) + N_Q(a,b), \qquad (5.2)$$
where $N_Q$ is the quantization error. $N_Q$ is assumed to be uniformly distributed in $[-q/2, q/2]$ (where $q$ is the quantization step), so it has variance $\sigma^2_{N_Q} = q^2/12$.
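As a quick check, the stated variance follows directly from the uniform density on $[-q/2, q/2]$:
$$\mathrm{Var}(N_Q) = \int_{-q/2}^{q/2} x^2\,\frac{dx}{q} = \frac{1}{q}\left[\frac{x^3}{3}\right]_{-q/2}^{q/2} = \frac{2}{q}\cdot\frac{(q/2)^3}{3} = \frac{q^2}{12}.$$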
Assumptions A1: All assumptions made in this section.
5.2 The Image Model Used in the Computations
The model given by equation (5.2) is quite detailed. Its noise parameters are difficult to quantify in a typical computer vision laboratory. The following cruder model is used for the purpose of quantifying gross noise parameters. The gray value of a pixel (a,b) is modeled by
$$D(a,b) = \mu(a,b) + N(a,b), \qquad (5.3)$$
where
$$\mu(a,b) = A\bigl(K(a,b)\,I(a,b) + E_{DC}(a,b)\bigr), \qquad (5.4)$$
and $N(a,b)$ denotes the total random noise in the digital value. Under Assumptions A1, $N(a,b)$ is zero-mean. The variance of $N(a,b)$ is denoted by $\sigma^2_N(a,b)$. $N(a,b)$ is a sum of independent noise components: a signal-dependent component, $N_I(a,b)$, and a signal- and position-independent component, $N_C(a,b)$:
$$N(a,b) = N_I(a,b) + N_C(a,b).$$
The following assumptions are made in addition. Under uniform scene irradiance, there is only a small variation in the number of electrons collected at different pixels: the number of photoelectrons collected at site (a,b), $I(a,b)$, is approximately equal to the average number of photoelectrons collected over the whole array, $\bar I$. The multiplicative factor $K(a,b)$, which scales $I(a,b)$ to account for PRNU, is very close to one. The average number of dark current electrons generated at a site (a,b), $E_{DC}(a,b)$, is very close to the average number of dark current electrons generated over the whole array, $\bar E_{DC}$. The total noise variance at a pixel (a,b), $\sigma^2_N(a,b)$, is approximated by the total noise variance over the whole array, $\sigma^2_N$, which in turn is approximated by
$$\sigma^2_N \approx A^2(\bar I + \bar E_{DC}) + \sigma^2_C = A\mu + \sigma^2_C, \qquad (5.5)$$
where $\mu$ is the average pixel value over the whole array.
Assumptions A2: All assumptions made in this section.
5.3 Estimation Procedures
There are three noise estimation problems addressed in the paper:
1. Estimating the terms of equation (5.3), $\sigma^2_N(a,b)$ and $\mu(a,b)$, for each pixel. That problem is reduced to computing the sample mean and sample variance over certain flat-field-based images. These sample mean and variance are used to estimate the overall mean signal $\mu$ and total noise variance $\sigma^2_N$ over a flat field of a certain signal level (these global image parameters approximate the pixel parameters).
2. Estimating the amplifier gain $A$ and the signal-independent noise component $\sigma^2_C$. Using the estimates of $\mu$ and $\sigma^2_N$ for different flat field levels, a least squares fit is done on (5.5) to obtain estimates of $A$ and $\sigma^2_C$.
3. Estimating pattern noise (FPN and PRNU) for each pixel. Pattern noise gives the systematic noise component and is subsequently used for radiometric correction. FPN is estimated by averaging dark images. The PRNU estimate is based on averaging flat fields from which the average dark image has been subtracted.
5.3.1 An Estimation of Total Noise Variance for a Flat Field of "Reasonably" High Level

Goal: Given a flat field of an average gray value $\mu$, estimate the total noise variance $\sigma^2_N(a,b)$ and the average signal level $\mu(a,b)$, see equation (5.3). Thus the noise estimates depend on the signal level; this is what is emphasized as the authors' contribution. By varying the flat field levels, a curve describing the dependence of the total noise variance on the average signal level can be plotted. As will become clear from the estimation procedure given below, what is estimated is not the pixel noise and signal level, but the average signal level and variance over a flat field of the fixed gray level.
Assumptions: A1, A2, A3 and A4. The last two assumptions are given below. Assumption A3: $\mu(a,b) \approx \mu$ and $\sigma^2_N(a,b) \approx \sigma^2_N$. Assumption A4: the pixel values of the pixel-wise difference of two flat fields of "reasonably" high levels are normally distributed. A reasonable justification presented in the paper which backs up this assumption is that the shot noise of the photoelectrons dominates the total noise, and the Poisson distribution of the shot noise can be approximated by a normal distribution for large mean counts.
Procedure:
1. Take two independent flat field images $D_1$ and $D_2$ using a uniform reflectance card under uniform illumination from a single source. The flat fields have the same, "reasonably" high gray level.
2. Use the sample mean, $\hat\mu$, of all pixel values in the two images to estimate $\mu$.
3. Take the pixel-wise difference of the two images, $\Delta D = D_1 - D_2$; $\Delta D$ has zero mean and variance $2\sigma^2_N$.
4. Use the sample variance of all pixel values in $\Delta D$ to estimate the variance over $\Delta D$, and divide that by 2 to obtain an estimate $\hat\sigma^2_N$ of the variance of the total noise over a flat field of the fixed gray level, i.e., $\hat\sigma^2_N$ is an estimate of $\sigma^2_N$. Under Assumption A4 that the pixel values in $\Delta D$ are normally distributed, an estimate of the variance of the sample variance estimate $\hat\sigma^2_N$ is obtained by the standard estimate
$$\mathrm{Var}(\hat\sigma^2_N) \approx \frac{2(\hat\sigma^2_N)^2}{M-1}, \qquad (5.6)$$
where $M$ is the total number of pixels in the image.
Comments: What is estimated here is the overall average of the gray value over a flat field of a fixed, relatively high level, and the overall noise variance. Under the assumptions the total noise is dominated by the photon shot noise; thus, practically, for a given gray level, the variance of the photon shot noise over the whole array at that level is estimated.
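To make the two-image procedure concrete, here is a minimal NumPy sketch; the function and argument names are ours, not from [HK94], and the two flat fields are assumed to be available as equal-shaped arrays:

```python
import numpy as np

def flat_pair_noise_estimate(d1, d2):
    """Estimate the mean signal level and the total noise variance
    from two flat fields of the same, reasonably high gray level,
    following steps 1-4 above."""
    d1 = np.asarray(d1, dtype=np.float64)
    d2 = np.asarray(d2, dtype=np.float64)
    mu_hat = 0.5 * (d1.mean() + d2.mean())     # step 2: sample mean of all pixels
    delta = d1 - d2                            # step 3: zero mean, variance 2*sigma_N^2
    var_hat = delta.var(ddof=1) / 2.0          # step 4: estimate of sigma_N^2
    m = delta.size                             # total number of pixels M
    var_of_var = 2.0 * var_hat**2 / (m - 1)    # equation (5.6), normality assumption A4
    return mu_hat, var_hat, var_of_var
```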
5.3.2 Estimation of the amplifier gain and the signal-independent noise variance

Goal: Estimate the amplifier gain $A$ and the signal-independent noise component $\sigma^2_C$.
Assumptions: A1, A2 and A3.
Procedure:
1. Obtain $P$ pairs of flat fields, $D_1^i, D_2^i$, $i = 1,2,\ldots,P$; each pair is obtained at a different but reasonably high level.
2. As discussed in the previous section, obtain the estimates of the mean signal $\mu_i$ and the total noise variance $\sigma^2_{N,i}$ for each flat field level pair.
3. Use equation (5.5) as a regression model. Use the parameters estimated in the previous step as data points, and use weighted (by the estimated variances from (5.6)) least squares to find the line parameters $A$ and $\sigma^2_C$. The fit produces MLE estimates of $A$ and $\sigma^2_C$ under the assumption that each of the variances $\sigma^2_{N,i}$, $i = 1,2,\ldots,P$, is normally distributed, justifiable as a normal approximation of the chi-square distribution by Assumption A4.
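A sketch of the regression step under these assumptions; the inputs are the per-level estimates from the previous procedure, the weights follow (5.6), and the naming is ours:

```python
import numpy as np

def fit_gain_and_floor(mu, var_n, var_of_var):
    """Weighted least-squares fit of equation (5.5),
    sigma_N^2 = A*mu + sigma_C^2, over P flat-field levels."""
    mu = np.asarray(mu, dtype=np.float64)
    var_n = np.asarray(var_n, dtype=np.float64)
    w = 1.0 / np.asarray(var_of_var, dtype=np.float64)  # weights from (5.6)
    X = np.column_stack([mu, np.ones_like(mu)])         # slope A, intercept sigma_C^2
    WX = X * w[:, None]
    # solve the weighted normal equations (X^T W X) beta = X^T W y
    A_hat, sigma_c2_hat = np.linalg.solve(X.T @ WX, WX.T @ var_n)
    return A_hat, sigma_c2_hat
```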
5.3.3 Estimating FPN in dark images and its variance

Goal: Estimate the systematic component in the dark current, the fixed pattern noise. For each pixel (a,b), estimate the average offset in gray level due to dark current.
Assumptions: A1 and A2.
Procedure: Average $s$ dark images; the resulting image is $\hat D_D$. Under Assumptions A1 and A2, the mean and the variance in gray value over $\hat D_D$ are $A E_{DC}(a,b)$ and $\sigma^2_N(a,b)/s$, respectively. The total noise in dark images is dominated by the background noise (dark current and amplifier noise).
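As a sketch (our naming), this is a plain temporal average:

```python
import numpy as np

def average_dark_frames(darks):
    """Average s dark frames; the result estimates the fixed pattern
    offset A*E_DC(a,b) per pixel, with noise variance reduced by s."""
    stack = np.stack([np.asarray(d, dtype=np.float64) for d in darks])
    return stack.mean(axis=0)
```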
5.3.4 Estimating the factor modeling PRNU, K(a,b)
Experiments are designed to distinguish between image variation due to PRNU and image variation due to scene irradiance and noise. A series of experiments is conducted so that scene reflectance and illumination vary locally (over the array) from one experiment to the next, and the only constant parameter is the PRNU of the pixel. Under the model used, the PRNU is assumed to be independent of the wavelength.
Goal: For each pixel (a,b), estimate the PRNU scale factor $K(a,b)$.
Assumptions: A1, A2, A5 and A6. The last two assumptions are given below.
Procedure:
1. Use $n_1$ different image configurations. A calibration card of uniform reflectance is used. For a fixed pixel, from one experiment to the next a different patch from the card is imaged under different illumination. The change of illumination is achieved by placing neutral density filters and by moving the illumination source and the card.
2. For each imaging configuration:
(a) Take $n_2$ images and average them pixel-wise. Since the total noise is zero-mean, the i-th averaged image has gray value
$$A\bigl(K(a,b)\,I_i(a,b) + E_{DC}(a,b)\bigr). \qquad (5.7)$$
(b) Subtract the averaged dark image $\hat D_D$ from (5.7), i.e.,
$$e_i(a,b) = A\bigl(K(a,b)\,I_i(a,b) + E_{DC}(a,b)\bigr) - \hat D_D(a,b) \approx K(a,b)\,I_i(a,b)\,A.$$
(c) Next, $I_i(a,b)\,A$ is estimated by $\tilde e_i(a,b)$, where $\tilde e_i(a,b)$ is the average of the gray values of $e_i$ over an $m$ by $m$ window centered at (a,b). This approximation is valid under Assumption A5 that the i-th averaged image, $I_i$, has very small variation over the window, and that the average of the corresponding $K(a,b)$ scale factors over the window is 1. (For their experiment the authors used a 9x9 window.)
3. Obtain an initial estimate, $\bar m(a,b)$, of $K(a,b)$. Using the data points $(e_i(a,b), \tilde e_i(a,b))$ and the linear approximation $e_i(a,b) \approx K(a,b)\,\tilde e_i(a,b)$, $\bar m(a,b)$ is the average of the slopes $m_i(a,b) = e_i(a,b)/\tilde e_i(a,b)$, $i = 1,2,\ldots,P$.
4. For the purpose of identifying outliers in the data $(e_i(a,b), \tilde e_i(a,b))$, $i = 1,2,\ldots,P$, produce an estimate $\hat\sigma$ of the standard deviation of the slopes $m_i(a,b)$ from the mean slope value: (i) for each (a,b), compute the sample standard deviation of $m_i(a,b)$ over $i = 1,2,\ldots,P$; (ii) average the computed sample standard deviations over "many" sites (a,b) and denote the resulting value by $\hat\sigma$.
5. Identify and remove outliers $(e_i(a,b), \tilde e_i(a,b))$ from the data. Under Assumption A6 that the slopes $m_i(a,b)$ are normally distributed around the mean over $i = 1,2,\ldots,P$, define as outliers those $m_i(a,b)$, and hence $(e_i(a,b), \tilde e_i(a,b))$, which are more than $p\hat\sigma$ away from $\bar m(a,b)$. Remove the outliers from the data.
6. Recompute the average slope estimate $\hat K(a,b)$ based on the points $(e_i(a,b), \tilde e_i(a,b))$ with the outliers removed, i.e.,
$$\hat K(a,b) = \frac{1}{n_0(a,b)} \sum_{i \in \Gamma(a,b)} \frac{e_i(a,b)}{\tilde e_i(a,b)},$$
where $\Gamma(a,b) \subseteq \{1,2,\ldots,P\}$ is the set of integers indexing the points $(e_i(a,b), \tilde e_i(a,b))$ except the outliers, and $n_0(a,b)$ is the number of points in $\Gamma(a,b)$.
7. Bound the variance of the estimator: for fixed (a,b),
$$\mathrm{Var}(\hat K(a,b)) \le (p\hat\sigma)^2 / n_{\min}, \qquad (5.8)$$
where $n_{\min}$ is the minimum size of $\Gamma(a,b)$, the minimum taken over all sites (a,b), and $p$ is a selected threshold.
8. Determine the value of $n_1$ (the number of different imaging configurations) so that the estimate $\hat K(a,b)$ has enough precision, i.e., $\mathrm{Var}(\hat K(a,b))$ is less than one tenth of a gray value precision ($2^{-N}/10$, where $N$ is the bit depth of the digitizer). The procedure suggested is as follows: (i) set
$$n_{\min} = 2^{2N} (p\hat\sigma)^2 \cdot 10; \qquad (5.9)$$
(ii) the worst-case ratio $n_{\min}/n_1$ is measured (how?), and then $n_1$ is computed.
9. Determine the value of $n_2$ (the number of flat fields taken at each fixed level). Compute estimates of the amplifier gain $A$ and the signal-independent noise component $\sigma^2_C$. From (5.5), $\sigma^2_N$ is a function of $\mu$. The variance of the average of $n_2$ flat fields at a fixed level $\mu$ is $\sigma^2_N/n_2$. Requiring that for each flat field level $\mu$, $\sigma^2_N/n_2 \le q^2/10$ (where $q$ is the quantization step) gives the minimal value of $n_2$. (Note that in this context $n_2$ is signal level ($\mu$) dependent, so an $n_2$ which bounds the variance for all signal levels has to be selected.) Apparently this method was used in the experiments: $n_1$ and $n_2$ were found to be 11 and 18, respectively, but for "safety" reasons 20 and 20 were used.
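The slope-and-rejection core of steps 3-6 can be sketched as follows; all names are ours, SciPy's uniform_filter stands in for the $m$ by $m$ window average, and the small guards against division by zero are our additions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_prnu(flat_avgs, dark_avg, window=9, p=1.5):
    """Sketch of the PRNU slope estimate K_hat(a,b): form the
    per-configuration slopes m_i = e_i / e_tilde_i, reject slopes
    farther than p*sigma_hat from the mean, then re-average
    (steps 3-6 above)."""
    e = np.stack([np.asarray(f, dtype=np.float64) - dark_avg
                  for f in flat_avgs])                    # e_i = F_i - D_D
    e_tilde = np.stack([uniform_filter(ei, size=window)
                        for ei in e])                     # window means (step 2c)
    e_tilde = np.maximum(e_tilde, 1e-6)                   # guard (ours)
    m = e / e_tilde                                       # slopes m_i(a,b)
    m_bar = m.mean(axis=0)                                # initial estimate (step 3)
    sigma_hat = m.std(axis=0, ddof=1).mean()              # pooled slope deviation (step 4)
    keep = np.abs(m - m_bar) <= p * sigma_hat             # outlier rejection (step 5)
    n0 = np.maximum(keep.sum(axis=0), 1)                  # guard (ours)
    return (m * keep).sum(axis=0) / n0                    # recomputed K_hat (step 6)
```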
Comment:
The major estimation method is "averaging". A main assumption which propagates through the procedure is the "normality" assumption (even normal distributions for the slope values $m_i$), which gives meaning to the variance estimates as measures of optimality. There is no justification for all these assumptions, which supposedly reduce the parameters of the model (5.2) to the parameters $\mu$, $\sigma^2_N$, $A$, $\sigma^2_C$ which have actually been estimated. Regarding the methods for computing the values of $n_1$ and $n_2$, the discussion is not clear. In their experiments the authors used images of size 100x100, $n_1 = 20$ and $n_2 = 20$ (the method given in Section 5.3.4 apparently yielded $n_1 = 11$ and $n_2 = 18$, but the values 20 and 20 were used for "safety"), the window size used in the estimation of $\tilde e_i$ was 9x9, and the factor $p$ for rejecting outliers was 1.5. What is the meaning of the words "many" and "reasonably" in this context? How many is "many", and how high a gray level is "reasonably" high (since 20 levels are used, they cannot all be "close" to saturation)? There is a lot of "engineering" on the spot. As a whole, the proposed procedure for estimating $K(a,b)$ may be satisfactory (although computationally expensive), but it is not clear whether the same procedure will be satisfactory under different conditions. And how satisfactory is it actually? Evaluating the performance of estimators in terms of variances, which are sound estimates only under the numerous normality assumptions, is not enough. An independent evaluation procedure is necessary.
6 Radiometric Correction Procedures

The variation in dark images and in flat fields for physically different sensors clearly shows that sensors are not "equal", contrary to the assumptions made in many multisensor vision applications. Even given absolutely the same scene and illumination, due to all the factors we discussed in Section 3, physically different cameras "see" differently. A radiometric calibration procedure by which individual pixel values are corrected is also called flat fielding or shading correction. There are different radiometric correction procedures. They all have the following features in common. The linear model is adopted. CCD cameras have good linearity, and if no gamma correction or extra nonlinear processing is done, a linear model is often assumed to be a reasonable approximation. Strictly speaking this assumption is not correct, since the amplifier introduces nonlinearity ([Bey92], page 70). The value of the digital image at each pixel is corrected in the following way:
$$p_c = (p - \mathit{offset}) \cdot \mathit{gain}, \qquad (6.10)$$
where $p$ and $p_c$ denote the original and the corrected pixel values respectively, and $\mathit{gain}$ and $\mathit{offset}$ are pixel-dependent parameters. For an ideal camera the gain should be 1 and the offset 0 over the whole array. The radiometric correction methods differ in the way the parameters $\mathit{gain}$ and $\mathit{offset}$ are defined and estimated. But in all cases they are computed from two sets of images: dark images and flat fields. In the following subsections, (i) we give the flat-field correction procedure we used in our experiments to study the significance of the radiometric correction in disparity map computation (an example of a multicamera algorithm); (ii) we give the radiometric correction procedure of Healey and Kondepudy [HK94]; (iii) we give a radiometric correction procedure tested by Beyer [Bey92]. From the experiments conducted by Beyer [Bey92], it is clear that there are no theoretical reasons for choosing one of the presented radiometric correction methods over another, but in all cases radiometric correction methods improve the uniformity of the pixel response over the CCD array and have an effect on detection algorithms (detecting dots, lines or edges) and matching algorithms.
6.1 Flat-field Correction Used in the Disparity Map Experiment
We use the following flat-field correction procedure, [PH96], to correct for nonuniformities due to systematic errors from background noise and pattern noise. Steps 1-4 are executed once for the given sensor (camera/digitizer).
1. Obtain an average dark image $\hat D$.
2. Obtain an average flat field $\hat F$ at a level close to saturation.
3. Compute the pixel-wise difference $e = \hat F - \hat D$.
4. Compute the average intensity, $\tilde e$, of $e$, i.e.,
$$\tilde e = \frac{1}{N} \sum_{(a,b)} \bigl(\hat F(a,b) - \hat D(a,b)\bigr),$$
where $N$ is the total number of pixels in the digital image, the sum is taken over all pixels (a,b), and $\hat F(a,b)$ and $\hat D(a,b)$ denote the pixel values of the average flat field and the average dark image, respectively.
5. For a given digital image $D$, the corrected image $D_C$ is computed as follows. For every pixel (a,b),
$$D_C(a,b) = \frac{D(a,b) - \hat D(a,b)}{e(a,b)}\,\tilde e,$$
where $D_C(a,b)$, $D(a,b)$, $\hat D(a,b)$, $e(a,b)$ denote the corresponding pixel values in the corrected image, the original image, the average dark image, and the difference image, respectively. In this correction procedure:
$$\mathit{gain} = \frac{\tilde e}{e(a,b)}, \qquad (6.11)$$
$$\mathit{offset} = \hat D(a,b). \qquad (6.12)$$
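To make the procedure concrete, here is a minimal NumPy sketch of steps 3-5; the function name and the epsilon guard against zero-valued difference pixels are our additions, not part of [PH96]:

```python
import numpy as np

def flat_field_correct(image, dark_avg, flat_avg, eps=1e-6):
    """Apply the flat-field correction of Section 6.1:
    gain = e_mean / e(a,b), offset = dark_avg(a,b)."""
    e = np.asarray(flat_avg, dtype=np.float64) - dark_avg   # step 3
    e_mean = e.mean()                                       # step 4
    # step 5: per-pixel rescaling of the dark-subtracted image
    return (np.asarray(image, dtype=np.float64) - dark_avg) \
           * e_mean / np.maximum(e, eps)
```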
6.2 The Radiometric Correction of Healey and Kondepudy
The procedure for radiometric correction given in [HK94] is as follows. Steps 1-2 are performed once for each sensor.
1. Obtain an average dark image $\hat D$ as discussed in Section 5.3.3.
2. Obtain an estimate, $\hat K$, of the PRNU factor $K$ as discussed in Section 5.3.4. This step is quite involved and complex, but it can be summarized as:
(a) For the i-th image configuration, average the flat fields; the result is $\hat F_i$.
(b) For the i-th image configuration, compute the pixel-wise difference $e_i = \hat F_i - \hat D$.
(c) For each pixel (a,b) of $e_i$, average the values in a window of fixed size centered at (a,b). Denote the average by $\tilde e_i(a,b)$.
(d) Average $e_i(a,b)/\tilde e_i(a,b)$ over $i = 1,2,\ldots,P$. Ideally, this would be $1/\mathit{gain}$ at pixel (a,b) for the procedure proposed.
(e) In case some of the data are unreliable, outlier detection and rejection is performed, and the average is recomputed based on the data without the outliers (see Section 5.3.4 for the details).
3. For a given digital image $D$ the following correction is done. Correct the value at each site (a,b) as follows:
$$D_C(a,b) = \frac{D(a,b) - \hat D(a,b)}{\hat K(a,b)}.$$
In this correction procedure:
$$\mathit{gain} = \frac{1}{\hat K(a,b)}, \qquad (6.13)$$
$$\mathit{offset} = \hat D(a,b). \qquad (6.14)$$
Note that one should not symbolically equate the right hand-sides of equations (6.13) and (6.11):
$$\frac{\tilde e}{e(a,b)} \ne \frac{1}{\hat K(a,b)}. \qquad (6.15)$$
The correction proposed in [HK94] would have been exactly the one we used, Section 6.1, except that in the computation of the gain in equation (6.11) the numerator is not $\tilde e$ but $\tilde e_i(a,b)$. Instead, in [HK94] the gain is computed not from a single averaged flat field (or a local average in a window of a single flat field) but from $n_1$ such averaged flat fields with some "outliers" removed. The process of "averaging" based on different flat field levels, after the so-called outliers are removed, may be a justifiable engineering approach to a particular problem, but there are no sound theoretical foundations why this should be so in a general procedure. To evaluate and compare the two methods, this relationship has to be investigated.
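For comparison, the corresponding correction step, given an estimated $\hat K$ from Section 5.3.4, can be sketched as follows (naming ours; the epsilon guard is our addition):

```python
import numpy as np

def hk_correct(image, dark_avg, k_hat, eps=1e-6):
    """Healey-Kondepudy-style correction: divide the dark-subtracted
    image by the estimated PRNU factor K_hat (Section 6.2, step 3)."""
    return (np.asarray(image, dtype=np.float64) - dark_avg) \
           / np.maximum(k_hat, eps)
```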
6.3 Radiometric Correction Methods of Beyer
Two radiometric correction procedures are considered by Beyer [Bey92]. We give only the first of the two methods presented in [Bey92] (the two methods are practically the same theoretically). Steps 1-4 are performed once for each sensor.
1. Take the average of dark images, $\hat D$.
2. Take the average of flat fields close to saturation, $\hat F$.
3. Compute the pixel-wise difference $e = \hat F - \hat D$.
4. Set a reference gray value for dark, $g_{dark}$, and a reference gray value for bright, $g_{bright}$, and compute the difference $\tilde e = g_{bright} - g_{dark}$. (Note that in the two previous methods the reference values for bright and dark were based on the mean values over averaged flat fields and dark images, respectively.)
5. The corrected value at a pixel (a,b) is
$$D_C(a,b) = \frac{D(a,b) - \hat D(a,b)}{e(a,b)}\,\tilde e + g_{dark}.$$
In this correction procedure:
$$\mathit{gain} = \frac{\tilde e}{e(a,b)}, \qquad (6.16)$$
$$\mathit{offset} = \hat D(a,b) - \frac{g_{dark}}{\mathit{gain}}. \qquad (6.17)$$
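A sketch of this method, with the reference gray values passed in as parameters (naming and epsilon guard ours):

```python
import numpy as np

def beyer_correct(image, dark_avg, flat_avg, g_dark, g_bright, eps=1e-6):
    """Beyer's first correction method: rescale the dark-subtracted
    image so the reference dark/bright values map to g_dark, g_bright."""
    e = np.asarray(flat_avg, dtype=np.float64) - dark_avg
    return (np.asarray(image, dtype=np.float64) - dark_avg) \
           * (g_bright - g_dark) / np.maximum(e, eps) + g_dark
```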
Comparing (6.16) and (6.13), we see that in (6.13) the reference bright and reference dark values are the mean values over the averaged flat field and over the averaged dark images, respectively. The more profound difference is in the definition of the offset: in the two previous methods the offset did not depend on the PRNU, but just on the FPN; in the current method the offset is readjusted according to both the FPN and the PRNU.
7 A Disparity Map Computation: an Example of a Multicamera Algorithm

Here we present an example of a multiple-camera algorithm where radiometric correction is important. We study the influence of a radiometric correction of the digital images on a stereo matching algorithm. The disparity map is the first milestone in the algorithm, so we looked at comparisons at that level. For a given stereo pair of images we computed, first, the disparity map; second, the flat-field corrections for the original pair; third, the disparity maps based on the uncorrected and on the flat-field corrected pairs; last, we evaluated the disparity maps.
7.1 Experimental setup
The configuration SONY XC-77RR/DT1451 was used. The two cameras are connected to the same physical frame grabber; each camera is equipped with a 25mm lens. The pair of cameras are approximately 80cm apart and at a verging angle of approximately 40 degrees. The cameras were warmed up for a couple of hours; then they were geometrically calibrated and the intrinsic and extrinsic camera parameters were recovered (using least squares). Averaged dark images and flat fields were obtained several days prior to the experiment (this has to be done just once for a fixed physical sensor); this corresponds to steps 1-4 in the flat-field correction procedure. During flat field acquisition, care should be taken to eliminate the possibility of image residues. Figure 9 shows the temporal average of 100 dark images and the corresponding scaled histograms for the two average images. The flat field approximations are the ones given in Figure 8 (page 20). When the distributions of the noise sources for the individual sensors are empirically confirmed, the sample size of the images to be averaged could be adjusted accordingly. In any case, the averaged dark image and flat field acquisition is a one-time, off-line process. The target was a planar white card with a Lambertian reflectance surface. The card is approximately parallel to the vertical image axes of each camera. The illumination was controlled so that no saturation occurs. A stereo pair of raw images was acquired. The matching uses normalized cross-correlation [Mor81], and match selection uses the forbidden zone constraint [YP84]. To account for the suppression of high frequencies due to the low-pass filtering of the image (in the camera and digitizer), a constant gain k is used in the calculation of image derivatives during the subpixel disparity map computation (see below). The gain is constant for a fixed pair of cameras [SB96].
7.2 Tests conducted
First, a flat-field correction was performed on each of the raw images. Second, a subpixel disparity map based on the rectified original stereo pair (see Figure 10 (page 32)) was computed (we call this the "uncorrected" disparity map). The gain constant k was tuned to an optimal value, 0.71 in this case [SB96]. Third, using the same rectification and matching algorithms, we computed a subpixel disparity map on the rectified flat-field corrected stereo pair of images (we refer to this map as "corrected"). Here the gain k = 1; no tuning was attempted. Figure 11 shows the disparity maps as 3D plots. The full size of the disparity maps is 256x256, since the rectification was done at half resolution. For the purpose of displaying the result, the 3D plots show the maps subsampled at a quarter of the resolution. Some "holes" can be observed in the disparity maps where the algorithm has failed (matches were rejected). Spikes represent outliers which were erroneously accepted as valid. The "corrected" map has fewer holes and spikes.
Figure 9: SONY XC-77RR/DT1451: averaged dark images and their histograms (scaled).
7.3 Evaluation
The two subpixel disparity maps were evaluated by fitting a plane model to each one of them. The comparison between the two disparity maps was then done based on the statistics of the residuals. Figure 12 shows the histograms of the residuals in the two cases. The standard deviation of the residuals was 0.1668 for the "uncorrected" map and 0.0679 for the "corrected" one. The experiment was repeated with flat field pairs at four different average levels, approximately 60, 110, 160 and 220. Subpixel disparity maps without and with flat-field correction were computed from these, planes were fitted to each one, and the statistics of the residuals were compared. A monotone increase in the dominance of the matching algorithm with flat-field correction over the matching without flat-field correction is observed: from 26% for the low intensity level flat field (level 60) to 59% for the matching with flat-field correction at level 220.
Figure 10: The test pair of original stereo images (left and right): white paper card, planar.
8 Conclusions

The imaging process with a CCD camera is based on the physical principle of converting photons into a measurable quantity (voltage). The photons generate charges. For each sel, the charges generated during the exposure time are collected into a charge packet. When suitable clock voltages are applied, a "moving array of potential wells" is created which stores and transfers the charge packets. During transport, first, the charge packets are transferred, in parallel, along vertical CCDs toward a horizontal CCD shift register. From this register, the charge packets are shifted to the output stage where they are measured (converted to voltages) and amplified (by an on-chip amplifier and later an off-chip amplifier). The image array content is transferred line-by-line to the horizontal shift register, and from there a line is pumped out, sel-by-sel, at time periods set by the camera pixel clock. Next, the analog signal is amplified, resampled and digitized by the framegrabber. The final output of the video sensor is the digital image. Note that the charges collected at a sel are integrated over the area of the sel during the exposure time. What we observe in the digital image are the resampled and discretized values of the data collected in the camera, with some measurement error. And even prior to falling on the photosensitive surface of the camera, the light goes through the optical system (lens), which acts as a projection system (from the 3-D real scene to the 2-D ideal image) and which also adds some blur and some distortions to the ideal image.

In conclusion, there are geometric and radiometric uncertainties and discrepancies in the digital image (in position and in value). These are due to the optics, the CCD camera, and the digitizer operating together as a complete system. We analyzed the sources and character of some of these discrepancies (systematic and random) in Section 3. Regarding the camera noise: photon shot noise, dark current, fixed pattern noise of the dark current and photoresponse nonuniformities, charge trapping and charge transfer noise are generated during the charge collection and charge transfer phase; reset noise results during the measuring stage, at the output node; amplifier noise (white noise and 1/f noise) occurs during the amplification stage, on-chip and off-chip (with the on-chip amplifier having in general a higher level than the off-chip amplifier).
Figure 11: Subpixel disparity maps based on the original stereo pair (top, with tuning) and the flat-field corrected pair (bottom, no tuning). The maps are shown as 3D plots of disparity over (x, y).
Figure 12: Histograms of the residuals for the uncorrected (left) and corrected (right) disparity maps.
The quantization noise is a result of the ADC process. Cooling the camera reduces the dark current and the reset noise; correlated double sampling takes care of amplifier noise and reset noise. Fixed pattern noise in the dark current is additive and can be removed from the images, while pattern noise associated with photoresponse nonuniformities is multiplicative, and the images have to be scaled accordingly. The shot noise is always present. When the camera pixel clock and the digitizer sampling clock do not have the same frequencies, a camera pixel does not correspond to a pixel in the digital image; a horizontal scale parameter is introduced (it is one of the parameters recovered by geometric calibration). With the use of digital cameras and the development of technology, the distortions due to the camera/framegrabber interface will be minimized. (The noise sources are still present. Manufacturers of digital cameras report the pixel-to-pixel variation in response, so at least the PRNU noise is characterized.)

The detail with which the noise sources and levels are analyzed depends on the application. Traditionally, noise sources are considered additive and are combined by taking the square root of the sum of squares. Theoretically, this is justifiable in the case of uncorrelated noise sources, and in cases when the mean square error is a reasonable measure of optimality. Camera specifications should be read with the understanding that some of the reported parameters have only qualitative value, and cameras should not be compared blindly using such parameters. For other parameters, which are not standardly reported, additional measurements and calibration may be needed.

We show that accounting for the radiometric variation in the measurements of multiple sensors (analyzing the fixed pattern noise in dark current and the photoresponse nonuniformities in flat fields) and "normalizing" the digital images prior to matching significantly improves the results of the disparity map computation for a uniform brightness scene. This confirms the importance of the radiometric calibration, so we intend to address the problem in its full complexity. We plan to use the procedure to help stereo matching locally (at places where it fails due to uniformity of the patches). We continue our work on deriving an image model for the radiometric distortions and the error measurement models which we will use in multisensor vision applications for: (i) "equalizing" the output from different sensors; (ii) estimating the random noise error statistics in the digital images; (iii) propagating the errors through the vision algorithms and deriving a quantitative measure of their performance.
References

[And88] Helen Anderson. GRASP Lab camera systems and their effects on algorithms. GRASP Laboratory Technical Report (161, MS-CIS-88-85), 1988.
[Bar75] D. F. Barbe. Imaging devices using the charge-coupled concept. Proceedings of the IEEE, 63:38-67, 1975.
[Bee57] Yardley Beers. Introduction to the Theory of Error. Addison-Wesley, 1957.
[Bey90] Horst Beyer. Linejitter and geometric calibration of CCD cameras. ISPRS Journal of Photogrammetry and Remote Sensing, (45):17-32, 1990.
[Bey92] Horst Beyer. Geometric and radiometric analysis of a CCD-camera based photogrammetric close-range system. PhD thesis, Institut fur Geodasie und Photogrammetrie, Zurich, May 1992.
[BKM86] Ruzena Bajcsy, Eric Krotkov, and Max Mintz. Models of errors and mistakes in machine perception. GRASP Laboratory Technical Report (64, MS-CIS-86-26), 1986.
[BL80] E. Beynon and D. R. Lamb. Charge-Coupled Devices and Their Applications. McGraw-Hill, 1980.
[Bro71] Duane Brown. Close-range camera calibration. Photogrammetric Engineering, 37:855-866, 1971.
[DA89] U. R. Dhond and J. K. Aggarwal. Structure from stereo: a review. IEEE Transactions on Systems, Man, and Cybernetics, 19(6), November/December 1989.
[Enn71] Harold Ennes. Television Broadcasting: Equipment, Systems, and Operating Fundamentals. Howard W. Sams & Co., Inc., 1971.
[Fau93] Olivier Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.
[For96] Wolfgang Forstner. 10 pros and cons against performance characterization of vision algorithms, 1996.
[GM96] Giannoula Florou and Roger Mohr. What accuracy for 3D measurements with cameras? In IEEE International Conference on Pattern Recognition, pages 354-358, 1996.
[HK94] G. E. Healey and R. Kondepudy. Radiometric CCD camera calibration and noise estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(3):267-276, March 1994.
[Hol96] Gerald C. Holst. CCD Arrays, Cameras, and Displays. JCD Publishing, Winter Park, FL, 1996.
[HS93] Robert Haralick and Linda Shapiro. Computer and Robot Vision, volume 2. Addison-Wesley, 1993.
[Ino89] S. Inoue. Video Microscopy. Plenum Press, 1989.
[Jah93] Bernd Jahne. Digital Image Processing: Concepts, Algorithms and Scientific Applications. Springer-Verlag, 1993.
[JB91] R. Jain and T. Binford. Ignorance, myopia, and naivete in computer vision systems. Computer Vision, Graphics, and Image Processing, 53(1):112-117, 1991.
[JEC+84] J. R. Janesick, T. Elliot, S. Collins, H. Marsh, M. Blouke, and J. Freeman. The future scientific CCD. In Proc. SPIE State-of-the-Art Imaging Arrays and Their Applications, San Diego, California, volume 501, August 1984.
[JEW+95] J. R. Janesick, T. Elliot, R. Winzenread, J. Pinter, and R. Dyck. Sandbox CCDs. In Proc. SPIE Charge-Coupled Devices and Solid State Optical Sensors V, San Jose, California, volume 2415, 1995.
[JKE85] J. R. Janesick, K. Klaasen, and T. Elliot. CCD charge collection efficiency and the photon transfer technique. In Proc. SPIE Solid State Imaging Arrays, San Diego, California, volume 570, August 1985.
[LF90] Reimar Lenz and Dieter Fritsch. Accuracy of videometry with CCD sensors. ISPRS Journal of Photogrammetry and Remote Sensing, 45:90-110, 1990.
[LT87] Reimar Lenz and Roger Tsai. Techniques for calibration of the scale factor and image center for high accuracy 3D machine vision. ISPRS Journal of Photogrammetry and Remote Sensing, 45:90-110, 1987.
[Mad96] Brian Madden. Personal communication, 1996.
[Mcl89] Ian McLean. Electronic and Computer-Aided Astronomy: From the Eyes to Electronic Sensors. Ellis Horwood Limited, England, 1989.
[Mor81] H. Moravec. Robot Rover Visual Navigation. Computer Science: Artificial Intelligence. UMI Research Press, 1980/1981.
[PH96] Photometrics homepage, http://www.photomet.com. Photometrics high performance CCD imaging, September 1996.
[Poy96] Charles Poynton. A Technical Introduction to Digital Video. John Wiley & Sons, Inc., 1996.
[Pri86] K. Price. Anything you can do, I can do better (no you can't)... Computer Vision, Graphics, and Image Processing, 36, 1986.
[SB96] Radim Sara and Ruzena Bajcsy. Reconstruction of 3-D geometry and topology from polynocular stereo. GRASP Laboratory, University of Pennsylvania, work in progress, 1996.
[SHL+95] Donald L. Snyder, C. Helstrom, A. Lanterman, M. Faisal, and R. White. Compensation for readout noise in CCD images. Journal of the Optical Society of America A, 12(2):272-283, February 1995.
[Sla80] C. Slama. Manual of Photogrammetry, 4th edition. American Society of Photogrammetry, 1980.
[The95] Albert J. P. Theuwissen. Solid-State Imaging with Charge-Coupled Devices. Kluwer Academic Publishers, 1995.
[Vez96] Jean-Marc Vezien. Notes on the calibration of two GRASP Lab frame grabbers; personal communication, 1996.
[WCH92] Juyang Weng, Paul Cohen, and Marc Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):965-980, October 1992.
[YP84] A. Yuille and T. Poggio. A generalized ordering constraint for stereo correspondence. MIT AI Lab Memo 777, 1984.
Figure 13: Specifications from the Operation Manual of Model KP-230/231 Solid-State Black-and-white TV Camera; Hitachi Denshi, Ltd.
Figure 14: Specifications for CCD Vision Camera Module SONY XC-77RR/77RRCE; Sony Corporation; Chori America, Inc.
Figure 15: Specifications for CCD Color Camera Module SONY XC-007/007P; Sony Corporation.