IEEE TRANSACTIONS ON ELECTRON DEVICES
Digital Camera Imaging System Simulation
Junqing Chen, Member, IEEE, Kartik Venkataraman, Dmitry Bakin, Brian Rodricks, Robert Gravelle, Pravin Rao, and Yongshen Ni
Abstract—A digital camera is a complex system including a lens, a sensor (physics and circuits), and a digital image processor, where each component is a sophisticated system on its own. Since prototyping a digital camera is very expensive, it is highly desirable to have the capability to explore the system design tradeoffs and preview the system output ahead of time. An empirical digital imaging system simulation that aims to achieve such a goal is presented. It traces the photons reflected by the objects in a scene through the optics and color filter array, converts photons into electrons with consideration of the noise introduced by the system, quantizes the accumulated voltage to digital counts by an analog-to-digital converter, and generates a Bayer raw image just as a real camera does. The simulated images are validated against real system outputs and show a close resemblance to the images captured under similar conditions at all illumination levels.
Index Terms—Hyperspectral scene, imaging sensor, lens characterization, modulation transfer function (MTF), pixel characterization, pixel noise, point spread function (PSF).
I. INTRODUCTION
THE INCREASING popularity of multimedia applications has led to the ubiquitous presence of digital cameras in various mobile devices such as cell phones, PC webcams, and gaming devices. Such a trend incentivizes the development of smaller imagers at lower cost with the least sacrifice of image quality. A digital camera is a complex system that is composed of a lens, a sensor, and a digital image processor, each of which is a sophisticated system on its own. Since making a test system is costly and time-consuming, it is highly desirable for imager makers, system integrators, and image processing algorithm developers to explore the design tradeoffs and visualize the outcome images via simulation before the actual system becomes available.

The digital camera system is usually modeled as a linear system based on the principles of optics and device physics [1]–[3], [5], [7].
Manuscript received January 6, 2009; revised April 22, 2009. The review of this paper was arranged by Editor P. Magnan.
J. Chen, D. Bakin, and R. Gravelle are with Aptina Imaging LLC, San Jose, CA 95134 USA (e-mail: [email protected]; dbakin@aptina.com; [email protected]).
K. Venkataraman is with Pelican Imaging Corporation, Mountain View, CA 94041 USA.
B. Rodricks is with Fairchild Imaging, Milpitas, CA 95035 USA.
P. Rao is with Pixim Inc., Mountain View, CA 94043 USA.
Y. Ni is with OmniVision Technologies, Inc., Santa Clara, CA 95054 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TED.2009.2030995
Fig. 1. Camera model. A camera is modeled as a cascaded system of optics, a sensor, and a digital image processor. The sensor is modeled as a series of filtering (the infrared filter, the color filter array, and the pixel QE) followed by a sampling of the sensor grid, where signal crosstalk happens simultaneously.
As shown in Fig. 1, a camera takes the trace of light from a scene and passes it through the lens to the sensor. In the context of this paper, the word scene refers to a projection of information from any physical environment into the eyes or a camera. Under the assumption that all objects only reflect the incoming light and that there is a single light source, such a projection can be decomposed into the interaction of the object spectral reflectance and the spectral energy of the light source. The content of a scene may be synthetic targets (for example, a Macbeth Color Checker) or calibrated representations of the real world [5], [6]. A simple approximation of the imaging lens is a parametric diffraction-limited imaging optics model. It accounts for a position-dependent irradiance degradation by the cosine fourth law and a linear blur by the spectral optical transfer function (OTF) [1], [9]. More advanced lens models have been proposed [4], [8], [11] for better estimation of a real system. The sensor module simulates the transformation of irradiance into an electrical signal. It can be a physical or an empirical model. Since an accurate physical model is tied too tightly to a specific sensor design, an empirical model is often used instead. By utilizing sensor characterization data, an empirical model separates the sensor design details from the key system performance descriptors and thus works better
as a more generic block. The sensor module accounts for the spatial sampling of the optical image by the imaging sensor, the size and position of pixels and their fill factors, the spectral selectivity of the color filters, the transmittance of the infrared cut filter, the photodetector spectral quantum efficiency (QE), the conversion gain (from electrons to voltage), and various noise sources; integrates over spectral bands, photosensitive areas, and exposure time; passes the integrated voltage through an analog-to-digital converter (ADC); and generates a raw Bayer image output in digital counts (DNs).

The aforementioned camera modeling framework has been widely accepted and proven to be quite successful [1]. However, when it comes to validation against an actual digital camera, there is still a gap caused by the limited sampling precision of the model input (scene), the simplistic diffraction-limited optics model, and sensor-production-related issues. In an effort to narrow this difference, the proposed work extends the existing model in the following aspects:
1) Build a high-resolution (16 M) multispectral scene library that includes various scenes (HDR or non-HDR) and represents the scene by spectral reflectance rather than direct spectral exitance, which allows flexibility to simulate the same scene with different light sources. This is useful particularly for synthetic targets.
2) Extend the optics model to a real lens whose spectral characterization data can be extracted from the lens design in Zemax or Code V,1 resulting in closer resemblance to an actual system.
3) Extend the sensor model to include signal crosstalk for better approximation of the actual system.
Hereafter, the proposed digital camera simulation tool is referred to as the camera and scene modeling (CSM) simulator. Similar to prior work in the literature, it has three major components: scene, lens, and sensor. Sections II–IV discuss each element in detail. Section V addresses the practical implementation issues. Section VI outlines the system validation efforts, followed by the conclusion in Section VII.
II. MULTISPECTRAL SCENE
The CSM uses a high-end Canon EOS-1Ds Mark II camera with a Canon EF 50-mm f/1.4 USM lens for scene data collection. An important assumption made in the CSM is that the high-resolution scenes captured by the Canon camera are deemed to be of infinite resolution. As a result, the performance of the simulation will be limited by the performance of the Canon camera and the lens that were used. Since products in the mobile space are of primary interest, such an assumption is generally valid. For a higher end system, a better input needs to be used instead to ensure quality.

The dynamic range of the selected Canon camera is about 70 dB. However, many scenes in nature exceed this dynamic range and can go up to 100 dB or more.
1 Both are popular lens design software tools. For details, please refer to www.zemax.com and www.opticalres.com.
To ensure that the CSM is capable of simulating various scenarios, it is important to include both HDR and non-HDR scenes in the database. The CSM uses the combined multiple-exposure approach to extend the dynamic range of a camera. Note that this works only if the objects in the scene are static and the lighting variation is tiny. When a nonstatic object is present in the scene, the registration of the multiple captures could be quite complicated, and, in our practice, a single capture was used. However, more advanced image registration methods are widely available in the literature, and they could be used to improve the system performance.

The synthesis of multispectral scene data from captured RGB images of natural scenes includes two steps. The first step is the calibration of a given camera plus lens system and the derivation of the camera response curve that maps the RGB values in DNs to the RGB exitance in watts per square meter. The second step is to extend the RGB exitance to multispectral data via a principal component analysis (PCA). Sections II-A and II-B describe the details of each step.

A. Obtain Camera Response Curve
Various techniques [12]–[15] have been proposed in an effort to determine the relationship between camera signals and radiometric quantities. The seminal work in the field of multiple-exposure composition by Debevec and Malik [12] is followed for its mathematical simplicity and robustness. The methodology is quite simple. Define g() as the transfer function of the imaging system. A pixel DN value (Z) obtained with exitance E at a scene point and exposure time T is then related by Z = g(ET). The inverse function g⁻¹ is often referred to as the camera response curve. By constraining g to be monotonic, limiting the Z values to a given range, adding some second-order smoothness constraints, and performing the optimization in the natural logarithmic domain, the unknown g⁻¹ can be obtained via a standard least squares optimization method when enough pixel values are used.

The Canon camera was set to raw capture mode to obtain 12-b 16-M raw images. Absolute raw values are used to ensure that the true camera response is obtained.2 Fig. 2 shows the Canon camera response curve g⁻¹. The data points for solving the algorithm were obtained by imaging a Macbeth chart in a light booth with different exposure times. The minimum and maximum exposure times were selected based upon the criteria of slightly underexposing the highlights and almost saturating the shadows in the scene, respectively. The data from the 24 patches were used to obtain the response curve for each channel.

The curves in Fig. 2 are on a relative scale. Absolute calibration can be derived by measuring an absolute luminance in the scene being photographed. Increasing the number of exposures helps to average out noise and thus boosts the reconstruction accuracy when the provided multiple exposures fall in the linear range of the sensor response function and are well separated.

2 The Canon camera outputs raw images in .CR2 format. The open source code dcraw was used to extract the raw DNs. Details on dcraw are available at www.guillermoluijk.com/tutorial/dcraw/.
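The CSM itself was implemented in MATLAB. As an illustration only, the NumPy sketch below sets up the Debevec–Malik least-squares system described above; monotonicity is not enforced explicitly, the variable names are ours, and for a 12-b camera one would in practice subsample pixels (or coarsen the DN grid) rather than solve this dense system directly.

```python
import numpy as np

def gsolve(Z, log_t, lam=100.0, n_levels=4096):
    """Recover the log response curve g and the log exitances lnE from
    integer DN samples Z[i, j] (pixel i, exposure j) with log exposure
    times log_t[j], following the least-squares formulation of [12]."""
    n_pix, n_exp = Z.shape
    n_unknowns = n_levels + n_pix                 # g(0..n_levels-1) plus lnE per pixel
    A = np.zeros((n_pix * n_exp + n_levels - 1, n_unknowns))
    b = np.zeros(A.shape[0])

    # Hat-shaped weighting de-emphasizes DNs near the ends of the range
    def w(z):
        return z + 1 if z <= (n_levels - 1) / 2 else n_levels - z

    k = 0
    for i in range(n_pix):
        for j in range(n_exp):
            wij = w(Z[i, j])
            A[k, Z[i, j]] = wij                   # g(Z_ij)
            A[k, n_levels + i] = -wij             # -lnE_i
            b[k] = wij * log_t[j]                 # from g(Z_ij) = lnE_i + ln t_j
            k += 1

    A[k, n_levels // 2] = 1.0                     # fix the scale: g(mid) = 0
    k += 1

    for z in range(1, n_levels - 1):              # second-order smoothness on g
        wz = lam * w(z)
        A[k, z - 1], A[k, z], A[k, z + 1] = wz, -2.0 * wz, wz
        k += 1

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n_levels], x[n_levels:]             # g (indexed by DN), lnE
```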
Fig. 2. Response curves for the Canon EOS-1Ds Mark II camera with a 50-mm EF USM f/1.4 lens. The calculation was performed per color channel on the Canon raw data. The curves are obtained for each channel and plotted in its prime color. They almost sit on top of each other.

It should also be noted that the number of data points used to solve for the radiometric response function should be greater than the range of the pixel values. For an N-b scale, this needs to be 2^N − 1 or greater to ensure a sufficiently overdetermined system of equations.

B. Obtain Multispectral Scene Data

With the curves defined in Fig. 2, the DNs of the Canon-captured image are mapped to the exitance of the R, G, and B color channels. The next step is to derive the spectral exitance from the RGB exitance obtained. This is done by utilizing a calibrated target. More specifically, 24 patches of a color checker were lit by three light sources: daylight, tungsten, and fluorescent. Under each lighting condition, multiexposure images of the chart were captured with the Canon camera, and the spectral exitance was measured with a Gamma Scientific spectroradiometer from 400 to 700 nm with a 10-nm step, i.e., 31 spectral bands. Under the assumption that the spectral exitance of the 24 patches is adequate to span the spectral exitance space, any RGB exitance can thus be expressed as a linear combination of them. To further reduce the dimensionality, principal component analysis (PCA) [21] was applied. PCA is a popular mathematical method that is particularly useful in identifying the largest variations in the data via principal components (PCs) and representing the data in a coordinate system defined by the PCs. Six PCs were found to be adequate for the CSM. Now, we have

Si = B · wi    (1)

where Si (i = 1, . . . , 24), a column vector (31 × 1), denotes the exitance of each of the 24 color patches, B is the PC matrix (31 × 6) with each column corresponding to one PC, and wi is the weighting vector (6 × 1) associating each of the 24 color patches with the selected PCs. Assuming that the spectral exitance (Si) is related to the RGB exitance (ri, a 3 × 1 vector) linearly

ri = C Si    (2)

where C is the 3 × 31 conversion matrix. Substituting Si with (1) in (2) gives

ri = C B wi.    (3)
Since B is constant, the product of C and B can be combined into one matrix CB (which is now 3 × 6). For a given scene, once CB is determined and the RGB exitance is known, the weighting vectors wi can be calculated from (3), and the spectral exitance Si can be obtained via (1). Note that, for this calibration stage, a 24-patch color checker image captured under a similar lighting condition as the scene is always required.

In order to provide the flexibility of varying light sources, the aforementioned calculation was performed in the reflectance space, and the scene data saved are the spectral reflectance rather than the spectral exitance. The concept and procedure stay largely the same. In Fig. 3, the estimated spectral reflectance is plotted against the measured one. They are comparable, and most of the deviation occurs in the long wavelength range. This is of less importance as, in a real system, an infrared cut filter would have cut those wavelengths anyway.
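As an illustration of (1)–(3), the sketch below derives B and CB from the 24 measured patches and then recovers a spectrum from a single RGB triple. The CSM code is in MATLAB; this NumPy version is ours, and both the omission of mean-centering (so that (1) holds exactly) and the minimum-norm pseudoinverse solution of the underdetermined 3 × 6 system are assumptions the paper does not spell out.

```python
import numpy as np

def calibrate_cb(patch_rgb, patch_spectra, n_pcs=6):
    """patch_rgb: 24 x 3 RGB exitance; patch_spectra: 24 x 31 measured spectra.
    Returns the basis B (31 x n_pcs) and the combined matrix CB (3 x n_pcs)."""
    # Leading singular vectors of the patch spectra serve as the PCs.
    _, _, vt = np.linalg.svd(patch_spectra, full_matrices=False)
    B = vt[:n_pcs].T                                     # 31 x 6

    W = patch_spectra @ B                                # 24 x 6 weights, eq. (1)
    # Least-squares fit of r_i = (CB) w_i over the 24 patches, eq. (3)
    CB_T, *_ = np.linalg.lstsq(W, patch_rgb, rcond=None)
    return B, CB_T.T                                     # CB is 3 x 6

def rgb_to_spectrum(rgb, B, CB):
    # Minimum-norm solution of the underdetermined system (3), then eq. (1)
    w = np.linalg.pinv(CB) @ rgb
    return B @ w
```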
III. OPTICS SIMULATION
Optics plays a critical role in the imaging system. It transmits and refracts the light reflected by objects in a scene and projects the part of it that is within the field of view onto the sensor plane. Like any system, it also introduces errors. Aberrations of an optical system fall into two classes, namely, monochromatic and chromatic aberrations [16]. The following data characterize the key aberrations:
1) Distortion data describe the deviation from a rectilinear projection. Barrel and pincushion are the most common distortions, caused by the image magnification decreasing or increasing with the distance from the optical axis. Fig. 4 illustrates both effects.
2) Relative illumination data reflect the reduction of an image's brightness or saturation at the periphery compared to the image center.
3) Point spread function (PSF) data describe the blur (defocus) that a lens introduces to the image resolution; when sampled spatially on the image plane, they account for spherical aberration, coma, astigmatism, and aberrations caused by field curvature; when sampled spectrally, they reflect chromatic aberration as well.

When only a few key lens parameters (focal length, magnification, and f-number) are known, the optics system is assumed to be diffraction limited, with no distortion, the cosine fourth law for relative illumination, and spatially invariant PSFs. For a circular aperture, the OTF of such a system is [20]

OTF(x) = (2/π) [cos⁻¹(x) − x √(1 − x²)]    (4)

where the arc cosine function is in radians and x is the normalized spatial frequency, defined as the absolute spatial frequency f divided by the diffraction cutoff spatial frequency fc

x = f/fc.    (5)
Fig. 3. (Red) Estimated versus (green) actual spectral reflectance for 24 patches of a color checker.
Fig. 4. Optics distortions: (Left) Barrel and (Right) pincushion distortion.
There are various formulas to calculate fc; for example,

fc = D/(λf)    (6)
where λ denotes the wavelength, f is the focal length, and D stands for the diameter of the lens clear aperture. Two assumptions are made in (6), namely, a unity image-space refractive index and an infinite conjugate ratio.
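For the diffraction-limited case, (4)–(6) can be evaluated directly. A minimal sketch with illustrative parameter values (not taken from the CSM code):

```python
import numpy as np

def diffraction_otf(f_cyc_per_mm, wavelength_mm, focal_len_mm, aperture_mm):
    """Diffraction-limited OTF of a circular aperture, eqs. (4)-(6)."""
    fc = aperture_mm / (wavelength_mm * focal_len_mm)                  # eq. (6)
    x = np.clip(np.asarray(f_cyc_per_mm, dtype=float) / fc, 0.0, 1.0)  # eq. (5)
    return (2.0 / np.pi) * (np.arccos(x) - x * np.sqrt(1.0 - x * x))   # eq. (4)

# Example: 550-nm light, 4-mm focal length, f/2.8 lens (D = f / f#)
freqs = np.linspace(0, 700, 8)                      # cycles/mm
otf = diffraction_otf(freqs, 550e-6, 4.0, 4.0 / 2.8)
```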
The modeling of an image formed by a real lens is traditionally done using optical design programs. The simulation of the optical image formation through a multielement lens typically requires millions of ray-tracing operations and is time-consuming. Fortunately, programs such as Zemax or Code V allow computing projection passes of multiple rays through the optical elements of the imaging system and establishing the shape of the PSFs, as well as local image distortions and changes in relative illumination. The calculations are based on lens prescriptions provided by lens designers. The ray-tracing data allow characterizing the image at various field heights. The data are collected over the full field on a rectangular grid at preselected image heights and over the operating spectral range. A specially written macrocommand is used to operate the optical design program in an automated manner. The data are recorded into files for processing by the CSM in the subsequent steps. The accuracy of the modeling results was confirmed by laboratory measurements against the performance of a real lens.

With the data obtained, utilizing them for simulation is relatively straightforward. The distortion is simulated as a sampling grid mapping followed by resampling to a rectangular grid. Optical vignetting is simply a scaling of the data based on spatial position. The application of PSFs can be done in the Fourier domain for diffraction-limited optics. In the real-lens case, as the PSFs are spatially varying, the Fourier domain calculation loses its advantage, and spatial domain convolution is used instead. The PSF at each spatial position is estimated and resampled, followed by 2-D convolution. The convolution kernel size varies from 31 × 31 to 81 × 81 pixels or bigger, depending on the PSF sampling spacings. As the image dimension increases, the computation becomes forbiddingly slow, not to mention that these calculations need to be repeated for each spectral band. Section V discusses ways to accelerate the process via parallel computing.

A lens only passes part of the energy reflected by the scene to the sensor plane. The relationship between the scene radiance L and the sensor plane irradiance E is governed by the basic principles of radiometry [19]

E = πL / [1 + 4 (f#(1 + |m|))²]    (7)
where f# and m are the effective f-number and the magnification of the lens, respectively. The sensor plane irradiance represents the amount of energy that hits the pixel. The sensor then converts the irradiance image on the sensor plane into electrical signals.
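As noted above, the spatially varying PSFs of a real lens rule out a single Fourier-domain multiplication. The paper applies a per-position spatial convolution; the sketch below is a coarser per-tile approximation of that idea, and the helper psf_at(), standing in for the PSF estimation and resampling step, is hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_spatially_varying_psf(band_img, psf_at, tile=256):
    """Approximate the spatially varying blur of one spectral band by
    convolving each tile with the PSF sampled at its center.
    psf_at(y, x) is a hypothetical helper returning the locally resampled
    PSF (e.g., 31 x 31 to 81 x 81), assumed normalized to unit sum."""
    h, w = band_img.shape
    out = np.zeros_like(band_img)
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            psf = psf_at((y0 + y1) // 2, (x0 + x1) // 2)
            pad = max(psf.shape) // 2
            # Convolve a padded tile so that tile borders see real neighbors
            ys, xs = max(y0 - pad, 0), max(x0 - pad, 0)
            ye, xe = min(y1 + pad, h), min(x1 + pad, w)
            blurred = fftconvolve(band_img[ys:ye, xs:xe], psf, mode="same")
            out[y0:y1, x0:x1] = blurred[y0 - ys:y0 - ys + (y1 - y0),
                                        x0 - xs:x0 - xs + (x1 - x0)]
    return out
```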
IV. SENSOR MODEL
A sensor is an electrical device consisting of an array of pixels. It converts incoming photons into electrons via photodetectors, collects the converted electrons and translates them to electric potentials through a source follower, amplifies the analog signal via chains of amplifiers, and then quantizes it to a digital signal via an ADC for further digital signal processing. There are two kinds of imaging sensors, namely, the CCD and the CMOS sensor. They follow similar working principles and differ largely in their readout circuits. As a result, the basic sensor electrical model remains the same, with some differences in the sensor noise model components.

This paper focuses on CMOS imaging sensor modeling. The following sections briefly review the electrical and noise models, which are quite well established, and the simple linear quantizer. The sensor crosstalk model is discussed in much greater detail.
A. Sensor Electrical Model

The sensor collects the signal spectrally and spatially

V = cg · q · ∫_0^T ∫_A ∫_{λmin}^{λmax} N(λ) QE(λ) S(λ) dλ dA dt    (8)
where V (in volts) is the sensor output, cg (in volts/e⁻) is the pixel conversion gain, q denotes the electron charge constant, QE(λ) is the spectral QE of the pixel, S(λ) denotes the combination of spectral filters in the sensor, including the transmittance of the color filters and the infrared cut filter, A represents the effective photodetector area, λ and T stand for the wavelength and exposure time, respectively, and N(λ) is the number of incoming photons, which is related to the sensor plane irradiance E defined in (7) by

N(λ) = E(λ) · λ/(h · c)    (9)
where h and c are Planck's constant and the speed of light, respectively. As a monochromatic sensor is difficult to obtain, QE is usually measured with the color filter transmittance included.

The bottom of Fig. 1 models the sensor as a series of filtering operations followed by sampling on the sensor grid, where signal crosstalk happens simultaneously. Although sampling is a relatively easy operation, great care needs to be taken to prevent signal aliasing. Crosstalk is a very complicated process and will be addressed in Section IV-C.
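A discrete, per-pixel reading of (8) and (9) is sketched below. The electron charge q of (8) is folded into the conversion gain (taken in V/e⁻) here, which is an interpretation on our part; units are noted in the comments.

```python
import numpy as np

H = 6.626e-34      # Planck's constant, J*s
C = 2.998e8        # speed of light, m/s

def pixel_voltage(irradiance, qe, filt, wavelengths_m, pixel_area_m2,
                  exposure_s, conv_gain_v_per_e):
    """Discrete approximation of eqs. (8)-(9) for one pixel.
    irradiance: spectral irradiance at the pixel, W/(m^2 * m), sampled at
    wavelengths_m; qe and filt are the pixel QE and the combined filter
    transmittance on the same wavelength grid."""
    photons = irradiance * wavelengths_m / (H * C)       # eq. (9): photons/(s*m^2*m)
    flux = np.trapz(photons * qe * filt, wavelengths_m)  # electrons/(s*m^2)
    electrons = flux * pixel_area_m2 * exposure_s
    return conv_gain_v_per_e * electrons                 # volts
```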
B. Sensor Noise Model
Fig. 5. Illustration of sensor crosstalk. (Left) Optical crosstalk. (Right) Electrical crosstalk. The incoming light has already been converged by the microlens. The figure shows how the signal intended for the center red pixel gets coupled into the neighboring green pixels optically and electrically.

An additive sensor noise model [17], a well-known industry standard, is used. Among the various noise sources, the most significant ones are the following.
1) Photon shot noise (Nshot) occurs due to the random arrival nature of photons at the sensor and is a random process that obeys Poisson statistics in electrons or photons.
2) Read noise (Nread) is rooted in the on-chip preamplifier and occurs during reset and pixel value readout. It has two parts, namely, white Gaussian noise and flicker noise. The Gaussian part decreases as the analog gain increases. The flicker noise varies approximately inversely with frequency and affects lower frequencies more. It stays relatively flat in the frequency range of the sensor output and thus is ignored in the CSM modeling.
3) Dark current noise (Ndark) has two parts: one is shot noise (which follows the Poisson distribution) that is produced when the photocathode is shielded from all external optical radiation and operating voltages are applied; the other is the dark current nonuniformity noise, modeled as a white Gaussian distribution. It is also considered one of the fixed pattern noises.
4) Fixed pattern noise (Nfpn) is a nontemporal spatial noise due to device mismatches or variations in the pixel/color filters, gain amplifier, and ADC. In addition to the dark nonuniformity noise, two other noise sources are considered, namely, the photon response gain nonuniformity noise (Ngain), modeled as a Gaussian noise with unity mean, and the row-wise fixed pattern noise, modeled as a column vector that is the same for each row, with zero mean and a magnitude that is half of the maximum of the read noise per row.

The noise parameters are measured through experiments and assumed to be constant for a given sensor, except for photon shot noise, which correlates with the signal itself, and dark noise, which depends on exposure time. With all noise sources expressed in volts, the sensor analog output Vo becomes

Vo = (V + Nshot) · Ngain + Nread + Ndark + Nfpn    (10)
where V denotes the real sensor signal.
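A possible realization of (10), with the distributions described above and illustrative parameter names; the actual magnitudes are sensor characterization inputs, and this NumPy sketch is ours rather than the CSM's MATLAB code.

```python
import numpy as np

def add_sensor_noise(v_signal, cg, read_sigma_v, dark_e_mean,
                     prnu_sigma, dsnu_sigma_v, rng=np.random.default_rng(0)):
    """Apply the additive noise model of eq. (10) to an ideal voltage image.
    v_signal: ideal signal in volts; cg: conversion gain (V/e-)."""
    h, w = v_signal.shape

    # Photon shot noise: Poisson in electrons, then back to volts (V + Nshot)
    v_shot = rng.poisson(np.maximum(v_signal / cg, 0.0)) * cg

    # Photoresponse gain nonuniformity (unity-mean Gaussian)
    n_gain = rng.normal(1.0, prnu_sigma, size=(h, w))

    # Read noise (zero-mean Gaussian) and dark current (shot noise + DSNU)
    n_read = rng.normal(0.0, read_sigma_v, size=(h, w))
    n_dark = rng.poisson(dark_e_mean, size=(h, w)) * cg \
             + rng.normal(0.0, dsnu_sigma_v, size=(h, w))

    # Row-wise fixed pattern noise: one offset per row, scaled by read noise
    n_row = rng.normal(0.0, 0.5 * read_sigma_v, size=(h, 1)) * np.ones((1, w))

    return v_shot * n_gain + n_read + n_dark + n_row     # eq. (10)
```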
C. Sensor Crosstalk Model
Fig. 6. Crosstalk experiment setup illustration. Two rotational axes and the CRA are marked.

Fig. 7. (Left) Micron MT9T012 sensor QE along with the transmission curves of the filters used to obtain narrow-band light sources. (Right) The signal captured by the sensor; each color channel output is plotted in its prime color.

Crosstalk refers to any phenomenon by which a signal transmitted on one channel creates an undesired effect in another channel. Based on its nature, sensor crosstalk can be classified into two categories: electrical and optical crosstalk. Fig. 5 illustrates the crosstalk graphically. Electrical crosstalk is a result of photogenerated charge migration between pixels. Its impact becomes more pronounced as the pixel pitch is reduced. Optical crosstalk is due to photons incident on a pixel that are then captured in an adjoining pixel. Optical crosstalk is more severe at the edges of the sensor, where incident angles are high, and is further impacted by the pixel geometry and layout structure. For on-axis pixels, whose radiation is received at normal incidence, there is minimal optical crosstalk. As we move away from the sensor center, both crosstalks kick in, and it is impossible to separate them. However, as the pixel shrinks, optical crosstalk starts to pick up even for the on-axis pixel. That is because, when maintaining the same f-number, the lens could generate a spot size that is bigger than a pixel or, when reducing the f-number, the lens moves to a larger incident angle as the photodetector shrinks.

The maximum chief ray angle (CRA) of a lens is the angle of incidence of a ray that passes through the center of the lens and reaches the diagonal edge of the sensor. The smaller the maximum CRA, the better the light transfer efficiency and, hence, the better the image quality. A typical cell-phone module maximum CRA is about 20°. When a sensor design is fixed, both the microlens placement and the lens CRA as a function of spatial position and wavelength are known. When the actual lens CRA deviates from the design assumption, optical crosstalk is elevated.

An experiment has been designed to catch such a mismatch. Since it is impossible to separate the electrical crosstalk from the optical crosstalk, the data collected account for both. Narrow-band light sources are used. No lens is included in the setup. Light is incident onto a sensor, and the sensor is oriented such that a certain range of pixels receives photons at angles that satisfy the conditions under a lens+sensor imaging system. Since rotating the sensor and changing the light incident angle are equivalent, an automated test platform is designed to rotate the sensor along the horizontal, vertical, and diagonal axes of the sensor. Fig. 6 illustrates the experiment setup. For example, when the sensor is rotated around the horizontal axis, only the pixels denoted by the green line satisfy the CRA condition and can be used to reconstruct the crosstalk at those locations. Similarly, if the sensor is rotated around the vertical axis, only the pixels denoted by the blue arrows satisfy the
CRA condition, and so on. Note that such an experimental design measures low-frequency crosstalk only.

Two sets of data are collected. In the first set, the sensor is rotated along its horizontal, vertical, and diagonal axes over an angular range of ±30° in steps of 1°. This is wide enough to cover most cellular sensors. At each rotation step, only certain pixels along the axis satisfy the CRA requirements for the lens+sensor system of interest and generate signals. A total of 61 images were collected for each of the horizontal, vertical, and two diagonal rotations of the sensor.

To facilitate later data interpolation, it is desirable to have the data sampled on a rectangular grid. This is achieved by the second data set. The sensor is scanned over a series of angles to cover spatially uniformly distributed rectangular grid points across the entire sensor plane; the crosstalk data are measured at pixels that satisfy the CRA condition. On a 13 × 17 grid, 221 images uniformly distributed across the entire sensor were collected during this second stage. The spacing between data points was approximately 75 pixels on a 3-M imaging sensor. A complete data set contains 465 images. Each image was an average of 100 images to reduce temporal noise.

Fig. 7 (left) shows the spectral QE of a Micron MT9T012 sensor [18] (solid lines) along with the normalized transmission spectra (dashed lines) of the seven filters used to obtain the narrow-band light sources, which cover the 400- to 700-nm range and are centered at 400, 450, 500, 550, 600, 650, and 700 nm. Fig. 7 (right) shows a plot of the response of the red, green, and blue pixels to the signal that is incident on them (modulated by the filter at the specified wavelength). It shows the calculated total response (pixel response and crosstalk) of the pixel located at the sensor center. The area under each curve, for a given filter, can be integrated to calculate the percentage of the signal expected under each of the pixels corresponding to the RGB of the CFA.

For each narrow-band light source, the image that corresponds to the sensor center is identified, and an average signal over an 11 × 11 region of interest is calculated for the RGB pixels. The measured values at other positions are then normalized to this value and saved as percentages. The resulting percentages are used in the CSM as positional/channel-varying scaling factors for each spectral band. As with the other data, the missing samples are obtained via interpolation.
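One way to expand the 13 × 17 grid of measured factors to a per-pixel scaling map is plain bilinear interpolation; the sketch below is an assumption about the interpolation scheme, which the paper does not specify, and the helper names are ours.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def crosstalk_scale_map(grid_factors, grid_rows, grid_cols, height, width):
    """Expand crosstalk factors measured on a coarse grid (e.g., 13 x 17)
    to a per-pixel scaling map for one channel and spectral band."""
    interp = RegularGridInterpolator((grid_rows, grid_cols), grid_factors,
                                     bounds_error=False, fill_value=None)
    yy, xx = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    pts = np.stack([yy.ravel(), xx.ravel()], axis=-1)
    return interp(pts).reshape(height, width)

# Usage (illustrative sizes): band_img *= crosstalk_scale_map(
#     factors_13x17, row_coords, col_coords, 1536, 2048)
```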
Fig. 8. MTF comparison: data measured from the captured versus the simulated images at 300 lx for the MT9T012 sensor. Nf marks the percentage drop at the Nyquist frequency.

TABLE I. LUMINANCE STANDARD DEVIATION COMPARISON
D. ADC
The CSM models the ADC as a linear quantizer. The step size (in volts) is

s = W · cg / (2^bits − 1)    (11)
where W is the pixel linear full-well capacity in electrons (e⁻), cg is the pixel conversion gain as defined in (8), and bits is the pixel circuit internal bit depth; a typical number is 10 or 12 b.
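A direct implementation of the quantizer in (11); clipping to the valid DN range is our addition.

```python
import numpy as np

def quantize(v_analog, full_well_e, cg, bits=10):
    """Linear ADC of eq. (11): step size s = W * cg / (2^bits - 1)."""
    s = full_well_e * cg / (2 ** bits - 1)
    dn = np.round(v_analog / s)
    return np.clip(dn, 0, 2 ** bits - 1).astype(np.uint16)
```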
V. IMPLEMENTATION
Thus far, we have introduced a simulator with high-resolution multispectral input scene data, a full real-lens system characterization through spectrally and spatially varying distortion, shading, and PSF data, and a sensor model that takes into account not only the electrical and noise models but also crosstalk. It is a much-closer-to-reality model. However, all of this raises the requirements on CPU speed and memory. The CSM workflow details are summarized in Section V-A, and Section V-B explores approaches for system acceleration.

Fig. 9. Relative illumination comparison of the captured versus the simulated images at 300 lx for the MT9T012 sensor. Values taken from row 510 of the image pair shown in Fig. 10(a).

A. CSM Workflow

In summary, the following steps are followed by the CSM; a sketch of the main simulation loop (step 3) is given after the list.
1) Build a multispectral scene database (please refer to Section II for details).
   a) Camera calibration: Choose a camera+lens system, and perform calibration. The calibration only takes place once for a given system. The camera should have enough resolution and be capable of raw data capture.
   b) Obtain multispectral scene data: Capture images with the system calibrated in step a). All cases of interest should be considered, including HDR and non-HDR cases. A Macbeth chart must either be present in the captured scene or be captured under a similar lighting condition. Calculate the multispectral scene data, and save them for use in the later steps. Note that the PCA on the calibration data only needs to be performed once.
2) Obtain the system characterization data.
   a) Pixel characterization data: Including the spectral QE for each channel of the CFA, the IR-cut filter transmittance, the pixel conversion gain, the full well, and the noise parameters.
   b) Optical characterization data: Including the effective f-number, magnification, and focal length, functions that describe the mapping of relative illumination and distortion, and a sampled set of PSFs on a rectangular grid. The mapping functions and the PSFs are sampled spectrally as well.
   c) Crosstalk characterization data: Perform the experiment described in Section IV-C.
3) Perform the system simulation (please refer to Sections III and IV for details).
   a) Apply the optical simulation, the sensor electrical model, and the crosstalk data for each spectral band.
   b) Repeat step a), and accumulate the results until all the spectral bands have been looped over.
   c) Apply the sensor noise model to the accumulated voltage from step b).
   d) Quantize the result of the previous step to DNs.
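The sketch below mirrors step 3 of the workflow. The helper objects (optics, sensor, crosstalk) are hypothetical stand-ins for the modules described in Sections III and IV, and the CSM's actual implementation is in MATLAB.

```python
def simulate_raw(scene_reflectance, illuminant, bands, optics, sensor, crosstalk):
    """Sketch of workflow step 3: loop over spectral bands, accumulate the
    per-band voltages, then apply noise and quantization."""
    v_accum = None
    for b, wl in enumerate(bands):                       # steps 3a-3b
        radiance = scene_reflectance[..., b] * illuminant[b]
        irradiance = optics.apply(radiance, wl)          # distortion, shading, PSF, eq. (7)
        v_band = sensor.integrate_band(irradiance, wl)   # eqs. (8)-(9), per CFA channel
        v_band = crosstalk.scale(v_band, wl)             # Section IV-C scaling factors
        v_accum = v_band if v_accum is None else v_accum + v_band
    v_noisy = sensor.add_noise(v_accum)                  # step 3c, eq. (10)
    return sensor.quantize(v_noisy)                      # step 3d, eq. (11)
```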
Fig. 10. Comparison of the captured versus the simulated images at 10 and 300 lx. The images at 10 lx were cropped at 100% zoom. The MT9T012 sensor was used. (a) Captured at 300 lx. (b) Simulated at 300 lx. (c) Captured at 10 lx (100% zoom). (d) Simulated at 10 lx (100% zoom).

B. Practical Concerns

A Dell Precision 380 PC with an Intel Pentium Duo CPU (each core at 3.2 GHz) and 8 GB of RAM was used for the simulation. This computer runs a 64-b Windows OS. A typical simulation of 16-M input spectral scene data, with the PSFs resampled to about 31 × 31 pixels for the optical convolution, takes about 36 h, where a huge chunk of the time is spent on the optical simulation. The simulation code, written in MATLAB, takes code optimization into consideration (such as using vector operations whenever possible). The sensor simulation is separated from the main optical simulation such that a change in some simulation conditions, such as the lux level or exposure time, does not require a rerun of the whole system. However,
when a different scene is of interest, the lens design changes, or a different illuminant is desired, the whole simulation needs to be rerun. What is more, the computation time increases dramatically as the PSF sampling spacing increases. Such a computation bottleneck makes the use of the simulator inconvenient, particularly for work with tight deadlines.

A few approaches to accelerate the optical simulation were considered. As the PSF estimation and convolution are defined within a certain window with respect to a given pixel, it is quite natural to think along the lines of parallel computing. It can be done via multiple processors on a network of computers (grid computing) or on a single computer, or through other hardware that is parallel in nature and computationally powerful. All approaches have been considered and tested. In the current implementation, both the multiple processors of a single computer and the graphics processing unit of an Nvidia GeForce 8800 GTX graphics card are utilized. The acceleration has significantly shortened the running time to about 3 h for the typical setting described previously, a much more acceptable time frame.
VI. SYSTEM VALIDATION
The proposed simulator outputs have been validated against actual captured images. In the examples shown in Fig. 10, a Micron 2.2-μm sensor MT9T012 [18] was used. The lens characterization data were provided by external collaborators.
Fig. 10 shows a pairwise comparison of the simulated versus the captured images at 10 and 300 lx under D65. All images were only processed by a simple demosaic algorithm for display purposes. No other digital processing was applied. Qualitatively, they look alike, with similar shading, distortion, brightness, and noise levels. Fig. 10(c) and (d) examines the 10-lx results more closely. Note that the ISO targets (such as slanted edges) present in the images were synthetic and thus can be used in the calculation of quantitative values. The example shown was not an HDR scene.

Fig. 8 compares the system modulation transfer function (MTF) of the simulated system versus that of an actual system measured from the images captured at 300 lx. Consistent with the images, the MTF of the simulated system stands higher than that of the captured one. This is because achieving exact focus for the real captures was a little difficult due to the highly sensitive lever on the module used. The simulation, on the other hand, achieves excellent focus since the focus can be set explicitly. Table I lists the luminance standard deviation calculated from the achromatic row of the center color checker in Fig. 10. The calculation for all patches shows, on average, an 8% difference between the simulated and the captured images. The bigger differences tend to show up on the darker patches (patches 10, 13, 15, and 24). The patch number starts from the top left corner and increases in row-wise scan order. The average difference drops to 5% with the dark patches removed from the calculation.
Fig. 9 plots the green channel intensity at row 510 for the image pair shown in Fig. 10(a) and (b). The values basically sit on top of each other, which is consistent with the visual impression as well. The brightness levels of the captured set and the simulated set exhibit a slight difference. The omission of the modeling of the lens transmissivity may account for some of it. The optical distortion is slightly different between them. One potential cause is production deviation. Since the distortion is not very sensitive once the optics tooling is fixed, it is more likely due to the adjustment of the subject distance with the macro lever, which affects the distortion. Defects are visible in the captured images, particularly at lower light levels. The simulated images do not have any defects, as no defect model was used. However, we do have the capability to model defects and can easily do so.

Overall, they are fairly close to each other visually and quantitatively. The proposed digital camera simulation system appears to be good over ranges of parameters for the purpose of design and overall system evaluation.
VII. CONCLUSION

A digital camera simulator built upon the framework provided by the ISET simulator has been presented. Real-world information is added to each component to bring it closer to a real use case. The scene module is extended to a much higher resolution with the capability to simulate HDR scenes. What is more, the light source is isolated from the scene reflectance to enable the simulation of various light sources without performing a large number of experiments in a laboratory. The optics is characterized by a set of data that allows the simulation of an actual lens design without the need for the lens prescription. Crosstalk characteristics are captured via designed experiments, which further narrows the gap between the simulator output and the real product. The simulator is also accelerated to make it convenient for daily use. Our results indicate that it is capable of generating results that are close to those of the actual product. This simulator is, as of now, in use for commercial image sensor design.

ACKNOWLEDGMENT

The authors would like to thank T. Yoneda from Konica Minolta for providing the lens data and for the valuable feedback regarding the system simulation validation.

REFERENCES

[1] T. Chen, "Digital camera system simulator and applications," Ph.D. dissertation, Stanford Univ., Stanford, CA, Jun. 2003.
[2] J. E. Farrell, F. Xiao, P. Catrysse, and B. Wandell, "A simulation tool for evaluating digital camera image quality," Proc. SPIE, vol. 5294, pp. 124–131, 2004.
[3] J. Farrell, M. Okincha, and M. Parmar, "Sensor calibration and simulation," Proc. SPIE, vol. 6817, p. 68170R, 2008.
[4] P. Maeda, P. Catrysse, and B. Wandell, "Integrating lens design with digital camera simulation," Proc. SPIE, vol. 5678, pp. 48–58, 2005.
[5] P. L. Vora, J. E. Farrell, J. D. Tietz, and D. H. Brainard, "Image capture: Simulation of sensor responses from hyperspectral images," IEEE Trans. Image Process., vol. 10, no. 2, pp. 307–316, Feb. 2001.
[6] P. Longere and D. H. Brainard, "Simulation of digital camera images from hyperspectral input," in Vision Models and Applications to Image and Video Processing, C. van den Branden Lambrecht, Ed. Norwell, MA: Kluwer, 2001, pp. 123–150.
[7] B. Fowler and X. Liu, "Analysis and simulation of low light level image sensors," Proc. SPIE, vol. 6201, p. 620124, 2006.
[8] R. C. Short, D. Williams, and A. E. W. Jones, "Image capture simulation using an accurate and realistic lens model," Proc. SPIE, vol. 3650, pp. 138–148, 1999.
[9] C. Garnier, R. Collorec, J. Flifla, and F. Rousee, "General framework for infrared sensor modeling," Proc. SPIE, vol. 3377, pp. 59–70, 1998.
[10] C. Kolb, D. Mitchell, and P. Hanrahan, "A realistic camera model for computer graphics," in Proc. ACM SIGGRAPH Conf., 1995, pp. 317–324.
[11] T. Florin, "Simulation the optical part of an image capture system," Proc. SPIE, vol. 7297, p. 729719, 2009.
[12] P. Debevec and J. Malik, "Recovering high dynamic range radiance maps from photographs," in Proc. ACM SIGGRAPH, Aug. 1997, pp. 369–378.
[13] M. D. Grossberg and S. K. Nayar, "What can be known about the radiometric response from images," in Proc. 7th Eur. Conf. Comput. Vis.—Part IV, May 2002, pp. 189–205.
[14] T. Mitsunaga and S. K. Nayar, "Radiometric self calibration," in Proc. IEEE Conf. CVPR, 1999, vol. 1, pp. 374–380.
[15] S. Battiato, A. Castorina, and M. Mancuso, "High dynamic range imaging for digital still camera: An overview," SPIE J. Electron. Imaging, vol. 12, no. 3, pp. 459–469, Jul. 2003.
[16] R. Guenther, Modern Optics. New York: Wiley, 1990.
[17] J. Nakamura, Ed., Image Sensors and Signal Processing for Digital Still Cameras (Optical Science and Engineering). Boca Raton, FL: CRC Press, 2005.
[18] Data Sheet of MT9T012. [Online]. Available: http://www.aptina.com/products/image_sensors/mt9t012d00stc/#overview
[19] R. Kingslake, Optics in Photography, vol. PM06, SPIE Press Monograph. Bellingham, WA: SPIE, 1992, p. 108.
[20] J. W. Goodman, Introduction to Fourier Optics. Englewood, CO: Roberts & Company Publishers, 2004.
[21] S. I. Lindsay, A Tutorial on Principal Components Analysis, 2002.

Junqing Chen (S'02–M'04) received the B.S. and M.S. degrees from Zhejiang University, Hangzhou, China, in 1996 and 1999, respectively, and the Ph.D. degree in electrical engineering from Northwestern University, Evanston, IL, in 2003. From January to July 2004, she was a Postdoctoral Fellow with Northwestern University. Between August 2004 and July 2006, she was with Unilever Research, Edgewater, NJ, as an Imaging Scientist. She is currently a Senior Imaging Scientist with Aptina Imaging LLC, San Jose, CA. Her research interests include image and signal analysis, perceptual models for image processing, image and video quality, system modeling, and machine learning.

Kartik Venkataraman received the B.Tech. (Hons.) degree in electrical engineering from the Indian Institute of Technology, Kharagpur, India, the M.S. degree in computer engineering from the University of Massachusetts, Amherst, and the Ph.D. degree in computer science from the University of California, Santa Cruz. In 1989–1999, he worked with Intel Corporation, during which he was principally associated with a project investigating medical imaging and visualization between Johns Hopkins Medical School and the Institute of Systems Science in Singapore. In 1999–2008, prior to founding Pelican Imaging Corporation, he was with the Advanced Technology Group, Micron Imaging, where, as the Senior Manager of the Computational Camera Group, he worked on advanced imaging technology. As part of this effort, his group worked on the design of extended depth of field imaging systems for the mobile camera market. As Manager of the CSM group, he worked on setting up an end-to-end imaging systems simulation environment for camera system architecture and module simulations. He is currently the Founder and CTO of Pelican Imaging Corporation, Mountain View, CA, a Silicon Valley startup working in the area of computational imaging. He has over 20 years of experience working with technology companies in Silicon Valley. His research interests include imaging and image processing, computer graphics and visualization, computer architectures, and medical imaging.
Dmitry Bakin received the M.S. and Ph.D. degrees from the Moscow Institute of Physics and Technology, Moscow, Russia, with specialization in quantum electronics, in 1983 and 1988, respectively. He is currently a Principal Scientist with Aptina Imaging LLC, San Jose, CA, where he is designing imaging lens arrays, developing optics for 3- to 5-megapixel CMOS cameras with extended depth of field, preparing optical simulation programs for modeling new imaging modules, and evaluating miniature high-resolution lenses incorporating AF and digital zoom technologies. His interests are in researching novel optical system technologies, with practice in optical design and electrooptical system engineering.

Brian Rodricks received the M.S. degree from Ohio University, Athens, and the Ph.D. degree in solid-state physics from the University of Michigan, Ann Arbor. He was the first Argonne Scholar at the Advanced Photon Source, Argonne National Laboratory, and subsequently worked as a Member of Technical Staff, developing instrumentation and techniques for high-speed imaging. He currently manages the Application and System Engineering Group, Fairchild Imaging, Milpitas, CA, developing high-performance imaging systems for X-ray and scientific imaging applications. He has more than 20 years of experience in the field of imaging, from optical to X-ray applications, having also worked with Hologic Inc., where he developed TFT-based imaging systems for medical applications, and Micron Technology, where he was involved in optical imaging for mobile applications. Dr. Rodricks is a member of the conference organizing committee on Digital Photography at the IS&T/SPIE Symposium on Electronic Imaging: Science and Technology and is a member of the Editorial Board of Review of Scientific Instruments.

Robert Gravelle received the B.S. degree in electrical engineering from the University of Colorado, Colorado Springs. In 1991, he joined Micron Technology, where he worked as a Parametric Engineer with DRAM and FLASH wafer processing until 2003, when he joined the imaging group. He is currently the Manager of optics design and characterization with Aptina Imaging LLC, San Jose, CA, where he is working on optical pixel development. He has developed several modeling techniques for evaluating optical pixel response characteristics and the interaction of pixel and lens design.

Pravin Rao received the B.S. degree in engineering from Manipal Institute of Technology, Manipal, India, and the M.S. degree from Rochester Institute of Technology, Rochester, NY. He is currently with Pixim Inc., Mountain View, CA, as an Imaging Engineer.

Yongshen Ni received the B.S. and M.S. degrees in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical engineering from The University of Oklahoma, Norman, in 2005. In 2006, she joined the Advanced R&D Group, Micron Technology Inc. Since 2008, she has been with OmniVision Technologies, Inc., Santa Clara, CA. She is currently involved in CMOS imaging sensor algorithm development for camera applications. Her current research interests include the ASIC design of video/image processing algorithms.
IEEE TRANSACTIONS ON ELECTRON DEVICES
Digital Camera Imaging System Simulation
1
Junqing Chen, Member, IEEE, Kartik Venkataraman, Dmitry Bakin, Brian Rodricks, Robert Gravelle, Pravin Rao, and Yongshen Ni
2 3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1
Abstract—A digital camera is a complex system including a lens, a sensor (physics and circuits), and a digital image processor, where each component is a sophisticated system on its own. Since prototyping a digital camera is very expensive, it is highly desirable to have the capability to explore the system design tradeoffs and preview the system output ahead of time. An empirical digital imaging system simulation that aims to achieve such a goal is presented. It traces the photons reflected by the objects in a scene through the optics and color filter array, converts photons into electrons with consideration of noise introduced by the system, quantizes the accumulated voltage to digital counts by an analogto-digital converter, and generates a Bayer raw image just as a real camera does. The simulated images are validated against real system outputs and show a close resemblance to the images captured under similar condition at all illumination levels.
19 Index Terms—Hyperspectral scene, imaging sensor, lens char20 acterization, modulation transfer function (MTF), pixel charac21 terization, pixel noise, point spread function (PSF).
I. I NTRODUCTION
22
T
HE INCREASING popularity of multimedia applications has led to the ubiquitous presence of digital cameras in various mobile devices such as cell phones, PC webcams, and 26 gaming devices. Such trend incentivizes the development of 27 smaller imagers at lower cost with the least sacrifice of the 28 image quality. A digital camera is a complex system that is 29 composed of a lens, a sensor, and a digital image processor, 30 each of which is a sophisticated system on its own. Since 31 making a test system is costly and time-consuming, it is highly 32 desirable for imager makers, system integrators, and image 33 processing algorithm developers to explore the design tradeoffs 34 and visualize the outcome images via simulation before the 35 actual system becomes available. 36 The digital camera system is usually modeled as a linear 37 system based on the principles of optics and device physics 38 [1]–[3], [5], [7]. As shown in Fig. 1, a camera takes the trace 23 24 25
Manuscript received January 6, 2009; revised April 22, 2009. The review of this paper was arranged by Editor P. Magnan. J. Chen, D. Bakin, and R. Gravelle are with Aptina Imaging LLC, San Jose, CA 95134 USA (e-mail:
[email protected]; dbakin@aptina. com;
[email protected]). K. Venkataraman is with Pelican Imaging Corporation, Mountain View, CA 94041 USA. B. Rodricks is with Fairchild Imaging, Milpitas, CA 95035 USA. P. Rao is with Pixim Inc., Mountain View, CA 94043 USA. Y. Ni is with OmniVision Technologies, Inc., Santa Clara, CA 95054 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TED.2009.2030995
Fig. 1. Camera model. A camera is modeled as a cascaded system of optics, a sensor, and a digital image processor. The sensor is modeled as a series of filtering (the infrared filter, the color filter array, and the pixel QE) followed by a sampling of the sensor grid, where signal crosstalk happens simultaneously.
of light from a scene and passes it through the lens and to 39 the sensor. In the context of this paper, the word scene refers 40 to a projection of information from any physical environment 41 into the eyes or cameras. With the assumption that all objects 42 reflecting the incoming light only and single light source, such 43 projection can be decomposed into the interaction of object 44 spectral reflectance and the spectral energy of the light source. 45 The content of a scene may be synthetic targets (for example, 46 Macbeth Color Checker) or calibrated representations of real 47 world [5], [6]. A simple approximation of the imaging lens 48 is a parametric diffraction-limited imaging optics model. It 49 counts for a positional dependent irradiance degradation by 50 the cosine fourth law and a linear blur by the spectral optical 51 transfer function (OTF) [1], [9]. A more advanced lens model 52 was proposed [4], [8], [11] for better estimation of a real 53 system. The sensor module simulates the transformation of 54 irradiance into an electrical signal. It can be a physical or 55 an empirical model. Since an accurate physical model links 56 too tightly to a specific sensor design, an empirical model is 57 often used instead. By utilizing sensor characterization data, 58 an empirical model separates the sensor design details and 59 the key system performance describers and thus works better 60
0018-9383/$26.00 © 2009 IEEE
2
IEEE TRANSACTIONS ON ELECTRON DEVICES
as a more generic block. The sensor module accounts for the spatial sampling of the optical image by the imaging sensor, the size and position of pixels and their fill factors, the spectral 64 selectivity of color filters, the transmittance of the infrared cut 65 filter, the photodetector spectral quantum efficiency (QE), the 66 conversion gain (from electrons to voltage), and various noise 67 sources; integrates over spectral bands, photosensitive areas, 68 and exposure time; passes the integrated voltage through an 69 analog-to-digital converter (ADC); and generates a raw Bayer 70 image output in digital counts (DNs). 71 The aforementioned camera modeling framework has been 72 widely accepted and proven to be quite successful [1]. However, 73 when it comes to the validation against an actual digital camera, 74 there is still a gap caused by the lack of model input (scene) 75 sampling precision, the simplistic diffraction-limited optics 76 model, and the sensor-production-related issues. In an effort to 77 narrow such difference, the proposed work extends the existing 78 model in the following aspects: 61
62 63
79 80 81 82 83 84 85 86 87 88 89 90
1) Build a high-resolution (16 M) multispectral scene library that includes various scenes (HDR or non-HDR) and represents the scene by spectral reflectance rather than direct spectral exitance, which allows flexibility to simulate the same scene with different light sources. This is useful particularly for synthetic targets. 2) Extend the optics model to a real lens whose spectral characterization data can be extracted from the lens design in Zemax or Code V,1 resulting in closer resemblance to an actual system. 3) Extend the sensor model to include signal crosstalk for better approximation of the actual system.
Hereafter, the proposed digital camera simulation tool is referred to as the camera and scene modeling (CSM) simulator. Similar to prior work in the literature, it has three major components: scene, lens, and sensor. Sections II–IV discuss each element in detail. Section V addresses the practical implementation issues. Section VI outlines the system validation efforts, followed by the conclusion in Section VII.
II. MULTISPECTRAL SCENE
The CSM uses a high-end Canon EOS-1Ds Mark II camera with a Canon EF 50-mm f/1.4 USM lens for scene data collection. An important assumption made in the CSM is that the high-resolution scenes captured by the Canon camera can be treated as having infinite resolution. As a result, the performance of the simulation is limited by the performance of the Canon camera and lens that were used. Since products in the mobile space were of primary interest, such an assumption is generally valid. For a higher end system, a better input needs to be used instead to ensure quality.
The dynamic range of the selected Canon camera is about 70 dB. However, many scenes in nature exceed this dynamic range and can reach 100 dB or more.
¹Both are popular lens design software tools. For details, please refer to www.zemax.com and www.opticalres.com.
To ensure that the CSM is capable of simulating various scenarios, it is important to include both HDR and non-HDR scenes in the database. The CSM uses the combined multiple-exposure approach to extend the dynamic range of a camera. Note that this works only if the objects in the scene are static and the lighting variation is negligible. When a nonstatic object is present in the scene, the registration of the multiple captures can be quite complicated, and, in our practice, a single capture was used in that case. However, more advanced image registration methods are widely available in the literature and could be used to improve the system performance.
The synthesis of multispectral scene data from captured RGB images of natural scenes includes two steps. The first step is calibrating a given camera-plus-lens system and deriving the camera response curve that maps the RGB values in DNs to RGB exitance in watts per square meter. The second step is to extend the RGB exitance to multispectral data via a principal component analysis (PCA). Sections II-A and II-B describe the details of each step.

A. Obtain Camera Response Curve
Various techniques [12]–[15] have been proposed to determine the relationship between camera signals and radiometric quantities. The seminal work in the field of multiple-exposure composition by Debevec and Malik [12] is followed for its mathematical simplicity and robustness. The methodology is quite simple. Define g(·) as the transfer function of the imaging system. A pixel DN value Z obtained with exitance E at a scene point and exposure time T is then related by Z = g(ET). The inverse function g⁻¹ is often referred to as the camera response curve. By constraining g to be monotonic, limiting the Z values to a given range, adding second-order smoothness constraints, and performing the optimization in the natural logarithmic domain, the unknown g⁻¹ can be obtained via a standard least-squares method when enough pixel values are used.
The Canon camera was set to raw capture mode to obtain 12-b 16-M raw images. Absolute raw values are used to ensure that the true camera response is obtained.² Fig. 2 shows the Canon camera response curve g⁻¹. The data points for solving the algorithm were obtained by imaging a Macbeth chart in a light booth with different exposure times. The minimum and maximum exposure times were selected based upon the criteria of slightly underexposing the highlights and almost saturating the shadows in the scene, respectively. The data from the 24 patches were used to obtain the response curve for each channel.
The curves in Fig. 2 are on a relative scale. Absolute calibration can be derived by measuring an absolute luminance in the scene being photographed. Increasing the number of exposures helps to average out noise and thus boost the reconstruction accuracy, provided that the multiple exposures fall in the linear range of the sensor response function and are well separated.
²The Canon camera outputs raw images in .CR2 format. The open-source code dcraw was used to extract the raw DNs. Details on dcraw are available at www.guillermoluijk.com/tutorial/dcraw/.
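As an illustration of the procedure above, the following MATLAB sketch sets up and solves the log-domain least-squares system, adapted from the solver published with [12]. It is a minimal sketch rather than the CSM code: the variable names (Z, logT, lambda_s, w) are illustrative, and n = 256 assumes 8-bit data for brevity rather than the 12-b Canon raw output.

    % Minimal sketch of the log-domain least-squares solve described above,
    % adapted from the solver published with [12]. Z is a P-by-J matrix of DN
    % values (P pixel locations, J exposures), logT holds the J log exposure
    % times, lambda_s weights the smoothness constraints, and w is a weighting
    % function that de-emphasizes DNs near the extremes. Assumes 8-bit data.
    function [ginv, logE] = response_curve(Z, logT, lambda_s, w)
    n = 256;
    P = size(Z, 1);  J = size(Z, 2);
    A = zeros(P*J + n - 1, n + P);
    b = zeros(size(A, 1), 1);
    k = 1;
    for i = 1:P                          % data-fitting terms: ginv(Z) - logE = logT
        for j = 1:J
            wij = w(Z(i, j) + 1);
            A(k, Z(i, j) + 1) = wij;
            A(k, n + i) = -wij;
            b(k) = wij * logT(j);
            k = k + 1;
        end
    end
    A(k, round(n/2)) = 1;                % fix the curve by pinning its middle value to 0
    k = k + 1;
    for z = 2:n-1                        % second-order smoothness constraints
        A(k, z-1) =  lambda_s * w(z);
        A(k, z)   = -2 * lambda_s * w(z);
        A(k, z+1) =  lambda_s * w(z);
        k = k + 1;
    end
    x = A \ b;                           % standard least-squares solve
    ginv = x(1:n);                       % ginv(z) ~ ln(E*T) for DN value z-1, i.e., the log of g^-1
    logE = x(n+1:end);                   % log exitance at the sampled pixel locations
    end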
Fig. 2. Response curves for the Canon EOS-1Ds Mark II camera with a 50-mm EF USM f/1.4 lens. The calculation was performed per color channel on the Canon raw data. The curves are obtained for each channel and plotted in its prime color. They almost sit on top of each other.

It should also be noted that the number of data points used to solve for the radiometric response function should be greater than the range of the pixel values. For an N-b scale, this requires 2^N − 1 data points or more to ensure a sufficiently overdetermined system of equations.

B. Obtain Multispectral Scene Data

With the curves defined in Fig. 2, the DNs of the Canon-captured image are mapped to the exitance of the R, G, and B color channels. The next step is to derive the spectral exitance from the RGB exitance obtained. This is done by utilizing a calibrated target. More specifically, the 24 patches of a color checker were lit by three light sources: daylight, tungsten, and fluorescent. Under each lighting condition, multiexposure images of the chart were captured with the Canon camera, and the spectral exitance was measured with a Gamma Scientific spectroradiometer from 400 to 700 nm with a 10-nm step, i.e., 31 spectral bands. Under the assumption that the spectral exitance of the 24 patches is adequate to span the spectral exitance space, any RGB exitance can be expressed as a linear combination of them. To further reduce the dimensionality, PCA [21] was applied. PCA is a popular mathematical method that is particularly useful in identifying the largest variations in the data via principal components (PCs) and represents the data in a coordinate system defined by the PCs. Six PCs were found to be adequate for the CSM. Now, we have

Si = B · wi    (1)

where Si (i = 1, . . . , 24), a column vector (31 × 1), denotes the exitance of each of the 24 color patches, B is the PC matrix (31 × 6) with each column corresponding to one PC, and wi is the weighting vector (6 × 1) associating each of the 24 color patches with the selected PCs. Assume that the spectral exitance Si is related to the RGB exitance ri (a 3 × 1 vector) linearly

ri = C Si    (2)

where C is the 3 × 31 conversion matrix. Substituting (1) into (2) gives

ri = C B wi.    (3)
Since B is constant, the product of C and B can be combined into one matrix CB (which is 3 × 6). For a given scene, once CB is determined and the RGB exitance is known, the weighting vectors wi can be calculated from (3), and the spectral exitance Si can then be obtained via (1). Note that this calibration stage always requires a 24-patch color checker image captured under a lighting condition similar to that of the scene.
To provide flexibility in varying the light source, the aforementioned calculation was performed in reflectance space, and the scene data saved are the spectral reflectance rather than the spectral exitance. The concept and procedure stay largely the same. In Fig. 3, the estimated spectral reflectance is plotted against the measured one. They are comparable, and most of the deviation occurs at long wavelengths. This is of lesser importance because, in a real system, the infrared cut filter would attenuate those wavelengths anyway.
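The calculation implied by (1)–(3) can be sketched in a few lines of MATLAB. This is an illustrative sketch only: S_patches, R_patches, and the example RGB triple are assumed inputs from the calibration measurements, and the minimum-norm pseudo-inverse solution of (3) is one possible choice, not necessarily the one used in the CSM.

    % Minimal sketch of (1)-(3). S_patches is the 31x24 matrix of measured
    % patch spectra and R_patches the 3x24 matrix of their RGB exitance; both
    % are assumed to come from the calibration measurements described above.

    % Calibration stage (done once per lighting condition).
    [U, ~, ~] = svd(S_patches, 'econ');
    B  = U(:, 1:6);                  % 31x6 PC matrix, as in (1)
    W  = B \ S_patches;              % 6x24 patch weights so that S_patches ~ B*W
    CB = R_patches / W;              % 3x6 combined matrix, as in (3)

    % Reconstruction stage (per pixel of a new scene).
    rgb = [0.31; 0.42; 0.18];        % example RGB exitance (placeholder values)
    w   = pinv(CB) * rgb;            % minimum-norm solution of (3); CB is 3x6, so
                                     % the per-pixel system is underdetermined
    S   = B * w;                     % 31x1 spectral exitance via (1)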
III. OPTICS SIMULATION
Optics plays a critical role in the imaging system. It transmits and refracts the light reflected by objects in a scene and projects the part of it that is within the field of view onto the sensor plane. Like any system, it also introduces errors. Aberrations of an optical system fall into two classes, namely, monochromatic and chromatic aberrations [16]. The following data characterize the key aberrations:
1) Distortion data describe the deviation from a rectilinear projection. Barrel and pincushion distortion, the most common types, are caused by image magnification that decreases or increases with the distance from the optical axis. Fig. 4 illustrates both effects.
2) Relative illumination data reflect the reduction of an image's brightness or saturation at the periphery compared with the image center.
3) The point spread function (PSF) describes the defocus (blur) that a lens introduces to the image resolution; when sampled spatially on the image plane, it accounts for spherical aberration, coma, astigmatism, and aberrations caused by field curvature; when sampled spectrally, it reflects chromatic aberration as well.
When only a few key lens parameters (focal length, magnification, and f-number) are known, the optics system is assumed to be diffraction limited, with no distortion, the cosine fourth law for relative illumination, and spatially invariant PSFs. For a circular aperture, the OTF of such a system is [20]

OTF(x) = (2/π) [cos⁻¹(x) − x √(1 − x²)]    (4)

where the arc cosine is in radians and x is the normalized spatial frequency, defined as the absolute spatial frequency f divided by the diffraction cutoff spatial frequency fc

x = f / fc.    (5)
Fig. 3. (Red) Estimated versus (green) actual spectral reflectance for 24 patches of a color checker.
Fig. 4. Optics distortions: (Left) Barrel and (right) pincushion distortion.
There are various formulas to calculate fc; for example

fc = D / (λf)    (6)
where λ denotes the wavelength, f is the focal length, and D stands for the diameter of the lens clear aperture. Two assumptions are made in (6), namely, a unity image-space refractive index and an infinite conjugate ratio.
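For the diffraction-limited case, (4)–(6) reduce to a few lines of MATLAB. The sketch below is illustrative only; the wavelength, focal length, and f/2.8 aperture are placeholder values, not parameters of any lens discussed in this paper.

    % Minimal sketch of the diffraction-limited OTF in (4)-(6); every numeric
    % value is a placeholder.
    lambda = 550e-9;                     % wavelength (m)
    f      = 4e-3;                       % focal length (m)
    D      = f / 2.8;                    % clear-aperture diameter for an f/2.8 lens (m)
    fc     = D / (lambda * f);           % diffraction cutoff frequency, per (6) (cycles/m)

    fr  = linspace(0, fc, 256);          % absolute spatial frequencies
    x   = fr / fc;                       % normalized frequency, per (5)
    otf = (2/pi) * (acos(x) - x .* sqrt(1 - x.^2));   % per (4); falls to zero at the cutoff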
The modeling of an image formed by a real lens is traditionally done using optical design programs. The simulation of optical image formation through a multielement lens typically requires millions of ray-tracing operations and is time-consuming. Fortunately, programs such as Zemax or Code V allow computing the projection passes of multiple rays through the optical elements of the imaging system and establishing the shape of the PSFs, as well as the local image distortions and changes in relative illumination. The calculations are based on lens prescriptions provided by lens designers. The ray-tracing data allow characterizing the image at various field heights. The data are collected over the full field on a rectangular grid at preselected image heights and over the operating spectral range. A specially written macrocommand is used to operate the optical design program in an automated manner. The data are recorded into files for processing in the CSM in the subsequent steps. The accuracy of the results of the modeling was confirmed by laboratory measurements against the performance of a real lens.
With the data obtained, utilizing them for simulation is relatively straightforward. The distortion is simulated as a sampling grid mapping followed by resampling to a rectangular grid. Optical vignetting is simply a scaling of the data based on spatial position. The application of the PSFs can be done in the Fourier domain for diffraction-limited optics. In the real-lens case, as the PSFs are spatially varying, the Fourier domain calculation loses its advantage, and spatial domain convolution is used instead. The PSF at each spatial position is estimated and resampled, followed by 2-D convolution. The convolution kernel size varies from 31 × 31 to 81 × 81 pixels or bigger depending on the PSF sampling spacings. As the image dimensions increase, the computation becomes forbiddingly slow, especially since these calculations need to be repeated for each spectral band. Section V discusses ways to accelerate the process via parallel computing.
A lens only passes part of the energy reflected by the scene to the sensor plane. The relationship between the scene radiance L and the sensor plane irradiance E is governed by the basic principles of radiometry [19]
E = πL · 1 / (1 + 4 (f# (1 + |m|))²)    (7)

where f# and m are the effective f-number and magnification of the lens, respectively. The sensor plane irradiance represents the amount of energy that reaches each pixel. The sensor then converts the irradiance image on the sensor plane into electrical signals.
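As a worked illustration of (7), the lines below convert a scene radiance value to sensor-plane irradiance; the f-number, magnification, radiance, and the off-axis field angle used for the cosine-fourth falloff are all placeholder values.

    % Minimal sketch of the radiance-to-irradiance conversion in (7); all
    % numeric values are placeholders.
    fnum = 2.8;                          % effective f-number
    m    = -0.01;                        % magnification
    L    = 100;                          % scene radiance for one band (W sr^-1 m^-2)
    E    = pi * L / (1 + 4 * (fnum * (1 + abs(m)))^2);   % on-axis irradiance (W m^-2), per (7)

    theta = 15 * pi / 180;               % field angle of an off-axis point (rad)
    E_off = E * cos(theta)^4;            % cosine-fourth falloff used when no lens data are available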
IV. SENSOR MODEL
A sensor is an electrical device consisting of an array of pixels. It converts incoming photons into electrons via
photodetectors, collects the converted electrons and translates them to electric potentials through a source follower, amplifies the analog signal via chains of amplifiers, and then quantizes it to a digital signal via an ADC for further digital signal processing. There are two kinds of imaging sensors, namely, the CCD and the CMOS sensor. They follow similar working principles and differ mainly in their readout circuits. As a result, the basic sensor electrical model remains the same, with some differences in the sensor noise model components.
This paper focuses on CMOS imaging sensor modeling. The following sections briefly review the well-established electrical and noise models and the simple linear quantizer. The sensor crosstalk model is discussed in greater detail.
A. Sensor Electrical Model

The sensor collects the signal spectrally and spatially

V = cg · q ∫₀ᵀ ∫_A ∫_{λmin}^{λmax} N(λ) QE(λ) S(λ) dλ dA dt    (8)
where V (in volts) is the sensor output, cg (in volts/e⁻) is the pixel conversion gain, q denotes the electron charge constant, QE(λ) is the spectral QE of the pixel, S(λ) denotes the combination of spectral filters in the sensor, including the transmittance of the color filters and the infrared cut filter, A represents the effective photodetector area, λ and T stand for the wavelength and exposure time, respectively, and N(λ) is the number of incoming photons, which is related to the sensor plane irradiance E defined in (7) by
N(λ) = E(λ) · λ / (h · c)    (9)
where h and c are Planck's constant and the speed of light, respectively. As a monochromatic sensor is difficult to obtain, the QE is usually measured with the color filter transmittance included.
The bottom of Fig. 1 models the sensor as a series of filtering operations followed by sampling on the sensor grid, where signal crosstalk happens simultaneously. Although sampling is a relatively easy operation, great care needs to be taken to prevent signal aliasing. Crosstalk is a very complicated process and will be addressed in Section IV-C.
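A single-pixel discretization of (8) and (9) is sketched below. It is an assumption-laden illustration: the spectral inputs (irradiance_spectrum, qe_curve, ircut, cfa) stand in for the characterization data described in this paper, every numeric parameter is a placeholder, and the electron charge factor q in (8) is omitted because cg is already expressed in volts per electron.

    % Minimal sketch of (8)-(9) for one pixel, discretized over 31 bands. The
    % spectral inputs are assumed to come from the optics stage and the pixel
    % characterization data; all numeric parameters are placeholders.
    h  = 6.626e-34;  c = 2.998e8;        % Planck's constant (J s), speed of light (m/s)
    lambda  = (400:10:700)' * 1e-9;      % band centers (m)
    dlambda = 10e-9;                     % band width (m)
    E_band  = irradiance_spectrum;       % 31x1 spectral irradiance (W m^-2 per m of wavelength)
    QE      = qe_curve;                  % 31x1 quantum efficiency of this pixel's channel
    S       = ircut .* cfa;              % 31x1 combined IR-cut and color-filter transmittance
    A       = (1.75e-6)^2 * 0.5;         % photosensitive area: 1.75-um pitch, 50% fill factor
    T       = 30e-3;                     % exposure time (s)
    cg      = 60e-6;                     % conversion gain (V/e-)

    photons   = E_band .* lambda / (h * c);                 % photon flux per band, per (9)
    electrons = sum(photons .* QE .* S) * dlambda * A * T;  % integrate over lambda, A, and T
    V         = cg * electrons;                             % pixel voltage before noise, cf. (8)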
B. Sensor Noise Model
Fig. 5. Illustration of sensor crosstalk. (Left) Optical crosstalk. (Right) Electrical crosstalk. The incoming light has already been converged by the microlens. The figure shows how the signal intended for the center red pixel leaks into the neighboring green pixels optically and electrically.

An additive sensor model [17], a well-known industrial standard, is used. Among the various noise sources, the most significant ones are the following.
1) Photon shot noise (Nshot) occurs due to the random arrival of photons at the sensor and is a random process that obeys Poisson statistics in electrons or photons.
2) Read noise (Nread) originates in the on-chip preamplifier and occurs during reset and pixel value readout. It has two parts, namely, the white Gaussian noise and the
flicker noise. The Gaussian part decreases as the analog gain increases. The flicker noise varies approximately inversely with frequency and affects lower frequencies more. It stays relatively flat in the frequency range of the sensor output and is thus ignored in the CSM modeling.
3) Dark current noise (Ndark) has two parts: one is shot noise (which follows the Poisson distribution) produced when the photocathode is shielded from all external optical radiation and the operating voltages are applied; the other is the dark current nonuniformity noise, modeled as a white Gaussian distribution. The latter is also considered one of the fixed pattern noises.
4) Fixed pattern noise (Nfpn) is a nontemporal spatial noise due to device mismatches or variations in the pixels/color filters, gain amplifiers, and ADC. In addition to the dark nonuniformity noise, two other noise sources are considered, namely, the photon response gain nonuniformity noise (Ngain), modeled as Gaussian noise with unity mean, and the row-wise fixed pattern noise, modeled as a column vector that is constant along each row, with zero mean and a magnitude that is half of the maximum read noise per row.
The noise parameters are measured through experiments and assumed to be constant for a given sensor, except for the photon shot noise, which correlates with the signal itself, and the dark noise, which depends on exposure time. With all noise sources expressed in volts, the sensor analog output Vo becomes

Vo = (V + Nshot) · Ngain + Nread + Ndark + Nfpn    (10)
where V denotes the real sensor signal.
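The sketch below applies (10) to a noise-free voltage image. It is a simplified illustration: the noise parameters are placeholders, shot noise is approximated as Gaussian with a variance equal to the signal in electrons so that only base MATLAB is required (poissrnd could be used instead), and the two dark-noise components are lumped into a single Gaussian term.

    % Minimal sketch of the additive noise model in (10) applied to a
    % noise-free voltage image V (rows x cols). All parameters are
    % placeholders; shot noise is approximated as Gaussian.
    cg       = 60e-6;                    % conversion gain (V/e-)
    read_std = 2.5 * cg;                 % read noise standard deviation (V)
    dark_std = 1.0 * cg;                 % lumped dark noise standard deviation (V)
    prnu_std = 0.01;                     % gain nonuniformity around a unity mean
    [rows, cols] = size(V);

    e_signal = V / cg;                                           % signal in electrons
    N_shot   = cg * sqrt(max(e_signal, 0)) .* randn(rows, cols); % Gaussian approximation
    N_gain   = 1 + prnu_std * randn(rows, cols);                 % unity-mean gain noise
    N_read   = read_std * randn(rows, cols);
    N_dark   = dark_std * randn(rows, cols);
    N_row    = (read_std / 2) * randn(rows, 1) * ones(1, cols);  % row-wise fixed pattern noise
    Vo = (V + N_shot) .* N_gain + N_read + N_dark + N_row;       % per (10)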
C. Sensor Crosstalk Model
Crosstalk refers to any phenomenon by which a signal transmitted on one channel creates an undesired effect in another channel. Based on its nature, sensor crosstalk can be classified into two categories: electrical and optical crosstalk. Fig. 5 illustrates both graphically. Electrical crosstalk is a result of photogenerated charge migration between pixels; its impact becomes more pronounced as the pixel pitch is reduced.
Fig. 7. (Left) Spectral QE of the Micron MT9T012 sensor along with the transmission curves for the filters used to obtain the narrow-band light sources. (Right) The signal captured by the sensor; each color channel output is plotted in its prime color.
Fig. 6. Crosstalk experiment setup illustration. Two rotational axes and the CRA are marked.
Optical crosstalk is due to photons incident on one pixel that are then captured by an adjoining pixel. It is more severe at the edges of the sensor, where incident angles are high, and is further affected by the pixel geometry and layout structure. For on-axis pixels, whose radiation is received at a normal incident angle, there is minimal optical crosstalk. Moving away from the sensor center, both types of crosstalk come into play, and it is impossible to separate them. However, as the pixel shrinks, optical crosstalk becomes noticeable even for on-axis pixels. That is because, when maintaining the same F-number, the lens may generate a spot size that is bigger than a pixel, or, when reducing the F-number, the lens moves to a larger incident angle as the photodetector shrinks.
The maximum chief ray angle (CRA) of a lens is the angle of incidence of a ray that passes through the center of the lens and reaches the diagonal edge of the sensor. The smaller the maximum CRA, the better the light transfer efficiency and, hence, the better the image quality. A typical cell-phone module maximum CRA is about 20°. Once a sensor design is fixed, both the microlens placement and the lens CRA as a function of spatial position and wavelength are known. When the actual lens CRA deviates from the design assumption, optical crosstalk is elevated.
An experiment has been designed to capture such mismatch. Since it is impossible to separate the electrical crosstalk from the optical crosstalk, the data collected account for both. Narrow-band light sources are used, and no lens is included in the setup. Light is incident onto the sensor, and the sensor is oriented such that a certain range of pixels receives photons at angles that satisfy the conditions of a lens+sensor imaging system. Since rotating the sensor and changing the light incident angle are equivalent, an automated test platform is designed to rotate the sensor along its horizontal, vertical, and diagonal axes. Fig. 6 illustrates the experiment setup. For example, when the sensor is rotated around the horizontal axis, only the pixels denoted by the green line satisfy the CRA condition and can be used to reconstruct the crosstalk at those locations. Similarly, if the sensor is rotated around the vertical axis, only the pixels denoted by the blue arrows satisfy the
CRA condition, and so on. Note that such an experimental design measures low-frequency crosstalk only.
Two sets of data are collected. In the first set, the sensor is rotated along its horizontal, vertical, and diagonal axes over an angular range of ±30° in steps of 1°. This is wide enough to cover most cellular sensors. At each rotation step, only certain pixels along the axis satisfy the CRA requirements for the lens+sensor system of interest and generate signals. A total of 61 images were collected for each of the horizontal, vertical, and two diagonal rotations of the sensor.
To facilitate later data interpolation, it is desirable to have the data sampled on a rectangular grid. This is achieved by the second data set. The sensor is scanned over a series of angles to cover spatially uniformly distributed rectangular grid points across the entire sensor plane, and the crosstalk data are measured at pixels that satisfy the CRA condition. On a grid of 13 × 17, 221 images that are uniformly distributed across the entire sensor were collected during this second stage. The spacing between data points was approximately 75 pixels on a 3-M imaging sensor. A complete data set contains 465 images. Each image was an average of 100 images to reduce temporal noise.
Fig. 7 (left) shows the spectral QE of a Micron MT9T012 sensor [18] (solid lines) along with the normalized transmission spectra (dashed lines) of the seven filters used to obtain the narrow-band light sources, which cover the 400- to 700-nm range and are centered at 400, 450, 500, 550, 600, 650, and 700 nm. Fig. 7 (right) shows a plot of the response of the red, green, and blue pixels to the signal incident on them (modulated by the filter at the specified wavelength). It shows the calculated total response (pixel response and crosstalk) of the pixel located at the sensor center. The area under each curve, for a given filter, can be integrated to calculate the percentage of the signal expected under each of the pixels corresponding to the RGB of the CFA.
For each narrow-band light source, the image that corresponds to the sensor center is identified, and an average signal over an 11 × 11 region of interest is calculated for the RGB pixels. The measured values at other positions are then normalized to this value and saved as percentages. The resulting percentages are used in the CSM as positional/channel-varying scaling factors for each spectral band. As with the other data, the missing samples are obtained via interpolation.
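The interpolation and application of these scaling factors can be sketched as follows. The variable names and the 13 × 17 grid follow the description above; the sketch handles a single color channel and spectral band and is not the CSM implementation itself.

    % Minimal sketch of applying the measured crosstalk percentages to one
    % color channel of one spectral band. xt_grid is the 13x17 matrix of
    % measured percentages (normalized to the sensor center) and band_img the
    % band image from the optics stage; names are illustrative.
    [rows, cols] = size(band_img);
    [gc, gr] = meshgrid(linspace(1, cols, size(xt_grid, 2)), ...
                        linspace(1, rows, size(xt_grid, 1)));
    [pc, pr] = meshgrid(1:cols, 1:rows);
    xt_map   = interp2(gc, gr, xt_grid, pc, pr, 'linear');   % fill in the missing samples
    band_img = band_img .* xt_map;                            % positional scaling factors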
Fig. 8. MTF comparison between the captured and the simulated images at 300 lx for the MT9T012 sensor. Nf marks the percentage drop at the Nyquist frequency.

TABLE I. LUMINANCE STANDARD DEVIATION COMPARISON
D. ADC
The CSM models the ADC as a linear quantizer. The step size (in volts) is
s=
W · cg 2bits − 1
(11)
where W is the pixel linear full-well capacity in electrons (e⁻), cg is the pixel conversion gain as defined in (8), and bits is the pixel circuit internal bit depth; a typical value is 10 or 12 b.
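A short illustration of (11) follows; the full-well capacity, conversion gain, and bit depth are placeholder values, and Vo is the noisy analog output from (10).

    % Minimal sketch of the linear quantizer in (11); all parameters are
    % placeholders, and Vo is the analog voltage image from (10).
    W    = 6000;                         % linear full-well capacity (e-)
    cg   = 60e-6;                        % conversion gain (V/e-)
    bits = 10;                           % internal bit depth
    s    = W * cg / (2^bits - 1);        % quantization step (V), per (11)
    DN   = min(max(round(Vo / s), 0), 2^bits - 1);   % digital counts, clipped to the code range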
V. IMPLEMENTATION
Thus far, we have introduced a simulator with high-resolution multispectral input scene data, a full real-lens characterization by spectrally and spatially varying distortion, shading, and PSF data, and a sensor model that takes into account not only the electrical and noise models but also crosstalk. It is a much-closer-to-reality model. However, all of these extensions raise the requirements on CPU speed and memory. The CSM workflow is summarized in Section V-A, and Section V-B explores approaches for system acceleration.
Fig. 9. Relative illumination comparison of the captured versus the simulated images at 300 lx for the MT9T012 sensor. Values are taken from row 510 of the image pair shown in Fig. 10(a).
A. CSM Workflow

In summary, the CSM follows the steps below.
1) Build a multispectral scene database (please refer to Section II for details).
   a) Camera calibration: Choose a camera+lens system, and perform the calibration. The calibration only takes place once for a given system. The camera should have enough resolution and be capable of raw data capture.
   b) Obtain multispectral scene data: Capture images with the system calibrated in step a). All cases of interest should be considered, including HDR and non-HDR cases. A Macbeth chart must either be present in the captured scene or be captured under a similar lighting condition. Calculate the multispectral scene data, and save them for use in the later steps. Note that the PCA on the calibration data only needs to be performed once.
2) Obtain the system characterization data.
   a) Pixel characterization data: including the spectral QE for each channel of the CFA, the IR-cut filter transmittance, the pixel conversion gain, the full well capacity, and the noise parameters.
   b) Optical characterization data: including the effective F-number, magnification, and focal length, the functions that describe the mapping of relative illumination and distortion, and a sampled set of PSFs on a rectangular grid. The mapping functions and the PSFs are sampled spectrally as well.
   c) Crosstalk characterization data: perform the experiment as described in Section IV-C.
3) Perform the system simulation (please refer to Sections III and IV for details; a minimal driver sketch follows this list).
   a) Apply the optical simulation, the sensor electrical model, and the crosstalk data for each spectral band.
   b) Repeat step a), and accumulate the results until all the spectral bands have been looped over.
   c) Apply the sensor noise model to the accumulated voltage from step b).
   d) Quantize the result of the previous step to DNs.
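The per-band accumulation in steps 3a)–3d) is sketched below. Every function name (apply_optics, pixel_response, apply_crosstalk, add_sensor_noise, quantize) is a hypothetical placeholder for the corresponding stage described in Sections III and IV, not part of the actual CSM code.

    % Minimal sketch of the per-band loop in steps 3a)-3d). All function names
    % are hypothetical placeholders for the stages described in Sections III-IV.
    V = zeros(rows, cols);                        % accumulated voltage image
    for b = 1:numBands                            % steps 3a)-3b): loop over spectral bands
        E_band = apply_optics(scene(:, :, b), lensData, b);   % distortion, shading, PSF, (7)
        V_band = pixel_response(E_band, sensorData, b);       % spectral response, (8)-(9)
        V_band = apply_crosstalk(V_band, xtalkData, b);       % positional scaling factors
        V = V + V_band;
    end
    Vo = add_sensor_noise(V, sensorData);         % step 3c): noise model, (10)
    DN = quantize(Vo, sensorData);                % step 3d): linear ADC, (11)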
B. Practical Concerns
A Dell Precision 380 PC with an Intel Pentium Duo CPU (each core at 3.2 GHz) and 8 GB of RAM was used for the simulation. This computer runs a 64-b Windows OS. A typical simulation of 16-M input spectral scene data, with the PSFs resampled to about 31 × 31 pixels for the optical convolution, takes about 36 h, with the bulk of the time spent on the optical simulation. The simulation code, written in MATLAB, takes code optimization into consideration (such as using vector operations whenever possible). The sensor simulation is separated from the main optical simulation so that a change in some simulation conditions, such as lux level or exposure time, does not require a rerun of the whole system. However,
Fig. 10. Comparison of the captured versus the simulated images at 10 and 300 lx. The images at 10 lx were cropped at 100% zoom. The MT9T012 sensor was used. (a) Captured at 300 lx. (b) Simulated at 300 lx. (c) Captured at 10 lx (100% zoom). (d) Simulated at 10 lx (100% zoom).
when a different scene is of interest, the lens design changes, or a different illuminant is desired, the whole simulation needs to be rerun. Moreover, the computation time increases dramatically as the PSF sampling spacing increases. Such a computational bottleneck makes the use of the simulator inconvenient, particularly for work with tight deadlines.
A few approaches to accelerate the optical simulation were considered. As the PSF estimation and convolution are defined within a certain window with respect to a given pixel, it is natural to think along the lines of parallel computing. It can be done via multiple processors on a network of computers (grid computing) or on a single computer, or through other hardware that is inherently parallel and computationally powerful. All approaches have been considered and tested. In the current implementation, both the multiple processors of a single computer and the graphics processing unit of an Nvidia GeForce 8800 GTX graphics card are utilized. The acceleration has significantly shortened the running time, down to about 3 h for the typical setting described previously, a much more acceptable time frame.
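As one illustration of the multiprocessor path, the loop over spectral bands can be distributed with MATLAB's parfor, which requires the Parallel Computing Toolbox. The sketch below uses a single PSF kernel per band for brevity, whereas the CSM interpolates spatially varying PSFs; all names are illustrative.

    % Hedged sketch of parallelizing the per-band optical convolution with
    % parfor (Parallel Computing Toolbox). A single kernel per band is used
    % here for brevity; the CSM applies spatially varying PSFs instead.
    bandOut = cell(numBands, 1);
    parfor b = 1:numBands
        psf_b      = psf_stack(:, :, b);                    % e.g., a 31x31 kernel
        bandOut{b} = conv2(scene_bands(:, :, b), psf_b, 'same');
    end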
VI. SYSTEM VALIDATION
The proposed simulator outputs have been validated against actual captured images. In the examples shown in Fig. 10, a Micron 2.2-μm sensor MT9T012 [18] was used. The lens characterization data were provided by external collaborators.
Fig. 10 shows a pairwise comparison of the simulated versus the captured images at 10 and 300 lx under D65. All images were only processed by a simple demosaic algorithm for display purposes; no other digital processing was applied. Qualitatively, they look alike, with similar shading, distortion, brightness, and noise levels. Fig. 10(c) and (d) examines the 10-lx results more closely. Note that the ISO targets (such as slanted edges) present in the images are synthetic and thus can be used in the calculation of quantitative values. The example shown was not an HDR scene.
Fig. 8 compares the system modulation transfer function (MTF) of the simulated system versus that of an actual system measured from the images captured at 300 lx. Consistent with the images, the MTF of the simulated system stands higher than that of the captured one. This is because achieving exact focus for the real captures was somewhat difficult due to the highly sensitive focusing lever on the module used; the simulation, on the other hand, achieves excellent focus since focus can be set explicitly. Table I lists the luminance standard deviation calculated from the achromatic row of the center color checker in Fig. 10. The calculation for all patches shows, on average, an 8% difference between the simulated and the captured images. The bigger differences tend to show up on the darker patches (patches 10, 13, 15, and 24). The patch number starts from the top left corner and increases in row-wise scan order. The average difference drops to 5% with the dark patches removed from the calculation.
Fig. 9 plots the green channel intensity at row 510 for the image pair shown in Fig. 10(a) and (b). The values basically sit on top of each other, which is consistent with the visual impression as well. The brightness levels of the captured set and the simulated set exhibit a slight difference; the omission of the lens transmissivity from the model may account for some of it. The optical distortion is also slightly different between them. One potential cause is production deviation; however, since the distortion is not very sensitive once the optics tooling is fixed, it is more likely due to the adjustment of the subject distance with the macrolever, which affects the distortion. Defects are visible in the captured images, particularly at lower light levels. The simulated images do not have any defects because no defect model was used; however, the capability to model defects exists and can easily be enabled.
Overall, the captured and simulated images are fairly close to each other visually and quantitatively. The proposed digital camera simulation system appears to be good over ranges of parameters for the purpose of design and overall system evaluation.
VII. CONCLUSION

A digital camera simulator built upon the framework provided by the ISET simulator has been presented. Real-world information is added to each component to bring the model closer to a real-use case. The scene module is extended to a much higher resolution with the capability to simulate HDR scenes. Moreover, the light source is isolated from the scene reflectance to enable the simulation of various light sources without performing a large number of laboratory experiments. The optics is characterized by a set of data that allows the simulation of an actual lens design without the need for the lens prescription. Crosstalk characteristics are captured via designed experiments, which further narrows the gap between the simulator output and the real product. The simulator has also been accelerated to make it convenient for daily use. Our results indicate that it is capable of generating results that are close to those of the actual product. This simulator is, as of now, in use for commercial image sensor design.

ACKNOWLEDGMENT

The authors would like to thank T. Yoneda from Konica Minolta for providing the lens data and for the valuable feedback regarding the system simulation validation.

REFERENCES

[1] T. Chen, "Digital camera system simulator and applications," Doctoral dissertation, Stanford Univ., Stanford, CA, Jun. 2003.
[2] J. E. Farrell, F. Xiao, P. Catrysse, and B. Wandell, "A simulation tool for evaluating digital camera image quality," Proc. SPIE, vol. 5294, pp. 124–131, 2004.
[3] J. Farrell, M. Okincha, and M. Parmar, "Sensor calibration and simulation," Proc. SPIE, vol. 6817, p. 68170R, 2008.
[4] P. Maeda, P. Catrysse, and B. Wandell, "Integrating lens design with digital camera simulation," Proc. SPIE, vol. 5678, pp. 48–58, 2005.
[5] P. L. Vora, J. E. Farrell, J. D. Tietz, and D. H. Brainard, "Image capture: Simulation of sensor responses from hyperspectral images," IEEE Trans. Image Process., vol. 10, no. 2, pp. 307–316, Feb. 2001.
[6] P. Longere and D. H. Brainard, "Simulation of digital camera images from hyperspectral input," in Vision Models and Applications to Image and Video Processing, C. van den Branden Lambrecht, Ed. Norwell, MA: Kluwer, 2001, pp. 123–150.
[7] B. Fowler and X. Liu, "Analysis and simulation of low light level image sensors," Proc. SPIE, vol. 6201, p. 620124, 2006.
[8] R. C. Short, D. Williams, and A. E. W. Jones, "Image capture simulation using an accurate and realistic lens model," Proc. SPIE, vol. 3650, pp. 138–148, 1999.
[9] C. Garnier, R. Collorec, J. Flifla, and F. Rousee, "General framework for infrared sensor modeling," Proc. SPIE, vol. 3377, pp. 59–70, 1998.
[10] C. Kolb, D. Mitchell, and P. Hanrahan, "A realistic camera model for computer graphics," in Proc. ACM SIGGRAPH Conf., 1995, pp. 317–324.
[11] T. Florin, "Simulation the optical part of an image capture system," Proc. SPIE, vol. 7297, p. 729719, 2009.
[12] P. Debevec and J. Malik, "Recovering high dynamic range radiance maps from photographs," in Proc. ACM SIGGRAPH, Aug. 1997, pp. 369–378.
[13] M. D. Grossberg and S. K. Nayar, "What can be known about the radiometric response from images," in Proc. 7th Eur. Conf. Comput. Vis., Part IV, May 2002, pp. 189–205.
[14] T. Mitsunaga and S. K. Nayar, "Radiometric self calibration," in Proc. IEEE Conf. CVPR, 1999, vol. 1, pp. 374–380.
[15] S. Battiato, A. Castorina, and M. Mancuso, "High dynamic range imaging for digital still camera: An overview," SPIE J. Electron. Imaging, vol. 12, no. 3, pp. 459–469, Jul. 2003.
[16] R. Guenther, Modern Optics. New York: Wiley, 1990.
[17] J. Nakamura, Ed., Image Sensors and Signal Processing for Digital Still Cameras (Optical Science and Engineering). Boca Raton, FL: CRC Press, 2005.
[18] Data Sheet of MT9T012. [Online]. Available: http://www.aptina.com/products/image_sensors/mt9t012d00stc/#overview
[19] R. Kingslake, Optics in Photography, vol. PM06, SPIE Press Monograph. Bellingham, WA: SPIE, 1992, p. 108.
[20] J. W. Goodman, Introduction to Fourier Optics. Englewood, CO: Roberts & Company Publishers, 2004.
[21] S. I. Lindsay, A Tutorial on Principal Components Analysis, 2002.
Junqing Chen (S’02–M’04) received the B.S. and M.S. degrees from Zhejiang University, Hangzhou, China, in 1996 and 1999, respectively, and the Ph.D. degree in electrical engineering from Northwestern University, Evanston, IL, in 2003. From January to July 2004, she was a Postdoctoral Fellow with Northwestern University. Between August 2004 and July 2006, she was with Unilever Research, Edgewater, NJ, as an Imaging Scientist. She is currently a Senior Imaging Scientist with Aptina Imaging LLC, San Jose, CA. Her research interests include image and signal analysis, perceptual models for image processing, image and video quality, system modeling, and machine learning.
Kartik Venkataraman received the B.Tech. (Hons.) degree in electrical engineering from the Indian Institute of Technology, Kharagpur, India, the M.S. degree in computer engineering from the University of Massachusetts, Amherst, and the Ph.D. degree in computer science from the University of California, Santa Cruz. In 1989–1999, he worked with Intel Corporation during which he was principally associated with a project investigating medical imaging and visualization between Johns Hopkins Medical School and the Institute of Systems Science in Singapore. In 1999–2008, prior to founding Pelican Imaging Corporation, he was with the Advanced Technology Group, Micron Imaging, where, as the Senior Manager of the Computational Camera Group, he worked on advanced imaging technology. As part of this effort, his group worked on the design of extended depth of field imaging systems for the mobile camera market. As Manager of the CSM group, he worked on setting up an end-to-end imaging systems simulation environment for camera system architecture and module simulations. He is currently the Founder and CTO of Pelican Imaging Corporation, Mountain View, CA, a Silicon Valley startup working in the area of computational imaging. He has over 20 years experience working with technology companies in Silicon Valley. His research interests include imaging and image processing, computer graphics and visualization, computer architectures, and medical imaging.
Dmitry Bakin received the M.S. and Ph.D. degrees from the Moscow Institute of Physics and Technology, Moscow, Russia, with specialization in quantum electronics, in 1983 and 1988, respectively.
He is currently a Principal Scientist with Aptina Imaging LLC, San Jose, CA, where he is designing imaging lens arrays, developing optics for 3- to 5-megapixel CMOS cameras with extended depth of field, preparing optical simulation programs for modeling new imaging modules, and evaluating miniature high-resolution lenses incorporating AF and digital zoom technologies. His interests are in researching novel optical system technologies with practice in optical design and electrooptical system engineering.
Brian Rodricks received the M.S. degree from Ohio University, Athens, and the Ph.D. degree in solid-state physics from the University of Michigan, Ann Arbor.
He was the first Argonne Scholar at the Advanced Photon Source, Argonne National Laboratory, and subsequently worked as a Member of Technical Staff, developing instrumentation and techniques for high-speed imaging. He currently manages the Application and System Engineering Group, Fairchild Imaging, Milpitas, CA, developing high-performance imaging systems for X-ray and scientific imaging applications. He has more than 20 years of experience in the field of imaging, from optical to X-ray applications, having also worked with Hologic Inc., where he developed TFT-based imaging systems for medical applications, and Micron Technology, where he was involved in optical imaging for mobile applications.
Dr. Rodricks is a member of the conference organizing committee on Digital Photography at the IS&T/SPIE Symposium on Electronic Imaging: Science and Technology and is a member of the Editorial Board of Review of Scientific Instruments.
Robert Gravelle received the B.S. degree in electrical engineering from the University of Colorado, Colorado Springs. In 1991, he joined Micron Technology, where he worked as a Parametric Engineer with DRAM and FLASH wafer processing until 2003 when he joined the imaging group. He is currently the Manager of optics design and characterization with Aptina Imaging LLC, San Jose, CA, where he is working on optical pixel development. He has developed several modeling techniques for evaluating optical pixel response characteristics and interaction of pixel and lens design.
Pravin Rao received the B.S. degree in engineering from Manipal Institute of Technology, Manipal, India, and the M.S. degree from Rochester Institute of Technology, Rochester, NY. He is currently with Pixim Inc., Mountain View, CA, as an Imaging Engineer.
Yongshen Ni received the B.S. and M.S. degrees in electrical engineering from Shanghai Jiao Tong University, Shanghai, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical engineering from The University of Oklahoma, Norman, in 2005.
In 2006, she joined the Advanced R&D Group, Micron Technology Inc. Since 2008, she has been with OmniVision Technologies, Inc., Santa Clara, CA. She is currently involved in CMOS imaging sensor algorithm development for camera applications. Her current research interests include the ASIC design of video/image processing algorithms.