Estimation-Theoretic Approach to Dynamic Range Enhancement Using Multiple Exposures
Mark A. Robertson, Sean Borman, and Robert L. Stevenson
May 7, 2000
Abstract

This paper presents an approach for improving the effective dynamic range of cameras by using multiple photographs of the same scene taken with different exposure times. Using this method enables the photographer to accurately capture scenes that contain high dynamic range by using a device with low dynamic range. This allows the capture of scenes that have both very bright and very dark regions. The approach requires an initial camera calibration to determine the response function of the camera. Once the response function for a camera is known, high dynamic range images can be computed easily with only a small number of captured images. The high dynamic range output image consists of a weighted average of data from the multiply-exposed input images, and thus contains information captured by each of the input images. From a computational standpoint, the proposed algorithm is very efficient and requires little processing time to determine a solution.

Keywords: Dynamic range, multiframe image restoration, exposure time.
I. INTRODUCTION

Intensity values of real-world scenes can have a very wide dynamic range. This is particularly true for scenes that have areas of both low and high illumination, such as when there are transitions between sun-lit areas and areas in shadow, or when a light source is visible in the scene. Unfortunately, all image capture devices have a limited dynamic range. For digital cameras, the dynamic range is limited by properties of the charge-coupled device (CCD) and analog-to-digital conversion (ADC); film characteristics limit the dynamic range of traditional cameras. When capturing a scene which contains a dynamic range that exceeds that of the camera, there will be a loss of detail in either the low-light areas, the high-light areas, or both.

One may vary the exposure to control which light levels will be captured, and hence which light levels will be lost due to saturation of the camera's dynamic range. Here, we will consider only variation of the exposure time, i.e., the duration for which the light sensing element (CCD or film) is exposed to light from the scene. Variation of the aperture is not considered due to the effects of aperture on depth of field. By increasing the exposure time, one may get a better representation of low-light areas at the cost of losing information in areas of high illumination; an example of this is shown in Fig. 1(a). Similarly, by using a reduced exposure time, one may sacrifice low-light detail in exchange for improved detail in areas of high illumination; this is demonstrated in Fig. 1(e). However, if the photographer desires an accurate representation of both low- and high-light areas of the scene, and the dynamic range of the scene exceeds that of the camera, then it is futile to
adjust the exposure time; detail will definitely be lost, and varying the exposure time merely allows some control over where the loss occurs.

Examination of a scene's histogram offers further insight into the problem. Suppose that a scene has an intensity histogram as shown in Fig. 2, which has concentrations of intensities around relatively dark and relatively bright levels, with maximum intensity $I_{max}$. For simplicity, assume that the output of the camera is a linear function of input exposure, and that a uniform quantizer with $K$ levels is used to produce the digital output. A photographer might adjust the exposure settings such that $\frac{1}{3}I_{max}$ maps to saturation, which emphasizes the dark regions of the scene. Doing this yields quantization intervals of $\frac{1}{3(K-1)}I_{max}$. If the photographer wants to capture the bright portions of the scene as well, he or she might reduce the exposure such that $\frac{2}{3}I_{max}$ maps to saturation. Doing this captures a larger range of intensity values than the previous exposure setting; however, the quantization intervals are now $\frac{2}{3(K-1)}I_{max}$: the dark regions are captured, but information is lost due to coarser quantization.
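As a concrete check of this arithmetic, the short sketch below computes the two quantization intervals; the particular values of $K$ and $I_{max}$ are arbitrary choices for the example, not taken from the text.

```python
# Numeric illustration of the quantization argument above. The values of
# K and I_max are arbitrary choices for this example.
K = 256        # number of quantizer levels
I_max = 1.0    # maximum scene intensity (arbitrary units)

# Exposure set so that (1/3) I_max maps to saturation:
step_dark = I_max / (3 * (K - 1))        # fine steps; dark detail preserved

# Exposure reduced so that (2/3) I_max maps to saturation:
step_bright = 2 * I_max / (3 * (K - 1))  # steps twice as coarse

print(step_dark, step_bright, step_bright / step_dark)  # ratio is exactly 2
```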
This paper proposes a method for combining data from multiple exposures to form an image with improved dynamic range which takes advantage of the favorable qualities of each of the individual exposures.

Madden [1] also examined the dynamic range problem, specifically for the case of CCD capture devices. Using direct CCD output allowed Madden to assume a linear response function for the camera, i.e., the observed output value is linearly related to the input exposure. Madden takes multiple pictures of the same scene while varying the exposure time, and uses these multiply-exposed images to construct the final high dynamic range image. To determine the value of a high dynamic range pixel, information is used from only that input image taken at the highest exposure in which the pixel of interest was not saturated. The author justifies this by pointing out that pixels observed at higher exposure times have less quantization noise than do pixels taken at lower exposure times.

Mann and Picard [2] examined the situation where multiple pictures, each of different exposures, are taken of a scene; they provided a method of merging these multiple exposures to form a single image with an effective dynamic range greater than that of the camera. By making use of "certainty" functions, which give a measure of the confidence in an observation, Mann and Picard weight the observations from the various exposures to provide the final image. The certainty functions are themselves dependent on camera characteristics, which are modeled from
the observed data in an ad hoc manner.

Yamada et al. [3, 4] studied the dynamic range problem in the context of vision systems for vehicles. The authors use multiple exposures of a scene, and assume a linear response for the CCD's. The authors pick the final pixel output based only on the observation with the longest exposure time which is not saturated. While not explicitly giving justification for using only data from the highest non-saturated exposure, the implicit justification is the same as that given by Madden: to reduce quantization error.

Moriwaki [5] examined the dynamic range enhancement of color images. The author uses multiple exposures of a static scene, and also assumes a linear CCD response. The method employed is similar to that of Yamada et al. in that the color values for a pixel are taken only from the observation pixel with the highest exposure time that was not saturated.

Chen and Mu [6] suggest using a "cut-and-paste" method for increasing dynamic range, where blocks of the final image are taken from blocks of the input images in a manual manner. The authors propose this interactive method in order to avoid more complicated, and perhaps nonlinear, processing. This technique is obviously very limited, and any computational advantage is clearly lost when one considers the computational resources available today.

Debevec and Malik [7] offer a more advanced method of increasing image dynamic range using multiple exposures. Rather than assuming a linear camera response, they assume an arbitrary response which is determined as part of the algorithm. The final output pixel is given as a weighted average of the input pixels taken at different exposures. The algorithm weights more heavily input data that are nearer to the mean of the input pixel range (128 for 8-bit data), and weights less the input data that are near to the extremes of the input pixel range (0 and 255 for 8-bit data).

There are several limitations of the algorithms described above. In [6], the requirement of human intervention is an obvious drawback. In [1, 3–5], linear camera response functions are all required. While one might argue that this is justified due to the linear nature of CCD's [8], there are still potential problems. First, one is strictly limited to using only linear capture devices, which precludes the possibility of using images scanned from film. Second, while consumer digital cameras do typically use CCD's, there is no guarantee of a linear response; for while the actual CCD's may be linear, the camera manufacturer is likely to introduce non-linearities prior
to output in order to make the image more "visually pleasing."

There is also a fundamental limitation when an algorithm determines the light values using data from only one input source, rather than using all input data. Recall that the main motivation for using only the highest non-saturated input pixel is to try to minimize quantization error. Taking the approaches in [1, 3–5] would indeed make perfect sense if quantization were the only source of noise in the image capture process. However, there are other sources of noise present in the image capture process, and it makes more sense to compute an estimate that takes advantage of all available data. If we have higher confidence in data values taken at higher exposures, then these data should be weighted more heavily in the estimation process. An averaging process is indeed what is done in [2] and [7]. However, in [2], a parametric form is assumed for the response function, restricting the application of the method to a very limited number of situations. Furthermore, the weighting procedure does not take into consideration quantization effects. While [7] does form an estimate of the nonlinear response function, the weighting process for the output values does not take into consideration the quantization effects discussed here and mentioned by Madden [1], and thus leaves room for improvement.

This paper proposes a new method of increasing the dynamic range of images by using multiple exposures; the method is an extension of work first presented by the authors in [9]. The probabilistic formulation of the problem results in a solution which satisfactorily deals with the problems of the algorithms reported above. In particular, the response function of the image capture device is estimated, thus creating a versatility in our algorithm which is lacking in algorithms that assume a linear or parametric response. Estimation of the high dynamic range pixel takes advantage of all available data by performing a weighted average. Proper weights arise as a result of the problem formulation, whereby data from higher exposure times are weighted more heavily.

Section II introduces the observation model for this work. Section III gives the maximum-likelihood solution of the high dynamic range image for known camera response, which includes situations such as those in [1, 3–5]. For unknown camera response, Section IV discusses how the response function can be estimated. We present experimental results in Section V, followed by concluding remarks in Section VI.
II. OBSERVATION MODEL

Assume there are $N$ pictures taken of a static scene, with known exposure times $t_i$, $i = 1, \ldots, N$. Each image consists of $M$ pixels, and the $j$th pixel of the $i$th exposed image will be denoted $y_{ij}$; the set $\{y_{ij}\}$ represents the known observations. The goal is to determine the underlying light values, or irradiances, denoted by $x_j$, that gave rise to the observations $y_{ij}$. Note that the $N$ images must be properly registered, so that for a particular $a$, the light value $x_a$ contributes to $y_{ia}$, $i = 1, \ldots, N$. For this work, a normalized cross-correlation function [10] is used as the matching criterion to register images to $\frac{1}{2}$-pixel resolution.
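The registration step itself is not developed in the paper; as a rough illustration of the matching criterion, the sketch below scores integer translations by normalized cross-correlation. The search range and the restriction to whole-pixel shifts are simplifying assumptions of this sketch; registering to $\frac{1}{2}$-pixel resolution would additionally require interpolation.

```python
# A minimal sketch of translational registration by normalized
# cross-correlation (integer shifts only; sub-pixel refinement omitted).
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def register(ref, img, max_shift=8):
    """Return the integer shift (dy, dx) that best aligns img to ref."""
    c = max_shift  # compare central crops so wrap-around from roll is ignored
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            score = ncc(ref[c:-c, c:-c], shifted[c:-c, c:-c])
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```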
We assume there is a response function $f(\cdot)$ which maps exposure values to the observed output data. Since only the exposure time is being varied, the exposure values which are arguments of $f(\cdot)$ are products of time and irradiance, $t_i x_j$. Note that the camera response function is actually the composition of various functions, depending on the method of image capture. For a digital camera, $f(\cdot)$ might consist of the composition of the linear CCD response function, analog-to-digital conversion, and any non-linear processing added by the camera manufacturer. For an analog camera, $f(\cdot)$ would consist of the composition of the film's response function, the response function of the printing process (if the images are scanned from prints, rather than from the actual film), and the response function of the scanning device, which itself consists of the composition of several more functions. Here, we are only concerned with the overall composite response function $f(\cdot)$ and not any of its individual elements.

Since only the exposure time is being varied, the quantity contributing to the output value $y_{ij}$ will be $t_i x_j$. To account for image capture noise, we introduce an additive noise term, $N^c_{ij}$, which also contributes to the observed pixel values. Depending on the system used to capture the image, $N^c_{ij}$ could come from a variety of sources, such as photon shot noise, dark current noise, and noise in the analog-to-digital conversion process. The quantity $t_i x_j + N^c_{ij}$ is then mapped by the camera's response function $f(\cdot)$ to give the observed output values

$$y_{ij} = f(t_i x_j + N^c_{ij}). \qquad (1)$$

Since the $y_{ij}$ are digital numbers, $f(\cdot)$ maps the non-negative real numbers representing exposures, $\Re^+ = [0, \infty)$, to an interval of integers, $O = \{0, \ldots, 255\}$ for 8-bit data. Without loss of generality, this paper assumes the image data is 8 bits. We explicitly write the camera response function as

$$f(z) = \begin{cases} 0 & \text{if } z \in [0, I_0] \\ m & \text{if } z \in (I_{m-1}, I_m], \quad m = 1, \ldots, 254 \\ 255 & \text{if } z \in (I_{254}, \infty), \end{cases} \qquad (2)$$

defined in terms of the 255 numbers $I_m$, $m = 0, \ldots, 254$; Fig. 3 shows an example $f(\cdot)$. For a linear response function such as in [1, 3–5], the $I_m$ values would be evenly spaced; in general, however, this will not be true.

III. HIGH DYNAMIC RANGE IMAGE WITH KNOWN RESPONSE FUNCTION

In some situations, the response function of the image capture system is known. If one has access to direct CCD output, then one knows that the response is a linear function of exposure [8]; this is the image capture process assumed in [1, 3–5]. In this section, we show how to obtain high dynamic range image data with a known response function. For the general situation where direct CCD output is unavailable or where a film camera is used, Section IV shows how to obtain the response function for arbitrary image capture systems; once the response function is known, the methods of this section can be directly applied.
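For intuition, the observation model of (1) and (2) can be simulated directly: given the breakpoints $I_m$, $f(\cdot)$ is simply a count of how many breakpoints lie below its argument. The toy response curve, irradiance range, and noise level in this sketch are illustrative assumptions only.

```python
# A minimal simulation of the observation model of eqs. (1)-(2).
import numpy as np

rng = np.random.default_rng(0)

# The 255 breakpoints I_0 <= ... <= I_254 of eq. (2); a gamma-like
# nonlinearity is assumed here purely for illustration.
I = np.linspace(0.0, 1.0, 255) ** 2.2

def f(z):
    """Camera response, eq. (2): the number of breakpoints strictly below z,
    giving integer codes in {0, ..., 255}."""
    return np.searchsorted(I, z, side="left")

t = 1.0 / 30                             # exposure time t_i (seconds)
x = rng.uniform(0.0, 60.0, size=1000)    # irradiances x_j (arbitrary units)
sigma_c = 0.01                           # assumed capture-noise level N^c
y = f(t * x + rng.normal(0.0, sigma_c, x.shape))   # observations, eq. (1)
```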
We wish to estimate the irradiances $x_j$ with a dynamic range higher than that of the original observations. If the function $f(\cdot)$ is known, we can define a mapping from $O$ to $\Re^+$ as

$$f^{-1}(y_{ij}) = t_i x_j + N^c_{ij} + N^q_{ij} = I_{y_{ij}}. \qquad (3)$$

When determining $f^{-1}(m)$, one knows only that it belongs to the interval $(I_{m-1}, I_m]$. The $N^q_{ij}$ noise term above accounts for the uncertainty in assigning $f^{-1}(m) = I_m$, and is a dequantization error. One should keep in mind that $f^{-1}(\cdot)$ is not a true inverse, since $f(\cdot)$ is a many-to-one mapping. Rewriting (3),

$$I_{y_{ij}} = t_i x_j + N_{ij}. \qquad (4)$$

The noise term $N_{ij}$ consists of the noise term introduced in Section II, as well as the dequantization uncertainty term $N^q_{ij}$. Note that accurately characterizing the noise terms $N_{ij}$ would be
extremely difficult, as it would require detailed knowledge of the specific image capture process being employed. One would have to characterize each of the separate noise sources that compose $N_{ij}$, which would be a complicated task; this process would have to be performed each time a different image capture system is used. Furthermore, if different noise models are found for different capture devices (e.g., noise from one process is found to be approximated well by Gaussian noise, and noise from another capture process is found to be Laplacian), then entirely different estimators would result. Rather than attempt this, we will model the $N_{ij}$ as zero-mean independent Gaussian random variables, with variances $\sigma^2_{ij}$. The Gaussian approximation is valid due to the potentially large number of noise sources present: all the noise sources inherent to acquiring digital images, e.g., dark current noise, photon shot noise, amplifier noise, and ADC noise; if a traditional camera is used, the noise inherent to film, e.g., photon shot noise and film grain; and the de-quantization noise $N^q_{ij}$.

Note that, even with the simplifying Gaussian approximation, the noise variances $\sigma^2_{ij}$ would be difficult to characterize accurately. Again, detailed knowledge of the image capture process would be required, and the noise characterization would have to be performed each time a different image capture device is used. Alternatively, one could attempt to characterize the noise experimentally; however, this would be a burdensome task to perform with every image capture system. Therefore, rather than attempting either of these approaches, the variances will be chosen heuristically.

It will be convenient in the following to replace the variances with weights, $w_{ij} = \frac{1}{\sigma^2_{ij}}$. The concept of weights is intuitive, and serves to ease the notational burden. The weights are chosen based on our confidence that the observed data are accurate. We take an approach similar to that of [7]. The response function of a camera will typically be steepest, or most sensitive, towards the middle of its output range, or 128 for 8-bit data. As the output levels approach the extremes of 0 and 255, the sensitivity of the camera typically decreases. For this reason, a weighting function will be chosen such that values near 128 are weighted more heavily than those near 0 and 255. The function chosen here is a Gaussian-like function,

$$w_{ij} = w(y_{ij}) = \exp\left(-W\,\frac{(y_{ij} - 127.5)^2}{(127.5)^2}\right), \qquad (5)$$

but scaled and shifted so that $w(0) = w(255) = 0$ and $w(127.5) = 1.0$. The parameter $W$ is chosen to reflect our confidence in the reliability of pixel observations (recall that $w(m)$ is the
reciprocal of the noise variance for pixels observed to be $m$). If our confidence is high in pixel values near 0 and 255, then $W$ can be rather small. Similarly, if pixel values near 0 and 255 are very noisy and unreliable, then $W$ should be chosen to be relatively large. Figure 4 shows the weighting function for two values of $W$.
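A sketch of this weighting function, including the scaling and shifting just described, follows; the particular normalization used to satisfy $w(0) = w(255) = 0$ and $w(127.5) = 1$ is one natural choice and is an assumption of this sketch, as the text does not spell it out. Later sketches reuse this `weight` function.

```python
# Weighting function of eq. (5), scaled and shifted so that
# w(0) = w(255) = 0 and w(127.5) = 1.
import numpy as np

def weight(y, W=4.0):
    """Confidence weight w(y) for an 8-bit observation y."""
    g = np.exp(-W * (np.asarray(y, dtype=float) - 127.5) ** 2 / 127.5 ** 2)
    g_end = np.exp(-W)              # raw value of the Gaussian at y = 0, 255
    return (g - g_end) / (1.0 - g_end)

w = weight(np.arange(256), W=4.0)   # W = 16 suppresses the extremes harder
```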
From (4), the $I_{y_{ij}}$ are independent Gaussian random variables, and the joint probability density function can be written as

$$P(\mathbf{I}_y) \propto \exp\left\{-\frac{1}{2}\sum_{i,j} w_{ij}\left(I_{y_{ij}} - t_i x_j\right)^2\right\}. \qquad (6)$$
A maximum-likelihood (ML) approach will be taken to find the high dynamic range image values. The maximum-likelihood solution finds the values $x_j$ which maximize the probability in (6). Maximizing (6) is equivalent to minimizing the negative of its natural logarithm, which leads to the following objective function to be minimized:

$$O(\mathbf{x}) = \sum_{i,j} w_{ij}\left(I_{y_{ij}} - t_i x_j\right)^2. \qquad (7)$$
Equation (7) is easily minimized by setting the gradient $\nabla O(\mathbf{x})$ equal to zero. This yields

$$\hat{x}_j = \frac{\sum_i w_{ij}\, t_i\, I_{y_{ij}}}{\sum_i w_{ij}\, t_i^2}, \qquad (8)$$

the desired high dynamic range image estimate. Note that data from images taken with longer exposure times are weighted more heavily, as indicated by the $t_i$ term in the numerator of (8). Thus this method takes advantage of the quantization effects utilized in [1, 3–5]; however, here a noise-reducing averaging is being performed, which utilizes data from all input pixels.

Equation (8) requires that the $I_m$ values (i.e., the response function) be known. In general, however, the response function is not known. The following section describes how to determine the response function, so that in the future the results just presented can be applied directly.
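A vectorized sketch of the estimator (8) is given below. It assumes the response is supplied as an array `I` mapping each 8-bit code $m$ to $I_m$; since eq. (2) leaves code 255 unbounded above, some finite value must be assigned to `I[255]`, which is an assumption of this sketch. The `weight` function is the one sketched earlier.

```python
# ML fusion of eq. (8) for a known response function.
import numpy as np

def fuse(ys, ts, I, weight):
    """ys: N x M array of 8-bit codes; ts: length-N exposure times;
    I: length-256 array with I[m] = I_m. Returns the M estimates x_hat."""
    Iy = I[ys]                                 # I_{y_ij}: codes -> exposures
    w = weight(ys)                             # w_ij
    t = np.asarray(ts, dtype=float)[:, None]   # broadcast t_i over pixels j
    num = (w * t * Iy).sum(axis=0)             # sum_i w_ij t_i I_{y_ij}
    den = (w * t ** 2).sum(axis=0)             # sum_i w_ij t_i^2
    # den is zero only when a pixel sits at code 0 or 255 in every frame:
    return num / np.maximum(den, 1e-12)
```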
IV. FOR UNKNOWN RESPONSE FUNCTION

Except in very specialized situations, the camera response function will not be known, and must be estimated. To uniquely determine the response function, the 255 values $I_m$, $m = 0, \ldots, 254$, must be found.

At first glance, one may consider directly using the objective function in (7) to determine the $I_m$ values needed to define the response function. Note that in order to estimate the $I_m$ values from (7), the $x_j$ values are also unknown and need to be estimated simultaneously (if the $x_j$'s were already known, there would be little need to estimate the $I_m$'s!). Thus, the objective function for the case of unknown response function is

$$\tilde{O}(\mathbf{I}, \mathbf{x}) = \sum_{i,j} w_{ij}\left(I_{y_{ij}} - t_i x_j\right)^2. \qquad (9)$$
An additional constraint on the response function is required when estimating $\mathbf{I}$ and $\mathbf{x}$ together using (9). This restriction on $f(\cdot)$ is in regard to scale. We are not interested here in absolute irradiance values, so issues relating to physical units are avoided. It is sufficient to determine the high dynamic range image to within a scale factor, for then the range of values found can be mapped to any desired interval. Since the scale of $\hat{x}_j$ is dependent on the scale of $I_m$, we will constrain the estimates for the $I_m$ values such that $\hat{I}_{128} = 1.0$. This is enforced by dividing each of the $\hat{I}_m$'s by $\hat{I}_{128}$.

A form of Gauss-Seidel relaxation [11] will be used to determine the solution. Gauss-Seidel relaxation minimizes an objective function with respect to a single variable, and then uses these new values when minimizing with respect to subsequent variables. Here, (9) will first be minimized with respect to each $I_m$. Then the restriction mentioned above is enforced. Finally, (9) will be minimized with respect to each $x_j$. This will constitute one iteration of the algorithm. We denote the estimates for the variables of interest at the $l$th iteration as $\hat{\mathbf{I}}^{(l)}$ and $\hat{\mathbf{x}}^{(l)}$. The initial $\hat{\mathbf{I}}^{(0)}$ is chosen as a linear function, with $\hat{I}^{(0)}_{128} = 1.0$. The initial $\hat{\mathbf{x}}^{(0)}$ is chosen according to (8), using the initial linear $\hat{\mathbf{I}}^{(0)}$.

First, to minimize with respect to $I_m$ at the $l$th iteration, the partial derivative of (9) with respect to $I_m$ is taken and set equal to zero. This yields

$$\hat{I}^{(l)}_m = \frac{1}{\mathrm{Card}(E_m)} \sum_{(i,j) \in E_m} t_i\, \hat{x}^{(l-1)}_j, \qquad (10)$$

where the index set $E_m$ is defined as

$$E_m = \left\{ (i,j) : y_{ij} = m \right\}, \qquad (11)$$

the set of indices such that $m$ was observed in the input images. $\mathrm{Card}(E_m)$ is the cardinality of $E_m$, i.e., the number of times $m$ was observed. Equation (10) is applied for each $\hat{I}^{(l)}_m$, $m = 0, \ldots, 254$.
After scaling the response function such that $\hat{I}^{(l)}_{128} = 1.0$, minimization is performed with respect to each $x_j$, $j = 1, \ldots, M$. This involves applying (8):

$$\hat{x}^{(l)}_j = \frac{\sum_i w_{ij}\, t_i\, \hat{I}^{(l)}_{y_{ij}}}{\sum_i w_{ij}\, t_i^2}. \qquad (12)$$

This completes one iteration of the algorithm, and the process is repeated until some convergence criterion is met. The convergence criterion used here is for the rate of decrease in the objective function to fall below some threshold.
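Pulling the pieces together, one possible implementation of this alternation is sketched below. It folds eq. (12) in directly rather than calling the earlier `fuse` sketch, carries 256 response values so that saturated code 255 also maps to an exposure (an assumption; the paper defines only $I_0, \ldots, I_{254}$), uses a fixed iteration count in place of the objective-based stopping rule, and omits the edge-pixel restriction of eq. (13) in Section V.

```python
# A sketch of the Gauss-Seidel-style minimization of eq. (9):
# alternating the I-step of eq. (10) with the x-step of eq. (12).
import numpy as np

def estimate_response(ys, ts, weight, iters=10):
    """ys: N x M array of 8-bit codes; ts: length-N exposure times.
    Returns (I, x): estimated response values and HDR pixel estimates."""
    t = np.asarray(ts, dtype=float)[:, None]
    w = weight(ys)

    def x_step(I):                       # eq. (8)/(12)
        den = (w * t ** 2).sum(axis=0)
        return (w * t * I[ys]).sum(axis=0) / np.maximum(den, 1e-12)

    I = np.linspace(0.0, 2.0, 256)       # initial linear response guess
    I /= I[128]                          # scale constraint: I_128 = 1.0
    x = x_step(I)                        # initial x^(0) from eq. (8)
    for _ in range(iters):
        # I-step, eq. (10): average t_i * x_j over the index sets E_m
        tx = (t * x[None, :]).ravel()
        codes = ys.ravel().astype(np.int64)
        sums = np.bincount(codes, weights=tx, minlength=256)
        counts = np.bincount(codes, minlength=256)
        seen = counts > 0                # update only codes actually observed
        I[seen] = sums[seen] / counts[seen]
        I /= I[128]                      # re-enforce I_128 = 1.0
        x = x_step(I)                    # x-step, eq. (12)
    return I, x
```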
V. EXPERIMENTAL RESULTS

Figures 1 and 5 show series of photographs of scenes that have wide dynamic ranges. The photographs in Fig. 1 were taken with a digital camera; the photographs in Fig. 5 were taken with a traditional camera using slide film and scanned using a Leafscan-35 slide-film scanner. The Leafscan-35 allows one to maintain constant exposure times between scans, as well as to retain the black and white points. Figures 1(c) and 5(b) are photographs taken at what might be considered "normal" exposure settings. From the figures, one notices that there is little detail visible in either the very bright or the very dark regions. Simple contrast stretching in the dark regions results in noisy-looking image areas; contrast stretching in the very bright regions does little good, due to the saturated pixel values. Thus, scenes such as these are excellent candidates for the algorithm discussed in this paper.

Figure 6 shows the response functions determined by the algorithm discussed in this paper when applied to the sets of images shown in Figs. 1 and 5. Note that the response functions are not linear functions, and thus use of algorithms such as those described in [1, 3–5] would be inappropriate for either of these cameras. For the scene in Fig. 1, the weighting function of Fig. 4(a) was chosen. However, for the scene in Fig. 5, very bright and very dark pixel values were quite noisy, so the weighting function in Fig. 4(b) was chosen to reflect this. There is one more important issue for scenes such as the one in Fig. 5: Strong edges, such as the bar in the doorway, tend to corrupt the results. This is due to the integrating nature of CCD elements: Depending on the exact location of the edge, the resulting edge pixel value can be anywhere between the dark and bright values on either side of the edge; this makes these edge pixels unreliable. To bypass this problem, we limit the calculation of the response function to using only pixels which are not on strong edges. This is done by limiting the set $E_m$ of (11) to $E'_m$,

$$E'_m = \left\{ (i,j) : y_{ij} = m,\ y_{aj} \text{ not an edge pixel},\ a = 1, \ldots, N \right\}. \qquad (13)$$

The edge mask used for the scene in Fig. 5 is shown in Fig. 7.

The $\hat{x}_j$ values which were found while determining the response function are the ultimate variables of interest. Displaying these high dynamic range images on devices of limited dynamic range is a nontrivial undertaking. Methods from the computer graphics literature can be found in [12–15]. However, the focus of this research is not the display of high dynamic range images, but rather the acquisition of high dynamic range images. The methods we use for visualization of high dynamic range images are not chosen to give the most "visually pleasing" image, but rather to demonstrate the results of our algorithm.

The full range of $\hat{x}_j$ values estimated for the scene in Fig. 1 is first linearly mapped to 8 bits; the result is shown in Fig. 8(a). From the figure, it is apparent that the solution contains accurate high-light information; however, it is not so apparent that the low-light detail is present. To demonstrate that accurate low-light detail is indeed contained in our solution, we can use the $\hat{x}_j$ values to simulate the camera output for a high exposure time. This is done by creating an 8-bit image with the $j$th pixel value equal to $f(t_d \hat{x}_j)$, where $t_d$ is the desired exposure time. This result is shown in Fig. 8(b) for $t_d = \frac{1}{3}$ second. Note that this is simulating camera output for an exposure time not available on our camera. From the figure, one sees that low-light detail is also contained in the high dynamic range image data.
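The simulated-output visualization just described amounts to one more application of eq. (2); a sketch, reusing the response array `I` and the estimates returned by the earlier sketches:

```python
# Synthesize the camera's 8-bit output f(t_d * x_hat) for a chosen
# exposure time t_d, per eq. (2).
import numpy as np

def simulate_exposure(x_hat, t_d, I):
    codes = np.searchsorted(I, t_d * x_hat, side="left")
    return np.clip(codes, 0, 255).astype(np.uint8)

# e.g., an exposure time the camera itself does not offer:
# img = simulate_exposure(x_hat, 1.0 / 3, I)
```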
We apply a simple contrast-adjusting transform to display the high dynamic range image estimate for the scene in Fig. 5; the transform and the resulting image are shown in Fig. 9. The image in Fig. 9 contains more detail from both dark and bright regions than any of the input images, and its superiority is readily evident.

For the example results just given, the high dynamic range image was found while estimating the response function. Note, however, that once the response function for a capture device has been determined, this process need not be repeated when using that device in the future. Instead, (8) can be used to directly determine the desired high dynamic range image values. Figure 10 shows three pictures taken with the same camera as was used to photograph the images in Fig. 5. Using the response function found previously and shown in Fig. 6(b), a high dynamic range image was directly computed using (8). The result is shown in Fig. 11 with histogram
equalization, and demonstrates that the resulting image contains more information than any of the individual images in Fig. 10.

Although iterative in nature, the proposed algorithm has been observed to be very computationally efficient. Figure 12 shows the objective function values as a function of iteration number for determining the response function in Fig. 6(a). It is evident that convergence is rapidly achieved; discounting the image registration process, five iterations of the algorithm correspond to approximately five seconds of computation time for eight input pictures of size $640 \times 500$ on a 180-MHz personal computer.

The high dynamic range images determined using the method outlined here have several advantages over single images such as those in Figs. 1, 5, or 10. The images obtained with our method have decreased noise due to the averaging of pixel values from each of the input images. Furthermore, they contain information in both low- and high-light areas, since the high dynamic range images consist of data from each of the input images. Traditional image processing algorithms (e.g., contrast stretching, histogram equalization, edge detection) can be applied to the high dynamic range images with more accurate results due to the increased amount of information present.

VI. CONCLUSION

We have introduced a method of increasing the effective dynamic range of digital cameras by using multiple pictures of the same scene taken with varying exposure times. If necessary, the method first estimates the response function of the camera. Once the response function is known, high dynamic range images can be directly computed. These high dynamic range images contain accurate representations of both low- and high-light areas in the image, with decreased noise due to averaging of data from all the input images.

REFERENCES

[1] B. C. Madden, "Extended intensity range imaging," Tech. Rep., GRASP Laboratory, University of Pennsylvania, 1993.
[2] S. Mann and R. W. Picard, "On being 'undigital' with digital cameras: Extending dynamic range by combining differently exposed pictures," in IS&T's 48th Annual Conference, Washington, D.C., May 7–11, 1995, pp. 422–428.
[3] K. Yamada, T. Nakano, and S. Yamamoto, "Effectiveness of video camera dynamic range expansion for lane mark detection," in Proceedings of the 1997 IEEE Conference on Intelligent Transportation Systems, Nov. 9–12, 1997, pp. 584–588.
[4] K. Yamada, T. Nakano, and S. Yamamoto, "Wide dynamic range vision sensor for vehicles," in IEEE International Conference on Vehicle Navigation and Information Systems, Aug. 31–Sep. 2, 1994, pp. 405–408.
[5] K. Moriwaki, "Adaptive exposure image input system for obtaining high-quality color information," Systems and Computers in Japan, vol. 25, no. 8, pp. 51–60, July 1994.
[6] Z. Chen and G. Mu, "High-dynamic-range image acquisition and display by multi-intensity imagery," Journal of Imaging Science and Technology, vol. 39, no. 6, pp. 559–564, Nov./Dec. 1995.
[7] P. E. Debevec and J. Malik, "Recovering high dynamic range radiance maps from photographs," in SIGGRAPH 97 Conference Proceedings, Computer Graphics Annual Conference Series, Aug. 3–8, 1997, pp. 369–378.
[8] G. C. Holst, CCD Arrays, Cameras, and Displays, JCD Publishing and SPIE Optical Engineering Press, 1996.
[9] M. A. Robertson, S. Borman, and R. L. Stevenson, "Dynamic range improvement through multiple exposures," in International Conference on Image Processing, Kobe, Japan, Oct. 23–27, 1999.
[10] H.-M. Hang and Y.-M. Chou, "Motion estimation for image sequence compression," in Handbook of Visual Communications, H.-M. Hang and J. W. Woods, Eds., pp. 147–188, Academic Press, 1995.
[11] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[12] S. N. Pattanaik, J. A. Ferwerda, M. D. Fairchild, and D. P. Greenberg, "A multiscale model of adaptation and spatial vision for realistic image display," in SIGGRAPH 98 Conference Proceedings, Computer Graphics Annual Conference Series, July 1998, pp. 287–298.
[13] G. W. Larson, H. Rushmeier, and C. Piatko, "Visibility matching tone reproduction operator for high dynamic range scenes," IEEE Transactions on Visualization and Computer Graphics, vol. 3, no. 4, pp. 291–306, Oct.–Dec. 1997.
[14] J. Tumblin and G. Turk, "LCIS: A boundary hierarchy for detail-preserving contrast reduction," in SIGGRAPH 99 Conference Proceedings, Computer Graphics Annual Conference Series, 1999.
[15] C. Schlick, "Quantization techniques for visualization of high dynamic range pictures," in Photorealistic Rendering Techniques (Proceedings of the 5th Eurographics Rendering Workshop, June 13–15, 1994), G. Sakas, P. Shirley, and S. Mueller, Eds., Berlin, 1995, pp. 7–20.
Mark A. Robertson received the B.S. in electrical engineering from Tri-State University, Angola, IN, in 1996, and the M.S. in electrical engineering from the University of Notre Dame, Notre Dame, IN, in 1998. He is currently pursuing the Ph.D. in electrical engineering at the University of Notre Dame. While at Notre Dame, he was supported by the Arthur J. Schmitt Fellowship, the Intel Corporation, and the Indiana Space Grant Consortium. His current research interests include video processing and enhancement, image and video post-processing, and video segmentation. He is a member of Eta Kappa Nu and Tau Beta Pi.
Sean Borman received the B.Sc. degree in Electrical Engineering with first class honors from the University of Cape Town, South Africa, in 1992. He later completed the MSEE degree at the University of Notre Dame in 1996. After teaching and consulting in South Africa, Mr. Borman returned to Notre Dame to pursue the Ph.D. degree, also in Electrical Engineering. His professional interests include inverse problems, super-resolution video restoration, and robust, sub-pixel motion estimation. He is a member of Eta Kappa Nu, a recipient of the South African Foundation for Research Development (FRD) award for post-graduate overseas study, and a Fulbright Scholar (1993–1995).
Robert L. Stevenson received the B.E.E. degree (summa cum laude) from the University of Delaware in 1986, and the Ph.D. in Electrical Engineering from Purdue University in 1990. While at Purdue he was supported by graduate fellowships from the National Science Foundation, DuPont Corporation, Phi Kappa Phi, and Purdue University. He joined the faculty of the Department of Electrical Engineering at the University of Notre Dame in 1990, where he is currently an Associate Professor. His research interests include image/video processing, image/video compression, robust image/video communication systems, multimedia systems, ill-posed problems in computational vision, and computational issues in image processing. Dr. Stevenson is an Associate Editor of the IEEE Trans. on Image Processing and the IEEE Trans. on Circuits and Systems for Video Technology, and is a former Associate Editor of the Journal of Electronic Imaging.
Fig. 1. Eight pictures of a static scene taken at different exposure times, photographed using a Nikon E2N digital camera with aperture f/6.7. The image resolution is $640 \times 500$. The exposure times are (a) 1/8, (b) 1/15, (c) 1/30, (d) 1/60, (e) 1/125, (f) 1/250, (g) 1/500, and (h) 1/1000 seconds.
Fig. 2. Example histogram of real-world scene intensities, with arbitrary intensity units. (Axes: real-world intensity, marked at $\frac{1}{3}I_{max}$, $\frac{2}{3}I_{max}$, and $I_{max}$, versus histogram count.)
Fig. 3. Example camera response function, $f(\cdot)$. The inset shows a close-up view near the origin and demonstrates the discrete nature of $f(\cdot)$. (Axes: exposure, $t_i x_j$, marked at $I_0, I_1, \ldots, I_{254}$, versus $f(\mathrm{exposure})$.)

Fig. 4. Weighting function, $w(m)$. (a) $W = 4$; (b) $W = 16$. (Axes: observed pixel value versus weight.)
Fig. 5. Ten pictures of a static scene taken with a Nikon FM camera with aperture f/5.6. The image resolution is $450 \times 300$. The exposure times are (a) 1/2, (b) 1/4, (c) 1/8, (d) 1/15, (e) 1/30, (f) 1/60, (g) 1/125, (h) 1/250, (i) 1/500, and (j) 1/1000 seconds.
Fig. 6. Semilog plots of the estimated response functions for the cameras used to capture the images in (a) Fig. 1 and (b) Fig. 5. (Axes: exposure, $I_m$, on a logarithmic scale versus $f(\mathrm{exposure})$, $m$.)
Fig. 7. Edge mask used to limit the set $E_m$ for the scene in Fig. 5.
Fig. 8. High dynamic range output images. (a) Full dynamic range linearly mapped to eight bits; (b) using high dynamic range image data to simulate camera output for an exposure time of $\frac{1}{3}$ second.
Fig. 9. Results for the scene in Fig. 5. (a) Transform applied to the image estimate; the high dynamic range pixel estimates $\hat{x}_j$ were normalized to $[1, 256]$ prior to application of this transform. (Axes: high dynamic range pixel value, on a logarithmic scale, versus transformed pixel value.) (b) The resulting transformed image.
Fig. 10. Three pictures of a static scene taken with a Nikon FM camera with aperture f/16 and exposure times of (a) 1/8, (b) 1/30, and (c) 1/250 seconds.

Fig. 11. Histogram equalization of the high dynamic range image estimate for the scene in Fig. 10.
Fig. 12. Value of the objective function $\tilde{O}(\mathbf{I}, \mathbf{x})$ as a function of iteration number for the image set in Fig. 1. (Axes: iteration number, 1–10, versus objective function value.)