Image Processing Considerations for Digital Photography

Ping Wah Wong (1), Daniel Tretter (1), Cormac Herley (1), Nader Moayeri (1), and Ruby Lee (2)

(1) Hewlett Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304
(2) Hewlett Packard Company, 19410 Homestead Road, Cupertino, CA 95014

Abstract
We consider an image processing pipeline that enables a user to capture and generate high quality images from a digital camera. In such a system, some image processing tasks are typically implemented within the camera, while others are performed in software on a host computer that controls the output device. We consider the complexity issues and tradeoffs involved in designing several key algorithms in such a system. We also discuss hardware features that accelerate these algorithms.
1. Introduction

The increasing popularity of digital photography can be observed from the number of new digital cameras that have been introduced in the market in the past two years. Various models of digital cameras have been designed; some are targeted at professional photographers, whereas others are designed for the average point-and-shoot photographer. An excellent collection of information on digital cameras from various manufacturers can be found in "The Digital Camera Guide" on the world wide web [1].

A main advantage of digital photography is that it is possible to generate high quality prints (photographs) using a good printer without going through a photofinisher. In addition, since the images are captured directly in digital form, they can be easily transmitted for processing, storage and display, either through a local network or through the internet. One can also easily introduce special effects, such as incorporating a photograph into a greeting card or morphing two photographs together. The disadvantages of digital cameras today are that they are still relatively expensive, and they usually produce images of lower quality than traditional silver halide photography.

In a typical digital camera, the image is captured, processed, and then stored on the camera. The images can be transmitted and further processed on a host computer and then either displayed or printed. The computer can also be used for long term storage of the images. In order to maximize the efficiency of storage, the image data is often
compressed using, for example, JPEG [9] during storage.

Most digital cameras use a charge coupled device (CCD) array as the image sensor, although other technologies are under investigation. The sensor consists of a two-dimensional array of light sensitive elements. When the sensor is exposed to the light from a scene, the elements measure the amount of incident light at each array location, producing a two-dimensional array of picture elements (pixels) from which the image is constructed. To obtain color information, many cameras cover each CCD pixel with a color (usually red, green, or blue) filter. The color filters for the pixels are arranged in a mosaic pattern whereby one primary color is captured at each pixel. An example of such a mosaic pattern is shown in Fig. 1. Since only one primary color is captured at each pixel, some form of interpolation must be used to construct the full color information at each pixel location. This task is accomplished using a procedure called color interpolation or demosaicing.

Figure 1. A mosaic pattern for a digital camera sensor, and the corresponding full color reconstructed image.

A main purpose of a digital camera is to capture images that will eventually be reproduced in hardcopy (photographs). This is typically done in conjunction with a personal computer and a photographic printer. As a result, it is important to consider the entire image processing pipeline from capture to print. An algorithm and system designer has the freedom to perform certain processing steps within the camera and others on the host computer. The split is usually chosen to maximize system performance as well as to satisfy system constraints such as memory and speed requirements. In this paper, we consider the image processing pipeline of a camera system including a host computer and a printer. We describe the essential processing steps, their trade-offs in the overall system design, and hardware features that accelerate them.
[Figure 2, a block diagram: source → demosaicing → source correction → compression (camera: capture and process), followed by decompression → enhancement and scaling → hardcopy rendering (host computer: process and print).]

Figure 2. An Image Processing Pipeline for a Digital Photography System.
2. Image Processing Procedures

Consider the image processing pipeline in Fig. 2, which is typical of consumer grade digital cameras. By performing the demosaicing and source correction on the camera, a full color image of reasonable quality is available at the input of the compression module. The camera can thus use a standard image compression algorithm, usually JPEG, on the image data. Since the JPEG standard [9] has gained widespread usage for still image compression, the camera manufacturer can purchase JPEG compression chips economically from any of a number of vendors. Also, the images from the camera can be easily transferred to and displayed on the host computer, since many software applications can read JPEG compressed images.

The first processing stage in our example pipeline is demosaicing, where the digitized pixel values from a mosaiced sensor are interpolated to form a full set of color values at each pixel location. For the three color mosaic on the left side of Fig. 1, the demosaicing algorithm reads a pixel value corresponding to red, green, or blue at each array location. The output of the demosaicing block gives values for red, green, and blue at each pixel location, as shown in Fig. 1. The demosaicing stage therefore triples the amount of data.

It may seem odd to perform an operation that triples the amount of data used to represent an image when the available storage space on the camera is typically small, making compression a necessity. However, color image compression algorithms normally take advantage of human visual characteristics by transforming the image into a luminance-chrominance representation, where the chrominance components can be subsampled and compressed to a very small number of bits [18]. Such a color transformation is only possible after demosaicing, so the data expansion from the demosaicing step is largely counteracted by the increased compressibility of the image data.

Source correction is used to convert the demosaiced image to a desired standard color space for compression and viewing. The spectral responses of the color mosaic filters usually do not match the desired colors for image compression or display. Also, the colors in an image depend upon the scene illuminant, which varies from picture to picture. A scene viewed in sunlight, for example, can appear quite different from the same scene viewed under incandescent illumination. The data from the demosaiced image planes must thus be corrected to produce a viewable RGB image.
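As a concrete illustration of the luminance-chrominance transformation mentioned above, the following C fragment sketches a JFIF-style RGB-to-YCbCr conversion; the function name and 8-bit data layout are our own illustrative choices, not part of any particular camera's pipeline.

    #include <stdint.h>

    /* JFIF-style RGB -> YCbCr for one 8-bit pixel. After this transform
       the Cb and Cr planes vary slowly and can be subsampled (e.g. 2x2)
       and coarsely quantized, recovering much of the 3x data expansion
       that demosaicing introduced. */
    static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                             uint8_t *y, uint8_t *cb, uint8_t *cr)
    {
        *y  = (uint8_t)( 0.299  * r + 0.587  * g + 0.114  * b + 0.5);
        *cb = (uint8_t)(-0.1687 * r - 0.3313 * g + 0.5    * b + 128.5);
        *cr = (uint8_t)( 0.5    * r - 0.4187 * g - 0.0813 * b + 128.5);
    }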
The second part of the pipeline resides on the host computer. This processing is used to produce a hard copy of the compressed images stored on the camera. After decompressing the data, the output print quality can be improved by enhancing the images in several ways. The low resolution and inexpensive optics of consumer digital cameras usually produce slightly blurred images. Scaling the image by a factor of two or more for printing compounds this problem. As a result, a modest to considerable amount of sharpening will improve the appearance of nearly all images from consumer digital cameras. Some camera manufacturers incorporate sharpening into the camera processing itself, so that their JPEG images can be passed directly to the host computer without further refinement. However, the hardware and memory constraints of the camera greatly limit the complexity of any onboard sharpening algorithm.
The visibility of features within some images can also be improved by applying a contrast enhancement algorithm. This class of algorithms adjusts the image brightness to improve contrast, particularly when the scene conditions or camera settings result in underexposed or washed out images. Some fixed input/output intensity adjustment curves are also known as contrast enhancement, but we will use the term to refer to image-dependent remapping of pixel values.

Digital cameras often compress the images rather aggressively, so visible artifacts can appear in the reconstructed image after decompression. JPEG produces characteristic artifacts that can be identified and reduced with some success by an appropriate enhancement algorithm [6]. Compression artifact reduction algorithms are typically applied as postprocessing steps to improve decompressed images, so they must be applied on the host computer after decompression.

Finally, the images must be scaled to print at the desired size on the output device. A VGA resolution camera, for example, produces a 640 × 480 image, which must be scaled by a factor of nearly 3 to produce a 6 × 4 inch print on a 300 dpi printer.

The last block in the pipeline of Fig. 2 is hardcopy rendering. This stage is typically performed by an appropriate print driver installed on the host computer, which converts the image into a format suitable for printing.
3. Algorithmic Considerations
The computational requirements of the processing stages described in the previous section must be taken into account when designing the overall system. In particular, operations that take place on the camera are typically limited in computational complexity and memory requirements, since the camera must operate under a variety of constraints on power consumption, speed, and cost. Processing on the host computer, on the other hand, is normally implemented in software and has less restrictive speed requirements, so these algorithms can be more complex.

Demosaicing: The demosaicing module takes in raw data from the CCD array and interpolates to form a fully populated set of input color planes. The algorithm used to perform the interpolation is strongly dependent on the underlying mosaic pattern. Complicated patterns with large numbers of colors generally require more computationally complex algorithms than simple repeating patterns such as the one in Fig. 1. However, more complicated patterns can sometimes better match the colors in the original scene, and such patterns may produce images with fewer color artifacts from the demosaicing. When interpolating the color planes, the algorithm can be designed either to operate on each plane independently or to take advantage of the correlation between color planes by interpolating the planes jointly. Techniques that do the latter usually produce sharper images with fewer color artifacts, but they require significantly more computation. Examples of such algorithms can be found in [4] and [5].

If demosaicing is to be performed on the camera, the system designers are normally limited to a simple mosaic pattern with only three or occasionally four colors and a fast plane-independent demosaicing algorithm, such as the bilinear interpolation sketched below. This approach can result in highly visible color artifacts near image edges and in highly textured regions, so digital cameras in this class almost always include an optical blurring filter to smooth the data before processing. By removing the high frequency details from the image, the color artifacts can be diminished. Unfortunately, this blurring operation degrades the image, essentially decreasing its effective resolution. A discussion of some of the options and issues involved in designing a single chip (only one CCD array) camera can be found in [19].
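As an illustration of a fast plane-independent method, the following C fragment sketches bilinear interpolation of the green plane for the mosaic of Fig. 1; the buffer layout and function name are our own assumptions, and border pixels are ignored for brevity. The red and blue planes would be filled in analogously.

    #include <stdint.h>

    /* Bilinear interpolation of the green plane for the mosaic of
       Fig. 1, where green samples lie at locations with (row + col)
       even. Missing green values are the average of the four nearest
       green neighbors; border rows and columns are skipped. */
    static void interp_green(const uint8_t *raw, uint8_t *green,
                             int w, int h)
    {
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                if (((x + y) & 1) == 0)
                    green[y * w + x] = raw[y * w + x];  /* measured */
                else                                    /* interpolated */
                    green[y * w + x] = (uint8_t)
                        ((raw[(y - 1) * w + x] + raw[(y + 1) * w + x] +
                          raw[y * w + x - 1]  + raw[y * w + x + 1] + 2) >> 2);
            }
        }
    }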
Source Correction: The source correction stage transforms the demosaiced color planes from the camera sensor to a color space suitable for compression or viewing. This stage typically involves multiple steps of matrix multiplication and table lookup operations. Consider the processing pipeline of Fig. 2 with JPEG compression. One candidate source correction algorithm consists of a matrix multiplication followed by a one-dimensional lookup table and a second matrix multiplication. The first matrix multiplication converts the camera filter colors to a calibrated linear color space, normally some RGB space. Ideally, the matrix values are set based on the spectral characteristics of the sensor and the color filters as well as the scene illuminant. Sensing and correcting for the scene illuminant is a difficult task, although a number of procedures have been devised, mostly for use with video cameras or traditional analog photography [13, 17]. Once the data is in the desired RGB color space, each of the color planes can be gamma corrected by remapping the data in each plane through a one-dimensional lookup table. Gamma correction adjusts the data for viewing on a monitor, which has a nonlinear input/output characteristic. Since most printer drivers are calibrated to produce prints that approximate the image appearance on a monitor, this correction is also appropriate for printing the image. Finally, a second matrix multiplication can be used to convert to a luminance-chrominance color space, such as YCrCb or YUV, suitable for JPEG compression.
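A minimal C sketch of the first two steps, the color matrix and the gamma lookup table, is given below; the matrix entries and the gamma value are placeholders to be supplied by calibration, and all names are our own.

    #include <math.h>
    #include <stdint.h>

    static uint8_t clamp8(double v)
    {
        return (uint8_t)(v < 0.0 ? 0.0 : v > 255.0 ? 255.0 : v);
    }

    /* Fill a 256-entry gamma lookup table, e.g. for gamma = 2.2. */
    static void build_gamma_lut(uint8_t lut[256], double gamma)
    {
        for (int i = 0; i < 256; i++)
            lut[i] = (uint8_t)(255.0 * pow(i / 255.0, 1.0 / gamma) + 0.5);
    }

    /* Apply a 3x3 color matrix (camera RGB -> calibrated RGB) to one
       pixel, then gamma correct each channel through the table. */
    static void source_correct(uint8_t px[3], const double a[3][3],
                               const uint8_t lut[256])
    {
        double t[3];
        for (int i = 0; i < 3; i++)
            t[i] = a[i][0] * px[0] + a[i][1] * px[1] + a[i][2] * px[2];
        for (int i = 0; i < 3; i++)
            px[i] = lut[clamp8(t[i])];
    }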
Contrast enhancement: Contrast enhancement algorithms are designed to improve the visibility of image details by making better use of the full dynamic range of luminance values. Some pixels are brightened and others darkened to improve overall contrast. Contrast enhancement algorithms generally fall into two classes. Global enhancement algorithms change the pixel values according to a single lookup table that is applied to the entire image, while local enhancement techniques change a pixel value according to the image data in a local region surrounding the pixel. Global enhancement algorithms are typically two-pass algorithms, requiring the entire image to be buffered. The first pass examines the image data and gathers statistics describing the distribution of pixel values in the image. These statistics are then used to construct the enhancement lookup table, which is applied to the entire image in the second pass. One advantage of global enhancement algorithms is that they tend to be computationally efficient, since the lookup table is constructed based only on the image statistics rather than directly on the pixel values. The disadvantages are that the entire image must be buffered, and there may be no single lookup table that enhances contrast over the entire image, so the algorithm will not improve some images. Local enhancement algorithms change the lookup table according to the image characteristics in a local area surrounding a pixel. By adjusting the table on a local basis, these methods can improve contrast throughout the image, and only part of the image needs to be buffered before the algorithm can start generating output values. Unfortunately, these algorithms are generally quite computationally complex, since they must generate new statistics for every region.

Compression: Compression is widely used in digital cameras to store a large number of images in the camera's finite memory space (often a removable card). There are two strategies for performing compression in the camera. One is to compress the mosaiced data and then to demosaic the data after decompression on the host/driver. The other is to demosaic the data on the camera and then compress the full color data. The latter is by far the more popular choice and is the scenario depicted in Fig. 2. Two popular classes of lossy compression schemes for use in a digital camera are block transform based coders and subband (wavelet) coders. In the first category JPEG is the logical choice, because it is a popular standardized still image compression algorithm. It yields good rate-distortion performance (compression vs. image quality) and it is readily available in off-the-shelf VLSI chips. In the past several years many other compression schemes have appeared in the literature that do indeed outperform JPEG in a rate-distortion sense, sometimes significantly. Two of the most powerful recent compression schemes are [21] and [12]. The former [21] has the additional advantage that it produces an embedded bitstream. However, these compression schemes are often more computationally complex than JPEG and do not enjoy its standard compatibility.

JPEG artifact removal: JPEG compression artifacts generally take the form of "blockiness," or discontinuities at the boundaries of the 8 × 8 image blocks used by JPEG. These artifacts are particularly objectionable if the decompressed image is scaled by a large factor. The most obvious approaches to reducing JPEG artifacts involve some linear filtering or smoothing of the image. While these methods are effective, they also cause considerable blurring. An important class of algorithms for this problem is based on an iterative procedure; [6] is representative. The idea is to find an image that is smooth in some sense, but that also satisfies the constraint of producing the given JPEG bitstream when compressed. An iterative algorithm that finds an image satisfying both constraints yields an image with far fewer artifacts [6]. Even though a small number of iterations usually suffices for convergence, iterative algorithms are still more complex than straightforward filtering.

Sharpening: Unsharp masking [10] is the popular choice for sharpening. It is a linear filtering operation with a kernel of small support, and it is computationally inexpensive; a sketch is given below. While it can usually improve the quality of the image and make it look sharper, it also amplifies any noise present in the image. A more sophisticated approach is to design a sharpening filter based on the blurring characteristics of the optical lens system and the integration effect of the CCD sensor array. One solution is to design a Wiener filter to compensate for the blur, based on models available in the literature for the lens blur and a stochastic model for image data. This approach yields better performance than unsharp masking, but it generally requires more computation. There are also some nonlinear sharpening methods that do not produce as much ringing around the edges in the image when they sharpen it. Some of these methods include linear filtering as a subblock in their algorithms.
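A minimal sketch of unsharp masking follows, assuming a 3 × 3 box filter as the lowpass and an arbitrary illustrative gain; borders are skipped for brevity.

    #include <stdint.h>

    /* Unsharp masking with a 3x3 box filter as the lowpass:
       out = in + gain * (in - blur(in)). Border pixels are skipped. */
    static void unsharp_mask(const uint8_t *in, uint8_t *out,
                             int w, int h, double gain)
    {
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += in[(y + dy) * w + (x + dx)];
                double blur = sum / 9.0;
                double v = in[y * w + x] + gain * (in[y * w + x] - blur);
                out[y * w + x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v + 0.5);
            }
        }
    }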
Scaling: Large image sensors are difficult to manufacture inexpensively, and hence the size of the image sensor array in a digital camera is a major factor in its cost. Some form of scaling is necessary if one desires to print at 300 dpi or higher. For example, a VGA resolution image will be only 1.6 × 2.1 inches if printed at 300 dpi, whereas typical postcard sized photographs are 4 × 6 inches. The simplest schemes scale an image by pixel replication or bilinear interpolation [10]. As with one-dimensional interpolation, there is a tradeoff between the complexity of the algorithm (measured by the length of the interpolation filter) and the quality. Schemes more complicated than bilinear interpolation are generally based on splines. Because of the limits of the source resolution, scaling by factors greater than three or so generally results in images that are blurred. A recent class of algorithms that attempts to give sharper images at large scaling factors first generates an edge map of the image at the target resolution. This is followed by interpolation, but instead of smoothing across edges, some special heuristics are employed [11, 2].

Output rendering: Halftoning is required in the pipeline because many color inkjet printers print at only two levels (ON or OFF) for each primary color. Common halftoning algorithms include ordered dithering [3] and error diffusion [7]. In dithering, a dither matrix, which is an array of thresholds, is tiled over the image. Each pixel value is compared directly to the corresponding threshold element to decide the binary output value. In addition to being a very simple operation, dithering is also very amenable to parallel processing. Error diffusion is essentially a feedback quantizer, where a binary quantizer is embedded in a feedback loop with a digital filter. Instead of directly quantizing the pixel values, the difference between the pixel value and a filtered version of the "previous" quantization errors is thresholded. Error diffusion is a sequential operation that is not very amenable to parallel implementation.
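The following C fragment sketches Floyd–Steinberg error diffusion [7] for a single 8-bit plane; the in-place buffer update and the helper name are our own choices.

    #include <stdint.h>

    static int clampi(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

    /* Floyd-Steinberg error diffusion on one 8-bit plane, in place.
       Each pixel is thresholded at mid-gray, and the quantization
       error is pushed to the four not-yet-processed neighbors with
       weights 7/16, 3/16, 5/16 and 1/16. */
    static void error_diffuse(uint8_t *p, int w, int h)
    {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int old = p[y * w + x];
                int q   = old < 128 ? 0 : 255;   /* dot OFF or ON */
                int e   = old - q;
                p[y * w + x] = (uint8_t)q;
                if (x + 1 < w)
                    p[y * w + x + 1] = (uint8_t)clampi(p[y * w + x + 1] + e * 7 / 16);
                if (y + 1 < h) {
                    if (x > 0)
                        p[(y + 1) * w + x - 1] = (uint8_t)clampi(p[(y + 1) * w + x - 1] + e * 3 / 16);
                    p[(y + 1) * w + x] = (uint8_t)clampi(p[(y + 1) * w + x] + e * 5 / 16);
                    if (x + 1 < w)
                        p[(y + 1) * w + x + 1] = (uint8_t)clampi(p[(y + 1) * w + x + 1] + e / 16);
                }
            }
        }
    }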
4. Hardware Considerations

The initial part of the image processing pipeline that is contained in the camera (Fig. 2) has to be handled by very low-power, small footprint, and low-cost hardware. While special-purpose hardware for each of the pipeline blocks can be used, a better solution is to use a simple, general-purpose embedded processor and some shared memory inside the camera to perform the demosaicing, source correction and compression functions. This allows greater flexibility in changing the algorithms used within the camera. The key low-level operations required are interpolation and table lookup, plus other additions and multiplications.

The latter part of the image processing pipeline shown in
Fig. 2 is usually done outside the camera, on a host computer or on an image processing server on the network. Here, more computationally intensive algorithms may be used. We discuss below a few key operations performed in these stages: matrix multiplication, convolution, statistics gathering and table lookup.

Matrix multiplication: Matrix multiplication is denoted by Y = AX, where A is an m × n matrix, X is an n-vector, and Y is an m-vector. We often have m = n. This operation involves mn multiplications and m(n − 1) additions. Before applying hardware to speed up this operation, we should try to reduce the number of multiplications and additions needed, as, for example, in the case of a sparse matrix A. The remaining multiplications need to be examined for the degree of precision required (i.e., the number of bits required in the operands and the results), the format of the data (i.e., integer or floating-point), and whether the multipliers in the matrix A are constants (i.e., values known at compile time).

If the degree of precision required is less than 16 bits, then subword parallel instructions [14, 15, 16] may be used effectively to exploit the data parallelism available in matrix multiplication. Subword parallel instructions assume that multiple lower-precision operands are packed into a word of a processor. Today's microprocessors typically have word sizes of 32 or 64 bits. The word width is the unit around which the processor's general-purpose registers and integer datapaths are optimized. For example, a Subword Parallel Add instruction allows four parallel 16-bit add operations to be performed, with very minor modifications to the standard 64-bit adder in a microprocessor. This can speed up matrix multiplication four times, by processing four X-vectors in parallel, giving four Y-vector results simultaneously. Most of the multimedia instructions added to general-purpose microprocessors [14, 15, 22, 20] include subword parallel instructions for addition, subtraction, and some forms of multiplication.

For precision requirements greater than 16 bits, the use of subword parallel integer instructions is less clear, since 32-bit subwords are indicated. (Computers tend to be more efficiently organized around computations on data widths that are a power of two.) Here, the use of 32-bit (single-precision) floating-point may be preferable to 32-bit integer, since most microprocessors already have floating-point units. In fact, they often have floating-point multiply-accumulate instructions that perform two operations in a single instruction [8]: a multiply of two operands, and an add of the product to a third operand. While such a floating-point multiply-accumulate instruction takes several processor cycles to complete, most pipelined microprocessors allow a new multiply-accumulate instruction to be started every cycle. Hence, the effective parallelism is already two operations per cycle. Adding two 32-bit integer multipliers to a microprocessor would be very costly, with the same net result of only two operations per cycle. Furthermore, 32-bit floating-point often provides improved accuracy over 32-bit integer computations, with much less programmer attention required to pre- or post-align data and results.
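For reference, a plain scalar C implementation of Y = AX with 16-bit data is shown below; it performs exactly the mn multiplications and m(n − 1) additions counted above, and is the loop that subword parallel or multiply-accumulate instructions would accelerate.

    #include <stdint.h>

    /* Reference scalar Y = A * X: mn multiplications and m(n - 1)
       additions. With 16-bit data, a subword parallel machine could
       perform four of these multiply-accumulates per instruction. */
    static void matvec(const int16_t *a, const int16_t *x, int32_t *y,
                       int m, int n)
    {
        for (int i = 0; i < m; i++) {
            int32_t acc = (int32_t)a[i * n] * x[0];
            for (int j = 1; j < n; j++)
                acc += (int32_t)a[i * n + j] * x[j];
            y[i] = acc;
        }
    }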
Since image data is often in the form of integers and 16-bit precision for intermediate data is often sufficient, the use of 16-bit subword parallel integer instructions is possible. The issue then is supplying parallel 16-bit integer multipliers to complement the parallel 16-bit add and subtract instructions. Again, this is fairly costly, but the cost has been borne by some microprocessors. In some cases, e.g. SPARC VIS, the cost has been reduced by only allowing 8-bit by 16-bit multiplications [22]. In the PA-RISC MAX-1 [14] and MAX-2 [15] multimedia extensions, Subword Parallel Shift Right (or Left) and Add instructions are used in the data-independent portions of image processing to accelerate multiplications by constants. These instructions use, again, the existing 64-bit adder with its existing preshifter. Subword Parallel Shift Right and Add is useful for multiplication by fractional constants, whereas Subword Parallel Shift Left and Add is more useful for multiplication by integer constants. For example, Subword Parallel Shift Right and Add instructions can perform four parallel multiplications of the input data by √2 ≈ 1.4142136 by approximating this value with (1 + 1/4 + 1/8 + 1/32 + 1/128) = 1.4140625. In an inverse discrete cosine transform routine, four Parallel Shift and Add instructions are required, on the average, for each constant multiplication to produce four products. Since most superscalar microprocessors [8] have two 64-bit adders, two Parallel Shift and Add instructions may be executed per cycle, resulting in an effective speed of four cycles per eight multiplications, or half a cycle per multiplication. This low-cost multiplication is often higher in performance than even a dedicated hardware multiply circuit.
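In scalar C, the same shift-and-add decomposition of the constant √2 looks as follows; a Subword Parallel Shift Right and Add instruction would apply this sequence to four packed 16-bit values at once.

    #include <stdint.h>

    /* Multiply by sqrt(2) using the approximation
       1 + 1/4 + 1/8 + 1/32 + 1/128 = 1.4140625: four shift-and-add
       steps, no hardware multiplier needed. */
    static uint32_t mul_sqrt2(uint16_t v)
    {
        uint32_t x = v;
        return x + (x >> 2) + (x >> 3) + (x >> 5) + (x >> 7);
    }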
Convolution: Two-dimensional convolution is represented by $y_{m,n} = \sum_{j,k} h_{j,k} x_{m-j,n-k}$. With an M × M filter $h_{j,k}$, M² multiplications and (M² − 1) additions per pixel are generally needed to implement the linear filtering operation. The filter usually has some symmetry properties that can be exploited to reduce the number of multiplications. The filter can usually be implemented in a sliding window fashion across a few rows of the image, reducing the memory buffer requirement and the number of memory load and store instructions. Hardware acceleration can be accomplished with the same parallel addition and multiplication support described earlier. Although many of the filters are 3 × 3 or 5 × 5 in size, there is little incremental acceleration to be gained from specialized multi-cycle instructions that perform 3 × 3 or 5 × 5 convolution. It is more flexible to build the convolution out of multiply and add instructions, which allow any size convolution, without loss in performance.

Statistics gathering and table lookup: Statistics gathering often involves building a histogram of some component of an image. An image pixel component $x_{m,n} \in \{0, 1, \ldots, k-1\}$ is evaluated and used to increment an entry of the k-bin histogram $h_i$. This information is often used to create a remapping of the pixel component values, in the form of a table lookup $y_{m,n} = t(x_{m,n})$. Here $x_{m,n}$ serves as an index into the table $t$ with k entries, and $y_{m,n}$ is the value of the table entry selected. A common case is to remap an 8-bit pixel component value into another 8-bit value. This can be done by looking up a table of 256 entries, each entry being 8 bits wide. Since the original pixel values tend to be packed back to back, 8 pixels are packed in a 64-bit word. The table lookup for each of these 8 pixels takes 3 instructions: extract the 8-bit value, use it in an indexed load byte instruction, and deposit the loaded byte into the appropriate position in the result register. This requires 8 × 3 = 24 instructions, including 8 load instructions.

An alternative solution that can sometimes be used is to approximate the 256 table entries by 8 piecewise linear functions, each of the form $a_i x + b_i$ ($i = 1, 2, \ldots, 8$). Assuming each slope $a_i$ and each intercept $b_i$ is represented by an 8-bit value, all 8 slopes can be contained in one 64-bit register, and all 8 intercepts in another. We can then perform the remapping of 8 pixels with 4 instructions: 1) select slopes for each of the 8 input pixels, 2) select intercepts for each of the 8 input pixels, 3) multiply the 8 input pixels by the 8 selected slopes, and 4) add these 8 results to the 8 selected intercepts. Many combinations of function approximations and smaller table lookups are clearly possible. For example, if the initial segment of the curve is not well approximated by a single straight line, a table lookup for this initial segment might be used.
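A scalar C sketch of the piecewise linear approximation follows; the Q8 fixed-point slope format and the use of the top three pixel bits as the segment index are our own illustrative assumptions.

    #include <stdint.h>

    /* Remap an 8-bit value through 8 piecewise linear segments
       y = a_i * x + b_i. The top three bits of x select the segment;
       slopes are in Q8 fixed point so the arithmetic stays integer. */
    static uint8_t remap_pwl(uint8_t x, const int16_t a_q8[8],
                             const int16_t b[8])
    {
        int i = x >> 5;                        /* segment index 0..7 */
        int v = ((a_q8[i] * (int)x) >> 8) + b[i];
        return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }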
5. Conclusion

We have described several issues pertaining to image processing for a digital camera system, including the major image processing blocks that are necessary for generating high quality prints. We have also described how the multimedia instructions added to general-purpose processors may be used to accelerate some important image processing operations, and mentioned some new instructions that may be useful. Several interesting image processing issues for further consideration include cross-plane demosaicing, watermarking, denoising algorithms, and alternative processing pipelines.
References

[1] The Digital Camera Guide. Plug-In Systems, http://rainbow.rmii.com/~plugin/dcg_table.html, 1996.
[2] J. P. Allebach and P. W. Wong. Magnifying digital image using edge mapping. US Patent 5,446,804, Aug. 1995.
[3] B. E. Bayer. An optimum method for two-level rendition of continuous-tone pictures. In Proc. IEEE Int. Conf. Commun., pp. 26.11–26.15, 1973.
[4] D. H. Brainard. Bayesian method for reconstructing color images from trichromatic samples. In Proc. IS&T 47th Annual Meeting, pp. 375–380, 1994.
[5] L. J. D'Luna and K. A. Parulski. A systems approach to custom VLSI for a digital color imaging system. IEEE J. Solid-State Circuits, 26:727–737, May 1991.
[6] R. Eschbach. Decomposition of standard ADCT-compressed images. US Patent 5,379,122, Jan. 1995.
[7] R. Floyd and L. Steinberg. An adaptive algorithm for spatial grey scale. In SID Int. Symp. Dig. Tech. Papers, pp. 36–37, 1975.
[8] D. Hunt. Advanced performance features of the 64-bit PA-8000. In Proc. IEEE Compcon, Mar. 1995.
[9] ITU-T Rec. T.81 / ISO/IEC 10918-1. Information technology — Digital compression and coding of continuous-tone still images, Part I: Requirements and guidelines, 1993.
[10] A. K. Jain. Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ, 1989.
[11] K. Jensen and D. Anastassiou. Subpixel edge localization and the interpolation of still images. IEEE Trans. Image Processing, pp. 285–295, Mar. 1995.
[12] R. L. Joshi, H. Jafarkhani, J. H. Kasner, T. R. Fischer, N. Farvardin, M. W. Marcellin, and R. H. Bamberger. Comparison of different methods of classification in subband coding of images. IEEE Trans. Image Processing, 1997. To appear.
[13] H. Lee. Method for determining the color of a scene illuminant from a color image. US Patent 4,685,071, 1987.
[14] R. Lee. Accelerating multimedia with enhanced microprocessors. IEEE Micro, pp. 22–32, Apr. 1995.
[15] R. Lee. Subword parallelism with MAX-2. IEEE Micro, pp. 51–59, Aug. 1996.
[16] R. Lee and M. McMahan. Mapping of application software to the multimedia instructions of general-purpose microprocessors. In SPIE Proceedings, Multimedia Hardware Architectures, Feb. 1997.
[17] J. Murayama and K. Suzuki. Light source discriminating device for a camera. US Patent 5,128,708, 1992.
[18] A. N. Netravali and B. G. Haskell. Digital Pictures: Representation, Compression and Standards. Plenum Press, New York, NY, 2nd ed., 1995.
[19] K. A. Parulski. Color filters and processing alternatives for one-chip cameras. IEEE Trans. Electron Devices, pp. 1381–1389, Aug. 1985.
[20] A. Peleg and U. Weiser. MMX technology extension to the Intel architecture. IEEE Micro, pp. 42–50, Aug. 1996.
[21] A. Said and W. A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol., pp. 243–250, June 1996.
[22] M. Tremblay, J. M. O'Connor, V. Narayanan, and L. He. VIS speeds new media processing. IEEE Micro, pp. 10–20, Aug. 1996.