Compression for Full-Parallax Light Field Displays

Danillo B. Graziosi*, Zahir Y. Alpaslan*, Hussein S. El-Ghoroury
Ostendo Technologies Inc., 6185 Paseo del Norte, Carlsbad, CA, USA 92011

ABSTRACT

Full-parallax light field displays utilize a large volume of data and demand efficient real-time compression algorithms to be viable. Many compression techniques have been proposed, but such solutions are impractical in bandwidth, processing, or power requirements for a real-time implementation. Our method exploits the spatio-angular redundancy in a full-parallax light field to compress the light field image while reducing the total computational load, with minimal perceptual degradation. Objective analysis shows that, depending on content, a bandwidth reduction of two to four orders of magnitude is possible. Subjective analysis shows that the compression technique produces images of acceptable quality, and that the system can successfully reproduce the 3D light field, providing natural binocular and full motion parallax.

Keywords: 3D, Light Field Displays, Light Field Compression, Light Field Rendering

1. INTRODUCTION

In recent years, 3D stereoscopic imaging has gained momentum due to the mainstream acceptance of 3D movies and the availability of 3DTVs. As 3D technology became more mainstream, two of its problems became more obvious: discomfort due to the use of glasses, and discomfort due to the vergence-accommodation conflict [1]. Commercially available autostereoscopic displays that make use of lenticular lens sheets and parallax barriers provide a viable solution for eliminating the glasses, but their resulting low resolution and narrow viewing angle leave a lot to be desired. Achieving depth and nearly correct focus cues in a practical way requires a display with small angular pitch, and this usually means standard-resolution commodity displays coupled with high-frequency head trackers [2], displays with a narrow field of view [3], or a display that requires a large number of views [4]. In any case, achieving more realistic 3D images requires more views or more pixels to be generated in real time, which means increased computational complexity and/or bandwidth requirements.

A review of the recent literature shows that research on full-parallax light field imaging technologies (such as FTV, IP, and holography) is on the rise as the most promising next step in 3D stereoscopic imaging. The biggest problem in realizing full-parallax light field displays is system complexity. Adding vertical parallax complicates system design in computational complexity, data bandwidth, power consumption, system size, and weight. Making full-parallax light field displays practical will require significant system-level innovations, the most important being the creation and transmission of the full-parallax light field image data.

Our new technique, called Compressed Rendering, is designed to simultaneously reduce the computational complexity and data bandwidth in creating and transmitting the light field image data. This is achieved by changing the old paradigm of "render first, compress later" to a new paradigm of rendering and compressing in a single step. In this paper we present initial results for the Compressed Rendering framework, and show how it is able to achieve high compression with limited-complexity algorithms while still reproducing the entire light field with high fidelity.

This article is organized in the following way. Section 2 provides an overview of the state of the art in light field display compression. Section 3 explains our light field compression algorithm, Section 4 presents our simulation methodology, and Section 5 presents the coding results for several types of data. Finally, we conclude the article in Section 6, providing a summary of the achievements and discussing future developments.

*{(danillo.graziosi, zahir)@ostendo.com}, Phone: +1 760-710-3000, Fax: +1 760-710-3017, www.ostendo.com

2. PREVIOUS WORK ON LIGHT FIELD COMPRESSION

Full-parallax light field displays of varying qualities using diffraction and geometric optics principles have been demonstrated [6]-[10], and various light field compression methods have been proposed [5], [11]-[13]. The common problem with all full-parallax light field displays is the immense amount of data required to drive the display at a reasonable resolution. For example, driving a full-parallax color display with 1024x768 elemental images, each elemental image (EI) directionally modulating 50x50 pixels at 60Hz, requires a total bandwidth of 2.8 Tbps.

Existing light field compression algorithms can be divided into two categories: algorithms that utilize only the characteristics of the captured data, and algorithms that utilize the optical characteristics of the capture device as well as the captured data. Examples of algorithms that are purely based on data characteristics include vector quantization [5], a video-compression-based codec and a disparity-compensated codec [11], multi-view coding [12], and the H.264/AVC standard [13]. Vector quantization can achieve fast decoding and independent access to the light field data, despite its low compression ratio. The video-compression-based codec of [11] uses several coding modes to reduce the redundancy between elemental images, and the disparity-compensated codec uses the disparity between those images in the coding process. Both methods achieve high compression, but require computationally complex and memory-intensive encoding and decoding processes, which are very difficult to achieve in real time. In [12] and [13], sub-images are formed by grouping pixels of the elemental images with the same angular information, which is equivalent to an orthographic projection of the scene along a fixed direction. These methods are appropriate for integral images with low-resolution elemental images but a large number of micro-lenses. The number of micro-lenses determines the spatial resolution of the real image [26]. Due to physical limitations of the display, an increase in image spatial resolution usually comes at the cost of limited angular resolution. Moreover, the pre-processing stage of re-arranging the elemental images into sub-images can be computationally cumbersome and time demanding, prohibiting a practical real-time implementation. The common denominator of all approaches that take into consideration only the captured light field data characteristics is that the compression ratio is proportional to the amount of processing, and real-time implementation is usually prohibitive because of the excessive processing and memory requirements.

In contrast, algorithms that utilize the characteristics of the capture system optics can provide reduced encoding complexity by identifying and discarding data that is repetitive. Examples of algorithms that utilize the optical characteristics of the capture system are plenoptic sampling [15], sub-sampled elemental image compression (SEIC) [23], and adaptive plenoptic sampling [24]. Plenoptic sampling [15] is not a formal compression method, but it provides a framework for understanding the information overlap due to the optical nature of the capture process. The SEIC method described in [23] is a similar method for elemental image sub-sampling, where the sub-sampling factor is given by the distance from the 3D reference plane to the elemental image plane.
Adaptive plenoptic sampling [24] divides the surface of the objects in the scene into slanted planes and adjusts the sampling frequency adaptively based on each slanted plane's distance from the capture plane. In both SEIC and adaptive plenoptic sampling, reconstruction is done using image-based rendering (IBR) methods, where the discarded elemental images are created by interpolating the sampled elemental images. Since these methods only have elemental-image-level information and lack pixel-level information, the final reconstructed elemental image is expected to have depth errors and potential blurriness.

Our Compressed Rendering method, explained in the next section, introduces three new methods in the field of full-parallax light field rendering and compression:

1. Visibility Test: determines the minimum subset of elemental images necessary to render before compression, thereby eliminating unnecessary rendering operations.
2. Multi-Reference Depth Image Based Rendering (MR-DIBR): uses references that contain horizontal and vertical parallax information in DIBR.
3. High Resolution Reference Depth Image Based Rendering (HRR-DIBR): uses high-resolution references to improve the accuracy of decoded images.

Our Compressed Rendering method brings the following advantages:

1. The sampling pattern is determined for each frame and only sampled elemental images are rendered, reducing rendering operations and memory requirements significantly;

2. The sampling pattern is determined by examining the bounding box of the objects in the scene, achieving adaptive sampling accuracy without complex operations; and
3. In addition to texture data, disparity information for each sampled elemental image is also transmitted, adding some overhead but also increasing perceived image quality.
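To appreciate the scale of the problem these advantages address, the raw bandwidth figure quoted at the beginning of this section can be checked with a few lines of Python (a sketch using the example display parameters given above):

```python
# Back-of-the-envelope bandwidth estimate for the example full-parallax
# display quoted above: 1024x768 elemental images, 50x50 directional
# pixels each, 24-bit color, 60 Hz refresh.
num_elemental_images = 1024 * 768   # spatial resolution (one EI per lens)
pixels_per_ei        = 50 * 50      # angular resolution per elemental image
bits_per_pixel       = 24           # RGB, 8 bits per channel
refresh_rate_hz      = 60

bandwidth_bps = num_elemental_images * pixels_per_ei * bits_per_pixel * refresh_rate_hz
print(f"Uncompressed bandwidth: {bandwidth_bps / 1e12:.2f} Tbps")  # ~2.83 Tbps
```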

3. COMPRESSED RENDERING

Traditional 2D and 3D image capture systems use two stages to capture image data: the first stage generates or renders the image, and the second stage compresses the data for storage or transmission. The previously examined light field compression algorithms also follow this traditional paradigm of capture first, compress later. Here we propose to unite both stages into one unique step, which we call Compressed Rendering. Compressed Rendering utilizes a priori knowledge about the capture system and the scene to determine a subset of the light field data that can sufficiently preserve the perceptual quality of the displayed light field image. Determining this subset of the light field data prior to rendering (or capturing) the light field information reduces the processing and memory requirements for rendering while effectively compressing the light field data at the same time. These savings in processing and memory requirements potentially translate into savings in power consumption, and a sufficient reduction in system complexity to allow for real-time full-parallax light field capture. The Compressed Rendering algorithm was designed for capturing a full-parallax light field and, unlike many proposed rendering algorithms, it is not restricted to a horizontal-parallax-only setup.

Figure 1 shows the stages of the Compressed Rendering algorithm. The algorithm uses elemental images (EI) as the coding unit and assumes that basic information about the scene and capture system, such as the bounding box and location of the objects and the optical properties of the capture system, is known a priori. Utilizing this information, first a selection process called the visibility test determines the subset of elemental images to be rendered. Then the selected elemental images are rendered, generating the appropriate texture and depth map information for each elemental image. The depth map is converted to a disparity map, and the resulting disparity map and texture data are packetized for transmission. On the decoding side, a novel Multi-Reference Depth Image-Based Rendering (MR-DIBR) algorithm uses the texture and disparity information to synthesize the un-rendered elemental images.

[Figure 1: Flowchart of the Compressed Rendering algorithm. Encoder side: scene data (shape, position) feeds the visibility test, which drives the rendering of the reference elemental images (texture and depth); the reference depth passes through depth-to-disparity conversion, and the reference texture and disparity are encoded. Decoder side: the Multiple Reference DIBR stage synthesizes the reconstructed light field.]

Since our MR-DIBR uses images to render new scenes, its final quality is similar to that of memory-intensive image-based rendering methods. However, with the use of per-pixel geometry information, it is possible to reduce the number of views used for rendering while maintaining the rendering quality. The use of depth decouples the generation stage from the display stage of 3D multiview systems. The MPEG 3D video activity on FTV (Free-viewpoint TV) [14] has called for proposals that use depth in horizontal-parallax-only coding algorithms; our method goes a step further by considering full-parallax light fields. By using depth maps, we can save power at the encoder side, generating only a few reference elemental images and their respective depth maps, and synthesize the remaining light field through MR-DIBR. Nevertheless, this method also has its limitations. DIBR rendering is prone to errors due to occlusions, round-off errors, and quantization of depth values. Several techniques have been proposed to tackle some of these problems [16]-[22], but they also incur increased complexity, and a careful trade-off between the quality gains and the complexity should be considered. The next subsections detail the techniques we adopted in our compression framework.

3.1 Visibility Test

The first step of the Compressed Rendering algorithm determines a subset of elemental images to be used as references for synthesizing the entire light field. This step is called the visibility test. The visibility test selects the subset of elemental images in order to reduce not only the overall rendering computation but also the bandwidth. This approach is somewhat similar to plenoptic sampling [15], where the optimal camera distance is determined according to the object's depth in the scene; however, plenoptic sampling does not consider per-pixel depth and usually performs only image-based rendering (IBR), which causes blurriness in the final image.

Figure 2: The overlapping frusta of only a few lenses cover the entire object, making the other lenses redundant.
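To illustrate the frustum-overlap idea in Figure 2, the following sketch estimates how far apart reference elemental images can be spaced so that their frusta still cover an object at a given depth. This is our simplified reading of the criterion described below, not the authors' selection algorithm (which also accounts for object boundaries and hole-causing regions); the function name and the footprint formula are illustrative assumptions.

```python
import math

def reference_spacing(object_depth_mm, lens_fov_deg, lens_pitch_mm):
    """Estimate the spacing (in elemental images) between reference EIs.

    Assumption: a lens with field-of-view lens_fov_deg covers a footprint of
    2 * z * tan(FOV/2) at depth z, so references spaced up to one footprint
    apart still have overlapping frusta on the object (cf. Figure 2).
    """
    footprint_mm = 2.0 * object_depth_mm * math.tan(math.radians(lens_fov_deg) / 2.0)
    # For objects at the display plane, every lens becomes a reference.
    return max(1, int(footprint_mm / lens_pitch_mm))

# Example with the hypothetical display of Table 1 (20 deg FOV, 0.8 mm lenses)
# and an object placed 90 mm in front of the display, as in Section 4.
spacing = reference_spacing(object_depth_mm=90.0, lens_fov_deg=20.0, lens_pitch_mm=0.8)
print(f"Select one reference elemental image every ~{spacing} lenses")  # ~39
```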

Our visibility test incorporates the trade-off between real-time implementation requirements (a highly parallelizable solution with minimum dependency between elements) and content redundancy (correlation between elements, enabling high compression and power savings). The distance between the reference elemental images is derived from the field of view of the lens and the distance of the object to the screen. We also consider the boundaries of the object to select the optimal reference arrangement of elemental images. A primary cause of quality degradation in synthesized images is the presence of holes. Holes develop when background texture is disoccluded due to the presence of objects at different depths. Many algorithms [16]-[18] propose complicated methods to synthesize the missing texture. In our approach, we identify possible hole-causing regions and add extra reference elemental images to the reference list. The extra image provides the texture for the disoccluded background. This has a higher quality than synthesized texture and less computational impact. Therefore, the selection of elemental images aims to cover the entire object and avoid holes during the synthesis process. By selecting multiple references, the probability of hole occurrence is minimized.

3.2 Depth to Disparity Conversion

Due to the uniform geometric arrangement of the elemental lenses used to generate the light field, the depth value of a reference elemental image can be converted into horizontal and/or vertical shifts, according to the distance between the target image and the reference one. These disparity values are used to rearrange the reference texture at the desired elemental image position. The depth of the reference images is thus converted into disparities, where the relative position of the target image with respect to the reference provides the direction of shifting. The use of disparity instead of depth proves more efficient from a coding point of view, and it also avoids the division operation at the decoder, simplifying the decoder implementation. Eq. (1) is used for the depth to disparity conversion:

    d = f P / z,    Eq. (1)

where f is the focal distance of the micro-lens, z is the depth of the object, and P is the elemental lens pitch. Due to the similarity between the lenses, the disparity between the reference elemental image and any other image can be determined by simply scaling the converted value by the relative distance between the images. Notice that this distance also provides the direction of pixel shifting, according to the position of the elemental images. Our framework uses fixed-point arithmetic instead of floating-point arithmetic, because fixed-point has a more efficient hardware implementation. In current DIBR algorithms the depth values are usually mapped to 8 bits, which, according to [25], provides enough accuracy for the synthesis operation. Since fixed-point arithmetic limits the precision of our system, we decided to use 10 bits to represent the converted disparity values. Simulations have shown that this number of bits provides sufficient accuracy for the dimensions of our system, but further optimization should be done in case the display dimensions and depth range change.
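As a concrete illustration, a minimal sketch of the conversion of Eq. (1) with a 10-bit fixed-point result, using the hypothetical display of Table 1. The focal length is derived from the 20° lens FOV under the assumption FOV = 2·atan(P/(2f)), and the split into integer/fractional bits is our own illustrative choice (the text only states that 10 bits are used):

```python
import math
import numpy as np

# Display parameters from Table 1; the focal length is a derived assumption.
LENS_PITCH_MM  = 0.8
PIXEL_PITCH_MM = 0.08
FOV_DEG        = 20.0
FOCAL_MM       = LENS_PITCH_MM / (2.0 * math.tan(math.radians(FOV_DEG) / 2.0))  # ~2.27 mm

FRAC_BITS = 2  # fractional bits of the 10-bit disparity (assumption)

def depth_to_disparity_q10(depth_mm: np.ndarray) -> np.ndarray:
    """Convert a per-pixel depth map (mm) to 10-bit fixed-point disparities.

    The result is the pixel shift between adjacent elemental images; shifts
    to non-adjacent EIs are obtained at the decoder by scaling with the
    (signed) EI distance, which avoids divisions there.
    """
    disparity_px = (FOCAL_MM * LENS_PITCH_MM / depth_mm) / PIXEL_PITCH_MM  # Eq. (1)
    fixed = np.round(disparity_px * (1 << FRAC_BITS)).astype(np.int32)
    return np.clip(fixed, 0, (1 << 10) - 1)  # clamp to the 10-bit range

# Example: the Stanford Dragon placed 90 mm in front of the display.
print(depth_to_disparity_q10(np.array([90.0])))  # ~0.25 px per EI of baseline
```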

3.3 Multiple Reference DIBR

For the remaining elemental images (EI), the demanding computer graphics rendering routine is substituted by our novel Multiple Reference DIBR. We use the term "multiple reference" because our synthesis approach uses more references than current DIBR implementations usually do (current implementations use two references, while here we use four or more as needed). The reason is twofold: current implementations are customized for horizontal-parallax-only sequences while our approach targets full parallax, and using multiple references at the same time reduces the probability of holes in the final synthesized image. Our MR-DIBR algorithm, illustrated in Figure 3, performs the following steps in order to synthesize the elemental image disparity and texture (a consolidated sketch of these steps is given at the end of this subsection):

1) Perform forward warping of each reference's disparity.
2) Apply a crack filter to each warped disparity.
3) Merge all warped reference disparities into one.
4) Perform backward warping using the merged disparity.

[Figure 3: Flowchart of the Multiple-Reference DIBR algorithm. The disparities of four reference elemental images, EI Depth (A)-(D), each pass through forward warping and a crack filter; the four warped disparities are merged into the synthesized EI depth, which drives backward warping of the reference textures, EI Texture (A)-(D), to produce the synthesized EI texture.]

The use of multiple references increases the chance that texture disoccluded after warping will be present in one of the references, so that hole filling is minimized or even avoided entirely. This provides better quality than synthetic hole-filling algorithms; however, it requires a careful selection of the reference elemental images and increases MR-DIBR processing time and memory usage. In forward warping, the reference disparities are shifted according to the distance between the target elemental image and the reference elemental image, and according to their respective disparity values. In order to reduce the memory usage of multiple references, only the disparity is used for forward warping. Due to round-off and quantization errors, cracks might appear in the forward-warped disparity. Hence, a crack filter is used to detect erroneous disparity values and correct them with neighboring disparities. The warped and filtered disparities are then merged together; since multiple references were used, there is a high probability that the disoccluded view will be present in one of the references. Finally, in the backward warping stage, the merged disparity is used to indicate the locations in the reference images from which the final texture is obtained. There are several filtering methods to avoid warping errors due to mismatches between texture and depth [18]-[19]. These were not adopted in our approach, since the input files used were synthetically generated and do not present such mismatches. We believe there is a strong trend toward improving the accuracy of depth values even for real scenes; therefore, we do not see the need to spend computational resources dealing with these current limitations.
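The four MR-DIBR steps can be made concrete with a compact NumPy sketch. This is a deliberately simplified, single-channel rendition of the pipeline just described, not the authors' implementation: disparities are integer pixel shifts, only horizontal EI offsets are handled, the crack filter fills only one-pixel gaps, and the merge keeps the nearest (largest-disparity) candidate.

```python
import numpy as np

HOLE = -1  # marker for pixels not written by any forward warp

def forward_warp(disp, ei_offset):
    """Step 1: shift a reference disparity map by disp * ei_offset pixels."""
    h, w = disp.shape
    out = np.full((h, w), HOLE, dtype=disp.dtype)
    for y in range(h):
        for x in range(w):
            xt = x + int(round(disp[y, x] * ei_offset))  # signed EI distance
            if 0 <= xt < w and disp[y, x] > out[y, xt]:  # keep nearest surface
                out[y, xt] = disp[y, x]
    return out

def crack_filter(disp):
    """Step 2: fill one-pixel cracks with the max of the horizontal neighbors."""
    out = disp.copy()
    left, right = disp[:, :-2], disp[:, 2:]
    cracks = (disp[:, 1:-1] == HOLE) & (left != HOLE) & (right != HOLE)
    out[:, 1:-1][cracks] = np.maximum(left, right)[cracks]
    return out

def merge(warped_disps):
    """Step 3: per-pixel merge; the largest disparity (nearest point) wins."""
    return np.maximum.reduce(warped_disps)

def backward_warp(merged_disp, ref_texture, ei_offset):
    """Step 4: fetch texture from a reference using the merged disparity."""
    h, w = merged_disp.shape
    out = np.zeros((h, w), dtype=ref_texture.dtype)
    for y in range(h):
        for x in range(w):
            if merged_disp[y, x] != HOLE:
                xs = x - int(round(merged_disp[y, x] * ei_offset))
                if 0 <= xs < w:
                    out[y, x] = ref_texture[y, xs]
    return out

# Synthesize the EI one position to the right of a single toy reference EI.
ref_disp = np.array([[0, 0, 2, 2, 0, 0]])        # 1x6 disparity map
ref_tex  = np.array([[10, 10, 99, 99, 10, 10]])  # matching texture
warped   = crack_filter(forward_warp(ref_disp, ei_offset=1))
merged   = merge([warped])
print(backward_warp(merged, ref_tex, ei_offset=1))  # [[10 10  0  0 99 99]]
```

In the toy example the foreground (value 99) shifts by its disparity and a disocclusion hole remains behind it; with a second reference on the opposite side, the merge step would fill that hole. A full implementation would also track which reference each merged disparity came from, so that backward warping fetches texture from the correct EI.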

4. SIMULATION METHODOLOGY

The design of full-parallax light field displays based on integral imaging principles, also known as IP displays, needs to take into account the limitations of the display and its elemental lenses. The elemental lens pitch P determines the spatial resolution, while the number of pixels in an elemental image (which is determined by the pixel pitch p) determines the angular resolution, or the depth of field, of the display. The angular range of the light rays is restricted by the field-of-view of the elemental lenses used, which consequently determines the viewing zone. In our simulations we worked with a hypothetical light field display, in order to study the parameters of the system that directly affect the proposed compression algorithm. In our hypothetical system, the IP display is a focused display [30], where the gap between the elemental lens array and the display panel is equal to the focal length. Table 1 summarizes the hypothetical display parameters that we assumed for our simulations.

Table 1: System characteristics for IP displays.

Display Parameter    Value
Panel resolution     1280(H) x 720(V)
Pixel pitch          80 µm
Number of lenses     128(H) x 72(V)
Lens pitch           0.8 mm
Field-of-view        20°
Viewing distance     ≥ 290 mm
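The parameters in Table 1 are mutually consistent, which a few lines of Python can verify. This is a sketch; the viewing-distance relation is our assumption of the standard integral-imaging geometry in which all lens fields of view overlap beyond D = panel width / (2 tan(FOV/2)):

```python
import math

# Derive the dependent quantities of Table 1 from the independent ones.
PANEL_RES   = (1280, 720)   # pixels (H, V)
PIXEL_PITCH = 0.08          # mm (80 um)
LENS_PITCH  = 0.8           # mm
FOV_DEG     = 20.0

pixels_per_lens = LENS_PITCH / PIXEL_PITCH                   # 10 x 10 per EI
num_lenses = (int(PANEL_RES[0] * PIXEL_PITCH / LENS_PITCH),  # 128
              int(PANEL_RES[1] * PIXEL_PITCH / LENS_PITCH))  # 72
panel_width = PANEL_RES[0] * PIXEL_PITCH                     # 102.4 mm

# Assumed geometry: beyond this distance a viewer's eye lies inside the
# field of view of every elemental lens simultaneously.
viewing_distance = panel_width / (2 * math.tan(math.radians(FOV_DEG) / 2))

print(f"{pixels_per_lens:.0f} px per lens, {num_lenses} lenses, "
      f"viewing distance >= {viewing_distance:.0f} mm")  # ~290 mm
```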

The simulations used the model of the Stanford Dragon [31], placed 90mm in front of the display. In order to determine the impact of the display design parameters on the compression algorithm, we performed simulations varying selected display parameters. In the next section, we provide the simulation results for the cases where we modify the elemental lens pitch P and the cases where we modify the pixel pitch p. We also analyzed the impact of reference resolution on the compression algorithm and utilized a multi-resolution compression scheme to improve overall rendering quality.

5. SIMULATION RESULTS

5.1 Elemental Lens Pitch

In this simulation, we analyzed the impact of the elemental lens pitch P on the compression performance. For a given pixel pitch p, changing P changes the number of pixels in an elemental image and also the number of elemental images in a display of a given size. This enables testing the effect of having more pixels in the reference elemental image. The object is placed 90mm in front of the display, generating a high degree of correlation between the elemental images. The visibility test does not take into account the elemental lens pitch, only the object's distance to the modulation surface and the lens FOV. Since the FOV is fixed (20°), the visibility test selects the same number of elemental images for all lens pitches.

[Figure 4: Compression performance with different lens pitches (horizontal axes: 6.4mm, 3.2mm, 1.6mm, 0.8mm); three panels showing compression ratio (X:1), PSNR (dB), and SSIM. As the lens pitch decreases from 6.4mm to 0.8mm, the compression ratio increases from 16.94:1 to 1,084.24:1, while the PSNR drops from 26.598dB to 17.607dB and the SSIM drops from 0.993 to 0.820.]

Figure 4 shows the performance of our compression algorithm when the elemental lens pitch is modified. The first plot shows the compression ratio achieved by our compressed rendering process. For the quality evaluation, we present two metrics: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM, [20]). The PSNR plot shows the overall quality of the reconstructed light field compared with the fully rendered light field. In the case of SSIM, instead of using the light field directly, we first transform the elemental images into sub-images. Since the sub-images are closely related to the perceived final image, the SSIM can provide insight into the structures perceived by the viewer. For smaller elemental lens pitches, the compression achieved is very high in comparison to larger elemental lens pitches; nevertheless, the synthesis quality is very low, and the SSIM index indicates that the perceived structure of the scene is also degraded. The MR-DIBR process is influenced by the quality of the references used in the rendering process. Figure 5 shows the texture and depth of the reference elemental images used by our MR-DIBR algorithm, for each lens pitch. Since smaller elemental lenses have coarser texture and depth values, they generate geometrical and texture inaccuracies in the synthesis process and degrade the final quality.
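The elemental-image-to-sub-image transform used for the SSIM evaluation is a simple axis permutation: pixels with the same intra-lens (angular) index are gathered across all lenses. A minimal NumPy sketch (the array layout is our assumption):

```python
import numpy as np

def elemental_images_to_sub_images(light_field):
    """Regroup a light field from elemental images into sub-images.

    Input layout (assumed): (Ny, Nx, ny, nx, 3) -- a grid of Ny x Nx
    elemental images, each ny x nx pixels. Sub-image (v, u) collects pixel
    (v, u) from every elemental image, i.e., an orthographic view of the
    scene along one fixed direction (cf. Section 2).
    """
    return light_field.transpose(2, 3, 0, 1, 4)

# Example with the Table 1 display: 128x72 lenses, 10x10 pixels per lens.
lf = np.zeros((72, 128, 10, 10, 3), dtype=np.uint8)
sub = elemental_images_to_sub_images(lf)
print(sub.shape)  # (10, 10, 72, 128, 3): 10x10 sub-images of 72x128 pixels
```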

Figure 5: Elemental image texture and depth for different elemental lens pitches: (a) 0.8mm, (b) 1.6mm, (c) 3.2mm, (d) 6.4mm.

5.2 Pixel Pitch

In this simulation, we fixed the elemental lens pitch at 1.6mm and varied the display pixel pitch from 80µm to 10µm. Notice that a smaller pixel pitch is preferable when available, because the overall elemental image resolution improves, creating a smaller angular pitch. The purpose of this test was to evaluate the impact of the display pixel pitch on the synthesis quality. Figure 6 shows the content of an elemental image as the display pixel pitch is varied. The quality of the image used as reference increases with smaller display pixel pitch, because the number of pixels in the elemental image increases, providing more accurate disparity and texture information.

Figure 6: Elemental image texture and depth for different pixel pitches: (a) 80µm, (b) 40µm, (c) 20µm, (d) 10µm.

[Figure 7: Compression performance with different pixel pitches (horizontal axes: 10µm, 20µm, 40µm, 80µm); three panels showing compression ratio (X:1), PSNR (dB), and SSIM. The compression ratio stays constant at 271.06:1 for all pixel pitches, while the PSNR increases from 20.409dB at 80µm to 26.956dB at 10µm and the SSIM increases from 0.966 to 0.999.]

The results presented in Figure 7 show that decreasing the pixel pitch improves the synthesis quality. Furthermore, the structures of the sub-images are also better preserved with smaller pixel pitch. Decreasing the pixel pitch produces more accurate references, thereby also increasing the final image quality. Since the angular pitch of the light field decreases with smaller pixel pitch, the compression performance remains the same, even when we use references with more pixels. However, there is a substantial increase in processing and memory requirements when using smaller pixel pitch; for example, a display with 20µm pixel pitch has to drive 16 times more pixels than a display of the same size with 80µm pixel pitch.

5.3 High Resolution References

From the last two simulations, we can see the advantage of having a high-resolution reference (an elemental image with a large number of pixels) for our MR-DIBR. To increase the quality of the synthesized light field, the system needs to either increase the elemental lens pitch without changing the FOV or display pixel pitch (increasing the angular resolution), or decrease the pixel pitch of the display without changing the lens pitch (increasing the overall pixel count of the display). In order to maintain the specified pixel pitch and elemental lens pitch while increasing the quality of the rendered light field without reducing the final image resolution, we applied a multi-resolution framework to our MR-DIBR algorithm, which we call high resolution reference DIBR (HRR-DIBR). For the selected references, we utilized a higher resolution, that is, a smaller pixel pitch (40µm, 20µm or 10µm), while maintaining the smallest elemental lens pitch possible (0.8mm). These high-resolution elemental images are used as references for the warping operation in the synthesis process. The image generated by the warping operation is down-sampled to match the display resolution (80µm). The down-sampling procedure selects the pixel value above and to the left of the center position as the low-resolution value (see the sketch below). This procedure is simple and preserves the high-frequency content of the light field spread throughout the elemental images, but it does not consider the presence of holes or warping artifacts. Alternative re-sampling methods have proved to be effective even for hole filling [21], and will be considered in the future.
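A minimal sketch of this decimation, assuming an integer scale factor and our reading of the "above and to the left of the center" rule for even block sizes:

```python
import numpy as np

def downsample_above_left_of_center(hires, scale):
    """Decimate a warped high-resolution EI: in each scale x scale block,
    keep the sample just above and to the left of the block center.

    For an even scale (e.g., 8 for 10um -> 80um) the center of a block lies
    between samples, so the chosen offset is scale // 2 - 1 in each axis
    (our interpretation of the rule described in the text).
    """
    offset = scale // 2 - 1
    return hires[offset::scale, offset::scale]

# Example: a 10um-pitch warped reference decimated to the 80um display grid.
hires = np.arange(16 * 16).reshape(16, 16)
print(downsample_above_left_of_center(hires, scale=8).shape)  # (2, 2)
```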

[Figure 8: Compression performance when increasing the resolution of the reference elemental images (horizontal axes: reference pixel pitch of 10µm, 20µm, 40µm, 80µm); three panels showing compression ratio (X:1), PSNR (dB), and SSIM. Moving from 80µm to 10µm references, the compression ratio drops from 1,084.24:1 to 16.94:1 (each halving of the reference pixel pitch quadruples the reference data), while the PSNR rises from 17.559dB to 23.349dB and the SSIM rises from 0.825 to 0.977.]

Figure 8 shows the performance of the compression algorithm for increased reference elemental image resolution. We were able to maintain the small elemental lens pitch, which produces an image with higher spatial resolution, while increasing the final quality of the light field. Notice that the structure of the sub-images, which is affected by the compression, is improved by using references with higher resolution, as indicated by the higher SSIM index. Another way to possibly increase quality, at the cost of reduced compression, is to use more references. However, using low-resolution references can introduce inaccuracies in the DIBR process due to the lack of precision in the disparity information. Figure 9 compares the rate-distortion performance of the compression algorithm when using more references against its performance when using high-resolution references. At low rates, the presence of more references does not overcome the defects generated by MR-DIBR with low-resolution references, and cannot achieve the same synthesis quality as the high-resolution scheme.

[Figure 9: Rate-distortion performance comparison between using more references ("MORE REFERENCES") and increasing the resolution of the reference images ("HIGH RESOLUTION"); PSNR (dB) versus compression ratio (X:1, 0-1200). The high-resolution curve lies above the more-references curve, especially at low rates.]

In order to provide a more complete evaluation of the compression scenario, Figure 10 shows orthographic projections of the uncompressed and synthesized light fields. In Figure 10(a), we show the sub-image obtained from the original light field without compression, and in Figure 10(b) the sub-image obtained from the light field synthesized using high-resolution reference images (the picture presented is for references with 10µm pixel pitch, with 23.3dB quality at a 16.9:1 compression ratio). Note the efficiency of the high-resolution technique applied to the reference images: despite some minor artifacts, the reproduced orthographic projection is very similar to the original image. Figure 10(c) shows the result using more references; here we can see how the inaccurate, low-resolution references affect the final image quality.

Figure 10: Subjective evaluation of sub-images: (a) sub-image obtained from the original light field; (b) sub-image obtained from the light field synthesized with high-resolution reference images; (c) sub-image obtained from the light field synthesized with more references.

The orthographic projections (sub-images) are an approximation of what the user would see, since they assume that the viewer is at infinity. Following objective metrics similar to the ones presented in [27] and [28], Figure 11 presents the PSNR of the sub-images when using higher-resolution reference images (10µm pixel pitch). The figure is a representation of a matrix of PSNR values, one for each viewing angle.
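A matrix of this kind can be computed directly from the sub-image stacks of the original and synthesized light fields; a short sketch, reusing the sub-image layout assumed in the Section 5.1 example:

```python
import numpy as np

def per_angle_psnr(sub_orig, sub_synth, peak=255.0):
    """PSNR per viewing angle between two sub-image stacks.

    Inputs are (ny, nx, H, W, 3) arrays of sub-images (one per angular
    index), as produced by elemental_images_to_sub_images above. Returns an
    (ny, nx) matrix of PSNR values, one per viewing direction, matching the
    layout plotted in Figure 11.
    """
    err = sub_orig.astype(np.float64) - sub_synth.astype(np.float64)
    mse = np.mean(err ** 2, axis=(2, 3, 4))     # average over H, W, color
    with np.errstate(divide="ignore"):          # identical views -> inf dB
        return 10.0 * np.log10(peak ** 2 / mse)
```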

Figure 11: Objective evaluation of the synthesized light field. The axes denote the viewing angle: the top-left block corresponds to an angle of (-10°, 10°) and the bottom-right block to (10°, -10°).

Notice that some specific angles have worse quality than others; however, the subjective evaluation of the sub-images indicates a higher quality. The reason for such a low PSNR is that the object border is affected by the MR-DIBR process, and mismatches occur between the original and the synthesized image. Nevertheless, the MR-DIBR process preserves the object's geometry to a certain extent across all sub-images, thereby preserving motion parallax and consequently the 3D perception of the object. It is clear that there is a gap between the objective and subjective evaluation of 3D images. Researchers have suggested ways to address this issue [29], and usually indicate subjective evaluation as the principal form of 3D evaluation. The results presented here are based on a hypothetical light field display. In future work, we intend to provide subjective assessments of our results using prototype full-parallax light field displays.

6. CONCLUSIONS

One of the main hurdles for the dissemination of full-parallax light field displays is the immense amount of image data required to drive the display. The work presented here describes feasible ways to handle the large volume of data required by such displays. In this paper we have introduced a new full-parallax light field image generation framework called Compressed Rendering. Our novel Compressed Rendering framework eliminates redundant computational operations by combining rendering and compression in a single step.

Our Compressed Rendering framework introduces three new methods for generating full-parallax light field data. The first is the Visibility Test, which identifies the key elemental images for data compression and avoids the rendering of redundant data. This reduces the number of elemental images to be rendered, decreasing the rendering time and saving computational power and memory. The second is the procedure used to reconstruct the entire light field at the decoder side, our unique Multi-Reference Depth Image Based Rendering (MR-DIBR) algorithm customized for full-parallax light field data. MR-DIBR utilizes reference images with horizontal and vertical parallax information to reproduce a full-parallax light field image with high spatial and angular resolution, while also reducing the bandwidth requirements of the light field display. The third is High Resolution Reference DIBR (HRR-DIBR), which utilizes reference elemental images of higher resolution than the displayed elemental images to improve decoding accuracy.

We analyzed the impact of display parameters such as elemental lens size and pixel pitch on the compression algorithm. We also showed that the resolution of the reference images affects the final quality of the synthesized light field, and that the light field quality increases when using references with higher angular resolution. Efficient hardware implementations of the proposed algorithm are currently under development at Ostendo, and the use of dedicated hardware, such as GPUs and DSPs, is being considered as well. The MR-DIBR algorithm is amenable to parallelization, which suits the architecture of the aforementioned hardware well, and improvements in quality and rendering speed are expected. Finally, we intend to perform extensive subjective evaluation of our compressed light field using prototype full-parallax light field displays. Subjective evaluation of our results will let us better assess the compression algorithm and make more conscious decisions on the trade-off between quality and complexity of the techniques applied.

REFERENCES

[1] Hoffman, D. M., Girshick, A. R., Akeley, K. and Banks, M. S., "Vergence-accommodation conflicts hinder visual performance and cause visual fatigue," J. Vision, 4, 967-992 (2008)
[2] Maimone, A., Wetzstein, G., Lanman, D., Hirsch, M., Raskar, R. and Fuchs, H., "Focus 3D: Compressive Accommodation Display," ACM Transactions on Graphics, 32(5) (2013)
[3] Akeley, K., Watt, S. J., Girshick, A. R. and Banks, M. S., "A stereo display prototype with multiple focal distances," ACM Trans. Graph. (SIGGRAPH), 23, 804-813 (2004)
[4] Takaki, Y., "High-Density Directional Display for Generating Natural Three-Dimensional Images," Proceedings of the IEEE, 94(3), 654-663 (2006)
[5] Levoy, M. and Hanrahan, P., "Light field rendering," SIGGRAPH '96: Proc. 23rd Annu. Conf. Comput. Graphics Interactive Tech., 31-42 (1996)
[6] Lucente, M., [Diffraction-specific fringe computation for electro-holography], PhD Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (1994)
[7] Halle, M. H., [Multiple Viewpoint Rendering for Three-Dimensional Displays], PhD Thesis, Program in Media Arts and Sciences, School of Architecture and Planning, Massachusetts Institute of Technology (1997)
[8] Lippmann, G., "La Photographie Intégrale," Comptes-Rendus Académie des Sciences, 146, 446-451 (1908)
[9] Arai, J., Okano, F., Kawakita, M., Okui, M., Haino, Y., Yoshimura, M., Furuya, M. and Sato, M., "Integral Three-Dimensional Television Using a 33-Megapixel Imaging System," Journal of Display Technology, 6(10), 422-430 (2010)
[10] Aggoun, A., Tsekleves, E., Zarpalas, D., Daras, P., Dimou, A., Soares, L. and Nunes, P., "Immersive 3D Holoscopic System," IEEE Multimedia Magazine, Special Issue on 3D Imaging Techniques and Multimedia Applications, 20(1), 28-37 (2013)
[11] Magnor, M. and Girod, B., "Data Compression for Light-Field Rendering," IEEE Trans. on Circuits and Systems for Video Technology, 10(3), 338-343 (2000)
[12] Shi, S., Gioia, P. and Made, G., "Efficient Compression Method for Integral Images Using Multi-View Video Coding," IEEE International Conference on Image Processing, 137-140 (2011)
[13] Olsson, R., Sjöström, M. and Xu, Y., "A combined pre-processing and H.264-compression scheme for 3D integral images," IEEE International Conference on Image Processing, 513-516 (2006)
[14] ISO/IEC JTC1/SC29/WG11 MPEG2013/N14104, "Use Cases and Requirements on Free-Viewpoint Television (FTV)," Geneva, Switzerland, October (2013)
[15] Chai, J.-X., Chan, S.-C., Shum, H.-Y. and Tong, X., "Plenoptic sampling," Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques - SIGGRAPH '00, 307-318 (2000)
[16] Fehn, C., "Depth-Image-Based Rendering (DIBR), compression and transmission for a new approach on 3D-TV," Proc. SPIE Stereoscopic Displays and Virtual Reality Systems XI, 93-104 (2004)
[17] Mori, Y., Fukushima, N., Yendo, T., Fujii, T. and Tanimoto, M., "View generation with 3D warping using depth information for FTV," Signal Processing: Image Communication, 24(1-2), 65-72 (2009)
[18] Lee, C. and Ho, Y. S., "View synthesis using depth map for 3D video," Proc. 2009 APSIPA Summit and Conference, Sapporo, Japan, 350-357 (2009)
[19] Zhao, Y., Chen, Z., Tian, D., Zhu, C. and Yu, L., "Suppressing texture-depth misalignment for boundary noise removal in view synthesis," Picture Coding Symposium, 30-33 (2010)
[20] Wang, Z., Bovik, A. C., Sheikh, H. R. and Simoncelli, E. P., "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, 13(4), 600-612 (2004)
[21] Solh, M. and AlRegib, G., "Depth adaptive hierarchical hole filling for DIBR-based 3D videos," Proc. SPIE 8290, Three-Dimensional Image Processing (3DIP) and Applications II (2012)
[22] Sjöström, M., Härdling, P., Karlsson, L. S. and Olsson, R., "Improved depth-image-based rendering algorithm," 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 1-4 (2011)
[23] Piao, Y. and Yan, X., "Sub-sampling elemental images for integral imaging compression," International Conference on Audio, Language and Image Processing (ICALIP), 1164-1168 (2010)
[24] Gilliam, C., Dragotti, P.-L. and Brookes, M., "Adaptive plenoptic sampling," 18th IEEE International Conference on Image Processing (ICIP), 2581-2584 (2011)
[25] Takahashi, K., "Theoretical Analysis of View Interpolation With Inaccurate Depth Information," IEEE Transactions on Image Processing, 21(2), 718-732 (2012)
[26] Hoshino, H., Okano, F., Isono, H. and Yuyama, I., "Analysis of resolution limitation of integral photography," JOSA A, 15(8), 2059-2065 (1998)
[27] Olsson, R. and Sjöström, M., "A Depth Dependent Quality Metric for Evaluation of Coded Integral Imaging Based 3D-Images," 3DTV Conference, 1-4 (2007)
[28] Forman, M. C., Davies, N. and McCormick, M., "Objective quality measurement of integral 3D images," Proc. SPIE 4660, Stereoscopic Displays and Virtual Reality Systems IX, 155 (2002)
[29] Chen, W., Fournier, J., Barkowsky, M. and Le Callet, P., "New requirements of subjective video quality assessment methodologies for 3DTV," Video Processing and Quality Metrics 2010 (VPQM) (2010)
[30] Park, J.-H., Hong, K. and Lee, B., "Recent three-dimensional information processing based on integral imaging," Applied Optics, 48(34), H77-H94 (2009)
[31] Curless, B. and Levoy, M., "A volumetric method for building complex models from range data," SIGGRAPH '96: Proc. 23rd Annu. Conf. Comput. Graphics Interactive Tech., 303-312 (1996)
