Depth tolerance of simple interpolation methods for 3D camera used for natural 3D display

Hiroshi Yoshikawa and Yasuhiro Takaki
Department of Electrical & Electronic Engineering, Faculty of Technology, Tokyo University of Agriculture & Technology, 2-24-16 Naka-cho, Koganei-city, Tokyo 184-8588, Japan
ABSTRACT
Natural 3D images can be generated by displaying high-density directional images. In order to display real 3D scenes, a 3D camera technique which generates high-density directional images of real 3D scenes is required. The most promising method is to generate them using an interpolation algorithm from multiple images captured by horizontally aligned multiple cameras. In this study we examined four different simple interpolation algorithms. The algorithm that utilizes one representative object distance as a priori knowledge was found to be the most effective. This technique offers fast interpolation; however, there is a tradeoff between the number of cameras and the depth of the 3D objects. The allowable depth for 3D objects is reported.

Keywords: 3D display, 3D camera, image interpolation
1. INTRODUCTION
A next-generation 3D display should provide natural 3D images. To achieve this, a new 3D display technique and a new 3D camera technique are required. We have already developed a new 3D display technique which projects high-density directional images.1 In this paper, we report a new 3D camera technique which generates high-density directional images.

Natural 3D images can be generated by displaying high-density directional images, as shown in Fig. 1. The directional images are orthographic projections of a 3D scene into specific directions. When the number of directional images becomes large enough, the projection angle pitch becomes small enough that the rays from 3D objects are virtually reconstructed. When the geometric data of a 3D scene are known, the directional images can easily be generated using CG software. Such a model-based rendering technique requires a detailed description of the 3D scene to generate photorealistic images. Unfortunately, it is still difficult to obtain such a detailed description of a 3D scene. Moreover, obtaining the geometric data of a 3D scene is time consuming because it usually relies on block-matching techniques. Recently, the image-based rendering technique has been developed, mainly for use in arbitrary-viewpoint cameras.2-6 An arbitrary viewpoint image is synthesized from many images captured by many cameras using image interpolation algorithms. With this technique, photorealistic images can be obtained in a short calculation time. In this paper, we report a 3D camera technique which generates high-density directional images using the image-based rendering technique.
Fig. 1 Natural 3D display which projects high-density directional images.
2. GENERATION OF DIRECTIONAL IMAGES
There are two kinds of arrangement for the camera array: the horizontal arrangement and the circular arrangement. The horizontal arrangement is much easier to align than the circular one. Figure 2 illustrates the horizontal sectional view of the rays captured by one camera. A ray that passes through the virtual screen at position $x$ proceeds in the horizontal direction $\theta$, where $x = x_c + z_v \tan\theta$, $x_c$ is the horizontal position of the camera, and $z_v$ is the distance between the camera and the virtual screen. Now consider the $x$-$\tan\theta$ plane. In this plane, the image data on one horizontal line of the captured image lie on a straight line expressed by $\tan\theta = (x - x_c)/z_v$. The image data rearranged on the $x$-$\tan\theta$ plane are called the Epipolar Plane Image (EPI).7

Figure 3 illustrates the horizontal sectional view of the rays captured by horizontally aligned multiple cameras. The image data on the same horizontal line of the multiple captured images are aligned on parallel lines on the EPI, because all the lines have the same inclination and the spacing of the lines is equal to the spacing of the cameras. The directional images we require are the orthographic projections of a 3D scene into specific horizontal directions. Therefore, a directional image can be synthesized by calculating the image data on a horizontal line on the EPI. When the horizontal projecting angle of the directional image is $\theta$, this horizontal line intersects the vertical axis at $\tan\theta$. Because the number of cameras is limited, only discrete image data exist on the horizontal line; the nonexistent image data must be interpolated. For the arbitrary-viewpoint camera, which is the traditional application of the image-based rendering technique, the resultant image is a perspective projection of a 3D scene, and the image interpolation on the EPI is done along an inclined line, not a horizontal line.
Fig. 2 Rays captured by one camera and EPI.
Fig. 3 Rays captured by a horizontally arranged camera array and EPI.
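To make the EPI construction concrete, the following minimal Python sketch stacks one scanline from every captured image and assigns each EPI pixel its $(x, \tan\theta)$ coordinates using $x = x_c + z_v \tan\theta$. The function names and the numeric values of F_PIX and Z_V are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative parameters (assumptions): pixels per unit tan(theta),
# and the distance z_v from the cameras to the virtual screen in mm.
F_PIX = 1600.0
Z_V = 500.0

def build_epi(images, row):
    """Stack one horizontal scanline from every captured image.
    images: list of HxW grayscale arrays ordered by camera position.
    Each camera contributes one row of the EPI; points on a 3D object
    trace straight lines across the rows."""
    return np.stack([img[row, :] for img in images], axis=0)

def epi_coordinates(cam_x, width):
    """Map EPI pixel (camera c, column u) to (x, tan theta).
    cam_x: 1-D array of horizontal camera positions x_c."""
    u = np.arange(width)
    tan_theta = (u - width / 2.0) / F_PIX           # ray direction per column
    x = cam_x[:, None] + Z_V * tan_theta[None, :]   # x = x_c + z_v tan(theta)
    return x, np.broadcast_to(tan_theta, (len(cam_x), width)).copy()
```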
3. INTERPOLATION METHODS
As shown in Fig. 4, the rays emitted from a 3D point located at $(x_o, z_o)$ are expressed by the equation $x = x_o + (z_v - z_o)\tan\theta$ and form a straight line on the EPI. When we assume that the rays emitted from a 3D point have the same optical intensity in all proceeding directions, this straight line has uniform intensity. If the depth distribution of a 3D scene is completely known, the interpolation on the EPI can easily be done using this characteristic: the intensity of a point $(x_o, \tan\theta_o)$ on the EPI is obtained by drawing a line which passes through this point with the inclination $1/(z_v - z_o)$, and taking the intensities of the points where this line intersects the captured data lines. However, obtaining the depth distribution of a 3D scene is a difficult task. In this study, we examine four simple interpolation methods which do not require the depth distribution of a 3D scene.
Fig. 4 Rays emitted from a 3D point and EPI.
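For reference, the intersection used in this depth-based interpolation can be written in closed form; the following short step uses only the definitions above. The interpolation line through $(x_o, \tan\theta_o)$ and the data line of the camera at $x_c$ are

$$\tan\theta = \tan\theta_o + \frac{x - x_o}{z_v - z_o}, \qquad \tan\theta = \frac{x - x_c}{z_v},$$

and solving the pair gives the direction at which the interpolation line meets that camera's data:

$$\tan\theta_c = \frac{x_o - x_c - (z_v - z_o)\tan\theta_o}{z_o}.$$

The required intensity is then taken from the pixels at $\tan\theta_c$ on the data lines that bracket the point.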
The four interpolation methods are described as follows:

Method A: consider a horizontal line which passes through the point whose intensity is unknown. The nearest and the second nearest points where this horizontal line intersects the captured image lines are determined, and the unknown intensity is calculated from the intensities of the two intersections, weighted by the distances from the point to the intersections (see Fig. 5(a)).

Method B: consider a vertical line which passes through the point whose intensity is unknown. The nearest and the second nearest points where this vertical line intersects the captured image lines are used for the interpolation (see Fig. 5(b)).

Method C: the nearest and the second nearest data are used. A line which is perpendicular to the captured image lines is used for the interpolation (see Fig. 5(c)).

Method D: the depth distribution of a 3D object is represented by one distance $z_m$. A line which has the inclination of $1/(z_v - z_m)$ is used for the interpolation (see Fig. 5(d)); a code sketch of this method is given after Fig. 5.

(a) Method A  (b) Method B  (c) Method C  (d) Method D
Fig. 5 Four interpolation methods.
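As a concrete illustration of method D (the code sketch referenced in the method list above), the function below interpolates one EPI point using the single representative distance $z_m$. It is a minimal reconstruction under stated assumptions: grayscale scanlines, a pinhole pixel mapping $u = f_{pix}\tan\theta + W/2$, and a linear blend between the two bracketing cameras; none of these implementation details are specified in the paper.

```python
import numpy as np

def interpolate_method_d(epi, cam_x, z_v, z_m, x_q, tan_q, f_pix):
    """Interpolate the intensity at EPI point (x_q, tan_q) with method D.

    A line of inclination 1/(z_v - z_m) through the point meets the
    data line of the camera at x_c (tan = (x - x_c) / z_v) at
        tan_c = (x_q - x_c - (z_v - z_m) * tan_q) / z_m.
    The intensities read off the two nearest data lines are blended.

    epi   : (num_cams, width) array, one scanline per camera
    cam_x : (num_cams,) ascending horizontal camera positions
    """
    width = epi.shape[1]
    # Direction at which the interpolation line meets each camera's line.
    tan_c = (x_q - cam_x - (z_v - z_m) * tan_q) / z_m
    # Corresponding pixel column in each captured scanline.
    cols = tan_c * f_pix + width / 2.0

    # The data line of camera c passes through x = x_c + z_v * tan_q
    # at the query direction; pick the two lines bracketing x_q.
    x_lines = cam_x + z_v * tan_q
    c1 = int(np.clip(np.searchsorted(x_lines, x_q) - 1, 0, len(cam_x) - 2))
    c2 = c1 + 1
    w = float(np.clip((x_q - x_lines[c1]) / (x_lines[c2] - x_lines[c1]), 0, 1))

    def sample(c):
        # Linear sub-pixel read from camera c's scanline.
        u = float(np.clip(cols[c], 0, width - 1))
        lo = int(u)
        hi = min(lo + 1, width - 1)
        return (1 - (u - lo)) * epi[c, lo] + (u - lo) * epi[c, hi]

    return (1 - w) * sample(c1) + w * sample(c2)
```

Note that method A's horizontal line is the zero-inclination limit of this scheme, i.e., it implicitly treats the object as infinitely distant, which may explain its inferior result for the near object in Fig. 6.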
We experimentally compared the four interpolation methods. Instead of using a camera array, one CCD camera was mounted on a translation stage. The size of the CCD image sensor was 1/2 inch. The camera lens had a focal length of 16 mm and a viewing angle of 22.5 degrees. The 3D object had a depth of 120 mm. The resolution of the captured images was 640×486, and that of the generated directional images was 320×240. The number of captured images was 64, and the number of generated directional images was also 64. The orthographic projection angle pitch of the directional images was 0.33 degrees. The representative distance of the 3D object used for method D was set to the distance from the camera to the center of the 3D object. Figure 6 shows the generated directional images when the virtual screen was located at the center of the 3D object. Only the directional image whose projection angle is 0 degrees is shown for each method. The quality of the directional image generated by method A is obviously inferior.
(a) Method A  (b) Method B  (c) Method C  (d) Method D
Fig. 6 Generated directional images when the virtual screen is located at the center of the 3D object.
Figure 7 shows the generated directional images when the virtual screen is located 80 mm behind the 3D object. In the directional images generated by methods B and C, a slight horizontal image displacement can be seen at the neck of the 3D object, which is far from the virtual screen.
(a) Method A  (b) Method B  (c) Method C  (d) Method D
Fig. 7 Generated directional images when the virtual screen is located 80 mm behind the 3D object.
4. DEPTH TOLERANCE
Among the four methods, method D gives the preferable results. Because this method represents the depth distribution of a 3D object by a single depth value, errors arise in the generated directional images. As shown in Fig. 8, the line used for the interpolation does not have the correct inclination, so the intersections are shifted along the image data lines. This causes errors in selecting the pixels of the captured images. The error becomes larger as the number of cameras decreases, because the spacing of the image data lines becomes larger. It also becomes larger as the distance to the virtual screen $z_v$ becomes shorter, because the inclination of the image data lines becomes larger.
Fig. 8 Pixel selection error in interpolation method D.
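One plausible way to quantify this pixel selection error, reconstructed from the geometry of Fig. 8 rather than stated in the paper: for an EPI point $(x_o, \tan\theta_o)$ that truly lies at depth $z_o$ but is interpolated with the representative depth $z_m$, the intersections with the data line of the camera at $x_c$ (computed as in Section 3) differ by

$$\Delta\tan\theta = \left(x_o - x_c - z_v\tan\theta_o\right)\frac{z_o - z_m}{z_m z_o}.$$

The factor $x_o - x_c - z_v\tan\theta_o$ is the horizontal distance from the point to that camera's data line, which is at most half the camera spacing $\Delta x_c/2$ for the nearest line. The worst-case error in pixels is therefore roughly $f_{pix}\,(\Delta x_c/2)\,|z_o - z_m|/(z_m z_o)$, where $f_{pix}$ converts $\tan\theta$ into pixels. This reproduces both trends noted above: fewer cameras (larger $\Delta x_c$) and a shorter $z_v$ (smaller $z_m z_o$) both increase the error.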
The effects of the pixel error were evaluated by experiments. Two 3D objects having depths of 50 mm and 100 mm were used. The recording distance $z_v$ was determined so that the pixel error was smaller than one pixel when the number of cameras was 128. The calculated distances were 258.9 mm and 517.7 mm, respectively. The objects were located with their centers at the recording distances, and the representative distances were set equal to the recording distances. The other experimental conditions were the same as those described in the previous section. Figure 9 shows the relationship between the number of cameras and the tolerable object depth for pixel errors of less than one pixel and less than two pixels.
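To illustrate how the tolerable depth could scale with the number of cameras, the sketch below evaluates the worst-case bound reconstructed after Fig. 8, taking $z_m \approx z_o \approx z_v$ and a depth offset of at most half the object depth. The baseline length and pixel scale are illustrative assumptions, so the numbers are indicative only and need not match Fig. 9.

```python
# Illustrative values (assumptions, not the paper's exact setup).
F_PIX = 1600.0     # pixels per unit tan(theta), roughly a 16 mm lens
                   # on a 1/2-inch, 640-pixel-wide sensor
BASELINE = 400.0   # mm, assumed total length of the camera array

def tolerable_depth(num_cams, z_v, max_err_pix=1.0):
    """Largest object depth keeping the method-D pixel error below
    max_err_pix, using the worst-case bound
        err ~ F_PIX * (dx_c / 2) * (depth / 2) / z_v**2
    with z_m ~ z_o ~ z_v and |z_o - z_m| <= depth / 2."""
    dx_c = BASELINE / (num_cams - 1)   # camera spacing
    return 4.0 * max_err_pix * z_v ** 2 / (F_PIX * dx_c)

for n in (8, 16, 32, 64, 128):
    print(n, "cameras:", round(tolerable_depth(n, z_v=258.9), 1), "mm")
```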
Fig. 9 Relationships between the tolerable depth of 3D objects and the number of cameras.
Figures 10 and 11 show the generated directional images for the objects having depths of 50 mm and 100 mm, respectively, as the number of captured images was varied. The RMS errors of the directional images were calculated against the directional images generated from 320 captured images; the results are shown in Fig. 12. When the number of cameras equals the number of horizontal pixels of the directional image, the pixel error becomes approximately zero.
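The RMS error against the 320-image reference is presumably the standard root-mean-square intensity difference; a minimal sketch:

```python
import numpy as np

def rms_error(image, reference):
    """Root-mean-square intensity difference between a generated
    directional image and the reference generated from 320 images."""
    diff = image.astype(np.float64) - reference.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```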
(a) 8 captured images  (b) 16 captured images  (c) 32 captured images  (d) 64 captured images  (e) 128 captured images  (f) 320 captured images
Fig. 10 Generated directional images for the object having 50 mm depth.
(a) 8 captured images  (b) 16 captured images  (c) 32 captured images  (d) 64 captured images  (e) 128 captured images  (f) 320 captured images
Fig. 11 Generated directional images for the object having 100 mm depth.
Fig. 12 RMS errors of the generated directional images.
In Figs. 10 and 11, when the number of captured images is 32 or more, the generated directional images cannot be visually distinguished from the directional images generated from 320 captured images. The same trend is seen in Fig. 12: the RMS errors increase dramatically when the number of captured images falls below 32. This is presumably because the actual horizontal resolution of the captured images is less than 640 and the average depths of the 3D objects were smaller than the tolerable depths.
5. 3D IMAGES OF REAL 3D OBJECTS
The generated directional images were displayed on our prototype 3D display to produce a 3D image of a real 3D object. Photographs of the 3D image are shown in Fig. 13; they were captured from several horizontal directions.
Fig. 13 Photographs of 3D images generated by the prototype 3D display.
6. CONCLUSION
A new 3D camera technique which generates high-density directional images was reported. This technique uses a horizontally arranged camera array and an image interpolation algorithm. Four different simple interpolation algorithms were examined, and we found that the method that utilizes one representative object distance as a priori knowledge was the most effective. The allowable depth for a 3D object was also examined; we found that the required number of cameras was 32 for the two 3D objects having depths of 50 mm and 100 mm. We demonstrated the generation of a 3D image of a real 3D object using our prototype 3D display.
REFERENCES
1. Y. Takaki, “Next-generation 3D display and related 3D technologies,” Technical Digest of Optics in Computing 2003, pp. 166-169, Washington D.C., 2003.
2. M. Levoy and P. Hanrahan, “Light Field Rendering,” Proceedings of SIGGRAPH ’96, pp. 31-42, ACM Press, New York, 1996.
3. S. Gortler, R. Grzeszczuk, R. Szeliski, and M. Cohen, “The Lumigraph,” Proceedings of SIGGRAPH ’96, pp. 43-54, ACM Press, New York, 1996.
4. T. Naemura, J. Tago, and H. Harashima, “Real-Time Video-Based Modeling and Rendering of 3D Scenes,” IEEE Comput. Graphics & Appl., 22, pp. 66-73, 2002.
5. T. Kanade, P. Rander, and P. J. Narayanan, “Virtualized Reality: Constructing Virtual Worlds from Real Scenes,” IEEE MultiMedia, 4, pp. 34-47, 1997.
6. T. Kobayashi, T. Fujii, T. Kimoto, and M. Tanimoto, “Interpolation of ray-space data by adaptive filtering,” Proceedings of SPIE, 3958, pp. 252-259, 2000.
7. R. Bolles, H. Baker, and D. Marimont, “Epipolar-Plane Image Analysis: An Approach to Determining Structure from Motion,” Int. J. of Computer Vision, 1, pp. 7-55, 1987.