Development of a Stereoscopic Image Editing Tool Using Image-Based Modeling

Chang Ok Yun, Sang Heon Han
Software & Visual Contents Department, Graduate School of Design & IT, Dongseo University, Busan, S. Korea

Tae Soo Yun, Dong Hoon Lee
Division of Digital Contents, Dongseo University, Busan, S. Korea

Abstract - In recent years, there has been increasing interest in research and development related to stereoscopic imaging. Unlike a 2D image, however, a stereoscopic image generally requires 3D geometric information, and the lack of such information restricts editing or makes it tedious. In this work, we overcome some of these limitations and present a new unsupervised technique for generating a stereoscopic image from a single input image by estimating depth-map information through image-based modeling. Our approach, which combines image-based modeling with depth-map-based stereoscopic image generation, has two components. The first is an easy-to-use depth-map generation module that facilitates the recovery of a basic geometric model of the photographed scene; it is effective, convenient, and intuitive because it exploits a user-specified modification tool and a template-shape-based object modification tool. The second is an interactive stereoscopic image generation module that can produce a high-quality stereoscopic view while making binocular fusion rapid and comfortable. The system is architected as an Adobe Photoshop plug-in, which makes it easy to work with each layer's image data.

Keywords: Stereoscopic image, Vanishing Point/Line, Depth image, Image-based Modeling

1 Introduction

The presence of a third dimension, coupled with the advanced functionality of modern image/video-coding standards, can add a feeling of reality to the 2D images we have admired for decades. With the rapid development of the internet, digital entertainment media, and virtual reality, techniques for generating stereoscopic images from depth-maps have become increasingly widespread. Advances in depth-map generation technology have already led to several promising prototypes of stereoscopic image generation techniques, inspiring many researchers working on the generation, storage, and standardization issues related to stereoscopic images.

Using a single input image, vanishing lines and vanishing points can be extracted using a few heuristics to generate an approximated depth-map [1]; the depth-map is then used to generate stereo pairs. Advances in stereoscopic display technologies, graphics card devices, and digital imaging algorithms have opened up new possibilities in synthesizing stereoscopic images. An interactive stereo image synthesizer [4] uses advanced imaging features and custom Windows-based software built on the DirectX 9 API. A real-time stereoscopic image converter system [2] generates stereoscopic images with different perspective depths using motion parallax from a 2D image. A depth image-based rendering system [11] for generating stereoscopic images preprocesses the depth-maps with an asymmetric filter to smooth the sharp changes in depth at object boundaries. A method to convert a monoscopic video movie to a stereoscopic one [9] calculates a planar transformation between images in the sequence and relies on the human capability to sense the residual parallax.

These previous methods generate novel-view and stereoscopic images using partial 3D geometric information supplied through user-specified modification of the 2D image. Unfortunately, their 2D image editing processes, such as segmentation and hidden-area hole filling, take a long time. To improve on these problems and overcome these limitations, we present an interactive stereoscopic image editing tool that uses image-based modeling to let the user quickly specify and acquire a depth-map from a 2D input image. The primary motivation for our system is to provide simple tools for the novice user, allowing real-time depth-map editing and a preview function through a stereo monitor.
Once depth information is acquired from the image, several applications become possible. To make the system easy to use and general-purpose, it is architected as an Adobe Photoshop plug-in, which deals with each layer easily: the plug-in receives layer data from Adobe Photoshop, and all functionality operates on it. As mentioned before, Adobe Photoshop supports many useful tools for performing segmentation and hidden-area hole filling. 2D image-editing programs such as Adobe Photoshop [13] provide a powerful 2D framework in which users can edit the appearance of an image with great versatility and flexibility. Typically, such a system takes a photograph or image as input and provides users with pixel-editing tools that can freely manipulate the image in any desired fashion. The 2D operators in these systems are easy and intuitive to use; the systems are interactive and hands-on, giving users fine control; and the simplicity of most operations allows immediate response and feedback. In the hands of an expert, even dramatically edited photographs remain convincingly realistic.

In addition, we assume that the user will manipulate the disparity and zero-crossing point. We show that having an approximated model of the photographed scene makes it possible to determine the disparity and zero-crossing point between the left and right images intuitively and in real time. The success or failure of a stereoscopic imaging system design depends largely on the visual comfort it provides to the viewer during long-duration viewing of high-quality stereoscopic images; thus, disparity and the zero-crossing point are important stereo-vision cues in stereoscopic image generation. This process is verified in real time on a stereo monitor with polarizing-filter glasses.

The remainder of this paper is organized as follows. Section II gives an overview of the proposed system. Section III proposes the global depth-map generation method, describing the techniques used to extract the vanishing lines and the vanishing point, and Section IV explains how the object depth-map is modified with a user-specified modification tool and a template-shape-based object modification tool. Section V explains stereoscopic pair image generation using the depth-map. Section VI discusses experimental results obtained with our system. Conclusions can be found in Section VII.

2 System Overview

A flowchart describing the proposed stereoscopic image editing system is illustrated in Fig. 1.

Fig. 1. System Flowchart.

Our approach is successful because it splits the task of modeling from images into tasks which are easily accomplished by a person (but not a computer algorithm) and tasks which are easily performed by a computer algorithm (but not a person). The system consists of three processing stages. (i) Pre-processing is a simple 2D image editing stage. The single input image is manually segmented into different layers using a traditional 2D image-editing tool, Adobe Photoshop. This is typically the most time-consuming task. The parts of the scene hidden in the input image need to be manually painted using clone brushing. Each object and the background are represented as separate layers for easier depth assignment and for occlusion and parallax effects. (ii) Depth-map generation consists of a global depth-map generation module and an object depth-map modification module. The global depth-map is generated from geometric information using the horizon line, which is detected from the vanishing lines and the vanishing point. The object depth-map modification module offers two methods: user-specified modification through a 3D mesh-based technique, and template-shape-based modification, which projects a 3D primitive or pre-acquired model onto an object in the image. (iii) In the stereoscopic image generation stage, the user can manipulate the disparity and zero-crossing point to intuitively recover accurate stereo measurements from the left and right images. The stereoscopic image is produced by overlapping the left and right images, and the result is verified intuitively and in real time on a stereo monitor with polarizing-filter glasses. This stage supports long-duration viewing of high-quality stereoscopic images.

3 Global Depth-Map Generation

To generate the stereoscopic pair image (left and right views), the depth information of the objects in the input scene has to be estimated. Obtaining this depth information requires an image pre-processing step composed of two parts. The first is to detect the horizon line: according to the image features, the relevant vanishing lines and the related vanishing point (Fig. 2) are detected, from which the horizon line of the scene can easily be selected.

Fig. 2. Horizon Line extraction using Vanishing Point.
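The paper detects vanishing lines and their vanishing point from image features. As a minimal sketch (not the authors' detector), a vanishing point can be estimated as the least-squares intersection of two or more vanishing lines, each given by two image points; for a one-point-perspective scene, the horizontal line through it then serves as the horizon line. All names below are illustrative.

```python
# Sketch: estimating a vanishing point as the least-squares intersection of
# vanishing lines, each specified as a pair of image points.

def line_coeffs(p, q):
    """Return (a, b, c) for the line a*x + b*y + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    return a, b, c

def vanishing_point(lines):
    """Least-squares intersection: minimize sum_i (a_i x + b_i y + c_i)^2."""
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for p, q in lines:
        a, b, c = line_coeffs(p, q)
        n = (a * a + b * b) ** 0.5      # normalize so every line counts equally
        a, b, c = a / n, b / n, c / n
        s_aa += a * a; s_ab += a * b; s_bb += b * b
        s_ac += a * c; s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab     # 2x2 normal equations
    x = (s_ab * s_bc - s_bb * s_ac) / det
    y = (s_ab * s_ac - s_aa * s_bc) / det
    return x, y

# Two converging "road edge" lines that meet at image point (400, 150):
vp = vanishing_point([((50, 600), (400, 150)),
                      ((750, 600), (400, 150))])
```

With more than two roughly concurrent lines, the same normal equations average out small user errors in drawing each line.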

The second step is to specify the ground plane (Fig. 3). The ground plane corresponds to the horizon line of the reference image. This step is useful for specifying the ground, floor, or terrain of the reference image, and it should be performed first when extracting depth from most images because it provides a basis for placing all other geometric objects in the reference image. It is important to note that an image will not have a clear horizon line if the line lies beyond the image's view frustum; under these circumstances, it is difficult to align the ground plane to features in the image. The depth-map is then obtained by extracting the distance from each pixel to the reference camera of the reference image, according to the selected area or layer.


Fig. 3. Plane generation using the Horizon line (back plane/sky, vanishing line, ground plane).

For the extraction of the ground plane, picking any point on the horizon line of the reference image generates a ground plane used to gradually set the depth variation. The depth at each pixel is obtained by reading the depth buffer in OpenGL, which contains the 3D distance of each pixel in the display window relative to the camera's position. The depth values are stored in the range 0 to 255, where the furthest depth value is 255 (Fig. 4).

Fig. 4. Global depth-map generation.

The ground plane corresponds to the horizon line of the image. This is similar to [5] in that all of their depth values depend on their "spidery mesh". We have found that the use of a reference ground plane greatly simplifies depth acquisition and improves accuracy dramatically, since it provides an intuitive reference. Moreover, unlike [5], our ground plane does not limit the approach to one-point-perspective images.
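As a minimal sketch of the global depth-map described above (an assumption-laden stand-in for the paper's OpenGL depth-buffer readback), depth can be ramped linearly from the horizon row, which receives the furthest value 255, down to the bottom image row, which receives 0; pixels at or above the horizon (the back plane/sky) are clamped to the furthest value. The function name and linear ramp are illustrative choices, not the authors' exact mapping.

```python
# Sketch: a global depth-map from the ground plane, using the 0-255 convention
# described above (255 = furthest). Depth ramps linearly from the horizon row
# to the bottom row; rows at or above the horizon are clamped to 255.

def ground_plane_depth(width, height, horizon_row):
    """Return a height x width depth-map as nested lists of ints in 0..255."""
    depth = []
    for row in range(height):
        if row <= horizon_row:
            d = 255                       # at or above the horizon: furthest
        else:
            # linear ramp: horizon_row maps to 255, the bottom row maps to 0
            d = round(255 * (height - 1 - row) / (height - 1 - horizon_row))
        depth.append([d] * width)
    return depth

dm = ground_plane_depth(width=8, height=6, horizon_row=1)
# row 1 (the horizon) holds 255; the bottom row (row 5) holds 0
```

Per-layer depth-maps would then refine this global ramp for each segmented object.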

4 Object Depth-Map Modification

In the current method, after applying 2D image editing processes such as background and object segmentation, the global depth-map is generated from the geometry of the 2D input image. For a detailed depth-map, however, this depth-map must be modified further. To address this problem, Oh [7] developed hybrid tools that use pre-defined shapes to aid in painting an accurate depth-map; in the design of his interface, he emphasized 2D rather than 3D interaction, the direct use of cues present in the image, and the use of previously assigned depth as a reference. Single-view modeling [10] takes as input a sparse set of user-specified constraints, including surface positions, normals, silhouettes, and creases, and generates a well-behaved 3D surface satisfying those constraints; as each constraint is specified, the system recalculates and displays the reconstruction in real time. Model-based stereo [3] shows that projecting pairs of images onto an initial approximated model allows conventional stereo techniques to robustly recover very accurate depth measurements from images with widely varying viewpoints; it is used to automatically refine a basic model of a photographed scene and can recover the structure of architectural ornamentation that would be difficult to obtain with photogrammetric modeling.

While images often contain geometric and architectural objects, we do not wish to limit the range of our depth-map modification tool to this class of objects; it must handle both regular and irregular objects. We therefore describe two object depth-map modification tools: a user-specified modification method and a template-shape-based object modification method. The first, the user-specified modification tool, assumes that a layer or selection area is a complete entity whose shape is irregular; it operates through a 3D mesh-based method, while the other works through geometric primitives. This idea is similar to the organic shapes tool of the photo editing system in [7] and is related to the multilevel B-spline method [8]. The second tool, the template-shape-based object modification tool, is similar to shape primitives in that it associates a 3D model with its projection onto the image; in this case, however, we generalize the shape primitive to be any arbitrary shape. By determining the correct transformation for this shape primitive to approximate the pose of its projection, we can achieve convincing depth for any generic shape.

These tools are 3D shapes that the user places into the reference scene through 2D interaction, as viewed from the reference camera. If the image contains geometric objects, associating 3D shapes with them is intuitive for the user: it is much easier to imagine the depth of a cylinder-shaped lighthouse, such as in Fig. 6, extracted from a cylinder-shape template than derived from several paint/chisel strokes. Object depth-map modification tools are therefore essential components of our system, because they quickly provide depth information through 2D gestures that define 3D shapes.

4.1 User-specified modification

For some geometric shapes, such as cylinders, spheres, or boxes, it is not hard to paint the depth-map accurately using their regular geometric information. However, even when a user can easily infer the spatial organization of irregular shapes depicted in the image, such as trees or faces, it is not always easy to directly paint the corresponding depth.

Fig. 5. Depth-Map by User-specified modification.

The user-specified modification tool (Fig. 5) therefore offers the user direct access to the image's data structures by allowing modifications at the mesh level. This is particularly useful when incremental depth modifications are necessary, such as cutting away the surface of a wall, carving the indentations of a sculpture, or adjusting the depth of a facade. This tool, however, also depends on the user's ability to make decisions about the depth variations in the scene. To improve efficiency, the tool minimizes a local approximation error for each control point, yielding a fast approximation and interpolation algorithm for scattered multivariate data. We have also implemented the method of Lee et al. [8], which allows depth-map generation from a set of irregularly spaced points. It can provide crude depth for elements in the reference image that have no regular shape, including hills, trees, and bushes, as shown in Fig. 5. Multilevel B-splines compute a continuous surface through a set of irregularly spaced points; the algorithm uses a coarse-to-fine hierarchy of control lattices to generate a sequence of bicubic B-spline functions whose sum approaches the desired interpolation function. The user-specified modification tool supports B-spline functions between the points of each curve to produce a smoother mesh. Experimental results demonstrate that high-fidelity reconstruction is possible from a sparse and irregular set of sample objects. We therefore save the depth-map with the 3D mesh-based geometric information generated by the user-specified modification tool; this depth-map is also used for the template-shape-based depth-map modification described in the next section.

4.2 Template shape based object modification

The template-shape-based object modification tool takes as input an arbitrary generic 3D model that the user can align to the reference image. This tool is particularly useful for generating depth for elements in the reference image that can be generalized; template shapes are therefore suitable for faces and the lighthouse (Fig. 6). Template shapes include the following shape primitives: sphere, cube, cylinder, and pyramid/wedge.

Fig. 6. Template shape based object modification.

Our template shapes and the interface for their manipulation build upon previous work in two ways. First, our template shapes expose the vertices of the object; this is possible in [14], [15] because interaction proceeds by selecting and transforming vertices individually. Second, manipulating our template shapes always leads to predictable shapes. Because [3], [14], [15] allow the user to arbitrarily place vertices in 3D space, which subsequently alters the reference camera parameters, the resulting shape parameterized by these vertices can be unpredictable and is often undesirable.

Template shapes require alignment as described above. The user can perform reference-image alignment in two ways. The first is through traditional 3D modeling interaction. The second is through point correspondence and image warping, as follows. First, the user specifies feature points on the 3D model. Next, the user picks points on the 2D reference image that correspond to these 3D feature points. The system then approximately aligns the 3D model with a transformation that minimizes the error between the projection of the 3D points onto the reference image and their corresponding 2D points. Finally, we warp the 2D depth image acquired from the point-correspondence step so that the point correspondences match exactly. The template-shape-based object modification tool can thus be drawn transparently over 2D objects: for example, the user draws a cylinder (Fig. 6(b)) and clicks on some points to assign cylindrical depth (Fig. 6(c)).
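The alignment step above minimizes the error between the projected 3D feature points and the user's picked 2D points. As a hedged sketch of that idea (restricted to a closed-form scale-plus-translation fit in the image plane, not the paper's full rotation/scale/position optimization), the least-squares solution can be written directly; the function name and sample points are illustrative.

```python
# Sketch: least-squares alignment of projected template feature points to
# user-picked image points, solving  min over s, t  of  sum ||s*p + t - q||^2.

def align_scale_translation(model_pts, image_pts):
    """Return scale s and translation (tx, ty) for 2D point pairs (p, q)."""
    n = len(model_pts)
    mx = sum(p[0] for p in model_pts) / n      # model centroid
    my = sum(p[1] for p in model_pts) / n
    qx = sum(q[0] for q in image_pts) / n      # image centroid
    qy = sum(q[1] for q in image_pts) / n
    num = den = 0.0
    for (px, py), (ix, iy) in zip(model_pts, image_pts):
        num += (px - mx) * (ix - qx) + (py - my) * (iy - qy)
        den += (px - mx) ** 2 + (py - my) ** 2
    s = num / den                              # optimal scale
    return s, (qx - s * mx, qy - s * my)       # optimal translation

# Template feature points and the user's clicks (scaled by 2, shifted by (10, 20)):
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
picked = [(10.0, 20.0), (12.0, 20.0), (10.0, 22.0)]
s, t = align_scale_translation(model, picked)
```

Optimizing rotation and scale independently, as the face-template paragraph below this section notes, avoids shear in the resulting transformation; the residual misfit is what the 2D warping step then removes.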

Fig. 7. Human face.

The specific case of human faces is also important. The output of the impressive morphable model of Blanz et al. [12] could be used to retrieve accurate depth. However, this technique is not easy to implement, since it requires a large database of 3D face scans and, unfortunately, takes tens of minutes to acquire the geometry of a face from a photograph. We have developed a simpler method that trades accuracy and user intervention for speed and simplicity, and that could be further generalized to a broader class of template shapes. We use a generic 3D face model, optimize its 3D position to match the photograph, and then use 2D morphing to refine the match (Fig. 7). The user specifies correspondence points between the image and the 3D model; these points are used to find the rotation, scale, and position of the 3D model using CANDIDE optimization [12]. Rotation and scale are optimized independently to avoid shear in the resulting transformation. The 3D face is rendered, and the z-buffer is read back.

5 Stereoscopic Pair Image Generation

In representing a 3D image on a viewing screen, it is possible to incorporate some of the cues, including shading, texture, perspective, and occlusion, that the human observer takes for granted when interpreting depth relationships between different points. However, a true binocular or stereoscopic representation is generally missing. A natural stereoscopic image requires that a pair of views of an object, acquired with approximately the same geometry that would be encountered if it were viewed naturally from a comfortable distance with binocular vision, be displayed such that the right-eye and left-eye images are directed to the appropriate eyes. While rigorous stereoscopic views should be computed using the diverging-ray geometry that occurs naturally when viewing an object (giving it perspective), the human visual system is quite accommodating if the images are calculated using an orthographic (parallel-ray) viewing geometry, provided that the depth of the object is small (less than 20%) compared to the object-eye distance. In fact, acceptable stereoscopic views may be obtained simply by calculating two views of an object rotated from each other by 3-7 degrees.

Fig. 8. Parallaxes.

When we watch a movie, a moving object is often located at the border of the screen. If the object is processed with negative parallax (Fig. 8(a)) so that it appears closer than the plane of the screen, the screen-surround problem occurs: a conflict of depth cues arises, because the interposition cue implies that the object must be behind the window. To solve this problem, both the object and the background are processed with positive parallax (Fig. 8(c)); that is, the object is placed behind the plane of the screen and the background behind the object. Disparity (parallax) and the zero-crossing point are therefore important stereo-vision cues, and the stereoscopic image is generated by manipulating them. However, it is hard to determine the disparity and zero-crossing point, and people differ in their stereo-vision capability with respect to these parameters. The alternative of improving stereo-vision capability by using images taken at fixed disparity and zero-crossing settings has the disadvantage that computing depth becomes very sensitive to noise in the image measurements. In general, repeatedly adjusting the disparity and zero-crossing point makes generating a stereoscopic image much more difficult and time-consuming.

In this paper, we address this problem by letting the user manipulate the disparity and zero-crossing point directly. We show that having an approximate model of the photographed scene makes it possible to determine the disparity and zero-crossing point between the left and right images intuitively and in real time. The success or failure of a stereoscopic imaging system design depends largely on the visual comfort it provides during long-duration viewing of high-quality stereoscopic images; thus, disparity and the zero-crossing point are important stereo-vision cues in stereoscopic image generation. This process is verified in real time on a stereo monitor with polarizing-filter glasses.
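The depth-map-driven pair generation can be sketched for one image row. This is a hedged stand-in, not the paper's exact renderer: pixels at the zero-crossing depth get no parallax, nearer pixels get negative parallax (crossed disparity), and farther pixels get positive parallax; the gain parameter, the symmetric half-shift, the far-to-near painting order, and the neighbor-copy hole filling are all illustrative choices.

```python
# Sketch: warp one image row into a left/right stereo pair using its depth row
# (0 = near, 255 = far) and a user-chosen zero-crossing depth and gain.

def render_stereo_row(colors, depths, zero_cross, gain):
    """disparity = gain * (zero_cross - depth); each view shifts by half of it."""
    w = len(colors)
    left, right = [None] * w, [None] * w
    # Paint far pixels first so nearer pixels overwrite them (simple occlusion).
    for x in sorted(range(w), key=lambda i: -depths[i]):
        shift = round(gain * (zero_cross - depths[x]) / 2)
        xl, xr = x + shift, x - shift
        if 0 <= xl < w:
            left[xl] = colors[x]
        if 0 <= xr < w:
            right[xr] = colors[x]
    for view in (left, right):           # crude hidden-area hole filling
        for x in range(w):
            if view[x] is None:
                view[x] = view[x - 1] if x > 0 else 0
    return left, right

row = [10, 20, 30, 40, 50, 60, 70, 80]
depth = [0, 0, 0, 128, 128, 255, 255, 255]   # near object, mid ground, far sky
left, right = render_stereo_row(row, depth, zero_cross=128, gain=0.02)
```

Raising `zero_cross` pushes more of the scene in front of the screen plane; setting it to the largest depth in the frame gives the all-positive-parallax arrangement of Fig. 8(c).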

6 Experimental Results

We have implemented the approach described in this paper and applied it to generate stereoscopic images from 2D images. Our system was implemented on a Pentium 4 3.0 GHz with 1 GB RAM using Visual C++, the OpenGL graphics API, the Adobe Photoshop SDK 6.0, and Adobe Photoshop CS2. We used a single 24-bit 800x600 input photograph.

Fig. 9. 2D editing image using Adobe Photoshop: (a) image segmentation; (b) hidden-parts filling.

The most time-consuming part of the acquisition was the manual segmentation of the input image (Fig. 9) into layers using Adobe Photoshop. Because our system uses only one input photograph, the parts of the scene hidden in the input image were manually clone-brushed. The segmentation and clone brushing took about two hours, and 20 different layers were extracted. Each tree and the house are represented as separate layers, decomposed into the 20 layers for easier depth assignment and to allow for occlusion and parallax effects. Thanks to the Adobe Photoshop layers holding the segmented image data, all processing is easier to operate. Depth acquisition took an additional three hours. We first defined the ground plane using the vanishing line and vanishing point. Each layer was given a coarse billboard depth by utilizing the ground plane. The depths for layers such as the trees and the house were refined with the user-specified modification tool and the template-shape-based object modification tool. We showed the modeling result through the modified depth-map.

The final resulting image can then be displayed on the stereo monitor of the Pavonine [16], allowing all of the original views to be projected and producing a stereo effect when viewed by an observer wearing polarizing-filter glasses. The user can then intuitively determine the disparity and zero-crossing point from the left and right images in real time. Content for the Pavonine stereo monitor is created by applying an interdigitation process (called Interzigging) to generate the stereo image (Fig. 10(e)); this process interdigitates the left image and the right image (Fig. 10(d)). The Pavonine stereo monitor is a stereoscopic viewing monitor that requires this additional processing to create a suitable stereoscopic image for viewing.

Fig. 10. The experimental results image.
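The interdigitation (Interzigging) step described above weaves the left and right views into a single frame for the stereo monitor. As a hedged sketch (the exact weaving pattern is display-specific and not given in the paper; row interleaving is assumed here for illustration), even rows can be taken from one view and odd rows from the other:

```python
# Sketch: row-interleaved interdigitation of two equal-sized views, as used by
# many polarized/lenticular stereo displays (pattern assumed, display-specific).

def interdigitate(left_rows, right_rows):
    """Interleave two images row by row: even rows from left, odd from right."""
    assert len(left_rows) == len(right_rows)
    return [left_rows[y] if y % 2 == 0 else right_rows[y]
            for y in range(len(left_rows))]

left = [["L"] * 4 for _ in range(4)]
right = [["R"] * 4 for _ in range(4)]
frame = interdigitate(left, right)    # rows alternate: L, R, L, R
```

Column-interleaved or checkerboard variants follow the same idea with a different selection mask per pixel.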

7 Conclusions

In this paper, we have proposed an interactive tool for generating stereoscopic images that takes a single photographic image as input and allows the user to extract and specify depth using image-based modeling. The system allows the user to build a representation consisting of image layers, with the segmented image data structures handled through Adobe Photoshop. Our system combines image-based modeling with depth-image-based generation of stereoscopic images and is built from two components that we have developed. The first is an easy-to-use depth-map generation module that facilitates the recovery of a basic geometric model of the photographed scene; we have introduced a global depth-map generation tool and an object depth-map modification tool using a user-specified modification method and a template-shape-based object modification method. The second is an interactive stereoscopic image generation module that can produce a high-quality stereoscopic view while making binocular fusion rapid and comfortable, and we provide an interactive stereoscopic image preview function.

Fig. 11. The Adobe Photoshop Plug-in.

The Windows program we developed is architected as an Adobe Photoshop plug-in (Fig. 11). Specifically, it is an Automation plug-in (one of six Photoshop plug-in varieties), which allows the plug-in to work easily with each layer. Our plug-in receives layer data from Adobe Photoshop and operates on it. As described above, Adobe Photoshop supports many useful tools for performing the segmentation, hidden-area hole filling, and depth-map creation operations needed. Finally, we have shown the stereoscopic image on a stereo monitor with polarizing-filter glasses (Fig. 12).

Fig. 12. Demonstration.

Acknowledgement

This research was supported by the Program for the Training of Graduate Students in Regional Innovation, conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.

8 References

[1] S. Battiato, A. Sapra, S. Curti, M. La Cascia, "3D Stereoscopic Image Pairs by Depth-Map Generation", International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT), 2004.
[2] C. H. Choi, B. H. Kwon, M. R. Choi, "A Real-Time Field-Sequential Stereoscopic Image Converter", IEEE Transactions on Consumer Electronics, 2004.
[3] P. E. Debevec, C. J. Taylor, and J. Malik, "Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach", In Proc. of SIGGRAPH, 1996.
[4] M. H. Feldman, L. Lipton, StereoGraphics Corp., "Interactive 2D to 3D Stereoscopic Image Synthesis", In Proc. of SPIE, 2005.
[5] Y. Horry, K. Anjyo, K. Arai, "Tour into the picture: using a spidery mesh interface to make animation from a single image", In Proc. of SIGGRAPH, 1997.
[6] S. B. Kang, "Depth painting for image-based rendering applications", Tech. report, CRL, Compaq Computer Corporation, Cambridge Research Lab, 1998.
[7] B. Oh, M. Chen, F. Durand, and J. Dorsey, "Image-based modeling and photo editing", In Proc. of SIGGRAPH, 2001.
[8] S.-Y. Lee, G. Wolberg, and S. Y. Shin, "Scattered data interpolation with multilevel B-splines", IEEE Transactions on Visualization and Computer Graphics, 1997.
[9] E. Rotem, K. Wolowelsky, D. Pelz, "Automatic Video to Stereoscopic Video Conversion", In Proc. of SPIE, 2005.
[10] L. Zhang, G. Dugas-Phocion, J. Samson, and S. Seitz, "Single view modeling of free-form scenes", In Proc. of CVPR, 2001.
[11] L. Zhang, W. J. Tam, "Stereoscopic Image Generation Based on Depth Images for 3D TV", IEEE Transactions on Broadcasting, 2004.
[12] J. Ahlberg, "CANDIDE-3 – an updated parameterized face", Report No. LiTH-ISY-R-2326, Dept. of Electrical Engineering, Linköping University, Sweden, 2001.
[13] Adobe. http://www.adobe.com.
[14] Canoma. http://www.canoma.com.
[15] Photomodeler. http://www.photomodeler.com.
[16] Pavonine. http://www.3dview.co.kr.
