Implementation of a Shadow Carving System for Shape Capture

Silvio Savarese‡, Holly Rushmeier†, Fausto Bernardini†, Pietro Perona‡

†IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, {holly,fausto}@us.ibm.com
‡California Institute of Technology, Mail stop 136-93, Pasadena, CA 91125, {savarese,perona}@vision.caltech.edu

Part of this work was performed during an internship at the IBM T. J. Watson Research Center.
Abstract— We present a new technique for estimating the 3D shape of an object that combines previous ideas from shape from silhouettes and shape from shadows. We begin with a set-up for robustly extracting object silhouettes by casting a shadow of the object with a point light source onto a translucent panel. A camera on the opposite side of the translucent panel records an image that is readily processed to obtain the object boundary. Similar to other volume-based shape from silhouette methods, we use a space carving technique to extract an initial estimate of the object shape. In a second phase, we record a series of images of the object lit by point light sources. We compare the areas of self-shadowing in these images to those expected if our estimated shape from the space carving were correct. The shape of the object is refined by a shadow carving step that adjusts the current estimate of the shape to resolve contradictions between the captured images and the current shape estimate. The result of the space carving and shadow carving is an estimate of shape that can be further refined by methods that work well in local regions, such as photometric stereo. We have implemented our approach in a simple table-top system and present the results of scanning a small object with deep concavities.
I. INTRODUCTION

Shape from silhouettes and shape from shadows are two well-known techniques for estimating 3D shape (e.g., [1], [2]). We present a new approach for combining these methods. First, we use a more robust shape from silhouette technique, converting the silhouette boundary detection problem to a shadow boundary detection problem. We then refine the shape estimate we obtain from the silhouettes by examining shadows the object casts on itself. Our method for using shadow information, combined with a conservative shape estimate, improves on previous shadow techniques in that it carves out a fully 3D surface, rather than a 2.5D terrain. We produce an improved estimate of shape that can be further refined by methods that work well locally, such as photometric stereo [3].

Our purpose in designing this new approach is the construction of 3D scanners for use in applications such as e-commerce or the creation of virtual exhibits on the Internet. In such applications the user often has a very limited budget, and is primarily concerned with visually, rather than metrically, accurate representations. Furthermore, because users are often not technically trained, the system must be robust and require minimal user intervention. Similar to other recent work [4], we address the issues of cost and appearance by constructing a system using a commodity digital camera that can acquire color texture images, and controlled lighting composed of inexpensive lamps. We address the issue of minimal user intervention by using a combination of methods that rely on substantial variations in the intensities of acquired images, to avoid requiring the user to set parameters. We have designed a technique that progressively improves conservative estimates of surface shape, to avoid having an unstable method in which small errors accumulate and severely impact the final result.

In this paper we present the design of a working system; a more detailed analysis and proof of the approach is presented in a companion paper [5]. We begin by reviewing previous work in shape from silhouettes and shape from shadows, and identifying the strengths and weaknesses of these methods. We then present our new approach, which includes a new hardware set-up and new algorithms. We demonstrate that our approach produces a conservative estimate of object shape. Finally, we present results we have obtained from a small table-top prototype of our system.

II. BACKGROUND

In our new approach, we combine the techniques of shape from silhouettes and shape from shadows. Here we briefly review the fundamentals of these approaches, and their advantages and disadvantages.
Fig. 1. Shape from Silhouettes: The silhouette and camera location for each view form a cone containing the object. The intersection of multiple cones is an estimate of object shape.
A. Shape from silhouettes

The approach of shape from silhouettes, sometimes referred to as shape from contours, has been used for many years. An early shape from silhouette method was presented by Martin and Aggarwal [1], and has subsequently been refined by many other researchers. Essentially, the approach relies on the formation of a cone (or, in the case of a distant viewer, a prism) by a point of observation and the silhouette in an image obtained from that point, as shown in Fig. 1. All space outside of this cone must be outside of the object, and the cone represents a conservative estimate of the object shape. That is, the cone is guaranteed to enclose the entire object. By intersecting the cones formed from many different viewpoints, the estimate of object shape can be refined.

Different techniques for computing the intersection of cones have been proposed. Martin and Aggarwal used a run-length encoded, uniformly discretized volume. For each view, each subvolume, or voxel, is examined to see if it is outside of the solid formed by the silhouette. If it is outside, the voxel is excluded from further estimates of the object shape. Subsequent research has improved on the efficiency of this approach with alternative data structures, such as octrees [6], for storing the in/out status of voxels. Standard isosurface extraction methods, such as marching cubes [7], can be used to compute the final triangle mesh surface. It is also possible to model the cones directly as space enclosed by polygonal surfaces and intersect the surfaces to refine the object estimate, similar to the method used by Reed and Allen to merge range images [8].
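As a concrete illustration (a minimal sketch, not the authors' code), the cone intersection can be carried out by testing voxel centers against every silhouette. The routine names here, such as carve_silhouettes, are hypothetical, and we assume calibrated 3x4 camera matrices and boolean silhouette masks:

    import numpy as np

    def project(P, X):
        # Project 3D points X (N,3) with a 3x4 camera matrix P into pixels (N,2).
        Xh = np.hstack([X, np.ones((len(X), 1))])
        x = (P @ Xh.T).T
        return x[:, :2] / x[:, 2:3]

    def carve_silhouettes(centers, cameras, silhouettes):
        # A voxel survives only if it projects inside every observed silhouette;
        # anything outside a single cone is guaranteed to be outside the object.
        inside = np.ones(len(centers), dtype=bool)
        for P, sil in zip(cameras, silhouettes):
            px = np.round(project(P, centers)).astype(int)
            h, w = sil.shape
            ok = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
            hit = np.zeros(len(centers), dtype=bool)
            hit[ok] = sil[px[ok, 1], px[ok, 0]]
            inside &= hit
        return inside

Because the test only ever removes voxels, the surviving set remains a conservative superset of the object, in keeping with the methods reviewed above.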
An alternative use of silhouettes to estimate shape is to use a series of silhouettes obtained from very small angular rotations of the object. As illustrated by Zheng [9], by analyzing the change in silhouette for a small change in orientation, the depth of a point on the silhouette can be computed. The set of 3D points obtained from many views forms a cloud that can be integrated into a single surface mesh.

As shown by Laurentini [10], the accuracy of shape from silhouettes by any approach is limited. Some concavities in the object are never represented in the observed silhouettes. The method presented by Zheng has the advantage that the unobserved areas that have unmeasured concavities are identified automatically. It has the disadvantage that many more silhouettes are required to estimate the object shape. The method is also not as robust as the cone intersection methods, because reconstruction errors inherent in meshing noisy points may result in holes or gaps in the surface. Because they are robust and conservative, volume-based space carving techniques similar to Martin and Aggarwal's original method have found success in low-end commercial scanners. The shape error caused by concavities not apparent in the silhouettes is often successfully masked by the use of color texture maps on the estimated geometry.

Although they are simple and relatively robust, space carving approaches fail when they are unable to accurately segment the object from its background. Many systems use a backdrop of a solid, known color, such as the backdrops used in chroma-key systems for video compositing. This approach can fail when the object itself has the same color as the background. More frequently it fails for objects with some specularity that reflect the backdrop into the direction of the camera view, and so appear to have the same color as the backdrop. Diffuse white objects may also reflect the color of the backdrop towards the camera through multiple self-interreflections. This reflection of the backdrop color can cause the segmentation to fail in two ways. The object boundary may be estimated as entirely inside the actual boundary, resulting in a general shrinkage. Areas in the middle of the object may be classified as backdrop, resulting in the more serious error of tunnels in the object. A simple approach to correcting tunneling errors is to have the user inspect the images being segmented, and paint in areas that have been misclassified. Another approach to avoiding tunneling is to use a large diffusely emitting light source as the backdrop. This can often prevent areas in the middle of the object from being misclassified, but does not guarantee that areas near the silhouette edge that scatter light forward will be properly classified. It also prevents texture images from being acquired
simultaneously with the silhouette images.

Recently, Leibe et al. [11] developed a shape from silhouettes approach that avoids the segmentation problem by using cast shadows. A ring of overhead emitters casts shadows of an object sitting on a translucent table. A camera located under the table records the shadow images. The emitter positions and shadows form the cones that are intersected to form a crude object representation. Because only one object pose can be used, the representation cannot be refined. For the remote collaboration application being addressed by Leibe et al., however, the crude representation is adequate.

Fig. 2. Shape from Shadows: For a terrain surface f(x) and a known light source direction θ, f'(x_b) = tan θ, and f(x_b) − f(x_e) = f'(x_b)(x_e − x_b). Using data for many angles θ, an estimate of the continuous function f(x) can be made.
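To make the relation in Fig. 2 concrete, a small numeric example (the values are invented for illustration):

    import math

    theta = math.radians(30.0)     # light source elevation angle (example value)
    x_b, x_e = 2.0, 5.0            # shadow begins at x_b and ends at x_e (cm)

    slope = math.tan(theta)        # f'(x_b) = tan(theta), approximately 0.577
    drop = slope * (x_e - x_b)     # f(x_b) - f(x_e), approximately 1.73 cm
    print(slope, drop)

Each observed shadow interval thus yields one slope constraint and one height difference, and many light angles together constrain the interpolating surface.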
B. Shape from Shadows

Computing shape from shadows (sometimes referred to as shape from darkness) has also been studied for many years, although perhaps not as extensively as shape from silhouettes. Shafer and Kanade [12] established fundamental constraints that can be placed on the orientation of surfaces based on the observation of the shadows one surface casts on another. Hambrick et al. [13] developed a method for labelling shadow boundaries that enables inferences about object shape. Since then, several methods for estimating shape from shadows have been presented. Since we are designing a scanner, we focus on methods where the light source position is known, rather than the case of unknown light source direction (e.g., [14]).

Hatzitheodorou and Kender [2] presented a method for computing a surface contour formed by a slice through an object illuminated by a directional light source casting sharp shadows. Assuming that the contour is defined by a smooth function, and that the beginning and end of each shadow region can be found reliably, each pair of points bounding a shadow region yields an estimate of the contour slope at the start of the shadow region, and the difference in height between the two points, as shown in Fig. 2. The information from shadows for multiple light source positions is used to obtain an interpolating spline that is consistent with all the observed data points.

Raviv et al. [15] developed an extended shape from shadows method. The object is set on a known reference surface with a camera directly above. A series of images is captured as a collimated light source moves in an arc over the surface. For the 2D reference surface a volume of data is then obtained, with the third coordinate being the angle of the light source to the reference surface, and the volume recording whether the reference surface was in shadow for that angle. A slice through this volume is referred to as a shadowgram. Similar to Hatzitheodorou and Kender, by identifying beginning and ending shadow points for each light position, the height difference between the points can
be computed. Also, by observing the change in shadow location for two light source positions, the height of the start of the shadow at one position relative to the other can be found by integration. As long as shadow beginnings and endings can reliably be detected, the top surface of the object can be recovered as a height field. Furthermore, by detecting splits in the shadowgram, i.e., positions that have more than one change from shadowed to unshadowed, holes in the surface below the top surface can be partially recovered.

Langer et al. [16] extend the method of Raviv et al. for computing holes beneath the recovered height field description of the top surface in two dimensions. They begin with the recovered height field, an N×N discretization of the two-dimensional space, and the captured shadowgram. Cells in this discretization are occupied if they are in the current surface description. Their algorithm steps through the cells and updates them to unoccupied if a light ray would have to pass through the cell to produce a lit area in the captured shadowgram. Daum and Dudek [17] subsequently developed a method for recovering the surface for light trajectories that are not a single arc. The estimated height field description is in the form of an upper bound and a lower bound on the depth of each point. The upper and lower bounds are progressively updated from the information obtained from each light source position.
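The cell update of Langer et al. can be sketched roughly as follows, on a 2D grid with light arriving from the left; the discretization and names are invented here, so this is a paraphrase of the published idea rather than its code:

    import numpy as np

    def carve_shadowgram(occupied, shadowgram, angles):
        # occupied: (H, W) boolean grid, True = material, row 0 at the ground.
        # shadowgram[i, x]: True if reference position x was lit at angles[i].
        H, W = occupied.shape
        for theta, lit in zip(angles, shadowgram):
            t = np.tan(theta)
            for x in np.nonzero(lit)[0]:
                # Every cell the lit ray passes through on its way down
                # to position x must be free of material.
                for d in range(1, W):
                    col, row = x - d, int(round(d * t))
                    if col < 0 or row >= H:
                        break
                    occupied[row, col] = False
        return occupied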
All of these methods rely on accurate detection of the beginnings and ends of shadow regions. This is particularly problematic for attached shadows, which are the end of a gradual transition from light to dark. Height estimates that use gradients derived from the estimate of the start of attached shadows are particularly prone to error. Yang [18] considers the problem of shape from shadows with error. He presents a modified form of Hatzitheodorou and Kender's approach, in which linear programming is used to eliminate inconsistencies in the shadow data used to estimate the surface. While the consistency check does not guarantee any bounds on the surface estimate, it does guarantee that the method will converge. He shows that the check for inconsistencies is NP-complete. While more robust than Hatzitheodorou and Kender's method when applied to imperfect data, Yang's technique is still restricted to 2.5D terrains.

Existing shape from shadow methods essentially recover terrains, with some hole estimates, rather than complete three-dimensional objects. However, a major advantage is that shape from shadow methods do recover shape in concavities that would not appear in any silhouette of the object.

Fig. 3. New configuration for shape from silhouettes: A camera observes the shadow cast by a point light source on a translucent panel. The object is contained in the cone formed by the light source and the shadow.

C. Combining Approaches

It has become evident that to produce robust scanning systems it is useful to combine multiple shape-from-X approaches. A system is more robust if a shape estimated from shape-from-A is consistent with shape-from-B. A general approach to interpreting image data is photo-consistency, as introduced by Kutulakos and Seitz [19]. A surface description (for example, as given by a volumetric representation) is acceptable if it is consistent with all images captured of the surface. We use this basic idea in combining shape from silhouettes with shape from shadows. We refine our initial, conservative shape estimate obtained from shape from silhouettes to be consistent with images of the object that exhibit self-shadowing.

III. PROPOSED APPROACH

We present a new hardware configuration that converts the difficult image segmentation problem in shape from silhouettes into a simpler shadow detection problem. Second, we use the surface estimate obtained from shape from silhouettes in a shape from shadows method that allows us to overcome previous limitations imposed by the need to locate precise boundaries.

A. A New Configuration for Shape from Silhouettes

To avoid the segmentation problem inherent in many shape from silhouette systems, we adopt an approach similar to that of Leibe et al. We rearrange the set-up, however, to allow for multiple object poses and better refinement of the object shape.

Our proposed new set-up for shape from silhouettes is shown in Fig. 3. A point light source is placed in front of the object to be measured, which sits on a turntable, casting a shadow on a translucent panel. A camera on the opposite side of the panel records the image of the shadow cast on the panel. The locations of the camera, light source, and panel relative to a coordinate system fixed to the turntable are found by calibration. In order to be considered a "point" light source, the lamp simply needs to be an order of magnitude or more smaller than the object to be measured, so that the shadow that is cast is sharp. The lamp needs to have an even output, so that it does not cast patterns of light and dark that could be mistaken for shadows. The translucent panel is any thin, diffusely transmitting material. The panel is thin to eliminate significant scattering in the plane of the panel (which would make the shadow fuzzy), and has a forward scattering distribution that is nearly uniform for light incident on the panel, so that no images are formed on the camera side of the panel except for the shadow. The positions of the light source and camera are determined by making sure that the shadow of the object falls completely within the boundaries of the translucent panel for all object positions as the turntable revolves, and that the camera views the complete translucent panel.

By using a translucent panel, the camera views an image that is easily thresholded by performing a k-means clustering analysis to determine the boundary intensity value between lit and unlit areas. Because the camera and panel positions are known, the shadow boundary can be expressed in the world coordinate system. Now the cone that fully contains the object is formed by the light source position and the shadow boundary. A volume can be defined that is initially larger than the object. Voxels can be classified as in or out for each turntable position by projecting the voxel vertices along a line starting at the point light source onto the plane of the panel and determining whether they are in or out of the observed shadow. A more accurate estimate of the surface can be obtained by computing the actual crossing point for each in-out edge.
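The in/out test for one voxel vertex can be sketched as follows, assuming the panel plane is given by a point and a normal in world coordinates; in_shadow is a hypothetical test of a panel point against the detected shadow region:

    import numpy as np

    def project_to_panel(light, vertex, panel_point, panel_normal):
        # Intersect the ray from the light source through the vertex
        # with the panel plane n . (p - p0) = 0.
        d = vertex - light
        denom = float(np.dot(panel_normal, d))
        if abs(denom) < 1e-12:
            return None                    # ray parallel to the panel
        t = float(np.dot(panel_normal, panel_point - light)) / denom
        return light + t * d

    def vertex_is_in(light, vertex, panel_point, panel_normal, in_shadow):
        # A vertex stays "in" only while its projection lands inside the shadow.
        p = project_to_panel(light, vertex, panel_point, panel_normal)
        return p is not None and in_shadow(p)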
Fig. 4. Alternative configuration: Many alternative configurations can be used. Using multiple colored lights allows several shadows to be captured simultaneously. Using each of several lights in turn, and a camera in front, allows multiple shadows and texture data to be obtained for each object position.
By using the projected shadow, problems such as the object having regions the same color as the background, or reflecting the background color into the direction of the camera, are eliminated. The configuration also allows for some other interesting variations, as shown in Fig. 4. Multiple light sources could be used for a single camera and panel position. One approach would be to use red, green and blue sources, casting different color shadows. In one image capture, three shadows could be captured at once. Another approach would be to add a camera in front, and use several light sources. For each turntable position, several shadow images could be captured in sequence, and several images for computing a detailed texture could be captured by the camera in front. Either configuration for using multiple light sources would give information equivalent to using multiple camera views in a traditional shape from silhouettes set-up. The result is a more complete object description from each full rotation of the turntable, and a reduction in the number of repositionings of the object on the turntable, and additional rotations, necessary to get a complete, well-defined object.

In a sense, this new configuration converts shape from silhouettes into a shape from shadows method. However, because the shape is estimated by space carving rather than relative heights, the method is not restricted to terrain-like surfaces or sensitive to detection of subtle shadow boundaries.

B. A Second Phase – Shape from Self-Shadowing

As discussed in section II, shape from silhouettes cannot capture the shape of some concave areas, which never appear in object silhouettes. The new configuration described above does not overcome that limitation. Additional phases of shape capture are needed to accurately obtain the full object surface.

Object concavities are evident from shadows the object
casts on itself. In our second phase of processing, we analyze the object's self-shadowing, and adjust the surface estimate from the first phase to be consistent with the self-shadowing. To obtain multiple images with potential self-shadowing, we propose a hardware set-up as shown in Fig. 4. Shadow images on the translucent panel are obtained for multiple point light sources in front of the object. At the same time, the camera in front of the object is used to take images of the front of the object and the shadows it casts onto itself from the same light sources. While all of the front and back images are taken in the same rotation of the turntable, the translucent panel images are processed first to obtain a first surface estimate. We use the images obtained by the front camera in a second phase to refine this estimate.

Our method for refining the shape estimate with self-shadowing has three phases: shadow detection, check for contradiction, and adjustment to resolve contradiction. As mentioned in section II, shadow detection is non-trivial. In objects that have significant self-interreflection and/or spatially varying surface albedo, it is frequently the case that lit portions of the object have intensities that are lower than portions that do not have a direct view of the light source, but are lit by interreflection. To detect shadows, we make use of multiple light sources for each object position, positioning one light source L1 near the camera. Images from source L1 contain nearly no visible object shadows, since the light source viewpoint is nearly identical to the camera's. We use the images from light source L1 as a baseline for determining whether relatively dark areas in the images are shadows, or simply have low albedo. We choose a conservative shadow threshold based on these comparisons for each image region. That is, we identify areas we are very certain to be in shadow, and do not attempt to find exact shadow boundaries. The rest of our method is designed to make use of these conservative shadow estimates to continue to refine our initially conservative object estimate.

In the next step, we test whether the shadow regions identified can be explained by the current object estimate. This is in the spirit of Kutulakos and Seitz [19]. Rather than checking whether the same voxel is being viewed by two cameras by comparing the observed intensity, we are checking whether the same surface is being seen by the light source and camera by observing whether the surface appears to be lit or unlit. Detecting whether the surface is lit or unlit is more robust than checking whether intensities are the same within some preset error bound that depends on unvarying lighting conditions and near-Lambertian surfaces, and does not depend on the surface having detectable variations in texture.
We check whether our current surface explains a shadow by casting a ray from the camera to the current surface estimate, and then casting a second ray from the intersected point in the direction of the light source. If the ray does not intersect the surface again before reaching the light source, the shadow is unexplained and represents a contradiction that must be resolved.

Not all shadows observed on an object indicate that the initial surface estimate is incorrect. Consider the case of the coffee mug shown in Fig. 5. Two shadows would be observed, the attached shadow B and the shadow cast by the handle, C. Ray-tracing would show that both of these shadows are explained by the current surface. A ray to the light source in the attached area B would immediately enter the object itself, indicating the light source is not seen. A ray from area C would intersect the handle before reaching the source.

Fig. 5. Not all shadows on an object indicate a contradiction with the current object estimate. For the coffee mug shown, the shadow C cast by the handle and the attached shadow B are both explained by the current object estimate – no concavities are indicated that require further refinement.

The problem remains of what to do to resolve unexplained shadows. Our surface estimate is conservative, so we cannot add material to the object to block the light source. We can only remove material in our adjustment. Removing material anywhere outside of the unexplained shadow cannot form a block to the light source. The only option is to remove material from the unexplained shadow region, to the extent that the surface in that region is pushed back to the point that its view of the light source is blocked.

To carve out the correct amount of material, we consider the 2D slice defined by the camera view point, the point on the current surface estimate, and the light source position (Fig. 6). We need to move the point along the ray from the camera to the point where its view of the light source is blocked by the lit surface. As shown in Fig. 6, if we move each point in the shadow region so that it lies on the ray from the light source through the first lit point, we have adjusted the surface to be consistent with the captured shadow image. Essentially, we are observing this 2D slice as a terrain, and making the same inference on the relative heights of the observed points as in previous shape from shadows methods [2], [15]. The difference is that the height of the edge of the shadow is fixed from our previous surface estimate. Alternatively, this can be viewed as carving out photo-inconsistent volumes, as in Kutulakos and Seitz [19].

Fig. 6. Adjustment of the position of a surface point in correspondence to a contradiction pixel.

We assume that there are no features smaller than our image pixel size. As noted in [15] and [13], this interpretation of shadows fails when there are features sharper than the image resolution. A full solution to the shadow detection problem is not the objective of this paper. However, we assume that we do not misclassify any lit pixels as shadow. It is permissible with our approach to misclassify shadow pixels as lit without jeopardizing the conservative property of our estimate. A detailed analysis of our shadow carving approach is presented in [20].
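The consistency test might be sketched as follows, assuming a hypothetical intersect(origin, direction) routine that returns the distance to the first hit on the current surface estimate, or None if there is none:

    import numpy as np

    def shadow_is_explained(camera, pixel_dir, light, intersect, eps=1e-6):
        # First ray: from the camera through the shadowed pixel to the surface.
        t = intersect(camera, pixel_dir)
        if t is None:
            return True                     # no surface hit: nothing to test
        hit = camera + t * pixel_dir
        # Second ray: from the hit point toward the light source.
        to_light = light - hit
        dist = float(np.linalg.norm(to_light))
        direction = to_light / dist
        t2 = intersect(hit + eps * direction, direction)
        # Blocked before reaching the light => the shadow is explained.
        return t2 is not None and t2 < dist

The small eps offset simply keeps the second ray from re-intersecting the surface at its own origin.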
IV. IMPLEMENTATION

We have built a small table-top system to test our approach. Our processing software has two major components: volume carving using the silhouette images, and shadow carving using the images from the five light sources.

A. Hardware setup

Our initial table-top system is shown in Fig. 7. For convenience in calibration, we use a single camera system. The system is composed of a Kaidan MC-3 turntable with attached calibration target, a removable translucent panel, a single 150 W halogen light source in front of the object, and a SONY XC-999 camera and five light sources behind the movable panel.
Fig. 7. Photograph of a small table-top implementation of our proposed system.
We capture 320×240 resolution images. The camera is calibrated with respect to a coordinate system fixed to the turntable using Tsai camera calibration [21]. Control software allows us to automatically capture a series of N images in rotational increments of 360/N degrees. The translucent panel is simply a sheet of unwatermarked copier paper sandwiched between panes of glass. The panel location relative to the camera is calibrated by temporarily placing a checkerboard pattern between the glass panes. The positions of the light sources are measured using a Faro arm. Because we use a horizontal calibration pattern, and the working space of the Faro arm is relatively small, our scanning volume in this simple implementation is relatively small, approximately 6×6×4 cm. By revising our scheme for calibration, and using a two-camera system, the working space can be expanded; this is not a fundamental limitation of our approach.

Because we initially use a single camera system, we need to take data in two full rotations of the turntable. In the first rotation the translucent panel is in place and the silhouette images are obtained. In the second rotation of the turntable the panel is removed, without moving the object, and the series of 5 images from the 5 camera-side light sources is captured. By using a two-camera system only one rotation would be needed, and the user would not need to step in and remove the panel. Sample data are shown in Fig. 8. The upper image shows the shadow cast on the translucent panel. The lower image shows one of the five self-shadowing images captured for one object position. The original images are captured in color; however, for space and shadow carving only greyscale is needed.

B. Software – Volume Carving

We implemented the shape estimation from the silhouette images formed on the translucent panel in MatLab with a C++ postprocess.
Fig. 8. Sample data from table-top system: Upper image: sample panel image for space carving; lower image: sample self-shadowing image.
Processing begins with a k-means analysis of the images to determine the pixel intensity value dividing the lit and unlit regions for the set of captured images of the panel. The boundary of each shadow is then computed with subpixel accuracy.

A volume is defined that completely encloses the working space of the scanner. The volume is uniformly subdivided into voxels. Initially, all voxels are marked as being in the object. For each panel shadow image, all of the voxels still marked as in the object are tested by projecting their vertices onto the plane of the panel along a line originating at the light source. Vertices that project into the shadow remain in, and vertices that project outside the shadow are marked out. For voxel edges that join in and out vertices, the intersection of the edge with the shadow boundary is computed. If the intersection is closer to the in vertex than in previous shadow images, the new distance between the in and out vertices is stored at the edge. Any voxel that has all vertices marked out is eliminated from consideration in subsequent panel shadow images. At the end of processing we have a volumetric data structure that describes the object.

The volume data structure is written out to a file by MatLab. A modified Marching Cubes algorithm [7] is used to extract the object surface description. In standard Marching Cubes, vertices contain a signed distance value, and the value of zero surface crossings needs to be computed at edges. In our modified version, the edge crossings have already been computed in the panel shadow loop, and so no signed distance function interpolation is needed.

For objects which occupy a high percentage of the initial volume, the uniform volume subdivision is inefficient.
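The per-edge bookkeeping can be sketched as follows; it is a simplification, with in_shadow and crossing_dist standing in for the projection tests described above:

    def carve_panel_image(verts_in, edges, in_shadow, crossing_dist, crossings):
        # One panel shadow image: carve vertices whose projections leave the shadow.
        for v in range(len(verts_in)):
            if verts_in[v] and not in_shadow(v):
                verts_in[v] = False
        # Record, per in/out edge, the crossing closest to the in vertex seen
        # so far; these crossings replace signed-distance interpolation in the
        # modified Marching Cubes step.
        for a, b in edges:
            if verts_in[a] != verts_in[b]:
                v_in, v_out = (a, b) if verts_in[a] else (b, a)
                d = crossing_dist(v_in, v_out)
                key = (min(a, b), max(a, b))
                if key not in crossings or d < crossings[key]:
                    crossings[key] = d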
Algorithm Shadow Carving
1. for each camera view
2.    Threshold shadow images using reference image
3.    Render depth buffer using current surface
4.    for each shadow image
5.       Cast rays from each shadow pixel and mark contradictions
6.       Update depths for pixels in contradiction
7.    Update volume data using adjusted depth map
8.    Extract new surface estimate for full object

Fig. 11. In the first step of shadow carving, shadows are identified in a set of images all taken from the same viewpoint, but with different light positions.
Fig. 9. Skeleton of the shadow carving algorithm.
Fig. 10. At the start of a shadow carving step, the object is represented by both a triangle mesh (left and center images) and a volume with nodes marked in and out of the object (right image).
As in previous volume carving approaches, an octree representation could be substituted to exploit object coherence and accelerate the calculations.

C. Software – Shadow Carving

Our implementation of carving out shadow regions to refine the initial surface estimate consists of three parts: shadow detection, test for contradictions, and surface update. These three operations are performed for each camera position, with the cumulative results for all positions up to k − 1 carried forward to the processing of position k. The pseudocode of the algorithm is given in Figure 9. We have implemented the shadow detection in MatLab, and the contradiction test and surface update in C++, using OpenGL for rendering.

At the end of the processing for position k − 1, the object is represented in two ways, as shown in Figure 10. It is represented as a triangle mesh, shown shaded and in wireframe in the left and center images of the figure. It is also represented as a volume, as shown on the right, with blue indicating volume nodes outside the object, green inside, and red indicating nodes with edges crossed by the object surface.
Fig. 12. The surface estimate from the k − 1 step is rendered as a depth image in the same view as the shadow images from the k step.
Because a full solution to the shadow detection problem is not the object of this investigation, we use a simple approach for the first phase. We use the image with the light near the camera position as our basis for comparison. Each of the other four images is analyzed for areas that are dark relative to the reference image. We select a threshold value that is safely within the dark region for all of the images. Because we initially worked with uniform albedo objects, we simply use the same threshold value across the images. A set of typical images used for the position k update is shown in Figure 11, with the areas classified as shadow superposed in color over the original intensity images on the right side of the figure.

To test whether points in observed shadow regions are in contradiction with our initial surface, we begin by rendering a depth image of our current surface estimate (by simply reading back the z-buffer after rendering the surface with OpenGL). The depth image has the same resolution as our captured shadow images. In Figure 12, the image on the left shows the k − 1 estimate projected onto an image from view k, with the grid intersections indicating pixel centers. The image on the right of Figure 12 shows the k − 1 estimate rendered as a depth image in the k view. We then process each shadow image for this view.
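One plausible reading of this detector is a ratio test against the reference image; the threshold value here is invented, the paper only requiring one safely inside the dark region:

    import numpy as np

    def detect_shadows(image, reference, thresh=0.35):
        # Pixels much darker than in the reference image (light near the
        # camera) are confidently in shadow; normalizing by the reference
        # discounts spatial albedo variations.
        ratio = image.astype(float) / np.maximum(reference.astype(float), 1.0)
        return ratio < thresh               # True = conservatively in shadow

Choosing a low threshold misses some true shadow pixels, but by the argument above that only makes the carving more conservative, never incorrect.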
Fig. 14. The depth image changes indicated by the shadow images in the k view. The magnitude of the changes into the image plane is shown on the left, and the resulting k depth image is shown on the right.
Fig. 13. Each pixel in a shadow area (here shown in purple) is examined for contradiction. In the example shown, a ray from the shadowed pixel in the depth image has a clear view of the light source, indicating the surface should be carved.
For each pixel in the detected shadow region that has a non-background depth, we test whether a ray from the point in the depth map to the light is blocked by the object. An example of the rays from the camera and light source is shown from a distance and close up in Figure 13. Any ray casting algorithm could be used; in particular, casting a ray through an octree data structure built in the initial surface estimation would be efficient. For this trial implementation, however, since the entire object was in the camera view, we just rendered the ray to the light source into the same depth buffer and checked that it was always in front of the surface. Any shadow pixel that has a clear view of the light source from the current estimated surface, as is the case in Figure 13, is marked as a contradiction.

To resolve pixels in contradiction, we adjust their height in the depth map. Considering the 2D slice shown in Fig. 6, the depth of the contradiction pixel is adjusted along the line of sight from the current camera view point. The depth is increased to the point where it reaches the ray from the light source through the nearest unexplained pixel, as shown in Fig. 6. In terms of Fig. 13, the depth of the pixel is adjusted along the green ray away from the camera,
Fig. 15. (a) A depth map is rendered using a surface extracted from volume data. (b) The depth map is adjusted to explain shadows. The volume grid is updated in preparation for extracting a new full surface estimate of the object.
until the yellow ray to the light source pierces the surface at the edge of the purple shadow region. The resulting depth changes for all the shadow images shown in Fig. 11 are shown in Fig. 14. The magnitude of the depth adjustments is shown on the left, with the adjustments being made along camera lines of sight, not perpendicular to the object. The depth image after adjustments is shown on the right. The depth map for the current camera view is used to represent the adjustments for all shadow images from that view.

The final step for a camera view is to update the volume representation of the surface, so that a new full object representation can be extracted and carried forward to the next view. We update the volume values as shown in 2D in Figure 15. We test the voxel vertices to see if they lie in front of or behind the depth map surface, and update the labelling of vertices which have been carved out. Figure 16 shows the process in 3D for the example illustrated in the previous figures.
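For the 2D slice of Fig. 6, the push-back can be sketched as follows; this is a simplified version along one image row, whereas the implementation adjusts depths along perspective lines of sight:

    def push_back(depth, shadow_cols, edge_col, ray_slope):
        # depth[c]: current depth along the camera line of sight at column c.
        # edge_col: first lit pixel at the shadow edge; ray_slope: depth gained
        # per column along the ray from the light through that edge point.
        for c in shadow_cols:
            on_ray = depth[edge_col] + ray_slope * abs(c - edge_col)
            depth[c] = max(depth[c], on_ray)   # only carve, never add material
        return depth

Taking the maximum of the old and the pushed-back depth enforces the conservative property: material is only ever removed, and only as far as the light ray through the shadow edge.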
Fig. 16. A 3D view of the volume vertices that need to be updated to outside the object as a result of the shadow carving. On the left, the k − 1 surface is shown in yellow, and the k surface in white. A number of green, formerly interior volume nodes are revealed, and their classification needs to be updated.
The k − 1 depth field is shown in yellow, and the k depth field is shown in white. A number of the green volume nodes, inside the k − 1 surface approximation, are now outside the k surface. We update the edge values for pairs of connected vertices that cross the adjusted portions of the depth map. We then use these updated values in the modified Marching Cubes processing to produce a new full surface representation that can be rendered into a new depth map for the next camera view.

V. RESULTS

We have used our trial system to scan small clay objects. In each case, we use images on the shadow panel for 24 positions of the turntable, spaced 15 degrees apart. Results for a simple object approximately 3 cm on each side, with a 2 cm diameter, 6.5 mm deep indentation on one side, are shown in Fig. 17. The top left image shows a captured image of the object, and the top right shows the results of the initial shape from silhouettes processing. The four images in the next two rows of the figure show the results of carving the shadows obtained from one, two, three and four positions respectively. The front indentation is crudely carved out. Note that the top of the object is in actuality slightly concave. Because the shadow panel images were taken with the light above the object, the initial estimate shown in the top row shows a convex shape for the top of the object. Besides carving out the indentation, the top of the object is also partially carved out using the self-shadowing images.

To assess the accuracy of our carving results, we used a ShapeGrabber laser scanner to capture the 3D shape of the object, shown in Figure 18. We aligned and merged the individual range images using the RapidForm2001 software package. RapidForm2001 was also used to find the rigid transformation that would best align the original silhouette model to the laser scan.
Fig. 17. Results for simple object after various stages of processing. Top row: captured image and results of shape from silhouettes; second and third rows: results from shadow carving for four successive views, with four shadow images used for each view.
The same rigid transformation was applied to all successive approximations of the shape obtained with shadow carving. We intersected a horizontal plane with the models to produce 2D contours. The progression of approximations, shown in yellow, cyan, blue, magenta and green, approaches the red contour we obtained for the laser scan.

The global quantitative improvements obtained by the shadow carving are shown in Figure 19. We used the RapidForm function that measures the distance between shells to find the distances between the laser scan and the original silhouette model, on the left, and between the laser scan and the model after shadow carving for 4 views. The magnitude of the distances is shown as a colormap painted on the laser scan model, with the distances ranging from 0 to 10 mm.

Another example of results from our trial system is the scan of the head of the small clay figure shown in Figs. 7 and 8. The figure has a challenging range of concavities, including slight concavities at the ears, deep and narrow concavities in the face around the eyes, and a wide and steep
Fig. 20. Results for clay figure after various stages of processing. From left to right: Reference image; Shape from silhouettes; Shadow carving from one camera position; Shadow carving from an additional camera position.
Fig. 18. Comparison of laser scan and shadow carving results. The left image shows the laser scan, with a red contour showing the intersection of the model with a horizontal plane. In the center is the original silhouette model aligned with the laser scan. Contours showing the improvement of the shadow carved model are superposed on the model. On the right, the contours are viewed from above.
Fig. 19. Comparison of the laser scan with the original silhouette model (left) and with the model after shadow carving for four views (right). The magnitude of the distance between the models is shown as a colormap painted on the laser scan model, with distances ranging from 0 to 10 mm.
concavity in the small hat in back. In Fig. 20 the results of our processing are shown. The top row of images shows the figure from the front, and the second row shows the figure from the back. The leftmost image in each row is the figure as captured with the light nearest the camera view point turned on. Images on the shadow panel were obtained for 24 positions of the turntable, spaced 15 degrees apart. The second image in each row of Fig. 20 shows the result of the initial space carving. The object shows no tunnelling or shrinkage such as could occur with faulty silhouette boundary detection. The third and fourth images in each row of Fig. 20 show the object after carving using the shadows from 1 and 2 camera positions respectively, i.e., after using 4 and 8 self-shadowing images. The concavities around the eyes and ears have become visible. The large and steep concavity has been significantly carved out.

We also scanned this object with the laser scanner, and aligned our shadow carved models with the laser scanned model. We again used the function in RapidForm2001 that measures the distance between two shells. Figure 21 shows the improvement before and after shadow carving on the front of the object (top row), and before and after shadow carving on the back (bottom row).

Clearly, our results are not a finely detailed object representation. However, the surface after shadow carving is a much better estimate, and a much better starting point for the application of an additional method, such as photometric stereo, which is good at estimating shape locally.
VI. CONCLUSIONS

We have demonstrated a new system for capturing object shape using inexpensive digital cameras and lamps. Our method combines shape from silhouettes and shape from shadows. We introduced a new hardware configuration that makes silhouette extraction more robust by converting it to a simple extraction of an unoccluded sharp shadow. We presented a new variation of shape from shadows that is not restricted to terrain surfaces. Our new approach makes use of an existing shape estimate by adjusting heights indicated by unexplained shadows in one view, and carries the results forward by using these heights to update a volumetric object description. Unless a very large number of shadow images can be obtained, our new method does not obtain a finely detailed object description. However, we produce a much better surface estimate than shape from silhouettes alone. This improved estimate is suitable for further refinement by any shape estimation method that works well in local regions of the surface. In the future, we also plan to perform an error analysis to conservatively bound the amount that can be carved, taking into account uncertainties in calibration and quantization errors.

Fig. 21. Comparisons of the shadow carved and laser scanned object are illustrated by painting the difference between the objects onto the laser scanned object. The top row shows the improvement from shadow carving for a view of the front of the object; the bottom row shows the improvement for a view of the back of the object.

REFERENCES
[1] W. N. Martin and J. K. Aggarwal, "Volumetric descriptions of objects from multiple views," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, no. 2, pp. 150–158, Mar. 1983.
[2] M. Hatzitheodorou and J. R. Kender, "An optimal algorithm for the derivation of shape from shadows," in Proc. of the Computer Society Conference on Computer Vision and Pattern Recognition, Ann Arbor, MI, June 1988, pp. 486–491.
[3] B. K. P. Horn and M. J. Brooks, Shape from Shading, MIT Press, 1989.
[4] C. Rocchini, P. Cignoni, C. Montani, P. Pingi, and R. Scopigno, "A low cost optical 3D scanner," Computer Graphics Forum, vol. 20, no. 3, pp. 299–309, 2001.
[5] S. Savarese, H. Rushmeier, F. Bernardini, and P. Perona, "Shadow carving," in Proc. of the Eighth IEEE International Conference on Computer Vision, Vancouver, Canada, July 2001.
[6] R. Szeliski, "Rapid octree construction from image sequences," Computer Vision, Graphics and Image Processing, vol. 58, no. 1, pp. 23–32, July 1993.
[7] W. Lorensen and H. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," Computer Graphics, vol. 21, pp. 163–169, 1987.
[8] M. K. Reed and P. K. Allen, "3-D modeling from range imagery: An incremental method with a planning component," Image and Vision Computing, vol. 17, pp. 99–111, 1999.
[9] J. Y. Zheng, "Acquiring 3-D models from sequences of contours," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 163–178, Feb. 1994.
[10] A. Laurentini, "How far 3D shapes can be understood from 2D silhouettes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 2, pp. 188–195, Feb. 1995.
[11] B. Leibe et al., "Toward spontaneous interaction with the perceptive workbench," IEEE Computer Graphics & Applications, vol. 20, no. 6, pp. 54–65, Nov. 2000.
[12] S. A. Shafer and T. Kanade, "Using shadows in finding surface orientations," Computer Vision, Graphics and Image Processing, vol. 22, pp. 145–176, 1983.
[13] L. N. Hambrick, M. H. Loew, and R. L. Carroll, "The entry-exit method of shadow boundary segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, no. 5, pp. 597–607, Sept. 1987.
[14] D. J. Kriegman and P. N. Belhumeur, "What shadows reveal about object structure," in Proc. of the European Conference on Computer Vision, 1998, Lecture Notes in Computer Science, vol. 1407.
[15] D. Raviv, Y.-H. Pao, and K. A. Loparo, "Reconstruction of three-dimensional surfaces from two-dimensional binary images," IEEE Transactions on Robotics and Automation, vol. 5, no. 5, pp. 701–710, Oct. 1989.
[16] M. S. Langer, G. Dudek, and S. W. Zucker, "Space occupancy using multiple shadow images," in Proc. of the International Conference on Intelligent Robots and Systems, Pittsburgh, PA, August 1995.
[17] M. Daum and G. Dudek, "On 3-D surface reconstruction using shape from shadows," in Proc. of the Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, June 1998, pp. 461–468.
[18] D. K.-M. Yang, Shape from Darkness Under Error, Ph.D. thesis, Columbia University, 1996.
[19] K. N. Kutulakos and S. M. Seitz, "A theory of shape by space carving," in Proc. of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, Sept. 1999, pp. 307–313.
[20] Reference removed for blind review.
[21] R. Y. Tsai, "A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Journal of Robotics and Automation, vol. RA-3, no. 4, pp. 323–344, Aug. 1987.