Image-Based Models: Geometry and Reflectance Acquisition Systems by

Christopher Damien Tchou

B.S. (Carnegie Mellon University) 1999

A thesis submitted in partial satisfaction of the requirements for the degree of Master of Science

in

Electrical Engineering and Computer Science

in the

GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA at BERKELEY

Committee in charge:
Professor Lawrence A. Rowe, Chair
Professor David A. Forsyth
Professor Carlo H. Séquin

Fall 2002

The thesis of Christopher Damien Xien Nung Tchou is approved:

Chair

Date

Date

Date

Date

University of California at Berkeley

Fall 2002

Image-Based Models: Geometry and Reflectance Acquisition Systems

© 2002 by Christopher Damien Tchou


Abstract

Image-Based Models: Reflectance and Geometry Acquisition Systems

by

Christopher Damien Tchou

Master of Science in Electrical Engineering and Computer Science

University of California at Berkeley

Professor Lawrence A. Rowe, Chair

This thesis describes two scanning devices for acquiring realistic computer models of real objects. The first device captures the reflectance properties of a human face by rapidly lighting the face from a large number of directions. With this data, it is possible to reproduce the appearance of the subject under any complex illumination environment. The second device acquires detailed geometric models of sculptures, suitable for polygonal rendering systems. This thesis discusses the design considerations and implementation of these two systems, including the calibration, data processing, and rendering algorithms.

_________________________________________ Professor Lawrence A. Rowe Thesis Committee Chair


Contents

List of Figures
Acknowledgements
1 Introduction
  1.1 Reflectance Acquisition
  1.2 Geometry Acquisition
  1.3 Thesis Overview
2 Reflectance Field Scanning
  2.1 Hardware
    2.1.1 Lightstage
    2.1.2 Cameras
    2.1.3 Projector
    2.1.4 Calibration
  2.2 Data Capture
  2.3 Image-Based Relighting
    2.3.1 Reflectance Functions
    2.3.2 Realtime Image-Based Relighting
    2.3.3 Image-Based Lighting With Environment Mattes
  2.4 Summary
3 Geometry Scanning
  3.1 Scanning Hardware
    3.1.1 Projector
    3.1.2 Camera
    3.1.3 Calibration
  3.2 Data Capture
  3.3 Sub-pixel Precision
  3.4 Summary
4 Conclusion
Bibliography


List of Figures

Figure 1: Occlusion
Figure 2: The Reflectance Scanner
Figure 3: The Lightstage
Figure 4: Lightstage Data Set
Figure 5: Re-sampling a light environment
Figure 6: Relighting the face
Figure 7: An Array of Reflectance Functions
Figure 8: Relighting Equation on Reflectance Functions
Figure 9: Reflectance function re-lighting operation in DCT space
Figure 10: The Face Demo
Figure 11: Calculating the environment matte
Figure 12: Environment Matting
Figure 13: Scanning sculptures in the Basel Skulpturhalle
Figure 14: The Sculpture Scanning Rig
Figure 15: The Proxima Ultralight x350 DLP Projector
Figure 16: The Pulnix TM-1040 Digital Camera
Figure 17: Sculpture Scanner Calibration Device
Figure 18: Calibrated Camera and Projector Distortion
Figure 19: Identifying pixel location from an image pattern sequence
Figure 20: Horizontal and Vertical Gray Code Patterns
Figure 21: Gray code response for highest frequency patterns
Figure 22: Deviance from sinusoidal sub-pixel curve
Figure 23: Parthenon Frieze Scan
Figure 24: Acquired frieze geometry
Figure 25: Caryatid scan
Figure 26: Caryatid Rendering


Acknowledgements

Above all I would like to thank my advisors Larry Rowe and Paul Debevec, the members of my committee David Forsyth and Carlo Séquin, and my parents Patrick and Shirley Tchou. Additional thanks to Tim Hawkins for extensive error finding, readability testing, and sanity checking, Barbara Halliday for telling me to get back to work ten times a day, and Okan Arikan, Patrick Nelson, and Jon Cohen for programming expertise. Also I would like to thank Brian Emerson, Andrew Jones, Diane Suzuki, and Mark Brownlow for putting together the models and renderings of the Parthenon scans, and Marcos Fajardo for the use of his awesome global illumination renderer Arnold.


Chapter 1

Introduction

This master's thesis describes the development of two scanning devices that can acquire computer models of real objects. Both scanning devices operate by recording the way in which objects reflect light, but each focuses on capturing a different aspect of the object while providing data and flexibility not obtainable with current commercial scanners. The first device, the reflectance scanner, obtains data on the complex, spatially varying reflectance properties of a human face. The second device, the geometry scanner, rapidly acquires detailed information on the surface geometry of sculptures. While the models produced by these scanners can serve many uses, they are similar in that they can both be used to generate realistic computer-generated images of the object.

The creation of realistic computer-generated images entails reproducing the effects of light in the real world. Ignoring purely manual methods, there are two general approaches to the creation of realistic imagery with a computer: light simulation and image-based techniques. The two scanners presented here incorporate ideas from both methodologies.

In light simulation, objects are usually represented as a surface along with some reflectance characteristics that describe how the surface interacts with light. For example, the reflectance may specify diffuse color, specular intensity, or a more general representation of the interaction, like a Bi-directional Reflectance Distribution Function (BRDF) [27]. The advantage of using light simulation is the ease with which the models can be modified. Once they are created, it is fairly straightforward to make changes to the geometry and surface properties of the model. The disadvantages are that light simulation computations are lengthy, and it can be difficult to create models that look realistic when rendered. The data acquired by the geometry and reflectance scanners described in this thesis can be processed to derive realistic models suitable for light simulation. Models created by the reflectance scanner can have complex reflectance properties, whereas the geometry scanner can provide only simple approximations of the reflectance properties. The geometry scanner, on the other hand, can provide more detailed geometric information.

The other approach to realistic computer imagery, namely image-based techniques, takes advantage of the inherent realism of photographs. It leverages the light 'simulation' provided by the real world to 'pre-calculate' the appearance of objects. Image-based object models are often easy to acquire and can render quickly, but they have the disadvantage of being hard to modify. Changing the geometry, viewpoint, or lighting in an image-based model can be a non-trivial task, involving either re-capturing the images or inverting the illumination in the existing images to yield a light-simulation model [42]. The flexibility of image-based techniques can be increased by taking more images. For example, each data set acquired by the reflectance scanner is composed of over 4000 images of the subject. With these images, we can derive surface and reflectance models akin to the light simulation methods, or, in a purely image-based approach, use them directly to reproduce the appearance of the subject under novel lighting conditions.

The next two subsections give a more detailed introduction to each scanner, discuss related work for each, and provide a general overview of the topics covered in the rest of the thesis.

1.1 Reflectance Acquisition

The reflectance scanner was built to capture spatially varying reflectance properties across the surface of a human face. The reflectance of a surface describes how it reflects photons; in other words, how it transforms incident light into radiant light. For a single point on the surface, reflectance can be described by a bi-directional reflectance distribution function (BRDF) [27]. A number of parameterized functions have been developed by various researchers to approximate simple BRDFs [4] [20] [29] [35] [39]. Modeling reflectance as a BRDF assumes reflection happens entirely at the surface of the object.
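For reference, the standard form of the BRDF definition (included here for context; this notation is not spelled out in the thesis text itself) relates reflected radiance to incident irradiance:

    f_r(\omega_i, \omega_o) = \frac{dL_o(\omega_o)}{L_i(\omega_i)\,\cos\theta_i\,d\omega_i}

where \omega_i and \omega_o are the incoming and outgoing directions, L_i and L_o are the incident and reflected radiance, and \theta_i is the angle between \omega_i and the surface normal.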

Hanrahan and Krueger [17] created a more advanced model via Monte Carlo simulations to simulate the reflectance of layered surfaces due to subsurface scattering. Their work was motivated by human skin layers, and they produced qualitatively skin-like reflectance properties.

While the data acquired from the reflectance scanner can be analyzed to determine point reflectance for all the points on an object, one can also use the data more directly by considering the reflectance of the object as a whole. This 'global' reflectance is called the reflectance field. The reflectance field is a description of how light traveling along each ray incident to the object is transferred to all exiting rays of light. It should be noted that an exiting ray of light may have a contribution from an incident ray even if the two rays do not intersect at a common point on the surface, because the light can bounce multiple times within the volume. In fact, one does not need to assume that the objects have a well-defined surface; they could, for example, be a transmissive object, such as a cloud. Like a light field, which describes the intensity of light for all rays through a surface [13], the surface over which a reflectance field is parameterized will generally not coincide with any physical surface; rather, it will be a simple convex shape enclosing the object. In terms of light fields, we can describe a reflectance field as a function that transforms an incident light field into a radiant light field. With a full reflectance field, we can reproduce the appearance of an object under any lighting condition, from any viewpoint.

Several researchers have addressed this method of re-lighting. Nimeroff et al. [28] and Haeberli [16] showed how correct views of a scene under different lighting conditions can be created by summing images of the scene under a set of basis lights. However, they do not take into account changes in viewpoint. Wong et al. [40] applied a similar technique to light fields [21], creating a repositionable light source as well as allowing changes in viewpoint. Both algorithms take advantage of the superposition principle of light: the fact that the image of a scene under two light sources is the sum of the images of the scene under each light source separately. Similarly, the reflectance scanner described in this thesis takes images of the subject under a large array of lights, allowing one to calculate its appearance in any lighting environment. Zongker et al. took a different approach with environment matting [45]. They showed that by illuminating shiny or refractive objects with a set of coded lighting patterns, the objects could be composited over an arbitrary background by approximating the direction and spread of reflected and refracted rays. While not as physically accurate as the basis light technique, this method can produce much higher resolution data, and therefore retains the high-frequency environmental information present when re-lighting highly specular or translucent objects. To handle these types of objects, a method for integrating environment matting with the reflectance scanner is presented in section 2.3.3.

The reflectance field in its full form is eight dimensional, far too large to acquire in its entirety with current techniques.
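Concretely, following the parameterization used in Debevec et al. [8], the reflectance field can be written as a function of an incident ray and a radiant ray, each described by a position on the enclosing surface and a direction:

    R = R(u_i, v_i, \theta_i, \phi_i;\; u_r, v_r, \theta_r, \phi_r)

which is why the full function is eight dimensional: four dimensions for the incident light field and four for the radiant light field.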

The reflectance scanner uses two cameras in fixed positions, from which the appearance of the subject under a spot light from over 2000 directions is captured. The scanner incorporates a geometry scan in order to geometrically interpolate intermediate camera viewpoints. We use this data to derive models of diffuse and specular reflectance that can be used to interpolate the subject's appearance. To simplify further, it is assumed that the spotlight used to illuminate the subject generates a homogeneous light field; that is, all the light is assumed to be traveling in the same direction with the same intensity, as if the light source were infinitely far away. This assumption agrees with the concept of a lighting environment, which records the intensity of light from each direction. While lighting environments only record data for a single point in space, they can be applied to an entire object with little error if the light sources are sufficiently distant. Thus, we can use the data captured by the reflectance scanner directly to compute the appearance of the subject under any lighting environment. To compute the re-lighting in real time, compression techniques are implemented to reduce the amount of data to be read from memory for each calculation. A detailed description of the reflectance scanner and the algorithms used is presented in Chapter 2.


1.2 Geometry Acquisition

The second device, which is intended to scan sculptures, takes a different approach. Its main goal is to quickly acquire detailed surface geometry. The surface reflectance is estimated using only a few photographs from the scanning positions.

The two most common methods used to obtain geometric scans of objects are time-of-flight and triangulation. Time-of-flight scanners count the elapsed time between an outgoing pulse of light and the arrival of its reflection. This time is used to calculate the distance it has traversed. This technique is employed by scanners produced by ZCam and Cyra. Time-of-flight scanning generally has few occlusion problems, since the emitter and detector can be co-located. However, the scanners must have extremely precise timing, and it can be challenging to align the resulting geometry with reflectance data captured from a separate camera.

The geometry scanner described in this thesis uses the second technique, namely triangulation, or 'stereo' scanning. That is, given the images of a point from two cameras, the 3D location of that point is triangulated by a computer. Triangulation scanners are generally inexpensive, as they can be made from consumer parts. However, since depth calculation accuracy is dependent on the parallax between the two cameras, they are often widely separated, resulting in more problems with occlusion.

There are two prerequisites that must be met in order to estimate 3D depth using triangulation. First, the orientations of the cameras (i.e., their extrinsic parameters) and their projective properties (i.e., focal length and warping) must be calculated in the camera calibration step [12]. Second, one must determine which points in the two images are projections of the same 3D point. Determining correspondence is a non-trivial task, because a point on the object can appear different when viewed from another direction, or it may not be visible at all (see Figure 1). To circumvent this difficulty, one camera is replaced with a digital projector that projects light onto the scene. A projector is, in essence, the inverse of a camera: where a camera captures the incoming rays of light, the projector creates outgoing rays of light. The 3D position of a point on the object is determined by the intersection of the rays from the camera and projector. With a projector, one can now easily solve the correspondence problem; in the minimal case, we can light a single projector pixel and search for it in the camera image, thereby determining which camera pixel corresponds to that projector pixel. Using a projector or other light source in this way is known as "active structured light scanning", as we are changing the scene by introducing light with a known structure.

Figure 1: Occlusion. Because the camera and projector do not see the subject from the same angle, the visible surfaces differ. Only surfaces visible from both the projector and camera can be scanned.
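In practice the two rays rarely intersect exactly because of noise, so the triangulated point is usually taken near the point of closest approach of the two rays. A minimal sketch of this step (generic ray-to-ray triangulation, not code from the thesis; all names are illustrative):

    import numpy as np

    def triangulate(cam_origin, cam_dir, proj_origin, proj_dir):
        # Midpoint of the shortest segment between the camera ray and the
        # projector ray, a standard closest-approach construction.
        d1 = cam_dir / np.linalg.norm(cam_dir)
        d2 = proj_dir / np.linalg.norm(proj_dir)
        w0 = cam_origin - proj_origin

        a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
        d, e = d1 @ w0, d2 @ w0
        denom = a * c - b * b            # approaches 0 for parallel rays
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom

        p1 = cam_origin + s * d1         # closest point on the camera ray
        p2 = proj_origin + t * d2        # closest point on the projector ray
        return 0.5 * (p1 + p2)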

Similar scanners have been constructed by Trobina [36] and Zitnick [44], and used by Levoy [22] and Rushmeier [31], among others. Because the scanner cannot see the entire object at once, several scans are required. Once acquired, individual scans can then be aligned using an iterative closest point (ICP) algorithm [3], and merged with polygon zippering [38] or volumetric range merging [6] to produce a model of the entire object. The presentation of the geometry scanner focuses on the design of the device, as well as the processing and sub-pixel analysis to produce good single scans.

The process to acquire a scan is as follows. First, the geometry scanner is calibrated by scanning a known object called the calibration object. In this case, a planar checkerboard object is scanned from several angles, and the calibration is derived using the method presented by Zhang [43]. After calibration, the object to be captured is scanned. For each scan, a pattern is projected in sequence from the projector onto the subject, and the appearance of each pattern is captured by the camera. The entire series of patterns uniquely identifies each pixel in the projector.

scanned. For each scan, a pattern is projected in sequence from the projector onto the subject, and the appearance of each pattern is captured by the camera. The entire series of patterns uniquely identifies each pixel in the projector.

The resulting camera images

are processed to retrieve the pixel correspondences. With these correspondences and the calibrations derived from the scans of the calibration object, the surface of the object can be triangulated. Our sub-pixel analysis technique goes further in estimating the exact sub-pixel location of the center of each camera pixel in the projector image.

This technique is

similar to the sub-stripe estimation of McIvor and Valkenburg used for calibration [25], and yields more accurate depth information than considering only pixel-to-pixel correspondence between the camera and projector. The geometry scanner is presented fully in Chapter 3.


1.3 Thesis Overview

The remainder of this thesis describes the two scanners, the algorithms used to compute the geometry, and several re-lighting applications. It is organized as follows. Chapter 2 discusses the implementation of the reflectance scanner and image-based relighting methods using the resulting data. Chapter 3 describes the design of the geometry scanner and improved geometry algorithms, as well as showing global illumination renders using the resulting models. Chapter 4 presents a brief summary.


Chapter 2

Reflectance Field Scanning

This chapter presents an apparatus for acquiring data on the reflectance field of a human face, as well as applications that use this data.¹ Though designed to capture the appearance of people, the reflectance scanner also works well for other objects. The scanner takes a series of images of the subject under a spotlight five feet away. Each image is captured with light coming from a different direction. The entire data set of lighting directions covers nearly the entire sphere around the subject. The hardware used in the scanner, shown in Figure 2, is described in section 2.1. Calibration of geometry acquisition is also described. Section 2.2 describes the data capture method.

Figure 2: The Reflectance Scanner ready to capture. The operators stand on a scaffold to handle the cords controlling the light, while video cameras capture the subject's appearance.

¹ This thesis goes into more depth on the implementation and algorithms described in the paper 'Acquiring the Reflectance Field of a Human Face' [8], presented at SIGGRAPH 2000.

By scaling and combining the captured images, one can recreate the appearance of the subject under any light environment. This re-lighting takes advantage of the superposition principle of light: the appearance of a person under two light sources is the sum of their appearance under each light source separately. Additionally, assuming care has been taken to avoid non-linear operations (e.g., clamping bright values), the color channels of each image can be scaled to produce the effect of different color light sources. Thus, scaling each acquired image to the color and intensity of the light coming from the corresponding direction in the environment and summing the results yields the appearance of the subject under that lighting environment. Section 2.3 goes into further detail on several variants of this rendering algorithm.

2.1 Hardware

The hardware used to acquire data consists of a lightstage, two digital video cameras, a projector, and a calibration object. As described below, the hardware was designed to be cheap, simple to build, easy to set up, and yield good reflectance data.

2.1.1 Lightstage

The lightstage (Figure 3) is a device we designed to move a light to nearly any position around a subject. The lightstage is used to investigate the reflectance field of an object by providing a basis set of lighting conditions. The light can be positioned to provide illumination from any direction, excluding only a solid angle of 50 degrees towards the ground. While these illumination directions may be significant for arbitrary objects, for seated human subjects they are blocked by the lower body, producing a complete shadow over the face.

The device was built over the course of two weeks by Westley Sarokin and Tim Hawkins, using approximately $1000 worth of material from the local hardware store. It is composed of a wooden support frame on top of which was mounted an axle that is attached to two sets of PVC pipes holding the light source.

The axle allows the light source to be rotated about the vertical axis, controlling longitude. The axle is powered by a person pulling steadily on the power cord, which must be wrapped around the axle before each data acquisition. The inner PVC pipe (a.k.a. the phi bar) is hinged with the outer PVC pipe (the theta bar), allowing the phi bar to rotate about the horizontal axis, controlling the latitude of the light. The phi bar is raised and lowered via a ball-chain that passes through the central axis. This chain is let out during data capture, lowering the light source from its topmost position, shown in the diagram in Figure 3, to the bottom of the lightstage. The path of the light source during a data capture can be seen in the long-exposure photograph below in Figure 3.

To capture alignment readings on the rotation of the theta bar, a simple beeper was installed on the central axis to beep each time the theta bar passed through the zero angle. At each beep, the phi bar was lowered by one mark on the ball-chain. These marks were calculated to produce an equiangular spread in latitude. We can determine the exact time we hit theta zero on each rotation by analyzing the audio track of the video cameras to extract the theta beeps. The intermediate angles of theta are interpolated linearly between each pair of beeps. This control mechanism is fairly accurate, as the theta bar had little resistance and significant angular momentum, which makes it easy to keep the bar rotating at a fairly constant speed.

The lightstage allows a dataset of the subject to be captured with the light in many positions quickly and accurately. The setup of the control cords, while labor intensive, is cheap and requires little mechanical, electrical, or robotics expertise.
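As a sketch of this timing step (hypothetical variable names; the thesis does not give this code), the per-frame light direction can be recovered from the beep timestamps roughly as follows:

    import numpy as np

    def frame_angles(frame_times, beep_times, n_latitudes=23):
        # Assign (theta, phi) to each video frame: theta is interpolated
        # linearly between consecutive theta-zero beeps, and phi advances one
        # equiangular notch per revolution.  The exact notch-to-latitude
        # mapping depends on the ball-chain marks and is only approximated here.
        frame_times = np.asarray(frame_times)
        beep_times = np.asarray(beep_times)

        rev = np.clip(np.searchsorted(beep_times, frame_times) - 1,
                      0, len(beep_times) - 2)
        frac = (frame_times - beep_times[rev]) / (beep_times[rev + 1] - beep_times[rev])
        theta = 2.0 * np.pi * frac

        phi = np.pi * (rev + 0.5) / n_latitudes - np.pi / 2.0
        return theta, phi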


Figure 3: The Lightstage. Above, a schematic of the device. Below, a long exposure photograph of the lightstage in operation, showing the path of the light source during a reflectance capture session.


2.1.2 Cameras

The main data acquisition devices consisted of two Sony VX1000 mini-DV video cameras. These cameras, one to the front left and one to the front right of the subject, were set to record in parallel during both the geometry and reflectance scans. Each camera captures a 720 x 480 color image at 59.94 fields per second (interlaced). These cameras are relatively inexpensive, self-contained, and easy to use. Because they can be set to freely record in real time, the cameras allow data to be acquired quickly. The images, unfortunately, are compressed with an interlaced block transform compression algorithm to reduce bandwidth. This compression introduces spurious high frequency noise (i.e., JPEG jaggies), and also encodes the chrominance information at half resolution in each direction, degrading the color data.

2.1.3 Projector

For geometry acquisition, we wanted to acquire scans from the left, right, and front of the subject. The scanning was accomplished using a single LCD projector placed on a rolling platform. Aside from being more economical than buying three separate projectors, this setup had the added advantage that it could be easily removed during reflectance capture, so as not to interfere with the phi bar on the lightstage. Unfortunately, the geometry scans must be re-calibrated each time the projector is moved, necessitating the development of a special calibration device, described in section 2.1.4. The projector was connected to a laptop that generated the patterns (described in section 2.2.1). The LCDs in the projector can take up to 1/15th of a second to stabilize to a new color value, so the maximum attainable frame rate was 15 FPS. To simplify the final processing, each projected pattern was separated by several black frames, which could be algorithmically detected, allowing automatic processing of the captured video.

2.1.4 Calibration

To calibrate the geometry scan, an object with known geometry was required. Because the geometry scans must be calibrated each time the projector is moved, the calibration object had to be easily placed and removed, without touching the subject or the cameras. The resulting design, constructed by Westley Sarokin, was a six-sided calibration mask (nicknamed the Mask of Fury). The mask is made of cardboard and styrofoam with attached calibration targets, and it was attached to a boom arm so that it could be raised and lowered over the subject's head. Knowing the positions of the calibration targets on the Mask of Fury, one can determine the orientations of the camera and projector relative to the subject.

The method used for calibration was a direct local fit to the calibration object. That is, given a collection of visible, labeled target points on the calibration object, a set of virtual planes through those points is chosen, and homographies are calculated that relate the position of the points in the camera image to their position on the virtual planes. Then, for each image pixel, a 3D point on each virtual plane is generated from the homographies, and a ray is fit to the set of all points. While this method is slow to apply because it must calculate homographic projections for a number of points, it requires no orientation estimation or non-linear minimization to solve for a camera model. Instead, rays are generated directly from the calibration scan. This method works reasonably well for low-distortion cameras like the VX1000, as it is essentially blending a number of non-distorted models.
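A minimal sketch of this per-pixel ray fit, assuming the per-plane homographies and the 3D frames of the virtual planes are already known (the data structures below are placeholders, not the thesis's):

    import numpy as np

    def pixel_ray(pixel_xy, homographies, plane_frames):
        # homographies[k]: 3x3 matrix mapping homogeneous pixel coordinates to
        # 2D coordinates on virtual plane k (estimated from the target points).
        # plane_frames[k]: (origin, u_axis, v_axis) of plane k in world space.
        pts = []
        px = np.array([pixel_xy[0], pixel_xy[1], 1.0])
        for H, (origin, u, v) in zip(homographies, plane_frames):
            s, t, w = H @ px
            pts.append(origin + (s / w) * u + (t / w) * v)  # 3D point on plane k
        pts = np.asarray(pts)

        # Least-squares line fit: origin at the centroid, direction along the
        # principal axis of the fitted points.
        centroid = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - centroid)
        direction = vt[0] / np.linalg.norm(vt[0])
        return centroid, direction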

This section has described the hardware involved in the reflectance scanner. The next section describes the procedure for capturing a reflectance scan.

2.2 Data Capture

Our data capture process is composed of two stages: 1) a coarse geometry scan, and 2) the reflectance acquisition. Initially, the lightstage must be wound up, to wrap the power cord around the central theta axis. Then, the subject is placed in the center of the lightstage, and the two video cameras are aimed and set to record. The geometry of the subject is then captured using the projector, the Mask of Fury, and the two video cameras. The projector is moved to the left of the subject, and the patterns are displayed to capture geometry. Then, the Mask of Fury is lowered over the subject and the projector patterns are displayed again for calibration. The projector is moved to the center position, and patterns are displayed (calibration), followed by the raising of the Mask of Fury, and another set of patterns (geometry). Finally, the projector is moved to the right, and geometry and calibration patterns are displayed a third time.

Immediately afterward, the projector and Mask of Fury are removed and the reflectance model of the subject is captured using the lightstage and the same two video cameras. Two people control the latitude and longitude of the light source through the control cords. The theta control is pulled at an approximately constant rate, producing a revolution rate of approximately 1.5 Hz. At each revolution, marked by a beep, the phi control is lowered one notch. This procedure moves the light around the subject and captures images from many incoming directions. We down-sample the longitudinal image stream to 64 images between successive beeps, producing a dataset of 64 (longitude) x 23 (latitude) images, captured in approximately 70 seconds. The captured reflectance data encodes the appearance of the subject's face under a point light coming from every direction. A subset of this data is shown in Figure 4.

Figure 4: Lightstage Data Set. A subset of the images acquired by a camera during the reflectance capture. The subject is imaged from the same viewpoint with a moving directional light source. The vertical axis in this image corresponds to the latitude of incoming light, and the horizontal axis is mapped to the longitude. The above data set (16 x 6 images) is down-sampled from the original 64 x 23 set of full-resolution face images.

2.3 Image-Based Relighting

At this point, an exhaustive data set has been captured under the constraints of a single point of view and a homogeneous directional light source. This data can be used to compute the appearance of the subject from that point of view under any distant environment of light.

The light environments used are captured by taking panoramic high-dynamic range (HDR) images. These images capture the amount of light coming from every direction. The typical environment acquisition captures a series of images of a mirrored sphere at different known exposures, and combines them into an HDR image using the technique presented in [9]. To relight the scanned data set with a captured lighting environment, the environment must first be re-sampled into the same mapping (latitude/longitude) and resolution as the captured lightstage data (see Figure 5).

Figure 5: Re-sampling a light environment to match a reflectance data set. Taken from the SIGGRAPH 2000 Electronic Theater animation. The environment (in angular map form on the left) is first warped to the same mapping (center), and then re-sampled to the same resolution (right).

Once this is done, the appearance of the subject under that lighting environment can be calculated by scaling each data image to the color and intensity of the light coming from the corresponding direction in the environment and summing the results (Figure 6). Care must be taken to down-weight the contribution of lights towards the top and bottom of the sphere (corresponding to the top and bottom of the maps in Figure 5), because there are more image samples in these regions. Each sample (θ, φ) is weighted by cos(φ) to maintain a constant total intensity.
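A compact sketch of this weighted sum (illustrative array shapes; the thesis's data layout and exact weighting constant may differ):

    import numpy as np

    def relight(data_images, env_latlong):
        # data_images: (n_lat, n_lon, H, W, 3) lightstage images, one per direction.
        # env_latlong: (n_lat, n_lon, 3) environment colors resampled to the same grid.
        n_lat, n_lon = env_latlong.shape[:2]

        # Latitude of each row; cos(phi) down-weights the oversampled rows
        # near the top and bottom of the latitude/longitude map.
        phi = np.linspace(-np.pi / 2, np.pi / 2, n_lat)
        weights = np.cos(phi)[:, None, None] * env_latlong

        rendered = np.zeros(data_images.shape[2:])
        for i in range(n_lat):
            for j in range(n_lon):
                rendered += weights[i, j] * data_images[i, j]
        return rendered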


Figure 6: Relighting the face. By scaling each reflectance image by the color and intensity coming from the corresponding direction in the environment (above left), and then summing all the resulting images (above right), we get the appearance of that subject under the light in that environment.

The remainder of section 2.3 will explain variants on this rendering algorithm. Section 2.3.1 discusses an alternate perspective on the organization of the captured data and the corresponding re-lighting algorithm. In 2.3.2 a compression method is presented that allows real-time re-lighting, and finally, section 2.3.3 describes an extension that combines these techniques with environment matting to achieve high resolution reflected and refracted backgrounds.

2.3.1 Reflectance Functions

The captured data set stores light intensities, or colors, indexed by image coordinates (x, y) and light directions (θ, φ). The images captured with the video cameras are indexed by (x, y) and represent a single (θ, φ) light direction. Consider instead the image indexed by (θ, φ) for a single (x, y) coordinate, which is essentially a transposition of the original data set. We call this image a reflectance function: the color of a single pixel on the camera image for every incoming light direction. Figure 7 shows an array of reflectance functions for various points on a face.

Reflectance functions are similar to BRDFs, except they encode only a single output direction (towards the camera), and they include non-local effects such as shadows, interreflections, and camera glare. Reflectance functions allow easy comparison of the responses of different points on the subject, which may be hard to see in the original data set. For example, the effect of Fresnel reflection on the subject near glancing angles to the camera can be easily discerned. This effect can be seen in Figure 7: the specular lobe of the reflectance functions becomes quite bright near the edge of the face. Also, as described by Debevec et al. [8], it is possible to derive information about the surface at each point from its reflectance function, including surface normal, diffuse color, specular intensity, and specular roughness, or to fit the data to a more complicated model that allows the subject to be relit from any viewing angle.


Figure 7: An Array of Reflectance Functions. Each reflectance function is the response of a single point on the subject (inset) to light from different directions. This reflectance function array is essentially a transposition of the original data set shown in Figure 4.


In reflectance function space, the relighting operation becomes a channel-wise dot product between the normalized environment map (as re-sampled in Figure 5) and the reflectance function. This process is illustrated in Figure 8.

Figure 8: Relighting Equation on Reflectance Functions. The image-based relighting process using reflectance functions is essentially a channel-wise dot product between the normalized light map and the reflectance function for each camera pixel. The environment light map must be normalized to account for the increased sampling rate near the top and the bottom of the image.

In summary, this section defined reflectance functions and the corresponding relighting algorithm. The next section presents a method for performing these calculations in real time using compressed versions of the images.


2.3.2 Realtime Image-Based Relighting

The re-lighting algorithm presented above requires O(n) time to compute, where n is the number of pixels in the input data set. A significant limitation of this acquired data set is its size; even when down-sampled, n can be extremely large (for instance, a 64 x 23 set of 720 x 480 images stored as uncompressed 8-bit RGB occupies roughly 1.5 GB). Given current maximum memory bandwidths of around 3.6 GB/s, the theoretical maximum rate for image relighting is 3 frames per second. In practice, it is hard to even approach that speed; we typically achieve 1.1 frames per second on an Intel Pentium 4 running at 1.8 GHz.

One means of speeding up the frame computation rate is to compress the data set. We applied a block transform compression algorithm similar to JPEG to each reflectance function. Each channel of the reflectance function is split into 8x8 blocks and then each block is mapped onto a Discrete Cosine Transform (DCT) basis.

However, instead of applying the standard JPEG quantization step followed by zigzag reordering and Huffman coding, values below a certain threshold are simply reduced to zero (using a standard JPEG frequency weighting table as the thresholds), and then run-length encoded (RLE) in zigzag order. Because the DCT basis is orthonormal, the technique presented in Smith and Rowe [34] can be used to do the re-lighting dot product calculation directly on the DCT-encoded data, as illustrated in Figure 9. In fact, if the environment map is processed similarly (with the zigzag reorder, but without thresholding or run-length encoding), then the dot product can be accumulated as the RLE data is being decompressed, eliminating the need to store uncompressed data. Using this technique, a total speed-up equal to the compression ratio can be achieved, which is between 6 and 10 times for good JPEG quality levels.

It should be noted that JPEG compression of video typically converts the image into a luminance (intensity) channel and two chrominance (color) channels, and compresses the chrominance channels more heavily. The real-time re-lighting process uses a different strategy: each color channel is compressed separately using the same algorithm. This method can result in some color-banding in dark areas, but because the component-wise multiplication used in the calculations is not preserved in luminance-chrominance space, re-lighting in that representation would be much more complicated.
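A minimal sketch of the two key steps, the thresholded block DCT and a dot product taken directly on DCT coefficients via Parseval's relation (the zigzag reordering and run-length coding are omitted, and the threshold table is assumed to be given):

    import numpy as np
    from scipy.fft import dctn

    def compress_block(block, thresholds):
        # Orthonormal 8x8 DCT of one reflectance-function block; coefficients
        # below the per-frequency threshold are zeroed (the thesis then RLE
        # encodes these in zigzag order, which is not shown here).
        coeffs = dctn(block, norm='ortho')
        coeffs[np.abs(coeffs) < thresholds] = 0.0
        return coeffs

    def relight_pixel(reflectance_blocks, light_blocks):
        # Because the DCT is orthonormal, <a, b> = <DCT(a), DCT(b)>, so the
        # relighting dot product can be computed between the compressed
        # reflectance function and the DCT of the normalized light map.
        return sum(float(np.sum(rf * lm))
                   for rf, lm in zip(reflectance_blocks, light_blocks))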

Figure 9: The reflectance function re-lighting operation performed in DCT space produces the same answer, because the DCT basis is orthonormal.

The real-time relighting method described here was implemented with the help of Dan Maas in the Face Demo program, available at http://www.debevec.org/Face. The Face Demo lets the user choose a face dataset and a lighting environment, and then calculates the appearance of that face in the environment (see Figure 10). One can rotate the environment interactively to animate the light. The demo can run up to 32 calculating threads in addition to the rendering thread, and automatically slices up the calculation for each thread into chunks small enough to ensure good user responsiveness. On a 16-processor SGI system, it can run 30 frames per second at full resolution. On computers with less computing power, the Face Demo renders low-resolution datasets while the user is changing the lighting environment, and automatically switches to high-resolution data sets when the user pauses on one set of lighting parameters. In addition to the captured lighting environments, users can also choose to create their own lighting environment interactively. In the interactive lighting mode, a user can modify the color and intensity of the ambient light, as well as three repositionable spot lights. The user can change the 'softness' of each spot light, which controls the angular size of the light; a larger angular size creates a softer, more diffuse look. These lights are rendered into an artificial environment map, and the calculation proceeds as above.

When playing with the Face Demo, the user may notice that the background behind the subject is almost always pitch black, regardless of the lighting environment. This effect is not the result of editing the data, but rather the effect of the cameras clipping the image data to eight bits. Typical 8-bit images have enough dynamic range to capture the light intensity variation across a human face, but fail to capture the intensity coming directly from the light. When the light is behind the subject (i.e., facing the camera), the intensity of the light in the image is clipped to 255, which is not much brighter than the face itself. The other 511 images most likely contain near-zero values in that position (it was a dark room), and thus the background appears dark. The next section addresses this problem, as well as the related problem of generating high resolution backgrounds.


Figure 10: The Face Demo. Above, relighting the subject with light captured in Grace Cathedral. Below, with interactive, user-specified lights.


2.3.3 Image-Based Lighting With Environment Mattes

The dynamic range of the images in the face data sets is usually too small to capture the full intensity of the light source directly. As a result, when using this data to re-light the subject, the background is too dark to see. However, when the reflectance data is acquired with high dynamic range [9], relighting the subject correctly creates a low-resolution image of the background behind him or her (see Figure 12b). If the data were acquired with enough angular resolution, dynamic range, and a sufficiently small light source, the background would look exactly as if the person were standing in that environment. Unfortunately, capturing the reflectance data set at such a detailed resolution is not currently possible. However, several techniques can produce a similar result.

The simplest method is a traditional blue-screen matting system, often used in TV and movies to replace the background behind an actor. Unfortunately, this method only accounts for areas where you can directly see the background. It does not correctly handle reflection (e.g., off the steel ball in Figure 12), transparency (e.g., the glass lamp and bottles), or objects that are the same color as the matte color key. However, when the scene to be matted does not move, such as the still-life shown in Figure 12, it is possible to do much better. Smith and Blinn [33] presented a technique that uses two different background colors to obtain a better approximation of transparency. This technique, unfortunately, still fails to reproduce correctly the effect of refraction and reflection, as the light rays are considered to always pass straight through transparent objects. The environment matting technique of Zongker et al. [45] produces a good approximation of many refraction and reflection effects. An environment matte is an approximation not only of the relative transparency of each pixel, but also of any light ray bending.

We combined the lightstage re-lighting technique with environment matting, which allows the calculation of a high resolution background that reflects and refracts through the objects in the scene. For example, consider the scene in Figure 12, which includes a number of semi-transparent, reflective, and refractive objects. To capture the highly reflective surfaces, the reflectance data set was taken at high angular resolution (i.e., 128 samples longitudinally and 46 samples in latitude), and at five different exposure levels to generate high dynamic range. In addition, the appearance of the objects was captured while displaying a sequence of Gray codes on a screen behind them (as shown in Figure 12b). From these images, Zongker et al.'s environment matting technique can estimate, for each pixel in the image, a region of the background that contributes light to that pixel. By finding a region instead of a single pixel, it can better account for the magnification and reduction that can occur with refractive and reflective objects. However, because of limitations of the technique, the region must be estimated as an axis-aligned rectangular box. This estimation is done by a separate multi-resolution search to find the optimal range along each axis, minimizing an error function that compares observed pixel values to the expected Gray code response for a given range. The exact estimation algorithm is not detailed by Zongker et al. We implemented a simple but effective axis-aligned gradient descent algorithm, which is outlined in Figure 11.

compute_error(first, last, observed_pixels) {
    // observed_pixels is the current pixel's value over our graycode image set
    // level_scale is the average color of pixels in each graycode level
    //   (intensities tend to drop at higher frequencies; this is derived from
    //   analyzing the background)
    // graycode(x, L) returns the parity (0, 1) of pixel x in graycode level L
    error = 0;
    for (level = 1 to graycode_image_count) {
        // a pre-calculated sum-table makes 'mean' fast
        signature = mean of graycode(x, level) for x in [first, last];
        error += abs(signature * level_scale(level) - observed_pixels(level));
    }
    return error;
}

bounds_search() {
    first = 0;
    last = 1024;                         // or whatever size you want to search
    step_size = (last - first) / 2;
    while (step_size > 0) {
        calculate the errors of (first, last), (first + step_size, last),
            and (first - step_size, last);
        assign first to the value giving the minimum error;
        calculate the errors of (first, last), (first, last + step_size),
            and (first, last - step_size);
        assign last to the value giving the minimum error;
        if first and last did not change this iteration, halve step_size;
    }
}

Figure 11: Calculating the environment matte for each pixel is accomplished by minimizing the error metric above (compute_error). This non-linear minimization is performed on each axis by the bounds_search algorithm, also shown above. The algorithm takes advantage of the fact that the error for large ranges changes more slowly than for small ranges, as it must, since the error is being averaged over the entire range; so we can step quickly at first without worrying about overlooking minimal points. This algorithm found the optimal rectangular area in all but a few of the pixels in the test images (on average 99.985% correct), and it was orders of magnitude faster than a direct search of the entire space.

Once the environment matte is found, it can be applied to any environment to calculate the effect of transmitted and reflected light from the background. The effect of light from the rest of the environment is calculated from the lightstage data set, by running the lightstage relighting process described in previous sections while omitting the contributions of lights coincident with the background (Figure 12e), since these directions will be approximated with the environment matte. Then, adding the environment matted background (Figure 12d) to this image yields an image of the scene transposed into the destination environment (Figure 12f).

This process illustrates the similarities between environment matting and the lightstage process. Both processes attempt to find the response of a scene to incoming light. But where the lightstage acquires a complete basis of lighting over a small number of directions (e.g., 128 x 64 lighting directions in this example), environment matting uses the Gray code patterns as a set of non-complete basis lighting functions, allowing it to gather information quickly for a very large number of directions (e.g., 1024 x 768 in this case), but with reduced accuracy in each direction. Environment matting must assume some structure to the light response of the scene (i.e., the pre-image of each camera pixel is a rectangular region of the background), whereas the lightstage actually maps out the response to light from each direction exactly (but in lower resolution). This response map is none other than the reflectance function described in 2.3.1.
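A rough sketch of the final combination, with every function and array name a placeholder for the corresponding step described above:

    import numpy as np

    def composite_relit_scene(lightstage_images, env_map, background_mask,
                              env_matte_apply, background_image):
        # lightstage_images: (n_dirs, H, W, 3) basis images
        # env_map:           (n_dirs, 3) light color per direction
        # background_mask:   (n_dirs,) True for directions covered by the background
        # env_matte_apply:   maps a background image to its reflected/refracted
        #                    contribution per camera pixel (Figure 12d)
        # background_image:  the destination environment's background (H, W, 3)

        # (e): relight from the lightstage data, omitting lights that coincide
        # with the background region handled by the environment matte.
        weights = np.where(background_mask[:, None], 0.0, 1.0) * env_map
        relit = np.einsum('dc,dhwc->hwc', weights, lightstage_images)

        # (d): high-resolution reflected/refracted background via the matte.
        matted = env_matte_apply(background_image)

        # (f): the fully relit scene is the sum of (d) and (e).
        return relit + matted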

Figure 12: Environment Matting integrated with the lightstage relighting process. (a) Scene under one light source. (b) Environment matte source frame. (c) Scene relit with lightstage data: Grace Cathedral on the left and the Berkeley eucalyptus grove on the right. (d) Environment matted scene captures reflected and refracted light from the background in higher resolution. (e) Same as (c) with omission of lights visible in the background. (f) Sum of (d) and (e): a fully relit scene with high resolution background. The lightstage data set was recorded in high dynamic range and at twice the angular resolution to account for the highly specular surfaces in the scene.


2.4 Summary

This chapter described the design and application of a reflectance scanner capable of acquiring data on the varying reflectance of a human face. Re-lighting algorithms for rendering faces with the captured data were presented, including a real-time variant and a method for integrating environment matting in order to achieve high quality backgrounds. The next chapter will discuss the geometry scanner constructed to quickly acquire the geometry of a number of Parthenon sculptures.


Chapter 3

Geometry Scanning

The second scanning system, shown in Figure 13, was built to scan sculptures. The main goal of this scanner was to acquire accurate and detailed surface geometry of the subjects. Less emphasis was placed on acquiring accurate reflectance properties. Because we intended to take this scanner to several museums around the world, it had to be portable and easily positioned for each scan. We also wanted to reduce the amount of time to scan an object, to maximize the amount of data acquisition possible in our limited time frame.

Figure 13: Scanning sculptures in the Basel Skulpturhalle. Our sculpture scanner in action, scanning a cast of a caryatid from the Acropolis.

As described in Chapter 1, we decided to use a structured light scanner, wherein a projector displays a set of patterns onto the subject which are recorded by a camera. These images are used to triangulate the 3D structure of the visible surface. Structured light scanners can acquire a large number of points very quickly, as they require only O(log N) images to reproduce geometry of resolution N x N (for a 1024-pixel-wide projector, for instance, only ten column patterns are needed). These scanners are inexpensive to construct, and they are small and lightweight. Often, structured light scanners are less accurate and have less range than laser scanners.

This chapter describes the construction, calibration, and processing involved in the geometry scanner. Section 3.1 explains the choice of the scanner hardware and the scanner calibration technique. Section 3.2 discusses scanner operation, including the light patterns employed for geometry and reflectance acquisition. Finally, to help overcome the limited precision of structured light scanners, a method of deriving accurate sub-pixel correspondences is presented in section 3.3, along with the resulting scans.

Figure 14: The Sculpture Scanning Rig with adjustable baseline, mounted on a tripod.

the Figure 14: The Sculpture Scanning Rig with adjustable baseline, mounted on a tripod.


3.1 Scanner Hardware

The scanner is composed of a camera, a projector, a mounting system, a calibration object, and a computer. Each component had to be relatively light-weight and portable. We chose to use a portable suitcase computer (manufactured by MaxVision) with 2 GB of RAM, which enabled it to cache the captured camera frames in memory. The computer also featured two video cards and a video capture card, which enabled it to attach simultaneously to the camera, projector, and a user interface display.

The scanner mounting system had to be reconfigurable, and the camera and projector had to be secured in a solid relative orientation when locked down. To these ends, the camera and projector were each mounted on a FOBA Mini-Superball camera head, which allowed them to be repositioned to nearly any angle and locked in place very securely. These camera head mounts, in turn, were attached to a square steel rod about 1 meter in length, which was connected to a larger FOBA Superball camera head. This setup allowed the entire rig to be rotated (e.g., to the vertical position shown in Figure 13, or the horizontal position in Figure 14). The rig was supported by a heavy-duty camera tripod with an optional wheel attachment.

The camera and projector must support high resolutions and a fast frame rate while providing a clean signal. The exact details of these components are described in the remainder of this section.


3.1.1 Projector

A Proxima Ultralight x350 DLP Projector was used for the light and pattern source (Figure 15). This projector, with a pixel resolution of 1024 x 768, had the advantage of using DVI input, which transmits the signal digitally from the computer. We found that a number of other projectors did not produce a clean signal when using standard analog connections (e.g. VGA), often generating pixels that were blurred horizontally. As an added advantage, the x350 projector was also one of the smallest (2.36” x 7.12” x 8.42”) and lightest (3.5 pounds) projectors available at that time. This small form factor allowed a secure mounting to the scanning rig and reduced the weight of the overall system.

Figure 15: The Proxima Ultralight x350 DLP Projector is small, light, and can utilize a DVI digital signal.

The Digital Light Processing (DLP) chip, or Digital Micro-Mirror Device (DMD), inside the projector uses an array of oscillating micro-mirrors, one mirror per pixel, to produce varied intensities in the projected light. The projected light is nearly unpolarized, in contrast to the strongly polarized light produced by an LCD projector. This fact is important, as the reflectance of polarized light from a surface can change depending on the orientation of the polarization. Additionally, since the mirrors in a DLP projector oscillate at a very high rate (50-100 kHz), each projected pixel value can change intensities very quickly, whereas pixels in an LCD projector can take up to 1/20th of a second to completely change.

Unfortunately, because of the cost of the chips, most DLP projectors like the x350 use a single chip to produce the red, green, and blue color channels of a projected image. They use a rotating color-filter wheel in front of the light source to cycle the color of the projected light, switching the mirrors’ response appropriately, instead of using a separate chip for each color channel as in most LCD projectors. This ‘interleaved’ color system complicates our scanner, as camera images taken at short exposure times can vary in intensity based on where the color-filter wheel happens to be during the exposure. However, we determined experimentally that the color-filter wheel rotates at a constant rate of 120 Hz (even when powered by 50 Hz AC as in Europe), allowing us to time the exposures to an integral number of rotations, thus obtaining constant intensity values from exposure to exposure.

The projector was used in conjunction with a Matrox G200 MMS Dual DVI PCI video card, which was one of the few PCI cards that supported DVI output at the time, allowing the computer to project patterns on the object and simultaneously display the captured images on a monitor connected to the computer.
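To make the timing concrete, the sketch below shows how an exposure time could be snapped to a whole number of color-wheel rotations. It is a minimal illustration assuming the 120 Hz rate measured above; the function and variable names are hypothetical, and this is not the capture code used for the scanner.

```python
# Minimal sketch: snap an exposure time to an integral number of
# color-wheel rotations, so every capture sees the same mix of the
# red/green/blue filter segments. Assumes the 120 Hz wheel rate
# quoted above; not the actual capture code.

WHEEL_HZ = 120.0                  # measured color-wheel rotation rate
ROTATION_S = 1.0 / WHEEL_HZ       # one full rotation = 1/120 s

def snap_exposure(requested_s):
    """Return the closest exposure >= one rotation that is a whole
    multiple of the wheel period."""
    rotations = max(1, round(requested_s / ROTATION_S))
    return rotations * ROTATION_S

if __name__ == "__main__":
    for t in (0.005, 1 / 60, 0.1):
        print(f"requested {t*1000:6.2f} ms -> used {snap_exposure(t)*1000:6.2f} ms")
```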

3.1.2 Camera

Images were captured using a Pulnix TM-1040 black-and-white progressive scan CCD camera (Figure 16). The Pulnix camera can capture 30 frames per second, where each frame is a 10-bit black-and-white image at a resolution of 984 x 1010 pixels. This camera was connected to a Matrox Meteor II digital frame capture card. The advantages of this camera setup over the Sony VX1000 used in Chapter 2 to capture face data are numerous. Aside from having nearly twice the resolution, the exposure of each frame can be controlled precisely by the connected computer, and the resulting frames can be delivered digitally without compression. Because the Pulnix camera has “progressive scan” capture rather than interlaced, there was no need to wait for two fields to be captured sequentially. The frames, being sent straight to the computer, can also be used immediately without the need to transfer them from a video tape. However, because the camera must be connected to a computer while in operation, versatility was somewhat limited.

The light-response curve of the camera (shown in Figure 16) was calibrated with HDR Shop [http://www.debevec.org/HDRShop], using a technique similar to that presented by Debevec [9].

Figure 16: The Pulnix TM-1040 Digital Camera (above) can capture 984 x 1010 pixel black-and-white images at 30 fps. Its response curve is shown on the right.

The camera light response was fairly linear, although it did have an offset that needed consideration; the response curve did not intersect the origin, which led to an artificial noise floor.

One additional difficulty encountered was the dual-channel digitization of the camera: even and odd lines of the CCD sensor were digitized by separate analog-to-digital converters. Each converter is configured individually, and the two must be calibrated to respond similarly in order to get an even signal. Finally, it was discovered that the CCD sensors were not positioned on an even grid: every other row was shifted vertically by 25% of a pixel, which introduced a corrugation in the 3D geometry unless this shifting was compensated for in the acquired data.
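A small preprocessing sketch along these lines is shown below. It assumes a linear response with a constant dark offset, balances the even/odd channels by matching their mean levels, and records the vertical position of each row under the assumption that the 25% figure refers to a quarter of the pixel spacing; all names are hypothetical and this is not the thesis processing pipeline.

```python
import numpy as np

# Sketch of the per-frame camera corrections described above (assumptions:
# a linear response with a constant dark offset, an even/odd gain mismatch
# that can be matched by scaling means, and a 0.25-pixel vertical shift of
# alternate rows). A minimal illustration only.

ROW_SHIFT = 0.25  # assumed: every other row offset by 25% of a pixel

def correct_frame(raw, dark_offset):
    """Return (linearized image, per-row vertical coordinates in pixels)."""
    img = raw.astype(np.float64) - dark_offset          # remove the noise floor
    img = np.clip(img, 0.0, None)

    # Match the mean level of the two digitization channels (even/odd rows).
    even_mean = img[0::2].mean()
    odd_mean = img[1::2].mean()
    if odd_mean > 0:
        img[1::2] *= even_mean / odd_mean

    # Record where each row actually sits, so geometry code can use the
    # true vertical position instead of the integer row index.
    rows = np.arange(img.shape[0], dtype=np.float64)
    rows[1::2] += ROW_SHIFT
    return img, rows

if __name__ == "__main__":
    frame = np.random.default_rng(0).integers(30, 1000, size=(8, 8)).astype(float)
    corrected, row_coords = correct_frame(frame, dark_offset=25.0)
    print(row_coords)
```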

3.1.3 Calibration

The size of the scanning volume was approximately 1.3 m x 1.0 m. This size ruled out the use of a small calibration object such as the one used to calibrate the scanner in Chapter 2. Consequently, a much larger calibration object was needed. Such large objects tend to be too flimsy and subject to warping, or too heavy and cumbersome to transport easily. We finally settled on using an aluminum honeycomb panel covered with checkerboard wallpaper (Figure 17). By scanning this object in several orientations it is possible to derive both intrinsic and extrinsic properties of the camera and projector: the distortion characteristics (shown in Figure 18) and the relative orientations of the scanner can be determined.

Figure 17: Sculpture Scanner Calibration Device, consisting of an aluminum honeycomb panel about one meter square, covered with checkerboard grid wallpaper. An easel and scaffold were used to position the calibration device adjacent to the sculpture to be scanned.

The basic non-linear error minimization method in Matlab, fminsearch, was used to solve for the camera and projector intrinsic and extrinsic parameters simultaneously. The intrinsic parameters consisted of focal length and eight distortion parameters: center of projection (x, y), center of distortion (x, y), and 4 radial distortion coefficients: f(r) = r (1 + k_1 r + k_2 r^2 + k_3 r^3 + k_4 r^4). Using this method, we achieved an RMS re-projection error of less than half a pixel for the calibration. Rays through corresponding points from the camera and projector intersected with a mean error of less than one millimeter.
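For reference, the sketch below illustrates the radial distortion model and the style of direct-search optimization described above, using scipy.optimize.minimize with the Nelder-Mead method as a stand-in for Matlab's fminsearch. The synthetic point data and the reduced parameter set (center of distortion and the four radial coefficients only) are assumptions for illustration; this is not the calibration code used for the scanner.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the radial distortion model f(r) = r (1 + k1 r + ... + k4 r^4)
# and a Nelder-Mead fit in the spirit of Matlab's fminsearch. The data and
# parameterization below are illustrative, not the thesis calibration code.

def distort(points, cx, cy, k):
    """Apply radial distortion about (cx, cy). points: (N, 2) pixel coords."""
    d = points - np.array([cx, cy])
    r = np.linalg.norm(d, axis=1)
    scale = 1.0 + sum(ki * r ** (i + 1) for i, ki in enumerate(k))
    return np.array([cx, cy]) + d * scale[:, None]

def rms_error(params, undistorted, observed):
    cx, cy, *k = params
    return np.sqrt(np.mean((distort(undistorted, cx, cy, k) - observed) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ideal = rng.uniform(0, 1024, size=(200, 2))          # hypothetical grid corners
    true_k = [1e-4, -3e-7, 0.0, 0.0]
    observed = distort(ideal, 512.0, 384.0, true_k)      # synthetic "measurements"

    x0 = np.array([500.0, 400.0, 0.0, 0.0, 0.0, 0.0])    # cx, cy, k1..k4
    res = minimize(rms_error, x0, args=(ideal, observed), method="Nelder-Mead",
                   options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10})
    print("initial RMS error:", rms_error(x0, ideal, observed))
    print("final RMS error:  ", rms_error(res.x, ideal, observed))
```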


Figure 18: Calibrated Camera and Projector Distortion calculated from images of the calibration object. The graphs on the left plot distorted radius vs. undistorted radius. The images on the right illustrate the resulting distortion on equally spaced parallel lines, and circles that highlight the radial center. The radial distortion was modeled as a full fourth order polynomial around a center of distortion.
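The millimeter figure quoted above is the distance at which corresponding camera and projector rays pass each other; since two back-projected rays are generally skew, the usual test takes the midpoint of the shortest segment between them. The sketch below is a minimal illustration of that computation with made-up ray origins and directions, not the evaluation code used for the calibration.

```python
import numpy as np

# Sketch of the ray "intersection" used to evaluate the calibration: two
# back-projected rays are generally skew, so we take the midpoint of the
# shortest segment between them and report its length as the error.
# Hypothetical example rays; not the thesis evaluation code.

def ray_midpoint(o1, d1, o2, d2):
    """Closest approach of rays o1 + s*d1 and o2 + t*d2.
    Returns (midpoint, distance between the rays at closest approach)."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = d1 @ d2
    w = o1 - o2
    denom = 1.0 - b * b
    if abs(denom) < 1e-12:               # nearly parallel rays
        s, t = 0.0, d2 @ w
    else:
        s = (b * (d2 @ w) - (d1 @ w)) / denom
        t = ((d2 @ w) - b * (d1 @ w)) / denom
    p1 = o1 + s * d1
    p2 = o2 + t * d2
    return 0.5 * (p1 + p2), np.linalg.norm(p1 - p2)

if __name__ == "__main__":
    cam_origin = np.array([0.0, 0.0, 0.0])
    prj_origin = np.array([1.0, 0.0, 0.0])               # ~1 m baseline
    point, gap = ray_midpoint(cam_origin, np.array([0.05, 0.02, 1.0]),
                              prj_origin, np.array([-0.95, 0.02, 1.0]))
    print("triangulated point:", point, "ray gap (m):", gap)
```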


3.2 Data Capture

We took the scanner to the Skulpturhalle in Basel, Switzerland to scan the geometry of the Parthenon frieze. The actual Parthenon frieze has been scattered around the world, but the Skulpturhalle has casts of every known piece, reconstructed on site. Because the pieces were arranged lengthwise and were fairly flat, we devised a systematic set of scans that allowed us to quickly scan the entire 500 feet of frieze in under 3 days. We set down masking tape in a line along each run of frieze, one meter from the sculptures, and marked off every half meter on the masking tape. With the scanning positions marked off, we calibrated the scanner once at a certain angle, and then performed a series of scans covering several contiguous runs of frieze without pausing. For each individual scan, we aligned downward-pointing lasers mounted on the scanning rig with the marks on the tape, giving us an orientation and a position estimate. Each section of frieze was scanned five times from different directions in order to capture areas possibly obscured from one viewpoint: we scanned straight on, from the left and the right every half meter, and from above and below every meter.

Each scan consisted of projecting and capturing the horizontal and vertical ‘Gray code’ patterns described below, as well as three high-dynamic-range (HDR) color images, obtained at the beginning, middle, and end of each scan. The color images were obtained by projecting full-screen red, green, blue, and black (ambient light level) frames from the projector, and recording an exposure sequence for each with the black-and-white video camera (as described in [9]). Subtracting the ambient image from the red, green, and blue images yielded the three color channels of a full color image.
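That color assembly reduces to a per-pixel subtraction and stack. The sketch below is a minimal illustration assuming the four frames have already been merged into linear HDR images and registered; the array and function names are hypothetical.

```python
import numpy as np

# Sketch of building a color image from four monochrome HDR captures taken
# under full-screen red, green, blue, and black (ambient) projector output.
# Assumes the frames are already merged into linear HDR and registered.

def assemble_color(red, green, blue, ambient):
    """Stack ambient-subtracted channels into an (H, W, 3) color image."""
    channels = [np.clip(c - ambient, 0.0, None) for c in (red, green, blue)]
    return np.stack(channels, axis=-1)

if __name__ == "__main__":
    h, w = 4, 4
    ambient = np.full((h, w), 0.1)
    red, green, blue = (np.full((h, w), v) + ambient for v in (0.8, 0.5, 0.3))
    print(assemble_color(red, green, blue, ambient)[0, 0])   # -> [0.8 0.5 0.3]
```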

Depending on the amount of overlap between the different scans, it is possible to reconstruct a simple reflectance model from these color images, though we have not yet investigated this in detail. Holly Rushmeier constructed similar reflectance models [31], though she used a number of light directions for each scanning position. The entire data set captured in Basel totaled over 80 gigabytes of losslessly compressed images, consisting of 2,200 scans.

This section has discussed the process used to capture a large number of geometry scans of the Parthenon sculptures. The remainder of this section discusses the projected patterns used by the scanner to achieve robust identification of the projector pixels. In order to solve the correspondence problem, a set of patterns is projected on the object such that, for every camera pixel, one can uniquely determine which projector pixel it is viewing by analyzing the sequence of intensities it records.

In this manner, a projector pixel correspondence can be found for each pixel of the camera. For example, one can uniquely identify the rows of the projector by creating a set of images where each row is shaded according to a binary pattern (Figure 19). The same can then be done with the columns, giving the coordinate of the pixel.

However, the binary encoding scheme is not ideal. During the projection and photographing of the pattern, a certain amount of ‘blurring’ (i.e., destruction of high-frequency information) inevitably occurs. Even ignoring the imperfect optics of the camera and projector, the projected patterns are re-sampled across each camera pixel, which introduces blur. Unfortunately, when a binary pattern is blurred spatially, along with any amount of camera noise, the introduced error is unbounded, even for very small blurs.

For example, consider Figure 19. The images on the top show the idealized patterns as projected, and the images below show the same patterns blurred spatially, as captured by the camera. In the binary patterns on the left, consider a camera pixel that is looking at a position halfway between rows 3 and 4. In each image, that camera pixel will receive a color halfway between bright and dark, since all the images are in transition between rows 3 and 4.

Figure 19: Identifying pixel location from an image pattern sequence. Above, idealized patterns as projected by the projector. Below, the same patterns blurred spatially, as captured by the camera. In a binary pattern (on the left), any amount of blur introduces an unbounded error in the row/column identification. The Gray code pattern on the right is more robust to blurs, as only one pattern is in transition between any two rows/columns.

In this case, with a small amount of camera noise on top of the images, it is possible for that camera pixel to identify itself as corresponding to any row in the projector. In transitioning regions, small errors in the pixel intensities can shift them above or below the threshold between bright and dark pixels. Because every image is in transition between rows 3 and 4, large errors can be introduced. In practice, this problem is very noticeable as collections of error points cluster on projector rows with large numbers of transitions.

To reduce these errors, it is better to use a Gray code pattern, named after Frank Gray [15]. The Gray code (Figure 19, on the right) is calculated as:

 x GrayCode ( x) =   ⊗ x  2  log 2 x  

x GrayCode ( x) = ⊗  i  i=0 2  −1

where ⊗ denotes the bitwise exclusive-or, and the nth bit of GrayCode(x) corresponds to image n, row x. In a Gray code, at most one image will be in transition between any two rows. Assuming the blur is less than one pixel, even if the thresholds cannot distinguish bright from dark in a transitioning region, the largest possible error induced is one row.

When capturing geometry, horizontal and vertical Gray code sequences and their inverses are projected on the subject and the images of the subject are captured from the camera (as shown in Figure 20). The Gray code response for any pixel, g_i, can be found by subtracting the image of the inverse pattern from the image of the pattern itself. The Gray code response allows better localization of the bright/dark transitions, which are now located at the zero crossings.
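In code, the Gray code and its inverse reduce to a few bit operations, as the following minimal sketch illustrates (here the bitwise XOR plays the role of ⊗ in the equations above; this is not the scanner's pattern-generation code).

```python
# Minimal illustration of the Gray code definitions above, where XOR plays
# the role of the ⊗ in the equations. Not the scanner's pattern generator.

def gray_code(x: int) -> int:
    """GrayCode(x) = floor(x/2) XOR x."""
    return (x >> 1) ^ x

def gray_decode(g: int) -> int:
    """Inverse Gray code: XOR of g shifted right by 0, 1, 2, ... bits."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x

if __name__ == "__main__":
    rows = list(range(8))
    codes = [gray_code(r) for r in rows]
    print([format(c, "03b") for c in codes])   # adjacent codes differ in one bit
    assert all(gray_decode(gray_code(r)) == r for r in rows)
```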

Since light from secondary diffuse bounces is usually very similar in intensity in both the Gray code pattern and its inverse, subtracting the two should remove most of the effect of this light. This assumption holds better for high-frequency Gray code patterns than for low-frequency ones; diffusely reflected light is, in some sense, a weighted average of all visible points. High-frequency patterns are more likely to have several oscillation periods visible from any point, so their average is more likely to match that of the inverse pattern than is the case for low-frequency patterns.

Figure 20: Horizontal and Vertical Gray Code Patterns. Two of the 44 images taken to get a full camera-projector pixel correspondence. Each of 10 patterns and its inverse is displayed both horizontally and vertically, as well as an additional 4 patterns described in Section 3.3.

Given a vertical Gray code response image g_i, for each pixel p from the camera we generate the corresponding projector row coordinate R(p) as follows:

R(p) = \mathrm{GrayCode}^{-1}\!\left( \sum_i \bigl( g_i(p) > 0 \bigr) \cdot 2^i \right)

Calculating these coordinates for both the horizontal and vertical components gives the coordinate (R(p), C(p)) of the projector pixel that was projected onto the subject and recorded by the camera pixel p.
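Applied to a stack of response images, this decoding is a per-pixel bit-packing followed by the inverse Gray code. The sketch below is an illustrative, vectorized version with a synthetic round-trip test; the array layout and names are assumptions, not the thesis implementation.

```python
import numpy as np

# Sketch of decoding the projector row R(p) from a stack of Gray code
# response images g_i (pattern minus inverse pattern), following
# R(p) = GrayCode^{-1}( sum_i (g_i(p) > 0) * 2^i ). Illustrative only.

def decode_rows(responses):
    """responses: (num_patterns, H, W) Gray code responses.
    Returns an (H, W) array of projector row indices."""
    num_patterns = responses.shape[0]
    bits = responses > 0
    gray = np.zeros(responses.shape[1:], dtype=np.int64)
    for i in range(num_patterns):
        gray |= bits[i].astype(np.int64) << i
    # Inverse Gray code: XOR together all right shifts of the code word.
    rows = gray.copy()
    shift = gray >> 1
    while shift.any():
        rows ^= shift
        shift >>= 1
    return rows

if __name__ == "__main__":
    # Round-trip test on a synthetic 1 x 8 "image" whose true rows are 0..7.
    true_rows = np.arange(8).reshape(1, 8)
    gray = true_rows ^ (true_rows >> 1)
    responses = np.array([np.where((gray >> i) & 1, 1.0, -1.0) for i in range(3)])
    assert np.array_equal(decode_rows(responses), true_rows)
    print(decode_rows(responses))
```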


3.3 Sub-pixel Precision

To achieve more accurate geometry data, the scanner captured an additional set of stripe patterns of the same frequency as the highest-frequency Gray code patterns, but with a phase shift of 0.25 periods. This pattern provides information that is theoretically redundant with the rest of the Gray code patterns, but should be less affected by global-illumination effects. For this reason, when the other Gray code patterns produce an answer in conflict with this additional pattern, the conflicting pixels are corrected in favor of the high-frequency pattern. Furthermore, with both sets of high-frequency patterns, a good estimate of sub-pixel position can be generated: for every camera pixel, the corresponding projector coordinate (not necessarily aligned on a pixel) is calculated. This method is similar to the sub-stripe estimation approach of McIvor and Valkenburg [25], where the authors use calibration targets to localize sub-pixel accuracy in the projector. In our application the estimates are applied to the entire image to create more accurate correspondences.

The sub-pixel location is generated by taking advantage of the inherent camera/projector blur discussed in Section 3.2. The blur makes it possible to differentiate between sub-pixel coordinates. Figure 21 shows the plot of a typical Gray code response for the two high-frequency patterns (one phase-shifted 25%).

Figure 21: Gray code response for the highest frequency patterns. Because of blurring between the projector and camera, the response becomes rounded. This allows us to find sub-pixel correspondence.

If these curves were perfectly sinusoidal, we could extract the sub-pixel position by taking the arctangent of the first curve divided by the second curve and scaling the answer appropriately. The results of such an operation are shown in Figure 23c. They show a distinct improvement over the standard correspondence. In reality these curves are not sinusoidal. However, if one assumes that the actual curve over a pixel is monotone, the approximate position derived from taking the arctangent will be within a monotone transformation of the actual position.
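Under the sinusoidal assumption, the sub-pixel estimate is a two-argument arctangent of the two quadrature responses. The sketch below illustrates this on a synthetic signal; the particular argument order and the mapping to a [0, 1) fraction are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

# Sketch of the sub-pixel estimate under the sinusoidal assumption: the two
# highest-frequency responses (one shifted by a quarter period) are treated
# as cosine and sine of the phase within a stripe period, and arctan2
# recovers the fractional position. Synthetic signal; illustrative only.

def subpixel_fraction(resp, resp_quarter):
    """Fractional stripe position in [0, 1) from two quadrature responses."""
    phase = np.arctan2(resp_quarter, resp)        # in (-pi, pi]
    return (phase / (2.0 * np.pi)) % 1.0

if __name__ == "__main__":
    true_frac = np.linspace(0.0, 0.95, 20)        # ground-truth sub-pixel positions
    resp = np.cos(2.0 * np.pi * true_frac)        # idealized blurred responses
    resp_quarter = np.sin(2.0 * np.pi * true_frac)
    est = subpixel_fraction(resp, resp_quarter)
    print(np.max(np.abs(est - true_frac)))        # ~0 for a perfect sinusoid
```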

This transformation can be modeled statistically. Assuming the actual sub-pixel positions, that is, the fractional parts of the coordinates, are evenly distributed, the transformation is described simply by sorting the list of approximate sub-pixel positions. By inverting this transformation, we can derive the actual sub-pixel location of every point. This process is much like flattening the histogram of sub-pixel positions: it evenly distributes them.

Unfortunately there are some additional complications. The modeled transformation often varies across the subject, as illustrated in Figure 22. To account for this variation, the image is decomposed into regions of similar geometry and each region is modeled separately, with smooth blending between models on the borders of each region. Some care must be taken to ensure that the regions are not too small, which can lead to erroneous transformation curves, as our assumption that the sub-pixel coordinates are evenly distributed may be violated. It is interesting to note that, for the camera and projector used in the sculpture scanner, the sub-pixel positions generated by the sinusoidal approximation are nearly correct; our modeled curves appear to apply only a small sinusoidal offset, as shown in Figure 22. The simple structure of the offset curves implies it may be possible to generate a low-parameter model that describes the transformation, precluding the need to model a transformation for each region with data from the scan.
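The statistical correction described above amounts to flattening by rank. The sketch below illustrates it on synthetic data, with the region decomposition and blending omitted; it is not the thesis implementation.

```python
import numpy as np

# Sketch of the statistical correction described above: assuming true
# sub-pixel positions are uniformly distributed, replace each approximate
# position by its normalized rank, which flattens the histogram. The region
# decomposition and blending are omitted; the data here is synthetic.

def flatten_subpixel(approx):
    """Map approximate sub-pixel fractions to uniformly distributed ones."""
    order = np.argsort(approx, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(approx.size)
    return (ranks + 0.5) / approx.size            # uniform in (0, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    true_frac = rng.uniform(0.0, 1.0, size=10000)
    # A monotone (non-sinusoidal) distortion of the true positions:
    approx = true_frac + 0.05 * np.sin(2.0 * np.pi * true_frac)
    corrected = flatten_subpixel(approx)
    print(np.max(np.abs(corrected - true_frac)))  # small residual error
```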

Figure 22: Deviance from sinusoidal sub-pixel curve, over 5 different regions of the scan. Because the curves change over the image, each region is modeled separately.

The results of the geometry scanning system are shown in Figure 23. The upper left image is the HDR color image of the scan. The other three images display: 1) the result of no sub-pixel correspondence (upper right), 2) the original sub-pixel correspondence assuming purely sinusoidal curves (lower left), and 3) the final sub-pixel correspondence using locally modeled curves (lower right). These three images are shaded according to the gradient magnitude of the vertical correspondence. This shading produces an image somewhat similar to how the scans might look when reconstructed in 3D and lit from below, and highlights details in the surface structure.
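That shading is a small operation on the correspondence map itself. The sketch below illustrates it on a synthetic correspondence image; it is not the visualization code used to produce Figure 23.

```python
import numpy as np

# Sketch of shading a scan by the gradient magnitude of the vertical (row)
# correspondence map, which highlights fine surface structure much like
# grazing illumination. Synthetic data; illustrative only.

def shade_by_gradient(row_corr):
    """Return an 8-bit image proportional to |grad| of the correspondence."""
    gy, gx = np.gradient(row_corr.astype(np.float64))
    mag = np.hypot(gx, gy)
    mag /= max(mag.max(), 1e-12)                  # normalize to [0, 1]
    return (255.0 * mag).astype(np.uint8)

if __name__ == "__main__":
    y, x = np.mgrid[0:64, 0:64]
    bumps = 2.0 * np.sin(x / 4.0) * np.sin(y / 5.0)   # fake surface detail
    row_corr = 0.5 * y + bumps                        # fake correspondence map
    print(shade_by_gradient(row_corr).shape)
```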

The actual 3D geometry resulting from the frieze scan is displayed at the top of Figure 24. The scan shown in Figure 23 has been aligned and merged with four adjacent scans to create a short strip of the frieze. Despite being down-sampled by a factor of two in each dimension, the resulting mesh consists of over 1 million polygons. At the bottom of Figure 24 is a global illumination rendering of the frieze strip placed into a model of the Parthenon. The rendering was computed using the Arnold global illumination renderer by Marcos Fajardo. Notice the shadow in the rendering, the result of an overhanging support beam blocking the sun.

Figures 25 and 26 show a partially reconstructed model of a caryatid, and a global illumination rendering using it. The caryatid model was scanned from a plaster cast of the original caryatid, one of six that supported the porch of the Erectheon, a small temple adjacent to the Parthenon. The damage apparent on the caryatid model is the result of millennia of vandalism and weathering.



Figure 23: Parthenon Frieze Scan. (a) is the color picture of this section of frieze derived from 3 black-and-white images under colored illumination. (b) is the standard correspondence with no sub-pixel information. (c) is a sub-pixel scan derived assuming sinusoidal behavior. (d) is a sub-pixel scan using localized models of the sub-pixel curve behavior.


Figure 24: Acquired frieze geometry of the first two panels of the west frieze of the Parthenon. To create the geometry (above), four separate scans were aligned. The geometry shown here has been down-sampled by a factor of two in each dimension, and consists of over one million polygons. Below, a rendering of the frieze placed in a model of the Parthenon. Sunlight throws a shadow from a support beam onto it.

Figure 25: Caryatid scan, as acquired by the sculpture scanner. The four images on the left are single scans, with the HDR image applied as a texture map onto the geometry. Each scan, processed at ½ resolution in X and Y, consists of about 100,000 vertices. The three images below are different views of three aligned scans. The geometry is flat-shaded, with a single directional light source. The texture maps have been removed, so all the detail visible is a product of the geometry. The entire data set of the caryatid consisted of 48 scans. This caryatid is a plaster cast of one of six caryatids that supported the porch of the Erectheon, a small building adjacent to the Parthenon in Athens. The caryatid is actually missing part of her nose and mouth; they have been broken off.


Figure 26: Caryatid Rendering, computed using the acquired caryatid model and the Arnold global illumination rendering software.

3.4 Summary

This chapter described the design and application of a scanner capable of quickly acquiring detailed surface geometry. The hardware considerations, projector and camera calibration, and a method for achieving sub-pixel correspondences were discussed. The key features of this scanner are its portability, increased accuracy, and fast scanning rate, allowing it to capture many detailed scans in a short time.


Chapter 4

Conclusion

This thesis described two scanners that acquire the geometry and reflectance of real-world objects, producing detailed models suitable for realistic rendering. The reflectance scanner described in Chapter 2 can acquire high-resolution data on the response of the subject to different directional lights. This allows the creation of more complete reflectance models describing how the subject interacts with light. Furthermore, this data can be used directly to re-light the subject in an image-based manner, producing convincing renderings under novel lighting environments. The geometry scanner described in Chapter 3 can quickly acquire scans of the surface geometry of the subject. By modeling the sub-pixel behavior of projected Gray code patterns, the geometry scanner presented here can produce more accurate scans. With its speed, portability, and relatively inexpensive components, this scanner allowed us to scan a large number of sculptures from the Parthenon in very little time.


Bibliography

[1] Baribeau, R., Cournoyer, L., Godin, G., and Rioux, M. Colour three-dimensional modeling of museum objects. Imaging the Past, Electronic Imaging and Computer Graphics in Museum and Archaeology (1996), 199-209.

[2] Bouguet, J.-Y., and Perona, P. Method for recovering 3D surface shape based on grayscale structured lighting. Technical Report 136-93, California Institute of Technology.

[3] Chen, Y., and Medioni, G. Object modeling from multiple range images. Image and Vision Computing 10, 3 (April 1992), 145-155.

[4] Cook, R.L., and Torrance, K.E. A reflectance model for computer graphics. Computer Graphics (Proceedings of SIGGRAPH 81) 15, 3 (August 1981), 307-316.

[5] Curless, B., and Levoy, M. Better optical triangulation through spacetime analysis. In Intl. Conference on Computer Vision (June 1995), 987-994.

[6] Curless, B., and Levoy, M. A volumetric method for building complex models from range images. In SIGGRAPH 96 (1996), 303-312.

[7] Debevec, P. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high-dynamic range photography. In SIGGRAPH 98 (July 1998).

[8] Debevec, P., Hawkins, T., Tchou, C., Duiker, H.-P., Sarokin, W., and Sagar, M. Acquiring the reflectance field of a human face. Proceedings of SIGGRAPH 2000 (July 2000), 145-156.

[9] Debevec, P., and Malik, J. Recovering high dynamic range radiance maps from photographs. In SIGGRAPH 97 (August 1997), 369-378.

[10] Debevec, P., Taylor, C. J., and Malik, J. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH 96 (August 1996), 11-20.

[11] Debevec, P., Yu, Y., and Borshukov, G.D. Efficient view-dependent image-based rendering with projective texture-mapping. In 9th Eurographics Workshop on Rendering (June 1998), 105-116.

[12] Faugeras, O. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press (1999).

[13] Gershun, A. Svetovoe Pole (The Light Field). Journal of Mathematics and Physics XVIII (1939), 51-151.

[14] Gortler, S. J., Grzeszczuk, R., Szeliski, R., and Cohen, M. F. The Lumigraph. In SIGGRAPH 96 (1996), 43-54.

[15] Gray, F. Pulse Code Communication. U.S. Patent 2,632,058 (1953).

[16] Haeberli, P. Synthetic lighting for photography. Available at http://www.sgi.com/graifca/synth/index.html, January 1992.

[17] Hanrahan, P., and Krueger, W. Reflection from layered surfaces due to subsurface scattering. Proceedings of SIGGRAPH 93 (August 1993), 165-174.

[18] Heikkila, J., and Silven, O. A four-step camera calibration procedure with implicit image correction. Proceedings of IEEE Computer Society Conference (1997), 1106-1112.

[19] Karner, K.F., Mayer, H., and Gervautz, M. An image based measurement system for anisotropic reflection. In EUROGRAPHICS Annual Conference Proceedings (1996).

[20] Lafortune, E. P. F., Foo, S.-C., Torrance, K. E., and Greenberg, D. P. Non-linear approximation of reflectance functions. In SIGGRAPH 97 (1997), 117-126.

[21] Levoy, M., and Hanrahan, P. Light field rendering. In SIGGRAPH 96 (1996), 31-42.

[22] Levoy, M., Pulli, K., Curless, B., Rusinkiewicz, S., Koller, D., Pereira, L., Ginzton, M., Anderson, S., Davis, J., Ginsberg, J., Shade, J., and Fulk, D. The digital Michelangelo project: 3D scanning of large statues. Proceedings of SIGGRAPH 2000 (July 2000), 131-144.

[23] Malzbender, T., Gelb, D., and Wolters, H. Polynomial texture maps. Proceedings of SIGGRAPH 2001 (August 2001), 519-528.

[24] Marschner, S. Inverse rendering for computer graphics. PhD thesis, Cornell University (August 1998).

[25] McIvor, A. M., and Valkenburg, R. J. Substripe localisation for improved structured light system performance. DICTA/IVCNZ'97 (December 1997), 309-314.

[26] McIvor, A. M. An alternative interpretation of structured light system data. Industrial Research Limited Report 690 (March 1997).

[27] Nicodemus, F.E., Richmond, J.C., Hsia, J.J., Ginsberg, I.W., and Limperis, T. Geometric considerations and nomenclature for reflectance.

[28] Nimeroff, J.S., Simoncelli, E., and Dorsey, J. Efficient re-rendering of naturally illuminated environments. Fifth Eurographics Workshop on Rendering (June 1994), 359-373.

[29] Oren, M., and Nayar, S.K. Generalization of Lambert's reflectance model. Proceedings of SIGGRAPH 94 (July 1994), 239-246.

[30] Rushmeier, H., and Bernardini, F. Computing consistent normals and colors from photometric data. In Second Intl. Conference on 3D Digital Imaging and Modeling (1999), 99-108.

[31] Rushmeier, H., Bernardini, F., Mittleman, J., and Taubin, G. Acquiring input for rendering at appropriate levels of detail: Digitizing a pieta. Eurographics Rendering Workshop 1998 (June 1998), 81-92.

[32] Sato, Y., Wheeler, M.D., and Ikeuchi, K. Object shape and reflectance modeling from observation. In SIGGRAPH 97 (1997), 379-387.

[33] Smith, A.R., and Blinn, J.F. Blue screen matting. In Proceedings of SIGGRAPH 96 (August 1996), 259-268.

[34] Smith, B., and Rowe, L. Compressed domain processing of JPEG-encoded images. Real-Time Imaging 2, 2 (1996), 3-17.

[35] Torrance, K.E., and Sparrow, E.M. Theory for off-specular reflection from roughened surfaces. Journal of the Optical Society of America 57, 9 (1967).

[36] Trobina, M. Error model of a coded light-range sensor. Technical Report BIWI-TR-164, ETH-Zentrum (1995).

[37] Tsai, R. Y. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation 3, 4 (August 1987), 323-344.

[38] Turk, G., and Levoy, M. Zippered polygon meshes from range images. In SIGGRAPH 94 (1994), 311-318.

[39] Ward, G.J. Measuring and modeling anisotropic reflection. In SIGGRAPH 92 (July 1992), 265-272.

[40] Wong, T.-T., Heng, P.-A., Or, S.-H., and Ng, W.-Y. Image-based rendering with controllable illumination. Eurographics Rendering Workshop 1997 (June 1997), 13-22.

[41] Wood, D. N., Azuma, D. I., Aldinger, K., Curless, B., Duchamp, T., Salesin, D.H., and Stuetzle, W. Surface light fields for 3D photography. Proceedings of SIGGRAPH 2000 (July 2000), 287-296.

[42] Yu, Y., Debevec, P., Malik, J., and Hawkins, T. Inverse global illumination: Recovering reflectance models of real scenes from photographs. In SIGGRAPH 99 (August 1999), 215-224.

[43] Zhang, Z. Flexible camera calibration by viewing a plane from unknown orientations. International Conference on Computer Vision (September 1999), 666-673.

[44] Zitnick, C. L., and Webb, J. A. Multi-baseline stereo using surface extraction. Technical Report CMU-CS-96-196, Carnegie Mellon University (November 1996).

[45] Zongker, D. E., Werner, D. M., Curless, B., and Salesin, D.H. Environment matting and compositing. In SIGGRAPH 99 (August 1999), 205-214.