University of Waterloo School of Computer Science Technical Report CS-2002-15

Low Latency Photon Mapping using Block Hashing
Vincent C. H. Ma and Michael D. McCool

February 2002

University of Waterloo, School of Computer Science, Waterloo, Ontario, Canada
email: [email protected]

Abstract. Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing algorithms are often based on recursive spatial subdivision techniques, such as kd-trees. However, a hardware implementation of a tree-based algorithm would have a high latency, or would require a large cache to avoid this latency on average. Such an implementation would require specialised hardware. It is much more desirable to use an algorithm that exploits current accelerator capabilities and hardware support, perhaps with some small extensions. We present a neighbourhood-preserving hashing algorithm that is low-latency and has sub-linear access time. This algorithm is more amenable to fine-scale parallelism than tree-based recursive spatial subdivision, and maps well onto coherent block-oriented pipelined memory access. These properties make the algorithm suitable for hardware implementation. In particular, we demonstrate a direct mapping of our technique onto a hardware implementation, and sketch an implementation using future programmable fragment shaders with only one stage of dependent texturing.


Table of Contents

1 Introduction
  1.1 Major Contributions
2 The k-Nearest Neighbours Problem
  2.1 Problem Definition
  2.2 Previous Work
      Recursive Spatial Subdivision
      Graph-based Techniques
      Point Location
      Hashing-based Techniques
      Surveys on kNN Research
3 Block Hashing: Preliminaries
  3.1 Locality-Sensitive Hashing
  3.2 Block-Oriented Memory Model
4 Block Hashing: Details
  4.1 Organising Photons into Blocks
      Space-Filling Curves
      Sorting and Grouping the Photons
      Compaction of blocks
  4.2 Creating the Hash Tables
  4.3 Inserting Photon Blocks
  4.4 Querying
  4.5 Post Processing of Candidate Set
5 Choice of Parameter Values
6 Results
  6.1 Metrics for Gauging Algorithm Performance
  6.2 Results
      Test Scene 1: Ring
      Test Scene 2: Venus with Ring
  6.3 Discussion
7 Hardware Implementation
  7.1 Custom Hardware
  7.2 Accelerator-based Implementation
8 Conclusion
9 Applications
10 Future Work
  10.1 Block Hashing Related
  10.2 Photon Mapping Related
  10.3 Hardware related

1 Introduction

Photon mapping, as described by Jensen [35], is a technique for reconstructing the incoming light field at surfaces everywhere in a scene from sparse samples generated by light path tracing. The major contribution of photon mapping is that the technique efficiently simulates global illumination and realistic caustic lighting effects[1], as exemplified by Figure 1c. The latter is something that all rendering techniques predating photon mapping fail to do effectively.


Figure 1: All three images depict a room containing three transparent spheres of various indices of refraction. Image (a) was rendered with only direct illumination, image (b) with only global illumination, and image (c) with both global illumination and caustic effects.

It is obvious that photon mapping is another important step towards photorealistic rendering. With this technique, architects and interior designers can visualise their designs in a more realistic fashion. They can see how sunlight will shine inside a room, or how light from interior light fixtures interplays with shiny or glossy objects. It is also apparent that photon mapping, along with other photo-realistic rendering techniques, can benefit immensely from a real-time, hardware-assisted[2] implementation. In fact, the current trend is to progressively migrate rendering algorithms onto hardware platforms (for example, the recent work by Purcell et al. [54]), either through adapting algorithms for on-chip evaluation, or through major re-design of algorithms for hardware implementation. It was this trend that motivated the author to investigate the migration of photon mapping towards a hardware-assisted solution.

[1] Caustic lighting effects are the view-independent concentration of light caused by non-diffuse reflections or refractions. Examples include the light focus pattern generated by a magnifying glass and the light pattern at the bottom of an outdoor swimming pool.
[2] There are two categories of hardware-assisted solutions. A hardware implementation is based on custom-designed hardware built for the specific purpose of the algorithm at hand, whereas an accelerator-based implementation realises the algorithm using the generic facilities provided by some hardware platform, such as a graphics card.

Within the two stages of photon mapping, a potential major performance bottleneck[3] is the search for the set of photons nearest to a point in the scene being shaded by the renderer. This search is part of the interpolation step that joins light paths propagated from light sources with rays traced from the camera during rendering, and it is but one application of the well-studied k-nearest neighbours (kNN) problem in the field of computational geometry [17,25,59]. To speed up photon mapping, the first hurdle is to optimise this kNN step, and perhaps migrate the kNN solution onto hardware.

[3] In practice, techniques such as irradiance caching [14] mitigate this bottleneck.

Jensen uses the kd-tree [8,9] data structure to find the kNN. However, solving the kNN problem using the kd-tree requires a search that traverses the tree. Even if the tree is balanced and stored as a heap, traversal still requires random-order memory access and some memory space to store a stack. More importantly, a non-trivial subtree pruning algorithm, based on data already examined, is required to avoid accessing all data in the tree. This introduces serial dependencies between one memory lookup and the next, and is a major flaw in view of our interest in a high-performance hardware implementation of a kNN algorithm.

We therefore decided to look at alternatives to the kd-tree. Since photon mapping is already making an approximation by using kNN interpolation, we conjectured that there is actually no need to solve the kNN problem exactly, so long as visual quality is maintained. In this document we investigate the possibility of solving the approximate kNN problem instead.

1.1 Major Contributions

The algorithm we will present provides an approximate solution to the kNN (AkNN) problem, and has bounded query time, bounded memory usage, and high potential for fine-scale parallelism. Moreover, our algorithm results in coherent, non-redundant accesses to block-oriented memory. The results of one memory lookup do not affect subsequent memory lookups, so accesses can take place in parallel within a pipelined memory system.

It is important to emphasise that our contribution is meant to jump-start the rethinking of how to solve the kNN problem in a hardware setting. It should not be construed as a move to usurp the use of the kd-tree as a solution of the kNN problem in software. Furthermore, any photon mapping acceleration technique that continues to rely on a form of kNN (such as irradiance caching [14]) can still benefit from our technique.

Our algorithm was developed in the context of high-performance hardware-based photon mapping, but its use is not limited to this single application. Any application that needs to solve a form of the kNN problem can benefit; several such applications are listed in Section 9.

2 The k-Nearest Neighbours Problem

2.1 Problem Definition

To aid the description of the prior art in the rest of this chapter, we give the mathematical definitions of the k-nearest neighbours problem (kNN) and the approximate k-nearest neighbours problem (AkNN), and discuss the tradeoff between the two problems.

Definition 1 (k-Nearest Neighbour Problem). Given
• an n-dimensional space D,
• a distance metric d(x, y) : D × D → ℝ, and
• a data set S ⊆ D where |S| = N,
the kNN problem for a query q ∈ D is defined to be the search for the neighbour set N ⊆ S, |N| ≤ min(k, |S|), such that ∀a ∈ N, b ∈ S\N : d(q, a) ≤ d(q, b).

In words, the kNN problem is to find a subset N of the input data set S such that each member of N is "closer" to the query point q (according to the distance metric) than any member of S\N, the set difference between S and N. The approximate kNN problem differs from the exact kNN problem in only one way: the AkNN solution is a (1 + ε) approximation of the true solution, for some error ratio ε > 0.

Definition 2 (Approximate k-Nearest Neighbour Problem). Given
• an n-dimensional space D,
• a distance metric d(x, y) : D × D → ℝ, and
• a data set S ⊆ D where |S| = N,
the AkNN problem for a query q ∈ D is defined to be the search for the neighbour set N ⊆ S, |N| ≤ min(k, |S|), such that ∀a ∈ N, b ∈ S\N : d(q, a) ≤ (1 + ε)d(q, b), for some ε > 0.
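As a point of reference for these definitions, the exact kNN problem can be solved by a brute-force scan. The following minimal Python sketch is an illustration of ours, not part of the original report, with Euclidean distance standing in for the generic metric d; it is the baseline that the approximate techniques discussed below try to beat.

    import math

    def knn_bruteforce(query, data, k):
        """Exact kNN by exhaustive scan over the data set S (Definition 1)."""
        def dist(p):
            # Euclidean metric; any other metric d could be substituted here.
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, query)))
        return sorted(data, key=dist)[:min(k, len(data))]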


Observe that Definition 2 implies that some members of N may be farther away from the query point than members of S\N by the error ratio ε > 0. In other words, the approximation dilates the bounding n-dimensional spheroid that surrounds the nearest neighbours. Each AkNN algorithm has its own ε, usually dependent upon N, k, or some other parameter particular to the technique. More importantly, it is impossible for an approximate solution to the kNN problem to err in a way such that the bounding spheroid is contracted: the exact solution offers the tightest bounding volume, and any approximation can only increase the bounding radius.

The justification for such error is a possible increase in the performance of the search process, be it a speedier search, or one that consumes less memory. Indeed, while the research of Borodin et al. [11] and Chakrabarti et al. [13] supports the claim of a lower bound for the exact kNN problem that is exponential in domain space dimensionality, work on the AkNN problem by Indyk and Motwani [31], Kleinberg [38], and Kushilevitz et al. [41] demonstrates techniques that take time polynomial in dimension and polylogarithmic in the size of the data set.

In the photon mapping setting, the domain space in Definitions 1 and 2 is the 3D Euclidean space ℝ³ and the distance metric d is the Euclidean distance between points in ℝ³. However, in general the domain space D does not have to be a Euclidean space. Moreover, the distance metric d(x, y) does not have to be the Euclidean distance, even if D is a Euclidean space. For example, one method of lossy compression is to perform vector quantisation of the raw data; by comparing the resulting feature vectors to a database of known patterns, one can translate features in the raw data into code words that enumerate known patterns [23].

2.2 Previous Work

Any non-trivial algorithm that claims to be able to solve the kNN problem faster than brute force does so by reducing the number of candidates that have to be examined when computing the solution set. Algorithms fall into the following categories: recursive spatial subdivision, point location, neighbourhood graphs, and hashing.

Recursive Spatial Subdivision

Simply put, recursive spatial subdivision works by partitioning the sample set recursively based on the spatial location of each sample. Then, all samples within a partition can be categorically rejected during the kNN query when the group is deemed farther away from the query point than the current set of kNN candidates.


Amongst algorithms in the recursive spatial subdivision category, the kd-tree [8] method is the approach commonly used to solve the kNN problem [21,66]. An advantage of the kd-tree is that if the tree is balanced it can be stored as a heap in a single array, avoiding the use of pointers and memory for a separate index structure. While it has been shown that kd-trees have optimal expected-time complexity [9], in the worst case finding the k nearest neighbours may require an exhaustive search of the entire data structure. The tree is searched by recursive descent, which requires a stack the same size as the depth of the tree. During the recursion, a choice is made of which subtree to search next based on a test at each internal node. This introduces a dependency between one memory access and the next, and this last factor makes it hard to map the algorithm onto high-latency pipelined memory accesses.

Much work has been done to find methods to optimise the kd-tree method of solving the kNN and AkNN problems. Christensen described an iterative traversal method for a heap-based kd-tree used in photon mapping [37] and claimed a 25% performance improvement, under ideal conditions, over the recursive method. Vanco et al. [70] devised a scheme that employs a "pseudo kd-tree" [52] with hash tables as leaf nodes, the latter helping to limit the number of candidates examined at each leaf node. Havran [27] analysed a novel memory mapping method designed to improve spatial locality of data stored in binary trees, thus speeding up tree traversal algorithms. Sample et al. [62] proposed a kd-tree search algorithm that combines optimal depth-first branch and bound with a new path-ordering and pruning method. Many other recursive subdivision-based techniques have also been proposed for the kNN and AkNN problems, including kd-B-trees [55], BBD-trees [7], BAR-trees [16], Principal-Axis Trees [47], the R-tree family of data structures [40,58], and ANN-trees [44]. Unfortunately, all schemes based on recursive search over a tree share the same memory dependency problem as the kd-tree.

Graph-based Techniques

The second category of techniques is based on building and searching graphs that encode sample-adjacency information. The randomised neighbourhood graph approach [5] builds and searches an approximate local neighbourhood graph. Eppstein et al. [18] investigated the fundamental properties of a nearest neighbour graph. Jaromczyk and Toussaint surveyed data structures and techniques based on Relative Neighbourhood Graphs [34]. Graph-based techniques tend to have the same difficulties as tree-based approaches: searching a graph also involves stacks or queues, dependent memory accesses, and pointer-chasing unsuited to high-latency pipelined memory access.


Point Location

Voronoi diagrams can be used for optimal 1-nearest neighbour searches in 2D and 3D [17]. This and other point-location based techniques [25] for solving nearest neighbour problems do not need to calculate distances between the query point and the candidates, but do need another data structure (like a BSP tree) to test a query point for inclusion in a region.

Hashing-based Techniques

Hashing approaches to the kNN and AkNN problems have recently been proposed by Indyk et al. [31,32] and Gionis et al. [24]. These techniques have the useful property that multi-level dependent memory lookups are not required. The heart of these algorithms is a set of simple hash functions that preserve spatial locality, such as the ones proposed by Linial and Sasson [45] and Gionis et al. [24]. We base our technique on the latter, and shall give details about this choice and describe related issues in the next chapter. The authors also recognise recent work by Wald et al. [71] on real-time global illumination techniques in which a hashing-based photon mapping technique was used.

Surveys on kNN Research

Finally, numerous surveys and books provide an overview of the "Nearest Neighbour(s)" family of problems and catalogue the numerous techniques and data structures developed to solve them. Notable ones include surveys by Agarwal et al. [2,3], Arya and Mount [6], Smid [65], Gaede and Günther [22], and Tsaparas [68], as well as books edited by Goodman et al. [25] and Sack et al. [59].

3 Block Hashing: Preliminaries

We have developed a novel technique called block hashing to solve the approximate kNN (AkNN) problem in the context of, but not limited to, photon mapping. As its name implies, our algorithm uses hash functions to categorise photons by their positions. Then, a kNN query can be performed by deciding which hash bucket is matched to the query point and retrieving the photons contained inside the hash bucket for analysis.

One attraction of the hashing approach is that evaluation of hash functions takes constant time. In addition, once we have the hash value, accessing the data we want in the hash table takes only a single access. These advantages permit us to avoid operations that are serially dependent on one another, such as pointer-chasing or subtree pruning using previously examined data, and are major stepping stones towards a low-latency shader-based implementation.

Furthermore, our technique is designed under two assumptions on the behaviour of memory systems in (future) accelerators. First, we assume that memory is allocated in fixed-sized blocks (which, coincidentally, makes possible constant-time dynamic memory allocation and deallocation without the problems of fragmentation). Second, we assume that access to memory is via burst transfer of blocks that are then cached. Under this assumption, if any part of a fixed-sized memory block is "touched", access to the rest of this block will be virtually zero-cost. This is typical even of software implementations on modern machines, which rely heavily on caching and burst-mode transfers from SDRAM or RDRAM. In a hardware implementation with a greater disparity between processing power and memory speed, using fast block transfers and caching is even more important. Due to these benefits, in Block Hashing all memory used to store photon data is broken into fixed-sized blocks. The rest of this section shall examine these topics in detail.

3.1 Locality-Sensitive Hashing

Since our goal is to solve the kNN problem as efficiently as possible in a block-oriented, cache-based context, our hashing technique requires hash functions that preserve spatial neighbourhoods. Locality-preserving hash functions take points that are close to each other in the domain space and hash them close to each other in hash space. By using such hash functions, photons within the same hash bucket as a query point can be assumed to be close to the query point in the original domain space. Consequently, these photons are good candidates for the kNN search. More than one such scheme is available; we chose to base our technique on the Locality-Sensitive Hashing (LSH) algorithm proposed by Gionis et al. [24], but have added several refinements (which we describe in Section 4).

The hash function in LSH groups one-dimensional real numbers in hash space by their spatial location. It does so by partitioning the domain space and assigning a unique hash value to each partition. Mathematically, let T = {t_i | 0 ≤ i ≤ P} be a monotonically increasing sequence of P+1 thresholds between 0 and 1. Assume t_0 = 0 and t_P = 1, so there are P−1 degrees of freedom in this sequence. Define a one-dimensional locality-sensitive hash function h_T : [0, 1] → {0, ..., P−1} by h_T(t) = i, where t_i ≤ t < t_{i+1}. In other words, the hash value i can take on P different values, one for each "bucket" defined by the threshold pair [t_i, t_{i+1}). An example is shown in Figure 2.


Figure 2: An example of h_T. The circles and boxes represent values to be hashed, while the vertical lines are the thresholds in T. Since the boxes lie between thresholds t_1 and t_2, the values they represent are hashed to 1 by this particular h_T.

The function h_T can be interpreted as a monotonic non-uniform quantisation of spatial position, and is characterised by P and the sequence T. It is important to note that h_T gives each partition of the domain space delineated by T equal representation in hash space. Depending on the location of the thresholds, h_T will contract some parts of the domain space and expand other parts. If we rely on only a single hash table to classify a data set, a query point will only hash to a single bucket within this table, and that bucket may represent only a subset of the true neighbourhood we seek. Therefore, multiple hash tables with different thresholds are necessary for the retrieval of a more complete neighbourhood around the query point (see Figure 3).
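A minimal sketch of h_T in Python, under the assumption that the thresholds t_0, ..., t_P are kept in a sorted list; this is illustrative code of ours, not code from the original implementation.

    import bisect

    def h_T(t, thresholds):
        """1D locality-sensitive hash: return i such that t_i <= t < t_{i+1}.

        thresholds = [t_0, t_1, ..., t_P] with t_0 = 0 and t_P = 1, so values
        in [0, 1] map to one of the P buckets 0 .. P-1.
        """
        i = bisect.bisect_right(thresholds, t) - 1
        return min(max(i, 0), len(thresholds) - 2)  # clamp so t = 1 lands in bucket P-1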


Figure 3: An example of using multiple hash functions to classify a data set. The long vertical line represents the query value. Combining results from multiple hash tables with different thresholds facilitates the retrieval of a more complete neighbourhood from hash space.

To deal with n-dimensional points, each hash table has one hash function per dimension. Each hash function generates one hash value per coordinate of the point (see Figure 4). The final hash value is calculated as

    h_0 + h_1 P + h_2 P² + ... + h_{n-1} P^{n-1},

where h_i are the per-coordinate hash values and P is the number of buckets per dimension. In other words, each hash value is treated as a digit in base P; the final hash value is read as a base-P number with n digits and converted to a decimal number before it is used. If P were a power of two, this would amount to concatenating the bits.[4] The only reason we use the same number of thresholds for each dimension is simplicity. It is conceivable that a different number of thresholds could be used for each dimension to better adapt to the data. We defer the discussion of threshold generation and query procedures until Sections 4.2 and 4.4, respectively.

[4] We could also interleave the bits for greater spatial coherence; however, buckets are usually big enough to fill a cache line.

Figure 4: Using two hash functions to handle a 2D point P. Each hash function hashes one coordinate; in the example shown, h_x(P.x) = 00b and h_y(P.y) = 01b, giving the final hash value 0100b = 4.

LSH is very similar to Nievergelt's grid file [29,50]. However, the grid file was specifically designed to handle dynamic data, whereas LSH must be modified in order to do the same. Also, the grid file is more suitable for range searches than it is for solving the kNN problem.
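To make the combination step concrete, here is an illustrative Python sketch of ours (reusing the h_T sketch from above) that assembles the per-coordinate hash values into a single base-P key, as in the Figure 4 example.

    def hash_point(point, per_dim_thresholds):
        """Combine per-coordinate LSH values into one base-P hash key.

        per_dim_thresholds holds one threshold list per dimension; P is the
        number of buckets per dimension.  Implements h_0 + h_1*P + ... .
        """
        P = len(per_dim_thresholds[0]) - 1
        key = 0
        for i, (coord, thresholds) in enumerate(zip(point, per_dim_thresholds)):
            key += h_T(coord, thresholds) * (P ** i)   # treat h_i as digit i in base P
        return key

    # Figure 4 example with P = 4: h_x = 0, h_y = 1 gives key 0 + 1*4 = 4 (0100b).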

3.2 Block-Oriented Memory Model

It has been our philosophy that hardware implementations of algorithms should treat off-chip memory the same way software implementations treat disk: as a relatively slow, "out-of-core", high-latency, block-oriented storage device. This analogy implies that algorithms and data structures originally developed for disk, or even tape, drives are potentially applicable to hardware design. It also drove us to employ fixed-sized blocks to store the data records involved in the kNN search algorithm, which are photons in the context of this application.

In our prototype software implementation of Block Hashing, each photon is stored in a structure similar to Jensen's "extended" photon representation [35]. As shown in Figure 5, each component of the 3D photon location is represented by a 32-bit fixed-point number. The unit vectors representing the incoming direction d̂ and surface normal n̂ are quantised to 16-bit values using Jensen's polar representation. Photon power is stored in four channels using 16-bit floating point numbers (following IEEE 754 conventions, but using an 8-bit mantissa, a 7-bit exponent, and a 1-bit sign). This medium-precision signed representation permits other AkNN applications beyond that of photon mapping. Four (rather than three) 16-bit colour channels are also included to better match the four-vectors supported in fragment shaders. For the photon mapping application specifically, replacing these four 16-bit colour values with a colour represented in the Ward RGBE format [72] is certainly compatible with our technique. Likewise, in a shader implementation, we might use a different representation for the normal and direction unit vectors.


Figure 5: Representation of a photon record. The 32-bit values (x, y, z) denote the position of a photon and are used as keys. Two quantised 16-bit unit vectors d̂, n̂ and four 16-bit floating point values are carried as data.

After all photons are generated, their records are stored in fixed-sized memory blocks. Photon records are not permitted to span block boundaries, so the number of records that fit in a block depends on the size of each record and the size of the block. For instance, our photon representation occupies six 32-bit words. Block Hashing uses a block size of 64 32-bit words, chosen to permit efficient burst-mode transfers over a wide bus to transaction-oriented DRAM[5]. Using a 128-bit wide path to DDR[6] SDRAM[7], for instance, transfer of this block would take eight cycles, not counting the overhead of command cycles to specify the operation and the address. Using next-generation QDR[8] SDRAM this transfer would take only four cycles (or eight on a 64-bit bus, etc.).

[5] Dynamic Random Access Memory
[6] Double Data Rate
[7] Synchronous DRAM [1]
[8] Quad Data Rate


Ten photons will fit into a 64-word block with four words left over. Some of this extra space is used in our implementation to record how many photons are actually stored in each block. For some variants of the data structures we describe, this extra space could also be used for flags or pointers to other blocks. It might be possible or desirable in other implementations to support more or fewer colour channels, or channels with greater or lesser precision, in which case some of our numerical results would change.
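As an illustration of this layout, the following Python sketch packs one photon record into six 32-bit words. It is an approximation of ours: it uses standard IEEE 754 half floats for the four power channels rather than the 8-bit-mantissa, 7-bit-exponent format described above, and leaves the fixed-point and polar encodings to the caller.

    import struct

    # Six 32-bit words per photon: 3 x 32-bit fixed-point position, 2 x 16-bit
    # quantised unit vectors (incoming direction, surface normal), and
    # 4 x 16-bit power channels.  "<3I2H4e" packs to 24 bytes.
    PHOTON_FORMAT = "<3I2H4e"
    PHOTON_WORDS = struct.calcsize(PHOTON_FORMAT) // 4   # = 6 words
    BLOCK_WORDS = 64
    PHOTONS_PER_BLOCK = BLOCK_WORDS // PHOTON_WORDS      # = 10, with 4 words left over

    def pack_photon(x, y, z, d_hat, n_hat, power):
        """Pack one photon record (already quantised) into 24 bytes."""
        return struct.pack(PHOTON_FORMAT, x, y, z, d_hat, n_hat, *power)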

4 Block Hashing: Details

Block Hashing (BH) contains a preprocessing phase and a query phase. The preprocessing phase consists of three steps. After photons have been traced into the scene, the algorithm

1. organises the photons into fixed-sized memory blocks,
2. creates a set of hash tables, and
3. inserts photon blocks into the hash tables.

In the second phase, the hash tables will be queried for a set of candidate photons from which the k nearest photons will be selected for each point in space to be shaded by the renderer.

4.1 Organising Photons into Blocks

Due to the coherence benefits associated with block-oriented memory access, Block Hashing starts by grouping photons and storing them into fixed-sized memory blocks. However, the benefits of accessing groups of photons together are only obtained if the photons within a group are close together spatially.

To see why, recall that a working set is maintained in the search process within any kNN algorithm. This set contains the best candidates (i.e., closest neighbours) found by the algorithm so far, and a bounding volume that encloses the data in the working set. An incoming data point is examined by comparing its distance from the query point with the size of the bounding volume; if the incoming point is inside the current bounding volume, then it will replace the farthest candidate within the working set, thus shrinking the bounding volume.

When working with blocks of data points, a block is examined if the bounding volume of the data inside the block intersects the bounding volume of the working set. A block whose data are close together has a smaller bounding volume, so such blocks have a higher chance of being pruned at later stages of a query, when the bounding volume of the working set is small. On the other hand, a block with a large bounding volume will intersect a larger variety of working sets, implying a lower chance of being pruned, thus slowing down the query.


Figure 6: (a) The new block (squares within the dashed rectangle) contains spatially coherent points, which have a higher probability of being inside the bounding sphere of the current kNN set, and are apt to shrink the bounding sphere. (b) This is less probable with a block of highly variant points.

Also, the data points within a selected block in the kNN search have a higher chance of being of "good quality", that is, within a tighter range of distances from the query point (Figure 6a). Conversely, if the selected block has highly variant points, these points have a higher probability of being far away from the query point (Figure 6b). As such, these points are rejected more often, which leads to the need to examine more blocks, which slows down the query. Hence, it is important that the photons be sorted spatially before grouping. For this we turn to spatially coherent space-filling curves.


Space-Filling Curves

A space-filling curve [60] maps n-dimensional points in the box [0, 1]^n onto a one-dimensional curve, and provides a one-dimensional order to all points on this curve. The curve itself is continuous and never crosses itself. In particular, we will employ the Hilbert curve [20,28,64]. An approximation of the 2D Hilbert curve is shown in Figure 7a.

Figure 7: Space-filling curve approximations: (a) the Hilbert curve; (b) the Z-order curve.

Finding the quantised arc-length distance s along a Hilbert curve from quantised spatial positions (x, y, z), and vice versa, are relatively straightforward operations. In 2D, these conversion algorithms can be represented using a reversible state machine with only four states [10,46]. In 3D, the conversion is slightly more complex [12,49], but it can still be implemented reasonably efficiently. The advantage of the Hilbert curve encoding of position is that points mapped near each other on the Hilbert curve are guaranteed to be within a certain distance of each other in the original domain [33,69]. Points nearby in the original domain space have a high probability of being nearby on the curve, although there is a non-zero probability of them being far apart on the curve. If we sort photons by their Hilbert curve order before packing them into blocks, then the photons in each block will have a high probability of being spatially coherent. Each block then corresponds to an interval of the Hilbert curve, which in turn covers some compact region of the domain (see Figure 10a). Each region of domain space represented by the blocks is independent, and regions do not overlap.
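As a concrete illustration of such an encoding, here is a minimal Python sketch of the bit-interleaved Z-order (Morton) key for quantised 3D coordinates, discussed below; the Hilbert encoding favoured by Block Hashing offers the same interface but needs a more involved state-machine conversion. This sketch is our own illustration, not the authors' code.

    def _spread_bits(v, bits=10):
        """Insert two zero bits between consecutive bits of v (v < 2**bits)."""
        out = 0
        for i in range(bits):
            out |= ((v >> i) & 1) << (3 * i)
        return out

    def morton3(x, y, z, bits=10):
        """Z-order (Morton) key of a quantised 3D point: interleave the coordinate bits."""
        return (_spread_bits(x, bits)
                | (_spread_bits(y, bits) << 1)
                | (_spread_bits(z, bits) << 2))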


Other space-filling curves or orderings could be used besides the Hilbert curve. A particularly interesting one is the bit-interleaved or "Z-order" curve [51] (also referred to as the Morton curve [48]). This curve is shown in Figure 7b. Unlike the case with the Hilbert curve, points nearby on the Z-order curve are not guaranteed to be nearby in space. With the Z-order curve there are periodic "jumps" in position, with longer jumps having an exponentially decreasing probability of occurring. The main advantage of the bit-interleaving scheme is that it can be implemented in hardware at zero cost; even in a software implementation Z-order encoding is simpler and faster than Hilbert encoding. In fact, the Hilbert conversion algorithms include bit-interleaving as a subtask. Ultimately, the Hilbert curve is favoured over the Z-order curve because it better preserves spatial coherence. Henceforth, we will use the "Hilbert" terminology exclusively, with the understanding that other spatially coherent 1D orderings of multidimensional sample positions could be used. In particular, Block Hashing is compatible with the Z-order curve.

Sorting and Grouping the Photons

Block Hashing sorts photons and inserts them into a B+-tree [15,39] using the Hilbert curve encoding of the position of each photon as the key. This method of spatially grouping points was first proposed by Faloutsos and Rong [19] for a different purpose. Since a B+-tree stores photon records only at leaves, with careful construction the leaf nodes of the B+-tree can serve as the photon blocks used in the later stages of Block Hashing. One advantage of using a B+-tree for sorting is that insertion cost is bounded: the tree is always balanced, and in the worst case we may have to split h nodes in the tree, where h is the height of the tree. Also, B+-trees are optimised for block-oriented disk storage, as we have assumed.

Compaction of blocks

The B+-tree used by Block Hashing has index and leaf nodes that are between half full and completely full. To minimise the final number of blocks required to store the photons, the leaf nodes are compacted (see Figure 8). After the photons are sorted and compacted, the resulting photon blocks are ready to be used by Block Hashing, and the B+-tree index and any leaf nodes made empty by compaction are discarded. If the complete set of photons is known a priori, the compact B+-tree [56,57] for static data can be used instead; this data structure maintains full nodes and avoids the extra compaction step.

Figure 8: Compaction of photon blocks. (a) The B+-tree when every photon has been inserted. Most of the leaf nodes have empty cells, but all are at least half full. (b) All the photon records have been shifted towards one end and packed into the leaf nodes. After compaction, empty nodes are discarded along with the rest of the B+-tree index, while non-empty nodes become photon blocks.

Regardless, observe that each photon block contains a spatially clustered set of photons disjoint from those contained in other blocks. This is the main result we are after; any other data structure that can group photons into spatially coherent groups, such as grid files [29,50], can be used in place of the B+-tree and space-filling curve. Other techniques could of course be used for sorting. Another approach would be to store photons sequentially in a linked list of blocks as they are generated, and then sort the photons using a sequential-access algorithm, such as the ones originally developed for tape drives [39].
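This grouping step can be sketched as follows in illustrative Python, under our own assumptions: instead of the incremental B+-tree construction described above, it simply sorts the photons by a space-filling-curve key (the morton3 sketch standing in for the Hilbert encoding) and packs them into fixed-size blocks, producing the same kind of disjoint, spatially clustered blocks.

    def build_photon_blocks(photons, quantise, photons_per_block=10):
        """Group photons into spatially coherent, fixed-size blocks.

        `photons` is a sequence of records with .x, .y, .z position fields and
        `quantise` maps a coordinate in [0, 1) to an integer grid coordinate.
        """
        ordered = sorted(
            photons,
            key=lambda p: morton3(quantise(p.x), quantise(p.y), quantise(p.z)))
        return [ordered[i:i + photons_per_block]
                for i in range(0, len(ordered), photons_per_block)]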

4.2 Creating the Hash Tables

The hash tables used in Block Hashing are based on the LSH scheme described in Section 3.1. Block Hashing generates L tables in total, each table having three hash functions (since photons are being classified by their 3D positions), and each hash function having P+1 thresholds.

There are many ways to generate the threshold set T. The simplest way is to generate them randomly. To improve on purely random thresholds slightly, P−1 stratified samples can be taken over the range [0, 1) in order to have some guarantee that the thresholds are distributed over the entire range. However, preliminary results obtained by using purely random thresholds and stratified random thresholds proved abysmal. We conjecture that random thresholds between different hash tables do not give adequate overlap and variety of domain-space partitions to achieve desirable results.

The next attempt was to use compression waves:

    t_i = a i + (b / 2π) sin(ω i + θ_j),
    a = 1/P,    ω = 2π/P,    θ_j = 6πj / J,

where j identifies the jth hash function in each hash table, J is the number of hash functions in each hash table, which is also the dimensionality of the domain space (J = 3 in our case), and i gives the ith threshold in the threshold set T. The parameter a has been set up to give P thresholds in a period over [0, 1), with θ_j changing their phase. The only free parameter is b, which must lie in the range between 0 and 1. Choosing b = 0 would give a uniform subdivision, but with varying phases; such a selection of thresholds would be useful if the sample density were uniform, and yields partitions of equal size. If b = 1, then the buckets would vary widely in size; this would correspond to the case where the density of photons varies over the domain space, as with caustic photons. Unfortunately, it turns out that it is very hard to control b: it is infeasible to determine the variance of density to a high enough degree to decide accurately which b value to use. Even if the effort were spent to calculate a good b value, the compression waves cannot be made to capture the true photon density easily.

Block Hashing therefore employs an adaptive method that generates the thresholds based on the photon positions. For each dimension, a one-dimensional histogram of photon positions is built during the compaction scan of the previous preprocessing step. Then, the histogram is integrated to obtain a cumulative distribution function (cdf). Lastly, stratified samples are taken from the inverse of the cdf to obtain threshold locations. The resulting thresholds will be far apart where there are few photons, and close together where photons are numerous. Ultimately this method attempts to place a similar number of photons in each bucket. The cost of building the cdfs is O(nN), where n is the dimensionality of the domain space and N is the number of photons. Once the cdfs are built, they are reused for all L hash tables. For each hash function in each hash table, P−1 samples are taken from the inverted cdf, taking O(P log C) time, where C ≪ N is the number of buckets in the original histogram. Therefore, generating all thresholds needed for Block Hashing requires O(nN + nLP log C) time.

Hash tables are stored as a one-dimensional array structure, shown in Figure 9. The hash key selects a bucket out of the P^n available buckets in the hash table. Each bucket refers to up to B blocks of photons, and has space for some record-keeping data. We defer the discussion of the choice of P, L and B until Section 5.
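A minimal sketch of the adaptive threshold generation in illustrative Python, under assumptions of ours: coordinates already scaled to [0, 1) and a plain list standing in for the histogram accumulated during the compaction scan.

    def adaptive_thresholds(coords, P, bins=256):
        """Place P+1 thresholds so that each of the P buckets holds a similar photon count.

        coords: photon coordinates along one dimension, scaled to [0, 1).
        Returns [t_0 = 0, ..., t_P = 1] obtained by sampling the inverse of the cdf.
        """
        hist = [0] * bins                          # histogram of photon positions
        for c in coords:
            hist[min(int(c * bins), bins - 1)] += 1
        cdf, running = [], 0                       # integrate to a cumulative distribution
        for count in hist:
            running += count
            cdf.append(running)
        thresholds = [0.0]
        for i in range(1, P):                      # evenly spaced samples of the inverse cdf
            target = running * i / P
            b = next(j for j, c in enumerate(cdf) if c >= target)
            thresholds.append((b + 1) / bins)
        thresholds.append(1.0)
        return thresholds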



Figure 9: Hash tables are linear arrays of P^n buckets. Each bucket has B references to photon blocks, a validity flag per reference, and storage for a priority.

4.3 Inserting Photon Blocks

In Block Hashing, references to entire photon blocks, rather than individual photons, are inserted into the hash tables. One reason for doing so is to reduce the memory required per bucket. Another, more important, reason is that when merging results from multiple hash tables (Section 3.1), Block Hashing needs to compare only block addresses, instead of photons, when weeding out duplicates, since each block contains a unique set of photons. This means fewer comparisons have to be made, and the individual photons are only accessed once per query, during post-processing of the merged candidate set to find the k nearest photons. Consequently, the transfer of each photon block through the memory system happens at most once per query. All photons accessed in a block are potentially useful contributions to the candidate set, since photons within a single block are spatially coherent. Due to our memory model assumptions, once we have looked at one photon in a block it should be relatively inexpensive to look at the rest.

Each bucket in a hash table corresponds to a rectangular region in a non-uniform grid, as shown in Figure 10b. Each block is inserted into the hash tables once for each photon within that block, using the positions of these photons to create the keys. Each bucket of the hash table therefore refers not only to the photons that have been hashed into that bucket, but also to the compact region of the domain that the photon blocks cover, and to all the rest of the photons included in those blocks (see Figure 10c). Indeed, the axes of the Hilbert curve are aligned with those of the thresholds in the hash tables. Conceivably the cracks in the Hilbert curve[9] could coincide with one of the thresholds and thus cause visual artifacts. A small rotation of the Hilbert curve space relative to the principal axis can address this. The hash table should fit tightly around the scene to avoid wasting space in it. We enlarged the space of Hilbert-quantised coordinates instead, to avoid having the corners of the scene clipped (and hence not encoded by the Hilbert curve).

[9] Places where points close to each other in domain space are encoded to far-away points on the Hilbert curve, such as in the centre of the Hilbert curve.

Figure 10: Block hashing illustrated. (a) Each block corresponds to an interval of the Hilbert curve, which in turn covers some compact region of the domain. Consequently, each bucket (b) represents all photons (highlighted with squares) in each block with at least one photon hashed into it (c).

Since each photon block is inserted into a hash table multiple times, using different photons as keys, a block may be hashed into multiple buckets in the same hash table. Of course, a block should not be inserted into a bucket again if it already exists in that bucket of the hash table. More importantly, our technique ensures that each block is inserted into at least one hash table. Orphaned blocks are very undesirable since the photons within will never be considered in the subsequent AkNN evaluation and will cause a constant error overhead. Hence, our technique does not naïvely drop a block that causes a bucket to overflow.

There may be collisions that cause buckets to overflow, especially when a large bucket load factor is chosen to give a compact hash table size, or when there is a large variation in photon density. Our algorithm uses two techniques to address this problem. The first technique attempts to insert every block into every hash table, but in different orders for different hash tables, so that blocks that appear earlier in the ordering are not favoured for insertion in all tables. Block Hashing uses a technique similar to disk striping [61], illustrated by the pseudocode in Figure 11.

    for h from 0 to (number_of_hash_tables - 1)
        for b from 0 to (number_of_blocks - 1)
            idx = (h + b) modulo L
            insert block[b] into hashtable[idx]
        endfor
    endfor


Figure 11: Striping insertion strategy.

The diagram in the same figure serves as an example. Hash table zero receives blocks 0 and 3 in the first iteration, then 1 and 4 in the second, and so on; hash table one receives blocks 1 and 4 first, then 2 and 5, with 0 and 3 being last.

The second technique involves a strategy to deal with overflow in a bucket. For each photon block, Block Hashing keeps a count of the buckets that the block has been hashed into so far. When a block causes overflow in a bucket, the block in the bucket that has the maximum count will be bumped if that count is larger than one, and larger than that of the incoming block. This way we ensure that all blocks are inserted into at least one bucket, given adequate hash table sizes, and no block is hashed into an excessive number of buckets.
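A sketch of the insertion phase combining both techniques (striping and count-based bumping), in illustrative Python. It reuses the hash_point sketch from Section 3.1; the dictionary representation of a table, with "thresholds" and "buckets" entries, is an assumption of this sketch rather than a detail from the paper.

    def insert_blocks(blocks, tables, bucket_capacity):
        """Insert references to every photon block into every hash table.

        Each table has "thresholds" (one list per dimension) and "buckets",
        a dict mapping hash key -> list of block ids.  hashed_count[b] tracks
        how many buckets reference block b, so an overflowing bucket can bump
        its most-replicated block instead of simply dropping the newcomer.
        """
        L = len(tables)
        hashed_count = [0] * len(blocks)
        for h in range(L):                                   # striping: vary insertion order
            for b, block in enumerate(blocks):
                table = tables[(h + b) % L]
                for photon in block:
                    key = hash_point((photon.x, photon.y, photon.z), table["thresholds"])
                    bucket = table["buckets"].setdefault(key, [])
                    if b in bucket:
                        continue                             # block already referenced here
                    if len(bucket) < bucket_capacity:
                        bucket.append(b)
                        hashed_count[b] += 1
                        continue
                    victim = max(bucket, key=lambda v: hashed_count[v])   # overflow handling
                    if hashed_count[victim] > 1 and hashed_count[victim] > hashed_count[b]:
                        bucket[bucket.index(victim)] = b
                        hashed_count[victim] -= 1
                        hashed_count[b] += 1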

4.4 Querying

Recall from Sections 3.1 and 4.2 that there are multiple hash tables serving as parallel and complementary indices to the photon data. A query into the Block Hashing data structure proceeds by delegating the query to each of the L hash tables. These parallel accesses will yield as candidates all photon blocks represented by buckets that matched the query. The final approximate nearest neighbour set comes from scanning the unified candidate set for the nearest neighbours to the query point (see Figure 12). Note that, unlike kNN algorithms based on hierarchical data structures, where candidates for the kNN set trickle in as the traversal progresses, in Block Hashing all candidates are available once the parallel queries are completed and unique photon blocks are extracted. Therefore, Block Hashing can use an algorithm like selection [43], which has O(n) time complexity, when selecting the k nearest photons, instead of an algorithm such as a priority queue, which has time complexity O(n log k).


Figure 12: A 2D example of merging the results from multiple hash tables. (a) shows the query point returning different candidate sets from different hash tables, (b) shows the union set after merging the results, and (c) shows the best two neighbours selected from the combined candidate set.

Each query will retrieve one bucket from each of the L hash tables. If the application can tolerate elevated inaccuracy in return for increased query speed (for example, to pre-visualise a software rendering), it may be worthwhile to consider using only a subset of the L candidate sets. Block Hashing is therefore equipped with a user-specified accuracy setting: let A ∈ ℕ be an integer multiplier. The Block Hashing algorithm will only consider Ak candidate photons in the final scan to determine the k nearest photons to a query. Obviously, the smaller A is, the fewer photons will be processed in the final step; as such, query time is significantly reduced, but with an accuracy cost. Conversely, a higher A will lead to a more accurate result, but it will take more time. Experimental results that demonstrate the impact of the choice of A will be explored in Section 6.


There needs to be a way to select the buckets from which the Ak candidate photons are obtained. Obviously, we want to devise a heuristic to pick the "best" candidates. Suppose every bucket in every hash table has a priority given by

    α = |bucket capacity − #entries − #overflows|,

where "#overflows" is the number of insertion attempts after the bucket became full. The priority can be pre-computed and stored in each bucket of each hash table during the insertion phase. The priority of a bucket is smallest when the bucket is full and did not overflow. Conversely, when the hash bucket is under-utilised or overflow has occurred, α will be larger. If a bucket is under-utilised, it is probably too small spatially (relative to local sample density). If it has experienced overflow, it is likely too large spatially, and covers too many photon block regions. During each query, Block Hashing will sort the L buckets returned from the hash tables by their priority values, smallest values of α first. Subsequently, buckets are considered in this order, one by one, until the required Ak photons are found. In this way the more "useful" buckets are considered first.
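Putting the query path together, a sketch in illustrative Python: it again reuses hash_point and the dictionary table layout assumed earlier, with each bucket here assumed to carry its pre-computed priority alpha and its block id list (a slightly richer record than the plain list used in the insertion sketch).

    def query_candidates(q, tables, blocks, A, k):
        """Gather up to A*k candidate photons for query point q.

        Probes one bucket per hash table, orders the buckets by stored priority
        (smallest alpha first), and accumulates photons from unique blocks
        until A*k candidates have been collected.
        """
        hit = []
        for table in tables:
            key = hash_point(q, table["thresholds"])
            bucket = table["buckets"].get(key)
            if bucket is not None:
                hit.append(bucket)
        hit.sort(key=lambda bucket: bucket["alpha"])        # most "useful" buckets first
        seen, candidates = set(), []
        for bucket in hit:
            for block_id in bucket["block_ids"]:
                if block_id in seen:                        # duplicates weeded out by block address
                    continue
                seen.add(block_id)
                candidates.extend(blocks[block_id])
            if len(candidates) >= A * k:
                break
        return candidates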

4.5 Post Processing of Candidate Set

Once the candidate set has been found, we must select from it the samples most likely to be the true kNN set. The first test is to compute the distance from each sample in the candidate set to the query point and then discard all but the closest k samples. This can be done by sorting on the squared distances.

The second test uses the normal n̂ at the query point. Any incoming sample with a direction d̂_i "below the horizon", with d̂_i · n̂ < 0, can be discarded. We also store the normal n̂_i of the surface at which each sample arrives. By discarding samples for which n̂_i · n̂ < θ, for some threshold θ, we can avoid including energy from photons that might have landed on a nearby surface with a significantly different orientation. Finally, for surfaces with near-specular BRDFs, we may even want to add another test to discard samples that arrived from a direction significantly different from the direction of the reflection vector.

We can apply the latter tests either during the insertion into a distance-sorted list of k neighbours or afterwards. Both tests require a linear scan through the candidates to make the necessary calculations. However, the normal and orientation tests take less time to perform than the distance test, and if done first these tests eliminate candidate photons that are of poor quality in the context of photon mapping before the nearest photons are found. Indeed, our experiments show that if the normal and orientation tests are done before the rejection by distance, Block Hashing is more likely to be able to extract a set of "better" neighbour photons from the candidate set.
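A sketch of this post-processing step in illustrative Python, under assumptions of ours: each candidate photon carries a position p, an incoming direction d and a surface normal n as plain tuples, and cos_threshold plays the role of the orientation threshold θ.

    def select_k_nearest(candidates, q, n_hat, k, cos_threshold=0.0):
        """Orientation-filter the candidates first, then keep the k closest.

        Discards photons arriving from below the horizon (d . n_hat < 0) and
        photons whose surface normal disagrees with the query normal
        (n . n_hat < cos_threshold), then sorts survivors by squared distance.
        """
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        survivors = [p for p in candidates
                     if dot(p.d, n_hat) >= 0.0 and dot(p.n, n_hat) >= cos_threshold]
        survivors.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p.p, q)))
        return survivors[:k]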

5 Choice of Parameter Values

Block Hashing is a scheme that requires several parameters: B, the bucket capacity; L, the number of hash tables whose results are merged; and P, the number of thresholds per dimension. We would like to determine reasonable values for these parameters as functions of k, the number of nearest neighbours sought, and N, the number of photons involved.

It is important to realise the implications of these parameters. The total number of 32-bit pointers to photon blocks is LP³B. This, along with the 3LP thresholds, defines the memory required by the Block Hashing data structure itself, in addition to the memory required by the photon records. The upper bound for this value is 6N words, the number of photons multiplied by the six 32-bit words each photon takes up in our implementation. If we allow B to be a fixed constant for now, the constraint LP³ + 3LP ≤ N arises from the reasonable assumption that we do not want to have more references to blocks than there are photons, or more memory used in the index than in the data. One way to look at it is that L and P should be no more than the fourth root of N. In fact, L and P should take on much smaller values to be practical, since L = P = N^(1/4) implies a 100% memory overhead for using Block Hashing in photon mapping!

Empirically, L = P = ln N has turned out to be a good choice. ln N remains sub-linear as N increases, and this value gives a satisfactory index memory overhead ratio. There are a total of B(ln N)⁴ block references; at four bytes each, the references require 4B(ln N)⁴ bytes. Across the hash tables there need to be 3LP = 3(ln N)² thresholds; represented by a 4-byte float each, the thresholds take another 12(ln N)² bytes. Next, assuming one photon block can hold ten photons, N photons require N/10 blocks; each block occupies 64 words, so the blocks require 25.6N bytes in total. The total memory required for the N photons themselves, each occupying 6 words, is 24N bytes. This gives an overhead ratio of

    ( 4B(ln N)⁴ + 12(ln N)² + 25.6N − 24N ) / 24N.    (1)

The choice of B is also dependent on the value of k specified by the situation or the user. However, since it is usual in photon mapping that k is known ahead of time, B can be set accordingly. B should be set such that the total number of photons retrieved from the L buckets for each query will be larger than k.


Mathematically speaking, each photon block in our algorithm holds ten photons, hence we require 10LB ≫ k. In particular, 10LB > Ak should also be satisfied. Since we choose L = ln N, rearranging the inequality yields

    B > Ak / (10 ln N).

For example, assuming A = 16, N = 2,000,000 and k = 50, then ⌈B⌉ = 6. If we substitute B back into Equation (1), we obtain the final overhead equation

    ( (4Ak/10)(ln N)³ + 12(ln N)² + 1.6N ) / 24N.    (2)

Figure 13 plots the number of photons versus the memory overhead. For the usual range of photon counts in a photon mapping application, we see that the memory overhead, while relatively large for small numbers of photons, becomes reasonable for larger numbers of photons, and has an asymptote of about 6%. Of course, if we assumed a different block size (cache line size), these results would vary, but the analysis is the same.
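A quick worked check of these formulas in Python (our own arithmetic, reproducing the example numbers above using Equation (1) with the rounded-up B):

    import math

    def block_hashing_parameters(N, k, A):
        """Bucket capacity B and the index memory overhead ratio of Equation (1)."""
        lnN = math.log(N)
        B = math.ceil(A * k / (10 * lnN))                    # B > Ak / (10 ln N)
        overhead = (4 * B * lnN ** 4 + 12 * lnN ** 2 + 1.6 * N) / (24 * N)
        return B, overhead

    # For A = 16, N = 2,000,000, k = 50 this gives B = 6 and an overhead of about 9%.
    print(block_hashing_parameters(2_000_000, 50, 16))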

Figure 13: Plot of photon count versus the memory overhead ratio (%) incurred by Block Hashing, assuming k = 50, for A = 4, 8 and 16.

6 Results

For Block Hashing to be a useful AkNN algorithm, it must have satisfactory algorithmic accuracy. Moreover, in the context of photon mapping, Block Hashing must also produce good visual accuracy. This section will demonstrate that Block Hashing satisfies both requirements, while providing a speed-up in terms of the time it takes to render an image, even in a software implementation (which is, however, not our main target).

6.1 Metrics for Gauging Algorithm Performance

To measure algorithmic accuracy, our renderer was rigged to use both the kdtree and Block Hashing based photon maps. For each kNN query the result sets were compared for the following metrics: False negatives: the number of photons that Block Hashing incorrectly identified as not in the kNN set. Maximum distance dilation: the ratio of the distances between the query point and the “farthest” neighbour reported by the two algorithms. Average distance dilation: the ratio of the average distances between the query point and each of the nearest neighbours reported by the two algorithms. While the first two metrics are gauges for algorithm correctness, the last two are specifically designed for photon mapping to see how the approximation affects the performance of the technique. To gauge visual accuracy, we calculate a Caustic RMS Error metric, which compares the screen space radiance difference between the caustic radiance values (only) obtained from kd-tree and Block Hashing. A timing-related Time Ratio metric is calculated as a ratio of the time taken for a query into the Block Hashing data structure versus that for the kd-tree data structure. Obviously, as the accuracy parameter A increases, the time required for photon mapping using Block Hashing approaches that for a kd-tree based photon mapping. 6.2

6.2 Results

The rest of this section presents the scenes with which we have tested Block Hashing. Each presentation includes a physical description of the scene and of what we wish to test with it. This is followed by a comparison of the images rendered with the stock kd-tree-based solution and with the Block Hashing based solution. Finally, the algorithmic accuracy and timing metrics are plotted.

Test Scene 1: Ring
Our first test scene, shown in Figure 14 with numerical results in Figure 15, consists of a highly specular ring placed on top of a non-Lambertian plane (in fact, the surface was given a Modified Phong [42] reflectance model). This scene tests the ability of Block Hashing to handle a caustic of varying density, and a caustic that has been cast onto a non-Lambertian surface.


Figure 14: "Ring" images: (a) kd-tree, (b) BH, A=4, (c) BH, A=16, (d) BH, A=8.

Test Scene 2: Venus with Ring
Figure 16 shows a second scene consisting of the Venus bust, with a highly specular ring placed around the neck of Venus. Figure 17 shows the numerical statistics for this test scene. The ring casts a cardioid caustic onto the (non-planar) chest of the Venus. This scene demonstrates a caustic on a highly tessellated curved surface. Global illumination is also used for both scenes; however, the times given are only for the query time of the caustic maps.


Figure 15: "Ring" statistics: false negatives/positives (average number of errors), radius dilation (maximum and average), caustic radiance RMS error, and query time ratio, each plotted against the accuracy setting A.

Figure 16: "Venus with Ring" images: (a) kd-tree, (b) BH, A=16, (c) BH, A=8, (d) BH, A=4.

6.3 Discussion

The general trend to notice is that for extremely low accuracy (A) settings the visual and algorithmic performance of Block Hashing is not very good, as indicated by the elevated radiance RMS error and distance dilation. The candidate set is simply not large enough in these cases. However, as A increases, these performance indicators drop to acceptable levels very quickly, especially for values of A between 2 and 8. After A = 8, diminishing returns set in, and further increases in accuracy incur a significant increase in query time. This numerical error comparison is paralleled by the visual comparison of the images: the images rendered with A = 8 and A = 16 are virtually indistinguishable, as indicated by the small difference in radiance RMS error. These results suggest that intermediate values of A, between 8 and 10, should be used as a good compromise between query speed and solution accuracy.

It is apparent from the query time ratio plots that there is a close-to-linear relationship between the value of A and the time required for a query into Block Hashing. This is consistent with the design of the A parameter; it corresponds directly to the number of photons accessed and processed for each query.

Another important observation that can be made from the visual comparisons is that images rendered with a greater degree of approximation (that is, a lower accuracy setting A) look darker.

Figure 17: "Venus with Ring" statistics: false negatives/positives (average number of errors), radius dilation (maximum and average), caustic radiance RMS error, and query time ratio, each plotted against the accuracy setting A.

This is because the density estimate is based on the inverse square of the radius of the sphere enclosing the k nearest neighbours. The approximate radius is always larger than the true radius. This is an inherent problem with any approximate solution to the kNN problem, and indeed even with the exact kNN density estimator: as k goes to infinity, the kNN density estimator does converge on the true density, but always from below.
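To make the source of this bias explicit, recall the standard photon map radiance estimate (in the usual form found in Jensen [35]; the notation here is ours):

\[
L_r(x, \vec{\omega}) \approx \frac{1}{\pi r_k^2} \sum_{p=1}^{k} f_r(x, \vec{\omega}_p, \vec{\omega})\, \Delta\Phi_p ,
\]

where $r_k$ is the radius of the sphere enclosing the k nearest photons and $\Delta\Phi_p$ is the flux carried by photon p. Any dilation of $r_k$ scales the estimate down by the square of the dilation factor, which is why the more approximate images are slightly darker.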

7 Hardware Implementation

Two possible implementations of Block Hashing are sketched in this section. First, we describe an implementation based on custom-designed hardware. Then, we sketch an accelerator-based implementation.

7.1 Custom Hardware

The algorithm we have described is useful in a software implementation, although in software its speed advantage over a kd-tree kNN algorithm is not as significant as the advantage a hardware implementation could provide. However, our algorithm maps well onto a pipelined hardware implementation, as we demonstrate here. Such a hardware implementation would return a sequence of candidate samples to a shader unit in an accelerator, which could then select which samples to use in a shader program based on programmable criteria. We do not show the Hilbert-order sorting and block compaction phases, but only the lookup phase (ideally these would also be accelerated, along with the path-tracing computations needed to generate the photons, but that is another paper). We assume that hash table entries for the L hash tables have already been computed.

Our overall hardware architecture is shown in Figure 18. Hashing and table lookup take place in parallel across multiple lookup units. Each lookup unit returns a stream of up to B block addresses, along with a priority. A tree of merging units then combines these streams into a single stream of block addresses, removing duplicates when possible and sorting on priority. When the same block appears with different priorities on the inputs of a merging unit, only the one with the best priority is output. The merging unit forwards the highest-priority (lowest α_ℓ value) block address and advances that input stream by asserting the appropriate NEXT signal. If both streams present identical block addresses, it advances both to remove the duplicate.

The output of the system is a sequence of block addresses, sorted by priority, but possibly still containing some duplicates. Before accessing memory, it would be best if these duplicates were removed.
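As a software analogue of one merging unit (a sketch only; the names and types are ours, not a hardware specification), the following C++ routine merges two priority-sorted streams of block references, preferring the better priority when the same block reaches both stream heads:

```cpp
#include <cstdint>
#include <vector>

struct BlockRef {
    uint32_t addr;     // photon block address
    float    priority; // lower is better (like the per-bucket alpha value)
};

// Merge two streams already sorted by ascending priority into one sorted
// stream, dropping duplicate block addresses that meet at the stream heads.
std::vector<BlockRef> mergeStreams(const std::vector<BlockRef>& a,
                                   const std::vector<BlockRef>& b) {
    std::vector<BlockRef> out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].addr == b[j].addr) {
            // Same block on both inputs: keep the better priority, advance both.
            out.push_back(a[i].priority <= b[j].priority ? a[i] : b[j]);
            ++i; ++j;
        } else if (a[i].priority <= b[j].priority) {
            out.push_back(a[i++]);
        } else {
            out.push_back(b[j++]);
        }
    }
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;  // may still contain duplicates that never met head-to-head
}
```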


Figure 18: Address-generator hardware architecture. Multiple hash-table lookups are made and the results are sorted by priority (using merge sort). Only the high-order bits of x, y, and z are needed (on the order of 16 bits); a higher-precision distance comparison will be done later.

This duplicate removal can be accomplished using a small content-addressable memory (not shown) to keep track of which blocks have already been scanned for a given AkNN lookup.

Figure 19 shows the lookup unit, along with a detailed view of the hashing function. This implementation shows a high-performance parallel evaluation of the hashing function, which compares the input value against all thresholds in parallel. A binary search approach is also possible, but would require a deeper logic depth, would be more complex, and/or would require multiple clock cycles to execute. Not shown here is the logic to reload the thresholds and the hash tables. The lookup unit also has a counter which permits it to advance through all entries in a given bucket. Each entry needs a bit marking it as valid, and all valid entries should be packed towards the start of the bucket.
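In software terms, the per-dimension hash evaluated by the bank of parallel comparators and the pmo priority encoder of Figure 19 amounts to counting how many (sorted) thresholds lie at or below the query coordinate. A minimal C++ sketch of this equivalent serial computation (the threshold layout is an assumption for illustration, not prescribed by the hardware) is:

```cpp
#include <cstdint>
#include <vector>

// Per-dimension locality-sensitive hash key: the index of the interval that
// contains the coordinate, i.e. the number of thresholds it meets or exceeds.
// The hardware evaluates all comparisons in parallel and uses a priority
// encoder to produce the same index; this loop is the serial equivalent.
uint32_t hash1D(float coord, const std::vector<float>& sortedThresholds) {
    uint32_t key = 0;
    for (float t : sortedThresholds)
        if (coord >= t) ++key;
    return key;  // in the range [0, sortedThresholds.size()]
}
```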


Figure 19: Hash table lookup. Hash keys are computed on each dimension and bit-interleaved into a single key. The counter is used to advance through table entries on command. Each entry has a bit to mark it as valid. The pmo combinational function finds the 1 with the highest index and outputs this index in binary-coded form. Logic to reload the tables and threshold registers is not shown.

Also, the hash tables need to store a value of α_ℓ per bucket. This implementation is designed for high performance, under the assumption that it would operate as a kind of "texture unit" in the fragment pipeline. This address-generation unit would be followed by a pipelined memory access and caching unit [4,26,30] that could be shared with the texture units. This architecture would also permit the use of cached and pipelined memory access units for the hash tables themselves. Interleaving the bits of the keys would improve the spatial coherency of these accesses.
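The bit interleaving mentioned in the caption corresponds to the usual Morton-style combination of the three per-dimension keys; a small C++ sketch (the bit width is an illustrative assumption) is:

```cpp
#include <cstdint>

// Interleave the low 'bits' bits of the three per-dimension hash keys into a
// single Morton-style index, so that nearby keys map to nearby table entries.
uint32_t interleave3(uint32_t hx, uint32_t hy, uint32_t hz, int bits) {
    uint32_t index = 0;
    for (int b = 0; b < bits; ++b) {
        index |= ((hx >> b) & 1u) << (3 * b + 0);
        index |= ((hy >> b) & 1u) << (3 * b + 1);
        index |= ((hz >> b) & 1u) << (3 * b + 2);
    }
    return index;
}
```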


It would be useful to add hardware after the memory access unit to compute high-precision distances from the query point to each sample and reject all but the k nearest samples in the candidate set. This could be done with insertion sort (which in practice has better performance for the small numbers of items to be sorted here) into another content-addressable memory containing k slots; as a side effect, this would compute the distances to the k nearest samples. This information, along with the samples themselves, would then be passed on to the fragment shader unit, where the samples would be placed in registers. Depending on how it is programmed, the fragment shader unit could then decide to reject some of the surviving samples (based on direction or normal, for instance) before using them.

If a lower-performance system is adequate, some of the parallel hardware shown here could be replicated and reused in a multi-clock design. For instance, a fragment shader might take multiple clock cycles to process each sample (and it might take several clock cycles simply to transmit each sample from the memory system to the fragment unit). In this case, a multi-clock design that reused the bank of parallel comparators in the hashing unit, for instance, would be reasonable (although these comparators do not take many gates compared to the gates needed for the threshold registers themselves). On the other hand, the architecture shown could easily be pipelined; we have not diagrammed the pipelined version in order to keep the diagrams simpler.
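A software analogue of this k-slot selection (a sketch only; in hardware the same work would be done with a content-addressable memory) maintains a small sorted array of the best k candidates seen so far:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Candidate {
    float dist2;   // squared distance from the query point
    int   photon;  // index of the photon record
};

// Maintain the k nearest candidates seen so far, sorted by ascending distance.
// Insertion sort is used because k is small; this mirrors inserting into a
// k-slot associative memory in hardware.
void insertCandidate(std::vector<Candidate>& best, std::size_t k, Candidate c) {
    if (best.size() == k && c.dist2 >= best.back().dist2)
        return;  // farther than the current k-th nearest; reject
    if (best.size() == k)
        best.pop_back();
    std::size_t i = best.size();
    best.push_back(c);
    while (i > 0 && best[i - 1].dist2 > best[i].dist2) {
        std::swap(best[i - 1], best[i]);
        --i;
    }
}
```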

7.2 Accelerator-based Implementation

There are two ways to approach a hardware implementation of an algorithm in a hardware accelerator: with a custom hardware implementation, or as an extension or exploitation of current hardware support. While there would be certain advantages in a fully custom hardware implementation of the algorithm proposed here, this would probably lead to a large chunk of hardware with low utilisation rates. Although there are many potential applications of AkNN search beyond photon mapping (we list several in the conclusions), it seems more reasonable to consider first whether Block Hashing can be implemented using current hardware and programmable shader features, and if not, what the smallest incremental changes would be. We have concluded that Block Hashing, while not quite implementable on today's graphics hardware, should be implementable in the near future. We consider only the lookup phase here, since the preprocessing would indeed require some custom hardware support, although that support could perhaps be shared with other useful features.

In the lookup phase, we (1) compute hash keys, (2) look up buckets in multiple hash tables, (3) merge and remove duplicates from the list of retrieved blocks, optionally sorting by priority, (4) retrieve the photon records stored in these blocks, and (5) process the photons. Steps (1) and (5) could be performed with current shader capabilities, although the ability to loop would be useful for the last step to avoid replicating the code to process each photon.


Computing the hash function amounts to doing a number of comparisons and then adding up the zero-one results. This can be done in linear time with a relatively small number of instructions using the proposed DX9 fragment shader instruction set. If conditional assignment and array lookups into the register file were available, it could be done in logarithmic time using binary search.

Steps (2) and (4) amount to table lookups and can be implemented as nearest-neighbour texture-mapping operations with suitable encoding of the data. For instance, the hash tables might be supported with one texture map giving the priority and number of valid entries in each bucket, while another texture map or set of texture maps might give the block references, represented by texture coordinates pointing into another set of texture maps holding the photon records. We might want to revisit our block size and data-structure representation assumptions, however, to fit the cache line size of a particular accelerator.

Step (3) is difficult to do efficiently without true conditionals and conditional looping. Sorting is not the problem, as it could be done with conditional assignment. The problem is that removal of a duplicate block reduces the number of blocks in the candidate set. We would like in this case to avoid making redundant photon lookups and executing the instructions to process them. Without true conditionals, an inefficient work-around is to make these redundant texture accesses and process the redundant photons anyhow, but discard their contribution by multiplying them by zero.

We have not yet attempted an implementation of Block Hashing on an actual accelerator. Without looping, the current state of the art does not permit nearly enough instructions in shaders to process k photons for the usual values of k required for density estimation. However, we feel that it might be feasible to implement our algorithm on a next-generation (DX9-like) shader unit using the "multiply redundant photons by zero" approach, if looping (a constant number of times) were supported at the fragment level. We expect that the generation that follows DX9-class accelerators will probably have true conditional execution and looping, in which case the implementation of Block Hashing will be both straightforward and efficient, and will not require any additional hardware or special shader instructions. It will also require only two stages of conditional texture lookup, and the lookups in each stage can be performed in parallel. Even current accelerators permit this many stages of texture lookup. In comparison, it would be nearly impossible to implement a tree-based search algorithm on such hardware due to the lack of a stack and the large number of dependent lookups that would be required. With a sufficiently powerful shading unit, of course, we could implement any algorithm we wanted, but Block Hashing makes fewer demands than a tree-based algorithm does.
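To illustrate the work-around for step (3) described above, the following C++ sketch (standing in for shader code; the names, the fixed list sizes, and the fetchPhoton stub are hypothetical) processes a fixed-length candidate block list and neutralises duplicates with a zero weight rather than with a branch:

```cpp
#include <array>
#include <cstdint>

struct Photon {
    float power[3];  // RGB flux carried by the photon
};

// Stub standing in for a dependent texture lookup into the photon block
// textures (hypothetical; a real shader would fetch from texture memory).
Photon fetchPhoton(uint32_t blockAddr, int slot) {
    (void)blockAddr; (void)slot;
    return Photon{{0.0f, 0.0f, 0.0f}};
}

// Accumulate the power of all photons in a fixed-size candidate block list.
// Instead of skipping a duplicate block with a branch, its contribution is
// multiplied by zero, matching the branch-free shader work-around.
void accumulateCandidates(const std::array<uint32_t, 8>& blocks,
                          const std::array<bool, 8>& isDuplicate,
                          float outPower[3]) {
    outPower[0] = outPower[1] = outPower[2] = 0.0f;
    for (int b = 0; b < 8; ++b) {
        const float weight = isDuplicate[b] ? 0.0f : 1.0f;  // conditional assignment only
        for (int slot = 0; slot < 10; ++slot) {              // ten photons per block
            Photon p = fetchPhoton(blocks[b], slot);
            outPower[0] += weight * p.power[0];
            outPower[1] += weight * p.power[1];
            outPower[2] += weight * p.power[2];
        }
    }
}
```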

8 Conclusion

We have presented an efficient, scalable, and memory-coherent AkNN scheme that is amenable to fine-scale parallelism. Our technique is suitable for the high-performance implementation of photon mapping. Block Hashing achieves a performance improvement over a kd-tree-based kNN implementation by trading memory space for speed. The coherent memory access patterns of Block Hashing lead to improved performance even for a software implementation. The key observation is that Block Hashing is not intended to usurp the kd-tree in a software photon mapping implementation; rather, it is one approach that allows the migration of photon mapping onto hardware within the foreseeable future, alongside other efforts to migrate rendering techniques towards hardware implementation. It is hoped that this work stimulates more research into hardware-assisted AkNN.

9 Applications

AkNN has many potential applications in graphics beyond photon maps. For rendering, it could also be used for:

• sparse data interpolation:
  – visualisation of sparse volume data,
  – BRDF and irradiance volume representation, and
  – other sampled functions;
• sparse and multi-resolution textures;
• direct ray-tracing of point-based objects [63]; and
• gap-filling in forward projection point-based rendering [73].

AkNN could also potentially be used for non-rendering purposes:

• collision detection,
• surface reconstruction, and
• physical simulation.

10 Future Work

There are many things to try with Block Hashing in the future, which may be grouped into different categories.

10.1 Block Hashing Related

Given that the speed of a query into the Block Hashing data structure can be directly controlled by the accuracy setting A, an interesting application would be to adapt Block Hashing into a progressive process. The renderer could start with a low accuracy setting, generate a fast but inaccurate preview of the final image, and then progressively increase the accuracy setting to give images of higher radiance accuracy. This may have important ramifications for a real-time interactive photon mapping renderer: use the lowest accuracy setting while the user is interacting with or walking through the scene, and progressively increase the accuracy when the user stops moving the camera.

Currently the thresholds are generated independently for each dimension, with no correlation between dimensions. An investigation into new methods for generating thresholds that take all dimensions into account together would be worthwhile if this proves to give better results. This may lead to the ability to specify the total number of thresholds over all hash functions in a hash table, with the threshold generator adaptively assigning thresholds to the dimensions that need them.

10.2 Photon Mapping Related

There are a handful of photon-mapping-specific investigations to try. Firstly, it would be interesting to see how volumetric photon maps [36] fare with Block Hashing. Unlike regular global or caustic photon maps, in which photons mostly lie on two-dimensional surfaces, volumetric photons are free to lie anywhere in space. Intuition indicates that such photons would also benefit from the advantages of Block Hashing, but we need to experiment to find out whether any modifications to the current design of Block Hashing are required.

Suykens and Willems [67] proposed a technique in which the photon density can be made user-controllable. In their framework, photon density can be specified for different parts of the scene, so the photon density in regions of the photon map which are unimportant or overly dense can be capped. With regard to Block Hashing, it would be intriguing to see whether the performance and accuracy of Block Hashing could be increased if Block Hashing were allowed to put limits on photon density in a way, and in regions, that it deems helpful to itself.

One of Christensen's important contributions to photon mapping is his technique of "irradiance caching" [14], in which the local irradiance given by the photon map is pre-calculated and stored inside each photon. The renderer can then make a radiance estimate by searching for the single nearest photon to the point being shaded, and use the cached irradiance to estimate power. Attention should be paid to verifying whether Block Hashing can indeed be used for this purpose:


What amount of performance speed-up would Block Hashing give for the 1-nearest-neighbour problem?

Peter and Pietrek [53] described a three-step photon map construction process in which the first step distributes "importons" (packets of importance) into the scene. The density of these importance particles indicates the importance of the local region. It would be intriguing to see whether these importance particles, by themselves or in conjunction with photons, are a good indication of how the thresholds should be positioned.

10.3 Hardware related

The coherent memory access patterns of Block Hashing lead to improved performance even in a software implementation. However, in the near future we plan to implement at least the lookup phase of the Block Hashing AkNN algorithm in hardware. A hardware implementation would be even more interesting if it permitted other applications to make use of the high-performance AkNN capability that Block Hashing provides. The key to making a hardware implementation of an AkNN algorithm generically useful for a wide range of problems would be a flexible (programmable) post-processing capability.

References

1. PC SDRAM Specification, Rev 1.7. Technical report, Intel Corporation. http://developer.intel.com/technology/memory/pc133sdram/spec/sdram133.htm. 7 2. P. K. Agarwal. Range Searching. In J. E. Goodman and J. O'Rourke, editors, Handbook of Discrete and Computational Geometry. CRC Press, July 1997. 2.2 3. P. K. Agarwal and J. Erickson. Geometric range searching and its relatives. Advances in Discrete and Computational Geometry, 23:1–56, 1999. 2.2 4. Bruce Anderson, Andy Stewart, Rob MacAulay, and Turner Whitted. Accommodating Memory Latency in a Low-Cost Rasterizer. In Proc. Graphics Hardware, pages 97–101. Eurographics/SIGGRAPH, 1997. 7.1 5. S. Arya and D. M. Mount. Approximate Nearest Neighbor Queries in Fixed Dimensions. In Proc. ACM-SIAM SODA, 1993. 2.2 6. S. Arya and D. M. Mount. Approximate range searching. In Proc. 11th Annu. ACM Sympos. Comput. Geom., pages 172–181, 1995. 2.2


7. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An Optimal Algorithm for Approximate Nearest Neighbor Searching. In Proc. ACM-SIAM SODA, pages 573–582, 1994. 2.2 8. J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9), September 1975. 3, 2.2 9. J. L. Bentley, B. W. Weide, and A. C. Chow. Optimal Expected-Time Algorithms for Closest Point Problems. ACM TOMS, 6(4), December 1980. 3, 2.2 10. T. Bially. Space-Filling Curves: Their Generation and Their Application to Bandwidth Reduction. IEEE Transactions on Information Theory, IT-15(6):658–664, November 1969. 4.1 11. Allan Borodin, Rafail Ostrovsky, and Yuval Rabani. Lower bounds for high dimensional nearest neighbor search and related problems. In Proc. ACM STOC, pages 312–321. ACM Press, 1999. 2.1 12. Arthur R. Butz. Alternative Algorithm for Hilbert’s Space-Filling Curve. IEEE Transactions on Computers, C-20:424–426, April 1971. 4.1 13. Amit Chakrabarti, Bernard Chazelle, Benjamin Gum, and Alexey Lvov. A lower bound on the complexity of approximate nearest-neighbor searching on the hamming cube. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 305–311. ACM Press, 1999. 2.1 14. Per H. Christensen. Faster Photon Map Global Illumination. Journal of Graphics Tools, 4(3):1–10, 1999. 3, 1.1, 10.2 15. D. Comer. The Ubiquitous B-Tree. ACM Computing Surveys, 11(2):121–137, June 1979. 4.1 16. C. A. Duncan, M. T. Goodrich, and S. G. Kobourov. Balanced aspect ratio trees: Combining the advantages of k-d trees and octrees. In Proc. ACM-SIAM SODA, volume 10, pages 300–309, 1999. 2.2 17. H. Edelsbrunner. Algorithms in Combinatorial Geometry. Springer-Verlag, 1987. 3, 2.2 18. D. Eppstein, M. S. Paterson, and F. F. Yao. On nearest neighbor graphs. Discrete & Computational Geometry, 17(3):263–282, April 1997. 2.2 19. C. Faloutsos and Y. Rong. DOT: A Spatial Access Method Using Fractals. In Proc. 7th Int. Conf. on Data Engineering,, pages 152–159, Kobe, Japan, 1991. 4.1


20. C. Faloutsos and S. Roseman. Fractals for Secondary Key Retrieval. In Proc. 8th ACM PODS, pages 247–252, Philadelphia, PA, 1989. 4.1 21. J. H. Freidman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM TOMS, 3(3):209–226, 1977. 2.2 22. V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys (CSUR), 30(2):170–231, 1998. 2.2 23. A. Gersho and R.M. Gray. Vector Quantization and Data Compression. Kluwer, 1991. 2.1 24. A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In Proc. VLDB, pages 518–529, 1999. 2.2, 3.1 25. J. E. Goodman and J. O'Rourke, editors. Handbook of Discrete and Computational Geometry. CRC Press, July 1997. ISBN: 0849385245. 3, 2.2, 2.2 26. Ziyad S. Hakura and Anoop Gupta. The Design and Analysis of a Cache Architecture for Texture Mapping. In Proc. 24th International Symposium on Computer Architecture, pages 108–120. ACM SIGARCH, 1997. 7.1 27. V. Havran. Analysis of Cache Sensitive Representation for Binary Space Partitioning Trees. Informatica, 23(3):203–210, May 2000. 2.2 28. D. Hilbert. Ueber stetige Abbildung einer Linie auf ein Flächenstück. Mathematische Annalen, 38:459–460, 1891. 4.1 29. K. Hinrichs. The grid file system: implementation and case studies of applications. PhD thesis, Institut für Informatik, ETH, 1985. 4, 4.1 30. Homan Igehy, Matthew Eldridge, and Kekoa Proudfoot. Prefetching in a Texture Cache Architecture. In Proc. Graphics Hardware, pages 133–142. Eurographics/SIGGRAPH, 1998. 7.1 31. P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proc. ACM STOC, pages 604–613, 1998. 2.1, 2.2 32. P. Indyk, R. Motwani, P. Raghavan, and S. Vempala. Locality-Preserving Hashing in Multidimensional Spaces. In Proc. ACM STOC, pages 618–625, 1997. 2.2 33. H. V. Jagadish. Linear clustering of objects with multiple attributes. In Proc. ACM SIGMOD, pages 332–342, May 1990. 4.1 34. J. W. Jaromczyk and G. T. Toussaint. Relative Neighborhood Graphs and Their Relatives. Proc. IEEE, 80(9):1502–1517, September 1992. 2.2


35. H. W. Jensen. Realistic Image Synthesis Using Photon Mapping. A.K. Peters, 2001. 1, 3.2 36. H. W. Jensen and P. H. Christensen. Efficient Simulation of Light Transport in Scenes With Participating Media Using Photon Maps. Proc. SIGGRAPH 98, pages 311–320, July 1998. ISBN 0-89791-999-8. Held in Orlando, Florida. 10.2 37. H. W. Jensen, F. Suykens, and P. H. Christensen. A Practical Guide to Global Illumination using Photon Mapping. In SIGGRAPH 2001 Course Notes, number 38. ACM, August 2001. 2.2 38. Jon M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 599–608. ACM Press, 1997. 2.1 39. D. E. Knuth. The Art of Computer Programming Vol. 3: Sorting and Searching. Addison-Wesley, 1969. 4.1, 4.1 40. J. K. P. Kuan and P. H. Lewis. Fast k Nearest Neighbour Search for R-tree Family. In Proc. of First Int. Conf. on Information, Communication and Signal Processing, pages 924–928, Singapore, 1997. 2.2 41. Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 614–623. ACM Press, 1998. 2.1 42. E. P. Lafortune and Y. D. Willems. Using the Modified Phong brdf for Physically Based Rendering. Technical Report CW197, Department of Computer Science, K.U.Leuven, November 1994. 6.2 43. C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001. 4.4 44. K.-I. Lin and C. Yang. The ANN-Tree: An Index for Efficient Approximate NearestNeighbour Search. In Conf. on Database Systems for Advanced Applications, 2001. 2.2 45. N. Linial and O. Sasson. Non-Expansive Hashing. In Proc. acm stoc, pages 509–518, 1996. 2.2 46. Michael D. McCool, Chris Wales, and Kevin Moule. Incremental and Hierarchical Hilbert Order Edge Equation Polygon Rasterization. In Proc. Graphics Hardware, pages 65–72. Eurographics/SIGGRAPH, 2001. 4.1


47. J. McNames. A Fast Nearest-Neighbor Algorithm Based on a Principal Axis Search Tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):964– 976, 2001. 2.2 48. G. M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. Technical report, IBM Ltd., Ottawa, Ontario, Canada, 1966. 4.1 49. F. Kenton Musgrave. A Peano Curve Generation Algorithm. In James Arvo, editor, Graphics Gems, volume II, page 25. Academic Press, 1991. 4.1 50. J. Nievergelt, H. Hinterberger, and K. C. Sevcik. The Grid File: an adaptable, symmetric multikey file structure. ACM TODS, 9(1):38–71, March 1984. 4, 4.1 51. J. A. Orenstein and T. H. Merrett. A Class of Data Structures for Associative Searching. In Proc. ACM PODS, pages 181–190., Waterloo, Ontario, Canada, 1984. 4.1 52. Mark H. Overmars. The design of dynamic data structures. Springer-Verlag, Berlin, 1983. 2.2 53. Ingmar Peter and Georg Pietrek. Importance Driven Construction of Photon Maps. Eurographics Rendering Workshop 1998, pages 269–280, June 1998. ISBN 3-21183213-0. Held in Vienna, Austria. 10.2 54. T. J. Purcell, I. Buck, W. R. Mark, and P. Hanrahan. Ray Tracing on Programmable Graphics Hardware. In to appear in Proc. SIGGRAPH, 2002. 2 55. J. T. Robinson. The K-D-B-tree: A Search Structure for Large Multidimensional Dynamic Indexes. In Proc. acm sigmod, pages 10–18, 1981. 2.2 56. Arnold L. Rosenberg and Lawrence Snyder. Compact B-Trees. ACM SIGMOD, pages 43–51, 1979. 4.1 57. Arnold L. Rosenberg and Lawrence Snyder. Time- and space-optimality in b-trees. ACM TODS, 6(1):174–193, 1981. 4.1 58. N. Roussopoulos, S. Kelley, and F. Vincent. Nearest Neighbor Queries. In Proc. ACM-SIGMOD, 1995. 2.2 59. J.R. Sack and J. Urrutia, editors. Handbook of Computational Geometry. Elsevier Science, Amsterdam, North-Holland, 2000. ISBN: 0444825371. 3, 2.2 60. H. Sagan. Space-Filling Curves. Springer-Verlag, New York, NY, 1994. 4.1 61. K. Salem and H. Garcia-Molina. Disk striping. In IEEE ICDE, pages 336–342, 1986. 9


62. N. Sample, M. Haines, M. Arnold, and T. Purcell. Optimizing Search Strategies in kd-Trees. May 2001. 2.2 63. G. Schaufler and H. W. Jensen. Ray Tracing Point Sampled Geometry. Rendering Techniques 2000, pages 319–328, June 2000. 9 64. I. J. Schoenberg. On the Peano Curve of Lebesgue. Bull. Am. Math. Soc., 44:519, 1938. 4.1 65. M. Smid. Closest-Point Problems in Computational Geometry. In J. R. Sack and J. Urrutia, editors, Handbook on Computational Geometry. Elsevier Science, Amsterdam, North Holland, 2000. 2.2 66. R. F. Sproull. Refinements to Nearest-Neighbor Searching in k-Dimensional Trees. Algorithmica, 6:579–589, 1991. 2.2 67. Frank Suykens and Yves D. Willems. Density Control for Photon Maps. In 11th Eurographics Workshop on Rendering, June 2000. 10.2 68. P. Tsaparas. Nearest neighbor search in multidimensional spaces. Qualifying Depth Oral Report 319-02, Dept. of Computer Science, University of Toronto, 1999. 2.2 69. P. van Oosterom. A Reactive Data Structure for Geographic Information Systems. In Auto-Carto, volume 9, pages 665–674, April 1989. 4.1 70. M. Vanco, G. Brunnett, and T. Schreiber. A Hashing Strategy for Efficient k-Nearest Neighbors Computation. In Computer Graphics International, pages 120–128. IEEE, June 1999. 2.2 71. I. Wald, T. Kollig, C. Benthin, A. Keller, and P. Slusallek. Interactive global illumination. Technical report, Computer Graphics Group, Saarland University, 2002. to be published at EUROGRAPHICS Workshop on Rendering 2002. 2.2 72. G. Ward. Real Pixels. In James Arvo, editor, Graphics Gems II, pages 80–83. Academic Press, 1991. 3.2 73. M. Zwicker, H. Pfister, J. van Baar, and M. Gross. Surface Splatting. Proc. SIGGRAPH 2001, pages 371–378, 2001. 9
