Selective Rendering for High-Fidelity Graphics

Kurt Debattista

A thesis submitted to the University of Bristol, UK in accordance with the requirements for the degree of Doctor of Philosophy in the Faculty of Engineering, Department of Computer Science.

2006

c. 54,000 words

Abstract

Rendering physically-accurate images of complex scenes without incurring the costly rendering times commonly associated with realistic image synthesis is one of the major challenges in computer graphics. Algorithms that non-uniformly adapt the computation, focusing potentially limited rendering resources on those areas of the scene deemed more important without any loss in the perceived quality of the final image, can reduce the cost of such prohibitive computations considerably. This thesis investigates these selective rendering techniques in depth.

We identify within the graphics literature rendering algorithms which use some form of selective criteria, from the simplest comparison of light intensity thresholds to complex models of the human visual system, to render images adaptively, progressively and under system constraints. We present our own selective renderers that exploit stimulus-driven visual attention processes, run on different hardware resources, particularly CPUs, GPUs and parallel systems, and can complete within given time constraints. Based on the literature and our own selective methods, we show how most selective rendering algorithms follow a number of computation phases in order to identify where computation time is best spent to speed up the rendering process. These stages can be broadly identified as a pre-selective rendering stage, which initiates the process and provides input to a selective guidance stage, which selects the important areas to be rendered, and a final selective rendering stage, which uses selective variables to adapt the computation in accordance with the importance of each area.

By considering the rendering of ray-traced images as composed of finer-grained computations based on components, rather than the traditional ray tree, we present algorithms for progressive, selective and time-constrained rendering using these component-based methods. We further develop the component-based approach by showing component-based algorithms that can parallelise, over a distributed system, the irradiance cache, a shared cache data structure which is notoriously hard to parallelise, and show how component-based methods can be used in an adaptive sampling approach to rendering. Components can also be used to identify multiple selective variables to improve performance for selective renderers.

We demonstrate how the interdependence between the various selective rendering phases can influence the choice of pre-selective rendering, selective guidance and the number and type of selective variables. We demonstrate how progressive rendering algorithms may be suited to selective rendering systems with multiple selective variables, using component-based rendering to identify and compute the selective variables, and in particular we introduce a novel progressive selective irradiance cache as a complex example of such a system. We also show how selective guidance can be computed independently for each selective variable, resulting in an increase in performance without a perceptual loss in quality. Furthermore, the progressive nature of the progressive selective algorithms makes them ideal candidates for time-constrained systems, permitting the computation of images in a fraction of the time it traditionally takes to render them.


Declaration

The work in this thesis is original and no portion of the work referred to here has been submitted in support of an application for another degree or qualification of this or any other university or institution of learning.

Signed:

Date: August 25, 2006

Kurt Debattista


Acknowledgements

Many thanks to my advisor Alan Chalmers for the advice and encouragement throughout the PhD and for giving me the opportunity in the first place. Many thanks to Luis Paulo Santos, who unofficially acted as a second advisor. The concepts and ideas of component-based rendering, in particular Chapter 6, came about as a result of my visit to Luis in Portugal at the start of the PhD. Not only did he offer good advice but he also helped with the coding of the parallel renderers.

It has been a wonderful experience working with the Computer Graphics group at the University of Bristol. Patrick, Gavin, Dave, Piotr, Timo: it has been great working with you. Francesco, Roger, Pete, Cathy, Richard, Julie, Veronica: I think we made some great work together, I thank you all. Veronica collaborated closely with me on some of the selective renderers; her thesis can be read as a validation of many of the selective methods outlined in this thesis. Francisco, you are a great friend and always try to help everyone; you are missed in Bristol. Antonis and Maria were the first to welcome me in Bristol; their kindness and friendship will always be remembered. Greg Ward has been an inspiration on his visits to Bristol. Mashhuda, thanks for providing the extra funding to support me towards the end of the PhD.

I have made many friends in Bristol over the past years. Stella and Alberto always made me feel welcome at their place. Nancy, Erika, Sue, Vasillis, Eleana: you are all good friends and enhanced my life in Bristol. Mark, you have helped in many ways.

I would also like to thank the people at the Department of Computer Science at the University of Malta, in particular Kevin Vella, who is not only a good friend but also started me out on my research career. Joseph Cordina is also a close friend and it has always been a pleasure working with him. Great thanks to my friends in Malta: Andrew, Josric, Simon, Terence, Kenneth, Bebe, Robert, Karen, Stefan, the other Andrew, and countless others who always make it a pleasure to go back. Joseph Borg has been a friend and a teacher since I was sixteen; his advice and friendship are much appreciated. Dru has been a best friend for many years; he has always given good counsel.

My family have always supported me, especially my Aunt Monica, cousin Roberto and Uncle Henri. My grandfather Frank Bianchi, whose kindness and gentleness I will forever aspire to. My grandmother Blanche, whose joy for life lives on in us today. Anna, I would not have wanted to spend this time with anyone else, σ' αγαπώ. My parents have always helped me with their love and support. This thesis would have never been possible without them. I am eternally grateful.


Contents

List of Figures

List of Tables

1 Introduction
   1.1 Digital Image Synthesis
   1.2 Selective Rendering for High-Fidelity Graphics
       1.2.1 Selective Time-Constrained Rendering
   1.3 Contributions
   1.4 Overview

2 Realistic Image Synthesis
   2.1 Radiometry
   2.2 Light Reflectance Models
   2.3 Light Transport
   2.4 Rasterisation
   2.5 Radiosity
   2.6 Ray tracing
       2.6.1 Improving the Computation Complexity due to Geometry
       2.6.2 Stochastic Ray Tracing
   2.7 Sampling
   2.8 Component-Based Techniques
   2.9 Enhancing Ray Tracing Performance
       2.9.1 Irradiance Cache
       2.9.2 Photon mapping
   2.10 Optimisations
       2.10.1 Optimising Rendering on the CPU
       2.10.2 Programmable Graphics Hardware
       2.10.3 Parallel Rendering
   2.11 RADIANCE: A Physically-based Renderer
   2.12 Summary

3 Selective Rendering
   3.1 Selective Criteria
   3.2 The Human Visual System
       3.2.1 Perceptually-Based Metrics
   3.3 Attention
   3.4 Selective Rendering Techniques
       3.4.1 Selective Rendering for Rasterisation
       3.4.2 Selective Radiosity
   3.5 Selective Ray Tracing
       3.5.1 Selective Rendering for Specific Components
   3.6 Selective Rendering based on Perceptual Differences
   3.7 Selective Rendering based on Perceptual Oracles
   3.8 Component-Based Techniques
   3.9 Rendering under System Constraints
   3.10 Sparse Sampling Techniques using Temporal Coherence
   3.11 Summary

4 The Selective Rendering Pipeline
   4.1 Introduction
   4.2 Selective Rendering
   4.3 Introduction to the Selective Renderers
       4.3.1 The Case Studies
   4.4 Case I: Selective Rendering for Bottom-Up Visual Attention
       4.4.1 Selective Rendering
       4.4.2 Implementation and Results
       4.4.3 Case I Discussion
   4.5 Case II: Rendering with On-Screen Distractors
       4.5.1 Selective Rendering
       4.5.2 Implementation and Results
       4.5.3 Case II Discussion
   4.6 Case III: GPU-Assisted Selective Rendering
       4.6.1 Edge Detection on GPU
       4.6.2 Saliency Map on GPU
       4.6.3 Implementation and Results
       4.6.4 Case III Discussion
   4.7 Case IV: Selective Rendering in Parallel
       4.7.1 Selective Rendering
       4.7.2 Selective Rendering in Parallel
       4.7.3 Parallel Irradiance Cache
       4.7.4 Implementation and Results
       4.7.5 Still Images
       4.7.6 Case IV Discussion
   4.8 Case V: Time-Constrained Rendering
       4.8.1 Time-Constrained Rendering for Traditional Ray-Tracing
       4.8.2 Time-Constrained Rendering using an Irradiance Cache
       4.8.3 Implementation and Results
       4.8.4 Case V Discussion
   4.9 Summary

5 Selective Component-Based Rendering
   5.1 Introduction
   5.2 Component-Based Rendering Framework
       5.2.1 Rendering by Components
       5.2.2 The Component Regular Expression
       5.2.3 Implementation
       5.2.4 Applying the crex
   5.3 Selective Component-Based Rendering
       5.3.1 Rendering
       5.3.2 Experiment
       5.3.3 Results
   5.4 Time-Constrained Rendering
       5.4.1 Profiling
       5.4.2 Time-Constrained Framework
       5.4.3 Time-Constrained Results
   5.5 Selective Time-Constrained Rendering
       5.5.1 Results
       5.5.2 Time-Constrained Rendering Comparison
   5.6 Issues Related to Selective Component-Based Rendering
       5.6.1 Memory Issues
       5.6.2 crex Issues
   5.7 Applying a Component-Based Approach to General Selective Rendering
       5.7.1 Implementation and Results
   5.8 Summary

6 Accelerating the Irradiance Cache through Parallel Component-Based Rendering
   6.1 Introduction
   6.2 Irradiance Cache Analysis
       6.2.1 Stills
       6.2.2 Animations
       6.2.3 Parallel Rendering
   6.3 Traditional Parallel Irradiance Cache Approaches
       6.3.1 The Centralised Approach
       6.3.2 The Broadcast Approach
   6.4 Component-Based Parallel Irradiance Cache
       6.4.1 Rendering by Components
       6.4.2 Component-Based Approach
       6.4.3 Component Subdivision
       6.4.4 Load Balancing
   6.5 Results
       6.5.1 Still Images
       6.5.2 More Still Images
       6.5.3 Animation
   6.6 Selective Parallel Rendering
   6.7 Summary

7 Component-Based Adaptive Sampling
   7.1 Introduction
   7.2 Traditional Adaptive Sampling as Selective Rendering
   7.3 Rendering by Components
   7.4 Component-Based Adaptive Sampling
       7.4.1 Framework
       7.4.2 Algorithm
   7.5 Implementation
       7.5.1 Traditional Implementation
       7.5.2 Component-Based Implementation
   7.6 Results and Verification
       7.6.1 Performance Results
       7.6.2 Verification
   7.7 Summary

8 Progressive Selective Rendering
   8.1 The Interaction amongst the Selective Rendering Stages
   8.2 Progressive Selective Rendering
       8.2.1 Progressive Selective Rendering Algorithms
       8.2.2 Implementation and Results
   8.3 Time-Constrained Selective Rendering
       8.3.1 Implementation and Results
   8.4 Summary

9 Conclusions and Future Work
   9.1 Contributions
   9.2 Directions for Future Work
   9.3 Concluding Remarks

Bibliography

List of Figures

1.1 Examples of physically-based images rendered using the physically-based lighting simulation system RADIANCE.

2.1 BRDF examples.
2.2 An example of the rasterisation pipeline.
2.3 The radiosity pipeline.
2.4 The radiosity pipeline using rasterisation for rendering.
2.5 The ray tracing pipeline.
2.6 Examples of random sampling and stratified sampling methods.
2.7 Examples of Halton sequence sampling methods.
2.8 Examples of the (0, 2) sampling method showing the stratified nature of 16 samples.
2.9 Examples of the (0, 2) sampling method showing the hierarchical nature for 2, 4, 8 and 16 samples.

3.1 An example of using VDP.
3.2 An example of using the saliency map.
3.3 Rendering around the foveal angle.
3.4 Visualisation of task map, saliency map and importance map for the Corridor scene.

4.1 Selective rendering cyclic process.
4.2 Selective rendering pipeline.
4.3 Demonstrating where the selective rendering frameworks are placed in the context of the ray-tracing pipeline.
4.4 Case I selective rendering pipeline.
4.5 Scenes used for results of Case I.
4.6 Case II selective rendering pipeline.
4.7 Scenes used for results of Case II.
4.8 The scene used for the sound-emitting objects experiment.
4.9 Case III selective rendering pipeline.
4.10 Scenes used for results of Case III.
4.11 Edge maps for the scenes used for results of Case III.
4.12 Saliency maps for the scenes used for results of Case III.
4.13 Case IV selective rendering pipeline.
4.14 The saliency map, and visualisation of workload for the Kalabsha scene.
4.15 Task map visualisation for the Corridor scene.
4.16 Broadcast parallel irradiance cache.
4.17 The animations used for the results of Case IV.
4.18 Results for Kalabsha scene for Case IV.
4.19 Results for Corridor scene for Case IV.
4.20 Time-constrained rendering: an example of the rendering order of the pixels.
4.21 Time-constrained rendering: timing estimates and actual timings.
4.22 Time-constrained rendering without irradiance cache.
4.23 Time-constrained rendering for the Cornell Box.
4.24 Time-constrained rendering for the Corridor scene.
4.25 Time-constrained rendering for the Art Gallery scene.
4.26 Time-constrained rendering for the Desk scene.

5.1 Component-based rendering: BRDF split into components.
5.2 The Cornell Box scene split into a number of components.
5.3 User-controlled component-based rendering of the Library scene.
5.4 Progressive component-based rendering of the Desk scene.
5.5 Selective component-based rendering pipeline.
5.6 The Corridor scene used for the visual attention experiment.
5.7 A visualisation of the importance map.
5.8 Selective component-based rendering of the Cornell Box.
5.9 Selective component-based rendering of the Desk scene.
5.10 Selective component-based rendering of the Art Gallery scene.
5.11 Selective component-based rendering of the Corridor scene.
5.12 Time-constrained rendering of the Cornell Box.
5.13 Time-constrained rendering of the Library scene.
5.14 Time-constrained rendering of the Corridor scene.
5.15 Component-based time-constrained rendering timings for the Library scene and Desk scene.
5.16 Component-based time-constrained rendering for the Cornell Box and the Corridor scene.
5.17 Comparisons between the time-constrained renderers for the Desk scene.
5.18 Comparisons between the time-constrained renderers for the Corridor scene.
5.19 Differences in quality for images rendered using the specular threshold component-based renderer.
5.20 Scenes used for results for the specular threshold component-based renderer.
5.21 Results for the animations used for validating various selective guidance methods using the specular threshold component-based renderer.

6.1 Irradiance cache misses.
6.2 High-fidelity rendering examples using an irradiance cache for animations.
6.3 Irradiance cache analysis.
6.4 Irradiance cache analysis for no sharing and broadcast.
6.5 Centralised parallel irradiance cache.
6.6 Component-based parallel irradiance cache.
6.7 Scenes used for results and analysis of the parallel irradiance cache.
6.8 Parallel irradiance cache timings for the Kalabsha scene and Corridor scene.
6.9 Aggregated timings showing where the component-based approach obtains its speedup.
6.10 Irradiance cache misses.
6.11 Idle times due to load imbalance.
6.12 Timings for the Tables scene and the Cornell Box.
6.13 Timings for the Corridor scene and the Library scene.
6.14 Timings for the Kalabsha scene and the Art Gallery scene.
6.15 Irradiance cache animation results.
6.16 Speedup for the selective parallel renderer using component-based rendering.

7.1 Traditional adaptive sampling.
7.2 Traditional adaptive sampling framework.
7.3 Samples per component without interpolation.
7.4 Interpolated components.
7.5 Component-based adaptive sampling framework.
7.6 Scenes used for component-based adaptive sampling results.

8.1 A selective rendering pipeline using rasterisation for image preview.
8.2 Our novel selective rendering pipeline.
8.3 Progressive selective rendering framework.
8.4 Detailed description of the progressive selective rendering pipeline.
8.5 Selective rendering for direct lighting.
8.6 Selective guidance for participating media.
8.7 Rendering stages for progressive selective participating media.
8.8 Rendering stages for progressive selective irradiance cache.
8.9 Scenes used for progressive selective rendering results.
8.10 Progressive time-constrained rendering with multiple selective variables.
8.11 The Cornell Box scene timings.
8.12 The Simple Boxes scene timings.
8.13 The Tables scene timings.
8.14 The Corridor scene timings.

List of Tables

4.1 Examples of selective rendering frameworks categorisation for recent selective renderers.
4.2 Examples of the stages of various selective renderers.
4.3 Speedup for Case I of selective rendering over traditional rendering.
4.4 The percentage of computation spent on the pre-selective rendering phase of the selective rendering pipeline for Case I.
4.5 Results for scenes used in Case II.
4.6 Speedup for Case III of selective rendering using edge detection as selective guidance.
4.7 Speedup for Case III of selective rendering using the saliency map as selective guidance.
4.8 Still image timings for Case IV.
4.9 Time-constrained rendering timings without irradiance cache.
4.10 Time-constrained rendering timings for the Cornell Box and Corridor scene.
4.11 Time-constrained rendering timings for the Library and Desk scene.

5.1 The crex description.
5.2 crex BNF.
5.3 Speedup for the selective component-based renderer.
5.4 Profiling results.
5.5 Time-constrained component-based rendering timings.
5.6 Time-constrained component-based rendering timings.
5.7 Speedup for the specular threshold component-based renderer.

6.1 Speedup gains relative to the second best algorithm.
6.2 Timings for the parallel selective renderer using component-based rendering.

7.1 Results for the various adaptive sampling renderers for the Corridor scene and the Library scene.
7.2 Results for the various adaptive sampling renderers for the Cornell Box and the Corridor scene.
7.3 Visual differences predictor results of the adaptive sampling renderers compared to the traditional approach.

8.1 An example of a svt for the four selective variables used in our system.
8.2 The features of the renderers.
8.3 Progressive rendering timings for the Cornell Box and Mist Cornell scene.
8.4 Progressive rendering timings for the Cornell Boxes and Corridor scene.
8.5 Progressive rendering timings for the Tables scene and Simple Boxes scene.
8.6 Image preview timings compared to the entire process.
8.7 Time-constrained rendering timings for the Cornell Box and Simple Boxes scene.
8.8 Time-constrained rendering timings for the Tables scene and Corridor scene.

Chapter 1

Introduction

The computation of physically-based realistic images from complex virtual scenes remains one of the major goals in the field of computer graphics. While algorithms exist that can render high quality images, the rendering times may be prohibitive. The ability to render different parts of the image at different qualities, that is non-uniformly, based on criteria that generally take advantage of the limitations of the human visual system, selectively focusing potentially limited resources on the more relevant parts of the computation, makes it possible to reduce rendering times considerably while maintaining high perceptual fidelity. This thesis is a study of such techniques, analysing previous methods, developing novel ones and adapting them to execute under bounded time constraints.

1.1 Digital Image Synthesis

Rendering images is important for many applications, including: the entertainment industry, with video games, animation and film special effects; scientific visualisation, including medical imaging; simulation, such as flight, driving, fire fighting and crowd control; computer aided design and computer aided manufacturing; architecture; and archaeology. In many of these cases the rendering needs to be accurate and based on the physical behaviour of light. Examples of such images for a number of applications can be seen in Figure 1.1. The process of generating images for virtual environments generally comprises a number of stages. The first of these is typically modelling, where the virtual scene is constructed, and sometimes animated, using mathematical properties which specify both the shape and the material of the contents of the scene. This is a complex task and a field unto itself, see for example [Wat93]. The subsequent process deals with computing images from the virtual scene, and is generally termed rendering or image synthesis [Gla95].

Figure 1.1: Examples of physically-based images rendered using the physically-based lighting simulation system RADIANCE [LS98]: (left) the temple of Kalabsha, an archaeological reconstruction [SCM04]; (middle) the Corridor scene, which can be used for fire safety simulation [SDL+05]; and (right) the Art Gallery, an example of rendering for architecture [LS98].

Rendering uses lighting simulations, termed lighting models, in order to compute the desired image. The lighting models may be composed of two parts: the local illumination, for the reflectance and emission of light from a surface, and the global illumination, describing the transport of light amongst surfaces. Physically-based rendering is the process of realistic image synthesis using physically-based measurements for the local illumination and physically-based light transport for the global illumination. Reviewing the history of realistic image synthesis, Dutré et al. [DBB02] highlighted the fact that there are many widely varying solutions to physically-based rendering. We follow the overall method of Greenberg [Gre99], who presented a framework for realistic image synthesis describing the processes of rendering realistic images which are "visually and measurably indistinguishable from real world images". Greenberg's framework established that a physically-based rendering system consists of three distinct stages: the goniometric model, the light transport model and the perceptual model. In this thesis we are primarily concerned with the interaction of the light transport model with the perceptual model.

The goniometric model is composed of methods to capture and model physically-based reflectance and emission for objects in virtual scenes, accounting for the local illumination part of rendering. One pre-computation aspect of the goniometric model comprises the acquisition and validation of the measurements. This aspect of the illumination model concerns how light is emitted from light-emitting surfaces and how light behaves when it hits a surface, even if that surface is the size of a particle. In most cases, this process can be described by a function termed the bidirectional reflectance distribution function (BRDF). We discuss the BRDF in Section 2.2.

The light transport model describes the distribution of light in a virtual scene. The mechanics of this model are formulated as an equation, known as the rendering equation [Kaj86]. The major challenge in solving this equation is the computational expense. The two most popular methods for solving the equation are ray tracing based methods [Whi80] and radiosity based methods [GTGB84] and their extensions for rendering with global illumination. While the earlier deterministic approaches of these algorithms could not fully solve the rendering problem, modern versions of these techniques generally use stochastic methods to solve the rendering equation through Monte Carlo sampling, which converges to the final solution [DBB02]. Light transport and related techniques are further expanded upon in Section 2.3.

The physically-based results obtained from the light transport stage must eventually be translated for display on a hardware viewing device such as a monitor and, in particular, be interpreted by the human visual system. Of particular relevance to us is the fact that the human visual system does not respond linearly to changes in the radiometric levels of illumination. Tone mapping operators [DCWP02, LCTS05, RWPD05] are required to map the radiometric data onto a display device with a lower range of display intensities while attempting to maintain the same perceived visual sensation obtained by viewing the real scene. In addition, the perceptual model may also be used to aid the rendering process. Perceptual metrics have been developed to identify halting conditions for rendering [Mys98] or to direct rendering resources by exploiting low-level [YPG01] and high-level [CCW03] processes of the human visual system, thus speeding up the rendering process. Post-processing validation of perceptually rendered images [MCTR98] also forms part of the perceptual model.
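
To make this mapping concrete, the following is a minimal sketch of a simple global tone mapping step in the spirit of the operators cited above; the L/(1+L) curve, the luminance weights and the exposure parameter are illustrative assumptions, not any of the specific operators referenced.

```cpp
#include <algorithm>
#include <vector>

// Minimal sketch of a global tone mapping step: compress high-dynamic-range
// luminance into the displayable range [0, 1] with the simple curve
// Ld = L / (1 + L). Real operators (see the citations above) adapt far more
// carefully to scene content and to the viewer's state of adaptation.
struct Pixel { float r, g, b; };

void toneMap(std::vector<Pixel>& image, float exposure) {
    for (Pixel& p : image) {
        // Approximate luminance of the exposure-scaled pixel (assumed weights).
        float L = exposure * (0.2126f * p.r + 0.7152f * p.g + 0.0722f * p.b);
        if (L <= 0.0f) { p.r = p.g = p.b = 0.0f; continue; }
        float Ld = L / (1.0f + L);      // displayable luminance
        float s = exposure * Ld / L;    // per-channel scale preserving hue
        p.r = std::min(1.0f, p.r * s);
        p.g = std::min(1.0f, p.g * s);
        p.b = std::min(1.0f, p.b * s);
    }
}
```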

1.2 Selective Rendering for High-Fidelity Graphics

The techniques based on perception mentioned above allow known perceptual limitations to be translated into tolerances at the light transport stage. One of the major contributions of this thesis is identifying already existing interactions and developing new methods for accelerating this process at the light transport and light reflection level. We term rendering that uses such criteria to select computation, from the simplest thresholds based on radiance to the more complex metrics based on visual attention, and to either non-uniformly adapt or progress the rendering, selective rendering.

When rendering selectively, it is sufficient for the physically-based rendering, using a physically-based solution, to converge up to the point where further computation would be imperceptible to the human visual system. We term such physically-based rendered images high-fidelity images. Note that the differences might still be perceptible by other detectors.

After analysing a number of selective rendering algorithms that provide a degree of flexibility, in terms of adaptive or progressive rendering algorithms used to speed up rendering systems, a pattern begins to appear, particularly with those algorithms that take advantage of advanced perceptual techniques. In this thesis we highlight this pattern in the form of a number of rendering stages, the most common arrangement of which is the selective rendering pipeline. We also describe the pertaining issues concerning the interaction between the various stages of the pipeline. Based on these observations, we present novel rendering algorithms that improve selective rendering in terms of both quality and speed.

Another aspect emerges when considering selective rendering algorithms based on ray tracing. Traditional ray tracing is by nature recursive, solving the rendering problem backwards from the eye towards any possible reflective medium within a virtual scene, and recursively from each intersection point onwards until a pre-condition is met. Since ray tracing has complexity linear in the number of pixels rendered, or rays shot, selective rendering systems attempt to reduce the number of primary rays shot. While this approach is straightforward, there is a further granularity at which the calculation can be divided. By considering that at each intersection point the reflectance function of the medium is calculated by shooting rays to simulate different properties of the material, we can use the individual rays from one intersection point to the next as a finer level of granularity. We term these rays component rays, and such a rendering system a component-based rendering system. Component-based rendering systems are not exclusive to this thesis; they have been used mostly as a method of correctly solving the rendering problem, where they are usually called multipass methods. However, we use them in a novel way, to further enhance the flexibility and performance of rendering systems. In this thesis we shall show how this flexibility improves the performance of parallel rendering algorithms and selective rendering systems. Another benefit of rendering at this granularity is that it also widens the choice of the selective criteria used to identify and compute where selective rendering is best performed, as the sketch below illustrates.
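
The sketch expresses shading at an intersection point as a sum of independently traced component rays which a selective renderer can enable or disable per pixel; the types and function names are hypothetical, not the interface of any renderer described in this thesis.

```cpp
#include <functional>
#include <map>
#include <string>

// Illustrative sketch only: shading at an intersection point expressed as a
// sum of independently computable component rays. A selective renderer can
// then choose, per pixel, which components to trace and which to skip.
struct Colour {
    float r = 0, g = 0, b = 0;
    Colour& operator+=(const Colour& c) { r += c.r; g += c.g; b += c.b; return *this; }
};
struct Hit {};  // stands in for the intersection record

using ComponentFn = std::function<Colour(const Hit&)>;

Colour shade(const Hit& hit,
             const std::map<std::string, ComponentFn>& components,
             const std::map<std::string, bool>& enabled) {
    Colour result;
    for (const auto& [name, trace] : components)       // e.g. "direct",
        if (enabled.count(name) && enabled.at(name))   // "specular", "diffuse"
            result += trace(hit);
    return result;
}
```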

1.2.1 Selective Time-Constrained Rendering

Rendering under system constraints is a form of rendering that attempts to render images within a given set of resources, typically timing constraints. Selective rendering systems and system-constrained rendering methods are mutually beneficial: the use of such constraint-based systems makes it possible to make the best use of the available resources within the given constraints, attempting to achieve the "best bang for the buck". Constrained rendering applications have mainly been the domain of level of detail for rasterisation systems, with the possible exception of the time-constrained rendering for global illumination on graphics hardware of [DPF03]. This is primarily because these systems render exclusively at interactive frame rates, and level of detail is usually the mechanism adopted to achieve such frame rates, since rasterisation-based renderers scale linearly with scene geometry complexity [LWC+02]. For interactive rendering, time constraints can be used in the same way that level of detail time-constrained systems are used, to achieve interactive frame rates, with the level of detail techniques replaced by the selective rendering algorithms introduced in the previous section. However, unlike in rasterisation, selective rendering for high-fidelity graphics provides advantages for other aspects of rendering, such as off-line rendering systems for simulation, architecture, and in particular the large render farms used by the entertainment industry. Firstly, it can be used as part of the animation process by allowing animators to view selectively rendered results of their work, perhaps focusing on a certain aspect that they would like to highlight, e.g. rendering only a certain character and its interaction within a scene, within a given time frame, such as a fifteen-minute coffee break, without having to incur the cost of the full rendering. Secondly, off-line rendering of animations for the entertainment industry needs to be performed within a given time budget to meet looming production deadlines. Finally, new business services provide on-demand rendering to animation houses that do not have the budget to maintain their own rendering resources [BBC04]. Time-constrained rendering can provide a business model whereby computation is bought in terms of rendering time rather than just frames, and could potentially guarantee that the best result is delivered in the acquired time.
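
A minimal sketch of the control loop such a time-budgeted renderer might run; the progressive interface (renderBasePass, renderNextRefinement, converged) is an assumption for illustration.

```cpp
#include <chrono>

// Sketch of a time-budgeted rendering loop: always produce a complete (if
// coarse) image first, then refine until the wall-clock budget expires, so
// that a displayable result exists at every moment within the budget.
template <typename ProgressiveRenderer>
void renderWithinBudget(ProgressiveRenderer& renderer, std::chrono::seconds budget) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + budget;
    renderer.renderBasePass();                // cheap but complete first estimate
    while (Clock::now() < deadline && !renderer.converged())
        renderer.renderNextRefinement();      // refine the most important areas next
}
```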

1.3 Contributions

The major novel contributions of this thesis are:

• A comprehensive literature review of the field of rendering we have termed selective rendering, including the broad categorisation of such techniques into a meaningful framework.
• An analysis of selective rendering frameworks based on a series of designs, implementations and results from selective renderers, some of which are novel. This includes the selective renderer developed for the Rendering on Demand project [CDS+06]. These observations lead to the engineering of further novel selective rendering techniques, the most relevant of which are mentioned below.
• The use of time-constrained rendering in combination with selective rendering, in particular for constraining complex rendering algorithms such as the irradiance cache.
• The use of component-based renderers working at a finer level of granularity than traditional renderers. This is primarily realised through a framework for component-based rendering which controls the desired transport equation using a regular expression. This technique is used for various aspects of selective rendering, including progressive rendering, time-constrained rendering and perceptually-based rendering.
• Using the component-based approach embedded within a rendering system to speed up the parallel implementation of an irradiance cache algorithm compared to traditional methods.
• Selective rendering techniques based again on the flexibility of implicit component-based rendering to identify different criteria by which to adaptively sample rendered images, with a demonstration of the effectiveness of this approach compared to the traditional method.
• Progressive selective rendering algorithms for both selective and time-constrained rendering, which use aspects of the previous selective renderers to obtain an order of magnitude speedup over traditional systems, provide better quality renderings than conventional selective renderers, and improve the scalability of time-constrained systems.

1.4 Overview

This thesis is divided as follows. Chapter 2 presents an overview of realistic image synthesis, focusing on light transport techniques for global illumination and relevant optimisations. Chapter 3 describes rendering algorithms that use progressive and adaptive methods and system constraints for global illumination. Chapter 4 categorises selective rendering algorithms, and presents a number of selective renderers to highlight the interaction between the various stages of one particular selective framework, the selective rendering pipeline. This chapter also introduces selective rendering under time constraints. Chapter 5 presents the theory for component-based rendering systems, and shows how this can be applied to progressive and, in particular, selective and time-constrained systems. Chapter 6 extends the concept of component-based rendering to improve parallel rendering using an irradiance cache. Chapter 7 further extends the use of component-based rendering for an adaptive sampling selective renderer. Chapter 8 discusses issues raised by the selective rendering pipeline implementations of Chapter 4 and presents progressive selective rendering algorithms which tackle these issues. The progressive quality of these algorithms also makes them ideal candidates for time-constrained systems. Finally, Chapter 9 concludes the thesis and presents potential future directions.

Chapter 2

Realistic Image Synthesis

This chapter serves as a general introduction to the methods of generating physically-accurate renderings of virtual scenes. We begin with an introduction to the physical quantities of radiometry. Subsequently, we discuss local illumination in Section 2.2, followed by global illumination through light transport and the rendering equation in Section 2.3. We describe the rendering techniques used in digital image synthesis: rasterisation, radiosity, ray tracing, multipass methods and enhanced ray tracing techniques in Sections 2.4, 2.5, 2.6, 2.8 and 2.9 respectively. We also present some optimisations for ray tracing that are not related to selective rendering; selective rendering itself is discussed in the next chapter. Finally, in Section 2.11 we present a physically-based renderer, in the form of the light simulation package RADIANCE, which we shall be using in the rest of this thesis.

2.1 Radiometry

Radiometry is the field of study dealing with the physical quantities and measurements of light. In this section we present some of the definitions of radiometry which we shall use in the rest of the thesis, following closely the terms and terminology of [DBB02]. Photometric terms generally used in light perception can be derived directly from their radiometric counterparts.

Radiant flux describes how much radiant energy flows per unit time. It is denoted by Φ and measured in Watts, or Joules/sec:

\[ \Phi = \frac{dQ}{dt} \]

Irradiance (E) is the incoming amount of radiant flux per unit surface area and is measured in Watt/m². Radiant exitance, also known as radiosity (B), denotes the outgoing radiant flux per unit area; it is the most common quantity used in radiosity algorithms and is also measured in Watt/m²:

\[ E = \frac{d\Phi}{dA} \qquad B = \frac{d\Phi}{dA} \]

Radiance (L) is perhaps the most traditional quantity used in realistic rendering, since it denotes the flux coming from a certain direction onto a small area. It is constant in a vacuum across a line of sight, which makes it the measurement of choice for most ray tracing systems [Jen01]. It is given as irradiance or radiosity per unit solid angle, or radiant flux per unit projected area per unit solid angle, and is measured in Watt/(m² steradian):

\[ L = \frac{d^2\Phi}{dw \, dA \cos\theta} \]

The radiometric quantities can also be expressed in terms of transport theory for a point x and direction Θ, where L(x → Θ) denotes the radiance leaving point x in direction Θ:

\[ \Phi = \int_A \int_\Omega L(x \rightarrow \Theta) \cos\theta \, dw_\Theta \, dA_x \]

\[ E(x) = \int_{\Omega_x} L(x \leftarrow \Theta) \cos\theta \, dw_\Theta \]

\[ B(x) = \int_{\Omega_x} L(x \rightarrow \Theta) \cos\theta \, dw_\Theta \tag{2.1} \]
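
As a quick numerical check of these definitions, the sketch below integrates a constant radiance over the hemisphere by Monte Carlo, confirming that the cosine-weighted integral of Equation 2.1 gives B = πL for a perfectly diffuse surface, an identity derived analytically in Section 2.5.

```cpp
#include <cstdio>
#include <random>

// Monte Carlo check of Equation 2.1 for a constant (perfectly diffuse)
// radiance L: B = integral of L cosθ dw over the hemisphere = πL.
int main() {
    constexpr double kPi = 3.14159265358979323846;
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double L = 1.0;        // constant outgoing radiance
    const int N = 1000000;
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        // For directions uniform over the hemisphere (pdf = 1 / 2π),
        // cosθ is itself uniformly distributed in [0, 1].
        double cosTheta = u(rng);
        sum += L * cosTheta * 2.0 * kPi;   // integrand divided by the pdf
    }
    std::printf("estimate = %f  expected (pi*L) = %f\n", sum / N, kPi * L);
    return 0;
}
```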

2.2 Light Reflectance Models

The light reflectance model describes the emission and reflectance of a material. The reflectance model describes how light interacts on, around and through a surface, including very small surfaces such as particles in a volume. It is usually represented as a function that attempts to model the physical behaviour of the material. While complex reflectance functions such as the bidirectional surface scattering reflectance distribution function provide the reflectance for materials that are both reflective and partially translucent (such as skin or marble), we limit ourselves in this description to the simpler reflectance-only function which is used in the majority of cases [Jen01]. This function is described by the bi-directional reflectance distribution function (BRDF) of a material. We follow the definition of the BRDF presented in [DBB02] and use the same symbols for the rest of this thesis.

Figure 2.1: BRDF examples. Top: arbitrary BRDF (left) and pure diffuse or Lambertian (right). Bottom: glossy (left) and pure specular (right).

The BRDF (f_r) is a function that describes the ratio of the differential radiance reflected at a point x in the outgoing direction Θ to the differential irradiance incoming from direction Ψ, given by:

\[ f_r(x, \Psi \rightarrow \Theta) = \frac{dL_r(x \rightarrow \Theta)}{dE(x \leftarrow \Psi)} = \frac{dL_r(x \rightarrow \Theta)}{L(x \leftarrow \Psi) \cos(N_x, \Psi) \, dw_\Psi} \tag{2.2} \]

A common property of BRDFs is reciprocity, whereby f_r(x, Ψ → Θ) = f_r(x, Ψ ← Θ). From now on we shall use f_r(x, Ψ ↔ Θ) to denote either direction. Many BRDFs have been proposed, the simplest being perfect diffuse and perfect specular; see Figure 2.1 for examples. Much of the early computer graphics community's focus centred around the Phong model [Pho75] which, however, according to [Gre99], only describes the characteristics of one material type: hard plastic. Subsequently, more accurate BRDF models were presented [Bli77, CT81], and several other models exist, such as Ward's anisotropic reflection model [War92]. Glassner [Gla95] presents a comprehensive overview of BRDF models, including spectral BRDF representations. A BRDF can be represented in a renderer as sketched below.
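
The sketch assumes a hypothetical BRDF interface; the Lambertian case is shown since its value, ρ/π, is constant and satisfies reciprocity trivially.

```cpp
struct Vec3 { double x, y, z; };

// Illustrative BRDF interface: evaluate f_r(x, Ψ ↔ Θ) for an incoming and
// an outgoing direction given in the local frame of the surface point.
struct Brdf {
    virtual double eval(const Vec3& in, const Vec3& out) const = 0;
    virtual ~Brdf() = default;
};

// Perfect diffuse (Lambertian) reflection: f_r = ρ/π regardless of
// direction. Dividing the reflectance ρ by π keeps the surface energy
// conserving, and reciprocity holds trivially because the value does not
// depend on either argument.
struct Lambertian : Brdf {
    double rho;   // diffuse reflectance (albedo) in [0, 1]
    explicit Lambertian(double r) : rho(r) {}
    double eval(const Vec3&, const Vec3&) const override {
        return rho / 3.14159265358979323846;
    }
};
```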

2.3 Light Transport

The light transport part of the computation describes the global aspect of the illumination model. For scenes without participating media, it is represented as a mathematical equation termed the rendering equation [Kaj86]. The rendering equation calculates the outgoing radiance in a direction Θ from a point x on a surface, described as L(x → Θ), as the sum of the emitted radiance from the point x, described as L_e(x → Θ), and the reflected radiance, L_r(x → Θ). We follow [DBB02] for our derivation of the rendering equation.

\[ L(x \rightarrow \Theta) = L_e(x \rightarrow \Theta) + L_r(x \rightarrow \Theta) \tag{2.3} \]

Integrating through Equation 2.2 we get:

\[ L_r(x \rightarrow \Theta) = \int_{\Omega_x} f_r(x, \Theta \leftrightarrow \Psi) L(x \leftarrow \Psi) \cos(N_x, \Psi) \, dw_\Psi \]

Finally, substituting into Equation 2.3, we obtain the rendering equation:

\[ L(x \rightarrow \Theta) = L_e(x \rightarrow \Theta) + \int_{\Omega_x} f_r(x, \Theta \leftrightarrow \Psi) L(x \leftarrow \Psi) \cos(N_x, \Psi) \, dw_\Psi \tag{2.4} \]
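
Stochastic methods estimate this integral by sampling; the following is a minimal sketch of a Monte Carlo estimator of Equation 2.4 of the kind that the stochastic ray tracing techniques of Section 2.6.2 build on. The scene interface and the sampling routine are assumptions for illustration only.

```cpp
#include <random>

struct Vec3d { double x, y, z; };
struct DirSample { Vec3d dir; double pdf; };

// Hypothetical scene interface standing in for a real ray tracer.
struct Scene {
    Vec3d emitted(const Vec3d& x, const Vec3d& out) const;             // Le(x -> Θ)
    Vec3d incoming(const Vec3d& x, const Vec3d& in, int depth) const;  // L(x <- Ψ), traced recursively
    double brdf(const Vec3d& x, const Vec3d& in, const Vec3d& out) const;   // f_r(x, Θ <-> Ψ)
    DirSample sampleHemisphere(const Vec3d& x, std::mt19937& rng) const;    // Ψ and its pdf
    double cosTheta(const Vec3d& x, const Vec3d& dir) const;                // cos(Nx, Ψ)
};

// Monte Carlo estimator of Equation 2.4: average n samples of
// f_r * L(x <- Ψ) * cosθ / pdf(Ψ), then add the emitted term Le.
Vec3d estimateRadiance(const Scene& s, const Vec3d& x, const Vec3d& out,
                       int n, int depth, std::mt19937& rng) {
    Vec3d sum{0.0, 0.0, 0.0};
    for (int i = 0; i < n; ++i) {
        DirSample smp = s.sampleHemisphere(x, rng);
        Vec3d Li = s.incoming(x, smp.dir, depth + 1);   // recursion makes this global
        double w = s.brdf(x, smp.dir, out) * s.cosTheta(x, smp.dir) / smp.pdf;
        sum.x += w * Li.x; sum.y += w * Li.y; sum.z += w * Li.z;
    }
    Vec3d Le = s.emitted(x, out);
    return {Le.x + sum.x / n, Le.y + sum.y / n, Le.z + sum.z / n};
}
```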

The rendering equation can also be expressed in an area formulation, which expresses the equation in terms of the surfaces contributing to the reflected radiance. A visibility function V(x, y) is used for this formulation: V(x, y) is 1 if y is directly visible from x, and 0 otherwise. In a vacuum, L(x ← Ω) is equal to L(y → −Ω) for radiance incoming to point x from direction Ω and outgoing from point y. Also, the solid angle can be rewritten as dw_Ψ = cos(N_y, −Ψ) dA_y / r_xy².

\[ L(x \rightarrow \Theta) = L_e(x \rightarrow \Theta) + \int_A f_r(x, \Theta \leftrightarrow \Psi) L(y \rightarrow -\Psi) V(x, y) \frac{\cos(N_x, \Psi) \cos(N_y, \Psi)}{r_{xy}^2} \, dA_y \]

This is traditionally described as:

\[ L(x \rightarrow \Theta) = L_e(x \rightarrow \Theta) + \int_A f_r(x, \Theta \leftrightarrow \Psi) L(y \rightarrow -\Psi) V(x, y) G(x, y) \, dA_y \tag{2.5} \]

where:

\[ G(x, y) = \frac{\cos(N_x, \Psi) \cos(N_y, \Psi)}{r_{xy}^2} \]
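
For completeness, a small sketch of the geometry term as it would be evaluated between two mutually visible points; the vector helpers are assumptions.

```cpp
#include <cmath>

struct V3 { double x, y, z; };
static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Geometry term G(x, y) between two mutually visible points with unit
// normals nx and ny: the cosines at both ends over the squared distance.
double geometryTerm(V3 x, V3 nx, V3 y, V3 ny) {
    V3 d = sub(y, x);                       // direction from x to y (unnormalised)
    double r2 = dot(d, d);                  // squared distance r_xy^2
    double r = std::sqrt(r2);
    V3 w = {d.x / r, d.y / r, d.z / r};     // unit direction x -> y
    double cosX = dot(nx, w);               // cos(Nx, Ψ)
    double cosY = -dot(ny, w);              // cos(Ny, -Ψ)
    return (cosX > 0 && cosY > 0) ? (cosX * cosY) / r2 : 0.0;
}
```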

Most rendering which attempts to be realistic uses the rendering equation in some form or other. The simplest and earliest forms of rendering based on rasterisation would only sample L_r from the light sources. Classical ray tracing would only sample along the specular reflected and transmitted angles, while classical radiosity would do so using a diffuse-only BRDF. When rendering with participating media, a number of other factors need to be taken into account, notably emission, absorption, out-scattering and in-scattering within a volume, which complicate the rendering equation even further. Rendering equations and algorithms that represent these effects do exist and are presented in [DBB02, SM03].

2.4 Rasterisation

Figure 2.2: An example of the rasterisation pipeline for Gouraud shading and using the z-buffer for hidden surface removal. After [FvDFH90].

Rasterisation for 3D graphics is the process of projecting three-dimensional models onto a two-dimensional image plane. Rasterisation is usually intrinsically linked with polygon models; while other object models can be represented, they are generally converted to polygons first. The process is best described as a pipeline, see Figure 2.2 for a basic traditional rasterisation pipeline. While different pipelines exist that render different effects or use slightly different algorithms, they broadly follow this model. The rasterisation process generally begins by transforming all object models into world space and subsequently into camera or view space. At either of these stages a process known as back-face culling, which removes back-facing polygons, is performed. The lighting is applied to each polygon vertex at this stage. Lighting generally takes the form of the empirical Phong shading model [Pho75], yet the programmability of modern graphics hardware makes it possible to program any complex shader at this stage, see for example the implementation of the Ward shader [War92] in [LDGC05]. Subsequently, 3D frustum clipping is performed to remove the polygons that lie outside the view area. The viewport mapping maps the geometry into normalised screen space, with depth. Finally, fragment processing is performed, composed of polygon filling routines including hidden surface removal using the z-buffer algorithm [Cat74]. Shading by interpolating from the vertices (illumination for Gouraud shading [Gou71], normals for Phong [Pho75]) and texture mapping [BN76] are also computed at this stage.

The type of rendering performed by the pipeline above is only a very coarse approximation of physically-based rendering. Primarily, only direct lighting is computed, and until recently this was dominated by the Phong illumination model [Pho75], which is not physically accurate. A multitude of techniques exist to improve the quality of rasterised rendering [AMH02]; we mention some of the most relevant here. Shadows are simulated by shadow volumes [Cro77] and shadow maps [Wil78], yet physically-based illumination from area light sources is usually faked using soft shadow algorithms which are still costly [HLHS03]. Reflections can also be computed [OR98], as well as glossy reflections and refractions [DB97]. Depth of field and motion blur can be implemented by use of a hardware accumulation buffer [HA90]. Also, radiosity is sometimes used in conjunction with rasterisation for rendering diffuse interreflections, see the next section. Unfortunately, most of these techniques require separate and distinct rendering algorithms, most of them based on complex projection techniques similar to those described in the rasterisation pipeline, and they still only vaguely approximate solutions to the rendering equation. Yet the popularity of rasterisation has not waned over the past few decades, despite the lack of physical accuracy, mostly due to the advent of programmable graphics hardware which has made it possible to render millions of polygons per second on commodity workstations.
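To make the fragment processing stage described above concrete, the following is a minimal sketch of screen-space triangle filling with z-buffer hidden surface removal [Cat74] and Gouraud-style interpolation of per-vertex colours. It assumes vertices have already been transformed, lit and viewport-mapped, counter-clockwise winding, and a z-buffer initialised to large values; all type and function names are illustrative.

```cpp
#include <algorithm>
#include <vector>

struct ScreenVertex { float x, y, z; float r, g, b; }; // lit, viewport-mapped

// Signed edge function: positive when (px, py) is to the left of a->b.
static float edge(const ScreenVertex& a, const ScreenVertex& b,
                  float px, float py)
{
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

void fillTriangle(const ScreenVertex& v0, const ScreenVertex& v1,
                  const ScreenVertex& v2, int width, int height,
                  std::vector<float>& zbuf, std::vector<float>& rgb)
{
    float area = edge(v0, v1, v2.x, v2.y);
    if (area <= 0.0f) return;                 // back-face culled / degenerate

    // Screen-space bounding box of the triangle, clipped to the image.
    int x0 = std::max(0, (int)std::min({v0.x, v1.x, v2.x}));
    int x1 = std::min(width - 1, (int)std::max({v0.x, v1.x, v2.x}));
    int y0 = std::max(0, (int)std::min({v0.y, v1.y, v2.y}));
    int y1 = std::min(height - 1, (int)std::max({v0.y, v1.y, v2.y}));

    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            float px = x + 0.5f, py = y + 0.5f;
            // Barycentric weights from the three edge functions.
            float w0 = edge(v1, v2, px, py) / area;
            float w1 = edge(v2, v0, px, py) / area;
            float w2 = edge(v0, v1, px, py) / area;
            if (w0 < 0 || w1 < 0 || w2 < 0) continue;  // outside triangle

            float z = w0 * v0.z + w1 * v1.z + w2 * v2.z;
            int idx = y * width + x;
            if (z >= zbuf[idx]) continue;     // z-buffer hidden surface test
            zbuf[idx] = z;

            // Gouraud shading: interpolate the per-vertex colours.
            rgb[3 * idx + 0] = w0 * v0.r + w1 * v1.r + w2 * v2.r;
            rgb[3 * idx + 1] = w0 * v0.g + w1 * v1.g + w2 * v2.g;
            rgb[3 * idx + 2] = w0 * v0.b + w1 * v1.b + w2 * v2.b;
        }
    }
}
```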

2.5 Radiosity

Figure 2.3: The radiosity pipeline.

Radiosity algorithms and techniques provide methods of computing the global illumination component of the illumination model. Radiosity-based methods were introduced into computer graphics from thermal engineering by Goral et al. [GTGB84] and provide a view-independent finite element method of describing the amount of illumination leaving one surface and reaching another. Classical radiosity supported only diffuse-diffuse interactions through the computation of the distribution of light amongst perfectly diffuse surfaces. For perfectly diffuse surfaces:

L(x → Ω) = L(x)

Then using Equation 2.1, we have:

B(x) = L(x) ∫_Ω cos θ dω = π L(x)

Starting from the area formulation of the rendering equation, Equation 2.5, and considering that the BRDF is now a function of space only, ρ(x), we obtain:

L(x) = L_e(x) + ρ(x) ∫_A L(y) G(x, y) dA_y

which yields the radiosity equation:

B(x) = B_e(x) + ρ(x) ∫_A B(y) G(x, y) dA_y    (2.6)

The traditional way of solving the radiosity equation is to discretise the scene into a series of patches and compute the radiosity B_i of each of the N patches:

B_i = B_{e,i} + ρ_i ∑_{j=1}^{N} B_j F_{ij}    (2.7)

This is resolved as a system of N simultaneous linear equations. F_{ij} is known as the form factor and determines the fraction of power arriving at patch i from patch j. Solving the linear equations for the radiosity and determining the form factors are the two major computations required for a radiosity solution. An overview of the radiosity pipeline is given in Figure 2.3. The visibility problem associated with form factor determination is potentially the most expensive computation in the radiosity pipeline [SP94]. There are a number of methods used to solve the visibility issue, the most popular being those based on projection, since the form factor between a certain point x and any surface can be computed from the projection of that surface onto the hemisphere around x. The hemi-cube method [CCWG88] discretises the hemisphere onto a hemi-cube and uses projection and hidden surface removal techniques, based on those of rasterisation, for each of the faces of the hemi-cube. Other techniques for form factor determination exist, including projection onto objects other than the hemi-cube and ray casting, described in further detail in [SP94, Ash95]. The radiosity equation is traditionally solved by expressing the system of linear equations of Equation 2.7 as matrices. The solution is then obtained using iterative techniques, particularly the Jacobi or Gauss-Seidel methods. The time complexity of these techniques is O(N³) for N elements. Fortunately, this time can be reduced through the use of faster convergence techniques [CCWG88] and by managing patch complexity adaptively [HSA91], see Section 3.4.2. A minimal sketch of such an iterative solve is given below.
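The following is a minimal sketch of a Jacobi-style iterative solve of the discrete system of Equation 2.7, assuming the form factors have already been determined (for example with the hemi-cube); all names are illustrative.

```cpp
#include <vector>

// Jacobi iteration for B_i = B_e,i + rho_i * sum_j F[i][j] * B_j.
std::vector<double> solveRadiosityJacobi(
    const std::vector<double>& Be,               // emitted radiosity
    const std::vector<double>& rho,              // patch reflectances
    const std::vector<std::vector<double>>& F,   // form factors F[i][j]
    int iterations)
{
    const size_t N = Be.size();
    std::vector<double> B = Be;        // initial guess: emission only
    std::vector<double> Bnext(N);

    for (int it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < N; ++i) {
            double gathered = 0.0;
            for (size_t j = 0; j < N; ++j)   // gather from all patches
                gathered += F[i][j] * B[j];
            Bnext[i] = Be[i] + rho[i] * gathered;
        }
        B.swap(Bnext);                 // each sweep costs O(N^2)
    }
    return B;
}
```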

Stochastic techniques for solving the radiosity equation using iterative methods may replace form factor computation with form factor sampling, improving both computation speed and memory management [Bek99, DBB02]. One of the major disadvantages of classical radiosity techniques is that the solution is applicable only to ideal diffuse surfaces. Techniques have been presented to solve this problem for glossy surfaces [AH93], yet they require precomputing a more general formulation derived from the rendering equation and the resulting computations can be expensive. Most of the radiosity techniques that attempt to solve the full rendering equation use hybrid ray tracing and radiosity systems; the combination of ray tracing and radiosity in the context of multipass algorithms is discussed in Section 2.8. Returning to the radiosity pipeline shown in Figure 2.3, it becomes apparent that the computation of the form factors is entirely dependent on the geometry. If only the lighting changes, only the solution of the radiosity equation needs to be recomputed; the form factors do not need to be recalculated since they depend only on visibility. The final aspect of rendering with radiosity is the viewing. Since the lighting has already been computed, viewing static scenes, without geometry or lighting changes, requires no complex calculations and can be sustained at real-time rates. This makes radiosity ideal for walkthroughs of static scenes consisting of diffuse-only (or mostly diffuse) materials. The viewing can be carried out either using rasterisation, when interactivity is desired, or through ray tracing. An example of the possible complexities of the radiosity viewing pipeline using rasterisation is shown in Figure 2.4. Many other improvements to radiosity rendering and the techniques discussed here are described in [SP94, DBB02].

Figure 2.4: The radiosity pipeline using rasterisation for rendering. After [FvDFH90].

2.6 Ray tracing

Ray-tracing techniques encompass the set of algorithms that simulate light propagation in a virtual scene as rays representing groups of photons. The earliest form of ray tracing was ray casting,


a method of hidden surface removal which identified visible surfaces by shooting a ray per pixel from the virtual camera into the scene and identifying the first object hit [App68]. The earliest method of ray tracing for illumination, known as classical ray tracing, was presented in [Whi80]. Classical ray tracing as proposed by Whitted is a view-dependent algorithm, whereby the scene is sampled from the point of view of the camera, tracing rays of light towards the scene and light sources. In classical ray tracing, rays are shot from a virtual camera into a scene, identifying the first object hit (assuming no participating media), shading it and, if necessary, recursing the computation for specular rays. Ray tracing works because radiance is constant along the line of sight, as described in Section 2.3. The first operation is the identification of the surface hit. This takes the form of a series of ray-object intersections, which identify if and where a ray has hit an object; the object that is hit is the one associated with the ray-object intersection closest to the camera. The operation following the ray-object intersection is to shade the object by computing a shadow ray to each light source: if another object is hit along the way, the point is in shadow, otherwise it is shaded. Also, when a surface with a pure specular reflection or transmission is hit, a further ray (or two) is shot in the ideal specular direction, and this computation is performed recursively. The classical ray tracing algorithm dispenses with the diffuse-diffuse and specular-diffuse interactions by only tracing specular rays recursively around a scene. This solves the specular part of the rendering equation in the same way that radiosity solves it for purely diffuse surfaces. Subsequent improvements over ray tracing, using Monte Carlo techniques, such as distributed ray tracing [CPC84] and path tracing [Kaj86], provided a global illumination solution to the problem. We discuss further aspects of ray tracing for fully solving the rendering equation below.
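The recursive core of this process can be sketched as follows; Scene, Hit and the declared helper functions are hypothetical stand-ins for the ray-object intersection and local shading machinery described above.

```cpp
// A minimal sketch of classical (Whitted-style) ray tracing.
struct Vec3 { float x, y, z; };
using Colour = Vec3;
struct Ray { Vec3 origin, dir; };
struct Hit { bool found; Vec3 point, normal; bool specular; };
struct Scene;

Hit intersect(const Scene& s, const Ray& r);       // closest hit (hook)
Colour shadeDirect(const Scene& s, const Hit& h);  // shadow rays + local model
Ray reflectRay(const Ray& r, const Hit& h);        // ideal specular direction

Colour add(Colour a, Colour b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

Colour trace(const Scene& scene, const Ray& ray, int depth)
{
    Hit hit = intersect(scene, ray);
    if (!hit.found) return Colour{0, 0, 0};        // background

    // Shadow rays to the light sources decide the direct shading.
    Colour c = shadeDirect(scene, hit);

    // Recurse only along the ideal specular direction: classical ray
    // tracing ignores diffuse-diffuse and specular-diffuse transport.
    if (hit.specular && depth > 0)
        c = add(c, trace(scene, reflectRay(ray, hit), depth - 1));
    return c;
}
```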

Figure 2.5: The ray tracing pipeline. After [FvDFH90].

One aspect of ray tracing is that most of the rendering process is performed through the computation of rays, and thus does not require the complexities associated with the rasterisation and radiosity rendering pipelines. The rendering pipeline for ray tracing, after [FvDFH90], can be seen in Figure 2.5. It is worth noting that this pipeline remains unchanged for most of the more complex ray tracing techniques outlined below. Furthermore, ray tracing provides additional simplicity in the variety of objects that can be rendered directly through ray-object intersections [Gla89], including but not limited to polygons, meshes, parametric surfaces, subdivision surfaces and point clouds. In contrast, both rasterisation and radiosity methods require a tessellation step before rendering most object representations.


2.6.1 Improving the Computation Complexity due to Geometry

One of the major disadvantages of ray tracing was that its complexity was linear in the number of pixels rendered and also in the number of geometrical objects in the virtual scene. This cost is particularly high due to the relatively expensive nature of the ray-object intersection routines. The geometrical complexity is further compounded by the need to traverse the virtual scene when computing shadow rays and when recursing the computation for the component rays. Fortunately, a number of spatial subdivision techniques reduce the computational complexity substantially. The three most prominent data structures used are octrees, binary space partitioning trees and grids. Octrees [Gla84] adaptively subdivide space non-uniformly into eight voxels recursively. Rays first intersect individual voxels and then the objects within the voxel, rather than intersecting objects directly; if no object intersection is found, the ray traverses to the next voxel in the tree. Binary space partitioning trees work on the same principle, but the subdivision splits space in two at each level along a plane. Grids subdivide space into voxels of equal size, generally resulting in more voxels; however, grids are easy to traverse using an extension of the line drawing Digital Differential Analyzer (DDA) algorithm, the 3DDDA [FTI86], as sketched below. When using these algorithms, the time complexity of ray tracing for N geometrical objects is reduced to O(∛N) [RCJ98].
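As an illustration of grid traversal, the following is a minimal sketch of a 3DDDA-style voxel walk in the spirit of [FTI86]. The per-voxel object test is a hypothetical hook, and for brevity the ray origin is assumed to already lie inside the grid.

```cpp
#include <cmath>
#include <limits>

struct Vec3 { double x, y, z; };

bool testVoxelObjects(int ix, int iy, int iz);  // hypothetical hook

bool traverseGrid(Vec3 origin, Vec3 dir, double voxelSize, int gridRes)
{
    // Current voxel indices (origin assumed inside the grid).
    int ix = (int)(origin.x / voxelSize);
    int iy = (int)(origin.y / voxelSize);
    int iz = (int)(origin.z / voxelSize);

    // Per-axis step direction, distance to the next voxel border (tMax)
    // and distance between successive borders (tDelta).
    auto setup = [&](double o, double d, int i, int& step,
                     double& tMax, double& tDelta) {
        step = (d > 0) ? 1 : -1;
        double border = (i + (d > 0 ? 1 : 0)) * voxelSize;
        tMax   = (d != 0) ? (border - o) / d
                          : std::numeric_limits<double>::infinity();
        tDelta = (d != 0) ? voxelSize / std::fabs(d)
                          : std::numeric_limits<double>::infinity();
    };
    int sx, sy, sz; double tx, ty, tz, dx, dy, dz;
    setup(origin.x, dir.x, ix, sx, tx, dx);
    setup(origin.y, dir.y, iy, sy, ty, dy);
    setup(origin.z, dir.z, iz, sz, tz, dz);

    while (ix >= 0 && iy >= 0 && iz >= 0 &&
           ix < gridRes && iy < gridRes && iz < gridRes) {
        if (testVoxelObjects(ix, iy, iz))
            return true;                         // hit in this voxel
        // Advance to whichever voxel boundary is closest along the ray.
        if (tx <= ty && tx <= tz) { ix += sx; tx += dx; }
        else if (ty <= tz)        { iy += sy; ty += dy; }
        else                      { iz += sz; tz += dz; }
    }
    return false;                                // left the grid: no hit
}
```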

2.6.2 Stochastic Ray Tracing

The classical ray tracing proposed by Whitted [Whi80] could not reproduce most aspects of the rendering equation. The deterministic nature of the classical ray tracing algorithm, and of similar algorithms based on it such as cone tracing [Ama84], beam tracing [HH84] and pencil tracing [HH84] that attempted to solve aliasing problems by grouping rays together, also meant that if further rays were used to sample more complex BRDFs the resulting aliasing could be disagreeable. Cook et al. [CPC84] introduced the notion of stochastic ray tracing, using Monte Carlo techniques to accurately extend ray tracing to compute many high-fidelity aspects of rendering including soft shadows, glossy and diffuse surfaces, depth of field and motion blur. The aliasing associated with deterministic ray tracing techniques was replaced with noise, which is less objectionable to the human eye [Coo86].

Monte Carlo techniques can be used to solve the rendering equation by means of a Monte Carlo estimator [DBB02]. For an integral I = ∫ f(x) dx, the Monte Carlo estimator ⟨I⟩ is given as:

⟨I⟩ = (1/N) ∑_{i=1}^{N} f(x_i) / p(x_i)

for N samples drawn from a probability distribution function p(x). The expected value of ⟨I⟩ is equal to I. Monte Carlo integration is particularly important because it makes it possible to solve the rendering equation with arbitrary BRDFs by sampling a set of directions over the integration domain [PH04]. For the rendering equation, Equation 2.4, L_r can be estimated using Monte Carlo integration by generating N sample directions Ψ_i as:

⟨L_r(x → Θ)⟩ = (1/N) ∑_{i=1}^{N} [ L(x ← Ψ_i) f_r(x, Θ ↔ Ψ_i) cos(N_x, Ψ_i) ] / p(Ψ_i)    (2.8)
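As a concrete illustration, the following is a minimal, self-contained sketch of this estimator applied to a one-dimensional integral with a known answer; the density p(x) = 2x is a hypothetical choice that importance-samples the integrand.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0);

    // Estimate I = Integral of x^2 over [0, 1] = 1/3.
    const int N = 100000;
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        // Draw x with density p(x) = 2x via inversion: x = sqrt(u).
        double x = std::sqrt(u01(rng));
        double f = x * x;         // integrand f(x) = x^2
        double p = 2.0 * x;       // density the sample was drawn from
        sum += f / p;             // accumulate f(x_i) / p(x_i)
    }
    std::printf("<I> = %f (exact 1/3)\n", sum / N);
    return 0;
}
```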

This method produces a tree of rays due to the recursive nature of the equation. Also, it is more favourable to compute the direct and indirect illumination separately at each level of the recursion, since it is unlikely that random samples will hit the light sources. This conforms to the process of classical ray tracing, that is, tracing shadow rays after a point is hit. This process is known as splitting and is a general Monte Carlo technique. The computation of the rendering equation then becomes:

L(x → Θ) = L_e(x → Θ) + L_direct(x → Θ) + L_indirect(x → Θ)

The stochastic methods provide solutions for both the direct and the indirect lighting. The direct lighting is calculated by directly sampling the light sources: rays are shot randomly at the surfaces of the area light sources to determine visibility and calculate the shading accurately. The most common method of doing this is to shoot rays to each light source; however, techniques exist that prioritise and improve performance by sampling depending on the contribution of the shading to the final image, see for example [War91]. Some of these methods will be discussed in Section 3.5.1. The indirect lighting needs to sample the hemisphere as described in Equation 2.8, and the choice of the probability distribution function becomes important. The probability distribution function could be based on cosine sampling, BRDF sampling or the incident radiance [DBB02].

A number of light transport algorithms are based solely on Monte Carlo ray tracing methods. Foremost amongst these were the techniques in [CPC84, Kaj86], which still suffered from the large cost required to reduce noise. Bi-directional techniques, which trace rays both from the eye and from the light sources, improved performance to a certain degree, see for example Lafortune's bi-directional path tracing [LW93], Veach and Guibas [VG94] and Pattanaik [Pat93]. Biased techniques improved the performance further while biasing the Monte Carlo integration, for example irradiance caching [WRC88] and photon mapping [Jen01], both described in Section 2.9. There were further enhancements to the ray tracing algorithm, many of which are documented in [Gla89] and [SM03].

Figure 2.6: Examples of sampling methods. Top: random sampling (left) and uniform stratified sampling (right). Below: jittered stratified sampling (left) and curtailed jittered stratified sampling (right). A uniform grid is added to help visualise the image space.

2.7 Sampling

One aspect of particular relevance to ray tracing is the choice of where to position the samples. While this is of concern for most of the operations, such as sampling the BRDF, the lens and the image plane, we restrict our discussion to the image plane only, primarily due to its importance for selective rendering. For more information on sampling across the other dimensions see [PH04].


Image plane sampling is the process of selecting samples within or around pixels in such a way as to reduce artifacts. One of the simplest and most effective methods for sampling a pixel is to uniformly subdivide the pixel into strata, where each stratum corresponds to a sample. This technique, compared to randomly sampling the pixel, can be seen in Figure 2.6 (top). While such a result is an improvement over a purely random method, as we mentioned above, aliasing is less pleasing to the eye than noise. A jittered sampling approach improves the stratified technique by replacing aliasing with noise while maintaining a better distribution. The jittered stratified sampling technique can also be curtailed by allowing the jittering only within a certain distance, to avoid the overlap of some samples. Distributions of samples using these methods can be seen in Figure 2.6 (bottom). A drawback of stratified sampling is that the number of samples to be generated needs to be known, before the computation begins, in order to subdivide the strata.

Another method for generating well distributed samples is to use low-discrepancy methods based on quasi-random sampling techniques. The advantage of these low-discrepancy sampling methods is that, unlike the stratified sampling method but like the random method, they are hierarchical [WLH97]. Following [WLH97] and [PH04], a non-negative integer k can be expressed in base p, with digits a_i ∈ {0, …, p − 1}, as:

k = ∑_{i=1}^{r} a_i p^(i−1)

for ai in [0, p − 1). The radical inverse function in base p, Φ p (k) converts the integer k into a floating point number in [0, 1):

Φ p (k) =

a1 a2 a3 ar + 2 + 3 ...+ r p p p p

The Halton sequence uses the radical inverse with a different base for each dimension. For d-dimensional space, the k-th Halton point is (Φ_{p_1}(k), Φ_{p_2}(k), …, Φ_{p_d}(k)), where the bases should be distinct prime numbers. For the two-dimensional case it is simply (Φ_{p_1}(k), Φ_{p_2}(k)). Figure 2.7 shows (Φ_2(k), Φ_3(k)) compared to a random sequence using 256 samples. Another example of a low-discrepancy sequence is the (0, 2) sequence [KK02], composed of the Φ_2 sequence, also known as the van der Corput sequence, and the Sobol sequence. It is useful to demonstrate the hierarchical and stratified nature of the low-discrepancy methods using this sequence. Figure 2.8 shows the stratification for 16 samples along five different boundaries; it is clear that the samples are well distributed. Figure 2.9 shows how samples can be added incrementally while still maintaining a good distribution. The same seed was used for the creation of all these sampling patterns.
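A minimal sketch of the radical inverse and of the first two dimensions of the Halton sequence follows; radicalInverse is a direct transcription of Φ_p above, and printing the first few points illustrates the hierarchical behaviour just described.

```cpp
#include <cstdio>

// Mirror the base-p digits of k about the radix point: Phi_p(k).
double radicalInverse(unsigned k, unsigned p)
{
    double result = 0.0;
    double invBase = 1.0 / p, invBi = invBase;
    while (k > 0) {
        result += (k % p) * invBi;   // digit a_i contributes a_i / p^i
        k /= p;
        invBi *= invBase;
    }
    return result;                    // value in [0, 1)
}

int main()
{
    // First eight 2D Halton points (Phi_2(k), Phi_3(k)); note how each
    // new point falls into a gap left by the previous ones.
    for (unsigned k = 0; k < 8; ++k)
        std::printf("(%.4f, %.4f)\n",
                    radicalInverse(k, 2), radicalInverse(k, 3));
    return 0;
}
```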


Figure 2.7: Examples of sampling methods. Random sampling (left) and the Halton sequence (Φ2 (k), Φ3 (k)) (right). A uniform grid is added to help visualise the image space.

Figure 2.8: Examples of the (0, 2) sampling method showing the stratified nature of 16 samples.

2.8 Component-Based Techniques

Figure 2.9: Examples of the (0, 2) sampling method showing the hierarchical nature for 2, 4, 8 and 16 samples.

Since classical radiosity and classical ray tracing techniques account for global diffuse and global specular illumination respectively, it was to be expected that combinations of the two algorithms would arise. Rendering has been divided into components on a number of occasions in order to solve the rendering problem more efficiently. The following algorithms, more generally termed multipass algorithms, computed components separately as a means of solving the rendering equation [Kaj86] completely and efficiently. The algorithms used were combinations of different rendering techniques, primarily radiosity and ray tracing approaches. [WCG87] presented a multipass algorithm that computed the diffuse component with a radiosity pass and used a z-buffer algorithm for view-dependent planar reflections. [SP89] adapted the technique proposed by [WCG87], using ray tracing for computing the specular component and the form factors of non-planar objects, enabling multiple specular reflections. [Shi90] used a three-pass method for the various components: path tracing from the light sources was used for caustics, soft indirect illumination was obtained through radiosity, and stochastic ray tracing completed the rest of the components. Another multipass algorithm that calculated components separately was Heckbert's [Hec90] adaptive radiosity texture approach, which calculated the indirect illumination using a technique similar to progressive radiosity, storing the radiosity in textures and ray tracing from the eye. Chen et al.'s multipass method [CRMT91] initially computed the indirect diffuse component through a progressive radiosity pass, which included ray tracing for non-diffuse surfaces, and was followed by the computation of caustics using light ray tracing and of direct and high-frequency lighting using path tracing. The novelty of their algorithm was the progressive nature of the entire process. Slusallek et al. [SSH+ 98] introduced the concept of lighting networks as a technique to render scenes by letting users combine the implementations of different rendering algorithms into a network, adding the functionality of testing the correctness of the network. The work of Chen et al., Heckbert and Slusallek all had some selective aspect, which will be discussed in Section 3.8.

2.9 Enhancing Ray Tracing Performance

The current trend seems to demonstrate that Monte Carlo ray tracing only methods, as opposed to hybrid multipass algorithms using radiosity, are taking over as the de facto global illumination algorithms. Jensen [Jen01] underlines radiosity's demerits in dealing with complex geometry, and with the way such geometry handles light, such as specular reflections on complex geometry, and also highlights the per-pixel accuracy that ray tracing provides compared to radiosity, which is particularly interesting from a selective rendering perspective. While diffuse interreflections could be solved by the earliest stochastic ray tracing algorithms, the computation times were very high. Two ray tracing techniques have generally been used to accelerate the indirect diffuse calculations: the irradiance cache [WRC88] and the photon map [Jen01]. These two algorithms share a common property in that they store illumination in a geometry-independent data structure. This decoupling of the lighting computation from the geometry makes it easier to avoid the above-mentioned pitfalls associated with complex geometry in radiosity.

2.9.1 Irradiance Cache

Ward et al.'s irradiance cache [WRC88] is a data structure which accelerates the global illumination calculations of ray tracing based systems. The irradiance cache caches indirect diffuse samples within the framework of a distributed ray tracing algorithm [CPC84]. Traditionally, distributed ray tracing calculated the indirect diffuse component through sub-sampling the hemisphere, shooting a large number of rays where each ray contributed only a small fraction to the final result; the problem was further compounded by the recursive nature of the algorithm. Ward et al. noticed that the indirect diffuse component is generally a continuous function over space, not affected by the high-frequency changes common with the specular component. The irradiance cache was designed to exploit this insight. Initially the irradiance cache is empty. The first indirect diffuse sample is calculated the traditional way and the result is cached in the irradiance cache's spatial data structure, usually represented by an octree. Whenever a new indirect diffuse value is required, the irradiance cache is first consulted. If one or more samples fall within the user-defined search radius of the point to be computed, the result is interpolated from the samples using a weighted averaging strategy. Ward et al. demonstrated that the irradiance cache offered an order of magnitude improvement in overall computation time over the traditional method. The irradiance cache improves performance even further when rendering animations of static scenes, since the indirect diffuse computation remains constant. The irradiance cache has been adapted for glossy surfaces [KGBP05] and to handle dynamic scenes [TMD+ 04, SKDM05, GBP06]. Parallel versions of the irradiance cache are discussed in Section 2.10.3, and selective versions in the next chapter. The irradiance cache algorithm [WRC88] can also be viewed as a component-based system, since the indirect diffuse computation and the rest of the computation are performed relatively independently.
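The core caching logic can be sketched as follows. The weighting here is deliberately simplified; Ward et al.'s actual weights also account for surface normals and harmonic mean distances, and the linear cache scan stands in for an octree query. All names are illustrative.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct CachedSample { Vec3 position; double irradiance; double radius; };

double distance(const Vec3& a, const Vec3& b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

double computeIrradiance(const Vec3& p);  // hypothetical: expensive
                                          // hemisphere sampling

double irradianceAt(const Vec3& p, std::vector<CachedSample>& cache)
{
    double sumW = 0.0, sumE = 0.0;
    for (const CachedSample& s : cache) {   // octree query in practice
        double d = distance(p, s.position);
        if (d < s.radius) {                 // sample is close enough
            double w = 1.0 / (d + 1e-6);    // simplified weight
            sumW += w;
            sumE += w * s.irradiance;
        }
    }
    if (sumW > 0.0) return sumE / sumW;     // reuse cached values

    // Cache miss: do the expensive computation once and store it.
    double E = computeIrradiance(p);
    cache.push_back({p, E, /*radius*/ 1.0});
    return E;
}
```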

2.9.2 Photon mapping

Photon mapping functions similarly to other particle tracing algorithms operating from the light sources [Pat93]; it is useful for rendering diffuse interreflections and ideal for caustics. Photon mapping can be broadly divided into two passes. In the first pass, photons are shot into the virtual scene from the light sources and recorded on surfaces, traditionally using Russian roulette for sampling the BRDF and determining a stopping condition [Jen01]. Photons are stored in a three-dimensional data structure, usually a k-d tree, as opposed to the earlier technique of using textures. Separate photon maps are commonly used for caustics and for indirect diffuse illumination [PH04]. The second pass uses a traditional, potentially stochastic, ray tracing approach to render the direct lighting and the specular and glossy surfaces. When a diffuse surface is hit, the photon map is consulted to compute the caustics or indirect diffuse illumination. For improved quality, final gathering should be performed for indirect diffuse surfaces, and it is common to use a one-bounce irradiance cache to accelerate this aspect of the computation. As was the case with the irradiance cache, photon mapping can also be viewed as a component-based system in its separate calculation of diffuse interreflections, caustics and the rest of the computation.
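The following is a minimal sketch of the Russian roulette step used when scattering photons. Choosing the survival probability equal to the reflectance, the common choice described in [Jen01], keeps stored photon powers roughly constant; names are illustrative.

```cpp
#include <random>

struct Photon { double power; };

// Returns true if the photon survives the bounce and should continue.
bool scatterPhoton(Photon& photon, double reflectance, std::mt19937& rng)
{
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    double pSurvive = reflectance;        // common choice in [Jen01]
    if (u01(rng) >= pSurvive)
        return false;                     // absorbed: path terminates

    // Unbiased: scale reflected power by reflectance / pSurvive. With
    // pSurvive == reflectance the photon power stays constant, which
    // keeps all stored photons at similar power.
    photon.power *= reflectance / pSurvive;
    return true;
}
```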

2.10 Optimisations

In this section we present optimisations to rendering based generally on making the best use of hardware resources. General algorithms that speed up rendering through adaptive or progressive techniques are dealt with in the next chapter.


2.10.1 Optimising Rendering on the CPU

Wald et al. presented techniques for fast ray tracing [WS01, WSBW01]. They developed one of the first interactive ray tracers relying solely on standard ray tracing techniques, mainly by making use of the underlying potential of modern CPUs. In particular, they paid attention to minimising conditionals and using tight loops, to make the best of speculative, out-of-order execution and to avoid branch prediction failures. They also attempted to make effective use of the cache by aligning data structures on cache line boundaries, ensuring commonly accessed code lies on the same cache line and bunching rays together in coherent packets. They further improved the performance of their ray tracer by making use of the SSE instructions [Int03] available in modern Intel processors, which can perform up to four floating point operations simultaneously. They achieved this by using structures of arrays instead of the more conventional arrays of structures, whereby a packet of four rays is stored as arrays of four floating point values, one array per ray property. This technique is particularly useful for tracing primary and shadow rays. Their results demonstrated a speedup of 3.5 over comparable C-based ray-triangle intersection code, and a speedup of around two for their Phong shading routines. Their results also showed that their ray tracer performed well in comparison to other established ray tracers and outperformed OpenGL-based renderers, running on graphics cards, for scenes composed of millions of triangles. The ray tracer was improved to support dynamic scenes [WBS02], global illumination [WKB+ 02] and many paths per pixel to obtain globally illuminated, antialiased interactive animations [BWS03]. If one criticism could be levelled at these fast ray tracing and global illumination techniques, it would be their use of a brute force approach that does not take into account techniques for improving performance through temporal coherence and visual perception.
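The difference between the two layouts can be sketched as follows; the SoA packet is what allows one SIMD instruction to process the same component of four rays at once. The type names are illustrative.

```cpp
// Conventional array-of-structures (AoS): the components of one ray
// are contiguous in memory.
struct RayAoS {
    float ox, oy, oz;   // origin
    float dx, dy, dz;   // direction
};

// Structure-of-arrays (SoA) packet: the same component of four rays is
// contiguous, mapping directly onto a 4-wide SIMD (e.g. SSE) register.
struct RayPacketSoA {
    float ox[4], oy[4], oz[4];
    float dx[4], dy[4], dz[4];
};
```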

2.10.2 Programmable Graphics Hardware

Programmable graphics hardware cards are off-the-shelf products which have given the standard home PC graphical prowess superior to that of the dedicated graphics workstations of the late nineties. The Graphics Processing Unit (GPU) lies at the heart of modern programmable graphics hardware. One major factor in favour of GPUs is their speed: a GeForce 5900 was demonstrated to achieve 20 GFLOPS, in comparison to a theoretical peak of 6 GFLOPS for a Pentium 4 3GHz CPU [LHK+ 04]. GPUs obtain the majority of their speedup by taking advantage of the inherent parallelism of the graphics pipeline for the computation of vertex transformations and pixel computations; they can be seen mainly as single instruction multiple data (SIMD) machines.

Programmable graphics hardware has revolutionised the way in which rendering is processed on commodity platforms. Firstly, it has made it possible to compute rendered images at interactive rates for scenes with large numbers of triangles. Secondly, its programming capabilities have added a level of flexibility to the rendering pipeline. The main problem relating to these two points is the strict adherence to the rasterisation graphics pipeline, which is ideal for interactive rasterised applications. However, due to their cost effectiveness and ever-improving performance compared to CPUs, GPUs have become attractive to those interested in high throughput for general purpose computing [OLG+ 05]. As demonstrated previously in this chapter, the rasterisation pipeline and ray tracing use largely different algorithms. Nevertheless, attempts have been made to transfer ray tracing based global illumination computations (not physically-based as yet) onto the GPU. Purcell et al. [PBMH02] used the GPU for programming a ray tracer. They used the GPU to process a set number of processes, or kernels, representing different aspects of the computation, by representing ray tracing as a streaming process. The kernels, implemented as fragment shaders, were used to generate eye rays, traverse the acceleration structure, in their case a grid, intersect the triangles and shade. Their results demonstrated performance similar to ray tracing on the optimised CPUs of Wald et al. [WPS+ 03], described in the previous section. Further work has been performed on global illumination on GPUs, such as photon mapping [PDC+ 03] and a one-bounce irradiance cache implementation [GKBP05]. The major issue with these techniques is that the modification of the original algorithms to fit the rasterisation pipeline means that some of these algorithms need to be recreated and modified when newer GPUs appear that offer different programming options. Another disadvantage of current GPUs is the lack of native support for double-precision floating point computation, ideal for accurate ray-object intersection. Certain approaches have attempted to solve the programming issue by abstracting the use of the GPU through programming models that map onto the graphics hardware. One such technique, BrookGPU [BFH+ 04], generalised the work of the Purcell et al. [Pur04] ray tracer in that the programming model is based on stream processing, composed of streams of data and processing kernels. Their technique mapped the streams to textures and the kernels to hardware shaders, but kept this abstract from the user by providing a compiler and run-time system that decides, at compile time, whether to run the computation on the CPU or the GPU. Other options are the use of hardware for physically-based rendering which was developed directly for general purpose computationally intensive applications, and therefore offers native double-precision computation and general purpose programming models [Cle]. Other approaches have attempted to develop specialised hardware for ray tracing, see for example [WSS05]. At this stage other interesting approaches exist, mainly those that use hybrid techniques to render certain aspects of the computation in hardware, such as the shading cache [TPWG02] (see Section 3.10), image previews for selective renderers [LDGC05, LDC06] (see Section 4.6), and tone-mapping on GPUs [ABWW03].


2.10.3 Parallel Rendering

Parallel rendering algorithms have been used to alleviate the cost of rendering for a number of years. Reinhard et al. [RCJ98] and Chalmers et al. [CDR02] offer comprehensive analyses of the standard approaches for static and dynamic load balancing, data and task management, and more advanced approaches. Parker et al. [PMS+ 99] implemented a custom shared memory ray tracer for use on a 64-processor SGI Origin. They achieved linear speedup for highly complex scenes by making use of a master-slave demand-driven model and a technique known as frameless rendering [BFMZ94], whereby work packets for subsequent frames are sent to processors even though the current frame has not yet been completed. Wald et al. [WSB01] parallelised the fast ray tracer described above. Their first ray-tracer-based implementations achieved linear speedup as expected. However, it is their adaptation of the instant radiosity algorithm [WKB+ 02, BWS03] that we examine here. They adopted the instant radiosity algorithm [Kel97], initially shooting virtual point light sources from each light source, together with photons which are stored inside a photon map for computing caustics, and subsequently ray tracing the entire scene using their fast ray tracer. The parallel implementation used a master-slave demand-driven model and quasi-random Monte Carlo techniques (to minimise communication and speed up calculations). To avoid sharp noise patterns in the image, they used an interleaved sampling technique whereby the image is dissected into tiles of five by five pixels, and each corresponding pixel in a tile is assigned the same set of virtual light sources on a corresponding node, thus avoiding communication amongst nodes when calculating the virtual light sources. A combination of this technique and the discontinuity buffer reduced aliasing artifacts. Using these techniques they achieved close to linear speedup for clusters of up to 48 processors. Gunther et al. [GWS04] extended this distributed framework further to support caustics through photon mapping.

Parallel Irradiance Cache

As described previously, the irradiance cache can improve rendering times by an order of magnitude. There have been a number of implementations of a parallel irradiance cache, primarily using RADIANCE [LS98]. Since the irradiance cache is a shared data structure, a shared memory parallel version can easily be modelled on the uniprocessor version by providing access control on the caching data structure, although care must be taken with the granularity of the access control. On the other hand, it is notoriously hard to design a distributed algorithm, due to the latency induced by having to transmit values, a situation further compounded since the algorithm achieves maximum performance when the cached samples are used immediately. This usually results in a trade-off between communication and cache misses.


The standard RADIANCE distribution [LS98] supports parallel rendering over a distributed system using the Network File System (NFS) for concurrent access to the irradiance cache. This has been known to lead to contention and may result in poor performance when using inefficient file lock managers. Koholka et al. [KMG99] used the Message Passing Interface (MPI) instead of NFS for their distributed RADIANCE implementation; the irradiance cache values were broadcast amongst processors after every 50 samples calculated at each slave. Robertson et al. [RCLL99] presented a centralised parallel version of RADIANCE whereby the calculated irradiance cache values were sent to a master process whenever a threshold was met, and each slave then collected the values deposited at the master by the other slaves.

2.11 RADIANCE: A Physically-based Renderer

In this section we present RADIANCE [War94] as an example of a physically-based renderer and provide the motivation for using RADIANCE as the base for the implementation of the parallel, selective and time-constrained renderers in this thesis. The version of RADIANCE used throughout this thesis is the official RADIANCE 3.6 release. RADIANCE is a light simulation package that encompasses many of the ray tracing techniques described in this chapter. Realistic rendered images produced by RADIANCE can be seen in Figure 1.1 and dotted around this thesis. In terms of geometry, RADIANCE is capable of modelling polygons, meshes and general quadric surfaces. RADIANCE supports many BRDFs, such as the isotropic and anisotropic BRDFs described in [War92], and provides the facility to program new BRDFs. In terms of rendering, RADIANCE is predominantly based on distributed ray tracing [CPC84]. Direct lighting is computed by randomly sampling the light sources, with performance improved by the selective shadow testing algorithm [War91], see Section 3.5. The area light sources themselves are adaptively subdivided depending on distance to improve accuracy and performance. Since RADIANCE does not shoot any photons from the light sources, light source illumination in highly specular objects is obtained by computing virtual light sources. The indirect calculation for specular and glossy reflections and refractions is obtained by spawning a secondary ray in the required direction, calculated by Monte Carlo importance sampling of the BRDF. The indirect diffuse component is calculated using the irradiance cache algorithm described in Section 2.9.1. Participating media are supported in the form of single-scatter homogeneous participating media. The ray tracing uses an octree as a spatial subdivision data structure and also supports instancing to improve performance. RADIANCE also has a wealth of tools associated with it, such as tools for tone-mapping, generating animations and parallelism. One criticism that can be levelled at RADIANCE is its inability to accelerate the computation of caustics, due to its not tracing photons from the light sources, a facility which has been included in an extension of RADIANCE [Sch06].


The reason for using RADIANCE for the renderers implemented to demonstrate the results of the algorithms in this thesis is primarily that it is an established physically-based renderer whose source code is freely available and free to customise. While other free physically-based renderers that could potentially provide further flexibility are becoming available, such as pbrt [PH04], they are relatively new, were not available at the start of this work and are not yet established. RADIANCE also has a large installed user base and community, both commercial and academic, meaning that bugs are ironed out and tools are readily available. This also means that there are a large number of varied and complex scenes that people have developed over the years which can be used as benchmarks (as shall be seen in the upcoming chapters), instead of relying solely on arbitrary scenes for rendering. In this way, the timings generated by the implementations of our novel algorithms can be immediately compared with the traditional timings of the software using relevant scenes that are of interest to the rendering community.

2.12 Summary

In this chapter we have presented the theory and methodology for computing physically-based rendered images. In particular, we have shown how ray tracing functions and described some of the most common aspects of rendering with ray tracing. We have also shown different sampling methods for ray tracing, and the particular advantages of the low-discrepancy methods to which we will return when constructing our selective rendering algorithms. We have also discussed techniques that improve rendering performance by taking advantage of hardware: CPUs, GPUs and parallelism. Finally, we have presented an overview of RADIANCE, the physically-based renderer that we shall be using to test the novel algorithms in this thesis.

Chapter 3

Selective Rendering

In order to alleviate the cost of rendering complex scenes, a number of techniques have been devised that select specific computations to be performed over others. These selective rendering algorithms achieve high computational savings in a number of ways. Firstly, by adapting the computation so that resources are concentrated on the most favourable parts of the computation; these techniques, which we term adaptive, perform non-uniform operations on the various elements to be rendered, with some criterion defining which elements to refine further. Secondly, by rendering incrementally in successive steps such that the computation can be stopped at any point; these we term progressive rendering methods. Thirdly, by identifying the most beneficial computations within a number of constraints, such as time, so that the system attempts to make the best use of available resources; we term this rendering under system constraints. Selective rendering algorithms can thus be defined as those techniques which require a number of rendering quality decisions, identifying which elements of the computation are to be controlled and based on what criteria, to be taken and acted upon prior to, or even dynamically during, the actual computation of an image or frame of an animation. In this chapter we survey this area of rendering, providing a description of the different approaches and discussing the methodology underpinning several selective rendering algorithms. The chapter begins by introducing the criteria that drive selective renderers, from the simplest, in Section 3.1, to the more complex, based on the human visual system, in Section 3.2 and Section 3.3. Selective rendering for rasterisation and radiosity in Section 3.4 is followed by simple selective ray tracers in Section 3.5. More complex techniques for selective ray tracing are discussed in Section 3.6 and Section 3.7. Selective techniques that use component-based rendering to some degree are presented in Section 3.8, and rendering under system constraints in Section 3.9. Section 3.10 describes sparse sampling techniques that use some form of caching scheme for temporal coherence.

3.1 Selective Criteria

Selective rendering techniques, particularly the ones that adapt non-uniformly, rely on some criteria to decide whether and when to refine the computation. For the simplest and earliest selective renderers, the criterion was a simple difference in radiance or intensity between computations, whereby further computations would be performed if the difference exceeded a certain threshold. An example of this is the antialiasing technique for ray tracing presented in [Whi80], where further rays were shot within a pixel if the difference in intensity at the pixel corners exceeded a predefined threshold. Subsequently, selective algorithms began to rely on some measure of variance amongst the sampled pixels, see for example [LRU85, Pur87]. While these criteria were useful, they ignored the fact that the human visual system's response is not directly proportional to the physical stimulus, so computer graphics researchers began to study the human visual system to best take advantage of its limitations. We briefly describe the human visual system and related techniques next.

3.2 The Human Visual System

The human visual system processes visual information using a complex mechanism that encompasses the eye, where the light is input and detected, and the visual pathways, which transmit the signals to the visual cortex, where they are ultimately decoded and interpreted [BS06]. Light enters the eye through the cornea, is deflected through the pupil and projected onto the retina. The retina is a layer of tissue consisting of a large number of photoreceptor cells. There are two types of cells in the retina, the cones and the rods: approximately 7 million cones and 100 million rods. These photoreceptors are sensitive to light between the wavelengths of 400 and 700 nanometres (nm). The rods are very sensitive to light and provide light sensitivity at low illumination. The cones are less sensitive to illumination but, unlike the rods, are sensitive to colour. There are different types of cones which respond to different wavelengths, corresponding to blue (420nm), green (534nm) and red (564nm). The majority of the cones lie in the foveal area of the retina, which accounts for the region with the best capacity for resolving visual information. The rods and cones connect to the ganglion cells, which transmit signals along the optic nerve. Through the use of both physiological and psychophysical experiments it has been established that the components of the retina do not act independently; by means of the centre-surround antagonism of the ganglion cells, they respond to changes in contrast rather than absolute light intensity, as well as to spatial frequency and orientation [Fer01]. We discuss some of the simpler responses below; further information on these mechanisms can be found in [BS06].

Figure 3.1: An example of using the VDP [Dal93] (middle) showing perceptual differences between pixels in the rendered images (left and right) in false colour.

Visual acuity is a measurement of the human visual system's ability to resolve detail. Since the human visual system relies mostly on changes in contrast, visual acuity depends on contrast sensitivity. Visual acuity is best measured as a function of visual angle, measured in angular degrees. One degree of visual angle subtends approximately a 1 cm object at a distance of 57.3 cm. One degree is composed of smaller units called minutes, where 60 minutes = 1 degree; each minute in turn is composed of 60 seconds. Visual acuity based on contrast is limited to 0.5 minutes of visual angle. Hyperacuity is the ability of the human visual system to discern small changes in the visual field at between four and six seconds of visual angle, an ability which has implications for aliasing artifacts in rendering [Fer01]. The visual acuity of the human eye is strongest in the foveal region. When attempting to resolve detailed information in a scene, the eye's path is directed so as to project this information onto the fovea. When moving from one region to the next, the human visual system's acuity is reduced; this high-acuity to low-acuity and back to high-acuity "jump" from one region of interest to another is known as a saccade. The light sensitivity of the human visual system functions over a large range of luminance, from 10⁻³ candela per square metre (cd/m²) to 10⁵ cd/m². However, the human visual system is not capable of seeing this entire range of luminance concurrently, but only around four orders of magnitude of it at a time; the visual system adapts whenever large changes in luminance occur. Even at different ranges the human visual system functions differently: at low illumination levels small changes in luminance can be distinguished, albeit with low visual acuity, while at higher luminance levels visual acuity improves at the cost of sensitivity to luminance differences.
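As a quick check of the visual-angle figure quoted above, the small-angle geometry for an object of size s at viewing distance d gives:

```latex
% Visual angle check (assumed small-angle geometry):
s = d \tan\theta \approx 57.3\,\mathrm{cm} \times \tan 1^\circ
  \approx 57.3 \times 0.01746 \approx 1.0\,\mathrm{cm}
```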

3.2.1 Perceptually-Based Metrics

Perceptual metrics were developed as image-space algorithms that measure the perceptual difference between two images. A large number of perceptual metrics exist, see for example [RWP+ 95, OHM+ 04]. However, for rendering purposes it is more convenient to be able to identify the perceptual differences at a pixel level. Two visual difference metrics which have become popular in computer graphics rendering because of this facility are the Visible Differences Predictor (VDP) [Dal93] and the Sarnoff Visual Discrimination Model (VDM) [Lub95], which are considered similar in performance [LMK98]. The VDP takes two images as input and outputs a probability map of perceivable differences. The first stage, amplitude non-linearity, accounts for the non-linear response of the human visual system at lower and higher luminance values, usually modelled on a threshold-versus-intensity function. The images are then transformed into the frequency domain for the second stage, which corresponds to contrast sensitivity, accounting for the way sensitivity to contrast varies non-uniformly across a scene. Subsequently, the image is split into 31 channels, composed of six orientations at five spatial frequencies plus an additional independent base frequency, after which the channels are transformed back into the spatial domain. Masking is then applied and a threshold elevation map is computed for both images per channel. The contrast between both images is computed per channel and scaled by the threshold elevation. This result is scaled by a psychometric function to compute a probability of detection per channel, which is then combined into one image. Figure 3.1 demonstrates an example of using the VDP for two different renderings of the same image at different quality settings. The VDP output is displayed over the images in false colour to highlight the areas of largest difference. The image differences are the result of using different quality settings for computing the direct lighting of the area light source, which is particularly noticeable in the shadow of the transparent sphere.

3.3 Attention

Another important aspect of the human visual system accounts for how and where observers may focus their attention. Two major approaches broadly determine where humans direct their visual attention [Jam90]. These processes are labelled bottom-up, an automatic, visual-stimulus-driven process, and top-down, which is voluntary and focuses on the observer's goal within an environment. The bottom-up process is primarily influenced by salient features in a scene such as contrast, size, shape, colour, brightness, orientation, edges and motion. Koch and Ullman [KU85] presented the idea of a saliency map, a two-dimensional map that encodes the most conspicuous locations of a view of a scene. Itti et al. [IKN98, IK00] developed a saliency-based computational model, based on the early primate visual system, that predicts the saliency of objects in a scene. This model was composed of a number of feature channels which attempted to identify conspicuous features based on colour, intensity and orientation. The image generation was split into these three features. For each feature a Gaussian pyramid was applied and a centre-surround difference was used for feature extraction. The intensity was obtained from the colour channels; the colour feature map was obtained from the chromatic opponency of green and red, and of blue and yellow; and the orientation was obtained from the local orientation contrast. The features were then combined into a saliency map, and the location with the highest saliency was predicted by a winner-takes-all neural network. An example of an image generated using this saliency map can be seen in Figure 3.2.

Figure 3.2: An example of the Itti et al. [IKN98] saliency map for the Corridor scene.

Yee et al. [YPG01] were the first to use the saliency map in computer graphics, in the form of a perceptual oracle that directed rendering resources. Furthermore, Yee et al. extended the conspicuous feature maps to include motion, incorporating into their model the sensitivity of the eye to motion and the ability of the eye to track moving objects. They termed their novel saliency map the Aleph map. While it was possible to calculate the motion using image-based techniques, since the computer graphics model naturally provides the information in model space, the pixel motion could easily be obtained by finding the difference in the motion of objects at consecutive frames. Their selective renderer is discussed in Section 3.7. Longhurst et al. [LDC06] developed a GPU-friendly saliency map. Their work offered a number of innovations. Primarily, it was designed to be implemented on the GPU, running around seventy times faster than a similar CPU version. To achieve this they discarded the winner-takes-all approach of the Itti and Koch saliency map, instead highlighting the importance of the saliency for each pixel rather than attempting to identify which part of the image would be attended to first. They also simplified the orientation channel with an edge detector. Furthermore, they added the motion channel as in Yee et al.'s model, also based on model space calculations. They also followed the advice of Marmitt and Duchowski [MD02], whose evaluation of the Itti and Koch saliency map suggested that the algorithm did not perform well for virtual environments, since it was computed from scratch at each frame and lacked any understanding of human memory and familiarity with objects, by modelling the behaviour termed habituation. The habituation channel maintained the saliency of an object for the first three seconds and decreased it over approximately the next ten seconds, based on suggestions by [MNS00].

Figure 3.3: Rendering around the foveal angle. The green area was rendered in high quality, the red area in medium quality and the rest in low quality. The task consisted of counting the pencils in the mug. From [CCL02].

While the above approaches have considered the reaction of human observers to visual stimuli, Mastoropoulou et al. [MDCT05] used sound to involuntarily direct the visual attention of a viewer towards a sound-emitting object within a virtual environment. They showed that viewers failed to notice a degradation in quality in the pixels outside the foveal distance of the sound-emitting object when a sound was played, but noticed it when no sound was played. For further details about this experiment, including timings, and its relation to selective rendering, see Section 4.5.

The top-down approach was demonstrated by Yarbus [Yar67], whose experiments showed how the movements of the eye are related to the task being performed. The human visual system perceives only a small portion of the items within a scene while attending to others, a property known as inattentional blindness [MR98]. Furthermore, when occupied, the human visual system finds it difficult to notice small changes within a scene, a phenomenon known as change blindness [ROC97]. Cater et al. [CCL02] used these concepts to identify task objects prior to the rendering of an animation; this could be thought of as part of the modelling process. They verified their work by means of a psychophysical experiment, using an animation comprising the task of counting pencils in a mug, rendered at three different qualities representing low, high and selective quality, see Figure 3.3. The selective quality rendering used higher quality within the foveal angle around the task object. They showed pairs of animations to a series of viewers, who failed to notice any difference between the high quality and selective quality animations while they were performing the task. In [CCW03], the concept of the task map was introduced for identifying objects, which are given a task value between 0, for unimportant objects, and 3, for very important objects that are meant to be followed closely for a task.

Figure 3.4: The Corridor scene. Top row, from left to right: the fire safety objects identified as tasks, the task map, and the task map with the foveal angle gradient applied. Bottom row, from left to right: the saliency map, the importance map IM(0.5, 0.5, +) and the final rendered image.

Figure 3.4 (top) demonstrates the task map for the fire safety objects used in the task experiments from [SDL+ 05]. The objects deemed important to the task at hand are identified at the modelling stage (top left); the selective renderer identifies these objects at the rendering stage (top middle) and computes a foveal angle gradient around them (top right). Further details of how selective rendering is performed with such methods are given in the next chapter. Sundstedt et al. [SDL+ 05] integrated the concepts of the task map and the saliency map into an importance map. The importance map is a two-dimensional map defined by IM(w_t, w_s, op), where w_t is the weighting applied to the task map, w_s is the weighting applied to the saliency map and op is the operator used to compose the importance map. Using this notation, a saliency map would be IM(0, 1, op), a task map would be IM(1, 0, op) and an equally weighted importance map would be IM(0.5, 0.5, +). Figure 3.4 (bottom) shows an example of an importance map (bottom middle) composed of the task map (top right) and the saliency map (bottom left) using IM(0.5, 0.5, +).
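A minimal sketch of composing such an importance map for the '+' operator follows; the per-pixel maps in [0, 1], the weights and the clamp are illustrative rather than the exact formulation of [SDL+ 05].

```cpp
#include <algorithm>
#include <vector>

std::vector<float> importanceMap(const std::vector<float>& task,
                                 const std::vector<float>& saliency,
                                 float wt, float ws)
{
    std::vector<float> im(task.size());
    for (size_t i = 0; i < task.size(); ++i) {
        // IM(wt, ws, +): weighted sum of the two maps, clamped to [0, 1].
        im[i] = std::min(1.0f, wt * task[i] + ws * saliency[i]);
    }
    return im;
}
// e.g. importanceMap(task, saliency, 0.5f, 0.5f) gives IM(0.5, 0.5, +).
```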

3.4 Selective Rendering Techniques

Selective rendering methods are techniques for speeding up or controlling rendering through non-uniform adaptation, progression and system constraints, as introduced at the beginning of this chapter. In presenting selective rendering we follow the approach of the previous chapter by presenting techniques for rasterisation, radiosity and ray tracing, but focusing mainly on selective ray tracing methods.

3.4.1 Selective Rendering for Rasterisation

The majority of selective rendering techniques for rasterisation centre around reducing computational costs by reducing the number of geometrical objects, usually polygons, that need to be sent through the rendering pipeline. Most of these techniques are level of detail [Cla76] techniques. The simplest level of detail techniques store distinct versions of any given model and discretely switch between them based on some criterion, traditionally distance, as sketched below. More complex techniques adapt non-uniformly across a given model and use perceptual metrics to select the level of detail, see for example [LH01]. Level of detail is a very large field and, since we are mostly interested in ray tracing techniques, we will not describe further details here; for a comprehensive look at level of detail techniques for rasterisation see [LWC+ 02, AMH02].

An alternative approach to level of detail for selective rendering with rasterisation was proposed by Bergman et al. [BFGS86], whereby the complexity of the shading is adaptively refined from displaying vertices only, to edges, to flat shading, Gouraud shading, Phong shading and antialiasing. Individual polygons could be rendered at different qualities concurrently, depending on which was considered to give the better result. No metrics were developed for selecting the polygons to be refined; instead, a set of heuristics was used. Bergman et al.'s approach may be seen as a precursor to the component-based selective rendering approaches outlined in Section 3.8.
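As a concrete illustration of the simplest case mentioned above, the sketch below selects between discrete levels of detail by distance; the data layout and thresholds are hypothetical.

```cpp
// Minimal sketch of discrete level-of-detail selection by viewer distance.
#include <cstddef>
#include <vector>

struct Mesh { /* geometry at one level of detail */ };

struct LODModel {
    std::vector<Mesh> levels;      // levels[0] is the most detailed version
    std::vector<float> distances;  // switch distance for each level, ascending
};

const Mesh& selectLOD(const LODModel& model, float viewerDistance)
{
    // Pick the first level whose switch distance exceeds the viewer distance.
    for (std::size_t i = 0; i < model.distances.size(); ++i)
        if (viewerDistance < model.distances[i])
            return model.levels[i];
    return model.levels.back();    // beyond the last threshold: coarsest mesh
}
```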

3.4.2 Selective Radiosity

Progressive refinement radiosity [CCWG88] improves the usability of the radiosity technique through a simple selective process. Rather than arbitrarily solving the radiosity equation, Equation 2.7, for each patch individually, progressive refinement radiosity approached the problem by shooting flux into the scene; the flux was first shot from the patches that emit and reflect the most light. Each patch was allocated an amount of unsent exitance, which initially corresponded to the light sources. The patch with the highest unshot flux was identified and the radiosity equation computed for every other patch; each patch received some portion of the exitance, which it added to its own unsent exitance and also stored for further use. After the exitance was completely shot, the sending patch's unsent exitance was reset to zero and the next patch with the highest flux was shot (a sketch of a single shooting iteration is given below). This algorithm is progressive since the process can be stopped at any of these iterations. For viewing purposes, an ambient term was added which accounts for the average reflectance of the unsent flux. The progressive nature of this algorithm improved usability since it could generate a reasonably good image in less time than other methods. Cohen et al. [CCWG88] demonstrated a scene converging to an acceptable rendering in 100 steps while the traditional radiosity solution was still computing the first iteration of a single patch.

Another selective technique used to make radiosity practical was the hierarchical radiosity approach [HSA91]. Rather than compute the entire radiosity formulation for every tessellated element, hierarchical radiosity clustered together patches of adjacent elements, based on some criterion such as the magnitude of the form factor, and computed between these clusters. The computation was first performed at a coarse level and then adaptively refined. This method is akin to techniques for solving the N-body problem [Gre87]. A further selection of selective radiosity techniques is given in [PP99].
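The shooting iteration referenced above can be sketched as follows. The form factor routine is stubbed, the names are our own, and the transfer uses the standard reciprocity relation; this is an illustration under those assumptions, not Cohen et al.'s implementation.

```cpp
// One shooting iteration of progressive refinement radiosity (sketch).
#include <cstddef>
#include <vector>

struct Patch {
    float reflectance;
    float area;
    float radiosity = 0.0f;       // accumulated, stored for display
    float unsentExitance = 0.0f;  // initially non-zero for emitters only
};

// Stub: a real implementation integrates the geometric term with visibility.
float formFactor(const Patch& /*from*/, const Patch& /*to*/) { return 0.1f; }

void shootIteration(std::vector<Patch>& patches)
{
    // Select the patch with the most unshot flux (unsent exitance * area).
    std::size_t s = 0;
    for (std::size_t i = 1; i < patches.size(); ++i)
        if (patches[i].unsentExitance * patches[i].area >
            patches[s].unsentExitance * patches[s].area)
            s = i;

    for (std::size_t j = 0; j < patches.size(); ++j) {
        if (j == s) continue;
        // Exitance received by patch j from shooting patch s (reciprocity).
        float received = patches[j].reflectance * patches[s].unsentExitance *
                         formFactor(patches[s], patches[j]) *
                         patches[s].area / patches[j].area;
        patches[j].radiosity += received;       // stored for further use
        patches[j].unsentExitance += received;  // queued for later shooting
    }
    patches[s].unsentExitance = 0.0f; // all of this patch's flux has been shot
}
```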

3.5 Selective Ray Tracing

In this section we begin by presenting the earlier techniques in selective ray tracing; in subsequent sections we present state-of-the-art selective ray tracing methods. Mitchell [Mit87] presented a ray tracer that non-uniformly sampled rays across the image plane. Initially a coarse-grained computation was performed using non-uniform Poisson sampling. The next stage could be considered one of the earliest attempts at exploiting human vision: rather than just using the variance to decide where to place further samples, the non-linear response of the eye to changes in intensity was modelled using a contrast measure for the red, green and blue channels. A separate threshold was used for each channel to decide when to shoot further rays (a sketch of such a test is given below).

Painter and Sloan [PS89] presented a ray tracing algorithm that is both adaptive and progressive. They used a K-d tree for storing samples and identifying where the next sample is to be shot. For nodes greater than a pixel, their sampling strategy prioritised larger areas and areas containing edges: a heuristic was used comprised of the product of the node area, contributing to the size strategy, and the variance of the current node and surrounding nodes, conforming to the edge location priority; the maximum of the two child priority values was used. For sub-pixel nodes, variance was used to identify where the next sample would be shot. While Painter and Sloan suggested using the non-linear response of the human visual system, they did not implement this approach; Meyer and Liu [ML92] extended the work based on their suggestion.
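A per-channel contrast test in the spirit of Mitchell's approach might look like the sketch below; the threshold values and helper types are illustrative assumptions.

```cpp
// Sketch of a per-channel peak-to-peak contrast test deciding whether a
// region needs further samples.
#include <algorithm>
#include <array>
#include <vector>

struct RGB { float r, g, b; };

// Peak-to-peak contrast (Imax - Imin) / (Imax + Imin) for one channel.
float contrast(float lo, float hi)
{
    return (hi + lo > 0.0f) ? (hi - lo) / (hi + lo) : 0.0f;
}

bool needsMoreSamples(const std::vector<RGB>& samples,
                      const std::array<float, 3>& threshold /* per channel */)
{
    std::array<float, 3> lo = {1e30f, 1e30f, 1e30f};
    std::array<float, 3> hi = {-1e30f, -1e30f, -1e30f};
    for (const RGB& s : samples) {
        const float c[3] = {s.r, s.g, s.b};
        for (int k = 0; k < 3; ++k) {
            lo[k] = std::min(lo[k], c[k]);
            hi[k] = std::max(hi[k], c[k]);
        }
    }
    // Shoot further rays if any channel's contrast exceeds its threshold;
    // the thresholds differ per channel to reflect the eye's sensitivity.
    for (int k = 0; k < 3; ++k)
        if (contrast(lo[k], hi[k]) > threshold[k])
            return true;
    return false;
}
```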


While the Metropolis Light Transport (MLT) algorithm [VG97] is a general rendering method for computing the rendering equation, its inclusion here is merited by its ability to adapt naturally through its Metropolis sampling mechanism. Metropolis sampling can sample a function given only the ability to evaluate it [DJA+ 04]. MLT uses random mutations in the bidirectional paths, and random perturbations in lens and specular surfaces, such that random paths are distributed according to the radiance. In this way MLT methods can automatically identify and improve aspects of the computation that may require further work, such as caustics or areas of higher radiance.

Guo [Guo98] presented a progressive rendering algorithm based on the use of a directional coherence map (DCM). The approach subdivided the image plane, initially regularly and then recursively using a quadtree, into blocks of four by calculating the radiance at the corner pixels of each block. The second stage was progressive and involved classifying the blocks into smooth blocks, those deemed not to have discontinuities, and edge blocks. Edge blocks were chosen if the corner pixels were above a contrast-based threshold, and through the use of rasterisation to detect lines that may have escaped the threshold tests, which is particularly useful for large blocks. While smooth blocks were only sampled at corners, edge blocks were sampled along the boundaries. For each edge block the direction of least discrepancy was calculated, which could then be used for the interpolation stage and for testing the child nodes of the edge block for further subdivision. Edge blocks could be further classified as complex through a series of tests which indicate more than one edge within a block; children of complex edge blocks were tagged automatically as edge blocks for the next iteration. The final step subdivided blocks into four separate quads and continued the process at the beginning of the second stage. At any time the image could be reconstructed by interpolating from the calculated samples, using the DCM for edge blocks. This work was extended in [FP04] to use an adaptive perceptual metric rather than the discontinuity map, similar to the techniques described in Section 3.6.

Woolley et al. [WLWD03] described a progressive approach to both interactive ray casting (ray tracing without any secondary rays) and interactive level of detail that was automatically interruptible when a temporal error estimate exceeded a spatial error estimate, granting adaptability over space and time. We are only interested in their ray casting application, which used a quadtree for progressive rendering. The spatial error was denoted by the maximum size of the image that each ray sampled; the temporal error was calculated from the distance of the currently rendered image to the location of the latest input. Rendering could occur in one of two buffers: the front buffer (the buffer currently being displayed) and the back buffer. Buffers could be switched on one of two occasions. Firstly, when the combined spatial and temporal error of the back buffer was smaller than that of the front buffer, the buffers were switched; rendering would continue in the front buffer while the temporal error was smaller than the spatial error. Secondly, the buffers were switched when the back buffer's temporal error exceeded its spatial error. Their system allowed rendering at interactive rates, whereby coarse images were generated at high frame rates when the input was moving rapidly, and finer renderings at lower frame rates for slower input.
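The buffer-switching rule just described can be summarised in a short sketch. The struct and naming are our own, and the two error estimates are assumed to be supplied by the renderer as described above.

```cpp
// Sketch of a double-buffer switching rule balancing spatial and temporal
// error, in the spirit of Woolley et al.
struct BufferState {
    float spatialError;   // max image-space extent covered by a single ray
    float temporalError;  // distance of rendered view from the latest input
};

bool shouldSwapBuffers(const BufferState& front, const BufferState& back)
{
    // Swap when the back buffer is better overall...
    if (back.spatialError + back.temporalError <
        front.spatialError + front.temporalError)
        return true;
    // ...or when the back buffer has gone stale relative to its refinement.
    return back.temporalError > back.spatialError;
}
```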

3.5.1 Selective Rendering for Specific Components

While the techniques above deal with selective forms of general ray tracing, specific aspects of the computation can also be made selective. A common area for applying adaptive techniques is the computation of direct lighting for ray tracing. One such technique is the selective shadow testing algorithm [War91]. Rather than just shooting shadow rays to each area light source and computing the direct lighting for those points not in shadow, selective shadow testing first computed the illumination for each of these points, then ordered the points based on a number of criteria: distance from the light source, size of the light source and brightness. The combination of these criteria was designated the potential. Shadow rays were then computed for the points with the highest potential until the total remaining potential was some user-set fraction less than that which had been computed. To further improve the quality of the final results, the rest of the points were computed using statistics collected from previous visibility tests. This algorithm reduced the complexity of the shadow testing from linear to logarithmic. A similar method was presented in [SSW+ 06]: the scene was divided into cells and the contribution of each light source to a particular cell was stored in two lists, one for those light sources deemed important for the cell and one for the rest. Sources were then sampled according to their location in the lists.

Another related technique is the lightcuts method [WFA+ 05]. The lightcuts method clustered light sources into a binary tree, such that each parent in the tree corresponds to one of its child nodes. When shading, the computation was performed progressively: when a node was calculated, the upper bounds of the child nodes were computed and, if the error was greater than a certain threshold, usually based on Weber's law, the child nodes were evaluated, until either the criterion was satisfied or the leaf nodes, the actual light sources, were used. The results demonstrated an improved speedup over the Ward technique [War91], presented above.
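A simplified sketch of lightcuts-style refinement follows. The published algorithm maintains a cut with a priority queue and progressively updates its radiance estimate; this recursion, with hypothetical types, only illustrates the Weber-law refinement test.

```cpp
// Sketch of refining a light cluster tree until the representative's error
// bound is perceptually negligible relative to the running estimate.
struct LightNode {
    float representativeContribution; // shading with the cluster representative
    float errorUpperBound;            // bound on the error of that approximation
    const LightNode* left = nullptr;  // child clusters (nullptr at a leaf)
    const LightNode* right = nullptr;
};

float shadeWithCut(const LightNode& node, float totalEstimate,
                   float weberFraction = 0.02f)
{
    bool isLeaf = (node.left == nullptr && node.right == nullptr);
    // Accept the cluster if its error bound is below the Weber-law threshold.
    if (isLeaf || node.errorUpperBound < weberFraction * totalEstimate)
        return node.representativeContribution;
    // Otherwise refine into the two child clusters.
    return shadeWithCut(*node.left, totalEstimate, weberFraction) +
           shadeWithCut(*node.right, totalEstimate, weberFraction);
}
```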

Importance sampling techniques can be used to identify where best to shoot further samples based on aspects of the computation that are known, such as cosine sampling for diffuse computations, the BRDF and incident radiance fields [DBB02]. Ongoing computations can provide partial knowledge of the scene radiance which can be used to direct further samples [LW95, Jen95]. For photon mapping, visual importance can be used to direct photon shooting towards areas of the scene that will contribute more to the final generated image [PP98]. This method shot importance photons, or importons, from the view; when the photon map was built, photons were then shot and stored in areas considered visually important based on the density of the importons.

The adaptive radiance cache and irradiance cache [KBPv06] adapted the (ir)radiance cache search radius depending on the contribution of a cached sample to the samples which are meant to interpolate from it. When rendering using adaptive (ir)radiance caching, all samples were first identified and placed in a queue. When a sample was to be interpolated from more than one cached record, if the outgoing radiance of the interpolated result was discernible, according to a Weber-law-based threshold, from the sample interpolated without one of the cached samples, the cached sample with the least weight had its contribution radius reduced. This process was repeated iteratively until no other cached samples needed to be reduced further. This approach improved rendering performance for simple scenes and produced better results in scenes with more complex shading.

3.6 Selective Rendering based on Perceptual Differences

Myszkowski [Mys98] presented the possibility of using visible differences predictors in rendering for global illumination. Daly's VDP [Dal93] was used as a perceptual metric to guide a progressive Monte Carlo path tracer and a hierarchical progressive radiosity renderer. Within the context of the Monte Carlo path tracer, the VDP was used to detect when to stop rendering by comparing generated images at regular intervals; when the perceptual differences between the two images were below a certain threshold the rendering was stopped. Unfortunately, the time of selective rendering with the VDP was only marginally better than that of traditional rendering, due to the expensive VDP computation. Bolin and Meyer [BM98] concurrently developed a similar approach based on the VDM [Lub95].

Myszkowski et al. [MRT00] furthered the application of the VDP as a selection criterion by adapting it to animations, termed the Animation Quality Metric (AQM). The AQM was used for rendering animations without change in light and geometry. Rather than render each frame, key frames were identified and the frames in between were computed using image-based rendering (IBR) techniques. For two given adjacent key frames, the IBR frames were generated starting from each key frame, and the two frames corresponding to the middle of the key frames were compared using the AQM. If there were too many differences, a new image was rendered using a combination of traditional rendering and IBR, and the process was repeated recursively.

Ramasubramanian et al. [RPG99] developed a physically-based perceptual metric with the distinct feature that the spatial component and the luminance component can be computed individually. They argued that the spatial component was largely unaffected by the indirect illumination computation of the rendering process. This led them to develop a selective rendering algorithm based on path tracing, whereby a precomputation stage involved the calculation of the direct illumination which, when combined with an approximate ambient term, was used to calculate the expensive spatial component of their perceptual threshold map. The subsequent stage involved the computation of the indirect illumination using a progressive algorithm, which combined the luminance-dependent component with the spatial component to predict which areas of the image required more samples. This stage was computed iteratively, each time progressively refining the computation. They reported a difference of two orders of magnitude between the cost of calculating the spatial component and the luminance component. While similar to the techniques of Myszkowski [Mys98] and Bolin and Meyer [BM98], the precomputation of the spatially dependent component improved performance considerably.
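The split of the threshold map can be sketched as follows, with an assumed (stubbed) threshold-versus-intensity function tvi() and illustrative names; only the luminance-dependent factor is recomputed as the solution progresses.

```cpp
// Sketch of a perceptual threshold map with a fixed spatial component and a
// cheap luminance-dependent component, in the spirit of Ramasubramanian et al.
#include <cstddef>
#include <vector>

// Stub: a Weber-like threshold-versus-intensity model; a real implementation
// would use a measured TVI function.
float tvi(float adaptationLuminance) { return 0.01f * adaptationLuminance + 1e-4f; }

struct ThresholdMap {
    std::vector<float> spatialElevation; // expensive; computed once from direct + ambient
    std::vector<float> luminance;        // cheap; updated as indirect light accrues

    // Per-pixel perceptual threshold: fixed spatial term times a
    // luminance-dependent term recomputed each progressive pass.
    float threshold(std::size_t i) const {
        return tvi(luminance[i]) * spatialElevation[i];
    }
};

bool pixelNeedsMoreSamples(const ThresholdMap& map, std::size_t i,
                           float estimatedError)
{
    return estimatedError > map.threshold(i);
}
```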

A method related to those above, due to its progressive technique, is the multidimensional lightcuts technique [WABG06], an extension of the original lightcuts method presented in Section 3.5.1. The system is similar to the lightcuts method but was extended to handle indirect diffuse lighting, participating media, motion blur and depth of field. The method maintained two trees: one for the light samples, which catered for indirect diffuse, participating media and indirect lighting, and one for the pixel samples, which catered for motion blur and depth of field. For any rendered image a light tree was created, and for each pixel a pixel sample tree was also created. The Cartesian product of the two trees, termed the product graph, was used to adaptively progress the computation. The algorithm progressed by identifying whether the upper bounds of child nodes were within a given threshold, usually a 2% limit based on Weber's law, and if not the algorithm recursed to the two child nodes. By considering the rendering as a single equation over the multiple domains and selectively rendering over these domains, the large number of interactions between points and lights was shown to be reduced by many orders of magnitude.

3.7 Selective Rendering based on Perceptual Oracles

Yee et al. [YPG01] removed the need for expensive comparison operations completely by using a perceptual map as an oracle rather than using stopping conditions. They introduced the Aleph Map, described in Section 3.3. The Aleph Map for a given image was initially constructed from a rasterised rendered image using OpenGL. Unlike the techniques in the previous section, it needed to be computed only once. The Aleph Map was then used to direct the indirect illumination computation. They demonstrated their techniques by using, as a selective renderer, a modified version of RADIANCE that took the Aleph Map as input. The Aleph Map was consulted to modify the irradiance cache search radius for the indirect diffuse computation. Since, for images rendered without a precomputed irradiance cache, the bulk of the computation is the indirect diffuse computation, the strategy employed was successful in reducing computation cost by close to an order of magnitude for the tested scenes.
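A sketch of this idea is given below, with an illustrative scaling rule of our own rather than the parameters actually used in RADIANCE.

```cpp
// Sketch of using a perceptual oracle to widen the irradiance cache search
// radius in less important regions, after Yee et al.
#include <algorithm>
#include <vector>

struct OracleMap {
    int width = 0, height = 0;
    std::vector<float> value; // per-pixel importance in [0,1], 1 = most salient
    float at(int x, int y) const { return value[static_cast<std::size_t>(y) * width + x]; }
};

// Salient pixels keep the base accuracy; unimportant pixels tolerate
// interpolation over a radius up to maxScale times larger, so fewer
// expensive indirect diffuse samples are computed there.
float cacheSearchRadius(const OracleMap& oracle, int x, int y,
                        float baseRadius, float maxScale = 10.0f)
{
    float importance = std::clamp(oracle.at(x, y), 0.0f, 1.0f);
    return baseRadius * (1.0f + (maxScale - 1.0f) * (1.0f - importance));
}
```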

Haber et al. [HMYS01] used as their perceptual oracle a combination of a saliency map, for bottom-up visual processing, and what they term an importance map (not to be confused with the importance map described in Section 3.3), for top-down visual processing, in their interactive selective renderer. Their saliency map was based on the Itti et al. [IKN98] model without the expensive-to-compute orientation channel. The importance map used distance and pixel coverage of non-diffuse objects in what the authors described as a top-down approach to interactive environments. Their renderer used particle tracing to pre-compute diffuse environments and rasterisation for direct rendering. The combined saliency map was used for ray tracing of only the non-diffuse objects. Their approach worked best on parallel hardware, when the map computation and rasterised rendering were run on separate processors and any number of remaining processors could be used for the ray tracing. They demonstrated results for interactive rendering at around eight frames per second using an eight-processor machine.

Cater et al. [CCW03] used the notion of task importance [CCL02] to drive their selective renderer. Initially they sub-sampled the image with one ray per sixteen pixels, identifying the task objects from the ray casting operation. The image preview was then used to generate an error estimate using the variance of the five closest already-shot samples; this was combined with the task map, a contrast sensitivity function and motion into an error conspicuity map, which was then used to direct the rest of the rays to be sampled. The selective renderer was also used for animations, in which case image-based rendering was used to re-project pixel samples, based on a number of criteria, right after the initial sub-sampling stage. They demonstrated a speedup of about seven times over traditional rendering for a four-minute animation without the use of indirect diffuse computations.

Anson et al. [ASGC06] presented a selective renderer for accelerating the computation of participating media. They created an extinction map, the X-Map, from the attenuation along the primary rays to the first intersection, which stored the percentage of light returned to the eye. They combined the X-Map with a saliency map, obtained by rendering using only direct lighting, into an XS-Map, using a technique similar to that used for the importance map. They used the XS-Map to direct the rays per pixel shot by a selective renderer and obtained a speedup of five to ten times for complex scenes.

Other selective renderers that use perceptual oracles are the selective renderer for importance maps [SCCD04, SDL+ 05], the selective renderer for auditory bias of visual attention [MDCT05] and the selective renderers in [LDC05, LDGC05, LDC06]. More recent versions of these selective renderers will be discussed as case studies in selective rendering in Chapter 4.

3.8 Component-Based Techniques

The component-based techniques outlined in Section 2.8 were primarily created as a method of efficiently solving the rendering problem by computing the individual components using various, possibly faster, techniques. Some of them also provided the flexibility to control the rendering; we discuss these methods below.


While not strictly a component-based approach, the shade trees of Cook [Coo84], used in the REYES renderer [CCC87], were the first attempt at adaptability for shaders. The influence of this approach is clear in all the component-based approaches. Prior to this approach every surface was illuminated using the same BRDF; shade trees presented a novel method for an animator to program the surface reflection function of any surface. This added great flexibility, providing the facility for animators to render images composed of both complex and simple surfaces. Perlin [Per85] added control structures and flow to shade trees in the form of a high-level language.

The adaptive radiosity textures algorithm [Hec90] was novel in that it included an initial rendering stage, termed the size pass, which, while determining visibility, also dictated the adaptive subdivision of the radiosity textures in relation to the screen size. The second pass of this algorithm was a backward ray tracing pass, similar to progressive refinement radiosity, but with the radiosity stored in the adaptive radiosity textures; light rays were also adaptively sampled. The final pass, termed the eye pass, was a traditional ray tracing pass using the adaptive radiosity textures for the direct computation. Adaptive sampling from the eye pass was also used, in the form of the traditional adaptive approach [Whi80]. While these forms of selective rendering did not offer much in terms of modern selective rendering algorithms, the method was novel in its adaptive approach.

The progressive multipass method of Chen et al. [CRMT91] gave the user control at various points of the rendering computation. Their rendering process began with a progressive refinement radiosity pass with extended form factors, using ray tracing to account for specular effects. Due to the use of progressive radiosity, the computation converged to a reasonable image within the initial iterations; further iterations refined the computation further. At any point the user was permitted to interrupt the computation and progress to the subsequent stage, at which point a view-independent solution resulted. When a view was selected, the high frequency computation for caustics and direct illumination was computed using ray tracing from the eye and the light sources. After this stage the user could choose to improve the low frequency content either by continuing the progressive computation or by using a low frequency refinement based on path tracing. This approach was particularly interesting in its control over distinct rendering paths.

Slusallek et al.'s [SSH+ 98] composite lighting simulations represented a formalisation of many of the multipass methods. Their system provided a framework for distinct algorithms to solve different parts of the rendering problem, arranged in lighting networks which could be combined in various topologies, including loops. A user could program their own lighting network for a given situation and specify that only parts of the scene be computed with given algorithms, thus giving a flexible and faster result. Lighting networks could be viewed as an evolution of shade trees [Coo84]. Since the produced lighting networks might be quite complex, a system based on regular expressions, similar to the light transport notation [Hec90], was developed to detect redundancy and missing light paths.


Recently the component-based approach has been tied in with perceptual rendering. Stokes et al. [SFWG04] presented a perceptual metric which predicted the importance of the individual components for a given scene. They proposed that the perceptual metric could be used to drive a path tracing renderer, where the primary rays collected information about the scene and the perceptual metric was then used to allocate the individual component calculations to resources based on their importance. The final image was then composited from the distinct components. Their perceptual metric was scene and image dependent, and the proposed framework did not support hybrid paths.

3.9 Rendering under System Constraints

One particular aspect of selective rendering involves rendering at the highest quality possible within a given time constraint by allocating resources appropriately. While in Section 3.4.1 we only briefly described level of detail techniques, the majority of work for rendering under system constraints has been related to this field due to the need to achieve interactive rates. Funkhouser and Séquin [FS93] described a system of rendering under timing constraints to maintain fixed frame rates for rendering based on rasterisation. For each model they maintained several discrete levels of detail. Based on an object tuple (O, L, R), where L and R define the level of detail and shading for object O, and two functions, C and B, defining cost and benefit, they suggested maximising benefit while constraining cost:

$$\text{Maximise:} \quad \sum_{S} B(O, L, R) \qquad \text{such that:} \quad \sum_{S} C(O, L, R) < \text{constraint} \tag{3.1}$$

where S is the set of tuples rendered in a frame. In their system, the cost function was defined by the time taken to render the object primitive with the given level of detail and shading option. For calculating costs they used precomputed, system-specific constants, obtained by running a series of tests with varying levels of detail and shading options. Their benefit function was composed of a number of features: the size of the projected object on the image; accuracy, depending on a function of the shading algorithm and the number of primitives (polygons, vertices etc., depending on the choice of shading algorithm); motion blur; focus; hysteresis, to prevent distracting pop-up effects; and semantics, for user-selected, application-specific important objects. Their benefit function did not contain any of the complex perceptual models introduced in Section 3.6. Maximising the benefit under a defined cost threshold corresponds to the continuous multiple choice knapsack problem, which is known to be NP-complete; they therefore used a greedy algorithm (sketched below). They added tuples to the rendering list of objects S in decreasing order of B(O, L, R)/C(O, L, R) while the accumulated cost remained below the constraint threshold. If the same object was added more than once, the tuple with the highest benefit was chosen. Their solution was at least half as good as the optimal. Their results demonstrated that their approach maintained a constant frame rate compared to static and reactive fixed frame rate systems [LWC+ 02]. [MS95] and [MB97] extended the concept by manipulating hierarchies of objects, possibly represented by imposters, and [GB99] furthered the work with respect to continuous level of detail models.
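A minimal sketch of such a greedy benefit/cost selection is given below; the tuple contents and the deduplication step are simplified relative to the published system.

```cpp
// Greedy approximation of the benefit/cost selection in Equation 3.1.
#include <algorithm>
#include <vector>

struct Tuple {
    int object;     // O
    int lod;        // L
    int shading;    // R
    float benefit;  // B(O, L, R)
    float cost;     // C(O, L, R), e.g. predicted render time; assumed > 0
};

std::vector<Tuple> selectForFrame(std::vector<Tuple> candidates, float budget)
{
    // Highest benefit per unit cost first.
    std::sort(candidates.begin(), candidates.end(),
              [](const Tuple& a, const Tuple& b) {
                  return a.benefit / a.cost > b.benefit / b.cost;
              });
    std::vector<Tuple> frame;
    float spent = 0.0f;
    for (const Tuple& t : candidates) {
        if (spent + t.cost > budget) continue; // skip what no longer fits
        spent += t.cost;
        frame.push_back(t);
        // The real system also replaces a previously added tuple of the same
        // object with the higher-benefit one; omitted here for brevity.
    }
    return frame;
}
```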

Horvitz and Lengyel [HL97] presented a decision-theoretic approach to time constraints for real-time rendering, based on the Talisman architecture [TK96], which allowed the re-use of previously rendered objects through affine transformations. The flexibility of their computation resulted primarily from replacing rendered images with sprites, imposters which can be transformed through image-based techniques, but their framework could be used for further selective rendering parameters such as level of detail. Their benefit function (modelled in the form of a perceptual cost) was a function of rendering a sprite at a certain rendering level and also included a function of attentional focus on the sprite. Moreover, they factored into the benefit function the probabilities of users attending to sprites under certain evidence; one example they describe is a task in a video game. Their computational cost consisted of the cost of warping a sprite as opposed to re-rendering it. They used a greedy algorithm similar to the one in [FS93] to maximise benefits. Furthermore, they proposed techniques that attempt to maximise benefits for rendering with multiple selective rendering variables.

Dumont et al. [DPF01] described a constrained system that optimised for space rather than time. Since texture mapping using graphics hardware is accelerated if all the textures to be rendered are cached on the graphics hardware, they presented a system that selected the perceptually best set of textures from a mip-mapped pyramid while making sure the selected texture sub-pyramids fitted in texture memory. Their system can also be viewed in terms of Equation 3.1, where the constraint is the texture memory size. While their cost function was simply the size of the texture, their benefit function used a more complex perceptually-based metric, which decomposed the computation in a way similar to that used in [RPG99], and the projected area of the texture on the screen. Their results demonstrated a considerable speedup over the traditional approach.

In [DPF03] the approach of [DPF01] was adapted to the general case, similar to the decision-theoretic approach of [HL97]. They presented three cases of decision making under constraints which allowed them to achieve interactive rates for global illumination rendering. Their global illumination algorithms used a version of the hierarchical radiosity approach [HSA91] to precompute the diffuse interreflections onto textures. The interactive rendering used the pre-computed textures for diffuse calculations and prefiltered environment maps for glossy and specular reflections. The constraint for the diffuse textures was similar to the one outlined above from [DPF01] and constrained the textures by memory usage on the graphics hardware. For non-diffuse textures a benefit function similar to that of the diffuse textures was defined, but the constraint was in time and memory, and the cost was a function of calculating and displaying the environment maps. The final constraint was on the creation of the meshes for the radiosity solution: the cost was the number of elements, and the benefit function took the form of a modified visual difference predictor similar to that in [DPF01]. The resource allocation was computed by ordering the benefit function of each element over that of its children, ensuring that parent nodes followed child nodes in the ordering; when a parent node was added, any corresponding child node was removed. The elements admitted by the constraint were chosen as the final set for rendering.

Cater et al. [CCW03] also used time constraints for their selective rendering system based on ray tracing, described in Section 3.7. They could constrain the rendering at any time after the image preview. Their constraints after the image preview, IBR and map generation phases were applied by ordering the rays according to the error conspicuity map. While their system, based on RADIANCE, supported indirect diffuse rendering through the irradiance cache, they did not demonstrate any results using indirect diffuse interreflections. Their system did not use profiling but simply stopped the computation when the constraint was met. They did not provide any results to demonstrate the accuracy of their technique.

3.10 Sparse Sampling Techniques using Temporal Coherence

Interactive rendering may be achieved by exploiting both spatial and temporal coherence. These properties lie at the heart of the majority of the techniques loosely termed sparse sampling algorithms. While not directly related to the work in this dissertation, they are selective rendering methods that form a natural progression should this work be taken further. In each case a form of caching is used to re-use previously collected samples.

The render cache [WDP99, WDG02] is an image-space based system. It collected samples through Monte Carlo ray tracing techniques and stored them in a data structure named the render cache. A viewing process ran asynchronously to the rendering and displayed the images by re-projecting pixels onto the view plane. The advantage of this asynchronous system was that viewing became interactive, since rendering samples were generated at the renderer's own pace. One major disadvantage was that the images could exhibit visual artifacts due to the interpolation over the available samples. A similar algorithm is the point cloud algorithm [RCLL99], developed for a parallel version of RADIANCE, which suffered from the same disadvantages as the render cache. Bala et al. [BWG03] generated an edge and point map from projection and sparse sampling for their render-cache-like renderer. The edge and point map was then used to ensure interpolation did not occur over edges, and was also used to improve antialiasing.

An alternative was the world-space approach known as the shading cache [TPWG02]. The shading cache took advantage of an asynchronous system similar to the render cache; however, it used a vertex-library based system (in their case OpenGL) to store the samples. This also enabled the display to take advantage of the high performance of commodity graphics processors, which enhanced the OpenGL-based rendering. The shading cache had all the advantages of the render cache while reducing artifacts. Moreover, it made use of methods that perform well with modern graphics hardware. However, the hierarchical model adopted might be a poor solution for situations with complex geometry and shading.

Dayal et al. [DWWL05] presented a caching scheme that furthered the concept of frameless rendering [BFMZ94], which involves rendering without back buffering by constantly updating the frame buffer. Rather than sampling only within the spatial domain, their renderer sampled within both time and space. They stored samples in a K-d tree data structure representing the image plane with an added dimension for time. Filtering was performed using a spatiotemporal Gaussian, and reconstruction was performed on the GPU by splatting the filtered samples. They attempted to find a compromise between high temporal detail with low resolution and high spatial detail with low frame rates. They demonstrated interactive rates for non-complex ray-traced scenes with relatively low spatial and temporal errors. A further description of caching schemes for interactive global illumination is given in [DBB02].

3.11 Summary

In this chapter we have presented work related to selective rendering. We began by identifying selective criteria, in particular techniques based on human perception and attention. We subsequently gave a brief overview of selective rendering for rasterisation and radiosity, and provided a comprehensive overview of selective ray tracing techniques. These techniques rely on a number of methods for choosing the criteria and rendering adaptively. In the next chapter we categorise these methods and show that they are performed using a number of stages which are quite similar across the board for selective renderers using ray tracing; we will show how this formalisation helps us improve the selective rendering process. We also showed how some of the component-based systems use adaptive and progressive techniques, yet none, with the possible exception of [SSH+ 98], use these techniques to obtain further flexibility within their rendering frameworks. We presented work on system-constrained rendering and showed how, while this has been used frequently for rasterisation, there has been little work on constrained rendering for physically-based rendering. Finally, we concluded by presenting temporally coherent techniques that use sparse sampling, which could be a future direction for our selective rendering techniques in order to achieve interactive rates.


Chapter 4

The Selective Rendering Pipeline

In this chapter we present our underlying selective rendering framework and introduce the concept of a selective rendering pipeline to describe a broad category of selective renderers. We demonstrate, using a number of selective renderers as case studies, the general benefits and problems associated with selective rendering. Subsequent chapters will provide further, and sometimes more complex, examples.

4.1 Introduction

In this thesis our main focus is accelerating the computation of high-fidelity graphics using selective rendering. The selective rendering frameworks we are interested in are designed for accelerating the high-fidelity rendering of still images, animations and interactive systems for complex scenes using ray-tracing based systems. Within this context we have identified a selective rendering graphics pipeline which, as will be demonstrated in the case studies, is an effective way of viewing the selective rendering process and identifying potential improvements. Moreover, we will map the selective rendering pipeline to the underlying software, and the software to the hardware, to demonstrate the flexibility of this method. This chapter begins with a classification of selective renderers, followed by a series of five selective renderers that we have developed as case studies for our selective rendering frameworks.



Figure 4.1: Selective rendering cyclic process using progressive ray tracing and a visual difference predictor.

4.2 Selective Rendering

Most perceptually-based selective rendering algorithms discussed in the previous chapter can be broadly placed into one of two categories: those that use progressive algorithms and selective criteria, generally visual difference predictors, to identify a stopping condition, frameworks we term selective rendering cyclic process frameworks, see Figure 4.1; and those that use a low quality image estimate, either through a rapid rasterisation render or a lower resolution ray-traced image, to identify areas of the image where rendering time is best spent, frameworks we term selective rendering pipeline frameworks, see Figure 4.2. Many of these selective rendering algorithms pass through a number of rendering stages. For ray-tracing based selective renderers these stages can be described as the pre-selective rendering stage, the selective guidance stage and the final selective rendering stage. The way selective rendering algorithms traverse these stages is closely linked with the two categories described above. Those that traverse these stages only once, for example the renderers described in Section 3.7, effectively make the whole process a selective rendering pipeline, see Figure 4.2. Other selective renderers may cycle through the last two stages in the selective rendering cyclic process, see Figure 4.1, such as the methods described in Section 3.6. Examples of selective rendering frameworks that fit into our broad categorisation are shown in Table 4.1. Note that the granularity of the selective cycle within the selective rendering cyclic process varies greatly, from the large cycle of [Mys98], which could compare visual differences in an entire image, to the much finer-grained cycle of [WABG06], which just compares two possible options in a hierarchy at each iteration. Also, it is worth noting that there is only a small difference between the two frameworks: selective rendering pipelines can be extended to cyclic processes by re-iterating the last two steps. While we have not included earlier forms of selective rendering within our framework, these could generally be accounted for also; for example, the recursive method using a pixel-based radiance threshold in [Whi80] could be considered part of the selective cyclic process, see Section 7.2. Figure 4.3 shows where the selective rendering frameworks fit within the broader view of the ray-tracing pipeline presented in Section 2.6.

Figure 4.2: Selective rendering pipeline using a rasterised pre-selective rendering and a saliency map as selective guidance.

Figure 4.3: Demonstrating where the selective rendering frameworks are placed in the context of the ray-tracing pipeline.

Pre-selective rendering is akin to a traditional rendering process and is the primary input to the selective guidance. This stage takes the form of an initial rendering pass and is used to generate an approximation of the final image, an image preview, that may be used to identify parts of the image which are to be considered more salient than others. For some renderers, mainly the ones that follow the selective rendering cyclic process, the output of the image preview is just an initial indication that is replaced in successive passes; for others, the selective rendering pipeline ones, it is a definite indication which will not be replaced. There are two main methods used to generate the pre-selective rendering as an image preview. The first method uses rasterisation to generate an approximate image preview, usually composed only of direct lighting and traditionally rendered using rasterisation on graphics hardware [Guo98, YPG01, LDC06]. The second method is a rapid low quality ray tracing pass [Mys98, CCW03] which may then be updated in subsequent stages. When the latter technique is used the quality of the low quality pass must be determined; we term this quality the base quality. The issues related to the techniques chosen and how to determine the base quality will be further discussed in the upcoming sections and Chapter 8. Certain techniques, such as image filtering and tone-mapping, that are traditionally applied at the end of the computation may need to be applied at this stage also.

The selective guidance stage consists of the method and criteria used to determine where computation time is best allocated. Selective guidance is responsible for generating directives to drive the selective rendering.

Selective Rendering Cyclic Process | Selective Rendering Pipeline
[Mys98], [BM98], [RPG99], [FP04], [WABG06] | [YPG01], [HMYS01], [CCW03], [SDL+ 05], [MDCT05], [LDC06], [ASGC06]

Table 4.1: Examples of selective rendering framework categorisation for recent selective renderers, keyed by their publications.

While these criteria may be as simple as the colour intensity difference threshold used to direct adaptive sub-sampling techniques for antialiasing [Whi80], we are primarily interested in the more complex perceptually-based criteria such as visual difference predictors [Dal93] and perceptual oracles [IKN98] based on bottom-up and top-down visual attention models, and in other metrics such as motion (of both the on-screen objects and camera movement [YPG01], and of the observer [EC06, ECD06]) and predicted time complexity [GDC05, GLDC06]. Feedback from previous results could also influence the selective guidance. The way the selective renderer uses the selective guidance is also relevant. Visual difference predictors are used to successively differentiate between subsequent rendered images until the difference between them is below a certain threshold and the rendering can be stopped [Mys98, RPG99]; they are used mostly when a cyclic selective rendering approach is adopted. Oracles usually take the form of saliency maps [IKN98], task maps [CCL02] or some combination of both [SDL+ 05] that are consulted in the rendering stage to decide where best to concentrate rendering resources. Oracles are mainly used as part of a selective rendering pipeline. Perceptual oracles based on image previews are generally independent of the selective renderer and can be interchanged.

Selective algorithms furthermore perform some form of adaptive computation by choosing a component, which we term the selective variable, that is modified dynamically based on the selective guidance for different areas of the final image. In this stage the rendering is generally performed non-uniformly throughout the image. The selective variable is the modifiable parameter that may be modulated, based on the selective guidance, to a certain quality we term the selective quality. Rendering at selective quality is directly related to the eventual speedup. The selective variable is typically the number of rays shot per pixel [Mys98, CCW03, LDC06]. However, other approaches have been attempted; of particular interest is the approach of Yee et al. [YPG01], which uses as a selective variable the interpolation search radius of the irradiance cache [WRC88] and thus reports close to an order of magnitude speedup. Table 4.2 shows an overview of how various selective algorithms compute the various stages of the selective rendering pipeline. Some of these selective renderers are part of the original work in this thesis and will be presented where indicated. Some of the selective variables and other methods outlined have not yet been introduced; these will become clearer when they are introduced in their corresponding sections.

Selective Renderer  | Pre-Selective Rendering | Selective Guidance | Selective Variables  | Selective Framework
[Mys98]             | ray tracing             | VDP                | rays                 | cyclic
[BM98]              | ray tracing             | VDM                | rays                 | cyclic
[RPG99]             | rasterisation           | VDP                | indirect rays        | cyclic
[YPG01]             | rasterisation           | Aleph map          | IC radius            | pipeline
[HMYS01], Case I†   | rasterisation           | saliency map       | rays                 | pipeline
[CCW03]             | ray tracing             | conspicuity map    | rays                 | pipeline
[SDL+ 05], sel-st‡  | ray tracing             | importance map     | rays, spec threshold | pipeline
[MDCT05], Case II†  | ray tracing             | OSDM               | rays                 | pipeline
[LDC06]             | rasterisation           | saliency map       | rays                 | pipeline
Case III†           | rasterisation           | saliency map       | rays, IC radius      | pipeline
[CDS+ 06], Case IV† | rasterisation           | importance map     | rays                 | pipeline
[ASGC06]            | rasterisation           | XS-map             | rays                 | pipeline
[WABG06]            | ray tracing             | Weber's Law        | various              | cyclic
crex§               | ray tracing             | importance map     | component rays       | pipeline
Progressive]        | ray tracing             | distinct           | various              | pipeline

Table 4.2: Examples of the stages of various selective renderers, keyed by their publications. † will be discussed in this chapter's case studies for selective renderers. ‡ will be presented in Section 5.7. § will be presented in Chapter 5. ] will be presented in Chapter 8. Note, some of the pre-selective rendering stages that compute only direct illumination via ray tracing have been labelled rasterisation instead, such as [ASGC06], since this would be sufficient and potentially faster.

4.3 Introduction to the Selective Renderers

In this thesis our primary interest lies in selective rendering using the selective rendering pipeline. However, our novel progressive selective algorithms, described in Chapter 8, may easily be adapted to the selective rendering cyclic process. Also, we are not particularly concerned with the method chosen for selective guidance, but with how to use it. The selective guidance methods we commonly use are based on saliency maps and task maps, or combinations of both. While most of the selective renderers presented in this thesis make use of such image-space maps for selective guidance, they could be replaced by simpler selective guidance methods, such as those mentioned in Section 3.1, or by object-space selective guidance methods such as [BWG03, KBPv06, WABG06].

4.3.1 The Case Studies

In the rest of this chapter we will describe a number of renderers that we have developed based on the selective rendering pipeline. This will serve to highlight the breadth and depth of the selective rendering pipeline and to identify potential improvements. Furthermore, we will show how the pipeline is mapped to different software applications (or techniques) and how this software best makes use of the hardware. At the end of the case studies we will evaluate these applications. It is worth noting that the abstraction of the selective rendering pipeline was conceived subsequent to, and mostly as a result of, the following case studies.

The case studies are introduced more or less in order of complexity, in terms of the software and hardware used. The first two case studies describe the simplest form of selective renderer using visual attention: the first uses bottom-up visual attention by means of a saliency map, and the second is designed for applications that take advantage of general visual attention by selectively rendering on-screen distractors. The third case study demonstrates different aspects of selective rendering and how certain processes can be mapped onto graphics hardware. The fourth case study describes a framework to handle both top-down and bottom-up applications simultaneously with an importance map, and extends the notion of efficient hardware use through parallel selective rendering. The final case study describes a time-constrained selective renderer.

4.4 Case I: Selective Rendering for Bottom-Up Visual Attention


Figure 4.4: Case I selective rendering pipeline.

This selective renderer exploits the human visual system's property of attending to certain features in an image instead of others, as discussed in Section 3.3. For this renderer, this is accomplished by means of a saliency map based on the bottom-up visual attention model of Longhurst et al. [LDC06], presented in Section 3.3. This first case study demonstrates the basics of the selective rendering pipeline. Figure 4.4 illustrates our pipeline at three different levels. At the top is the conceptual level, including visualisations of the rendering process; the images shown, particularly the pre-selective rendering images, are used just as an indication of what process is being performed and are not necessarily the ones used to generate the shown saliency maps. The subsequent level illustrates the software processes that are used and, on the third level, the hardware used.

4.4.1 Selective Rendering

The pre-selective rendering phase uses the selective renderer, running on the CPU, to produce a preview of the final image. In this case the same renderer that will generate the final image is used for the preview. An image rendered at base quality, typically one ray per pixel, is used to obtain the image preview. Although this stage in the pipeline might seem straightforward, it entails a number of potential complications which we will begin to outline in the results section of this case study.

Figure 4.5: Scenes used for the results of Case I. From left to right: the Cornell Box, the Tables scene, the Corridor scene and the Temple of Kalabsha scene [SCM04]. The rendered scenes (top) and the saliency maps (bottom).

The selective guidance phase in the pipeline takes the image preview as input and applies the saliency model to it. The software techniques used are image processing techniques, labelled SG processing, short for selective guidance processing; this software also runs on the CPU. The result of the SG processing in this case is a saliency map based on the work described in Section 3.3. Examples of generated saliency maps can be seen in Figure 4.5. The final stage of the rendering uses the saliency map obtained from the selective guidance stage to drive the selective renderer. The sole selective variable for this renderer is rays per pixel: the value in the saliency map modulates the number of rays per pixel shot, out of a user-defined maximum, to the desired selective quality.
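A sketch of this modulation is given below; the exact mapping, with its rounding and one-ray floor, is an assumption of ours rather than the precise rule used by the renderer.

```cpp
// Sketch of the sole selective variable in this renderer: the saliency value
// modulates rays per pixel up to a user-defined maximum. A floor of one ray
// keeps every pixel at least at base quality.
#include <algorithm>
#include <cmath>

int raysPerPixel(float saliency /* in [0,1] */, int maxRays /* e.g. 16 */)
{
    float s = std::clamp(saliency, 0.0f, 1.0f);
    return std::max(1, static_cast<int>(std::lround(s * maxRays)));
}
```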

4.4.2 Implementation and Results

                 | Cornell Box | Tables | Corridor | Kalabsha
gold             |         738 |  9,665 |   12,062 |    1,993
selective        |         637 |  6,572 |    9,122 |    1,634
speedup          |        1.15 |   1.47 |     1.32 |     1.22
gold pre-IC      |         182 |  1,473 |    2,574 |      334
selective pre-IC |          48 |    278 |      721 |       87
speedup          |        3.79 |   5.30 |     3.57 |     3.95

Table 4.3: Speedup for Case I of selective rendering over traditional rendering. Timings in seconds.

                     | Cornell Box | Tables | Corridor | Kalabsha
selective            |         637 |  6,572 |    9,122 |    1,634
pre-selective        |         494 |  5,225 |    6,034 |    1,368
percentage (%)       |          78 |     79 |       66 |       84
selective pre-IC     |          48 |    278 |      721 |       87
pre-selective pre-IC |          12 |     91 |      165 |       22
percentage (%)       |          25 |     33 |       23 |       25

Table 4.4: The percentage of computation spent on the pre-selective rendering phase of the selective rendering pipeline for Case I. Timings in seconds.

The simple selective renderer has been implemented within RADIANCE. The pre-selective rendering renders at the base quality number of rays per pixel using traditional rendering in RADIANCE. The resultant image preview is first tone mapped using the photographic tone mapper [RSSF02] and then input into the selective guidance. The selective guidance is calculated by generating a saliency map based on a software version of the image-space saliency calculations in [LDC06]. The final selective rendering stage uses the saliency map to modulate the rays per pixel. All sampling for this renderer is done using a random sampler.

In order to demonstrate some of the issues with selective rendering, we present results for selective rendering, compared with traditional rendering, for four scenes under two different conditions. The scenes used are shown in Figure 4.5 (top). Under the first condition, the irradiance cache is computed from scratch, representing the first frame of an animation or the rendering of a still image. For the second condition, a pre-computed and saturated irradiance cache, representative of successive coherent frames within an animation, is used. The rendering parameters were a maximum of 16 rays per pixel, which is set constant for the traditional renderer, high-quality settings for the irradiance cache and default settings for the rest. The only selective variable used was rays per pixel. The base quality was set to 1 ray per pixel. All results were computed on an Intel Pentium 4 2.4GHz with 3GB of memory running under Linux.

We present the results in Table 4.3. The traditional renderings are labelled gold for the first condition and gold pre-IC for the pre-computed irradiance cache. The selectively rendered images for the first and second condition are labelled selective and selective pre-IC respectively. The results demonstrate that while a good amount of speedup is obtained when an irradiance cache is pre-computed, this is not the case when the irradiance cache is not pre-computed; there, the speedup is minimal. The main reason for this can be seen in Table 4.4, which shows the percentage of time taken by the pre-selective rendering stage. When rendering without a pre-computed irradiance cache, the pre-selective rendering takes longer due to the way the irradiance cache is computed, whereby the first computations are more likely to be cache misses and therefore take more time to compute, see the irradiance cache analysis in Section 6.2. Using a selective variable based on the irradiance cache would reduce this problem, as we shall show in Section 4.6 and Chapter 8.


4.4.3 Case I Discussion

This selective renderer serves to highlight a number of observations that will be dealt with throughout this thesis. First amongst these is the interdependence between stages. This case study demonstrates that the base quality, at the pre-selective rendering stage, is affected by the choice of the selective variables, and the method of computing the selective variables in turn affects the sampling method chosen. Since we are only using rays per pixel, the selective variable chosen is not ideal, because the base quality incurs the cost of seeding the irradiance cache, as the results demonstrated. While the irradiance cache serves to highlight this issue, other selective variables which might not be needed later might still be computed at the base quality level also. This introduces the issue of how not to discard the data calculated during the pre-selective rendering stage. In this case, for rays per pixel that are shot randomly, this is not a problem. Yet random sampling is not the ideal sampling method [PH04], and for certain sampling techniques, such as stratified sampling, there clearly is a problem. The rays per pixel sampling method should preferably be based upon a progressive algorithm, able to progress from the pre-selective rendering stage to the eventual rendering stage without incurring extra costs. Also, the selective rendering phase of the pipeline, and the potential speedup obtained, depend on the selective variable used. In this case, while a certain amount of speedup is obtained, it is not comparable to that demonstrated by [YPG01]. An improvement in speedup can be obtained using a different selective variable or a combination of them.

4.5 Case II: Rendering with On-Screen Distractors


Figure 4.6: Case II selective rendering pipeline.

The selective renderer for Case II detects on-screen distractors as an integral part of the rendering process to produce selectively rendered animations, rendered in selective quality when on-screen distractors are present and in high quality otherwise. This selective renderer was originally designed for investigating the effects that sound-emitting objects (SEOs) have on the perception of high-fidelity graphics [MDCT05]. However, it can be used in the general case when rendering selectively due to on-screen distractors, such as the work in [CCW03]. This selective renderer can be used for both top-down and bottom-up visual attention processes, when an object on the screen draws the viewer's attention, whether as part of a task or involuntarily. This case is interesting because the selective guidance does not involve any intermediate computation phase: all operations can be carried out with simple modifications to the selective renderer and a simple list of distractor objects. The selective rendering pipeline for this algorithm can be seen in Figure 4.6. This pipeline has only two phases: the pre-selective rendering and selective guidance are combined into one. Also note how all computation is performed by the selective renderer. The first phase of the selective rendering pipeline involves rendering to a base quality level while locating on-screen distractors. The second phase renders the image selectively, applying a foveal angle gradient decrease in quality around the projected image of the on-screen distractors and maintaining quality consistency using a data structure we term the quality buffer, or q-buffer for short.

4.5.1 Selective Rendering

As with Case I, this selective renderer's only selective variable is the number of rays per pixel. Since this selective renderer is primarily created for animations, an animation file maintains a list


of all frames with the default rendering quality for each frame and whether any distractor should be considered more salient. Frames that contain distractors are tagged as selective. Furthermore, a base rendering quality is associated with each frame. We note that this feature also applies in the case of distractor objects that are out of the frame. The option of having objects act as distractors only for certain frames, rather than always, is primarily influenced by the need to selectively render with SEOs, which function as distractors only when they are emitting sounds. For frames that are not meant to be rendered selectively, traditional rendering is performed at the default rendering quality.

Identifying the On-Screen Distractors

When rendering the frames tagged as selective in an animation, the selective rendering process for this renderer can be viewed as a two-pass process for each frame. In the first pass the image is rendered using the traditional rendering method up to the base quality. In this pass the distractor objects are detected through object identification of the primary rays, or of a certain class of secondary rays, as they intersect the distractors' geometry. We term the rays that are allowed to detect the distractors detectable rays. Only certain types of secondary rays are detectable rays, such as pure specular rays, which represent pure reflections of the object. Other secondary rays, for example indirect diffuse rays, would reflect very little of the true nature of the distractor. Examples of this are given in the results section. A data structure we term the on-screen distractor list (OSDL) is a queue responsible for storing on-screen distractor object data and pixel locations. When the intersection routine for a detectable ray returns the object hit, a distractor object list is consulted and, if the object appears on the list, the object and pixel location are added to the OSDL, as sketched below. The first phase ends when the image is rendered entirely to base quality, at which point the computation of the OSDL is complete for the first phase. The OSDL can be visualised as an on-screen distractor map (OSDM). A visualisation of the OSDMs for a number of scenes is shown in Figure 4.7.
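The detection step can be sketched as follows. The types and names are illustrative assumptions rather than the actual implementation, but the logic mirrors the description above: a detectable ray that hits an object on the distractor list queues the object and pixel location on the OSDL.

```cpp
// Sketch of on-screen distractor detection (illustrative interfaces).
#include <deque>
#include <set>
#include <string>
#include <utility>

struct OSDEntry { std::string object; int px, py; };

class OSDL {
    std::set<std::string> distractors;  // user-supplied distractor objects
    std::deque<OSDEntry> entries;       // queue of detected on-screen hits
public:
    explicit OSDL(std::set<std::string> d) : distractors(std::move(d)) {}

    // Called from the intersection routine for detectable rays only
    // (primary rays and, e.g., pure specular secondary rays).
    void onDetectableHit(const std::string& object, int px, int py) {
        if (distractors.count(object))
            entries.push_back({object, px, py});
    }

    const std::deque<OSDEntry>& list() const { return entries; }
};
```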

Selective Rendering and the Quality Buffer

The second phase introduces another data structure, the quality buffer (q-buffer), which ensures that the correct number of rays is shot for each pixel. The q-buffer is inspired by the z-buffer [Cat78], the algorithm used for solving depth issues in hidden surface removal for rasterised rendering. The q-buffer is a buffer equal in size to the resolution of the image to be rendered. For each pixel it stores the quality, in rays per pixel, rendered up to that point. At the beginning of the second phase all entries in the q-buffer are initialised to the value of the base quality. In the second phase, the OSDL is parsed and for each entry a number of rays intended to be shot is calculated. The ray's hit point is also tested to discover whether or not the pixel is a boundary


pixel of the projected image of the distractor object. If there are no distractor objects at this point, no further action is taken. If the pixel corresponds to an internal point on the distractor's projected image, the difference between the distractor's quality value and the q-buffer entry at that pixel is calculated. If this difference is positive, a number of primary rays equal to the difference is shot and the q-buffer entry is set equal to the distractor's quality value. If the difference is negative or zero, no action is taken. For border pixels, the renderer degrades the quality around the border of the distractor objects within a user-defined radius, usually the foveal angle. This option provides a method of rendering around the foveal angle that is not limited to the size of the object. Each pixel within this radius is cycled through and, for each pixel, the desired quality in rays per pixel is calculated. When the quality is calculated, the q-buffer is consulted. If the new quality is greater than the corresponding q-buffer entry, the difference in rays is shot and the q-buffer entry updated in the same manner as described above. The q-buffer is necessary to protect the quality of pixels lying within the degradation radius of more than one object, which might otherwise result in more rays than necessary contributing to a given pixel (see the sketch below). As the rendering progresses, the image is refined, so the boundaries of the projected distractor objects on the image plane might change subtly. The algorithm ensures that the new boundaries are updated accordingly by storing the new information in the OSDL. Visualisations of the q-buffers for various OSDMs and scenes can be seen in Figure 4.7 (right).
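A minimal sketch of the q-buffer update rule follows, with interfaces assumed for illustration. The key point is that only the positive difference between the desired quality and the quality already rendered is shot, so overlapping degradation radii never cause redundant rays.

```cpp
// Sketch of the q-buffer (illustrative, not the actual implementation).
#include <vector>

class QBuffer {
    int width;
    std::vector<int> quality;  // rays per pixel rendered so far
public:
    QBuffer(int w, int h, int baseQuality)
        : width(w), quality(w * h, baseQuality) {}

    // Returns how many extra rays to shoot for this pixel (possibly zero)
    // and records the new quality level.
    int raise(int x, int y, int desiredQuality) {
        int& q = quality[y * width + x];
        int diff = desiredQuality - q;
        if (diff <= 0) return 0;  // already rendered at this quality or better
        q = desiredQuality;
        return diff;              // shoot this many additional primary rays
    }
};
```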

4.5.2 Implementation and Results

Like the previous renderer, this selective renderer was implemented in RADIANCE. A number of user-adjustable parameters were added. Firstly, a base quality and a high quality are set for each frame. Secondly, for animations, each frame needs to be tagged as selective or high quality and, if selective, a list of distractor objects for that frame is identified. If a frame is tagged as high quality the entire image is rendered in high quality; this option is used when no distractors are present. For frames that are tagged as selective, the base quality is rendered in the first phase and the selective quality in the second phase, following the algorithm described above. Note that some frames might have no distractor objects and still be marked as selective; this is useful when there is a distractor outside the view image, such as an abrupt sound. We present two sets of results for this selective renderer. The first set is taken from a number of scenes where certain objects were chosen as being more important, simply to demonstrate the approach. The second set of results, from [MDCT05], demonstrates a more practical use of this renderer.


Figure 4.7: Scenes used for results of Case II. From top to bottom: the Room scene, the Corridor scene, the Cornell Box scene detecting reflected OSDs and the Cornell Box scene detecting OSDs with modulation for the reflected on-screen distractors. From left to right: the rendered scenes, the OSDMs and the q-buffers.


Figure 4.8: The scene used for the sound-emitting objects experiment [MDCT05]. From left to right: the high quality rendering, the selectively rendered image and the OSDM.

Results

We present results for the scenes displayed in Figure 4.7. The distractors for these scenes are: the vases for the Room scene, the exit signs for the Corridor scene and the glossy sphere for the Cornell Box. For the first and second scenes the rendering parameters do not detect the distractor in reflections; for the third scene the distractors are fully detected in all reflections; and for the final scene the rendering parameters were set to modulate the importance depending on the material of the reflecting object. The differences can be clearly seen in the q-buffered OSDMs of the two Cornell Box scenes. Rendering parameters were set to default settings with a maximum of 16 rays per pixel and a base quality of 1 ray per pixel. The irradiance cache was pre-computed; if it were not, results similar to those of Case I without a pre-computed irradiance cache would be expected. Results, in seconds, to render the images are shown in Table 4.5. The results demonstrate notable speedup for some of the renderings and lesser speedup for others. This was expected, since the object coverage is both object and view dependent. The Cornell Box scenes and results demonstrate the advantages of, first, being able to capture the distractors in reflections and, second, being able to modulate them. The reflections of the glossy ball in the translucent object would be imperceptible to most viewers, yet very easy to spot in the mirror, so rendering is weighted accordingly. As can be seen in the results, the speedup for the modulated Cornell Box experiment is much better than for the unmodulated version.

               gold   selective  speedup
Room             366         87     4.21
Corridor       2,454        335     7.32
Cornell Box 1    179         81     2.21
Cornell Box 2    179         50     3.58

Table 4.5: Results for scenes used in Case II. Timing in seconds.


Sound-Emitting Objects

As a method of validation, and to present some further results, we introduce the basic premise of the investigation originally published by Mastoropoulou et al. [MDCT05], for which this selective renderer was primarily created. The investigation focused on discovering whether on-screen sound-emitting objects could act as distractors, in a manner similar to inattentional blindness, such that the viewer's gaze is drawn towards the projected area of the distractor on the screen. This would enable sections of animations that contain SEOs to be rendered faster, by rendering the SEOs at higher quality while the rest of the image is rendered at a diminished quality. In order to determine this, Mastoropoulou et al. conducted a psychophysical experiment whereby a rendered animation was presented to 120 participants. Two animations were rendered at a resolution of 740 × 540. The first was rendered completely in high quality; the other in selective quality for the 3 seconds when the SEO was active. A frame of this rendered animation can be seen in Figure 4.8 (left). The high quality animation was rendered with 16 rays per pixel for each pixel, while the second animation was selectively rendered for the sections where an SEO was active for 3 seconds, with the distractor rendered at 16 rays per pixel and the surrounding foveal angle degrading down to a base quality of 1 ray per pixel. Figure 4.8 (middle) shows the selectively rendered image and (right) visualises the q-buffer as a grey-scale map. The high quality frames took on average 18.22 minutes to render, while the selectively rendered frames took on average 4.75 minutes for the 3 seconds when the SEO was active, on an Intel Pentium 4, 2.4GHz. This signifies a speedup of 3.83 for the selectively rendered frames over the high quality frames. Two versions of the high quality animation and the selective quality animation were created: one with the sound effect and one silent. Each participant was presented with one of the four animations. Subsequently, the participants were shown two still images, from the high quality and selective quality animations when the SEO was active, and asked which image they thought they saw during the animation. Results confirmed that participants failed to notice the selective rendering quality when sound effects were present.

4.5.3 Case II Discussion

This selective renderer introduced a number of interesting concepts. The first is the idea of conflict resolution, through the q-buffer, for areas rendered at higher quality. The second is empowering the selective renderer to render different qualities throughout the animation, which is ideal when there are no distractors on-screen but some form of distraction is still present. From the point of view of a selective rendering pipeline, this selective renderer is interesting primarily for its simplicity. The pre-selective rendering phase and the selective guidance are effectively combined into one phase. The software used is the selective renderer itself and all techniques are contained within the renderer. The choice of selective variable does not influence the way this selective renderer performs.


In this case the need to use a progressive algorithm to render the base quality is fundamental. Improvements to the selective rendering pipeline, as outlined in Chapter 8, could easily be incorporated into this selective rendering pipeline. The simplicity does, however, limit its use to applications that can exploit on-screen distractors, excluding techniques demonstrated in Case I. More complex interactions amongst visual attention processes are described in Case IV.

4.6 Case III: GPU-Assisted Selective Rendering


Figure 4.9: Case III selective rendering pipeline.

This case study serves primarily to show how graphics hardware can be made to assist selective rendering, and furthermore to demonstrate the flexibility of the pipeline in coping with different hardware. The pre-selective rendering and selective guidance stages of the selective rendering pipeline are accelerated by making use of fast commodity graphics hardware, which substantially improves performance over that which would normally be obtained using only a CPU. Yet, as we shall show, this approach affects certain parts of our pipeline and certain assumptions made in the first case studies. In this section we describe two selective rendering systems. The first uses edge detection and adaptive sub-sampling; this helps demonstrate that different selective guidance techniques can be used for rendering. The second is a rendering system similar to that presented in Case I. In both cases the pre-selective rendering phase and the selective guidance phase are performed on the GPU. Figure 4.9 shows the pipeline of these two rendering systems. Note how this time the hardware level is represented by both a CPU and a GPU. The initial pre-selective rendering phase of both rendering systems is identical, so we describe it first. The pre-selective rendering stage uses rasterisation, thus taking advantage of fast graphics hardware, similar to the selective rendering technique employed by Yee et al. [YPG01], to produce a rapid image estimate of the image to be rendered. The advantage of such systems is near-interactive image preview times. However, due to the rasterised nature of the rapid image estimate, it may compromise visual fidelity. A number of rasterised images compared to their selectively rendered counterparts can be seen in Figure 4.10. In this case the rapid image estimate computations are handled by the graphics hardware. In order to take proper advantage of the graphics hardware, a separate rasterisation-based rendering system must be used that can


be run on the GPU. We will present such a system developed purposefully for selective rendering in Section 4.6.3.

4.6.1 Edge Detection on GPU

The first example we present of GPU-assisted rendering supports selective rendering using edge detection and adaptive sub-sampling. The pre-selective phase is generated by the rasterised rapid image estimate. This feeds into the selective guidance phase, which for this rendering system only performs edge detection. The edge detection may also be performed on the graphics card, resulting in orders of magnitude improvement in performance [LDGC05]. Finally, the resulting edge map is used as input to our selective renderer. During the selective rendering stage, the selective renderer performs adaptive sub-sampling by shooting rays at the corners of a square of user-defined size in pixels (usually eight) and, depending on the edge map, subdividing the square recursively until at least 1 ray per pixel is shot, as sketched below. This form of selective rendering could be considered simpler than that presented in the previous section, yet it has interesting aspects. It demonstrates that our selective rendering pipeline is applicable to this type of renderer, similar to the work of Bala et al. [BWG03] and Guo [Guo98].
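The recursion can be sketched as follows, with the edge-map test and the ray shooting abstracted as callbacks (assumed interfaces). A real implementation would also cache corner samples so that corners shared between squares are not re-shot.

```cpp
// Sketch of edge-driven adaptive sub-sampling (illustrative interfaces).
#include <functional>

using HasEdge  = std::function<bool(int x0, int y0, int x1, int y1)>;
using ShootRay = std::function<void(int x, int y)>;

// size is assumed to be a power of two (e.g. the user-defined eight pixels).
void sampleSquare(int x0, int y0, int size,
                  const HasEdge& hasEdge, const ShootRay& shoot) {
    shoot(x0, y0);        shoot(x0 + size, y0);
    shoot(x0, y0 + size); shoot(x0 + size, y0 + size);
    // Recurse while the edge map reports a discontinuity inside the square,
    // until at least 1 ray per pixel would be shot (size == 1).
    if (size > 1 && hasEdge(x0, y0, x0 + size, y0 + size)) {
        int h = size / 2;
        sampleSquare(x0,     y0,     h, hasEdge, shoot);
        sampleSquare(x0 + h, y0,     h, hasEdge, shoot);
        sampleSquare(x0,     y0 + h, h, hasEdge, shoot);
        sampleSquare(x0 + h, y0 + h, h, hasEdge, shoot);
    }
}
```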

4.6.2 Saliency Map on GPU

The second selective rendering pipeline uses a similar approach to Case I. However, rasterisation software generates the image preview which is then input into the selective guidance. The selective guidance may also be computed on the graphics hardware, again resulting in an order of magnitude speedup [LDC06]. Since the final selective rendering stage is independent of the pre-selective rendering image preview, it is possible to use any sampling strategy, as opposed to Cases I and II, and also to use more than one selective variable without any loss of rendering time. This form of saliency map was discussed in Section 3.3.

4.6.3 Implementation and Results

In this section we present results and implementation details for both GPU-assisted selective rendering systems. Since both systems used the same rasterisation software for the pre-selective rendering, we describe this first, followed by the implementation and results of the two renderers. The pre-selective rendering stage for both pipelines uses the Snapshot software to produce a rapid image estimate of the scene. Snapshot is a software application developed by Longhurst et al. [LDC05] to provide a quick estimate of an image about to be rendered.


Figure 4.10: Scenes used for results of Case III. From left to right: the Cornell boxes, the Simple Boxes scene, the Tables scene and the Corridor scene. Top row shows rasterised images and bottom row fully rendered images.

Snapshot attempts to improve the fidelity of the rendering by copying the shading of the selective renderer, in this case RADIANCE shaders [War92], dealing with shadows through shadow buffers [Wil78] and handling specular and transparent materials through cubic maps [VF94]. The selective guidance may be computed on hardware, but for these results we used software versions developed in conjunction with the hardware approaches. Timings for the edge maps and saliency maps in hardware are in the order of tens of milliseconds; full details are given in [LDGC05, Lon05, LDC06]. Longhurst [Lon05] also provides perceptual validation comparing images created with earlier versions of these selective renderers against traditionally rendered images. The scenes used for this set of experiments, including rapid image estimates calculated using Snapshot, are shown in Figure 4.10. All results were taken on an Intel Pentium 4, 2.4GHz with 3GB RAM under Linux.

Edge Detection

The selective renderer for edge detection is an extension of the work in [LDGC05]. The rapid image preview is generated by the Snapshot software using the GPU. The edge map for these results uses a software version of the Sobel algorithm used in Snapshot's edge detection [LDC06]. The selective renderer is based on RADIANCE. The selective variable used was rays per pixel with adaptive sub-sampling. Rays are shot at the corners of a square of pixels; if the edge map has any discontinuities within the equivalent square, the square is recursed further until either no more discontinuities are present or every ray is shot. Default RADIANCE settings were used for all computations, with a pre-computed irradiance cache.


Figure 4.11: Edge maps for the scenes used for results of Case III. From left to right: the Cornell boxes, the Simple Boxes scene, the Tables scene and the Corridor scene.

Figure 4.12: Saliency maps for the scenes used for results of Case III. From left to right: the Cornell boxes, the Simple Boxes scene, the Tables scene and the Corridor scene.

Table 4.6 shows the results. As can be seen, a reasonably good speedup is obtained through the adaptive sub-sampling. This is obviously very scene dependent: simple scenes like the first two obtain better speedup than more complex ones.

              gold    selective  speedup
Cornell Boxes    175         15    11.67
Simple Boxes   1,087        132     8.23
Tables         1,536        282     5.45
Corridor       2,740        529     5.18

Table 4.6: Speedup for Case III of selective rendering using edge detection as selective guidance. Timing in seconds.

Saliency

This selective renderer is an extension of the selective renderer developed for [LDC06]. It uses Snapshot for generating the rapid image estimate. The selective guidance takes the form of the image space saliency map described in Section 4.6.2. The selective renderer is based on RADIANCE and uses two selective variables: rays per pixel and the irradiance cache search radius, similar to the work in [YPG01]. While rendering with multiple selective variables is introduced here just to highlight the advantages of this selective renderer, further details will be described in Chapter 8. The selective renderer differs


from the one described in Section 4.4 because the sampling used is jittered stratified sampling, which is in the general case considered better than random sampling [PH04]. This is made possible because the selective renderer has knowledge of the selective guidance before it commences; any general sampling strategy could have been used. The saliency maps of the scenes used for this set of experiments are shown in Figure 4.12. Rendering parameters, like those used in Section 4.4.2, are 16 rays per pixel and high-quality irradiance cache parameters. In this set of experiments we always rendered without a pre-computed irradiance cache. Results are displayed in Table 4.7. Traditional rendering is labelled gold, selective rendering using only rays per pixel as the selective variable is labelled sel-rp, and selective rendering using both rays per pixel and the irradiance cache search radius is labelled sel-aa. As can be seen, there is poor speedup for sel-rp, as with the simple selective renderer. However, the combination of both selective variables achieves a considerable further speedup.

              gold    sel-rp  speedup  sel-aa  speedup  speedup (aa vs. rp)
Cornell Boxes    681     490     1.40     323     2.1                  1.5
Simple Boxes   3,687   2,731     1.35   1,035     3.56                 2.64
Tables         9,700   6,629     1.46   2,548     3.81                 2.61
Corridor      11,384   7,512     1.52   4,123     2.76                 1.82

Table 4.7: Speedup for Case III of selective rendering using saliency map as selective guidance. Timing in seconds.

4.6.4 Case III Discussion

When using the GPU-assisted selective rendering pipeline of this case study, the use of the rapid image estimate for the pre-selective rendering phase is useful since it overcomes problems presented in Case I. The choice of selective variable no longer influences the whole of the selective rendering process as much. In the discussion of the Case I selective renderer, we described how the selective rendering phase was closely linked to the pre-selective rendering phase. This is no longer the case here, since the pre-selective rendering has been decoupled from the selective rendering phase. This allows us to use a stratified sampling scheme for the rays per pixel selective variable and, perhaps more importantly, multiple selective variables, in particular the possibility of modulating the costly indirect diffuse computations. The use of graphics hardware for the selective guidance is particularly useful since graphics hardware is very efficient at these types of per-pixel image processing operations, a trend which seems to continue to favour the GPU over the CPU [OLG+05]. The speedup in the generation of the selective guidance encourages the use of these techniques for selective guidance processing whenever possible.


While for simple scenes this rendering system functions extremely well and provides the flexibility to control selective variables, the rapid image estimate does cause a number of problems. Firstly, it requires two different rendering systems to be developed and maintained concurrently. Secondly, the complexity of the scene, in terms of both lights and geometry, adversely affects the performance of the rapid image estimate, although this can be alleviated with level of detail and occlusion culling techniques. Thirdly, and perhaps most importantly, the rapid image estimate does not capture certain effects produced by global illumination, such as indirect diffuse illumination, shading and shadows from area light sources and participating media, which could have an impact on the final solution; omitting them means that the selective guidance has no knowledge of them, which might subsequently result in a poor selectively rendered image. We discuss these issues in further detail in Chapter 8. A further use of GPU-assisted selective rendering that we have not discussed in this section, but is relevant to our work, is the creation of task maps to exploit top-down visual attention, an effective replacement for the work described in Case II. This technique has been used in [Lon05, SDL+05] and in the selective renderer described in the next case study. The issues that apply for bottom-up visual attention and GPU-assisted selective rendering apply in this case too.

4.7 Case IV: Selective Rendering in Parallel


Figure 4.13: Case IV selective rendering pipeline.

In this section we bring together several of the selective rendering approaches highlighted previously. We present a selective parallel rendering framework and demonstrate how it is possible to reduce rendering times by exploiting these approaches towards near real-time high-fidelity rendering of complex scenes. We demonstrate how selective rendering can make use of various hardware, in particular distributed systems, to achieve significant performance improvements. This work has been published in [DSPC05, CDS+06]; a selective renderer similar to this one, without the parallelism, has been published in [SDL+05]. The selective rendering pipeline for Case IV is shown in Figure 4.13. As can be seen, the pipeline has become more complex, both horizontally and vertically. While the pre-selective rendering stage is similar to the previous methods, it makes use of additional hardware, as does the final selective rendering stage. The more complex selective guidance makes use of the importance map [SDL+05].


Figure 4.14: The Kalabsha scene: (left) the full rendered image, (middle) the saliency map and (right) a visualisation of the parallel sub-division of the workload; in reality the sub-division is finer.

Figure 4.15: The Corridor scene (Frame 75): (left) image rendered with selective quality, (middle) the task objects and (right) the task map including foveal angle gradient. For this scene, the task objects are the fire extinguishers, fire alarms and emergency exit signs.

4.7.1 Selective Rendering

The pre-selective rendering is generated using a quick hardware rasterisation pass to generate inputs for the selective guidance. In this case two maps are generated: an image preview from which to generate the saliency map, as in Case III, and a task map, which uses the graphics hardware to quickly identify and project the task objects onto the image plane. Within our system the importance map is a combination of a task map, to account for the effect of top-down visual attention, and a saliency map, for bottom-up visual attention. Other potential maps for selective guidance, not present within our implementation, such as maps generated for time-complexity [GDC05, GLDC06] and motion [YPG01], could be included in the importance map. For the saliency map, an image preview is generated using rasterisation and is then used as input to a saliency generator. We use the full saliency method of Itti et al. [IKN98] to compute our saliency map, as opposed to the Longhurst et al. [LDC06] saliency used for the previous selective renderers, since the latter was not complete at the time. Figure 4.14 shows the saliency map (middle) of the rendered Kalabsha scene (left).


Secondly, a rasterised quick estimate is used to generate a task map, by identifying user-selected task-related objects and applying a foveal-angle gradient around the objects. Figure 4.15 shows how a task map (middle) with added foveal-angle gradient (right) is generated for one of the frames (left) of the animations described in Section 4.7.4. The saliency map could also have been generated through the GPU-assisted technique presented in Case III. The task map and saliency map are combined, using a user-defined weighting, into the importance map, as sketched below. The importance map is then input into the selective renderer for use in the next phase of the rendering.
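A minimal sketch of the combination step, assuming (for illustration) a simple per-pixel convex blend as the user-defined weighting:

```cpp
// Sketch: blending task map and saliency map into an importance map.
#include <cstddef>
#include <vector>

std::vector<float> importanceMap(const std::vector<float>& taskMap,
                                 const std::vector<float>& saliencyMap,
                                 float taskWeight /* user-defined, in [0,1] */) {
    std::vector<float> im(taskMap.size());
    for (std::size_t i = 0; i < im.size(); ++i)
        im[i] = taskWeight * taskMap[i] + (1.0f - taskWeight) * saliencyMap[i];
    return im;
}
```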

4.7.2 Selective Rendering in Parallel

The phase corresponding to the selective rendering process uses parallel processing to speed up the rendering further. The importance map is used by the master to subdivide the workload, and by the slaves to decide how many rays per pixel to cast. The master is responsible for subdividing the image plane into a number of tiles of a given granularity; each image tile represents a job for a slave to compute. We use the importance map as a simple cost prediction map. Since, at the slave, the importance map dictates the number of rays shot per pixel, the master uses it to improve subdivision by ensuring that each tile contains an equal number of primary rays to be shot, as sketched below. This improves load balancing by ensuring a more even distribution of the workload; the improvement is 2% to 4% in terms of computation time when compared to a fixed-tile demand-driven approach. Although the computational requirements of each individual ray may differ, the demand-driven approach together with our subdivision map alleviates the problem significantly. Figure 4.14 shows how the importance map (middle), which in this case is just a saliency map, affects the subdivision of the workload (right). Each tile can be visualised as the area between two white lines. In this frame, the bottom part of the image is not very salient, so the image tiles for this part of the image are larger. Conversely, the middle part of the image is more salient, requiring more time to compute, so the tile sizes are smaller. The master farms out the work to the slaves in the form of the coordinates of the tile to be rendered. The slaves then render the image tile assigned to them, using the importance map to direct the rendering of each pixel. The sole selective variable in this case is rays per pixel. When a slave finishes executing its job, it asks the master for more work until the entire process is completed.
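The master's cost-predicted subdivision can be sketched as follows, assuming (for illustration) one tile per horizontal band, as in the visualisation of Figure 4.14, with the predicted number of primary rays summed per scanline:

```cpp
// Sketch: choosing band boundaries so each tile holds roughly the same
// number of primary rays (illustrative, not the actual implementation).
#include <vector>

// importance[y] holds the summed rays per pixel predicted for scanline y.
std::vector<int> subdivide(const std::vector<long>& importance, int tiles) {
    long total = 0;
    for (long rowCost : importance) total += rowCost;

    std::vector<int> boundaries{0};        // first scanline of each tile
    long target = total / tiles, acc = 0;  // tiles is assumed > 0
    for (int y = 0; y < (int)importance.size(); ++y) {
        acc += importance[y];
        if (acc >= target && (int)boundaries.size() < tiles) {
            boundaries.push_back(y + 1);   // start a new tile here
            acc = 0;
        }
    }
    return boundaries;
}
```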

4.7.3 Parallel Irradiance Cache

Ray tracing is traditionally easily extended into a parallel framework; however, our approach follows the RADIANCE implementation. Although RADIANCE uses distributed ray tracing to render images, the irradiance cache [WRC88] is used to accelerate the calculation of the indirect diffuse


Figure 4.16: Broadcast parallel irradiance cache.


Figure 4.17: The Kalabsha scene (top) and Corridor scene (bottom) 90 frame animations. Frames: (left) the first frames, (middle) the 45th frames and (right) the final frames.

component. As the irradiance cache is a shared data structure, it is non-trivial to parallelise. For this selective renderer, we have parallelised the irradiance cache using an approach similar to that of [KMG99]; our own novel approach to a parallel irradiance cache will be discussed in Chapter 6. An overview of the system architecture for this method can be seen in Figure 4.16. For the distributed approach, each slave maintains an outgoing buffer storing computed irradiance cache values. Whenever the buffer size reaches a given threshold, it is broadcast to all other slaves. Each slave is allowed to broadcast computed irradiance cache values to every other slave. In order to maximise computation, each slave has a separate communicator process which listens for incoming irradiance cache samples. Whenever a set of samples is received, the communicator process stores the data in a shared memory area, from where the computation process can collect it and insert it into the local irradiance cache structure.
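A sketch of the slave-side outgoing buffer follows. The communication layer is abstracted into a hypothetical broadcast() callback; the real system uses MPI and a separate communicator process per slave.

```cpp
// Sketch of the outgoing irradiance cache buffer (illustrative interfaces).
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct IrradianceSample { float pos[3], normal[3], irradiance[3]; };

class OutgoingBuffer {
    std::vector<IrradianceSample> buffer;
    std::size_t threshold;
    std::function<void(const std::vector<IrradianceSample>&)> broadcast;
public:
    OutgoingBuffer(std::size_t t,
                   std::function<void(const std::vector<IrradianceSample>&)> b)
        : threshold(t), broadcast(std::move(b)) {}

    // Called whenever this slave computes a new irradiance cache value.
    void add(const IrradianceSample& s) {
        buffer.push_back(s);
        if (buffer.size() >= threshold) {  // buffer full: send to all slaves
            broadcast(buffer);
            buffer.clear();
        }
    }
};
```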


4.7.4 Implementation and Results

As with the other case studies, this selective renderer was implemented in RADIANCE. Distributed computation over a network of workstations was performed using the MPI message passing protocol. It had been our intention to use rasterisation for the pre-selective rendering phase, but the Snapshot software was not completed by the time the experiments were performed, so we simulated the image previews using a low-quality ray tracing pass. The task maps were identified using Snapshot. We used two complex scenes for evaluating our system: the Kalabsha scene [SCM04] and the Corridor scene. In both cases we use animations for our results, as opposed to the still images used in the results for the other selective renderers. The system used for our results is a cluster of eight dual Intel Xeon processors running at 2.4 GHz with 3 GB of memory under Linux, and a single workstation with a single 2.53 GHz Intel processor and 1 GB of memory acting as the master controller for the parallel implementations. All the nodes were connected by a Fast Ethernet N-way switch (100 Mbit). We used default RADIANCE settings to render the scenes and both scenes were rendered without a pre-computed irradiance cache.

Exploiting Bottom-Up Visual Attention

We demonstrate bottom-up visual attention within our framework by showing timings for rendering a 90 frame animation of the Kalabsha scene, see Figure 4.17 (top), with four different settings. All frames were rendered at a resolution of 500 × 500 with a maximum of 5 rays per pixel. The first rendered sequence used the plain uniprocessor version, representing the traditional rendering method within RADIANCE; in this sequence all the pixels in each frame were rendered with 5 rays per pixel. The second sequence exploited visual attention by rendering only the salient parts of the scene at a higher quality, based on a saliency map of that scene. The third rendered sequence used the parallel version on 16 processors to render the sequence without visual attention. The final sequence used both parallelism on 16 processors and the saliency maps for visual attention. We used the distributed version of the parallel irradiance cache for both of the sequences that were rendered in parallel. The results, presented in Figure 4.18, clearly demonstrate the effectiveness of our approach. For all the rendered sequences, the first few frames of the animations were initially quite expensive; this was due to the irradiance cache being empty. In subsequent frames, as the irradiance cache became more populated, the timings became more homogeneous. The uniprocessor saliency version (uni SM) gained a 3 times speedup over the standard uniprocessor version (uni). The parallel version (16) was around 13 times faster than the traditional version. The combined saliency and parallelism approach (16 SM) was around 37 times faster than the traditional rendering.



Figure 4.18: The Kalabsha scene exploiting bottom-up visual attention: (left) comparison of all results and (right) zoomed in on the parallel results only.


Figure 4.19: The Corridor scene exploiting top-down visual attention: (left) comparison of all results and (right) zoomed in on the parallel results only.

Exploiting Top-Down Visual Attention

We use the Corridor scene to demonstrate how we exploit the top-down approach of visual attention. The Corridor scene was designed to investigate the potential of selectively rendering an animation where the user is asked to perform a fire safety task within the virtual scene [SDL+05]. The viewer is asked to verify the position and number of the fire safety objects placed within the corridor. We used this knowledge to render the pre-selected task objects and the foveal angle around these objects at a higher quality than the rest of the image, see Figure 4.15. As with the Kalabsha scene, we rendered a 90 frame animation for four sequences, see Figure 4.17 (bottom). All sequences were rendered at a resolution of 500 × 500 and a maximum of five rays per pixel. We calculated timings for rendering the animation using the traditional method on a uniprocessor, using task maps on a uniprocessor, using the traditional method in parallel, and using task maps on 16 processors. For these animations the importance map was the task map alone, as no saliency map was used. Results are presented in Figure 4.19. The uniprocessor version using task maps (uni TM) was about 2.5 times faster than the traditional version (uni). The parallel version (16) running on 16 processors gained a 13 times speedup. The parallel version on 16 processors using task maps (16 TM) obtained a speedup of about 31 times over the traditional version.

                 Kalabsha                   Corridor
                 First Frame  45th Frame    First Frame  45th Frame
Traditional uni          299         104          1,663         599
Selective uni            223          30          1,290         224
Speedup uni             1.34        3.46           1.28        2.49
Traditional 16            29         7.9            139          44
Selective 16              23         2.4            134          18
Speedup 16              1.26        3.29           1.04        2.44

Table 4.8: Timing for the first frame, which does not benefit from the previous frames' computation of the irradiance cache, and the 45th frame, which does. Timing in seconds.

4.7.5 Still Images

It is useful to compare the speedup of the first image in the animation for the selective rendering since, as was the case for the selective renderer in Case I, it corresponds to the method using expensive irradiance cache computations. Table 4.8 shows the timings for the first frame compared with the timings for the 45th frame, representative of an arbitrary frame in the animation. As was the case with the images rendered in Case I, little speedup is achieved from the selective computation for a frame with an empty irradiance cache compared with frames that have pre-calculated irradiance caches, reiterating the importance of using other selective variables besides rays per pixel. Section 6.6 will address this issue for parallel selective rendering.

4.7.6 Case IV Discussion

The renderer presented here is more of an overall selective rendering framework which aspires to make the best use of available hardware and to exploit various aspects of visual attention. We have demonstrated through these results that speedup can also be obtained for animations without compromising the rendering quality. Note that the results of the Corridor scene were validated, albeit with slightly different settings, in [SDL+05]. Yet there are further gains to be made, as was demonstrated in Case III when we used multiple selective variables and sub-sampling to improve rendering times. Also, the use of parallelism introduces different problems which might need to be solved separately, particularly the expense of the irradiance cache for the first few frames.

4.8 Case V: Time-Constrained Rendering

In this section we introduce a slightly different selective rendering approach. The selective guidance systems described so far have helped identify the most relevant areas of a scene to be rendered. The selective guidance may be seen as prioritising certain areas of an image, allowing the rendering to be performed in order of the most important elements first. These priorities are useful in a time-constrained environment, since the computation may be interrupted at any point to satisfy the timing constraints and it is desirable to have rendered the most important parts of the image by that time. The time-constrained rendering uses the same selective rendering pipeline described in Case I, but with the possibility of using more advanced selective guidance such as an importance map. The pre-selective rendering stage doubles as both an initial profiling stage and the more traditional role of image preview, and the selective rendering stage is where the time-constrained rendering is performed. Time-constrained rendering benefits from a method of profiling and scheduling rendering tasks or jobs, as in traditional real-time systems [BW90]. The task we identify for this time-constrained renderer is the shooting of a ray or group of rays from the virtual camera, consisting of the entire recursive computation of these rays; we term such a collection of rays a batch. Profiling is also carried out on a batch basis. An instruction counter is used to ensure fine-grained profiling with a tiny overhead of only a few instructions per ray, as sketched below. For traditional ray-tracing, the cost of tracing rays within a given image-based area is for the most part constant. In these cases, maintaining an image-based timing map of given areas is usually sufficient for predicting the cost of a subsequent ray. However, when using more complex data structures such as the irradiance cache, this cost may vary, as the indirect diffuse part of the computation may be computed once, cached, and then interpolated from in subsequent computations. Our selective time-constrained rendering uses rays per pixel with adaptive sub-sampling as the selective variables.
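The batch-level profiling can be sketched as follows. The counter interface is an assumption: __rdtsc reads the x86 cycle counter (available with GCC and Clang) and stands in here for the instruction counter described above, which would be read through a platform-specific performance counter API.

```cpp
// Sketch of batch-level profiling (cycle counter as an illustrative
// stand-in for an instruction counter).
#include <cstdint>
#include <x86intrin.h>

std::uint64_t readCounter() { return __rdtsc(); }

template <typename RenderBatch>
std::uint64_t profileBatch(RenderBatch&& renderBatch) {
    std::uint64_t before = readCounter();
    renderBatch();                          // trace one batch of rays
    return readCounter() - before;          // cost of this batch
}
```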

4.8.1 Time-Constrained Rendering for Traditional Ray-Tracing

In the simplest case, rendering without an irradiance cache, profiling is only performed at the pre-selective rendering stage. The selective guidance produces not only an importance map from the preview image but also a profiling map that maintains computational costs for each area of the image. The selective guidance is then used to calculate the number of rays that need to be computed. This list of rays is sorted into an array of ray queues; the size of the array is set equal to the maximum number of rays per pixel. Each queue represents one of the rays to be shot per pixel, such that a pixel receiving the maximum rays per pixel will have one ray in each of the queues. Each queue is sorted by the ray's pixel importance, and encoded with each ray is its estimated cost in instructions. At the selective guidance stage the queues are parsed and the timing of each ray is summed while the sum is less than the time constraint, as sketched below. This technique is similar to the time-constrained techniques discussed in Section 3.9. While it would be possible to sort by value over cost, for sorting we assume that the cost is constant. While this is not necessarily the case, it simplifies the sorting, and the difference is negligible for a large number of rays, particularly in the second case, when using an irradiance cache. This makes Equation 3.1 simple to solve by just prioritising the rays, since cost is constant. Figure 4.20 illustrates the order in which the pixels are rendered. Note that, as mentioned before, the rays are not profiled and scheduled individually when in the queue, but as batches.
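A sketch of the scheduling pass, with types assumed for illustration. The queues are taken to be pre-sorted by descending pixel importance, and rays are admitted while the summed cost estimates remain within the constraint.

```cpp
// Sketch: admitting rays from the importance-sorted queues until the
// estimated cost reaches the time budget (illustrative interfaces).
#include <vector>

struct Ray { float importance; long costEstimate; };

std::vector<Ray> schedule(const std::vector<std::vector<Ray>>& queues,
                          long budget) {
    std::vector<Ray> admitted;
    long spent = 0;
    for (const auto& queue : queues)       // one queue per ray index,
        for (const Ray& r : queue) {       // each sorted by importance
            if (spent + r.costEstimate > budget) return admitted;
            spent += r.costEstimate;
            admitted.push_back(r);
        }
    return admitted;
}
```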


Figure 4.20: Time-constrained rendering, an example of the rendering order of the pixels.

4.8.2 Time-Constrained Rendering using an Irradiance Cache

The general rendering and ordering of pixels for time-constrained rendering with an irradiance cache is the same as described above; however, the profiling and scheduling decisions differ due to the need to account for the irradiance cache computations. When rendering using an irradiance cache, the overall computation cost of a single ray on average decreases over time within a single frame: as more values are stored in the irradiance cache, the likelihood of a lookup resulting in a cache hit improves accordingly. Figure 4.21 shows how the timing changes for three scenes as more pixels are rendered. The Kalabsha scene was rendered using two indirect diffuse interreflections and the other two scenes were rendered with one indirect diffuse interreflection. As can be seen, the timing behaviour is both scene dependent and varies across the computation. Since it would be hard to predict the timing behaviour from the pre-selective rendering stage alone, we adopt a running profile mechanism. Our running profile attempts to predict the cost of the next batch of rays. We use an exponential curve fitting function, due to the overall appearance of the timing, which is updated for every batch; this is a minimal computation composed of a handful of instructions, as sketched below. The timing estimates produced by this scheduling mechanism can also be seen in the timing plots of Figure 4.21.
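One plausible realisation of such a running profile is sketched below; it is not necessarily the exact fitting function used here. The model cost ≈ a·exp(b·n) is fitted incrementally by least squares on (n, log cost), costing only a handful of arithmetic operations per batch.

```cpp
// Sketch of a running exponential fit for per-batch cost prediction.
#include <cmath>

class RunningProfile {
    double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
public:
    // batchCost is assumed positive (it is a measured instruction count).
    void update(double batchIndex, double batchCost) {
        double y = std::log(batchCost);
        n += 1; sx += batchIndex; sy += y;
        sxx += batchIndex * batchIndex; sxy += batchIndex * y;
    }

    // Predicted cost of the batch at index x: exp(a + b*x).
    double predict(double x) const {
        double denom = n * sxx - sx * sx;
        double b = denom != 0 ? (n * sxy - sx * sy) / denom : 0;
        double a = n > 0 ? (sy - b * sx) / n : 0;
        return std::exp(a + b * x);
    }
};
```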


Figure 4.21: Time-constrained rendering: timing estimates and actual timings for the Corridor scene, the Kalabsha scene and the Art Gallery scene. Note that, for visibility, the timing calculations are coarser grained than would normally be used.

Since we use a running profile and scheduling in the selective rendering stage of the computation, this method could be considered a selective rendering cyclic process, since the computation is iterated in the last two phases.

4.8.3 Implementation and Results

We have implemented both time-constrained systems described above within RADIANCE. The renderer uses a mixture of adaptive sub-sampling and rays per pixel as selective variables. Rendering is first performed using adaptive sub-sampling at the corners of a square of pixels; as further rendering is required, the corners are subdivided and rendered further until no further subdivision is required, at which point a rays per pixel approach is used. The criteria for whether to subdivide further and how many rays per pixel to shoot are based on the selective guidance system, which uses the importance map from the previous selective renderer. We present two sets of results: firstly for traditional ray-tracing using only one estimate, and secondly using an irradiance cache with multiple profiling. All results in this section were computed on an Intel Pentium 4 running at 2.4GHz under Linux. For the time-constrained traditional ray-tracing we use two scenes: the Cornell Box, rendered first using only ray-tracing and a global diffuse term and then using distributed ray-tracing, and the Desk scene, using ray-tracing only (ignoring indirect diffuse reflection computations). A maximum of 16 rays per pixel was shot and a minimum sub-sampling of 4 rays at the corners of every 4 by 4 square of pixels was used. Default parameters were used for all other settings. We used batch sizes of 128 rays, as empirical tests suggested this number was a good compromise between accuracy and


             Cornell Box      Cornell Box (distributed)      Desk
             10%     25%      5%        10%                  10%      25%
Constraint     4      10      391       781                  306      765
Traditional  4.1    10.2      391.1     781                  306.1    765

Table 4.9: Time-constrained rendering timings without irradiance cache. The timings for the scenes were: Cornell Box ray traced, Cornell Box with distributed ray tracing 7,810 and the Desk scene 3,059. Timing in seconds.

             Cornell Box            Corridor
             10%     25%    50%     10%      25%       50%
Constraint    34      83    170     302      754       1507
Traditional  *36    83.1    136    *341    753.8    1,507.2

Table 4.10: Time-constrained rendering timings. The values marked with an asterisk are those that we consider not to have met the constraints. The timings for the scenes were: Cornell Box 339 and the Corridor scene 3,014. Timing in seconds.

overheads. For the images rendered without distributed ray-tracing, we present time constraints corresponding to 10% and 25% of the total non-selective computation. For the distributed ray-tracing this was limited to 5% and 10%, since the computation completes before 25% due to the selective rendering. All selective guidance computation is in the form of a saliency map. Results are presented in Table 4.9 and the scenes are shown in Figure 4.22. What is clear from the results is that, although the rendering cost is estimated only once, the estimate is accurate in terms of timing. The time-constrained renderings using an irradiance cache were conducted on four scenes: the Cornell Box, the Corridor scene, the Art Gallery and the Desk scene. The scenes were rendered with a maximum of 16 rays per pixel; the Cornell Box was rendered with two indirect diffuse interreflections and the rest of the scenes with one. All other rendering parameters were set to RADIANCE default parameters. Timing constraints were set to 10%, 25% and 50% of the full non-selective rendering times. Results are presented in Table 4.10 and Table 4.11 and shown in Figure 4.23, Figure 4.24, Figure 4.25 and Figure 4.26. Of all the images, the only ones not to complete in time are the Cornell Box and the Corridor scene rendered at 10%, since the irradiance cache computation for the pre-selective rendering takes considerable time, mostly due to the two indirect interreflections. The rest of the results closely match the time constraints, except for the 50% renderings of both the Cornell Box and the Desk scene, where the computation completes earlier due to the selective rendering. In all cases the amount of time taken up by scheduling and profiling was around 0.1 seconds.


Figure 4.22: Time-constrained rendering: Top to bottom: the Cornell Box ray traced, the Cornell box using distributed ray tracing and the desk scene. Left to right: time constraint of 10%, 25% and full rendering.

             Art Gallery             Desk
             10%     25%    50%      10%      25%    50%
Constraint   277     692    1395     395      987    1,974
Traditional  277.2   692    1395.2   394.8    987    1,634

Table 4.11: Time-constrained rendering timings. The values marked with an asterisk are those that we consider not to have met the constraints. The timings for the scenes were: the Art Gallery 2,769 and the Desk scene 3,947. Timing in seconds.


Figure 4.23: Time-constrained rendering: The Cornell Box scene. Left to right and top to bottom: time constraint of 10%, 25%, 50% and full rendering.

4.8.4 Case V Discussion

We have presented a novel time-constrained rendering system that uses selective rendering as a scheduling criterion. We have shown that it is possible to constrain our computation to a fraction of the full computation time, even for complex rendering algorithms that use an irradiance cache. The main disadvantage of this approach, which will be addressed in future chapters, is the length of time it may take to generate the preview image, which must be computed before the time-constrained rendering begins, particularly due to the irradiance cache computation. This is the same issue that besets traditional selective rendering. Our time-constrained rendering might be improved by a combination of a subdivision data structure, such as a quadtree, and a quick rasterisation preview to identify salient areas at this stage [PS89].


Figure 4.24: Time-constrained rendering: The Corridor scene. Left to right and top to bottom: time constraint of 10%, 25%, 50% and full rendering.

Figure 4.25: Time-constrained rendering: The Art Gallery scene. Left to right and top to bottom: time constraint of 10%, 25%, 50% and full rendering.


Figure 4.26: Time-constrained rendering: The desk scene. Left to right and top to bottom: time constraint of 10%, 25%, 50% and full rendering.

4.9 Summary

In this chapter, we have presented the concept of the selective rendering pipeline as a way of viewing selective renderers. We have introduced a number of traditional and novel selective renderers and explored their potential by mapping them to the selective rendering pipeline. Case I introduced the general concept of selective rendering for stimulus-driven visual attention and showed how the pre-selective rendering calculation influences the rest of the pipeline, particularly if too much time is spent on computation for the image preview. Case II introduced a simple selective renderer that is ideal for selectively rendering with on-screen distractors but still suffers from the same problems as Case I. Case III made use of rasterisation for a rapid image preview, thus avoiding the problems of the previous cases and allowing the use of different sampling schemes and selective variable choices. The Case IV selective renderer demonstrated how to use various hardware and more complex selective guidance to speed up rendering. Finally, selective rendering was used to drive a time-constrained renderer. In this thesis we focus primarily on accelerating the selective rendering pipeline and experimenting with different possibilities for selective guidance and selective rendering. As we have shown in Case III, speedup can be gained using more than one selective variable. This approach requires either the use of a rapid image estimate in the first stage, for example the rasterisation of Case III, or the ability to create pre-selective rendering images without incurring large costs, as was shown in Case I. The former case may include certain problems, as were discussed in Section 4.6.4.


In the latter case, the use of progressive algorithms for these specific selective variables would mean that the pre-selective rendering does not need to be discarded or rendered at a higher quality than needed. Our novel progressive selective algorithms described in Chapter 8 will be useful in this respect. Moreover, the interaction between the various stages in the pipeline has so far been neglected; while we have introduced some of these interactions in this chapter, we examine these issues further in Chapter 8. Also, the use of component-based rendering to break up the computation would be beneficial for selective rendering with multiple variables, since each one could be progressed individually. In the next chapter we begin by formalising techniques specific to component-based rendering. Furthermore, breaking down the computation into components benefits further issues related to the selective rendering pipeline introduced in this chapter: parallel computation and adaptive sub-sampling. We shall show in Chapter 6 and Chapter 7 how considering the computation at a finer granularity improves performance for these techniques as well.


Chapter 5

Selective Component-Based Rendering

Traditionally, in high-fidelity selective rendering, flexibility is obtained by varying the number of primary rays shot, as we have seen in the previous two chapters. In this chapter we investigate a different method, the component-based approach, in which we use the property that the light hitting a surface can be divided into various components, each of which can be rendered individually, as the basis for our flexibility, see Figure 5.1. Figure 5.2 visualises separate components, and their composite in the final image. This also allows applications to visualise only certain aspects of the light transport simulation at certain points, within an appropriate time frame, for example during an animation development process. To achieve this, the rendering process needs a flexible user-controlled system to facilitate the trade-off between the desired quality of the solution and the rendering time.

5.1 Introduction

In this chapter we demonstrate the benefits of component-based rendering for a number of aspects of selective rendering. Firstly, we show a rendering system with the flexibility of controlling the light transport rendered in an image according to a specific component regular expression, which we term a crex. Secondly, we use the crex to specify the perceptual priority of features in a scene. Finally, we use the crex within a time-constrained rendering system, with and without the use of visual attention, to determine which components can be computed within a bounded time. The chapter is organised as follows. Initially, in Section 5.2, we present our component-based framework and the crex. Then, in Section 5.3, we demonstrate how our framework can be used as part of the selective rendering process, and in Section 5.4 we show how the crex is incorporated in time-constrained systems. In Section 5.5 we combine the time-constrained rendering and the selective rendering, and in Section 5.6 we discuss certain issues that arise from using the crex. Finally, in Section 5.7 we present a simple selective renderer that uses both rays per pixel and a simple form of component-based rendering as selective variables.



Figure 5.1: Component-based rendering. Top: a BRDF split into components. Below: splitting the computation into separate components for diffuse, glossy and specular.


5.2 Component-Based Rendering Framework

In this section we present the theory underpinning our work. We introduce a component-based rendering framework driven by a regular expression and discuss our implementation of a component-based renderer.

5.2.1 Rendering by Components

The radiance at a pixel $(x, y)$ in direction $-\Theta$, which intersects an object in the scene at point $p$, is given by the rendering equation, Equation 2.4, as:

$$L(x, y) = L(p \rightarrow \Theta) = L_e(p \rightarrow \Theta) + \int_{\Omega_p} f_r(p, \Theta \leftrightarrow \Psi) \cos(N_p, \Psi)\, L(p \leftarrow \Psi)\, d\omega_\Psi$$

We can estimate $L(p \rightarrow \Theta)$ using Monte Carlo integration by generating $N$ random directions $\Psi_i$ distributed over the hemisphere $\Omega_p$ (we omit $L_e(p \rightarrow \Theta)$ for clarity):


Figure 5.2: The Cornell Box scene split into a number of RADIANCE shader-specific components: (top-left) direct, (top-middle) indirect diffuse, (top-right) pure specular, (bottom-left) specular for the mirror shader, (bottom-middle) transmitted for the glass shader, (bottom-right) reflected for the glass shader (including transmitted) and (right) the full solution.

$$L(p \rightarrow \Theta) \approx \langle L(p \rightarrow \Theta) \rangle = \frac{1}{N} \sum_i^{N} T_i\, L(p \leftarrow \Psi_i) \qquad (5.1)$$

where

$$T_i = \frac{f_r(p, \Theta \leftrightarrow \Psi_i) \cos(N_p, \Psi_i)}{p(\Psi_i)}$$

The total set of $N$ directions $\Psi_i$ can be conceptually subdivided into $N_c$ subsets of directions $\Psi_{i_c}$, commonly thought of as components, having cardinality $N_{i_c}$. This is commonly done by recognising two major components: direct and indirect illumination. The indirect component can be arbitrarily subdivided into a further $N_c$ components. From Equation 5.1:

$$\langle L(p \rightarrow \Theta) \rangle = \frac{1}{N} \sum_i^{N_{i_d}} T_{i_d}\, L(p \leftarrow \Psi_{i_d}) + \frac{1}{N} \sum_c^{N_c} \sum_i^{N_{i_c}} T_{i_c}\, L(p \leftarrow \Psi_{i_c})$$

where $N_{i_d} + \sum_c^{N_c} N_{i_c} = N$ and the subscript $d$ refers to direct illumination. By defining the operator $D_p^{N_{i_d}}$ as the evaluation of the direct illumination at point $p$ over $N_{i_d}$ directions, and the operator $C_p^{N_{i_c}}$ as the evaluation of component $c$ at point $p$ over $N_{i_c}$ directions, the above equation can be written as

$$\langle L(p \rightarrow \Theta) \rangle = \frac{1}{N} D_p^{N_{i_d}} + \frac{1}{N} \sum_c^{N_c} C_p^{N_{i_c}} \qquad (5.2)$$
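Since Equation 5.2 is obtained purely by regrouping the Monte Carlo sum into component subsets, partitioning the samples cannot change the estimate. The following minimal Python sketch illustrates this numerically; the random per-sample values are a made-up stand-in for the weighted radiance contributions $T_i L(p \leftarrow \Psi_i)$.

```python
import random

random.seed(1)

# Made-up per-sample contributions standing in for T_i * L(p <- Psi_i);
# Equation 5.2 only regroups the sum, so any values serve here.
samples = [random.random() for _ in range(1200)]
N = len(samples)

# Full estimator: (1/N) * sum over all N sampled directions.
full = sum(samples) / N

# The same samples partitioned into a "direct" subset and two further
# "component" subsets, exactly as in Equation 5.2.
direct, comp_a, comp_b = samples[:400], samples[400:800], samples[800:]
partitioned = (sum(direct) + sum(comp_a) + sum(comp_b)) / N

assert abs(full - partitioned) < 1e-12  # identical by construction
print(full, partitioned)
```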

$C_p^{N_{i_c}}$ is a recursive operator, since it implies evaluating the reflected radiance at the intersection points found by the ray casting function $r(p, \Psi)$. Subscripted ordinal prefixes will be used to refer to points and coefficients at different levels of recursion along the illumination path, ${}_1p$ and ${}_1T$ referring to the primary ray intersection. $C_{{}_1p}^{N_{i_c}}$ can be expanded as

$$C_{{}_1p}^{N_{i_c}} = \sum_i^{N_{i_c}} {}_1T_{i_c}\, L({}_1p \leftarrow \Psi_{i_c}) = \sum_i^{N_{i_c}} {}_1T_{i_c} \left( \frac{1}{N} D_{{}_2p}^{N_{i_d}} + \frac{1}{N} \sum_c^{N_c} C_{{}_2p}^{N_{i_c}} \right)$$

with ${}_2p = r({}_1p, \Psi)$. By substituting into Equation 5.2:

$$\langle L(p \rightarrow \Theta) \rangle = \frac{1}{N} D_{{}_1p}^{N_{i_d}} + \frac{1}{N} \sum_c^{N_c} \sum_i^{N_{i_c}} {}_1T_{i_c} \left( \frac{1}{N} D_{{}_2p}^{N_{i_d}} + \frac{1}{N} \sum_c^{N_c} C_{{}_2p}^{N_{i_c}} \right)$$

This series expansion could recurse indefinitely, but in practice indirect components are not evaluated after a given depth $b$ of the illumination path. For the particular case of $b = 2$ the final equation is

$$\langle L(p \rightarrow \Theta) \rangle = \frac{1}{N} D_{{}_1p}^{N_{i_d}} + \frac{1}{N} \sum_c^{N_c} \sum_i^{N_{i_c}} {}_1T_{i_c}\, \frac{1}{N_{i_d}} D_{{}_2p}^{N_{i_d}}$$

The previous equation indicates how the solution can be obtained by calculating the direct illumination of each separate component at discrete steps, and how the coefficients $T_i$ must be rippled down the path for correct weighting. This is fundamental for our framework, since we set aside indirect values and always calculate the direct incident radiance for the component which is currently being executed. The rippled coefficient at level $b$ is given by

$$T_{ripple} = \frac{1}{\prod_{i=1}^{b}({}_iN)} \prod_{j=1}^{b-1}({}_jT_{i_c})\; {}_bT_{i_d}$$

where ${}_iN$ is the total number of rays spawned at each level along the path.
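The rippled coefficient lends itself to incremental accumulation: each spawned ray carries its parent's weight multiplied by the component coefficient and divided by the rays spawned at that level. A minimal Python sketch of this propagation follows; the `BinnedRay` structure, the `spawn` helper and the numeric weights are hypothetical illustrations, not comrpict's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class BinnedRay:
    pixel: tuple        # the (x, y) pixel this ray contributes to
    t_ripple: float     # accumulated weight along the illumination path

def spawn(parent: BinnedRay, t_component: float, n_level: int) -> BinnedRay:
    """Child ray inherits the parent's rippled weight, multiplied by the
    component coefficient and divided by the rays spawned at this level."""
    return BinnedRay(parent.pixel, parent.t_ripple * t_component / n_level)

# Primary rays start with a weight of one.
primary = BinnedRay(pixel=(64, 64), t_ripple=1.0)
glossy = spawn(primary, t_component=0.3, n_level=16)   # made-up values
diffuse = spawn(glossy, t_component=0.6, n_level=64)
print(diffuse.t_ripple)  # (0.3 / 16) * (0.6 / 64)
```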

5.2.2 The Component Regular Expression

Inspired by Heckbert's light transport notation [Hec90], we propose a component regular expression, or crex, which informs the renderer of the order in which the components are to be rendered. The crex takes the form of a string of characters, with each character representing either a component or a special character used for recurrence or grouping, as shown in Table 5.1. The BNF of our syntax is presented in Table 5.2. The alphabetic characters each represent an individual component. The order of the components in the crex dictates the order in which the components are rendered. Components spawn rays to be executed by subsequent components or groups. The integer multiplier is used to execute a component or group k times. The * operator is used to execute the previous component or group until no more rays belonging to the recursed components are spawned. The groups ( ) and < > are used to group components together. When using ( ), the components or groups of components within a group inherit the spawned rays from the previous components within the group. On the other hand, when using < >, all of the rays spawned within the group are executed only when the < > block terminates. The components within < > can be executed in parallel. The differences between the groups are best illustrated by an example. Consider the case of (DGS)G. When the D part of the crex is executed it can spawn new glossy and specular rays. Here the glossy rays spawned by D are used immediately in the first G. If, on the other hand, we had used <DGS>G, the glossy rays spawned by D are not used by the G within < >, but by the G outside. The { } and [ ] groups will be discussed in Section 5.3 and Section 5.4 respectively.

Character              Description
( )                    Group one or more components. The latter components in the group execute rays spawned by the former in the group.
< >                    Group one or more components. Any spawned ray is never launched within the group but is executed after the group terminates.
{ }                    Group one or more components. A group in { } is modulated by an importance map.
[ ]                    Group one or more components. Similar to < > but only used for timing constraints.
k (positive integer)   Execute the last component or group k times.
*                      Execute until no more rays are spawned.
D                      Indirect diffuse.
S                      Indirect specular.
G                      Indirect glossy.
T                      Transmitted glass/dielectric†.
R                      Reflected glass/dielectric†.
M                      Mirror†.

Table 5.1: The component regular expression (crex) description. † Shader-specific component.

<crex>       ::=  ( <crex> ) | < <crex> > | { <crex> } | [ <crex> ] | <component> | <crex><component> | <crex><mult>
<component>  ::=  D | G | S | T‡ | R‡ | M‡
<digit>      ::=  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<integer>    ::=  <digit> | <digit><integer>
<mult>       ::=  * | <integer>

Table 5.2: crex BNF. ‡ Implementation specific for comrpict.
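The grammar in Table 5.2 is small enough to parse with a hand-written recursive descent. The sketch below is a minimal illustration of one possible reading of the BNF, not the comrpict implementation; it turns a crex string into a nested list of [kind, body, repeat] items, where repeat is an integer or '*' for "until no more rays are spawned".

```python
COMPONENTS = set("DGSTRM")
GROUPS = {"(": ")", "<": ">", "{": "}", "[": "]"}

def parse_crex(s, i=0, closing=None):
    """Recursive-descent parse of a crex string into nested
    [kind, body, repeat] items."""
    items = []
    while i < len(s):
        ch = s[i]
        if ch == closing:
            return items, i + 1
        if ch in GROUPS:
            body, i = parse_crex(s, i + 1, GROUPS[ch])
            items.append([ch, body, 1])
        elif ch in COMPONENTS:
            items.append([ch, None, 1])
            i += 1
        elif ch == "*":
            items[-1][2] = "*"   # repeat last component/group until done
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(s) and s[j].isdigit():
                j += 1
            items[-1][2] = int(s[i:j])   # integer multiplier
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if closing is not None:
        raise ValueError("unclosed group")
    return items, i

print(parse_crex("TT{TTMRSGD}2")[0])
```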

5.2.3 Implementation

We present our framework within RADIANCE. In particular, we present the implementation of a new component-based renderer, which we call comrpict, based on RADIANCE's rpict renderer. All component-based images in this chapter have been rendered with comrpict. Our implementation functions similarly to rpict, supporting similar features and parameters, and, in addition, can also be passed the crex as a parameter. RADIANCE uses traditional recursive distributed ray tracing. In order to reproduce the framework described by the crex, our implementation removes the recursion.


In our implementation, when a primary ray hits a surface, the direct lighting contribution is calculated and stored directly in a buffer representing the image plane. If necessary, secondary rays are spawned. When such a secondary ray is spawned, instead of tracing the ray, the information about this ray is stored in a bin with other rays from the same shader. The coefficient $T_{ripple}$ is calculated by multiplying the coefficient of the shader with the value of the spawned ray's parent $T_{ripple}$, and is stored together with this ray's information. Primary rays have a $T_{ripple}$ set to one. The pixel coordinates that the ray contributes to are also passed as a parameter to the spawned rays. After all the primary rays have been traced, the crex is checked to determine which set of rays should be considered next. The procedure is the same for these component rays as it was for the primary rays. Direct lighting is calculated and added to the image plane at the appropriate pixel coordinates, by first multiplying it with the component coefficient $T_{ripple}$, and if necessary additional rays are again spawned and stored in the appropriate bins. The process continues until the crex has been satisfied. Our current implementation supports specular (S), glossy (G), diffuse (D) and some shader-specific components, such as (R) for reflected glass and dielectric objects, (T) for transmitted glass and dielectric objects, and (M) for mirror reflections. This list could easily be extended to support all RADIANCE shaders. As our system is progressive in nature, we also introduce some level of user control for a crex with an indirect diffuse component. The user can specify a global ambient value, which is at first added to the radiance of the image; as the indirect diffuse value is calculated, the global value is removed. This is similar to the ambient value that Cohen et al. [CCWG88] used for progressive radiosity.
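The overall control flow just described, with direct lighting accumulated immediately and secondary rays deferred into per-component bins that are drained in crex order, can be summarised by the following Python sketch. The `trace_primary` and `shade` callables are hypothetical stand-ins for the renderer's internals, not comrpict functions.

```python
from collections import defaultdict, deque

def render(crex_plan, trace_primary, shade):
    """Iterative (non-recursive) component rendering sketch.

    crex_plan: an ordered sequence of component letters flattened from a
    crex, e.g. "TRSGD"; trace_primary yields primary rays; shade adds the
    ray's direct contribution to the image and returns any spawned
    secondary rays tagged with the component bin they belong to."""
    image = defaultdict(float)     # pixel -> accumulated radiance
    bins = defaultdict(deque)      # component letter -> pending rays

    # Primary pass: direct light to the image, secondaries into bins.
    for ray in trace_primary():
        for comp, spawned in shade(ray, image):
            bins[comp].append(spawned)

    # Consume the bins in the order the flattened crex dictates.
    for comp in crex_plan:
        pending, bins[comp] = bins[comp], deque()
        for ray in pending:
            for child_comp, spawned in shade(ray, image):
                bins[child_comp].append(spawned)
    return image

if __name__ == "__main__":
    def trace_primary():
        yield ("ray0", (0, 0), 1.0)
    def shade(ray, image):
        _, pixel, weight = ray
        image[pixel] += 0.5 * weight   # pretend direct contribution
        return []                      # no secondaries in this stub
    print(dict(render("TRSGD", trace_primary, shade)))
```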

5.2.4 Applying the crex

The crex provides a flexible rendering framework, which can be used in a number of ways. For example, the crex can be used by a user who is interested in certain aspects of the light transport. Figure 5.3 shows how the user could choose a different crex for varying quality of rendering. The first image (top left) demonstrates the use of rendering primary rays and transparent objects, while the second image (top right) uses classical ray tracing. The third example (bottom left) results in a solution similar to classical radiosity, with added transparencies for clarity. The final image (bottom right) is rendered with a full solution. Figure 5.4 demonstrates how the crex can be used progressively. The first image (left) is an example of rendering using only one bounce for every component, followed by a T* for clarity. The second image (middle) is rendered with the same crex as the first, executed twice, resulting in secondary bounces that can be seen in the reflected glass and mirror. In the final image (right) the same crex is recursed until no more rays need to be shot.


Figure 5.3: User controlled component-based rendering of the Library scene with crex: (top left) T*, (top right) (TRS)*, (bottom left) (TD)* and (bottom right) (TRSGD)*.

Figure 5.4: Progressive component-based rendering of the Desk scene with crex: (left) T*, (middle) (T*)2 and (right) (T*)*.

Other applications of the crex include its use in conjunction with selective rendering, as discussed in Section 5.3. Furthermore, its progressive nature also allows it to control rendering within a given temporal bound, see Section 5.4, and also for selective time-constrained rendering, described in Section 5.5. In addition, by using perceptual metrics similar to those proposed in [SFWG04], it might be possible to compute the crex dynamically.



Figure 5.5: Selective component-based rendering pipeline.

5.3 Selective Component-Based Rendering

Previous work on selective rendering has predominantly determined quality as a function of the number of rays traced, since rays per pixel was the only selective variable. The more salient a pixel, the more rays were traced for that pixel, as in [CCW03, SCCD04] and most of the selective renderers discussed in the previous chapter. In Yee et al.'s work [YPG01] and the work we presented in our Case III selective renderer, on the other hand, the saliency affected only the indirect diffuse computation, in particular the accuracy of the search radius used in examining previously cached samples inside the irradiance cache. Our approach extends this notion by empowering the renderer with the ability to terminate the path of a ray at any point of its execution, as dictated by the crex.

5.3.1 Rendering

Figure 5.5 demonstrates the rendering pipeline for the selective component-based renderer. For this form of rendering the selective variables are the components, rather than the more traditional rays per pixel. The pre-selective rendering stage renders a few of the components and then generates an importance map as selective guidance. As was the case with the selective renderers in the previous chapter, this form of selective guidance can be replaced with other methods. The part of the crex which is grouped in { } is modulated by the value in the importance map for a given pixel. No recursion (*) is allowed in { }, but different { } groups can be specified separately. The components in { } are ordered by importance such that the first components require a lesser value in the importance map to be rendered. Effectively, the selective variable for this renderer is the number of component rays shot. The importance map used for the rendered image seen in Figure 5.6 (right) can be seen in Figure 5.7 (middle).


Figure 5.6: One set of images from the Corridor scene used for the visual attention experiment: (left) high quality image (HQ) and (right) component-based quality (CBQ).


Figure 5.7: A visualisation of the importance map used for the visual attention experiment: (left) the task map, (middle) the task map with foveal angle gradient and (right) a colour-coded visualisation of which components of the crex are rendered for each pixel for a crex of T{RSGM}.

Figure 5.7 (right) shows a colour-coded visualisation of how the crex affects the individual pixels of the rendered image. The part of the crex prior to the first { is used as pre-selective guidance. In the example shown in Figure 5.5, the crex used is TT{TTMRSGD}2: the image preview is generated using only TT, and the rest of the rendering is modulated by the importance map. Further examples of the entire selective rendering pipeline can be seen in Figure 5.8, Figure 5.9, Figure 5.10 and Figure 5.11. In order to demonstrate the effectiveness of component-based rendering while exploiting visual attention, we ran a task-based psychophysical experiment similar to [CCW03], except that in our case the quality is determined by the crex rather than the resolution. This experiment was previously published in [DSSC05]; we summarise it here for completeness, since it validates the selective component-based rendering approach.
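A minimal sketch of this per-pixel modulation follows, assuming the simplest possible cut-off rule: a linear mapping from the importance value to the number of group components rendered. The linear rule is an assumption; the text above only specifies that earlier components in the group need a lesser importance value.

```python
def components_for_pixel(group, importance):
    """Select which components of a { } group to render for one pixel.

    group: the components in the { } block, ordered so that earlier
    entries require a lower importance value; importance: the map value
    in [0, 1]. The linear cut-off is a hypothetical choice."""
    n = len(group)
    keep = min(n, int(importance * n + 0.999))  # ceiling without math.ceil
    return group[:keep]

# For a crex of TT{TTMRSGD}2, the modulated group is "TTMRSGD".
for imp in (0.1, 0.5, 1.0):
    print(imp, components_for_pixel("TTMRSGD", imp))
```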


5.3.2 Experiment

The experiment was conducted with 32 participants, in two groups of 16. The scene used in the experiment is an office corridor which contains several items related to fire safety, as shown in Figure 5.6. Before beginning the experiment, the subjects read a sheet of instructions on the procedure of the particular task they were to perform. The participants were asked to play the role of a fire security officer with the task of counting the total number of fire safety items. Both groups saw two images of the Corridor scene for a limited time of six seconds. A pre-study was run to confirm that the difference between a high and a low quality image was sufficient to be easily noticeable while free-viewing the scene. An additional pre-study was run to confirm that the observers would have enough time to perform the task. We rendered a High Quality (HQ) image using standard RADIANCE, and a Component-Based Quality (CBQ) image using a crex of T{RSGM} in half the one-hour rendering time taken by the HQ image. Half of the participants were shown two high quality images, HQ/HQ, whereas the other half were shown a pair of images rendered at different quality levels, HQ/CBQ. The order in which the participants saw the images was randomised to minimise any bias. The objects were in different positions in each image to avoid any familiarity between the two scenes that might affect the scan path of the eye. Having watched both images, the participants were asked which of the two images they thought had the worse rendering quality, using the two-alternative forced-choice (2AFC) paradigm. Statistical analysis using the chi-square test demonstrated that viewers failed to notice the difference between the two images.

5.3.3 Results

In this section we present results for using the importance map when rendering. Since we have already used the task map as an importance map in the previous section, for these results we only use a saliency map as an importance map. Also, since we are limiting ourselves to component-based rendering only, all rendering is performed with one ray per pixel at a resolution of 512 × 512. We use the four scenes shown in Figure 5.8, Figure 5.9, Figure 5.10 and Figure 5.11. The crex used for the Cornell Box scene, the Art Gallery scene and the Corridor scene was TT{TTMRSGD}2. For rendering the Desk scene the crex used was TTM{RSGDTTRSG}. Results, shown in Table 5.3, demonstrate reasonable speedup compared to the standard rendering times. One important aspect of this is that the rendering was carried out at one ray per pixel, a setting at which most of the selective renderers presented in the previous chapter would fail to obtain any speedup.

          Cornell   Desk   Art Gallery   Corridor
gold        35      326       654          712
cbr          7.5     86       270          163
speedup      4.67     3.79      2.42         4.37

Table 5.3: Speedup for the selective component-based renderer. Timings in seconds.

Figure 5.8: The Cornell Box. From left to right and top to bottom: pre-selective rendering images, the selective guidance, selective component-based images and the traditionally rendered image.

5.4 Time-Constrained Rendering

Our time-constrained component-based rendering uses scheduling, profiling and progression to constrain the rendering of images to a given time. We use a running profile as a cost estimate, and the crex provides part of the selective guidance as to what to compute first when rendering with time constraints. All results in this section were run on an Intel Pentium IV 2.4 GHz system with 2 GB of memory under Linux.


Figure 5.9: The Desk Scene. From left to right and top to bottom: pre-selective rendering images, the selective guidance, selective component-based images and the traditionally rendered image.

Figure 5.10: The Art Gallery scene. From left to right and top to bottom: pre-selective rendering images, the selective guidance, selective component-based images and the traditionally rendered image.


Figure 5.11: The Corridor scene. From left to right and top to bottom: pre-selective rendering images, the selective guidance, selective component-based images and the traditionally rendered image.

5.4.1 Profiling

In order to demonstrate our time-constrained rendering system we have developed a simple profiling method. In our profiling scheme, the approximate computational cost to render individual pixels, an image, or even an entire animation, is derived from data collected as the computation proceeds. We use a profile cache to maintain an estimate for each component. If the profile cache is empty, an initial subset of the rays for each component is traced before the rest, and the timings for that particular component are stored in the cache for subsequent use. The profile cache is continually built up, containing the cost of tracing each component and the number of rays that have contributed to that component so far. For animations, as further frames are computed, their contribution is added to the profile cache by weighting the number of rays shot for that frame against the number of rays for that component already in the cache. The pre-computed profile cache can then be re-used for a given scene.
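The profile cache therefore reduces to a per-component running mean of seconds per ray, weighted by ray counts. A minimal Python sketch, with made-up timings, might look as follows:

```python
class ProfileCache:
    """Per-component cost estimates maintained as a running
    ray-count-weighted mean (a sketch of the scheme described above)."""

    def __init__(self):
        self.cost = {}    # component -> mean seconds per ray
        self.rays = {}    # component -> rays contributing to the mean

    def update(self, component, seconds, n_rays):
        prev_rays = self.rays.get(component, 0)
        prev_cost = self.cost.get(component, 0.0)
        total = prev_rays + n_rays
        # Weight the new timing by its ray count against the cache.
        self.cost[component] = (prev_cost * prev_rays + seconds) / total
        self.rays[component] = total

    def estimate(self, component, n_rays):
        return self.cost.get(component, 0.0) * n_rays

pc = ProfileCache()
pc.update("D", seconds=12.0, n_rays=600)   # hypothetical timings
pc.update("D", seconds=9.0, n_rays=400)
print(pc.estimate("D", 1000))              # predicted cost of 1000 D rays
```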

Component   Cornell Box (%)   Corridor (%)   Library (%)
S               N/A                40             17
D                 8                 2             20
G                 2                13             20
T                 2                16             19
R                 1.8              13             35
M                 2.8               8            N/A

Table 5.4: Profiling results: the percentage error of the estimated rendering time for each component compared with the calculated time. The Cornell Box and Corridor scenes are estimates from still images. The Library scene results are estimated from a profile cache and the error shown is the average error over 180 images.

Table 5.4 shows the results as a percentage error between the estimated and the calculated rendering time. The time taken to render images of the Cornell Box scene, Figure 5.12 (left), and the Corridor scene, Figure 5.14 (left), was estimated using an empty profile cache, and the number of profiling rays shot was 1% of the total number of rays to be computed for a given component. The results for the Library scene were estimated using a profile cache created from ten low resolution images of 256 × 179, used to estimate 180 images at four times that resolution. To highlight the effects, the profile cache was not improved by the subsequent calculations but was kept the same for each frame. The results show the average error over all 180 images. As can be seen, although simple, our profiling scheme is effective.

5.4.2 Time-Constrained Framework

The component-based rendering framework is extended to time-constrained rendering by introducing a new group, [ ]. This new group is similar to < >; however, for it to be scheduled, the entire group must be likely to finish within the remaining time. The [ ] groups may be nested. Within a [ ], other groups are assumed to be [ ] also. The use of * within a [ ] on any component or group is not allowed; instead, the * must be replaced with a user-defined integer multiplier. The ( ) and < > groups are not affected by timing constraints outside of [ ]. The components and [ ] groups of the crex are the individual tasks to be scheduled by our time-constrained renderer. The priority of the tasks is determined directly from the order of the components in the crex. Prior to allocating a task to be rendered, the scheduler decides, based on the profile cache, whether it is likely that the task will be completed in the remaining time. If that task is unlikely to complete, the scheduler considers the next task within the crex. The time to make these decisions is only of the order of microseconds.
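A sketch of the resulting greedy scheduling loop, with hypothetical task objects and a hypothetical per-component cost table standing in for the profile cache:

```python
import time
from dataclasses import dataclass

@dataclass
class Task:
    component: str   # a component, or a [ ] group flattened to a label
    n_rays: int
    def execute(self):
        print("rendering", self.component)

def run_time_constrained(tasks, cost_per_ray, deadline_s):
    """Greedy component scheduler sketch: a task is dispatched only if
    the profile estimate says it can finish in the remaining time;
    otherwise the next task in crex order is considered."""
    start = time.monotonic()
    for task in tasks:                  # already in crex priority order
        remaining = deadline_s - (time.monotonic() - start)
        if cost_per_ray[task.component] * task.n_rays > remaining:
            continue                    # would overrun: skip this task
        task.execute()

run_time_constrained(
    [Task("T", 50000), Task("M", 20000), Task("D", 4000)],
    cost_per_ray={"T": 1e-5, "M": 2e-5, "D": 5e-3},  # made-up profile
    deadline_s=1.0,
)
```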


5.4.3 Time-Constrained Results

To demonstrate our time-constrained renderer, we present the results for images produced with and without time constraints, all rendered at 1200 × 1200 resolution. We change the bounds for each test to give an overall sample of what can be achieved. The first scene we demonstrate is the Cornell Box. We use a crex of (TTMRGD)* for the rendering. For this experiment we set the time constraints to two-thirds and one-third of the time required to fully render the crex. Figure 5.12 demonstrates the resultant images. The full render (left) took 150 seconds. The second image (middle), under a time constraint of two-thirds of the original (100 seconds), took 63 seconds to complete, executing the complete TTMRG portion of the crex. The final image (right) was rendered in 49 seconds. This is just a fraction under its timing bound of one-third of the original (50 seconds), with the TTM portion of the crex executed. The second scene is the view of the showcase from the Library scene. For this test we precalculated the cost of rendering the primary rays (slightly under three minutes) and began our tests from that cost rounded up, adding one minute for each subsequent test. The crex chosen for these sets of renderings was (TT[RS]G)*D. Figure 5.13 demonstrates the resultant images. Under a five minute time constraint (left), the result is equivalent to a crex of (TT[RS]G)*. The second image (middle) was bound by four minutes, and in this case the crex is equivalent to TT[RS]. Finally, the three minute time-constrained image (right) was rendered with a crex equivalent to TT only. The final scene helps to highlight some of the limitations of our current implementation based solely on component-based rendering. For this experiment we calculated the cost of rendering the full crex and then used time constraints of a half and a quarter of the full rendering. Figure 5.14 demonstrates the resulting images from this test. We use the Corridor scene with a crex of TTMRSGD for the full rendering (left). The time taken was 860 seconds. Under the time constraint of a half of the full rendering (430 seconds), the rendering of the TTM portion of the crex was completed in 428 seconds. However, for the time constraint of a quarter (215 seconds), only the primary rays were rendered, in 372 seconds, thus failing to meet the bound.

5.5 Selective Time-Constrained Rendering

In this section we bring together the selective and time-constrained component-based rendering approaches to render images within a given time using per-pixel selective guidance. Time-constrained rendering using selective guidance can be used in one of two ways. Firstly, it can be used as an aid to the profiling and scheduling. In this case the selective guidance is used only to direct the time-constrained rendering by prioritising and scheduling each individual component ray by its pixel importance, thus providing time-constrained rendering at a finer granularity than using the whole set of component rays for profiling and scheduling, as was the case for the time-constrained renderer described in the previous section. Secondly, it can be used to modulate the crex, as was done in Section 5.3, as well as to prioritise the rays for scheduling.


Figure 5.12: Cornell Box: (left) no time constraints (crex is (TTMRGD)*), (middle) two-thirds time constraint of the full rendering (equivalent to a crex of (TTMRG)*) and (right) one-third time constraint of the full rendering (equivalent to a crex of TTM).

Figure 5.13: Library scene with a view of the showcase with an attempted crex of (TT[RS]G)*D: (left) five minute time constraint (equivalent to a crex of (TT[RS]G)*), (middle) four minute time constraint (equivalent to a crex of TT[RS]) and (right) three minute time constraint (equivalent to a crex of TT).

Figure 5.14: Corridor scene: (left) no time constraints (crex of TTMRSGD), (middle) a time constraint of half the full rendering (equivalent to a crex of TTM) and (right) a time constraint of a quarter of the full rendering time (primary rays traced only).


In the first instance, the scheduling for time-constrained rendering is not computed for the whole component, that is for the whole set of rays in one component, but for every individual ray within the components. The same crex possibilities as those used in the time-constrained rendering are available. Each component in the crex is scheduled in order. While the rays of a particular component are being computed, a running estimate of the cost of each ray is maintained. This profiling estimate is performed for every batch of rays. As with the selective time-constrained renderer from the previous chapter, a batch is a set of rays, in this case component rays. Timing is accurate since it uses the same instruction counters as those introduced in Section 4.8. Scheduling also occurs at regular intervals; this might be at the same time as the profiling. When the estimated time to render the next set of component rays is larger than the remaining time, the computation is stopped. The advantage of this system over the one presented in the previous section is that the time constraint can be more accurate. As we shall show in the results section, the computation nearly always terminates within a fraction of a second of the intended constraint. Also, decoupling the computations of the various components provides further flexibility, allowing the rendering to terminate earlier in the computation than was possible with the more traditional time-constrained rendering approach. The selective guidance allows the rays from each component to be prioritised first, such that if the computation were to terminate during the computation of one component, the component rays contributing to the most important pixels would have been computed first. As with the general time-constrained system, the indirect diffuse values are replaced with a global value when the indirect diffuse computation is not completed on time. As we have shown in Section 5.3, the combination of crex and selective guidance can be used to selectively render individual pixels. While in the above time-constrained rendering system the importance map is only used for prioritising, the second instance of the time-constrained renderer uses the selective guidance for both scheduling and rendering. This time-constrained renderer functions similarly to the time-constrained rendering described above, but with the added condition that all the pixels are modulated using the crex, as was described in Section 5.3.
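The per-ray variant can thus be summarised as batch-at-a-time execution with a continuously refreshed cost estimate. A minimal sketch follows, using wall-clock time rather than the instruction counters of Section 4.8, and a hypothetical `render_batch` callable:

```python
import time

def render_component_rays(rays, render_batch, deadline_s, batch_size=1024):
    """Per-ray time-constrained sketch: rays are assumed pre-sorted by
    pixel importance; the per-ray cost estimate is refreshed after every
    batch and the loop stops once the next batch is predicted to overrun."""
    start = time.monotonic()
    est_per_ray = 0.0
    for i in range(0, len(rays), batch_size):
        batch = rays[i:i + batch_size]
        elapsed = time.monotonic() - start
        if est_per_ray * len(batch) > deadline_s - elapsed:
            break                  # next batch would exceed the bound
        render_batch(batch)
        # Running profile: mean cost per ray over everything traced so far.
        est_per_ray = (time.monotonic() - start) / (i + len(batch))

render_component_rays(list(range(10000)),
                      render_batch=lambda b: time.sleep(0.001),
                      deadline_s=0.01)
```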

5.5.1 Results

We present results for these two renderers for a variety of scenes. For each scene we render with time constraints of half the time it took to render a non-time-constrained image, a quarter of the time and ten percent of the time. The full image timing is the time taken to generate a fully rendered image with the same settings using a traditional renderer.

             Library                 Desk
             10%     25%     50%     10%     25%     50%
Constraint   40      99      199     83      207     413
cbtcr        40.2    99.5    198.7   111*    207.5   413.5

Table 5.5: Time-constrained rendering timings. Values marked with * are those that we consider not to have met the constraints. The timings for the scenes were: the Art Gallery scene 2,180 and the Desk scene 826. Timings in seconds.

             Cornell Box             Corridor
             10%     25%     50%     10%     25%     50%
Constraint   8       20      40      237     593     1,186
cbtcr-sel    15*     19.75   28      427*    590     698

Table 5.6: Time-constrained rendering timings. Values marked with * are those that we consider not to have met the constraints. The timings for the scenes were: the Cornell Box 80 and the Corridor scene 2,371. Timings in seconds.

All scenes were rendered at a resolution of 512 × 512 at 4 rays per pixel. Default values were used for the rest of the parameters. The scenes used were the Cornell Box, the Corridor scene, the Library scene and the Desk scene. The crex used for the Cornell Box scene was TT{TTMRSGD}2, for the Corridor scene T{TMRSGD}3, for the Library scene T{TMRSGD}2 and for the Desk scene MT{TRSGD}3. All timings were computed on an Intel Pentium 4 running at 2.4 GHz under Linux. Results of our computations can be seen in Table 5.5 and Table 5.6. The first instance of the time-constrained renderer is labelled cbtcr, and the second, which uses the selective guidance for rendering, is labelled cbtcr-sel. The rendered images for cbtcr are shown in Figure 5.15 and for cbtcr-sel in Figure 5.16. The tables mark with an asterisk the times for those images that did not complete within (or close to) the time constraints, that is, those whose constraint expired before the pre-selective guidance stage had completed. The results clearly demonstrate that the component-based time-constrained renderers scale rather well. It is clear that the error in the timing constraints is minimal, as opposed to the coarse nature of the timing of the time-constrained renderer seen in the results in Section 5.4.3. The exception is when rendering with the 10% time constraint, but this problem is due to having to at least render the primary rays. Also notice how, in the 10% time constraint for the Desk scene, the rendering of the transparent material exposing the Kalabsha picture under the glass frame is cut off, showing only part of the image, since not all transparent rays had been processed. The raster ordering of the pixels was still maintained here, since the rendering stops before the selective rendering phase, so the pixels would not yet have been prioritised. This demonstrates that even if the selective guidance has not yet been computed, the time constraints are already being obeyed.


Figure 5.15: Component-based time-constrained rendering: the Library scene and the Desk scene. Top to bottom and left to right for each scene: time constraint of 10%, 25%, 50% and full rendering.


Figure 5.16: Component-based time-constrained rendering: the Cornell Box and the Corridor scene. Top to bottom and left to right for each scene: time constraint of 10%, 25%, 50% and full rendering.


5.5.2 Time-Constrained Rendering Comparison

In this section we show two scenes rendered under the same time constraints as a comparison between the selective time-constrained renderer from Case V in Chapter 4 and cbtcr-sel. We chose time constraints of half the time and 10% of the time it took to selectively render the image using the selective renderer which will be discussed in Chapter 8, with rays per pixel as a selective variable. We used default RADIANCE settings with 1 indirect diffuse bounce, and a maximum of 16 rays per pixel for the traditional time-constrained renderer; cbtcr-sel used a fixed 4 rays per pixel when rendering with half the time and a fixed 1 ray per pixel when rendering with 10%, which makes the image noisier but allows it to complete in the allotted time. The scenes used were the Desk scene, see Figure 5.17, and the Corridor scene, see Figure 5.18. For the 50% time constraint, the Desk scene was rendered with a time constraint of 690 seconds and a crex of MT{TRSGD}3, and the Corridor scene was rendered at 524 seconds with a crex of T{TMRSGD}2. For the 10% time constraint, the timing was 138 seconds for the Desk scene and 105 seconds for the Corridor scene, and the same crex was used in the component-based rendering case. It is clear that cbtcr-sel has fewer problems with noise, although it conveys less information about the indirect lighting. Experiments would need to be conducted to decide which image best represents the original. The selective time-constrained renderer did not complete within the 10% time constraint: rendering of the Desk scene went 121 seconds over budget and the Corridor scene rendering exceeded its time constraint by 86 seconds. The results clearly demonstrate that the component-based time-constrained renderers scale better to lower time constraints, since the scheduling and profiling are at a finer grain than those of the traditional time-constrained renderer. The time-constrained renderer described in Chapter 8 attempts to provide the best features of these two time-constrained renderers.

5.6 Issues Related to Selective Component-Based Rendering

The selective component-based rendering approach we have presented so far raises some issues, which we present here; some of them are addressed further later in this thesis.


Figure 5.17: Comparisons between the time-constrained renderers for the Desk scene. Component-based time-constrained rendering (top left) and selective time-constrained rendering (top right) with a 50% time constraint. The reference image (bottom left). Component-based time-constrained rendering with a 10% time constraint (bottom right).

Figure 5.18: Comparisons between the time-constrained renderers for the Corridor scene. Component-based time-constrained rendering (top left) and selective time-constrained rendering (top right) with a 50% time constraint. The reference image (bottom left). Component-based time-constrained rendering with a 10% time constraint (bottom right).


5.6.1 Memory Issues

Due to the progressive nature of the crex approach, every pass needs to store a certain amount of information. The potential ray explosion incurred by distributed ray tracing in our component-based implementation may result in high levels of memory consumption, particularly due to the indirect diffuse computations, which spawn hundreds of rays upon intersection. The use of an irradiance cache provides a natural solution, since the irradiance cache reduces the number of indirect diffuse computations required to the order of thousands, rather than the tens or hundreds of thousands required by a traditional distributed ray tracing approach. While the computation might still be excessive with the use of an irradiance cache for more than one indirect diffuse bounce, an approach similar to that used in the irradiance cache implementation in [PH04] may be adopted, whereby all bounces after the first use only one indirect diffuse ray. Furthermore, for animations where lighting does not change, the irradiance cache is only slightly updated, so the costs are even lower (see the irradiance cache analysis in Section 6.2).

5.6.2 crex Issues

While the crex provides flexibility, it might add an extra layer of complexity and decision making for the user. This might not always be wanted or needed, so some method of automation might be useful. Also, the correct use of the crex requires the animator to be familiar with the scene and, as with other modelling options, experience would be vital in determining how effectively the crex can be used. For example, the use of a crex using TT in Figure 5.12 requires the user to know that there are two layers of transparency in the sphere inside the Cornell Box. While we do not provide any automatic method of creating the crex, the work of [SFWG04], which identifies the components that are more perceptually important, might potentially be able to provide an automatic crex for a given scene or image. In this thesis we present a number of selective renderers that make use of automatic component-based rendering techniques which are independent of the crex. The first of these will be described in Section 5.7. Another issue arising from the crex as defined in this chapter is that strict and simple adherence to the crex sequence is only possible in renderers that use forward ray tracing exclusively. If, for example, a photon mapping algorithm is used to generate caustics and diffuse computations as a pre-rendering pass, the indirect diffuse components will not obey the crex path (although these computations might be for the most part perceptually unnoticeable). The same effect might apply if care is not taken when rendering using the irradiance cache in combination with visual attention. It is important to render the pixels deemed more important first, so that the more important pixels do not interpolate indirect diffuse values from previously calculated, less important ones. For animations this might require storing the quality of each irradiance cache sample and ignoring that sample when a request for a higher quality sample is made.


Figure 5.19: Differences in quality for images rendered using the specular threshold component-based renderer. From left to right: high quality, low quality and the perceptual differences calculated using the VDP.

5.7 Applying a Component-Based Approach to General Selective Rendering

Until now we have intentionally limited ourselves to selective component-based rendering in order to demonstrate the merits of this approach exclusively. In the previous section we also discussed some disadvantages of this approach in certain situations. In future chapters we will demonstrate how component-based techniques can be used in the design of more complex selective renderers. In this section we present a simple selective renderer, influenced by component-based selective rendering, that addresses some of these issues. This selective renderer is based on the work discussed in Case III and Case IV from Chapter 4. It uses rapid image rasterisation for the pre-selective rendering stage, an importance map for the selective guidance stage, and two selective variables for the selective rendering stage. The two selective variables used are the rays per pixel and a selective variable we term the specular rendering threshold, a parameter commonly found in ray-traced renderers. The specular rendering threshold is based on the material properties of the object a ray hits. All object materials have an associated specular material component. Whenever a ray hits an object, if the object's material specular coefficient is higher than the specular threshold, a specular ray is shot; otherwise none are shot. This provides a simple and effective method of reducing the computation through a simple component-based approach using the specular component properties. It also avoids the complexity that may arise when using the crex and, by not using component-based rendering exclusively, may produce better results. Figure 5.19 demonstrates how the two selective variables affect a rendered image, and the perceptual differences, as captured by the VDP, between them.
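A minimal sketch of the specular threshold test follows. The mapping from pixel importance to a per-pixel threshold is a hypothetical illustration, since the text above does not specify the exact modulation used.

```python
def should_spawn_specular(material_specular, specular_threshold):
    """The specular rendering threshold as a selective variable: a
    specular ray is only shot when the material's specular coefficient
    exceeds the threshold in force at that pixel."""
    return material_specular > specular_threshold

def threshold_for(importance, lo=0.05, hi=0.9):
    """Hypothetical mapping from importance (0..1) to a threshold:
    salient pixels get a low threshold, so more materials spawn rays."""
    return hi - importance * (hi - lo)

for imp in (0.0, 0.5, 1.0):
    t = threshold_for(imp)
    print(imp, round(t, 3),
          should_spawn_specular(material_specular=0.4,
                                specular_threshold=t))
```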


5.7.1 Implementation and Results

We implement a selective renderer using the specular threshold, based on the GPU-assisted selective renderer implementation of Case III in Chapter 4. Selective guidance takes the form of an importance map, as used in the selective renderer of Case IV, see Section 4.7. While we intended the pre-selective guidance to be based on rasterisation, we use a low-quality ray-traced image to represent the rasterised image, since the Snapshot software, introduced in Section 4.6, was not fully implemented at the time this work was done. The Snapshot software is, however, used for generating task maps. The selective renderer uses two selective variables: either rays per pixel only, or both rays per pixel and the specular threshold. Jittered stratified sampling was used for pixel sampling. The scenes we use for the experiments are shown in Figure 5.20 (top). The selective guidance used for these scenes is shown in Figure 5.20 (bottom). The scenes chosen had a variety of specular surfaces to highlight the effects of this renderer. The rendering was performed with default rendering settings at 16 rays per pixel and high quality specular settings. The irradiance cache used was pre-computed. All images were rendered on an Intel Pentium 4 2.4 GHz under Linux. Results can be seen in Table 5.7. The traditional rendering is labelled gold, the selective rendering with only rays per pixel is labelled sel-rp and the selective renderer with both rays per pixel and the specular threshold is labelled sel-st. As can be seen, the speedup varies depending on the scene. The additional speedup of sel-st over sel-rp, while not very large, is still worthwhile. This selective renderer was used in [SDL+ 05] to validate animations rendered using a task map, a saliency map and an equally weighted importance map, IM(0.5, 0.5, +). Users failed to notice the difference between animations rendered selectively and in high quality for the Corridor scene. Timing results for these animations can be seen in Figure 5.21, where HQ, IQ, SQ, TQ, TWFQ and LQ represent high quality, selective quality with importance map, selective quality with saliency map, selective quality with task map, selective quality with task map without foveal angle gradient and low quality respectively.

                      Corridor   Desk    Library   Art Gallery
gold                  4,309      3,105   15,098    2,591
sel-rp                1,007        419    2,531      331
speedup                4.28        7.41    5.97       7.83
sel-st                  885        372    1,801      311
speedup                4.87        8.35    8.38       8.33
speedup (st vs. rp)    1.14        1.13    1.40       1.06

Table 5.7: Speedup for the specular threshold component-based renderer. Timings in seconds.


Figure 5.20: Scenes used for results for the specular threshold component-based renderer. From left to right: the Corridor scene, the Desk scene, the Library scene and the Art Gallery scene. The top row shows rendered images and the bottom row selective guidance.

[Plot: rendering time (seconds, 0 to 200) against frame number (0 to 300) for the Corridor animation, with curves for HQ, TQ, IQ, SQ, TWFQ and LQ.]

Figure 5.21: Results for the animations used for validating various selective guidance methods using the specular threshold component-based renderer.

5.8 Summary

We have demonstrated how rendering can be subdivided into smaller computations, which further adds to the flexibility of our rendering systems, a useful property that will benefit a number of applications, as we shall show in the latter chapters of this thesis.

We have presented a framework for time-constrained selective rendering using components. Primarily, the component-based selective rendering framework directed by the crex introduces a novel level of flexibility for selective and progressive rendering. We also showed how the crex can be used for selective rendering, using the selective guidance to influence the crex on a per-pixel basis. Furthermore, in our experiment, we showed that participants were not able to distinguish between the quality of the HQ and CBQ renderings when they were performing a visual task. The crex results from Section 5.4 show the potential of component-based rendering for profiling and rendering within given deadlines for both traditional and selective rendering approaches. The time-constrained component-based approach provides better results than the selective time-constrained renderer when noise reduction or antialiasing is the priority, and can potentially scale better to lower time constraints. A future use of the crex may be to incorporate it into an automatic perceptual rendering framework, which would benefit from the general advantages of the crex. Another future use of the selective component-based rendering presented here, which we do not explore, is in combination with rasterisation, whereby the first pass is rendered using fast graphics hardware in a fraction of the time it takes to shoot the primary rays. This pass can also be used to identify the intersection points for the primary rays. The crex approach is then used for all subsequent computations. This is an ideal marriage, since rasterisation is well suited to fast, simple direct lighting calculations, ray tracing is well suited to the computation of global illumination, and the crex can be used as a flexible method of controlling these calculations. This approach partially solves the aliasing and noise problems that are usually addressed with higher ray per pixel counts, since at least the direct lighting for the primary intersections could be computed at high resolution. Furthermore, timing problems with rendering the first pass, such as those seen in Section 5.4.3, would be mostly solved, since the primary computation would be orders of magnitude faster.


Chapter 6

Accelerating the Irradiance Cache through Parallel Component-Based Rendering

In this chapter we take a step back from selective rendering algorithms to present novel work derived from our approach of viewing the rendering computation at a finer granularity than the pixel level. The component-based rendering work of the previous chapter will be used to accelerate distinct parts of the rendering process described in Chapter 4, and can be used either as part of a traditional renderer or as part of a selective renderer. With this in mind, in this chapter we present a parallel irradiance cache algorithm that benefits from a component-based approach to rendering. Towards the end of the chapter we incorporate this technique into a selective renderer.

6.1 Introduction

The irradiance cache, as introduced in Section 2.9.1, has become a fundamental algorithm for rendering high-fidelity images using global illumination [Her04], whether as a stand-alone algorithm for computing indirect diffuse values [LS98] or when used in conjunction with photon mapping [Jen01]. Also, as we have seen in previous chapters, the irradiance cache is one of the principal components of our selective renderers. As we have already mentioned for the Case IV selective renderer in Chapter 4, the irradiance cache is a shared data structure that is notoriously hard to parallelise efficiently on distributed systems, particularly since the irradiance cache is at its most efficient when each cached sample can be used immediately. However, sharing implies communication overheads induced by having to transmit values; thus, there is a trade-off between cache misses and sharing frequency.



This has been addressed in previous approaches whereby groups of cached values are either stored at a central node and then retrieved by other nodes, or broadcast to every node whenever some threshold is reached. We propose a different approach to solving the problem. By conforming to the philosophy of the irradiance cache, we subdivide computation at the component level and transfer any indirect diffuse computations to a set of dedicated indirect diffuse rendering nodes, where indirect values are computed and stored. Sharing amongst the reduced set of indirect diffuse nodes occurs with higher frequency than amongst other nodes, reducing the sharing overhead while still maintaining a high irradiance cache hit ratio. To the best of our knowledge, this is the first parallel rendering approach to decompose the rendering problem into components as a means of making best use of the available resources. In order to demonstrate our approach we implemented a number of parallel renderers, all extensions of RADIANCE. Two of these represent the traditional methods [KMG99, RCLL99], while the third uses the novel component-based approach to compute the irradiance cache in parallel. This chapter is organised as follows. In Section 6.2 we demonstrate the motivation behind our approach. In Section 6.3 we present our own implementations of traditional approaches to parallelising the irradiance cache. In Section 6.4 we describe our novel parallel component-based approach to the irradiance cache. In Section 6.5 we compare results from the traditional renderers and the new parallel renderer. In Section 6.6 we combine this work with selective rendering. Finally, in Section 6.7 we summarise our work and describe possible future work.

6.2 Irradiance Cache Analysis

The irradiance cache is important for a number of rendering applications: rendering still images, rendering animations and, potentially, interactive environments. In this section we analyse the performance of the irradiance cache in RADIANCE. To demonstrate its behaviour we present a series of results for still images and animations. We also present some results that demonstrate the behaviour of a parallel irradiance cache when not sharing values and when using the broadcast method to distribute samples.

6.2.1 Stills

The irradiance cache improves rendering times by allowing the object space to be undersampled and the indirect diffuse values to be extrapolated if there are previously evaluated values within a user-defined search radius. The number of extensively calculated indirect values is thus drastically reduced.


Figure 6.1: Irradiance cache misses, $T_{amb}$ and $T_{notamb}$ (a small constant time of 7.3 seconds).

Figure 6.2: High-fidelity rendering examples using an irradiance cache for animations: (left) the corridor scene and (right) the Art Gallery scene. tically reduced. However, the rendering times of images of acceptable quality from scenes with significant ratios of diffuse reflectors is still dominated by the diffuse interreflection component. The rendering time is thus strongly correlated with the number of indirect diffuse computations as shown in Figure 6.1, for the Kalabsha temple, see Figure 6.7 (bottom middle), rendered on a single machine with different diffuse interreflections quality parameters. We set the ambient accuracy parameter of R ADIANCE, which controls the acceptable search radius for extrapolation, from 0.3 to 0.1. Smaller values correspond to a smaller search radius, thus resulting in more indirect diffuse calculations, or irradiance cache misses (#ICmisses). It can be seen that Tamb , time spent on indirect diffuse (ambient) calculations, increases linearly with the number of cache misses, while the time spent on other components (Tnotamb ) remains constant (around 7.3 seconds).

6.2.2 Animations

For animations, we render frames as a sequence of short animations and then again as a single long animation, in each case starting with an empty irradiance cache. This is done for two animations.


[Two plots: rendering time (seconds) against frame number, comparing Sparse IC re-use, IC re-use and No IC computations, for frames 0 to 120 (Corridor scene, left) and 0 to 60 (Art Gallery scene, right).]

Figure 6.3: Irradiance cache analysis: (left) Corridor scene and (right) Art Gallery scene.

The first animation is a 128-frame straight walkthrough of the Corridor scene, see Figure 6.2 (left), and the second a 64-frame walkthrough using a rotation in the Art Gallery scene, from the room seen in Figure 6.2 (right) through the aperture on the left-hand side of the image. Timing results can be seen in Figure 6.3. Initially we render the scene at intervals of 8 consecutive frames (sparse IC re-use) for the Corridor scene and 4 frames for the Art Gallery scene. Subsequently, we re-render the entire animation in one pass (IC re-use), and the same animation without diffuse interreflections to highlight their computational cost (no IC computations). The results demonstrate that the majority of the computation is taken up by the first few frames. This is the point at which the irradiance cache is being constructed, and is indeed where the largest number of cache misses occurs. For the Corridor scene, the number of irradiance cache misses for the first frame is ≈ 20,000, as the irradiance cache is initially empty. After a few frames the cache misses drop to under 500. A similar pattern can be observed for the Art Gallery scene, where even though certain later frames (from frame 30 onwards), when entering the new room, are more expensive to compute (for the sparse IC re-use case), the expense is absorbed gradually through re-using the irradiance cache values. For the Art Gallery scene, the number of cache misses on an empty cache is ≈ 50,000, and this falls to below 1,000 after a few frames. The general behaviour of the irradiance cache, as can be seen from both the stills and the animations, indicates that it is important to share irradiance cache values. Cache misses can further compound computational losses in highly dynamic environments, where an irradiance cache might have to be computed for nearly every frame. With this in mind, we claim that to make the most of distributed computation resources for rendering still images, animations and navigating interactive environments that use an irradiance cache, a cooperative effort to subdivide the work of rendering each individual frame amongst the available resources is required for frames that would result in a large number of cache misses (typically the first frames). This is obvious for still images, and for navigating environments interactively, where image space decomposition is the status quo. For animations, the approach could be to render the frames that result in irradiance cache misses using image plane decomposition, and the subsequent frames either individually on different nodes, by first sharing the built irradiance cache, or by continuing with the same approach. Either way, and for both of the other cases, it is apparent that what is needed is a speedier way of computing the few frames including and immediately after those where there is a large number of cache misses.


Either way, and for both of the other cases, it is apparent that what is needed is a faster way of computing the few frames at, and immediately after, those where there is a large number of cache misses.

Figure 6.4: Irradiance cache analysis for no sharing and broadcast.

6.2.3 Parallel Rendering

A parallel implementation of the irradiance cache on a distributed memory system might compromise efficiency due to replicated computations across the processors. Since each process has its own address space, the processes cannot use each other's cached indirect values, increasing the aggregated time spent computing these (the sum of Tamb across all processing nodes). A mechanism must therefore be provided to share cached values. Traditional approaches include a centralised server and a broadcast mechanism, as first described in Section 2.10.3. Higher sharing frequencies result in lower irradiance cache miss rates. However, even with very high sharing frequencies, there will always be a latency associated with sharing, meaning that a given value might not be available when required even if it has already been computed elsewhere.


Additionally, sharing implies overheads: communication bandwidth and processor computation time are spent preparing, transferring and reading the messages. Sharing overheads and latency increase with both the sharing frequency and the number of processors. Figure 6.4 (top) shows, for the Corridor scene (shown in Figure 6.7, top right), that irradiance cache misses are higher for the parallel no-sharing implementation than for the sequential single-processor version, and that they increase with the number of processors, resulting in larger aggregated indirect diffuse calculation times. Figure 6.4 (bottom) shows that sharing the irradiance cache values when rendering the Corridor scene, using the broadcast parallel irradiance cache approach described in Section 4.7.3, alleviates this problem, but it is still present and still depends on the number of processors. Increasing the number of processors and the sharing frequency is desirable to reduce execution time, but both incur penalties that impact on efficiency. We propose to subdivide the processing elements into two sets: one dedicated to computing only the indirect diffuse component, and the other to computing the remaining components of the illumination model. By reducing the number of processors that contribute to the irradiance cache, the sharing frequency amongst these can be kept high, reducing both irradiance cache misses and sharing overheads. Cached values are shared with the remaining processing nodes at a lower frequency, in order to reduce the number of indirect diffuse calculation requests forwarded to the specialised processors.

6.3 Traditional Parallel Irradiance Cache Approaches

In this section we outline our own implementations of the traditional approaches to parallelising the irradiance cache. All our implementations use the Message Passing Interface (MPI). These implementations will prove useful for evaluating the performance of the new component-based approach.

6.3.1 The Centralised Approach

Our centralised parallel irradiance cache is influenced by the standard parallel version of RADIANCE and the work of Robertson et al. [RCLL99]. A central node, usually the master controller, is used to exchange irradiance cache samples after they are computed by the individual nodes. The software architecture of this approach is outlined in Figure 6.5. The image plane is subdivided amongst processors by a master controller (MC) in a demand-driven fashion. The initial image tiles are sent to the individual nodes, or processing elements (PEs), and when these are computed, the resulting image tile is sent back to the MC together with a request for new work. When the MC has run out of work, a PE's request for new work is satisfied by means of a message indicating an end of frame.


Figure 6.5: Centralised parallel irradiance cache.

In our implementation of the centralised parallel irradiance cache, each PE computes irradiance cache values and stores them in an outgoing buffer. Whenever the buffer reaches a user-defined threshold, it is transmitted to the MC. The MC maintains a list of all the samples sent to it, together with a running record of which samples each slave has been given so far. When a buffer is received by the MC, the set of samples that are new since the last communication with the sending PE is sent back. This approach is simple to implement, since only one process is needed for every slave, which can handle both communication and computation.
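A minimal sketch of the PE-side exchange logic is given below. It uses hypothetical names (IrradSample, BUF_THRESHOLD, insert_into_local_cache) rather than the actual RADIANCE data structures; the MC-side bookkeeping of per-slave deltas is analogous.

    /* PE-side sketch of the centralised irradiance cache exchange.
       All type and helper names here are illustrative. */
    #include <mpi.h>
    #include <stdlib.h>

    #define BUF_THRESHOLD 50        /* samples buffered before an exchange */
    #define TAG_IC_EXCHANGE 1

    typedef struct {
        float pos[3], norm[3], irrad[3];
        float radius;
    } IrradSample;

    void insert_into_local_cache(const IrradSample *s, int n);  /* assumed */

    static IrradSample outbuf[BUF_THRESHOLD];
    static int outcount = 0;

    /* Called whenever this PE computes a new irradiance cache value. */
    void record_sample(const IrradSample *s, int mc_rank, MPI_Comm comm)
    {
        outbuf[outcount++] = *s;
        if (outcount < BUF_THRESHOLD)
            return;

        /* Send our new samples to the master controller... */
        MPI_Send(outbuf, outcount * (int)sizeof(IrradSample), MPI_BYTE,
                 mc_rank, TAG_IC_EXCHANGE, comm);
        outcount = 0;

        /* ...and receive the samples computed elsewhere since our last
           exchange; the MC tracks what each slave has already received. */
        MPI_Status st;
        int nbytes;
        MPI_Probe(mc_rank, TAG_IC_EXCHANGE, comm, &st);
        MPI_Get_count(&st, MPI_BYTE, &nbytes);
        IrradSample *in = malloc(nbytes);
        MPI_Recv(in, nbytes, MPI_BYTE, mc_rank, TAG_IC_EXCHANGE, comm, &st);
        insert_into_local_cache(in, nbytes / (int)sizeof(IrradSample));
        free(in);
    }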

6.3.2 The Broadcast Approach

The broadcast approach is influenced by the work of Koholka et al. [KMG99]. Whenever a group of irradiance cache values is computed, they are broadcast to the other nodes. The broadcast version of our implementation was described in Section 4.7.3.
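A sketch of the sharing step follows, reusing the illustrative IrradSample type from above; each node sends its batch of new samples directly to every peer rather than to a central server. In practice a separate communicator process (or non-blocking sends) would be used to overlap this communication with computation, as described in Section 4.7.3.

    /* Broadcast-style sharing sketch: every SHARE_THRESHOLD new samples
       a node sends its batch to every peer (names illustrative). */
    #define SHARE_THRESHOLD 50

    void share_samples(const IrradSample *buf, int n,
                       int myrank, int nprocs, MPI_Comm comm)
    {
        for (int peer = 0; peer < nprocs; peer++) {
            if (peer == myrank)
                continue;
            MPI_Send((void *)buf, n * (int)sizeof(IrradSample), MPI_BYTE,
                     peer, TAG_IC_EXCHANGE, comm);
        }
    }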

6.4 Component-Based Parallel Irradiance Cache

In this section we present our novel parallel irradiance cache algorithm. We begin with the theory behind our work and subsequently describe the component-based parallel algorithm and implementation.


6.4.1 Rendering by Components

In this section we re-use the component-based rendering theory first described in Section 5.2.1, but concentrate on the aspects that are specific to this case. Again we start from the rendering equation, Equation 2.4. The radiance at a pixel (x, y) in direction −Θ which intersects an object in the scene at point p is:

    L(x, y) = L_e(p \to \Theta) + \int_{\Omega_p} f_r(p, \Theta \leftrightarrow \Psi)\,\cos(N_p, \Psi)\,L(p \leftarrow \Psi)\,d\omega_\Psi

The total set of Ψ directions distributed over the hemisphere Ω_p can be conceptually subdivided into subsets of directions, commonly thought of as lighting components. In RADIANCE, lighting calculations are subdivided into direct, indirect specular (including glossy) and indirect diffuse components [War94]. The direct and specular components are removed from the integral by spawning rays into the appropriate directions, but for diffuse interreflections the integral must be approximated using Monte Carlo integration techniques. The previous equation becomes

    L(x, y) = L_e(p \to \Theta) + L_d(p \to \Theta) + L_s(p \to \Theta) + L_a(p \to \Theta)

where L_d stands for direct illumination, L_s represents the specular contribution and L_a the diffuse interreflections, or ambient component. For the latter component only diffuse interactions are computed, thus f_r(p, Θ ↔ Ψ) = k_a(p), and L_a(p → Θ) is given by

    L_a(p \to \Theta) = \int_{\Omega_{p,a}} k_a(p)\,\cos(N_p, \Psi)\,L(p \leftarrow \Psi)\,d\omega_\Psi

where directions included in the direct and specular components are excluded from Ω_{p,a}. Using Monte Carlo integration:

    L_a(p \to \Theta) \approx \langle L_a(p \to \Theta) \rangle = \frac{\pi k_a(p)}{N} \sum_{i=0}^{N-1} \cos(N_p, \Psi_i)\,L(p \leftarrow \Psi_i)

Indirect diffuse lighting computations are required not only for primary rays, but at every level of recursion b along the specular paths. Subscripted ordinal prefixes will be used to refer to points and coefficients at different levels of recursion along the specular path, with {}_1p and {}_1Θ referring to the primary ray intersection. To correctly weight the ambient contribution for pixel (x, y) at recursion level b, the specular coefficients {}_jT = f_r({}_jp, {}_jΘ ↔ {}_jΨ) cos(N_{{}_jp}, {}_jΨ) must be rippled down the path, resulting in

    \langle L_a({}_bp \to {}_b\Theta) \rangle = \frac{\pi k_a({}_bp) \prod_{j=1}^{b-1} {}_jT}{N} \sum_{i=0}^{N-1} \cos(N_{{}_bp}, {}_b\Psi_i)\,L({}_bp \leftarrow {}_b\Psi_i)    (6.1)
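As an illustration of Equation 6.1, the sketch below shows how the specular coefficient could be rippled down a path and used to weight the final ambient estimate. All names (Ray, pixel_index, the field layout) are ours for illustration, not RADIANCE internals.

    #include <math.h>

    /* Illustrative ray record carrying the rippled coefficient and the
       destination pixel down the specular path. */
    typedef struct Ray {
        double T[3];     /* accumulated jT product along the path */
        int px, py;      /* pixel (x, y) this path contributes to */
    } Ray;

    int pixel_index(int px, int py);   /* hypothetical helper */

    /* When spawning a specular child: child T = parent T * f_r * cos. */
    void ripple_coefficient(const Ray *parent, Ray *child,
                            const double fr[3], double cosine)
    {
        for (int c = 0; c < 3; c++)
            child->T[c] = parent->T[c] * fr[c] * cosine;
    }

    /* Monte Carlo ambient estimate at the path's end (cf. Equation 6.1):
       (pi * k_a / N) * sum_i cos_i * L_i, weighted by the rippled T. */
    void add_ambient(double image[][3], const Ray *r, const double ka[3],
                     int N, const double cosine[], const double L[][3])
    {
        double sum[3] = { 0.0, 0.0, 0.0 };
        for (int i = 0; i < N; i++)
            for (int c = 0; c < 3; c++)
                sum[c] += cosine[i] * L[i][c];
        int idx = pixel_index(r->px, r->py);
        for (int c = 0; c < 3; c++)
            image[idx][c] += r->T[c] * M_PI * ka[c] * sum[c] / N;
    }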

6.4.2 Component-Based Approach

The computation of Equation 6.1 requires sampling the hemisphere over a large number N of directions, and becomes prohibitively expensive if done for each intersection point and for a significant number of indirect diffuse bounces. The direct and specular components are calculated for each intersection point, but hemispherical sampling occurs less frequently by reusing values stored in the irradiance cache. Populating the irradiance cache is still time consuming, though. The problem is further compounded on a distributed system, since it is important, particularly at the beginning of the calculation, to share data quickly while trying to avoid communication latency to maximise computation. Using a component-based approach, the components can be computed independently of each other and composited on the image plane by adding the respective contributions for each pixel. The shading of each intersection point can be spatially decomposed into two major tasks: direct plus indirect specular, and indirect diffuse calculations. Indirect diffuse lighting requests are forwarded to a subset of dedicated nodes together with data about the intersection point p, the pixel coordinates (x, y) and the rippled coefficient. Allocating a reduced set of nodes to this task allows for more frequent data sharing, thereby reducing the number of irradiance cache misses and thus lowering rendering times.

6.4.3 Component Subdivision

We divide the distributed nodes into two subgroups. The first group, the PAs (processors of indirect diffuse, or ambient, calculations), is dedicated to the indirect diffuse calculations. The second group, the PRs (processors of the rest of the rendering), is responsible for the traditional rendering, except when the first indirect diffuse calculation is required. Figure 6.6 illustrates an overview of our system architecture. As in the broadcast method, the PAs overlap communication and computation using distinct processes. The processes communicate using shared memory.


Figure 6.6: Component-based parallel irradiance cache.

The PRs obtain work from the master controller, MC, using a traditional master-slave, image decomposition approach, in a manner similar to that described in Section 6.3.1. The PRs take on the role of intersecting all forms of primary and secondary rays, and calculating the shading until the first indirect diffuse calculation is required. At each intersection point where shading is required, the coefficient jT from Section 6.4.1 is calculated by multiplying the coefficient of the shader with the value of the spawned ray's parent, and is stored within the ray's data structure. Initially, for primary rays, jT is set to 1. The pixel coordinates to which the ray is contributing are also passed down as a parameter from parent rays to spawned rays. When the first indirect diffuse calculation is required, the PR's local irradiance cache is consulted. If a cache miss occurs, the essential attributes from the data structure of the current ray being calculated, including pixel coordinates, intersection point, a reference to the intersected object, jT, etc., are stored in an outgoing indirect diffuse request buffer. Whenever the outgoing buffer reaches a threshold, which we term the sending threshold, it is sent to a particular PA, selected on a round robin basis by each PR. When the radiance of a pixel is finally calculated (this might be only partial, since the indirect diffuse computation is migrated to one of the PAs), the result is stored in a buffer representing the image plane. Whenever the PR runs out of rays, it requests a new image tile from the MC. For reasons which will become apparent below, and unlike the approaches described in Section 6.3, the image plane buffer is only sent back at the end of the frame calculation. The PRs also maintain a local irradiance cache which, with time, will reduce the number of outgoing indirect diffuse requests. The PRs' irradiance cache is synchronised with that of the MC using the same techniques as the centralised approach. This synchronisation is of course less frequent than that of the PAs, and occurs only when asking the MC for image plane data.
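A sketch of the PR-side forwarding step described above, with illustrative names throughout: on an irradiance cache miss the essential ray attributes are appended to the outgoing buffer, which is flushed to a PA chosen round robin once the sending threshold is reached.

    #define SEND_THRESHOLD 64      /* illustrative sending threshold */
    #define TAG_DIFFUSE_REQ 2

    typedef struct {
        float point[3];            /* intersection point p */
        int obj_id;                /* reference to the intersected object */
        float T[3];                /* rippled specular coefficient jT */
        int px, py;                /* destination pixel coordinates */
    } DiffuseRequest;

    static DiffuseRequest reqbuf[SEND_THRESHOLD];
    static int reqcount = 0;
    static int next_pa = 0;        /* round robin index into the PA ranks */

    /* Called by a PR on an irradiance cache miss at the first indirect
       diffuse calculation along a path. */
    void forward_diffuse_request(const DiffuseRequest *req,
                                 const int pa_ranks[], int n_pas,
                                 MPI_Comm comm)
    {
        reqbuf[reqcount++] = *req;
        if (reqcount == SEND_THRESHOLD) {
            MPI_Send(reqbuf, reqcount * (int)sizeof(DiffuseRequest),
                     MPI_BYTE, pa_ranks[next_pa], TAG_DIFFUSE_REQ, comm);
            next_pa = (next_pa + 1) % n_pas;   /* next batch, next PA */
            reqcount = 0;
        }
    }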


The PAs are dedicated solely to the indirect diffuse calculations. They are sent work directly from the PRs in the form of the indirect diffuse request buffer, as discussed above. The values in the indirect diffuse request buffer are reconstructed into rays, and the PA's function is then to calculate the indirect diffuse contribution from the intersection point stored in each ray. This involves shading and may also involve further recursive indirect calculations, including indirect diffuse calculations. All these calculations are performed locally, and the final calculated radiance is multiplied by the original jT coefficient and stored at the appropriate pixel coordinates in the image plane buffer. The irradiance cache values stored in the PAs are shared amongst each other and with the MC much more frequently than with the PRs. This ensures that the irradiance cache samples are readily available on the PAs when required. As with the PRs, the image plane buffer is only sent back at the end of the frame computation. This approach avoids sending computed radiance values back as messages to the PRs originally responsible for the pixel. This is a significant advantage, since it allows PRs to avoid the complex synchronisation required to wait for and then store the final result for each pixel before sending the result back, improving computational performance. The MC composites all image plane buffers at the end of the frame to generate the final rendered image.
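Because each radiance contribution is accumulated additively into a per-node image plane buffer, and addition is order-independent, the MC can composite the final frame with a simple per-pixel sum and no per-pixel synchronisation between PRs and PAs is needed. A sketch, with illustrative names:

    /* MC-side compositing sketch: the final frame is the per-pixel sum
       of the image plane buffers returned by all PRs and PAs. */
    void composite_frame(float *final_img, float *const node_buffers[],
                         int n_nodes, int n_pixels)
    {
        for (int p = 0; p < 3 * n_pixels; p++) {   /* interleaved RGB */
            final_img[p] = 0.0f;
            for (int n = 0; n < n_nodes; n++)
                final_img[p] += node_buffers[n][p];
        }
    }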

6.4.4 Load Balancing

A major issue with partitioning the processors into two subsets is how to balance the load between PAs and PRs. While a correctly chosen PA to PR ratio helps to minimise load imbalance, in reality optimal static load balancing is hard to achieve, both due to the unpredictable nature of ray tracing and because PRs keep forwarding requests up to the end of their assigned tasks, and these requests take much longer to compute than the remaining components. In order to balance the system, we allow PRs to change state and download work from PAs whenever the MC has run out of image tiles. Every time a PA synchronises its irradiance cache with the MC, it also sends the number of pending requests in that PA's queue. Since this happens often, the MC has overall knowledge of the load on every PA. When a PR requests a new task and all image tiles have already been assigned, the MC selects the most loaded PA and signals it that some of its load should be sent to that PR. Upon reception of this signal, the PA's communicator process forwards a fraction of its load to the PR. It is the PA's communicator that decides how many pending requests to send to the PR, since it has access to the queue's current length. This process is repeated until all the workload has been processed. At this load balancing stage all processors synchronise their local irradiance cache with the MC at the same frequency as the PAs, but in reality this rarely occurs because few new irradiance cache samples are generated.
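The MC's selection step is straightforward; a sketch, assuming the queue lengths reported at each synchronisation are kept in an array (names illustrative):

    /* Pick the PA with the most pending requests when a PR asks for
       work and no image tiles remain; pending[] is refreshed at every
       irradiance cache synchronisation. */
    int select_most_loaded_pa(const int pending[], int n_pas)
    {
        int best = 0;
        for (int i = 1; i < n_pas; i++)
            if (pending[i] > pending[best])
                best = i;
        return best;           /* MC then signals this PA to offload */
    }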


Figure 6.7: Scenes used for results and analysis. Top row, left to right: the Tables scene, the Cornell Box and the Corridor scene. Bottom row, left to right, the Library scene, the Temple of Kalabsha scene and the Art Gallery scene.

For animations, we run an overture sub-sampling pass of a few rays per node to give an overall indication of the number of irradiance cache misses. We find that for an initially empty irradiance cache, a 50:50 weighting of PAs to PRs produces the best results. Whenever the irradiance cache is not empty, the number of PAs is set to half of the estimated cost of the irradiance cache misses relative to the irradiance cache hits. This heuristic has worked well in practice. Finally, when the estimated ratio of irradiance cache misses is less than the number of nodes, all nodes are set to a new status where they compute the entire path, effectively rendering this method similar to the traditional approaches. Future metrics could couple this knowledge with the number of objects and reflective surfaces in the scene and the user settings for indirect diffuse bounces.
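One possible reading of this heuristic is sketched below; both the function and its exact thresholds are our own illustration, where miss_ratio is the fraction of overture samples that missed the cache:

    /* Illustrative split of nodes into PAs and PRs from the overture. */
    int choose_num_pas(double miss_ratio, int n_nodes)
    {
        if (miss_ratio >= 1.0)           /* empty cache: 50:50 split */
            return n_nodes / 2;
        if (miss_ratio * n_nodes < 1.0)  /* very few misses expected: */
            return 0;                    /* all nodes run traditionally */
        /* otherwise, half the estimated relative cost of the misses */
        return (int)(0.5 * miss_ratio * n_nodes + 0.5);
    }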

6.5 Results

We present two sets of results running on different clusters for still images and one set of results for animations.


Figure 6.8: Timings for the Temple of Kalabsha scene (top-left), the Corridor scene (top-right) and the Cornell Box (bottom).

6.5.1 Still Images

This set of results was obtained with a cluster of 12 machines, each with two Intel Xeon processors at 3.2 GHz and 2 GB of memory, under Linux. All the nodes are connected by a 1 Gbit switch. 2, 4, 8 and 12 nodes were used to run these experiments (4, 8, 16 and 24 processors), with an additional node acting as the master controller. In order to demonstrate the potential of this approach, still images were rendered from three selected scenes. The Temple of Kalabsha scene (resolution 672 × 512) and the Corridor scene (resolution 512 × 512) were rendered with 1 indirect diffuse bounce and an ambient accuracy of 0.2. The Cornell Box (resolution 512 × 512) was rendered with 2 indirect diffuse bounces and an ambient accuracy parameter of 0.1. The scenes can be seen in Figure 6.7, together with the other scenes used for results in this chapter. Default RADIANCE parameters were used for all other settings. Plots of the achieved speedups can be seen in Figure 6.8. Results are shown for a parallel version that never shares any irradiance cache values (no sharing), the centralised approach, the broadcast approach and the new component-based approach. Results reflect speedup when compared to the dedicated uniprocessor version of RADIANCE running under the same settings. The centralised and broadcast implementations synchronise the irradiance cache for every 50 new indirect samples, as suggested by Koholka et al. [KMG99]. For the component-based approach, half the processors were allocated as PAs; these synchronise for every 8 new samples.


Figure 6.9: Aggregated timings showing where the component-based approach obtains its speedup.

It is clear that the new algorithm outperforms the other algorithms in all cases. The speedup gain for each scene relative to the second best algorithm is shown in Table 6.1. This is due to the reduction in irradiance cache misses compared with the other parallel approaches, as illustrated in Figure 6.9 for the three scenes running on 24 processors.

    Processors    Kalabsha (%)    Corridor (%)    Cornell Box (%)
    4             10.3            15.5            12.6
    8             10.7            12.1             8.8
    16            11.7            11.9            13.2
    24             8.8             6.9            10.0

Table 6.1: Speedup gains relative to the second best algorithm, in percentages (%).

This reduction is achieved by increasing the frequency of the irradiance cache sharing operation, while at the same time increasing its locality by restricting it to the PAs. With 4 processors, the component-based approach also achieves super-linear speedup for the Corridor (speedup 4.1, efficiency 103.1%) and Cornell Box (speedup 4.2, efficiency 105.7%) scenes. For a larger number of processors the speedup remains noticeably close to linear; for the Cornell Box scene, efficiency never falls below 90%, even with 24 processors. It is notable that super-linear speedup is never achieved by either of the other two renderers. This is a highly encouraging result, indicating that the new component-based renderer is, in these cases, more efficient even than the traditional uniprocessor approach. This is because, for all scenes, the number of irradiance cache misses registered with the new approach is lower than with the sequential uniprocessor version, as shown in Figure 6.10 for the three scenes. The reordering of the indirect diffuse sample computations results in a higher hit ratio compared to the sequential version, where these are computed in raster order. The same effect is observed with the centralised and broadcast approaches, but to a lesser extent, insufficient to achieve super-linear speedup. These results can also be attributed to making better use of the processors' physical caches, particularly since the PAs contribute only to the indirect calculations and the cache footprint is not corrupted by the other components. These results indicate that the component-based approach may also be suitable for shared-memory implementations. With respect to load balancing, Figure 6.11 shows the percentage of aggregated idle time, computed as the ratio of the sum of idle times across all processors to the aggregated rendering time, i.e. the rendering time multiplied by the number of processing elements. It can be seen that, up to 24 processors, idle times never go above 5.5%; we can conclude that the system is reasonably balanced, although the imbalance increases with the number of processors.

6.5.2 More Still Images

We demonstrate further results for computing irradiance caches for still images. These extra results were taken because the irradiance cache synchronisation setting used previously was not optimised for that network, and to present results with a larger number of indirect diffuse bounces.


Figure 6.10: Irradiance cache misses.

In this set of results we use a more optimised setting, calculated empirically by first finding a rough indicator of the ideal message size for node-to-node communication with the least latency, and then testing the values with a running implementation. This setting was 14, and 8 for the component-based approach. Unfortunately, a smaller cluster had to be used for these results. The system used was a cluster of 8 dual Intel Xeon processors running at 2.4 GHz with 3 GB of memory under Linux, and a single workstation with a single 2.53 GHz Intel processor and 1 GB of memory acting as the frontend for the parallel implementations. All the nodes were connected by a 1 Gbit switch.


Figure 6.11: Idle times due to load imbalance.

Figure 6.12: Timings for the Tables scene (left) and the Cornell Box (right).

In order to demonstrate the potential of our approach, we rendered still images from a wide variety of realistic and test scenes. We render the view of the Tables scene (512 × 512 resolution) with 1 indirect bounce. The Cornell Box (512 × 512 resolution), the Corridor scene (512 × 512 resolution), the Library scene (512 × 481 resolution) and the Kalabsha scene [SCM04] (600 × 400 resolution) were rendered with 2 indirect diffuse bounces. Finally, we render the Art Gallery [LS98] (600 × 400 resolution) with 3 indirect diffuse bounces; see Figure 6.7. Default RADIANCE parameters were used for all other settings. Plots of the results can be seen in Figure 6.12, Figure 6.13 and Figure 6.14. We show the results for the centralised approach (centralised), the broadcast approach (broadcast), the new component-based approach (component-based) and a parallel version that never shares any irradiance cache values (no sharing). Each renderer was run on 2, 4, 8 and 16 processors, and the results reflect speedup when compared to the dedicated uniprocessor version of RADIANCE running under the same settings.

Figure 6.13: Timings for the Corridor scene (left) and the Library scene (right).

Figure 6.14: Timings for the Temple of Kalabsha scene (left) and the Art Gallery (right).

It is clear from all the results that the new algorithm outperforms the other algorithms in all cases, and for many scenes achieves near-linear speedup up to 16 processors. As with the previous results, the component-based approach also achieves super-linear speedup for smaller numbers of processors; in most scenes this can be observed up to 8 processors.

6.5.3 Animation

We render animations for two scenes, the Corridor scene and the Art Gallery; see Figure 6.2. The Corridor scene is rendered at a resolution of 512 × 512 with 2 indirect diffuse bounces. The Art Gallery is rendered at 600 × 400 with 3 indirect diffuse bounces. The rest of the parameters are set at the RADIANCE defaults. Animations are rendered only for the parallel renderers using 16 processors, and for the uniprocessor version, with the same hardware configuration described in Section 6.5.2. Results for the animations can be seen in Figure 6.15, and are similar for both cases. The only noticeable difference in performance is in the first frame, where results mirror those of the still images. From the second frame onwards the results for the parallel renderers are comparable, because the component-based renderer switches into the traditional mode.


Figure 6.15: Irradiance cache animation results: (left) Corridor scene and (right) Art Gallery.

6.6 Selective Parallel Rendering

In this section we combine the work from this chapter with our selective rendering work. As with the selective renderer in Case IV of Chapter 4, we extend the selective renderer to a distributed environment. Furthermore, since the primary work of this chapter involves rendering using the irradiance cache, we apply the same selective variables, rays per pixel and irradiance cache search radius, as used in the selective renderer described in Case III of Chapter 4, which provides improved speedup for the selective rendering. The selective guidance used was also the same as in Case III. This selective renderer was implemented in RADIANCE by extending the parallel component-based renderer to selectively render scenes. Figure 6.16 shows the parallel speedup of this renderer for the Cornell Box, the Tables scene and the Corridor scene, using the same parameters and views as those used for the results in Section 6.5.2. The hardware configuration was also identical to that used in Section 6.5.2. Table 6.2 shows the total speedup obtained when rendering using both parallel and selective rendering. For parallelism, timings similar to the non-selective version can be observed. For selective rendering, the speedup is quite notable overall, a vast improvement on that seen in the results section of Case IV, when rendering without a pre-computed irradiance cache. This is due to using the irradiance cache radius as a selective variable. Subsequent images in an animation would produce results similar to those observed with the simpler parallel selective renderer of Case IV in Chapter 4.

6.7 Summary

The irradiance cache is a key acceleration structure in the computation of high-fidelity graphics. Despite the order of magnitude improvement the irradiance cache can provide over traditional distributed ray-tracing, the computational times for such high quality graphics are still substantial.


                  Cornell Box              Tables                   Corridor
               Time    Psu    Tsu      Time    Psu    Tsu      Time    Psu    Tsu
    gold        450    1      1        4,057   1      1        4,500   1      1
    selective   130    1      3.46     1,793   1      2.26     1,859   1      2.42
    2            62    2.09   7.26       832   2.15   4.88       882   2.11   5.10
    4            34    3.82  13.24       457   3.92   8.88       451   4.12   9.98
    8            17    7.65  26.47       249   7.20  16.29       234   7.94  19.23
    16           11   11.82  40.91       149  12.03  27.23       141  13.18  31.91

Table 6.2: Timings for the parallel selective renderer using component-based rendering. Psu stands for parallel speedup and Tsu for total speedup resulting from both parallelism and selective rendering. Time in seconds.


Figure 6.16: Speedup for the selective parallel renderer using component-based rendering for the Cornell Box, the Tables scene and the Corridor scene.

Parallel processing is one approach which can significantly reduce the overall computational time. However, the inherently sequential nature of how the irradiance cache is created and used has previously prevented the maximum benefit being gained from this structure in any parallel implementation. In this chapter we have presented a novel method of dividing the workload amongst the processors of our cluster. This ensures that the computationally expensive part of establishing the initial irradiance cache is dealt with by dedicated processors, minimising network latency and minimising the number of redundant indirect diffuse samples that need to be calculated. Such an approach has resulted in significant performance increases in our parallel implementation for traditional and selective rendering.

Chapter 7

Component-Based Adaptive Sampling

In this chapter we demonstrate how to accelerate rendering using a component-based approach to adaptive sampling. While this approach can be applied to most renderers based on ray tracing, it can also be viewed in the context of selective rendering using separate selective guidance for different components.

7.1 Introduction

As we have seen in previous chapters, ray-traced images rely on shooting rays to calculate the radiance at a given pixel. The radiance of adjacent pixels tends to exhibit spatial coherence, whereby radiance values close to each other are similar. As we mentioned in Section 3.1, this property has been taken into consideration in the design of adaptive sampling algorithms, whereby rays are not traced at each pixel but at certain intervals, and the intermediate pixels are calculated only if the resulting difference in radiance is above a certain threshold; see Figure 7.1. These algorithms can improve speedup substantially by significantly reducing the total number of rays shot. The traditional method of image plane adaptive sampling operates at the level of the pixel's gross radiance computation. In this chapter we introduce a new flexible mechanism for adaptive sampling based on the reflectance function of a material, whereby the light that hits a surface can be divided into a number of components, as demonstrated in Chapter 5. When using ray-tracing techniques it has long been assumed that certain reflective properties of a material exhibit different behaviours. For example, indirect diffuse computations are traditionally smooth and do not change much over space, a property that the irradiance cache takes full advantage of to obtain orders of magnitude performance gains over traditional distributed ray-tracing methods.


Figure 7.1: Traditional adaptive sampling. Samples only (left) and interpolation (right).

Similarly, specular reflections exhibit high spatial frequencies, meaning that the specular contribution to the radiance of adjacent pixels hitting the same material may be substantially different even if the direct, glossy and indirect diffuse components of the material are very similar. Taking advantage of this insight, we present a novel adaptive sampling algorithm that bases its sampling criteria not on the radiance of an individual pixel but on the individual contributions of each component to the final radiance. This chapter is divided as follows. The next section presents traditional adaptive sampling as a selective cyclic process. Section 7.3 presents the component-based rendering theory specific to this case. Section 7.4 introduces our novel algorithm. Section 7.5 presents our implementation of the algorithm in RADIANCE. Section 7.6 presents results using a number of scenes and evaluates our work against the traditional method using a visible difference predictor. Finally, Section 7.7 summarises the contributions of this chapter.

7.2 Traditional Adaptive Sampling as Selective Rendering

We explain traditional adaptive sampling by mapping it to one of our selective rendering frameworks, introduced in Chapter 4: traditional adaptive sampling can be seen as a selective cyclic process. While the computation is generally performed recursively, it can be viewed progressively, as we describe below. Figure 7.2 demonstrates the framework for this process. Initially the image is rendered at a base quality corresponding to the lowest quality that will be tolerated. Rays are usually cast to evaluate pixel radiance at a selected set of pixels using a fixed stride.


Figure 7.2: Traditional adaptive sampling framework.

Subsequently, the difference in radiance between the pixels is calculated at the corners of each box (with length and breadth based on the stride), and areas that have a radiance difference above a given threshold are tagged for further subdivision. This corresponds to the selective guidance that will guide the selective rendering stage. These areas are subdivided once by calculating the pixel radiance in the middle of the stride, and the selective guidance, in the form of the radiance difference, is recalculated for those areas which were subdivided in the first pass. This process is then repeated, starting from the calculation of the selective guidance stage, until the selective guidance criteria are satisfied. The selective variable in the selective rendering stage is the amount of adaptive subdivision that will be used. All pixels that were not calculated directly are interpolated from the corner pixels. While we have described this process as adaptive sub-sampling of the image plane, the same process applies to adaptive supersampling.
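A compact sketch of this cyclic subdivision follows; trace(), contrast(), interpolate_block() and the two constants are our own illustrative names, not an existing API:

    #define MAX_DEPTH 4        /* illustrative subdivision limit */
    #define THRESHOLD 0.1f     /* illustrative radiance difference limit */

    void trace(int x, int y, float rad[3]);                   /* assumed */
    float contrast(const float *a, const float *b,
                   const float *c, const float *d);           /* assumed */
    void interpolate_block(int x0, int y0, int x1, int y1,
                           const float *c00, const float *c01,
                           const float *c10, const float *c11);

    /* Recursive adaptive image plane sampling: compare the four corner
       radiances of a box; subdivide if they differ too much, otherwise
       interpolate the interior pixels from the corners. A real
       implementation would cache corner samples rather than re-trace. */
    void adaptive_sample(int x0, int y0, int x1, int y1, int depth)
    {
        float c00[3], c01[3], c10[3], c11[3];
        trace(x0, y0, c00); trace(x0, y1, c01);
        trace(x1, y0, c10); trace(x1, y1, c11);

        if (depth < MAX_DEPTH && contrast(c00, c01, c10, c11) > THRESHOLD) {
            int mx = (x0 + x1) / 2, my = (y0 + y1) / 2;
            adaptive_sample(x0, y0, mx, my, depth + 1);
            adaptive_sample(mx, y0, x1, my, depth + 1);
            adaptive_sample(x0, my, mx, y1, depth + 1);
            adaptive_sample(mx, my, x1, y1, depth + 1);
        } else {
            interpolate_block(x0, y0, x1, y1, c00, c01, c10, c11);
        }
    }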

7.3 Rendering by Components

While we have seen how the rendering equation can be divided into components in Section 5.2.1, the component subdivision required for this approach is slightly simpler, so we present it again here for completeness. The radiance at a pixel (x, y) in direction −Θ which intersects an object in the scene at point p is given by the rendering equation:

    L(x, y) = L(p \to \Theta) = L_e(p \to \Theta) + \int_{\Omega_p} f_r(p, \Theta \leftrightarrow \Psi)\,\cos(N_p, \Psi)\,L(p \leftarrow \Psi)\,d\omega_\Psi

For convenience we set

    L_i(p \to \Theta) = \int_{\Omega_p} f_r(p, \Theta \leftrightarrow \Psi_i)\,\cos(N_p, \Psi_i)\,L(p \leftarrow \Psi_i)\,d\omega_\Psi

where Ψ_i refers to a specific direction. Traditionally it is common to subdivide the computation into direct and indirect computations, using Ψ_d to refer to the direction of the direct contribution of the light and Ψ_id for the indirect contribution:

    L(p \to \Theta) = L_e(p \to \Theta) + L_d(p \to \Theta) + L_{id}(p \to \Theta)

Furthermore, the indirect computation can be further divided into components. The more traditional components used are indirect diffuse or ambient (a), indirect specular (is) and indirect glossy (ig):

    L_{id}(p \to \Theta) = L_a(p \to \Theta) + L_{is}(p \to \Theta) + L_{ig}(p \to \Theta)

This can be abstracted further to account for shaders with more components. Assuming N_c components:

    L_{id}(p \to \Theta) = \sum_{c=1}^{N_c} L_c(p \to \Theta)

Finally,

    L(p \to \Theta) = L_e(p \to \Theta) + L_d(p \to \Theta) + \sum_{c=1}^{N_c} L_c(p \to \Theta)

This simply means that we have split the calculation into the direct and the indirect components for the first intersection from the virtual camera. As we showed in Section 5.2.1, the process can be taken further and applied recursively. However, since the algorithm described in this chapter is based on the image plane, and for design simplicity, as shall be described below, we are only interested in the first intersection.

7.4 Component-Based Adaptive Sampling

Figure 7.3: Samples per component without interpolation: (a) direct, (b) indirect diffuse, (c) glossy and (d) specular component of the transparency. Low resolution is used to make samples more visible.

Figure 7.4: Interpolated components: (a) direct, (b) indirect diffuse (without material contribution), (c) glossy and (d) specular component of the transparency.

As outlined previously, the method adopted by our novel adaptive sampling algorithm relies on adaptively sampling the individual components. The motivation behind this lies in the different spatial variances of the components and in the flexibility which arises from rendering them in separate passes. We use this knowledge in our framework to render individual components using different adaptive sampling thresholds, see Figure 7.3, since certain components, for example indirect diffuse, are less likely to have highlights, and finally to composite the results into a single image plane. Furthermore, for this renderer we only break components up after the first intersection with an object. The separate component samples are interpolated in a similar fashion to the traditional adaptive sampling method, Figure 7.4. The philosophy behind this rendering strategy is that any renderer can gain enhanced speedup through a simple modification of its shaders. Moreover, all of this can be done at a system level and is completely transparent to the user. The separate calculation and rendering of the components effectively results in different selective guidance and selective rendering. The selective guidance is specific to each component and can possibly have different thresholds for different parameters. While the selective variable is effectively the same for each component, the resultant sampling subdivision for each component will differ from that of traditional rendering.


7.4.1 Framework


Figure 7.5: Component-based adaptive sampling framework.

The overall framework for this system can be seen in Figure 7.5. The algorithm we use is a modification of the classic ray-tracing algorithm and can be applied to any ray tracer. Furthermore, in the description of this algorithm we assume a very simple adaptive sampling scheme, whereby rays are shot at the corners of a square and, if the resultant calculated radiance of the rays differs by more than some threshold, the square is subdivided recursively up to a user-defined depth, as described in Section 7.2. The only additional data structures that are strictly required are separate image buffers for each of the components selected for adaptive rendering, see Figure 7.4, and storage for the primary rays that hit objects that have component-based materials. We term these structures the component image buffers and the primary ray structure; a sketch of both is given below. These structures are equal in size to the maximum number of samples that can be shot, and their memory requirements on modern systems are negligible. The material types (or shaders) that can have components that are going to be rendered separately need to be identified. This is a system design decision and in no way affects the user. It is added to our algorithm design since certain shaders, such as the mirror shader in RADIANCE, always perform a pure specular reflection and are thus computed immediately. We term shaders that can be broken down into components breakable. A description of our choice of breakable shaders for our implementation in RADIANCE will be given in Section 7.5.
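The two structures could look as follows; all field and type names are our own illustration, not those of the actual implementation:

    #define N_COMPONENTS 4       /* e.g. direct, ambient, glossy, specular */

    /* One record per pixel sample: enough of the primary ray to launch
       the component passes later. */
    typedef struct {
        float origin[3], dir[3];  /* primary ray geometry */
        float point[3];           /* first intersection point */
        int obj_id;               /* intersected object */
        unsigned comp_mask;       /* which breakable components were hit */
        int valid;                /* has a primary hit been recorded? */
    } PrimaryHit;

    /* The component image buffers plus the primary ray structure. */
    typedef struct {
        int width, height;
        float *component[N_COMPONENTS];   /* one RGB buffer per component */
        PrimaryHit *primary;              /* one entry per potential sample */
    } ComponentBuffers;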


7.4.2 Algorithm

The algorithm can be thought of as consisting of two passes. The first pass corresponds to the calculation of the radiance contribution from the direct light and from the non-breakable components. The second pass corresponds to the calculation of the individual components. Initially, the first pass rays are traced in the traditional manner with the sampling method outlined above. When a ray initially hits an object, first the direct lighting is calculated and stored in the direct component image buffer. Whereas with a traditional method each ray is traced to completion, with our method, when a ray first intersects an object whose material properties contain a breakable shader, a flag is set inside the primary ray structure, tagging that particular pixel for calculating the component, or components if more than one, for future use. Non-breakable shader computations are performed the traditional way. After each of the four corners has been calculated, the adaptive sampling criteria are consulted and, if necessary, further adaptive sampling is performed and the operations above are repeated for each of the direct rays. If the criteria are satisfied, the underlying pixels are interpolated. When the first pass terminates, the direct lighting and some parts of the indirect lighting (for the non-breakable components) will have been completed and stored in the direct component image buffer. The second pass of the algorithm begins by identifying which of the components need to be sampled. This is performed for each of the breakable components. The samples to be calculated are identified by consulting the primary ray structure, and use the information from the primary rays stored there to launch the component ray. If the primary ray data is missing, due to the component requiring finer grained sampling than the direct rays, the information can either be interpolated, if all the information is available, or a primary ray can be calculated at that stage in the algorithm. The component ray is fully recursive and computes until it terminates in the traditional ray tracing manner, as implemented by the underlying renderer. The resulting radiance values are stored in the component buffer for that specific component. Adaptive sampling proceeds in the traditional fashion. When the algorithm has iterated through all components, the component buffers are composited to produce the final image.
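The overall two-pass structure could look as follows, reusing the sketched structures above; adaptive_sample_direct() and sample_component() are assumed helpers, not functions from the actual implementation:

    void adaptive_sample_direct(ComponentBuffers *cb);    /* pass one */
    void sample_component(ComponentBuffers *cb, int c);   /* per component */

    /* Pass 1: adaptive direct (and non-breakable) sampling; breakable
       components are merely flagged in cb->primary[]. */
    void pass_one(ComponentBuffers *cb)
    {
        adaptive_sample_direct(cb);
    }

    /* Pass 2: one adaptive pass per breakable component, launched from
       the stored primary hits, then composite all buffers. */
    void pass_two(ComponentBuffers *cb, float *final_image)
    {
        for (int c = 1; c < N_COMPONENTS; c++)   /* 0 = direct, done above */
            sample_component(cb, c);

        int n = 3 * cb->width * cb->height;      /* interleaved RGB */
        for (int p = 0; p < n; p++) {
            final_image[p] = 0.0f;
            for (int c = 0; c < N_COMPONENTS; c++)
                final_image[p] += cb->component[c][p];
        }
    }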

7.5 Implementation

In this section we outline our own implementation of the algorithm described in the previous section, and show how we modified an existing renderer to do so. We describe the decisions we took in choosing the breakable shaders, and use this implementation to obtain results. Our component-based adaptive sampling algorithm has been incorporated into the rpict RADIANCE renderer. Our component-based adaptive sampling can sit on top of the irradiance cache algorithm in RADIANCE, effectively providing a two-tier image-space and object-space interpolation method, with improved results.


7.5.1 Traditional Implementation

We have modified the RADIANCE rpict renderer in two further guises to benchmark the performance of our new implementation. The simplest implementation removes the sampling scheme traditionally used by RADIANCE and replaces it with a more traditional stratification scheme, which renders one ray for each entry of a stratified grid. The resolution of the images is computed in a secondary pass using a separate filtering application which downsamples the calculated image. Effectively, the whole process can be viewed as uniform jittered stratified supersampling. This implementation simplifies the sampling process and the implementation of the other renderers, since it is the most traditional form of sampling. We call this renderer tpict. The second traditional renderer is a modification of rpict that performs adaptive sampling by shooting rays at the corners of the stratified structure at user-defined intervals, as described in Section 7.2. We term this renderer apict. Figure 7.1 was computed using this renderer.

7.5.2 Component-Based Implementation

Our implementation of the component-based adaptive sampling algorithm was applied as an extension to rpict, which we term capict. The capict sampling scheme is, at its base, similar to that of apict; however, this renderer follows the implementation of the algorithm outlined in Section 7.4.2. The breakable shaders chosen for this implementation were the isotropic shaders [WH92] that contain indirect diffuse, indirect specular and indirect glossy components, for both reflected and transmitted materials. Each of the components in these shaders is breakable. Furthermore, the pure transparent glass component is breakable into only the reflected component; the transmitted component is computed as part of the direct computation, as a non-breakable computation. This method is chosen since no direct computation is performed, so breaking the component into both reflected and transmitted parts would be useless. For similar reasons, pure specular materials, as implemented by the mirror shader in RADIANCE, are also bundled with the direct computation. These shaders satisfied the requirements of our test scenes, so no further shader adjustments were required; however, further shader modifications would be straightforward to add. Renderings of the breakable shader modifications are illustrated in Figure 7.4.

7.6 Results and Verification

Figure 7.6: Views of the scenes used for results: (a) the Corridor scene, (b) the Library scene, (c) the Cornell Box and (d) the Temple of Kalabsha.

In order to demonstrate our approach, we present results using four distinct scenes, see Figure 7.6. The scenes chosen represent a variety of different realistic and practical scenes. The results were computed with default RADIANCE parameters except for the resolution and the number of indirect diffuse bounces. The four scenes were rendered at a resolution of 512 × 512 with a maximum of nine rays per pixel, effectively rendering a stratified sampling grid of 1536 × 1536, filtered using the standard RADIANCE Gaussian filter. The Corridor scene and the Library scene were rendered with 1 indirect diffuse bounce, and the Kalabsha scene and Cornell Box with 2 indirect bounces. All these images were rendered with an irradiance cache. Furthermore, we present results for the Cornell Box without irradiance cache computations, rendered at 512 × 512 with a maximum of 4 rays per pixel and 1 indirect diffuse bounce; lower values were chosen for this benchmark due to the prohibitive time it takes to render without adaptive sampling. All results were computed on an Intel Xeon CPU at 2.40 GHz with 3 GB RAM under Linux.

                Corridor              Library
    Renderer    Time      Speedup     Time      Speedup
    tpict       2,340     1           2,700     1
    apict       1,687     1.39        2,085     1.29
    capict      1,107     2.11        1,900     1.42

Table 7.1: Results for the various renderers. Time in seconds.

                Cornell Box           Cornell Box (no IC)    Kalabsha
    Renderer    Time     Speedup      Time‡     Speedup     Time     Speedup
    tpict       313      1            238       1           901      1
    apict       199      1.57         61        3.9         870      1.04
    capict      159      1.96         22        10.81       610      1.48

Table 7.2: Results for the various renderers. Time in seconds. ‡ Time in minutes.

    Renderer    Corridor    Library    Cornell Box    Cornell Box (no IC)    Kalabsha
    apict       0.2%        1.3%       0.2%           0.1%                   1.3%
    capict      0.2%        0.5%       0.2%           0.08%                  1.3%

Table 7.3: Visible differences predictor results for the adaptive sampling renderers compared with the traditional approach.

7.6.1 Performance Results

The results are presented in Table 7.1 and Table 7.2. It is clear from the speedup that capict outperforms both apict and tpict, always obtaining speedups of close to 50% or more over the traditional method. The smallest improvement is for the Library scene, due to the glass on the tables appearing at both the top and bottom of the tables, which weakens the component-based approach for those areas of the image; yet the results are still an improvement over the traditional adaptive approach. Of particular interest is the result for the Cornell Box without the use of an irradiance cache. This result demonstrates an order of magnitude improvement in performance over the standard technique, signifying that the novel rendering method can be used as an easy-to-implement substitute for the irradiance cache on systems that do not support irradiance caching. It also suggests that the component-based adaptive sampling algorithm is useful for other global illumination algorithms, such as path tracing, where the integration of an irradiance cache is non-trivial.

7.6.2 Verification

We provide verification of the results using the Visible Differences Predictor [Dal93], a metric that creates a grey-scale image of the perceptual differences between two images, highlighting only differences that are visible to a human observer; see Section 3.2.1. Since the resulting difference image between any two of our images is mostly blank, we summarise the results as the average pixel error between each of the two adaptive sampling renderers and tpict in Table 7.3. The results demonstrate that there is practically no perceptual difference between any of the adaptive renderers and the traditional approach. The small percentage difference is probably due to the stochastic nature of our rendering algorithm; in fact, when comparing two distinct images rendered under the same conditions with tpict, the error is around 0.2%, similar to that of the adaptive renderers.

7.7 Summary

In this chapter we have presented a novel adaptive sampling algorithm that uses a component-based approach to speed up rendering times with minimal perceivable differences in the resultant images.


Unlike the selective component-based approach using the crex, this algorithm is transparent to the user and can be included in existing ray-traced renderers through a modification of the required shaders. In terms of selective guidance, this work introduces the concept of using different selective guidance techniques for the different components, a technique which we will take advantage of in the next chapter for different selective variables. The results also show that, as expected, the component-based adaptive sampling algorithm can be used in conjunction with an irradiance caching scheme, providing a two-tier interpolation mechanism, and can also achieve an order of magnitude performance increase when using strict distributed ray tracing only. While we have presented the algorithm in terms of one of the simplest adaptive sampling schemes, it could potentially be used with other, more complex adaptive sampling schemes such as [BTB91, Guo98]. Since it relies on ray tracing and samples at the image plane, the component-based sampling approach is straightforward to parallelise using an image tiling, demand-driven approach similar to that used in the selective parallel renderer in Case IV of Chapter 4. Future work could investigate the usefulness of this algorithm for other global illumination algorithms, in particular path tracing [Kaj86]. Furthermore, this form of rendering could potentially be used as a hybrid sub-sampling and time-constrained renderer, to combine the advantages of traditional time-constrained rendering and component-based time-constrained rendering. A similar time-constrained approach is presented in the next chapter.


Chapter 8

Progressive Selective Rendering

In this chapter we combine the selective rendering algorithms introduced in Chapter 4 with the fine-grained flexibility provided by the component-based selective rendering of Chapter 5 and the component-based selective guidance of Chapter 7, to create algorithms that take advantage of such systems for both selective rendering and time-constrained rendering. As mentioned in Chapter 4, most perceptually-based selective rendering algorithms can be broadly placed into one of two categories: the selective rendering pipeline and the selective cyclic process. By focusing primarily on the selective rendering pipeline, in this chapter we shall show the interdependence of the selective framework with the choice of image preview and selective guidance and with the choice of the selective variables. Based on this view, we will then present a selective rendering framework that attends to each part of the selective rendering process. Due to the progressive nature of these algorithms, they are ideal for time-constrained rendering, and we present a time-constrained renderer for multiple selective variables using these techniques. The chapter is divided as follows. In Section 8.1 we outline the interactions between the various stages of the selective rendering frameworks. In Section 8.2 we explain the progressive selective rendering algorithms. In Section 8.3 we present the use of our progressive selective rendering for time-constrained rendering.

8.1 The Interaction amongst the Selective Rendering Stages

The interaction amongst the different stages of selective rendering algorithms in the selective rendering frameworks has often been neglected. The decisions taken in the calculation of the pre-selective guidance may have complications further on in the rendering process.


Figure 8.1: A selective rendering pipeline using rasterisation for image preview. The image preview image (left) is used to generate the saliency map, which is then used to render the final image. Note the artefact caused by the lack of knowledge of indirect lighting in the image preview. Far right (top) selectively rendered image and (bottom) reference image. Contrast with saliency maps and image preview in Figure 8.2.

Figure 8.2: Our novel selective rendering pipeline. The images on the far left represent the image preview, full (top) and for the irradiance cache only (bottom), followed by the selective guidance for each. The top images are used to render the image on the right (top) and all four to render the one on the bottom. The far right image is the reference image. Contrast with the artifacts created by the rasterised image preview in Figure 8.1.

When using a selective rendering pipeline it is customary to use a rasterised rapid image estimate for pre-selective guidance, due primarily to the speed of this approach. However, the rapid image rasterisation approach struggles to compete with a ray tracer for versatility in matching glossy reflections, soft shadows, participating media, high-dynamic range lighting, indirect diffuse calculations, motion blur and other physically-based effects. Using rasterised rapid image estimates may therefore lead to artifacts in the final image, since these effects are ignored in the image preview stage. An example of such an error can be seen in Figure 8.1. In this example, the image preview is computed using a rasterisation-based renderer and the selective guidance, in this case a saliency map, is computed from this image. As can be seen in the selectively rendered image, the indirect diffuse interreflections were not rendered correctly due to their absence in the image preview stage.


Furthermore, the image preview detects shadows in the bottom part of the image which would in fact be unimportant in the final image, resulting in unnecessary computation. Further results, demonstrating that the rasterisation pipeline is not ideal and produces poorer image quality and rendering times compared to our novel techniques, will be shown in Section 8.2.2. Also, while rasterisation could potentially simulate many effects to a certain degree, this would mean having to maintain two rendering systems with possibly diverging results. Using ray-tracing for the pre-selective rendering stage generates image previews more faithful to the final image. Contrast the image preview, saliency map and rendered image in Figure 8.2 (top) with those using a rasterised preview in Figure 8.1. The former produces better results due to its reproduction of the indirect diffuse lighting; in addition, its selective guidance does not mark as important areas that would not be so in the end, such as the shadows at the bottom of the image detected in the latter. The problem with ray-traced image previews is their rendering cost compared with the rasterised option. The ray-traced image preview has been ideal when the selective variable in the selective rendering stage is a rays per pixel approach, since progressive rendering can be used, as in [CCW03, MDCT05] and the selective renderers in Chapter 4 and Chapter 5. In these cases, the computation of the image preview is not wasted. When rendering with more complex selective variables, such as the indirect diffuse parameter used by [YPG01], care must be taken when generating the image preview over the degree of quality at which the selective variables are rendered. Ideally, an incremental approach similar to that used with a rays per pixel renderer is used for all selective variables. This leads us to the notion of using progressive selective algorithms to construct our selective renderers. In this way, for selective rendering pipelines, we avoid the need for rasterised image previews, with the artifacts these result in, the burden of maintaining two working systems, and the extra computation they sometimes incur. Also, using progressive algorithms one can make use of all the features of the physically-based renderer at the image preview stage, at a base quality. Crucially, this computation of the base quality components is not lost. This feature is particularly important, as we shall see in the results sections, for computations that do not scale linearly, such as the irradiance cache. Also, for both selective rendering pipelines and selective rendering cyclic processes, the ability to render with multiple selective variables is maintained, resulting in better performance. Furthermore, the ability to render images progressively makes it easy to apply time-constrained rendering to any of the selective variables, a feature that improves the flexibility of the time-constrained system.

8.2 Progressive Selective Rendering

In this section we present our progressive selective rendering framework. While our progressive selective rendering algorithms may be used both for the selective rendering pipeline and the selective rendering cyclic process frameworks, we will focus our attention only on the selective rendering pipeline.

Figure 8.3: Progressive Selective Rendering Framework.

Our framework for progressive selective rendering allows us to selectively render images, directed through selective guidance, by modifying a number of standard rendering parameters. This multiple selective variable rendering approach enables extra computational gains and further flexibility when compared to other selective renderers, since there are a larger number of parameters to manipulate. Our rendering framework makes use of a data structure we call the selective variable table, or svt for short, to maintain the parameters. The svt is a list of all selective variables, together with the maximum and minimum value for each parameter. Examples of the parameters in the svt are described in Table 8.1. While all our selective variables are modulated linearly, the svt could also contain a modulation function describing how an individual selective variable's quality can change. The progressive selective rendering framework can be seen in Figure 8.3. This case uses three selective variables: rays per pixel, irradiance cache search radius and participating media. The pre-selective rendering stage produces computation for each of the selective variables. The SG processing stage creates a selective guidance map for each selective variable; these particular maps will be discussed in the next section. The use of distinct selective guidance for each selective variable influences the selective rendering in the final stage.
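As an illustration of the data structure just described, the following minimal sketch (in C++, with all identifiers hypothetical rather than taken from our implementation) stores each selective variable with its bounds and the linear modulation used in this chapter; a different modulation function could be substituted per variable, as noted above.

    // A minimal sketch of the selective variable table (svt); all names
    // here are hypothetical illustrations, not the thesis implementation.
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    struct SelectiveVariable {
        std::string name;                        // e.g. "rays per pixel"
        double minimum;                          // base quality value
        double maximum;                          // highest quality value
        std::function<double(double)> modulate;  // importance [0,1] -> value
    };

    struct SelectiveVariableTable {
        std::vector<SelectiveVariable> entries;

        // Adds a variable with the linear modulation used in this chapter.
        void add(std::string name, double minimum, double maximum) {
            entries.push_back({std::move(name), minimum, maximum,
                               [minimum, maximum](double importance) {
                                   return minimum
                                        + importance * (maximum - minimum);
                               }});
        }
    };

Populating such a table with, for instance, svt.add("rays per pixel", 1, 25) would mirror the entries of Table 8.1; note that for the interpolation search radius the "maximum" quality corresponds to the smaller radius.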

Selective variable                 minimum   maximum
Rays per pixel                     1         25
IC interpolation search radius     0.75      0.125
Participating media rays           1         16
Direct lighting rays               1         16

Table 8.1: An example of an svt for the four selective variables used in our system.

8.2.1 Progressive Selective Rendering Algorithms

In this section we discuss our general approach with four progressive selective rendering algorithms, for rays per pixel, direct lighting, participating media and indirect diffuse computation. While we only show selective rendering for four selective variables, others could be used, such as depth control similar to the techniques in Chapter 5 and selective choice of the number of rays to shoot in the final gathering stage of a renderer using photon mapping. In the next section we discuss the choice and adaptability of these selective variables. Subsequently, we discuss the general progressive selective rendering algorithms. Finally, we discuss a progressive selective irradiance cache, which is similar in concept to the other methods but is sufficiently complicated to warrant its own explanation.

Selective rendering variables

Our rays per pixel approach is the simple progressive algorithm commonly used in most selective renderers. Since we do not know beforehand the number of rays that are going to be rendered, and a completely random pattern would cause noise artifacts, we adopt an incremental hierarchical low-discrepancy method. As we have shown in Section 2.7, the advantages of such methods are the reduction of aliasing, the well-distributed nature of the samples and, most importantly, the hierarchical aspect.

For direct lighting, the selective variable we use is the number of shadow rays shot. As with pixel sampling, a hierarchical low-discrepancy method can be used to sample the area light sources, with the same advantages attributed to the pixel rendering. However, in our case, due to the nature of our implementation in RADIANCE, we are constrained to the system that RADIANCE uses to calculate direct lighting from area light sources. In RADIANCE, area light sources are subdivided and one ray is shot in every subdivided area. We adaptively subdivide the light sources using our selective criteria. This effectively serves the same goal of modifying the number of shadow rays and direct lighting calculations.
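The exact hierarchical sequence we use is the one described in Section 2.7; purely as an illustrative sketch of the incremental property, a radical-inverse (Halton-style) sequence can be consumed one sample at a time, so a pixel can be stopped at any sample count and later resumed without disturbing the distribution.

    // Illustrative progressive low-discrepancy sampling; the thesis's own
    // hierarchical scheme (Section 2.7) differs, so treat this as a sketch.
    #include <cstdint>

    // Radical inverse of i in the given base: the basis of Halton sequences.
    double radicalInverse(uint32_t i, uint32_t base) {
        double invBase = 1.0 / base, f = invBase, r = 0.0;
        while (i > 0) {
            r += f * (i % base);
            i /= base;
            f *= invBase;
        }
        return r;
    }

    // The n-th sub-pixel offset (n = 1, 2, ...). Because the sequence is
    // incremental, rendering can stop after any n dictated by the selective
    // guidance and resume later, with the first n samples staying well
    // distributed over the pixel.
    void pixelSample(uint32_t n, double& u, double& v) {
        u = radicalInverse(n, 2);
        v = radicalInverse(n, 3);
    }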


Figure 8.4: Detailed description of the progressive selective rendering pipeline.

For participating media we use an approach based on single scattering for homogeneous participating media in RADIANCE. When entering a participating media volume, a number of rays are used to sample single scattering; the number of these rays is used as the selective variable. Traditionally the rays are sampled at fixed intervals; instead, we use a low-discrepancy sequence to select where within the mist volume to sample the rays.

The irradiance cache has a number of parameters that affect performance, such as the number of indirect diffuse rays shot and the irradiance cache search radius. While it would be tempting to use techniques similar to those for the sampling approaches and apply progressive algorithms to the number of indirect diffuse rays shot, the situation would be complicated when more than one indirect diffuse bounce is required, since all the cached samples that are affected by other samples would need to be updated whenever a higher quality sample is required. Modifying the irradiance cache interpolation search radius, using the same technique as in Case III from Chapter 4 and the selective irradiance cache of [YPG01], is a more feasible option.
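The participating media sampling described above can be sketched as follows, where the ray count is the selective variable and sample positions inside the volume come from a low-discrepancy sequence rather than fixed intervals; radicalInverse() is as sketched earlier, and sampleInscattering() is a hypothetical renderer call, not RADIANCE's API.

    // Sketch only: single scattering through a homogeneous medium between
    // ray parameters t0 and t1, with numRays as the selective variable.
    #include <cstdint>

    struct Spectrum { double r = 0, g = 0, b = 0; };

    double radicalInverse(uint32_t i, uint32_t base);  // as sketched above
    Spectrum sampleInscattering(double t);             // hypothetical call

    Spectrum singleScatter(double t0, double t1, int numRays) {
        Spectrum sum;
        for (int i = 1; i <= numRays; ++i) {
            // Low-discrepancy position within the mist volume, rather than
            // a fixed step, so extra rays refine the existing estimate.
            double t = t0 + radicalInverse(i, 2) * (t1 - t0);
            Spectrum s = sampleInscattering(t);
            sum.r += s.r; sum.g += s.g; sum.b += s.b;
        }
        double w = (t1 - t0) / numRays;      // segment-length weighting
        return {sum.r * w, sum.g * w, sum.b * w};
    }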

Selective rendering pipeline

Figure 8.4 shows a more detailed overview of our selective rendering pipeline. In the pre-selective rendering stage the selective renderer computes the image preview using the minimum values in the svt, corresponding to the base quality of each selective variable, and subsequently uses this knowledge for the selective guidance stage. We use a component-based approach for the pre-selective rendering stage, separating all the selective variables as individual components. In the primary rendering phase the primary rays are shot until a component which is a selective variable needs to be calculated.


Figure 8.5: Selective rendering for direct lighting. Left to right: The pre-selective rendering for direct lighting only, the edge map and the final rendered image of the scene being used for the selective rendering.

When this occurs, the necessary data is stored for further use, as in the component-based adaptive sampling approach in Chapter 7. When the primary ray calculation and the non-selective variable computations are complete, the previously stored components are computed individually at the base quality for that selective variable, a phase we term base quality component rendering. The component subdivision is done primarily to generate individual selective guidance methods for each selective variable and is also required by some algorithms, such as our implementations of the selective progressive irradiance cache and selective progressive direct lighting. Figure 8.5 (left) and Figure 8.6 (left) visualise the pre-selective rendering computations for the direct lighting and participating media respectively.

Applying selective guidance routines to individual components allows the rendering to be more flexible and to apply specific selective guidance techniques. The selective guidance techniques are a saliency map for the rays per pixel and an extended edge map for the direct lighting, see Figure 8.5 (middle). The saliency map used for the selective guidance of rays per pixel is generated from the composites of all the pre-selective rendering image previews of the other selective variables. For participating media, we use a map that captures the fraction of the light at the surface of the object relative to that received at the camera, see Figure 8.6 (middle). This is similar in concept to the X-map in [ASGC06]. In areas where there is no direct lighting, such as the top of the Cornell Box, the participating media computation is used instead. As was the case with previous selective renderers, the selective guidance used here could be replaced by other selective guidance methods.

The next stage in our rendering pipeline, the selective quality component rendering stage, entails the progressive update of all the components based on the values of that particular selective variable's selective guidance. Since the algorithms are progressive and already rendered at the base quality, the computations are progressed for the selective variable components to the level desired by the selective guidance, using the svt modulating function between the maximum and minimum svt values to render at selective quality. The final stage of the rendering is the traditional selective rendering method without splitting of components, but with each selective variable modulated separately.
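The stages just described can be summarised in the following high-level sketch; every type and function here is a hypothetical placeholder for the corresponding phase, not RADIANCE code.

    // High-level sketch of the progressive selective rendering pipeline.
    struct Scene {};                   // opaque placeholders
    struct ComponentStore {};
    struct GuidanceMaps {};
    struct SelectiveVariableTable {};

    ComponentStore primaryRendering(Scene&);
    void renderComponentsAtBaseQuality(ComponentStore&,
                                       const SelectiveVariableTable&);
    GuidanceMaps computeSelectiveGuidance(const ComponentStore&);
    void progressComponents(ComponentStore&, const GuidanceMaps&,
                            const SelectiveVariableTable&);
    void finalSelectiveRendering(Scene&, const GuidanceMaps&,
                                 const SelectiveVariableTable&);

    void progressiveSelectiveRender(Scene& scene,
                                    const SelectiveVariableTable& svt) {
        // 1. Primary rendering: shoot primary rays, storing every
        //    component that is a selective variable for later processing.
        ComponentStore components = primaryRendering(scene);
        // 2. Base quality component rendering: each stored component at
        //    its svt minimum, producing the image preview.
        renderComponentsAtBaseQuality(components, svt);
        //    Selective guidance: one map per selective variable.
        GuidanceMaps maps = computeSelectiveGuidance(components);
        // 3. Selective quality component rendering: progress each
        //    component to the level its own guidance map requests,
        //    via the svt modulation.
        progressComponents(components, maps, svt);
        // 4. Final selective rendering: no component splitting, but each
        //    selective variable still modulated separately.
        finalSelectiveRendering(scene, maps, svt);
    }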


Figure 8.6: Selective rendering for participating media. Left to right: the pre-selective rendering for the single scatter computation in participating media only, the selective guidance map and the final rendered image of the scene being used for the selective rendering.

We illustrate the process with an example for selectively rendering using participating media and rays per pixel, shown in Figure 8.7. In this example a number of pixels, labelled A, B, C and D, are computed. The primary rendering stage shoots the primary rays and identifies the participating media computations for the secondary stage, where the single scatter participating media computation is performed. We maintain the already calculated participating media calculations in the subsequent figures to demonstrate that they have already been calculated and to emphasise the hierarchical approach. The selective guidance stage, not shown in the figure, produces a saliency map for the rays and a map for the participating media. For simplicity we assume that the same quality is attributed to both selective variables. The selective quality component rendering stage shows how, for pixels B and D, considered of higher importance, further samples are calculated to improve the estimation. Finally, in the selective rendering phase all remaining rays are calculated at the appropriate selective rendering quality for both selective variables.

Progressive selective irradiance cache

The progressive irradiance cache algorithm commences as in the general case, with the primary rendering phase accounting for the computation of the primary rays and the identification of the indirect diffuse samples. As with the other selective variables, in the base quality component rendering phase the selective variable is computed, in this case the indirect diffuse samples, which are computed with the irradiance cache interpolation search radius at base quality. When calculated, the indirect diffuse component is stored on a separate image plane, Figure 8.2 (bottom-left), since it will only be used for the selective guidance stage, where it is used to generate a separate selective guidance for the irradiance cache in the form of an edge map, see Figure 8.2 (bottom-middle). This image plane is then discarded. After the computation of the selective guidance, when our framework reaches the selective quality component rendering stage, the same set of indirect diffuse samples is reprocessed.

Figure 8.7: Rendering stages for progressive selective participating media using single-scatter homogeneous media.

This time the interpolation search radius is modified through consultation with the selective guidance. For base quality values the older results are reused, while for values close to the base quality search radius the likelihood of interpolating from the existing samples is still high. After this stage the framework no longer operates on components; when the progressive selective irradiance cache receives a query it performs its normal function, using the interpolation search radius at selective quality. Figure 8.8 illustrates the progressive selective irradiance cache for a specific scene using rays per pixel and the irradiance cache interpolation search radius as the only selective variables.
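A rough sketch of how the interpolation search radius can act as a selective variable when re-querying the cache is given below; the weighting loosely follows the usual irradiance caching criterion, and the exact formula and all names here are simplifying assumptions rather than RADIANCE's implementation.

    // Sketch: an irradiance record is reusable at a shading point only
    // while the point lies within the (scaled) validity radius.
    #include <algorithm>
    #include <cmath>

    struct IrradianceRecord {
        double pos[3];
        double normal[3];
        double radius;      // validity radius R_i of the record
    };

    // radiusScale comes from the svt modulation of the edge-map
    // importance: base quality uses a large scale (cheap, more
    // interpolation), selective quality shrinks it, forcing new
    // indirect diffuse samples where the guidance demands them.
    bool usableForInterpolation(const IrradianceRecord& rec,
                                const double p[3], const double n[3],
                                double radiusScale) {
        double d2 = 0.0;
        for (int k = 0; k < 3; ++k) {
            double d = p[k] - rec.pos[k];
            d2 += d * d;
        }
        double nDot = n[0] * rec.normal[0] + n[1] * rec.normal[1]
                    + n[2] * rec.normal[2];
        double err = std::sqrt(d2) / (rec.radius * radiusScale)
                   + std::sqrt(std::max(0.0, 1.0 - nDot));
        return err < 1.0;   // below threshold: interpolate, don't resample
    }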


Figure 8.8: Rendering stages for progressive selective irradiance cache.

The process would be similar using other selective variables. This example demonstrates the rendering of four pixels, labelled A, B, C and D. The primary rendering phase only computes the base quality of the primary rays; the image generated at this intermediate stage lacks the indirect diffuse component. The subsequent base quality component rendering stage computes the indirect diffuse calculation using a uniform base quality interpolation search radius. At the end of this stage a temporary image is generated, which is then input into the selective guidance, producing a saliency map for the rays per pixel selective variable and an edge map for the interpolation search radius selective variable; this phase is not shown in the figure. The values on the left hand side of the image represent the importance of each pixel. For the sake of this example we assume that the same value holds for both the saliency map and the irradiance cache edge map.


Figure 8.9: Scenes used for progressive selective rendering results: Cornell Box (left), Mist Cornell (middle) and Cornell Boxes (right) on the top row; Corridor (left), Tables (middle) and Simple Boxes (right) on the bottom row.

The final selective rendering phase begins with the selective quality component stage, where the irradiance cache values are updated in accordance with the selective guidance. Note how pixel D requires a new indirect diffuse calculation, while the other pixels do not require the expensive computation, since most of these would be interpolations. The final selective rendering stage progresses the computation using the selective guidance to produce the final rendered image. Pixels A and B require new rays to be shot, which may also result in further indirect diffuse computations.

8.2.2 Implementation and Results

In this section we demonstrate the potential of our selective rendering framework by rendering various scenes with different renderers. Results were taken using a number of Intel Pentium 4s of different performance under Linux (the same machine was used for all the results for a given scene). For comparisons between images we use a Visual Difference Predictor measure that produces the percentage of pixels in error (Perr) and the average error percentage of those pixels (Aerr). We implement our framework within the lighting simulation package RADIANCE.
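As a sketch of how these two measures could be derived from a per-pixel difference map produced by such a predictor (the actual VDP used is not reproduced here, and the threshold is an assumed parameter):

    // Sketch: summarising a per-pixel visual difference map into the two
    // measures quoted in the results tables.
    #include <cstddef>
    #include <vector>

    struct VdpSummary {
        double perr;   // percentage of pixels predicted to be in error
        double aerr;   // average error of those pixels
    };

    VdpSummary summarise(const std::vector<double>& diffMap,
                         double threshold) {
        std::size_t inError = 0;
        double sum = 0.0;
        for (double d : diffMap) {
            if (d > threshold) {   // pixel visibly differs from reference
                ++inError;
                sum += d;
            }
        }
        VdpSummary s;
        s.perr = diffMap.empty() ? 0.0
                                 : 100.0 * inError / diffMap.size();
        s.aerr = inError ? sum / inError : 0.0;
        return s;
    }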

Renderer       Pre-Selective Guidance   Selective Guidance   Selective Variable
Gold           none                     none                 none
Ras-rp         rasterisation            sal map              rp
Ras-rpic       rasterisation            sal map              rp and IC
Prog-rp        progressive              sal map              rp
Prog-all       progressive              sal map              all
Prog-dis-all   progressive              distinct map         all

Table 8.2: The features of the renderers.

We have modified the renderer in a number of guises, shown in Table 8.2, to represent the selective renderers we have demonstrated in Chapter 4. We do not provide comparisons with the selective component-based renderer since it requires a certain amount of user interaction to compute the crex. It may be noted that the progressive selective renderers of this chapter can be viewed as an evolution of the selective component-based renderers from Chapter 5. Pre-selective guidance describes the computation of the image preview, either progressively at one ray per pixel or from rasterisation, in this case using the Snapshot software [LDC06]. Selective guidance describes the form of selective guidance used. A software version of the image-based components of the saliency map used in [LDC06] was used to compute saliency for the rays per pixel approach; the saliency maps used were all generated with the same software. The distinct map refers to a distinct map per selective variable: a saliency map for rays per pixel and the selective guidance maps described in Section 8.2.1 for the other selective variables. The selective variable field describes the selective variables used: rays per pixel (rp), irradiance cache interpolation radius (IC) and all. For the renderers that use all the selective variables, the actual selective variables used depend on the scene being tested.

We have rendered a number of scenes (see Figure 8.9) with our renderers at a resolution of 512 × 512 with 16 rays per pixel. We tried a variety of scenes, most of which can be rasterised without difficulty; we do not attempt to use scenes that would make rasterisation obviously struggle, such as enclosures illuminated only through multiple interreflections. For all cases the selective variables used are at most rays per pixel and irradiance cache interpolation search radius. For the Mist Cornell scene there is an added participating media selective variable, and for the Cornell Boxes scene the indirect lighting is also used as a selective variable.

The results for the scenes can be seen in Table 8.3, Table 8.4 and Table 8.5. For all the scenes except Mist Cornell and Cornell Boxes, the same sequence of results is apparent. The fastest times are always obtained by the novel progressive rendering algorithms; in particular, Prog-dis-all always achieves an order of magnitude speedup. In terms of comparisons with the rasterisation-based renderers, it is worth comparing Ras-rp with Prog-rp, and Ras-rpic with Prog-all and Prog-dis-all, since they use the same selective variables. Prog-rp is superior to Ras-rp both in terms of visual quality and performance, probably due to the latter attempting to render areas at higher quality than is necessary (as the example of Figure 8.1 has shown). Similarly, Prog-all and Prog-dis-all both perform better than Ras-rpic in terms of speedup and image quality.

              Cornell                             Mist Cornell
              Time   Speed up   Perr   Aerr      Time    Speed up   Perr   Aerr
Gold          501    1          0      0         1,825   1          0      0
Ras-rp        390    1.28       0.74   0.39      1,229   1.48       1.12   0.87
Ras-rpic      151    3.31       4.64   0.63      325     5.62       4.34   0.90
Prog-rp       391    1.28       0.55   0.38      1,070   1.71       1.11   0.70
Prog-all      141    3.55       1.14   0.25      250     7.3        7.41   1.15
Prog-dis-all  38     13.2       3.75   0.79      127     14.3       3.59   1.09

Table 8.3: Progressive rendering timings for the Cornell Box and Mist Cornell scenes. Timings in seconds.

              Cornell Boxes                       Corridor
              Time   Speed up   Perr    Aerr     Time     Speed up   Perr    Aerr
Gold          2,113  1          0       0        13,666   1          0       0
Ras-rp        393    5.37       1.88    0.29     10,680   1.28       0.391   0.11
Ras-rpic      315    6.71       7.36    0.58     5,430    2.52       3.50    0.41
Prog-rp       402    5.26       1.43    0.28     8,539    1.60       0.21    0.08
Prog-all      130    16.25      13.33   0.78     3,621    3.77       2.08    0.26
Prog-dis-all  174    12.14      4.45    0.48     1,360    10.04      2.13    0.32

Table 8.4: Progressive rendering timings for the Cornell Boxes and Corridor scenes. Timings in seconds.

For the Mist Cornell and Cornell Boxes scenes, rendering was performed with participating media for the former and direct lighting for the latter as additional selective variables. Results are broadly similar to the rest of the results, except for the poor visual quality performance of Prog-all, since the specifics of these particular selective variables are not well captured by a general purpose saliency map.

Table 8.6 shows the time taken to compute the image previews for Prog-rp and Prog-dis-all. This is useful to understand what would happen if progressive algorithms were not used and the computation was restarted from scratch. As can be seen, this would result in a loss of between 14% and 35% of the computation for Prog-dis-all, and higher still for Prog-rp. These results also demonstrate the advantage of rendering with more selective variables, since the selective guidance process commences earlier, a problem we commented upon in our first selective renderer, Case I in Chapter 4.

8.3 Time-Constrained Selective Rendering

The progressive nature of our selective rendering framework makes it an ideal candidate for a time-constrained system. In particular, the facility of being able to decompose the computation into various components allows another dimension of flexibility when attempting to constrain our rendering time.

              Tables                               Simple Boxes
              Time     Speed up   Perr   Aerr     Time    Speed up   Perr   Aerr
Gold          10,380   1          0      0        2,744   1          0      0
Ras-rp        7,320    1.42       0.51   0.146    2,248   1.22       3.70   0.19
Ras-rpic      2,556    4.06       4.97   0.71     988     2.77       3.90   0.82
Prog-rp       7,080    1.47       0.14   0.10     2,200   1.24       2.96   0.08
Prog-all      2,195    4.73       3.31   0.70     887     3.09       1.43   0.85
Prog-dis-all  754      13.77      4.01   0.70     270     10.16      1.66   1.01

Table 8.5: Progressive rendering timings for the Tables and Simple Boxes scenes. Timings in seconds.

                     Prog-dis-all                            Prog-rp
Scene          Total Time   Image Preview Time   %     Total Time   Image Preview Time   %
Cornell        38           11                   29    391          273                  70
Mist Cornell   127          32                   25    1,070        620                  58
Cornell Boxes  174          24                   14    402          165                  41
Corridor       1,360        296                  22    8,539        6,211                73
Tables         754          262                  35    7,080        5,046                71
Simple Boxes   270          87                   32    2,200        1,924                87

Table 8.6: Image preview timings compared to the entire process. Timings in seconds.

Our time-constrained rendering framework follows the progressive selective rendering pipeline while keeping track of the computation time. This is made possible by means of profiling and scheduling, similar to that introduced in Chapter 4, Case V. Our profiling is computed while we are rendering and is updated continuously. Profiling of individual rays is computed by means of a per-pixel timing map of the rays generated while rendering. As with Case V, we use an inverse exponential curve to model the computation cost of the irradiance cache components. All profiling is performed in clock cycles using a hardware instruction counter to maintain a high degree of accuracy.

Scheduling occurs after a number of computations have elapsed. The computations are defined in terms of groups of similar components, such as a number of rays per pixel or a number of indirect diffuse calculations, similar to the ray batches used in the rays per pixel only time-constrained renderer. The frequency of this scheduling determines the final accuracy of the result. While we perform scheduling at regular intervals, if tighter constraints were required the scheduling frequency could increase towards the end of the computation. Our scheduling kernel decides whether or not to schedule the next set of computations. Whenever the irradiance cache is used, the inverse exponential curve is consulted, and the per-pixel timing map when it is not or when the irradiance cache is saturated. In either case the scheduling uses profiling to estimate the timing of the next batch of computation. Scheduling computations are kept to a minimum and their effect on the final computation is negligible, around 0.1 seconds on average.
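The profiling just described can be sketched as follows: a per-pixel running mean of ray costs in clock cycles, plus a fitted inverse exponential for the irradiance cache, whose cost falls as the cache saturates. All names and the fitting inputs are hypothetical.

    // Sketch of the continuously-updated profiler used by the scheduler.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Profiler {
        std::vector<double> timingMap;  // mean ray cost per pixel (cycles)
        std::vector<int>    counts;     // samples behind each running mean
        double icInitialCost = 0.0;     // c0 and k of the fitted inverse
        double icDecayRate   = 0.0;     // exponential curve c(x)=c0*exp(-kx)

        explicit Profiler(std::size_t pixels)
            : timingMap(pixels, 0.0), counts(pixels, 0) {}

        // Called for every traced ray with its measured cycle count.
        void recordRay(std::size_t pixel, double cycles) {
            timingMap[pixel] += (cycles - timingMap[pixel])
                              / ++counts[pixel];
        }

        // Irradiance cache cost estimate at a fraction of completion.
        double icCost(double progress) const {
            return icInitialCost * std::exp(-icDecayRate * progress);
        }

        // Predicted cost of the next batch; the scheduler only launches
        // it if the prediction fits the remaining time budget.
        double estimateBatch(const std::vector<std::size_t>& pixels,
                             bool usesIrradianceCache,
                             double progress) const {
            double t = 0.0;
            for (std::size_t p : pixels) t += timingMap[p];
            if (usesIrradianceCache) t += pixels.size() * icCost(progress);
            return t;
        }
    };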


Figure 8.10: Progressive time-constrained rendering with multiple selective variables. An example of the rendering order of the pixels, for three different components. The first component represents the typical rays per pixel, the other two are arbitrary. The order in which the individual components could be executed is demonstrated in the bottom queue.

While it would be possible to use progressive rendering for the image preview stage, using some form of sparse sampling such as a k-d tree [PS89] for the ray tracing and the progressive algorithms for the components, for simplicity we assume that the minimum base quality is always required. This is in general the case since our minimum rays per pixel is equivalent to four rays for every sixteen-pixel block; image fidelity may decrease substantially for lower quantities. When the rays per pixel cannot be computed in the required time, the constraint is assumed to be broken. For the other selective variables, the constraint can be broken or, in specific cases such as the indirect diffuse calculation, the component may be replaced by a constant term, similar to the ambient term in [CCWG88] and as used in our selective component-based renderer. In practice this has not been a problem, as the results below show. The use of the novel progressive selective algorithms is essential at this stage since, as we have seen in Table 8.6, the Prog-rp renderer may spend most of its time on the image preview stage.

After the computation of the selective guidance, the selective quality component rendering begins as we saw in the previous sections. In the time-constrained case this runs concurrently with the final selective rendering stage, since it is preferable to select the most relevant contributions to run before the others.


We follow the same approach to scheduling as when using rays per pixel only, in our simple time-constrained renderer described in Section 4.8. In our case, since we have multiple components at the selective rendering stage, there is more than one ordered queue of computations. The multiple queues, each composed of batches of computations for a specific component, are all ordered based on the individual selective guidance maps for that particular component. The scheduler determines which of the queues to select a batch from. The decision is based on the overall contribution of the component to the rendered image, based on the computation of the first stage, over the cost of rendering that component. In our case we use the overall contribution of the component in terms of radiance, but other techniques, based on the limited sensitivity of the human visual system, could be used as well. As was the case with the simpler time-constrained renderer for rays per pixel only, the profiling is crucial for monitoring the irradiance cache computation, which changes over time, affecting both the rays per pixel queue and the irradiance cache component queues. An example of this system for three components can be seen in Figure 8.10, where each of the components has its own queue based on a different selective guidance map; a sketch of the arrangement follows below. For clarity, the components are visualised individually rather than in batches; this can be considered as a batch size of one. The top queue is for the rays per pixel, which explains why the rays are repeated. The other two queues represent any of our other selective variables from the progressive selective renderers. The bottom queue demonstrates one possible execution order for the queues.
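A sketch of this multi-queue arrangement, with one priority queue of batches per component and the scheduler selecting the queue whose head offers the largest contribution per unit of predicted cost; the fields and the ratio heuristic are simplified assumptions.

    // Sketch: one ordered queue of batches per component; the scheduler
    // pops from the queue whose next batch contributes most radiance per
    // unit of predicted cost.
    #include <cstddef>
    #include <queue>
    #include <vector>

    struct Batch {
        double importance;    // from the component's own guidance map
        double contribution;  // component's estimated radiance contribution
        double cost;          // profiled cost estimate for this batch
        bool operator<(const Batch& other) const {
            return importance < other.importance;  // max-heap on importance
        }
    };

    using BatchQueue = std::priority_queue<Batch>;

    bool nextBatch(std::vector<BatchQueue>& queues, Batch& out) {
        std::size_t best = queues.size();
        double bestRatio = -1.0;
        for (std::size_t i = 0; i < queues.size(); ++i) {
            if (queues[i].empty()) continue;
            const Batch& head = queues[i].top();
            double ratio = head.contribution / head.cost;
            if (ratio > bestRatio) { bestRatio = ratio; best = i; }
        }
        if (best == queues.size()) return false;  // every queue exhausted
        out = queues[best].top();
        queues[best].pop();
        return true;
    }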

8.3.1 Implementation and Results

We have implemented our time-constrained renderer as an extension to the progressive selective renderers based on RADIANCE. We demonstrate results for four of the scenes from Section 8.2.2. As with those results, these also use two selective variables: the irradiance cache interpolation search radius and rays per pixel. For each scene we show four rendered images, see Figures 8.11, 8.12, 8.13 and 8.14. The images are rendered with time constraints of 10%, 25% and 50% of the total time it took to render the gold standard (bottom right in all sets of images), which in this case was rendered using Prog-dis-all without any constraints. The images were rendered at a resolution of 512 × 512 at a maximum of 25 rays per pixel, and the image preview had four pixels shot at the corner of every four by four block of pixels, adapted further in the selective rendering stage. All results were rendered on an Intel Pentium 4 running at 2.8GHz with 2GB of RAM under Linux.

Timings are presented in Table 8.7 and Table 8.8. The timings for the scenes in seconds were (image preview time in brackets): Cornell Box 54 (2.68), Simple Boxes 352 (18.62), Tables 670 (66.42), Corridor 860 (56.6). For the image preview of Prog-rp, representing the time-constrained renderer from Chapter 4, the timings in seconds were: Cornell Box 70, Simple Boxes 219, Tables 459, Corridor 494, all above even the 50% time constraint.


Figure 8.11: The Cornell Box scene timings: (from left to right and top to bottom) 10%, 25% and 50% of the gold standard rendered and the gold standard image. All images rendered using our progressive selective renderer.

As can be seen, for Prog-dis-all, due to the progressive nature of the rendering and consequently the ability to begin rendering with minimum values, there is always an image rendered close to the time constraint, unlike Prog-rp, which for all these scenes never manages to render a single image within the required constraint, due to the very low time constraints. The scheduling and profiling, coupled with the progressive nature of the rendering, have proven effective, as can be seen from the low percentage error in the time constraint. The timing error is always slightly positive, since the fine-grained profiling does not take into account certain computations such as loop overheads (these are still part of the rendering process, not to be confused with scheduling time). The scheduling time for all these tests is around 0.1 seconds on average, which is mostly negligible. The resulting images are also quite impressive when one considers that the reference image was rendered with Prog-dis-all, which in itself produces results that are usually an order of magnitude faster than the traditional method, as we showed in Section 8.2.2. One of the disadvantages of our approach is highlighted in Figure 8.14, where it can be noted that the legs of the table are missing.


Figure 8.12: The Simple Boxes scene timings: (from left to right and top to bottom) 10%, 25% and 50% of the gold standard rendered and the gold standard image. All images rendered using our progressive selective renderer.

            Cornell Box              Simple Boxes
            10%     25%     50%      10%      25%     50%
Constraint  5.4     13.5    27       35       88      176
Actual      5.45    13.73   27.5     35.04    88.2    176.4
Error %     0.9     1.7     1.8      0.1      0.2     0.2

Table 8.7: Time-constrained rendering timings. The timings for the scenes were (average image preview time in brackets): Cornell Box 54 (2.68), Simple Boxes 352 (18.62). Timings in seconds.

This occurs because the selective guidance fails to pick up the leg, due to the sparse sampling in the image preview. In this case rasterisation, or an approach similar to [BWG03], would be useful for feeding into the image preview stage.


Figure 8.13: The Tables scene timings: (from left to right and top to bottom) 10%, 25% and 50% of the gold standard rendered and the gold standard image. All images rendered using our progressive selective renderer.

            Tables                       Corridor
            10%     25%      50%         10%    25%      50%
Constraint  67      167.5    335         86     215      430
Actual      67.18   167.63   335.47      86.8   217.01   430.67
Error %     0.2     0.07     0.15        0.9    0.9      0.2

Table 8.8: Time-constrained rendering timings. The timings for the scenes were (average image preview time in brackets): Tables 670 (66.42), Corridor 860 (56.6). Timings in seconds.

8.4 Summary

In this chapter we have used the concept of rendering stages in the selective rendering frameworks, originally presented in Chapter 4, to identify possible problems with these frameworks and, with this in mind, to identify novel solutions to selective rendering.


Figure 8.14: The Corridor scene timings: (from left to right and top to bottom) 10%, 25% and 50% of the gold standard rendered and the gold standard image. All images rendered using our progressive selective renderer.

We have determined that progressive selective rendering algorithms are ideal for this framework, since they use physically-based computations from the outset, they can identify separate components which can be used as selective variables, they do not waste the computation of the pre-selective rendering stage, and they can use separate selective guidance mechanisms. This progressive selective rendering framework significantly speeds up general rendering performance while maintaining high image fidelity for the selective rendering pipelines. While we demonstrated these selective algorithms for the selective rendering pipeline, the progressive algorithms could, with little modification, also be used for the selective cyclic process. Furthermore, their progressive nature has made them ideal candidates for time-constrained rendering, using ongoing scheduling and profiling, a feature which has important uses since it can provide perceptually good results within a fraction of the time traditionally required, a characteristic that was not possible using the other time-constrained renderers.

Chapter 9

Conclusions and Future Work

The use of physically-based rendering for generating realistic images of complex virtual environments has increased due to its growing importance in the field of realistic simulations and due to the needs and expectations of the entertainment industry. Improving the performance of physically-based rendering of complex scenes remains one of the major active fields of research in computer graphics. Selective rendering algorithms improve performance by adaptively, and sometimes progressively, rendering non-uniformly based on some selective criteria, while attempting to maintain the same perceptual fidelity as traditional rendering algorithms. While many diverse selective rendering methods have been proposed, there has been little work on identifying and categorising these methods, a process which is helpful in avoiding the pitfalls of such techniques and in identifying potential improvements.

9.1 Contributions

In this thesis we have presented an overview of physically-based rendering and, in particular, a comprehensive guide to the literature of selective rendering for high-fidelity graphics. We developed a series of selective rendering systems that take advantage of visual attention processes, parallelism and simple time-constrained rendering approaches. These systems demonstrated the potential of selective rendering for taking advantage of bottom-up visual attention processes and on-screen distractors. We made use of graphics hardware to accelerate the pre-selective rendering stage and the selective guidance stage, and showed how the independence of these two stages from the final, selective rendering, stage in the pipeline resulted in the ability to use multiple selective variables and sampling methods. We further extended this method for use with parallelism and more complex selective guidance methods based on the importance map. We then introduced selective time-constrained rendering, through the progression of the rays per pixel selective variable.


Based on these methods and the literature review of selective rendering, we noticed that most of these techniques follow certain stages within a selective rendering framework. We broadly categorised these frameworks into two approaches, which we termed the selective rendering pipeline and the selective cyclic process.

We presented selective component-based algorithms that adapt the light transport for the individual components based on selective criteria. The first of these methods used the crex as a method for user-aided selective guidance. The crex, when combined with methods for exploiting visual attention, provides control of selective rendering on a per-pixel basis. The approach can also be used for progressive and time-constrained rendering, an alternative approach to traditional selective rendering. Furthermore, the component-based approach was used to accelerate adaptive sampling, an automatic approach that used distinct selective guidance techniques per component to accelerate the computation. Finally, the finer grained nature of component-based rendering allowed us to improve the performance of the parallel irradiance cache when compared to the traditional approaches. The component-based parallel irradiance cache was further used to accelerate the rendering of the irradiance cache search radius as a selective variable, improving the performance of our parallel selective rendering approach.

By using progressive algorithms we showed how to construct selective renderers that can generate image previews for pre-selective guidance rapidly, without the loss of high-fidelity effects associated with rasterisation previews, without having any of the computation discarded, while maintaining the ability to render using any selective rendering variables and to function with both the selective rendering pipeline and the cyclic selective rendering process. We used a number of algorithms based on each of our selective variables, from the simplest, which used the rays per pixel selective variable and hierarchical sampling methods, to the more complex participating media and progressive irradiance cache. By using multiple selective variables and selective guidance methods we obtained an order of magnitude improvement over the traditional rendering methods, and better performance than the previous selective methods tested.

Our novel selective rendering algorithms adapt naturally to time-constrained rendering with the simple addition of a lightweight micro-management kernel within the selective rendering process that performs the dual function of scheduling and profiling, at minimal cost, amongst different selective variables. The time-constrained approach used by these algorithms scales better than the previous approaches due to the use of multiple selective variables, which can be computed and progressed independently. This approach is further supported by the multidimensional lightcuts approach [WABG06], discussed in Section 3.6 and developed concurrently with ours, which uses adaptive and progressive techniques functioning on multiple selective variables to improve rendering performance.

9.2 Directions for Future Work

Besides selective rendering, there are a number of other areas that are being used to achieve speedup in rendering performance. These methods could potentially, with modifications, benefit from the methods presented in this thesis.

The first major push in improving rendering performance is based on exploiting current trends in hardware. These methods, outlined in Section 2.10, attempt to make the best of the instruction-level parallelism, caching mechanisms, speculative execution and superscalar pipelines of modern CPUs, use the large SIMD pipelines of current GPUs, and have begun to exploit the multicore architectures of the next generation of CPUs [BWSF06]. These ray-tracing based methods achieve the majority of their speedup by grouping coherent rays together to better take advantage of the memory access disparity between cache and main memory, and simultaneously make best use of the instruction-level parallelism [SSW+06]. Interactive frame rates can be achieved through the careful engineering of such ray-tracing systems for moderately complex static scenes [RSH05] and dynamic scenes [WIK+06]. Due to the rigorous demands of such systems, operations are kept simple and tight; little work has been done on selective rendering for such techniques since, by nature, selective rendering is not coherent. While promising methods have been demonstrated, for example [DWWL05], which adaptively ray-traced images by shooting cross hairs rather than individual rays such that coherence is maintained, further work needs to be done to find a balance between selective rendering, to obtain the speedup demonstrated in this thesis, and maintaining the coherence and instruction-level parallelism used to accelerate interactive ray tracers.

While we have tackled distributed parallelism, the other aspect of parallelism which is gaining popularity is the intra-node computation that may be exploited through the use of different computational resources in the same machine [BFH+04], for example the GPU, the CPU and potential new devices for handling physical simulations. The ability to subdivide the rendering into smaller tasks, using approaches based on, or akin to, component-based rendering, could help distribute the process across the hardware, since specific hardware excels at certain aspects of the computation. One could envisage a system where the graphics card handles the primary ray intersections, using rasterisation-like techniques, the selective guidance techniques, as were shown in this thesis, direct illumination and potentially shadows, while the CPU and dedicated physics processors handle the complex ray-geometry intersections. Methods to further enhance these approaches using selective rendering would have to take into consideration the aspects of coherence discussed above in the case of CPU-based interactive ray tracers.

A number of methods have been devised to speed up real-time rendering by precomputing parts of the lighting equation in a pre-computation step [SKS02]. These pre-computed radiance transfer (PRT) methods have begun to bridge the gap between ray-tracing and rasterisation techniques. While the first of such methods functioned under specific conditions, for image-based low frequency lighting of static scenes, newer methods can achieve higher frequency lighting and more dynamic scenes.


Until now, no work has attempted to use selective rendering methods in conjunction with PRT methods.

Sparse sampling methods that make use of cache re-use, as first mentioned in Section 3.10, are selective by nature. In this thesis, with the exception of the irradiance cache algorithms, we have not considered the effect of selective rendering on the re-use of aspects of the computation. These methods mainly use pixel re-projection to achieve temporal coherence. The methods in this thesis may be useful for producing new algorithms based on selective rendering: the adaptive component-based approach may be used for separating the components and re-projecting those that change least, in a method not dissimilar to the irradiance cache operation for animations. Progressive selective rendering algorithms would also be useful to identify specific areas of improvement and to constrain deadlines, since these techniques are generally interactive.

Rendering under timing constraints for interactive applications and animations might require a few changes to the methods presented in this thesis, which are aimed primarily at off-line rendering. When rendering interactively, care must be taken to ensure that subsequent frames do not change too much in quality. A hysteresis term, similar to the one used in [FS93] for real-time time-constrained rasterisation rendering, could be added to the benefit function to ensure a smoother change. For rendering animations, care must also be taken to ensure that the resulting frames are all of similar quality. This could potentially not be the case if equal time is allotted to each frame. Either the same hysteresis option as used for interactive rendering could be applied, or alternative methods could be devised that contribute to all the frames concurrently until time runs out.

Constrained rendering could benefit aspects of rendering other than time. The memory constraints used in [DPF01] have far reaching implications, due to the divergence between processor speeds and the various levels of memory caching, and could result in further improvements for methods based on coherence. For small-form-factor devices, power-constrained techniques could be used to ensure that the application does not terminate before power runs out, even if it means executing at a lower quality.

Perhaps the major limitation of our approach is the inability to render from the light sources, which is important in accelerating certain aspects of the rendering computation, in particular caustics. Methods based on the visual importance sampling used for photon mapping [PP98] might prove useful in identifying importance in object space towards which to direct photons, and may be used in conjunction with the progressive methods described in Chapter 8.

9.3 Concluding Remarks

Realism in real time remains a holy grail of computer graphics rendering. The novel methods presented in this thesis have made some headway in this direction by generally advancing the state of the art of selective rendering.


The combination of parallelism and selective rendering has made it possible to bring the costs down to close to real time on a conventional global illumination renderer, without requiring the implementation to be engineered around quickly-changing hardware trends, as is done by the majority of real-time ray tracers. The progressive selective rendering algorithms, using multiple selective variables and distinct selective guidance methods, have made it possible to produce highly controlled selectively rendered images that can be perceptually indistinguishable from the fully rendered images if allowed to complete, or terminated at a point in the computation that satisfies time constraints. Selective rendering methods offer an elegant alternative to the brute-force traditional approaches. By ensuring the images rendered are just within the bounds of the human visual thresholds of visibility and attention, such methods can save significant computational effort while maintaining high perceptual fidelity. The work in this thesis has highlighted the potential of such selective rendering methods and shown a way in which realism in real time may well be achieved.


Bibliography [ABWW03] Alessandro Artusi, Jiri Bittner, Michael Wimmer, and Alexander Wilkie. Delivering interactivity to complex tone mapping operators. In EGRW ’03: Proceedings of the 14th Eurographics workshop on Rendering, pages 38–44, Aire-la-Ville, Switzerland, Switzerland, 2003. Eurographics Association. [AH93]

Larry Aupperle and Pat Hanrahan. A hierarchical illumination algorithm for surfaces with glossy reflection. In SIGGRAPH ’93: Proceedings of the 20th annual conference on Computer graphics and interactive techniques, pages 155–162, New York, NY, USA, 1993. ACM Press.

[Ama84]

John Amanatides. Ray tracing with cones. In SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pages 129–135, New York, NY, USA, 1984. ACM Press.

[AMH02]

Tomas Akenine-Moller and Eric Haines. Real-Time Rendering. A. K. Peters, Ltd., Natick, MA, USA, 2002.

[App68]

Arthur Appel. Some techniques for shading machine renderings of solids. In Proceedings of the Spring Joint Computer Conference, pages 37–45, 1968.

[ASGC06]

Oscar Anson, Veronica Sundstedt, Diego Gutierrez, and Alan Chalmers. Efficient selective rendering of participating media. In APGV 2006 - Symposium on Applied Perception in Graphics and Visualization. ACM, July 2006.

[Ash95]

Ian Ashdown. Radiosity: a programmer’s perspective. John Wiley & Sons, Inc., New York, NY, USA, 1995.

[BBC04]

BBC. 2004.

[Bek99]

Philippe Bekaert. Hierarchical and Stochastic Algorithms for Radiosity. PhD thesis, Leuven, Belgium, 1999.

http://news.bbc.co.uk/2/hi/technology/4014333.stm.

177

BBC News Website,

BIBLIOGRAPHY

178

[BFGS86]

Larry Bergman, Henry Fuchs, Eric Grant, and Susan Spach. Image rendering by adaptive refinement. In SIGGRAPH ’86, pages 29–37. ACM Press, 1986.

[BFH+ 04]

Ian Buck, Tim Foley, Daniel Reiter Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. Brook for gpus: stream computing on graphics hardware. ACM Trans. Graph., 23(3):777–786, 2004.

[BFMZ94]

Gary Bishop, Henry Fuchs, Leonard McMillan, and Ellen J. Scher Zagier. Frameless rendering: double buffering considered harmful. In SIGGRAPH ’94: Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 175–176, New York, NY, USA, 1994. ACM Press.

[Bli77]

James Blinn. Models of Light Reflection for Computer Synthesized Pictures. In SIGGRAPH’77, pages 192–198. ACM Press, 1977.

[BM98]

Mark R. Bolin and Gary W. Meyer. A perceptually based adaptive sampling algorithm. In SIGGRAPH ’98, pages 299–309. ACM Press, 1998.

[BN76]

James F. Blinn and Martin E. Newell. Texture and reflection in computer generated images. Commun. ACM, 19(10):542–547, 1976.

[BS06]

Randolph Blake and Robert Sekuler. Perception. McGraw Hill, 2006.

[BTB91]

Christian Bouville, Pierre Tellier, and Kadi Bouatouch. Low sampling densities using a psychovisual approach. In EUROGRAPHICS 1991, 1991.

[BW90]

Alan Burns and A. J. Wellings. Real-time systems and their programming languages. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1990.

[BWG03]

Kavita Bala, Bruce Walter, and Donald P. Greenberg. Combining edges and points for interactive high-quality rendering. ACM Trans. Graph., 22(3):631–640, 2003.

[BWS03]

Carsten Benthin, Ingo Wald, and Philipp Slusallek. A Scalable Approach to Interactive Global Illumination. Computer Graphics Forum, 22(3):621–630, 2003. (Proceedings of Eurographics).

[BWSF06]

Carsten Benthin, Ingo Wald, Michael Scherbaum, and Heiko Friedrich. Ray Tracing on the CELL Processor. Technical Report, inTrace Realtime Ray Tracing GmbH, No inTrace-2006-001 (submitted for publication), 2006.

[Cat74]

Edwin Earl Catmull. A subdivision algorithm for computer display of curved surfaces. PhD thesis, University of Utah, 1974.

[Cat78]

Edwin Catmull. A hidden-surface algorithm with anti-aliasing. In SIGGRAPH ’78: Proceedings of the 5th annual conference on Computer graphics and interactive techniques, pages 6–11, New York, NY, USA, 1978. ACM Press.

BIBLIOGRAPHY

179

[CCC87]

Robert L. Cook, Loren Carpenter, and Edwin Catmull. The reyes image rendering architecture. In SIGGRAPH ’87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pages 95–102, New York, NY, USA, 1987. ACM Press.

[CCL02]

Kirsten Cater, Alan Chalmers, and Patrick Ledda. Selective quality rendering by exploiting human inattentional blindness: looking but not seeing. In Proceedings of the ACM symposium on Virtual reality software and technology, pages 17–24. ACM Press, 2002.

[CCW03]

Kirsten Cater, Alan Chalmers, and Gregory Ward. Detail to Attention: Exploiting Visual Tasks for Selective Rendering. In Proceedings of the Eurographics Symposium on Rendering, pages 270–280, 2003.

[CCWG88] Michael F. Cohen, Shenchang Eric Chen, John R. Wallace, and Donald P. Greenberg. A progressive refinement approach to fast radiosity image generation. In SIGGRAPH ’88, pages 75–84. ACM Press, 1988. [CDR02]

Alan Chalmers, Timothy Davis, and Erik Reinhard. AK Peters Ltd, July 2002.

[CDS+ 06]

Alan Chalmers, Kurt Debattista, Veronica Sundstedt, Peter Longhurst, and Richard Gillibrand. Rendering on demand. In EGPGV2006 - 6th Eurographics Symposium on Parallel Graphics Visualization. Eurographics, May 2006.

[Cla76]

James H. Clark. Hierarchical geometric models for visible surface algorithms. Commun. ACM, 19(10):547–554, 1976.

[Cle]

Clearspeed. http://www.clearspeed.com.

[Coo84]

Robert L. Cook. Shade trees. In SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pages 223–231, New York, NY, USA, 1984. ACM Press.

[Coo86]

Robert L. Cook. Stochastic sampling in computer graphics. ACM Trans. Graph., 5(1):51–72, 1986.

[CPC84]

Robert L. Cook, Thomas Porter, and Loren Carpenter. Distributed ray tracing. In SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pages 137–145, New York, NY, USA, 1984. ACM Press.

[CRMT91] Shenchang Eric Chen, Holly E. Rushmeier, Gavin Miller, and Douglass Turner. A progressive multi-pass method for global illumination. In SIGGRAPH ’91, pages 165–174. ACM Press, 1991. [Cro77]

Franklin C. Crow. Shadow algorithms for computer graphics. In SIGGRAPH ’77: Proceedings of the 4th annual conference on Computer graphics and interactive techniques, pages 242–248, New York, NY, USA, 1977. ACM Press.

BIBLIOGRAPHY

180

[CT81]

Robert L. Cook and Kenneth E. Torrance. A Reflectance Model for Computer Graphics. In SIGGRAPH’81, pages 307–316. ACM Press, 1981.

[Dal93]

Scott Daly. The visible differences predictor: an algorithm for the assessment of image fidelity. Digital images and human vision, pages 179–206, 1993.

[DB97]

Paul J. Diefenbach and Norman I. Badler. Multi-pass pipeline rendering: realism for dynamic environments. In SI3D ’97: Proceedings of the 1997 symposium on Interactive 3D graphics, pages 59–ff., New York, NY, USA, 1997. ACM Press.

[DBB02]

Philip Dutre, Kavita Bala, and Philippe Bekaert. Advanced Global Illumination. A. K. Peters, Ltd., 2002.

[DCWP02] Kate Devlin, Alan Chalmers, Alexander Wilkie, and Werner Purgathofer. Tone Reproduction and Physically Based Spectral Rendering. In Eurographics 2002 State of the Art Reports, 2002. [DJA+ 04]

Philip Dutre;, Henrik Wann Jensen, Jim Arvo, Kavita Bala, Philippe Bekaert, Steve Marschner, and Matt Pharr. State of the art in monte carlo global illumination. In SIGGRAPH ’04: Proceedings of the conference on SIGGRAPH 2004 course notes, page 5, New York, NY, USA, 2004. ACM Press.

[DPF01]

Reynald Dumont, Fabio Pellacini, and James A. Ferwerda. A perceptually-based texture caching algorithm for hardware-based rendering. In Proceedings of the 12th Eurographics Workshop on Rendering Techniques, pages 249–256, London, UK, 2001. Springer-Verlag.

[DPF03]

Reynald Dumont, Fabio Pellacini, and James A. Ferwerda. Perceptually-driven decision theory for interactive realistic rendering. ACM Trans. Graph., 22(2):152–181, 2003.

[DSPC05]

Kurt Debattista, Veronica Sundstedt, Francisco Pereira, and Alan Chalmers. Selective parallel rendering for high-fidelity graphics. In Proceedings of Theory and Practice of Computer Graphics 2005, pages 59–66. Eurographics Association, June 2005.

[DSSC05]

Kurt Debattista, Veronica Sundstedt, Luis Paulo Santos, and Alan Chalmers. Selective component-based rendering. In GRAPHITE, 3rd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pages 13–22. ACM Press, November 2005.

[DWWL05] Abhinav Dayal, Cliff Woolley, Benjamin Watson, and David P. Luebke. Adaptive frameless rendering. In Rendering Techniques, pages 265–275, 2005.

BIBLIOGRAPHY

181

[EC06]

Gavin Ellis and Alan Chalmers. The effect of translational ego-motion on the perception of high fidelity animations. In Spring Conference on Computer Graphics. ACM SIGGRAPH, April 2006.

[ECD06]

Gavin Ellis, Alan Chalmers, and Kurt Debattista. The effect of rotational ego-motion on the perception of high fidelity animations. In APGV 2006. APGV, July 2006.

[Fer01]

James A. Ferwerda. Elements of early vision for computer graphics. IEEE Compututer Graphics Applications, 21(5):22–33, 2001.

[FP04]

Jean-Philippe Farrugia and Bernard P´eroche. A progressive rendering algorithm using an adaptive perceptually based image metric. Comput. Graph. Forum, 23(3):605–614, 2004.

[FS93]

Thomas A. Funkhouser and Carlo H. Sequin. Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments. In SIGGRAPH ’93, pages 247–254. ACM Press, 1993.

[FTI86]

Akira Fujimoto, Takayuki Tanaka, and Kansei Iwata. Arts: Accelerated ray-tracing system. 6(4):16–26, April 1986.

[FvDFH90] James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes. Computer graphics: principles and practice (2nd ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1990. [GB99]

Enrico Gobbetti and Eric Bouvier. Time-critical multiresolution scene rendering. In VIS ’99: Proceedings of the conference on Visualization ’99, pages 123–130. IEEE Computer Society Press, 1999.

[GBP06] Pascal Gautron, Kadi Bouatouch, and Sumanta Pattanaik. Temporal radiance caching. Technical Report 1796, IRISA, Rennes, France, 2006.

[GDC05] Richard Gillibrand, Kurt Debattista, and Alan Chalmers. Cost prediction maps for global illumination. In Proceedings of Theory and Practice of Computer Graphics 2005, pages 97–104. Eurographics Association, June 2005.

[GKBP05] Pascal Gautron, Jaroslav Křivánek, Kadi Bouatouch, and Sumanta Pattanaik. Radiance cache splatting: A GPU-friendly global illumination algorithm. In Proceedings of Eurographics Symposium on Rendering, June 2005.

[Gla84] Andrew S. Glassner. Space subdivision for fast ray tracing. IEEE Computer Graphics and Applications, 4(10):15–22, October 1984.

[Gla89] Andrew S. Glassner, editor. An Introduction to Ray Tracing. Academic Press Ltd., London, UK, 1989.

[Gla95] Andrew Glassner. Principles of Digital Image Synthesis. Morgan Kaufmann, 1995.

[GLDC06] Richard Gillibrand, Peter Longhurst, Kurt Debattista, and Alan Chalmers. Cost prediction for global illumination using a fast rasterised scene preview. In AFRIGRAPH 2006, 4th International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa, pages 41–48. ACM SIGGRAPH, January 2006.

[Gou71] H. Gouraud. Continuous Shading of Curved Surfaces. IEEE Transactions on Computers, 20(6):623–628, 1971.

[Gre87] Leslie Frederick Greengard. The rapid evaluation of potential fields in particle systems. PhD thesis, Yale University, 1987.

[Gre99] Donald P. Greenberg. A framework for realistic image synthesis. Communications of the ACM, 42(8):44–53, 1999.

[GTGB84] Cindy M. Goral, Kenneth E. Torrance, Donald P. Greenberg, and Bennett Battaile. Modeling the interaction of light between diffuse surfaces. In SIGGRAPH ’84, pages 213–222. ACM Press, 1984.

[Guo98] Baining Guo. Progressive radiance evaluation using directional coherence maps. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 255–266, New York, NY, USA, 1998. ACM Press.

[GWS04] Johannes Guenther, Ingo Wald, and Philipp Slusallek. Realtime Caustics using Distributed Photon Mapping. In Proceedings of the Eurographics Symposium on Rendering, 2004.

[HA90] Paul Haeberli and Kurt Akeley. The accumulation buffer: hardware support for high-quality rendering. In SIGGRAPH ’90: Proceedings of the 17th annual conference on Computer graphics and interactive techniques, pages 309–318, New York, NY, USA, 1990. ACM Press.

[Hec90] Paul S. Heckbert. Adaptive radiosity textures for bidirectional ray tracing. In SIGGRAPH ’90, pages 145–154. ACM Press, 1990.

[Her04] Christophe Hery. Rendering evolution at Industrial Light & Magic. In Rendering Techniques, pages 19–22, 2004.

[HH84] Paul S. Heckbert and Pat Hanrahan. Beam tracing polygonal objects. In SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques, pages 119–127, New York, NY, USA, 1984. ACM Press.

[HL97] Eric Horvitz and Jed Lengyel. Perception, attention, and resources: A decision-theoretic approach to graphics rendering, 1997.

[HLHS03] Jean-Marc Hasenfratz, Marc Lapierre, Nicolas Holzschuch, and François Sillion. A survey of real-time soft shadows algorithms. In Eurographics 2003 State-of-the-Art Reports. Eurographics Association, 2003.

[HMYS01] Jörg Haber, Karol Myszkowski, Hitoshi Yamauchi, and Hans-Peter Seidel. Perceptually guided corrective splatting. In Computer Graphics Forum, volume 20, pages 142–152, 2001.

[HSA91] Pat Hanrahan, David Salzman, and Larry Aupperle. A rapid hierarchical radiosity algorithm. In SIGGRAPH ’91: Proceedings of the 18th annual conference on Computer graphics and interactive techniques, pages 197–206, New York, NY, USA, 1991. ACM Press.

[IK00] Laurent Itti and Christof Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. In Vision Research, volume 40, pages 1489–1506, 2000.

[IKN98] Laurent Itti, Christof Koch, and Ernst Niebur. A model of saliency-based visual attention for rapid scene analysis. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 20, pages 1254–1259, 1998.

[Int03] Intel. Software Developer’s Manual Volume 1: Basic Architecture. Technical report, Intel Corporation, 2003.

[Jam90] William James. The Principles of Psychology. Henry Holt, New York, 1890.

[Jen95] Henrik Wann Jensen. Importance Driven Path Tracing Using the Photon Map. In P. M. Hanrahan and W. Purgathofer, editors, Rendering Techniques ’95 (Proceedings of the Sixth Eurographics Workshop on Rendering), pages 326–335, New York, NY, 1995. Springer-Verlag.

[Jen01] Henrik Wann Jensen. Realistic Image Synthesis Using Photon Mapping. A. K. Peters, 2001.

[Kaj86] James T. Kajiya. The rendering equation. In SIGGRAPH ’86: Proceedings of the 13th annual conference on Computer graphics and interactive techniques, pages 143–150, New York, NY, USA, 1986. ACM Press.

[KBPv06] Jaroslav Křivánek, Kadi Bouatouch, Sumanta N. Pattanaik, and Jiří Žára. Making radiance and irradiance caching practical: Adaptive caching and neighbor clamping. In Tomas Akenine-Möller and Wolfgang Heidrich, editors, Rendering Techniques 2006, Eurographics Symposium on Rendering, pages 127–138, Nicosia, Cyprus, June 2006. Eurographics Association.

[Kel97] Alexander Keller. Instant radiosity. In SIGGRAPH ’97: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 49–56, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[KGBP05] Jaroslav Křivánek, Pascal Gautron, Kadi Bouatouch, and Sumanta Pattanaik. Improved radiance gradient computation. In SCCG ’05: Proceedings of the 21st spring conference on Computer graphics, pages 155–159, New York, NY, USA, 2005. ACM Press.

[KK02] Thomas Kollig and Alexander Keller. Efficient multidimensional sampling. Computer Graphics Forum, 21(3):557–563, September 2002.

[KMG99] Roland Koholka, Heinz Mayer, and Alois Goller. MPI-parallelized Radiance on SGI COW and SMP. In ParNum ’99: Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia, pages 549–558. Springer-Verlag, 1999.

[KU85] Christof Koch and Shimon Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. In Human Neurobiology, volume 4, pages 219–227, 1985.

[LCTS05] Patrick Ledda, Alan Chalmers, Tom Troscianko, and Helge Seetzen. Evaluation of tone mapping operators using a high dynamic range display. In ACM SIGGRAPH 2005, Los Angeles. ACM Press, August 2005.

[LDC05] Peter Longhurst, Kurt Debattista, and Alan Chalmers. Snapshot: A rapid technique for driving a selective global illumination renderer. In WSCG 2005 Short Papers Proceedings, 2005.

[LDC06] Peter Longhurst, Kurt Debattista, and Alan Chalmers. A GPU-based saliency map for high-fidelity selective rendering. In AFRIGRAPH 2006, 4th International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa, pages 21–29. ACM SIGGRAPH, January 2006.

[LDGC05] Peter Longhurst, Kurt Debattista, Richard Gillibrand, and Alan Chalmers. Analytic antialiasing for selective high fidelity rendering. In SIBGRAPI 2005, pages 359–366. IEEE Computer Society Press, October 2005.

[LH01] David P. Luebke and Benjamin Hallen. Perceptually-driven simplification for interactive rendering. In Proceedings of the 12th Eurographics Workshop on Rendering Techniques, pages 223–234, London, UK, 2001. Springer-Verlag.

[LHK+04] David Luebke, Mark Harris, Jens Krüger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff Woolley, and Aaron Lefohn. GPGPU: general purpose computation on graphics hardware. In SIGGRAPH ’04: Proceedings of the conference on SIGGRAPH 2004 course notes, page 33, New York, NY, USA, 2004. ACM Press.

[LMK98] B. Li, G. Meyer, and R. Klassen. A comparison of two image quality models. In SPIE, Human Vision and Electronic Imaging III, number 3299, 1998.

[Lon05] Peter Longhurst. Rapid Saliency Identification for Selectively Rendering High Fidelity Graphics. PhD thesis, University of Bristol, December 2005.

[LRU85] Mark E. Lee, Richard A. Redner, and Samuel P. Uselton. Statistically optimized sampling for distributed ray tracing. In SIGGRAPH ’85: Proceedings of the 12th annual conference on Computer graphics and interactive techniques, pages 61–68, New York, NY, USA, 1985. ACM Press.

[LS98] Greg Ward Larson and Rob Shakespeare. Rendering with Radiance: The Art and Science of Lighting Visualization. Morgan Kaufmann Publishers Inc., 1998.

[Lub95] Jeffrey Lubin. A visual discrimination model for imaging system design and evaluation. In Eli Peli, editor, Vision Models for Target Detection and Recognition, pages 245–283. World Scientific, New Jersey, 1995.

[LW93] Eric P. Lafortune and Yves D. Willems. Bidirectional Path Tracing. In 3rd International Conference on Computational Graphics and Visualization Techniques, pages 145–153, Alvor, Portugal, 1993.

[LW95] Eric P. Lafortune and Yves D. Willems. A 5D Tree to Reduce the Variance of Monte Carlo Ray Tracing. In P. M. Hanrahan and W. Purgathofer, editors, Rendering Techniques ’95 (Proceedings of the Sixth Eurographics Workshop on Rendering), pages 11–20, New York, NY, 1995. Springer-Verlag.

[LWC+02] David Luebke, Benjamin Watson, Jonathan D. Cohen, Martin Reddy, and Amitabh Varshney. Level of Detail for 3D Graphics. Elsevier Science Inc., 2002.

[MB97] Ashton E. W. Mason and Edwin H. Blake. Automatic hierarchical level of detail optimization in computer animation. Computer Graphics Forum, 16(3):C191–C199, 1997.

[MCTR98] Ann McNamara, Alan Chalmers, Tom Troscianko, and Erik Reinhard. Fidelity of graphics reconstructions: A psychophysical investigation. In Proceedings of the 9th Eurographics Rendering Workshop, pages 237–246. Springer Verlag, June 1998.

[MD02] Gerd Marmitt and Andrew T. Duchowski. Modeling Visual Attention in VR: Measuring the Accuracy of Predicted Scanpaths. In Eurographics 2002, Short Presentations, pages 217–226, 2002.

[MDCT05] Georgia Mastoropoulou, Kurt Debattista, Alan Chalmers, and Tom Troscianko. Auditory bias of visual attention for perceptually-guided selective rendering of animations. In GRAPHITE 2005, sponsored by ACM SIGGRAPH, Dunedin, New Zealand. ACM Press, December 2005.

[Mit87] Don P. Mitchell. Generating antialiased images at low sampling densities. In SIGGRAPH ’87, pages 65–72. ACM Press, 1987.

[ML92] Gary Meyer and Aihua Liu. Color spatial acuity control of a screen subdivision image synthesis algorithm. In Human Vision, Visual Processing, and Digital Display III, 1992.

[MNS00] Stephen Marsland, Ulrich Nehmzow, and Jonathan Shapiro. Novelty detection on a mobile robot using habituation. In From Animals to Animats: The 6th International Conference on Simulation of Adaptive Behaviour, 2000.

[MR98] Arien Mack and Irvin Rock. Inattentional Blindness. MIT Press, 1998.

[MRT00] Karol Myszkowski, Przemyslaw Rokita, and Takehiro Tawara. Perception-based fast rendering and antialiasing of walkthrough sequences. IEEE Transactions on Visualization and Computer Graphics, 6(4):360–379, October 2000.

[MS95] Paulo W. C. Maciel and Peter Shirley. Visual navigation of large environments using textured clusters. In SI3D ’95: Proceedings of the 1995 symposium on Interactive 3D graphics, pages 95–ff. ACM Press, 1995.

[Mys98] Karol Myszkowski. The Visible Differences Predictor: Applications to global illumination problems. In Eurographics Workshop on Rendering, pages 223–236, 1998.

[OHM+04] Carol O’Sullivan, Sarah Howlett, Rachel McDonnell, Yann Morvan, and Keith O’Conor. Perceptually adaptive graphics. In Eurographics State of the Art Reports, 2004.

[OLG+05] John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Timothy J. Purcell. A survey of general-purpose computation on graphics hardware. In Eurographics 2005, State of the Art Reports, pages 21–51, August 2005.

[OR98] Eyal Ofek and Ari Rappoport. Interactive reflections on curved objects. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 333–342, New York, NY, USA, 1998. ACM Press.

[Pat93] Sumanta N. Pattanaik. Computational Methods for Global Illumination and Visualisation of Complex 3D Environments. PhD thesis, National Institute for Software Technology, Bombay, February 1993.

[PBMH02] Timothy J. Purcell, Ian Buck, William R. Mark, and Pat Hanrahan. Ray tracing on programmable graphics hardware. ACM Transactions on Graphics, 21(3):703–712, July 2002. (Proceedings of ACM SIGGRAPH 2002).

[PDC+03] Timothy J. Purcell, Craig Donner, Mike Cammarano, Henrik Wann Jensen, and Pat Hanrahan. Photon mapping on programmable graphics hardware. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 41–50. Eurographics Association, 2003.

[Per85] Ken Perlin. An image synthesizer. In SIGGRAPH ’85: Proceedings of the 12th annual conference on Computer graphics and interactive techniques, pages 287–296, New York, NY, USA, 1985. ACM Press.

[PH04] Matt Pharr and Greg Humphreys. Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.

[Pho75] Bui Tuong Phong. Illumination for computer generated images. Communications of the ACM, 18(6):311–317, 1975.

[PMS+99] Steven Parker, William Martin, Peter-Pike J. Sloan, Peter Shirley, Brian Smits, and Charles Hansen. Interactive Ray Tracing. In 1999 Symposium on Interactive 3D Graphics, pages 119–126, 1999.

[PP98] Ingmar Peter and Georg Pietrek. Importance driven construction of photon maps. In G. Drettakis and N. Max, editors, Rendering Techniques ’98 (Proceedings of Eurographics Rendering Workshop ’98), pages 269–280, New York, NY, 1998. Springer Wien.

[PP99] Jan Prikryl and Werner Purgathofer. Overview of perceptually-driven radiosity methods. Technical Report TR-186-2-99-26, Vienna, Austria, 1999.

[PS89] James Painter and Kenneth Sloan. Antialiased ray tracing by adaptive progressive refinement. In SIGGRAPH ’89, pages 281–288. ACM Press, 1989.

[Pur87] Werner Purgathofer. A statistical method for adaptive stochastic sampling. Computers & Graphics, 11, 1987. Pergamon Press, New York.

[Pur04] Tim Purcell. Ray Tracing on a Stream Processor. PhD thesis, Stanford University, March 2004.

[RCJ98] Erik Reinhard, Alan Chalmers, and Frederik W. Jansen. Overview of parallel photorealistic graphics. In Eurographics ’98 State of the Art Reports, pages 1–25. Eurographics Association, August 1998.

[RCLL99] David Robertson, Kevin Campbell, Stephen Lau, and Terry Ligocki. Parallelization of Radiance for Real Time Interactive Lighting Visualization Walkthroughs. In ACM/IEEE Conference on Supercomputing, Portland, OR, USA, 1999. ACM Press.

[ROC97] Ronald Rensink, Kevin O’Regan, and James Clark. To see or not to see: The need for attention to perceive changes in scenes. In Investigative Ophthalmology & Visual Science, 1997.

[RPG99] Mahesh Ramasubramanian, Sumanta N. Pattanaik, and Donald P. Greenberg. A perceptually based physical error metric for realistic image synthesis. In SIGGRAPH ’99: Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 73–82, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.

[RSH05] Alexander Reshetov, Alexei Soupikov, and Jim Hurley. Multi-level ray tracing algorithm. ACM Trans. Graph., 24(3):1176–1185, 2005.

[RSSF02] Erik Reinhard, Michael Stark, Peter Shirley, and James Ferwerda. Photographic tone reproduction for digital images. In SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 267–276, New York, NY, USA, 2002. ACM Press.

[RWP+95] Holly Rushmeier, Gregory Ward, Christine Piatko, Phil Sanders, and Bert Rust. Comparing real and synthetic images: Some ideas about metrics. In Eurographics Rendering Workshop 1995, 1995.

[RWPD05] Erik Reinhard, Greg Ward, Sumanta Pattanaik, and Paul Debevec. High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. Morgan Kaufmann Publishers, December 2005.

[SCCD04] Veronica Sundstedt, Alan Chalmers, Kirsten Cater, and Kurt Debattista. Top-down visual attention for efficient rendering of task related scenes. In Vision, Modeling and Visualization, 2004.

[Sch06] Roland Schregle. Radiance photon map. http://www.ise.fhg.de/alt-aber-aktiv/radiance/photon-map/, 2006.

[SCM04] Veronica Sundstedt, Alan Chalmers, and Philippe Martinez. High fidelity reconstruction of the ancient Egyptian temple of Kalabsha. In AFRIGRAPH 2004. ACM SIGGRAPH, November 2004.

[SDL+05] Veronica Sundstedt, Kurt Debattista, Peter Longhurst, Alan Chalmers, and Tom Troscianko. Visual attention for efficient high-fidelity graphics. In Spring Conference on Computer Graphics (SCCG 2005), May 2005.

[SFWG04] William A. Stokes, James A. Ferwerda, Bruce Walter, and Donald P. Greenberg. Perceptual illumination components: a new approach to efficient, high quality global illumination rendering. ACM Trans. Graph., 23(3):742–749, 2004.

[Shi90] Peter Shirley. A ray tracing method for illumination calculation in diffuse-specular scenes. In Proceedings of Graphics Interface ’90, pages 205–212, Toronto, Ontario, 1990. Canadian Information Processing Society.

[SKDM05] Miloslaw Smyk, Shin-ichi Kinuwaki, Roman Durikovic, and Karol Myszkowski. Temporally coherent irradiance caching for high quality animation rendering. In The European Association for Computer Graphics 26th Annual Conference EUROGRAPHICS 2005, volume 24 of Computer Graphics Forum, pages 401–412, Dublin, Ireland, 2005. Blackwell.

[SKS02] Peter-Pike Sloan, Jan Kautz, and John Snyder. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. In SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 527–536, New York, NY, USA, 2002. ACM Press.

[SM03] Peter Shirley and R. Keith Morley. Realistic Ray Tracing. A. K. Peters, Ltd., Natick, MA, USA, 2003.

[SP89] François X. Sillion and Claude Puech. A general two-pass method integrating specular and diffuse reflection. In SIGGRAPH ’89, pages 335–344. ACM Press, 1989.

[SP94] François X. Sillion and Claude Puech. Radiosity and Global Illumination. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1994.

[SSH+98] Philipp Slusallek, Marc Stamminger, Wolfgang Heidrich, Jan-Christian Popp, and Hans-Peter Seidel. Composite lighting simulations with lighting networks. IEEE Computer Graphics and Applications, 18(2):22–31, 1998.

[SSW+06] Peter Shirley, Philipp Slusallek, Ingo Wald, William Mark, Gordon Stoll, Dinesh Manocha, and Abe Stephens. Interactive ray tracing. In SIGGRAPH ’06: Proceedings of the conference on SIGGRAPH 2006 course notes, 2006.

[TK96] Jay Torborg and James T. Kajiya. Talisman: commodity realtime 3D graphics for the PC. In SIGGRAPH ’96: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 353–363, New York, NY, USA, 1996. ACM Press.

[TMD+04] Takehiro Tawara, Karol Myszkowski, Kirill Dmitriev, Vlastimil Havran, Cyrille Damez, and Hans-Peter Seidel. Exploiting temporal coherence in global illumination. In SCCG ’04: Proceedings of the 20th spring conference on Computer graphics, pages 23–33, New York, NY, USA, 2004. ACM Press.

[TPWG02] Parag Tole, Fabio Pellacini, Bruce Walter, and Donald Greenberg. Interactive Global Illumination. In SIGGRAPH ’02. ACM Press, 2002.

[VF94] Douglas Voorhies and Jim Foran. Reflection vector shading hardware. In SIGGRAPH ’94: Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 163–166, New York, NY, USA, 1994. ACM Press.

[VG94] Eric Veach and Leonidas J. Guibas. Bidirectional Estimators for Light Transport. In Fifth Eurographics Workshop on Rendering, 1994.

[VG97] Eric Veach and Leonidas J. Guibas. Metropolis light transport. In SIGGRAPH ’97: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 65–76, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[WABG06] Bruce Walter, Adam Arbree, Kavita Bala, and Donald P. Greenberg. Multidimensional lightcuts. ACM Trans. Graph., 25(3):1081–1088, 2006.

[War91] Gregory J. Ward. Adaptive shadow testing for ray tracing. In 2nd Annual Eurographics Workshop on Rendering, pages 11–20, 1991.

[War92] Gregory Ward. Measuring and Modeling Anisotropic Reflection. In SIGGRAPH ’92, 19th International Conference on Computer Graphics and Interactive Techniques, pages 266–272. ACM Press, 1992.

[War94] Gregory J. Ward. The Radiance lighting simulation and rendering system. In SIGGRAPH ’94, pages 459–472. ACM Press, 1994.

[Wat93] Alan Watt. 3D Computer Graphics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1993.

[WBS02] Ingo Wald, Carsten Benthin, and Philipp Slusallek. A Simple and Practical Method for Interactive Ray Tracing of Dynamic Scenes. Technical report, Saarland University, Germany, 2002.

[WCG87] John R. Wallace, Michael F. Cohen, and Donald P. Greenberg. A two-pass solution to the rendering equation: A synthesis of ray tracing and radiosity methods. In SIGGRAPH ’87, pages 311–320. ACM Press, 1987.

[WDG02] Bruce Walter, George Drettakis, and Donald P. Greenberg. Enhancing and Optimizing the Render Cache. In Thirteenth Eurographics Workshop on Rendering, 2002.

[WDP99] Bruce Walter, George Drettakis, and Steven Parker. Interactive Rendering using the Render Cache. In Tenth Eurographics Workshop on Rendering, pages 311–320, 1999.

[WFA+05] Bruce Walter, Sebastian Fernandez, Adam Arbree, Kavita Bala, Michael Donikian, and Donald P. Greenberg. Lightcuts: a scalable approach to illumination. ACM Trans. Graph., 24(3):1098–1107, 2005.

[WH92] Gregory Ward and Paul Heckbert. Irradiance Gradients. In 3rd Annual Eurographics Workshop on Rendering, Bristol, UK, 1992.

[Whi80] Turner Whitted. An improved illumination model for shaded display. In SIGGRAPH ’80, page 14. ACM Press, 1980.

[WIK+06] Ingo Wald, Thiago Ize, Andrew Kensler, Aaron Knoll, and Steven G. Parker. Ray Tracing Animated Scenes using Coherent Grid Traversal. ACM Transactions on Graphics, 2006. (Proceedings of ACM SIGGRAPH 2006).

[Wil78] Lance Williams. Casting curved shadows on curved surfaces. In SIGGRAPH ’78: Proceedings of the 5th annual conference on Computer graphics and interactive techniques, pages 270–274, New York, NY, USA, 1978. ACM Press.

[WKB+02] Ingo Wald, Thomas Kollig, Carsten Benthin, Alexander Keller, and Philipp Slusallek. Interactive Global Illumination using Fast Ray Tracing. In 13th EUROGRAPHICS Workshop on Rendering, Pisa, Italy, 2002.

[WLH97] Tien-Tsin Wong, Wai-Shing Luk, and Pheng-Ann Heng. Sampling with Hammersley and Halton points. Journal of Graphics Tools, 2(2):9–24, 1997.

[WLWD03] Cliff Woolley, David Luebke, Benjamin Watson, and Abhinav Dayal. Interruptible rendering. In SI3D ’03: Proceedings of the 2003 symposium on Interactive 3D graphics, pages 143–151, New York, NY, USA, 2003. ACM Press.

[WPS+03] Ingo Wald, Timothy Purcell, Jörg Schmittler, Philipp Slusallek, and Carsten Benthin. Realtime Ray Tracing and its use for Global Illumination. In Eurographics 2003 State of the Art Reports, pages 85–121, 2003.

[WRC88] Gregory J. Ward, Francis M. Rubinstein, and Robert D. Clear. A ray tracing solution for diffuse interreflection. In SIGGRAPH ’88, pages 85–92. ACM Press, 1988.

[WS01] Ingo Wald and Philipp Slusallek. State-of-the-Art in Interactive Ray Tracing. In EuroGraphics 2001, State of the Art Reports, pages 21–42, Manchester, United Kingdom, September 2001.

[WSB01] Ingo Wald, Philipp Slusallek, and Carsten Benthin. Interactive Distributed Ray Tracing of Highly Complex Models. In 12th EUROGRAPHICS Workshop on Rendering, pages 274–285, London, United Kingdom, June 2001.

[WSBW01] Ingo Wald, Philipp Slusallek, Carsten Benthin, and Markus Wagner. Interactive Rendering with Coherent Raytracing. In EUROGRAPHICS 2001, pages 153–164, Manchester, United Kingdom, September 2001.

[WSS05] Sven Woop, Jörg Schmittler, and Philipp Slusallek. RPU: a programmable ray processing unit for realtime ray tracing. ACM Trans. Graph., 24(3):434–444, 2005.

[Yar67] A. L. Yarbus. Eye movements during perception of complex objects. In Eye Movements and Vision, pages 171–196, 1967.

[YPG01] Hector Yee, Sumanta Pattanaik, and Donald P. Greenberg. Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments. In ACM Transactions on Graphics, volume 20, pages 39–65, 2001.
