Multi-Core Architectures and Programming

Speeding up autopano-sift with CUDA

Wolfgang Schnurrer, Christopher Dreher

Hardware/Software Co-Design University of Erlangen-Nuremberg

25. September 2009


Agenda

1 Introduction
2 Implementation
3 Results
4 Conclusion



Basics

autopano-sift automatically creates control points for groups of overlapping photographs.

generatekeys uses the SIFT algorithm to find important keypoints in every image:

* load the image into an ImageMap structure (CUDA)
* SIFT-LoweDetector (partially CUDA)
  * upscale the image with bilinear interpolation (CUDA)
  * Gaussian convolution (CUDA)
  * ... further steps are listed on the SIFT algorithm slides
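The upscaling step can be illustrated on the CPU. The following is a minimal C sketch, not the project's own scaleDouble code: the names `sample_bilinear` and `upscale2x_bilinear` and the row-major float image layout are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: sample a row-major float image at a fractional
   position with bilinear interpolation. Coordinates are clamped to the
   image border. */
static float sample_bilinear(const float *src, int w, int h, float x, float y)
{
    if (x < 0.0f) x = 0.0f;
    if (y < 0.0f) y = 0.0f;
    if (x > (float)(w - 1)) x = (float)(w - 1);
    if (y > (float)(h - 1)) y = (float)(h - 1);
    int x0 = (int)x, y0 = (int)y;
    int x1 = x0 + 1 < w ? x0 + 1 : x0;
    int y1 = y0 + 1 < h ? y0 + 1 : y0;
    float fx = x - (float)x0, fy = y - (float)y0;
    float top = src[y0 * w + x0] * (1.0f - fx) + src[y0 * w + x1] * fx;
    float bot = src[y1 * w + x0] * (1.0f - fx) + src[y1 * w + x1] * fx;
    return top * (1.0f - fy) + bot * fy;
}

/* Writes a (2w x 2h) image into dst. Output pixel (X, Y) samples the
   source at (X/2, Y/2). */
void upscale2x_bilinear(const float *src, int w, int h, float *dst)
{
    assert(src && dst && w > 0 && h > 0);
    for (int Y = 0; Y < 2 * h; ++Y)
        for (int X = 0; X < 2 * w; ++X)
            dst[Y * (2 * w) + X] =
                sample_bilinear(src, w, h, X * 0.5f, Y * 0.5f);
}
```

Every output pixel depends only on a few source pixels and on no other output pixel, which is what makes this step a natural fit for one GPU thread per output pixel.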


Basics (continued)

autopano generates a panorama project file (.pto) from the keypoint data of all images.

* keypoint matching
  * keypoints are merged into one large kd-tree (128 dimensions)
  * for every point a "nearest neighbour" is searched (BBF, best-bin-first) and the matches are grouped in partitions¹
  * partitions are checked for geometric consistency using an algorithm called RANSAC²
* successful matches are grouped into partitions and combined into control points

¹ this search step is still the most time-consuming one and prevents real-time use of the SIFT algorithm
² RANdom SAmple Consensus
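To make the cost of the search step concrete, here is a minimal C sketch of the nearest-neighbour query. It deliberately swaps the kd-tree with BBF for a brute-force linear scan, which finds the exact nearest neighbour but compares the query against every stored descriptor. The names `dist2` and `nearest_neighbour` are hypothetical and not taken from autopano-sift.

```c
#include <assert.h>
#include <float.h>

#define DESC_DIM 128 /* descriptor length, as stated on the slide */

/* Squared Euclidean distance between two descriptors. */
static float dist2(const float *a, const float *b)
{
    float sum = 0.0f;
    for (int i = 0; i < DESC_DIM; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Brute-force stand-in for the kd-tree/BBF search: returns the index of
   the descriptor in `set` (n rows of DESC_DIM floats) closest to `query`. */
int nearest_neighbour(const float *query, const float *set, int n)
{
    assert(query && set && n > 0);
    int best = 0;
    float best_d = FLT_MAX;
    for (int i = 0; i < n; ++i) {
        float d = dist2(query, set + i * DESC_DIM);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

Matching every keypoint this way costs one 128-dimensional distance per pair of keypoints, which is why an approximate tree search such as BBF is used instead.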

http://www.npac.syr.edu/projects/


SIFT algorithm I

The Scale-Invariant Feature Transform (SIFT) algorithm, proposed by D. Lowe in 1999, identifies visually distinct features (keypoints) in an image and creates feature vectors for them. These features are invariant to image scale and rotation.

Figure: the keypoint is defined by its location (x, y), scale and orientation


SIFT algorithm II - The 5 steps

1 Convert into intensity, up-sample and prefilter (CUDA)

Figure: step 1

2 Build Gaussian image pyramids and calculate the difference of Gaussians (DoG)

Figure: difference-of-Gaussian pyramids (one octave), after [3]
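Step 2 can be sketched in a few lines of C, assuming the Gaussian-blurred images of one octave are already available. The function name `build_dog` and the flat memory layout (all levels stored back to back) are assumptions for illustration only.

```c
#include <assert.h>

/* Difference of Gaussians for one octave.
   `gauss` holds `levels` blurred images of n = w*h floats each, ordered
   by increasing blur. `dog` receives levels-1 images with
   dog[s] = gauss[s + 1] - gauss[s]. */
void build_dog(const float *gauss, int levels, int n, float *dog)
{
    assert(gauss && dog && levels >= 2 && n > 0);
    for (int s = 0; s + 1 < levels; ++s)
        for (int i = 0; i < n; ++i)
            dog[s * n + i] = gauss[(s + 1) * n + i] - gauss[s * n + i];
}
```

The subtraction is a pure per-pixel operation, so the expensive part of this step is producing the blurred levels, not the difference itself.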


SIFT algorithm II - The 5 steps (continued)

3 Keypoint detection

Figure: find local minima and maxima, from [3]
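The extremum test of step 3 compares a DoG sample with its 26 neighbours: 8 in the same level and 9 each in the levels above and below, as described in [3]. A minimal C sketch follows. The name `is_extremum` and the memory layout are assumptions, and the later filtering of weak or edge-like candidates is left out.

```c
#include <assert.h>

/* Returns 1 if the sample at (x, y) in DoG level s is strictly larger or
   strictly smaller than all 26 neighbours in the 3x3x3 block spanning
   levels s-1, s and s+1, and 0 otherwise.
   `dog` holds `levels` images of w*h floats each. */
int is_extremum(const float *dog, int levels, int w, int h,
                int s, int x, int y)
{
    assert(dog && s >= 1 && s + 1 < levels);
    assert(x >= 1 && x + 1 < w && y >= 1 && y + 1 < h);
    float v = dog[(s * h + y) * w + x];
    int is_max = 1, is_min = 1;
    for (int ds = -1; ds <= 1; ++ds)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (ds == 0 && dy == 0 && dx == 0)
                    continue; /* skip the sample itself */
                float n = dog[((s + ds) * h + (y + dy)) * w + (x + dx)];
                if (n >= v) is_max = 0;
                if (n <= v) is_min = 0;
            }
    return is_max || is_min;
}
```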


SIFT algorithm II - The 5 steps (continued)

4 Compute feature orientations
5 Compute feature descriptors (vectors)

Figure: keypoint descriptor, from [3]
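The 128 dimensions of the descriptor follow from the layout described in [3]: a 4x4 grid of cells around the keypoint, each holding an 8-bin orientation histogram. The C sketch below only shows this layout. The `Keypoint` struct and both function names are hypothetical, and the Gaussian weighting and interpolation between bins used in [3] are omitted.

```c
#include <assert.h>

enum { CELLS = 4, BINS = 8, DESC_LEN = CELLS * CELLS * BINS }; /* 4*4*8 = 128 */

/* Hypothetical keypoint record: location, scale, orientation (radians)
   and the 128-element descriptor. */
typedef struct {
    float x, y, scale, orientation;
    float desc[DESC_LEN];
} Keypoint;

/* Flat index of orientation bin `bin` in the histogram of cell (row, col). */
int desc_index(int row, int col, int bin)
{
    assert(row >= 0 && row < CELLS && col >= 0 && col < CELLS);
    assert(bin >= 0 && bin < BINS);
    return (row * CELLS + col) * BINS + bin;
}

/* Adds one gradient sample of the given magnitude to a histogram bin. */
void desc_accumulate(Keypoint *kp, int row, int col, int bin, float magnitude)
{
    assert(kp);
    kp->desc[desc_index(row, col, bin)] += magnitude;
}
```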


Short Examples

Figure: unfiltered keypoints

Figure: matched keypoints without filtering

Short Examples (continued)

Figure: panorama from four 640x480 images


Code Design

The source of our project was autopano-sift-c, which is part of the hugin project.

+ no .NET/Mono environment needed
- inelegant code, a result of porting from the object-oriented original

High-level transparency: we reuse the source project and keep the same functions and signatures, so that switching between CPU and GPU computation is easy. The switch is made at compile time:

#ifdef CPU
#ifdef GPU
#ifdef DEBUG
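How such a compile-time switch keeps one signature for both back ends can be sketched as follows. This is an illustration under assumptions, not project code: `scale`, `scale_cpu` and `scale_gpu` are hypothetical stand-ins, the GPU variant is faked on the host, and using DEBUG to cross-check GPU against CPU results is only one plausible use of that flag.

```c
#include <assert.h>

/* Select the back end at compile time, e.g. with -DGPU. CPU is the
   default in this sketch. */
#if !defined(CPU) && !defined(GPU)
#define CPU
#endif

/* Hypothetical stand-ins for a CPU routine and its CUDA counterpart.
   Both share one signature, so callers do not change. */
static int scale_cpu(int v) { return 2 * v; }
#ifdef GPU
static int scale_gpu(int v) { return 2 * v; } /* would launch a kernel */
#endif

int scale(int v)
{
    assert(v >= 0);
#ifdef GPU
    int result = scale_gpu(v);
#ifdef DEBUG
    /* In debug builds, cross-check the GPU result against the CPU path. */
    assert(result == scale_cpu(v));
#endif
    return result;
#else
    return scale_cpu(v);
#endif
}
```

Callers only ever see `scale`, so the same calling code builds against either back end, for example with `-DGPU -DDEBUG` while porting and with plain `-DGPU` afterwards.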


Code Snippets

libgeneratekeys_cuda.so is our shared library, which links to the existing autopano-sift-c.

    #ifdef GPU
    CUDA_vars globalpointer = CUDA_init_struct(pic->width, pic->height);
    DisplayImage_ConvertToImageMapDevice_cuda(pic);
    #endif //GPU

    // the old call, which downloaded the ImageMap back to the host again
    // ImageMap* picMap1 = DisplayImage_ConvertToImageMap_cuda(pic);
    #ifdef DEBUG
    ImageMap* picMap1 = CUDA_Download_D2H_ImageMap();
    #endif //DEBUG

    #ifdef CPU
    CPUTIME_START
    ImageMap* picMap = DisplayImage_ConvertToImageMap(pic);
    CPUTIME_STOP
    #endif //CPU


Speedup

Timings (in seconds): computation of a 2816x2112 image

Step                          CPU        GPU        Factor
(1) ImageMap                  0.100000   0.000801   125
(2) scaleDouble               0.320000   0.213126   1.5
(3) prefiltering with Gauss   6.510000   0.405490   16

GPU time for (3): 0.203379 + 0.202111 = 0.405490
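The factors are simply CPU time divided by GPU time. A small C sketch of that arithmetic follows; the function names `speedup` and `total_speedup` are hypothetical. It also gives the combined factor when the three steps run one after another, a figure the slide itself does not state.

```c
#include <assert.h>

/* Speedup factor: CPU time divided by GPU time (both in seconds). */
double speedup(double cpu_s, double gpu_s)
{
    assert(cpu_s > 0.0 && gpu_s > 0.0);
    return cpu_s / gpu_s;
}

/* Overall factor for n steps executed one after another. */
double total_speedup(const double *cpu_s, const double *gpu_s, int n)
{
    assert(cpu_s && gpu_s && n > 0);
    double cpu = 0.0, gpu = 0.0;
    for (int i = 0; i < n; ++i) {
        cpu += cpu_s[i];
        gpu += gpu_s[i];
    }
    return speedup(cpu, gpu);
}
```

With the three measured steps, the per-step factors come out at roughly 124.8, 1.5 and 16.1. The combined factor is 6.93 s / 0.619417 s, about 11.2, because the overall result is dominated by the steps that still take the most time on the GPU.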


Plot: time in s (axis from 0.1 to 0.9) over picture size (600x480, 800x600, 1024x768), one curve each for CPU and GPU

Figure: GPU vs. CPU timings


Demo

Figure: live demo: how to make a nice stitched panorama image


Impressions

* image processing is a large field of application for parallel computation
* missing memory management leads to hours of debugging
* porting a complete project to CUDA and evaluating the results takes a lot more time than the one week we had


References

[1] M. Brown and D.G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, 2007.
[2] S. Heymann, K. Muller, A. Smolic, B. Frohlich, and T. Wiegand. SIFT implementation and optimization for general-purpose GPU. In Proceedings of the International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2007.
[3] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[4] S.N. Sinha, J.M. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. In EDGE, Workshop on Edge Computing Using New Commodity Architectures, volume 278. Citeseer, 2006.
[5] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
[6] C. Wu. SiftGPU. Web: http://cs.unc.edu/~ccwu/siftgpu.


Questions
