Multi-Core Architectures and Programming
Speeding up autopano-sift with CUDA
Wolfgang Schnurrer, Christopher Dreher
[email protected] [email protected]
Hardware/Software Co-Design, University of Erlangen-Nuremberg
25 September 2009
University of Erlangen-Nuremberg Wolfgang Schnurrer Christopher Dreher
Agenda
1 Introduction
2 Implementation
3 Results
4 Conclusion
Basics
autopano-sift automatically creates control points for groups of overlapping photographs. generatekeys uses the SIFT algorithm to find important keypoints in every image:
* load the image into an ImageMap structure (CUDA)
* SIFT-LoweDetector (partially CUDA)
  * upscale the image with bilinear interpolation (CUDA)
  * Gaussian convolution (CUDA)
  * ... further steps: see "SIFT algorithm" below
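The upscaling step can be sketched in plain C as a CPU reference. This is a minimal sketch under stated assumptions, not the project's scaleDouble: the function name scale_double_bilinear is hypothetical, the image is assumed to be a row-major array of floats, and output pixel (X, Y) is assumed to sample the source at (X/2, Y/2) with the last row and column clamped to the border. A CUDA version would typically compute one output pixel per thread with the same formula.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CPU reference for doubling a grey-value image with bilinear
 * interpolation. src is w x h, row-major; the result is (2w) x (2h) and must
 * be freed by the caller. Output pixel (X, Y) samples the source at
 * (X / 2, Y / 2); the last row and column are clamped to the image border. */
static float *scale_double_bilinear(const float *src, int w, int h)
{
    assert(src != NULL && w > 0 && h > 0);
    int W = 2 * w, H = 2 * h;
    float *dst = malloc((size_t)W * (size_t)H * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (int Y = 0; Y < H; ++Y) {
        int y0 = Y / 2;
        int y1 = (y0 + 1 < h) ? y0 + 1 : y0;     /* clamp at the bottom edge */
        float fy = (Y % 2) ? 0.5f : 0.0f;        /* fractional part of Y / 2 */
        for (int X = 0; X < W; ++X) {
            int x0 = X / 2;
            int x1 = (x0 + 1 < w) ? x0 + 1 : x0; /* clamp at the right edge */
            float fx = (X % 2) ? 0.5f : 0.0f;
            /* interpolate along x in rows y0 and y1, then along y */
            float top = (1.0f - fx) * src[y0 * w + x0] + fx * src[y0 * w + x1];
            float bot = (1.0f - fx) * src[y1 * w + x0] + fx * src[y1 * w + x1];
            dst[Y * W + X] = (1.0f - fy) * top + fy * bot;
        }
    }
    return dst;
}
```

For a 2x2 input {0, 1, 2, 3}, the first output row is 0, 0.5, 1, 1.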
Basics (continued)
autopano generates the panorama project file (.pto) from the keypoint data of all images:
* keypoint matching
  * the keypoints are merged into one large kd-tree (128 dimensions)
  * for every point a "nearest neighbour" is searched (Best-Bin-First, BBF) and the matches are grouped into partitions¹
  * the partitions are checked for geometric consistency with an algorithm called RANSAC²
* successful matches are grouped into partitions and combined into control points
¹ This search step is still the most time-consuming one and prevents real-time use of the SIFT algorithm.
² RANdom SAmple Consensus
http://www.npac.syr.edu/projects/
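For reference, the exact search that BBF approximates can be written as an exhaustive scan over all 128-dimensional descriptors. This sketch is not autopano's kd-tree/BBF code: the names are hypothetical and the descriptors are assumed to be stored back to back as floats. It also returns the second-best distance, because Lowe [3] accepts a match only when the nearest neighbour is clearly closer than the second nearest; whether autopano applies exactly that test is not stated here. The cost of this scan grows linearly with the number of stored keypoints per query, which is what the kd-tree with BBF avoids.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define DESC_DIM 128  /* SIFT descriptor length */

static float squared_distance(const float *a, const float *b)
{
    float d2 = 0.0f;
    for (int i = 0; i < DESC_DIM; ++i) {
        float d = a[i] - b[i];
        d2 += d * d;
    }
    return d2;
}

/* Exhaustive nearest-neighbour search (hypothetical helper). db holds n
 * descriptors back to back (n * DESC_DIM floats). Returns the index of the
 * closest descriptor and writes the squared distances of the best and the
 * second-best match. */
static int nearest_neighbour(const float *query, const float *db, size_t n,
                             float *best_d2, float *second_d2)
{
    assert(query != NULL && db != NULL && n > 0);
    int best = -1;
    float b1 = FLT_MAX, b2 = FLT_MAX;
    for (size_t i = 0; i < n; ++i) {
        float d2 = squared_distance(query, db + i * DESC_DIM);
        if (d2 < b1) {          /* new best: old best becomes second best */
            b2 = b1;
            b1 = d2;
            best = (int)i;
        } else if (d2 < b2) {
            b2 = d2;
        }
    }
    *best_d2 = b1;
    *second_d2 = b2;
    return best;
}
```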
SIFT algorithm I
The Scale-Invariant Feature Transform (SIFT) algorithm, proposed by D. Lowe in 1999, identifies visually distinct features (keypoints) in an image and creates a feature vector for each of them. These features are invariant to image scale and rotation.
Figure: a keypoint is defined by its location (x, y), its scale and its orientation
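A keypoint record as described above could look like this in C. The struct and helper names are hypothetical (autopano-sift-c defines its own types); the 128-entry descriptor corresponds to the 4x4 grid of 8-bin orientation histograms described by Lowe [3].

```c
#include <assert.h>
#include <string.h>

#define SIFT_DESCRIPTOR_LEN 128  /* 4 x 4 cells x 8 orientation bins, as in [3] */

/* Hypothetical keypoint record; the project's own struct may differ. */
typedef struct {
    float x, y;          /* location in image coordinates */
    float scale;         /* Gaussian sigma at which the keypoint was detected */
    float orientation;   /* dominant gradient direction in radians */
    float descriptor[SIFT_DESCRIPTOR_LEN];  /* feature vector, filled in step 5 */
} Keypoint;

static Keypoint keypoint_make(float x, float y, float scale, float orientation)
{
    assert(scale > 0.0f);
    Keypoint kp;
    kp.x = x;
    kp.y = y;
    kp.scale = scale;
    kp.orientation = orientation;
    /* all-zero bytes represent 0.0f for IEEE 754 floats */
    memset(kp.descriptor, 0, sizeof kp.descriptor);
    return kp;
}
```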
SIFT algorithm II - The 5 steps
1 Convert to intensity, upsample and prefilter (CUDA)
Figure: step 1
2 Build Gaussian image pyramids and compute the Difference of Gaussians (DoG)
Figure: Difference-of-Gaussian pyramids (one octave), after [3]
SIFT algorithm II - The 5 steps (cont.)
3 Keypoint detection
Figure: finding local minima and maxima, from [3]
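The extremum test of step 3 compares a DoG sample with its 26 neighbours: 8 in the same layer and 9 in each of the layers above and below [3]. A minimal sketch with a hypothetical function name, assuming three equally sized row-major DoG layers and an interior pixel; ties count as "not an extremum", and the contrast and edge-response checks that follow in [3] are omitted.

```c
#include <assert.h>

/* Returns 1 if the DoG sample at (x, y) of layer cur is strictly greater or
 * strictly smaller than all 26 neighbours in below, cur and above.
 * All three layers are w x h, row-major; (x, y) must be an interior pixel. */
static int is_local_extremum(const float *below, const float *cur,
                             const float *above, int w, int h, int x, int y)
{
    assert(x >= 1 && x < w - 1 && y >= 1 && y < h - 1);
    const float *layers[3] = { below, cur, above };
    float v = cur[y * w + x];
    int is_max = 1, is_min = 1;

    for (int l = 0; l < 3; ++l)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (l == 1 && dx == 0 && dy == 0)
                    continue;               /* skip the sample itself */
                float n = layers[l][(y + dy) * w + (x + dx)];
                if (n >= v) is_max = 0;
                if (n <= v) is_min = 0;
            }
    return is_max || is_min;
}
```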
SIFT algorithm II - The 5 steps (cont.)
4 Compute feature orientations
5 Compute feature descriptors (vectors)
Figure: keypoint descriptor, from [3]
Short Examples
Figure: unfiltered keypoints
Figure: matched keypoints without filtering
Short Examples (cont.)
Figure: panorama stitched from four 640x480 images
Code Design
* the source of our project was autopano-sift-c, which is part of the hugin project
  + no .NET/Mono environment needed
  - inelegant code, a result of porting it from the object-oriented original
* high-level transparency: reuse the source project and keep the same functions and signatures, so that switching between CPU and GPU computation is easy
* compile-time switches: #ifdef CPU, #ifdef GPU, #ifdef DEBUG
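The "same functions and signatures" approach can be illustrated with a small, self-contained sketch. All type and function names here are hypothetical stand-ins, not the project's API; the CPU path converts an RGB picture to grey values with a plain average of the three channels, and that weighting is also an assumption. Building with -DGPU would route the same call to a separately compiled CUDA implementation, which is only declared here, so callers never change.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the project's picture and ImageMap types. */
typedef struct {
    int width, height;
    const unsigned char *rgb;   /* width * height * 3 bytes, interleaved R, G, B */
} Picture;

typedef struct {
    int xdim, ydim;
    float *values;              /* grey values in [0, 1], row-major */
} GreyMap;

/* CPU path: plain average of R, G and B (the weighting is an assumption). */
static GreyMap *convert_to_greymap_cpu(const Picture *pic)
{
    assert(pic != NULL && pic->width > 0 && pic->height > 0);
    GreyMap *map = malloc(sizeof *map);
    if (map == NULL)
        return NULL;
    size_t n = (size_t)pic->width * (size_t)pic->height;
    map->xdim = pic->width;
    map->ydim = pic->height;
    map->values = malloc(n * sizeof *map->values);
    if (map->values == NULL) {
        free(map);
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        const unsigned char *p = pic->rgb + 3 * i;
        map->values[i] = (p[0] + p[1] + p[2]) / (3.0f * 255.0f);
    }
    return map;
}

#ifdef GPU
/* Hypothetical CUDA implementation with the identical signature,
 * compiled separately (e.g. by nvcc into the shared library). */
GreyMap *convert_to_greymap_gpu(const Picture *pic);
#define convert_to_greymap convert_to_greymap_gpu
#else
#define convert_to_greymap convert_to_greymap_cpu
#endif

static void greymap_free(GreyMap *map)
{
    if (map != NULL) {
        free(map->values);
        free(map);
    }
}
```

Callers always write `GreyMap *m = convert_to_greymap(&pic);`; only the build flag decides which implementation runs.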
Code Snippets
libgeneratekeys_cuda.so is our shared library, which links to the existing autopano-sift-c:
    #ifdef GPU
    CUDA_vars globalpointer = CUDA_init_struct(pic->width, pic->height);
    DisplayImage_ConvertToImageMapDevice_cuda(pic);
    #endif //GPU
    // the old call, which downloaded the ImageMap back to the host:
    //ImageMap* picMap1 = DisplayImage_ConvertToImageMap_cuda(pic);
    #ifdef DEBUG
    ImageMap* picMap1 = CUDA_Download_D2H_ImageMap();
    #endif //DEBUG
    #ifdef CPU
    CPUTIME_START
    ImageMap* picMap = DisplayImage_ConvertToImageMap(pic);
    CPUTIME_STOP
    #endif //CPU
Speedup
Timings (in seconds) for a 2816x2112 image:
(1) ImageMap: CPU 0.100000, GPU 0.000801 (factor 125)
(2) scaleDouble: CPU 0.320000, GPU 0.213126 (factor 1.5)
(3) Gaussian prefiltering: CPU 6.510000, GPU 0.203379 + 0.202111 = 0.405490 (factor 16)
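The factors above are the ratio of CPU time to GPU time; worked out for the three measured steps (rounded):

$$
\begin{aligned}
S &= \frac{t_\mathrm{CPU}}{t_\mathrm{GPU}} \\
S_{(1)} &= \frac{0.100000}{0.000801} \approx 124.8 \\
S_{(2)} &= \frac{0.320000}{0.213126} \approx 1.50 \\
S_{(3)} &= \frac{6.510000}{0.203379 + 0.202111} = \frac{6.510000}{0.405490} \approx 16.05
\end{aligned}
$$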
Plot
Figure: GPU vs. CPU timings (time in s over picture size: 600x480, 800x600, 1024x768)
Demo
Figure: live demo: how to make a nicely stitched panorama image
Impressions
* image processing is a big field for parallel computation
* missing memory management leads to hours of debugging
* porting a complete project to CUDA and evaluating the results takes much more time than the one week we had
References
[1] M. Brown and D. G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, 2007.
[2] S. Heymann, K. Müller, A. Smolic, B. Fröhlich, and T. Wiegand. SIFT implementation and optimization for general-purpose GPU. In Proceedings of the International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2007.
[3] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[4] S. N. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. In EDGE, Workshop on Edge Computing Using New Commodity Architectures, volume 278, 2006.
[5] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
[6] C. Wu. SiftGPU. http://cs.unc.edu/~ccwu/siftgpu
Questions