Multi-Core Architectures and Programming Speeding up ... - CS 12

Multi-Core Architectures and Programming

Speeding up autopano-sift with CUDA

Wolfgang Schnurrer, Christopher Dreher

Hardware/Software Co-Design University of Erlangen-Nuremberg

25 September 2009

Agenda

1 Introduction
2 Implementation
3 Results
4 Conclusion


Basics

autopano-sift automatically creates control points for groups of overlapping photographs.

generatekeys uses the SIFT algorithm to find important keypoints in every image:

* load the image into an ImageMap structure (CUDA)
* SIFT-LoweDetector (partially CUDA)
  * upscale the image with bilinear interpolation (CUDA)
  * Gaussian convolution (CUDA)
  * ... further steps are listed under "SIFT algorithm II - The 5 steps"
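The bilinear upscaling step can be sketched on the CPU as follows. This is a minimal illustration, not the project's code: it assumes a flat row-major float image instead of the ImageMap structure, and the names `sample_bilinear` and `upscale_double` are hypothetical. On the GPU, each output pixel would typically be handled by its own thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bilinear sample of a row-major w x h image at a fractional position.
 * Coordinates outside the image are clamped to the border. */
static float sample_bilinear(const float *src, int w, int h, float x, float y)
{
    if (x < 0.0f) x = 0.0f;
    if (y < 0.0f) y = 0.0f;
    if (x > (float)(w - 1)) x = (float)(w - 1);
    if (y > (float)(h - 1)) y = (float)(h - 1);

    int x0 = (int)x, y0 = (int)y;
    int x1 = (x0 + 1 < w) ? x0 + 1 : x0;
    int y1 = (y0 + 1 < h) ? y0 + 1 : y0;
    float fx = x - (float)x0, fy = y - (float)y0;

    /* Interpolate along x on the two neighbouring rows, then along y. */
    float top    = src[y0 * w + x0] * (1.0f - fx) + src[y0 * w + x1] * fx;
    float bottom = src[y1 * w + x0] * (1.0f - fx) + src[y1 * w + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}

/* Hypothetical helper: writes a (2*w) x (2*h) image into dst.
 * Output pixel (x, y) samples the source at (x/2, y/2). */
void upscale_double(const float *src, int w, int h, float *dst)
{
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    for (int y = 0; y < 2 * h; ++y)
        for (int x = 0; x < 2 * w; ++x)
            dst[y * 2 * w + x] = sample_bilinear(src, w, h, x * 0.5f, y * 0.5f);
}
```

Every output pixel depends only on four source pixels, which is why this step maps naturally onto one GPU thread per output pixel.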


Basics (continued)

autopano generates a panorama project file (.pto) from the keypoint data of all images.

* keypoint matching
  * keypoints are merged into one large kd-tree (128 dimensions)
  * for every point a "nearest neighbour" is searched (BBF, best-bin-first) and the matches are grouped into partitions¹
  * partitions are checked for geometric consistency using an algorithm called RANSAC²
* successful matches are grouped into partitions and combined into control points

¹ This search step is still the most time-consuming one and prevents real-time use of the SIFT algorithm.
² RANdom SAmple Consensus
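To make the cost of the matching step concrete, here is a minimal sketch of a nearest-neighbour search over 128-dimensional descriptors. It deliberately uses an exhaustive linear scan rather than the kd-tree with BBF search described above, so it illustrates the problem, not autopano's method. The names `dist2` and `nearest_neighbour` and the flat descriptor layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_DIM 128 /* SIFT descriptors have 128 dimensions */

/* Squared Euclidean distance between two descriptors. */
static float dist2(const float *a, const float *b)
{
    float sum = 0.0f;
    for (int i = 0; i < DESC_DIM; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Returns the index of the descriptor in db (count rows of DESC_DIM floats,
 * row-major) that is closest to query. Exhaustive scan, O(count * DESC_DIM). */
size_t nearest_neighbour(const float *query, const float *db, size_t count)
{
    assert(query != NULL && db != NULL && count > 0);
    size_t best = 0;
    float best_d = dist2(query, db);
    for (size_t i = 1; i < count; ++i) {
        float d = dist2(query, db + i * DESC_DIM);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

Running this for every keypoint against all others is quadratic in the number of keypoints, which is the motivation for an approximate tree search such as BBF.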

http://www.npac.syr.edu/projects/


SIFT algorithm I

The Scale Invariant Feature Transform (SIFT) algorithm, proposed by D. Lowe in 1999, identifies visually distinct features (keypoints) in an image and creates feature vectors. These features are invariant to image scale and rotation.

Figure: A keypoint is defined by its location (x, y), scale, and orientation


SIFT algorithm II - The 5 steps

1 Convert into intensity, up-sample and prefilter (CUDA)

Figure: Step 1

2 Build Gaussian image pyramids and calculate the DoG (Difference of Gaussians)

Figure: Difference-of-Gaussian pyramids (one octave), after [3]
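The DoG computation in step 2 amounts to subtracting adjacent blur levels within an octave. A minimal sketch, assuming the Gaussian-blurred levels already exist and are stored back to back in one flat array; the function name `build_dog` and this layout are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* gauss: `levels` blurred images of `pixels` floats each, stored back to back
 *        and ordered by increasing blur.
 * dog:   receives levels - 1 difference images in the same layout.
 * Each DoG image is the more blurred level minus the less blurred one. */
void build_dog(const float *gauss, size_t levels, size_t pixels, float *dog)
{
    assert(gauss != NULL && dog != NULL && levels >= 2);
    for (size_t l = 0; l + 1 < levels; ++l)
        for (size_t p = 0; p < pixels; ++p)
            dog[l * pixels + p] =
                gauss[(l + 1) * pixels + p] - gauss[l * pixels + p];
}
```

The subtraction is independent per pixel, so it is a straightforward candidate for a GPU kernel once the blurred levels are resident on the device.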

SIFT algorithm II - The 5 steps (continued)

3 Keypoint detection

Figure: Find local minima and maxima, from [3]
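Keypoint candidates are samples of the DoG stack that are larger or smaller than all 26 neighbours in the surrounding 3x3x3 block (8 in the same level, 9 each in the levels above and below). A minimal sketch of that test, assuming the flat DoG layout of `levels` images of w x h floats; the function name is hypothetical, and later filtering steps such as contrast thresholds are not shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* dog: `levels` images of w x h floats, stored back to back (row-major).
 * Returns true if the sample at (x, y) in level l is strictly greater or
 * strictly smaller than all 26 neighbours. The sample must not lie on the
 * border of the image or in the first or last level. */
bool is_local_extremum(const float *dog, int w, int h, int levels,
                       int x, int y, int l)
{
    assert(x >= 1 && x < w - 1 && y >= 1 && y < h - 1);
    assert(l >= 1 && l < levels - 1);

    float v = dog[((size_t)l * h + y) * w + x];
    bool is_max = true, is_min = true;

    for (int dl = -1; dl <= 1; ++dl)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (dl == 0 && dy == 0 && dx == 0)
                    continue; /* skip the sample itself */
                float n = dog[((size_t)(l + dl) * h + (y + dy)) * w + (x + dx)];
                if (n >= v) is_max = false;
                if (n <= v) is_min = false;
            }
    return is_max || is_min;
}
```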


SIFT algorithm II - The 5 steps (continued)

4 Compute feature orientations
5 Compute feature descriptors (vectors)

Figure: Keypoint descriptor, from [3]


Short Examples

Figure: Unfiltered keypoints

Figure: Matched keypoints without filtering

Short Examples (continued)

Figure: Panorama from four 640x480 images


Code Design

The source of our project was autopano-sift-c, which is part of the hugin project:

+ no .NET/Mono environment needed
- untidy code, a result of porting from the object-oriented original

High-level transparency: we reuse the source project and keep the same functions and signatures, which makes it easy to switch between CPU and GPU computation.

#ifdef CPU
#ifdef GPU
#ifdef DEBUG
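The idea of keeping one signature and selecting the implementation at compile time can be sketched as below. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the intensity formula (plain average of R, G and B) is assumed, and the GPU variant is only declared, since it would live in a separate CUDA source file. Only the switch names CPU and GPU come from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Build switches named as on the slide; default to the CPU path if none is set. */
#if !defined(CPU) && !defined(GPU)
#define CPU
#endif

/* Reference CPU path: average of R, G and B scaled to [0, 1]
 * (an assumed formula, chosen only for illustration). */
static void to_intensity_cpu(const unsigned char *rgb, float *out, size_t pixels)
{
    for (size_t i = 0; i < pixels; ++i)
        out[i] = (float)(rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2])
                 / (3.0f * 255.0f);
}

#ifdef GPU
/* Hypothetical GPU path with the same signature, implemented in a .cu file. */
void to_intensity_gpu(const unsigned char *rgb, float *out, size_t pixels);
#endif

/* Callers always use this name; the preprocessor selects the implementation. */
void to_intensity(const unsigned char *rgb, float *out, size_t pixels)
{
    assert(rgb != NULL && out != NULL);
#ifdef GPU
    to_intensity_gpu(rgb, out, pixels);
#else
    to_intensity_cpu(rgb, out, pixels);
#endif
}
```

Because the call sites never change, the same program can be built with either path, and the CPU result can serve as a reference when checking the GPU output.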


Code Snippets

libgeneratekeys_cuda.so is our shared library, which links to the existing autopano-sift-c.

    #ifdef GPU
    CUDA_vars globalpointer = CUDA_init_struct(pic->width, pic->height);
    DisplayImage_ConvertToImageMapDevice_cuda(pic);
    #endif //GPU

    // the old call, which downloaded the ImageMap again
    // ImageMap* picMap1 = DisplayImage_ConvertToImageMap_cuda(pic);
    #ifdef DEBUG
    ImageMap* picMap1 = CUDA_Download_D2H_ImageMap();
    #endif //DEBUG

    #ifdef CPU
    CPUTIME_START
    ImageMap* picMap = DisplayImage_ConvertToImageMap(pic);
    CPUTIME_STOP
    #endif //CPU


Speedup

Timings (in seconds) for the computation of a 2816x2112 image:

Step                          CPU        GPU                              Factor
(1) ImageMap                  0.100000   0.000801                         125
(2) scaleDouble               0.320000   0.213126                         1.5
(3) prefiltering with Gauss   6.510000   0.203379 + 0.202111 = 0.405490   16
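The factors in the table are the CPU time divided by the GPU time. The small sketch below reproduces that arithmetic and also combines several steps by dividing the summed CPU time by the summed GPU time; the function names are hypothetical, and the combined figure is derived here from the three listed steps only, not a measurement from the slides.

```c
#include <assert.h>

/* Speedup factor: how many times faster the GPU run is than the CPU run. */
double speedup(double cpu_seconds, double gpu_seconds)
{
    assert(gpu_seconds > 0.0);
    return cpu_seconds / gpu_seconds;
}

/* Speedup over several steps: total CPU time divided by total GPU time. */
double combined_speedup(const double *cpu, const double *gpu, int steps)
{
    double cpu_total = 0.0, gpu_total = 0.0;
    for (int i = 0; i < steps; ++i) {
        cpu_total += cpu[i];
        gpu_total += gpu[i];
    }
    return speedup(cpu_total, gpu_total);
}
```

For the three steps above, the combined factor comes out at roughly 11. The factor of 125 for ImageMap contributes little to it, because that step takes only 0.1 s on the CPU, while the Gaussian prefiltering dominates the total.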


Plot

Figure: GPU vs. CPU timings (time in s over picture size: 600x480, 800x600, 1024x768)


Demo

Figure: Live demo: how to make a nicely stitched panorama image


Impressions

* image processing is a large field for parallel computation
* missing memory management leads to hours of debugging
* porting a complete project to CUDA and evaluating the results takes far more time than the one week we had


References

[1] M. Brown and D.G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, 2007.
[2] S. Heymann, K. Müller, A. Smolic, B. Fröhlich, and T. Wiegand. SIFT implementation and optimization for general-purpose GPU. In Proceedings of the International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2007.
[3] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[4] S.N. Sinha, J.M. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. In EDGE, Workshop on Edge Computing Using New Commodity Architectures, volume 278. Citeseer, 2006.
[5] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
[6] C. Wu. SiftGPU. Web: http://cs.unc.edu/~ccwu/siftgpu.

Questions

