On the Performances of Computer Vision Algorithms on Mobile Platforms

S. Battiato(a), G. M. Farinella(a), E. Messina(a), G. Puglisi(a), D. Ravì(a), A. Capra(b), V. Tomaselli(b)

(a) Image Processing Lab, University of Catania, Viale A. Doria, Catania, Italy (http://iplab.dmi.unict.it)
(b) AST - Computer Vision, STMicroelectronics, Catania, Italy

ABSTRACT
Computer Vision enables mobile devices to extract the meaning of the observed scene from the information acquired with the onboard sensor cameras. Nowadays, there is a growing interest in Computer Vision algorithms able to work on mobile platforms (e.g., phone cameras, point-and-shoot cameras, etc.). Indeed, bringing Computer Vision capabilities to mobile devices opens new opportunities in different application contexts. The implementation of vision algorithms on mobile devices is still a challenging task, since these devices have poor image sensors and optics as well as limited processing power. In this paper we consider different algorithms covering classic Computer Vision tasks: keypoint extraction, face detection, and image segmentation. Several tests have been performed to compare the performances of the involved mobile platforms: Nokia N900, LG Optimus One, and Samsung Galaxy SII.

Keywords: Mobile Devices, Computer Vision, Face Detection, Image Segmentation, Feature Extraction
1. INTRODUCTION
In recent years there has been a growing interest in new technologies to be employed in the context of mobile devices. Although today's mobile devices (e.g., smartphones, tablets, etc.) are still limited in terms of resources (e.g., processor speed, available RAM, etc.), novel Computational Photography solutions are available to build appealing imaging applications that could not be realized before.1, 2 The main idea is to overcome the limitations of traditional imaging devices by using computational methods which can exploit the different inputs offered by a mobile device (e.g., low level data such as the Bayer pattern, the GPS position, etc.).3 Since different cameras are usually embedded in devices of the new generation, Computer Vision algorithms will be extremely useful in many mobile applications of the near future. For example, visual tracking can be exploited to interact with video games, and the recognition of visual content could help in building new applications in the context of cultural heritage (e.g., giving back information on a recognized archeological site). The main contribution of this work is the porting and testing of Computer Vision algorithms on mobile platforms. Specifically, different algorithms covering classic tasks in Computer Vision have been considered: keypoint extraction, face detection, and image segmentation. The porting has been performed considering the Maemo and Android operating systems, which run on the Nokia N900, LG Optimus One, and Samsung Galaxy SII. These operating systems have been considered because they can be easily extended with customized libraries and/or programs, and they provide a standardized and fairly widespread API (Application Program Interface). It is worth noting that the considered Computer Vision algorithms could be further optimized to properly work on low resource devices.
For instance, the FCAM library,4 available for the Nokia N900 smartphone, allows developers to interact with the low level algorithms (e.g., demosaicing, white balancing, denoising, etc.) and data (Bayer pattern) involved in the imaging pipeline.3 In this way a better algorithmic design, taking into account resource-constrained devices, can be achieved. Comparative tests have been performed to assess both quantitatively and qualitatively the performances of the considered Computer Vision algorithms on the aforementioned mobile devices. The remainder of the paper is organized as follows: Section 2 introduces the considered operating systems and computational platforms, Section 2.1 briefly reviews the Computer Vision algorithms used in our tests, and Section 2.2 reports experiments and discusses the performances of the considered mobile devices. Finally, Section 3 concludes the paper with avenues for further research.
2. OPERATING SYSTEMS AND COMPUTATIONAL PLATFORMS
The Maemo Operating System is based on the GNU/Linux kernel. This means that many development tools (such as gcc, make, etc.) which are generally used on a desktop computer are integrated in Maemo OS. The porting of useful Computer Vision libraries, such as the OpenCV Library,5 is hence straightforward. The framework used to write new applications on Maemo OS is Qt;6 it is cross platform, written in C++, and adds a layer of abstraction to access low level functions (GUI, network, GPS, etc.). Qt applications can be easily cross-compiled for many different platforms, such as Maemo/MeeGo, Symbian, Windows Mobile, desktop PCs, etc. The Nokia N900 mobile phone has been used for testing OpenCV algorithms on Maemo OS. This device has a high-end OMAP 3430 ARM Cortex-A8 as main processor, running at 600 MHz. The GPU is the PowerVR SGX 530, which supports OpenGL ES 2.0. The TMS320C64x processor, working at 430 MHz, is used to run the image processing (camera), audio processing and data transmission. The system has 256 MB of dedicated high performance RAM (Mobile DDR) paired with access to 768 MB of swap space managed by the Maemo OS. This provides a total virtual memory of 1 GB. Like Maemo, the Android Operating System is also based on a Linux kernel. Android is currently one of the world's best-selling smartphone platforms and has a large community of developers; there are currently over 200,000 apps available on its market. Applications for the Android platform are Java based and run on the Dalvik virtual machine featuring JIT compilation, i.e., a continuous translation and caching of code to minimize performance degradation. Unlike Maemo, the Android OS does not have native support for the full set of standard GNU libraries, which makes it difficult to port existing GNU/Linux applications; it has, instead, its own C library.
Therefore, the OpenCV Library cannot be directly compiled, since the Java Native Interface (JNI) programming framework is required to interact with the Java classes. Android offers the possibility of programming in C/C++ using the Native Development Kit (NDK), together with the standard SDK. To simplify the wrapping of OpenCV code to the JNI functions, we used the SWIG tool7 to bind programs written in C/C++ with Java. It works by taking the declarations found in header files and using them to generate the wrapper code that scripting languages need in order to access the underlying native code. To test OpenCV algorithms on Android OS we exploited two different devices: the LG Optimus One and the Samsung Galaxy SII. The first device has a 3.15 Mpx camera (2048×1536 pixels) and VGA video resolution at 18 fps. It has a Qualcomm MSM7227 chipset with an ARM CPU running at 600 MHz and 512 MB of RAM. The second device has two cameras: a rear one of 8.1 Mpx (3264×2448 pixels) and a frontal one of 2.0 Mpx. The video resolution is 1080p at 30 fps. A dual-core Cortex-A9 ARM CPU running at 1.2 GHz is paired with 1 GB of RAM.
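The SWIG step described above takes an interface file listing the native declarations to expose. A minimal sketch follows; the module, header and function names are hypothetical, not those of our actual port:

```
/* facedetect.i -- hypothetical SWIG interface file */
%module facedetect

%{
/* declarations pulled into the generated wrapper */
#include "facedetect.h"
%}

/* functions exposed to Java; SWIG emits the JNI glue for these */
int countFaces(const char *imagePath);
```

Running `swig -java -package <pkg> facedetect.i` then produces the C wrapper source and the Java proxy classes; the wrapper is compiled with the NDK into a shared library and loaded from Java via `System.loadLibrary`.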
2.1 Involved Computer Vision Algorithms
The main tasks considered to compare the selected mobile platforms are:
• Feature extraction;
• Face detection;
• Image segmentation.
Feature extraction algorithms are typically used to detect the points of interest (also called keypoints) in an image.8 These features can be used for many purposes: image registration, visual tracking, image retrieval, etc. In our tests we employed FAST,9 STAR10 and SURF11 as implemented in the OpenCV Library.5 The face detection method used in our tests is based on the well-known object detection algorithm proposed by Viola and Jones.12 During the training phase, a boosted cascade classifier (working with Haar-like features) is trained on samples of a particular object (e.g., faces), called positive examples, and on a set of arbitrary images (not containing faces), called negative examples. After the training phase, the face detector is applied to the regions of interest of an input image. The algorithm considered to test image segmentation is Graph Cuts.13 This algorithm treats segmentation as an energy minimization problem, which is reduced to an instance of maximum flow in a graph. Segmentation is hence performed taking into account the max-flow/min-cut theorem.
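The efficiency of the Viola-Jones detector on constrained devices rests on the integral image, which lets the sum of any Haar-like rectangle be evaluated with four table lookups regardless of its size. A minimal pure-Python sketch (for illustration only, not the OpenCV implementation):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, using 4 lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

Each Haar-like feature is then a weighted difference of a few such rectangle sums, which is why a cascade with thousands of features remains tractable on a 600 MHz ARM processor.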
Figure 1. Average frame rate (FPS) vs. minimum scale parameter for face detection on the LG Optimus One, Samsung Galaxy SII, and Nokia N900.
2.2 Experimental Results
Tests have been performed to estimate the performances of the different involved mobile platforms. A significant effort has been devoted to properly configuring the different devices in order to run the various algorithms. A first test has been conducted considering the face detection algorithm as implemented in the OpenCV library.5 Videos have been acquired at VGA resolution (640 × 480) and the performances in terms of fps (frames per second) have been measured while varying the minimum scale parameter (i.e., faces smaller than this value are ignored). As can be seen from Fig. 1, the computational performance (fps) of all the considered devices decreases as the minimum scale parameter decreases: a lower minimum scale implies a higher number of patches to be analyzed. The Samsung Galaxy SII, as expected considering its hardware specifications, outperforms the other devices. The second test consists of a quantitative and qualitative analysis of the Graph Cuts algorithm on the Nokia N900 device. This algorithm has also been implemented by using the OpenCV library. Fig. 2 reports the average segmentation time (sec) at different iteration steps vs. image resolution. As can be easily seen from the segmentation results (Fig. 3), better image segmentation is obtained by increasing the number of iterations and hence the overall computational time. Computational tests have also been performed considering the feature point extraction task. Several well-known feature point detectors, implemented in OpenCV, have been considered: FAST,9 STAR10 and SURF.11 The FAST feature detector outperforms the other ones in terms of computational time on all the considered platforms (Fig. 4, Fig. 5, Fig. 6). As already reported in the previous experiments, the Samsung Galaxy SII is able to obtain higher fps with respect to the other devices.
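The speed advantage of FAST observed above comes from its simple segment test: a pixel is a corner if n contiguous pixels on a radius-3 Bresenham circle around it are all brighter, or all darker, than the center by a threshold t. A simplified, unoptimized sketch (illustrative only; OpenCV's implementation adds a fast rejection pre-test, machine-learned decision trees and non-maximum suppression):

```python
# The 16 offsets (dx, dy) of the radius-3 Bresenham circle used by FAST.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=10, n=9):
    """Segment test: n contiguous circle pixels all brighter than
    center + t, or all darker than center - t. Wrap-around runs are
    handled by scanning the circle twice."""
    c = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):          # +1: brighter arc, -1: darker arc
        run = 0
        for p in ring * 2:        # doubled list covers wrap-around
            run = run + 1 if sign * (p - c) > t else 0
            if run >= n:
                return True
    return False
```

Because most pixels fail the test after examining only a few circle positions, the average cost per pixel is far below that of SURF's scale-space filtering, which matches the frame rates measured on all three devices.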
A comparison between face detection performed on RGB images (i.e., images obtained after a complex demosaicing procedure) and face detection which takes into account information obtained directly from the Bayer pattern3 has been performed by considering a dataset of more than 1000 images, each containing one face. From the original images the corresponding Bayer pattern has been obtained by subsampling the different channels according to the GRBG pattern. The considered Bayer pattern corresponds to the one used in the Nokia N900 sensor. Since the face detector takes a grayscale image as input, the original colour images have been converted to grayscale, whereas the green channel alone has been used for testing the performances of the detector starting from the Bayer pattern. Specifically, the missing values in the green channel have been filled with the average of the surrounding pixels to form the input images for the detector. The obtained results are reported in Fig. 7. The percentage of true positives is computed as the ratio between the properly detected faces and the total number of faces within the considered dataset. The false positive values have been obtained as the ratio between the erroneously detected faces (i.e., detected patches not containing faces) and the total number of patches detected by the algorithm as faces (i.e., detected patches, whether they contain faces or not). The results indicate that faces can be detected before a complex demosaicing procedure, while maintaining the detection performances, and eventually used to support other algorithms involved in the image generation pipeline.
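The green-channel preprocessing described above can be sketched as follows. This is an illustrative re-implementation, not the authors' code; it assumes a GRBG mosaic, where green samples sit at positions whose coordinate sum is even, and fills the remaining red/blue sites with the average of their available 4-connected green neighbours:

```python
def green_channel_gray(bayer):
    """Build a grayscale image from a GRBG mosaic: keep the green
    samples and interpolate the holes from neighbouring green pixels."""
    h, w = len(bayer), len(bayer[0])
    gray = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (x + y) % 2 == 0:          # green site in GRBG
                gray[y][x] = bayer[y][x]
            else:                          # R or B site: interpolate
                neigh = [bayer[ny][nx]
                         for ny, nx in ((y - 1, x), (y + 1, x),
                                        (y, x - 1), (y, x + 1))
                         if 0 <= ny < h and 0 <= nx < w]
                gray[y][x] = sum(neigh) / len(neigh)
    return gray
```

Since the 4-neighbours of every red or blue site are green sites, this simple averaging uses only measured green values, avoiding the full demosaicing step before detection.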
Figure 2. Average segmentation time (sec, 0 to 250) at different iteration steps (1 to 6) vs. image resolution (240x180, 320x240, 480x360, 640x480, 800x600).
Figure 3. First row: the input image with the background and foreground seeds provided to Graph Cuts. Second and third rows: Visual assessment of the segmentation results at different iterations. These screenshots are captured directly on the N900.
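The max-flow/min-cut reduction underlying Graph Cuts can be illustrated with a small augmenting-path solver. This is an Edmonds-Karp sketch on a dense capacity matrix, for illustration only; practical Graph Cuts implementations, including the one benchmarked here, rely on faster specialized algorithms (e.g., Boykov-Kolmogorov) over the pixel grid:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: push flow along shortest augmenting paths until
    none remain. The saturated edges then define the minimum cut that
    Graph Cuts uses to separate foreground from background."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            break                      # no augmenting path left
        # find the bottleneck capacity along the path
        bottleneck = float('inf')
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # update residual capacities
        v = sink
        while v != source:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    return flow
```

In the segmentation setting, the source and sink play the roles of the foreground and background seeds, terminal edge capacities encode the data term, and neighbouring-pixel edges encode the smoothness term; the iteration count studied in Fig. 2 controls how often this optimization is refined.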
Figure 4. Average frame rate (FPS, 0 to 35) vs. image resolution (320x240, 400x300, 640x480) for the FAST, STAR and SURF detectors on the Nokia N900 platform.
Figure 5. Average frame rate (FPS, 0 to 35) vs. image resolution (320x240, 400x300, 640x480) for the FAST, STAR and SURF detectors on the LG Optimus One platform.
Figure 6. Average frame rate (FPS, 0 to 35) vs. image resolution (320x240, 400x300, 640x480) for the FAST, STAR and SURF detectors on the Samsung Galaxy SII platform.
Figure 7. Face detection results (percentage of true positives and false alarms) considering Bayer pattern data and RGB images.
3. CONCLUSIONS
In this paper we have tested several Computer Vision algorithms on mobile devices. Different classical tasks of Computer Vision have been considered: keypoint extraction, face detection, and image segmentation. Their computational performances have been tested on the Nokia N900, LG Optimus One, and Samsung Galaxy SII with the Maemo and Android operating systems. Finally, low level data (Bayer pattern) has been used to perform face detection instead of the final RGB image produced by the imaging pipeline, hence obtaining a considerable reduction in time complexity. Future work will be devoted to exploiting low level data to improve the performance of the considered Computer Vision algorithms.
REFERENCES
[1] Farinella, G. and Battiato, S., "Scene classification in compressed and constrained domain," IET Computer Vision 5(5), 320-334 (2011).
[2] Puglisi, G. and Battiato, S., "A robust image alignment algorithm for video stabilization purposes," IEEE Transactions on Circuits and Systems for Video Technology 21(10), 1390-1400 (2011).
[3] Battiato, S., Bruna, A. R., Messina, G., and Puglisi, G., eds., [Image Processing for Embedded Devices], Bentham Science Publisher (2010).
[4] Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M., "The Frankencamera: an experimental platform for computational photography," ACM Transactions on Graphics 29, 29:1-29:12 (July 2010).
[5] http://opencv.willowgarage.com/wiki/.
[6] http://qt.nokia.com/.
[7] Beazley, D. M., "SWIG: an easy to use tool for integrating scripting languages with C and C++," in [Proceedings of the 4th USENIX Tcl/Tk Workshop], 15-15, USENIX Association, Berkeley, CA, USA (1996).
[8] Szeliski, R., [Computer Vision: Algorithms and Applications], Springer (2010).
[9] Rosten, E. and Drummond, T., "Machine learning for high-speed corner detection," in [Proceedings of the 9th European Conference on Computer Vision (ECCV 2006)], 430-443 (2006).
[10] Agrawal, M., Konolige, K., and Blas, M. R., "CenSurE: Center surround extremas for realtime feature detection and matching," in [Proceedings of the 10th European Conference on Computer Vision (ECCV 2008)], Forsyth, D. A., Torr, P. H. S., and Zisserman, A., eds., Lecture Notes in Computer Science 5305, 102-115, Springer (2008).
[11] Funayama, R., Yanagihara, H., Van Gool, L., Tuytelaars, T., and Bay, H., "Robust interest point detector and descriptor," (September 2009).
[12] Viola, P. and Jones, M., "Robust real-time object detection," in [International Journal of Computer Vision], (2001).
[13] Zabih, R. and Kolmogorov, V., "Spatially coherent clustering using graph cuts," in [Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004)], 2, II-437-II-444 (June-July 2004).