ICETA 2013 • 11th IEEE International Conference on Emerging eLearning Technologies and Applications • October 24-25, 2013, Stary Smokovec, The High Tatras, Slovakia

Computer Vision-Based Object Recognition Principles in Education

J. Lámer, D. Cymbalak and F. Jakab
Department of Computers and Informatics, Technical University of Košice, Letná 9, 042 00 Košice
[email protected], [email protected], [email protected]

Abstract— This article describes a simple step-by-step system for teaching object recognition and tracking in computer vision systems. The methodology is based on incrementally increasing the complexity of the object recognition system. The student does not need any knowledge of the mathematical principles behind computer-based object detection; this is achieved by using OpenCV and other high-level libraries. The article progresses from an explanation of simple detection methods based on OpenCV to complex pattern-based detection methods provided by the OpenTLD library. All library recognition methods are explained in C/C++, and the results are illustrated as graphical output images.

B. Object detection and tracking

Object detection in an image is used to identify and classify objects and to determine their characteristics such as size, rotation and position. To recognize an object in an image, it is necessary to divide the image into segments and to reduce the less important parts of the image or extract the important ones. This reduction or extraction can be realized by applying preprocessing filters to the image or by describing the image in terms of its features. [11]

For tracking an object in video, it is important to detect the object in a series of selected consecutive video frames. In each processed frame with a successful detection, the position of the object relative to the scene is calculated. The trajectory of the object is then defined by a series of object positions over time, either as a sequence of points with coordinates or as a function. [11]

A typical object tracking system consists of three components: a representation of the object, a dynamic model and a search engine. The object can be represented either by holistic descriptors, such as a color histogram or pixel brightness values, or by local descriptors, such as a local histogram or color information. The dynamic model is typically used to reduce the computational complexity of tracking. The search engine is used to optimize the tracking and can use deterministic as well as stochastic methods. An important element of tracking methods is the motion model, which can represent, for example, translational motion, a similarity transformation or an affine transformation. [11]

I. INTRODUCTION

The main problem this work addresses is to survey the many systems available for computer vision and to select the one best suited for education. Computer vision is a field of science that attempts to replace human eyesight: it recognizes objects in the real world from digital images or videos using sophisticated methods and transforms them into another representation (for example, binary objects). For educational purposes, the best candidate recognition system must be open source and free, and must have an application interface that is easy for a student (programmer) to understand. There are various systems (libraries) on the market that meet these criteria. Many of them are a part or a result of various research projects; others are products of the commercial sphere. The main goal is to find the best recognition system for education that fulfills these criteria and to explain the basic principles of object detection with it. The last part of this article is devoted to other advanced tools for object detection in computer vision.

II. ANALYSIS

A. Computer vision

Computer vision is a kind of transformation from the real world to a new representation in the world of computers, that is, binary systems. Various transformations are used to achieve particular goals, and the final result is reached by evaluating all of these partial goals. The operating data are separated into an input part and an output part. The input part is a kind of calibration data, such as "the wanted object is wearing red" or "find all round objects". The output part may be, for example, "person with red clothing detected" or "found 6 round objects". [3]

There are many transformation principles: for example, turning a color image into a binary black/white image using thresholds, removing camera motion from a sequence of images, and so on. These are the simplest cases of detection. Complex problems need complex solutions, so they are solved the way the human brain works: the problem is first divided into several smaller problems, each evaluated independently; then all the partial results are combined and evaluated to produce the final result. [3]

978-1-4799-2162-1/13/$31.00 ©2013 IEEE

C. Tools for computer vision

There are many tools for computer vision. The best known are OpenCV, VXL, LTI and OpenTLD. Qualcomm provides a computer vision library called FastCV; it is optimized for mobile devices and licensed under its own license (the FastCV SDK License Agreement). [6]



Considering all its advantages, OpenCV is the best candidate among the libraries described below.

1) VXL
The Vision-something-Libraries are C++-based computer vision libraries created with an emphasis on a light, fast and consistent system. The library is portable across many platforms. The core of VXL is divided into several parts, such as numerics (vnl), imaging (vil), geometry (vgl), streaming (vsl), basic templates (vbl), utilities (vul) and more. The libraries are self-contained and can be used separately. [4]

2) LTI
This object-oriented library is primarily used in image processing and computer vision. It was created at Aachen University of Technology as part of other computer vision and robotics research projects. The main idea of the library is to simplify code sharing and maintenance as much as possible. LTI is available for Linux and Windows NT. Its main parts are linear algebra, classification and clustering, image processing, and visualization and drawing tools. The library is usable under the GNU license. [5]

3) OpenCV
This open source computer vision software library is a kind of high-level application interface which provides computer vision and machine learning functionality. The main idea is to provide a common infrastructure for computer vision applications and to decrease software development time. The BSD license guarantees free use in business and makes the code modifiable. The library contains about 2500 optimized algorithms, usable for face detection and recognition, object recognition, classifying human actions in video, tracking camera movement, tracking moving objects, extracting 3D models of objects, producing 3D point clouds from stereo cameras, stitching images together to produce a high-resolution image of an entire scene, finding similar images in an image database, removing red eyes from flash photographs, following eye movements, and recognizing scenery and establishing markers to overlay it with augmented reality. It is therefore a great candidate for digital camera vendors. The library has over 7 million users worldwide.
The best-known deployments include stitching Google StreetView images, detecting intrusions in surveillance video in Israel, monitoring mine equipment in China, detecting swimming pool drowning accidents in Europe, running interactive art in Spain and New York, checking runways for debris in Turkey, inspecting labels on products in factories around the world, and rapid face detection in Japan, among others. The library is fully optimized using MMX and SSE instructions. There are five interfaces for working with the library: C++, C, Python, Java and MATLAB. [1] The most used operating systems are supported: Windows, Linux, Android and Mac. [1, 2] It is possible to speed the library up further with other libraries such as IPP (Intel Performance Primitives).

Figure 1. Comparison of the most used libraries for computer vision [3]

E. Color format used in computer vision

The most used color format in computer vision is the HSV format. It is based on three components which define the final color: hue, saturation and value (brightness). Compared with RGB, this format is better suited to humans, because it is the closest to human eyesight. Hue represents the "name of the color", saturation specifies the color more closely, and value sets the color brightness. [7]

OpenCV implements functions to convert between color codes. For example, the function cvCvtColor with the conversion code CV_BGR2HSV converts an RGB-coded image to the HSV color code. The function works with images in program memory.

III. PRINCIPLES OF DETECTION IN COMPUTER VISION

There are many different methods for recognizing one or more objects in an image. The next chapters explain them in order of implementation complexity.

A. Binary threshold

The easiest way to detect desired objects in an image is based on object color: creating binary thresholds based on the object's color range. Colors in the image that pass this criterion are transformed to white (binary one) and all other colors are transformed to black (binary zero). The result is a black/white image with a bit depth of one; the transformation can of course be inverted. [8] OpenCV implements a high-level function for threshold detection, cvInRangeS. It works with images in computer memory; the upper and lower color bounds are passed in as scalars of HSV color codes. The only input criterion needed is the object's color. This brings one big advantage, small criterion complexity, and also one big disadvantage: color interference. This disadvantage is critical, therefore the best results are achieved with detailed filter settings.
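As a minimal illustration of the two principles above, the following sketch implements the standard RGB-to-HSV conversion and an in-range threshold in plain C++, without OpenCV. The HSV struct and the names rgbToHsv and inRangeMask are our own illustrative choices, not library API; cvCvtColor and cvInRangeS perform the equivalent work on whole images in memory.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative HSV triple: H in degrees [0, 360), S and V in [0, 1].
struct HSV { double h, s, v; };

// Standard RGB (components in [0, 1]) to HSV conversion.
HSV rgbToHsv(double r, double g, double b) {
    double maxc = std::max({r, g, b});
    double minc = std::min({r, g, b});
    double delta = maxc - minc;
    HSV out{0.0, 0.0, maxc};              // value is the maximum channel
    if (maxc > 0.0) out.s = delta / maxc; // saturation relative to brightness
    if (delta > 0.0) {                    // hue depends on the dominant channel
        if (maxc == r)      out.h = 60.0 * std::fmod((g - b) / delta + 6.0, 6.0);
        else if (maxc == g) out.h = 60.0 * ((b - r) / delta + 2.0);
        else                out.h = 60.0 * ((r - g) / delta + 4.0);
    }
    return out;
}

// Binary threshold: pixels whose HSV lies inside [lo, hi] become 1, others 0.
std::vector<int> inRangeMask(const std::vector<HSV>& pixels, HSV lo, HSV hi) {
    std::vector<int> mask;
    mask.reserve(pixels.size());
    for (const HSV& p : pixels) {
        bool in = p.h >= lo.h && p.h <= hi.h &&
                  p.s >= lo.s && p.s <= hi.s &&
                  p.v >= lo.v && p.v <= hi.v;
        mask.push_back(in ? 1 : 0);
    }
    return mask;
}
```

For example, with lo = {100, 0.5, 0.5} and hi = {140, 1.0, 1.0}, only strongly saturated green pixels (hue near 120 degrees) pass the filter, while pure red (hue 0) is mapped to binary zero.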

D. Comparison of the most used libraries

Compared with other vision libraries, OpenCV is much faster [3], as illustrated in Figure 1. The test was made on a Pentium M at 1.7 GHz. Scores are proportional to runtime, therefore a lower score is better. As illustrated, OpenCV with IPP is much faster than the other libraries. [3]

Figure 2. Color interference in threshold filtering


When objects of the same color are placed at the corners of an area, the equations give the midpoint of the detected objects' centers.

Color interference is not the only problem. When there is more than one object with the desired color in view, all results are useless. This is solved by passing the threshold detection results to the next level of detection, based on object shape or patterns. Camera white balance is a third problem: when the quality or intensity of the scene lighting changes, automatic color balance changes the saturation and value of the colors in the image (and in most cases the hue changes too).

Figure 3. Hue and value change after automatic color balance in the camera


C. Contours

Moment-based object detection is usable only with one object (or multiple objects treated as one) and detects only the area and center coordinates. For detecting complex shapes, contour-based principles are needed. This principle detects many objects in an image and returns, for each object, a set of vertices with their coordinates. The simplest contour-based detection method is based on the object's vertex count (for example, a triangle has three vertices), and it is more resistant to color interference. For this kind of detection, no other calculation is needed. It is also possible to calculate further properties of the object, such as convexity and concavity, but this requires extra calculations. OpenCV provides the function cvFindContours to find the contours; it returns all recognized objects. The function cvApproxPoly provides separate access to the individual objects and their properties.
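A minimal sketch of these two checks, vertex-count classification and a convexity test, assuming the contour has already been approximated to a polygon. The Pt struct and both function names are illustrative, not OpenCV API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Pt { double x, y; };

// Classify an approximated contour polygon by its vertex count alone.
std::string classifyByVertices(const std::vector<Pt>& poly) {
    switch (poly.size()) {
        case 3:  return "triangle";
        case 4:  return "quadrilateral";
        default: return "other";
    }
}

// Convexity test: the cross products of consecutive edge pairs must all
// share one sign; a sign change marks a concave vertex.
bool isConvex(const std::vector<Pt>& poly) {
    std::size_t n = poly.size();
    if (n < 4) return true;  // every triangle is convex
    bool hasPos = false, hasNeg = false;
    for (std::size_t i = 0; i < n; ++i) {
        const Pt& a = poly[i];
        const Pt& b = poly[(i + 1) % n];
        const Pt& c = poly[(i + 2) % n];
        double cross = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
        if (cross > 0) hasPos = true;
        if (cross < 0) hasNeg = true;
    }
    return !(hasPos && hasNeg);
}
```

A unit square classifies as a quadrilateral and passes the convexity test, while a star-like pentagon with one inward-pointing vertex fails it.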

B. Image moments

In computer vision, an image moment is a weighted average of the image pixel intensities, where the pixels are selected on the basis of some attractive property or interpolation. [9] Moments are classified by order: the order of the moment m(p,q) depends on its indices (p,q), and the sum of the indices (p+q) gives the order. The following moments are defined: [10]

Zero-order moment (the sum of indices equals zero, m(0,0)): describes the area of the object in the image.

First-order moments (m(0,1), m(1,0)): describe the center of the object in the image.

N-th order moments: not relevant for the purposes of this article.

There are several types of moments: spatial moments give information about the object's position in the image, and central moments give information relative to the origin of the coordinate system. The last type is the central normalized moment, which is scaled by the area of the object. [10]

From the spatial moments m(1,0) and m(0,1), the center of the detected object can be calculated by dividing them by the moment m(0,0). OpenCV implements a number of functions for working with moments. The main function is cvMoments, whose input is the thresholded image and whose output is a moment structure. Spatial moments are available by calling cvGetSpatialMoment with the moment structure and the indices p, q as input. The final center coordinates of the detected object are defined as:

X = m(1,0) / m(0,0) (1)
Y = m(0,1) / m(0,0) (2)
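The zero- and first-order moments and equations (1) and (2) can be sketched in a few lines of plain C++ over a binary mask. This is illustrative code with our own names, not the cvMoments API.

```cpp
#include <cstddef>
#include <vector>

// Raw spatial moments of a binary image: m(p,q) = sum over object pixels
// of x^p * y^q. Only the orders used by equations (1) and (2) are kept.
struct Moments { double m00, m10, m01; };

Moments rawMoments(const std::vector<std::vector<int>>& img) {
    Moments m{0.0, 0.0, 0.0};
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x]) {
                m.m00 += 1;  // zero-order moment: object area in pixels
                m.m10 += x;  // first-order moment in x
                m.m01 += y;  // first-order moment in y
            }
    return m;
}
// Centroid per equations (1) and (2): X = m10 / m00, Y = m01 / m00.
```

For a 2x2 white block with its top-left corner at pixel (1, 1), the area m00 is 4 and both centroid coordinates come out as 1.5, the geometric center of the block.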

Figure 6. Contour-based object detection with color interference (the recognized object is highlighted in yellow)

IV. ADVANCED OBJECT DETECTION AND TRACKING METHODS

The current modern methods for tracking objects in real time are IVT [12], VRT [13], FRAG [14], Boost [15], SEMI [16], BeSemiT [17], L1T [18], MIL [19], RTD [20] and Tracking-Learning-Detection (TLD) [22], which has been shown to be the most reliable and effective in a comparison of tracking methods run on various kinds of videos with various kinds of tracked objects.

Using the methods and calculations explained above, it is possible to detect the center coordinates of a recognized object. For recognizing the center of a larger area, it is possible to exploit one of the threshold method's disadvantages: the detection of multiple objects with the same color.
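This midpoint effect follows directly from the moment sums: the combined moments of several same-colored blobs are the sums of the individual blob moments, so the resulting centroid is the area-weighted average of the marker centers. A minimal sketch with illustrative names, assuming the per-marker centroids and areas are already known:

```cpp
#include <vector>

// One detected marker blob: its centroid and its area (zero-order moment).
struct Blob { double cx, cy, area; };

// Area-weighted average of the marker centroids - exactly what running the
// moment computation once over all same-colored blobs produces.
Blob combinedCenter(const std::vector<Blob>& markers) {
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (const Blob& b : markers) {
        m00 += b.area;         // total area of all markers
        m10 += b.cx * b.area;  // summed first-order moments in x
        m01 += b.cy * b.area;  // summed first-order moments in y
    }
    return {m10 / m00, m01 / m00, m00};
}
```

Four equal markers placed at the corners of a 10x10 area therefore yield the center of that area, (5, 5).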

Figure 4. Image center detection using moments (the yellow dot is the calculated center)

Figure 5. Using more markers to detect the center of a large area

A. TLD algorithm

The tracking part of TLD is based on recursive tracking in the forward and backward direction, performed by the Lucas-Kanade algorithm in both directions, with the median calculated at the end. The detection part of TLD uses a variance filter, an ensemble classifier and a nearest-neighbor classifier. Learning in TLD is



realized by PN learning, in which the data are classified by exploiting the structure in the form of the object's path, applying first positive (P) constraints and then negative (N) constraints. The next stage is based on generating new data and updating the object classifier. Image segments of type P represent objects with a high probability of correlation with the template; segments of type N, conversely, have a low correlation [21, 22]. The TLD algorithm has its own implementation in the Matlab environment. Various kinds of tracked objects were tested, such as faces, hands, eyes, microphones, etc.

Part of a sample output file can be seen in the next figure.

Figure 9. OpenTLD implementation output file example

The learned object model can be used for subsequent tracking or in other instances of the program with other video sources.
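The forward-backward reliability check used in the TLD tracking step can be sketched independently of any optical-flow code: each point is tracked forward and then backward again, the distance between the original point and the backtracked point is the forward-backward error, and the median of these errors serves as the cutoff for discarding unreliable points. The sketch below covers the error metric and the median only; the Lucas-Kanade tracking itself is not implemented here, and all names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { double x, y; };

// Forward-backward error per point: distance between the original point and
// the point obtained by tracking it forward and then backward again.
std::vector<double> fbErrors(const std::vector<P2>& original,
                             const std::vector<P2>& backtracked) {
    std::vector<double> err(original.size());
    for (std::size_t i = 0; i < original.size(); ++i)
        err[i] = std::hypot(original[i].x - backtracked[i].x,
                            original[i].y - backtracked[i].y);
    return err;
}

// Median of the errors, used as the reliability cutoff: points whose error
// exceeds the median are treated as tracking failures and discarded.
double median(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    std::size_t n = v.size();
    return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}
```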

Figure 7. Experiment of tracking an eye with the TLD implementation

One implementation of the TLD algorithm is OpenTLD, written in C++ using only open libraries, without the additional Matlab environment [23]. The resulting application offers the option to define the template for object tracking manually, from an object model, or by initialization coordinates. It can be run with a parameter defining the video source for object tracking, either in real time or from a video recording. It can also write an output file with the history of the tracked object's position coordinates, and it stores the learned model of the observed object.

Figure 8. Experimental face tracking with the OpenTLD C++ application

Figure 10. Experiment of tracking a hand with the TLD implementation

V. OBJECT DETECTION IN MOBILE DEVICES

There is also an OpenTLD application for the Android platform, built from the OpenTLD C++ implementation in cooperation with the FastCV library, which provides a live preview with a tracking layer rendered around the detected and tracked object [24]. The application operates on the principle of defining a detection template on the first frame rendered from the camera. It then starts detecting the template model in the current picture frame in real time, during which further characteristics of the tracked object are learned and added to the template model. The application is only a prototype, which labels the tracked object with a drawn rectangle in the live video preview. An interesting addition to this application would be the ability to continuously gather information about the position and size of the object in the form of coordinates.

The FastCV computer vision library was developed by Qualcomm and later released for free use. Its main advantages include functions for image processing and transformation, detection, tracking, recognition, 3D reconstruction functions for use in augmented reality, segmentation, and advanced memory management. FastCV is primarily designed for mobile devices on the ARM platform, on which it achieves significantly faster and more efficient results than competing image processing libraries such as OpenCV [26].



[10] KILIAN J.: Simple Image Analysis by Moments, 2001. Available online: http://public.cranfield.ac.uk/c5354/teaching/dip/opencv/SimpleImageAnalysisbyMoments.pdf
[11] WANG Q. et al.: An Experimental Comparison of Online Object Tracking Algorithms, Proceedings of SPIE: Image and Signal Processing Track, 2011.
[12] ROSS D. et al.: Incremental learning for robust visual tracking, International Journal of Computer Vision 77(1-3), pp. 125-141, 2008.
[13] COLLINS R. - LIU T. - LEORDEANU M.: Online selection of discriminative tracking features, IEEE Transactions on Pattern Analysis and Machine Intelligence 27, pp. 1631-1643, 2005.
[14] ADAM A. - RIVLIN E. - SHIMSHONI I.: Robust fragments-based tracking using the integral histogram, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 798-805, 2006.
[15] GRABNER H. - BISCHOF H.: On-line boosting and vision, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 260-267, 2006.
[16] GRABNER H. - LEISTNER C. - BISCHOF H.: Semi-supervised on-line boosting for robust tracking, Proceedings of European Conference on Computer Vision, pp. 234-247, 2008.
[17] STALDER S. - GRABNER H. - VAN GOOL L.: Beyond semi-supervised tracking: Tracking should be as simple as detection, but not simpler than recognition, Proceedings of IEEE Workshop on Online Learning for Computer Vision, 2009.
[18] MEI X. - LING H.: Robust visual tracking using L1 minimization, Proceedings of the IEEE International Conference on Computer Vision, pp. 1436-1443, 2009.
[19] BABENKO B. - YANG M. - BELONGIE S.: Visual tracking with online multiple instance learning, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 983-990, 2009.
[20] KWON J. - LEE K.: Visual tracking decomposition, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1269-1276, 2010.
[21] KALAL Z. - MATAS J. - MIKOLAJCZYK K.: P-N learning: Bootstrapping binary classifiers by structural constraints, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 49-56, 2010.
[22] KALAL Z. - MIKOLAJCZYK K. - MATAS J.: Forward-Backward Error: Automatic Detection of Tracking Failures, International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010.
[23] KALAL Z. - MATAS J. - MIKOLAJCZYK K.: P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints, 23rd IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, USA, 2010.
[24] NEBEHAY G.: Robust Object Tracking Based on Tracking-Learning-Detection, Diplomarbeit, Technische Universität Wien, 2012.
[25] GRAVDAL E.: Augmented Reality and Object Tracking for Mobile Devices, NTNU Trondheim, 2012.
[26] QUALCOMM INC.: FastCV Library 1.1.1 [online]. Available online: https://developer.qualcomm.com/docs/fastcv/api/index.html

VI. CONCLUSION

This article described the most used object detection and tracking methods and tools, with an emphasis on their use in education. All described methods were practically tested and illustrated. From [3] and from our own testing and research, we conclude that OpenCV is a good system (library) for education and for solving simple problems, while the fourth chapter shows that OpenTLD is the better choice for complex problems. The application-level interface of all presented tools is very good: the programmer (student) does not need any mathematical skills to detect and track objects in a real-time computer environment, and thanks to the high level of the method implementations, only basic programming skills are required. This overview is sufficient for solving simple problems in the field of object detection in computer vision.

ACKNOWLEDGMENT

This article is the result of the project implementation University Science Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: 26220220182, supported by the Research & Development Operational Programme funded by the ERDF.

REFERENCES

[1] OpenCV Developers Team: About, 2013. [Online; accessed 30 September 2013]. Available: http://opencv.org/about.html
[2] Refsnes Data: OS Platform Statistics, 2013. [Online; accessed 30 September 2013]. Available: http://www.w3schools.com/browsers/browsers_os.asp
[3] BRADSKI G. - KAEHLER A.: Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly, 2008.
[4] VXL developer team: Introduction: What is VXL?, 2013. [Online; accessed 30 September 2013]. Available: http://vxl.sourceforge.net/
[5] LTI developer team: Introduction, 2013. [Online; accessed 30 September 2013]. Available: http://ltilib.sourceforge.net/doc/homepage/index.shtml
[6] QUALCOMM Technologies INC.: Computer Vision (FastCV), 2013. [Online; accessed 30 September 2013]. Available: https://developer.qualcomm.com/mobile-development/mobile-technologies/computer-vision-fastcv
[7] SURAL S. - QIAN G. - PRAMANIK S.: Segmentation and histogram generation using the HSV color space for image retrieval, Proceedings of the International Conference on Image Processing, IEEE, 2002, pp. II-589-II-592, vol. 2.
[8] AL-AMRI S. S. et al.: Image Segmentation by Using Threshold Techniques, arXiv preprint arXiv:1005.4020, 2010.
[9] MUKUNDAN R. - RAMAKRISHNAN K. R.: Moment Functions in Image Analysis: Theory and Applications, World Scientific, Singapore, 1998.
