Teaching image processing and pattern recognition with the Intel OpenCV library
Adam Kozłowski, Aleksandra Królak
Department of Medical Electronics, Institute of Electronics, Technical University of Lodz, Wolczanska 211/215, 90-924 Lodz, Poland
ABSTRACT
In this paper we present an approach to teaching image processing and pattern recognition with the use of the OpenCV library. Image processing, pattern recognition and computer vision are important branches of science and apply to tasks ranging from critical ones, such as medical diagnostics, to everyday ones, including art and entertainment. It is therefore crucial to provide students of image processing and pattern recognition with the most up-to-date solutions available. In the Institute of Electronics at the Technical University of Lodz we support the teaching process in this subject with the OpenCV library, an open-source set of classes, functions and procedures that can be used in programming efficient and innovative algorithms for various purposes. The topics of student projects completed with the help of the OpenCV library range from automatic correction of image quality parameters and creation of panoramic images from video to pedestrian tracking in surveillance camera video sequences and head-movement-based mouse cursor control for people with motor impairments.
Keywords: image processing, computer vision, teaching
1. INTRODUCTION
Image processing and pattern recognition, along with computer vision, are important branches of modern science. They allow us to acquire and process the vast amounts of visual data we must handle today, and help us extract crucial information from that data. The technologies of image processing and computer vision apply just as well to critical tasks involving medical diagnostics, security or automated production processes as to everyday tasks and art and entertainment purposes, including photo retouching, video editing and television transmissions.
We are now surrounded by images and videos; each day we use digital imagery in various branches of science, entertainment and everyday life. Digital cameras are widely available and reasonably priced. With images and videos in digital form, we must learn how to manage, transform, process and edit this material for various purposes. A computer or other digital device fitted with a camera can obtain various information from images and video and make decisions or take actions based on it.
In the Medical Electronics Division at the Technical University of Lodz, techniques of image processing and pattern recognition are used in a broad range of projects. The research activities of the Division concentrate on processing and analysis of signals and images, as well as design of human-machine interfaces and electronic aids for the disabled. Along with the scientific efforts in the field of image analysis, especially in medical imaging, our Division therefore needs to organize extensive courses in image processing and pattern recognition. In our research on human-machine interfaces, in turn, computer vision methods are used to capture the motion of hands, head, eyes or lips, and these movements are then translated into prespecified actions.
Another research area where computer vision methods are used at the Medical Electronics Division is the construction of assistive systems for the visually impaired. In this application, we use algorithms for stereoscopic image sequence analysis, image rectification, three-dimensional scene reconstruction and ego-motion estimation.
2. TEACHING IMAGE PROCESSING AND PATTERN RECOGNITION
The areas of research carried out at the Medical Electronics Division create both the need and the capability for teaching image processing and pattern recognition. Our Division is in charge of the master of science course in signal and image processing, as well as several other courses, e.g. in biomedical engineering. It is our mission to provide the best possible education in the field of image processing, pattern recognition and computer vision.
Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2009, edited by Ryszard S. Romaniuk, Krzysztof S. Kulpa, Proc. of SPIE Vol. 7502, 750205 · © 2009 SPIE · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.837520
Proc. of SPIE Vol. 7502 750205-1 Downloaded From: http://proceedings.spiedigitallibrary.org/ on 02/27/2014 Terms of Use: http://spiedl.org/terms
An important incentive for both teaching and learning the techniques of image processing, pattern recognition and computer vision is that digital imaging now penetrates nearly every branch of human activity. We deal with rich visual content on a daily basis through the Internet, our computers must handle vast amounts of visual data, and our mobile phones are fitted with digital still and video cameras and are able to send and receive images and real-time video feeds. In the last decade we have experienced a digital imaging revolution, which has boosted not only our everyday lives, but also medicine, industry, education and art.
The course in image processing and pattern recognition held at the Medical Electronics Division of the Technical University of Lodz is taught as lectures in parallel with practical exercises, and the final mark combines the result of a written examination with the grade for the final practical project. The final projects are assigned to groups of two students from a broad list of topics that changes each year. Students' knowledge of the theoretical principles and mathematical models used in image processing and pattern recognition must go hand in hand with the ability to design and implement working solutions. However, creating the programming tools from scratch can be discouraging for students, and it does not promise optimal performance of the developed solutions. It is therefore a good idea to supply them with tools that make it possible to design working solutions and empirically test algorithms with their own images and videos. A classic tool for teaching image processing and pattern recognition is the Matlab software package by MathWorks.
This programming environment is extremely easy to use and is particularly suitable for presenting the intricate algorithms of image processing. However, in order to process video from a digital camera, or multi-megapixel digital images, it is advisable to use a general-purpose programming language, such as C++, along with specialized libraries that give users readily available tools for performing certain image processing tasks quickly and easily. A considerable number of such libraries are available for download from many sources; at the Medical Electronics Division, we chose the OpenCV library, developed by Intel researchers, as our library for teaching image processing and pattern recognition.
3. THE INTEL OPENCV LIBRARY
OpenCV is an open source computer vision library. The library is developed in C and C++ and works on multiple platforms, including Windows, Mac OS X and Linux. Furthermore, interfaces for other languages, e.g. Python or Matlab, are being developed. OpenCV was designed to be computationally efficient so that applications working in real time can be developed, particularly as it supports multicore processors.
The incentive for developing the OpenCV library at Intel Research was that some of the best technical universities already had computer vision libraries that were open to the students of a given university. This was very important because students could use ready building blocks to create better and more advanced applications from year to year by building upon previously developed software. The OpenCV development team therefore set out to create a computer vision infrastructure that would be open source and universally available. The following goals were set as the guidelines for OpenCV [1]:
• Advance computer vision research by providing open and optimized code for basic computer vision infrastructure
• Disseminate computer vision knowledge by providing a common infrastructure that developers can build on, so that code is more readily available and transferable
• Boost progress in computer vision-based commercial applications by providing portable and performance-optimized code available for free and with a license that does not require commercial applications to be open
The OpenCV library contains over 500 functions that span many areas in computer vision, including industrial product inspection, medical image analysis, video surveillance, graphical user interface, camera calibration and rectification, stereo vision and robotics.
It consists of four main components: basic data structures and content (Core), basic image processing and higher-level computer vision algorithms (CV), a machine learning library (ML), and functions for the user interface and file I/O (HighGUI). There is also an additional sublibrary with auxiliary or experimental functions (CVAux). The structure of the OpenCV library is presented in Fig. 1.
Fig. 1. The basic structure of the OpenCV library
The Core part of the library covers the following areas of functionality:
• initialization of basic data structures
• copying, filling and accessing subelements of data structures
• arithmetic, logic, statistical and mathematical functions
• drawing functions (lines, circles, rectangles, text)
• error handling
There is a considerable number of data structures specific to OpenCV, which provide very efficient operations even on large images. The element types supported range from 8-bit unsigned integers to 64-bit floating point. Most functions work on various kinds of structures, and where they do not, the documentation states it clearly. If the user makes a mistake by passing the wrong type of image to a function, the error handling functions report specifically which function received the wrong type of input or output. Most of the very useful and fast arithmetic and logic functions are also in-place functions, which means that they do not require an empty destination image to be declared first. Drawing functions are also very simple to use and give users an excellent means of visually representing detected objects in images, or of inserting text into images or video feeds (for example a frames-per-second count, or the area of a detected object).
The CV part of the library contains the following functions:
• gradient calculation, edge and corner detection
• feature finding for tracking
• interpolation and geometrical transformation
• color conversion
• morphological operations
• 2D filtering
• template matching
• histogram calculation and analysis
• image segmentation and connected components
• motion analysis and object tracking
• object recognition
• camera calibration and 3D scene reconstruction
This part of the library provides most of the functions used in the teaching process as well as in the final projects that students must develop and submit at the end of the semester. It is worth emphasizing that the most commonly used functions, for example color conversion, edge detection, morphological operations and 2D filtering, are highly optimized and work on average at least 50 times faster than their Matlab counterparts. This gives users the opportunity to create real-time video-processing applications based on image sequences obtained from the webcams fitted in most modern laptops, which reinforces the students' commitment to the subject by allowing them to work with their own everyday equipment. Another very important aspect of this part of the library is that it contains many high-end functions that are normally not available as standard functions, for example from within Matlab. Such functions include object recognition, face detection, 3D scene reconstruction and others. All functions are well documented and almost as easy to use as the Matlab functions in the Image Processing Toolbox.
The GUI part of the OpenCV library (HighGUI) is not devoted to the GUI alone; it covers the following functionality:
• graphical user interface creation: windows, trackbars, mouse and keyboard event detection
• loading and saving images of various formats
• video file input/output and camera parameter setup
• other utility and system functions
It is important to emphasize the ease of use of the AVI writer function, which enables users to produce attractive presentations of their projects, especially if they cover video image processing and object recognition in image sequences.
There is a separate part of the OpenCV library devoted solely to machine learning algorithms. It is extremely useful for modern pattern recognition applications and covers the following methods:
• normal Bayes
• k-nearest neighbors
• support vector machines
• decision trees
• boosting
• random trees
• expectation-maximization
• neural networks
All those functions have been included in the library since version 1.0, released in 2006, and they provide students and researchers with up-to-date and universal tools for machine learning and classification, which are now commonly used in pattern recognition problems. Each function exposes many parameters for tuning the performance of a given ML algorithm to a specific problem.
The open source license for OpenCV has been structured such that you can even build a commercial product using all or part of OpenCV, but you are under no obligation to open-source that software or to return library improvements to the public domain.
Because of such liberal licensing terms, there is a very broad user community, including forums that operate in various countries all over the world. Since the first release of the library in 1999, OpenCV has been used in many applications, including stitching images together in satellite and web maps, image scan alignment, medical image noise reduction, object analysis, security and intrusion detection systems, automatic monitoring and safety systems, manufacturing inspection systems, camera calibration, military applications, and unmanned aerial, ground, and underwater vehicles. OpenCV was also a key part of the vision system in the robot from Stanford, “Stanley”, which won the DARPA Grand Challenge desert robot race [2].
4. STUDENT PROJECTS
Image processing and pattern recognition courses are generally among the most popular with students. Good practice is to apply the theory given during the lectures in interesting projects that end up as either solutions to real-life problems or entertaining applications. The project topics should require knowledge of image processing methods from the students; however, programming the tools from scratch can be discouraging and does not promise optimal solutions. The whole idea of progress is to use the inventions of previous generations to create new ones. Therefore, we decided to supply our students with the best possible set of tools for developing image processing applications.
In the Institute of Electronics at the Technical University of Lodz we first used the OpenCV library in research activities spanning a broad range of projects, including analysis of medical images, analysis and classification of textures, object localization with stereovision [3], and development of solutions for the disabled [4]. Encouraged by the results, we decided to support the teaching process in the subject of image processing and pattern recognition with this specific tool. The first projects requiring the use of the OpenCV library were introduced in 2006 for the students of the International Faculty of Engineering. The projects range from image enhancement applications, through various Human-Computer Interfaces, to solutions developed just for fun.
The software for image editing and manipulation available on the market offers a broad range of tools; however, it is usually expensive. To show the students that they can easily develop their own solutions for free, we let them choose from topics concerning image enhancement, such as noise removal in digital camera images, automatic image rectification or white balance correction with calibration. The last of these was to be done for video sequences. The solution was based on the idea of adding a correction coefficient, found in the calibration procedure, to the blue and red channels of the RGB image. OpenCV provides a tool for accessing the values of the pixels in particular channels for a given color space: cvGet2D. This function, which returns a particular array element, allowed for real-time processing of the sequence of images with corrected white balance (Fig. 2).
Fig. 2. Input and output of the automatic white balance correction software for real time video
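The channel-offset idea described above can be sketched in a few lines. The following is a minimal, self-contained illustration in plain C++ rather than the students' actual OpenCV code; the Rgb struct, the function names, and the calibration rule (matching the red and blue means of a patch assumed to be neutral gray to its green mean) are assumptions introduced here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One 8-bit RGB pixel; the struct and its field names are illustrative only.
struct Rgb {
    std::uint8_t r, g, b;
};

// Signed offsets to be added to the red and blue channels.
struct Offsets {
    int red;
    int blue;
};

// Hypothetical calibration step: the patch is assumed to show a neutral gray
// surface, so the red and blue means are shifted to equal the green mean.
Offsets estimateOffsets(const std::vector<Rgb>& grayPatch) {
    assert(!grayPatch.empty());
    long long sumR = 0, sumG = 0, sumB = 0;
    for (const Rgb& px : grayPatch) {
        sumR += px.r;
        sumG += px.g;
        sumB += px.b;
    }
    const long long n = static_cast<long long>(grayPatch.size());
    return Offsets{static_cast<int>(sumG / n - sumR / n),
                   static_cast<int>(sumG / n - sumB / n)};
}

// Add a signed offset to an 8-bit value and clamp the result to [0, 255].
std::uint8_t addSaturated(std::uint8_t value, int offset) {
    const int shifted = static_cast<int>(value) + offset;
    return static_cast<std::uint8_t>(std::max(0, std::min(255, shifted)));
}

// Apply the offsets to the red and blue channels of every pixel in a frame;
// the green channel is left unchanged.
void correctWhiteBalance(std::vector<Rgb>& frame, const Offsets& offsets) {
    for (Rgb& px : frame) {
        px.r = addSaturated(px.r, offsets.red);
        px.b = addSaturated(px.b, offsets.blue);
    }
}
```

In a video pipeline, estimateOffsets would be called once during calibration and correctWhiteBalance once per frame, which keeps the per-frame cost to two additions per pixel.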
Many digital cameras, especially those with small sensors, exhibit very low dynamic range. This means that images of scenes containing both bright and dark regions will show severe loss of detail in those areas. A method to overcome this problem and to create pleasing images with high dynamic range (hence HDR), resembling human vision, is to take a sequence of images of the same scene using different exposure settings and then combine them to produce a final image containing all the highlight and shadow details that a human can perceive. One of the projects required developing a method for automatically merging three images, taken at different exposure settings, to produce one with high dynamic range, as shown in the example in Fig. 3.
Fig. 3. Three input images and the output image of the HDR imaging software
To accomplish this task, which seems difficult at first glance, a very simple algorithm was developed. For the image of average intensity, two binary masks were created: one covering the regions of the image that were too dark, and a second covering the regions that were too bright. The regions determined by the first mask were extracted from the image with too high exposure, and the regions determined by the second mask from the image with too low exposure. The final step was merging the three regions. OpenCV provides functions for quick thresholding of an image (cvThreshold) as well as tools for matrix algebra, such as cvMul for multiplication of two images or cvAdd for addition of two images. A proper combination of these functions produced the result presented in Fig. 3.
Many of the project topics concerned the development of different kinds of Human-Computer Interfaces. One group of applications for communication with the computer is based on color space conversion and template matching. OpenCV offers an optimized function for very fast conversion between color spaces (cvCvtColor). Template matching is realized by the function cvMatchTemplate, which offers different comparison methods: squared difference, cross-correlation and correlation coefficient. Three applications were developed using these methods: drawing software operated using a hand and a webcam instead of a mouse (a simple MSPaint), software for viewing an image gallery controlled by hand gestures (a simplified “Minority Report” computer interface), and an interface for controlling sound parameters in a media player with two flashlights. In all these projects round red and blue markers were used, so the cvCvtColor function was used to convert the RGB images to the YCbCr color space. The OpenCV library also includes a tool for quickly dividing a multi-channel array into several single-channel arrays, or extracting a single channel from the array: cvSplit. In this way analysis of the red and blue chrominance channels was very easy and allowed for detection of the markers. The function cvMinMaxLoc applied to the result of the cvMatchTemplate function allowed for tracking the position of the detected markers.
The most spectacular of these three projects is the hand-gesture-controlled interface. The markers are blue and red LEDs attached to specially prepared gloves (Fig. 4). The interface lets the user view the images in the photo gallery, rotate them and display them in full-screen size. This solution was presented in the Innovation Gardens, part of the Orange Labs global network.
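To make the marker-tracking step concrete, the sketch below shows what squared-difference template matching followed by a minimum search computes. It is written in plain C++ so that it is self-contained; it does not call cvMatchTemplate or cvMinMaxLoc, and the GrayImage and Match types and the function name are hypothetical. It operates on one single-channel image, such as one chrominance channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Row-major 8-bit single-channel image; an illustrative stand-in for one
// chrominance channel of a frame.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> data;

    std::uint8_t at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Top-left corner of the best match and its score.
struct Match {
    int x;
    int y;
    long long score;  // sum of squared differences; lower is better
};

// Slide the template over the image, compute the sum of squared differences
// at every position, and return the position with the smallest value.
Match findBestMatch(const GrayImage& image, const GrayImage& templ) {
    assert(templ.width <= image.width && templ.height <= image.height);
    Match best{0, 0, std::numeric_limits<long long>::max()};
    for (int y = 0; y + templ.height <= image.height; ++y) {
        for (int x = 0; x + templ.width <= image.width; ++x) {
            long long ssd = 0;
            for (int ty = 0; ty < templ.height; ++ty) {
                for (int tx = 0; tx < templ.width; ++tx) {
                    const int d = static_cast<int>(image.at(x + tx, y + ty)) -
                                  static_cast<int>(templ.at(tx, ty));
                    ssd += static_cast<long long>(d) * d;
                }
            }
            if (ssd < best.score) {
                best = Match{x, y, ssd};
            }
        }
    }
    return best;
}
```

With the squared-difference method the best match is the minimum of the score map; with the cross-correlation and correlation-coefficient methods it would be the maximum instead. This brute-force loop is only meant to show the computation and is far slower than an optimized library routine.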
Fig. 4. The gesture-controlled image browser: LED-fitted gloves (left), several of the possible gestures (middle), the main window of the final program (right)
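Returning to the exposure-merging project described earlier in this section, the mask-based procedure can be sketched as follows. This is a plain C++ illustration on single-channel images, not the students' code; the two threshold parameters and the function name are assumptions, and the per-pixel multiply-and-add mirrors the roles that thresholding, image multiplication and image addition play in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-channel, row-major image; all three inputs must have the same size.
using Image = std::vector<std::uint8_t>;

// Merge three exposures of the same scene. Pixels that are too dark in the
// medium exposure are taken from the overexposed image, pixels that are too
// bright are taken from the underexposed image, and the rest are kept.
Image mergeExposures(const Image& under, const Image& medium, const Image& over,
                     std::uint8_t darkThreshold, std::uint8_t brightThreshold) {
    assert(under.size() == medium.size() && over.size() == medium.size());
    assert(darkThreshold < brightThreshold);  // keeps the two masks disjoint
    Image result(medium.size());
    for (std::size_t i = 0; i < medium.size(); ++i) {
        // Binary masks (0 or 1) derived from the medium exposure.
        const int darkMask = medium[i] < darkThreshold ? 1 : 0;
        const int brightMask = medium[i] > brightThreshold ? 1 : 0;
        const int midMask = 1 - darkMask - brightMask;
        // Multiply each source by its mask and add the three contributions.
        result[i] = static_cast<std::uint8_t>(darkMask * over[i] +
                                              brightMask * under[i] +
                                              midMask * medium[i]);
    }
    return result;
}
```

Because exactly one of the three masks is 1 at every pixel, the sum never exceeds 255. Hard masks like these can leave visible seams at region boundaries; smoothing the masks before merging is one possible refinement, but it is not part of the procedure described in the text.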
The second group of the developed Human-Computer Interfaces is controlled by head movements. OpenCV offers a very efficient tool for face detection: a Haar classifier with a boosted rejection cascade, also known as the Viola-Jones detector [5, 6]. Using the function cvHaarDetectObjects and a trained classifier for frontal face detection, two applications for communication with the computer were developed. One project was based on tracking the position of characteristic points on the human face. Detection of a particular part of the face in the proper region of the image was associated with executing a certain action (Fig. 5). The second Human-Computer Interaction tool was based on detecting the direction of face rotation. In this project, apart from face detection, motion detection and tracking techniques were also employed. OpenCV includes tools for this purpose, such as optical flow [7] (e.g. cvCalcOpticalFlowPyrLK) or motion templates [8, 9] (cvUpdateMotionHistory). The Haar classifier can be trained to detect different types of objects, e.g. eyes, cars, bowls, etc. The OpenCV haartraining application creates a classifier for arbitrary object detection based on a given set of positive and negative samples.
Fig. 5. Head controlled HCI program window: recognized face (red), control marker (green), application control areas (purple)
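The speed of the Viola-Jones detector [5] rests largely on the integral image, which lets the sum over any rectangle be read with four lookups, so that Haar-like features cost the same regardless of their size. The sketch below illustrates only this building block in plain C++; it is not OpenCV's implementation, the class and function names are hypothetical, and a real detector combines many such features in a trained cascade.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Summed-area table with one extra row and column of zeros, so that entry
// (x, y) holds the sum of all pixels strictly above and to the left of (x, y).
class IntegralImage {
public:
    IntegralImage(const std::vector<std::uint8_t>& pixels, int width, int height)
        : width_(width), height_(height),
          sums_(static_cast<std::size_t>(width + 1) * (height + 1), 0) {
        assert(pixels.size() == static_cast<std::size_t>(width) * height);
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                at(x + 1, y + 1) = pixels[static_cast<std::size_t>(y) * width + x] +
                                   at(x, y + 1) + at(x + 1, y) - at(x, y);
            }
        }
    }

    // Sum of the w-by-h rectangle whose top-left pixel is (x, y).
    long long rectSum(int x, int y, int w, int h) const {
        assert(x >= 0 && y >= 0 && x + w <= width_ && y + h <= height_);
        return get(x + w, y + h) - get(x, y + h) - get(x + w, y) + get(x, y);
    }

private:
    long long& at(int x, int y) {
        return sums_[static_cast<std::size_t>(y) * (width_ + 1) + x];
    }
    long long get(int x, int y) const {
        return sums_[static_cast<std::size_t>(y) * (width_ + 1) + x];
    }

    int width_;
    int height_;
    std::vector<long long> sums_;
};

// Two-rectangle Haar-like feature: sum of the left half of the window minus
// the sum of the right half. A large magnitude indicates a vertical edge.
long long twoRectFeature(const IntegralImage& ii, int x, int y, int w, int h) {
    assert(w % 2 == 0);
    const int half = w / 2;
    return ii.rectSum(x, y, half, h) - ii.rectSum(x + half, y, half, h);
}
```

The table is built once per frame in a single pass, after which every feature evaluation is a handful of additions.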
Motion templates were also used in the project for pedestrian detection and tracking in image sequences. Such an application can be used in security monitoring systems and in pedestrian detection systems installed in cars. Movement in the sequence of images was detected by using difference images (cvAbsDiff) followed by thresholding as the input to the cvUpdateMotionHistory function. To visualize the detected pedestrians and the paths they traveled, two additional functions were used: cvFindContours, which retrieves contours from the binary image and returns the number of retrieved contours, and cvBoundingRect, which calculates the upright bounding rectangle. OpenCV also provides a set of functions for drawing desired shapes in the image, such as lines (cvLine), rectangles (cvRectangle) or ellipses (cvEllipse). The cvRectangle function was used to mark the detected persons, while the path was sketched with the cvLine function. A sample resulting image is presented in Fig. 6.
Fig. 6. Pedestrian detection and tracking software - video output with a pedestrian in red rectangle and its path of movement
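The detection chain described above (frame difference, threshold, motion history update, bounding rectangle) can be sketched in plain C++ as shown below. This is an illustration under simplifying assumptions rather than the project's code: it does not call OpenCV, all names are hypothetical, and it encloses all moving pixels in one rectangle instead of extracting separate contours per pedestrian.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // single-channel, row-major

// Absolute difference of two frames followed by a binary threshold
// (1 marks a pixel that changed by more than the threshold).
Frame motionSilhouette(const Frame& previous, const Frame& current, int threshold) {
    assert(previous.size() == current.size());
    Frame silhouette(current.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        const int diff = std::abs(static_cast<int>(current[i]) -
                                  static_cast<int>(previous[i]));
        silhouette[i] = diff > threshold ? 1 : 0;
    }
    return silhouette;
}

// Motion history update: moving pixels are stamped with the current time,
// and stamps older than (timestamp - duration) are cleared to zero.
void updateMotionHistory(const Frame& silhouette, std::vector<double>& mhi,
                         double timestamp, double duration) {
    assert(silhouette.size() == mhi.size());
    for (std::size_t i = 0; i < mhi.size(); ++i) {
        if (silhouette[i] != 0) {
            mhi[i] = timestamp;
        } else if (mhi[i] < timestamp - duration) {
            mhi[i] = 0.0;
        }
    }
}

struct Rect {
    int x, y, width, height;
};

// Upright bounding rectangle of all nonzero pixels; a zero-sized rectangle
// is returned when the silhouette is empty.
Rect boundingRect(const Frame& silhouette, int width, int height) {
    assert(silhouette.size() == static_cast<std::size_t>(width) * height);
    int minX = width, minY = height, maxX = -1, maxY = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (silhouette[static_cast<std::size_t>(y) * width + x] == 0) continue;
            minX = std::min(minX, x);
            maxX = std::max(maxX, x);
            minY = std::min(minY, y);
            maxY = std::max(maxY, y);
        }
    }
    if (maxX < 0) return Rect{0, 0, 0, 0};
    return Rect{minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```

The motion history image keeps recent motion as a fading trail, so the center of the bounding rectangle recorded frame by frame gives the kind of path that the project draws with line segments.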
5. SUMMARY
Teaching image processing and pattern recognition techniques is crucial at a time when digital imaging is found in almost every branch of our activity. The ability to implement image processing algorithms in computer applications is especially important. At the same time, we should keep in mind that it is a good idea to build new inventions on already existing solutions.
We have shown in this article that the OpenCV library is an efficient, easy-to-use and powerful set of image processing tools. We therefore recommend it for teaching image processing, pattern recognition and computer vision. The example projects presented show that this library makes it easy to work out optimized, effective image processing solutions, which will encourage students to explore this branch of science. Moreover, OpenCV is under active development, so more functionality will be added with every update. The next step in expanding OpenCV is the use of Graphics Processing Units (GPUs) for acceleration. There already exists GpuCV, an open-source GPU-accelerated image processing and computer vision library. It takes advantage of the high level of parallelism and computing power available in recent GPUs and makes it easy to port existing OpenCV applications.
REFERENCES
[1] G. Bradski and A. Kaehler, “Learning OpenCV”, O'Reilly Media (2008).
[2] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang, S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C. Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessandrini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A. Nefian, and P. Mahoney, “Stanley, the robot that won the DARPA Grand Challenge”, J. Robot. Syst. 23, 661–692 (2006).
[3] P. Skulimowski and P. Strumiłło, “Refinement of depth from stereo camera ego-motion parameters”, Electronics Letters, 12, 729–730 (2008).
[4] A. Królak and P. Strumiłło, “Vision-based eye blink monitoring system for human-computer interfacing”, 2008 Conference on Human Computer Interaction, 994–998 (2008).
[5] P. Viola and M. J. Jones, “Robust real-time face detection”, International Journal of Computer Vision 57, 137–154 (2004).
[6] R. Lienhart and J. Maydt, “An extended set of Haar-like features”, Proc. Int. Conf. on Image Processing, 900–903 (2002).
[7] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision”, Proceedings of the 1981 DARPA Imaging Understanding Workshop, 121–130 (1981).
[8] J. Davis and A. Bobick, “The representation and recognition of action using temporal templates”, Technical Report 402, MIT Media Lab, Cambridge, MA (1997).
[9] J. Davis and G. Bradski, “Real-time motion template gradients using Intel CVLib”, ICCV Workshop on Framerate Vision (1999).