A Color Night Vision System For Vehicle Navigation By Hemin Ali Qadir Bachelor of Science Electrical Engineering Salahaddin University-Hawler 2009
A thesis Submitted to Electrical and Computer Engineering at Florida Institute of Technology in partial fulfillment of requirements for the degree of Master of Science in Electrical Engineering
Melbourne, Florida December, 2013
We the undersigned committee hereby recommend that the attached document be accepted as fulfilling in part the requirements for the degree of Master of Science in Electrical Engineering. “A Color Night Vision System For Vehicle Navigation,” a thesis by Hemin Ali Qadir, Sr.
___________________________________ KOZAITIS, Samuel P Professor and Head, Electrical and Computer Thesis Advisor
___________________________________ MITRA, Debasis Professor, Computer Science
___________________________________ KEPUSKA, Veton Associate Professor, Electrical and Computer
___________________________________ OTERO, Carlos E Assistant Professor, Electrical and Computer
© Copyright 2013 Hemin Ali Qadir, Sr. All Rights Reserved
The author grants permission to make single copies ____________________________
ABSTRACT Title: A Color Night vision System for Vehicle Navigation Author: Hemin Ali Qadir Major Advisor: KOZAITIS, Samuel P
In this thesis, a system for displaying nighttime imagery with natural colors is presented. The system fuses two spectral bands, thermal and visible, to enhance night vision imagery. The fused image can be more detailed than the image from either input sensor; however, it often has an unnatural color appearance. Therefore, a color transfer step based on a look-up table (LUT) is added to the system to replace the false colors in the multiband fused images with natural colors. The natural colors are borrowed from a daytime reference image. A single colormap is not sufficient for navigation when the environment is changing, so a set of colormaps for different environments is derived. The system is connected to the internet to obtain an image of the location of interest from Google Street View. It can then select the best derived colormap by comparing the histograms of the obtained image with the histograms of all reference daylight images used to derive the colormaps. We also evaluate several histogram comparison metrics and show results for several nighttime driving scenarios.
The proposed method could be used in real time to aid nighttime vehicle navigation. Deriving a colormap may require some time. However, once the colormap is derived, it can be employed in a real-time implementation because swapping the false colormap of a multiband image for the derived colormap requires a minimal amount of processing time.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGMENT
1 Introduction
1.1 Introduction
1.2 Literature Review
1.2.1 Road Navigation Assistance
1.2.2 Multimodal Grayscale Image Fusion
1.2.3 Color Transfer Between Two Images
1.2.4 Color Night Vision
1.3 The Proposed System
1.4 Objective
2 System Structure
2.1 System Architecture
2.1.1 Image Acquisition
- IR thermal camera
- Low Light Level camera
2.1.2 Image Preprocessing
2.1.3 Image Registration
2.1.4 Dual-Band Image Fusion
2.1.5 Color Correction Unit
2.1.6 Best Color Mapping Selection
2.2 Access to Google Street View
3 Color Mappings and the Database
3.1 lαβ Color Space
3.1.2 RGB to lαβ Transform
3.1.3 lαβ to RGB Transform
3.2 Color Mapping
3.2.1 Statistic Matching
3.2.2 Lookup Table (LUT) Color Mapping
- Generation of a colormap
3.2.3 Usefulness of Color Lookup Table (LUT)
3.2.4 Color Night Vision Based on Color Lookup Table (LUT)
3.3 Database of Colormaps and Their Daytime Images
4 Color-Map Retrieval Based on Histogram Comparison
4.1 Introduction
4.2 Color Retrieval Subsystem
4.3 Color Space
4.3.1 RGB Color Space
4.3.2 HSV Color Space
4.3.3 RGB to HSV Color Conversion
4.4 Histogram Based Colormap Search
4.4.1 Histogram Computation
4.4.2 Color Quantization
4.4.3 Distance Metrics for Histogram Comparison
- Minkowski-form distance
- Histogram intersection
- Cosine angle distance
- χ² statistics (chi-square statistics)
- Match distance
- Kolmogorov-Smirnov distance
4.5 Recall and Precision
4.6 Experiments and Results
4.6.1 Distance Metrics Evaluation
4.6.2 Evaluation of χ² Statistics in RGB and HSV Color Spaces
4.6.3 Performance of χ² Statistics on Colormap Database
5 Results and Conclusions
5.1 Color Enhancement
5.2 Deriving the Colormap from a Synthetic Image
5.3 Human or Hot Object Appearance
5.4 Results of Applying Retrieved Colormaps
5.5 Disadvantage of Histogram Comparison
5.6 Conclusions
5.7 Future Work
Reference
LIST OF FIGURES
2.1 System architecture
2.2 Tamarisk® IR thermal camera
2.3 LLL visible camera and analog to digital video convertor
2.4 Inverting an IR thermal image
2.5 LLL visible image noise reduction
2.6 3D view of mounting IR thermal and LLL visible cameras
2.7 IR thermal and LLL visible image registration
2.8 Multiband image fusion
2.9 The obtained image from the Google Street View from above HTTP request
3.1 Color transfer using statistical approach
3.2 Color transfer using look-up tables approach
3.3 Results of applying the derived colormap shown in Fig. 3.2 (c)
3.4 Color transfer using look-up tables approach
3.5 Results of applying the derived colormap shown in Fig. 3.4 (c)
3.6 Procedure of colorizing multiband fused image using LUT method
3.7 Process of Color Night Vision using LUT method
3.8 Applying a derived colormap from a scene on another scene
3.9 Applying the colormap from Fig. 3.8 on a scene that does not have the same materials
3.10 The stored images in the database
4.1 Colormap retrieval subsystem
4.2 Retrieving an image using histogram comparison
4.3 RGB color space
4.4 HSV color space
4.5 Histogram of an image
4.6 Illustration of Precision and Recall
4.7 The performance of all distance metrics in RGB color space for multiple query images
4.8 The performance of all distance metrics in HSV color space for multiple query images
4.9 The performance of χ² statistics distance metric in both RGB and HSV color spaces
4.10 An Example of the colormap retrieval subsystem
4.11 An Example of the colormap retrieval subsystem
4.12 An Example of the colormap retrieval subsystem
5.1 Illustration of color transfer from a daylight reference image
5.2 Illustration of color enhancement by scribbling the daytime reference image
5.3 Illustration of using a synthetic view as the reference daytime image
5.4 Illustration of using a synthetic view as the reference daytime image
5.5 Hot target appearances in a resulting color night vision image
5.6 Hot target appearances in a resulting color night vision image
5.7 Applying a derived colormap from a scene on another scene of the same materials
5.8 Applying a derived colormap from a scene on another scene of the same materials
5.9 Applying a derived colormap from a scene on another scene of the same materials
5.10 Applying a derived colormap from a scene on another scene of the same materials
LIST OF ABBREVIATIONS
IR: InfraRed
LLL: Low Light Level
NHTSA: National Highway Traffic Safety Administration
GPS: Global Positioning System
CCD: Charge Coupled Device
MRI: Magnetic Resonance Imaging
CT: Computed Tomography
FOV: Field Of View
BM3D: Block Matching and 3D filtering
URL: Uniform Resource Locator
HTTP: HyperText Transfer Protocol
LUT: Look Up Table
CBIR: Content Based Image Retrieval
QBIC: Query By Image Content
DSP TM: Digital Signal Processing TriMedia
ACKNOWLEDGMENT
First of all, I would like to thank my God (Allah) for blessing me and giving me the opportunity to study in the USA. I would like to show my gratitude to my family members, especially my mother and my father, for their past efforts to make me a strong and successful person in life. I wish to express my sincere gratitude and appreciation to my advisor, Professor Samuel P. Kozaitis. His willingness to support my work and his excellent guidance have encouraged me to improve my skills as an image processing engineer.
Chapter One
Introduction
1.1 Introduction
Color night vision is currently one of the most active research areas in computer vision and image processing. Infrared (IR) thermal and low-light-level (LLL) visible cameras are the most popular nighttime imaging systems and are widely used for military, surveillance, reconnaissance, and security applications [1]. A thermal camera provides information about objects radiating thermal energy in a dark area or against a busy background, and it can see through fog, but it cannot capture background elements such as trees, leaves, and grass in a natural scene. A low-light-level visible camera, in contrast, captures the background in great detail from reflected visible and near-infrared light [2, 3]. Neither thermal nor visible cameras are sufficiently capable when used individually [4]. It can be very difficult to distinguish the background of a target in the scene using only infrared cameras for nighttime imagery [6]. A natural way to obtain a better description of a scene at night is to combine the thermal and visible images into a single fused image that preserves all perceptually important information present in the individual images. Compared with either modality alone, fused imagery can improve the reliability and capability of human and machine perception [6].
Until recently, night vision imagery was most commonly displayed in a gray-scale or green-scale representation. Natural color images offer clear benefits over monochrome images because color carries more variety. A color representation of night vision imagery leads to better scene recognition and object detection; it also improves human performance and reduces reaction time [7]. The reason is that the human eye can distinguish only about 100 shades of gray, compared with more than 400 hues and about 20 saturation levels per hue [8].

In this thesis, a road navigation assistance system based on color night vision techniques is presented. The system helps drivers at night see the road ahead in a natural daytime color appearance. The inputs of the system are two different spectral bands of imagery coming from two different cameras (thermal and low-light visible). These two images are combined into a single color image. With a suitably combined representation, the perceived scene is more detailed because the fused image contains all perceptually important information from both input images, resulting in a larger degree of situational awareness [6]. Since the resulting fused image has an unnatural color appearance, a color transfer step is required in the proposed system to transfer natural colors from a daytime color image (the reference image) to the false-color fused image.

Many different techniques have been proposed to display nighttime imagery in natural daytime colors. Most of them are computationally expensive and cannot be used in real-time applications [1,2,5,7,8]. Other researchers have introduced real-time image colorization techniques based on color transfer from one image to another [3,6,9,18].
1.2 Literature Review
1.2.1 Road Navigation Assistance
In the last decade, many methods for navigation assistance based on image processing and computer vision techniques have been introduced. Most of them aim to reduce car accidents during nighttime driving. The fatal car accident rate at night is about three times higher than during the day [22]. This is due to a variety of reasons: in low-light conditions, drivers’ peripheral vision, depth perception, and ability to differentiate color are worse than in daylight.

Some methods have been proposed to detect road lines and upcoming curvature at night. Those systems can be used to estimate the road boundaries and the road curvature [23-25,30,31]. In [23], Yue Wang and his colleagues proposed lane detection and tracking using B-Snakes. Serfling et al. [24] introduced a system that estimates the road course at night over distances of up to 120 meters in a rural environment. Their reasoning was that, knowing the course of the road ahead, drivers would be more aware of whether a detected object is on the road and thus of immediate importance. To realize the system, they presented a fusion system that combines the information provided by a night vision camera sensor, a digital map, and a prototypical imaging radar sensor. They used a particle filter to combine features from the camera and the radar sensor in order to match the road shape with the road visible in the image. The digital map was exploited to calculate the shape of the road. Tran et al. [25] presented a method that brightens the lane boundary based on the intensity of road images taken at night by a camera mounted on a car roof. Bing Ma et al. [31] introduced an improved detector of road location and lane boundaries that combines different types of sensed data. Those methods make drivers aware of forthcoming curvature and right and left turns, but they reveal nothing about upcoming obstacles on the road. Bertozzi and his colleagues [30] built a real-time stereo vision system to improve road safety. The system was designed to detect lanes with painted markings and generic obstacles, but it was not tested for nighttime driving.

Some of the systems are closer to entertainment applications. They were designed to help drivers become familiar with roads [26, 27] and know their location [28, 29]. For example, driving on a road for the second time is more comfortable because of visual memory of the route. Peng et al. [26] built a system based on Google Maps with Street View that generates a smooth scenic video from the starting point to the destination. They observed that stepping through Google Street View from the starting point to the end point is not smooth; it appears as jumps from one point to another. Chen et al. [27] suggested a method based on Microsoft Virtual Earth for previewing a video of a planned route before driving it for the first time, in order to acquire a visual memory of the route. Although these two methods help drivers see and become familiar with roads they have not yet driven, they do not help in real-time traffic or at night because the resulting videos show the roads only in daylight. Gautam et al. [28] developed a navigation assistance system for mobile users that retrieves the images along a route connecting two addresses. Zamir et al. [29] addressed the problem of finding the GPS location of images based on Google Street View images. Those systems fail at night because the images in their databases were all taken in daytime. In addition, the capability of both systems is limited to a specific area.
1.2.2 Multimodal Grayscale Image Fusion
The objective of image fusion is to preserve the complementary information present in two or more multimodal images of the same scene when they are merged into a single composite image, so that the result has more detail than any individual image. A multi-sensor fused image is more suitable for human perception and for computer processing tasks such as object detection, target recognition, segmentation, and feature extraction [11]. For instance, thermal IR and visible images can be combined to obtain a better perception of a scene, especially in nighttime applications, because neither camera alone can capture all the information in the scene. The simplest way to combine two images is to average them, but the problem with this direct fusion is that the contrast of features appearing in only one of the source images is reduced in the composite image [12]. To address that problem, many algorithms for fusing grayscale images have been proposed.

In 1990, Toet [13] developed a fusion scheme called hierarchical image fusion. Burt et al. [11] introduced a method to fuse multimodal images in which an image pyramid is created for each input image, and a composite pyramid is constructed by selecting coefficients from the input image pyramids. Finally, an inverse pyramid transform is applied to the composite pyramid to obtain the fused image. Hui Li and his colleagues [12] found that the Laplacian pyramid image fusion method produces blocking artifacts in regions where the multi-sensor data are significantly different; therefore, they introduced a new scheme for image fusion based on the wavelet transform. The wavelet transform of each source image is computed, and a composite wavelet representation is created by fusing the wavelet transforms of the source images. Finally, the new composite image is obtained by computing the inverse wavelet transform of the fused wavelet coefficients [12].

1.2.3 Color Transfer Between Two Images
In the past 10 years, color transfer has drawn a lot of attention in image processing and computer graphics. Many methods have been proposed to alter an image’s color [1-10,14,18-21]. Most of those methods borrow one image’s color characteristics from another. The overall appearance of an image can be changed using statistics and stochastic processes [14, 15], as in correcting the colors of a sunset photograph to a daytime appearance. Reinhard et al. [15] proposed a method for correcting colors in a target image by borrowing color characteristics from a reference image. They transferred colors between two images by matching the means and standard deviations of the target image to those of the reference image in Ruderman et al.’s decorrelated lαβ color space [16]. They preferred lαβ color space because there is little correlation between its axes, compared with the strong correlation between the axes in RGB color space. This property of lαβ space allows different operations to be applied in different color channels with some confidence that unwanted cross-channel artifacts will not occur [15].

The main disadvantage of the method of Reinhard and his colleagues is that the pixel values have to be transformed from RGB to lαβ color space, which makes the process computationally expensive. Inspired by their work, Xiao and Ma [14] proposed another method to transfer colors between images without transforming the two images to lαβ space. They treated a pixel’s value as a three-dimensional stochastic variable and an image as a set of samples. They computed the correlation between the three components by measuring the covariance, together with the mean along each of the three axes in RGB color space. This approach converts a target image’s color appearance through a series of transformations, including translation, scaling, and rotation, derived from those statistics (mean and covariance) of the target and reference images [14].
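The statistics matching idea described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [15]: it assumes the two images are already expressed in a decorrelated color space (the RGB to lαβ conversion is omitted), and the function name `transfer_statistics` is hypothetical.

```python
import numpy as np

def transfer_statistics(target, reference):
    """Shift and scale each channel of `target` so that its mean and
    standard deviation match those of `reference`.

    Both inputs are H x W x C arrays. Assumption: they are already in a
    decorrelated color space; no color space conversion is done here.
    """
    target = np.asarray(target, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Per-channel statistics over all pixels.
    t_mean = target.mean(axis=(0, 1))
    t_std = target.std(axis=(0, 1))
    r_mean = reference.mean(axis=(0, 1))
    r_std = reference.std(axis=(0, 1))

    # Guard against division by zero for perfectly flat channels.
    scale = r_std / np.where(t_std > 0, t_std, 1.0)

    # Remove the target mean, rescale the spread, add the reference mean.
    return (target - t_mean) * scale + r_mean
```

Because the mapping depends only on global means and standard deviations, a large object in the reference image influences every output pixel, which is the limitation of the statistical approach noted later in this chapter.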
1.2.4 Color Night Vision
Until recently, night vision imagery was most commonly displayed in a gray-scale or green-scale representation. A color image has obvious advantages over a gray-scale image in visual tasks: color images have been shown to reduce human error and speed up reaction time. Thus, the aim of colorizing night vision images is to improve overall scene recognition and situational awareness [3,7,9]. Thermal cameras and low-light-level visible cameras are not sufficiently capable when used individually at night [4]. Fusing two images of different bands preserves all important information present in both images, but it produces a false-color fused image. The false appearance of the resulting fused image can be transformed to a natural daylight appearance by matching the statistics of a daylight image using the idea of Reinhard et al. Several methods have been proposed for giving night vision imagery a natural daytime color appearance. Most of them focus on multiband night vision and image fusion [2,3,7], and some others deal with single-band image colorization [9,18,19]. Welsh et al. [19] introduced a general technique to colorize a grayscale image by borrowing colors from a reference daytime image. They used the same concept as Reinhard’s color transfer scheme, but since a grayscale image is represented by a one-dimensional distribution, they matched only the luminance channels of the reference color image and the grayscale target image. Toet [18] applied Welsh’s idea to give single-band intensified night vision imagery a natural daytime color appearance.

In 2003, Toet [6] showed that the color transfer method of Reinhard et al. [15] can be applied to transfer the natural color characteristics of a daytime color image to fused multiband night vision images. Because color transfer in lαβ color space is computationally expensive, it is difficult to realize in real time. Many other algorithms based on the methods proposed by Reinhard and Toet have been developed to colorize nighttime images using different color spaces that require fewer arithmetic operations to transfer colors between images, and they can be realized in real-time applications [1-3, 5, 7]. To obtain color night vision, Wang et al. [4, 5] applied the color transfer algorithm in YUV color space instead of lαβ space. Their algorithm requires fewer arithmetic operations than the one in lαβ color space because the transformation of an image from RGB color space to YUV space is less complex. In [5], Wang et al. built a real-time color transfer system in YUV color space based on three TM1300 multimedia DSPs. Li et al. [20] implemented Reinhard’s color transfer scheme in YCbCr color space. Through a series of mathematical derivations and proofs, they presented a fast color transfer algorithm for the fusion of thermal and visible images [21]. Compared with the commonly used lαβ space, color transfer in YCbCr and YUV avoids iterative color transformation and logarithmic and exponential operations [5]. In 2010, based on the color transfer scheme of Xiao and Ma [14], Anwaar et al. [8] showed that colors from a daylight reference image can be transferred to a false-color fused image of thermal and visible images in RGB color space without conversion to any decorrelated color space, which means the system can easily be realized in real-time applications.

Statistical color transfer has advantages and disadvantages. The main advantage is that the daytime reference image and the multiband night vision image do not have to be identical; the reference only needs to depict a scene with similar content. For example, it is not appropriate to use an image of buildings and streets as a reference image for a fused nighttime image of a forest. The main drawbacks of the statistical approach are that a large object in the reference image dominates the color mapping and that it addresses only the global color characteristics of the depicted scene. Hogervorst et al. [7] described an alternative lookup-table-based method that alleviates these drawbacks. They derive a color mapping from the combination of a nighttime false-color fused image and a corresponding daylight color image. To derive the color mapping, the nighttime image and the daytime reference image have to be registered. Once the color mapping has been obtained, it can be applied to different nighttime fused images. This approach is fast enough for real-time applications because the false-color appearance of multiband night images can be replaced simply by exchanging their false colormap for the derived color mapping. In this thesis, the method of Hogervorst et al. [7] is extended to correct colors in false-color fused nighttime images from thermal and visible cameras.
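As a rough illustration of the lookup-table approach, the sketch below derives a colormap from one registered pair of images and then applies it by pure indexing. Several details are assumptions made for the example and are not taken from [7]: the two fused bands are quantized to 16 levels each, the reference colors sharing an index are simply averaged in RGB, and the function names `derive_colormap` and `apply_colormap` are hypothetical.

```python
import numpy as np

def _band_indices(fused, levels):
    """Quantize the first two bands of an 8-bit fused image into LUT indices."""
    step = 256 // levels
    r_idx = fused[..., 0].astype(np.intp) // step
    g_idx = fused[..., 1].astype(np.intp) // step
    return r_idx, g_idx

def derive_colormap(fused, reference_rgb, levels=16):
    """Build a levels x levels x 3 table: for every (R, G) index, store the
    mean reference color of all pixels with that index. The fused and
    reference images are assumed to be registered and of equal size."""
    r_idx, g_idx = _band_indices(fused, levels)
    flat = (r_idx * levels + g_idx).ravel()

    # Accumulate reference colors and pixel counts per table entry.
    sums = np.zeros((levels * levels, 3))
    np.add.at(sums, flat, reference_rgb.reshape(-1, 3).astype(np.float64))
    counts = np.bincount(flat, minlength=levels * levels)

    # Entries never observed in the training pair stay black.
    lut = np.zeros_like(sums)
    seen = counts > 0
    lut[seen] = sums[seen] / counts[seen, None]
    return lut.reshape(levels, levels, 3)

def apply_colormap(fused, lut):
    """Recolor a fused image by table lookup only (no per-image statistics)."""
    r_idx, g_idx = _band_indices(fused, lut.shape[0])
    return lut[r_idx, g_idx]
```

Applying the table costs one index computation and one lookup per pixel, which is consistent with the claim that the swap itself is cheap once the colormap has been derived.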
1.3 The Proposed System
As discussed in the literature review, many algorithms and systems have been introduced to give night vision imagery a natural daytime appearance, but almost all of them aim to improve human performance and scene recognition and to reduce reaction time in security, reconnaissance, military, and surveillance applications. In this thesis, a novel navigation system based on color night vision techniques is presented. The system shows the scene in front of a vehicle more clearly, with a natural daylight color appearance, during nighttime driving. The inputs of the system are two images coming from two different bands of night vision camera sensors. A low-light-level visible sensor captures reflected visible to near-infrared light, while an IR thermal camera sensor captures invisible thermal energy in dark and busy background environments [2]. These two cameras provide complementary information about a scene, so it is natural to combine them into a single fused image to better locate the road and lane boundaries. The system maps the IR thermal image to the red channel and the visible image to the green channel of an RGB image representation, and it sets the blue channel to zero or to an image from a third camera if one is available. Since the fused image has an unnatural color appearance, a color correction scheme is required to replace the false colors with day-like colors.

For color correction of the fused image, a set of color mappings for different environments is stored in a database. The question is which color mapping in the database should be applied in a specific area. To answer it, a GPS sensor, which gives information about the location of the vehicle, is utilized. The GPS data can be sent to a database such as Google Street View or Microsoft StreetSide to get an image of the vehicle location. In this thesis, Google Street View is preferred because it has a larger database of street images than Microsoft StreetSide. Once an image is obtained from Google Street View, its histogram is compared with the histograms of the stored reference images of each color mapping in the database. Finally, the best match is chosen for correcting the colors in the fused image.
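The colormap selection step described above can be sketched as follows. This is only an assumed, simplified version: the histogram uses 8 bins per RGB channel, the distance is a chi-square form that compares each histogram with the mean of the two, and the function names are hypothetical. Obtaining the query image from Google Street View is outside the sketch.

```python
import numpy as np

def color_histogram(image_rgb, bins=8):
    """Normalized 3D color histogram of an 8-bit RGB image, flattened."""
    pixels = np.asarray(image_rgb, dtype=np.float64).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def chi_square_distance(h, k):
    """Chi-square distance between histograms h and k, using their mean
    histogram m = (h + k) / 2 as the expected value."""
    m = (h + k) / 2.0
    mask = m > 0  # skip bins that are empty in both histograms
    return float(np.sum((h[mask] - m[mask]) ** 2 / m[mask]))

def select_colormap(query_image, reference_images):
    """Return the index of the stored reference image whose histogram is
    closest to the query; the colormap derived from it would be applied."""
    query_hist = color_histogram(query_image)
    distances = [
        chi_square_distance(query_hist, color_histogram(ref))
        for ref in reference_images
    ]
    return int(np.argmin(distances))
```

In a deployed system the reference histograms would normally be computed once and stored next to each colormap, so only the query histogram has to be computed per location.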
1.4 Objective
Because most fatal car accidents occur during nighttime driving, when human vision and the ability to differentiate objects in dark and low-light conditions are worse than in daytime, night navigation assistance has become increasingly important. According to an NHTSA report [22], the fatal car accident rate at night is about three times higher than during the day.

The main goal of this thesis is to develop a color night vision system that helps drivers see the scene in front of the vehicle in a natural, day-like color view while driving at night. A natural color representation of night vision imagery that closely resembles its daytime appearance improves drivers’ performance and reaction time by making the interpretation and recognition of the scene more intuitive, and it keeps drivers prepared for and aware of upcoming obstacles and lane curvature on the road.
Chapter Two
The System Structure
Chapter one presented the introduction of the thesis, the history of color night vision and image fusion, and a brief concept of the proposed system. This chapter presents the proposed system in more detail and shows its overall block diagram.

2.1 System Architecture
Figure 2.1 shows the complete architecture of the color night vision system developed in this thesis. The inputs of the system are images from IR thermal and low-light-level (LLL) CCD visible cameras. The acquired images cannot be combined directly without some initial preprocessing. For example, videos and images acquired by an LLL visible camera in low light are very noisy, so a noise filter is needed to reduce the random noise corrupting the images. Because of differences in camera position, viewing angle, lenses, field of view, and so on, the IR thermal and visible images must be registered before they are combined. The two registered images are fed into a dual-band image fusion section, which maps the IR thermal image into the R channel of an RGB system and the visible image into the G channel, and sets the B channel to zero. Finally, the false-color fused image is fed to a color correction unit to change its unnatural colors. In the color correction unit, the false colormap is swapped for an appropriate natural colormap selected from a database. The colormap database and the techniques for selecting the best match for different environments are discussed in more detail in later chapters.

Figure 2.1 System architecture (block diagram: the IR thermal camera and the LLL visible camera each feed an image preprocessing block, followed by image registration, dual band image fusion, and the color correction unit, which displays nighttime images in natural color appearance; the best colormap is selected by histogram comparison between a natural color image database and the database of colormaps and their reference images)
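The dual-band fusion step of Section 2.1 amounts to stacking the two registered images into the channels of one color image. A minimal sketch, assuming both inputs are registered 8-bit grayscale arrays of the same size (the function name `fuse_dual_band` and the optional `third_band` argument are hypothetical):

```python
import numpy as np

def fuse_dual_band(ir_image, visible_image, third_band=None):
    """Build a false-color RGB image: IR -> R, visible -> G, and B set to
    zero unless an optional third band is supplied."""
    ir = np.asarray(ir_image, dtype=np.uint8)
    vis = np.asarray(visible_image, dtype=np.uint8)
    if ir.ndim != 2 or ir.shape != vis.shape:
        raise ValueError("inputs must be registered 2D images of equal size")

    fused = np.zeros(ir.shape + (3,), dtype=np.uint8)
    fused[..., 0] = ir   # thermal band drives the red channel
    fused[..., 1] = vis  # visible band drives the green channel
    if third_band is not None:
        fused[..., 2] = np.asarray(third_band, dtype=np.uint8)
    return fused
```

With this mapping, hot objects appear reddish and well-lit reflective surfaces appear greenish, which is why the fused image needs the subsequent color correction.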
2.1.1 Image Acquisition
An IR thermal camera and an LLL visible camera were used to acquire the input images for the system. These two cameras were chosen because they operate in different spectral bands, give complementary information about the scene in front of a vehicle at night, and lead to a meaningful color representation at the system output when combined into a single color image. Different cameras can be adopted for different applications. For example, in medical imaging systems, MRI and CT modalities can be combined to obtain a more detailed image of a patient's body, which can then be colorized to improve disease identification accuracy.
IR thermal camera
While driving in low light or foggy weather, hot targets such as humans, vehicles, and animals can barely be seen. To alleviate this problem, an IR thermal camera is commonly used because it converts thermal energy from the long-wave (8-12 micron) part of the spectrum into a visible image regardless of lighting conditions [32]. In this thesis, a Tamarisk® IR thermal camera from DRS Technologies Inc. was used. It provides images with 320x240 resolution and has a circular field of view (FOV) of 40 degrees. The camera occupies less than 2 cubic inches and weighs as little as 33 grams. The Tamarisk® produces clear imagery day and night through fog, smoke, dust, and haze [34].
Figure 2.2 Tamarisk® IR thermal camera
Low Light Level (LLL) visible camera
The IR thermal camera cannot capture all the information in a scene when used alone for nighttime imagery. An LLL visible camera can provide complementary information about the scene. The LLL visible camera used in this work is an Everfocus EQ700 Super Low Light camera. It amplifies reflected visible to near-infrared light over the spectral range from 400 to 800 nm. It delivers a sensitivity of 0.0001 lux and an analog video output at 30 fps with a resolution of 640x480 pixels. The camera is equipped with a 2.8-11 mm 1/3 CCTV CS lens and an Encore ENMVG-3 USB Audio/Video Grabber III, which converts the analog video output of the camera to a digital video signal.
Figure 2.3 LLL visible camera and analog-to-digital video converter: (a) Everfocus EQ700 Super Low Light Camera; (b) Encore ENMVG-3 USB Audio/Video Grabber III
2.1.2 Image Preprocessing
Roads are usually black or dark gray, but in the IR thermal image they appear white, which makes the scene harder to interpret. An easy way to eliminate this false appearance of the roads is to invert the thermal image. Figure 2.4 shows an original thermal image and its complement. Comparing the two pictures, the complement image looks closer to the natural appearance of the scene than the original image.
Figure 2.4 Inverting an IR thermal image: (a) original thermal image; (b) inverted thermal image
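The inversion step can be illustrated with a minimal sketch. It assumes the thermal frame is already available as an 8-bit grayscale NumPy array; the function name `invert_thermal` is hypothetical and is not taken from the thesis implementation.

```python
import numpy as np

def invert_thermal(thermal: np.ndarray) -> np.ndarray:
    """Return the complement of an 8-bit grayscale thermal image.

    Assumption: pixel values are uint8 in [0, 255], so hot (bright)
    regions become dark and cold (dark) regions become bright.
    """
    if thermal.dtype != np.uint8:
        raise ValueError("this sketch assumes 8-bit (uint8) pixel values")
    return 255 - thermal
```

For other bit depths the constant 255 would have to be replaced by the maximum representable pixel value.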
In extremely low light, the visible images are corrupted by random noise, so noise reduction becomes a compulsory preprocessing step before any further processing. A BM3D filter was applied to the visible images to reduce the random noise [53]. This algorithm was selected because it has significant advantages over traditional noise filters: it neither blurs the image nor leaves noticeable artifacts.
Figure 2.5 LLL visible image noise reduction: visible image corrupted by random noise, and the visible image after applying BM3D noise reduction
2.1.3 Image Registration
The IR thermal and LLL visible cameras do not provide registered images. The images are misaligned because of differences in camera position, lenses, field of view, and resolution. They therefore have to be registered before being fused into a single image, because combining two unregistered images produces ghosting artifacts that distract observers. The two cameras are mounted one on top of the other and take images from the same viewpoint, as shown in Figure 2.6. In this arrangement, there is only a small misalignment between the two images in translation, scaling, and shifting.
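An affine transformation was adopted to register the images because it can describe rotations, translations, scaling, and shears, and it preserves the straightness of lines. The model of the affine transformation is X' = kRX + T, that is,

\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
= k \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tau_x \\ \tau_y \end{bmatrix}
\tag{2.1}
\]

where k is the scaling factor, R = \(\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\) is the rotation, and T = \(\begin{bmatrix} \tau_x \\ \tau_y \end{bmatrix}\) is the translation. Equation 2.1 can be rewritten in a simpler form as follows:

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= \begin{bmatrix} k\cos\theta & k\sin\theta & \tau_x \\ -k\sin\theta & k\cos\theta & \tau_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\tag{2.2}
\]

where the 3x3 matrix in Equation 2.2 is called the homography matrix, and (x, y) and (x', y') are the coordinates of corresponding points in the two images. The IR thermal image is taken as the reference, and the visible image is mapped to the coordinates of the IR thermal camera. Once the homography matrix has been calculated, it can be reused to register all subsequent images from the two cameras because they are rigidly mounted.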
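As an illustration of how such a matrix is applied, the following sketch builds the 3x3 matrix from a scale k, a rotation angle θ, and a translation (τx, τy), and maps point coordinates with it. The function names and the parameter values in the usage note are hypothetical; the thesis does not state how the matrix was estimated, and this sketch does not cover estimation or image resampling.

```python
import numpy as np

def similarity_homography(k, theta, tx, ty):
    """Build the 3x3 matrix for x' = k * R(theta) * x + T in homogeneous form."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ k * c, k * s, tx],
                     [-k * s, k * c, ty],
                     [ 0.0,   0.0,   1.0]])

def map_points(H, points):
    """Map an (N, 2) array of (x, y) points through the matrix H."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.column_stack([points, np.ones(len(points))])
    mapped = homogeneous @ H.T  # each row is H @ [x, y, 1]
    return mapped[:, :2]
```

For example, `map_points(similarity_homography(2.0, 0.0, 3.0, -1.0), [[1.0, 1.0]])` scales the point by 2 and then shifts it by (3, -1). Because the cameras are rigidly mounted, the same matrix can be computed once and reused for every frame.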
Figure 2.6 3D view of mounting IR thermal and LLL visible cameras
Figure 2.7 IR thermal and LLL visible image registration: (a) thermal image; (b) LLL visible image before registration; (c) LLL visible image after registration
2.1.4 Dual-Band Image Fusion
As mentioned above, neither camera is sufficient when used on its own. For example, IR thermal cameras cannot capture information about backgrounds such as trees, leaves, and grass in natural scenes, while LLL visible cameras capture them easily [2, 3]. A suitable combination of IR thermal and visible images leads to better scene interpretation and target recognition because the fused image contains the information captured by both cameras. A serious drawback of IR thermal and LLL visible cameras is that the acquired images are grayscale. Human eyes can distinguish only 100 shades of gray, while they can discern 400 hues and about 20 saturation levels per hue [8]. Another advantage of the combination is that it facilitates colorization, because image fusion expands the range of intensity variations compared with the individual IR thermal and visible images. The fusion method used in this work is very straightforward: the IR thermal image is mapped to the R channel, the visible image is mapped to the G channel of the RGB representation, and the B channel is set to zero.

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
= \begin{bmatrix} IR \\ Vis \\ 0 \end{bmatrix}
\tag{2.3}
\]
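The channel mapping above can be sketched in a few lines. The sketch assumes the two inputs are already registered, single-channel arrays of the same shape and data type; the function name `fuse_dual_band` is hypothetical.

```python
import numpy as np

def fuse_dual_band(thermal, visible):
    """Stack a registered thermal/visible pair into a false color RGB image.

    R <- IR thermal, G <- visible, B <- 0, as described in the text.
    """
    thermal = np.asarray(thermal)
    visible = np.asarray(visible)
    if thermal.ndim != 2 or thermal.shape != visible.shape:
        raise ValueError("inputs must be registered 2-D images of equal shape")
    fused = np.zeros(thermal.shape + (3,), dtype=thermal.dtype)
    fused[..., 0] = thermal  # R channel
    fused[..., 1] = visible  # G channel
    return fused             # B channel stays zero
```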
Figure 2.8 shows that the fused image contains more detail of the scene than the individual IR thermal and LLL visible images, but it has an unnatural color appearance. Hence, a color correction technique is necessary to change this false color to a natural daylight appearance.
Figure 2.8 Multiband image fusion: (a) inverted IR thermal image; (b) denoised visible image; (c) fused image
2.1.5 Color Correction Unit
Even though false color images can significantly improve observer performance and reaction times compared with the corresponding grayscale images [33], they are uncomfortable to watch. The main drawback of false color images is that observers need specific training with each unnatural color scheme before they can recognize objects correctly and quickly [10]. Thus, a color correction unit is added to the system to transfer natural colors from daytime color images to the false color images, because an appropriate color mapping can yield better overall scene recognition performance. In the color correction unit, the false colormap is swapped for a natural daylight colormap selected from a database based on the composition of the environment. Techniques for deriving a color mapping from a daylight image are presented in detail later in this thesis.
2.1.6 Best Colormap Selection
One colormap is not adequate to transform the false colors in fused images of different environments to a natural daylight appearance. For example, a color mapping built for a forest landscape cannot be applied to correct the false colors in fused images of a downtown area. Because the system is mobile, the two cameras take pictures of different scenes in front of the car. This motivates building a set of color mappings for different environments and storing them in a database. The question that then arises is how to select the best color mapping in the database for a specific environment. The false color fused images cannot be used as a reference for this selection because their colors are not natural and cannot be compared with daylight images. Instead, the system communicates with a GPS sensor and the internet to collect information about the car's location and the composition of the environment. A GPS sensor is used as the third input to the system. The system sends the GPS data to an online database of street images such as Google Street View or Microsoft StreetSide. In this thesis, Google Street View is preferred because it has a larger database of street images than Microsoft StreetSide. Once an image is obtained from Google Street View, its histogram is compared with the histograms of the stored images associated with each color mapping in the database. Finally, the best match is chosen for correcting the colors in the fused image.
2.2 Access to Google Street View
Google Street View allows developers to obtain static images and use them in their projects. A request is sent over standard HTTP with a URL of the following form.
http://maps.googleapis.com/maps/api/streetview?parameters
Some parameters have to be sent with the request. All parameters are separated with the ampersand (&) character. The parameters and their possible values are listed below.
size: specifies the size of the image in pixels. For example, if size=600x400 is sent, the server returns an image 600 pixels wide and 400 pixels high. The maximum image size that can be requested is 640x640 pixels.
location: can be either a text string (such as Chagrin Falls, OH) or the latitude and longitude of a specific point, for example (28.007995, -80.635916).
sensor: either true or false. If it is true, the request came from a location sensor such as a GPS sensor; if it is false, a location sensor was not used.
heading: an important parameter because it indicates the compass heading of the camera. Values from 0 to 360 can be sent with a request to get images from different viewpoints of the same place.
fov: indicates the horizontal field of view. This parameter is used to zoom the requested image in and out. Smaller numbers indicate a higher level of zoom. It is expressed in degrees, with a maximum allowed value of 120.
pitch: identifies the up or down angle of the camera. Positive values angle the camera up, while negative values angle the camera down.
An example of getting an image from Google Street View is shown below.
http://maps.googleapis.com/maps/api/streetview?size=620x620&location=28.007995,-80.635916&fov=50&heading=40&pitch=0&sensor=false
Figure 2.9 The image obtained from Google Street View using the above HTTP request
The GPS sensor used in this work was a Garmin GPS 18x USB sensor. It supplies three important values to the system: the latitude and longitude coordinates and the compass heading. These are the most important values required to get an image of a location from Google Street View. The other URL parameters can be fixed to appropriate values; for our system, suitable values are image size = 640x640, fov = 50, and pitch = 0.
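The request URL described in this section can be assembled as in the sketch below. It only builds the string and does not contact the server. The endpoint and parameter names are the ones quoted in the text; the function name `street_view_url` is hypothetical, and the current service may require additional parameters (such as an API key) that this sketch does not handle.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the text above.
BASE_URL = "http://maps.googleapis.com/maps/api/streetview"

def street_view_url(lat, lon, heading, size="640x640", fov=50, pitch=0, sensor=True):
    """Build a Street View image request URL from GPS data.

    lat, lon, and heading come from the GPS sensor; the remaining
    parameters default to the fixed values suggested in the text.
    """
    params = {
        "size": size,
        "location": f"{lat},{lon}",
        "fov": fov,
        "heading": heading,
        "pitch": pitch,
        "sensor": "true" if sensor else "false",
    }
    # safe="," keeps the comma between latitude and longitude readable.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"
```

Calling `street_view_url(28.007995, -80.635916, 40, size="620x620", sensor=False)` reproduces the example request shown earlier in this section.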
Chapter Three
Color Mappings and the Database
This chapter presents the techniques required to create color mappings for different environments and store them in a database. Since the color mappings are derived in lαβ color space, a brief introduction to this color space is also given. Based on the content of the current environment, a color mapping can be retrieved and then applied to change the false color appearance of multiband fused images to a daylight appearance.
3.1 lαβ Color Space
This color space was developed by Ruderman et al. [16]. It minimizes the correlation between channels. The l axis represents an achromatic channel, while the α and β channels represent chromatic yellow-blue and red-green opponent channels, respectively. Because there is little correlation between the axes in lαβ space, each color component can be manipulated independently.
3.1.1 RGB to lαβ Transform
First, the RGB tristimulus values of the image are converted to device-independent XYZ tristimulus values.

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \begin{bmatrix} 0.5141 & 0.3239 & 0.1604 \\ 0.2651 & 0.6702 & 0.0641 \\ 0.0241 & 0.1228 & 0.8444 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{3.1}
\]
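LMS color space can be obtained from the XYZ values using the following equation,

\[
\begin{bmatrix} L \\ M \\ S \end{bmatrix}
= \begin{bmatrix} 0.3897 & 0.6890 & -0.0787 \\ -0.2298 & 1.1834 & 0.0464 \\ 0.0000 & 0.0000 & 1.0000 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\tag{3.2}
\]

The combination of (3.1) and (3.2) gives

\[
\begin{bmatrix} L \\ M \\ S \end{bmatrix}
= \begin{bmatrix} 0.3811 & 0.5783 & 0.0402 \\ 0.1967 & 0.7244 & 0.0782 \\ 0.0241 & 0.1228 & 0.8444 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{3.3}
\]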
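The data in this color space show a great deal of skew, which can largely be eliminated by converting the data to logarithmic space.

\[
\mathbf{L} = \log_{10} L, \qquad \mathbf{M} = \log_{10} M, \qquad \mathbf{S} = \log_{10} S
\tag{3.4}
\]

Here bold letters denote the logarithmic values. Ruderman et al. [14] developed the lαβ color space, which minimizes the correlation between axes, using the following equation.

\[
\begin{bmatrix} l \\ \alpha \\ \beta \end{bmatrix}
= \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{6}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -2 \\ 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{L} \\ \mathbf{M} \\ \mathbf{S} \end{bmatrix}
\tag{3.5}
\]

If we think of L as red, M as green, and S as blue, the axes can be interpreted as follows: an achromatic component (l ∝ r+g+b), a yellow-blue component (α ∝ r+g-b), and a red-green component (β ∝ r-g).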
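3.1.2 lαβ to RGB Transform
An image can be converted from lαβ back to RGB color space by applying the steps in reverse order. There is no direct conversion from lαβ to RGB, so the image is first converted to logarithmic LMS space using the following equation,

\[
\begin{bmatrix} \mathbf{L} \\ \mathbf{M} \\ \mathbf{S} \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -2 & 0 \end{bmatrix}
\begin{bmatrix} \frac{\sqrt{3}}{3} & 0 & 0 \\ 0 & \frac{\sqrt{6}}{6} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{bmatrix}
\begin{bmatrix} l \\ \alpha \\ \beta \end{bmatrix}
\tag{3.6}
\]

where bold letters denote logarithmic LMS values. Because the data were converted to logarithmic space in the RGB to lαβ step to reduce the skew in LMS space, they have to be converted back to linear space:

\[
L = 10^{\mathbf{L}}, \qquad M = 10^{\mathbf{M}}, \qquad S = 10^{\mathbf{S}}
\tag{3.7}
\]

Then the image in RGB color space is obtained using this equation.

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
= \begin{bmatrix} 4.4679 & -3.5873 & 0.1193 \\ -1.2186 & 2.3809 & -0.1624 \\ 0.0497 & -0.2439 & 1.2045 \end{bmatrix}
\begin{bmatrix} L \\ M \\ S \end{bmatrix}
\tag{3.8}
\]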
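The forward and inverse transforms of Sections 3.1.1 and 3.1.2 can be sketched with NumPy as follows. The matrices are the ones given in Equations 3.3, 3.5, 3.6, and 3.8. The function names, the use of floating-point RGB values, and the small floor `EPS` that avoids taking the logarithm of zero are assumptions made for this sketch.

```python
import numpy as np

RGB_TO_LMS = np.array([[0.3811, 0.5783, 0.0402],
                       [0.1967, 0.7244, 0.0782],
                       [0.0241, 0.1228, 0.8444]])   # Eq. 3.3
LMS_TO_RGB = np.array([[ 4.4679, -3.5873,  0.1193],
                       [-1.2186,  2.3809, -0.1624],
                       [ 0.0497, -0.2439,  1.2045]])  # Eq. 3.8
SCALE = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)])
OPPONENT = np.array([[1.0,  1.0,  1.0],
                     [1.0,  1.0, -2.0],
                     [1.0, -1.0,  0.0]])
LOGLMS_TO_LAB = SCALE @ OPPONENT      # Eq. 3.5
LAB_TO_LOGLMS = OPPONENT.T @ SCALE    # Eq. 3.6

EPS = 1e-6  # assumption: floor that keeps log10 finite for black pixels

def rgb_to_lab(rgb):
    """Convert (..., 3) float RGB values to l-alpha-beta."""
    lms = np.asarray(rgb, dtype=float) @ RGB_TO_LMS.T
    log_lms = np.log10(np.maximum(lms, EPS))   # Eq. 3.4
    return log_lms @ LOGLMS_TO_LAB.T

def lab_to_rgb(lab):
    """Convert (..., 3) l-alpha-beta values back to RGB."""
    log_lms = np.asarray(lab, dtype=float) @ LAB_TO_LOGLMS.T
    lms = 10.0 ** log_lms                       # Eq. 3.7
    return lms @ LMS_TO_RGB.T
```

Because the published inverse matrix is rounded to four decimals, a round trip through both functions reproduces the input only approximately.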
3.2 Color Mapping
Even though the fused image has more detail of the scene than the individual IR thermal and LLL visible images, it has an unnatural color appearance. Hence, a method is required to give the nighttime fused imagery a natural daytime color appearance, because an appropriate color mapping can yield better overall scene recognition performance.
3.2.1 Statistic Matching
The color appearance of an image can be changed by transferring the first- and second-order statistics (mean and standard deviation) of another image to it, based on the technique developed by Reinhard et al. [16]. Because the three channels in RGB color space are correlated, this mapping is performed in a perceptually decorrelated color space such as lαβ. The steps of color transfer using the statistical approach can be summarized as follows:
1- Generate the false color image (source image) from the visible and thermal night vision images. It is obtained by mapping the registered thermal IR and LLL visible images to the R and G channels of an RGB image, respectively, and setting the B channel to zero, as in Section 2.1.4.
2- Choose a daylight color image (target image). The scenes depicted in the source and target images need not be identical, but they should resemble each other.
3- Transfer the false color image and the target image to lαβ color space using the steps described in Section 3.1.1.
4- In lαβ color space, match the statistics of the source image to those of the target image using the following equation

\[
I_C^k = \left( I_S^k - \mu_S^k \right) \frac{\sigma_T^k}{\sigma_S^k} + \mu_T^k, \qquad k \in \{ l, \alpha, \beta \}
\tag{3.9}
\]

where I_C is the resulting color-corrected image, I_S is the source (false color) image in lαβ color space, and μ_S, μ_T, σ_S, and σ_T denote the means and standard deviations of the source and target images, respectively.
5- Convert the resulting color image back to RGB color space for display.
Because this approach uses only image statistics, colors are transferred between images regardless of scene content, and thus the accuracy of the colors in the corrected fused images depends strongly on how well the source and target images match [6]. Google Street View could be the best database for target images because a target image of the same scene can be obtained from it simply by sending the GPS coordinates of the fused image location in an HTTP request. The final fused image can have a daytime color appearance, but an exact match to daytime color images can never be fully achieved [6].
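Step 4 above can be sketched as follows, assuming both images have already been converted to lαβ and are stored as floating-point arrays with three channels. The function name `match_statistics` and the guard `eps` against a zero standard deviation are assumptions of this sketch.

```python
import numpy as np

def match_statistics(source_lab, target_lab, eps=1e-12):
    """Apply Eq. 3.9 channel by channel.

    source_lab: false color image in l-alpha-beta space, shape (..., 3)
    target_lab: daylight target image in l-alpha-beta space, shape (..., 3)
    Returns the corrected image with the shape of source_lab.
    """
    source_lab = np.asarray(source_lab, dtype=float)
    src = source_lab.reshape(-1, 3)
    tgt = np.asarray(target_lab, dtype=float).reshape(-1, 3)
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    sigma_s, sigma_t = src.std(axis=0), tgt.std(axis=0)
    # eps is an assumption: it avoids division by zero for flat channels.
    corrected = (src - mu_s) * (sigma_t / np.maximum(sigma_s, eps)) + mu_t
    return corrected.reshape(source_lab.shape)
```

After this step the corrected image has the per-channel mean and standard deviation of the target, while the source and target may have different sizes.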
Figure 3.1 Color transfer using statistical approach: (a) fused image (source image); (b) target image obtained from Google Street View; (c) corrected color image using statistical matching
3.2.2 Lookup Table (LUT) Color Mapping
The statistical method can give the multiband fused image a natural color appearance, but it has some disadvantages that make it unsuitable for the proposed system. It only addresses the global color characteristics of the depicted scene [32]. Colors in the resulting image depend on the relative amounts of the different materials in the scene, and small objects are colorized with the color of the large, dominant objects in the reference image [32]. Because transforming an image from RGB space to lαβ space requires intensive arithmetic, the statistical method is computationally expensive and therefore cannot be realized in real time. Hogervorst and Toet [7] developed an alternative method based on a color lookup table (LUT) to overcome the drawbacks of the statistical method. Compared with the statistical approach, this method is highly specific to the different kinds of materials in the scene. Its main advantage is that the color of an object depends only on the fused image intensity values and is independent of the scene content. Thus, the color appearance of the resulting image matches the daytime appearance more closely than the result of the statistical method [32]. Unlike the statistical approach, deriving a colormap with the LUT method requires the multiband false color fused image and its daytime reference image to show the same scene and to be perfectly registered, since the pixel values determine which false color corresponds to which color in the reference image.
Generation of a colormap
The steps for creating a colormap can be summarized as follows:
1- Create the false color fused image by mapping the registered thermal IR and LLL visible images to the R and G channels of an RGB image, respectively, and setting the B channel to zero.
2- Convert the multiband false color image to an indexed image, in which a single index is assigned to each pixel. Each index value represents an RGB value in a color lookup table (LUT); users choose the number of entries in the table. In our case, the color lookup table contains various combinations of R and G values because the B values are all zero.
3- Derive the natural color equivalent for each index by locating the pixels in the indexed false color image with this index and finding the corresponding pixels in a strictly registered daytime image of the same scene.
4- Calculate the average of this group of pixels in lαβ color space; that is, the RGB values are first transformed to decorrelated lαβ values. This ensures that the computed average color reflects the perceptual average color [7]. Next, convert the result back to RGB space.
5- Assign the resulting RGB values as a new color lookup table for the false color indexed image.
6- Finally, replace the lookup table of the false color fused image with the newly derived color lookup table to give the multiband nighttime imagery a natural daytime appearance.
These steps can be made clearer with an example. Assume that we are interested in obtaining the natural color associated with index 1. First, all pixels associated with index 1 are located in the indexed false color fused image. Next, all corresponding pixels in the registered daytime reference image are taken and converted to lαβ color space. The average of this group of pixels is calculated, and the result is converted back to RGB space. Finally, this RGB value is assigned to index 1 of the new color lookup table. This procedure is repeated to derive natural colors for all indices. When the color lookup table does not contain enough entries, the color mapping is obtained by finding the closest match of the table entries to the observed multiband sensor values [32]. Unfortunately, the colorization will be poor if there is any misalignment between the false color fused image and the corresponding reference image.
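The derivation steps above can be sketched as follows. This is an illustration under several assumptions rather than the implementation used in the thesis: the index is formed by uniformly quantizing the R and G channels to `levels` steps each, the lαβ conversions are passed in as the hypothetical callables `to_lab` and `from_lab`, and indices that never occur in the fused image are simply left black and reported in a mask instead of being filled by a closest-match search.

```python
import numpy as np

def to_index_image(fused, levels=16):
    """Quantize R and G of a uint8 fused image (B is zero) into one index per pixel."""
    r = fused[..., 0].astype(np.int64) * levels // 256
    g = fused[..., 1].astype(np.int64) * levels // 256
    return r * levels + g  # index in [0, levels**2)

def derive_colormap(index_image, reference, n_entries, to_lab, from_lab):
    """Average the reference colors of each index in l-alpha-beta space.

    Returns (lut, used): lut has shape (n_entries, 3); used marks the
    indices that actually occur in index_image.
    """
    idx = index_image.ravel()
    ref_lab = to_lab(np.asarray(reference, dtype=float).reshape(-1, 3))
    counts = np.bincount(idx, minlength=n_entries)
    lut_lab = np.zeros((n_entries, 3))
    for c in range(3):  # per-channel sum of reference colors for each index
        lut_lab[:, c] = np.bincount(idx, weights=ref_lab[:, c], minlength=n_entries)
    used = counts > 0
    lut_lab[used] /= counts[used, None]
    lut = np.zeros((n_entries, 3))
    lut[used] = from_lab(lut_lab[used])
    return lut, used

def apply_colormap(index_image, lut):
    """Replace every index by its natural color from the lookup table."""
    return lut[index_image]
```

Once the table has been derived, colorizing a new frame costs only the quantization and one table lookup per pixel, which is why the method is attractive for real-time use.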
Figure 3.2 Color transfer using look-up tables approach: (a) false color fused image of thermal and visible images; (b) natural daytime color reference image; (c) false colormap of image (a) and the natural colormap derived from images (a) and (b), respectively; (d) result of replacing the false colormap of image (a) with the derived natural colormap in (c)
Figure 3.2 shows how well the false color appearance of a fused image of daytime thermal and visible images can be replaced with a natural color appearance using a color lookup table. The colors in the resulting image closely match the colors in the reference image. Note that the left house wall does not appear in the same color as in the reference image. This is due to the larger influence of the street on the derived colormap: the street and the house wall have the same false color in the fused image, but the street occupies a larger area than the house wall.
3.2.3 Usefulness of Color Lookup Table (LUT)
The color lookup table method has practical value if the derived colormap can be applied to colorize different images of various scenes in a real-time implementation. One of the major advantages of this method is that it requires minimal computing power, because the false colors in the multiband fused image are transformed to natural colors simply by swapping the false colormap for the derived colormap [7]. Figure 3.3 shows the results of applying the derived colormap pair in Figure 3.2 (c) to different images. The resulting images shown in Figure 3.3 (c) and (d) closely resemble their daytime images since the colormap was derived from a corresponding scene (a scene containing the same materials). However, some objects appear with wrong colors, for example the garage door in Figure 3.3 (c) and the house roof in Figure 3.3 (d). This is because colors that appear more frequently (e.g., street and grass colors) dominate the color mapping.
Figure 3.3 Results of applying the derived colormap shown in Fig. 3.2 (c): (a) and (b) false color images; (c) and (d) results of applying the derived colormap; (e) and (f) corresponding natural daytime images
Figure 3.5 (c) and (d) show the results of applying the color mapping derived from Figure 3.4 (a) and (b). Again, the results are close to their corresponding natural color images. In brief, a colormap derived from one scene can be used to colorize different images of corresponding scenes.
Figure 3.4 Color transfer using look-up tables approach: (a) false color fused image; (b) the corresponding natural reference image; (c) derived color mapping from (a) and (b); (d) image (a) after replacing the false colormap with the derived colormap
Figure 3.5 Results of applying the derived colormap shown in Fig. 3.4 (c): (a) and (b) false color images; (c) and (d) results of applying the derived colormap; (e) and (f) corresponding natural daytime images
3.2.4 Color Night Vision Based on Color Lookup Table (LUT)
The previous examples demonstrated that the color lookup table method works well with multiband fused images taken during daytime. Since the system is intended to give a natural color appearance to multiband fused nighttime images, the method should be verified for images taken at night as well. Figure 3.7 shows this verification for a multiband nighttime image. Figure 3.7 (a) and (b) show the IR thermal and LLL visible images of a scene, respectively. Figure 3.7 (c) shows the multiband fused image obtained by mapping the IR thermal image to the R channel and the visible image to the G channel of the RGB representation. Figure 3.7 (d) shows the daytime reference image taken from the same viewpoint. The result of the color lookup table method is shown in Figure 3.7 (f). Even though the resulting image closely matches the daytime appearance, it looks noisy because of a solarizing effect in the luminance of the colorized image [7]. To eliminate the noise, the luminance component is replaced with a grayscale fused image of the thermal and visible images. The final result (Fig. 3.7 (i)) closely resembles the corresponding daytime image without any noticeable noise. The procedure of color night vision using the color lookup table (LUT) is summarized in the block diagram shown in Figure 3.6.
Figure 3.6 Procedure of colorizing multiband fused image using LUT method. [Block diagram of the color correction unit of the proposed system. Blocks shown: false color fused image, RGB to index image, colormap replacement, index image to RGB, RGB to HSV, grayscale fused image, luminance replacement, the colorized image.]
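The luminance replacement step can be sketched without an explicit HSV round trip. The sketch assumes that the V channel of HSV is the component being replaced, that both images are floating-point arrays in [0, 1], and that the grayscale fused image is supplied by the caller (how it is computed is not specified here). It relies on the fact that scaling R, G, and B by a common positive factor leaves H and S unchanged and scales V, so setting V to a new value is equivalent to rescaling each pixel. The function name `replace_luminance` is hypothetical.

```python
import numpy as np

def replace_luminance(colorized, gray):
    """Replace the HSV value channel of an RGB image with a grayscale image.

    colorized: float RGB image in [0, 1], shape (H, W, 3)
    gray:      float grayscale fused image in [0, 1], shape (H, W)
    """
    colorized = np.asarray(colorized, dtype=float)
    gray = np.asarray(gray, dtype=float)
    value = colorized.max(axis=-1)              # HSV value: V = max(R, G, B)
    safe = np.where(value > 0, value, 1.0)      # avoid division by zero
    out = colorized * (gray / safe)[..., None]  # same hue/saturation, new V
    black = value == 0                          # hue undefined for black pixels:
    out[black] = gray[black][:, None]           # use a neutral gray instead
    return out
```

For a pixel (0.8, 0.4, 0.2) and a new value of 0.5, the result is (0.5, 0.25, 0.125), which has the same hue and saturation as the input.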
Figure 3.7 Process of Color Night Vision using LUT method: (a) inverted IR thermal image; (b) denoised LLL visible image; (c) the false color fused image of (a) and (b); (d) natural daytime reference image; (e) derived pair of colormaps; (f) the noisy colorized image; (g) luminance of the result image; (h) grayscale fused image of (a) and (b); (i) the final colorized image
Figure 3.8 shows the result of applying the colormap derived from (a) and (c) to colorize another false color fused image, shown in Figure 3.8 (b). Since the materials in the new fused image are the same as in the fused image used to derive the colormap, most materials in the resulting image shown in Figure 3.8 (f) appear with correct colors.
Figure 3.8 Applying a derived colormap from a scene on another scene: (a) multiband fused image used to derive a colormap; (b) multiband fused image of a different scene; (c) daytime reference image; (d) daytime image of (b); (e) result of color lookup table; (f) result of applying the colormap derived from (a) and (c)
Figure 3.9 shows the result of applying the same derived colormap to colorize another false color fused image that does not contain the same materials. As can be seen in Figure 3.9 (c), some materials (sky and road) appear with wrong colors. In this case, another colormap has to be derived to colorize such different images.
Figure 3.9 Applying the colormap from Fig. 3.8 on a scene that does not have the same materials: (a) false color fused image; (b) daytime image of the same scene; (c) colorized image using the same colormap from Fig. 3.8
3.3 Database of Colormaps and Their Daytime Images
From the previous examples, we can conclude that a colormap derived from a false color fused image and its corresponding daytime image can only be applied to false color fused images that contain the same materials. Thus, different colormaps are required for different environments. For example, a colormap derived for a downtown area cannot be applied to images of forest areas, and vice versa. Therefore, a set of colormaps for different environments has to be derived and stored in a database. Later on, an appropriate colormap is retrieved based on the content (composition) of the environment. To make this retrieval possible, a daytime reference image is stored in the database with each colormap, because a colormap cannot be retrieved directly from the information in the colormap itself. In this thesis, the images used to derive the colormaps were not the ones stored; instead, images of the same places were downloaded from Google Street View and stored in the database. This increases the accuracy of retrieving the correct colormap. The question that arises here is how to retrieve a colormap while driving on a road. To answer it, it is natural to think of a link between the system and the internet. The system obtains images of the vehicle location from an online database of street images such as Google Street View. A GPS sensor connected to the system gives the coordinates and compass heading. Once an image is obtained from Google Street View using the GPS data, its histogram is compared with those of the stored images of each colormap in the database, and the best match is chosen for correcting the colors in the upcoming fused images. In this work, a database of 22 colormaps and their daytime images was built from 22 selected locations. Most of the images were gathered in Melbourne and Palm Bay, Florida, USA. If the system is used in another state with a different geographical landscape, the database should be replaced with colormaps of suitable places in that state. The processing time and color correction accuracy depend on the database size. A large database makes the system slower because the retrieval based on histogram comparison needs more time to find the best match. On the other hand, a larger database leads to better results because the accuracy of the natural colors in a resulting image depends strongly on where the selected colormap was derived, and a larger database allows a better colormap selection for a specific environment.
Figure 3.10 The stored images in the database
Chapter Four
Color-Map Retrieval Based On Histogram Comparison
In this chapter, various histogram distance metrics used to measure image similarity in two different color spaces, RGB and HSV, are evaluated and compared using precision versus recall graphs. The aim of comparing these distance metrics in different color spaces is to select the most accurate histogram distance measure and color space for the colormap retrieval subsystem of the proposed system.
4.1 Introduction From Chapter 3, we concluded that one colormap is not sufficient to convert the false colors in multiband fused images of different environments to a natural daylight appearance in the proposed system. Hence, a set of different colormaps is derived for different environments and stored in a database. A subsystem is appended to the system in order to retrieve a colormap from the database for a specific environment. The information inside the derived colormaps cannot be used as signatures for comparison in the subsystem because it does not include significant features of the false color fused images or daytime reference images used to compute the colormaps. Thus, the daytime reference image of each colormap is also stored, and linked to its colormap in the database. 45
The subsystem can be an image retrieval system. Currently, the two most widely used image retrieval systems are text-based and content based methods [38]. In the text based system, images in the database are indexed and named according to the content. The disadvantage of this method is that it needs some sort of human interaction to provide an explanation of image content [39]. Moreover, this method is not closely related to human feeling to retrieve images [36]. To overcome these drawbacks, many content based image retrieval (CBIR) algorithms have been introduced such as QBIC (Query By Image Content) investigated by Niblack and his colleagues [43], the first CBIR system by Gudivada et al [45], and a Photobook system developed by Petland et al [44]. The basic idea of the CBIR is to extract the signature of a query image and compare it with already generated signatures of all images in the database based on some distance metrics [35]. Signature extraction is a process of obtaining image features which can be texture, shape, color or any other image information that can be used for image comparison. The most widely used feature in this field is color features such as a color histogram because it is computationally efficient. In addition, comparison of image histograms using numerical distances could be related to human perpetual differences [42]. It also is important to retrieve images that are visually similar. Since the accuracy of the results is highly dependent on the distance metrics and the color spaces [35], it is more convenient to evaluate various distance metrics in different color spaces before actually employing them in the system.
4.2 Colormap Retrieval Subsystem
The retrieval algorithm utilized in the proposed system must be fully automatic, without any human interaction. CBIR provides an automated way to retrieve a colormap based on the content of an image obtained from Google Street View. Figure 4.1 shows the structure of the colormap retrieval subsystem designed for the proposed system.
Figure 4.1 (a) Colormap retrieval subsystem of the proposed system (b) Expanded colormap retrieval subsystem
[Block diagram. (a) Natural color image database; database of colormaps and their reference images; best colormap selection based on histogram comparison; retrieved colormap. (b) GPS sensor; HTTP request; Google Street View database; query image; query image histogram; histogram comparison against the database of colormaps and histograms of daytime reference images; retrieved colormap.]
While the vehicle is being driven, the GPS sensor continuously gives the vehicle location coordinates to the system. Then, the system forms an HTTP request to obtain an image of that location from Google Street View. The colormap retrieval subsystem simply extracts the histogram of the obtained image and compares it with the already stored histograms of the reference daytime images using a distance metric. Based on the degree of similarity, the images in the database that have similar content to the obtained image are then ranked. Finally, the subsystem retrieves the closest image and its colormap to correct the false colors in the nighttime multiband fused images acquired from the same location. Figure 4.2 shows an image retrieved from the database using the above procedure.
Figure 4.2 Retrieving an image using histogram comparison: (a) obtained image from Google Street View; (b) retrieved image from the database; (c) histogram of the obtained image; (d) histogram of the retrieved image
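The HTTP request step can be illustrated with a small sketch. The thesis does not state which endpoint or parameters were used, so the following assumes the public Street View Static API with its `size`, `location`, `heading`, and `key` parameters; the function name, the image size, and the coordinates are hypothetical and chosen only for illustration. The sketch only builds the request URL and does not contact the network.

```python
from urllib.parse import urlencode

# Assumed endpoint (Street View Static API); the thesis does not name the one it used.
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def build_street_view_url(lat, lon, heading, api_key, size="640x480"):
    """Build the request URL for one Street View image at a GPS fix.

    lat, lon: latitude and longitude in decimal degrees.
    heading:  compass heading in degrees (0 = north, 90 = east).
    """
    params = {
        "size": size,
        "location": f"{lat:.6f},{lon:.6f}",
        "heading": f"{heading:.1f}",
        "key": api_key,
    }
    return STREET_VIEW_ENDPOINT + "?" + urlencode(params)


# Hypothetical GPS fix and placeholder key, for illustration only.
url = build_street_view_url(28.0645, -80.6231, 90.0, api_key="YOUR_API_KEY")
print(url)
```

In a running system the returned image would then be passed to the histogram extraction step described above.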
4.3 Color Spaces
A color space is a mathematical representation describing the way colors can be characterized as tuples of numbers, typically three or four values or color components [35]. Several color spaces are available in the literature for different applications. The RGB and HSV models are the most commonly used in image retrieval systems. Hence, in this thesis, these two models were considered.
4.2.1 RGB Color Space
The RGB model is an additive color space in which red (R), green (G), and blue (B) values are added together to form a desired color [46]. It is the most commonly used color space in practice for color monitors, digital cameras, and phone displays [51] because it simplifies the architecture and design of these systems [46]. This color space is based on a Cartesian coordinate system, as shown in Figure 4.3. The line joining the black and white points at the corners represents the grayscale.
Figure 4.3 RGB color space: (a) RGB coordinate system; (b) illustration of RGB color space
[Color cube with corners black, red, green, blue, cyan, magenta, yellow, and white; the grayscale runs along the diagonal from black to white.]
4.2.2 HSV Color Space
HSV stands for hue, saturation, and value (the grayscale component). Hue describes a pure color (e.g., pure green, yellow, or red), while saturation measures the degree to which a pure color is diluted by white light [51]. This is related to the way the human eye perceives and interprets colors in a scene [51]. This model is able to decouple the intensity component from the color-carrying information (hue and saturation) in an image.
Figure 4.4 HSV color space: (a) HSV hex-cone coordinate system; (b) illustration of HSV color space
4.2.3 RGB to HSV Conversion
Assume that an image is given in RGB color format and that the pixel values have been normalized to the range [0, 1]. The H component of each RGB pixel is obtained as follows:

$$ H = \begin{cases} \theta, & B \le G \\ 360 - \theta, & B > G \end{cases} \tag{4.1} $$

where

$$ \theta = \cos^{-1}\left\{ \frac{\tfrac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^{2}+(R-B)(G-B)\right]^{1/2}} \right\} \tag{4.2} $$

The saturation component can be obtained using the equation

$$ S = 1 - \frac{3}{R+G+B}\left[\min(R, G, B)\right] \tag{4.3} $$

Finally, the value component is given by

$$ V = \frac{1}{3}(R+G+B) \tag{4.4} $$

where R, G, and B are the red, green, and blue channels of the given image in RGB color format, respectively.
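As an illustration, Eqs. 4.1 to 4.4 can be applied to a single pixel as in the sketch below. The function name is hypothetical, and the handling of gray pixels (where the denominator of Eq. 4.2 is zero and the hue is undefined) and of pure black is an assumption, since the text does not specify it.

```python
import math


def rgb_to_hsv_thesis(r, g, b):
    """Convert one RGB pixel with components in [0, 1] using Eqs. 4.1 to 4.4.

    Returns (H in degrees, S, V).
    """
    total = r + g + b
    v = total / 3.0                                      # Eq. 4.4
    # Assumption: saturation of pure black is set to 0 to avoid dividing by zero.
    s = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total   # Eq. 4.3

    numerator = 0.5 * ((r - g) + (r - b))
    # max() guards against a tiny negative value caused by rounding.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        # Gray pixel: hue is undefined; 0 is returned by convention (assumption).
        return 0.0, s, v

    cos_theta = max(-1.0, min(1.0, numerator / denominator))
    theta = math.degrees(math.acos(cos_theta))           # Eq. 4.2
    h = theta if b <= g else 360.0 - theta               # Eq. 4.1
    return h, s, v


print(rgb_to_hsv_thesis(1.0, 0.0, 0.0))  # pure red
print(rgb_to_hsv_thesis(0.0, 0.0, 1.0))  # pure blue
```

Note that Python's standard `colorsys.rgb_to_hsv` defines the value component as max(R, G, B), so its S and V outputs differ from those of Eqs. 4.3 and 4.4.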
4.4 Histogram Based Colormap Search
It is impossible to use the colormaps themselves to retrieve an appropriate colormap for a specific environment from the database. To solve this issue, the reference daytime images used for deriving the colormaps are also stored in the database, so that an image and its colormap can be retrieved based on image content comparison. The color characteristics of images can provide intuitive information for comparison [39]. This is not surprising because even human recall is highly dependent on color [41]. A common approach to measuring the degree of similarity between the color content of two images is to compare their color histograms, because a color histogram is an effective representation of the color content of an image and is easy to compute [41]. The proposed system is designed entirely for colorizing dual-band fused images during nighttime, so it is more reasonable to retrieve the colormap based on the color content of images than on any other image feature. The idea is that similar images share similar proportions of certain colors. The steps for retrieving a colormap based on color histogram comparison can be summarized as follows:
1- Select a suitable color space.
2- Compute the color histogram of each daytime reference image and store it in the database.
3- Continuously compute the histograms of the images obtained from Google Street View, one at a time.
4- Quantize the histogram values into fewer bins.
5- Compare the histogram of the obtained image with the histogram of each stored image in the database using a distance metric.
6- Select the image, and its colormap, whose color histogram is closest to that of the obtained image.
The performance of colormap retrieval in the RGB and HSV color models with various distance metrics is studied later in this chapter.

4.4.1 Histogram Computation
An important image feature that is effective in characterizing the global content of an image is its color histogram [35]. Histograms are popular and form the basis for numerous spatial domain processing techniques. They are simple to compute in software and can be implemented in inexpensive hardware, which makes them suitable for real-time applications [51]. The histogram of a grayscale image is created by counting the number of pixels at each intensity level. For example, the histogram of a given image with intensity levels in the range [0, L-1] can be thought of as a vector $[h(r_0), h(r_1), \ldots, h(r_{L-1})]$ and computed using the function

$$ h(r_k) = n_k \tag{4.5} $$

where $r_k$ is the kth intensity value and $n_k$ is the number of pixels in the image with intensity $r_k$. Usually, the histogram is normalized by dividing each of its components by the total number of pixels in the image. Hence, the normalized histogram is given by

$$ h(r_k) = p(r_k) = \frac{n_k}{MN} \tag{4.6} $$

where M and N are the row and column dimensions of the image, respectively. The normalized histogram can be considered as an estimate of the probability $p(r_k)$ of occurrence of intensity level $r_k$ in the image [51]. Figure 4.5 shows a grayscale image and its normalized histogram. The histogram shows that intensity level 178 occurs in more pixels than any other intensity level.
Figure 4.5 Histogram of an image: (a) an image; (b) the normalized histogram of the image
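The counting and normalization in Eqs. 4.5 and 4.6 can be sketched as follows. The image is assumed to be a list of rows of integer intensities, and the function name and the tiny test image are hypothetical.

```python
def normalized_histogram(image, levels=256):
    """Return the normalized histogram of a grayscale image.

    image:  list of rows, each a list of integer intensities in [0, levels - 1].
    levels: number of intensity levels L.
    """
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1           # n_k in Eq. 4.5
    total = sum(counts)                  # equals M * N
    return [n / total for n in counts]   # Eq. 4.6


# A 2 x 3 image with four intensity levels, for illustration only.
hist = normalized_histogram([[0, 1, 1], [3, 1, 0]], levels=4)
print(hist)
```

Because every pixel falls into exactly one bin, the entries of the normalized histogram sum to one.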
This definition can be extended to color images by computing the joint probability of the intensities of the three color channels. The difference is that a color histogram is defined over 3-D color vectors instead of 1-D intensity values, which makes color histograms difficult to visualize. There are several ways to compute color histograms. The easiest way is to compute the histogram of each channel separately. In this thesis, the color histogram is computed as follows. First, each pixel is considered as a color point in the selected color space (RGB or HSV). A color point is identified by the intensity values of the three channels in a pixel, $C_i = [I_a(x,y), I_b(x,y), I_c(x,y)]$, where a, b, and c represent the three color channels and I is the intensity value at the pixel (x, y) in each color channel. Finally, the 1-D normalized color histogram is given by

$$ h(r_k) = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\delta(x,y), \qquad \text{where } \delta(x,y) = \begin{cases} 1, & C(x,y) = r_k \\ 0, & C(x,y) \neq r_k \end{cases} \tag{4.7} $$

Here C(x, y) is the color point at pixel (x, y) and $r_k$ denotes the kth color point of the color space.
4.4.2 Color Quantization
There are several difficulties in retrieving a colormap based on histogram comparison. The main issue regarding the use of color histograms is their high dimensionality. For example, to describe the color characteristics of a typical image in 24-bit RGB format using a color histogram, 256x256x256 bins have to be taken into consideration, because each color channel comprises 256 intensity levels in this model. It is necessary to reduce the number of colors in such images; otherwise the comparison of two color histograms becomes computationally expensive and cannot be realized in real time. Color quantization transforms a continuous-tone picture into a discrete image, shrinking the dimension of the color space while retaining the color information [36]. This yields important savings in storage and in the processing time of histogram comparison because it reduces the number of colors to a limited set of bins [38]. Usually, uniform quantization is performed for a perceptually uniform color space, whereas non-uniform quantization is chosen for a non-uniform color space [52].
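A minimal sketch of uniform quantization combined with histogram computation is given below. It assumes 8-bit RGB pixels and uses 8 bins per channel, the layout adopted later in Section 4.6.1; the function name and the bin-index ordering are illustrative choices, not the thesis implementation.

```python
def quantized_rgb_histogram(pixels, bins_per_channel=8):
    """Normalized joint RGB histogram after uniform quantization.

    pixels: iterable of (R, G, B) tuples with 8-bit components (0 to 255).
    Returns a list with bins_per_channel ** 3 entries.
    """
    step = 256 // bins_per_channel            # width of one bin per channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        # Map the three quantized channel values to a single 1-D bin index.
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1
        count += 1
    return [h / count for h in hist] if count else hist


pixels = [(255, 255, 255), (0, 0, 0), (0, 0, 31)]
hist = quantized_rgb_histogram(pixels)
print(len(hist), hist[0], hist[511])
```

With 8 bins per channel the histogram has 512 entries instead of 16777216, and the two dark pixels above fall into the same bin.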
4.4.3 Distance Metrics for Histogram Comparison
After the color histograms of all daytime reference images have been computed, quantized, and stored in the database as image features, a distance metric is used to compare the color histogram of the image obtained from Google Street View with the color histograms of all reference images. The distance metric measures the degree of similarity between the obtained image and the daytime reference images in the database based on their color histograms. Finally, the image with the smallest histogram distance is retrieved, and the colormap corresponding to it is used to correct the false colors in the dual-band fused images. Several similarity measures based on image features such as color, shape, and texture have been developed for image retrieval systems in recent years [42]. Since the result is highly dependent on the metric used for histogram matching and on the underlying color space, it is recommended to evaluate the performance of different distance metrics in different color spaces to discover which combination gives the most accurate and perceptually correct results. In this thesis, seven different distance metrics have been evaluated in two different color models (RGB and HSV) in order to decide which color model and distance metric should be used in the colormap retrieval subsystem.
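The retrieval step itself is independent of the particular metric and can be sketched as a ranking by distance. In the sketch below the database layout (pairs of a colormap identifier and a reference histogram), the identifiers, and the three-bin histograms are hypothetical; the L1 distance is used only as a placeholder metric.

```python
def retrieve_colormap(query_hist, database, distance):
    """Rank database entries by histogram distance to the query.

    database: list of (colormap_id, reference_hist) pairs.
    distance: function taking two histograms and returning a number
              that is smaller for more similar histograms.
    Returns the best colormap_id and the full ranking of ids.
    """
    ranked = sorted(database, key=lambda entry: distance(query_hist, entry[1]))
    return ranked[0][0], [entry[0] for entry in ranked]


def l1_distance(h, k):
    """Placeholder metric: sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h, k))


# Hypothetical three-bin reference histograms, for illustration only.
database = [
    ("forest", [0.7, 0.2, 0.1]),
    ("urban", [0.1, 0.3, 0.6]),
    ("beach", [0.3, 0.4, 0.3]),
]
best, ranking = retrieve_colormap([0.6, 0.3, 0.1], database, l1_distance)
print(best, ranking)
```

Passing the metric as a function makes it straightforward to swap in any of the distance measures described in this section, provided that smaller values mean more similar histograms.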
Minkowski-form distance
The Minkowski-form distance is the most commonly used metric for image retrieval. Assume H = {h_i} is the histogram of one image and K = {k_i} is the histogram of another image. The Minkowski distance is then given by

$$ D_{L_r}(H, K) = \left( \sum_i |h_i - k_i|^{r} \right)^{1/r} \tag{4.8} $$

When r = 1, the Minkowski form is also called the city-block or L1 distance, and when r = 2, it is called the Euclidean or L2 distance [42]. $D_{L_r}(H, K)$ is always greater than or equal to zero; it is zero for identical images and high for images that show little similarity.
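Eq. 4.8 translates directly into a few lines of code. The sketch below assumes both histograms are given as equal-length lists; the function name and the sample histograms are illustrative.

```python
def minkowski_distance(h, k, r=2):
    """Minkowski-form distance of Eq. 4.8 (r=1: city-block, r=2: Euclidean)."""
    if len(h) != len(k):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) ** r for a, b in zip(h, k)) ** (1.0 / r)


h = [0.5, 0.5, 0.0]
k = [0.0, 0.5, 0.5]
print(minkowski_distance(h, k, r=1))  # L1 distance
print(minkowski_distance(h, k, r=2))  # L2 distance
```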
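Histogram intersection
Histogram intersection was introduced by Swain and Ballard [47] in 1991. It can be considered a special case of the L1 distance because Swain and Ballard showed that the histogram intersection is equivalent to the normalized L1 distance when the image histograms are scaled to the same size [47].

$$ D_{\cap}(H, K) = \frac{\sum_i \min(h_i, k_i)}{\sum_i k_i} \tag{4.9} $$

As written, Eq. 4.9 is a similarity score: it equals 1 for identical normalized histograms and decreases as the histograms differ.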
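A short sketch of Eq. 4.9 follows, together with a numerical check of its relation to the L1 distance for histograms that both sum to one (in that case one minus the intersection equals half the L1 distance). The function name and sample histograms are illustrative.

```python
def histogram_intersection(h, k):
    """Histogram intersection of Eq. 4.9, normalized by the size of K."""
    return sum(min(a, b) for a, b in zip(h, k)) / sum(k)


h = [0.5, 0.5, 0.0]
k = [0.0, 0.5, 0.5]
intersection = histogram_intersection(h, k)
l1 = sum(abs(a - b) for a, b in zip(h, k))
print(intersection, 1.0 - intersection, 0.5 * l1)
```

If the intersection is to be used for ranking in the same way as the other metrics, one option (an assumption, not stated in the text) is to rank by 1 minus the intersection so that smaller values mean more similar images.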
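Cosine angle distance
The concept of the cosine distance measure is explained by Salton and Buckley in [48]. This measure is commonly used in data mining. It determines whether two vectors point in roughly the same direction [35]. The angle θ between the two histogram vectors is zero for two identical images and greater than zero for two different images; correspondingly, cos θ equals 1 for identical histograms and decreases as they diverge.

$$ D_{\cos}(H, K) = \cos\theta = \frac{\sum_i h_i k_i}{\sqrt{\sum_i h_i^{2}}\,\sqrt{\sum_i k_i^{2}}} \tag{4.10} $$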
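The cosine measure of Eq. 4.10 can be sketched as follows. The function name is illustrative, and raising an error for an all-zero histogram is an assumption, since the measure is undefined in that case.

```python
import math


def cosine_measure(h, k):
    """cos(theta) between two histogram vectors, as in Eq. 4.10."""
    dot = sum(a * b for a, b in zip(h, k))
    norm = math.sqrt(sum(a * a for a in h)) * math.sqrt(sum(b * b for b in k))
    if norm == 0:
        raise ValueError("cosine measure is undefined for an all-zero histogram")
    return dot / norm


print(cosine_measure([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # identical histograms
print(cosine_measure([1.0, 0.0], [0.0, 1.0]))            # no shared bins
```

Because the measure depends only on direction, scaling one histogram by a constant factor does not change the result.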
χ² statistics (chi-square statistics)
This distance indicates whether the data vector ($h_i$) is well described by some set of hypothesized values, in our case the estimated mean ($m_i$). $D_{\chi^2}(H, K)$ is close to zero if the data vector is well described. This means that $D_{\chi^2}(H, K)$ has a small value if the two images are composed of the same contents. The degree of similarity between two images based on histogram comparison using χ² statistics is given by

$$ D_{\chi^2}(H, K) = \sum_i \frac{(h_i - m_i)^{2}}{m_i} \tag{4.11} $$

where $m_i = \frac{h_i + k_i}{2}$ is the mean histogram of the two histogram vectors (H and K).
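Eq. 4.11 can be sketched as below. Skipping bins that are empty in both histograms is an assumption made here to avoid division by zero; the text does not say how such bins are treated. The function name and sample histograms are illustrative.

```python
def chi_square_distance(h, k):
    """Chi-square statistic of Eq. 4.11 with m_i = (h_i + k_i) / 2."""
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if m > 0:  # assumption: bins empty in both histograms contribute nothing
            total += (a - m) ** 2 / m
    return total


h = [0.5, 0.5, 0.0]
k = [0.0, 0.5, 0.5]
print(chi_square_distance(h, k))
print(chi_square_distance(h, h))
```

Since $h_i - m_i = (h_i - k_i)/2$, the statistic is symmetric in H and K.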
Match distance
The match distance [49] is defined like the L1 distance, but it is computed between the cumulative histograms rather than bin by bin as in the metrics described above. In addition, it is only useful for one-dimensional histograms. $D_M(H, K)$ is given by

$$ D_M(H, K) = \sum_i \left| \hat{h}_i - \hat{k}_i \right| \tag{4.12} $$

where $\hat{h}_i = \sum_{j \le i} h_j$ is the cumulative histogram of $\{h_j\}$, and similarly for $\hat{k}_i$.

Kolmogorov-Smirnov distance
The Kolmogorov-Smirnov distance is a common statistical measure for unbinned distributions. Like the match distance, it is only useful for one-dimensional histograms [37].

$$ D_{KS}(H, K) = \max_i \left| \hat{h}_i - \hat{k}_i \right| \tag{4.13} $$

Again, $\hat{h}_i = \sum_{j \le i} h_j$ is the cumulative histogram of $\{h_j\}$, and similarly for $\hat{k}_i$.
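Both cumulative-histogram measures (Eqs. 4.12 and 4.13) can be sketched with `itertools.accumulate`. The function names and the one-hot sample histograms are illustrative; the example shows that the match distance grows with the size of a shift between bins, whereas a bin-by-bin L1 distance would give the same value for both shifts.

```python
from itertools import accumulate


def match_distance(h, k):
    """Match distance of Eq. 4.12: L1 distance between cumulative histograms."""
    return sum(abs(a - b) for a, b in zip(accumulate(h), accumulate(k)))


def kolmogorov_smirnov_distance(h, k):
    """Kolmogorov-Smirnov distance of Eq. 4.13: largest cumulative difference."""
    return max(abs(a - b) for a, b in zip(accumulate(h), accumulate(k)))


h = [1.0, 0.0, 0.0]
shifted_one_bin = [0.0, 1.0, 0.0]
shifted_two_bins = [0.0, 0.0, 1.0]
print(match_distance(h, shifted_one_bin), match_distance(h, shifted_two_bins))
print(kolmogorov_smirnov_distance(h, shifted_one_bin),
      kolmogorov_smirnov_distance(h, shifted_two_bins))
```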
4.5 Recall and Precision
Recall and Precision are the two metrics commonly used in retrieval systems; here they are used to evaluate the performance of each distance metric in the different color models. Recall measures the ability of the system to retrieve the relevant images in the database in response to a query image, while Precision indicates how many of the retrieved images are relevant to the query image [35, 38]. In general, the higher the Precision at the same Recall value, the better the distance metric is [35]. The two metrics are defined as

$$ \text{Precision} = \frac{\text{Number of relevant images retrieved}}{\text{Number of images retrieved}} \tag{4.14} $$

$$ \text{Recall} = \frac{\text{Number of relevant images retrieved}}{\text{Number of relevant images in the database}} \tag{4.15} $$

Figure 4.6 Illustration of Precision and Recall
[Venn diagram of the database with two overlapping sets A and B and the regions a, b, c, and d.]

Let A be the relevant images in the database and B be the images retrieved from the database. In Figure 4.6, a stands for the "unretrieved relevant" images, b for the "retrieved relevant" images, c for the "retrieved irrelevant" images, and d for the "unretrieved irrelevant" images. Then

$$ \text{Recall} = P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{b}{a + b} \tag{4.16} $$

$$ \text{Precision} = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{b}{b + c} \tag{4.17} $$
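Eqs. 4.16 and 4.17 can be computed from the sets of retrieved and relevant images, as in the sketch below. The function name and the image identifiers are hypothetical, and returning 0.0 for an empty set is an assumption.

```python
def precision_recall(retrieved, relevant):
    """Precision and Recall from sets of image identifiers.

    retrieved: identifiers returned by the system (set B).
    relevant:  identifiers relevant to the query (set A).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                           # region b
    precision = hits / len(retrieved) if retrieved else 0.0    # b / (b + c)
    recall = hits / len(relevant) if relevant else 0.0         # b / (a + b)
    return precision, recall


# Hypothetical identifiers: 4 images retrieved, 5 relevant, 2 in common.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 3, 5, 6, 7})
print(precision, recall)
```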
4.6 Experiments and Results
4.6.1 Distance Metrics Evaluation
The aim of evaluating the discussed distance metrics in both the RGB and HSV color models is to discover which distance metric and color space give the most accurate and perceptually correct results. The COREL database, which is the same database used in SIMPLIcity [50], was used in this experiment. This database consists of 1000 different images divided into 10 categories (Africans, beaches, monuments, mountains, flowers, dinosaurs, elephants, horses, foods, and buses). Each category contains 100 different images of the same kind of object. The result of CBIR depends not only on the color space and the distance metric used, but also to some extent on the composition of the colors that form the image [36]. Therefore, the experiment was performed on several image categories rather than on only one, which makes the evaluation more robust.
To reduce the retrieval processing time, the images are quantized to fewer color bins before their histograms are computed. For example, a typical image in RGB color space would have 256x256x256 = 16777216 histogram bins without quantization. Such a large histogram takes a long time to compute and to compare with the equally large histograms of other images. Therefore, the images in RGB color space are quantized into 8x8x8 = 512 bins. The three channels (R, G, and B) are quantized using the same number of levels because of their equal dimensions in RGB color space (see Fig. 4.3). The color histograms of all images in the database are computed and stored after the images are quantized. Five images from five categories are randomly selected and queried with their color histograms as the inputs. Then, the distance between the color histogram of each query image and the stored histograms of all images is calculated using the seven distance metrics discussed above to observe their retrieval performance. In this thesis, the Precision vs. Recall graph is used as the performance criterion. This graph is calculated and plotted by varying the number of retrieved images from 5 to 100 and using prior database information such as the size of the categories and the number of images relevant to a selected query image. Finally, the images whose color histograms are closest to that of the query image are ranked and displayed; in this experiment, the 7 best matches are shown in Figures 4.7 and 4.8. Figure 4.7 (a) to (e) shows the performance of all distance metrics in RGB color space. The top left image is the randomly selected query image, and the others are the retrieved images that best match the query. As can be seen in Figure 4.7, the Precision and Recall curves of the χ² statistics lie above the curves of the other distance metrics in most cases. Since the higher the Precision at the same Recall value, the better the distance metric is, we can conclude that the χ² statistics metric performs better than all other distance metrics in RGB color space.
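The construction of a Precision vs. Recall curve from a ranked result list can be sketched as follows. The sketch assumes each database image carries a category label and that an image is relevant when its label matches the query's; whether the query image itself is counted among the results is not stated in the text. The function name and the tiny ranked list are hypothetical.

```python
def precision_recall_curve(ranked_labels, query_label, category_size, cutoffs):
    """Return (recall, precision) points for several numbers of retrieved images.

    ranked_labels: category labels of database images, closest match first.
    query_label:   category of the query image.
    category_size: number of relevant images in the database.
    cutoffs:       numbers of retrieved images to evaluate.
    """
    points = []
    for n in cutoffs:
        hits = sum(1 for label in ranked_labels[:n] if label == query_label)
        points.append((hits / category_size, hits / n))
    return points


# Tiny ranked list for illustration; the experiment varies the cutoff from 5 to 100.
ranked = ["a", "a", "b", "a", "b", "b"]
points = precision_recall_curve(ranked, "a", category_size=3, cutoffs=[2, 4, 6])
print(points)
```

For the experiment described above, the cutoffs would be `range(5, 101, 5)` or a similar sequence, with `category_size=100`.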
Figure 4.7 (a to e) The performance of all distance metrics in RGB color space for multiple query images (Precision vs. Recall plots): (a) first case: flowers; (b) second case: horses; (c) third case: food; (d) fourth case: African; (e) fifth case: buses
The same procedure used in RGB color space can also be applied in HSV color space to evaluate the performance of the distance metrics. The only difference is that, because the H, S, and V components do not have equal dimensions in HSV color space (see Figure 4.4), they are not quantized using the same number of levels. The hue requires the most attention because it contains the color information. In this experiment, the hue is quantized to 32 levels, whereas the saturation and value components are each quantized to 4 levels. The total number of histogram bins for each image is therefore 32x4x4 = 512. Figure 4.8 (a) to (e) shows the performance of all distance metrics in HSV color space. Again, the χ² statistics metric outperformed the other distance measures.
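The non-uniform HSV quantization can be sketched as a mapping from one HSV pixel to a bin index, with 32 hue levels and 4 levels each for saturation and value (32 x 4 x 4 = 512 bins). The function name, the assumed value ranges (hue in degrees, saturation and value in [0, 1]), and the ordering of the bin index are illustrative assumptions.

```python
def hsv_bin_index(h_deg, s, v, h_bins=32, s_bins=4, v_bins=4):
    """Map one HSV pixel to a bin index in [0, h_bins * s_bins * v_bins - 1].

    h_deg: hue in degrees, in [0, 360].
    s, v:  saturation and value, in [0, 1].
    """
    # min() keeps the upper boundary values (360 degrees, 1.0) in the last bin.
    hi = min(int(h_deg / 360.0 * h_bins), h_bins - 1)
    si = min(int(s * s_bins), s_bins - 1)
    vi = min(int(v * v_bins), v_bins - 1)
    return (hi * s_bins + si) * v_bins + vi


print(hsv_bin_index(0.0, 0.0, 0.0))
print(hsv_bin_index(120.0, 0.5, 0.5))
print(hsv_bin_index(359.9, 1.0, 1.0))
```

Giving the hue far more levels than saturation and value reflects the statement above that the hue carries the color information.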
Figure 4.8 (a to e) The performance of all distance metrics in HSV color space for multiple query images (Precision vs. Recall plots): (a) first case: flowers; (b) second case: horses; (c) third case: food; (d) fourth case: African; (e) fifth case: buses
In this test, the Kolmogorov-Smirnov distance metric, which is one of the more advanced metrics, showed very poor performance. This is because it determines similarity rather than closeness of probability distributions; thus, it is not appropriate for comparing the probability distributions of histograms [1p].
4.6.2 Evaluation of χ² Statistics in RGB and HSV Color Spaces
In the previous experiment, we concluded that the best distance metric in both RGB and HSV color spaces is the χ² statistics. In this experiment, the χ² statistics metric is evaluated in both color spaces to discover which color model is more accurate and should be adopted in the system. Five random images from different categories are used as the query images, and the Precision vs. Recall graph is calculated in both RGB and HSV color space using χ² statistics as the distance measure. Figure 4.9 shows the result of this test. In four of the five cases, the performance of χ² statistics in HSV color space is better than in RGB color space. The reason is that in the HSV model, the pure colors are decoupled from the saturation and value components of the images. Because we are interested in comparing images based on their colors, the hue component, which carries the color information, requires more attention and is therefore quantized into 32 levels, while the saturation and value components, which carry information about how bright or dark the colors appear, are each quantized into 4 levels. In this way, the effects of shadows are reduced in the color histogram comparison.
Figure 4.9 (a to e) The performance of the χ² statistics distance metric in both RGB and HSV color spaces (Precision vs. Recall plots): (a) first case: an African image; (b) second case: a buses image; (c) third case: a horses image; (d) fourth case: a food image; (e) fifth case: a flower image
4.6.3 Performance of χ² Statistics on the Colormap Database
In the previous section, it was shown that the best distance metric is the χ² statistics in HSV color space. Therefore, it is adopted in the proposed system to retrieve a colormap from the database storing the derived colormaps and their daytime images. In the database, the derived colormaps are coupled to their daytime reference images. In other words, if a reference image is retrieved from the database, the colormap derived from it is also retrieved to correct the false colors in the fused multiband images.

Because the χ² statistics performs better in the HSV color space than in the RGB model, all daytime reference images are converted to HSV color space, and then their color histograms are computed. Since most of the reference images contain sky, trees, and roads, which means that their scenes are close to each other, the quantization levels for each HSV component need to be carefully selected. In this thesis, the hue component is quantized to 128 levels, whereas the saturation and value components are each quantized to 32 levels. Figure 4.10 shows the result of applying the χ² statistics to retrieve a colormap from the database when a query image is downloaded from Google Street View and used as the input to the system. Comparing the two pictures, we can conclude that the χ² statistics has found a good match to the query image because most of the colors present in the query image can also be found in the retrieved image. Thus, it is reasonable to use the colormap derived from this retrieved image. More examples are shown in the next chapter.
Figure 4.10 (a) a query image obtained from Google Street View (b) the retrieved image
Figures 4.11 and 4.12 show two more examples. Again, the retrieved colormaps are appropriate for correcting the false colors in multiband nighttime images of the scenes appearing in the query images.

Figure 4.11 (a) a query image obtained from Google Street View (b) the retrieved image

Figure 4.12 (a) a query image obtained from Google Street View (b) the retrieved image
Chapter Five Results and Conclusions
This chapter presents additional results from applying different colormaps retrieved from the database using histogram comparison. A simple color enhancement method for deriving a colormap is also presented. Other results are shown to demonstrate how humans and other hot objects, such as animals, appear in the resulting colorized images. At the end of the chapter, the conclusions of this thesis are drawn and some future work is suggested.
5.1 Color Enhancement
In Chapter 3, the process for deriving a colormap from a multiband fused image and its corresponding registered daylight image was discussed. The natural color appearance of a multiband fused image is derived from a daylight reference image, so the colors in the derived colormap are highly dependent on the colors present in the daylight reference image. If the reference image is dark or bright, for example, the colormap derived from it gives a dark or bright appearance, respectively, to the multiband fused image. Fig. 5.1 shows a case in which the trees appear darker than they should when compared to the reference scene. Furthermore, the clouds and the sky also affect the derived colormap. The image shown in Fig. 5.1c is the result of applying the colormap derived from the image shown in Fig. 5.1b to the multiband fused image shown in Fig. 5.1a. As can be seen, a part of the sky appears in the wrong color, and the trees appear dark green.
Figure 5.1 Illustration of color transfer from a daylight reference image: (a) multiband fused image; (b) daylight reference image; (c) the resulting natural color multiband image
A simple way to improve the color appearance of the resulting image is to scribble reasonable colors in the regions of interest of the reference image, as shown in Fig. 5.2b. The scribbles do not become visible in the resulting image because the average color of a group of pixels is calculated in the process of deriving a colormap. For example, the derived color for the trees is the average of the colors of the pixels composing the trees. Fig. 5.2 shows the result obtained from the scribbled reference image. The color appearance of the result shown in Fig. 5.2c is improved compared to the resulting image shown in Fig. 5.1c.
Figure 5.2 Illustration of color enhancement by scribbling the daytime reference image: (a) multiband fused image; (b) scribbled daylight reference image; (c) the resulting natural color multiband image
5.2 Deriving the Colormap from a Synthetic Image
Sometimes, the reference image of a scene is taken by a camera at an arbitrary angle and therefore cannot easily be registered with the multiband fused image. In that case, a synthetic view of the scene can be generated and used to derive the colormap. An easy way to make a synthetic view of a scene is to manually colorize the multiband fused image. Natural colors should be chosen based on the colors of the objects appearing in the scene. Fig. 5.3d illustrates the result of applying a colormap derived from the synthetic view shown in Fig. 5.3b. In this case, it is hard to obtain a registered image from the reference image shown in Fig. 5.3c. The synthetic view is a painted version of the multiband fused image. As can be seen in the resulting image, objects have a natural appearance closely resembling the corresponding daytime image.
Figure 5.3 Illustration of using a synthetic view as the reference daytime image: (a) false color multiband fused image; (b) synthetic image created from (a); (c) daylight image of the scene; (d) color night vision result
Fig. 5.4 shows the result of color night vision using the synthetic view obtained from the multiband fused image shown in Fig. 5.4a as the reference image.

Figure 5.4 Illustration of using a synthetic view as the reference daytime image: (a) false color multiband fused image; (b) synthetic image created from (a); (c) color night vision result
5.3 Human and Hot Objects Appearance
Hot targets, such as human beings and animals, give off a lot of thermal radiation; therefore, they are captured as a bright white color by the IR thermal camera. While driving on a road, humans and animals should be clearly visible in the resulting colorized images to improve situational awareness. This can help drivers react quickly to approaching and moving objects. In the proposed system, observers see hot targets as a black color. The reason is that the intensity component of the resulting colorized image is replaced with a grayscale fused image of the inverted thermal and visible images. Fig. 5.5 shows a case in which a person is moving around the trees. The person is clearly visible and can easily be distinguished from the other objects in the scene. In the individual bands (Fig. 5.5a and 5.5b), the person cannot be seen at all in the visible image but clearly appears in the thermal image, although against a grayscale background. Moreover, the colorized background in the resulting image gives more meaning to the scene and improves observers' performance in object recognition.
Figure 5.5 Hot target appearances in a resulting color night vision image: (a) visible image; (b) IR thermal image; (c) colorized multiband image

Figure 5.6 Hot target appearances in a resulting color night vision image: (a) visible image; (b) IR thermal image; (c) colorized multiband image
5.4 Results of Applying Retrieved Colormaps
In this section, the overall accuracy of the proposed system is summarized. For this purpose, a Simulink program was built. The colormap retrieval subsystem based on histogram comparison is the most sensitive process in the proposed system. If this subsystem retrieves a wrong colormap from the database, the objects in the resulting colorized image will appear with the wrong colors. This can make situational awareness even worse than with the individual grayscale IR thermal and visible images. In Chapter 4, a number of histogram distance measures were evaluated, and the chi-square metric in HSV color space was chosen as the best distance measure. Therefore, in the Simulink program, the chi-square metric in HSV color space was used as the histogram distance measure to select an image, and eventually its colormap, for the location of interest. Fig. 5.7a is the input query image to the Simulink program. This image is obtained from Google Street View by sending the coordinate data (longitude, latitude, and compass heading) of the vehicle location coming from the GPS sensor. After its color histogram was compared with the already stored histograms of many images in the database, the image shown in Fig. 5.7b was retrieved as the best match.
Figure 5.7 Applying a colormap derived from one scene to another scene of the same materials: (a) query image; (b) retrieved image from the database; (c) multiband fused image at the query image location; (d) multiband fused image used to derive the colormap; (e) location of the query image on Google Maps; (f) location of the retrieved image on Google Maps; (g) applying the retrieved colormap derived from (d) on image (c); (h) resulting colorized image using the colormap derived from (d) and stored in the database
In addition, the colormap derived from the scribbled version of the retrieved image and the false color fused image shown in Fig. 5.7d was also retrieved and applied to correct the false color appearance of the multiband fused image shown in Fig. 5.7c. This multiband fused image was taken at the same place as the query image during nighttime. The color appearance of the resulting image (Fig. 5.7g) is close to natural. This is because the retrieval subsystem has retrieved a colormap that was derived from a scene very close to the scene appearing in the query image. Fig. 5.8 shows another example of the approach. Fig. 5.8b is retrieved in response to the query image shown in Fig. 5.8a. The scenes in the retrieved and query images closely resemble each other. Therefore, it is reasonable to use the colormap of the retrieved image to correct the false colors in the multiband fused image (Fig. 5.8c) taken at the same place as the query image. As a result, the color night vision image shown in Fig. 5.8g has a natural appearance.
Figure 5.8 Applying a colormap derived from one scene to another scene of the same materials: (a) query image; (b) retrieved image from the database; (c) multiband fused image at the query image location; (d) multiband fused image used to derive the colormap; (e) location of the query image on Google Maps; (f) location of the retrieved image on Google Maps; (g) applying the retrieved colormap derived from (d) on image (c); (h) resulting colorized image using the colormap derived from (d) and stored in the database
Figs. 5.9 and 5.10 show two more examples. Again, the retrieval system successfully retrieved the best matches in both cases. In the resulting images shown in Fig. 5.10e and 5.10f, some objects appear with the wrong colors, but this should not affect a driver’s performance because the road and the trees can still be clearly recognized.
(a) Query image
(b) Retrieved image from the database
(c) Multiband fused image at the query image location
(d) Multiband fused image used to derive the colormap
(e) Location of the query image on Google Maps
(f) Location of the retrieved image on Google Maps
(g) Applying the retrieved colormap derived from (d) on image (c)
(h) Resulting colorized image using the colormap derived from (d) and stored in the database
Figure 5.9 Applying a colormap derived from one scene to another scene containing the same materials
(a) Query image
(b) Retrieved image from the database
(c) Multiband fused image at the query image location
(d) Multiband fused image used to derive the colormap
(e) Applying the retrieved colormap derived from (d) on image (c)
(f) Resulting colorized image using the colormap derived from (d) and stored in the database
Figure 5.10 Applying a colormap derived from one scene to another scene containing the same materials
5.5 Disadvantage of Histogram Comparison
The previous examples show that the colormap retrieval subsystem is able to retrieve the desired colormap for the location of interest. However, it has some disadvantages that may affect its performance. A common drawback of histogram-based retrieval systems is that two images of totally different scenes may sometimes have the same histogram.
Two main aspects of Google Street View can degrade the performance of the colormap retrieval system. First, the images were taken under different weather conditions, so some images have a clear sky and others a cloudy sky. Since the sky occupies a large area of the images, the histograms of two images that contain the same materials but different sky conditions would not be similar. Second, the color of roads varies. Some roads were new or had been resurfaced and therefore appear black, whereas old roads have faded to a bright gray. For example, when an image of a new highway is compared with an image of an old highway, their histograms would be quite different.
Another disadvantage of the retrieval subsystem is the number of images. A larger number of images in the database results in a slower search, because the system needs to compare the query against many images in order to return the closest one.
The capability of the IR thermal camera also varies with weather conditions. For example, the image intensities of a scene captured in summer are not the same as the image intensities of the same scene captured in another season. As a result, the correction of the false colors is not accurate and some objects appear with the wrong colors. Similarly, an image of a scene taken in clear weather would not be the same as an image of the same scene taken in rain.
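The first drawback, that different scenes can share a histogram, follows from the fact that a histogram discards all spatial layout. The short Python sketch below illustrates this with synthetic data rather than Street View images: it scrambles the pixel positions of a random grayscale image and checks that the histogram is unchanged. The helper name `gray_histogram` and the choice of 32 bins are assumptions made for this illustration only.

```python
import numpy as np

def gray_histogram(image, bins=32):
    """Normalized intensity histogram of a uint8 grayscale image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

rng = np.random.default_rng(0)

# Synthetic "scene": random intensities in 0..255.
scene = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

# Same pixel values placed at different positions: a visually
# different image with exactly the same intensity counts.
scrambled = rng.permutation(scene.ravel()).reshape(scene.shape)

same_hist = np.allclose(gray_histogram(scene), gray_histogram(scrambled))
same_image = np.array_equal(scene, scrambled)
print("identical histograms:", same_hist)
print("identical images:", same_image)
```

Any distance computed only from these histograms is therefore zero for this pair, even though the two images differ.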
5.5 Conclusions
Even though a fused image may contain more details of a scene than either input sensor image, it usually has an unnatural color appearance. In this thesis, a new color night vision system was presented to give a natural appearance to nighttime images. The system fuses two spectral bands, thermal and visible, to enhance night vision imagery. A color transfer based on look-up tables was used in the system to replace the false colors in the multiband fused images with natural colors. Natural colors are borrowed from a daytime reference image. An optimal colormap for a certain environment is derived in advance. A single colormap is not sufficient for navigation when the environment is changing. Hence, a set of different colormaps was derived for different environments. As the environment changes, the appropriate colormap should be selected. To allow a colormap to be retrieved from the database, the daytime reference images used to derive the colormaps are also stored in the database, and the colormaps are retrieved through them. A GPS sensor is connected to the system and provides the coordinate data and compass heading. By sending this GPS data, an image of the location of interest was obtained from Google Street View. The system is able to select the best-matching image, and hence its colormap, from the database by comparing the histograms of the obtained image with the histograms of the reference daylight images. We evaluated the results of several comparison metrics and found that the chi-square (χ²) statistic gave the best results as a histogram distance measure.
The results shown in this thesis clearly demonstrate the benefits of this system for nighttime navigation, surveillance, and even target detection tasks. The resulting colorized nighttime images improve situational awareness because they closely resemble daytime reference images. We found that the proposed method could be used in real time to aid nighttime vehicle navigation. The derivation of a colormap may require some time. However, once the colormap is derived, it can be employed in a real-time implementation because the swapping process requires only a minimal amount of processing time to exchange the false colormap of a multiband image with the derived colormap.
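The selection step summarized above can be sketched as a nearest-neighbor search under the chi-square statistic. The Python sketch below uses the form of the statistic that compares each bin with the mean of the two histograms. The database layout (a list of histogram and colormap-identifier pairs), the function names, and the small `eps` guard against empty bins are assumptions for illustration, not the implementation used in this work.

```python
import numpy as np

def chi_square_distance(h, k, eps=1e-12):
    """Chi-square statistic between histograms h and k.

    Each bin of h is compared with the mean histogram m = (h + k) / 2.
    eps avoids division by zero when a bin is empty in both histograms.
    """
    h = np.asarray(h, dtype=float)
    k = np.asarray(k, dtype=float)
    m = (h + k) / 2.0
    return float(np.sum((h - m) ** 2 / (m + eps)))

def retrieve_colormap(query_hist, database):
    """Return the colormap identifier of the closest reference image.

    database: list of (reference_histogram, colormap_id) pairs
              (hypothetical layout). The search is a linear scan, so
              its cost grows with the number of stored images.
    """
    best = min(database, key=lambda entry: chi_square_distance(query_hist, entry[0]))
    return best[1]

# Toy example with three-bin histograms.
database = [
    ([0.5, 0.5, 0.0], "forest"),
    ([0.0, 0.5, 0.5], "urban"),
]
selected = retrieve_colormap([0.4, 0.6, 0.0], database)
print(selected)
```

The linear scan is the simplest possible search strategy and makes the dependence of retrieval time on database size explicit.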
5.6 Future Work
The system was equipped with two night vision cameras, an IR thermal camera and an LLL visible camera. To further improve the resulting colorized image, three sensors could be used rather than two. Two image intensifiers equipped with different filters covering different spectral bands can be used to give a clear view of the background of a scene, while an infrared sensor can be used to highlight hot targets. The proposed system has no technique for recognizing hot targets and giving a warning sign while driving. Hence, adding a hot target detection algorithm to improve drivers’ awareness is a reasonable direction for future work. An easy way to differentiate hot targets from other image contents is to assign them a specific color during the lookup table computation. In addition, a vehicle detector could be added to make the system more advanced and reliable. Another challenge is to find a way to combine the IR thermal image with the Google Street View image. In this case, the two difficult steps are, first, how to remove vehicles and people from the Google Street View images and, second, how to register the two images in a real-time implementation before combining them.
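The suggestion of assigning hot targets a specific color in the lookup table can be sketched as follows. This Python sketch is a hypothetical illustration, not part of the implemented system. It assumes the colormap is an N x N x 3 array indexed as `lut[visible_level, thermal_level]`, and the threshold index and the warning color (red) are arbitrary choices for the example.

```python
import numpy as np

def mark_hot_targets(lut, thermal_threshold, color=(255, 0, 0)):
    """Return a copy of lut in which hot entries get a warning color.

    lut: uint8 array of shape (N, N, 3), indexed [visible, thermal]
         (assumed layout).
    thermal_threshold: first thermal index treated as "hot"
         (hypothetical parameter, to be tuned per camera).
    color: RGB triple used for all hot entries.
    """
    marked = lut.copy()
    # Every visible level paired with a thermal level at or above the
    # threshold is overwritten with the warning color.
    marked[:, thermal_threshold:] = color
    return marked

# Example: a 16 x 16 table in which the two hottest levels are flagged.
base_lut = np.zeros((16, 16, 3), dtype=np.uint8)
hot_lut = mark_hot_targets(base_lut, thermal_threshold=14)
```

Since only the table is modified, the per-frame lookup cost is unchanged. A fixed threshold would inherit the weather and season sensitivity of the thermal camera noted earlier.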
References [1] Zhang Junju, Han Yiyong, Chang Benkang, Yuan, Qian Yunsheng, Qiu Yafeng, "Real-time Color Image Fusion for Infrared and Low-Light-Level Cameras," In Proc. of SPIV, vol. 7383, pp. 73833B-1, 2009. [2] Xiaojing Gu, ShaoYuan Sun, and Jian'an Fang., "Real-time Color Night-vision for Visible and Thermal images.," In Intelligent Information Technology Application Workshops, 2008. IITAW'08. International Symposium on, pp. 612-615, IEEE, 2008. [3] Gang Liu, Guohong Huang, "Color fusion based on EM algorithm for IR and visible image," IEEE International Conference Computer and Automation Engineering (ICCAE), vol. 2, pp. 253-258, 2010. [4]
Wang, Lingxue, Shiming Shi, Weiqi Jin, and Yuanmeng Zhao. "Color fusion algorithm for visible and infrared images based on color transfer in YUV color space." In International Symposium on Multispectral Image Processing and Pattern Recognition, pp. 67870S-67870S. International Society for Optics and Photonics, 2007.
[5] Wang, Lingxue, Yuanmeng Zhao, Weiqi Jin, Shiming Shi, and Shengxiang Wang. "Real-time color transfer system for low-light level visible and infrared images in YUV color space." In Defense and Security Symposium, pp. 65671G-65671G. International Society for Optics and Photonics, 2007. [6]
Toet, Alexander. "Natural colour mapping for imagery."Information fusion 4, no. 3 (2003): 155-166.
multiband
nightvision
[7] Hogervorst, Maarten A., and Alexander Toet. "Method for applying daytime colors to nighttime imagery in realtime." In SPIE Defense and Security Symposium, pp. 697403-697403. International Society for Optics and Photonics, 2008. [8] Haq, Anwaar-ul, I. Gondal, and M. Murshed. "Automated multi-sensor color video fusion for nighttime video surveillance." In Computers and Communications (ISCC), 2010 IEEE Symposium on, pp. 529-534. IEEE, 2010. [9]
Toet, Alexander. "Natural colour mapping for imagery."Information fusion 4, no. 3 (2003): 155-166.
multiband
nightvision
[10] Y. Zheng, "An overview of night vision colorization techniques using multispectral images: From color fusion to color mapping," Audio, Language and Image Processing (ICALIP), 2012 International Conference on, pp. 134-143, 2012.
92
[11] Burt, Peter J., and Raymond J. Kolczynski. "Enhanced image capture through fusion." In Computer Vision, 1993. Proceedings., Fourth International Conference on, pp. 173-182. IEEE, 1993. [12] Li, Hui, B. S. Manjunath, and Sanjit K. Mitra. "Multisensor image fusion using the wavelet transform." Graphical models and image processing 57, no. 3 (1995): 235245. [13] Toet, Alexander. "Hierarchical image fusion." Machine Vision and Applications3, no. 1 (1990): 1-11. [14] Xiao, Xuezhong, and Lizhuang Ma. "Color transfer in correlated color space." InProceedings of the 2006 ACM international conference on Virtual reality continuum and its applications, pp. 305-309. ACM, 2006. [15] Reinhard, Erik, Michael Adhikhmin, Bruce Gooch, and Peter Shirley. "Color transfer between images." Computer Graphics and Applications, IEEE 21, no. 5 (2001): 34-41. [16] Ruderman, Daniel L., Thomas W. Cronin, and Chuan-Chin Chiao. "Statistics of cone responses to natural images: Implications for visual coding." JOSA A 15, no. 8 (1998): 2036-2045. [18]
Toet, Alexander. "Colorizing single images."Displays 26, no. 1 (2005): 15-21.
band
intensified
nightvision
[19] Welsh, Tomihisa, Michael Ashikhmin, and Klaus Mueller. "Transferring color to greyscale images." ACM Transactions on Graphics 21, no. 3 (2002): 277-280. [20] Li, Guangxin, and Ke Wang. "Applying daytime colors to nighttime imagery with an efficient color transfer method." In Defense and Security Symposium, pp. 65590L65590L. International Society for Optics and Photonics, 2007. [21] Guangxin Li, "Image Fusion Based on Color Transfer Technique." Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences. [22] Cherian Varghese and Umesh Shabkar, “Passenger Vehicle Occupant Fatalities by Day and Night – A Contrast” Traffic Safity Facts; Research note. NHTSA. [23] Wang, Yue, Eam Khwang Teoh, and Dinggang Shen. "Lane detection and tracking using B-Snake." Image and Vision computing 22, no. 4 (2004): 269-280. [24] Serfling, Matthias; Roland Schweiger, and Werner Ritter, "Road course estimation in a night vision application using digital map, a camera sensor and a prototypical imaging radar system," IEEE intelligent Vehichles Symosium, pp. 810-815, 4-6 June 2008. 93
[25] Tran, Trung-Thien; Jin-Ho Son, Byun-Jae Uk, Jong-Hwa Lee, and Hyo-Moon Cho, "An adaptive Method for detection Lane boundary in Night Scene," Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, pp. 301-308, 2010. [26] Peng, Chi; Bing-Yu Chen, and Chi-Hung Tsai, "Integrated Google Maps and Smooth Street View Videos For Route Planning," In Computer Symposium (ICS), 2010 International, pp. 319-324, 2010. [27] Chen, Billy; Boris Neubert, Eyal Ofek, Oliver Deussen, and Michael F. Cohen, "integrated videos and Maps for driving Direction," In Proceedings of the 22nd annual ACM symposium on User interface software and technology, pp. 223-233, 2009. [28] Gautam, Shantanu; Gabi Sarkis, Edwin Tjandranegara, Evan Zelkowitz, YungHsiang Lu, and Edward J. Delp, "Multimedia for mobile environment: image enhanced navigation," Proceedings of SPIE, vol. 6073, pp. 128-138, 2006. [29] Zamir, Amir; and Mubarak Shah, "Accurate Image Localization Based on Google Maps Street View," in European Conference on Computer Vision (ECCV), 2010. [30] Bertozzi, Massimo, and Alberto Broggi. "GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection." Image Processing, IEEE Transactions on 7, no. 1 (1998): 62-81. [31] Ma, Bing, S. Lakahmanan, and Alfred Hero. "Road and lane edge detection with multisensor fusion methods." In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, vol. 2, pp. 686-690. IEEE, 1999. [32] Toet, Alexander, and Maarten A. Hogervorst. "Real-Time Full Color Multiband Night Vision." [33] Toet, Alexander. "Applying daytime colors to multiband nightvision imagery." In Proceedings of the Sixth International Conference on Information Fusion, FUSION, vol. 2003, pp. 614-621. 2003. [34] DRS Technologies Inc. "The Tamarisk®320 datasheet." 2012 [35] Sinha, Abhijeet Kumar, and K. K. Shukla. "A Study of Distance Metrics in Histogram Based Image Retrieval." 
INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 4, no. 3 (2013): 821-830.
94
[36] Conci, A. U. R. A., and E. M. M. M. Castro. "Color Image Retrieval System: A Comparison of Approaches." Proc. of V Simpósio Brasileiro de Sistemas Multimídia e Hypermedia, Goiânia, Goias (1999). [37] Rubner, Yossi, Carlo Tomasi, and Leonidas J. Guibas. "The earth mover's distance as a metric for image retrieval." International Journal of Computer Vision 40, no. 2 (2000): 99-121. [38] Kaur, Simardeep, and Vijay Kumar Banga. "Content Based Image Retrieval: Survey and Comparison between RGB and HSV model." [39] Chakravarti, Rishav, and Xiannong Meng. "A study of color histogram based image retrieval." In Information Technology: New Generations, 2009. ITNG'09. Sixth International Conference on, pp. 1323-1328. IEEE, 2009. [40] Pass, Greg, and Ramin Zabih. "Histogram refinement for content-based image retrieval." In Applications of Computer Vision, 1996. WACV'96., Proceedings 3rd IEEE Workshop on, pp. 96-102. IEEE, 1996. [41] Androutsos, Dimitrios, K. N. Plataniotiss, and Anastasios N. Venetsanopoulos. "Distance measures for color image retrieval." In Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on, vol. 2, pp. 770-774. IEEE, 1998. [42] Kaur, VK Banga Avneet, and V. K. Banga. "Color based image retrieval." InPSRC Proceedings Pattaya Conferences. 2011. [43] Niblack, Carlton W., Ron Barber, Will Equitz, Myron D. Flickner, Eduardo H. Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. "QBIC project: querying images by content, using color, texture, and shape." In IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, pp. 173-187. International Society for Optics and Photonics, 1993. [44] Pentland, Alex, Rosalind W. Picard, and Stan Sclaroff. "Photobook: Content-based manipulation of image databases." International Journal of Computer Vision 18, no. 3 (1996): 233-254. [45] Gudivada, Venkat N., and Vijay V. Raghavan. "Content based image retrieval systems." Computer 28, no. 9 (1995): 18-22. 
[46] Jack, Keith. Digital Video and DSP: Instant Access: Instant Access. Access Online via Elsevier, 2008.
[47] Swain, Michael J., and Dana H. Ballard. "Color indexing." International journal of 95
computer vision 7, no. 1 (1991): 11-32. [48] Belkin, Nicholas J., and W. Bruce Croft. "Information filtering and information retrieval: two sides of the same coin?." Communications of the ACM 35, no. 12 (1992): 29-38. [49] Werman, Michael, Shmuel Peleg, and Azriel Rosenfeld. "A distance metric for multidimensional histograms." Computer Vision, Graphics, and Image Processing 32, no. 3 (1985): 328-336. [50] Wang, James Ze, Jia Li, and Gio Wiederhold. "SIMPLIcity: Semantics-sensitive integrated matching for picture libraries." Pattern Analysis and Machine Intelligence, IEEE Transactions on 23, no. 9 (2001): 947-963. [51] Rafael C. Gonzalez, Richard E. Woods. (2008). Digital Image Processing. Pearson Education. Inc. [52] Smith, John R., and Shih-Fu Chang. "Automated image retrieval using color and texture." IEEE Transaction on Pattern Analysis and Machine Intelligence(1996). [52] Dabov, Kostadin, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. "Image restoration by sparse 3D transform-domain collaborative filtering." InElectronic Imaging 2008, pp. 681207-681207. International Society for Optics and Photonics, 2008.
96