Ben-Gurion University of the Negev Faculty of Engineering Sciences The Department of Industrial Engineering and Management
Comparing Multispectral Image Fusion Methods for a Target Detection Task
By: Yoel Lanir
Thesis submitted in partial fulfillment of the requirements for the M.Sc. Degree

June 2005
Ben-Gurion University of the Negev Faculty of Engineering Sciences The Department of Industrial Engineering and Management
Comparing Multispectral Image Fusion Methods for a Target Detection Task
By: Yoel Lanir Supervised By: Dr. Masha Maltz
Thesis submitted in partial fulfillment of the requirements for the M.Sc. Degree
June 2005
Abstract

With the advance of multispectral imaging, image fusion has emerged as a new and important research area. Many studies have examined human performance with specific fusion methods compared to the individual input bands, yet few comparison studies have examined which fusion method is preferable to another. The current study compared human performance for the pixel-averaging, false color, and principal components fusion methods, as well as a novel method based on edge detection, in a target detection task. Three experiments, involving 89 participants, were conducted. In the first experiment, images with multiple targets were presented to the participants, and quantitative measurements of their hit accuracy and reaction time were taken. In the second experiment, a paired comparison method was used to qualitatively assess the subjective value of the methods and to scale the quality of each method. In the third experiment, participants' eye movements were recorded as they searched for targets, and a novel method was introduced for comparing eye movement data across image samples when looking at small targets. The results indicate that the false color and principal components fusion methods gave the best results across all experiments.
Acknowledgements
First, I would like to express my gratitude to my supervisor, Dr. Masha Maltz, for her enthusiasm, support, and patience, for all the helpful conversations, and for her comments on my text throughout the course of this work. I would also like to thank Prof. Stanley Rotman for his professional guidance and for all his helpful comments and ideas. I am grateful to Prof. Joachim Mayer for his insightful comments and to Prof. David Shinar for his kindness and help throughout my studies. I would also like to thank Dr. Yisrael Parmet for his help with the statistics. Lastly, and most importantly, I would like to thank my dear wife Shuli, for her love and patience, for her everlasting support, and for being a great source of strength throughout this work. Without her, I would never be where I am today.
Table of Contents

1 Introduction ........................................................................................... 1
1.1 Image fusion ....................................................................................... 1
1.2 Multispectral images ........................................................................... 3
1.3 Image fusion applications .................................................................... 4
1.4 Categories of fusion ............................................................................ 5
1.5 Methods of Image Fusion ..................................................................... 6
1.6 Human factor issues in Image fusion ..................................................... 11
1.6.1 Detection Tasks ............................................................................... 11
1.6.2 Search Tasks ................................................................................... 13
1.6.3 Recognition and Identification Tasks ................................................. 14
1.6.4 Situation awareness ......................................................................... 16
1.6.5 Human factor discussion ................................................................... 17
1.6.6 The lab method versus the natural method ......................................... 18
1.7 Chromatic and achromatic fusion ......................................................... 19
1.8 Effect of the input bands on the fusion ................................................. 21
1.9 The effect of the scene and target ........................................................ 22
1.10 Target Detection ............................................................................... 23
2 The Current Study ................................................................................. 25
2.1 Research Objectives ........................................................................... 25
2.2 Research methodology ........................................................................ 27
2.3 Fusion methods .................................................................................. 28
2.3.1 False Color ..................................................................................... 28
2.3.2 Principal Components ...................................................................... 29
2.3.3 Simple intensity average ................................................................... 30
2.3.4 Edge fusion ..................................................................................... 30
3 Target Detection ................................................................................... 32
3.1 Method .............................................................................................. 32
3.1.1 Participants ..................................................................................... 32
3.1.2 Apparatus ....................................................................................... 33
3.1.3 Stimuli ............................................................................................ 33
3.1.4 Procedure ....................................................................................... 34
3.2 Results .............................................................................................. 37
3.2.1 Total hits ........................................................................................ 37
3.2.2 False alarms .................................................................................... 40
3.2.3 Detection time ................................................................................. 41
3.3 Discussion ......................................................................................... 42
4 Paired Comparisons ............................................................................... 44
4.1 Method .............................................................................................. 44
4.1.1 Participants ..................................................................................... 44
4.1.2 Apparatus ....................................................................................... 45
4.1.3 Stimuli ............................................................................................ 45
4.1.4 Procedure ....................................................................................... 46
4.2 Results + Discussion ........................................................................... 47
5 Eye Tracking ......................................................................................... 53
5.1 Method .............................................................................................. 54
5.1.1 Participants ..................................................................................... 54
5.1.2 Apparatus ....................................................................................... 54
5.1.3 Stimuli ............................................................................................ 55
5.1.4 Procedure ....................................................................................... 56
5.2 Results .............................................................................................. 57
5.2.1 Average vs. False Color .................................................................... 58
5.2.2 Average vs. Principal Components ..................................................... 61
5.2.3 False Color vs. Principal Components ................................................ 64
5.3 Discussion ......................................................................................... 65
6 Discussion ............................................................................................ 69
6.1 Result Summary ................................................................................. 71
6.2 Results Discussion .............................................................................. 72
6.2.1 Chromatic vs. achromatic fusion ........................................................ 75
6.3 Limitations and future research ........................................................... 76
7 Conclusions .......................................................................................... 78
8 References ........................................................................................... 80
9 Appendices ........................................................................................... 86
Appendix 1: Familiarization images that were given before the target detection .. 86
Appendix 2: Target detection statistical results ........................................... 86
Appendix 3: Paired comparison decision time .............................................. 90
Appendix 4: Eye tracking statistical results ................................................. 92
Appendix 5: Example of original and fused images ....................................... 103
Appendix 6: List of images used in the experiments ..................................... 105
List of Tables
Table 4-1 Sum of comparative judgments of all images ................................. 48
Table 4-2 Percentage of comparative judgments of all images ........................ 49
Table 4-3 Distance between each method in standard deviation of preference units 50
Table 4-4 Scale value of the different fusion methods ................................... 51
Table 5-1 Summary of eye-tracking statistical results .................................... 67
Table 6-1 Ranks of the four fusion methods for all experiments ...................... 72
List of Figures
Figure 1-1 Diagram of a generic multiscale decomposition fusion .................... 7
Figure 3-1 Example fused image: band 1 ..................................................... 35
Figure 3-2 Example fused image: band 2 ..................................................... 35
Figure 3-3 Example fused image: average .................................................... 35
Figure 3-4 Example fused image: edge ........................................................ 36
Figure 3-5 Example fused image: false color ................................................ 36
Figure 3-6 Example fused image: principal components .................................. 36
Figure 3-7 Effect of fusion method on the hit percentage average .................... 38
Figure 3-8 Hit percentage by fusion method and image quality: means for high and low image quality in each fusion method ................................................... 39
Figure 3-9 Mean false alarm rates per fusion group ....................................... 41
Figure 3-10 Mean target detection time per fusion group ................................ 42
Figure 4-1 Example of a paired comparison between average and principal components methods ................................................................................ 47
Figure 4-2 Scale value by law of comparative judgment of different fusion methods 51
Figure 5-1 Example of a composite image .................................................... 56
Figure 5-2 Fixation count percent of average and FC in each image side ........... 59
Figure 5-3 Effect of fusion method on the fixation number percent for each image 60
Figure 5-4 Average sum of fixation durations on targets of average and FC methods 61
Figure 5-5 Fixation count percentage of average and PC methods in each image side 62
Figure 5-6 Effect of fusion method on the fixation number percentage in each image 63
Figure 5-7 Average sum of fixation durations on targets of average and PC methods 64
Figure 5-8 Fixation count fraction of PC and FC methods in each image side ...... 64
Figure 5-9 Average sum of fixation durations on targets of FC and PC methods ... 65
Figure 6-1 Example of a multispectral image ................................................ 74
Keywords Target detection, Multispectral imaging, Image fusion, Visual search, Eye Tracking.
1 Introduction

This research project presents a comprehensive approach for comparing different fusion methods for multispectral images. We examined four fusion methods and compared the performance of human observers viewing the fused images in a target detection task.
1.1 Image fusion

With the availability of multi-sensor data in many fields such as remote sensing, computer vision, medicine, and military applications, image fusion has emerged as a new and important research area. The human eye is sensitive to only a limited range of the electromagnetic spectrum and performs poorly at low light intensities. To obtain data that cannot be sensed by the eye, one can use sensors such as thermal sensors or image-intensifier night-vision sensors. In certain tasks, the human observer needs data from multiple sensors. For example, using the visual channel as well as the thermal channel can substantially improve the ability to detect a target (Toet et al., 1997). When the target is cold (early morning, rain), the contrast between the target and the background is larger in the visual channel than in the thermal channel, resulting in better detection in the visible channel. On the other hand, it may be hard to detect a hidden target in the visual channel, while the difference between the target's thermal signature and its surroundings may ease its detection in the thermal channel. According to Toet (1992), the simultaneous use of different sensors in different displays increases the operator's workload. It is difficult to reliably integrate the visual
information from different displays, both in a spatial arrangement, which places the displays of the sensors side by side, and in a sequential arrangement, which shows the displays one after another. Recognizing relationships among patterns can be difficult, since an object can appear quite different in the different sensor displays. A solution to this problem is to show the different sensor data on one display, combining the data in a process called sensor or image fusion. Another potential advantage of image fusion is to provide scene information not present in the input bands: by deriving information from the differences between the input images, information not present in any single input band can be shown in the fused image (Sinai et al., 1999). Image fusion refers to the process of combining the signals provided by different sensors viewing the same scene into one display. Image fusion aims to improve reliability by exploiting the redundant information shared by the images, and to improve capability by exploiting the complementary information between them. This type of image fusion is also called pixel-level multi-sensor fusion (Luo et al., 2002). The sensors used for image fusion need to be accurately co-aligned so that their images are in spatial registration. An image fusion scheme should extract all useful information from the source images without adding artifacts that distract human observers, while remaining reliable and robust to imperfections such as mis-registration of the source images. Our goal in image fusion is to combine and preserve in a single image all the perceptually important information present in the input images, so that the resulting image is more suitable for human visual perception, object
detection, and target recognition. Hence, for a given observation task, performance with the fused image should be at least as good as performance with the individual input images.
1.2 Multispectral images

Using multispectral data from satellites and from individual cameras is the wave of the future for military and industrial applications. While camera resolution has improved year after year, many believe that the development of multispectral cameras will drive the next large advance in electro-optical target acquisition. Multispectral images are images taken in two or more discrete bands of the electromagnetic spectrum. Each individual image is of the same scene and resolution, but of a different spectral band. For example, a digital camera captures three separate images from the electromagnetic spectrum: red, green, and blue. These are later combined to form a single RGB image. The bands of a multispectral image can come from visible or non-visible wavelengths of the spectrum. This technique can provide an optical spectrum for each pixel of the image. Image processing techniques can then be applied to extract the required data from the image. Objects made of different materials normally have unique spectral signatures; in other words, different objects reflect and absorb light differently. Thus, we can define a spectral signature for each type of object in the scene, and we can use the fusion of multispectral images to enhance the view of objects of interest in the scene. Multispectral imaging in the visible and the near-infrared wavelength range is routinely used in remote sensing (the analysis of landscapes and structures from aircraft
or satellites). Many applications can benefit from this procedure, including the detection of different crops, mineral deposits, or land mines, military camouflage detection, and the monitoring of agricultural resources. Another application of multispectral data is automatic target recognition. Computers are able to quickly and efficiently segment and analyze images based both on their brightness and on their spectral signatures (Caefer et al., 2002). Point targets become clear when matched-target algorithms and anomaly detection algorithms (Raviv and Rotman, 2003) are applied. However, this advantage becomes less clear when multispectral data is presented to human observers. The problem is that while multispectral data is three dimensional (x, y, and the spectral dimension), the image presented to the observer is two dimensional. Image fusion provides a way to transform the three-dimensional data into two dimensions.
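To make the data layout concrete, a multispectral image can be held as a three-dimensional array of shape (rows, columns, bands), and the spectral signature of a pixel is simply that pixel's vector of values along the band axis. The following numpy sketch is only an illustration; the array names, band count, and cosine-similarity map are our own and are not taken from the experiments described here.

    import numpy as np

    # illustrative cube: 256 x 256 pixels, 4 co-registered spectral bands
    bands = [np.random.rand(256, 256) for _ in range(4)]   # stand-ins for real band images
    cube = np.stack(bands, axis=-1)                        # shape (rows, cols, bands)

    # spectral signature of the pixel at row 120, column 80
    signature = cube[120, 80, :]

    # crude per-pixel similarity to a reference signature (cosine similarity)
    ref = signature / np.linalg.norm(signature)
    pixel_norms = np.linalg.norm(cube, axis=-1) + 1e-9
    similarity = (cube @ ref) / pixel_norms                # shape (rows, cols)

A detection algorithm could threshold such a similarity map; a fusion method, by contrast, must compress the whole band axis into a single two-dimensional image for a human viewer.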
1.3 Image fusion applications

Development of image fusion methods is aimed at many fields, including medical imagery, processing of satellite images (remote sensing), monitoring of production processes, and military applications. Here are some examples of applications of image fusion:
1. To help night driving under low-beam illumination or no illumination at all, it was suggested to add a thermal sensor to the car (Krebs et al., 1999). The image from the thermal sensor is presented on a head-up display (HUD) placed on the windscreen just above the steering wheel. This setting is problematic since it requires the driver to alternate attention and gaze between the front-viewed
scene and the HUD, and to integrate information from displays differing in size, aspect ratio, luminance, and spatial resolution. The suggested alternative is a system that presents a combination of the IR information and the visible information in one sensor-fused image.
2. Perconti and Steele (1997) examined a system in which a helicopter pilot uses a number of sensors (usually thermal and visual sensors) that are displayed on a helmet-mounted display (HMD). Today, the pilot can switch between the different displays. They suggest combining the information from the different sensors into one display using sensor fusion. They predict that a fused display will ease the navigation task and will help the pilot keep eye contact with the landing field in poor visual conditions (night, rain, fog).
3. Xue and Blum (2003) used image fusion to help detect concealed weapons using IR and visual sensors. The fused image maintains the high resolution and the natural color of the visual image while incorporating any concealed weapons detected by the IR sensor. Such a fused image can be helpful, for example, for a police officer who must respond quickly based on a glance at the fused image.
1.4 Categories of fusion

A multispectral image, or images from different sensors, can be fused using various methods. These can be divided into categories: pixel-level (data-level) fusion, feature-level fusion, and decision-level fusion. Our study, like most current studies, focuses on pixel-level fusion.
Pixel level fusion

At this level, the input images are fused pixel by pixel. Methods at this level either apply arithmetic operations (such as addition or subtraction) to corresponding pixel intensities from the different input images, or operate in a transform domain. In the transform-domain approach, the input images are first transformed using various multiscale methods such as Laplacian pyramid or wavelet transforms. After the transformation, algebraic operations combine the transformed images into a single representation, which is then inverse-transformed to obtain the final fused image. The combination rule can be based on pixel contrast, intensity, or on the weight given to a specific spectral band.
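As a concrete illustration of the arithmetic variant of pixel-level fusion, the sketch below fuses two co-registered bands by a weighted average of corresponding pixel intensities (the function name and weights are illustrative choices of ours); with equal weights this reduces to the simple pixel-averaging method compared in this study.

    import numpy as np

    def weighted_average_fusion(band_a, band_b, w_a=0.5, w_b=0.5):
        # band_a, band_b: co-registered single-band images as float arrays in [0, 1]
        fused = w_a * band_a + w_b * band_b
        return np.clip(fused, 0.0, 1.0)

    # equal weights give the simple intensity-average method
    # fused = weighted_average_fusion(ir_band, intensified_band)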
Feature level fusion

At the feature level, features are first extracted from the input images, and fusion is then performed based on these features. Typical algorithms are feature-based template methods (such as edge enhancement), artificial neural networks, and knowledge-based approaches.
Decision Level Fusion

In decision-level fusion, features are first extracted from each input. A decision is then made for each input, and only then are the individual decisions fused into a final decision.
1.5 Methods of Image Fusion
In this section the most common fusion methods are described. The fusion methods we use in our experiments are described in more detail later.
Multiscale Decomposition based methods

Multiscale transforms are very useful for analyzing the information content of images for the purpose of fusion. Zhang and Blum (1999) discuss the different multiscale image fusion approaches in detail. Most of the methods combine the multiscale decompositions of the source images: the idea is to perform a multiscale transform (MST) on the source images, construct a composite multiscale representation using some fusion rule, and then construct the fused image by applying the inverse multiscale transform (IMST). This process is shown in Figure 1-1.
Figure 1-1 – Diagram of a generic multiscale decomposition fusion
The most commonly used multiscale decomposition fusion methods are pyramid transforms and wavelet transforms.
Pyramid transforms

Pyramid transforms can be used as the multiscale transform in the fusion process. A pyramid representation consists of a number of images at different scales which together represent the original image. An example of a pyramid transform is the Laplacian pyramid. Each level of the Laplacian pyramid is constructed from the level below it using blurring, size reduction, interpolation, and differencing, in this order (Zhang and Blum, 1999). Toet and Franken (2003) used Laplacian pyramid fusion to fuse infrared and image-intensified images. They note that, as a side effect of this method, details in the resulting fused images can be displayed at higher contrast than they appear in the images from which they originate. Alternative pyramid transforms are the contrast pyramid, which preserves local luminance contrast in the sensor images (Toet, 1990), and the gradient pyramid, which applies a gradient operator to each level of the Gaussian pyramid representation (Burt and Kolczynski, 1993).
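The sketch below illustrates the generic scheme of Figure 1-1 with a simplified Laplacian pyramid and a choose-max rule on the detail levels; the helper functions, level count, and smoothing parameters are illustrative choices of ours, not the exact algorithms of the cited papers.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def laplacian_pyramid(img, levels=4, sigma=1.0):
        # blur -> downsample -> upsample -> difference, repeated per level
        gauss = img.astype(float)
        pyramid = []
        for _ in range(levels):
            blurred = gaussian_filter(gauss, sigma)
            down = blurred[::2, ::2]
            up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)
            up = gaussian_filter(up[:gauss.shape[0], :gauss.shape[1]], sigma)
            pyramid.append(gauss - up)      # band-pass detail at this scale
            gauss = down
        pyramid.append(gauss)               # coarsest low-pass residual
        return pyramid

    def fuse_laplacian(band_a, band_b, levels=4):
        pa = laplacian_pyramid(band_a, levels)
        pb = laplacian_pyramid(band_b, levels)
        fused = [np.where(np.abs(la) >= np.abs(lb), la, lb)   # choose-max on details
                 for la, lb in zip(pa[:-1], pb[:-1])]
        fused.append((pa[-1] + pb[-1]) / 2.0)                 # average the residual
        out = fused[-1]
        for detail in reversed(fused[:-1]):                   # inverse transform (IMST)
            up = np.repeat(np.repeat(out, 2, axis=0), 2, axis=1)
            out = detail + gaussian_filter(up[:detail.shape[0], :detail.shape[1]], 1.0)
        return out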
Discrete Wavelet transform

Wavelets are a type of multi-resolution function approximation that allows the hierarchical decomposition of a signal or an image. The wavelet transform is a useful method for fusing images (Scheunders and Backer, 2001; Gomez et al., 2001; Zhang and Blum, 1999; Singh et al., 2004). The wavelet transform has several advantages over other pyramid-based transforms: it provides a more compact representation, separates spatial orientations into different bands, and decorrelates interesting attributes in the original image. In wavelet-based fusion, the source images are first transformed using the wavelet transform. Then, a fusion decision map is generated based on a set of fusion
rules. The fused wavelet coefficients are built from the source images' wavelet coefficients using the decision map. Finally, the fused image is obtained using the inverse wavelet transform. From this process, we can see that the fusion rule plays a very important role in the fusion process. A frequently used fusion rule is a pixel-based rule, where each coefficient in the merged transform is produced from a combination of the corresponding coefficients in the source images' transforms. Another approach is to consider not only the corresponding coefficients in the source images, but also their close neighbors, for example a 3x3 or 5x5 window. This is called a window-based fusion rule, and it relies on the assumption that there is usually a high correlation between nearby pixels.
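A minimal sketch of this scheme, assuming the PyWavelets package is available, is shown below; it fuses the detail coefficients with a pixel-based choose-max rule and averages the coarse approximation (the wavelet, decomposition level, and rule are illustrative choices of ours).

    import numpy as np
    import pywt  # PyWavelets, assumed installed

    def wavelet_fuse(band_a, band_b, wavelet="db2", level=3):
        ca = pywt.wavedec2(band_a.astype(float), wavelet, level=level)
        cb = pywt.wavedec2(band_b.astype(float), wavelet, level=level)
        fused = [(ca[0] + cb[0]) / 2.0]            # average the coarse approximation
        for da, db in zip(ca[1:], cb[1:]):         # per-level (horizontal, vertical, diagonal) details
            fused.append(tuple(
                np.where(np.abs(xa) >= np.abs(xb), xa, xb)   # pixel-based choose-max rule
                for xa, xb in zip(da, db)))
        return pywt.waverec2(fused, wavelet)

A window-based rule would replace the point-wise magnitude comparison with a local activity measure, for example the coefficient energy averaged over a 3x3 neighborhood.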
Principal Component Transform Fusion

PCA (principal component analysis) is a general statistical technique that transforms multivariate data with correlated variables into data with uncorrelated variables. These new variables are obtained as linear combinations of the original variables. PCA has been widely used in image encoding, image data compression, image enhancement, and image fusion. When this technique is used in image fusion, it is performed on all of the image's spectral bands. William Krebs has used this method in many of his experiments (Krebs and Sinai, 2002; McCarley and Krebs, 2000; Krebs et al., 2001). Researchers have used PCA to fuse images in two ways. The first assigns the first principal component (PC) to one of the RGB bands and the second PC to another RGB band, in a false-color technique. The second method maps the first and second PCs to intensity and hue in an HSV image, and is described in more detail later.
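The second, HSV-based variant can be sketched as follows; the function name, the fixed saturation value, and the min-max normalization are our own illustrative choices, not necessarily Krebs's exact implementation.

    import numpy as np
    from matplotlib.colors import hsv_to_rgb  # assumed available for the HSV -> RGB step

    def pca_fuse(bands):
        # bands: list of co-registered 2-D arrays, one per spectral band
        h, w = bands[0].shape
        data = np.stack([b.ravel().astype(float) for b in bands], axis=1)  # pixels x bands
        data -= data.mean(axis=0)
        _, _, vt = np.linalg.svd(data, full_matrices=False)   # principal axes in band space
        pc1 = (data @ vt[0]).reshape(h, w)
        pc2 = (data @ vt[1]).reshape(h, w)
        scale = lambda x: (x - x.min()) / (np.ptp(x) + 1e-9)  # map to [0, 1]
        hsv = np.dstack([scale(pc2),               # hue: second component
                         np.full((h, w), 0.6),     # fixed saturation (arbitrary)
                         scale(pc1)])              # value (luminance): first component
        return hsv_to_rgb(hsv)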
Opponent color processing

A technique used by Waxman et al. (1998) is based on biological models of opponent-color processing. It was observed that certain rattlesnakes have neurons that fuse IR and visible inputs. These cells show interactions in which the IR input can enhance or depress the response to the visible input in a non-linear way. This idea forms the basis of the opponent-color technique, which maps two bands to the human opponent colors (red vs. green, blue vs. yellow). A neural network is used to fuse low-light and thermal images into a false color image.
False color

A simple fusion method introduced by Alexander Toet (Toet and Walraven, 1996) and used frequently since (Toet and Franken, 2003; McCarley and Krebs, 2000) tries to exploit the ability of the human visual system to perceive color. The method assigns each band of the input image to a corresponding band of an RGB color image: one band to the R channel, a second band to the B channel, and the third band to the G channel. This works when merging three images. When merging two images, one image can be assigned to both the B and G channels (cyan) while the other image is assigned to the R channel. Some manipulation can be done on the input images before assigning them to the RGB bands. Toet and Franken (2003) have shown that this method is better for human perception and target detection than a standard contrast enhancement method.
Fusion methods used in our research

In our research we chose to compare the common fusion methods of principal components fusion and false color fusion. In addition, we compare them with a novel feature-level fusion method based on edge detection, which is explained later, and with a simple pixel-averaging method.
1.6 Human factor issues in Image fusion

The goal of image fusion is to improve the performance of the human observer in different visual tasks. Human factors experiments are used to test whether a sensor or a fused image can improve operator performance in the task. Krebs et al. (2002) showed, by examining the effects of the same sensor fusion on different cognitive tasks, that the benefit of sensor fusion may be task dependent. Therefore, in this section, we divide the different visual tasks into four groups: detection, search, recognition and identification, and situational awareness. For each task group, we review the human factors experiments and discuss how image fusion affects the task.
1.6.1 Detection Tasks

Detection tasks are tasks in which the observer sees a scene with or without a target and needs to decide whether or not the target appears in the scene. Target detection is one of the most common tasks used to investigate the potential benefits of image fusion, yet the findings on this benefit are not consistent. While some researchers have found fusion to improve target detection (Essock et al., 1999; Krebs et
al., 1999; Toet et al., 1997), others have not (Krebs and Sinai, 2002; Steele and Perconti, 1997). Essock et al. (1999) examined the ability of fusion to enhance target detection. In their experiment, ten human observers were presented with images from SLS (star light stimulator) sensors, thermal sensors, and fused images produced with the opponent-color processing method. The observers were asked to report whether a target, shown to them beforehand, appeared in a given image. Each image was shown very briefly (100 ms). They found that detection sensitivity, measured by d' according to signal detection theory, was better for the fused images than for the single-sensor ones. Krebs et al. (1999) wanted to determine whether a fused image can improve drivers' detection of road hazards within a nighttime scene. They used images collected from visible and short-wave infrared sensors, and fused them using an opponent-color technique. Eleven observers were asked to detect the presence of a pedestrian in single-sensor images and in fused images under different intensities of oncoming headlight glare. The results showed no effect of image type on reaction time, but sensor-fused imagery produced accuracy better than or equivalent to that produced by either of the single-band images. In another experiment, Krebs et al. (2001) showed 14 observers scenes that contained a randomly placed airplane target in 50% of the trials. The image formats used were three single-band images (short-, mid-, and long-wave IR) and color and B&W fused images produced with the PCA method. Target detection and false alarm probabilities were computed according to signal detection theory. The results showed better performance with the color-fused and the long-wave infrared images than with the other formats.
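For reference, the d' sensitivity measure used in these studies is the difference between the z-transformed hit rate and false-alarm rate; a minimal sketch, with invented rates rather than data from the cited studies, is:

    from scipy.stats import norm  # assumed available

    def d_prime(hit_rate, false_alarm_rate):
        # signal detection theory sensitivity: d' = Z(hit rate) - Z(false-alarm rate)
        return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

    print(d_prime(0.85, 0.20))  # about 1.88; a larger d' means better discriminability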
Sampson et al. (1996) compared thermal sensor images, SLS sensor images, and chromatic and achromatic fused images in a target detection task. Six observers viewed three scenes with and without targets pasted onto them. The experiment was set up so that the display imitated a HUD (head-up display). The observers were asked to indicate whether or not the target existed in the image. The results showed some advantage for the fused images over the input bands. Not all target detection experiments showed an advantage for the fused images. Krebs and Sinai (2002) conducted an experiment to determine the perceptual advantages of sensor-fused imagery over conventional single-band nighttime imagery for a wide range of visual tasks, including target detection. In the target detection experiment, images were presented to 84 observers. Observers were randomly assigned one of six image formats: IR, image-intensified image, two chromatic fused formats, and two grayscale fused formats. The fusion method was the PCA method. Their data indicate that sensor fusion did not improve performance in the target detection task beyond the single-band images, in either reaction time or accuracy.
1.6.2 Search Tasks

In search tasks the observer is asked to search for a target in the presented image and to make a decision concerning its location (for example, whether the target is located on the left or right side of the screen). Waxman et al. (1996) examined the benefit of chromatic and achromatic fusion in a search task. Three fusion formats were used to fuse long-wave IR and image-intensified images: the opponent-color method and two B&W methods. The images showed natural scenes. A square with a contrast of ±15%, which acted as the
target, was inserted into the source images at a random location. The observers were asked to report whether the square was located on the right or left side of the image. The results showed that when one of the original formats had low contrast, and therefore was hard to use, the fused image helped with the search task at all contrast levels. Another search task experiment was conducted by Krebs et al. (2001). Ten observers participated in an eye movement study. Eye movements were recorded as the observers searched for a target; once the target was detected, the subject was to maintain fixation on it. Each subject was shown one display format out of five: three single-band images (short-, mid-, and long-wave IR) or color or B&W fused images produced with the PCA method. The images were of natural scenes which contained a randomly placed airplane target in 50% of the trials. Analysis of the data showed that observers looking at the short- and mid-wave IR bands and at the B&W fused image had more fixations and longer scan-path lengths than observers looking at the long-wave IR band and the color-fused image. Thus, the long-wave and color-fused images contained enough information to guide the subjects' eye movements to the desired location. The two experiments described here suggest that fusion contributes to the search task, but further experiments are needed to verify these results.
1.6.3 Recognition and Identification Tasks

Recognition tasks are tasks in which the observer is required to classify the target into a broader category, and identification tasks are tasks in which the observer is required to classify the target into a narrower category. For example, in recognition we distinguish whether the target is a car or another type of vehicle, while in identification we classify the specific type of car.
Steele and Perconti (1997) examined the contribution of fusion to the tasks of recognition, search, and identification for a helmet-mounted helicopter display (HMD) system. Twenty-three experienced observers looked at images presented on an HMD system similar to the one helicopter pilots use. The images were presented for ten seconds, and the observers were requested to respond as quickly as possible to questions about whether a target belonged to a specific category (recognition), where a target was located (search), and detailed information about the target (identification). The comparison was made between five formats: a thermal long-wave IR sensor, a light-intensified image, the PCA fusion method, and the opponent-color chromatic and achromatic methods. The results indicated that the fastest reaction time and the best accuracy level were obtained with the opponent-color method and the thermal image. Sinai et al. (1999) examined whether a fused image can improve visual performance over the single bands. In their first experiment there were two input bands, a long-wave IR image and a light-intensified image, and four fused formats: two color-fused formats and two achromatic fused formats using a PCA method (in each fused format one IR band was taken with white-hot polarity and one with black-hot polarity). They examined 60 subjects who looked at images of natural scenes; each subject looked at only one format. The subject needed to decide whether the image contained a human figure, a vehicle, or no target at all. The results showed that the color-fused formats produced fewer errors than the single-band formats, with the fewest errors in the white-hot color-fused format. The results did not show any effect of format on reaction time.
1.6.4 Situation awareness

Situation awareness is defined as the perception of the elements of the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future (Endsley, 1988). Situational awareness is divided into three levels according to cognitive complexity: the perception of the relevant attributes, the interpretation of the attributes needed to complete the task, and the ability to predict the behavior of those attributes. In experiments measuring situational awareness, usually only the first level is measured. Toet et al. (1997) examined whether fusion can improve observers' situational awareness. Two color fusion schemes, the opponent-color method and the false color method, were applied to thermal and visible military images and were presented to six observers along with the original images. The task involved the detection and localization of a person relative to a characteristic detail that provided spatial context. The observers were requested to determine the position of the person in the image relative to a fence by pointing the mouse cursor at the corresponding place in a schematic image, based on their memory.
This task enabled the researchers to assess the observers' situation awareness. The results indicated that color-fused imagery improved target detection rates over all other modalities. Furthermore, they showed that observers improved their ability to determine the relative location of a person in the scene when looking at the fused images, hence improving their situation awareness. Sinai et al. (1999) also examined the advantages of fusion for situation awareness. In their second experiment, 60 observers were asked to determine whether an image was upright or inverted. Each observer viewed imagery from only one sensor format. The results
show that the reaction time was faster with all fused formats than with the input images. Also, the error rates were highest with the IR image. Krebs and Sinai (2002) presented chromatic and achromatic fused images produced with a PCA fusion method to 48 observers. The task required the observers to make a speeded response to determine whether the scene was upright or inverted (rotated 180 degrees). The results failed to reveal an effect of the image format on sensitivity (number of errors), but did show that reaction times for the input sensors were slower than for all fused formats.
1.6.5 Human factor discussion

The literature review of the different human factors experiments suggests that in most human visual tasks there is some benefit to fusion. More specifically, fusion's effect on target detection and search tasks is equivocal; in identification tasks fusion appears to be at least as good as the thermal band; and in situational awareness tasks fusion has been shown to produce better results than the separate bands. The equivocal effect of fusion on detection tasks can be explained by the various fusion methods used, the different methodological approaches, the different input bands, and the differences in the scenes and the targets. Furthermore, in a target detection task the detection usually depends on the contrast between the target and its surroundings. This contrast is usually high in the thermal bands, and therefore adding information from other bands does not always help to improve detection. In a situational awareness task, on the other hand, fusion of data from different modalities can contribute to the overall perception of the environment by adding information about the environment from each input band. This can explain why all experiments in situation
awareness have shown advantages for fused imagery. Therefore, a possible role of image fusion in a general visual task is to maintain or slightly improve the high target-detection performance of the thermal band, while improving the situation awareness provided by the image.
1.6.6 The lab method versus the natural method

Two experimental approaches from cognitive psychology can be applied in human factors experiments examining fusion methods for visual tasks. The first approach, and the more common one, is the natural-setting or real-life method. It uses an experimental setup which is as close as possible to reality (for example: Krebs et al., 1999; Sinai et al., 1999; Toet and Franken, 2003). The second approach, the laboratory method, exercises control over all intervening variables that are not of direct concern to the experiment. An example of this approach is using a patch containing the target while controlling the pixel intensities in the image (Essock et al., 1999). One of the differences between the two approaches is the use of a natural background as opposed to a uniform background: in the real-life method a natural background is used (Krebs and Sinai, 2002), while the laboratory approach uses patches of real scenes presented on a uniform background (Essock et al., 1999). Another difference between the two approaches has to do with the target. In the natural approach the scene is presented as is and the target is in its natural position, whereas in the lab approach the target is superimposed on the scene (Sampson, 1996). The differences between these two approaches can explain some of the differences in the results of the experiments. The main advantage of the natural approach is that it is similar to the settings used in real life. By using real images with targets at natural positions, the conditions in which the
operator makes his decision are simulated, and conclusions can be drawn about the specific task. The problem with these kinds of experiments is that it is sometimes hard to extrapolate the results to other cases, since there are many intervening variables. The lab approach controls these variables and is therefore better for examining specific cognitive processes, yet it lacks the realism of real-world experiments. Furthermore, by using patches and by inserting the target into the scene, the unnatural interaction between the target and the background can give the observer unwanted hints about the position of the target. In our experiments we use the natural-setting approach.
1.7 Chromatic and achromatic fusion

The human eye is sensitive to color, and can use color in different tasks, for example to enhance the ability to perceive and recognize targets. In a search task, for example, there is evidence that color can aid the search through pre-attentive pop-out processes (Treisman & Gelade, 1980). In a detection task, the ability to detect small objects against a varied background was shown to be greatly facilitated by the use of color (Goldstein, 1996). Furthermore, it was found that fewer fixations are required to locate color-coded targets (Hughes and Creed, 1994). In a color display, compared to a grayscale display, in addition to the brightness-contrast dimension there is also a color-contrast dimension, which can make the separation between a shape and its background easier (Aguilar, Fay, et al., 1998). The importance of color in fusion stems from the fact that even though the luminance intensity and the spatial frequency content of grayscale and color fused images are the same, color adds a perceptual dimension of chromatic contrast that can aid in the
performance of specific visual tasks (Krebs and Sinai, 2002). Nevertheless, it is not obvious that chromatic fusion will show better performance than achromatic fusion for a specific visual task. A fusion algorithm can facilitate visual performance by improving the spatial content of the input images, and not only by enhancing their contrast through the addition of color; in that case, chromatic fusion is not necessarily better than achromatic fusion (Krebs et al., 2001). Human performance experiments show that in most cases chromatic fusion methods do show some advantage over achromatic fusion methods. In search tasks, as predicted by the pop-out effect, color fusion appears to be advantageous (Waxman et al., 1996; Krebs et al., 2001). In recognition tasks (Sinai et al., 1999), situational awareness tasks (Toet et al., 1997), and scene recognition tasks (Sinai et al., 1999), experiments have also shown a possible contribution of color fusion over achromatic fusion. In target detection, on the other hand, findings on the benefits of chromatic fusion over achromatic fusion are not conclusive. In one study, Krebs et al. (2001) found color to be beneficial in a target detection task, but in another experiment, Krebs and Sinai (2002) did not find a difference in reaction time or in accuracy between achromatic and chromatic fused images. In some circumstances, chromatic fusion even led to a lower level of detection than achromatic fusion (McCarley and Krebs, 2000). This difference between the target detection experiments can be attributed to the different fusion methods and methodologies used in each experiment. In our experiment, we will use several chromatic and achromatic fusion methods. The methodology will be the same for all fusion methods, but we will not use chromatic and achromatic images of the
same fusion method. Investigation of the effects of color fusion on target detection is a topic for further research.
1.8 Effect of the input bands on the fusion

Image fusion is a process of combining two or more images from different sensors. Sensors that extract different types of information from a scene are used as input bands for the fusion process, and different input bands suit different tasks. While thermal sensors are best for detecting human targets or heat-emitting objects at night, millimeter-wavelength radiation is more effective in fog, and visual images have the best resolution. Infrared (IR) sensors are frequently used as input bands for fused images. IR bands are divided into longwave IR (LWIR), midwave IR (MWIR), and shortwave IR (SWIR). Image-intensified (i²) sensor images, which amplify star and moon light at night, are frequently fused with IR bands to form a fused image describing a nighttime scene (McCarley and Krebs, 2000; Essock et al., 1999; Krebs and Sinai, 2002; Sinai et al., 1999). In other experiments, IR bands from different wavelengths were fused (Waxman et al., 1996; Krebs et al., 2001), or IR and regular visible sensor images were fused (Xue and Blum, 2003; Toet et al., 1997). Fusion of the video output streams of long-distance observation systems has also been addressed (Fishbain, 2004). In the fusion of IR bands with i² or visual images, the resolution of the infrared sensors is generally poorer than that of the image-intensifier sensors; however, the contrast between heat-emitting objects and their surroundings is greater in the infrared image than in the image-intensified image. In a situation
awareness task, the number of errors and the reaction time were worse with the thermal sensor band than with all other formats (Krebs and Sinai, 2002). The authors' explanation was that the low resolution of the thermal format produces a less detailed description of the scene. However, in that experiment, the fused format showed better performance than both the thermal and the image-intensifier formats. The improved performance indicates that the thermal format added unique information to the situation awareness task beyond that of the i² format. In a target detection task, on the other hand, the contrast between a target and the background is more important than the resolution. Therefore, the advantage of fusing IR and i² images over a single IR band for this task is not straightforward. The input bands used in the fusion process can affect the decision of which fusion method to use. For example, Singh et al. (2004) used the wavelet transform to fuse infrared and visual images for the task of face recognition. They argue that since the IR images have a much lower resolution than the visual image, fusion using a multiresolution method allows features to be fused at the resolution at which they are most salient.
1.9 The effect of the scene and target

One of the areas that needs further examination is the effect of the content of the image (the target and the background) on the fusion. The target and the background used in the visual task can affect human performance. Krebs and Sinai (2002) found that it is easier to detect and recognize a human figure than a vehicle; this was found both for the single-band formats and for the fused formats.
White (1998) argued that one cannot define the correct fusion format without first considering the scene. In a situation awareness task, he presented 23 different scenes from five categories (man-made objects, wood, roads, ocean, and "general") to experienced and inexperienced observers. Out of the 23 scenes in his experiment, the thermal sensor showed better human performance in 11, color fusion was better in 10, and the image-intensifier image was better in two. Most fusion methods do not take most of the data from one band and add a different proportion from the other. An exception is our edge fusion method (see chapter 2), which takes most of the data from one band and adds data from the other bands. The disadvantage of this method is that the base band is not chosen dynamically according to the images, but is chosen beforehand. For example, clouds obscuring the moon and star light may degrade the image-intensifier image, while an IR image taken in the morning after a long period of rain may be less detailed because of the low thermal contrast in the scene, making a target detection task harder (Toet, 1997). So, if the input bands are an IR band and an image-intensifier band and the IR image was taken after a long period of rain, a fusion scheme which emphasizes the image-intensifier image would be better. On the other hand, if there were clouds when the image was taken, it might be better to use a fusion scheme that takes most of its features from the IR image.
1.10 Target Detection

A target detection task is one in which the observer scans a region of the visual world, looking for something whose presence is uncertain and whose location is unknown
(Wickens and Hollands, 2000). There is no consistent pattern of display scanning (e.g., left to right) and no optimal scan pattern in search unless the target is in a defined scheme (e.g., a menu in a software application). The target search is driven by cognitive factors related to the expectancy of where the target is likely to be found. Usually, the semantically logical places for the target are searched first when scanning an image, and only later the rest of the image. Feature integration theory (Treisman and Gelade, 1980) distinguishes between parallel and serial processes in target search. Certain search tasks, like detecting a red item among a group of green items, seem easy and effortless and involve parallel processing. Visual attention is drawn to display items that are large, bright, colorful, or changing; this is called the pop-out effect. Other tasks, where the target and the distractors are defined by a conjunction of features, are more difficult, take more time, and demand serial processing. When multiple levels of multiple dimensions define the target, and when the target is difficult to discriminate from the distractors, serial search results. In our case, we are using complex IR images of real outdoor scenes. The targets and distractors have many features, and it is hard to distinguish between them. We can therefore conclude that we are performing a serial search and not a parallel one, and thus we can expect the search to take longer. Adding a feature that distinguishes the target from its surroundings (such as color or a distinct shape) can change the process to a parallel search and ease the target detection task.
2 The Current Study

2.1 Research Objectives

Many human factors studies have examined the advantages of specific fusion methods over the individual input bands, yet few comparison studies have been conducted to examine which fusion method is preferable to another. Simard et al. (1999) examined different fusion display methods for synthetic and IR sensor images on a helmet-mounted display. Three subjects viewed an emulation of a descending flight. Their task was to detect specified terrain features and objects as they became visible in successive fused image snapshots along a flight path. They examined three fusion methods: pixel averaging, opponent process, and false color. The distance at which the target was first detected (the image number) was measured. The results indicate that all three formats improved the capability to detect features from a greater distance compared with the single sensor. When comparing the different fusion methods, the false color algorithm was superior to the two other methods across different visibility conditions. Simard et al. compared three fusion methods, but their use of only three observers and synthetic images prevents us from generalizing about which method is better under what conditions. Other researchers have conducted experiments to show the advantage of different fusion methods over single bands, sometimes using two different fusion methods and comparing them. Toet et al. (1997) compared the opponent-color method with the false color method while examining the benefit of fusing thermal and visible images in a situation awareness task. The results showed better performance with the fused
methods than with the input formats, but did not show a preference for one of the two fusion methods. McCarley and Krebs (2000) compared the false color and the principal components methods while trying to show the advantage of sensor fusion for enhancing drivers' detection of road hazards in a night-time display. Observers were asked to detect a pedestrian in a night-time scene. They found that the principal components method was better when there was glare from an oncoming vehicle's headlights, but under low illumination it did not perform as well as the false color method. Other than the studies mentioned above, we are not aware of any other studies which compare human performance with different fusion methods.
Our main research objective was to compare different fusion methods which are used today to fuse multispectral images, and to determine which method is best for the task of target detection, disregarding other variables such as the type of target or background. A second objective was to introduce a novel feature-level fusion method based on edge detection and to compare this method to other known fusion methods. In addition, we introduce two methodologies for comparing different image modalities. One is the paired comparison method, used to subjectively compare two image modalities. The other is a new methodology that uses eye tracking to compare images in a target detection task.
2.2 Research methodology
For the edge fusion method, we developed and implemented an algorithm which fuses several input bands using edge detection. This algorithm is described in detail later in this section. In order to compare different fusion methods, we performed several psychophysical experiments. Psychophysical experiments are commonly used in human factors research. They are procedures designed to examine and record human reactions to given situations or tasks, and they can be of a qualitative or quantitative nature. We performed both qualitative and quantitative experiments. The first experiment was quantitative: we presented observers with fused images containing multiple embedded targets, and measured the number of targets correctly detected by each observer and the time needed to detect them. The second experiment was qualitative: we presented observers with paired comparisons of two different fusion methods and recorded their judgment of which method was better on a subjective scale. The third experiment measured the eye movements of the observers as they searched for targets in the fused images. The methods and results of all three experiments are described in the next sections, followed by a general discussion of all the results and our conclusions.
2.3 Fusion methods
The images were fused using four different fusion methods: False Color, Principal Components, Edge, and Average fusion.
2.3.1 False Color
A simple fusion method introduced by Toet and Walraven (1996) fuses two input bands using a false color mechanism. First, the common component of the two images is found using a local minimum. The common component of the images A(i,j) and B(i,j) is calculated by:

(1)  (A ∩ B)(i,j) = Min{ A(i,j), B(i,j) }

where A and B denote the two input band images, and i and j denote a pixel in image A or B. Then, the common component of the two images is subtracted from each original image in order to get the unique component of each image:

(2)  A* = A − A ∩ B,  B* = B − A ∩ B.

The next step is to subtract the unique component of each image from the other image: A − B* and B − A*. Finally, these two images are mapped to the Red and Green bands of an RGB image to create one fused image:

(3)  C = (A − B*) ⊕ (B − A*)

where ⊕ represents the fusion operation. It is possible to emphasize the unique components of the two images by assigning the difference between them, (A* − B*), to the Blue band of the RGB image.
A simpler way of using the False Color method is to assign each band of the input images to a corresponding band of an RGB color image: one band to the R channel, one to the G channel, and one to the B channel. With three input bands this is straightforward. With only two bands, one band can be assigned to the R channel and the other to cyan; assigning a band to cyan is done by assigning it to both the G and the B channels.
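The following Python sketch illustrates both variants described above. It is a minimal illustration under our own assumptions, not the implementation used in this study; the function names and the choice to clip the output to [0, 1] are ours.

```python
import numpy as np

def false_color_toet(band_a, band_b):
    """Toet-Walraven style false color fusion of two co-registered bands.

    band_a, band_b: float arrays scaled to [0, 1], identical shape.
    Returns an H x W x 3 RGB image.
    """
    common = np.minimum(band_a, band_b)      # common component, A intersect B
    unique_a = band_a - common               # A* = A - (A intersect B)
    unique_b = band_b - common               # B* = B - (A intersect B)
    red = band_a - unique_b                  # A - B*  -> red channel
    green = band_b - unique_a                # B - A*  -> green channel
    blue = unique_a - unique_b               # optional A* - B* emphasizes unique detail
    return np.clip(np.dstack([red, green, blue]), 0.0, 1.0)

def false_color_simple(band_a, band_b):
    """Simpler variant: band A drives red, band B drives cyan (green and blue)."""
    return np.clip(np.dstack([band_a, band_b, band_b]), 0.0, 1.0)
```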
2.3.2 Principal Components
William Krebs has used the principal components method to fuse images in many of his experiments (Krebs and Sinai, 2002, McCarley and Krebs, 2000, Krebs et al., 2001). In his approach, the input images are transformed using principal components analysis; the major axis is mapped to the luminance channel of an HSV (hue, saturation, value) image, while the orthogonal axis is mapped to the color channel. Then, the image is transformed from HSV to an RGB (red, green, blue) image. The resulting image shows the fused content using false colors that reflect each band. The assignment of the major principal component to the luminance channel is straightforward, but the assignment of the second component to the color channel is not immediately obvious. According to Krebs, assigning the second component to color results in displaying two and only two opponent colors in various saturations. This provides an immediately intuitive representation of which spectral bands dominate and by how much. When Krebs indicated that only two colors were displayed, he was using only two input bands. When using three or more input bands there might be more colors, but the intuitive representation of the spectral bands still holds with three input bands.
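A hedged sketch of this kind of PCA-to-HSV fusion is given below. It is an illustration only: the fixed saturation value, the linear stretch of the second component onto the hue axis, and the function name are our own simplifications, since the exact color mapping used by Krebs is not fully specified in the text.

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def pca_fuse(bands, saturation=0.6):
    """Fuse two or more co-registered bands with a PCA-to-HSV mapping.

    bands: list of H x W float arrays. The first principal component is mapped
    to the value (luminance) channel and the second to the hue channel.
    """
    h, w = bands[0].shape
    data = np.stack([b.ravel() for b in bands], axis=1)   # pixels x bands
    data = data - data.mean(axis=0)                       # center each band
    _, _, vt = np.linalg.svd(data, full_matrices=False)   # principal axes
    scores = data @ vt.T                                  # pixel scores per component

    def stretch(x):                                       # rescale to [0, 1]
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    value = stretch(scores[:, 0]).reshape(h, w)           # 1st PC -> luminance
    hue = stretch(scores[:, 1]).reshape(h, w)             # 2nd PC -> color (assumed mapping)
    sat = np.full((h, w), saturation)                     # fixed saturation (arbitrary choice)
    return hsv_to_rgb(np.dstack([hue, sat, value]))       # back to RGB
```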
2.3.3 Simple intensity average
The simplest fusion scheme is to take the average of each pixel across the input bands as the intensity of the fused image. This produces an image which contains information from all the input bands in a unified way, and we use it as a benchmark for the other fusion methods. A weighted average can also be applied, with the weights of the different input bands set according to the bands' characteristics (Fishbain, 2004). In this study we used a simple non-weighted averaging scheme.
2.3.4 Edge fusion
A common denominator of the above methods is that they are based on pixel evaluation alone: the neighborhood of a particular pixel does not influence the value of that pixel in the fused image. A totally different approach involves feature level fusion. A standard image processing technique is to enhance an image by adding edge information, which can be extracted with any edge detection method. In our algorithm we use the Sobel filter to extract the edge information of each band (Gonzalez and Woods, 1992). This method finds edges by convolving the original image with a spatial filter. The extracted edge information can then be added to the original image to enhance target detection.
When using this method to fuse several input images, one image is selected as the base image. An edge image is then computed as described above and added to the base image. This process can be repeated for several bands, adding each band's edge intensity to the base image. The edge enhancement method is a feature level fusion method, and its effectiveness depends upon the distribution of the feature space: if the distribution is too low, the overlap will cause too much ambiguity and the target will not be identifiable; if the distribution is too high, there will not be enough overlap to enhance the identification of the target. Another point to take into consideration is which band to use as the base band. This can depend on the input bands used. When fusing visible or short IR bands with mid or long IR bands, it is best, in our opinion, to use the visible band as the base for the fusion, since the background has higher resolution in the visible bands; if the IR band is taken as the base band, the resolution of the fused image will be that of the IR band. For the task of target detection, it is best to choose the base band in which the target is more salient. The edge information from the other band can then further enhance the target's salience.
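A minimal sketch of this edge fusion scheme, assuming Sobel edge extraction as described above, is shown below. The function names and the weight parameter are illustrative assumptions of ours, not values taken from the thesis.

```python
import numpy as np
from scipy import ndimage

def sobel_magnitude(band):
    """Sobel gradient magnitude of a grayscale band (float array)."""
    gx = ndimage.sobel(band, axis=1)
    gy = ndimage.sobel(band, axis=0)
    return np.hypot(gx, gy)

def edge_fuse(base_band, other_bands, weight=0.5):
    """Overlay Sobel edge information from the other bands onto the base band.

    `weight` controls how strongly the edge maps are emphasized; it is an
    illustrative parameter, not a value specified in the text.
    """
    fused = base_band.astype(float).copy()
    for band in other_bands:
        edges = sobel_magnitude(band)
        edges /= edges.max() + 1e-12          # normalize each edge map to [0, 1]
        fused += weight * edges               # add the edge intensity to the base image
    return np.clip(fused, 0.0, 1.0)
```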
3 Target Detection
To quantitatively test target detection in a natural scene, many studies show observers a scene with or without a target present, and ask the observer to find the target as quickly as possible if it exists or to decide that the target does not exist (Krebs and Sinai, 2002, McCarley and Krebs, 2000, Toet and Franken, 2002). Detection time and accuracy of detection are usually measured. In our images, more than one target is present in each scene, so a different approach is needed. As in previous studies (Maltz and Shinar, 2003), our observers' task was to find as many targets as possible as quickly as possible while avoiding false alarms. The accuracy and number of detections, as well as the time taken to find the targets, were measured.
3.1 Method
3.1.1 Participants
Participants in the experiment were 56 students from the Department of Industrial Engineering and Management at Ben-Gurion University with no experience in target detection. The average age of the participants was 25.5, with a standard deviation of 2.3. All participants had normal or corrected-to-normal vision (minimum 6/9 Snellen visual acuity), were checked to have normal contrast vision using the Pelli-Robson contrast chart, and reported having no color vision deficiency. All participants were naïve to the purpose of the experiment.
3.1.2 Apparatus
Stimuli were presented to the participants on a color 17” LCD screen via a Pentium 4 PC. The display resolution was 1024x768 with a frame refresh rate of 85 Hz. Participant-to-screen distance was approximately 40 cm. The experiment took place in a small, quiet, darkened room to minimize peripheral distraction. Observers were instructed to adjust their distance to the computer and to place the mouse in their preferred position, to enable them to view the screen clearly and to easily manipulate the mouse.
3.1.3 Stimuli
Stimuli for this experiment were eight multispectral images with different input bands. Each image had a different combination of two or three short, medium, or long IR input bands. The images were carefully chosen out of twenty-six "base" multispectral images to represent different input bands and to present a different scene in each image. Each scene contained two to five military vehicles in a complex rural background. During post-processing each image was spatially registered. The registered images were then normalized in contrast using adaptive histogram equalization (Gonzalez and Woods, 1992). Each image was then fused using the four fusion methods described previously (false color, principal components, edge, and average). Figures 3-1 to 3-6 display an example of the imagery used. Figures 3-1 and 3-2 show the two original bands of a multispectral image used in the experiment after contrast normalization. Figures 3-3 to 3-6 display the results of the average, edge, false color, and principal components fusion methods, respectively. This image has three targets, which are marked with arrows in figure 3-1.
3.1.4 Procedure
The observers were randomly assigned to four subject groups; grouping was based on fusion method (false color, principal components, edge, and average). Each observer saw images fused with his group's fusion method. The observer was requested to find and click as quickly as possible on all the targets present in the scene while avoiding false alarms. Preliminary explanations were presented to the observer, as well as an image containing patches of targets from images fused by the subject's fusion method. In addition, one example was shown as a familiarization exercise before the start of the experiment trials. Each trial started with a dialog window prompting the user to indicate when he was ready to start the trial. After 20 seconds, the trial ended and a new dialog window was presented, starting a new trial. Altogether there were eight trials for each observer. In each trial, a different scene was presented to the observer. The order of the images was the same for each observer. Performance was tied to speed as well as accuracy in the task. To motivate the observers, a cash prize of 100 NIS was promised to the observer with the best performance.
Figure 3-1 – original band 1 with the three targets marked
Figure 3-2 – original band 2
Figure 3-3 – fused image using Average method
Figure 3-4 – fused image using Edge method
Figure 3-5 – fused image using false color method
Figure 3-6 – fused image using principal components method
3.2 Results
The independent variable was the fusion method: average, edge, false color (FC), and principal components (PC). The dependent variables measured were the number of hits (correct detection of a target), the number of false alarms (clicking the mouse on a non-target), and the detection time. For each subject, the variables (for example, number of hits) were summed or averaged over all images shown to the subject. We are interested in how the fusion methods affected the different dependent variables. The comparison between fusion methods was done using a one-way analysis of variance (ANOVA).
3.2.1 Total hits
The number of times the observer clicked the mouse on a target, out of the number of targets in each image, was averaged over all images shown to each observer. In a target detection task, we prefer methods that increase the number of hits. The results of the analysis, with the hit percentage average as the dependent variable, are shown in figure 3-7. As can be seen in the graph, the principal components and the false color methods showed the best results for the number of detections, the average method had a lower number of detections, and the edge method showed the worst results.
Figure 3-7 – Effect of fusion method on the hit percentage average (vertical bars denote 0.95 confidence intervals)
The analysis showed a significant effect of fusion method {F(3,52) = 7.394; P < 0.01}. A post-hoc Fisher LSD analysis indicated that the edge method was responsible for this effect: all pairwise comparisons of the edge method with the other methods produced a significant effect (see Appendix 3). No significant differences were found between the other methods. In order to examine the interaction between the fusion method and the image quality, we asked five people to rate the general image quality. We then divided the images into two groups of high and low quality based on this rating. It was found that all of the images rated as good by the viewers had one of their input bands from the near infrared spectrum, which can explain why people perceived them as better. A two-way analysis of variance was conducted in order to examine, across all participants, the influence of the image quality and fusion method on the probability of a hit. A significant effect was found for image quality {F(1,52) = 12.216; P < 0.01}, indicating that there was a significant difference between the two groups. In addition, a significant interaction effect was found between the fusion method and the image quality {F(3,52) = 3.87; P < 0.05}. Figure 3-8 displays this interaction. Means of the hit percentage rate are presented for the fusion methods for high and low quality images.
Figure 3-8 – Hit percentage – fusion method X image quality. Means for high and low image quality in each fusion method (vertical bars denote 0.95 confidence intervals)
As shown in figure 3-8, the interaction between the quality and the fusion method stems mainly from the difference in the average method. While the probability of hit increased from the low to the high quality images for the PC, FC, and edge methods, it decreased for the average method. Also, the PC and FC methods were the best methods for the high quality images, while the average method was better for the low quality images.
To further investigate this interaction, we performed a one-way ANOVA on each quality group separately. The low quality group showed a main effect of the method variable {F(1,52) = 10.83; P < 0.01}. A post-hoc Fisher LSD analysis showed that this effect stems from the difference between the edge method and all other methods (see Appendix 3). There was also a significant main effect for the high quality group {F(1,52) = 3.98; P < 0.05}. The post-hoc Fisher LSD analysis showed that this effect stems from the difference between the edge method and the PC and FC methods. Furthermore, the difference between the average and the PC methods was marginally significant (p = 0.056). The detailed analysis of these results is presented in Appendix 3.
3.2.2 False alarms
A false alarm was recorded when the participant clicked on the image, thinking there was a target, in a place with no actual target. The number of false alarms was summed over all images for each participant. We wished to identify which methods not only give a better detection rate (probability of hit) but also have a minimal error rate (number of false alarms). Figure 3-9 displays the false alarm means of the different groups. More mistakes were made in the false color and edge groups, while the principal components and the average groups showed fewer false alarms. However, these differences were not statistically significant {F(3,52) = 0.4283; NS}.
Figure 3-9 – Mean number of false alarms per fusion group
3.2.3 Detection time
Detection time is the time taken to detect a target. Better fusion methods will help the observer detect the targets and will therefore yield faster detection times. There may be targets which can be detected with all fusion methods, but some fusion methods emphasize a target's features, causing the target to be detected faster. The detection time from the last click (or from the start of a new image, for the first click) was measured for all hits and averaged over all images for each participant. Figure 3-10 displays the mean target detection time of each group. The time taken to detect a target using the false color method was higher than in the other groups, which had similar detection times. A one-way ANOVA with the average detection time as the dependent variable did not show a significant effect {F(3,52) = 0.3125; NS}.
Figure 3-10 – Mean target detection time per fusion group
3.3 Discussion
The results for the number of hits showed the best performance for the principal components and false color methods, and the worst for the edge method; however, only the lower hit rate of the edge method was found to be statistically significant. When looking at the image quality factor, we see that the PC and FC methods showed the best hit percentage results for images in the good quality group. The common denominator for this group, in contrast to the bad quality group, was that all images in this group had a short wave IR band as one of the input bands. The short wave IR frequencies, being close to the visible frequencies, show a more regular representation of the scene, and therefore images fused with a short wave IR band as an input band were rated better. We can therefore conclude that when fusing a short wave IR band with a medium or long wave IR band, it is best to use the PC or FC methods. Because the difference between the groups was the visibility, we hypothesize that fusion of bands from the visible spectrum with medium or long IR bands will act in the same way. Nevertheless, investigating this issue is a topic for further research. When fusing medium and long IR bands, the PC, FC, and average methods showed about the same hit percentage results, while the edge method showed significantly lower results. The results for false alarm rates showed the best results (fewest false alarms) for the principal components and average methods, but the differences were not statistically significant. The results for target detection time showed the worst results for the false color method, but these differences were also not statistically significant.
4 Paired Comparisons
In a paired comparisons experiment, conducted according to the guidelines of the method of paired comparisons (Thurstone 1918, in Torgerson, 1967), several subjects compare two image samples to each other. The percentage of time one sample is preferred over the other is used as an index of the relative quality of the two samples. This method of pairwise comparisons generates reliable data about the relative subjective quality of the two images (Silverstein and Farrell, 2000). By comparing all combinations of methods, we can then scale all methods with scalar values on one single scale. By applying this procedure, we can put the participants' subjective assessments of the benefit of the different fusion methods for the target detection task on one comparable scale.
4.1 Method
4.1.1 Participants
Participants in the experiment were the same 56 students from Ben-Gurion University who participated in the target detection experiment. They first performed the target detection experiment, and then the paired comparisons experiment. The target detection experiment was performed first so that the participants would not have prior knowledge of the targets' locations; in the present experiment the targets were shown to the participants, so the target detection experiment did not affect it. The average age of the participants was 25.5, with a standard deviation of 2.3. They all had normal or corrected-to-normal vision (minimum 6/9 Snellen visual acuity) and were checked to have normal contrast vision using the Pelli-Robson contrast chart. All participants reported having normal color vision. All participants were naïve to the purpose of the experiment.
4.1.2 Apparatus
Stimuli were presented to the participants on a color 17” LCD screen via a Pentium 4 PC. Display resolution was 1024x768 with a frame refresh rate of 85 Hz. Participant-to-screen distance was approximately 40 cm. The experiment took place in a small, quiet, darkened room to minimize peripheral distraction. Observers were instructed to adjust their distance to the computer and to place the mouse in their preferred position, to enable them to view the screen clearly and to easily manipulate the mouse.
4.1.3 Stimuli
Stimuli were twenty-six "base" multispectral images with different input bands (visible, short IR, mid IR, and long IR) obtained from the Israeli Ministry of Defense. Some images showed the same scene but had different input bands; altogether there were nine different scenes. Each scene contained two to five military vehicles in a complex rural background. During post-processing each image was spatially registered. The registered images were then normalized in contrast using adaptive histogram equalization (Gonzalez and Woods, 1992). Each image was fused using the four fusion methods described previously (false color, principal components, edge, and average).
4.1.4 Procedure
We used a qualitative experimental setting according to the guidelines of the method of paired comparison (Silverstein and Farrell, 2000). We chose nineteen multispectral images (seven images were screened out due to bad registration) and fused them using the four fusion methods mentioned (principal components, false color, edge, average). We organized the fused images into pairs, each containing the exact same multispectral image fused with two different fusion methods. All possible combinations of methods between two images of the same multispectral image were used. Preliminary explanations and three example pairs were shown as a familiarization exercise before the start of the experiment trials. Then, each observer judged 114 pairs of images (19 images X 6 comparison combinations of 4 fusion methods) as they were displayed side by side on the screen by a simple Visual Basic application. The order of presentation of the image pairs for each observer, as well as the side on which each method appeared in each comparison (right or left), was random. The observers' task was to choose which of the two images was preferable for detecting the embedded target by clicking with the computer mouse directly on the preferred image (not necessarily on the target). In each trial, the target was marked by a surrounding circle during the first second of the trial to indicate the target to the observer. An example of a paired comparison with a marked target, as presented to the observers, can be seen in figure 4-1. Immediately after the decision was made, the next pair of images appeared on the screen for consideration. Observers were allowed as much time as necessary to make their decisions.
Figure 4-1 – Example of a paired comparison between the average and principal components methods
4.2 Results + Discussion
One of the common methods to analyze experiments performed according to the method of paired comparisons is the Law of Comparative Judgment (LCJ). The law of comparative judgment gives us the analytical tools to quantify the proportion of time any given stimulus j is judged greater on a given attribute than any other stimulus k. Applying this law in the current research gives us a set of equations relating the proportion of time an image I(i,j) is judged better than another image I(i,k) in terms of ease of detecting the marked target, where i is the image index and j, k are the fusion method indices. Given a set of samples, each representing a different fusion method, that are judged against each other pairwise across a set of subjects, an N x N matrix can be compiled, where N is the number of fusion methods. Each element Cij represents the number of times fusion method i was judged better on the criterion checked (ease of detecting the marked target) than method j. The matrix summing the results of all subjects, each subject judging 19 images, is shown in table 4-1. Cell Cij (where i is the row and j is the column) represents the number of times method i was preferred over method j. For example, the False Color method was preferred over the Principal Components method in 617 separate comparisons.
Method                  Principal Components   False color   Edge   Average
Principal components    -                      447           525    475
False Color             617                    -             651    578
Edge                    539                    413           -      470
Average                 589                    486           594    -

Table 4-1 – Sum of comparative judgments of all images. Cell Cij represents the number of times method i was preferred over method j
The matrix showing the percentage of judgments in which each method was preferred is presented in table 4-2. Each cell Cij in table 4-2 represents the overall percentage of times method i was preferred over method j. For example, the False Color method was preferred over the Principal Components method 57.99% of the time. The matrix is symmetrical about the diagonal in the sense that Cij and Cji sum to 100%, because each comparison presented to the user forces a decision between the two methods.
Method                  Principal Components   False color   Edge    Average
Principal components    -                      42.01         49.34   44.64
False Color             57.99                  -             61.18   54.32
Edge                    50.66                  38.82         -       44.17
Average                 55.36                  45.68         55.83   -

Table 4-2 – Percentage of comparative judgments of all images. Cell Cij represents the percentage of times method i was preferred over method j
Thurstone's case V method of comparative judgment (Thurstone 1959, in Silverstein and Farrell 2001) can be applied to determine the relative qualities of all the samples if:
a. Each sample has a single value that describes its quality qi
b. Each observer estimates the quality of this sample with a value from a normal distribution around this quality
c. Each sample has the same perceptual variance
d. Each comparison is independent
In the current experiment, we assume that the quality of each fusion method can be quantified in a single value. In addition, we can also assume that observers' estimation of this quality is normally distributed. The different fusion methods use the same input bands and the same images; therefore we can also assume that they have the same perceptual variance. Finally, according to the experimental settings, each comparison is independent.
Given these assumptions, the quality of each sample i can be described by a scalar value qi with units of standard deviations of preference. The distance between two samples, d′ij, can be estimated by the following equation:

(1)  d′ij = qi − qj ≈ √2 · Z[ Cij / (Cij + Cji) ]
where Z is the inverse cumulative-normal function (Z-score).
Using equation 1, we can calculate the distance between each pair of methods in standard-deviation-of-preference units. The resulting distances between the methods are presented in table 4-3. A positive value in cell Cij indicates that method i is preferable over method j, while a negative value indicates that method j is preferable over method i. The table's diagonal antisymmetry stems from the fact that the distance between methods A and B is the same, with opposite sign, as the distance between methods B and A.
Method                  Principal Components   False color   Edge     Average
Principal components    -                      -0.285        -0.023   -0.190
False Color             +0.285                 -             +0.402   +0.153
Edge                    +0.023                 -0.402        -        -0.207
Average                 +0.190                 -0.153        +0.207   -

Table 4-3 – Distance between each pair of methods in standard-deviation-of-preference units
Based on the assumption that each sample has a single value that can describe its quality qi, all samples can be placed on a one-dimensional quality line. We can estimate the distance of a sample from the mean of all samples by taking the mean distance between that sample and all other samples. This can be described by the following equation:

d′i,mean ≈ ( Σj d′ij ) / N
Using this equation, we can calculate the scale values of the fusion methods. This scale is presented numerically in table 4-4 and graphically in figure 4-2.
Method        Principal Components   False color   Edge     Average
Scale value   -0.166                 +0.28         -0.195    +0.244

Table 4-4 – Scale value of the different fusion methods
Figure 4-2 – Scale value by the law of comparative judgment for the different fusion methods (Principal components, False color, Edge, Average)
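As an illustration of the scaling procedure, the short Python sketch below converts the preference counts of Table 4-1 into Case V distances (equation 1) and per-method scale values. It is our own hedged reconstruction rather than the analysis code used in this study: the use of scipy for the inverse cumulative normal and the convention of averaging each method's distances over the other three methods are assumptions, so the printed values may differ slightly from Table 4-4 depending on the exact averaging and rounding conventions.

```python
import numpy as np
from scipy.stats import norm

methods = ["Principal components", "False color", "Edge", "Average"]
# counts[i, j] = times method i was preferred over method j (Table 4-1)
counts = np.array([
    [  0, 447, 525, 475],
    [617,   0, 651, 578],
    [539, 413,   0, 470],
    [589, 486, 594,   0],
], dtype=float)

# proportion of comparisons in which method i was preferred over method j;
# the identity added to the denominator only prevents division by zero on the diagonal
p = counts / (counts + counts.T + np.eye(len(methods)))

# Case V distances: d'_ij = q_i - q_j ~ sqrt(2) * Z(p_ij); diagonal forced to zero
p_safe = np.where(np.eye(len(methods), dtype=bool), 0.5, p)
d = np.sqrt(2.0) * norm.ppf(p_safe)

# scale value of each method: mean distance to the other methods (assumed convention)
scale = d.sum(axis=1) / (len(methods) - 1)
for name, value in zip(methods, scale):
    print(f"{name:22s} {value:+.3f}")
```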
The results show, with very slight differences, that the false color and average methods give the best results, while the principal components and the edge methods give worse results. There is almost no difference between the false color and the average methods on the overall scale, but looking at the specific comparison (table 4-2) we can see that the false color method is preferable over the average method. There is almost no difference between the principal components and the edge methods, neither on the overall scale nor in the specific comparison.
5 Eye Tracking
Eye tracking is a technology used to determine where a person is looking. A special camera captures the person's eye and tracks its movements. Cognitive scientists record eye movements to understand the cognitive processes that occur when an observer is searching for a target. Eye movement data can indicate where the eye fixated when searching a scene (Maltz and Shinar, 1999, Findlay, 1997), helping us to understand the search process. The eye does not usually move smoothly across a scene; it moves via a series of jumps called saccades. Saccades are rapid motions causing a desired portion of the scene to fall on the fovea. Once a saccade starts it is not possible to change its destination or path, and the visual system is largely suppressed during the saccade. Usually, a saccade is followed by a fixation. As a person looks at a scene, he or she takes in the information through a series of fixations, pauses of the eye lasting between 200-600 ms while the observer examines part of the stimulus. Eye movements propel the eye from one fixation to the next. These eye movements are necessary if we are to see all of the details of the scene, because a single fixation would reveal only the details near where we are looking. According to Hochberg (1970), these eye movements also have another purpose: the information they take in about the different parts of the scene is used to create a mental map of the scene through a process of integration. In our experiment, we used eye tracking to attempt to understand what drew the attention of the observers during the target detection task, and thus to compare the different fusion methods. We hypothesized that the better the fusion method, the more fixations the observer would make on the targets. We can also measure how many times the subject visited the target before making a decision, and the fixation duration until the decision.
5.1 Method
5.1.1 Participants
Participants in the experiment were 33 students from the Department of Industrial Engineering and Management at Ben-Gurion University with no experience in target detection. The average age of the participants was 24.7, with a standard deviation of 3.1. They all had normal or corrected-to-normal vision (minimum 6/9 Snellen visual acuity), and none wore eyeglasses. They were checked to have normal contrast vision using the Pelli-Robson contrast chart and reported having normal color vision. All participants were naïve to the purpose of the experiment.
5.1.2 Apparatus
Stimuli were presented to the participants on a color 17” LCD screen via a Pentium 4 PC. Display resolution was 1024x768 with a frame refresh rate of 85 Hz. Participant-to-screen distance was approximately 70 cm. The experiment took place in a small, quiet, darkened room to minimize peripheral distraction and to keep the participants' pupils from becoming too small. Observers were instructed to place the mouse in their preferred position, to enable them to view the screen clearly and to easily manipulate the mouse. Participants' eye movements were tracked using an Applied Science Laboratories (Bedford, MA) model 504 eye tracking device. The device utilizes the pupil-to-corneal reflection technique for measuring eye movements. In this technique, a pan-tilt camera is used to capture eye movements. Surrounding the camera's lens are infra-red LEDs that illuminate the eye, allowing the camera to record the pupil as a bright disk and to capture the corneal reflection. The system calculates eye position from the measured locations of the pupil and the corneal reflection. Because the pupil and the corneal reflection move differently with respect to each other as the angle of gaze changes, simultaneous tracking of the two helps the system distinguish between eye and head movements: head movements (the two features move together) can be distinguished from eye movements (the two features move apart). A magnetic head tracker (MHT) receiver translates head movement information received from the transmitter (located immediately behind the participant's head) and sensor (placed via a headband above the participant's dominant eye). The MHT helps the system track strong head movements that are beyond the scope of the pan-tilt camera. The system samples eye data at 50 Hz. The manufacturer claims a system accuracy of 1° (equivalent, at a distance of 70 cm, to 40 pixels) after calibration.
5.1.3 Stimuli
Stimuli for this experiment were based on the eight fused images used in Experiment 1. Composite images were prepared from the source images by applying two different fusion methods to a source image, such that the right side of the image was fused with one method and the left side with another. We used the principal components, false color, and average fusion methods, which showed the best results in the target detection experiment. Eight images were processed, producing six composite
stimuli for each image (3 fusion methods giving 3 paired combinations, each in two right/left configurations). An example of a composite image, compiled from the principal components method on the left side and the average method on the right side, is presented in figure 5-1.
Figure 5-1 – Example of a composite image. The left side of the image was fused using the PC method, and the right side was fused using the average method.
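For illustration, the following Python sketch shows one way such split-half composite stimuli could be assembled from two fused versions of the same registered scene. The function name and the random placeholder images are ours, not part of the original experimental software.

```python
import numpy as np

def make_composite(left_fused, right_fused):
    """Split-half composite: left half from one fused image, right half from another.

    Both inputs are fused images of the same registered scene with identical shape.
    """
    if left_fused.shape != right_fused.shape:
        raise ValueError("fused images must have the same shape")
    mid = left_fused.shape[1] // 2
    composite = right_fused.copy()
    composite[:, :mid] = left_fused[:, :mid]   # take the left half from the first method
    return composite

# placeholder arrays standing in for PC-fused and average-fused versions of one scene
pc_image = np.random.rand(768, 1024, 3)
avg_image = np.random.rand(768, 1024, 3)
stim_pc_left = make_composite(pc_image, avg_image)    # PC on the left, average on the right
stim_pc_right = make_composite(avg_image, pc_image)   # counterbalanced configuration
```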
5.1.4 Procedure
The observers were randomly assigned to three subject groups; grouping was based on a unique combination of two of the three fusion methods (principal components – false color, principal components – average, false color – average). Within each group, subjects were randomly assigned to one of two sub-groups, I or II. Sub-group I was presented with images 1, 3, 5, and 7 with the first fusion method on the right side and the second fusion method on the left side (for example, the left side of the image fused with the average method and the right side with the PC method), and with images 2, 4, 6, and 8 with the first fusion method on the left side and the second fusion method on the
right side (PC on the left and average on the right). The opposite right/left configuration was presented to sub-group II. This was done to neutralize any tendency of an observer to look at one side more than the other. The course of the experiment was similar to the target detection experiment. The observer was requested to find and click as quickly as possible on all the targets present in the scene while avoiding false alarms. Preliminary explanations were presented to the observer, as well as an image containing patches of targets from images fused by the subject's fusion methods. In addition, one image was shown as a familiarization exercise before the start of the experiment trials. Each trial started with a dialog window prompting the user to indicate when he was ready to start the trial. After 20 seconds, the trial ended and a new dialog window was presented, starting a new trial. Altogether, there were eight trials for each observer; in each trial a different composite image of a different scene was presented to the observer. The order of the images was identical for all observers. Observers were told to perform the target acquisition as quickly as possible, and the performance score was tied to speed as well as accuracy in target detection. Observers' eye movements during the target search were recorded, as well as target detection time and accuracy. Eye movement data were analyzed using the ASL Eyenal software bundled with the system.
5.2 Results
We analyzed each group separately (Average-PC, Average-FC, PC-FC). In each group, each participant viewed all eight composite images.
Since the images were not balanced target-wise, for each image we compared measurements of the participants who viewed the left side fused with one method with those of the participants who viewed the left side fused with the other method. The same comparison was performed for the right side of the image. In this way, we compared performance in the same areas of an image for the different methods. In addition, by comparing sides of the images, we were able to avoid skewing of the results by any subject's tendency to stray to the right or to the left side of the display. We also analyzed the number and duration of fixations on the targets for the two fusion methods. The dependent variables were the number of fixations, the total fixation duration, and the mean fixation duration for each image half. The independent variables were the image (1-8), the side of the image (left, right), and the fusion method used (Average, Principal Components, and False Color).
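As an illustration of how these per-half measures can be derived from fixation records, the sketch below groups fixations by image and image half and computes the fixation count, total duration, mean duration, and fixation count percentage. The column names (image, x, duration_ms) and the 1024-pixel image width are illustrative assumptions and do not reflect the actual Eyenal export format.

```python
import numpy as np
import pandas as pd

def half_metrics(fixations, image_width=1024):
    """Per-image-half fixation measures from a table of fixation records.

    `fixations` is expected to hold one row per fixation with columns
    image, x (horizontal position in pixels), and duration_ms; these column
    names are illustrative, not the actual Eyenal export format.
    """
    fix = fixations.copy()
    fix["side"] = np.where(fix["x"] < image_width / 2, "left", "right")
    grouped = fix.groupby(["image", "side"])["duration_ms"]
    out = pd.DataFrame({
        "fixation_count": grouped.size(),
        "total_duration_ms": grouped.sum(),
        "mean_duration_ms": grouped.mean(),
    })
    # fixation count percentage: share of the image's fixations falling in each half
    per_image_total = out.groupby(level="image")["fixation_count"].transform("sum")
    out["fixation_count_pct"] = 100.0 * out["fixation_count"] / per_image_total
    return out
```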
5.2.1 Average vs. False Color
5.2.1.1 Fixation Count
A three-way analysis of variance (ANOVA) of fixation count percentage, with group as a between-subject variable and image and side as within-subject variables, was conducted in order to examine, across all participants, images, and sides of images, the influence of the fusion method on fixation count. Image (8) and side (2) were treated as repeated measures. Fixation count percentage is the percentage of fixations in an area out of the total number of fixations a user made in the image. Figure 5-2 displays means of fixation count percentage for the Average and FC methods on each side (left, right) over
all images. It can be seen that the FC method had more fixations overall, on both the left side and the right side of the images. There was a significant main effect of fusion method {F(1,124) = 15.811; p < 0.05}, and no significant interaction with image side.
Figure 5-2 – Fixation count percentage of the average and FC methods on each image side
The three-way ANOVA (group(2) X side(2) X image(8)) on the fixation count percentage yielded a marginally significant interaction between group and image {F(7,124) = 2.23; p = 0.058}. Figure 5-3 shows the fixation count percentage for each image for the average and FC fusion methods. As can be seen in the graph, images 1 and 4 do not follow the general pattern of the FC method having more fixations than the average method. Since only two of the eight images did not follow the general pattern, and since the interaction was not statistically significant, this indicates that the main effect of method stemmed from the difference between the means of the two groups.
Figure 5-3 – Effect of fusion method on the fixation count percentage for each image
The mean fixation duration is the average time a single fixation took. A three way ANOVA (image(8) X side(2) X method(2)) conducted on the mean fixation durations revealed no significant effect or interaction.
5.2.1.2 Target fixation time
The total amount of time fixated on the targets in each image was recorded. The time fixated on the targets tells us how salient the target was, even if the observer did not click on it with the mouse. We hypothesize that the better the fusion method, the more time the observer fixates on the target. A one-way ANOVA computed on the targets' summed fixation time yielded a marginally significant effect of method {F(1,34) = 3.03; P = 0.083}. Observers viewing the FC method averaged 0.62 seconds of fixation duration on a target, while observers viewing the average method averaged 0.33 seconds. These differences are presented in figure 5-4.
Figure 5-4 – Average sum of fixation durations on targets for the average and FC methods
5.2.2 Average vs. Principal Components
5.2.2.1 Fixation count
Figure 5-5 displays the means of the fixation count percentage on each image side for the average and PC fusion methods. It can be seen in the graph that the principal components method had more fixations than the average method on both sides. A three-way ANOVA (image(8) X side(2) X method(2)) on fixation count percentage showed significantly more fixations for the PC fusion method, with a mean fixation count percentage of 51.6% of the total number of fixations in the image, compared with 42.1% for the average fusion method {F(1,100) = 8.245; P < 0.05}.
Figure 5-5 – Fixation count percentage of the average and PC methods on each image side
The three-way ANOVA (group(2) X side(2) X image(8)) on the fixation count percentage yielded a significant interaction between group and image {F(7,100) = 8.745; p < 0.05}, showing that the effect of the fusion method on the fixation count percentage differed from image to image. Figure 5-6 displays this effect. As seen in the figure, images 1, 2, 4, 7, and 8 have more fixations for the PC method than for the average method, while images 4, 5, and 6 have about the same number of fixations for both fusion methods. The main effect between the methods stems from the overall difference, which is strongest in images 1 and 7. Because for each image the PC method showed the same number of fixations as or more fixations than the average method, we can relate the main effect to the difference between the two groups.
Figure 5-6 – Effect of fusion method on the fixation count percentage for each image
A three way ANOVA (image(8) X side(2) X method(2)) conducted on the mean fixation durations revealed no significant effect or interaction.
5.2.2.2 Target fixation time
A one-way ANOVA conducted on the targets' summed fixation time did not yield a significant effect of method {F(1,31) = 1.42; p = 0.204}, although the mean fixation duration on each target was greater for the PC fusion method (0.86) than for the average fusion method (0.53). The means for the two methods are presented in figure 5-7.
Figure 5-7 – Average sum of fixation durations on targets for the average and PC methods
5.2.3 False Color vs. Principal Components
5.2.3.1 Fixation count
A three-way ANOVA (image(8) X side(2) X method(2)) on fixation count percentage yielded a significant main effect of method {F(1,128) = 7.65; P < 0.05}, indicating that areas fused with the PC method received more fixations than areas fused with the FC method. Figure 5-8 presents the mean fixation count percentage for the PC and FC methods on each image side. The analysis revealed no significant interaction effects.
Figure 5-8 – Fixation count percentage of the PC and FC methods on each image side
A three way ANOVA (image(8) X side(2) X method(2)) conducted on the mean fixation durations revealed no significant effect or interaction.
5.2.3.2 Target fixation time
A one-way ANOVA conducted on the targets' summed fixation time yielded a marginally significant effect of method {F(1,51) = 3.26; p = 0.077}. The means for the two methods are presented in figure 5-9. We can see from the graph that the mean fixation duration on each target was greater for the PC method (0.87) than for the FC method (0.56). It is interesting to note that although this effect was only marginally significant, it follows the same tendency as the significant effect found for the fixation count.
Figure 5-9 – Average sum of fixation durations on targets for the FC and PC methods
5.3 Discussion
Mackworth and Morandi (1967) divided two color photographs into 64 square regions. The regions were then rated according to their informativeness by
a group of observers who were asked to rate the regions according to how easy it would be to recognize the region again. A group of viewers then examined the pictures and were asked to decide which image they preferred. The number of fixations in each region was found to be related to informativeness rating of the region. Henderson and Hollingworth (in Underwood, 1998) agreed with Mackworth and Morandi and argued that the total time a region is fixated is correlated with the number of fixations in that region. Because fixation density is higher for informative regions, not only the number of fixations, but also the amount of time viewers fixate on an area is dependent on the informativeness of that area; the more information the area contains, the more time a viewer will fixate on that area. (Henderson and Hollingworth 1998, in Underwood, 1998). We can therefore hypothesize that the more a viewer fixates on a region fused with a specific method, or on targets fused with that method, the more informative that region or that target would be. Thus, a fusion method which shows more fixations makes the features and objects in the image more informative, and therefore, the targets shown in that image would be more detectable.
When comparing the average and FC methods, image sides showing the FC method had significantly more fixations per image than identical image sides showing the average method. Targets in the false color fused areas had marginally significantly more fixations than targets in the average fused areas. When comparing the average and PC methods, areas showing the PC method had significantly more fixations per image than areas showing the average method.
Targets in the PC areas had more fixations than the targets in the average areas, although this effect was not significant. The comparison of the PC and the FC methods showed that image sides fused with the PC method had significantly more fixations per image than image sides fused with the FC method. Targets in the PC sides had more fixations than targets in the FC sides; this effect was marginally significant. A summary of these statistical results is presented in table 5-1.
Measure                      Avg. vs. FC   Avg. vs. PC   PC vs. FC
Fixation count percentage    FC(s)         PC(s)         PC(s)
Target fixation duration     FC(m)         PC            PC(m)
Average fixation duration    none          none          none

Table 5-1 – Summary of eye-tracking statistical results. (s) denotes a statistically significant result; (m) denotes a marginally significant result.
We can see in table 5-1 that for each comparison of two fusion methods, there is a significant effect for the number of fixations in one fused side. Fixations on the targets, on the other hand, while strongly following the tendency of that effect, show only a marginally significant effect. The main reason, in our opinion, that this tendency was only marginally significant despite the large difference between the group means, was the large differences between the subjects. Because of the complex task of searching IR images, not all observers were found to fixate on all targets, and on the more salient
targets, observers had a large variability in the number of fixations. In addition, according to the manufacturer, the eye-tracking system's accuracy is 1° (equivalent to 40 pixels at a distance of 70 cm). This error mainly affects small targets and can cause the data to be highly variable. The overall average fixation duration did not differ significantly in any of the comparisons. However, as can be seen in table 5-1, the fixation count percentage was significant in all comparisons. Thus, since there is no difference between the mean durations of individual fixations, we can conclude that the higher number of fixations caused a longer dwell time in the more fixated areas. Summarizing these results, the PC method had more fixations and dwell time than both the FC and average methods, FC had more fixations and dwell time than average, and the average fusion method had the smallest number of fixations. We can therefore conclude that the fusion method which causes the images to be most informative and most strongly attracts the eye is the principal components method, followed by the false color and the average methods.
6 Discussion
This research focused on comparing different multispectral fusion methods for a target detection task. The aim of this study was to take well known image fusion schemes and to design special qualitative and quantitative measures to compare them. In addition, we introduced a novel method for multispectral fusion based on edge detection, and compared this method to other known methods. With the increased availability of multispectral data in the last ten years, many image fusion schemes have been introduced. Human factors experiments have been used to show that these fusion methods improve human performance in various visual tasks over the use of the single input bands (Krebs et al., 1999, Essock et al., 1999, Krebs et al., 2001). Although it has been established that in many cases the use of fused imagery improves human performance, there have not been many studies that have tried to examine which fusion method should be used under which circumstances. The studies that did compare different fusion methods did so while trying to show their advantages over the input bands, and did not emphasize the differences between the fusion methods. We examined 26 different multispectral images and fused them using four different fusion methods. Three of the four methods were based on known methods used in many studies, and one was a novel method based on edge detection. A set of three experiments was then conducted in order to qualitatively and quantitatively compare the different methods, and to better understand the advantages and disadvantages of each method compared to the others.
In order to quantitatively examine which method is best, fused images of different scenes were presented to observers. Observers were assigned to different groups according to the fusion method. We compared detection accuracy and detection times of targets embedded in the fused images of the different fusion groups. In order to qualitatively compare the fusion methods, we conducted a paired-comparison experiment, in which several observers compared two fusion methods. Pairwise subjective comparisons of the same image fused with the different fusion methods were evaluated and transformed to scalar values on a single scale. The method of paired comparisons is widely applied in social and economic statistical research (Clarke et al., 1999, Huang and Stoll, 1996). Here, applied in a different context, we use it to compare different image samples. To gain further insight, eye movements were recorded while the observers searched for targets. We used a novel method to compare the different fusion methods: the images viewed by the observers were divided in two, the left side fused with one fusion method and the right side fused with another. By comparing the eye movement data of the same area of the same image under different fusion methods, we were able to check which method attracted the eye more and which method provided more visual information. The main motivation behind this experimental method was that it is hard to measure precise eye fixation locations on small areas of interest. The declared system accuracy of the manufacturer is 1°, which at a distance of 70 cm is equivalent to 40 pixels. Since this is the length of some of the small targets in our images, and about half the length of the bigger targets, it is hard to reliably measure eye fixations on the targets.
Thus, it is difficult to reliably define where the person is fixating. Our solution to this problem was to define bigger areas of interest (e.g., to divide the image into two parts), and to compare viewing patterns in the same areas of interest of the same images of different fusion methods. In this way, we were also able to measure immediate differences between fusion methods, since observers were able to look at two fusion methods simultaneously.
6.1 Result Summary
The results of the target detection experiment showed a significant difference in target hits: the edge method had significantly fewer hits than the other methods. In addition, the results showed some advantage of the PC and FC methods over the average method. Further investigation showed a difference between high quality and low quality images: the FC and PC methods were better for the high quality images, while the average, FC, and PC fusion methods were about the same, and better than the edge method, for the low quality group. In the paired comparison experiment, the results reflected the subjective opinions of the observers: in their opinion, the false color and average fusion methods helped them detect the targets better than the edge and the principal components methods did. The eye tracking experiment results showed a significant difference in the number of fixations and the dwell time observers spent looking at image sides and targets fused with each method. The results indicated that the PC method had more fixations than both the FC and the average methods, indicating that the PC method causes images to be more
visually informative than either the FC or the average methods. The FC method had more fixations than the average method. Table 6-1 ranks the fusion methods according to the results of the three experiments.

Experiment / method    Principal components   False color   Edge    Average
Target detection       1                      1             4       3
Paired comparisons     3                      1             3       1
Eye tracking           1                      2             -       3
Total                  5                      4             7(*)    7

Table 6-1 – Ranks of the four fusion methods for all experiments. 1-4 denotes the rank of the method in the experiment, where 1 is the highest performance and 4 is the lowest. * denotes a missing experiment
6.2 Results Discussion
As seen in table 6-1, the edge fusion method did not perform well. It ranked last in the target detection experiment, with significantly lower detection rates, and last, together with the PC method, in the subjective opinion of the observers. Because of these rankings we did not use this method in the eye-tracking experiment. Edge enhancement is a method which strengthens information about the edges of displayed objects: it finds the edge information of one input band and adds it to the other input band. A possible explanation for the low performance of the edge method is the low contrast of many of the input images used in our experiment. Edge enhancement works best if the contrast between a target and its surroundings is high, making the target's edges clear. Some of our images had low resolution. In these images, the contrast,
especially of the mid and long IR input bands, was low, which made it hard for the edge detection algorithm to find the exact edges of the targets. The detection of these targets in the other methods is based more upon the recognition of their shape than on their exact edges. In addition, the edge method uses features (edge information) which are extracted from one image and added to another image. Because of the low contrast in some of the images, these edge features were incomplete and may have added clutter to the edge image. According to the feature integration theory, in complex search tasks, where a target is defined by a conjunction of features, a serial attention process is needed to locate the target; thus, the more features the distractors have, the more time the search will take (Treisman and Gelade, 1980). Adding clutter to an image adds another feature layer which, when distributed all over the image, can decrease the observers' detection performance. Another possible explanation for the edge method's poor performance is the fusion level. While the other methods fuse the input bands at the pixel level, combining the representations of the target from all input bands, the feature-level edge method chooses one band and adds features (edge information) from the other bands. While the edge information adds to the contour of the target, the detection of a target depends on the recognition of its shape, not necessarily on the recognition of its contour. Figure 6-1 a-d displays two input bands of an image and the corresponding average and edge fused images. We can see that the edge fused image is based on band 1: this method takes the target's body almost completely from band 1 and adds information from band 2. The average method
on the other hand, takes the target’s body from both images, making the content of the target more uniform, and the shape of the target more salient.
[Figure 6-1: four image panels - (a) Band 1, (b) Band 2, (c) Average, (d) Edge]
Figure 6-1 – Example of a multispectral image. Panels (a) and (b) show the input bands, while (c) and (d) show the average- and edge-fused images, respectively. The target is marked by an ellipse in the input bands.
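The distinction between pixel-level and feature-level fusion can be made concrete with a short sketch. The code below is only an illustration under assumed details (Sobel gradients stand in for whatever edge detector was actually used, and the 0.5 weight is arbitrary); it is not the implementation used to build the stimuli.

```python
import numpy as np
from scipy import ndimage

def average_fusion(band1, band2):
    """Pixel-level fusion: every output pixel mixes both input bands."""
    return (band1.astype(float) + band2.astype(float)) / 2.0

def edge_fusion(base_band, edge_band, weight=0.5):
    """Feature-level fusion (illustrative): keep one band as the base image
    and add edge features (gradient magnitude) extracted from the other band."""
    base = base_band.astype(float)
    grad_x = ndimage.sobel(edge_band.astype(float), axis=1)
    grad_y = ndimage.sobel(edge_band.astype(float), axis=0)
    edges = np.hypot(grad_x, grad_y)
    edges /= edges.max() + 1e-9                  # normalize edge map to [0, 1]
    fused = base / (base.max() + 1e-9) + weight * edges
    return np.clip(fused / (fused.max() + 1e-9), 0.0, 1.0)
```

When the edge band has low contrast, the extracted edge map is weak and noisy, and adding it to the base band mainly contributes clutter, which mirrors the behavior discussed above.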
Other results, as seen in Table 6-1, indicate that over all experiments the FC method showed better performance than the average method. These results are consistent with the findings of Simard et al. (1999), who found that the FC algorithm is superior to the average algorithm.
The PC method showed better results than the average method in both the target detection and the eye-tracking experiments. Yet in the paired comparisons experiment, people preferred the average method over the PC method. It is interesting to note that although the quantitative experiments indicate that the PC method is better, people perceived the average method as better. A possible explanation is that the strong, unnatural colors the PC method generated led people to prefer the more natural-looking average representation. The false color method, while also using unnatural colors, used brighter ones.
Comparing the FC and the PC methods across the different experiments showed that the FC method was preferred in the paired comparisons experiment, the PC method showed better performance in the eye-tracking experiment, and the two performed about the same in the target detection experiment. Overall, we cannot prefer one method over the other. These results support the finding of McCarley and Krebs (2000), who did not find a substantial difference between the FC and PC methods.
Comparing our results to previous studies, we see that, similar to the finding of Simard et al. (1999), the FC fusion method was better than the average fusion method. Krebs et al. (2001, 2002) showed that the PC fusion method performed equal to or better than each of the input bands. We add to these findings by showing that the PC fusion method also performed better than the simple average fusion method. In addition, similar to the findings of McCarley and Krebs (2000), we did not find a significant difference between the FC and PC methods.
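For readers unfamiliar with the method, a minimal sketch of principal components fusion is given below. The component-to-channel mapping is an arbitrary choice made for illustration; the mapping used to generate the experimental stimuli is not restated here.

```python
import numpy as np

def pc_fusion(band1, band2):
    """Illustrative principal components fusion of two co-registered bands.
    Each pixel's two band values are projected onto the principal components
    of the band covariance; the PC scores are then shown through color channels."""
    x = np.stack([band1.ravel().astype(float), band2.ravel().astype(float)])
    x -= x.mean(axis=1, keepdims=True)            # center each band
    _, eigvecs = np.linalg.eigh(np.cov(x))        # 2x2 covariance, ascending eigenvalues
    pcs = eigvecs[:, ::-1].T @ x                  # row 0 = PC1 scores, row 1 = PC2 scores

    def stretch(v):                               # rescale a component to [0, 1]
        return (v - v.min()) / (v.max() - v.min() + 1e-9)

    pc1 = stretch(pcs[0]).reshape(band1.shape)
    pc2 = stretch(pcs[1]).reshape(band1.shape)
    # Arbitrary example mapping: PC1 drives red, PC2 drives green, blue stays empty.
    return np.dstack([pc1, pc2, np.zeros_like(pc1)])
```

Because the components are statistical rather than physical quantities, the resulting colors need not correspond to anything in the scene, which is consistent with the "strong, unnatural colors" observers reported.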
6.2.1 Chromatic vs. achromatic fusion

Our results showed better performance for the chromatic fusion methods (FC, PC) than for the achromatic fusion methods (average, edge). These results are in accordance with other studies (e.g., Krebs et al., 2001; Waxman et al., 1996; Sinai et al., 1999; Toet et al., 1997) which have shown some advantage of color over achromatic fusion.
Color can help explain the results of the eye-tracking experiment, in which the FC and PC fusion methods performed better than the average method. Color adds another dimension to visual informativeness and thus guides the eye during visual search. Krebs et al. (2001) recorded the eye movements of observers viewing multispectral images. They found that chromatic fused images produced shorter scan paths to the target than achromatic fused images, and argued that color-fused targets provided more information and therefore helped observers find the target faster. As in the eye-tracking experiment, the color methods also performed better than the achromatic methods in the target detection experiment. These results are not surprising: Krebs et al. (2001) have shown that both spatial and color target attributes assist in visual search, and Essock et al. (1999) suggested that color in complex scenes aids the perceptual organization required for visual search.
It should be noted that while other studies, which have shown a possible contribution of color to fusion, help explain our results, we cannot claim that our study substantiates the advantages of chromatic fusion. Because the chromatic and achromatic fusion methods we used had different spatial content, the differences in performance can be attributed to the spatial content and not necessarily to the color.
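For comparison with the achromatic methods, a generic false-color mapping simply routes the input bands into different display channels. The sketch below uses one common, simple assignment chosen for illustration; it is not the specific mapping (such as that of Toet and Walraven, 1996) used for the stimuli.

```python
import numpy as np

def false_color_fusion(band1, band2):
    """Generic two-band false-color mapping, for illustration only:
    band 1 drives the red channel, band 2 the green channel, their mean the blue."""
    def stretch(v):
        v = v.astype(float)
        return (v - v.min()) / (v.max() - v.min() + 1e-9)

    red = stretch(band1)
    green = stretch(band2)
    blue = (red + green) / 2.0
    return np.dstack([red, green, blue])   # H x W x 3 image with values in [0, 1]
```

Note that such a mapping changes only how the band information is displayed, not which spatial detail survives fusion, which is why the spatial-content caveat above still applies.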
6.3 Limitations and future research

The input multispectral images we used were given to us as is. We could not control the input bands, the scenes, or the targets. This dictated parts of our experiment. For example, in the target detection experiment we showed each observer eight different images. It would have been better to show each observer more images, but the images we had contained only eight different scenes. Using more images, or alternatively more observers, might have allowed the trend in the observers' fixations on targets in the eye-tracking experiment to reach significance.
This study compared four different fusion methods. Of these four, three (PC, FC and average) have frequently been used in human factors research. Because of limited resources, we did not compare them to the opponent-processing fusion method or to any multiscale-decomposition-based method. Future studies could include these methods and compare them with the methods examined here to get a more complete picture of the different fusion methods.
The present work focused on the task of target detection. Other tasks, like situation awareness, recognition and identification, and different search tasks, will not necessarily follow the trends found in this study. Further research could compare different fusion methods for different visual perception tasks.
A suggestion for future research would be to develop a general predictive model of target detection performance based on fusion method. This study suggested three ways of comparing different fusion methods for perceptual tasks. Nevertheless, since there are many fusion methods, it is impossible to compare them all across all tasks and environmental conditions of the input images. The methods introduced here could be used as part of a model that analyzes the relative efficiency of a specific fusion method. Such a model should provide a way to rate a specific fusion method for different perceptual tasks and have clear guidelines regarding the characteristics of the input images and the targets. Building such a model would provide a consistent paradigm for comparing the existing fusion methods, as well as yet-to-be-developed fusion methods.
7. Conclusions

Four fusion methods were compared in this study over different input bands, using three different experiments, for a target detection task. The results indicated that the false color and principal components fusion methods showed the best results over all experiments and conditions. A novel method based on edge detection did not yield good results.
We introduced two new methodologies for comparing fused images. In the first, paired comparisons were used to subjectively compare fusion methods according to observers' preferences and then to place the quality of each method on a single scale of scalar values. In the second, a novel approach to using eye tracking for comparing different image samples was introduced, in which the images are divided into two areas and eye movements in each area are compared between different image samples. This approach helps solve the inherent problems of analyzing fixations on small targets in eye-tracking data.
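To illustrate the scaling step, the sketch below converts a matrix of paired-comparison preference counts into interval-scale values in the style of Thurstone's Case V (see Torgerson, 1967). The counts are hypothetical and the exact scaling variant used in the experiment may differ.

```python
import numpy as np
from scipy.stats import norm

def paired_comparison_scale(wins):
    """wins[i][j] = number of times method i was preferred over method j.
    Returns one interval-scale value per method, in the style of Thurstone's
    Case V; an illustration of the scaling idea, not the exact variant used."""
    wins = np.asarray(wins, dtype=float)
    totals = wins + wins.T
    p = np.where(totals > 0, wins / np.where(totals > 0, totals, 1), 0.5)  # preference proportions
    p = np.clip(p, 0.01, 0.99)          # keep z-scores finite for unanimous preferences
    z = norm.ppf(p)                     # proportion preferred -> standard normal deviate
    np.fill_diagonal(z, 0.0)
    return z.mean(axis=1)               # average deviate per method = scale value

# Hypothetical preference counts for four methods (Avg, Edge, FC, PC), illustration only.
wins = [[0, 30, 12, 20],
        [10, 0, 8, 15],
        [28, 32, 0, 24],
        [20, 25, 16, 0]]
print(paired_comparison_scale(wins))
```

The resulting values place all methods on one interval scale, so differences between methods, and not just their rank order, can be compared.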
There have been many studies showing the advantage of image fusion over the component bands. These studies have established that fusion is beneficial in many cases and have introduced numerous fusion methods. The natural next step was to ask which fusion method should be used under which circumstances. This study tried to give a first perspective on this question. To investigate this issue further, comparison studies like this one should be conducted between different methods, input bands, and tasks.
Ultimately, answers to this question will provide us with a more complete understanding of how to fuse multispectral images for various tasks.
8. References

1. Aguilar, M., Fay, D.A., Ross, W.D., Waxman, A.M., Ireland, D.B., Racamato, J.P. (1998). Real-time fusion of low-light CCD and uncooled IR imagery for color night vision. Proceedings of the SPIE, vol. 3364, pp. 24-35.
2. Burt, P.J., Kolczynski, R.J. (1993). Enhanced image capture through fusion. Proceedings of the 4th International Conference on Computer Vision, pp. 173-182. IEEE Computer Society.
3. Caefer, C.E., Rotman, S.R., Silverman, J., Yip, P.W. (2002). Algorithms for point target detection in hyperspectral imagery. Imaging Spectrometry VIII, Sylvia S. Shen (Ed.), Proceedings of SPIE, vol. 4816, pp. 242-257.
4. Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), 679-698.
5. Clarke, A., Bell, P.A., Peterson, G. (1999). The influence of attitude priming and social responsibility on the valuation of environmental public goods using paired comparisons. Environment and Behavior, 31(6), 838-857.
6. Endsley, M. (1998). Design and evaluation of situation awareness enhancement. Proceedings of the Human Factors Society 32nd Annual Meeting, pp. 97-101.
7. Essock, E.A., Sinai, M.J., McCarley, J.S., Krebs, W.K., DeFord, J.K. (1999). Perceptual ability with real-world nighttime scenes: image-intensified, infrared and fused-color imagery. Human Factors, 41(3), 438-452.
8. Findlay, J.M. (1997). Saccade target selection in visual search. Vision Research, 37, 617-631.
9. Fishbain, B. (2004). Data fusion methods for fusing thermal and visual range video sequences. Master's thesis, Tel Aviv University.
10. Goldstein, B.E. (1996). Sensation and Perception. Pacific Grove, CA: Brooks/Cole Publishing Company.
11. Gomez, R.B., Jazaeri, A., Kafatos, M. (2001). Wavelet-based hyperspectral and multispectral image fusion. Geo-Spatial Image and Data Exploitation II, William E. Roper (Ed.), Proceedings of SPIE, vol. 4383, pp. 36-42.
12. Gonzalez, R.C., Woods, R.E. (1992). Digital Image Processing. Addison-Wesley.
13. Hahn, M., Samadzadegan, F. (2004). A study of image fusion techniques in remote sensing. Proceedings of the XXth ISPRS Congress, Commission IV papers, Vol. XXXV, part B4.
14. Huang, R.D., Stoll, H.R. (1996). Dealer versus auction markets: a paired comparison of execution costs on NASDAQ and the NYSE. Journal of Financial Economics, 41, 313-357.
15. Hughes, P.K., Creed, D.J. (1994). Eye movement behaviour viewing colour-coded and monochrome avionic displays. Ergonomics, 37, 1871-1884.
16. Krebs, W.K., McCarley, J.M., Kozek, T., Miller, G.M., Sinai, M.S., Werblin, F.S. (1999). An evaluation of a sensor fusion system to improve drivers' nighttime detection of road hazards. Proceedings of the 43rd Annual Meeting of the Human Factors and Ergonomics Society, 43, 1333-1337.
17. Krebs, W.K., Scribner, D.A., McCarley, J.S. (2001). Comparing behavioral receiver operating characteristic curves to multidimensional matched filters. Optical Engineering, 40(9), 1818-1826.
18. Krebs, W.K., Sinai, M.J. (2002). Psychophysical assessments of image-sensor fused imagery. Human Factors, 44(2), 257-271.
19. Luo, R.C., Yih, C., Su, K.L. (2002). Multisensor fusion and integration: approaches, applications, and future research directions. IEEE Sensors Journal, 2(2).
20. Maltz, M., Shinar, D. (1999). Eye movements of younger and older drivers. Human Factors, 41, 15-25.
21. Maltz, M., Shinar, D. (2003). New alternative methods of analyzing human behavior in cued target acquisition. Human Factors, 45(2), 281-295.
22. McCarley, J.S., Krebs, W.K. (2000). Visibility of road hazards in thermal, visible, and sensor-fused night-time imagery. Applied Ergonomics, 31, 523-530.
23. Raviv, O., Rotman, S.R. An improved filter for point target detection in multi-dimensional imagery. Signal and Data Processing of Small Targets 2003, Sylvia S. Shen (Ed.), Proceedings of SPIE, vol. 5159, in print.
24. Sampson, M.T., Krebs, W.K., Scribner, D.A., Essock, E.A. (1996). Visual search in natural (visible, infrared, and fused visible and infrared) stimuli. Investigative Ophthalmology and Visual Science (Suppl.), 36, 1362, Ft. Lauderdale, FL.
25. Scheunders, P., De Backer, S. (2001). Fusion and merging of multispectral images with use of multiscale fundamental forms. Journal of the Optical Society of America A, 18(10), 2468.
26. Sinai, M.J., McCarley, J.S., Krebs, W.K., Essock, E.A. (1999). Psychophysical comparisons of single- and dual-band fused imagery. Proceedings of the SPIE Synthetic Advanced Vision, 3691, 1-8.
27. Silverstein, D.A., Farrell, J.E. (2001). Efficient method for paired comparison. Journal of Electronic Imaging, 10(2), 394-398.
28. Simard, P., Link, N.K., Kruk, R.V. (1999). Feature detection performance with fused synthetic and sensor images. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, 1108-1112.
29. Singh, S., Gyaourova, A., Bebis, G., Pavlidis, I. (2004). Infrared and visible image fusion for face recognition. Proceedings of SPIE, vol. 5404, 585-596.
30. Steele, P.M., Perconti, P. (1997). Part task investigation of multispectral image fusion using gray scale and synthetic color night vision sensor imagery for helicopter pilotage. Targets and Backgrounds: Characterization and Representation III, Wendell R. Watkins, Dieter Clement (Eds.), Proceedings of SPIE, vol. 3062, pp. 88-100.
31. Toet, A. (1992). Multiscale contrast enhancement with applications to image fusion. Optical Engineering, 31, 1026-1031.
32. Toet, A., Walraven, J. (1996). New false color mapping for image fusion. Optical Engineering, 35, 650-658.
33. Toet, A., IJspeert, J.K., Waxman, A.M., Aguilar, M. (1997). Fusion of visible and thermal imagery improves situational awareness. Proceedings of SPIE, 3088, 177-188.
34. Toet, A., Bijl, P., Kooi, F.L., Valeton, J.M. (1998). A high-resolution image dataset for testing search and detection models. Report TNO-TM-98-A020, TNO Human Factors Research Institute, Soesterberg, The Netherlands.
35. Toet, A., Franken, M. (2003). Perceptual evaluation of different image fusion schemes. Displays, 24, 25-37.
36. Torgerson, W.S. (1967). Theory and Methods of Scaling, chapter 10. John Wiley & Sons Inc.
37. Treisman, A.M., Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
38. Underwood, G. (2001). Eye Guidance in Reading and Scene Perception. Elsevier Science Ltd, Oxford.
39. Waxman, A.M., Gove, A.N., Seibert, M.C., Fay, D.A., Carrick, J.E., Racamato, J.P., Savoye, E.D., Burke, B.E., Riech, R.K., McGonagle, W.H., Craig, D.M. (1996). Progress on color night vision: visible/IR fusion, perception and search, and low-light CCD imaging. Proceedings of SPIE, 2736, 96-107.
40. Waxman, A.M., Aguilar, M., Baxter, R.A., Fay, D.A., Ireland, D.B., Racamato, J.P., Ross, W.D. (1998). Opponent-color fusion of multi-sensor imagery: visible, IR and SAR. Proceedings of IRIS Passive Sensors, Vol. 1, pp. 43-61.
41. White, B.L. (1998). Evaluation of the Impact of Multispectral Image Fusion on Human Performance in Global Scene Processing. Master's thesis, Naval Postgraduate School, Monterey, CA.
42. Wickens, C.D., Hollands, J.G. (2000). Engineering Psychology and Human Performance. Prentice-Hall Inc., New Jersey.
43. Xue, Z., Blum, R.S. (2003). Concealed weapon detection using color image fusion. Proceedings of the Sixth International Conference on Information Fusion, 622-627.
44. Zhang, Z., Blum, R.S. (1999). A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proceedings of the IEEE, 87(8), 1315-1326.
45. Zhang, Z., Blum, R.S. (1997). Multisensor image fusion using a region-based wavelet transform approach.
9. Appendices
Appendix 1 – Familiarization images that were given before the target detection experiment

[Example displays: Average, Edge, False Color, and the target detection screen]
Appendix 2 – Target detection statistical results

One-way ANOVA and Fisher LSD post-hoc tests of the effect of fusion method on total hit numbers
Univariate tests of significance for total hits (Spreadsheet13); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   10395.88  1                 10395.88  1155.979  0.000000
method      199.48    3                 66.49     7.394     0.000324
Error       467.64    52                8.99
LSD test; variable: total hits (Spreadsheet13). Probabilities for post-hoc tests (Error: Between MS = 8.9931, df = 52.000):

method              {1} average  {2} edge  {3} false color  {4} principal comp.
1 average           -            0.005474  0.191498         0.317984
2 edge              0.005474     -         0.000097         0.000271
3 false color       0.191498     0.000097  -                0.753954
4 principal comp.   0.317984     0.000271  0.753954         -
One-way ANOVA test of the effect of fusion method on false alarm numbers
Univariate tests of significance for false alarm (target detection base); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   3844.571  1                 3844.571  159.3695  0.000000
method      31.000    3                 10.333    0.4283    0.733528
Error       1254.429  52                24.124
[Plot: LS means of false alarm count for each fusion method (average, edge, false color, principal components). Current effect: F(3, 52)=.42835, p=.73353. Vertical bars denote 0.95 confidence intervals.]
One-way ANOVA test of the effect of fusion method on average detection time
Univariate tests of significance for avg. time to detection (target detection base); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   753.1859  1                 753.1859  538.6120  0.000000
method      1.3109    3                 0.4370    0.3125    0.816267
Error       72.7159   52                1.3984
[Plot: LS means of average time to detection for each fusion method (average, edge, false color, principal components). Current effect: F(3, 52)=.31248, p=.81627. Vertical bars denote 0.95 confidence intervals.]
Two-way ANOVA test of Quality X Method
Repeated measures analysis of variance (target detection full); sigma-restricted parameterization, effective hypothesis decomposition:

Effect            SS        Degr. of freedom  MS        F         p
Intercept         26.82101  1                 26.82101  1080.336  0.000000
method            0.75104   3                 0.25035   10.084    0.000024
Error             1.29098   52                0.02483
QUALITY           0.18824   1                 0.18824   12.216    0.000978
QUALITY*method    0.17893   3                 0.05964   3.871     0.014232
Error             0.80128   52                0.01541
Fisher LSD post-hoc analysis of the main effect of fusion method on good quality images
LSD test; variable: good quality (target detection full). Probabilities for post-hoc tests (Error: Between MS = .01841, df = 52.000):

method              {1} average  {2} edge  {3} false color  {4} principal comp.
1 average           -            0.293116  0.111453         0.056576
2 edge              0.293116     -         0.009805         0.004000
3 false color       0.111453     0.009805  -                0.742121
4 principal comp.   0.056576     0.004000  0.742121         -
Analysis of variance of the main effect of fusion method on bad quality images
Univariate tests of significance for bad quality (target detection full); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   15.75161  1                 15.75161  855.6592  0.000000
method      0.22020   3                 0.07340   3.9873    0.012467
Error       0.95725   52                0.01841
Appendix 3 – Paired comparisons decision time

The time taken to choose one image modality over the other can be taken as an indication of the observer's certainty in the choice: the longer it takes to choose between two methods, the less certain the observer is of his or her choice. The following graph shows the average decision time for each comparison pair. A one-way ANOVA was conducted to examine the effect of the comparison type on decision time; the graph and the analysis below show this test.
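A minimal sketch of this analysis is shown below, assuming the decision times have already been grouped by comparison pair. The values are hypothetical, and scipy's between-groups one-way ANOVA stands in for the repeated-measures analysis actually reported; it only illustrates the shape of the computation.

```python
from scipy import stats

# Decision times in seconds for each comparison pair (hypothetical values,
# shown only to make the structure of the analysis concrete).
decision_times = {
    "PC - FC":    [5.1, 4.8, 5.3, 5.0],
    "PC - Edge":  [4.2, 4.5, 4.0, 4.4],
    "PC - Avg":   [4.6, 4.4, 4.9, 4.5],
    "FC - Edge":  [4.1, 3.9, 4.3, 4.0],
    "FC - Avg":   [5.0, 5.2, 4.7, 5.1],
    "Edge - Avg": [3.6, 3.5, 3.8, 3.7],
}

f_stat, p_value = stats.f_oneway(*decision_times.values())
print(f"Comparison type effect: F = {f_stat:.3f}, p = {p_value:.5f}")
```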
[Plot: LS means of decision time for each comparison pair (PC - FC, PC - Edge, PC - Avg, FC - Edge, FC - Avg, Edge - Avg). Current effect: F(5, 275)=6.5253, p=.00001. Vertical bars denote 0.95 confidence intervals.]
LSD test; variable: DV_1 (Spreadsheet1). Probabilities for post-hoc tests (Error: Within MS = .40239, df = 275.00). Columns {1}-{6} are labeled var2-var7 in the original output; they correspond to the comparison pairs in the order plotted above (PC-FC, PC-Edge, PC-Avg, FC-Edge, FC-Avg, Edge-Avg):

pair           {1}       {2}       {3}       {4}       {5}       {6}
{1} PC - FC    -         0.005818  0.031995  0.000199  0.466980  0.000003
{2} PC - Edge  0.005818  -         0.533052  0.322034  0.041197  0.046104
{3} PC - Avg   0.031995  0.533052  -         0.107195  0.154704  0.009080
{4} FC - Edge  0.000199  0.322034  0.107195  -         0.002567  0.312696
{5} FC - Avg   0.466980  0.041197  0.154704  0.002567  -         0.000065
{6} Edge - Avg 0.000003  0.046104  0.009080  0.312696  0.000065  -
It can be seen from the graph and the post-hoc analysis that the decision times for the PC-FC and FC-Average pairs were the largest; these were the pairs that were hardest for the observers to decide between. The Edge-Average decision time, on the other hand, was significantly lower than most of the other decision times, so it was easier for the observers to choose between these two image types.
Appendix 4 – Eye tracking statistical results
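For orientation, the dependent variable in the tables below, fix_cnt_pct, is the percentage of fixations falling on a given side of the image. The sketch below shows how such a measure could be computed from raw fixation coordinates; the left/right split at the image midline and the data layout are assumptions made for illustration only.

```python
def side_fixation_percentages(fixations, image_width):
    """fixations: list of (x, y) fixation coordinates for one trial.
    Returns the percentage of fixations that fell on the left and right image
    halves (analogous to fix_cnt_pct in the tables below)."""
    if not fixations:
        return {"left": 0.0, "right": 0.0}
    left = sum(1 for x, _ in fixations if x < image_width / 2)
    total = len(fixations)
    return {"left": 100.0 * left / total, "right": 100.0 * (total - left) / total}

# Example with five hypothetical fixations on a 640-pixel-wide image.
print(side_fixation_percentages([(100, 50), (200, 80), (400, 90), (500, 60), (610, 70)], 640))
```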
Average-FC target comparison

Fixations on sides

Model dimension (dependent variable: fix_cnt_pct)
Effect              Number of levels  Covariance structure  Number of parameters  Subject variables
Intercept           1                 -                     1
grp                 2                 -                     1
pic                 8                 -                     7
side                2                 -                     1
grp * pic           16                -                     7
grp * side          4                 -                     1
grp * pic * side    32                -                     14
trial (repeated)    16                Diagonal              16                    sub
Total               81                                      48
Number of subjects: 10

Information criteria (smaller is better; dependent variable: fix_cnt_pct)
-2 Restricted Log Likelihood            993.557
Akaike's Information Criterion (AIC)    1025.557
Hurvich and Tsai's Criterion (AICC)     1030.641
Bozdogan's Criterion (CAIC)             1086.681
Schwarz's Bayesian Criterion (BIC)      1070.681
Type III tests of fixed effects (dependent variable: fix_cnt_pct)
Source              Numerator df  Denominator df  F         Sig.
Intercept           1             78.835          2104.252  .000
grp                 1             78.835          15.811    .000
pic                 7             31.708          .147      .993
side                1             78.835          14.157    .000
grp * pic           7             31.708          2.235     .058
grp * side          1             78.835          .019      .891
grp * pic * side    14            29.066          3.680     .001
Estimated marginal means
1. grp (dependent variable: fix_cnt_pct)
grp   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg   43.310  1.429       73.838  40.463        46.158
FC    51.531  1.494       76.860  48.556        54.507
2. grp * pic (dependent variable: fix_cnt_pct)
grp  pic  Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    48.394  6.677       15.771  34.223        62.565
Avg  2    42.520  4.774       11.882  32.106        52.934
Avg  3    44.012  2.317       15.796  39.094        48.929
Avg  4    47.792  4.952       18.312  37.400        58.183
Avg  5    40.986  3.258       15.739  34.071        47.901
Avg  6    38.382  2.366       9.035   33.033        43.731
Avg  7    44.789  2.801       15.929  38.848        50.730
Avg  8    39.607  3.060       9.737   32.764        46.449
FC   1    47.573  6.744       15.156  33.211        61.934
FC   2    53.064  5.302       12.889  41.600        64.528
FC   3    50.549  2.407       13.605  45.371        55.726
FC   4    41.688  4.973       15.808  31.134        52.241
FC   5    54.268  3.375       16.560  47.133        61.403
FC   6    56.258  2.478       10.638  50.782        61.733
FC   7    52.474  2.807       15.300  46.503        58.446
FC   8    56.377  3.656       18.905  48.723        64.032
3. grp * side (dependent variable: fix_cnt_pct)
grp  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  left   47.057  2.020       40.147  42.975        51.139
Avg  right  39.563  2.022       34.181  35.455        43.671
FC   left   55.563  2.113       42.955  51.301        59.826
FC   right  47.499  2.113       34.764  43.209        51.790
4. grp * pic * side (dependent variable: fix_cnt_pct)
grp  pic  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    left   51.626  9.542       7.911   29.580        73.672
Avg  1    right  45.162  9.342       7.865   23.554        66.769
Avg  2    left   41.391  5.737       6.828   27.755        55.026
Avg  2    right  43.650  7.633       6.274   25.169        62.130
Avg  3    left   46.746  2.980       7.786   39.840        53.651
Avg  3    right  41.278  3.549       8.319   33.148        49.407
Avg  4    left   45.942  6.690       7.183   30.203        61.680
Avg  4    right  49.642  7.303       11.536  33.658        65.626
Avg  5    left   50.303  3.734       14.739  42.332        58.275
Avg  5    right  31.669  5.339       8.022   19.363        43.975
Avg  6    left   45.262  4.549       7.779   34.719        55.805
Avg  6    right  31.503  1.302       6.759   28.403        34.603
Avg  7    left   40.095  4.125       9.134   30.783        49.406
Avg  7    right  49.484  3.791       6.851   40.479        58.488
Avg  8    left   55.094  5.645       7.209   41.825        68.364
Avg  8    right  24.119  2.363       9.799   18.839        29.399
FC   1    left   52.049  8.534       7.911   32.330        71.767
FC   1    right  43.097  10.445      7.865   18.939        67.255
FC   2    left   53.090  8.113       6.828   33.807        72.374
FC   2    right  53.038  6.827       6.274   36.508        69.568
FC   3    left   55.973  2.640       8.093   49.898        62.049
FC   3    right  45.124  4.026       7.847   35.807        54.441
FC   4    left   42.326  6.603       11.657  27.892        56.759
FC   4    right  41.049  7.439       6.714   23.306        58.793
FC   5    left   60.890  5.356       7.980   48.533        73.247
FC   5    right  47.646  4.107       12.818  38.760        56.532
FC   6    left   62.466  4.816       9.536   51.665        73.267
FC   6    right  50.049  1.168       6.677   47.261        52.838
FC   7    left   47.289  3.998       6.660   37.736        56.842
FC   7    right  57.660  3.940       9.086   48.760        66.559
FC   8    left   70.424  5.363       12.230  58.763        82.084
FC   8    right  42.331  4.970       7.303   30.677        53.986
Fixations on targets
Means by method; weighted means (FC - Avg targets). Current effect: F(1, 30)=3.2024, p=.08363. Effective hypothesis decomposition.
method  sum fix dur (mean)  std. error  -95% CI   +95% CI   N
1 Avg   0.319913            0.103035    0.111672  0.528154  23
2 FC    0.626895            0.113363    0.397780  0.856010  19

Univariate tests of significance for sum fix dur (FC - Avg targets); sigma-restricted parameterization, Type II decomposition:
Effect      SS        Degr. of freedom  MS        F         p
Intercept   1.493229  1                 1.493229  5.624364  0.024329
method      0.850213  1                 0.850213  3.202394  0.083631
image       0.726146  7                 0.103735  0.390727  0.900358
side        0.268973  3                 0.089658  0.337703  0.798176
Error       7.964786  30                0.265493
Average-PC target comparison

Fixations on sides

Model dimension (dependent variable: fix_cnt_pct)
Effect              Number of levels  Covariance structure  Number of parameters  Subject variables
Intercept           1                 -                     1
grp                 2                 -                     1
pic                 8                 -                     7
side                2                 -                     1
grp * pic           16                -                     7
grp * side          4                 -                     1
pic * side          16                -                     7
grp * pic * side    32                -                     7
trial (repeated)    16                Diagonal              16                    sub
Total               97                                      48
Number of subjects: 9

Information criteria (smaller is better; dependent variable: fix_cnt_pct)
-2 Restricted Log Likelihood            833.892
Akaike's Information Criterion (AIC)    865.892
Hurvich and Tsai's Criterion (AICC)     872.446
Bozdogan's Criterion (CAIC)             923.575
Schwarz's Bayesian Criterion (BIC)      907.575
Type III tests of fixed effects (dependent variable: fix_cnt_pct)
Source              Numerator df  Denominator df  F         Sig.
Intercept           1             65.980          1578.738  .000
grp                 1             65.980          16.061    .000
pic                 7             27.206          1.077     .405
side                1             65.980          21.745    .000
grp * pic           7             27.206          4.471     .002
grp * side          1             65.980          .202      .654
pic * side          7             27.206          4.500     .002
grp * pic * side    7             27.206          .541      .796
Estimated marginal means
1. grp (dependent variable: fix_cnt_pct)
grp   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg   42.159  1.601       32.949  38.901        45.417
PC    51.617  1.734       33.380  48.091        55.143

Pairwise comparisons (dependent variable: fix_cnt_pct)
(I) grp  (J) grp  Mean difference (I-J)  Std. Error  df      Sig.  95% CI for difference
Avg      PC       -9.458*                2.360       65.980  .000  -14.171 to -4.746
PC       Avg      9.458*                 2.360       65.980  .000  4.746 to 14.171
Based on estimated marginal means. * The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Tests of simple effect (dependent variable: fix_cnt_pct)
Numerator df  Denominator df  F       Sig.
1             65.980          16.061  .000
The F tests the effect of grp, based on the linearly independent pairwise comparisons among the estimated marginal means.
2. grp * pic (dependent variable: fix_cnt_pct)
grp  pic  Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    37.857  5.605       5.741   23.990        51.723
Avg  2    43.386  7.164       5.301   25.280        61.491
Avg  3    45.072  3.661       3.019   33.463        56.682
Avg  4    38.561  3.447       10.827  30.959        46.164
Avg  5    48.839  4.283       18.988  39.875        57.803
Avg  6    49.853  3.735       12.137  41.726        57.980
Avg  7    28.999  3.833       11.109  20.573        37.426
Avg  8    44.702  3.017       5.892   37.287        52.117
PC   1    59.105  3.045       4.700   51.124        67.086
PC   2    52.329  7.026       4.848   34.096        70.563
PC   3    38.482  5.933       13.978  25.754        51.210
PC   4    46.313  4.007       11.610  37.550        55.076
PC   5    48.655  4.287       9.122   38.976        58.334
PC   6    50.771  5.690       20.034  38.903        62.639
PC   7    64.074  4.583       11.116  54.000        74.147
PC   8    53.208  3.273       5.544   45.036        61.380
3. grp * side (dependent variable: fix_cnt_pct)
grp  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  left   47.131  2.336       28.207  42.348        51.913
Avg  right  37.186  2.191       34.689  32.736        41.637
PC   left   57.650  2.481       34.083  52.609        62.691
PC   right  45.584  2.423       26.213  40.606        50.562
4. grp * pic * side (dependent variable: fix_cnt_pct)
grp  pic  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    left   54.727  7.339       5.741   36.572        72.883
Avg  1    right  20.986  8.474       5.741   0.022         41.950
Avg  2    left   55.792  11.699      5.301   26.225        85.358
Avg  2    right  30.980  8.273       5.301   10.073        51.886
Avg  3    left   38.849  4.295       2.792   24.587        53.111
Avg  3    right  51.296  5.930       3.135   32.876        69.716
Avg  4    left   36.642  5.100       11.149  25.435        47.850
Avg  4    right  40.480  4.639       7.478   29.651        51.309
Avg  5    left   51.053  5.625       10.738  38.636        63.469
Avg  5    right  46.625  6.460       14.650  32.828        60.422
Avg  6    left   55.210  5.580       9.912   42.762        67.657
Avg  6    right  44.496  4.966       9.631   33.374        55.619
Avg  7    left   31.372  5.354       6.579   18.547        44.197
Avg  7    right  26.627  5.487       15.640  14.972        38.281
Avg  8    left   53.402  4.741       6.843   42.140        64.665
Avg  8    right  36.001  3.732       4.542   26.109        45.894
PC   1    left   72.446  4.604       4.700   60.380        84.512
PC   1    right  45.763  3.987       4.700   35.314        56.213
PC   2    left   69.492  8.113       4.848   48.438        90.547
PC   2    right  35.167  11.474      4.848   5.391         64.942
PC   3    left   30.523  10.209      10.634  7.959         53.087
PC   3    right  46.441  6.050       11.489  33.194        59.689
PC   4    left   52.833  5.736       7.613   39.489        66.178
PC   4    right  39.793  5.597       12.350  27.636        51.949
PC   5    left   55.430  6.405       11.928  41.466        69.394
PC   5    right  41.879  5.702       5.965   27.908        55.851
PC   6    left   51.775  7.836       10.410  34.409        69.142
PC   6    right  49.767  8.253       16.355  32.303        67.231
PC   7    left   65.936  7.047       12.980  50.709        81.163
PC   7    right  62.211  5.860       7.198   48.431        75.991
PC   8    left   62.768  4.200       3.832   50.903        74.633
PC   8    right  43.649  5.022       7.209   31.843        55.454
Fixations on targets
Means by method
method  sum fix dur (mean)  std. error  -95% CI   +95% CI   N
1 Avg   0.530550            0.185941    0.154748  0.906352  20
2 PC    0.862227            0.177288    0.503915  1.220540  22

Univariate tests of significance for sum fix dur (PC - Avg targets); sigma-restricted parameterization, effective hypothesis decomposition:
Effect      SS        Degr. of freedom  MS        F         p
Intercept   20.32201  1                 20.32201  29.38898  0.000003
method      1.15248   1                 1.15248   1.66668   0.204115
Error       27.65936  40                0.69148
FC-PC target comparison

Fixations on sides

Model dimension (dependent variable: fix_cnt_pct)
Effect              Number of levels  Covariance structure  Number of parameters  Subject variables
Intercept           1                 -                     1
grp                 2                 -                     1
pic                 8                 -                     7
side                2                 -                     1
grp * pic           16                -                     7
grp * side          4                 -                     1
pic * side          16                -                     7
grp * pic * side    32                -                     7
trial (repeated)    16                Diagonal              16                    sub
Total               97                                      48
Number of subjects: 10

Information criteria (smaller is better; dependent variable: fix_cnt_pct)
-2 Restricted Log Likelihood            1069.622
Akaike's Information Criterion (AIC)    1101.622
Hurvich and Tsai's Criterion (AICC)     1106.523
Bozdogan's Criterion (CAIC)             1163.254
Schwarz's Bayesian Criterion (BIC)      1147.254

Type III tests of fixed effects (dependent variable: fix_cnt_pct)
Source              Numerator df  Denominator df  F         Sig.
Intercept           1             105.293         1785.758  .000
grp                 1             105.293         7.651     .007
pic                 7             28.342          .957      .480
side                1             105.293         3.056     .083
grp * pic           7             28.342          .872      .540
grp * side          1             105.293         .105      .746
pic * side          7             28.342          1.653     .161
grp * pic * side    7             28.342          .254      .967
Estimated marginal means
1. grp (dependent variable: fix_cnt_pct)
grp   Mean    Std. Error  df       95% CI lower  95% CI upper
FC    42.724  1.530       105.293  39.691        45.758
PC    48.709  1.530       105.293  45.676        51.743

Pairwise comparisons (dependent variable: fix_cnt_pct)
(I) grp  (J) grp  Mean difference (I-J)  Std. Error  df       Sig.  95% CI for difference
FC       PC       -5.985*                2.164       105.293  .007  -10.275 to -1.695
PC       FC       5.985*                 2.164       105.293  .007  1.695 to 10.275
Based on estimated marginal means. * The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Tests of simple effect (dependent variable: fix_cnt_pct)
Numerator df  Denominator df  F      Sig.
1             105.293         7.651  .007
The F tests the effect of grp, based on the linearly independent pairwise comparisons among the estimated marginal means.
2. grp * pic (dependent variable: fix_cnt_pct)
grp  pic  Mean    Std. Error  df      95% CI lower  95% CI upper
FC   1    42.953  3.182       15.892  36.204        49.702
FC   2    39.980  5.630       15.988  28.045        51.915
FC   3    41.889  3.624       15.341  34.180        49.598
FC   4    38.072  4.299       12.402  28.739        47.404
FC   5    49.566  4.893       14.868  39.129        60.003
FC   6    45.705  4.631       15.688  35.872        55.538
FC   7    42.528  4.699       15.111  32.518        52.537
FC   8    41.103  2.986       15.981  34.771        47.434
PC   1    48.459  3.182       15.892  41.711        55.208
PC   2    52.093  5.630       15.988  40.158        64.028
PC   3    49.289  3.624       15.341  41.580        56.998
PC   4    40.156  4.299       12.402  30.823        49.488
PC   5    45.199  4.893       14.868  34.762        55.636
PC   6    51.769  4.631       15.688  41.936        61.602
PC   7    48.369  4.699       15.111  38.359        58.378
PC   8    54.342  2.986       15.981  48.011        60.674
3. grp * side (dependent variable: fix_cnt_pct)
grp  side   Mean    Std. Error  df      95% CI lower  95% CI upper
FC   left   44.264  2.328       53.997  39.597        48.932
FC   right  41.184  1.986       53.874  37.203        45.166
PC   left   50.952  2.328       53.997  46.284        55.619
PC   right  46.467  1.986       53.874  42.486        50.449
4. grp * pic * side (dependent variable: fix_cnt_pct)
grp  pic  side   Mean    Std. Error  df  95% CI lower  95% CI upper
FC   1    left   48.640  4.681       8   37.845        59.435
FC   1    right  37.266  4.310       8   27.327        47.205
FC   2    left   41.745  8.071       8   23.135        60.356
FC   2    right  38.215  7.851       8   20.111        56.319
FC   3    left   41.188  4.563       8   30.666        51.709
FC   3    right  42.590  5.631       8   29.606        55.575
FC   4    left   46.516  7.541       8   29.127        63.905
FC   4    right  29.627  4.129       8   20.105        39.150
FC   5    left   48.070  7.816       8   30.046        66.094
FC   5    right  51.062  5.888       8   37.484        64.640
FC   6    left   47.400  6.996       8   31.268        63.531
FC   6    right  44.011  6.070       8   30.014        58.008
FC   7    left   38.336  7.407       8   21.254        55.418
FC   7    right  46.720  5.784       8   33.382        60.057
FC   8    left   42.220  4.150       8   32.651        51.789
FC   8    right  39.985  4.296       8   30.079        49.891
PC   1    left   57.521  4.681       8   46.726        68.317
PC   1    right  39.397  4.310       8   29.458        49.337
PC   2    left   53.451  8.071       8   34.840        72.062
PC   2    right  50.735  7.851       8   32.630        68.839
PC   3    left   53.306  4.563       8   42.784        63.828
PC   3    right  45.272  5.631       8   32.288        58.257
PC   4    left   44.097  7.541       8   26.708        61.486
PC   4    right  36.215  4.129       8   26.693        45.737
PC   5    left   45.039  7.816       8   27.015        63.063
PC   5    right  45.359  5.888       8   31.781        58.937
PC   6    left   52.806  6.996       8   36.674        68.938
PC   6    right  50.732  6.070       8   36.735        64.729
PC   7    left   44.006  7.407       8   26.924        61.087
PC   7    right  52.731  5.784       8   39.394        66.069
PC   8    left   57.388  4.150       8   47.818        66.957
PC   8    right  51.296  4.296       8   41.390        61.202
Fixations on targets
Means by method; weighted means (FC - PC targets). Current effect: F(1, 51)=3.2628, p=.07677. Type II decomposition.
method  sum fix dur (mean)  std. error  -95% CI   +95% CI   N
1 PC    0.871857            0.164702    0.537142  1.206572  35
2 FC    0.562429            0.113947    0.328628  0.796229  28

Univariate tests of significance for sum fix dur (FC - PC targets); sigma-restricted parameterization, Type II decomposition:
Effect      SS        Degr. of freedom  MS        F         p
Intercept   33.97246  1                 33.97246  48.38103  0.000000
method      2.29107   1                 2.29107   3.26276   0.076772
image       4.63795   7                 0.66256   0.94358   0.481824
side        1.30799   3                 0.43600   0.62091   0.604704
Error       35.81146  51                0.70219
Appendix 5 – Examples of original and fused images. Targets are marked in the fused average image.

[Images: original band 1; original band 2; fused image using the average method; fused image using the edge method; fused image using the false color method; fused image using the principal components method]
Appendix 6 – List of images used in the experiments

Here is a list of the image names used in the experiments. The names are as they came on the disk from MAFFAT.

Image comparison:
R2_1_d1_d2, R2_1_d1_d3, R2_1_d1_d3_i1, R2_1_D4_1C4_I3, R2_1_D2_I1, R2_2_D1_D3_I1, R2_2_D1_I1, R2_2_D4_1C4_RI3, R4_1_1C2_1D2, R4_1_1C2_1I1, R4_1_1C2_2I1, R4_1_1D4_I3, R4_2_1C2_1D2, R4_2_1D4_1C4_L_S, R4_2_1D4_1C4_R_S, R4_2_2D4_3C4_LI3, R4_2_2D4_LI3, R4_2_2D4_3C4_RI3, R4_2_2D4_RI3

Target detection and eye tracking:
R2_1_D1_D2, R4_1_1C2_1I1, R2_1_D4_1C4_I3, R4_1_1C2_1D2, R2_2_D1_I1, R4_2_1C2_1D2, R4_2_3C2_LI1, R4_2_2D4_RI3