Ben-Gurion University of the Negev Faculty of Engineering Sciences The Department of Industrial Engineering and Management
Comparing Multispectral Image Fusion Methods for a Target Detection Task
By: Yoel Lanir
Thesis submitted in partial fulfillment of the requirements for the M.Sc. Degree

June 2005
Ben-Gurion University of the Negev Faculty of Engineering Sciences The Department of Industrial Engineering and Management
Comparing Multispectral Image Fusion Methods for a Target Detection Task
By: Yoel Lanir Supervised By: Dr. Masha Maltz
Thesis submitted in partial fulfillment of the requirements for the M.Sc. Degree
June 2005
Abstract

With the advance of multispectral imaging, image fusion has emerged as a new and important research area. Many studies have examined human performance with specific fusion methods compared to the individual input bands, yet few comparison studies have examined which fusion method is preferable to another. The current study compared human performance for the pixel-averaging, false color, and principal components fusion methods, as well as a novel method based on edge detection, in a target detection task. Three experiments, involving 89 participants, were conducted. In the first experiment, images with multiple targets were presented to the participants, and quantitative measurements of their hit accuracy and reaction time were taken. In the second experiment, a paired comparison method was used to qualitatively assess the subjective value of the methods and to scale the quality of each method. In the third experiment, participants' eye movements were recorded as they searched for targets, and a novel method was introduced for comparing eye movement data across image samples when looking at small targets. The results indicate that the false color and principal components fusion methods gave the best results across all experiments.
Acknowledgements
First, I would like to express my gratitude to my supervisor, Dr. Masha Maltz, for her enthusiasm, support, and patience, for all the helpful conversations, and for her comments on my text throughout the course of this work. I would also like to thank Prof. Stanley Rotman for his professional guidance and for all his helpful comments and ideas. I am grateful to Prof. Joachim Mayer for his insightful comments and to Prof. David Shinar for his kindness and help throughout my studies. I would also like to thank Dr. Yisrael Parmet for his help with the statistics. Lastly, and most importantly, I would like to thank my dear wife Shuli, for her love and patience, for her everlasting support, and for being a great source of strength throughout this work. Without her, I would never be where I am today.
Table of Contents

1 Introduction ........................................................................................... 1
1.1 Image fusion ....................................................................................... 1
1.2 Multispectral images ........................................................................... 3
1.3 Image fusion applications .................................................................... 4
1.4 Categories of fusion ............................................................................ 5
1.5 Methods of Image Fusion ..................................................................... 6
1.6 Human factor issues in Image fusion ..................................................... 11
1.6.1 Detection Tasks ............................................................................... 11
1.6.2 Search Tasks ................................................................................... 13
1.6.3 Recognition and Identification Tasks ................................................. 14
1.6.4 Situation awareness ......................................................................... 16
1.6.5 Human factor discussion ................................................................... 17
1.6.6 The lab method versus the natural method ......................................... 18
1.7 Chromatic and achromatic fusion ......................................................... 19
1.8 Effect of the input bands on the fusion ................................................. 21
1.9 The effect of the scene and target ........................................................ 22
1.10 Target Detection ............................................................................... 23
2 The Current Study ................................................................................. 25
2.1 Research Objectives ........................................................................... 25
2.2 Research methodology ........................................................................ 27
2.3 Fusion methods .................................................................................. 28
2.3.1 False Color ..................................................................................... 28
2.3.2 Principal Components ...................................................................... 29
2.3.3 Simple intensity average ................................................................... 30
2.3.4 Edge fusion ..................................................................................... 30
3 Target Detection ................................................................................... 32
3.1 Method .............................................................................................. 32
3.1.1 Participants ..................................................................................... 32
3.1.2 Apparatus ....................................................................................... 33
3.1.3 Stimuli ............................................................................................ 33
3.1.4 Procedure ....................................................................................... 34
3.2 Results .............................................................................................. 37
3.2.1 Total hits ........................................................................................ 37
3.2.2 False alarms .................................................................................... 40
3.2.3 Detection time ................................................................................. 41
3.3 Discussion ......................................................................................... 42
4 Paired Comparisons ............................................................................... 44
4.1 Method .............................................................................................. 44
4.1.1 Participants ..................................................................................... 44
4.1.2 Apparatus ....................................................................................... 45
4.1.3 Stimuli ............................................................................................ 45
4.1.4 Procedure ....................................................................................... 46
4.2 Results + Discussion ........................................................................... 47
5 Eye Tracking ......................................................................................... 53
5.1 Method .............................................................................................. 54
5.1.1 Participants ..................................................................................... 54
5.1.2 Apparatus ....................................................................................... 54
5.1.3 Stimuli ............................................................................................ 55
5.1.4 Procedure ....................................................................................... 56
5.2 Results .............................................................................................. 57
5.2.1 Average vs. False Color .................................................................... 58
5.2.2 Average vs. Principal Components ..................................................... 61
5.2.3 False Color vs. Principal Components ................................................ 64
5.3 Discussion ......................................................................................... 65
6 Discussion ............................................................................................ 69
6.1 Result Summary ................................................................................. 71
6.2 Results Discussion .............................................................................. 72
6.2.1 Chromatic vs. achromatic fusion ........................................................ 75
6.3 Limitations and future research ........................................................... 76
7 Conclusions .......................................................................................... 78
8 References ........................................................................................... 80
9 Appendices ........................................................................................... 86
Appendix 1: Familiarization images that were given before the target detection .. 86
Appendix 2: Target detection statistical results ........................................... 86
Appendix 3: Paired comparison decision time .............................................. 90
Appendix 4: Eye tracking statistical results ................................................. 92
Appendix 5: Example of original and fused images ....................................... 103
Appendix 6: List of images used in the experiments ..................................... 105
List of Tables
Table 4-1 Sum of comparative judgments of all images ................................. 48
Table 4-2 Percentage of comparative judgments of all images ........................ 49
Table 4-3 Distance between each method in standard deviation of preference units 50
Table 4-4 Scale value of the different fusion methods ................................... 51
Table 5-1 Summary of eye-tracking statistical results .................................... 67
Table 6-1 Ranks of the four fusion methods for all experiments ...................... 72
List of Figures
Figure 1-1 Diagram of a generic multiscale decomposition fusion .................... 7
Figure 3-1 Example fused image: band 1 ..................................................... 35
Figure 3-2 Example fused image: band 2 ..................................................... 35
Figure 3-3 Example fused image: average .................................................... 35
Figure 3-4 Example fused image: edge ........................................................ 36
Figure 3-5 Example fused image: false color ................................................ 36
Figure 3-6 Example fused image: principal components .................................. 36
Figure 3-7 Effect of fusion method on the hit percentage average .................... 38
Figure 3-8 Hit percentage by fusion method and image quality: means for high and low image quality in each fusion method ................................................... 39
Figure 3-9 Mean false alarm rates per fusion group ....................................... 41
Figure 3-10 Mean target detection time per fusion group ................................ 42
Figure 4-1 Example of a paired comparison between average and principal components methods ................................................................................ 47
Figure 4-2 Scale value by law of comparative judgment of different fusion methods 51
Figure 5-1 Example of a composite image .................................................... 56
Figure 5-2 Fixation count percent of average and FC in each image side ........... 59
Figure 5-3 Effect of fusion method on the fixation number percent for each image 60
Figure 5-4 Average sum of fixation durations on targets of average and FC methods 61
Figure 5-5 Fixation count percentage of average and PC methods in each image side 62
Figure 5-6 Effect of fusion method on the fixation number percentage in each image 63
Figure 5-7 Average sum of fixation durations on targets of average and PC methods 64
Figure 5-8 Fixation count fraction of PC and FC methods in each image side ...... 64
Figure 5-9 Average sum of fixation durations on targets of FC and PC methods ... 65
Figure 6-1 Example of a multispectral image ................................................ 74
Keywords Target detection, Multispectral imaging, Image fusion, Visual search, Eye Tracking.
1 Introduction

This research project presents a comprehensive approach for comparing different fusion methods for multispectral images. We examined four fusion methods and compared the performance of human observers viewing the fused images in a target detection task.
1.1 Image fusion

With the availability of multi-sensor data in many fields such as remote sensing, computer vision, medicine, and military applications, image fusion has emerged as a new and important research area. The human eye is sensitive to only a limited range of the electromagnetic spectrum and performs poorly at low light intensities. To obtain data that cannot be sensed by the eye, one can use sensors such as thermal sensors or image-intensifier night-vision sensors. In certain tasks, the human observer needs data from multiple sensors. For example, using the visual channel as well as the thermal channel can substantially improve the ability to detect a target (Toet et al., 1997). When the target is cold (early morning, rain), the contrast between the target and the background is larger in the visual channel than in the thermal channel, resulting in better detection in the visible channel. On the other hand, it may be hard to detect a hidden target in the visual channel, while the difference between the target's thermal signature and its surroundings may ease its detection in the thermal channel. According to Toet (1992), the simultaneous use of different sensors in different displays increases the operator's workload. It is difficult to reliably integrate the visual
information from different displays, both in a spatial arrangement, which places the displays of the sensors side by side, and in a sequential arrangement, which shows the displays one after another. Recognizing relationships among patterns can be difficult, since an object can appear quite different in the different sensor displays. A solution to this problem is to show the different sensor data on one display, combining the data in a process called sensor or image fusion. Another potential advantage of image fusion is to provide scene information not present in the input bands: by deriving information from the differences between the input images, information not present in any single input band can be shown in the fused image (Sinai et al., 1999). Image fusion refers to the process of combining the signals provided by different sensors viewing the same scene into one display. Image fusion aims to improve reliability by exploiting the redundant information shared by the images, and to improve capability by exploiting the complementary information between them. This type of image fusion is also called pixel-level multi-sensor fusion (Luo et al., 2002). The sensors used for image fusion need to be accurately co-aligned so that their images are in spatial registration. An image fusion scheme should extract all useful information from the source images without adding artifacts that distract human observers, while remaining reliable and robust to imperfections such as mis-registration of the source images. Our goal in image fusion is to combine and preserve in a single image all the perceptually important information present in the input images, so that the resulting image is more suitable for human visual perception, object
detection, and target recognition. Hence, for a given observation task, performance with the fused image should be at least as good as performance with the individual input images.
1.2 Multispectral images

Using multispectral data from satellites and from individual cameras is the wave of the future for military and industrial applications. While camera resolution has improved year after year, many believe that the development of multispectral cameras will drive the next large advance in electro-optical target acquisition. Multispectral images are images taken in two or more discrete bands of the electromagnetic spectrum. Each individual image is of the same scene and resolution, but of a different spectral band. For example, a digital camera captures three separate images from the electromagnetic spectrum: red, green, and blue. These are later combined to form a single RGB image. The bands of a multispectral image can come from visible or non-visible wavelengths of the spectrum. This technique can provide an optical spectrum for each pixel of the image. Image processing techniques can then be applied to extract the required data from the image. Objects made of different materials normally have unique spectral signatures; in other words, different objects reflect and absorb light differently. Thus, we can define a spectral signature for each type of object in the scene, and we can use the fusion of multispectral images to enhance the view of objects of interest in the scene. Multispectral imaging in the visible and the near-infrared wavelength range is routinely used in remote sensing (the analysis of landscapes and structures from aircraft
or satellites). Many applications can benefit from this procedure, including the detection of different crops, mineral deposits, or land mines, military camouflage detection, and the monitoring of agricultural resources. Another application of multispectral data is automatic target recognition. Computers are able to quickly and efficiently segment and analyze images based both on their brightness and on their spectral signatures (Caefer et al., 2002). Point targets become clear when matched-target algorithms and anomaly detection algorithms (Raviv and Rotman, 2003) are applied. However, this advantage becomes less clear when multispectral data is presented to human observers. The problem is that while multispectral data is three dimensional (x, y, and the spectral dimension), the image presented to the observer is two dimensional. Image fusion provides a way to transform the three-dimensional data into two dimensions.
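To make the data layout concrete, a multispectral image can be held as a three-dimensional array of shape (rows, columns, bands), and the spectral signature of a pixel is simply that pixel's vector of values along the band axis. The following numpy sketch is only an illustration; the array names, band count, and cosine-similarity map are our own and are not taken from the experiments described here.

    import numpy as np

    # illustrative cube: 256 x 256 pixels, 4 co-registered spectral bands
    bands = [np.random.rand(256, 256) for _ in range(4)]   # stand-ins for real band images
    cube = np.stack(bands, axis=-1)                        # shape (rows, cols, bands)

    # spectral signature of the pixel at row 120, column 80
    signature = cube[120, 80, :]

    # crude per-pixel similarity to a reference signature (cosine similarity)
    ref = signature / np.linalg.norm(signature)
    pixel_norms = np.linalg.norm(cube, axis=-1) + 1e-9
    similarity = (cube @ ref) / pixel_norms                # shape (rows, cols)

A detection algorithm could threshold such a similarity map; a fusion method, by contrast, must compress the whole band axis into a single two-dimensional image for a human viewer.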
1.3 Image fusion applications

Development of image fusion methods is aimed at many fields, including medical imagery, processing of satellite images (remote sensing), monitoring of production processes, and military applications. Here are some examples of applications of image fusion:
1. To help night driving under low-beam illumination or no illumination at all, it was suggested to add a thermal sensor to the car (Krebs et al., 1999). The image from the thermal sensor is presented on a head-up display (HUD) placed on the windscreen just above the steering wheel. This setting is problematic since it requires the driver to alternate attention and gaze between the front-viewed
scene and the HUD, and to integrate information from displays differing in size, aspect ratio, luminance, and spatial resolution. The suggested alternative is a system that presents a combination of the IR information and the visible information in one sensor-fused image.
2. Perconti and Steele (1997) examined a system in which a helicopter pilot uses a number of sensors (usually thermal and visual sensors) that are displayed on a helmet-mounted display (HMD). Today, the pilot can switch between the different displays. They suggest combining the information from the different sensors into one display using sensor fusion. They predict that a fused display will ease the navigation task and will help the pilot keep eye contact with the landing field in poor visual conditions (night, rain, fog).
3. Xue and Blum (2003) used image fusion to help detect concealed weapons using IR and visual sensors. The fused image maintains the high resolution and the natural color of the visual image while incorporating any concealed weapons detected by the IR sensor. Such a fused image can be helpful, for example, for a police officer who must respond quickly based on a glance at the fused image.
1.4 Categories of fusion

A multispectral image, or images from different sensors, can be fused using various methods. These can be divided into categories: pixel-level (data-level) fusion, feature-level fusion, and decision-level fusion. Our study, like most current studies, focuses on pixel-level fusion.
Pixel level fusion

At this level, the input images are fused pixel by pixel. Methods at this level either apply arithmetic operations (such as addition or subtraction) to corresponding pixel intensities from the different input images, or operate in a transform domain. In the transform-domain approach, the input images are first transformed using various multiscale methods such as Laplacian pyramid or wavelet transforms. After the transformation, algebraic operations combine the transformed images into a single representation, which is then inverse-transformed to obtain the final fused image. The combination rule can be based on pixel contrast, intensity, or on the weight given to a specific spectral band.
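As a concrete illustration of the arithmetic variant of pixel-level fusion, the sketch below fuses two co-registered bands by a weighted average of corresponding pixel intensities (the function name and weights are illustrative choices of ours); with equal weights this reduces to the simple pixel-averaging method compared in this study.

    import numpy as np

    def weighted_average_fusion(band_a, band_b, w_a=0.5, w_b=0.5):
        # band_a, band_b: co-registered single-band images as float arrays in [0, 1]
        fused = w_a * band_a + w_b * band_b
        return np.clip(fused, 0.0, 1.0)

    # equal weights give the simple intensity-average method
    # fused = weighted_average_fusion(ir_band, intensified_band)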
Feature level fusion

At the feature level, features are first extracted from the input images, and fusion is then performed based on these features. Typical algorithms are feature-based template methods (such as edge enhancement), artificial neural networks, and knowledge-based approaches.
Decision Level Fusion

In decision-level fusion, features are first extracted from each input. A decision is then made for each input, and only then are the individual decisions fused into a final decision.
1.5 Methods of Image Fusion
In this section the most common fusion methods are described. The fusion methods we use in our experiments are described in more detail later.
Multiscale Decomposition based methods

Multiscale transforms are very useful for analyzing the information content of images for the purpose of fusion. Zhang and Blum (1999) discuss the different multiscale image fusion approaches in detail. Most of the methods combine the multiscale decompositions of the source images: the idea is to perform a multiscale transform (MST) on the source images, construct a composite multiscale representation using some fusion rule, and then construct the fused image by applying the inverse multiscale transform (IMST). This process is shown in Figure 1-1.
Figure 1-1 – Diagram of a generic multiscale decomposition fusion
The most commonly used multiscale decomposition fusion methods are pyramid transforms and wavelet transforms.
Pyramid transforms

Pyramid transforms can be used as the multiscale transform in the fusion process. A pyramid representation consists of a number of images at different scales which together represent the original image. An example of a pyramid transform is the Laplacian pyramid. Each level of the Laplacian pyramid is constructed from the level below it using blurring, size reduction, interpolation, and differencing, in this order (Zhang and Blum, 1999). Toet and Franken (2003) used Laplacian pyramid fusion to fuse infrared and image-intensified images. They note that, as a side effect of this method, details in the resulting fused images can be displayed at higher contrast than they appear in the images from which they originate. Alternative pyramid transforms are the contrast pyramid, which preserves local luminance contrast in the sensor images (Toet, 1990), and the gradient pyramid, which applies a gradient operator to each level of the Gaussian pyramid representation (Burt and Kolczynski, 1993).
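The sketch below illustrates the generic scheme of Figure 1-1 with a simplified Laplacian pyramid and a choose-max rule on the detail levels; the helper functions, level count, and smoothing parameters are illustrative choices of ours, not the exact algorithms of the cited papers.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def laplacian_pyramid(img, levels=4, sigma=1.0):
        # blur -> downsample -> upsample -> difference, repeated per level
        gauss = img.astype(float)
        pyramid = []
        for _ in range(levels):
            blurred = gaussian_filter(gauss, sigma)
            down = blurred[::2, ::2]
            up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)
            up = gaussian_filter(up[:gauss.shape[0], :gauss.shape[1]], sigma)
            pyramid.append(gauss - up)      # band-pass detail at this scale
            gauss = down
        pyramid.append(gauss)               # coarsest low-pass residual
        return pyramid

    def fuse_laplacian(band_a, band_b, levels=4):
        pa = laplacian_pyramid(band_a, levels)
        pb = laplacian_pyramid(band_b, levels)
        fused = [np.where(np.abs(la) >= np.abs(lb), la, lb)   # choose-max on details
                 for la, lb in zip(pa[:-1], pb[:-1])]
        fused.append((pa[-1] + pb[-1]) / 2.0)                 # average the residual
        out = fused[-1]
        for detail in reversed(fused[:-1]):                   # inverse transform (IMST)
            up = np.repeat(np.repeat(out, 2, axis=0), 2, axis=1)
            out = detail + gaussian_filter(up[:detail.shape[0], :detail.shape[1]], 1.0)
        return out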
Discrete Wavelet transform

Wavelets are a type of multi-resolution function approximation that allows the hierarchical decomposition of a signal or an image. The wavelet transform is a useful method for fusing images (Scheunders and Backer, 2001; Gomez et al., 2001; Zhang and Blum, 1999; Singh et al., 2004). The wavelet transform has several advantages over other pyramid-based transforms: it provides a more compact representation, separates spatial orientations into different bands, and decorrelates interesting attributes in the original image. In wavelet-based fusion, the source images are first transformed using the wavelet transform. Then, a fusion decision map is generated based on a set of fusion
rules. The fused wavelet coefficients are built from the source images' wavelet coefficients using the decision map. Finally, the fused image is obtained using the inverse wavelet transform. From this process, we can see that the fusion rule plays a very important role in the fusion process. A frequently used fusion rule is a pixel-based rule, where each coefficient in the merged transform is produced from a combination of the corresponding coefficients in the source images' transforms. Another approach is to consider not only the corresponding coefficients in the source images, but also their close neighbors, for example a 3x3 or 5x5 window. This is called a window-based fusion rule, and it relies on the assumption that there is usually a high correlation between nearby pixels.
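A minimal sketch of this scheme, assuming the PyWavelets package is available, is shown below; it fuses the detail coefficients with a pixel-based choose-max rule and averages the coarse approximation (the wavelet, decomposition level, and rule are illustrative choices of ours).

    import numpy as np
    import pywt  # PyWavelets, assumed installed

    def wavelet_fuse(band_a, band_b, wavelet="db2", level=3):
        ca = pywt.wavedec2(band_a.astype(float), wavelet, level=level)
        cb = pywt.wavedec2(band_b.astype(float), wavelet, level=level)
        fused = [(ca[0] + cb[0]) / 2.0]            # average the coarse approximation
        for da, db in zip(ca[1:], cb[1:]):         # per-level (horizontal, vertical, diagonal) details
            fused.append(tuple(
                np.where(np.abs(xa) >= np.abs(xb), xa, xb)   # pixel-based choose-max rule
                for xa, xb in zip(da, db)))
        return pywt.waverec2(fused, wavelet)

A window-based rule would replace the point-wise magnitude comparison with a local activity measure, for example the coefficient energy averaged over a 3x3 neighborhood.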
Principal Component Transform Fusion

PCA (principal component analysis) is a general statistical technique that transforms multivariate data with correlated variables into data with uncorrelated variables. These new variables are obtained as linear combinations of the original variables. PCA has been widely used in image encoding, image data compression, image enhancement, and image fusion. When this technique is used in image fusion, it is performed on all of the image's spectral bands. William Krebs has used this method in many of his experiments (Krebs and Sinai, 2002; McCarley and Krebs, 2000; Krebs et al., 2001). Researchers have used PCA to fuse images in two ways. The first assigns the first principal component (PC) to one of the RGB bands and the second PC to another RGB band, in a false-color technique. The second method maps the first and second PCs to intensity and hue in an HSV image, and is described in more detail later.
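The second, HSV-based variant can be sketched as follows; the function name, the fixed saturation value, and the min-max normalization are our own illustrative choices, not necessarily Krebs's exact implementation.

    import numpy as np
    from matplotlib.colors import hsv_to_rgb  # assumed available for the HSV -> RGB step

    def pca_fuse(bands):
        # bands: list of co-registered 2-D arrays, one per spectral band
        h, w = bands[0].shape
        data = np.stack([b.ravel().astype(float) for b in bands], axis=1)  # pixels x bands
        data -= data.mean(axis=0)
        _, _, vt = np.linalg.svd(data, full_matrices=False)   # principal axes in band space
        pc1 = (data @ vt[0]).reshape(h, w)
        pc2 = (data @ vt[1]).reshape(h, w)
        scale = lambda x: (x - x.min()) / (np.ptp(x) + 1e-9)  # map to [0, 1]
        hsv = np.dstack([scale(pc2),               # hue: second component
                         np.full((h, w), 0.6),     # fixed saturation (arbitrary)
                         scale(pc1)])              # value (luminance): first component
        return hsv_to_rgb(hsv)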
Opponent color processing

A technique used by Waxman et al. (1998) is based on biological models of opponent-color processing. It was observed that certain rattlesnakes have neurons that fuse IR and visible inputs. These cells show interactions in which the IR input can enhance or depress the response to the visible input in a non-linear way. This idea forms the basis of the opponent-color technique, which maps two bands to the human opponent colors (red vs. green, blue vs. yellow). A neural network is used to fuse low-light and thermal images into a false color image.
False color

A simple fusion method introduced by Alexander Toet (Toet and Walraven, 1996) and used frequently since (Toet and Franken, 2003; McCarley and Krebs, 2000) tries to exploit the ability of the human visual system to perceive color. The method assigns each band of the input image to a corresponding band of an RGB color image: one band to the R channel, a second band to the B channel, and the third band to the G channel. This works when merging three images. When merging two images, one image can be assigned to both the B and G channels (cyan) while the other image is assigned to the R channel. Some manipulation can be done on the input images before assigning them to the RGB bands. Toet and Franken (2003) have shown that this method is better for human perception and target detection than a standard contrast enhancement method.
Fusion methods used in our research

In our research we chose to compare the common fusion methods of principal components fusion and false color fusion. In addition, we compare them with a novel feature-level fusion method based on edge detection, which is explained later, and with a simple pixel-averaging method.
1.6 Human factor issues in Image fusion

The goal of image fusion is to improve the performance of the human observer in different visual tasks. Human factors experiments are used to test whether a sensor or a fused image can improve operator performance in the task. Krebs et al. (2002) showed, by examining the effects of the same sensor fusion on different cognitive tasks, that the benefit of sensor fusion may be task dependent. Therefore, in this section, we divide the different visual tasks into four groups: detection, search, recognition and identification, and situational awareness. For each task group, we review the human factors experiments and discuss how image fusion affects the task.
1.6.1 Detection Tasks

Detection tasks are tasks in which the observer sees a scene with or without a target and needs to decide whether or not the target appears in the scene. Target detection is one of the most common tasks used to investigate the potential benefits of image fusion, yet the findings on this benefit are not consistent. While some researchers have found fusion to improve target detection (Essock et al., 1999; Krebs et
al., 1999; Toet et al., 1997), others have not (Krebs and Sinai, 2002; Steele and Perconti, 1997). Essock et al. (1999) examined the ability of fusion to enhance target detection. In their experiment, ten human observers were presented with images from SLS (star light stimulator) sensors, thermal sensors, and fused images produced with the opponent-color processing method. The observers were asked to report whether a target, shown to them beforehand, appeared in a given image. Each image was shown very briefly (100 ms). They found that detection sensitivity, measured by d' according to signal detection theory, was better for the fused images than for the single-sensor ones. Krebs et al. (1999) wanted to determine whether a fused image can improve drivers' detection of road hazards within a nighttime scene. They used images collected from visible and short-wave infrared sensors, and fused them using an opponent-color technique. Eleven observers were asked to detect the presence of a pedestrian in single-sensor images and in fused images under different intensities of oncoming headlight glare. The results showed no effect of image type on reaction time, but sensor-fused imagery produced accuracy better than or equivalent to that produced by either of the single-band images. In another experiment, Krebs et al. (2001) showed 14 observers scenes that contained a randomly placed airplane target in 50% of the trials. The image formats used were three single-band images (short-, mid-, and long-wave IR) and color and B&W fused images produced with the PCA method. Target detection and false alarm probabilities were computed according to signal detection theory. The results showed better performance with the color-fused and the long-wave infrared images than with the other formats.
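For reference, the d' sensitivity measure used in these studies is the difference between the z-transformed hit rate and false-alarm rate; a minimal sketch, with invented rates rather than data from the cited studies, is:

    from scipy.stats import norm  # assumed available

    def d_prime(hit_rate, false_alarm_rate):
        # signal detection theory sensitivity: d' = Z(hit rate) - Z(false-alarm rate)
        return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

    print(d_prime(0.85, 0.20))  # about 1.88; a larger d' means better discriminability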
Sampson et al. (1996) compared thermal sensor images, SLS sensor images, and chromatic and achromatic fused images in a target detection task. Six observers viewed three scenes with and without targets pasted onto them. The experiment was set up so that the display imitated a HUD (head-up display). The observers were asked to indicate whether or not the target existed in the image. The results showed some advantage for the fused images over the input bands. Not all target detection experiments showed an advantage for the fused images. Krebs and Sinai (2002) conducted an experiment to determine the perceptual advantages of sensor-fused imagery over conventional single-band nighttime imagery for a wide range of visual tasks, including target detection. In the target detection experiment, images were presented to 84 observers. Observers were randomly assigned one of six image formats: IR, image-intensified image, two chromatic fused formats, and two grayscale fused formats. The fusion method was the PCA method. Their data indicate that sensor fusion did not improve performance in the target detection task beyond the single-band images, in either reaction time or accuracy.
1.6.2 Search Tasks

In search tasks the observer is asked to search for a target in the presented image and to make a decision concerning its location (for example, whether the target is located on the left or right side of the screen). Waxman et al. (1996) examined the benefit of chromatic and achromatic fusion in a search task. Three fusion formats were used to fuse long-wave IR and image-intensified images: the opponent-color method and two B&W methods. The images showed natural scenes. A square with a contrast of ±15%, which acted as the
target, was inserted into the source images at a random location. The observers were asked to report whether the square was located on the right or left side of the image. The results showed that when one of the original formats had low contrast, and therefore was hard to use, the fused image helped with the search task at all contrast levels. Another search task experiment was conducted by Krebs et al. (2001). Ten observers participated in an eye movement study. Eye movements were recorded as the observers searched for a target; once the target was detected, the subject was to maintain fixation on it. Each subject was shown one display format out of five: three single-band images (short-, mid-, and long-wave IR) or color or B&W fused images produced with the PCA method. The images were of natural scenes which contained a randomly placed airplane target in 50% of the trials. Analysis of the data showed that observers looking at the short- and mid-wave IR bands and at the B&W fused image had more fixations and longer scan-path lengths than observers looking at the long-wave IR band and the color-fused image. Thus, the long-wave and color-fused images contained enough information to guide the subjects' eye movements to the desired location. The two experiments described here suggest that fusion contributes to the search task, but further experiments are needed to verify these results.
1.6.3 Recognition and Identification Tasks

Recognition tasks are tasks in which the observer is required to classify the target into a broader category, and identification tasks are tasks in which the observer is required to classify the target into a narrower category. For example, in recognition we distinguish whether the target is a car or another type of vehicle, while in identification we classify the specific type of car.
Steele and Perconti (1997) examined the contribution of fusion to the tasks of recognition, search, and identification for a helmet-mounted helicopter display (HMD) system. Twenty-three experienced observers looked at images presented on an HMD system similar to the one helicopter pilots use. The images were presented for ten seconds, and the observers were requested to respond as quickly as possible to questions about whether a target belonged to a specific category (recognition), where a target was located (search), and detailed information about the target (identification). The comparison was made between five formats: a thermal long-wave IR sensor, a light-intensified image, the PCA fusion method, and the opponent-color chromatic and achromatic methods. The results indicated that the fastest reaction time and the best accuracy level were obtained with the opponent-color method and the thermal image. Sinai et al. (1999) examined whether a fused image can improve visual performance over the single bands. In their first experiment there were two input bands, a long-wave IR image and a light-intensified image, and four fused formats: two color-fused formats and two achromatic fused formats using a PCA method (in each fused format one IR band was taken with white-hot polarity and one with black-hot polarity). They examined 60 subjects who looked at images of natural scenes; each subject looked at only one format. The subject needed to decide whether the image contained a human figure, a vehicle, or no target at all. The results showed that the color-fused formats produced fewer errors than the single-band formats, with the fewest errors in the white-hot color-fused format. The results did not show any effect of format on reaction time.
1.6.4 Situation awareness

Situation awareness is defined as the perception of the elements of the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future (Endsley, 1988). Situational awareness is divided into three levels according to cognitive complexity: the perception of the relevant attributes, the interpretation of the attributes needed to complete the task, and the ability to predict the behavior of those attributes. In experiments measuring situational awareness, usually only the first level is measured. Toet et al. (1997) examined whether fusion can improve observers' situational awareness. Two color fusion schemes, the opponent-color method and the false color method, were applied to thermal and visible military images and were presented to six observers along with the original images. The task involved the detection and localization of a person relative to a characteristic detail that provided spatial context. The observers were requested to determine the position of the person in the image relative to a fence by pointing the mouse cursor at the corresponding place in a schematic image, based on their memory.
This task enabled the researchers to assess the observers' situation awareness. The results indicated that color-fused imagery improved target detection rates over all other modalities. Furthermore, they showed that observers improved their ability to determine the relative location of a person in the scene when looking at the fused images, hence improving their situation awareness. Sinai et al. (1999) also examined the advantages of fusion for situation awareness. In their second experiment, 60 observers were asked to determine whether an image was upright or inverted. Each observer viewed imagery from only one sensor format. The results
show that the reaction time was faster with all fused formats than with the input images. Also, the error rates were highest with the IR image. Krebs and Sinai (2002) presented chromatic and achromatic fused images produced with a PCA fusion method to 48 observers. The task required the observers to make a speeded response to determine whether the scene was upright or inverted (rotated 180 degrees). The results failed to reveal an effect of the image format on sensitivity (number of errors), but did show that reaction times for the input sensors were slower than for all fused formats.
1.6.5 Human factor discussion

The literature review of the different human factors experiments suggests that in most human visual tasks there is some benefit to fusion. More specifically, fusion's effect on target detection and search tasks is equivocal; in identification tasks fusion appears to be at least as good as the thermal band; and in situational awareness tasks fusion has been shown to produce better results than the separate bands. The equivocal effect of fusion on detection tasks can be explained by the various fusion methods used, the different methodological approaches, the different input bands, and the differences in the scenes and the targets. Furthermore, in a target detection task the detection usually depends on the contrast between the target and its surroundings. This contrast is usually high in the thermal bands, and therefore adding information from other bands does not always help to improve detection. In a situational awareness task, on the other hand, fusion of data from different modalities can contribute to the overall perception of the environment by adding information about the environment from each input band. This can explain why all experiments in situation
awareness have shown advantages for fused imagery. Therefore, a possible role of image fusion in a general visual task is to maintain or slightly improve the high target-detection performance of the thermal band, while improving the situation awareness provided by the image.
1.6.6 The lab method versus the natural method

Two experimental approaches from cognitive psychology can be applied in human factors experiments examining fusion methods for visual tasks. The first approach, and the more common one, is the natural-setting or real-life method. It uses an experimental setup which is as close as possible to reality (for example: Krebs et al., 1999; Sinai et al., 1999; Toet and Franken, 2003). The second approach, the laboratory method, exercises control over all intervening variables that are not of direct concern to the experiment. An example of this approach is using a patch containing the target while controlling the pixel intensities in the image (Essock et al., 1999). One of the differences between the two approaches is the use of a natural background as opposed to a uniform background: in the real-life method a natural background is used (Krebs and Sinai, 2002), while the laboratory approach uses patches of real scenes presented on a uniform background (Essock et al., 1999). Another difference between the two approaches has to do with the target. In the natural approach the scene is presented as is and the target is in its natural position, whereas in the lab approach the target is superimposed on the scene (Sampson, 1996). The differences between these two approaches can explain some of the differences in the results of the experiments. The main advantage of the natural approach is that it is similar to the settings used in real life. By using real images with targets at natural positions, the conditions in which the
operator makes his decision are simulated, and conclusions can be drawn about the specific task. The problem with these kinds of experiments is that it is sometimes hard to extrapolate the results to other cases, since there are many intervening variables. The lab approach controls these variables and is therefore better for examining specific cognitive processes, yet it lacks the realism of real-world experiments. Furthermore, by using patches and by inserting the target into the scene, the unnatural interaction between the target and the background can give the observer unwanted hints about the position of the target. In our experiments we use the natural-setting approach.
1.7 Chromatic and achromatic fusion

The human eye is sensitive to color, and can use color in different tasks, for example to enhance the ability to perceive and recognize targets. In a search task, for example, there is evidence that color can aid the search through pre-attentive pop-out processes (Treisman & Gelade, 1980). In a detection task, the ability to detect small objects against a varied background was shown to be greatly facilitated by the use of color (Goldstein, 1996). Furthermore, it was found that fewer fixations are required to locate color-coded targets (Hughes and Creed, 1994). In a color display, compared to a grayscale display, in addition to the brightness-contrast dimension there is also a color-contrast dimension, which can make the separation between a shape and its background easier (Aguilar, Fay, et al., 1998). The importance of color in fusion stems from the fact that even though the luminance intensity and the spatial frequency content of grayscale and color fused images are the same, color adds a perceptual dimension of chromatic contrast that can aid in the
performance of specific visual tasks (Krebs and Sinai, 2002). Nevertheless, it is not obvious that chromatic fusion will show better performance than achromatic fusion for a specific visual task. A fusion algorithm can facilitate visual performance by improving the spatial content of the input images, and not only by enhancing their contrast through the addition of color; in that case, chromatic fusion is not necessarily better than achromatic fusion (Krebs et al., 2001). Human performance experiments show that in most cases chromatic fusion methods do show some advantage over achromatic fusion methods. In search tasks, as predicted by the pop-out effect, color fusion appears to be advantageous (Waxman et al., 1996; Krebs et al., 2001). In recognition tasks (Sinai et al., 1999), situational awareness tasks (Toet et al., 1997), and scene recognition tasks (Sinai et al., 1999), experiments have also shown a possible contribution of color fusion over achromatic fusion. In target detection, on the other hand, findings on the benefits of chromatic fusion over achromatic fusion are not conclusive. In one study, Krebs et al. (2001) found color to be beneficial in a target detection task, but in another experiment, Krebs and Sinai (2002) did not find a difference in reaction time or in accuracy between achromatic and chromatic fused images. In some circumstances, chromatic fusion even led to a lower level of detection than achromatic fusion (McCarley and Krebs, 2000). This difference between the target detection experiments can be attributed to the different fusion methods and methodologies used in each experiment. In our experiment, we will use several chromatic and achromatic fusion methods. The methodology will be the same for all fusion methods, but we will not use chromatic and achromatic images of the
same fusion method. Investigation of the effects of color fusion on target detection is a topic for further research.
1.8 Effect of the input bands on the fusion

Image fusion is a process of combining two or more images from different sensors. Sensors that extract different types of information from a scene are used as input bands for the fusion process, and different input bands suit different tasks. While thermal sensors are best for detecting human targets or heat-emitting objects at night, millimeter-wavelength radiation is more effective in fog, and visual images have the best resolution. Infrared (IR) sensors are frequently used as input bands for fused images. IR bands are divided into longwave IR (LWIR), midwave IR (MWIR), and shortwave IR (SWIR). Image-intensified (i²) sensor images, which amplify star and moon light at night, are frequently fused with IR bands to form a fused image describing a nighttime scene (McCarley and Krebs, 2000; Essock et al., 1999; Krebs and Sinai, 2002; Sinai et al., 1999). In other experiments, IR bands from different wavelengths were fused (Waxman et al., 1996; Krebs et al., 2001), or IR and regular visible sensor images were fused (Xue and Blum, 2003; Toet et al., 1997). Fusion of the video output streams of long-distance observation systems has also been addressed (Fishbain, 2004). In the fusion of IR bands with i² or visual images, the resolution of the infrared sensors is generally poorer than that of the image-intensifier sensors; however, the contrast between heat-emitting objects and their surroundings is greater in the infrared image than in the image-intensified image. In a situation
awareness task, the number of errors and the reaction time were worse with the thermal sensor band than with all other formats (Krebs and Sinai, 2002). The authors' explanation was that the low resolution of the thermal format produces a less detailed description of the scene. However, in that experiment, the fused format showed better performance than both the thermal and the image-intensifier formats. The improved performance indicates that the thermal format added unique information to the situation awareness task beyond that of the i² format. In a target detection task, on the other hand, the contrast between a target and the background is more important than the resolution. Therefore, the advantage of fusing IR and i² images over a single IR band for this task is not straightforward. The input bands used in the fusion process can affect the decision of which fusion method to use. For example, Singh et al. (2004) used the wavelet transform to fuse infrared and visual images for the task of face recognition. They argue that since the IR images have a much lower resolution than the visual image, fusion using a multiresolution method allows features to be fused at the resolution at which they are most salient.
1.9 The effect of the scene and target

One of the areas that needs further examination is the effect of the content of the image (the target and the background) on the fusion. The target and the background used in the visual task can affect human performance. Krebs and Sinai (2002) found that it is easier to detect and recognize a human figure than a vehicle; this was found both for the single-band formats and for the fused formats.
White (1998) argued that one cannot define the correct fusion format without first considering the scene. In a situation awareness task, he presented 23 different scenes from five categories (man-made objects, wood, roads, ocean, and "general") to experienced and inexperienced observers. Out of the 23 scenes in his experiment, the thermal sensor showed better human performance in 11, color fusion was better in 10, and the image-intensifier image was better in two. Most fusion methods do not take most of the data from one band and add a different proportion from the other. An exception is our edge fusion method (see chapter 2), which takes most of the data from one band and adds data from the other bands. The disadvantage of this method is that the base band is not chosen dynamically according to the images, but is chosen beforehand. For example, clouds obscuring the moon and star light may degrade the image-intensifier image, while an IR image taken in the morning after a long period of rain may be less detailed because of the low thermal contrast in the scene, making a target detection task harder (Toet, 1997). So, if the input bands are an IR band and an image-intensifier band and the IR image was taken after a long period of rain, a fusion scheme which emphasizes the image-intensifier image would be better. On the other hand, if there were clouds when the image was taken, it might be better to use a fusion scheme that takes most of its features from the IR image.
1.10 Target Detection

A target detection task is one in which the observer scans a region of the visual world, looking for something whose presence is uncertain and whose location is unknown
(Wickens and Hollands, 2000). There is no consistent pattern of display scanning (e.g., left to right) and no optimal scan pattern in search unless the target is in a defined scheme (e.g., a menu in a software application). The target search is driven by cognitive factors related to the expectancy of where the target is likely to be found. Usually, the semantically logical places for the target are searched first when scanning an image, and only later the rest of the image. Feature integration theory (Treisman and Gelade, 1980) distinguishes between parallel and serial processes in target search. Certain search tasks, like detecting a red item among a group of green items, seem easy and effortless and involve parallel processing. Visual attention is drawn to display items that are large, bright, colorful, or changing; this is called the pop-out effect. Other tasks, where the target and the distractors are defined by a conjunction of features, are more difficult, take more time, and demand serial processing. When multiple levels of multiple dimensions define the target, and when the target is difficult to discriminate from the distractors, serial search results. In our case, we are using complex IR images of real outdoor scenes. The targets and distractors have many features, and it is hard to distinguish between them. We can therefore conclude that we are performing a serial search and not a parallel one, and thus we can expect the search to take longer. Adding a feature that distinguishes the target from its surroundings (such as color or a distinct shape) can change the process to a parallel search and ease the target detection task.
2 The Current Study

2.1 Research Objectives

Many human factors studies have examined the advantages of specific fusion methods over the individual input bands, yet few comparison studies have been conducted to examine which fusion method is preferable to another. Simard et al. (1999) examined different fusion display methods for synthetic and IR sensor images on a helmet-mounted display. Three subjects viewed an emulation of a descending flight. Their task was to detect specified terrain features and objects as they became visible in successive fused image snapshots along a flight path. They examined three fusion methods: pixel averaging, opponent process, and false color. The distance at which the target was first detected (the image number) was measured. The results indicate that all three formats improved the capability to detect features from a greater distance compared with the single sensor. When comparing the different fusion methods, the false color algorithm was superior to the two other methods across different visibility conditions. Simard et al. compared three fusion methods, but their use of only three observers and synthetic images prevents us from generalizing about which method is better under what conditions. Other researchers have conducted experiments to show the advantage of different fusion methods over single bands, sometimes using two different fusion methods and comparing them. Toet et al. (1997) compared the opponent-color method with the false color method while examining the benefit of fusing thermal and visible images in a situation awareness task. The results showed better performance with the fused
methods than with the input formats, but did not show a preference for one of the two fusion methods. McCarley and Krebs (2000) compared the false color and the principal components methods while trying to show the advantage of sensor fusion for enhancing drivers' detection of road hazards in a night-time display. Observers were asked to detect a pedestrian in a night-time scene. They found that the principal components method was better when there was glare from an oncoming vehicle's headlights, but under low illumination it did not perform as well as the false color method. Other than the studies mentioned above, we are not aware of any other studies which compare human performance with different fusion methods.
Our main research objective was to compare different fusion methods which are used today to fuse multispectral images, and to determine which method is best for the task of target detection, disregarding other variables such as the type of target or background. A second objective was to introduce a novel feature-level fusion method based on edge detection and to compare this method to other known fusion methods. In addition, we introduce two methodologies for comparing different image modalities. One is the paired comparison method, used to subjectively compare two image modalities. The other is a new methodology that uses eye tracking to compare images in a target detection task.
2.2 Research methodology
For the edge fusion method, we developed and implemented an algorithm which fuses several input bands using edge detection. This algorithm is described in detail later in this section. In order to compare different fusion methods, we performed several psychophysical experiments. Psychophysical experiments are commonly used in human factors research. They are procedures designed to examine and record human reactions to given situations or tasks, and they can be of a qualitative or quantitative nature. We performed both qualitative and quantitative experiments. The first experiment was quantitative: we presented observers with fused images containing multiple embedded targets, and measured the number of targets correctly detected by each observer and the time needed to detect them. The second experiment was qualitative: we presented observers with paired comparisons of two different fusion methods and recorded their judgment of which method was better on a subjective scale. The third experiment measured the eye movements of the observers as they searched for targets in the fused images. The methods and results of all three experiments are described in the next sections, followed by a general discussion of all the results and our conclusions.
2.3 Fusion methods
The images were fused using four different fusion methods: False Color, Principal Components, Edge, and Average fusion.
2.3.1 False Color
A simple fusion method introduced by Toet and Walraven (1996) fuses two input bands using a false color mechanism. First, the common component of the two images is found using a local minimum. The common component of the images A(i,j) and B(i,j) is calculated by:

(1)  (A ∩ B)(i,j) = Min{ A(i,j), B(i,j) }

where A and B denote the two input band images, and i and j denote a pixel in image A or B. Then, the common component of the two images is subtracted from each original image in order to get the unique component of each image:

(2)  A* = A − A ∩ B,  B* = B − A ∩ B.

The next step is to subtract the unique component of each image from the other image: A − B* and B − A*. Finally, these two images are mapped to the Red and Green bands of an RGB image to create one fused image:

(3)  C = (A − B*) ⊕ (B − A*)

where ⊕ represents the fusion operation. It is possible to emphasize the unique components of the two images by assigning the difference between them, (A* − B*), to the Blue band of the RGB image.
A simpler way of using the False Color method is to assign each band of the input images to a corresponding band of an RGB color image: one band to the R channel, one to the G channel, and one to the B channel. With three input bands this is straightforward. With only two bands, one band can be assigned to the R channel and the other to cyan; assigning a band to cyan is done by assigning it to both the G and the B channels.
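The following Python sketch illustrates both variants described above. It is a minimal illustration under our own assumptions, not the implementation used in this study; the function names and the choice to clip the output to [0, 1] are ours.

```python
import numpy as np

def false_color_toet(band_a, band_b):
    """Toet-Walraven style false color fusion of two co-registered bands.

    band_a, band_b: float arrays scaled to [0, 1], identical shape.
    Returns an H x W x 3 RGB image.
    """
    common = np.minimum(band_a, band_b)      # common component, A intersect B
    unique_a = band_a - common               # A* = A - (A intersect B)
    unique_b = band_b - common               # B* = B - (A intersect B)
    red = band_a - unique_b                  # A - B*  -> red channel
    green = band_b - unique_a                # B - A*  -> green channel
    blue = unique_a - unique_b               # optional A* - B* emphasizes unique detail
    return np.clip(np.dstack([red, green, blue]), 0.0, 1.0)

def false_color_simple(band_a, band_b):
    """Simpler variant: band A drives red, band B drives cyan (green and blue)."""
    return np.clip(np.dstack([band_a, band_b, band_b]), 0.0, 1.0)
```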
2.3.2 Principal Components
William Krebs has used the principal components method to fuse images in many of his experiments (Krebs and Sinai, 2002, McCarley and Krebs, 2000, Krebs et al., 2001). In his approach, the input images are transformed using principal components analysis; the major axis is mapped to the luminance channel of an HSV (hue, saturation, value) image, while the orthogonal axis is mapped to the color channel. Then, the image is transformed from HSV to an RGB (red, green, blue) image. The resulting image shows the fused content using false colors that reflect each band. The assignment of the major principal component to the luminance channel is straightforward, but the assignment of the second component to the color channel is not immediately obvious. According to Krebs, assigning the second component to color results in displaying two and only two opponent colors in various saturations. This provides an immediately intuitive representation of which spectral bands dominate and by how much. When Krebs indicated that only two colors were displayed, he was using only two input bands. When using three or more input bands there might be more colors, but the intuitive representation of the spectral bands still holds with three input bands.
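A hedged sketch of this kind of PCA-to-HSV fusion is given below. It is an illustration only: the fixed saturation value, the linear stretch of the second component onto the hue axis, and the function name are our own simplifications, since the exact color mapping used by Krebs is not fully specified in the text.

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def pca_fuse(bands, saturation=0.6):
    """Fuse two or more co-registered bands with a PCA-to-HSV mapping.

    bands: list of H x W float arrays. The first principal component is mapped
    to the value (luminance) channel and the second to the hue channel.
    """
    h, w = bands[0].shape
    data = np.stack([b.ravel() for b in bands], axis=1)   # pixels x bands
    data = data - data.mean(axis=0)                       # center each band
    _, _, vt = np.linalg.svd(data, full_matrices=False)   # principal axes
    scores = data @ vt.T                                  # pixel scores per component

    def stretch(x):                                       # rescale to [0, 1]
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    value = stretch(scores[:, 0]).reshape(h, w)           # 1st PC -> luminance
    hue = stretch(scores[:, 1]).reshape(h, w)             # 2nd PC -> color (assumed mapping)
    sat = np.full((h, w), saturation)                     # fixed saturation (arbitrary choice)
    return hsv_to_rgb(np.dstack([hue, sat, value]))       # back to RGB
```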
2.3.3 Simple intensity average
The simplest fusion scheme is to take the average of each pixel across the input bands as the intensity of the fused image. This produces an image which contains information from all the input bands in a unified way, and we use it as a benchmark for the other fusion methods. A weighted average can also be applied, with the weights of the different input bands set according to the bands' characteristics (Fishbain, 2004). In this study we used a simple non-weighted averaging scheme.
2.3.4 Edge fusion
A common denominator of the above methods is that they are based on pixel evaluation alone: the neighborhood of a particular pixel does not influence the value of that pixel in the fused image. A totally different approach involves feature level fusion. A standard image processing technique is to enhance an image by adding edge information, which can be extracted with any edge detection method. In our algorithm we use the Sobel filter to extract the edge information of each band (Gonzalez and Woods, 1992). This method finds edges by convolving the original image with a spatial filter. The extracted edge information can then be added to the original image to enhance target detection.
When using this method to fuse several input images, one image is selected as the base image. An edge image is then computed as described above and added to the base image. This process can be repeated for several bands, adding each band's edge intensity to the base image. The edge enhancement method is a feature level fusion method, and its effectiveness depends upon the distribution of the feature space: if the distribution is too low, the overlap will cause too much ambiguity and the target will not be identifiable; if the distribution is too high, there will not be enough overlap to enhance the identification of the target. Another point to take into consideration is which band to use as the base band. This can depend on the input bands used. When fusing visible or short IR bands with mid or long IR bands, it is best, in our opinion, to use the visible band as the base for the fusion, since the background has higher resolution in the visible bands; if the IR band is taken as the base band, the resolution of the fused image will be that of the IR band. For the task of target detection, it is best to choose the base band in which the target is more salient. The edge information from the other band can then further enhance the target's salience.
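A minimal sketch of this edge fusion scheme, assuming Sobel edge extraction as described above, is shown below. The function names and the weight parameter are illustrative assumptions of ours, not values taken from the thesis.

```python
import numpy as np
from scipy import ndimage

def sobel_magnitude(band):
    """Sobel gradient magnitude of a grayscale band (float array)."""
    gx = ndimage.sobel(band, axis=1)
    gy = ndimage.sobel(band, axis=0)
    return np.hypot(gx, gy)

def edge_fuse(base_band, other_bands, weight=0.5):
    """Overlay Sobel edge information from the other bands onto the base band.

    `weight` controls how strongly the edge maps are emphasized; it is an
    illustrative parameter, not a value specified in the text.
    """
    fused = base_band.astype(float).copy()
    for band in other_bands:
        edges = sobel_magnitude(band)
        edges /= edges.max() + 1e-12          # normalize each edge map to [0, 1]
        fused += weight * edges               # add the edge intensity to the base image
    return np.clip(fused, 0.0, 1.0)
```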
3 Target Detection
To quantitatively test target detection in a natural scene, many studies show observers a scene with or without a target present, and ask the observer to find the target as quickly as possible if it exists or to decide that the target does not exist (Krebs and Sinai, 2002, McCarley and Krebs, 2000, Toet and Franken, 2002). Detection time and accuracy of detection are usually measured. In our images, more than one target is present in each scene, so a different approach is needed. As in previous studies (Maltz and Shinar, 2003), our observers' task was to find as many targets as possible as quickly as possible while avoiding false alarms. The accuracy and number of detections, as well as the time taken to find the targets, were measured.
3.1 Method
3.1.1 Participants
Participants in the experiment were 56 students from the Department of Industrial Engineering and Management at Ben-Gurion University with no experience in target detection. The average age of the participants was 25.5, with a standard deviation of 2.3. All participants had normal or corrected-to-normal vision (minimum 6/9 Snellen visual acuity), were checked to have normal contrast vision using the Pelli-Robson contrast chart, and reported having no color vision deficiency. All participants were naïve to the purpose of the experiment.
3.1.2 Apparatus
Stimuli were presented to the participants on a color 17” LCD screen via a Pentium 4 PC. The display resolution was 1024x768 with a frame refresh rate of 85 Hz. Participant-to-screen distance was approximately 40 cm. The experiment took place in a small, quiet, darkened room to minimize peripheral distraction. Observers were instructed to adjust their distance to the computer and to place the mouse in their preferred position, to enable them to view the screen clearly and to easily manipulate the mouse.
3.1.3 Stimuli
Stimuli for this experiment were eight multispectral images with different input bands. Each image had a different combination of two or three short, medium, or long IR input bands. The images were carefully chosen out of twenty-six "base" multispectral images to represent different input bands and to present a different scene in each image. Each scene contained two to five military vehicles in a complex rural background. During post-processing each image was spatially registered. The registered images were then normalized in contrast using adaptive histogram equalization (Gonzalez and Woods, 1992). Each image was then fused using the four fusion methods described previously (false color, principal components, edge, and average). Figures 3-1 to 3-6 display an example of the imagery used. Figures 3-1 and 3-2 show the two original bands of a multispectral image used in the experiment after contrast normalization. Figures 3-3 to 3-6 display the results of the average, edge, false color, and principal components fusion methods, respectively. This image has three targets, which are marked with arrows in figure 3-1.
3.1.4 Procedure
The observers were randomly assigned to four subject groups; grouping was based on fusion method (false color, principal components, edge, and average). Each observer saw images fused with his group's fusion method. The observer was requested to find and click as quickly as possible on all the targets present in the scene while avoiding false alarms. Preliminary explanations were presented to the observer, as well as an image containing patches of targets from images fused by the subject's fusion method. In addition, one example was shown as a familiarization exercise before the start of the experiment trials. Each trial started with a dialog window prompting the user to indicate when he was ready to start the trial. After 20 seconds, the trial ended and a new dialog window was presented, starting a new trial. Altogether there were eight trials for each observer. In each trial, a different scene was presented to the observer. The order of the images was the same for each observer. Performance was tied to speed as well as accuracy in the task. To motivate the observers, a cash prize of 100 NIS was promised to the observer with the best performance.
Figure 3-1 – original band 1 with the three targets marked
Figure 3-2 – original band 2
Figure 3-3 – fused image using Average method
Figure 3-4 – fused image using Edge method
Figure 3-5 – fused image using false color method
Figure 3-6 – fused image using principal components method
3.2 Results
The independent variable was the fusion method: average, edge, false color (FC), and principal components (PC). The dependent variables measured were the number of hits (correct detection of a target), the number of false alarms (clicking the mouse on a non-target), and the detection time. For each subject, the variables (for example, number of hits) were summed or averaged over all images shown to the subject. We are interested in how the fusion methods affected the different dependent variables. The comparison between fusion methods was done using a one-way analysis of variance (ANOVA).
3.2.1 Total hits
The number of times the observer clicked the mouse on a target, out of the number of targets in each image, was averaged over all images shown to each observer. In a target detection task, we prefer methods that increase the number of hits. The results of the analysis, with the hit percentage average as the dependent variable, are shown in figure 3-7. As can be seen in the graph, the principal components and the false color methods showed the best results for the number of detections, the average method had a lower number of detections, and the edge method showed the worst results.
Figure 3-7 – Effect of fusion method on the hit percentage average (vertical bars denote 0.95 confidence intervals)
The analysis showed a significant effect of fusion method {F(3,52) = 7.394; P < 0.01}. A post-hoc Fisher LSD analysis indicated that the edge method was responsible for this effect: all pairwise comparisons of the edge method with the other methods produced a significant effect (see Appendix 3). No significant differences were found between the other methods. In order to examine the interaction between the fusion method and the image quality, we asked five people to rate the general image quality. We then divided the images into two groups of high and low quality based on this rating. It was found that all of the images rated as good by the viewers had one of their input bands from the near infrared spectrum, which can explain why people perceived them as better. A two-way analysis of variance was conducted in order to examine, across all participants, the influence of the image quality and fusion method on the probability of a hit. A significant effect was found for image quality {F(1,52) = 12.216; P < 0.01}, indicating that there was a significant difference between the two groups. In addition, a significant interaction effect was found between the fusion method and the image quality {F(3,52) = 3.87; P < 0.05}. Figure 3-8 displays this interaction. Means of the hit percentage rate are presented for the fusion methods for high and low quality images.
Figure 3-8 – Hit percentage – fusion method X image quality. Means for high and low image quality in each fusion method (vertical bars denote 0.95 confidence intervals)
As shown in figure 3-8, the interaction between the quality and the fusion method stems mainly from the difference in the average method. While the probability of hit increased from the low to the high quality images for the PC, FC, and edge methods, it decreased for the average method. Also, the PC and FC methods were the best methods for the high quality images, while the average method was better for the low quality images.
To further investigate this interaction, we performed a one-way ANOVA on each quality group separately. The low quality group showed a main effect of the method variable {F(1,52) = 10.83; P < 0.01}. A post-hoc Fisher LSD analysis showed that this effect stems from the difference between the edge method and all other methods (see Appendix 3). There was also a significant main effect for the high quality group {F(1,52) = 3.98; P < 0.05}. The post-hoc Fisher LSD analysis showed that this effect stems from the difference between the edge method and the PC and FC methods. Furthermore, the difference between the average and the PC methods was marginally significant (p = 0.056). The detailed analysis of these results is presented in Appendix 3.
3.2.2 False alarms
A false alarm was recorded when the participant clicked on the image, thinking there was a target, in a place with no actual target. The number of false alarms was summed over all images for each participant. We wished to identify which methods not only give a better detection rate (probability of hit) but also have a minimal error rate (number of false alarms). Figure 3-9 displays the false alarm means of the different groups. More mistakes were made in the false color and edge groups, while the principal components and the average groups showed fewer false alarms. However, these differences were not statistically significant {F(3,52) = 0.4283; NS}.
Figure 3-9 – Mean number of false alarms per fusion group
3.2.3 Detection time
Detection time is the time taken to detect a target. Better fusion methods will help the observer detect the targets and will therefore yield faster detection times. There may be targets which can be detected with all fusion methods, but some fusion methods emphasize a target's features, causing the target to be detected faster. The detection time from the last click (or from the start of a new image, for the first click) was measured for all hits and averaged over all images for each participant. Figure 3-10 displays the mean target detection time of each group. The time taken to detect a target using the false color method was higher than in the other groups, which had similar detection times. A one-way ANOVA with the average detection time as the dependent variable did not show a significant effect {F(3,52) = 0.3125; NS}.
Figure 3-10 – Mean target detection time per fusion group
3.3 Discussion
The results for the number of hits showed the best performance for the principal components and false color methods, and the worst for the edge method; however, only the lower hit rate of the edge method was found to be statistically significant. When looking at the image quality factor, we see that the PC and FC methods showed the best hit percentage results for images in the good quality group. The common denominator for this group, in contrast to the bad quality group, was that all images in this group had a short wave IR band as one of the input bands. The short wave IR frequencies, being close to the visible frequencies, show a more regular representation of the scene, and therefore images fused with a short wave IR band as an input band were rated better. We can therefore conclude that when fusing a short wave IR band with a medium or long wave IR band, it is best to use the PC or FC methods. Because the difference between the groups was the visibility, we hypothesize that fusion of bands from the visible spectrum with medium or long IR bands will act in the same way. Nevertheless, investigating this issue is a topic for further research. When fusing medium and long IR bands, the PC, FC, and average methods showed about the same hit percentage results, while the edge method showed significantly lower results. The results for false alarm rates showed the best results (fewest false alarms) for the principal components and average methods, but the differences were not statistically significant. The results for target detection time showed the worst results for the false color method, but these differences were also not statistically significant.
4 Paired Comparisons
In a paired comparisons experiment, conducted according to the guidelines of the method of paired comparisons (Thurstone 1918, in Torgerson, 1967), several subjects compare two image samples to each other. The percentage of time one sample is preferred over the other is used as an index of the relative quality of the two samples. This method of pairwise comparisons generates reliable data about the relative subjective quality of the two images (Silverstein and Farrell, 2000). By comparing all combinations of methods, we can then scale all methods with scalar values on one single scale. By applying this procedure, we can put the participants' subjective assessments of the benefit of the different fusion methods for the target detection task on one comparable scale.
4.1 Method
4.1.1 Participants
Participants in the experiment were the same 56 students from Ben-Gurion University who participated in the target detection experiment. They first performed the target detection experiment, and then the paired comparisons experiment. The target detection experiment was performed first so that the participants would not have prior knowledge of the targets' locations; in the present experiment the targets were shown to the participants, so the target detection experiment did not affect it. The average age of the participants was 25.5, with a standard deviation of 2.3. They all had normal or corrected-to-normal vision (minimum 6/9 Snellen visual acuity) and were checked to have normal contrast vision using the Pelli-Robson contrast chart. All participants reported having normal color vision. All participants were naïve to the purpose of the experiment.
4.1.2 Apparatus
Stimuli were presented to the participants on a color 17” LCD screen via a Pentium 4 PC. Display resolution was 1024x768 with a frame refresh rate of 85 Hz. Participant-to-screen distance was approximately 40 cm. The experiment took place in a small, quiet, darkened room to minimize peripheral distraction. Observers were instructed to adjust their distance to the computer and to place the mouse in their preferred position, to enable them to view the screen clearly and to easily manipulate the mouse.
4.1.3 Stimuli
Stimuli were twenty-six "base" multispectral images with different input bands (visible, short IR, mid IR, and long IR) obtained from the Israeli Ministry of Defense. Some images showed the same scene but had different input bands; altogether there were nine different scenes. Each scene contained two to five military vehicles in a complex rural background. During post-processing each image was spatially registered. The registered images were then normalized in contrast using adaptive histogram equalization (Gonzalez and Woods, 1992). Each image was fused using the four fusion methods described previously (false color, principal components, edge, and average).
4.1.4 Procedure
We used a qualitative experimental setting according to the guidelines of the method of paired comparison (Silverstein and Farrell, 2000). We chose nineteen multispectral images (seven images were screened out due to bad registration) and fused them using the four fusion methods mentioned (principal components, false color, edge, average). We organized the fused images into pairs, each containing the exact same multispectral image fused with two different fusion methods. All possible combinations of methods between two images of the same multispectral image were used. Preliminary explanations and three example pairs were shown as a familiarization exercise before the start of the experiment trials. Then, each observer judged 114 pairs of images (19 images X 6 comparison combinations of 4 fusion methods) as they were displayed side by side on the screen by a simple Visual Basic application. The order of presentation of the image pairs for each observer, as well as the side on which each method appeared in each comparison (right or left), was random. The observers' task was to choose which of the two images was preferable for detecting the embedded target by clicking with the computer mouse directly on the preferred image (not necessarily on the target). In each trial, the target was marked by a surrounding circle during the first second of the trial to indicate the target to the observer. An example of a paired comparison with a marked target, as presented to the observers, can be seen in figure 4-1. Immediately after the decision was made, the next pair of images appeared on the screen for consideration. Observers were allowed as much time as necessary to make their decisions.
Figure 4-1 – Example of a paired comparison between the average and principal components methods
4.2 Results + Discussion
One of the common methods to analyze experiments performed according to the method of paired comparisons is the Law of Comparative Judgment (LCJ). The law of comparative judgment gives us the analytical tools to quantify the proportion of time any given stimulus j is judged greater on a given attribute than any other stimulus k. Applying this law in the current research gives us a set of equations relating the proportion of time an image I(i,j) is judged better than another image I(i,k) in terms of ease of detecting the marked target, where i is the image index and j, k are the fusion method indices. Given a set of samples, each representing a different fusion method, that are judged against each other pairwise across a set of subjects, an N x N matrix can be compiled, where N is the number of fusion methods. Each element Cij represents the number of times fusion method i was judged better on the criterion checked (ease of detecting the marked target) than method j. The matrix summing the results of all subjects, each subject judging 19 images, is shown in table 4-1. Cell Cij (where i is the row and j is the column) represents the number of times method i was preferred over method j. For example, the False Color method was preferred over the Principal Components method in 617 separate comparisons.
Method                  Principal Components   False color   Edge   Average
Principal components    -                      447           525    475
False Color             617                    -             651    578
Edge                    539                    413           -      470
Average                 589                    486           594    -

Table 4-1 – Sum of comparative judgments of all images. Cell Cij represents the number of times method i was preferred over method j
The matrix showing the percentage of judgments in which each method was preferred is presented in table 4-2. Each cell Cij in table 4-2 represents the overall percentage of times method i was preferred over method j. For example, the False Color method was preferred over the Principal Components method 57.99% of the time. The matrix is symmetrical about the diagonal in the sense that Cij and Cji sum to 100%, because each comparison presented to the user forces a decision between the two methods.
Method                  Principal Components   False color   Edge    Average
Principal components    -                      42.01         49.34   44.64
False Color             57.99                  -             61.18   54.32
Edge                    50.66                  38.82         -       44.17
Average                 55.36                  45.68         55.83   -

Table 4-2 – Percentage of comparative judgments of all images. Cell Cij represents the percentage of times method i was preferred over method j
Thurstone's case V method of comparative judgment (Thurstone 1959, in Silverstein and Farrell 2001) can be applied to determine the relative qualities of all the samples if:
a. Each sample has a single value that describes its quality qi
b. Each observer estimates the quality of this sample with a value from a normal distribution around this quality
c. Each sample has the same perceptual variance
d. Each comparison is independent
In the current experiment, we assume that the quality of each fusion method can be quantified in a single value. In addition, we can also assume that observers' estimation of this quality is normally distributed. The different fusion methods use the same input bands and the same images; therefore we can also assume that they have the same perceptual variance. Finally, according to the experimental settings, each comparison is independent.
Given these assumptions, the quality of each sample i can be described by a scalar value qi with units of standard deviations of preference. The distance between two samples, d′ij, can be estimated by the following equation:

(1)  d′ij = qi − qj ≈ √2 · Z[ Cij / (Cij + Cji) ]
where Z is the inverse cumulative-normal function (Z-score).
Using equation 1, we can calculate the distance between each pair of methods in standard-deviation-of-preference units. The resulting distances between the methods are presented in table 4-3. A positive value in cell Cij indicates that method i is preferable over method j, while a negative value indicates that method j is preferable over method i. The table's diagonal antisymmetry stems from the fact that the distance between methods A and B is the same, with opposite sign, as the distance between methods B and A.
Method                  Principal Components   False color   Edge     Average
Principal components    -                      -0.285        -0.023   -0.190
False Color             +0.285                 -             +0.402   +0.153
Edge                    +0.023                 -0.402        -        -0.207
Average                 +0.190                 -0.153        +0.207   -

Table 4-3 – Distance between each pair of methods in standard-deviation-of-preference units
Based on the assumption that each sample has a single value that can describe its quality qi, all samples can be placed on a one-dimensional quality line. We can estimate the distance of a sample from the mean of all samples by taking the mean distance between that sample and all other samples. This can be described by the following equation:

d′i,mean ≈ ( Σj d′ij ) / N
Using this equation, we can calculate the scale values of the fusion methods. This scale is presented numerically in table 4-4 and graphically in figure 4-2.
Method        Principal Components   False color   Edge     Average
Scale value   -0.166                 +0.28         -0.195    +0.244

Table 4-4 – Scale value of the different fusion methods
Figure 4-2 – Scale value by the law of comparative judgment for the different fusion methods (Principal components, False color, Edge, Average)
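As an illustration of the scaling procedure, the short Python sketch below converts the preference counts of Table 4-1 into Case V distances (equation 1) and per-method scale values. It is our own hedged reconstruction rather than the analysis code used in this study: the use of scipy for the inverse cumulative normal and the convention of averaging each method's distances over the other three methods are assumptions, so the printed values may differ slightly from Table 4-4 depending on the exact averaging and rounding conventions.

```python
import numpy as np
from scipy.stats import norm

methods = ["Principal components", "False color", "Edge", "Average"]
# counts[i, j] = times method i was preferred over method j (Table 4-1)
counts = np.array([
    [  0, 447, 525, 475],
    [617,   0, 651, 578],
    [539, 413,   0, 470],
    [589, 486, 594,   0],
], dtype=float)

# proportion of comparisons in which method i was preferred over method j;
# the identity added to the denominator only prevents division by zero on the diagonal
p = counts / (counts + counts.T + np.eye(len(methods)))

# Case V distances: d'_ij = q_i - q_j ~ sqrt(2) * Z(p_ij); diagonal forced to zero
p_safe = np.where(np.eye(len(methods), dtype=bool), 0.5, p)
d = np.sqrt(2.0) * norm.ppf(p_safe)

# scale value of each method: mean distance to the other methods (assumed convention)
scale = d.sum(axis=1) / (len(methods) - 1)
for name, value in zip(methods, scale):
    print(f"{name:22s} {value:+.3f}")
```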
The results show, with very slight differences, that the false color and average methods give the best results, while the principal components and the edge methods give worse results. There is almost no difference between the false color and the average methods on the overall scale, but looking at the specific comparison (table 4-2) we can see that the false color method is preferable over the average method. There is almost no difference between the principal components and the edge methods, neither on the overall scale nor in the specific comparison.
5 Eye Tracking
Eye tracking is a technology used to determine where a person is looking. A special camera captures the person's eye and tracks its movements. Cognitive scientists record eye movements to understand the cognitive processes that occur when an observer is searching for a target. Eye movement data can indicate where the eye fixated when searching a scene (Maltz and Shinar, 1999, Findlay, 1997), helping us to understand the search process. The eye does not usually move smoothly across a scene; it moves via a series of jumps called saccades. Saccades are rapid motions causing a desired portion of the scene to fall on the fovea. Once a saccade starts it is not possible to change its destination or path, and the visual system is largely suppressed during the saccade. Usually, a saccade is followed by a fixation. As a person looks at a scene, he or she takes in the information through a series of fixations, pauses of the eye lasting between 200-600 ms while the observer examines part of the stimulus. Eye movements propel the eye from one fixation to the next. These eye movements are necessary if we are to see all of the details of the scene, because a single fixation would reveal only the details near where we are looking. According to Hochberg (1970), these eye movements also have another purpose: the information they take in about the different parts of the scene is used to create a mental map of the scene through a process of integration. In our experiment, we used eye tracking to attempt to understand what drew the attention of the observers during the target detection task, and thus to compare the different fusion methods. We hypothesized that the better the fusion method, the more fixations the observer would make on the targets. We can also measure how many times the subject visited the target before making a decision, and the fixation duration until the decision.
5.1 Method
5.1.1 Participants
Participants in the experiment were 33 students from the Department of Industrial Engineering and Management at Ben-Gurion University with no experience in target detection. The average age of the participants was 24.7, with a standard deviation of 3.1. They all had normal or corrected-to-normal vision (minimum 6/9 Snellen visual acuity), and none wore eyeglasses. They were checked to have normal contrast vision using the Pelli-Robson contrast chart and reported having normal color vision. All participants were naïve to the purpose of the experiment.
5.1.2 Apparatus
Stimuli were presented to the participants on a color 17” LCD screen via a Pentium 4 PC. Display resolution was 1024x768 with a frame refresh rate of 85 Hz. Participant-to-screen distance was approximately 70 cm. The experiment took place in a small, quiet, darkened room to minimize peripheral distraction and to keep the participants' pupils from becoming too small. Observers were instructed to place the mouse in their preferred position, to enable them to view the screen clearly and to easily manipulate the mouse. Participants' eye movements were tracked using an Applied Science Laboratories (Bedford, MA) model 504 eye tracking device. The device utilizes the pupil-to-corneal reflection technique for measuring eye movements. In this technique, a pan-tilt camera is used to capture eye movements. Surrounding the camera's lens are infra-red LEDs that illuminate the eye, allowing the camera to record the pupil as a bright disk and to capture the corneal reflection. The system calculates eye position from the measured locations of the pupil and the corneal reflection. Because the pupil and the corneal reflection move differently with respect to each other as the angle of gaze changes, simultaneous tracking of the two helps the system distinguish between eye and head movements: head movements (the two features move together) can be distinguished from eye movements (the two features move apart). A magnetic head tracker (MHT) receiver translates head movement information received from the transmitter (located immediately behind the participant's head) and sensor (placed via a headband above the participant's dominant eye). The MHT helps the system track strong head movements that are beyond the scope of the pan-tilt camera. The system samples eye data at 50 Hz. The manufacturer claims a system accuracy of 1° (equivalent, at a distance of 70 cm, to 40 pixels) after calibration.
5.1.3 Stimuli
Stimuli for this experiment were based on the eight fused images used in Experiment 1. Composite images were prepared from the source images by applying two different fusion methods to a source image, such that the right side of the image was fused with one method and the left side with another. We used the principal components, false color, and average fusion methods, which showed the best results in the target detection experiment. Eight images were processed, producing six composite
stimuli for each image (3 fusion methods giving 3 paired combinations, each in two right/left configurations). An example of a composite image, compiled from the principal components method on the left side and the average method on the right side, is presented in figure 5-1.
Figure 5-1 – Example of a composite image. The left side of the image was fused using the PC method, and the right side was fused using the average method.
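For illustration, the following Python sketch shows one way such split-half composite stimuli could be assembled from two fused versions of the same registered scene. The function name and the random placeholder images are ours, not part of the original experimental software.

```python
import numpy as np

def make_composite(left_fused, right_fused):
    """Split-half composite: left half from one fused image, right half from another.

    Both inputs are fused images of the same registered scene with identical shape.
    """
    if left_fused.shape != right_fused.shape:
        raise ValueError("fused images must have the same shape")
    mid = left_fused.shape[1] // 2
    composite = right_fused.copy()
    composite[:, :mid] = left_fused[:, :mid]   # take the left half from the first method
    return composite

# placeholder arrays standing in for PC-fused and average-fused versions of one scene
pc_image = np.random.rand(768, 1024, 3)
avg_image = np.random.rand(768, 1024, 3)
stim_pc_left = make_composite(pc_image, avg_image)    # PC on the left, average on the right
stim_pc_right = make_composite(avg_image, pc_image)   # counterbalanced configuration
```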
5.1.4 Procedure
The observers were randomly assigned to three subject groups; grouping was based on a unique combination of two of the three fusion methods (principal components – false color, principal components – average, false color – average). Within each group, subjects were randomly assigned to one of two sub-groups, I or II. Sub-group I was presented with images 1, 3, 5, and 7 with the first fusion method on the right side and the second fusion method on the left side (for example, the left side of the image fused with the average method and the right side with the PC method), and with images 2, 4, 6, and 8 with the first fusion method on the left side and the second fusion method on the
right side (PC on the left and average on the right). The opposite right/left configuration was presented to sub-group II. This was done to neutralize any tendency of an observer to look at one side more than the other. The course of the experiment was similar to the target detection experiment. The observer was requested to find and click as quickly as possible on all the targets present in the scene while avoiding false alarms. Preliminary explanations were presented to the observer, as well as an image containing patches of targets from images fused by the subject's fusion methods. In addition, one image was shown as a familiarization exercise before the start of the experiment trials. Each trial started with a dialog window prompting the user to indicate when he was ready to start the trial. After 20 seconds, the trial ended and a new dialog window was presented, starting a new trial. Altogether, there were eight trials for each observer; in each trial a different composite image of a different scene was presented to the observer. The order of the images was identical for all observers. Observers were told to perform the target acquisition as quickly as possible, and the performance score was tied to speed as well as accuracy in target detection. Observers' eye movements during the target search were recorded, as well as target detection time and accuracy. Eye movement data were analyzed using the ASL Eyenal software bundled with the system.
5.2 Results
We analyzed each group separately (Average-PC, Average-FC, PC-FC). In each group, each participant viewed all eight composite images.
Since the images were not balanced target-wise, for each image we compared measurements of the participants who viewed the left side fused with one method with those of the participants who viewed the left side fused with the other method. The same comparison was performed for the right side of the image. In this way, we compared performance in the same areas of an image for the different methods. In addition, by comparing sides of the images, we were able to avoid skewing of the results by any subject's tendency to stray to the right or to the left side of the display. We also analyzed the number and duration of fixations on the targets for the two fusion methods. The dependent variables were the number of fixations, the total fixation duration, and the mean fixation duration for each image half. The independent variables were the image (1-8), the side of the image (left, right), and the fusion method used (Average, Principal Components, and False Color).
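As an illustration of how these per-half measures can be derived from fixation records, the sketch below groups fixations by image and image half and computes the fixation count, total duration, mean duration, and fixation count percentage. The column names (image, x, duration_ms) and the 1024-pixel image width are illustrative assumptions and do not reflect the actual Eyenal export format.

```python
import numpy as np
import pandas as pd

def half_metrics(fixations, image_width=1024):
    """Per-image-half fixation measures from a table of fixation records.

    `fixations` is expected to hold one row per fixation with columns
    image, x (horizontal position in pixels), and duration_ms; these column
    names are illustrative, not the actual Eyenal export format.
    """
    fix = fixations.copy()
    fix["side"] = np.where(fix["x"] < image_width / 2, "left", "right")
    grouped = fix.groupby(["image", "side"])["duration_ms"]
    out = pd.DataFrame({
        "fixation_count": grouped.size(),
        "total_duration_ms": grouped.sum(),
        "mean_duration_ms": grouped.mean(),
    })
    # fixation count percentage: share of the image's fixations falling in each half
    per_image_total = out.groupby(level="image")["fixation_count"].transform("sum")
    out["fixation_count_pct"] = 100.0 * out["fixation_count"] / per_image_total
    return out
```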
5.2.1 Average vs. False Color
5.2.1.1 Fixation Count
A three-way analysis of variance (ANOVA) of fixation count percentage, with group as a between-subject variable and image and side as within-subject variables, was conducted in order to examine, across all participants, images, and sides of images, the influence of the fusion method on fixation count. Image (8) and side (2) were treated as repeated measures. Fixation count percentage is the percentage of fixations in an area out of the total number of fixations a user made in the image. Figure 5-2 displays means of fixation count percentage for the Average and FC methods on each side (left, right) over
all images. It can be seen that the FC method had more fixations overall, on both the left side and the right side of the images. There was a significant main effect of fusion method {F(1,124) = 15.811; p < 0.05}, and no significant interaction with image side.
Figure 5-2 – Fixation count percentage of the average and FC methods on each image side
The three-way ANOVA (group(2) X side(2) X image(8)) on the fixation count percentage yielded a marginally significant interaction between group and image {F(7,124) = 2.23; p = 0.058}. Figure 5-3 shows the fixation count percentage for each image for the average and FC fusion methods. As can be seen in the graph, images 1 and 4 do not follow the general pattern of the FC method having more fixations than the average method. Since only two of the eight images did not follow the general pattern, and since the interaction was not statistically significant, this indicates that the main effect of method stemmed from the difference between the means of the two groups.
Figure 5-3 – Effect of fusion method on the fixation count percentage for each image
The mean fixation duration is the average time a single fixation took. A three way ANOVA (image(8) X side(2) X method(2)) conducted on the mean fixation durations revealed no significant effect or interaction.
5.2.1.2 Target fixation time
The total amount of time fixated on the targets in each image was recorded. The time fixated on the targets tells us how salient the target was, even if the observer did not click on it with the mouse. We hypothesize that the better the fusion method, the more time the observer fixates on the target. A one-way ANOVA computed on the targets' summed fixation time yielded a marginally significant effect of method {F(1,34) = 3.03; P = 0.083}. Observers viewing the FC method averaged 0.62 seconds of fixation duration on a target, while observers viewing the average method averaged 0.33 seconds. These differences are presented in figure 5-4.
Figure 5-4 – Average sum of fixation durations on targets for the average and FC methods
5.2.2 Average vs. Principal Components
5.2.2.1 Fixation count
Figure 5-5 displays the means of the fixation count percentage on each image side for the average and PC fusion methods. It can be seen in the graph that the principal components method had more fixations than the average method on both sides. A three-way ANOVA (image(8) X side(2) X method(2)) on fixation count percentage showed significantly more fixations for the PC fusion method, with a mean fixation count percentage of 51.6% of the total number of fixations in the image, compared with 42.1% for the average fusion method {F(1,100) = 8.245; P < 0.05}.
Figure 5-5 – Fixation count percentage of the average and PC methods on each image side
The three-way ANOVA (group(2) X side(2) X image(8)) on the fixation count percentage yielded a significant interaction between group and image {F(7,100) = 8.745; p < 0.05}, showing that the effect of the fusion method on the fixation count percentage differed from image to image. Figure 5-6 displays this effect. As seen in the figure, images 1, 2, 4, 7, and 8 have more fixations for the PC method than for the average method, while images 4, 5, and 6 have about the same number of fixations for both fusion methods. The main effect between the methods stems from the overall difference, which is strongest in images 1 and 7. Because for each image the PC method showed the same number of fixations as or more fixations than the average method, we can relate the main effect to the difference between the two groups.
Figure 5-6 – Effect of fusion method on the fixation count percentage for each image
A three way ANOVA (image(8) X side(2) X method(2)) conducted on the mean fixation durations revealed no significant effect or interaction.
5.2.2.2 Target fixation time
A one-way ANOVA conducted on the targets' summed fixation time did not yield a significant effect of method {F(1,31) = 1.42; p = 0.204}, although the mean fixation duration on each target was greater for the PC fusion method (0.86) than for the average fusion method (0.53). The means for the two methods are presented in figure 5-7.
Figure 5-7 – Average sum of fixation durations on targets for the average and PC methods
5.2.3 False Color vs. Principal Components
5.2.3.1 Fixation count
A three-way ANOVA (image(8) X side(2) X method(2)) on fixation count percentage yielded a significant main effect of method {F(1,128) = 7.65; P < 0.05}, indicating that areas fused with the PC method received more fixations than areas fused with the FC method. Figure 5-8 presents the mean fixation count percentage for the PC and FC methods on each image side. The analysis revealed no significant interaction effects.
Figure 5-8 – Fixation count percentage of the PC and FC methods on each image side
A three way ANOVA (image(8) X side(2) X method(2)) conducted on the mean fixation durations revealed no significant effect or interaction.
5.2.3.2 Target fixation time
A one-way ANOVA conducted on the targets' summed fixation time yielded a marginally significant effect of method {F(1,51) = 3.26; p = 0.077}. The means for the two methods are presented in figure 5-9. We can see from the graph that the mean fixation duration on each target was greater for the PC method (0.87) than for the FC method (0.56). It is interesting to note that although this effect was only marginally significant, it follows the same tendency as the significant effect found for the fixation count.
Figure 5-9 – Average sum of fixation durations on targets for the FC and PC methods
5.3 Discussion
Mackworth and Morandi (1967) divided two color photographs into 64 square regions. The regions were then rated according to their informativeness by
a group of observers who were asked to rate the regions according to how easy it would be to recognize the region again. A group of viewers then examined the pictures and were asked to decide which image they preferred. The number of fixations in each region was found to be related to informativeness rating of the region. Henderson and Hollingworth (in Underwood, 1998) agreed with Mackworth and Morandi and argued that the total time a region is fixated is correlated with the number of fixations in that region. Because fixation density is higher for informative regions, not only the number of fixations, but also the amount of time viewers fixate on an area is dependent on the informativeness of that area; the more information the area contains, the more time a viewer will fixate on that area. (Henderson and Hollingworth 1998, in Underwood, 1998). We can therefore hypothesize that the more a viewer fixates on a region fused with a specific method, or on targets fused with that method, the more informative that region or that target would be. Thus, a fusion method which shows more fixations makes the features and objects in the image more informative, and therefore, the targets shown in that image would be more detectable.
When comparing the average and FC methods, image sides showing the FC method had significantly more fixations per image than identical image sides showing the average method. Targets in the false color fused areas had marginally significantly more fixations than targets in the average fused areas. When comparing the average and PC methods, areas showing the PC method had significantly more fixations per image than areas showing the average method.
Targets in the PC areas had more fixations than the targets in the average areas, although this effect was not significant. The comparison of the PC and the FC methods showed that image sides fused with the PC method had significantly more fixations per image than image sides fused with the FC method. Targets in the PC sides had more fixations than targets in the FC sides; this effect was marginally significant. A summary of these statistical results is presented in table 5-1.
Measure                      Avg. vs. FC   Avg. vs. PC   PC vs. FC
Fixation count percentage    FC(s)         PC(s)         PC(s)
Target fixation duration     FC(m)         PC            PC(m)
Average fixation duration    none          none          none

Table 5-1 – Summary of eye-tracking statistical results. (s) denotes a statistically significant result; (m) denotes a marginally significant result.
We can see in table 5-1 that for each comparison of two fusion methods, there is a significant effect for the number of fixations in one fused side. Fixations on the targets, on the other hand, while strongly following the tendency of that effect, show only a marginally significant effect. The main reason, in our opinion, that this tendency was only marginally significant despite the large difference between the group means, was the large differences between the subjects. Because of the complex task of searching IR images, not all observers were found to fixate on all targets, and on the more salient
targets, observers had a large variability in the number of fixations. In addition, according to the manufacturer, the eye-tracking system's accuracy is 1° (equivalent to 40 pixels at a distance of 70 cm). This error mainly affects small targets and can cause the data to be highly variable. The overall average fixation duration did not differ significantly in any of the comparisons. However, as can be seen in table 5-1, the fixation count percentage was significant in all comparisons. Thus, since there is no difference between the mean durations of individual fixations, we can conclude that the higher number of fixations caused a longer dwell time in the more fixated areas. Summarizing these results, the PC method had more fixations and dwell time than both the FC and average methods, FC had more fixations and dwell time than average, and the average fusion method had the smallest number of fixations. We can therefore conclude that the fusion method which causes the images to be most informative and most strongly attracts the eye is the principal components method, followed by the false color and the average methods.
6 Discussion
This research focused on comparing different multispectral fusion methods for a target detection task. The aim of this study was to take well known image fusion schemes and to design special qualitative and quantitative measures to compare them. In addition, we introduced a novel method for multispectral fusion based on edge detection, and compared this method to other known methods. With the increased availability of multispectral data in the last ten years, many image fusion schemes have been introduced. Human factors experiments have been used to show that these fusion methods improve human performance in various visual tasks over the use of the single input bands (Krebs et al., 1999, Essock et al., 1999, Krebs et al., 2001). Although it has been established that in many cases the use of fused imagery improves human performance, there have not been many studies that have tried to examine which fusion method should be used under which circumstances. The studies that did compare different fusion methods did so while trying to show their advantages over the input bands, and did not emphasize the differences between the fusion methods. We examined 26 different multispectral images and fused them using four different fusion methods. Three of the four methods were based on known methods used in many studies, and one was a novel method based on edge detection. A set of three experiments was then conducted in order to qualitatively and quantitatively compare the different methods, and to better understand the advantages and disadvantages of each method compared to the others.
In order to quantitatively examine which method is best, fused images of different scenes were presented to observers. Observers were assigned to different groups according to the fusion method. We compared detection accuracy and detection times of targets embedded in the fused images of the different fusion groups. In order to qualitatively compare the fusion methods, we conducted a paired-comparison experiment, in which several observers compared two fusion methods. Pairwise subjective comparisons of the same image fused with the different fusion methods were evaluated and transformed to scalar values on a single scale. The method of paired comparisons is widely applied in social and economic statistical research (Clarke et al., 1999, Huang and Stoll, 1996). Here, applied in a different context, we use it to compare different image samples. To gain further insight, eye movements were recorded while the observers searched for targets. We used a novel method to compare the different fusion methods: the images viewed by the observers were divided in two, the left side fused with one fusion method and the right side fused with another. By comparing the eye movement data of the same area of the same image under different fusion methods, we were able to check which method attracted the eye more and which method provided more visual information. The main motivation behind this experimental method was that it is hard to measure precise eye fixation locations on small areas of interest. The declared system accuracy of the manufacturer is 1°, which at a distance of 70 cm is equivalent to 40 pixels. Since this is the length of some of the small targets in our images, and about half the length of the bigger targets, it is hard to reliably measure eye fixations on the targets.
Thus, it is difficult to reliably define where the person is fixating. Our solution to this problem was to define bigger areas of interest (e.g., to divide the image into two parts), and to compare viewing patterns in the same areas of interest of the same images of different fusion methods. In this way, we were also able to measure immediate differences between fusion methods, since observers were able to look at two fusion methods simultaneously.
6.1 Result Summary
The results of the target detection experiment showed a significant difference in target hits: the edge method had significantly fewer hits than the other methods. In addition, the results showed some advantage of the PC and FC methods over the average method. Further investigation showed a difference between high quality and low quality images: the FC and PC methods were better for the high quality images, while the average, FC, and PC fusion methods were about the same, and better than the edge method, for the low quality group. In the paired comparison experiment, the results reflected the subjective opinions of the observers: in their opinion, the false color and average fusion methods helped them detect the targets better than the edge and the principal components methods did. The eye tracking experiment results showed a significant difference in the number of fixations and the dwell time observers spent looking at image sides and targets fused with each method. The results indicated that the PC method had more fixations than both the FC and the average methods, indicating that the PC method causes images to be more
visually informative than either the FC or the average methods. The FC method had more fixations than the average method. Table 6-1 ranks the fusion methods according to the results of the three experiments.

Experiment / method    Principal components   False color   Edge    Average
Target detection       1                      1             4       3
Paired comparisons     3                      1             3       1
Eye tracking           1                      2             -       3
Total                  5                      4             7(*)    7

Table 6-1 – Ranks of the four fusion methods for all experiments. 1-4 denotes the rank of the method in the experiment, where 1 is the highest performance and 4 is the lowest. * denotes a missing experiment
6.2 Results Discussion
As seen in table 6-1, the edge fusion method did not perform well. It ranked last in the target detection experiment, with significantly lower detection rates, and last, together with the PC method, in the subjective opinion of the observers. Because of these rankings we did not use this method in the eye-tracking experiment. Edge enhancement is a method which strengthens information about the edges of displayed objects: it finds the edge information of one input band and adds it to the other input band. A possible explanation for the low performance of the edge method is the low contrast of many of the input images used in our experiment. Edge enhancement works best if the contrast between a target and its surroundings is high, making the target's edges clear. Some of our images had low resolution. In these images, the contrast,
especially of the mid and long IR input bands, was low, which made it hard for the edge detection algorithm to find the exact edges of the targets. The detection of these targets in the other methods is based more upon the recognition of their shape than on their exact edges. In addition, the edge method uses features (edge information) which are extracted from one image and added to another image. Because of the low contrast in some of the images, these edge features were incomplete and may have added clutter to the edge image. According to the feature integration theory, in complex search tasks, where a target is defined by a conjunction of features, a serial attention process is needed to locate the target; thus, the more features the distractors have, the more time the search will take (Treisman and Gelade, 1980). Adding clutter to an image adds another feature layer which, when distributed all over the image, can decrease the observers' detection performance. Another possible explanation for the edge method's poor performance is the fusion level. While the other methods fuse the input bands at the pixel level, combining the representations of the target from all input bands, the feature-level edge method chooses one band and adds features (edge information) from the other bands. While the edge information adds to the contour of the target, the detection of a target depends on the recognition of its shape, not necessarily on the recognition of its contour. Figure 6-1 a-d displays two input bands of an image and the corresponding average and edge fused images. We can see that the edge fused image is based on band 1: this method takes the target's body almost completely from band 1 and adds information from band 2. The average method
on the other hand, takes the target’s body from both images, making the content of the target more uniform, and the shape of the target more salient.
[Figure 6-1: four image panels - (a) Band 1, (b) Band 2, (c) Average, (d) Edge]
Figure 6-1 – Example of a multispectral image. Panels (a) and (b) show the input bands, while (c) and (d) show the average- and edge-fused images, respectively. The target is marked by an ellipse in the input bands.
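The distinction between pixel-level and feature-level fusion can be made concrete with a short sketch. The code below is only an illustration under assumed details (Sobel gradients stand in for whatever edge detector was actually used, and the 0.5 weight is arbitrary); it is not the implementation used to build the stimuli.

```python
import numpy as np
from scipy import ndimage

def average_fusion(band1, band2):
    """Pixel-level fusion: every output pixel mixes both input bands."""
    return (band1.astype(float) + band2.astype(float)) / 2.0

def edge_fusion(base_band, edge_band, weight=0.5):
    """Feature-level fusion (illustrative): keep one band as the base image
    and add edge features (gradient magnitude) extracted from the other band."""
    base = base_band.astype(float)
    grad_x = ndimage.sobel(edge_band.astype(float), axis=1)
    grad_y = ndimage.sobel(edge_band.astype(float), axis=0)
    edges = np.hypot(grad_x, grad_y)
    edges /= edges.max() + 1e-9                  # normalize edge map to [0, 1]
    fused = base / (base.max() + 1e-9) + weight * edges
    return np.clip(fused / (fused.max() + 1e-9), 0.0, 1.0)
```

When the edge band has low contrast, the extracted edge map is weak and noisy, and adding it to the base band mainly contributes clutter, which mirrors the behavior discussed above.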
Other results, as seen in Table 6-1, indicate that over all experiments the FC method showed better performance than the average method. These results are consistent with the findings of Simard et al. (1999), who found that the FC algorithm is superior to the average algorithm.
The PC method showed better results than the average method in both the target detection and the eye-tracking experiments. Yet in the paired comparisons experiment, people preferred the average method over the PC method. It is interesting to note that although the quantitative experiments indicate that the PC method is better, people perceived the average method as better. A possible explanation is that the strong, unnatural colors the PC method generated led people to prefer the more natural-looking average representation. The false color method, while also using unnatural colors, used brighter ones.
Comparing the FC and the PC methods across the different experiments showed that the FC method was preferred in the paired comparisons experiment, the PC method showed better performance in the eye-tracking experiment, and the two performed about the same in the target detection experiment. Overall, we cannot prefer one method over the other. These results support the finding of McCarley and Krebs (2000), who did not find a substantial difference between the FC and PC methods.
Comparing our results to previous studies, we see that, similar to the finding of Simard et al. (1999), the FC fusion method was better than the average fusion method. Krebs et al. (2001, 2002) showed that the PC fusion method performed equal to or better than each of the input bands. We add to these findings by showing that the PC fusion method also performed better than the simple average fusion method. In addition, similar to the findings of McCarley and Krebs (2000), we did not find a significant difference between the FC and PC methods.
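For readers unfamiliar with the method, a minimal sketch of principal components fusion is given below. The component-to-channel mapping is an arbitrary choice made for illustration; the mapping used to generate the experimental stimuli is not restated here.

```python
import numpy as np

def pc_fusion(band1, band2):
    """Illustrative principal components fusion of two co-registered bands.
    Each pixel's two band values are projected onto the principal components
    of the band covariance; the PC scores are then shown through color channels."""
    x = np.stack([band1.ravel().astype(float), band2.ravel().astype(float)])
    x -= x.mean(axis=1, keepdims=True)            # center each band
    _, eigvecs = np.linalg.eigh(np.cov(x))        # 2x2 covariance, ascending eigenvalues
    pcs = eigvecs[:, ::-1].T @ x                  # row 0 = PC1 scores, row 1 = PC2 scores

    def stretch(v):                               # rescale a component to [0, 1]
        return (v - v.min()) / (v.max() - v.min() + 1e-9)

    pc1 = stretch(pcs[0]).reshape(band1.shape)
    pc2 = stretch(pcs[1]).reshape(band1.shape)
    # Arbitrary example mapping: PC1 drives red, PC2 drives green, blue stays empty.
    return np.dstack([pc1, pc2, np.zeros_like(pc1)])
```

Because the components are statistical rather than physical quantities, the resulting colors need not correspond to anything in the scene, which is consistent with the "strong, unnatural colors" observers reported.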
6.2.1 Chromatic vs. achromatic fusion

Our results showed better performance for the chromatic fusion methods (FC, PC) than for the achromatic fusion methods (average, edge). These results are in accordance with other studies (e.g., Krebs et al., 2001; Waxman et al., 1996; Sinai et al., 1999; Toet et al., 1997) which have shown some advantage of color over achromatic fusion.
Color can help explain the results of the eye-tracking experiment, in which the FC and PC fusion methods performed better than the average method. Color adds another dimension to visual informativeness and thus guides the eye during visual search. Krebs et al. (2001) recorded the eye movements of observers viewing multispectral images. They found that chromatic fused images produced shorter scan paths to the target than achromatic fused images, and argued that color-fused targets provided more information and therefore helped observers find the target faster. As in the eye-tracking experiment, the color methods also performed better than the achromatic methods in the target detection experiment. These results are not surprising: Krebs et al. (2001) have shown that both spatial and color target attributes assist in visual search, and Essock et al. (1999) suggested that color in complex scenes aids the perceptual organization required for visual search.
It should be noted that while other studies, which have shown a possible contribution of color to fusion, help explain our results, we cannot claim that our study substantiates the advantages of chromatic fusion. Because the chromatic and achromatic fusion methods we used had different spatial content, the differences in performance can be attributed to the spatial content and not necessarily to the color.
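For comparison with the achromatic methods, a generic false-color mapping simply routes the input bands into different display channels. The sketch below uses one common, simple assignment chosen for illustration; it is not the specific mapping (such as that of Toet and Walraven, 1996) used for the stimuli.

```python
import numpy as np

def false_color_fusion(band1, band2):
    """Generic two-band false-color mapping, for illustration only:
    band 1 drives the red channel, band 2 the green channel, their mean the blue."""
    def stretch(v):
        v = v.astype(float)
        return (v - v.min()) / (v.max() - v.min() + 1e-9)

    red = stretch(band1)
    green = stretch(band2)
    blue = (red + green) / 2.0
    return np.dstack([red, green, blue])   # H x W x 3 image with values in [0, 1]
```

Note that such a mapping changes only how the band information is displayed, not which spatial detail survives fusion, which is why the spatial-content caveat above still applies.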
6.3 Limitations and future research

The input multispectral images we used were given to us as is. We could not control the input bands, the scenes, or the targets. This dictated parts of our experiment. For example, in the target detection experiment we showed each observer eight different images. It would have been better to show each observer more images, but the images we had contained only eight different scenes. Using more images, or alternatively more observers, might have allowed the trend in the observers' fixations on targets in the eye-tracking experiment to reach significance.
This study compared four different fusion methods. Of these four, three (PC, FC and average) have frequently been used in human factors research. Because of limited resources, we did not compare them to the opponent-processing fusion method or to any multiscale-decomposition-based method. Future studies could include these methods and compare them with the methods examined here to get a more complete picture of the different fusion methods.
The present work focused on the task of target detection. Other tasks, like situation awareness, recognition and identification, and different search tasks, will not necessarily follow the trends found in this study. Further research could compare different fusion methods for different visual perception tasks.
A suggestion for future research would be to develop a general predictive model of target detection performance based on fusion method. This study suggested three ways of comparing different fusion methods for perceptual tasks. Nevertheless, since there are many fusion methods, it is impossible to compare them all across all tasks and environmental conditions of the input images. The methods introduced here could be used as part of a model that analyzes the relative efficiency of a specific fusion method. Such a model should provide a way to rate a specific fusion method for different perceptual tasks and have clear guidelines regarding the characteristics of the input images and the targets. Building such a model would provide a consistent paradigm for comparing the existing fusion methods, as well as yet-to-be-developed fusion methods.
7. Conclusions

Four fusion methods were compared in this study over different input bands, using three different experiments, for a target detection task. The results indicated that the false color and principal components fusion methods showed the best results over all experiments and conditions. A novel method based on edge detection did not yield good results.
We introduced two new methodologies for comparing fused images. In the first, paired comparisons were used to subjectively compare fusion methods according to observers' preferences and then to place the quality of each method on a single scale of scalar values. In the second, a novel approach to using eye tracking for comparing different image samples was introduced, in which the images are divided into two areas and eye movements in each area are compared between different image samples. This approach helps solve the inherent problems of analyzing fixations on small targets in eye-tracking data.
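To illustrate the scaling step, the sketch below converts a matrix of paired-comparison preference counts into interval-scale values in the style of Thurstone's Case V (see Torgerson, 1967). The counts are hypothetical and the exact scaling variant used in the experiment may differ.

```python
import numpy as np
from scipy.stats import norm

def paired_comparison_scale(wins):
    """wins[i][j] = number of times method i was preferred over method j.
    Returns one interval-scale value per method, in the style of Thurstone's
    Case V; an illustration of the scaling idea, not the exact variant used."""
    wins = np.asarray(wins, dtype=float)
    totals = wins + wins.T
    p = np.where(totals > 0, wins / np.where(totals > 0, totals, 1), 0.5)  # preference proportions
    p = np.clip(p, 0.01, 0.99)          # keep z-scores finite for unanimous preferences
    z = norm.ppf(p)                     # proportion preferred -> standard normal deviate
    np.fill_diagonal(z, 0.0)
    return z.mean(axis=1)               # average deviate per method = scale value

# Hypothetical preference counts for four methods (Avg, Edge, FC, PC), illustration only.
wins = [[0, 30, 12, 20],
        [10, 0, 8, 15],
        [28, 32, 0, 24],
        [20, 25, 16, 0]]
print(paired_comparison_scale(wins))
```

The resulting values place all methods on one interval scale, so differences between methods, and not just their rank order, can be compared.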
There have been many studies showing the advantage of image fusion over the component bands. These studies have established that fusion is beneficial in many cases and have introduced numerous fusion methods. The natural next step was to ask which fusion method should be used under which circumstances. This study tried to give a first perspective on this question. To investigate this issue further, comparison studies like this one should be conducted between different methods, input bands, and tasks.
Ultimately, answers to this question will provide us with a more complete understanding of how to fuse multispectral images for various tasks.
8. References

1. Aguilar, M., Fay, D.A., Ross, W.D., Waxman, A.M., Ireland, D.B., Racamato, J.P. (1998). Real-time fusion of low-light CCD and uncooled IR imagery for color night vision. Proceedings of the SPIE, vol. 3364, pp. 24-35.
2. Burt, P.J., Kolczynski, R.J. (1993). Enhanced image capture through fusion. Proceedings of the 4th International Conference on Computer Vision, pp. 173-182. IEEE Computer Society.
3. Caefer, C.E., Rotman, S.R., Silverman, J., Yip, P.W. (2002). Algorithms for point target detection in hyperspectral imagery. Imaging Spectrometry VIII, Sylvia S. Shen (Ed.), Proceedings of SPIE, vol. 4816, pp. 242-257.
4. Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), 679-698.
5. Clarke, A., Bell, P.A., Peterson, G. (1999). The influence of attitude priming and social responsibility on the valuation of environmental public goods using paired comparisons. Environment and Behavior, 31(6), 838-857.
6. Endsley, M. (1998). Design and evaluation of situation awareness enhancement. Proceedings of the Human Factors Society 32nd Annual Meeting, pp. 97-101.
7. Essock, E.A., Sinai, M.J., McCarley, J.S., Krebs, W.K., DeFord, J.K. (1999). Perceptual ability with real-world nighttime scenes: image-intensified, infrared and fused-color imagery. Human Factors, 41(3), 438-452.
8. Findlay, J.M. (1997). Saccade target selection in visual search. Vision Research, 37, 617-631.
9. Fishbain, B. (2004). Data fusion methods for fusing thermal and visual range video sequences. Master's thesis, Tel Aviv University.
10. Goldstein, B.E. (1996). Sensation and Perception. Pacific Grove, CA: Brooks/Cole Publishing Company.
11. Gomez, R.B., Jazaeri, A., Kafatos, M. (2001). Wavelet-based hyperspectral and multispectral image fusion. Geo-Spatial Image and Data Exploitation II, William E. Roper (Ed.), Proceedings of SPIE, vol. 4383, pp. 36-42.
12. Gonzalez, R.C., Woods, R.E. (1992). Digital Image Processing. Addison-Wesley.
13. Hahn, M., Samadzadegan, F. (2004). A study of image fusion techniques in remote sensing. Proceedings of the XXth ISPRS Congress, Commission IV papers, Vol. XXXV, part B4.
14. Huang, R.D., Stoll, H.R. (1996). Dealer versus auction markets: a paired comparison of execution costs on NASDAQ and the NYSE. Journal of Financial Economics, 41, 313-357.
15. Hughes, P.K., Creed, D.J. (1994). Eye movement behaviour viewing colour-coded and monochrome avionic displays. Ergonomics, 37, 1871-1884.
16. Krebs, W.K., McCarley, J.M., Kozek, T., Miller, G.M., Sinai, M.S., Werblin, F.S. (1999). An evaluation of a sensor fusion system to improve drivers' nighttime detection of road hazards. Proceedings of the 43rd Annual Meeting of the Human Factors and Ergonomics Society, 43, 1333-1337.
17. Krebs, W.K., Scribner, D.A., McCarley, J.S. (2001). Comparing behavioral receiver operating characteristic curves to multidimensional matched filters. Optical Engineering, 40(9), 1818-1826.
18. Krebs, W.K., Sinai, M.J. (2002). Psychophysical assessments of image-sensor fused imagery. Human Factors, 44(2), 257-271.
19. Luo, R.C., Yih, C., Su, K.L. (2002). Multisensor fusion and integration: approaches, applications, and future research directions. IEEE Sensors Journal, 2(2).
20. Maltz, M., Shinar, D. (1999). Eye movements of younger and older drivers. Human Factors, 41, 15-25.
21. Maltz, M., Shinar, D. (2003). New alternative methods of analyzing human behavior in cued target acquisition. Human Factors, 45(2), 281-295.
22. McCarley, J.S., Krebs, W.K. (2000). Visibility of road hazards in thermal, visible, and sensor-fused night-time imagery. Applied Ergonomics, 31, 523-530.
23. Raviv, O., Rotman, S.R. An improved filter for point target detection in multi-dimensional imagery. Signal and Data Processing of Small Targets 2003, Sylvia S. Shen (Ed.), Proceedings of SPIE, vol. 5159, in print.
24. Sampson, M.T., Krebs, W.K., Scribner, D.A., Essock, E.A. (1996). Visual search in natural (visible, infrared, and fused visible and infrared) stimuli. Investigative Ophthalmology and Visual Science (Suppl.), 36, 1362, Ft. Lauderdale, FL.
25. Scheunders, P., De Backer, S. (2001). Fusion and merging of multispectral images with use of multiscale fundamental forms. Journal of the Optical Society of America A, 18(10), 2468.
26. Sinai, M.J., McCarley, J.S., Krebs, W.K., Essock, E.A. (1999). Psychophysical comparisons of single- and dual-band fused imagery. Proceedings of the SPIE Synthetic Advanced Vision, 3691, 1-8.
27. Silverstein, D.A., Farrell, J.E. (2001). Efficient method for paired comparison. Journal of Electronic Imaging, 10(2), 394-398.
28. Simard, P., Link, N.K., Kruk, R.V. (1999). Feature detection performance with fused synthetic and sensor images. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, 1108-1112.
29. Singh, S., Gyaourova, A., Bebis, G., Pavlidis, I. (2004). Infrared and visible image fusion for face recognition. Proceedings of SPIE, vol. 5404, 585-596.
30. Steele, P.M., Perconti, P. (1997). Part task investigation of multispectral image fusion using gray scale and synthetic color night vision sensor imagery for helicopter pilotage. Targets and Backgrounds: Characterization and Representation III, Wendell R. Watkins, Dieter Clement (Eds.), Proceedings of SPIE, vol. 3062, pp. 88-100.
31. Toet, A. (1992). Multiscale contrast enhancement with applications to image fusion. Optical Engineering, 31, 1026-1031.
32. Toet, A., Walraven, J. (1996). New false color mapping for image fusion. Optical Engineering, 35, 650-658.
33. Toet, A., IJspeert, J.K., Waxman, A.M., Aguilar, M. (1997). Fusion of visible and thermal imagery improves situational awareness. Proceedings of SPIE, 3088, 177-188.
34. Toet, A., Bijl, P., Kooi, F.L., Valeton, J.M. (1998). A high-resolution image dataset for testing search and detection models. Report TNO-TM-98-A020, TNO Human Factors Research Institute, Soesterberg, The Netherlands.
35. Toet, A., Franken, M. (2003). Perceptual evaluation of different image fusion schemes. Displays, 24, 25-37.
36. Torgerson, W.S. (1967). Theory and Methods of Scaling, chapter 10. John Wiley & Sons Inc.
37. Treisman, A.M., Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
38. Underwood, G. (2001). Eye Guidance in Reading and Scene Perception. Elsevier Science Ltd, Oxford.
39. Waxman, A.M., Gove, A.N., Seibert, M.C., Fay, D.A., Carrick, J.E., Racamato, J.P., Savoye, E.D., Burke, B.E., Riech, R.K., McGonagle, W.H., Craig, D.M. (1996). Progress on color night vision: visible/IR fusion, perception and search, and low-light CCD imaging. Proceedings of SPIE, 2736, 96-107.
40. Waxman, A.M., Aguilar, M., Baxter, R.A., Fay, D.A., Ireland, D.B., Racamato, J.P., Ross, W.D. (1998). Opponent-color fusion of multi-sensor imagery: visible, IR and SAR. Proceedings of IRIS Passive Sensors, Vol. 1, pp. 43-61.
41. White, B.L. (1998). Evaluation of the Impact of Multispectral Image Fusion on Human Performance in Global Scene Processing. Master's thesis, Naval Postgraduate School, Monterey, CA.
42. Wickens, C.D., Hollands, J.G. (2000). Engineering Psychology and Human Performance. Prentice-Hall Inc., New Jersey.
43. Xue, Z., Blum, R.S. (2003). Concealed weapon detection using color image fusion. Proceedings of the Sixth International Conference on Information Fusion, 622-627.
44. Zhang, Z., Blum, R.S. (1999). A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proceedings of the IEEE, 87(8), 1315-1326.
45. Zhang, Z., Blum, R.S. (1997). Multisensor image fusion using a region-based wavelet transform approach.
9. Appendices
Appendix 1 – Familiarization images that were given before the target detection experiment

[Example displays: Average, Edge, False Color, and the target detection screen]
Appendix 2 – Target detection statistical results

One-way ANOVA and Fisher LSD post-hoc tests of the effect of fusion method on total hit numbers
Univariate tests of significance for total hits (Spreadsheet13); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   10395.88  1                 10395.88  1155.979  0.000000
method      199.48    3                 66.49     7.394     0.000324
Error       467.64    52                8.99
LSD test; variable: total hits (Spreadsheet13). Probabilities for post-hoc tests (Error: Between MS = 8.9931, df = 52.000):

method              {1} average  {2} edge  {3} false color  {4} principal comp.
1 average           -            0.005474  0.191498         0.317984
2 edge              0.005474     -         0.000097         0.000271
3 false color       0.191498     0.000097  -                0.753954
4 principal comp.   0.317984     0.000271  0.753954         -
One-way ANOVA test of the effect of fusion method on false alarm numbers
Univariate tests of significance for false alarm (target detection base); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   3844.571  1                 3844.571  159.3695  0.000000
method      31.000    3                 10.333    0.4283    0.733528
Error       1254.429  52                24.124
[Plot: LS means of false alarm count for each fusion method (average, edge, false color, principal components). Current effect: F(3, 52)=.42835, p=.73353. Vertical bars denote 0.95 confidence intervals.]
One-way ANOVA test of the effect of fusion method on average detection time
Univariate tests of significance for avg. time to detection (target detection base); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   753.1859  1                 753.1859  538.6120  0.000000
method      1.3109    3                 0.4370    0.3125    0.816267
Error       72.7159   52                1.3984
[Plot: LS means of average time to detection for each fusion method (average, edge, false color, principal components). Current effect: F(3, 52)=.31248, p=.81627. Vertical bars denote 0.95 confidence intervals.]
Two-way ANOVA test of Quality X Method
Repeated measures analysis of variance (target detection full); sigma-restricted parameterization, effective hypothesis decomposition:

Effect            SS        Degr. of freedom  MS        F         p
Intercept         26.82101  1                 26.82101  1080.336  0.000000
method            0.75104   3                 0.25035   10.084    0.000024
Error             1.29098   52                0.02483
QUALITY           0.18824   1                 0.18824   12.216    0.000978
QUALITY*method    0.17893   3                 0.05964   3.871     0.014232
Error             0.80128   52                0.01541
Fisher LSD post-hoc analysis of the main effect of fusion method on good quality images
LSD test; variable: good quality (target detection full). Probabilities for post-hoc tests (Error: Between MS = .01841, df = 52.000):

method              {1} average  {2} edge  {3} false color  {4} principal comp.
1 average           -            0.293116  0.111453         0.056576
2 edge              0.293116     -         0.009805         0.004000
3 false color       0.111453     0.009805  -                0.742121
4 principal comp.   0.056576     0.004000  0.742121         -
Analysis of variance of the main effect of fusion method on bad quality images
Univariate tests of significance for bad quality (target detection full); sigma-restricted parameterization, effective hypothesis decomposition:

Effect      SS        Degr. of freedom  MS        F         p
Intercept   15.75161  1                 15.75161  855.6592  0.000000
method      0.22020   3                 0.07340   3.9873    0.012467
Error       0.95725   52                0.01841
Appendix 3 – Paired comparisons decision time

The time taken to choose one image modality over the other can be taken as an indication of the observer's certainty in the choice: the longer it takes to choose between two methods, the less certain the observer is of his or her choice. The following graph shows the average decision time for each comparison pair. A one-way ANOVA was conducted to examine the effect of the comparison type on decision time; the graph and the analysis below show this test.
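A minimal sketch of this analysis is shown below, assuming the decision times have already been grouped by comparison pair. The values are hypothetical, and scipy's between-groups one-way ANOVA stands in for the repeated-measures analysis actually reported; it only illustrates the shape of the computation.

```python
from scipy import stats

# Decision times in seconds for each comparison pair (hypothetical values,
# shown only to make the structure of the analysis concrete).
decision_times = {
    "PC - FC":    [5.1, 4.8, 5.3, 5.0],
    "PC - Edge":  [4.2, 4.5, 4.0, 4.4],
    "PC - Avg":   [4.6, 4.4, 4.9, 4.5],
    "FC - Edge":  [4.1, 3.9, 4.3, 4.0],
    "FC - Avg":   [5.0, 5.2, 4.7, 5.1],
    "Edge - Avg": [3.6, 3.5, 3.8, 3.7],
}

f_stat, p_value = stats.f_oneway(*decision_times.values())
print(f"Comparison type effect: F = {f_stat:.3f}, p = {p_value:.5f}")
```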
[Plot: LS means of decision time for each comparison pair (PC - FC, PC - Edge, PC - Avg, FC - Edge, FC - Avg, Edge - Avg). Current effect: F(5, 275)=6.5253, p=.00001. Vertical bars denote 0.95 confidence intervals.]
LSD test; variable: DV_1 (Spreadsheet1). Probabilities for post-hoc tests (Error: Within MS = .40239, df = 275.00). Columns {1}-{6} are labeled var2-var7 in the original output; they correspond to the comparison pairs in the order plotted above (PC-FC, PC-Edge, PC-Avg, FC-Edge, FC-Avg, Edge-Avg):

pair           {1}       {2}       {3}       {4}       {5}       {6}
{1} PC - FC    -         0.005818  0.031995  0.000199  0.466980  0.000003
{2} PC - Edge  0.005818  -         0.533052  0.322034  0.041197  0.046104
{3} PC - Avg   0.031995  0.533052  -         0.107195  0.154704  0.009080
{4} FC - Edge  0.000199  0.322034  0.107195  -         0.002567  0.312696
{5} FC - Avg   0.466980  0.041197  0.154704  0.002567  -         0.000065
{6} Edge - Avg 0.000003  0.046104  0.009080  0.312696  0.000065  -
It can be seen from the graph and the post-hoc analysis that the decision times for the PC-FC and FC-Average pairs were the largest; these were the pairs that were hardest for the observers to decide between. The Edge-Average decision time, on the other hand, was significantly lower than most of the other decision times, so it was easier for the observers to choose between these two image types.
Appendix 4 – Eye tracking statistical results
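For orientation, the dependent variable in the tables below, fix_cnt_pct, is the percentage of fixations falling on a given side of the image. The sketch below shows how such a measure could be computed from raw fixation coordinates; the left/right split at the image midline and the data layout are assumptions made for illustration only.

```python
def side_fixation_percentages(fixations, image_width):
    """fixations: list of (x, y) fixation coordinates for one trial.
    Returns the percentage of fixations that fell on the left and right image
    halves (analogous to fix_cnt_pct in the tables below)."""
    if not fixations:
        return {"left": 0.0, "right": 0.0}
    left = sum(1 for x, _ in fixations if x < image_width / 2)
    total = len(fixations)
    return {"left": 100.0 * left / total, "right": 100.0 * (total - left) / total}

# Example with five hypothetical fixations on a 640-pixel-wide image.
print(side_fixation_percentages([(100, 50), (200, 80), (400, 90), (500, 60), (610, 70)], 640))
```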
Average-FC target comparison

Fixations on sides

Model dimension (dependent variable: fix_cnt_pct)
Effect              Number of levels  Covariance structure  Number of parameters  Subject variables
Intercept           1                 -                     1
grp                 2                 -                     1
pic                 8                 -                     7
side                2                 -                     1
grp * pic           16                -                     7
grp * side          4                 -                     1
grp * pic * side    32                -                     14
trial (repeated)    16                Diagonal              16                    sub
Total               81                                      48
Number of subjects: 10

Information criteria (smaller is better; dependent variable: fix_cnt_pct)
-2 Restricted Log Likelihood            993.557
Akaike's Information Criterion (AIC)    1025.557
Hurvich and Tsai's Criterion (AICC)     1030.641
Bozdogan's Criterion (CAIC)             1086.681
Schwarz's Bayesian Criterion (BIC)      1070.681
Type III tests of fixed effects (dependent variable: fix_cnt_pct)
Source              Numerator df  Denominator df  F         Sig.
Intercept           1             78.835          2104.252  .000
grp                 1             78.835          15.811    .000
pic                 7             31.708          .147      .993
side                1             78.835          14.157    .000
grp * pic           7             31.708          2.235     .058
grp * side          1             78.835          .019      .891
grp * pic * side    14            29.066          3.680     .001
Estimated marginal means
1. grp (dependent variable: fix_cnt_pct)
grp   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg   43.310  1.429       73.838  40.463        46.158
FC    51.531  1.494       76.860  48.556        54.507
2. grp * pic (dependent variable: fix_cnt_pct)
grp  pic  Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    48.394  6.677       15.771  34.223        62.565
Avg  2    42.520  4.774       11.882  32.106        52.934
Avg  3    44.012  2.317       15.796  39.094        48.929
Avg  4    47.792  4.952       18.312  37.400        58.183
Avg  5    40.986  3.258       15.739  34.071        47.901
Avg  6    38.382  2.366       9.035   33.033        43.731
Avg  7    44.789  2.801       15.929  38.848        50.730
Avg  8    39.607  3.060       9.737   32.764        46.449
FC   1    47.573  6.744       15.156  33.211        61.934
FC   2    53.064  5.302       12.889  41.600        64.528
FC   3    50.549  2.407       13.605  45.371        55.726
FC   4    41.688  4.973       15.808  31.134        52.241
FC   5    54.268  3.375       16.560  47.133        61.403
FC   6    56.258  2.478       10.638  50.782        61.733
FC   7    52.474  2.807       15.300  46.503        58.446
FC   8    56.377  3.656       18.905  48.723        64.032
3. grp * side (dependent variable: fix_cnt_pct)
grp  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  left   47.057  2.020       40.147  42.975        51.139
Avg  right  39.563  2.022       34.181  35.455        43.671
FC   left   55.563  2.113       42.955  51.301        59.826
FC   right  47.499  2.113       34.764  43.209        51.790
4. grp * pic * side (dependent variable: fix_cnt_pct)
grp  pic  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    left   51.626  9.542       7.911   29.580        73.672
Avg  1    right  45.162  9.342       7.865   23.554        66.769
Avg  2    left   41.391  5.737       6.828   27.755        55.026
Avg  2    right  43.650  7.633       6.274   25.169        62.130
Avg  3    left   46.746  2.980       7.786   39.840        53.651
Avg  3    right  41.278  3.549       8.319   33.148        49.407
Avg  4    left   45.942  6.690       7.183   30.203        61.680
Avg  4    right  49.642  7.303       11.536  33.658        65.626
Avg  5    left   50.303  3.734       14.739  42.332        58.275
Avg  5    right  31.669  5.339       8.022   19.363        43.975
Avg  6    left   45.262  4.549       7.779   34.719        55.805
Avg  6    right  31.503  1.302       6.759   28.403        34.603
Avg  7    left   40.095  4.125       9.134   30.783        49.406
Avg  7    right  49.484  3.791       6.851   40.479        58.488
Avg  8    left   55.094  5.645       7.209   41.825        68.364
Avg  8    right  24.119  2.363       9.799   18.839        29.399
FC   1    left   52.049  8.534       7.911   32.330        71.767
FC   1    right  43.097  10.445      7.865   18.939        67.255
FC   2    left   53.090  8.113       6.828   33.807        72.374
FC   2    right  53.038  6.827       6.274   36.508        69.568
FC   3    left   55.973  2.640       8.093   49.898        62.049
FC   3    right  45.124  4.026       7.847   35.807        54.441
FC   4    left   42.326  6.603       11.657  27.892        56.759
FC   4    right  41.049  7.439       6.714   23.306        58.793
FC   5    left   60.890  5.356       7.980   48.533        73.247
FC   5    right  47.646  4.107       12.818  38.760        56.532
FC   6    left   62.466  4.816       9.536   51.665        73.267
FC   6    right  50.049  1.168       6.677   47.261        52.838
FC   7    left   47.289  3.998       6.660   37.736        56.842
FC   7    right  57.660  3.940       9.086   48.760        66.559
FC   8    left   70.424  5.363       12.230  58.763        82.084
FC   8    right  42.331  4.970       7.303   30.677        53.986
Fixations on targets
Means by method; weighted means (FC - Avg targets). Current effect: F(1, 30)=3.2024, p=.08363. Effective hypothesis decomposition.
method  sum fix dur (mean)  std. error  -95% CI   +95% CI   N
1 Avg   0.319913            0.103035    0.111672  0.528154  23
2 FC    0.626895            0.113363    0.397780  0.856010  19

Univariate tests of significance for sum fix dur (FC - Avg targets); sigma-restricted parameterization, Type II decomposition:
Effect      SS        Degr. of freedom  MS        F         p
Intercept   1.493229  1                 1.493229  5.624364  0.024329
method      0.850213  1                 0.850213  3.202394  0.083631
image       0.726146  7                 0.103735  0.390727  0.900358
side        0.268973  3                 0.089658  0.337703  0.798176
Error       7.964786  30                0.265493
Average-PC target comparison

Fixations on sides

Model dimension (dependent variable: fix_cnt_pct)
Effect              Number of levels  Covariance structure  Number of parameters  Subject variables
Intercept           1                 -                     1
grp                 2                 -                     1
pic                 8                 -                     7
side                2                 -                     1
grp * pic           16                -                     7
grp * side          4                 -                     1
pic * side          16                -                     7
grp * pic * side    32                -                     7
trial (repeated)    16                Diagonal              16                    sub
Total               97                                      48
Number of subjects: 9

Information criteria (smaller is better; dependent variable: fix_cnt_pct)
-2 Restricted Log Likelihood            833.892
Akaike's Information Criterion (AIC)    865.892
Hurvich and Tsai's Criterion (AICC)     872.446
Bozdogan's Criterion (CAIC)             923.575
Schwarz's Bayesian Criterion (BIC)      907.575
Type III tests of fixed effects (dependent variable: fix_cnt_pct)
Source              Numerator df  Denominator df  F         Sig.
Intercept           1             65.980          1578.738  .000
grp                 1             65.980          16.061    .000
pic                 7             27.206          1.077     .405
side                1             65.980          21.745    .000
grp * pic           7             27.206          4.471     .002
grp * side          1             65.980          .202      .654
pic * side          7             27.206          4.500     .002
grp * pic * side    7             27.206          .541      .796
Estimated marginal means
1. grp (dependent variable: fix_cnt_pct)
grp   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg   42.159  1.601       32.949  38.901        45.417
PC    51.617  1.734       33.380  48.091        55.143

Pairwise comparisons (dependent variable: fix_cnt_pct)
(I) grp  (J) grp  Mean difference (I-J)  Std. Error  df      Sig.  95% CI for difference
Avg      PC       -9.458*                2.360       65.980  .000  -14.171 to -4.746
PC       Avg      9.458*                 2.360       65.980  .000  4.746 to 14.171
Based on estimated marginal means. * The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Tests of simple effect (dependent variable: fix_cnt_pct)
Numerator df  Denominator df  F       Sig.
1             65.980          16.061  .000
The F tests the effect of grp, based on the linearly independent pairwise comparisons among the estimated marginal means.
2. grp * pic (dependent variable: fix_cnt_pct)
grp  pic  Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    37.857  5.605       5.741   23.990        51.723
Avg  2    43.386  7.164       5.301   25.280        61.491
Avg  3    45.072  3.661       3.019   33.463        56.682
Avg  4    38.561  3.447       10.827  30.959        46.164
Avg  5    48.839  4.283       18.988  39.875        57.803
Avg  6    49.853  3.735       12.137  41.726        57.980
Avg  7    28.999  3.833       11.109  20.573        37.426
Avg  8    44.702  3.017       5.892   37.287        52.117
PC   1    59.105  3.045       4.700   51.124        67.086
PC   2    52.329  7.026       4.848   34.096        70.563
PC   3    38.482  5.933       13.978  25.754        51.210
PC   4    46.313  4.007       11.610  37.550        55.076
PC   5    48.655  4.287       9.122   38.976        58.334
PC   6    50.771  5.690       20.034  38.903        62.639
PC   7    64.074  4.583       11.116  54.000        74.147
PC   8    53.208  3.273       5.544   45.036        61.380
3. grp * side (dependent variable: fix_cnt_pct)
grp  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  left   47.131  2.336       28.207  42.348        51.913
Avg  right  37.186  2.191       34.689  32.736        41.637
PC   left   57.650  2.481       34.083  52.609        62.691
PC   right  45.584  2.423       26.213  40.606        50.562
4. grp * pic * side (dependent variable: fix_cnt_pct)
grp  pic  side   Mean    Std. Error  df      95% CI lower  95% CI upper
Avg  1    left   54.727  7.339       5.741   36.572        72.883
Avg  1    right  20.986  8.474       5.741   0.022         41.950
Avg  2    left   55.792  11.699      5.301   26.225        85.358
Avg  2    right  30.980  8.273       5.301   10.073        51.886
Avg  3    left   38.849  4.295       2.792   24.587        53.111
Avg  3    right  51.296  5.930       3.135   32.876        69.716
Avg  4    left   36.642  5.100       11.149  25.435        47.850
Avg  4    right  40.480  4.639       7.478   29.651        51.309
Avg  5    left   51.053  5.625       10.738  38.636        63.469
Avg  5    right  46.625  6.460       14.650  32.828        60.422
Avg  6    left   55.210  5.580       9.912   42.762        67.657
Avg  6    right  44.496  4.966       9.631   33.374        55.619
Avg  7    left   31.372  5.354       6.579   18.547        44.197
Avg  7    right  26.627  5.487       15.640  14.972        38.281
Avg  8    left   53.402  4.741       6.843   42.140        64.665
Avg  8    right  36.001  3.732       4.542   26.109        45.894
PC   1    left   72.446  4.604       4.700   60.380        84.512
PC   1    right  45.763  3.987       4.700   35.314        56.213
PC   2    left   69.492  8.113       4.848   48.438        90.547
PC   2    right  35.167  11.474      4.848   5.391         64.942
PC   3    left   30.523  10.209      10.634  7.959         53.087
PC   3    right  46.441  6.050       11.489  33.194        59.689
PC   4    left   52.833  5.736       7.613   39.489        66.178
PC   4    right  39.793  5.597       12.350  27.636        51.949
PC   5    left   55.430  6.405       11.928  41.466        69.394
PC   5    right  41.879  5.702       5.965   27.908        55.851
PC   6    left   51.775  7.836       10.410  34.409        69.142
PC   6    right  49.767  8.253       16.355  32.303        67.231
PC   7    left   65.936  7.047       12.980  50.709        81.163
PC   7    right  62.211  5.860       7.198   48.431        75.991
PC   8    left   62.768  4.200       3.832   50.903        74.633
PC   8    right  43.649  5.022       7.209   31.843        55.454
Fixations on targets
Means by method
method  sum fix dur (mean)  std. error  -95% CI   +95% CI   N
1 Avg   0.530550            0.185941    0.154748  0.906352  20
2 PC    0.862227            0.177288    0.503915  1.220540  22

Univariate tests of significance for sum fix dur (PC - Avg targets); sigma-restricted parameterization, effective hypothesis decomposition:
Effect      SS        Degr. of freedom  MS        F         p
Intercept   20.32201  1                 20.32201  29.38898  0.000003
method      1.15248   1                 1.15248   1.66668   0.204115
Error       27.65936  40                0.69148
FC-PC target comparison

Fixations on sides

Model dimension (dependent variable: fix_cnt_pct)
Effect              Number of levels  Covariance structure  Number of parameters  Subject variables
Intercept           1                 -                     1
grp                 2                 -                     1
pic                 8                 -                     7
side                2                 -                     1
grp * pic           16                -                     7
grp * side          4                 -                     1
pic * side          16                -                     7
grp * pic * side    32                -                     7
trial (repeated)    16                Diagonal              16                    sub
Total               97                                      48
Number of subjects: 10

Information criteria (smaller is better; dependent variable: fix_cnt_pct)
-2 Restricted Log Likelihood            1069.622
Akaike's Information Criterion (AIC)    1101.622
Hurvich and Tsai's Criterion (AICC)     1106.523
Bozdogan's Criterion (CAIC)             1163.254
Schwarz's Bayesian Criterion (BIC)      1147.254

Type III tests of fixed effects (dependent variable: fix_cnt_pct)
Source              Numerator df  Denominator df  F         Sig.
Intercept           1             105.293         1785.758  .000
grp                 1             105.293         7.651     .007
pic                 7             28.342          .957      .480
side                1             105.293         3.056     .083
grp * pic           7             28.342          .872      .540
grp * side          1             105.293         .105      .746
pic * side          7             28.342          1.653     .161
grp * pic * side    7             28.342          .254      .967
Estimated marginal means
1. grp (dependent variable: fix_cnt_pct)
grp   Mean    Std. Error  df       95% CI lower  95% CI upper
FC    42.724  1.530       105.293  39.691        45.758
PC    48.709  1.530       105.293  45.676        51.743

Pairwise comparisons (dependent variable: fix_cnt_pct)
(I) grp  (J) grp  Mean difference (I-J)  Std. Error  df       Sig.  95% CI for difference
FC       PC       -5.985*                2.164       105.293  .007  -10.275 to -1.695
PC       FC       5.985*                 2.164       105.293  .007  1.695 to 10.275
Based on estimated marginal means. * The mean difference is significant at the .05 level. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Tests of simple effect (dependent variable: fix_cnt_pct)
Numerator df  Denominator df  F      Sig.
1             105.293         7.651  .007
The F tests the effect of grp, based on the linearly independent pairwise comparisons among the estimated marginal means.
2. grp * pic (dependent variable: fix_cnt_pct)
grp  pic  Mean    Std. Error  df      95% CI lower  95% CI upper
FC   1    42.953  3.182       15.892  36.204        49.702
FC   2    39.980  5.630       15.988  28.045        51.915
FC   3    41.889  3.624       15.341  34.180        49.598
FC   4    38.072  4.299       12.402  28.739        47.404
FC   5    49.566  4.893       14.868  39.129        60.003
FC   6    45.705  4.631       15.688  35.872        55.538
FC   7    42.528  4.699       15.111  32.518        52.537
FC   8    41.103  2.986       15.981  34.771        47.434
PC   1    48.459  3.182       15.892  41.711        55.208
PC   2    52.093  5.630       15.988  40.158        64.028
PC   3    49.289  3.624       15.341  41.580        56.998
PC   4    40.156  4.299       12.402  30.823        49.488
PC   5    45.199  4.893       14.868  34.762        55.636
PC   6    51.769  4.631       15.688  41.936        61.602
PC   7    48.369  4.699       15.111  38.359        58.378
PC   8    54.342  2.986       15.981  48.011        60.674
3. grp * side (dependent variable: fix_cnt_pct)
grp  side   Mean    Std. Error  df      95% CI lower  95% CI upper
FC   left   44.264  2.328       53.997  39.597        48.932
FC   right  41.184  1.986       53.874  37.203        45.166
PC   left   50.952  2.328       53.997  46.284        55.619
PC   right  46.467  1.986       53.874  42.486        50.449
4. grp * pic * side (dependent variable: fix_cnt_pct)
grp  pic  side   Mean    Std. Error  df  95% CI lower  95% CI upper
FC   1    left   48.640  4.681       8   37.845        59.435
FC   1    right  37.266  4.310       8   27.327        47.205
FC   2    left   41.745  8.071       8   23.135        60.356
FC   2    right  38.215  7.851       8   20.111        56.319
FC   3    left   41.188  4.563       8   30.666        51.709
FC   3    right  42.590  5.631       8   29.606        55.575
FC   4    left   46.516  7.541       8   29.127        63.905
FC   4    right  29.627  4.129       8   20.105        39.150
FC   5    left   48.070  7.816       8   30.046        66.094
FC   5    right  51.062  5.888       8   37.484        64.640
FC   6    left   47.400  6.996       8   31.268        63.531
FC   6    right  44.011  6.070       8   30.014        58.008
FC   7    left   38.336  7.407       8   21.254        55.418
FC   7    right  46.720  5.784       8   33.382        60.057
FC   8    left   42.220  4.150       8   32.651        51.789
FC   8    right  39.985  4.296       8   30.079        49.891
PC   1    left   57.521  4.681       8   46.726        68.317
PC   1    right  39.397  4.310       8   29.458        49.337
PC   2    left   53.451  8.071       8   34.840        72.062
PC   2    right  50.735  7.851       8   32.630        68.839
PC   3    left   53.306  4.563       8   42.784        63.828
PC   3    right  45.272  5.631       8   32.288        58.257
PC   4    left   44.097  7.541       8   26.708        61.486
PC   4    right  36.215  4.129       8   26.693        45.737
PC   5    left   45.039  7.816       8   27.015        63.063
PC   5    right  45.359  5.888       8   31.781        58.937
PC   6    left   52.806  6.996       8   36.674        68.938
PC   6    right  50.732  6.070       8   36.735        64.729
PC   7    left   44.006  7.407       8   26.924        61.087
PC   7    right  52.731  5.784       8   39.394        66.069
PC   8    left   57.388  4.150       8   47.818        66.957
PC   8    right  51.296  4.296       8   41.390        61.202
Fixations on targets
Means by method; weighted means (FC - PC targets). Current effect: F(1, 51)=3.2628, p=.07677. Type II decomposition.
method  sum fix dur (mean)  std. error  -95% CI   +95% CI   N
1 PC    0.871857            0.164702    0.537142  1.206572  35
2 FC    0.562429            0.113947    0.328628  0.796229  28

Univariate tests of significance for sum fix dur (FC - PC targets); sigma-restricted parameterization, Type II decomposition:
Effect      SS        Degr. of freedom  MS        F         p
Intercept   33.97246  1                 33.97246  48.38103  0.000000
method      2.29107   1                 2.29107   3.26276   0.076772
image       4.63795   7                 0.66256   0.94358   0.481824
side        1.30799   3                 0.43600   0.62091   0.604704
Error       35.81146  51                0.70219
Appendix 5 – Examples of original and fused images. Targets are marked in the fused average image.

[Images: original band 1; original band 2; fused image using the average method; fused image using the edge method; fused image using the false color method; fused image using the principal components method]
Appendix 6 – List of images used in the experiments

Here is a list of the image names used in the experiments. The names are as they came on the disk from MAFFAT.

Image comparison:
R2_1_d1_d2, R2_1_d1_d3, R2_1_d1_d3_i1, R2_1_D4_1C4_I3, R2_1_D2_I1, R2_2_D1_D3_I1, R2_2_D1_I1, R2_2_D4_1C4_RI3, R4_1_1C2_1D2, R4_1_1C2_1I1, R4_1_1C2_2I1, R4_1_1D4_I3, R4_2_1C2_1D2, R4_2_1D4_1C4_L_S, R4_2_1D4_1C4_R_S, R4_2_2D4_3C4_LI3, R4_2_2D4_LI3, R4_2_2D4_3C4_RI3, R4_2_2D4_RI3

Target detection and eye tracking:
R2_1_D1_D2, R4_1_1C2_1I1, R2_1_D4_1C4_I3, R4_1_1C2_1D2, R2_2_D1_I1, R4_2_1C2_1D2, R4_2_3C2_LI1, R4_2_2D4_RI3