Fusion of Visible, Infrared and 3D LADAR Imagery ∗

David A. Fay, Allen M. Waxman, Jacques G. Verly, Michael I. Braun, Joseph P. Racamato, and Carl Frost
M.I.T. Lincoln Laboratory, Sensor Exploitation Group
Lexington, MA 02420
[email protected]

Abstract - We have extended our previous capabilities for fusion of multiple passive imaging sensors to now include 3D imagery obtained from a prototype flash ladar. Real-time fusion of SWIR + uncooled LWIR and low-light visible + LWIR + 3D LADAR is demonstrated. Fused visualization is achieved by opponent-color neural networks for passive image fusion, which is then textured upon segmented object surfaces derived from the 3D data. An interactive viewer, coded in Java3D, is used to examine the 3D fused scene in stereo. Interactive designation, learning, recognition and search for targets, based on fused passive + 3D signatures, is achieved using Fuzzy ARTMAP neural networks with a Java-coded GUI. A client-server, web-based communication architecture enables remote users to interact with fused 3D imagery via a wireless palmtop computer.

Keywords: Sensor fusion, image fusion, real-time processing, data mining, target recognition, ladar, range data, 3D models.

1 Background

Sensor operators on commercial and military platforms need to quickly make decisions based on information from multiple sources of imagery. As the number of imaging sensors increases, the variety of data increases, but so does the amount of data the operators must process. Fusing the imagery from the different sensors, while enhancing the complementary information, can decrease the operator’s workload and improve performance by increasing target pop-out. The work presented here addresses the issue of combining imagery from passive sensors (low-light visible, SWIR, and LWIR) with range data from an active imaging ladar, to support fused, interactive 3D visualization. In addition, the fused results provide input to an interactive target learning, recognition, and tracking system that can help cue an operator to potential targets of interest.



We build upon work that we have presented over the last several years focusing on the real-time fusion of low-light visible and thermal infrared imagery [1-13]. We have also demonstrated the modification of these techniques to the fusion of imagery from up to six different bands, ranging from the visible to the SWIR, as well as fusion of visible, infrared and SAR (synthetic aperture radar) imagery [14-17]. The addition of range imagery to the fused low-light visible and infrared imagery provides 3D spatial context, greatly enhancing scene understanding. Prior to our introduction of opponent-color image fusion, other methods of image fusion were based upon maximizing image contrasts across multiple scales via pixel comparisons and image blending [18-22]. Human factors testing has shown that the resulting gray-scale fused images do not provide the same degree of target pop-out as our color fused results [13,23]. Other color fusion methods have also shown improvement over the gray-scale fusion methods for target detection, but they do so at the cost of overall visual quality [13,23,24].

Our paper is organized as follows: after the sensors and computing hardware are described, the biological motivations and image fusion system architectures are introduced. Range image cleaning and 3D model generation methods are then explained, followed by examples of multi-sensor fusion for visualization and as input to the target learning and recognition system.

2 Sensors and computing hardware

Our real-time multi-sensor imaging platform is shown in Figure 1. On the left is the brassboard active imaging ladar, developed at Lincoln Laboratory [25,26], which measures range from the sensor to the scene being imaged. The ladar illuminates the scene with a 30-µJ frequency-doubled (532 nm) µchip laser, and then detects the reflected laser light with a 4x4 array of silicon Geiger-mode avalanche photodiode detectors (APDs), which are sensitive enough to detect a single photon of light.

This work was sponsored by the Defense Advanced Research Projects Agency under Air Force contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Figure 1: Multi-sensor imaging platform. The platform supports the active imaging Lincoln Laboratory LADAR, on the left, and the multi-sensor imaging pod, containing four passive sensors, on the right. On the top shelf, the camera on the left is the Lincoln Laboratory low-light CCD and on the right is the Raytheon Amber MWIR imager. The Sensors Unlimited SWIR camera sits on the left side of the bottom shelf, behind the dichroic beam splitter. On the bottom right, behind the mirror, is the Lockheed Martin LWIR imager.

Measuring the time-of-flight of the laser light at each pixel produces the range information. The laser and detector array are scanned using mirrors to create a 148x108 pixel image. This scanning process restricts the ladar to creating a new frame of data once every 1.5 seconds. The ladar has a resolution of 1”x1”x1” at a distance of approximately 200 ft, which is the distance of the objects in the experiments described in this paper. Recent progress at Lincoln Laboratory has led to the development of a new 32x32 APD array imaging at 1 kHz.

The right side of Figure 1 shows the passive sensor pod. On the bottom layer is a dual-sensor configuration consisting of an InGaAs SWIR camera from Sensors Unlimited and an uncooled microbolometer LWIR imager from Lockheed Martin Infrared. The SWIR camera is sensitive from 0.9 to 1.7 microns and outputs 12-bit digital data, while the LWIR imager operates in the range 7 to 13 microns and outputs 15-bit digital data. Both operate at 60 fps (frames per second) with a resolution of 320x240 pixels. The two sensors are optically registered by a dichroic beam splitter that passes the SWIR band while reflecting the LWIR band. These two sensors feed our real-time fusion system, which produces color-fused results at 30 fps.

The second layer of the sensor pod in Figure 1 supports a Lincoln Laboratory low-light CCD (LLCCD) imager of 640x480 resolution, which outputs 12-bit digital data at 30 fps, and an MWIR InSb imager from Raytheon Amber of 256x256 resolution, which outputs 12-bit digital data at 60 fps. The low-light CCD imager is sensitive from 0.4 to 1.0 microns, while the MWIR camera operates from 3 to 5 microns. These two cameras are bore-sighted to each other and to the SWIR / LWIR pod below. These offsets are

compensated for in software at the workstation when the imagery is registered off-line, by making use of the 3D range image produced by the ladar. For the experiments discussed in this paper we used the SWIR / LWIR configuration, shown on the bottom layer of the pod, and later replaced the SWIR sensor with the low-light CCD imager to create an LLCCD / LWIR configuration that is optically registered by mirrors. The MWIR camera was not used for these experiments, but future plans include fusing imagery from all four passive sensors with the range data from the ladar. An example image from each sensor is shown in Figure 2. One important attribute of the SWIR camera is that it does not pick up the dye pigments in clothing, giving it the potential to detect camouflage in the presence of foliage [14,34].

The real-time fusion processor we assembled had to be able to process two 16-bit digital input streams and perform 1.5 billion operations per second. We chose the Matrox Genesis boards because they were based on the powerful TI C80 DSP chips, and their modular architecture made the system easily expandable. The C80 processor consists of one master floating-point processor and four parallel integer processors. The Genesis boards come in two versions: a main board and a co-processor board. The main board contains one C80, 16 MB of SDRAM, two VIA (Video Interface ASIC) chips for independently controlling communications, an optional NOA (Neighborhood Operation Accelerator) chip, an analog/digital grab daughter board, and a video display section. The co-processor board contains two sets of the chips on the main board, except for the data acquisition and display sections.

The range data from the PC supporting the LADAR, and the fused imagery from our real-time fusion processor, are sent via FireWire to the Silicon Graphics 540 computer that serves as our 3D visualization processor. The SGI 540 contains four 500 MHz Pentium III processors and the SGI Cobalt graphics chip set. The 3D models can be displayed in stereo by mounting a polarized StereoGraphics Z-screen in front of the monitor. Users wearing polarized glasses can then view the scene in stereo without being tethered to the computer.

3 Biologically motivated image fusion

Our image fusion architectures are motivated by the biological computational processes of the human retina and the primary visual cortex. The three different cone cells in the retina are sensitive to the short, medium, and long wavelengths of the visible spectrum [27]. The outputs from these photoreceptors are then contrast enhanced within band by both ON and OFF center-surround spatial opponent processes at the bipolar cells [28]. In later stages (ganglion cells in the retina and V1) these signals are contrast enhanced by center-surround processes between the different bands [29,30]. This opponent-color processing separates the complementary information that each band contains.

Figure 2: Example imagery from the four sensors under normal indoor lighting conditions. The targets include two men, a glass fish tank full of water on a table in the center, a nose cone pointing towards the sensors, and a white cardboard poster on a tripod in the background. These targets are 65m-75m from the sensor array. Multi-sensor fusion exploits the complementary information captured by the four sensors and maintains the highest resolution. (a) Lincoln Laboratory low-light visible CCD. (b) Sensors Unlimited SWIR. (c) Lockheed Martin LWIR. (d) Lincoln Laboratory LADAR range data (white = near, black = far).

This insight into how the visual system contrasts and combines information from different spectral bands provides one example of a working multi-spectral fusion system. Other biological fusion systems of relevance are the visible / thermal processing pathways of the rattlesnake and python [6,31,32].

The experiments described in this paper use two different dual-sensor configurations: low-light visible / LWIR and SWIR / LWIR. Since the visible and SWIR sensors have distinct characteristics, different fusion architectures are used to combine the two sets of imagery. However, each architecture utilizes the same center-surround shunting network [33] as a fundamental building block, for both within-band and cross-band (i.e., opponent-color) contrast enhancement.

As illustrated in Figure 3, the first stage of processing in the visible / LWIR architecture is image registration and noise cleaning. Next, each band is separately contrast enhanced and adaptively normalized by shunting center-surround processing. The LWIR image is processed by both ON-center / OFF-surround and OFF-center / ON-surround shunting networks. This treats the warm parts of the scene separately from the cool parts, since both are often equally important. In a second stage of center-surround processing we form grayscale fused single-opponent color-contrast images. These three processes form the red, green and blue components of the final color fused image. The enhanced ON-center LWIR (+LWIR) feeds the center of the red channel, while the enhanced visible (+Vis) feeds the surround. The green channel gets +Vis in the center and both +LWIR and the OFF-center LWIR (-LWIR) in the surround. Finally, -LWIR feeds the blue channel center and +Vis feeds the surround. The net effect is a decorrelation of the sensor imagery being used to drive the displayed color contrasts. An optional final step is to convert the RGB image into HSV space and then remap the hue values of the entire image by means of a user-controlled rotation of the hue circle. This can be done to provide a more natural coloring to the background scene while maintaining target pop-out.

Figure 3: Low-light Visible / LWIR fusion architecture. After registration and noise cleaning, the imagery from each sensor is adaptively normalized and contrast enhanced. The resulting imagery is combined three different ways to form the red, green and blue output channels that drive the color display after an optional color remapping stage. Computations involve center-surround shunting neural networks to derive opponent-color contrasts that decorrelate the imagery.

A similar architecture is used to perform SWIR / LWIR fusion (see Figure 4 in [1]). An example of dual-sensor fusion results is shown in Figure 5b. For reference, a photo of the central portion of the scene is shown in Figure 5a. The scene contains two human targets, a table supporting a fish tank full of water and two paint cans, a white poster board mounted on a tripod, and a large nose cone mounted on a rack. The low-light visible / LWIR fusion result, shown in Figure 5b, demonstrates how the fusion process retains the complementary information from each sensor. The high resolution of the visible imagery is maintained, such that people and objects are still identifiable, while the lower resolution thermal information from the LWIR sensor is presented as changes in color: warmer objects like people and lights are yellow or

red, and cooler objects like the background and fish tank are blue.
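To make the shunting computation concrete, the following is a minimal sketch, in Python/NumPy (which the original DSP-based system did not use), of the center-surround shunting operator and the single-opponent channel wiring described above. The steady-state form x = (B·E − D·I)/(A + E + I), the Gaussian kernel widths, and the constants are illustrative assumptions, not the parameters of the fielded real-time system.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shunt(excite, inhibit, sigma_e=1.0, sigma_i=4.0, a=1.0, b=1.0, d=1.0):
    """Steady state of a center-surround shunting network:
    x = (B*E - D*I) / (A + E + I), with Gaussian-blurred excitatory (E)
    and inhibitory (I) fields. Small sigma_e / large sigma_i gives an
    ON-center/OFF-surround operator; swapping the sigmas gives OFF-center."""
    e = gaussian_filter(excite.astype(float), sigma_e)
    i = gaussian_filter(inhibit.astype(float), sigma_i)
    return (b * e - d * i) / (a + e + i)

def fuse_vis_lwir(vis, lwir):
    """Sketch of the visible / LWIR opponent-color wiring of Figure 3:
    within-band enhancement, then cross-band single-opponent contrasts
    drive the R, G, B display channels."""
    vis_p  = np.maximum(shunt(vis,  vis),  0)   # enhanced visible (+Vis)
    lwir_p = np.maximum(shunt(lwir, lwir), 0)   # ON-center LWIR (+LWIR)
    lwir_m = np.maximum(shunt(lwir, lwir, sigma_e=4.0, sigma_i=1.0), 0)  # OFF-center LWIR (-LWIR)
    r = shunt(lwir_p, vis_p)                    # +LWIR center, +Vis surround
    g = shunt(vis_p, lwir_p + lwir_m)           # +Vis center, +/-LWIR surround
    b = shunt(lwir_m, vis_p)                    # -LWIR center, +Vis surround
    rgb = np.dstack([r, g, b])
    rgb -= rgb.min()                            # rescale to [0, 1] for display
    return rgb / max(rgb.max(), 1e-9)
```

An analogous wiring, with the SWIR band in place of the visible band, would sketch the SWIR / LWIR architecture.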

4 Passive + active 3D image fusion

Real-time fusion of the imagery from the passive sensor pairs is implemented on the PC containing the Matrox Genesis boards, as described in Section 2. The system produces 24-bit color results, at 30 fps, for display to the user and sends the data over FireWire to the SGI 540 for further processing. Range data cleaning, 3D model generation, interactive 3D visualization, and interactive target learning and search are all performed on the quad-Pentium SGI 540 computer.

4.1 Range data preparation

Before the range data can be converted into a 3D surface mesh, some preprocessing must be performed to remove various artifacts present in the raw data. The first stage of processing removes “dropouts” by replacing their values with appropriate neighboring values. The second stage removes bright and dark “outliers” using grayscale mathematical morphology.

Dropout pixels are locations in the range image where an APD did not receive a return pulse from the laser during the appropriate temporal window. This can occur for a number of reasons: the lack of an object in the scene (such as sky), the reflective material on an object, the angle of a reflective surface, or a faulty APD. The left image in Figure 4 shows an example of a single frame of raw range data. The center image in Figure 4 shows an enlarged view of the nose cone on the right side of the range image. The grayscale images represent the range to the objects, with brighter values being further in range than darker values. Black pixels represent the dropouts. The noticeable, regular

pattern of dropouts is caused by a faulty APD in the 4x4 array, which is scanned to create a 148x108 image. The dropouts are removed in a two-step process that replaces dropout clusters as large as 2x2 pixels (as can be seen in the center image of Figure 4) with the most appropriate neighboring values [35].

The outliers are pixels that do not represent the actual range to the object at that location. Bright and dark outliers have ranges that are, respectively, too far or too close. For this prototype ladar it is believed that the majority of these outliers are caused by cross-talk between the APDs. These locally bright and dark pixels are cleaned using grayscale mathematical morphology (MM) dilation and erosion operators [35]. The end result is that local outliers are clipped back to the maximum or minimum value of the neighboring pixels. The right image in Figure 4 shows the final cleaned range data of the nose cone region. The range data is now ready to be turned into a surface mesh, but first it must be registered with the passive imagery that will be used to texture the 3D model.
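As a concrete illustration, here is a minimal sketch of the two-stage cleaning, assuming dropouts are coded as zeros in the raw range image; the 3x3 neighborhood, the mean-of-valid-neighbors fill (in place of the selection rule of [35]), and the close/open ordering are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter, grey_opening, grey_closing

def fill_dropouts(rng, passes=3):
    """Stage 1: replace dropout pixels (value 0) with the mean of their
    valid 3x3 neighbors; repeating the pass fills 2x2 dropout clusters."""
    out = rng.astype(float)
    for _ in range(passes):
        valid = out > 0
        if valid.all():
            break
        local_sum = uniform_filter(np.where(valid, out, 0.0), size=3)
        local_cnt = uniform_filter(valid.astype(float), size=3)
        neighbor_mean = local_sum / np.maximum(local_cnt, 1e-9)
        out = np.where(valid, out, neighbor_mean)
    return out

def clip_outliers(rng, size=3):
    """Stage 2: grayscale morphology. Closing clips dark (too-near)
    outliers up to the neighborhood level; opening clips bright
    (too-far) outliers down to it."""
    return grey_opening(grey_closing(rng, size=(size, size)), size=(size, size))

# cleaned = clip_outliers(fill_dropouts(raw_range))
```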

4.2 Passive imagery and range data registration

The registration problem for the passive sensors is solved in hardware by using a dichroic beam-splitter and a relay mirror to optically align the two cameras. Registration between the passive sensors and the ladar, however, has to be performed in software. The passive sensor pod is bolted to the ladar platform to prevent the two from shifting relative to one another. This allows us to create a single transformation matrix that registers the passive sensor imagery to the ladar range data. Once the transformation is determined via a calibration process, it is applied to any scene.
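The sketch below illustrates one way such a fixed calibration could be applied, assuming a hypothetical 3x4 projection matrix P (estimated once for the bolted-together rig) that maps ladar-derived 3D points into passive-image pixel coordinates; the nearest-neighbor texture lookup is likewise a simplification.

```python
import numpy as np

def project_points(xyz, P):
    """Project an Nx3 array of ladar points into passive-image pixel
    coordinates via homogeneous projection: [u, v, w]^T = P [X, Y, Z, 1]^T."""
    pts_h = np.hstack([xyz, np.ones((xyz.shape[0], 1))])
    uvw = pts_h @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def sample_texture(image, uv):
    """Nearest-neighbor lookup of passive-image values at the projected
    coordinates (clamped to the frame), giving one texture value per
    ladar point."""
    h, w = image.shape[:2]
    u = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return image[v, u]
```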

Figure 4: Range data cleaning. Left: An example of a single frame of the raw range data received from the LADAR. Brighter gray values represent larger ranges, darker values are smaller ranges, and black values represent dropouts. Center: An enlarged version of the cone region in the raw range image, showing more clearly the outliers and dropouts. Right: The resulting range image after removal of the outliers and dropouts. Further processing is used to segment objects and refine boundaries.

5 3D visualization


The ladar produces an angle-angle range image that must be transformed to Cartesian coordinates before a surface mesh can be created. Since the angle covered by each pixel is known to be 0.35 mrad, the 3D location of each range data point can be calculated. The next step is to perform the triangulation necessary to create the surface mesh. The range data comes in the form of a regular grid, making the triangulation a simple matter of connecting four neighboring vertices with two triangles. However, since neighboring vertices are not always parts of the same object, withholding a connection can provide segmentation of the 3D scene. The criterion used for making this decision is the range difference between neighboring vertices: if there are three or four vertices whose relative range differences pass a user-defined threshold, then one of six possible triangular configurations is chosen [37].

The surface mesh can next be textured with the registered passive imagery, creating a fused image that combines the complementary information from the passive sensors with the range information from the ladar. To display the 3D model, we created a Java3D viewer that allows the user to manipulate the model, change the rendering mode, switch the passive modality that textures the model, and view the model in stereo. The model can be updated at a rate limited by the scan rate of the ladar system, which generates one frame of range data every 1.5 seconds. The field-of-view (FOV) of the ladar is 3° diagonal, while the FOV of the passive sensors is 7° diagonal. As a result, only the central portion of the passive imagery is modeled in 3D.

An example of a 3D model with no textures is shown in Figure 5c. Here the model is color-coded to enhance the depth information, from red, representing near ranges, to blue and magenta, representing far ranges. The physical relationship between the men, cone, and table becomes apparent once the model is rotated. However, it is difficult to identify people and objects in the scene based on shape information alone. Figure 5d shows the same model, viewed from a different perspective, textured with the low-light visible and LWIR fused results. The high-resolution low-light visible sensor provides context for the 3D data and allows identification of the people and objects. The thermal information, showing up as color variations in the image, allows quick segmentation of the hot and cold objects in the scene. The warm people pop out from the surrounding, cooler background. This visualization tool allows a user to quickly understand information from three very distinct sensors by viewing the fused imagery in one display. Future plans include combining multiple views of the scene, taken from different viewing angles, to create a full 3D representation of the scene.
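A minimal sketch of the angle-angle-to-Cartesian conversion and the thresholded grid triangulation follows. The 0.35 mrad pixel pitch is taken from the text; the axis convention and the single max-minus-min range test per grid cell (in place of the six triangle configurations of [37]) are simplifications.

```python
import numpy as np

IFOV = 0.35e-3  # angular extent of one ladar pixel, in radians (from the text)

def range_to_xyz(rng):
    """Convert an HxW angle-angle range image to an HxWx3 grid of
    Cartesian points, with the optical axis along +Z."""
    h, w = rng.shape
    az = (np.arange(w) - w / 2.0) * IFOV            # azimuth per column
    el = (np.arange(h) - h / 2.0) * IFOV            # elevation per row
    az, el = np.meshgrid(az, el)
    x = rng * np.cos(el) * np.sin(az)
    y = rng * np.sin(el)
    z = rng * np.cos(el) * np.cos(az)
    return np.dstack([x, y, z])

def triangulate(rng, max_dr=0.3):
    """Connect each 2x2 cell of the range grid with two triangles, but
    only when all four ranges agree to within max_dr; otherwise leave a
    gap, which segments objects at depth discontinuities."""
    h, w = rng.shape
    idx = lambda r, c: r * w + c                    # vertex index in the flattened grid
    tris = []
    for r in range(h - 1):
        for c in range(w - 1):
            quad = rng[r:r + 2, c:c + 2]
            if quad.max() - quad.min() < max_dr:
                tris.append((idx(r, c), idx(r + 1, c), idx(r, c + 1)))
                tris.append((idx(r + 1, c), idx(r + 1, c + 1), idx(r, c + 1)))
    return np.array(tris)

# vertices = range_to_xyz(clean_range).reshape(-1, 3); faces = triangulate(clean_range)
```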

6 Target learning and search

The results from multi-sensor fusion are not only useful for visualization but also for target learning and recognition. We have developed an interactive, real-time image mining system for target learning and search in multi-sensor imagery. The system is built around the Fuzzy ARTMAP neural network [38,39]. The results from the stages of the multi-sensor fusion architectures are combined to form a data cube, where each pixel represents a feature vector that serves as the input to the Fuzzy ARTMAP classifier. Other image planes containing processed results can be added to the data cube, and hence the approach easily accommodates other registered sensor data.

6.1 Range feature extraction

As an additional feature in the data cube, the range information can be used to separate targets from background. However, the raw range data by itself would not be useful for tracking targets over time, since in most scenarios it is very unlikely that the target would remain at a constant distance from the sensor. A more useful feature is a measure of the physical dimensions of a target computed from the range information. Physical characteristics such as width, height, depth, and volume can be calculated if we can successfully separate the objects in the scene. These characteristics should remain fairly constant as an object moves about in range, provided the object does not rotate while moving and has no articulating parts. These limitations will be addressed in future work.

The range image segmentation is performed in a two-stage process. The first stage separates the range data into discrete range regions using a histogram thresholding technique. The range histogram is calculated and then smoothed with a Gaussian kernel. The local minima are then used as threshold values to create binary images representing range regions in the data [40]. The assumption is that the peaks in the histogram correspond to the objects in the scene, while the valleys represent the gaps between the objects. The second stage labels “blobs” in each binary image from stage one using connected components [41,42]. These labeled images are then merged to create an image that contains unique labels for all the objects and object parts in the scene.

For each labeled object in the image we next calculate the physical dimensions of that object. For the preliminary experiments described here we use only the depth feature as an additional layer in the input data cube. Depth is calculated by taking the difference between the maximum and minimum range values for each object. The depth of an object only represents the parts that can be seen from a single viewing angle; occlusion of the far side of an object prevents the true depth from being calculated. Each object is then assigned its own depth value. To increase the robustness of this feature as an input to Fuzzy ARTMAP, Gaussian noise was added, based on that frame’s range statistics.
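A minimal sketch of the two-stage segmentation and the per-object depth feature follows, assuming the range image has already been cleaned; the histogram bin count and smoothing width are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, label

def segment_by_range(rng, bins=64, sigma=2.0):
    """Stage 1: threshold the range histogram at the local minima of its
    Gaussian-smoothed version. Stage 2: label connected components within
    each resulting range slab, giving one unique label per object
    (or object part)."""
    valid = rng > 0
    hist, edges = np.histogram(rng[valid], bins=bins)
    smooth = gaussian_filter1d(hist.astype(float), sigma)
    minima = [edges[i + 1] for i in range(1, bins - 1)
              if smooth[i] < smooth[i - 1] and smooth[i] <= smooth[i + 1]]
    cuts = np.concatenate([[edges[0]], minima, [edges[-1] + 1e-6]])
    labels = np.zeros(rng.shape, dtype=int)
    next_label = 1
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        slab = valid & (rng >= lo) & (rng < hi)
        blobs, n = label(slab)                  # connected components in this slab
        labels[slab] = blobs[slab] + next_label - 1
        next_label += n
    return labels

def depth_feature(rng, labels):
    """Assign every pixel of each labeled object that object's visible
    depth (max range minus min range over the object)."""
    depth = np.zeros(rng.shape, dtype=float)
    for obj in range(1, labels.max() + 1):
        mask = labels == obj
        if mask.any():
            depth[mask] = rng[mask].max() - rng[mask].min()
    return depth
```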


Figure 5: Multi-sensor fused imagery collected from a single aspect. (a) Photo showing the content of the central part of the scene. (b) Fused low-light visible / LWIR. (c) Color-coded range data modeled in 3D (red = near, magenta = far). (d) 3D model textured with color fused low-light visible / LWIR.

Figure 6: Examples of interactive target designation, learning and search. Left: Example color fused SWIR / LWIR image overlaid with the example (green) and counter-example (red) pixels used for training the Fuzzy ARTMAP neural network to detect the human targets. Right: Fused SWIR / LWIR result with the addition of the search results overlaid in white and yellow. White returns indicate high-confidence results, while yellow reflects lower-confidence search results.

For the example in this paper, six feature layers were used in the data cube that serves as the input structure to the target learning and search system: enhanced SWIR, enhanced LWIR, the three opponent-contrast images from the SWIR / LWIR fusion process (driving the red, green, and blue output channels), and object depth.
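For concreteness, here is a minimal sketch of assembling those six layers into per-pixel feature vectors; the argument names are hypothetical placeholders for the corresponding fusion outputs, and the [0, 1] rescaling reflects the bounded inputs that Fuzzy ART-style networks expect.

```python
import numpy as np

def build_data_cube(swir_enh, lwir_enh, red_opp, grn_opp, blu_opp, depth):
    """Stack six HxW feature layers into an HxWx6 cube, rescale each
    layer to [0, 1], and flatten to an (H*W)x6 matrix of per-pixel
    feature vectors for the classifier."""
    cube = np.dstack([swir_enh, lwir_enh, red_opp, grn_opp, blu_opp, depth]).astype(float)
    flat = cube.reshape(-1, cube.shape[-1])
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    return (flat - lo) / np.maximum(hi - lo, 1e-9)
```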

6.2 Interactive target learning and recognition

The user is provided with an interface that allows him to select example (target) and counter-example (non-target) pixels from the color fused image. These features serve as the training set for the neural classifier. A preliminary search on a subset of the image can be performed to determine whether the network is trained to satisfaction. If not, more examples and counter-examples can be selected in an iterative process. This process takes only seconds to execute. Once satisfied, the user can search the whole image. Pixels classified as targets are highlighted based on the confidence level of the network. Since the order of presentation of the input vectors is important to Fuzzy ARTMAP, multiple networks are trained with differently ordered sets of inputs. The networks then vote on whether a given pixel should be classified as a target or a non-target. The voter consensus determines the degree of confidence in the classification of a given pixel.

The system also uses an algorithm that examines the overlap between example and counter-example feature vectors to determine which components of the input feature vectors are necessary to achieve correct classification of the training set. This reduced feature vector embodies a differential signature of the target with respect to the context. When performing the search on subsequent fused imagery, only those components are used, significantly speeding up the search process. After training on an initial scene, subsequent frames can be searched and the results displayed. This system can be used both for tracking targets over time and for detecting their arrivals and departures from the scene. Multiple search agents can be trained to search for multiple target types.

The left image of Figure 6 shows an example of a user-defined training set for detecting human targets. Example target pixels of the man on the left are shown in green, while counter-examples are displayed in red. After the network was trained, the entire image was searched, producing the results shown in the right image of Figure 6. Pixels that match the characteristics of the training set are highlighted in white or yellow, depending on the confidence of the network. White indicates high confidence, while yellow reflects lower confidence. Even though example pixels were chosen only from the man on the left, the network was able to detect both men in the scene. For this example, the network determined that the blue channel image (–LWIR contrasted with +Ave, see Section 3) and object depth were the two most important features.
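The sketch below illustrates the order-sensitive ensemble vote described above. Because no Fuzzy ARTMAP implementation is assumed here, a simple online nearest-prototype learner (loosely ART-like, with a hypothetical vigilance radius) stands in for each network; it is only meant to show how differently ordered presentations yield different networks whose vote fraction becomes the confidence overlay, not to reproduce the actual classifier.

```python
import numpy as np

def train_one(features, targets, order, vigilance=0.2):
    """Stand-in for one order-sensitive network: an online nearest-
    prototype learner swept over the training set in the given order.
    A sample within `vigilance` of a same-class prototype refines that
    prototype; otherwise it seeds a new one. (Not the Fuzzy ARTMAP
    algorithm itself.)"""
    protos, labels = [], []
    for i in order:
        x, y = features[i].astype(float), targets[i]
        if protos:
            d = np.linalg.norm(np.array(protos) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] < vigilance and labels[j] == y:
                protos[j] = 0.5 * (protos[j] + x)   # refine the matched prototype
                continue
        protos.append(x)
        labels.append(y)
    return np.array(protos), np.array(labels)

def predict_one(model, x):
    """Classify a feature vector by the label of its nearest prototype."""
    protos, labels = model
    return labels[int(np.argmin(np.linalg.norm(protos - x, axis=1)))]

def ensemble_vote(train_feats, train_targets, pixels, n_nets=5, seed=0):
    """Train n_nets copies on differently ordered presentations of the
    same training set, then return the fraction of nets voting 'target'
    (label 1) for each pixel; that fraction drives the high/low
    confidence (white/yellow) overlay."""
    gen = np.random.default_rng(seed)
    models = [train_one(train_feats, train_targets,
                        gen.permutation(len(train_targets)))
              for _ in range(n_nets)]
    votes = np.array([[predict_one(m, x) for m in models] for x in pixels])
    return votes.mean(axis=1)
```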

7 Summary

We have developed and demonstrated a multi-sensor fusion system, based upon biological models of the human visual system, that combines imagery from multiple passive sensors with range data from a prototype flash ladar. A real-time system for fusing low-light visible + LWIR imagery and SWIR + LWIR imagery is described. The fused passive sensor imagery is used to texture a 3D model created from ladar range data at the rate of once every 1.5 seconds. The resulting imagery from the multi-sensor fusion system is also used as input to an interactive tool for target learning and search based on Fuzzy ARTMAP neural networks. In related work we have used this same approach to combine and exploit remotely sensed multi-spectral, synthetic aperture radar, and hyper-spectral imagery in the context of 3D site models [16,17]. Future work will refine our approach to LADAR data processing by incorporating local surface patch models, and extend this paradigm to include fusion of imagery from multiple viewing aspects into a unified 3D textured site model.

References

[1] D.A. Fay, A.M. Waxman, M. Aguilar, D.B. Ireland, J.P. Racamato, W.D. Ross, W.W. Streilein, and M.I. Braun, “Fusion of 2- / 3- / 4-Sensor Imagery for Visualization, Target Learning and Search,” Proc. of SPIE Conf. on Enhanced and Synthetic Vision 2000, SPIE-4023, pp. 106-115, 2000.
[2] M. Aguilar, D.A. Fay, D.B. Ireland, J.P. Racamato, W.D. Ross, and A.M. Waxman, “Field Evaluations of Dual-Band Fusion for Color Night Vision,” Proc. of SPIE Conf. on Enhanced and Synthetic Vision 1999, SPIE-3691, pp. 168-175, 1999.
[3] M. Aguilar, D.A. Fay, W.D. Ross, A.M. Waxman, D.B. Ireland, and J.P. Racamato, “Real-time fusion of low-light CCD and uncooled IR imagery for color night vision,” Proc. of SPIE Conf. on Enhanced and Synthetic Vision 1998, SPIE-3364, 1998.
[4] R.K. Reich, B.E. Burke, W.M. McGonagle, D.M. Craig, A.M. Waxman, E.D. Savoye, and B.B. Kosicki, “Low-light-level 640x480 CCD camera for night vision application,” Proc. of the Meeting of the IRIS Specialty Group on Passive Sensors 1998, 1998.
[5] A.M. Waxman, M. Aguilar, D.A. Fay, D.B. Ireland, J.P. Racamato, W.D. Ross, J.E. Carrick, A.N. Gove, M.C. Seibert, E.D. Savoye, R.K. Reich, B.E. Burke, W.H. McGonagle, and D.M. Craig, “Solid state color night vision: Fusion of low-light visible and thermal IR imagery,” Lincoln Laboratory Journal, 11 (1), pp. 41-60, 1998.
[6] A.M. Waxman, A.N. Gove, D.A. Fay, J.P. Racamato, J.E. Carrick, M.C. Seibert, and E.D. Savoye, “Color Night Vision: Opponent Processing in the Fusion of Visible and IR Imagery,” Neural Networks, 10 (1), pp. 1-6, 1997.
[7] A.M. Waxman, A.N. Gove, D.A. Fay, J.P. Racamato, J. Carrick, M.C. Seibert, E.D. Savoye, B.E. Burke, R.K. Reich, W.H. McGonagle, and D.M. Craig, “Solid state color night vision: Fusion of low-light visible and thermal IR imagery,” Proc. of the Meeting of the IRIS Specialty Group on Passive Sensors 1996, II, pp. 263-280, 1996.

[8] A.M. Waxman, A.N. Gove, M.C. Seibert, D.A. Fay, J.E. Carrick, J.P. Racamato, E.D. Savoye, B.E. Burke, R.K. Reich, W.H. McGonagle, and D.M. Craig, “Progress on Color Night Vision: Visible/IR Fusion, Perception & Search, and Low-Light CCD Imaging,” Proc. of SPIE Conf. on Enhanced and Synthetic Vision 1996, SPIE-2736, pp. 96-107, 1996.
[9] A.M. Waxman, D.A. Fay, A.N. Gove, M.C. Seibert, J.P. Racamato, J.E. Carrick, and E.D. Savoye, “Color night vision: Fusion of intensified visible and thermal IR imagery,” Synthetic Vision for Vehicle Guidance and Control, SPIE-2463, pp. 58-68, 1995.
[10] A.M. Waxman, A.N. Gove, D.A. Fay, and J.E. Carrick, “Real-Time Adaptive Digital Image Processing for Dynamic Range Remapping of Imagery Including Low-light-level Visible Imagery,” U.S. Patent 5,909,244, issued 6/1/99 (filed 9/5/96); rights assigned to MIT, 1999.
[11] E.D. Savoye, A.M. Waxman, R.K. Reich, B.E. Burke, J.A. Gregory, W.H. McGonagle, A.H. Loomis, B.B. Kosicki, R.W. Mountain, A.N. Gove, D.A. Fay, and J.E. Carrick, “Low-light-level Imaging and Image Processing,” U.S. Patent 5,880,777, issued 3/9/99 (filed 4/15/96); rights assigned to MIT, 1999.
[12] A.M. Waxman, D.A. Fay, A.N. Gove, M.C. Seibert, and J.P. Racamato, “Method and apparatus for generating a synthetic image by the fusion of signals representative of different views of the same scene,” U.S. Patent 5,555,324, issued 9/10/96 (filed 11/1/94); rights assigned to MIT, 1996.
[13] A. Toet, J.K. IJspeert, A.M. Waxman, and M. Aguilar, “Fusion of visible and thermal imagery improves situational awareness,” Proc. of SPIE Conf. on Enhanced and Synthetic Vision, SPIE-3088, pp. 177-188, 1997.
[14] A.M. Waxman, M. Aguilar, R.A. Baxter, D.A. Fay, D.B. Ireland, J.P. Racamato, and W.D. Ross, “Opponent-color fusion of Multi-sensor Imagery: Visible, IR and SAR,” Proc. of the Meeting of the IRIS Specialty Group on Passive Sensors 1998, I, pp. 43-61, 1998.
[15] A.N. Gove, R.K. Cunningham, and A.M. Waxman, “Opponent-color visual processing applied to multispectral infrared imagery,” Proc. of the Meeting of the IRIS Specialty Group on Passive Sensors 1996, II, pp. 247-262, 1996.
[16] W. Streilein, A. Waxman, W. Ross, F. Liu, M. Braun, D. Fay, P. Harmon, and C.H. Read, “Fused Multi-Sensor Image Mining for Feature Foundation Data,” Proceedings of the 3rd International Conference on Information Fusion, Paris, France, July 10-13, 2000.
[17] W.D. Ross, A.M. Waxman, W.W. Streilein, M. Aguilar, J. Verly, F. Liu, M.I. Braun, P. Harmon, and S. Rak, “Multi-Sensor 3D Image Fusion and Interactive Search,” Proceedings of the 3rd International Conference on Information Fusion, Paris, France, July 10-13, 2000.
[18] P.J. Burt and R.J. Kolczynski, “Enhanced image capture through fusion,” Fourth International Conference on Computer Vision, pp. 173-182, Los Alamitos: IEEE Computer Society Press, 1993.
[19] D. Ryan and R. Tinkler, “Night pilotage assessment of image fusion,” Helmet- and Head-Mounted Displays and Symbology Design Requirements II, SPIE-2465, pp. 50-67, 1995.
[20] A. Toet, “Hierarchical image fusion,” Machine Vision and Applications, 3, pp. 1-11, 1990.
[21] A. Toet, “Multiscale contrast enhancement with applications to image fusion,” Optical Engineering, 31, pp. 1026-1031, 1992.
[22] A. Toet, L.J. van Ruyven, and J.M. Valeton, “Merging thermal and visual images by a contrast pyramid,” Optical Engineering, 28, pp. 789-792, 1989.

[23] P.M. Steele and P. Perconti, “Part task investigation of multispectral image fusion using gray scale and synthetic color night vision sensor imagery for helicopter pilotage,” Targets and Backgrounds: Characterization and Representation III, SPIE-3062, pp. 88-100, 1997.
[24] A. Toet and J. Walraven, “New false color mapping for image fusion,” Optical Engineering, 35, pp. 650-658, 1996.
[25] D.G. Fouche, B.F. Aull, M.A. Albota, R.M. Heinrichs, J.J. Zayhowski, M.E. O’Brien, and R.M. Marino, “Three-Dimensional Imaging Laser Radar Using Microchip Lasers and Geiger-Mode Avalanche Photodiodes: An Update,” IRIS Proc. Active Systems, Dayton, Ohio, April 2000.
[26] R.M. Heinrichs, B.F. Aull, S. Kaushik, D.G. Kocher, and M.A. Albota, “Technology development and performance simulation of 3D imaging laser radar for advanced seekers,” IRIS Proc. Active Systems, Albuquerque, New Mexico, March 1998.
[27] P.K. Kaiser and R.M. Boynton, Human Color Vision, Optical Society of America, Washington, DC, 1996.
[28] P. Schiller, “The ON and OFF channels of the visual system,” Trends in Neuroscience, TINS-15, pp. 86-92, 1992.
[29] P. Schiller and N.K. Logothetis, “The color-opponent and broad-band channels of the primate visual system,” Trends in Neuroscience, TINS-13, pp. 392-398, 1990.
[30] P. Gouras, “Color Vision,” Chapter 31 in Principles of Neural Science (E.R. Kandel, J.H. Schwartz, and T.M. Jessell, editors), pp. 467-480, New York: Elsevier Science Publishers, 1991.
[31] E.A. Newman and P.H. Hartline, “The infrared vision of snakes,” Scientific American, 246 (March), pp. 116-127, 1982.
[32] E.A. Newman and P.H. Hartline, “Integration of visual and infrared information in bimodal neurons of the rattlesnake optic tectum,” Science, 213, pp. 781-791, 1981.
[33] S.A. Ellias and S. Grossberg, “Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks,” Biological Cybernetics, 20, pp. 69-98, 1975.
[34] M. Norton, R. Kindsfather, and R. Dixon, “Short-wave (1-2.8 µm) imagery applications for fun and profit,” Terrorism and Counterterrorism Methods and Technologies, SPIE-2933, pp. 9-31, 1997.
[35] J.G. Verly and R.L. Delanoy, “Model-Based Automatic Target Recognition (ATR) System for Forward-looking Ground-based and Airborne Imaging Laser Radars (LADAR),” Proceedings of the IEEE, Vol. 84, No. 2, pp. 126-163, Feb. 1993.
[36] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, pp. 51-66, MIT Press, Cambridge, MA, 1993.
[37] M. Rutishauser, M. Stricker, and M. Trobina, “Merging range images of arbitrarily shaped objects,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 573-580, 1994.
[38] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, and D.B. Rosen, “Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps,” IEEE Transactions on Neural Networks, 3, pp. 698-713, 1992.
[39] T. Kasuba, “Simplified Fuzzy ARTMAP,” AI Expert, Nov. 1993.
[40] D.M. Tsai, “A fast thresholding selection procedure for multimodal and unimodal histograms,” Pattern Recognition Letters, 16(6), pp. 653-666, 1995.
[41] D.H. Ballard and C.M. Brown, Computer Vision, pp. 151-152, Prentice-Hall, Inc., New Jersey, 1982.
[42] R.C. Gonzalez and R.E. Woods, Digital Image Processing, pp. 42-45, Addison-Wesley Publishing Co., Inc., 1992.