Enhancing Stereo Image Formation and Depth Map Estimation for Mastcam Images

Chiman Kwan, Applied Research LLC, Rockville, USA, [email protected]
Bryan Chou, Applied Research LLC, Rockville, USA, [email protected]
Bulent Ayhan, Applied Research LLC, Rockville, USA, [email protected]

Abstract—There are two multispectral Mastcam imagers in the Mars Science Laboratory (MSL) onboard the Mars rover Curiosity. The left imager has three times lower resolution than the right imager. This paper summarizes new results in enhancing the stereo images and depth maps using the left and right Mastcam images. Our goal is to investigate the fusion of the left and right images to generate high quality stereo images and depth maps. We first enhance the left images with help from the right images. We then combine the improved left images and the original high resolution right images to create high quality stereo images and depth maps. Actual Mastcam images were used in our experiments. Results indicated that the proposed approach works to a certain extent; there is still room for performance improvement.

Keywords—Mastcam; image enhancement; image alignment; stereo image; depth map; pansharpening.

I. INTRODUCTION

There are a few instruments [1]-[3] in the Mars Science Laboratory (MSL) onboard the Curiosity rover for studying the Mars surface. Two of them are the Mastcam multispectral imagers [3]. The left imager has three times lower resolution than the right. The current practice for stereo image formation is to downsample the right images to the resolution of the left. We believe such a practice wastes the high resolution of the right Mastcam images. Moreover, the resulting stereo images have lower resolution, which may degrade the augmented reality or virtual reality experience for science fans.

There have been significant advances in image super-resolution in recent years. Some methods are for single images and some require the fusion of multiple images; there are also deep learning based methods and dictionary based algorithms. One super-resolution technique is pansharpening, which has a wide range of applications [4]-[13]. The goal of pansharpening is to fuse low-spatial-resolution multispectral data with a high-spatial-resolution panchromatic (pan) image [5]. Pansharpening algorithms have been applied in [14]-[18] to enhance the Mastcam images and have achieved good performance.

In this research, we investigated how to generate stereo images and depth maps at the resolution of the right Mastcam images. We first improve the resolution of the left images with help from the right images. A pair of left and right images with a common field of view is first aligned using a two-step image registration algorithm, which can achieve sub-pixel registration accuracy. We then use the enhanced left image and the original right image to generate a stereo image and a depth map. This step comprises several sub-steps, including feature extraction, feature correspondence, image rectification, depth map generation, and stereo image formation. We performed extensive experiments using actual left and right Mastcam images, selected from over 500,000 images in the NASA Planetary Data System (PDS) database. In some images, we see good improvement in the quality of the stereo images and depth maps as compared to those using the low resolution left and high resolution right images. In other images, the quality is not satisfactory due to image alignment issues when the image contents lie at different depths. Although the results are mixed, this is, to our knowledge, the first study on the generation of high resolution stereo images and depth maps for the Mastcam images. A companion paper [19] discusses an approach to improving the alignment process.

This paper is organized as follows. Section II describes the Mastcam imagers. Section III briefly summarizes the key algorithms, including image registration, pansharpening, and stereo and depth map generation. Section IV summarizes the data used and the experimental results. Finally, we conclude the paper and point out some research directions in Section V.

This research was supported by NASA JPL under contract # 80NSSC17C0035.
978-1-5386-7693-6/18/$31.00 ©2018 IEEE

Fig. 1: Mastcam spectral response profiles for the left M-34 camera (top panel) and the right M-100 camera (bottom panel) [3].

II. MASTCAM

The spectral responses of the two Mastcam imagers are shown in Fig. 1. There are nine bands in each imager. Six of them overlap between the two cameras and three do not (L3, L4, and L5 from the left camera and R3, R4, and R5 from the right camera). More details about Mastcam can be found in [3] and [14].

III. KEY ALGORITHMS

Here, we propose a new approach to high resolution stereo image formation and high resolution depth map generation. As shown in Fig. 2, the approach consists of the following steps. First, an accurate two-step image registration approach is used to align the left and right images, with the left image as the reference. The coarse step uses scale-invariant feature transform (SIFT) or speeded-up robust features (SURF) features with RANSAC (Random Sample Consensus); the fine step applies a diffeomorphic registration algorithm to achieve sub-pixel accuracy. The common area between the left and right images is then extracted. Second, a panchromatic (pan) band is created from the multispectral bands in the right image; one simple way to create the pan band is to take the average of all the available bands in the right camera. Any pansharpening algorithm can then be applied; in the examples in this paper, the Gram-Schmidt Adaptive (GSA) algorithm was used to pansharpen the left image. Third, the pansharpened left image and the original high resolution right image are used to create a stereo image. The stereo image creation process follows standard procedures, including feature point extraction, feature point matching between the left and right images, estimation of the fundamental matrix, outlier removal based on the epipolar constraint, and image rectification. Fourth, a depth map is generated from the stereo image. Fig. 2 below shows the signal flow. The key algorithms are summarized in the subsections below.
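The pan-band creation mentioned in the second step (averaging the available right-camera bands) can be sketched in a few lines. This is a minimal numpy illustration; the 9-band cube of random values stands in for actual Mastcam data.

```python
import numpy as np

def create_pan_band(ms_cube):
    """Create a synthetic panchromatic band by averaging all the
    spectral bands of a multispectral cube of shape (bands, H, W)."""
    return ms_cube.mean(axis=0)

# Illustrative 9-band cube (each Mastcam imager has nine bands).
rng = np.random.default_rng(0)
right_cube = rng.random((9, 64, 64))
pan = create_pan_band(right_cube)
print(pan.shape)  # (64, 64)
```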
[Fig. 2 block diagram: the low resolution left image is upsampled and pansharpened using a pan band created from the high resolution right image after two-step image alignment; the resulting high resolution left image and the high resolution right image then feed stereo image formation, disparity map generation, and depth map generation.]

Fig. 2: Signal flow of a new stereo image formation and depth map generation system.

A. Image Registration

We applied a two-step image alignment approach that can achieve sub-pixel accuracy [14]. The signal flow is shown in Fig. 3. The first step uses the Random Sample Consensus (RANSAC) technique [20] for an initial alignment. In this step, we use the two RGB stereo images from the left and right Mastcams. SURF [21] and SIFT [22] features are extracted from the two images and matched within the image pair; RANSAC is then applied to estimate the geometric transformation. The second step takes the RANSAC-aligned image and the left camera image as inputs and applies the diffeomorphic registration [24] technique, which reduces the registration errors to sub-pixel levels.
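The coarse RANSAC step can be illustrated with a self-contained numpy sketch. Synthetic matched points stand in for the SIFT/SURF matches, and a 2-D affine model stands in for the geometric transformation; this illustrates the RANSAC idea only, not the exact model used in [14].

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine transform mapping src points to dst."""
    A = np.hstack([src, np.ones((len(src), 1))])     # (n, 3)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)       # (3, 2)
    return M.T                                        # (2, 3)

def ransac_affine(src, dst, iters=200, tol=1.0, seed=0):
    """RANSAC loop: sample 3 correspondences, fit an affine model,
    and keep the model with the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_M, best_inliers = None, np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        pred = src @ M[:, :2].T + M[:, 2]
        inliers = np.linalg.norm(pred - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_M, best_inliers = M, inliers
    # Refit on all inliers for the final estimate.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# Synthetic matches: a known scale-and-shift plus 10 bad matches.
rng = np.random.default_rng(1)
src = rng.random((60, 2)) * 100
dst = 1.1 * src + np.array([5.0, -3.0])
dst[:10] += rng.random((10, 2)) * 50 + 20          # outliers
M, inliers = ransac_affine(src, dst)
```

RANSAC recovers the true transform because the ten corrupted matches never agree with any model fitted to clean samples.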

Fig. 3. Block diagram of the two-step image alignment approach [14].

B. Pansharpening

The goal of pansharpening is to fuse a low-spatial-resolution left image with a high-spatial-resolution panchromatic (pan) image from the right camera. In our case, after the two-step registration, the image from the left camera can be considered a blurred version of the right one. As a result, pansharpening techniques can be applied to sharpen the images from the left camera, using the high spatial resolution images from the right camera as the pan image. Although there are many new pansharpening techniques, the conventional and computationally efficient ones can be classified into two main categories: (1) the component substitution (CS) approach and (2) the multiresolution analysis (MRA) approach. In this paper, we focus on the CS-based approach due to its simplicity. In a CS method, the multispectral data are projected into another space, one component is substituted with the pan image, and the pansharpened data are obtained by applying the inverse transformation to project the data back to the original space. We applied the Gram-Schmidt Adaptive (GSA) [5] algorithm for its simplicity and performance in our experiments.

C. Stereo Image and Depth Map Generation

Fig. 4 shows the key steps in stereo image formation. Given a pansharpened left image and the original high resolution right image, we first perform feature point extraction. This step is similar to the two-step image registration described earlier.
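The component substitution idea in Section III.B can be sketched with a minimal numpy example. The least-squares intensity weights mimic the "adaptive" part of GSA, but this is a simplified illustration, not the full GSA algorithm of [5].

```python
import numpy as np

def cs_pansharpen(ms_up, pan):
    """Minimal component-substitution (CS) pansharpening sketch.
    ms_up: upsampled multispectral cube, shape (bands, H, W).
    pan:   high resolution panchromatic image, shape (H, W).
    The intensity component is a least-squares combination of the
    bands; the pan-minus-intensity detail is injected into each
    band with a covariance-based gain."""
    b, h, w = ms_up.shape
    X = ms_up.reshape(b, -1).T                          # (H*W, bands)
    weights, *_ = np.linalg.lstsq(X, pan.ravel(), rcond=None)
    intensity = (X @ weights).reshape(h, w)
    detail = pan - intensity
    gains = np.array([np.cov(band.ravel(), intensity.ravel())[0, 1]
                      / intensity.var() for band in ms_up])
    return ms_up + gains[:, None, None] * detail

# Sanity check on synthetic data: when the pan image is exactly a
# linear combination of the bands, no detail is injected.
rng = np.random.default_rng(0)
ms = rng.random((4, 32, 32))
sharpened = cs_pansharpen(ms, ms.mean(axis=0))
```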

Both SIFT and SURF features can be used; in our experience, SIFT features are more robust than SURF features. Second, feature correspondence is achieved with RANSAC, which matches feature points that belong to the same physical locations. Third, the fundamental matrix is estimated from the corresponding feature points. Fourth, outliers are removed using the epipolar constraint. Finally, an image rectification step is performed.
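The fundamental-matrix step can be illustrated with the classical normalized 8-point algorithm on synthetic, outlier-free correspondences; a real pipeline would feed it the RANSAC-filtered feature matches.

```python
import numpy as np

def eight_point(x1, x2):
    """Normalized 8-point estimate of the fundamental matrix F
    satisfying x2_h^T @ F @ x1_h = 0 for matched image points."""
    def normalize(x):
        c = x.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(x - c, axis=1).mean()
        T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
        xh = np.hstack([x, np.ones((len(x), 1))]) @ T.T
        return xh, T
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence gives one linear constraint on the 9 entries of F.
    A = np.einsum('ni,nj->nij', p2, p1).reshape(-1, 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 (fundamental matrices are singular).
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1                   # undo the normalization

# Synthetic stereo pair: 3-D points seen by two translated pinhole cameras.
rng = np.random.default_rng(2)
pts = rng.random((20, 3)) + [0, 0, 4]      # points in front of the cameras
def project(pts, t):                        # pinhole projection, focal 1
    p = pts + t
    return p[:, :2] / p[:, 2:3]
x1 = project(pts, np.zeros(3))
x2 = project(pts, np.array([0.5, 0.0, 0.0]))
F = eight_point(x1, x2)
```

The epipolar constraint can then be checked directly: every correspondence should satisfy x2_h^T F x1_h ≈ 0, which is the criterion used for outlier removal.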

(c) Stereo image pair with proper baseline
Fig. 5: Key steps in the image rectification process [23].

Disparity is the difference in position between two pixels that correspond to the same physical point in the stereo image pair. Once the stereo images are created, a feature correspondence process is needed to determine the pixels that belong to the same physical point. Based on the correspondence results, the disparity is computed for every pixel in the image. Once the disparity map is found, the depth L at each pixel is given by

L = Bf / ΔX

where B is the baseline between the two cameras, f is the focal length, and ΔX is the disparity at that pixel.

IV. EXPERIMENTAL STUDIES
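The disparity-to-depth conversion L = Bf/ΔX from the previous section can be sketched as follows; the baseline and focal length values are illustrative assumptions, not the actual Mastcam geometry.

```python
import numpy as np

# Depth from disparity: L = B * f / dX.  B and f below are
# illustrative values, not the actual Mastcam camera geometry.
B = 0.245        # baseline in meters (assumed)
f = 1200.0       # focal length in pixels (assumed)

disparity = np.array([[20.0, 40.0],
                      [60.0, 80.0]])       # disparities in pixels
# Guard against zero disparity (point at infinity).
depth = np.where(disparity > 0, B * f / np.maximum(disparity, 1e-9), np.inf)
print(depth[0, 0])   # 14.7 m for a 20-pixel disparity
```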

Fig. 4: Flow chart of stereo image formation.

To illustrate the stereo rectification process, we use Fig. 5 below. “Camera 1” and “Camera 2” with different views are illustrated in Fig. 5(a). Our aim is to obtain a true stereo pair from these two images through image rectification. The first step in the rectification process is to find a homography for each image. We then transform the two images into new ones, as if they were captured by two parallel cameras, as illustrated in Fig. 5(b). The second step is to adjust the wide or narrow baseline of the two parallel cameras to a proper value by translating the new images. The desired stereo pair is then constructed, as in Fig. 5(c).
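The first rectification step applies a homography to each image. Mapping points through a 3x3 homography can be sketched as follows; the rotation used here is an arbitrary illustrative homography, not an actual rectifying transform.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2-D points through a 3x3 homography: convert to
    homogeneous coordinates, multiply, then de-homogenize."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Illustrative homography: an in-plane rotation by 30 degrees.
theta = np.pi / 6
H = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
corners = np.array([[1.0, 0.0], [0.0, 1.0]])
mapped = apply_homography(H, corners)
```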

(a) Two general image planes

(b) Rectified image planes

A. Mastcam Data

The Mastcam dataset downloaded from the PDS archive contains more than 500,000 images collected at different times and locations since 2012. It should be noted that the left and right Mastcams are independently controlled and do not always collect data simultaneously. Extensive pre-processing and screening were performed to select only pairs of images with the same dates and locations. After cleaning up the image sets, we obtained a database with a total of 133 left-right pairs.

B. Experimental Results

In this section, we include four case studies.

Case 1: Image pair MSL_0001_0013_M1. Fig. 6 shows the left image before and after pansharpening. The image quality has clearly improved: the rock edges look much sharper in the pansharpened image than in the original left image. Fig. 7 compares the stereo image generated using the low resolution left and high resolution right images with the one generated using the pansharpened left and high resolution right images. Viewed with red-blue 3D glasses, the 3D image formed from the pansharpened pair looks much better. Fig. 8 compares the depth maps; the depth map quality has improved slightly.
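The screening step that keeps only left/right images with matching dates and locations can be sketched as a simple grouping operation. The metadata records and product identifiers below are hypothetical, not actual PDS entries; the real screening also used the PDS product labels.

```python
from collections import defaultdict

# Hypothetical metadata records for the screening step.
records = [
    {"product": "L_sol0001_site013", "camera": "L", "sol": 1, "site": 13},
    {"product": "R_sol0001_site013", "camera": "R", "sol": 1, "site": 13},
    {"product": "L_sol0002_site174", "camera": "L", "sol": 2, "site": 174},
    {"product": "R_sol0005_site222", "camera": "R", "sol": 5, "site": 222},
]

# Group by (sol, site) and keep only keys observed by BOTH cameras.
groups = defaultdict(dict)
for r in records:
    groups[(r["sol"], r["site"])][r["camera"]] = r["product"]
pairs = [(g["L"], g["R"]) for g in groups.values() if "L" in g and "R" in g]
print(len(pairs))  # 1 matched pair
```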

(a) Before pansharpening;

(b) Stereo image (red-cyan anaglyph) formed using enhanced left and high resolution right.
Fig. 7: Comparison of stereo images generated by using low resolution/enhanced left images and high resolution right image.


(b) After pansharpening
Fig. 6: Left image (MSL_0001_0013_M1) before and after pansharpening.


(a) Depth map using low resolution left and high resolution right;


(a) Stereo image (Red-cyan anaglyph) formed using low resolution left and high resolution right;


(b) Depth map using enhanced left and high resolution right;
Fig. 8: Comparison of depth maps generated by using low resolution/enhanced left images and high resolution right image.

Case 2: Image pair MSL_0001_0013_M2. Fig. 9 shows the left image before and after pansharpening; the image quality is clearly better after pansharpening. Fig. 10 compares the stereo images formed using the original left and the enhanced left images. Although the improvement is hard to assess objectively, the quality is slightly better by visual inspection. Similarly, the depth maps in Fig. 11 show that the one using the enhanced left image has a smoother transition between the different depths.

(b) Stereo image (red-cyan anaglyph) formed using enhanced left and high resolution right.
Fig. 10: Comparison of stereo images generated by using low resolution/enhanced left images and high resolution right image.


(a) Before pansharpening;


(a) Depth map using low resolution left and high resolution right;


(b) After pansharpening
Fig. 9: Left image (MSL_0001_0013_M2) before and after enhancing.


(b) Depth map using enhanced left and high resolution right;
Fig. 11: Comparison of depth maps generated by using low resolution/enhanced left images and high resolution right image.

Case 3: Image pair MSL_0002_0174_M1.

(a) Stereo (red-cyan anaglyph) formed using low resolution left and high resolution right;

Fig. 12 shows the left image and the pansharpened left image for a third case. The improvement is obvious. As in the earlier cases, the stereo image in Fig. 13 and the depth map in Fig. 14 both show improvement over the non-sharpened case.


(a) Before pansharpening;

(b) Depth map using enhanced left and high resolution right;
Fig. 14: Comparison of depth maps generated by using low resolution/enhanced left images and high resolution right image.

(b) After pansharpening
Fig. 12: Left image (MSL_0002_0174_M1) before and after enhancing.

(a) Stereo (red-cyan anaglyph) formed using low resolution left and high resolution right;

(b) Stereo image (red-cyan anaglyph) formed using enhanced left and high resolution right.
Fig. 13: Comparison of stereo images generated by using low resolution/enhanced left images and high resolution right image.

(a) Depth map using low resolution left and high resolution right;

Case 4: Image pair MSL_0003_0194_M1. Unlike the previous three cases, this one does not show much improvement. From Fig. 15, the image quality of the left image has improved. However, the stereo image (Fig. 16) and depth map (Fig. 17) show inconsistent results. We believe this is caused by inaccurate alignment, which in turn is due to the image contents being non-coplanar.

(a) Before pansharpening; (b) After pansharpening
Fig. 15: Left image (MSL_0003_0194_M1) before and after enhancing.

(a) Stereo (red-cyan anaglyph) formed using LR left and HR right; (b) Stereo image formed using enhanced left and original right.
Fig. 16: Comparison of stereo images generated by using low resolution/enhanced left images and high resolution right image.


(a) Depth map using low resolution left and high resolution right; (b) Depth map using enhanced left and high resolution right.
Fig. 17: Comparison of depth maps generated by using low resolution/enhanced left images and high resolution right image.

C. Discussions

From the results in Section IV.B, Cases 1 to 3 worked quite well and Case 4 did not. The failure is mainly caused by poor registration and hence poor pansharpening. The registration is poor because the image contents are not coplanar, which leads to large registration errors; consequently, all subsequent results of pansharpening, stereo image formation, and depth map generation are affected. In a companion paper [19], we propose a new approach to handling the alignment issue.

V. CONCLUSIONS

We presented a new approach to generating high resolution stereo images and depth maps for Mastcam images. Initial results are encouraging. Potential applications include using the stereo maps for anomaly detection [25]-[29] or target tracking [30], [31].

REFERENCES

[1] W. Wang, S. Li, H. Qi, B. Ayhan, C. Kwan, and S. Vance, “Revisiting the Preprocessing Procedures for Elemental Concentration Estimation based on CHEMCAM LIBS on MARS Rover,” 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lausanne, Switzerland, June 24-27, 2014.
[2] B. Ayhan, C. Kwan, and S. Vance, “On the Use of a Linear Spectral Unmixing Technique for Concentration Estimation of APXS Spectrum,” J. Multidisciplinary Engineering Science and Technology, vol. 2, no. 9, pp. 2469-2474, September 2015.
[3] J. F. Bell et al., “The Mars Science Laboratory Curiosity Rover Mast Camera (Mastcam) Instruments: Pre-Flight and In-Flight Calibration, Validation, and Data Archiving,” AGU J. Earth and Space Science, 2017.
[4] J. Zhou, C. Kwan, and B. Budavari, “Hyperspectral Image Super-Resolution: A Hybrid Color Mapping Approach,” SPIE J. Applied Remote Sensing, vol. 10, article 035024, September 2016.
[5] G. Vivone, L. Alparone, J. Chanussot, M. Dalla Mura, A. Garzelli, and G. Licciardi, “A critical comparison of pansharpening algorithms,” IEEE Int. Geoscience and Remote Sensing Symposium (IGARSS), pp. 191-194, 2014.
[6] C. Kwan, J. H. Choi, S. Chan, J. Zhou, and B. Budavari, “Resolution Enhancement for Hyperspectral Images: A Super-Resolution and Fusion Approach,” IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, 2017.
[7] C. Kwan, B. Budavari, A. Bovik, and G. Marchisio, “Blind Quality Assessment of Fused WorldView-3 Images by Using the Combinations of Pansharpening and Hypersharpening Paradigms,” IEEE Geoscience and Remote Sensing Letters, August 2017.
[8] C. Kwan, B. Budavari, and F. Gao, “A Hybrid Color Mapping Approach to Fusing MODIS and Landsat Images for Forward Prediction,” Remote Sensing, March 2018.
[9] C. Kwan, J. H. Choi, S. Chan, J. Zhou, and B. Budavari, “A Super-Resolution and Fusion Approach to Enhancing Hyperspectral Images,” Remote Sensing, September 2018.
[10] C. Kwan, B. Ayhan, and B. Budavari, “Fusion of THEMIS and TES for Accurate Mars Surface Characterization,” IEEE Int. Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, July 2017.
[11] Y. Qu, H. Qi, B. Ayhan, C. Kwan, and R. Kidd, “Does Multispectral/Hyperspectral Pansharpening Improve the Performance of Anomaly Detection?” IEEE Int. Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, July 2017.
[12] M. Dao, C. Kwan, K. Koperski, and G. Marchisio, “A Joint Sparsity Approach to Tunnel Activity Monitoring Using High Resolution Satellite Images,” IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, 2017.
[13] C. Kwan, B. Budavari, M. Dao, and J. Zhou, “New Sparsity Based Pansharpening Algorithm for Hyperspectral Images,” IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, 2017.
[14] B. Ayhan, M. Dao, C. Kwan, H. Chen, J. F. Bell III, and R. Kidd, “A Novel Utilization of Image Registration Techniques to Process Mastcam Images in Mars Rover with Applications to Image Fusion, Pixel Clustering, and Anomaly Detection,” IEEE J. Selected Topics in Applied Earth Observations and Remote Sensing, August 2017.
[15] C. Kwan, M. Dao, B. Chou, L. M. Kwan, and B. Ayhan, “Mastcam Image Enhancement Using Estimated Point Spread Functions,” IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, pp. 186-191, New York City, October 2017.
[16] C. Kwan and J. Larkin, “Perceptually Lossless Compression for Mastcam Images,” IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, New York City, November 2018.
[17] M. Dao, C. Kwan, B. Ayhan, and J. F. Bell, “Enhancing Mastcam Images for Mars Rover Mission,” 14th Int. Symposium on Neural Networks, pp. 197-206, Hokkaido, Japan, June 2017.
[18] C. Kwan, B. Budavari, M. Dao, B. Ayhan, and J. F. Bell, “Pansharpening of Mastcam images,” IEEE Int. Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, July 2017.
[19] C. Kwan, B. Chou, and B. Ayhan, “Stereo image and depth map generation for images with different views and resolutions,” IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, 2018.
[20] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2003.
[21] H. Bay et al., “SURF: Speeded Up Robust Features,” Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346-359, 2008.
[22] D. G. Lowe, “Object recognition from local scale-invariant features,” IEEE Int. Conf. Computer Vision (ICCV), vol. 2, 1999.
[23] X. Li, C. Kwan, and B. Li, “Stereo Imaging with Uncalibrated Camera,” Advances in Visual Computing, Second Int. Symposium (ISVC 2006), Lake Tahoe, NV, USA, November 6-8, 2006.
[24] H. Chen et al., “A Parameterization of Deformation Fields for Diffeomorphic Image Registration and Its Application to Myocardial Delineation,” Lecture Notes in Computer Science, vol. 6361, pp. 340-348, 2010.
[25] J. Zhou, C. Kwan, B. Ayhan, and M. Eismann, “A Novel Cluster Kernel RX Algorithm for Anomaly and Change Detection Using Hyperspectral Images,” IEEE Trans. Geoscience and Remote Sensing, vol. 54, no. 11, pp. 6497-6504, November 2016.
[26] W. Wang, S. Li, H. Qi, B. Ayhan, C. Kwan, and S. Vance, “Identify Anomaly Component by Sparsity and Low Rank,” IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Tokyo, Japan, June 2-5, 2015.
[27] Y. Qu, W. Wang, R. Guo, B. Ayhan, C. Kwan, S. Vance, and H. Qi, “Hyperspectral Anomaly Detection through Spectral Unmixing and Dictionary based Low Rank Decomposition,” IEEE Trans. Geoscience and Remote Sensing, March 2018.
[28] Y. Qu, H. Qi, and C. Kwan, “Unsupervised Sparse Dirichlet-Net for Hyperspectral Image Super-Resolution,” IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Salt Lake City, June 2018.
[29] S. Li, W. Wang, H. Qi, B. Ayhan, C. Kwan, and S. Vance, “Low-rank Tensor Decomposition based Anomaly Detection for Hyperspectral Imagery,” IEEE Int. Conf. Image Processing (ICIP), Quebec City, Canada, September 27-30, 2015.
[30] X. Li, C. Kwan, G. Mei, and B. Li, “A Generic Approach to Object Matching and Tracking,” Proc. Third Int. Conf. Image Analysis and Recognition, Lecture Notes in Computer Science, vol. 4141, pp. 839-849, 2006.
[31] C. Kwan, B. Chou, A. Echavarren, B. Budavari, J. Li, and T. Tran, “Compressive Vehicle Tracking Using Deep Learning,” IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference, New York City, November 2018.