A Method for Estimating the Motion of a Marine Remotely Operated Vehicle using an Underwater Vision System

Hammad Ahmad*, Jonathan Horgan*, Daniel Toal*, Edin Omerdic*, Sean Nolan*

*Electronics and Computer Engineering Department, University of Limerick, Ireland (e-mails: hammad.ahmad; jonathan.horgan; daniel.toal; [email protected], [email protected])
Abstract: This paper outlines a methodology for estimating translational and rotational velocities in the 2D plane using a vision system in near-seabed operations. Feature detection across consecutive camera frames is performed using the Scale Invariant Feature Transform (SIFT) in Matlab. The estimated translational and rotational velocities from this vision-based motion detection system can be used as aiding for inertial navigation systems, e.g. the iXsea Photonic Inertial Navigation System (PHINS), to improve near-seabed navigation performance for Remotely Operated Vehicles (ROVs).

Keywords: SIFT, Vision, ROV navigation

1. INTRODUCTION

Accurate navigation and localisation of unmanned underwater vehicles (UUVs) is vital both for reliable data collection and for control of the vehicle itself in ocean applications such as ocean survey and offshore oil and gas operations. However, the task of underwater navigation is challenging and as yet has no perfect solution capable of tackling all mission and environment scenarios. In general, differing navigation sensors are fused to obtain the best navigation estimate. Current state-of-the-art navigation systems are based on velocity measurements from a Doppler velocity log (DVL) fused with velocity/attitude and position estimates derived by integration and double integration, respectively, of the linear accelerations and angular rates measured by an inertial measurement unit (IMU) (Kinsey et al., 2006). To constrain the inherent integration drift in the system, position fixes from an acoustic transponder network such as Long Baseline (LBL), Ultra Short Baseline (USBL) or GPS Intelligent Buoys (GIB) are often used. Near-seabed vehicle navigation for the collection and registration of high-resolution video datasets is of particular interest for biological and geological surveying (Grehan et al., 2005). However, the marine environment is not ideal for optical imaging, as many of its properties affect the quality of video data (Horgan and Toal, 2009). For example, the range of optical imaging is limited by attenuation and scattering of light due to the rapid absorption of longer wavelengths of the visible colour spectrum and the turbidity of the water. As a result, optical imaging is only suitable in close proximity to the intended target. Consequently, current navigation systems generally rely on acoustic sensors for positioning information. However, the use of transponder networks to bound IMU drift raises mission cost, as transponders require deployment and calibration prior to the mission or a mother ship is necessary.
This solution also limits the area in which the vehicle can accurately navigate to within the bounds of the transponder network (acoustic tether). Many current IMU or inertial navigation system (INS) based navigation solutions for UUVs rely on the integration of DVL velocity measurements in bottom track to reduce motion estimation drift. Many DVL systems, depending on frequency, have a blanking range of 1 m or more (lower frequency, higher blanking range), causing measurements inside this range to suffer from DVL drop-outs with resultant position estimation drift. As it is often within this 1 m DVL blanking range that video (and high-definition video) data collection is performed, there is an evident mismatch between the navigational capabilities of lower-frequency DVL systems and the requirements of near-intervention video dataset collection missions. In the last decade many researchers have investigated the use of computer vision in UUV navigation, but to date no solutions have penetrated to widespread application in the offshore industry sector. Underwater vehicles are commonly fitted with cameras for biological, geological and archaeological survey needs. As such, cameras are standard equipment onboard submersibles. As a readily available sensor, vision can thus be incorporated into a navigation framework to provide alternative vehicle navigation estimates when working near the seafloor. Optical imaging systems are relatively inexpensive sensors in comparison to their acoustic counterparts while also possessing the advantage of high update rates and resolutions. The implementation of custom image processing algorithms allows optical information to be used in many diverse navigation and positioning applications including cable tracking (Balasuriya and Ura, 2002), mosaicking (Pizarro and Singh, 2003) and station keeping (Cufi et al., 2002). Despite recent interest in vision-based techniques, very little research exists on the use of vision as an underwater odometry sensor or as a means of Simultaneous Localization and Mapping (SLAM).
The majority of SLAM research in the underwater environment has focused on acoustic data. The key to successful visual SLAM, like most vision applications for underwater vehicle navigation, lies in the selection of robust features on the sea floor to allow accurate correspondence in the presence of changing viewpoints and non-uniform illumination. Another important factor is the likely sparseness of distinct features in the environment, which places even more importance on the accuracy of keypoint localisation. One of the few examples of underwater optical SLAM was developed by Eustice, who implemented a vision-based SLAM algorithm that performs well even in cases of low-overlap imagery (Eustice, 2005). In addition, the technique can take advantage of inertial sensors to improve the production of detailed seabed image reconstructions. Using an efficient sparse information filter, the approach scales well to large-scale mapping. An impressive image mosaic of the RMS Titanic was constructed using these techniques but, due to the high processing demands, it was only suitable for offline processing (Eustice et al., 2005). Saez et al. detail a technique for visual SLAM that takes advantage of trinocular stereo vision (Saez et al., 2006). A global rectification strategy is employed to maintain the global consistency of the trajectory and improve accuracy. While experiments showed good results, all testing was carried out offline. The algorithm for global rectification becomes increasingly computationally complex with time and as a result is unsuitable for large-scale environments. The issues associated with metric motion estimation from vision are dealt with more directly by Caccia (Caccia, 2003) and later developed into a more complete system with ocean environment experimental results (Caccia, 2007). The system is based on an optical feature correlation system to detect motion between consecutive camera frames. This motion is converted into its metric equivalent with the implementation of a laser triangulation scheme to measure the altitude of the vehicle. The system described only allows for horizontal linear translation and does not account for changes in yaw, but promising results were achieved using the Romeo vehicle at constant heading and altitude in the Ligurian Sea. Cufi et al. also calculate direct metric motion estimates for the evaluation of a vision-based station keeping algorithm (Cufi et al., 2002). This technique uses altitude measurements gained from an ultrasonic altimeter to convert offsets from images produced by a calibrated camera into metric displacements. The authors' previous work on vision-based UUV navigation, with the addition of a Scale Invariant Feature Transform (SIFT) based motion estimation algorithm to improve image correspondence and the use of the latest graphical processing unit (GPU) technology for real-time performance, is reported in (Horgan et al., 2009, Horgan et al., 2007). This paper outlines the methodology for estimating translational and rotational velocities in the 2D plane from a near-seabed vision system. Feature detection between consecutive frames is performed using SIFT in Matlab (Vedaldi, 2009).
This estimation is used as an aiding input to the iXsea Photonic Inertial Navigation System (PHINS) to improve near-seabed navigation for ROVs such as ROVLATIS (Omerdic and Toal, 2009).

2. ESTIMATION METHODOLOGY

2.1 SIFT features

The SIFT approach transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling and rotation, and partially invariant to illumination changes and affine or 3D projection. Application of the SIFT algorithm involves the following stages (Lowe, 2004):

Stage 1: Identification of points which are invariant to scale and orientation.
Stage 2: Localisation of the selected points in location and scale.

Stage 3: Estimation of local image gradients; based on the gradient directions, one or more orientations are assigned to each candidate point.

Stage 4: To account for local image distortion and illumination changes, the gradient directions around each candidate point are computed to form the keypoint descriptor.

Fig. 1. Different stages of the SIFT algorithm.
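As an illustration of these stages, a minimal Matlab sketch is given below, assuming the VLFeat toolbox (Vedaldi, 2009) is installed and on the path and using a hypothetical example frame file frame1.png; it reduces a single frame to keypoint locations, scales, orientations and 128-element descriptors.

```matlab
% Minimal SIFT detection sketch using the VLFeat Matlab toolbox (Vedaldi, 2009).
% Assumes vl_setup has been run; 'frame1.png' is a hypothetical example frame
% from the downward-looking camera.
I1 = imread('frame1.png');
if size(I1, 3) == 3
    I1 = rgb2gray(I1);           % SIFT operates on intensity images
end
I1 = single(I1);                 % vl_sift expects a single-precision image

% F is a 4-by-K matrix of keypoint frames [x; y; scale; orientation],
% D is a 128-by-K matrix of SIFT descriptors, one column per keypoint.
[F, D] = vl_sift(I1);
fprintf('Detected %d scale- and rotation-invariant keypoints\n', size(F, 2));
```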
2.2 Method for Feature Detection

In operations off the coast of Galway in 2008, ROVLATIS was flown at a constant altitude close to the seabed. The deployment of ROVLATIS is shown in Fig. 5. Fig. 2 shows consecutive frames from the vehicle's downward-looking camera as the ROV passes over its own umbilical. As an example of the operation of the SIFT algorithm, these frames are used for motion estimation. In order to estimate the motion of the camera, two camera frames are processed for local feature detection using the SIFT algorithm (Droeschel et al., 2009). The potential points of interest, which are invariant to scale and orientation, are calculated using the difference-of-Gaussian function as shown in Fig. 3. The keypoint features in one frame are then matched to their nearest neighbours in the next frame; the Euclidean distance between the corresponding nearest-neighbour descriptors is used for the matching, and the matched keypoint locations are expressed in pixel coordinates (Fig. 4). A match, i.e. the mapping of a point between the two frames, combined with the altitude information gives a 3D point in space.
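A corresponding matching sketch, again assuming the VLFeat toolbox and hypothetical frame files, pairs the descriptors of two consecutive frames with Lowe's nearest-neighbour distance-ratio test and extracts the matched keypoint locations in pixel coordinates.

```matlab
% Match SIFT descriptors between two consecutive frames (VLFeat toolbox;
% 'frame1.png' and 'frame2.png' are hypothetical file names).
I1 = single(rgb2gray(imread('frame1.png')));
I2 = single(rgb2gray(imread('frame2.png')));
[F1, D1] = vl_sift(I1);
[F2, D2] = vl_sift(I2);

% Nearest-neighbour descriptor matching with Lowe's distance-ratio test.
% matches(1,k) indexes a keypoint in frame 1, matches(2,k) its match in frame 2.
[matches, scores] = vl_ubcmatch(D1, D2);

% Matched keypoint locations in pixel coordinates (2-by-M matrices).
P1 = F1(1:2, matches(1, :));
P2 = F2(1:2, matches(2, :));

% Euclidean distance of each matched pair in pixel coordinates.
d = sqrt(sum((P2 - P1).^2, 1));
fprintf('%d matches, mean displacement %.2f px\n', numel(d), mean(d));
```

The matched 2-by-M point sets P1 and P2 are the inputs assumed by the SVD-based motion estimation sketch in Section 2.3.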
Fig. 2. Two frames from a ROVLATIS dive off the coast of Galway.

Fig. 4. Descriptor matching in frame 1 and in frame 2.

2.3 Singular Value Decomposition (SVD)

The following notational convention holds: vectors are bold italic, matrices are bold non-italic and scalars are non-bold italic. The SVD algorithm is implemented to find the translation vector $\boldsymbol{T}$ and rotation matrix $\mathbf{R}$ (Arun et al., 1987). The computational complexity of calculating the SVD is $O(n^3)$ (Golub and Van Loan, 1996), which can be reduced by subspace tracking or by incrementally updating the SVD. This paper deals with the proof-of-concept phase of the project, so computational complexity is not considered during this initial phase. Consider two 2-D sets of points $\{\boldsymbol{p}_i\}$ and $\{\boldsymbol{p}'_i\}$, $i = 1, \dots, N$, in frame 1 and frame 2 respectively, related by

$\boldsymbol{p}'_i = \mathbf{R}\,\boldsymbol{p}_i + \boldsymbol{T} + \boldsymbol{n}_i$   (1)

where $\boldsymbol{p}_i$ and $\boldsymbol{p}'_i$ are $2 \times 1$ column matrices, $\mathbf{R}$ is a $2 \times 2$ rotation matrix, $\boldsymbol{T}$ is a $2 \times 1$ translation vector and $\boldsymbol{n}_i$ is a noise vector. The rotation and translation are found by minimising the least-squares error

$\Sigma^2 = \sum_{i=1}^{N} \left\| \boldsymbol{p}'_i - (\mathbf{R}\,\boldsymbol{p}_i + \boldsymbol{T}) \right\|^2$   (2)
The following are the steps for calculating $\mathbf{R}$ and $\boldsymbol{T}$:

Step 1: Calculate the centroids $\bar{\boldsymbol{p}}$ and $\bar{\boldsymbol{p}}'$ and the centred point sets $\{\boldsymbol{q}_i\}$ and $\{\boldsymbol{q}'_i\}$ as follows:

$\bar{\boldsymbol{p}} = \frac{1}{N}\sum_{i=1}^{N} \boldsymbol{p}_i$   (3)

$\bar{\boldsymbol{p}}' = \frac{1}{N}\sum_{i=1}^{N} \boldsymbol{p}'_i$   (4)

$\boldsymbol{q}_i = \boldsymbol{p}_i - \bar{\boldsymbol{p}}$   (5)

$\boldsymbol{q}'_i = \boldsymbol{p}'_i - \bar{\boldsymbol{p}}'$   (6)

Step 2: Calculate the $2 \times 2$ matrix

$\mathbf{H} = \sum_{i=1}^{N} \boldsymbol{q}_i \, \boldsymbol{q}'^{\,T}_i$   (7)

where the superscript $T$ denotes matrix transpose.

Step 3: Calculate the SVD of $\mathbf{H}$:

$\mathbf{H} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{V}^{T}$   (8)

Step 4: Calculate

$\mathbf{X} = \mathbf{V}\mathbf{U}^{T}$   (9)

Step 5: Calculate $\det(\mathbf{X})$, the determinant of $\mathbf{X}$. If $\det(\mathbf{X}) = +1$, then $\mathbf{R} = \mathbf{X}$. If $\det(\mathbf{X}) = -1$, the algorithm fails; this does not occur in practice (see (Arun et al., 1987) for proof).

Step 6: Calculate

$\boldsymbol{T} = \bar{\boldsymbol{p}}' - \mathbf{R}\,\bar{\boldsymbol{p}}$   (10)

Fig. 3. (a) Frame 1, all octaves of scale space. (b) Frame 2, all octaves of scale space. Generated using the difference-of-Gaussian function.
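A compact Matlab sketch of Steps 1 to 6 is given below, assuming P1 and P2 are 2-by-N matrices of matched keypoint coordinates in frames 1 and 2 (the 2-D case of the least-squares fitting of Arun et al. (1987)); the function name rigid_fit_2d is illustrative only.

```matlab
function [R, T] = rigid_fit_2d(P1, P2)
% Least-squares rigid fit between two 2-by-N point sets (Arun et al., 1987, 2-D case).
% P2(:,i) is assumed to correspond to P1(:,i), e.g. from SIFT matching.
p_bar  = mean(P1, 2);            % centroid of frame-1 points, Eq. (3)
pp_bar = mean(P2, 2);            % centroid of frame-2 points, Eq. (4)

Q1 = P1 - p_bar;                 % centred point sets, Eqs. (5)-(6)
Q2 = P2 - pp_bar;

H = Q1 * Q2';                    % 2-by-2 matrix, Eq. (7)
[U, ~, V] = svd(H);              % H = U*Lambda*V', Eq. (8)

X = V * U';                      % candidate rotation, Eq. (9)
if det(X) < 0
    error('Degenerate configuration: det(X) = -1, algorithm fails.');   % Step 5
end
R = X;                           % Step 5
T = pp_bar - R * p_bar;          % translation, Eq. (10)
end
```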
Fig. 5. ROVLATIS being deployed from the RV Celtic Explorer near Rossaveel in Galway Bay.

2.4 Camera Ego Motion

The camera is calibrated and the altitude is known from a high-frequency sonar sensor (Nolan, 2006). The pixel-to-real-world metric scale is calculated to be 1 pixel = 2.2917 mm in camera coordinates for the frames analysed. The frames are translated and rotated by the camera ego motion (i.e. the translation vector $\boldsymbol{T}$ and rotation matrix $\mathbf{R}$). A closed-form solution is implemented for the estimation of the underwater camera $\boldsymbol{T}$ and $\mathbf{R}$ as explained in Section 2.3. The translation vector $\boldsymbol{T} = [T_x \;\; T_y]^T$ is the linear translation of the camera between the two frames. The rotation matrix $\mathbf{R}$ (i.e. rotation about the vertical yaw axis) gives the orientation change of the camera between the two frames. The vertical (downward-looking) camera on the ROV follows an incremental trajectory given by:

$\boldsymbol{x}_{k+1} = \boldsymbol{x}_{k} + \mathbf{R}_{k}\,\boldsymbol{T}$   (11)

and

$\mathbf{R}_{k+1} = \mathbf{R}_{k}\,\mathbf{R}$   (12)

where $\boldsymbol{x}_{k}$ is the camera position at the previous frame and $\mathbf{R}_{k}$ is a 2D rotation matrix.

2.5 Motion Estimation

The motion estimation described in the previous section gives the translation and rotation changes $\boldsymbol{T}$ and $\mathbf{R}$ respectively. The linear and angular velocities are calculated by knowing the frame interval ($\Delta t$ = 0.1 s), and the estimation vector is given by:

$\begin{bmatrix} v_x & v_y & \omega_z \end{bmatrix}^T = \frac{1}{\Delta t}\begin{bmatrix} T_x & T_y & \Delta\psi \end{bmatrix}^T$   (13)

where $\Delta\psi$ is the yaw change extracted from $\mathbf{R}$. For the frames of Fig. 2, the estimated linear velocities in the x and y directions are 0.192 m/s and 0.394 m/s respectively, and the angular velocity about the yaw axis is 11.52 deg/s.
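The ego-motion and velocity computations of Sections 2.4 and 2.5 can be sketched as follows, reusing the hypothetical rigid_fit_2d function and matched point sets P1 and P2 from the earlier sketches; the scale factor and frame interval are the values quoted above.

```matlab
% Camera ego motion and velocity estimation (Sections 2.4-2.5).
scale = 2.2917e-3;                   % metres per pixel for the analysed frames
dt    = 0.1;                         % frame interval in seconds (10 Hz)

[R, T_px] = rigid_fit_2d(P1, P2);    % rotation and translation in pixel units
T = scale * T_px;                    % translation in metres, camera coordinates

dpsi = atan2(R(2,1), R(1,1));        % yaw change between frames (rad)

% Linear and angular velocities, Eq. (13).
v  = T / dt;                         % [vx; vy] in m/s
wz = dpsi / dt;                      % yaw rate in rad/s
fprintf('vx = %.3f m/s, vy = %.3f m/s, yaw rate = %.2f deg/s\n', ...
        v(1), v(2), wz * 180 / pi);

% Incremental trajectory, Eqs. (11)-(12): compose the new pose from the previous one.
x_k = [0; 0];  R_k = eye(2);         % previous pose (initialised at the origin here)
x_k1 = x_k + R_k * T;                % Eq. (11)
R_k1 = R_k * R;                      % Eq. (12)
```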
2.6 Sensor Aiding

The estimation from the camera will be used for aiding the iXsea PHINS inertial navigation system (with its Kalman filter) for accurate near-seabed navigation of ROVLATIS in future work, as shown in the schematic of Fig. 6. This should significantly enhance the navigation of the ROV in missions where the ROV flies from the surface to near-seabed positions and follows video transects. The switch shown in Fig. 6 will change the iXsea PHINS aiding from DVL to vision in near-seabed operations. In the overall mission, navigation will be realised with different sensors aiding the INS in different phases of the mission. For example, consider a mission to acquire a seabed video transect in 40 m depth waters. At the start of the mission the ROV will have GPS as an aiding sensor; during the dive the INS will have Doppler velocity log bottom-track aiding; and once the ROV comes close to the seabed to acquire video, with altitude off bottom below the blanking range of the DVL, the motion-from-video estimator will be used for INS aiding. In this near-seabed video phase, altitude is provided by a high-frequency sonar (2 MHz) (Nolan, 2006), allowing constant height-off-bottom flight control and motion estimation in 2D as described. With aiding sensors in all phases of the mission, pure inertial drift of the INS will be avoided.
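A minimal sketch of the selection logic behind the switch of Fig. 6 is given below; the altitude threshold, the read_altitude helper and the variable names are illustrative assumptions only, since the actual switching is configured within the PHINS aiding setup rather than in user code.

```matlab
% Illustrative aiding-source selection; threshold, helper function and variable
% names are assumptions, not the actual PHINS aiding interface.
dvl_blanking_range = 1.0;            % m, assumed DVL blanking range
altitude = read_altitude();          % hypothetical high-frequency sonar altitude (m)

if altitude < dvl_blanking_range     % near-seabed operation
    aiding_source = 'vision';        % vision-based velocity aiding
else
    aiding_source = 'dvl';           % DVL bottom-track aiding
end
```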
Fig. 6. Vision system used as aiding to the iXsea PHINS. (Block diagram: GPS, GAPS, MicroBath, DVL and vision-system aiding inputs feed a switch, selecting Channel 1 in near-seabed operation and Channel 2 otherwise, into the Kalman filter implemented in the iXsea PHINS, which outputs the estimated state.)
3. CONCLUSIONS

The translational and rotational velocities in the 2D plane from a near-seabed vision system are calculated by applying SIFT and SVD algorithms to consecutive frames. In the work reported in this paper, feature detection and matching have been carried out offline using previously acquired camera footage. As reported in the literature, the implementation of SIFT is slow for online computation. Work on increasing the computational efficiency of the SIFT algorithm and on the online implementation of vision-based motion estimation is being carried out by the authors (Horgan et al., 2009).
The technique uses the latest graphical processing unit (GPU) technology to perform the highly parallel SIFT algorithm rather than the conventional CPU approach. This method greatly reduces restrictions on image resolution, vehicle speed, frame rate and vehicle dynamics, thus leading to highly accurate motion measurements. The use of GPU image processing also frees the CPU for tasks more sequential in nature such as SVD and Kalman filtering. Test data collected onboard the RV Celtic Explorer will allow for further testing and debugging of the image motion estimation and Kalman filtering techniques. The collected dataset will also be used to perform fusion of navigation data and comparison of navigation estimates with the current onboard navigation equipment. The eventual goal of this research is to perform onboard real-time vision-based navigation estimation and INS aiding in near-seabed operations.

REFERENCES

ARUN, K. S., HUANG, T. S. & BLOSTEIN, S. D. 1987. Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9, 698-700.

BALASURIYA, A. & URA, T. 2002. Vision-based underwater cable detection and following using AUVs. Oceans '02 MTS/IEEE.

CACCIA, M. 2003. Vision-based linear motion estimation for unmanned underwater vehicles. IEEE International Conference on Robotics and Automation (ICRA '03), 977-982.

CACCIA, M. 2007. Vision-based ROV horizontal motion control: Near-seafloor experimental results. Control Engineering Practice, 15, 703-714.

CUFI, X., GARCIA, R. & RIDAO, P. 2002. An approach to vision-based station keeping for an unmanned underwater vehicle. IEEE/RSJ International Conference on Intelligent Robots and Systems, 799-804.

DROESCHEL, D., MAY, S., HOLZ, D., PLOEGER, P. & BEHNKE, S. 2009. Robust Ego Motion Estimation with ToF Cameras. 4th European Conference on Mobile Robots (ECMR).

EUSTICE, R. M. 2005. Large-Area Visually Augmented Navigation for Autonomous Underwater Vehicles. Doctor of Philosophy, Massachusetts Institute of Technology and Woods Hole Oceanographic Institution.

EUSTICE, R. M., SINGH, H., LEONARD, J., WALTER, M. & BALLARD, R. D. 2005. Visually navigating the RMS Titanic with SLAM information filters. Proceedings of Robotics Science and Systems. Cambridge, MA: MIT Press.

GOLUB, G. H. & VAN LOAN, C. F. 1996. Matrix Computations, 613. The Johns Hopkins University Press, Baltimore and London.

GREHAN, A., WILSON, M., RIORDAN, J., MOLNAR, L., OMERDIC, E., LE GUILLOUX, E., TOAL, D. & BROWN, C. 2005. ROV Investigations of Cold-Water Coral Habitats along the Porcupine Bank Margin, West Coast of Ireland. Third International Symposium on Deep-Sea Corals, Science and Management. Miami, Florida.

HORGAN, J., FLANNERY, F. & TOAL, D. 2009. Towards Real Time Vision Based UUV Navigation using GPU Technology. IEEE Oceans 2009. Bremen.

HORGAN, J. & TOAL, D. 2009. Computer Vision Applications in the Navigation of Unmanned Underwater Vehicles (Chapter 11). In: INZARTSEV, A. V. (ed.) Underwater Vehicles. InTech.

HORGAN, J., TOAL, D., RIDAO, P. & GARCIA, R. 2007. Real-time vision based AUV navigation system using a complementary sensor suite. IFAC Conference on Control Applications in Marine Systems (CAMS'07). Bol, Croatia.

KINSEY, J. C., EUSTICE, R. M. & WHITCOMB, L. L. 2006. Survey of underwater vehicle navigation: Recent advances and new challenges. IFAC Conference on Manoeuvring and Control of Marine Craft (MCMC'06).

LOWE, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91-110.

NOLAN, S. 2006. A High Frequency Wide Field of View Ultrasonic Sensor for Short Range Collision Avoidance Applications on Intervention Class Underwater Vehicles. Doctor of Philosophy, University of Limerick.

OMERDIC, E. & TOAL, D. 2009. Smart ROVLATIS from Design Concepts to Test Trials. IFAC 8th Conference on Manoeuvring and Control of Marine Craft (MCMC'2009). Guarujá, Brazil.

PIZARRO, O. & SINGH, H. 2003. Toward large-area mosaicing for underwater scientific applications. IEEE Journal of Oceanic Engineering, 28, 651-672.

SAEZ, J. M., HOGUE, A., ESCOLANO, F. & JENKIN, M. 2006. Underwater 3D SLAM through entropy minimization. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 3562-3567.

VEDALDI, A. 2009. SIFT for Matlab [Online]. Available: http://www.vlfeat.org/~vedaldi/code/sift.html