A Video Marking System for an Autonomous Underwater Vehicle
Raúl Arrabales, Daniel Toal and Colin Flanagan
Department of Electronic & Computer Engineering, University of Limerick, Ireland
Abstract
This paper describes the first part of an ongoing project concerning the development of an intelligent vision system for an autonomous underwater vehicle (AUV). The submersible craft is expected to carry out underwater filming and inspection tasks. The AUV control system is based on the subsumption architecture. When operating in automatic mode, the vehicle navigates and performs its assigned tasks by itself. This autonomous robot may be used to obtain video footage from the deep sea in order to study the creatures that live in deep abyssal waters. The complete vision system envisaged for the AUV is a combination of two differentiated but interrelated modules: the initial vision subsystem that has been developed is the video marking system (VMS), and based on this a target tracking system (TTS) will subsequently be implemented. The principal aim of the VMS is to detect, autonomously, the interesting events present in the video footage acquired by the AUV. Using the VMS, scientists do not need to watch the full-length recorded video looking for the events they want to investigate. The marking system uses a number of different approaches designed to identify which fragments of the video footage are really interesting, discarding those which show only the water background.
Keywords: Robot Vision, Image Processing, Underwater Vision, Event Mining.
1. Introduction
The development of two autonomous underwater robots is ongoing at the Department of Electronic and Computer Engineering at the University of Limerick. One of these vehicles is of single-hull construction, which has enabled the group to develop the submersible control systems rapidly. A second, open-frame AUV is under construction; it will have the advantage of more precise manoeuvrability and can readily be reconfigured with various tools and sensors for different missions (see figure 1). The AUV control system is based on the subsumption architecture originally described by Brooks [1], enhanced with the results achieved during previous work carried out by the autonomous mobile robots research group [2,3]. Target applications of this AUV include open-sea inspection and filming tasks. The simplest approach to securing video footage from a vehicle mission is to switch on the camera and lamps at the surface, run them for the duration of the mission, and stop and retrieve the video footage after taking the craft from the water. It is, however, the intention of the team to enhance the video capture systems on the AUV craft. One tedious task marine researchers often face is surveying hours of video footage from a marine robot mission when there may be only very short sequences of interest, which are easily overlooked.
Figure 1. The Single Hull and Open Frame AUV Craft
Since there is no pilot controlling the craft or the onboard TV camera, an intelligent vision system is being developed. The first step in the development of such a system for the AUV is the video marking system. An initial version of the VMS is under development; it works offline, takes an MPEG file as input and generates a list of potentially interesting events that have occurred in the film. The video reviewer can then go straight to the marked sequences. The relationship between the two modules that form the AUV vision system is depicted in figure 2. The VMS, which is described in the present paper, is not only a module of the vision system, but also the initial test-bed for the image-processing and image-understanding techniques that will be used in further development. The expertise gained during the design and development of the VMS will be applied to the implementation of an onboard vision system for the AUV, able to detect events in the video stream captured by the craft's TV camera as they happen. This will be the real-time version of the VMS (RVMS). Using the real-time video marking system, the recording procedure may be optimised, storing only the digital video data that correspond to interesting scenes and thus saving power and storage resources aboard. Furthermore, the target tracking system may be integrated into the AUV vision system in order to provide the robot with the capability of automatically chasing and filming underwater creatures. The RVMS will form part of the TTS, since the tracking tasks need knowledge of the events occurring in the scene as detected by the RVMS.
[Figure 2 shows the two components of the AUV vision system: the Video Marking System (developed first; works offline; Windows® platform) and the Target Tracking System (developed second; works aboard the craft; includes the RVMS).]
Figure 2. AUV Vision System Components
2. The Video Marking System
Version one of the VMS processes the input video file as a temporal sequence of images, and may apply different image-processing techniques. The decision about which methods are to be used in a given period of time may be specified by the user, or may be deduced automatically by the VMS itself for optimal performance. The routine applied to adaptively select the best image-processing approach is explained in the next section.
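As an illustration only, the top-level loop implied by this section might look as follows. All of the interfaces and names here are hypothetical sketches, not the actual VMS classes: frames are assumed to arrive as grayscale pixel arrays from the MPEG decoder, a selector chooses the routines to apply, and the resulting feature vector is handed to the event-mining stage.

    import java.util.List;

    public class VmsDriver {
        // Hypothetical interfaces standing in for the real VMS components.
        interface FrameSource extends Iterable<int[]> {}           // decoded grayscale frames
        interface ImageRoutine { double[] apply(int[] frame, double[] acc); }
        interface RoutineSelector { List<ImageRoutine> routinesFor(int[] frame); }
        interface EventMiner { void consume(int frameIndex, double[] features); }

        public static void run(FrameSource source, RoutineSelector selector,
                               EventMiner miner) {
            int frameIndex = 0;
            for (int[] frame : source) {                           // temporal sequence of images
                // The set of routines may change over time, chosen by the
                // user or adaptively by the VMS itself.
                List<ImageRoutine> routines = selector.routinesFor(frame);
                double[] features = null;
                for (ImageRoutine r : routines)
                    features = r.apply(frame, features);
                miner.consume(frameIndex++, features);             // event mining downstream
            }
        }
    }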
A block diagram representing the process flow in the VMS is presented in figure 3. Image acquisition is the first stage in the process of AUV visual perception: the onboard digital TV camera converts optical radiation to an appropriate electronic signal, which is encoded and stored as an MPEG file. This file, which constitutes the input to the VMS, is processed in order to extract useful information that will be utilised to understand the scene. In the framework of this project, the stage labelled Image Processing in figure 3 involves pre-processing, segmentation and feature extraction, i.e. low- and medium-level artificial vision [4].
[Figure 3 shows the processing chain: optical radiation from the underwater real world enters Image Acquisition, which produces digital video; Image Processing extracts image features; and Pattern Recognition produces a statement about the world.]
Figure 3. Video Marking System
Several visual-sensing objectives can be achieved with the system using image processing alone [5]. For example, some objects can be distinguished solely by measurements such as the area and perimeter of a shape, obtained in the image-processing stage. Operators such as edge detection, noise removal and background subtraction are therefore applied in the pre-processing stage defined in this work. It must be borne in mind, in the context of the VMS development for the AUVs in this project, that the major source of processing difficulty is that the images are taken underwater. Consequently, the complexity and variability of the input source may easily degrade the performance of simple image-processing techniques when they are used as the final visual-sensing tools. Based on experiments performed with the first version of the VMS developed in this project, the complexity of the visual-sensing problem makes it necessary to integrate advanced pattern-recognition techniques with the image-processing tasks. An object detector for the VMS is being designed using Kohonen Self-Organising Maps (SOM) [6], with the aim of recognising whether an object is present in a frame of the video. One of the key decisions in the design of a robot vision system is how much reliance to place on image processing as compared with pattern recognition. Typically, general-purpose vision systems rely more on pattern recognition, while more problem-oriented systems tend to use image-processing techniques more extensively [7]. In underwater filming an optimal combination of both components must be found; however, due to the changing environment, this ideal combination will be time-varying. Consequently, adaptive approaches will be studied in order to find out whether an adaptive algorithm can be applied to achieve a dynamic integration of the various vision-system modules as required by the actual circumstances. It may be foreseen that purely algorithmic approaches will not work well, since the objects to be recognised in this project are not well defined. For that reason, the adaptive approaches go hand in hand with a learning phase in which examples of variants of each class are shown to the SOM pattern recogniser. Two different but interrelated sources of adaptiveness are therefore to be included in the VMS:
• Adaptive video processing, which selects optimal image-processing and feature-extraction techniques. This selection is based on a simple world model.
• Object detection based on a Kohonen SOM, which provides the marking system with flexibility by applying unsupervised learning algorithms.
We are investigating the use of two different Kohonen maps in the implementation of the VMS object detector. The first SOM will work as a feature extractor, employing unsupervised learning to discover automatically which features characterise each image frame. The second SOM will act as a classifier. A self-organising architecture is beneficial in this context, since the target objects are not well defined and the SOM may discover the relevant classes automatically. The classes defined in the VMS pattern recogniser will correspond to events occurring in the video or to types of objects found in the scene.
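A minimal sketch of a Kohonen SOM of the kind proposed here is given below; it shows the core unsupervised update (find the best-matching unit, then pull its grid neighbourhood towards the input). The map size, learning rate and Gaussian neighbourhood function are illustrative assumptions rather than the project's chosen parameters. In the two-map design described above, one such map would serve as the feature extractor and a second as the classifier over the first map's responses.

    import java.util.Random;

    public class KohonenSom {
        private final int rows, cols, dim;
        private final double[][][] weights;   // weights[r][c] is a node's codebook vector

        public KohonenSom(int rows, int cols, int dim, Random rnd) {
            this.rows = rows; this.cols = cols; this.dim = dim;
            weights = new double[rows][cols][dim];
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    for (int d = 0; d < dim; d++)
                        weights[r][c][d] = rnd.nextDouble();   // random initialisation
        }

        // Returns {row, col} of the best-matching unit (BMU) for an input vector.
        public int[] bmu(double[] x) {
            int[] best = {0, 0};
            double bestDist = Double.MAX_VALUE;
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) {
                    double dist = 0;
                    for (int d = 0; d < dim; d++) {
                        double diff = x[d] - weights[r][c][d];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = new int[]{r, c}; }
                }
            return best;
        }

        // One unsupervised training step: move the BMU's neighbourhood towards x.
        public void train(double[] x, double learningRate, double radius) {
            int[] b = bmu(x);
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) {
                    double gridDist2 = (r - b[0]) * (r - b[0]) + (c - b[1]) * (c - b[1]);
                    // Gaussian neighbourhood function centred on the BMU.
                    double h = Math.exp(-gridDist2 / (2 * radius * radius));
                    for (int d = 0; d < dim; d++)
                        weights[r][c][d] += learningRate * h * (x[d] - weights[r][c][d]);
                }
        }
    }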
3. From Video Acquisition to Event Mining
The AUV robot vision system can be defined as a device which undertakes the process of extracting, characterising and interpreting the visual information obtained by the onboard camera. A more complete scheme of the stages involved in this process is presented in figure 4.
[Figure 4 depicts the vision process as a pipeline: Sensing, then Pre-processing, Segmentation and Description (together forming Image Processing), followed by Recognition and Interpretation (together forming Event Mining).]
Figure 4. AUV Robot Vision Process
The AUV vision system begins with sensing, the process that yields the visual image, i.e. obtaining the digital video data from the craft camera. The data are then pre-processed, enhancing interesting details of the images, such as edges, and/or reducing noise. Many algorithms may be applied in order to obtain interesting features from a video frame [5,7,8]. The resulting images are segmented, determining the areas where objects of interest have been found, and all of this information is used to build a description of the scene, basically composed of a set of features, e.g. the size or shape of objects. The selection of features is based on how suitable they are for differentiating one type of object from another. As stated above, simple conclusions about the environment can already be obtained at this stage: the VMS may work using the image-processing tasks alone (note that image processing in this context refers to pre-processing, segmentation and description; see figure 4) and detect some simple kinds of events in the video, such as changes of illumination or the appearance of a large object. The next sub-process in the AUV robot vision system, from the VMS perspective, is event mining, which entails two different tasks: recognition and interpretation. A Kohonen SOM will be used as an adaptive object detector in the recognition stage, providing the interpretation routine with a level of confidence of object appearance. The aim of the interpretation stage is to decide whether a mark must be assigned to a segment of the video, meaning that scientists should examine it because something interesting is happening there; a sketch of this step is given below. For extension to a Target Tracking System (TTS), the final process will be somewhat different, since recognition and interpretation will be oriented more towards target tracking. The TTS can be considered an extension of the VMS because it may use the output of the SOM recogniser used in the VMS to determine whether an interesting object is present in the scene, and subsequently apply object-tracking processes.
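As an illustration of the interpretation step just mentioned, the following minimal sketch turns a stream of per-frame confidence values (such as a SOM recogniser might output) into marked video segments. The class name, the hysteresis scheme and the two thresholds are illustrative assumptions, not the project's actual design.

    import java.util.ArrayList;
    import java.util.List;

    public class VideoMarker {
        // A marked video segment, in frame indices, for the reviewer to inspect.
        public static class Mark {
            public final int startFrame, endFrame;
            public Mark(int startFrame, int endFrame) {
                this.startFrame = startFrame;
                this.endFrame = endFrame;
            }
        }

        // A segment opens when confidence rises above 'enter' (object
        // appearance) and closes when it falls below 'exit' (object
        // disappearance); enter > exit avoids flickering marks.
        public static List<Mark> mark(double[] confidence, double enter, double exit) {
            List<Mark> marks = new ArrayList<>();
            int start = -1;
            for (int f = 0; f < confidence.length; f++) {
                if (start < 0 && confidence[f] >= enter) {
                    start = f;                           // object appearance event
                } else if (start >= 0 && confidence[f] < exit) {
                    marks.add(new Mark(start, f - 1));   // object disappearance event
                    start = -1;
                }
            }
            if (start >= 0)                              // object still visible at end
                marks.add(new Mark(start, confidence.length - 1));
            return marks;
        }
    }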
The processes represented in figure 4 form the whole vision system that is to be implemented in the VMS. It should be borne in mind that this vision system is being built gradually, from the simplest possible version to the fully functional one. The whole process may therefore be divided into various areas according to the sophistication involved in their implementation. Three levels of artificial-vision accomplishment have been defined for the successive implementations of the VMS:
• Low level: sensing and pre-processing. The vision system has no intelligence.
• Medium level: the low level plus segmentation, description and recognition of individual objects. The vision system is able to extract, characterise and label objects using a neural-network approach.
• High level: builds on the medium level and tries to emulate cognition; it is necessarily vague and speculative, and constraints must be applied to cope with complexity.
The aim of the VMS may be described as event mining or event discovery. The most important event the scientists are interested in is the appearance of a creature in the field of view of the camera. The AUV is expected to be wandering and recording in deep-sea waters, where most of the time the only thing in front of the camera will be the blue background. Consequently, the first goal of the VMS is to detect when any object or creature comes into sight, and also when that entity exits the scene. The problem of detecting these events, called "object appearance" and "object disappearance", may be expressed in terms of object detection in the frames of the video footage. The simplest definition for these events, considering only one possible object at a time, is as follows:
• Object appearance: a sequence of frames with no object detected, followed by a sequence of frames with evidence of an object being present.
• Object disappearance: a sequence of frames showing no object, after an object appearance event.
Object detection is considered at different levels within the VMS artificial-vision process. At each level its meaning is somewhat different, becoming more complex as we reach the higher levels of artificial-vision accomplishment. As described in the preceding section, the VMS is being built progressively, and the concept of object detection is refined at every step. At the lowest level of the vision-system development there is no intelligence, and object detection is defined in terms of pre-processing, i.e. differentiating figure from background in a single frame. The medium level is characterised by the feature extractor and the SOM-based object recogniser; at this level object detection consists of determining the number of objects present in a period of time, and some features of the objects are also extracted. The most complex definition comes with the highest level of artificial-vision accomplishment, where the concept of object detection is embedded in the idea of a description of the scene. What is an object appearance at the low level could be just part of the background at the higher level, because objects are now identified, assigned a class and play a determined role in the dynamic scene; explicitly, the video is understood by the vision system.
Event detection in video is based on feature extraction from successive image frames captured by the onboard camera. The images received from the camera form a temporal sequence and cannot be considered individually in terms of event discovery. Therefore the outputs of the feature extractor, i.e. the feature vectors, are not treated as single isolated occurrences, but as samples of a multidimensional function that represents the change in the video sequence. The VMS will be able to deduce what kinds of events are taking place in the scene by watching the variation of the features over time as a stochastic process. In order to address the problem of the great variability of the video input, we propose to consider the stochastic process of feature variation as stationary in the short term. Thus, a model of the stochastic process can be calculated periodically. When the observed feature vectors do not fit the model calculated for the current period of time, the system considers that an event is taking place. The simplest way to calculate the change from one frame to the next is to apply the frame-subtraction technique: the differences between a frame and the following one are calculated by applying the subtraction operator. Nevertheless, this operator is also used in the pre-processing stage as part of more elaborate image-processing sequences. A sketch combining frame subtraction with a short-term stationary model is given below.
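The following minimal sketch shows one way to realise this idea under illustrative assumptions: grayscale frames as pixel arrays, a one-dimensional feature (the mean absolute inter-frame difference), a fixed-size window as the periodically recalculated model, and a three-standard-deviation test for "does not fit the model". None of these choices are the project's actual parameters.

    public class FrameChangeDetector {
        private final double[] window;   // recent difference values (short-term model)
        private int filled = 0, next = 0;

        public FrameChangeDetector(int windowSize) {
            window = new double[windowSize];
        }

        // Frame subtraction: mean absolute difference between two
        // same-sized grayscale frames.
        public static double meanAbsDiff(int[] prev, int[] curr) {
            long sum = 0;
            for (int i = 0; i < prev.length; i++)
                sum += Math.abs(curr[i] - prev[i]);
            return (double) sum / prev.length;
        }

        // Feeds one difference measurement; returns true when it deviates
        // from the short-term stationary model (here: more than three
        // standard deviations from the windowed mean), i.e. a candidate event.
        public boolean isEvent(double diff) {
            boolean event = false;
            if (filled == window.length) {            // model available
                double mean = 0;
                for (double v : window) mean += v;
                mean /= window.length;
                double var = 0;
                for (double v : window) var += (v - mean) * (v - mean);
                double std = Math.sqrt(var / window.length);
                event = Math.abs(diff - mean) > 3 * std + 1e-9;
            }
            window[next] = diff;                      // update the model window
            next = (next + 1) % window.length;
            if (filled < window.length) filled++;
            return event;
        }
    }

In use, meanAbsDiff would be computed for each consecutive frame pair and fed to isEvent; a run of true results would then be interpreted as an object appearance or disappearance by logic such as the VideoMarker sketch above.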
4. Adaptive Video Processing
Separating moving objects from the background is relatively easy when dealing with video footage taken in clear blue mid-water: a background subtractor can be used to isolate the figures from the background [8,9]. However, the AUV is expected to perform filming and inspection tasks not only in blue mid-water, but also in the vicinity of fish-farm sea cages and on the underside of a ship's hull. Filming close to the sea floor or the surface also adds complexity to the artificial-vision routines. Some measures have been taken in order to make the VMS design as robust as possible. The key approach being considered to deal with this environmental variability is an adaptive VMS controller, which uses a simple world model to improve accuracy. A given vision routine is often best suited to a particular situation; i.e. different image-processing or vision techniques should be applied depending on the AUV's location. Consequently, given a number of available video-processing routines, a world model can be used to select the most pertinent approach. Different contexts also determine different expectations about what we can find in the images: when the AUV is wandering close to the seafloor, the floor should appear in the video. The same kind of statement can be made about the surroundings wherever the craft is located, and the AUV vision system can use such knowledge to apply the most opportune approach. This 'situation' information can easily be obtained from vehicle sensors such as sonar and pressure depth sensors. A simplified world model may be implemented using IF-THEN rules. For example:
IF recording in deep mid-water AND ambient light is low THEN use the green-light-based image pre-processor.
IF noise in the image > noise-threshold THEN add a noise filter to the image pre-processor.
Using a world model to select and arrange video-processing routines may improve the accuracy and performance of the AUV vision system; inconsistent arrangements of image-processing tasks are avoided when some meta-knowledge about the world is applied. For example, noise filtering is only performed when the noise present in the images is high, and movement-based routines are only applied when there is some confidence that objects are moving in the scene. Knowledge about actions and events which have occurred in the video is also part of the world representation; thus the vision system may keep a kind of short-term memory which helps it build a more accurate and dynamic world model. More advanced modelling of the world involves issues like fish 3D geometry: knowledge about the kind of creature being tracked could assist the TTS in predicting the next movement of the tracked object. In short, the vision system can record simple, static descriptions of the world, deduced from contextual information or from events detected by the VMS, and use them to select appropriate vision routines. A sketch of such a rule-driven selector is given below.
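The following sketch implements the IF-THEN world-model idea for selecting pre-processing routines. The sensor-reading fields, threshold values and routine names are hypothetical illustrations, not the project's actual interfaces; the two example rules are the ones given in the text.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Predicate;

    public class VisionRuleEngine {
        // Simplified world state, filled from craft sensors (sonar,
        // pressure depth sensor) and image measurements.
        public static class WorldState {
            double depthMetres;
            double ambientLight;   // normalised 0..1 (assumed scale)
            double imageNoise;     // estimated noise level, 0..1 (assumed scale)
        }

        private static class Rule {
            final Predicate<WorldState> condition;   // the IF part
            final String routine;                    // the THEN part
            Rule(Predicate<WorldState> c, String r) { condition = c; routine = r; }
        }

        private final List<Rule> rules = new ArrayList<>();

        public void addRule(Predicate<WorldState> condition, String routine) {
            rules.add(new Rule(condition, routine));
        }

        // Returns the routines whose conditions hold in the current state.
        public List<String> selectRoutines(WorldState s) {
            List<String> selected = new ArrayList<>();
            for (Rule r : rules)
                if (r.condition.test(s)) selected.add(r.routine);
            return selected;
        }

        public static void main(String[] args) {
            VisionRuleEngine engine = new VisionRuleEngine();
            // IF recording in deep mid-water AND ambient light is low
            // THEN use the green-light-based image pre-processor.
            engine.addRule(s -> s.depthMetres > 200 && s.ambientLight < 0.2,
                           "green-light pre-processor");
            // IF noise in the image > noise-threshold THEN add a noise filter.
            engine.addRule(s -> s.imageNoise > 0.3, "noise filter");

            WorldState state = new WorldState();
            state.depthMetres = 500; state.ambientLight = 0.05; state.imageNoise = 0.1;
            System.out.println(engine.selectRoutines(state)); // [green-light pre-processor]
        }
    }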
5. Main Difficulties
Some particular problems must be faced when a vision system operates underwater; the following points are considered key issues (most aspects treated here also apply to conventional underwater filming).
5.1. Scene Illumination
Feature extraction is a critical part of the artificial vision process. The lack of contrast in the images complicates this procedure to a great extent. The main factor affecting the contrast of the captured images is the scene illumination.
Several illumination schemes are often used in robot vision to enhance the contrast of the images. For example, with a technique named backlighting, the contrast of figures in the image is increased by locating the objects between an artificial light source and the camera [10]. Unfortunately, most of these schemes cannot be used in the underwater environment, since artificial light sources can only be located on the craft. The sub-aqua environment has little in common with a typical industrial robot-arm workspace: the AUV is expected to wander freely, and light sources cannot be set in arbitrary positions. Poor illumination results in a deterioration of the performance or effectiveness of vision algorithms. In typical robot-vision scenarios, arbitrary lighting of the environment is often not acceptable; the lighting system is used to minimise the complexity of the resulting image, thereby increasing the information available for object detection and extraction. In contrast, because common lighting schemes cannot be applied underwater, the complexity of the vision algorithms used in this project must be increased to cope robustly with input characterised by low-contrast images, specular reflections, shadows and incidental singularities.
5.2. Marine Snow
Another problem also associated with illumination is the marine snow effect, produced when light is reflected off suspended particles [11,12]. According to the Academic Press Dictionary of Science and Technology, marine snow is "downward drifting particles of living and dead organisms, including some inorganic material; sometimes suspended and concentrated at density boundaries such as thermoclines". The larger marine snowflakes are over 0.5 mm across and constitute the major food source for organisms that live in the deeper abyssal waters. Marine snow can be found everywhere in the ocean and may become a serious impediment when trying to obtain good-quality images. Possible solutions will be investigated in order to minimise the negative effect of marine snow on the obtained video.
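As a simple illustration of the kind of contrast-enhancement operator the pre-processing stage could apply to the low-contrast imagery discussed in section 5.1, the following sketch implements global histogram equalisation for an 8-bit grayscale frame. This is a standard textbook operator offered only as an example, not the project's chosen technique.

    public class HistogramEqualiser {
        // Remaps grey levels so the output histogram is roughly flat,
        // stretching the contrast of a low-contrast frame.
        public static int[] equalise(int[] pixels) {
            int[] hist = new int[256];
            for (int p : pixels) hist[p]++;       // grey-level histogram

            // Cumulative distribution function of the histogram.
            int[] cdf = new int[256];
            int running = 0;
            for (int i = 0; i < 256; i++) { running += hist[i]; cdf[i] = running; }

            // First non-zero CDF value, used for normalisation.
            int cdfMin = 0;
            for (int i = 0; i < 256; i++) if (cdf[i] > 0) { cdfMin = cdf[i]; break; }

            // Standard equalisation mapping.
            int n = pixels.length;
            int[] out = new int[n];
            for (int i = 0; i < n; i++)
                out[i] = (int) Math.round(
                    255.0 * (cdf[pixels[i]] - cdfMin) / Math.max(1, n - cdfMin));
            return out;
        }
    }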
6. Development and Testing
The first version of the VMS has been built and works offline, i.e. the digital MPEG video file is collected after the craft has finished its mission. The video marking process is then performed on a computer running the Microsoft® Windows® operating system. The VMS is a program written in Java™ that uses an MPEG decoder to grab the image frames from the input video file. The Java™ Media Framework (JMF) API v.2.1.1a is employed to integrate the video decoder with the event-mining code we have developed; a sketch of the frame-grabbing idiom is given below. A window-based user interface has also been built using Java™ Foundation Classes (JFC)/Swing components, thus providing the marine researchers with a familiar and friendly user interface. During the development of the AUV vision system, serious emphasis is being placed on software engineering and project management (an ad hoc lightweight project management plan has been developed, based on the ESA Software Engineering Standards [13]). Since the video marking system is an experimental tool and will be subject to many changes during its lifecycle, we are designing the VMS software architecture for easy maintainability and reusability. Design patterns [14] are applied in our Java code in order to achieve the flexibility required by the adaptive algorithms that we propose to implement. The video marking system has been designed initially as a test bed for image-processing algorithms and high-level vision routines applied to underwater vision. Work has initially targeted the less arduous task of marking video footage from the mid water column, and will subsequently progress to the more challenging near-bottom and near-surface areas. The vision systems are being tested first on video footage generated in controlled pool environments, then on blue mid-water-column footage received from other research teams and filmed by the AUV system built at the University of Limerick. Finally, given success at these early stages, we aim to progress to working with surface and bottom footage.
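For illustration, the following sketch shows the standard JMF frame-grabbing idiom of the kind that can feed decoded frames into the event-mining code; the file name is a placeholder and the exact integration in the VMS may differ.

    import javax.media.Buffer;
    import javax.media.Manager;
    import javax.media.MediaLocator;
    import javax.media.Player;
    import javax.media.control.FrameGrabbingControl;
    import javax.media.format.VideoFormat;
    import javax.media.util.BufferToImage;
    import java.awt.Image;
    import java.io.File;

    public class MpegFrameGrabber {
        public static void main(String[] args) throws Exception {
            // Create a realised player for the mission's MPEG file
            // ("mission.mpg" is a placeholder path).
            Player player = Manager.createRealizedPlayer(
                    new MediaLocator(new File("mission.mpg").toURI().toURL()));
            player.start();

            // Obtain the frame-grabbing control; this may be null if the
            // renderer in use does not support frame grabbing.
            FrameGrabbingControl grabber = (FrameGrabbingControl)
                    player.getControl("javax.media.control.FrameGrabbingControl");

            // Grab the current frame and convert it to an AWT image for
            // the event-mining code to process.
            Buffer buffer = grabber.grabFrame();
            BufferToImage converter =
                    new BufferToImage((VideoFormat) buffer.getFormat());
            Image frame = converter.createImage(buffer);

            player.close();
        }
    }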
7. Conclusion and Future Work
Development of the VMS is well underway. The main goals at this stage of the VMS development are to:
1. Find out which combinations of image-processing operators work best when extracting features from underwater video footage.
2. Analyse the usefulness of the features for both event-mining and target-tracking tasks.
3. Develop an adaptive vision-routine selector able to build optimal feature vectors depending on the environment.
4. Integrate the modules described in 1, 2 and 3 above with a pattern recogniser based on Kohonen Self-Organising Maps.
5. Study the use of different neural networks to tackle the problem of image understanding.
An initial version of the TTS will also be developed following the same strategy as for the video marking system, i.e. an offline version will be built first in order to test the validity of the methods applied.
References
[1] Brooks, R.A., "A Robust Layered Control System for a Mobile Robot". IEEE Journal of Robotics and Automation, Vol. RA-2, No. 1, 1986, pp. 14-23.
[2] Toal, D., Flanagan, C., Jones, C., Strunz, B., "Subsumption Architecture for the Control of Robots". IMC-13, Limerick, 1996.
[3] Flanagan, C., Toal, D., Strunz, B., "Subsumption Control of a Mobile Robot". Polymodel 16, Sunderland, 1995.
[4] Gonzalez, R. and Woods, R., "Digital Image Processing". Addison-Wesley, 1992.
[5] Hartley, E., Lindsay, A., Parkes, A., "Signal Processing: Does it mean anything?". Workshop on Computational Semiotics for New Media, June 2000.
[6] Kohonen, T., "Self-Organization and Associative Memory". Springer-Verlag, 3rd Ed., 1989.
[7] Fu, K.S., Gonzalez, R., Lee, C.S.G., "Robotics: Control, Sensing, Vision and Intelligence". McGraw-Hill, 1987.
[8] Canny, J., "A Computational Approach to Edge Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, Nov. 1986.
[9] Elgammal, A., Harwood, D., Davis, L., "Non-parametric Model for Background Subtraction". 6th European Conference on Computer Vision, Dublin, June/July 2000.
[10] Aleksander, I., "Artificial Vision for Robots". Kogan Page Ltd., 1983.
[11] Wang, H.H., Rock, S.M., Lee, M.J., "OTTER: The Design and Development of an Intelligent Underwater Robot". Journal of Autonomous Robots, 3(2-3):297-320, Kluwer Academic Publishers, June-July 1996.
[12] Rife, J. and Rock, S.M., "Visual Tracking of Jellyfish in Situ". Proceedings of the 2001 International Conference on Image Processing, IEEE, 2001.
[13] European Space Agency Board for Software Standardisation and Control, "ESA Software Engineering Standards", Issue 2, Feb. 1991.
[14] Gamma, E., Helm, R., Johnson, R., Vlissides, J., "Design Patterns: Elements of Reusable Object-Oriented Software". Addison-Wesley, 1994.