
2004 IEEE Intelligent Vehicles Symposium University of Parma Parma, Italy • June 14-17, 2004

Ballast 3D Reconstruction by a Matching Pursuit Based Stereo Matcher

Anna Labarile, Ettore Stella, Nicola Ancona, Arcangelo Distante

Institute on Intelligent System for Automation - C.N.R. (Bari) - Italy

Abstract-- In recent years railway associations have spent considerable sums on checking the railway infrastructure. This is a field in which periodic surface inspection of the rolling plane can help the operator to prevent critical situations, so maintenance and monitoring of the infrastructure are important to railway associations. In this paper we describe a visual system able to detect defects where the ballast bed shows anomalous behaviour. The proposed system is a stereo rig, based on two high resolution line scanner TV cameras, installed under the train. To detect defects in the ballast layer we developed stereo vision techniques that use the Matching Pursuit method to extract features from the images and a similarity function to establish the correspondence between the left and right images. Visual inspection can help to increase the quality of control and to reduce maintenance costs.

I. INTRODUCTION

Stereo matching is one of the most active research areas in computer vision. Recovering three-dimensional information from images is the fundamental goal of stereo techniques. The problem of recovering the missing dimension (depth) from a set of images is essentially a correspondence problem: given a point in the first image, find the corresponding point in the other images. Over the years numerous matching algorithms for passive stereo have been proposed; they can be roughly classified into two main groups:

1. Feature Based. These algorithms extract features of interest from the images (edges, corners, lines) and match them in two or more views. Usually, these methods yield only very sparse depth maps.

2. Area Based. These algorithms match the values of neighbouring pixels within windows between images by computing the correlation or the sum of squared differences (SSD). The selection of an appropriate window size is critical to achieving a smooth and detailed disparity map. The window must be large enough to include sufficient intensity variation for reliable matching, but small enough to avoid the effects of projective distortion. The optimal choice of window size depends on the local amount of variation in texture and disparity [1], [2], [3], [4], [5], [6].

In this paper, we present a new area based method that performs better than standard techniques (SSD, correlation, etc.) and produces a dense depth map. The novel stereo technique is based on Matching Pursuit (MP) [11].
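The standard area based SSD matching mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ssd_disparity`, its parameters and the synthetic test images are assumptions made for the example, and the search runs along a horizontal scanline of a rectified image pair.

```python
import numpy as np

def ssd_disparity(left, right, row, col, half, d_min, d_max):
    """Return (shift, cost) of the right-image window that minimises the SSD.

    The patch is centred at (row, col) in the left image; candidate windows
    are centred at (row, col + d) in the right image for integer d in
    [d_min, d_max]. The caller must keep the patch inside the left image.
    """
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_d, best_cost = None, np.inf
    for d in range(d_min, d_max + 1):
        c = col + d
        # Skip candidate windows that fall outside the right image.
        if c - half < 0 or c + half + 1 > right.shape[1]:
            continue
        window = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        cost = np.sum((patch - window) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost

# Synthetic pair: the right image is the left image shifted by 3 columns.
rng = np.random.default_rng(0)
left_img = rng.random((20, 40))
right_img = np.roll(left_img, 3, axis=1)
best_shift, best_cost = ssd_disparity(left_img, right_img, 10, 20, 3, -5, 5)
```

The `half` parameter controls the window size discussed above: a larger value averages over more texture, while a smaller value is less affected by projective distortion.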

0-7803-8310-9/04/$20.00 © 2004 IEEE


In our approach, we consider a fixed patch and apply the matching pursuit method to it. The result is a vector of coefficients that is used for stereo matching. More precisely, we compute the coefficients for every fixed patch in the left image and for a fixed window shifted in the right image. This second window moves in the right image by integer increments along the epipolar line, and an array of coefficients is generated for each increment. A correspondence is valid if and only if its similarity measure is the best. Two assumptions about the matching constraints are explicitly stated: uniqueness and continuity [7], [8]. Uniqueness: a given pixel or feature of one image can match no more than one pixel or feature in the other image. Continuity: the cohesiveness of matter suggests that the disparity of the matches should vary smoothly everywhere in the image. The algorithm has been tested on synthetic and real images, and the quality of the disparity maps obtained on images with large intensity variation is reasonably good. The results have been compared with those of standard SSD methods. The paper is organized as follows. In Section II, we present an overview of the system. In Section III, we present our model of stereo matching. Finally, in Section IV we present the experimental results and compare them with those of SSD methods.

II. SYSTEM OVERVIEW

Every year railway associations throughout the world spend considerable sums on checking railway infrastructure such as stations, overhead lines, and track. The use of vision-based techniques in railway infrastructure monitoring has assumed great importance. Currently, the procedure adopted by the main railway companies consists in stopping train traffic over the route under analysis while human operators walk along the tracks looking for defects. This procedure can no longer be used because of its inefficiency: stopping traffic causes many problems in train scheduling and, above all, visual inspection depends strongly on the experience of the human operators. Recently, therefore, some of the main railway companies have introduced diagnostic cars equipped with sensors of different kinds. At first, these cars had only data acquisition and recording functions; afterwards, a human operator had to analyse the recorded data offline and search for anomalies. Now, these cars operate autonomously by means of an analysis system able to detect defects, and they require only human supervision.

In this paper a visual system able to perform the 3D reconstruction of the ballast is described. This functionality is important in railway maintenance because it makes it possible to detect places where the ballast bed shows anomalous behaviour. Such situations force the rails into an incorrect attitude, so that they can easily break. Today, the ballast layer in all main line tracks consists of crushed stone or slag in chips of specified size. The irregular shape of the fragments ensures a porous mass for good drainage and at the same time permits interlocking, so that the weight is evenly distributed over the roadbed. Detecting places where the ballast bed is not homogeneous indicates that the deep ballast layer has lost its consistency. These places then need further analysis, carried out with Ground Penetrating Radar.

The state of the art in 3D ballast reconstruction relies on active systems based on laser triangulation, which have the major disadvantage of being slow. This is due to the frame rate of area scan TV cameras, which is inversely proportional to the spatial resolution. For example, the most widely used TV camera is the CA-D6 by DALSA, which provides a frame rate of 262 fps with a spatial resolution of 510x516 pixels. With this TV camera it is possible to operate at a speed of 200 km/h, but the pixel size allows a 3D precision of only a few centimetres every 20 centimetres. Moreover, the field of view is only about 300 mm on the rail. Maintenance procedures need a larger field of view (the whole ballast bed), better precision and sometimes a higher operating speed.
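The sampling interval of about 20 centimetres quoted above can be checked with a short calculation. The sketch below assumes that each camera frame yields one laser profile, which the text does not state explicitly; the function name `profile_spacing_m` is hypothetical.

```python
def profile_spacing_m(speed_kmh: float, frame_rate_fps: float) -> float:
    """Distance travelled by the vehicle between two consecutive frames, in metres."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    # Assumption: one laser profile is acquired per camera frame.
    return speed_m_per_s / frame_rate_fps

# Figures quoted in the text: 200 km/h and 262 fps.
spacing = profile_spacing_m(200.0, 262.0)
print(f"{spacing * 100:.1f} cm between consecutive profiles")  # prints 21.2 cm
```

Under this assumption the spacing is roughly 21 cm, consistent with the figure of one measurement every 20 centimetres.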
Our proposed system is a stereo rig, based on two high resolution line scanner TV cameras (2048 pixels/line), installed under the car. The field of view can reach 1200 mm. The trigger resolution for line acquisition is 1 mm.

III. 3D RECONSTRUCTION

The 3D reconstruction is based on a new area based stereo matching technique. We have developed a model of stereo matching (see Fig. 1) that is composed of three steps.

[Figure: block diagram. Block labels: Acquisition Image, Left Image, Right Image, Compute Coefficients, Extract Patch, Matching Patch, Apply Method of Matching Pursuit, Image Representations with Overcomplete Dictionaries, Compute Disparity, Extract Best Measure, Disparity Map, Method of Stereo Matching.]

Fig. 1. Model of Stereo Matching

In the first step we acquire two images, left and right, from two line scanner TV cameras C1 and C2 (see Fig. 2).


Fig. 2. Stereo Vision Structure

The depth of a point P can be calculated given two images taken from known cameras C1 and C2 and the corresponding points p1, of coordinates $(u_l, v_l)$, and p2, of coordinates $(u_r, v_r)$, within those images, which are the projections of P. The choice of the distance between C1 and C2 (the baseline) is very important: short baselines simplify the matching process but produce uncertain disparity results, while long baselines produce precise disparity results but complicate the matching process. We assume that the baseline is short. In this work all our images are taken on a linear path with the optical axis perpendicular to the camera displacement. This is the parallel camera case: the two retinal planes, horizontally displaced, are coplanar in space and have the same focal length. In this case, the epipolar lines coincide with the horizontal scanlines, so the disparity value d is defined as:

$d = v_r - v_l, \quad u_l = u_r \qquad (1)$

In the second step we compute the coefficients for every fixed patch in the left image while shifting a fixed window in the right image. This window is moved in the right image by integer increments $c \in [c_{min}, c_{max}]$ along the epipolar line, and an array of coefficients is generated for each increment. To compute the coefficients from the extracted patch we use the matching pursuit method [9]. The method requires the system of functions $(\psi_j)_{j \in J}$ to be complete, but not to be a basis. In other words, the system spans the whole space without the constraint of linear independence among its elements. Such a system is, in general, constituted by many more elements than a basis, and for this reason it is called an overcomplete or redundant system of functions. The main difference with respect to a basis is that there is no longer a unique representation for each element of the space; on the contrary, there are many possible representations.
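A toy example may help to make the non-uniqueness concrete. The sketch below assumes a hypothetical dictionary of three unit-norm atoms in the plane, not the dictionary used in the paper, and shows two different coefficient vectors that represent the same signal. It also includes a minimal greedy matching pursuit routine (the name `matching_pursuit` and its parameters are illustrative) that picks one representation by repeatedly selecting the atom most correlated with the residual.

```python
import numpy as np

# Three unit-norm atoms spanning the plane: more atoms than dimensions,
# so the system is overcomplete and its elements are linearly dependent.
atoms = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0 / np.sqrt(2.0), 1.0 / np.sqrt(2.0)],
])
signal = np.array([1.0, 1.0])

# Two different coefficient vectors that both reconstruct the signal.
coeffs_a = np.array([1.0, 1.0, 0.0])
coeffs_b = np.array([0.0, 0.0, np.sqrt(2.0)])

def matching_pursuit(x, dictionary, n_iter):
    """Greedy matching pursuit over the rows of `dictionary` (unit-norm atoms)."""
    residual = np.asarray(x, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(n_iter):
        projections = dictionary @ residual          # inner product with every atom
        k = int(np.argmax(np.abs(projections)))      # atom that best reduces the residual
        coeffs[k] += projections[k]
        residual -= projections[k] * dictionary[k]
    return coeffs, residual

mp_coeffs, mp_residual = matching_pursuit(signal, atoms, n_iter=1)
```

In this toy case a single iteration selects the diagonal atom and leaves an essentially zero residual, so the greedy choice coincides with the one-term representation `coeffs_b`.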
The existence of multiple representations is due to the fact that the elements of the system are linearly dependent. The MP method is a nonlinear algorithm that decomposes any signal into a linear expansion of waveforms that belong, in general, to an overcomplete dictionary of functions. It is an iterative procedure which, at each step, selects the atom of the dictionary that best reduces the residual between the current approximation of the signal and the signal itself. Thus, given an overcomplete dictionary $\psi$ consisting of n Gabor atoms [12] and an input image I, MP is a greedy technique that approximates I by an image $\hat{I}$:

$\hat{I} = \sum_{i=1}^{m} c_i \psi_i$

with m