A Neural Network Approach to Camera Calibration for Binocular-Vision

An Artificial Neural Network Approach to Camera Calibration and 3D-World Reconstruction for Stereovision

Sohaib A. Khan and Qurban Memon
Faculty of Electronic Engineering, GIK Institute of Engineering Sciences and Technology, Topi, 23460, Distt. Swabi, N.W.F.P, Pakistan
Fax. 92-938-71878, email: [email protected]

ABSTRACT

Stereo-pair images obtained from two cameras can be used to compute 3D-world coordinates of a point using triangulation. However, to apply this method, camera calibration parameters for each camera need to be experimentally obtained. Camera calibration is a rigorous experimental procedure in which typically twelve parameters are to be evaluated for each camera. The general camera model is often such that the system becomes non-linear and requires good initial estimates to converge to a solution. We propose that for stereo vision applications in which real-world coordinates are to be evaluated, artificial neural networks be used to train the system such that the need for camera calibration is eliminated. The training set for our neural network comprises a variety of stereo-pair images and corresponding 3D-world coordinates. We present the results obtained on our prototype mobile robot that employs two cameras as its sole sensors and navigates through simple, regular obstacles in a high contrast environment. We observe that the percentage errors obtained from our setup are comparable to those obtained through standard camera calibration techniques, and that the system is accurate enough for most machine-vision applications.

Keywords: Binocular-vision, Camera calibration, Stereovision, Artificial neural networks, Stereo reconstruction, Autonomous mobile robot.

I. INTRODUCTION

Camera calibration is considered an important issue in computer vision. Accurate calibration of cameras is especially crucial for applications that involve quantitative measurements, depth from stereoscopy or motion from images. The problem of camera calibration is to compute the camera extrinsic and intrinsic parameters. The extrinsic parameters of a camera indicate the position and the orientation of the camera with respect to the coordinate system, and the intrinsic parameters characterize the inherent properties of the camera optics, including the focal length, the image center, the image scaling factor, and the lens distortion coefficients. The number of parameters to be evaluated depends on the camera model being utilized. Typically, 12 parameters are determined for each camera. The problem of finding these parameters is, in general, a non-linear problem (due to lens distortion) and requires good initial estimates and an iterative solution.

The techniques found in literature for camera calibration can be broadly divided into three types: linear methods, non-linear methods and two-step techniques. Linear methods assume a simple pinhole camera model and incorporate no distortion effects. The algorithm is non-iterative and therefore very fast [1]-[4]. The limitation in this case, however, is that camera distortion cannot be incorporated and therefore lens distortion effects cannot be corrected. The problem of lens distortion is significant in most off-the-shelf CCD cameras. In non-linear techniques, first the relationship between parameters is established and then an iterative solution is found by minimizing some error term. Many classical calibration techniques fall in this category [5]-[7]. Direct Linear Transformation (DLT) introduced by [1] has also been extended to incorporate distortion parameters. The advantage of such techniques is that the camera model can be very general to accommodate different types of cameras. However, for this type of iterative solution, a good initial guess is essential, otherwise the iterations may not converge to a solution. Two-step techniques involve a direct solution of some camera parameters and an iterative solution for the other parameters. Iterative solution is also used to reduce the errors in the direct solution. This is the most common and current approach to the problem [8]-[10].

Computing world coordinates from stereo images requires first matching the images obtained from two different cameras to determine disparities (difference in positions of corresponding features) and then transforming these into world distances. The problem has been called the object pose estimation problem in computer vision literature [6] or simply the stereo reconstruction problem. The process of matching is essential for finding world coordinates from a stereo image. The matched points are used to find world coordinates using triangulation [11]. In this process, all the camera calibration parameters appear as constants in the equation. Hence, camera calibration is essential to compute world coordinates from stereo-images.

In the next section, we present a simple and unified approach to camera calibration and stereo reconstruction using neural networks. In our approach, instead of calibrating both cameras and then using the triangulation procedure, we directly train a neural network to compute world coordinates from matched pairs of image points. The advantage that we get is that the approach is not dependent on the camera model, and will work for any type of camera. In Section III, we discuss the results obtained by our approach, when tested on our prototype mobile robot system. In Section IV we present conclusions, followed by references in Section V.

II. ARTIFICIAL NEURAL NETWORK MODEL FOR 3D-WORLD RECONSTRUCTION

Fig. 1: Artificial neural network model used for the problem. (Input layer: x1, y1, x2, y2; hidden layer; output layer: X, Y, Z.)

Artificial Neural Networks (ANNs) are being applied in many scientific disciplines to solve a variety of problems in pattern recognition, prediction, optimization, associative memory and control. None of the conventional approaches to these problems is flexible enough to perform well outside its domain. ANNs provide exciting alternatives and many applications could benefit from them [12].

In our problem, we propose a multi-layer ANN model because the camera calibration problem is a non-linear problem and cannot be solved with a single layer network [13]. The best architecture and algorithm for the problem can only be evaluated by experimentation and there are no fixed rules to determine the ideal network model for a problem. However, variations of architecture and algorithm affect only the convergence time of the solution.

We have used the network model in Fig. 1 for our simulations. The model consists of 4 input neurons, 8 hidden neurons and 3 output neurons. The input neurons correspond to the image coordinates of matched points found on the stereo images (x1, y1) and (x2, y2). These points are generated by the same world point on both images, and form the input of the neural network. The output neurons correspond to the actual world coordinates of the point (X, Y, Z) which is mapped as (x1, y1) and (x2, y2) on the two images. We train the network on a range of inputs and outputs, such that the network could, after training, give the world coordinates for any matched pair of points. The implementation details and the results are given in the next section.

The approach requires training of the network for a set of matched image points whose corresponding world point is known. For this purpose, we use an object similar to that used by [7], consisting of a grid of points placed at fixed intervals. This chart (shown in Fig. 2) is placed in front of the two cameras at known distances from an arbitrary world origin. It should be noted that the choice of the world origin in this approach is arbitrary and the cameras need not be fixed at some precise location relative to the world origin. We capture stereo images of the chart at various distances from the world origin, noting the value of the world coordinates of the chart at each instance. The set of matched points and the corresponding world coordinates thus obtained form the training set of our artificial neural network. Once the network is trained, we present it with arbitrary matched points and it directly gives us the world coordinates corresponding to the matched pair.

Fig. 2: Calibration chart at 50 cm.

It should be noted that this approach is different from conventional camera calibration techniques in the sense that no extrinsic or intrinsic camera parameters are found for any of the cameras. Instead, the system is trained such that it learns to directly find the world coordinates of objects. The experimental procedure required is almost the same as that of conventional approaches to the problem. However, the approach is essentially very simple, and yields comparable results.

The advantage of this approach lies mainly in its simplicity and generality. The technique will work for any type of camera and accurate camera modeling is not an issue. The cameras need not be fixed at any precise location with respect to the world origin, nor do their axes have to be aligned. The precise positioning of the chart with respect to the camera is also not required, as is the case in some approaches to the problem (e.g. [7]). The calibration chart only needs to be at known positions with respect to a world origin.

It should be noted that this approach of camera calibration is only valid for stereo vision systems and is not applicable to monocular cameras. The approach is particularly suited for autonomous mobile robots that employ stereo vision. It is novel in the sense that it is based on training rather than computing explicit values of camera parameters. However, the training set presented to the artificial neural network must be a good enough representative of the range of possible scenarios that the system might encounter during operation.
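The mapping described in this section, from a matched pair (x1, y1), (x2, y2) to world coordinates (X, Y, Z) through 4 input, 8 hidden and 3 output neurons, can be illustrated with a minimal forward-pass sketch. Everything beyond the 4-8-3 layout is an assumption made for illustration: the weights are drawn uniformly at random (the network here is untrained), the log-sigmoid is applied at both the hidden and the output layer, and the names `logsig`, `init_layer`, `layer_forward` and `forward` are hypothetical.

```python
import math
import random

def logsig(v):
    """Log-sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def init_layer(n_in, n_out, rng):
    """Random weights and biases (plain uniform initialization, an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def layer_forward(inputs, weights, biases):
    """Weighted sum plus bias for each neuron, passed through the log-sigmoid."""
    return [logsig(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(net, stereo_point):
    """Map a normalized (x1, y1, x2, y2) to a normalized (X, Y, Z) estimate."""
    hidden = layer_forward(stereo_point, *net["hidden"])
    return layer_forward(hidden, *net["output"])

rng = random.Random(0)
net = {"hidden": init_layer(4, 8, rng),   # 4 inputs  -> 8 hidden neurons
       "output": init_layer(8, 3, rng)}   # 8 hidden  -> 3 outputs (X, Y, Z)

# Hypothetical matched pair, already scaled to the range 0..1.
xyz = forward(net, [0.42, 0.55, 0.37, 0.56])
print(xyz)  # three values in (0, 1); meaningless until the network is trained
```

Because the output neurons also use a log-sigmoid in this sketch, the estimates lie between 0 and 1 and would have to be scaled back to world units after training.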

III. EXPERIMENTAL DETAILS AND RESULTS

We took two cameras mounted on our prototype mobile robot. We kept the distance between the two cameras at approximately 7 cm and did not align their optical axes precisely. Next we made a calibration chart consisting of a grid of lines 5 cm apart (as shown in Fig. 2). This chart was placed in front of the cameras at various distances from the world origin and its image captured from both cameras, without moving the cameras. The chart was placed at distances that were in the range of interest of our robot. We defined the range of interest of the robot to be within 50 cm to 140 cm in front of the robot, and captured images in this range at increments of 15 cm. We felt that our robot should be able to correctly gauge the distance of obstacles that are present in this range.

After capturing the images of the calibration chart, we matched these images to obtain stereo pair points. For each stereo pair, we also knew the actual world distance, since we had placed the chart at measured distances with respect to a world origin. At no time in our experiment did we have to measure the exact distance of the cameras with respect to the world origin, as would have been necessary in some calibration approaches found in literature. A set of 400 stereo pairs and their actual 3D-world distances formed our training set.

We trained our neural network on this set of 400 stereo pairs. The training was done by presenting the stereo-pair points to the input of the network and presenting the 3D-world coordinates at the output. The training was done using the Back Propagation Algorithm, with Nguyen-Widrow initialization of weights and an adaptive learning model. We used a log-sigmoid activation function for both inputs and weights. All inputs were normalized between 0 and 1 before presenting them to the network. The target outputs were also normalized between 0 and 1. Such normalization is necessary to obtain quicker learning. We experimented with various network architectures, but observed very little change in error by using alternate architectures.

To check the accuracy of the trained network, we presented the network with stereo-pair points that were not included in the training set, but were from within our range of interest of distance. We had a set of such points whose corresponding 3D-world coordinates were already known to us. We computed the average percentage error that we got from these points. This error represented the true learning of the network, since we had not included these points in the training set. The mean percentage error observed during training is shown in Fig. 3.

Fig. 3: Mean percentage error in computing 3D coordinates as a function of number of epochs. (Plot: percentage error, 0 to 20, against training epochs on a logarithmic axis from 10^2 to 10^6.)

From Fig. 3 it can be seen that as the training epochs are increased, the error in the computations done by the network is decreased. The error became less than 5% after 40,000 epochs of training. After 100,000 epochs of training, the mean error in computing 3D coordinates of a point became 4.33%. It must be appreciated that this error contains not only the error that is contributed by the network, but also the quantization errors of the camera. Since we did not use any sub-pixel measuring technique to find the image coordinates of a point, we should have a significant contribution of quantization error.

Once our network was trained in our range of interest, we also presented it with points that were outside our range of interest. We had originally trained the network for points within 140 cm of the world origin. Now we presented the network with points whose distance from the world origin ranged from 150 cm to 200 cm. These results are shown in Fig. 4. We observed a linear increase in percentage error as the object moved further away from our training limit.

Fig. 4: Percentage error in computation of Z-coordinate beyond the training range. (Results taken after training the network for 50000 epochs.) (Plot: percentage error, 10 to 35, against distance from world origin, 150 cm to 200 cm; training range limit: 140 cm.)

It should be noted that we placed minimal constraints on the type of camera required, the resolution and quality of images and the accuracy of measurements.
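The procedure of this section (normalize inputs and targets to the range 0 to 1, train by back propagation, then check points that were not in the training set) can be sketched in a few lines. This is a sketch under stated assumptions, not the implementation used in the experiments: the stereo pairs are synthetic and come from a hypothetical ideal rig (parallel pinhole cameras, unit focal length, 7 cm baseline), plain uniform random weights and a fixed learning rate stand in for Nguyen-Widrow initialization and adaptive learning, the percentage error is computed on the Z coordinate only, and the number of epochs is kept very small so that the sketch runs quickly. All function names are hypothetical.

```python
import math
import random

def logsig(v):
    """Log-sigmoid activation, as named in the text."""
    return 1.0 / (1.0 + math.exp(-v))

def make_sample(rng):
    """Synthetic stand-in for one chart point (hypothetical ideal rig):
    parallel pinhole cameras, unit focal length, 7 cm baseline."""
    X, Y = rng.uniform(-20.0, 20.0), rng.uniform(-20.0, 20.0)
    Z = rng.uniform(50.0, 140.0)               # range of interest, in cm
    return [(X + 3.5) / Z, Y / Z, (X - 3.5) / Z, Y / Z], [X, Y, Z]

def fit_minmax(rows):
    """Per-column minimum and maximum, used for 0..1 normalization."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def normalize(row, lo, hi):
    return [(v - a) / (b - a) for v, a, b in zip(row, lo, hi)]

def denormalize(row, lo, hi):
    return [v * (b - a) + a for v, a, b in zip(row, lo, hi)]

def init_net(rng, sizes=(4, 8, 3)):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(net, x):
    """Return the activations of every layer, input first."""
    acts = [list(x)]
    for layer in net:
        prev = acts[-1] + [1.0]                # constant input for the bias
        acts.append([logsig(sum(w * a for w, a in zip(row, prev)))
                     for row in layer])
    return acts

def train_step(net, x, target, lr):
    """One online back-propagation update on the squared error."""
    acts = forward(net, x)
    delta = [(y - t) * y * (1.0 - y) for y, t in zip(acts[-1], target)]
    for k in range(len(net) - 1, -1, -1):
        prev = acts[k] + [1.0]
        back = None
        if k > 0:                              # uses the pre-update weights
            back = [sum(net[k][j][i] * delta[j] for j in range(len(delta)))
                    * acts[k][i] * (1.0 - acts[k][i])
                    for i in range(len(acts[k]))]
        for j, row in enumerate(net[k]):
            for i in range(len(row)):
                row[i] -= lr * delta[j] * prev[i]
        delta = back

def mse(net, inputs, targets):
    """Mean squared error in normalized units."""
    total = 0.0
    for x, t in zip(inputs, targets):
        total += sum((y - v) ** 2 for y, v in zip(forward(net, x)[-1], t))
    return total / len(inputs)

def mean_pct_error_z(net, inputs, targets, lo, hi):
    """Mean percentage error of the de-normalized Z estimate."""
    total = 0.0
    for x, t in zip(inputs, targets):
        z_pred = denormalize(forward(net, x)[-1], lo, hi)[2]
        z_true = denormalize(t, lo, hi)[2]
        total += abs(z_pred - z_true) / z_true * 100.0
    return total / len(inputs)

rng = random.Random(1)
raw = [make_sample(rng) for _ in range(80)]
train_raw, test_raw = raw[:60], raw[60:]       # 20 held-out points for checking
in_lo, in_hi = fit_minmax([p for p, _ in train_raw])
out_lo, out_hi = fit_minmax([w for _, w in train_raw])
train = [(normalize(p, in_lo, in_hi), normalize(w, out_lo, out_hi))
         for p, w in train_raw]
test = [(normalize(p, in_lo, in_hi), normalize(w, out_lo, out_hi))
        for p, w in test_raw]

net = init_net(rng)
mse_before = mse(net, *zip(*train))
for epoch in range(200):                       # far fewer epochs than in the text
    for x, t in train:
        train_step(net, x, t, lr=0.5)
mse_after = mse(net, *zip(*train))
pct_z = mean_pct_error_z(net, *zip(*test), out_lo, out_hi)
print(f"training MSE {mse_before:.4f} -> {mse_after:.4f}; "
      f"held-out Z error {pct_z:.1f}%")
```

The held-out percentage printed by this sketch says nothing about the accuracy reported above, since both the data and the training schedule differ; it only shows where each step of the procedure fits.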

IV. CONCLUSION

In this paper, we have presented a unified approach to camera calibration and 3D-world reconstruction for stereovision. We used an artificial neural network to train the system such that when the system is presented with a matched pair of points, it automatically computes the world coordinates of the corresponding object point. The approach differs from conventional approaches to the problem that appear in computer vision literature in the sense that the cameras are never actually calibrated, and the network is trained to compute the correct world coordinates from a matched pair of points. The approach is simple in concept, independent of the camera model used and the quality of image obtained, and yields very good results when applied to a prototype autonomous mobile robot using stereovision.

V. REFERENCES

1. Y. I. Abdel-Aziz and H. M. Karara, "Direct linear transformation into object space coordinates in close-range photogrammetry", Proc. Symp. Close-Range Photogrammetry (Urbana IL), Jan. 1971, pp. 1-18.
2. K. W. Wong, "Mathematical formulation and digital analysis in close range photogrammetry", Photogrammetric Eng. Remote Sensing, vol. 41, no. 11, Nov. 1975, pp. 1355-1373.
3. S. Ganapathy, "Decomposition of transformation matrices for robot vision", Proc. IEEE Int. Conf. Robotics Automat. (Atlanta), Mar. 1984, pp. 130-139.
4. O. D. Faugeras and G. Toscani, "Calibration problem for stereo", Proc. Int. Comput. Vision Patt. Recogn. (Miami Beach, FL), June 1986, pp. 15-20.
5. D. C. Brown, "Decentering distortion of lenses", Photogrammetric Eng. Remote Sensing, May 1966, pp. 444-462.
6. R. M. Haralick and L. G. Shapiro, "Computer and Robot Vision", vol. 2, Addison-Wesley Publishing Company, 1993, pp. 125-178.
7. Y. Nomura, M. Sagara, H. Naruse, and A. Ide, "Simple Calibration Algorithm for High-Distortion-Lens Camera", IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-14, no. 11, Nov. 1992, pp. 1095-1099.
8. R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE J. Robotics Automat., vol. RA-3, no. 4, Aug. 1987, pp. 323-344.
9. R. K. Lenz and R. Y. Tsai, "Techniques for calibration of the scale factor and image center for high accuracy 3D machine vision metrology", IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-10, no. 5, Sept. 1988, pp. 713-720.
10. J. Weng, P. Cohen and M. Herniou, "Camera Calibration with Distortion Models and Accuracy Evaluation", IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-14, no. 10, Oct. 1992, pp. 965-980.
11. R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Addison-Wesley Publishing Company, 1992, pp. 56-71.
12. A. K. Jain, J. Mao, K. M. Mohiuddin, "Artificial Neural Networks: A Tutorial", IEEE Computer Magazine, Mar. 1996, pp. 31-44.
13. L. Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms and Applications", Prentice Hall, Inc., 1994, pp. 289-330.