An Artificial Neural Network Approach to Camera Calibration and 3D-World Reconstruction for Stereovision

Sohaib A. Khan and Qurban Memon
Faculty of Electronic Engineering, GIK Institute of Engineering Sciences and Technology, Topi, 23460, Distt. Swabi, N.W.F.P, Pakistan
Fax. 92-938-71878

ABSTRACT
Stereo-pair images obtained from two cameras can be used to compute the 3D-world coordinates of a point using triangulation. However, to apply this method, the calibration parameters of each camera need to be obtained experimentally. Camera calibration is a rigorous experimental procedure in which typically twelve parameters are to be evaluated for each camera. The general camera model is often such that the system becomes non-linear and requires good initial estimates to converge to a solution. We propose that for stereo vision applications in which real-world coordinates are to be evaluated, artificial neural networks be used to train the system such that the need for camera calibration is eliminated. The training set for our neural network comprises a variety of stereo-pair images and corresponding 3D-world coordinates. We present the results obtained on our prototype mobile robot that employs two cameras as its sole sensors and navigates through simple, regular obstacles in a high-contrast environment. We observe that the percentage errors obtained from our setup are comparable to those obtained through standard camera calibration techniques, and that the system is accurate enough for most machine-vision applications.

Keywords: Binocular-vision, Camera calibration, Artificial neural networks, Stereo reconstruction, Stereovision, Autonomous mobile robot.
I. INTRODUCTION
Camera calibration is considered an important issue in computer vision. Accurate calibration of cameras is especially crucial for applications that involve quantitative measurements, such as depth from stereoscopy or motion from images. The problem of camera calibration is to compute the camera extrinsic and intrinsic parameters. The extrinsic parameters of a camera indicate the position and the orientation of the camera with respect to the world coordinate system, and the intrinsic parameters characterize the inherent properties of the camera optics, including the focal length, the image center, the image scaling factor, and the lens distortion coefficients. The number of parameters to be evaluated depends on the camera model being utilized. Typically, 12 parameters are determined for each camera. The problem of finding these parameters is, in general, a non-linear problem (due to lens distortion) and requires good initial estimates and an iterative solution.

The techniques found in the literature for camera calibration can be broadly divided into three types: linear methods, non-linear methods and two-step techniques. Linear methods assume a simple pinhole camera model and incorporate no distortion effects. The algorithm is non-iterative and therefore very fast [1]-[4]. The limitation in this case, however, is that camera distortion cannot be incorporated, and therefore lens distortion effects cannot be corrected. The problem of lens distortion is significant in most off-the-shelf CCD cameras. In non-linear techniques, the relationship between the parameters is first established and then an iterative solution is found by minimizing some error term. Many classical calibration techniques fall in this category [5]-[7]. The Direct Linear Transformation (DLT) introduced in [1] has also been extended to incorporate distortion parameters. The advantage of such techniques is that the camera model can be very general and can accommodate different types of cameras. However, for this type of iterative solution a good initial guess is essential; otherwise the iterations may not converge to a solution. Two-step techniques involve a direct solution for some camera parameters and an iterative solution for the other parameters. The iterative solution is also used to reduce the errors in the direct solution. This is the most common and current approach to the problem [8]-[10].

Computing world coordinates from stereo images requires first matching the images obtained from two different cameras to determine disparities (differences in the positions of corresponding features) and then transforming these into world distances. The problem has been called the object pose estimation problem in the computer vision literature [6], or simply the stereo reconstruction problem. The process of matching is essential for finding world coordinates from a stereo image. The matched points are used to find world coordinates using triangulation [11]. In this process, all the camera calibration parameters appear as constants in the equation. Hence, camera calibration is essential to compute world coordinates from stereo images.

In the next section, we present a simple and unified approach to camera calibration and stereo reconstruction using neural networks. In our approach, instead of calibrating both cameras and then using the triangulation procedure, we directly train a neural network to compute world coordinates from matched pairs of image points. The advantage is that the approach does not depend on the camera model and will work for any type of camera. In Section III, we discuss the results obtained by our approach when tested on our prototype mobile robot system. In Section IV we present conclusions, followed by references in Section V.

II. ARTIFICIAL NEURAL NETWORK MODEL FOR 3D-WORLD RECONSTRUCTION

[Fig. 1 diagram: input layer (x1, y1, x2, y2), hidden layer, output layer (X, Y, Z).]
Fig. 1: Artificial neural network model used for the problem.
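A minimal sketch of the forward pass of the 4-8-3 network in Fig. 1 (whose layer sizes are given in the text that follows) may help make the mapping concrete. It assumes fully connected layers with biases and a log-sigmoid activation at both the hidden and the output layer, so that outputs fall in (0, 1) like the normalized targets of Section III. The class name `StereoMLP` and the uniform random initialization are hypothetical illustration choices, not the authors' code.

```python
import numpy as np


def logsig(z):
    """Log-sigmoid activation 1 / (1 + exp(-z)); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


class StereoMLP:
    """Hypothetical 4-8-3 network: (x1, y1, x2, y2) -> (X, Y, Z), all scaled to [0, 1]."""

    def __init__(self, n_in=4, n_hidden=8, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        # Illustrative uniform initialization (the paper used Nguyen-Widrow).
        self.W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        """x: (batch, 4) normalized image coordinates; returns (hidden, output)."""
        h = logsig(x @ self.W1.T + self.b1)  # (batch, 8) hidden activations
        y = logsig(h @ self.W2.T + self.b2)  # (batch, 3) normalized (X, Y, Z)
        return h, y


# Usage: one matched pair of normalized image points.
net = StereoMLP()
h, y = net.forward(np.array([[0.20, 0.40, 0.25, 0.40]]))
print(y.shape)  # (1, 3)
```

The untrained outputs are meaningless until the weights are fitted to chart data; the sketch only fixes the shapes and activations.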
Artificial Neural Networks (ANNs) are being applied in many scientific disciplines to solve a variety of problems in pattern recognition, prediction, optimization, associative memory and control. None of the conventional approaches to these problems is flexible enough to perform well outside its domain. ANNs provide exciting alternatives, and many applications could benefit from them [12].

In our problem, we propose a multi-layer ANN model because camera calibration is a non-linear problem and cannot be solved with a single-layer network [13]. The best architecture and algorithm for a problem can only be evaluated by experimentation, and there are no fixed rules to determine the ideal network model. However, variations of architecture and algorithm affect only the convergence time of the solution.

We have used the network model in Fig. 1 for our simulations. The model consists of 4 input neurons, 8 hidden neurons and 3 output neurons. The input neurons correspond to the image coordinates of the matched points found on the stereo images, (x1, y1) and (x2, y2). These points are generated by the same world point on both images, and form the input of the neural network. The output neurons correspond to the actual world coordinates (X, Y, Z) of the point that is imaged at (x1, y1) and (x2, y2) on the two images. We train the network on a range of inputs and outputs such that, after training, the network can give the world coordinates for any matched pair of points. The implementation details and the results are given in the next section.

The approach requires training the network on a set of matched image points whose corresponding world points are known. For this purpose, we use an object similar to that used in [7], consisting of a grid of points placed at fixed intervals. This chart (shown in Fig. 2) is placed in front of the two cameras at known distances from an arbitrary world origin. It should be noted that the choice of the world origin in this approach is arbitrary, and the cameras need not be fixed at some precise location relative to the world origin. We capture stereo images of the chart at various distances from the world origin, noting the world coordinates of the chart at each instance. The set of matched points and the corresponding world coordinates thus obtained form the training set of our artificial neural network. Once the network is trained, we present it with arbitrary matched points and it directly gives us the world coordinates corresponding to each matched pair.

Fig. 2: Calibration chart at 50 cm.

The advantage of this approach lies mainly in its simplicity and generality. The technique will work for any type of camera, and accurate camera modeling is not an issue. The cameras need not be fixed at any precise location with respect to the world origin, nor do their axes have to be aligned. Precise positioning of the chart with respect to the camera is also not required, as it is in some approaches to the problem (e.g. [7]). The calibration chart only needs to be at known positions with respect to a world origin.
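To illustrate how the chart yields (x1, y1, x2, y2) to (X, Y, Z) training pairs, the sketch below simulates the capture step with two hypothetical pinhole cameras instead of real images. The 5 cm grid spacing, the roughly 7 cm baseline and the 50 cm to 140 cm chart depths in 15 cm steps are taken from Section III; the focal length, image center, camera positions, the 1 degree rotation of the second camera, the 40 cm chart width and all helper names are illustrative assumptions. The network itself never sees P1 or P2; they only stand in for the physical, uncalibrated cameras.

```python
import numpy as np


def rotation_y(deg):
    """Rotation about the vertical (Y) axis, used to model unaligned optical axes."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def projection_matrix(f, cx, cy, R, C):
    """Pinhole camera P = K [R | -R C] with centre C in world units (cm)."""
    K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, -R @ C.reshape(3, 1)])


def project(P, Xw):
    """Project (N, 3) world points with a 3x4 matrix P; returns (N, 2) pixel coordinates."""
    Xh = np.hstack([Xw, np.ones((len(Xw), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


def chart_training_set(P1, P2, depths, spacing=5.0, half_width=20.0):
    """Hypothetical helper: grid points of a planar chart placed at each known depth Z."""
    g = np.arange(-half_width, half_width + spacing, spacing)
    gx, gy = np.meshgrid(g, g)
    inputs, targets = [], []
    for Z in depths:
        Xw = np.column_stack([gx.ravel(), gy.ravel(), np.full(gx.size, float(Z))])
        # Each row: (x1, y1, x2, y2) for the same world point seen by both cameras.
        inputs.append(np.hstack([project(P1, Xw), project(P2, Xw)]))
        targets.append(Xw)
    return np.vstack(inputs), np.vstack(targets)


# Usage: 7 cm baseline, second camera rotated by 1 degree, world origin 10 cm ahead of the rig.
P1 = projection_matrix(500.0, 320.0, 240.0, np.eye(3), np.array([-3.5, 0.0, -10.0]))
P2 = projection_matrix(500.0, 320.0, 240.0, rotation_y(-1.0), np.array([3.5, 0.0, -10.0]))
depths = range(50, 141, 15)  # 50, 65, ..., 140 cm
inputs, targets = chart_training_set(P1, P2, depths)
print(inputs.shape, targets.shape)  # (567, 4) (567, 3)
```

Placing the camera centres at Z = -10 cm rather than at the origin reflects the point that the world origin is arbitrary: only the chart positions relative to that origin need to be known.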
It should be noted that this approach is different from conventional camera calibration techniques in the sense that no extrinsic or intrinsic camera parameters are found for any of the cameras. Instead, the system is trained such that it learns to directly find the world coordinates of objects. The experimental procedure required is almost the same as that of conventional approaches to the problem. However, the approach is essentially very simple, and yields comparable results.

Note also that this approach to camera calibration is only valid for stereo vision systems and is not applicable to monocular cameras. The approach is particularly suited to autonomous mobile robots that employ stereo vision. It is novel in the sense that it is based on training rather than on computing explicit values of camera parameters. However, the training set presented to the artificial neural network must be sufficiently representative of the range of possible scenarios that the system might encounter during operation.
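For contrast with the trained mapping, the conventional route that this approach avoids can be sketched as linear (DLT-style) triangulation. Given the 3x4 projection matrices P1 and P2 of two calibrated cameras, each matched image point contributes two linear equations in the homogeneous world point, and the least-squares solution is the right singular vector belonging to the smallest singular value. This standard textbook method is used here only to show how the calibration parameters enter as constants; it is not necessarily the exact procedure of [11], and the camera values in the usage example are hypothetical.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one world point from two 3x4 projection matrices.

    x1, x2 are the (u, v) pixel coordinates of the matched point in each image.
    Each view gives u * (row 3 of P) - (row 1 of P) = 0 and v * (row 3) - (row 2) = 0.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares solution of A X = 0 with |X| = 1: last right singular vector.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]


# Usage with hypothetical calibrated cameras: identical intrinsics, 7 cm baseline along X.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-7.0], [0.0], [0.0]])])

X_true = np.array([10.0, 5.0, 100.0, 1.0])  # homogeneous world point (cm)
x1 = P1 @ X_true
x2 = P2 @ X_true
X_est = triangulate(P1, P2, x1[:2] / x1[2], x2[:2] / x2[2])
print(X_est)  # close to [10, 5, 100]
```

Every entry of P1 and P2 must be known for this to work, which is exactly the calibration burden that the trained network removes.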
III. EXPERIMENTAL DETAILS AND RESULTS

We mounted two cameras on our prototype mobile robot. We kept the distance between the two cameras at approximately 7 cm and did not align their optical axes precisely. Next, we made a calibration chart consisting of a grid of lines 5 cm apart (as shown in Fig. 2). This chart was placed in front of the cameras at various distances from the world origin, and its image was captured from both cameras without moving the cameras. The chart was placed at distances within the range of interest of our robot. We defined the range of interest of the robot to be from 50 cm to 140 cm in front of the robot, and captured images in this range at increments of 15 cm. We felt that our robot should be able to correctly gauge the distance of obstacles that are present in this range.

After capturing the images of the calibration chart, we matched these images to obtain stereo pair points. For each stereo pair, we also knew the actual world distance, since we had placed the chart at measured distances with respect to a world origin. At no time in our experiment did we have to measure the exact distance of the cameras with respect to the world origin, as would have been necessary in some calibration approaches found in the literature. A set of 400 stereo pairs and their actual 3D-world distances formed our training set.

We trained our neural network on this set of 400 stereo pairs. The training was done by presenting the stereo-pair points at the input of the network and the 3D-world coordinates at the output. The training used the backpropagation algorithm, with Nguyen-Widrow initialization of weights and an adaptive learning model. We used a log-sigmoid activation function for both inputs and weights. All inputs were normalized between 0 and 1 before presenting them to the network. The target outputs were also normalized between 0 and 1. Such normalization is necessary to obtain quicker learning. We experimented with various network architectures, but observed very little change in error when using alternate architectures.

To check the accuracy of the trained network, we presented the network with stereo-pair points that were not included in the training set, but were from within our range of interest of distance. We had a set of such points whose corresponding 3D-world coordinates were already known to us. We computed the average percentage error over these points. Since these points had not been included in the training set, this error reflects what the network had truly learned. The mean percentage error observed during training is shown in Fig. 3.

[Fig. 3 plot: Mean error in 3D coordinates as a function of training epochs. Axes: training epochs (logarithmic, 10^2 to 10^6) vs. percentage error (0 to 20).]
Fig. 3: Mean percentage error in computing 3D coordinates as a function of number of epochs.

From Fig. 3 it can be seen that as the number of training epochs increases, the error in the computations done by the network decreases. The error became less than 5% after 40,000 epochs of training. After 100,000 epochs of training, the mean error in computing the 3D coordinates of a point became 4.33%. It must be appreciated that this error contains not only the error contributed by the network, but also the quantization errors of the camera. Since we did not use any sub-pixel measuring technique to find the image coordinates of a point, quantization error should make a significant contribution.

Once our network was trained in our range of interest, we also presented it with points that were outside our range of interest. We had originally trained the network for points within 140 cm of the world origin. Now we presented the network with points whose distance from the world origin ranged from 150 cm to 200 cm. These results are shown in Fig. 4. We observed a linear increase in percentage error as the object moved further away from our training limit. It should be noted that we placed minimal constraints on the type of camera required, the resolution and quality of images, and the accuracy of measurements.

[Fig. 4 plot: Error in computation of Z-coordinate beyond training range. Axes: distance from world origin in cm (150 to 200; training range limit: 140 cm) vs. percentage error (10 to 35).]
Fig. 4: Percentage error in computation of Z-coordinate beyond the training range. (Results taken after training the network for 50,000 epochs.)
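The training and evaluation procedure described in this section can be sketched as follows, under several stated assumptions. The sketch uses batch gradient descent on the squared error, log-sigmoid hidden and output layers (our reading of the activation description), Nguyen-Widrow initialization of the input-to-hidden weights [13], and [0, 1] min-max scaling of inputs and targets. The adaptive learning-rate rule (grow the rate by 5% after an improvement; reject the step and shrink the rate by 30% if the error grows by more than 4%), the definition of percentage error (Euclidean 3D error relative to the true distance from the origin) and the synthetic rectified-stereo data (hypothetical focal length of 500 pixels, 7 cm baseline) are illustrative choices rather than details from the paper. This short toy run is not expected to reproduce the reported 4.33%.

```python
import numpy as np


def logsig(z):
    """Log-sigmoid activation, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def minmax_fit(a):
    """Per-column offset and span used to scale data into [0, 1]."""
    lo = a.min(axis=0)
    return lo, a.max(axis=0) - lo


def nguyen_widrow(n_in, n_hidden, rng):
    """Nguyen-Widrow initialization of the input-to-hidden weights and biases."""
    beta = 0.7 * n_hidden ** (1.0 / n_in)
    W = rng.uniform(-0.5, 0.5, (n_hidden, n_in))
    W *= beta / np.linalg.norm(W, axis=1, keepdims=True)  # each hidden row gets norm beta
    return W, rng.uniform(-beta, beta, n_hidden)


def forward(params, X):
    W1, b1, W2, b2 = params
    H = logsig(X @ W1.T + b1)
    return H, logsig(H @ W2.T + b2)


def train(X, T, n_hidden=8, epochs=2000, lr=0.5, seed=0):
    """Batch backpropagation, (N, 4) -> (N, 3), both already scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    W1, b1 = nguyen_widrow(X.shape[1], n_hidden, rng)
    params = (W1, b1, rng.uniform(-0.5, 0.5, (T.shape[1], n_hidden)), np.zeros(T.shape[1]))
    H, Y = forward(params, X)
    err = np.mean((Y - T) ** 2)
    history = [err]
    n = len(X)
    for _ in range(epochs):
        W1, b1, W2, b2 = params
        dY = (Y - T) * Y * (1 - Y)    # output delta (log-sigmoid derivative is y(1 - y))
        dH = (dY @ W2) * H * (1 - H)  # hidden delta, backpropagated through W2
        cand = (W1 - lr * dH.T @ X / n, b1 - lr * dH.mean(axis=0),
                W2 - lr * dY.T @ H / n, b2 - lr * dY.mean(axis=0))
        H_new, Y_new = forward(cand, X)
        new_err = np.mean((Y_new - T) ** 2)
        if new_err > 1.04 * err:      # step too large: reject it and shrink the rate
            lr *= 0.7
            continue
        if new_err < err:             # progress: grow the rate slightly
            lr *= 1.05
        params, H, Y, err = cand, H_new, Y_new, new_err
        history.append(err)
    return params, history


def mean_percentage_error(pred, true):
    """Mean Euclidean 3D error as a percentage of each point's distance from the origin."""
    return 100.0 * np.mean(np.linalg.norm(pred - true, axis=1) / np.linalg.norm(true, axis=1))


def simulate(n, rng, f=500.0, cx=320.0, cy=240.0, baseline=7.0):
    """Hypothetical rectified stereo rig: n random world points (cm) and their pixel pairs."""
    Xw, Yw, Zw = rng.uniform(-30, 30, n), rng.uniform(-20, 20, n), rng.uniform(50, 140, n)
    pix = np.column_stack([f * Xw / Zw + cx, f * Yw / Zw + cy,
                           f * (Xw - baseline) / Zw + cx, f * Yw / Zw + cy])
    return pix, np.column_stack([Xw, Yw, Zw])


rng = np.random.default_rng(1)
pix_train, world_train = simulate(400, rng)  # 400 pairs, as in the experiment
pix_test, world_test = simulate(100, rng)    # held-out points in the same range
in_lo, in_span = minmax_fit(pix_train)
out_lo, out_span = minmax_fit(world_train)
params, history = train((pix_train - in_lo) / in_span, (world_train - out_lo) / out_span)
_, scaled = forward(params, (pix_test - in_lo) / in_span)
pred = scaled * out_span + out_lo             # undo the [0, 1] output scaling
mpe = mean_percentage_error(pred, world_test)
print(f"training MSE {history[0]:.4f} -> {history[-1]:.4f}, held-out error {mpe:.2f}%")
```

Evaluating on points that were not used for training mirrors the check described above; the error is measured in world units only after undoing the output normalization.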
IV. CONCLUSION
In this paper, we have presented a unified approach to camera calibration and 3D-world reconstruction for stereovision. We used an artificial neural network to train the system such that, when presented with a matched pair of points, it automatically computes the world coordinates of the corresponding object point. The approach differs from the conventional approaches to the problem found in the computer vision literature in the sense that the cameras are never actually calibrated; instead, the network is trained to compute the correct world coordinates of two matched points. The approach is simple in concept, independent of the camera model used and of the quality of the images obtained, and yields good results (a mean error of 4.33% in the 3D coordinates within the training range) when applied to a prototype autonomous mobile robot using stereovision.

V. REFERENCES
1. Y. I. Abdel-Aziz and H. M. Karara, "Direct linear transformation into object space coordinates in close-range photogrammetry", Proc. Symp. Close-Range Photogrammetry (Urbana, IL), Jan. 1971, pp. 1-18.
2. K. W. Wong, "Mathematical formulation and digital analysis in close range photogrammetry", Photogrammetric Eng. Remote Sensing, vol. 41, no. 11, Nov. 1975, pp. 1355-1373.
3. S. Ganapathy, "Decomposition of transformation matrices for robot vision", Proc. IEEE Int. Conf. Robotics Automat. (Atlanta), Mar. 1984, pp. 130-139.
4. O. D. Faugeras and G. Toscani, "Calibration problem for stereo", Proc. IEEE Conf. Comput. Vision Patt. Recogn. (Miami Beach, FL), June 1986, pp. 15-20.
5. D. C. Brown, "Decentering distortion of lenses", Photogrammetric Eng. Remote Sensing, May 1966, pp. 444-462.
6. R. M. Haralick and L. G. Shapiro, "Computer and Robot Vision", vol. 2, Addison-Wesley Publishing Company, 1993, pp. 125-178.
7. Y. Nomura, M. Sagara, H. Naruse, and A. Ide, "Simple calibration algorithm for high-distortion-lens camera", IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-14, no. 11, Nov. 1992, pp. 1095-1099.
8. R. Y. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE J. Robotics Automat., vol. RA-3, no. 4, Aug. 1987, pp. 323-344.
9. R. K. Lenz and R. Y. Tsai, "Techniques for calibration of the scale factor and image center for high accuracy 3D machine vision metrology", IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-10, no. 5, Sept. 1988, pp. 713-720.
10. J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation", IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-14, no. 10, Oct. 1992, pp. 965-980.
11. R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Addison-Wesley Publishing Company, 1992, pp. 56-71.
12. A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: a tutorial", IEEE Computer Magazine, Mar. 1996, pp. 31-44.
13. L. Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms and Applications", Prentice Hall, Inc., 1994, pp. 289-330.