Sensor Planning for a Trinocular Active Vision System

Peter Lehel, Elsayed E. Hemayed and Aly A. Farag
CVIP Lab, University of Louisville, KY 40292 USA
E-mail: {peter, sayed, farag}@cairo.spd.louisville.edu

URL: http://www.cvip.uofl.edu

Abstract

We present an algorithm to solve the sensor planning problem for a trinocular, active vision system. This algorithm uses an iterative optimization method to first solve for the translation between the three cameras and then uses this result to solve for parameters such as the pan and tilt angles of the cameras and the zoom setting.

1 Introduction

Sensor planning in computer vision is an emerging field that aims to understand the relationship between objects and the sensors which observe them. In our project, we focus on vision sensor planning for the task of reconstructing and recognizing 3D objects. For this system, generalized camera parameters such as position, orientation, and optical settings have to be determined so that object features are within the field of view and are in focus. Recent developments in sensor planning are related to a new area called active sensing. In active sensing, sensor parameters are controlled in response to the requirements of the task. The main advantage of active sensing is that it can transform ill-posed vision problems into well-posed problems by using constraints introduced by the dynamic nature of the sensor. Also, by reconfiguring an active vision system, its effective scope is increased and a range of sensing situations can be accommodated. A number of different vision planning systems have been developed in the past years that use a priori information about the observed object and the applied sensors to automatically generate sensor parameters that satisfy different vision constraints [1]. The difference between these techniques is in the approach used to determine sensor parameter values. Several systems use a generate-and-test approach [2], where sensor positions and settings are chosen and tested to meet the

requirements of the task. The resulting parameters are valid for a specific setup only and cannot be used if the environment is changed. For vision systems, a single sensor configuration may not always result in a sufficiently informative view. Therefore, other methods take a synthesis approach [3, 4]. In these sensor planning techniques, the task requirements are characterized analytically, and sensor parameter values are directly determined from analytical relationships that satisfy the predefined constraints. Another approach is sensor simulation [5], where a scene is visualized, given information about the environment and the vision system at hand, and a virtual system is created. This virtual system provides a framework for planning sensor configurations, which can be done by a generate-and-test approach: satisfactory sensor configurations are found by creating a simulated view of the environment and evaluating the constraints in the simulated image. The sensor planning algorithm developed for our active vision system is a combination of the synthesis and simulation approaches. Finally, there are the so-called expert systems [6] that use expert knowledge about viewing and illumination techniques as a system rule base.

This research has been supported by grants from NSF (ECS-9505674) and the Department of Defense (USNV N00014-97-11076).

1.1 CardEye System

The CardEye system is a robot-controlled, trinocular, multi-sensor, active vision system (Fig. 1). The goal in developing this system was to create a flexible, precise tool to mimic the functionality of the human vision system. CardEye utilizes more sensors than human beings, which improves the resulting performance. Specifically, the system has three cameras to improve the recovery process, and the system uses an active lighting device to assist in camera parameter selection. The system has the basic mechanical properties of active vision platforms: pan, tilt, roll, focus, zoom, aperture, vergence and baseline. The flexibility of the system and the availability of different sensors will assist in solving many problems in active vision research [7]. Building a trinocular system with different mechanical properties adds complexity and redundancy to the system.


Figure 1: CardEye simulated system. A trinocular head attached to a three-segment robot arm (the base: θ1, the shoulder: θ2, the elbow: θ3), carrying an active lighting device; the approximated optical axis points toward a target point M(x, y, z).

To eliminate this redundancy, the mechanical properties were assigned to the system as a whole and not to each camera. As a consequence, the three cameras are coupled together to perform the same motion, to fixate on a point, or to change the baseline, while a robot arm, on which the camera assembly is mounted, provides the flexibility to pan, tilt, or roll. Active lenses add the zoom and focus properties to the system.

Figure 2: CardEye trinocular active vision module.

The vision module of the CardEye system, Fig. 2, contains three Hitachi KP-M1 black-and-white CCD cameras (c1, c2, c3) with H10x11E Fujinon zoom lenses. The cameras are placed at equal distances from each other, and an active lighting device, which integrates two laser devices (a range finder and a pattern generator), is mounted at the center of the trinocular head (O). The cameras can translate (t) along their mounts to change the baseline distance. At the same time, the cameras can rotate towards each other to fixate on a point in space; this is known as the vergence property. The fixation process is performed in two steps. The first step is to change the robotic arm joints to align the fixation point with the end-effector segment of the arm. The second step is to rotate and translate the cameras to fixate at that point. Fig. 3 describes the geometry of the vision module in more detail. The object is inside a sphere of radius R. By adjusting the vergence angle (Eqn. 1), all cameras can fixate on the same point in 3D space. The system's fixation point is the center C of the sphere. The center of the sphere is at distance d from the origin along the z axis; this distance is called the object distance. The distance from a camera's optical center to point C is l, and it can be easily calculated as l = \sqrt{t^2 + d^2}.

\beta = \tan^{-1}\left(\frac{t}{d}\right) \qquad (1)

Figure 3: CardEye trinocular active vision module's system geometry. The target for this system is a sphere with radius R.

2 CardEye System Constraints

A complete sensor planning system must automatically generate camera position and orientation as well as optical settings (e.g., zoom, focus, and aperture settings of active zoom lenses) so that the output of the system satisfies certain predefined criteria. The criteria for a single camera are that the object is within the field of view and that it is in focus. Fig. 4 illustrates the field of view angle α that satisfies the field of view criterion. Eqn. 2 shows how this angle can be calculated from the system's geometrical setup.


\alpha = \sin^{-1}\left(\frac{2R}{\sqrt{t^2 + d^2}}\right) \qquad (2)

Figure 4: Field of view angle.
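For concreteness, the two geometric relations above can be evaluated directly. The following is a minimal Python sketch of Eqn. 1 and Eqn. 2 as reconstructed here; the function names and the example values are ours and are not part of the original system.

```python
import math

def vergence_angle(t, d):
    """Vergence angle (Eqn. 1) for cameras at distance t from the origin
    fixating on a point at object distance d along the z axis."""
    return math.atan2(t, d)

def camera_to_target_distance(t, d):
    """Distance l from a camera's optical center to the fixation point C."""
    return math.hypot(t, d)

def field_of_view_angle(t, d, R):
    """Field of view angle (Eqn. 2) that keeps a sphere of radius R,
    centered on the fixation point, inside the image."""
    return math.asin(2.0 * R / math.hypot(t, d))

# Example: cameras 0.3 m from the origin, a 0.2 m sphere at d = 3.0 m.
t, d, R = 0.3, 3.0, 0.2
print(math.degrees(vergence_angle(t, d)))          # beta in degrees
print(camera_to_target_distance(t, d))             # l in metres
print(math.degrees(field_of_view_angle(t, d, R)))  # alpha in degrees
```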

In order for the sensor planning system to generate satisfactory sensor locations, two main constraints must be satisfied to maximize the effectiveness of 3D reconstruction from three 2D images. These constraints are the overlap and disparity constraints. In this section, these two system constraints are discussed in detail.

2.1 Overlap constraint

When two cameras observe the same sphere, two spherical caps are created. The intersection between these two spherical caps is called the overlap surface area. From the CardEye system's geometry, the three cameras are at distance t from the center and at equal distance \sqrt{3}\,t from each other. Therefore, it is sufficient to investigate one of the three equal overlap surface areas between any two cameras, as shown in Fig. 5.

Figure 5: Overlap surface area of two cameras created by the intersection of two spherical caps.

From the standpoint of 3D reconstruction, it is desirable to maximize the overlap surface area between the two cameras so that the number of corresponding points between the two 2D images is relatively high. Determining the exact size of this overlap surface area as a function of t appears to be an extremely complicated problem. From Fig. 6, which was obtained by projecting the geometry onto the plane created by c1, c3, and C, it can be seen that the larger γ, the larger the area, and the goal of maximum overlap surface area is achieved. The advantage of this simplification is that the angle γ is much easier to calculate and to implement in an autonomous planning algorithm.

Figure 6: Relationship between γ and the overlap surface area. The larger γ, the larger the overlap surface area.

The following is a derivation for the overlap angle γ. Projecting β (Eqn. 1) onto the c1 c3 C plane, the angle φ is obtained:

\phi = \tan^{-1}\left(\frac{\sqrt{3}\,t}{2d'}\right) \qquad (3)

where d' = \sqrt{d^2 + (t/2)^2}. Then

\theta = \frac{\pi}{2} - \phi \qquad (4)

\gamma = 2\theta \qquad (5)

Therefore, the overlap surface area, which is directly related to the angle γ (in radians), is given by

\gamma = \pi - 2\tan^{-1}\left(\frac{\sqrt{3}\,t}{2d'}\right) \qquad (6)

where the translation t is the distance between the cameras and the origin of the system, and d' is the projection of the object distance d. Note that Eqn. 6 confirms the intuitively obvious fact that the overlap surface area has a maximum when γ = π and t = 0, which means that the two cameras are in the same position.
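A short sketch of Eqn. 6 (again with function names of our choosing) makes the monotone behaviour explicit: γ starts at π for t = 0 and decreases as the cameras move apart.

```python
import math

def overlap_angle(t, d):
    """Overlap angle gamma (Eqn. 6) for cameras at distance t from the
    origin and an object at distance d."""
    d_proj = math.sqrt(d**2 + (t / 2.0)**2)        # d' = sqrt(d^2 + (t/2)^2)
    return math.pi - 2.0 * math.atan(math.sqrt(3.0) * t / (2.0 * d_proj))

# gamma equals pi when the cameras coincide (t = 0) and shrinks as they separate.
for t in (0.0, 0.1, 0.3, 0.6):
    print(t, round(overlap_angle(t, d=3.0), 4))
```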

2.2 Disparity constraint

The second major constraint is the disparity constraint, which has an important effect on the planned locations of the sensors. Side-to-side differences in the positions of similar images in the two cameras are called horizontal disparities and can produce a compelling sensation of three-dimensionality [9, 10]. The total angular disparity of the two cameras is defined as Δ = 2δ [10] and is shown in Fig. 7. The following is a derivation for δ:

\delta = \tan^{-1}\left(\frac{\sqrt{3}\,t}{2d'}\right) - \tan^{-1}\left(\frac{\sqrt{3}\,t}{2(d'+R)}\right) \qquad (7)


Figure 7: Total angular disparity Δ, defined as 2δ.

For small angles, the tangent of an angle is approximately equal to the angle itself in radians:

\delta \approx \frac{\sqrt{3}\,t}{2d'} - \frac{\sqrt{3}\,t}{2(d'+R)} \qquad (8)

Since d'R is relatively small compared with d'^2, the angular disparity δ in radians is

\delta \approx \frac{\sqrt{3}\,R\,t}{d'^2} \qquad (9)

where t is the distance between the cameras and the origin of the system, d' is the projection of the object distance, and R is the radius of the sphere circumscribing the object. Note that Eqn. 9 is a linear function of the translation parameter t. Thus, by increasing t, more adequate depth information can be recovered from the imaged object.

3 Analysis of System Constraints

In this section, a complete system analysis and an initial result are presented to help solve the CardEye vision sensor planning problem. The goal of the sensor planning system is to maximize the effectiveness of the 3D reconstruction algorithm from one frame. For effective reconstruction, the frames must display adequate depth information and have a fairly large overlap area. These two constraints are in conflict: while the overlap constraint tries to move the cameras as close together as possible, the disparity constraint pushes the cameras away from each other. Consequently, an optimum must be found that satisfies both constraints to a certain extent. The following analysis will show that the intersections of the corresponding functions provide a satisfactory selection for the optimum. In Section 2, equations were derived for the overlap (Eqn. 6) and angular disparity (Eqn. 9) constraints. The parameters d (object distance) and R (estimated size of the object) are inputs given to the sensor planning system. The only independent variable is t, the parameter for which the system first has to solve. The proposed solution uses an analytical approach to characterize the equations by carefully changing one parameter at a time. Throughout this analysis the size of the object is set to a constant R = 0.2m.

3.1 Effect of object distance

In Fig. 8 the overlap angle and angular disparity curves are plotted for various object distances in the range of 1.2m to 10m, which is an initial estimate of the system's working space. The lenses' minimum focus distance is 1.2m, and the system will be used indoors. A step size for the object distance d is selected using the formula d = 1.2 + 0.8i, where i = 0, ..., 10. The goal is to maximize both constraints, but according to Eqn. 6 and Eqn. 9, the overlap angle decreases as t increases and has a maximum at t = 0, which means that the three cameras are at the same location, i.e., at the origin. On the other hand, the angular disparity function increases with t. The intersections of the corresponding functions provide a possible selection for the optimum.

Figure 8: Overlap and disparity curves as object distance changes (1.2m-10m).
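The paper does not give the numerical details of how the intersection points in Fig. 8 are obtained. The sketch below is one plausible reading of that step: Eqn. 6 is evaluated exactly, Eqn. 9 is evaluated with d' ≈ d (consistent with the remark that it is linear in t), and each crossing is located by bisection; all names are ours. As the text states, the raw crossings fall far outside the 0.1m-0.6m hardware range, which motivates the normalization of Section 3.2.

```python
import math

R = 0.2                                   # object radius fixed in Section 3 [m]
SQRT3 = math.sqrt(3.0)

def overlap(t, d):
    """Overlap angle gamma, Eqn. 6."""
    d_proj = math.sqrt(d**2 + (t / 2.0)**2)
    return math.pi - 2.0 * math.atan(SQRT3 * t / (2.0 * d_proj))

def disparity(t, d):
    """Angular disparity delta, Eqn. 9, with d' taken as d so that the
    expression is linear in t (assumption, following the text's remark)."""
    return SQRT3 * R * t / d**2

def raw_intersection(d, lo=1e-9, hi=400.0, iters=80):
    """Bisection on overlap - disparity; the overlap angle falls while the
    disparity rises, so there is a single crossing."""
    f = lambda t: overlap(t, d) - disparity(t, d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

for i in range(11):                        # d = 1.2 + 0.8 i, as in the text
    d = 1.2 + 0.8 * i
    print(f"d = {d:4.1f} m  ->  raw crossing at t = {raw_intersection(d):7.2f} m")
```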

3.2 Data normalization

Fig. 8 shows twelve intersection points selected for the optimum. However, for these data points, the corresponding translation value is outside the system's working environment. This is due to the fact that even though the two functions have the same units, i.e., radians, they measure completely different quantities. The solution is to normalize the two functions according to a third constraint. This is a simple physical constraint that limits the range of the translation parameter t. On the actual CardEye vision module, each camera's position has a lower limit of 0.1m from the origin and an upper limit of 0.6m. For example, if the object is at the closest position of d = 1.2m or less, then obviously the goal is to


place the cameras as close as possible, at the t = 0.1m position. Similarly, if the object is at d = 10m or further away from the system, the trivial position is the maximum of t = 0.6m. Using this third physical constraint, the intersection data is normalized to the range 0.1m ≤ t ≤ 0.6m, as seen in Fig. 9. In the following section, this normalized data is analyzed and evaluated according to the two main constraints.

Figure 9: Intersection data normalized so that the translation parameter t lies in the range of 0.1m to 0.6m.
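The exact normalization formula is not spelled out in the paper; one simple reading of "normalized to the range 0.1m ≤ t ≤ 0.6m" is a min-max rescaling of the raw intersection translations, sketched below. The function name and the commented usage (including the raw_crossings variable from the previous sketch) are hypothetical.

```python
def normalize_to_range(values, lo=0.1, hi=0.6):
    """Min-max rescaling onto the mechanical limits 0.1 m <= t <= 0.6 m.
    Assumption: the paper does not state its normalization formula; this is
    one simple reading of 'normalized to the range'."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

# e.g. applied to the raw crossings from the previous sketch (hypothetical name):
#   t_normalized = normalize_to_range(raw_crossings)
```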


3.3 Workspace estimation

Constant angular disparity curves are generated using the following equation, obtained from Eqn. 9:

t = c\,\frac{d'^2}{\sqrt{3}\,R} \qquad (10)

where c = 0.005 + 0.005i for i = 0, ..., 4. This equation gives solutions for the translation as the object distance changes while keeping the angular disparity constant. The results are shown in Fig. 10.

Figure 10: Constant angular disparity curves (constant size, changing d).

After evaluating the group of constant angular disparity curves, it is possible to achieve a better estimate of the working environment of the CardEye system in which the overlap and angular disparity parameters are within a controllable range. By selecting the lowest value of c = 0.005 in Fig. 10, a new maximum object distance of approximately d = 7.0m can be determined. As a result, the new working environment of the CardEye system is finally estimated to be 1.2m ≤ d ≤ 7.0m. By repeating the same analysis and optimization for this new object distance range, initial results are obtained.
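A small sketch of Eqn. 10, assuming d' ≈ d for the distances of interest, reproduces the family of constant angular disparity curves; the loop mirrors the five values of c plotted in Fig. 10, and the function name is ours.

```python
import math

R = 0.2  # object radius fixed throughout Section 3 [m]

def t_for_constant_disparity(d, c, R=R):
    """Eqn. 10: translation t that holds the angular disparity at the
    constant value c for an object at distance d (d' approximated by d)."""
    return c * d**2 / (math.sqrt(3.0) * R)

for i in range(5):                                  # c = 0.005 + 0.005 i
    c = 0.005 + 0.005 * i
    row = [round(t_for_constant_disparity(d, c), 3) for d in (1.2, 3.0, 5.0, 7.0)]
    print(f"c = {c:.3f}:  t at d = 1.2, 3, 5, 7 m ->", row)
```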

3.4 Initial result of analysis

Discrete data points were used to plot all the graphs in the analysis. In order to have a continuous solution for the planned sensor locations, a continuous, closed-form solution is required. Such a function can be obtained by fitting an n-degree polynomial curve through a set of data points using the least-squares method. The data in Fig. 11 was estimated by first-, second-, and third-order polynomials. From these results, we can conclude that there is no improvement using a third-order polynomial curve compared to a second-order curve. Therefore, the solution of the curve-fitting process is a second-order polynomial with the following equation:

f(x) = 0.005622\,x^2 + 0.04068\,x + 0.04125 \qquad (11)

Figure 11: n-order polynomial estimation of the solution, where n = 1, 2, 3.

Eqn. 11 provides the following final solution for the placement of the cameras:

t = \begin{cases} 0.1\ \mathrm{m} & \text{if } d < 1.2\ \mathrm{m} \\ 0.005622\,d^2 + 0.04068\,d + 0.04125\ \mathrm{m} & \text{if } 1.2\ \mathrm{m} \le d \le 7.0\ \mathrm{m} \\ 0.6\ \mathrm{m} & \text{if } d > 7.0\ \mathrm{m} \end{cases} \qquad (12)
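Eqn. 12 translates directly into a small placement routine. The sketch below (function name ours) clamps to the mechanical limits outside the 1.2m-7.0m workspace, exactly as the piecewise definition states.

```python
def planned_translation(d):
    """Camera translation t in metres from Eqn. 12 (object size R = 0.2 m)."""
    if d < 1.2:
        return 0.1
    if d > 7.0:
        return 0.6
    return 0.005622 * d**2 + 0.04068 * d + 0.04125

for d in (0.8, 1.2, 3.0, 7.0, 9.5):
    print(f"d = {d:3.1f} m  ->  t = {planned_translation(d):.3f} m")
```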

4 Object Size Constraint

The analysis and optimization in the previous section were performed for an object of fixed size (R = 0.2m). Additional analysis is needed to investigate the effect of changing the size of the object. When the size of an

1063-6919/99 $10.00 (c) 1999 IEEE

object is changed, the field of view angle of the sensor must be changed as well to ensure that the object remains within the field of view of the sensor. For every CCD camera there is a physical limit for this field of view angle, which corresponds to the minimum dimension of the active sensor's image array. This limit was determined to be α = 14.4° at the widest-angle setting of the active lens attached to the CCD cameras of the CardEye system. Using this angle, the maximum radius of an object at a given distance is calculated and presented in Table 1.

  d [m]    R_max [m]   size range for R [m]
  1.200    0.329       0.2-0.3
  1.925    0.528       0.3-0.5
  2.650    0.727       0.5-0.7
  3.375    0.926       0.7-0.9
  4.100    1.125       0.9-1.0

Table 1: Illustration of the closest possible placement of objects of various sizes.

The CardEye system is designed to handle objects with sizes in the range 0.2m ≤ R ≤ 1.0m. As the object size increases, the workspace in which this object can be placed decreases. On the other hand, when the workspace decreases, the overlap and disparity constraints are improved. Since the range for the object distance is shorter, the recalculated data for the overlap angle is a subset of the initial data presented before; this still results in a high value and an even smaller spread in the overlap angle. For the disparity, according to Eqn. 9, as R increases the angular disparity increases as well. Final results were obtained by adding the object size constraint to the previous analysis. For each object size range, the analysis in Section 3 is repeated and a corresponding 2nd-order polynomial is derived that provides a solution for the sensor placement t; see Fig. 12. Five cases of object size are analyzed, and their solutions for t are estimated as follows:

Case 1: 0.2m ≤ R ≤ 0.3m, 1.200m ≤ d ≤ 7.0m:  t = 0.005622 d^2 + 0.04068 d + 0.04125 [m]
Case 2: 0.3m < R ≤ 0.5m, 1.925m ≤ d ≤ 7.0m:  t = 0.005812 d^2 + 0.04702 d - 0.01307 [m]
Case 3: 0.5m < R ≤ 0.7m, 2.650m ≤ d ≤ 7.0m:  t = 0.006205 d^2 + 0.05530 d - 0.09068 [m]
Case 4: 0.7m < R ≤ 0.9m, 3.375m ≤ d ≤ 7.0m:  t = 0.006882 d^2 + 0.06668 d - 0.20372 [m]
Case 5: 0.9m < R ≤ 1.0m, 4.100m ≤ d ≤ 7.0m:  t = 0.007990 d^2 + 0.08380 d - 0.37802 [m]

Figure 12: Final results of active sensor planning for the translation parameter t.

Using this solution for t, parameters such as the vergence angle and the field of view angle can be easily solved according to Eqn. 1 and Eqn. 2.
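The five cases and the follow-up remark about Eqn. 1 and Eqn. 2 can be combined into a single planning routine. The sketch below is our own packaging of the published polynomials; the case table layout, the clamping of d to each case's workspace, and the function names are assumptions made for illustration.

```python
import math

# Size ranges and 2nd-order coefficients (a, b, c) for Cases 1-5; each row is
# (upper bound on R, minimum workspace distance d_min, (a, b, c)).
CASES = [
    (0.3, 1.200, (0.005622, 0.04068,  0.04125)),   # 0.2 <= R <= 0.3
    (0.5, 1.925, (0.005812, 0.04702, -0.01307)),   # 0.3 <  R <= 0.5
    (0.7, 2.650, (0.006205, 0.05530, -0.09068)),   # 0.5 <  R <= 0.7
    (0.9, 3.375, (0.006882, 0.06668, -0.20372)),   # 0.7 <  R <= 0.9
    (1.0, 4.100, (0.007990, 0.08380, -0.37802)),   # 0.9 <  R <= 1.0
]

def plan_sensors(d, R):
    """Select the object-size case, evaluate its polynomial for the
    translation t, then derive the vergence angle (Eqn. 1) and the field of
    view angle (Eqn. 2).  Clamping d to the case workspace is our choice."""
    if not 0.2 <= R <= 1.0:
        raise ValueError("object radius outside the 0.2-1.0 m design range")
    _r_hi, d_min, (a, b, c) = next(row for row in CASES if R <= row[0])
    d = min(max(d, d_min), 7.0)
    t = a * d**2 + b * d + c                       # planned translation [m]
    beta = math.atan2(t, d)                        # vergence angle, Eqn. 1
    alpha = math.asin(2.0 * R / math.hypot(t, d))  # field of view, Eqn. 2
    return t, beta, alpha

t, beta, alpha = plan_sensors(d=5.0, R=0.3)
print(f"t = {t:.3f} m, beta = {math.degrees(beta):.1f} deg, "
      f"alpha = {math.degrees(alpha):.1f} deg")
```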

5 Applying Planned Parameters

A key challenge is how the parameters produced by the sensor planning algorithm are applied to the motorized CardEye system. Parameters such as the translation t and the vergence angle β are applied to the system directly using electrical stepper motors through a controlling device. The field of view angle α, however, cannot be applied directly to the system, since the active lens requires a specific voltage level as an input. Therefore, the angle α has to be converted to a corresponding voltage level. Our zoom lenses have a focal length range of 11-110mm. These zoom lens settings correspond to vertical field of view angles of 14.4° and 1.7°, respectively. A relationship is established between the field of view angle and the output voltage for the zoom by measuring the height of the image at ten different zoom settings. From the height of the image, an accurate corresponding field of view angle can be calculated at each zoom setting. The result of this measurement can be seen in Fig. 13. Next, a 3rd-order polynomial curve is fitted through the measured data points in Fig. 13. The equation of this 3rd-order polynomial curve is shown in Eqn. 13.

V_{out,z} = 0.00619\,\alpha^3 - 0.015915\,\alpha^2 + 0.18004\,\alpha - 0.142738 \qquad (13)

This output voltage level (0.0-1.0) for the zoom setting can be applied directly through a controlling device to the active lenses.

Figure 13: Relationship between the field of view angle and the output voltage level for the zoom setting. Diamonds indicate the measured data; the solid line indicates the result of the curve-fitting process.
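Since the measured calibration pairs behind Fig. 13 are not listed in the paper, the sketch below shows only the procedure: a least-squares 3rd-order polynomial fit from field of view angle to zoom voltage, clamped to the 0.0-1.0 control range. The function names, the clamp, and the commented usage are ours.

```python
import numpy as np

def fit_zoom_curve(fov_deg, voltage, order=3):
    """Least-squares polynomial fit mapping measured field of view angles
    to zoom control voltages (the ten calibration pairs behind Fig. 13 are
    not listed in the paper, so they must be supplied by the user)."""
    return np.polyfit(fov_deg, voltage, order)

def zoom_voltage(coeffs, alpha_deg):
    """Evaluate the fitted curve and clamp to the 0.0-1.0 control range."""
    return float(np.clip(np.polyval(coeffs, alpha_deg), 0.0, 1.0))

# Usage once calibration measurements are in hand (hypothetical variable names):
#   coeffs = fit_zoom_curve(measured_fov_deg, measured_voltage)
#   v = zoom_voltage(coeffs, planned_alpha_deg)
```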

6 Experimental Results

Experimental results were obtained using a simulated system created in 3D Studio Max as well as the actual CardEye trinocular vision head. Due to space limitations, simulated results are not shown here. In the practical experiment, a dummy head is placed at different distances d. The estimated size of this object is R = 0.2m. Then, the sensor planning algorithm is applied to the CardEye vision module. Fig. 14 illustrates the results of this experiment: the first row shows the images before applying the sensor planning, and the second and third rows show the images after applying the sensor planning to the system at d = 3.0m and d = 1.5m, respectively. The results show that the presented sensor planning is capable of keeping the object in the field of view and at relatively similar sizes at different distances from the system.

(a) Camera 1  (b) Camera 2  (c) Camera 3
Figure 14: Results of sensor planning applied to the CardEye system. First row: before planning; second row: after planning, d = 3.0m; third row: after planning, d = 1.5m.

7 Conclusion

An algorithm has been developed to solve the sensor planning problem for a trinocular, active vision system. This algorithm uses an iterative optimization method to first compute the translation between the three cameras and then uses this result to determine parameters such as the pan and tilt angles of the cameras and the zoom setting. The solution for the focus will be added to this algorithm in future work.

References

[1] K. Tarabanis, R. Tsai and P. K. Allen, "A Survey of Sensor Planning in Computer Vision," IEEE Transactions on Robotics and Automation, Vol. 11, No. 1, February 1995.
[2] S. Sakane, R. Niepold, T. Sato and Y. Shirai, "Illumination setup planning for a hand-eye system based on an environmental model," Advanced Robotics, Vol. 6, No. 4, pp. 461-482, 1992.
[3] K. Tarabanis, R. Tsai and P. K. Allen, "The MVP Sensor Planning System for Robotic Vision Tasks," IEEE Transactions on Robotics and Automation, Vol. 11, No. 1, February 1995.
[4] E. Trucco, M. Umasuthan, A. Wallace and V. Roberto, "Model-Based Planning of Optimal Sensor Placements for Inspection," IEEE Transactions on Robotics and Automation, Vol. 13, No. 2, April 1997.
[5] K. Ikeuchi and J. C. Robert, "Modeling sensor detectability with the VANTAGE geometric/sensor modeler," IEEE Trans. Robot. Automat., Vol. 7, pp. 771-784, Dec. 1991.
[6] Y. Kitamura, H. Sato and H. Tamura, "An expert system for industrial machine vision," Proc. 10th Int. Conf. Pattern Recognition, pp. 771-773, 1990.
[7] Elsayed E. Hemayed and Aly A. Farag, "CardEye: A 3D Trinocular Active Vision System," Technical Report, CVIP Lab, University of Louisville, Nov. 1998.
[8] W. F. Taylor, The Geometry of Computer Graphics. Wadsworth & Brooks, 1992.
[9] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.
[10] I. P. Howard and B. J. Rogers, Binocular Vision and Stereopsis. Oxford University Press, 1995.
