A Low-Cost, DSP-Based, Intelligent Vision System for Robotic Applications S. Asaad, M. Bishay, D.M. Wilkes and K. Kawamura Electrical & Computer Engineering Department, Vanderbilt University, Nashville, TN 37235
Abstract

In this paper we present the design and implementation of a novel low-cost active vision system for robotic applications. The system is composed of two parts: a 4-DOF trinocular active camera head and a DSP-based image acquisition and processing board. Design issues and tradeoffs are discussed. The performance of the system is evaluated by running an edge-based object tracking algorithm, which is also presented in this paper. Experimental results show that the system is capable of tracking objects at 30 frames/sec.
1 Introduction

Although there are several image acquisition and processing boards available on the market today, it is still hard to find a hardware platform optimized for robotic applications. This is because typical robot vision applications, such as visual servoing and object tracking, impose strict requirements on the real-time performance of the vision system that are usually met only by highly priced systems. We strongly believe that the main characteristics of a successful robot vision system are high-speed acquisition and processing, average-to-low image size, and low overall cost. Related work has been reported by Horswill [3]; however, by his own account, his system was limited by low processing power and small image size. With this in mind, we built our first prototype of "RIVS": a Real-time Intelligent Vision Sensor [1]. RIVS is capable of capturing, processing and extracting the desired information from the camera input and transmitting the "final answers" to the user at frame rate. Once programmed for a specific task, RIVS can be embedded in any environment as needed and operate autonomously, relieving the rest of the system from the computational burden associated with most vision algorithms. We also built a "Cost-effective Active Trinocular Camera Head" (CATCH1), which together with RIVS constitutes a general-purpose active vision system. The system is shown in Figure 1.

In this paper we present the design of RIVS and CATCH1 and report its performance as revealed by initial experimentation. The rest of the paper is organized as follows: Section 2 presents the hardware design of RIVS. Section 3 shows the construction and kinematics of CATCH1. An edge-based object tracking algorithm, implemented on the system to evaluate its real-time performance, is discussed in Section 4. Finally, in Section 5 we report our conclusions and explain how we intend to apply this system in our future research.

2 RIVS System Overview
As shown in Figure 2, RIVS is mainly composed of two units: a video frame grabber board (VFG) and a host processor board (EZ-LAB). The former provides all the functions necessary to capture, format and temporarily store a video frame, while the latter is responsible for processing the captured frame and controlling the overall system. A real-time display interface card installed in a PC-compatible computer was designed and implemented to complement the RIVS system with the ability to display the captured/processed images. The PC also serves as a user console to download programs and control commands to the RIVS system and to communicate results back from it. The following subsections discuss the subsystems of RIVS in greater detail.
2.1 Host Processor Board

The chosen host processor board is a DSP evaluation board called EZ-LAB, made by Analog Devices. It is based on their ADSP-21020 floating-point digital signal processor and is equipped with 32K x 40 bits of data memory and 32K x 48 bits of program memory. The data memory suffices to hold two images of size 256x256 bytes, which is enough for our applications. The ADSP-21020 is a high-end 32-bit floating-point DSP superior in performance to most DSP chips of its class. It is benchmarked at 33.3 MIPS and a sustained floating-point performance of 66 MFLOPs. The modified Harvard architecture used in this processor allows both data and instructions to be stored in the program memory, in addition to data in the data memory. Furthermore, because of an on-chip instruction cache, the processor can fetch an instruction from the cache along with a data item from data memory and a data item from program memory, all in one cycle. This feature is useful in some image processing algorithms, such as model-based matching, in which case the image would reside in the data memory and the model or template for matching in the program memory. The ADSP-21020 has three arithmetic units arranged in parallel, namely an arithmetic/logic unit (ALU), a multiplier and a barrel shifter; all three units can execute in parallel. It also provides hardware to handle program sequencing and to implement circular buffers in memory, which are common in many image processing algorithms.
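To make the circular-buffer idea concrete, the following minimal C sketch shows the kind of wrap-around line buffer commonly used to keep a vertical neighborhood of scan lines resident for window operations such as Sobel filtering. On the ADSP-21020 the wrap-around addressing is done in hardware; here it is spelled out with an explicit modulo. This is not the RIVS code (which targets the DSP directly); all names and sizes are illustrative.

```c
#include <string.h>

#define LINE_LEN 256   /* pixels per scan line (illustrative) */
#define N_LINES  3     /* depth of the vertical neighborhood  */

/* Ring buffer holding the N_LINES most recent scan lines. */
static unsigned char ring[N_LINES][LINE_LEN];
static int newest = 0; /* index of the most recently stored line */

/* Store an incoming scan line, overwriting the oldest one. */
void push_line(const unsigned char *line)
{
    newest = (newest + 1) % N_LINES;
    memcpy(ring[newest], line, LINE_LEN);
}

/* Fetch a pixel from `age` lines back (age = 0 is the newest). */
unsigned char pixel(int age, int x)
{
    int idx = (newest - age + N_LINES) % N_LINES;
    return ring[idx][x];
}
```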
Figure 1: RIVS/CATCH1 system

Figure 2: RIVS System Overview. The block diagram shows the host PC (with PC-interface card, display port and RS-232 port) connected to the Real-time Intelligent Vision Sensor (RIVS), which comprises the video frame grabber (VFG) with its video inputs and the ADSP-21020 evaluation board (EZ-LAB), plus an RS-232 link for camera head control.

Figure 3: Basic Architecture of the VFG Board. The block diagram shows video signal digitization and sync detection (HSYNC, VSYNC, pixel clock), programmable horizontal and vertical windowing, image data packing by a sequencer, the FIFO frame buffer with write control and new-line tagging for status information, and the 32-bit interface to the EZ-LAB.

2.2 Video Frame Grabber Board (VFG)
The VFG board is in essence an interface that performs three basic functions. First, it digitizes the incoming video signal and extracts all timing information from it. Second, it extracts a user-defined "window" from the digitized frame and packs it into words of four pixels each. Finally, it stores a complete frame of packed data in a first-in-first-out buffer for easy access by the EZ-LAB board. In addition to these functions, the VFG can "tag" each new line of video as it is digitized and store this information in a status register accessible to the EZ-LAB. Figure 3 shows a basic block diagram of the VFG board as discussed above. The major features that distinguish the VFG board from other image capturing systems can be summarized as follows:
- The VFG architecture is optimized for 32-bit data processing: packing of the image data into units of four 8-bit pixels (32-bit words) occurs in hardware and operates autonomously. To do this, a bank of four FIFO RAM chips is used as a temporary frame buffer. The read strobes of all four FIFOs are connected together and used by the host processor when reading in the frame. Writing to the FIFOs is done in rotary fashion by a state machine sequencer, such that the nth pixel in a video line is written to FIFO number n modulo 4. This feature speeds up data transfer between the VFG and the host processor board by a factor of 4, leaving more time between frames for processing (typical image transfer time is 3 msec, leaving 30 msec for processing to achieve NTSC frame rate performance). The image processing algorithm must unpack the image before actual processing can take place; however, depending on the algorithm, only the window of interest need be unpacked, as will be demonstrated in Section 4 and as sketched in the code following this list.
- The VFG supports four video inputs to choose from under software control.
- The sync detection circuitry is user programmable, which facilitates adaptability to various video signal formats such as PAL and NTSC, as well as non-standard formats.
- The VFG supports "hardware windowing" of the incoming frames. The user can specify a window of the image that is of interest. The VFG hardware ensures that only this part of the image gets stored in the frame buffer and eventually transmitted to the host processor memory. This relieves the system from having to store and transfer unwanted data, which in turn increases efficiency in terms of both memory requirements and speed.
- The grabbing operation, once initialized by the host processor, runs autonomously without further involvement of the host. As a result, pipelined operation is possible, with the VFG capturing a frame while the host processes the previous one. This greatly facilitates real-time operation of the whole system.
- All image point processing, such as image inversion or thresholding, can be done "on the fly", because the video digitizer passes the digital bit stream through a look-up table RAM designed for this purpose.
- The VFG implements a status byte that provides, among other information, a flag that tags the start of each video line. This feature is very useful when the number of pixels in each line is not known or is variable. It also serves debugging purposes.
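As a concrete illustration of the packed format, the following minimal C sketch shows how a host-side routine might unpack only the window of interest from the 32-bit words read out of the FIFOs. The function name, the byte order within a word (pixel n assumed to sit in byte n modulo 4, matching the rotary write order), and the fixed line length are our assumptions for illustration; the actual RIVS code is written in ADSP-21020 assembly.

```c
#include <stdint.h>

#define LINE_WORDS 64   /* 256 pixels / 4 pixels per word (illustrative) */

/* Unpack only the pixels in [x0, x1) of one packed video line.
 * Each 32-bit word holds four consecutive pixels; pixel n of the
 * line is assumed to occupy byte (n % 4) of word (n / 4). */
void unpack_window(const uint32_t *packed_line,
                   unsigned char *dst, int x0, int x1)
{
    for (int x = x0; x < x1; x++) {
        uint32_t word = packed_line[x >> 2];          /* word n/4 */
        dst[x - x0] = (word >> ((x & 3) * 8)) & 0xFF; /* byte n%4 */
    }
}

/* Usage: uint32_t line[LINE_WORDS];
 *        unsigned char buf[64];
 *        unpack_window(line, buf, 96, 160);  // central window only */
```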
Table 1: RIVS System Specifications

  Video Inputs:          1 of 4 channels, selectable by software
  Sync Detector:         programmable format: NTSC, PAL, ...
  Digitizer:             8-bit gray scale; programmable gain and offset
  Image Size:            256 x 200 pixels; programmable hardware for window selection
  Acquisition Speed:     3 msec for image transfer to the host processor, leaving 30 msec for processing to maintain frame-rate acquisition
  Host Processor Board:  EZ-LAB with ADSP-21020 DSP; 32K x 40 bits data memory; 32K x 48 bits program memory
  Display Adaptor:       16-bit ISA PC card; memory- or I/O-mapped; 16 msec nominal display time per frame
2.3 PC Interface Board

The PC interface board serves two main purposes: first, it communicates messages (usually control commands) from the user to the RIVS system; second, it provides a high-speed parallel interface for image display on the PC monitor. The primary uses of this card are in the initialization of the vision algorithm and in debugging, by displaying the captured and/or processed images. The interface card is a 16-bit ISA plug-in card for the PC. The card uses FIFO RAM chips to relax the timing differences between the RIVS system and the PC. Both memory mapping and I/O mapping of the display port are possible through the appropriate placement of a jumper selector. The card can operate either in interrupt-driven mode or under normal program control. Table 1 summarizes the main specifications of the RIVS system.
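As an illustration of the I/O-mapped mode, a host-side routine might push a frame to the display port word by word, as in the following C sketch. The base address is hypothetical (it depends on the jumper selector), and the use of Linux's outw()/ioperm() is purely our illustrative idiom; the paper does not specify the host software environment.

```c
/* Minimal sketch of writing one image to the display card in
 * I/O-mapped mode. Requires ioperm() privileges on Linux/x86. */
#include <sys/io.h>

#define DISPLAY_PORT 0x300  /* hypothetical I/O base address */

void display_frame(const unsigned short *pixels, int n_words)
{
    /* The card's FIFO RAM absorbs the timing difference between
     * this burst of writes and the display refresh. */
    for (int i = 0; i < n_words; i++)
        outw(pixels[i], DISPLAY_PORT);
}
```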
3 Active Camera Head (CATCH1)

We have developed a low-cost, light-weight camera head with four degrees of freedom. Three cameras can be mounted on this head. The central camera, shown in Figure 1, is a color camera; the left and right cameras are grey-scale. The side cameras are capable of independent vergence, while all three can be panned or tilted. The color camera does not require a separate control motor, since the pan and tilt axes of rotation intersect its optical axis, and these motors are therefore sufficient to control it. In addition, all cameras are mounted such that their optical axes lie in the same plane. The motivation behind the color camera is color-based object detection [5, 2]. Once an object has been detected in the color image, CATCH1 moves to fixate the object in the center. Due to the mounting geometry of the cameras, the object will automatically appear along the central horizontal line in the left and right images. From then on, the faster grey-level processing is used to fixate the object in the left and right images and to obtain depth from vergence. Low-cost hobby servo motors actuate the joints. The speed of the vergence and tilt motors is 0.24 sec/60 degrees, and that of the pan motor is 0.22 sec/60 degrees. The full range of all the motors is 140 degrees. The controller board for CATCH1 is based on the Motorola 68HC11 microcontroller. The controller command for each motor has the form "motor I.D., position value", where motor I.D. = l, r, t, p for the left, right, tilt, and pan motors, respectively, and the position value is a one-byte value ranging from 0 to 255 covering the full range of motion. The controller receives its commands from RIVS through a serial link. Pulse Width Modulation (PWM) is used to control the position of the servo motors. The system weighs approximately 1 kg (2 lbs).
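The controller protocol is simple enough to drive from a few lines of C on the sending side. In the sketch below, the two-byte command layout follows the "motor I.D., position value" form given above; the linear mapping of the 140-degree range onto 0..255 and the POSIX serial handling are our assumptions for illustration, not taken from the paper.

```c
#include <fcntl.h>
#include <unistd.h>

/* motor: 'l', 'r', 't' or 'p' (left, right, tilt, pan).
 * angle_deg: desired position over the 140-degree range, mapped
 * linearly onto the one-byte value 0..255 (assumed mapping). */
int send_motor_command(int fd, char motor, double angle_deg)
{
    unsigned char cmd[2];
    if (angle_deg < 0.0)   angle_deg = 0.0;
    if (angle_deg > 140.0) angle_deg = 140.0;
    cmd[0] = (unsigned char)motor;
    cmd[1] = (unsigned char)(angle_deg / 140.0 * 255.0 + 0.5);
    return write(fd, cmd, 2) == 2 ? 0 : -1;
}

/* Usage: int fd = open("/dev/ttyS0", O_WRONLY | O_NOCTTY);
 *        send_motor_command(fd, 'p', 70.0);  // pan to mid-range */
```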
4 Real-Time 3-D Object Tracking

To evaluate the performance of the RIVS/CATCH1 system, we implemented an edge-based object tracking algorithm using the side cameras of CATCH1. The motivation for testing a tracking algorithm is the real-time constraint of performing the algorithm at frame rate. Edge-based tracking is robust against changes in object size and against the disappearance of grey-level features caused by object rotation; however, it relies on the existence of object/background contrast to produce object edges. Our tracking algorithm continuously fixates the object being tracked in the center of the visual field. This has several advantages that have been explained in [6], namely image stabilization, overcoming the limited field of view, and figure/background separation, since the object has minimum optical flow compared to the background. Fixating the object has two additional advantages. The first is that camera optics are best around the center of the image, where the optical axis intersects the image plane; hence, by keeping
the object near the center, the object edges are minimally distorted. This permits the use of less expensive cameras with lower-quality optics, thus reducing the overall system cost. The second advantage, which is specific to our system, is that the image unpacking for the tracking algorithm always covers the same area of the image. This simplifies the unpacking step, since only a fixed region around the center of the image need be unpacked.

Figure 4: Centering a point P which has zero Y-coordinate. (The three panels (a), (b), (c) show the rotation axis, the optical axis, the focal length f, the optical center O, the image coordinate x of point A, and the angles theta, phi and delta.)
4.1 Target Centering

In this section we derive the equations for centering a target in the image. Initially we assume that the object to be centered lies in the plane formed by the optical axes of both cameras; hence the object's projection in the image plane has a y-coordinate equal to zero. As shown in Figure 4, if the camera rotates about a line passing through the image plane and perpendicular to the page, then the camera needs to be rotated by an angle given by

$\theta = \phi + \delta$,  (1)

$\phi = \tan^{-1}(x/f)$,  (2)

where $f$ is the camera focal length. Since the object distance from the camera is much larger than the camera focal length, $\delta$ can be neglected, so $\theta \approx \phi$.
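In code, equation (2) with the small-delta approximation reduces target centering to one arctangent per axis. A minimal C sketch follows; the focal length and pixel pitch values are illustrative, since the paper does not give them, and the quantization to one-byte servo positions (Section 3) is left out.

```c
#include <math.h>

/* Rotation (in degrees) needed to center a target whose image lies
 * x_pix pixels from the image center, using theta ~ phi = atan(x/f)
 * from equations (1)-(2) with delta neglected. */
double centering_angle_deg(double x_pix)
{
    const double pixel_pitch_mm = 0.01; /* assumed sensor pixel size */
    const double f_mm = 8.0;            /* assumed focal length      */
    double x_mm = x_pix * pixel_pitch_mm;
    return atan2(x_mm, f_mm) * 180.0 / M_PI;
}
```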
4.2 Tracking Algorithm

The algorithm locates the object edges along a set of horizontal and vertical scan lines around the previous tracking point $(x_t[n-1], y_t[n-1])$ and averages the edge midpoints to obtain the new tracking point:

1. Grab a new frame into the frame buffer.
2. Unpack the window of interest around the previous tracking point $(x_t[n-1], y_t[n-1])$.
3. $S_V(x, y) = \mathrm{Sobel}_{vertical}(I(x, y))$
4. For $N_H$ times do:
   set $y = y_t[n-1] + (m_H - N_H/2)$;
   set $x = x_t[n-1]$;
   decrement $x$ till $S_V(x, y) > THRESH$, set $x_{left} = x$;
   reset $x = x_t[n-1]$;
   increment $x$ till $S_V(x, y) > THRESH$, set $x_{right} = x$;
   $\bar{x}(m_H) = (x_{left} + x_{right})/2$;
   increment $m_H$.
5. Set the x coordinate of the tracking point: $x_t[n] = \frac{1}{N_H}\sum_{m_H=0}^{N_H} \bar{x}(m_H)$
6. $S_H(x, y) = \mathrm{Sobel}_{horizontal}(I(x, y))$
7. For $N_V$ times do:
   set $x = x_t[n] + (m_V - N_V/2)$;
   set $y = y_t[n-1]$;
   decrement $y$ till $S_H(x, y) > THRESH$, set $y_{top} = y$;
   reset $y = y_t[n-1]$;
   increment $y$ till $S_H(x, y) > THRESH$, set $y_{bottom} = y$;
   $\bar{y}(m_V) = (y_{top} + y_{bottom})/2$;
   increment $m_V$.
8. Set the y coordinate of the tracking point: $y_t[n] = \frac{1}{N_V}\sum_{m_V=0}^{N_V} \bar{y}(m_V)$
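For concreteness, the horizontal half of the procedure (steps 3-5) can be written out in C. This is a sketch under stated assumptions: the vertical-Sobel image $S_V$ is precomputed as a 2-D array, THRESH and $N_H$ take illustrative values, and the scan window is assumed to lie inside the image (the listing above does not show bounds handling).

```c
#define THRESH 100   /* edge threshold (illustrative value)        */
#define N_H    8     /* number of horizontal scan lines (assumed)  */
#define W      256
#define H      200

/* Steps 3-5: on each of N_H scan lines around the previous tracking
 * point, walk outward from (xt_prev, y) in the vertical-Sobel image
 * sv until an edge is crossed on each side, then average the edge
 * midpoints to get the new x coordinate of the tracking point. */
int track_x(unsigned char sv[H][W], int xt_prev, int yt_prev)
{
    double sum = 0.0;
    for (int m = 0; m < N_H; m++) {
        int y = yt_prev + m - N_H / 2;               /* scan line   */
        int x = xt_prev;
        while (x > 0 && sv[y][x] <= THRESH) x--;     /* left edge   */
        int x_left = x;
        x = xt_prev;
        while (x < W - 1 && sv[y][x] <= THRESH) x++; /* right edge  */
        sum += 0.5 * (x_left + x);                   /* midpoint    */
    }
    return (int)(sum / N_H + 0.5);
}
```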
4.3 Experimentation

Figure 6: Experimentation setup. (A robot arm carries the object past the camera head from point A to point B.)

The system was tested using the setup shown in Figure 6. A robot arm moves an object at a speed of 4.2 cm/sec in front of the camera head, which tracks it. The object-to-camera distance changes from 100 cm at point A to 75 cm at point B. The tracking speed was 30 frames/sec for a single camera and 15 frames/sec when using stereo cameras; the latter is due to the time lost in frame synchronization when switching camera channels. The tracking algorithm, implemented in assembly language, takes approximately 8 msec/frame. The tracking results for single-camera tracking are shown in Figure 7 in terms of the pan and tilt motions. The results show that the system tracks the moving object with no motion delay.

Figure 7: Tracking results. (The plots show the base (pan), vergence, and tilt angles in degrees versus time in seconds.)
5 Conclusions & Future Research

In this paper we presented a low-cost active vision system for robotic applications. The system comprises a 4-DOF trinocular camera head and a DSP-based image acquisition and processing board. The overall system cost is on the order of $1000. The system will be used in applications that demand real-time performance, such as mobile robot navigation and 3-D face tracking. In the first application, the system will guide a mobile robot to grasp a soda can placed on a table [2]. The second application involves tracking the face of a disabled person being fed by ISAC, a robotic system for feeding the disabled [4].

Most commercially available computer vision boards tend to be as general as possible in order to cover a larger portion of the market. This has the drawback of increasing the cost of these systems while not achieving optimum performance for specific applications. To overcome these problems, we developed the RIVS/CATCH1 system to serve as a platform for our robot vision research. We demonstrated that using dedicated hardware designed specifically for robotic applications results in improved performance and lower cost.
References
[1] S. Asaad, "A low-cost, real-time, intelligent vision sensor," M.S. thesis, Electrical Engineering, Vanderbilt University, May 1995.
[2] K. Fujiwara, "Visual detection of objects using multi-color cues," M.S. thesis, Electrical Engineering, Vanderbilt University, May 1994.
[3] I. Horswill and M. Yamamoto, "A $1000 active stereo vision system," IEEE CVPR, 1993.
[4] K. Kawamura, S. Bagchi, M. Iskarous, and M. Bishay, "Intelligent robotic systems in service of the disabled," IEEE Trans. on Rehab. Eng., vol. 3, no. 1, pp. 14-21, 1995.
[5] M. Swain and D. H. Ballard, "Color indexing," International Journal of Computer Vision, vol. 7, pp. 11-32, 1991.
[6] M. Swain and M. Stricker, "Promising directions in active vision," International Journal of Computer Vision, vol. 11, no. 2, pp. 109-126, 1993.