Machine Vision for Robotic Assembly: Issues and Experiments

C D'Souza, K Sivayoganathan, D Al-Dabass*, V Balendran, J Keat
Department of Mechanical and Manufacturing Engineering
*Department of Computing
The Nottingham Trent University, Nottingham, NG1 4BU

Abstract

Computer vision is often required to provide data to a robot for grasping a target and performing assembly tasks. Using a vision system for assembly raises several issues concerning data acquisition, coordinate transforms, invariant object recognition, and configuration of the vision system for integration with the robot. This paper presents an overview of approaches used in the field of object recognition and the options available for use with a robot. Typically, recognition systems use object-centred or viewer-centred methods. Extraction of appropriate features is the key to successful object recognition. These features, whether obtained via a volumetric or a surface-based approach, should ideally be invariant to scale, translation and rotation. Certain mathematical functions, such as the Gaussian and mean curvatures and Gabor wavelets, can be applied to achieve this. The resulting information can then be used to train neural networks. For recognition purposes, artificial neural networks (ANNs) have been found to be good at classification and adaptation even in the presence of noise. Certain ANNs have invariance inherent in their architecture, while others rely on a preprocessing stage. These are described in the paper. After integration of the vision system, the various possibilities for assembly and disassembly using the robot arm are outlined. Results obtained in the early stages of this work on object recognition using intensity data are also presented.

1. Introduction

Vision-guided robotics has been a topic of continued interest for the past three decades. Robots today can perform assembly and material-handling jobs with speed and precision, yet compared to human workers they are hampered by their lack of sensory perception. Machine vision is a useful robotic sensor since it aims to mimic the human sense of vision and allows non-contact measurement of the environment. A robot must perceive the three-dimensional world to be effective, yet recovering 3-D information and describing it remains the subject of basic research. Section 2 describes the features commonly used to represent objects. Section 3 reviews different methods used by researchers in the field of object recognition, followed in section 4 by a brief description of two neural network architectures.

2. Object representation

The data for visual object recognition is usually obtained by a CCD camera, giving intensity data, or by laser line scanning, which gives a range or depth map. From this data, features have to be extracted for object recognition. Successful object recognition involves matching image features, from either range or intensity data, against a previous internal representation of the object, i.e. a model. Certain features, such as curvatures, are independent of the viewpoint of the object. Once the mean (H) and Gaussian (K) curvatures have been estimated, inspection of their respective signs allows the local surface to be classified as one of eight fundamental primitive types: peak, pit, ridge, valley, saddle ridge, saddle valley, minimal and flat. Other commonly extracted features are object edges, vertices, planar regions and curves [1]. The volumetric approach creates an internal representation by approximating an object with volumetric primitives such as cylinders, cones and spheres (often referred to as geons). One approach uses features called jets: sets of responses of Gabor filters [2] of different frequencies and orientations, all centred at a pixel position. Jets are robust against slight rotations and distortions. Often, along with the features themselves, relation attributes (i.e. how a given feature is related topologically to another feature) are also stored. Neural networks are increasingly being used to store the model representation, since some of them can adapt; two such architectures are described in section 4. A brief description of several approaches used in the area of object recognition follows.
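To illustrate the HK sign map mentioned above, the sketch below classifies a local surface patch from precomputed curvature values. The threshold constants and the zero-tolerance handling are illustrative assumptions, and the sign convention for H depends on the chosen direction of the surface normal; this is not the implementation used in [1].

    import numpy as np

    # Illustrative zero-tolerances; in practice they depend on the noise in the range data.
    EPS_H = 0.005
    EPS_K = 0.0005

    def surface_type(H, K):
        """Classify a local surface patch from the signs of the mean (H)
        and Gaussian (K) curvatures (the classical HK sign map)."""
        h = 0 if abs(H) < EPS_H else (1 if H > 0 else -1)
        k = 0 if abs(K) < EPS_K else (1 if K > 0 else -1)
        table = {
            (-1,  1): "peak",
            (-1,  0): "ridge",
            (-1, -1): "saddle ridge",
            ( 0,  0): "flat",
            ( 0, -1): "minimal",
            ( 1,  1): "pit",
            ( 1,  0): "valley",
            ( 1, -1): "saddle valley",
        }
        # (H = 0, K > 0) cannot occur for a real surface, hence the default.
        return table.get((h, k), "undefined")

    # Example: negative mean curvature with positive Gaussian curvature is a peak.
    print(surface_type(-0.02, 0.001))   # -> "peak"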

3. Overview of approaches for object recognition

Wunsch and Hirzinger [3] use a form of the iterative closest point algorithm for registering a three-dimensional CAD model to a two-dimensional camera image. The key idea is to relate image feature points to model data in 3-D space rather than in the image plane, using an inverse perspective approach. Over several iterations, the model is fitted onto a wireframe of the image derived by extracting edge segments. This method could prove computationally intensive, although the work presented took 5.5 s on a Silicon Graphics Indigo workstation.

Keat [1] used a laser line triangulation technique for collecting 3-D data. Before the object could be presented to a recognition system, the data was converted to a format independent of origin and viewpoint. This was done by deriving the HK map comprising the mean (H) and Gaussian (K) curvatures (as described in section 2). To remove the positional and scale variance, a centre-of-gravity approach was used in the retinal preprocessing stage. For object recognition, a modified version of ART-2a [4], a type of ANN, was used, with a pattern rotation unit and a medium-term memory layer added; rotational variance was thus removed within the ANN architecture. The input range map was mapped onto the master range image so that rogue features could be determined.

Parametric geons have been used by Wu and Levine [5] as a coarse description of object components for qualitative object recognition. Parametric geons are seven qualitative shape types defined by parameterised equations which control the size and the degree of tapering and bending. Model recovery is performed by model fitting and selection, minimising an objective function that measures the similarity in both size and shape between models and objects. The models used are obtained by fusing multiple-view range data.

Grossberg and Bradski [6] have proposed VIEWNET, a neural architecture for learning to recognise 3-D objects from multiple 2-D views. Preprocessing is done by a CORT-X2 filter to suppress noise. A log-polar transform is taken with respect to the centroid of the resulting figure and then recentred to achieve scale and rotation invariance. The invariant images are coarse coded, and the compressed codes are input to a supervised learning system based on the Fuzzy ARTMAP algorithm, which learns 2-D view categories. Voting based on the unordered set of stored categories determines object recognition. This method has the disadvantage of having to gather and store hundreds of images of an object from various angles.

Park and Cannon [7] have used a profile-network-based object recognition method. In the off-line training stage, a multiview model of the 3-D CAD object is generated using a tessellated sphere whose surface is divided into approximately identical triangles; the view from anywhere within a triangle is assumed to be the same. After the boundaries of the object are extracted from 2-D views of the 3-D CAD model, they are represented using a centroidal profile (CP) feature. The CP is an ordered sequence of distances between the centroid and points on the boundary, and is independent of translation and rotation once the starting point is specified. The input CP pattern is applied sequentially to the neural networks in the library. For a given image of the object, the viewpoint of the image will nearly match one of the finite viewpoints from the tessellated sphere.

The descriptions above represent only some of the typical approaches that have been used by researchers and give an idea of how vast the subject of object recognition is.
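As an illustration of the centroidal profile feature used by Park and Cannon [7], the sketch below computes a CP from an ordered object boundary. The fixed-length resampling and the normalisation by the maximum radius are our own illustrative choices, not details taken from [7].

    import numpy as np

    def centroidal_profile(boundary, n_samples=64):
        """Centroidal profile (CP): an ordered sequence of distances from the
        shape centroid to points sampled along the boundary.

        boundary : (N, 2) array of ordered (x, y) boundary points.
        Returns an (n_samples,) vector, normalised so the largest radius is 1,
        which additionally makes the feature scale invariant."""
        boundary = np.asarray(boundary, dtype=float)
        centroid = boundary.mean(axis=0)
        radii = np.linalg.norm(boundary - centroid, axis=1)
        # Resample to a fixed length so profiles of different objects are comparable.
        idx = np.linspace(0, len(radii) - 1, n_samples).astype(int)
        profile = radii[idx]
        return profile / profile.max()

Because the profile starts from an arbitrary boundary point, matching must either fix the starting point, as in [7], or compare all cyclic shifts of the two profiles.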

4. Neural Networks

Some of the approaches described above utilise neural networks for object recognition. Artificial neural networks have been found to be good at classification and adaptation even in the presence of noise. Two architectures which can be used for object recognition are described below.

4.1. Adaptive Resonance Theory (ART)

The ART-1 [8] architecture consists of two parts, the attentional subsystem and the orienting subsystem. The attentional subsystem is made up of two layers of nodes, F1 and F2, as shown in figure 1. In an ART network, information in the form of processing-element output reverberates back and forth between the layers. If a stable oscillation, or resonance, takes place, learning or adaptation can occur. A resonant state can be attained in one of two ways. If the network has previously learned to recognise an input vector, a resonant state will be achieved quickly when that input vector is presented; during resonance, the adaptation process reinforces the memory of the stored pattern. If the input vector is not immediately recognised, the network rapidly searches through its stored patterns looking for a match. If no match is found, the network enters a resonant state whereupon the new pattern is stored for the first time. Thus, the network responds quickly to previously learned data, yet remains able to learn when novel data are presented. The activity of a node in the F1 or F2 layer is called short-term memory (STM); the adaptive weights are called long-term memory (LTM). The vigilance parameter of the orienting subsystem determines how much mismatch will be tolerated. The ART-1 model can use only binary data, but a later version, the ART-2a model, can also handle other values. Some preprocessing to achieve invariance, such as adding a pattern rotation stage, has to be done with this type of ANN.

Figure 1 - ART-1 module: attentional subsystem with STM layers F1 and F2, LTM weights, and an orienting subsystem acting on the input pattern.
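The sketch below captures the search and vigilance cycle of ART-1 on binary vectors in simplified algorithmic form. The choice function and fast-learning AND rule follow the usual formulation of [8], but the code omits the full F1/F2 dynamics and is not the implementation used in our experiments; input patterns are assumed to contain at least one active bit.

    import numpy as np

    def art1_train(patterns, rho=0.4, alpha=0.001):
        """Simplified ART-1: cluster binary vectors into self-organised categories.

        patterns : iterable of 1-D binary (0/1) numpy arrays.
        rho      : vigilance parameter - how much mismatch is tolerated.
        Returns the list of learned prototype (LTM) vectors."""
        prototypes = []
        for I in patterns:
            I = np.asarray(I, dtype=int)
            # Rank existing categories by the choice function.
            order = sorted(range(len(prototypes)),
                           key=lambda j: -np.sum(I & prototypes[j]) /
                                          (alpha + np.sum(prototypes[j])))
            for j in order:
                match = np.sum(I & prototypes[j]) / np.sum(I)
                if match >= rho:                        # vigilance test passed: resonance
                    prototypes[j] = I & prototypes[j]   # fast learning (AND rule)
                    break
            else:                                       # no category matched: store new pattern
                prototypes.append(I.copy())
        return prototypes

A higher vigilance rho forces finer categories; the value 0.4 used here mirrors the vigilance factor reported in section 7.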

4.2. Dynamic Link Architecture

von der Malsburg et al. [9] concluded that, to achieve invariant pattern recognition, a network must explicitly encode neighbourhood or topological relations between a pattern's features. The dynamic link architecture uses the topography constraint that a local feature and its neighbours are very likely to undergo almost the same transformation when the stored pattern is matched onto its counterpart in the perceived pattern. There are two layers, the model layer and the image layer. An object to be memorised is extracted from an image as a model graph by placing a rectangular grid of points over the object and recording the features. The image and the models are represented as neural layers of local features, i.e. two patterns, each consisting of N x N local features, arranged in two 2-D layers I and M as shown in figure 2. The feature vectors used are jets located at each point of a grid of vertices; the features can be extracted from the image by applying filters based on Gabor-type wavelets (as described in section 2). Neighbouring vertices are connected by links (correlations), which encode information about local topology. Hence, vertices refer to locations, carry jets as attributes, and thus form local descriptors of object structure. To recognise an object, the system attempts to competitively match all stored object models against the jet array in the image domain, a process called dynamic link matching; the winning model is identified as the recognised object. In one version, fast dynamic link matching, a blob or attention window is moved in the image and model layers to reinforce or weaken the connectivity matrix between the two layers during matching. Another version, elastic graph matching, varies the image graph to minimise a cost function which depends on the normalised dot product of the jets and the squared differences between corresponding edges in the model and image graphs.

Figure 2 - Dynamic link architecture: model layer M matched against image layer I.
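A minimal sketch of the quantities involved in elastic graph matching is given below, assuming each jet is a vector of Gabor filter magnitude responses and that image jets have already been sampled at the candidate positions currently matched to the model vertices. The function names and the weighting factor lam are illustrative assumptions rather than the exact cost function of [9].

    import numpy as np

    def jet_similarity(j1, j2):
        """Normalised dot product between two Gabor jets (vectors of filter
        magnitude responses taken at one image point)."""
        return float(np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2)))

    def graph_cost(model_jets, image_jets, model_edges,
                   model_points, image_points, lam=0.1):
        """Elastic graph matching cost: high jet similarity is rewarded,
        distortion of edges between model and image graphs is penalised.

        model_edges : list of (a, b) vertex index pairs of the model graph.
        lam         : illustrative weight of the geometric (distortion) term."""
        sim = sum(jet_similarity(model_jets[v], image_jets[v])
                  for v in range(len(model_jets)))
        dist = sum(np.sum(((image_points[a] - image_points[b]) -
                           (model_points[a] - model_points[b])) ** 2)
                   for a, b in model_edges)
        return lam * dist - sim     # lower cost = better match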

5. Visual system configuration

It is our aim to integrate the visual recognition system with an industrial robot (Unimate PUMA Mark III), which has two basic units: a controller and a robot arm. The system software that controls the robot is called VAL II, and a teach pendant can be used to move the tool to desired locations.

Visual servo systems typically use one of two camera configurations: mounted on the end-effector, or fixed in the workspace. The first is often called an eye-in-hand configuration. Motion of the manipulator causes changes to the image observed by the vision system, so the specification of an image-based visual servo task involves determining an appropriate error function e such that, when the task is achieved, e = 0. It is then necessary to relate changes in the image feature parameters to changes in the position of the robot arm. In the second configuration the camera is fixed, so if the objects are placed on a table and the z coordinate is assumed constant, coordinates in the image plane can easily be related to world coordinates.

Recalibration is another issue. Owing to random disturbances to the system, or to the need to reposition the camera or move the robot base, a lengthy recalibration process usually has to be carried out. Kumaresan and Li [10] have described a technique, based on a mathematical formulation of inverse mapping and fuzzy logic, which eliminates the need for recalibration when the vertical position of the camera changes.
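For the fixed-camera configuration, the sketch below relates image-plane coordinates to robot world coordinates under the assumption of a constant table height, by fitting a planar affine map to a few reference points. The calibration routine is an illustration only and is unrelated to the recalibration-free scheme of [10].

    import numpy as np

    def calibrate_planar(pixel_pts, world_pts):
        """Least-squares fit of an affine map  world = [u, v, 1] @ P
        from three or more non-collinear reference points on the work table."""
        pixel_pts = np.asarray(pixel_pts, dtype=float)
        world_pts = np.asarray(world_pts, dtype=float)
        X = np.hstack([pixel_pts, np.ones((len(pixel_pts), 1))])   # rows [u, v, 1]
        P, *_ = np.linalg.lstsq(X, world_pts, rcond=None)          # P is 3 x 2
        return P

    def pixel_to_world(P, u, v, z_table=0.0):
        """Map an image point to robot world coordinates (z assumed constant)."""
        x, y = np.array([u, v, 1.0]) @ P
        return x, y, z_table

In the eye-in-hand configuration such a static mapping is not available; the task is instead posed as driving the image-feature error e to zero as the manipulator moves.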

6. Performing tasks

Once the visual system is integrated with the robot, it can be used to perform assembly tasks. Within the defined workspace, the vision system will be able to identify different objects independently of their placement and provide coordinates for the robot arm to approach and pick each object. Tasks such as peg-in-hole insertion can then be carried out. The vision system will also be able to identify the separate components of mated objects, enabling the robot to perform disassembly. For efficient assembly, tactile sensing can also be integrated: once the robot arm carrying a component is close enough to the object to be mated with, contact force sensing can be used for further adjustments.

7. Experimentation and Results

In initial experiments on object recognition, the ART-1 algorithm was used. A video frame-grabbing card with frame buffers was plugged into the PC and a CCD camera was attached to obtain grey-level intensity data. Simple objects, such as a floppy disk, a pen and a round piece of cork, were used against a white background for ease of segmentation. Once a frame was captured, it was thresholded and edges were detected by convolution. The coordinates of each edge point were stored in a dynamically allocated array. To locate the object, the maximum and minimum x and y values in the array were found, giving a square surrounding the object. This square was subdivided into a 20 x 20 grid and each cell was assigned a value of 1 or 0 depending on whether the number of edge points it contained exceeded a threshold. The object could thus be represented by a 20 x 20 binary matrix in which the 1s mark the edges of the object. This binary data was then used to train the neural network (ART-1). Each of the three objects was stored using the method described above. In the recognition phase, when one of the objects was presented again, the node representing the correct object was selected as the winning node. A vigilance factor of 0.4 was used. Smaller objects of the same type were also correctly recognised, and the recognition is invariant to translation. The success of the recognition depends mainly on proper segmentation of the objects.
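A sketch of the preprocessing pipeline described above is given below, assuming an 8-bit grey-level frame. The Laplacian edge kernel, threshold values and cell-count criterion are illustrative stand-ins for the values actually used in the experiments.

    import numpy as np
    from scipy.ndimage import convolve

    def frame_to_art1_input(frame, grid=20, grey_thresh=128,
                            edge_thresh=50, cell_thresh=1):
        """Turn a grey-level frame into the 20 x 20 binary edge matrix used to train ART-1."""
        # 1. Threshold against the white background (object assumed darker).
        binary = (frame < grey_thresh).astype(float)
        # 2. Detect edges by convolution (Laplacian kernel as an example).
        kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
        edges = np.abs(convolve(binary * 255.0, kernel)) > edge_thresh
        ys, xs = np.nonzero(edges)
        if len(xs) == 0:
            return np.zeros((grid, grid), dtype=int)
        # 3. Bounding square around the object (min/max of edge coordinates).
        x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
        side = max(x1 - x0, y1 - y0) + 1
        # 4. Subdivide into a grid x grid matrix; a cell is 1 if it holds enough edge points.
        cells = np.zeros((grid, grid), dtype=int)
        gx = np.clip(((xs - x0) * grid) // side, 0, grid - 1)
        gy = np.clip(((ys - y0) * grid) // side, 0, grid - 1)
        np.add.at(cells, (gy, gx), 1)
        return (cells >= cell_thresh).astype(int)

The resulting 20 x 20 matrix can be flattened to a 400-element binary vector before being presented to the ART-1 network.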

8. Conclusions

This paper has addressed the various approaches and issues relating to object recognition and robotic assembly. The use of artificial neural networks in this area is increasing rapidly. In the experiments above, only the edges of simple objects were used. Proper segmentation of the object from the background is critical, and spurious data outside the object boundaries can cause problems for recognition. For robust recognition, features other than edges are also needed. Research is ongoing to extract multiple features and their relations to one another, and to use this data to train a neural network. A good recognition system would allow the robot to perform assembly tasks efficiently.

References

1. J. Keat; Adaptive Invariant Recognition and Assessment of Free-form Objects: A Connectionist Approach, Ph.D. thesis, The Nottingham Trent University, 1996
2. M. Potzsch, N. Kruger and C. von der Malsburg; Improving Object Recognition by Transforming Gabor Filter Responses, Network: Computation in Neural Systems, vol. 7(2), pp 341-347, 1996
3. P. Wunsch and G. Hirzinger; Registration of CAD-Models to Images by Iterative Inverse Perspective Matching, Proceedings of the 13th ICPR, pp 77-83, 1996
4. G. Carpenter, S. Grossberg and D. Rosen; ART-2A: An Adaptive Resonance Algorithm for Rapid Category Learning and Recognition, Neural Networks, vol. 4, pp 493-504, 1991
5. K. Wu and M. Levine; Recovering Parametric Geons from Multiview Range Data, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 159-166, 1994
6. S. Grossberg and G. Bradski; VIEWNET: A Neural Architecture for Learning to Recognize 3-D Objects from Multiple 2-D Views, SPIE vol. 2353, Intelligent Robots and Computer Vision, pp 266-275, 1994
7. K. Park and D. Cannon; Recognition and Localization of a 3D Polyhedral Object Using a Neural Network, Proceedings of the IEEE International Conference on Robotics and Automation, vol. 4, pp 3613-3618, 1996
8. G.A. Carpenter and S. Grossberg; A Massively Parallel Architecture for a Self-organising Neural Pattern Recognition Machine, Computer Vision, Graphics, and Image Processing, vol. 37, pp 54-115, 1987
9. M. Lades, J. Buhmann, C. von der Malsburg and R. Wurtz; Distortion Invariant Object Recognition in the Dynamic Link Architecture, IEEE Transactions on Computers, vol. 42(3), pp 300-311, 1993
10. S.S. Kumaresan and H.H. Li; Hand-Eye Coordination of a Robot Manipulator Based on Fuzzy Logic, Proceedings of ICIP-94, vol. 3, pp 221-225, 1994