Parallel Matching of 3D Articulated Object Recognition by

Professor Patrick Wang, Ph.D. and IAPR Fellow College of Computer Science Northeastern University Boston, MA 02115 (617)373-3711(O), (617)373-5121(fax), email: [email protected] http://www.ccs.neu.edu/home/pwang/

Summary

In dealing with large-volume image data, sequential methods are usually too slow to be satisfactory. This paper introduces a new system employing parallel matching for high-level recognition of 3D articulated objects. A new structural strategy using linear combination and parallel graph-matching techniques is presented for 3D polyhedral objects representable by 2D line drawings. It addresses one of the basic concerns of diffusion tomography complexity: patterns can be reconstructed from fewer projections, and 3D objects can be recognized from a few learning sample views. It also improves on several current methods while overcoming their drawbacks. Furthermore, it can distinguish very similar objects and is more accurate than other methods in the literature. An on-line web system for understanding and recognizing 3D objects is also illustrated.

Keywords: image visualization and understanding, 3D articulated objects, pattern recognition, linear combination, diffusion tomography, parallel graph matching, finite representation


1 Introduction and Objectives

In recent years, there has been growing interest in 3D object recognition research. Not only is it interesting and challenging in theory, it can also be applied to many realistic problems in industrial parts inspection, military target recognition, CAD/CAM engineering design, scientific experiments, and document image analysis [5,11,18,22,33,41]. To date, however, most research has focused on rigid objects only, with large learning sample sizes and low speed, and cannot distinguish similar patterns [2,25]. With the rapid development of modern computer technologies, there is an increasing interest in and need for recognizing articulated objects, which are widely used in industrial parts inspection and in other engineering, military, and commercial applications. It is well known that line drawing images, also called the "noble class of images" [31], in which many important and essential phenomena of 3D objects in the real world hold, provide a very effective and practical way to describe the 3D shape of an object. These drawings can normally be organized into two or more distinct views, known as characteristic views or aspects, in which no one view by itself is sufficient to completely characterize the object being represented [20]. One of the most challenging and difficult problems is: how can we visualize and interpret a line drawing image of a 3D object? How can a 3D object under various rotations and topological scalings be properly represented and recognized?

There have been several developments in engineering drawing interpretation over the past twenty years. Perhaps the earliest work was reported in 1971 [12]. Later, a bottom-up approach to interpretation was developed [42] and was extended to include objects with curved surfaces [29]. More recently, the bottom-up technique has been applied to interpreting paper-based drawings [24]. In 1984, a method was developed for correcting misalignments in single-view drawings with a non-special viewpoint; yet it does not appear to be extensible to multiple views with special viewpoints [31]. At the same time, there have been various methods in computer vision and object recognition. An interesting approach was presented for recognizing 3D objects using a set of 72 x 72 = 5184 sample views for learning [32]. Further effort reduced this to 100 views [25]. More recently, a powerful new computational method called linear combinations (LC), using alignment techniques, was developed at MIT [2,34,35]. It involves easy computations and needs fewer learning sample views, in contrast to methods that require a huge set of learning samples even for describing and recognizing a single object [8,9,14,17], and to other methods that require more complicated primitive structures such as ellipsoids and bent cylinders [10]. Yet it has many

limitations and may misrecognize invalid objects and reject valid ones. Such difficulties were overcome by another approach using parallel graph matching by Wang [39], with several explicit rules for learning and recognition of concave rigid objects from very few sample views. Other recent developments and surveys can be found in [4,7,24].

In this research we tackle a more general problem involving articulated objects. This class of objects is of special interest and importance since it includes most industrial robots and man-made factory tools as well as military targets. For example, recognizing a tank involves different views of the model and the various statuses/ranges of each movable component, e.g. how far the cannon can rotate. How can such a process be automated, so that a computer can visualize, understand, and interpret such images? Though interesting and important, recognizing such objects by computer is difficult and challenging. The first attempt to tackle such problems was in [6], using symbolic reasoning in the ACRONYM system. An interpretation tree approach dealing with 2D objects with rotating subparts was introduced in [16] and extended to handle 3D articulated objects such as staplers [15]. In [3], different aspects of this area were reviewed, and the generalized Hough transform (GHT) was extended to recognize single-joint articulated objects such as pairs of scissors. Yet it remains to be seen how to overcome the added limitations of the GHT method, as usually happens when an existing object recognition technique is extended to handle articulated objects [3,13]. The method proposed in [37,40] uses very few learning samples and can handle articulated objects, but it only works for wire-frames and needs the 3D coordinates of the objects. Our main goal is to present a new approach that overcomes the above-mentioned drawbacks while maintaining the advantages of key methods [1,14,19,21,23,27,28,36].

2 Extended Linear Combination Method (LC)

The linear combination method is based on the observation that novel views of an object can be expressed as linear combinations of the stored views (from learning). It identifies objects by constructing custom-tailored templates from stored two-dimensional image models. The template-construction procedure simply adds together weighted coordinate values from corresponding points in the stored two-dimensional image models. Here, a model is a representation in which:
* an image consists of a list of feature points observed in the image;
* the model consists of several images (minimally three for polyhedra).

An unknown object is matched with a model by comparing the points in an image of the unknown object with a template-like collection of points produced from the model. In general, an unknown object can be arbitrarily rotated, arbitrarily translated, and even arbitrarily scaled relative to an arbitrary original position. From basic graphics knowledge, an arbitrary rotation and translation of an object transforms the coordinates of any point on that object according to the following equations:

$$X' = r_{xx}(\theta)X + r_{yx}(\theta)Y + r_{zx}(\theta)Z + t_x$$
$$Y' = r_{xy}(\theta)X + r_{yy}(\theta)Y + r_{zy}(\theta)Z + t_y$$
$$Z' = r_{xz}(\theta)X + r_{yz}(\theta)Y + r_{zz}(\theta)Z + t_z$$

where $r_{ij}(\theta)$ (i, j = x, y, z) is the parameter giving how much the i coordinate of a point before rotation contributes to the j coordinate of the same point after rotation, and $t_s$ (s = x, y, z) is determined by how much the object is translated. Based on S. Ullman's observation that three images, each showing four corresponding vertices, are almost enough to determine the vertices' relative positions, at least three model images are needed. These three model images, together with the unknown image, yield the following equations relating model and unknown-object coordinate values to the unrotated, untranslated coordinates X, Y, Z:

$$X_{I_1} = r_{xx}(\theta_1)X + r_{yx}(\theta_1)Y + r_{zx}(\theta_1)Z + t_x(\theta_1)$$
$$X_{I_2} = r_{xx}(\theta_2)X + r_{yx}(\theta_2)Y + r_{zx}(\theta_2)Z + t_x(\theta_2)$$
$$X_{I_3} = r_{xx}(\theta_3)X + r_{yx}(\theta_3)Y + r_{zx}(\theta_3)Z + t_x(\theta_3)$$
$$X_{I_0} = r_{xx}(\theta_0)X + r_{yx}(\theta_0)Y + r_{zx}(\theta_0)Z + t_x(\theta_0)$$

These can be viewed as four equations in the four unknowns X, Y, Z and $X_{I_0}$, and can be solved to yield $X_{I_0}$ in terms of $X_{I_1}$, $X_{I_2}$, $X_{I_3}$ and a collection of four constants:

$$X_{I_0} = \alpha_x X_{I_1} + \beta_x X_{I_2} + \gamma_x X_{I_3} + \delta_x$$

where $\alpha_x$, $\beta_x$, $\gamma_x$ and $\delta_x$ are the constants required for x-coordinate-value prediction, each of which can be expressed in terms of the r's and t's. To determine the values of the constants, corresponding points are needed: since there are four constants, four equations are needed, and therefore four feature points are required in every image. Similarly, we can build equations for the y coordinate by the same method used for the x coordinate, producing another set of constants $\alpha_y$, $\beta_y$, $\gamma_y$ and $\delta_y$ to predict the y coordinate of any point in the unknown image.

The new idea is to partition each input image into two portions, the main portion and the articulated portion. The rationale is to use as few projections as possible for learning and

recognition, to use graphs for pattern representation, and to gain speed through parallel matching. Each portion needs projections from two directions for every characteristic view in learning; only one view is needed for recognition, though. If each portion of an input is a linear combination of some object model's views, it is accepted; otherwise it is rejected. Human interaction is involved in learning/recognition. Some examples of object projections and pattern reconstructions are illustrated in Figure 2.1, and some mathematical formulations/computations are described below.
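To make the constant-solving step concrete, the following is a minimal sketch in Python with NumPy, an illustration rather than the author's implementation. Given three stored model views and an unknown view with at least four corresponding feature points each, it solves for the four constants per coordinate and predicts where a further point should appear; all names are illustrative.

```python
import numpy as np

def lc_constants(m1, m2, m3, unknown):
    """Solve for the LC constants (alpha, beta, gamma, delta) per coordinate,
    so that X_I0 = alpha*X_I1 + beta*X_I2 + gamma*X_I3 + delta.
    m1, m2, m3, unknown: (n, 2) arrays of corresponding feature points,
    with n >= 4 since there are four unknowns per coordinate.
    Returns a (2, 4) array: row 0 for x, row 1 for y."""
    m1, m2, m3, unknown = (np.asarray(a, float) for a in (m1, m2, m3, unknown))
    consts = np.empty((2, 4))
    for c in (0, 1):  # x coordinate, then y coordinate
        # One equation per correspondence: [X_I1, X_I2, X_I3, 1] . consts = X_I0
        A = np.column_stack([m1[:, c], m2[:, c], m3[:, c], np.ones(len(m1))])
        consts[c], *_ = np.linalg.lstsq(A, unknown[:, c], rcond=None)
    return consts

def lc_predict(consts, p1, p2, p3):
    """Predict a point's position in the unknown view from its positions
    p1, p2, p3 in the three model views."""
    x = consts[0, :3] @ np.array([p1[0], p2[0], p3[0]]) + consts[0, 3]
    y = consts[1, :3] @ np.array([p1[1], p2[1], p3[1]]) + consts[1, 3]
    return x, y
```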

Figure 2.1 An example of two projections of an aspect view, which form the basis of linear combination for object recognition.

A linear combination (LC) scheme uses two images in the same characteristic-view class to decide whether an input is acceptable: if the input image is an LC of two images in the model base, it is accepted, else rejected. This is equivalent to solving the simultaneous equations for the following two vectors:

$$\hat{x} = a_1 x_1 + a_2 y_1 + a_3 x_2 \qquad \hat{y} = b_1 x_1 + b_2 y_1 + b_3 x_2$$

We use the following heuristic rules for learning and recognition [43]:

Rule 1: Every characteristic view of an object in learning should include all sides visible from that angle and should cover each such side's maximum area as fully as possible.

Rule 2: No dangling line should be seen through in recognition, i.e. it should not cross the touching boundary line(s).

Rule 3: Incomplete lines, or dangling lines with no touching boundary lines, are not acceptable unless they match corresponding lines and nodes in the learning samples.

Rule 4: Select nodes, including dangling nodes, in the representing graph to be correspondence points for matching by linear combinations (LC).

Rule 5: Avoid views in which a side is just barely visible, and near-unstable views that change dramatically with a tiny rotation (otherwise the view will be ambiguous or ill-posed).
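Rule 4 feeds directly into the LC acceptance test. Below is a minimal sketch of that test, reusing lc_constants and lc_predict from above; the five-point layout (four points to fit, one held out to check) and the 0.6 threshold follow the worked example in Section 3 and are assumptions, not fixed parts of the method.

```python
def portion_matches(m1, m2, m3, unknown, threshold=0.6):
    """Accept one portion (main or articulated) if the LC fitted on the
    first four correspondence points predicts the fifth, held-out point
    to within `threshold` in both coordinates. Assumes each view records
    at least five correspondence points."""
    consts = lc_constants(m1[:4], m2[:4], m3[:4], unknown[:4])
    px, py = lc_predict(consts, m1[4], m2[4], m3[4])
    return (abs(px - unknown[4][0]) < threshold and
            abs(py - unknown[4][1]) < threshold)
```

An input image is then accepted only if both its main portion and its articulated portion pass this test against model views of the same characteristic-view class.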

Figure 2.2 An example of the Hough transform of an input image (a 3D articulated object) to a line drawing and skeleton. In this article, smoothed and thinned lines such as those shown below are used for illustration.

[Figure 2.3: line drawings of a closet labeled with vertices a through l, showing the measurement change as the door rotates, the door (articulated portion), the main body, and the dangling points (nodes) and lines.]
Figure 2.3 An example of articulated feature extraction

Notice that the procedure is divided into two parts: one handles the main portion, the other the articulated portion. The whole object is represented by a graph. In the learning process, one can interactively open and close the door (the articulated portion), thereby distinguishing the two portions by the different measurements of the angle between the door and the main body. The shaded area of the subgraph indicates the articulated portion, and the rest is the main body. Each portion is then represented by its own graph. This heuristically generated graph representation is used in the pattern recognition stage as well as in object understanding and visualization [43].
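As a data-structure sketch, the two-portion representation might be stored as below. The node labels follow Figure 2.3, but the edges shown are hypothetical, since the exact adjacency is not recoverable here; the point is that the main body and the articulated portion are kept as separate labeled subgraphs, with hinge and dangling nodes marked explicitly.

```python
# Hypothetical closet graph in the spirit of Figure 2.3. Node labels a-l
# follow the figure; the adjacency lists are illustrative only.
closet = {
    "main": {                      # main-body subgraph (adjacency lists)
        "a": ["b", "e"], "b": ["a", "c", "h"], "c": ["b", "d"],
        "e": ["a", "f"], "f": ["e", "g", "i"], "g": ["f"],
        "h": ["b", "i"], "i": ["f", "h"],
    },
    "articulated": {               # door subgraph
        "d": ["c", "j"], "j": ["d", "k"], "k": ["j", "l"], "l": ["k"],
    },
    "hinge": ["c", "d"],           # where the door meets the main body
    "dangling": ["k", "l"],        # dangling points (nodes) per the figure
}
```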


[Figure 2.4: In learning, two views are combined and the extracted graph is compiled into the library as (d−acef)(a−b,c−bi,e−g,f−gij)(b−h,i−h), with dangling nodes marked. In recognition, the input image's graph representation is split into its two portions, and the input is accepted because each portion is a linear combination of two images stored in the library under the same category and view.]
Figure 2.4 An example of pattern matching (learning and recognition)

3 Illustrative Examples

The Extended Linear Combination Method applies both to rigid objects, convex as well as concave, and to articulated objects with visible or invisible hinges. This covers a large number of real-world objects. The following gives two kinds of articulated objects and their experimental results.

Articulated Objects

An X-window system was developed for the experiments, in which about 50 objects are stored structurally in memory. Half of them are articulated 3D objects, including 3D rotating buckets with lids [44], rotating model tanks [45], and 3D rotating model pairs of scissors [46]. Details of these articulated objects and their views at angles ranging from 0 to 360 degrees of rotation around the x-, y-, and z-axes can be seen on the web pages in the above references. The following experiments summarize some of them.

Figure 3.1 Articulated object recognition: (i), (ii) and (iii) are model images of a closet, (iv) is another view of the same closet, and (v) is an image of a different closet.

The objects in Fig. 3.1 are closets whose doors can be rotated along their hinges. The three model images (i), (ii) and (iii) are set up by rotating both the main part and the articulated part separately. Fig. 3.1(iv) shows a different view of the same object, and the following data verify that it matches the model images. The matching procedure on the main part is as follows. The feature points on the main parts of the model images are:

Model 1: (-1.90,0.40), (1.00,-0.20), (1.70,-1.10), (0.70,-2.10)
Model 2: (-2.00,0.60), (0.70,-0.50), (1.70,-1.20), (0.30,-2.20)
Model 3: (-2.00,0.70), (0.40,-0.70), (1.70,-1.30), (-0.00,-2.40)

and the selected feature points on the main part of the unknown image are (-2.0,0.8), (0.3,-0.8), (1.7,-1.3), (-0.2,-2.4), and (-2.3,-0.8). After the equations are solved, the constant values are:

$$\alpha_x = -1.12,\quad \beta_x = 1.78,\quad \gamma_x = 0.31,\quad \delta_x = 0.05$$
$$\alpha_y = -0.32,\quad \beta_y = 0.09,\quad \gamma_y = 1.21,\quad \delta_y = 0.03$$

The predicted x and y coordinate values are -2.4030 and -0.7106, and the absolute differences between the predicted values and the original ones are 0.1030 and 0.0894. If we select the threshold to be 0.6, both differences are less than the threshold, so the main part of this unknown image matches those of the model images.

The matching procedure on the articulated portion is as follows. The feature points on the articulated portions of the model images are:

Model 1: (-1.90,0.40), (1.00,0.30), (2.00,1.20), (-0.80,1.30)
Model 2: (-2.00,0.60), (0.70,0.60), (2.10,1.40), (-0.40,1.40)
Model 3: (-2.00,0.70), (0.40,0.80), (2.20,1.70), (-0.10,1.60)

and the feature points on the articulated portion of the unknown image are (-2.0,0.8), (0.1,1.2), (2.2,2.1), (0.0,1.7), and (-0.95,1.0). After calculation, the constant values are:

$$\alpha_x = 0.62,\quad \beta_x = -2.00,\quad \gamma_x = 2.38,\quad \delta_x = -0.07$$
$$\alpha_y = 0.33,\quad \beta_y = 0.67,\quad \gamma_y = 1.00,\quad \delta_y = 1.60$$

The predicted x and y coordinate values are -1.4835 and 1.2656, and the absolute differences between the predicted values and the original ones are 0.5335 and 0.2656. With the threshold again at 0.6, both differences are less than the threshold, so the articulated portion of this unknown image also matches those of the model images. Therefore, we can conclude that this unknown image matches the model images.

Figure 3.1(v) is a closet which has the same main part as the model object; however, its door differs from the model's: it has a different size. Therefore, when this input image is recognized, the main part should match the models and the articulated portion should not, and hence the entire image will not match the model images. The following recognition data demonstrate this. Because the main part of this unknown image is the same as that in (iv), we know from the above calculation that it matches the main parts of the model images. Next, we focus on recognizing the articulated portion. The feature points on the articulated portion of this unknown image are (-2.0,0.8), (-1.0,1.0), (1.1,1.9), (0.0,1.7), and (-1.5,0.9). After calculation, the constant values are:

$$\alpha_x = 0.26,\quad \beta_x = -1.68,\quad \gamma_x = 2.62,\quad \delta_x = -0.61$$
$$\alpha_y = 0.50,\quad \beta_y = -0.92,\quad \gamma_y = 0.67,\quad \delta_y = 0.40$$

The predicted x value is -0.0273, and the absolute difference between the predicted value and the original one is 1.4727. With the threshold at 0.6, this difference is much greater than the threshold, so the articulated portion of this unknown image does not match that of the model images; therefore this unknown image does not match the model images.
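The main-part fit reported above can be reproduced directly. The short script below solves the 4x4 system over the four listed correspondences for each coordinate, under the assumption that the constants come from exactly these four points; predicting the held-out fifth point would additionally need its coordinates in the three model views, which are not listed in the text.

```python
import numpy as np

# Main-part correspondences quoted above (models 1-3 and the unknown view).
m1 = np.array([[-1.90, 0.40], [1.00, -0.20], [1.70, -1.10], [0.70, -2.10]])
m2 = np.array([[-2.00, 0.60], [0.70, -0.50], [1.70, -1.20], [0.30, -2.20]])
m3 = np.array([[-2.00, 0.70], [0.40, -0.70], [1.70, -1.30], [-0.00, -2.40]])
unknown = np.array([[-2.0, 0.8], [0.3, -0.8], [1.7, -1.3], [-0.2, -2.4]])

for c, name in ((0, "x"), (1, "y")):
    # Four equations in the four constants for this coordinate.
    A = np.column_stack([m1[:, c], m2[:, c], m3[:, c], np.ones(4)])
    consts = np.linalg.solve(A, unknown[:, c])
    print(name, np.round(consts, 2))
# Prints x [-1.12  1.78  0.31  0.05] and y [-0.32  0.09  1.21  0.03],
# matching the main-part constants reported above.
```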

4 Discussions and Conclusions

A new approach called the Extended Linear Combination Method (LCM) has been proposed. It has many advantages: (i) it can overcome ambiguity and misrecognition; (ii) it extends the conventional linear combination method's strengths to handle articulated object recognition, understanding, and interpretation; and (iii) it needs very few learning samples compared with other methods in the literature [2,3,4,6,8,17,21,25,27,33,34,36,39,42]. To test the extended LCM, I have constructed an X-window system including an image database of dozens of 3D patterns and objects, e.g. model tanks, dryers, baskets, and various 3D industrial parts, as shown in Figure 4.1.

Figure 4.1 An X-window system for testing "dryer" object recognition

Figure 4.2 Another example of a 3D articulated object, a "basket with lid".

An example of a model tank and its various views under rotation along the x-, y-, and z-axes is shown in Figure 4.3. Notice that a model tank has two joints which can be rotated independently of each other: one for the upper body of the tank, another for the cannon. Other single-joint articulated objects such as the dryer or the basket with lid use only one joint to rotate their articulated portion. For more details and illustrations, please refer to my web homepage at http://www.ccs.neu.edu/home/pwang/ObjectRecognition.

Figure 4.3 Some model objects (tanks) in the learning image database and their Hough transforms

So far the correspondence points are chosen manually, and success depends on the choice of threshold values. As with the drawbacks indicated in [2], certain objects are misrecognized or incorrectly rejected. In the future, we would like to explore automatic determination of correspondence

points, model bases, and threshold values, as well as performance evaluation, overcoming remaining difficulties, and handling perspective views. Notice that: (a) there are only a finite number of aspect (characteristic) views; (b) each aspect view is composed of only a finite number of shapes (triangles, 4-sided polygons, 5-sided polygons, etc.); (c) an intersection point of m sides cannot match an intersection point of n sides if m ≠ n; and (d) a joint point of an m-sided polygon cannot be a correspondence point of an n-sided polygon if m ≠ n. For example, in Figure 3.3(i), point a, which is a joint of two line segments, cannot be a correspondence point for point b, which is a joint of three line segments. Similarly, in another aspect (characteristic) view, Figure 3.3(ii), points d and f cannot be correspondence points, because point d is a joint of three 4-sided polygons, while point f is a joint of two 4-sided polygons and a triangle (i.e. fji). From the above observations, one can list some helpful necessary conditions for two points within the same aspect view to be correspondence points. These conditions will play an important role in the automatic selection of correspondence points in future research.
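Conditions (c) and (d) translate directly into a compatibility predicate that an automatic correspondence selector could use to prune candidate pairs. A minimal sketch follows, under an assumed node encoding of (number of incident line segments, side counts of the incident polygons):

```python
def may_correspond(node_a, node_b):
    """Necessary conditions (c) and (d): two points can be correspondence
    points only if they join the same number of sides and the same
    multiset of incident polygon sizes. Each node is encoded here as
    (degree, side_counts); the encoding is an assumption."""
    deg_a, polys_a = node_a
    deg_b, polys_b = node_b
    return deg_a == deg_b and sorted(polys_a) == sorted(polys_b)

# Points d and f from the example above: d joins three 4-sided polygons,
# f joins two 4-sided polygons and a triangle, so they cannot correspond
# even though both join three polygons (degree values are illustrative).
assert not may_correspond((3, (4, 4, 4)), (3, (4, 3, 4)))
```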

5 Bibliography

1. L. Baird and P.S.P. Wang, "3D Object Perception Using Gradient Descent", Int. Journal of Mathematical Imaging and Vision (IJMIV), 5, 111-117, 1995
2. R. Basri, "Viewer-centered representations in object recognition - a computational approach", Handbook of Pattern Recog. and Comp. Vision (eds C. Chen, L. Pau and P. Wang), WSP, (1993) 863-882
3. A. Beinglass and H. Wolfson, "Articulated object recognition, or how to generalize the generalized Hough transform", Proc. ICCVPR, 461-466, 1991
4. P. Besl and R. Jain, "3-d object recognition", ACM Computing Surveys, 17(1): 75-154, 1985
5. A.F. Bobick and R.C. Bolles, "The representation space paradigm of concurrent evolving object descriptions", IEEE-PAMI, v14 n2 (1992) 146-156
6. R. Brooks, "Symbolic reasoning around 3-d models and 2-d images", Artificial Intelligence, 17, 285-348, 1981
7. I. Chakravarty and H. Freeman, "Characteristic views as a basis for 3-d object recognition", SPIE: Robot Vision, 336, 37-45, 1982
8. C. Chien and J. Aggarwal, "Shape recognition from single silhouette", ICCV, 481-490, 1987

9. R.T. Chin and C.R. Dyer, "Model-based recognition in robot vision", ACM Computing Surveys, 18(1), 67-108, 1986
10. S.I. Dickinson, A.P. Pentland and A. Rosenfeld, "3-D Shape Recovery Using Distributed Aspect Matching", IEEE-PAMI, v14, n2, 174-197 (1992)
11. D. Dori, "Self-structured syntax-directed pattern recognition of dimensioning components in engineering drawing", Structured Document Analysis (eds H. Baird, H. Bunke, K. Yamamoto), Springer Verlag, (1992) 359-384
12. M. Ejiri, T. Uno, H. Yoda, T. Goto and K. Takeyasu, "An intelligent robot with cognition and decision-making ability", Proc. 2nd IJCAI, pp350-358, 1971
13. J.R. Engelbracht and F.M. Wahl, "Polyhedral object recognition using Hough-space features", PR, v21 n2 (1988) 155-167
14. B. Girod and S. Scherock, "Depth from defocus of structured light", SPIE Proc. Optics, Illumination and Image Sensing for Machine Vision IV (1989), v1194, 129-146
15. R. Goldberg and D. Lowe, "Verification of 3-d parametric models in 2-d image data", Proc. IEEE Workshop on Computer Vision, 255-267, 1987
16. W.E.L. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree", IEEE-PAMI, 9(4), 469-482, 1987
17. T.S. Huang and C.H. Lee, "Motion and structure from orthographic projections", IEEE-PAMI, 2(5), 536-540, 1989
18. M. Karima, K.S. Sadhal and T.O. McNeil, "From paper drawing to computer aided design", IEEE Trans. on Computer Graphics and Appl., pp27-39, Feb., 1985
19. R. Kasturi, S.T. Bow, W. El-Masri, J.R. Gattiker and U.B. Mokate, "A system for interpretation of line drawings", IEEE-PAMI, PAMI-12, 10, October 1990, pp978-992
20. B. Liu and P.S.P. Wang, "3D Articulated Object Recognition - A Case Study", SPIE'96, v. 2904 Robotics and Computer Vision, Boston, November 1996, pp14-24
21. T. Marill, "Emulating the human interpretation of line-drawings as 3-d objects", IJCV, v6-2, (1991) 147-161
22. E. Marti, J. Regincos, J. Lopez-Krahe and J. Villanueva, "A system for interpretation of hand line drawings as 3-d scene for CAD input", Proc. ICDAR'91, Saint-Malo, France, September 1991, pp472-481
23. I.V. Nagendra and U.G. Gujar, "3-D objects from 2-D orthographic views - a survey", Computers and Graphics, v12, no 1, pp 111-114, 1988
24. S. Negahdaripour and A.K. Jain, "Challenges in computer vision: future research directions", IEEE-CVPR'92, 189-198
25. T. Poggio and S. Edelman, "A network that learns to recognize 3D objects", Nature, (1990), 343: 263-266
26. T. Poggio, "3d object recognition: on a result by Basri and Ullman", TR 9005-03, IRST, Italy, 1990
27. R.N. Shepard and J. Metzler, "Mental rotation: effects of dimensionality of objects and type of task", J. Exp. Psychol.: Human Perception and Performance, 14: 3-11 (1988)
28. L. Stark and K.W. Bowyer, "Achieving generalized object recognition through reasoning about association of function to structure", IEEE-PAMI, v13 (1991) 1097-1104

29. H. Sakurai and D.C. Gossard, "Solid model input through orthographic views", Computer Graphics, v17, no 3, pp243-252, 1983
30. C.Y. Suen and P.S.P. Wang (eds), Advances of Thinning Methodologies in Pattern Recognition, WSP, 1994
31. K. Sugihara, Machine Interpretation of Line Drawings, MIT Press, Cambridge (1986)
32. Y.Y. Tang, C.D. Yan, M. Cheriet and C.Y. Suen, "Automatic analysis and understanding of documents", Handbook of Pattern Recog. and Comp. Vision (eds C.H. Chen, L. Pau and P.S.P. Wang), WSP, 1993, 625-654
33. D.W. Thompson and J.L. Mundy, "3D model matching from an unconstrained viewpoint", Proc. IEEE Int. Conf. on Robotics and Automation, Raleigh, NC, (1987) 208-220
34. S. Ullman and R. Basri, "Recognition by Linear Combinations of Models", IEEE-PAMI, v13, no 10, (1991) 992-1006
35. S. Ullman, "Aligning pictorial descriptions: an approach to object recognition", Cognition, 32(3): 193-254, 1989
36. D. Waltz, "Understanding line drawings of scenes with shadows", in The Psychology of Computer Vision (ed. P. Winston), McGraw-Hill, 1975, 19-92
37. P.S.P. Wang, "A parallel heuristic algorithm for 3d object pattern representation and recognition", Advances in Structural and Syntactic Pattern Recognition (ed. H. Bunke), WSP (1992) 411-420
38. P.S.P. Wang and Y.Y. Zhang, "A fast and flexible thinning algorithm", IEEE-Computers, v38, no 5, (1989) 741-745
39. P.S.P. Wang, "Recognizing line drawings in document analysis", SPIE v2056 Int. Robots and Comp. Vision XII, 1993, 120-131
40. P.S.P. Wang, "A heuristic parallel algorithm for line-drawing object pattern representation and recognition", in Advances in Image Processing (ed. E. Dougherty), Marcel Dekker, (1994) 197-222
41. P.S.P. Wang, "3D Line Image Analysis - A Heuristic Parallel Approach", Int. J. of Information Science, 81/3-4, 155-175, 1994
42. P.S.P. Wang, "Machine visualization, understanding and interpretation of polyhedral objects in document analysis and recognition", Proc. IEEE-ICDAR'93, Japan, 1993, 882-885
43. P.S.P. Wang, "Analysis, Learning and Recognition of Articulated Line Drawing Images", Int. J. of CSIM, accepted, to appear (1997)
44. P.S.P. Wang, http://www.ccs.neu.edu/home/pwang/ObjectRecognition/bucket.html
45. P.S.P. Wang, http://www.ccs.neu.edu/home/pwang/ObjectRecognition/tank.html
46. P.S.P. Wang, http://www.ccs.neu.edu/home/pwang/ObjectRecognition/scissor.html
