Object Recognition using Bayesian Networks for Augmented Reality Applications

Rodrigo L.S. Silva (1,2), Paulo S. Rodrigues (1), Diego Mazala (1), Gilson Giraldi (1)

(1) LNCC - National Laboratory for Scientific Computing, Av. Getulio Vargas, 333, 25651-070, Petropolis, RJ, Brazil
(2) LAMCE - UFRJ - COPPE, Mail Box 68552, 21949-900, Rio de Janeiro, RJ, Brazil

rodrigo,pssr,mazala,[email protected]

Abstract. One of the major problems in Augmented Reality (AR) is the tracking and registration of both cameras and objects. These tasks must be performed accurately to combine real and rendered scenes. In particular, the initialization of object tracking remains manual in most systems. This paper proposes the use of Bayesian Networks to perform the initialization phase of tracking. By recognizing the object, key points are extracted and this information is used to create generic markers in the scene. Then, a pose estimation algorithm is used to find the orientation of the real object, enabling the registration process for 3D objects.

1. Introduction

In AR applications, tracking and registration of both cameras and objects are required because, to combine real and rendered scenes, we must project graphical data representations at the right location in real scenes [Azuma, 1997]. Recently, visual tracking of image features has been used to establish camera pose and register the virtual object with the camera view [Seo and Hong, 2000, Kutulakos and Vallino, 1998]. In this paper, our emphasis is on the automation of the initialization phase of the tracking. This is performed by addressing the major aspects of the model-to-image registration problem: object recognition, feature detection, correspondence, and pose estimation. Our recognition process is performed using a Bayesian Network that combines features such as color and shape. Once the object geometry is recovered from the model database, we perform feature detection and extraction from a video sequence. The correspondence between model features and image features is established based on the same network. Finally, pose estimation is addressed through a version of the POSIT algorithm [Shahrokni et al., 2002].

This paper is organized as follows. Section 2 discusses our Bayesian network approach. Final comments and results are given in Section 3.

2. The Bayesian Network Model

In this work, we use a Bayesian Network to perform object recognition. This network stores geometric and color information about the objects of interest. Suitable features are then detected by exploiting a priori knowledge about the targets, such as corners and edges. Hypotheses are constructed as feasible configurations of the features. We then use the set of hypotheses to identify the transformation that best describes the object pose.

Bayesian networks are a graphical framework that represents the interdependence between the variables of a probability distribution [Jensen, 2001]. This distribution is represented by a directed acyclic graph, where the nodes are the random variables and the arrows are their relationships. The strength of these interdependences is expressed by conditional probabilities associated with the graph's arrows.

2.1. The Proposed Model

In this section, we show how to calculate the probability of a target object, represented by a database node Ij, given that a query Q was observed. This probability, denoted by P(Ij | Q), is calculated from two characteristics: color and shape (Figure 1).

Figure 1: The proposed Bayesian Network model. This model combines two features (color and shape), each represented by a set in the second row. Q is the query and the bottom row stands for the database objects.
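As a minimal illustration of the directed-graph representation described above, a network can be stored as a mapping from each node to its parents and a conditional probability table (CPT); the nodes, values, and probabilities below are hypothetical, not the paper's actual network:

```python
# A Bayesian network as a directed acyclic graph: each node maps to
# (parents, CPT), where the CPT maps a tuple of parent values to a
# distribution over the node's own values.
network = {
    "Color":  ((), {(): {"red": 0.3, "blue": 0.7}}),
    "Object": (("Color",), {
        ("red",):  {"box": 0.8, "ball": 0.2},
        ("blue",): {"box": 0.1, "ball": 0.9},
    }),
}

def joint_probability(network, assignment):
    """P(assignment) via the chain rule: the product, over all nodes,
    of P(node value | parent values)."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        parent_values = tuple(assignment[q] for q in parents)
        p *= cpt[parent_values][assignment[node]]
    return p

# P(Color=red, Object=box) = P(red) * P(box | red) = 0.3 * 0.8 = 0.24
p = joint_probability(network, {"Color": "red", "Object": "box"})
```

The conditional probabilities on the arrows are exactly the CPT entries; the graph structure determines which parent values each table is conditioned on.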

In this work, P(Ij | Q) is calculated according to the following equation:

P(Ij | Q) = η [1 − (1 − P(Cj | C)) × (1 − P(Sj | S))]    (1)

where P (Cj | C) is the probability of occurrence of the color Cj , given that the color C was observed in the query Q; and P (Sj | S) stands for the shape feature. The used model was simplified from [Coelho et al., 2004, Jensen, 2001] discarding the third evidence (texture). This equation can be used as a similarity measure between two images of objects regards to two characteristics, color and shape. It permits to consider each term separately setting to zero the other. For example, if we want to consider only color contributions, we should set P (Sj | S) = 0.

3. Results and Discussion

We have proposed a method for object recognition based on a Bayesian Network for augmented reality applications using generic markers in the scene. The first stage of our proposal consists of populating the network model with features of the objects of interest. Then, by detecting facets in the image using segment detection, we extract the features needed to establish the correspondence between the 2D and 3D points (Figure 2).

Figure 2: In this example, the front-faced box is the target object. (a) is the original frame and (b) shows the target's extraction. (c)-(d) show edge (feature) detection in different frames.
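The paper does not give the details of its segment detector, but the edge-detection step it relies on can be sketched with a generic gradient-magnitude filter on a tiny synthetic grayscale image (illustrative only, not the authors' implementation):

```python
def edge_magnitude(image):
    """Return a map of |horizontal gradient| + |vertical gradient|
    for the interior pixels of a 2D grayscale image (list of rows)."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central difference in x
            gy = image[y + 1][x] - image[y - 1][x]  # central difference in y
            edges[y][x] = abs(gx) + abs(gy)
    return edges

# A 5x5 image with a bright 3x3 'box' in the centre; high responses
# appear along the box border, which is where corner and edge
# features would be extracted.
img = [[255 if 1 <= y <= 3 and 1 <= x <= 3 else 0
        for x in range(5)] for y in range(5)]
edges = edge_magnitude(img)
```

The strong responses on the box boundary are the kind of features that are then matched against the model stored in the network.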

As our framework is still under development, we are investigating pose estimation methods [Alter, 1992, Wilson, 1993, Shahrokni et al., 2002] to be used in our application to accomplish the 3D registration process. We are currently working on optimizations of our model so that it can handle more complex objects and features.
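The POSIT-style methods cited above assume a weak-perspective (scaled orthographic) camera model, which can be sketched as follows; function and parameter names are illustrative and the rotation is given as three row vectors:

```python
def weak_perspective_project(points3d, rotation, translation, focal):
    """Project 3D points under the weak-perspective model: rotate and
    translate into camera coordinates, then project orthographically
    and scale uniformly by s = focal / Z_ref (the reference depth)."""
    s = focal / translation[2]
    projected = []
    for point in points3d:
        xc = sum(r * p for r, p in zip(rotation[0], point)) + translation[0]
        yc = sum(r * p for r, p in zip(rotation[1], point)) + translation[1]
        projected.append((s * xc, s * yc))
    return projected

# Identity rotation, object 10 units away, focal length 10 -> s = 1,
# so the image coordinates equal the (X, Y) of each point.
identity = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
pts = weak_perspective_project([(1, 2, 3)], identity, (0, 0, 10), 10.0)
```

A full POSIT implementation iterates between this projection model and updated depth estimates to converge on the full perspective pose; the sketch only shows the camera model it starts from.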

References

Alter, T. (1992). "3D Pose from Corresponding Points under Weak-Perspective Projection". Technical report, MIT Artificial Intelligence Lab.

Azuma, R. (1997). "A Survey of Augmented Reality". ACM SIGGRAPH, 1-38.

Coelho, T., Calado, P., Souza, L., Ribeiro-Neto, B., and Muntz, R. (2004). "Image Retrieval Using Multiple Evidence Ranking". IEEE Transactions on Knowledge and Data Engineering, 16(4):408-417.

Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. Statistics for Engineering and Information Science Series. Springer.

Kutulakos, K. and Vallino, J. (1998). "Calibration-Free Augmented Reality". IEEE Transactions on Visualization and Computer Graphics, 4(1):1-20.

Seo, Y. and Hong, K. S. (2000). "Calibration-Free Augmented Reality in Perspective". IEEE Transactions on Visualization and Computer Graphics, 6(4):346-359.

Shahrokni, A., Vacchetti, L., Lepetit, V., and Fua, P. (2002). "Polyhedral Object Detection and Pose Estimation for Augmented Reality Applications". In Proceedings of Computer Animation 2002, page 65. IEEE Computer Society.

Wilson, W. (1993). "Visual Servo Control of Robots Using Kalman Filter Estimates of Relative Pose". In Proc. IFAC 12th World Congress, pages 9-399 to 9-404, Sydney.