International Journal of Computer Vision 58(3), 209–226, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Baseline Detection and Localization for Invisible Omnidirectional Cameras

HIROSHI ISHIGURO
Department of Computer & Communication Sciences, Wakayama University, Japan
[email protected]

TAKUSHI SOGO
Department of Social Informatics, Kyoto University, Japan

MATTHEW BARTH
Department of Electrical Engineering, University of California, Riverside, USA

Received January 7, 2002; Revised October 30, 2003; Accepted November 5, 2003

Abstract. Two key problems for camera networks that observe wide areas with many distributed cameras are self-localization and camera identification. Although there are many methods for localizing the cameras, one of the easiest and most desirable is to estimate camera positions by having the cameras observe each other; hence the term self-localization. If the cameras have a wide viewing field, e.g. omnidirectional cameras, and can observe each other, the baseline directions between pairs of cameras and their relative locations can be determined. However, if the projection of a camera on the images of the other cameras is relatively small and not readily visible, the baselines cannot be detected. In this paper, a method is proposed to determine the baselines and relative locations of these “invisible” cameras. The method consists of two processes executed simultaneously: (a) statistically detecting the baselines among the cameras, and (b) localizing the cameras by using information from (a) and propagating triangle constraints. Process (b) performs the localization in the case where the cameras observe each other, and it does not require complete observation among the cameras. However, it does not work if many cameras cannot observe each other because of poor image resolution. The baseline detection of process (a) solves this problem. The methodology is described in detail and results are provided for several scenarios.

Keywords: omnidirectional camera, distributed vision, invisible camera, identification, localization, baseline detection, triangle constraint, constraint propagation

1. Introduction

In recent computer vision research, a number of multiple camera system applications and approaches have been proposed. One of the most popular applications of a multiple camera system is to monitor humans and other moving objects. Several monitoring systems have been developed in the VSAM project sponsored by DARPA of the USA (Vsam, 2001; Collins, 1999). The basic strategy that detects moving regions in images

by background subtraction and tracks them using calibrated cameras is not particularly novel compared with previous work. However, progress has been made in that the systems are far more robust through the use of contextual information (Medioni et al., 2001). As studied in VSAM and other similar projects, the most important aspect of a multiple camera system is to monitor a wide area. Previously, the authors have proposed a distributed omnidirectional vision system as a new information infrastructure for


monitoring dynamic worlds (Ishiguro, 1997; Ishiguro and Nishimura, 2001). The wide viewing field of the omnidirectional camera is suitable for monitoring tasks that require observing targets from various viewing directions. We have developed a real-time human tracking system which covers a wide area with a relatively small number of omnidirectional cameras. One of the key problems in multiple camera systems is camera localization. In wide-area monitoring systems where cameras are widely distributed, it is not easy to measure their locations precisely by hand. Therefore, systems that observe wide areas with many distributed cameras need a better, preferably automatic, method for camera localization. This paper proposes such an automatic camera localization method. Before discussing the method, let us review previous approaches to camera calibration and localization for multiple camera systems. Jain and his colleagues used a well-known method for camera localization and calibration (Jain and Wakimoto, 1995; Boyd et al., 1998). In this work, a target is used that can be observed from all cameras in the system. Torr and Murray developed a more robust and elegant method for the wide-baseline stereo calibration problem (Torr and Murray, 1997). Using precisely calibrated cameras, Kanade developed a multiple camera system that provides the best view of players in an American football stadium (Eyevision, 2001). In these research approaches, the purpose was to observe a relatively small area with multiple cameras, so all cameras can observe a common target for the calibration process. On the other hand, our purpose in this paper is to find the positions of widely distributed cameras for monitoring a wide area, not to reconstruct the precise geometry of targets. Real-time human tracking is one such application. In addition, it is not difficult to place the cameras at the same height in such applications. Therefore, the problem we should solve is to find camera positions in a wide 2-D space. The problem should be considered not as general camera calibration but rather as a labeling problem among cameras. Let us consider the problem we solve in this paper again. A system consisting of many cameras distributed in a wide space monitors dynamic events. For this purpose, an omnidirectional camera that has

a wide viewing field will be an ideal imaging device. Thus, the problem is considered as a localization problem of omnidirectional cameras in a distributed omnidirectional vision system (Section 2) where the cameras are placed at the same height. A simple method for automatic localization is to have the cameras observe each other and to find the locations from the angles between the cameras. Suppose there are three cameras A, B, and C observing each other. Each camera has two projections of the other cameras in its image. For example, camera C has two projections a and b of cameras A and B. From the distance between a and b, the angle at C between the directions to A and B can be measured. Each camera measures the angle between the other cameras in the same way. From the acquired angles, the triangle ABC can be determined up to a scale factor. In this paper, an automatic method is proposed for localizing cameras based on this simple idea. The fundamental idea is simple, but it is not easy to apply in a real-world multiple camera system. In a real-world system, two key difficulties arise:

1. In general, it is difficult to distinguish the camera projections when the system consists of identical cameras. Even if a camera observes all the projections of the other cameras, it is difficult to identify which camera is which, since they all have the same visual features. In this paper, we refer to this as the identification problem.
2. A more serious problem is that today's cameras are becoming increasingly smaller due to progress in CCD technology and associated circuitry. As a result, we cannot expect that all cameras are visible in the images of the other cameras. We call this the invisible camera problem.

Basically, we can decide the camera positions by solving the identification problem if the cameras observe each other, but we need to deal with the invisible camera problem if the camera projections on the images are very small. Therefore, we deal with the invisible camera problem and show how to find the baseline directions (directions to other cameras) in the images. In order to solve this problem, the system observes moving objects in the environment. If we detect the three baselines among three cameras by solving the invisible camera problem, we can decide the positions up to a scale factor. However, in the case where many cameras exist and the cameras observe each other, an algorithm that


solves the identification problem efficiently decides the camera positions. That is, two algorithms, one for the invisible camera problem and one for the identification problem, are executed simultaneously, and the algorithm for the invisible camera problem gives supplemental information to the algorithm for the identification problem. Let us summarize the basic assumptions in this paper: (1) The system consists of many omnidirectional cameras. (2) The cameras are placed at the same height. (3) Many of the cameras can observe each other. (4) There are cameras that cannot observe each other because of poor image resolution. (5) There are moving objects in the environment. These assumptions are reasonable enough for the practical systems discussed in the next section. By solving the problems based on these assumptions, we can realize robust localization in a distributed omnidirectional vision system consisting of many small omnidirectional cameras. In the following sections, we propose methods to solve the invisible camera problem (Section 3) and the identification problem (Section 4). A small sketch of the triangle idea used above is given below.
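As a concrete illustration of the triangle idea mentioned above, the following minimal Python sketch shows how three mutually measured angles fix a triangle up to a scale factor via the law of sines. The function name and the scale convention are our own; this is not code from the paper.

    import math

    def triangle_from_angles(angle_a, angle_b, angle_c, scale=1.0):
        """Recover the side lengths of triangle ABC (up to `scale`) from its three
        interior angles (radians).  The side opposite a vertex carries its name:
        side a = BC, side b = CA, side c = AB."""
        if not math.isclose(angle_a + angle_b + angle_c, math.pi, abs_tol=1e-6):
            raise ValueError("interior angles must sum to 180 degrees")
        # Law of sines: a / sin(A) = b / sin(B) = c / sin(C) = common diameter.
        diameter = scale / math.sin(angle_c)     # fix side c (= AB) to `scale`
        side_a = diameter * math.sin(angle_a)    # length of BC
        side_b = diameter * math.sin(angle_b)    # length of CA
        side_c = diameter * math.sin(angle_c)    # length of AB (== scale)
        return side_a, side_b, side_c

    # Three cameras that all measure 60 degrees form an equilateral triangle.
    print(triangle_from_angles(math.pi / 3, math.pi / 3, math.pi / 3))

With noisy angle measurements the strict 180-degree check has to be relaxed, which is exactly the role of the tolerance δ introduced in Section 4.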

2. Distributed Omnidirectional Vision Systems

The Internet has changed the world rather significantly. Many distributed computers connected in various places are enhancing human communication abilities. Cameras can be connected to many of these computers for monitoring purposes, leading to a “distributed” vision system. As the next-generation Internet unfolds, it is expected that each computer will have the ability to acquire real-world information via cameras, which will more tightly couple virtual worlds with the real world. The distributed omnidirectional vision system used in our experimentation is a testbed for such a next-generation Internet system. The basic concept of this system and fundamental problem definitions were proposed in Ishiguro (1997). Generally speaking, it is not trivial to develop a computer vision system that can be used in the real world. The sensor data are noisy and the environment changes readily. One approach to this problem is to use many cameras, each of which executes simple and robust vision tasks. Complex vision

Figure 1. Low-cost and compact omnidirectional camera (it includes a C-MOS CCD camera on the bottom).

tasks are realized by integrating the cameras through the network. The omnidirectional camera shown in Fig. 1 is a key device of the distributed vision system. It has a wide visual field and is an ideal camera both for observing wide areas and for automatically localizing the cameras. Standard rectilinear cameras have a limited visual field of about 30–60 degrees. As a result, the arrangements of these cameras in a multiple camera system are rather restricted if they are to observe each other. Omnidirectional cameras do not have this restriction; they can observe each other in any direction. Based on this fundamental idea and by using omnidirectional cameras as the key sensors, we have developed various distributed vision systems. One of them is the robust, real-time human tracking system shown in Fig. 2. Sixteen omnidirectional cameras distributed in the room track multiple humans simultaneously by communicating with each other. In the development of the system, we acquired the camera positions by manually measuring the projections in the cameras. However, this measurement process takes a long time and is rather tedious: the similar appearance of the cameras is confusing, and it is not easy to find the projections of the cameras because of the small size of the sensors. As wider areas are covered with additional sensors, an automatic localization method is needed. This paper contributes to solving this fundamental problem for distributed omnidirectional vision systems.


Figure 2. Distributed omnidirectional vision system for tracking multiple humans in real time.

3. Solution for the Invisible Camera Problem—Statistical Estimation of the Baseline Directions Among the Cameras

As described in Section 2, the distributed omnidirectional vision systems have difficulty in measuring camera positions, due to the invisible camera problem. As a solution to the problem, this section proposes an estimation method of the baseline directions among the cameras. The camera positions are computed from the baseline directions.

3.1. Fundamental Idea

Figure 3 illustrates the fundamental idea of the proposed method. In Fig. 3, there are two omnidirectional cameras. When there is an object moving among them and it passes the points a, b, and c, cameras 1 and 2 observe it in the same azimuth angle. On the other hand, when the object passes the points d, e, and f, camera 1 observes it in the same azimuth angle, whereas camera 2 observes it in different azimuth angles. Thus, an object passing the baseline between two cameras is always projected onto both camera views in the same azimuth angle. Assuming that the object moves randomly among the cameras, the baseline direction can be estimated by memorizing pairs of azimuth angles of the object projections in each camera view, and by checking which pairs are obtained relatively many times.

3.2. Algorithm for Statistical Estimation

Figure 3. Fundamental idea for baseline estimation.

Based on the above discussion, the baseline directions among the cameras are statistically estimated. Assuming that the cameras may be accidentally moved in the real-world environment, the proposed method dynamically estimates the baseline directions based on dynamic (i.e., real-time) information obtained by observing moving objects, as follows. Each camera simultaneously observes objects and determines the azimuth angle to each object:

$$d_1^1, d_2^1, \ldots, d_{m_1}^1, \ldots, d_{M_1}^1,$$
$$d_1^2, d_2^2, \ldots, d_{m_2}^2, \ldots, d_{M_2}^2,$$
$$\vdots$$
$$d_1^N, d_2^N, \ldots, d_{m_N}^N, \ldots, d_{M_N}^N \qquad (1)$$


where $N$ is the number of cameras and $M_i$ is the number of objects observed by camera $i$. $d_{m_i}^i$ is the azimuth angle of the $m_i$-th object observed by camera $i$ (represented in camera $i$'s local coordinates). Note that the number of detected objects may differ from camera to camera. Then, every pair of azimuth angles is considered:

$$(d_1^1, d_1^2), (d_1^1, d_2^2), \ldots, (d_{m_1}^1, d_{m_2}^2), \ldots, (d_{M_1}^1, d_{M_2}^2),$$
$$(d_1^1, d_1^3), (d_1^1, d_2^3), \ldots, (d_{m_1}^1, d_{m_3}^3), \ldots, (d_{M_1}^1, d_{M_3}^3),$$
$$\vdots$$
$$(d_1^i, d_1^j), (d_1^i, d_2^j), \ldots, (d_{m_i}^i, d_{m_j}^j), \ldots, (d_{M_i}^i, d_{M_j}^j),$$
$$\vdots$$
$$(d_1^{N-1}, d_1^N), (d_1^{N-1}, d_2^N), \ldots, (d_{m_{N-1}}^{N-1}, d_{m_N}^N), \ldots, (d_{M_{N-1}}^{N-1}, d_{M_N}^N) \qquad (2)$$

In general, these pairs are represented as $p_{i,j,m_i,m_j} = (d_{m_i}^i, d_{m_j}^j)$. The estimation algorithm memorizes these pairs as possible baseline directions, with an initial reliability. By iterating observation, a number of pairs of azimuth angles are obtained, and the reliability of each pair is increased or decreased. Finally, only the pairs with a high reliability are considered as correct baseline directions. The detailed process is as follows ($r_{initial}$, $r_{inc}$, $r_{dec}$, $T_{reliable}$, and $T_{unreliable}$ are predetermined constants):

Step 1. Obtain azimuth angle pairs $p_{i,j,m_i,m_j} = (d_{m_i}^i, d_{m_j}^j)$ by observation as described above.

Step 2. Initialize $P_{unreliable}$ with $P$: $P_{unreliable} \leftarrow P$, where $P$ is the set of possible pairs of baseline directions obtained in the previous estimation process. Note that $P$ is empty at the beginning of the estimation process. $P_{unreliable}$ is the set of pairs of azimuth angles that are considered as unreliable baseline directions. As shown above, all of the pairs in $P$ are considered as unreliable at this moment.

Step 3. Compare every azimuth angle pair $p_{i,j,m_i,m_j} = (d_{m_i}^i, d_{m_j}^j)$ with the elements in $P$:

(a) If both of the azimuth angles of $p_k$ (an element in $P$) are equal to $d_{m_i}^i$ and $d_{m_j}^j$ of $p_{i,j,m_i,m_j}$ (this case corresponds to the positions a, b, and c in Fig. 3), $p_k$ is considered as a correct pair of baseline directions. The reliability of $p_k$ is increased by $r_{inc}$, and $p_k$ is removed from $P_{unreliable}$.
(b) If only one of the azimuth angles of $p_k$ is equal to $d_{m_i}^i$ or $d_{m_j}^j$ (this case corresponds to the positions d, e, and f in Fig. 3), $p_k$ is considered as a wrong pair of baseline directions, and $p_k$ is left in $P_{unreliable}$.
(c) If no element in $P$ matches the above conditions, $p_{i,j,m_i,m_j}$ is considered as a new possible pair of baseline directions and is added to $P$ with an initial reliability $r_{initial}$.

Step 4. With respect to the elements included in $P_{unreliable}$, decrease the reliability of the corresponding elements in $P$ by $r_{dec}$. If the reliability becomes smaller than a threshold $T_{unreliable}$, remove the element from $P$.

Whenever the cameras observe the objects, the above steps are performed to update the reliability. Finally, the elements in $P$ whose reliability is greater than a threshold $T_{reliable}$ are considered as correct baseline directions. When comparing azimuth angles in Step 3, an azimuth angle $\alpha$ is considered as equal to $\beta$ if $\alpha = \beta$ or $\alpha = \beta \pm \pi$ (i.e., $\alpha$ is opposite to $\beta$), since an object on the baseline may be located at three different positions with respect to the cameras in omnidirectional stereo, as shown in Fig. 4. For actual baseline directions $d_1$ and $d_2$, the following azimuth angle pairs can be obtained:

1. $(d_1, d_2)$ (see Fig. 4(a)),
2. $(d_1 + \pi, d_2)$ (see Fig. 4(b)),
3. $(d_1, d_2 + \pi)$ (see Fig. 4(c)),
4. $(d_1 + \pi, d_2 + \pi)$.

Theoretically, the pair $(d_1 + \pi, d_2 + \pi)$ is impossible, except for the case when camera 1 observes an object in direction $(d_1 + \pi)$ and camera 2 accidentally observes another, similar object in direction $(d_2 + \pi)$.

Figure 4. Three different positions on the baseline.


By iterating the estimation process, the above azimuth angle pairs (which are actually regarded as the same azimuth angle pair) are obtained with a high reliability with respect to the baseline between the two cameras. Since the pair $(d_1 + \pi, d_2 + \pi)$ is obtained relatively fewer times than the other three pairs, it can be distinguished from them by checking how often each pair is obtained. In addition, the pair $(d_1, d_2)$ indicates the actual direction from one camera to the other camera.
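To make Steps 1–4 concrete, here is a condensed Python sketch of the reliability update, using our own data structures and the parameter values reported later in Section 3.4; the error margin and the stationary-object filter of Section 3.4.2 are omitted, and azimuths are quantized to the paper's 2π/52 resolution. It is an illustrative sketch, not the authors' implementation.

    import math

    R_INITIAL, R_INC, R_DEC = 7.0, 7.0, 1.0       # r_initial, r_inc, r_dec
    T_RELIABLE, T_UNRELIABLE = 250.0, 0.0         # acceptance / pruning thresholds
    N_DIRS = 52                                   # angular resolution: theta = 2*pi/52

    def quantize(azimuth):
        """Map an azimuth (radians) onto one of N_DIRS discrete directions."""
        return int(round(azimuth / (2 * math.pi / N_DIRS))) % N_DIRS

    def equivalent(k, m):
        """Step 3 comparison rule: equal, or opposite by pi (the Fig. 4 cases)."""
        return k == m or (k + N_DIRS // 2) % N_DIRS == m

    reliability = {}   # P: (cam_i, cam_j, k_i, k_j) -> reliability value

    def update(observations):
        """One observation cycle.  `observations[i]` is the list of object azimuths
        (radians, camera i's local coordinates) detected by camera i."""
        unreliable = set(reliability)                      # Step 2: P_unreliable <- P
        cams = sorted(observations)
        for a in range(len(cams)):
            for b in range(a + 1, len(cams)):
                i, j = cams[a], cams[b]
                for di in observations[i]:                 # Steps 1 and 3: every azimuth pair
                    for dj in observations[j]:
                        ki, kj = quantize(di), quantize(dj)
                        full = partial = False
                        for key in list(reliability):
                            pi, pj, qi, qj = key
                            if (pi, pj) != (i, j):
                                continue
                            ei, ej = equivalent(qi, ki), equivalent(qj, kj)
                            if ei and ej:                  # Step 3(a): support the pair
                                reliability[key] += R_INC
                                unreliable.discard(key)
                                full = True
                            elif ei or ej:                 # Step 3(b): stays unreliable
                                partial = True
                        if not (full or partial):          # Step 3(c): new candidate pair
                            reliability.setdefault((i, j, ki, kj), R_INITIAL)
        for key in unreliable:                             # Step 4: decay and prune
            reliability[key] -= R_DEC
            if reliability[key] < T_UNRELIABLE:
                del reliability[key]

    def detected_baselines():
        """Pairs whose reliability exceeds T_reliable are reported as baselines."""
        return [key for key, r in reliability.items() if r > T_RELIABLE]

Over many iterations, candidate pairs supported by objects crossing the baseline accumulate reliability, which is precisely the balance between increases and decreases analyzed next in Section 3.3.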

3.3. Increase Ratio of the Reliability

The quality of the results depends on the increase ratio of the reliability ($r_{inc} : r_{dec}$). For example, the method may detect many wrong baselines with a high ratio ($r_{inc} \gg r_{dec}$), since the reliability of the azimuth angle pairs quickly becomes greater than the threshold $T_{reliable}$, while it cannot detect baselines with a low ratio ($r_{inc} \ll r_{dec}$), since the reliability remains smaller than $T_{reliable}$. In this subsection, we discuss how to determine a proper increase ratio for baseline estimation. Figure 5 shows the configuration of two cameras 1 and 2, where $\theta$ represents the angular resolution of the cameras, and integer values $k_1$ and $k_2$ are assigned to each direction ($0 \le k_1\theta, k_2\theta < \pi$). $\delta_1$ and $\delta_2$ ($0 \le \delta_1, \delta_2 < \theta$) represent the differential angle between the actual baseline direction and the direction of the reference axis of each camera (the zero azimuth). $l$ is the baseline length and $L$ is the observable range of the cameras ($0 < l < 2L$). $N$ is the number of different directions, which is given by $2\pi/\theta$.

Figure 5. Object location that gives an azimuth angle pair $(k_1, k_2) = (0, 0)$.

Figure 6. Object location that gives an azimuth angle pair $(k_1, k_2) = (2, N/2 - 2)$.

Let us consider an azimuth angle pair $(k_1, k_2) = (0, 0)$ that indicates the baseline directions. This pair is obtained when the object is located in one of the regions $R$ (light gray regions) in Fig. 5. Since this pair is obtained many times by iterating observation, the reliability of the pair is increased by $r_{inc}$ in Step 3(a), as described in Section 3.2. On the other hand, if the object is located in one of the regions $S$ (dark gray regions) in Fig. 5 (i.e., near the baseline), one of the cameras observes it in the baseline direction, but the other camera does not. In this case, azimuth angle pairs $(k_1, k_2) = (0, *)$ and $(k_1, k_2) = (*, 0)$ (where $*$ takes an arbitrary value other than 0) are obtained, and the reliability of these pairs, including the pair $(k_1, k_2) = (0, 0)$, is decreased by $r_{dec}$ in Step 3(b) and Step 4. In this process, the pair $(k_1, k_2) = (0, 0)$, which indicates the correct baseline directions, should remain with relatively high reliability compared to the other pairs. Since the number of times each pair is obtained by iterating observation depends on the size of each region in Fig. 5, the increase ratio $r_{inc}$ and the decrease ratio $r_{dec}$ of the reliability must satisfy the following inequality:

$$r_{inc} R > r_{dec} S \qquad (3)$$

where $R$ and $S$ indicate the sizes of the regions $R$ and $S$, respectively. Inequality (3) means that the reliability of the pair $(k_1, k_2) = (0, 0)$ should increase in total. Next, consider an azimuth angle pair other than the baseline, e.g., $(k_1, k_2) = (2, N/2 - 2)$. This pair is obtained when an object is located in the region $R'$ (see Fig. 6), and its reliability is increased by $r_{inc}$. On the other hand, if the object is located in one of the regions $S'$, azimuth angle pairs $(k_1, k_2) = (2, *)$ and $(k_1, k_2) = (*, N/2 - 2)$ (where $*$ takes an arbitrary value other than $N/2 - 2$ and 2, respectively) are obtained, and the reliability of these pairs, including the pair $(k_1, k_2) = (2, N/2 - 2)$, is decreased by $r_{dec}$. Since the pair $(k_1, k_2) = (2, N/2 - 2)$ does not indicate the correct baseline direction, its reliability should decrease after iterated observation. Therefore, $r_{inc}$ and $r_{dec}$ must satisfy the following inequality:

$$r_{inc} R' < r_{dec} S' \qquad (4)$$

where $R'$ and $S'$ indicate the sizes of the regions $R'$ and $S'$, respectively. Inequality (4) means that the reliability of the pair $(k_1, k_2) = (2, N/2 - 2)$ should decrease in total. Note that the actual values of $R'$ and $S'$ depend on the location (i.e., the values of $(k_1, k_2)$). Therefore, Inequality (4) must be satisfied at an arbitrary location:

$$\frac{r_{inc}}{r_{dec}} < \min_{k_1, k_2} \frac{S'}{R'} \qquad (5)$$

where $k_1 \neq 0$ and $k_2 \neq 0$. Consequently, $r_{inc}$ and $r_{dec}$ should satisfy

$$\frac{S}{R} < \frac{r_{inc}}{r_{dec}} < \min_{k_1, k_2} \frac{S'}{R'} \qquad (6)$$

for arbitrary $\delta_1$, $\delta_2$, and $l$ ($0 < l < 2L$). However, such $r_{inc}$ and $r_{dec}$ do not exist, since:

• When $\delta_1$ and $\delta_2$ are close to 0 or $\theta$ (i.e., the difference between the reference axis and the actual baseline direction becomes large), $R$ becomes small and $S$ becomes large, so $S/R$ (the left side of Inequality (6)) becomes large.
• When $l$ is relatively small compared to $L$ (i.e., the two cameras are very close to each other), or when $l$ is larger than $L$, $\min_{k_1,k_2}(S'/R')$ (the right side of Inequality (6)) becomes small.

In order to determine proper $r_{inc}$ and $r_{dec}$ that satisfy Inequality (6), the other values $\delta_1$, $\delta_2$, and $l$ should be limited to a specific range. For example, the results of preliminary experimentation show that Inequality (6) is satisfied with $r_{inc}/r_{dec} \approx 7.0$ on condition that $0.5L \le l \le 1.0L$, $\theta = 2\pi/52$, and the error margin of $k_1$ and $k_2$ is $\pm 1$.
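The admissibility condition of Inequality (6) is easy to check numerically once the region sizes have been estimated from the geometry of Figs. 5 and 6. The helper below is our own and the region sizes in the example are made up for illustration; they are not values from the paper.

    def ratio_is_admissible(ratio, R, S, other_regions):
        """Inequality (6): the ratio r_inc/r_dec must exceed S/R for the true
        baseline pair and stay below min(S'/R') over all non-baseline pairs.
        `other_regions` is a list of (R_prime, S_prime) region sizes."""
        lower = S / R
        upper = min(s_p / r_p for r_p, s_p in other_regions)
        return lower < ratio < upper

    # Hypothetical region sizes; with these, a ratio of 7.0 would be acceptable.
    print(ratio_is_admissible(7.0, R=2.0, S=9.0, other_regions=[(1.0, 8.5), (0.9, 7.5)]))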

3.4. Experimentation

We have evaluated the proposed method in both a simulated and a real environment. Figure 7 shows the common configuration of four omnidirectional cameras in both environments. In these experiments, the following parameters are used: $r_{inc} = 7$, $r_{dec} = 1$, $r_{initial} = 7$, $T_{reliable} = 250$, and $T_{unreliable} = 0$ (see Section 3.2 for the meanings of the parameters). The cameras determine azimuth angles to objects with a resolution of 360/52 degrees (i.e., $\theta = 2\pi/52$).
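For convenience, the experimental settings above can be gathered into one configuration block matching the constants assumed in the sketch after Section 3.2 (the dictionary and its key names are ours):

    ESTIMATION_PARAMS = {
        "r_inc": 7,                  # reliability increase on a full match (Step 3(a))
        "r_dec": 1,                  # reliability decrease of unreliable pairs (Step 4)
        "r_initial": 7,              # initial reliability of a new candidate (Step 3(c))
        "T_reliable": 250,           # acceptance threshold for baseline directions
        "T_unreliable": 0,           # pruning threshold
        "n_directions": 52,          # angular resolution theta = 2*pi/52
        "observable_range_m": 8.0,   # L used in the simulated environment
    }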

Figure 7. Camera configuration (top view).

Figure 8. The number of detected baselines in the simulated environment.

Figure 9. Outdoor experimentation.

3.4.1. Simulated Environment. In the simulated environment, the method iterates the following process: (1) randomly place several objects in the environment, (2) measure the azimuth angles to the objects within approximately 8 m of each camera (i.e., $L = 8.0$; this value is based on the experimentation in the real environment described below), and (3) perform the estimation process described in Section 3.2. As shown in Fig. 8, when there is one object in the environment, the method detected all six baselines after 100,000 observations. With three objects, it also detected six baselines; however, two of them were next to the actual baselines, and two of the actual baselines were not detected. In the case of five objects, the method detected sixteen baselines, which include all six correct baselines, four baselines next to the actual ones, and six wrong baselines. It seems that the method detected wrong baselines on account of many false matches among the projections of objects in the camera views.

3.4.2. Real Environment. We have also evaluated the method in the real environment, with the same camera configuration as in the simulation (see Fig. 9). In this experimentation, the cameras detect objects (usually walking people) by background subtraction and measure the azimuth angles to the objects. Figure 10 shows four omnidirectional images taken with the cameras.


Figure 10. Unwrapped omnidirectional image. Each of the white clusters indicates the azimuth angle to an object detected by background subtraction.

The graph at the bottom of each image shows the result of background subtraction based on intensity, where the horizontal center of each cluster is considered as the azimuth angle to the detected object. In the real environment, we should ignore stationary objects. Otherwise, the same pair of azimuth angles is continuously obtained by observing a stationary object, resulting in an unexpected increase in the reliability of the pair in Step 3(a) of the estimation process (see Section 3.2), even if the pair represents a wrong baseline direction. Therefore, in this experimentation, we have added the following step to the estimation process described in Section 3.2:

• (After Step 1) With respect to a pair $p_{i,j,m_i,m_j} = (d_{m_i}^i, d_{m_j}^j)$, if the projections of an object at the azimuth angles $d_{m_i}^i$ and $d_{m_j}^j$ do not move in the omnidirectional views of cameras $i$ and $j$, respectively, the method considers $p_{i,j,m_i,m_j}$ as that of a stationary object and ignores it in the process of Step 3.

This step also eliminates observation errors when false objects are continuously detected in the same azimuth by background subtraction due to background noise. In this experimentation, the error margin used for comparing azimuth angles in Step 3 is 1 (the unit is 360/52 degrees).
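A minimal version of this stationary-object test might look as follows (hypothetical helper; the motion threshold and the track representation are our own, and azimuth wrap-around is ignored for brevity):

    def is_stationary(track_i, track_j, motion_threshold=1):
        """Return True when neither projection of an object has moved by more than
        `motion_threshold` quantized azimuth units over recent frames, in which
        case the corresponding azimuth pair is ignored in Step 3."""
        moved_i = max(track_i) - min(track_i) > motion_threshold
        moved_j = max(track_j) - min(track_j) > motion_threshold
        return not (moved_i or moved_j)

    # A parked object barely changes azimuth in either view, a walker does:
    print(is_stationary([10, 10, 10], [37, 37, 38]))   # True  -> pair ignored
    print(is_stationary([10, 12, 15], [37, 35, 33]))   # False -> pair kept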


Figure 11. The number of detected baselines in the real environment.

Figure 12. Detected baselines in the real environment (top view). The arrows indicate wrong directions.

Figure 11 shows the number of detected baselines in the real environment. After 250,000 observations (approximately 4.5 hours), the method has detected 10 baselines. Figure 12 shows the directions of the detected baselines overlaid on the actual camera positions. Two of them, indicated with the arrows, are wrong; however, the others indicate nearly correct directions. Note that 20 directions are shown in Fig. 12 based on the detected baselines, since each baseline is represented by a pair of two azimuth angles.

4. Solution for the Identification Problem—Propagation of a Triangle Constraint

Section 3 has proposed a solution for the invisible camera problem. Even if the camera is small, the method provides angular information between cameras, which is necessary for camera localization. This section proposes how to effectively and precisely localize many cameras. The method proposed in Section 3 solves the identification problem between cameras. However, if we consider a general case, some of the cameras may observe each other. In such a case, we have to solve the identification problem of the camera projections to estimate the baselines. The method proposed here handles the general camera localization and identification.

4.1. The Algorithm for Identification and Localization

Given $N$ omnidirectional cameras located randomly within a region, the overall goal is to identify all of the cameras and to determine the relative positions between them. Prior to describing the details of the algorithm, several assumptions must be stated:

1. Each camera has an omnidirectional vision sensor and can view other cameras.
2. All cameras have the same body, which can readily be found in the environment; however, the cameras cannot be visually distinguished from each other.
3. Each camera cannot precisely measure the distance to other cameras (although rough distance measurements may be possible by viewing the camera image size).

Each camera observes the other cameras and determines the azimuth angle to each, relative to some base viewing angle. These data can be represented as:

$$r_1: d_1, d_2, \ldots, d_{N_1}$$
$$r_2: d_1, d_2, \ldots, d_{N_2}$$
$$\vdots$$
$$r_N: d_1, d_2, \ldots, d_{N_N} \qquad (7)$$

where $r_i$ is the camera ID and $d_n$ is the azimuth angle to the $n$-th observed camera (for the $N_i$ cameras observable from camera $r_i$). From these data, the following angles between two observed cameras can be determined:

$$\theta_{1,2}^1, \theta_{1,3}^1, \ldots, \theta_{1,N_1}^1, \theta_{2,3}^1, \theta_{2,4}^1, \ldots, \theta_{2,N_1}^1, \ldots, \theta_{N_1-1,N_1}^1$$
$$\theta_{1,2}^2, \theta_{1,3}^2, \ldots, \theta_{1,N_2}^2, \theta_{2,3}^2, \theta_{2,4}^2, \ldots, \theta_{2,N_2}^2, \ldots, \theta_{N_2-1,N_2}^2$$
$$\vdots$$
$$\theta_{1,2}^N, \theta_{1,3}^N, \ldots, \theta_{1,N_N}^N, \theta_{2,3}^N, \theta_{2,4}^N, \ldots, \theta_{2,N_N}^N, \ldots, \theta_{N_N-1,N_N}^N \qquad (8)$$


where the superscript represents the ID of the observing camera and the subscripts index the observed cameras. For each observing camera, the angles between all of the observed camera combinations are represented. Note that the algorithm does not assume that each camera can see an equal number of other cameras; therefore $N_i$ represents the total number of observed cameras for camera $i$. This angle representation can be simplified as follows:

$$\theta_1^1, \theta_2^1, \ldots, \theta_{m_1}^1, \ldots, \theta_{M_1}^1$$
$$\theta_1^2, \theta_2^2, \ldots, \theta_{m_2}^2, \ldots, \theta_{M_2}^2$$
$$\vdots$$
$$\theta_1^N, \theta_2^N, \ldots, \theta_{m_N}^N, \ldots, \theta_{M_N}^N \qquad (9)$$

In this case, the single subscripts index the observed camera combinations: $m_i$ is the index and $M_i$ is the total number of observed camera combinations for observing camera $i$.

4.1.1. Triangle Constraint. At this point, we want to look at different combinations of these angles. One of the key constraints used in the algorithm is the fact that the relative angles between three cameras always add up to 180°. Each camera represents a vertex in a triangle, and the angles between cameras must add up to 180° (see Fig. 13). We refer to this as the triangle constraint (Kato et al., 1999). We consider the observed camera angles from combinations of three different observing cameras:

$$\theta_{m_i}^i + \theta_{m_j}^j + \theta_{m_k}^k = 180^\circ \qquad (10)$$

For all combinations of three cameras $(r_i, r_j, r_k)$, all observed camera combinations (indexed by $m_i = [1, \ldots, M_i]$, $m_j = [1, \ldots, M_j]$, and $m_k = [1, \ldots, M_k]$) are checked. The resulting triplet combinations that satisfy the triangle constraint allow us to compute the relative positions of the cameras.

Figure 13. Triangle constraint.

Figure 14. Neighboring triangles.

Figure 15. Impossible triangles.

4.1.2. Triangle Verification. The resulting triplets from the previous step may contain impossible triangles. These impossible triangles can be classified into four different types, as shown in Fig. 15. In order to eliminate these impossible triangle combinations, additional processing must be carried out on the triplets generated from the previous step. This processing involves evaluating neighboring triangle candidates (see Fig. 14) generated from the triangle constraint. The procedure is as follows:

1. Each triangle from the candidate list is selected.
2. For a particular triangle, each edge is examined. An edge of a triangle is represented by the two angles on each end, $(\theta_{m_i}^i, \theta_{m_j}^j)$. All of the other candidate triangles are then examined to see if they contain the same edge. If one candidate triangle is represented by $(\theta_{m_i}^i, \theta_{m_j}^j, \theta_{m_k}^k)$, another triangle $(\theta_{m_i}^i, \theta_{m_j}^j, \theta_{m_l}^l)$ shares the edge $(r_i, r_j)$.
3. For all pairs of triangle candidates that share an edge, check to see if other triangle candidates exist


that contain the opposite edge and one of the common vertices of the original triangle pair. In the example, the opposite edge would be $(r_k, r_l)$. Candidate triangles would be checked to see if they contain this edge and vertex $r_i$ or $r_j$. If such triangles exist and all angles observed by all cameras are different from each other, the triangles

$$(r_i, r_j, r_k), \ (r_i, r_j, r_l), \ (r_i, r_k, r_l), \ (r_j, r_k, r_l) \qquad (11)$$

are uniquely determined. When the triangles are uniquely determined, the projections (directions) of the other cameras are identified between the images taken by the cameras, and at the same time the positions can be computed. That is, this method solves the identification problem for the projections and then determines the locations. Further, the locations are computed precisely, with sufficient information, since all of the cameras can observe each other.

4.1.3. Error Handling. Three major difficulties arise in actual situations:

1. Some of the cameras may have identical angles and corresponding combinations, as shown in Fig. 16. In such cases, the triangle verification technique does not identify the camera projections;
2. Some of the angles belonging to an observing camera may have significant errors, and as a result the triangle constraint may not be met;
3. In a real environment, obstacles may exist which obstruct the cameras' views of each other.

Figure 16. A case where the single triangle verification method does not correctly identify the angles.

Figure 17. Iterative verification.

In order to handle these problems, the triangle constraint can be applied allowing for an error δ in the angle observations. If δ is set too small, the triangle constraint will not be met in many cases where there are valid triangles, and too few candidate triangles will be generated. If δ is too large, then too many angles will meet the triangle constraint, generating too many candidate triangles. In this latter case, it is possible to apply the triangle verification technique described previously once again. In fact, for the case when there are many cameras, the triangle verification technique can be extended beyond simply finding a single opposite, verifying triangle that supports the hypothesis of the original triangle. For a large number of cameras, many “verification” triangles can be found. Figure 17 is an example of this. In this figure, consider triangle (1, 2, 3) as the reference triangle. With neighboring triangles (1, 2, 4) and (1, 3, 4), we can identify projections between cameras 1, 2, 3, and 4. With neighboring triangles (2, 3, 5) and (1, 3, 5), we can identify projections between cameras 1, 2, 3, and 5. Further, by considering the verification triangle (2, 3, 4) as a new reference triangle, we can identify projections between cameras 2, 3, 4, and 6 with neighboring triangles (2, 3, 6) and (3, 4, 6). We apply this process to all triangle candidates acquired using the triangle constraint and sum up the number of verification triangles. In the example of Fig. 17, the total number of verification triangles for reference triangle (1, 2, 3) is 2 × 3 = 6. The triangles that have the maximum number of verification triangles can be considered as the best solution. Even so, this “best” solution may not be unique, since there could be several solutions that have an equal maximum number of verification triangles. In order to overcome this problem, positioning information can be used. Given a single solution, the relative positions of the sensors can be determined from a set of reference triangles. For each reference triangle, the position information can also be calculated from the associated verification triangles. With noisy observations, this position information will be slightly different from the reference-triangle-based positions.
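The candidate generation of Section 4.1.1 and a simplified version of the verification count described above can be sketched in a few lines of Python (our own data structures; the support count below only checks shared observing cameras and stands in for the full opposite-edge procedure of Section 4.1.2):

    from itertools import combinations

    def pairwise_angles(azimuths):
        """Eqs. (8)/(9): interior angles between every pair of projections seen by
        one camera.  `azimuths` is a list of azimuth readings in degrees for the
        (unidentified) observed cameras."""
        angles = []
        for a, b in combinations(range(len(azimuths)), 2):
            d = abs(azimuths[a] - azimuths[b]) % 360.0
            angles.append(min(d, 360.0 - d))          # keep angles in [0, 180]
        return angles

    def triangle_candidates(angles, delta=2.0):
        """Eq. (10): for every triple of observing cameras, keep the angle-index
        triplets whose angles sum to 180 degrees within the tolerance delta."""
        out = []
        for i, j, k in combinations(sorted(angles), 3):
            for mi, ai in enumerate(angles[i]):
                for mj, aj in enumerate(angles[j]):
                    for mk, ak in enumerate(angles[k]):
                        if abs(ai + aj + ak - 180.0) <= delta:
                            out.append(((i, mi), (j, mj), (k, mk)))
        return out

    def support_count(reference, candidates):
        """Simplified verification score: count other candidates that share two
        observing cameras (an edge) with the reference triangle."""
        ref_cams = {cam for cam, _ in reference}
        return sum(1 for tri in candidates
                   if tri != reference and len(ref_cams & {cam for cam, _ in tri}) == 2)

    # Toy example: three cameras, each seeing only the other two.
    angles = {1: pairwise_angles([110.0, 170.0]),    # camera 1 measures ~60.0 degrees
              2: pairwise_angles([10.0, 70.3]),      # camera 2 measures ~60.3 degrees
              3: pairwise_angles([200.0, 260.1])}    # camera 3 measures ~60.1 degrees
    print(triangle_candidates(angles))               # -> [((1, 0), (2, 0), (3, 0))]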

The camera positions are determined by a least-squares method. First, we select two sensors as reference cameras for determining a global coordinate system. Then, each camera position $(X_r, Y_r)$ is computed in the global coordinate system as

$$(X_r, Y_r) = \left( \frac{1}{n} \sum_{i=1}^{n} X_i, \ \frac{1}{n} \sum_{i=1}^{n} Y_i \right) \qquad (12)$$

where $(X_i, Y_i)$ is a position computed with a triangle, and $n$ is the number of triangles that share the same vertex. It is possible to estimate the error between these position estimates as follows:

$$E = \sum_i \left( p(r_i) - p'(r_i) \right)^2 \qquad (13)$$

where $p(r_i)$ is a sensor position computed using projections from the reference triangle, and $p'(r_i)$ is the sensor position computed using projections from neighboring verification triangles. The solution that has the minimum positioning error $E$ can be used as the best solution.

4.1.4. The Process and Computational Cost. The method proposed in this paper filters out possible solutions using the triangle verification technique and selects the solution that has the minimum positioning error. The process is summarized as follows:

Step 1. List all triplet combinations that satisfy the triangle constraint with an error δ in the angle observations (the unique solution is given in the case where all of the angles are different from each other);
Step 2. Apply the triangle verification technique to all of the triplets and eliminate invalid triplets from the list (the unique solution is given in the case where the combinations of angles for a camera are different from the angles observed by the other cameras);
Step 3. Estimate the positioning error for all of the remaining candidates and select the solution that has the minimum error.

In this process, suppose a unique solution is acquired at the end of Step 1. The computational cost will be

$${}_{N}C_{3} = O(N^3) \qquad (14)$$

If a unique solution is acquired at the end of Step 2, the maximum computational cost (in the case where all triplets remain from Step 1) will be

$${}_{N}C_{3} \cdot (N-3)! > O(N^3) \qquad (15)$$

Theoretically, the computational cost is high. However, Step 1 typically filters out almost all of the candidates, and only a few good candidates remain, to which the triangle verification is successfully applied. Based on our preliminary experimentation (see the next section), the computation for up to 10 cameras can be achieved in real time. If needed, parallel computation can be employed for increased performance. In order to perform the parallel computation, the cameras should be divided into local groups. The coarse range information given by the camera size projected on the omnidirectional images can be used for the localization. We consider that the proposed method will be practical by using local and parallel computation for a small number of cameras.
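Equations (12) and (13) amount to a simple average-and-score step; the sketch below (our own helper names, not the authors' code) shows the computation for one candidate solution.

    def average_position(estimates):
        """Eq. (12): average the (X, Y) positions that different triangles sharing
        the same vertex give for one camera."""
        n = len(estimates)
        return (sum(x for x, _ in estimates) / n, sum(y for _, y in estimates) / n)

    def positioning_error(reference_positions, verification_positions):
        """Eq. (13): sum of squared distances between positions computed from the
        reference triangle and from the associated verification triangles."""
        return sum((xr - xv) ** 2 + (yr - yv) ** 2
                   for (xr, yr), (xv, yv) in zip(reference_positions, verification_positions))

    print(average_position([(1.0, 0.0), (1.1, 0.1), (0.9, -0.1)]))                 # ~ (1.0, 0.0)
    print(positioning_error([(0.0, 0.0), (1.0, 0.0)], [(0.1, 0.0), (1.0, 0.2)]))   # ~ 0.05

The candidate with the smallest error E is kept, exactly as in Step 3 of the process above.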

4.2. Experimental Results

In order to verify our method, both simulation and real-world experiments have been carried out.

4.2.1. Simulation Experiments. A simulation program has been created that can randomly place omnidirectional sensors within a region. For each camera, the azimuth directions to the other sensors are determined. From these data, the angles between observed cameras (i.e., $\theta_{m_i}^i$) are calculated. The algorithm described in the previous section is then performed. Figure 18 shows six simulation results for the case where all cameras can precisely observe all of the other cameras in the region. For each simulation, the camera locations have been randomly generated. The square dots show the ground truth camera locations and the circular dots show the reconstructed positions; they overlap completely. In another simulation case, observation errors are introduced: a mis-identified camera is added to the observation angles of the cameras. In this case, all of the cameras still observe each other, but they also observe an object that is mis-identified as a camera. As before, the observation angles are generated and processed. The results are shown in Fig. 19. The square dots show the ground truth camera positions and the circular dots show the reconstructed results. Again the camera positions are correctly reconstructed. In Fig. 19(a),


Figure 18. Simulation results for verification of the algorithm.

Figure 19. Simulation results for cases in which there are objects similar to the cameras.

there are six cameras and an object which looks like a camera. Figure 20 shows the performance of the proposed method. In the table, “Camera”, “Triangles”, “Triangle constraints”, and “propagation” denote the number of cameras, the number of possible triangles without identifying the cameras, the number of triangles filtered out with the triangle constraint, and the number of triangles filtered out by propagating the triangle verification, respectively. The triangle constraint leaves many triangle candidates; however, the propagation of the triangle verification filters out almost all of the wrong candidates. The remaining candidates are finally filtered out by computing the locations, as discussed in the previous section. In this simulation, we have used a standard personal computer with a Pentium Pro CPU and 56 Mbytes of memory. The computational time is quite short for up to seven cameras. In the case where the system consists of up to seven cameras, or the cameras can be divided into small groups of up to seven cameras, this algorithm solves the identification and localization problems in real time.
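The bounds in Eqs. (14) and (15) explain why small groups matter; the quick calculation below tabulates them for a few group sizes (illustrative only).

    from math import comb, factorial

    # N C 3 triplets to test (Eq. 14) and the factorial worst case of Eq. (15).
    for n in (5, 7, 10):
        print(n, comb(n, 3), comb(n, 3) * factorial(n - 3))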


Figure 20. Performance of the method and its computational time.

4.2.2. Real-World Experiment. In addition to the simulation experiments, a real-world experiment was carried out using seven identical omnidirectional vision sensors. A picture of the sensors is shown in Fig. 21. These cameras were placed randomly on the floor of our laboratory in a region approximately 4 × 4 meters, as shown in Fig. 21. In addition to the cameras, a trashcan was placed among the cameras in order to occlude the views of some cameras. In this experiment, the ground truth positions of the cameras were carefully measured. Seven omnidirectional images were acquired from the cameras as shown in Fig. 23. In order to determine the azimuth angles to the observed cameras, the cameras must first be detected in the images. Because all of the omnidirectional cameras are set on the level floor

Figure 21. Seven omnidirectional cameras in a real environment.

and all are of equal height, the images of the observed cameras will fall within a very narrow circular band in the omnidirectional image, as shown in Fig. 23. Therefore, we can constrain our image processing to this narrow region. Within the circular band, we perform a simple region-based segmentation algorithm that uses connectivity analysis. The results of the segmentation are distinct “blobs” within the image region. Simple features of these blobs are used to detect the omnidirectional cameras: the cameras are dark compared to their background and have a distinctive square shape. Once the omnidirectional cameras are detected within the image, the center of gravity of each camera blob is used to determine the azimuth angle to each


Figure 22. Image processing for detecting cameras in an omnidirectional view.

Figure 23. Omnidirectional views taken by the cameras.

observed camera. The omnidirectional images for all cameras with the processed viewing directions are shown in Fig. 23. The observed azimuth angles to the other identified cameras are given in Table 1.
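The final step of the detection pipeline described above, turning each blob's centre of gravity into an azimuth angle about the image centre, can be written compactly; the helper below is hypothetical (names, image-centre convention, and the y-axis direction are ours), not the authors' implementation.

    import math

    def blob_azimuths(blob_pixel_lists, image_center):
        """For each blob (a list of (x, y) pixel coordinates found in the circular
        band), return the azimuth angle in degrees of its centre of gravity,
        measured about the centre of the omnidirectional image."""
        cx, cy = image_center
        azimuths = []
        for pixels in blob_pixel_lists:
            gx = sum(x for x, _ in pixels) / len(pixels)   # centre of gravity
            gy = sum(y for _, y in pixels) / len(pixels)
            azimuths.append(math.degrees(math.atan2(gy - cy, gx - cx)) % 360.0)
        return azimuths

    # One blob to the right of the image centre, one above it (y grows upward here):
    print(blob_azimuths([[(330, 240), (331, 240)], [(320, 300)]], image_center=(320, 240)))
    # -> [0.0, 90.0]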

The angles between the observed cameras are used as input to the positioning algorithm. In order to compare the reconstructed camera positions with the ground truth data, a common coordinate system must be used. The results of the positioning algorithm return the relative camera positions up to a scale factor. When comparing to the ground truth positions, three items must be established:

(1) a coordinate center origin,
(2) the coordinate system orientation, and
(3) a scale factor.

In this experiment, camera #5 in Fig. 24 is selected as the coordinate system origin. The coordinate system orientation and scale were determined by having camera #6 lie on the x-axis, one unit length away from camera #5. With these definitions, the Cartesian coordinates of the cameras are given in Table 3. If the ground truth positions are scaled and oriented around the same coordinate system origin, it is possible to illustrate both the ground truth positions and the algorithm positions together, as shown in Fig. 24. As can be seen, the positioning errors are approximately 10% or less.

Figure 24. Comparison of ground truth camera positions and algorithm results.

Table 1. Observed azimuth angles for each observing platform.

Camera ID   Directions to other cameras (degrees)
1           110.83, 196.64, 212.80, 274.89
2           221.41, 282.07, 290.04, 179.49
3           144.76, 228.17, 295.65, 319.15, 39.09
4           260.59, 278.51, 312.00, 323.09, 0.48
5           349.12, 0.00, 15.51, 42.75, 48.58, 96.83
6           130.79, 180.00, 338.52, 33.66, 115.74
7           103.40, 138.63, 158.21, 169.02, 95.16

Table 2. Angles between cameras that are used in the algorithm.

Camera ID   Observed angles (degrees)
1           16.16, 62.09
2           41.92, 60.66
3           67.48, 23.50, 79.95, 105.67, 83.40
4           37.39, 44.58
5           10.88, 48.58, 15.51, 48.25
6           158.52, 146.34, 55.14, 64.26, 137.22
7           10.81, 63.05, 19.57, 6, 35.24

Table 3. Estimated X, Y coordinate values.

Camera ID   X           Y
1           1.849722    0.565838
2           1.427011    1.421890
3           0.627623    0.772362
4           −0.159063   1.328027
5           0.000000    0.000000
6           1.000000    0.000000
7           1.954834    −0.375734



5. Conclusions

In this paper, we have proposed two related methods for localizing the cameras of a distributed omnidirectional vision system. The first solves the invisible camera problem, detecting baselines when cameras do not observe projections of the other cameras. The second then localizes the cameras by propagating triangle constraints, solving the identification problem. With respect to the solution of the invisible camera problem, the increase ratio of the reliability should be properly determined in the baseline estimation method, as described in Section 3. Although the discussion on this point is not yet sufficient, we have shown experimentally that the method can detect the baselines among the sensors without knowledge of the object correspondences. Further consideration of the increase ratio, as well as verification of the identification method, remains as future work. With respect to the solution of the identification problem, future work is to refine the algorithm and make it more efficient: the processing time increases exponentially as the number of cameras increases, and the algorithm currently operates on static snapshots of the other cameras. We consider that the developed methods can be used as fundamental techniques for multiple camera systems that observe a wide area for monitoring and recognizing human activities and for providing rich information through the computer network.

References

Adelson, E.H. and Bergen, E.H. 1991. The plenoptic function and the elements of early vision. In Computational Models of Visual Processing, M. Landy and J.A. Movshon (Eds.), MIT Press.
Aggarwal, J.K. and Cai, Q. 1997. Human motion analysis: A review. In Proc. IEEE Nonrigid and Articulated Motion Workshop, pp. 90–102.
Boyd, J., Hunter, E., Kelly, P., Tai, L., Phillips, C., and Jain, R. 1998. MPI-video infrastructure for dynamic environments. In Proc. IEEE Int. Conf. Multimedia Systems.
Collins, Lipton, and Kanade. 1999. A system for video surveillance and monitoring. In Proc. American Nuclear Society (ANS) Eighth International Topical Meeting on Robotics and Remote Systems, Pittsburgh, PA.
Eyevision, 2001. http://www.ri.cmu.edu/events/sb35/tksuperbowl.html
Hong, J. et al. 1991. Image-based homing. In Proc. Int. Conf. Robotics and Automation.
Ishiguro, H. 1997. Distributed vision system: A perceptual information infrastructure for robot navigation. In Proc. IJCAI, pp. 36–41.
Ishiguro, H. 1998. Development of low-cost compact omnidirectional vision sensors and their applications. In Proc. Int. Conf. Information Systems, Analysis and Synthesis, pp. 433–439.
Ishiguro, H. and Nishimura, T. 2001. VAMBAM: View and motion based aspect models for distributed omnidirectional vision systems. In Proc. Int. Joint Conf. Artificial Intelligence, pp. 1375–1380.
Ishiguro, H., Yamamoto, M., and Tsuji, S. 1992. Omni-directional stereo. IEEE Trans. PAMI, 14(2):257–262.
Jain, R. and Wakimoto, K. 1995. Multiple perspective interactive video. In Proc. Int. Conf. Multimedia Computing and Systems.
Kato, K., Ishiguro, H., and Barth, M. 1999. Identifying and localizing robots in a multi-robot system. In Proc. Int. Conf. Intelligent Robots and Systems, pp. 966–972.
Medioni, G., Cohen, I., Bremond, F., and Nevatia, R. 2001. Event detection and analysis from video streams. IEEE Trans. PAMI, 23(8):873–888.
Nayar, S.K. and Baker, S. 1997. Catadioptric image formation. In Proc. Image Understanding Workshop, pp. 1431–1437.
Rees, D.W. 1970. Panoramic television viewing system. United States Patent No. 3,505,465.
Sarachik, K. 1989. Characterizing an indoor environment with a mobile robot and uncalibrated stereo. In Proc. Int. Conf. Robotics and Automation, pp. 984–989.
Torr, P.H.S. and Murray, D.W. 1997. The development and comparison of robust methods for estimating the fundamental matrix. Int. J. Computer Vision, 24(3):271–300.
Vsam, 2001. http://www-2.cs.cmu.edu/~vsam/
Yagi, Y. and Kawato, S. 1990. Panoramic scene analysis with conic projection. In Proc. IROS.
Yamazawa, K., Yagi, Y., and Yachida, M. 1993. Omnidirectional imaging with hyperboloidal projection. In Proc. Int. Conf. Robots and Systems.