Parallel Algorithms for the Interpretation of Line Drawings
N. Dendris* & P. Parodi**
* Department of Computer Engineering and Informatics, University of Patras, P.O. Box 1045, Patras 26110, Greece
** Dipartimento di Fisica, Università di Genova, Via Dodecaneso 33, 16146 Genova, Italy
Abstract
This paper describes the design and analysis of parallel algorithms for the interpretation of planar or perspective projections of scenes constructed by some generating grammar (trihedral scenes, Manhattan world, Origami scenes, Pottery world). The motivation is twofold: parallel techniques are increasingly studied in Artificial Intelligence and Computer Vision as parallel machines improve in performance, and parallel models have been successfully proposed for biological systems. It is therefore possible to attempt a methodological identification between algorithms which are easily parallelizable and biologically plausible algorithms. It is found that the labeling problem (Huffman, 1970; Clowes, 1970) is efficiently solvable in parallel when it is polynomially tractable. The realizability problem, which is usually solved by linear programming methods, is hard to parallelize in the general case, while it is parallelizable when the vanishing points are known. All the results point to the importance of heavily exploiting geometrical regularities to devise biologically plausible visual algorithms.
1 Introduction
In this paper, we describe the design of parallel algorithms for the interpretation of planar or perspective projections of scenes constructed by some generating grammar (trihedral scenes, Manhattan world, Origami scenes, Pottery world). The only information we can count on is a line drawing, which records the projection of the visible edges (sometimes also the hidden edges) of the 3D objects. Line drawings have been intensively studied since the 1970s. The fundamental step in the interpretation of a line drawing is thought to be the consistent labeling of its segments as projections of convex, concave, or contour edges. Labeling was introduced by Huffman [1] and Clowes [2], who focused on the case of trihedral scenes. Their results have been extended to line drawings with shadows and cracks (Waltz [3]), line drawings of Origami scenes (Kanade [4]), trihedral scenes with hidden edges (Sugihara [5]), and curved objects (Malik [6]). Sugihara [7] gave a polynomial time algorithm which checks, by reduction to Linear Programming, whether a labeled line drawing representing a polyhedral scene is realizable. It is worth noticing that our brain is usually able to reconstruct the 3D shape of an object from its line drawing, and it does so with no apparent effort. Any algorithm designed for interpreting line drawings must be compared with the efficiency of the brain. For this reason, complexity analysis has acquired growing importance in the design of biologically plausible algorithms (see Tsotsos [8] for a general discussion), and a number of results have been obtained on the classification of visual problems as tractable (that is, polynomially solvable) or intractable (that is, NP-complete, if not provably nonpolynomial). One is tempted to identify polynomially solvable problems with those problems which are solved efficiently by the brain. This cannot always be true: on the one hand, the class P of polynomially solvable problems is too large, in that it includes algorithms of complexity $O(n^k)$ with $k$ arbitrarily large; on the other hand, the complexity of an algorithm is only an asymptotic estimate of its time requirements, which may not be meaningful when the input size does not exceed a certain range. Nevertheless, this identification can have a methodological utility. It inspires the following strategy to analyze a problem: (i) consider a problem which is known to be efficiently solved by the human brain (e.g., 3D reconstruction from a photograph); (ii) analyze its complexity; (iii) if the problem is found to be intractable, find constraints and regularities which make the problem tractable; (iv) if such conditions are found, generate the conjecture that these constraints and regularities are indeed used by the brain; (v) discuss the plausibility of the conjecture (are such regularities usually found in natural instances of the problem?). Examples of the use of this methodology can be found in [9], [10], [11], [12], [13].
A number of results on the complexity of understanding line drawings have been obtained in the past. Kirousis & Papadimitriou [9] showed that the labelability and the realizability problems are both NP-complete. They also gave a linear time algorithm for labeling planar projections of Manhattan scenes (in these, all edges in the scene are oriented according to one of three orthogonal directions). Kirousis [10] gave a linear time algorithm that labels line drawings of trihedral scenes, under the hypothesis that for every Y-junction we know at least one segment which is not the projection of an occluding edge. Also, Alevizos [14] gave a linear time algorithm that labels line drawings of trihedral scenes using the information provided by hidden edges. Dendris et al. [11] described a linear time labeling algorithm for line drawings of scenes made up of curved objects, derived by revolution of line segments around arbitrary axes. Parodi & Torre [12] gave a polynomial time algorithm to label perspective projections of trihedral scenes once the location of the vanishing points in the image plane is given. Parodi [13] analyzed the complexity of labeling Origami scenes. In summary, the identification of polynomially solvable problems with biologically plausible algorithms may help considerably in finding the clues that the brain uses to solve a visual problem.
In this paper we would like to push this identification a little further. It is a general conviction that biological systems exploit massive parallelism (see for example Feldman & Ballard [15]), so it is important to ask whether a given algorithm which solves a visual problem is parallelizable. We will exploit the theory of parallel complexity, and as a first approximation we will use the PRAM (Parallel Random Access Machine) as the model of computation (see Section A). The PRAM model need not be biologically realistic, but it is a powerful model for several reasons (see JáJá [16]): (i) the PRAM model removes algorithmic details concerning synchronization and communication, and therefore allows the algorithm designer to focus on structural properties of the problem; (ii) the PRAM model captures several important parameters of parallel computation; (iii) many algorithms for more realistic models can be easily derived from PRAM algorithms; furthermore, PRAM algorithms can be mapped efficiently onto several bounded-degree networks. We then propose the following methodological identification:
efficient parallel algorithms = biologically plausible algorithms
where efficient algorithms are those which run in polylogarithmic time using a polynomial number of processors. As will be shown in this paper, the labeling problem can be solved by efficient parallel algorithms in all cases in which it has been found to be solvable by a polynomial algorithm. On the other hand, the realizability of a labeled line drawing is not always parallelizable unless some geometric information is present. As in the sequential case, geometrical regularities play a fundamental role in biological plausibility.
Some algorithms presented in this paper had already been parallelized in the past (Manhattan world [9], trihedral scenes with a connecting segment for each Y-junction [10], Pottery world [11]); others had not (trihedral scenes with vanishing points, Origami scenes, trihedral scenes with hidden edges, general polyhedral scenes with hidden edges and vanishing points, the realizability problem with the information on vanishing points).
2 Labeling line drawings
A line drawing L (see Fig. 1) is a graph which is interpreted as the orthographic or perspective projection of a 3D scene constructed according to some generation rules (e.g., trihedral scenes, Origami scenes, Pottery world). Since orthographic projection can be seen as a special case of perspective projection, in the following we will always assume that we are dealing with perspective projection. A line drawing may record only the visible edges of the scene, or it may also contain information on hidden edges, possibly drawn differently from visible edges (as in [5], [14]; see Fig. 1D). Finally, the scene is assumed to be in general position: a slight change in the viewpoint does not change the topology of the line drawing.
Figure 1 near here
The classical approach to interpreting line drawings is by subdivision into two steps: labeling and realizability. The first step (labeling) gives a qualitative description of the line drawing in terms of convexity, concavity and occlusion; the second step (realizability) answers the question of whether there exists a scene that generates the line drawing under examination, and reconstructs its 3D shape if possible.
2.1 The labeling problem
Labeling a line drawing means assigning to every segment a label describing the 3D properties of the edge which is projected into the segment. A segment labeled "+" (respectively, "−") is the projection of a visible convex (respectively, concave) edge, while a segment labeled "→" is the projection of an occluding edge, along which there is only one visible face (the one to the right of the arrow). In the case of curved objects another label ("→→") is introduced, which means that the segment is the projection of a limb, along which the surface changes its orientation smoothly (and the visible part remains on the right). In order for a junction to be realizable as the projection of a 3D vertex, its segments must be labeled according to some accepted catalog (e.g., the catalog shown in Fig. 2 corresponds to Origami and trihedral scenes).
Figure 2 near here
Labeling a line drawing is a combinatorial problem. There are two standard approaches to labeling: the linguistic approach, dating back to Huffman [1] and Clowes [2], and the constraint satisfaction approach, dating back to Waltz [3] and Mackworth [17].
2.1.1 Labeling as Satisfiability of a boolean expression
In the linguistic approach, the labeling problem is reduced to the satisfiability problem (SAT) of a boolean proposition in conjunctive normal form (CNF) associated with the line drawing. Recall that a CNF boolean proposition $F$ on the boolean variables $x_1, \dots, x_n$ is the logical product of logical sums (clauses) of the variables, as for example:
$F = (x_1 + x_3 + x_4) \cdot (x_2 + x_1) \cdot x_5 \cdot (x_1 + x_4)$
SAT is the problem of determining whether there is an assignment of values (0/1) to $x_1, \dots, x_n$ such that the proposition is true. Observe that SAT is NP-complete, so this is not an efficient method for solving the labeling problem in the general case. In many cases, however, it is possible to reduce the labeling problem to 2-SAT (the satisfiability problem for a boolean proposition in conjunctive normal form with no more than two literals per clause). 2-SAT is known to be polynomially solvable, and also to be efficiently solvable in parallel (see Karp & Wigderson [18]). The best upper bound obtained so far is $O(\log^2 N)$ time with $O(N^4)$ processors (see Cook & Luby [19]) on the EREW-PRAM model (see Section A), where $N$ is the number of variables.
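To illustrate why 2-SAT is tractable, the following minimal sequential sketch decides satisfiability through the standard implication-graph construction: each clause (a OR b) yields the implications (NOT a → b) and (NOT b → a), and the formula is satisfiable exactly when no variable lies in the same strongly connected component as its negation. This is not the parallel algorithm of [18] or [19]; the function name `solve_2sat` and the encoding of literals as signed integers are assumptions made for this example only.

```python
def solve_2sat(num_vars, clauses):
    """Return a satisfying assignment as a list of bools, or None.

    Literals are nonzero ints: +i means x_i, -i means NOT x_i (1-based).
    Each clause is a pair (a, b) meaning (a OR b); use (a, a) for a unit clause.
    """
    def node(lit):
        # x_i maps to node 2*(i-1), NOT x_i maps to node 2*(i-1)+1
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]
    for a, b in clauses:
        # (a OR b) is equivalent to (NOT a -> b) and (NOT b -> a)
        for src, dst in ((node(-a), node(b)), (node(-b), node(a))):
            graph[src].append(dst)
            rev[dst].append(src)

    # Kosaraju, pass 1: record vertices in order of DFS completion (iterative DFS)
    order, seen = [], [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, 0)]
        while stack:
            v, i = stack.pop()
            if i < len(graph[v]):
                stack.append((v, i + 1))
                w = graph[v][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(v)

    # Kosaraju, pass 2: explore the reversed graph in decreasing completion order.
    # Components are numbered in topological order of the implication graph.
    comp = [-1] * n
    label = 0
    for start in reversed(order):
        if comp[start] != -1:
            continue
        comp[start] = label
        stack = [start]
        while stack:
            v = stack.pop()
            for w in rev[v]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1

    assignment = []
    for i in range(num_vars):
        pos, neg = comp[2 * i], comp[2 * i + 1]
        if pos == neg:
            return None  # x_i and NOT x_i imply each other: unsatisfiable
        # A literal is set true when it comes later in topological order
        assignment.append(pos > neg)
    return assignment
```

For instance, `solve_2sat(3, [(1, 2), (-1, 2), (-2, 3)])` returns some assignment satisfying all three clauses, while `solve_2sat(1, [(1, 1), (-1, -1)])` returns `None`.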
2.1.2 Labeling as Constraint Satisfaction
In the constraint satisfaction approach, the labeling problem is seen as a special case of the problem of assigning values to a given set of variables, so that the assigned values satisfy a given set of constraints (Constraint Satisfaction Problem, CSP). A more formal definition is the following. A set of $n$ variables $X_1, \dots, X_n$ is given along with their domains $D_1, \dots, D_n$ of maximum cardinality $a$. Furthermore, we have a set of $m$ constraint relations. Each constraint relation is a subset of a Cartesian product of the form $D_{j_1} \times \dots \times D_{j_k}$, where $j_1, \dots, j_k$ is a sequence of distinct integers from the set $\{1, \dots, n\}$ having length (arity) $k$ ($2 \le k \le n$). Given variables, domains and constraint relations as above, find all $n$-tuples $(d_1, \dots, d_n) \in D_1 \times \dots \times D_n$ such that all constraint relations are satisfied. The decision version of CSP (the one in which one is only interested in knowing whether there is any $n$-tuple as above) is NP-complete, and can be addressed by backtracking methods. To limit the combinatorial explosion, relaxation algorithms are usually applied as a pre-processing step before backtrack search, in order to restrict the domain of permissible values for each variable so that: (i) no global assignment satisfying all constraints is lost, and (ii) any assignment of values to a fixed number of variables can be consistently extended to include an arbitrary additional variable. As an example, Arc Consistency requires that for every value in the domain of a variable, each of the other variables has at least one value which is consistent with it. Arc Consistency can be carried out in time proportional to the number of constraints (see Mackworth & Freuder [20] for a perspective on the work in the area). $k$-Consistency (consistency of sets of $k + 1$ variables) can be achieved in time $O(n^k)$, where $n$ is the number of constraints. Unfortunately, relaxation algorithms are inherently sequential: Kasif [21] has proved that Arc Consistency and, more generally, $k$-Consistency are P-complete. On the positive side, it has been shown by Kirousis [22] that if the constraints in the network are all of the implicational type (a constraint relation $R$ is called implicational if for any two variables $X_i$ and $X_j$ constrained by $R$ and any value $a$ from the domain of $X_i$, either $a$ and $R$ uniquely determine the value of $X_j$, or bear no implication on it), a global assignment satisfying all constraints can be found by an efficient parallel algorithm without any relaxation algorithm as a pre-processing step.
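The Arc Consistency pre-processing described above can be sketched for binary constraints as follows. This is a version of the well-known AC-3 procedure, chosen for brevity; it is sequential and makes no attempt to match the best known time bounds. The names `revise` and `ac3`, and the representation of each constraint as a set of allowed value pairs, are assumptions made for this example.

```python
from collections import deque


def revise(domains, allowed, x, y):
    """Remove the values of x that have no supporting value in y.

    Returns True if the domain of x was changed.
    """
    pairs = allowed[(x, y)]
    unsupported = {a for a in domains[x]
                   if not any((a, b) in pairs for b in domains[y])}
    domains[x] -= unsupported
    return bool(unsupported)


def ac3(domains, constraints):
    """Enforce arc consistency in place; return False if a domain becomes empty.

    domains: dict mapping each variable to a set of values.
    constraints: dict mapping (x, y) to the set of allowed (value_x, value_y)
    pairs. Assumption: each unordered pair of variables appears at most once.
    """
    # Store both orientations of every binary constraint
    allowed = {}
    for (x, y), pairs in constraints.items():
        allowed[(x, y)] = set(pairs)
        allowed[(y, x)] = {(b, a) for (a, b) in pairs}

    queue = deque(allowed)
    while queue:
        x, y = queue.popleft()
        if revise(domains, allowed, x, y):
            if not domains[x]:
                return False
            # The domain of x shrank: re-examine every arc pointing to x
            queue.extend((z, w) for (z, w) in allowed if w == x and z != y)
    return True
```

As a usage example, with three variables over {1, 2, 3} and the constraints X < Y and Y < Z, the procedure prunes each domain to a single value, so no backtracking is needed afterwards.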
As we will see in the next section, there are many cases in Vision in which line labeling can be reduced to an instance of the Implicational Constraint Satisfaction Problem (ICSP). The parallel algorithm for ICSP takes time $O(\log^3 n)$ with $O((m + n^3)/\log n)$ processors on an EREW-PRAM, where $n$ is the number of variables and $m$ is the number of constraints.
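The definition of an implicational constraint can be checked mechanically when a binary relation is given extensionally. The sketch below encodes one reading of that definition, under the assumption that a value "bears no implication" on the other variable when it is compatible with every value in the other domain, and that a value with no compatible partner violates the property. The function name `is_implicational` and the data layout are hypothetical.

```python
def is_implicational(relation, dom_i, dom_j):
    """Check one reading of the implicational property for a binary relation.

    relation: set of allowed pairs (a, b) with a in dom_i and b in dom_j.
    Assumption: each value must be compatible with exactly one value of the
    other domain (it determines that value uniquely) or with all of them
    (it bears no implication). The test is applied in both directions.
    """
    def one_way(pairs, src, dst):
        for a in src:
            partners = {b for (a2, b) in pairs if a2 == a and b in dst}
            if len(partners) != 1 and partners != set(dst):
                return False
        return True

    flipped = {(b, a) for (a, b) in relation}
    return one_way(relation, dom_i, dom_j) and one_way(flipped, dom_j, dom_i)
```

Under this reading, a two-literal clause such as (x OR y) over the domain {0, 1} is implicational: x = 0 forces y = 1, while x = 1 leaves y free. A "not equal" constraint over a three-valued domain is not, since each value is compatible with two of the three values.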
2.1.3 Tractable labeling problems
The general labeling problem is NP-complete, but there are several interesting cases in which the problem becomes polynomially solvable ([9, 10, 11, 12, 13]). It is an interesting fact that in all of these cases it is also possible to give an efficient parallel algorithm for labeling. The reason is that, once certain geometrical constraints are satisfied (e.g., all edges are oriented according to one of three orthogonal directions), it is very often the case that some unique propagation rules can be found for the labels, and this allows the constraints to be put in the framework of implicational constraints or 2-SAT. In the following section we will see several examples.
3 Reducing the labeling problem to 2-SAT and ICSP
The general labeling problem is NP-complete, but there are several interesting cases in which the problem becomes polynomially solvable ([9, 10, 11, 12, 13]). It is an interesting fact that in all of these cases it is also possible to give an efficient parallel algorithm for labeling. The reason is that if certain geometrical constraints are satisfied (e.g., all edges are oriented according to one of three orthogonal directions), some "unique propagation rules" for the labels can be found, allowing the constraints to be put in the framework of ICSP or 2-SAT. Since 2-SAT and ICSP can be efficiently solved in parallel, the main problem which remains to be solved is finding an efficient parallel procedure for reducing the labeling problem to 2-SAT or ICSP. In the following sections, we will show some examples of such reductions.
3.1 Trihedral scenes given the vanishing points; Manhattan world
The scene is assumed to be composed of solid (no hanging faces or edges), trihedral (three faces lying on three different planes meeting at every vertex; no cracks or touching polyhedra) and opaque polyhedra. The line drawing has $N$ segments. Since all junctions must have degree two or three, the number of junctions is $O(N)$. Under these hypotheses, the labeling problem was found to be NP-complete by Kirousis & Papadimitriou [9].
Here it is also assumed that a set of $n$ vanishing points is given, whose location in the image plane is known. As a consequence, it is possible to reduce the number of ways in which the junctions can be labeled (e.g., it is possible to determine whether the middle segment of an E-junction is convex ("+") or concave ("−")). As a consequence of the general position assumption, no segment of the line drawing can lie on the horizon line defined by two distinct vanishing points $VP_i$ and $VP_j$. It is useful to introduce a sharper classification of junctions. These are naturally classified, according to their geometrical shape, as Y, E, L, T; Y's are further classified as Y(+)'s (Y(−)'s) according to whether a connecting segment (that is, one that must be labeled "+" or "−") must be labeled "+" ("−"). E's are classified as E(+)'s (E(−)'s) according to whether the middle segment is to be labeled "+" ("−"). L's are classified as L(>)'s (L(