
SOFTWARE—PRACTICE AND EXPERIENCE Softw. Pract. Exper. 2002; 32:1065–1098 (DOI: 10.1002/spe.474)

Drawing database schemas

Giuseppe Di Battista1, Walter Didimo2, Maurizio Patrignani1 and Maurizio Pizzonia1,∗,†

1 Dipartimento di Informatica e Automazione, Università di Roma Tre, Via della Vasca Navale 79, 00146 Roma, Italy
2 Dipartimento di Ingegneria Elettronica e dell’Informazione, Università di Perugia, Via G. Duranti 93, 06125 Perugia, Italy

SUMMARY

A large number of practical applications would benefit from automatically generated graphical representations of database schemas, in which tables are represented by boxes and table attributes correspond to distinct stripes inside each table. Links, connecting attributes of two different tables, represent referential constraints or join relationships, and may attach arbitrarily to the left- or right-hand side of the stripes representing the attributes. To our knowledge, no drawing technique is available to automatically produce diagrams in such a strongly constrained drawing convention. In this paper we provide a polynomial time algorithm for solving this problem, and test its efficiency and effectiveness against a large test suite. We also describe an implementation of a system that uses this algorithm and discuss the main methodological problems we faced in developing such a technology. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: graph drawing; algorithm engineering; drawing standard; orthogonal drawing; database schema visualization; upward drawing

INTRODUCTION

In order to design, maintain, update, and query databases, users and administrators cope with the complexity of the database schemas describing the structure of the data. A graphical representation of such schemas greatly improves the friendliness of a database application and is essential for producing high-quality documentation. For this reason many commercial tools provide some diagramming facility (see Figure 1 for an example). Generally, such facilities rely on the user’s skills for producing readable and effective diagrams.

∗ Correspondence to: Maurizio Pizzonia, Dipartimento di Informatica e Automazione, Università di Roma Tre, Via della Vasca Navale 79, 00146 Roma, Italy.
† E-mail: [email protected]

Contract/grant sponsor: European Commission—Fet Open project; contract/grant number: COSIN IST-2001-33555


Received 8 December 2000 Revised 12 November 2001 Accepted 15 April 2002


Figure 1. A screen snapshot of Microsoft Access. The example is taken from a real-life application. Boxes represent tables and lines represent referential integrity constraints between attributes.

However, drawing diagrams by hand is time consuming and the aesthetic results are often unsatisfactory. Furthermore, special attention is needed in order to keep the graphical documentation consistent with an evolving system.

Unfortunately, to our knowledge, no drawing technique is available to automatically produce high-quality diagrams representing database schemas. A reason may be that such diagrams are strongly constrained: (i) each table of the database schema is usually represented as a box composed of a vertically ordered sequence of attributes, with the name of the table at the top; (ii) edges, representing constraints or join paths between tables, link attributes of different tables; (iii) edges may attach arbitrarily to the left- or to the right-hand side of the boxes and should be incident to the box at the level of the corresponding attribute name.

So far, although the relationship between the database research area and the graph drawing one is strong, the interest of researchers has mainly focused on the visualization of Entity–Relationship diagrams and Data–Flow diagrams, which are relatively simpler to draw automatically than database schema diagrams (see, e.g., [1–3]).

This paper deals with the automatic generation of diagrams of relational database schemas. To produce these diagrams we formulate a constrained orthogonal graph drawing problem, and we address it within the topology–shape–metrics approach [3–5], showing how this approach can be tailored to take into account the complex constraints originating from this type of diagram. The main results we present are the following.


• We describe a polynomial time algorithm for drawing diagrams of relational database schemas. The algorithm relies on several variations of existing graph drawing techniques, giving new insights into their practical applicability.
• We present a system that implements the proposed algorithm. Its architecture provides several kinds of interfaces that allow end-users and developers with different levels of expertise to access the proposed graph drawing technology. We also discuss the methodological motivations that led us to choose such an architecture. Furthermore, the efficiency and effectiveness of the system are verified by performing experimental tests with a randomly generated test suite.

PROBLEM DESCRIPTION

The relational model

Most current database management systems are based on the relational model, introduced in [6]. The relational model is based on the intuitive concept of a table. For our visualization purposes a table with n columns is a set of ordered n-tuples, such that values in the same position of any two tuples have the same data type. Each column of the table is identified by a name, called the attribute, which is unique within the table, and the table itself has a name. The name of the table together with the ordered sequence of its attributes is called the table schema, while the set of tuples in a table is referred to as the table instance. A database schema is a set of table schemas with distinct table names, and a database instance is a set of table instances consistent with a certain database schema.

A key is a set of attributes of a table that unambiguously identifies its tuples. Namely, no two tuples of the table have the same values on the attributes of the key. Each table schema is provided with a key.

For a given application some database instances may represent meaningless information. To maintain data consistency, commonly used database management systems define several types of relationships and/or constraints between tables (see [7] for a comprehensive survey). Each of them takes the form (T1, A, T2, B), where T1 and T2 are table schemas and A and B are subsets of attributes of T1 and T2, respectively. Furthermore, A and B have the same cardinality and pairwise have the same data type. The most used relationships and constraints are the join relationships and the referential integrity constraints, respectively. They are defined as follows.

• A join relationship of the form (T1, A, T2, B) states that there is a frequently used join operation between T1 and T2, involving the subsets of attributes A and B. Database management systems also allow one to specify the ‘behavior’ of the join. For example, a user can say that the join should consider only the tuples that have the same values on the joined attributes (natural join). Otherwise, a user can say that the join should consider at least one tuple for each tuple of T1, possibly with null values for the attributes of T2, and only those tuples from T2 that match at least one tuple in T1 (left outer join). Refer to [7] for further details.
• Given a table schema T, a set A of attributes of T, and a tuple t of an instance of T, we denote by t(A) the subtuple of t restricted to the attributes in A. A referential integrity constraint of the form (T1, X, T2, K), where K is a key of table T2, is satisfied if, for each tuple t1 in the

Figure 2. A DBS-DRAWING of a database schema with 10 tables and 14 links. Links represent referential integrity constraints.

instance of T1, there exists a tuple t2 in the instance of T2 such that t1(X) is equal to t2(K). Referential integrity constraints are also called foreign keys.

Drawing conventions for database schemas

Our purpose is to visualize a database schema in terms of table schemas, join relationships, and referential integrity constraints. Both join relationships and referential integrity constraints of the form (T1, A, T2, B) are simply called links, and the pairs (T1, A) and (T2, B) are the extremes of the link. To simplify the terminology, we also say that a link (T1, A, T2, B) is incident on all the attributes in A and B.

The drawing convention we define is inspired by the graphical representations adopted in commonly used systems for handling databases. We also enforce graphical constraints to improve the readability of the drawing. In particular, we do not allow tables to overlap or links to traverse tables.

Our drawing convention is defined as follows. For the sake of simplicity, we define such a convention in the case where all links have the cardinality of A (and B) equal to one. It is not difficult to remove such a restriction by adding suitable graphical attributes in a post-processing step.

• Each table schema is represented as a box and its attributes are sequentially listed in the box, with each attribute corresponding to a horizontal stripe. We suppose that the vertical order of the attributes of a table schema is given and that the drawing must preserve such an ordering. This feature allows the user to rank the attributes in order of ‘importance’ or to put ‘related’


attributes close together. The stripe at the top of each box is reserved for the table name. All the stripes have the same height. Two tables cannot overlap.
• Each link (T1, A, T2, B) is represented as a polygonal line p between the boxes of the two table schemas T1 and T2. The segments composing p are either horizontal or vertical (orthogonal standard); p is horizontally incident on the stripes associated with the attributes in A and B. A link cannot overlap any table. The only allowed overlaps between links are the crossings between a horizontal segment and a vertical segment belonging to distinct links.

We call DBS-DRAWING (DataBase Schema-drawing) a drawing of a database schema that respects the above convention. An example of a DBS-DRAWING is depicted in Figure 2. A DBS-DRAWING can be easily refined with additional graphical attributes to make it more informative. However, the algorithm we describe for computing DBS-DRAWINGS is independent of such graphical attributes, which can be introduced in a post-processing step.

The algorithm we propose for computing DBS-DRAWINGS uses several concepts and techniques from the graph drawing field. In the next section we recall the basic definitions needed to describe the working of the algorithm.

GRAPH DRAWING BACKGROUND

We use graph drawing techniques to compute DBS-DRAWINGS, in such a way that vertices and edges of the graph represent tables and links of the database schema, respectively. Special care is devoted to the handling of the peculiar graphical constraints of the DBS-DRAWINGS. The basic graph drawing definitions needed to describe the algorithm for computing DBS-DRAWINGS are given in the following. See [8] for elementary graph theory and connectivity concepts.
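The DBS-DRAWING convention defined in the problem description can be checked mechanically on a candidate drawing. The following is a minimal sketch, not part of the system described in this paper: the class `TableBox`, the functions `is_orthogonal` and `respects_convention`, the unit stripe height, the downward-growing y axis, and the choice of attaching links at the midline of a stripe are all illustrative assumptions. The check covers only the orthogonality and incidence rules for links with one attribute per extreme; it does not test overlaps between links and tables.

```python
from dataclasses import dataclass

STRIPE_HEIGHT = 1.0  # assumed unit; the convention only requires equal stripe heights


@dataclass
class TableBox:
    """Hypothetical box for one table schema; y grows downward (assumption)."""
    name: str
    attributes: list  # given vertical order, which the drawing must preserve
    x: float          # left side of the box
    y: float          # top side of the box
    width: float

    def attach_y(self, attribute):
        # Stripe 0 is reserved for the table name, so attribute i uses stripe i + 1.
        stripe = self.attributes.index(attribute) + 1
        return self.y + (stripe + 0.5) * STRIPE_HEIGHT

    def attach_points(self, attribute):
        # A link may attach to the left-hand or to the right-hand side of the stripe.
        y = self.attach_y(attribute)
        return {(self.x, y), (self.x + self.width, y)}


def is_orthogonal(polyline):
    """True if every segment of the polyline is horizontal or vertical."""
    return all(x1 == x2 or y1 == y2
               for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]))


def respects_convention(polyline, box_a, attr_a, box_b, attr_b):
    """Check orthogonality and horizontal incidence at the two attribute stripes."""
    if len(polyline) < 2 or not is_orthogonal(polyline):
        return False
    starts_ok = polyline[0] in box_a.attach_points(attr_a)
    ends_ok = polyline[-1] in box_b.attach_points(attr_b)
    first_horizontal = polyline[0][1] == polyline[1][1]
    last_horizontal = polyline[-1][1] == polyline[-2][1]
    return starts_ok and ends_ok and first_horizontal and last_horizontal
```

For instance, with a box at x = 0 of width 4 and a second box at x = 10, a link leaving the right side of the first attribute stripe of the first box and entering the left side of the third attribute stripe of the second box passes the check, while a straight diagonal segment between the same two points does not.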
Planarity

A drawing Γ of a graph G maps each vertex of G onto a distinct point of the plane, and each edge of G onto a simple Jordan curve between the two points associated with the end-vertices of the edge; Γ is planar if no two distinct edges intersect. A graph is planar if it admits a planar drawing. A planar drawing Γ of G induces, for each vertex v of G, a circular clockwise ordering of the edges incident on v. Also, Γ subdivides the plane into topologically connected regions, called faces. Exactly one of these faces is unbounded; it is called the external face. The other faces are said to be internal.

Two planar drawings of G are said to be equivalent if (i) for each vertex v of G they induce the same ordering of the edges around v, and (ii) they have the same external face. Note that two equivalent drawings of G have the same set of faces. An embedding of G is a class of equivalent planar drawings of G. In other words, we can regard an embedding of G as the choice of a clockwise ordering of the edges around every vertex plus the choice of the external face. An embedded graph is a planar graph with a given embedding.

Quasi-upward drawings

In the following we call digraph a directed graph. Let G be an embedded digraph. A vertex v of G is bimodal if the circular list of the edges incident on v can be partitioned into two (possibly empty) linear lists of edges, one consisting of the incoming edges and the other consisting of the outgoing


Figure 3. (a) An upward planar drawing; (b) a quasi-upward planar drawing with four bends. The dashed lines indicate the tangents to the bend-points.

edges. An embedding is bimodal if every vertex is bimodal. A planar graph is said to be bimodal if it admits a planar bimodal embedding.

A drawing of G such that all the edges are curves monotonically increasing in a given direction is known in the literature as an upward drawing. Figure 3(a) shows an example of an upward planar drawing in the left–right direction. Acyclicity and bimodality are necessary conditions for the existence of an upward planar drawing of an embedded digraph [9]. However, these conditions are not sufficient. A polynomial time algorithm to test the existence of upward planar drawings of a planar embedded digraph is given in [9]. The problem is NP-complete in a variable embedding setting [10].

The quasi-upward drawing convention extends the upward drawing convention [11]. A quasi-upward drawing in the left–right direction of a digraph is such that the vertical line through each vertex v ‘locally’ splits the incoming edges from the outgoing edges of v. The term locally is used to identify a sufficiently small connected region properly containing v. A bend of a quasi-upward drawing in the left–right direction is a point on an edge such that the vertical line through this point is tangential to the edge. Intuitively, a bend is a point in which an edge inverts its left–right direction. In Figure 3(b) a quasi-upward planar drawing with four bends is shown.

In [11] it is proven that a quasi-upward planar drawing of a digraph exists if and only if the digraph is planar bimodal, and a polynomial time algorithm for computing quasi-upward planar drawings with the minimum number of bends of an embedded bimodal digraph is described. Such an algorithm is one of the building blocks used for computing DBS-DRAWINGS.

Orthogonal drawings

An orthogonal drawing of a graph is a drawing such that all edges are represented as chains of horizontal and vertical segments.
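Returning briefly to the bimodality condition used for quasi-upward drawings, it can be tested directly from the circular lists of an embedding. The sketch below is only an illustration under assumed data structures: the function names and the encoding of each incident edge as the string "in" or "out" are hypothetical. It relies on the observation that a circular list splits into one run of incoming edges and one run of outgoing edges exactly when the direction changes at most twice along the cycle.

```python
def is_bimodal(circular_directions):
    """circular_directions: "in"/"out" labels of the edges around one vertex,
    listed in their circular order (assumed encoding)."""
    n = len(circular_directions)
    # Count positions where the label differs from its cyclic successor.
    changes = sum(
        1 for i in range(n)
        if circular_directions[i] != circular_directions[(i + 1) % n]
    )
    return changes <= 2


def is_bimodal_embedding(embedding):
    """embedding: dict mapping each vertex to its circular list of labels."""
    return all(is_bimodal(labels) for labels in embedding.values())
```

A vertex whose edges alternate as in, out, in, out has four direction changes and is therefore not bimodal, whereas in, in, out, out has two and is bimodal.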


An orthogonal representation (or shape) of an embedded (planar) graph G is an equivalence class of planar orthogonal drawings such that the following hold:

• for each edge (u, v) of G, all the drawings of the class have the same sequence of left and right turns (bends) along (u, v), while moving from u to v;
• for each vertex v of G, and for each pair {e1, e2} of clockwise consecutive edges incident on v, all the drawings of the class determine the same angle between e1 and e2.

Roughly speaking, an orthogonal representation defines a class of planar orthogonal drawings that may differ only in the length of the segments of the edges.

Topology–shape–metrics

One of the most popular techniques for computing an orthogonal drawing of a graph G is the so-called topology–shape–metrics approach [3,4,12]. It consists of three consecutive steps.

Planarization. If G is planar, then an embedding for G is computed. If G is not planar, a set of dummy vertices is added to replace crossings.

Orthogonalization. During this step, an orthogonal representation H of G is computed within the previously computed embedding.

Compaction. In this step a final geometry for H is determined. Namely, coordinates are assigned to vertices and bends of H.

In each step one or more optimization goals are considered, which are related to well-known aesthetic criteria [3]. Namely, during Planarization, since each dummy vertex represents a crossing, the goal is the minimization of the number of inserted dummy vertices. During Orthogonalization the objective is to determine a shape with the minimum number of bends. Finally, during Compaction the goal is to minimize the total area of the drawing or the total edge length.

The distinct phases of the topology–shape–metrics approach have been extensively studied in the literature. If G is planar, which can be tested in linear time with one of the well-known algorithms in [13,14], an embedding φ of G is determined in linear time, by applying an embedding algorithm [15,16]. If G is not planar, the minimum number of dummy vertices introduced may be Ω(n^4). However, in practice this number is usually much smaller. Minimizing the number of crossings is in general NP-hard [17]. For a survey on planarization techniques see [3].

A popular algorithm for constructing an orthogonal representation of an embedded graph with vertices having at most four incident edges was presented in [4]. Such an algorithm computes an orthogonal representation that has the minimum number of bends within the given embedding. Extensions to general embedded graphs are provided in [5,18,19]. The problem in the variable embedding setting is NP-complete [10].

The problem of compacting an orthogonal representation minimizing the area or the total edge length of the drawing is NP-complete [20], but it can be optimally solved for particular classes of orthogonal representations [21,22].

The topology–shape–metrics approach deals with topology, shape, and geometry of the drawing separately, thus allowing us to address each aesthetic criterion in the appropriate step, steering clear of the complexity of a global optimization.
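The separation between shape and metrics can be made concrete on a single edge. The following sketch, under assumed conventions, extracts the sequence of left and right turns of an orthogonal polyline; the function name `turn_sequence`, the "L"/"R" encoding, and the use of a standard coordinate system with y growing upward are illustrative choices. It captures only the bend part of an orthogonal representation, not the angles between edges at a vertex.

```python
def turn_sequence(polyline):
    """Return the left/right turns met while walking along the polyline.

    Uses the sign of the cross product of consecutive segment directions;
    with y growing upward (assumption), a positive value is a left turn.
    """
    turns = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(polyline, polyline[1:], polyline[2:]):
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross > 0:
            turns.append("L")
        elif cross < 0:
            turns.append("R")
        # cross == 0 means the three points are collinear: no bend is recorded.
    return turns
```

Two drawings of an edge that differ only in segment lengths, such as (0,0), (2,0), (2,3), (5,3) and (0,0), (7,0), (7,1), (9,1), yield the same turn sequence and hence agree on this part of the shape; Compaction is free to choose between them.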


Orthogonal drawings with high degree vertices

Observe that in a planar orthogonal drawing, since each vertex is a point and since planarity does not allow distinct edges to overlap, each vertex can have degree at most four. Of course, this is a severe limitation for most applications. In order to orthogonally draw graphs of arbitrary vertex degree, different drawing conventions have been introduced in the literature. Here we recall the podevsnef (planar orthogonal drawing with equal vertex size and not empty faces) drawing convention, defined in [18]. A podevsnef drawing (see Figures 4(a) and 4(b)) is an orthogonal drawing with the following properties.

1. Segments representing edges cannot cross, with the exception that two segments that are incident on the same vertex may overlap. Observe that the angle between such segments has zero degree. Roughly speaking, a podevsnef drawing is ‘almost’ planar: it is planar everywhere but in the possible overlap of segments incident on the same vertex. Observe in Figure 4(b) the overlap of segments incident on vertices 1, 2, and 3.
2. All the polygons representing the faces have area strictly greater than zero.

Podevsnef drawings are usually visualized representing vertices as boxes with equal size and representing two overlapping segments as two very near segments. See Figure 4(c). In [18] an algorithm is presented that computes a podevsnef drawing of an embedded planar graph with the minimum number of bends. Furthermore, the authors conjecture that the drawing problem becomes NP-hard when condition 2 is omitted.

The podevsnef drawings generalize the concept of orthogonal representation, allowing angles between two edges incident on the same vertex to have a zero degree value. The consequence of the assumption that the polygons representing the faces have area strictly greater than zero is that the angles have specific constraints. Namely, because of conditions 1 and 2, each zero degree angle is in correspondence with exactly one bend [18]. An orthogonal representation corresponding to the above definition is a podevsnef orthogonal representation.

To compute DBS-DRAWINGS we exploit an extension of the podevsnef drawing convention. Such an extension has been introduced in [23] to deal with drawings in which the width and the height of each single vertex is assigned by the user, and it is referred to as podavsnef (planar orthogonal drawing with assigned vertex size and non-empty faces). A podavsnef drawing has the following properties (see also Figure 4(d)).

• Each vertex is represented by a box with its specific width and height; width and height are assigned to each single vertex by the user.
• Consider any side of length l ≥ 0 of a vertex v and consider the set I of arcs that are incident on that side.
  – If l + 1 ≥ |I| then the edges of I cannot overlap.
  – If l + 1 < |I| then the edges of I are partitioned into l + 1 non-empty subsets such that all the edges of the same subset overlap.
• The orthogonal representation constructed from a podavsnef drawing by contracting each vertex into a single point is a podevsnef orthogonal representation.
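The rule on the sides of a podavsnef vertex can be illustrated with a small sketch. The convention only states how many groups of overlapping edges a side must host; which edges share a group is not prescribed above, so the contiguous, balanced split used here is an assumption, as are the function name `side_groups` and the integer side length.

```python
def side_groups(edges, side_length):
    """Group the edges incident on one side of a vertex box.

    edges: edges in their order along the side (assumed input).
    side_length: integer length l >= 0 of the side, giving l + 1 attach slots.
    Edges in the same returned group overlap in the drawing.
    """
    slots = side_length + 1
    if len(edges) <= slots:
        # l + 1 >= |I|: every edge gets its own slot, so no overlaps occur.
        return [[e] for e in edges]
    # l + 1 < |I|: split into exactly l + 1 non-empty contiguous groups.
    base, extra = divmod(len(edges), slots)
    groups, start = [], 0
    for i in range(slots):
        size = base + (1 if i < extra else 0)
        groups.append(edges[start:start + size])
        start += size
    return groups
```

With five edges on a side of length 1, the sketch produces two groups of sizes three and two; with a side of length 0, all edges incident on that side fall into a single overlapping group, which matches the podevsnef case.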

Figure 4. (a) A planar graph and (b) one of its podevsnef drawings; (c) a more effective visualization of the podevsnef drawing in (b); (d) a podavsnef drawing with the same shape as the drawing in (a). The sizes of the vertices are specified in the table.

vertex   1  2  3  4  5  6  7
width    1  2  0  0  0  0  0
height   1  0  0  0  1  0  0

In [23] a polynomial time algorithm for computing podavsnef drawings of an embedded planar graph that have a minimum number of bends over a wide class of podavsnef drawings is also described.

DRAWING DATABASE SCHEMAS

In this section we describe a polynomial time algorithm for computing DBS-DRAWINGS. Here, we consider only the case in which, for each link (T1, A, T2, B), the sets A and B both have cardinality equal to one. In the general case, we just select one attribute for each of the two sets A and B. In fact, as mentioned in the section describing the drawing convention for database schemas, suitable graphical


attributes can be added to the drawing in a post-processing step in order to handle the general case. This section is organized as follows.

• We first state the optimization problems related to the computation of DBS-DRAWINGS. We also discuss the limits of existing graph drawing techniques, thus motivating the study of a specific algorithm.
• We give an overview of the algorithm for computing DBS-DRAWINGS, by providing the intuition behind each of its basic steps. After that, every step is described in detail.
• Finally, we analyze the time complexity of the algorithm.

Goals

The drawing algorithm has two main goals.

Goal 1. It must guarantee that the resulting drawings conform to the DBS-DRAWING drawing convention described for database schemas.

Goal 2. In order to produce highly readable drawings, it should take into account some relevant aesthetic criteria.

As shown in the graph drawing background section, the topology–shape–metrics approach allows us to compute drawings within the orthogonal drawing convention. Even though it is possible to apply such an approach directly to draw database schemas, the result does not accomplish Goal 1. In fact, a naive application of the topology–shape–metrics approach would have the following drawbacks.

• Planarization. The computed circular order of the links around a table T could be inconsistent with the attribute sequence specified for T. Figure 5 illustrates this problem.
• Orthogonalization. The edges could be incident on the top side or on the bottom side of the box representing a table, while the drawing convention allows edges to be incident on the left-hand side or right-hand side of a table only.
• Compaction. The edges may not be incident on the box representing T at the specific heights prescribed for the corresponding attributes. Figure 6 illustrates this problem.
Concerning Goal 2, in accordance with several experimental [24–26] and cognitive [26,27] works, we consider the following aesthetic criteria relevant for the readability of a DBS-DRAWING:

• number of crossings;
• number of bends;
• area of the bounding box;
• total edge length.

Unfortunately, as noted in [3], it is impossible in general to minimize all the above measures at the same time. Only tradeoffs among them can be pursued. Formal cognitive studies [26,27] show that, from a perceptual point of view, the minimization of the number of crossings is often the most important. Furthermore, the need to accomplish Goal 1 makes the optimization of each single aesthetic criterion more difficult, because several constraints have to be considered on top of the usual orthogonal drawing convention. In the next section we describe an algorithm to compute DBS-DRAWINGS that satisfies Goal 1 and Goal 2, especially concerning the minimization of the number of crossings.
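The four aesthetic criteria listed above can be measured on a finished drawing. The sketch below is an illustration only and makes several assumptions: links are given as orthogonal polylines (lists of points) without collinear intermediate points, the bounding box is computed over the link points alone (table boxes are ignored), and only crossings between a horizontal and a vertical segment of distinct links are counted, as allowed by the drawing convention. All function names are hypothetical.

```python
def segments(polyline):
    """Consecutive point pairs of a polyline."""
    return list(zip(polyline, polyline[1:]))


def is_horizontal(segment):
    (_, y1), (_, y2) = segment
    return y1 == y2


def proper_crossing(h_seg, v_seg):
    """True if a horizontal and a vertical segment cross at interior points."""
    (hx1, hy), (hx2, _) = h_seg
    (vx, vy1), (_, vy2) = v_seg
    return (min(hx1, hx2) < vx < max(hx1, hx2)
            and min(vy1, vy2) < hy < max(vy1, vy2))


def drawing_metrics(links):
    """Return crossings, bends, total edge length, and bounding-box area."""
    crossings = 0
    for i, first in enumerate(links):
        for second in links[i + 1:]:
            for s in segments(first):
                for t in segments(second):
                    if is_horizontal(s) and not is_horizontal(t):
                        crossings += proper_crossing(s, t)
                    elif is_horizontal(t) and not is_horizontal(s):
                        crossings += proper_crossing(t, s)
    # Every intermediate point of a polyline is assumed to be a bend.
    bends = sum(max(len(link) - 2, 0) for link in links)
    length = sum(abs(x2 - x1) + abs(y2 - y1)
                 for link in links for (x1, y1), (x2, y2) in segments(link))
    xs = [x for link in links for x, _ in link]
    ys = [y for link in links for _, y in link]
    area = (max(xs) - min(xs)) * (max(ys) - min(ys)) if xs else 0
    return {"crossings": crossings, "bends": bends,
            "edge_length": length, "area": area}
```

Such measurements make the tradeoff explicit: rerouting a link to remove a crossing typically adds bends or edge length, so the four values cannot in general be driven to their minima together.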

Figure 5. (a) A fragment of a schema. (b) Wrong embedding. A drawing of the schema constructed with the topology–shape–metrics approach. Observe that, in order to reduce the number of crossings, the algorithm may yield an ordering of the edges around table T1 which is not consistent with the given attribute sequence. (c) Correct embedding. A drawing within the convention.

Figure 6. (a) A fragment of a schema. (b) Wrong attach point. A drawing of the schema constructed with the topology–shape–metrics approach. Observe that, in order to reduce the length of the link, the algorithm attaches it in correspondence to the wrong attribute. (c) Correct attach point. A drawing within the convention.

The DBS-ALGORITHM

Let S be a database schema. The underlying graph GS of S is defined as follows. The vertices of GS are the tables of S; there is an edge in GS between tables T1 and T2 if there is a link in S involving T1 and T2. In what follows we assume that GS is always connected. If GS is not connected we apply the algorithm described below to every connected component, and then arrange all the obtained drawings on the plane by using a packing heuristic [28,29].

We call our algorithm DBS-ALGORITHM. It consists of four main steps (Figure 7), informally described below.

Constrained planarization. A planarization is performed on GS. The purpose of this step is to obtain a planar embedding of GS such that the order of the edges around each vertex vT, representing a table T, is consistent with the specific sequence of attributes of T (see Figure 5). The output is an embedded graph G′S where dummy vertices of degree four are introduced to replace crossings (cross-vertices). Each link of S is represented in G′S as an alternating chain of edges and cross-vertices (Figure 7(a)).

U-turns assignment. This step deals with the left-to-right development of the drawing. From this perspective the edges of the drawing are of two types: edges that monotonically follow the left-to-right direction and edges that perform one or more u-turns. A u-turn is a point where an edge changes its left-to-right orientation. A left-to-right shape is assigned to GS. A (possibly empty) sequence of u-turns is associated with each edge, trying to minimize their total number. A u-turn is represented with a particular kind of dummy vertex (u-vertex) of degree two (Figure 7(b)). The edges of the graph are directed according to the computed left-to-right development. We denote by DS the digraph produced by this step.

Orthogonalization. The result of this step is an orthogonal representation H of GS. H is obtained from DS by applying the transformation patterns depicted in Figure 8. For each vertex a pattern is selected according to its type. Each vertex may represent a table, may be a cross-vertex, or may be a u-vertex. Intuitively, each pattern describes the part of H associated with a vertex in DS. After the appropriate pattern has been applied to each vertex, we remove the orientation of the edges and ‘absorb’ the u-vertices, so that the orthogonal representation H of GS is completely determined (Figure 7(c)).

Constrained compaction. The input of this step is H. The output is the final DBS-DRAWING. The length of the edges and the size of the vertices are computed, keeping the area and the total edge length as small as possible. The adopted technique allows us to exactly specify the incidence point of each link on the boxes representing the tables involved in the link. Cross-vertices introduced in the Constrained planarization step are removed (Figure 7(d)).
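The preliminary part of the algorithm, namely building the underlying graph GS and splitting it into connected components, can be sketched as follows. This is a minimal illustration under assumed data structures: links are given as tuples (T1, A, T2, B) of names, and the function names are hypothetical. The four drawing steps themselves and the packing heuristic are not shown.

```python
def underlying_graph(tables, links):
    """Adjacency sets of GS: one vertex per table, one edge per linked pair.

    links: iterable of (T1, A, T2, B) tuples. Several links between the
    same two tables collapse into a single edge of GS.
    """
    adjacency = {table: set() for table in tables}
    for t1, _a, t2, _b in links:
        adjacency[t1].add(t2)
        adjacency[t2].add(t1)
    return adjacency


def connected_components(adjacency):
    """Depth-first search; each component would be drawn separately."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components
```

Each component returned by `connected_components` would then be passed through the four steps described above, and the resulting drawings arranged on the plane.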

Constrained planarization

In the Constrained planarization step, a planarization is performed on GS. The output of the planarization is an embedded graph G′S that has the same vertices as GS plus dummy vertices of

Figure 7. An illustration of the main steps of the DBS-ALGORITHM. (a) Constrained planarization; (b) u-turn assignment; (c) orthogonalization; (d) constrained compaction.

degree four introduced to represent crossings (cross-vertices). Each link of S is represented in GS as an alternating chain of edges and cross-vertices. Furthermore, the embedding that we construct for GS is an lr-embedding. An lr-embedding is such that: • the edges incident on each vertex vT representing a table T with attributes a1 , . . . , ak are partitioned into 2k possibly empty sets l1 , . . . , lk , r1 , . . . , rk ; • the edges of li ∪ ri represent the links incident on attribute ai ; • the edges of li (ri ) are contiguous in the circular order around vT ; • sets l1 , . . . , lk , rk , . . . , r1 appear in this counter-clockwise order around vT . Copyright  2002 John Wiley & Sons, Ltd.
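The four conditions above can be checked mechanically. The following Python sketch is only an illustration and is not part of the implementation described in the paper: it assumes that the counter-clockwise rotation around a table vertex is given as a list of tags ("l", i) or ("r", i), and the function name `is_lr_order` is hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, int]  # ("l", i) or ("r", i), with 1 <= i <= k


def is_lr_order(rotation: List[Edge], k: int) -> bool:
    """Return True if the counter-clockwise rotation around a table vertex
    is a cyclic shift of l1, ..., lk, rk, ..., r1 (any set may be empty)."""

    def rank(edge: Edge) -> int:
        side, i = edge
        if side not in ("l", "r") or not 1 <= i <= k:
            raise ValueError(f"bad edge tag: {edge!r}")
        # l1..lk get ranks 0..k-1; rk..r1 get ranks k..2k-1.
        return i - 1 if side == "l" else 2 * k - i

    ranks = [rank(e) for e in rotation]
    # In a valid circular order the ranks never decrease, except for at
    # most one wrap-around step somewhere along the cycle.
    descents = sum(1 for a, b in zip(ranks, ranks[1:] + ranks[:1]) if a > b)
    return descents <= 1
```

Because only one pass over the rotation is needed, the check is linear in the degree of the vertex. A rotation in which an edge of l1 sits between two edges of l2 produces two descents and is rejected.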

Figure 8. Patterns for constructing an orthogonal representation after the u-turns assignment. (a) Vertex representing a table; (b) cross-vertex; (c, d) u-vertices.

The edges of li (ri) are called left (right) edges. Links represented by an edge of li (ri) enter table T from the left (right) in the final drawing.

Suppose we have at our disposal a standard planarization facility [3], with the additional feature that some edges can be specified as non-crossable. We now explain how such a facility can be used to compute an lr-embedding.

• Graph GS is mapped to a new graph PS in which each vertex vT, associated with a table T with k attributes, is represented by a chain with k + 2 vertices. Vertices and edges of the chain are called attribute-vertices and attribute-edges, respectively. The sequence of attribute-vertices composing the chain is {vnorth, v1, ..., vk, vsouth}, where vi is associated with attribute ai (i = 1, ..., k). The edges representing links incident on attribute ai are made incident on vi. Intuitively, attribute-vertices and attribute-edges represent the sequence of the attributes of the table, and vertices vnorth and vsouth represent the top and the bottom of the table, respectively (see Figure 9(a)).
• A planarization is performed on PS with the constraint that every attribute-edge in PS is uncrossable. The result is an embedded graph PS that contains the same attribute-vertices and attribute-edges as PS, and every other edge of PS is represented in PS as an alternating chain of edges and cross-vertices. In Figure 9(a) the square represents a cross-vertex.
• The lr-embedding for GS is obtained from PS by contracting all the attribute-edges associated with the same table into a unique vertex. All the other edges and cross-vertices remain unchanged (see Figure 9(b)).

More formally, the circular order of the edges around each vertex vT is computed in the following way. For each table T consider the path of attribute-vertices {vnorth, v1, ..., vk, vsouth} that represents T in PS. Each attribute-vertex vi, with i = 1, ..., k, is incident on two attribute-edges that we call the north attribute-edge and the south attribute-edge of vi according to the north–south orientation of the path. For each attribute-vertex vi we assign to li (ri) the sequence of edges that are incident on vi between the north (south) attribute-edge and the south (north) attribute-edge in counter-clockwise order. The circular sequence around vT is obtained by concatenating l1, ..., lk, rk, ..., r1 in this order.
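The computation of the circular order around vT can be illustrated with a small Python sketch. It is not the GDToolkit code: it assumes that each attribute-vertex comes with its counter-clockwise rotation as a plain list, in which the two attribute-edges are marked by the placeholder values "N" and "S"; the function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

NORTH, SOUTH = "N", "S"  # placeholders for the two attribute-edges


def split_left_right(rotation: Sequence[Hashable]) -> Tuple[list, list]:
    """Split the counter-clockwise rotation of one attribute-vertex into
    (l_i, r_i). `rotation` lists every incident edge once, including the
    NORTH and SOUTH attribute-edge placeholders."""
    n = len(rotation)
    start = list(rotation).index(NORTH)
    # Rotate the list so that the north attribute-edge comes first.
    turned = [rotation[(start + j) % n] for j in range(n)]
    south = turned.index(SOUTH)
    left = turned[1:south]       # between north and south, counter-clockwise
    right = turned[south + 1:]   # between south and north, counter-clockwise
    return left, right


def table_rotation(attribute_rotations: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Concatenate l_1, ..., l_k, r_k, ..., r_1 into the rotation of the
    vertex obtained by contracting the chain of one table."""
    parts = [split_left_right(r) for r in attribute_rotations]
    order: List[Hashable] = []
    for left, _ in parts:
        order.extend(left)
    for _, right in reversed(parts):
        order.extend(right)
    return order
```

For a table with two attributes whose rotations are ["N", "a", "S", "b"] and ["c", "S", "N", "d"], the sketch yields the order a, d, c, b: down the left-hand side of the table and then up the right-hand side.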

Figure 9. The constrained planarization step. (a) Vertices representing tables are expanded into chains of uncrossable edges (vnorth, v1, v2, v3, vsouth in the example) and a planarization algorithm is performed. The squares represent cross-vertices and the triangles represent the upper part or the lower part of a table. (b) Each chain representing a table is contracted into a single vertex, while preserving the circular ordering of the edges around the table.


Observe that the above algorithm for finding an lr-embedding requires the capability to planarize a graph in such a way that specific edges are never crossed (uncrossability constraint). This can be done by using a technique described in [3], which has already been implemented and is available in existing graph drawing tools (see, e.g., [30]). We also observe that the above algorithm is correct. Its correctness is based on the following observations.

• Since attribute-edges are uncrossable, we can perform their contraction. Such a contraction would be unfeasible if some cross-vertices appeared between two adjacent attribute-vertices.
• The expansion performed in the first step makes it impossible, in the ordering of the edges around vT, to have ‘mixings’ of edges of li (ri) with lj ∪ rj (i ≠ j). This implies that the edges of li (ri) appear consecutively around vT.
• After the contraction of attribute-edges of table T, sets l1, ..., lk, rk, ..., r1 appear in this counter-clockwise order around vT.

U-turns assignment

The u-turns assignment step associates a (possibly empty) sequence of u-vertices with each edge of GS. Each u-vertex has degree two and represents a u-turn in the final drawing (see Figure 7(b)). U-vertices are introduced, along with an orientation of the edges of GS, in such a way that the resulting digraph can be drawn upward planar in the left–right direction within the given embedding. Our algorithm for introducing u-vertices and assigning an orientation to the edges works in two steps, detailed below.

Substep 1. First a suitable number of u-vertices is introduced and an orientation is given to all the edges of GS in order to obtain a digraph DS with a planar bimodal embedding.

Substep 2. A constrained version of the algorithm in [11] is applied on DS, in order to compute the minimum number of additional u-vertices that are needed to produce an upward planar drawing of the digraph in the left–right direction.
We start by describing Substep 1: it computes digraph DS in such a way that it has an associated bimodal planar embedding that preserves the planar embedding of GS on the common vertices. In order to guarantee that a planar bimodal embedding of DS exists, we require that the following properties hold.

(a) For each vertex v of DS that represents a table, all the left edges are oriented incoming v and all the right edges are oriented outgoing v.
(b) For each cross-vertex v of DS, consider two edges incident on v and representing the same link. Such edges are oriented one incoming and the other outgoing v.
(c) For each u-vertex of DS its two incident edges are oriented either both incoming or both outgoing.

The algorithm we use for computing DS is based on the following strategy. At each step we consider a different link of the graph and we orient all the edges that represent this link. During the orientation, it might be necessary to add one u-vertex on the link in order to satisfy property (a). Since a link is taken into account exactly once, and since at most one new u-vertex is added for each link, the algorithm works in linear time in the number of edges of GS.

The algorithm works as follows. Let L be the set of all links of GS. For each link in L, let v1, e1, v2, e2, ..., vn, en, vn+1 (n > 0) be the ordered sequence of vertices and edges that form the link. Vertices v1 and vn+1 represent the extreme tables of the link, while the remaining vertices represent crossings. Two cases are possible.

• If e1 is a left edge of v1 then:
  1. if n > 1 then, for each i in 1, ..., n − 1, ei is oriented incoming vi;
  2. for the last edge en two cases are possible:
     – if en is a right edge of vn+1 then en is oriented incoming vn;
     – if en is a left edge of vn+1 then en is split into two new edges by adding a new u-vertex u; the two new edges are oriented both outgoing u.
• If e1 is a right edge of v1 then:
  1. if n > 1 then, for each i in 1, ..., n − 1, ei is oriented outgoing from vi;
  2. for the last edge en two cases are possible:
     – if en is a left edge of vn+1 then en is oriented outgoing vn;
     – if en is a right edge of vn+1 then en is split into two new edges by adding a new u-vertex u; the two new edges are oriented both incoming u.

Note that the split operation that creates a new u-vertex, together with the orientation performed by the algorithm when such a u-vertex is added, is sufficient to keep properties (a) and (c) always valid. Furthermore, the algorithm assigns orientations to the two edges that are incident on a cross-vertex and that represent the same link, so that property (b) also holds. Figure 10(a) shows a digraph obtained by applying the above algorithm to the graph of Figure 9(b).

After an embedded bimodal digraph DS has been obtained, we apply Substep 2, which is a simple variation of the algorithm presented in [11]. Namely, this algorithm computes a quasi-upward planar drawing of an embedded bimodal digraph with the minimum number of bends.
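As an illustration of the Substep 1 orientation rules listed above, the following Python sketch orients the edges of a single link. It is a simplified stand-in, not the authors' code: a link is given as its vertex sequence v1, ..., vn+1, the sides on which e1 and en are incident are passed as the strings "left" and "right", and the function name `orient_link` and the u-vertex label are hypothetical.

```python
from typing import List, Tuple

Arc = Tuple[str, str]  # (tail, head)


def orient_link(path: List[str], first_side: str, last_side: str,
                u_name: str = "u") -> List[Arc]:
    """Orient the edges of one link.

    path       -- v1, ..., v(n+1): the two end tables with the
                  cross-vertices of the link in between
    first_side -- side of v1 on which e1 is incident
    last_side  -- side of v(n+1) on which en is incident
    u_name     -- hypothetical label of the u-vertex, if one is needed
    """
    if len(path) < 2:
        raise ValueError("a link needs at least two vertices")
    for side in (first_side, last_side):
        if side not in ("left", "right"):
            raise ValueError(f"unknown side: {side!r}")

    forward = first_side == "right"  # right edges leave v1, left edges enter it
    arcs: List[Arc] = []
    # Edges e1 .. e(n-1) all follow the direction fixed at v1.
    for a, b in zip(path[:-2], path[1:-1]):
        arcs.append((a, b) if forward else (b, a))

    vn, vlast = path[-2], path[-1]
    if first_side != last_side:
        # The link leaves one table on the right and enters the other
        # on the left: en keeps the same direction, no u-vertex.
        arcs.append((vn, vlast) if forward else (vlast, vn))
    elif forward:
        # right/right: en is split, both halves point into the u-vertex.
        arcs += [(vn, u_name), (vlast, u_name)]
    else:
        # left/left: en is split, both halves point out of the u-vertex.
        arcs += [(u_name, vn), (u_name, vlast)]
    return arcs
```

For a link that crosses one other link (path A, x, B) and is attached to the left-hand side of both tables, the sketch returns the arcs x→A, u→x, u→B: both tables receive an incoming left edge, the cross-vertex x has one incoming and one outgoing edge of the link, and the two edges at u are both outgoing, as required by properties (a), (b), and (c).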
In our case, the quasi-upward drawing we compute corresponds to the left–right development of the schema and the bends of the drawing are the additional u-vertices we have to add after Substep 1. The variation we apply to the algorithm in [11] consists of setting a suitable set of constraints in order to keep unchanged the top–down linear ordering of the edges incident on each table. To do that we temporarily add a dummy left (right) edge entering (leaving) each vertex that represents a table and that has only outgoing (incoming) edges (see Figure 11). The dummy edges are removed after the algorithm in [11] is applied. Figure 10(b) shows a digraph obtained by applying Substep 2 to the intermediate result depicted in Figure 10(a).

Figure 10. U-turns assignment. (a) Assigning an orientation to the edges and adding u-turns to find a bimodal embedded digraph in Substep 1. U-vertices are represented as white circles. (b) Adding a minimum set of additional u-vertices to get an embedded upward planar digraph in Substep 2. The additional u-vertices are represented as black circles.

Figure 11. Adding dummy edges to keep the top–down linear ordering of the edges incident on a vertex unchanged. (a) Dummy edge e avoids edges a, b, and c having to change their top–down linear ordering on v. (b) If we do not add e, the top–down linear ordering of edges a, b, and c might change, although the circular ordering of these edges around v is unchanged.

Orthogonalization

The result of this step is an orthogonal representation H of GS that preserves the left-to-right development of the drawing established in the previous step. More formally, for each vertex v of H that represents a table, we require all the edges incoming v to be incident on the left-hand side of v, and all the edges outgoing v to be incident on the right-hand side of v. Also, the top–down ordering of the edges incident on v, computed in the u-turn assignment step, must not change. We compute H from DS by simply applying the transformation patterns shown in Figure 8. Namely, for each vertex v in DS, three different cases are possible:

• if v represents a table then pattern (a) is applied;
• if v is a cross-vertex then pattern (b) is applied;
• if v is a u-vertex then pattern (c) or pattern (d) is applied depending on the direction of the two edges (incoming or outgoing) that are incident on v.

Patterns (a) and (b) describe the angles between the edges that are incident on v plus part of the shape of such edges. Patterns (c) and (d) describe the shape of the edge in correspondence with a u-turn. Namely, we recall that in DS a u-turn is represented by a u-vertex, absorbed in H. Hence, in H, we model a u-turn by adding two consecutive 90 degree bends on the edge to which the u-turn belongs. After H is computed by applying the above-defined procedure, the orientation of the edges of H is removed, that is, it is ignored from now on (see Figure 7(c)).

Observe that the straightforward application of the above patterns may give rise to avoidable bends on the edges. Namely, on some edges of H there may be alternate subsequences of left and right bends that can be removed (see Figure 12). To detect and remove such avoidable bends we perform a simple post-processing algorithm on H which works as follows. For each edge e of H it searches for pairs of consecutive left–right or right–left bends and removes them. Each bend pair removal may create a new pair of consecutive bends that, in turn, should be considered for removal. Exhaustive detection and removal of all avoidable bend pairs may be efficiently performed in linear time by using a stack. Namely, we walk on edge e in one of the two possible directions and consider its bends in the order they are encountered.
For each new bend encountered we push it onto the stack and check whether the pair of bends on the top of the stack is a left–right pair or a right–left pair. If this is the case, both bends of the pair are removed from the stack and from H. The procedure ends when all bends of e have been processed. The post-processing algorithm does not consider for removal the bends introduced by pattern (a) of Figure 8. In fact, such bends are needed to ensure that a podevsnef drawing exists, and hence they cannot be removed. Figure 12 shows an application of the described algorithm on an edge with avoidable bends.
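The stack-based removal of avoidable bends can be sketched as follows. The code is a minimal Python illustration, not the GDToolkit constructor: bends are assumed to be (identifier, turn) pairs with turn "L" or "R", and the bends introduced by pattern (a) are modelled by an optional set of protected identifiers, which is an assumption about how such bends might be excluded.

```python
from typing import FrozenSet, Iterable, List, Tuple

Bend = Tuple[int, str]  # (identifier, "L" or "R")


def remove_avoidable_bends(bends: Iterable[Bend],
                           protected: FrozenSet[int] = frozenset()) -> List[Bend]:
    """Walk along one edge and cancel consecutive left-right or
    right-left bend pairs; return the bends that remain, in order."""
    stack: List[Bend] = []
    for bend in bends:
        stack.append(bend)
        if len(stack) >= 2:
            (id_a, turn_a), (id_b, turn_b) = stack[-2], stack[-1]
            opposite = {turn_a, turn_b} == {"L", "R"}
            if opposite and id_a not in protected and id_b not in protected:
                # Both bends of the pair leave the stack (and the edge).
                del stack[-2:]
    return stack
```

Each bend is pushed once and popped at most once, so the walk is linear in the number of bends of the edge. With the hypothetical turn sequence L, L, L, L, L, R, R, L for bends 1 to 8, the sketch removes pair {5, 6} and then pair {4, 7}, leaving bends 1, 2, 3, and 8; this matches the sequence of removals reported in the caption of Figure 12, although the actual turn directions in that figure are not recoverable here.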

Figure 12. Detection and removal of avoidable bends. (a) Bends from 1 to 6 are sequentially pushed into the stack and pair {5, 6} is selected for removal. (b) Bend 7 is pushed into the stack and the pair {4, 7} is selected for removal. (c) The final shape of the edge.

Constrained compaction

In this step a complete DBS-DRAWING is computed from the orthogonal representation H by assigning coordinates to vertices and bends, and by giving the correct size to each vertex that represents a table. We essentially apply the compaction algorithm described in [23]: it computes a podevsnef drawing that preserves the shape of H. The width we assign to each table is proportional to the length of the longest attribute of the table. The name of the table is also taken into account. The height we assign to each table is proportional to the number of attributes of the table itself. We also have to guarantee that each link (T1, A, T2, B), where A and B have cardinality equal to one, is incident on T1 and T2 at the heights of the attributes in A and B. The basic version of the algorithm described in [23] allows the edges to freely shift along the side they are incident on. However, such an algorithm can easily be adapted so that the point on which each edge is incident is preassigned. Finally, the cross-vertices are removed so that each link is represented by only one edge.

Time complexity analysis

The following result summarizes the analysis of the computational complexity.

Theorem 1. Given a database schema with $n$ tables, $m$ links, and a bounded number of attributes per table, a DBS-DRAWING can be computed in $O((n + c)^2 \log(n + c))$ time, where $c$ is the number of crossings of the output drawing.

Proof. In the Constrained planarization step the graph GS, with $n$ vertices and $m$ edges, is transformed into a graph PS by expanding each vertex that represents a table into a chain. The length of such a chain is proportional to the number of attributes of the table. Since the number of attributes per table is bounded, PS has $O(n)$ vertices and $O(m + n)$ edges; however, only $O(m)$ of such edges are allowed to cross other edges. The adopted planarization technique, detailed in [3], first computes a maximal planar subgraph in linear time and then computes a shortest path for each edge that crosses other edges. Each shortest path is computed on a graph with $O(n + c)$ vertices in linear time by means of a breadth-first search. Also, each chain that represents a crossing edge is inserted in linear time. Hence, on PS, the planarization takes $O(m(n + c))$ time. After this step, the number of vertices of the resulting graph GS is $N = n + c$.

The U-turn assignment step takes $O(N)$ time to compute the bimodal orientation DS since it visits the edges of GS exactly once. The number of vertices of DS is $O(N)$ since at most one u-vertex is inserted for each link. The application of the flow technique described in [11] to insert the minimum number of u-turns in DS takes $O(N^2 \log N)$ time.

The Orthogonalization step takes $O(N)$ time to produce the orthogonal representation of GS starting from DS. In fact, patterns (b), (c), and (d) (see Figure 8) may be applied in constant time. Application of pattern (a) takes $O(N)$ time overall because each edge is visited at most two times.

The Constrained compaction step takes $O(N^2 \log N)$ time to compact the drawing by using flow techniques, as described in [3,23]. From the above analysis the statement follows. □

Note that, even if $c$ may be of order $n^4$ in the worst case, database schemas observed in practice are quite sparse ($m = O(n)$) and ‘almost planar’, so $c$ is usually much smaller.

FROM THE ALGORITHM TO REAL-LIFE APPLICATIONS

In this section we describe the methodological and architectural choices we made in order to build a software system implementing the DBS-ALGORITHM described above. These choices were aimed at creating modular and reusable code. Furthermore, since the DBS-ALGORITHM is quite complex and involves many graph-theoretic and graph drawing concepts, we devoted special attention to building friendly interfaces targeted at users with different skills and requirements. In particular, we imagined two kinds of users: end-users and developers. We assume that end-users want to use the system ignoring all the technological details.
Developers, instead, should be aware of all the details strictly needed to embed (re-use) the technology into a plurality of new systems. The architecture of our software system represents a carefully chosen tradeoff between simplicity of use and flexibility. In the following we first describe such an architecture and then discuss it in detail.

Our software system is composed of two main parts.

• The drawing engine: a module that encapsulates the implementation of the DBS-ALGORITHM. The drawing engine provides an API that allows us to create the graph representing the database schema and to compute a drawing of it. The architecture of the drawing engine is further articulated into several parts, which are explained in detail later.
• The hosting system: the subsystem that provides an interface for the end-user or towards other subsystems. The hosting system communicates with the drawing engine by means of the above mentioned API.

A schematic illustration of the architecture described above is depicted in Figure 13.

Figure 13. The two main parts of our system. We encapsulated the implementation of the DBS-ALGORITHM into the drawing engine. The hosting system interacts with the drawing engine by means of a well-defined API.

Architecture rationale

The two main parts in which the project is divided, the hosting system and the drawing engine, correspond to different skill levels needed for their development. In fact, the final system is necessarily provided with a user interface and with functionalities for automatically importing database schemas, for example from database management systems, data-files, or SQL scripts. These input/output features may be easily implemented in the hosting system using well-known techniques and visual tools. The skills involved are those of a medium-level developer and the time-effort is easy to estimate. On the other hand, the algorithmic complexity of the DBS-ALGORITHM requires specialized expertise, and the time-effort is much harder to estimate. Furthermore, familiarity with specific libraries is needed. Thus, the division of the system into two modules follows naturally from the goal of efficiently employing the resources, and facilitates the re-use of the core implementation of the algorithm in a multiplicity of systems. To further improve the re-usability of the drawing engine, we studied in particular the friendliness of its API (see the next section). Table I summarizes the differences between the development process of the hosting system and that of the drawing engine.

Table I. The development process of the hosting system and that of the drawing engine are very different. A comparison with respect to some of the most relevant characteristics.

Characteristic             Hosting system             Drawing engine
Tools and methodologies    Standard and well-known    Peculiar
Programmer skill needed    Widespread                 Specialized
Expected cost              Indefinite: wide range     High
Re-usability               Not compelling             Mandatory
Portability                Hard                       Easy

To ease the integration of the drawing engine into simpler systems we provided a further mediator module, offering an XML-based input/output interface to the hosting system. Using the mediator, all the hosting system has to do is save the database schema description in a file using an elementary XML-based format and read the output from another file. The role of the mediator and an example of the XML file format are depicted in Figure 14.

Figure 14. (a) The architecture shown in Figure 13 is refined by inserting a further level, the mediator, that augments the flexibility of the architecture. (b) The communication between the hosting system and the mediator is performed by using XML-based file formats. [Panel (b) shows a DTD fragment; only its attribute declarations survive: name, name, st, sa, tt, ta, each CDATA #REQUIRED.]

The drawing engine

An implementation ‘from scratch’ of the DBS-ALGORITHM would require several man-years of work. However, many graph drawing libraries, built for commercial or research purposes, may be effectively used in order to reduce the implementation and maintenance effort [30–32]. We chose to implement the DBS-ALGORITHM in C++ using the GDToolkit‡ library. This library provides algorithms and data structures for graph drawing applications and supports both orthogonal and quasi-upward drawings within the topology–shape–metrics approach. Furthermore, various constraints on the drawings are dealt with. GDToolkit is built on LEDA [33], using in particular its basic data structures and the efficient planarity testing algorithm described in [34]. Figure 15 shows the drawing engine architecture. To have an idea of the relative weight of each layer, we measured that GDToolkit consists of about 70 000 lines of code while the DBS-ALGORITHM implementation consists of 3000 lines of code. LEDA consists of about 200 000 lines of code, but only a small portion of it is used. The implementation of the DBS-ALGORITHM with GDToolkit is described in detail in the next section.

‡ http://www.dia.uniroma3.it/~gdt.

Figure 15. The architecture of the drawing engine. The DBS-ALGORITHM layer fills the gap between GDToolkit (a general graph drawing library) and the specific needs of the application.

Figure 16 shows the relevant methods of the API of the drawing engine. The interaction protocol with the drawing engine is quite simple and the standard usage consists of four steps:

1. create an instance of the drawing engine;
2. specify the schema using the input methods of the API that return identifiers for the tables, the attributes, and the created links;
3. call the compute method that validates the data and computes the drawing;
4. retrieve the drawing by means of the output methods of the API.

Group                 Return type    Method
input                 table          new_table()
input                 attribute      new_attr( table )
input                 link           new_link( attribute src, attribute trg )
input                 void           set_table_width( table t, unsigned int x )
commit and compute    void           compute()
output                point          table_top_left( table )
output                list           link_polyline( link )

Figure 16. The most relevant methods of the API provided by the drawing engine: input methods allow one to specify the schema, the compute method launches the computation, and output methods are used to retrieve graphical information.

The drawing engine API is written using the STL [35] data structures, so that a hosting system developer is not required to be familiar with specific GDToolkit or LEDA data structures. Observe that, since the order of the attributes is preserved by the drawing algorithm, the first attribute of each table may be used to contain the table name. Also, the API provides coordinates and lengths in grid units. The developer may define all the graphic features, such as font sizes, colors, and line styles, according to the freedom allowed by the chosen output device.

The drawing engine development and testing was performed on a Linux platform. The hosting system was a simple command-line oriented application that used Tcl/Tk for graphic visualization and PostScript output. Test schemas were stored in an XML-based file format.

GDToolkit and the DBS-ALGORITHM

GDToolkit overview

The implementation of the drawing engine intensively uses the GDToolkit library. This library is specifically suited to support development of graph drawing systems based on the topology–shape–metrics approach. It provides classes for representing most of the specific combinatorial structures that a graph drawing system based on such an approach may need: general embedded graphs (class undi_graph), planar embeddings equipped with faces (class plan_undi_graph), orthogonal representations (class orth_plan_undi_graph), upward and quasi-upward representations (class upwa_plan_undi_graph), and final drawings (class draw_undi_graph). In addition to such data structures, GDToolkit provides several well-known algorithms for promoting an object from one of the classes to the next. For example, the straightforward topology–shape–metrics approach described in the drawing background section can be implemented by using GDToolkit with very few lines of code [30,36].
However, the DBS-ALGORITHM requires more programming effort due to the particular drawing standard adopted.

GDToolkit allows one to attach markers and constraints to vertices and edges. Markers are just Boolean flags that specify the origin or function of an element of the graph, so that subsequent operations treat the involved element accordingly. For example, dummy vertices representing crossings that are introduced by the planarization step are recognized and removed during the compaction step thanks to their markers. Constraints are quite similar to markers, but carry more complex information and may involve more than one object. In a complex algorithm a graph undergoes several changes, copies, and modifications. An interesting feature of markers and constraints in GDToolkit is that they are automatically and consistently transferred and updated if needed. A restricted set of methods specifies the behavior of each constraint for each specific graph update primitive. In the DBS-ALGORITHM implementation we make extensive use of markers and constraints.

In the following we detail the implementation characteristics of each step, specifying where and how GDToolkit has supported the work. We stress this point since, in our opinion, the relationship between new algorithm implementations and algorithm libraries is one of the hot themes in the Algorithm Engineering field.


Using GDToolkit

In the constrained planarization step of the DBS-ALGORITHM, we have shown how graph GS, which represents the database schema, is transformed into another graph PS (see Figure 9), which is then planarized. Since the API shown in Figure 16 requires the database schema to be incrementally specified, it is convenient, in the implementation of the drawing engine, to directly build PS during the input phase. Namely, whenever a new table or a new attribute is added to the schema, the graph PS is updated by creating or modifying the corresponding chain of vertices and edges.

Constrained planarization. Since PS (see Figure 9) is directly available by the incremental construction of the database schema, we copy such a graph and call the GDToolkit planarization method (undi_graph::planarize()). Such a method fully supports the constrained planarization we need. Namely, we attach to the edges of the chains representing tables a constraint specifying that those edges must not be crossed during the planarization.

Also, note that in the planarization phase each link is split into a chain of edges when it needs to cross other links, and special care should be used to keep the correspondence between such a chain and the original link. The technique adopted by GDToolkit is based on the numerical identifiers of the edges: when an edge is split, one of the two edges takes the numerical identifier of the original edge and the other is identified by a higher number. Hence, the identifier of the link corresponding to a chain is assumed to be the lowest identifier among the identifiers of the edges of the chain. Information about the table side on which each link is incident is retrieved from the result of the planarization and stored in a suitable data structure Z, before performing the contraction of the chains representing tables.
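The identifier convention described above can be mimicked with a toy registry. This Python sketch is not GDToolkit code and the class `EdgeRegistry` is hypothetical; it only shows why the lowest identifier in a chain always identifies the original link when every split hands out a fresh, higher number.

```python
from typing import List


class EdgeRegistry:
    """Toy edge registry: splitting keeps the old identifier on one half
    and gives the other half a new, higher identifier."""

    def __init__(self) -> None:
        self._next_id = 0
        self._chains: List[List[int]] = []  # edge ids along each link, in order

    def new_link(self) -> int:
        """Create a link made of a single edge and return its identifier."""
        edge_id = self._next_id
        self._next_id += 1
        self._chains.append([edge_id])
        return edge_id

    def split(self, edge_id: int) -> int:
        """Split `edge_id` at a new cross-vertex; return the id of the new half."""
        for chain in self._chains:
            if edge_id in chain:
                new_id = self._next_id
                self._next_id += 1
                chain.insert(chain.index(edge_id) + 1, new_id)
                return new_id
        raise KeyError(edge_id)

    def link_of(self, edge_id: int) -> int:
        """Identifier of the link that the chain containing `edge_id` represents."""
        for chain in self._chains:
            if edge_id in chain:
                return min(chain)
        raise KeyError(edge_id)
```

In this sketch the chains are stored explicitly for simplicity; in a graph library the chain would more plausibly be recovered by walking through the cross-vertices, but the minimum-identifier argument is the same.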
Contraction is performed by applying a GDToolkit method (plan_undi_graph::contract()), which contracts all the edges that have a specific GDToolkit marker attached, preserving the embedding around the extreme vertices of these edges. Hence, in order to contract exactly the edges of the chains of PS representing tables, we mark them with the correct GDToolkit marker before applying the plan_undi_graph::contract() method.

U-turn assignment. The implementation of Substep 1 creates the bimodal embedded digraph DS. Since this part is quite specific, GDToolkit supports only the low-level primitives such as edge orientation and edge splitting: they are performed by calling methods of class undi_graph. In doing that, the information stored in Z is used. The core of the algorithm, however, is implemented outside GDToolkit. On the resulting digraph an upward representation with a minimum number of additional u-turns is created by using a specific constructor of the upwa_plan_undi_graph class of GDToolkit. The dummy edges are inserted before calling such a constructor, so as to constrain the top–down linear ordering of the edges. In order to correctly embed the dummy edges, we compute the edge that appears on the top of each switch from the information stored in Z.

Orthogonalization. The orthogonalization step is completely performed by a constructor of class orth_plan_undi_graph of GDToolkit: it takes as input the upward representation of DS (an upwa_plan_undi_graph object) and creates an object defining an orthogonal representation H. Such a constructor also performs detection and removal of avoidable bends.

Figure 17. An example of how the edge positions around a table can be constrained. GDToolkit allows us to specify the distance between the edge’s incidence point and the first corner encountered while moving around the vertex from that point counterclockwise. In this example, to respect the computed embedding, c2 < c1 < h and c3 < c4 < h. (The figure shows a table Person with the rows SSN, Name, Dept., Address, and Birth, annotated with the labels c1, c2, c3, c4, and h.)

Constrained compaction. GDToolkit provides a constructor of class draw_undi_graph from an instance of the class orth_plan_undi_graph. Using this constructor it is easy to obtain a drawing from H: it performs the compaction of the drawing while preserving the shape of H. In order to get the final drawing in the chosen drawing standard, we attach to H several kinds of constraints that are automatically taken into account by the constructor.

• In GDToolkit, instances of orth_plan_undi_graph such as H do not specify any direction, in the sense that the final drawing may be arbitrarily rotated. The direction of the final drawing is specified by selecting a vertex v, an edge e incident on v, and a direction d among north, south, east, and west. The final drawing is such that the segment of e incident on v is oriented according to d when moving away from v. This technique is used to ensure that all the edges of the final drawing are incident on the left-hand side or on the right-hand side of the tables.

• The height and the width of each table in the final drawing are chosen by attaching suitable constraints to each vertex that represents a table. Namely, GDToolkit allows us to set the height and the width of each vertex independently.

• The height at which each edge is incident on a vertex is chosen by means of a suitable constraint attached to the edge. In GDToolkit it is possible to express such a constraint by specifying the distance between the edge’s incidence point p and the first corner encountered while moving around the vertex from p counterclockwise (see Figure 17). This distance should be less than or equal to the length of the side on which the edge is incident. If this constraint is used on more than one edge on the same side, the distances should be consistent with the computed embedding.
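The consistency requirement in the last item can be checked before the constraints are handed to the compaction. The following is a minimal sketch, not GDToolkit code: the function name distances_consistent is hypothetical, and it assumes that the edges of one side are listed in the order in which the embedding meets them when moving away from the reference corner, so that the distances must be strictly increasing and bounded by the side length.

```python
def distances_consistent(side_length, distances_in_embedding_order):
    """Check the incidence distances of the edges on one side of a table.

    Each distance must lie in [0, side_length], and the distances must be
    strictly increasing in the assumed embedding order.
    """
    previous = None
    for distance in distances_in_embedding_order:
        if distance < 0 or distance > side_length:
            return False
        if previous is not None and distance <= previous:
            return False
        previous = distance
    return True


# A side of length 10 carrying two edges, in the spirit of c2 < c1 < h.
ok = distances_consistent(10, [2, 5])
swapped = distances_consistent(10, [5, 2])      # contradicts the embedding
too_far = distances_consistent(10, [2, 12])     # exceeds the side length
```

A check of this kind is run once per table side; a failure indicates that the attribute positions and the computed embedding disagree.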
Experimental analysis

In order to evaluate the effectiveness of our implementation of the DBS-ALGORITHM, we tested it on a set of randomly generated database schemas, and we studied the behavior of the aesthetic measures related to our goals. We also compared these aesthetic measures with those computed by other suboptimal algorithms that can be used to get an estimate of a lower bound. The overall test suite consists of 900 schemas, 10 for each pair of values n, d, where n is the number of tables, varying in the range 10–100 with step 10, and d is the density, that is, the number of links divided by the number of tables, varying in the range 1.2–2.0 with step 0.1.

Figure 18. Crossings. Densities (a) 1.2, (b) 1.6, (c) 2.0. (Each chart compares the curves "constraints" and "no constraints".)

Each schema with n tables and density d is generated with the following procedure. For each table $T_i$ ($i = 1, \dots, n$) of the schema, we randomly choose the number $k_i$ of attributes of $T_i$ with a uniform probability distribution in the range 1–10. We order the attributes of $T_i$ and denote by $A_i = \{a_{i1}, \dots, a_{ik_i}\}$ the set of such attributes. To add each one of the $n \cdot d$ links we randomly select two distinct tables $T_i$ and $T_j$ ($i, j \in \{1, \dots, n\}$), and two attributes $a_{ir}$, $a_{js}$, where $r \in \{1, \dots, k_i\}$ and $s \in \{1, \dots, k_j\}$. Once all links have been added, we check whether the relational schema is connected. If not, we discard the schema and restart the generation from the beginning, until a connected schema is obtained.

Figure 19. Bends. Densities (a) 1.2, (b) 1.6, (c) 2.0. (Each chart compares the curves "constraints" and "no constraints".)

On each schema we ran the DBS-ALGORITHM and, following the guidelines given in [37], we measured the performance of each algorithmic step separately. Since no alternative algorithm satisfying the constraints imposed by the DBS-DRAWING standard could be found in the literature, we compared the performance of each algorithmic step with that obtained by relaxing the constraints of the same step. This comparison gives a pessimistic measure of the behavior of the algorithm, because we are using infeasible solutions as lower bounds. Since the experimental results vary uniformly with the density, we show only those relative to densities 1.2, 1.6, and 2.0.
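The schema-generation procedure described above (uniform attribute counts, n · d random links between distinct tables, rejection of disconnected schemas) can be sketched as follows. The function names, the zero-based indices, and the max_attempts safeguard are assumptions made for this illustration; they are not taken from the original test generator.

```python
import random


def is_connected(n, links):
    """True if the tables 0..n-1 form a connected graph under `links`."""
    adjacent = [[] for _ in range(n)]
    for (i, _), (j, _) in links:
        adjacent[i].append(j)
        adjacent[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adjacent[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def generate_schema(n, d, rng, max_attempts=100000):
    """Return (attr_counts, links) for a connected random schema.

    attr_counts[i] is the number of attributes of table i (1 to 10);
    each link is ((i, r), (j, s)): attribute r of table i joined with
    attribute s of table j, with i != j. max_attempts is a safeguard
    added for this sketch only.
    """
    num_links = round(n * d)
    for _ in range(max_attempts):
        attr_counts = [rng.randint(1, 10) for _ in range(n)]
        links = []
        for _ in range(num_links):
            i, j = rng.sample(range(n), 2)       # two distinct tables
            r = rng.randrange(attr_counts[i])    # attribute of table i
            s = rng.randrange(attr_counts[j])    # attribute of table j
            links.append(((i, r), (j, s)))
        if is_connected(n, links):
            return attr_counts, links            # keep connected schemas only
    raise RuntimeError("no connected schema found")


rng = random.Random(0)
attr_counts, links = generate_schema(10, 1.6, rng)
```

Since disconnected schemas are discarded and regenerated from scratch, the number of attempts grows when n is large and d is small; the sketch bounds it so that the loop always terminates.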

Figure 20. Area. Densities (a) 1.2, (b) 1.6, (c) 2.0. (Each chart compares the curves "constraints" and "no constraints".)

Figure 18 shows the number of crossings introduced by the Constrained planarization step against the number obtained by applying the same step when the embedding is allowed to be inconsistent with the order of the attributes. In the latter case, we applied the well-known planarization algorithm described in [3]. The number of crossings produced by the two algorithms is similar, and the negative effect of the planarization constraints on the quality of the drawings decreases as the density increases. We remark that crossings are considered the most important aesthetic criterion for human readability.

Figure 19 shows the number of bends introduced by the U-turn assignment and Orthogonalization steps, starting from the embedding computed by the Constrained planarization step. The values are compared with those obtained by the optimal algorithm given in [4]. In this case the number of bends computed by our technique is outperformed by that computed by the optimal algorithm. This is due both to the optimality of that algorithm and to the fact that it runs without the constraints, which leave our algorithm with few degrees of freedom.

Figure 20 shows the area of the drawing computed by the Constrained compaction step against the area of the drawings computed by applying the compaction algorithm in [23], in which links are not constrained to be incident on their attributes. In this case the two curves are quite similar.

We also measured the CPU-time performance of each step of the DBS-ALGORITHM. All the experiments were performed on a Pentium III 700 MHz PC with 256 MB of RAM, under the Linux RedHat 6.2 operating system. The executable code was produced and optimized using the GNU egcs-2.91.66 compiler. From the charts of Figure 21 it is apparent that the Constrained compaction step takes much more time than the other phases. However, the whole computation on instances that have at most 50 tables requires less than a minute. On the other hand, drawings with a higher number of tables are usually too cluttered to be of interest for practical applications.

Figure 21. CPU time. Densities (a) 1.2, (b) 1.6, (c) 2.0. (The charts plot the planarization, u-turn assignment, orthogonalization, and constrained compaction steps.)

In conclusion, the experiments show that our implementation of the DBS-ALGORITHM can usually be applied successfully, both in terms of computation time and in terms of drawing readability, to schemas that have up to 50 tables and a relatively small density. This is the case for many practical situations.

A hosting system

We implemented an example hosting system on the Microsoft Windows platform. Its main features are as follows:

• it provides a graphical interface that allows the user to specify a Microsoft Access database;
• it extracts the schema from the specified database;
• it allows the user to customize the set of tables to be visualized and to choose, for each table, one of the following visualization styles:
    full: all attributes of the table are shown,
    partial: only the attributes of the table that are linked to some other attributes are shown,
    collapsed: only the table name is shown;
• it computes the drawing, according to the given specifications, by using the drawing engine described above;
• it embeds the drawing in a Microsoft Word document whose name is specified by the user.

We chose to adopt Access and Word on the Microsoft Windows operating system because of their wide usage. Since such applications can be easily programmed or accessed by means of the Visual Basic scripting language, we used it to implement the hosting system. A preliminary version of our tool, which we call DBdraw, is available on the Web§. Figure 22 shows a snapshot of a DBdraw session. In DBdraw, the drawing engine is wrapped into a dynamic-link library (DLL), whose functionalities can be easily used by the hosting system.

Figure 22. A snapshot of a DBdraw session. The window at the front shows the drawing of a database schema embedded in a Word document. The window at the back shows the dialog box that allows the user to select the tables to be shown and the visualization styles of the tables.
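The three visualization styles listed above amount to a filter on the attributes of each table. The sketch below is only an illustration in Python (the hosting system itself is written in Visual Basic); the function visible_attributes and the example data are hypothetical.

```python
def visible_attributes(attributes, linked, style):
    """Return the attributes of a table to be drawn under a given style.

    attributes: ordered list of the attribute names of the table;
    linked: set of attribute names that take part in at least one link;
    style: "full", "partial", or "collapsed".
    """
    if style == "full":
        return list(attributes)                     # every attribute
    if style == "partial":
        return [a for a in attributes if a in linked]  # linked ones only
    if style == "collapsed":
        return []                                   # table name only
    raise ValueError("unknown visualization style: " + style)


# Hypothetical example table; the order of the attributes is preserved.
attrs = ["SSN", "Name", "Dept.", "Address", "Birth"]
linked = {"SSN", "Dept."}
full = visible_attributes(attrs, linked, "full")
partial = visible_attributes(attrs, linked, "partial")
collapsed = visible_attributes(attrs, linked, "collapsed")
```

Keeping the original attribute order in every style matters here, because the drawing convention places the stripes of a table in that order.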

ACKNOWLEDGEMENT

We thank Antonio Leonforte for implementing part of the system.

REFERENCES

1. Batini C, Talamo M, Tamassia R. Computer aided layout of entity-relationship diagrams. Journal of Systems and Software 1984; 4:163–173.
2. Batini C, Nardelli E, Talamo M, Tamassia R. GINCOD: A graphical tool for conceptual design of data base applications. Computer Aided Data Base Design, Albano A, De Antonellis V, Di Leva A (eds.). North-Holland: New York, 1985; 33–51.
3. Di Battista G, Eades P, Tamassia R, Tollis IG. Graph Drawing. Prentice-Hall: Upper Saddle River, NJ, 1999.
4. Tamassia R. On embedding a graph in the grid with the minimum number of bends. SIAM Journal on Computing 1987; 16(3):421–444.
5. Tamassia R, Di Battista G, Batini C. Automatic graph drawing and readability of diagrams. IEEE Transactions on Systems, Man and Cybernetics 1988; 18(1):61–79.
6. Codd EF. A relational model of data for large shared data banks. Communications of the ACM 1970; 13(6):377–387. Also published as Readings in Database Systems, Stonebraker M (ed.). Morgan-Kaufmann, 1988: 5–15.
7. Atzeni P, Ceri S, Paraboschi P, Torlone R. Database Systems: Concepts, Languages and Architectures. McGraw-Hill: London, 1999.
8. Harary F. Graph Theory. Addison-Wesley: Reading, MA, 1972.
9. Bertolazzi P, Di Battista G, Liotta G, Mannino C. Upward drawings of triconnected digraphs. Algorithmica 1994; 6(12):476–497.
10. Garg A, Tamassia R. On the computational complexity of upward and rectilinear planarity testing. Graph Drawing (Proceedings of GD ’94) (Lecture Notes in Computer Science, vol. 894), Tamassia R, Tollis IG (eds.). Springer: Berlin, 1995; 286–297.
11. Bertolazzi P, Di Battista G, Didimo W. Quasi-upward planarity. Algorithmica 2002; 32(3):474–506.
12. Batini C, Nardelli E, Tamassia R. A layout algorithm for data flow diagrams. IEEE Transactions on Software Engineering 1986; SE-12(4):538–546.
13. Hopcroft J, Tarjan RE. Efficient planarity testing. Journal of the ACM 1974; 21(4):549–568.
14. Lempel A, Even S, Cederbaum I. An algorithm for planarity testing of graphs. Theory of Graphs: International Symposium (Rome 1966). Gordon and Breach: New York, 1967; 215–232.
15. Chiba N, Nishizeki T, Abe S, Ozawa T. A linear algorithm for embedding planar graphs using PQ-trees. Journal of Computer and System Sciences 1985; 30(1):54–76.
16. Mehlhorn K, Mutzel P. On the embedding phase of the Hopcroft and Tarjan planarity testing algorithm. Algorithmica 1996; 16:233–242.
17. Garey MR, Johnson DS. Crossing number is NP-complete. SIAM Journal on Algebraic and Discrete Methods 1983; 4(3):312–316.
18. Fößmeier U, Kaufmann M. Drawing high degree graphs with low bend numbers. Graph Drawing (Proceedings of GD ’95) (Lecture Notes in Computer Science, vol. 1027), Brandenburg FJ (ed.). Springer: Berlin, 1996; 254–266.
19. Bertolazzi P, Di Battista G, Didimo W. Computing orthogonal drawings with the minimum number of bends. IEEE Transactions on Computers 2000; 49(8):826–840.
20. Patrignani M. On the complexity of orthogonal compaction. Computational Geometry: Theory and Applications 2001; 19(1):47–67.
21. Bridgeman SS, Di Battista G, Didimo W, Liotta G, Tamassia R, Vismara L. Turn-regularity and optimal area drawings of orthogonal representations. Computational Geometry: Theory and Applications 2000; 16(1):53–93.
22. Klau GW, Mutzel P. Optimal compaction of orthogonal grid drawings. Integer Programming and Combinatorial Optimization (Proceedings of IPCO ’99) (Lecture Notes in Computer Science, vol. 1610), Cornuejols G, Burkard RE, Woeginger GJ (eds.). Springer: Berlin, 1999; 304–319.
23. Di Battista G, Didimo W, Patrignani M, Pizzonia M. Orthogonal and quasi-upward drawings with vertices of arbitrary size. Graph Drawing (Proceedings of GD ’99) (Lecture Notes in Computer Science), Kratochvil J (ed.). Springer: Berlin, 1999; 297–310.
24. Di Battista G, Garg A, Liotta G, Tamassia R, Tassinari E, Vargiu F. An experimental comparison of four graph drawing algorithms. Computational Geometry: Theory and Applications 1997; 7:303–325.
25. Batini C, Furlani L, Nardelli E. What is a good diagram? A pragmatic approach. Proceedings of the 4th International Conference on the Entity–Relationship Approach, Chen PP (ed.). North-Holland: Amsterdam, 1985; 312–319.
26. Purchase H. Which aesthetic has the greatest effect on human understanding? Graph Drawing (Proceedings of GD ’97) (Lecture Notes in Computer Science, vol. 1353), Di Battista G (ed.). Springer: Berlin, 1998; 248–261.
27. Purchase HC, Cohen RF, James M. Validating graph drawing aesthetics. Graph Drawing (Proceedings of GD ’95) (Lecture Notes in Computer Science, vol. 1027), Brandenburg FJ (ed.). Springer: Berlin, 1996; 435–446.
28. Chazelle B. The bottom-left bin-packing heuristic: An efficient implementation. IEEE Transactions on Computers 1983; C-32:697–707.
29. Jansen K. An approximation scheme for bin packing with conflicts. Proceedings of the 6th Scandinavian Workshop on Algorithm Theory (Lecture Notes in Computer Science, vol. 1432). Springer: Berlin, 1998; 35–46.
30. GDToolkit. An object-oriented library for handling and drawing graphs. Third University of Rome. http://www.dia.uniroma3.it/~gdt [1999].
31. Gutwenger C, Jünger M, Klau GW, Leipert S, Mutzel P. Graph drawing algorithm engineering with AGD. Software Visualization 2001 (Lecture Notes in Computer Science, vol. 2269), Diehl S (ed.). Springer: Berlin, 2002; 307–323.
32. Tom Sawyer Software. Tom Sawyer Graph Layout Toolkit. Tom Sawyer Software Corporation, 1824B Fourth Street, Berkeley, CA 94710, U.S.A.
33. Mehlhorn K, Näher S. LEDA: A Platform for Combinatorial and Geometric Computing. Cambridge University Press: New York, 1998.
34. Jünger M, Leipert S, Mutzel P. Pitfalls of using PQ-Trees in automatic graph drawing. Graph Drawing (Proceedings of GD ’97) (Lecture Notes in Computer Science, vol. 1353), Di Battista G (ed.). Springer: Berlin, 1997; 193–204.
35. ANSI X3J16. American national standard for information systems–programming language–C++. Approved standard, ANSI.
36. Didimo W. Flow techniques and optimal drawing of graphs. PhD Thesis, Dipartimento di Informatica e Sistemistica, Università ‘La Sapienza’, 2000.
37. Barr RS, Golden BL, Kelly JP, Resende MGC, Stewart WR. Designing and reporting on computational experiments with heuristic methods. Journal of Heuristics 1995; 1:9–32.

§ http://www.dia.uniroma3.it/~dbdraw.
