
Model-Driven Development of Web Applications: The Autoweb System PIERO FRATERNALI and PAOLO PAOLINI Politecnico di Milano

This paper describes a methodology for the development of WWW applications and a tool environment specifically tailored for the methodology. The methodology and the development environment are based upon models and techniques already used in the hypermedia, information systems, and software engineering fields, adapted and blended in an original mix. The foundation of the proposal is the conceptual design of WWW applications, using HDM-lite, a notation for the specification of structure, navigation, and presentation semantics. The conceptual schema is then translated into a “traditional” database schema, which describes both the organization of the content and the desired navigation and presentation features. The WWW pages can therefore be dynamically generated from the database content, following the navigation requests of the user. A CASE environment, called Autoweb System, offers a set of software tools, which assist the design and the execution of a WWW application, in all its different aspects. Real-life experiences of the use of the methodology and of the Autoweb System in both the industrial and academic context are reported.

Categories and Subject Descriptors: H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia; D.2.2 [Software Engineering]: Design Tools and Techniques

General Terms: Design, Experimentation, Human Factors

Additional Key Words and Phrases: Application, development, WWW, HTML, intranet, modeling

The Autoweb work was partially supported by the Interdata Project, a project funded by the Italian Ministry of University and Scientific and Technological Research (MURST). Authors’ address: Piazza Leonardo da Vinci, 32, I20133 Milano Italy; email: fraterna/ [email protected]. Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and / or a fee. © 2000 ACM 1046-8188/00/1000 –0323 $05.00 ACM Transactions on Information Systems, Vol. 28, No. 4, October 2000, Pages 323–382.

1. INTRODUCTION

It is commonly accepted that the diffusion of the Web as a ubiquitous communication medium has fostered a novel type of application, whose main focus is on capturing the user’s attention by providing facilitated access to information and services [Myers et al. 1996]. Applications in such domains as electronic commerce and digital libraries are requested to support a form of computer-human interaction based on the exploratory access to information, rather than on a predefined dialogue paradigm (e.g., form-based interaction). As the Web has demonstrated, hypertextual navigation and content-based querying are the favorite access mechanisms for nontechnical users to browse through vast collections of data. Historically, even prior to the advent of the Web, this type of interaction has been deeply investigated and experimented with in practice in the hypermedia field, where many applications like CD-ROMs and information kiosks have been constructed for the general public. However, the architecture of a typical hypermedia application is far simpler than that of an average Web application, thanks to the inherently static nature of the information designed for off-line publication. On the Web, information managed by applications changes very rapidly, is stored in many places, and assumes a variety of formats, both structured and unstructured. These issues demand a solid architecture, founded on well-established technologies for data management, in particular on database technology. Moreover, Web applications must be designed for change, not only of their content, but also of requirements and architectures. Thus, their development needs to be organized into a well-defined process, amenable to the benefits of software engineering, among which automation of repetitive tasks is prominent.

1.1 State of the Practice of Web Development Tools

The present practice of application development for the Web sees an impressive number of products being offered, which boast dramatic benefits over the manual development of Web sites.
However, a careful review of their features reveals that most solutions concentrate on implementation, paying little attention to the overall process of designing a Web application (a broad review of the status of the tool market and a comparison of current research projects in Web development can be found in Fraternali [1999]):
—Visual HTML Editors and Site Managers (e.g., NetObject’s Fusion, Macromedia’s Dreamweaver, and Microsoft’s FrontPage) concentrate on HTML production and do not support the integration of large masses of data.
—HTML-SQL integrators (e.g., Microsoft’s Active Server Pages (ASP), JavaSoft’s Java Server Pages (JSP), and Cold Fusion [Forta et al. 1997]) provide a way to scale the dimension of a site by producing HTML pages dynamically from database content, but are implementation-level tools and do not address specification and design activities.
—Web-enabled form editors and database publishing wizards (e.g., Inprise’s IntraBuilder) either port to the Web the traditional client-server, form-based interface style, or merely expose the database structure as a set of Web pages.
—Web application generators (e.g., Oracle Designer 2000 Web Generator [Gwyer 1996] and Hyperwave [Hyperwave Information Management 1998]) start from conceptual modeling and produce the Web site automatically, but have limitations in the expressiveness of the concepts available to the designer for specifying the navigation and presentation requirements of a Web site.

In summary, the approach to Web development taken by most products is either confined to boosting implementation productivity or is an adaptation of development methodologies originated in other fields (typically, object-oriented programming and database design), which do not consider the specificity of the Web as a novel communication medium. For the reasons above, a convergence of notations, methodologies, and tools from the hypermedia, software engineering, and database areas is required, to let Web application development leverage the best of all three disciplines, i.e., the modeling power of hypermedia, the architectural solidity of databases, and the rigorous approach to development of software engineering.

1.2 A Novel Approach to the Development of Data-Intensive Web Sites

As in Atzeni et al. [1997], the focus of this paper is on data-intensive Web sites, which are defined as Web sites either offered to the general public on the Internet, or conceived for internal use by organizations on intranets, characterized by high volumes of data to be published and maintained over time. The Autoweb Project, presented in this paper, proposes a methodology and a development environment for data-intensive Web sites, which
—adapts current hypermedia design models to Web development and to the needs of automatic software generation;
—leverages database technology to store not only the content of a Web application, but also a description of its structure, navigation, and presentation, which enables automatic implementation and better evolution;
—defines a software development process for building new Web applications, and for “Web reverse-engineering” existing database applications;
—supports such process by means of suitable tools.

The main contributions of Autoweb are:
—HDM-lite, a hypermedia design model tailored to the development of Web applications.
HDM-lite descends from the Entity-Relationship Model [Chen 1976] and from HDM [Garzotto et al. 1993], one of the first hypermedia design models, which has been widely used in the design and implementation of applications on CD-ROMs and kiosks. With respect to previous hypermedia and Web design models, HDM-lite includes a notation for specifying presentation at a conceptual level, i.e., in a way independent of the delivery language and device, which, coupled to primitives for describing structure and navigation, covers all aspects of a Web application and enables automatic implementation.
—Two transformation techniques providing (1) the mapping of HDM-lite conceptual schemas into an intermediate relational representation; (2) the mapping of such relational structures into the physical pages constituting the application. The former is an evolution of the well-known translation of Entity-Relationship schemas into relational schemas, augmented to treat also navigation and presentation aspects. The latter is an original contribution permitting designers to generate Web applications from content and metainformation stored in a relational database, in such a way that the conceptual schema is preserved.
—The Autoweb System, a tool suite which supports the definition of applications with HDM-lite and automatically implements the abovementioned transformations. Unlike commercial systems, Autoweb covers not only implementation, but all the activities of Web development (most notably conceptual design) and leverages a conceptual model (HDM-lite) equipped with primitives for navigation and presentation specification. With respect to related research prototypes [Fernandez et al. 1998; Atzeni et al. 1998], Autoweb has a higher level of automation and advocates a top-down approach to Web development, where hypermedia design comes first, followed by database design and mapping as support tasks. Besides top-down site generation, the Autoweb System can also be used to Web-enable existing legacy databases, as explained in the next section.

The Autoweb approach to Web development has been tested in several applications both in industry and in academia, with the primary purpose of assessing the acceptance of model-driven, top-down design among developers. As a further contribution, this paper reports the lessons learned during such evaluation activity.

1.3 Preview of the Autoweb Development Process

The process of developing a WWW application with Autoweb is depicted in Figure 1.

Fig. 1. Main steps of application development with Autoweb.



The initial step is the collection of requirements and their formalization as a set of conceptual schemas in HDM-lite. This phase is human-intensive, but it is supported by a tool of the Autoweb System called Visual HDM, which permits the editing, archiving, and evolution of HDM-lite schemas. The second phase is the generation of the supporting database; this phase takes as input the HDM-lite conceptual schema and produces as output a relational database, which will support the application at runtime. The output database consists of two parts, a minidatabase containing a representation of the structure, navigation, and presentation (called metaschema database), and an empty database for storing the application content. This phase is totally automated by a tool called VHDM Database Schema Generator. The last phase is the implementation and deployment of the Web application. There are two scenarios for this phase:
—The application content does not exist: the empty database produced by Autoweb is filled with structured application content, possibly linked to unstructured data (e.g., multimedia files). This operation is almost totally automated: the application database can be filled via a Web interface automatically produced by the Autoweb DataEntry Generator.
—The application content (or part of it) already exists and is stored in a legacy system (e.g., a relational database): in this case, the database schema produced by Autoweb must be mapped onto the legacy data sources, in order to integrate the existing content into the page production process. This activity, although not supported by a specific tool of the Autoweb architecture, can be performed with the help of commercially available data replication tools, which allow the data administrator to map a database schema on top of a set of heterogeneous distributed data sources, using such techniques as views, triggers, and automatically executed data conversion programs (see Section 4.3).
In both the above cases, once the content is in place the application pages are produced dynamically by a run-time component called Autoweb Page Generator. To drive the production of pages, high-level presentation directives, called style sheets, are used, which contain a description of how to render content in the selected delivery language (e.g., HTML). Style sheets are visually edited with the Autoweb Style Sheet Editor.

1.4 Preview of the Autoweb Architecture

The Autoweb System comprises a Design Environment and a Runtime Environment. The Design Environment consists of a set of application design tools, which sit on top of a common repository containing application projects. Design tools produce application descriptions (called metadata), which are stored in a relational database to enable page production by the Runtime Environment.



The Runtime Environment is a CGI process, which dynamically produces pages from application metadata and contents. The former are stored in a relational DBMS; the latter are typically stored partly in a relational DBMS and partly in the file system. Pages are generated on-the-fly at each user request, but selected types of pages can be cached in the Autoweb server to improve performance.

1.5 Running Example

Throughout the paper, we use a simplified running example to illustrate the features of HDM-lite and Autoweb. ACME Furniture Inc. is an aggressive company thriving in the mail order business. To enlarge its customer base, ACME has decided to put part of its catalog on the Internet. The catalog advertises various types of furniture, e.g., chairs, tables, lamps, and so on, and contains special combinations of items sold at a discounted price. Individual items are described by an image, a textual description, their price, and a set of technical features (dimensions, available colors, . . .). Combinations are illustrated by a captivating photograph, advertising text, and the discounted price. Since ACME has a number of stores throughout the world, information about the store locations is also made available, including the address and contact information, an image of the store, and a map. Users are expected to visit the ACME home page containing a logo of the company and some advertising material; from there they can browse the list of special offers and the item catalog. From the page describing a combination, the individual items forming the combination can be accessed, and conversely, it is possible to navigate from an item to the combinations including it. From the home page, the list of stores can also be reached. A second category of users is also expected: inventory managers. These must directly access the inventory records of the various items at the different stores, either from a store list or from an item list.
Their application pages should be textual for network speed and should exclude all customer-oriented advertising.

1.6 Organization of the Paper

The rest of the paper is organized as follows: Section 2 introduces the HDM-lite design notation, describing the concepts used for structure (Section 2.1), navigation (Section 2.2), and presentation (Section 2.3) modeling. At the end of the section, a tour of the ACME customer application (Section 2.4) visually demonstrates the way HDM-lite modeling concepts may be rendered by an implementation. Section 3 discusses the techniques for automatically mapping HDM-lite schemas into Web applications: the conceptual-to-logical mapping is presented in Section 3.1, the logical-to-physical mapping is the subject of Section 3.2. Section 4 presents the tools of the Autoweb System, organized into a Design Environment (Section 4.1) and a Runtime Environment (Section 4.2). Section 5 compares the Autoweb System to the related work in industry (Section 5.1) and in academia (Section 5.2). Section 6 reports on a number of projects where Autoweb has been used and discusses the lessons learned from such experiences. Finally, Section 7 draws the conclusions and illustrates the ongoing and future work.

2. DESIGNING WWW APPLICATIONS: THE HDM-LITE MODEL

Designing a WWW application means, as for any other type of application, describing its most relevant features, without committing to implementation details. HDM-lite is a design model conceived as a Web-specific evolution of HDM (Hypermedia Design Model [Garzotto et al. 1993]), a general-purpose hypermedia design model, which has influenced a number of subsequent proposals, e.g., Isakowitz et al. [1995], Nanard and Nanard [1995], Schwabe and Rossi [1995]. According to HDM-lite, a WWW application is described by a hyperschema, which consists of three different parts:
—A structure schema describing the structural properties of the basic objects that make up the application.
—A navigation schema specifying the actions available to move from one object to another one (traversal schema) and the access paths to reach the objects of the application (access schema); given a structure schema, there may be several different navigation schemas representing different ways to access and move across the same information.
—A presentation schema dictating the way application objects are presented to the user. Given a pair (structure schema, navigation schema), there may be several different presentation schemas representing different ways to graphically render the same application.

2.1 Structure

Structure modeling in HDM-lite uses a variant of the Entity-Relationship model [Chen 1976] to define the structure of the objects that constitute the information base of the application. The structure of the application is described by its structure schema, which consists of a number of entity-types and a number of link-types. An entity-type describes the features common to a group of application objects, while a link-type describes the features common to a group of connections between application objects. A hyperbase is an instance of a structure schema: it consists of a number of entities and links.
Entities represent instances of entity-types, and links represent instances of link-types, that is, actual connections between entities. An entity-type definition may group a hierarchy of component-types, organized as a tree. The component-type located at the root of the tree is called root component-type, and is typically used to convey general information about the entity, and thus to represent the entity as a whole.



A link-type is a binary connection between entities or components of entities. A link-type has a name and cardinality constraints used to restrict the number of connection instances that an entity or component may take. A specialized link-type, called part-of relationship, connects a component and its (sub)-components; as for normal link-types, explicit cardinality constraints can be stated to impose a minimum and maximum value to the number of subcomponents contained in a supercomponent. The information content of a component-type consists of a number of slots. A slot is an elementary unit of multimedia information. A slot has a name, and a type, i.e., a specification of the structure and range of values the slot can take. Examples of admitted slot types are: text, number, image, video, animation, audio, URL, and HTML. Set- and list-valued attributes are also possible, constructed from the types above, or from records of the types above. The difference between type URL and HTML is that the former is a pointer to a piece of Web information defined outside the hyperbase, whereas the latter denotes an attribute containing HTML text, which is considered part of the entity or component. A subset of the slots belonging to a component may be selected in order to build the external name of the whole component. The external name can be used as a “representative” of an entire component when needed (for example, in a listing of several instances). An entity-type takes its external name from its root component. Note that the HDM-lite concept of external name is not intended as a means for specifying an integrity constraint or the need of a fast access method, but serves a presentation/communication purpose. Therefore, external name values are not required to uniquely identify a component nor to be minimal, and in this respect differ from the apparently similar notions of key and superkey in relational databases, and of object’s external name in object-oriented databases. 
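As an illustration only, the structural concepts above (slots, component trees, entity-types, and external names) can be sketched as plain data structures. The Python classes below are hypothetical and are not part of Autoweb or of the HDM-lite notation; the Store slots are assumptions loosely based on the running example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slot:
    name: str
    type: str                      # e.g. "text", "number", "image", "URL", "HTML"
    in_external_name: bool = False


@dataclass
class ComponentType:
    name: str
    slots: List[Slot] = field(default_factory=list)
    # Sub-component-types reached through part-of relationships (a tree).
    children: List["ComponentType"] = field(default_factory=list)

    def external_name_slots(self) -> List[str]:
        return [s.name for s in self.slots if s.in_external_name]


@dataclass
class EntityType:
    name: str
    root: ComponentType

    def external_name_slots(self) -> List[str]:
        # An entity-type takes its external name from its root component.
        return self.root.external_name_slots()


def external_name(component_type: ComponentType, instance: Dict[str, object]) -> str:
    """Build the label that stands for a whole instance, e.g. in a listing."""
    return ", ".join(str(instance[s]) for s in component_type.external_name_slots())


# Hypothetical single-component entity-type; slot names are assumptions.
store_component = ComponentType("Store", slots=[
    Slot("name", "text", in_external_name=True),
    Slot("location", "text", in_external_name=True),
    Slot("map", "image"),
])
store = EntityType("Store", root=store_component)

label = external_name(store_component,
                      {"name": "ACME Downtown", "location": "Milano", "map": "milano.png"})
```

Nothing in the sketch forces two instances to have different labels, which mirrors the point that an external name serves presentation and is not a key.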
2.1.1 Notations and Running Example. Designers specify the structure schema of an application using the graphic notation summarized in Figure 2. Component-types are denoted by rectangles with the component-type’s name at the top (Figure 2(a)). Slots are listed by their name and type inside component-types. List-valued slots are marked by means of a small folder icon placed before the slot’s name. Slots belonging to the external name of a component-type are marked by an uppercase E before their name. Entity-types are denoted by larger rectangles enclosing a tree of components, with the entity-type’s name on top (Figure 2(b)). Part-of relationships are denoted by edges labeled with an uppercase P and a textual label giving the part-of a distinct name (Figure 2(c)); they connect components within an entity-type. Links are denoted by edges labeled with an uppercase L and a distinct name (Figure 2(d)). They may connect components belonging either to the same entity or to different entities.


Fig. 2. HDM-lite graphic notations for the structure schema.

Cardinality constraints are textual annotations marking part-of and link edges; their syntax is MAX=<maxvalue>,MIN=<minvalue>. Such annotations are placed on the side of the edge closer to the component-type for which the constraint holds (Figure 2(e)). The semantics is that every instance of the constrained component-type must participate in at most <maxvalue> and at least <minvalue> connections. Defaults are <minvalue>=0 (optional connection) and <maxvalue>=many (unlimited n-ary connection), which can be omitted.

In the structure schema of the ACME application, shown in Figure 3, there are four entity-types: Item, Combination, Store, and InventoryRecord, which are the main concepts appearing in the specifications. All entity-types except Item consist of a single component. Entity-type Item describes a complex object characterized by several pieces of information, some of which have multimedia type. Therefore, three component-types are introduced to provide a reasonable partition of an item’s content: ItemDescription, BigImage, and TechRecord. In particular, the root-component ItemDescription holds summary information, consisting of a code of type integer, a name of type text, a thumbnail of type image, a description of type text, a price of type integer, and a type of type text. Cardinality constraints on the part-of relationships within Item state that there may be 0 or more big images, and exactly one technical record. The external names are name and code for Item, name for Combination, name and location for Store, and none for InventoryRecord and BigImage. A link MadeOf connects items to the combination they belong to, and has a 0:N cardinality constraint on the participations of an item, and a 2:N cardinality constraint on the connections of a combination. Two links named ItemAvailability and StoreAvailability connect inventory records to items and stores, respectively.
A store or an item may have zero or more inventory records, and an inventory record is linked to exactly one store or item.
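To make the MIN/MAX semantics concrete, the following Python sketch checks the two MadeOf constraints stated above (0:N on the item side, 2:N on the combination side) against a small hyperbase. The function, the representation of links as pairs, and the sample data are all assumptions made for illustration; the paper does not prescribe how constraints are enforced.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

MANY = None  # stands for the default MAX=many (no upper bound)


def violations(instances: Iterable[str],
               links: Iterable[Tuple[str, str]],
               side: int,
               min_value: int = 0,
               max_value: Optional[int] = MANY) -> List[str]:
    """Return the instances whose number of connections breaks MIN or MAX.

    `links` holds the (first, second) pairs of one link-type; `side` (0 or 1)
    selects the end of the pair on which the constrained component-type sits.
    """
    counts = Counter(link[side] for link in links)
    bad = []
    for inst in instances:
        n = counts[inst]  # Counter yields 0 for instances with no link
        if n < min_value or (max_value is not MANY and n > max_value):
            bad.append(inst)
    return bad


# MadeOf links as (item, combination) pairs; the data is invented.
made_of = [("chair-1", "dining-set"), ("table-1", "dining-set"),
           ("lamp-1", "reading-corner")]
items = ["chair-1", "table-1", "lamp-1", "sofa-1"]
combinations = ["dining-set", "reading-corner"]

# Item side: MIN=0, MAX=many (the defaults), so every item is valid.
item_violations = violations(items, made_of, side=0)
# Combination side: MIN=2, so a combination with a single item is flagged.
combination_violations = violations(combinations, made_of, side=1, min_value=2)
```

Here `sofa-1` belongs to no combination and is still valid, whereas `reading-corner` is reported because it has only one item.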



Fig. 3. The structure schema of the ACME application.

2.1.2 HDM-Lite Structure Model and the Entity Relationship Model. There is an obvious parallelism between HDM-lite constructs and ER primitives: components and slots are similar to entities and attributes, and links and part-ofs to relationships. However, important distinctions exist:
—As customary in hypermedia, objects of the real world are given an internal structure, which permits the designer to distribute their (possibly multimedia) content into different information segments; this capability, represented by HDM-lite component trees, facilitates the subsequent specification of the presentation semantics of an object, because some shortcuts are available for components. For example, subcomponents by default inherit the external name slots and presentation style of their supercomponent.
—HDM-lite links (including part-ofs) are not only the specification of a semantic connection between objects, but imply also a navigation possibility, which is made explicit in the navigation schema.
—HDM-lite part-of relationships are given the dignity of first-class modeling concepts, to emphasize the distinction between intra- and interobject relationships. Again, this distinction facilitates the specification of presentation semantics.
—In general, the purpose of Entity-Relationship and HDM-lite modeling is different: the former describes objects to be stored in a database; the latter models objects to be presented to a reader. As a consequence, the same real-world domain could be described differently under the two perspectives; for example in WWW modeling (and, more generally, hypermedia modeling) normalization is not a concern, and redundancy is not only tolerated, but sometimes necessary to deliver a more self-explaining and readable application.

2.2 Navigation

As in traditional hypertexts, navigation of HDM-lite applications can take two forms: contextual and noncontextual. Contextual navigation specifies the way in which it is possible to move from an object to another related object, whereas noncontextual navigation describes the access structures to be used as entry points to the application as a whole, independently of any specific object. In HDM-lite, contextual navigation is specified by means of traversals, noncontextual navigation by means of collections. Together, traversals and collections constitute the navigation schema.

Contextual navigation is established by explicitly turning links and part-of relationships of the structure schema into navigation commands, called traversals. A traversal is the conceptual-level description of the physical situation in which one Web page describing an object is connected (possibly via an index or some other selection mechanism) to a different page describing another object, chosen out of a set of objects semantically associated to the former object. For each link and part-of relationship R between an object type A and another object type B, the designer has four choices: (1) enabling the navigation of R in both directions, which is achieved by introducing a pair of symmetric traversals (from A to B and from B to A); (2) enabling only the navigation from A to B; (3) enabling only the navigation from B to A; (4) disabling the navigation of R.
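The four choices can be read as a mapping from a navigability setting to the set of traversals introduced for R. The Python sketch below is a hypothetical illustration of that reading; the names `Navigability` and `traversals` are assumptions and do not appear in HDM-lite.

```python
from enum import Enum
from typing import List, Tuple


class Navigability(Enum):
    BOTH = "both"        # choice (1): pair of symmetric traversals
    A_TO_B = "a_to_b"    # choice (2)
    B_TO_A = "b_to_a"    # choice (3)
    NONE = "none"        # choice (4): R is not navigable


def traversals(a: str, b: str, choice: Navigability) -> List[Tuple[str, str]]:
    """Return the (source, target) traversals enabled for a relationship between a and b."""
    if choice is Navigability.BOTH:
        return [(a, b), (b, a)]
    if choice is Navigability.A_TO_B:
        return [(a, b)]
    if choice is Navigability.B_TO_A:
        return [(b, a)]
    return []


# In the running example, MadeOf is navigable from items to combinations
# and conversely, which corresponds to choice (1).
made_of_traversals = traversals("Item", "Combination", Navigability.BOTH)
```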
Note that, unlike a hyperlink in the traditional hypertext sense, which is a connection between two individual hypertext nodes or Web pages, an HDM-lite traversal in general connects an object to a set of objects. This is typical in structured Web applications, where pages represent objects, which are semantically related. A traversal specifies the “navigability” of a semantic association, and its implementation may require several physical hyperlinks between Web pages.

Noncontextual navigation is specified by defining the access schema, which consists of a number of collections. A collection [Garzotto et al. 1994] is a set of objects, which may be used as a meaningful index to the content of the application. Normally collections range over (a subset of) the instances of a component (e.g., the set of professors in a Faculty’s site, the set of 17th Century paintings in a Museum’s site). To allow hierarchical indexes, collections may also range over other collections (e.g., the collection of all painters of a museum may be substructured into the collection of 16th Century painters, of 17th Century painters, and so on).



Among the collections defined in the access schema, one is elected as the entry collection and denotes the home page of the application. If the entry collection is not explicitly defined, a default one is assumed, which is the collection containing all the defined collections. Unlike traversals, which have a precise scope (the object from which they depart and its subcomponents), collections are noncontextual and therefore not attached to any specific object; thus, the problem arises of defining the parts of the application in which each collection can be used to navigate. HDM-lite solves this issue by introducing a notion of scope also for collections, similar to the scope of a variable in a programming language. A collection can be declared by the designer as having one of the following visibility levels:
—Global: the collection is accessible from any part of the application. This is the default visibility, and the typical choice for the entry collection.
—Entity-level: the collection becomes navigable whenever the user accesses a specific entity-type. For example, the collection “Painting Techniques” may be visible only when browsing instances of an entity Painter, and not from instances of Sculptor.
—Component-level: the collection becomes navigable whenever the user accesses any instance of a specific component-type.
—Instance-level: the collection becomes visible whenever the user is accessing specific objects. For example, the collection “17th Century Milestones” may be visible only when browsing 17th Century painters.

2.2.1 Navigation Semantics. The navigation schema dictates the navigation options that will be available to users at run-time to explore the hyperbase, typically in the form of active links or buttons in the application pages.
Navigation implies moving the focus from one page to another one: for a traversal, the focus moves from the page of the currently visualized object to the page of one of the objects related to it; for a collection, the focus moves from a page where the collection is visible to the page representing one of the collection’s members. In the sequel, we generically call the objects reached by a traversal and the members of a collection (either component instances or subcollections) the targets of navigation. The same navigation command can be executed at run-time in different ways: for example, it is possible first to show an ordered index of the navigation targets, then to choose one and display it, and finally to enable scrolling over other related objects. In general, the operational semantics of a navigation command is the result of the choices taken for five different orthogonal dimensions:
—Sorting: are the targets of the navigation sorted?
—Filtering: are all the possible targets considered or only a subset?
—Indexing: are the targets presented collectively before access by means of an index?
—Accessing: how many targets are presented at the same time?


•

335

—Browsing: when one target is accessed, is it possible to “scroll” through the other ones? 2.2.1.1 Sorting. In HDM-lite, the targets of navigation can be ordered in different ways by the designer. —Each traversal and collection can be sorted according to a specific combination of the slots of the navigation targets (e.g., ASCENDING Surname, or DESCENDING birthdate for a target of type Person). —All the instances of a component or of an entity can be given a default order (e.g., ASCENDING title for entity Book), which is used for every traversal and collection lacking a specific sort criterion. 2.2.1.2 Filtering. Filtering is the operation of subsetting the targets of a navigation command in order to navigate a smaller number of elements. Filtering is achieved by specifying a filter, i.e., a parametric predicate on the target objects, attached to the traversal or collection. To specify a filter, the designer lists a set of pairs (slot, operator), where operator is a comparison operator (e.g., equal, greater, like) applicable to the values of the corresponding slot; when the traversal or collection is navigated at run-time, the attached filter is turned into a form, which is submitted to the user to obtain a set of triples (slot, operator, value). Such triples are conjuncted to make up a predicate which is evaluated on the navigation targets: only the subset of objects that satisfy the predicate are considered in the subsequent navigation steps. Presently, HDM-lite filters are limited so that only the slots of the component-type of the navigation targets can be used, possibly plus the external name slots of the supercomponent, if the filter is used to search the instances of a subcomponent. In Section 6.2.2 we comment on the users’ feedbacks on this limitation and sketch our ongoing work to provide a more powerful notion of filtering. 2.2.1.3 Indexing. 
Indexing is the splitting of the navigation command into two steps: first a list of element denotations is presented; then one entry is selected from the list, and the corresponding target element is actually accessed. In an index, component instances are denoted by means of their external name slots, and collections by means of their name. The default naming of components can be overridden, using a custom set of slots in a specific traversal or collection. When a filter is specified, it is evaluated first, and indexing is performed afterward.

2.2.1.4 Accessing. Access is the operation of actually presenting the complete information of targets (the slots of a component instance or the members of a collection). Access may be performed on (1) a single target (the default) or (2) all the targets. When a filter is specified, access is performed after evaluating it, and therefore the cardinality of the set of objects to which access is applied is determined at run-time by the filter.
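As an illustration of how filtering, sorting, and indexing combine, the following Python sketch evaluates a filter built from (slot, operator, value) triples, sorts the surviving targets, and denotes them by their external name slots. The names `OPERATORS`, `apply_filter`, `sort_targets`, and `build_index`, the dictionary representation of targets, and the sample items are assumptions made for this example; they are not part of HDM-lite or of the Autoweb System.

```python
import fnmatch
import operator

# Hypothetical comparison operators; "like" is approximated with shell-style wildcards.
OPERATORS = {
    "equal": operator.eq,
    "greater": operator.gt,
    "like": lambda value, pattern: fnmatch.fnmatchcase(str(value), pattern),
}


def apply_filter(targets, triples):
    """Keep the targets that satisfy the conjunction of (slot, operator, value) triples."""
    return [
        t for t in targets
        if all(OPERATORS[op](t[slot], value) for slot, op, value in triples)
    ]


def sort_targets(targets, slot, descending=False):
    """Order targets by one slot (ascending unless stated otherwise)."""
    return sorted(targets, key=lambda t: t[slot], reverse=descending)


def build_index(targets, name_slots):
    """Denote each target by its external name slots, as an index page would."""
    return [" ".join(str(t[s]) for s in name_slots) for t in targets]


# Sample targets (invented data, for illustration only).
items = [
    {"code": 12, "name": "Grillo", "type": "lamp"},
    {"code": 7, "name": "Arco", "type": "lamp"},
    {"code": 31, "name": "Tulip", "type": "chair"},
]

# Triples as they might come back from the filter form.
triples = [("type", "equal", "lamp"), ("code", "greater", 5)]

# The filter is evaluated first; the index is built afterward.
selected = sort_targets(apply_filter(items, triples), "name")
index = build_index(selected, ["code", "name"])
```

In this sketch the filter always runs before the index is built, which mirrors the order stated in the text; single-target versus all-target access would then simply decide whether one element or the whole `selected` list is rendered.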

Fig. 4. Summary of the navigation modes supported by HDM-lite.

2.2.1.5 Browsing. Browsing is the operation of scrolling to another element of the same set of targets. Browsing describes at the conceptual level the common situation in physical hypertexts where one page describing an object is shown, which includes also some commands for moving to the previous/next object in some application-dependent order. The browsing commands, made available to move along a sequence of related objects, are —Next,Previous,First,Last: with the obvious meaning. —ToIndex, ToFilter: these optional commands lead back to the index or filter page that the user may have navigated prior to reaching the current object, provided that a filter and/or an index were required in the current navigation mode. 2.2.1.6 Navigation Modes. By selecting different options for sorting, filtering, indexing, accessing, and browsing, different navigation semantics could be defined. We call a specific mix of values for the navigation dimensions a navigation mode. Presently, HDM-lite supports a fixed set of options, illustrated in Figure 4, with mode index as the default. For example, Filtered Indexed Guided Tour is the navigation mode in which first the targets of navigation are filtered; then an index is presented for selecting one target, and finally scrolling commands are enabled to move to the other targets. 2.2.2 Notations and Running Example. The navigation schema is specified by means of a graphic notation similar to that of the structure schema (see Figure 5). Components are represented as named rectangles. Enabled traversals are denoted by solid arrows between components, whereas disabled traversals are shown as dashed arrows (Figure 5(a)). Collections are represented as named triangles of different colors distinguishing the various visibility levels (Figure 5(b)). Collections ranging over one or more component types are connected to them by a hairline (Figure 5(c)). 
Collections of collections are represented as trees, with the father node representing the enclosing collection, and the children nodes denoting the enclosed collections (Figure 5(d)).


Fig. 5. HDM-lite graphic notations for the navigation schema.

Finally, the navigation mode and the (optional) filter slots are represented as annotations to the arrow of a traversal or to the triangle of a collection (Figure 5(e)).

To model the navigation requirements of the ACME application, we introduce two distinct navigation schemas, one for generic customers (Figure 6) and one for internal personnel (Figure 7). In Figure 6 there are six traversals and eight different collections:
—Two traversals permit ACME customers to go from a combination to its items and from an item to the combinations in which it is bundled. Both traversals are navigated in the indexed guided tour mode, to let customers scroll other items of the same combination, or other combinations for the same item.
—Two traversals enable the navigation from an item’s description to its enlarged images and back. Since big images are expected not to be too numerous, they are accessed together in the show-all mode.
—Similarly, two traversals enable the access from an item’s description to its technical record. Since there is only one record per item, the navigation mode need not be specified.
—The Entry collection is the home page of the application. It is a collection of collections, and its visibility is set to global to let customers go back to the home page from any point of the application. The navigation mode is index so that the home page will contain a list of all the member collections.
—Stores, Combos: these are global collections, ranging over all stores and combinations, respectively. Their navigation mode is set to indexed guided tour. From any point of the application it will be possible to reach either the store or the combination list, and from a store or a combination the “previous” or “next” one can be reached.

Fig. 6. The navigation schema of the ACME customer application.

—Items: this collection is similar to the previous two, except that a filtering clause is specified to select at run-time the objects that will appear in the collection’s index. This feature is useful because the number of elements of the collection is large. The filtering condition produces a form permitting the customer to enter desired values for the code, name, and type fields to focus the search on specific items or item types.
—Lamps, Tables, Chairs, Collectibles: these collections range over sets of items and have visibility local to a set of instances. The idea is to let the customer access the list of all lamps only when looking at a lamp.

Note that the navigation schema for the customer application will not permit ACME customers to reach objects of type InventoryRecord.

The navigation schema for inventory managers (shown in Figure 7) is less rich: it contains three collections and four enabled traversals. Collections Entry, Stores, and Items are the same as before, whereas the defined traversals let ACME personnel only go from a store or item to its inventory records and back. In this navigation schema, combinations, which are a concern of the marketing staff, are not accessible.

2.3 Presentation

Presentation is the third part of an HDM-lite hyperschema, and the one in which the specificity of Web application development, with respect to general hypermedia design, is more evident.


Fig. 7. The navigation schema of the ACME inventory manager application.

The basic unit of presentation is the page-type, which is an abstract specification of the layout and content of a set of similar application pages. Three categories of page-types are defined in HDM-lite:
—Component page-types: they describe the common presentation features of the instances of a given component-type; there is one component page-type for each component-type of the structure schema.
—Collection page-types: they specify the presentation of collections; there is one collection page-type for each collection.
—Traversal page-types: they specify the presentation of traversals; there is one such page-type for each traversal of the navigation schema.

Page-types are treated as abstract grids, the cells of which may contain different types of presentation elements. A page-type is formally described by means of a style sheet, which is a textual specification of the layout of the page-type’s grid and of the visual elements contained in its cells. The presentation elements, which the designer can place in a cell, are of two kinds:
—built-in: presentation abstractions predefined by HDM-lite for rendering the main concepts of the structure and navigation schema;

Fig. 8. A page of the ACME inventory manager application.

—black box: arbitrary pieces of content, which the designer can insert into a style sheet to customize the page; they are written in an implementation-dependent language and not described by HDM-lite.

2.3.1 Built-In Presentation Elements. HDM-lite predefines several elements for rendering entities, components, part-of, links, collections, and navigation commands. Each built-in element has a set of properties which can be set by the designer. Figure 8 shows a sample HTML rendition of several built-in presentation elements introduced in the sequel.

Component elements are used in style sheets associated to component pages. They are:
—The slot panel: an area dedicated to the presentation of the slot values of a component. Customizable attributes include the visual properties (e.g., font, size, color, alignment) of slot labels and values, how to handle null values and anonymous slots (i.e., slots with a hidden label), and the formatting rules for values of record and list types. In Figure 8 the slot panel dictates the rendition of the slots labeled location, picture, map, and mail-to-us.
—The component heading panel: an area dedicated to the header information of a component. By setting appropriate properties it is possible to hide the component header, choose the header type (automatic, custom-text, custom-image), and set the value of properties specific to the chosen header type (for example, a prefix string to put in front of the automatic header, which is defined as the component’s name). In Figure 8, a custom-text heading panel is used to display an appropriate title (“Store Details”) above the slot panel.
—The outgoing links panel: an area dedicated to the presentation of the outgoing links of the component. Customizable attributes let the designer decide if outgoing links are represented by textual or iconic anchors, organize the anchors in columns and rows, establish if the anchors of inactive links (i.e., those links whose target set is empty) should be displayed with a “grayed” style or hidden, and supply an image map to use instead of separate icons. In Figure 8, a textual outgoing link panel is placed below the slot panel and displays a single link, from a store to its inventory records. Conversely, in Figure 11 an iconic outgoing link panel is placed just below the advertising message, and shows the link from a combination to its items, by means of a “notepad” icon.

Entity elements are used in the style sheets of component pages to describe the presentation aspects related to composite entities. They are:
—The part-of panel: an area dedicated to the presentation of the part-of connections of a component due to its embedding into the tree structure of an entity. Different presentation options are made available. In Figure 13, the part-of panel includes two icons (a magnifying lens and a nut and bolt), which respectively lead to the big images and the technical record of an item.
—The context panel: an area used in the presentation of a subcomponent of an entity, to recall the path of objects containing it. In Figure 14 the context panel shows the code and name of the item to which the enlarged images belong.

Collection and traversal elements are used in the definition of style sheets for collection and traversal page-types:
—The collection or traversal heading panel: contains header information for a collection or a traversal. It can be customized in a way similar to the heading of a component.
The default header type displays the collection’s or traversal’s name. In Figure 10, the collection heading panel contains the custom-text “Save $$ with our Combinations.”
—The index panel: an area supporting the presentation of a list of elements used to represent the content of a collection or the multiple objects reached by navigating an N-ary traversal. In Figure 10, the index panel shows a bullet list with all the available combinations.
—The show panel: an area for the presentation of multiple objects belonging to an N-ary traversal navigated in the show mode. In Figure 14, the show panel displays all the big images of an item one after the other.
—The filter panel: an area containing the fill-in form resulting from a filter.

The navigation console panel is a presentation element including commands to navigate through browsable sets of objects (first, last, previous, next, to-index, to-filter). It may be inserted into component, collection, and traversal page styles. Navigation commands may be rendered with textual or iconic representations, and several other options can be set. In Figure 8, the navigation console panel is placed in the leftmost part of the page, and contains icons to reach the index of stores, and the filter and index pages on the collection of items.

Utilities are predefined elements which embed in the style sheet the reference to arbitrary external applications supporting functions not provided by Autoweb. Utilities are collected in a utility panel, which can be customized in a similar way as the standard navigation console panel. Examples of implemented utilities are automatically generated active maps of the structure schema, forms to update the currently visualized object, glossaries, and a tool to save the N most recently visited objects into a user-defined run-time collection. In order to enable communication between Autoweb and external applications, a simple syntax is defined for passing parameters (e.g., the external name, object identifier, or slot values of the currently displayed object) to the external utility.

2.3.2 Notations and Running Example. Style sheets are created using a textual notation, exemplified next. A style sheet is composed of two main sections: the object declaration section and the grid definition section. The former permits the designer to introduce the visual elements that will be used to build the page; the latter defines the layout of the page as a grid of cells, where the declared visual elements can be placed.

2.3.2.1 Declaring Visual Objects. Declaration is the process in which a built-in visual element is introduced and its properties are assigned a value. The following example shows the declaration of a part-of panel, which displays only connections between father and children (ShowBrothers=no) and uses a textual representation (Type=text). Connections are laid out vertically, and inactive links are not shown (ShowInactive=no).
The panel declaration includes one HTML-specific attribute, which requires the panel to be inserted into a single HTML cell (OneCell=yes).

    [PartOf]
        ShowBrothers=no
        Type=text
        [TextOrIcon]
            Orientation=vertical
            ShowInactive=no
            [Text]
                TextColor=Green
                InactiveTextColor=Grey
                TextFont=Helvetica
                TextSize=-1
                #HTMLspecific
                OneCell=yes
            [/Text]
        [/TextOrIcon]
    [/PartOf]
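To make the structure of such a declaration concrete, the following Python sketch reads the bracketed notation into nested dictionaries. The grammar is inferred from this single example and is an assumption: `[Name]` opens a section, `[/Name]` closes it, `Key=Value` lines set properties, and lines starting with `#` (such as `#HTMLspecific`) are treated as annotations and skipped. The function name `parse_declaration` is hypothetical and does not describe the parser actually used by Autoweb.

```python
def parse_declaration(text):
    """Parse bracketed sections with Key=Value properties into nested dicts."""
    root = {}
    stack = [("", root)]  # (section name, properties), outermost first
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: '#' lines are annotations, not properties
        if line.startswith("[/") and line.endswith("]"):
            name = line[2:-1]
            open_name, _ = stack.pop()
            if open_name != name or not stack:
                raise ValueError(f"unexpected closing tag [/{name}]")
        elif line.startswith("[") and line.endswith("]"):
            child = {}
            stack[-1][1][line[1:-1]] = child
            stack.append((line[1:-1], child))
        else:
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"not a Key=Value line: {line!r}")
            stack[-1][1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError(f"unclosed section [{stack[-1][0]}]")
    return root


DECLARATION = """
[PartOf]
ShowBrothers=no
Type=text
[TextOrIcon]
Orientation=vertical
ShowInactive=no
[Text]
TextColor=Green
InactiveTextColor=Grey
TextFont=Helvetica
TextSize=-1
#HTMLspecific
OneCell=yes
[/Text]
[/TextOrIcon]
[/PartOf]
"""

panel = parse_declaration(DECLARATION)
text_properties = panel["PartOf"]["TextOrIcon"]["Text"]
```

Keeping all values as strings is a deliberate simplification; a fuller reader would presumably type-check each property against the element it belongs to.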

2.3.2.2 Building the Page Grid. At the outermost level, a page is defined as a set of adjacent rectangular regions. A region is a portion of the screen, which can be scrolled independently. Regions may recursively contain other regions, or tables, which in turn are made of cells. Cells contain the actual visual elements. Table definition follows the hierarchical model of HTML 3 tables. We are presently extending the style sheet language to support the ISO/ANSI hypermedia standard HyTime [Derose and Durand 1994], which considers multidimensional tables as finite coordinate spaces and allows a more flexible cell definition.

2.3.2.3 Running Example. To differentiate the presentation for customers and ACME personnel, two distinct families of style sheets are introduced.

In the ACME customer application there are three component style sheets (for items, stores, and combinations). By default, the style sheet for items is uniformly applied to all the subcomponents of the Item entity-type, in order to retain the same graphic look and feel in all pages referring to an item. Collection and traversal style sheets are reduced to only four styles: one for the entry collection, and respectively one for each traversal or collection presenting items, stores, and combinations. In this way, lists of objects of a given type (e.g., items) have the same look and feel throughout the application. All the customer style sheets (visible in the next section) have colorful black-box inserts (for example, the ACME logo), which are added to the page to attract customers’ attention, and provide advertising messages.

The inventory management application has a single component style sheet, applied to items, stores, and inventory records (shown in Figure 8), and a single collection and traversal style sheet, applied to the four traversals and to the three collections. All styles are almost completely textual, and include as black-box element only the ACME logo.
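The page grid of Section 2.3.2.2 (regions that contain other regions or tables, tables made of cells, cells holding visual elements) can be pictured as a small recursive data structure. The Python sketch below is only an illustration under that reading: the class names `Region`, `Table`, and `Cell`, the helper `visual_elements`, and the element names in the sample page are assumptions, not the style sheet language itself.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Cell:
    # Names of declared visual elements placed in this cell.
    elements: List[str] = field(default_factory=list)


@dataclass
class Table:
    # Rows of cells, in the spirit of hierarchical HTML tables.
    rows: List[List[Cell]] = field(default_factory=list)


@dataclass
class Region:
    # An independently scrollable portion of the screen.
    name: str
    children: List[Union["Region", Table]] = field(default_factory=list)


def visual_elements(node):
    """Walk a region tree depth-first and list the elements placed in its cells."""
    if isinstance(node, Table):
        return [e for row in node.rows for cell in row for e in cell.elements]
    found = []
    for child in node.children:
        found.extend(visual_elements(child))
    return found


# A hypothetical two-region page: a navigation frame and a main frame.
page = Region("page", [
    Region("left", [Table([[Cell(["NavigationConsole"])]])]),
    Region("main", [Table([
        [Cell(["ComponentHeading"])],
        [Cell(["SlotPanel"]), Cell(["OutgoingLinks"])],
    ])]),
])

placed = visual_elements(page)
```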
2.4 Navigating the Running Example

To further clarify the meaning of the various HDM-lite constructs introduced in the previous section, we now tour the ACME Customer Application using an (automatically generated) HTML implementation of the HDM-lite specifications illustrated in Sections 2.1.1, 2.2.2, and 2.3.2.

Figure 9 shows the home page of the ACME customer application. The page is the rendition of the entry collection of the navigation schema of Figure 6, according to the style sheet defined for the collection (home.sty). The upper half of the page contains a black-box component (a piece of HTML), which includes an image and a link to leave the site and go to the Autoweb Home Site. The bottom part includes the collection’s index panel, which lists the three member collections. By clicking on the Combinations button, one reaches the page shown in Figure 10, which displays the available combinations. Besides the list of combinations, at this stage only the four global collections of the navigation schema of Figure 6 are visible.

Fig. 9. Home page of the ACME customer application.

Fig. 10. Page of the Combinations collection.

A specific combination is accessed by clicking on its external name. For example, after clicking on The Wall, the page shown in Figure 11 appears. The page is produced according to the style sheet for combinations. By clicking on the notepad icon one follows the traversal from combinations to items, which is navigated in the indexed guided tour mode. This leads to the page shown in Figure 12. The layout is similar to that of collection Combinations, but the style differs in the black-box HTML inserts, which now are tailored to the presentation of items.

Fig. 11. Page of combination The Wall.

Fig. 12. Page of the items of combination The Wall.

By clicking on one item (e.g., Grillo) the page shown in Figure 13 is reached, which is more complex due to the nested structure of items. Below the HTML slogan, two panels appear side by side: on the left, two icons (a magnifying lens and a nut and bolt) lead to the enlarged images and to the technical record of an item, as required by the navigation schema of Figure 6; on the right, there is an icon to navigate to the combinations in which Grillo is bundled. Another difference is in the navigation panel (left frame): the visible collections now include also collection Lamps, because this collection has instance-level visibility restricted to items which are lamps, and Grillo is a lamp; moreover, navigation commands are made available to scroll through the other items of the same discounted combination, as required by the indexed guided tour navigation mode of the traversal from combinations to items.

Fig. 13. Page of item Grillo.

As a last step, by clicking on the magnifying lens, the page showing the enlarged images of Grillo appears; since the part-of has the show-all navigation mode, all images are presented together. Note also that the graphic and navigation context (e.g., the visible collections and the HTML inserts) are retained when passing from the root-component (the item description) to a subcomponent (the enlarged images), to reinforce the feeling of remaining within the same “real-world” entity.

Fig. 14. Page of Grillo’s enlarged images.

3. AUTOMATIC IMPLEMENTATION

The passage from the conceptual schema of the site to the actual pages that constitute the application requires two distinct mappings (summarized in Figure 15), which progressively transform high-level specifications into physical-level pages.

Fig. 15. Overview of the Autoweb mapping techniques.

The conceptual-to-logical mapping takes as input the conceptual schema of the site, expressed by the structure, navigation, and presentation schema, plus the definition of a fixed database schema for storing metadata about navigation and presentation. The output is twofold: a metaschema database containing a description of the structure, navigation, and presentation of the site; and an empty database schema ready to store the structured part of the application content, namely, the slot values of components.

The logical-to-physical mapping takes three inputs: (1) the metaschema database; (2) the application schema produced by the conceptual-to-logical mapping; (3) the application data. As output, it delivers the application pages in a network-compatible language.



In the next two sections we present each transformation in detail, leaving to Section 4 the description of the technical aspects on how such mappings are implemented by the tools of the Autoweb System.

3.1 Conceptual to Logical Mapping

The conceptual-to-logical mapping proceeds differently for data and metadata.

First the HDM-lite structure schema is analyzed to obtain the schema of the application’s supporting database. The transformation of the HDM-lite structure schema into a relational schema is analogous to the mapping of an Entity Relationship schema into a relational one [Ceri et al. 1993]. Each HDM-lite component is mapped to a set of relations; a primary relation hosts the component’s slots that are not list-valued, plus an additional numeric primary key column, which represents a unique identifier (OID). Each list-valued slot is mapped to a secondary table linked to the primary one by a foreign key column. Large multimedia slots can be either stored within the table or in the file system, in which case a reference to their location is stored in the component’s table. Part-of and link connections between components are mapped either to references between component tables or to bridge tables, depending on the cardinality of the connection.

As a second step, the HDM-lite schema is parsed to populate the metaschema database with the information about entities, components, traversals, navigation modes, visibility of collections, and style sheets. The metaschema database has itself a schema, which is constant across applications and is determined using reflection, in the following way: the HDM-lite primitives are represented in a reflexive way as a set of HDM-lite entities, components, and links, and then mapped into a set of relations in the same way as the object types of an application.
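The first step of this mapping (one primary relation per component, with a numeric OID key, plus one secondary table per list-valued slot) can be sketched in Python as follows. The classes `Slot` and `Component`, the function `component_to_ddl`, the table and column naming scheme, and the sample Item component are all assumptions made for illustration; the output of the actual Schema Generator is not documented here. SQLite is used only to check that the generated statements are valid SQL.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    sql_type: str
    is_list: bool = False  # list-valued slots go to a secondary table


@dataclass
class Component:
    name: str
    slots: list = field(default_factory=list)


def component_to_ddl(component):
    """Return CREATE TABLE statements: a primary relation plus one table per list slot."""
    columns = ["oid INTEGER PRIMARY KEY"]
    columns += [f"{s.name} {s.sql_type}" for s in component.slots if not s.is_list]
    statements = [f"CREATE TABLE {component.name} ({', '.join(columns)});"]
    for s in component.slots:
        if s.is_list:
            statements.append(
                f"CREATE TABLE {component.name}_{s.name} ("
                f"{component.name}_oid INTEGER REFERENCES {component.name}(oid), "
                f"slot_value {s.sql_type});"
            )
    return statements


# A hypothetical component with one list-valued slot.
item = Component("Item", [
    Slot("code", "INTEGER"),
    Slot("name", "TEXT"),
    Slot("colors", "TEXT", is_list=True),
])
ddl = component_to_ddl(item)

# Execute the statements in an in-memory database to confirm they are well formed.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Part-of and link connections, which the text maps to references or bridge tables depending on cardinality, are left out of this sketch.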
This reflexive approach has an important consequence: the structure, navigation, and presentation schemas are internally represented as any other Autoweb application and thus can be browsed and updated in the same way as application content. This capability facilitates the dynamic adaptation of the application interface to changing users or evolving users’ needs, which is one direction of our ongoing and future work.

The conceptual-to-logical mapping is implemented by a tool of the Autoweb System called Visual HDM Relational Schema Generator, described in Section 4.1.2.

3.2 Logical to Physical Mapping

The logical-to-physical mapping is an original technique for the production of pages in a network-compatible language from conceptual schemas and content. The transformation requires four steps, shown in Figure 16:
—The Parse phase takes as input the style sheet of the page to be produced and parses it to obtain an abstract page skeleton. The abstract page skeleton is a (main memory) representation of the page, which is independent of the network language in which the page will be delivered, and of the specific object that the page describes.
—The Data Fill phase transforms the abstract page skeleton produced by the parse step into an abstract page instance, i.e., a page representation still independent of the network language but specific to one object (e.g., the lamp named “Grillo”). The core part of the data fill step is the definition of the queries necessary to fetch from the database data about the instance to be rendered. Such queries are executed and their result integrated into the abstract page instance.
—The Language Map phase takes into account language-dependent style sheet properties (e.g., the usage of HTML frames to render independently scrollable regions), and maps the abstract page instance into a page with features specific to the chosen delivery language.
—The Code Generation phase actually transforms the language-dependent page instance into a piece of code in the chosen language.

Fig. 16. Phases of the logical-to-physical mapping.

The placement of the language map and code generation phases at the bottom of the transformation process has the goal of making the logical-to-physical mapping as independent as possible of the delivery language. Switching from one language to another one entails changing the language-specific properties introduced in the style sheets and providing a new implementation for the language map and code generation phases.

The logical-to-physical mapping is implemented by a tool of the Autoweb System called Page Generator, described in Section 4.2.1. This tool delivers pages in HTML 3.2. A separate experiment has also demonstrated the feasibility of applying the logical-to-physical mapping to produce applications in Java.

3.2.1 Optimization and Interoperability Issues. Several optimizations are possible to enhance the efficiency of the logical-to-physical mapping.
(1) Caching abstract page skeletons: style sheets are reused across instances of the same component and may be common to several component types. Therefore, the output of the parse phase can be retained in main memory so as to avoid subsequent parsing.
(2) Storing data fill queries: data fill queries have a constant scheme and thus can be turned into parametric stored procedures and put in the DBMS in precompiled format. This solution may avoid the execution of any dynamic SQL and thus greatly decrease data fill response time.
(3) Page caching: this is the most obvious optimization, in which the actual output pages are cached to avoid recomputing the pages that have already been produced. Clearly, page caching conflicts with the requirement of keeping output pages and database content always aligned, and its applicability must be evaluated in the context of the specific application.

Optimizations 1 and 3 have been implemented in the Runtime Environment of the Autoweb System, described in detail in Section 4.2.

A variant of the described sequence of mapping steps (which we call “late data fill”) can be adopted to better integrate the page generation process into commercial server-side scripting architectures, such as Microsoft’s ASP, JavaSoft’s JSP, and similar products. By postponing the data fill phase after the code generation phase, it is possible to produce at compile-time page templates, which translate the presentation dictated by a style sheet into a specific mark-up language, and embed SQL queries for content retrieval. These templates can be installed in a commercial Web server supporting their script language, to be interpreted and filled in with content at run-time. Late data fill is the focus of our current work on the architectural revision of the model-driven page generation process. We comment on this aspect in Section 6.2.10.

More complex optimization strategies have been recently proposed, leveraging such techniques as view materialization, data and page caching, and context-based caching [Florescu et al. 1999a; 1999b; Bernstein et al. 1999].
We will address advanced optimization policies in our future work.
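The four phases of Section 3.2, together with optimization 1 (caching abstract page skeletons), can be illustrated with a deliberately tiny Python pipeline. Everything concrete in it is an assumption made for the example: the one-slot-per-line style sheet format, the in-memory `DATABASE` dictionary standing in for SQL queries, the function names, and the use of `functools.lru_cache` as the caching mechanism. It is not a description of how the Page Generator is implemented.

```python
import html
from functools import lru_cache

# Hypothetical style sheet: an ordered list of slot names to display.
STYLE_SHEETS = {"item.sty": "name\nprice"}

# Stand-in for the application database, keyed by OID.
DATABASE = {1: {"name": "Grillo", "price": "120"}}

parse_calls = 0  # counts how often the Parse phase really runs


@lru_cache(maxsize=None)
def parse(style_name):
    """Parse phase: style sheet -> abstract page skeleton (cached per style sheet)."""
    global parse_calls
    parse_calls += 1
    return tuple(STYLE_SHEETS[style_name].splitlines())


def data_fill(skeleton, oid):
    """Data Fill phase: skeleton + one object's data -> abstract page instance."""
    row = DATABASE[oid]  # a real system would run queries here
    return [(slot, row[slot]) for slot in skeleton]


def language_map(instance):
    """Language Map phase: attach delivery-language features (here, an HTML tag)."""
    return [("p", f"{slot}: {value}") for slot, value in instance]


def generate_code(mapped):
    """Code Generation phase: emit markup in the chosen language."""
    return "\n".join(f"<{tag}>{html.escape(text)}</{tag}>" for tag, text in mapped)


def render(style_name, oid):
    return generate_code(language_map(data_fill(parse(style_name), oid)))


page = render("item.sty", 1)
render("item.sty", 1)  # the second request reuses the cached skeleton
```

Because only `language_map` and `generate_code` know about HTML, switching to another delivery language would, in this sketch, mean replacing those two functions while leaving `parse` and `data_fill` unchanged.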

4. THE AUTOWEB SYSTEM

The architecture of the Autoweb System, shown in Figure 17, distinguishes a Design Environment and a Runtime Environment.

The Design Environment comprises all the Visual HDM (VHDM, for short) tools, which sit on top of a common design repository containing application projects. An application project collects all the development documents related to an application: HDM-lite schemas, style sheets, prototype test data, and SQL files containing the instructions for generating the repository.

The Runtime Environment comprises all the Autoweb tools, which operate on top of the relational repository hosting the application data and metadata.


Fig. 17. Architecture of the Autoweb system.

4.1 The Design Environment

The Design Environment supports the conceptual specification of a Web application and the generation of its supporting database; it consists of four tools, as shown in Figure 18.

Fig. 18. Architecture of the design environment.

4.1.1 Visual HDM Diagram Editor. The VHDM Diagram Editor permits the user to define an HDM-lite conceptual schema. To facilitate visualization and editing, three different perspectives are offered, which isolate and display separately the structural, navigation, and presentation aspects. In the structural perspective (shown in Figure 3), entities, components, and links are displayed according to the notation introduced in Section 2.1.1. In the navigation perspective (shown in Figures 6 and 7), links and part-of edges between components are replaced by their underlying traversals, and collections are introduced and represented according to the notations of Section 2.2.2. Collections and traversals can be annotated with the specification of their navigation semantics by double-clicking on them in the diagram of the navigation schema. Finally, in the presentation perspective (shown in Figure 19), component, collection, and traversal page-types may be shown, each one associated with the chosen presentation style. By clicking on a page-type, it is possible to invoke the VHDM Style Sheet Editor to define and store new presentation styles.

Fig. 19. Interface of VHDM diagram editor.

4.1.2 Visual HDM Relational Schema Generator. The VHDM Relational Schema Generator implements the conceptual-to-logical mapping described in Section 3.1; it maps an HDM-lite schema into a set of SQL files containing DDL statements for creating the relational tables of the application schema, and DDL and DML instructions for creating and populating the metaschema database. As usual, the mapping from a conceptual schema to a relational one entails a partial loss of semantics, which must be restored by ad hoc means. In particular, the explicit cardinality restrictions and the implicit referential integrity requirements on part-of and link connections between components are not mirrored in the generated relational schema. To cope with this deficiency, two solutions are available:

—Server-side enforcement: the VHDM Relational Schema Generator can be instructed to produce additional SQL definitions (foreign key constraints or triggers) for the automatic enforcement of data integrity after the update of the application-structured content. This solution, mandatory if the content is updated via external applications, depends on the integrity-checking capabilities and syntax of the target relational system (and thus requires the Schema Generator to customize the output code for the different relational DBMSes).

—Client-side enforcement: the generated schema is not augmented with integrity-checking features. Integrity is enforced on the client by the data entry application automatically generated by the Autoweb DataEntry Generator (described in Section 4.2.2). This solution is only viable when the application content is updated exclusively through the Autoweb DataEntry Generator.

4.1.3 Visual HDM Prototype Generator. One of the key factors for making model-driven design effective is the possibility of delivering fast prototypes of conceptual schemas in the early stages of development, when the final architecture of the site may not be in place yet. To enable fast prototyping, Autoweb includes a tool named VHDM Prototype Generator, which works in cooperation with the Diagram Editor and Schema Generator, and can be invoked to produce "in one click" a working implementation of the current conceptual schema on top of synthetic data. The user can specify a number of preferences to direct the generation of test data (e.g., sample textual and multimedia files to populate slot values, the minimum and maximum number of instances of components and links, and so on).
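The preference-driven generation of test data described above can be illustrated with a minimal sketch. It is not the Prototype Generator's actual code: the function name, the `oid` column, and the shape of the `samples` preference are assumptions made for illustration only. The sketch picks a random number of instances within the requested bounds, fills each slot from user-supplied sample values, and emits SQL INSERT statements.

```python
import random


def generate_test_data(component, slots, min_instances, max_instances, samples, seed=0):
    """Return SQL INSERT statements for a random number of synthetic instances.

    component: name of the table to populate (hypothetical naming).
    slots: column names to fill; samples maps each slot to candidate values.
    """
    rng = random.Random(seed)  # seeded so that a prototype can be regenerated identically
    count = rng.randint(min_instances, max_instances)
    columns = ", ".join(["oid"] + list(slots))  # "oid" is an assumed key column
    statements = []
    for oid in range(1, count + 1):
        values = [str(oid)]
        for slot in slots:
            text = rng.choice(samples[slot]).replace("'", "''")  # escape quotes for SQL
            values.append(f"'{text}'")
        statements.append(
            f"INSERT INTO {component} ({columns}) VALUES ({', '.join(values)});"
        )
    return statements
```

Under these assumptions, a call such as `generate_test_data("Item", ["name"], 2, 5, {"name": ["Lamp", "Desk"]})` yields between two and five INSERT statements that can be written to an SQL file.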
The Prototype Generator produces a set of SQL files containing the DML statements needed to automatically instantiate a limited-size data repository and obtain the desired prototype.

4.1.4 Visual HDM Style Sheet Editor. The VHDM Style Sheet Editor (shown in Figure 20) can be invoked from the presentation view of VHDM Diagram Editor to visually define a style sheet. In the layout window, the page grid can be defined with the help of commands for splitting, merging, and resizing cells. Once the grid is consolidated, an element palette can be used to insert content elements in the grid. A double click on each built-in element activates a dialog box for defining the element's properties. Blackbox elements can also be defined, presently limited to HTML fragments. The Style Sheet Editor visualizes a graphical representation of the page that will be generated, and outputs the .sty file required by the Autoweb Page Generator.

Fig. 20. Interface of VHDM style sheet editor.

4.2 The Runtime Environment

The Runtime Environment implements the logical-to-physical mapping described in Section 3.2. It is responsible for managing Autoweb applications and serving page requests coming from remote users. It consists of four tools, as illustrated in Figure 21.

Fig. 21. Architecture of the runtime environment.

4.2.1 The Autoweb Page Generator. The Autoweb Page Generator dynamically delivers application pages constructed from the hyperbase objects, the metainformation about navigation, and the presentation style sheets. The present architecture of the Page Generator (shown in Figure 22) includes a Dispatcher, a Cache Manager, and a Server Process. The Dispatcher acts as a gateway between the HTTP server and the Server Process, which is in charge of connecting to the database and generating the physical pages. The Dispatcher interprets user requests expressed as Autoweb URLs. Autoweb URLs encode in the query-string the parameters that qualify the request coming from the client; these parameters include the identifier of the requested component-type, the identifier of the requested instance, the identifier of the traversal or collection used to access the requested instance, the identifiers of the father component-type and instance, and the identifier of the navigation mode to use for producing the page. The Dispatcher checks whether a page corresponding to a request with the same parameters is in the cache, and, if not, submits the proper page request to the Server Process.

Fig. 22. Components of the runtime environment.

The Cache Manager stores the pages produced by the Server Process in the file system, to speed up future requests with the same parameters. For this purpose, each page is assigned a hash key produced by encoding its request parameters. The Cache Manager works according to a Least Recently Used (LRU) policy, and can be configured at application definition time so as to cache only selected types of pages (e.g., only pages having a specific component-type). In this way, it is possible to exclude from caching frequently updated pages (e.g., the collection of all items of the ACME catalog).
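The interplay between the Dispatcher and the Cache Manager can be sketched as follows. This is only an illustration under stated assumptions, not the Autoweb implementation: the parameter names in `PARAM_NAMES`, the use of SHA-1 for the hash key, and the in-memory store (the real Cache Manager uses the file system) are all hypothetical choices. The sketch derives a key from the request parameters, keeps pages in LRU order, and skips page types that were not configured as cacheable.

```python
import hashlib
from collections import OrderedDict
from urllib.parse import parse_qs

# Hypothetical query-string names for the parameters listed in the text.
PARAM_NAMES = ("type", "oid", "via", "ftype", "foid", "mode")


def parse_request(query_string):
    """Extract the known request parameters from a query string."""
    parsed = parse_qs(query_string)
    return {name: parsed[name][0] for name in PARAM_NAMES if name in parsed}


def cache_key(params):
    """Hash the sorted parameters so that equal requests map to the same key."""
    canonical = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class PageCache:
    """In-memory LRU cache restricted to selected component-types."""

    def __init__(self, capacity, cacheable_types):
        self.capacity = capacity
        self.cacheable_types = set(cacheable_types)
        self.pages = OrderedDict()  # least recently used entry comes first

    def get(self, params):
        key = cache_key(params)
        if key not in self.pages:
            return None
        self.pages.move_to_end(key)  # mark as most recently used
        return self.pages[key]

    def put(self, params, page):
        if params.get("type") not in self.cacheable_types:
            return  # page types excluded by configuration are never stored
        self.pages[cache_key(params)] = page
        self.pages.move_to_end(cache_key(params))
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page


def dispatch(query_string, cache, produce_page):
    """Serve a request from the cache, or ask produce_page (the Server Process stand-in)."""
    params = parse_request(query_string)
    page = cache.get(params)
    if page is None:
        page = produce_page(params)
        cache.put(params, page)
    return page
```

Sorting the parameters before hashing makes the key independent of their order in the query string, so `type=Item&oid=1` and `oid=1&type=Item` hit the same cache entry.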


The Server Process is the component actually in charge of producing the application pages. Typically, the Server Process goes through the following steps:

—Metaschema preparation: the metaschema database is loaded into main memory at the first user's request.

—Style sheet parsing: style sheets are loaded in main memory after the first request.

—Data fill query preparation and execution: based on the request parameters, the Server Process prepares the query to submit to the DBMS. When all the queries necessary for serving the client request are prepared, they are submitted to the DBMS; results are collected and formatted in main memory.

—Page production: the output page is assembled from the instructions contained in the cached style sheet and from data retrieved by the queries.

—Page delivery: the output page is sent back to the Cache Manager, which possibly caches it and forwards it to the HTTP Engine and, through it, to the remote user.

In the present architecture, several applications can be up and running at the same time and the Page Generator can concurrently interact with multiple clients browsing the same application or different applications. However, user requests are enqueued and served sequentially by the Server Process.

4.2.2 The Autoweb DataEntry Generator. The Autoweb DataEntry Generator has the same architecture as the Page Generator, but serves write requests. Mutual exclusion of write requests is enforced by the concurrency manager of the underlying DBMS. The data entry interface, shown in Figure 23, contains commands to create, modify, and delete an object (either a component or a whole entity), modify part-of and link connections, populate collections, and materialize auxiliary data structures useful to speed up navigation commands (e.g., guided tours are supported through dedicated tables storing the previous/next relationships of objects).

4.2.3 The Autoweb Page Grabber. The Autoweb Page Grabber materializes the pages of an application for off-line publication. The Page Grabber fetches pages from the Page Generator, parses them, retrieves embedded HTTP URLs and pushes them into a stack, replaces URLs embedded in the page with automatically generated file URLs, archives the updated page under the appropriate pathname, and finally recursively grabs the pages whose URLs are in the stack. Unlike commercial grabbers, the Autoweb Page Grabber is schema-aware, and thus can be instructed to materialize only pages selected based on semantic properties. For example, it is possible to download only selected instances of a given entity-type or component-type, or all the members of a collection.
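The grabbing procedure just described can be sketched in a few lines. The sketch is an assumption-laden illustration rather than the Page Grabber itself: `fetch` stands in for a request to the Page Generator, `wanted` stands in for the schema-aware selection of pages, `local_name` is a hypothetical file naming scheme, and an explicit stack replaces the recursion mentioned in the text.

```python
import re

HREF = re.compile(r'href="([^"]+)"')


def local_name(url):
    """Map an application URL to a file name (hypothetical naming scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".html"


def grab(start_url, fetch, wanted=lambda url: True):
    """Materialize the pages reachable from start_url.

    fetch(url) returns the HTML of a page; wanted(url) decides whether a linked
    page belongs to the part of the application to capture.
    Returns a dict mapping file names to rewritten HTML.
    """
    archive = {}
    stack = [start_url]
    seen = {start_url}
    while stack:
        url = stack.pop()
        html = fetch(url)
        # Push every embedded URL that is selected and not yet scheduled.
        for link in HREF.findall(html):
            if wanted(link) and link not in seen:
                seen.add(link)
                stack.append(link)

        def rewrite(match):
            # Selected links point to local files; others keep their original URL.
            link = match.group(1)
            return f'href="{local_name(link)}"' if wanted(link) else match.group(0)

        archive[local_name(url)] = HREF.sub(rewrite, html)
    return archive
```

In this sketch, passing a `wanted` predicate that inspects the component-type or instance identifier in the URL plays the role of selecting pages by semantic properties.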


Fig. 23. Data entry interface automatically generated by the Autoweb DataEntry Generator.

The Page Grabber has a dual interface: it can be used as a batch program to materialize an Autoweb application at selected times, or can be remotely invoked by users via a Java interface, to visually define the part of the application to capture.

4.2.4 The Autoweb Administrator. The Autoweb Administrator complements the Page and DataEntry Generators by offering administrative services spanning multiple applications and Autoweb installations. The tool permits the system administrator to shut down and bootstrap either the whole Runtime Environment or individual applications, and to access log information.

4.3 Mapping the Runtime Environment to Legacy Data

Application data can be either structured or unstructured. Structured content can either be stored directly in the application database generated by the conceptual-to-logical mapping, or it may reside in external databases. When the content is stored in external databases, the problem arises of connecting the legacy data sources to the database schema generated by Autoweb. This problem is not addressed in an ad hoc manner in the Autoweb System, because it can be solved with the help of standard commercial products. In particular, data replication tools (e.g., Microsoft DTS) can be used to map a target database schema on top of a set of heterogeneous distributed data sources, using a variety of different techniques, which include relational views, triggers, and automatically generated data conversion programs, which can be scheduled by the data administrator for execution at specific points in time.

4.4 Implementation Notes, Performance, and Security Issues

Presently the Design Environment is implemented as a standalone application running on Unix workstations (Sun Sparc with Sun Solaris V.2.4 O.S.) and PC (with Linux and Windows NT O.S.). The most recent version of the Runtime Environment (2.0) is available under the Linux, Sun Solaris, and Windows NT Operating Systems. The Diagram Editor, the Schema Generator, and the Prototype Generator are written in Java; the produced SQL code has been tested on the MiniSQL system, a freeware DBMS for the Linux O.S., on Oracle Version 7, and on Microsoft's SQL Server. The Style Sheet Editor is also coded as a Java application, using the Swing libraries by Sun Microsystems. The Runtime Environment consists of over 400 classes for a total of more than 50,000 lines of C++ code; the client-server communication between the Dispatcher and the Server Process is built using the implementation of OMG's Corba by Xerox Corporation (ILU 2.0); the system is designed to work with any HTTP server supporting the CGI protocol and any browser supporting HTML V.2 or higher. The Page Generator implements the abstract page skeleton and output page caching optimizations discussed in Section 3.2.1. Moreover, to increase efficiency in the interaction with the DBMS, the Page and DataEntry Generators keep the connection to the DBMS open across different client requests, avoiding the overhead of opening and closing the database connection at each request.
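The connection-reuse optimization can be illustrated with a small sketch. It is not taken from the Autoweb code base, which is written in C++: the class and method names are hypothetical, and Python's built-in `sqlite3` module stands in for the DBMS. The point is only that the connection is opened once, when the process starts, and every later request runs on it.

```python
import sqlite3


class ServerProcess:
    """Keeps one DBMS connection open and reuses it for every request."""

    def __init__(self, database):
        # Opened once at startup instead of once per request.
        self.connection = sqlite3.connect(database)
        self.requests_served = 0

    def serve(self, sql, params=()):
        """Run one data fill query on the already open connection."""
        self.requests_served += 1
        return self.connection.execute(sql, params).fetchall()

    def shutdown(self):
        """Close the connection when the process is stopped."""
        self.connection.close()
```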
Further language-specific optimization is pursued in the construction of the HTML pages by reducing the number of hits made by the client to render an application page; in the case of a presentation style requiring frames, Javascript headers enabling the retrieval of an entire multiframe page in one shot are included by the Page Generator in the delivered HTML pages, provided that the client's browser is recognized as supporting Javascript.

A second version of the Page Generator (called J-Autoweb Page Generator) has been implemented to experiment with a different mapping. J-Autoweb is written in Java, and the Page Generator runs as an applet on the client. Database connectivity is ensured by the JDBC gateway library and application pages are delivered in Java. After experimentation, the pure client-side approach has been modified due to performance problems caused by the complex interaction with the database needed to fetch metaschema and schema information. To increase performance, the Page Generator has been split into two components: a Java application, which includes the functions for database interaction, resides on the server and manages metaschema and schema queries; a Java applet (which includes the functions for request interpretation and page construction) runs on the client and formats raw data coming from the database into application pages with the required presentation and navigation semantics.



The present version of the Autoweb System (2.0) has limited support for access control and security. All Autoweb URLs are encoded to protect the Dispatcher from malicious users sending incorrect query-string parameters. Write requests are addressed to CGI URLs which belong to a protected domain, so that the HTTP credential verification mechanism is exploited to assess the user's rights. Once access to a data entry application has been obtained, no further control is applied. Conversely, read requests are always served. However, the HDM-lite model has already been extended to support user profiling. A notion of user is present in the metaschema, and access rights of various types can be associated with each user. We are currently working to map conceptual-level access rights to the authorization facilities of relational DBMSes, to exploit security checking at the DBMS level.

5. COMPARISON WITH RELATED WORK

The development of Web applications is the subject of an enormous amount of work, both in industry and in academia. In this section we present the most relevant proposals related to the Autoweb approach: first we consider the solutions coming from industry; next we review the work in academia. While reviewing related approaches, we outline similarities and differences with respect to Autoweb. In the conclusions (Section 7), we further summarize the original contribution of Autoweb, which is not only technical, but also includes the experience of evaluating model-driven Web design with real users. The interested reader may find a more in-depth technical evaluation of the state of the practice of Web development in Fraternali [1999], where over 40 different commercial products for Web development are evaluated and classified, and Autoweb and other related research projects are analytically contrasted with one another and with commercial solutions.

5.1 Industrial Products for Web Development

The features of the Autoweb System have been designed after an in-depth review of the state of the practice. For brevity, we only present a limited selection of typical products supporting different forms of data-intensive Web site development. The categories of tools most relevant to the Autoweb approach are the following ones:

—Visual Editors and Site Managers: these products typically bundle a WYSIWYG HTML editor, which permits the user to design sophisticated HTML pages without programming, and a visual site manager, which displays in a graphical way the content of a Web site and supports functions like page upload, deletion, and renaming, and broken-link detection and repair. Among the products in this category there are Adobe SiteMill and PageMill, NetObject Inc.'s Fusion, Microsoft's FrontPage, and many others.



—Hypermedia Web Generators: these products were initially conceived for the development of off-line hypermedia applications, and have been extended to support the generation of applications for the Web in recent times. Macromedia Director and Asymetrix Toolbook are representatives of this class.

—Web-Database Gateways: these products pioneered the integration of databases and the Web; most of them work by providing an extension of HTML which is used to define HTML page templates embedding database queries; page templates are then processed at run-time, typically by a CGI script or by a dedicated process, and transformed into ordinary HTML pages merging data from the database and user-defined HTML text. As a representative of this category, which includes such products as Vignette's StoryServer, Informix AppPages, JavaSoft Java Server Pages (JSP), and Microsoft's Active Server Pages (ASP), we will review in detail the Cold Fusion Web Database Construction Kit by Allaire Inc. [Forta et al. 1997].

—Web-Based Form Editors and Database Web Publishing Wizards: these products are the adaptation to the Web of preexisting environments for the development of form-based interfaces to databases; some products work by providing a form interpreter which can be executed by a Web browser; other tools provide a conversion from their native formats into a network language, notably HTML or Java. For example, the Oracle Developer 2000 tool set [Hoven 1997] enables Oracle users to run the same form-based and report applications in client-server mode or on the Web. As another example, in Microsoft Access '97 all database objects (tables, queries, forms, and reports) can be exported into HTML, either statically or dynamically, and a publishing wizard permits the personalization of the generated pages by applying predefined or custom presentation models to the exported database objects.
—Model-Driven Web Generators: a few products tackle the development process from conceptual modeling to implementation, and provide advanced application generation functions. As representatives of this category we review the award-winning Oracle Web Development Suite, which comprises Designer 2000, a CASE tool for generating Web applications from augmented Entity-Relationship diagrams [Gwyer 1996], and Hyperwave [Hyperwave Information Management 1998], a document management system with an underlying hypertextual model.

5.1.1 The Cold Fusion Web Database Construction Kit. Allaire's Cold Fusion was one of the first products to offer a commercial solution for producing HTML pages dynamically from database content. The core of Cold Fusion [Forta et al. 1997] is an extension of the HTML markup language, called Cold Fusion Markup Language (CFML). CFML includes a number of new tags, which can be used to specify how to extract database content to fill HTML pages. The most important CFML tag is the CFQUERY tag, which permits one to embed an arbitrary SQL query into an HTML page. The result of the query can be referenced in the normal HTML text by using CFML variables, i.e., special strings processed by the Cold Fusion interpreter at runtime: in particular, each column in the target list of a query defines an implicit variable, and the result of a multiple-row query can be processed iteratively by enclosing a piece of HTML text within the CFOUTPUT tag. Besides queries, CFML extends HTML with many other features like conditional and iterative control structures, functions, expressions, and modularization constructs, yielding a powerful scripting language.

The development process with Cold Fusion is an extension of the traditional design of database applications, which includes the conceptual and logical design of the database schema and the design of the queries serving the application; these queries, once designed and debugged, are embedded into a Web interface using CFML. Although not supported by a conceptual model, Cold Fusion includes limited application generation facilities called Application Wizards, which guide the user in selecting the tables involved in the application and in choosing a few parameters for customizing the interface (for example, the message to display after an update performed by a data entry application). The generator produces standard CFML templates, which can be either run as they are or manually adapted to enhance the quality of the interface.

Cold Fusion and, more generally, comparable server-side scripting engines are implementation tools, with a more restricted coverage of the development process than the Autoweb System. They offer low-level development abstractions, like database queries and constructs in the target mark-up language (typically, HTML augmented with scripting primitives), and thus lack a high-level conceptual model of the site under development. Indeed, they could be used in the implementation phase, to map an HDM-lite conceptual model into a set of server-side script templates, as illustrated in Section 6.2.10.
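The template-with-embedded-query idea behind Web-database gateways can be illustrated with a small sketch. It deliberately does not use CFML: the `<query>` and `<row>` tags, the `#column#` placeholder syntax, and the `render` function are hypothetical stand-ins, and Python's `sqlite3` module plays the role of the database. The sketch runs the embedded SQL query and repeats the row fragment once per result row, with each column of the target list acting as an implicit variable.

```python
import re
import sqlite3
from html import escape

# Hypothetical template syntax: a query block followed by the fragment to repeat.
QUERY_BLOCK = re.compile(r"<query>(.*?)</query>\s*<row>(.*?)</row>", re.S)


def render(template, connection):
    """Expand each query/row pair of the template into ordinary HTML."""

    def expand(match):
        sql, fragment = match.group(1).strip(), match.group(2)
        cursor = connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        rows_html = []
        for row in cursor:
            # Each selected column becomes a variable usable as #column#.
            values = {name: escape(str(value)) for name, value in zip(columns, row)}
            rows_html.append(
                re.sub(r"#(\w+)#", lambda m: values.get(m.group(1), m.group(0)), fragment)
            )
        return "".join(rows_html)

    return QUERY_BLOCK.sub(expand, template)
```

The sketch also makes the limitation discussed above visible: the template author works directly with SQL and markup, and nothing in the template records which entities, links, or navigation modes the page is meant to represent.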
5.1.2 The Oracle Web Development Tool Suite and Designer 2000. The Oracle Web Development Suite includes Designer 2000 [Gwyer 1996], Developer 2000 [Hoven 1997], and the Oracle Web Server. We focus on Designer 2000, whose Web Generator completes the Oracle platform with a CASE environment for deploying Web applications according to a model-driven approach.

Designer 2000 is an environment for business process and application modeling, integrated with software generators originally designed to target traditional client-server environments, namely Oracle Developer 2000 and Visual Basic. The Web Generator enables previous applications developed with Designer 2000 and deployed on LANs to be ported to the Web, as well as the delivery of novel applications directly on the Internet or on intranets. The Web Generator takes its inputs from the Designer 2000 design repository and delivers PL/SQL code, which runs within the Oracle Web Server to produce the desired HTML pages of the application. More specifically, three inputs drive the generation process:



—A Web-enhanced database design: database design diagrams, defined with the Data Diagrammer Tool, specify the structure of the database in terms of tables, views, foreign key relationships, and integrity constraints. These constitute the schema of the future Web application. A few visual features can be specified in the schema: for example, column definitions can be supplemented with caption text and display format (e.g., a pop-up list). Moreover, some integrity constraints (e.g., valid ranges) can be attached to columns and tables, and the Web Generator can be instructed to produce code for checking them on the server via PL/SQL or on the client via Javascript.

—The definition of applications and modules: modules correspond to basic application units and are defined by means of the Module Data Diagrammer; each module consists of a sequence of components, linked by foreign key relationships; a component includes a single table, or a central table plus a number of collateral look-up tables. Components are the atomic unit of access: they define the columns that an application can read and update. The order of components in a module determines the sequence of HTML pages that will be produced for that module. The overall structure of the Web application is established by drawing links between modules with the Module Structure Diagrammer: the designer may define which modules can be called by a given module and introduce fictitious modules acting as hierarchical indexes over other modules.

—The user preferences: user preferences are parameters that can be set to govern the visual aspect of the generated application; they can be defined globally, at the module level, or at the component level. Examples of preferences are colors, headers and footers, background images, and help text. The display of individual columns may be enhanced by specifying predefined formatting options (e.g., BOLD, ITALIC, IMAGE, MAILTO).
From these inputs, the Web Generator produces fixed-format Web pages; one set of correlated pages is generated for each module, and links between different modules specified with the Module Structure Diagrammer are turned into hyperlinks between the HTML startup pages of modules. There are seven types of pages which can be generated to render a module: startup page, query form, record list, view form, list/form, insert, and delete form. Each module begins with a startup page, which is displayed at module entry and normally contains the hyperlinks to other related modules, and the record list representing the output of the query corresponding to the first component of the module. A record list usually shows only a subset of the table columns: the full detail of each record is obtained by clicking on a hyperlink going out of its key, and can be displayed either on a separate "view form" page, or on the same page using a two-frame list/form layout. Within a module, a component (master) may be connected to another one (detail) via a foreign key: this relation is represented as a detail record list, either incorporated in the same page of the master record, or in a separate hyperlinked page.



Designer 2000 and Autoweb share the same model-driven approach to the generation of application pages and the high level of automation, whereby a full-fledged data-intensive Web site can be produced without programming. However, these tools greatly diverge on the conceptual models at the base of the generation process, and this clearly differentiates the focus and applicability of the two approaches. Designer 2000 is restricted to traditional database conceptual and logical models (the Entity-Relationship and relational models) and adopts a database-centric design process; its Web generator derives from standard code generation tools for client-server architectures, adapted to the Web. As a consequence, the result of the generation process is essentially a form-based application with added HTML customization. Conversely, Autoweb starts from a hypermedia-oriented conceptual model, where the structural perspective is completed by navigation and presentation schemas, which have no counterpart in Designer 2000. Design starts from a conceptual specification of the desired content, appearance, and supported navigation of the application, and database structures are derived afterward, to enable the implementation of the hyperbase. However, modeling concepts like entities, components, collections, part-of and links, navigation modes, and presentation styles are totally independent of the technology for implementing the hyperbase and could be mapped to any repository model and system.

5.1.3 Hyperwave. Hyperwave Server [Hyperwave Information Management 1998] is an advanced document management environment, which permits remote users to browse, annotate, and maintain documents distributed over the Web. Hyperwave has a very basic, yet powerful, high-level model of a Web application, which is considered as a set of document collections organized hierarchically. Collections may contain subcollections and documents, and have different navigation semantics based on their type.
Documents in a collection can be linked and annotated with a number of metadata. Links are specified at the conceptual level, outside the involved documents, and are managed by the Hyperwave server; in this way, any hypertextual structure can be superimposed over a set of otherwise independent documents. From the description of collections, links, and metadata, Hyperwave generates a Web interface, which enables both user-oriented and administrative functions, like collection and link browsing, searching and notification, remote document management, fine-grain version and access control, collaborative work, and personal annotations. The generated interface has a default presentation, which can be personalized using a proprietary template language. Although document-oriented, the Hyperwave server relies on database technology and on a multitier architecture to store metadata and manage links, which may also span multiple servers.

Hyperwave and Autoweb both start from a hypermedia conceptual model, and share the idea of separating the specification of the hypertext nodes (documents in Hyperwave, components in Autoweb) and links, and of automatically generating a Web interface from the high-level specification of the hypertext topology. The most important difference between Hyperwave and Autoweb is in the expressive power of the respective conceptual models: Hyperwave relies on a very basic hypertext model, which does not capture user-defined object-types, sophisticated navigation modes, and orthogonal, language-independent presentation specifications; Autoweb has a richer structural and navigation model, and permits the designer to express the presentation semantics of objects and collections at an abstract level.

5.2 Related Research Work

Besides the work in industry, several research fields show an increasing interest in advanced Web development tools and approaches. We start by reviewing two projects close to the objectives of Autoweb, and then discuss other contributions which have influenced our work.

5.2.1 Araneus. Araneus is a project developed by researchers of Università di Roma Tre, whose goal is to define an environment for managing unstructured and structured Web content in an integrated system, called Web Base Management System (WBMS) [Atzeni et al. 1998; Atzeni et al. 1997]. In a WBMS, database technology is used to store both data and metadata describing the hypertextual structure of Web sites. Araneus adopts a mix of database and hypermedia modeling concepts, embedded in a development lifecycle, which also intermixes database and hypermedia design tasks. The structure of the application domain is described by means of the Entity-Relationship Model; the navigation aspects are specified using the Navigation Conceptual Model (NCM), a notation inspired by RMM [Isakowitz et al. 1995], simplified in several operational details. As in HDM-lite, conceptual modeling is followed by logical design, using the relational model for the structural part, and the Araneus Data Model (ADM) for the navigation aspects.
ADM is based on the notion of page scheme, a language-independent page description notation based on such elements as attributes, lists, link anchors, and forms.

In Araneus, development proceeds according to a structured process organized along two tracks: database and hypertext. Database design and implementation are conducted in the customary way using the Entity-Relationship Model and mapping it into relational structures. After ER modeling, hypertext conceptual modeling formalizes navigation by turning the ER schema into an NCM schema; this shift requires the activities of macroentity design, union node design, directed relationship design, and aggregation design. The next step, hypertext logical design, maps the NCM schema into several page-schemes written in ADM, which requires macroentity mapping, directed relations mapping, aggregation mapping, and possibly page-scheme restructuring. Finally, implementation requires writing page-schemes in the Penelope language [Atzeni et al. 1997], which specifies how physical pages are constructed from logical page schemes and content stored in a database, in a way similar to commercial HTML-SQL integrators.

Araneus also defines a tool environment constituting the Web Base Management System (WBMS) [Atzeni et al. 1998]. Database technology is used to store both data and ADM page-schemes describing the hypertextual structure of Web sites. The WBMS offers facilities for wrapping existing HTML sites into ADM structures (the Editor and Minerva tools), and for issuing queries and defining views over wrapped sites using the Ulixes language. The creation of new sites from database content and ADM schemes is supported by the Penelope module, which offers a language for defining page schemes, matching them to database tables, and updating the content of a site.

The Araneus conceptual model and design process are comparable to those of Autoweb, with some differences: Araneus has a separate page composition notation (ADM), whereas Autoweb infers page structure from the specification of components and links. This makes Araneus more flexible in the specification of multiple hypertextual views over the same data. On the other hand, Araneus requires a proprietary HTML-dependent template language for specifying presentation, whereas Autoweb separates presentation specification and implementation, by leveraging abstract style sheets, which can be automatically mapped into a specific markup language.

5.2.2 Strudel. Strudel is a project of AT&T Labs [Fernandez et al. 1998], which aims at experimenting with a novel way of developing Web sites based on the declarative specification of the site's structure and content. In Strudel both the schema and the content of a site are described by means of queries over a data model for semistructured information. Content is represented using the Uniform Graph Model, a graph-based data model capable of describing objects with partial or missing schema.
As a starting point of the construction of a site, external data sources, e.g., HTML files or relational databases, are translated by means of wrappers into the Strudel internal format. In this way, it is possible either to restructure an existing HTML site, or to Web-enable a legacy data repository. The design of a Web site is done in a declarative way, by writing one or more queries over the internal representation of data, using the Strudel query language (StruQL). Such queries identify the data to be included in the site, and the links and collections of objects to be provided for navigation. Presentation is added as a separate dimension by means of HTML templates; these mix HTML presentation tags and special-purpose tags, which are bound at HTML generation time to the objects resulting from the site definition queries. The templates determine the rendering of the site definition queries in HTML. The core contribution of Strudel is the idea of defining a Web site as a set of queries over semistructured data, which enables the declarative specification of sites. However, unlike Autoweb, Strudel specifications
intermix content definition, navigation, and presentation: navigable links and index collections are specified inside the queries that define the content of the site, and the final structure of the HTML pages that compose the site and the hyperlinks between such pages are dictated by the HTML templates created for managing the presentation.

5.2.3 Hypermedia Design Models. Autoweb draws upon a number of design models and processes proposed in the past for offline and online hypermedia applications, namely [Balasubramanian et al. 1995; Garzotto et al. 1993; Halasz and Schwarz 1994; Isakowitz et al. 1995; Nanard and Nanard 1995; Schwabe et al. 1992; Schwabe and Rossi 1995]. The first hypermedia model to gain acceptance was the Dexter Model [Hardman et al. 1994], which considered a hypermedia application at three levels: the run-time, storage, and within-component levels. The storage level describes the general structure of the hypermedia application, which consists of a set of nodes (either atomic or composite) connected by links. An important aspect of the Dexter Model is the recognition that presentation, although implemented at the run-time level, is a property of components and as such must be specified at the storage level.

Many subsequent proposals in the hypermedia field started from the Dexter Model and added more sophisticated modeling primitives, formal semantics, and structured development processes. MacWeb [Nanard and Nanard 1991] adds an object-oriented flavor and distinguishes between object instances (chunks) and object types (chunktypes). Object-oriented methods are used to describe navigation properties both at the type and at the instance level. HDM [Garzotto et al. 1993] introduces the distinction between base and structural navigation and stresses a notion of model-based design clearly separating the activities of authoring in-the-large (i.e., schema design) and authoring in-the-small (i.e., content production). RMM [Isakowitz et al. 1995] proposes a modeling language built upon the Entity-Relationship model and goes further in the definition of the development process, by proposing a seven-step approach to hypermedia design in the tradition of software engineering. RMM also gives guidelines for typical hypermedia design tasks.

More recent methodologies tailor hypermedia design to the development of data-intensive Web sites. Takahashi and Liang [1997] propose a method to develop Web-Based Information Systems (WBIS), which extends RMM to the modeling of dynamic aspects through a scenario-based approach. Scenario analysis gives input to hypermedia design and permits developers to differentiate entities by their role (e.g., agents, events, and products).

A different approach to modeling hypermedia applications is taken by the HyTime standard [Derose and Durand 1994], a hypermedia specification language based on SGML. HyTime relies on the notion of architectural forms, which are metadeclarations of hypermedia document features compliant with SGML syntax and semantics. Actual hypermedia applications are described by instantiating HyTime architectural forms to obtain actual
Document Type Declarations (DTDs). HyTime standardizes all the most important features of hypermedia applications: structure, in-line (contextual) and out-of-line (independent) links, and anchors (i.e., the targets of links). An important part of the standard is devoted to the specification of presentation-related aspects, described by the notion of schedule, which may be used to organize presentation both in space and time.

In Autoweb, we have adopted a consolidated hypermedia design method (HDM) as a starting point, and we have simplified it to make it more easily implementable in a CASE system. In particular, advanced constructs like multipolar links and fine-grain component structuring primitives, not strictly necessary in the Web context, have been dropped. Conversely, to enable automatic application generation, HDM has been extended with a built-in navigation semantics, expressed by navigation modes and collection visibility rules, and with an original presentation model, inspired by the notion of bidimensional schedule of HyTime [Derose and Durand 1994]. In Section 6.2.2 we report on the feedback on the modeling power and usability of HDM-lite gathered during several experiences with developers.

5.2.4 Hypermedia Design Tools. Automated support to the hypermedia design process has also been investigated, and experiences are reported by several authors, e.g., Andrews et al. [1995], Diaz et al. [1995], Kesseler [1995], Schwabe and Rossi [1995]. The HyperG project [Andrews et al. 1995] includes both a model and a set of tools for the automatic generation of Hypertext and WWW applications. The main conceptual difference between HyperG and Autoweb is in the respective models: HyperG has a very basic model, close to the Dexter Model, based on a node-and-link paradigm; the descriptive power of the model, and consequently the capabilities of the tools, is limited in comparison to that offered by HDM-lite and the Autoweb System.

The RMC tool [Diaz et al. 1995], built around the RMM model [Isakowitz et al. 1995], has objectives similar to Autoweb, with some important conceptual and architectural differences: the RMM model does not take into account collection design and abstract presentation specification, and the architecture of RMC is not based on DBMS technology, but on an internal repository, from which the HTML pages are generated in a precompiled fashion. Thus, it is unclear if RMC could be used to implement sizeable applications with frequently updated data.

HSDL [Kesseler 1995] is another tool for supporting the design of large hypermedia applications, based on an HDM-like design notation. The tool has advanced features for tailoring the navigation and presentation semantics of the hypertext. Unlike Autoweb, HSDL does not rely on DBMS technology to store the schema and application data, and produces the final HTML pages by compiling programming language instructions (called expanders) attached to the schema objects and possibly also to instances. Expander programming seems to be a nontrivial task, although it permits very sophisticated control over the generation of HTML pages.
OOHDM [Schwabe and Rossi 1995] advocates the use of object-orientation to model advanced navigation and interface features of hypermedia applications; classical object-oriented concepts and notations are applied in the design process, and the issues of implementation in hypermedia authoring environments and in HTML are discussed.

WebArchitect and PilotBoat are the tools supporting the design of WBIS according to the approach of Takahashi and Liang [1997]. In WebArchitect a Web application is constructed by superimposing metalevel links (i.e., links defined outside the application objects) on top of Web resources, and by manually implementing the structure and navigation identified during hypermedia modeling. PilotBoat is a client-side dedicated browser, needed to navigate along metalevel links. This architecture also requires an extension of the HTTP server to support methods for linking and unlinking resources.

With respect to the mentioned research prototypes, the Autoweb System exhibits a higher level of automation and life-cycle coverage: in Autoweb, a running site is developed from high-level specifications without any manual programming. The Autoweb System also pioneers automatic site generation from abstract, language-independent presentation specifications, whereas other comparable tools are based on imperative programming (HSDL, OOHDM) or language-dependent page templates (RMC, HyperG, Strudel, Araneus). The Autoweb System has demonstrated that it is possible to specify the presentation of pages in an abstract way, and automatically produce code in a language of choice, retaining a satisfactory graphical quality of the result.

5.2.5 Research on Navigation Specification, Semantics, and Implementation. Navigation has been the subject of much research in the hypermedia and human-computer interaction communities, aimed at studying better interfaces to electronic hypertext systems and hypermedia applications [Trigg 1988].
In the HyTime hypermedia standard, navigation is addressed by focusing on location, i.e., on flexible constructs for defining the anchors of links at various levels of granularity. HyTime does not provide an operational semantics to its navigation constructs, but leaves it to applications. Research on navigation semantics has used a variety of formal methods, e.g., Statecharts [Zheng and Pong 1992], Petri Nets [Stotts and Furuta 1989], and logics [Garg 1988]. Many authors have used the object-oriented model to express custom forms of navigation, e.g., as methods of the “link” object type [Nanard and Nanard 1991; Schwabe and Rossi 1995]. Recently, the interest in the WWW and the problems raised by the restrictions of HTML as a language for implementing hypermedia have spawned several attempts to enhance the navigation capabilities of current browsing tools to support more sophisticated forms of navigation, like hierarchical guided tours defined orthogonally to information nodes [Hauch 1996], multilevel indexes with “memory” [Jones 1996], and graphic navigation history trees [Ayers and Stasko 1995].
In Autoweb, the focus on automatic implementation has led to the adoption of built-in navigation modes, ruling out the possibility of “programming” ad hoc navigation primitives. Presently, only the basic forms of navigation described in Figure 4 have been implemented, but more sophisticated modes (e.g., hierarchical guided tours and multilevel indexes) could be easily incorporated, by including these options in HDM-lite and extending the Page Generator to produce the code needed to manage them.

6. EXPERIENCE AND EVALUATION

In this section we describe a sample of “real-life” projects where either the design process, or HDM-lite, or the Autoweb System, have been used, and elaborate on the lessons learned from such experiences.

6.1 Experiences

Autoweb was originally conceived to support the CorsiOnline (On-line Courses) project of Politecnico di Milano, a two-year project aimed at designing and implementing a uniform environment for putting on the Internet the courseware of the more than 900 classes of the Politecnico di Milano, which serve more than 45,000 students. The project is now in its second year and is presently in the production phase. In the early stage of the project, Autoweb has permitted developers to deliver very rapidly to the involved teachers a number of prototypes illustrating alternative choices on all the aspects of an on-line course, like the structure of the teaching materials, the on-line support to students, and so on. Implementation has been done with the Autoweb System, and parallel experiments have been conducted with different approaches and tools, including manual HTML authoring and computer-assisted development using the Toolbook II Web Generator, Oracle Designer 2000, and the beta version of Oracle Learning Architecture, a vertical product for developing Internet courseware [Oracle 1997]. The site (http://corsionline.como.polimi.it) is now open, and students can either browse it online or obtain a materialized version of the courses they are interested in on a CD-ROM.

TECHDOC is an industrial application developed by Officine Meccaniche Riva Srl, an Italian company that is a leader in the market of advanced machines for the textile industry. TECHDOC is a large hypertext conceived to replace the paper documentation of machines. Technical information (bills of materials, part descriptions, technical drawings, and so on) stored in the company database is formatted by the Autoweb System into a hypermedia application which is delivered to customers both online in the company’s Web site and offline as a set of CD-ROMs. Autoweb has been used in the prototyping stage to help the technical staff in assessing the navigation facilities and the presentation styles to include in the application, and is now supporting the production and maintenance of hypermedia technical manuals.

RAI-EMSF is an initiative of the Italian public TV Network (RAI). The project is based on 400 hours of interviews with some of the most famous
modern philosophers and scientists, concerning philosophical and scientific issues. This material has been used to produce TV shows, VHS-cassettes, radio broadcasts, etc. Two multimedia CD-ROMs, targeted to the high school market, are being prepared. Each week a specific subject is chosen, and several events are organized around it: two TV shows, a radio talk show (with an “expert” answering telephone calls), a national newspaper article, and more. Examples of subjects are the foundation of the moral code, Eugenics, or Ethics within Politics. The EMSF Web site (http://www.emsf.rai.it) contains a skeleton of the interviews, background information about philosophers, scientists, and major works, a preview of events to come, a real-time section with current events, a synthesis, and a revised summary of past events. The site’s structure and navigation have been designed and prototyped using HDM-lite, and the database schema has been obtained using the VHDM Relational Schema Generator. Due to the very specific requirements on the graphic interface, on content management, and on interoperability with internal technology standards, the actual implementation is not based upon the Autoweb Runtime Environment, but the design resulting from prototyping has been kept and the implementation retains the flavor of the Autoweb architecture.

DSU (Diritto allo Studio Universitario, Support for University Study) is an information point of the Regional Office of Lombardy, aimed at providing a broad overview of the educational resources (Universities, Specialization Courses, Schools, Services) located in Lombardy. The application, designed and implemented with the Autoweb System, contains descriptive information organized so as to facilitate access to nontechnical people and to permit the rapid location of the desired services.

Mediateca is a medium-sized Web site, developed within the DISCETECH project.
This project explores the practical implications of using multimedia products (CD-ROMs and Web sites) for education in the Italian School System. Mediateca will host the description of around 400 CD-ROMs and 100 Web sites, selected by the DISCETECH tutors. Teachers and students will use Mediateca to access information on CDs and Web sites, especially the evaluation records provided by tutors or by previous participants in the project.

6.2 Evaluation

One of the original contributions of the Autoweb project is the evaluation of several experiences of specifying and developing Web sites with the model-driven approach, side by side with developers trained in different Web development technologies. In summary, the use of Autoweb for data-intensive Web sites has proved beneficial in many respects, but has also shown various limitations of the current approach and prompted several changes and extensions.

6.2.1 The Top-Down Approach. A first evaluation goal was to establish the technical feasibility and user acceptance of the top-down development
process, and contrast it with the commonly advocated bottom-up approach, where data design comes first and drives the site design process.

—Technical feasibility: in Autoweb, hypertext design is the prominent activity, followed by the automatic generation of the supporting database. This approach proved technically sound even in the presence of already existing data sources, because mapping the database automatically generated by Autoweb to external data sources turned out to be a technical issue well addressed by standard technologies (e.g., commercial database replication tools).

—Acceptance of the top-down process: we found that those who had to decide about the information to publish (marketing or communication professionals and content providers) neither had the skills nor cared about understanding the data structures necessary to support the Web site. Conversely, they were very much concerned about site content, composition, and navigation, leaving data-binding problems to be solved by database professionals. In most cases, the top-down approach proved crucial to the decision-making process, because it shortened the time for getting a running evaluable prototype, which was used to validate requirements, even if the prototype was not connected to the real data.

6.2.2 Modeling Power. The modeling concepts of HDM-lite have been selected with the objective of building a design notation with minimal complexity, usable also by nontechnical people. As a consequence, only the three most necessary modeling perspectives are offered (structure, navigation, and presentation), and for each perspective only the most intuitive primitives are retained. In retrospect, this choice proved successful, because the set of HDM-lite modeling primitives proved sufficient to cover most requirements. However, the following additions and revisions were suggested, which we consider requirements for our future work:

(1) Object versioning. The structure model should explicitly include a notion of entity or component version, useful in many situations, e.g., to have a short and a long version of the same entity, or a different version for different languages. Different versions may have different, but possibly overlapping, sets of slots.

(2) Navigation chains. The index-based navigation modes listed in Figure 4 should be generalized to provide multistep indexes, whereby the user may select the desired object by clicking on a number of progressively more restrictive index pages (e.g., first the index of countries, then districts, finally cities).

(3) Embedded navigation modes. An additional dimension should be orthogonally added to the navigation modes of Figure 4, by means of which the pages showing the index and filter could be “embedded” into the page showing the source or the destination of a traversal. In this way, it could be possible to keep together in the same page an object and multiple search forms or indexes over related sets of objects.
(4) Generalized filters. Filter conditions should allow type-dependent predicates (e.g., date comparison) and disjunction. With disjunction, it could be possible to use a single input field for a string-based search into all the textual slots of a component, as customary in many commercial Web sites.

6.2.3 Treatment of Data Redundancy. Data redundancy is commonplace in Web sites, where the same piece of information may be repeated in several pages for better readability or increased emphasis. Autoweb permits the designer to explicitly mark slots as redundant (“derived,” in the Autoweb terminology) to avoid duplication when producing the supporting database. Presently, only the following special, but frequently encountered, cases are allowed:

—a child component may include a slot of the parent component; the included slot is not duplicated in the database table representing the child;

—any entity or component may include a slot of another entity or component connected to it by a 1:1 link; the included slot is not duplicated in the table representing the referencing object.

This feature accommodates a great number of application requirements. However, a more general mechanism for covering data redundancy has been requested by users and is part of our ongoing work. The basic idea is to introduce “derived data” (slots, components, links, and collections) as first-class citizens, provide a simplified OQL-like language to express how a redundant datum must be computed, and automatically translate this specification into an SQL view to install on top of the supporting database. In several cases, we simulated this “derivation wizard” by manually writing the needed SQL views, and the result was data redundancy without duplication.

6.2.4 Independent Page Definition. In Autoweb (as in Oracle Designer 2000) there is no separation between the structure of data (described in the HDM-lite structure schema) and the structure of Web pages, which is automatically produced from the HDM-lite conceptual schema by applying built-in page composition rules. For example, each component is mapped into a single page, with outgoing hyperlinks corresponding to the enabled traversals. While the designer can freely fine-tune the navigation modes and add access collections, he cannot redefine the content of pages showing component information. Other systems, e.g., Araneus and Strudel, offer distinct notations and languages to map data structures to page structures, so that the same entities can be mapped to different hypertext nodes. The reason for Autoweb’s restriction is twofold: minimizing the notations that designers need to master, and providing a totally automatic mapping of the conceptual schema into a set of Web pages.
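The built-in composition rule mentioned above (one page per component, one hyperlink per enabled traversal) can be illustrated with a small Python sketch. This is not the Autoweb Page Generator: the classes `Component` and `Traversal`, the function `render_page`, and the HTML layout are hypothetical and only show how a page can be derived mechanically from a structure description, with no separate page notation.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Traversal:
    """Hypothetical outgoing link of a component."""
    label: str
    target: str          # name of the target component
    enabled: bool = True


@dataclass
class Component:
    """Hypothetical component: named slots plus outgoing traversals."""
    name: str
    slots: dict
    traversals: list = field(default_factory=list)


def render_page(component):
    """Apply the fixed rule: one page per component, one anchor per enabled traversal."""
    rows = "".join(
        f"<dt>{escape(slot)}</dt><dd>{escape(str(value))}</dd>"
        for slot, value in component.slots.items()
    )
    links = "".join(
        f'<li><a href="{escape(t.target)}.html">{escape(t.label)}</a></li>'
        for t in component.traversals
        if t.enabled
    )
    return (
        f"<html><body><h1>{escape(component.name)}</h1>"
        f"<dl>{rows}</dl><ul>{links}</ul></body></html>"
    )


course = Component(
    "Course",
    {"title": "Databases"},
    [Traversal("Syllabus", "Syllabus"), Traversal("Exams", "Exams", enabled=False)],
)
print(render_page(course))
```

Because the rule is fixed, the designer in this sketch can only switch traversals on or off; changing which slots appear on which page would require the separate page notation that Araneus and Strudel provide.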


Another motivation for omitting a separate page composition notation was the reduction of prototyping times; once the structure schema is in place, a default Autoweb site is ready to run. Even for large Web applications with sophisticated navigation requirements the time to deploy the first prototype and its underlying database is still a matter of hours. Nonetheless, direct mapping of structure to pages introduces rigidity and reduces the reusability of the structure schema: if application A uses the same conceptual schema as application B but needs to map components to Web pages in a different way, it must redefine the structure schema. The solution we are currently looking at is to provide a first-class notion of “page,” distinct from the structural notion of component (as in Araneus), coupled to a default algorithm for inferring pages, traversals, and navigation modes from entities, components, and links (as in Autoweb). Pages and traversals could be packaged into “site views” (as in Strudel), to specify different hypertexts over the same data. With this solution, designers would retain the ability to quickly produce the default site view, but would also be free to clone and modify it to obtain alternative site views.

6.2.5 Presentation. A controversial choice in the design of HDM-lite has been the decision to provide a high-level model of presentation, which is automatically translated into physical pages, instead of letting designers use some extended version of HTML to define page templates, as in many commercial products, and in Araneus and Strudel. The original motivation of this decision was the desire to enforce presentation consistency and coherence between the presentation schema and its sibling schemas for structure and navigation.
As an example, if the designer has defined a structure schema in which component A has two subcomponents B and C and two links to separate objects D and E, the presentation schema will differentiate the anchors to B and C from those to D and E, by enclosing the former in the part-of panel and the latter in the link panel. Similarly, many other related visual elements (visible collections, slots, utility buttons) are defined together and thus cannot be arbitrarily scattered within the page. As our experience has revealed, this is more a benefit than a constraint: even in the most sophisticated sites (e.g., RAI-EMSF) graphical sophistication is prevalent in the site’s access pages, typically the home page and the top-level indexes. Within the site, where the actual objects are presented, pages tend to have a regular structure, to simplify navigation and reduce users’ disorientation. A typical solution is to manually implement high-level pages through graphical maps and let Autoweb generate the inner pages.

Another fundamental advantage is that the high-level specification of presentation lends itself to automatic mapping to different network languages or presentation devices, whereas page template technology is bound to the language in which the template is written. This benefit, not clear in the early stage of Autoweb development, is becoming prominent in our ongoing work, due to the increasing commercial interest in novel mark-up languages for wireless applications (e.g., Wireless Markup Language
(WML) [Wap Forum 1999] for WAP-enabled mobile phones). Multidevice development challenges current tools and offers a great opportunity for CASE tools capable of alleviating the effort of maintaining Web sites in multiple network languages.

6.2.6 Content Management. In most of the described Web applications, the Autoweb DataEntry Generator is used only in the prototyping phase and is completely insufficient to cope with the requirements of a true content-management application. Autoweb, like Araneus and Strudel, focuses on readers’ requirements and lacks a model to describe content production. As a result, in most cases two different applications are designed: one to serve formatted content to the reader, and one to let content producers update the hyperbase. This choice is probably the most correct one in many application contexts, where readers and producers are different groups and there is no need of integrating publishing and data entry. However, integrated data entry support to readers is often required, although limited to a few well-defined objects, e.g., shopping carts in the electronic commerce field. Presently Autoweb serves this need by offering the notion of utility button, by which it is possible to incorporate into the application interface anchors leading to custom applications for updating the hyperbase objects. However, this primitive does not support complex workflows, like those needed, e.g., for the definition and validation of the technical specifications managed by TECHDOC.

6.2.7 User Management and Personalization. In large projects aimed at the general public, like CorsiOnline, the need arises of considering different categories of potential users, either to customize content, navigation, and presentation according to the interests of a specific group, or to control access to the data entry interface.
Unlike a few research proposals (such as Perkowitz and Etzioni [1997]) and some commercial systems (such as BroadVision One-to-One (http://www.broadvision.com)), presently Autoweb does not address user modeling and site personalization in a systematic way. Personalization can be simulated by defining different navigation schemas and presentation styles over the same structure schema and addressing different users to different versions of the application, as done in the ACME example. However, for scalability, user modeling and personalization rules should be integrated into HDM-lite, for example by introducing a notion of perspective to represent the view of the application designed for a specific category of users. Perspectives could be implemented automatically by the Page Generator, thanks to Autoweb’s reflexive approach in which metadata about the site are stored and can be queried at run-time to drive the production of pages.

6.2.8 Impact on Process and Application Quality. Autoweb, and more generally a model-driven approach to Web development, affects the quality of both the development process and the resulting Web application:
—Quality of the application: designing WWW applications using a well-founded and descriptive formalism surely improves the navigation model, with respect to less organized approaches. Improvements come in terms of increased navigation power, increased usability, and consistency across the application pages.

—Better requirements: the availability of a structural and behavioral schema that can be discussed before implementation, and of fast prototypes strictly adherent to the schema, improves the understanding of the application’s owner (often a nontechnical person). This has been crucial for large projects, such as RAI-EMSF, where the complexity of the information to publish made it very difficult to proceed with informal methods.

—Cleaner separation of structure, navigation, and presentation: HDM-lite allows the designer to tackle these aspects separately, and then to merge them together. In a typical setting, earlier prototypes are developed using a default style not interfering with the evaluation of the structure and navigation features, and the final look-and-feel is added only at the last rounds of prototyping. This procedure has been particularly effective in projects such as CorsiOnline or RAI-EMSF, where sophisticated presentation solutions were needed, and several nontechnical people were involved in the design process.

—Reduced development effort: this advantage comes from a number of factors: (1) a well-organized development cycle; (2) the stress on prototyping, which causes fewer revisions or major changes; (3) precise guidelines for manual implementation, as for RAI-EMSF; (4) very fast automatic implementation with the Autoweb System, as for the CorsiOnline site. Cost-effectiveness has been experienced in all projects, with the last factor being the most substantial in all applications except RAI-EMSF.
—Improved content management: basing a WWW application on a repository of content, links, and access collections fosters a better management of information; this is especially important when content and relationships are highly volatile, as in CorsiOnline and RAI-EMSF. —Improved maintenance and evolution: cost and time reduction shows also during evolution, because changes in the conceptual model can be automatically or semiautomatically propagated to implementation. This benefit has often been the most important one, because even model-driven design not always fully reflects requirements, and, more frequently, requirements change after application delivery. 6.2.9 Alternative Development Approaches. Alternative design and implementation strategies were experimented in some of the reported projects: —Manual authoring: in all the above experiences this approach was ruled out by the need of storing large amount of information in an efficient, reliable, and accessible manner; therefore, we were always forced to couple WWW and database technology. ACM Transactions on Information Systems, Vol. 28, No. 4, October 2000.



—General-purpose generators integrating Web and database features: the structural and navigation abstractions they offer proved too simple (typically, they include table-based and record-based presentation of domain objects, and fixed-format navigation within predefined aggregate constructs, like master-detail structures), and the control over the presentation too limited (typically, customization is achieved by setting a few presentation parameters and specifying user-defined HTML headers and footers, which are appended to standard form-like pages).
—Web-database gateways: tools of this kind, being navigation- and presentation-neutral, offer an industrial-strength alternative to the use of the Autoweb Page Generator and DataEntry Generator as implementation devices. However, their effectiveness is greatly enhanced if they are used in conjunction with a design model like HDM-lite.
—Vertical WWW applications, i.e., software skeletons designed ad hoc to incorporate the structural, navigation, and presentation semantics most suitable to the specific domain (the Oracle Learning Architecture tool is an example in the Computer-Based Training field). Clearly, skeleton instantiation greatly reduces the effort required for deploying applications, but this advantage is counterbalanced by the rigidity of the skeleton definition and by the difficulty of transporting solutions from one domain to another. As a matter of fact, we did not find any template suitable for our projects.

6.2.10 Architecture. Although Autoweb is a research prototype not comparable to commercial page generation tools, using it in the development of applications allowed us to evaluate its architecture and to determine the most important directions for its revision. None of the mentioned applications in which Autoweb is used as the actual page generation technology has severe space or time constraints.
The most frequently accessed Autoweb site (CorsiOnline) contains over 7,500 pages, with typical page sizes ranging from 5 to 50 kilobytes. Access rates average 5,000 hits per month, with a few hits per second at peak times (e.g., during online lessons). The construction of a page requires from one query (in the simple case of a single index without noncontextual navigation) to 20 queries (for the main page of a course, which may have up to a dozen contextual links to subcomponents showing different kinds of teaching materials). Observations made while testing the CorsiOnline application showed that the bottleneck of the Page Generator architecture is the client-server communication between the Dispatcher and the Server Process, because the Server is single-threaded and requests are enqueued and served sequentially. To overcome this limitation, two alternative approaches were available: (1) improving performance and scalability by introducing parallelism in the Server; (2) reimplementing the Autoweb page generation functions on top of existing multithreaded commercial Web-database gateways, according to the “late data fill” approach sketched in Section 3.2.1.
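The effect of sequential service can be illustrated with a small queueing model. The following Python sketch is not part of the Autoweb System: it assumes that all requests arrive at the same instant and that every query costs a fixed, hypothetical 10 ms, and the function name `completion_times` and its parameters are invented for illustration. Only the range of one to 20 queries per page comes from the figures reported above.

```python
import heapq


def completion_times(query_counts, workers=1, query_cost_ms=10):
    """Return the completion time (ms) of each page request.

    Requests are served in FIFO order and are all assumed to arrive at t=0.
    query_counts[i] is the number of database queries needed by request i.
    workers=1 models a single-threaded server with a sequential queue.
    query_cost_ms is an assumed, fixed cost per query (hypothetical value).
    """
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)

    finished = []
    for n_queries in query_counts:
        start = heapq.heappop(free_at)            # earliest available worker
        finish = start + n_queries * query_cost_ms
        finished.append(finish)
        heapq.heappush(free_at, finish)
    return finished


# A simple index page (1 query), a course main page (20 queries),
# and another index page (1 query), all requested together.
requests = [1, 20, 1]
print(completion_times(requests, workers=1))  # [10, 210, 220]
print(completion_times(requests, workers=2))  # [10, 200, 20]
```

Under these assumptions, a light index page queued behind a 20-query course page must wait for it to finish when there is a single worker, whereas a second worker lets it complete first. This is the kind of gain that introducing parallelism in the Server, or delegating to a multithreaded gateway, aims at.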



User feedback clearly pointed us toward reimplementing the Autoweb Runtime Environment using standard page generation technologies (specifically, Microsoft’s ASP and JavaSoft’s JSP were requested). This reengineering is based on the logical-to-physical mapping illustrated in Section 3.2, with a different distribution of activities between compile-time and run-time. At compile-time, high-level specifications are automatically translated into ASP or JSP templates, which map the layout dictated by language-independent stylesheets into code in a specific mark-up language (e.g., HTML), but include no data. Such templates are then installed in the server-side script engine and are automatically filled with data at run-time, based on the parameters of the user’s request. This solution has several advantages over the “proprietary server” approach of Autoweb 2.0:
—It preserves the benefits of both conceptual modeling and automatic implementation. A site is still produced from high-level specifications without any programming.
—It preserves multilanguage output. The same conceptual schema may be used to automatically produce templates in different mark-up languages (e.g., HTML and WML).
—It assigns more work to compile-time tools, relieving the run-time page generator from the translation of abstract presentation into mark-up code.
—It delegates run-time performance optimization and scalability issues to proven commercial technologies, which are reliable and accepted by industrial users. Typically, commercial Web-database gateways supporting ASP and JSP leverage multithreaded or multiprocess architectures, with sophisticated load-balancing capabilities, which ensure very good scalability.

7. CONCLUSIONS AND FUTURE WORK

In this paper we have introduced and discussed the notion of model-driven development of Web applications, which aims at unifying into a coherent framework, tailored to Web development, concepts and techniques coming from Information System and Hypermedia development. The proposed approach has been substantiated by the implementation of the Autoweb System, a software environment where all the aspects of model-driven Web development are supported. The most important contributions of the paper are:
—HDM-lite, a design model comprising concepts for specifying the structural, navigation, and presentation semantics of Web applications. HDM-lite integrates database and hypermedia modeling concepts with novel presentation abstractions tailored to the Web context.
—A database architecture where not only application content is managed by a DBMS, but also metainformation about structure, navigation, and presentation is persistently stored in relational format, and amenable to evolution and maintenance.
—Two mapping techniques for transforming conceptual schemas into relational structures, and database content and metadata into physical Web pages.
—The Autoweb System, an original CASE environment supporting the most important activities of model-driven hypermedia design. Rapid prototyping, code generation, and WYSIWYG presentation design based on the same modeling concepts used to represent structure and navigation enable fast and low-cost generate-and-test loops, which contribute to enhancing the quality of the resulting application.
—A process for both the development of new applications and the reverse engineering of existing ones.
The proposed approach has proved valid in a number of different application domains. The Autoweb approach compares well to existing solutions for the development of large-scale, data-intensive applications, where manual authoring is not feasible:
—Database gateways, Web-enabled form interpreters and generators, and even database-enabled application generators fail to capture the semantic richness of Web-based interfaces, where navigation and presentation have an equally important, if not prominent, role with respect to data structure.
—New-generation hypermedia authoring tools offering Web-export functions have a weak architecture, insufficient for storing and updating large data volumes, and lack a conceptual model enabling the automatic generation of repetitive code.
Autoweb contributes original results also with respect to other research projects with comparable objectives.
—Autoweb has a higher level of automated support, because applications are developed without any HTML, SQL, or presentation template coding.
The resulting site has a graphical and navigation quality comparable to commercial Web sites; after delivery, the original design can be updated in all its major dimensions by means of high-level tools, and the changes can be propagated automatically to the running site, without any programming effort. This result required a long fine-tuning of the conceptual model (in particular, regarding navigation and presentation) in order to achieve a good compromise between usability, expressive power, and implementability of the necessary code generators.
—Autoweb pioneers the notion of abstract, language-independent presentation specification, whereas other research and commercial approaches are based on templates tied to specific mark-up and server-side scripting languages. Language- and device-independent site design will become crucial in the near future, when PDAs and Web-enabled mobile phones become a popular way to browse the Web. Autoweb demonstrates that it is possible to specify the presentation of pages in an abstract way, and to produce Web sites in the language of choice without compromising the graphical quality of the result.
—Finally, a major contribution of the Autoweb experience comes from the lessons learned in field-testing the usability of the methodology and of the tools with real users who were totally unaware of conceptual models and model-driven Web design.
From the reported experiences, we have learned that model-driven development is effective from both the technical and the methodological standpoint, although it must be coupled with commercial page generation technologies for better user acceptance, performance, and scalability, and sometimes it must be complemented by manual intervention in the design of user interfaces, to treat special requirements occurring in high-quality Web applications.

7.1 Ongoing and Future Work

Ongoing and future work concentrates on the directions illustrated in Section 6.2:¹
—Advanced modeling primitives: we are currently working on a novel modeling language taking into account the user feedback reported in Sections 6.2.2, 6.2.3, and 6.2.4. In particular, we are concentrating on user-defined, composable navigation primitives, data derivation, and separation of structure and page definition. Preliminary results of this activity are visible at the URL http://webml.org.
—User management and personalization: we are incorporating in the structure model an explicit notion of user, to exploit derivation queries and business rules for automatically adapting navigation and presentation to the needs of specific users or user groups.
We envision two forms of personalization: (1) declarative personalization, in which derivation queries may be used to define content based on user profile data (e.g., a special price discount on the user’s birthday); (2) procedural personalization, in which active rules [Ceri and Widom 1996] may be used to update profile data or monitor user behavior (e.g., to classify users into groups based on browsing habits or past purchases). Initial results of this activity can be found in Ceri et al. [1999].
—Web-design patterns and application skeletons: an important use of model-driven design is the identification of recurrent Web design patterns and the construction of application skeletons in different domains, like electronic commerce, digital libraries, virtual galleries, technical documentation, and so on. We are presently investigating the best mix of structure and navigation in selected domains (namely, online catalogs and fashion sites) and designing conceptual-level application skeletons, to obtain partially implemented applications, which could be quickly personalized and deployed, overcoming the rigidity of current vertical application frameworks for the WWW. Results are reported in Garzotto et al. [1999].
—Architectural revision: we are pursuing, in two distinct application projects (a large e-commerce Web site and an online catalog of Italian fashion designers), the reengineering of the Autoweb page generation engine, to produce page templates for Microsoft’s ASP and JavaSoft’s JSP server-side scripting languages.
—Integration with XML: after acceptance by the WWW Consortium, XML [W3C 1998b] is becoming a standard format for Web data. We are considering XML in conjunction with model-driven site development, from three complementary perspectives: (1) as an additional content storage format, besides relational technology; (2) as another publishing language for multioutput site delivery; (3) as a convenient syntax for encoding the conceptual model and facilitating the implementation of translators, using XSL technology [W3C 1998a].
—Multilanguage support: we are implementing a code generator for WML, for the automatic production of applications accessible via PCs and cellular phones. As the next step, we plan to implement a code generator for output in XML and XSL.
We believe that in the near future the model-driven design approach, coupled with code generation from high-level specifications, will be the key factor for mastering the exponential increase in complexity introduced by the need to design and maintain over time coherent applications on multiple platforms.

¹ Part of the ongoing work takes place in the context of the W3I3 ESPRIT Project, under the sponsorship of the European Community. Information on the project can be found at http://www.txt.it/w3i3.

ACKNOWLEDGMENTS

We would like to thank Stefano Ceri, who provided useful comments on the HDM-lite model and on the features of the Autoweb System; Giuseppe Fedon, who managed the Autoweb development team; all the RAI-EMSF staff; Mario Ferloni at Officine Meccaniche Riva; and all the people who collaborated on the Autoweb Project.

REFERENCES

ANDREWS, K., KAPPE, F., AND MAURER, H. A. 1995. Serving Information to the Web with Hyper-G. Computer Networks and ISDN Systems 27, 6, 919–926.
ATZENI, P., MECCA, G., AND MERIALDO, P. 1997. To Weave the Web. In M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld Eds., Proc. 23rd Conference on Very Large Databases (Athens, Greece, Aug. 26–29, 1997), pp. 206–215.
ATZENI, P., MECCA, G., AND MERIALDO, P. 1998a. Design and Maintenance of Data Intensive Web Sites. In H.-J. Schek, F. Saltor, I. Ramos, and G. Alonso Eds., Proc. Int. Conf. on Extending Database Technology, EDBT98 (Valencia, Spain, March, 1998), pp. 436–450.
ATZENI, P., MECCA, G., MERIALDO, P., MASCI, A., AND SINDONI, G. 1998b. The Araneus Web-Base Management System. In L. M. Haas and A. Tiwary Eds., Proc. Int. Conf. Sigmod’98, Exhibits Program (Seattle, June, 1998), pp. 544–546.
AYERS, E. Z. AND STASKO, J. T. 1995. Using Graphic History in Browsing the World Wide Web. In Proc. Fourth Int. WWW Conf. (Boston, Mass., Dec. 1995).
BALASUBRAMANIAN, V., MA, B. M., AND YOO, J. 1995. A Systematic Approach to Designing a WWW Application. Communications of the ACM 38, 8, 47–48.
BERNSTEIN, P. A., PAL, S., AND SHUTT, D. 1999. Context-Based Prefetch for Implementing Objects on Relations. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie Eds., Proc. 25th Int. Conf. on Very Large Databases, VLDB’99 (Edinburgh, September, 1999), pp. 327–338.
CERI, S., BATINI, C., AND NAVATHE, S. 1993. Conceptual Database Design. Benjamin Cummings, Menlo Park, CA.
CERI, S. AND WIDOM, J. 1996. Active Databases. Morgan Kaufmann.
CERI, S., FRATERNALI, P., AND PARABOSCHI, S. 1999. Data-Driven, One-To-One Web Site Generation for Data-Intensive Applications. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie Eds., Proc. 25th Int. Conf. on Very Large Databases, VLDB’99 (Edinburgh, September, 1999), pp. 615–626.
CHEN, P. P. 1976. The Entity-Relationship Model: Toward a Unified View of Data. ACM TODS 1, 1, 9–36.
DEROSE, S. J. AND DURAND, D. G. 1994. Making Hypermedia Work: A User’s Guide to HyTime. Kluwer Academic.
DIAZ, A., ISAKOWITZ, T., MAIORANA, V., AND GILABERT, G. 1995. RMC: A Tool to Design WWW Applications. In Proc. Fourth Int. WWW Conf. (Boston, Mass., 1995), pp. 11–14.
FERNANDEZ, M. F., FLORESCU, D., LEVY, A. Y., AND SUCIU, D. 1998. Catching the Boat with Strudel: Experiences with a Web-Site Management System. In L. M. Haas and A. Tiwary Eds., Proc. Int. Conf. Sigmod’98 (Seattle, June, 1998), pp. 414–425.
FLORESCU, D., LEVY, A. Y., MANOLESCU, I., AND SUCIU, D. 1999a. Query Optimization in the Presence of Limited Access Patterns. In A. Delis, C. Faloutsos, and S. Ghandeharizadeh Eds., Proc. Int. Conf. Sigmod’99 (Philadelphia, Pennsylvania, USA, June 1–3, 1999), pp. 311–322.
FLORESCU, D., LEVY, A. Y., SUCIU, D., AND YAGOUB, K. 1999b. Optimization of Run-time Management of Data Intensive Web-sites. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie Eds., Proc. 25th Int. Conf. on Very Large Databases, VLDB’99 (Edinburgh, September, 1999), pp. 627–638.
FORTA, B. ET AL. 1997. The Cold Fusion Web Database Kit. QUE Corp.
FRATERNALI, P. 1999. Tools and Approaches for Developing Data-Intensive Web Applications: a Survey. ACM Computing Surveys. To appear.
FRATERNALI, P. AND PAOLINI, P. 1998. A Conceptual Model and a Tool Environment for Developing More Scalable and Dynamic Web Applications. In H.-J. Schek, F. Saltor, I. Ramos, and G. Alonso Eds., Proc. Int. Conf. on Extending Database Technology, EDBT98 (Valencia, Spain, March, 1998), pp. 421–435.
GARG, P. K. 1988. Abstractions Mechanisms in Hypertexts. Communications of the ACM 31, 7, 862–870.
GARZOTTO, F., MAINETTI, L., AND PAOLINI, P. 1993a. HDM2: Extending the E-R Approach to Hypermedia Application Design. In R. Elmasri, V. Kouramajian, and B. Thalheim Eds., Proc. 12th Int. Conf. on the Entity Relationship Approach, ER’93 (Dallas, Texas, 1993), pp. 178–189.
GARZOTTO, F., MAINETTI, L., AND PAOLINI, P. 1994. Adding Multimedia Collections to the Dexter Model. In Proc. ECHT’94 (Edinburgh, Scotland, September 19–23, 1994), pp. 70–80.
GARZOTTO, F., PAOLINI, P., AND SCHWABE, D. 1993b. HDM-A model-based approach to hypertext application design. ACM TOIS 11, 15, 1–26.
GARZOTTO, F., PAOLINI, P., BOLCHINI, D., AND VALENTI, S. 1999. “Modeling-by-Patterns” of Web Applications. In Proc. Int. Workshop on the World-Wide Web and Conceptual Modeling (WWWCM’99), Number 1727 in LNCS (Paris, France, November 15–18, 1999), pp. 293–306. Springer.
GWYER, M. 1996. Oracle Designer/2000 WebServer Generator Technical Overview (version 1.3.2). Technical report (Sept.), Oracle Corporation.
HALASZ, F. G. AND SCHWARTZ, M. 1994. The Dexter Hypertext Reference Model. Communications of the ACM 37, 2, 30–39.
HARDMAN, L., BULTERMAN, D., AND VAN ROSSUM, G. 1994. The Amsterdam Hypermedia Model: Adding time and context to the Dexter Model. Communications of the ACM 37, 2, 50–62.
HAUCK, F. J. 1996. Supporting hierarchical guided tours in the World Wide Web. In Proc. Fifth International World Wide Web Conference (Paris, France, May 1996).
HOVEN, I. V. 1997. Deploying Developer/2000 applications on the Web, Oracle white paper. Technical report (Feb.), Oracle Corporation.
HYPERWAVE INFORMATION MANAGEMENT. 1998. Hyperwave User’s Guide, Version 4.0. Munich, Germany: Hyperwave Information Management.
ISAKOWITZ, T., STOHR, E. A., AND BALASUBRAMANIAN, P. 1995. RMM: a Methodology for Structured Hypermedia Design. Communications of the ACM 38, 8, 34–44.
JONES, K. L. 1996. NIF-T-NAV: a Hierarchical Navigator for WWW Pages. In Proc. Fifth International World Wide Web Conference (Paris, France, 1996).
KESSELER, M. 1995. A Schema-Based Approach to HTML Authoring. In Proc. Fourth Int. WWW Conf. (Boston, Mass., 1995).
WAP FORUM. 1999. Wireless Markup Language Specifications, version 1.1. http://www.wapforum.org: WAP Forum Ltd.
MYERS, B. A., HOLLAND, J. D., AND CRUZ, I. F. 1996. Strategic Directions in Human Computer Interaction. ACM Computing Surveys 28, 4 (Dec.), 794–809.
NANARD, J. AND NANARD, M. 1991. Using Structured Types to Incorporate Knowledge in Hypertexts. In Proc. ACM Hypertext Conf. HT’91 (San Antonio, Texas, 1991), pp. 329–343.
NANARD, J. AND NANARD, M. 1995. Hypertext Design Environments and the Hypertext Design Process. Communications of the ACM 38, 8, 45–46.
ORACLE. 1997. Oracle learning architecture. Technical report, Oracle Corporation.
PERKOWITZ, M. AND ETZIONI, O. 1997. Adaptive Web Sites: an AI Challenge. In Proc. of the 15th IJCAI (Nagoya, Japan, August 23–29, 1997), pp. 16–23.
SCHWABE, D., CALOINI, A., GARZOTTO, F., AND PAOLINI, P. 1992. Hypertext Development Using a Model-Based Approach. Software Practice and Experience 22, 11, 937–962.
SCHWABE, D. AND ROSSI, G. 1995. The object-oriented hypermedia design model. Communications of the ACM 38, 8, 45–46.
STOTTS, P. AND FURUTA, R. 1989. Petri-net Based Hypertext: Document Structure with Browsing Semantics. ACM Transactions on Office Information Systems 7, 1, 3–29.
TAKAHASHI, K. AND LIANG, E. 1997. Analysis and Design of Web-based Information Systems. In Proc. Sixth Int. WWW Conf. (Santa Clara, California, 1997).
TRIGG, R. H. 1988. Guided tours and tabletops: Tools for communicating in a hypertext environment. ACM Transactions on Office Information Systems 6, 4, 398–414.
W3C. 1998a. An introduction to xsl. http://www.w3C.org/Style/xssl.
W3C. 1998b. Xml 1.0. http://www.w3.org/XML.
ZHENG, Y. AND PONG, M. 1992. Using Statecharts to Model Hypertexts. In D. Lucarella Ed., Proc. ECHT92 (Milan, Italy, December, 1992), pp. 242–250.

Received February 1999; revised August 1999 and March 2000; accepted April 2000

ACM Transactions on Information Systems, Vol. 28, No. 4, October 2000.
