A Hypertext System for Integrating Heterogeneous, Autonomous Software Repositories

John Noll and Walt Scacchi
Information and Operations Management Dept.
University of Southern California
Los Angeles, CA 90089-1421

Abstract

Hypertext is a simple concept for organizing information into a graph structure of linked container objects. This paper examines issues involved in applying hypertext concepts to the integration of heterogeneous, autonomous software repositories, and presents a solution called the Distributed Hypertext System (DHT). Based on a hypertext data model and client-server architecture, DHT features powerful modeling capabilities, integration of heterogeneous, preexisting repositories, update with concurrency control, and full local autonomy.

Appears in Proc. 3rd Irvine Software Symposium, ACM SIGSOFT and UC Irvine, Irvine, CA, 49-60, April 1994.

1 Introduction

Considerable research has been devoted to software object management. Table 1 summarizes some of the results. For the most part, these efforts assume that the objects in the repository are stored in a central database or storage manager, which may be accessed by distributed clients. In reality, however, software artifacts may be stored in many diverse locations under independent control. This may happen due to multi-contractor development projects, corporate mergers, cooperative relationships among otherwise autonomous organizations, or introduction of new repository technology into existing environments. It may also happen when seeking to support the distributed development of software systems and related documents over the Internet. Consider the following two examples.

Multiplatform project. A software development group is developing a database application to run in a client-server environment. The database server runs on a Unix host; the clients will run on PCs with PC operating systems. Developers of the database software will use the traditional Unix programming tools (cc, make, RCS) to develop and test their code. PC programmers will use native integrated development environments. In addition, there will be a shared network file system so all platforms can share files. The group desires tools for doing analysis on the project artifacts taken as a whole, to do requirements traceability, impact of change analysis, and metric collection. Despite the logical centralization afforded by a shared file system, there is still considerable heterogeneity present. Each programming environment has its own mechanisms for storing dependency relationships, function cross references ("tags"), and versions. Thus, the global analysis tools will

have to retrieve data from several different "repositories".

System          Concurrency      Collections      Granularity      Query      Versioning
SoftBench [7]   optimistic       directories (a)  file             no         tool-level (b)
AtFS [19]       locks (c)        directories      file             attr. (d)  yes
NSE [1]         copy-edit-merge  directories      file             no         yes
SLCSE [29]      via dbms (e)     no               record           dml (f)    yes
PMDB [25]       locks            no               record           attr.      no
Aspect [5]      locks            no               record           dml        yes
AD/Cycle [21]   locks            trees            record, file     dml        tool-level
NuMil [22]      via dbms         directories      file             dml        yes
Atherton [24]   versions         directories      object           attr.      yes
Triton [15]     optimistic       complex objects  object           no         no
PCTE/OMS [3]    locks            trees            file             no         yes
CAIS-A [5]      nested trans.    directories      file             no         ?
PGRAPHITE [30]  optimistic       collection       primitive types  no         no

(a) "Directories" refers to hierarchical containers of objects or other directories.
(b) Tool-level versioning implements versioning through tools built on top of the repository layer.
(c) "Locks" refers to the "check-out" style of long duration locking used by RCS.
(d) "Attr." refers to searching for nodes with matching attribute values.
(e) SLCSE and NuMil use the underlying relational dbms to provide concurrency control.
(f) Dml queries use a full-featured data manipulation language that would be found in a DBMS.

Table 1: Software engineering repositories.

Collaborative development across the Internet. Several teams of researchers throughout North America have been assembled to collaborate on research into sharing reusable software process models. Each team is expected to provide technical reports that document their individual research results, as well as to cooperate in the production of project-wide reports on their collective findings. Each of these documents will evolve in both structure and content over the course of the project. Additionally, the teams intend to develop a shared collection of reusable process assets and experience reports that can be accessed over the Internet, although each team will use their pre-existing repositories. Due to geographic separation and the loose cooperative structure of this project, each team will be highly autonomous, managing its own computing environment and tools. Thus, there will be no shared file system or database to serve as an integrating resource. Similarly, we cannot assume that all teams will agree to adopt and use any one team's object storage manager or data model. Nonetheless, it will be necessary for each team to be able to modify jointly developed documents or process assets.

Each of these two scenarios indicates that there is a need to organize and manage dispersed collections of related software objects across autonomous computing environments, as well as heterogeneous repositories and data models. To date, little research has been conducted on how to manage software engineering artifacts, whether as hypertext nodes, files, or database entities, that are stored across many heterogeneous, autonomous repositories.

The Distributed Hypertext system (DHT) is a research project at USC exploring the application of hypertext concepts to the integration of heterogeneous, autonomous software repositories. The goal is to devise a scheme for software object management that provides the following:

1. Services to support software engineering activities that require concurrency control and versioning;
2. Integration of pre-existing, heterogeneous repositories that preserve autonomous control over local repositories and existing applications;
3. Transparent object access; and
4. A simple implementation strategy.

Note that in a heterogeneous environment composed of autonomous, pre-existing repositories, some of these requirements are difficult to meet. For example, concurrency control based on data-processing style transactions violates execution autonomy by requiring participants to defer to a global transaction manager. Likewise, atomic operations applied to collections of objects may violate association or execution autonomy if they require participants to coordinate the commit or abort of such operations. Finally, some repositories (such as relational database systems) do not have built-in support for versioning or full-text searching. Solutions to these issues must take repository autonomy and the variation in the capabilities of heterogeneous storage managers into account.

The problem has three aspects:

Heterogeneity. McLeod and colleagues define the "spectrum of heterogeneity" for databases, comprising several facets: data model, schema, object comparability, data format, and storage manager [13]. This is equally applicable to software repositories. We find a variety of data models, from file-oriented to relational to object-oriented. Likewise, schema and object comparability are problems: one person's module is another's subsystem; dependency relationships may be called by different names, such as "derived-from", "depends-on", etc.

Autonomy. Our view of autonomy addresses three concepts: design autonomy reflects the degree to which a repository can specify its own data model, schema, storage manager, applications, etc.; execution autonomy is the ability of a repository to dictate who has access to what objects, in what fashion; association autonomy refers to the freedom of a repository to choose with what other repositories, if any, it will cooperate [28].

Transparency. Schatz [27] defines three types of transparency: type transparency, the ability to apply the same operations to any type of object; location transparency, the ability to access remote objects in the same manner as local objects; and scale transparency, the ability of a system to perform as well with one million as with 1000 objects. To this we add source transparency, the ability to apply the same operations to an object regardless of where it is stored.

This property has to do with differences among storage managers and the access techniques they provide for storing and retrieving objects. Source transparency requires that one should be able to retrieve an object using the same command or access function regardless of how the object is stored. Note that this is not the same as location transparency. Location transparency requires only that the same commands or access functions apply to both local and remote repositories. Thus, the same SQL commands would apply to a local or remote relational database; the open() function should open a remote or a local file.

In contrast, source transparency requires that there be a single function (say, GetObject()) that would retrieve an object from a relational database or a file system or any other repository type.
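The distinction can be made concrete with a minimal sketch, which is not the DHT implementation. It assumes two hypothetical repository classes (FileRepository and RelationalRepository), a hypothetical modules table with name and body columns, and a Python spelling get_object() of the GetObject() function named above. The caller uses one function whatever the storage manager behind it.

```python
import os
import sqlite3
import tempfile


class FileRepository:
    """Hypothetical repository that stores each object as a file."""

    def __init__(self, root):
        self.root = root

    def get_object(self, name):
        with open(os.path.join(self.root, name), "r") as f:
            return f.read()


class RelationalRepository:
    """Hypothetical repository that stores each object as a table row."""

    def __init__(self, conn):
        self.conn = conn

    def get_object(self, name):
        row = self.conn.execute(
            "SELECT body FROM modules WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return row[0]


def get_object(repository, name):
    """One access function for every repository type (source transparency)."""
    return repository.get_object(name)


# Store one object in each kind of repository.
root = tempfile.mkdtemp()
with open(os.path.join(root, "dht.c"), "w") as f:
    f.write("/* file copy */")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO modules VALUES (?, ?)", ("dht.c", "/* db copy */"))

# The same call retrieves from either source.
from_file = get_object(FileRepository(root), "dht.c")
from_db = get_object(RelationalRepository(conn), "dht.c")
```

Location transparency alone would only promise that each repository's native call (open() or SQL) works remotely; here the native calls are hidden behind the single function.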

In the following sections, we present the DHT system and show how it addresses the issues above to meet our stated requirements.

2 DHT Architecture and Data Model

We now present a brief overview of DHT, a typical example of which is depicted in figure 1. A more detailed description of an earlier version of DHT can be found in [23]. However, this report highlights refinements and enhancements incorporated into DHT in its current version.

[Diagram: a GetObject("dht.c") request arrives from the network at the gateway, which issues SELECT * FROM modules WHERE name = "dht.c" to the local repository (relational db); the result record returns through the gateway as a DHT node.]
Figure 2: A server for a software module database.
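The translation shown in Figure 2 might look like the following sketch. The request format (a dictionary with op and name keys), the Node class, the oid scheme, and the columns of the modules table are all assumptions made for illustration; the paper specifies only the GetObject request, the SQL query, and that the result record becomes a DHT node.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory form of a DHT node."""
    oid: str
    type: str
    attributes: dict = field(default_factory=dict)
    contents: str = ""


class ModuleGateway:
    """Hypothetical gateway in front of a relational module database."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.row_factory = sqlite3.Row  # rows behave like mappings

    def handle(self, request):
        """Translate one DHT request into a local SQL operation."""
        if request["op"] != "GetObject":
            raise ValueError("unsupported operation: " + request["op"])
        row = self.conn.execute(
            "SELECT * FROM modules WHERE name = ?", (request["name"],)
        ).fetchone()
        if row is None:
            raise KeyError(request["name"])
        return self.to_node(row)

    @staticmethod
    def to_node(row):
        """Transform a result record into a DHT node."""
        record = dict(row)
        contents = record.pop("body")  # remaining columns become attributes
        return Node(
            oid="modules/" + record["name"],  # assumed oid scheme
            type="module",
            attributes=record,
            contents=contents,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (name TEXT, author TEXT, body TEXT)")
conn.execute(
    "INSERT INTO modules VALUES (?, ?, ?)",
    ("dht.c", "example", "int main(void) { return 0; }"),
)

node = ModuleGateway(conn).handle({"op": "GetObject", "name": "dht.c"})
```

From the database's point of view the gateway is an ordinary client issuing a query, so no change to the repository is needed.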

Architecture. The DHT architecture is based on a client-server model. Clients implement all application functionality that is not directly involved in storage management; a client is typically an individual tool, but may be a complete environment.

Several servers manage the objects (nodes or links) in the hypertext, where different servers can manage nodes and links. Each server consists of two components: the repository that owns and manages local objects, and a gateway process that transforms local objects into DHT nodes and links, and DHT messages into local operations (see figure 2). From the repository viewpoint, the gateway process can be just another application.

Note that the gateway process is equivalent to a combination of transforming and accessing processor components of the federated database architecture reference model described by Sheth and Larson [28]. The style of access implemented, however, is navigational rather than query-oriented, as discussed later in section 4.

A request-response style communication protocol implements the operations specified in the DHT data model (see section 2), and includes provisions for locating servers and authenticating and encrypting messages.

Data Model. The DHT Data Model defines hypertext primitives that provide sufficient modeling power to represent software engineering concepts, without compromising the autonomy of local repositories. It consists of four basic objects: nodes, the content objects; links, which model relationships among nodes; anchors, which specify the points within node contents that anchor the endpoints of a link; and contexts, which contain links to allow specification of object compositions as sub-graphs. Nodes, links, and contexts have types, attributes, and unique object identifiers (oids).

A fixed set of operations can be applied to DHT objects: create, delete, read, and update an object. Additionally, retrieve links can be applied to a context to obtain the links within a context associated with a specified node. The important feature of these operations is that any one can be performed by a single repository on its own objects. Cooperation among repositories is not required. In addition, a given repository can elect to provide any subset of these operations, as appropriate for the level of access it intends to provide.

[Diagram labels: Client applications; LAN; Gateway; Wide Area Network; Name Server; repositories: Development Database, Software Archive, Softman Database, Process Model Repository; storage managers: Ingres RDB, OODB, file system, Lisp db, in-memory database.]
Figure 1: A typical DHT environment.

3 Issues

Some additional issues related to support of software engineering, such as versioning, aggregations, and concurrency control, must be addressed. We discuss these next.

Versioning. Since participating repositories may not have native support for versioning, it is a difficult issue to address. There are three possible alternatives for versioning in hypertext (figure 3). An object identifier (oid) can refer to all of the versions together, as in HAM [8]. In this case, a link to the object can refer to any version of the object. Alternately, an oid can refer to the most recent version. In this case a link always points to the current version of an object. Finally, an oid can refer to a specific version in the history of an object's evolution, in which case a link always points to the specified version. Consequently, links that point to the most recent version are not possible, because each new version gets a distinct new oid.

Each of these alternatives can be represented using DHT primitives. The first requires a repository to manage a collection of versions as a context. The second is simpler, requiring the repository to assign a new oid to the last version of an object before updating the current version. The last alternative simply requires creation of a new entity. In addition, note that all three require relationships to be maintained as links among the versions.

The choice of versioning model depends on the application. For example, to build a specific configuration of a system, specific versions of each module must be selected. A configuration can be modeled as a dependency graph where the nodes are specific versions of a module, hence the third versioning alternative is appropriate.

On the other hand, a programmer's workspace should include the most recent versions of each module that the programmer is working on, indicating that the second versioning alternative (where links point to the most recent version of a node) would be most appropriate.

Finally, a tool tracing requirements will likely refer to modules implementing particular requirements. In this case, it is important to know which module implements a requirement. This case would likely consider all versions as part of the same whole, thus the first

alternative is appropriate.

[Diagram: the three versioning models side by side (link points to all versions; link points to current version; link points to initial version), each showing the versions and their oids after a first and a second update. Key: Node, Last Node Created, Most Recent Version, Link.]
Figure 3: Versioning models.

Aggregation. Aggregate objects are common in software engineering environments: a project is a collection of lifecycle documents; a system is compiled from a set of software modules. It is therefore desirable to have a modeling construct that allows sets of objects to be referred to as a single entity.

Aggregation, however, introduces conflicts between the goals of transparency and autonomy. Transparency requires that the same operations (create, get, update, delete) apply to aggregations as to unit objects; this may conflict with autonomy constraints depending on the semantics of delete operations applied to aggregates. For example, if delete implies that the components of the composite object are deleted as well, and the composite contains objects from more than one repository, the operation will violate execution and association autonomy because the affected repositories will need to coordinate to ensure that the components are deleted.

To overcome this difficulty, aggregation is modeled by contexts in DHT. A context specifies a set of links, which in turn can describe a variety of graph structures from simple sets of nodes to trees, DAGs, and general graphs. However, deleting a context only deletes its links. The nodes connected by the links remain unaffected. Since a context's links are created and managed by the same repository that owns the context, autonomy is preserved.
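The delete semantics of contexts can be illustrated with a small sketch. The Link, Context, and Repository classes, their fields, and the oid strings are hypothetical names chosen for this example, not DHT's actual structures. The point shown is that deleting a context is a purely local operation, even when its links reach nodes owned by another repository.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    source: str            # oid of the node the link leaves
    target: str            # oid of the node the link enters
    type: str = "part-of"  # hypothetical link type


@dataclass
class Context:
    oid: str
    links: list = field(default_factory=list)

    def members(self):
        """Oids of all nodes that appear as an endpoint of a link."""
        return {oid for link in self.links for oid in (link.source, link.target)}


class Repository:
    """Hypothetical repository owning some nodes and contexts."""

    def __init__(self):
        self.nodes = {}      # oid -> contents
        self.contexts = {}   # oid -> Context

    def delete_context(self, oid):
        # Drops the context and the links it owns. No node is touched,
        # so no other repository has to take part in the operation.
        del self.contexts[oid]


archive = Repository()
development = Repository()
archive.nodes["archive:parser.c"] = "/* parser */"
development.nodes["dev:main.c"] = "/* main */"

# A "system" aggregate whose parts live in two different repositories.
system = Context("dev:system", [
    Link("dev:main.c", "archive:parser.c", "depends-on"),
])
development.contexts[system.oid] = system
members_before = system.members()

development.delete_context("dev:system")
```

After the delete, both nodes still exist in their own repositories; only the grouping is gone.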

Concurrency Control. Providing concurrency control presents problems similar to those of aggregation, because each repository must be free to implement its own concurrency mechanism (or none at all). As such, we cannot assume that there is a common notion of concurrency control. The challenge is to provide a concurrency mechanism that does not violate a repository's autonomy. As discussed in Section 2, DHT operations apply to single objects, and can be performed by one server without coordination with other servers. The DHT architecture exploits this feature to provide a form of timestamp concurrency control [18, pp. 380-383] designed to prevent "lost updates".
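A minimal sketch of one way a single server could apply such a timestamp check follows. It makes two assumptions that the paper does not state: timestamps come from a per-server logical counter, and the timestamp handed out on a read is that of the object's last update. The Server class and UpdateConflict exception are hypothetical names.

```python
class UpdateConflict(Exception):
    """Raised when an object changed after the client last read it."""


class Server:
    def __init__(self):
        self._objects = {}   # oid -> (data, timestamp of last update)
        self._clock = 0      # logical clock (an assumption of this sketch)

    def _tick(self):
        self._clock += 1
        return self._clock

    def create(self, oid, data):
        self._objects[oid] = (data, self._tick())

    def read(self, oid):
        """Return the object data together with a timestamp."""
        return self._objects[oid]

    def update(self, oid, data, timestamp):
        """Accept the update only if nothing changed since `timestamp`."""
        _, last_update = self._objects[oid]
        if last_update > timestamp:
            raise UpdateConflict(oid)
        new_timestamp = self._tick()
        self._objects[oid] = (data, new_timestamp)
        return new_timestamp


server = Server()
server.create("dht.c", "v1")

# Two clients read the same version.
_, alice_ts = server.read("dht.c")
_, bob_ts = server.read("dht.c")

server.update("dht.c", "v2 (alice)", alice_ts)   # succeeds

try:
    server.update("dht.c", "v2 (bob)", bob_ts)   # stale timestamp
    bob_refused = False
except UpdateConflict:
    bob_refused = True

current, _ = server.read("dht.c")
```

The check involves only the server that owns the object, so no coordination among repositories is needed; what the refused client does next is left to the client.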

When a client reads an object, the server returns a timestamp along with the object data; this timestamp is used to determine whether subsequent updates conflict. The client includes the timestamp with any subsequent update requests on the object. The server, upon receiving an update request, examines the timestamp to determine when the last update occurred in relation to the timestamp. If the last update precedes the timestamp, the update succeeds; if an update has occurred since the timestamp, the server refuses the update and returns an error message. The client then takes whatever action it deems appropriate.

4 Related Work

We now compare the DHT approach to possible alternatives, as represented by (i) heterogeneous database management systems, (ii) hypertext systems for software development, and (iii) proposed "standard" object models.

Heterogeneous Database Management Systems. HDBMSs provide query-oriented access to data stored in heterogeneous component databases. There are three broad classes of such systems [4]: global schema multidatabases, in which applications access data through a single unified schema [9, 12, 26]; federated databases, wherein import schemas are used to provide access to external data through the existing local database schema [16, 13]; and multidatabase language systems, which retrieve data by posing queries directly to participating databases using a multidatabase query language [20, 6].

The main difference between heterogeneous databases and DHT is the style of access. Heterogeneous databases provide query-oriented access, with the accompanying requirement for query processors, schedulers, and local translators for the global query language. As a result, most heterogeneous database systems assume that components are also databases. In contrast, DHT objects are accessed primarily by navigation, which requires a much simpler local interface, and less capability from the component repositories.

Hypertext. A number of research projects have applied hypertext to software object management, including the Hypertext Abstract Machine (HAM) [8], the Documents Integration Facility (DIF) [14], and HyperCASE [11]. For the most part these are based on a single, centralized repository architecture. In contrast, systems such as PROXHY [17] and Chimera [2] seek to move away from this position. These two systems enable linking among objects from diverse sources. In both, links are managed by the hypertext system, while nodes are managed by individual applications. This contrasts sharply with the DHT approach, which attempts to insulate applications from the details of object storage management through a uniform access interface.

Standardized Object Models. Program integration mechanisms such as the Common Object Request Broker Architecture (CORBA), Object Linking and Embedding (OLE), and the IBM System Object Model (SOM) specify interfaces to potentially heterogeneous objects. However, these mechanisms focus on sharing application behavior, such as the rendering capability of a drawing tool or calculation functions of a spreadsheet. In these cases, the data objects used by the application (graphic file or spreadsheet) remain hidden behind the object interface. Thus, they are more appropriate for application integration and interoperability than for data integration.

[Diagram: contexts whose links arrange nodes as a set, a sequence, a tree, and a graph. Key: Context, Node, Link.]
Figure 4: Modeling aggregation in DHT.

5 Discussion and Observations

Conklin describes the "essence" of hypertext as a combination of three components: a data access method based on navigation; a representation scheme similar to a semantic net; and an interface modality incorporating browsing by direct manipulation of link anchors [10]. By exploiting this multifaceted nature of hypertext, the DHT solution embodies a comprehensive approach to software object management across heterogeneous environments, with the following benefits:

(1) The navigational style of access is an intuitive, natural mode for interacting with the structured and semi-structured (textual, graphical) data prevalent in software environments.

(2) DHT lends itself to very simple gateway implementations, since only get and put operations are required. Most of the implementation of a gateway consists of code to manage connections and process requests and replies. This code is repository independent and can therefore be reused. The current DHT prototype provides a library of basic server functions for this purpose. The repository-dependent part of a gateway comprises an interface to the local storage manager, and must be hand crafted for each repository. This task, however, is straightforward, requiring the implementation of eight basic storage management interface functions that are called by the server library (see Table 2). (Note that a simple read-only gateway need only implement OpenRepository(), CloseRepository(), and GetObject().)

OpenRepository()    CloseRepository()
CreateObject()      DeleteObject()
GetObject()         UpdateObject()
GetAttributes()     GetLinks()

Table 2: Storage manager interface functions.
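The shape of the repository-dependent part can be sketched as follows. The paper names the eight functions but not their signatures, so the Python method names, parameters, and the Unsupported exception are assumptions; the actual prototype gateways were written in C. The sketch shows a read-only file-system gateway that supplies only the three functions mentioned in the note above.

```python
import os
import tempfile


class Unsupported(Exception):
    """Raised when a gateway does not offer the requested operation."""


class StorageManager:
    """The eight interface functions of Table 2; each is optional."""

    def open_repository(self):
        raise Unsupported("OpenRepository")

    def close_repository(self):
        raise Unsupported("CloseRepository")

    def create_object(self, oid, data):
        raise Unsupported("CreateObject")

    def delete_object(self, oid):
        raise Unsupported("DeleteObject")

    def get_object(self, oid):
        raise Unsupported("GetObject")

    def update_object(self, oid, data):
        raise Unsupported("UpdateObject")

    def get_attributes(self, oid):
        raise Unsupported("GetAttributes")

    def get_links(self, oid):
        raise Unsupported("GetLinks")


class ReadOnlyFileStore(StorageManager):
    """Hypothetical read-only gateway back end over a directory."""

    def __init__(self, root):
        self.root = root
        self.is_open = False

    def open_repository(self):
        if not os.path.isdir(self.root):
            raise FileNotFoundError(self.root)
        self.is_open = True

    def close_repository(self):
        self.is_open = False

    def get_object(self, oid):
        if not self.is_open:
            raise RuntimeError("repository is not open")
        with open(os.path.join(self.root, oid)) as f:
            return f.read()


root = tempfile.mkdtemp()
with open(os.path.join(root, "README"), "w") as f:
    f.write("shared process asset")

store = ReadOnlyFileStore(root)
store.open_repository()
contents = store.get_object("README")
try:
    store.update_object("README", "changed")
    update_allowed = True
except Unsupported:
    update_allowed = False
store.close_repository()
```

A repository chooses its level of access simply by which functions it overrides, which matches the data model's rule that a repository may provide any subset of the operations.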

Repository type             Gateway size
Unix file system            1400
Relational database         900
Object-oriented database    600

Table 3: Gateway implementations (lines of code).

Our experience shows these are simple to implement. For instance, we have been able to get gateways running within a single day of programming effort. Examples of actual effort to implement several gateways, measured in lines of C language source code, are shown in Table 3.

(3) The semantic net representation scheme of attributed nodes and links enables multiple, flexible structures to be overlaid on the same core set of objects. This means that users can organize objects in different ways to suit their specific needs. For example, users can access shared nodes using the same or independent contexts. Additionally, links can represent relationships between parts of objects (via anchors), in addition to linking objects as a whole.

(4) The user interaction technique complements navigational access with a common interaction style that applies to all software object types, yielding application- and type-transparent interfaces.

6 Conclusions

We believe the DHT concept and architecture propose an interesting solution to a research problem and a practical problem. The research problem entails how to provide object management services to a dispersed group of autonomous, heterogeneous software object repositories. The practical problem is how to provide geographically dispersed software development teams with a logically centralized repository of sharable and reusable software assets, where each has its own locally developed artifact repository. An evolving prototype implementation of DHT, which demonstrates a solution to these problems in support of distributed software development environments, is operational. Further, experience with DHT access performance shows that the performance of its retrieval ("get") and storage ("put") operations across Internet sites rivals that of local-area network file systems for many different types of repositories. Thus, we believe these results begin to demonstrate the viability of the DHT concept, architecture, and implementation.

References

[1] Evan W. Adams, Masahiro Honda, and Terrence C. Miller. Object management in a CASE environment. In Proceedings of the 11th International Conference on Software Engineering. The Association for Computing Machinery, 1989.
[2] Kenneth M. Anderson, Richard N. Taylor, and E. James Whitehead, Jr. Hypertext for heterogeneous software environments. Technical report, University of California, Irvine, 1993.
[3] Gerard Boudier, Ferdinando Gallo, Regis Minot, and Ian Thomas. An overview of PCTE and PCTE+. In SIGSOFT '88: Proceedings of the Third ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Boston. The Association for Computing Machinery, 1988.
[4] M. W. Bright, A. R. Hurson, and Simin H. Pakzad. A taxonomy and current issues in multidatabase systems. IEEE Computer, March 1992.
[5] Alan W. Brown. Database Support for Software Engineering. Halstead Press (John Wiley and Sons), 1989.
[6] Omran A. Bukhres, Jiansan Chen, Weimin Du, Ahmed K. Elmagarmid, and Robert Pezzoli. Interbase: An execution environment for heterogeneous software systems. Computer, 26(8):57-68, August 1993.
[7] M. H. Cagan. The HP SoftBench environment: An architecture for a new generation of software tools. The Hewlett-Packard Journal, 41(3), June 1990.
[8] Brad Campbell and Joseph M. Goodman. HAM: A general purpose hypertext abstract machine. Communications of the ACM, 31(7), July 1988.
[9] Chin-Wan Chung. DATAPLEX: An access to heterogeneous distributed databases. Communications of the ACM, 33(1):70-79, January 1990.
[10] Jeff Conklin. Hypertext: An introduction and survey. Computer, 20(9):17-41, September 1987.

[11] Jacob L. Cybulski and Karl Reed. A hypertext based software-engineering environment. IEEE Software, March 1992.
[12] Umeshwar Dayal and Hai-Yann Hwang. View definition and generalization for database integration in a multidatabase system. IEEE Transactions on Software Engineering, SE-10(6):628-644, November 1984.
[13] D. Fang, J. Hammer, D. McLeod, and A. Si. Remote-exchange: An approach to controlled sharing among autonomous, heterogeneous database systems. In Proceedings of the IEEE Spring Compcon, San Francisco. IEEE, February 1991.
[14] Pankaj K. Garg and Walt Scacchi. A hypertext system for software life cycle documents. IEEE Software, 7(3):90-99, May 1990.
[15] Dennis Heimbigner. Experiences with an object manager for a process-centered environment. Technical Report CU-CS-484-92, University of Colorado at Boulder, 1992.
[16] Dennis Heimbigner and Dennis McLeod. A federated architecture for information management. ACM Transactions on Office Information Systems, 3(3), July 1985.
[17] Charles J. Kacmar and John J. Leggett. PROXHY: A process-oriented extensible hypertext architecture. ACM Transactions on Information Systems, 9(4):399-420, October 1991.
[18] Henry F. Korth and Abraham Silberschatz. Database System Concepts. McGraw-Hill, 1986.
[19] Andreas Lampen and Axel Mahler. An object base for attributed software objects. In Proceedings of the EUUG Autumn '88 Conference, London. European UNIX Users Group, October 1988.
[20] Witold Litwin and Abdelaziz Abdellatif. An overview of the multi-database manipulation language MDSL. Proceedings of the IEEE, 75(5):621-631, May 1987.
[21] V. J. Mercurio, B. F. Meyers, A. M. Nisbet, and G. Radin. AD/Cycle strategy and architecture. IBM Systems Journal, 29(2):170-187, 1990.
[22] K. Narayanaswamy and Walt Scacchi. A database foundation to support software systems evolution. The Journal of Systems and Software, 7(1):37-49, March 1987.
[23] John Noll and Walt Scacchi. Integrating diverse information repositories: A distributed hypertext approach. IEEE Computer, 24(12):38-45, December 1991.
[24] William Paseman. Architecture of the Atherton software BackPlane. In Proceedings of 1989 ACM SIGMOD Workshop on Software CAD Databases. The Association for Computing Machinery, 1989.
[25] Maria H. Penedo and E. Don Stuckle. PMDB: a project master database for software engineering environments. In Proceedings of the 8th International Conference on Software Engineering. IEEE, August 1985.
[26] Ming-Chien Shan. Unified access in a heterogeneous information environment. IEEE Office Knowledge Engineering, 3(2), August 1989.
[27] B. R. Schatz. Telesophy: A system for manipulating the knowledge in a community. In Proc. Globecom 87, pages 1181-1186, New York, Sept. 1987. The Association for Computing Machinery.
[28] Amit P. Sheth and James A. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3), September 1990.
[29] Tom Strelich. The software life cycle support environment (SLCSE): a computer based framework for developing software systems. In SIGSOFT '88: Proceedings of the Third ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Boston. The Association for Computing Machinery, 1988.
[30] Jack C. Wileden, Alexander L. Wolf, Charles D. Fisher, and Peri L. Tarr. PGRAPHITE: an experiment in persistent typed object management. In SIGSOFT '88: Proceedings of the Third ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Boston. The Association for Computing Machinery, 1988.