
Inf Syst Front (2006) 8:395–410 DOI 10.1007/s10796-006-9007-2

Reverse engineering relational databases to identify and specify basic Web services with respect to service oriented computing

Youcef Baghdadi

Received: 15 December 2005 / Revised: 5 April 2006 / Accepted: 23 June 2006 / Published online: 28 November 2006
© Springer Science + Business Media, LLC 2006

Abstract Service-oriented computing (SOC) is the computing paradigm that utilizes services as a fundamental building block. Services are self-describing, open components intended to support composition of distributed applications. Currently, Web services provide a standard-based realization of SOC due to: (1) the machine-readable format (XML) of their functional and nonfunctional specifications, and (2) their messaging protocols built on top of the Internet. However, how to methodologically identify, specify, design, deploy and manage a sound and complete set of Web services to move to a service-oriented architecture (SOA) is still an issue. This paper describes a process for reverse engineering a relational database application architecture into an SOA, where SQL statements are insulated from the applications, factored, implemented, and registered as Web services to be discovered, selected, and reused in composing e-business solutions. The process is based on two types of design patterns: the schema transformation pattern and the CRUD operations pattern. First, the schema transformation pattern allows an identification of the services. Then the CRUD operations pattern allows a specification of the abstract part of the identified services, namely their port types. This process is implemented as a CASE tool, which assists analysts in specifying services that implement common, reusable, basic business logic and data manipulation.

Keywords Application architecture . Service-oriented computing . Web services . Service-oriented architecture . Relational databases . Transformation pattern . CRUD operations pattern . Reverse engineering process . CASE tool

Y. Baghdadi (*) Sultan Qaboos University, Muscat, Oman e-mail: [email protected]

1 Introduction

Service-oriented computing (SOC) is the computing paradigm that utilizes services as a fundamental building block to develop applications (Huhns & Singh, 2005; Papazoglou & Georgakopoulos, 2003). Services are self-describing, open components that support rapid, low-cost composition of distributed applications (Fremantel, Weerawarana, & Khalaf, 2002; Stal, 2002). Currently, the Web services framework provides a standard realization of SOC (Curbera, Khalaf, Mukhi, Tai, & Weerawarana, 2003; Kreger, 2003) due to: (1) the machine-readable format (XML-based description) of their functional and nonfunctional specifications, and (2) their messaging protocols (XML-based SOAP) built on top of Internet protocols such as HTTP (Hypertext Transfer Protocol), SMTP (Simple Mail Transfer Protocol), or FTP (File Transfer Protocol). In addition, Web services can coexist with distributed object-computing middleware such as CORBA (Common Object Request Broker Architecture), DCOM (Distributed Component Object Model), and RMI (Remote Method Invocation) (Barry, 2003); make legacy databases and applications, and even traditional EAI (Enterprise Application Integration) messaging, look like Web services (Meredith & Bjorg, 2003); integrate with the semantic Web (Sycara, Paolucci, Ankolekar & Srinivasan, 2003); and implement business transactions (Papazoglou, 2003). This makes them not only a standard-based realization of SOC as a distributed computing infrastructure for intra- and cross-enterprise applications (Papazoglou & Georgakopoulos, 2003), but also a de facto Internet standard instance of SOA (Service-Oriented Architecture) (Austin, Barbir, Ferris, & Garg, 2002; Huang & Chung, 2002; Huhns & Singh, 2005; Smith, 2004).

Yet, Web services development is still hampered by technical, semantic, and methodological issues. Technical issues are related to policy, security, availability, performance, transaction, and management. Semantic issues deal with the meaning of the services' behavior and of the data elements they manipulate and communicate. Methodological issues concern the architecture, design process, models, formalisms, languages, notations, and tools used to reduce the complexity of developing Web services for large-scale usage. How to methodologically identify, specify, design, implement, deploy, organize and manage a sound and complete set of Web services to unlock and reuse common business logic and data is still an issue. Identifying the set of services to implement an SOA is not a trivial task; service identification is a determining factor in creating and migrating to a successful SOA (Arsanjani, 2002, 2004; Chen M., Chen A.K.N., & Shao, 2003; Endrei et al., 2004; Levi & Arsanjani, 2002; Papazoglou & Yang, 2002; Zimmermann, Korgdahl, & Gee, 2004). Approaches based on case-by-case wrapping, or on implementing a thin SOAP/WSDL/UDDI layer on top of existing applications, yield brittle Web services (Altman, 2003), or Web services that are insufficiently reliable, manageable and reusable (Papazoglou & Yang, 2002). The existing object-oriented analysis and design methods and their related CASE tools aim at generating code to implement a system, whereas a service-oriented methodology should replace code generation by service discovery, selection, and composition mechanisms (Huhns & Singh, 2005).

This paper describes a reverse engineering process as a method for identifying and developing Web services with respect to SOC. Reverse engineering is a software engineering technique that consists of extracting the design specifications of an existing system in order to allow maintenance (or building) of software. Here it targets large-scale reuse of the common business logic and data locked within existing legacy systems built at high cost over years, notably relational databases, the most predominant in today’s market due notably to SQL (Collins, Navathe, & Mark, 2002; Fowler, 2003).
It mainly aims at moving from a relational database application architecture, where SQL statements implementing common data manipulation or some business logic are embedded in the applications, to an SOA, where SQL statements are insulated from the applications, factored, implemented, and registered as Web services to be discovered, selected, and reused in composing applications. The reverse engineering process is intended to produce a sound and complete set of cohesive, loosely coupled, well-interfaced, basic Web services from a normalized relational schema. Further, a forward engineering process will reuse these services in composing more coarse-grained Web services, applications supporting business processes, and ultimately dynamic e-business solutions.

The reverse engineering process is based on design patterns, whereby a pattern is defined as a recurring solution to a recurring problem in some context (Booch, Rumbaugh, & Jacobson, 1999; Gamma et al., 1995). It mainly uses two design patterns: the schema transformation pattern and the CRUD (Create, Retrieve, Update, Delete) operations pattern. First, the schema transformation pattern allows an identification of the services. Then the CRUD operations pattern allows a specification of the abstract part of the identified services, namely their port types. The CUD (Create, Update, and Delete) part of CRUD constitutes a sound set of operations. This set is completed by many R (retrieve) operations in accordance with the analysts’ requirements and the state of each entity stored as a database table. A CASE tool assists the process in the stages of identification, specification, design, and implementation of Web services. The produced Web services are stored in a repository, which provides a means for their discovery and management.

The next section introduces the requirements of the service-oriented paradigm and its main building block, the Web service. Section 3 introduces a running case, the motives, and the design patterns, and details the reverse engineering process. Section 4 specifies the CASE tool. Section 5 discusses some related work. Finally, a conclusion section presents further development and research.

2 Requirements of the service-oriented paradigm

The service-oriented paradigm specifies requirements for its main building block, the service: the properties a service must have and the specification of its description, organization, and messaging protocols. It also specifies the requirements of the architecture and platform technology that realize it.

2.1 Service-oriented computing paradigm (SOC)

Service-oriented computing is a new computing paradigm, which is intended to be a middleware for intra- and inter-enterprise application integration (Huhns & Singh, 2005; Papazoglou & Georgakopoulos, 2003). In this paradigm, services become the fundamental building blocks upon which new applications are created. Services are intended to provide a higher-level abstraction for a new architecture of applications based on reuse in the context of autonomy and heterogeneity of components, to improve the productivity and quality of applications (Huhns & Singh, 2005). Thus, SOC defines a set of requirements that distinguishes it from other computing paradigms such as the object-oriented paradigm. Therefore, to make SOC a middleware that provides capabilities such as automated discovery, dynamic selection, binding, composition, and choreography of services, services must satisfy the following properties:

• Self-description, i.e., a service should be able to describe its public interface. It is also possible to retrieve the information from the executing service itself.
• Self-containment, i.e., a service can run independently of the state of any other service or application.
• Reuse, i.e., a service should always be specified with its reuse by different applications in mind in order to bring flexibility and agility to the business. Reuse is one of the most important concepts in software development methods such as object-orientation or component-based development.
• Loose coupling, i.e., the ability of a service to change its implementation without any impact on its client applications.
• High cohesion, i.e., a high degree of functional relatedness among the operations of the service. High cohesion increases the clarity of the design; simplifies maintenance and future enhancements; achieves service granularity; and often supports low coupling.
• Interoperability, i.e., services can request each other from any platform without any knowledge of how they are implemented.
• Implementation hiding, i.e., a service hides its implementation so that it can change it (for many reasons) without affecting its clients. The clients invoke it only through a well-specified interface. However, the interface needs to be specified with an ontology to allow a clear contract between the service and its clients.

Therefore, to operate in a SOC environment, applications termed services should: (1) define the semantics of their functional (e.g., capabilities) and nonfunctional (e.g., quality of service) requirements in a machine-readable standard ontology such as XML, and (2) communicate through messaging protocols built on top of the Internet protocols. Based on this standard declarative description and communication, automated discovery, selection and binding become a native capability of SOC middleware and applications. Although the existing object-oriented distributed computing middleware frameworks such as CORBA, DCOM, and RMI may claim to realize SOC, the Web services framework is the most suitable to realize such a paradigm thanks to its underlying standards.

2.2 Web services as a realization of SOC

A Web service is “a software application identified by a URI, whose interfaces and binding are capable of being defined, described, and discovered as XML artifacts. A Web service supports direct interactions with other software agents using XML-based messages exchanged via Internet-based protocols.” (Austin et al., 2002). That is, Web services are software components provided by organizations, located on the Web, and accessible from any Web-connected application using a set of standard messaging protocols based on the Internet. Web services are based on three main standards:

(1) WSDL (Web Services Description Language), which is a standard language used to describe and expose the interface of the service. WSDL helps to automatically generate proxies (i.e., a stub for the client application and a skeleton for the servant) independently of any platform. WSDL can be used with any messaging protocol (e.g., HTTP, SMTP, FTP, or BEEP). WSDL is used to advertise the service capabilities, interface, behavior, and quality so that service clients can utilize this description to achieve their goal. A WSDL document describes two aspects of the service: the abstract part (also called service interface) and the concrete part (also called service implementation). WSDL, when mature enough, can state the conceptual purpose and expected results of the service, whereby the service interface publishes the service signature (input/output/error/exceptions), and the QoS (Quality of Service) description publishes important non-functional quality attributes (cost, performance, response time, security, reliability, scalability and availability). Publication of such information about available services provides the necessary means for discovery, selection, binding, and composition of services.

(2) SOAP (Simple Object Access Protocol), which is based on Internet protocols such as HTTP, SMTP and FTP, is used to exchange XML messages communicating content (e.g., request, response, and fault messages) and to invoke operations offered by the services.

(3) UDDI (Universal Description, Discovery, and Integration), which is used to register and locate services on the Web.

Figure 1 describes the abstract part of Web services, where:

• Each Web service has one service interface (abstract part).
• Each service interface has one or more portTypes.
• Each portType contains one or more abstract operations. An operation is a specification of a piece of logic implemented by the service. The portType is equivalent to a component interface.
• An operation is a sequence of messages. An operation references a set of messages, i.e., messages are defined independently of the operations, so they can be used by other operations.

[Figure 1: diagram relating Port Type, Operation, Message (input/output/exception), and Part (Attributes); the association labelled "References" links operations to messages.]

Fig. 1 Abstract part of a Web service

• A message may be: an input message that conveys the request content, an output message that conveys the response content, or a fault message that conveys an error or an exception.
• A message consists of a list of parameters (called parts), where a parameter is of a certain data type (e.g., String, Double).
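The containment rules listed for Fig. 1 can be pictured as a small data model. The following is a minimal sketch in Python, not a WSDL implementation: the class names mirror the terms used above, while the message names, the part named `reason`, and the helper `build_customer_port_type` are hypothetical and introduced only for illustration. The operation signatures `getBalance(int id)` and `updateBalance(Double amount)` are taken from the customer service interface shown in Fig. 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    """One parameter of a message, with its data type (e.g. String, Double)."""
    name: str
    type: str


@dataclass
class Message:
    """A named list of parts, defined independently of any operation."""
    name: str
    parts: List[Part] = field(default_factory=list)


@dataclass
class Operation:
    """An abstract operation that references its messages."""
    name: str
    input: Optional[Message] = None
    output: Optional[Message] = None
    fault: Optional[Message] = None


@dataclass
class PortType:
    """A named group of operations, comparable to a component interface."""
    name: str
    operations: List[Operation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each portType contains one or more abstract operations.
        if not self.operations:
            raise ValueError("a portType contains one or more operations")


def build_customer_port_type() -> PortType:
    # Hypothetical fault message, defined once and referenced by both operations.
    fault = Message("customerFault", [Part("reason", "String")])
    get_balance = Operation(
        "getBalance",
        input=Message("getBalanceRequest", [Part("id", "int")]),
        output=Message("getBalanceResponse", [Part("balance", "Double")]),
        fault=fault,
    )
    update_balance = Operation(
        "updateBalance",
        input=Message("updateBalanceRequest", [Part("amount", "Double")]),
        output=Message("updateBalanceResponse", [Part("done", "Boolean")]),
        fault=fault,
    )
    return PortType("CustomerPortType", [get_balance, update_balance])
```

Sharing one fault message between two operations illustrates why messages are defined independently of the operations that reference them.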

The three underlying standard technologies, WSDL, SOAP and UDDI, and other technology standards such as BPEL4WS (Business Process Execution Language for Web Services) allow Web services to be reusable, loosely coupled, highly cohesive, self-described, self-contained, interoperable components that provide a distributed computing infrastructure for both intra- and cross-enterprise application integration, and consequently a realization of SOC.

While the Web services framework provides a standard realization of SOC as a distributed computing infrastructure, Web services development is still hindered by technical, semantic, and especially methodological issues. The latter concern the architecture, design process, models, formalisms, languages, notations, and tools that we should use to reduce the complexity and cut the cost of such a development in order to realize a service-oriented computing architecture. That computing architecture will specify the methodology and the requirements for its implementing technology platforms. Currently, neither the case-by-case approaches nor the existing object-oriented analysis and design methods are suitable to develop such an architecture. The latter provide CASE tools to generate the code, instead of providing mechanisms to discover, select, and reuse existing autonomous and heterogeneous services (Huhns & Singh, 2005). The former, based on different wrapping techniques of existing applications, yield brittle Web services.

Meanwhile, with respect to business information systems, relational databases, the most successful thanks to the standard language SQL, store objects such as procedures, triggers, and especially tables intended to provide applications with services, namely data manipulation. Factoring these services in a cohesive way is a potential, practical approach to interface them as Web services. This is a big step towards moving to a service-oriented architecture, where existing code becomes loosely coupled and can be readily reused, which reduces the complexity and cuts the time of development. This approach is practically a reverse engineering process that transforms a relational database into a sound and complete set of basic, well-specified Web services. It requires two design patterns, namely the transformation pattern and the CRUD operations pattern, to ensure the soundness and completeness of the set of basic Web services. The next section presents the elements to develop such an approach.

3 Reverse engineering process for specifying Web services

Reverse engineering is a software engineering technique that consists of extracting the design specifications of an existing system in order to allow its maintenance (or building), taking into account a new architecture (e.g., multi-tier architecture, SOA), a new paradigm (e.g., object-orientation, service-orientation), a new data model (e.g., relational/object, object model, XML), a new technology (e.g., XML, Web services), or simply a change in business requirements in terms of information and business rules. The interest and rationale of reverse engineering come from the difficulty and the high cost of maintaining large software systems or building them from scratch (Van den Brandt, Klint, & Verhoef, 1997). The goal is to mechanically use the past development efforts to reduce maintenance expense and improve software flexibility (Premerlani & Blaha, 1994). Reverse engineering is applicable not only to diverse software such as programming code, but also to databases, where various reasons exist for reverse engineering a database (e.g., from hierarchical/network to relational, from relational to object-oriented, from relational to XML). This paper focuses on reverse engineering running relational databases, the most successful databases thanks notably to SQL, with respect to SOC in order to allow an SOA architecture of database applications. In SOA architecture,

Table 1 Description of some business objects and the events they undergo

Object | Attributes | Events | Effect of the events
Customer | [cid, name, billing address, shipping address, type, balance] | New customer | Insert customer
 | | Change billing address | Billing address = new billing address
 | | Change shipping address | Shipping address = new shipping address
 | | Drop customer | Delete customer
 | | Payment | Balance = balance − paid amount
 | | Return items | Balance = balance − returned amount
Product | [pid, designation, stock, price, code tax, value tax] | New order | Quantity = quantity − quantity ordered
 | | Restock | Quantity = quantity + quantity purchased
Order | [oid, date, owner, status, amount] + many occurrences of items (quantity, price, discount, subtotal) | New order | Insert order with Status = 1 ‘Received’
 | | Check balance | Status = 2 ‘Checked’ if balance OK
 | | Prepare order | Status = 3 ‘Prepared’
 | | Ship order | Status = 4 ‘Shipped’
 | | Return items | Status = 5 ‘Returned’
 | | Payment | Status = 6 ‘paid’

the applications can access databases through Web services instead of embedded, tightly coupled SQL statements. In this context, the reverse engineering process is intended to produce a sound and complete set of cohesive, loosely coupled, well-interfaced, basic Web services implementing common business logic and data manipulation. The resulting Web services allow automated discovery, selection, and binding for a dynamic composition of either coarse-grained Web services or e-business applications. The next subsections introduce the running case, the motives, and the design patterns upon which the reverse engineering process is based.

3.1 Running case

The following tables and snapshot summarize the main elements of the running case. Table 1 shows the attributes describing the business objects ‘Customer’, ‘Product’, and ‘Order’, along with some of the business events they undergo, identified in the scope of the business process ‘Order entry’. Each event has an effect on some objects. For instance, the ‘Payment’ event modifies the object ‘Customer’, specifically its attribute ‘balance’. Figure 2 shows a snapshot of the corresponding relational schema. Table 2 shows a database instance made up of two customers, two products, three orders, and four items (an item is an order line).

Fig. 2 A Microsoft Access snapshot showing the relationships between the relational tables

Table 2 A database instance

Customer
cid | name | ba | sa | type | balance
10 | N1 | BA1 | SA1 | T1 | 100
20 | N2 | BA2 | SA2 | T1 | 500

Product
pid | des | price | stock | tax
P1 | HD | 10 | 20 | 1
P2 | PR | 50 | 10 | 2

Order
oid | date | status | cid | amount
100 | 12-10-05 | 4 | 10 | 1350
101 | 12-12-05 | 1 | 10 | 95
201 | 20-12-05 | 1 | 20 | 475

Item
oid | pid | quantity | discount
100 | P1 | 50 | 10%
100 | P2 | 20 | 10%
101 | P1 | 10 | 5%
201 | P2 | 10 | 5%

Tax
tax | val
1 | 50
2 | 100

3.2 Motivation: Service-oriented architecture as an architectural pattern

The aim is to move from the current architecture of applications, which embed tightly coupled SQL statements and use middleware APIs such as ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) to access relational databases, to a new SOA architecture of the applications supported by the SOC paradigm. In an SOA architecture of database applications, SQL statements are insulated from the main logic of the applications to become loosely coupled and reusable. That is, the applications will communicate with the database through Web services they select from a registry (e.g., UDDI), as shown in Fig. 3 and instantiated in Fig. 4. It is then the responsibility of the Web service providers, not the applications, to implement factored SQL statements representing common logic and data manipulation. The advantages of the proposed approach based on Web services are:

• Insulation of the SQL statements from the logic of the applications. That is, the implementations of CRUD operations, some stored procedures, and triggers with SQL are hidden from the applications. The applications will see only the interfaces of such CRUD operations. Fowler (2003) writes that “it’s wise to separate SQL access from the applications logic”.
• The SQL statements used in different applications can be factored and implemented as a reusable service instead of repeating the same SQL code. Indeed, the same SQL statement is often implemented many times across different applications. For instance, the implementations of the functions ‘getByType(String)’ or ‘getBalance(int)’ with SQL are used by different applications such as the ‘Order entry application’ shown in Fig. 4, or ‘Customer relationship management’ and other applications needing customers’ information. The factoring will enhance the reuse of code, which is accessed as a service through interfaces well specified with WSDL.
• Factored SQL statements become loosely coupled as they are specified with WSDL and implemented as Web services.
• The access to different databases is simplified through friendly, discoverable interfaces rather than embedded SQL. Although almost all programming languages can host SQL, Web services are most suitable to fit within any programming language, due notably to their underlying standards that allow them to be loosely coupled.
• The application developers deal neither with the problems of defining effective SQL queries and commands nor with the optimization of the code. It is up to the providers of the Web services to deal with them. This will lighten the load of developers who have difficulties with SQL.
• Improved data security and integrity by: (1) controlling indirect access to objects from non-privileged users with security privileges, and (2) ensuring that related actions are performed together in an atomic transaction, or not at all.

[Figure 3: layered comparison (Applications, Middleware, Databases). Current approach: applications A1 … An embed SQL and reach databases DB1 … DBm through middleware (ODBC, JDBC). Proposed approach (SOA): Web services implementing SQL are published to UDDI (1. Publish); applications A1 … An find them (2. Find) and bind to them (3. Bind); the Web services reach DB1 … DBm through middleware (ODBC, JDBC).]

Fig. 3 Current and proposed applications architectures compared

• Improved performance by exploiting shared SQL, i.e., avoiding recompiling and testing the same SQL statements for multiple users.
• Improved maintenance by upgrading or modifying SQL statements without affecting multiple applications.
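The first two advantages, insulation and factoring of SQL, can be illustrated with a minimal sketch. It is written in Python with the standard `sqlite3` module purely for self-containment; the paper's services are Web services described in WSDL, which this sketch does not attempt to reproduce. The class `CustomerService`, the helper `make_demo_db`, and the two client functions are hypothetical names; the rows come from the Customer table of Table 2, and the method names follow ‘getBalance(int)’ and ‘getByType(String)’.

```python
import sqlite3


class CustomerService:
    """Hypothetical facade: the only place where SQL on Customer is written."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_balance(self, cid: int) -> float:
        row = self._conn.execute(
            "SELECT balance FROM Customer WHERE cid = ?", (cid,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no customer with cid={cid}")
        return row[0]

    def get_by_type(self, ctype: str) -> list:
        rows = self._conn.execute(
            "SELECT cid FROM Customer WHERE type = ? ORDER BY cid", (ctype,)
        )
        return [r[0] for r in rows]


def make_demo_db() -> sqlite3.Connection:
    # In-memory copy of the Customer rows from Table 2.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer (cid INTEGER PRIMARY KEY, name TEXT, "
        "ba TEXT, sa TEXT, type TEXT, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?, ?, ?, ?)",
        [(10, "N1", "BA1", "SA1", "T1", 100.0),
         (20, "N2", "BA2", "SA2", "T1", 500.0)],
    )
    return conn


# Two client applications reuse the same service and contain no SQL.
def order_entry_balance_ok(service: CustomerService, cid: int) -> bool:
    # Assumed business rule for illustration: a non-negative balance is acceptable.
    return service.get_balance(cid) >= 0


def crm_customers_of_type(service: CustomerService, ctype: str) -> list:
    return service.get_by_type(ctype)
```

If the SELECT statements are later tuned or rewritten, only `CustomerService` changes; both client functions stay untouched, which is the maintenance benefit named in the last item of the list.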

3.3 Design patterns

A pattern defines a recurring solution to a recurring problem in some context (Booch et al., 1999). That is, a design pattern represents a well-known solution to common design problems in a given context. There are several categories of patterns. Generally, each category corresponds to a certain level of abstraction. For instance, architectural patterns deal with the global properties and architecture of systems at a high abstraction level. Design patterns are widely used at lower levels of abstraction; for instance, they concern the process of designing component-based systems in which reusable units must be identified (Crnkovic, Hnich, Jonsson, & Kiziltan, 2002). The most famous collection of object-oriented design patterns is contained in the book of Gamma et al. (1995), where 23 design patterns were collected and documented. Therefore, the proposed solution uses an architectural pattern, where the service-oriented architecture presents a solution to distributed computing problems as shown in the previous section and depicted in Figs. 3 and 4, and two well-known design patterns: the schema transformation pattern and the CRUD operations pattern.

[Figure 4: the customer service publishes its WSDL description to the registered services (UDDI) (1. Publish); the client, the order entry application, finds it (2. Find, UDDI) and binds to it (3. Bind, SOAP/HTTP), sending the request (getBalance, C1) and receiving the response (balance) over SOAP.

Client (Order Entry Application), pseudocode:
  // select a service
  while (service not found) { select a service }
  …
  // invoke the selected service
  Double bal = getBalance(10)
  …
  // use the response
  if bal < 0 {…} else {…}
  …

Service Customer, interface:
  Collection getByType(String condition)
  Double getBalance(int id)
  Boolean updateBalance(Double amount)
  …
  Boolean Insert1_Customer(Collection values)
  Boolean delete(String condition)
  …

Service Customer, implementation: logic using SQL over the DB tables Customer, Order, Product, Item, and procedures.]

Fig. 4 Web service architecture instantiated
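The publish, find, and bind interaction of the architectural pattern (Figs. 3 and 4) can be sketched with a toy in-memory registry. This is an illustration under stated assumptions, not the UDDI or SOAP APIs: the `Registry` class and its three methods are hypothetical stand-ins, and the ‘accepted’/‘rejected’ outcomes are invented placeholders for the branches that Fig. 4 leaves elided.

```python
from typing import Callable, Dict, List


class Registry:
    """Toy in-memory stand-in for a service registry; not the UDDI API."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Callable]] = {}

    def publish(self, service_name: str, operations: Dict[str, Callable]) -> None:
        # 1. Publish: the provider advertises the operations it offers.
        self._entries[service_name] = dict(operations)

    def find(self, operation_name: str) -> List[str]:
        # 2. Find: the client looks up services offering an operation.
        return sorted(
            name for name, ops in self._entries.items() if operation_name in ops
        )

    def bind(self, service_name: str, operation_name: str) -> Callable:
        # 3. Bind: the client obtains something it can invoke.
        return self._entries[service_name][operation_name]


def order_entry_check(registry: Registry, cid: int) -> str:
    """Client side of Fig. 4: select a service, invoke it, use the response."""
    candidates = registry.find("getBalance")
    if not candidates:
        raise LookupError("no registered service offers getBalance")
    get_balance = registry.bind(candidates[0], "getBalance")
    bal = get_balance(cid)
    # Placeholder outcomes; the figure does not say what each branch does.
    return "rejected" if bal < 0 else "accepted"


def demo_registry() -> Registry:
    # Provider side: balances for customers 10 and 20 follow Table 2;
    # customer 30 with a negative balance is invented to exercise the other branch.
    balances = {10: 100.0, 20: 500.0, 30: -5.0}
    registry = Registry()
    registry.publish("CustomerService", {"getBalance": balances.__getitem__})
    return registry
```

The client never names the provider in advance; it depends only on the operation name it looks up, which is the loose coupling the pattern is meant to deliver.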

3.3.1 Transformation pattern

In the context of reverse engineering relational databases, the schema transformation pattern helps in the identification stage of the services. The transformation pattern consists of mapping a normalized relational schema, specified as a set of relation schemas and a set of constraints (domains, primary keys, and foreign keys), into a set of named, basic services. It transforms each relation schema into one basic service. That is, the organization and interfacing of the services are based on the database table descriptions in the data dictionary, so that we get one service per table. This transformation pattern (one table–one service) ensures:

(1) Cohesion of the services, because the attributes of a normalized relation schema are already aggregated using the functional dependency constraints. That is, these attributes generally undergo the same set of business events or use cases, which presages a set of predefined, related operations as a unit of specification.

(2) A uniform interface to each table: the service interface will form a uniform access to the database table. For instance, the customer table will become a service whose interface is described in WSDL and used by all the applications that manipulate the customer, as shown in Fig. 4, which represents an example of mapping the customer table into a Web service named customer service.

3.3.2 CRUD operations pattern

The CRUD operations pattern helps in the specification stage of the abstract part of the named services. That is, the operations related to one table (one service) are specified in terms of input, output or exception messages, and then aggregated into port types as shown in Fig. 1. First, the CUD (create, update, delete) part of CRUD constitutes a common set of operations. Each row of a relation instance (table) represents the state of an object or a relationship between objects. An object has a life cycle, i.e., its state is initialized once, modified many times, and dropped once:

• The create operation implements the instantiation and initialization of the state. For instance, the object ‘Order’ is initialized once, i.e., when the ‘New order’ event occurs.
• The update operation implements the different changes in the state. A state may change after an event has happened, i.e., a post-event change, or before the event occurs, i.e., a pre-event change. The pre-event change in the state generally triggers the event or assists its realization. For instance, the object ‘Order’ undergoes many events as shown in Table 1. The status ‘received’ is set after the event ‘New order’ has happened, whereas the status ‘prepared’ is set before the event ‘Shipment’ occurs; it triggers and assists it.
• The delete operation implements the drop operation. This operation is invoked whenever an object is no longer useful.
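The life cycle just described (initialized once, modified many times, dropped once) can be sketched for the ‘Order’ object. This is a minimal sketch assuming Python's standard `sqlite3` module and a simplified Order table; the class `OrderService` and its method names are hypothetical, and `get_status` is included only so the effect of each CUD operation can be observed. The status codes are those of Table 1.

```python
import sqlite3

# Status codes as listed in Table 1.
STATUS = {1: "Received", 2: "Checked", 3: "Prepared",
          4: "Shipped", 5: "Returned", 6: "Paid"}


class OrderService:
    """Hypothetical basic service exposing the CUD operations for Order."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # "Order" is quoted because ORDER is an SQL keyword.
        conn.execute(
            'CREATE TABLE IF NOT EXISTS "Order" (oid INTEGER PRIMARY KEY, '
            "date TEXT, status INTEGER, cid INTEGER, amount REAL)"
        )

    def create(self, oid: int, date: str, cid: int, amount: float) -> bool:
        # 'New order' event: the state is initialized once, with status 1.
        self._conn.execute(
            'INSERT INTO "Order" VALUES (?, ?, 1, ?, ?)', (oid, date, cid, amount)
        )
        return True

    def update_status(self, oid: int, status: int) -> bool:
        # The state is modified many times as events occur.
        if status not in STATUS:
            raise ValueError(f"unknown status code {status}")
        cur = self._conn.execute(
            'UPDATE "Order" SET status = ? WHERE oid = ?', (status, oid)
        )
        return cur.rowcount == 1

    def delete(self, oid: int) -> bool:
        # The object is dropped once, when it is no longer useful.
        cur = self._conn.execute('DELETE FROM "Order" WHERE oid = ?', (oid,))
        return cur.rowcount == 1

    def get_status(self, oid: int):
        # Observation helper; returns None when the order does not exist.
        row = self._conn.execute(
            'SELECT status FROM "Order" WHERE oid = ?', (oid,)
        ).fetchone()
        return None if row is None else STATUS[row[0]]
```

For example, creating order 100, moving it to status 3, and deleting it walks through the three CUD operations in the order of the life cycle.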

Then this set is completed by many R (retrieve) operations in accordance with the context requirements, because we need to know the state of the objects in order to trigger or to assist the realization of some events. That is, all the pre-event states must be provided by their respective objects as services. For instance, ‘Shipment’ is an event; it cannot be triggered and realized if the object ‘Order’ cannot provide its state ‘prepared’. Likewise, the event ‘Payment’ cannot be triggered if the order status is not ‘shipped’. That is, the retrieve operation is basically used to trigger business events, which constitutes a means to identify them. The CRUD operations pattern ensures:

(1) A specification of the portTypes for each service, i.e., each CRUD operation is mapped into a portType operation. For instance, the portType of the customer service provided in Fig. 4 and Table 5 is made up of a set of CRUD operations related to the customer table in the database.

(2) A specification of each operation in terms of input message, output message, or exception (fault) message, i.e., the input and output parameters of a CRUD operation are mapped into parts (attributes) of the input and output messages referenced by the operation, as shown in Table 5.

(3) Generation of the implementation of the CRUD operations provided by the service in terms of generic SQL statements. The SQL statements that access the database table are embedded into the services, which are registered so that they are easy for the applications to find and use.

The transformation pattern and the CRUD operations pattern will be used as techniques to reverse engineer relational databases.

3.4 Engineering process

The process of identifying Web services is mainly a reverse engineering process, based on the transformation and CRUD patterns. Later on, a forward engineering process will reuse the resulting Web services to compose and implement more coarse-grained Web services, business processes, or any e-business solutions. The process consists of mapping running relational databases into a set of basic, fine-grained Web services as shown in Fig. 5. These Web services are intended to unlock both common business logic and data that are locked within existing legacy systems built at high cost over years, notably the relational databases, the most predominant in today’s market.
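The mapping from a running relational database to named basic services (one table, one service) can be sketched by reading the data dictionary. The sketch below assumes SQLite as the database, so it uses SQLite's `sqlite_master` catalog and its `table_info` and `foreign_key_list` pragmas; other database systems expose their dictionaries differently. The function names, the `<Table>Service` naming convention, and the simplified demo schema are assumptions made for illustration, not the paper's CASE tool.

```python
import sqlite3


def identify_services(conn: sqlite3.Connection) -> dict:
    """Map each user table to one named service description."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    services = {}
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        key_cols = sorted((c for c in cols if c[5] > 0), key=lambda c: c[5])
        services[f"{table}Service"] = {
            "table": table,
            "attributes": [c[1] for c in cols],
            "primary_key": [c[1] for c in key_cols],
            "foreign_keys": {fk[3]: fk[2] for fk in fks},
        }
    return services


def build_demo_schema(conn: sqlite3.Connection) -> None:
    # Simplified version of the running case schema (assumed column types).
    conn.executescript('''
        CREATE TABLE Customer (cid INTEGER PRIMARY KEY, name TEXT, balance REAL);
        CREATE TABLE Product (pid TEXT PRIMARY KEY, des TEXT, price REAL,
                              stock INTEGER);
        CREATE TABLE "Order" (oid INTEGER PRIMARY KEY, date TEXT, status INTEGER,
                              cid INTEGER REFERENCES Customer(cid));
        CREATE TABLE Item (oid INTEGER REFERENCES "Order"(oid),
                           pid TEXT REFERENCES Product(pid),
                           quantity INTEGER, discount REAL,
                           PRIMARY KEY (oid, pid));
    ''')
```

At this point each service is only identified and named; its operations and messages are still unspecified, which is the job of the CRUD operations pattern.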

3.4.1 Reverse engineering steps

The reverse engineering process is made up of three main steps that reverse the relational schema into a sound and complete set of cohesive, loosely coupled, well-interfaced, basic Web services.

Step 1 The schema transformation pattern identifies the services by transforming each relation schema into a service. It consists of a mapping service that takes a relation schema along with its related constraints (domains, primary keys, and foreign keys) as input and provides an identified, named, basic Web service as output. At this stage, the abstract part, namely the service portType, has not yet been specified. For instance, the tables ‘Customer’, ‘Order’, ‘Product’ and ‘Item’ are candidates to become services.

Step 2 The CRUD operations pattern specifies the abstract part, i.e., the portType, including the aggregated operations and the referenced messages, of the services identified in the first step, by identifying their respective sets of operations and the input and output messages of each operation. The CUD (create, update, delete) part of CRUD constitutes a minimal set of operations. This minimal set is then completed by as many R (retrieve) operations as the application requires, as shown in Table 3. Table 4 shows a template of the portType specification corresponding to each relation schema, whereas Table 5 shows the Web services related to the customer table in the running case.

Step 3 Implement and deploy the resulting Web services, then publish their specification.

Fig. 5 The engineering process. [Diagram: reverse engineering starts from the relational schema (relation schemas, primary keys, foreign keys) and goes through Step 1 (apply transformation patterns, yielding named Web services), Step 2 (apply CRUD patterns, with business events) and Step 3 (implement and deploy), producing a sound and complete set of cohesive, loosely coupled, well-interfaced Web services held in a registry; forward engineering takes a specified e-business solution to an implemented e-business solution with Web services.]

Table 3 Application of CRUD operations pattern

| Operation | Category | Interfaces specification | Explanation |
|---|---|---|---|
| Create | Insert a row giving the list of values | Boolean Insert1_tableName (String tn, o lv) | String tn: table name; o lv: list of values as object |
| Create | Insert a row giving a list of values and the list of attributes | Boolean Insert2_tableName (String tn, o la, o lv) | String tn: table name; o la: list of attributes as object; o lv: list of values as object |
| Create | Insert a row that contains values for foreign keys. We must check that a value for a foreign key exists as a value of a primary key if we enforce the referential integrity constraint | Two options: constraint checked by the system (e.g., SQL), or a user-defined interface such as Boolean Insert3_tableName (String tn, o la, o lv, o fk) | tn: table name; o la: list of attributes as object; o lv: list of values as object; o fk: list of values of foreign keys as object |
| Delete | Delete a row giving a primary key value, which is not referenced | Boolean Delete1_tableName (String tn, o pk) | String tn: table name; o pk: value of the primary key, which may be composed |
| Delete | Delete rows that satisfy some criteria but are not referenced | Boolean Delete2_tableName (String tn, String wc) | String tn: table name; String wc: where clause given as string |
| Update | Update a list of attributes that are not foreign keys giving a condition | Boolean Update1_tableName (String tn, o la, o lnv, String wc) | String tn: table name; o la: list of attributes as object; o lnv: list of new values as object; String wc: where clause given as string |
| Update | Update a list of attributes including foreign keys | Two options: constraint checked by the system (e.g., SQL for checking persistence of the referenced values), or a user-defined interface such as Boolean Update2_tableName (String tn, o ra, o rt) | String tn: table name; o ra: list of referenced attributes as object; o rt: list of referenced tables as object |
| Retrieve | Retrieve from one table | o Retrieve1_tableName (String tn, String wc) | o: returns an object as result; String tn: table name; String wc: where clause given as string |
| Retrieve | Retrieve from more than one table | o Retrieve2_table_name (o lt, String wc) | o: returns an object as result; o lt: list of given tables as object; String wc: where clause given as string |
| Retrieve | Retrieve specific attributes from more than one table | o Retrieve3_table_name (o lt, o at, String wc) | o: returns an object as result; o lt: list of given tables as object; o at: list of attributes; String wc: where clause given as string |
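The interfaces of Table 3 can be sketched as thin wrappers that generate generic SQL statements. The following is a minimal sketch, not the code generated by the CASE tool: it assumes SQLite as the database, renders the "o" (object) parameters as Python lists or dictionaries, and adds a hypothetical `params` argument so that values in the where clause are bound rather than concatenated. Table and attribute names are interpolated into the SQL text and are therefore assumed to come from the data dictionary, never from end users.

```python
import sqlite3


def insert2(conn, tn, la, lv):
    """Insert2_tableName: insert a row given attributes la and values lv."""
    cols = ", ".join(f'"{a}"' for a in la)
    marks = ", ".join("?" for _ in lv)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(f'INSERT INTO "{tn}" ({cols}) VALUES ({marks})', list(lv))
        return True
    except sqlite3.Error:
        return False


def delete1(conn, tn, pk):
    """Delete1_tableName: pk maps key attributes to values (may be composite)."""
    cond = " AND ".join(f'"{a}" = ?' for a in pk)
    try:
        with conn:
            cur = conn.execute(f'DELETE FROM "{tn}" WHERE {cond}', list(pk.values()))
        return cur.rowcount == 1
    except sqlite3.Error:
        return False


def update1(conn, tn, la, lnv, wc, params=()):
    """Update1_tableName: set attributes la to new values lnv where wc holds."""
    sets = ", ".join(f'"{a}" = ?' for a in la)
    try:
        with conn:
            cur = conn.execute(f'UPDATE "{tn}" SET {sets} WHERE {wc}', [*lnv, *params])
        return cur.rowcount > 0
    except sqlite3.Error:
        return False


def retrieve1(conn, tn, wc, params=()):
    """Retrieve1_tableName: return the rows of one table that satisfy wc."""
    return conn.execute(f'SELECT * FROM "{tn}" WHERE {wc}', tuple(params)).fetchall()


# Demo schema and data (assumptions, loosely following the running case).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # let the system check referential integrity
conn.executescript(
    """
    CREATE TABLE Customer (cid INTEGER PRIMARY KEY, name TEXT, bal REAL);
    CREATE TABLE "Order" (
        oid INTEGER PRIMARY KEY,
        cid INTEGER NOT NULL REFERENCES Customer(cid),
        status TEXT
    );
    """
)
ok_customer = insert2(conn, "Customer", ["cid", "name", "bal"], [10, "Ada", 250.0])
ok_order = insert2(conn, "Order", ["oid", "cid", "status"], [1, 10, "prepared"])
dangling = insert2(conn, "Order", ["oid", "cid", "status"], [2, 99, "prepared"])
referenced = delete1(conn, "Customer", {"cid": 10})
updated = update1(conn, "Customer", ["bal"], [300.0], "cid = ?", (10,))
rows = retrieve1(conn, "Customer", "cid = ?", (10,))
can_ship = bool(retrieve1(conn, "Order", "oid = ? AND status = ?", (1, "prepared")))
```

Here the referential integrity constraint is checked by the system (the first option in Table 3), so the insert with a dangling foreign key and the delete of a referenced row both return false. The last line shows a retrieve operation exposing the pre-event state ‘prepared’ that the ‘Shipment’ event needs.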

3.4.2 Forward engineering process

The forward engineering process reuses the resulting Web services in different types of compositions, such as coarse-grained Web services, applications, business processes, or new e-business solutions. Such composition is no longer an issue today. For instance, each e-business solution may be specified in a Web services flow language such as BPEL4WS, which can invoke the repository to obtain the basic Web services required for its implementation.

Table 4 Web services specification for each table: a template

| Port type | Operation | Message in | Message out |
|---|---|---|---|
| Table_name = Named service | Insert1_tableName | tn: string; lv: complex data type | ack: boolean |
| | Insert2_tableName | tn: string; lv: complex data type; la: complex data type | ack: boolean |
| | Insert3_tableName | tn: table name; la: list of attributes as object; lv: list of values as object; fk: list of values of foreign keys as object | ack: boolean |
| | Delete1_tableName | tn: string; pk: complex data type | ack: boolean |
| | Delete2_tableName | tn: string; wc: string | ack: boolean |
| | Update1_tableName | tn: string; la: complex data type; lnv: complex data type; wc: string | ack: boolean |
| | Update2_tableName | tn: table name; ra: list of referenced attributes as object; rt: list of referenced tables as object | ack: boolean |
| | As many Retrieve_table_name as decided by the developer | tn: string; lt: list of tables; wc: string; at: list of attributes | res: simple or complex data type |

4 CASE tool specification

The reverse engineering process is implemented as a CASE tool that assists the identification, specification, and implementation of a sound and complete set of Web services that provide basic, common business logic. The resulting Web services are stored in a repository, which provides a means for discovery, selection, binding, composition, and management of Web services. Figure 6 is a sequence diagram that shows how the CASE tool assists users in specifying and generating their Web services, whereas the five snapshots show how the interface for any table (e.g., customer) is built. This interface constitutes the portType of the WSDL.
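To make the link between a table interface and a WSDL portType concrete, the sketch below builds a WSDL 1.1 portType element for some of the operations of Table 4. It is an illustration under assumptions rather than the output of the CASE tool: the message names (`...Request`, `...Response`, `ServiceFault`), the `tns` prefix and the `<Table>PortType` naming rule are hypothetical, and the message and type definitions that a complete WSDL document needs are omitted.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace
ET.register_namespace("wsdl", WSDL_NS)


def port_type(table, operations):
    """Build a portType element with one operation per CRUD operation."""
    pt = ET.Element(f"{{{WSDL_NS}}}portType", {"name": f"{table}PortType"})
    for op in operations:
        name = f"{op}_{table}"  # e.g. Insert1_Customer, as in Table 5
        el = ET.SubElement(pt, f"{{{WSDL_NS}}}operation", {"name": name})
        # Message names below are hypothetical placeholders.
        ET.SubElement(el, f"{{{WSDL_NS}}}input", {"message": f"tns:{name}Request"})
        ET.SubElement(el, f"{{{WSDL_NS}}}output", {"message": f"tns:{name}Response"})
        ET.SubElement(
            el,
            f"{{{WSDL_NS}}}fault",
            {"name": "Exception", "message": "tns:ServiceFault"},
        )
    return pt


crud_operations = ["Insert1", "Insert2", "Delete1", "Delete2", "Update1"]
customer_port_type = port_type("Customer", crud_operations)
xml_text = ET.tostring(customer_port_type, encoding="unicode")
print(xml_text)
```

Each operation carries an input message, an output message and a fault message, mirroring the "Message in" and "Message out/fault" columns of Tables 4 and 5.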

Snapshot 1 Main interface of the CASE tool


Snapshot 2 Select ‘Create interface’ after connecting to the data dictionary

Snapshot 3 Select a table from the list of tables in the data dictionary (e.g., customer)

Snapshot 4 Select an operation among the CRUD operations (e.g., insert)


Snapshot 5 Create the interface (e.g., customer interface)

Table 5 Transformation of the tables of the running case into Web services: example of the customer table

| Port type | Operation | Message in | Message out/fault |
|---|---|---|---|
| Customer | Insert1_Customer | tn: string (e.g., table Customer); lv: complex data type (e.g., ) | ack: Boolean or Exception |
| | Insert2_Customer | tn: string (e.g., table Customer); la: complex data type (e.g., (CID, NAME, BA, TY)); lv: complex data type (e.g., ) | ack: Boolean or Exception |
| | Delete1_Customer | tn: string (e.g., table Customer); pk: complex data type (e.g., ) | ack: Boolean or Exception |
| | Delete2_Customer | tn: string (e.g., table Customer); wc: string (e.g., where cid = 20) | ack: Boolean or Exception |
| | Update1_Customer | tn: string (e.g., table Customer); la: complex data type (e.g., BAL); lnv: complex data type (e.g., ); wc: string (e.g., where cid = 10) | ack: Boolean or Exception |
| | getBalance_Customer | tn: string (e.g., table Customer); lt: list of tables (e.g., table Customer); wc: string (e.g., where cid = 20); at: list of attributes (e.g., NAME) | res: Double or Exception |

Fig. 6 Interactions diagram (CASE tool-end user). [Sequence diagram. Lifelines: User, Table Frame, Attribute Frame, Schema, CRUD Frame, Interface Frame, Repository. Messages: 1: click_table(); 2: find_table(string); 3: find_attribute(); 4: click_interface(); 5: find_interface(string); 6: enable_interface(); 7: enable_CRUD(); 8: select_CRUD(string); 9: select_attribute(); 10: add_operation(string); 11: register_operation(string); 12: validate_interface(string); 13: register_interface(string).]

5 Related work

Many projects are concerned with reverse engineering. Van den Brandt et al. (1997) compiled an annotated bibliography grouped by topic, including topics related to reverse engineering data models, especially the relational model, in order to move to: (C1) a new data model, paradigm, or technology, such as the object-oriented data model or XML, or (C2) a new system architecture, such as a multi-tier architecture or a service-oriented architecture.

In the first category (C1), Premerlani & Blaha (1994) describe an approach and a tool to reverse engineer relational databases into OMT (Object Modeling Technique) diagrams in a partially automated fashion. Fowler (2003) describes a set of patterns for mapping a relational database schema into a set of classes. Collins et al. (2002) describe a set of algorithms that map relational and network models into XML schemas to make them available to a wide range of Web-based systems and applications.

The projects closest to our approach are those of the second category (C2). In the Strudel project developed by Fernandez, Florescu, Kang, Levi, and Suciu (1998), existing relational databases are wrapped into the Strudel internal format to Web-enable legacy data repositories. Polo, Gomez, Piattini, and Ruiz (2002) describe a formal and practical approach to generating three-tier Web applications from relational databases. The authors combine the three-tier architecture, the transformation patterns, and the CRUD patterns to generate a set of classes representing mainly the middle tier, i.e., the business logic. Baghdadi (2005) describes a process that generates Web services from the attributes describing the business objects and coordination artifacts, as described at the highest abstraction level of the business model, i.e., the universe of discourse. The attributes are aggregated according to a time/space constraint called factual dependency. Each aggregation of factually dependent attributes is validated with regard to an actual business event. The aggregation is then interfaced to yield a well-specified Web service.

Our approach concerns reverse engineering with respect to service-oriented computing as realized by the service-oriented architecture. It is a novel approach that extends and reuses the concepts of previous works, but its main goal is to propose an architecture within which the existing code is first insulated from the application logic, then factored, and finally reused by different applications. That is, instead of generating classes with respect to an object-oriented paradigm, it generates a set of interfaces that are implemented by a sound and complete set of cohesive, loosely coupled, well-interfaced, multipurpose, basic Web services organized to be reused by any kind of application, specifically with respect to SOA. The generation process enforces the requirements of service-oriented computing. Besides, the approach is implemented as a CASE tool that assists analysts in specifying, designing, and managing their own informational Web services. Such a category of Web services will require the analysts to specify the usage of the Web services, especially the business processes or any other solutions that are composed from Web services.


6 Conclusion

This paper has presented a reverse engineering process to identify and specify basic Web services from an existing relational schema. The approach aims at moving database applications to a service-oriented architecture, whereby the SQL statements that implement business logic and data manipulation, and that are embedded within applications, are insulated from these applications, factored, and specified as Web services to be discovered and reused by a large number of applications. The approach is based on a well-known architectural pattern, the service-oriented architecture, and on two design patterns, the transformation pattern and the CRUD operations pattern. The process produces a sound and complete set of cohesive, loosely coupled, well-interfaced, basic Web services implementing business logic and data access.

The reverse engineering process is implemented as a CASE tool that assists users in deciding the specification and design of their basic Web services. It also helps in generating the SQL code related to the CRUD operations. The resulting set of basic Web services is intended to be reused, in a forward engineering process, to compose more coarse-grained Web services or any e-business solution, including internal or cross-business processes. This is a simple and interesting approach for organizations willing to move to a service-oriented architecture by leveraging their most valuable assets. This work can be extended to the management of the resulting Web services and to forward engineering, as part of a comprehensive reengineering process that steadily realizes the service-oriented paradigm.

References

Altman, R. (2003). The challenge of Web services. Business Integration Journal, July, 59.
Arsanjani, A. (2002). Developing and integrating enterprise components and services. Communications of the ACM, 45(10), 31–34.
Arsanjani, A. (2004). Service-oriented modeling and architecture. http://www-128.ibm.com/developerworks/webservices/library/ws-soa-design1/.
Austin, D., Barbir, A., Ferris, C., & Garg, S. (2002). Web Services architecture requirements. W3C Working Group Draft 14.
Baghdadi, Y. (2005). A business model for deploying Web services: A data-centric approach based on factual dependencies. Journal of Information Systems and e-Business, 3(2), 151–173.
Barry, D. K. (2003). Web services and service-oriented architecture. USA: Elsevier.
Booch, G., Rumbaugh, J., & Jacobson, I. (1999). The unified modelling language user guide. Addison Wesley.
Chen, M., Chen, A. K. N., & Shao, B. B. M. (2003). Implications and impacts of Web services to EC research and practices. Journal of Electronic Commerce Research, 4(4), 128–139.
Collins, S. R., Navathe, B., & Mark, L. (2002). XML schema mappings for heterogeneous database access. Information and Software Technology, 44(4), 251–257.
Crnkovic, I., Hnich, B., Jonsson, T., & Kiziltan, Z. (2002). Specification, implementation, and deployment of components. Communications of the ACM, 45(10), 35–40.
Curbera, F., Khalaf, R., Mukhi, N., Tai, S., & Weerawarana, S. (2003). The next step in Web services. Communications of the ACM, 46(10), 29–34.
Endrei, M., et al. (2004). Patterns: Service-oriented architecture and Web services. http://www.IBM.com/Redbooks.
Fernandez, M., Florescu, D., Kang, J., Levi, A., & Suciu, D. (1998). Catching the boat with strudel: Experience with Web site management system. SIGMOD Record, 27(2), 414–425.
Fowler, M. (2003). Enterprise application architecture: Mapping relational databases. Addison Wesley Professional.
Fremantel, P., Weerawarana, S., & Khalaf, R. (2002). Enterprise services. Communications of the ACM, 45(10), 77–82.
Gamma, et al. (1995). Design patterns: Elements of reusable object-oriented software. Reading, Massachusetts: Addison Wesley.
Huang, Y., & Chung, J. Y. (2002). A Web services-based framework for business integration solutions. Electronic Commerce Research and Applications, 2(1), 15–26.
Huhns, M. N., & Singh, M. P. (2005). Service-oriented computing: Key concepts and principles. IEEE Internet Computing, Jan/Feb, 9(1), 75–81.
Kreger, H. (2003). Fulfilling the Web services promise. Communications of the ACM, 46(6), 29–34.
Levi, K., & Arsanjani, A. (2002). Goal-driven approach to enterprise component identification and specification. Communications of the ACM, 45(10), 45–52.
Meredith, L. G., & Bjorg, S. (2003). Contracts and types. Communications of the ACM, 46(10), 41–48.
Papazoglou, M. P. (2003). Web services and business transactions. WWW: Internet and Web Information Systems, 6(1), 49–91.
Papazoglou, M. P., & Georgakopoulos, D. (2003). Service-oriented computing. Communications of the ACM, 46(10), 25–28.
Papazoglou, M. P., & Yang, J. (2002). Design methodology for Web services and business process. Proceedings of the third international workshop on technologies for E-services, 54–64.
Polo, M., Gomez, J. A., Piattini, M., & Ruiz, F. (2002). Generating three-tier applications from relational databases: A formal and practical approach. Journal of Information and Software Technology, 44, 923–941.
Premerlani, W. J., & Blaha, M. R. (1994). An approach for reverse engineering relational databases. Communications of the ACM, 37(5), 42–49.
Smith, D. (2004). Web services enable service-oriented and event-driven architectures. Business Integration Journal, May, 12–14.
Stal, M. (2002). Web services: Beyond component-based computing. Communications of the ACM, 45(10), 71–76.
Sycara, K., Paolucci, M., Ankolekar, P., & Srinivasan, N. (2003). Automated discovery, interaction and composition of semantic Web services. Journal of Web Semantics, 1(1), 27–46.
Van den Brandt, M. G. J., Klint, P., & Verhoef, C. (1997). Reverse engineering and system renovation: An annotated bibliography. ACM SIGSOFT Software Engineering Notes, 22(1), 57–68.
Zimmermann, O., Korgdahl, P., & Gee, C. (2004). Elements of service-oriented analysis and design: An interdisciplinary approach for SOA projects. http://www-128.ibm.com/developerworks/webservices/library/ws-soad1/.

Youcef Baghdadi has taught in many universities abroad. He is currently a research coordinator of the Department of Computer Science at Sultan Qaboos University in Oman. He holds a Ph.D. degree from the University of Toulouse in France. He is a member of the ACM and IEEE Computer Society. His research aims at bridging the gap between business and information technology (IT), namely in the areas of information systems (IS), cooperative IS, IT, web services, e-commerce and e-business. He has published articles in journals such as the Journal of Information Systems and eBusiness, International Journal of Electronic Business, Journal of Electronic Research and Applications, International Journal of Web and Grid Services, Journal of Informing Science and others.
