Product Documentation Management through REST-based Web Service
Filipe Rocha, Maria Leonilde R. Varela, Sílvio Carmo-Silva
Department of Production and Systems, School of Engineering, University of Minho, Azurém Campus, 4800-058 Guimarães, Portugal,
[email protected],
[email protected],
[email protected]
Abstract
Innovation is a creative process strongly associated with the development and deployment of new products, and it is essential to the success and sustainability of companies in today’s global economy. Collaboration among those involved in new product development (NPD) is an important requisite for product development and innovation. NPD involves not only the company and its customers but also suppliers, freelancers and other stakeholders. Strong collaboration and agile communication among all of them are critical to product development success, both because the world economy is highly competitive and because the data and information that need to be shared are distributed over networks of resources and users. Hence, web-based applications, decentralized repositories and databases are used to store and manage product and process development information, as well as the interaction among NPD stakeholders. To meet today’s NPD requirements, Internet-based collaborative tools and services must therefore be applied. Web services are important assets in product development: they help integrate data and knowledge bases and support process and application interactions. This paper reports on research into a document-oriented, web-based application for product development built on Apache CouchDB and REST web services. The application aims to manage product and process information, and to generate documentation, throughout the product development cycle in an Internet-based collaborative environment. An additional important function is keeping the product development history.
Keywords - Web services; product information; document management; CouchDB; REST.
17.1 Introduction
Innovation is closely associated with new product development (NPD), an activity that benefits greatly from collaboration among the stakeholders involved. Collaboration can be enhanced through agile tools and technology for fast, efficient information sharing and for carrying out collaborative tasks. Such tools and technology are mostly available through the Internet. In fact, the Internet can nowadays be considered the best channel for collaboration and knowledge exchange, and a path to companies’ sustainability and competitiveness [1]. This is true not only inside a corporation, where the actual new product development happens, but also for suppliers and customers innovating in a cooperative environment [1-4]. Zhang et al. [5] describe a web-based collaborative design platform that provides a product catalogue, a communication system between peers, knowledge access restrictions and project management. Web services [6] can be seen as a feasible interface between the parties involved in product innovation, helping to integrate data and knowledge bases, processes and application interactions [1,3,5,7].
One important aspect of this work is access to product and process information in the collaborative, distributed context of NPD. Accessing information for product design and development is not easy with relational databases (RDB). Tables reference other tables, and retrieving even a small piece of information can require considerable computational work to select and join data across this structure. RDBs are designed to store and report on highly structured, interrelated data. Moreover, their schemas, and the storage of the existing data, must be updated as needs evolve. This often causes problems that were not anticipated in the initial database design: each change to the initial schema is a challenge for local and distributed upgrades. The same happens with paperwork, when predefined paper forms have to be adapted by adding or removing information to fit their strict layout to the needs of the moment.
In this paper, we propose a new approach to managing new product development information. The approach is based on a document-oriented database (DoDB) with native distributed properties, as an alternative to the Relational Database Management System (RDBMS). Instead of the strict schema imposed by tables, we work with the next generation of NoSQL (Not only SQL) databases, which are schema-free, offer easy replication support and a simple API, and are capable of handling huge amounts of data [8]. For reasons described later in this paper, we chose an open source Apache project named Apache CouchDB, commonly referred to simply as CouchDB [9]. The remainder of this paper presents a literature review, the motivation for the work and the proposed approach for managing product information. A system architecture based on this approach is then described and illustrated, and finally a conclusion is presented.
174
Computational Intelligence Applications for Future Power Systems
17.2 Literature Review
Companies often use proprietary software, most of the time centralized, which carries high licence costs and a high risk of information loss. Aziz et al. [10] discuss the shortcomings of centralized architectures and propose a decentralized one based on open standards and open source tools, stating that such systems can be implemented at low cost. Sharma [3], addressing collaborative product innovation, presents a theoretical framework that identifies tools and processes for people and teams to collaborate on product development. Zhang et al. [5] reviewed the state of the art in Internet-based product information sharing and concluded that, although several Internet and web-based technologies had been applied to product development, none had yet been applied to real industrial applications. They also found a number of practical problems still unsolved, including dynamic product information collection and update, real-time collaboration, scalability and interoperability. These problems remain open because no real applications are available yet, probably owing to bottlenecks in the database back-ends. Relational databases have very strict schemas, which aggravates these problems when information is dynamic: a change to the initial schema implies a change to the whole system. Moreover, schemas cannot capture dynamic product information, because there is no predefined schema for each product or service. Scalability is a major problem with relational databases and can become a point of failure. In particular, replication is not trivial. To understand why, consider two database servers that need to hold identical data. If both servers accept reads and writes, it is difficult to synchronize changes. A master-slave arrangement is also problematic, because the master has to absorb the whole load when users are writing information [11].
This leads to the wish for a replication system with automatic conflict detection and resolution, one that makes it easy to synchronize data in both directions whenever desired, after having worked offline independently [12]. REST (REpresentational State Transfer) [13] is considered to have great potential for solving scalability problems [14-17]. Fielding [13] states that “The REST architectural style has been defined to describe the web architecture and to guide its future evolution, preserving the fundamental characteristics of scalability”. It promotes software evolvability, efficiency, performance and reliability [12] using the protocols already available for the web. Liu and Xu [18] review some traditional technologies and systems for the management of product information and discuss the need to integrate them in a web environment as a means of obtaining a more adaptive and flexible infrastructure. They also mention bandwidth and security concerns with data availability over the Internet. Unfortunately, they do not say how the reviewed systems handle changes
made to predefined data management models, therefore leaving important questions unanswered. Frutos and Borenstein [19] discuss mass customization and the role that customers should have in new product development. They report a web-based information system for customizing flats through interactions between the building company’s web site and its customers. Customers can generate a variety of customized solutions, i.e. new products, with control over costs, despite the limitations on flat variants. One way to address these limitations is the NoSQL concept, as in Not Only SQL [8]. Mateljan et al. [20] describe several approaches to this concept and propose an ROI analysis. Leavitt [21] describes the limitations of RDBMSs and enumerates the advantages and challenges of NoSQL databases, quoting several people’s views on them. The NoSQL concept has already been used in production in several fields. One database of this type, Apache CouchDB [9], can be of great utility in managing NPD information, since it is document-oriented and also makes RESTful web services easy to implement thanks to its internal REST functionalities [13, 22-24]. The next sections describe this work: the motivation behind it, our proposed approach and the system’s architecture.
17.3 Motivation
Where do we quickly store our information on a daily basis? Perhaps we take a quick note on a piece of paper, put it on a Post-it note, open a text editor and type it, or do whatever suits us best. These documents usually end up in a pile; we browse through them and still have trouble finding what we want, if we ever do. When this involves project documentation, documents reference other documents and back again. This is often seen as paper chaos. Even if the documents are well organized, we still need to spend time browsing them to find any information, going back and forth, and eventually we pick up yet another piece of paper to consolidate the information collected. In practice, this is how relational database systems work. Tables reference other tables, and retrieving even a small piece of information can require considerable computational work to select and join data across this structure. Many different approaches have also been developed to add structure back to semi-structured data, namely through relational databases, which are designed to store and report on highly structured, interrelated data. Moreover, in this kind of approach the schema and the storage of the existing data must be updated as needs evolve. This often causes problems as new needs arise that were simply not anticipated in the initial database design, i.e., each change to the initial database schema is a challenge for local and distributed upgrades. When dealing with paperwork, this is common and happens when we need
to adapt predefined paper forms by adding or removing information, fitting their strict layout to the needs of the moment. Therefore, in this paper, we propose a new approach to managing new product development information. The approach is based on document-oriented databases (DoDB) with native distributed properties, instead of the usual Relational Database Management System (RDBMS). This means that instead of the strict schema imposed by tables we work with documents and free schemas. It builds on NoSQL databases [8], namely an open source Apache project named Apache CouchDB, commonly referred to simply as CouchDB. CouchDB is an open source “distributed, fault-tolerant and schema-free document-oriented database, accessible via a RESTful HTTP/JSON API” [9]. The next sections explain a new approach to the needs described in the previous sections, followed by a description of a system architecture capable of fulfilling the need to organize and share documentation in new product development.
17.4 Proposed approach
In this work we propose CouchDB for managing NPD information. CouchDB is an open source “distributed, fault-tolerant and schema-free document-oriented database, accessible via a RESTful HTTP/JSON API” [9]. Data is stored in documents, represented as key-value maps using the data types of JavaScript Object Notation (JSON) [25]. CouchDB is designed to store and report on large amounts of semi-structured data, simplifying the development of document-oriented applications, which make up the bulk of collaborative web applications. A CouchDB database can be described as a flat collection of JSON documents, each identified by a unique ID (typically a UUID). Each document is an object that consists of named fields. Field values can be strings, numbers and dates, ordered lists and associative maps. Figure 17.1 illustrates a JSON document that includes some of these substructures. CouchDB should not be seen as an object-oriented database, even though relationships between documents are possible. With CouchDB, no schema is enforced, so new document types with new meaning can be safely added alongside the old ones. CouchDB is a fully Atomic Consistent Isolated Durable (ACID) [26] storage engine that never overwrites committed data or associated structures, ensuring that the database file is always in a consistent state. Document updates are serialized and there are no locks. Instead, CouchDB uses a Multi-Version Concurrency Control (MVCC) model, meaning that any number of clients can read documents without being locked out or interrupted by concurrent updates, even on the same document [27]. Transactional commits provide consistent state to the database,
i.e., if a failure occurs during a commit, the commit is discarded and maintenance operations are run. From time to time, compaction jobs are run to free disk space. CouchDB documents are indexed in B-trees, and each update to the database generates a new sequential number that identifies the document state in its history, represented by the _rev field shown in Figure 17.1.
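The revision mechanism described above can be illustrated with a minimal in-memory sketch in Python. It is a toy model under stated assumptions, not CouchDB’s storage engine: the class `TinyDocStore`, the exception `ConflictError` and the sample product fields are hypothetical, and the `_rev` value is modelled as an update counter followed by a short content digest.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class TinyDocStore:
    """Toy append-only store that mimics revision-checked updates."""

    def __init__(self):
        # doc id -> list of committed versions; old versions are never overwritten
        self._history = {}

    def get(self, doc_id):
        # Readers copy the latest committed version; no lock is taken.
        return dict(self._history[doc_id][-1])

    def put(self, doc):
        versions = self._history.get(doc["_id"], [])
        current_rev = versions[-1]["_rev"] if versions else None
        # The writer must quote the revision it read, otherwise the update is rejected.
        if doc.get("_rev") != current_rev:
            raise ConflictError(f"stale revision for {doc['_id']}")
        body = {k: v for k, v in doc.items() if k != "_rev"}
        payload = json.dumps(body, sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()[:8]
        new_doc = dict(body, _rev=f"{len(versions) + 1}-{digest}")
        self._history[doc["_id"]] = versions + [new_doc]
        return new_doc["_rev"]


store = TinyDocStore()
rev1 = store.put({"_id": "bracket-001", "type": "product", "name": "Mounting bracket"})
doc = store.get("bracket-001")
doc["materials"] = ["steel", "zinc coating"]  # new field, no schema change needed
rev2 = store.put(doc)
```

A second writer that still holds `rev1` would get a `ConflictError` on `put`, which is the point at which an application would re-read the document and retry.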
Fig. 17.1 JSON document in CouchDB
CouchDB is a peer-based distributed database system. Any number of CouchDB hosts (servers and offline clients) can have independent “replica copies” of the same database, on which applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally. CouchDB also has built-in conflict detection and management, and the replication process is incremental and fast, copying only the documents and individual fields changed since the previous replication. Therefore, an updated document inserted into one database should be replicated through the entire distributed database back-end and be easily retrieved. Hence, most applications require no special planning to take advantage of distributed updates and replication. With very little database work, it is possible to build a distributed document management application with granular security and full revision histories.
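The idea of incremental, bi-directional replication with conflict detection can be sketched as follows. This is a deliberately simplified Python model, assuming one integer revision counter per document and a per-source checkpoint; CouchDB’s actual replication works with revision trees, which are not modelled here. The `Replica` class and the document names are hypothetical.

```python
class Replica:
    """Toy replica: documents plus a log of (sequence number, doc id) updates."""

    def __init__(self, name):
        self.name = name
        self.docs = {}         # doc id -> (revision counter, body)
        self.seq = 0           # local update sequence
        self.changes = []      # one (seq, doc id) entry per stored update
        self.conflicts = []    # doc ids edited independently on two replicas
        self.checkpoints = {}  # source name -> last source seq already pulled

    def write(self, doc_id, body):
        rev = self.docs.get(doc_id, (0, None))[0] + 1
        self._store(doc_id, rev, body)

    def _store(self, doc_id, rev, body):
        self.docs[doc_id] = (rev, body)
        self.seq += 1
        self.changes.append((self.seq, doc_id))

    def pull_from(self, source):
        """Copy only what changed on source since the last pull."""
        since = self.checkpoints.get(source.name, 0)
        changed = {doc_id for seq, doc_id in source.changes if seq > since}
        for doc_id in sorted(changed):
            rev, body = source.docs[doc_id]
            local = self.docs.get(doc_id)
            if local is None or local[0] < rev:
                self._store(doc_id, rev, body)
            elif local[0] == rev and local[1] != body:
                self.conflicts.append(doc_id)  # same revision, different content
        self.checkpoints[source.name] = source.seq
        return len(changed)


office, laptop = Replica("office"), Replica("laptop")
office.write("spec-42", {"title": "Housing spec", "status": "draft"})
laptop.pull_from(office)  # the laptop then goes offline with its own copy
laptop.write("spec-42", {"title": "Housing spec", "status": "reviewed"})
office.write("bom-7", {"parts": 12})
# Back online: replicate in both directions.
sent_to_office = office.pull_from(laptop)
sent_to_laptop = laptop.pull_from(office)
```

After the two pulls both replicas hold the same documents, and only the documents touched since the previous checkpoint were examined.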
CouchDB also integrates a view model to add structure back to unstructured data. Views are the method of aggregating and reporting on the documents. CouchDB views are based on Google’s Map/Reduce programming model [28], but the results of a Map/Reduce task always reflect the current state of the documents in the database, because only the documents that have changed since the last index update are reindexed [29]. Figure 17.2 a) shows an example of a JavaScript map function that counts the elements of the array revisions seen in Figure 17.1. It produces view rows by calling emit(key, value). Figure 17.2 b) is the JSON result document for the map function just described. The field total_rows holds the number of result rows produced by the function. The offset field indicates where the list of rows starts. The rows array holds the key and value objects passed to the emit function; the id of the source document is added automatically. A reduce function can then be applied to the value fields; e.g., the built-in reduce function _sum returns the sum of all values in the view, and _stats computes various numerical statistics on the emitted data.
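The map and reduce steps described above can be mimicked in a few lines of Python. This is a sketch of the idea only, not the JavaScript function of Figure 17.2: the helper names (`map_revision_count`, `build_view`, `reduce_sum`), the `name` field and the sample documents are assumptions made for illustration, while `revisions`, `total_rows`, `offset` and `rows` follow the text.

```python
def map_revision_count(doc):
    """Map step: yield one (key, value) pair per document that has revisions."""
    if "revisions" in doc:
        yield doc["name"], len(doc["revisions"])


def build_view(docs, map_fn):
    """Apply map_fn to every document and return a view-like result."""
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in docs
        for key, value in map_fn(doc)
    ]
    rows.sort(key=lambda row: row["key"])  # rows are ordered by key
    return {"total_rows": len(rows), "offset": 0, "rows": rows}


def reduce_sum(view):
    """Reduce step: plays the role of the built-in _sum."""
    return sum(row["value"] for row in view["rows"])


docs = [
    {"_id": "a1", "name": "bracket", "revisions": ["r1", "r2", "r3"]},
    {"_id": "b2", "name": "housing", "revisions": ["r1"]},
    {"_id": "c3", "name": "note without revisions"},
]
view = build_view(docs, map_revision_count)
total = reduce_sum(view)
```

Documents that lack the `revisions` array simply produce no row, which is how a view can coexist with documents of different shapes in the same database.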
Fig. 17.2 JavaScript map function
Any of these documents can be easily retrieved with an HTTP client, e.g. a browser or cURL [30], since the communication protocol used by CouchDB is HTTP [31]. With the techniques, characteristics, functionalities and processes that CouchDB provides for producing and managing documents, it should be easy to build services that generate and use the information that needs to be shared in NPD. In the next section we propose a system architecture that should support this.
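As a hedged illustration of retrieval over HTTP, the Python sketch below only builds the requests and does not send them, so it runs without a server. The base URL assumes a local CouchDB instance on its default port 5984; the database name `products`, the document id, the revision string and the design document and view names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB, default port


def document_request(db, doc_id, rev=None):
    """Build (without sending) a GET request for one document."""
    url = f"{BASE_URL}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    if rev is not None:
        url += f"?rev={quote(rev, safe='')}"  # ask for a specific revision
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def view_request(db, design_doc, view_name):
    """Build (without sending) a GET request for the rows of a view."""
    url = (
        f"{BASE_URL}/{quote(db, safe='')}"
        f"/_design/{quote(design_doc, safe='')}"
        f"/_view/{quote(view_name, safe='')}"
    )
    return Request(url, headers={"Accept": "application/json"}, method="GET")


req = document_request("products", "bracket-001", rev="2-abc123")
print(req.get_method(), req.full_url)
```

Against a running server, passing such a request to `urllib.request.urlopen` and decoding the response body with `json.load` would return the document as a Python dictionary.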
17.5 System Architecture
New product development can be greatly enhanced if new product ideas from all the stakeholders involved, and mainly from consumers, are taken into account and the development process is supported via Internet collaboration. The CouchDB-based system proposed here can provide such support and the effective implementation of the product development process. By using CouchDB, all development information, including external attachments such as design information, reports and specifications, should be stored across all iterations. Several technologies are being considered for implementation. CouchDB manages the distributed environment with little maintenance effort, keeping all peers synchronized on demand and/or on schedule. Since REST is used, operations on the database are simple. To manage product information, a web interface (see Figure 17.3) is created using Ruby [32] and Ruby on Rails (or just Rails) [33]. Because Rails and CouchDB both speak REST/HTTP, homogeneity is assured with a little help from a Ruby wrapper. All of these technologies are open source and available for free.
Fig. 17.3 Excerpt of a page from the web interface
If all stakeholders, e.g., suppliers and NPD teams, are assumed to speak the same language, they can dynamically supply resources for a product. In order to preserve suppliers’ confidential information, data would be filtered by web services before becoming a subpart of the product information. Figure 17.4 is a graphical representation of this idea.
Fig. 17.4 Use of web services to filter/provide confidential information
Considering the example used in this paper, Figure 17.5 could be a document produced for sharing. This document would then be consumed by a web service to provide the information needed.
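The filtering step can be sketched as a small whitelist function in Python. This is one possible reading of the idea, not the implementation of the proposed system: the field names, the `PUBLIC_FIELDS` whitelist and the helper names `shareable_view` and `attach_resource` are all hypothetical.

```python
# Hypothetical whitelist of supplier fields that may leave the supplier's system.
PUBLIC_FIELDS = {"_id", "name", "description", "lead_time_days"}


def shareable_view(supplier_doc, public_fields=PUBLIC_FIELDS):
    """Return a copy holding only the fields cleared for sharing."""
    return {k: v for k, v in supplier_doc.items() if k in public_fields}


def attach_resource(product_doc, supplier_doc):
    """Add the filtered supplier data to a copy of the product document."""
    updated = dict(product_doc)
    resources = list(product_doc.get("resources", []))
    resources.append(shareable_view(supplier_doc))
    updated["resources"] = resources
    return updated


supplier_doc = {
    "_id": "supplier-9/steel-sheet",
    "name": "Steel sheet 2 mm",
    "description": "Cold-rolled, zinc coated",
    "lead_time_days": 10,
    "unit_cost": 4.75,  # confidential, must not be shared
    "margin": 0.22,     # confidential, must not be shared
}
product = attach_resource(
    {"_id": "bracket-001", "name": "Mounting bracket"}, supplier_doc
)
```

A whitelist is used rather than a blacklist so that a field added later to the supplier document stays private until it is explicitly cleared for sharing.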
Fig. 17.5 Shared basic product information to be consumed by a web service
Moreover, there are several other possible use cases: distributed e-commerce websites where multiple partners share their catalogs, patient medical records shared between different institutions, and personal finances. Everyday needs involving different paper documents could also be candidates, since such documents are not immutable and keeping their history would be a prerequisite. Shared catalogs, enabled by the distributed properties, could be used in this context, since NPD teams would have (online and offline) access to suppliers’ catalogs. Document revisions and attachments would be available through the revision properties provided by Document-oriented Database Management Systems (DoDBMS).
17.6 Conclusion
In this paper we propose a collaboration system for new product development (NPD) based on CouchDB, emphasizing the importance and role of managing information and documentation during and after the development process. Although we are still at an early stage of investigation and have not yet carried out a practical implementation of the proposed approach to NPD process and documentation management, similar concepts and approaches have been used in production in several fields [34, 35], and we have already done some preliminary work in order to put the proposed system forward soon. In short, this work tries to manage information more naturally, integrating its several parts and using new technologies based on CouchDB and REST web services.
Acknowledgments
This work was developed under the scope of project PEst-OE/EME/UI0252/2011, financed by The Foundation for Science and Technology – FCT.
References
[1] W. Yan, C. Chen, Y. Huang, and W. Mi, “A data-mining approach for product conceptualization in a web-based architecture,” Computers in Industry, vol. 60, Jan. 2009, pp. 21-34.
[2] B. Dong, G. Qi, X. Gu, and X. Wei, “Web service-oriented manufacturing resource applications for networked product development,” Advanced Engineering Informatics, vol. 22, 2008, pp. 282-295.
[3] A. Sharma, “Collaborative product innovation: integrating elements of CPI via PLM framework,” Computer-Aided Design, vol. 37, 2005, pp. 1425-1434.
[4] L. Wang, “Collaborative conceptual design—state of the art and future trends,” Computer-Aided Design, vol. 34, Nov. 2002, pp. 981-996.
[5] S. Zhang, W. Shen, and H. Ghenniwa, “A review of Internet-based product information sharing and visualization,” Computers in Industry, vol. 54, 2004, pp. 1-15.
[6] D. Booth, H. Haas, et al., “Web Services Architecture,” W3C Working Group Note, Feb. 2004.
[7] H. Wang, G. Liu, B. Han, and J. Zhang, “Collaborative simulation environment framework based on SOA,” 12th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2008), 2008, pp. 416-419.
[8] “NOSQL Databases,” http://nosql-database.org/.
[9] Apache Software Foundation, “Apache CouchDB: The Apache CouchDB Project.”
[10] H. Aziz, J. Gao, P. Maropoulos, and W. Cheung, “Open standard, open source and peer-to-peer tools and methods for collaborative product development,” Computers in Industry, vol. 56, 2005, pp. 260-271.
[11] W. Shen, Q. Hao, and W. Li, “Computer supported collaborative design: Retrospective and perspective,” Computers in Industry, vol. 59, Dec. 2008, pp. 855-862.
[12] K. Rodriguez and A. Al-Ashaab, “Knowledge web-based system architecture for collaborative product development,” Computers in Industry, vol. 56, Jan. 2005, pp. 125-140.
[13] R.T. Fielding, “Architectural styles and the design of network-based software architectures,” Citeseer, 2000.
[14] B. Dong and D. Zhao, “Service-oriented design part information semantic modeling and applications,” 2010 International Conference on Computer Design and Applications (ICCDA), vol. 5, 2010, pp. 174-177.
[15] S. Xie, H. Huang, and Y. Tu, “A WWW-Based Information Management System for Rapid and Integrated Mould Product Development,” The International Journal of Advanced Manufacturing Technology, vol. 20, 2002, pp. 50-57.
[16] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” SOSP ’07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, ACM Press, 2007, pp. 205-220.
[17] K. Tseng and H. Abdalla, “A Web-based integrated design system: its applications on conceptual design stage,” The International Journal of Advanced Manufacturing Technology, vol. 35, Nov. 2008, pp. 1028-1040.
[18] D.T. Liu and X.W. Xu, “A review of web-based product data management systems,” Computers in Industry, vol. 44, 2001, pp. 251-262.
[19] J.D. Frutos and D. Borenstein, “A framework to support customer-company interaction in mass customization environments,” Computers in Industry, vol. 54, 2004, pp. 115-135.
[20] V. Mateljan, D. Cisic, and D. Ogrizovic, “Cloud Database-as-a-Service (DaaS)-ROI,” MIPRO, 2010 Proceedings of the 33rd International Convention, IEEE, 2010, pp. 1185-1188.
[21] N. Leavitt, “Will NoSQL Databases Live Up to Their Promise?,” Computer, vol. 43, Feb. 2010, pp. 12-14.
[22] J.C. Anderson, J. Lehnardt, and N. Slater, “CouchDB: The Definitive Guide,” 2009.
[23] L. Richardson and S. Ruby, RESTful web services - Webservices for the real world, O’Reilly, 2007.
[24] S. Allamaraju, Restful web services cookbook, O’Reilly Media / Yahoo Press, 2010.
[25] D. Crockford, “RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON),” Proc. of XML, 2006.
[26] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems), Morgan Kaufmann, 1992.
[27] D.P. Reed, “Naming and Synchronization in a Decentralized Computer System,” Massachusetts Institute of Technology, 1978.
[28] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, vol. 51, 2008, pp. 1-13.
[29] K. Orend, “Analysis and Classification of NoSQL Databases and Evaluation of their Ability to Replace an Object-relational Persistence Layer,” Master’s thesis, 2010.
[30] “cURL and libcurl,” http://curl.haxx.se/.
[31] “Hypertext Transfer Protocol -- HTTP/1.1,” http://www.ietf.org/rfc/rfc2616.txt.
[32] Y. Matsumoto, “Ruby programming language,” 2001.
[33] “Ruby on rails,” http://www.rubyonrails.com.
[34] “CouchDB_in_the_wild - Couchdb Wiki,” http://wiki.apache.org/couchdb/CouchDB_in_the_wild, 2009.
[35] “Couchbase and CouchDB Case Studies | Couchbase,” http://www.couchbase.com/customers/case-studies.