2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Availability Evaluation of Digital Library Cloud Services
Julian Araujo∗, Paulo Maciel∗, Matheus Torquato∗, Gustavo Callou† and Ermeson Andrade†
∗Informatics Center, Federal University of Pernambuco (UFPE), Recife, PE, Brazil. Email: cjma, prmm, [email protected]
†Department of Statistics and Informatics, Federal Rural University of Pernambuco (UFRPE), Recife, PE, Brazil. Email: gustavo, [email protected]
Abstract—Cloud computing is a new paradigm that provides services through the Internet. It builds on earlier technologies (e.g., cluster, peer-to-peer and grid computing) and has been adopted to reduce costs, provide flexibility and simplify management. Companies like Google, Amazon, Microsoft, IBM, HP, Yahoo, Oracle, and EMC have made significant investments in cloud infrastructure to provide services with high availability levels. The advantages of cloud computing have enabled the construction of digital libraries, which represent collections of information. Such systems demand high reliability, and availability studies are important given the relevance of conserving and disseminating scientific and literary information. This paper proposes an approach to model and evaluate the availability of a digital library. A case study is conducted to show the applicability of the proposed approach. The obtained results are useful for the design of such systems, since missing data can lead to various errors and incalculable losses.
technology. Information can be accessed from remote places using devices that support access to the Internet. Therefore, cloud computing is a relevant tool for the dissemination of scientific information and literature.
Keywords—Cloud computing; Digital Library; Accelerated Life Testing; Petri net; Reliability Block Diagram; Availability;
In order to provide uninterrupted services through cloud computing, it is important to evaluate and improve the dependability parameters of the underlying infrastructure. Some services might be regarded as digital libraries depending on the educational institution or company and the number of data operations involved. If an infrastructure suffers a server outage due to database deadlocks, data loss or network failure, the outage may cause incalculable damage. In addition, there is a cost associated with recovering the data and, in some cases, it may not be possible to recover it fully [4].
I. INTRODUCTION
Digital libraries around the world have become fundamental, not merely for protecting thousands of articles, collections and books, but also for sharing knowledge with society. More people have come to recognize the importance of digital libraries and the convenience they can bring to society. Many educational institutions, such as schools, universities and colleges, as well as companies, have shown interest in digitizing their books, maintaining their own digital libraries and providing services for their members. However, these services demand a lot of computing resources to achieve the high levels of reliability, availability, scalability, and security needed by the infrastructure.
The development and use of services based on cloud computing have grown quickly in recent years. Several corporations and institutions have demonstrated interest in cloud computing, and many platforms have been proposed. Google, Amazon, Microsoft, IBM, HP, Apple, Oracle, and Salesforce are a few examples of companies that are making massive investments in cloud services [1]. More specifically, the U.S. National Institute of Standards and Technology (NIST) defines cloud computing as a “model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services)” [2]. The five essential characteristics of cloud computing are on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service [2]. In addition, other common characteristics that directly impact the availability and reliability of such systems are virtualization, geographic distribution, resilient computing, security, scalability, and homogeneity.
This paper proposes an availability model of a digital cloud library in order to estimate downtime levels. A free and open-source cloud computing manager (OpenNebula) and library service (DSpace) were deployed as the environment for the analysis. Measurements were performed to obtain the availability parameters of the library service deployed in a private cloud. An accelerated failure testing (AFT) approach was used to obtain the availability parameters. In addition, reliability block diagrams (RBDs) and stochastic Petri nets (SPNs) were used to model and assess the digital library environment. Strategies are discussed that aim to improve the design and to compare digital library availability across cloud infrastructures. The remaining sections are organized as follows. Section II describes the architecture of the private cloud system analyzed in this paper. Section III presents the models developed to represent the library cloud environment. Section IV presents the results obtained through model analysis, with a focus on the availability metric. Section V presents the related work.
Services such as web hosting, e-commerce, and social networking have been developed due to the benefits of cloud computing. An important advantage is the possibility to preserve and share knowledge through digital libraries [3]. Collections of digital libraries are becoming more popular around the globe due to the application of information and communication
978-1-4799-2233-8/14 $31.00 © 2014 IEEE DOI 10.1109/DSN.2014.65
Section VI concludes the paper and presents possible future work.
II. ARCHITECTURE OVERVIEW
This study considers a private cloud hosting a digital library service. The service is a digital asset management system that allows educational institutions to collect, preserve and disseminate the scholarly and intellectual output of the academy. The library service is composed of: a mechanism that securely identifies its users; a workflow process for item submission; import and export of collections; statistical reports and summaries; and a search engine. Users can access bibliographic information such as articles, papers, theses, books, and dissertations. The user interface mediates between the user and the digital library management system. Figure 1 presents a digital library service hosted in a private cloud. The service allows a manager to submit contents to the database so that users can then access them.
Fig. 2. Architecture Overview.
regarding the digital library service. A hierarchical model has been created to estimate the availability of the previously presented architecture.
Fig. 1. Digital Library Overview.
The private cloud is composed of three main components: the Main and Standby nodes as well as the Management (Mgn) server (see Figure 2). The Main node is the main host of the environment and contains a Virtual Machine Monitor (VMM) and a Virtual Machine (VM). The application (app) running in the VM is a digital library service. There is a Standby node in order to ensure high levels of availability. It is a spare host which assumes the Main node role when a failure occurs. The Management server is the component responsible for supervising and controlling the entire cloud environment through a specific cloud management tool. Additionally, it is important to stress that the remote storage volume can be accessed by the VM and that it is managed through the Management server. All the components are interconnected by a private network.
A. Accelerated Life Testing
Accelerated life testing (ALT) is used to shorten the time needed to observe product failures by accelerating performance degradation. This method aims to obtain data from experiments under higher stress conditions than the usual ones [5]. The accelerated exponential model [6], [7] is adopted when the time to failure under stress conditions is exponentially distributed with a constant failure rate λs. The failure rate under normal conditions is calculated by Equation 1, where λo is the failure rate under normal conditions, λs is the failure rate under stress conditions, and AF (acceleration factor) is the ratio between normal and accelerated conditions.

$$\lambda_o = \lambda_s / AF \qquad (1)$$
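As a rough illustration of Equation 1, the sketch below converts a failure rate measured under stress into a rate for normal conditions. The function names and all numeric values are hypothetical and are not taken from the experiments; reading AF as the accelerated stress level divided by the normal one (so that AF > 1) is an assumption about the definition above.

```python
def acceleration_factor(normal_stress: float, accelerated_stress: float) -> float:
    """AF read as accelerated stress over normal stress (assumption: AF > 1)."""
    if normal_stress <= 0 or accelerated_stress <= 0:
        raise ValueError("stress levels must be positive")
    return accelerated_stress / normal_stress


def normal_failure_rate(stress_failure_rate: float, af: float) -> float:
    """Equation 1: lambda_o = lambda_s / AF."""
    return stress_failure_rate / af


# Hypothetical values for illustration only, not measurements from the study.
af = acceleration_factor(normal_stress=0.5, accelerated_stress=5.0)
lambda_s = 0.02                               # failures per hour under stress
lambda_o = normal_failure_rate(lambda_s, af)  # failures per hour, normal use
mttf_o = 1.0 / lambda_o                       # exponential model: MTTF = 1 / rate
print(af, lambda_o, mttf_o)
```

Under the exponential model, dividing the stressed failure rate by AF is equivalent to multiplying the stressed MTTF by AF.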
The Weibull distribution is widely used to describe the lifetimes of devices. It was originally proposed to describe fatigue data. The accelerated Weibull model [6] relates the failure-time distribution observed under stress to the one under normal conditions. The mean time to failure under normal conditions is given by Equation 2.
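A minimal sketch of the Weibull mean time to failure follows, using the form MTTF_o = θ^(1/γ) Γ(1 + 1/γ) as it appears in Equation 2. The text does not define θ and γ explicitly, so treating γ as the shape parameter and θ as a scale-type parameter is an assumption, and the parameter values are hypothetical.

```python
import math


def weibull_mttf(theta: float, gamma: float) -> float:
    """MTTF_o = theta**(1/gamma) * Gamma(1 + 1/gamma), following Equation 2."""
    if theta <= 0 or gamma <= 0:
        raise ValueError("theta and gamma must be positive")
    return theta ** (1.0 / gamma) * math.gamma(1.0 + 1.0 / gamma)


# Hypothetical parameters. With gamma = 1 the expression reduces to
# the exponential case, where the MTTF equals theta.
print(weibull_mttf(theta=100.0, gamma=1.0))
print(weibull_mttf(theta=4.0, gamma=2.0))
```

In practice θ and γ would come from fitting the failure times collected in the accelerated test.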
$$MTTF_o = \theta^{1/\gamma}\, \Gamma(1 + 1/\gamma) \qquad (2)$$

The cloud operational mode is described as follows. The Main node (and its VM) and the Management server must be working for the system to be operational. However, in case the Standby node fails, the availability of the cloud goes down. It is worth highlighting that the roles of the Standby node and the Main node are swapped when the VM is restored; therefore, host availability becomes essential to cloud availability as soon as a recovery is completed. The objective of the Standby node is to maximize the availability of the cloud, subject to constraints that can be established through a service level agreement.
III.
MODELS
This section presents the adopted AFT, SPN and RBD models to compute the availability of the private cloud environment
B. Reliability Block Diagram
The reliability block diagram (RBD) [8] is a combinatorial model initially proposed as a technique for calculating the reliability of a system using intuitive block diagrams. This technique has also been extended to calculate other dependability metrics, such as availability and maintainability [9]. Figure 3 illustrates two examples, in which independent blocks
are arranged in parallel (Figure 3(a)) and series (Figure 3(b)) structures.

Fig. 3. Reliability block diagram.

In a series arrangement, the whole system is no longer operational if a single component fails. For a system with n independent components, the reliability (instantaneous availability or steady-state availability) is obtained by:

$$P_s(t) = \prod_{i=1}^{n} P_i \qquad (3)$$

where P_i is the reliability R_i(t) (instantaneous availability A_i(t) or steady-state availability A_i) of block b_i.

In a parallel arrangement (see Figure 3(a)), the whole system is operational as long as at least one component is operational. For a system with n independent components, the reliability (instantaneous availability or steady-state availability) is obtained by:

$$P_s(t) = 1 - \prod_{i=1}^{n} (1 - P_i) \qquad (4)$$

where P_i is the reliability R_i(t) (instantaneous availability A_i(t) or steady-state availability A_i) of block b_i.

A k-out-of-n system functions if and only if k or more of its n components are functioning. Let p be the success probability of each of those blocks. The system success probability (reliability or availability) is calculated by:

$$\sum_{i=k}^{n} \binom{n}{i}\, p^{i} (1 - p)^{n-i} \qquad (5)$$

C. Stochastic Model

SPN Models: This work adopts a particular Petri net extension, namely Stochastic Petri Nets (SPN) [10], which allows probabilistic delays to be associated with transitions using the exponential distribution, and whose state space is isomorphic to a continuous time Markov chain (CTMC) [10]. In addition, SPNs allow simulation techniques to be used to obtain dependability metrics as an alternative to generating the Markov chain. The following subsection briefly presents the proposed SPN building block for obtaining the availability metric.

1) Cold Standby Redundant Model: A cold standby redundant system is composed of a non-active spare module that waits to be activated when the main active module fails. Figure 8 depicts the SPN model of this system, which includes four places, namely VM1_ON, VM1_OFF, VM2_ON and VM2_OFF, that represent the operational and failure states of the main and spare modules, respectively. The spare module (VM2) is initially deactivated, hence no tokens are initially stored in places VM2_ON and VM2_OFF. When the main module fails, the transition T_ACT is fired to activate the spare module. The redundancy model was applied to the digital library environment in order to represent a situation in which a VM goes down and the activation of the spare VM restores the service.

1) Front-end Model: Figure 4 depicts the RBD model that represents the front-end. The front-end is composed of three serial components: hardware (Hw), operating system (OS) and the Management server (Mng). The machine that runs the Management server is the front-end [X]. Considering the architecture depicted in Figure 2, the front-end is represented by MgnServer.

Fig. 4. Frontend RBD Model.

2) Node Model: Figure 5 shows the RBD model that represents the Node. The Node is composed of five serial components: hardware, operating system, Management server, virtual machine (VM) and the digital library service (DL). Considering the architecture depicted in Figure 2, the Node model is represented by MainNode.

Fig. 5. Node RBD Model.

3) Composition Model: After modeling the front-end and node subsystems, a composition model is adopted to represent the whole system. Figure 6 shows the composition model, in which each subsystem is represented by a block and the blocks are connected in series.

Fig. 6. Frontend and Node RBD Model.

4) Front-end and Redundant Node: This model is composed of a front-end and two redundant node subsystems in a hot standby arrangement. Figure 7 shows the corresponding RBD model. This redundancy increases the availability of the environment because, when a node fails, another one automatically takes its place.

Fig. 7. Frontend + Redundant Node RBD Model.
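The series, parallel and k-out-of-n expressions (Equations 3 to 5) can be sketched in a few lines. The helper names below are hypothetical. As an example, the front-end is evaluated as three blocks in series, with each block availability taken as MTTF / (MTTF + MTTR) and the Hw, OS and Management tool values read from Table I; this is an illustrative calculation under those assumptions, not necessarily the tool chain used in the study.

```python
from math import comb, prod
from typing import Sequence


def availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one block: MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


def series(blocks: Sequence[float]) -> float:
    """Equation 3: every block must be up."""
    return prod(blocks)


def parallel(blocks: Sequence[float]) -> float:
    """Equation 4: at least one block must be up."""
    return 1.0 - prod(1.0 - p for p in blocks)


def k_out_of_n(k: int, n: int, p: float) -> float:
    """Equation 5: at least k of n identical, independent blocks are up."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Front-end blocks, with MTTF and MTTR (in hours) as listed in Table I.
hw = availability(8760.0, 100.0 / 60.0)   # hardware, MTTR 100 min
os_block = availability(1440.0, 1.0)      # operating system, MTTR 1 h
mng = availability(788.4, 1.0)            # management tool, MTTR 1 h

frontend = series([hw, os_block, mng])
print(f"Front-end availability: {frontend * 100:.4f} %")
```

With these inputs the series product comes out at roughly 99.785 %, in line with the front-end availability reported in Table IV. The same `parallel` helper could represent two redundant nodes, although a cold standby arrangement with an activation delay needs the SPN model rather than a plain RBD.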
Table I presents the MTTF and MTTR parameters used to analyze the RBD models. The values are based
Fig. 8. Cold standby model.

To define performance-intensive scenarios, all the HTTP requests generated by the browser while navigating through the library service were recorded with the JMeter tool [15]. A load test with just one user was used to determine the response time of performance-intensive requests. In addition, three measurement experiments were defined to obtain the response time of the digital library service. The data were collected over a period of 48 hours, until failures started to occur.

This study considers a response time greater than 6 seconds to be a failure. Abnormalities or errors (outliers) in the measurements were removed from the collected data. The next step was to fit the failure data to an appropriate probability distribution. It is important to note that the environment was restored to the same conditions after each measurement.
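The failure criterion can be illustrated with a short sketch that scans response-time samples and reports when the 6-second threshold is first exceeded. The sample data and the function name are hypothetical; only the threshold comes from the text.

```python
from typing import Iterable, Optional, Tuple

FAILURE_THRESHOLD_S = 6.0  # a response slower than this counts as a failure


def time_to_first_failure(
    samples: Iterable[Tuple[float, float]],
    threshold_s: float = FAILURE_THRESHOLD_S,
) -> Optional[float]:
    """Return the elapsed time (hours) of the first sample whose response
    time exceeds the threshold, or None if no sample does."""
    for elapsed_h, response_s in samples:
        if response_s > threshold_s:
            return elapsed_h
    return None


# Hypothetical (elapsed hours, response time in seconds) measurements.
samples = [(0.5, 0.8), (12.0, 1.9), (30.0, 4.2), (47.5, 6.3), (48.0, 7.1)]
print(time_to_first_failure(samples))
```

Failure times extracted this way from repeated runs would be the input to the distribution fitting step.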
on [11], [12]. However, the digital library mean time to failure was obtained from experiments using accelerated life testing. The experiment is explained in the following section.

TABLE I. RBDs PARAMETERS
Parameter          MTTF        MTTR
Hw                 8760 h      100 min
OS                 1440 h      1 h
Management tool    788.4 h     1 h
VM                 2880 h      10 min
Digital Library    6865.3 h    10 min

B. Numerical Results

The AFT had to be planned before the measurements. The test plan had the following characteristics. Type of stress: HTTP requests. Stress loading: constant stress. After that, the accelerating stress factor was defined (Table II). The acceleration factor (AF) was based on the ratio between normal and accelerated conditions. For this study, the normal conditions are 0.03 request/s and the accelerated conditions are 4 request/s.
The steady-state availability quantifies the combined effect of both the failure and repair processes in a system. It can be obtained from Equation 6.

$$\text{Availability} = \frac{MTTF}{MTTF + MTTR} \qquad (6)$$

TABLE II. FACTOR AND LEVEL VALUES
Workload parameter    Request rate (req/s)
                      Regular     High
Request               0.034       4

After defining the plan, it was possible to define the statistical parametric model. The failure times at each stress level were used to determine the most appropriate failure time probability distribution. The following models are commonly used: exponential, Weibull, Gamma and log-normal. After fitting the failure times collected during 48 hours, it was possible to conclude that the accelerated Weibull model was the most appropriate to describe the behavior of the obtained data. Thus, the Weibull parameters were calculated in order to obtain the mean time to failure (MTTF). Table III presents the mean time to repair (MTTR) and the mean time to failure (MTTF). The MTTR needed to recover a virtual machine was estimated from observation of the analyzed infrastructure.

IV. RESULTS AND DISCUSSIONS
This section presents and discusses the results obtained for the digital library cloud service described above. First, the failure rates under accelerated conditions were obtained. Then, the availability model was parameterized and an analysis was performed. Based on this analysis, it was possible to obtain the availability of the digital library service deployed in the cloud. By analyzing the availability results, a designer may verify the respective downtime as well as identify the system parts that most affect the system availability.
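The relation between availability and downtime can be sketched as follows. The steady-state availability is MTTF / (MTTF + MTTR), and the annual downtime is taken here as (1 − A) × 8760 h; the 8760-hour year is an assumption, chosen because it is consistent with the downtime values reported in Table IV. The digital library MTTF and MTTR are the values reported for the service (6865.3 h and 10 min), and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0  # assumption: downtime is reported per 365-day year


def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


def annual_downtime_hours(avail: float) -> float:
    """Expected unavailable hours per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Digital library service: MTTF 6865.3 h, MTTR 10 min.
dl_avail = steady_state_availability(6865.3, 10.0 / 60.0)
print(f"DL availability: {dl_avail:.7f}")
print(f"DL downtime: {annual_downtime_hours(dl_avail):.3f} h/year")

# Availability of the composed Frontend + Node model from Table IV.
print(f"Baseline downtime: {annual_downtime_hours(0.99553288):.2f} h/year")
```

The last line gives approximately 39.13 h per year, which is the downtime listed for the composed model in Table IV.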
A. Test Environment

This subsection describes the deployed environment. Three computers with identical configuration were adopted (CPU: Intel i5 1.2 GHz, RAM: 2048 megabytes, NIC: 100 Mbps). One computer was used to generate the traffic, another to be the Management cloud server (front-end), and the third to host the digital library service (Node). All of them are interconnected via an Ethernet switch. The management server adopts an open source project named OpenNebula [14]. The digital library service considered was DSpace [13]. OpenNebula is a project aimed at providing features for management of virtual machines and for private computing. DSpace is open source software for academic, non-profit, and commercial organizations creating open digital repositories.

TABLE III. MODEL PARAMETERS
Parameter    Value
MTTF         6865.3 h
MTTR         10 min
From the mean time to failure obtained by the accelerated test, it was possible to parameterize the RBD and SPN availability models explained in the previous section. In addition, Table I presents the mean times to failure of the other components used in the RBD models. An RBD model was built for the architecture depicted in Figure 2 (baseline). Using a hierarchical approach, the front-end was modeled first (see Figure 4) and its availability was calculated. Next, the Node hosting the digital library service (see Figure 5) was modeled and its availability computed. Finally, the composition of both models depicted in Figure 6 was performed. Table IV presents the availability and downtime results. The downtime increases due to the series composition, i.e., the lack of redundancy. With these results, a designer of cloud computing environments can define service policies on availability and thereby balance costs.

TABLE IV. RBD AVAILABILITY
Model              Availability (%)    Downtime (hr)
Frontend           99.785029           18.83
Node               99.767472           20.36
Frontend + Node    99.553288           39.13

ends, whereas this was not considered in the previous scenario. Additionally, strategies with cold standby (Sc2.1) and hot standby (Sc2.2) redundancy were considered for the front-end. Figure 10 shows the availability results in number of nines for each type of redundancy. According to the availability results, the downtime difference between the hot and cold redundancy strategies is close to 87.4%. As the reader should notice, scenario 2 is a more interesting option than scenario 1, since it has lower downtime values. However, a greater investment in infrastructure is required, as well as in the maintenance and management of the environment.
Based on these first results, a designer may use different strategies in order to improve availability levels. Thus, two redundant models were considered in two scenarios. In scenario 1, the availability level was analyzed by adding cold standby (Sc1.1) and hot standby (Sc1.2) redundancy to the digital library Node. The baseline scenario has no redundancy mechanisms, meaning that just one VM was used to represent the digital library service. Thus, the proposed scenario aims to add redundancy to the digital library service and consequently increase the availability levels. It is important to note that the scenario reduces outages that leave the service inaccessible and inconvenience library users. Figure 9 shows the availability analysis results of scenario 1 in terms of the number of nines, considering the environment without redundancy and with two redundancy strategies added to the digital library service (Node). The number of nines is the count of consecutive nines in the availability percentage. The second column represents cold standby (Sc1.1) redundancy, i.e., the spare VM is activated when a failure is registered in the main VM.
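The number of nines can be sketched with the commonly used continuous form, −log10(1 − A). That the figures use this form is an assumption, suggested by the fractional values on their vertical axes; the function name is hypothetical. The example uses the baseline (Frontend + Node) availability from Table IV.

```python
import math


def number_of_nines(avail: float) -> float:
    """Continuous 'number of nines': -log10(1 - A), for 0 <= A < 1."""
    if not 0.0 <= avail < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - avail)


baseline = 0.99553288  # Frontend + Node availability from Table IV
print(f"Baseline: {number_of_nines(baseline):.2f} nines")
print(f"99.9 %:   {number_of_nines(0.999):.2f} nines")
```

The baseline evaluates to roughly 2.35 nines, while an availability of 99.9 % corresponds to 3 nines.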
Fig. 10. Availability study of Scenario 2. (Bar chart of the number of nines for the scenarios BaseLine, Sc2.1 and Sc2.2.)
Finally, scenario 3 takes into account cold standby redundancy in both the front-end and the digital library service (Node). The purpose of this scenario is to evaluate the availability levels over a range of times needed to activate the spare. Figure 11 presents the availability results in terms of the number of nines. Availability decreases as the time needed to activate the spare increases.
In addition, scenario 1 considered that a copy of the VM would be kept continuously up to date to ensure the highest level of availability, with the service running permanently. The hot standby (Sc1.2) model took into account two digital library services (VMs) running in parallel. To emphasize the availability results, Table V presents the downtime of each redundant model. The hot standby model yields less downtime; however, it comes at a higher cost.
Fig. 9. Availability study of Scenario 1. (Bar chart of the number of nines for the scenarios BaseLine, Sc(1.1) and Sc(1.2).)

TABLE V. DOWNTIME VALUES
Configuration    Downtime (hr)
Baseline         39.1319
Cold             19.6050
Hot              18.8677

Fig. 11. Availability study of Scenario 3. (Number of nines plotted against "Time to active (min)", from 0 to 3.5 minutes.)

These results can have a significant impact on the definition of services in cloud computing environments, especially concerning the conservation of scientific and literary knowledge, as digital libraries are susceptible to mild or severe disruptions. Therefore, it is expected that the estimates generated from this work will guide the decisions of designers implementing a particular digital library solution to ensure high availability.
V. RELATED RESEARCH
Yang Jie and Liu Wanjun [16] presented a short review of digital libraries in cloud computing.
Afterwards, scenario 2 considered redundancy in front-
The paper defined concepts about cloud-based architecture at the service, application and resource layers. Shafi et al. [17] present a review of how cloud computing can contribute to digital and virtual library technologies. Some advantages and disadvantages of cloud computing are outlined, such as savings in hardware and software maintenance costs and the problems related to security risks. Both works focus on the concepts and applicability of cloud computing for digital libraries.

Sosa-Sosa and Hernandez-Ramirez [3] presented a file-storage service implemented in a private/hybrid cloud. The experiment analyzed the response time of the elastic service in comparison with a fixed configuration, as well as the response and service times perceived during file downloads using the private cloud. Lu et al. [18] proposed a platform as a service for quickly developing and deploying digital libraries. According to the results, the engine ensures reliability, security, extensibility, availability, quality of service and manageability. In addition, the digital library was tested using the Chinese Traditional Medicine, Literature Chronicle and Calligraphy service.

In [19], a list of open source digital library software was compared from the perspectives of features, functions and usability. In the same way, Tramboo et al. [20] provided a guide for any organization or institute to decide which open source digital library is ideal for creating its digital collection. Nadanam and Rajmohan [21] proposed a model for evaluating the quality of web services. The authors defined attributes and metrics for quality, aiming to assess utility and practicability considering three standards from IEEE 1061.

Wei et al. [22] use a hierarchical method and propose hybrid models combining reliability block diagrams and general stochastic Petri nets. The results comprise a dependability analysis of a virtual data center in cloud computing, considering characteristics and mechanisms of virtualization such as backup and live migration. According to the related work, there is some research on the advantages and disadvantages of cloud computing, as well as on models to analyze the dependability of web services. However, almost no work has explored the availability of digital library services.

VI. CONCLUSION

This work presented a hierarchical modeling method to analyze the availability of digital library management. In the proposed approach, an accelerated life testing model was used to obtain the availability parameters. Furthermore, a hierarchical model was created to estimate the availability of the cloud computing architecture. Additionally, redundant models were used to compare critical levels of availability, taking into account service quality and the prevention of outages.

The results for the digital library implemented in the cloud under different redundancy policies indicate that the conclusions drawn from this paper can help improve digital library availability. As future work, we plan to explore the impact of different redundancy models, including the costs associated with each redundant module and the live migration mechanism.

REFERENCES

[1] M. Sadiku, S. Musa, and O. Momoh, “Cloud computing: Opportunities and challenges,” Potentials, IEEE, vol. 33, no. 1, pp. 34–36, 2014.
[2] P. Mell and T. Grance, “The NIST definition of cloud computing,” National Institute of Standards and Technology, vol. 53, no. 6, p. 50, 2009.
[3] V. J. Sosa-Sosa and E. M. Hernandez-Ramirez, “A file storage service on a cloud computing environment for digital libraries,” Information Technology and Libraries, vol. 31, no. 4, pp. 34–45, 2012.
[4] D. Patterson, “A simple way to estimate the cost of downtime,” in Proc. 16th Systems Administration Conf. (LISA), 2002, pp. 185–188.
[5] F. S. Fogliatto and J. L. D. Ribeiro, Confiabilidade e manutenção industrial. Elsevier, 2009.
[6] H. Pham, Handbook of reliability engineering. Springer, 2003.
[7] K. B. Misra, Handbook of performability engineering. Springer, 2008.
[8] C. E. Ebeling, An introduction to reliability and maintainability engineering. Tata McGraw-Hill Education, 2004.
[9] W. Kuo and M. J. Zuo, Optimal reliability modeling: principles and applications. John Wiley & Sons, 2003.
[10] K. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications. John Wiley and Sons, 2nd Edition, 2006.
[11] J. Dantas, R. Matos, J. Araujo, and P. Maciel, “An availability model for eucalyptus platform: An analysis of warm-standy replication mechanism,” in Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on, Oct 2012, pp. 1664–1669.
[12] D. S. Kim, F. Machida, and K. Trivedi, “Availability modeling and analysis of a virtualized system,” in Dependable Computing, 2009. PRDC ’09. 15th IEEE Pacific Rim International Symposium on, Nov 2009, pp. 365–371.
[13] R. Tansley, M. Bass, D. Stuve, M. Branschofsky, D. Chudnov, G. McClellan, and M. Smith, “The DSpace institutional digital repository system: current functionality,” in Digital Libraries, 2003. Proceedings. 2003 Joint Conference on, May 2003, pp. 87–97.
[14] D. Milojicic, I. Llorente, and R. S. Montero, “OpenNebula: A cloud management tool,” Internet Computing, IEEE, vol. 15, no. 2, pp. 11–14, March 2011.
[15] E. H. Halili, Apache JMeter: A practical beginner’s guide to automated testing and performance measurement for your websites. Packt Publishing Ltd, 2008.
[16] Y. Jie and L. Wanjun, “Cloud computing in the application of digital library,” in 2010 International Conference on Intelligent Computation Technology and Automation, vol. 1, 2010, pp. 939–941.
[17] M. Shafi, B. Balraj, and S. Kumar, “Cloud computing solutions: Library perspectives,” in Cloud Computing Technologies, Applications and Management (ICCCTAM), 2012 International Conference on. IEEE, 2012, pp. 98–101.
[18] W. Lu, L. Zheng, J. Shao, B. Wei, and Y. Zhuang, “Digital library engine: Adapting digital library for cloud computing,” in Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing. IEEE Computer Society, 2013, pp. 934–941.
[19] S. R. Lihitkar and R. S. Lihitkar, “Open source software for developing digital library: Comparative study,” DESIDOC Journal of Library & Information Technology, vol. 32, no. 5, 2012.
[20] S. Tramboo, S. Shafi, S. Gul et al., “A study on the open source digital library software’s: Special reference to DSpace, EPrints and Greenstone,” arXiv preprint arXiv:1212.4935, 2012.
[21] P. Nadanam and R. Rajmohan, “QoS evaluation for web services in cloud computing,” in Computing Communication & Networking Technologies (ICCCNT), 2012 Third International Conference on. IEEE, 2012, pp. 1–8.
[22] B. Wei, C. Lin, and X. Kong, “Dependability modeling and analysis for the virtual data center of cloud computing,” in High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference on. IEEE, 2011, pp. 784–789.