Importance Of Cloud Computing In Deployment Of Effective Data Warehouse

Pooja Gaikwad1, Lalit Kathpalia2, P. S. Metkewar3

1 PG Student, MBA IT, Symbiosis Institute of Computer Studies and Research, Affiliated to Symbiosis International University (SIU), Pune
2 Director, Symbiosis Institute of Computer Studies and Research, Affiliated to Symbiosis International University (SIU), Pune
3 Associate Professor, Symbiosis Institute of Computer Studies and Research, Affiliated to Symbiosis International University (SIU), Pune

Abstract-In this paper, we integrate data warehousing techniques with cloud computing technology in order to take advantage of cloud-based techniques in a data warehousing environment. We focus on data warehouse issues and the related challenges in developing and maintaining a data warehouse system. We also emphasize the issues faced in cloud computing. Throughout, we focus on overcoming the issues of data warehousing with the help of cloud data warehousing.

Keywords-Cloud computing, Elastic computing, Cloud data warehousing, Elasticity

I. INTRODUCTION

A data warehouse is a central repository where data is combined from various sources. It stores current as well as historical data, which can be used for problem-solving and decision-making. However, there are various challenges in building and maintaining a data warehouse, listed below:
• It takes extensive time to set up a data warehouse.
• Over-provisioning, the process of assigning more resources than the demand requires, can lead to an increase in costs.
• Organizations often lack the expertise and understanding needed to build and maintain a data warehouse.
• Long initial implementation time and associated high costs.
• The organization faces a number of consequences in case of system overload, downtime, or system crash.

Cloud computing can be a probable solution to these issues. Cloud computing, also known as 'on-demand computing', is web-based computing in which shared data, information, and resources are supplied to computers as needed. Some of the benefits of cloud computing are:
• In-house expertise is no longer needed within an organization in order to build and maintain a data warehouse.
• As cloud services are pay-per-use, over-provisioning can be avoided, allowing a reduction in costs.
However, several challenges arise when deploying a data warehouse over the cloud, such as cost, performance, WAN latency, and a data warehouse that does not perform as desired. In this paper, we address the issues in deploying a data warehouse in order to analyze the capacity of data warehousing in the cloud.

II. PROBLEM DEFINITION

In this paper, we focus on the issues of deploying a data warehouse over the cloud. The issues addressed in the paper are:

@IJRTER-2016, All Rights Reserved


International Journal of Recent Trends in Engineering & Research (IJRTER) Volume 02, Issue 08; August - 2016 [ISSN: 2455-1457]

(i) While using a cloud service, the customer depends on the internet and on the infrastructure of the cloud provider. As a result, importing data into the cloud for storage can be a challenge, resulting in performance as well as cost issues.
(ii) Performance issues can arise when we try to move a huge amount of data from cloud storage to virtual nodes.
(iii) Applications encounter WAN latency.
(iv) A local data warehousing system allocates CPU, disk bandwidth, and memory, but cloud providers tend to offer low-end nodes for computation. As a result, the data warehouse does not perform as desired.
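To make issues (i) to (iii) concrete, the following is a rough back-of-the-envelope sketch of how long a bulk import might take. It is not taken from the paper: the function name, the decimal-gigabyte convention, and the example figures (500 GB, 100 Mbit/s, 80 ms round trips) are illustrative assumptions, and real transfers are also affected by protocol overhead and congestion.

```python
def transfer_seconds(size_gb: float, bandwidth_mbps: float,
                     rtt_ms: float = 0.0, round_trips: int = 0) -> float:
    """Estimate the time to move size_gb of data over a WAN link.

    Assumptions (illustrative only): 1 GB = 10**9 bytes, bandwidth is in
    megabits per second, and each request/response exchange costs one
    full round-trip time (rtt_ms).
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    bits_to_send = size_gb * 8 * 1000**3
    streaming_time = bits_to_send / (bandwidth_mbps * 1_000_000)
    latency_time = round_trips * rtt_ms / 1000
    return streaming_time + latency_time


if __name__ == "__main__":
    # Hypothetical load: 500 GB over a 100 Mbit/s link,
    # issued as 1000 separate requests with an 80 ms round trip each.
    seconds = transfer_seconds(500, 100, rtt_ms=80, round_trips=1000)
    print(f"estimated import time: {seconds / 3600:.1f} hours")
```

Under these assumed figures the bandwidth term dominates, which is why the volume of data moved matters as much as the latency of individual requests.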

III. SURVEY BASED RESEARCH

In this section, we review existing efforts to integrate data warehousing and cloud computing, which help us analyze the results achieved in this field.

D. J. Abadi considers the opportunities, drawbacks, and restrictions of transferring data into the cloud. The paper concludes that existing database systems are not capable of moving data into the cloud. However, decision support systems (DSS) can be used, which in turn can take advantage of the cloud [2].

M. Brantner et al. made an effort to construct a database system on Amazon's S3 while concentrating on OLTP. The paper focuses on whether a simple database can be constructed on S3 [3].

D. Lomet et al. proposed an architecture for deploying OLTP in the cloud. The authors propose that the database system be split into transactional components and data components in order to make the database suitable for operation in the cloud [4].

S. Das et al. proposed the ElasTraS (Elastic Transactional relational database) architecture. The authors focus on transactional databases. The distributed nature of the architecture makes it suitable for parallel workloads [5].

A. Aboulnaga et al. have traced some of the challenges faced in deploying a data warehouse onto the cloud. The challenges addressed include positioning VMs across physical machines, partitioning resources across VMs, and handling dynamic workloads. A solution to these issues is provided [6].

N. W. Paton et al. proposed a method to deliver adaptive workload execution in the cloud [7].

IV. PROPOSED SOLUTION

The following concepts are proposed to address the issues mentioned in the paper.

Elasticity is the property that allows data storage to expand and scale independently, either automatically or on demand, without hindering performance and availability. One function of elasticity is that it allows storage and querying of different forms of data: unstructured, semi-structured, and structured.
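As a minimal sketch of why elasticity matters for cost, the snippet below compares the node-hours consumed when capacity follows demand with the node-hours consumed when capacity is fixed at the peak (over-provisioning). The function names, the per-node capacity, and the hourly load figures are hypothetical and chosen only for illustration; they do not describe any particular cloud provider.

```python
import math
from typing import Sequence


def nodes_needed(load: float, capacity_per_node: float, min_nodes: int = 1) -> int:
    """Smallest number of nodes that can serve the given load."""
    return max(min_nodes, math.ceil(load / capacity_per_node))


def node_hours(hourly_load: Sequence[float], capacity_per_node: float,
               elastic: bool = True) -> int:
    """Total node-hours billed over the period.

    elastic=True  : capacity is resized every hour to match demand.
    elastic=False : capacity is fixed at what the peak hour requires.
    """
    if elastic:
        return sum(nodes_needed(load, capacity_per_node) for load in hourly_load)
    peak_nodes = nodes_needed(max(hourly_load), capacity_per_node)
    return peak_nodes * len(hourly_load)


if __name__ == "__main__":
    # Hypothetical queries per hour over six hours; one node serves 250 per hour.
    load = [100, 100, 400, 900, 400, 100]
    print("elastic node-hours:", node_hours(load, 250, elastic=True))
    print("fixed node-hours:  ", node_hours(load, 250, elastic=False))
```

With a pay-per-use model the bill tracks node-hours, so the gap between the two totals is the cost of over-provisioning under these assumed numbers.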


Figure 1: Cloud Data Warehousing

When an organization moves its data to the cloud, security and trust issues can arise. One solution is to use encryption, but this leads to technical issues because encrypted data needs different forms of analysis. However, one advantage of moving data to the cloud is that the cloud provider may have the expertise to provide enhanced security that the organization moving its data might lack.

The pay-per-use model reduces cost effectively, but data transfer can still be expensive. Elasticity reduces costs because the cloud provider offers the ability to rent or release extra resources. The Snowflake data warehouse is designed specifically for the cloud; it logically integrates but physically separates the computation and storage layers. Snowflake stores its data centrally in Amazon S3, which is elastic and available. This can be used to reduce costs, thereby addressing the cost issue.

Performance issues can arise when we try to move a large amount of data from cloud storage to virtual nodes. To address this, the workload and data can be distributed across multiple nodes in the cloud. However, this can cause problems when querying, because a single node cannot store a large partition, so results must be fetched from multiple nodes, which is difficult. If data marts can be deployed on one node, the issues of using multiple nodes can be avoided. The performance issue can also be addressed by using Azure SQL Data Warehouse, which uses a parallel processing architecture. With Azure, data can be scaled without interruption and resources can be added as required, thereby reducing cost.

To address WAN latency, the architecture needs APIs. APIs reduce latency by allowing local system access without having to go to the main server continuously. If the data warehouse server is located at a large geographical distance, latency can become an issue.
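The idea of distributing data and workload across multiple nodes, and the extra step of fetching results from all of them, can be sketched as follows. This is a simplified in-memory illustration, not the mechanism of Snowflake or Azure SQL Data Warehouse: the "nodes" are plain Python lists, and the function names and sample rows are hypothetical. Rows are assigned to nodes by hashing a key, each node computes a partial aggregate, and a coordinator merges the partial results.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (region, sale_amount), an illustrative schema


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the key (crc32 is stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows: Iterable[Row], num_nodes: int) -> List[List[Row]]:
    """Distribute rows across nodes so that equal keys share a node."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        nodes[node_for(region, num_nodes)].append((region, amount))
    return nodes


def local_totals(node_rows: Iterable[Row]) -> Dict[str, float]:
    """Work done on one node: aggregate only the rows it holds."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in node_rows:
        totals[region] += amount
    return dict(totals)


def gather(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Coordinator step: merge the partial results from every node."""
    merged: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
    nodes = partition(sales, num_nodes=3)
    print(gather(local_totals(rows) for rows in nodes))
```

The gather step is the part that becomes costly as the number of nodes grows, which is one way to read the suggestion that a data mart small enough for a single node avoids the problem.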
Data has to be efficiently imported into and exported from the cloud. We can reduce the bandwidth needed to transfer data by compressing it.

V. ADVANTAGES OF CLOUD DATA WAREHOUSING

1. Reduction in costs by sharing resources such as expensive servers, network equipment, and IT personnel.
2. Cloud computing lets clients focus on adding value in their fields of core competence, for instance business and process insight, instead of building and maintaining IT infrastructure.


3. The cloud can load data rapidly, which lets the vendor work with enormous data sets within the shorter time frames specified by clients, and also serve additional customers at a given time.
4. Once the data is loaded into the data warehouse, it needs to be organized optimally to facilitate efficient query execution. Optimizations can be performed manually or automatically with the help of the database management system. This enables SaaS BI vendors to fine-tune the organization of the stored data.
5. Faster query execution with the help of cloud infrastructure enables quick analytical reporting.

VI. CONCLUSION

In this paper, we have examined both data warehousing and cloud computing. We have discussed the challenges in both fields and have provided solutions to address the various issues. It is our opinion that, owing to qualities of the cloud such as scalability, elasticity, reliability, deployment time, and the pay-per-use model, cloud data warehousing systems have great potential. However, cloud computing is incapable of performing large-scale data warehousing due to factors such as cloud performance, data transfer speed, and pricing.

REFERENCES

1. Kees van Gelder (2011). Elastic Data Warehousing in the Cloud.
2. D. Abadi (2009). Data Management in the Cloud: Limitations and Opportunities.
3. M. Brantner, D. Florescu, D. Graf, D. Kossmann, T. Kraska (2008). Building a Database on S3. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data.
4. D. Lomet, A. Fekete, G. Weikum, M. Zwilling (2009). Unbundling Transaction Services in the Cloud. In CIDR Perspectives.
5. S. Das, S. Agarwal, D. Agrawal, A.E. Abbadi (2009). ElasTraS: An Elastic, Scalable, and Self-Managing Transactional Database for the Cloud. In USENIX HotCloud.
6. A. Aboulnaga, K. Salem, A.A. Soror, U.F. Minhas, P. Kokosielis, S. Kamath (2009). Deploying Database Appliances in the Cloud. IEEE Data Eng. Bull.
7. N.W. Paton, M.A.T. de Aragão, K. Lee, A.A.A. Fernandes, R. Sakellariou (2009). Optimizing Utility in Cloud Computing through Autonomic Workload Execution. IEEE Data Eng. Bull.
