Proc. of the IE 2014 International Conference www.conferenceie.ase.ro

COST OPTIMIZATION FOR DATA STORAGE IN PUBLIC CLOUDS: A USER PERSPECTIVE

Catalin NEGRU, University Politehnica of Bucharest
Florin POP, University Politehnica of Bucharest
Valentin CRISTEA, University Politehnica of Bucharest

Abstract. The Cloud has become more attractive and easier to use for everyone, and people are more familiar with the services and environments it provides. One of the main aims of large-scale distributed systems such as the Cloud is to solve the problems of data storage, data availability, and especially data security in public, shared environments where providers impose costs on end-users. In this paper we propose a cost minimization model based on binary linear optimization. The main goal is to find a cost-efficient storage scheme for many heterogeneous data blocks using multiple public Cloud storage providers. The proposed model is validated through simulation using a realistic scenario: CyberWater, a cyber-infrastructure for natural resource management.

Keywords: Cloud Computing, Storage Systems, Cost Models, Binary Linear Optimization, CyberWater.

JEL classification: C6 Mathematical Methods and Programming; C8 Data Collection and Data Estimation Methodology; Computer Programs.

1. Introduction

More and more people around the world produce digital data in their daily lives (photos, sounds, videos, documents, etc.), which leads to exponential growth and creates a challenge for storage and management. Moreover, at the business level, organizations such as companies need to store and access their data and make part of it public to their customers. In the global vision of smart cities, Public Cloud Storage services become a fundamental part of the platform architecture for such systems. In all of these scenarios, data must be available 24x7 in any location and from any device.

A possible solution for this challenge is represented by Cloud Computing services. In Cloud computing everything is provided as a service to the end-user, in a functional, usable and extremely powerful manner, permitting the use of software resources in a pay-per-use manner. In this way the user has great flexibility to adapt to changes [11]. There are plenty of Cloud providers, such as Amazon S3 [1], Google [2], Microsoft Azure [3], Dropbox [4] and so on, that offer Cloud services (e.g. Storage as a Service, Infrastructure as a Service, Software as a Service, etc.). Clients that pay for Cloud storage services can store their data and retrieve it via standard access methods (e.g. PUTs, GETs, REST, SOAP, etc.). In this way customers do not have to deal with the complexities associated with the setup and management of the data centre for the underlying storage infrastructure.

Compared with traditional storage systems, Cloud storage has the following benefits: reduced cost, obtained especially from the pay-as-you-go model and economies of scale, easy access to data, speed and agility. By 2015 almost 20% of all digital information will be processed in the Cloud and 10% will be maintained on Cloud storage, according to the IDC report [5].

With the adoption of Cloud storage, there are two sides to the cost optimization problem. First, the Cloud storage provider must calculate its total cost of ownership and price its services adequately in order to make a profit and amortize its investment. Second, a Cloud user must calculate the total cost of storing data in the Cloud and minimize it as much as possible. The different components and their owners for Storage as a Service are presented in Table 1 [11].

Component | Storage as a Service
Business Process | Customer
Business Logic | Customer
Middleware Management | Cloud Storage Provider
Application Licensing / Support | Cloud Storage Provider
OS Management | Cloud Storage Provider
OS Licensing / Support | Cloud Storage Provider
Server/Storage/Networks HW / Maintenance | Cloud Storage Provider
Domestic Utilities | Cloud Storage Provider
Maintenance Equipment | Cloud Storage Provider
Real Estate | Cloud Storage Provider

Table 1. Total cost of ownership perspective

A major problem with Cloud storage services is "vendor lock-in", which refers to dependence on a single Cloud storage provider. Switching from one provider to another can be expensive, as Cloud storage providers charge for inbound bandwidth, outbound bandwidth and data requests. A client who wants to move from one provider to another must pay for the bandwidth twice, in addition to the actual cost of online storage [6]. The authors of [9] propose a secured cost-effective multi-Cloud storage model in Cloud computing, which holds an economical distribution of data among the providers available in the market, to provide customers with data availability as well as secure storage. The authors of [6] propose applying RAID-like techniques at the Cloud storage level, meaning striping of user data across multiple providers. We propose a model based on binary linear programming, which uses real information and real scenarios and aims to offer the best storage scheme at minimum cost.

This paper is organized as follows: Section 2 presents cost models for 10 chosen public Cloud storage providers and the need for optimization; Section 3 describes the optimization model; experiments through simulation and interpreted results are discussed in Section 4; conclusions and future work are highlighted in Section 5.

2. Cost Models for Public Cloud Storage

In Cloud computing, cost models represent one of the fundamental building blocks. Cloud storage providers offer a very wide portfolio of services, while clients access them under some financial arrangement. Cloud storage providers offer many pricing schemes. Table 2 presents the Cloud storage providers and the pricing schemes they use. It is important to understand these pricing schemes in order to minimize the cost when buying Cloud storage services, and also to understand how the bill will be issued under each model. We identified the following pricing models:

- Consumption-based pricing: the client pays for the resources that are used (such as storage capacity and bandwidth);
- Subscription-based pricing: the client pays for the use of the service on a monthly basis;
- Advertising-based pricing: the client receives a small amount of resources at no charge, but in exchange receives a lot of advertising;
- Market-based pricing: the price of resources is set by the market, and the client can buy a resource and use it right away.

In order to calculate the total cost of storing data in the Cloud, the following parameters must be taken into consideration: the storage capacity used on a monthly basis, the cost of outbound data transfer (data transferred outside the Cloud), and the cost of additional features offered by Cloud providers, such as increased redundancy, backup, archive, and so on. The big problem when calculating the total cost is that not all Cloud storage providers offer a transparent scheme for cost calculation, offering instead "packages" (such as personal plans, pro plans or server plans) at a given cost.

Cloud Storage Provider | Storage Price ($ per GB per month) | Data Transfer Out ($ per GB) | PUT & GET Requests (1 million)
Dropbox | 0.09 | NA | NA
Amazon S3 | 0.085 | 0.12 | 0.005
Google Cloud Storage | 0.0449 | 0.12 | 10
Azure | 0.5 | 0.12 | 0.1
LiveDrive | 0.0037 | NA | NA
Carbonite | 1.9 | NA | NA
Copy | 0.039 | NA | NA
Just Cloud | 0.039 | NA | NA
Rackspace | 0.10 | 0.12 | -
HP Cloud Services | 0.10 | 0.12 | 0.10

Table 2. Cloud storage providers' price comparison
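The cost parameters listed above can be combined into a single monthly bill. The following is a minimal sketch under stated assumptions: the function name `monthly_cost`, the dictionary layout and the example workload are hypothetical, the rates are copied from the Amazon S3 row of Table 2, and the cost of additional features (redundancy, backup, archive) is left out because the paper gives no rates for them.

```python
def monthly_cost(stored_gb, transfer_out_gb, requests, price):
    """Consumption-based monthly bill in dollars.

    `price` maps 'storage' ($/GB/month), 'transfer_out' ($/GB) and
    'requests' ($ per 1 million requests) to rates. A rate of None
    stands for an "NA" entry and contributes nothing to the bill.
    """
    storage = stored_gb * price["storage"]
    transfer = transfer_out_gb * (price["transfer_out"] or 0.0)
    request_fee = requests / 1_000_000 * (price["requests"] or 0.0)
    return storage + transfer + request_fee


# Rates copied from the Amazon S3 row of Table 2.
AMAZON_S3 = {"storage": 0.085, "transfer_out": 0.12, "requests": 0.005}

# Hypothetical workload: 200 GB stored, 50 GB downloaded, 2 million requests.
bill = monthly_cost(200, 50, 2_000_000, AMAZON_S3)
print(f"${bill:.2f}")  # prints $23.01
```

Providers that only sell fixed packages do not fit this per-unit formula, which is the transparency problem noted above.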

3. Cost Optimization Model using Linear Binary Programming

We deal with the following problem: a finite set of \(N\) heterogeneous data blocks (with different sizes \(DB_d\), \(1 \le d \le N\)) and a finite set of \(M\) Cloud storage providers are given. A specific provider \(CP_p\), \(1 \le p \le M\), with a defined cost \(Cost_p\), stores one or more blocks of data. Each provider has a maximum storage capacity \(C_p\). Each link between a specific location \(loc\) and a Cloud provider \(p\) has a different transfer cost \(TC_{loc,p}\). We consider \(L\) locations that can act as data sources (\(1 \le loc \le L\)).

We use a linear programming technique to model the problem of cost optimization. Linear programming is a method for achieving the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. This method is one of the most widely used in industry, where it can save companies thousands or even millions of dollars a year.

Let the binary variable \(x_{d,p}\) denote whether Cloud provider \(p\) is selected for data block \(d\). The cost optimization problem can thus be formulated as the following linear binary optimization problem:

\[ TotalStorageCost = \sum_{d=1}^{N} \sum_{p=1}^{M} x_{d,p} \, Cost_p \, DB_d \]

TotalStorageCost is the first function that is subject to minimization.

\[ \sum_{p=1}^{M} x_{d,p} = 1 \quad (1 \le d \le N) \qquad \text{and} \qquad \sum_{d=1}^{N} \sum_{p=1}^{M} x_{d,p} = N \]

Each data block will be stored on one Cloud provider and all data blocks will be stored.

\[ \sum_{d=1}^{N} x_{d,p} \, DB_d \le C_p \quad (1 \le p \le M) \]

The storage scheme respects the maximum capacity of each Cloud provider \(p\).

\[ TotalTransferCost = \sum_{loc=1}^{L} \sum_{p=1}^{M} TC_{loc,p} \, DB_{loc} \]

TotalTransferCost is the second function that is subject to minimization. \(DB_{loc}\) is the total data produced in a specific location.
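To make the storage part of the model concrete, the sketch below searches all assignments of blocks to providers and keeps the cheapest one that respects the capacity constraint. It is an illustration only: exhaustive search stands in for a binary linear programming solver (the paper does not name the solver it uses), the function name `cheapest_assignment` and the instance data are hypothetical, and the transfer cost term is not included.

```python
from itertools import product


def cheapest_assignment(block_sizes, cost_per_gb, capacity):
    """Exhaustively search assignments of data blocks to providers.

    assignment[d] == p corresponds to x[d][p] = 1 in the model, so each
    block is stored on exactly one provider by construction.
    Returns (total_storage_cost, assignment), or None if nothing fits.
    """
    providers = range(len(cost_per_gb))
    best = None
    for assignment in product(providers, repeat=len(block_sizes)):
        # Capacity constraint: sum_d x[d][p] * DB[d] <= C[p] for every p.
        used = [0.0] * len(cost_per_gb)
        for d, p in enumerate(assignment):
            used[p] += block_sizes[d]
        if any(u > c for u, c in zip(used, capacity)):
            continue
        # Objective: sum_d sum_p x[d][p] * Cost[p] * DB[d].
        cost = sum(cost_per_gb[p] * block_sizes[d]
                   for d, p in enumerate(assignment))
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical instance: three blocks (GB) and two providers.
blocks = [100, 200, 400]
cost = [0.05, 0.10]   # $ per GB per month
cap = [300, 1000]     # GB
total, assignment = cheapest_assignment(blocks, cost, cap)
print(total, assignment)
```

The search visits M^N assignments, so it is only practical for very small instances; for realistic sizes a dedicated integer programming solver would be the natural choice.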

The CyberWater scenario serves as a proof of concept for our proposed model. CyberWater (cyber-infrastructure for natural resource management) [7] is a national project aiming to create a repository for data concerning polluted water management. All this data must be stored and managed in a cost-effective way. The sources of data are heterogeneous (sensors, data from public institutions), in various formats (e.g. xml, xls, csv, pdf) and geographically distributed [8]. There are two possible solutions for this situation: the first is to store all data in a private data centre and bear all the costs of storage and management; the second is to store data in different locations, at different Cloud storage providers that are near the data sources. The question that arises here is how to achieve the optimal cost of storing and managing the data if it is stored at different Cloud providers.

We impose the requirement that all data blocks be stored, so it is possible that some data blocks must be split. This is necessary for the optimization process, but is transparent to the end user. We choose 5 Cloud providers (presented in Table 2). Table 3 presents the storage costs per 1 GB of data for 1 month and the cost of transfer from data sites to a storage provider ($/GB). We also consider 4 geographical areas of Romania where data is produced. The data in Table 3 is taken from the Cloud providers' public web sites (see footnote 1).

 | Amazon | Google | Azure | Rackspace | HP
Data storage price (per GB/mo) | $0.085 | $0.0449 | $0.050 | $0.100 | $0.100

Cost of transfer from data sites to a storage provider ($/GB)
 | Amazon | Google | Azure | Rackspace | HP
Romania S-E | $0.08 | $0.08 | $0.08 | $0.07 | $0.08
Romania S-W | $0.11 | $0.11 | $0.11 | $0.10 | $0.11
Romania N-E | $0.12 | $0.12 | $0.12 | $0.12 | $0.12
Romania N-W | $0.12 | $0.12 | $0.12 | $0.12 | $0.12

Table 3. Cost of data transfer from storage provider to data sites (in $ for 1 GB/mo)

Footnote 1:
Amazon: http://calculator.s3.amazonaws.com/index.html
Google: https://developers.google.com/storage/pricing
Azure: http://www.windowsazure.com/en-us/pricing/calculator/
Rackspace: http://www.rackspace.com/cloud/files/pricing/
HP: http://www.hpcloud.com/pricing/calculator

4. Experimental results

We present three scenarios for the CyberWater project, aiming to show that it is possible to optimize an initial storage scheme and reduce the final costs. In all scenarios we considered five Cloud storage providers (Amazon, Google, Azure, Rackspace and HP), four geographical areas of Romania (S-E, S-W, N-E, N-W) and different data blocks (DB).

Scenario 1: We set up the initial capacity of all Cloud providers with specified values. For the considered data blocks, the proposed optimization scheme distributes the data to all providers and splits the data for \(DB_4\) into two different blocks. The final capacity is equal to the initial capacity, so there is no optimization.

Amount of data to store (GB) | Amazon | Google | Azure | Rackspace | HP | Total | Data Blocks (DB)
Romania S-E | 0 | 0 | 0 | 0 | 200 | 200 | 200
Romania S-W | 0 | 0 | 0 | 400 | 100 | 500 | 500
Romania N-E | 0 | 250 | 200 | 0 | 0 | 450 | 450
Romania N-W | 200 | 350 | 0 | 0 | 0 | 550 | 550
Total Data Stored | 200 | 600 | 200 | 400 | 300 | |

 | Amazon | Google | Azure | Rackspace | HP | TOTAL
Initial Capacity | 200 | 600 | 200 | 400 | 300 |
Total Cost for Initial Capacity | $17.00 | $26.94 | $10.00 | $40.00 | $30.00 | $123.94
Final Capacity | 200 | 600 | 200 | 400 | 300 |
Total Cost for Final Capacity | $17.00 | $26.94 | $10.00 | $40.00 | $30.00 | $123.94
Cost for data storage (Final) | $17.00 | $26.94 | $10.00 | $40.00 | $30.00 | $123.94
Cost of data transfer | $24.00 | $72.00 | $24.00 | $40.00 | $27.00 | $187.00

Total cost: $310.94
Final Gain: 0%
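The Scenario 1 totals can be recomputed from the allocation and the Table 3 prices. The sketch below is one way to do that check; the function name `scenario_costs` and the list layout are hypothetical, while the numbers are copied from Table 3 and from the Scenario 1 allocation.

```python
# Column order: Amazon, Google, Azure, Rackspace, HP.
# Row order: Romania S-E, S-W, N-E, N-W.

# Table 3: storage price ($/GB/month) and transfer cost ($/GB).
STORAGE_PRICE = [0.085, 0.0449, 0.050, 0.100, 0.100]
TRANSFER_COST = [
    [0.08, 0.08, 0.08, 0.07, 0.08],  # S-E
    [0.11, 0.11, 0.11, 0.10, 0.11],  # S-W
    [0.12, 0.12, 0.12, 0.12, 0.12],  # N-E
    [0.12, 0.12, 0.12, 0.12, 0.12],  # N-W
]

# Scenario 1 allocation in GB (rows = regions, columns = providers).
ALLOCATION = [
    [0, 0, 0, 0, 200],
    [0, 0, 0, 400, 100],
    [0, 250, 200, 0, 0],
    [200, 350, 0, 0, 0],
]


def scenario_costs(allocation, storage_price, transfer_cost):
    """Return (storage cost, transfer cost) in dollars, rounded to cents."""
    storage = sum(gb * storage_price[p]
                  for row in allocation for p, gb in enumerate(row))
    transfer = sum(gb * transfer_cost[r][p]
                   for r, row in enumerate(allocation)
                   for p, gb in enumerate(row))
    return round(storage, 2), round(transfer, 2)


storage, transfer = scenario_costs(ALLOCATION, STORAGE_PRICE, TRANSFER_COST)
print(storage, transfer, round(storage + transfer, 2))
```

With these inputs the storage cost, transfer cost and total come out at 123.94, 187.0 and 310.94 dollars, matching the figures reported for Scenario 1.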

Scenario 2: We set up the initial capacity of all Cloud providers with a high value. For the considered data blocks, the proposed optimization scheme splits \(DB_1\) and \(DB_4\) into two different blocks each. The final capacity is less than the initial capacity, which gives a 64% reduction in storage cost. We can observe that the Google and Azure Cloud providers, which have the lowest prices, are used at their maximum capacity.

Amount of data to store (GB) | Amazon | Google | Azure | Rackspace | HP | Total | Data Blocks (DB)
Romania S-E | 0 | 450 | 550 | 0 | 0 | 1000 | 1000
Romania S-W | 0 | 500 | 0 | 0 | 0 | 500 | 500
Romania N-E | 0 | 0 | 450 | 0 | 0 | 450 | 450
Romania N-W | 500 | 50 | 0 | 0 | 0 | 550 | 550
Total Data Stored | 500 | 1000 | 1000 | 0 | 0 | |

 | Amazon | Google | Azure | Rackspace | HP | TOTAL
Initial Capacity | 1000 | 1000 | 1000 | 1000 | 1000 |
Total Cost for Initial Capacity | $85.00 | $44.90 | $50.00 | $100.00 | $100.00 | $379.90
Final Capacity | 500 | 1000 | 1000 | 0 | 0 |
Total Cost for Final Capacity | $42.50 | $44.90 | $50.00 | $0.00 | $0.00 | $137.40
Cost for data storage (Final) | $42.50 | $44.90 | $50.00 | $0.00 | $0.00 | $137.40
Cost of data transfer | $60.00 | $97.00 | $98.00 | $0.00 | $0.00 | $255.00

Total cost: $392.40
Final Gain: 64%
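The per-provider totals in Scenario 2 are what a simple "cheapest provider first" fill would produce. The sketch below illustrates that observation and is not the paper's optimization procedure: the function name `greedy_fill` is hypothetical, it looks only at storage price, ignores transfer costs, and does not decide which region's data goes to which provider.

```python
def greedy_fill(total_gb, price, capacity):
    """Fill providers in order of increasing storage price.

    Returns a dict mapping provider name to the GB stored there.
    """
    remaining = total_gb
    stored = {}
    for name in sorted(price, key=price.get):
        take = min(remaining, capacity[name])
        stored[name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("total capacity is too small for the data")
    return stored


# Storage prices from Table 3 ($/GB/month); Scenario 2 capacities (GB).
PRICE = {"Amazon": 0.085, "Google": 0.0449, "Azure": 0.050,
         "Rackspace": 0.100, "HP": 0.100}
CAPACITY = {name: 1000 for name in PRICE}

# Scenario 2 stores 1000 + 500 + 450 + 550 = 2500 GB in total.
stored = greedy_fill(2500, PRICE, CAPACITY)
storage_cost = sum(stored[name] * PRICE[name] for name in stored)
print(stored, round(storage_cost, 2))
```

The fill places 1000 GB on Google, 1000 GB on Azure and 500 GB on Amazon, for a storage cost of 137.40 dollars, the same per-provider totals as in the Scenario 2 table.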

Scenario 3: We set up the initial capacity of the Google Cloud provider with a high value, considering that it has the lowest price. No blocks are split and all of them are stored on Google. The cost reduction is 80% relative to the initial scheme.

Amount of data to store (GB) | Amazon | Google | Azure | Rackspace | HP | Total | Data Blocks (DB)
Romania S-E | 0 | 1000 | 0 | 0 | 0 | 1000 | 1000
Romania S-W | 0 | 500 | 0 | 0 | 0 | 500 | 500
Romania N-E | 0 | 450 | 0 | 0 | 0 | 450 | 450
Romania N-W | 0 | 550 | 0 | 0 | 0 | 550 | 550
Total Data Stored | 0 | 2500 | 0 | 0 | 0 | |

 | Amazon | Google | Azure | Rackspace | HP | TOTAL
Initial Capacity | 1000 | 5000 | 1000 | 1000 | 1000 |
Total Cost for Initial Capacity | $85.00 | $224.50 | $50.00 | $100.00 | $100.00 | $559.50
Final Capacity | 0 | 2500 | 0 | 0 | 0 |
Total Cost for Final Capacity | $0.00 | $112.25 | $0.00 | $0.00 | $0.00 | $112.25
Cost for data storage (Final) | $0.00 | $112.25 | $0.00 | $0.00 | $0.00 | $112.25
Cost of data transfer | $0.00 | $255.00 | $0.00 | $0.00 | $0.00 | $255.00

Total cost: $367.25
Final Gain: 80%

We can observe that in all considered scenarios the cost of data transfer is higher than the storage cost, so this is an important cost factor that must be considered in any optimization model.
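The paper does not state how the final gain is computed. The sketch below assumes it is the relative reduction of the total cost for the final capacity with respect to the total cost for the initial capacity, an interpretation that reproduces the three reported percentages. The function name `final_gain` is hypothetical; the cost pairs are copied from the scenario tables.

```python
def final_gain(initial_cost, final_cost):
    """Relative storage cost reduction (assumed definition of the gain)."""
    return 1.0 - final_cost / initial_cost


# (total cost for initial capacity, total cost for final capacity) in $.
SCENARIOS = {
    1: (123.94, 123.94),
    2: (379.90, 137.40),
    3: (559.50, 112.25),
}

gains = {k: final_gain(initial, final)
         for k, (initial, final) in SCENARIOS.items()}
for k, gain in gains.items():
    print(f"Scenario {k}: {gain:.0%}")
```

Under this assumption the gains come out at 0%, 64% and 80%. The gain covers storage cost only; transfer cost is not part of it.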

5. Conclusions

In this paper we propose a binary linear optimization method for cost optimization when buying Cloud storage capacity from public Cloud providers. We can conclude that the final gain depends on the initial data demand; however, in some cases it is surprisingly high (e.g. 80%), so this method can be successfully used in scenarios that involve public Clouds.

Acknowledgment

The research presented in this paper is supported by the following projects: CyberWater, grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012, and SideSTEP, Scheduling Methods for Dynamic Distributed Systems: a self-* approach (PN-II-CT-RO-FR-2012-1-0084). We would like to thank the reviewers for their time and expertise, constructive comments and valuable insights.

References

[1] Palankar, M. R., Iamnitchi, A., Ripeanu, M., & Garfinkel, S. (2008, June). Amazon S3 for science grids: a viable solution? In Proc. of the 2008 int. workshop on Data-aware distributed computing (pp. 55-64). ACM.
[2] Wada, H., Fekete, A., Zhao, L., Lee, K., & Liu, A. (2011). Data Consistency Properties and the Trade-offs in Commercial Cloud Storage: the Consumers' Perspective. In CIDR (Vol. 11, pp. 134-143).
[3] Calder, B., Wang, J., Ogus, A., Nilakantan, N., Skjolsvold, A., McKelvie, S., & Rigas, L. (2011, October). Windows Azure Storage: a highly available Cloud storage service with strong consistency. In Proc. of the Twenty-Third ACM Symposium on Operating Systems Principles (pp. 143-157). ACM.
[4] Drago, I., Mellia, M., M Munafo, M., Sperotto, A., Sadre, R., & Pras, A. (2012, November). Inside dropbox: understanding personal Cloud storage services. In Proc. of the 2012 ACM conference on Internet measurement conference (pp. 481-494). ACM.
[5] S. Lawson. IDC: Efficiency will hold down storage growth. Internet (Jun 18, 2013 [March 12, 2014]): http://www.pcworld.com/article/2042272/efficiency-will-hold-down-storage-growth-idc-says.html.
[6] Abu-Libdeh, H., Princehouse, L., & Weatherspoon, H. (2010, June). RACS: a case for Cloud storage diversity. In Proc. of the 1st ACM symposium on Cloud computing (pp. 229-240). ACM.
[7] Mocanu, M., & Craciun, A. (2012, September). Monitoring Watershed Parameters through Software Services. In EIDWT (pp. 287-292).
[8] Mocanu, M., Vacariu, L., Drobot, R., & Muste, M. (2013, May). Information-Centric Systems for Supporting Decision-Making in Watershed Resource Development. In Control Systems and Computer Science (CSCS), 2013 19th International Conference on (pp. 611-616). IEEE.
[9] Singh, Y., Kandah, F., & Zhang, W. (2011, April). A secured cost-effective multi-Cloud storage in Cloud computing. In Computer Communications Workshops (INFOCOM WKSHPS), 2011 IEEE Conference on (pp. 619-624). IEEE.
[10] Linear programming. Internet: http://en.wikipedia.org/wiki/Linear_programming, March 17, 2014.
[11] Negru, C., & Cristea, V. (2013). Cost models-pillars for efficient Cloud computing: position paper. International Journal of Intelligent Systems Technologies and Applications, 12(1), 28-38.
