Interoperable Data Management by Utilizing Personal and Infrastructure Clouds
Attila Kertesz
Abstract: Cloud computing has reached a level of maturity and popularity at which various cloud services have become a part of our lives. Mobile devices also benefit from cloud services. The huge amount of data users produce with these devices is continuously posted to online services, which requires the use of several cloud providers at the same time to store these data efficiently. The Internet of Things provides a way to improve social networking through interdisciplinary efforts that can be effectively supported by cloud computing solutions. In this article we propose novel approaches for composing and interoperating cloud solutions to support IoT functionality. We provide a novel way to unite separate personal clouds in an autonomous way to cope with the enormous amount of data users produce and share in social networks. We also exemplify how to manage, share and process user data produced by mobile devices in different IaaS clouds.
Keywords: Cloud Computing; Data Interoperability; Personal Clouds; IoT
• A. Kertesz is with the University of Szeged and MTA SZTAKI, Hungary. E-mail: [email protected]
This research was supported by the European Union and the State of Hungary, co-financed by the European Social Fund in the framework of TAMOP 4.2.4. A/2-11-1-2012-0001 'National Excellence Program'.

Nowadays the cloud computing paradigm has reached a level of maturity at which various cloud services have become a part of our lives. These services are offered at different levels of the cloud stack, ranging from the lowest infrastructure level to the highest software or application level. Within Infrastructure as a Service (IaaS) solutions we can differentiate public, private, hybrid and community clouds according to recent reports of standardization bodies [7]. The latter two types may utilize more than one cloud system, which is also called a cloud federation [8]. One of the open issues of such federations is the interoperable management of data among the participating systems.
Another popular family of cloud services is called cloud storage services or personal clouds. With the help of such solutions, user data can be stored in a remote location and can be accessed from anywhere. Mobile devices can also benefit from these cloud services. The enormous amount of data users produce with these devices is continuously posted to online services, which may require the use of several cloud providers at the same time to store and retrieve these data efficiently. The aim of our research is to develop a solution that unites and manages separate personal clouds in an autonomous way to meet these user needs.
The Cluster of European Research Projects on the Internet of Things (CERP-IoT) considers the Internet of Things (IoT) a vital part of the Future Internet. They defined it as a dynamic global network infrastructure with self-configuring capabilities based on standard and interoperable communication protocols. Things in this network interact and communicate among themselves and with the environment by exchanging sensed data and information. They react autonomously to events and influence them by triggering actions with or without direct human intervention [1]. Gubbi et al. [2] have identified that, to support the IoT vision, the current computing paradigm needs to go beyond traditional mobile computing scenarios and evolve into connecting everyday objects by embedding intelligence. Cloud computing has the potential to address these needs, and it is able to hide data generation, processing and visualization tasks in order to tap the full potential of the Internet of Things
in various application domains. In this article we propose novel solutions for interoperable personal data management in clouds to partly contribute to the evolution of the Internet of Things. Our approaches enable sharing user data among cloud providers in an autonomous way, and managing and processing the data produced by mobile devices in different clouds transparently.
Regarding related work, the need for data interoperability and the extensive use of cloud storage services have been identified by various research and expert groups (e.g. [7], [5], [3]). Dillon et al. [4] gathered several interoperability issues that need to be considered in cloud research, and named a new category called Data Storage as a Service to draw attention to the problem of data management in clouds. Garcia-Tinedo et al. [6] have also addressed performance issues of personal clouds. They developed a tool for actively measuring three providers: Dropbox, Box.com and SugarSync. They performed measurements for two months with various data transfer load models to search for interdependencies among data sizes, transfer quality and speed. They published their measurement data and concluded that these providers have different service levels, and that they often limit the download speed. This work also served as a motivation for our research, but we decided to develop a more lightweight and easily extensible measuring tool to support our further research goal of autonomous data sharing among these providers.
Research towards integrating the mobile world and clouds has been surveyed by Fernando et al. [11]. They argue that exploiting the full potential of mobile computing is difficult due to its inherent problems such as resource scarcity, frequent disconnections, and mobility. On the other hand, mobile cloud computing can address these problems by executing mobile applications on resource providers external to the mobile device.
The integration of IoT and clouds has been envisioned by Botta et al. [9], who summarized their main properties, features, underlying technologies, and open issues. A solution for merging IoT and clouds is proposed by Nastic et al. [10]. They argue that system designers and operations managers face numerous challenges in realizing IoT cloud systems in practice, due to the complexity and diversity of their requirements in terms of IoT resource consumption, customization and runtime governance. With this work we plan to make a step forward in this field by combining cloud computational and data services with the capabilities of mobile devices. This vision brings these technologies closer to users, and provides a simple way to use heterogeneous cloud services in order to ease their lives.
AN APPROACH FOR AUTONOMOUS DATA MANAGEMENT AMONG PERSONAL CLOUDS
The largest amount of user-provided data is stored at cloud storage services, also called personal clouds, which are also relevant for IoT. Their popularity is due to easy access and sharing through various interfaces and devices, and to synchronization, version control and backup functionalities. The freemium nature of these services maintains a growing user community, and their high number of users also encourages the development of other higher level services that make use of their cloud functionalities. To overcome the limits of freely granted storage, users may sign up to services of different providers and distribute their data manually among them. In this situation, tracking the amount and location of the already uploaded files and splitting larger files can be a difficult task for everyday users, which leads to a provider selection problem, not to mention the different capabilities of the providers concerning data transfer speeds. These facts serve as a motivation for our research, and one of the main goals of this work is to propose a higher level service that helps users to better manage their data by providing transparent and automated access to a unified storage over these clouds.
In order to exemplify our approach we address four providers, namely Dropbox [14], Google Drive [13], SugarSync [15] and Box.com [16]. Drew Houston founded Dropbox Inc in 2007, and by 2011 it reached 14% market share by having 50 million registered users, while
in 2013 it exceeded 200 million. Its freemium model grants 2 GB of storage for a new registration, which can be extended up to 8 GB by inviting others or performing certain tasks. Google Drive is the personal cloud solution of Google. It was launched in 2012, but it has several predecessors such as Google Docs, available since 2006. It also serves as an in-house data store for several other Google services, therefore it provides 15 GB for free to a new user. Thanks to the coupled services of Google, its web interface is capable of previewing numerous file formats in a browser. SugarSync was launched in 2009, but its predecessor Sharpcast Photos dates back to 2006. It provided 5 GB of free storage for a newly registered user until December 2013, when the owners announced that the freemium service would be closed by February 2014. Since then its free service is only valid for a 30-day trial period. Box.com was founded as a startup company in 2005. It has had a built-in file preview functionality since 2010. It provides 10 GB of free storage for a new user.
Our proposed application to group these clouds consists of three components:
• the MeasureTool component for performing monitoring processes,
• the DistributeTool component for splitting and distributing files,
• and the CollectTool component for retrieving the split parts of a required file.
The MeasureTool component implements three basic functions: connecting to a user account at a certain provider, uploading certain files to the storage of this account, and downloading them from it. It has a plugin-based structure to separate the methods for different providers and to enable support for further providers. A monitoring process for measuring the performance of a provider consists of generating a file of a predefined size with randomized content, uploading this file to the provider's storage under a given user account, then downloading this file back to the host of the application.
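The monitoring process described above can be illustrated with a minimal Python sketch. The `ProviderPlugin` interface, the `InMemoryProvider` stand-in and the `measure` function are hypothetical names introduced only for this illustration; they are not the actual MeasureTool code, and a real plugin would call the API of the given provider instead of a local dictionary.

```python
import os
import time
from typing import Dict, Protocol


class ProviderPlugin(Protocol):
    """Hypothetical plugin interface: one implementation per provider."""

    def connect(self, account: str) -> None: ...
    def upload(self, name: str, data: bytes) -> None: ...
    def download(self, name: str) -> bytes: ...


class InMemoryProvider:
    """Stand-in for a real provider plugin so that the sketch runs offline."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}
        self.account = ""

    def connect(self, account: str) -> None:
        self.account = account

    def upload(self, name: str, data: bytes) -> None:
        self._files[name] = data

    def download(self, name: str) -> bytes:
        return self._files[name]


def measure(provider: ProviderPlugin, account: str, size_bytes: int) -> Dict[str, float]:
    """Run one monitoring round: generate, upload, download, and time each transfer."""
    payload = os.urandom(size_bytes)  # file of a predefined size with randomized content
    provider.connect(account)

    start = time.perf_counter()
    provider.upload("probe.bin", payload)
    upload_s = time.perf_counter() - start

    start = time.perf_counter()
    received = provider.download("probe.bin")
    download_s = time.perf_counter() - start

    if received != payload:
        raise IOError("downloaded probe differs from the uploaded probe")
    return {"upload_s": upload_s, "download_s": download_s}


# One round with a 5 MB probe file against the offline stand-in.
result = measure(InMemoryProvider(), "demo-user", 5 * 1024 * 1024)
```

Keeping the provider behind a small interface mirrors the plugin-based structure mentioned above: supporting a further provider would only require another class with the same three methods.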
The main task of the DistributeTool component is to apply certain policies for splitting up and packaging files to be distributed among the participating cloud providers in an efficient way. The file to be uploaded to the providers' storage is first split into a predefined number of equally sized files, which we call chunks. The second step decides where to upload these chunks. Once this has been determined and a chunk is uploaded, the DistributeTool component stores the chunk identifiers (e.g. name, user token, file ID) in a local meta-data cache file. By using this meta-data file, the CollectTool component can later fetch the required chunk files from the different providers.

Fig. 1. The proposed solution for autonomous data management

The provider selection in the second step is based on the information gathered by the MeasureTool component. Historical performance values are also stored and taken into account, and it is the role of the application administrator to set the relevance (i.e. ratio) of historical and latest performance results for provider selection. The measured performance values are converted to the following format (the values denote shares, and their sum represents 100%), taking into account the aggregated historical performance values (h), the latest performance values (l) and their ratio (r) by evaluating h + l * r, e.g.:
{"googledrive": 5392, "dropbox": 1615, "box": 1085, "sugarsync": 292}
According to these configuration numbers, the DistributeTool component takes the sum of these values (sum) and generates a random
number independently drawn from the range [0, sum] for each chunk by using a Gaussian distribution. The CollectTool component is able to collect the previously uploaded user files from the cloud providers by using the meta-data description file. Once the chunks of a required file are retrieved, they are merged with an optimized buffering technique.
We have performed our evaluations on a private IaaS cloud based on OpenNebula. It has been developed by a national project called SZTAKI Cloud [12], which was initiated in 2012 to perform research in clouds, and to create an institutional cloud infrastructure for the Computer and Automation Research Institute of the Hungarian Academy of Sciences. It has been in production since 2014, available to all researchers associated with the institute. It runs OpenNebula 4.4 with KVM, and controls over 440 CPU cores, 1790 GB of RAM, 66 TB of shared and 35 TB of local storage, serving an average of 250 Virtual Machines (VMs) per day over the last month. The application consisting of the previously discussed components has been deployed in a VM started at SZTAKI Cloud.
For users, the most important metric for measuring provider performance is the data transfer speed. Therefore we used this metric to monitor the providers, and as the basis for autonomous file sharing. To perform an evaluation of the MeasureTool component, we uploaded and downloaded files to and from each personal cloud considering the following scenarios: (i) transferring two 5 MB files or one 10 MB file, (ii) transferring five 10 MB files or one 50 MB file, and (iii) transferring ten 10 MB files or one 100 MB file. In this way we arrived at six different cases, and we could also measure data transfer performance for handling many small and few big files. We went through all cases systematically, and performed the same measurements several times.
Once the limit of the freemium storage of a provider was exceeded, we halted the measurement and deleted all files on that storage before starting the following tests. We performed the same measurements in different periods of the week, i.e. on weekdays and at weekends. This evaluation showed that Google Drive has the best performance values followed by Dropbox and Box.com, while SugarSync has the worst values. While the difference between Google Drive and SugarSync is clear, it is not easy to compare Box.com and Dropbox. Many small files are better handled by Dropbox, while bigger files are transferred faster by Box.com. The results of the evaluation of the MeasureTool component confirm our initial hypothesis that service quality levels differ among cloud providers.
We evaluated our proposed solution with four different configurations (i.e. r = 0, 0.1, 0.5, 0.9) for user data distribution over the interconnected personal clouds. During these measurements the DistributeTool component performed the splitting and packaging of the user files, selected providers for the created file chunks based on the performance values and configurations, and uploaded the chunks to these providers. The retrieval of the files was performed by the CollectTool component by using the meta-data description file created by the DistributeTool component.
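The weighting and provider selection steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify how the drawn number is mapped to a provider, so the sketch assumes cumulative intervals proportional to the configuration numbers, and it uses a uniform draw from [0, sum] for simplicity instead of the Gaussian distribution mentioned in the article. The function names `combine`, `pick_provider` and `assign_chunks` are hypothetical.

```python
import random
from typing import Dict, List


def combine(historical: Dict[str, float], latest: Dict[str, float], r: float) -> Dict[str, float]:
    """Configuration number per provider, evaluated as h + l * r."""
    return {name: historical[name] + latest[name] * r for name in historical}


def pick_provider(weights: Dict[str, float], draw: float) -> str:
    """Map a number from [0, sum] to a provider via cumulative intervals (assumed mapping)."""
    names = list(weights)
    upper = 0.0
    for name in names:
        upper += weights[name]
        if draw <= upper:
            return name
    return names[-1]  # guards against floating-point round-off at the top end


def assign_chunks(weights: Dict[str, float], chunk_count: int, rng: random.Random) -> List[str]:
    """Draw one number per chunk independently and select a provider for it."""
    total = sum(weights.values())
    # Uniform draw used here for simplicity; the article mentions a Gaussian one.
    return [pick_provider(weights, rng.uniform(0, total)) for _ in range(chunk_count)]


# The example configuration numbers quoted in the text.
weights = {"googledrive": 5392, "dropbox": 1615, "box": 1085, "sugarsync": 292}
total = sum(weights.values())
shares = {name: 100 * value / total for name, value in weights.items()}  # percentage shares
placement = assign_chunks(weights, 10, random.Random(42))
```

With this mapping, a provider with a larger configuration number owns a wider interval and therefore receives proportionally more chunks on average, while slower providers are still used occasionally.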
Fig. 2. Evaluation results for the proposed data distributor application with different configurations

The evaluation results for the different configurations are shown in Figure 2. As we can see in this diagram, slight modifications of the ratio of historical and latest performance values (e.g. changing r from 0 to 0.1) do not imply big differences, but relying more on the latest performance values (i.e. using r = 0.5) results in faster upload and download times for the overall user data.
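The splitting, meta-data bookkeeping and retrieval performed by the DistributeTool and CollectTool components in this section can be sketched as below. The sketch is a simplification under assumptions: dictionaries stand in for the provider storages, the meta-data entries hold only an index, a provider name and a chunk name (the article also mentions a user token and a file ID), and the optimized buffering technique is not reproduced. All function names are hypothetical.

```python
import json
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_count: int) -> List[bytes]:
    """Split data into chunk_count pieces of equal size (the last one may be shorter)."""
    size = -(-len(data) // chunk_count)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(chunk_count)]


def distribute(data: bytes, file_name: str, placement: List[str],
               stores: Dict[str, Dict[str, bytes]]) -> List[dict]:
    """Upload one chunk per placement entry and return the meta-data entries."""
    chunks = split_into_chunks(data, len(placement))
    metadata = []
    for index, (chunk, provider) in enumerate(zip(chunks, placement)):
        chunk_name = f"{file_name}.part{index}"
        stores[provider][chunk_name] = chunk  # stands in for a real upload
        metadata.append({"index": index, "provider": provider, "name": chunk_name})
    return metadata


def collect(metadata: List[dict], stores: Dict[str, Dict[str, bytes]]) -> bytes:
    """Fetch the chunks listed in the meta-data and join them in order."""
    ordered = sorted(metadata, key=lambda entry: entry["index"])
    return b"".join(stores[entry["provider"]][entry["name"]] for entry in ordered)


stores: Dict[str, Dict[str, bytes]] = {"googledrive": {}, "dropbox": {}}
original = bytes(range(256)) * 4
meta = distribute(original, "photo.jpg", ["googledrive", "dropbox", "googledrive"], stores)
cache_text = json.dumps(meta)  # what a local meta-data cache file could contain
restored = collect(json.loads(cache_text), stores)
```

Serializing the meta-data to JSON is an assumption made for readability; any format that preserves the chunk order and location would serve the same purpose.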
MANAGING MOBILE DATA IN CLOUDS
Mobile devices, which play an important role in IoT, can also benefit from cloud services. The enormous data users produce with these
devices are continuously posted to online services, which may require the modification of these data. Next we introduce a scenario that requires interoperable data management among cloud infrastructures to manage user data produced by mobile devices. Though the computing capacity of mobile devices has rapidly increased recently, there are still numerous applications that cannot be executed on them in reasonable time. Our approach is to utilize cloud infrastructure services to execute such applications on mobile data stored in personal clouds. The basic concept of our solution is the following: services for data management run in one or more IaaS systems, keep track of the cloud storage of a user, and execute data manipulation processes when new files appear in the storage (see Figure 3).
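The basic concept above (track the storage, process files that newly appear, and write the results back) can be illustrated with a small Python sketch. A dictionary stands in for the user's cloud storage, and `poll_once`, the `seen` set and the `.out` suffix are hypothetical choices made for this illustration; a real service would list, download and upload files through the storage provider's API.

```python
from typing import Callable, Dict, List, Set


def poll_once(storage: Dict[str, bytes], seen: Set[str],
              process: Callable[[bytes], bytes], suffix: str = ".out") -> List[str]:
    """Process the files that appeared since the last poll and return their names."""
    handled = []
    for name in sorted(storage):  # sorted() copies the keys, so adding results is safe
        if name in seen or name.endswith(suffix):
            continue  # skip already processed inputs and our own result files
        storage[name + suffix] = process(storage[name])  # stands in for upload of the result
        seen.add(name)
        handled.append(name)
    return handled


storage = {"a.txt": b"hello"}
seen: Set[str] = set()
first = poll_once(storage, seen, bytes.upper)   # processes a.txt
storage["b.txt"] = b"new"                       # a new file appears in the storage
second = poll_once(storage, seen, bytes.upper)  # processes only b.txt
```

In a deployed service this function would be called periodically or upon a change notification, with `process` replaced by the data manipulation application.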
Fig. 3. Enhancing data management of mobile devices by interoperating clouds

The service running in the IaaS cloud can download the user data files from the cloud storage, execute the necessary application on these files, and upload the modified data to the storage service. Such a file can be, for example, a photo or video taken by the user with a mobile phone, to be processed by an application unsuitable for mobile devices. In our solution, currently developed for Android devices, the processes to be performed on the data can be configured with a separate configuration file. This file is automatically created and managed by a mobile application running on the user's device. The application is also responsible for communicating with the cloud storage, which is Dropbox in our case. The file manipulation applications have been created as a virtual appliance and predeployed in our SZTAKI Cloud infrastructure based on OpenNebula to perform our evaluations.

The mobile application
In order to exemplify the usability of this generic approach, we have developed an Android application called FolderImage. It can be used to manipulate pictures produced by mobile devices. This program creates thumbnails of each image in the selected folder, then assembles them into a single image (called a folder image). This generated image represents the folder and gives an overview of its contents to the user. This app is particularly useful for getting a glimpse of a directory when a user has thousands of pictures spread over numerous directories and is looking for a specific one. It can be used in two modes: (i) in local operation, when the folder image is generated by using the computing resources of the actual device, and (ii) in cloud operation, when the application communicates directly with Dropbox. In the latter case the pictures can be uploaded on demand or synchronized continuously, and the folder image is generated in the cloud and downloaded to the device after completion. These modes can be triggered by clicking on the appropriate button in the graphical interface.
In the FolderImage application, clicking on the first button automatically logs the user in to Dropbox. After a successful login, the name of the account holder is displayed above the button. The second button can be used to trigger a folder image creation in the cloud, and the third one to perform the folder image generation locally. The generated image is also shown in the application GUI, under the buttons. Status messages and updates on image uploads, downloads and generation are shown between the buttons and the folder image. Though this GUI is relatively simple, it is not an easy task to render it properly on devices having different display resolutions.
As we mentioned, the second and third buttons can be used to generate the folder image. This task has five steps:
1) list: to generate a list of the images the actual folder contains;
2) download: to access the images of the folder;
3) resize: to generate thumbnails of the images;
4) create: to assemble the thumbnails;
5) upload: to save the created folder image.

Image generation in the cloud
As we have depicted in Figure 3, the idea of our scenario is to move computation-intensive tasks from a mobile device to the cloud. Therefore we have also created a Java application called ImageConverter that is capable of performing the previously discussed five steps. We have encapsulated this application in a Virtual Appliance (VA), deployed it, and started it as a web service running in a local cloud. It also has a direct connection with the user's Dropbox storage. It can continuously synchronize the image directory, and perform the folder image generation once a new image is added to the folder. It can also be set to listen to a specific configuration file that instructs it to execute certain methods (e.g. performing different kinds of image manipulation processes). Once the image generator method is called from the Android application, the configuration file is refreshed. Then the web service running in the cloud is notified about this change, which triggers the folder image generation and uploading processes.

Performance evaluation
We conducted a performance evaluation with the SZTAKI Cloud [12], where we deployed the ImageConverter VA in two different types of virtual machines: one having one processor and 1 GB of memory (VM1), and the other having 4 processors and 4 GB of memory (VM2). Meanwhile we have also tested the FolderImage application on two different Android devices: a phone (Samsung Galaxy Mini with Android v2.2, 600 MHz CPU and 384 MB RAM) and a tablet (Asus Slider SL101 with Android v4.0, 1 GHz dual-core CPU and 1 GB RAM). The evaluation results incorporating the five steps of the executed methods for a folder containing 900 images are the following:
• Android phone: 430596 ms,
• Android tablet: 143010 ms,
• Cloud VM1: 1841 ms,
• Cloud VM2: 1068 ms.
These results clearly show the differences among the different types of executions. Regarding the Android devices, the tablet performed the generation 3 times faster than the phone in both rounds of experiments. The web service running in the VM2 type virtual machine performed about 1.7 times faster than the other deployment in VM1. The local executions on the Android devices are significantly slower (roughly 80 to 400 times) than the image generations performed in the cloud. These results show that it is worth moving computation-intensive tasks from mobile devices to clouds.
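The relative speedups can be recomputed from the four timings listed above. The short Python sketch below does only this arithmetic; the dictionary keys and the `speedup` helper are names chosen for the illustration.

```python
# Measured execution times in milliseconds, as reported in the evaluation.
timings_ms = {"phone": 430596, "tablet": 143010, "vm1": 1841, "vm2": 1068}


def speedup(slower: str, faster: str) -> float:
    """How many times faster the second execution is than the first."""
    return timings_ms[slower] / timings_ms[faster]


ratios = {
    "tablet vs phone": speedup("phone", "tablet"),
    "VM2 vs VM1": speedup("vm1", "vm2"),
    "VM1 vs tablet": speedup("tablet", "vm1"),
    "VM2 vs phone": speedup("phone", "vm2"),
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}x")
```

The smallest and largest device-to-cloud ratios (tablet against VM1, phone against VM2) bound the gain that offloading achieved in this experiment.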
MULTI-CLOUD EVALUATION
Next we show a scenario in which academic and commercial IaaS clouds are interoperated through a personal cloud. To increase heterogeneity, we performed another evaluation by using Dropbox, OpenNebula and Amazon. For this scenario we ported a biochemical application to this environment. It generates conformers by unconstrained molecular dynamics at high temperature to overcome conformational bias, then finishes each conformer by simulated annealing and energy minimization to obtain reliable structures. The end users of this app are biologists or chemists, who need to examine molecular modeling for drug development. The execution of the whole application on a single PC takes around 5-8 days. We used three scripts managed by a Java web application to execute our app in a VM. The master script performs the initial conformer generation, and the worker script performs additional conformer finishing methods for 1000 conformers at a time. Finally the uploading script compresses the subresults into a single result file. In this way the execution of the ported app consists of the execution of a master task in
the first phase, followed by running 50 worker tasks for processing the 50000 conformers in the second phase, and finally calling the uploading script to create the final result in the third phase. Following our approach, users only need to make their data available in a personal cloud, and to specify the order of data processing with a configuration file (by linking VM methods to data). Once this configuration file is available and at least one VM (executing the necessary service for processing user data) is running in an IaaS cloud, the autonomous data processing starts and goes on until all data is processed.
We used the same template configuration for OpenNebula, each VM having 4 virtual CPUs and 4 GB of memory, to start 3 VMs in SZTAKI Cloud (denoted by ONe in the figure), and for Amazon (denoted by AM) we also started 3 VMs with Linux Micro instances. It took around 10 minutes to perform the initial input data transfers and to deploy the IaaS VMs to start with Phase 1 of our biochemical application. The total execution took around 16 hours. Detailed measurement results for Phase 2 of this experiment are depicted in Figure 4. In these diagrams we can see how the VMs of different IaaS systems competed for tasks, and how long it took them to compute these tasks in total (the curve marks the start time of task computations by VMs, except for the first task, which started at 0:00).

Fig. 4. Detailed evaluation results for Phase 2

CONCLUDING REMARKS
The enormous data users produce with mobile devices are continuously posted to online services, which may require the use of several cloud providers at the same time to efficiently store, process and retrieve these data. The aim of our research in this article was to introduce cloud-based approaches that contribute to the evolution and proliferation of the Internet of Things. Our proposed solutions store, process and share user data produced by mobile devices by managing separate IaaS and personal clouds in an interoperable and autonomous way. Our future work aims at extending the functionalities of the designed services to widen interoperability and provider support.

REFERENCES
[1] H. Sundmaeker, P. Guillemin, P. Friess, S. Woelffle. Vision and challenges for realising the Internet of Things. CERP IoT, Cluster of European Research Projects on the Internet of Things, CN: KK-31-10-323-EN-C, March 2010.
[2] J. Gubbi, R. Buyya, S. Marusic, M. Palaniswami. Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, Volume 29, Issue 7, pp. 1645-1660, September 2013.
[3] J. Bozman. Cloud Computing: The Need for Portability and Interoperability. IDC Executive Insights, August 2010.
[4] T. Dillon, C. Wu, and E. Chang. Cloud Computing: Issues and Challenges. In proc. of the 24th IEEE International Conference on Advanced Information Networking and Applications, pp. 27-33, 2010.
[5] Fraunhofer Institute for Secure Information Technology. On the Security of Cloud Storage Services, SIT Technical reports, March 2012. Online: http://www.sit.fraunhofer.de/content/dam/sit/en/documents/Cloud-Storage-Security a4.pdf
[6] R. Garcia-Tinedo, M. Sanchez-Artigas, A. Moreno-Martinez, C. Cotes, P. Garcia-Lopez. Actively Measuring Personal Cloud Storage. The 6th IEEE International Conference on Cloud Computing (Cloud'13), pp. 301-308, 2013.
[7] K. Jeffery, and B. Neidecker-Lutz. The Future of Cloud Computing, Opportunities for European Cloud Computing beyond 2010. Expert Group Report, Jan. 2010.
[8] A. Kertesz. Characterizing Cloud Federation Approaches. In book: Cloud Computing - Challenges, Limitations and R&D Solutions, Zaigham Mahmood (Ed.), Springer Series on Computer Communications and Networks, pp. 277-296, 2014.
[9] A. Botta, W. de Donato, V. Persico, A. Pescape. On the Integration of Cloud Computing and Internet of Things. The 2nd International Conference on Future Internet of Things and Cloud (FiCloud-2014), August 2014.
[10] S. Nastic, S. Sehic, D. Le, H. Truong, and S. Dustdar. Provisioning Software-defined IoT Cloud Systems. The 2nd International Conference on Future Internet of Things and Cloud (FiCloud-2014), August 2014.
[11] N. Fernando, S. W. Loke, W. Rahayu. Mobile cloud computing: A survey. Future Generation Computer Systems, Volume 29, Issue 1, pp. 84-106, January 2013.
[12] The SZTAKI Cloud project website. http://cloud.sztaki.hu/en/home, May 2014.
[13] Google Drive. https://drive.google.com/, May 2014.
[14] Dropbox. https://www.dropbox.com/, May 2014.
[15] SugarSync. https://www.sugarsync.com/, May 2014.
[16] Box.com. https://www.box.com/, May 2014.