Commercialized Network Applications by Integrating the Network Distribution and High-Speed Cipher Technologies in Cloud Computing Environments Kazuo Ichihara Net&Logic Inc. 1-23-1, Daizawa, Setagaya, Tokyo, 155-0032, Japan e-mail:
[email protected]
Noriharu Miyaho, Shuichi Suzuki, Yoichiro Ueno Tokyo Denki University Department of Information Environment 2-1200, Muzai Gakuendai, Inzai, Chiba, 270-1382, Japan e-mail:
[email protected],
[email protected],
[email protected]
Abstract—This paper presents the evaluation results of the commercialization of High Security Disaster Recovery Technology (HS-DRT), which uses network distribution and high-speed strong cipher technologies to realize efficient, secure network services. We commercialized the corresponding disaster recovery system and evaluated the performance of the distributed engines depending on the hash functions and the versatile virtual memory system in cloud computing environments, and furthermore carried out a conjoint analysis of customer satisfaction. The average response time has also been estimated in terms of data file size and the deployment methods of the engine cores. As practical network applications, an automatic backup system using an FTP server and a secure video streaming system are introduced. We also propose a future teleconference system with high security using HS-DRT. Finally, this paper describes the remaining technical issues, such as technology for preventing insider attacks, technology for integrating the pull and push types of HS-DRT, and an automatic performance monitoring mechanism.
Keywords—disaster recovery; backup; metadata; distributed processing; cloud computing; strong cipher; secure video streaming

I. INTRODUCTION

Innovative network technology to guarantee, as far as possible, the security of users' or institutes' massive files of important data against risks such as unexpected natural disasters and cyber-terrorism attacks is becoming more indispensable day by day. To satisfy this need, cloud computing technology is expected to provide an effective and economical backup system by making use of the very large number of data stores and processing resources that are not fully utilized. It is expected that the file data backup mechanism will be utilized by government and municipal offices, hospitals, insurance companies, etc., to guard against unexpected disasters such as earthquakes, large fires, storms, and tsunami. To ascertain secure backup, prompt restoration by making use of a versatile number of cellular phones, smart phones, digital signage devices, and PCs, in addition to cloud resources dispersed in multiple geographical locations, is indispensable for this purpose. In addition to these factors, many companies and individuals involved in industry and commerce are interested in making use of public or private cloud computing facilities/environments, provided by carriers or computer vendors, for temporary storage of and real-time access to their multimedia files, as a means of achieving security and low maintenance and operation costs. In this paper we present the evaluation results of an innovative file backup concept which makes use of an effective ultra-widely distributed data transfer mechanism and a high-speed strong cipher technology to efficiently realize safe data backup at an affordable maintenance and operation cost [1-3]. The most important requirements are to satisfy users with respect to a specific security level against illegal deciphering, a quick recovery time, and an affordable maintenance cost.
Figure 1. Proposed network service in case of disaster occurrence
In the case of block cipher technology, the data is divided into successive blocks and each block is processed separately, leading to slow encryption speed. In addition, as the data volume increases, the required processor and memory cost increases exponentially when a block cipher is used. On the other hand, in the case of a stream cipher, the input data is simply operated on bit by bit, using a simple arithmetic operation, so high-speed processing becomes feasible. These are the fundamental differences between the two cipher technologies. When an ultra-widely distributed file data transfer technology, a file fragmentation technology, and an encryption technology are used together, quite advantageous characteristics arise from the viewpoint of data volume. It is possible to combine the use of these technologies, specifically, the spatial scrambling of all data files, the random fragmentation of the data files, and the corresponding encryption and replication of each file fragment using a stream cipher. Figure 1 shows the proposed network service concept in the case of a disaster occurring in the data center. We developed the High Security Disaster Recovery Technology (HS-DRT) to realize the corresponding network services. To realize the proposed DRT mechanism, the following three principal network components are required: (1) a secure data center, (2) several secure supervisory servers, and (3) a number of client nodes such as smart phones, cellular phones, digital signage devices, or available cloud storage. The corresponding history data, including the encryption key code sequence, which we call "encryption metadata", are used to recover the original data. This mechanism is equivalent to realizing a strong cipher code, comparable to any conventionally developed cipher code, by appropriately assigning affordable network resources. By making use of the proposed combined technology, illegal interception and decoding of the data by a third party becomes almost impossible, and an extremely safe data backup system can be realized at reasonable cost. In this paper we also address the special case of preventing insider attacks; this kind of issue is further discussed in the last section. The proposed technology can also increase both the cipher code strength and the reliability of the normal operation of decryption and reassembly of the original data. In this paper, we briefly describe the related work in Section II, the basic configuration of the DRT engine in Section III, and the performance evaluation of the proposed network services using the DRT engine in Section IV. The practical commercialized systems and conjoint analyses are discussed in Section V. Finally, we provide the conclusion and future issues to be studied, such as insider attack prevention techniques, the hybrid push-pull type mechanism, and automatic performance monitoring of the cloud, in Section VI.
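The bit-wise keystream operation that distinguishes stream ciphers from block ciphers in this section can be sketched as follows. This is a generic illustration: the hash-based keystream generator below is a hypothetical stand-in, not the actual cipher used by HS-DRT.

```python
import hashlib

def keystream(key: bytes, nbytes: int) -> bytes:
    """Illustrative keystream generator: repeatedly hash key+counter.
    (A stand-in for a real stream cipher's generator, for demonstration only.)"""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

def stream_xor(key: bytes, data: bytes) -> bytes:
    # Stream ciphers operate bit by bit (here byte by byte):
    # ciphertext = data XOR keystream, with no per-block buffering.
    ks = keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

plaintext = b"backup file fragment"
ciphertext = stream_xor(b"secret-key", plaintext)
# XOR with the same keystream inverts the operation:
recovered = stream_xor(b"secret-key", plaintext := plaintext) and stream_xor(b"secret-key", ciphertext)
recovered = stream_xor(b"secret-key", ciphertext)
```

Because encryption and decryption are the same XOR pass, the cost grows only linearly with the data size, which is the speed advantage cited above.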
II. RELATED WORK
In the field of disaster recovery systems, there are many studies and commercial products. Most disaster recovery systems include data replication functions with stand-by servers at distant sites. In contrast, the proposed disaster recovery system with HS-DRT has a secure distributed data backup scheme. By making use of the HS-DRT mechanism to attain a reliable backup scheme, we can provide the system product at a reasonable price to individuals and companies. For example, we can provide only the backup application using HS-DRT, with plural cloud services as backup storage, according to the proper customer contracts. It is therefore somewhat difficult to compare our proposed system quantitatively with ordinary disaster recovery systems from the viewpoint of price. In the field of secure data backup systems, other related studies have included the concept of a distributed file backup system [11][12]. However, in these studies, neither a precise performance evaluation nor practical network service systems are clearly described. In the field of intrusion tolerance, Deswarte proposed the "persistent file server" [13], which introduced encryption, fragmentation, replication, and scattering. The core technologies of HS-DRT resemble those of the persistent file server, except for the spatial scrambling and random dispatching technologies [1][2]. With these two technologies, deciphering by a third party by comparing and combining the encrypted fragments becomes almost impossible. In addition, HS-DRT is applicable to other fields, such as secure video streaming.

III. BASIC CONFIGURATION OF HS-DRT ENGINE
We have previously clarified the relationships between data file capacity, the number of file fragments, and the number of replications for the case of disaster recovery [8]. The HS-DRT file backup mechanism has three principal components, as shown in Figure 2: the Data Center, the Supervisory Server, and various client nodes, which can be specified as follows. The client nodes (at the bottom of Fig. 2) are PCs, smart phones, Network Attached Storage (NAS) devices, digital signage devices, and cloud storage services. They are connected to a Supervisory Server in addition to the Data Center via a secure network. The Supervisory Server (on the right in Fig. 2) acquires the history data, which includes the encryption key code
Figure 2. The basic configuration of HS-DRT system
sequence (metadata), from the Data Center (on the left in Fig. 2) via a network. The basic procedure in the proposed network system is as follows.

A. Backup sequence
When the Data Center receives the data to be backed up, it encrypts it, scrambles it, and divides it into fragments, and thereafter replicates the data to the extent necessary to satisfy the required recovery rate according to the predetermined service level. The Data Center encrypts the fragments again in the second stage and distributes them to the client nodes in a random order. At the same time, the Data Center sends the metadata used for deciphering the series of fragments to the Supervisory Server. The metadata comprises the encryption keys (for both the first and second stages) and several items of information related to fragmentation, replication, and distribution.

B. Recovery sequence
When a disaster occurs, or at other times, the Supervisory Server initiates the recovery sequence. The Supervisory Server collects the encrypted fragments from the appropriate clients in a manner similar to a rake reception procedure. When the Supervisory Server has collected a sufficient number of encrypted fragments, they are decrypted, merged, and descrambled in the reverse order of that performed at the second stage of encryption, and the decryption is then complete. Through these processes, the Supervisory Server can recover the original data that has been backed up. The security level of HS-DRT depends not only on the cryptographic technology but also on the combined method of specifying three factors, that is, spatial scrambling, fragmentation/replication, and the shuffling algorithm. Because of these three factors, nobody is able to decrypt the data without collecting all the relevant fragments constituting a unique set and sorting them into the correct order. Even if some fragments are intercepted, nobody is able to decrypt parts of the original data from such fragments. The spatial scrambling procedure can be realized by executing a simple algorithm using a C-style description [10]. This computation process should be repeated several times. To de-scramble, it is only necessary to perform the same operations in the reverse order. By introducing the above-mentioned spatial scrambling technology, it is almost impossible for a third party to decipher the data by comparing and combining the encrypted fragments, since a uniform distribution of information can be achieved by this spatial scrambling. One of the innovative ideas of HS-DRT is that the combination of fragmentation and distribution can be achieved in an appropriately shuffled order. Even if a cracker captured all the raw packets between the data center and the client nodes, it would be extremely difficult to assemble all the packets in the correct order, because it would be necessary to try about N! possibilities (N: number of fragments) when N is sufficiently large. In fact, since the bit patterns of a pair of encrypted fragments are completely different from each other owing to the different encryption keys, it is impossible to match one encrypted fragment with another, and crackers would require innumerable attempts to decipher the data. In addition, HS-DRT mainly uses a shuffling method with pseudo-random number generators for the distribution to the client nodes. When we distribute the fragments of the encrypted data to widely dispersed client nodes, we can send them in a shuffled order, since we predetermine the destination client nodes from the shuffled table in advance.
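The backup pipeline described above (scramble, fragment, second-stage encrypt, then dispatch in a shuffled order, with the metadata recording how to undo each step) can be sketched as follows. This is a simplified illustration, not the HS-DRT implementation: the byte-permutation "scramble" and the XOR "encryption" are placeholders for the real spatial scrambling and stream cipher.

```python
import random

def scramble(data: bytes, seed: int) -> bytes:
    # Placeholder for spatial scrambling: permute byte positions with a seeded PRNG.
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    return bytes(data[i] for i in idx)

def fragment(data: bytes, n: int):
    # Split the scrambled data into n roughly equal fragments.
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def xor_encrypt(frag: bytes, key: int) -> bytes:
    # Placeholder second-stage encryption: XOR each byte with a per-fragment key.
    return bytes(b ^ key for b in frag)

def backup(data: bytes, n_frags: int, seed: int):
    frags = fragment(scramble(data, seed), n_frags)
    keys = [random.Random(seed + 1 + i).randrange(256) for i in range(len(frags))]
    encrypted = [xor_encrypt(f, k) for f, k in zip(frags, keys)]
    # Shuffled dispatch: the metadata records which slot holds which fragment.
    order = list(range(len(encrypted)))
    random.Random(seed + 99).shuffle(order)
    dispatched = [encrypted[i] for i in order]
    metadata = {"seed": seed, "keys": keys, "order": order, "length": len(data)}
    return dispatched, metadata

def recover(dispatched, metadata):
    # Undo the shuffle, the per-fragment encryption, and the scramble, in reverse.
    encrypted = [b""] * len(dispatched)
    for slot, i in enumerate(metadata["order"]):
        encrypted[i] = dispatched[slot]
    joined = b"".join(xor_encrypt(f, k) for f, k in zip(encrypted, metadata["keys"]))
    idx = list(range(metadata["length"]))
    random.Random(metadata["seed"]).shuffle(idx)
    out = bytearray(metadata["length"])
    for pos, i in enumerate(idx):
        out[i] = joined[pos]
    return bytes(out)
```

Without the metadata (seed, keys, and dispatch order), an eavesdropper holding all fragments still faces the N! ordering problem described in the text.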
When we use a shuffled table generated by the Fisher-Yates shuffle algorithm with 3 rounds, which leads to a uniform distribution, the table itself is hard for a third party to guess [14][15]. A practical system realizing a hybrid HS-DRT processor can be effectively accomplished by making use of a cloud computing system at the same time. The system mainly consists of the following four parts: thin clients, a web applications server (Web-Apps Server), an HS-DRT processor, and storage clouds. Thin clients are terminals which can use the web applications in the SaaS (Software as a Service) environment; they can make use of the application services provided by the web applications server. The HS-DRT processor is considered to be a component of the hybrid cloud computing system, which can also strengthen the cloud computers' security level at the same time. Usually, users can make use of one of plural HS-DRT processors in the cloud environment under a contract with the providers. The data center and the supervisory server can be integrated in the HS-DRT processor. The HS-DRT processor can effectively utilize the storage clouds as grid node facilities which have functions related to a web application and execute the encryption, spatial scrambling, and fragmentation of the corresponding files. It is very important to note that the processing efficiency of the HS-DRT processor can easily be improved by increasing the amount of web cache memory. We need to consider the scalability of the engine, since it may become a bottleneck in a very large system, owing to the number of clients and the amount of storage. In such cases, the HS-DRT processor may use a key-value database. As the HS-DRT processor can easily work with other HS-DRT processors, the system can be extended. The secrecy of the system can easily be assured because there is no plain raw data stream appearing anywhere in the entire data processing procedure.

IV. PERFORMANCE EVALUATION OF DRT ENGINE

A. Conjoint analysis as regards customer satisfaction
We made the conjoint cards as follows, and conducted a questionnaire with 10 students of the Miyaho laboratory in the School of Information Environment of Tokyo Denki University. As results of the quantification class I analysis, where quantification class I is multiple regression analysis for categorical data, the values of adjusted R squared for each student were 0.7027027, 0.9785458, 0.9684015, 0.8741837, 0.9985653, 0.9647512, 0.9814052, 0.9568698, 0.08060384, and 0.8661417, so one outlier value was detected. The value of adjusted R squared for the entire data set was 0.5393581. The same value rose to 0.6353954 for the data set with the outlier deleted. (Furthermore, the same value rose to 0.7584202 when we standardized the data set for each student. This suggests that we would do better to use recommended prices as the dependent variable rather than the enjoyment answers from the questionnaire.) In what follows, we use the data set with the outlier deleted.
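The outlier screening mentioned above can be reproduced on the reported per-student adjusted R-squared values. The sketch below applies the standard 1.5×IQR rule; this criterion is an assumption, since the paper does not state how the outlier was detected.

```python
from statistics import quantiles

# Adjusted R-squared per student, as reported in the text.
r2 = [0.7027027, 0.9785458, 0.9684015, 0.8741837, 0.9985653,
      0.9647512, 0.9814052, 0.9568698, 0.08060384, 0.8661417]

q1, _, q3 = quantiles(r2, n=4)  # quartiles of the ten values
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in r2 if v < lower or v > upper]
print(outliers)  # the single low value 0.08060384 is flagged
```

The one flagged value corresponds to the student whose answers were removed before recomputing the pooled adjusted R squared.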
B. Conclusion of conjoint analysis and analysis of variance
We used R for the analysis, with the library 'DoE.base' for the design of the experiment. As a result of the conjoint analysis, the attributes in decreasing order of importance were recovery rate, security, rental charge, and response time. Second, we tested the original data by one-way analysis of variance, using enjoyment as the dependent variable. Setting 0.05 as the level of significance, the p-value of response time is 0.0369, that of recovery rate is less than 2e-16, that of security is 3.25e-05, and that of rental charge is 0.00137. Furthermore, we tested the same data with analysis of variance up to second-order interactions, and found significant interactions between response time and security, and between recovery rate and security, as follows. We could not find any interaction between rental charge and the other attributes. In conclusion, the recovery rate is the most important attribute in terms of commercial demand, and the importance of security is also recognized as a potential demand.

V. EVALUATION OF COMMERCIALIZED SYSTEM

A. FTP server automatic recovery system
We made a backup system for an FTP server. With this system, we improved the efficiency of the search algorithm. When a file is stored on the FTP server, the backup engine needs to find the added or renewed file. Scanning for such files consumes considerable CPU resources, so we used the log file of the FTP transfers instead. The log file is updated after every transfer, and it contains all the information about added, renewed, or deleted files.

B. Practical disaster recovery system demonstrating mutually dependent geographical advantages
In this paper, we describe an application we have newly invented, which is effective not only for a cloud system, but also for an on-premise system. The configuration of the corresponding system is shown in Fig. 3.
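The log-driven change detection described in subsection A can be sketched as follows. The vsftpd-style xferlog field layout and the sample lines are assumptions for illustration; the actual FTP server and log format used in the product are not specified in the paper.

```python
def uploaded_files(xferlog_lines):
    """Return paths of files uploaded (direction flag 'i' = incoming) from
    xferlog-style records, instead of rescanning the whole FTP tree."""
    uploads = []
    for line in xferlog_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip malformed records
        # Fields 0-4 are the timestamp; field 8 is the filename, field 11 the direction.
        filename, direction = fields[8], fields[11]
        if direction == "i":
            uploads.append(filename)
    return uploads

# Hypothetical sample records (xferlog format assumed):
log = [
    "Mon Jul  1 12:00:01 2024 3 192.0.2.5 1048576 /home/user/report.pdf b _ i r user ftp 0 * c",
    "Mon Jul  1 12:05:09 2024 1 192.0.2.7 2048 /home/user/notes.txt a _ o r user ftp 0 * c",
]
print(uploaded_files(log))  # only the incoming transfer: ['/home/user/report.pdf']
```

Tailing the log after each transfer replaces the CPU-heavy full-tree scan mentioned in the text, since only the listed files need to be re-backed up.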
Table 1. Attributes and levels

Attribute       Level 1                Level 2              Level 3
Response time   Slow a little          Slow                 Very fast
Recovery rate   Satisfaction a little  Satisfaction enough  Dissatisfied
Capacity        Small a little         Large                Large enough
Easiness        Little load            Some loads           Not easy
Security        Dangerous a little     Safe                 Very safe
Consolation     Uneasy a little        Relieved             Very relieved
Rental charge   Cheap                  Expensive            Very cheap
Table 2. Samples of conjoint cards

Please answer the questionnaire about the following disaster recovery system. (Please leave one side.) Evaluate as a user / Evaluate as a developer. Register number / Name.

Card  Response time  Recovery rate          Capacity        Easiness     Security            Consolation      Rental charge
1     Slow           Dissatisfied           Small a little  Not easy     Dangerous a little  Uneasy a little  Very cheap
2     Slow           Satisfaction a little  Large           Little load  Dangerous a little  Very relieved    Expensive
3     Slow           Satisfaction a little  Large enough    Some loads   Very safe           Uneasy a little  Cheap
Table 3. Results of analysis of variance

Attribute             Df    Sum Sq   Mean Sq   F value   Pr(>F)     Significance
Response time b[,1]   2     3769     1884      8.887     0.000229   ***
Recovery rate b[,2]   2     35682    17841     84.144    < 2e-16    ***
Security b[,5]        2     11297    5648      26.639    1.40e-10   ***
Rental charge b[,7]   2     7376     3688      17.395    1.69e-07   ***
b[,1]:b[,2]           4     645      161       0.76      0.552794
b[,1]:b[,5]           2     1382     691       3.259     0.041265   *
b[,2]:b[,5]           2     1790     895       4.222     0.016512   *
Residuals             145   30744    212
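As a quick arithmetic check, the F values of the main effects in Table 3 can be reproduced from the Sum Sq and Df columns: each F value is the effect's mean square divided by the residual mean square.

```python
# (Sum Sq, Df) per main effect, copied from Table 3.
effects = {
    "Response time": (3769, 2),
    "Recovery rate": (35682, 2),
    "Security": (11297, 2),
    "Rental charge": (7376, 2),
}
resid_ms = 30744 / 145  # residual mean square, approximately 212

f_values = {name: (ss / df) / resid_ms for name, (ss, df) in effects.items()}
for name, f in f_values.items():
    print(f"{name}: F = {f:.3f}")
# These agree with Table 3 (8.887, 84.144, 26.639, 17.395) to rounding.
```

The agreement confirms that the table's Mean Sq and F columns are internally consistent with its Sum Sq and Df columns.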
This system shows the case of a composition with redundancy 3/2. It means that even if one of the three clouds does not work, the corresponding data will not be lost. Unlike a conventional cloud system, each site has its own file server for the users at that site. As shown in Fig. 3, Data1a, Data1b, and Data1c are stored in the HS-DRT servers of the Main Site, Site A, and Site B. The data are already processed by HS-DRT; the size of Data1 is 100 Mbytes, and the sizes of Data1a, Data1b, and Data1c are 50 Mbytes each. In this case, we did not use a cloud, and each site is basically controlled by a single company or organization, with the sites linked via a secured network. So, we decided to store the meta-keys at the other sites with simple AES encryption. This keeps the system simple, since the method of backing up the meta-keys is otherwise complex and it is difficult to transport them safely.

Figure 3. Implemented system

Second, we built a unified server system by combining two servers; Fig. 4 shows the combined server system.

Figure 4. Combined server system

In the system shown in Fig. 3, the total amount of data is 750 Gbytes when each file server has 100 Gbytes of data. The original data is divided into 64 blocks. After this process, the data are encoded and duplicated into 96 blocks totaling 150 Gbytes. So, the amount of all data becomes (100G + 150G) x 3 = 750 Gbytes. In the system shown in Fig. 4, by contrast, the total amount of data is 600 Gbytes. The original data is divided into 64 blocks; after this process, the blocks are shuffled and sent to the other two sites, so each site holds 32 blocks totaling 50 Gbytes for each peer. The amount of all data thus becomes (100G + 50G + 50G) x 3 = 600 Gbytes. This method is useful for decreasing the number of servers and improving storage efficiency, which also means cutting down initial expenses. Finally, we could improve the way of recovering data. With a cloud, in order to securely realize the recovery of the data, the direction of data transfer should be one way, from the cloud to the user site. With the new application, on the contrary, the direction of data transfer is bidirectional. In Fig. 3, for example, the server in Site A has its own data for the users in Site A, and also holds the backup data of Site B and the Main Site. If the server in Site A is damaged and exchanged for a new machine, the new server, with no data, first retrieves the former data from the servers in Site B and the Main Site. After that process, the servers in the Main Site and Site B again make backups for Site A.

VI. SUMMARY AND FUTURE ISSUES
We have presented several commercialized systems and their evaluation results using HS-DRT in cloud computing environments. We ascertained that the average response time in terms of the file data size is sufficiently practical to realize the corresponding network services. This paper has also described several commercialized network services and system configurations for practical network applications that make use of cloud computing facilities and environments. Further studies should address optimum network utilization technology. We are planning to verify the essential characteristics necessary to fully utilize the network resources to commercialize an ideal disaster recovery system.
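As a sanity check, the storage volumes compared in Section V can be reproduced from the block counts given there (three sites, 100 Gbytes of original data per site):

```python
SITES = 3
OWN = 100  # Gbytes of original data per site

# Fig. 3 scheme: 64 data blocks are expanded to 96 coded blocks (redundancy 3/2),
# so each site stores its own data plus 150 Gbytes of coded backup.
coded_backup = OWN * 96 / 64             # 150 Gbytes
total_fig3 = SITES * (OWN + coded_backup)

# Fig. 4 scheme: each site additionally holds 32 of the 64 blocks (50 Gbytes)
# from each of the other two sites.
peer_share = OWN * 32 / 64               # 50 Gbytes
total_fig4 = SITES * (OWN + 2 * peer_share)

print(total_fig3, total_fig4)  # 750.0 600.0 Gbytes, matching Section V
```

The 150 Gbyte saving of the combined scheme is what allows the server count, and hence the initial expense, to be reduced.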
A. Integrating Pull and Push Type of Services
In the conventional disaster recovery system mentioned above, we assumed push-type client nodes or cloud storage with global IP addresses registered in advance in both the data center and the supervisory center. However, to increase the number of potential users of the corresponding network services, it is desirable to also support client nodes, such as those in mobile cell-phone networks, whose IP addresses change over time. In this case, a new mechanism that does not require fixed IP addresses for the corresponding client nodes should be developed in addition. We call this type of client the pull-type client. Pull-type client nodes with private IP addresses can participate in the system in addition to the conventional client nodes with global IP addresses. To realize this mechanism, it will be effective to deploy a web server that accepts voluntary users who play the role of file storage, and to associate each node with a specific ID under password authentication. In this case, both the data center and the supervisory center register each fragmented-file information list under the client node ID with the changeable IP address, in preparation for a recovery request from the user. Estimation of the required average response time per request message and the specification of the web server are subjects for further study. We proposed this pull-type network mechanism to handle the available list of client nodes. We verified that the performance of the web server deteriorates drastically with the number of fragmented data items to be sent and gathered. We also examined the requirements for the web server to attain a sufficient service level, and confirmed that both the pull-type and push-type service functions should be deployed simultaneously in the proposed disaster recovery system.

B. Insider Attack Prevention Technology

C. Automatic Performance Monitoring Mechanism of the Cloud
Both of these technologies, listed in the abstract, are left for future study.

VII. CONCLUSION
We implemented an efficient application service using HS-DRT, which uses plural in-company dispersed sites. Recently, the cloud environment has become more popular, though in some cases we cannot use a cloud because of a policy or rule of the organization. In such cases, this application will provide an effective, low-cost solution with a simple system. In the next step, we will develop other application services suited to several different types of situation.

ACKNOWLEDGMENTS
This work has been partially supported by the Grants-in-Aid for Scientific Research (Issue number: 21560414), the study (Issue number: 151) of the National Institute of Information and Communications Technology (NICT) of Japan, and the study of the Research Institute for Science and Technology of Tokyo Denki University (Q09S-10).

REFERENCES
[1] N. Miyaho, Y. Ueno, S. Suzuki, K. Mori, and K. Ichihara, "Study on a Disaster Recovery Network Mechanism by Using Widely Distributed Client Nodes," ICSNC 2009, pp. 217-223, Sep. 2009.
[2] Y. Ueno, N. Miyaho, S. Suzuki, and K. Ichihara, "Performance Evaluation of a Disaster Recovery System and Practical Network System Applications," ICSNC 2010, pp. 195-200, Aug. 2010.
[3] S. Suzuki, "Additive cryptosystem and World Wide master key," IEICE Technical Report ISEC 101(403), pp. 39-46, Nov. 2001.
[4] N. Miyaho, S. Suzuki, Y. Ueno, A. Takubo, Y. Wada, and R. Shibata, "Disaster recovery equipments, programs, and system," Patent publication 2007/3/6 (Japan), PCT Patent No. 4296304, Apr. 2009.
[5] Y. Ueno, N. Miyaho, and S. Suzuki, "Disaster Recovery Mechanism using Widely Distributed Networking and Secure Metadata Handling Technology," Proceedings of the 4th edition of the UPGRADE-CN workshop, Session II: Networking, pp. 45-48, Jun. 2009.
[6] K. Kokubun, Y. Kawai, Y. Ueno, S. Suzuki, and N. Miyaho, "Performance evaluation of Disaster Recovery System using Grid Computing technology," IEICE Technical Report 107(403), pp. 1-6, Dec. 2007.
[7] NTT-East, "Wide area disaster recovery services," 23.05.2010.
[8] Y. Kitamura, Y. Lee, R. Sakiyama, and K. Okamura, "Experience with Restoration of Asia Pacific Network Failures from Taiwan Earthquake," IEICE Transactions on Communications E90-B(11), pp. 3095-3103, Nov. 2007.
[9] S. Kurokawa, Y. Iwaki, and N. Miyaho, "Study on the distributed data sharing mechanism with a mutual authentication and meta-database technology," APCC 2007, pp. 215-218, Oct. 2007.
[10] Yamato, M. Kan, and Y. Kikuchi, "Storage Based Data Protection for Disaster Recovery," The Journal of the IEICE 89(9), pp. 801-805, Sep. 2006.
[11] S. Tezuka, R. Uda, A. Inoue, and Y. Matsushita, "A Secure Virtual File Server with P2P Connection to a Large-Scale Network," IASTED International Conference NCS2006, pp. 310-315, 2006.
[12] R. Uda, A. Inoue, M. Ito, S. Ichimura, K. Tago, and T. Hoshi, "Development of file distributed back up system," Tokyo University of Technology, Technical Report, No. 3, pp. 31-38, Mar. 2008.
[13] Y. Deswarte, L. Blain, and J.-C. Fabre, "Intrusion tolerance in distributed computing systems," Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy, pp. 110-121, May 1991.
[14] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research (3rd ed.), London: Oliver & Boyd, pp. 26-27, 1948.
[15] D. E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition, Addison-Wesley, pp. 142-146, 1998.
[16] K. Sagara, K. Nishiki, and M. Koizumi, "A Distributed Authentication Platform Architecture for Peer-to-Peer Applications," IEICE Transactions on Communications E88-B(3), pp. 865-872, 2005.