Data Synchronization and Remote Invocation

TR-2004-004 August, 2004

Tomasz Müldner, Zhonghai Luo and Elhadi Shakshuki
Jodrey School of Computer Science, Acadia University
Wolfville, Nova Scotia, Canada B4P 2R6
Email: {Tomasz.Muldner, Zhonghai.Luo, Elhadi.Shakshuki}@acadiau.ca

Abstract

Distributed applications have become very popular, and there is a growing need to distribute data and remotely invoke program components efficiently and securely. This paper describes the design and implementation of a distributed installer (DI), which accomplishes both tasks. We review the goals that any such system should achieve, and show which of these goals are achieved by DI and by several similar systems. The paper covers the functionality, design and implementation of DI.

1. Introduction

With the increased availability of personal computers and fast networks, more and more users work on several computers, for example one computer at home and another in the office. As a result, users store data in files in two or more separate file systems, and have to make sure that the file system they currently access contains the most up-to-date version of all files. For this reason, these systems should be synchronized, e.g. newer files from one system are copied to the other system. Several products support file synchronization on multiple machines [9,12].

At the same time, more and more applications are built as distributed applications rather than centralized ones. Here, a distributed application is one in which data and functionality are distributed across multiple machines connected by a network. Distributed systems provide parallelism and fault tolerance, making them potentially much more powerful than their individual components [7]. Most known implementations of distributed applications use standard Java-based technologies [6]. A typical model for building such applications has been client/server, where client systems interact with a smaller number of server systems [4]. The client/server model has several inherent weaknesses, including the fact that a central server is a bottleneck for the system’s performance, scalability and reliability.
These weaknesses are addressed by Peer-to-Peer (P2P) environments, an emerging model for distributed computing [5]. Instead of fixed client and server roles, P2P supports peers that can function as both client and server at various points during the application life cycle. P2P applications are more fault-tolerant than their client/server counterparts, they support higher availability because services can be replicated among multiple peers, and they are inherently scalable. For P2P environments, data distribution and synchronization among peers are two of the most essential issues [16].

Besides data distribution, it is also important to be able to distribute program components. A distributed application consists of multiple components, which are first installed on various networked machines and then invoked so that they can cooperate in the application. Two actions therefore have to be performed to initialize a distributed application: distribution and remote invocation. Performing installations and invocations locally on each machine is inconvenient; instead, it should be possible to distribute and remotely invoke all components from a single machine.

This paper describes a Distributed Installer (DI), which can be used for efficient data distribution and remote program invocation. DI users are peers, which communicate directly to perform these actions. Since DI can access remote file systems, it has to be secure: the data and instructions it transmits must be protected against tampering. To accomplish this, DI uses the Secure Sockets Layer (SSL) [1,3,14], a widely used transport-layer mechanism that provides a secure communication path between two parties. DI supports selective data transfer that provides one-to-one, one-to-many and many-to-many synchronization with a variety of options, and both push and pull ways of getting data. It also supports various modes in which the receiver can get data, e.g. automatic or manual confirmation. Finally, DI can remotely invoke applications.

The rest of this paper is organized as follows. Section 2 discusses general goals for systems that implement data distribution and remote program invocation. Section 3 describes DI, shows several screen shots of DI’s GUIs and applications, and provides a brief description of the implementation of DI. Section 4 briefly describes related products and discusses how these products and DI satisfy the goals given in section 2. Finally, section 5 gives conclusions and describes future work.

2. Goals for Data and Program Distribution

This section lists essential goals that any application that implements data and program distribution must achieve:

• Platform Independence: users of heterogeneous systems must be able to access the application.
• Security: data and programs must be distributed in a secure way so that they cannot be intercepted or tampered with.
• Scalability: the application must be designed in such a way that the addition of users does not significantly hinder application performance.
• Pull and Push: both pulling and pushing data should be supported.
• Efficiency: providing and obtaining data has to be efficiently implemented.
• Synchronization: one-to-one and one-to-many data synchronization should be supported.
• Fault Tolerance: the implementation should be resistant to failures.
• Flexibility: different update modes should be supported, such as automatic update, manual update, size-based update, time-based update, etc.
• Selective Choice: it should be possible to select specific directories and/or files.
• Report: a log report should be provided.
• Remote Invocation: it should be possible to remotely invoke programs.

3. Distributed Installer (DI)

This section discusses the functionality, design and implementation of the proposed distributed installer (DI). It also discusses how DI supports remote invocation.

3.1. Functionality and Design of DI

We start with definitions for data distribution; remote program invocation is described in section 3.2. To describe the task of distributing data, we consider two kinds of applications, here called subscribers and publishers. A subscriber expresses interest in some data that can be provided by a publisher; a publisher selects subscribers and specifies the data to be sent to these subscribers. Note that a single application may be both a publisher and a subscriber at the same time. Data may be made available to subscribers by data pulling, where subscribers pull the data, or by data pushing, where publishers push the data. A channel is an abstract concept that represents a virtual, unidirectional connection between two nodes (a channel is similar to a pipe used in P2P systems). There are two kinds of channels:

1. P-channel, created by a publisher.
2. S-channel, created by a subscriber.

Each channel has a name and an optional description (a text describing the intended use of this channel). The publisher creates a P-channel using DI’s publisher software; see Figure 1. In this example, the publisher created three P-channels, called Quote, Weather and News. Each P-channel specifies the source of its data, i.e. a directory in the file system on the publisher’s computer. The publisher can then use push technology to distribute some or all of these P-channels to one or more selected subscribers. When the subscriber software receives an incoming P-channel, it automatically creates the corresponding S-channel based on the information in this P-channel, including the publisher's IP and port number and the name of the P-channel. In addition, any data in the P-channel is pushed into the newly created S-channel.
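The relationship between the two kinds of channels can be illustrated with a minimal Java sketch. The class and field names (PChannel, SChannel, fromIncoming) are assumptions made for illustration only and are not taken from the DI sources; the sketch only records the information that the text says a channel carries.

```java
import java.util.Objects;

/** A channel as created by a publisher; all names here are illustrative. */
final class PChannel {
    final String name;            // e.g. "Weather"
    final String description;     // optional free text, empty if absent
    final String sourceDirectory; // directory that holds the channel's data

    PChannel(String name, String description, String sourceDirectory) {
        this.name = Objects.requireNonNull(name, "name");
        this.description = description == null ? "" : description;
        this.sourceDirectory = Objects.requireNonNull(sourceDirectory, "sourceDirectory");
    }
}

/** The subscriber-side counterpart: a reference to a P-channel on a publisher. */
final class SChannel {
    final String name;
    final String description;
    final String publisherHost;
    final int publisherPort;

    SChannel(String name, String description, String publisherHost, int publisherPort) {
        this.name = Objects.requireNonNull(name, "name");
        this.description = description == null ? "" : description;
        this.publisherHost = Objects.requireNonNull(publisherHost, "publisherHost");
        this.publisherPort = publisherPort;
    }

    /** Builds the S-channel that corresponds to an incoming, pushed P-channel. */
    static SChannel fromIncoming(PChannel incoming, String publisherHost, int publisherPort) {
        return new SChannel(incoming.name, incoming.description, publisherHost, publisherPort);
    }

    /** Address of the publisher in the "IP:port" form. */
    String publisherAddress() {
        return publisherHost + ":" + publisherPort;
    }
}
```

In this sketch the S-channel keeps only the channel name and the publisher's address, which reflects the idea that an S-channel is a reference to a P-channel rather than a copy of it.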

Figure 1. Publisher’s GUI.

Now, we describe details of P-channels and the actions invoked by the buttons that appear in the publisher’s GUI; see Figure 1. A publisher has the following static properties, i.e. data describing its services (stored in the configuration file publisher.properties):

1. the port number on which the publisher will send data;
2. a root directory for storing all channels’ descriptions;
3. the filename and password for the keystore (Apache SSL) used to establish a secure SSL communication path;
4. an alternative key name and password.

In addition, the publisher uses another static configuration file, called target.properties, which lists all potential subscribers.

Pressing the Create button on the GUI shown in Figure 1 starts a GUI for creating a new P-channel, which specifies a channel name, a description, and a data source. Figure 2 shows how to create the Weather channel from Figure 1.
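As an illustration of how the static publisher properties might be read, the following sketch uses the standard java.util.Properties class. The property keys (publisher.port, publisher.channel.root, publisher.keystore.file) and the PublisherConfig class are hypothetical; the actual keys used in publisher.properties are not given in this paper.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical holder for a subset of the static publisher properties. */
final class PublisherConfig {
    final int port;            // port on which the publisher sends data
    final String channelRoot;  // root directory for channel descriptions
    final String keystoreFile; // keystore used for the SSL communication path

    private PublisherConfig(int port, String channelRoot, String keystoreFile) {
        this.port = port;
        this.channelRoot = channelRoot;
        this.keystoreFile = keystoreFile;
    }

    /** Reads the configuration; the key names are assumptions, not DI's real keys. */
    static PublisherConfig load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        int port = Integer.parseInt(required(p, "publisher.port"));
        return new PublisherConfig(port,
                required(p, "publisher.channel.root"),
                required(p, "publisher.keystore.file"));
    }

    /** Returns the trimmed value of a key, or fails if it is missing or blank. */
    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }
}
```

A caller would typically pass a java.io.FileReader opened on publisher.properties; failing early on a missing key keeps a misconfigured publisher from starting with partial settings.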

Figure 2. Creating a P-channel.

To distribute a P-channel to subscribers, the publisher uses the Distribute button from Figure 1. This way, the publisher can specify which subscribers will receive data, and which files and directories are to be sent. Thus, creating and using a P-channel represents the action of a publisher specifying subscribers and pushing data to these subscribers. Note that subscribers can refuse unsolicited channels distributed by publishers. Specifically, each subscriber has the following static properties stored in the configuration file subscriber.properties:

1. the port number for receiving data;
2. a root directory for storing all incoming channels’ data;
3. a channel creation type, indicating how to create a new channel on the subscriber side when receiving an unsolicited channel from a publisher. This type has one of three values: 0 (the default value) for automatic channel creation without the user’s intervention, 1 for a prompt allowing the subscriber to accept or reject the data, and 2 for refusing any unsolicited channel;
4. the filename and password for the keystore used to establish a secure SSL communication path;
5. an alternative key name and password;
6. a security flag indicating whether or not the data transmission is secure (true by default);
7. Channel.UpdateType, indicating how to update a channel. There are three options: 0 (the default value) means that when an existing unsolicited channel is received from a publisher, the channel update is performed automatically without the user’s intervention; 1 prompts the user to accept or reject the update; and 2 refuses the unsolicited update;
8. Channel.UpdateOption, indicating how to update a channel’s content. There are four options: 0 (only if file sizes differ), 1 (only if file times differ), 2 (if file sizes or times differ, the default value), and 3 (forced update).

The above design supports relative virtual mapping. To explain this concept, consider the following example:

1. A publisher P creates a channel named "quote", and specifies the corresponding directory "c:\program\quote".
2. P distributes the "quote" channel to a subscriber S with all channel data (this includes all files and sub-directories under "c:\program\quote").
3. In S, a channel named "quote" is automatically created, and all received data (files and sub-directories) are stored with the original hierarchy under the root channel directory specified in the configuration file subscriber.properties.
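The mapping in this example can be sketched with java.nio.file.Path. The RelativeMapping class and its method are hypothetical, and the sketch assumes that each channel is stored in a sub-directory of the subscriber's root named after the channel; the paper does not state the exact on-disk layout.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that maps a publisher-side file to its subscriber-side location. */
final class RelativeMapping {
    /**
     * Keeps the position of a file relative to the channel's source directory
     * and re-roots it under the subscriber's channel root.
     */
    static Path subscriberPath(Path publisherSource, Path publisherFile,
                               Path subscriberRoot, String channelName) {
        Path relative = publisherSource.normalize().relativize(publisherFile.normalize());
        if (relative.startsWith("..")) {
            // The file lies outside the channel's source directory.
            throw new IllegalArgumentException(publisherFile + " is not inside " + publisherSource);
        }
        // Assumption: one sub-directory per channel under the subscriber's root.
        return subscriberRoot.resolve(channelName).resolve(relative);
    }

    public static void main(String[] args) {
        Path target = subscriberPath(
                Paths.get("/program/quote"),
                Paths.get("/program/quote/nasdaq/today.txt"),
                Paths.get("/home/s/channels"),
                "quote");
        System.out.println(target);
    }
}
```

Because only the relative part of the path is transferred, the subscriber never needs to know the publisher's absolute directory, which is what makes the mapping "virtual".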

Now, we describe S-channels. As noted above, S-channels are typically created by the subscriber software when it receives incoming P-channels pushed by publishers. However, the subscriber can also create an S-channel manually. Before doing so, the subscriber must know the channel names available on the publisher side. If the channel name specified in the S-channel does not exist on the publisher side, the S-channel is invalid, much like a TV set tuned to the wrong frequency for a station. In other words, the S-channel is a reference to the corresponding P-channel. (For future extensions of this design, see section 5.) A subscriber uses the GUI shown in Figure 3. The four buttons at the bottom of the GUI are used to create, modify, and delete S-channels, and to update the selected S-channel, i.e. pull data from this channel.

Figure 3. Subscriber’s GUI.

When a new S-channel is created, it specifies the address of the publisher node; see Figure 4. Here, the address of a network node is the IP of this node together with the port number on which the publisher software listens, for example 131.162.8.9:10001. The S-channel also specifies the channel name, its description, and the “update type” and “update options”; see the above description of the configuration file subscriber.properties. Note that the update type and options can also be set for an existing S-channel, for example one created in response to a publisher’s push. This way, the subscriber can control the way data are received.
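The update type and update option codes described for subscriber.properties can be expressed as a small decision routine. This is a sketch under stated assumptions: the UpdatePolicy class, the UpdateDecision enum, and the method names are hypothetical, and file size and modification time are passed in as plain numbers rather than read from disk.

```java
/** What the subscriber does with an unsolicited update (Channel.UpdateType). */
enum UpdateDecision { APPLY, ASK_USER, REFUSE }

/** Hypothetical helper encoding the update codes listed for subscriber.properties. */
final class UpdatePolicy {
    // Channel.UpdateOption codes as listed in the text.
    static final int SIZE_DIFFERS = 0;
    static final int TIME_DIFFERS = 1;
    static final int SIZE_OR_TIME_DIFFERS = 2; // default value
    static final int FORCED = 3;

    /** Maps a Channel.UpdateType code (0, 1 or 2) to a decision. */
    static UpdateDecision decisionFor(int updateType) {
        switch (updateType) {
            case 0: return UpdateDecision.APPLY;    // automatic, the default
            case 1: return UpdateDecision.ASK_USER; // prompt the user
            case 2: return UpdateDecision.REFUSE;   // refuse unsolicited updates
            default: throw new IllegalArgumentException("unknown update type: " + updateType);
        }
    }

    /** Decides whether a single file should be transferred under the given option. */
    static boolean needsUpdate(int updateOption, long localSize, long localTime,
                               long remoteSize, long remoteTime) {
        switch (updateOption) {
            case SIZE_DIFFERS: return localSize != remoteSize;
            case TIME_DIFFERS: return localTime != remoteTime;
            case SIZE_OR_TIME_DIFFERS: return localSize != remoteSize || localTime != remoteTime;
            case FORCED: return true;
            default: throw new IllegalArgumentException("unknown update option: " + updateOption);
        }
    }
}
```

Separating the per-channel decision (update type) from the per-file test (update option) mirrors the two properties in the configuration file, so either can be changed for an S-channel without affecting the other.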

Figure 4. Example of an S-channel.

The update action invoked from the subscriber’s GUI, see Figure 3, performs one of two functions. If there are no local files in this channel, it pulls all data from the publisher to this subscriber. Otherwise, it updates the local files according to the specified update options.

3.2. Remote Invocation

The current version of DI supports remote invocation of Java programs only. To explain how it works, consider the following example of a distributed system consisting of three components, say C1, C2 and C3, residing on machine N. Below, we describe the steps to be taken to distribute these components from machine N to three computers, N1, N2 and N3, and then to invoke all components:

1) A publisher (on machine N) creates three channels, one for each of the three computers N1, N2 and N3.
2) The publisher specifies the Main class in the descriptor file app.xml (see below) for each channel.
3) The publisher adds the addresses of N1, N2 and N3 to the file target.properties.
4) The publisher distributes all channels to the three subscribers.
5) The publisher uses the "remote start" button to remotely start all components.

The app.xml file is an XML-based descriptor, which defines the startup class for the distributed program and the path information necessary to load the classes this program depends on, for example the startup class ca.acadiau.cs.Lab and the archive client.jar.
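How a subscriber might process such a descriptor can be sketched as follows. The element names of app.xml are not shown in this paper, so the names used here (application, main-class, classpath) are hypothetical, as are the AppDescriptor class and its methods; the sketch relies only on the standard JAXP parser, URLClassLoader and reflection.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical in-memory form of an app.xml descriptor. */
final class AppDescriptor {
    final String mainClass; // startup class, e.g. ca.acadiau.cs.Lab
    final String classPath; // archive relative to the channel, e.g. client.jar

    private AppDescriptor(String mainClass, String classPath) {
        this.mainClass = mainClass;
        this.classPath = classPath;
    }

    /** Parses a descriptor; the element names are assumptions, not DI's schema. */
    static AppDescriptor parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return new AppDescriptor(text(doc, "main-class"), text(doc, "classpath"));
    }

    private static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + "> element");
        }
        return nodes.item(0).getTextContent().trim();
    }

    /** Loads the startup class from the channel directory and calls its main method. */
    void start(Path channelDirectory, String[] args) throws Exception {
        URL archive = channelDirectory.resolve(classPath).toUri().toURL();
        // The loader is left open because the started program may load classes later.
        URLClassLoader loader = new URLClassLoader(new URL[] { archive },
                AppDescriptor.class.getClassLoader());
        Class<?> startup = Class.forName(mainClass, true, loader);
        Method main = startup.getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }
}
```

A subscriber that receives the remote start command would call parse on the descriptor stored in the channel and then start with the channel's local directory.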

The app.xml file must be located in the channel and is sent to the subscriber along with the rest of the channel’s content. Subscribers cannot modify this file, and only publishers can issue the remote start command. When the remote start command is sent to the subscriber, the subscriber parses the descriptor file and loads the startup class.

3.3. Implementation of DI

There are several techniques that support portability, namely CORBA [2], Web Services and the underlying XML technology [17], and Java technology. We used Java (JDK 1.3), with SSL to support secure data transmission. The implementation of DI employs multi-threading to transfer channel data securely and efficiently from publishers to subscribers; it is depicted in Figure 5.

There are three levels in this system. The top level includes the “Channel Manager” and the “Service Provider”. The “Channel Manager” is a GUI responsible both for managing the components of a distributed program and for handling interactions with end users. The “Service Provider” implements two kinds of services: the “Listening Service” and the “Component Server Service”. The “Listening Service” runs in subscriber nodes, obtaining instructions and data from publishers. The “Component Server Service” runs in publisher nodes, acting as a component server that sends selected components to selected subscribers. The second level is the “Secure Transfer Manager”, which employs SSL to ensure secure data transmission. The lowest level is standard “TCP/IP”. Since TCP/IP is the most popular network protocol, this system can run on most LANs and on the Internet.

[Figure 5: two computer nodes, A and B, each consisting of a Channel Manager and a Service Provider on top of a Secure Transfer Manager on top of TCP/IP, connected through the Internet.]

Figure 5. DI design levels.
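The use of multi-threading to serve several subscribers at once can be illustrated with the sketch below. It is not DI's implementation: the ChannelSender interface and the ParallelDistributor class are hypothetical, the pool size of 8 is arbitrary, and the actual transfer is hidden behind the interface so that the threading pattern can be shown without a keystore. A real sender could, for example, open its connection with SSLSocketFactory.getDefault().createSocket(host, port).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical transfer hook; a real one might write to an SSL socket. */
interface ChannelSender {
    void send(String subscriberAddress, String channelName) throws Exception;
}

final class ParallelDistributor {
    /**
     * Pushes one channel to every subscriber concurrently and reports,
     * per subscriber address, whether the transfer completed.
     */
    static Map<String, Boolean> distribute(List<String> subscribers, String channelName,
                                           ChannelSender sender) throws InterruptedException {
        int threads = Math.max(1, Math.min(subscribers.size(), 8));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one transfer task per subscriber.
            Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
            for (String address : subscribers) {
                pending.put(address, pool.submit(() -> {
                    sender.send(address, channelName);
                    return Boolean.TRUE;
                }));
            }
            // Collect results; a failed transfer does not stop the others.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> entry : pending.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException transferFailed) {
                    results.put(entry.getKey(), Boolean.FALSE);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running each transfer in its own task means that a slow or unreachable subscriber delays only its own entry in the result map, which is the property that one-to-many distribution needs.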

Figure 6 shows the interaction between publishers and subscribers. This figure includes three nodes: the P node refers to a publisher, the S node refers to a subscriber, and the SP node refers to both a subscriber and a publisher.

[Figure 6: an S node running the Listening Service, a P node running the Component Server Service, and an SP node running both services; each service sits on top of SSL over TCP/IP.]

Figure 6. Interactions between publishers and subscribers.

4. Related Work

This section describes three selected file synchronization tools available from various sources, and compares them with DI. It also shows how DI satisfies the goals from section 2.

UNISON [15] is a file-synchronization tool for both Unix and Windows platforms. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other. It works between any pair of machines connected to the Internet, communicating over either a direct socket link or a tunnel over rsh [10] or an encrypted SSH connection [8]. Unlike simple mirroring or backup utilities, Unison can deal with updates to both replicas of a distributed directory structure. Finally, Unison is resilient to failure.

Idem [12] is a file synchronization and folder replication utility that runs only on the various flavors of Windows. It can automatically replicate directories, synchronize files, and back up documents. Automatic actions are supported by checking a list of source folders at regular time intervals and updating target folders, including sub-directories, with new or modified files. Idem checks and updates not only time stamps but also the security information of files and folders, e.g. ACL permissions on NTFS partitions or attributes such as hidden, system, and archive. Idem can update a file either if it differs between the source and target folders, or if the file in the source folder is newer than the corresponding file in the target folder. Finally, a daily log report keeps track of the files processed during mirroring operations.
The third tool described here is the SFTP Plugin for Eclipse, which provides file and directory synchronization between the workspace and a remote location [11]. Its connection with the server is encrypted with SSH, and it uses the SFTP protocol to transfer files. Table 1 compares the above three tools with DI.

Table 1. Comparison of various tools.

Category                             UNISON      IDEM    SFTP Plugin  Distributed Installer
Platform independence                Unix & Win  Win     Unix & Win   Unix & Win
Security                             Y (SSH)     N       Y (SSH)      Y (SSL)
Scalability                          N (a)       N (a)   N (a)        Y (b)
Pull and push                        N           N       N            Y
Efficiency                           Y           Y       Y            Y
One-to-one synchronization           Y           Y       Y            Y
One-to-multiple synchronization      N           N       N            Y
Fault tolerance                      Y           N       N            N
Flexibility (c)                      1,2,3       1,2,3   1,2,3,4      1,2,3,4
Selective choice                     N           N       N            Y
Report                               N           Y       N            N
Remote invocation                    N           N       N            Y
Support for files and directories    Y           Y       Y            Y
Automatic periodic update            N           Y       N            N
Support for NTFS security information N          Y       N            N
Receiver’s passive update type (d)   1           1       1            1,2,3

(a) Supports one-to-one only.
(b) Supports one-to-many and many-to-many, using multi-threading.
(c) 1 = if file size differs; 2 = if file date differs; 3 = if size and date differ; 4 = forced update.
(d) 1 = automatic; 2 = prompt; 3 = manual.

5. Conclusions and Future Work

This paper described goals for applications that implement remote file synchronization and remote program invocation, and it presented a distributed installer, DI, which satisfies these goals. We discussed the design and implementation of DI, and showed how it works.

DI uses SSL for secure communication in the transport layer. Compared with SSH2 [13], SSL relies on a more complete and more widely used public key infrastructure (PKI) to support authentication. SSL uses standard X.509 certificates to authenticate the communicating parties, while SSH2 just uses an encrypted message digest. Therefore, SSL is more widely accepted than SSH2.

DI can be used with firewalls. For example, a subscriber on an intranet with the IP address 192.168.0.10 can access any publisher on the Internet through the firewall and gateway/NAT. The key point is that the policies of the firewall and gateway/NAT must allow access to the publisher's IP and port. Because most firewalls and gateways/NATs currently leave port 80 open, a publisher that listens on port 80 (like a web server) becomes transparent to them. Now, consider the specific example of a campus that uses a firewall. It is possible to set up an S-channel from an off-campus machine to a machine inside the firewall. The publisher inside the firewall can push data to the off-campus machine. But if the off-campus machine wants to pull data from the publisher, its user would have to ask the administrator to grant permissions to the IP of the off-campus machine.

As future work we plan to do the following. First, since the implementation of file synchronization is mostly complete, we will work on high-efficiency transfer using compression and break-point resuming. Second, the current interface is GUI-based, and for some applications it will be useful to provide a text-based interface. Third, the current implementation of remote program invocation uses a static list of all potential subscribers; this will be changed to allow dynamic subscription. Fourth, to improve the flexibility of our system, we will add a publisher broker, which will maintain all available publishers (and their channels) so that subscribers can dynamically set and modify these data. Fifth, to execute Java programs remotely, it may be necessary to set up Java policy files, and the current version of DI does not allow remote installation of these files. Finally, our future work will allow for the invocation of non-Java programs.

References

[1] Apache-SSL, http://www.apache-ssl.org/
[2] CORBA, http://www.omg.org, 2003.
[3] Instant SSL, SSL Certificates, http://www.instantssl.com/
[4] Leighton, G., Peer Web Services: Defining a Peer-to-Peer framework for Web Services, Honours Thesis, Acadia University, 2003.
[5] Leuf, B., Peer to Peer: Collaboration and Sharing over the Internet. Pearson Education, Inc., 2002.
[6] Müldner, T., “Analysis of Java Client/Server and Web Programming Tools for Development of Educational Systems”, Webnet’98, Orlando, November, 1998.
[7] Mullender, S. J. and Presotto, D., “Programming Distributed Applications using Plan 9 from Bell Labs”, 4th European Research Seminar on Advances in Distributed Systems (ERSADS), Bologna, Italy, May, pp. 115-132, 2001.
[8] OpenSSH, http://www.openssh.org/
[9] Peer Software, Inc., PeerSync High Volume Server (HVS), http://www.peersoftware.com/solutions/peersync/ps_hvs_spec.asp?pid=pshvs&sol=ps
[10] Remote Shell, http://www.mkssoftware.com/docs/man1/rsh.1.asp
[11] Sftp File Synchronization Plugin, http://klomp.org/eclipse-plugins/org.klomp.eclipse.team.sftp/
[12] Soft Experience, Idem: Idem File Synchronization & Folder Replication utility, mirroring Windows files, http://peccatte.karefil.com/Software/Idem/idemhelpeng.htm
[13] SSH2, http://www.employees.org/~satch/ssh/faq/manpages/ssh2_man.html
[14] SSL, http://browserinsight2.lunaimaging.com:8090/faq/ssl.xtp
[15] UNISON file synchronization, http://www.cis.upenn.edu/~bcpierce/unison/
[16] Windows Peer-to-Peer Networking, http://www.microsoft.com/windowsxp/p2p, 2003.
[17] XML, Extensible Markup Language 1.0, 2nd Edition, http://www.w3.org/TR/REC-xml, 2003.
