Republic, 1994.

Services to Support Distributed Applications in a Mobile Environment

Nigel Davies†, Stephen Pink* and Gordon S. Blair†

†Distributed Multimedia Research Group, Lancaster University, Lancaster, U.K. nigel,[email protected]

*Swedish Institute of Computer Science, SICS, Box 1263, S-164 28 Kista, Sweden. [email protected]

Abstract

A key characteristic of mobile computing is that the end systems involved can experience differing degrees of connectivity during typical operational cycles. This paper discusses the issues associated with developing distributed system services to operate in such an environment. We focus on the provision of file system support and argue that existing file systems, including those developed for use in a mobile environment, contain assumptions about their underlying communications infrastructures which are unlikely to hold in a mobile environment. This argument is supported by an in-depth examination of a specific file system issue: the support of shared libraries. A new service to support shared libraries in mobile environments is proposed and we discuss the integration of this service into a wider architecture of reactive services being developed to support distributed mobile computing.

1: Introduction

Recent years have seen the emergence of mobile computing as a major new field of computer science. During this time a number of papers have been produced which define the scope of the field (e.g. [1], [2], [3]) and the issues which must be addressed. We propose that within the context of distributed systems research the key characteristic of mobile computing environments is that end systems experience differing degrees of connectivity during typical operational cycles. In particular, systems are expected to function when connected to a range of heterogeneous networks including high-speed networks (fully connected operation) and low-speed networks (weakly connected operation), and when totally disconnected (disconnected operation). To date, the transition between modes of operation has been a heavyweight process often involving both hardware and software re-configuration. However, the advent of technologies such as the MINT mobile internet router promises to significantly reduce the overheads associated with the transition process and lead to a more dynamic environment in which an end-system observes rapid and marked fluctuations in network characteristics¹ [4].

The environment described above presents a number of new challenges to distributed system services. In particular, in such an environment it is inappropriate to make assumptions regarding the degree of connectivity an end-system may experience. For example, an end-system operating over a conventional wire-based network might reasonably expect a theoretical bandwidth in the region of 10-100 Mbps. The same system utilising a wireless wide-area radio network would be fortunate if the bandwidth available exceeded 9.6 Kbps. An analysis of existing distributed system services suggests that many contain assumptions which will make it difficult for them to operate in such a dynamic environment. In particular, they lack the ability to react to changes in the quality-of-service (QoS) provided by their supporting communications infrastructures.

This paper focuses on the problems of developing a set of file system services to operate in a heterogeneous networked environment. Section 2 describes the evolution of current distributed file systems and highlights the aspects of these systems which make them unsuitable for our proposed environment. Particular attention is paid to the Coda file system which has been designed to operate in both connected and disconnected modes. Section 3 then discusses the issue of shared library support which we use to illustrate the assumptions present in current distributed file systems. It is argued that new system services are required to support shared libraries in weakly connected environments and one such service is described in some detail. Section 4 then generalises the results of the previous sections by considering the provision of a range of reactive distributed system services.
An architecture based on the Chorus operating system [5] is presented in which support is provided for such services to monitor changes in the QoS of their communications infrastructures in order that they can react accordingly. Finally, section 5 contains some concluding remarks and section 6 our references.

¹ The MINT router is being developed as part of the Walkstation project [4] and enables mobile computers to dynamically exploit the services offered by a number of networks. More specifically, it supports seamless transition between network types.

2: The Evolution of Distributed File Systems

Users of mobile workstations have a need for sharing files with their peers on stationary networks. File sharing in stationary environments is traditionally provided by a distributed file system. These file systems create a unified namespace for their users' files and achieve good performance by using mechanisms for caching data on clients. Whenever caching is used in a distributed system, consistency becomes an issue. Thus, consistency guarantees between client caches and file system stores are also provided by most distributed file systems.

2.1: AFS and NFS

Much work has been done in the last decade on the design of distributed file systems [6] and various caching schemes and consistency models have been proposed. Some distributed file systems have become commercially very successful and millions of users rely on them every day. Two examples of such distributed file systems are Sun's Network Filesystem (NFS) [7] and Transarc's Andrew File System (AFS) [8]. NFS clients cache recently used blocks of files in main memory. NFS servers are stateless, so the onus is on the client to determine whether its cached blocks are stale. In contrast, AFS clients cache recently used blocks of files on disk and a stateful server is used which promises cache consistency callbacks to clients.

Unfortunately, neither of these distributed file systems was designed for mobility. Both are designed with assumptions about the network that make mobile client operation very difficult. NFS assumes that the client and server are permanently connected. The NFS client must continually ask the NFS server whether its cached blocks are consistent with the file server store. Mobile clients, however, must often operate in weakly connected or disconnected modes. A mobile NFS client would find it very difficult to support operation by its users on its cached replicas when the client becomes disconnected from the server or is only weakly connected to it to begin with. Although AFS servers contain enough state to be able to tell the client when to flush its cache, no mechanism is provided in the original AFS design to detect inconsistencies when client and server are disconnected. (Huston and Honeyman [9] retrofit aspects of disconnected operation into AFS. This work is an attempt to make AFS more closely resemble the Coda file system, described in section 2.2, for mobile users while allowing compatibility with a large community of AFS users.)
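The difference between client-driven validation and server-driven callbacks can be made concrete with a small model. The following is a minimal sketch only: the class names, the per-file version counter and the message counting are illustrative assumptions, not the NFS or AFS protocols, and real NFS clients additionally use timeouts to reduce validation traffic.

```python
class Server:
    """Toy file server that keeps a version number per file."""

    def __init__(self):
        self.files = {}      # name -> (version, data)
        self.promises = {}   # name -> clients holding a callback promise

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        # Callback model: tell every caching client that its copy is stale.
        for client in self.promises.pop(name, []):
            client.invalidate(name)

    def version(self, name):
        return self.files[name][0]

    def fetch(self, name, callback_for=None):
        if callback_for is not None:
            self.promises.setdefault(name, []).append(callback_for)
        return self.files[name]


class PollingClient:
    """Stateless-server style: the client validates its cache on every read."""

    def __init__(self, server):
        self.server, self.cache, self.messages = server, {}, 0

    def read(self, name):
        if name in self.cache:
            self.messages += 1  # validation request to the server
            if self.cache[name][0] == self.server.version(name):
                return self.cache[name][1]
        self.messages += 1      # fetch request to the server
        self.cache[name] = self.server.fetch(name)
        return self.cache[name][1]


class CallbackClient:
    """Stateful-server style: the cache is trusted until the server calls back."""

    def __init__(self, server):
        self.server, self.cache, self.messages = server, {}, 0

    def invalidate(self, name):
        self.cache.pop(name, None)

    def read(self, name):
        if name not in self.cache:
            self.messages += 1  # fetch request, which also registers a callback
            self.cache[name] = self.server.fetch(name, callback_for=self)
        return self.cache[name][1]
```

In this model the polling client needs the server for every read, while the callback client silently depends on invalidations actually arriving. Neither behaves sensibly once the link disappears, which is the limitation discussed above.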

2.2: Mobility and the Coda Distributed File System

One of the best known file systems that is designed explicitly for mobile client operation is Coda from Carnegie Mellon University [10], [11]. Coda, a descendant of AFS, provides a unified name space for potentially mobile clients and their stationary servers as well as guarantees of consistency for the data in the files. Coda clients are not required to continually query the server to guarantee cache consistency since the server contains enough state to notify the client when the client's replica becomes stale. High availability is provided by file server replication and by caching recently used whole files on client disks (whole files are used instead of recently used blocks in order to avoid partial availability of files). Files are cached on disk instead of in main memory (as in NFS) since it is desirable to support disconnected operation across client reboots.

In Coda the files that are cached on the client are determined using both implicit and explicit sources of information. The implicit information is recorded in the form of a reference history. The explicit information consists of a "hoard database" constructed by the user which contains a list of files and associated cache priorities. This information can either be generated manually or by automatically logging file activity associated with a given task. The process of updating the cache based on a user's hoard database is termed "hoard walking". This process takes place periodically during routine operation and may also be initiated explicitly by users prior to planned disconnection.

In case of network failure, clients are able to operate in disconnected mode using the contents of their cache. During this disconnected period, the program on the client that is responsible for transferring the files from the server to the disk cache when the network was available becomes responsible for logging any changes made to the cached replica. When the connection between client and server is restored, the contents of the client's cache are reintegrated into the server's file store. Conflicts that have occurred for shared files are resolved automatically when possible, although the user may be required to resolve some conflicts manually.
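The way explicit hoard priorities and the implicit reference history can jointly decide what is cached is easier to see in code. This is a minimal sketch under stated assumptions: the scoring formula, the `alpha` weight, the tick-based notion of recency and all names are hypothetical and are not taken from Coda.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    size_kb: int
    hoard_priority: int = 0   # explicit: from the user's hoard database
    last_ref_tick: int = 0    # implicit: reference history (larger = more recent)


def hoard_walk(files, cache_kb, now, alpha=0.75):
    """Choose whole files for the cache, highest combined score first.

    The score mixes the explicit hoard priority with a recency term derived
    from the reference history. The weighting is an assumption of this sketch.
    """
    def score(f):
        age = max(now - f.last_ref_tick, 0)
        recency = 100.0 / (1 + age)
        return alpha * f.hoard_priority + (1 - alpha) * recency

    chosen, used = [], 0
    for f in sorted(files, key=score, reverse=True):
        # Whole-file caching: a file is either cached completely or not at all.
        if used + f.size_kb <= cache_kb:
            chosen.append(f.name)
            used += f.size_kb
    return chosen
```

For example, with a 100 KByte cache, a hoarded 40 KByte document and a recently read 10 KByte note are both selected, while a 500 KByte library is skipped because it cannot be cached whole.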

2.3: Limitations of Existing File Systems

As mentioned above, both AFS and NFS have built-in assumptions about the underlying network, i.e. a fully connected environment is assumed. Coda was designed to avoid assumptions about full connectivity and has been successfully used in environments where disconnected operation is common. On closer examination, however, the Coda design also has built-in assumptions about the underlying network. For example, the decision to cache whole files assumes high bandwidth is available during periods of connection. This decision might not be so appropriate if only limited bandwidth is available or if file sizes are large. Similarly, the reintegration process batches up consistency checks until the next period of full connection. In theory, though, some checks could be carried out during periods of partial connectivity, thus reducing the overhead of reintegration and maintaining higher levels of global consistency.

In our research we are interested in the problems of providing services which operate over heterogeneous networks and which therefore span the full spectrum of degrees of connectivity. The systems reviewed are not sufficiently flexible to exploit the various levels of connectivity. As a further illustration of this lack of flexibility, the following section considers the specific issue of supporting shared libraries in a weakly connected environment. We will return to more general system service issues in section 4.

3: Supporting Shared Libraries in a Weakly Connected Environment

3.1: The Rationale Behind Shared Libraries

Shared libraries were introduced into a number of operating systems (e.g. UNIX SVR3 [12], SunOS [13] and OS/2 [14]) in the 1970s and 1980s partly in response to the trend towards larger executables. Shared libraries exploit the most common distributed systems architecture (multiple workstations connected by a local area network and sharing a file name space) by allowing a single copy of commonly used functions to be maintained on a file server. These functions are linked into one or more arbitrary-size libraries which may be shared by all of the nodes on the network. When an application is linked to form an executable image, the linker can automatically include those sections of the library which are required (static linking) or it can include references to the appropriate libraries which are resolved at execution time (dynamic linking).

Dynamic linking is the more powerful technique since it can be used to ensure that executable files are kept to a reasonable size. In addition, it allows programs to link to the most recent version of a shared library without requiring re-compilation. Finally, in some operating systems (e.g. SunOS) dynamic linking allows applications running on the same workstation to share a single loaded copy of a library, considerably reducing memory requirements of workstations. A discussion of the issues relating to implementing shared libraries can be found in [15]. For the remainder of this paper it should be assumed that references to shared libraries imply dynamic linking.

3.2: The Impact of Mobility

It might be concluded from the above section that the concept of shared libraries could be exploited in a mobile environment. Indeed, the limited storage capacity of mobile workstations dictates that executables must be kept to a moderate size. Moreover, if we wish to support the transfer of executable files to a mobile workstation that is weakly connected (e.g. by a 9.6 Kbits/sec wide area radio network), the size of executable files must be kept to a minimum since the time taken to perform this transfer will be governed by the file size. Shared libraries are therefore clearly desirable in a mobile environment. However, there are two significant difficulties associated with supporting shared libraries in such an environment if Coda is used as the file system: (i) Coda does not analyse external dependencies of executable files, and (ii) as discussed in section 2.2, the granularity of caching in Coda is whole files.

The first of these problems only occurs if an executable file is cached without having been executed recently (e.g. as a result of a hoard walk). In the normal situation, where files are cached as a result of being referenced, the library files will also be cached since they will have been referenced at the same time as the executable file. The second problem is more general and arises as a result of the size of typical shared libraries: a characteristic feature of these files is that they tend to be significantly larger than executables. For example, one of the most commonly used of all shared libraries, libc, occupies approximately 500 KBytes, and the libraries for applications containing X Windows code run to several megabytes. If an executable references such a library then the Coda file system will transfer the entire library to the client disk. Indeed, one executable can contain references to many large shared libraries; perhaps so many that the cache becomes full and execution becomes impossible (shared libraries cannot be discarded from the cache in the normal manner since they must remain accessible to the application for the duration of its execution). Even if all the libraries can fit into the cache, in a weakly connected environment the transfer time for these types of libraries would be measured in tens of minutes.

One partial solution to these problems would be to restrict the size of shared libraries. In this way transfers which were necessary at execution time (because the dependencies were not analysed) would take a reasonable period (assuming a connection was available) and the policy of caching whole files would remain valid. However, such a restriction would represent a significant deviation from the current trend towards progressively larger shared libraries.

Thus there is a conflict in Coda concerning the use of shared libraries in a weakly connected scenario. Shared libraries are desirable because of the poor-quality links available to a mobile and its restricted storage and processing capacity. However, the restricted communications link and storage make it expensive or perhaps impossible to use Coda to cache the libraries on the client. The following section proposes a solution to this conflict.
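The transfer times implied by the library sizes quoted in this section can be checked with a short calculation. This is a rough sketch: it assumes 1 KByte = 1024 bytes, a link that delivers its full nominal rate, and no protocol overhead or retransmission, and the 3 MByte figure is an illustrative stand-in for "several megabytes".

```python
def transfer_seconds(size_kbytes, link_bits_per_second):
    """Idealised time to move a file over a link, ignoring all overheads."""
    return size_kbytes * 1024 * 8 / link_bits_per_second


RADIO_BPS = 9_600        # wide area radio link quoted in the text
LAN_BPS = 10_000_000     # lower end of the 10-100 Mbps wired range

libc_minutes = transfer_seconds(500, RADIO_BPS) / 60          # about 7 minutes
x_lib_minutes = transfer_seconds(3 * 1024, RADIO_BPS) / 60    # about 44 minutes
libc_on_lan = transfer_seconds(500, LAN_BPS)                  # well under a second
```

Under these assumptions libc alone takes about seven minutes over the radio link, and a single 3 MByte library takes over forty, which is consistent with the "tens of minutes" estimate above.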

3.3: A Proposed Solution

The issue at the heart of the problem of supporting shared libraries is their size. If this could be reduced it would be practical to copy shared libraries onto a mobile along with the executables which reference them. Compression schemes could, of course, be used to help address this problem (in addition to contributing towards a general reduction in storage and communications requirements [16]). However, it is obviously desirable to minimise the size of libraries prior to compression. To this end, we note that shared libraries are not simple files in the usual sense, but a number of files combined into a single larger file with an appropriate header. Hence, libraries can be partitioned into a number of smaller subcomponents, each subcomponent corresponding to an object file which was included in the linking process to create the library. We therefore suggest that when objects are copied to a mobile only those object files they reference are copied with them (rather than the entire libraries). These object files can then be re-combined into a shared library when on the mobile to provide transparency to executables.

To allow us to evaluate this approach, we have implemented a tool which analyses executables and, for each shared library referenced, determines which object files within the library are required. These object files can then be transferred to the mobile where the tool reassembles them into a shared library. Redundant copying can be avoided by maintaining a list of previously transferred object files.

Integrating the newly transferred object files into the client's existing libraries is non-trivial if it is to be achieved without causing disruption to executing programs which access these libraries. The approach we have adopted is to introduce a level of indirection between the names for libraries which are embedded in executables and the libraries themselves. When components of a library are transferred to a client for the first time we create a lookup file with the same name as the library on the client. The required object files of the library are then transferred to the client and assembled into a shared library with a unique, system-defined name. This name is placed in the lookup file. When executables attempt to open a library at link time a modified version of the run-time link editor consults the lookup file and opens instead the libraries it specifies. Since executables only consult the lookup file at start-up time there is no additional performance penalty after this point. Furthermore, should a new executable require additional object files to be added to the library, a new entry is placed in the lookup file, thus providing a one-to-many mapping between library names embedded in executables and files in the client's cache. Libraries which become excessively fragmented can be rebuilt at system shutdown/start-up time. Since the run-time link editor is itself dynamically linked with applications, we are able to maintain binary compatibility with applications which run on the servers.

Initial results suggest that, using our analysis tool, we can substantially reduce traffic between a mobile client and its server. Consider the following two examples, a simple utility written in C and an ANSAware [17] client object (ANSAware is a distributed systems platform whose executables contain significant portions of library code; we use it here since many of our mobile applications are written using ANSAware). If we statically link these applications, they occupy 106 KBytes and 254 KBytes respectively. If we dynamically link them, the sizes of both executables are reduced to around 25 KBytes. These figures suggest a better reduction in executable size than can generally be obtained because the applications we tested were very small. However, reductions of 45% to 60% are typically quoted for standard applications, with reductions for library-dominated applications (e.g. graphics-based) being in the region of a factor of 10 or more [15].

Given that we have dynamically linked our applications, the total size of the libraries required by the applications is 565 KBytes for the C application and 655 KBytes for the ANSA object. However, using our analysis tool we are able to reduce the size of the required libraries to 336 KBytes for the C application and 352 KBytes for the ANSA object. On a wide area low-bandwidth radio link this represents a substantial time saving. Moreover, since analysis of a number of typical ANSA executable files suggests that they mostly reference the same group of object files within the libraries, we expect that after the initial copying of the libraries subsequent dynamically linked executables can be copied without transfer of additional library information. Assuming this pattern of references, our system only needs to transfer two ANSA executables to make substantial savings over a system which statically links executables: two statically linked executables occupy 2*254 KBytes, against 352 KBytes of library object files plus two dynamically linked executables of around 25 KBytes each, so the saving is (2*254) - (352+50) = 106 KBytes.
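The analysis step described in this section, determining which object files of a library an executable needs and skipping those already on the client, can be sketched as a transitive closure over symbol references. This is an illustrative assumption about how such a tool might work, not a description of the authors' implementation; the data layout and all names are hypothetical.

```python
def required_objects(undefined_symbols, library):
    """Return the object files of `library` needed to resolve the given symbols.

    `library` maps an object file name to a pair (defined, referenced) of
    symbol sets. Object files are pulled in transitively: if printf.o
    references a symbol defined in vfprintf.o, both are required.
    """
    provider = {}
    for obj, (defined, _referenced) in library.items():
        for symbol in defined:
            provider.setdefault(symbol, obj)

    needed, seen, pending = set(), set(), list(undefined_symbols)
    while pending:
        symbol = pending.pop()
        if symbol in seen:
            continue
        seen.add(symbol)
        obj = provider.get(symbol)
        if obj is None or obj in needed:
            continue  # resolved elsewhere, or object file already selected
        needed.add(obj)
        pending.extend(library[obj][1])
    return needed


def objects_to_transfer(undefined_symbols, library, already_on_client):
    """Avoid redundant copying by skipping object files sent previously."""
    return required_objects(undefined_symbols, library) - set(already_on_client)
```

With a toy library in which `printf.o` references `vfprintf`, and `vfprintf.o` references `write`, an executable that only calls `printf` needs those three object files and none of the rest. A second executable with the same references then requires no further library transfer, matching the pattern of savings reported above.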

4: Towards Mobile Services

Using the services described in section 3.3, we are able to make substantial savings in communications traffic when shared libraries are moved to the client's cache (the regularity with which this happens will depend on a number of factors, e.g. cache size, total number of libraries required, frequency of library updates etc.). Whether or not this is important depends on the characteristics of the underlying communications. For example, if the client is fully connected to its file-server by a dependable link then there are likely to be few performance gains to be made by trading processing power against network traffic. It is therefore clearly important that services are able to monitor the QoS of their communications infrastructures. In more detail, we see two stages to the development of services to operate in a mobile environment:

(i) Provision of mobile-aware QoS monitoring and management facilities which can provide information on the level of service currently being offered by the underlying communications infrastructure.

(ii) Implementation of reactive services which utilise (i) to react to changes in their supporting communications infrastructures. Such services must avoid assumptions about their underlying support environment which would prevent them from operating effectively across a range of networks.

Figure 1 presents the architecture of a system currently being developed at Lancaster University based on the concept of reactive services.

[Figure 1: The MOST Architecture. Components shown: Novel Applications; Existing Applications; File System; ODP Platform; Chorus + QoS Driven Bindings.]

The architecture is supported by an extended version of the Chorus distributed operating system [5]. Chorus has been chosen as the operating system for a number of reasons. For example, the use of modular design and implementation techniques within Chorus allows relatively easy substitution of new components such as new communication subsystems and file system components. In addition, services such as device drivers can be implemented (and debugged) in Chorus as user processes and then simply run as system processes when they are completed. This is in sharp contrast to the effort required to develop new device drivers in, for example, UNIX.

The principal extension we propose to the Chorus system is the addition of QoS configurable connections used to support the full range of distributed systems services². Users creating connections are required to specify a number of QoS parameters including the allowable connection_down_time and the throughput and latency of the provided connection. Connection_down_time, expressed in milliseconds, specifies the maximum period for which a connection can be unavailable. Throughput and latency are expressed in KBytes per second and milliseconds respectively. Violations of these QoS parameters are reported to the programmer by the system upcalling user-defined procedures called QoS handlers. In mobile networks, where processing resources are generally more plentiful than communications resources, we anticipate that QoS handlers may perform substantial service re-structuring in the face of QoS degradation. Further details of our proposed extensions to Chorus can be found in [19].

In ongoing work, the extended version of Chorus will be used to implement both an object-based distributed systems platform and a distributed file system. The platform will be based on the ANSAware distributed systems platform [17] with QoS bindings made visible to application programmers in the form of object bindings.
Application programmers will then be able to specify the desired QoS of a given binding and be notified if this QoS cannot be provided.

² Note that a number of other extensions have been made to accommodate real-time traffic; further details can be found in [18].

As an example of how QoS information can be used in a distributed systems platform, consider a mobile application responsible for informing field workers of approaching dangerous weather conditions in the field (this example is based on a problem encountered by the authors as part of the MOST project at Lancaster [20]). The application consists of a single, central service which has access to national weather information, and a number of client objects (one for each field engineer) able to query the service. There are two possible ways of structuring this application: either the client application can register an interest in the weather in particular areas and the server subsequently notify it of changes in conditions, or the client application can poll the server at regular intervals for weather reports on the relevant areas. In a communications environment with high connectivity the former solution would almost certainly be adopted to avoid the communications and processing overheads incurred by polling. However, in an environment with poor connectivity this event-driven approach is less appropriate because users may be unable to distinguish the 'no news is good news' case from the case where the link between the server and mobile is broken. Thus in the face of poor connectivity it is more appropriate for clients to poll the server (even though this generates more network traffic). The implication is that if the same application must run in an environment with varying connectivity, it should be capable of dynamically reconfiguring its communications strategy depending on the degree of connectivity available at any given time. Hence, the application must be able to monitor the QoS of its communications and react to changes in this QoS.

In the case of the distributed file system, the QoS driven bindings will be used in the implementation of the system. In particular, it is anticipated that the file system will make use of changes in throughput, latency and degree of connectivity to determine its caching strategies. For example, it is hoped to integrate the shared library service described in section 3.3 as a service which the file system can use during periods of weakly connected operation.

The architecture described above has been designed to support two distinct types of application: mobile-aware novel applications and existing conventional applications. Novel applications will make use of the distributed file system and the distributed systems platform. In addition, they will use the QoS driven bindings supported within the distributed systems platform to monitor and react to changes in the network's QoS. In contrast, conventional applications will rely on the file system to, wherever possible, make the effect of mobility transparent.
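The interaction between QoS parameters, QoS handlers and a reactive application such as the weather client can be sketched as follows. This is a minimal model under stated assumptions: the three parameter names follow the text, but the `Binding`, `Measurement` and `WeatherClient` classes, the `report` method and the switching policy are hypothetical and do not represent the Chorus or ANSAware interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QoSSpec:
    connection_down_time_ms: float     # maximum tolerated unavailability
    throughput_kbytes_per_s: float     # minimum acceptable throughput
    latency_ms: float                  # maximum acceptable latency


@dataclass
class Measurement:
    down_time_ms: float
    throughput_kbytes_per_s: float
    latency_ms: float


class Binding:
    """A connection that upcalls a user-defined QoS handler on violations."""

    def __init__(self, spec: QoSSpec, handler: Callable[[List[str]], None]):
        self.spec, self.handler = spec, handler

    def report(self, m: Measurement) -> List[str]:
        violations = []
        if m.down_time_ms > self.spec.connection_down_time_ms:
            violations.append("connection_down_time")
        if m.throughput_kbytes_per_s < self.spec.throughput_kbytes_per_s:
            violations.append("throughput")
        if m.latency_ms > self.spec.latency_ms:
            violations.append("latency")
        if violations:
            self.handler(violations)   # the upcall described in the text
        return violations


class WeatherClient:
    """Starts event-driven and falls back to polling when the link degrades."""

    def __init__(self):
        self.strategy = "notify"

    def qos_handler(self, violations: List[str]) -> None:
        # With an unreliable link, silence from the server is ambiguous,
        # so the client restructures itself to poll instead.
        if "connection_down_time" in violations or "throughput" in violations:
            self.strategy = "poll"
```

A client would create a binding such as `Binding(QoSSpec(5000, 1.0, 500), client.qos_handler)` and let the monitoring layer feed measurements to `report`. How the client returns to the event-driven strategy once connectivity recovers is left out of this sketch.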

5: Concluding Remarks

In this paper, we have considered the provision of distributed system services designed to operate given varying degrees of connectivity. A review of existing file system services, including those designed to operate in disconnected environments, highlighted the lack of flexibility in dealing with such a spectrum of connectivity. This argument was further illustrated by the consideration of the design of a file system service to support shared libraries in a weakly connected environment. From this analysis, we argue that it is important in such environments for distributed system services to be able to monitor the quality of service provision from the underlying network and to react to changes in this provision.

We have proposed an architecture, based on the Chorus distributed operating system, which facilitates the development of such services. Ongoing research [20] is examining the design of a distributed file service and associated object-based distributed systems platform. The distributed file service is being designed to enable existing applications to operate transparently in a mobile environment. In contrast, the object-based platform will enable a range of new collaborative, distribution-aware mobile applications to be developed.

6: References

[1] Duchamp, D. "Issues in Wireless Mobile Computing" Proc. Third Workshop on Workstation Operating Systems, Key Biscayne, Florida, USA, IEEE Computer Society Press, Pages 2-10. 1992.
[2] Imielinski, T., and B.R. Badrinath. "Mobile Wireless Computing: Solutions and Challenges in Data Management", Technical Report, Department of Computer Science, Rutgers University, New Brunswick, U.S.A. 1992.
[3] Davies, N. "Mobile Computing Bibliography", Internal Report, Department of Computing, Lancaster University, Bailrigg, Lancaster, LA1 4YR. February 1994.
[4] Hager, R., A. Klemets, G.Q. Maguire, M.T. Smith, and F. Reichert. "MINT - A Mobile Internet Router" Proc. IEEE VTC'93, Secaucus, NJ, USA. 1993.
[5] Rozier, M., V. Abrossimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrmann, C. Kaiser, S. Langlois, P. Léonard, and W. Neuhauser. "Overview of the CHORUS Distributed Operating System", Technical Report CS/TR-90-25, Chorus systèmes. 1990.
[6] Satyanarayanan, M. "Distributed File Systems" Distributed Systems. Editor: S. Mullender. 2nd Edition, Addison-Wesley, Pages 353-383. 1993.
[7] Walsh, D., B. Lyon, G. Sager, J.M. Chang, D. Goldberg, S. Kleiman, T. Lyon, R. Sandberg, and P. Weiss. "Overview of the SUN Network File System" Proc. USENIX Winter Conference, Dallas, TX, USA. 1985.
[8] Howard, J.H. "An Overview of the Andrew File System" Proc. USENIX Winter Conference, Dallas, TX, USA. 1988.
[9] Huston, L.B., and P. Honeyman. "Disconnected Operation for AFS" Proc. USENIX Symposium on Mobile and Location Independent Computing, Cambridge, Massachusetts, Pages 1-10. 1993.
[10] Satyanarayanan, M., J.J. Kistler, P. Kumar, M.E. Okasaki, E.H. Siegel, and D.C. Steere. "Coda: A Highly Available File System for a Distributed Workstation Environment" IEEE Transactions on Computers Vol. 39 No. 4, Pages 447-459. 1990.
[11] Satyanarayanan, M., J.J. Kistler, L.B. Mummert, M.R. Ebling, P. Kumar, and Q. Lu. "Experiences with Disconnected Operation in a Mobile Environment" Proc. USENIX Symposium on Mobile and Location Independent Computing, Cambridge, Massachusetts, Pages 11-28. 1993.
[12] Arnold, J.Q. "Shared Libraries on UNIX System V" Proc. USENIX Summer Conference, Pages 1-10. 1986.
[13] Gingell, R.A., M. Lee, X.T. Dang, and M.S. Weeks. "Shared Libraries in SunOS" Proc. USENIX Summer Conference, Pages 131-145. 1987.
[14] Letwin, G. "Dynamic Linking in OS/2", Byte, Pages 273-280. April 1988.
[15] Sabatella, M. "Issues in Shared Library Design" Proc. USENIX Summer Conference, Anaheim, California, Pages 11-23. 1990.
[16] Douglis, F. "On the Role of Compression in Distributed Systems" Operating Systems Review, Vol. 27 No. 2, Pages 88-93. 1993.
[17] A.P.M. Ltd. "The ANSA Reference Manual Release 01.00", APM Cambridge Limited, UK. March 1989.
[18] Coulson, G., G.S. Blair, P. Robin, and D. Shepherd. "Extending the Chorus Micro-Kernel to Support Continuous Media Applications" Proc. 4th International Workshop On Network and Operating System Support for Digital Audio and Video, Lancaster House, Lancaster, U.K., Pages 49-60. 1993.
[19] Davies, N., G. Coulson, and G.S. Blair. "Supporting Quality of Service in Heterogeneous Networks: From ATM to GSM", Internal Report, Department of Computing, Lancaster University, Bailrigg, Lancaster, LA1 4YR. November 1993.
[20] Davies, N., G.S. Blair, A. Friday, A.D. Cross, and P.F. Raven. "Mobile Open Systems Technologies For The Utilities Industries" Proc. IEE Colloquium on CSCW Issues for Mobile and Remote Workers, London, U.K. 1993.
