A Multiconsistency Memory Protocol Test Environment on Chorus

T. Cornilleau*, E. Gressier*, M-I. Ortega**

* Laboratoire Cedric, Conservatoire National des Arts et Métiers, 292, Rue Saint-Martin, F-75141 Paris Cedex 03
E-mail: {vercor, gressier}@cnam.fr

** Chorus Systèmes, 6, Avenue Gustave Eiffel, F-78182 Saint-Quentin-en-Yvelines
E-mail: [email protected]
1. Introduction

Distributed Shared Memory (DSM) offers convenient support for distributed applications. It can be viewed as an alternative to the message-passing programming model. DSM has mostly been developed through research projects: Ivy [Li 86], Clouds [Ramachandran 88], Munin [Carter 91] and Mirage [Fleisch 89], for example. Meanwhile, efforts towards industrial products have been made: TreadMarks [Keleher 92] and Chorus [Ortega 92], for example. Many memory consistencies have been proposed since Lamport's definition of sequential consistency [Lamport 79]. Two surveys, [Mosberger 93] and [Raynal 93], summarise the various memory models.

We intend to study and experiment with different memory consistencies. Our aim is to provide a better understanding of their properties. A multiconsistency memory test environment is the first step towards this goal. In this paper, we describe the structure of our experimental test environment built on top of the Chorus micro-kernel. The results of our study will be the basis of a distributed shared virtual memory subsystem. We also provide centralised synchronization services; enhanced mechanisms will be studied in future work. Fault tolerance constraints have not been considered in our design.

Section 2 of this paper recalls the Chorus memory abstractions used to build our environment. Section 3 describes the multiconsistency test environment. Section 4 concludes with future work.

2. The Chorus Memory Model

Chorus extends the basic memory abstractions (actors, segments, regions and local caches) [Ortega 93] [Chorus 94b] to support distribution and the implementation of application personalities:

• subsystem: A subsystem is a collection of actors and libraries that exports a given system interface to users. This interface offers a well-defined memory model. It hides the underlying distribution of data. The subsystem has its own memory consistency semantics.

• mapper: A mapper is a system actor dedicated to data management. Mappers serve kernel page requests. These requests are mainly page faults and page flushes; they concern the local caches of segments. We refine this definition:

- segment mapper: A segment mapper, also called a real mapper, serves kernel data requests against a disk repository. It can provide other facilities such as naming or access control management.

- memory consistency mapper: Segments can be mapped on more than one site. Consequently, the coherency of the different copies of a same segment has to be maintained according to the subsystem consistency policy. Memory consistency is handled at cache level. A protocol between memory consistency mappers guarantees the required memory consistency semantics. In this way, a hierarchy of mappers can be defined: memory consistency mappers stand halfway between the kernel virtual memory (VM) and the real mapper; an illustrative interface sketch is given below.
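The following C fragment illustrates this layering of mappers; the type and function names (Mapper, pullIn, pushOut) are ours and do not reproduce the actual Chorus mapper protocol, they only show the kind of interface such an actor offers to the kernel.

/* Illustrative sketch only: hypothetical names, not the real Chorus
   mapper protocol. A mapper answers kernel requests for pages of a
   segment's local cache. */

typedef struct {
    unsigned long segment;   /* segment the request concerns            */
    unsigned long offset;    /* page-aligned offset within the segment  */
    void         *frame;     /* page frame exchanged with the kernel VM */
} PageRequest;

typedef struct Mapper {
    int (*pullIn)(struct Mapper *self, PageRequest *req);   /* page fault */
    int (*pushOut)(struct Mapper *self, PageRequest *req);  /* page flush */
    struct Mapper *backing;   /* next level of the mapper hierarchy      */
} Mapper;

/* A memory consistency mapper stands between the kernel VM and the real
   mapper: it first runs its consistency protocol with its peers (not
   shown) and only falls back on the real (segment) mapper when no
   up-to-date copy is available elsewhere. */
static int consistencyPullIn(Mapper *self, PageRequest *req)
{
    return self->backing->pullIn(self->backing, req);
}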
3. Test Environment Architecture

3.1. Design Hypothesis

We use Chorus ClassiX [Chorus 94a] for our experiments. The ClassiX architecture is depicted in figure 1. The C-actors subsystem offered by ClassiX is a basic environment. It brings file and communication management to applications. In this way, distributed applications over networked personal computers can be built easily. We chose this environment to be free of Unix licensing constraints. Consequently, we lose some traditional Unix system services:
- Unix System V IPC, such as shared memory (shmxxx) and semaphores (semxxx),
- file mapping (mmap).

Our multiconsistency protocol test environment is built with a set of supervisor actors. The ClassiX kernel is not modified. We use ClassiX and C-actors facilities to launch our environment and to provide user-level test applications.
[Figure 1 depicts a SunOS 4.1.3 host, with its disk, C++ cross-development tools and the Chorus GDB debugger, networked with PC 386 targets running the Chorus Kernel v3 r5.2 and the C-actors subsystem (AM, IOM, c_init, c_actor, rsh).]
Figure 1. ClassiX Architecture.
3.2. Memory Management

As said previously, the aim of our work is to provide distributed shared memory abstractions, with coherency management facilities, to user actors. Chorus actors map shared segments to regions in their address space. We extend this mechanism. The user can select a consistency policy for its segments: sequential, causal or release consistency [Mosberger 93]. A segment can be shared among several actors on different sites. Therefore, in an actor address space, distinct segments may be mapped with various consistencies, as depicted in figure 2.

3.2.1. Shared Memory services extensions

3.2.1.1. Shared segment attachment

An actor performs this operation to get access to a shared segment. This service returns the address where the shared segment is mapped. This call requires the following parameters:
- A user application segment key. It identifies the shared segment. After attachment, the kernel names the shared segment by a Chorus capability.
- A size. It cannot be modified after attachment. The size is rounded up to a multiple of the page size. The real size is given when the call returns.
- The consistency policy. The possible consistencies are: sequential consistency (Ivy model, static distributed manager algorithm [Li 89]; the dynamic distributed manager [Ortega 93] will be implemented as an extension), causal consistency ([Ahamad 91], [Gal 95]) and, as a future extension, release consistency (Dash model [Gharachorloo 90], Munin model [Carter 93]).
- Access rights: read or write; write access includes read access.
- The virtual address where the segment has to be mapped in the actor address space. The segment will be mapped at a page boundary.

This operation returns the real address of the mapping if the call succeeds, null otherwise. A hypothetical C-level signature for this call is sketched below.
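The names in this sketch (shmAttach, ShmConsistency, SHM_READ, SHM_WRITE) are illustrative only and are not the actual interface exported by our environment; they merely restate the parameters listed above.

/* Illustrative sketch only: hypothetical names and types. */
#include <stddef.h>

typedef enum {
    SHM_SEQUENTIAL,   /* Ivy model, static distributed manager [Li 89] */
    SHM_CAUSAL,       /* [Ahamad 91], [Gal 95]                         */
    SHM_RELEASE       /* future extension [Gharachorloo 90][Carter 93] */
} ShmConsistency;

#define SHM_READ   0x1
#define SHM_WRITE  0x3   /* write access includes read access */

/* key      : user application segment key naming the shared segment
   size     : in/out, rounded up to a multiple of the page size
   policy   : consistency policy selected for this segment
   access   : SHM_READ or SHM_WRITE
   hintAddr : virtual address where the segment has to be mapped
   returns  : the real mapping address on success, NULL otherwise      */
void *shmAttach(unsigned long key, size_t *size,
                ShmConsistency policy, int access, void *hintAddr);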
[Figure 2 depicts two shared memory segments, managed under two different consistencies (#C1 and #C2), mapped into user actor address spaces on several Chorus sites.]
Figure 2. Shared Segments management with different consistencies.
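As a usage illustration of the situation depicted in figure 2, and still reusing the hypothetical declarations of the sketch above, a single actor could map two segments under two different policies; the keys and sizes are arbitrary, and a NULL hint address is assumed to let the service choose the mapping address.

/* Hypothetical usage sketch matching figure 2: one actor, two shared
   segments, two consistency policies. Builds on the declarations of
   the previous sketch. */
void example(void)
{
    size_t szA = 64 * 1024, szB = 16 * 1024;

    void *segA = shmAttach(0x1001, &szA, SHM_SEQUENTIAL, SHM_WRITE, NULL);
    void *segB = shmAttach(0x1002, &szB, SHM_CAUSAL,     SHM_READ,  NULL);

    if (segA == NULL || segB == NULL) {
        /* attachment failed */
    }
}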
The request of an actor is handled as follows:

a. The segment does not exist anywhere. A local cache is created. It is associated with the shared segment on the site. The segment is said to be physically created. The requester is called the shared segment creator. A region that maps the segment is plugged into the actor address space. The return of the system call gives the address where the shared segment is effectively mapped. A Chorus capability is associated with the creator context. The creator has a special privilege: it is the only one that can delete the shared segment. After physical creation, a shared segment only exists in memory. Consequently, it does not have any backup image on a secondary storage device during its whole life.

b. The segment does not exist on the site of the requester but exists somewhere else. A local cache is created and associated with the shared segment on the requesting site. A copy of the existing segment is fetched locally. This operation is handled respecting consistency constraints.

c. A local cache already exists on the requester site. The segment is simply mapped in a new region of the actor address space. On a site, several actors share the same segment local cache.

Problems arise when concurrent attachments occur. Collisions between concurrent requests can exist. Collisions affect system call parameters and shared segment physical creation.

- physical creation collision: Several actors know the user key of a shared segment. Sometimes, multiple physical creations have to be handled simultaneously. Only one creator exists for a given shared segment. The creator is selected randomly. If this scheme does not correspond to application requirements, explicit synchronization has to be used by the user.
- system call parameters collision: System call parameters may not match between requests for the same segment. The creator's parameters are used as the reference. If the system call response is not null, the parameters given at call time are correct. Otherwise, requesters have to check the parameters of the call: they have been changed by the shared memory service and correspond to the ones provided by the creator. Access rights are managed differently. If a segment is created with write access, any other attachment request will be able to ask for either read or write access. If it is created with read access, no write access will be granted to subsequent attachments. Shared segment control operations can affect access rights.

3.2.1.2. Shared segment detachment

The user specifies the key of the targeted shared segment. The answer to this call is zero if it succeeds, -1 otherwise. When the call returns, the data contained in the segment are no longer reachable by the actor. The last actor detaching a shared segment on a site distinct from the creator's site implies the deletion of the associated local cache. Specific actions will be taken by the test environment depending on access rights, consistency rules and local cache contents.

3.2.1.3. Shared segment status

Any actor can get status information on a shared segment when it knows the corresponding user key. This call returns: segment size, segment access rights, creator uid, consistency policy, the list of attached actor uids and the names of the sites where the shared segment is used. The capability of the shared segment is not given.

3.2.1.4. Shared segment deletion

This operation can be performed only by the creator. The associated local caches are deleted on each site. An enhancement to this feature will consider graceful deletion, as provided by the Unix System V shmctl system call when the IPC_RMID flag is set.

3.2.1.5. Shared segment control

The creator of a shared segment can modify the access rights of a shared segment. Further studies will consider the control of the creator privilege and of the consistency policy. We are aware that this kind of operation has to be carefully handled by the user application. This special operation needs explicit cooperation between all actors involved in the use of the targeted shared segment.

3.2.2. Shared memory manager

Shared memory management relies on two services: shared memory and memory coherency. The shared memory service uses two types of multi-threaded supervisor actors: the global shared memory manager (GSHM) and the local shared memory managers (LSHM). LSHMs catch local requests and forward them to the GSHM. The GSHM maintains information concerning shared segments. Servicing the attachment of a shared segment requires that the GSHM communicate with the right memory consistency service, which provides a coherent capability that will be used by kernels during page fault management operations. At segment creation time, the LSHM initiates its local cache. It is responsible for its mapping within the proper actor address space. It controls access rights. It removes the segment on the site as needed. Shared segment name management is handled by the GSHM over the whole system. The memory consistency service is designed to match modularity requirements. There is one consistency management system for each type of experimented consistency.
One consistency system is composed of two types of supervisor actors: local mappers (xLM, where the "x" prefix can be sq for sequential consistency, cc for causal consistency, or rc for future release consistency) and a consistency global mapper (xCGM). The GSHM is the only one aware of the multiple consistency systems. The xCGM is responsible for coherent capability management. xLMs interact together to realize the "x" consistency protocol. When a page is first fetched on a site, the xCGM is solicited by the corresponding xLM. Figure 3 gives the general architecture of the multiconsistency memory protocol test environment.

3.3. Synchronization Management for Applications

Schemes such as producers/consumers or critical sections, programmed with shared memory, need synchronization facilities. The purpose of our work is not to implement optimal synchronization schemes. Process synchronization can easily be provided by semaphores and barriers. A centralised server (SEM) manages semaphores in our multiconsistency test environment. An actor gets access to a semaphore by sending a P request to SEM. It is blocked until SEM replies. If the semaphore is free, SEM sends the reply immediately and the actor gains access to the protected resource. If the semaphore is busy, the request is stored in a FIFO list [Andrews 91] within SEM: the actor will get its reply when the resource is freed and when it is at the head of the list. Any actor that relinquishes a semaphore sends a V request to SEM. This request triggers the handling of blocked actors. The calling actor uses an ipcCall() system call, the Chorus low-level RPC, to send a P request. It is suspended until the reply comes back. It uses an ipcCall() again to send a V request.

Barriers offer a distributed multi-actor rendezvous mechanism. As for semaphores, each actor performs an ipcCall() to SEM and sends a B request. Before responding to the actors, SEM waits for barrier requests from all the actors attached to the barrier. Each actor is suspended until it gets its reply.
[Figure 3 depicts, on each site and above the kernel, application actors and an LSHM forwarding requests to the GSHM (the shared segment management service), together with the consistency management service: per-site local mappers (e.g. ccLM) coordinated by a global mapper for each consistency.]
Figure 3. Multiconsistency memory protocol test environment architecture.
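As a reminder of the protocol on which the sequential-consistency mappers (sqLM) are based, the fragment below gives a schematic sketch of the static distributed manager algorithm of [Li 89]; the data structures and helper names are ours, and the inter-site messages exchanged by the mappers are elided.

/* Schematic sketch of the static distributed manager algorithm [Li 89];
   illustrative names, inter-site messages not shown. */

/* Per-page state kept on the page's manager site. */
struct PageEntry {
    int  owner;     /* site holding the up-to-date copy of the page */
    long copyset;   /* bitmask of sites holding read-only copies    */
};

/* Static distribution: the manager of a page is fixed by its number. */
int manager_of(unsigned long page, int nb_sites)
{
    return (int)(page % nb_sites);
}

/* Read fault: the owner sends a read-only copy to the faulting site
   (not shown) and the manager adds that site to the copyset. */
void read_fault(struct PageEntry *e, int faulting_site)
{
    e->copyset |= (1L << faulting_site);
}

/* Write fault: every read-only copy is invalidated (messages not
   shown), then ownership moves to the faulting site. */
void write_fault(struct PageEntry *e, int faulting_site)
{
    e->copyset = 0;
    e->owner   = faulting_site;
}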
Semaphore and barrier creation/deletion is handled mostly in the same way as for shared memory segments. We examine the barrier case; it is the same for semaphores. Barriers imply an attachment operation before use. Only attached actors can perform a B request. If an attachment arrives at SEM during the processing of a B request for a given barrier, the reply to the demanding actor is delayed: it gets an answer when the barrier synchronization handling ends. Detachment operations for barriers are performed immediately.
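To illustrate the behaviour of SEM described above, the fragment below sketches the server-side handling of P, V and B requests; the data structures and the reply() stand-in are ours, and the actual ipcCall()/reply exchange of the Chorus IPC is not shown.

/* Illustrative sketch of SEM's request handling. reply() stands for
   sending the IPC reply that unblocks the suspended caller; bounds and
   overflow checking are omitted. */
#include <stdbool.h>

#define MAX_WAITERS 64

static void reply(int actor) { (void)actor; /* would unblock `actor` */ }

struct Semaphore {
    bool busy;                   /* resource currently held?          */
    int  waiters[MAX_WAITERS];   /* FIFO list of blocked actors       */
    int  head, tail;
};

struct Barrier {
    int attached;                /* number of actors attached         */
    int arrived[MAX_WAITERS];    /* actors whose B request is pending */
    int count;
};

void handle_P(struct Semaphore *s, int actor)
{
    if (!s->busy) {
        s->busy = true;
        reply(actor);                                  /* free: grant at once */
    } else {
        s->waiters[s->tail++ % MAX_WAITERS] = actor;   /* busy: enqueue, FIFO */
    }
}

void handle_V(struct Semaphore *s, int actor)
{
    reply(actor);                                      /* V never blocks      */
    if (s->head != s->tail)
        reply(s->waiters[s->head++ % MAX_WAITERS]);    /* wake the FIFO head  */
    else
        s->busy = false;                               /* no waiter: release  */
}

void handle_B(struct Barrier *b, int actor)
{
    b->arrived[b->count++] = actor;
    if (b->count == b->attached) {                     /* last attached actor */
        for (int i = 0; i < b->count; i++)
            reply(b->arrived[i]);                      /* release everybody   */
        b->count = 0;
    }
}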
4. Future Works

Our test environment has to be completed with a memory map primitive similar to the mmap Unix system call. A memory map primitive will bring a secondary storage facility: shared memory segments will gain persistence if needed. Multiconsistency is a challenge for the design of new system architectures. We have to develop applications that use multiconsistency memory protocols to improve our knowledge of consistency properties. The next step of our development is to merge the Shared Memory Management facility into the Chorus ClassiX subsystem.

Bibliography
[Ahamad 91] Implementing and Programming Causal Distributed Shared Memory. M. Ahamad, P.W. Hutto, R. John. Proc. of the 11th ICDCS. May 1991.
[Andrews 91] Concurrent Programming - Principles and Practice. G. Andrews. Benjamin/Cummings. 1991.
[Carter 91] Implementation and Performance of Munin. J.B. Carter, J.K. Bennett, W. Zwaenepoel. SOSP13. p. 152-164. October 1991.
[Carter 93] Efficient Distributed Shared Memory based on multi-protocol Release Consistency. J.B. Carter. PhD thesis, Rice University. September 1993.
[Chorus 94a] Chorus ClassiX i386at r1 Product Description. Chorus systèmes. CS/TR-94-44.3. 1994.
[Chorus 94b] Chorus Kernel v3 r5 Implementation Guide. Chorus systèmes. CS/TR-94-73.1. 1994.
[Coulouris 94] Distributed Systems: Concepts and Design. G. Coulouris, J. Dollimore, T. Kindberg. Addison Wesley. 1994.
[Fleisch 89] Mirage: A Coherent Distributed Shared Memory Design. B. Fleisch, G. Popek. SOSP12. p. 211-223. December 1989.
[Gal 95] Spécification à l'aide du langage LOTOS d'un algorithme de gestion d'une mémoire à cohérence causale. V. Gal. Engineer's thesis, Cnam, Paris. March 1995.
[Gharachorloo 90] Memory consistency and event ordering in scalable shared-memory multiprocessors. K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, J. Hennessy. Computer Architecture News. 18(2): 15-26. June 1990.
[Lamport 79] How to make a Multiprocessor Computer that correctly executes Multiprocess Programs. L. Lamport. IEEE Transactions on Computers. Vol. C-28, No. 9. September 1979.
[Li 89] Memory Coherence in Shared Virtual Memory Systems. K. Li, P. Hudak. ACM TOCS. Vol. 7, No. 4. November 1989.
[Li 86] Shared Virtual Memory on loosely coupled multiprocessors. K. Li. PhD dissertation, Dept. of Computer Science, Yale University. 1986. TR-YALEU-RR-492.
[Keleher 92] Lazy Consistency for Software Distributed Shared Memory. P. Keleher, A.L. Cox, W. Zwaenepoel. SIARCH92. p. 13-21. May 1992.
[Mosberger 93] Memory Consistency Models. D. Mosberger. ACM Operating Systems Review. 1993.
[Ortega 92] A Distributed Consistency Server for the Chorus system. M.I. Ortega, F. Armand, V. Abrossimov. SEDMS III. p. 129-148. USENIX. March 1992. CS/TR-91-91.
[Ortega 93] La Mémoire Virtuelle Partagée Répartie au sein du Système Chorus. M-I. Ortega. Doctoral thesis, Université Paris VII. LITP 93.01. February 1993.
[Ramachandran 88] Unifying Synchronisation and Data Transfer in Maintaining Coherence of Distributed Shared Memory. U. Ramachandran, M. Ahamad, Y.A. Khalidi. Georgia Tech. GIT-CS-88/23. June 1988.
[Raynal 93] How to find his way in the jungle of Consistency Criteria for Distributed Objects Memories (or how to escape from Minos' Labyrinth). M. Raynal, M. Mizuno. IRISA. Research Report No. 730. May 1993.