Design and Implementation of a Low-Overhead File Checkpointing Approach
Dan Pei, Dongsheng Wang, Meiming Shen, Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China
E-mail: [email protected]
Abstract
File checkpointing, i.e., saving and restoring the state of a process's user files, is an important capability of checkpointing and recovery techniques. This paper describes the design and implementation of a file checkpointing approach called Modification Operation Buffering (MOB). The approach buffers all modification operations issued after one checkpoint until the next, making the operations between two checkpoints atomic as a whole. By dynamically choosing a suitable size for the memory buffer and by hiding the latency of flushing the buffer, the approach achieves lower overhead than other approaches.
1. Introduction
Checkpointing and recovery is a technique for saving process state during normal execution and restoring the saved state after a failure, so as to reduce the amount of lost work [1]. Process state refers to everything that must be included in a checkpoint to guarantee a successful recovery, and it should include both volatile and persistent state [2]. Persistent state includes the status of all user files related to the current execution of the process. The status of a file includes its content and its active information, i.e., its descriptor, access mode, current offset, etc. Although correct rollback of persistent state has become the primary concern of many users [1], existing checkpoint libraries usually save and restore only the active information [2, 3], because saving the full content of user files into a checkpoint is unacceptably expensive given their arbitrary size and number. This straightforward but incomplete approach results in inconsistent rollbacks of volatile and persistent state under the following circumstances [4]:
1) Rollback occurs before the first checkpoint after opening a file (RBFFC);
2) Rollback occurs after reading and writing the same area (RARW);
3) Rollback occurs after deleting a file (RAD).
2. Related Work
The reported file checkpointing approaches fall into three categories:
1) Shadow copy. Libckp [1] makes a shadow copy of a file when the portion that existed at the previous checkpoint is about to be modified, or when the file is about to be deleted. During a rollback, the shadow copy is used to restore the file's correct content.
2) In-place update with undo logs. libfcp [5], winckp [6], and the SCR algorithm [7] use this approach. It intercepts all file operations and generates an undo log holding the pre-modification data. When a rollback occurs, the undo logs are applied in reverse order to restore the original files. In this approach, a normal write operation incurs two additional disk accesses: reading the corresponding content from the original file and writing it into the undo log.
3) Modification Operation Buffering (MOB) [4]. This approach buffers all modification operations issued after a checkpoint until the next one, making the operations between two checkpoints atomic as a whole. At the next checkpoint, the buffered operations are flushed from the buffer to the corresponding user files. Hence, the checkpointed volatile state and the persistent state of a process are always consistent while the process runs, and none of RBFFC, RARW, and RAD can occur. During a rollback, the only work needed is to clear the buffers, because no change has been made to the original user files since the last checkpoint.
The comparison in [4] concluded that, by dynamically choosing a suitable size for the memory buffer and by hiding the latency of flushing the buffer, MOB can achieve lower space overhead, lower normal-execution overhead, and lower recovery overhead than the other approaches.
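The atomicity that the buffering approach provides between two checkpoints can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name BufferedFile, the byte-level pending map, and the use of a bytearray in place of an on-disk user file are all assumptions made for illustration.

```python
class BufferedFile:
    """Toy model of buffering modifications between two checkpoints.

    Hypothetical names throughout; a bytearray stands in for the user file.
    """

    def __init__(self, content: bytes):
        self.file = bytearray(content)  # stands in for the on-disk user file
        self.pending = {}               # offset -> byte written since last checkpoint

    def size(self) -> int:
        # Logical size as seen by the process (file plus buffered writes).
        return max([len(self.file)] + [p + 1 for p in self.pending])

    def write(self, offset: int, data: bytes) -> None:
        # Writes never touch the file; they only update the buffer.
        for i, b in enumerate(data):
            self.pending[offset + i] = b

    def read(self, offset: int, length: int) -> bytes:
        # Buffered bytes win; everything else comes from the unchanged file.
        end = min(offset + length, self.size())
        return bytes(
            self.pending.get(p, self.file[p] if p < len(self.file) else 0)
            for p in range(offset, end)
        )

    def checkpoint(self) -> None:
        # Flush all buffered modifications to the file, then empty the buffer.
        self.file.extend(b"\0" * (self.size() - len(self.file)))
        for p, b in self.pending.items():
            self.file[p] = b
        self.pending.clear()

    def rollback(self) -> None:
        # The file was never modified, so clearing the buffer is enough.
        self.pending.clear()


f = BufferedFile(b"hello world")
f.write(0, b"J")
print(f.read(0, 5), bytes(f.file))  # the process sees the write; the file does not
f.rollback()
print(f.read(0, 5))
```

In this model a rollback costs only the clearing of the buffer, whereas an undo-log scheme would have to read and log the old content on every write.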
FileName | Duplist | MemBufferStart | DiskBufferFd | Fd | AccessMode | FilePointer | WasClosed | WasDeleted | AMLHead | MemBufferPointer | MemBufferSize | MemBufferFull | DiskBufferPointer | ……
Table 1. The structure of File Information Table

Start | End | MemOrDisk | BufferStart | BufferEnd | FileSize | Next
Table 2. The structure of Address Mapping List
3. Modification Operation Buffering
3.1 Basic
The basic buffering method of MOB is to append written content to the end of the buffer. To simplify read operations, if the same area is written more than once between two checkpoints, the corresponding content in the buffer is updated in place rather than appended to the current end of the buffer. This ensures that each modified area of a user file has a single, latest copy in the buffer. During a read operation, file content is read from the buffer if it is there; otherwise, it is read directly from the original file, because that area must have been unchanged since the last checkpoint.
The MOB buffer has two parts: MemBuffer, which resides in physical memory, and DiskBuffer, which resides on the hard disk. MemBuffer is used first; once it is full, DiskBuffer is used. The size of MemBuffer (MemBufferSize) can be changed dynamically, and the performance of MOB depends to some extent on MemBufferSize, so optimization is necessary. One method is to choose a suitable MemBufferSize dynamically according to the amount of free physical memory on the machine. Another is latency hiding: flushing is executed concurrently with the running process, so the overhead that MOB incurs when memory is low does not directly affect the process's running time.
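The append-or-update rule and the MemBuffer/DiskBuffer split can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the class MobBuffer and its index dictionary are hypothetical, an io.BytesIO object stands in for the DiskBuffer file, and only writes that repeat exactly the same area are recognised as rewrites (partially overlapping areas are not handled here).

```python
import io


class MobBuffer:
    """Two-level write buffer: memory first, then a disk stand-in (hypothetical)."""

    def __init__(self, mem_buffer_size: int):
        self.mem_size = mem_buffer_size
        self.mem = bytearray()    # MemBuffer
        self.disk = io.BytesIO()  # stands in for the DiskBuffer file
        self.index = {}           # (start, end) -> (on_disk, buffer_start)

    def write(self, start: int, data: bytes) -> None:
        key = (start, start + len(data))
        if key in self.index:
            # Same area written again: update the single buffered copy in place.
            on_disk, pos = self.index[key]
            if on_disk:
                self.disk.seek(pos)
                self.disk.write(data)
            else:
                self.mem[pos:pos + len(data)] = data
            return
        if len(self.mem) + len(data) <= self.mem_size:
            # MemBuffer still has room: append there.
            self.index[key] = (False, len(self.mem))
            self.mem += data
        else:
            # MemBuffer is full: append to the DiskBuffer instead.
            pos = self.disk.seek(0, io.SEEK_END)
            self.disk.write(data)
            self.index[key] = (True, pos)

    def read(self, start: int, end: int):
        # Returns the buffered bytes, or None if the caller should read
        # the area from the original (unchanged) file.
        if (start, end) not in self.index:
            return None
        on_disk, pos = self.index[(start, end)]
        if on_disk:
            self.disk.seek(pos)
            return self.disk.read(end - start)
        return bytes(self.mem[pos:pos + (end - start)])


buf = MobBuffer(mem_buffer_size=8)
buf.write(0, b"abcd")
buf.write(100, b"efgh")
buf.write(200, b"ijkl")  # does not fit in memory, goes to the disk stand-in
buf.write(0, b"ABCD")    # rewrite of the same area, updated in place
print(buf.read(0, 4), buf.read(200, 204), buf.read(50, 54))
```

Because a rewritten area is updated where it already sits, the buffer does not grow on repeated writes, and a read needs to consult only one copy.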
3.2 Data structure
Every file opened by the process is allocated an entry in a File Information Table (FIT), whose structure is shown in Table 1. MemBufferStart, MemBufferPointer, and MemBufferSize are the MemBuffer's starting address, current pointer, and size, respectively. MemBufferFull is a flag that indicates whether MemBuffer is full. DiskBufferFd is the file descriptor of the DiskBuffer, and DiskBufferPointer points to the end of the DiskBuffer file.
AMLHead points to the head of an Address Mapping List (AML), whose structure is shown in Table 2. An entry in the AML corresponds to a file area that is contiguous in the buffer. Start and End are the starting and ending addresses of this area, respectively. BufferStart and BufferEnd are the corresponding starting and ending addresses of this area in the buffer. MemOrDisk=0 means that this area is in MemBuffer; otherwise, it is in DiskBuffer.
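As a rough illustration of how these two structures fit together, the sketch below models an FIT entry and an AML node in Python and translates a file offset into a buffer address. Only the fields described in the text are modelled. The helper locate is hypothetical, and treating End as exclusive is an assumption, since the text does not say whether the ending address is inclusive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AMLNode:
    Start: int        # start of the area in the user file
    End: int          # end of the area (assumed exclusive in this sketch)
    MemOrDisk: int    # 0: area is in MemBuffer, otherwise in DiskBuffer
    BufferStart: int  # start of the area inside the buffer
    BufferEnd: int    # end of the area inside the buffer
    Next: Optional["AMLNode"] = None


@dataclass
class FITEntry:
    AMLHead: Optional[AMLNode] = None
    MemBufferStart: int = 0
    MemBufferPointer: int = 0
    MemBufferSize: int = 0
    MemBufferFull: bool = False
    DiskBufferFd: int = -1
    DiskBufferPointer: int = 0


def locate(entry: FITEntry, offset: int) -> Optional[Tuple[int, int]]:
    """Map a file offset to (MemOrDisk, buffer address), or None if unbuffered."""
    node = entry.AMLHead
    # Nodes are kept in ascending order, so the walk can stop early.
    while node is not None and node.Start <= offset:
        if offset < node.End:
            return node.MemOrDisk, node.BufferStart + (offset - node.Start)
        node = node.Next
    return None  # not in the buffer: read from the original file


tail = AMLNode(Start=100, End=120, MemOrDisk=1, BufferStart=0, BufferEnd=20)
head = AMLNode(Start=10, End=30, MemOrDisk=0, BufferStart=40, BufferEnd=60, Next=tail)
entry = FITEntry(AMLHead=head)
print(locate(entry, 15), locate(entry, 105), locate(entry, 50))
```

A return value of None corresponds to the case where the area has not been modified since the last checkpoint.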
3.3 Implementation
The essential technique in the implementation of MOB is the maintenance of AMLs. In this section, (s, e) denotes a node in the AML, where s and e denote Start and End, respectively. The entries in the AML are sorted in ascending order by (s, e). Suppose that the current AML is: (s1,e1)->(s2,e2)->…->(sn, en), (s1