
Virtual Full Replication by Static Segmentation for Multiple Properties of Data Objects

Daniel Jagszent
Institute for Program Structures and Data Organization, University of Karlsruhe

Gunnar Mathiason, Sten F. Andler
School of Humanities and Informatics, University of Skövde
{gunnar.mathiason, sten.f.andler}@his.se

Abstract

We implement Virtual full replication for a distributed real-time database by segmenting the database on multiple data properties. Virtual full replication gives the application an image of full replication in a partially replicated database, by replicating data to meet the actual data needs of the users of the data. This is useful since fully replicated real-time databases that allow updates at all nodes do not scale well: updates must be replicated to every other node for replica consistency, including nodes where only a small share of the database will ever be used. We propose an algorithm that segments the database on multiple data properties without causing a combinatorial problem. We show, by analysis and an implementation, that the scalability of such a system can be improved through scalable resource usage, while the application semantics of full replication remain unchanged.

1 Introduction

The use of a distributed database offers the potential of using redundancy for fault tolerance, availability, and performance. In particular, in a real-time distributed database, full replication of all data to all nodes offers full availability. In a fully replicated database with eventual consistency [2], all transactions can run locally and are independent of network delays. With main memory residence of database replicas, local database transactions are predictable, independent of both network and disk access delays. Full replication uses excessive system resources, since the system must replicate all updates to all of the nodes in a system. This causes a scalability problem with respect to bandwidth usage for replication of updates, storage usage for replicas, and processing usage for propagating, integrating, and resolving conflicts among detached updates. The usage of each of these resources grows as O(n^2), where n is the number of nodes in the system, assuming that the number of updates grows as O(n).

We suggest Virtual full replication [6] as a solution that preserves the image of full replication for applications, while reducing resource usage according to the actual need, rather than replicating all updates blindly to all nodes. Virtual full replication avoids excessive resource usage so that scalability can be achieved. It is based on knowledge about the needs of the application, and uses required properties of the data, such as location, consistency model, and storage media. Virtual full replication matches the data and their properties with the application and transaction requirements, so that replication and thereby scalability are improved. We present an algorithm for static segmentation and an analysis of its scalability with respect to bandwidth, storage, and processing usage. We also present results from a proof-of-concept implementation, which indicate that scalability is achieved by a system with resource usage that grows only as O(n), assuming a constant replication degree.

The paper is organized as follows: Section 2 contains a background and describes the scalability problem in more detail. Section 3 describes our approach for how Virtual full replication is achieved by segmentation, using data properties, and how these are managed with our algorithm. Section 4 presents our analysis of the usage of three system resources. It also presents results from our implementation in the DeeDS database prototype.

2 Scalability in a distributed real-time main-memory database

2.1 A distributed real-time database architecture

The main property of real-time systems is timeliness, which can only be achieved when resource usage is predictable and when execution is sufficiently efficient. Predictability is essential: for a hard real-time system the consequence of a missed deadline may be fatal, while for soft real-time systems a missed deadline lowers the value of the provided service. Thus, for real-time systems the primary design concern is predictable resource usage.

To remove unpredictability of disk accesses, the database of the distributed real-time database system DeeDS [2] resides entirely in main memory, removing the dependence on disk I/O delays caused by unpredictable access times for hard drives. Further, to remove unpredictability of network delays or network partitioning, the database is fully replicated to all nodes. For applications that can tolerate temporarily inconsistent replicas, full replication makes timeliness of transactions independent of network delays, since there is no need for remote data access during transactions. In addition, replication improves fault tolerance for the main-memory resident data.

Local execution of transactions with detached replication [3] allows independent updates, that is, concurrent and unsynchronized updates to different replicas of the same data objects. However, such updates cause database replicas to become temporarily inconsistent, and the inconsistencies introduced must be resolved in the replication process by a conflict detection and resolution mechanism. In DeeDS, temporary mutual inconsistencies are allowed and guaranteed to be resolved at some point in time, giving the database the property of eventual consistency. With detached replication we can avoid a number of predictability problems associated with update synchronization, such as distributed locking of objects and reliance on stable communication during transaction execution.
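The introduction states that resource usage grows as O(n^2) under full replication when the number of updates grows as O(n), and only as O(n) with a constant replication degree. The sketch below illustrates that difference with a deliberately simple cost model. The model is an assumption for illustration, not the analysis of the paper: every node originates the same number of updates per time unit, the originating node holds one replica of the updated object, and each update is sent once to every other replica. The function name and parameters are hypothetical.

```python
from typing import Optional


def update_messages(n_nodes: int,
                    updates_per_node: int,
                    replication_degree: Optional[int] = None) -> int:
    """Count replica update messages per time unit in a simple cost model.

    Assumptions (illustrative, not taken from the paper):
      * every node originates `updates_per_node` updates per time unit,
      * the originating node holds one replica of the updated object,
      * each update is sent once to every other replica.
    """
    if replication_degree is None:
        # Full replication: every object has a replica on all n nodes.
        replicas = n_nodes
    else:
        # Bounded replication: at most `replication_degree` replicas per object.
        replicas = min(replication_degree, n_nodes)
    # Total updates grow as O(n); each one reaches (replicas - 1) remote nodes.
    return n_nodes * updates_per_node * (replicas - 1)


if __name__ == "__main__":
    for n in (10, 20, 40):
        full = update_messages(n, updates_per_node=5)
        bounded = update_messages(n, updates_per_node=5, replication_degree=3)
        print(f"n={n:3d}  full replication: {full:5d}  degree 3: {bounded:4d}")
```

In this model, doubling the system from 10 to 20 nodes raises the message count under full replication from 450 to 1900, roughly a factor of four, while with a fixed replication degree of 3 it only doubles, from 100 to 200.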

ing differences in properties of the data, we can bound resource requirements at a lower level, while maintaining the same degree of availability of data for the application. This reduces flexibility somewhat, but for hard real-time systems, access patterns and resource requirements are usually known a priori, and the flexibility gained by full replication is therefore less motivated. In future work we plan to support unspecified data needs of soft real-time transactions as well, by adapting to data requirements online. With a bound on the degree of replication, k, where k
