TDB: A Database System for Digital Rights Management

Radek Vingralek, Umesh Maheshwari, and William Shapiro
February 2001

InterTrust Technologies Corporation
4750 Patrick Henry Drive, Santa Clara, CA 95054
Tel: +1.408.855.0100  Fax: +1.408.855.0136

Copyright © 2001 InterTrust Technologies Corporation. All rights reserved.

TDB: A Database System for Digital Rights Management

Radek Vingralek, Umesh Maheshwari, William Shapiro
STAR Lab, InterTrust Technologies Corporation
4750 Patrick Henry Drive, Santa Clara, CA 95054
[email protected]

Abstract

Some emerging applications challenge the assumptions that have influenced the design of database systems in the past. One example is Digital Rights Management (DRM) systems, which ensure that digital goods such as music and books are consumed (i.e., played or viewed) according to the contracts associated with them. Enforcement of contracts may require access to some persistent state such as usage meters and account balances. For good performance and disconnected operation, it is desirable to maintain this state in the consumer's device. The state is managed by a DRM database system, which must protect the data not only against accidental corruption, as most database systems do, but also against malicious corruption. DRM database systems may be embedded in consumer appliances with limited resources. Consequently, DRM database systems typically manage small databases with cacheable working sets. They need not be optimized for high throughput or concurrency, and they do not need to provide sophisticated query processing. Instead, they should be tightly integrated with the application programming language to reduce code complexity, minimize the code footprint, and simplify administration. In this paper we describe the architecture and implementation of TDB, an embedded DRM database system. TDB is tightly integrated with the C++ programming environment: it provides typed storage of C++ objects and uses C++ as the data definition language. We concentrate on those aspects of TDB's design where it departs from common database design principles to accommodate DRM applications. We also show that, although it provides additional functionality for DRM applications, TDB performs better than Berkeley DB, which is a widely used, freely available embedded database system. We also show that TDB has a code footprint comparable to other embedded database systems.

1 Introduction

Some emerging applications challenge the assumptions that have influenced the design of database systems in the past. One example is Digital Rights Management (DRM) [9, 17, 5]. DRM systems are embedded in consumer appliances such as MP3 players, set-top boxes, e-book readers, and PCs. DRM systems ensure that digital goods such as music and books are consumed (i.e., viewed or played) according to the contracts associated with them. A contract may be simple, such as pay-per-view, subscription, or one-time-fee, or it may be more complex, such as "free after first ten paid views", "free, if usage data can be collected" or "$10 one-time-fee or free for every 1000th customer". Enforcement of some contracts requires storing persistent state associated with the contract, such as usage meters, pre-paid account balances, audit records, discount-club membership, or access keys. The state is often stored locally in the consumer's device to provide better performance and privacy and to enable disconnected operation. Since some of the state has monetary value, it requires the same protection against accidental corruption as provided by most database systems; this includes transactional semantics for updates, backups, type-safety, and automatic index maintenance. However, some of the characteristics of DRM systems differ from the assumptions commonly made in the design of database systems:

Protection against malicious corruption. DRM database systems must protect the data not only against accidental, but also malicious corruption. A consumer who could successfully tamper with the database, for example, by resetting a usage count or debit balance, might be able to obtain content for free. Moreover, the consumer has full control over the device, which makes protection against tampering harder.

Protection against unauthorized reading. DRM database systems must protect the data against unauthorized reading. Consumers may find secrets such as content decryption keys or access codes by analyzing the database and use them to circumvent the DRM system. Protection against tampering and unauthorized reading is further complicated by the fact that many consumer devices may need to store the database on removable storage systems [11, 10], which can be analyzed and modified off-line.

Single-user workload. DRM database systems are not typically used as shared servers that are accessed by many concurrent users. Instead, they are embedded in devices that are used by a single user at a time. Consequently, DRM database systems should be optimized for response time rather than throughput. In addition, DRM database systems do not require highly optimized concurrency control. A typical DRM workload consists of short sequences of transactions separated by long idle periods when the consumer does not use the device. Consequently, some of the database reorganization (such as log checkpointing) can be deferred until idle time.

Small database size. DRM systems typically create relatively small databases. Therefore, DRM database systems can often cache the working set, and physical clustering of data is less important.

Small code footprint. Consumer appliances hosting DRM systems often have limited resources. Therefore, the code footprint is an important constraint. Inclusion of any feature into the database system must be judged not only by its potential performance impact, but also by the corresponding increase in code complexity, code footprint, and cost of system administration. The database system should have a modular architecture, so that extra functionality can be traded off for a smaller code footprint.

Tight integration with the application programming language. DRM database systems typically use simple database schemas and queries, which do not warrant the design of a separate database language. They should instead utilize the features available in the application programming language rather than reimplementing them. Integration with the application programming language also helps avoid accidental database corruption due to programming bugs in applications.

It is difficult to layer a DRM database system on top of any of the existing database systems. To protect the application data against malicious corruption and unauthorized reading, the database system must protect both the application data and the system meta-data. Otherwise, a malicious user can effectively remove data from a database by tampering with an index on the data or by adding entries to a database log. However, it is difficult to protect the database system meta-data using a module that is layered on top of the database system. In addition, a layered module that protects application data against unauthorized reading using encryption precludes efficient access to the data using ordered indexes. Many of the existing database systems are not well suited for DRM applications for other reasons. For example, most embedded database systems have a small code footprint and modular architecture, but they often do not provide type-safety or automatic index maintenance.
Similarly, many object-oriented database systems are tightly integrated with an application programming language, but they are typically designed as shared servers, have a large code footprint, and often do not provide automatic index maintenance. Most commercial object-relational database systems are not suitable as DRM database systems because they are designed as shared servers and have a large code footprint.

Because we found it difficult to build a DRM database system on top of any existing database system, we decided to implement one from scratch. In this paper we report on our design of a DRM database system, TDB, which is tightly integrated with the C++ programming environment. We concentrate on those aspects of TDB's design where we depart from common database design principles to accommodate DRM applications. We also show that although TDB implements more functionality than most embedded database systems, it has a comparable code footprint and provides better performance than Berkeley DB, a widely used embedded database system.

The rest of this paper is organized as follows. In Section 2 we describe the layered architecture of TDB. In Sections 3-5, we discuss the design of the individual layers. In Section 6 we compare the size of TDB's code footprint against other embedded database systems. In Section 7 we compare the performance of TDB against Berkeley DB. In Section 8 we review the related work. We conclude with Section 9.

2 TDB Architecture

As an embedded database system, TDB may run in environments with limited resources. Consequently, it is important that the architecture of TDB is modular, so that applications link only with the modules they require. Figure 1 shows the basic modules of TDB and their relationships. The dashed boxes represent infrastructure modules, which we expect to be provided by the platform.

[Figure 1 depicts the layered architecture: the Collection Store (object collections, functional indexes, scan/exact-match/range queries) sits above the Object Store (abstract objects, concurrency control, object cache), which sits above the Chunk Store (untyped chunks; encryption, hashing, atomic updates, recovery) alongside the Backup Store (full/incremental, validated restore). The dashed infrastructure modules are the Untrusted Store (large size, any read/write; holds the database), the Archival Store (large size, any stream read/write; holds backups), the One-way Counter (small size, any read), and the Secret Store (small size, trusted read; holds the secret key).]
Figure 1: Architecture of TDB.

We assume that the platform provides a small secret store, which can be read only by the database system, and a one-way persistent counter, which cannot be decremented. In most devices, the secret store can be implemented in the ROM that the device uses to store firmware. A more secure implementation may use a battery-backed SRAM that can be zeroed if physical tampering is detected. The one-way counter may be implemented using special-purpose hardware [34]. We assume that the platform also provides an untrusted store for storing the database and an archival store for maintaining backups of the database. The untrusted store provides a file-system-based interface to a storage system that supports efficient random access, such as a (removable) flash RAM or a hard disk. The archival store provides a stream-based interface to a sequential storage system. A typical implementation of the backup store may stage backups in the untrusted store and opportunistically migrate them to a remote server. We assume that the untrusted store and the archival store can be arbitrarily read or modified by an attacker.

The architecture of TDB consists of four layers: the chunk store, the backup store, the object store and the collection store. The chunk store securely stores a set of variable-sized sequences of bytes, which we term chunks. The chunk store guarantees that the chunks cannot be read by an unauthorized program and detects tampering by an unauthorized program. The programs that can access the secret store are authorized; all programs linked with the DRM database system must be authorized. Applications can update a number of chunks atomically with respect to system crashes. The chunk store also allows the application to snapshot the database efficiently using copy-on-write. The backup store creates and securely restores database backups, which can be either full or incremental. The backup store restores only valid backups. In addition, it restores incremental backups in the same sequence as they were created. Backups are created using the database snapshots provided by the chunk store. A more detailed description of the backup store design can be found in [23].
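To make these platform assumptions concrete, the following is a minimal C++ sketch of the two trusted facilities. The class names and methods are our own illustration, not a documented platform API.

// Illustrative sketch only: the platform facilities assumed by TDB, not a real API.
#include <cstdint>
#include <string>

class SecretStore {            // small store readable only by authorized programs
public:
    std::string readSecretKey() const { return key_; }
private:
    std::string key_ = "...";  // e.g., provisioned in ROM or battery-backed SRAM
};

class OneWayCounter {          // persistent counter that can never be decremented
public:
    std::uint64_t read() const { return value_; }
    void increment() { ++value_; }   // the only mutation the platform allows
private:
    std::uint64_t value_ = 0;
};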


The object store provides type-safe access to a set of named C++ objects. The object store maps one or more objects into a chunk. It implements full transactional semantics, including concurrency control, and maintains a cache of frequently accessed or dirty objects.

The collection store allows applications to create indexed collections of objects. The indexes are maintained automatically as the objects in a collection are updated. Applications can access objects in collections using scan, exact-match and range queries. The collection store supports indexes organized as B-trees, dynamic hash tables [20] and lists.

3 Chunk Store

The chunk store provides trusted storage for a set of named chunks. A chunk is a variable-sized sequence of bytes that is the unit of encryption and validation. All data and meta-data from higher modules are stored as chunks in the untrusted store. The chunk store ensures that only authorized programs (i.e., programs that can access the secret store) can read the data stored in chunks. Data secrecy is achieved by encrypting all chunks with a secret key stored in the secret store.

The chunk store cannot prevent tampering with chunks in the untrusted store because the device is under the control of the consumer. The consumer can, for example, save a copy of the database, purchase some goods, then replay the saved copy in an attempt to erase any record of purchasing the goods. The chunk store does, however, detect tampering, including such replay attacks. Tamper detection is achieved by hashing the entire database using a one-way hash function. The resulting hash value, along with the current value of the one-way counter, is signed with the secret key and stored at a known location in the untrusted store. If the hash of the database does not match the value computed from the chunks read from the untrusted store (indicating unauthorized modification), the chunk store signals tamper detection. Similarly, if the current value of the one-way counter does not correspond to the signed value in the untrusted store (indicating a possible replay attack), tamper detection is signaled. To avoid the excessive overhead of hashing the entire database each time a chunk is read or written, the database hash is calculated using a Merkle tree [27]. Each internal node of the Merkle tree contains the hashes of its children, while each external node contains a hash of the data in one chunk. Consequently, reading or writing a chunk results in validating or updating the hash values that comprise a path in the Merkle tree from its root to the external node that corresponds to the chunk.

Using write-ahead logging, multiple chunk writes can be grouped into a single step that is atomic with respect to crashes. Unlike conventional database systems, the chunk store implements a log-structured storage model, in which copies of chunks do not exist outside of the log [30]. We discuss the trade-offs involved in log-structuring the chunk store in Section 3.2.1. More detailed information on aspects of the chunk store's design not discussed in this paper can be found in [23].
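The sketch below illustrates the validation just described, assuming illustrative helper types (a toy hash, a flattened Merkle path). It is not TDB's implementation; a real system would use a cryptographic hash such as SHA-1 and a MAC or signature over the root record.

// Sketch of Merkle-path validation and the replay check; all types are toy stand-ins.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Hash = std::uint64_t;                       // stand-in for a one-way hash value

Hash toyHash(const std::string& bytes) {          // placeholder, not cryptographic
    return std::hash<std::string>{}(bytes);
}

struct MerkleNode {                               // internal node: hashes of its children
    std::vector<Hash> childHashes;
};

// Validate one chunk against the hashes on its root-to-leaf path. pathNodes[0] is the
// root node; childIndex[i] is the child followed at level i. signedRootHash comes from
// the record signed with the secret key together with the one-way counter value.
bool validateChunk(const std::string& chunkBytes,
                   const std::vector<MerkleNode>& pathNodes,
                   const std::vector<std::size_t>& childIndex,
                   Hash signedRootHash) {
    Hash expected = signedRootHash;
    for (std::size_t i = 0; i < pathNodes.size(); ++i) {
        std::string packed;
        for (Hash h : pathNodes[i].childHashes)
            packed.append(reinterpret_cast<const char*>(&h), sizeof h);
        if (toyHash(packed) != expected) return false;          // node was tampered with
        expected = pathNodes[i].childHashes.at(childIndex[i]);  // descend one level
    }
    return toyHash(chunkBytes) == expected;   // leaf hash must match the chunk contents
}

// Replay check: the counter value stored in the signed record must match the hardware.
bool replayDetected(std::uint64_t signedCounter, std::uint64_t hardwareCounter) {
    return signedCounter != hardwareCounter;
}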

3.1 Specification

The interface of the ChunkStore class, shown in Figure 2, allows applications to manipulate the state of individual chunks. The chunk operations allow applications to allocate new chunk names (chunk ids), read the state of a chunk given its chunk id, write the new state associated with a chunk id, or deallocate a chunk id along with its state. Several operations can be grouped into a single commit operation that is atomic with respect to crashes.
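The following toy, purely in-memory model of the Figure 2 interface makes the grouped-commit contract concrete. Everything here (the Buffer alias, the staging map, the commit() method) is an illustrative assumption; in TDB the same contract is implemented on top of an encrypted, hashed, log-structured store.

#include <map>
#include <optional>
#include <stdexcept>
#include <string>

using ChunkId = unsigned;
using Buffer  = std::string;

class ToyChunkStore {
public:
    ChunkId allocateChunkId() { return nextId_++; }

    // Stage a new state for cid; it becomes visible only at the next commit().
    void write(ChunkId cid, Buffer bytes) { staged_[cid] = std::move(bytes); }

    Buffer read(ChunkId cid) const {
        auto it = committed_.find(cid);
        if (it == committed_.end()) throw std::runtime_error("chunk not written");
        return it->second;
    }

    void deallocate(ChunkId cid) { staged_[cid] = std::nullopt; }

    // All writes and deallocations staged since the last commit take effect together.
    void commit() {
        for (auto& [cid, bytes] : staged_) {
            if (bytes) committed_[cid] = *bytes;
            else       committed_.erase(cid);
        }
        staged_.clear();
    }

private:
    ChunkId nextId_ = 1;
    std::map<ChunkId, std::optional<Buffer>> staged_;   // uncommitted updates
    std::map<ChunkId, Buffer> committed_;               // last committed state
};

A caller that updates, say, a usage meter chunk and an account-balance chunk would write both and then issue a single commit, so a crash leaves either both or neither update visible.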

3.2 Chunk Store Design

Several aspects of the log-structured chunk store differ from both conventional databases and log-structured file systems.

3.2.1 Log-structured Storage

We chose to implement the chunk store as a log-structured store [30]. Unlike conventional database stores, where a log is used as temporary storage for recovery and better performance, in a log-structured store the log is the primary and only storage. Consequently, there are no chunks stored outside of the log. Chunks are appended to the log tail after each commit operation. A hierarchical location map is used to locate the current version of a chunk. To scale to large numbers of chunks, the map itself is organized as a tree of chunks.

ChunkId allocateChunkId()
    Returns an unallocated chunk id.

void write(ChunkId cid, Buffer bytes)
    Sets the state of cid to bytes, possibly of a different size than the previous state. Signals if cid is not allocated.

Buffer read(ChunkId cid)
    Returns the last written state of cid. Signals if cid is not written or the chunk state has been tampered with.

void deallocate(ChunkId cid)
    Deallocates cid. Signals if cid is not allocated.

Figure 2: ChunkStore interface.

The modified portions of the location map are not written at each commit; rather, the modified state is written opportunistically at checkpoints. Upon recovery, the portion of the log written since the last checkpoint (which we call the residual log) is read to restore the latest committed state of the database. When a chunk is updated or deallocated, its previous version becomes obsolete. Periodically, obsolete chunk versions must be reclaimed by a log cleaner. We found several benefits of using log-structured storage in a DRM database system:

- The hash tree can be embedded in the location map. Since the hash tree must be traversed at each chunk read or write, there is no extra performance overhead for maintaining the location map. (There is, however, an extra storage overhead of 6 bytes per chunk on top of the space required for storing a one-way hash.)

- Log-structuring makes traffic analysis more difficult. In particular, it is difficult to link multiple writes to the untrusted store with a single chunk because chunks are never updated in place.

- The location map can be inexpensively snapshotted using copy-on-write, which is used to implement fast backups. In addition, location map snapshots can be efficiently compared, which allows the creation of incremental backups. Since incremental backups are small, they can be created more often, which is beneficial to many DRM applications.

At the same time, we found that the major costs perceived with using log-structured storage in database systems [22] are less significant for DRM database systems. These costs include:

- Poor read performance. Appending updated chunks to the tail of the log destroys the clustering of related chunks. In effect, log-structured storage trades read performance for better write performance.

- Poor performance when cleaning. Performance studies based on TPC-B showed a 40% performance reduction due to log cleaning [31].

Because DRM databases are relatively small, the working data set can be cached in memory, minimizing the cost of unclustered reads. The primary performance bottleneck then becomes writes, for which log-structured systems are optimized. Because DRM database systems are embedded in devices used by a single user, we expect sufficiently long idle periods to perform most of the cleaning. In fact, we found that even if idle periods are not present in the workload, we are generally able to maintain a database utilization of up to 80% with little performance degradation due to cleaning. Because the chunk store is layered on top of an untrusted store, which has an interface similar to a file system, the chunk store can increase or decrease the space allocated for storage to trade off storage space for performance (i.e., cleaner overhead). Consequently, the chunk store can bound the per-commit overhead of cleaning by simply increasing the database size if a fixed amount of cleaning fails to free sufficient space for writing committed chunks.

3.2.2 Nondurable Commits

The chunk store supports both durable and nondurable commits. The semantics of a nondurable commit is that the commit is guaranteed not to survive a crash until a subsequent durable commit completes.


Nondurable commits are useful for implementing nested transactions. For example, the chunk store uses nondurable commits internally to checkpoint the location map. One interesting issue, which arose while implementing nondurable commits, is that the cleaner cannot reclaim chunks that become obsolete as the result of a nondurable commit. Consider the following example. Assume an existing chunk version A is modified and rewritten as A' during a nondurable commit. Now assume the cleaner reclaims the space used by the now-obsolete version A, which is then overwritten. If a crash occurs before a durable commit, A' is lost and the previous version A has been overwritten, so it cannot be recovered. To prevent this situation, chunk versions that become obsolete as a result of a nondurable commit cannot be reclaimed until a durable commit occurs. For large nondurable commits, this constraint can dramatically increase temporary storage requirements, as the only way to generate additional space for writing data is to increase the size of the log.
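The rule can be captured by a small predicate over per-version metadata; the field and parameter names below are illustrative assumptions, not TDB's data structures.

#include <cstdint>

struct ChunkVersion {
    std::uint64_t obsoletedAtCommitSeq;   // sequence number of the commit that superseded it
};

// A version superseded only by a nondurable commit must be kept: after a crash,
// recovery rolls back to it, so the cleaner may not reuse its log space yet.
bool reclaimable(const ChunkVersion& v, std::uint64_t lastDurableCommitSeq) {
    return v.obsoletedAtCommitSeq <= lastDurableCommitSeq;
}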

4 The Object Store

The object store manages a set of named objects persistently. It is designed to balance ease of use and type safety for its applications against the complexity of its implementation. We wanted the object store to provide storage for typed objects, not byte sequences that applications must unpickle and type-cast. More specifically, we wanted to provide direct storage for application-defined C++ objects, not objects in a separate database language with a separate type system. We leveraged existing C++ language and runtime features to provide this functionality. (Their use does limit the portability of the object store to other languages.)

Although we wanted tight integration with C++, we did not want to invest the effort and incur the implementation complexity of providing transparent persistence (also known as orthogonal persistence [1]). Transparent persistence includes automatic swizzling of persistent ids into memory pointers [29], automatic locking for concurrency control, and in some cases garbage collection based on reachability. These features usually require a tool to parse, compile, or instrument object representations. We settled for requiring explicit application calls to lock, swizzle, and reclaim objects. (While explicit reclamation is in line with regular C++ usage, explicit locking and swizzling make the representation and use of persistent objects different from regular, volatile C++ objects.) Nonetheless, we were able to design the object store interface such that locking omissions and type-unsafe swizzling by the application are caught using a combination of static and dynamic checks. The purpose of these mechanisms is to catch common programming mistakes, not to provide an unyielding type-safe environment, which is also in line with regular C++ usage.

Like many other transactional systems, the object store tracks the set of dirty objects during a transaction and commits or rolls them back at the end of the transaction. Furthermore, although TDB is designed to support a single user, we found it desirable to optionally support concurrent transactions: the user may run a number of applications concurrently, and there may be background transactions such as reporting usage to a trusted server. We kept the implementation complexity low by avoiding sophisticated techniques for high concurrency and throughput. Instead, the object store is optimized for low latency. The rest of this section describes the specification and the implementation of the object store.

4.1 Specification

Objects are instances of abstract data types defined by the application. The application may insert, read, write, or remove objects in the object store. When an object is inserted into the object store, it is given a persistent id. The application can use this id to "open" the object in read-only or read-write mode. It may store this id in other persistent objects and retrieve it later, possibly after a system restart. The application can also register a "root" object id with the object store; the object store stores this id persistently, so the application may use it as the starting point for navigating objects. The application can group a sequence of object accesses into a transaction. It can run multiple transactions concurrently, possibly in different threads. Each transaction executes atomically with respect to concurrent transactions and crashes.

We specify the function of the object store further using its C++ interface. A persistent object must be an instance of an application-defined subclass of class Object. When the application opens an object in read-only or read-write mode, it receives a "smart pointer" for accessing the object. Smart pointers are instances of the templatized (parameterized) classes ReadonlyRef and WritableRef, instantiated with the apparent object types.


ObjectId insert(Object *object)
    Inserts object for persistent storage in the object store. Returns an id for the object.

ReadonlyRef openReadonly(ObjectId oid)
    Returns a reference to a read-only view of the named object.

WritableRef openWritable(ObjectId oid)
    Returns a reference to a writable view of the named object.

void remove(ObjectId oid)
    Removes the named object from storage in the object store and frees oid for reuse.

void commit(bool durable)
    Commits the current state of objects that were inserted or opened for writing during this transaction; also commits the removal of objects. Iff durable is set, makes the commit (and all previous nondurable commits) durable. Invalidates this Transaction and the Refs generated during it for further use.

void abort()
    Undoes changes made during the transaction. Invalidates this Transaction and the Refs generated during it.

Figure 3: Transaction interface.

A transaction is represented as an instance of class Transaction, which provides methods to insert, open, and remove objects and to commit or abort the transaction; the interface is shown in Figure 3. Different parametric instantiations of Ref reflect the subtyping relationship between their parameter types. Specifically, a Ref instantiated with class MyObject can be copy-constructed from a Ref instantiated with another class, provided the real class of the object referenced by the latter is MyObject or a subclass of MyObject. Otherwise, the attempt to construct the Ref fails with a checked runtime error. (C++ template classes do not by themselves reflect the subtyping relationship of their parameter types, but we have added this feature using C++ runtime type information to match the subtyping relationship between C++ references.) Given a Ref, the data and method members of the object can be accessed by invoking the dereferencing operators * and ->. A ReadonlyRef provides access to a const object; i.e., public data members are accessible as if they were declared const, and only const public methods can be invoked. A WritableRef provides access to all public members.

The application can open an object for reading and writing multiple times during a transaction. A Ref is valid only until the transaction it was generated in is committed or aborted; any attempt to use the Ref further results in a checked runtime error. This means that each transaction must start navigating objects from the root; it cannot retain object references across transactions. We illustrate the use of Refs with an example in Figure 4. A call to open or remove an object may block to ensure serializability of concurrent transactions. A blocked call raises an exception after a timeout interval, thus breaking potential deadlocks. The application is expected to handle this exception; it may either retry the failed operation or abort and retry the entire transaction. The timeout interval can be tuned by the application.

Subclasses of Object must implement a method to pickle an object into a sequence of bytes, and a constructor to unpickle an object from a sequence of bytes. Each subclass must also provide a class id that is unique across all object classes and persists across system restarts. The subclass must register its unpickling constructor with the object store under its class id. (The object store provides assistance in generating unique class ids, but that is outside the scope of this paper.) The application may choose to pickle objects in an architecture-independent format so that the stored database can be moved from one platform to another. It may also compress the object state so that the unpickled state is optimized for fast access and the pickled state is optimized for small storage. While the pickling and unpickling operations are provided by the application, the application does not have to invoke them explicitly; they are invoked by the object store as needed. TDB provides implementations of pickling and unpickling operations for basic types.

Persistent objects may reference each other using object ids. However, the object store does not swizzle object ids into memory pointers. Furthermore, as is evident from the interface, persistence of objects is based on explicit insertion and removal in the object store, not on reachability from a persistent root. These two limitations stem from the fact that our database system does not interpret the representation of the abstract objects stored in it.
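As an illustration of the pickling contract, the sketch below shows what an application-defined persistent class might provide. The method name pickle(), the Buffer alias, and the fixed binary layout are our assumptions; the paper specifies only that a pickling method, an unpickling constructor, a unique class id, and a registration step exist.

#include <cstring>
#include <string>

using Buffer  = std::string;
using ClassId = unsigned;

class Object {                                    // stand-in for TDB's Object base class
public:
    virtual ~Object() = default;
    virtual Buffer pickle() const = 0;            // flatten to a byte sequence
};

class Meter : public Object {
public:
    static const ClassId kClassId = 42;           // must be unique and stable across restarts

    Meter() = default;

    explicit Meter(const Buffer& bytes) {         // unpickling constructor
        std::memcpy(&_viewCount,  bytes.data(),               sizeof _viewCount);
        std::memcpy(&_printCount, bytes.data() + sizeof(int), sizeof _printCount);
    }

    Buffer pickle() const override {
        Buffer b(2 * sizeof(int), '\0');
        std::memcpy(&b[0],           &_viewCount,  sizeof _viewCount);
        std::memcpy(&b[sizeof(int)], &_printCount, sizeof _printCount);
        return b;
    }

    int _viewCount = 0, _printCount = 0;
};

The unpickling constructor would then be registered with the object store under kClassId so that the store can reconstruct a Meter from a chunk on a cache miss.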


class Meter: public Object {
    ...
    int _viewCount, _printCount;
};

class Profile: public Object {
    // Contains information on all goods used by a consumer.
    ...
    vector<ObjectId> _meters;
};

...
// Add a new Meter to the Profile registered as the root object.
Transaction t;
Meter *meter = new Meter();
ObjectId meterId = t.insert(meter);
ObjectId profileId = getRoot();
WritableRef<Profile> profile = t.openWritable(profileId);
profile->_meters.push_back(meterId);
t.commit();

...
// Increment the view count for the first good.
Transaction t2;
ObjectId profileId = getRoot();
ReadonlyRef<Profile> profile = t2.openReadonly(profileId);
ObjectId meterId = profile->_meters[0];
WritableRef<Meter> meter = t2.openWritable(meterId);
meter->_viewCount++;
t2.commit();

Figure 4: Sample usage of the object store interface.

4.2 Implementation

The object store is implemented over the chunk store. Committed states of persistent objects are stored in chunks. Recently used objects and dirty (i.e., modified but uncommitted) objects are cached in memory. Transactional isolation is provided using strict two-phase locking; the object store locks objects when they are opened for reading or writing. Below we provide some details of the implementation.

4.2.1 Persistent Storage

Committed states of persistent objects are stored in chunks in pickled form. A chunk could store one or more objects, and there is a tradeoff between storing single or multiple objects per chunk. Using single-object chunks has the following advantages:

- Single-object chunks use log space efficiently because only modified objects are written to the log.

- Single-object chunks reduce access latency by reducing the amount of data that must be processed (e.g., copied, hashed, and encrypted or decrypted) when reading and writing a single object.

- Multi-object chunks complicate the implementation of the object cache in the presence of concurrent transactions. Different objects in a chunk could be modified by different transactions. Committing an object would require re-composing the container chunk with the unmodified versions of the other objects in the chunk [4].

Using multi-object chunks has the following advantages:

- Log-structuring destroys inter-chunk clustering as chunks are updated. Multi-object chunks retain physical clustering between the objects in a chunk, and thus benefit from spatial locality.

- Multi-object chunks result in fewer chunks and therefore lower space overhead from metadata such as the location map and the chunk headers in the log.

We chose to use single-object chunks in the current implementation because the stated disadvantages are not severe for DRM systems. DRM databases are relatively small and their working sets are often cacheable; therefore, clustering is not as important as in traditional database systems. Also, our per-chunk space overhead is about 20 bytes without crypto overhead and 38 bytes with crypto overhead, and DRM records are often a few hundred bytes, so the space overhead is tolerable. Single-object chunks also simplify the implementation of object ids: the id of a persistent object is the same as that of the chunk in which it is stored, and the insert operation allocates a new chunk id and returns it as the object id.

4.2.2 Object Cache

The object store maintains an in-memory cache of objects indexed by object ids. It fetches objects into the cache on demand by reading the chunks in which they are stored. The pickled state of each object includes the id of its class. The object store uses the class id to find and invoke the unpickling constructor, and puts the unpickled object in the cache. It is beneficial to cache objects in the object store (compared to, say, caching chunks in the chunk store) because objects are ready for direct access by the application: they are decrypted, validated, unpickled, and type checked. If the database system provided a cache of unprocessed data, there would be an incentive for the application to keep a separate cache of processed data, resulting in double caching and additional complexity. Furthermore, because the indexes in the collection store are implemented using objects in the object store, the object cache provides caching of indexes as well.

When the cumulative size of the cache grows above a certain threshold, some of the least recently used (LRU) objects are evicted. We maintain a linked list of cached objects; whenever a Ref is dereferenced, the referenced object is moved to the head of the list. This linked list is shared between various caches in TDB, e.g., with the cache of map entries in the chunk store. The sharing allows dynamic apportioning of the total cache space to the different caches based on need. Objects referenced by the application are protected against eviction using a reference count per cached object, which is incremented and decremented as Refs are created and destroyed. Objects inserted into the object store or opened for writing are pinned in the cache until the end of the transaction. This implements the no-steal policy: dirty objects cannot be flushed to persistent storage until they are committed [14]. Although the no-steal policy does not scale to transactions with large write sets, most DRM transactions tend to have small write sets, so the policy is a good tradeoff for code simplicity.
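The eviction policy can be sketched as follows; the container choices and names are illustrative, and TDB's actual LRU list, which is shared across several caches, is not shown.

#include <cstddef>
#include <list>
#include <unordered_map>

using ObjectId = unsigned;

struct CacheEntry {
    // Object* object;          // decrypted, validated, unpickled object (omitted here)
    int  refCount = 0;          // live Refs held by the application
    bool dirty    = false;      // inserted or opened for writing in an open transaction
};

class ObjectCache {
public:
    // Called whenever a Ref is dereferenced: move the object to the MRU end.
    void touch(ObjectId oid) {
        lru_.remove(oid);
        lru_.push_front(oid);
    }

    // Evict least recently used entries, skipping pinned ones: objects with live
    // references and dirty objects (the no-steal policy) stay until released/committed.
    void evictDown(std::size_t maxEntries) {
        auto it = lru_.end();
        while (entries_.size() > maxEntries && it != lru_.begin()) {
            --it;                                    // walk from the LRU end
            const CacheEntry& e = entries_[*it];
            if (e.refCount == 0 && !e.dirty) {
                entries_.erase(*it);
                it = lru_.erase(it);                 // points past the erased element
            }
        }
    }

    std::unordered_map<ObjectId, CacheEntry> entries_;
    std::list<ObjectId> lru_;                        // front = most recently used
};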
4.2.3 Transactional Locking and Synchronization

The object store supports concurrent transactions, but it is geared to low concurrency. For example, it does not support granular locks [14]. The application may even switch off locking to avoid the locking overhead in the absence of concurrent transactions. The object store provides transactional isolation using shared/exclusive locks over objects. When an object is opened for reading in a transaction, the transaction tries to get a shared lock on the object. Similarly, when an object is inserted, opened for writing, or removed, the transaction tries to get an exclusive lock on the object. Opening objects is similar to explicit locking, but unlike explicit calls to lock and unlock objects, it guards against accidental omissions:

1. The application cannot access an object without opening it. Whenever a Ref is dereferenced, we check that it was created during a transaction that is still active. This check ensures that a reference from a previous transaction is not accidentally reused, and it forces the application to open the object anew in the current transaction, which in turn ensures that the object is properly locked.

2. The locks are released automatically by the object store. We use strict two-phase locking: the object store releases the locks after the end of the transaction, which provides isolation between concurrent transactions [14].

Besides using transactional locks, the object store and the other layers in TDB use a single "state mutex" to protect the consistency of their data structures against simultaneous access from multiple threads. The state mutex is held only for the duration of each application-level operation.

However, when a thread blocks on a transactional lock inside an operation, it is desirable to let another thread run an operation. Not doing so would defeat the purpose of object-level locking and lead to many spurious deadlocks. (A thread T1 holding the state mutex could be waiting for a lock on an object currently locked by T2, but T2 cannot proceed to commit because it is waiting for the state mutex held by T1.) Therefore, when a thread waits on a transactional lock, the state mutex is released; the mutex is reacquired when the wait on the lock ends.

Each transaction remembers the ids of the objects inserted, read, written, and removed. These sets help avoid locking an object multiple times, and they provide the identities of the objects to be committed or removed at commit time. If a transaction is explicitly aborted by the application, the object store evicts all objects opened for writing from the cache, deallocates the chunk ids corresponding to the objects inserted, and releases all locks.
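The "release the state mutex while waiting" behavior corresponds to a standard condition-variable wait. The sketch below uses C++ standard-library primitives and hypothetical names, and omits TDB's actual lock manager and deadlock policy.

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

std::mutex stateMutex;                  // protects TDB's in-memory data structures
std::condition_variable lockReleased;   // notified whenever some object lock is released

// The caller holds stateMutex (as every application-level operation does). While the
// object lock is unavailable, wait_for releases stateMutex so that other threads can
// run operations, e.g., commit and release the lock this thread is waiting for.
template <typename LockAvailable>
void waitForObjectLock(std::unique_lock<std::mutex>& state,
                       LockAvailable lockAvailable,
                       std::chrono::milliseconds timeout) {
    if (!lockReleased.wait_for(state, timeout, lockAvailable))
        throw std::runtime_error("lock wait timed out: abort and retry the transaction");
}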

5 Collection Store

The collection store provides keyed access to collections of objects. Unlike other embedded database systems, we provide applications with a type-safe and extensible data model similar to the object-relational data model. Collections contain C++ objects, which can be accessed using object references. Applications can create one or more indexes on a collection, which are maintained automatically as the objects in the collection are updated. Applications can add and remove indexes dynamically, without recompiling the application source code or rebuilding the database. Applications can access objects in a collection using exact-match, range, or scan queries.

As in the design of the object store, we avoid the complexity of parsing or instrumenting the application source code. In addition, we do not use a database language to define the database schema or express queries. We expect most DRM applications to use relatively simple schemas and queries, which do not justify the extra complexity of adding a database language compiler and/or optimizer to TDB. Instead, we provide as flexible a data model as possible using the constructs available in C++. Since we do not parse or instrument the application source code, the collection store has limited control over how applications access and modify collection objects. For example, it is difficult to determine whether an update has changed any of the indexed keys or whether an application has attempted to update keys in an index that was simultaneously used to iterate over the collection objects. A major achievement in the design of the collection store is implementing a flexible data model in spite of this limited information about application behavior.

5.1 Specification

This section describes the data model and the interface of the collection store.

5.1.1 Data Model

A collection is a set of objects that share one or more indexes. All objects in a collection must inherit from a common superclass, the collection schema class, which is associated with the collection. The database schema can be evolved by subclassing the collection schema class. The collection store can only index the data members that are defined in the collection schema class. An object can belong to at most one collection. (This restriction is not inherent to our data model, but it simplified the implementation of the collection store and reduced the per-object storage overhead.)

An index maps keys to the corresponding objects. An index covers all objects in exactly one collection. Indexes can enforce key uniqueness. Rather than requiring applications to define index keys as a sequence of fields at fixed offsets within collection schema objects, as is common in other embedded databases [8, 7, 6], the collection store implements functional indexes [16]. Keys in functional indexes are generated by applying an extractor function to the objects in a collection. The extractor function must be pure, that is, its output should depend solely on its input. Compared to defining index keys by their offsets in collection objects, functional indexes lead to definitions that are simpler, type-safe, and more flexible: keys can contain variable-sized fields and it is possible to index derived values.

5.1.2 Interface

The interface of the collection store consists of four C++ classes: Indexer, CTransaction, Collection and Iterator. The type of the templatized class Indexer uniquely identifies an index on a collection. The class is templatized by the collection schema class, the index key class and the definition of the extractor function.


void commit(bool durable)
    Commits the transaction in the given durability mode.

void abort()
    Aborts the transaction.

WritableRef createCollection(string name, GenericIndexer* indexer)
    Returns a writable reference to a new named collection with a single index identified by indexer.

ReadonlyRef readCollection(string name)
    Returns a read-only reference to an existing named collection.

WritableRef writeCollection(string name)
    Returns a writable reference to an existing named collection.

void removeCollection(string name)
    Removes a named collection along with all objects that were previously inserted into the collection.

Figure 5: CTransaction interface.

An Indexer object also determines whether keys are unique and how the index is implemented (B-tree, dynamic hash table [20] or list). All instances of the Indexer class are required to inherit from the non-templatized class GenericIndexer to allow polymorphic access to Indexer objects. Applications do not invoke any methods of the Indexer class except its constructor.

The CTransaction class implements methods to terminate transactions, create new Collection objects, or obtain references to them. The specification of the CTransaction class interface can be found in Figure 5. The Collection class is a subclass of the Object class. It implements methods to create and remove indexes on a collection, insert objects into a collection, and query objects in a collection. The specification of the Collection class interface can be found in Figure 6. All index key classes are required to inherit from the GenericKey class to allow polymorphic access. As is evident from the interface, the collection store does not support complex queries such as joins or aggregates. If any of the indexes created on a collection is unique, the insertion of an object may raise an exception should the inserted object create a duplicate key in any of the unique indexes. Similarly, creating a new unique index on a nonempty collection may raise an exception should the created index cover duplicate keys.

The Iterator class implements methods to enumerate objects in the result sets returned by queries. Iterators enumerate each object at most once (i.e., the iterators are unidirectional). Applications can dereference the currently enumerated object in either read-only or writable mode to obtain an appropriate ObjectRef. Applications can update objects dereferenced in writable mode or delete the currently enumerated object, which triggers an automatic update of all affected indexes. The collection store implements insensitive iterators [25], i.e., applications do not see the effects of their updates until the iterator is closed.

In Figure 7 we illustrate the interface of the collection store using a simple example based on the classes defined in Figure 4. Implementing the Profile class on top of the object store, as in Figure 4, does not scale to storing a large number of Meter objects because each access to the Profile object results in reading or writing the contents of _meters, which contains all Meter objects. A more scalable implementation represents the Profile class as a collection, which reduces the volume of data read or written to the accessed Meter objects and a logarithmic number of index meta-objects. Figure 7 shows a collection-based implementation of the Profile class and its sample usage.

5.2 Implementation

5.2.1 Indexers

Each Collection object maintains a list of Indexer objects that describe the indexes created on the objects in the collection. The Indexer objects uniquely identify the indexes on the collection, including the index-related type information. The Indexer class is the only class in the collection store that is templatized by the index-specific types, such as the collection schema class and the index key class.

void createIndex(CTransaction* t, GenericIndexer* indexer)
    Creates a new index identified by indexer. Raises an exception if indexer specifies a unique index and any of the objects in the collection violates the uniqueness of the index.

void removeIndex(CTransaction* t, GenericIndexer* indexer)
    Removes the index identified by indexer from the collection. Raises an exception if there is only one index on the collection.

void insert(CTransaction* t, Object* object)
    Inserts object into the collection. Raises an exception if the insertion of object would violate the uniqueness of any of the collection indexes.

Iterator* query(CTransaction* t, GenericIndexer* indexer)
    Returns an iterator on the result set of a scan query that uses the index identified by indexer.

Iterator* query(CTransaction* t, GenericIndexer* indexer, GenericKey* match)
    Returns an iterator on the result set of an exact-match query that uses the index identified by indexer and the key match.

Iterator* query(CTransaction* t, GenericIndexer* indexer, GenericKey* min, GenericKey* max)
    Returns an iterator on the result set of a range query that uses the index identified by indexer and the keys min and max.

Figure 6: Collection interface.

The remaining classes in the collection store, such as Collection or CTransaction, are unaware of these types and use subclass polymorphism to compare and hash index keys and to pickle and unpickle objects in a collection. The Indexer objects are themselves accessed polymorphically in the Collection class implementation, using the GenericIndexer superclass, to perform runtime type checking of the objects inserted into the collection and of the keys provided as arguments to the query() methods. The former must belong to subclasses of the collection schema class, and the latter must belong to classes that inherit from the index key parameter class of the Indexer object that is used in the query. Since all templatization is limited to a single, relatively small class, the Indexer, the collection store provides type safety without suffering code bloat from excessive code parameterization.

5.2.2 Iterators

A common problem in implementing iterators is their sensitivity to updates performed by the same transaction while the iterator is open. A special case of this problem is the Halloween syndrome [14], when an application updates keys of an index that is used as the access path for the iterator, which may lead to indefinite iteration. To provide maximum protection for DRM applications, the collection store implements insensitive iterators: applications using insensitive iterators do not see the effects of their updates while the iterator remains open. The absence of a database language compiled by the collection store complicates the implementation of insensitive iterators in several ways:

- The collection store cannot prevent immediate updates of objects that were dereferenced in a writable mode.

- The collection store has no control over which index is used to query a collection. Consequently, the collection store cannot avoid the Halloween syndrome by selecting an appropriate index.

The collection store enforces several constraints that together guarantee iterator insensitivity:

1. Writable references to objects in collections cannot be obtained by any means other than dereferencing an iterator on the collection.

2. No other iterators on the same collection can be open when an iterator is dereferenced in a writable mode.

3. Iterators can only be advanced in a single direction.

4. Index maintenance is deferred until the iterator is closed.

// Modified definition of Meter
class Meter: public Object {
    ...
    int _id;                          // unique to every Meter object
    int _viewCount, _printCount;
};

// Create a new collection called profile that contains meters.
Int idEx( Meter& m ) { return m._id; }
Indexer<Meter, Int, idEx> idIndexer(unique, hashTable);
CTransaction t;
WritableRef<Collection> profile = t.createCollection("profile", &idIndexer);

// Insert a new Meter object into the profile collection.
profile->insert(&t, new Meter());

// Create a new index on the profile collection that enumerates Meter objects
// by their total usage count.
Int usageCountEx( Meter& m ) { return m._viewCount + m._printCount; }
Indexer<Meter, Int, usageCountEx> countIndexer(nonUnique, bTree);
profile->createIndex(&t, &countIndexer);

// Reset all Meter objects in the profile collection whose total count exceeds 100.
Iterator* i = profile->query(&t, &countIndexer, 100, plusInfinity);
for( ; !i->end(); i->next() ) {
    WritableRef<Meter> m = i->write();
    m->_viewCount = m->_printCount = 0;
}

Figure 7: Sample usage of the collection store interface.

The first constraint ensures that the state of an iterator can be affected only by updates performed through iterators (including itself). Consequently, collection store applications are required to use the CTransaction class, which, unlike the Transaction class, does not provide methods to directly create, update, and delete objects. The second constraint isolates the state of each iterator from updates performed through other iterators that were opened by the same transaction. The third and fourth constraints protect the state of each iterator from the effects of its own updates. In particular, the fourth constraint prevents the Halloween syndrome.

5.2.3 Index Maintenance

The collection store updates all indexes on a collection when an iterator is closed to guarantee iterator insensitivity (see Section 5.2.2). For each iterator, the collection store maintains a list of the ids of the objects that were dereferenced in a writable mode using the iterator. The list is used to redo all index updates when the corresponding iterator is closed. Compared to database systems that compile queries and updates, the index update is complicated by the fact that the collection store cannot statically determine which of the indexes will require an update. The application can call any public method on objects that it dereferenced in a writable mode, which in turn may affect any of the indexed keys. For each object dereferenced in a writable mode, the collection store determines which of the indexes need to be updated by comparing snapshots of all indexed keys before and after the object is updated. The post-update key snapshot can easily be obtained when the iterator is closed by calling all extractor functions on the object version in the object store cache. The pre-update key snapshot could be obtained by re-reading the chunk that holds the object from the chunk store and applying the extractor functions to it.


db name                .text size (KB)
Berkeley DB [32]       186
C-ISAM [8]             344
Faircom [7]            211
RDB [6]                284
TDB - all modules      250
  collection store      45
  object store          41
  backup store          22
  chunk store          115
  support utilities     27

Figure 8: Code footprint size.

However, such an approach would degrade the performance of the collection store because reading from the chunk store requires an external store access. In addition, such an approach would violate the layering of TDB. We have chosen a different approach, which trades extra storage overhead for better performance. The collection store records key snapshots along with the ids of updated objects. The snapshots are created prior to returning a writable reference to the application and thus prior to any updates on the object. When an iterator is closed, the snapshot is compared against the snapshot computed over the object version cached by the object store. It is possible to reduce the extra storage overhead by allowing applications to declare index keys as immutable and forgoing the recording of those keys in the snapshot.

Postponing index updates until the iterator is closed creates another problem: the application's updates could have created duplicate keys in unique indexes. The collection store cannot prevent such updates because it cannot even detect when they occur. Therefore, the collection store removes all objects that violate index integrity from the collection and raises an exception to the application when the iterator is closed. The exception object contains a list of the ids of all objects that were removed from the collection so that the application can re-integrate them into the collection.

5.2.4 Index Implementation

All storage management in the collection store is delegated to indexes, which maintain maps from keys to object ids. Currently, the collection store implements B-tree, dynamic hash table, and list indexes. TDB can be configured to use any non-empty subset of the above index implementations. The index meta-objects, such as hash buckets or B-tree nodes, are locked using a two-phase locking policy like any other objects. We have not implemented sophisticated concurrency control that would allow the early release of such locks (e.g., [28]) because we do not expect a typical DRM workload to contain many concurrent transactions, and the complexity of implementing such protocols did not seem to be justified by the potential performance gain.
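A sketch of the comparison performed at iterator close is shown below. The GenericKey interface is simplified and the IndexHandle type is our own illustration, not the collection store's internal classes.

#include <cstddef>
#include <memory>
#include <vector>

class Object;                               // application objects, as in the object store

struct GenericKey {
    virtual ~GenericKey() = default;
    virtual bool equals(const GenericKey& other) const = 0;
};

struct IndexHandle {
    virtual ~IndexHandle() = default;
    // Applies the index's extractor function to an object version.
    virtual std::unique_ptr<GenericKey> extractKey(const Object& obj) const = 0;
    virtual void update(const Object& obj, const GenericKey& oldKey,
                        const GenericKey& newKey) = 0;
};

// At iterator close, for one object that was dereferenced in writable mode: compare the
// key snapshot taken before the update with the key computed from the cached (post-update)
// object version, and touch only the indexes whose keys actually changed.
void maintainIndexes(const Object& updatedObject,
                     const std::vector<std::unique_ptr<GenericKey>>& preUpdateKeys,
                     const std::vector<IndexHandle*>& indexes) {
    for (std::size_t i = 0; i < indexes.size(); ++i) {
        std::unique_ptr<GenericKey> newKey = indexes[i]->extractKey(updatedObject);
        if (!newKey->equals(*preUpdateKeys[i]))
            indexes[i]->update(updatedObject, *preUpdateKeys[i], *newKey);
    }
}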

6 Code Footprint

TDB is designed as an embedded database system for DRM applications. Consequently, the size of its code footprint is an important design constraint. In Figure 8 we compare the size of the code footprint of TDB against other embedded database systems. The comparison is based on the size of the .text segment on the x86 platform, which typically represents the bulk of the static memory requirement of the system. As shown in Figure 8, TDB's footprint is comparable to other embedded database systems, although it delivers more functionality, such as protection against malicious corruption and unauthorized reading, fast incremental backups, and type-safe access. TDB can also trade off functionality for a smaller footprint. For example, its minimal configuration, which consists of the chunk store, external store, and support utilities, requires only 142 KB of static memory.


Collection    Size
Account       100000
Teller        1000
Branch        100
History       252000

Figure 9: TPC-B Tables and Sizes.

7 Performance

We compared the performance of TDB with Berkeley DB 3.0.55 [32] using the TPC-B benchmark [13]. We chose Berkeley DB as a yardstick because it is widely used in many applications, including several LDAP server implementations, a PGP certificate server, and the Kerberos authentication server. Berkeley DB is also provided as a package in the Perl programming language environment.

7.1 TPC-B Benchmark

We chose TPC-B, which is a dated benchmark, for the performance comparison because the Berkeley DB distribution includes an implementation of a TPC-B driver, which we assume is efficient; this let us avoid optimizing a system we do not fully understand. In addition, Berkeley DB implements a weaker collection model than TDB. In particular, it supports only a single index per collection, which must have immutable keys. Therefore, we had to select a simple benchmark (such as TPC-B) where these limitations would not be an issue. We found that the Berkeley DB implementation of TPC-B differs slightly from the official specification [13]; our implementation of TPC-B follows the Berkeley DB implementation in these cases.

The benchmark schema consists of four collections: Account, Teller, Branch and History. Objects in all four collections are 100 bytes long and contain 4-byte unique ids. A transaction reads and updates a random object from each of the Account, Branch and Teller collections and inserts a new object into the History collection. The initial number of objects in each collection is scaled down from the benchmark specification to better model the size of an embedded database. Figure 9 shows the initial sizes of all collections.

7.2 Platform

All experiments were run on a 733 MHz Pentium 3 with 256 MB of RAM, running Windows NT 4.0. Both database systems used files created on an NTFS partition on an EIDE disk with 8.9 ms (read) and 10.9 ms (write) seek time, 7200 rpm (4.2 ms average rotational latency), and a 2 MB disk controller cache. The one-way counter was emulated as a file on the same NTFS partition. As in Berkeley DB, we configured TDB to open log files with the WRITE THROUGH option, which instructs the OS to write through the file cache. In all experiments we configured both database systems to use 4 MB of cache, which is the default for the Berkeley DB implementation of TPC-B.

7.3 Experiments

We conducted two experiments using TPC-B to measure TDB's performance and compare it with Berkeley DB. In the first experiment we compared the average transaction response time of Berkeley DB with that of TDB with security (TDB-S) and without security (TDB). Without security, TDB does not incur the processing overhead of hashing and encrypting data or the storage overhead of hash values and padding for encryption. In addition, without security, TDB does not increment the disk-based one-way counter after each transaction. We configured the security parameters of TDB-S to use SHA-1 for hashing and 3DES for encryption; there are other algorithms that are as secure as 3DES and run significantly faster [26]. In this experiment we set the maximum database utilization (i.e., the maximal fraction of the database files that contains live chunks) to 60%, which is the default for TDB.
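The configuration used in this experiment can be summarized as follows; the struct and field names are purely illustrative assumptions, since TDB's actual configuration interface is not shown here.

// Hypothetical summary of the experimental configuration (illustrative names only).
#include <cstddef>

struct TdbConfig {
    bool        enableSecurity;    // TDB-S: true, TDB: false
    const char* hashAlgorithm;     // one-way hashing of chunks
    const char* cipher;            // chunk encryption
    double      maxUtilization;    // maximal fraction of database files holding live chunks
    std::size_t cacheBytes;        // shared cache size
};

static const TdbConfig kTdbSecure = { true,  "SHA-1", "3DES", 0.60, 4u << 20 };
static const TdbConfig kTdbPlain  = { false, nullptr, nullptr, 0.60, 4u << 20 };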


In the second experiment we studied the impact of the database utilization level on the performance and space efficiency of TDB (without security), because database utilization has been shown to greatly affect the performance of log-structured systems [30]. As the utilization increases, the database becomes more compact, but the amount of work done by the cleaner to generate free regions for writing increases. We measured both the transaction throughput and the resulting database size after running the benchmark. For each experiment, we ran 200,000 TPC-B transactions. The results reported are the average response time over the last 100,000 transactions (when the systems had reached steady state).
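The standard cleaning-cost model for log-structured storage [30] makes this trade-off concrete; it is only a rough guide here, since TDB's cleaner costs are not derived in the text and the utilization of cleaned regions need not equal the overall database utilization:

\[
\text{write cost} \;=\; \frac{\text{total I/O}}{\text{new data written}}
                  \;=\; \frac{1 + u + (1 - u)}{1 - u} \;=\; \frac{2}{1 - u},
\]

where $u$ is the fraction of live data in the regions being cleaned: cleaning a region costs one read, rewriting its live data costs $u$, and the reclaimed space accommodates $1 - u$ of new data. Under this model the write cost is 4 at $u = 0.5$, 10 at $u = 0.8$, and 20 at $u = 0.9$, which is consistent with the sharp slowdown observed above 70% utilization in Figure 11.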

7.4 Results

The results of the first experiment are shown in Figure 10. The average response time per transaction for TDB was approximately 56% of that of Berkeley DB. The difference is largely because Berkeley DB writes approximately twice as much data per transaction as TDB (1100 bytes vs. 523 bytes, on average); because TDB uses variable-sized chunks, it can write updates compactly.

Figure 10: Performance of Berkeley DB, TDB and TDB-S. [Bar chart of average response time per transaction: Berkeley DB 6.8 ms, TDB 3.8 ms, TDB-S 5.8 ms.]

Surprisingly, even the average response time for TDB-S was only about 85% of that of Berkeley DB. The average response time for TDB-S was greater than that of TDB largely because TDB-S writes more data, as a result of padding for block encryption, and has a higher per-chunk storage overhead (12 bytes) because it stores one-way hashes in the location map; consequently, checkpointing the location map was more expensive. On the other hand, the extra CPU overhead of hashing and encryption was relatively small (less than 10% of the total CPU overhead).
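As a rough, back-of-the-envelope illustration of the padding overhead mentioned above (not a measurement): a block cipher such as 3DES operates on 8-byte blocks, so each encrypted chunk is at minimum rounded up to a multiple of 8 bytes (some padding schemes add slightly more). The chunk sizes below are arbitrary examples.

// Illustrative padding calculation for 8-byte cipher blocks (example sizes, not measured data).
#include <cstdio>

int main() {
    const int kBlockSize = 8;                      // 3DES block size in bytes
    const int chunkSizes[] = { 100, 117, 523 };    // arbitrary example chunk sizes, in bytes
    for (int chunk : chunkSizes) {
        int padded = ((chunk + kBlockSize - 1) / kBlockSize) * kBlockSize;
        std::printf("chunk %4d bytes -> %4d bytes encrypted (+%d bytes padding)\n",
                    chunk, padded, padded - chunk);
    }
    return 0;
}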

Figure 11: TDB performance and database size vs. utilization. [Two plots over utilizations 0.5 to 0.9, for TDB and Berkeley DB: average response time (ms) vs. utilization, and database size (MB) vs. utilization.]

The results of the second experiment can be found in Figure 11. The response time results are somewhat surprising: the average response time initially decreases slightly as the utilization increases and then increases substantially once 70% utilization is reached. The initial decrease is a result of the smaller database; the file-system cache is more effective at higher database utilization because each cached block contains more useful data. However, at utilization levels above 70%, the cleaning overhead exceeds the benefit of a smaller database. At 50% utilization, the cleaner does not run; as the utilization increases from 60%, the time spent cleaning increases. Nonetheless, even at utilization levels approaching 90%, TDB performed comparably to Berkeley DB. Figure 11 also shows how the resulting database size decreases as utilization increases. The database size for Berkeley DB is much larger than that of TDB because Berkeley DB does not checkpoint the log during the benchmark.

8 Related Work

Although we are not aware of any database system that protects its database against malicious data corruption, a number of database systems share some aspects of their design with TDB.

PicoDBMS is a database system designed to execute on resource-constrained smartcards [2]. While TDB's storage model is optimized for a disk, PicoDBMS's storage model is optimized for EEPROM. In addition, PicoDBMS provides more sophisticated query processing capabilities than TDB.

The BSD UNIX packages dbm and ndbm, and their GNU alternative gdbm [12], are libraries that support keyed access to files. Like TDB, these packages are tightly integrated with the application programming language (C), and they also provide indexed access to records in a file. Unlike TDB, they provide storage for untyped byte sequences, not typed objects. Also, they support only a single index per file and leave it to the application to update the index explicitly; in this sense they provide maps, not indexes. Berkeley DB [32] is an extension of the dbm package. It shares with dbm the basic data model, which allows one map per file. Unlike dbm, Berkeley DB provides transactional access to the keyed files.

Several commercial database systems [8, 7, 6] support data models similar to the ISAM standard for indexed access to sequential files [15]. The ISAM data model is similar to TDB's in that it allows multiple indexes per data file and the indexes can be created dynamically. The indexes are automatically maintained as the underlying records are updated. Unlike TDB, all of these implementations require applications to specify the indexed fields in a record by their offset and length. Consequently, none of them can index variable-sized or derived values. In addition, none of them provides type-safe access to the records. Some of the implementations use a separate data definition language independent of the programming language [8, 6].

Object-oriented database systems provide tight integration with the application programming language. Many of them provide transparent pickling, swizzling, and locking. Providing these functions requires a tool that parses object representations and inserts suitable runtime checks into the code, which entails significant implementation effort and complexity. For example, PS-algol [1], EXODUS/E [3], and Thor/Theta [21] provide compilers for their languages that generate code with persistence in mind. In PS-algol and Thor, transient and persistent objects are defined and used similarly; persistence is determined by reachability from a persistent root, and unreachable objects are garbage collected. ObjectStore provides persistence for regular C++ classes without furnishing its own compiler [19]. It does, however, use a tool to parse C++ class representations and virtual-memory techniques to manage persistence. These techniques are not suitable for embedded environments that do not provide virtual memory.

Some object-oriented database systems, such as GemStone [24], provide automatic index maintenance using path-based indexing, where keys are expressed as paths of instance variables stored in objects. There are several problems with this approach. First, it does not support keys derived from multiple instance variables. Second, evaluating keys and noticing modifications requires help from the compiler. A few database systems, such as GOM [18], provide index maintenance using function materialization.
This is similar to functional indexes in TDB, except that GOM maintains and stores the materialized keys with the object, whereas TDB computes them when objects are opened for writing, which reduces space usage and may reduce computation. Functional indexing was also proposed in the context of Thor [16]; it supports indexing of overlapping sets of objects and requires storing additional metadata with each object.

Although log-structured storage has been used extensively in file system design [30], we are not aware of any other log-structured database system. However, some database systems use a versioned storage model, where data updates are converted into inserts of new data versions, although the newly created versions are not necessarily written sequentially. The Postgres storage system maintains with each table a log of record versions [33]. Unlike log-structured systems, Postgres makes record versioning visible to applications by allowing them to query old versions of records. The Shadows database system implements atomic updates by versioning database pages [35]. New page versions are written to slots occupied by stale versions of other pages; these slots do not have to be contiguous, so there is no need for compaction by cleaning as in log-structured storage systems.


9 Conclusions

We have presented the design of TDB, a database system for Digital Rights Management (DRM) applications. The design of TDB exploits the characteristics of DRM applications: it strikes a balance between implementation complexity and ease of use to reduce the code footprint and the number of bugs in both the database system and the application.

The lowest layer of TDB, the chunk store, implements a log-structured storage model, which is integrated with the mechanisms for tamper detection required by most DRM database systems. Log-structured storage also enables fast incremental backups and higher resilience against traffic-analysis attacks. We demonstrate that the overhead of cleaning on the TPC-B workload is relatively small for database utilizations up to 80%.

The object store provides type-safe, transactional access to C++ objects. The object store design is simplified by not supporting strictly orthogonal persistence of objects: it does not support object reference swizzling or transparent locking, and, in line with regular C++ usage, it does not automatically garbage collect unreachable objects. The object store implements a no-steal buffer management policy, as most DRM databases are small and have cacheable working sets.

The collection store provides keyed access to collections of objects using one or more indexes. The indexes are maintained automatically when the objects are updated. The collection store design is simplified by using C++ constructs instead of a separate database language to describe the database schema or express queries. Yet the collection store implements a flexible data model with type-safe access to collection objects, schema evolution via class inheritance, and indexing of derived values. The collection store does not implement early release of locks on index meta-objects because most DRM applications are not accessed by many concurrent users.

Our experiments based on the TPC-B workload show that although TDB incurs extra overhead due to encryption and hashing, it achieves better performance than a widely used embedded database system, Berkeley DB. We also show that TDB has a code footprint similar to that of other embedded database systems.

References

[1] M. P. Atkinson, P. J. Bailey, K. J. Chisholm, W. P. Cockshott, and R. Morrison. An approach to persistence. The Computer Journal, 26(4), 1983.
[2] C. Bobineau, L. Bouganim, P. Pucheral, and P. Valduriez. PicoDBMS: scaling down database techniques for the smartcard. In Proceedings of the 26th International Conference on Very Large Databases, 2000. Cairo, Egypt.
[3] M. J. Carey, D. J. DeWitt, G. Graefe, D. M. Haight, J. E. Richardson, D. T. Schuh, E. J. Shekita, and S. L. Vandenberg. The EXODUS extensible DBMS: an overview. Technical Report 808, University of Wisconsin, Madison, Computer Sciences, 1988.
[4] M. Castro, A. Adya, B. Liskov, and A. Myers. HAC: Hybrid adaptive caching for distributed storage systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, 1997.
[5] ContentGuard. Rights management from Xerox. Available at www.contentguard.com, 2000.
[6] Centura Software Corp. RDM (database manager). Available at www.centurasoft.com/products/databases/rdm, 2000.
[7] Faircom Corp. C-Tree Plus. Available at www.faircom.com/product tech/ctree/c-tree.html, 2000.
[8] Informix Corp. C-ISAM. Available at www.informix.com/informix/techbriefs/cisam/cisam.htm, 2000.
[9] InterTrust Technologies Corp. Digital rights management. Available at www.intertrust.com/de/index.html, 2000.
[10] SanDisk Corp. Consumer products. Available at www.sandisk.com/cons/product.htm, 2000.
[11] SSFDC Forum. Features and specifications of SmartMedia. Available at www.ssfdc.or.jp/english/spec/index.htm, 2000.
[12] Free Software Foundation. gdbm. Available at www.gnu.org/software/gdbm/gdbm.html, 2000.
[13] J. Gray. The Benchmark Handbook for Database and Transaction Processing Systems. Morgan Kaufmann, 1991.
[14] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[15] The Open Group. Indexed sequential access method (ISAM). Available at www.opengroup.org/public/prods/dmm4.htm, 1998.
[16] D. Hwang. Function-based indexing for object-oriented databases. PhD thesis, Massachusetts Institute of Technology, 1994.
[17] IBM. Cryptolope technology. Available at www.ibm.com/security/cryptolope, 2000.
[18] A. Kemper, C. Kilger, and G. Moerkotte. Function materialization in object bases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1991.
[19] C. Lamb, G. Landis, J. Orenstein, and D. Weinreb. The ObjectStore database system. Communications of the ACM, 34, 1991.
[20] P. Larson. Dynamic hash tables. Communications of the ACM, 31(4), 1988.
[21] B. Liskov, A. Adya, M. Castro, M. Day, S. Ghemawat, R. Gruber, U. Maheshwari, A. Myers, and L. Shrira. Safe and efficient sharing of persistent objects in Thor. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996.
[22] D. Lomet. The case for log structuring in database systems. In Proceedings of the 6th International Workshop on High Performance Transaction Systems, 1995. Asilomar, CA.
[23] U. Maheshwari, R. Vingralek, and W. Shapiro. How to build a trusted database system on untrusted storage. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation, 2000. San Diego, CA.
[24] D. Maier and J. Stein. Indexing in an object-oriented DBMS. In Proceedings of the International Workshop on Object-Oriented Database Systems, 1986.
[25] J. Melton and A. Simon. The New SQL: A Complete Guide. Morgan Kaufmann, 1993.
[26] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
[27] R. Merkle. Protocols for public key cryptosystems. In Proceedings of the IEEE Symposium on Security and Privacy, 1980. Oakland, CA.
[28] C. Mohan and F. Levine. ARIES/IM: an efficient and high concurrency index management method using write-ahead logging. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1992. San Diego, CA.
[29] J. Moss. Working with persistent objects: To swizzle or not to swizzle. IEEE Transactions on Software Engineering, 18(8), 1992.
[30] M. Rosenblum and J. Ousterhout. The design and implementation of a log-structured file system. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, 1991. Pacific Grove, CA.
[31] M. Seltzer, K. Bostic, M. McKusick, and C. Staelin. An implementation of a log-structured file system for UNIX. In Proceedings of the 1993 Winter USENIX Conference, 1993. San Diego, CA.
[32] M. Seltzer and M. Olson. Challenges in embedded database system administration. In Proceedings of the Embedded Systems Workshop, 1999. Cambridge, MA (software available at www.sleepycat.com).
[33] M. Stonebraker. The design of the POSTGRES storage system. In Proceedings of the 13th VLDB Conference, 1987. Brighton, UK.
[34] Infineon Technologies. Eurochip II SLE 5536. Available at http://www.infineon.com/cgi/ecrm.dll/ecrm/scripts/prod ov.jsp?oid=14702&cat oid=8233, 2000.
[35] T. Ylonen. Shadow paging is feasible. Licentiate's thesis, Helsinki University of Technology, 1994.