The RasDaMan Approach to Multidimensional Database Management

Peter Baumann, Paula Furtado, Roland Ritsch, Norbert Widmann
Bavarian Research Centre for Knowledge-Based Systems (FORWISS)
Orleansstr. 34, D-81667 Munich, Germany
email: fbaumann,furtado,ritsch, [email protected]
fax: +49-89-48095-203
Abstract

Multidimensional discrete data (MDD), i.e., arrays of arbitrary size, dimension, and base type, are receiving growing attention in the database community. MDD occur in a variety of application fields, e.g., technical/scientific areas such as medical imaging, geographic information systems, climate research, and scientific simulations, as well as business-oriented applications like OLAP and data mining. In all these application fields the data managed can be modeled as MDD. RasDaMan (Raster Data Management in Databases) is a basic research project sponsored by the European Community where industrial and research partners collaborate to develop comprehensive MDD database technology. In the approach adopted, the logical and physical levels are strictly separated. A data definition language for multidimensional arrays together with a declarative, optimizable query language allows for powerful associative retrieval. A streamlined storage manager for huge arrays enables fast, efficient access to MDD subsets. Previous work has confirmed that such an approach can lead to substantial improvements in functionality and performance, particularly in networked environments. In this contribution, we present the RasDaMan approach to MDD management together with the application fields chosen for assessment and evaluation.

1 Introduction

While images are still the most prominent example of raster data, in principle any natural phenomenon becomes spatiotemporal raster data of some specific dimensionality once it is sampled and quantised for storage and manipulation in a computer system; moreover, a variety of artificial sources such as simulators, image renderers, and business data analyzers produce raster data. 1-D raster data frequently represents time series like the output of a seismographic sensor. 2-D raster images are of prime importance in a lot of areas.

RasDaMan is sponsored by the European Commission in the ESPRIT Domain 4: Long-Term Research under grant no.
20073.

Copyright (c) 1997 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM Inc., fax +1 (212) 869-0481, or ([email protected]).
3-D raster data comprises the output of climate simulations or animated 2-D images. Still higher dimensions can occur; for instance, the ISO/IEC imaging standard PIKS (Programmer's Imaging Kernel System) defines its type BasicImage as a 5-D entity consisting of three spatial, a time, and a channel (e.g., color) dimension [13]. Finally, an arbitrary number of dimensions is possible in the field of online analytical processing (OLAP) and data mining. The common characteristic they all share is that a large set of large multidimensional arrays has to be maintained, hence the scientific term Multidimensional Discrete Data or MDD.

Although MDD form a very well-defined category of data structures, they have received surprisingly little attention in the database community for a long time. Traditionally, commercial DBMSs still revert to so-called BLOBs, binary large objects, which date back to the long fields suggested fifteen years ago [17]. As the name already suggests, in this approach the DBMS does not know anything about the application semantics, but treats the multidimensional data item as a one-dimensional, encoded byte string. This notably holds not only for relational systems, but also for object-oriented DBMSs: as of today, DBMSs do not allow modeling of multidimensional data on a sufficiently high semantic level so that a flexible, declarative query language can give support similar to that available on conventional database primitives such as strings and numbers. Besides this lack of functionality, the particular sequential storage method used with the mapping to BLOBs leads to a heavy performance penalty on all access patterns more sophisticated than sequential line-by-line access.
Investigations in the pivotal MDD application areas medical imaging/PACS, geographic information systems, and OLAP/data mining have clearly shown that there is a common need for MDD services, provided this functionality can be decoupled from the legion of application-specific data formats in use. Functions such as subimage extraction, layer extraction (projection), and aggregation along specified dimensions play an important role in all these application fields. Even a certain class of content-based retrieval methods can be performed without MDD interpretation, namely those that only consider pixel-level information.

In this paper, the RasDaMan (Raster Data Management in Databases) approach to storage, manipulation, and retrieval of MDD in databases is presented. The RasDaMan approach to domain-independent MDD management follows the line first published in [2] where, in order to determine the technical requirements for comprehensive MDD support in databases, investigations were undertaken in the area of image processing and analysis. The functionality regarded
as feasible for MDD DBMSs has been adapted to the particular needs of database interfaces to obtain classical DBMS properties such as declarativeness, orthogonality, optimizability, and data independence; this has been further formalized in [3]. Meanwhile, the concept has undergone substantial revision, in part to achieve conformance with the database standards SQL-92 [5, 14] and ODMG-93 [7].

Compared to other work in the field, there are two main distinguishing features. First, we strive for a comprehensive approach combining a conceptual MDD model based on well-defined mathematical semantics with a streamlined MDD storage management. Second, there is a close adherence to the relevant standards, SQL and ODMG.

The remainder of this paper is organized as follows. In Section 2, we outline the approach followed by RasDaMan. Section 3 contains a brief comparison of the envisaged application areas; a review of the state of the art in MDD database technology follows in Section 4. Section 5 concludes the paper.

2 The RasDaMan Approach

For adequate storage and retrieval of MDD, full array support across all database layers is needed. In the RasDaMan approach, a clear distinction is made between the logical (query) level and the physical (storage organization and data transmission) level of array management.

On the conceptual level, arrays are treated as a general data abstraction in the sense of [22]; they can be of any dimensionality, they can have an arbitrary (fixed or variable) number of elements per dimension, and both primitive and derived types are admissible as array base types. A declarative array query language offers a rich set of high-level operations similar to SQL set operations. The model has formal set-algebraic semantics based on AFATL Image Algebra [19], a rigorous mathematical framework able to express any image transformation.
Section 2.1 introduces the RasDaMan data model in an informal way using examples from typical application areas; see [3] for a formal treatment.

On the physical level, a novel combination of tiling and spatial indexing allows for the efficient execution of queries on MDD while offering the benefits of conventional database technology, such as query performance depending on the result set (and not on the overall data set size), concurrency control, support for crash recovery, and transaction management. Section 2.2 introduces the concepts of the storage manager.

Due to the data independence achieved, it is possible to make encoding/decoding and compression/decompression internal features invisible to the application and adaptable to each client's actual needs. Query functionality is available regardless of the format in which data sets are maintained, and data can be provided in exactly the format required by the application; this can be the target machine and compilers' main memory representation for immediate further processing such as edge filtering or a Fourier transform, or some data exchange format such as JPEG to exploit hardware support for fast display. Necessary transformations are done within the DBMS which has to decide if it is more efficient to transform before or after network transmission and in which format to actually store data. An overview of the system architecture is presented in Section 2.3.
Figure 1: 3-D reconstruction of human skull and 2-D projection

2.1 The Conceptual MDD Model

In this Section, key elements of the RasDaMan RasQL (Raster Query Language) interface are presented; RasQL itself is subdivided into RasDL (Raster Definition Language) and RasML (Raster Manipulation Language). RasQL is embedded in the object-oriented DBMS standard ODMG with its Object Definition Language (ODL) and Object Query Language (OQL), a superset of ISO SQL-92. Care was taken to keep extensions well-defined and local.

RasDL extends ODMG's ODL with n-D arrays for MDD, by providing the class template Marray which is parametrized with the array base type T and the spatial domain sd of the array, i.e., its dimension and boundaries. A spatial domain specification for an n-D array with lower and upper bounds $lo_i$ and $hi_i$, resp., for each dimension $i \in \{1, \ldots, n\}$ has the form

$$[\, lo_1 : hi_1,\; lo_2 : hi_2,\; \ldots,\; lo_n : hi_n \,]$$

Both $lo_i$ and $hi_i$ can take on the special value "*" denoting a variable boundary; only then growing and shrinking of a corresponding MDD instance are allowed.

Two running examples will serve to demonstrate the modeling and retrieval facilities. The first example is taken from a clinical environment; specifically, we focus on 3-D computed tomograms or volume tomograms (VTs; see Figure 1). Such images consist of a sequence of (typically) 256 × 256 2-D grayscale slices acquired by the scanner while the patient is moved through the device in small steps under control of the radiologist. The corresponding RasDL definition reads as follows, with a sample link to the patient record added:
    class VolumeTomogram
    {
        d_Ref patient;
        Marray< short, [ 1:256, 1:256, 1:* ] > data;
    };
The second example, Landsat TM images, draws on satellite-based remote sensing. The family of Landsat TM (thematic mapper) satellites acquires images of size 7020 × 5760 through seven 8-bit sensors in different frequency bands yielding a total size of 270 MB (Figure 2). A class LandsatImage for the storage of Landsat images together with some embedded registration information might be defined like this (with the ellipsis properly expanded):
    class LandsatImage
    {
        RegistrationInfo regInfo;
        Marray< struct { unsigned short band1, ... , band7; },
                [ 1:7020, 1:5760 ] > data;
    };
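The size quoted for a Landsat scene follows directly from its spatial domain. The following is a minimal sketch of that arithmetic, not part of RasDaMan: the helper names are hypothetical, one byte per band is assumed (following the "8-bit sensors" in the text), and the slice count of the volume tomogram is a made-up value because its third bound is variable.

```python
from math import prod

def cell_count(domain):
    """Number of cells in a spatial domain given as inclusive (lo, hi) pairs."""
    return prod(hi - lo + 1 for lo, hi in domain)

def size_in_bytes(domain, bytes_per_cell):
    """Raw (uncompressed) size of an MDD with a fixed-size cell type."""
    return cell_count(domain) * bytes_per_cell

# Landsat TM scene: [1:7020, 1:5760], seven bands of one byte each (assumed).
landsat_domain = [(1, 7020), (1, 5760)]
landsat_bytes = size_in_bytes(landsat_domain, bytes_per_cell=7)
print(landsat_bytes)                     # 283046400
print(round(landsat_bytes / 2**20, 1))   # 269.9, i.e. roughly 270 MB

# Volume tomogram [1:256, 1:256, 1:*]: the size depends on how many slices
# the instance currently holds (100 here is an illustrative value only).
vt_domain = [(1, 256), (1, 256), (1, 100)]
vt_bytes = size_in_bytes(vt_domain, bytes_per_cell=2)  # 2-byte cells assumed
print(vt_bytes)                          # 13107200
```

With a variable bound, the raw size is a property of the instance rather than of the type, which is one reason the storage layout cannot be fixed at schema definition time.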
Figure 2: Sequence of satellite images

Let us now turn from schema to instance level and inspect the query facilities. The RasML language is standard OQL enriched with primitives for MDD handling. The following examples show how to express spatial (1 and 2), induced (3) and aggregate operations (4) in RasML. How all these functions can be used together to state complex queries is shown in Example 5. Update of MDD with RasML is shown in Example 6.

The trim operator serves to extract a subarray while leaving the dimensionality unchanged.

Example 1: "The cutout between points (x0,y0) and (x1,y1) from all Landsat images."

    select li.data [ x0:x1, y0:y1 ]
    from LandsatImage as li

The wildcard operator "*" used in any one of the bounds would expand to the field limits of the current MDD instance. A related operation, projection, reduces the dimensionality of the MDD by fixing one or more dimensions to a layer of thickness one. If, like in the following example, one coordinate of a 3-D MDD is fixed, a two-dimensional MDD is returned.

Example 2: "Three axial slices through the volume tomograms at point (x0,y0,z0)."

    select vt.data [ x0, *:*, *:* ],
           vt.data [ *:*, y0, *:* ],
           vt.data [ *:*, *:*, z0 ]
    from VolumeTomogram as vt

Generally speaking, the spatial operations category consists of all those that change the point set of an array; among the further operations not addressed here are the affine operations translation, rotation, and scaling of MDD. The high practical importance of spatial operations arises from the fact that the overhead of carrying out an operation on the server is easily offset by the reduced amount of data to be transferred. Consequently, performing such spatial operations on the server leads to a substantially faster network transfer and, ultimately, much shorter response times.

The next family of functions operates on the pixel values. For every function available on the array base type, the so-called induced function is available which applies the operation simultaneously to all pixels of the array. The record accessor is defined on the (aggregate) pixel type exactly like in C++, hence it can be induced to select a channel from the whole image:

Example 3: "The bands 1, 3 and 7 of all Landsat images, with the intensity of band 7 raised by 5."

    select li.band1, li.band3, li.band7 + 5
    from LandsatImage as li

To perform statistical evaluation, business data consolidation and the like, a means is required to aggregate multidimensional data into lower-dimensional summary information. The condenser statement iterates over a given index range and combines all values found through the operation indicated into a single value. In the following examples, the generic function spatial_domain() is used which returns a list of low bound/high bound pairs describing the current array boundaries.

Example 4: "For each volume tomogram, the maximum intensity occurring in the data cube."

    select condense max
           over p in spatial_domain(vt.data)
           using vt.data[p]
    from VolumeTomogram as vt
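The four operation families of Examples 1 to 4 can be pictured on a toy array. The sketch below is an illustration only, not RasDaMan code: it models an MDD as a Python dict from coordinate tuples to cell values, and the names make_mdd, trim, project, induce and condense are hypothetical helpers chosen to mirror the terms used in the text.

```python
from functools import reduce
from itertools import product

def make_mdd(domain, cell):
    """Build an MDD as {point: value}; domain is a list of inclusive (lo, hi)."""
    axes = [range(lo, hi + 1) for lo, hi in domain]
    return {p: cell(p) for p in product(*axes)}

def trim(mdd, box):
    """Keep the points inside box; None stands for the wildcard '*:*'."""
    def inside(p):
        return all(b is None or b[0] <= c <= b[1] for c, b in zip(p, box))
    return {p: v for p, v in mdd.items() if inside(p)}

def project(mdd, fixed):
    """Fix the dimensions in fixed ({axis: coordinate}) and drop them."""
    out = {}
    for p, v in mdd.items():
        if all(p[d] == c for d, c in fixed.items()):
            out[tuple(c for d, c in enumerate(p) if d not in fixed)] = v
    return out

def induce(fn, mdd):
    """Apply a cell-level function to every cell of the array."""
    return {p: fn(v) for p, v in mdd.items()}

def condense(op, mdd):
    """Combine all cell values with op; no visiting order is prescribed."""
    return reduce(op, mdd.values())

# A 3 x 3 x 2 cube whose cell values encode their own coordinates.
cube = make_mdd([(1, 3), (1, 3), (1, 2)],
                lambda p: 100 * p[0] + 10 * p[1] + p[2])

cutout = trim(cube, [(1, 2), None, None])   # still 3-D: 2 x 3 x 2 cells
layer = project(cube, {2: 1})               # 2-D layer at third coordinate 1
brighter = induce(lambda v: v + 5, layer)   # induced "+ 5"
print(len(cutout), len(layer), brighter[(2, 3)], condense(max, cube))
# 12 9 236 332
```

Because condense only relies on the operation being applied to all cells, an implementation is free to visit them tile by tile or in parallel, provided the operation is associative and commutative.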
The condenser inspects all array elements and determines the maximum; note that no iteration sequence is prescribed, thereby leaving sufficient space for query optimization, maybe through parallel evaluation. Aggregation is very important in the OLAP area as a prerequisite for roll up operations; the condense operation can also be used to compute an MDD of some lower dimension containing aggregated values. Condensers comprise a generalization of the relational aggregate functions; they are useful not only for extracting scalar information from multidimensional data sets, but they also allow retrieval conditions to be stated on complete arrays or parts thereof. In both cases, condensers can
dramatically reduce the amount of data to be transferred to the client application.

To demonstrate orthogonality of the concepts, the following example shows a more complex query.

Example 5: "The area of all volume tomograms specified by bitmask b, where the average of the intensities in the area exceeds 127."

    select vt.data and b
    from VolumeTomogram as vt
    where ( (condense + over p in spatial_domain(vt.data) using vt.data[p] * b[p])
          / (condense + over q in spatial_domain(b) using b[q]) ) > 127
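One way to read Example 5 is as a filter over a collection of arrays. The sketch below is an illustration under stated assumptions, not the RasDaMan evaluation strategy: arrays are dicts from points to values, the mask holds 0/1 cells over the same domain as the data, and the induced "and" is modeled as "keep the value where the mask is 1, otherwise 0". The names masked_average and select_masked are hypothetical.

```python
def masked_average(data, mask):
    """Sum of data under the mask divided by the number of mask cells set."""
    total = sum(data[p] * mask[p] for p in mask)   # first condense +
    count = sum(mask[q] for q in mask)             # second condense +
    return total / count if count else None

def select_masked(volumes, mask, threshold=127):
    """Return the masked version of every volume whose masked mean qualifies."""
    result = []
    for data in volumes:
        avg = masked_average(data, mask)
        if avg is not None and avg > threshold:
            # Assumed reading of the induced "and": zero out unmasked cells.
            result.append({p: (v if mask[p] else 0) for p, v in data.items()})
    return result

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
mask = dict(zip(points, [1, 1, 0, 0]))
bright = dict(zip(points, [200, 100, 9, 9]))    # masked mean 150
dark = dict(zip(points, [10, 20, 255, 255]))    # masked mean 15

hits = select_masked([bright, dark], mask)
print(len(hits), hits[0][(0, 0)], hits[0][(1, 0)])
# 1 200 0
```

Only the qualifying array, already masked, leaves the server in this picture, which is the data reduction effect the text attributes to condensers.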
The first condense adds the intensities in the area not masked out by the binary MDD b. The second condense counts the number of pixels in the mask. Division of the two values results in the average intensity, which must be greater than 127 for the MDD to be put into the result set. The induced operation and is then applied with the bitmask to mask out irrelevant areas of each qualifying MDD.

Finally, we show how RasML can be used to update a part of an MDD.

Example 6: "Update the slice at z=42 which stretches across the whole cube in x and y directions with the 2-D image coming from the CT scanner."

    update VolumeTomogram
    set data[ *:*, *:*, 42 ] = scan_data
    where patient = scanned_patient
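The effect of such a partial update on an array with a variable bound can be sketched as follows. This is a toy model, not the RasDaMan storage manager: the class Volume and its methods are hypothetical, cells live in a dict, and None plays the role of the "*" bound of the RasDL definition.

```python
class Volume:
    """Toy 3-D MDD whose declared bounds may contain a variable upper bound."""

    def __init__(self, bounds):
        # bounds: one (lo, hi) pair per dimension; hi=None models "*".
        self.bounds = bounds
        self.cells = {}

    def _check(self, point):
        for c, (lo, hi) in zip(point, self.bounds):
            if c < lo or (hi is not None and c > hi):
                raise IndexError(f"{point} lies outside the declared domain")

    def set_slice(self, z, image):
        """Write a 2-D image {(x, y): value} into the layer at coordinate z."""
        for (x, y) in image:
            self._check((x, y, z))          # validate before touching data
        for (x, y), v in image.items():
            self.cells[(x, y, z)] = v       # only this layer is written

    def current_domain(self):
        """Bounding box of the cells written so far, as (lo, hi) pairs."""
        dims = range(len(self.bounds))
        return [(min(p[d] for p in self.cells), max(p[d] for p in self.cells))
                for d in dims]

vt = Volume([(1, 2), (1, 2), (1, None)])    # [1:2, 1:2, 1:*]
scan = {(x, y): 7 for x in (1, 2) for y in (1, 2)}
vt.set_slice(1, scan)
vt.set_slice(42, scan)                      # grows along the variable bound
print(vt.current_domain())
# [(1, 2), (1, 2), (1, 42)]
```

Writing the slice touches four cells in each call and leaves the rest of the object alone; a write at x=3 would be rejected because that bound is fixed.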
In a similar way, it is possible to extend an MDD with variable bounds in an update query by specifying a range bigger than the current spatial domain. Note that this does not involve replacing the complete MDD as is generally the case in conventional SQL when extending a BLOB. It is the task of the RasDaMan Storage Manager to organize MDD internally in a way that allows efficient processing of such statements.

2.2 Storage Management

Due to its potentially huge size, MDD requires specialized internal storage structures which allow paging when accessing or processing the data. The storage structure for an MDD object should be designed to minimize the number of pages accessed when an operation is executed on the object or part of it. The limiting factor for the overall performance will be the degree to which the spatial proximity between multidimensional items is maintained. Storage management (SM) in RasDaMan aims at providing efficient access to MDD objects or parts of them and transparent support for various storage devices.

The example in Figure 3 is used to explain the concepts of storage management in RasDaMan. The three parts of the figure (I-III) depict three different storage strategies for the same two-dimensional MDD object (e.g., a Landsat image). In each of the subfigures, the total area of the rectangle (except for the region marked 2) is the MDD instance. For each case, the object is subdivided into chunks which are stored in different disk pages. Different subdivisions are adopted for the three different storage strategies, namely, linear subdivision, aligned tiling and arbitrary subdivision. The area marked 1 in dark gray is the area of interest in the total MDD (conceptual level), e.g. a desired trimmed result like in Example 1 of Section 2.1. Disk pages (physical
level) accessed during the retrieval of the area of interest are marked with small letters starting with a and colored in light gray. On the right side, the distribution of area 1 inside the retrieved disk pages is shown. White areas of the disk pages indicate unused storage of each page on disk.

In conventional systems, MDD objects are stored as BLOBs. A BLOB is internally subdivided into linear sequences of bytes, each sequence corresponding to a separate page on disk (see Figure 3-I). A directory page or an index (usually a B-tree) provides access to the different pages of an object. This approach means a FORTRAN-like linearization which favors line-by-line access schemes and severely degrades all other accesses. On compressed data, even random access to lines is no longer possible. Consequently, every access other than to the whole BLOB field suffers from a lot of application programming effort and very unsatisfactory performance.

Figure 3-I illustrates this problem. The image is linearly subdivided into database pages. Even though subimage 1 has a size smaller than two database pages, six pages on disk (pages a-f) have to be accessed to operate on the subimage. In addition, in each page on disk, data is dispersed, as shown on the right side of the picture. For other operations inefficiency is even worse. For example, extension of the image with a new subimage 2 requires complex operations to be executed in all pages that compose the image.

In order to avoid these problems, a subdivision of the data into aligned multidimensional rectangular tiles (defined by a grid) has been proposed [24]. This is shown in Figure 3-II. As can be seen, access to subimage 1 now generally involves only four database pages, which is an improvement over the previous case. Only two would be required in the optimal case (that of exact coincidence between the subdivision into tiles and the area of interest).
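The page counts discussed for Figure 3-I and 3-II can be reproduced on a small grid. The sketch below is a simplified model, not the RasDaMan storage layout: it assumes zero-based inclusive coordinates, row-major linearization with a fixed number of cells per page, and a regular tile grid with one tile per page. The function names and the concrete sizes are hypothetical.

```python
def linear_pages(width, page_cells, box):
    """Pages touched by box when rows are stored one after another."""
    (x0, x1), (y0, y1) = box
    pages = set()
    for y in range(y0, y1 + 1):
        first = (y * width + x0) // page_cells   # page of the row's first cell
        last = (y * width + x1) // page_cells    # page of the row's last cell
        pages.update(range(first, last + 1))
    return len(pages)

def tiled_pages(tile_w, tile_h, box):
    """Pages touched by box on an aligned grid, one tile per page."""
    (x0, x1), (y0, y1) = box
    cols = x1 // tile_w - x0 // tile_w + 1
    rows = y1 // tile_h - y0 // tile_h + 1
    return cols * rows

# 16 x 16 image, 16 cells per page: one row per page, or 4 x 4 tiles.
area = ((5, 7), (2, 7))                 # 3 x 6 = 18 cells, just over one page
print(linear_pages(16, 16, area))       # 6
print(tiled_pages(4, 4, area))          # 2

straddling = ((3, 4), (3, 4))           # 2 x 2 cells across a tile corner
print(linear_pages(16, 16, straddling)) # 2
print(tiled_pages(4, 4, straddling))    # 4
```

The second query shows the limitation of a fixed grid: a small area that straddles tile borders still touches four tiles, which is the motivation for letting tile boundaries follow the areas of interest.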
Extending the image to the right with subimage 2 would only involve the addition of extra pages to the image.

One step further in providing flexibility is to support subdivision into arbitrary multidimensional rectangular tiles (possibly nonaligned) as described in [10, 11] (Figure 3-III). For the same situation described previously, only two pages have to be accessed for subimage 1 if the storage of the image is optimized according to areas of interest, and extending the image with subimage 2 only requires the addition of one or more pages with the new tile(s).

While there is no fixed optimal structure for all operations and all MDD objects, our research has shown that subdivision into arbitrary multidimensional rectangular tiles is better for MDD. It allows more efficient execution of the most common operations on MDD objects (for instance, access to a subimage) and, as it is flexible, it also allows, as a special case, the subdivision of the data into linear sequences of pixels or aligned rectangular tiles, for those cases where such approaches yield better performance. A system supporting such generic tiling allows the definition of tiles corresponding to areas of interest. Performance can therefore be improved by optimizing for the most common access patterns.

RasDaMan supports an innovative arbitrary rectangular tiling. Based on usage statistics, user provided information or data analysis, the most adequate tiling for an object can be adopted. Due to the declarative query language RasML, operations are not specified in a certain order of visited pixels like in conventional BLOB based systems. They can, therefore, be carried out in the optimal order for the storage layout of the tiles.

Such a generic subdivision of the object's domain and the need to support spatial access to the objects (typically
access to a multidimensional rectangular area) requires a specialized index (since linear indexes only support efficient linear accesses). In RasDaMan, a spatial access method provides fast selection of MDD parts. For each object, a spatial index is created which maintains the information about the tiles of the object and corresponding spatial data.

Figure 3: Access Operations for Different Tiling Schemes

Optionally, tiles can be compressed. Different compression schemes can be used for different cases.

In addition to paging, the classical hierarchical storage system has to be extended with tertiary devices to support storage of the huge amounts of data on hand. In RasDaMan, MDD objects will be transparently maintained on secondary or arbitrary tertiary storage devices; the organization of the data on secondary and tertiary devices will be chosen based on the characteristics of each device and typical access patterns, in an approach similar to those described in [8] and [20]. The tile based storage manager of RasDaMan will enable movement of tiles between secondary and tertiary storage.

2.3 Architecture

The RasDaMan API consists of RasQL and the C++ Raster Library (RasLib) which serves for the integration of the MDD type into the C++ language. To make MDD persistent, RasDaMan follows the ODMG-93 standard [7] through providing the pointer type d_Ref which behaves like a normal C++ pointer but is capable of managing persistent data. Hence, from a programmer's point of view the integration is seamless with no difference between transient and persistent data. RasQL provides declarative query functionality which goes beyond the scope of ODMG's OQL. Therefore, it extends the SQL-92 subset of OQL with natural extensions
for MDD access.

The client/server communication is based on the Distributed Computing Environment (DCE) of the Open Software Foundation (OSF) [21]. For transferring queries and their results over the net, the Client Communication Layer invokes DCE Remote Procedure Calls which are accepted by the Server Communication Layer.

The Query Evaluator parses the query and builds an operator based query tree. Optimization of the query takes place in two steps. First, algebraic query rewriting based on rules derived from the AFATL Image Algebra is done and second, physical optimization based on tiling, clustering, and device information takes place. The Query Evaluator is tile-based: operations on MDD items are decomposed into operations on tiles. In the physical query optimization and the tile-based execution, the Query Evaluator uses modules of the storage manager to get information concerning physical storage and to fetch the tiles into main memory. As queries are specified with the declarative RasML, the Query Evaluator has high flexibility to adapt query execution to a specific physical storage layout.

Different storage manager modules are used in different phases of query execution. To identify the tiles involved in a query and to calculate the costs to retrieve them, the Index Manager is consulted. The Catalog Manager takes care of schema information specified through a RasDL data definition, whereas the Device Manager is responsible for handling different storage media characteristics. The final execution plan is evaluated by retrieving tile sets from the Tile Manager and applying elementary image operations, e.g. spatial or induced operations, on them.

The modules of RasDaMan are based on the commercial ODBMS O2 [1] from O2 Technology. An interface layer
between RasDaMan modules and the base DBMS, the Storage Management Interface, is responsible for the storage and access to all data in secondary and tertiary storage. This prepares RasDaMan for easy portability between different base DBMSs and storage systems.

Figure 4: Simplified RasDaMan DBMS architecture

3 Application Areas

Due to the generality of the RasDaMan approach, it can be used for a wide variety of applications. Two application areas are being investigated in the project, medical imaging and GIS. Recently, OLAP has received our attention due to its related requirements (see Section 2).

In the medical environment, digital archival of patient data is becoming more and more standard [12, 18]. Data is produced in a wide variety of forms such as 1-D curves (e.g. ECG), 2-D images (e.g. Ultrasound), and 3-D volumetric data (e.g. Volume Computer Tomography). For interactive computer-supported consultations, the examiner needs advanced retrieval mechanisms, such as projections along different axes in volumetric data, cutouts and zooms, and associative search support for querying the database for particular medical phenomena (content based retrieval). STI, a Spanish software company, interfaces RasDaMan to a PACS (Picture Archiving and Communication System) which subsequently will be evaluated under real-life conditions by the Spanish Hospital General de Manresa.

In geographic applications [25], raster data is gaining more and more importance. With technological advances in memory, storage, and processing power, storing image data on a large scale is getting less expensive than vectorizing it. This data can naturally be modeled as 2-D MDD. 3-D and 4-D information arises, e.g., in climate simulations. Typical operations are selection by content ("less than 10% clouds") or recoding ("mark areas with temperature greater than 40 in red"). A RasDaMan-based GIS application pilot is being set up by the French software company TransExpert to be assessed by the Spanish National Geographic Institute, Centro Nacional de Informacion Geografica.

In OLAP [16], operational enterprise data are extracted into data warehouses to gain new strategic insights. To this end, the management user is presented with a multidimensional view which usually has a considerable number of dimensions, say, between four and ten. Consolidation and analysis queries stated by OLAP applications require CPU and disk intensive calculations such as roll up (aggregation along a specified dimension), slice and dice (projection and recombination) and pivoting (rotation). The feasibility of the RasDaMan approach for OLAP tasks is currently under investigation.

A uniform model for MDD, together with efficient storage, can be used as a solid basis for quickly developing applications. In all areas mentioned, complex operations on huge amounts of data will appear in the future; for example, satellite data archives are planned in Petabyte (1024 TB) size. Hence, only by using transparent integration of tertiary storage can these data sets be managed in the future. Provision of basic services close to the data source, i.e., the database, helps to efficiently execute such operations and to minimize network traffic. We, therefore, feel that an MDD DBMS is indispensable to meet the challenges posed by these application fields.

4 Related Work

With object-oriented [15] and object-relational DBMSs [23], the application programmer has the means available to implement abstract data types (ADTs) for MDD of fixed dimensions. However, no general MDD type is offered by such systems, and no MDD query language is available. Besides, it is not possible to provide a streamlined storage structure for MDD, let alone transparent tertiary storage support. One of the most prominent systems in this area is Illustra [23] which supports MDD through so-called DataBlades for 1-D time series and 2-D images. Internal storage of images, however, is still done in BLOBs. MDD of arbitrary dimensionality and optimized MDD storage structures are not supported.

MDD storage systems optimize the storage of large quantities of MDD by tiling. This approach was suggested in [10, 11] where an MDD storage manager based on arbitrary rectangular tiling is described. Later, another approach based on aligned tiling was described in [20] where some organization strategies for large multidimensional arrays on secondary and tertiary storage devices are presented. The OPTIMASS storage system [8] partitions multidimensional datasets into clusters based on device characteristics and an analysis of data access patterns. None of these systems is tightly integrated with a DBMS that has general-purpose MDD query capabilities.
Specialized DBMSs are dedicated to high level operations on data for a particular application area. Paradise [9] is an example of a DBMS designed for handling 2-D MDD in GIS applications. MDD are modeled in the object-relational model of SHORE [6] as ADTs. Efficient storage of the raster ADTs is provided in Paradise by tiling the data into a set of SHORE objects. Paradise does not support MDD of more than two dimensions, nor does it provide a general MDD query language.

5 Conclusion

We presented the RasDaMan approach to storage and retrieval of multidimensional discrete data with arbitrary size and dimensionality which is under development in the RasDaMan project. Based on a powerful mathematical model of imaging, a conceptual model for MDD has been developed and formalized which leans towards the relevant standards ODMG-93 and SQL-92. Making the full array semantics known to the DBMS allows for versatile, high-level query support with algebraic optimization. By combining tiling techniques adopted from imaging with geo indexing taken from geo databases, a novel storage architecture has been developed which is particularly well-suited for fast, efficient operations on extremely large MDD sets on secondary as well as tertiary storage. Therefore, besides the
gain in functionality over BLOB-based systems, a substantially improved performance is expected.

The implementation under way makes use of the commercially available object-oriented DBMS O2 to show the feasibility of integrating MDD services with conventional data types in a standards-driven manner; for this reason, ODMG conformance heavily influenced our choice for O2. Let us stress, however, that MDD concepts are not tied to the object-oriented approach; in fact, an experimental mapping to a relational DBMS has been undertaken [10, 11].

The next steps will consist of the completion of the system implementation and its evaluation through the PACS and GIS application pilots at the end user sites. In parallel, conceptual refinement of the query language as well as investigation of further storage management issues are planned. Also, a general benchmark for multidimensional databases will be developed and put into the public domain so that functionality and performance of such technology can be assessed and compared in an objective manner.

The best way to keep informed about the ongoing development of RasDaMan is to access our WWW pages:
http://www.forwiss.tu-muenchen.de/~rasdaman/
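
To make the tiling idea summarized in the conclusion more concrete, the following minimal Python sketch shows how a subset query on a tiled array only needs to touch the tiles that intersect the query box. It is an illustration under stated assumptions, not the RasDaMan storage manager: it assumes regular, axis-aligned tiles of a fixed shape and zero-based coordinates, and simple integer arithmetic stands in for the geo index. The function names `tiles_for_query` and `tile_count` are hypothetical.

```python
from itertools import product
from math import ceil


def tiles_for_query(domain_shape, tile_shape, lo, hi):
    """Return the index tuples of all regular tiles intersecting the box [lo, hi].

    Assumption: tiles are regular and axis-aligned, and both box corners are
    inclusive, zero-based cell coordinates inside the array domain.
    """
    if not (len(domain_shape) == len(tile_shape) == len(lo) == len(hi)):
        raise ValueError("all arguments must have the same dimensionality")
    per_axis = []
    for size, tile, a, b in zip(domain_shape, tile_shape, lo, hi):
        if not (0 <= a <= b < size):
            raise ValueError("query box must lie inside the array domain")
        # Tiles along this axis that overlap the interval [a, b].
        per_axis.append(range(a // tile, b // tile + 1))
    # The cross product over all axes gives the intersected tiles.
    return list(product(*per_axis))


def tile_count(domain_shape, tile_shape):
    """Total number of tiles needed to cover the whole array."""
    total = 1
    for size, tile in zip(domain_shape, tile_shape):
        total *= ceil(size / tile)
    return total


# Hypothetical 3-D array of 1024 x 1024 x 64 cells, cut into 256 x 256 x 16 tiles.
domain = (1024, 1024, 64)
tiles = (256, 256, 16)
hit = tiles_for_query(domain, tiles, lo=(200, 0, 10), hi=(300, 100, 20))
print(len(hit), "of", tile_count(domain, tiles), "tiles touched")
# prints "4 of 64 tiles touched"
```

In this example the query box straddles one tile boundary in the first and third dimensions, so only 4 of the 64 tiles would have to be read, whereas a BLOB-style layout would typically require scanning a much larger portion of the array.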
Acknowledgement

We gratefully acknowledge the valuable comments of our reviewers and Paul Dunleavy, which helped us improve this work. We would also like to thank our industrial partners STI s.a. and TransExpert e.i.g., and our end user partners Centro Nacional de Información Geográfica and Hospital General de Manresa, for their contributions to the project.

References

[1] F. Bancilhon, C. Delobel, P. Kanellakis: Building an Object-Oriented Database System. Morgan Kaufmann Publishers, San Mateo, CA, 1992.
[2] P. Baumann: Language Support for Raster Image Manipulation in Databases. Proc. Int. Workshop on Graphics Modeling and Visualization in Science & Technology, Darmstadt, Germany, April 1992.
[3] P. Baumann: On the Management of Multidimensional Discrete Data. VLDB Journal, 4(3), Special Issue on Spatial Database Systems, pp. 401-444, 1994.
[4] J. Boreczky, L. Rowe: Comparison of video shot boundary detection techniques. SPIE, 1996.
[5] S. Cannan, G. Otten: SQL - The Standard Handbook. McGraw-Hill, 1993.
[6] M. Carey, D. DeWitt, M. Franklin, N. Hall, M. McAuliffe, J. Naughton, D. Schuh, M. Solomon, C. Tan, O. Tsatalos, S. White, M. Zwilling: Shoring up Persistent Objects. Proceedings of the 1994 ACM-SIGMOD Conference, Minneapolis, Minnesota, 1994.
[7] R.G.G. Cattell: The Object Database Standard: ODMG-93. Morgan Kaufmann Publishers, 1996.
[8] L. Chen, R. Drach, M. Keating, S. Louis, D. Rotem, A. Shoshani: Efficient Organization and Access of Multidimensional Datasets on Tertiary Storage Systems. Information Systems Journal, April 1995.
[9] D. DeWitt, N. Kabra, J. Luo, J. Patel, J. Yu: Client-Server Paradise. Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994.
[10] P. Furtado, P. Baumann: Gestão de Informação Multidimensional Discreta em Bases de Dados. 5º Encontro Português de Computação Gráfica, Aveiro, Portugal, pp. 241-252, February 1993.
[11] P. Furtado, J. Teixeira: Storage Support for Multidimensional Discrete Data in Databases. Computer Graphics Forum - Special Issue on Eurographics'93 Conference, vol. 12, no. 3, pp. 89-100, September 1993.
[12] H. Garcia, D. Yun: Intelligent Distributed Medical Image Management. Proc. SPIE Medical Imaging Conference, pp. 80-91, February 1995.
[13] The International Organization for Standardization (ISO): Information Technology: Computer Graphics and Image Processing, Image Processing and Interchange, Functional Specification. Part 2: Programmer's Imaging Kernel System: Application Program Interface. ISO/IEC IS 12087-2, 1992.
[14] The International Organization for Standardization (ISO): Database Language SQL. ISO 9075, 1992.
[15] W. Kim (ed.): Modern database systems: the object model, interoperability, and beyond. ACM Press, 1995.
[16] R. Kimball: The Data Warehouse Toolkit. John Wiley & Sons Inc., 1996.
[17] R. Lorie: Issues in Databases for Design Transactions. In: J. Encarnação, F. Krause (eds.): File Structures and Databases for CAD. North Holland Publishing, 1982.
[18] R. Martinez, J. Kim, J. Nam, B. Sutaria: Remote Consultation and Diagnosis in a Global PACS Environment. Proc. SPIE Medical Imaging Conference, pp. 296-307, February 1993.
[19] G. Ritter, J. Wilson, J. Davidson: Image Algebra: An Overview. Computer Vision, Graphics, and Image Processing, 49(1), pp. 297-331, 1990.
[20] S. Sarawagi, M. Stonebraker: Efficient Organization of Large Multidimensional Arrays. Tenth Int. Conf. on Data Engineering, pp. 328-336, Houston, Feb. 1994.
[21] J. Shirley, W. Hu, D. Magid: OSF Distributed Computing Environment: Guide to Writing DCE Applications, 2nd Edition. O'Reilly & Associates, Sebastopol, CA, 1994.
[22] J. Smith, D. Smith: Database Abstractions: Aggregation and Generalization. ACM ToDS 2(2), 1977, pp. 105
[23] M. Stonebraker, D. Moore: Object-Relational DBMSs: The Next Great Wave. Morgan Kaufmann Publishers, 1995.
[24] H. Tamura: Image Database Management for Pattern Information Processing Studies. In: S. Chang, K. Fu (eds.): Pictorial Information Systems. Lecture Notes in Computer Science Vol. 80, pp. 198-227, Springer, 1980.
[25] C. Tomlin: Geographic Information Systems and Cartographic Modelling. Prentice Hall, 1990.
About the Authors

Peter Baumann is assistant head of the Knowledge Bases Research group within the Bavarian Research Centre for Knowledge-Based Systems (FORWISS). He leads the RasDaMan project as Technical Manager. Peter Baumann holds a PhD in CS from Technische Universität Darmstadt, Germany, and has published 21 reviewed papers and 22 technical reports and other publications in his scientific areas, databases and computer graphics, in particular multidimensional databases.

Paula Furtado holds a first degree and a Master's degree in Computer Science from the University of Coimbra, Portugal, where she has been an assistant since 1990. She has been working as a guest researcher at FORWISS in the RasDaMan project since October 1995. Her areas of interest include spatial indexes, multidimensional data storage and multimedia databases.

Roland Ritsch has been working as a research scientist at FORWISS Munich since December 1995. He studied computer science and economics at the Technische Universität Darmstadt, where he received his diploma in November 1995. His research areas are object-oriented and multidimensional databases, in particular design and optimization of declarative query languages.

Norbert Widmann studied computer science and economics at the Technische Universität München. Since December 1995 he has been working at FORWISS in the RasDaMan project. His areas of interest include software engineering, geographic information systems and benchmarking.