Automatic Physical Database Tuning Middleware for Web-based Applications

Jozsef Patvarczki⋆ and Neil T. Heffernan

Worcester Polytechnic Institute, Computer Science Department,
100 Institute Road, Worcester, Massachusetts, USA
{patvarcz,nth}@cs.wpi.edu
http://www.cs.wpi.edu/~patvarcz
Abstract. In this paper we conceptualize the database layout problem as a state space search problem. A state is a given assignment of tables to computer servers. We begin with a database and collect, for use as a workload input, a sequence of queries that were executed during normal usage of the database. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. We perform a time-intensive search over different table layouts and, at each iteration, physically create the configurations and evaluate the total throughput of the system. We report empirical results of two kinds. First, we empirically validate the heuristics that Database Administrators (DBAs) currently use when doing this task manually: tables with a high ratio of update, delete, and insert (UDI) to retrieval queries should be horizontally partitioned, while tables with a small ratio should be fully replicated. Such rules of thumb are reasonable, but we want to parameterize the common guidelines that DBAs use. Our second empirical result is that applying this search to our existing data test case produced a reliable increase in total system throughput. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput (and potentially reduce their investment in hardware). To make this search more practical, we want to learn reasonable rules that guide the search and eliminate many layout configurations that are unlikely to succeed. The second aspect of our project (not reported here) is to use the created configurations as input to a machine learning system, to create general rules about when to use the different layout operators.

Keywords: Database tuning, partitioning, layout search, Web-based applications
1 Introduction
Nowadays, Database Administrators have to be familiar with multiple available data stores to select the best fit for their Web-based applications. Even after a successful selection, scalability issues related to the database can become a bottleneck of the system, especially if requests are distributed among multiple database servers, which can lead to slow response times or even a system crash. Scalability issues and their possible solutions should be addressed automatically, without spending an enormous amount of time investigating the database structure or investing in expensive hardware solutions.

⋆ For the full list of our funders please see http://www.webcitation.org/5xp605MwY.

Our work is motivated by the scalability issues of Web-based applications. Our novel assumption is that we can increase the total throughput of a Web-based application by automatically creating database configurations that are capable of answering every incoming query template using a single node. Our second assumption is that, in the case of a Web-based application, we can know all the incoming query templates beforehand, because the user interacts through a Web interface such as web forms [9]. The application logic executes the same hard-wired queries over and over again for the same web form request. We conceptualize the database layout problem as a state space search problem where each state is a valid configuration of layouts across database nodes. We search for the configuration that maximizes the total system throughput and increases the performance of the application. Our system connects to the application database and conducts an automated state space search over the database layouts based on the given workload. The workload needs to contain all the incoming query templates of the application so that the system can create the join graph of the tables and determine the applicable operators. The join graph represents the connections between tables across all the queries. The system also examines the constraint graph of the tables of the database to validate the relationships of the involved tables.
The constraint graph represents the relationships between tables that are involved in the queries, such as one-to-one, one-to-many, and many-to-many relations. Based on the constructed graphs, the system performs an analysis to determine the groups of applicable partitioning operators. These groups serve as an input to the state space search, which can then eliminate many invalid layout configurations. A state Sn is considered valid if and only if the created layout configuration Li correctly answers every query Qi from the given workload W. To further increase the practicality of the state space search, we propose to learn general guiding rules that can, with high precision, eliminate valid states that have no impact, or a negative impact, on the overall throughput of the system. In Section 2 we describe related work and in Section 3 we outline our proposed solution. Our approach to the state space search, feature selection, and cut-off point detection is described in Section 4. Our middleware system is described in Section 5. Experimental results are discussed in Section 6, Section 7 summarizes our contribution, and in Section 8 we conclude the work and discuss future directions.
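The validity condition above can be sketched in a few lines. The data model is a hypothetical illustration, not the paper's implementation: a layout maps each database node to the set of tables (or table fragments) it hosts, and each query template is reduced to the set of tables it references.

```python
def single_node_answerable(query_tables, layout):
    """True if at least one node hosts every table the query touches."""
    return any(query_tables <= hosted for hosted in layout.values())

def is_valid_state(workload, layout):
    """A layout is valid iff each query is answerable by a single node."""
    return all(single_node_answerable(q, layout) for q in workload)

# Example: two nodes, 'users' replicated on both, 'logs' only on node2.
layout = {"node1": {"users", "problems"}, "node2": {"users", "logs"}}
workload = [{"users"}, {"users", "logs"}, {"users", "problems"}]
print(is_valid_state(workload, layout))  # True: every query fits on one node
```

A query touching both "problems" and "logs" would make this layout invalid, since no single node hosts both tables.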
2 Related Work

2.1 Parallel and Hybrid Solutions
MapReduce [7] is a programming model, with an associated implementation, for processing and generating huge amounts of data in a large-scale (hundreds to thousands of nodes), heterogeneous, shared-nothing environment. More recently, a marriage of MapReduce with DBMS technologies led to HadoopDB [1], a hybrid architecture. HadoopDB targets the performance and scalability of a parallel database together with the fault tolerance of the flexible MapReduce model to achieve better structured data processing. It uses Hadoop [22], the open-source implementation of MapReduce, to parallelize queries across nodes, and PostgreSQL [10] as the database layer that processes the translated SQL queries.

Parallel databases use shared-nothing infrastructures in a clustered environment and execute queries in parallel using multiple nodes. They scale up well if the number of involved nodes is small. In a heterogeneous environment the probability of a server failure is high, and parallel databases are not designed for use in large-scale heterogeneous environments. There is also a conceptual difference between parallel database management systems (DBMSs) and MapReduce-based systems such as Hadoop: with a DBMS, users state what they want in SQL, whereas with MapReduce-like systems users present an algorithm specifying what they want in a low-level programming language [17].

Vertica's Analytic Database [27] utilizes cheap shared-nothing commodity hardware and is designed for large-scale data warehouses. It uses a column-store architecture where each column is independently stored on different nodes. It applies vertical partitioning to the original dataset to create multiple partitions that can be replicated across cluster nodes. It is used mostly for read-intensive analytical applications where the system has to access a subset of columns. Vertica's optimizer is designed to operate on this column-partitioned architecture to reduce I/O cost dramatically.
The optimizer of HadoopDB, in contrast, has neither full support for joins nor cost-based query optimization. Netezza's [6] parallel system is a two-tiered system capable of handling very large queries from multiple users. Teradata [25] supports single-row manipulation, block manipulation, and full-table or sub-table manipulation as well. It distributes the data randomly, utilizing all the nodes, and its built-in optimizer can handle sophisticated queries, ad-hoc queries, and complex joins. IBM DB2's data partitioning tool [21] suggests possible partitions based on the given workload and the frequency of each SQL statement's occurrence. The main goal of IBM DB2's partition selection strategy [29] is, for a given static database schema and workload characteristic, to minimize the overall response time of the workload across multiple nodes. There are several other parallel databases available, such as Exadata [15] (the parallel database version of Oracle), MonetDB [26], ParAccel [12], InfoBright [24], Greenplum [8], NeoView [28], and Dataupia [5], all of which combine different techniques to achieve better performance and reliability. To the best of our knowledge, none of them conducts a state space search involving the full replication, horizontal and
vertical partitioning, and de-normalization operators while answering each query using a single database node.

2.2 Automated Physical Design Solutions
While there has been work in the area of automating physical database design [21, 29], we are not aware of any work that incorporates the full range of common operators and can learn rules to better partition a given database across multiple database nodes. Several papers [19, 29, 2, 20] study full replication and data consistency using distributed heavyweight transactions. For example, [3] automatically selects an appropriate set of materialized views and indexes to optimize the physical design for a given database and query workload, as a built-in tool of Microsoft SQL Server 2000 using a single node; it tackled the closely related problem of automatically determining which indexes and materialized views a system should use. The follow-up work [2] added a new operator, horizontal partitioning, to the same optimization goal in Microsoft SQL Server 2005, which offers an automated design tool producing physical design recommendations for horizontal partitioning. IBM DB2 [21] examined the problem of laying out data across multiple nodes using many of the same operators we propose. GlobeTP [9] exploits the same fact that our system does: the workload of a Web application is composed of a small set of query templates. The approach in [13] describes two common properties of Web-based applications; according to their strong assumption, the workload (1) is dominated by reads and (2) consists of a small number of query and update templates (typically between 10 and 100). Using the second assumption, their system provides strong consistency management of the servers. DBProxy [4] observed that most applications issue template-based queries that share the same structure but contain different string or numeric constraints. AutoPart [14] deals with large scientific databases where continuous insertions limit the applicability of indexes and materialized views. For optimization purposes, their algorithm horizontally and vertically partitions the tables of the original large database according to a representative workload, using a single node. Ganymed [18] uses a novel scheduling algorithm that separates update and read-only queries across multiple nodes. GlobeDB [23] offers a different approach, in which edge servers handle data distribution; their system automatically partitions and replicates the database over a wide-area network using multiple nodes.
3 Proposed Solution
Our middleware architecture is also based on shared-nothing commodity hardware where each node has its own CPU, disk, RAM, and file system. We focus on Web-based applications whose workload consists of a fixed number of query templates, which means the system is not presented with ad-hoc or unexpected queries. To the best of our knowledge, none of the existing systems
specializes in Web-based applications and treats the database layout problem as a state space search problem under the assumption that every incoming query should be answered by a single node. Our system is also novel because the state-based search applies four different operators: full replication, horizontal and vertical partitioning, and de-normalization. Since we know all the query templates beforehand, our system can pre-partition the data using these operators and pre-determined heuristics. Compared to HadoopDB, we do not need to re-partition the data since we do not have unexpected or ad-hoc queries. By characterizing the problem as a state space search over database layout configurations, we iteratively minimize the total cost of the workload by creating different database layouts, and thereby increase the total system throughput. Moreover, the second aspect of the project is a system with a built-in corpus of generalized, machine-learned rules that determine which operator is applicable and when. As soon as the layout is determined, the data is distributed across the server nodes. A central dispatcher (similar to the HadoopDB catalog) maintains statistics about the current layout (table descriptors, data part locations, etc.). All the joins are pre-computed in our middleware and the communication bottleneck is eliminated. Our central dispatcher can push each query directly into the database layer, where the well-defined schemas support indexing. We natively support SELECT (with joins), INSERT, UPDATE, and DELETE SQL statements. We do not have an additional failure detection mechanism, but the system is easily extendable with a full copy of the original database. Furthermore, a possible integration with Hadoop would grant an additional layer that is capable of scaling up to thousands of nodes as needed.
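A minimal sketch of such a dispatcher catalog follows. The data model (table name, producing operator, part-to-node map) is our own illustration of what the catalog must record, not the paper's actual implementation.

```python
class Catalog:
    """Tracks, per logical table, which operator produced its parts
    and on which database node each part lives."""

    def __init__(self):
        self.entries = {}  # table -> {"operator": str, "parts": {part: node}}

    def register(self, table, operator, parts):
        self.entries[table] = {"operator": operator, "parts": dict(parts)}

    def nodes_for(self, table):
        """All nodes that hold some part of the table."""
        return sorted(set(self.entries[table]["parts"].values()))

catalog = Catalog()
catalog.register("users", "horizontal", {"users_p0": "node1", "users_p1": "node2"})
print(catalog.nodes_for("users"))  # ['node1', 'node2']
```

With such a map in hand, the dispatcher can route a query touching only "users" to either node without any cross-node communication.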
4 State Space Search over Layouts
We consider the database layout problem as a state space search problem with the assumption that all incoming queries should be answered by a single node. We perform a time-intensive search over different layouts and, each time, physically create the configurations and evaluate the total throughput of the system. A state is a given assignment of tables to computer servers. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput. Figure 1 shows an initiated complete search. In the start state (state 0) we fully replicate all tables across all database nodes and measure the total throughput of the system using the given workload. The system provides breadth-first and depth-first search algorithms to traverse the search tree; the default is depth-first search. We traverse down an entire path (states 1, 2, 3, and 4) before backtracking to the next valid path (states 2, 5, and 6). A state Sn is considered valid if and only if the created layout configuration Li correctly answers every query Qi from the given workload W. As soon as a valid state is created, the system measures the system throughput.

Fig. 1. State Space Search

As one of the guiding rules, we backtrack to the next valid path if the throughput of a child is less than the throughput of its parent. For example, if the throughput of state 18 is less than the throughput of state 17, then we will not explore states 19, 20, 21, and 22. To further increase the practicality of the state space search, we propose to learn general guiding rules that can eliminate valid states that are unlikely to succeed because they have a negative impact, or none at all, on the overall throughput of the system. We parameterize the common guidelines that DBAs use as heuristics when doing this task manually. Table 1 shows the features we applied to parameterize guidelines for the search. We divided the features into three categories: table-related, query- and workload-related, and state-related features. Table-related features such as table size, distinct values, number of columns, etc. help to pre-select the applicable operators for a specific table. The "Is schema heavy?" feature considers a table schema heavy if one of the table's column types can generate high database memory and caching demand. If a column type is text, byte, xml, etc. and the frequency of the column-specific retrieval query is high, then one should help the database share the memory requirements among multiple nodes. The detection of 'WHERE' conditions can help identify a possible operator and a partitioning key. We do not consider the horizontal partitioning operator for tables that are involved in queries without a 'WHERE' clause: a query such as 'SELECT * FROM table_a', for example, could break our assumption that we can answer each query using a single node. Although we rule out applicable operators when specific query templates are involved, this assumption reduces the intercommunication cost between nodes because each join is pre-computed and pushed into a single data node. Query and workload features are also very important for determining exact cut-off points.
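The depth-first traversal with the parent-throughput pruning rule can be sketched as follows. The integer states, the child generator, and the throughput lookup are all toy stand-ins for the real operator applications and benchmark runs.

```python
def dfs_search(state, parent_throughput, children, measure_throughput, best):
    """Depth-first layout search; prunes any child whose measured
    throughput falls below its parent's (the paper's guiding rule)."""
    throughput = measure_throughput(state)
    if throughput < parent_throughput:
        return best  # backtrack: this subtree is not explored
    if best is None or throughput > best[1]:
        best = (state, throughput)
    for child in children(state):
        best = dfs_search(child, throughput, children, measure_throughput, best)
    return best

# Toy search tree: state 2 scores below its parent 0 and is pruned.
scores = {0: 10, 1: 12, 2: 8, 3: 15}
kids = {0: [1, 2], 1: [3], 2: [], 3: []}
best = dfs_search(0, float("-inf"), lambda s: kids[s], scores.__getitem__, None)
print(best)  # (3, 15): the 0 -> 1 -> 3 path wins
```

In the real system each call to `measure_throughput` involves physically creating the layout and replaying the workload, which is what makes the search expensive.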
For example, a table with few tuples is not worth horizontally partitioning unless the ratio of table-related inserts to select queries is high enough. As another example, consider the frequency of the queries in the workload (both overall and per table) and determine a cut-off point that characterizes the application as write-intensive (a high number of UPDATE, DELETE, and INSERT queries) or read-intensive (a high number of SELECT queries). With the determination of cut-off points we can pre-determine
Table 1. Table, query, workload, and state features used by the system.

Table Features: table size; primary key; foreign key; # of indexes; # of distinct values; # of columns; is schema heavy?; # of INSERTs; # of UPDATEs; # of DELETEs; # of table accesses.

Query & Workload Features: UDI vs. R query ratio; % of UDI and R queries for table A; % of queries of table A with a "WHERE" condition; % of "JOINs" on table A compared to other JOIN queries; % of queries that involve a JOIN and table A compared to all queries; % of queries that involve table A compared to other queries; % of queries that involve table A and other tables compared to all queries; % of queries that involve table A's columns compared to other queries; % of queries that involve table A's columns and other tables' columns compared to other queries; the frequency of UDI and R queries.

State Features: full replication; horizontal partitioning; vertical partitioning; de-normalization; total throughput.
heuristics, guide a parameterized state space search better, and eliminate valid states that otherwise would not increase the system throughput.

4.1 Cut-off Points Detection
Cut-off points help the state space search focus on creating layout configurations that could boost the performance of the application. They reduce the size of the search space by eliminating valid table-operator pairs that would likely have a negative performance effect on the system. Figure 2 presents an example of cut-off point detection. ASSISTments [11] is a Web-based Intelligent Tutoring System. In this system the number of sessions can be balanced among application servers, but the continuous database retrieval queries and update, delete, and insert (UDI) queries decrease the system response time significantly. Currently, the ASSISTments system supports thousands of users across Massachusetts. It consists of multiple application servers, a load balancer, and a database server. We collected real-time queries over a one-week interval and constructed a workload that reflects a heavy tutoring day. The workload contains all the query templates of the system (136) and 50,000 queries, and the frequency of the queries reflects the real system's workload characteristics. We determined the cut-off points for the select vs. insert, select vs. update, and select vs. delete query ratios. We removed all the UDIs from the workload and replaced them with retrieval queries to maintain a total of 50,000 queries. The system fully replicated all the tables and created multiple partitions across three database nodes. We simulated three
Table 2. Two-tailed T-TEST results for retrieval vs. update ratio.

R/U ratio      Layout            Run 1[s]   Run 2[s]   Run 3[s]   AVR[s]     T-TEST p-value
75%/25%        Full Replication  4080.551   4089.372   4085.493   4085.139   less than 0.01
               Horizontal Part.  4624.573   4630.825   4632.409   4629.269
81.25%/18.75%  Full Replication  5416.133   5452.314   5421.186   5429.878   0.6394
               Horizontal Part.  5501.191   5450.292   5393.945   5448.476
87.5%/12.5%    Full Replication  5169.829   5170.803   5176.965   5172.532   less than 0.01
               Horizontal Part.  5195.582   5194.494   5206.076   5198.717
93.75%/6.25%   Full Replication  4913.791   4911.573   4925.373   4916.912   0.1231
               Horizontal Part.  4914.215   4914.195   4927.423   4918.611
100%/0%        Full Replication  4401.551   4409.372   4402.493   4404.472   less than 0.01
               Horizontal Part.  4673.159   4660.961   4675.263   4669.794
Fig. 2. Cut-off points of Horizontal Partitioning and Full Replication (upper: retrieval vs. insert query ratio with detected cut-off point; lower: retrieval vs. update query ratio with zoomed cut-off range).
application servers, each issuing the 50,000 queries, and captured the total throughput of the system (Time[s]), averaging the results. We turned off the cache of the PostgreSQL servers to eliminate the effect of query caching. After the initial measurements we increased the UDI query ratios in the workload. We measured each state three times to be able to determine whether the results were significantly different, and we found significant differences in each case. Table 2 presents one of the two-tailed T-TEST results. We zoomed into the 0%-75% interval to make the cut-off points precise. Based on the results, we can see that it is worth fully replicating a table if the UDI vs. retrieval query ratio is less than or equal to 6%, and it is worth considering horizontal partitioning if this ratio is greater than 18%. The average of the standard deviations is +/- 9.7. We also tried changing the total number of queries in the workload from 50,000 to 100,000 (the system response time increased), but the cut-off points did not change significantly.
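The measured cut-offs translate into a simple pre-selection rule. The thresholds below are the ones reported above; the band between them is deliberately left to the full state space search, since neither operator was a clear winner there.

```python
def suggest_operator(udi_queries, retrieval_queries):
    """Pre-select an operator from the measured UDI-ratio cut-off points:
    replicate at <= 6% UDI share, horizontally partition above 18%,
    and fall back to the full search in between."""
    ratio = udi_queries / (udi_queries + retrieval_queries)
    if ratio <= 0.06:
        return "full_replication"
    if ratio > 0.18:
        return "horizontal_partitioning"
    return "search"  # no clear winner: let the state space search decide

print(suggest_operator(5, 95))   # full_replication
print(suggest_operator(25, 75))  # horizontal_partitioning
print(suggest_operator(10, 90))  # search
```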
5 The Middleware
In this section we describe our middleware and the process of the state space search over different layout configurations. Figure 3 illustrates the high-level architecture of the system and its main inputs. The workload specifies the sequence of queries that were collected during normal usage of the database; it needs to contain all incoming query templates of the application. Constraints represent the relationships between tables; we consider one-to-one, one-to-many, and many-to-many relations. The placement algorithm determines the applicable operators, partitions, and partitioning keys based on the workload and constraints. The system also takes as input the application's source database and the available database nodes for partitioning, described by the nodes' addresses and database connector strings. Figure 4 shows the modularized architecture of the
Fig. 3. High-level architecture of the Database Tuning Middleware
system. The Data Placement Algorithm (DPA) [16] is responsible for determining the valid sets of tables and keys for each operator based on the given
workload, constraints, and node information. As soon as the algorithm creates the sets, it passes them to the State Space Search Module (SSSM) (Section 4). This module is the heart of the search. When the SSSM determines a valid state, it contacts the Layout Module (LM) to initiate the creation of the physical configuration with the selected operator. The LM stores information about created configurations in the Layout Bank (LB), so it first asks the LB for the required configuration. If the LB has no previous information about the requested layout, the LM starts the layout generation process. First, it connects to the source database to initialize layout generation. Then it collects information about the source table, e.g. Users (column names and types, indices, triggers, etc.), and generates the new table schema (table name + applied operator + new table identifier + partitioning key) for the given database nodes (e.g. Users_HP_2abc4_id). In the next step the LM creates the new tables on each node using the cloned table information. With a pre-defined hash function, it distributes the tuples among the database nodes before sending the layout information to the LB. When layout creation finishes, the SSSM initiates a test request to measure the throughput of the created state. The Tester Module (TM) simulates the real-world setting with multiple application servers; each simulated application server uses the workload to generate hundreds or thousands of requests for the back-end. The Database Connector (DCM) and Linker (DLM) modules are responsible for initializing and maintaining the database connections towards the available database nodes. The Query Analyzer Module (QAM) parses the queries in the workload and rewrites them to replace the original table names with the partitioned ones (e.g. replacing table Users with Users_HP_2abc4_id). It uses the same hash function as the LM to determine the correct database nodes for data retrieval.
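The QAM/LM interplay described above can be sketched as follows: rewrite a template's table name to the partitioned table and pick the target node with the same hash function that was used when the tuples were distributed. The partitioned table name, the plain string substitution, and the CRC-based hash are illustrative assumptions, not the middleware's exact scheme.

```python
import zlib

def node_for_key(key, nodes):
    # Deterministic hash placement, shared by layout creation and routing.
    return nodes[zlib.crc32(str(key).encode()) % len(nodes)]

def rewrite_query(sql, table, partitioned_table):
    # The real QAM parses SQL; substitution is enough for this sketch.
    return sql.replace(table, partitioned_table)

nodes = ["node1", "node2", "node3"]
sql = "SELECT * FROM Users WHERE id = 42"
print(rewrite_query(sql, "Users", "Users_HP_2abc4_id"))
# -> SELECT * FROM Users_HP_2abc4_id WHERE id = 42
print(node_for_key(42, nodes) in nodes)  # True
```

Because both sides hash the partitioning key the same way, a rewritten query always lands on the node that actually stores the matching tuples.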
Once the data is laid
Fig. 4. Modules of the Database Tuning Middleware
out on the database servers, the LM updates the middleware’s Query Router (QR) about the new changes in the layout configuration [16]. The QR maintains multiple connections to the database nodes, routes queries to the correct node, and transfers results back to the requestor. As soon as a new configuration is laid
out, the Web-based application can connect to the QR without needing any code modification. The application sees the QR as a single database node that manages, and hides, the configuration differences across the multiple underlying databases.
6 Experimental Results
Our empirical result is a reliable increase in total system throughput, obtained by applying a hill-climbing search to our existing data test case (the ASSISTments Intelligent Tutoring System). We understand that heuristic searches converge to local optima, but this is an acceptable compromise to achieve a faster search. We collected real-time queries over a one-week interval and constructed a workload that reflects an average tutoring day. The workload had all the query templates of the system (136) and consisted of 1000 queries; the frequency of the queries reflected the real system's workload characteristics. Our three database nodes had the following configuration: nodes 1, 2, and 3 are each an Intel Xeon 4-core CPU with 8 GB RAM running FreeBSD 7.1 i386. The database software used on all nodes is PostgreSQL version 8.2.13. We turned off the cache of the PostgreSQL servers to eliminate the effect of query caching. We utilized our query router and used three simulated application servers, each issuing 1000 queries. The router was an Intel Pentium 4, 3 GHz machine with 4 GB RAM running Ubuntu 4.1.2 OS. The code for this middleware is written in Python version 2.6. The bandwidth between the query router and the database nodes is 100 Mbps, and the number of hops from the query router to each database server is equal. Each simulated application server repeated the throughput measurement 4 times, the entire measurement was repeated 3 times, and we took the average of the three repeated executions. The experiment ran for 48 hours. The state space search considered 168 valid states and eliminated 155 that had no effect on the total throughput of the system. Figure 5 shows the results. We had 11 distinct operators. Each state is significantly different from the initial state 0, in which we fully replicated all 44 tables across the nodes. We found that state 10 gave the most significant throughput improvement of the system, 10%.
7 Contributions
The contribution of this paper is a methodology which can help Web-based applications that deal with scalability problems. The typical scaling bottleneck of an application is on the database side. We propose a simple solution to resolve this issue, making certain assumptions that simplify the task, and a search methodology by which we search for better database layouts. One of the assumptions we impose upon ourselves is that queries should be answerable by a single database node. This assumption simplifies the processing of individual queries to the databases. Of course, DBAs sometimes want to write queries that go across all database nodes and involve multiple tables, e.g. for analytic purposes, but these analytic queries
Fig. 5. Experimental results
can be executed as background tasks by the DBAs. Furthermore, we built a general middleware that can be used by any Web-based application. We established two empirical results. First, we verified and parameterized the common assumption that one should horizontally partition a table if the ratio of UDIs to retrieval queries is greater than 18%, and fully replicate it if the ratio is at most 6%. Second, we reported our experimental results using a real application workload and performing the layout search over multiple database nodes. By conceptualizing the problem as a state space search problem and by doing a full state space search, we were able to physically create the layouts and evaluate the overall throughput of the system to parameterize the guiding rules. With our methodology we were able to increase the overall system throughput by 10%. We also identified other features that can be important for guiding the search. We propose to use them as input to a machine learning system to create general rules about when to use the different layout operators.
8 Future Work
We would like to use our middleware to find more interesting rules involving the proposed features of tables, queries, workloads, and states. We propose to use these features as input to a machine learning system that recommends when to use the different layout operators. Our hypothesis is that we can learn further rules that capture human-like expertise and use them to better partition a given database. We also propose to evaluate our solution against many different web applications to illustrate the benefits of our approach.
References

1. A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow., 2:922-933, August 2009.
2. S. Agrawal, S. Chaudhuri, L. Kollar, A. Marathe, V. Narasayya, and M. Syamala. Database Tuning Advisor for Microsoft SQL Server 2005. In Proceedings of VLDB, pages 1110-1121, 2004.
3. S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In Proceedings of the 26th International Conference on Very Large Data Bases, VLDB '00, pages 496-505, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
4. K. Amiri, S. Park, R. Tewari, and S. Padmanabhan. DBProxy: A dynamic data cache for Web applications. In IEEE Int'l Conference on Data Engineering (ICDE), Bangalore, India, March 2003.
5. Dataupia. Dataupia Satori Server. http://www.dataupia.com/pdfs/productoverview/Dataupia%20Product%20Overview.pdf, 2008.
6. G. S. Davidson, K. W. Boyack, R. A. Zacharski, S. C. Helmreich, and J. R. Cowie. Data-centric computing with the Netezza architecture. SANDIA REPORT, Unlimited Release, April 2006.
7. J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In OSDI '04: Proceedings of the 6th Symposium on Operating Systems Design and Implementation. USENIX Association, 2004.
8. Greenplum. Greenplum Database 3.2 administrator guide. http://docs.huihoo.com/greenplum/GPDB-3.2-AdminGuide.pdf, 2008.
9. T. Groothuyse, S. Sivasubramanian, and G. Pierre. GlobeTP: Template-based database replication for scalable web applications. In Int'l World Wide Web Conf. (WWW), Alberta, Canada, May 2007.
10. PostgreSQL Global Development Group. PostgreSQL. http://www.postgresql.org, 2011.
11. N. T. Heffernan, T. E. Turner, A. L. N. Lourenco, M. A. Macasek, G. Nuzzo-Jones, and K. R. Koedinger. The ASSISTment Builder: Towards an analysis of cost effectiveness of ITS creation. In FLAIRS, Florida, USA, 2006.
12. A. MacFarland. The speed of ParAccel's data warehousing solution changes the economics of business insight. http://www.clipper.com/research/TCG2010008.pdf, 2010.
13. C. Olston, A. Manjhi, C. Garrod, A. Ailamaki, B. M. Maggs, and T. C. Mowry. A scalability service for dynamic web applications. In Proc. CIDR, pages 56-69, 2005.
14. E. Papadomanolakis and A. Ailamaki. AutoPart: Automating schema design for large scientific databases using data partitioning. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pages 383-392. IEEE Computer Society, 2004.
15. Oracle White Paper. A technical overview of the Sun Oracle Database Machine and Exadata Storage Server. http://www.oracle.com/technetwork/database/exadata/exadata-technical-whitepaper-134575.pdf, 2010.
16. J. Patvarczki, M. Mani, and N. Heffernan. Performance driven database design for scalable web applications. In Proceedings of the 13th East European Conference on Advances in Databases and Information Systems, ADBIS '09, pages 43-58, Berlin, Heidelberg, 2009. Springer-Verlag.
17. A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD '09: Proceedings of the 35th SIGMOD International Conference on Management of Data, pages 165–178, New York, NY, USA, 2009. ACM.
18. C. Plattner and G. Alonso. Ganymed: Scalable replication for transactional web applications. In ACM/IFIP/USENIX Int'l Middleware Conference, Toronto, Canada, October 2004.
19. C. Plattner, G. Alonso, and M. T. Özsu. DBFarm: A scalable cluster for multiple databases. In Middleware, pages 180–200, 2006.
20. R. Ramamurthy, D. J. DeWitt, and Q. Su. A case for fractured mirrors. In International Conference on Very Large Databases, pages 430–441. Morgan Kaufmann Publishers, 2002.
21. J. Rao, C. Zhang, G. Lohman, and N. Megiddo. Automating physical database design in a parallel database. In Proc. 2002 ACM SIGMOD, pages 558–569, 2002.
22. J. Shafer, S. Rixner, and A. L. Cox. The Hadoop distributed filesystem: Balancing portability and performance. April 2010.
23. S. Sivasubramanian, G. Pierre, and M. van Steen. GlobeDB: Autonomic data replication for Web applications. In Int'l World Wide Web Conf. (WWW), Chiba, Japan, May 2005.
24. D. Ślęzak and V. Eastwood. Data warehouse technology by Infobright. In Proceedings of the 35th SIGMOD International Conference on Management of Data, SIGMOD '09, pages 841–846, New York, NY, USA, 2009. ACM.
25. S. Clarke. Butler Group research paper. http://www.teradata.com/library/pdf/butler_100101.pdf, October 2000.
26. M. Vermeij, W. Quak, M. Kersten, and N. Nes. MonetDB, a novel spatial column-store DBMS. In Academic Proceedings of the 2008 Free and Open Source for Geospatial (FOSS4G) Conference, OSGeo, pages 193–199, 2008.
27. Vertica. The Vertica Analytic Database: Introducing a new era in DBMS performance and efficiency. http://www.redhat.com/solutions/intelligence/collateral/vertica_new_era_in_dbms_performance.pdf, 2009.
28. R. Winter. HP Neoview architecture and performance. http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA2-6924ENW.pdf, 2009.
29. D. C. Zilio, A. Jhingran, and S. Padmanabhan. Partitioning key selection for a shared-nothing parallel database system. IBM Research Report RC, 1994.