Monitoring the execution of query plans

Anastasios Gounaris, Norman W. Paton, Alvaro A.A. Fernandes, Rizos Sakellariou
University of Manchester
Technical Report, June 2003

1 Introduction

Monitoring the execution of a query plan is the activity concerned with the collection of information that becomes available from the completed or on-going parts of the execution. Monitoring can be classified into three categories according to the transmission of data and messages related to the monitoring procedure. First, monitoring can be performed at the level of the physical operators that comprise the query plan. In that case, no data is conveyed, and consequently there is no communication overhead. Second, monitoring may require a set of operators of the query plan on a node to communicate with each other. In that case, the communication overhead can remain very low. Third, operators or sets of operators on different nodes may send data to each other in order to monitor aspects of the query plan. The third case applies to situations in which the query plan is executed in a parallel or distributed setting. In the last two cases, the monitoring incurs a communication overhead; in all cases there is a CPU overhead. In order to have a complete view of the quality of the execution of the query plan, the execution environment needs to be monitored as well as the query plan itself (which can be seen as a fourth case of monitoring).

In this report, the focus is on the first case of monitoring, in which no data is conveyed. The improvements in the quality of the derived information when monitoring falls into the other two categories are also discussed.

The structure of the report is as follows. Section 2 discusses how monitoring fits into the context of query processing. Section 3 deals with the different frequencies at which monitoring can occur. Section 4 analyses the information that can be captured by observing the execution of the physical operators in a query plan and discusses the overhead of collecting this information. In Section 5, a case study of monitoring Polar* is presented.
The capability to make predictions based on monitoring information, the overhead incurred, and the accuracy of such predictions are examined in that section. Section 6 presents how existing adaptive query processors perform monitoring and how this relates to the proposed method. Section 7 deals with the extra properties that can be determined when data is shared between operators on a single node or is transmitted to other nodes. Section 8 briefly discusses implementation issues.


2 What to monitor?

In this work, monitoring the execution of a query plan is part of a broader procedure that aims at ensuring the good quality of the plan chosen for evaluation by the system. This procedure consists of three phases. In the first phase, the monitoring takes place: the system gathers information about the query plan while it is being executed. This information belongs to two categories, according to the way it is obtained. Information that can be directly measured from the query plan's execution belongs to the first category, while information created by any kind of manipulation of information in the first category belongs to the second. In this way, raw monitored data (or measurements) are distinguished from higher-level (or derived) information.

An orthogonal classification of the monitored information is shown in Figure 1. Monitored information can be examined at three levels of abstraction. In terms of accuracy, information can be either approximate or accurate. For example, the raw information about the number of tuples produced by a physical operator can easily be made accurate by using a simple counter, in contrast with timings, which are usually approximations. In terms of time of creation, information can be produced either after the monitored subject has finished its execution or during its execution. For example, the selectivity of a particular physical operator in a given system can only be known accurately after the physical operator has finished its execution, whereas it is possible to estimate values of the selectivity during the execution of the physical operator. Also, the information produced during the execution of the monitored subject can either refer to its final state, as in the above example, or refer to its state at the point of monitoring.

[Figure 1 here: a tree with root "Monitored Information", branching into: accurate / approximate; created after execution / during execution; and, for information created during execution, referring to the current state / the final state.]

Figure 1: Classification of monitored information

In the second phase of the broader procedure, the assessment of the information gathered in the previous phase takes place. During that assessment, the monitored information is checked against system constants or against other monitored information, in order to decide whether any action needs to be taken. In this way, the information gathered is used for the qualitative analysis of the execution of the query plan. In the final phase, the response actions occur. Examples of possible actions are the modification of the query plan and the calibration of the cost model employed by the system, to make it more accurate.
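The three-phase procedure (monitor, assess, respond) can be sketched as a simple control loop. The sketch below is illustrative and not taken from the report: the deviation threshold, the action names and the use of selectivity as the checked quantity are all assumptions.

```python
# Illustrative sketch of the three-phase procedure: monitor -> assess -> respond.
# The 2x deviation threshold and the action names are assumptions.

def monitor(raw):
    """Phase 1: keep raw measurements and derive higher-level information."""
    info = dict(raw)
    if info["tuples_in"] > 0:
        # derived information: selectivity from two raw counters
        info["selectivity"] = info["tuples_out"] / info["tuples_in"]
    return info

def assess(info, estimates, threshold=2.0):
    """Phase 2: compare monitored values against the optimiser's estimates."""
    actual = info.get("selectivity", 0.0)
    est = estimates["selectivity"]
    ratio = max(actual, est) / max(min(actual, est), 1e-9)
    # Phase 3 (the response) would act on the suggestions returned here.
    return ["calibrate-cost-model", "consider-replanning"] if ratio >= threshold else []

info = monitor({"tuples_in": 1000, "tuples_out": 900})
print(info["selectivity"])                 # 0.9
print(assess(info, {"selectivity": 0.1}))  # both actions suggested
```

The point of the separation is that phase 1 is cheap and always on, while phases 2 and 3 can run at a much lower frequency.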


The reader should note that monitoring can be used for other purposes as well, such as obtaining feedback on execution progress, and accounting.

2.1 Linking measurements to system objectives

Measurement need: The query processor needs insight into the execution of a query plan.

Objectives of the query processor: The query is evaluated in an optimal or near-optimal way, i.e. with the least cost. Under different circumstances, this may mean that the query is evaluated as quickly as possible, as quickly as possible within a certain budget, or in the way that best fits the user's time and budget preferences.

Methods:
- Development of a mechanism to estimate the cost of a query plan (in time units, resource usage units, economic cost units, etc.)
- Development of a mechanism to choose the least expensive query plan
- Review of actual versus estimated cost, and replanning of the query if necessary
- Check for new resources that have become available, and replanning of the query if necessary

Measurement issues:
- How are deviations from the predicted cost handled?
- How are changes in the pool of available resources handled?
- If the estimated cost is based on assumptions, are they realistic? What happens if they are not?

Measurement goals:
- To monitor the availability of resources in order to make decisions with respect to the need for query replanning
- To monitor the actual versus the estimated cost of the query plan in order to make decisions with respect to the need for query replanning
- To monitor the availability of resources in order to make decisions with respect to the refinement of the cost estimation procedure
- To monitor the actual versus the estimated cost of the query plan in order to make decisions with respect to the refinement of the cost estimation procedure

Questions:
- What resources are available?
- What was the cost so far? What is the expected cost from now on? Can it be improved?
- Were the actual and the estimated costs the same? If not, what was wrong in the way of estimating?

Things to be measured:
- Information about the resources that can be handled by the query processor (e.g., CPU load, memory, temporary disk space, disk I/O bandwidth, network bandwidth, permanent storage space for each participating physical node [30, 32, 39, 38, 29, 19])
- Cost of the query plan or subplans (e.g., total execution time, economic cost, resource usage)
- Statistical information that can be handled by the query processor (e.g., cardinality of operator inputs and outputs, distribution of attributes and selectivity of operators [21, 40, 46, 31, 9, 20, 37, 24, 11, 30, 13], availability of indices [24, 19], statistics about user-defined functions [21, 23, 12], access cost to remote sources [50], cost to initiate subqueries to remote sources [21, 40], etc.)

Table 1: Example of the identification of useful metrics after identifying the monitor's objectives

Linking measurements to system objectives is a well-established method for monitoring systems and processes, formalised by [6], who introduced Goal-Question-Metric graphs. Based on that work, [33] proposed some initial steps for designing a monitor. These steps are as follows:

Preliminary step: The need for the measurements, the objectives that motivate conducting the measurements, the methods for achieving those objectives, and the issues raised by the measurements are identified in this step. The purpose of this step is, first, to ensure that the system processes and the measurement process share common and feasible objectives, and, second, to ensure that the measurement issues refer to traceable processes and relate to the system methods.

Measurement goals: The issues derived in the preliminary step are mapped to quantifiable and unambiguous measurement goals.

Questions: For each measurement goal, a list of questions is identified. The answers to these questions give insight into the achievement of the goals.


Things to be measured: The set of measurements is defined in such a way that the questions can be answered.

Following that approach, the set of measurements is defined in a top-down way, which shields the system from collecting useless information. Applying this approach to the monitoring of query execution helps to clarify the context and to state clearly the objectives of monitoring, as shown in Table 1. The cost of the query plan or subplans, and some statistical information, can be monitored without conveying data. However, this is impossible for the resource information: to gather it, monitoring the execution environment is necessary, as monitoring only the physical operators is inadequate.

An alternative approach is to define the set of metrics in a bottom-up way. In that case, the set of information that can be directly captured or inferred by monitoring the query execution without transmitting data is defined, without considering whether there are merits in gathering that information. This approach answers in full the question "what can be monitored without having to convey data?" and it is examined in the following sections. However, the set of derived information can be arbitrarily large. Moreover, an implementation of that approach is very likely to result in inefficient, low-performance solutions, due to the monitoring overheads.

3 Monitoring frequencies

Before identifying the different metrics, it is worth enumerating the potential policies with regard to the frequency of the monitoring procedure. Some metrics need to be computed only once during the lifetime of a particular instance of a physical operator. Other information is inferred by observing each of the tuples that comprise the operator's input separately. Most of the time, however, the most interesting monitoring information, in terms of usefulness and ease of exploitation, consists of aggregate statistics, such as averages, sums, counts, minimums and maximums.

Aggregate values can be computed in two ways. First, a window can be assumed, so that only the measurements belonging to that window are used in the evaluation of the aggregate value. The width of the window can be expressed either in time units or in a number of most recent tuples. Moreover, the windows can be overlapping or disjoint. For example, for a window comprising the last three tuples, in the overlapping case the first window includes tuples 1-3, the second tuples 2-4, and so on; if the windows are disjoint, the first window includes tuples 1-3, the second tuples 4-6, etc. Second, the aggregate value of a metric can be evaluated over the entire history of that metric. Table 2 summarises the different granularities.
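The two aggregation policies can be sketched as follows; this is an illustrative example (window width 3 and the averaged metric are arbitrary choices, not from the report):

```python
from collections import deque

# Sketch of the aggregation policies for a per-tuple metric: a sliding
# (overlapping) window over the last n items, a disjoint window, and a
# cumulative ("so far") aggregate over the whole history.

def overlapping_avgs(values, n=3):
    win, out = deque(maxlen=n), []
    for v in values:
        win.append(v)
        if len(win) == n:
            out.append(sum(win) / n)   # windows: tuples 1-3, 2-4, 3-5, ...
    return out

def disjoint_avgs(values, n=3):
    return [sum(values[i:i + n]) / n   # windows: tuples 1-3, 4-6, ...
            for i in range(0, len(values) - n + 1, n)]

def cumulative_avg(values):
    return sum(values) / len(values)   # whole history of the metric

v = [1, 2, 3, 4, 5, 6]
print(overlapping_avgs(v))  # [2.0, 3.0, 4.0, 5.0]
print(disjoint_avgs(v))     # [2.0, 5.0]
print(cumulative_avg(v))    # 3.5
```

In a real operator the window would be maintained incrementally, as above with `deque(maxlen=n)`, so that each new tuple costs O(1) extra work.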


1. per operator
2. per item
3. in the last t time units
4. so far, updated every t time units
5. for the last n items
6. so far, updated every n items

Table 2: Granularities of monitoring frequency. Aggregate functions include avg, sum, count, min and max. Items can be tuples, pages, buckets of hash tables and buffers.

4 Metrics for query operators

4.1 Information that can be gathered regardless of the kind of operator

Complementary to Section 2, Table 3 presents the quantifiable properties that are common to all physical operators. The third column, which shows the potential frequencies of monitoring, should be read with reference to Table 2. The operators considered are those examined in [41], with the addition of the operation call [45]. These operators are sufficient for evaluating SQL and OQL queries of the Select-From-Where form in a parallel or distributed environment. They also cover the parallel logical algebra in [42], except for the Nest, Union and Map operations.

Description                                                          Gran.
number of tuples produced so far                                     2
time elapsed since the operator was created                          1
time the operator is really active                                   2,5,6
time waiting since last tuple (per input)                            2,5,6
time to process a tuple, i.e. time to evaluate the next()
  function in the iterator model [18]                                2,5,6
size of an output tuple                                              2,5,6
memory used                                                          3,4
number of tuples received (per input)                                2,3,4

Table 3: General measurements

The information considered is the information that can be directly measured, together with the information derived from those measurements by applying aggregate functions to them. For the former, the measurements are taken either by applying counters to computational entities that are already implemented in the query processor, or by capturing measurable aspects of those entities. Examples of such computational entities are the tuple programming type and the conditions of the predicate of an operator. Examples of measurable aspects are the time required by the commands or sets of commands that implement the operator, and the size of tuples in bytes.
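A hypothetical sketch of how an operator following the iterator model could collect some of the general measurements of Table 3 (the tuple counter, the elapsed time and the per-tuple processing time). The class name, structure and use of Python generators are assumptions for illustration; the engine described in this report is written in C++.

```python
import time

# Illustrative wrapper around an iterator-model operator that maintains a
# counter (tuples produced) and two timings (elapsed time, active time).

class MonitoredOperator:
    def __init__(self, child):
        self.child = child                 # an iterator producing tuples
        self.created = time.perf_counter()
        self.tuples_out = 0                # counter: number of tuples produced
        self.active_time = 0.0             # timing: time the operator is active

    def next(self):
        start = time.perf_counter()        # first timestamp
        try:
            tup = next(self.child)
        except StopIteration:
            return None
        self.active_time += time.perf_counter() - start  # second timestamp
        self.tuples_out += 1
        return tup

    def elapsed(self):
        return time.perf_counter() - self.created

op = MonitoredOperator(iter([("a", 1), ("b", 2)]))
while op.next() is not None:
    pass
print(op.tuples_out)   # 2
```

The counter is a single increment per tuple; the timing costs two timestamp calls per tuple, which, as Section 4.3 shows, is the part that can become noticeable for cheap operators.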

4.2 Operator-specific information For some operations, we can monitor further information, thus extending the set of measurements in Table 3, as shown below.

4.2.1 Operators that accept predicates

An operator may accept a predicate as one of its input parameters. The predicate consists of one or more conditions, and a tuple is produced only if it satisfies all of them. The operators that can accept predicates are the scans, joins, exchange and unnest. Table 4 summarises the monitored information with regard to the evaluation of predicates.

Description                                        Granularity
number of conditions evaluated per predicate       2,5,6
time to evaluate a predicate                       2,5,6

Table 4: Additional measurements for operators that evaluate predicates

4.2.2 Operators that retrieve tuples from the store

Operators that touch the store include the scans and particular implementations of joins in object environments (e.g. [44, 43]). Because the store format is usually different from the tuple format required by the query processor, a mapping between the two formats needs to take place. The monitored information relevant to this kind of operator is shown in Table 5.

Description                                              Granularity
time to connect to the source                            1
number of pages read                                     2
time to read a page                                      2,5,6
time to map a tuple from store format to tuple format    2,5,6

Table 5: Additional measurements for operators that touch the store


4.2.3 Hash-join

The hash join is executed in two phases. In the first phase, the left input is consumed and partitioned into buckets by hashing on the join attribute of each tuple. In the second phase, the same hash function is used to hash the tuples of the right input, which are concatenated with the corresponding tuples of the left input by probing the hash table. Subsequently, the predicate is applied over the resulting tuple. The optimiser needs to ensure that the smaller input is placed as the left input. Table 6 presents the metrics that are particular to hash joins.

Description                                                    Granularity
size of a tuple in the left input                              2,5,6
size of a tuple in the right input                             2,5,6
size of the i-th bucket                                        2
cardinality of the i-th bucket                                 2
number of tuples of the right input that correspond
  to the i-th bucket                                           2
time to hash a tuple                                           2,5,6

Table 6: Additional measurements for the Hash-Join
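The two-phase hash join above, extended with two of the per-bucket measurements of Table 6 (bucket cardinality and the number of right-input tuples probing each bucket), can be sketched as follows. The hash function, bucket count and sample data are illustrative assumptions.

```python
from collections import defaultdict

# Sketch of a two-phase hash join that also maintains per-bucket monitoring
# counters: the cardinality of each bucket and how many right-input tuples
# probe each bucket.

def hash_join(left, right, key=lambda t: t[0], buckets=4):
    table = defaultdict(list)        # bucket id -> left tuples
    probes = defaultdict(int)        # bucket id -> right tuples probing it
    for t in left:                   # phase 1: build on the (smaller) left input
        table[hash(key(t)) % buckets].append(t)
    out = []
    for t in right:                  # phase 2: probe with the right input
        b = hash(key(t)) % buckets
        probes[b] += 1
        out.extend((l, t) for l in table[b] if key(l) == key(t))
    card = {b: len(ts) for b, ts in table.items()}   # bucket cardinalities
    return out, card, dict(probes)

left = [(1, "x"), (2, "y")]
right = [(1, "a"), (3, "b")]
out, card, probes = hash_join(left, right)
print(out)    # [((1, 'x'), (1, 'a'))]
```

Both counters cost one dictionary update per tuple, so they fall into the "negligible overhead" category measured in Section 4.3.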

4.2.4 Unnest

The unnest operator takes as input a tuple with an n-valued attribute (or relationship), and produces n single-valued tuples. The cardinality of the collection attribute or relationship can be monitored, as shown in Table 7.

Description                                                    Granularity
cardinality of the collection attributes and relationships     2

Table 7: Additional measurement for the Unnest

4.2.5 Exchange

The exchange operator encapsulates the parallelism in multi-node environments. It performs two functions concurrently: first, it packs tuples into buffers and sends these buffers to other processors; second, it receives packed tuples in incoming buffers and unpacks them. The measurements that provide insight into these functions are given in Table 8.


Description                            Granularity
number of consumers                    1
number of producers                    1
size of buffers                        1
size of overhead per buffer            1
size of an input tuple                 2,5,6
number of buffers sent                 2,3,4
number of buffers received             2,3,4
number of failures to send a buffer    2,3,4
time to pack a tuple                   2,5,6
time to unpack a tuple                 2,5,6

Table 8: Additional measurements for the Exchange
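The packing side of an exchange operator, with two of the Table 8 counters (tuples packed, buffers sent), can be sketched as below. The class name, buffer capacity and the stubbed-out "send" are illustrative assumptions; a real implementation would transmit the buffer to a consumer on another node.

```python
# Illustrative sketch of the producer side of an exchange operator: tuples
# are packed into fixed-capacity buffers, and monitoring counters record
# how many tuples were packed and how many buffers were sent.

class ExchangeSender:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.buffer = []
        self.buffers_sent = 0      # counter: number of buffers sent
        self.tuples_packed = 0     # counter: number of input tuples

    def pack(self, tup):
        self.buffer.append(tup)
        self.tuples_packed += 1
        if len(self.buffer) == self.capacity:
            self.flush()

    def flush(self):
        if self.buffer:
            # here the buffer would be sent over the network to a consumer
            self.buffers_sent += 1
            self.buffer = []

sender = ExchangeSender(capacity=3)
for t in range(7):
    sender.pack(t)
sender.flush()
print(sender.buffers_sent)   # 3 (two full buffers and one partial)
```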

4.3 Overhead of taking measurements

This section examines the overhead of the measurements that are needed to monitor the operators comprising a query plan. As presented above, some measurements are common to all operators, while others are specific to particular operators. Also, some of these measurements capture the same information as those used in existing adaptive query processors.

4.3.1 Data

The data used in the experiments are from the OO7 benchmark [10]. Three sizes are used: tiny, small and medium (the fan-out factor is set to 3). The large size is not employed because we want all the data to fit in main memory when we apply the query plan operators, which are single-pass. In this way we maximise the proportion of the overhead incurred by monitoring. Note that the tiny size, which is one twentieth of the small one, is not formally defined in [10].

4.3.2 Queries and way of measuring

The following queries are used in the experiments:

Q1: Retrieve 500 tuples with average size 155 bytes
Q2: Retrieve 10000 tuples with average size 155 bytes
Q3: Retrieve 100000 tuples with average size 155 bytes
Q4: Retrieve 50 tuples with average size 727 bytes
Q5: Retrieve 500 tuples with average size 2K bytes
Q6: Retrieve 500 tuples with average size 20K bytes
Q7a: Join result sets from Q1 and Q4 on one attribute resulting in 500 tuples (using a hash-join with 10 tuples per bucket in the hash table)
Q7b: Join result sets from Q4 and Q1 on one attribute resulting in 500 tuples (using a hash-join with 1 tuple per bucket in the hash table)
Q8a: Join result sets from Q2 and Q5 on one attribute resulting in 10000 tuples (using a hash-join with 20 tuples per bucket in the hash table)
Q8b: Join result sets from Q5 and Q2 on one attribute resulting in 10000 tuples (using a hash-join with 1 tuple per bucket in the hash table)
Q9a: Join result sets from Q3 and Q6 on one attribute resulting in 100000 tuples (using a hash-join with 200 tuples per bucket in the hash table)
Q9b: Join result sets from Q6 and Q3 on one attribute resulting in 100000 tuples (using a hash-join with 1 tuple per bucket in the hash table)
Q10: Project one attribute from the result set of Q1
Q11: Project one attribute from the result set of Q2
Q12: Project one attribute from the result set of Q3
Q13: Unnest a collection attribute with fanout 3 from the result set of Q1
Q14: Unnest a collection attribute with fanout 3 from the result set of Q2
Q15: Unnest a collection attribute with fanout 3 from the result set of Q3
Q16: Retrieve 500 tuples which contain a string field with size 23 characters
Q17: Retrieve a tuple which contains a string field with size 100 characters
Q18: Retrieve a tuple which contains a string field with size 100K characters
Q19: Retrieve a tuple which contains a string field with size 1000K characters

When a query is chosen to participate in a set of measurements, it runs several times to "warm up" the system. Then it runs 10 times. The minimum and the maximum value of

these ten times are not taken into consideration, in order to reduce the deviation. The granularity of the timer used is 1 microsecond. The measurements are taken on a PC with a 1133 MHz CPU and 512 MB of memory (of which 330-370 MB are free at the time of the measurements). The operators are implemented in C++ according to the iterator model and are part of the Polar* query engine [45].
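The measurement protocol above (warm-up runs, ten timed runs, minimum and maximum discarded before averaging) can be sketched as follows. The measured function is a stand-in for running a query; the number of warm-up runs is an assumption, as the report does not state it.

```python
import time

# Sketch of the measurement protocol: warm up, time ten runs, then drop the
# minimum and maximum before averaging, to reduce the deviation.

def timed_average(fn, warmups=3, runs=10):
    for _ in range(warmups):
        fn()                                   # warm up the system
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    trimmed = sorted(times)[1:-1]              # discard min and max
    return sum(trimmed) / len(trimmed)

avg = timed_average(lambda: sum(range(10000)))
print(avg >= 0)   # True
```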

4.3.3 Measuring the Overheads

The measurements fall into three categories: first, those that involve counters (e.g., cardinalities of inputs, outputs and hash table buckets); second, those that require timings, i.e., two timestamps are taken and their difference is computed (e.g., the time to evaluate a tuple, or the time to change the tuple format from the storage format to the evaluator format); and third, measurements of the size of a tuple. The size of a tuple is not known in two cases: first, when the tuple has one or more fields of string type and undefined length, and second, when there is a collection attribute of undefined size. Measuring the size of a collection requires a counter. For measuring the size of a string of characters, we experimented with two techniques: a naive one (in terms of implementation) that prints the tuple field containing the string to a buffer and counts its length, and another that locates the string field in the tuple and measures its length without printing it into a buffer.

Table 9 presents the overhead added to the evaluation of a single tuple by a counter, a timing, and a character counter for strings. Inserting a counter into an operator has a very small overhead, less than a microsecond. The overhead of measuring timings is on the order of microseconds, which means that too many of them in an operator activated for every tuple processed can have a significant impact on performance. The time cost of measuring the size of a string of characters depends strongly on the size of the string (Figure 2). For small strings the overhead is small, but for strings of 1 MByte (e.g., when entire documents are kept as strings) it can reach several milliseconds. Table 9 shows the overheads for sizes ranging from a few bytes to 1 MByte.

4.3.4 Relative Overheads

Table 10 presents the extent of the overheads more clearly, as it shows how much the cost of evaluating a tuple increases with monitoring. The overhead of a counter is negligible for all the operators. Placing two timestamps is more costly than projecting an attribute, but the proportion of its cost is relatively low for the other operators (between 0.32% and 10.92%). Measuring the size of a string has essentially no cost if the string is around 20 bytes. However, if the size is 100 characters, performance degrades significantly when the size is monitored for each tuple processed by the operator. For instance, it may increase the cost of a hash join joining 50 tuples with 500 tuples by 144%. If the size is counted only for one tuple in ten, the increase


[Figure 2 here: a log-log plot of the overhead (microseconds, 10^-2 to 10^5) against string size (bytes, 10^1 to 10^6), with one curve for the buffer technique and one for the no-buffer technique.]

Figure 2: The overhead for measuring the size of a string of characters with and without using a buffer.

Metric                           avg/tuple   stddev   Queries
counter                          0.03        0.21     Q1-Q12
timing                           1.11        0.58     Q1-Q12
size - buffer - 23 bytes         0.57        0.01     Q16-Q19
size - buffer - 100 bytes        38.38       0.74     Q16-Q19
size - buffer - 100 Kbytes       543.25      6.30     Q16-Q19
size - buffer - 1 MByte          14973.63    81.99    Q16-Q19
size - no buffer - 23 bytes      0.00        0.01     Q16-Q19
size - no buffer - 100 bytes     11.88       0.35     Q16-Q19
size - no buffer - 100 Kbytes    225.13      0.35     Q16-Q19
size - no buffer - 1 MByte       4374.25     9.85     Q16-Q19

Table 9: The overhead of taking measurements in microseconds.


Operator              avg/tuple (μs)  stddev  Queries   counter (%)  timing (%)  size - no buffer - 100 bytes (%)
SCAN - 155 bytes      16.82           0.87    Q1-Q3     0.18         6.60        70.63
SCAN - 727 bytes      25.50           0.04    Q4        0.12         4.35        46.60
SCAN - 2 Kbytes       48.81           0.05    Q5        0.06         2.27        24.34
SCAN - 20 Kbytes      350.57          0.01    Q6        0.01         0.32        3.39
HASH-JOIN - 1         8.22            0.84    Q7b-Q9b   0.36         13.50       144.52
HASH-JOIN - 10        13.02           0.13    Q7a       0.23         8.52        91.22
HASH-JOIN - 20        16.25           0.16    Q8a       0.18         6.83        73.11
HASH-JOIN - 200       62.86           0.20    Q9a       0.05         1.77        18.90
PROJECT               0.89            0.09    Q10-Q12   3.39         125.27      1340.71
UNNEST (fan-out 3)    10.16           0.15    Q13-Q15   0.30         10.92       116.88

Table 10: The overhead of taking measurements for each tuple compared to the cost of the operators. The fourth column shows on which queries the cost of the operator is based. The last three columns show the increase in cost when a counter, a timing, and a character counter for a 100-byte string (with no buffer) are applied, respectively. The values in the second column are in microseconds. The number in the first column indicates the average tuple size for scans, the number of tuples per bucket in the hash table for hash joins, and the cardinality of the collection attribute for unnests.

falls to 14.4%. The values in Table 10 can provide some insight into choosing efficient monitoring frequencies.

4.4 Summary

Some general remarks are as follows:
- There are three types of monitored information: counters, timings, and lengths of strings of characters. The overhead of these three types does not depend on the type of the query operator.
- The costs of a counter and of a timing are constant for a given system, whereas the cost of measuring the size of a string depends on its size.
- The cost of a counter is negligible for all the operators examined. However, this is not true for timings and string lengths.


5 An example of monitoring: Monitoring Polar*

Polar* [45] uses an optimiser¹ that estimates the size of the intermediate results for each physical operator participating in the query plan. The plan that produces the minimum size of intermediate results is chosen, using a greedy bottom-up heuristic [14]. The optimiser can be extended to support a cost model, such as the one described in [41], in which a cost in time units is assigned to each physical operator. The optimiser assumes that there is basic knowledge about the size and the cardinalities of the base relations or extents, and also about some system variables.

In this context, a monitoring mechanism can be effective if it can provide information about the actual cardinalities and sizes of the intermediate results, and about the execution time of the physical plan. Moreover, the granularity should be the same: if the optimiser estimates the execution time and the size of the results produced for each physical operator, then the monitored information should support that knowledge at the same level. In this way, the optimiser can annotate the query plan with its estimates of the output cardinalities, the output sizes and the time cost of each physical operator. The monitored information can then be used to detect deviations from the predicted performance.

Detecting such deviations is the main responsibility of the monitoring process, but it makes little sense unless it relates to a higher-level goal. Such a goal could, for example, be the refinement of the cost model (Table 1). In this case, the formulas and data used by the cost model must be identified, and the monitored information should enable checking of the validity of the formulas and of any assumptions made. Appropriate attention should be given to the fact that a cost model may prove to be unrealistic due to the overhead incurred by the monitoring process itself.
Another goal may be the prevention of an unacceptable and avoidable deviation in performance, which is mainly the objective of adaptive query processing. This goal can be rephrased as the capability to estimate with adequate accuracy the cost of the query plan, based on the monitored information that has become available up to that point. In the following subsections, the main physical operators evaluated by the Polar* query processor, are considered separately. More specifically, it will be verified whether a deviation from the initial expectations can a) be detected and b) predicted on-the-fly, without conveying data between operators or other nodes. Also, it will be examined which aspects (if any) of the cost model employed, can be refined, under the same constraint. In such a way, it will be examined to what extent the measurement goals of Table 1 can be met, with no communication overhead.

5.1 Monitoring Sequential Scans

Table 11 shows the information that can be deduced by observing and analysing the monitoring data during the evaluation of the Sequential Scan physical operator. The acquisition column (Acq.) refers to the way the optimiser obtains the relevant information to produce the initial query plan: it can come from the data dictionary (DD), which holds the metadata, or be assumed to be a constant characterising the system (SC), or be estimated using predefined methods and formulas of the cost model (estd).

¹ This optimiser is an extended version of the optimiser used in [45].

[Figure 3 here: a tree in which the total time cost is composed of three subcosts: map objects (requiring the time to map a type and the number of attributes per type), evaluate predicate (requiring the time to evaluate a condition and the number of conditions), and read pages (requiring the number of pages, the seek + latency time, and the rotation time); the output cardinality is computed from the selectivity and the input cardinality, and the output size from the selectivity and the input size.]
Figure 3: The hierarchy of the information monitored for sequential scans. The information in the ellipses is the cost of the operator according to the optimiser. The different kinds of information in circles are the subcosts, which are combined to evaluate the total cost. The rectangles show what lower-level information is needed to calculate the costs and the subcosts.

With reference to Table 1, the things that need to be measured for sequential scan are the cost of the operator, and any statistical information that i) is related to sequential scans, ii) can be processed by the query engine, and iii) does not incur a communication overhead. The cost of the operator is measured in i) number of tuples produced, ii) size of tuples produced, and iii) time units. These are shown in boldface in the table. The cost in time units is decomposed into three subcosts [41]: first, the cost to map objects from store format into tuple format; second, the cost to evaluate the predicate for each tuple; and third, the cost to read the pages from the disk. The subcosts are shown in small capital letters. The statistical information to be measured is the information needed to evaluate the different costs and subcosts. Figure 3 shows how the different pieces of information are combined in order to evaluate the costs and subcosts.

All the costs, subcosts, and relevant statistical information can be monitored without having to convey data; hence, all of them are examined. The last three columns of Table 11 show, respectively, whether the relevant variable is monitored so that its value is available at any time (this column should be read with reference to Tables 3-8), whether the total value of that variable can be evaluated with certainty before the end of the execution, and whether its final value can be predicted.
In the query processor examined, the three optimisation criteria (the cardinality of the intermediate results, their size, and the time cost of each operator) are annotated on the query plan as parameters of the relevant physical operator. As shown in the table, for sequential scans all three are monitored, and consequently deviations from expected behaviour are detectable during the evaluation of the operator.


Description                              Acq.  Monitored    eval. total  est. total
                                               so far
cardinality of output                    estd  yes          no           yes
cardinality of base extents              DD    yes          N/A          N/A
selectivity of predicate                 SC    derived      no           yes
size of output                           estd  yes          no           yes
size of base extents                     DD    no           N/A          N/A
cost in time units                       estd  yes          no           yes
TIME TO MAP OBJECTS IN TUPLES            estd  yes          no           yes
time to map a type                       SC    yes          no           no
number of attributes per type in object  DD    no           N/A          N/A
TIME TO EVALUATE THE PREDICATES          estd  yes          no           yes
number of conditions per tuple           DD    yes          no           yes
time to evaluate a one-condition pred.   SC    yes          no           no
TIME TO READ THE PAGES                   estd  yes          no           no
number of pages                          DD    yes          N/A          N/A
seek + latency time of disks             SC    no           no           yes
rotation time of disks                   SC    no           no           yes

Table 11: Information derived from the monitoring of the Sequential Scan physical operator

Moreover, the monitoring provides the data necessary to predict such deviations, as shown in the last column. More specifically, the optimiser assumes that the exact selectivity of the operator is not known at compile time, and a constant is used instead. Based on the monitored data, the selectivity can be derived from the formula

    selectivity = n_out / n_in

where n_out and n_in are the cardinalities of the output and the input respectively (Table 3). A new estimate for the cardinality of the result can be made by multiplying the monitored selectivity by the cardinality of the stored extent. In the same way, the total size of the result can be predicted, as it is the product of the monitored selectivity and the size of the stored extent. There is an additional way to estimate the total size, by using the average size of the results produced up to that point:

    est. total size = (size so far / n_out) x (selectivity x cardinality of stored extent)

In the first case, there is an assumption that the average size of a tuple in the result is equal to the average size of the tuples in the base extent, which may lead to wrong estimates. The estimation of the total execution time is a slightly more complicated task. According to [41], the cost can be divided into the costs for mapping the objects, evaluating the predicates and reading the pages. As the time to map a single tuple is monitored and the cardinality of the base extents is known, the estimated total time for the mapping can be found by multiplying these

two quantities. Following the same approach, the total time for evaluating the predicates can be estimated by multiplying the base cardinality by the average time to evaluate a predicate. The estimated total time to read the pages is

    est. time to read pages = n_pages x (t_seek+latency + t_rotation)

where n_pages is the number of pages of the base extent, and t_seek+latency and t_rotation are the disk constants of Table 11.
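The run-time estimators derived above for a sequential scan can be collected into one sketch. The function name, parameter names and the numeric inputs are illustrative assumptions; the formulas follow the derivation in the text.

```python
# Sketch of the run-time estimators for a sequential scan: monitored
# selectivity, predicted output cardinality, two size predictions, and the
# estimated total time cost (mapping + predicate evaluation + page reads).

def estimates(n_out, n_in, size_so_far, extent_card, extent_size,
              t_map_avg, t_pred_avg, n_pages, t_seek_latency, t_rotation):
    sel = n_out / n_in                              # monitored selectivity
    est_card = sel * extent_card                    # predicted output cardinality
    est_size_a = sel * extent_size                  # assumes result tuples ~ base tuples
    est_size_b = (size_so_far / n_out) * est_card   # uses monitored avg tuple size
    t_map = t_map_avg * extent_card                 # mapping cost
    t_pred = t_pred_avg * extent_card               # predicate evaluation cost
    t_read = n_pages * (t_seek_latency + t_rotation)  # page reading cost
    return sel, est_card, est_size_a, est_size_b, t_map + t_pred + t_read

sel, card, size_a, size_b, t = estimates(
    n_out=50, n_in=200, size_so_far=5000, extent_card=1000,
    extent_size=150000, t_map_avg=2e-6, t_pred_avg=1e-6,
    n_pages=40, t_seek_latency=9e-3, t_rotation=4e-3)
print(sel, card)   # 0.25 250.0
```

Note how the two size estimators disagree here (37500 versus 25000 bytes): the first inherits the base extent's average tuple size, while the second reflects the tuples actually produced so far, which is exactly the assumption discussed above.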