Properties of Storage Hierarchy Systems with Multiple Page Sizes and Redundant Data
CHAT-YU LAM and STUART E. MADNICK
Massachusetts Institute of Technology
The need for high performance, highly reliable storage for very large on-line databases, coupled with rapid advances in storage device technology, has made the study of generalized storage hierarchies an important area of research. This paper analyzes properties of a data storage hierarchy system specifically designed for handling very large on-line databases. To attain high performance and high reliability, the data storage hierarchy makes use of multiple page sizes in different storage levels and maintains multiple copies of the same information across the storage levels. Such a storage hierarchy system is currently being designed as part of the INFOPLEX database computer project. Previous studies of storage hierarchies have primarily focused on virtual memories for program storage and hierarchies with a single page size across all storage levels and/or a single copy of information in the hierarchy. In the INFOPLEX design, extensions to the least recently used (LRU) algorithm are used to manage the storage levels. The read-through technique is used to initially load a referenced page of the appropriate size into all storage levels above the one in which the page is found. Since each storage level is viewed as an extension of the immediate higher level, an overflow page from level i is always placed in level i + 1. Important properties of these algorithms are derived. It is shown that depending on the types of algorithms used and the relative sizes of the storage levels, it is not always possible to guarantee that the contents of a given storage level i are always a superset of the contents of its immediate higher storage level i - 1. The necessary and sufficient conditions for this property to hold are identified and proved. Furthermore, it is possible that increasing the size of intermediate storage levels may actually increase the number of references to lower storage levels, resulting in reduced performance.
Conditions necessary to avoid such an anomaly are also identified and proved.
Key Words and Phrases: database computer, very large databases, data storage hierarchy, storage management algorithms, inclusion properties, modeling, performance and reliability analysis
CR Categories: 4.3, 4.33, 5.2, 6.22, 6.34
1. INTRODUCTION
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
This material is based on work supported in part by the National Science Foundation under Grant MCS77-20829.
Authors' address: Center for Information Systems Research, Alfred P. Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02139.
© 1979 ACM 0362-5915/79/0900-0345 $00.75
ACM Transactions on Database Systems, Vol. 4, No. 3, September 1979, Pages 345-367

Two- and three-level memory hierarchies have been used in practical computer systems [5, 9, 13]. However, there is relatively little experience with general hierarchical storage systems. Rapid advances in storage technology coupled with the need for high performance, highly reliable on-line databases make the idea of
using a generalized storage hierarchy as the repository for very large shared databases very attractive.
One major area of theoretic study of storage hierarchy systems in the past has been the optimal placement of information in a storage hierarchy system. Three approaches to this problem have been used: (1) static placement [1, 4, 22], an approach that determines the optimal placement strategy statically at the initiation of the system; (2) dynamic placement [7, 16], an approach that attempts to place information in the hierarchy optimally, taking into account the dynamically changing nature of access to information; and (3) information structuring [11, 14], an approach that manipulates the internal structure of information so that information items that are frequently used together are placed adjacent to each other.
Another major area of theoretic study of storage hierarchy systems has been the study of storage management algorithms [2, 3, 8, 10, 17, 21]. Here the study of storage hierarchy and the study of virtual memory systems for program storage have overlapped considerably. This is largely due to the fact that most of the studies of storage hierarchies in the past have been aimed at providing a virtual memory for program storage. These studies usually do not consider the effects of multiple page sizes across storage levels, or the problem of providing redundant data across storage levels. These considerations are of great importance for a storage hierarchy designed specifically for very large databases.
Madnick [15, 18, 19] proposed the design of a generalized storage hierarchy for large databases that makes use of multiple data redundancy against failure and multiple page sizes in different storage levels for high performance. Such a storage hierarchy system is to be used in the INFOPLEX database computer [12, 20]. Conceptually, the INFOPLEX database computer consists of a functional hierarchy and a physical (storage) hierarchy (see Figure 1).
The functional hierarchy implements all the information management functions of a database manager, such as query language interpretation, security verification, and data path accessing. In INFOPLEX, the functional hierarchy is implemented using multiple microprocessors. Both pipeline and parallel processing are exploited to realize high performance and high reliability. To support the storage requirements of the functional hierarchy, INFOPLEX makes use of a generalized data storage hierarchy system. In this paper we extend this work by developing a model of the data storage hierarchy, proposing extensions to the least recently used (LRU) algorithm for managing the storage hierarchy, and deriving important properties of the data storage hierarchy.
2. MODEL OF A DATA STORAGE HIERARCHY
A data storage hierarchy consists of $h$ levels of storage devices, $M^1, M^2, \ldots, M^h$. The page size of $M^i$ is $Q_i$, and the size of $M^i$ is $m_i$ pages each of size $Q_i$. $Q_i$ is always an integral multiple of $Q_{i-1}$, for $i = 2, 3, \ldots, h$. The unit of information transfer between $M^i$ and $M^{i+1}$ is a page, of size $Q_i$. Figure 2 illustrates this model of the data storage hierarchy. All references are directed to $M^1$. The storage management algorithms automatically transfer information among storage levels. As a result, the data storage hierarchy appears to the reference source as an $M^1$ storage device with the size of $M^h$. As a result of the storage management algorithms (to be discussed next), multiple copies of the same information may exist in different storage levels.

Fig. 1. INFOPLEX database computer conceptual organization
Fig. 2. Model of a data storage hierarchy system

2.1 Storage Management Algorithms
We shall focus our attention on the basic algorithms to support the read-through [18] operation. Algorithms to support other operations can be derived from these basic algorithms. In a read-through, the highest storage level that contains the addressed information broadcasts the information to all upper storage levels, each of which simultaneously extracts the page (of the appropriate size) that contains the information from the broadcast. If the addressed information is found in the highest storage level, the read-through reduces to a simple reference to the addressed information in that level. Figure 3 illustrates the read-through operation. Note that in order to load a new page into a storage level an existing page may have to be displaced from that storage level. We refer to this phenomenon as overflow. Hence the basic reference cycle consists of two subcycles, the readthrough cycle (RT) and the overflow handling cycle (OH), with RT preceding OH.
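The read-through subcycle described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's formal definition: the names `Level` and `read_through` are hypothetical, pages are identified by `addr // page_size`, the lowest level is assumed to hold everything, and overflows are merely reported rather than handled (the OH subcycle is left out).

```python
from collections import OrderedDict


class Level:
    """One storage level: an LRU-ordered set of page ids of a fixed page size.

    Hypothetical helper for illustration; the last key is the most recently used.
    """

    def __init__(self, page_size, capacity):
        self.page_size = page_size
        self.capacity = capacity
        self.pages = OrderedDict()

    def page_of(self, addr):
        # Page sizes are nested multiples, so integer division picks the
        # page of "the appropriate size" that contains the address.
        return addr // self.page_size

    def holds(self, addr):
        return self.page_of(addr) in self.pages

    def load(self, addr):
        """Insert or refresh the page covering addr; return the evicted page id or None."""
        pid = self.page_of(addr)
        self.pages[pid] = True
        self.pages.move_to_end(pid)
        if len(self.pages) > self.capacity:
            victim, _ = self.pages.popitem(last=False)  # least recently used
            return victim
        return None


def read_through(levels, addr):
    """RT subcycle: return (index of the level where addr was found, overflows).

    levels[0] is the highest level; the last level is assumed to contain all
    information. Overflows are returned as (level index, page id) pairs.
    """
    found = next((x for x, lv in enumerate(levels) if lv.holds(addr)),
                 len(levels) - 1)
    overflows = []
    # The level holding the data and every level above it receive the page.
    for x in range(found, -1, -1):
        victim = levels[x].load(addr)
        if victim is not None:
            overflows.append((x, victim))
    return found, overflows
```

For instance, with `levels = [Level(1, 2), Level(4, 2), Level(16, float("inf"))]`, a first reference to address 5 is satisfied from the lowest level and loads one page into every level above it; a later reference that overfills the top level returns the displaced page in the overflow list, which is what the overflow-handling subcycle would then process.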
Fig. 3. Illustration of the read-through operation
Fig. 4. LOCAL-LRU ($M^x$ is the highest level where $P$ is found; the levels below $M^x$ are not affected)
For example, Figure 3 illustrates the basic reference cycle to handle a reference to the page $P^1_{ya}$. During the read-through (RT) subcycle, the highest storage level ($M^x$) that contains $P^1_{ya}$ broadcasts the page containing $P^1_{ya}$ to all upper storage levels, each of which extracts the page of appropriate size that contains $P^1_{ya}$ from the broadcast. As a result of the read-through, there may be overflow from the storage levels. These are handled in the overflow-handling (OH) subcycle.
It is necessary to consider overflow handling because it is desirable that information overflowed from a storage level be in the immediate lower storage level, which can then be viewed as an extension to the higher storage level. One strategy of handling overflow to meet this objective is to treat overflows from $M^i$ as references to $M^{i+1}$. We refer to algorithms that incorporate this strategy as having dynamic overflow placement (DOP). Another possible overflow handling strategy is to treat an overflow from $M^i$ as a reference to $M^{i+1}$ only when the overflow information is not already in $M^{i+1}$. If the overflow information is already in $M^{i+1}$, no overflow handling is necessary. We refer to algorithms that incorporate this strategy as having static overflow placement (SOP).
Let us consider the algorithms at each storage level for selecting the page to be overflowed. Since the least recently used (LRU) algorithm [8, 21] serves as the basis for most current algorithms, we shall consider natural extensions to LRU for managing the storage levels in the data storage hierarchy system. Consider the following two strategies for handling the read-through cycle. First, let every storage level above and including the level containing the addressed information be updated according to the LRU strategy. Thus, all storage levels lower than the level containing the addressed information do not know about the reference. This class of algorithms is called the LOCAL-LRU algorithm and is illustrated in Figure 4.
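The four combinations of LOCAL/GLOBAL LRU with SOP/DOP can be explored with a small simulation of two adjacent levels. The sketch below is an interpretation of the verbal description only (the formal definitions appear later in the paper): the function names are hypothetical, stacks are Python lists with the most recently used page first, level-$i$ pages are integers, and the parent page in the lower level is assumed to be `p // n`. Pages displaced from the lower level are simply dropped, on the assumption that a reservoir below it holds everything.

```python
def touch(stack, page, capacity):
    """Make `page` the most recently used entry of `stack` (index 0).

    Returns the page displaced by the LRU rule, or None if nothing overflows.
    """
    if page in stack:
        stack.remove(page)
    stack.insert(0, page)
    return stack.pop() if len(stack) > capacity else None


def reference(Si, Sj, p, mi, mj, n, global_lru=True, dop=False):
    """Handle one reference to level-i page p (RT subcycle, then OH subcycle)."""
    parent = p // n
    if p in Si:
        # Hit in the top level. LOCAL-LRU leaves the lower level untouched;
        # GLOBAL-LRU updates it as if the reference had been made to it.
        touch(Si, p, mi)
        if global_lru:
            touch(Sj, parent, mj)  # assumption: loads the parent if absent
        return
    # Miss in the top level: read-through updates the lower level first.
    touch(Sj, parent, mj)          # overflow from the lower level is dropped
    victim = touch(Si, p, mi)
    if victim is not None:
        vparent = victim // n
        # DOP: always treat the overflow as a reference to the lower level.
        # SOP: do so only if the overflowed information is not already there.
        if dop or vparent not in Sj:
            touch(Sj, vparent, mj)


def mli_holds(Si, Sj, n):
    """Multilevel inclusion check: every top-level page has its parent below."""
    return all(q // n in Sj for q in Si)


def run(refs, mi, mj, n, **options):
    """Replay a reference string from empty stacks and return both stacks."""
    Si, Sj = [], []
    for p in refs:
        reference(Si, Sj, p, mi, mj, n, **options)
    return Si, Sj
```

Replaying the string `[0, 2, 4]` with `mi = 2`, `mj = 2`, `n = 2` under GLOBAL-LRU with SOP leaves page 2 in the top level without its parent in the lower level, whereas `[0, 2, 4, 6]` with `mj = 3` keeps every parent present; varying `global_lru` and `dop` gives a quick way to probe the other three combinations.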
The other class of algorithms that we consider is called the GLOBAL-LRU algorithm. In this case all storage levels are updated according to the LRU strategy whether or not that level actually participates in the read-through. This is illustrated in Figure 5.

Fig. 5. GLOBAL-LRU ($M^x$ is the highest level where $P$ is found; the levels below $M^x$ are also updated as if the reference to $P$ were made to them)

Although the read-through operation leaves supersets of the page $P^1_{ya}$ in all levels, the future handling of each of these pages depends on the replacement algorithms used and the effects of the overflow handling. We would like to guarantee that the contents of each storage level, $M^i$, is always a superset of the contents of its immediately higher level, $M^{i-1}$. This property is called multilevel inclusion (MLI). Conditions to guarantee MLI are derived in a later section.
It is not difficult to demonstrate situations where handling overflows generates references which produce overflows, which generate yet more references. Hence another important question is to determine the conditions under which an overflow from $M^i$ is always found to already exist in $M^{i+1}$, i.e. no reference to storage levels lower than $M^{i+1}$ is generated as a result of the overflow. This property is called multilevel overflow inclusion (MLOI). Conditions to guarantee MLOI will be derived in a later section.
We shall consider these important properties in the light of four basic algorithm alternatives based on local or global LRU and static or dynamic overflow. Formal definitions for these algorithms are provided after the basic model of the data storage hierarchy system is introduced.
2.2 Basic Model of Data Storage Hierarchy
For the purposes of this paper, the basic model illustrated in Figure 6 is sufficient to model the data storage hierarchy. As far as the read-through and overflow-handling operations are concerned, this basic model is generalizable to an h-level storage hierarchy system.

Fig. 6. Basic model of a data storage hierarchy system

$M^r$ can be viewed as a reservoir which contains all the information. $M^i$ is the top level. It has $m_i$ pages each of size $Q_i$. $M^j$ ($j = i + 1$) is the next level. It has $m_j$ pages each of size $nQ_i$, where $n$ is an integer greater than 1.
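The parameters of the basic model can be captured in a small record, which also makes the relation between the two page sizes concrete. This is a hypothetical sketch: the class name `BasicModel`, its method names, and the convention of numbering level-$i$ pages consecutively so that a level-$j$ page covers `n` adjacent level-$i$ pages are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicModel:
    """Two managed levels above a reservoir that holds all information."""
    m_i: int      # capacity of the top level, in pages of size Q_i
    m_j: int      # capacity of the next level, in pages of size n * Q_i
    n: int        # ratio of the two page sizes, an integer greater than 1
    Q_i: int = 1  # page size of the top level (units are arbitrary here)

    def __post_init__(self):
        if self.n <= 1:
            raise ValueError("n must be an integer greater than 1")

    @property
    def Q_j(self):
        """Page size of the next level."""
        return self.n * self.Q_i

    def parent(self, page_i):
        """Index of the next-level page that contains top-level page page_i."""
        return page_i // self.n

    def pages_within(self, page_j):
        """Indices of the n top-level pages covered by next-level page page_j."""
        return [page_j * self.n + z for z in range(self.n)]
```

With `BasicModel(m_i=2, m_j=3, n=4, Q_i=512)`, the next level uses 2048-unit pages, top-level page 9 lies inside next-level page 2, and that page covers top-level pages 8 through 11.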
2.3 Formal Definitions of Storage Management Algorithms
Denote a reference string by $r$ = "$r_1, r_2, \ldots, r_n$", where $r_t$ ($1 \le t \le n$) is the page being referenced at the $t$th reference cycle. Let $S_t^i$ be the stack for $M^i$ at the beginning of the $t$th reference cycle, ordered according to LRU. That is, $S_t^i = (S_t^i(1), S_t^i(2), \ldots, S_t^i(K))$, where $S_t^i(1)$ is the most recently referenced page and $S_t^i(K)$ is the least recently referenced page. Note that $K \le m_i$ ($m_i$ = capacity of $M^i$ in terms of the number of pages). The number of pages in $S_t^i$ is denoted as $|S_t^i|$; hence $|S_t^i| = K$. By convention, $S_1^i = \emptyset$, $|S_1^i| = 0$. $S_t^i$ is an ordered set. Define $M_t^i$ as the contents of $S_t^i$ without any ordering. Similarly, we can define $S_t^j$ and $M_t^j$ for $M^j$.
Let us denote the pages in $M^j$ by $P_1^j, P_2^j, \ldots$. Each page $P_y^j$ in $M^j$ consists of an equivalent of $n$ smaller pages, each of size $Q_i = Q_j/n$. Denote this set of pages by $(P_y^j)^i$; i.e. $(P_y^j)^i = \{P_{y1}^i, P_{y2}^i, \ldots, P_{yn}^i\}$. In general, $(M_t^j)^i$ is the set of pages, each of size $Q_i$, obtained by "breaking down" the pages in $M_t^j$. Formally, $(M_t^j)^i = \bigcup_{k=1}^{x} (S_t^j(k))^i$ where $x = |S_t^j|$. $(P_y^j)^i$ is called the family from the parent page $P_y^j$. Any pair of pages $P_{ya}^i$ and $P_{yb}^i$ from $(P_y^j)^i$ are said to be family equivalent, denoted by $P_{ya}^i \overset{f}{=} P_{yb}^i$. Furthermore, a parent page $P_y^j$ and a page $P_{yz}^i$ (for $1 \le z \le n$) from its family are said to be corresponding pages, denoted by $P_{yz}^i \overset{c}{=} P_y^j$.
$S_t^i$ and $S_t^j$ are said to be in corresponding order, denoted by $S_t^i \overset{o}{=} S_t^j$, if $S_t^i(k) \overset{c}{=} S_t^j(k)$ for $k = 1, 2, 3, \ldots, w$, where $w = \min(|S_t^i|, |S_t^j|)$. Intuitively, two
stacks are in corresponding order if for each element of the shorter stack, there is a corresponding page in the other stack at the same stack distance. (The stack distance for page $S_t^i(k)$ is defined to be $k$.)
$M_t^i$ and $M_t^j$ are said to be correspondingly equivalent, denoted by $M_t^i \overset{c}{\equiv} M_t^j$, if $|M_t^i| = |M_t^j|$ and for any $k = 1, 2, \ldots, |M_t^i|$ there exists $x$ such that $S_t^i(k) \overset{c}{=} S_t^j(x)$ and $S_t^j(x) \overset{c}{\neq} S_t^i(y)$ for all $y \neq k$. Intuitively, the two memories are correspondingly equivalent when each page in one memory corresponds to exactly one page in the other memory.
The reduced stack, $\bar{S}_t^i$, of $S_t^i$ is defined to be $\bar{S}_t^i(k) = S_t^i(j_k)$ for $k = 1, \ldots, |\bar{S}_t^i|$, where $j_k$ is the minimum $j_k > j_{k-1}$ ($j_0 = 0$) and $\bar{S}_t^i(k)$ is not family equivalent to $S_t^i(j)$ for $j$ [...] $m_i$.
THEOREM 4. Under GLOBAL-LRU-DOP, for any $m_i \ge 2$, $\forall r, t$, $(M_t^j)^i \supseteq M_t^i$ iff $m_j \ge 2m_i$.
THEOREM 5. Under GLOBAL-LRU-SOP, for any $m_i \ge 2$, $\forall r, t$, an overflow from $M^i$ finds its corresponding page in $M^j$ iff $m_j > m_i$.
THEOREM 6. Under GLOBAL-LRU-DOP, for any $m_i \ge 2$, $\forall r, t$, an overflow from $M^i$ finds its corresponding page in $M^j$ iff $m_j > 2m_i$.
THEOREM 7. Let $M^i$ (with $m_i$ pages), $M^j$ (with $m_j$ pages), and $M^r$ be system A. Let $M'^i$ (with $m'_i$ pages), $M'^j$ (with $m'_j$ pages), and $M'^r$ be system B. Let $m'_i \ge m_i$ and $m'_j \ge m_j$. Under GLOBAL-LRU-SOP, for any $m_i \ge 2$, no MLPA can exist if $m_j > m_i$ and $m'_j > m'_i$.
THEOREM 8. Let system A and system B be defined as in Theorem 7. Let
$m'_i \ge m_i$ and $m'_j \ge m_j$. Under GLOBAL-LRU-DOP, for any $m_i \ge 2$, no MLPA can exist if $m_j > 2m_i$ and $m'_j > 2m'_i$.
3.2 Derivation of Properties
THEOREM 1. Under LOCAL-LRU-SOP, or LOCAL-LRU-DOP, or GLOBAL-LRU-SOP, or GLOBAL-LRU-DOP, for any $m_i \ge 2$, $m_j \le m_i$ implies $\exists r, t$, $(M_t^j)^i \not\supseteq M_t^i$.
PROOF
Case 1. $m_j < m_i$. Consider the reference string $r$ = "$P_{1a}^i, P_{2a}^i, \ldots, P_{(m_j+1)a}^i$". Using any one of the algorithms, the following stacks are obtained at $t = m_j + 2$:
$S_t^i = (P_{(m_j+1)a}^i, P_{m_j a}^i, \ldots, P_{2a}^i, P_{1a}^i)$,
$S_t^j = (P_{m_j+1}^j, P_{m_j}^j, \ldots, P_3^j, P_2^j)$.
Thus, $P_{1a}^i \in M_t^i$, but $P_{1a}^i \notin (M_t^j)^i$; i.e. $(M_t^j)^i \not\supseteq M_t^i$.
Case 2. $m_j = m_i = w$. Consider the reference string $r$ = "$P_{1a}^i, P_{2a}^i, \ldots, P_{(w+1)a}^i$". Using any one of the above algorithms, the following stacks are obtained at $t = w + 2$:
$S_t^i = (P_{(w+1)a}^i, P_{wa}^i, \ldots, P_{3a}^i, P_{2a}^i)$,
$S_t^j = (P_1^j, P_{w+1}^j, P_w^j, \ldots, P_4^j, P_3^j)$.
Thus, $P_{2a}^i \in M_t^i$, but $P_{2a}^i \notin (M_t^j)^i$; i.e. $(M_t^j)^i \not\supseteq M_t^i$. Q.E.D.
THEOREM 2. Under LOCAL-LRU-SOP or LOCAL-LRU-DOP, for any $m_i \ge 2$ and any $m_j$, $\exists r, t$, $(M_t^j)^i \not\supseteq M_t^i$.
PROOF (For LOCAL-LRU-SOP). For $m_j \le m_i$ the result follows directly from Theorem 1. For $m_j > m_i$, using the reference string $r$ = "$P_{za}^i, P_{1a}^i, P_{za}^i, P_{2a}^i, \ldots, P_{za}^i, P_{m_j a}^i$", the following stacks will be produced at $t = 2m_j + 1$:
$S_t^i = (P_{m_j a}^i, P_{za}^i, P_{(m_j-1)a}^i, \ldots, P_{(m_j-m_i+2)a}^i)$,
$S_t^j = (P_{m_j}^j, P_{m_j-1}^j, \ldots, P_2^j, P_1^j)$.
Thus $P_{za}^i \in M_t^i$, but $P_{za}^i \notin (M_t^j)^i$; i.e. $(M_t^j)^i \not\supseteq M_t^i$. Q.E.D.
PROOF (For LOCAL-LRU-DOP). For $m_j \le m_i$ the result follows directly from Theorem 1. For $m_j > m_i$, using the reference string $r$ = "$P_{za}^i, P_{1a}^i, P_{za}^i, P_{2a}^i, \ldots, P_{za}^i, P_{m_j a}^i$", the following stacks will be produced at $t = 2m_j + 1$:
$S_t^i = (P_{m_j a}^i, P_{za}^i, \ldots, P_{(m_j-m_i+2)a}^i)$,
$S_t^j = (a_1, a_2, \ldots, a_{m_j})$
where for $1 \le i \le m_j$, $a_i \in \{P_{m_j}^j, P_{m_j-1}^j, \ldots, P_3^j, P_2^j, P_1^j\}$, since $P_z^j$ is the only overflow from $M^j$. Thus, $P_{za}^i \in M_t^i$, but $P_{za}^i \notin (M_t^j)^i$; i.e. $(M_t^j)^i \not\supseteq M_t^i$. Q.E.D.
THEOREM 3. Under GLOBAL-LRU-SOP, for any $m_i \ge 2$, $\forall r, t$, $(M_t^j)^i \supseteq M_t^i$ iff $m_j > m_i$.
PROOF. This proof has two parts: part (a) to prove $\forall r, t$, $(M_t^j)^i \supseteq M_t^i \Rightarrow m_j > m_i$, or equivalently, $m_j \le m_i \Rightarrow \exists r, t$, $(M_t^j)^i \not\supseteq M_t^i$; part (b) to prove $m_j > m_i \Rightarrow \forall r, t$, $(M_t^j)^i \supseteq M_t^i$.
Part (a) of the Proof. $m_j \le m_i \Rightarrow \exists r, t$, $(M_t^j)^i \not\supseteq M_t^i$. This follows directly from Theorem 1. Q.E.D.
For part (b) of the Proof we need the following results.
LEMMA 3.1. Vr, t such that IMtJI 5 mi, if m, = mi + 1, then ( a ) ;1ML’ and (b) 2= St’. PROOF OF LEMMA 3.1. For t = 2 (i.e. after the first reference), (a) and (b) are true. Suppose (a) and (b) are true for t, such that I M i I Imi. Consider the next reference: Case 1. It is a reference to M i . There is no overflow from M or M’, so (a) is still true. Since GLOBAL-LRU is used, (b) is still true. Case 2. It is a reference to M’. There is no overflow from M’.If there is no overflow from M’,the same argument as Case 1applies. If there is overflow from M i , the overflow page finds its corresponding page in MJ.Since SOP is used, this overflow can be treated as a “no-op.” Thus (a) and (b) are preserved. Case 3. It is a reference to M‘. There is no overflow from M’ since W{+II 5 Q.E.D. mi. Thus the same reasoning as in Case 2 applies. LEMMA 3.2. V r , t, such that 1 Mt’I = m,, if m; = mi + 1 then ( a ) 3 Mt’, ( b ) Sti StJ,and ( c ) (St’(m,))in S,’ = 0.Let us denote the conditions ( a ) ,( b ) , and ( c )jointly as Z ( t ) . PROOF OF LEMMA 3.2. Suppose that the first time St’(mj) is filled it is by the t * th reference. That is, St’( m,)= 0 for all t 5 t * and St’( m;)# 0 for all t > t *. From Lemma 3.1 we know that (a) and (b) are true for all t 5 t*. Let t , = t* + 1, t z = t * + 2, . . ., etc. We shall show by induction on t, starting at tl , that Z( t) is true. First we show that Z ( t l )is true as follows: Case 1. M $ .& ~ f * .
sti
sf. 2 S{. and M { .
Mf. S{.(m, - 1) f Sf.(m,). As a result of the reference at t* (to M “ ) ,S{~+~(rn;) = S$(m; - 1) andsf. (mi) overflows from M i . This overflow page finds its corresponding page in Mibecause there is no overflow from M iand (a). Since SOP is used, the overflow from M i can be treated as a “no-op.” Furthermore, since GLOBAL-LRU is used, (b) is I > 1 S$+lI * (a) and (c). Thus Z(t1)is true after the t*th reference. (b) and true.
-
Case 2. ( M i * ) 3 ’ Mi* and Mi. 4 Mi*.
(M’,.)’3 Mi* and M{. 4 Mi.
3 S $ ( k ) such that
(S:*(K))’n Mi. = 0.
S;. P S/,* and ( S { * ( k ) )n’ Mi. = 0 * K > 1s;.I and (S/,*(x))’ r l Mi* = 0 for all x, where mJ-l 2 3c L k. Thus (S{*(mJ-l))’ n S:. = 0 (i.e. the last page of S/,*is not in St*).S f * ( m ,overflows ) from M‘. There is no overflow from M’. Thus the overflow page from M‘ finds its corresponding page in M’. For the same reasons as in Case 1, (b) is still preserved. (b) and IS++I I > ISi*+lI =$ (a) and (c) are true. Thus Z(t1) is true. Assume that Z ( t k ) is true; to show that Z ( t k + l ) is true, we consider the next reference, at time tk+l . Imagine that the last page of S{, does not exist, i.e. S{, (m,) = 0.If the reference at tk+l is to a page in M:, or Mi,, then (a) and (b) still hold because GLOBALLRU is used and because overflow from M’ finds its corresponding page in M’ (see the proof of Lemma 3.1). If the reference at &+I is to a page not inM:, , then ACM Transactions on Database Systems, Vol. 4, No 3, September 1979.
we can apply the argument used in considering the reference at time $t_1$ above to show that $Z(t_{k+1})$ is still true. Q.E.D.
LEMMA 3.3. $\forall r, t$, if $m_j = m_i + 1$, then (a) $(M_t^j)^i \supseteq M_t^i$ and (b) $(S_t^j(m_j))^i \cap S_t^i = \emptyset$.
PROOF OF LEMMA 3.3. For $t$ such that $|M_t^j| \le m_i$, (a) follows directly from Lemma 3.1 and (b) is true because $S_t^j(m_j) = \emptyset$. For $t$ such that $|M_t^j| = m_j$, (a) and (b) follow directly from Lemma 3.2. Q.E.D.
LEMMA 3.4. $\forall r, t$, if $m_j > m_i$, then (a) $(M_t^j)^i \supseteq M_t^i$ and (b) $(S_t^j(m_j))^i \cap S_t^i = \emptyset$.
PROOF OF LEMMA 3.4. Let $m_j = m_i + k$. We shall prove this lemma by induction on $k$. For $k = 1$, (a) and (b) are true from Lemma 3.3. Suppose that (a) and (b) are true for $k$. Consider $m_j = m_i + (k + 1)$. That is, consider the effects of increasing $M^j$ by 1 page in size. Since $M^i$ is unchanged, $M^j$ (with $m_i + k + 1$ pages) sees the same reference string as $M^j$ (with $m_i + k$ pages). Applying the stack inclusion property [21], we have $M^j$ (with $m_i + k + 1$ pages) $\supseteq$ $M^j$ (with $m_i + k$ pages). Thus (a) is still true. Suppose $(S_t^j(m_i + k + 1))^i \cap S_t^i \neq \emptyset$; then there is a page in $M^i$ that corresponds to this page. But $S_t^j(m_i + k + 1)$ is not in $M^j$ (with $m_i + k$ pages). This contradicts the property that $(M_t^j)^i \supseteq M_t^i$. This shows that (b) is still true. Q.E.D.
Part (b) of the Proof. $m_j > m_i \Rightarrow \forall r, t$, $(M_t^j)^i \supseteq M_t^i$. This follows directly from Lemma 3.4. Q.E.D.
THEOREM 4. Under GLOBAL-LRU-DOP, for any $m_i \ge 2$, $\forall r, t$, $(M_t^j)^i \supseteq M_t^i$ iff $m_j \ge 2m_i$.
PROOF. This proof has two parts.
Part (a): $m_j < 2m_i \Rightarrow \exists r, t$, $(M_t^j)^i \not\supseteq M_t^i$.
Part (b): $m_j \ge 2m_i \Rightarrow \forall r, t$, $(M_t^j)^i \supseteq M_t^i$.
Part (a) of the Proof. $m_j < 2m_i \Rightarrow \exists r, t$, $(M_t^j)^i \not\supseteq M_t^i$.
For m; 5 mi the result follows from Theorem 1. Consider the case for 2mi > m; > mi. The reference string r = “Pi,, P i a ,P & , . . ., PfZm,)a’’ will produce the following stacks: Sti = (Pf2mi)a, PtPm,-l)o,. .
- 9
Pirn,+l)a),
S/
=
(al, ~
2 ~, 3
. ., ., antj)
where ai‘s are picked from L1 and LS alternatively, starting from L1. L1 = (PLi,P 2x, the induction statement is not violated.
Case 2. P i a Mt’, P,’ E MtJ,Each page in M increases its stack distance by 1. Each corresponding page in M ican at most increase its stack distance by 2, one due to the reference and one due to an overflow from Mi. Hence if Pi, = S t i ( k ) ,k < mi, then Pf, = Sf+l(k l), and P,’can be found within stack distance 2 ( k 1) in M iat time t + 1. Case 3. P i , (Z M t i , P,’ MtJ.As a result of the read-through from M’.each page in M i is increased by a stack distance of 1. That is, for k < mi,
+
+
Pta = Sti(k)* Pfa = S f + l ( k+ 1). ACM Transactions on Database Systems, Vol. 4, No. 3, September 1979.
Each page in $M^j$ can at most increase its stack distance by 2, one due to loading the referenced page and one due to an overflow from $M^i$. Hence, the page $P_y^j$ is found within stack distance of $2k + 2$ in $M^j$. Since $\max(2k + 2) = 2m_i \le m_j$, $P_y^j$ is still in $M^j$. Q.E.D.
COROLLARY TO LEMMA 4.1
m, > 2mi * ~ rt, ,(St'(m,))' n Sl = 0. PROOF OF COROLLARY. For any Pi, in S,', its corresponding page can be found within stack distance 2mi in St', and since pages in St' are unique, the information in the last page of S i is not found in SA i.e. (S{(m;))'n S,' = 0. Part ( b )of the Proof m;I2mi * Vr, t, (MtJ)i2 M:. Q.E.D. This follows directly from Lemma 4.1. THEOREM 5. Under GLOBAL-LRU-SOP, for any mi 2 2, V r , t, an overflow from M i finds its corresponding page in Miiffm; > mi. COROLLARY. Under GLOBAL-LRU-SOP, for any mi I2, Vr, t, an overflow from M ifinds its corresponding page in Mii f fV r , t, ( M i ) i2 M,'. PROOF. This proof has two parts as shown below. Part ( a ) of the Proof. m; > mi * V r , t, an overflow from M i finds its corresponding page in M'. From Lemma 3.4 m; > mi * V r , t, (Mt')' 2 M i and (St'(m;))ifl S,' = 0. Suppose the overflow from Mi, Ps, is caused by a reference to MJ.Then just before Pb, is overflowed, P,' exists in Mi. After the overflow, Pka finds its corresponding page still existing in MJ. Suppose the overflow Pi, is caused by a reference to M . Then just before the overflow from Mi, P,' exists in Miand (Si(mj))'n Sf = 0 i.e. the information in the last page of M' is not in Mi. This means that the last page of Miis not P j ; thus, the overflow page Pi, finds its corresponding page still in MJ after an overflow from Mioccurs. Part (b) of the Proof. m, 5 mi * 3r, t, such that an overflow from M idoes not find its corresponding page in M'. From Theorem 1, m; 5 mi * 3r, t, (Mi)' 2M:; then there exists P f , E M l and P,' 6Z MtJ.We can find a reference string such that a t the time of the overflow of Pi, from M i , P,' is still not in MJ. A string of references to M' will produce this condition. Then at the time of overflow of PL,, it will not find its corresponding page in M' . Q.E.D. THEOREM 6. Under GLOBAL-LRU-DOP, for mi 2, Vr, t, an overflow from M i finds its corresponding page in MJiffmj > 2mi. COROLLARY. 
Under GLOBAL-LRU-DOP, for mi 2 2, V r , t, an overflow from M ifinds its corresponding page in MJ implies that V r , t, 2 Mt. PROOF. This proof has two parts as shown below. Part ( a ) of the Proof. m; > 2mi * V r , t, an overflow from M i finds its corresponding page in M'. Theorem 4 ensures that m; > 2mi + V r , t, ( M i ) i2 M,' and Lemma 4.1 ensures that (StJ(m;))i n S,' = 0 we then use the same argument as in part (a) of the proof of Theorem 5. Part (b) of the Proof. mj 5 2mi * 3r, t, such that an overflow from M idoes not find its corresponding page in M J . ACM Transactions on Database Systems,Vol. 4, No. 3, September 1979.
Case 1. m; < 2mi. m; < 2mi 3r, t, (Mt’)’2 M j (from part (a) of the proof of Theorem 4). We then use the same argument as in part (b) of the proof of Theorem 5. Case 2. m; = 2mi. The reference string r = “Pfa,Pin, ..., Pf2m,)a,Pfzmi+l)a” will produce the following stacks (at t = 2mi + 1): St‘= (Pf2mi)a, Pf2m,-l)a,*
st’
= ( P i i ,Pimi,Pi,+
Ptm;+l)a),
P&-1,
..., P i , PiI+l).
In handling the next reference to page Pfzmi+l)a,the pages P t m , + l ) a and PA,+, overflow at the same time; hence the overflow page P f m i + l ) a from M i does not Q.E.D. find its corresponding page in M’. T HE O R E M 7. Let M i (with mipages),M’ (with m’pages), and M be system A. Let M f i (with mI pages),M’J (with mj pages), and M‘ be system B. Let m: 2 mi and mj 2 mi. Under GLOBAL-LRU-SOP, for any mi I2, no MLPA can exist if m; > mi and mj > mi. PROOF. We shall show that Qr, t, ( M j U ( M ) ) i ) ( M i i U (Mt”)i).This will ensure that no MLPA can exist. Since mI P mi and LRU is used in M i and M’i, we can apply the LRU stack inclusion property to obtain M i c M i L .From Theorem 5 we know that overflows from M i or from M” always find their corresponding pages in MJ and M’J, respectively. Since SOP is used, these overflows can be treated as “no-ops.” Thus MJ and M” see the same reference string, and we can apply the LRU stack inclusion property to obtain Mt’ C Mt‘J (since mj z mj and LRU is used). M,‘ 5 Mt‘i and Mt’ C ML’* ( M i U ( M i ) ’ )_C (ML’ U Q.E.D. THEOREM 8. Let system A and system B be defined as in Theorem 7. Let mi 2 mi and mj Im;. Under GLOBAL-LRU-DOP, for any mi z 2, no MLPA can exist i f m, > 2mi and mj > 2mI. PROOF. We need the following preliminary results for this proof. LEMMA 8.1. Let S,‘ bepartitioned i d o two disjoint stacks, Wt and Vt defined as follows: W t ( k )= S i ( j ~for ) k = 1, .. . , I W,I where j o = 0, and jk is the minimum j , > j k - 1 such that 3 Pf, E St‘ and P f , Sicjk). Vt(k)= SIJ(j , ) for k = 1, . . . , I Vt1 where j o= 0, andjk is the minimum jk >jk-l such thatQP:, E Sl, P4, StJ(jk). (Intuitively, W, is the stack obtained from StJby collecting those pages that have their corresponding pages in M j such that the order of these pages in St‘ &preserved. Vt is what is left of StJafter Wl is formed.) 
Then Vr, t, ( a ) Wl f and ( b ) Vt 0,where 0,is the set ofpages corresponding to all the pages that ever overflowed from M i , up to time t. PROOF OF LEMMA 8.1. From Theorem 4, m; > 2m, * Vr, t, (Mi’)i2 Mt’. Thus for each page in Mt‘, its corresponding page is in Mi‘. This set of pages in Mi‘ is exactly Wt, and Wt f St‘by definition. Since the conditions for Vt and Wt are mutually exclusive and collectively exhaustive, the other pages in M i that are not in Wt are by definition in V t . Since a page in Vt does not have a corresponding page in Mt’, its corresponding page must have once been in M i because of readthrough, and later overflowed from M‘. Thus a page in V, is a page in 0,. Q.E.D. LEMMA 8.2. Any overflow page from M,’ is a page in V,.
st‘
PROOF OF LEMMA 8.2. From Theorem 4, m, > 2m, * Vr, t, (Mi)' 2 M i . From Theorem 6, m, > 2m, Vr, t, an overflow from M' always finds its corresponding page in MJ. An overflow from M i is caused by a reference to M . An overflow from M l also implies that there is an overflow from Mr'. Suppose the overflow page from Mi is Pd. Also suppose P,' E W t ,i.e. Pd sf Vt.We s h d show that this leads to a contradiction. The overflow page from Mr' is eitherpi, OrP;, (r # 0). If Pi, P,' is overflowed from Mi, Theorem 6 is violated since Pf, and P,' overflow at the same time, so Pi, will not find its corresponding page in MJ. If P;a 4 Pd is overflowed from M,",Theorem 4 is violated since after the overflow handling, there exists a page Ph, P,' in M' (since P,/E W t ) ;but P,' is no longer Q.E.D. in MJ. LEMMA 8.3. If there is no overflow from either MJ or M" then Vr, t, Vt, and V; have the same reverse ordering. Two stacks S and SJ are in the same reverse ordering, S' g S', if rS'(k) = r S J ( k )for 15 k 5 min( 1 S' I , I SJI ), where rS denotes the stack obtained from S by reversing its ordering. By convention, S g SJ i f S' = 0or SJ = 0. PROOF OF LEMMA 8.3 To facilitate the proof, we introduce the following definitions. (1) The orderedparent stuck ( S ' ) ,of the stack S' is the stack of parent pages corresponding to, and in the same ordering as, the pages in the reduced stack of S'. Formally, (S'),a and ( S ' ) l S S'. (2) Define a new binary operator, concatenation (II), between two stacks, S' and S2,to produce a new stack S as follows:
$$S = S^1 \,\Vert\, S^2, \quad \text{where } S(k) = \begin{cases} S^1(k) & \text{for } k = 1, 2, \ldots, |S^1|, \\ S^2(k - |S^1|) & \text{for } k = |S^1| + 1, \ldots, |S^1| + |S^2|. \end{cases}$$
(3) Define a new binary operator, ordered difference ($\ominus$), between a stack $S^1$ and a set $T$ to produce a new stack $S$ as follows:

$$S = S^1 \ominus T, \quad \text{where } S(k) = S^1(j_k) \text{ for } k = 1, 2, \ldots, (|S^1| - |S^1 \cap T|),$$

such that $j_0 = 0$ and $j_k$ is the minimum $j > j_{k-1}$ such that $\{S^1(j)\} \cap T = \emptyset$. Intuitively, $S$ is obtained from $S^1$ by taking away those elements of $S^1$ which are also in $T$.

Figure 11 illustrates the LRU ordering of all level $i$ pages ever referenced up to time $t$. Since there is no overflow from either $M^j$ or $M'^j$, the length of this LRU stack is less than or equal to $\min(m_j, m'_j)$. By the definition of $V'_t$, $V'_t = (Y_t)^j \ominus (S'^i_t)^j$. But $(S'^i_t)^j = (S_t^i)^j \,\Vert\, ((X_t)^j \ominus (S_t^i)^j)$; hence,
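$$\begin{aligned}
V_t &= ((X_t)^j \ominus (S_t^i)^j) \,\Vert\, (((Y_t)^j \ominus (X_t)^j) \ominus (S_t^i)^j) \\
&= ((X_t)^j \ominus (S_t^i)^j) \,\Vert\, ((Y_t)^j \ominus ((S_t^i)^j \cup (X_t)^j)) \\
&= ((X_t)^j \ominus (S_t^i)^j) \,\Vert\, ((Y_t)^j \ominus (S'^i_t)^j) \\
&= ((X_t)^j \ominus (S_t^i)^j) \,\Vert\, V'_t.
\end{aligned}$$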
Thus the two stacks are in the same reverse ordering. Q.E.D.

LEMMA 8.4. $\forall r, t$: (a) $M'^j_t \supseteq M_t^j$, and (b) $V_t$ and $V'_t$ are either in the same reverse ordering or the last element of $V'_t$ is not an element of $V_t$.

PROOF OF LEMMA 8.4. (a) and (b) are true for any time before there is any overflow from either $M^j$ or $M'^j$. (a) is true because any page ever referenced is in level $j$, so a page found in $M^j$ is also found in $M'^j$. (b) is true because of the result from Lemma 8.3.

Assume that (a) and (b) are true for $t$. Consider the next reference at $t + 1$. Suppose this reference does not produce any overflow from either $M^j$ or $M'^j$; then (a) still holds because $M'^j \supseteq M^j$ and $M'^i \supseteq M^i$ (see Theorem 7). (b) still holds because overflows from $M^j$ and $M'^j$ are taken from the end of stacks $V_t$ and $V'_t$, respectively, and since there is no overflow from level $j$, (b)'s validity is not disturbed.

Suppose this reference does produce overflow(s) from level $j$.

Case 1. Overflow from $M'^j$; no overflow from $M^j$: This cannot happen, since overflow from $M'^j$ implies reference to $M^k$, which in turn implies overflow from $M^j$ also.

Case 2. Overflow from $M^j$; no overflow from $M'^j$:
1. Suppose the last element in $V'_t$ is not an element of $V_t$. Then starting from the end of $V'_t$, if we eliminate those elements not in $V_t$, the two stacks will be in the same reverse ordering. This follows from Lemma 8.3 and is illustrated in Figure 12. Thus we see that overflow from $M^j$, i.e., overflowing the last page of $V_t$, will not violate (a) since this page is still in $V'_t$. (b) is still preserved since the last page in $V'_t$ is still not in $V_t$.
2. Suppose $V'_t$ and $V_t$ are in the same reverse ordering. Then overflowing the last page of $V_t$ does not violate (a) and results in the last page of $V'_t$ not in $V_t$.

Fig. 11. LRU ordering of all level $i$ pages ever referenced up to time $t$ (from most recently referenced to least recently referenced), with segments $X_t$ and $Y_t$.

Fig. 12.

Case 3. Overflow from $M^j$ and overflow from $M'^j$:
1. Suppose the last element in $V'_t$ is not in $V_t$. Referring to the diagram in Case 2, we see that the result of overflowing the last element of $V'_t$ and the last element of $V_t$ does not violate (a) and still preserves the condition that the last element of $V'_t$ is not in $V_t$.
2. Suppose $V'_t$ and $V_t$ are in the same reverse ordering. Then overflowing the last elements of $V'_t$ and $V_t$ leaves $V'_t$ and $V_t$ still in the same reverse ordering. (a) is not violated since the same page is overflowed from $M'^j$ and $M^j$. Q.E.D.

PROOF OF THEOREM 8. $M'^i \supseteq M^i$ for the same reasons as those used in Theorem 7. From Lemma 8.4, $M'^j \supseteq M^j$. Hence

$$(M^i \cup (M^j)^i) \subseteq (M'^i \cup (M'^j)^i).$$

Q.E.D.
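The stack operators introduced in the proof of Lemma 8.3 (concatenation, ordered difference, and the "same reverse ordering" relation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: stacks are modeled as Python lists with the most recently referenced entry first, the function names are hypothetical, and the page labels in the example are invented. The example checks, on one small instance, that the stack built as in the derivation of Lemma 8.3 ends with $V'_t$ and is therefore in the same reverse ordering.

```python
from typing import Hashable, Iterable, List, Sequence


def concat(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> List[Hashable]:
    """Concatenation: all entries of s1 followed by all entries of s2."""
    return list(s1) + list(s2)


def ordered_difference(s1: Sequence[Hashable], t: Iterable[Hashable]) -> List[Hashable]:
    """Ordered difference: s1 with every entry that is also in t removed,
    keeping the surviving entries in their original order."""
    excluded = set(t)
    return [page for page in s1 if page not in excluded]


def same_reverse_ordering(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> bool:
    """True if the two stacks agree entry by entry when read from the end,
    over the length of the shorter stack. An empty stack matches anything."""
    n = min(len(s1), len(s2))
    if n == 0:
        return True
    return list(s1)[-n:] == list(s2)[-n:]


# Invented example data (assumption): level-j parent stacks for the pages
# held in the smaller level i (s_par) and for the segments X_t and Y_t.
s_par = ["A", "B"]
x_par = ["B", "C"]
y_par = ["C", "A", "D", "E"]

# Parents of the pages held in the larger level i: S || (X - S).
s_prime_par = concat(s_par, ordered_difference(x_par, s_par))

# V'_t = Y - S', and V_t = (X - S) || ((Y - X) - S).
v_prime = ordered_difference(y_par, s_prime_par)
v = concat(
    ordered_difference(x_par, s_par),
    ordered_difference(ordered_difference(y_par, x_par), s_par),
)

print(v, v_prime, same_reverse_ordering(v, v_prime))
```

In this instance `v` equals `concat(ordered_difference(x_par, s_par), v_prime)`, which mirrors the last line of the derivation; a single example of course illustrates the identity rather than proving it.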
4. CONCLUSIONS
We have developed a model of a data storage hierarchy system specifically designed for very large databases. This data storage hierarchy makes use of different page sizes across storage levels and maintains multiple copies of the same information in the hierarchy. Four algorithms obtained from natural extensions to the LRU algorithm are studied in detail, and key properties of these algorithms that affect performance and reliability of the data storage hierarchy are derived. It is found that for the LOCAL-LRU algorithms, no choice of sizes for the storage levels can guarantee that a lower storage level always contains all the information in the higher storage levels. For the GLOBAL-LRU algorithms, by choosing appropriate sizes for the storage levels, we can (1) ensure the above inclusion property to hold at all times, (2) guarantee that no extra page references to lower storage levels are generated as a result of handling overflows, and (3) guarantee that no multilevel paging anomaly can exist.

Several areas of further study emerge from this investigation. These include the study of store-behind algorithms [la] and the study of extensions to other known storage management algorithms. We hope that this study motivates further work in the area of generalized data storage hierarchy systems for very large databases.

ACKNOWLEDGMENT
The authors would like to thank Mike Abraham, Sid Huff, and Ken Yip for reviewing an earlier version of this paper, and the referees for their editorial comments.

REFERENCES
1. ARORA, S.R., AND GALLO, A. Optimal sizing, loading and re-loading in a multi-level memory hierarchy system. Proc. AFIPS 1971 SJCC, Vol. 38, AFIPS Press, Montvale, N.J., pp. 337-344.
2. BELADY, L.A. A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5, 2 (1966), 78-101.
3. BELADY, L.A., NELSON, R.A., AND SHEDLER, G.S. An anomaly in space-time characteristics of certain programs running in a paging machine. Comm. ACM 12, 6 (June 1969), 349-353.
4. CHEN, P.P.S. Optimal file allocation in multi-level storage systems. Proc. AFIPS 1973 NCC, Vol. 42, AFIPS Press, Montvale, N.J., pp. 277-282.
5. CONTI, C.J. Concepts for buffer storage. IEEE Computer Group News, March 1969, 6-13.
6. DENNING, P.J. Virtual memory. Computg. Surveys 2, 3 (Sept. 1970), 153-189.
7. FRANASZEK, P.A., AND BENNETT, B.T. Adaptive variation of the transfer unit in a storage hierarchy. IBM J. Res. and Develop. 22, 4 (March 1978), 405-412.
8. FRANKLIN, M.A., GRAHAM, G.S., AND GUPTA, R.K. Anomalies with variable partition paging algorithms. Comm. ACM 21, 3 (March 1978), 232-236.
9. GREENBERG, B.S., AND WEBBER, S.H. MULTICS multilevel paging hierarchy. IEEE INTERCON, Session 20/3, New York, April 8-10, 1975.
10. HATFIELD, D.J. Experiments on page size, program access patterns, and virtual memory. IBM J. Res. and Develop. 16, 1 (Jan. 1972), 58-66.
11. HATFIELD, D.J., AND GERALD, J. Program restructuring for virtual memory. IBM Syst. J. 10, 3 (1971), 168-192.
12. HSIAO, D.K., AND MADNICK, S.E. Data base machine architecture in the context of information technology evolution. Proc. Very Large Data Base Conf., Tokyo, Japan, Oct. 1977, pp. 63-84.
13. JOHNSON, C. IBM 3850-Mass storage system. IEEE INTERCON, Session 20/2, New York, April 8-10, 1975.
14. JOHNSON, J. Program restructuring for virtual memory systems. M.I.T. Project MAC TR-148, M.I.T., Cambridge, Mass., Mar. 1975.
15. LAM, C.Y., AND MADNICK, S.E. INFOPLEX data base computer architecture-concepts and directions. Working Paper Sloan School of Management, No. 1046-79, M.I.T., Cambridge, Mass., 1979.
16. LUM, V.Y., SENKO, M.E., WANG, C.P., AND LING, H. A cost oriented algorithm for data set allocation in storage hierarchies. Comm. ACM 18, 6 (June 1975), 318-322.
17. MADNICK, S.E. Storage hierarchy systems. M.I.T. Project MAC TR-105, M.I.T., Cambridge, Mass., 1973.
18. MADNICK, S.E. INFOPLEX-hierarchical decomposition of a large information management system using a microprocessor complex. Proc. AFIPS 1975 NCC, Vol. 44, AFIPS Press, Montvale, N.J., pp. 581-586.
19. MADNICK, S.E. Design of a general hierarchical storage system. IEEE INTERCON, Session 20/1, New York, April 8-10, 1975.
20. MADNICK, S.E. The INFOPLEX database computer: Concepts and directions. Proc. IEEE Comptr. Conf., Feb. 26, 1979, pp. 168-176.
21. MATTSON, R.L., GECSEI, J., SLUTZ, D.R., AND TRAIGER, I.L. Evaluation techniques for storage hierarchies. IBM Syst. J. 9, 2 (1970), 78-117.
22. RAMAMOORTHY, C.V., AND CHANDY, K.M. Optimization of memory hierarchies in multiprogrammed systems. J. ACM 17, 3 (July 1970), 426-445.

Received December 1978; revised April 1979