Available at www.ElsevierMathematics.com

Mathematical and Computer Modelling 38 (2003) 1373-1379
www.elsevier.com/locate/mcm

Optimal Backup Policy for a Database System with Incremental and Full Backups

S. NAKAMURA
Systems Division, The Bank of Nagoya
Nagoya, 468-0003, Japan

C. QIAN
Department of Industrial Engineering, Aichi Institute of Technology
Toyota, 470-0392, Japan

S. FUKUMOTO
Department of Information Network Engineering, Aichi Institute of Technology
Toyota, 470-0392, Japan

T. NAKAGAWA
Department of Industrial Engineering, Aichi Institute of Technology
Toyota, 470-0392, Japan

Abstract-The fundamental recovery technique for some medium failures in a database system is regularly carried out by executing a full backup that takes all copies of updated files. However, the overhead of such a full backup sometimes becomes very large in a massive database system. To lessen the overhead of backups, an incremental backup with small overhead, which takes only copies of newly updated files, is usually adopted in most database systems. However, the overhead of an incremental backup increases in proportion to the total amount of updated files. It would be necessary to determine when to make full backups. This paper proposes a stochastic model with incremental and full backups: the expected costs incurred for two backups are obtained, and an optimal full backup interval which minimizes them is discussed. It is shown that an optimal interval is given by a finite and unique solution of an equation under suitable conditions. Finally, an optimal interval is given by a numerical example and some useful discussions are made. © 2003 Elsevier Ltd. All rights reserved.

Keywords-Database system, Media fault, Incremental backup, Full backup, Expected cost.

1. INTRODUCTION

In recent years, databases in computer systems have become of great importance in the modern, highly information-oriented society. A reliable database is the most indispensable instrument in an OLTP (on-line transaction processing) system, used especially for bank accounts. For instance, some errors in the online system of a bank might cause social confusion even for a short time, and occasionally, the bank might lose valuable public confidence. Therefore, some recovery techniques in a database management system have to be prepared prior to such problems, using fault-tolerant technologies [1].

0895-7177/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi: 10.1016/S0895-7177(03)00351-0

It is a serious problem that data files in secondary media are sometimes corrupted by faults due to noise, human errors, and hardware failures. In this case, we have to reconstruct the same files from the beginning. The simplest and most dependable method to ensure the safety of data is always to take backup copies of all files in other places, and to retrieve them if files in the original secondary media are corrupted. However, this method takes many hours and is costly when files are large. To make backup copies efficiently, we might dump only files that have changed since the last backup. This would significantly lessen both the duration and the size of the backup [2]. The recovery techniques for database failures [3-7] and the backup schemes for corrupted hard disks [8] have been extensively studied.

When a complete full backup copy is repeated frequently, all the images of the database can be secured; however, its operation cost is very high. Thus, we suggest the following backup policy which ensures the safety of data and saves time: a full backup is carried out at scheduled times, and between these backups, an incremental backup, which takes all copies of files newly updated since a full backup, is done. That is, a full backup with large overhead is done at long intervals and an incremental one with small overhead is done at short intervals.

In this paper, we formulate a stochastic backup model which combines incremental and full backups. An incremental backup is made at periodic times and a full backup is done when either the number of incremental backups or the amount of updated files exceeds a certain threshold value. Then, we introduce the costs incurred for the overheads of the two backups and obtain the expected cost rate between full backups. Further, we discuss an optimal interval for a full backup that minimizes the expected cost rate and compute it in a numerical example.

2. BACKUP MODEL

We consider an incremental backup which takes all copies of trucks (see [1]) newly updated since the previous full backup. The overhead increases with the number of newly updated trucks. For example, if all updated trucks are included in the previously updated ones, then the amount of data transfer is the same as the previous one. However, if some updated trucks are different from the previous ones, then the amount of data transfer increases by their differences. Taking out the copies of the previous backup can make the recovery of a database easy and rapid when some errors have occurred in storage media.

An important problem in backup schemes is when to create a full backup. We want to reduce the number of full backups (with large overheads); however, the overhead of an incremental backup increases with the number of newly updated trucks. From this point of view, we should decide a full backup interval by comparing the overheads of the two backups.

It is well known that when the amount of updated trucks exceeds a threshold level K, the overhead of an incremental backup is larger than that of a full backup. The value of K/M is about 60% in a usual online system, where M is the total number of trucks in a database. Thus, if the amount of updated trucks exceeds a level K, we should make a full backup instead of an incremental backup.

From the above discussions, we formulate the following stochastic backup model: an incremental backup is made at scheduled times and a full backup is done at the Nth cycle (N = 1, 2, ...) of an incremental one or when the amount of updated files exceeds a level K, whichever occurs first. Then, we discuss an optimal interval number N* which minimizes the total overhead.

It is assumed that the random variables $W_j$ are independent and represent the amount of trucks newly updated at the jth backup. It is easily seen that $W_1$ is usually stochastically greater than $W_j$ (j = 2, 3, ...), and hence, has a different probability distribution from $W_j$.
We define

$$G(x) \equiv \Pr\{W_1 \le x\}, \tag{1}$$

and

$$Q(x) \equiv \Pr\{W_j \le x\} \qquad (j = 2, 3, \ldots, N). \tag{2}$$


Noting that the total trucks updated at the jth backup are given by $Z_j \equiv \sum_{i=1}^{j} W_i$, the distribution of $Z_j$ is

$$F_j(x) \equiv \Pr\{Z_j \le x\} = G * Q^{(j-1)}(x) \qquad (j = 1, 2, \ldots, N), \tag{3}$$

where $Q^{(i)}(x)$ is the i-fold Stieltjes convolution of $Q(x)$ with itself and $Q^{(0)}(x) \equiv 1$, and the asterisk mark represents the Stieltjes convolution of $G(x)$ and $Q(x)$, i.e., $G * Q(x) \equiv \int_0^x Q(x - u)\, dG(u)$.
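As a rough numerical illustration of (3), the sketch below estimates $F_j(K) = \Pr\{Z_j \le K\}$ by Monte Carlo simulation instead of evaluating the convolution. The function name `estimate_F`, the sampler arguments, and the exponential rates in the usage lines are assumptions made for this example only; the model itself leaves $G$ and $Q$ general at this point.

```python
import math
import random

def estimate_F(K, N, draw_first, draw_later, trials=20000, seed=0):
    """Monte Carlo estimate of F_j(K) = Pr{Z_j <= K} for j = 1..N.

    draw_first(rng) samples W_1 (distribution G); draw_later(rng) samples
    W_j for j >= 2 (distribution Q). Entry [j-1] of the result estimates F_j(K).
    """
    rng = random.Random(seed)
    hits = [0] * N
    for _ in range(trials):
        total = 0.0
        for j in range(N):
            total += draw_first(rng) if j == 0 else draw_later(rng)
            if total > K:
                break  # Z_j only grows with j, so later backups exceed K too
            hits[j] += 1
    return [count / trials for count in hits]

# Illustrative (assumed) choice: exponential amounts with rates 1.0 and 2.0.
F = estimate_F(K=3.0, N=5,
               draw_first=lambda rng: rng.expovariate(1.0),
               draw_later=lambda rng: rng.expovariate(2.0))
exact_F1 = 1.0 - math.exp(-1.0 * 3.0)  # F_1(K) = G(K) in the exponential case
print(F, exact_F1)
```

Because every sample path is reused for all j, the estimates are nonincreasing in j by construction, which mirrors $F_{j}(K) \le F_{j-1}(K)$.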

3. EXPECTED COST

Introduce the following costs: the cost function $c(x)$ is the cost incurred for the overhead of an incremental backup when the amount of updated trucks is x, and a constant cost $c_f$ is the cost incurred for the overhead of a full backup. Then, the expected cost of the jth incremental backup, when the amount of updated trucks has not exceeded a level K, is

$$h_j \equiv \frac{\int_0^K c(x)\, dF_j(x)}{F_j(K)} \qquad (j = 1, 2, \ldots, N - 1). \tag{4}$$

Next, suppose that backups are scheduled to be made periodically at a unit of time, and consider one cycle from a full backup to the next one. Recall that a full backup is made at the Nth backup if the amount of updated trucks does not exceed K, or at the jth backup (j = 1, 2, ..., N - 1) if the amount exceeds K for the first time. Then, the total expected cost of one cycle is

$$H \equiv \sum_{j=1}^{N-1} \left[F_{j-1}(K) - F_j(K)\right] \left(\sum_{i=1}^{j-1} h_i + c_f\right) + F_{N-1}(K) \left(\sum_{i=1}^{N-1} h_i + c_f\right) = c_f + \sum_{j=1}^{N-1} h_j F_j(K) \qquad (N = 1, 2, \ldots), \tag{5}$$

where $F_0(K) \equiv 1$ and $\sum_{i=1}^{0} \equiv 0$. Further, the mean time of one cycle is

$$L \equiv \sum_{j=1}^{N-1} j \left[F_{j-1}(K) - F_j(K)\right] + N F_{N-1}(K) = \sum_{j=0}^{N-1} F_j(K) \qquad (N = 1, 2, \ldots). \tag{6}$$

Thus, the expected cost rate of one cycle is, from (5) and (6),

$$C(N) \equiv \frac{H}{L} = \frac{c_f + \sum_{j=1}^{N-1} h_j F_j(K)}{\sum_{j=0}^{N-1} F_j(K)} = \frac{c_f + \sum_{j=1}^{N-1} \int_0^K c(x)\, dF_j(x)}{\sum_{j=0}^{N-1} F_j(K)} \qquad (N = 1, 2, \ldots). \tag{7}$$

It is evident that $C(1) = c_f$.
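The expected cost rate in (7) is a ratio of two finite sums, so it is straightforward to evaluate once the values $h_j$ and $F_j(K)$ are known. The following is a minimal sketch; the function name `cost_rate` and the list-based inputs (index j holds $h_j$ and $F_j(K)$, with $h_0 = 0$ and $F_0(K) = 1$) are conventions assumed for this illustration, and the numbers in the usage lines are made up.

```python
def cost_rate(N, c_f, h, F):
    """Expected cost rate C(N) of (7).

    h[j] and F[j] hold h_j and F_j(K) for j = 0..N-1, with h[0] = 0 and
    F[0] = 1 by the conventions of the model.
    """
    if N < 1 or len(h) < N or len(F) < N:
        raise ValueError("need N >= 1 and values for j = 0..N-1")
    expected_cost = c_f + sum(h[j] * F[j] for j in range(1, N))  # H in (5)
    mean_time = sum(F[j] for j in range(N))                      # L in (6)
    return expected_cost / mean_time

# Made-up inputs, only to show the call pattern.
h_demo = [0.0, 2.0, 3.0]
F_demo = [1.0, 0.5, 0.25]
print([cost_rate(N, 10.0, h_demo, F_demo) for N in (1, 2, 3)])
```

For N = 1 both sums are empty apart from $F_0(K) = 1$, so the function returns $c_f$, in agreement with $C(1) = c_f$.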

4. OPTIMAL FULL BACKUP INTERVAL

We seek an optimal interval number N* which minimizes the expected cost rate C(N) in (7). From the inequality $C(N + 1) - C(N) \ge 0$, we have

$$\sum_{j=0}^{N-1} (h_N - h_j) F_j(K) \ge c_f, \tag{8}$$

where $h_0 \equiv 0$. Letting $L(N)$ denote the left-hand side of (8), we easily have

$$L(1) = h_1, \qquad L(\infty) \equiv \lim_{N \to \infty} L(N) = h_\infty \sum_{j=0}^{\infty} F_j(K) - \sum_{j=0}^{\infty} h_j F_j(K), \qquad L(N + 1) - L(N) = (h_{N+1} - h_N) \sum_{j=0}^{N} F_j(K), \tag{9}$$

where $h_\infty \equiv \lim_{j \to \infty} h_j$. If $h_j$ is strictly increasing in j, then $L(N)$ is also strictly increasing from $L(1)$ to $L(\infty)$. In this case, we have the following optimal policy.

(i) If $L(1) \ge c_f$ then N* = 1, i.e., we should make only a full backup.
(ii) If $L(1) < c_f < L(\infty)$ then there exists a finite and unique N* (1 < N* < ∞) which satisfies (8), and the resulting expected cost is given in (7).
(iii) If $L(\infty) \le c_f$ then N* = ∞, i.e., a full backup is made only when the amount of updated trucks exceeds K.
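The policy above amounts to scanning N = 1, 2, ... and stopping at the first N for which the left-hand side of (8) reaches $c_f$. The sketch below does exactly that for finitely many supplied values; the names `lhs_of_8` and `optimal_interval`, the list layout (index j holds $h_j$ and $F_j(K)$), and the demo numbers are assumptions for illustration. It relies on $h_j$ being strictly increasing, as the policy does.

```python
def lhs_of_8(N, h, F):
    """L(N) = sum_{j=0}^{N-1} (h_N - h_j) F_j(K), the left-hand side of (8)."""
    return sum((h[N] - h[j]) * F[j] for j in range(N))

def optimal_interval(c_f, h, F):
    """Smallest N with L(N) >= c_f, or None if no supplied N satisfies (8).

    Returning None covers case (iii) only approximately: it means the
    inequality was not reached within the finite lists that were passed in.
    """
    n_max = min(len(h) - 1, len(F))  # L(N) needs h[N] and F[0..N-1]
    for N in range(1, n_max + 1):
        if lhs_of_8(N, h, F) >= c_f:
            return N
    return None

# Made-up, strictly increasing h_j and decreasing F_j(K).
h_demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
F_demo = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
print(optimal_interval(3.0, h_demo, F_demo))
```

A small $c_f$ makes the scan stop at N = 1 (case (i)), while a very large $c_f$ exhausts the supplied values and yields `None`.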

5. NUMERICAL EXAMPLE

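Suppose that $c(x) = c_0 x$, $G(x) = 1 - e^{-\mu_1 x}$ and $Q(x) = 1 - e^{-\mu x}$, where $\mu \ge \mu_1$ because the amount of trucks updated at the first backup is greater than that at the subsequent ones. Letting $\alpha \equiv (\mu - \mu_1)/\mu$ $(0 \le \alpha < 1)$, we easily have

$$F_j(x) = \sum_{i=j}^{\infty} \frac{(\mu x)^i}{i!} e^{-\mu x} \left(1 - \alpha^{i-j+1}\right) \qquad (j = 1, 2, \ldots),$$

$$h_j = \frac{c_0}{\mu} \cdot \frac{\sum_{i=j+1}^{\infty} \left[(\mu K)^i / i!\right] \sum_{l=j}^{i-1} l \alpha^{l-j} (1 - \alpha)}{\sum_{i=j}^{\infty} \left[(\mu K)^i / i!\right] \left(1 - \alpha^{i-j+1}\right)} \qquad (j = 1, 2, \ldots),$$

and

$$C(N) = \frac{c_f + (c_0/\mu) \sum_{j=1}^{N-1} \sum_{i=j+1}^{\infty} \left[(\mu K)^i / i!\right] e^{-\mu K} \sum_{l=j}^{i-1} l \alpha^{l-j} (1 - \alpha)}{1 + \sum_{j=1}^{N-1} \sum_{i=j}^{\infty} \left[(\mu K)^i / i!\right] e^{-\mu K} \left(1 - \alpha^{i-j+1}\right)}. \tag{10}$$

It is proved in Appendix A that $h_j$ is strictly increasing to $h_\infty = c_0 K$. Further, after dividing equation (8) by $c_0/\mu$, it becomes

$$\frac{\sum_{i=N+1}^{\infty} \left[(\mu K)^i / i!\right] \sum_{l=N}^{i-1} l \alpha^{l-N} (1 - \alpha)}{\sum_{i=N}^{\infty} \left[(\mu K)^i / i!\right] \left(1 - \alpha^{i-N+1}\right)} \left[1 + \sum_{j=1}^{N-1} \sum_{i=j}^{\infty} \frac{(\mu K)^i}{i!} e^{-\mu K} \left(1 - \alpha^{i-j+1}\right)\right] - \sum_{j=1}^{N-1} \sum_{i=j+1}^{\infty} \frac{(\mu K)^i}{i!} e^{-\mu K} \sum_{l=j}^{i-1} l \alpha^{l-j} (1 - \alpha) \ge \frac{c_f}{c_0/\mu}. \tag{11}$$

Letting $\tilde{L}(N)$ denote the left-hand side of (11), it is strictly increasing in N, and

$$\tilde{L}(1) = \frac{\mu}{\mu_1} \left[1 + \mu_1 K - \frac{\mu_1 K}{1 - e^{-\mu_1 K}}\right], \tag{12}$$

$$\tilde{L}(\infty) \equiv \lim_{N \to \infty} \tilde{L}(N) = \mu K \left[1 + \frac{\mu K}{2} - \frac{\mu - \mu_1}{\mu_1} \left(1 - e^{-\mu_1 K}\right)\right] + \frac{\mu (\mu - \mu_1)}{\mu_1^2} \left[1 - (1 + \mu_1 K) e^{-\mu_1 K}\right]. \tag{13}$$

Thus, we have the following optimal policy.

(i)' If $\tilde{L}(1) \ge c_f/(c_0/\mu)$ then N* = 1.
(ii)' If $\tilde{L}(1) < c_f/(c_0/\mu) < \tilde{L}(\infty)$, then there exists a finite and unique minimum N* (1 < N* < ∞) which satisfies (11), and the resulting cost is given in (10).
(iii)' If $\tilde{L}(\infty) \le c_f/(c_0/\mu)$ then N* = ∞.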
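To make the exponential case concrete, the sketch below evaluates $F_j(K)$ and $h_j$ from truncated Poisson-type series and then scans inequality (8), which is equivalent to (11) after division by $c_0/\mu$. The function names, the truncation point `i_max`, and the parameter values in the usage lines are all assumptions chosen for illustration; they are not the values of the paper's numerical example. Here $h_j$ is taken as the conditional expected cost $c_0 \int_0^K x\, dF_j(x) / F_j(K)$.

```python
import math

def poisson_terms(mu_K, i_max):
    """p_i = (mu K)^i e^{-mu K} / i! for i = 0..i_max, built iteratively."""
    p = [math.exp(-mu_K)]
    for i in range(1, i_max + 1):
        p.append(p[-1] * mu_K / i)
    return p

def F_and_h(j, K, mu, mu1, c0, i_max=200):
    """Return (F_j(K), h_j) for j >= 1 using series truncated at i_max."""
    alpha = (mu - mu1) / mu
    p = poisson_terms(mu * K, i_max)
    F = sum(p[i] * (1.0 - alpha ** (i - j + 1)) for i in range(j, i_max + 1))
    num = sum(
        p[i] * sum(l * alpha ** (l - j) * (1.0 - alpha) for l in range(j, i))
        for i in range(j + 1, i_max + 1)
    )
    return F, (c0 / mu) * num / F

def optimal_interval(K, mu, mu1, c0, c_f, n_max=30):
    """Smallest N <= n_max - 1 satisfying (8), or None if none is found."""
    Fs, hs = [1.0], [0.0]  # F_0(K) = 1 and h_0 = 0
    for j in range(1, n_max + 1):
        F, h = F_and_h(j, K, mu, mu1, c0)
        Fs.append(F)
        hs.append(h)
    for N in range(1, n_max):
        if sum((hs[N] - hs[j]) * Fs[j] for j in range(N)) >= c_f:
            return N
    return None

# Assumed illustrative parameters (not taken from the paper).
K, mu, mu1, c0, c_f = 5.0, 1.0, 0.5, 1.0, 4.0
print(optimal_interval(K, mu, mu1, c0, c_f))
```

Two quick consistency checks are available without extra data: the series for $F_1(K)$ should reproduce $G(K) = 1 - e^{-\mu_1 K}$, and $h_1/(c_0/\mu)$ should agree with the closed form (12).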
