Appl. Comput. Math. 1 (2002), no. 2, pp. 195-200
OPTIMAL LOCATION OF RESERVING FILES IN DISTRIBUTED COMPUTER SYSTEMS
URFAT NURIYEV †‡§
Abstract. In this work we study one of the possible approaches to optimizing the distribution of reserving files at the nodes of a network based on the reliability criterion. To estimate reliability we use an approach related to the notion of a “reliability order” parameter.
1. Introduction
One of the important aspects of computer network design is the problem of defining the main characteristics of a distributed computer-aided data bank (DCADB). Solving this problem raises another one: the optimal distribution of files among the nodes of the network according to various criteria. This problem arises from conflicting factors in the network [1-4,10]. On the one hand, it is desirable to store only one copy of each file at a single node of the network, which minimises the storage and operating costs. On the other hand, storing several copies of some files (in general, of all files) at different nodes of the network has advantages, among which we note the following:
- the cost of passing information over communication channels when these files are needed generally decreases;
- the time to access these files is generally shorter, which is important for networks operating in real time;
- the survivability of the DCADB increases when certain nodes of the network fail, because the copies held at the operating nodes can still be used.
So, depending on the priority given to the criteria (cost and reliability), one can obtain various optimal variants for locating files at network nodes. In the existing formulations of the optimal distribution problem the main optimization parameter is the operational cost (an economic criterion, reduced costs), which includes the costs of storing and handling arrays and of passing information through the system, as well as the net costs over a certain period of time. These formulations differ from one another only in the assumptions adopted and in the primary considerations at each level of generality. In some special networks the problem must be solved in real time, and an important peculiarity of real-time operation is that it raises the requirements for computer reliability.
In considering the operation of such a system, it is necessary to take into account the possibility that various nodes may fail, causing partial or full collapse of the operating system. Therefore it is natural to seek the most reliable real-time network operation in case of failure. In this work one of the possible approaches to optimizing the distribution of reserving files at nodes is considered on the basis of the reliability criterion.
†Institute of Applied Mathematics, Baku State University, Z. Khalilov str. 23, 370148, Baku, Azerbaijan, e-mail: [email protected]
‡The publication of this paper was supported by the grant of the Open Society Institute Zug Foundation.
§Manuscript received July 12, 2002.
APPL. COMPUT. MATH., VOL. 1, NO. 2, DECEMBER 2002
2. Mathematical Statement of the Problem
The reliability problem is generally treated probabilistically, and the corresponding probabilities for the system elements are assumed to be given. However, in practical problems obtaining such probability data is difficult and often impractical, so we use a different approach. To estimate reliability we use an approach related to the notion of the “reliability order” parameter. We assign a number $\varphi_j$ to every file and call it the “valuability” (preference, priority of importance) of the file. Suppose that the k-th copy of the j-th file has the valuability $\varphi_j(k)$, with $\varphi_j(k) \le \varphi_j(k-1)$ for all $k$.
Let us consider the problem of optimally locating reserving files at network nodes by the reliability criterion. Let m be the number of nodes that process n files in the applied problem. We aim to distribute these files and their copies among the computers so as to maximize the total valuability, subject to the following requirements:
I) The memory volume available to store files at each node is limited;
II) Each node stores at most one file with the given contents;
III) Each file is stored at at least one main node;
IV) Each node stores at least one file.
This problem (KSPV) is formulated as the following Boolean programming model:
\[ \sum_{i} \sum_{j} \sum_{k} \varphi_j(k)\, x^k_{ij} \to \max, \tag{1} \]
where
\[ x^k_{ij} = \begin{cases} 1, & \text{if the $k$-th copy of the $j$-th file is stored at the $i$-th node,} \\ 0, & \text{otherwise,} \end{cases} \]
$i = 1, 2, \ldots, m$, where $m$ is the total number of computers in the network, and $j = 1, 2, \ldots, n$, where $n$ is the total number of distinct files in the network, under the constraints
\[ x^k_{ij} = 1 \vee 0; \quad i = \overline{1,m}; \; k = \overline{1,m}; \; j = \overline{1,n}, \tag{2} \]
\[ \sum_{j=1}^{n} \sum_{k=1}^{m} l_j\, x^k_{ij} \le L_i, \quad i = \overline{1,m}, \tag{3} \]
where $l_j$ is the length of the j-th file and $L_i$ is the memory volume available to store files at the i-th node,
\[ \sum_{k=1}^{m} x^k_{ij} \le 1, \quad i = \overline{1,m}, \; j = \overline{1,n}, \tag{4} \]
\[ \sum_{i=1}^{m} x^0_{ij} \ge 1, \quad j = \overline{1,n}, \tag{5} \]
\[ \sum_{j=1}^{n} x^k_{ij} \ge 1, \quad i = \overline{1,m}. \tag{6} \]
Constraints (3)–(6) are equivalent to conditions (I)–(IV), respectively. Note that the index 0 in $x^0_{ij}$ indicates the original file; since this problem concerns reserving files, the variables $x^0_{ij}$ are not considered anywhere except in (5). Further we suppose that
a1) $\varphi_j(k)$ $(j = \overline{1,n},\ k = \overline{1,m})$, $l_j$ $(j = \overline{1,n})$ and $L_i$ $(i = \overline{1,m})$ are integers;
a2) $\max_{j=\overline{1,n}} \{l_j\} \le \max_{i=\overline{1,m}} \{L_i\}$;
a3) $\min_{j=\overline{1,n}} \{l_j\} \le \min_{i=\overline{1,m}} \{L_i\}$;
a4) $\sum_{j=1}^{n} l_j \ge \max_{i=\overline{1,m}} \{L_i\}$;
a5) $\sum_{j=1}^{n} l_j \le \sum_{i=1}^{m} L_i$.
It is clear that KSPV is a Knapsack system type problem.
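To make the model concrete, the following minimal Python sketch evaluates objective (1) for a given placement and checks constraints (3) and (4) together with condition IV. The function name `evaluate`, the representation of a placement as a set of `(i, j, k)` triples, and the reading of (6) as "every node holds at least one copy" are assumptions made for illustration only; constraint (5) on the originals is not modelled.

```python
def evaluate(assign, phi, lengths, capacities):
    """Return objective (1) for `assign`, or None if a constraint is violated.

    assign     : set of (i, j, k) triples meaning x^k_ij = 1
                 (0-based node i and file j, copy index k >= 1)  [assumed encoding]
    phi[j][k-1]: valuability of the k-th copy of file j
    lengths[j] : l_j, capacities[i] : L_i
    """
    m = len(capacities)
    load = [0] * m                      # memory used at each node
    copies_at = {}                      # (i, j) -> number of copies of file j at node i
    for i, j, k in assign:
        load[i] += lengths[j]
        copies_at[(i, j)] = copies_at.get((i, j), 0) + 1

    if any(load[i] > capacities[i] for i in range(m)):    # constraint (3)
        return None
    if any(count > 1 for count in copies_at.values()):    # constraint (4)
        return None
    if len({i for i, _, _ in assign}) < m:                # condition IV (assumed reading of (6))
        return None
    return sum(phi[j][k - 1] for _, j, k in assign)       # objective (1)


# Hypothetical data: 3 files, 2 nodes.
phi = [[6, 3], [9, 4], [8, 2]]
lengths = [2, 3, 4]
capacities = [5, 4]
print(evaluate({(0, 0, 1), (0, 1, 1), (1, 2, 1)}, phi, lengths, capacities))
```

A placement that overloads a node, for example putting all three files on node 0, is rejected by the capacity check.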
3. Knapsack System Problem
The Knapsack System Problem (KSP) with Boolean variables is a generalisation of the Knapsack problem with Boolean variables to the case of several Knapsacks to be filled, where each object can be placed in only one Knapsack. In the literature it is called the multiplicity Boolean Knapsack problem, the Knapsacks problem with Boolean variables, the multiple Knapsacks problem, the generalised Knapsack problem, etc. [9]. There are many possible applications of this problem, for example the problem of loading m ships with n containers, or a storage problem. There are also a number of problems in such areas as computational process management, optimal cutting, electronic systems design, etc., that can be reduced to the Knapsack system problem [5,6,9].
In the general case the KSP is formulated as follows [9]. Consider a set of n objects $A_n = \{a_1, a_2, \ldots, a_n\}$, each of which has the valuability $\varphi_j$ and weight $l_j$ $(j = \overline{1,n})$, and a set of m Knapsacks $B_m = \{b_1, b_2, \ldots, b_m\}$ $(m \le n)$ with the given capacities $L_i$ $(i = \overline{1,m})$. The Knapsack system problem with Boolean variables is to distribute objects among the Knapsacks so as to maximise the total usefulness of the selected objects without exceeding the capacity of any Knapsack. Suppose that the variable $x_{ij}$ takes the value 1 if the j-th object is placed in the i-th Knapsack, and 0 otherwise. Then problem (KSP) can be formalised as follows [9]: maximise
\[ \sum_{i=1}^{m} \sum_{j=1}^{n} \varphi_j\, x_{ij} \tag{7} \]
under the conditions
\[ x_{ij} = 0 \vee 1, \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n), \tag{8} \]
\[ \sum_{j=1}^{n} l_j\, x_{ij} \le L_i, \quad (i = 1, 2, \ldots, m), \tag{9} \]
\[ \sum_{i=1}^{m} x_{ij} \le 1, \quad (j = 1, 2, \ldots, n). \]
It is supposed that $\varphi_j$ $(j = \overline{1,n})$, $l_j$ $(j = \overline{1,n})$ and $L_i$ $(i = \overline{1,m})$ are positive integers, and that
\[ l_j \le \max_{i=\overline{1,m}} \{L_i\} \ \text{for } j = 1, 2, \ldots, n; \qquad L_i \ge \min_{j=\overline{1,n}} \{l_j\} \ \text{for } i = 1, 2, \ldots, m; \qquad \sum_{j=1}^{n} l_j > L_i \ \text{for } i = 1, 2, \ldots, m. \tag{10} \]
Notice that for m = 1 expressions (7)–(10) define the standard Knapsack problem with Boolean variables. The Knapsack system problem with Boolean variables was studied in [5,8]. Clearly, KSPV is a new variant of problem (7)–(10). In problem (1)–(6) the objective function has variable coefficients, because the valuability of an object decreases after each loading. In addition, the problem has further constraints of block type.
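For very small instances, the KSP (7)–(10) can be checked by plain enumeration. The sketch below is only an illustrative reference implementation, not a method from the cited literature: it tries every way of assigning each object to one Knapsack or to none, so its running time grows as $(m+1)^n$. The function name `ksp_brute_force` and the sample data are hypothetical.

```python
from itertools import product


def ksp_brute_force(values, weights, capacities):
    """Exhaustively solve a tiny KSP instance.

    Returns (best total valuability, assignment), where assignment[j] is the
    0-based Knapsack index of object j, or None if object j is left out.
    """
    m, n = len(capacities), len(values)
    best_value, best_assign = 0, (None,) * n
    # Each object goes to exactly one Knapsack or to none: the assignment constraint.
    for assign in product([None, *range(m)], repeat=n):
        load = [0] * m
        for j, i in enumerate(assign):
            if i is not None:
                load[i] += weights[j]
        if all(load[i] <= capacities[i] for i in range(m)):   # capacity constraint (9)
            value = sum(values[j] for j, i in enumerate(assign) if i is not None)
            if value > best_value:                            # objective (7)
                best_value, best_assign = value, assign
    return best_value, best_assign


# Hypothetical data: 3 objects, 2 Knapsacks.
print(ksp_brute_force([6, 9, 8], [2, 3, 4], [5, 4]))
```

Such an enumeration is useful mainly for validating heuristic or branch and bound results on toy data.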
4. Estimation Problem
Together with Problem KSPV, consider Problem KSPVE obtained by replacing condition (2) with condition (11) (the continuous problem):
\[ 0 \le x^k_{ij} \le 1. \tag{11} \]
It is easily seen that Problem KSPVE is an estimation for Problem KSPV: every feasible solution of KSPV is feasible for KSPVE, so the optimal value of KSPVE is an upper bound for that of KSPV. Consider the following algorithm (A):
A1: Order the Knapsacks by decreasing capacity: $L_1 \ge L_2 \ge \ldots \ge L_m$;
A2: q = 1;
A3: Order the objects by decreasing ratio $\varphi_j(1)/l_j$: $\dfrac{\varphi_1(1)}{l_1} \ge \dfrac{\varphi_2(1)}{l_2} \ge \ldots \ge \dfrac{\varphi_n(1)}{l_n}$;
A4: Solve the one-dimensional continuous Knapsack problem for the q-th Knapsack;
A5: q = q + 1;
A6: If q > m go to step A10;
A7: Redefine the valuabilities of the loaded objects for the q-th Knapsack;
A8: Reorder the objects by decreasing ratio $\varphi_j(k)/l_j$ with respect to the new valuabilities: $\dfrac{\varphi_{i_1}(k)}{l_{i_1}} \ge \dfrac{\varphi_{i_2}(k)}{l_{i_2}} \ge \ldots \ge \dfrac{\varphi_{i_n}(k)}{l_{i_n}}$;
A9: Go to step A4;
A10: End.
We have the following theorem:
Theorem 4.1. Algorithm A gives the optimal solution of problem KSPVE.
The proof of the theorem follows directly from the lemma below and the construction of algorithm A.
Lemma 4.2. In algorithm A, interchanging Knapsacks relative to the order fixed at step A1 does not increase the value of the objective function of the problem.
The proof of the Lemma rests on the observation that the valuabilities of more objects are reduced if the knapsacks of largest capacity are loaded first, i.e. in this case there is more freedom in choosing objects for the next knapsack. As a result, the sum of the valuabilities of the loaded objects is maximal provided the knapsacks are ordered by decreasing capacity.
If at step A4 of algorithm A the one-dimensional discrete Knapsack problem is solved instead [11], then an approximate algorithm B for solving problem KSPV is obtained. Note that if $L_i = L$ $(i = \overline{1,m})$ or $\varphi_j(k) = \varphi_j$ for all $k$, then algorithm B gives the exact solution of problem KSPV.
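The structure of algorithm B can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the discrete Knapsack subproblem at step A4 is solved here by a textbook dynamic programme rather than by the method of [11], so the ratio orderings of steps A3 and A8 are not needed; the names `knapsack_01` and `algorithm_b`, the input layout `phi[j][k]` and the sample data are hypothetical.

```python
def knapsack_01(values, weights, capacity):
    """Exact 0/1 Knapsack by dynamic programming: (best value, chosen indices)."""
    n = len(values)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        v, w = values[j - 1], weights[j - 1]
        for c in range(capacity + 1):
            best[j][c] = best[j - 1][c]
            if w <= c and best[j - 1][c - w] + v > best[j][c]:
                best[j][c] = best[j - 1][c - w] + v
    chosen, c = [], capacity
    for j in range(n, 0, -1):           # trace back which objects were taken
        if best[j][c] != best[j - 1][c]:
            chosen.append(j - 1)
            c -= weights[j - 1]
    return best[n][capacity], sorted(chosen)


def algorithm_b(phi, lengths, capacities):
    """phi[j][k] is the valuability of the (k+1)-th copy of file j (non-increasing in k)."""
    order = sorted(range(len(capacities)), key=lambda i: -capacities[i])  # step A1
    copies = [0] * len(lengths)         # copies of each file loaded so far
    placement, total = {}, 0
    for i in order:                     # steps A4-A9: one Knapsack (node) at a time
        values = [phi[j][copies[j]] for j in range(len(lengths))]
        gain, chosen = knapsack_01(values, lengths, capacities[i])        # step A4
        total += gain
        placement[i] = chosen           # at most one copy of a file per node, cf. (4)
        for j in chosen:                # step A7: the next copy gets a new valuability
            copies[j] += 1
    return total, placement


# Hypothetical data: 3 files, 2 nodes, two possible copies per file.
print(algorithm_b([[6, 3], [9, 4], [8, 2]], [2, 3, 4], [5, 4]))
```

Because each node is treated as a separate Knapsack over all files, no node ever receives two copies of the same file, and every loaded file moves on to the valuability of its next copy before the following node is filled.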
5. Conclusion
In the paper [7] a branch and bound algorithm of the first level was suggested for solving the Knapsack system problem with Boolean variables. In this algorithm the next levels of the solution tree are constructed either by loading the object into a particular Knapsack or by excluding the object from all Knapsacks (in the latter case the object is loaded into a fictitious (m+1)-th Knapsack). It is thus assumed that every node generates m+1 nodes. Taking (4) into account, applying this method to problem KSPV yields an exact solution.
Since problem KSPV is NP-complete [6], the best known algorithms for obtaining an exact solution run for a long time even for small values of m and n. On the other hand, computational experiments show that heuristic algorithms give an approximate but sufficiently close solution, so to solve a concrete problem it is reasonable to solve it first by the heuristic method and then to apply the branch and bound method. This makes it possible to find a good admissible solution and to eliminate unpromising cases at the beginning of the calculation. Thanks to this, the calculation can be stopped at any time and the best current solution can be used instead of the potentially achievable optimum.
References
[1] Carraresi, P. and Gallo, G., (1982), “Optimal location of files and programs in computer networks”, Mathematical Programming Study, 20, 35-53.
[2] Casey, R.G., (1972), “Allocation of copies of a file in an information network”, Proceedings of AFIPS, 40, 617-625.
[3] Chu, W.W., (1969), “Optimal file allocation in a multiple computer system”, IEEE Transactions on Computers, C-18, 885-889.
[4] Fisher, M.L. and Hochbaum, D.S., (1980), “Database location in computer networks”, Journal of the ACM, 27, 718-735.
[5] Fisk, J.C. and Hung, M.S., (1979), “A heuristic routine for solving large loading problems”, Naval Research Logistics Quarterly, 26, 643-650.
[6] Garey, M.R. and Johnson, D.S., (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman and Company.
[7] Hung, M.S. and Fisk, J.C., (1978), “An algorithm for 0-1 Multiple Knapsack problems”, Naval Research Logistics Quarterly, 25, 571-579.
[8] Martello, S. and Toth, P., (1981), “Heuristic algorithms for the Multiple Knapsack problems”, Computing, 27, 93-112.
[9] Martello, S. and Toth, P., (1990), Knapsack Problems: Algorithms and Computer Implementations. Chichester: John Wiley & Sons Ltd.
[10] Morgan, H.L. and Levin, K.D., (1977), “Optimal program and data locations in computer networks”, Communications of the ACM, 20, 315-322.
[11] Nikitin, A.I. and Nuriev, U.G., (1983), “On the method for solving the knapsack problem”, Kibernetika, 2, 108-109 (in Russian).
[12] Robinson, E.P. and Gao, L.L., (1994), “Location of Computers and Multiple Files in Distributed Computers Systems”, Mathematical and Computer Modelling, 20 (7), 111-120.
Urfat G. Nuriyev graduated from Baku State University in 1976 with a diploma in mathematics. He received the degree of Candidate of Physical and Mathematical Sciences (1984) from the Institute of Cybernetics of the Academy of Sciences of Ukraine. In 1976-1979 and 1984-1992 he worked as Senior Researcher and Head of Department at the Institute of Cybernetics of the Azerbaijan Academy of Sciences, and between 1980 and 1983 he pursued postgraduate studies at the Institute of Cybernetics of the Ukrainian Academy of Sciences in Kiev. From 1992 to 1997 he taught at Baku State University, and since 1997 he has been teaching at Ege University as an Associate Professor. His current research interests are in Operational Research, Discrete Mathematics, Combinatorial Optimization, Applied Programming and Computer Systems. He has published over 50 research papers.