IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 38, NO. 2, MARCH 2008
Joint Optimization of Hardware and Network Costs for Distributed Computer Systems

Danilo Ardagna, Chiara Francalanci, and Marco Trubian
Abstract—Multiple combinations of hardware and network components can be selected to design an information technology (IT) infrastructure that satisfies requirements. The professional criterion to deal with these degrees of freedom is cost minimization. However, a scientific approach has been rarely applied to cost minimization, particularly for the joint optimization of hardware and network systems. This paper provides an overall methodology for combining hardware and network designs in a single cost minimization problem for multisite computer systems. Costs are minimized by applying a heuristic optimization approach to a sound decomposition of the problem. We consider most of the design alternatives that are enabled by current hardware and network technologies, including server sizing, localization of multitier applications, and reuse of legacy systems. The methodology is empirically verified with a database of costs that has also been built as part of this work. Verifications consider several test cases with different computing and communication requirements. Cost reductions are evaluated by comparing the cost of methodological results with those of architectural solutions that are obtained by applying professional design guidelines. The quality of heuristic optimization results is evaluated through comparison with lower bounds.

Index Terms—Client server, costs, distributed computing, networks.
Manuscript received December 30, 2005; revised February 17, 2006. This work was supported in part by IBM as part of the Equinox and SUR international projects. This paper was recommended by Associate Editor C. Hsu. D. Ardagna and C. Francalanci are with the Department of Electronics and Information, Politecnico di Milano, 20133 Milan, Italy (e-mail: ardagna@elet.polimi.it; [email protected]). M. Trubian is with the Computer Science Department, Università degli Studi di Milano, 20135 Milan, Italy (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCA.2007.914749

I. INTRODUCTION

Information technology (IT) infrastructure comprises the hardware and network components of a computer system [30]. The objective of infrastructural design is to minimize the cost required to satisfy given computing and communication requirements [12], [25], [30]. At a system level, the literature has studied two macrodesign problems, related to the selection of hardware and network components, respectively. The first problem is how to distribute the overall computing load of a system onto multiple machines in order to minimize hardware costs [10], [20], [25]. The second problem is where to locate machines that need to exchange information in order to minimize network costs [14], [27], [37]. These two problems are studied separately in the literature. However, design decisions on both alternatives are strongly interrelated, as different allocations of computing load can change
the communication patterns among machines and modify the economics of the corresponding network structures.

For the hardware macrodesign problem, the historical design principle was centralization, which was advocated to take advantage of hardware scale economies according to Grosch's law [17]. In the late 1970s, Grosch's law was controverted by Parkinson's law [35], which introduced a critical capacity level above which scale economies no longer hold and centralization becomes cost inefficient. Grosch's law has since been revised as "it is most cost effective to accomplish any task on the least powerful type of computer capable of performing it" [17]. This form of decentralization and its operating rule, referred to as downsizing, became the methodological imperative for cost-oriented hardware system design in the mid 1980s. The corresponding infrastructural design guideline, which has been effectively summarized as "think big, but build small," is still considered valid [36].

Academic studies have challenged the generality of the decentralization principle [20], [25]. Moreover, the empirical recognition of the cost disadvantages of decentralization occurred in the 1990s with the observation of client–server costs [39]. Decentralization reduces investment costs, but it increases management costs due to the more cumbersome administration of a greater number of hardware components. Hence, decentralization may result in an increase, as opposed to a reduction, of total costs, depending on the overall tradeoff between the investment and management costs of all components. A general consequence is that the appropriate index for the optimization of hardware system design is the summation of investment and management costs. This cost index is referred to as the total cost of ownership (TCO) [25]. Recentralization has thereafter been considered to reduce management costs and, hence, TCO.
Technology has also developed toward enabling greater centralization without increasing the individual size of computers, in compliance with the "think big, but build small" principle. Modern applications can be designed to be split into multiple modules, called tiers, each of which can be allocated on a different machine [12], [16], [30]. Multitier applications offer greater flexibility to implement the most convenient tradeoff between hardware centralization and decentralization [18]. An application tier can be simultaneously allocated on multiple coordinated machines, i.e., on a cluster of servers [16], [22]. Computers within a cluster share the overall computing load and, through load-balancing techniques, act as a single computing resource. This load sharing allows greater downsizing, reduces investment costs, and may slightly increase management costs. New client architectures also favor recentralization by reducing management costs. Thin clients are currently proposed as a less expensive alternative to personal computers [29], which are
referred to in the following as fat clients. They have lower computing capacity than PCs, which is sufficient for the execution or the emulation of the presentation tier of applications, but require the centralization of the application logic on a server. The Independent Computing Architecture and the Remote Desktop Protocol standards allow remote access to the application logic by traditional PCs. This translates into hybrid configurations of PCs which execute only a subset of client applications, which will be referred to in the following as hybrid clients. The reuse of legacy components, both clients and servers, obviously affects TCO, but optimal reuse depends on design choices on other infrastructural alternatives. Frequently, legacy systems have a high residual economic value, and thus, their reuse becomes a relevant choice that can shift cost tradeoffs. Furthermore, they can be upgraded, and their life cycle can be extended over a significantly longer period of time with limited additional investments. For example, downsizing may not be convenient for an organization that has a legacy server far from completing its life cycle, and conversely, centralization may not be cost effective if all sites are equipped with reusable legacy servers that can still support most of the application load. Current professional design guidelines suggest a greater number of tiers accompanied by the implementation of clusters as the hardware design solution that minimizes TCO [16], [36]. By splitting applications into tiers and sharing load within clusters, the computing capacity limit to centralization increases considerably. Furthermore, current trends in network technologies and services represent an additional enabling factor for a more centralized system design, through broadband networks, such as optical metropolitan-area networks (MAN) and their connection with storage area network (SAN) systems. 
They also favor network scalability, as Internet-based virtual private networks (VPNs) make all-to-all connections straightforward [41]. It should be noted that network expenses may increase due to the high bandwidth requirements that are generally involved in storage networking technologies [33], [34]. However, since the ratio of communication costs to capacity has experienced continuous reductions over time, system centralization is currently encouraged in the professional literature also from a network standpoint [38].

Only a few academic studies have attempted a systematic analysis of cost issues in the design of hardware and network systems [20], [25]. It is interesting to note that previous academic contributions were initiated by the first wave of professional studies challenging the initial centralization design paradigm and promoting decentralization as a cost-minimizing alternative. Both Gavish and Pirkul [20] and Jain [25] found that infrastructural design raises complex cost tradeoffs for which it is difficult to provide general solutions. More recently, the work presented in [31] proposes a methodology to support the design and cost-performance modeling of client–server systems. The authors discuss an iterative approach to model a system's performance and optimize server sizing and the allocation of applications. The approach supports what-if cost-performance analyses for a predefined set of system configurations. Lin et al. [26] face the cost-oriented design of modern data centers. Their approach considers a single-site scenario, in which applications are allocated with a fixed number of tiers by satisfying response-time requirements.
In contrast with the previous literature, this paper considers a multisite scenario and addresses the following design choices: 1) server sizing and localization; 2) multiple tier allocations for server applications; 3) server sharing by multiple server applications; 4) thin-server sharing among different user classes; and 5) optimal reuse of legacy systems. This paper proposes a methodology that identifies a cost-minimizing solution to these design choices. The methodological solutions are compared with those obtained by applying current professional guidelines. Overall, professional guidelines generally recommend the recentralization of the IT architecture. However, for the sake of completeness, methodological results are also compared to an infrastructural solution designed by applying the decentralization paradigm and allocating a server cluster on each organizational site.

This paper is organized as follows. Section II presents the optimization problems that are modeled in Section III. Section IV describes the algorithmic approach proposed to solve the optimization models. Section V discusses the results of the empirical verification of the approach by evaluating the quality of our solutions. Conclusions are drawn in Section VI.

II. OPTIMIZATION PROBLEM

This paper's model draws from Jain [25] the approach of representing design choices as a single cost minimization problem (see Section III-C). However, design variables have been significantly extended to account for the complexity of modern computer systems. This paper assumes that wide-area networks are implemented as IP VPNs [41] due to their flexibility in realizing point-to-point connections. This way, network design is performed by sizing link capacity and calculating the associated TCO. The general model of the IT infrastructure considered in this paper is shown in Fig. 1.

A. Technology Requirements

This section formalizes the technology requirements of an organization.
The reference organization has a set S of sites.

Definition 1. Sites: A site s ∈ S is defined as a set of organizational resources connected by a local-area network (LAN).

Definition 2. Instances of server applications: An instance of a server application (or application process or simply application) a_i ∈ A is characterized by the following: 1) operating system O_i; 2) computing capacity requirements, indicated as Mips_i and measured in millions of instructions per second (MIPS); 3) primary memory requirements, indicated as Ram_i and measured in megabytes.

Definition 3. User classes: Each site s ∈ S has a set of user classes C^s. Each user class c_i^s ∈ C^s is a set of users with common computing requirements, i.e., using the same set of applications A_i ⊂ A. A user class is also characterized by the type of client computers, which is either fat, thin, or hybrid. The client type that is assigned to a user class is not a design alternative. Client machines are located at the same site as their user class.
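The entities of Definitions 1–3 can be mirrored in a small data model. The following sketch is illustrative only; all class names and sample values are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum

class ClientType(Enum):
    FAT = "fat"
    THIN = "thin"
    HYBRID = "hybrid"

@dataclass
class ServerApplication:
    """Definition 2: an instance a_i of a server application."""
    name: str
    os: str        # operating system O_i
    mips: float    # computing capacity requirement Mips_i (MIPS)
    ram: int       # primary memory requirement Ram_i (MB)

@dataclass
class UserClass:
    """Definition 3: users at one site with common computing requirements."""
    site: str                 # the site s the class belongs to
    client_type: ClientType   # fixed per class, not a design alternative
    applications: list        # A_i, the subset of A the class uses

@dataclass
class Site:
    """Definition 1: organizational resources behind one LAN."""
    name: str
    user_classes: list = field(default_factory=list)

# Hypothetical instance: one DBMS application used by one thin-client class.
db = ServerApplication("DBMS", os="Linux", mips=1200.0, ram=2048)
clerks = UserClass(site="milan", client_type=ClientType.THIN, applications=[db])
milan = Site("milan", user_classes=[clerks])
```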
Fig. 1. General model of the IT infrastructure.
If type = thin or type = hybrid, the following additional characteristics are defined: 1) remote protocol RP_i; 2) primary memory size Ram_i, measured in megabytes; 3) computing capacity Mips_i, required to support the remote execution of applications for the whole class and measured in MIPS. C denotes the set of all user classes, i.e., $C = \bigcup_{s\in S} C^s$.
B. Technology Resources

The computing requirements of the reference organization can be satisfied by means of the following hardware resources: servers, thin servers, and legacy servers. Each hardware resource is characterized by its configuration.

Definition 4. Configuration: A configuration indexed by k ∈ SC is characterized by the following parameters: 1) primary memory size RAM_S_k, measured in megabytes; 2) computing capacity MIPS_S_k, measured in MIPS.

Definition 5. Server: A server is a computer machine that supports application instances with configuration k ∈ SC.

Definition 6. Thin server: A thin server is a computer machine that supports thin- or hybrid-client computers with configuration k ∈ SC.

Definition 7. Legacy server: A legacy server is a legacy computer machine, i.e., a machine that has already been purchased by the organization, located at site s ∈ S. Each legacy server is characterized by a configuration k ∈ SC. Legacy servers are statically associated with sites and are not moved across sites as a consequence of optimization. LS(s) represents the set of legacy servers of site s.

Definition 8. Cluster: A cluster indexed by j ∈ CL is defined as a set of either servers, thin servers, or legacy servers characterized by the same configuration. The configuration of the server machines composing a cluster is also referred to as the cluster configuration. Thin servers and servers cannot coexist in the same cluster. A cluster that includes thin servers is also referred to as a thin cluster. A cluster that includes legacy servers is also referred to as a legacy cluster. N represents the maximum number of servers in a cluster. We denote the actual number of servers in cluster j as n(j) ≤ N.
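Definition 8 implies a simple feasibility check on a candidate cluster: homogeneous configuration, no mixing of thin servers with application servers, and at most N machines. A minimal sketch, with a hypothetical server representation and value of N:

```python
N = 8  # assumed maximum number of servers per cluster (hypothetical)

def is_valid_cluster(servers):
    """Check Definition 8: a nonempty set of at most N machines, all with
    the same configuration, where thin servers and (application) servers
    do not coexist. Each server is a (kind, configuration) pair,
    e.g. ("thin", "k1") or ("server", "k2")."""
    if not servers or len(servers) > N:
        return False
    kinds = {kind for kind, _ in servers}
    configs = {cfg for _, cfg in servers}
    # thin servers and application servers cannot share a cluster
    if "thin" in kinds and "server" in kinds:
        return False
    return len(configs) == 1  # one shared cluster configuration

print(is_valid_cluster([("thin", "k1"), ("thin", "k1")]))    # True
print(is_valid_cluster([("thin", "k1"), ("server", "k1")]))  # False
```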
Definition 9. Network connection: A network connection is a dedicated communication link n_s connecting site s ∈ S to the VPN (see Fig. 1). A network connection is bidirectional, and its capacity BS_s is defined as an ordered pair (BO_s, BI_s), where BO_s and BI_s represent the output and input capacities, respectively, both measured in bits per second.

Definition 10. Data exchanges: User classes and applications exchange data. Data exchanges are modeled as a directed weighted graph G = (V, E). A vertex ν ∈ V of graph G can represent an application, a user class, or a thin server. E is defined as the set of all arcs of graph G. The weight R_αβ associated with the directed arc (α, β) ∈ E connecting a generic vertex α ∈ V to a generic vertex β ∈ V represents the average bandwidth required to support the data exchanges from vertex α to vertex β. In general, R_αβ is a function of the frequency of data exchanges and their average size. If α is a user class and β is a thin server, R_αβ is also a function of the remote protocol associated with the user class and of the percentage of concurrent users in the class [32]. A special node ν_0 ∈ V represents external applications exchanging data with internal applications through the Internet.

Constraint 1. Sharing of clusters among user classes: Not all user classes can share the same thin cluster. This is specified by defining groups G1_h = {c_i^s}, with G1_l ∩ G1_m = ∅ for all l ≠ m, which are sets of user classes that can share the same thin cluster. IG1 represents the corresponding set of indices, i.e., G1_h, with h ∈ IG1, is the hth set of user classes that can share the same thin cluster.

Constraint 2. Sharing of clusters among applications: Not all server applications can share the same cluster. This is specified by defining groups G2_h = {a_i}, which are sets of server applications a_i that can share the same cluster.
IG2 represents the corresponding set of indices, i.e., G2_h, with h ∈ IG2, is the hth set of server applications that can share the same cluster.

Observations: Note that IG1 identifies a partition because user classes are usually partitioned for security reasons or privileges. On the other hand, the groups identified by IG2 can overlap; this way, multiple allocations can be defined for server applications. As an example, a servlet engine can be executed with a Web server application, as an independent application, or with an application server. Web and application servers are not usually executed by the same server. Furthermore, database
management system (DBMS) applications are usually allocated to private servers for data management and security reasons. Such a situation can be modeled by defining three groups: G2_1 containing Web server applications and servlet engines, G2_2 containing servlet engines and application servers, and G2_3 containing all DBMS applications. Note that G2_1 ∩ G2_2 ≠ ∅. Application instances will be assigned to clusters whose overall capacity is greater than or equal to their MIPS requirements. This bounds the maximum CPU utilization. In this paper, the MIPS of server machines are estimated in such a way that maximum CPU utilization is lower than 60% [1], [2], [4], [10]. With utilization values greater than 60%, small variations of throughput would cause a substantial growth of response time, and overall performance would become unreliable. This empirical rule of thumb, which is commonly applied in practice [30], [32], has also received formal validation: it has been demonstrated that a group of aperiodic tasks will always meet their deadlines as long as the bottleneck resource utilization is lower than 58% [2]. Note that performance analyses should follow cost analyses to refine sizing according to a formal queuing model. The aim of this paper is to evaluate a large number of alternative solutions and to find a candidate minimum-cost infrastructure that can be fine-tuned by applying performance evaluation techniques. Note that MIPS are evaluated for homogeneous classes of servers (for example, Intel machines are not compared with SPARCs); therefore, each application will be allocated on a specific class of servers. Similarly, the primary memory of each server in a cluster should be greater than or equal to the summation of the RAM requirements of all applications a_i that are simultaneously executed by the cluster.
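The 60% utilization bound above translates into a direct sizing rule: a cluster's aggregate MIPS capacity must be at least the applications' demand divided by 0.6. A minimal sketch with hypothetical demands:

```python
import math

MAX_UTILIZATION = 0.6  # empirical bound below which response times stay stable

def servers_needed(app_mips_demands, server_mips):
    """Number of identical servers so that the aggregate capacity keeps
    CPU utilization below MAX_UTILIZATION."""
    required_capacity = sum(app_mips_demands) / MAX_UTILIZATION
    return math.ceil(required_capacity / server_mips)

# Hypothetical demands: three applications totalling 900 MIPS, on 500-MIPS
# servers. Required capacity = 900 / 0.6 = 1500 MIPS -> 3 servers.
print(servers_needed([400, 300, 200], server_mips=500))  # 3
```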
For the sake of simplicity, server disk performance is not considered; it is assumed that server configurations are CPU- and I/O-balanced and that disks are never a bottleneck. The LAN connection equipment inside a site is not taken into account because its cost per bit per second is several orders of magnitude smaller than the cost of leased network connections [41]. As long as the organizational site has a geographical radius of a few kilometers, current Fast Ethernet (100 Mb/s) and Gigabit Ethernet (1–10 Gb/s) technologies provide extremely wide bandwidth at low cost.
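The data-exchange graph of Definition 10, together with a given allocation of vertices to sites, determines the inbound and outbound bandwidth each site must buy from the VPN provider. A sketch of this computation, with hypothetical vertices, rates, and site assignments:

```python
# Arcs (alpha, beta) -> R_alpha_beta, the average bandwidth (kb/s, hypothetical)
# required from vertex alpha to vertex beta (Definition 10).
R = {
    ("class_milan", "thin_srv"): 640.0,   # depends on the remote protocol
    ("thin_srv", "app_server"): 200.0,
    ("app_server", "dbms"): 150.0,
}

# Hypothetical allocation of vertices to sites.
site_of = {"class_milan": "milan", "thin_srv": "rome",
           "app_server": "rome", "dbms": "rome"}

def site_bandwidth(site):
    """Outbound and inbound bandwidth that `site` must support toward the
    VPN: only arcs whose endpoints sit on different sites count."""
    out_bw = sum(r for (a, b), r in R.items()
                 if site_of[a] == site and site_of[b] != site)
    in_bw = sum(r for (a, b), r in R.items()
                if site_of[b] == site and site_of[a] != site)
    return out_bw, in_bw

print(site_bandwidth("milan"))  # (640.0, 0.0): only the class->thin arc crosses
```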
III. OPTIMIZATION MODEL

Requirements (see Section II-A) can be satisfied with technology resources (see Section II-B) by making design decisions on a set of optimization alternatives. Let us enumerate from 1 to |J| all subsets of user classes or server applications which are feasible according to the definition of groups G1_h and G2_h. Let B denote the |C ∪ A| × |J| matrix whose column B_j represents the characteristic vector of the jth subset of client machines or server applications, defined as follows: the ith entry b_ij of B_j is equal to one if the client machine or server application i belongs to the jth subset; otherwise, it is equal to zero. Each column B_j is univocally associated with the minimum-cost cluster that can support all client machines or server applications in set j. Note that each set j ∈ J contains elements from C or A, since it corresponds either to a thin cluster that supports the client computers of a set of user classes or to a cluster that supports a set of server applications. Each set j ∈ J is further constrained by the fact that only feasible allocations of user classes and server applications to clusters are allowed, according to groups G1_h = {c_i^s}, with h ∈ IG1, and G2_h = {a_i}, with h ∈ IG2 (see Constraints 1 and 2 in Section II-B).
A. Decision Variables

In our model, optimization alternatives are represented by the following decision variables.

1) Selection of clusters:
$$x_j = \begin{cases}1, & \text{if the $j$th cluster in $J$ is selected}\\ 0, & \text{otherwise.}\end{cases}$$

2) Allocation of clusters to sites:
$$y^s_j = \begin{cases}1, & \text{if the $j$th cluster in $J$ is allocated to site $s\in S$}\\ 0, & \text{otherwise}\end{cases}$$
$$w^s_{\alpha\beta} = \begin{cases}1, & \text{if user class or server application $\alpha$ is allocated in site $s$ and user class or server application $\beta$ is allocated in a site that is different from $s$}\\ 0, & \text{otherwise.}\end{cases}$$

3) Reuse of legacy machines:
$$\mu^s_{lj} = \begin{cases}1, & \text{if cluster $j$ is replaced by using legacy cluster $l$ on site $s$}\\ 0, & \text{otherwise}\end{cases}$$
$$\nu^s_i = \begin{cases}1, & \text{if legacy server $i\in LS(s)$ on site $s$ is sold}\\ 0, & \text{otherwise.}\end{cases}$$

B. Objective Function

TCO, which is the objective function to be minimized, is defined as the summation of hardware-investment, hardware-management, and network costs.

1) Hardware-investment costs:
$$\sum_{j\in J} c_j x_j - \sum_{s\in S}\left(\sum_{l\in LC(s)}\sum_{j\in J}\left(c_j - up^s_{lj}\right)\mu^s_{lj} + \sum_{i\in LS(s)} rv^s_i\,\nu^s_i\right).$$
a) Parameter c_j represents the cost of the minimum-cost cluster that can support all the user classes or server applications in set j ∈ J, i.e., $c_j = \min_{k\in SC_j}\{n(j)(acq\_c_k + \sum_{a_i\in B_j} lic\_c_{ik})\}$, where SC_j denotes the subset of server configurations that can support the user classes or server applications in set j, n(j) denotes the number of servers in cluster j, acq_c_k denotes the acquisition cost of servers with configuration k (which includes the operating-system and installation costs [4]), and lic_c_ik denotes the license cost of server application a_i when installed on configuration k (this term evaluates to zero when cluster j connects user classes because the license cost of client applications that are remotely executed by a thin cluster only depends on the number of users [7]).

b) Parameter up^s_lj represents the cost required to build a cluster that satisfies the same computational requirements of cluster j by reusing the servers in legacy cluster l ∈ LC(s) (up^s_lj includes the cost of upgrading the legacy machines in cluster l plus the cost of purchasing new servers if required). LC(s) represents the set
Fig. 2. Sample relationship between management hours and management costs Mngs (·).
of indices of all the clusters that can be built by using the legacy machines of site s. Hence, (c_j − up^s_lj) represents the savings from building a cluster that is equivalent to cluster j by reusing legacy machines.

c) Parameter rv^s_i represents the residual economic value of legacy machine i of site s.

2) Hardware-management costs:
$$\sum_{s\in S} Mng^s\left(\sum_{i\in C^s} mng_i + \sum_{j\in J} y^s_j\,p_j\right).$$

a) Parameter p_j represents the number of management hours required by cluster j (it depends both on the user classes or server applications allocated on cluster j and on cluster j's configuration k ∈ SC). That is, $p_j = \sum_{i\in B_j} p_{ik}$, where parameter p_ik represents the number of management hours required by user class c_i^s or application a_i on cluster j with configuration k. Parameter mng_i indicates the number of management hours required by the client machines of user class c_i^s (which can be obtained as a cost benchmark from vendors; see Section V-A).

b) Hardware-management costs are computed as a nonlinear function Mng^s(·) of management hours. The argument of Mng^s(·) is the total number of management hours required by all applications and user classes that are allocated on all clusters assigned to site s. Management hours can be either attributed to internal personnel or purchased. In the first case, they involve in-house costs, which are a stepwise function growing with the number of people that must be hired to provide the required amount of management hours; in the second case, they involve outsourcing costs, which are a linear function of management hours. Note that the magnitude of each step of in-house costs corresponds to the total annual cost of an additional human resource (see Section V-A). Mng^s(·) is calculated as the minimum between the in-house and outsourcing cost functions for each required amount of management hours (see the dashed line in Fig. 2).

3) Network costs:
$$\sum_{s\in S} TC^s\left(\sum_{(\alpha,\beta)\in E} R_{\alpha\beta}\,w^s_{\alpha\beta},\ \sum_{(\alpha,\beta)\in E} R_{\beta\alpha}\,w^s_{\alpha\beta}\right).$$
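Item 2b describes Mng^s(·) as the lower envelope of a stepwise in-house cost curve and a linear outsourcing cost (the dashed line in Fig. 2). A minimal sketch, with hypothetical salary, hour, and rate figures:

```python
import math

ANNUAL_COST_PER_PERSON = 50_000.0   # hypothetical annual cost of one administrator
HOURS_PER_PERSON = 1_600.0          # hypothetical annual hours per administrator
OUTSOURCING_RATE = 45.0             # hypothetical cost per purchased hour

def in_house_cost(hours):
    """Stepwise: each additional hire adds a full annual salary."""
    return math.ceil(hours / HOURS_PER_PERSON) * ANNUAL_COST_PER_PERSON

def outsourcing_cost(hours):
    """Linear in the number of purchased management hours."""
    return hours * OUTSOURCING_RATE

def mng(hours):
    """Mng^s(.): minimum of the two sourcing options for a given demand."""
    return min(in_house_cost(hours), outsourcing_cost(hours))

print(mng(500))    # outsourcing cheaper: 500 * 45 = 22500.0
print(mng(1600))   # in-house cheaper: one salary (50000.0) beats 72000.0
```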
a) Network costs of site s are computed as a two-dimensional stepwise linear function TC^s(·,·) of the physical bandwidth required to support inbound and outbound information exchanges between user classes, applications, or thin clusters of site s and any other site (see Fig. 3). Note that the stepwise increases of network costs are due to the discrete bandwidth offer of network providers (see Section V-A).

Fig. 3. Sample relationship between inbound and outbound bandwidth and network costs TC^s(·,·).

C. Problem Formulation

The overall optimization problem is modeled as follows:

$$\text{P1)}\quad \min\ \sum_{j\in J} c_j x_j - \sum_{s\in S}\left(\sum_{l\in LC(s)}\sum_{j\in J}\left(c_j - up^s_{lj}\right)\mu^s_{lj} + \sum_{i\in LS(s)} rv^s_i\,\nu^s_i\right) + \sum_{s\in S} Mng^s\left(\sum_{i\in C^s} mng_i + \sum_{j\in J} y^s_j\,p_j\right) + \sum_{s\in S} TC^s\left(\sum_{(\alpha,\beta)\in E} R_{\alpha\beta}\,w^s_{\alpha\beta},\ \sum_{(\alpha,\beta)\in E} R_{\beta\alpha}\,w^s_{\alpha\beta}\right)$$

such that

$$\sum_{j\in J} b_{ij}\,x_j = 1 \qquad \forall i\in C\cup A \quad (1)$$

$$x_j - \sum_{s\in S} y^s_j = 0 \qquad \forall j\in J \quad (2)$$

$$y^s_h - y^s_k \le w^s_{\alpha\beta} \qquad \forall s\in S,\ \forall h,k\in J,\ h\ne k,\ \forall(\alpha,\beta)\in E,\ \alpha\in B_h,\ \beta\in B_k \quad (3)$$

$$\sum_{l\in LC(s)} \mu^s_{lj} \le y^s_j \qquad \forall j\in J,\ \forall s\in S \quad (4)$$

$$\sum_{l\in LC(s)}\sum_{j\in J} li^s_{lj}\,\mu^s_{lj} + \nu^s_i \le 1 \qquad \forall i\in LS(s),\ \forall s\in S \quad (5)$$

$$x_j\in\{0,1\}\ \forall j\in J;\quad y^s_j\in\{0,1\}\ \forall j\in J,\ \forall s\in S;\quad w^s_{\alpha\beta}\in\{0,1\}\ \forall(\alpha,\beta)\in E,\ \forall s\in S;\quad \mu^s_{lj}\in\{0,1\}\ \forall l\in LC(s),\ \forall j\in J,\ \forall s\in S;\quad \nu^s_i\in\{0,1\}\ \forall i\in LS(s),\ \forall s\in S.$$
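On toy instances, the interplay of constraint families (2) and (3) can be checked by exhaustive enumeration. The sketch below uses a hypothetical two-cluster, two-site instance with only the network term of the objective (hardware and management terms omitted for brevity), assuming both clusters are selected:

```python
from itertools import product

J = ["cl_users", "cl_apps"]   # two candidate clusters, both selected (x_j = 1)
S = ["milan", "rome"]
# One data-exchange arc between an element of cl_users and one of cl_apps.
R = 640.0                     # bandwidth of the arc, kb/s (hypothetical)
cost_per_kbs = {"milan": 2.0, "rome": 1.5}  # hypothetical tariffs

best = None
for alloc in product(S, repeat=len(J)):  # constraint (2): one site per cluster
    y = dict(zip(J, alloc))
    # Constraint (3): w = 1 iff the arc endpoints sit on different sites.
    w = 1 if y["cl_users"] != y["cl_apps"] else 0
    # Simplified network cost: both endpoints' sites pay for the traffic.
    cost = w * R * (cost_per_kbs[y["cl_users"]] + cost_per_kbs[y["cl_apps"]])
    if best is None or cost < best[0]:
        best = (cost, y)

print(best)  # co-location wins: network cost 0.0
```

As expected, the enumeration confirms that co-locating communicating clusters drives the network term to zero, which is the pressure toward centralization discussed in Section I.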
1) Meaning of Constraints: Constraint family (1) imposes that each user class or server application is assigned to exactly one cluster. Constraint family (1) summarizes two disjoint set-partitioning problems (SPPs): the first one selects thin clusters for user classes, whereas the second selects clusters for server applications. Constraint family (2) imposes that each selected cluster is assigned to exactly one site; it models the allocation of servers to sites. Constraint family (3) ties the localization variables y^s_j to the variables w^s_αβ: a variable w^s_αβ must evaluate to one if user classes or applications α and β are not assigned to clusters allocated in the same site s and exchange data with each other in graph G. Constraint family (3) thus introduces the relationship required to compute network costs as a function of the allocation of servers to sites. Constraint families (2) and (3) together model a min k-cut problem with a nonlinear objective function. Constraint family (4) imposes that legacy cluster l can be upgraded in order to replace a cluster j at most once and only if cluster j has been selected and assigned to site s. Constraint family (5) imposes that each legacy server i is either sold or used to build a legacy cluster. Parameter li^s_lj is equal to one if legacy server i is used in legacy cluster l to replace cluster j on site s; otherwise, it is equal to zero. Constraint families (4) and (5) together represent a set of |S| set-packing problems (SPKPs), one per site, and they model the tradeoff between upgrading legacy machines and purchasing new servers.

2) Observations: The overall problem can be seen as the combination of four highly intertwined families of problems: two SPPs, a min k-cut problem, and an SPKP. In order to cope with large and meaningful instances, the problem must be decomposed because it involves a huge number of variables and constraints.
We adopt the decomposition into the four families of problems discussed earlier, as will be described in the next section. Notwithstanding the decomposition, for a few specific subproblems, the number of variables grows exponentially with the number of applications and configurations. In those cases, we resort to a column generation approach, as discussed in Sections IV-A1 and B1.
2) Server optimization—Server applications are assigned to minimum-cost clusters of servers that satisfy computing requirements. This optimization subproblem is formalized as an SPP. 3) Server localization—The server machines identified by solving subproblems (1) and (2) are allocated to sites by minimizing overall network and management costs. This optimization subproblem is formalized as a min k-cut problem with a nonlinear objective function. 4) Reuse of legacy systems—The server machines identified by solving subproblems (1) and (2) and assigned to sites by solving subproblem (3) are possibly replaced with legacy machines to further reduce acquisition costs. This cost minimization problem is formalized as a family of SPKP, one for each site.
IV. COST MINIMIZATION ALGORITHM

The overall optimization problem has been split into four intertwined subproblems that are solved in sequence. A final fine-tuning step that implements a tabu search (TS) approach is also performed in order to improve the, possibly local, optimum that is found through the isolated solution of the four subproblems. Problem decomposition, which is common in the solution of complex optimization problems, has been applied to reduce complexity and to obtain a good initial solution for the final TS step. The following subproblems have been identified.
1) Client optimization—User classes are assigned to minimum-cost thin clusters that satisfy computing requirements. The solution of the client optimization subproblem is generated by solving a family of disjoint SPPs, one for each group G1h, where h ∈ IG1 (see Constraint 1 in Section II-B).
2) Server optimization—Server applications and application tiers are assigned to minimum-cost clusters (see Section IV-B).
3) Server localization—Clusters are allocated to sites by minimizing hardware management and network costs (see Section IV-C).
4) Reuse of legacy systems—Legacy machines are either reused, possibly after an upgrade, or sold, by maximizing the corresponding savings (see Section IV-D).

A. Client Optimization

Disjoint sets of client computers that can share the same thin cluster, according to Definition 8 in Section II-B, are assigned to the same thin cluster. This assignment is modeled as a family of SPPs [39], one for each group G1h, where h ∈ IG1, as follows. Let us enumerate all the nonempty subsets of elements in G1h, from 1 to |Jh|, for a given h ∈ IG1. Let B denote the |G1h| × |Jh| matrix whose column Bj represents the characteristic vector of the jth subset of user classes—Qj (see Section III). Each column Bj corresponds to a cluster that can support the overall Rami and Mipsi requirements of all user classes in Qj. A cost cj = min_{k∈SCj} {n(j) acq_ck} is associated with each column Bj and corresponds to the acquisition cost of the servers in the cluster (see Section III-B). Again, SCj denotes the subset of server configurations that can support the user classes in Qj, and n(j) denotes the number of servers in the cluster made of servers with configuration k ∈ SCj. Let xj denote a binary variable that is equal to one if the jth cluster in Jh is selected; otherwise, it is equal to zero. The optimization problem can be modeled as

Ph)   min Σ_{j∈Jh} cj xj
such that
      Σ_{j∈Jh} bij xj = 1   ∀i ∈ G1h
      xj ∈ {0, 1}           ∀j ∈ Jh.    (6)
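As an illustration, the set-partitioning model Ph) can be solved by exhaustive enumeration on a toy instance. The user classes, server configurations, and all cost figures below are hypothetical, and the brute-force search merely stands in for the integer programming techniques used in practice:

```python
from itertools import combinations
from math import ceil

# Hypothetical user-class requirements (RAM in MB, load in MIPS) for one group G1_h.
classes = {"C1": (512, 400), "C2": (1024, 700), "C3": (256, 300)}
# Hypothetical server configurations: name -> (RAM_S_k, MIP_S_k, acq_c_k).
configs = {"small": (2048, 1000, 3000), "large": (4096, 2500, 6500)}

def cluster_cost(subset):
    """Cheapest cluster able to host `subset`: for each configuration k,
    n(j) servers are needed to cover both the total RAM and total MIPS."""
    ram = sum(classes[c][0] for c in subset)
    mips = sum(classes[c][1] for c in subset)
    best = float("inf")
    for ram_s, mip_s, acq in configs.values():
        n = max(ceil(ram / ram_s), ceil(mips / mip_s))
        best = min(best, n * acq)
    return best

def best_partition(items):
    """Enumerate all partitions of `items` and return the cheapest one."""
    items = list(items)
    if not items:
        return 0, []
    first, rest = items[0], items[1:]
    best_cost, best_parts = float("inf"), None
    # The block containing `first` ranges over all subsets of the rest.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            block = (first,) + combo
            remaining = [c for c in rest if c not in combo]
            sub_cost, sub_parts = best_partition(remaining)
            cost = cluster_cost(block) + sub_cost
            if cost < best_cost:
                best_cost, best_parts = cost, [block] + sub_parts
    return best_cost, best_parts

cost, parts = best_partition(classes)
```

On this toy instance, the search places C1 on a dedicated cluster and consolidates C2 and C3 on a shared one; enumeration is viable only for very small groups, which is precisely why the column generation technique described below is needed.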
Each feasible solution identifies a set of clusters such that each user class in G1h is connected to exactly one of them. 1) Column Generation: In order to solve each SPP Ph), where h ∈ IG1, the entire set of nonempty subsets of G1h can be generated only for small instances of the problem. For large instances, we apply a column generation technique to manage the growth in the number of variables [39]. First, we generate a restricted number of columns that represent a suitable allocation of all user classes on a set of clusters. Let RB denote the matrix composed of this set of columns. Then, we solve the linear relaxation of the SPP on matrix RB, obtaining a primal optimal solution x∗ and a vector π∗ of optimal dual variables. In order to verify whether x∗ is optimal for the whole problem (i.e., for matrix B), we must verify whether the columns in B − RB have a reduced cost that is greater than or equal
to zero. This can be done by solving the following pricing problems:

PPh)  w∗ = min Σ_{k∈SC} inv_ck zk − Σ_{i∈G1h} Σ_{k∈SC} πi xik
such that
      Σ_{i∈G1h} Rami xik ≤ RAM_Sk zk     ∀k ∈ SC              (7)
      Σ_{i∈G1h} Mipsi xik ≤ MIP_Sk zk    ∀k ∈ SC              (8)
      Σ_{k∈SC} xik ≤ 1                   ∀i ∈ G1h             (9)
      xik ∈ {0, 1}                       ∀i ∈ G1h, ∀k ∈ SC
      zk ∈ Z+                            ∀k ∈ SC

where xik is equal to one if user class i is assigned to a cluster with configuration k (otherwise, it is equal to zero), and zk is the number of servers in the cluster with configuration k. Constraint families (7) and (8) impose that the number of servers in a cluster provides enough memory and computing capacities to support all user classes allocated on the cluster. Constraint family (9) imposes that each user class i is assigned to at most one cluster of machines with configuration k. If w∗ is negative, then the corresponding set-partitioning model has to be expanded by adding a new column to RB, for example Bj, with bij = xik, ∀i ∈ G1h, for each zk > 0. This process is repeated until w∗ is nonnegative. Then, the original SPP (as opposed to its linear relaxation) is optimally solved on RB. It is not guaranteed that the integral solution is optimal for B. However, for large instances of the optimization problem, we accept it as a good approximation of the optimum. Finally, we observe that the degrees of freedom of the set-partitioning models could be used to address additional practical requirements, such as scalability and availability.

B. Server Optimization

This subproblem considers the optimum allocation of server applications to clusters. Note that the set of servers involved in this subproblem excludes thin/hybrid servers. Server applications are organized in tiers, i.e., into sets of server applications that cooperate to manage the same request. Each server application or application tier must be assigned to exactly one cluster. Similar to the client optimization problem, this problem is modeled as an SPP as follows. Let us enumerate, from 1 to |J|, all subsets of elements in G2h for all h ∈ IG2. Let B denote the |A| × |J| matrix whose column Bj represents the characteristic vector of the jth subset of server applications—Qj. Each column Bj corresponds to a cluster such that each individual server in the cluster has enough memory to support all server applications in Qj, as discussed in Definition 8 in Section II-B. Similarly, the number of servers in the cluster provides enough computing capacity to support all server applications in Qj. A cost cj, corresponding to the acquisition cost of all servers in the cluster (see Section III-B), is associated with each column Bj. Let xj denote a binary variable that is equal to one if the jth cluster in J is selected; otherwise, it is equal to zero. The optimization problem can be modeled as

P2)   min Σ_{j∈J} cj xj
such that
      Σ_{j∈J} bij xj = 1   ∀i ∈ A
      xj ∈ {0, 1}          ∀j ∈ J.    (10)
Each feasible solution of the SPP P2) identifies a set of clusters such that each server application in A is allocated to exactly one cluster of the identified set. 1) Column Generation: Since groups in G2h, where h ∈ IG2, can overlap, we have a single SPP, not a set of disjoint problems. Hence, we need to resort to a column generation technique even for small instances of the optimization problem. The approach is the same as for the client optimization phase (see Section IV-A1). We report the pricing model

PP2)  w∗A = min Σ_{k∈SC} inv_ck zk + Σ_{i∈A} Σ_{k∈SC} lcik uik − Σ_{i∈A} Σ_{k∈SC} πi xik
such that
      Σ_{i∈G2h} Rami xik ≤ RAM_Sk        ∀k ∈ SC, ∀h ∈ IG2    (11)
      Σ_{i∈G2h} Mipsi xik ≤ MIP_Sk zk    ∀k ∈ SC, ∀h ∈ IG2    (12)
      Σ_{k∈SC} xik ≤ 1                   ∀i ∈ A               (13)
      zk − uik ≤ N(1 − xik)              ∀i ∈ A, ∀k ∈ SC      (14)
      xik ∈ {0, 1}, uik ∈ Z+             ∀i ∈ A, ∀k ∈ SC
      zk ∈ Z+                            ∀k ∈ SC
where xik is equal to one if server application i is assigned to a cluster with configuration k (otherwise, it is equal to zero), zk is the number of servers in the cluster with configuration k, and uik represents the number of application licenses to be purchased to support server application ai on the cluster with configuration k. Constraint family (11) imposes that each individual server in a cluster has enough memory to support all the server applications that have been assigned to the cluster. Similarly, constraint family (12) imposes that the number of servers in a cluster provides enough computing capacity to support all the server applications that have been assigned to the cluster. Constraint family (13) imposes that each application i is assigned to at most one cluster of servers with configuration k. Constraint family (14) forces uik to be at least as high as zk if xik is equal to one; otherwise, uik is equal to zero, since it introduces a positive contribution to the objective function.

C. Server Localization

This subproblem considers the optimum allocation of clusters to sites. Two cost items are affected by the allocation of clusters: hardware management and network costs that are
Fig. 4. Set of reuse options for legacy machines LServer1 and LServer2.
evaluated by means of the Mngs(·) and TCs(·,·) functions, respectively (see Section III-B). This cost minimization subproblem can be modeled as a network optimization problem as follows. Let us consider a directed graph G = (V, E) and a subset VCL ⊆ V − {ν0}. Vertices in VCL represent clusters, whereas vertices in V − VCL − {ν0} represent client computers that are located at the same site as their user class. The set of arcs E represents possible data exchanges between client computers and clusters, among clusters, and between ν0 (i.e., the special node that represents external applications) and clusters. The problem consists in partitioning VCL into disjoint subsets Js, where s = 1, . . . , |S|, in order to minimize the following objective function:

Σ_{s∈S} Mngs( Σ_{i∈Cs} mngi + Σ_{j∈Js} pj )
+ Σ_{s∈S} TCs( Σ_{(α,β)∈E, α in s, β in t≠s} Rαβ + Σ_{(α,ν0)∈E, α in s} Rαν0 ,
               Σ_{(α,β)∈E, β in s, α in t≠s} Rαβ + Σ_{(ν0,β)∈E, β in s} Rν0β )

where α (β) in s (t) denotes that vertex α (β) has been located at site s (t). For each site s, the first term is the management cost of all clusters and user classes located in s, and the second term represents the cost of the bandwidth required to connect vertices in s with vertices located in sites different from s. In each feasible solution, each cluster is assigned to one site, and each set of client computers is assigned to the site of the corresponding user class. If all values pj equal zero, VCL is equal to V, and network costs are a linear function of bandwidth, the aforementioned network problem is known in the literature as the min k-cut problem, where k = |S|. The problem is strongly NP-hard, and a heuristic approach based on local search has been adopted. The neighborhood of each feasible solution is defined by all solutions that can be obtained by moving a cluster to a different site, for all clusters. The search is guided by a TS metaheuristic in which only the short-term memory mechanism has been implemented (see Section IV-E for a short introduction to TS concepts).

D. Reuse of Legacy Systems

Each site s has a set of legacy clusters LC(s), a set of legacy servers LS(s), and a set Js of clusters of servers allocated to site s by the previous optimization steps. Each cluster j ∈ Js could be replaced by one or more combinations of legacy clusters l ∈ LC(s). Moreover, each legacy cluster could be upgraded to provide greater capacity. This problem can be modeled as a family of SPKPs, one for each site s, as follows:

P3)   max Σ_{l∈LC(s)} Σ_{j∈Js} (cj − uplj) µlj + Σ_{i∈LS(s)} rvi νi
such that
      Σ_{l∈LC(s)} µlj ≤ 1                        ∀j ∈ Js       (15)
      Σ_{l∈LC(s)} Σ_{j∈Js} lilj µlj + νi ≤ 1     ∀i ∈ LS(s)    (16)
      µlj ∈ {0, 1}                               ∀l ∈ LC(s), ∀j ∈ Js
      νi ∈ {0, 1}                                ∀i ∈ LS(s)

where uplj is the cost of upgrading legacy cluster l so that it can replace cluster j, and rvi is the residual (resale) value of legacy server i. Binary variable µlj is equal to one only if legacy cluster l is selected to replace cluster j. Binary variable νi is equal to one only if legacy server i is sold. Constraint family (15) imposes that each cluster j is replaced by at most one (possibly upgraded) legacy cluster. Constraint family (16) imposes that each legacy server i is either sold or used to form at most one legacy cluster. Parameter lilj is equal to one if legacy server i is used in legacy cluster l to replace cluster j; otherwise, it is equal to zero. Each feasible solution of the SPKP identifies a set of clusters of servers, each replaced at most once, by using each legacy machine at most once. The objective function maximizes the savings from either the reuse or the sale of legacy machines. As an example, let us consider two legacy machines LServer1 and LServer2 (index i is set to values 1 and 2, respectively) and two clusters composed of one server each—Server1 and Server2 (index j is set to values 1 and 2, respectively). Legacy machines can be combined to form three legacy clusters {LServer1}, {LServer1, LServer2}, and {LServer2} (index l is set to values 1, 2, and 3, respectively). The feasible alternatives are shown in Fig. 4. Setting µ11 to one means using the first legacy cluster ({LServer1}) to replace Server1, whereas setting µ22 to one means using the second legacy cluster ({LServer1, LServer2}) to replace Server2. If the cost vector is [10, 22, 14, 21, 4, 3], the optimal solution is µ11 = µ32 = 1 and µ21 = µ22 = ν1 = ν2 = 0. Costs are minimized if legacy machines LServer1 and LServer2 are used to replace Server1 and Server2, respectively.
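The numerical example above can be checked by brute-force enumeration of the feasible (µ, ν) assignments. The mapping of the cost vector [10, 22, 14, 21, 4, 3] onto the individual (l, j) savings and resale values below is an assumption made for illustration:

```python
from itertools import product

# Savings for the reuse options of Fig. 4, following the text's cost vector
# [10, 22, 14, 21, 4, 3]. The exact mapping of entries to (l, j) pairs and
# resale values is an assumption for illustration.
# saving_mu[(l, j)] = c_j - up_lj : saving when legacy cluster l replaces cluster j.
saving_mu = {(1, 1): 10, (2, 1): 22, (2, 2): 14, (3, 2): 21}
saving_nu = {1: 4, 2: 3}            # rv_i: resale value of legacy server i
uses = {1: {1}, 2: {1, 2}, 3: {2}}  # legacy servers used by each legacy cluster l

best_value, best_sol = -1, None
options = list(saving_mu)           # candidate (l, j) pairs
for mask in product([0, 1], repeat=len(options)):
    chosen = [o for o, m in zip(options, mask) if m]
    # Constraint (15): each cluster j is replaced at most once.
    if len({j for _, j in chosen}) < len(chosen):
        continue
    used = [s for l, _ in chosen for s in uses[l]]
    # Constraint (16), part 1: each legacy server appears in at most one
    # selected legacy cluster.
    if len(set(used)) < len(used):
        continue
    for nu1 in (0, 1):
        for nu2 in (0, 1):
            # Constraint (16), part 2: a server that is sold cannot be reused.
            if (nu1 and 1 in used) or (nu2 and 2 in used):
                continue
            value = (sum(saving_mu[o] for o in chosen)
                     + nu1 * saving_nu[1] + nu2 * saving_nu[2])
            if value > best_value:
                best_value, best_sol = value, (chosen, nu1, nu2)
```

Under this mapping, the enumeration confirms the solution reported in the text: µ11 = µ32 = 1 with neither server sold, for a total saving of 31.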
1) Column Generation: Even if we have an SPKP for each site s, the number of different ways of combining legacy machines rapidly increases with the cardinality of set L(s) and with the number of clusters assigned to each site. Hence, we resort to a column generation technique. Let SC^s ⊆ SC denote the set of configurations to which legacy servers in LS(s) can be upgraded. We build a pricing model PPjk) for each cluster j ∈ Js and for each configuration k ∈ SC^s. Here, we report the model that considers the optimal reuse of legacy machines to replace thin clusters:

PPjk)  w = cj − min [ ck zk + Σ_{i∈LS(s)} c_upik xi + πj + Σ_{i∈LS(s)} πi xi ]
such that
      Σ_{i∈LS(s)} xi + zk ≥ max{ Σ_{C_h^s∈Bj} Ramh / RAM_Sk , Σ_{C_h^s∈Bj} Mipsh / MIP_Sk }    (17)
      xi ∈ {0, 1}    ∀i ∈ LS(s)
      zk ∈ Z+.

πj and πi are the optimal dual variables of the corresponding constraints in families (15) and (16) in the linear relaxation of the current model P3). c_upik indicates the cost of upgrading legacy server i to configuration k. The term that is minimized in the objective function corresponds to the cost required to build a cluster that satisfies the same computational requirements of cluster j by purchasing zk new servers with configuration k and upgrading a subset of legacy machines to the same configuration. Constraint (17) imposes that the number of new and upgraded servers provides enough computing and memory capacities to satisfy the requirements of user classes. Note that the model that corresponds to the optimal reuse of legacy machines to replace clusters supporting server applications can be obtained by replacing constraint (17) with the following:

      Σ_{i∈LS(s)} xi + zk ≥ Σ_{a_h∈Bj} Mipsh / MIP_Sk    (18)

where the configurations k ∈ SC^s are such that RAM_Sk ≥ Σ_{a_h∈Bj} Ramh. This pricing problem can be solved in polynomial time with the following greedy algorithm, where r denotes the value of the right-hand side of constraint (17) [or (18)].
1) Sort the c_upik + πi values together with ck in nondecreasing order, and denote with l the position of value ck in the sequence.
2) If l is greater than r, then upgrade the first r legacy machines in the sequence computed in step 1). Else, upgrade the first l − 1 machines in the sequence, and purchase r − l + 1 server machines with configuration k.
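The two-step greedy procedure can be sketched as follows. The cost figures are hypothetical, and r is assumed to be already rounded up to an integer:

```python
# Greedy solution of the SPKP pricing problem: r servers are needed overall
# (the right-hand side of constraint (17) or (18), rounded up); each legacy
# server i can be upgraded at cost c_up_ik + pi_i, or a new server with
# configuration k can be purchased at cost c_k.

def greedy_pricing(upgrade_costs, c_k, r):
    """Return (total_cost, n_upgraded, n_purchased) for covering r servers."""
    # Step 1: sort the c_up_ik + pi_i values together with c_k in
    # nondecreasing order; l is the (1-based) position of c_k.
    seq = sorted(upgrade_costs + [c_k])
    l = seq.index(c_k) + 1
    if l > r:
        # Step 2a: the r cheapest upgrades all cost less than a new server.
        return sum(seq[:r]), r, 0
    # Step 2b: upgrade the first l - 1 machines, buy the remaining r - l + 1.
    upgraded = seq[:l - 1]
    purchased = r - l + 1
    return sum(upgraded) + purchased * c_k, l - 1, purchased

cost, n_up, n_new = greedy_pricing([3.0, 8.0, 5.0], c_k=6.0, r=2)
```

With hypothetical upgrade costs {3, 5, 8} and a new-server cost of 6, the procedure upgrades the two cheapest legacy machines when r = 2; for r = 3, it upgrades the two machines cheaper than a new server and purchases one new server.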
E. Fine-Tuning Step The decomposition of the overall optimization problem into four subproblems does not guarantee that the final solution is
a global optimum. Hence, a fine-tuning step based on a TS approach [20] has been implemented to possibly improve the solution obtained by separately solving the four subproblems. TS is a metaheuristic that guides a local-search procedure to explore the solution space beyond local optimality. Let X denote the set of feasible solutions of our problem. To each x ∈ X, we associate a subset of X, which is called the neighborhood of x and denoted with N(x). The neighborhood of x contains all the solutions that can be obtained with simple modifications of x, which are called moves. Given a feasible solution x, TS selects the solution with the best value of the objective function within a subset of N(x). The selection of a solution in N(x) is forbidden if that solution has already been selected in a previous iteration. Forbidden solutions are called tabu. To identify a solution, TS records the necessary information, which is called attributes, in a memory structure, which is called the tabu list. The length of the tabu list, which is called the tabu tenure, is limited, and the tabu list is managed with a first-in first-out (FIFO) policy. The tabu status of a solution can be overruled if certain conditions (called aspiration criteria) are verified. A commonly adopted aspiration criterion accepts a tabu solution if its objective function is strictly better than that of all the solutions that have already been explored. TS stop criteria are based on the total elapsed time or the total number of iterations. In our model, the neighborhood of a solution is defined as follows. A user class (e.g., Cis1) or a server application (e.g., aj) is disconnected from a cluster (e.g., ClusterA) to which it is currently connected [see Fig. 5(a) and (b)]. A new minimum-cost cluster (e.g., ClusterB) is selected to replace ClusterA [Fig. 5(b)]. Costs are evaluated by assuming that ClusterB is located in the same site (e.g., s2) as the cluster that is replaced.
A new minimum-cost cluster (e.g., ClusterC) is selected to support Cis1 (or ai), and the costs of allocating ClusterC in a site that is different from s2 are evaluated [Fig. 5(c)]. Hardware management and network costs are calculated by means of the Mngs(·) and TCs(·,·) functions. This way, a destination site (e.g., s3) is identified for ClusterC. Finally, the possibility of discarding ClusterC is evaluated by connecting Cis1 (or ai) to a different cluster in s3 [Fig. 5(d)]. For each server that is considered, the cost of upgrades is accounted for in the management-cost function Mngs(·). In order to reevaluate the optimal reuse of legacy machines, a few SPKP problems have to be solved. We manage the pool of columns generated in the solution of the fourth subproblem (see Section IV-D), and we generate new columns only when needed, as discussed in the following. 1) The SPKP problem is solved twice in site s2 in order to evaluate the optimal reuse of legacy servers in site s2 when user class Cis1 (or application ai) is relocated or simply remains in site s2 with a dedicated cluster. Note that, in the legacy-reuse problem, the columns associated with ClusterA are discarded, and only the columns associated with ClusterB and ClusterC have to be generated. 2) The SPKP problem is solved once for each of the |S| − 1 destination sites in order to evaluate the optimal reuse of legacy servers in the new site when user class Cis1 (or application ai) is assigned to a new dedicated cluster ClusterC. Note that, in the legacy-reuse problem, only the columns associated with ClusterC have to be generated.
Fig. 5. Moves of the fine-tuning step.
3) The SPKP problem is solved once for each possible cluster ClusterD that can host user class Cis1 (or application ai) in a new destination site if ClusterC is discarded. Note that, in the legacy-reuse problem, only the columns associated with ClusterD have to be generated, whereas the columns associated with ClusterC are discarded. Each move shifts one element from a set to a different one. For example, an application is moved to a different cluster. The attributes of a move are the identifier of the element that is moved and the identifier of the set from which the element is removed. Hence, a move is considered tabu if it assigns an element to the same set from which it has been recently removed. The short-term memory management mechanism for the tabu list has been implemented as a FIFO list. The length of the tabu list varies with the size of the neighborhood (see, for example, [14]). If a move improves the current solution, then the length of the list is decreased in order to allow a deep search of the new solution domain. Conversely, if the current solution is worsened, the length of the tabu list is increased. The size of the list varies between a minimum and a maximum, whose most common experimental values are five and ten, respectively.

F. Lower Bounds

The quality of our heuristic procedures is evaluated by computing lower bounds to the value of global optimal solutions. Model P1) cannot be used to solve the entire optimization problem or to compute its continuous relaxation because the numbers of variables and constraints quickly grow beyond the size limits of commercial linear programming solvers. The following approach is taken to overcome these difficulties. Three terms contribute to the value of the objective function: hardware acquisition and management costs, network costs, and savings from the reuse of legacy servers.
A lower bound of total costs can be calculated as the summation of two lower bounds of the first two terms and an upper bound of the third term. A lower bound of the first term is obtained by identifying the minimum-cost clusters of servers that satisfy computing requirements for thin clients, hybrid clients, and server applications and by centralizing these clusters in the site that requires minimum per-hour management costs. This represents a lower bound because it maximizes scale economies through centralization, and management costs grow linearly with hardware investment costs. Hence, when one site is considered, adding management costs to investment costs, as opposed to considering investment costs only, does not affect the ranking of solutions. To compute this bound, we optimally solve the following SPP:

P4)   ra∗ = min Σ_{j∈J} cj xj
such that
      Σ_{j∈J} bij xj = 1   ∀i ∈ C ∪ A
      xj ∈ {0, 1}          ∀j ∈ J

and then we add to ra∗ the quantity Mngs̄( Σ_{i∈C} mngi + Σ_{j∈J∗} pj ), where s̄ ∈ S is the site with the lowest management cost, and J∗ is the set of clusters selected in the optimal solution of P4). A lower bound of network costs is calculated by optimally solving the following min k-cut problem:

P5)   rb∗ = min Σ_{(α,β)∈E} rαβ wαβ
such that
      Σ_{s∈Q} (yαs − yβs) ≤ wαβ   ∀Q ⊂ S, ∀(α, β) ∈ E
      Σ_{s∈S} yis = 1             ∀i ∈ C ∪ A
      yis ∈ {0, 1}                ∀i ∈ C ∪ A, ∀s ∈ S
      wαβ ∈ {0, 1}                ∀(α, β) ∈ E
where yis is equal to one if server application i is assigned to site s; otherwise, it is equal to zero. rαβ is a lower bound of the network cost that is necessary to support two vertices α and β that exchange data within graph G. The model P5) has been derived from the min k-cut formulation presented in [18]. Finally, we can overestimate the third term by selling all legacy machines at a price equal to the investment cost that was sustained when they were purchased and by supposing that this budget can be used to buy new machines. This is equivalent to saving the initial investment cost and to considering the design of a new system from scratch. This is correct under the realistic assumption that hardware prices do not increase over time. Hence, an overall lower bound is provided by

LB = ra∗ + Mngs̄( Σ_{i∈C} mngi + Σ_{j∈J∗} pj ) + rb∗ − Σ_{s∈S} Σ_{l∈L(s)} cj(l)
where cj(l) is the initial acquisition cost of the legacy cluster l.

V. EMPIRICAL VERIFICATIONS

Empirical verifications have been supported by Infrastructure Systems Integrated Design Environment (ISIDE), a prototype tool that implements the cost minimization algorithm. The tool includes a database of commercial infrastructural components and related cost data that are described in the next section. For the solution of linear integer programming models, ISIDE calls CPLEX 8.0 library routines. Results are presented in Section V-B.

A. Data Sample of Physical Components and Costs

Testing has been based on 5000 server configurations from four vendors. Configurations of servers have been considered distinct if they have a different number or type of CPUs or a different RAM capacity. The database not only includes entry-level servers but also midrange, blade, and high-performance servers that are typically adopted in server consolidation projects. Hardware-acquisition costs, including the cost of upgrades, have been obtained from vendors' Internet sites, hardware technical documentation, and configuration tools. The residual value of legacy servers has been calculated as a linearly decreasing percentage of acquisition costs over a three-year life cycle. Hardware management costs have been evaluated as a percentage of acquisition costs [28], as discussed in previous empirical research works [4], [7]. Ad hoc surveys have been necessary to collect network costs because the costs of broadband connections are mostly confidential. A questionnaire has been submitted to four international carriers. The questionnaire has surveyed per-site annual fees of always-on VPN access technologies, which are leased lines and asymmetric digital subscriber lines. Each carrier has provided cost figures on 80 VPN configurations between 64 kb/s and 64 Mb/s. Empirical verifications have considered the mean value of costs across the four carriers for each VPN configuration.
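The residual-value rule mentioned above (a linearly decreasing percentage of the acquisition cost over a three-year life cycle) can be sketched as follows; the monthly granularity and the example price are assumptions:

```python
def residual_value(acquisition_cost, age_months, life_months=36):
    """Residual value of a legacy server as a linearly decreasing share of
    its acquisition cost over a three-year (36-month) life cycle.
    The monthly granularity is an assumption; the paper only states that
    the decrease is linear over three years."""
    remaining = max(0.0, 1.0 - age_months / life_months)
    return acquisition_cost * remaining

# A server bought for 9000 retains 6750 after 9 months, 4500 after 18 months,
# and nothing beyond the three-year life cycle.
```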
Overall, the data collection effort has been significant, and the resulting database of physical components and costs requires continuous maintenance. The ISIDE database has been updated twice. While the initial data collection has been cumbersome and has required a six-month full-time data collection effort, each subsequent update has required a one-month effort. Most data are available from public sources. Ad hoc surveys are necessary only for the collection of network costs.

B. Empirical Results

This section provides empirical evidence of the cost and time efficiency of the optimization algorithm. Simulations have been supported by a PIV 3-GHz Windows XP workstation with 1 GB of RAM. Analyses focus on three case studies—a multidepartment university, an Internet banking system, and an information retrieval system. Although encompassing all optimization alternatives discussed in Section I, the three case studies have substantially different technology requirements. In the first case study, user classes are numerous and use a variety of applications, making the allocation of servers to sites a critical design alternative. The Internet banking system is composed of complex multitier applications whose allocation on servers is particularly cumbersome. Finally, the information retrieval system is characterized by CPU-intensive applications, and the design of server farms plays an important role. The computing requirement data of the test cases are reported in [5]. In order to evaluate the performance of the cost minimization algorithm, each case study is analyzed for an increasing number of user classes, applications, and sites. Analyses are performed both with and without legacy systems. Legacy systems are considered by assuming that an initial solution built from scratch has to be modified after a six-month period in order to support a 50% increase of the overall load. Cost and time efficiency are evaluated by comparing the algorithm's output with the output of the fine-tuning step (see Section IV-E) starting from an initial solution obtained by applying the following rules.
1) User classes adopting thin- or hybrid-client computers and belonging to the same group G1h are assigned to a single cluster, according to the server-consolidation principle [23].
2) Applications belonging to the same group G2h are assigned to a single cluster, according to the server-consolidation principle [22], [23]. Applications belonging to multiple groups are allocated to the cluster that maximizes the number of tiers of requests.
3) All servers are located in one site, which is selected by minimizing management costs, according to the server-consolidation principle [23].
4) Clusters are implemented by selecting the smallest server that can support applications, to reduce hardware acquisition costs, according to the "think big, but build small" design paradigm [36].
5) Legacy components are reused if their upgrade cost is lower than the TCO of new machines.
Intuitively, rules 1)–5) implement the recentralization professional design guidelines discussed in Section I. In the following, the initial solution that is obtained by applying the problem decomposition described in Section IV will be indicated as SolutionA, whereas the final algorithm's solution will be indicated as SolutionB. The solution that is identified
TABLE I SUMMARY OF RESULTS FOR A MULTIDEPARTMENT UNIVERSITY
by applying rules 1)–5) will be referred to as SolutionC. For the sake of completeness, methodological results are also compared with a solution that is obtained by applying rules 1), 2), 4), and 5) and by replicating shared applications in each site on a dedicated cluster. This solution will be referred to as SolutionD and implements the decentralization principle that has preceded recentralization. Comparisons are based on the following metrics.
1) Improvement of the methodological initial solution: It represents the percent improvement of SolutionA and is a measure of efficiency of the final TS step. It is evaluated as (SolutionA − SolutionB)/SolutionB.
2) Improvement of the professional centralized solution: It represents the percent improvement of SolutionC and is a measure of the efficiency of our optimization approach compared to professional design guidelines. It is evaluated as (SolutionC − SolutionB)/SolutionB.
3) Improvement of the professional decentralized solution: It represents the percent improvement of SolutionB with respect to SolutionD. It is evaluated as (SolutionD − SolutionB)/SolutionB.
Results show that the problem decomposition is very efficient because SolutionA is never improved by the fine-tuning step (i.e., SolutionA and SolutionB coincide), and the gap with the lower bound is small. Sensitivity and scalability analyses have also been performed and are reported in [5]. Sensitivity analyses show that the uncertainty on cost benchmarks has the same effects on both methodological and professional solutions. Scalability analyses show that the professional solution is more costly to upgrade. The algorithm's solutions have also been compared with real systems implemented in server-consolidation projects. Results are reported in [8] and show that, on average, the algorithm's solutions are 25% cheaper than the solutions designed by technology experts.
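The three metrics can be computed as in the following sketch; the TCO values are hypothetical and merely mimic the orders of magnitude reported for the university case below:

```python
def improvement(reference, solution_b):
    """Percent improvement of `reference` over SolutionB:
    (reference - solution_b) / solution_b, expressed in percent."""
    return 100.0 * (reference - solution_b) / solution_b

# Hypothetical TCO values (arbitrary currency units).
solution_a, solution_b, solution_c, solution_d = 120.0, 100.0, 380.0, 130.0
metrics = {
    "initial":       improvement(solution_a, solution_b),
    "centralized":   improvement(solution_c, solution_b),
    "decentralized": improvement(solution_d, solution_b),
}
```

With these hypothetical values, the improvements are 20%, 280%, and 30%, respectively.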
1) Multidepartment University: The university is composed of K departments (sites), where K ranges between one and six. Each site hosts three user classes: administrative staff, software engineering (SE) researchers, and electronic engineering (EE) researchers. Each class is composed of 100 users. All users use a browser, an e-mail client, and an office automation suite. SE researchers use an integrated development environment, whereas EE researchers use a circuit simulator. The administrative staff is assigned to thin clients, whereas researchers are assigned to hybrid clients. An e-mail and a Web/proxy server application are introduced for each department, running on W2000 and Linux servers, respectively. The Web server is accessed not only by internal clients but also by external Internet users. Data on RAM, MIPS, and data exchanges among server applications have been empirically obtained from the analysis of our university's system logs. Two groups are specified: G21 for e-mail servers and G22 for Web servers. A single group G11, including all user classes, is specified. The solution identified by our methodology is fully distributed for all values of K. For each site, e-mail and Web applications are allocated on dedicated servers, and a thin cluster supporting all user classes is introduced. When legacies are considered, legacy Web server clusters are dismissed, whereas legacy e-mail servers are reused to support thin clusters. If SolutionB is compared with SolutionC, cost savings are higher than 280% because professional guidelines suggest the centralization of servers in a single site. If SolutionB is compared with SolutionD, cost savings are in the 20%–40% range. SolutionB is fully decentralized, and as a consequence, savings are lower than those obtained with respect to the professional centralized solution. The professional decentralized solution is improved with a less costly design of server clusters, which also involves lower management costs. For the one-site test case, the professional solution is improved by about 20% because, in this case, the difference between the methodological and professional solutions is only due to a different sizing of servers. Table I summarizes the total execution time, the metrics defined in Section V-B, and the computed lower bounds, as discussed in Section IV-F, as a function of the number of sites K. Note that, with one site, the solution identified by the decomposition is optimal. For K > 1, without considering legacies, the gap with the lower bound is about 5% and is mainly due to the linearization of management costs and to the evaluation of the costs of thin clusters and server clusters.
In the legacy test case, the gap with the lower bound is about 20%, and it is mainly due to the weak estimation of the legacy component of the cost function in the lower bound. In general, as the size of the system (i.e., the number of sites) grows, the improvement of both the centralized and decentralized professional solutions increases. 2) Internet Banking System: The system is distributed over K sites, where K ranges between one and three. Each site supports 100 000 Internet users accessing the following applications: 1) Web server application; 2) servlet engine;
TABLE II SUMMARY OF RESULTS FOR AN INTERNET BANKING SYSTEM
3) application server; 4) relational DBMS storing historical data on stock quotes; 5) object-oriented DBMS storing user data. Users issue two types of requests—information retrieval and transaction execution—with a 10:1 ratio. The overall average access rate to the system is about 250 accesses per hour. Data on user classes, requests, RAM, MIPS, and data exchange requirements among server applications have been obtained from the logs of a large national financial institution. The target system for the optimization is based on UltraSPARC Solaris servers (see the observation on MIPS in Section II-B). DBMSs are replicated in all sites, and transactions write multiple copies of data synchronously for fault-tolerance purposes. Three different allocations of applications into tiers are allowed: a five-tier allocation, which assigns each server application to a single tier, and two four-tier allocations, which assign the servlet engine to the same tier as either the Web or the application server. This is obtained by introducing two G2h groups: the first group includes Web servers and servlet engines; the second group includes servlet engines and application servers. Results are reported in Table II. The decomposition is effective because it enables a 10%–20% reduction of TCO, which increases with the size and complexity of the system. SolutionB and SolutionC are different from each other. In SolutionB, database servers are replicated in all sites, according to design constraints, and Web applications are centralized on one cluster in one site. Servlet engines and application servers are allocated on the same cluster, but one such cluster is allocated on each site to serve the site's user classes. This contrasts with professional guidelines suggesting the allocation of applications with the maximum number of tiers and the centralization of corresponding tiers of different instances of the same application (see Section I).
SolutionC allocates all server applications to the same site on five tiers. As in the previous test case, with one site, the solution identified by the decomposition is optimal. For K > 1, the gap with the lower bound is about 2%, which is mainly due to the linearization of management costs, and increases up to 5% in the legacy test cases. Note that, if SolutionB is compared with SolutionD, cost savings are in the 10%–20% range. As in the previous case, SolutionB is partly decentralized and, as a consequence, savings are lower than those obtained with respect to the professional centralized solution. The professional decentralized solution is improved by adopting a lower number of tiers and, thus, optimizing the design of clusters.
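The savings and optimality-gap percentages quoted in these comparisons reduce to two simple ratios; a minimal sketch with made-up TCO values (not the paper's data):

```python
def savings(cost_ref, cost_opt):
    """Relative saving of the optimized solution vs. a reference solution."""
    return (cost_ref - cost_opt) / cost_ref

def lb_gap(cost_heuristic, lower_bound):
    """Optimality gap of the heuristic solution vs. the lower bound."""
    return (cost_heuristic - lower_bound) / lower_bound

# Illustrative (made-up) TCO values, not the paper's data:
print(savings(1_000_000, 850_000))  # 0.15, i.e., within the 10%-20% range
print(lb_gap(510_000, 500_000))     # 0.02, i.e., a 2% gap
```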
3) Information Retrieval System: The following test case is based on the work presented in [13]. An information retrieval system is considered, which is composed of two applications:
1) the information retrieval engine, called the inquiry server, which stores and retrieves documents by providing a query interface;
2) the central broker, called the connection server, which administers the connection between users and inquiry servers by maintaining a list of available connections and by routing queries and responses accordingly.
The system is distributed over K sites, where K ranges between one and three. Each site has a local connection server and four inquiry servers. Local connection servers distribute requests to both local and remote inquiry servers (which store different data). External users access the distributed system from the Internet and can issue search, summary-retrieval, and document-retrieval commands. Requests are uniformly distributed across inquiry servers. The target system for the optimization is based on Alpha servers (see the observations in Section II-B). Data on MIPS and RAM requirements and data exchanges are obtained from [13]. Two groups are introduced: G21, including all the instances of connection servers, and G22, including all inquiry servers. In both SolutionB and SolutionC, servers are localized in one site. However, in SolutionB, connection servers are centralized within the same cluster, whereas each instance of the inquiry server is assigned to a separate cluster. The professional solution suggests the location of all servers in one site and the centralization of both connection and inquiry servers in two clusters. When legacy systems are considered for reuse, they are upgraded to accommodate increasing requirements. The decomposition is effective, as it identifies a solution that is 30%–60% cheaper than SolutionC. If SolutionB is compared with SolutionD, cost savings are in the 30%–80% range.
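The uniform request distribution performed by the connection servers can be sketched as a simple round-robin over all inquiry servers, local and remote. This is a hypothetical illustration (server naming is ours; counts follow the K = 3 configuration above):

```python
import itertools

# Sketch of the routing policy described above: requests are spread
# uniformly over all inquiry servers, local and remote.
K = 3                        # number of sites
INQUIRY_PER_SITE = 4
servers = [f"site{s}-inq{i}" for s in range(K) for i in range(INQUIRY_PER_SITE)]

rr = itertools.cycle(servers)    # round-robin yields a uniform long-run split

def route(n_requests):
    """Count how many of n_requests each inquiry server receives."""
    counts = {s: 0 for s in servers}
    for _ in range(n_requests):
        counts[next(rr)] += 1
    return counts

counts = route(1200)         # 1200 requests over 12 servers -> 100 each
```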
SolutionB is centralized and, as a consequence, savings are significantly higher than those obtained with respect to the professional decentralized solution. Savings are also significant with respect to the professional centralized solution because the optimization of tiers and clusters has a greater impact with CPU-intensive applications. As in the previous test cases, with one site, the solution identified by the decomposition is optimal. For K > 1, the gap with the lower bound is about 4% and is mainly due to the linearization of management costs. When legacies are considered, the gap increases up to 27%. As in the first test case, this is mainly due to the weak estimation of the legacy component of the cost function in the lower bound. Results are summarized in Table III.
TABLE III SUMMARY OF RESULTS FOR AN INFORMATION RETRIEVAL SYSTEM
VI. CONCLUSION
We have proposed an overall approach to support the cost-oriented design of hardware and network systems, considering infrastructural, network, and management costs. Most of the design alternatives enabled by current technologies are addressed, including server sizing, localization of multitier applications, and reuse of legacy machines. Cost reductions have been evaluated by comparing the cost of methodological results with those of architectural solutions obtained by applying professional design guidelines. The quality of heuristic optimization results has been evaluated through comparison with lower bounds.
Testing results indicate that cost reductions can be significant. The divide et impera design paradigm proposed by the literature does not seem to be correct from a cost perspective. When design choices combine, the algorithm's solution does not coincide with the juxtaposition of separate solutions for hardware and network systems. Cost reductions grow considerably with the size and complexity of the system, and current professional rules are challenged by these findings. This indicates that general design guidelines are difficult to infer from empirical results. Analyses show that the geographical centralization of servers can reduce management costs but cannot be assumed to be a universal cost-minimizing paradigm [11], [22], [23]. Similarly, a higher number of tiers does not represent a reliable source of savings (see Section I and [36]).
Results encourage future research to both extend the algorithm and improve the support tool. From a methodological standpoint, cost-oriented design is suitable for the selection of a combination of technology resources, whereas it involves an approximation in their individual sizing. This represents the main limitation of this paper. To assess the consequences of this approximation, future work should consider the integration of the cost-oriented algorithm with traditional performance analyses that provide precise sizing information. The range of design choices will also be completed by including SAN design.
ACKNOWLEDGMENT
The authors would like to thank Z. Liu and L. Zhang of IBM for their feedback on the early versions of ISIDE and A. Molteni and S. Amati for their assistance in data collection and development activities.
REFERENCES
[1] T. F. Abdelzaher, K. G. Shin, and N. Bhatti, "User-level QoS-adaptive resource management in server end-systems," IEEE Trans. Comput., vol. 52, no. 5, pp. 678–685, May 2003.
[2] T. F. Abdelzaher and C. Lu, "Schedulability analysis and utilization bounds for highly scalable real-time services," in Proc. Real-Time Technol. Appl. Symp., Taipei, Taiwan, 2001, pp. 15–25.
[3] T. F. Abdelzaher, K. G. Shin, and N. Bhatti, "Performance guarantees for Web server end-systems: A control-theoretical approach," IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 1, pp. 80–96, Jan. 2002.
[4] D. Ardagna and C. Francalanci, "A cost-oriented methodology for the design of Web based IT architectures," in Proc. 17th ACM Symp. Appl. Comput., Madrid, Spain, 2002, pp. 1127–1133.
[5] D. Ardagna, "A cost-oriented methodology for the design of information technology architectures," Ph.D. dissertation, Dept. Electron. Inf., Politecnico di Milano, Milano, Italy, 2004.
[6] D. Ardagna, C. Francalanci, and M. Trubian, "A cost-oriented approach for infrastructural design," in Proc. 19th ACM Symp. Appl. Comput., Nicosia, Cyprus, 2004, pp. 1431–1437.
[7] D. Ardagna and C. Francalanci, "A cost-oriented approach for the design of IT architectures," J. Inf. Technol., vol. 20, no. 1, pp. 32–51, Feb. 2005.
[8] D. Ardagna, C. Francalanci, G. Bazzigaluppi, M. Gatti, F. Silveri, and M. Trubian, "A cost-oriented tool to support server consolidation," in Proc. 7th Int. Conf. Enterprise Inf. Syst., Miami, FL, 2005, pp. 323–330.
[9] A. Aue and M. Breu, "Distributed information systems: An advanced methodology," IEEE Trans. Softw. Eng., vol. 20, no. 8, pp. 594–605, Aug. 1994.
[10] O. Berman and S. Vasudeva, "Approximating performance measure for public services," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 35, no. 4, pp. 583–591, Jul. 2005.
[11] M. Betts, The Next Chapter: The Future of Hardware, Computerworld, 2002. [Online]. Available: http://www.computerworld.com/action/article.do?command=viewArticleTOC&specialReportId=120&articleId=75887
[12] J. E. Blyler and G. A. Ray, What's Size Got to Do With It? Understanding Computer Rightsizing, ser. Understanding Science & Technology Series. New York: Wiley-IEEE Press, 1998.
[13] B. Cahoon, S. McKinley, and Z. Lu, "Evaluating the performance of distributed architectures for information retrieval using a variety of workloads," ACM Trans. Inf. Syst., vol. 18, no. 1, pp. 1–43, Jan. 2000.
[14] L. W. Clarke and G. Anandalingam, "An integrated system for designing minimum cost survivable telecommunication network," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 26, no. 6, pp. 856–862, Nov. 1996.
[15] M. Dell'Amico and M. Trubian, "Applying tabu search to the job-shop scheduling problem," Ann. Oper. Res., vol. 41, no. 1–4, pp. 231–252, 1993.
[16] R. Dumke, C. Rautenstrauch, and A. Schmietendorf, Performance Engineering: State of the Art and Current Trends. New York: Springer-Verlag, 2001.
[17] P. Ein-Dor, "Grosch's law re-revisited: CPU power and the cost of computation," Commun. ACM, vol. 28, no. 2, pp. 142–151, Feb. 1985.
[18] W. Emmerich, Engineering Distributed Objects. Hoboken, NJ: Wiley, 2000.
[19] C. E. Ferreira, A. Martin, C. C. De Souza, R. Weismantel, and L. A. Wolsey, "Formulations and valid inequalities for the node capacitated graph partitioning problem," Math. Program., vol. 74, no. 3, pp. 247–266, Sep. 1996.
[20] B. Gavish and H. Pirkul, "Computer and database location in distributed computer systems," IEEE Trans. Comput., vol. C-35, no. 7, pp. 583–590, Jul. 1986.
[21] F. Glover and M. Laguna, Tabu Search. Norwell, MA: Kluwer, 1997.
[22] M. Harchol-Balter and A. B. Downey, "Exploiting process lifetime distributions for dynamic load balancing," ACM Trans. Comput. Syst., vol. 15, no. 3, pp. 253–285, Aug. 1997.
[23] VMware for Server Consolidation and Virtualization, HP, Palo Alto, CA, 2007. [Online]. Available: http://h71019.www7.hp.com/ActiveAnswers/cache/71086-0-0-0-121.html
[24] Server Consolidation, IBM, Armonk, NY, 2007. [Online]. Available: http://www-03.ibm.com/servers/solutions/serverconsolidation
[25] H. K. Jain, "A comprehensive model for the design of distributed computer systems," IEEE Trans. Softw. Eng., vol. SE-13, no. 10, pp. 1092–1104, Oct. 1987.
[26] W. Lin, Z. Liu, C. H. Xia, and L. Zhang, "Optimal capacity allocation for Web systems with end-to-end delay guarantees," Perform. Eval., vol. 62, no. 1–4, pp. 400–416, 2005.
[27] J. C. S. Lui and M. F. Chan, "An efficient partitioning algorithm for distributed virtual environment systems," IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 193–211, Mar. 2002.
[28] M. Malenovsky and J. Yang, IBM Director: Driving Efficiencies in Scale-Out Computing, IDC White Paper, Mar. 2004. [Online]. Available: http://www.ibm.com/servers/eserver/xseries/system_management/pdf/idc_ibm_director.pdf
[29] T. W. Mathers, Windows Server 2003/2000 Thin Client Solutions, 2004.
[30] D. A. Menascé and V. A. F. Almeida, Scaling for E-Business: Technologies, Models, Performance and Capacity Planning. Englewood Cliffs, NJ: Prentice-Hall, 2000.
[31] D. A. Menascé and H. Gomaa, "A method for design and performance modeling of client/server systems," IEEE Trans. Softw. Eng., vol. 26, no. 11, pp. 1066–1085, Nov. 2000.
[32] Windows Server 2003 Terminal Server Capacity and Scaling, Microsoft, 2003. [Online]. Available: http://www.microsoft.com/windowsserver2003/techinfo/overview/tssaling.mspx
[33] Storage Networking Solution: Unleashing the Power of Storage, Nortel Networks, 2001.
[34] What Enterprise Executives Need to Know About Optical Ethernet Network Services, Nortel Networks, 2001.
[35] C. N. Parkinson, Parkinson's Law. London, U.K.: The Economist, Nov. 1955. [Online]. Available: http://alpha.montclair.edu/~lebelp/ParkinsonsLaw.pdf
[36] R. L. Scheier, Scaling Up For E-Commerce, Computerworld, 2001. [Online]. Available: http://www.computerworld.com/softwaretopics/software/appdev/story/0,10801,59095,00.html
[37] R. Subbu and A. C. Sanderson, "Network-based distributed planning using coevolutionary agents: Architecture and evaluation," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 34, no. 2, pp. 257–269, Mar. 2004.
[38] Communication Market Review and Forecast, TIA, Arlington, VA, 2005. [Online]. Available: http://www.tiaonline.org/media/mrf/index.cfm
[39] L. Willcocks, "Evaluating information technology investments: Research, findings, and reappraisal," J. Inf. Syst., vol. 2, no. 3, pp. 243–268, 1992.
[40] L. Wolsey, Integer Programming. Hoboken, NJ: Wiley, 1998.
[41] R. Yuan and W. T. Strayer, Virtual Private Networks: Technologies and Solutions. Reading, MA: Addison-Wesley, 2001.
Danilo Ardagna received the Ph.D. degree in computer engineering from the Politecnico di Milano, Milan, Italy, in 2004. He is currently an Assistant Professor of information systems with the Department of Electronics and Information, Politecnico di Milano. His research interests include Web services composition, autonomic computing, and computer system cost minimization.
Chiara Francalanci received the M.S. degree in electronic engineering and the Ph.D. degree in computer sciences from the Politecnico di Milano, Milan, Italy. As part of her postdoctoral studies, she worked for two years at the Harvard Business School, Boston, MA, as a Visiting Researcher. She is currently an Associate Professor of information systems with the Department of Electronics and Information, Politecnico di Milano. She has authored articles on the design of information technology architectures and on the feasibility analysis of IT projects, with consulting activity in the financial industry, both in Europe and the U.S. She is a member of the editorial board of the Journal of Information Technology.
Marco Trubian received the Computer Science degree from the Università degli Studi di Milano, Milan, Italy, and the Ph.D. degree in electronic engineering from the Politecnico di Milano, Milan. Since December 2002, he has been an Associate Professor of operational research with the Computer Science Department, Università degli Studi di Milano. His research interests include the modeling of combinatorial optimization problems and the development of heuristic and exact algorithms for their solution. He has published papers in international journals such as Networks, Discrete Applied Mathematics, INFORMS Journal on Computing, European Journal of Operational Research, and others. Some of his recent publications are on optimizing resource allocation policies in data centers, solving cardinality-constrained cut problems on planar graphs, and developing efficient heuristics for special multidimensional knapsack problems.