Industry and Case Study Paper

CIKM’18, October 22-26, 2018, Torino, Italy

A Two-Layer Algorithmic Framework for Service Provider Configuration and Planning with Optimal Spatial Matching

Xijun Li, Jianguo Yao
School of Software, Shanghai Jiao Tong University

Mingxuan Yuan, Jia Zeng
Noah’s Ark Lab, Huawei Technologies

ABSTRACT

Industrial telecommunication applications prefer to run at the optimal capacity configuration to achieve the required Quality of Service (QoS) at the minimum cost. The optimal capacity configuration is usually achieved through the selection of cell tower capacities and locations. Given a set of service providers (e.g., cell towers) and a set of customers (e.g., major residential areas), where each customer has an amount of demand and each provider has multiple candidate capacities with corresponding costs, the optimal capacity selection is configured through spatial matching to satisfy the demand of each customer at the minimum cost. However, existing solutions developed for spatial matching, in which each provider’s capacity is fixed, cannot be directly applied to the capacity configuration problem with multiple capacities and location selections. In this paper, we are the first to study the Service Provider Configuration and Planning with Optimal Spatial Matching (SPC-POSM) problem, in which the objectives are 1) to select the proper capacity for each provider at the minimum total cost and 2) to assign providers’ service to satisfy the demand of each customer, on the condition that the matching distance is no more than the service quality requirement. We prove that SPC-POSM is NP-hard and design an efficient two-layer meta-heuristic framework to solve the problem. An unsupervised learning technique is used to accelerate the calculation, and a novel local search mechanism is embedded to further improve solution quality. Extensive experimental results verify the effectiveness and efficiency of the proposed framework.

KEYWORDS

Spatial matching, Capacity selection, Planning, Telecommunication

ACM Reference Format:
Xijun Li, Jianguo Yao, Mingxuan Yuan, and Jia Zeng. 2018. A Two-Layer Algorithmic Framework for Service Provider Configuration and Planning with Optimal Spatial Matching. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3269206.3272008

1 INTRODUCTION

In many industrial applications, we need to find the optimal configuration (e.g., the selection of providers’ capacity types and locations) to achieve a certain service quality at the minimum cost. For instance, in the telecommunication field, an important task for operators is to decide the locations of new cell towers from a given set of candidate locations, as well as to increase or decrease the capacity of cell towers at the minimum cost, in order to satisfy the connection requirements of the major residential areas (e.g., all connections can be served within two kilometers). Similarly, a government needs to plan the capacity and location of emergency centers within a city to guarantee that the majority of the population is covered (e.g., 1000 people can be served by at least one ambulance within five kilometers).

In the above examples, both cell towers and emergency centers can be regarded as providers offering service to customers (e.g., the residential areas and population mentioned above). As is often the case, each customer has a certain amount of demand to be met. Meanwhile, each provider has a service capacity that can be exhausted by serving its customers. If each provider’s location and capacity are predetermined, the problem of finding the optimal matching between customers and providers (i.e., deciding which provider serves how much demand of which customer to achieve the best service quality, e.g., minimizing the maximum service distance) is called the SPatial Matching (SPM) problem [8, 9, 13, 15].

Nevertheless, in real industrial scenarios, providers usually have multiple discrete capacities to select from. Previous works [8, 9, 13, 15] assume that each provider has only one capacity, which is an oversimplification for real applications. To the best of our knowledge, no existing work addresses the configuration of providers with multiple candidate capacities while meeting a quality-of-service requirement. Hence, a new problem based on SPM, called Service Provider Configuration and Planning with Optimal Spatial Matching (SPC-POSM), is proposed in this paper.

CCS CONCEPTS • Mathematics of computing → Combinatorial optimization; Evolutionary algorithms; • Applied computing → Enterprise resource planning;

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CIKM ’18, October 22–26, 2018, Torino, Italy © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6014-2/18/10. . . $15.00 https://doi.org/10.1145/3269206.3272008


SPC-POSM focuses on finding the optimal capacity configuration and location planning for providers to meet the target service quality at the minimum cost. The main contributions of this paper are summarized as follows:

∙ A problem originating from a real scenario: The SPC-POSM problem is proposed in this paper. We prove that the problem is NP-hard. To the best of our knowledge, there is no existing algorithm that can find the optimal solution for SPC-POSM within polynomial time.
∙ A practical two-layer framework for SPC-POSM: We design and implement an efficient two-layer meta-heuristic framework for SPC-POSM.
∙ Omitting the rule design step for Evolutionary Algorithms (EA): Our framework uses an Estimation of Distribution Algorithm (EDA) in the outer search to learn, from simulation, a probability distribution capable of generating candidate solutions that are likely to be feasible, avoiding the need for manual specification of generation rules [7]. This offers flexibility in applying EA to complicated business scenarios.
∙ Speeding up solution evaluation using machine learning: An unsupervised learning technique is used in a fitness approximation model that replaces the costly subroutine of exact solution evaluation. This greatly boosts the efficiency of the proposed framework, especially for large-scale SPC-POSM instances.
∙ Further improving solution quality through local search: We design a tabu search algorithm to further improve the quality of the best solution found by the outer search algorithm.
∙ Extensive evaluation on both real and synthetic datasets: Experimental results show that the proposed framework performs much better than the benchmark algorithms in terms of both accuracy and efficiency. All implementation code of the algorithms and the synthetic datasets are open to the public.

The rest of this paper is organized as follows. Section 2 describes the motivation of this paper and then formulates the SPC-POSM problem. The related works are reviewed in Section 2.2. In Section 3, a two-layer algorithmic framework is presented to solve the SPC-POSM problem, including its design, methodology and implementation. Section 4 evaluates the proposed framework in terms of accuracy and efficiency. Finally, we conclude this paper in Section 5.

2 SPC-POSM PROBLEM
2.1 Motivation

Recently, many resource planning and configuration problems have emerged in areas such as telecommunication, fire emergency and medical emergency, and they need to be solved judiciously. This class of problems can be formulated as the SPC-POSM problem. Note that although customers, i.e., the service demanders, move individually between different metropolitan areas, they are viewed as clusters when being provided service. Sibren et al. [4] find that telecommunication operators mainly establish their cell towers near densely populated areas where large numbers of mobile users cluster. Besides, in fire and medical emergency scenarios, service calls mainly come from densely populated areas. Thus, in practice, service providers (e.g., telecommunication operators and governments) tend to treat their customers as clusters when they plan their resources.

In this paper, the problem of determining the location and configuring the capacity of cell towers is selected as a representative to demonstrate the SPC-POSM problem. More specifically, given the geographical distribution of customers as well as the candidate locations and capacity installation costs of cell towers, telecommunication operators are supposed to optimally plan the capacity of cell towers in an area to ensure the quality of service at the minimum cost. Solving this problem is highly beneficial for telecommunication operators, as they can reduce infrastructure costs while increasing revenue and user satisfaction.

2.2 Preliminary: Spatial Matching

We here introduce the SPatial Matching (SPM) problem and several of its variants. Assume that set $P$ contains three providers $p_1$, $p_2$ and $p_3$, each of which has one type of capacity, and that set $O$ consists of three customers $o_1$, $o_2$ and $o_3$, each of which has a certain amount of demand to be served by providers. The SPM problem aims to find the optimal matching between customers and providers, i.e., to decide which provider serves how much demand of which customer to achieve the optimization objective (service quality).

Based on the classical problem, several variants [8, 14, 15] have been proposed, which differ in their optimization criteria. The SPatial Matching for Minimizing Maximum Matching Distance (SPM-MM) [8] problem aims to minimize the maximum matching distance over all matched pairs. Wong et al. proposed the Fair Assignment [14] problem, whose objective is to find an assignment (a set of matchings) in which each $o \in O$ is allocated to a $p \in P$ that (1) is as near to $o$ as possible and (2) has not exhausted its servicing capacity in serving other, closer customers. Besides, Yiu et al. proposed the Globally Optimized Assignment [15] problem, whose goal is to minimize the sum of matching distances. The spatial matching problem has been studied extensively in [5, 6, 8, 13], and corresponding solutions have also been proposed according to various optimization criteria [5, 8, 13], but without full consideration of multiple capacity choices for providers.

2.3 Problem Formulation

In SPC-POSM, the following are given: a set $O$ of customers $o_i$ ($i = 1, 2, ..., M$), each of which has a demand $o.w$ ($o.w \neq 0$); a set $P$ of candidate providers $p_j$ ($j = 1, 2, ..., N$), each of which has $m$ types of candidate service capacity $p.w_k$ ($k = 1, ..., m$) with corresponding cost $p.c_k$; and the coordinate information of each point. The problem aims to determine the capacity type for each candidate provider at the minimum capacity cost, while keeping the maximum service distance less than or equal to a service quality threshold $D$.


Furthermore, given the definition of variables and constants in Table 1, the problem is formulated in formulas (1)-(7).

Table 1: Definition of variables used in SPC-POSM

Variable         Definition
$O$              Set of customers $o_i$
$P$              Set of providers $p_j$
$x_{i,j}$        1 or 0, represents whether $o_i$ is served by $p_j$ or not
$c_{j,k}$        1 or 0, represents whether $p_j$ chooses capacity $p_j.w_k$ or not
$w_{i,j}$        Amount of $o_i$’s demand satisfied by $p_j$
$o_i.w$          Service demand of $o_i$
$p_j.w_k$        The $k$th candidate capacity of $p_j$
$p_j.c_k$        Cost of making $p_j$ have capacity $p_j.w_k$
$d(o_i, p_j)$    Euclidean distance between $o_i$ and $p_j$
$mmd$            Maximum matching distance of optimal spatial assignment
$D$              Service quality threshold

Objective:

$$\min \sum_{j} \sum_{k} c_{j,k} \times p_j.c_k \tag{1}$$

s.t.

$$\forall j, \quad \sum_{i} w_{i,j} \le \sum_{k} c_{j,k} \times p_j.w_k \tag{2}$$

$$\forall i, \quad \sum_{j} w_{i,j} = o_i.w \tag{3}$$

$$\forall j, \quad \sum_{k} c_{j,k} = 1, \quad c_{j,k} \in \{0, 1\} \tag{4}$$

$$\forall i, j, \quad x_{i,j} = 1 \rightarrow mmd \ge d(o_i, p_j) \tag{5}$$

$$mmd \le D \tag{6}$$

$$\forall i, j, \quad w_{i,j} > 0 \rightarrow x_{i,j} = 1 \tag{7}$$

Objective function (1) states that the total cost of the capacity configuration is to be minimized. Constraint (2) makes sure that each provider has enough service capacity to satisfy the customers it serves. Constraint (3) indicates that the total demand of each customer must be satisfied. Constraint (4) restricts each provider to select exactly one type of capacity from its candidate capacities. Constraints (5) and (6) together ensure the service quality¹ (i.e., the maximum matching distance must not exceed the service distance threshold $D$). Constraint (7) indicates that when a provider is linked to a customer, it must serve some demand of this customer.

To illustrate the problem more intuitively, a toy example is presented in Figure 1, where set $P$ contains three providers $p_1$, $p_2$ and $p_3$, and set $O$ has four customers $o_1$, $o_2$, $o_3$ and $o_4$. We set the demand of $o_1$, $o_2$, $o_3$ and $o_4$ to be 2, 2, 2 and 1, respectively. Thus, the total demand of all customers is 7. Table 2(a) shows all pairwise Euclidean distances between $P$ and $O$. For the given providers, Table 2(b) and Table 2(c) present all types of candidate capacity and the corresponding costs, respectively. According to Table 2(b), each $p \in P$ has at most 3 types of candidate capacity. Thus there are 27 capacity configurations in total. Suppose that the service quality threshold $D$ is 5, which means that the maximum matching distance of a feasible assignment under any capacity configuration cannot be larger than 5. The goal of SPC-POSM is to find a capacity configuration with the minimum total cost while keeping the maximum matching distance no larger than $D$. Figure 1(b) portrays an optimal solution with the minimum total cost of 13, in which the selected capacities for $p_1$, $p_2$ and $p_3$ are 1, 4 and 2, respectively. Note that the number beside each line indicates how much demand of $o$ is served by $p$.

[Figure 1: (a) SPC-POSM spatial layout of providers $p_1$-$p_3$ and customers $o_1$-$o_4$; (b) a full assignment for SPC-POSM, with each edge labeled by the amount of demand served.]

Figure 1: A toy example of SPC-POSM problem

Table 2: Parameters of toy example

(a) Pairwise distance

Dist.    $p_1$    $p_2$    $p_3$
$o_1$    4.24     1.00     2.24
$o_2$    1.41     5.00     3.61
$o_3$    2.82     5.00     3.00
$o_4$    4.00     3.61     2.24

(b) Candidate capacity of providers

Cap.       $p_1$    $p_2$    $p_3$
$p_i.w_1$  1        3        2
$p_i.w_2$  2        4        4
$p_i.w_3$  3        5        8

(c) Cost of candidate capacity

Cost       $p_1$    $p_2$    $p_3$
$p_i.c_1$  0        12       0
$p_i.c_2$  10       13       12
$p_i.c_3$  20       14       14
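The toy example can be checked by exhaustive enumeration. The following Python sketch is only an illustration and is not part of the proposed framework: the variable names are hypothetical, the data are transcribed from Table 2, and the feasibility test relies on a shortcut that holds for this instance only (every customer-provider distance is at most $D = 5$ and demand may be split across providers, so a configuration is feasible exactly when the total selected capacity covers the total demand).

```python
from itertools import product

# Toy instance transcribed from Table 2 (index j = provider p1..p3).
capacity = [[1, 2, 3], [3, 4, 5], [2, 4, 8]]      # p_j.w_k
cost = [[0, 10, 20], [12, 13, 14], [0, 12, 14]]   # p_j.c_k
demand = [2, 2, 2, 1]                             # o_1..o_4
dist = [  # dist[i][j] = d(o_i, p_j)
    [4.24, 1.00, 2.24],
    [1.41, 5.00, 3.61],
    [2.82, 5.00, 3.00],
    [4.00, 3.61, 2.24],
]
D = 5.0

# Shortcut valid for this toy instance only: every pair is within D, so
# feasibility reduces to "total capacity >= total demand".
assert max(max(row) for row in dist) <= D


def best_configuration():
    """Enumerate all 3^3 = 27 configurations and keep the cheapest feasible one."""
    best = None
    for choice in product(range(3), repeat=3):
        total_cap = sum(capacity[j][k] for j, k in enumerate(choice))
        if total_cap < sum(demand):
            continue  # not enough capacity to serve all demand
        total_cost = sum(cost[j][k] for j, k in enumerate(choice))
        if best is None or total_cost < best[0]:
            best = (total_cost, [capacity[j][k] for j, k in enumerate(choice)])
    return best


print(best_configuration())  # (13, [1, 4, 2])
```

For general instances the shortcut does not apply, and feasibility has to be decided by an actual spatial matching under the threshold $D$.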

2.4 NP-Hard Proof

In this subsection, we prove that the SPC-POSM problem is NP-hard from the perspective of the minimum set cover problem [11].

¹ We use SPM-MM’s service quality estimation function here. However, our solution is not restricted to this function. It can be easily extended to other functions used in works [5, 13, 15].


Minimum Set Cover Problem. Given a universe $\mathcal{U}$ of elements and a family $\mathcal{S}$ of subsets of $\mathcal{U}$ whose union equals the universe, a cover is a subfamily $\mathcal{C} \subseteq \mathcal{S}$ of sets whose union is $\mathcal{U}$. The input is a pair $(\mathcal{U}, \mathcal{S})$, and the task is to find a cover $\mathcal{C}$ of minimum size $|\mathcal{C}|$. For instance, consider the universe $\mathcal{U} = \{1, 2, 3, 4, 5\}$ and the family $\mathcal{S} = \{\{1, 2, 3\}, \{2, 4\}, \{3, 4\}, \{4, 5\}\}$. Obviously the union of $\mathcal{S}$ is $\mathcal{U}$. The smallest subfamily of $\mathcal{S}$ that covers the universe $\mathcal{U}$ is $\{\{1, 2, 3\}, \{4, 5\}\}$.

Theorem 1. The set covering problem is NP-hard.

Reduction. Let the universe $\mathcal{U}$ be the customer set, with every customer’s demand equal to 1. Correspondingly, let the sets in $\mathcal{S}$ be the providers. Each element $s \in \mathcal{S}$ has two capacities, 0 and $|\mathcal{S}|$, with corresponding costs 0 and 1, respectively. For every customer $e \in s$, we set $d(e, s) = 1$. If $e \notin s$, we set $d(e, s) = \infty$. With any finite service quality threshold $D \ge 1$, a customer can therefore only be served by a provider whose set contains it. If we can find an optimal solution of the SPC-POSM problem with minimum cost $m$, then all the customers’ demands are satisfied. Since each customer’s demand is 1, all the elements in $\mathcal{U}$ are covered. Whenever we select a provider whose capacity is larger than 0, its cost is 1. Thus, $m$ is the minimum number of providers needed to satisfy the customers’ demands, which is precisely the solution of the minimum set cover problem. Hence, we can infer that the SPC-POSM problem is NP-hard.
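The reduction can be illustrated on the set cover example above. The Python sketch below is a hypothetical illustration, not the authors’ code: the function names are invented, both solvers are brute-force, and `min_spc_posm_cost` assumes that the non-zero capacity of every provider is at least the number of customers it can reach, which holds for this example.

```python
import math
from itertools import combinations, product


def to_spc_posm(universe, family):
    """Build the SPC-POSM instance described in the reduction."""
    customers = sorted(universe)
    demand = {e: 1 for e in customers}              # every demand is 1
    capacity = [(0, len(family)) for _ in family]   # capacities 0 and |S|
    cost = [(0, 1) for _ in family]                 # costs 0 and 1
    dist = {(e, j): (1 if e in s else math.inf)
            for e in customers for j, s in enumerate(family)}
    return demand, capacity, cost, dist


def min_cover_size(universe, family):
    """Brute-force minimum set cover (exponential; illustration only)."""
    for size in range(1, len(family) + 1):
        for subset in combinations(family, size):
            if set().union(*subset) == universe:
                return size
    return None


def min_spc_posm_cost(demand, capacity, cost, dist, D=1):
    """Brute-force minimum cost over all on/off capacity choices."""
    n = len(capacity)
    reach = [{e for e in demand if dist[e, j] <= D} for j in range(n)]
    # Sketch-only assumption: capacity never binds once a provider is switched on.
    assert all(len(reach[j]) <= capacity[j][1] for j in range(n))
    best = math.inf
    for active in product((0, 1), repeat=n):
        served = set().union(*(reach[j] for j in range(n) if active[j]))
        if served == set(demand):
            best = min(best, sum(cost[j][active[j]] for j in range(n)))
    return best


universe = {1, 2, 3, 4, 5}
family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
instance = to_spc_posm(universe, family)
print(min_cover_size(universe, family), min_spc_posm_cost(*instance))  # 2 2
```

Both values coincide on this instance, as the argument above predicts.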

[Figure 2 (flowchart). Input: (1) providers’ candidate capacity, (2) customers’ demand, (3) coordinate information, (4) threshold. Two layers, Global Search and Local Search, built from the modules Solution Initializer, Fitness Approximation, Selection, Solution Updater, Solution Modificator and Solution Evaluation, each layer with its own "Terminate?" check. Output: optimal capacity configuration.]

Figure 2: Proposed framework

3 PROPOSED FRAMEWORK
3.1 Overview

The framework consists of a two-layer meta-heuristic algorithm. As shown in Figure 2, the providers’ candidate capacities, the customers’ demands, the coordinate information of all points and the service quality threshold $D$ are the inputs of the framework, which is expected to return an optimal capacity configuration. More specifically, the outer algorithm is responsible for searching for a promising region of good-quality solutions. To enhance efficiency, a fitness approximation model is used in the outer search algorithm. Then, within the promising region, a locally optimal solution is obtained by an inner search algorithm, starting from the best solution found by the outer search.

In the following, we first present the encoding mechanism used in the proposed framework, in which a candidate solution is represented as a string of integers. Then the fitness evaluation of candidate solutions is described, including the design of the fitness function and the fitness approximation method. Note that these two modules can be used in any meta-heuristic algorithm designed for the SPC-POSM problem. After that, the two meta-heuristic algorithms used in the framework are presented in turn, where we describe their implementation at length and explain why they were adopted.

3.2 Encoding Mechanism

We use an integer encoding mechanism to encode a capacity configuration as a string of integers, as shown in Figure 3, where there are five providers $p_1$, $p_2$, $p_3$, $p_4$ and $p_5$, and each provider has at most five types of candidate capacity. Thus, a string of length 5 is used to represent a possible solution, i.e., a capacity configuration. Each entry of the string corresponds to a provider’s choice of capacity, whose value is an integer within $[1, m]$, where $m$ is the number of types of candidate capacity. As presented on the right of Figure 3, provider $p_3$ selects its second capacity $p_3.w_2$. Thus, the third entry of the encoded string is set to 2.

[Figure 3: the encoder maps a possible solution (capacity configuration) to an encoded solution; the example string has values 1, 3, 2, 5, 4 at indices 1 to 5, where the value 2 at index 3 means that the provider selects its second capacity.]

Figure 3: Encoding mechanism used

3.3 Fitness Evaluation

In a traditional meta-heuristic algorithm, a possible solution is evaluated using an evaluation function whose design has to take into account the optimization objective and the problem constraints. For the SPC-POSM problem, the evaluation function $f(\cdot)$ used in the proposed framework is designed as follows:

$$f(s) = \exp(J(s)), \tag{8}$$

$$J(s) = \begin{cases} \dfrac{\alpha}{\sigma^{s}_{cost}}, & \text{if } \sigma^{s}_{cap} \ge \sigma_{dem} \text{ and } mmd \le D, \\ -\alpha, & \text{if } \sigma^{s}_{cap} \ge \sigma_{dem} \text{ and } mmd > D, \\ -Q \times \alpha, & \text{if } \sigma^{s}_{cap} < \sigma_{dem}, \end{cases} \tag{9}$$

$$\sigma^{s}_{cost} = \sum_{j} \sum_{k} c_{j,k} \times p_j.c_k, \tag{10}$$

$$\sigma^{s}_{cap} = \sum_{j} \sum_{k} c_{j,k} \times p_j.w_k, \tag{11}$$

$$\sigma_{dem} = \sum_{i} o_i.w, \tag{12}$$

where $s$ is a capacity configuration; $mmd$ is the maximum matching distance of the optimal spatial assignment under the capacity configuration $s$; $\sigma^{s}_{cost}$ is the sum of the capacity costs of $s$; $\sigma^{s}_{cap}$ is the sum of the capacities of $s$; $\sigma_{dem}$ is the sum of the demands of all customers; $\alpha$ is the power coefficient; and $Q$ is the penalty factor used to penalize infeasible solutions $s$ whose $\sigma^{s}_{cap}$ is less than $\sigma_{dem}$. Both $\alpha$ and $Q$ need to be set properly. In our implementation, we set $\alpha$ and $Q$ to 10.0 and 20.0, respectively. Each possible solution obtains its fitness from function (8), where the larger the fitness, the better the solution. From evaluation function (8), minimizing the total cost is achieved by maximizing the fitness. Besides, constraints (2), (3) and (6) are met in functions (9), (11) and (12), respectively. Constraint (4) is met by the integer encoding mechanism. To meet constraint (5), the Swap-Chain algorithm is used to efficiently obtain $mmd$ for a given capacity configuration.
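The integer encoding and the evaluation function (8)-(12) can be sketched together in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors’ implementation: the function name and signature are hypothetical, `max_matching_distance` is a caller-supplied stand-in for Swap-Chain, and the total cost of a feasible configuration is assumed to be positive so that $\alpha / \sigma^{s}_{cost}$ is defined.

```python
import math

ALPHA = 10.0   # power coefficient (value reported in the text)
Q = 20.0       # penalty factor (value reported in the text)


def fitness(s, capacity, cost, demand, D, max_matching_distance):
    """Evaluate f(s) = exp(J(s)) for an integer-encoded configuration s.

    s[j] in [1, m] selects the k-th candidate capacity of provider j.
    max_matching_distance(s) is a hypothetical stand-in for Swap-Chain.
    """
    picks = [k - 1 for k in s]  # 1-based encoding -> 0-based index
    sigma_cost = sum(cost[j][k] for j, k in enumerate(picks))
    sigma_cap = sum(capacity[j][k] for j, k in enumerate(picks))
    sigma_dem = sum(demand)

    if sigma_cap < sigma_dem:
        j_value = -Q * ALPHA               # not enough capacity
    elif max_matching_distance(s) > D:
        j_value = -ALPHA                   # enough capacity, but mmd > D
    else:
        j_value = ALPHA / sigma_cost       # feasible; assumes sigma_cost > 0
    return math.exp(j_value)


# Usage with the toy data of Table 2; the lambda fakes an mmd of 5.0.
toy_capacity = [[1, 2, 3], [3, 4, 5], [2, 4, 8]]
toy_cost = [[0, 10, 20], [12, 13, 14], [0, 12, 14]]
toy_demand = [2, 2, 2, 1]
print(fitness((1, 2, 1), toy_capacity, toy_cost, toy_demand, 5.0, lambda s: 5.0))
```

Note that the matching distance is only computed when the capacity check passes, which avoids the expensive matching step for configurations that are already known to be infeasible.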

3.3.1 Swap-Chain Algorithm. The SPC-POSM problem degrades into the SPM-MM [8] problem once the capacity configuration is determined. The SPM-MM problem focuses on obtaining the minimum $mmd$. To the best of our knowledge, Swap-Chain [8] is the state-of-the-art algorithm for the SPM-MM problem. In brief, the idea behind the Swap-Chain algorithm is to reduce the maximum matching distance by iteratively readjusting the assignment until no matching pair can be found that reduces the maximum matching distance further. We do not go into the details of the algorithm here. Although Swap-Chain is state-of-the-art, it is still time-consuming. The time complexity of the algorithm is $O(\lambda + \gamma + R \cdot t \cdot (|V| \cdot \beta(|P|) + c \cdot |V|))$, where $R \ll |E|$, $t \ll \min\{\max_{p \in P} p.w, \max_{o \in O} o.w\}$, and $c \ll \min\{|P|, |O|\}$. Here $\lambda$ is the time complexity of building the index that facilitates range queries; $\gamma$ is the time complexity of the full assignment initialization; $R$ is the total number of possible extreme matches fetched in the algorithm; $P$ and $O$ are the sets of providers and customers, respectively; and $|V| = |P| + |O|$.

3.3.2 Fitness Approximation. Fitness evaluation has high time complexity since it needs to call the Swap-Chain algorithm to obtain $mmd$. To enhance the efficiency of the proposed framework, a fitness approximation method is designed, as presented in Figure 4. Given a set of possible solutions, their fitness is approximated through the following steps:

(1) Clustering: The given solutions are first grouped into $k$ clusters according to similarity, using KMeans [3]. Then, within each cluster, a representative is selected, which is the point closest to the cluster center.
(2) Fitness Calculation: The fitness of each selected representative is calculated by evaluation function (8).
(3) Fitness Assignment: Within each cluster, the fitness of the representative is directly assigned to each of the unevaluated points.

[Figure 4: solutions grouped into clusters (Cluster 1, Cluster 2); ① Clustering, ② Fitness Calculation by exact evaluation of the representatives, ③ Fitness Assignment to the remaining points.]

Figure 4: Fitness approximation process

3.4 Global Search

A general global search algorithm is presented in Figure 2, each module of which is commonly used in popular meta-heuristic algorithms such as the Estimation of Distribution Algorithm (EDA) [16], Particle Swarm Optimization (PSO) [1] and the Genetic Algorithm (GA). Among these meta-heuristic algorithms, EDA is selected as the global search algorithm in the proposed framework since it has several advantages over the others, such as the absence of multiple parameters to be tuned and the expressiveness and transparency of the probabilistic model that guides the search process [16]. Besides, our experimental results also verify that EDA outperforms PSO and GA in terms of accuracy and efficiency. The pseudo code of the EDA used in the proposed framework is presented in Algorithm 1, of which three important parts need to be explained:

(1) Distribution Construction: Given the candidate capacities $Cap$ of the providers, the first line of Algorithm 1 constructs a uniform distribution for sampling possible capacity configurations. For example, suppose there are three providers, $p_1$, $p_2$ and $p_3$, each of which has three types of candidate capacity, $p_i.w_1$, $p_i.w_2$ and $p_i.w_3$. For each provider $p_i$, we construct a probability vector of length 3, of which the $k$th entry corresponds to the probability that $p_i$ selects $p_i.w_k$ as its capacity. At the beginning, all entries are equal, which means that $p_i$ selects any type of candidate capacity with equal probability. In this way, three probability vectors are constructed, constituting a probability matrix $Dist$.
(2) Solution Sampling: The fifth line of Algorithm 1 samples a population of possible solutions from the probability matrix $Dist$. More specifically, each provider determines its capacity by sampling from its corresponding probability vector in $Dist$. Once all providers have determined their capacity, a capacity configuration (possible solution) is generated.
(3) Distribution Update: Lines 11-13 of Algorithm 1 update the probability distribution.
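The three parts of the EDA used in the global search (distribution construction, solution sampling and distribution update) can be sketched in Python as follows. This is a simplified illustration and not the authors’ code: the function name and parameters are hypothetical, fitness values are computed exactly instead of through the fitness approximation model, and $Dist$ is kept as unnormalized weights so that adding one to an entry for every elite choice (top 10% of the population) plays the role of the update rule described in this section.

```python
import random


def eda(num_types, fitness, pop_size=50, iterations=30, seed=0):
    """Minimal EDA over integer-encoded capacity configurations.

    num_types[j] is the number of candidate capacities of provider j;
    fitness(s) is any callable where a larger value is better.
    """
    rng = random.Random(seed)
    # Distribution construction: equal weight for every candidate capacity.
    dist = [[1.0] * m for m in num_types]
    best, best_fit = None, float("-inf")
    for _ in range(iterations):
        # Solution sampling: each provider draws its capacity type (1-based).
        population = [
            tuple(rng.choices(range(1, len(w) + 1), weights=w)[0] for w in dist)
            for _ in range(pop_size)
        ]
        # Sort by fitness, best first, and remember the best solution so far.
        scored = sorted(((fitness(s), s) for s in population), reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        # Distribution update: the top 10% form the elite; every elite
        # choice adds one to the corresponding entry of Dist.
        elite = scored[: max(1, pop_size // 10)]
        for _, s in elite:
            for j, k in enumerate(s):
                dist[j][k - 1] += 1.0
    return best, best_fit
```

Keeping unnormalized weights is a design choice of this sketch: `random.choices` accepts relative weights, so no renormalization step is needed after each update.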

In the distribution update step, the generated solutions are first sorted in descending order of fitness. Then the top 10% of the solutions are selected as the elite, which is used to update the probability distribution. Continuing the example above, given an elite capacity configuration in which $p_1$ selects $p_1.w_2$, $p_2$ selects $p_2.w_3$ and $p_3$ selects $p_3.w_1$, how do we update the probability distribution $Dist$? We just need to increase the second entry of $p_1$’s probability vector by one and leave the other entries of this vector unchanged. The remaining vectors of $Dist$ are updated using the same rule.

Algorithm 1: EDA used in the global search

Input: Providers’ candidate capacity $Cap$ and corresponding cost $Cos$, Customers’ demand $Dem$, Coordinate of providers and customers, Service quality threshold $D$, Population size $PopSize$
Output: Capacity configuration with minimized cost
 1  construct distribution matrix $Dist$ using $Cap$;
 2  %where $Dist$ is the probability distribution for sampling
    %possible solution;
 3  set best solution $s^* = null$, best fitness $f^* = 0$;
 4  while not satisfy termination condition do
 5      sample a $PopSize$ of solutions $s$ from $Dist$ to constitute
        population $Popu$;
 6      evaluate population: $f(Popu) = FitApprox(Popu)$;
 7      %where FitApprox() is the Fitness Approximation
        Model
 8      foreach $s \in Popu$ do
 9          if $f(s) > f^*$ then
10              $f^* = f(s)$; $s^* = s$;
11      sort solutions in the descending order of fitness;
12      select top 10% solutions as $Elite$;
13      update $Dist$ using $Elite$;
14  final;
15  return best solution $s^*$, best fitness $f^*$;

3.5 Local Search

To further improve the quality of the solution obtained by the global search algorithm, a local search algorithm is used; options include Tabu Search [2], Guided Local Search, etc. Similar to the global search, the local search presented in Figure 2 is also a general algorithmic flow, each module of which is commonly used in popular local search algorithms. We select Tabu Search as the local search algorithm, whose pseudo code is given in Algorithm 2. Note that line 3 of Algorithm 2 generates a set of neighboring solutions for the current solution, using two methods that modify the current solution into a new one. The first method swaps the values of two entries in the encoded string. For example, given an integer string (1, 3, 2), if the first and third entries are selected to be swapped, the string becomes (2, 3, 1). The second method increases the value of one entry in the encoded string. Continuing the above example, if the first entry is selected to be increased, the string becomes (2, 3, 2). These methods may generate an infeasible solution. In that case, we make the infeasible solution feasible by a modulo operation, $v \bmod m$, where $v$ is the value of an infeasible entry and $m$ is the corresponding number of types of candidate capacity.

Algorithm 2: Tabu search used in the local search

Input: Best solution $s^*$ and its fitness $f^*$ found by global search, Providers’ candidate capacity $Cap$ and corresponding cost $Cos$, Customers’ demand $Dem$, Coordinate of providers and customers, Service quality threshold $D$
Output: Capacity configuration with minimized cost
 1  set current solution $s = s^*$, tabu list $T = []$;
 2  while not satisfy termination condition do
 3      generate neighbouring solution set $N(s)$ of $s$;
 4      foreach $s' \in N(s)$ do
 5          if $s' \notin T$ then
 6              $f(s') = SolExactEval(s')$;
 7              %where SolExactEval() is to directly calculates
 8              %fitness of given solution using function (8)
 9              if $f(s') > f^*$ then
10                  $f^* = f(s')$; $s^* = s'$;
11      update tabu list $T = T \cup \{s'\}$;
12      update current solution $s = s^*$
13  final;
14  return best solution $s^*$;

4 EVALUATION
4.1 Experiment Environment

We test our framework on both synthetic and real datasets in terms of accuracy and efficiency. Synthetic datasets are generated as follows. The coordinates of spatial objects follow the uniform distribution over the range [0, 100]. The demand of each customer is drawn randomly from [1, 10]. For each provider, the number of candidate capacities is drawn randomly from [10, 30]. Accordingly, each candidate capacity is drawn randomly from (0, 50] and the corresponding cost is drawn randomly from [100, 1000]. 66 synthetic instances are generated using this method, which can be clustered into four groups in terms of the number of customers, the number of providers and the number of candidate capacities. Groups 1, 2, 3 and 4 contain 6, 20, 20 and 20 instances, respectively. Statistics of the four groups of datasets are shown in Table 3. Note that all synthetic datasets are available online², where interested readers can find all details about them.

Four real datasets, namely real01, real02, real03 and real04, are also used in our experiments; they were obtained from one of the largest telecommunication operators in China. Each real dataset contains two sets of spatial objects: a set of populated areas (PA) and a set of cell towers (CT).

² https://github.com/xijunlee/SPC-POSM/tree/master/data

Industry and Case Study Paper

CIKM’18, October 22-26, 2018, Torino, Italy

Table 3: Synthetic datasets statistics Group

Ave. # capacity

Ave. # customer

Ave. # provider

1 2 3 4

9.50 12.10 33.10 24.40

16.00 80.40 227.25 344.35

7.17 53.45 124.60 247.40

In GA, a population of candidate solutions to an optimization problem is evolved toward better solution by performing selection, crossover and mutation operators. In our implementation of GA, we still adopt the encoding mechanism described in Section 3.2 to encode each candidate solution. Tournament selection [12], two-point crossover and random mutation are used in the implementation. All the compared algorithms are parallelized on a Spark cluster using the CentOS 7 operating system equipped with 150 cores, 512G memory, whose Python implementation code is available on the link3 . Note that each compared algorithm, PSO, GA, the proposed framework and its variant, is performed five times on each tested instance to limit the effect of randomness.

Table 4: Real datasets statistics Name

# capacity

# customer

# provider

real01 real02 real03 real04

5 7 12 12

9438 11892 9201 18651

3146 3964 3067 6217

region of the largest cities of China. Here, PA and CT refer to the customer and the provider defined in the SPC-POSM problem, respectively. The coordinates and demand of the PAs are recorded in these datasets, where the demand of a PA lies in [10, 100]. The coordinates and candidate capacities of each cell tower are also given in these datasets, where the number of candidate capacities lies in [5, 12]. All coordinates are normalized into the range [0, 100]. The real datasets are summarized in Table 4. They are not open to the public for commercial reasons.

We compare the proposed framework with a state-of-the-art mathematical solver, Gurobi [10], as well as our implementations of two state-of-the-art meta-heuristic algorithms designed for problems closely resembling SPC-POSM: Particle Swarm Optimization (PSO) and Genetic Algorithm (GA). A classical PSO algorithm maintains a swarm of candidate solutions that move around the search space according to a few simple formulae. The movement of each candidate solution is guided by its own best known position in the search space as well as by the entire swarm's best known position. The classical PSO algorithm is designed for problems defined over the real numbers. To adapt PSO to the SPC-POSM problem, which is defined in a discrete space, we redesign the solution update operator as follows:

$$v'[i] = W v[i] + c_1 r_1 (\hat{s}[i] - s[i]) + c_2 r_2 (s^*[i] - s[i]) \bmod m, \tag{13}$$

$$s'[i] = \lceil (s[i] + v[i]) \bmod m \rceil, \tag{14}$$

where $s$ is one solution of the swarm, i.e., a capacity configuration of SPC-POSM, encoded as a string of integers using the method described in Section 3.2; $s$ has a velocity $v$, which is a string of the same length as $s$; $s'$ and $v'$ are the updated solution and velocity, respectively; $s[i]$ represents the $i$th provider's capacity choice; $W$, $c_1$ and $c_2$ are three update coefficients; and $m$ is the maximum number of candidate capacity types for the $i$th provider. Each provider of solution $s$ updates its capacity choice and velocity according to formulae (13) and (14). In our implementation of PSO, $W$, $c_1$ and $c_2$ are set to 0.5, 0.1 and 0.2, respectively.

4.2 Results on Synthetic Datasets

In this subsection, we test the accuracy of the proposed framework by comparing the total cost of the capacity configuration obtained by the proposed framework with that obtained by Gurobi, PSO and GA, respectively.

4.2.1 Accuracy comparison with mathematical solver. Due to the high computation cost of the mathematical solver and the page limit, we only show the results on synthetic dataset group 1. Similar results can be observed on other small datasets. When the problem size increases, the running time of Gurobi increases exponentially. For example, for a problem with 100 providers and 200 customers, Gurobi had not finished its computation after 1.2 hours. The parameters of the algorithms are tuned to their best values. We recorded the result of the proposed framework with the following parameter setting: population size = 100, EDA max iteration = 100, tabu list length = 10, tabu max iteration = 100, tabu neighborhood size = 100, max iteration block = 3, $Q = 2$, and $\alpha = -10$. Note that the result obtained by the solver is optimal. This test therefore shows how close the proposed framework comes to the mathematical solver in terms of solution accuracy.
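For readers who want to reproduce the PSO baseline used in the comparisons that follow, the discrete update of formulae (13) and (14) can be sketched in Python as below. Several points are assumptions rather than statements from the paper: the modulo in (13) is taken to wrap the whole right-hand side, $r_1$ and $r_2$ are taken to be uniform random numbers in [0, 1) as in classical PSO, $\hat{s}$ and $s^*$ are read as the particle's own best and the swarm's best solution, and the function name `pso_update` is hypothetical. Formula (14) is followed literally, so it uses the old velocity $v[i]$.

```python
import math
import random


def pso_update(s, v, s_hat, s_star, m, W=0.5, c1=0.1, c2=0.2, rng=None):
    """One discrete PSO step for a single particle.

    s, v      : current solution (integers) and velocity
    s_hat     : assumed to be this particle's best known solution
    s_star    : assumed to be the swarm's best known solution
    m         : m[i] is the number of candidate capacities of provider i
    W, c1, c2 : update coefficients (defaults are the values given in the text)
    """
    rng = rng or random.Random()
    new_s, new_v = [], []
    for i in range(len(s)):
        # Assumption: r1 and r2 are uniform in [0, 1), as in classical PSO.
        r1, r2 = rng.random(), rng.random()
        # Formula (13); assumption: the modulo applies to the whole expression.
        vi = (W * v[i]
              + c1 * r1 * (s_hat[i] - s[i])
              + c2 * r2 * (s_star[i] - s[i])) % m[i]
        # Formula (14), taken literally with the old velocity v[i].
        si = math.ceil((s[i] + v[i]) % m[i])
        new_v.append(vi)
        new_s.append(si)
    return new_s, new_v


s, v = [1, 3, 2], [1.5, 2.0, 0.5]
new_s, new_v = pso_update(s, v, s_hat=[2, 3, 1], s_star=[4, 1, 2],
                          m=[4, 4, 4], rng=random.Random(0))
print(new_s)  # [3, 1, 3]
```

With this literal reading, the ceiling in (14) yields values in the range 0 to m[i]; how a result of 0 should be mapped back to a valid capacity choice is not specified in this section, so the sketch leaves it untouched.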

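Table 5: Accuracy comparison

| Instance | Gurobi cost | Proposed ave. cost | Proposed var. | Proposed w/o ls ave. cost | Proposed w/o ls var. |
|----------|-------------|--------------------|---------------|---------------------------|----------------------|
| 1        | 1000.00     | 1002.86            | 1.53          | 1007.86                   | 0.84                 |
| 2        | 222.00      | 222.00             | 0.00          | 225.50                    | 0.20                 |
| 3        | 0.00        | 4.47               | 0.91          | 7.85                      | 0.33                 |
| 4        | 100.00      | 100.00             | 0.00          | 100.0                     | 0.00                 |
| 5        | 0.00        | 0.33               | 0.00          | 2.55                      | 0.56                 |
| 6        | 3100.00     | 3107.65            | 5.5           | 3155.43                   | 15.47                |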

Footnote 3: https://github.com/xijunlee/SPC-POSM/tree/master/code

The results are presented in Table 5, where the second column shows Gurobi's result, the third and fourth columns give the results obtained by the proposed framework, and the last two columns show the results obtained by the proposed framework without the local search phase. From Table 5, we can see that 1) the results obtained by the proposed framework are extremely close to the optimum with great stability (low variance); 2) in terms of both accuracy and stability, the results obtained by the proposed framework without the local search phase are worse than those obtained by the integrated framework.
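To make the closeness to the optimum concrete, the gaps implied by Table 5 can be computed with a few lines of Python. The cost figures are copied from Table 5; the function name `optimality_gaps` and the choice to report the relative gap only when the optimal cost is positive (instances 3 and 5 have an optimum of 0) are assumptions of this sketch.

```python
# Rows of Table 5:
# (instance, Gurobi cost, Proposed ave. cost, Proposed w/o ls ave. cost)
TABLE5 = [
    (1, 1000.00, 1002.86, 1007.86),
    (2, 222.00, 222.00, 225.50),
    (3, 0.00, 4.47, 7.85),
    (4, 100.00, 100.00, 100.00),
    (5, 0.00, 0.33, 2.55),
    (6, 3100.00, 3107.65, 3155.43),
]


def optimality_gaps(rows):
    """Return (instance, abs gap, abs gap w/o ls, relative gap in %) per row.

    The relative gap of the full framework is None when the optimum is 0,
    because a percentage is undefined in that case.
    """
    result = []
    for inst, opt, full, no_ls in rows:
        rel = 100.0 * (full - opt) / opt if opt > 0 else None
        result.append((inst, round(full - opt, 2), round(no_ls - opt, 2), rel))
    return result


for row in optimality_gaps(TABLE5):
    print(row)
```

On every instance the absolute gap of the full framework is no larger than that of the variant without local search, which is the accuracy part of observation 2) above.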

4.2.2 Accuracy comparison with other meta-heuristic algorithms. Groups 2, 3 and 4 of the synthetic datasets, 60 instances in total, are used in this test. The parameter setting of the proposed framework is the same as that adopted in Section 4.2.1. To guarantee a fair comparison, the max iteration, max iteration block and population size of PSO and GA are the same as those of the proposed framework. The results are plotted in Figure 6, where the height of each box represents the average total cost of one group, i.e., the results obtained by the corresponding algorithm on all instances of the group are summed and then averaged. From Figure 6, it can be observed that 1) for each group of datasets, the proposed framework achieves the best optimization performance, 0.3%, 4.5% and 7.8% better than its variant (the proposed framework without the local search phase), PSO and GA, respectively; 2) the proposed framework has better optimization performance than PSO and GA even without the local search phase; 3) GA performs worst on each group of datasets among these algorithms and PSO is the second worst. Based on the above experimental results, it can be claimed that the proposed framework achieves good optimization performance on the SPC-POSM problem, extremely close to the optimum obtained by the mathematical solver.

4.2.3 Efficiency Evaluation. From the perspective of convergence and execution time, we evaluate the efficiency of the proposed framework by comparing it against PSO and GA. We again select groups 2, 3 and 4 of the datasets as test instances. Group 1 is excluded from this test because its scale is too small to differentiate the convergence and execution time of the compared algorithms. Note that in this test, the parameter settings of the compared algorithms are the same as those in Section 4.2.

Convergence. Within each group of datasets, each algorithm runs 100 iterations on every instance. The results are shown in Figure 5, where it can be observed that 1) the proposed framework and PSO converge to much better solutions than GA on every group of datasets, and GA converges too early; 2) the proposed framework outperforms PSO on all datasets except those of group 4.

Execution time. We then compare the execution time of the different algorithms. We calculate the average execution time on each group of datasets and report the results in Figure 7, where we can observe that 1) the average execution time of GA is the shortest among the three algorithms due to its premature convergence; 2) the average execution time of the proposed framework is much shorter than that of PSO except on group 2, which indicates that the proposed framework is more efficient on large-scale SPC-POSM problems. This is because the fitness approximation model designed in the proposed framework starts to greatly reduce execution time when the problem scale becomes large. For small-scale SPC-POSM problems, clustering the possible solutions and then evaluating them might cost more time than evaluating them directly. Thus, the proposed framework costs more time than PSO on the datasets of group 2.
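The trade-off noted for the fitness approximation model (clustering overhead versus saved evaluations) can be illustrated with a toy sketch. This is not the authors' model, which is described in an earlier section: the leader-style clustering, the Hamming distance, the `radius` parameter and the function names are all assumptions chosen only to show how evaluating one representative per cluster reduces the number of costly fitness calls.

```python
def hamming(a, b):
    """Number of positions at which two encoded solutions differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_fitness(solutions, evaluate, radius):
    """Evaluate one representative per cluster; members reuse its fitness.

    Assumption: a solution joins the first representative within `radius`
    (leader clustering); otherwise it becomes a new representative and is
    evaluated with the expensive function `evaluate`.
    Returns the (approximate) fitness list and the number of expensive calls.
    """
    representatives = []  # list of (solution, fitness)
    fitness, calls = [], 0
    for s in solutions:
        for rep, f in representatives:
            if hamming(s, rep) <= radius:
                fitness.append(f)  # cheap: reuse the representative's fitness
                break
        else:
            f = evaluate(s)        # expensive: e.g. solving a matching problem
            calls += 1
            representatives.append((s, f))
            fitness.append(f)
    return fitness, calls


solutions = [(1, 3, 2), (1, 3, 3), (4, 1, 1), (4, 1, 2)]
toy_cost = lambda s: 10 * s[0] + s[1]  # stand-in for the real evaluation
print(approximate_fitness(solutions, toy_cost, radius=1))
```

Each solution is compared against the existing representatives before any evaluation, so when only a few, cheap evaluations are needed this bookkeeping can outweigh the savings, which is consistent with the behavior reported for group 2.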

Figure 5: Convergence of different algorithms on the synthetic datasets. Panels: (a) Group 2, (b) Group 3, (c) Group 4; x-axis: iteration; y-axis: average total cost; curves: Proposed, PSO, GA.

Figure 6: Accuracy comparison among different algorithms on the synthetic datasets. Groups 2 to 4; y-axis: average total cost; bars: Proposed, Proposed w/o ls, PSO, GA.

Figure 7: Execution time comparison among different algorithms on the synthetic datasets. Groups 2 to 4; y-axis: average execution time (s); bars: Proposed, PSO, GA.

4.3 Results on Real Datasets

Furthermore, we compare the proposed framework with PSO and GA on the real datasets in terms of total cost and execution time. Note that, to protect commercial interests, the total cost results of the algorithms are normalized into the range [0, 1]. The experimental results are reported in Figure 8 and Figure 9, where it can be observed that 1) the proposed framework achieves the best optimization on each real dataset, 9.25% and 11.5% better than PSO and GA, respectively; 2) the execution time of the proposed framework is the shortest among the three algorithms on each real dataset, approximately half of PSO's and GA's. From the experimental results, we can conclude that the proposed framework significantly outperforms the other benchmark algorithms in terms of both effectiveness and efficiency, especially for large-scale SPC-POSM problems.

Figure 8: Accuracy comparison among different algorithms on the real datasets. Datasets real01 to real04; y-axis: average normalized cost; bars: Proposed, PSO, GA.

Figure 9: Execution time comparison among different algorithms on the real datasets. Datasets real01 to real04; y-axis: average execution time (h); bars: Proposed, PSO, GA.

5 CONCLUSION

In this paper, we proposed an incremental spatial matching problem, Service Provider Configuration and Planning with Optimal Spatial Matching (SPC-POSM), which originates from many real business scenarios. A practical two-layer meta-heuristic framework is designed and implemented for SPC-POSM. Extensive experimental results show that the proposed framework performs much better than the benchmark algorithms in terms of both accuracy and efficiency. In the future, we plan to refine the proposed framework by making use of supervised learning techniques to further boost the efficiency of solution evaluation.

ACKNOWLEDGMENTS

This work was supported in part by the National Key Research & Development Program of China (No. 2018YFB1003603), the Program for NSFC (No. 61772339), and the Shanghai Rising-Star Program (No. 16QA1402200). The corresponding authors are Dr. Mingxuan Yuan, Dr. Jianguo Yao and Dr. Jia Zeng. The authors would also like to thank the anonymous referees for their valuable comments and helpful suggestions.

REFERENCES

[1] Ke-Lin Du and MNS Swamy. 2016. Particle swarm optimization. In Search and Optimization by Metaheuristics. Springer, 153–173.
[2] Michel Gendreau and Jean-Yves Potvin. 2014. Tabu search. In Search methodologies. Springer, 243–263.
[3] Zhexue Huang. 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery 2, 3 (1998), 283–304.
[4] Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi, James Rowland, Alexander Varshavsky, and Walter Willinger. 2012. Human mobility modeling at metropolitan scales. In Proceedings of the 10th international conference on Mobile systems, applications, and services. ACM, 239–252.
[5] James M Kang, Mohamed F Mokbel, Shashi Shekhar, Tian Xia, and Donghui Zhang. 2007. Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on. IEEE, 806–815.
[6] Flip Korn and S Muthukrishnan. 2000. Influence sets based on reverse nearest neighbor queries. In ACM SIGMOD Record, Vol. 29. ACM, 201–212.
[7] Xijun Li, Mingxuan Yuan, Di Chen, Jianguo Yao, and Jia Zeng. 2018. A Data-Driven Three-Layer Algorithm for Split Delivery Vehicle Routing Problem with 3D Container Loading Constraint. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 528–536.
[8] Cheng Long, Raymond Chi-Wing Wong, Philip S Yu, and Minhao Jiang. 2013. On optimal worst-case matching. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 845–856.
[9] Kyriakos Mouratidis, Nikos Mamoulis, et al. 2010. Continuous spatial assignment of moving users. The VLDB Journal 19, 2 (2010), 141–160.
[10] Gurobi Optimization, Inc. 2014. Gurobi optimizer reference manual. (2014).
[11] Christos H Papadimitriou and Kenneth Steiglitz. 1998. Combinatorial optimization: algorithms and complexity.
[12] Kumara Sastry, David E Goldberg, and Graham Kendall. 2014. Genetic algorithms. In Search methodologies. Springer, 93–117.
[13] Raymond Chi-Wing Wong, M Tamer Özsu, Philip S Yu, Ada Wai-Chee Fu, and Lian Liu. 2009. Efficient method for maximizing bichromatic reverse nearest neighbor. Proceedings of the VLDB Endowment 2 (2009), 1126–1137.
[14] Raymond Chi-Wing Wong, Yufei Tao, Ada Wai-Chee Fu, and Xiaokui Xiao. 2007. On efficient spatial matching. In Proceedings of the 33rd international conference on Very large data bases. VLDB Endowment, 579–590.
[15] Man Lung Yiu, Kyriakos Mouratidis, Nikos Mamoulis, et al. 2008. Capacity constrained assignment in spatial databases. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 15–28.
[16] Aimin Zhou, Jianyong Sun, and Qingfu Zhang. 2015. An estimation of distribution algorithm with cheap and expensive local search methods. IEEE Transactions on Evolutionary Computation 19, 6 (2015), 807–822.