Approximation of Optimization Problems and Learnability

Bruno Apolloni, Claudio Ferretti, Giancarlo Mauri
Dipartimento di Scienze dell'Informazione, Università di Milano

Abstract

We study optimization problems in a context derived from studies of computational learnability. While an optimization problem can be hard to solve, it may be easy to approximate. When the degree of approximation is checked by another approximating algorithm, we can set up an interaction protocol in which the task is to find the power of the checker, a task similar to that of learning a secret concept. The results concern relations between approximability and this new kind of learnability.

1 Introduction

In this paper we deal with a special kind of learning protocol whose task is to become at least as skilled as our teacher, who in turn is far from perfect. Namely, facing an optimization problem, possibly NP-hard, we have to approximate it better than our supervisor does. The instances of the problem are supplied randomly with distribution law P; our solution is shortly censored by the teacher with a "yes" if it is no worse than the solution he is able to find by himself, and with a "no" otherwise. Availing ourselves of a limited number of these answers, we are able to beat the teacher for any problem in MAX NP.

The paper is organized as follows: Section 2 formally defines the above as a GE-learning problem and recalls some notions of approximability and approximability-preserving reductions. Section 3 claims that: i) GE-learning is not trivially reducible to PAC learning; ii) any optimization problem allowing for a PAS scheme is GE-learnable; iii) in particular, 0/1 Knapsack, as well as any other problem in MAX NP, is GE-learnable. Some final remarks conclude the paper.


Figure 1: A GE-learning system

2 Definitions

2.1 GE-Learning

Definition 1 Given an optimization problem PR associated with a real-valued evaluation function m(pr, D), taking as input an instance pr and a feasible solution D, a GE-learning system for PR is composed of three machines, N, V and U, each of which knows PR and the function m. They interact through some tapes as described in Figure 1, where R denotes a read-only head, W a write-only head, and R/W a read-write head.

We consider the following internal operations and interactions between these machines. The system keeps repeating the following sequence of operations:

1. N is active: it generates an instance pr of the problem PR, according to a probability distribution P, and writes it on the tape;

2. U is active: it reads pr from the tape, then computes an approximate solution D through an algorithm ProtU, which takes as input an instance of the problem and a natural number determining the level of approximation, and writes it on the tape to V: D = ProtU(pr, t);

3. V is active: it reads pr from the tape, then computes the function

   bft0 = 1 if m(pr, D) ≥ m(pr, ProtV(pr, t0)), and 0 otherwise,

   where ProtV is an approximating algorithm for PR, which takes as input an instance of the problem and a natural number determining the level of approximation; the result bft0 is then written on the tape to U. Note that t0 must be selected once, before the beginning of the interactions, and is unknown to U;

4. U is active: it reads the resulting bft0 and computes a natural-valued function UpdU(bft0, t), assigning the result to t. This function updates the approximation level U will use the next time it is required to beat V. A bft0 equal to 0 witnesses the approximation of U being beaten by the approximation of V.

The aim of machine U is to find a parameter t such that it is able to beat V on almost every instance pr. ProtV determines a set of possible degrees of approximation of our problem, and we have to find which degree of approximation ProtU must reach to be sure of having bft0 equal to 1 almost always. Clearly, machine U could compute the true optimum of pr to beat its adversary, but this could be computationally too expensive. So we look for a t minimal with respect to some cost function C(t). We will characterize the following property of the approximation algorithms ProtU and ProtV:
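The four steps above can be sketched as one interaction round of the protocol. Everything below is illustrative scaffolding of ours, including the toy problem and all function names; only the shape of the round comes from Definition 1:

```python
import random

# A minimal sketch of one round of the GE-learning protocol of Definition 1.
# gen_instance, prot_u, prot_v, m and upd_u are placeholder names, not the
# paper's machinery.

def ge_learning_round(gen_instance, prot_u, prot_v, m, upd_u, t, t0):
    """One iteration of the N / U / V interaction."""
    pr = gen_instance()              # 1. N draws an instance pr ~ P
    d = prot_u(pr, t)                # 2. U computes an approximate solution
    # 3. V censors U's solution against its own approximation at level t0
    b = 1 if m(pr, d) >= m(pr, prot_v(pr, t0)) else 0
    t = upd_u(b, t)                  # 4. U updates its approximation level
    return b, t

# Toy maximization problem: a solution's value is its closeness to pr,
# and a higher level brings the approximation closer to the optimum.
def m(pr, d):
    return -abs(pr - d)

def prot(pr, level):
    return pr - max(0, 5 - level)

def upd_u(b, t):                     # increment on every defeat
    return t if b == 1 else t + 1

random.seed(0)
t, t0 = 0, 3
for _ in range(20):
    b, t = ge_learning_round(lambda: random.randint(0, 100),
                             prot, prot, m, upd_u, t, t0)
print(t)   # 3: U's level has caught up with the hidden t0
```

In this toy run U and V share the same scheme, so the round illustrates the self-GE-prediction setting discussed below.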

Definition 2 An approximation scheme ProtV is GE-predictable by ProtU iff there exists a function UpdU such that, for any input parameter ε, for any distribution P on instances, and for any sample sequence of instances, after a number of instances polynomial in 1/ε machine U has probability at most ε of receiving a bft0 equal to 0 on a new instance.

If ProtU ≡ ProtV we say that ProtV is self-GE-predictable. We can see that this protocol is similar to those of computational learning ([Val84]), with the main difference that an 'error' in the learning process is not a ≠ between V's and U's output, but a <.

2.2 Approximability-Preserving Reductions

Definition 3 A maximization problem P L-reduces to a maximization problem R ([PY88]) if there exist a polynomial-time computable function f mapping instances of P to instances of R and constants o, q > 0, and for each instance p of P:

- the optima of p and f(p), optP(p) and optR(f(p)) respectively, satisfy optR(f(p)) ≤ o · optP(p);
- for any feasible solution d of f(p), we can find in polynomial time a feasible solution c of p satisfying optP(p) − m(p, c) ≤ q · [optR(f(p)) − m(f(p), d)].

3 Results

3.1 GE-Learning

Let us try to link our GE-learning context to the usual context of computational learning. In the first case, the output of the learner is checked for a ≥ relation against the target output, while in the second the check requires that the two values be equal. A simple reduction could map the real-valued labels of the first context to boolean values of the second context. But we show that this approach is not feasible, neither from a boolean context to the GE-learning context nor on the reverse path, even if we allow the use of two different functions to map target labels and learned labels:

Theorem 1 i) There is no pair of functions f, g : {0, 1} → R satisfying x ≠ y ⇔ f(x) < g(y). ii) There is no pair of functions i, j : R → {0, 1} satisfying n < m ⇔ i(n) ≠ j(m).

Proof. i) Suppose such f and g exist. From the hypothesis we have f(0) ≥ g(0) and f(1) ≥ g(1). Moreover f(0) < g(1) and f(1) < g(0). Then f(0) < g(1) ≤ f(1) and f(1) < g(0) ≤ f(0), an absurdity.

ii) Suppose such i and j exist. First assume i(n) = 0 for some n; then j(n+1) = j(n+2) = 1. Since n+1 < n+1 is false, i(n+1) = j(n+1) = 1; but n+1 < n+2 forces i(n+1) ≠ j(n+2) = 1, i.e. i(n+1) = 0, an absurdity. If instead i(n) = 1 for every n, then for any m, choosing some n < m gives j(m) ≠ i(n) = 1, so j(m) = 0; but since m < m is false, i(m) = j(m), i.e. 1 = 0, again an absurdity. □
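Part ii) can also be checked mechanically when the domain is restricted to a finite set of integers. The following brute-force sketch (names and setup are ours) confirms that no 0/1-valued labelings of {0, ..., 4} satisfy the equivalence:

```python
from itertools import product

# Exhaustive check of Theorem 1 ii) restricted to the finite domain
# {0, ..., N}: no pair of 0/1-valued functions i, j satisfies
#   n < m  <=>  i(n) != j(m).
N = 4
domain = range(N + 1)
satisfiable = any(
    all((n < m) == (i[n] != j[m]) for n in domain for m in domain)
    for i in product([0, 1], repeat=N + 1)
    for j in product([0, 1], repeat=N + 1)
)
print(satisfiable)   # False: even on a finite domain no such pair exists
```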

A more complex task would be to translate a whole GE-learning problem into a boolean learning problem, mapping not only the labels. This would require extending the techniques proposed in [PW90], but we will not address it further.

3.2 Approximation Schemes

The monotonicity property of a PAS algorithm is the key to seeing that it will be GE-learnable, when we take as cost the number of instances to be seen before the parameter t of machine U can be considered a good guess of the hidden t0.

Theorem 2 Given a maximization problem PR, if we have a size measure r over the set of instances that will be generated, such that

∀pr : ∀t ≥ r, m(pr, ProtV(pr, r)) = m(pr, ProtV(pr, t)),

and PR has a monotone PAS ProtV, then ProtV is self-GE-predictable after L(ε⁻¹, r) ≥ 2ε⁻¹(r + ln ε⁻¹) instances.

Proof. We define L(ε⁻¹, r) as in [Val84], and consider as a success an interaction giving bft0 equal to 0. On such occasions it holds that m(pr, ProtV(pr, t)) < m(pr, ProtV(pr, t0)). ProtV being monotone, it follows that t < t0. Let us define a function UpdU this way:

UpdU(bft0, t) = t if bft0 = 1, and t + 1 if bft0 = 0.

Starting with t = 0 and having t0 successes, we reach the exact t0. If at the end of the L(ε⁻¹, r) instances we have t < t0 < r, we have necessarily had fewer than t0 successes, and hence also fewer than r. But in this situation the definition of L() allows us to state that the probability of having bft0 = 0 in the following interactions is less than ε, or that it is higher but this event has probability smaller than ε, for any distribution P used in the generation of instances. So our system has confidence and accuracy parameters equal to ε, and this allows us to state that the same algorithm may be seen as a prediction algorithm with probability of error smaller than ε, as shown in [HKLW88]. □

As an example of a PAS algorithm we can consider k-APPROX ([Sah75]). Note that we are bounding the number of instances to be seen in the learning process, while the computational cost is in this case exponential in t0 ([Sah75]). When we suppose to know the maximum number of objects the instances will have, we can state the following:

Corollary 1 The k-APPROX approximation scheme for the 0/1-Knapsack problem is self-GE-predictable after 2ε⁻¹(r + ln ε⁻¹) instances, r being the maximum number of objects in the instances that will be generated.

Proof. This follows from the previous theorem and Lemma 1, knowing that this

algorithm, with input k = r greater than or equal to the number of objects of the instance, would find the optimum. □

We now study the relation between the existence of a PAS for a problem PR and the monotonicity property.
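When the bound of Theorem 2 and Corollary 1 is taken with equality, the number of instances can be computed directly; the helper name below is ours, not the paper's:

```python
import math

# Hypothetical helper evaluating the sample bound 2(1/eps)(r + ln(1/eps))
# from Theorem 2 / Corollary 1.
def sample_bound(eps, r):
    """Instances to observe so that ProtV is self-GE-predicted with
    confidence and accuracy eps, given size measure r."""
    inv = 1.0 / eps
    return math.ceil(2 * inv * (r + math.log(inv)))

# e.g. knapsack instances with at most r = 20 objects, eps = 0.1:
print(sample_bound(0.1, 20))   # 447
```

The bound is polynomial in 1/ε and linear in r, in line with Definition 2's polynomial sample requirement.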

Theorem 3 There exists a PAS that is not monotone.

Proof. We can modify the k-APPROX algorithm in this way: it keeps the second-best solution it finds, as well as the best, and associates with each one the level at which it was found, where the level is the size of the combination from which it was built. Let the input approximation parameter be k, and the input instance pr. At the end, the algorithm checks whether the best solution was built from a combination of fewer than k objects. This would mean that the same approximation could be found at a lower approximation level, and so it tries to output the second-best solution instead. It rejects the best solution max when the second best sec satisfies the condition 1 − m(pr, sec)/m(pr, max) ≤ 1/(k + 1). We can easily show that in this case sec satisfies the approximation condition, since max satisfies it. □

As an example run of the non-monotone algorithm described in the preceding theorem, consider the following instance: capacity 10, and objects with weight equal to profit, equal to {7, 6.9, 11, 12, 3.1}. The algorithm, with k in input equal to 0, 1, 2, 3, 4, 5, will output solutions with evaluation equal to 7, 10, 7, 10, 10, 10, respectively. We see that it is not monotone: with k = 2 it rejects the best solution, the same one found with k = 1, while it may not do the same with k = 3, since the second-best solution no longer satisfies the approximation condition. On the other hand, we can extend the property that makes k-APPROX monotone to any PAS algorithm:
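The modified scheme of Theorem 3 can be sketched concretely. The sketch below, in our own naming, builds on a Sahni-style scheme for 0/1-Knapsack (try every base set of at most k objects, complete each greedily by profit density, keep the best completion) and applies the rejection rule from the proof; it reproduces the example run above:

```python
from itertools import combinations

# Sketch of the non-monotone scheme from Theorem 3; all names are ours.
def knapsack_nonmonotone(weights, profits, capacity, k):
    n = len(weights)
    found = []                            # (value, level) per candidate
    for size in range(min(k, n) + 1):
        for base in combinations(range(n), size):
            w = sum(weights[i] for i in base)
            if w > capacity:
                continue
            chosen = set(base)
            # greedy completion by profit density (input order on ties)
            for i in sorted(range(n), key=lambda i: -profits[i] / weights[i]):
                if i not in chosen and w + weights[i] <= capacity:
                    chosen.add(i)
                    w += weights[i]
            found.append((sum(profits[i] for i in chosen), size))
    best = max(v for v, _ in found)
    best_level = min(lvl for v, lvl in found if v == best)
    lower = [v for v, _ in found if v < best]
    second = max(lower) if lower else best
    # reject `best` when it was already reachable below level k and the
    # second best still meets the 1/(k+1) approximation condition
    if best_level < k and 1 - second / best <= 1 / (k + 1):
        return second
    return best

# the instance from the example run: capacity 10, weight = profit
weights = profits = [7, 6.9, 11, 12, 3.1]
outputs = [knapsack_nonmonotone(weights, profits, 10, k) for k in range(6)]
print(outputs)   # evaluations 7, 10, 7, 10, 10, 10, as in the paper
```

At k = 2 the value 10 is already reachable at level 1 and 1 − 7/10 ≤ 1/3, so the sketch outputs 7; at k = 3 the condition 1 − 7/10 ≤ 1/4 fails and the best solution is kept.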

Theorem 4 Any PAS can be made monotone.

Proof. We can modify any PAS algorithm to make it call itself recursively at approximation levels lower than the current k, keeping the best solution it finds. The running time may only grow linearly with k. □

One can derive GE-learnability results about some problems from L-reducibility to problems that we know to be GE-learnable.
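This construction can be sketched as a generic wrapper; `pas` and `value` are placeholder callables of ours, standing for any PAS and its evaluation function m:

```python
# Sketch of Theorem 4's construction: wrap a PAS so that it also runs at
# every level below k and returns the best solution seen, which makes the
# resulting scheme monotone.  The extra cost is k additional calls, i.e.
# linear growth in k, as claimed in the proof.
def make_monotone(pas, value):
    def monotone_pas(instance, k):
        return max((pas(instance, j) for j in range(k + 1)),
                   key=lambda sol: value(instance, sol))
    return monotone_pas

# toy non-monotone scheme whose solution quality dips at level 2
toy_pas = lambda inst, j: [7, 10, 7, 10][j]
toy_value = lambda inst, sol: sol
mono = make_monotone(toy_pas, toy_value)
print([mono(None, k) for k in range(4)])   # [7, 10, 10, 10]
```

The wrapped scheme's output value is non-decreasing in k by construction, since each level's candidate set contains all lower levels' candidates.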

Corollary 2 Approximability-preserving reductions preserve GE-predictability.

Proof. This follows from the result in [PY88] stating that when A has an approximability-preserving reduction to B, then if B has a PAS so does A, and by applying the previous theorems about monotonicity and GE-predictability: monotonicity is guaranteed, and so is the existence of the measure r for the instances that will be generated. □

This result allows us to state that every maximization problem in MAX NP, which is essentially defined in [PY88] as the class of maximization problems having PAS algorithms, is GE-predictable.

4 Conclusions

Learning is often a heavy job. In order to make it easier, different routes have been followed in the literature, such as specializing the distribution law of the examples ([BI88]) or abandoning the search for a symbolic expression of the concepts ([Lit87,War89]). Here we propose relaxing the learning task in an ideal framework where the quality of our hypotheses is checked by a peer with the same polynomial resources as us.

Further steps in this direction may concern the settlement of wider classes of polynomially GE-learnable approximation schemes, possibly through learner protocols which differ from that of the supervisor. Relations between the classes of these two protocols might then constitute a source of meaningful insights into the related optimization problems.

References

[BI88] G.M. Benedek, A. Itai. Learnability for fixed distributions. In Proc. 1988 Workshop on Computational Learning Theory, 80–, 1988.

[HKLW88] D. Haussler, M. Kearns, N. Littlestone, M.K. Warmuth. Equivalence of models for polynomial learnability. In Proc. 1988 Workshop on Computational Learning Theory, 42–55, 1988.

[Joh74] D. Johnson. Approximation algorithms for combinatorial problems. J. Computer and System Sci., 9:256–278, 1974.

[Lit87] N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1987.

[PW90] L. Pitt, M.K. Warmuth. Prediction-preserving reducibility. J. Computer and System Sci., 41(3):430–467, 1990.

[PY88] C. Papadimitriou, M. Yannakakis. Optimization, approximation, and complexity classes. In Proc. 20th ACM Symp. Theory of Computing, 229–234, 1988.

[Sah75] S. Sahni. Approximate algorithms for the 0/1-Knapsack problem. J. Assoc. Comput. Mach., 22:115–124, 1975.

[Val84] L.G. Valiant. A theory of the learnable. Comm. Assoc. Comp. Mach., 27(11):1134–1142, 1984.

[War89] M.K. Warmuth. Towards representation independence in PAC learning. In Proc. 1989 Workshop on Analogical and Inductive Inference, 78–103, 1989.