Maximum Inner-Product Search using Tree Data-structures

Parikshit Ram
Alexander G. Gray

arXiv:1202.6101v1 [cs.CG] 28 Feb 2012
February 29, 2012

Abstract

The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in the literature. However, a closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally, we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited for the purpose of best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique.
1 Introduction
In this paper, we consider the problem of efficiently finding the best match for a query from a given set of points with respect to the inner-product similarity. Formally, we consider the following problem:

Problem. For a given set of N points S ⊂ R^D and a query q ∈ R^D, efficiently find a point p ∈ S such that:

$$\langle q, p \rangle = \max_{r \in S} \langle q, r \rangle. \qquad (1)$$
We call this the problem of maximum inner-product search. The focus of this paper is to improve the efficiency of this search. An alternate formulation of the above problem, in terms of a vector-matrix multiplication, is the following:

Problem. For a given vector w ∈ R^D and a matrix M ∈ R^{D×N}, efficiently compute:

$$\left\| w^T M \right\|_\infty = \max\left( w^T M \right). \qquad (2)$$
This problem appears to be very similar to much existing work in the literature. Efficiently finding the best match with respect to the Euclidean (or, more generally, $L_p$) distance is the widely studied problem of fast nearest-neighbor search in metric spaces [9]. Efficient retrieval of the best match with respect to the cosine similarity has been extensively researched in the field of text mining and information retrieval [1]. But as we will explain in the next section, maximum inner-product search is not only different from these aforementioned tasks, but also arguably harder.
1.1 Applications
An obvious application of maximum inner-product search stems from the widely successful matrix-factorization framework in recommender-system challenges like the "Netflix prize" [22, 21, 2]. The matrix-factorization task obtains an accurate representation of the available data in terms of user vectors and item vectors (examples of items would be movies or music). In this setting, the preference of a user for an item is the inner-product between the corresponding user's vector and the item's vector. The efficient retrieval of recommendations for a user is equivalent to the problem in equation 1, with the user as the query and the items as the reference set (a sketch of this retrieval step appears at the end of this subsection). For the challenges, a linear scan over the items is usually employed to find the best recommendations. An efficient search algorithm would make the retrieval of recommendations in the matrix-factorization framework scalable to real-world systems.

The usual document-retrieval tasks use the cosine-similarity to match documents. However, in certain settings [11], the documents are represented as (not necessarily normalized) vectors, and the inner-product between these vectors represents the similarity between the documents. In this case, unless the vectors are normalized to have the same length, document matching using the cosine-similarity [1] might make the algorithm scalable at the cost of returning inaccurate solutions, since the inner-product is not the same as the cosine-similarity (we will explain this more elaborately in section 2).

There is a similar problem known as the max-kernel operation: for a given set of points S, a query q and a kernel function K(·, ·), the task is to find the point p ∈ S with the maximum value of K(q, p) over the set S. This problem is widely used in maximum-a-posteriori inference [20] in machine learning, and for the task of image matching [23] in computer vision. If the kernel function can be explicitly represented in the form of a function ϕ(·) such that K(q, p) = ⟨ϕ(q), ϕ(p)⟩, then this problem reduces to the problem in equation 1 after all the points in the set S and the query q are transformed into the ϕ-space.
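To make the retrieval step concrete, the following Python sketch (with hypothetical, randomly generated factor matrices) shows the naive linear-scan version of this operation, i.e., the baseline that the algorithms in this paper accelerate:

```python
import numpy as np

# Hypothetical factors from a matrix-factorization model:
# rows of `users` are user vectors, rows of `items` are item vectors.
rng = np.random.default_rng(0)
users = rng.standard_normal((1000, 50))   # |V| = 1000 queries, D = 50
items = rng.standard_normal((20000, 50))  # |S| = 20000 reference points

def recommend(user_vec, items):
    """Naive maximum inner-product search: the O(N) linear scan."""
    scores = items @ user_vec          # one inner-product per item
    return int(np.argmax(scores))      # index of the best-match item

best_item = recommend(users[0], items)
```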
1.2 This Paper
In this paper, we consider the general problem of efficient maximum inner-product search and propose two tree-based branch-and-bound algorithms, along with a new data structure, to solve this problem more efficiently than the naive linear scan. In the following section, we contrast this problem with the usual problems of nearest-neighbor search in metric spaces and best-matching with respect to the cosine-similarity. This presents the need for explicit attention to this problem (equation 1). However, we also motivate the use of existing tree data structures for solving this task efficiently. In section 3, we propose a simple branch-and-bound algorithm using the existing ball-tree data structure [28] and a novel bound. In the following section (section 4), we address the situation where there are multiple queries on the same set of points and propose a dual-tree branch-and-bound algorithm, along with a new tree data structure, cone trees, to index the queries. The proposed algorithms are evaluated for their efficiency over a variety of data sets in section 5. Section 6 demonstrates how the proposed algorithms can be applied to the max-kernel operation with a general kernel function, where it is not required to have an explicit representation of the points in the ϕ-space. In the final section (section 7), we provide our conclusions along with possible future directions for this work.
2 Maximum Inner-product Search
The inner-product between two vectors is very closely related to the Euclidean distance between the points represented by these vectors, as well as to the cosine-similarity between these two vectors. Numerous techniques exist for nearest-neighbor search in Euclidean metric spaces (see surveys like [9]). Large-scale best-matching algorithms have also been developed for the cosine-similarity measure [1], with a lot of focus on text data. The problem of nearest-neighbor search (in metric space) has also been solved approximately with the widely popular locality-sensitive hashing (LSH) method [14, 18]. The LSH technique has also been extended to other forms of similarity functions (as opposed to the distance as a dissimilarity function), like the cosine similarity [7]^1. The approximate max-kernel operation can also be solved efficiently with LSH under certain conditions on the kernel function. Some other techniques, like dimensionality reduction [30] and dual-tree algorithms [20], have also been used to solve the approximate max-kernel operation efficiently.
2.1 How is maximum inner-product search different from existing problems?
In what follows, we will try to show that the problem stated in equation 1 is different from these existing problems. Hence techniques applied to these problems (like LSH) cannot be directly applied to this problem.

^1 An important thing to note here is that the similarity function used in Charikar et al. [7] is not exactly the cosine-similarity. The distance between two points p and q was measured by θ/π, where θ is the angle made by the two points at the origin, making the similarity function 1 − θ/π. This similarity function has a direct correspondence to the cosine similarity.
Figure 1: Best matches: For a given query q, p_C, p_E and p_I denote the best match with respect to the Cosine-similarity, the Euclidean distance and the Inner-product respectively. It is apparent from this figure that, even on a plane, the best match with respect to these similarity functions can be very different.

Nearest-neighbor Search in Euclidean Space. The problem of finding the nearest-neighbor in this setting can be posed as finding a point p ∈ S for a query q such that:

$$p = \arg\min_{r \in S} \|q - r\|_2^2 = \arg\max_{r \in S} \left( \langle q, r \rangle - \frac{\|r\|_2^2}{2} \right) \neq \arg\max_{r \in S} \langle q, r \rangle \quad (\text{unless } \|r\|_2 = k \ \forall\, r \in S).$$
Hence, if the norms of all the points in S are normalized to have the same length, then the problem of finding the best match with respect to the inner-product is equivalent to the problem of finding the nearest-neighbor in Euclidean metric space. However, without this restriction, the two problems can have potentially very different answers (figure 1).

Best-matching with Cosine-similarity. The problem of finding the best match with respect to the cosine-similarity can be posed as finding a point p ∈ S for a query q such that:

$$p = \arg\max_{r \in S} \frac{\langle q, r \rangle}{\|q\| \|r\|} = \arg\max_{r \in S} \frac{\langle q, r \rangle}{\|r\|} \neq \arg\max_{r \in S} \langle q, r \rangle \quad (\text{unless } \|r\| = k \ \forall\, r \in S).$$

As in the previous case, the best match with cosine-similarity is the best match with inner-products only if all the points in the set S are normalized to have the same length. Under general conditions, the best matches with these two similarity functions can be very different (see figure 1).

Locality-sensitive Hashing. LSH has been applied to a wide variety of similarity functions. LSH involves constructing hashing functions such that each hash function h satisfies the following for any pair of points r, p ∈ S:

$$\Pr[h(r) = h(p)] = \mathrm{sim}(r, p), \qquad (3)$$
where sim(r, p) ∈ [0, 1] is the similarity function of interest. For our situation, we can normalize our data set such that ∀ r ∈ S, ‖r‖ ≤ 1^2, and assume that all the data lies in the first quadrant (so that none of the inner-products go below zero). In that case, sim(r, p) = ⟨r, p⟩ ∈ [0, 1] is a valid similarity function of interest. It is known that for any similarity function to admit a locality-sensitive hash function family (as defined in equation 3), the distance function d(r, p) = 1 − sim(r, p) must satisfy the triangle inequality (Lemma 1 in [7]). However, the distance function d(r, p) = 1 − ⟨r, p⟩ does not satisfy the triangle inequality, even when all the points are restricted to the first quadrant^3 (a numerical check of this counterexample appears at the end of this subsection). So LSH cannot be applied to the inner-product similarity function even when we assume that all the data lies in the first quadrant (which is quite a restrictive assumption).

Efficient Max-kernel Operation. There are different existing techniques for solving this problem efficiently. For kernel functions with very high (possibly infinite) dimensional explicit representations, Rahimi et al., 2007 [30], propose a technique to transform these high-dimensional representations into lower dimensions while still approximately preserving the inner-product, to improve scalability. However, the final search still involves a linear scan over the set of points for the maximum inner-product, or a fast nearest-neighbor search under the assumption that finding the nearest-neighbor is equivalent to maximizing the inner-product. For translation-invariant kernels^4, a tree-based recursive algorithm has been shown to scale to large sets [20]. However, it is not clear how this algorithm can be extended to the general class of kernels. LSH is widely used for image matching in computer vision [23], but only for kernel functions that admit a locality-sensitive hashing function [7]. Hence, none of the existing techniques can be directly applied to our problem (equation 1) without introducing inaccurate results or limiting assumptions.
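The counterexample from footnote 3 can be checked numerically. The sketch below (an illustration, not part of the original paper) constructs three unit vectors with the stated pairwise angles, which are realizable in three dimensions, and confirms that d(r, p) = 1 − ⟨r, p⟩ violates the triangle inequality:

```python
import numpy as np

# Pairwise angles from footnote 3 (realizable in 3 dimensions):
t_xy, t_yz, t_zx = np.pi/4 - 0.1, np.pi/4, np.pi/2 - 0.3

y = np.array([1.0, 0.0, 0.0])
x = np.array([np.cos(t_xy), np.sin(t_xy), 0.0])
a = (np.cos(t_zx) - np.cos(t_xy) * np.cos(t_yz)) / np.sin(t_xy)
z = np.array([np.cos(t_yz), a, np.sqrt(np.sin(t_yz)**2 - a**2)])

d = lambda u, v: 1.0 - np.dot(u, v)   # candidate "distance" 1 - <u, v>
# d(x, y) + d(y, z) ~ 0.226 + 0.293 = 0.519 < 0.704 ~ d(z, x):
assert d(x, y) + d(y, z) < d(z, x)
```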
2.2 Why is maximum inner-product search possibly harder?
Unlike the distance functions in metric spaces, inner products do not induce any form of triangle inequality (even under the assumptions mentioned in the previous section). Moreover, this lack of any induced triangle inequality causes the similarity function induced by the inner products to have no admissible family of locality-sensitive hashing functions. And any modification of the similarity function to conform to widely used similarity functions (like the Euclidean distance or the cosine-similarity) will produce inaccurate results.

Moreover, inner-products lack a very basic property of generally used similarity functions: the self-similarity is high (generally the highest). For example, the Euclidean distance of a point to itself is 0, and the cosine-similarity of a point to itself is 1. The inner-product of a point x with itself is ‖x‖², which may be high or low depending on the value of ‖x‖. Moreover, there can possibly be many other points y in the set such that ⟨y, x⟩ > ‖x‖² (for instance, in one dimension, x = 1 and y = 2 give ⟨y, x⟩ = 2 > ⟨x, x⟩ = 1). Hence, without any assumptions, this problem of obtaining the best match with respect to the maximum inner-product is inherently harder than the previously discussed similar problems. This is possibly the reason why there is no existing work for this problem without any restrictions on the domain (at least to the best of our knowledge).
2.3 Are trees the answer?
In this paper, we explore the tree data structure for indexing the points, along with a branch-and-bound algorithm designed specifically for the task of maximum inner-product search. Tree data structures have been widely used for the task of nearest-neighbor search [13, 4, 29]. And even though the task of nearest-neighbor search is slightly different from the task of maximum inner-product search, we believe that trees can still be useful for this task.

^2 This normalization is different from the normalization mentioned before, where all the points were normalized to have the same length. Here the lengths are normalized to be less than or equal to one, but not equal to each other.
^3 Consider the following counterexample: let x, y, z ∈ S be points such that ‖x‖ = ‖y‖ = ‖z‖ = 1, and the angles made between x & y, y & z and z & x at the origin are π/4 − 0.1, π/4 and π/2 − 0.3 respectively. In this case the inequality d(x, y) + d(y, z) ≥ d(z, x) does not hold for d(·, ·) = 1 − ⟨·, ·⟩: d(x, y) = 0.226, d(y, z) = 0.293 & d(z, x) = 0.704.
^4 Kernel functions K(p, q) which depend only on the (Euclidean) distance between the points p and q are considered translation-invariant kernels. The Gaussian RBF kernel is such a translation-invariant kernel function.
Trees are known to be good indexing schemes in low to medium dimensions, and some new tree data structures have been developed for data in high dimensions with some low-dimensional structure [10, 4]. A hierarchical representation of the data is useful. In this paper, we try to solve the problem of exact maximum inner-product search. The hierarchical tree data structure provides a very intuitive extension to solve the problem approximately to gain efficiency [8, 32]. Moreover, if the search is to be performed under strict constraints (error constraints or time constraints), the tree-based branch-and-bound algorithms can be easily adapted for that purpose, because these branch-and-bound algorithms are incremental algorithms. This is not possible with something like LSH: LSH provides theoretical error bounds, but there is no way of ensuring the error constraint during the search. Moreover, LSH is inherently not an incremental algorithm, and hence cannot be used in a limited-time setting. An important advantage of trees is that they require only a single construction; the branch-and-bound algorithm adapts to the different levels of approximation and/or time limitations. Hashing techniques require multiple hashes for different levels of approximation, and the usual norm is to pre-hash for multiple values of approximation. Trees can also be constructed by learning from the data using techniques from machine learning [6, 25] to provide better accuracy and efficiency. This is why we use trees to solve the problem. Trees might not be the best possible way to solve this problem, but they do bring a lot of advantages with them.
3 Tree-based Search
Ball trees [29, 28] are binary space-partitioning trees that have been widely used for the task of indexing data sets. Every node in the tree represents a set of points, and each node is indexed with a center and a ball enclosing all the points in the node. The set of points at a node is divided into two disjoint sets which form the child nodes. This partitions the space into (possibly overlapping) hyper-spheres. The tree is built hierarchically, and a node is declared a leaf node if it contains a set of points of size below a threshold value N0.
Figure 2: Ball-trees: All the points are contained within the root ball (the bold-face circle). However, the subsequent balls do not necessarily lie within the parent ball: the points still lie within the root ball, but the ball enclosing the points in a child node is not necessarily compact enough to lie within the parent ball. However, the child node would be confined within the parent node if we used hyper-rectangles instead of balls to index the data.
3.1 Tree Construction
We use a simple ball tree construction heuristic that approximately picks a pair of pivot points which are farthest apart from each other [28], and splits the data by assigning the points to their closest pivot. The intuition behind this heuristic is that these two points might lie along the principal direction. The splitting and the recursive tree construction algorithm are presented in Algorithms 1 & 2 for completeness. The tree is very space-efficient, since every node only stores the indices of the item vectors instead of the item vectors themselves; hence the matrix for the items is never duplicated. Another implementation optimization is that the vectors in the items' matrix are sorted in place (during the tree construction) such that all the items in the same leaf node are arranged serially in the matrix. This avoids random memory access when accessing items in the same leaf node.
Algorithm 1 MakeBallTreeSplit(Data S)
  Pick a random point x ∈ S
  A ← arg max_{x′ ∈ S} ‖x − x′‖²₂
  B ← arg max_{x′ ∈ S} ‖A − x′‖²₂
  return (A, B)
Algorithm 2 MakeBallTree(Set of items S)
  Input – Set S
  Output – Tree T
  T.S ← S
  T.µ ← mean(S)
  T.R ← max_{p ∈ S} ‖p − T.µ‖₂
  if |S| ≤ N0 then  // Leaf node
    return T
  else  // else split the set
    (A, B) ← MakeBallTreeSplit(S)
    Sl ← {p ∈ S : ‖p − A‖²₂ ≤ ‖p − B‖²₂}
    Sr ← S \ Sl
    T.lc ← MakeBallTree(Sl)
    T.rc ← MakeBallTree(Sr)
    return T
  end if
Figure 3: Ball-tree Construction: The object T.S denotes the set of points in the node T. T.µ denotes the Euclidean mean of the items in the node T, and T.R denotes the minimum radius of the ball centered around T.µ enclosing all the points in the node T. T.lc and T.rc denote the left and right child of the tree node T.
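For concreteness, a minimal Python rendering of Algorithms 1 & 2 could look as follows. This is a sketch with hypothetical names: unlike the implementation described above, it stores the points in each node directly rather than indices into the items' matrix, and it adds a guard for degenerate splits:

```python
import numpy as np

N0 = 20  # leaf size threshold

class BallNode:
    def __init__(self, points):
        self.points = points                        # T.S (stored directly here)
        self.mu = points.mean(axis=0)               # T.mu
        self.R = np.linalg.norm(points - self.mu, axis=1).max()  # T.R
        self.lc = self.rc = None

def ball_tree_split(S):
    """Algorithm 1: approximately the farthest pair of pivots."""
    x = S[np.random.randint(len(S))]
    A = S[np.argmax(((S - x) ** 2).sum(axis=1))]
    B = S[np.argmax(((S - A) ** 2).sum(axis=1))]
    return A, B

def make_ball_tree(S):
    """Algorithm 2: recursive ball-tree construction."""
    T = BallNode(S)
    if len(S) <= N0:
        return T                                    # leaf node
    A, B = ball_tree_split(S)
    left = ((S - A) ** 2).sum(axis=1) <= ((S - B) ** 2).sum(axis=1)
    if left.all() or (~left).all():                 # guard: degenerate split
        return T
    T.lc, T.rc = make_ball_tree(S[left]), make_ball_tree(S[~left])
    return T
```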
3.2 Branch-and-bound algorithm
Ball trees are widely used for the task of nearest-neighbor search and are known to be fairly scalable to moderately high dimensions [28, 26]. The search usually employs a depth-first branch-and-bound algorithm: a nearest-neighbor query is answered by traversing the tree in a depth-first manner, first going down the node closer to the query, and bounding the minimum possible distance to the other branch with the triangle inequality. If this bound is greater than the distance to the current neighbor candidate for the query, the branch is removed from computation.

An analogous greedy depth-first algorithm can be used for maximum inner-product search. But instead of traversing down the node closer to the query, the choice is made on the basis of the maximum possible inner-product between the query and any potential point from the node. The recursive depth-first branch-and-bound algorithm is presented in Algorithm 4. The search algorithm for a query q begins at the root of the tree (Alg. 5). At each step, the algorithm is at a tree node T. It checks if the maximum possible inner-product between the query and any point in the node, MIP(q, T), is any better than the current best-match for the query (q.bm). If the check fails, this branch of the tree is not explored any more. Otherwise, the algorithm recursively traverses the tree, exploring the branch with the better potential candidates in a depth-first manner. If the node is a leaf, the algorithm just finds the best-match within the leaf with a linear search (Alg. 3). This ensures that the exact solution (i.e., the best-match) is returned by the end of the algorithm.
Algorithm 3 LinearSearch(Query q, Reference Set S)
  for each p ∈ S do
    if ⟨q, p⟩ > q.λ then
      q.bm ← p
      q.λ ← ⟨q, p⟩
    end if
  end for
Algorithm 4 TreeSearch(Query q, Tree Node T)
  if q.λ < MIP(q, T) then  // This node has potential
    if isLeaf(T) then
      LinearSearch(q, T.S)
    else  // best depth-first traversal
      Il ← MIP(q, T.lc); Ir ← MIP(q, T.rc)
      if Il ≤ Ir then
        TreeSearch(q, T.rc); TreeSearch(q, T.lc)
      else
        TreeSearch(q, T.lc); TreeSearch(q, T.rc)
      end if
    end if
  end if  // Else the node is pruned from computation
  return
Algorithm 5 FindExactMaxIP(Query set V, Reference Set S)
  T ← MakeBallTree(S)
  for each q ∈ V do
    q.λ ← −∞; q.bm ← ∅
    TreeSearch(q, T)
    return q.bm
  end for
Figure 4: Single-tree Search: The object q.bm contains the current best-match candidate for the query, and q.λ denotes the inner-product between the query q and its current best-match q.bm. The function MIP(q, T) = ⟨q, T.µ⟩ + ‖q‖ T.R denotes the upper bound on the maximum possible inner-product between the query q and any point lying in the tree node T.
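Algorithms 3–5 translate almost line for line into Python. The sketch below assumes the BallNode/make_ball_tree sketch from section 3.1 and tracks the candidate as a mutable [λ, best-match] pair:

```python
import numpy as np

def mip_bound(q, T):
    """MIP(q, T) = <q, T.mu> + ||q|| * T.R (equation 4)."""
    return q @ T.mu + np.linalg.norm(q) * T.R

def tree_search(q, T, best):
    """Algorithm 4: best = [lambda, best_match], updated in place."""
    if best[0] >= mip_bound(q, T):
        return                                    # node pruned
    if T.lc is None:                              # leaf: Algorithm 3
        for p in T.points:
            ip = q @ p
            if ip > best[0]:
                best[0], best[1] = ip, p
        return
    # visit the child with the larger bound first (depth-first)
    for c in sorted((T.lc, T.rc), key=lambda c: mip_bound(q, c), reverse=True):
        tree_search(q, c, best)

def find_exact_max_ip(q, T):
    """Algorithm 5, for a single query."""
    best = [-np.inf, None]
    tree_search(q, T, best)
    return best[1]
```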
Figure 5: Bounding with a ball.

3.2.1 Bounding maximum inner-product with a ball
Since the triangle inequality does not hold for the inner-product, we present a novel analytical upper bound for the maximum possible inner-product of a given point (in this case, the query q) with points in a ball. It is important to note that the information about the ball is limited to its center and its radius. For the rest of this section, we use the notation ‖·‖ to denote ‖·‖₂.
Theorem 3.1. Given a ball $B_{p_0}^{R_p}$ of points centered at p_0 with radius R_p and a (query) point q, the maximum possible inner-product between the point q and any point in the ball $B_{p_0}^{R_p}$ is bounded from above by:

$$\max_{p \in B_{p_0}^{R_p}} \langle q, p \rangle \le \langle q, p_0 \rangle + R_p \|q\|. \qquad (4)$$
Proof. Suppose that p^* is the best possible match in the ball $B_{p_0}^{R_p}$ for the query q, and let r_p be the Euclidean distance between the ball center p_0 and p^* (by definition, r_p ≤ R_p). Let θ_p be the angle between the vector $\vec{p_0}$ and the vector $\vec{p_0 p^*}$, and let φ and ω_p be the angles made at the origin between the vector $\vec{p_0}$ and the vectors $\vec{q}$ and $\vec{p^*}$ respectively (see figure 5). The length of p^* in terms of p_0 and θ_p is:

$$\|p^*\| = \sqrt{(\|p_0\| + r_p \cos\theta_p)^2 + (r_p \sin\theta_p)^2}. \qquad (5)$$

The angle ω_p can be expressed in terms of p_0 and θ_p as:

$$\cos\omega_p = \frac{\|p_0\| + r_p \cos\theta_p}{\|p^*\|}, \qquad \sin\omega_p = \frac{r_p \sin\theta_p}{\|p^*\|}. \qquad (6)$$

Let θ_{q,p^*} be the angle between the vectors $\vec{q}$ and $\vec{p^*}$. With the triangle inequality of angles, we have |θ_{q,p^*}| ≥ |φ − ω_p|. Assuming that the angles lie in the range [−π, π] (instead of the usual [0, 2π]), and using the fact that cos(θ) = cos(−θ), we get:

$$\cos\theta_{q,p^*} \le \cos(\phi - \omega_p), \qquad (7)$$

since cos(·) is monotonically decreasing in the range [0, π]. Using this inequality we obtain the following bound for the highest possible inner-product between q and any point in the ball:

$$\max_{p \in B_{p_0}^{R_p}} \langle q, p \rangle = \langle q, p^* \rangle \quad \text{(by assumption)}$$
$$= \|q\| \|p^*\| \cos\theta_{q,p^*}$$
$$\le \|q\| \|p^*\| \cos(\phi - \omega_p),$$

where the last inequality follows from equation 7. Substituting equations 5 & 6 in the above inequality, we have:

$$\max_{p \in B_{p_0}^{R_p}} \langle q, p \rangle \le \|q\| \left( \cos\phi \,(\|p_0\| + r_p \cos\theta_p) + \sin\phi \,(r_p \sin\theta_p) \right)$$
$$\le \|q\| \max_{\theta_p} \left( \cos\phi \,(\|p_0\| + r_p \cos\theta_p) + \sin\phi \,(r_p \sin\theta_p) \right)$$
$$= \|q\| \left( \cos\phi \,(\|p_0\| + r_p \cos\phi) + \sin\phi \,(r_p \sin\phi) \right)$$
$$\le \|q\| \left( \cos\phi \,(\|p_0\| + R_p \cos\phi) + \sin\phi \,(R_p \sin\phi) \right) \quad (\text{since } r_p \le R_p).$$

The second inequality comes from the definition of the maximum, and the following equality comes from maximizing over θ_p, which gives the optimal value θ_p = φ. Simplifying the final inequality gives us equation 4.

For the tree-search algorithm (Alg. 4), we set the maximum possible inner-product between q and a tree node T as MIP(q, T) = ⟨q, T.µ⟩ + T.R ‖q‖. This upper bound can be computed in almost the same time required for a single inner-product (since the norms of the queries can be pre-computed before searching the tree). This algorithm is evaluated against the naive linear search algorithm in section 5.
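As a quick numerical sanity check of theorem 3.1 (an illustration, not part of the paper's evaluation), one can sample points inside a random ball and verify that no inner-product with the query exceeds the bound of equation 4:

```python
import numpy as np

rng = np.random.default_rng(1)
D, Rp = 10, 0.7
p0, q = rng.standard_normal(D), rng.standard_normal(D)

# Sample 10000 points inside the ball B(p0, Rp): random directions
# scaled by random radii no larger than Rp.
dirs = rng.standard_normal((10000, D))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
pts = p0 + dirs * (Rp * rng.random((10000, 1)))

bound = q @ p0 + Rp * np.linalg.norm(q)     # equation 4
assert (pts @ q).max() <= bound
```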
4 Dual-tree based Search
For a set of queries, the tree can be traversed separately for each query. However, if the set of queries is very large, a common technique to improve the efficiency of querying is to index the queries in the form of a tree as well. The search is then done by traversing both trees simultaneously using the dual-tree algorithm [15]. The basic idea is to amortize the cost of tree traversal over a set of queries which are very similar to each other and would follow (approximately) the same path down the tree. Dual-tree algorithms have been applied to different tree-based tasks like nearest-neighbor search [15] and kernel density estimation [16], with provable theoretical runtime bounds [31].
4.1 Dual-tree Branch-and-bound Algorithm
The generic dual-tree algorithm is presented in Algorithm 6. Similar to Algorithm 4, the algorithm traverses down the tree on the reference set S (referred to as the RTree). However, the algorithm also traverses down the tree on the set V of queries (the QTree), resulting in a four-way recursion. At each step, the algorithm is at a QTree node Q and an RTree node T. For every Q, the value Q.λ denotes the minimum inner-product between any query in Q and its current best-match candidate. If this value is greater than the maximum possible inner-product, MIP(Q, T), between any query in Q and any reference point in T, this part of the recursion is no longer explored. When the algorithm is at the leaf level of both trees, it obtains the best-matches for each query in the QTree leaf by doing a linear scan over the RTree leaf. In this section, we explore two ways of indexing the queries: (1) indexing the queries using a ball-tree, and (2) indexing the queries using a novel data structure, the cone-tree. In the following subsections, we derive expressions for MIP(Q, T) for each of these kinds of QTree.
Algorithm 6 DualSearch(QTree Node Q, RTree Node T)
  if Q.λ < MIP(Q, T) then  // This node has potential
    if isLeaf(T) & isLeaf(Q) then
      for each q ∈ Q.S do
        LinearSearch(q, T.S)
      end for
      Q.λ ← min_{q ∈ Q.S} q.λ
    else if isLeaf(T) then
      DualSearch(Q.lc, T); DualSearch(Q.rc, T)
      Q.λ ← min{Q.lc.λ, Q.rc.λ}
    else if isLeaf(Q) then
      Il ← MIP(Q, T.lc); Ir ← MIP(Q, T.rc)
      if Il ≤ Ir then
        DualSearch(Q, T.rc); DualSearch(Q, T.lc)
      else
        DualSearch(Q, T.lc); DualSearch(Q, T.rc)
      end if
    else  // best depth-first traversal
      Il ← MIP(Q.lc, T.lc); Ir ← MIP(Q.lc, T.rc)
      if Il ≤ Ir then
        DualSearch(Q.lc, T.rc); DualSearch(Q.lc, T.lc)
      else
        DualSearch(Q.lc, T.lc); DualSearch(Q.lc, T.rc)
      end if
      Il ← MIP(Q.rc, T.lc); Ir ← MIP(Q.rc, T.rc)
      if Il ≤ Ir then
        DualSearch(Q.rc, T.rc); DualSearch(Q.rc, T.lc)
      else
        DualSearch(Q.rc, T.lc); DualSearch(Q.rc, T.rc)
      end if
      Q.λ ← min{Q.lc.λ, Q.rc.λ}
    end if
  end if  // Else the node is pruned from computation
Algorithm 7 FindExactMaxIPDualTree(Query Set V, Reference Set S)
  T ← MakeBallTree(S)
  Q ← MakeQueryTree(V)
  ∀ tree nodes Q′ in the tree Q, Q′.λ ← −∞
  ∀ queries q ∈ V, q.bm ← ∅, q.λ ← −∞
  DualSearch(Q, T)
  ∀ queries q ∈ V, return q.bm
Figure 6: Dual-tree Search: The tree-building subroutine for the set of queries “MakeQueryTree” can be the “MakeBallTree” subroutine (Alg. 2) or the “MakeConeTree” subroutine (Alg. 9). The object q.bm contains the current best-match for the query q. Q.λ denotes the lowest affinity between any query in the node Q and its current best-match. The function MIP(Q, T ) denotes the upper bound on the maximum possible inner-product between any query in the node Q and any point in the node T .
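The four-way recursion of Algorithm 6 is mechanical once a node-to-node bound is available. The condensed Python sketch below uses hypothetical node fields (Q.lam for Q.λ; Q.queries holding query objects with .vec/.lam/.bm at the leaves; T.points as before) and takes the bound mip(Q, T) as a parameter, to be instantiated by the bounds derived next:

```python
def dual_search(Q, T, mip):
    """Algorithm 6: Q, T are tree nodes; mip(Q, T) bounds the best pairing."""
    if Q.lam >= mip(Q, T):
        return                                    # prune this (Q, T) pair
    if T.lc is None and Q.lc is None:             # both leaves: linear scan
        for q in Q.queries:
            for p in T.points:
                ip = q.vec @ p
                if ip > q.lam:
                    q.lam, q.bm = ip, p
        Q.lam = min(q.lam for q in Q.queries)
    elif T.lc is None:                            # descend the query tree
        dual_search(Q.lc, T, mip); dual_search(Q.rc, T, mip)
        Q.lam = min(Q.lc.lam, Q.rc.lam)
    elif Q.lc is None:                            # descend the reference tree
        for c in sorted((T.lc, T.rc), key=lambda c: mip(Q, c), reverse=True):
            dual_search(Q, c, mip)
    else:                                         # descend both trees
        for Qc in (Q.lc, Q.rc):
            for c in sorted((T.lc, T.rc), key=lambda c: mip(Qc, c),
                            reverse=True):
                dual_search(Qc, c, mip)
        Q.lam = min(Q.lc.lam, Q.rc.lam)
```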
Figure 7: Bounding between two balls.
4.2 Ball Tree for Queries
Theorem 4.1. Given two balls $B_{p_0}^{R_p}$ and $B_{q_0}^{R_q}$ centered at p_0 and q_0 with radii R_p and R_q respectively, the maximum possible inner-product between any pair of points p ∈ $B_{p_0}^{R_p}$ and q ∈ $B_{q_0}^{R_q}$ is bounded from above by:

$$\max_{p \in B_{p_0}^{R_p},\, q \in B_{q_0}^{R_q}} \langle q, p \rangle \le \langle q_0, p_0 \rangle + R_q R_p + \|q_0\| R_p + \|p_0\| R_q. \qquad (8)$$
Proof. Consider the pair of points (p^*, q^*), p^* ∈ $B_{p_0}^{R_p}$, q^* ∈ $B_{q_0}^{R_q}$, such that

$$\langle q^*, p^* \rangle = \max_{p \in B_{p_0}^{R_p},\, q \in B_{q_0}^{R_q}} \langle q, p \rangle. \qquad (9)$$

Let θ_p be the angle $\vec{p_0}$ makes with the vector $\vec{p_0 p^*}$, and θ_q be the corresponding angle in the query ball. Let ω_p be the angle between the vectors $\vec{p_0}$ and $\vec{p^*}$, and ω_q be the angle between the vectors $\vec{q_0}$ and $\vec{q^*}$. Let r_p be the distance between p_0 and p^*, and r_q the distance between q_0 and q^*. Finally, let φ be the angle made between p_0 and q_0 at the origin. Some facts for the ball $B_{p_0}^{R_p}$ (the facts are analogous for the ball $B_{q_0}^{R_q}$):

$$\|p^*\| = \sqrt{\|p_0\|^2 + r_p^2 + 2 \|p_0\| r_p \cos\theta_p}, \qquad \cos\omega_p = \frac{\|p_0\| + r_p \cos\theta_p}{\|p^*\|}, \qquad \sin\omega_p = \frac{r_p \sin\theta_p}{\|p^*\|}.$$

Using the triangle inequality of the angles, we know that |θ_{q^*,p^*}| ≥ |φ − (ω_p + ω_q)|, giving us the following:

$$\langle q^*, p^* \rangle \le \|p^*\| \|q^*\| \cos(\phi - (\omega_p + \omega_q)). \qquad (10)$$

Replacing ω_p and ω_q with θ_p and θ_q by using the aforementioned equalities (similar to the techniques in the proof of theorem 3.1), we have:

$$\langle q^*, p^* \rangle \le \langle q_0, p_0 \rangle + r_p r_q \cos(\phi - (\theta_p + \theta_q)) + r_p \|q_0\| \cos(\phi - \theta_p) + r_q \|p_0\| \cos(\phi - \theta_q)$$
$$\le \max_{r_p, r_q, \theta_p, \theta_q} \langle q_0, p_0 \rangle + r_p r_q \cos(\phi - (\theta_p + \theta_q)) + r_p \|q_0\| \cos(\phi - \theta_p) + r_q \|p_0\| \cos(\phi - \theta_q)$$
$$\le \max_{r_p, r_q} \langle q_0, p_0 \rangle + r_p r_q + r_q \|p_0\| + r_p \|q_0\| \quad (\text{since } \cos(\cdot) \le 1) \qquad (11)$$
$$\le \langle q_0, p_0 \rangle + R_p R_q + R_q \|p_0\| + R_p \|q_0\|, \qquad (12)$$

where the first inequality comes from the definition of max, and the final inequality comes from the fact that r_p ≤ R_p, r_q ≤ R_q.

For the dual-tree search algorithm (Alg. 6), the maximum possible inner-product between two tree nodes Q and T is set as MIP(Q, T) = ⟨q_0, p_0⟩ + R_p R_q + R_q ‖p_0‖ + R_p ‖q_0‖. It is interesting to note that this upper bound reduces to the bound in theorem 3.1 when the ball containing the queries is reduced to a single point, implying R_q = 0.
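In code, the bound of equation 8 is one line. A sketch assuming the query-tree nodes also carry a ball center Q.mu and radius Q.R (as in the earlier BallNode sketch); it can be passed as the mip argument of the dual-search sketch above:

```python
import numpy as np

def mip_ball_ball(Q, T):
    """MIP(Q, T) of equation 8: <q0, p0> + Rp*Rq + Rp*||q0|| + Rq*||p0||."""
    return (Q.mu @ T.mu + T.R * Q.R
            + T.R * np.linalg.norm(Q.mu)
            + Q.R * np.linalg.norm(T.mu))
```

With Q.R = 0, this degenerates to the single-tree bound of theorem 3.1, matching the remark above.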
Figure 8: Cone-tree: These cones are open cones, and only the angle made at the origin with the axis of the cone is bounded for every point in the cone. The norms of the queries are not bounded at all.
4.3 Cone-trees for Queries
An interesting fact is that in equation 1, the point p where the maximum is achieved is independent of the norm ‖q‖ of the query q. Let θ_{q,r} be the angle between q and r at the origin; then the task of searching for the maximum inner-product is equivalent to searching for a point p ∈ S such that:

$$p = \arg\max_{r \in S} \|r\| \cos\theta_{q,r}. \qquad (13)$$
This implies that we only care about the direction of the queries, irrespective of their norms. For this reason, we propose indexing the queries on the basis of their direction (from the origin) to form a cone-tree (figure 8). The queries are hierarchically indexed as (possibly overlapping) open cones. Each cone is represented by a vector, which corresponds to its axis, and an angle, which corresponds to the maximum angle made by any point within the cone with the axis at the origin.
Algorithm 8 MakeConeTreeSplit(Data Q)
  Pick a random point x ∈ Q
  A ← arg min_{x′ ∈ Q} cos θ_{x,x′}
  B ← arg min_{x′ ∈ Q} cos θ_{A,x′}
  return (A, B)
Algorithm 9 MakeConeTree(Set of items S)
  Input – Set S
  Output – Tree T
  T.S ← S
  T.µ ← mean(S)
  T.C ← min_{p ∈ S} cos θ_{T.µ,p}
  if |S| ≤ N0 then
    return T
  else
    (A, B) ← MakeConeTreeSplit(S)
    Sl ← {p ∈ S : cos θ_{A,p} > cos θ_{B,p}}
    Sr ← S \ Sl
    T.lc ← MakeConeTree(Sl)
    T.rc ← MakeConeTree(Sr)
    return T
  end if
Figure 9: Cone-tree Construction: The object T.S denotes the set of points in the node T, T.µ denotes the Euclidean mean of the items in the node T, and T.C denotes the cosine of the maximum angle made by any point in the node with T.µ at the origin. The angle made between any two points A and B at the origin is denoted by θ_{A,B}.

4.3.1 Cone-tree Construction

The cone-tree construction is very similar to the ball-tree construction. The only difference is the use of the cosine similarity instead of the Euclidean distance for the task of splitting. The cone-tree construction pseudo-code is presented in Algorithms 8 & 9 (Figure 9).
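Since only the query directions matter, a cone-tree sketch can normalize the queries once and reuse the ball-tree skeleton with a cosine-based split. The names below are hypothetical, mirroring Algorithms 8 & 9 (the axis is normalized for convenience; this does not change any of the cosines):

```python
import numpy as np

class ConeNode:
    def __init__(self, dirs):
        self.dirs = dirs                          # unit-norm query directions
        mu = dirs.mean(axis=0)
        self.mu = mu / np.linalg.norm(mu)         # axis of the cone (T.mu)
        self.C = (dirs @ self.mu).min()           # cos of half the aperture
        self.lc = self.rc = None

def make_cone_tree(dirs, N0=20):
    """Algorithms 8 & 9 with angles measured by cosines of unit vectors."""
    T = ConeNode(dirs)
    if len(dirs) <= N0:
        return T
    x = dirs[np.random.randint(len(dirs))]
    A = dirs[np.argmin(dirs @ x)]                 # farthest in angle from x
    B = dirs[np.argmin(dirs @ A)]                 # farthest in angle from A
    left = dirs @ A > dirs @ B
    if left.all() or (~left).all():               # guard: degenerate split
        return T
    T.lc = make_cone_tree(dirs[left], N0)
    T.rc = make_cone_tree(dirs[~left], N0)
    return T
```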
4.3.2 Cone-Ball Bound

Since the norms of the queries do not affect the solution in equation 13, we assume that the norms of the queries are all equal to 1 for convenience.

Theorem 4.2. Given a ball $B_{p_0}^{R_p}$ of points centered at p_0 with radius R_p, and a cone $C_{q_0}^{\omega_q}$ with the axis of the cone q_0 and aperture^5 2ω_q ≥ 0, the maximum possible inner-product between any pair of points p ∈ $B_{p_0}^{R_p}$, q ∈ $C_{q_0}^{\omega_q}$ is bounded from above by:

$$\max_{q \in C_{q_0}^{\omega_q},\, p \in B_{p_0}^{R_p}} \langle q, p \rangle = \max_{q \in C_{q_0}^{\omega_q},\, p \in B_{p_0}^{R_p}} \|p\| \cos\theta_{q,p} \le \|p_0\| \cos(\{|\phi| - \omega_q\}_+) + R_p, \qquad (14)$$

where φ is the angle made between p_0 and q_0 at the origin, and the function {x}_+ = max{x, 0}.

^5 The aperture of the cone is twice the angle made between the axis and the perimeter of the cone.

Proof. There are two cases to consider here: (i) |φ| < ω_q, and (ii) |φ| ≥ ω_q.
Figure 10: Bounding between a ball and a cone.
For case (i), the center p_0 of the ball $B_{p_0}^{R_p}$ lies within the cone $C_{q_0}^{\omega_q}$, implying that

$$\max_{q \in C_{q_0}^{\omega_q},\, p \in B_{p_0}^{R_p}} \|p\| \cos\theta_{q,p} \le \|p_0\| + R_p, \qquad (15)$$

since there could be some query q^* ∈ $C_{q_0}^{\omega_q}$ which is in the same direction as p_0, giving the maximum possible inner-product.

For case (ii), let us assume that φ ≥ 0 without loss of generality. Then φ ≥ ω_q. Continuing with notation similar to theorems 3.1 & 4.1 for the best pair of points (q^*, p^*), as well as the notation from figure 10, we can say that

$$|\theta_{p^*, q^*}| \ge |\phi - \omega_q - \omega_p|. \qquad (16)$$

Since ω_q is fixed, we can say that

$$\max_{q \in C_{q_0}^{\omega_q},\, p \in B_{p_0}^{R_p}} \|p\| \cos\theta_{q,p} \le \|p^*\| \cos\theta_{q^*, p^*} \quad \text{(by def.)}$$
$$\le \|p^*\| \cos(\phi - \omega_q - \omega_p). \qquad (17)$$

Expressing ‖p^*‖ and ω_p in terms of ‖p_0‖, r_p and θ_p, and then subsequently maximizing over θ_p and using the fact that r_p ≤ R_p, we get that

$$\max_{q \in C_{q_0}^{\omega_q},\, p \in B_{p_0}^{R_p}} \|p\| \cos\theta_{q,p} \le \|p_0\| \cos(\phi - \omega_q) + R_p. \qquad (18)$$

Combining cases (i) and (ii), we obtain equation 14.
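In code, the bound of equation 14 needs only the angle φ between the cone axis and the ball center. A sketch consistent with the ConeNode fields above, where ω_q = arccos(Q.C) and Q.mu has unit norm:

```python
import numpy as np

def mip_cone_ball(Q, T):
    """MIP(Q, T) of equation 14: ||p0|| cos({|phi| - w_q}_+) + Rp."""
    p0_norm = np.linalg.norm(T.mu)
    cos_phi = (Q.mu @ T.mu) / p0_norm             # Q.mu is unit norm
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))  # phi in [0, pi], so |phi|=phi
    omega_q = np.arccos(np.clip(Q.C, -1.0, 1.0))
    return p0_norm * np.cos(max(phi - omega_q, 0.0)) + T.R
```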
5 Experiments and Results
In this section, we evaluate the efficiency of the two proposed algorithms, Alg. 5 & 7. For the dual-tree algorithm, we use two variations: (i) the set of queries indexed as a ball-tree (referred to as Alg. 7(B)), and (ii) the set of queries indexed as a cone-tree (referred to as Alg. 7(C)). Since we are not aware of any efficient exact method for maximum inner-product search, we compare our proposed algorithms to the linear search algorithm (Alg. 3). We report the speedup of the proposed algorithms over linear search, where speedup is defined as the ratio of the time taken by the linear search to the time taken by the evaluated algorithm. For the trees, the leaf size N0 can be selected by cross-validation (choosing the leaf size giving the highest speedup). However, for our experiments, we choose an ad hoc value of N0 = 20 for all datasets to demonstrate the gain in efficiency without any expensive cross-validation.

  Dataset       D        |S|          |V|
  Bio           74       210,409      75,000
  Corel         32       27,749       10,000
  Covertype     55       431,012      150,000
  LCDM          3        10,777,216   6,000,000
  LiveJournal   25,327   121,625      100,000
  MNIST         786      60,000       10,000
  MovieLens     51       3,706        6,040
  Netflix       51       17,770       480,189
  OptDigits     64       1,347        450
  Pall7         7        100,841      100,841
  Physics       78       112,500      37,500
  PSF           2        3,056,092    3,056,092
  SJ2           2        50,000       50,000
  U-Random      20       700,000      300,000
  Y!-Music      51       624,961      1,000,990

Table 1: Datasets used for evaluation: the dimensionality D and the number of points in the reference set S and in the set of queries V.

Datasets. We use a variety of datasets from different fields of data mining. We use the following collaborative-filtering datasets: MovieLens [17], Netflix [3] and the Yahoo! Music [12] datasets. After the matrix-factorization stage, the matrix of item vectors is used as the reference set and the matrix of user vectors is used as the set of queries. For text data, we use the LiveJournal blog moods dataset [19]. We also use the MNIST digits dataset [24] for evaluation, as well as three astronomy datasets: LCDM [27], PSF and SJ2. A synthetic dataset (U-Rand) of uniformly random points in 20 dimensions is used to evaluate the performance of the tree-based algorithms on data without any underlying structure. The rest of the datasets are widely used machine learning datasets from the UCI machine learning repository [5]. The details of the datasets are presented in Table 1, and the sizes of the datasets (in bytes) are presented in figure 11. For the collaborative-filtering datasets, there is a clear definition of the reference set (the items) and the set of queries (the users). For the rest of the datasets, we randomly split the datasets into query and reference sets.

Tree Construction Times. The tree-building procedure is extremely efficient. We present the tree construction times in Table 2 and contrast them with the runtime of the linear search algorithm (Alg. 3). For some of the larger datasets, the extrapolated runtime of Alg. 3 is reported. In the last column, we present the ratio of the tree construction times to the runtimes of Alg. 3. For algorithms 5 & 7(B), the tree construction involves building one and two ball-trees respectively. For algorithm 7(C), the queries are normalized to have unit length for convenience, since the norms of the queries do not affect the answers (equation 13). Following the query normalization, two trees are built. We include the query normalization in the tree construction time for completeness; this is the reason for the significant difference between the construction times for algorithms 7(B) and 7(C).

The numbers in the last column of Table 2 (R) show how small the construction times are with respect to the actual linear search. The highest ratio is 0.15, for the OptDigits dataset. This implies that any speedup over 1.18 at search time is enough to compensate for the tree construction time. For most of the datasets, this ratio is much lower. Moreover, this tree-building cost is a one-time cost: once the tree is built, it can be used for searching the dataset multiple times.
Figure 11: Total dataset sizes (in bytes): the combined sizes of the reference set and the query set for each dataset.

Search Efficiency. The speedups over linear search are presented in Table 3; we report every dataset we evaluated our algorithms on. Overall, the speedup numbers vary from as low as 1.13 for the OptDigits dataset to over 10^5 (five orders of magnitude) for the LCDM and PSF datasets. An important thing to note here is that for datasets with low speedup (below an order of magnitude) for Alg. 5, the speedup numbers were fairly low and comparable for all three algorithms. However, even a speedup of 2 is significant in terms of absolute times. For example, for the Yahoo! Music dataset, a search speedup of a mere 2 with a tree construction time of 120 seconds gives a saving of 19 hours of computation time. For most datasets with a high speedup for Alg. 5, the speedups for the dual-tree algorithms are also very high.

There are three important things to note here. Firstly, the dual-tree algorithms (Alg. 7) do not perform very well if the single-tree algorithm (Alg. 5) does not have a high speedup. This is mostly because the tree is unable to find tight bounds and hence has to traverse every branch. The dual-tree scheme loosens the bound to amortize the traversal cost over multiple queries, but if the bounds are bad for algorithm 5, the bounds for the dual-tree are much worse; hence the dual-tree algorithm does not show any significant speedup. Secondly, the dual-tree algorithm (especially Alg. 7(C)) starts outperforming the single-tree algorithm significantly when the set of queries is really large. This is the usual behavior for dual-tree algorithms: the query set has to be large enough for the gains from amortizing the query traversal of the reference tree (RTree) to outweigh the computational cost of traversing the query-tree (QTree) itself. Finally, the dual-tree algorithm with ball-trees for the query set is generally significantly slower than the dual-tree with a cone-tree for the queries. There are two possible reasons for this: (i) the cones provide a tighter indexing of the queries than balls; a single cone can index points in multiple balls which lie in the same direction but have varying norms. (ii) The upper bound for MIP(Q, T) in equation 11 is fairly loose. We do provide two ways of obtaining tighter bounds in the Appendix, but we have not yet evaluated the algorithm with the new bounding techniques.

We also consider the general problem of obtaining the points in the set S with the k highest inner-products with the query q. This is analogous to the k-nearest-neighbor search problem. We present the speedups of our algorithms over linear search for k = 1, 2, 5 & 10 in figure 12.
  Dataset       Alg.5    Alg.7(B)   Alg.7(C)   Alg.3      R(%)
  Bio           4.3      5.7        10.6       4,028      0.25
  Corel         0.2      0.27       0.66       43         1.5
  Covertype     5.5      7.2        14.8       14,885     0.1
  LCDM          36.7     56.46      99.3       1,984,200  0.005
  LiveJournal   2223     4073       4745       517,194    0.92
  MNIST         8.06     9.1        11.38      817        1.5
  MovieLens     0.03     0.08       0.27       4.62       6
  Netflix       0.2      8.27       33.5       1,878      1.7
  OptDigits     0.01     0.012      0.022      0.135      15
  Pall7         0.26     0.52       1.4        364        0.4
  Physics       2.33     3.0        5.8        1,114      0.5
  PSF           9.06     18.1       34.95      282,514    0.01
  SJ2           0.1      0.2        0.46       75         0.6
  U-Rand        4.94     6.9        15.64      26,586     0.6
  Y!-Music      9.72     28.85      112.5      137,306    0.08

Table 2: Tree construction time (in seconds) contrasted with the linear search time (in seconds).

  Dataset       Alg.5      Alg.7(B)   Alg.7(C)
  Bio           7,059.62   6.55       273.52
  Corel         14.27      17.38      7.68
  Covertype     927.51     10.05      773.34
  LCDM          29,526     1,327      101,950
  LiveJournal   28.04      10.42      15.45
  MNIST         2.61       2.22       2.5
  MovieLens     2.23       1.36       1.67
  Netflix       1.98       1.92       1.84
  OptDigits     1.13       1.10       1.10
  Pall7         1,020      23.14      2,285
  Physics       4.93       4.0        4.08
  PSF           61,502     96,570     125,800
  SJ2           544        190        767
  U-Rand        3.76       3.18       3.28
  Y!-Music      2.11       2.09       2.16

Table 3: Speedups over linear search for k = 1.
6 Max-kernel Operation with General Kernel Functions
In this section, we show a method to apply the proposed algorithms in an inner-product space where the inner-products are defined by a kernel function, but it is not possible to explicitly represent the points in the ϕ-space. Without an explicit representation, the tree construction has to be modified, since there would be no explicit representation of the mean of a set. For a tree node T with the set of points T.S, the mean in ϕ-space is defined as

$$\mu = \frac{1}{|T.S|} \sum_{p \in T.S} \varphi(p).$$

µ might not have an explicit representation, but it is possible to compute inner-products with µ as follows:

$$\langle \mu, \varphi(q) \rangle = \frac{1}{|T.S|} \sum_{p \in T.S} K(q, p).$$
However, this computation is possibly very expensive (as opposed to the operation in equation 4, which is equivalent to a single inner-product). Instead of picking the mean of the set in the ϕ-space as the center of the ball, we propose picking the point in the ϕ-space which is closest to the mean µ as the new center.
Figure 12: Speedups over linear search for k = 1, 2, 5 & 10 (panels: single ball-tree, dual ball-ball, dual ball-cone; y-axis: speedup over linear search).

So the new center p_c is given by:

$$p_c = \arg\min_{r \in T.S} \|\varphi(r) - \mu\|^2 = \arg\min_{r \in T.S} \left( K(r, r) - \frac{2}{|T.S|} \sum_{r' \in T.S} K(r', r) \right). \qquad (19)$$
This operation is quadratic in computation time, but it is done in the preprocessing phase to provide efficiency during the search phase. Given this new center p_c, we can compute the radius R_p of the ball enclosing the set T.S as follows:

$$R_p^2 = \max_{r \in T.S} \|\varphi(r) - \varphi(p_c)\|^2 = \max_{r \in T.S} \left( K(p_c, p_c) + K(r, r) - 2 K(r, p_c) \right). \qquad (20)$$
Now, given this method of choosing the center and evaluating the radius, a ball-tree can be built in the ϕ-space using Algorithm 2 without ever requiring the explicit representation of the points. Given this ball-tree, equation 4 in theorem 3.1 can be modified to this situation as follows:

$$\mathrm{MIP}(q, T) = K(q, p_c) + R_p \sqrt{K(q, q)}, \qquad (21)$$
where p_c is defined in equation 19 and R_p is defined in equation 20. Computing this upper bound is equivalent to a single kernel function evaluation (K(q, q) can be pre-computed before searching the tree). Using this upper bound, the tree-search algorithm (Alg. 5) can be performed in ϕ-space without any explicit representation of the points. We will present the evaluation of this method in the longer version of the paper.

Using the same principles, the dual-tree algorithm (Alg. 7) can also be applied in the ϕ-space without any explicit representation of the points. For the dual-tree with a ball-tree for the queries, the upper bound on the maximum inner-product between queries in node Q and points in node T in theorem 4.1 becomes:

$$\mathrm{MIP}(Q, T) = K(q_c, p_c) + R_p R_q + R_p \sqrt{K(q_c, q_c)} + R_q \sqrt{K(p_c, p_c)}, \qquad (22)$$
where p_c and q_c are the chosen ball centers in the ϕ-space, with radii R_p and R_q respectively. For queries indexed in a cone-tree, the central axis of the cone can be the point in the ϕ-space making the smallest angle with the mean of the set in the ϕ-space. Since the queries are supposed to be normalized in the ϕ-space, for a query tree node Q, the mean of the set Q.S is taken to be:

$$\mu = \frac{1}{|Q.S|} \sum_{q \in Q.S} \frac{\varphi(q)}{\|\varphi(q)\|}.$$
So the new central axis q_c of the cone is given by:

$$q_c = \arg\max_{q \in Q.S} \frac{\langle \mu, \varphi(q) \rangle}{\|\mu\| \|\varphi(q)\|} = \arg\max_{q \in Q.S} \frac{1}{\sqrt{K(q, q)}} \sum_{q' \in Q.S} \frac{K(q', q)}{\sqrt{K(q', q')}}. \qquad (23)$$
Again, this computation is quadratic in the size of the dataset, but it provides efficiency during search time. The cosine of half the aperture of the cone is now given by:

$$\cos\omega_q = \min_{q \in Q.S} \frac{\langle \varphi(q_c), \varphi(q) \rangle}{\|\varphi(q_c)\| \|\varphi(q)\|} = \min_{q \in Q.S} \frac{K(q_c, q)}{\sqrt{K(q_c, q_c) K(q, q)}}. \qquad (24)$$
Given q_c and ω_q, the upper bound in theorem 4.2 for a cone-tree node Q of queries and a ball-tree node T of reference points is given by:

$$\mathrm{MIP}(Q, T) = \sqrt{K(p_c, p_c)} \cos(\{|\phi| - \omega_q\}_+) + R_p, \qquad (25)$$

where φ is defined by:

$$\cos\phi = \frac{K(p_c, q_c)}{\sqrt{K(q_c, q_c) K(p_c, p_c)}}.$$

This bound is very efficient to compute, as it only requires a single kernel function evaluation (the terms K(p_c, p_c) and K(q_c, q_c) can be pre-computed and stored in the trees).
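All of these kernelized quantities can be computed from kernel evaluations alone. The sketch below (with a hypothetical RBF kernel standing in for a general K; not the paper's evaluated implementation) computes the center of equation 19, the radius of equation 20 and the bound of equation 21:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Example kernel; any positive-definite K(., .) works here."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_node_stats(S, K):
    """Center p_c (eq. 19) and radius R_p (eq. 20) in phi-space.

    Quadratic in |S|, as noted above; done once at preprocessing time.
    """
    G = np.array([[K(r, rp) for rp in S] for r in S])    # Gram matrix
    # argmin_r K(r, r) - (2/|S|) * sum_{r'} K(r', r):
    c = int(np.argmin(np.diag(G) - 2.0 * G.mean(axis=0)))
    Rp = np.sqrt(np.max(G[c, c] + np.diag(G) - 2.0 * G[c]))
    return c, Rp

def kernel_mip_bound(q, S, c, Rp, K):
    """MIP(q, T) = K(q, p_c) + R_p * sqrt(K(q, q)) (eq. 21)."""
    return K(q, S[c]) + Rp * np.sqrt(K(q, q))
```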
7 Conclusion
We consider the general problem of maximum inner-product search and present three novel methods to solve this problem efficiently. We use the tree data structure and present a branch-and-bound algorithm for maximum inner-product search, and we further extend it to the case where the set of queries is very large. We evaluate the proposed algorithms on a variety of datasets and exhibit their computational efficiency. A theoretical analysis of these proposed algorithms would give us a better understanding of their computational efficiency; we do not yet have rigorous runtime bounds for our algorithms, and this is part of our future work.
References

[1] R. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In Proceedings of the 16th International Conference on World Wide Web. ACM, 2007.
[2] R. M. Bell and Y. Koren. Lessons from the Netflix prize challenge. SIGKDD Explor. Newsl., 2007.
[3] J. Bennett and S. Lanning. The Netflix prize. In Proc. KDD Cup and Workshop, 2007.
[4] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[5] C. L. Blake and C. J. Merz. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/, 1998.
[6] L. Cayton and S. Dasgupta. A learning framework for nearest neighbor search. Advances in Neural Information Processing Systems, 20, 2007.
[7] M. S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing. ACM, 2002.
[8] P. Ciaccia and M. Patella. PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In Proceedings of the 16th International Conference on Data Engineering, 2000.
[9] K. Clarkson. Nearest-neighbor searching and metric space dimensions. In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2006.
[10] S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing. ACM, 2008.
[11] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
[12] G. Dror, N. Koenigstein, Y. Koren, and M. Weimer. The Yahoo! Music dataset and KDD-Cup'11. Journal of Machine Learning Research, 2011.
[13] J. H. Freidman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw., 1977.
[14] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999.
[15] A. G. Gray and A. W. Moore. 'N-Body' problems in statistical learning. In NIPS, 2000.
[16] A. G. Gray and A. W. Moore. Nonparametric density estimation: Toward computational tractability. In SIAM Data Mining, 2003.
[17] GroupLens. MovieLens dataset.
[18] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC, 1998.
[19] S. Kim, F. Li, G. Lebanon, and I. Essa. Beyond sentiment: The manifold of human emotions. arXiv preprint arXiv:1202.1568, 2011.
[20] M. Klaas, D. Lang, and N. de Freitas. Fast maximum a posteriori inference in Monte Carlo state spaces. In Artificial Intelligence and Statistics, 2005.
[21] Y. Koren. The BellKor solution to the Netflix Grand Prize. 2009.
[22] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 2009.
[23] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
[24] Y. LeCun. MNIST dataset, 2000. http://yann.lecun.com/exdb/mnist/.
[25] Z. Li, H. Ning, L. Cao, T. Zhang, Y. Gong, and T. S. Huang. Learning to search efficiently in high dimensions. In Advances in Neural Information Processing Systems 24, 2011.
[26] T. Liu, A. W. Moore, A. G. Gray, and K. Yang. An investigation of practical approximate nearest neighbor algorithms. In Advances in Neural Information Processing Systems 17, 2005.
[27] R. Lupton, J. Gunn, Z. Ivezic, G. Knapp, S. Kent, and N. Yasuda. The SDSS imaging pipelines. arXiv preprint astro-ph/0101420, 2001.
[28] S. M. Omohundro. Five balltree construction algorithms. Technical Report TR-89-063, International Computer Science Institute, December 1989.
[29] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer, 1985.
[30] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, 2007.
[31] P. Ram, D. Lee, W. March, and A. Gray. Linear-time algorithms for pairwise statistical problems. In Advances in NIPS, 2009.
[32] P. Ram, D. Lee, H. Ouyang, and A. G. Gray. Rank-approximate nearest neighbor search: Retaining meaning and speed in high dimensions. In Advances in Neural Information Processing Systems 22, 2009.
A Tighter Bounds with Optimization
In this section, we present two ways to get a tighter bound on equation 11 with respect to θ_p and θ_q. The maximum inner-product bound MIP(Q, T) between two balls is given as:

$$\langle q^*, p^* \rangle \le \max_{\theta_p, \theta_q, r_p, r_q} \langle q_0, p_0 \rangle + r_p r_q \cos(\phi - (\theta_p + \theta_q)) + r_p \|q_0\| \cos(\phi - \theta_p) + r_q \|p_0\| \cos(\phi - \theta_q). \qquad (26)$$

A.1 Two-variable Optimization
Assuming that

$$|\phi - (\theta_p + \theta_q)| \le \frac{\pi}{2}, \qquad |\phi - \theta_p| \le \frac{\pi}{2}, \qquad |\phi - \theta_q| \le \frac{\pi}{2},$$

we can say that:

$$\langle q^*, p^* \rangle \le \max_{\theta_p, \theta_q} \langle q_0, p_0 \rangle + R_p R_q \cos(\phi - (\theta_p + \theta_q)) + R_p \|q_0\| \cos(\phi - \theta_p) + R_q \|p_0\| \cos(\phi - \theta_q) \qquad (27)$$
$$= f(\theta_p, \theta_q). \qquad (28)$$

Now $\partial f(\theta_p, \theta_q)/\partial \theta_p = 0$ and $\partial f(\theta_p, \theta_q)/\partial \theta_q = 0$ give us the following optimality conditions:

$$\frac{\sin(\phi - (\theta_p + \theta_q))}{\sin(\phi - \theta_p)} = -\frac{\|q_0\|}{R_q}, \qquad (29)$$

$$\frac{\sin(\phi - (\theta_p + \theta_q))}{\sin(\phi - \theta_q)} = -\frac{\|p_0\|}{R_p}. \qquad (30)$$
The second-order conditions are the following:

$$\frac{\partial^2 f(\theta_p, \theta_q)}{\partial \theta_p^2} = -R_p R_q \cos(\phi - (\theta_p + \theta_q)) - R_p \|q_0\| \cos(\phi - \theta_p), \qquad (31)$$

$$\frac{\partial^2 f(\theta_p, \theta_q)}{\partial \theta_q^2} = -R_p R_q \cos(\phi - (\theta_p + \theta_q)) - R_q \|p_0\| \cos(\phi - \theta_q), \qquad (32)$$

$$\frac{\partial^2 f(\theta_p, \theta_q)}{\partial \theta_p \partial \theta_q} = -R_p R_q \cos(\phi - (\theta_p + \theta_q)), \qquad (33)$$

which are all < 0 for the stated range of φ, θ_p and θ_q. So the optimal values obtained from the optimality conditions (equations 29 & 30) correspond to the maximum. However, the optimality conditions do not have an analytic solution for θ_p and θ_q. Hence, any efficient optimization algorithm can be used to solve max_{θ_p, θ_q} f(θ_p, θ_q) in the specified range.
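Any off-the-shelf constrained optimizer suffices here. The sketch below (an illustration, not the authors' procedure) maximizes f(θ_p, θ_q) with scipy under the stated range constraints by minimizing −f:

```python
import numpy as np
from scipy.optimize import minimize

def tight_mip_ball_ball(p0, q0, Rp, Rq):
    """Numerically maximize f(theta_p, theta_q) of equations 27-28."""
    nq, npn = np.linalg.norm(q0), np.linalg.norm(p0)
    phi = np.arccos(np.clip(p0 @ q0 / (nq * npn), -1.0, 1.0))
    f = lambda t: (p0 @ q0
                   + Rp * Rq * np.cos(phi - (t[0] + t[1]))
                   + Rp * nq * np.cos(phi - t[0])
                   + Rq * npn * np.cos(phi - t[1]))
    # |phi - theta| <= pi/2 for each angle, via box bounds:
    bounds = [(phi - np.pi / 2, phi + np.pi / 2)] * 2
    # |phi - (theta_p + theta_q)| <= pi/2, via two smooth inequalities:
    cons = [{"type": "ineq", "fun": lambda t: np.pi/2 - (phi - t[0] - t[1])},
            {"type": "ineq", "fun": lambda t: np.pi/2 + (phi - t[0] - t[1])}]
    res = minimize(lambda t: -f(t), x0=[phi / 2, phi / 2],
                   bounds=bounds, constraints=cons, method="SLSQP")
    return -res.fun
```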
A.2 One-variable Optimization
Another approach is the following:

$$\max_{\theta_p, \theta_q, r_p, r_q} \langle p_0, q_0 \rangle + r_p r_q \cos(\phi - (\theta_p + \theta_q)) + r_p \|q_0\| \cos(\phi - \theta_p) + r_q \|p_0\| \cos(\phi - \theta_q)$$
$$\le \max_{\theta_q, r_p, r_q} \max_{\theta_p} \langle p_0, q_0 \rangle + r_p r_q \cos(\phi - (\theta_p + \theta_q)) + r_p \|q_0\| \cos(\phi - \theta_p) + r_q \|p_0\| \cos(\phi - \theta_q). \qquad (34)$$

For a fixed θ_q, ω_q is fixed; and for a fixed ω_q, using the single-tree bounding, θ_p = (φ − ω_q). Making this substitution in equation 34, we get the following optimization task:

$$\langle p^*, q^* \rangle \le \max_{\theta_q, r_p, r_q} \langle p_0, q_0 \rangle + r_p \|q^*\| + r_q \|p_0\| \cos(\phi - \theta_q) \qquad (35)$$
$$\le \max_{\theta_q} \langle p_0, q_0 \rangle + R_q \|p_0\| \cos(\phi - \theta_q) + R_p \sqrt{\|q_0\|^2 + R_q^2 + 2 R_q \|q_0\| \cos\theta_q} \qquad (36)$$
$$= f(\theta_q),$$

where the second inequality comes from the assumption that

$$|\theta_q| \le \frac{\pi}{2}, \qquad |\phi - \theta_q| \le \frac{\pi}{2},$$

and the fact that r_p ≤ R_p, r_q ≤ R_q. The first-order optimality condition gives us the following:

$$\frac{R_p R_q \|q_0\| \sin\theta_q}{\sqrt{\|q_0\|^2 + R_q^2 + 2 R_q \|q_0\| \cos\theta_q}} = R_q \|p_0\| \sin(\phi - \theta_q),$$

while the second-order derivative is given by:

$$-R_q R_p \|q_0\| \frac{\cos\theta_q \left( \|q_0\|^2 + R_q^2 + R_q \|q_0\| \cos\theta_q \right) + R_q \|q_0\|}{\left( \|q_0\|^2 + R_q^2 + 2 R_q \|q_0\| \cos\theta_q \right)^{3/2}} - R_q \|p_0\| \cos(\phi - \theta_q),$$

which is always < 0, implying that the optimum obtained from the first-order condition is the maximum, even though the condition does not give an analytic solution for θ_q. An efficient optimization algorithm can be used to solve this one-dimensional optimization problem max_{θ_q} f(θ_q) to obtain tight bounds for MIP(Q, T).
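The one-dimensional problem is even simpler. A sketch with scipy's bounded scalar minimizer (for simplicity it only enforces |θ_q| ≤ π/2; a fuller treatment would also intersect the interval with |φ − θ_q| ≤ π/2):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def tight_mip_one_var(p0, q0, Rp, Rq):
    """Numerically maximize f(theta_q) of equation 36 over |theta_q| <= pi/2."""
    nq, npn = np.linalg.norm(q0), np.linalg.norm(p0)
    phi = np.arccos(np.clip(p0 @ q0 / (nq * npn), -1.0, 1.0))
    f = lambda t: (p0 @ q0 + Rq * npn * np.cos(phi - t)
                   + Rp * np.sqrt(nq**2 + Rq**2 + 2 * Rq * nq * np.cos(t)))
    res = minimize_scalar(lambda t: -f(t), bounds=(-np.pi / 2, np.pi / 2),
                          method="bounded")
    return -res.fun
```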