PhD. Defense
SDP Approaches for Distance Geometry Problems
Semidefinite Programming Approaches for Distance Geometry Problems
Pratik Biswas
Electrical Engineering, Stanford University
The Distance Geometry Problem
• Input: A subset of all pairwise Euclidean distances between a set of points (possibly corrupted by noise), and the positions of some points (anchors).
• Output: Positions of all points.
• Synonymous with the Graph Realization problem.
Outline
• Motivation
• Distance geometry background
• Semidefinite Programming relaxation
• Sensor Network Localization
• Molecule Conformation
• Conclusion
Motivation
Distance geometry and related problems arise in a wide variety of real-world applications.
• Estimating positions of nodes in a wireless network.
• Predicting structures of large molecules.
• Learning low dimensional manifolds in high dimensional spaces.
• Internet tomography, data visualization.
Wireless Sensor Networks
Source: MICA Project, Berkeley
• Large number of sensor 'motes' densely deployed over a geographical area.
• Sensors collect local environmental information (temperature, humidity) and communicate with each other to relay the gathered data to a decision making authority.
• Applications in habitat monitoring (James Reserve Project 2002), emergency medical care (CodeBlue project 2004), industrial control in manufacturing plants (Intel Mote Project 2005), structural health monitoring (NETSHM 2006), military and civilian surveillance (He et al. 2004), asset location (Fontana et al. 2003), etc.
• It is important to know the positions of sensors to better interpret the data and for efficient routing.
[Figure: (a) Triangulation. (b) Cooperative Localization.]
• Typical triangulation schemes like GPS can be expensive and infeasible.
• Wireless nodes within 'hearing' distance of each other can estimate mutual distances (Received Signal Strength, Time of Arrival).
• Can we estimate positions of the sensors using this mutual distance information?
Molecule Conformation
[Figure: (a) Nucleus-Nucleus interaction. (b) Cross Peaks.]
• 3-D structures of large molecules (e.g. those in the Protein Data Bank) are useful for studying their function and evolution.
• By exploiting nucleus-nucleus interactions, Nuclear Magnetic Resonance measurements provide ranges on interatomic distances (Wuthrich 1985).
• Given sparse, noisy bounds on interatomic distances, can we find (relative) coordinates of atoms?
Problem Definition (Exact Distances)
• $n$ unknown points $x_i \in \mathbb{R}^d$, $i = 1, \dots, n$.
• $m$ anchors $a_k \in \mathbb{R}^d$, $k = n+1, \dots, n+m$. Positions already known.
• For a pair of unknown points $(i, j) \in \mathcal{N}$, the exact Euclidean distance is given to us as $\hat{d}_{ij}$.
• For a pair of an unknown point and an anchor $(i, k) \in \mathcal{M}$, the exact Euclidean distance is given to us as $\hat{d}_{ik}$.
• Objective: Find the locations of the $n$ unknown points $x_i$, $i = 1, \dots, n$.
• Graph Realization problem. Let $V := \{1, \dots, n\}$ and $A := \{a_{n+1}, \dots, a_{n+m}\}$. The realization problem in $\mathbb{R}^d$ for the graph $(V, A; \hat{D})$ is to find the coordinates $x_1, \dots, x_n \in \mathbb{R}^d$ from the partial distance data $\hat{D} = \{\hat{d}_{ij} : (i, j) \in \mathcal{N} \cup \mathcal{M}\}$.
Previous Theoretical Work
• Schoenberg (1935) and Young and Householder (1938) established some basic properties of distance matrices which have set the stage for future research in distance geometry.
• Since the distance information is exact, the realization $X = [x_1, \dots, x_n]$ must satisfy
$$\|x_i - x_j\|^2 = (\hat{d}_{ij})^2, \ \forall (i, j) \in \mathcal{N}, \qquad \|x_i - a_k\|^2 = (\hat{d}_{ik})^2, \ \forall (i, k) \in \mathcal{M}.$$
• Consider the relationship between the Gram or inner product matrix $Y = X^T X$ and the squared distance matrix $\hat{D}$ when there are no anchors:
$$\hat{D}_{ij} = \|x_i\|^2 + \|x_j\|^2 - 2 x_i^T x_j = Y_{ii} + Y_{jj} - 2 Y_{ij}.$$
• There exists a realization in dimension $d$ iff the Gram matrix $Y$, corresponding to the given $\hat{D}$, is positive semidefinite and has rank equal to $d$.
• Distance matrix completion: Complete the distance matrix such that the Gram matrix has rank $d$. A decomposition of $Y$ gives a $d$-dimensional embedding.
• These ideas form the basis of a class of algorithms known as multidimensional scaling (Torgerson 1952, Cox and Cox 2001, Trosset 1998, 2000, 2002). However, the problem is nonconvex, the computation can get stuck in local minima, and there is no guarantee of finding a desired realization in polynomial time.
• NP-hardness: Saxe (1979), Moré and Wu (1994), Aspnes et al. (2004).
• By relaxing the problem to lie in a convex space, a global minimum can be computed in polynomial time.
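The Gram-matrix idea above can be sketched in a few lines, assuming the easy case of a complete, exact squared-distance matrix with no anchors. The function names `classical_mds` and `squared_distances` and the random test points are illustrative assumptions, not part of the talk.

```python
import numpy as np

def classical_mds(D2, d):
    """Embed n points in R^d from a complete n x n matrix of squared distances."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Y = -0.5 * J @ D2 @ J                    # Gram matrix of the centered points
    w, V = np.linalg.eigh(Y)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]            # indices of the d largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)       # guard against tiny negative values
    return (V[:, idx] * np.sqrt(w_top)).T    # d x n matrix X with Y ~ X^T X

def squared_distances(X):
    """All pairwise squared distances between the columns of X."""
    G = X.T @ X
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2 * G   # D_ij = Y_ii + Y_jj - 2 Y_ij

rng = np.random.default_rng(0)
X_true = rng.uniform(-0.5, 0.5, size=(2, 10))
D2 = squared_distances(X_true)
X_hat = classical_mds(D2, d=2)
```

The recovered `X_hat` matches `X_true` only up to rotation, reflection and translation, which is why the check compares distances rather than coordinates. With missing or noisy distances this direct decomposition no longer applies, which is where the relaxation comes in.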
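Matrix Representation
• Let $X = [x_1\ x_2\ \dots\ x_n] \in \mathbb{R}^{d \times n}$. Then
$$\|x_i - x_j\|^2 = e_{ij}^T X^T X e_{ij}, \qquad \|x_i - a_k\|^2 = (e_i; -a_k)^T [X\ I_d]^T [X\ I_d] (e_i; -a_k),$$
where $e_i$ is the $i$th unit vector in $\mathbb{R}^n$, $e_{ij} = e_i - e_j$, and $I_d$ is the $d \times d$ identity matrix.
• We can rewrite the problem as: Find $Y, X$ such that
$$\begin{aligned} e_{ij}^T Y e_{ij} &= (\hat{d}_{ij})^2, && \forall (i, j) \in \mathcal{N}, \\ (e_i; -a_k)^T \begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} (e_i; -a_k) &= (\hat{d}_{ik})^2, && \forall (i, k) \in \mathcal{M}, \\ Y &= X^T X. \end{aligned}$$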
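A quick numerical check of the two identities above, as a sketch on random data. The sizes, the seed and the single anchor `a` are hypothetical choices made for illustration.

```python
import numpy as np

d, n = 2, 4
rng = np.random.default_rng(1)
X = rng.normal(size=(d, n))          # unknown points as columns
a = rng.normal(size=d)               # one anchor position (hypothetical data)

Y = X.T @ X
Z = np.block([[Y, X.T], [X, np.eye(d)]])   # equals [X I_d]^T [X I_d]

i, j = 0, 2
e = np.eye(n)
e_ij = e[:, i] - e[:, j]
lhs_pair = e_ij @ Y @ e_ij                   # e_ij^T Y e_ij
rhs_pair = np.sum((X[:, i] - X[:, j]) ** 2)  # ||x_i - x_j||^2

v = np.concatenate([e[:, i], -a])            # the stacked vector (e_i; -a_k)
lhs_anchor = v @ Z @ v
rhs_anchor = np.sum((X[:, i] - a) ** 2)      # ||x_i - a_k||^2
```

Both distance expressions are linear in the entries of `Z`, which is what makes the constraints linear once `Y` is treated as a variable.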
SDP Relaxation
• Relax the constraint $Y = X^T X$ to $Y \succeq X^T X$. This is equivalent to:
$$\begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} \succeq 0.$$
• An $n \times n$ matrix $Z \succeq 0$ iff $b^T Z b \geq 0$, $\forall b \in \mathbb{R}^n$.
• The set of all $n \times n$ symmetric positive semidefinite matrices forms a convex cone.
[Figure: Shape of Positive Semidefinite Cone, n = 2]
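For the n = 2 case pictured, a symmetric matrix [[a, b], [b, c]] is positive semidefinite exactly when a ≥ 0, c ≥ 0 and ac ≥ b². The sketch below uses that test to illustrate convexity: the midpoint of two matrices in the cone stays in the cone. The helper names and the tolerance are assumptions for illustration.

```python
def is_psd_2x2(a, b, c, tol=1e-12):
    """True if the symmetric matrix [[a, b], [b, c]] is positive semidefinite."""
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def convex_combination(P, Q, t):
    """Entrywise t*P + (1-t)*Q for matrices given as (a, b, c) triples."""
    return tuple(t * p + (1 - t) * q for p, q in zip(P, Q))

P = (1.0, 1.0, 1.0)    # rank one, on the boundary of the cone
Q = (2.0, -1.0, 0.5)   # also on the boundary: 2 * 0.5 - (-1)^2 = 0
mid = convex_combination(P, Q, 0.5)
```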
• The problem is now a Semidefinite Program with linear equality constraints and one linear matrix inequality in $Y, X$. Find $Y, X$ such that
$$\begin{aligned} e_{ij}^T Y e_{ij} &= (\hat{d}_{ij})^2, && \forall (i, j) \in \mathcal{N}, \\ (e_i; -a_k)^T \begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} (e_i; -a_k) &= (\hat{d}_{ik})^2, && \forall (i, k) \in \mathcal{M}, \\ \begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} &\succeq 0. \end{aligned} \tag{1}$$
• There exist efficient interior point solvers for such SDPs, such as SeDuMi (Sturm 1998), SDPT3 (Kim et al. 1999), DSDP (Benson et al. 2000), etc.
Uniqueness
• The distance geometry problem instance is said to be uniquely localizable if there is a unique realization $\bar{X} \in \mathbb{R}^{d \times n}$ and there is no $x_i \in \mathbb{R}^h$, $i = 1, \dots, n$, where $h > d$, such that:
$$\begin{aligned} \|x_i - x_j\|^2 &= (\hat{d}_{ij})^2, && \forall (i, j) \in \mathcal{N}, \\ \|x_i - (a_k; 0)\|^2 &= (\hat{d}_{ik})^2, && \forall (i, k) \in \mathcal{M}, \\ x_i &\neq (\bar{x}_i; 0) && \text{for some } i \in \{1, \dots, n\}. \end{aligned}$$
• There should exist no non-trivial realization in a higher dimensional space. (The trivial realization corresponds to setting $x_i = (\bar{x}_i; 0)$ for $i = 1, \dots, n$.)
Theorem
• The following theorem links the idea of unique localizability to the SDP relaxation.
Suppose that there are enough distance measures between the network of points such that the underlying graph is connected. Then the following statements are equivalent (So and Ye 2005):
1. The problem instance is uniquely localizable.
2. The solution matrix of (1) satisfies $Y = X^T X$.
• Proof uses ideas from duality.
Observable Error Measure
• The SDP relaxation computes a solution that localizes all possibly localizable unknown points.
• For points which are not uniquely localizable, $Y_{ii} \neq \|x_i\|^2$, so $Y_{ii} - \|x_i\|^2$ acts as an error measure to isolate such points.
• Probabilistic interpretation: If each $x_i$ is viewed as a random point, $Y - X^T X$ is the covariance matrix of $X = [x_1, \dots, x_n]$. $Y_{ii} - \|x_i\|^2 = 0$ means there is no ambiguity in the position of $x_i$.
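A minimal sketch of how the error measure could be read off an SDP solution. The solution `(Y, X)` below is made-up data, and the threshold `1e-6` is an arbitrary assumption, not a value from the talk.

```python
import numpy as np

def trace_error(Y, X):
    """Per-point values Y_ii - ||x_i||^2 for an SDP solution (Y, X), with X of size d x n."""
    return np.diag(Y) - np.sum(X ** 2, axis=0)

# Hypothetical solution: two points; the second carries extra variance in Y.
X = np.array([[0.1, 0.3],
              [0.2, -0.4]])
Y = X.T @ X + np.diag([0.0, 0.05])
err = trace_error(Y, X)
suspect = np.where(err > 1e-6)[0]    # indices flagged as not well localized
```

Here point 0 satisfies $Y_{00} = \|x_0\|^2$ and is trusted, while point 1 is flagged.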
Examples
[Figure: (a) 50 points, 3 anchors, R = 0.25; SDP solution, RMSD = 3.44e-002. (b) Error and trace correlation.]
Sensor Network Localization
Problem Definition
• Input
– $n$ unknown sensor nodes $x_j \in \mathbb{R}^2$, $j = 1, \dots, n$, and $m$ known sensor nodes (anchors) $a_k \in \mathbb{R}^2$, $k = n+1, \dots, n+m$.
– For nodes which are within communication radius $R$ of each other, we have a distance estimate $d_{ik}$ between $x_i$ and $a_k$, and $d_{ij}$ between $x_i$ and $x_j$.
– The measured distances are noisy.
• Output
– Accurate position estimates for all unknown sensor nodes.
Previous Work
• Surveys: see Hightower and Borriello (2001), Niculescu (2004) and Estrin et al. (2004).
• Beacon based triangulation: e.g. Bulusu and Heidemann (2000), Howard et al. (2001).
• Distance forwarding: e.g. Niculescu and Nath (2001), Savarese et al. (2002), Savvides et al. (2001, 2002).
• Rigidity based approaches: e.g. Moore et al. (2004).
• Multidimensional scaling: e.g. Shang et al. (2003), Patwari et al. (2005).
• Convex optimization: e.g. Doherty et al. (2001), Tseng (2006), Wolkowicz et al. (2007), Gao et al. (2006).
Distance Geometry Model with Noise
• If noisy distance measures are given by $d_{ij}$ for $(i, j) \in \mathcal{N}$ and $d_{ik}$ for $(i, k) \in \mathcal{M}$, then the problem can be formulated as an error minimization problem:
$$\min_{x_1, \dots, x_n \in \mathbb{R}^d} \ \sum_{(i,j) \in \mathcal{N}} \gamma_{ij} \left| \|x_i - x_j\|^2 - d_{ij}^2 \right| + \sum_{(i,k) \in \mathcal{M}} \gamma_{ik} \left| \|x_i - a_k\|^2 - d_{ik}^2 \right|$$
• Other models are equally applicable: sum of squares error, maximum likelihood estimation, inequality constraints.
• Weights $\gamma_{ij}$ can be varied based on prior knowledge of noise in the measurements.
• Again, this can be relaxed to an SDP with linear constraints and one linear matrix inequality in $Y, X$. Similar relaxations exist for the other formulations too.
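A small sketch of evaluating this objective for a candidate set of positions, assuming the absolute-error form (weighted sum of |squared distance minus squared measurement|) and, for brevity, a single weight `gamma` for every term. The function names and the toy data are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def weighted_abs_error(x, anchors, pair_d, anchor_d, gamma=1.0):
    """Sum of gamma * | ||.||^2 - d^2 | over sensor pairs and sensor-anchor pairs.

    x: dict i -> position, anchors: dict k -> position,
    pair_d: dict (i, j) -> measured distance,
    anchor_d: dict (i, k) -> measured distance.
    A single weight gamma is used for every term (an assumption for brevity).
    """
    total = 0.0
    for (i, j), dij in pair_d.items():
        total += gamma * abs(sq_dist(x[i], x[j]) - dij ** 2)
    for (i, k), dik in anchor_d.items():
        total += gamma * abs(sq_dist(x[i], anchors[k]) - dik ** 2)
    return total

x = {0: (0.0, 0.0), 1: (3.0, 4.0)}
anchors = {2: (0.0, 1.0)}
exact = weighted_abs_error(x, anchors, {(0, 1): 5.0}, {(0, 2): 1.0})
noisy = weighted_abs_error(x, anchors, {(0, 1): 5.5}, {(0, 2): 1.0})
```

With exact measurements the objective is zero at the true positions; a measurement of 5.5 instead of 5 contributes |25 − 30.25| = 5.25.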
Effect of Noise
• Usually $Y \neq X^T X$.
• The rank of the matrix $Y$ is not $d$. Interior point methods used to solve the SDP give a higher dimensional solution, so there is a 'folding'.
• Sensitive to anchor placement
– Anchors in the interior: Crowding at center
– Anchors placed on the perimeter: Better estimates
[Figure: SDP solutions for 60 sensors, radio range = 0.25, 20% noise. (a) 4 inner anchors at (±0.2, ±0.2): RMSD = 1.97e-001. (b) 4 outer anchors at (±0.45, ±0.45): RMSD = 7.46e-002.]
Regularization
• Subtract a regularization term from the objective function:
$$\frac{\lambda}{n+m} \left( \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \|x_i - x_j\|^2 + \sum_{i=1}^{n} \sum_{k=1}^{m} \|x_i - a_k\|^2 \right)$$
• Inspired by maximum variance unfolding (Weinberger et al. 2005) and tensegrity theory in graphs (So and Ye 2006).
[Figure: (a) Higher dimension. (b) Lower dimension.]
[Figure: Effect of regularization on previous example. (a) Before: SDP, RMSD = 1.97e-001. (b) After: Regularized SDP, RMSD = 5.12e-002.]
Gradient Descent
• The SDP solution provides a good initial point for local gradient descent.
• The SDP objective value also provides a lower bound on the estimation error objective, which can be used as a proof of quality.
• The gradient at each unknown sensor depends only on its neighborhood, so the gradient calculation is local.
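A sketch of the local refinement step, assuming a plain sum-of-squared-distance-errors objective and a fixed step size; the actual objective, step rule and stopping test used in the talk are not specified here. The starting point `X0` is a perturbed copy of the true positions standing in for an SDP estimate, and all names and data are hypothetical.

```python
import numpy as np

def objective_and_grad(X, A, pairs, anchor_pairs):
    """Sum of squared distance errors and its gradient.

    X: n x d sensor estimates, A: m x d anchor positions,
    pairs: list of (i, j, d_ij), anchor_pairs: list of (i, k, d_ik).
    """
    f, G = 0.0, np.zeros_like(X)
    for i, j, dij in pairs:
        diff = X[i] - X[j]
        r = np.linalg.norm(diff) + 1e-12     # avoid division by zero
        f += (r - dij) ** 2
        g = 2.0 * (r - dij) * diff / r
        G[i] += g                            # only the two neighbors are touched
        G[j] -= g
    for i, k, dik in anchor_pairs:
        diff = X[i] - A[k]
        r = np.linalg.norm(diff) + 1e-12
        f += (r - dik) ** 2
        G[i] += 2.0 * (r - dik) * diff / r
    return f, G

def gradient_descent(X0, A, pairs, anchor_pairs, step=0.1, iters=200):
    """Fixed-step gradient descent from the initial estimate X0."""
    X = X0.copy()
    for _ in range(iters):
        _, G = objective_and_grad(X, A, pairs, anchor_pairs)
        X -= step * G
    return X

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_true = np.array([[0.3, 0.4], [0.7, 0.6]])
pairs = [(0, 1, np.linalg.norm(X_true[0] - X_true[1]))]
anchor_pairs = [(i, k, np.linalg.norm(X_true[i] - A[k]))
                for i in range(2) for k in range(3)]
X0 = X_true + np.array([[0.05, -0.05], [-0.04, 0.03]])  # stand-in for an SDP estimate
f0, _ = objective_and_grad(X0, A, pairs, anchor_pairs)
X_hat = gradient_descent(X0, A, pairs, anchor_pairs)
f1, _ = objective_and_grad(X_hat, A, pairs, anchor_pairs)
```

Each gradient row is built only from terms involving that sensor's neighbors, which is the locality property noted above.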
[Figure: (a) Gradient search trajectories. (b) The sum of error squares (objective function value) vs. number of gradient search steps. (c) Gradient result, 50 iterations, SDP solution starting point. (d) Result of gradient search with random initial starting point.]
Simulation Results
• Simulations were performed on a network of 60 nodes (30 independent trials) randomly placed in a square region of size 1 × 1.
• The distance between two nodes $i$ and $j$ was calculated if it was less than a given radio range $R$.
• The distances were modeled to be noisy by setting
$$d = \hat{d}\,|1 + nf \cdot N(0, 1)|,$$
where $d$ is the measured distance, $\hat{d}$ is the true value of the distance, $nf$ is a specified constant, and $N(0, 1)$ is a normally distributed random variable.
• Accuracy is measured using Root Mean Square Distance (RMSD):
$$\mathrm{RMSD} = \frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} \|\hat{x}_i - x_i\|^2 \right)^{1/2}.$$
• The effects of varying the noise factor, radio range and number of anchors were studied.
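The noise model and the RMSD measure above translate directly into code. This is a sketch using only the standard library; the function names, the seed and the sample values are illustrative assumptions.

```python
import math
import random

def noisy_distance(d_true, nf, rng):
    """d = d_true * |1 + nf * N(0, 1)|, the multiplicative noise model."""
    return d_true * abs(1.0 + nf * rng.gauss(0.0, 1.0))

def rmsd(estimates, truths):
    """(1 / sqrt(n)) * sqrt(sum_i ||xhat_i - x_i||^2)."""
    n = len(truths)
    total = sum(sum((e - t) ** 2 for e, t in zip(p, q))
                for p, q in zip(estimates, truths))
    return math.sqrt(total) / math.sqrt(n)

rng = random.Random(0)
samples = [noisy_distance(0.2, 0.1, rng) for _ in range(1000)]
err = rmsd([(0.0, 0.0), (1.0, 1.0)], [(0.3, 0.4), (1.0, 1.0)])
```

In the toy RMSD call one point is off by 0.5 and the other is exact, so the result is 0.5 / √2.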
[Figure: Four panels (SDP, Regularized SDP, Regularized SDP + Gradient, MDS), each plotting Error (%R) against Radio Range R from 0.25 to 0.4, with curves for m = 4, m = 6 and m = 8. Caption: Variation of estimation error (RMSD) with number of anchors (randomly placed) and varying radio range, nf = 0.2.]
[Figure: Four panels (SDP, Regularized SDP, Regularized SDP + Gradient, MDS), each plotting Error (%R) against Noise Factor nf from 0.05 to 0.3, with curves for R = 0.25, 0.30, 0.35 and 0.40. Caption: Variation of estimation error (RMSD) with measurement noise, 6 anchors randomly placed.]
Molecule Conformation
Problem Definition
• Consider a molecule with $n$ atoms with unknown coordinates $x_i \in \mathbb{R}^3$, $i = 1, \dots, n$, in a local coordinate system.
• Input: Upper and lower bounds on some interatomic distances in the molecule. NMR measurements can only provide measurements between atoms less than about 6 Å (1 Å = $10^{-8}$ cm) apart. The pairs of unknown points for which these bounds are known are specified in the edge set $\mathcal{N}$.
• We define the lower bound distance matrix $\underline{D} = (\underline{d}_{ij})$, where $\underline{d}_{ij}$ is specified if $(i, j) \in \mathcal{N}$, and $\underline{d}_{ij} = 0$ otherwise. The upper bound distance matrix $\bar{D} = (\bar{d}_{ij})$ is defined similarly.
• Output: 3-D structure of the molecule.
Previous Work
• EMBED algorithm, developed by Crippen and Havel (1988).
• Multidimensional scaling by Trosset (1998, 2000, 2002).
• Data Box Algorithm described by Glunt et al. (1999).
• ABBIE, developed by Hendrickson (1991), exploits the concepts of graph rigidity.
• Global optimization methods: a global smoothing and continuation approach is used in the DGSOL algorithm by Moré and Wu (1994, 1997); GNOMAD algorithm, developed by Williams et al. (2001).
• Geometric build up (triangulation) algorithms have been demonstrated in work by Wu et al. (2000, 2003).
Anchor-Free Inequality based formulation
Find $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ such that
$$\underline{d}_{ij}^2 \leq \|x_i - x_j\|^2 \leq \bar{d}_{ij}^2, \quad \forall (i, j) \in \mathcal{N}.$$
Find $Y$ s.t.
$$\underline{d}_{ij}^2 \leq e_{ij}^T Y e_{ij} \leq \bar{d}_{ij}^2, \ \forall (i, j) \in \mathcal{N}, \qquad Y \succeq 0.$$
• A regularization term can also be added.
• A translational invariance constraint is added: $\sum_{i=1}^{n} x_i = 0$, or $Y e = 0$, where $e$ is the vector of all ones.
• $X$ is obtained from the best rank-$d$ approximation of $Y$, computed from its eigenvalue decomposition.
• If there is enough distance data, $X$ would only be a rotated version of the original set of points.
• The use of SDP for Euclidean distance matrix completion has been explored before by Barvinok (1995), Alfakih et al. (1999), Laurent (2001).
Towards a Distributed Algorithm
• Challenge: Large size of problem, thousands of atoms.
• SDP becomes intractable for n in the hundreds.
• Same problem experienced for large sensor networks.
• Need a distributed method, robust to error propagation.
• Fully decentralized methods could be used, but they cannot use the global information efficiently.
• Middle path: Break up the larger problem into subproblems.
Distributed Algorithm with Anchors
• Partition into equal sized clusters. Include each unknown point in the cluster with the closest anchors.
• Solve each cluster separately.
• At each stage, discard badly estimated points and solve again using well estimated points as anchors.
• Anchor free: Need a new clustering method. Also need to ensure all clusters are in a common coordinate system.
Clustering
Reverse Cuthill-McKee ordering permutes the distance matrix to block diagonalize it.
[Figure: Block Diagonalization. Two sparsity-pattern plots of the distance matrix (nz = 630 in each).]
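To make the reordering step concrete, here is a minimal pure-Python sketch of a Reverse Cuthill-McKee ordering on an adjacency structure. It follows the textbook heuristic (breadth-first search from a low-degree node, neighbors visited in order of increasing degree, final order reversed) and is not the implementation used for the plots; the scrambled path graph is a made-up test case.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Return a node ordering that tends to reduce adjacency-matrix bandwidth.

    adj: dict node -> set of neighbors (undirected graph).
    """
    visited, order = set(), []
    # Start each component from a lowest-degree node (a common heuristic).
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v] - visited, key=lambda u: len(adj[u])):
                visited.add(w)
                queue.append(w)
    return order[::-1]

def bandwidth(adj, order):
    """Largest index gap between connected nodes under the given ordering."""
    pos = {v: p for p, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)

# A path graph whose nodes are labeled in a scrambled order: 3-0-5-1-4-2.
labels = [3, 0, 5, 1, 4, 2]
adj = {v: set() for v in labels}
for u, v in zip(labels, labels[1:]):
    adj[u].add(v)
    adj[v].add(u)

order = reverse_cuthill_mckee(adj)
```

After reordering, connected nodes sit next to each other, so the nonzeros concentrate near the diagonal and contiguous index ranges can serve as clusters.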
Stitching
[Figure: Clusters with common points]
• After solving the clusters, use common points between neighboring clusters for stitching.
• Discard bad points using error measures.
• Use good points as anchors in subsequent computations.
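The slides do not spell out how two cluster solutions are brought into a common coordinate system. One standard option, assumed here purely for illustration, is an orthogonal Procrustes fit on the common points: find the rotation (or reflection) and translation that best map one cluster's copy of the shared points onto the other's. The function name `align_clusters` and the test data are hypothetical.

```python
import numpy as np

def align_clusters(P, Q):
    """Find orthogonal R and translation t minimizing ||R @ Q + t - P||_F.

    P, Q: d x c coordinates of the same c common points in two cluster frames.
    """
    p_mean = P.mean(axis=1, keepdims=True)
    q_mean = Q.mean(axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd((P - p_mean) @ (Q - q_mean).T)
    R = U @ Vt                                  # orthogonal, may include a reflection
    t = p_mean - R @ q_mean
    return R, t

rng = np.random.default_rng(2)
P = rng.normal(size=(3, 5))                     # common points in cluster 1's frame
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
shift = np.array([[1.0], [2.0], [3.0]])
Q = R_true.T @ (P - shift)                      # the same points in cluster 2's frame
R, t = align_clusters(P, Q)
```

Applying `R @ X2 + t` to every point `X2` of the second cluster then places it in the first cluster's frame. With noisy estimates the fit is only approximate, which is one reason to discard badly estimated common points first.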
Simulations
• Chose molecules with given 3-D coordinates from the Protein Data Bank.
• Only distances between atoms within a cutoff radius $R$ were considered. $R$ is chosen to be 6 Å, the upper limit for NMR measurements.
• Upper and lower bound distances were generated as follows:
$$\bar{d}_{ij} = \hat{d}_{ij}(1 + |\bar{\epsilon}_{ij}|), \qquad \underline{d}_{ij} = \hat{d}_{ij} \max(0, 1 - |\underline{\epsilon}_{ij}|),$$
where $\bar{\epsilon}_{ij}, \underline{\epsilon}_{ij} \sim N(0, \sigma_{ij}^2)$.
• Sometimes, not all distances below 6 Å are known. Therefore, we also consider experiments where only a fraction of the entire distance data is used.
• Accuracy of estimation is measured using Root Mean Square Distance error (with respect to true coordinates).
• Experiments were performed on a Pentium IV 3.0 GHz PC with 2 GB RAM.
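The bound-generation rule above can be sketched as follows, using only the standard library. The function name, the seed, the true distance of 4.2 Å and σ = 0.1 are illustrative assumptions.

```python
import random

def distance_bounds(d_true, sigma, rng):
    """Lower and upper bounds around a true distance, following the model above."""
    eps_upper = rng.gauss(0.0, sigma)
    eps_lower = rng.gauss(0.0, sigma)
    upper = d_true * (1.0 + abs(eps_upper))
    lower = d_true * max(0.0, 1.0 - abs(eps_lower))   # never negative
    return lower, upper

rng = random.Random(0)
bounds = [distance_bounds(4.2, 0.1, rng) for _ in range(1000)]
```

By construction the true distance always lies between the two bounds, so the noise widens the feasible interval rather than shifting it.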
[Figure: (a) 1HOE (558 atoms) with 40% distances, 1% noise, RMSD = 0.2154 Å. (b) 1PHT (814 atoms) with 50% distances, 5% noise, RMSD = 1.2014 Å. (c) 1RHJ (3740 atoms) with 70% distances, 5% noise, RMSD = 0.9535 Å. (d) 1F39 (1534 atoms) with 85% distances, 10% noise, RMSD = 0.9852 Å.]
[Figure: 1I7W (8629 atoms) with 100% of distances below 6 Å and 10% noise on upper and lower bounds, RMSD = 1.3842 Å.]
Results for 100% of distances below 6 Å and 10% noise on upper and lower bounds

PDB ID   No. of atoms   % of total pairwise distances used   RMSD (Å)   CPU time (secs)
1PTQ      402           8.79                                 0.1936      107.7
1PHT      814           5.35                                 1.2624      223.9
1AX8     1003           3.74                                 0.6408      280.1
1F39     1534           2.43                                 0.7338      358.0
1KDH     2923           1.34                                 1.1035      959.1
1RHJ     3740           1.10                                 1.8365     1584.4
1MQQ     5681           0.75                                 1.4906     1461.1
Results with 70% of distances below 6 Å and 5% noise on upper and lower bounds, and with 50% of distances below 6 Å and 1% noise on upper and lower bounds

                        70% Distances, 5% Noise        50% Distances, 1% Noise
PDB ID   No. of atoms   RMSD (Å)   CPU time (secs)     RMSD (Å)   CPU time (secs)
1PTQ      402           0.2794       93.8              0.7560       22.1
1PHT      814           0.4701      129.4              0.6639       53.6
1AX8     1003           0.8184      251.9              0.0314       71.8
1F39     1534           1.1271      353.1              0.2809      113.4
1KDH     2923           2.5693     1641.0              2.8222      488.4
1RHJ     3740           0.9535     1286.1              0.1158      361.5
1MQQ     5681           3.1570     1683.4              2.3108     1466.2
Conclusion
• Introduced an SDP relaxation for the distance geometry problem.
• Proposed techniques to improve performance with noisy distances. Demonstrated effectiveness for sensor network localization.
• Proposed a distributed algorithm for solving larger problems. Demonstrated it working on molecule conformation problems.
• Other work: Extensions to angle measurements (Biswas et al. 2005).
• Future research directions:
– Better stitching and error measurement.
– Tighter relaxations: Sum of squares (Nie 2006).
– Explore links with dimensionality reduction (Weinberger et al. 2006, Xiao et al. 2006) and tensegrity theory (So and Ye 2006).
Acknowledgements
• Prof. Yinyu Ye and research group.
• Prof. Kim-Chuan Toh (NUS), Anthony So, Tzu-Chen Liang, Ta-Chung Wang.
• Prof. Stephen Boyd, Prof. Leonidas Guibas and Prof. Fabian Pease.
• Dr. Hamid Aghajan, Dr. Renate Fruchter.
• Friends.
• Family.
Computational complexity
• Complexity depends upon interior-point algorithms for solving SDP.
• For a network of $n$ unknown points, let $k = 3 + |\mathcal{N}| + |\mathcal{M}|$ be the number of constraints.
• The worst-case number of total arithmetic operations to compute an $\epsilon$-solution of (1) is bounded by $O\big((n^3 + n^2 k + k^3)\sqrt{n + k}\,\log\frac{1}{\epsilon}\big)$.
• $\sqrt{n + k}\,\log\frac{1}{\epsilon}$ represents the bound on the worst-case number of interior-point algorithm iterations. Practically, the number of interior-point algorithm iterations is a constant, 20 to 30.
• $k$ is bounded by $O(n^2)$. Thus, with the iteration count treated as a constant, the worst-case complexity is bounded by $O(n^6)$.
• In practice, as the number of points increases, the number of constraints required to solve for all positions scales more typically with $O(n)$. Therefore the complexity is typically bounded by $O(n^3)$.