PhD. Defense

Semidefinite Programming Approaches for Distance Geometry Problems
Pratik Biswas
Electrical Engineering, Stanford University



The Distance Geometry Problem

• Input: A subset of all pairwise Euclidean distances between a set of points (possibly corrupted by noise), and the positions of some points (anchors).
• Output: Positions of all points.
• Synonymous with the Graph Realization problem.



Outline
• Motivation
• Distance geometry background
• Semidefinite Programming relaxation
• Sensor Network Localization
• Molecule Conformation
• Conclusion



Motivation
Distance geometry and related problems arise in a wide variety of real-life applications.
• Estimating positions of nodes in a wireless network.
• Predicting structures of large molecules.
• Learning low dimensional manifolds in high dimensional spaces.
• Internet tomography, data visualization.



Wireless Sensor Networks

Source: MICA Project, Berkeley

• A large number of sensor 'motes' are densely deployed over a geographical area.
• Sensors collect local environmental information (temperature, humidity) and communicate with each other to relay the gathered data to a decision making authority.



• Applications in habitat monitoring (James Reserve Project 2002), emergency medical care (CodeBlue project 2004), industrial control in manufacturing plants (Intel Mote Project 2005), structural health monitoring (NETSHM 2006), military and civilian surveillance (He et al. 2004), asset location (Fontana et al. 2003), etc.
• It is important to know the positions of sensors to better interpret the data and for efficient routing.



(a) Triangulation

(b) Cooperative Localization

• Typical triangulation schemes like GPS can be expensive and infeasible.
• Wireless nodes within 'hearing' distance of each other can estimate mutual distances (Received Signal Strength, Time of Arrival).
• Can we estimate positions of the sensors using this mutual distance information?



Molecule Conformation

(a) Nucleus-Nucleus interaction

(b) Cross Peaks

• 3-D structures of large molecules are useful for studying their function and evolution, e.g. Protein Data Bank.
• By exploiting nucleus-nucleus interactions, Nuclear Magnetic Resonance measurements provide ranges on interatomic distances (Wüthrich 1985).
• Given sparse, noisy bounds on interatomic distances, can we find (relative) coordinates of atoms?



Problem Definition (Exact Distances)
• $n$ unknown points $x_i \in \mathbb{R}^d$, $i = 1, \dots, n$.
• $m$ anchors $a_k \in \mathbb{R}^d$, $k = n+1, \dots, n+m$, whose positions are already known.
• For a pair of unknown points $(i, j) \in N$, the exact Euclidean distance is given to us as $\hat{d}_{ij}$.
• For a pair of an unknown point and an anchor $(i, k) \in M$, the exact Euclidean distance is given to us as $\hat{d}_{ik}$.
• Objective: Find the locations of the $n$ unknown points $x_i$, $i = 1, \dots, n$.



• Graph Realization problem. Let $V := \{1, \dots, n\}$ and $A := \{a_{n+1}, \dots, a_{n+m}\}$. The realization problem in $\mathbb{R}^d$ for the graph $(V, A; \hat{D})$ is to find the coordinates $x_1, \dots, x_n \in \mathbb{R}^d$ from the partial distance data $\hat{D} = \{\hat{d}_{ij} : (i, j) \in N \cup M\}$.



Previous Theoretical Work
• Schoenberg (1935) and Young and Householder (1938) established basic properties of distance matrices that set the stage for later research in distance geometry.
• Since the distance information is exact, the realization $X = [x_1, \dots, x_n]$ must satisfy
  $$\|x_i - x_j\|^2 = \hat{d}_{ij}^2, \quad \forall (i, j) \in N$$
  $$\|x_i - a_k\|^2 = \hat{d}_{ik}^2, \quad \forall (i, k) \in M$$
• Consider the relationship, with no anchors, between the Gram (inner product) matrix $Y = X^T X$ and the squared distance matrix $\hat{D}$:
  $$\hat{D}_{ij} = \|x_i\|^2 + \|x_j\|^2 - 2 x_i^T x_j = Y_{ii} + Y_{jj} - 2 Y_{ij}$$
• There exists a realization in dimension $d$ iff the Gram matrix $Y$ corresponding to the given $\hat{D}$ is positive semidefinite and has rank equal to $d$.
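
The Gram matrix identity above can be checked numerically. The following is a minimal sketch in plain Python; the three 2-D points are made-up illustrative values and the function names are hypothetical, not taken from the presentation.

```python
# Check D_ij = Y_ii + Y_jj - 2*Y_ij for a small point set (no anchors).
# The coordinates below are illustrative values only.

def gram_matrix(points):
    """Return Y = X^T X, where the columns of X are the given points."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

def squared_distances_from_gram(Y):
    """Recover squared distances using only entries of the Gram matrix."""
    n = len(Y)
    return [[Y[i][i] + Y[j][j] - 2.0 * Y[i][j] for j in range(n)] for i in range(n)]

def squared_distances_direct(points):
    """Compute squared Euclidean distances directly from coordinates."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

points = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0)]
Y = gram_matrix(points)
D_from_gram = squared_distances_from_gram(Y)
D_direct = squared_distances_direct(points)
```

Both routes give the same squared distance matrix, which is why the distance constraints become linear once they are written in terms of $Y$.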



• Distance matrix completion: Complete the distance matrix such that the Gram matrix has rank $d$. Decomposition of $Y$ gives a $d$-dimensional embedding.
• These ideas form the basis of a class of algorithms known as multidimensional scaling (Torgerson 1952, Cox and Cox 2001, Trosset 1998, 2000, 2002). However, the problem is nonconvex, the computation can get stuck in local minima, and there is no guarantee of finding a desired realization in polynomial time.
• NP-hardness: Saxe (1979), More and Wu (1994), Aspnes et al. (2004).
• By relaxing the problem to lie in a convex space, a global minimum can be computed in polynomial time.



Matrix Representation
• Let $X = [x_1 \; x_2 \; \dots \; x_n] \in \mathbb{R}^{d \times n}$. Then
  $$\|x_i - x_j\|^2 = e_{ij}^T X^T X e_{ij}$$
  $$\|x_i - a_k\|^2 = (e_i; -a_k)^T [X \; I_d]^T [X \; I_d] (e_i; -a_k)$$
  where $e_i$ is the $i$th unit vector in $\mathbb{R}^n$, $e_{ij} = e_i - e_j$, and $I_d$ is the $d \times d$ identity matrix.
• We can rewrite the problem as: Find $Y, X$ such that
  $$e_{ij}^T Y e_{ij} = \hat{d}_{ij}^2, \quad \forall (i, j) \in N$$
  $$(e_i; -a_k)^T \begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} (e_i; -a_k) = \hat{d}_{ik}^2, \quad \forall (i, k) \in M$$
  $$Y = X^T X$$



SDP Relaxation
• Relax the constraint $Y = X^T X$ to $Y \succeq X^T X$. This is equivalent to:
  $$\begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} \succeq 0$$
• An $n \times n$ matrix $Z \succeq 0$ iff $b^T Z b \ge 0$ for all $b \in \mathbb{R}^n$.

• The set of all n × n symmetric positive semidefinite matrices forms a convex cone.

Shape of Positive Semidefinite Cone, n = 2



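• The problem is now a Semidefinite Program with linear equality constraints and one linear matrix inequality in $Y, X$. Find $Y, X$ such that
  $$e_{ij}^T Y e_{ij} = \hat{d}_{ij}^2, \quad \forall (i, j) \in N$$
  $$(e_i; -a_k)^T \begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} (e_i; -a_k) = \hat{d}_{ik}^2, \quad \forall (i, k) \in M$$
  $$\begin{pmatrix} Y & X^T \\ X & I_d \end{pmatrix} \succeq 0 \qquad (1)$$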

• Efficient interior point solvers exist for such SDPs, such as SeDuMi (Sturm 1998), SDPT3 (Kim et al. 1999), DSDP (Benson et al. 2000), etc.



Uniqueness
• The distance geometry problem instance is said to be uniquely localizable if there is a unique realization $\bar{X} \in \mathbb{R}^{d \times n}$ and there is no $x_i \in \mathbb{R}^h$, $i = 1, \dots, n$, where $h > d$, such that:
  $$\|x_i - x_j\|^2 = \hat{d}_{ij}^2, \quad \forall (i, j) \in N$$
  $$\|x_i - (a_k; 0)\|^2 = \hat{d}_{ik}^2, \quad \forall (i, k) \in M$$
  $$x_i \neq (\bar{x}_i; 0) \quad \text{for some } i \in \{1, \dots, n\}$$
• There should exist no non-trivial realization in a higher dimensional space. (The trivial realization corresponds to setting $x_i = (\bar{x}_i; 0)$ for $i = 1, \dots, n$.)



Theorem
• The following theorem links the idea of unique localizability to the SDP relaxation. Suppose that there are enough distance measurements between the points that the underlying graph is connected. Then the following statements are equivalent (So and Ye 2005):
  1. The problem instance is uniquely localizable.
  2. The solution matrix of (1) satisfies $Y = X^T X$.
• The proof uses ideas from duality.



Observable Error Measure
• The SDP relaxation computes a solution that localizes all possibly localizable unknown points.
• For points which are not uniquely localizable, $Y_{ii} \neq \|x_i\|^2$, so $Y_{ii} - \|x_i\|^2$ acts as an error measure to isolate such points.
• Probabilistic interpretation: If each $x_i$ is viewed as a random point, $Y - X^T X$ is the covariance matrix of $X = x_1, \dots, x_n$. $Y_{ii} - \|x_i\|^2 = 0$ means there is no ambiguity in the position of $x_i$.
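
The error measure $Y_{ii} - \|x_i\|^2$ is cheap to evaluate once an SDP solution is available. The sketch below assumes the solver output is already given as nested lists; the values of `X` and `Y`, the tolerance `tol`, and the function name are hypothetical choices for illustration.

```python
# Per-point error measure Y_ii - ||x_i||^2 from an SDP solution (Y, X).
# X and Y below are made-up values standing in for solver output.

def trace_errors(Y, X):
    """X is a list of n points (the columns of X); Y is an n x n nested list."""
    return [Y[i][i] - sum(c * c for c in X[i]) for i in range(len(X))]

X = [(0.1, 0.2), (0.3, -0.4)]
Y = [[0.05, -0.05],
     [-0.05, 0.30]]

errors = trace_errors(Y, X)

# Points whose error exceeds a small tolerance are flagged as ambiguous.
tol = 1e-9  # assumed threshold, not from the source
ambiguous = [i for i, e in enumerate(errors) if e > tol]
```

In this toy input the first point satisfies $Y_{11} = \|x_1\|^2$ and is treated as localized, while the second has a positive gap and is flagged.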



Examples

(a) 50 points, 3 anchors, R = 0.25. SDP, RMSD = 3.44e−002.

(b) Error and trace correlation


Sensor Network Localization



Problem Definition
• Input

– $n$ unknown sensor nodes $x_j \in \mathbb{R}^2$, $j = 1, \dots, n$, and $m$ known sensor nodes (anchors) $a_k \in \mathbb{R}^2$, $k = n+1, \dots, n+m$.

– For nodes which are within communication radius $R$ of each other, we have a distance estimate $d_{ik}$ between $x_i$ and $a_k$, and $d_{ij}$ between $x_i$ and $x_j$.

– The measured distances are noisy.

• Output: Accurate position estimates for all unknown sensor nodes.



Previous Work
• Surveys: see Hightower and Borriello (2001), Niculescu (2004) and Estrin et al. (2004).
• Beacon based triangulation: e.g., Bulusu and Heidemann (2000), Howard et al. (2001).
• Distance forwarding: e.g., Niculescu and Nath (2001), Savarese et al. (2002), Savvides et al. (2001, 2002).
• Rigidity based approaches: e.g., Moore et al. (2004).
• Multidimensional scaling: e.g., Shang et al. (2003), Patwari et al. (2005).
• Convex optimization: e.g., Doherty et al. (2001), Tseng (2006), Wolkowicz et al. (2007), Gao et al. (2006).



Distance Geometry Model with Noise
• If noisy distance measures are given by $d_{ij}$ for $(i, j) \in N$ and $d_{ik}$ for $(i, k) \in M$, then the problem can be formulated as an error minimization problem:
  $$\min_{x_1, \dots, x_n \in \mathbb{R}^d} \; \sum_{(i,j) \in N} \gamma_{ij} \left| \|x_i - x_j\|^2 - d_{ij}^2 \right| + \sum_{(i,k) \in M} \gamma_{ik} \left| \|x_i - a_k\|^2 - d_{ik}^2 \right|$$
• Other models are equally applicable: sum of squares error, maximum likelihood estimation, inequality constraints.
• Weights $\gamma_{ij}$ can be varied based on prior knowledge of noise in the measurements.
• Again, this can be relaxed to an SDP with linear constraints and one linear matrix inequality in $Y, X$. Similar relaxations exist for the other formulations.
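
To make the error minimization model concrete, here is a minimal sketch that evaluates a weighted objective of the form $\sum \gamma \, |\,\|x_i - x_j\|^2 - d_{ij}^2\,|$ for a candidate set of positions. The absolute-value form, the dictionary-based data layout, the default weight of 1, and all numbers are assumptions made for illustration.

```python
# Evaluate the weighted error objective for candidate positions.
# Edge dictionaries map index pairs to measured distances (illustrative data).

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def weighted_abs_objective(x, anchors, d_nn, d_na, weights=None):
    """x: unknown-point positions; anchors: dict k -> position;
    d_nn: {(i, j): d_ij}; d_na: {(i, k): d_ik}; weights: optional {edge: gamma}."""
    weights = weights or {}
    total = 0.0
    for (i, j), d in d_nn.items():
        total += weights.get((i, j), 1.0) * abs(squared_distance(x[i], x[j]) - d * d)
    for (i, k), d in d_na.items():
        total += weights.get((i, k), 1.0) * abs(squared_distance(x[i], anchors[k]) - d * d)
    return total

x = [(0.0, 0.0), (3.0, 4.0)]          # two unknown points
anchors = {2: (3.0, 0.0)}             # one anchor with index k = 2
exact = weighted_abs_objective(x, anchors, {(0, 1): 5.0}, {(0, 2): 3.0, (1, 2): 4.0})
noisy = weighted_abs_objective(x, anchors, {(0, 1): 5.5}, {(0, 2): 3.0, (1, 2): 4.0})
```

With exact distances the objective vanishes at the true positions; a noisy measurement leaves a positive residual, and a smaller weight on an unreliable edge would reduce its influence.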



Effect of Noise
• Usually $Y \neq X^T X$.
• The rank of the matrix $Y$ is not $d$. Interior point methods used to solve the SDP give a higher dimensional solution, so there is a 'folding' when it is projected down.
• Sensitive to anchor placement

– Anchors in the interior: crowding at the center

– Anchors placed on the perimeter: better estimates



(a) 4 inner anchors: (±0.2, ±0.2). SDP, RMSD = 1.97e−001.

(b) 4 outer anchors: (±0.45, ±0.45). SDP, RMSD = 7.46e−002.

60 sensors, radio range = 0.25, 20% noise



Regularization
• Subtract a regularization term from the objective function:
  $$\frac{\lambda}{n + m} \left( \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \|x_i - x_j\|^2 + \sum_{i=1}^{n} \sum_{k=1}^{m} \|x_i - a_k\|^2 \right)$$

• Inspired by maximum variance unfolding (Weinberger et al. 2005) and tensegrity theory in graphs (So and Ye 2006).

(a) Higher dimension

(b) Lower dimension



(a) Before: SDP, RMSD = 1.97e−001.

(b) After: Regularized SDP, RMSD = 5.12e−002.

Effect of regularization on previous example



Gradient Descent
• The SDP solution provides a good initial point for local gradient descent.
• The SDP objective value also provides a lower bound on the estimation objective, which can be used as a proof of quality.
• The gradient at each unknown sensor depends only on its neighborhood, so the gradient calculation is local.
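
The local refinement step can be sketched as follows. This is an illustration under stated assumptions rather than the presenter's implementation: the objective is taken to be the sum of squared distance errors, the step size and iteration count are arbitrary, and `start` is a made-up initial guess standing in for an SDP solution.

```python
import math

# Objective: sum over edges of (||p - q|| - d)^2, an assumed choice.
def objective(x, anchors, nn_edges, na_edges):
    total = 0.0
    for (i, j), d in nn_edges.items():
        total += (math.dist(x[i], x[j]) - d) ** 2
    for (i, k), d in na_edges.items():
        total += (math.dist(x[i], anchors[k]) - d) ** 2
    return total

def gradient(x, anchors, nn_edges, na_edges):
    """Each edge only touches the gradients of its own endpoints."""
    grad = [[0.0] * len(p) for p in x]
    for (i, j), d in nn_edges.items():
        r = math.dist(x[i], x[j])
        if r == 0.0:
            continue  # direction undefined for coincident points
        scale = 2.0 * (r - d) / r
        for c in range(len(x[i])):
            diff = x[i][c] - x[j][c]
            grad[i][c] += scale * diff
            grad[j][c] -= scale * diff
    for (i, k), d in na_edges.items():
        r = math.dist(x[i], anchors[k])
        if r == 0.0:
            continue
        scale = 2.0 * (r - d) / r
        for c in range(len(x[i])):
            grad[i][c] += scale * (x[i][c] - anchors[k][c])
    return grad

def gradient_descent(x0, anchors, nn_edges, na_edges, step=0.05, iterations=500):
    x = [list(p) for p in x0]
    for _ in range(iterations):
        g = gradient(x, anchors, nn_edges, na_edges)
        x = [[xc - step * gc for xc, gc in zip(p, gp)] for p, gp in zip(x, g)]
    return x

# Illustrative instance: 2 unknown sensors, 3 anchors, exact distances.
truth = [[0.2, 0.1], [-0.1, -0.2]]
anchors = [[0.45, 0.45], [-0.45, 0.45], [0.0, -0.45]]
nn_edges = {(0, 1): math.dist(truth[0], truth[1])}
na_edges = {(i, k): math.dist(truth[i], anchors[k]) for i in range(2) for k in range(3)}

start = [[0.25, 0.05], [-0.05, -0.25]]  # stand-in for an SDP estimate
refined = gradient_descent(start, anchors, nn_edges, na_edges)
```

Because every edge contributes only to its two endpoints, each sensor could compute its own update from measurements to its neighbors, which is what makes the step local.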



(a) Gradient search trajectories

(b) The sum of error squares (objective function value) vs. number of gradient search steps

(c) Gradient result, 50 iterations, SDP solution starting point

(d) Result of gradient search with random initial starting point



Simulation Results
• Simulations were performed on a network of 60 nodes (30 independent trials) randomly placed in a square region of size 1 × 1.
• The distance between two nodes $i$ and $j$ was calculated if it was less than a given radio range $R$.
• The distances were modeled to be noisy by setting
  $$d = \hat{d} \, |1 + nf \cdot \mathcal{N}(0, 1)|$$
  where $d$ is the measured distance, $\hat{d}$ is the true value of the distance, $nf$ is a specified constant, and $\mathcal{N}(0, 1)$ is a standard normal random variable.
• Accuracy is measured using Root Mean Square Distance (RMSD):
  $$RMSD = \frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} \|\hat{x}_i - x_i\|^2 \right)^{1/2}$$
• The effects of varying the noise factor, radio range and number of anchors were studied.
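
The noise model and the RMSD metric translate directly into code. The sketch below uses only the Python standard library; the function names and the seeded generator are illustrative assumptions.

```python
import math
import random

def noisy_distance(true_d, nf, rng):
    """Multiplicative noise model: d = d_true * |1 + nf * N(0, 1)|."""
    return true_d * abs(1.0 + nf * rng.gauss(0.0, 1.0))

def rmsd(estimates, truth):
    """Root mean square distance between estimated and true positions."""
    n = len(truth)
    total = sum(
        sum((a - b) ** 2 for a, b in zip(est, tru))
        for est, tru in zip(estimates, truth)
    )
    return math.sqrt(total) / math.sqrt(n)

rng = random.Random(0)                      # seeded for reproducibility
sample = noisy_distance(0.25, 0.2, rng)     # one noisy measurement, nf = 0.2
```

The absolute value keeps measured distances non-negative even when the noise draw is large, and setting `nf = 0` recovers the exact distance.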



Figure: four panels (SDP, Regularized SDP, Regularized SDP + Gradient, MDS), each plotting Error (%R) against Radio Range R for m = 4, 6 and 8 anchors.

Variation of estimation error (RMSD) with number of anchors (randomly placed) and varying radio range, nf = 0.2.



Figure: four panels (SDP, Regularized SDP, Regularized SDP + Gradient, MDS), each plotting Error (%R) against Noise Factor nf for R = 0.25, 0.30, 0.35 and 0.40.

Variation of estimation error (RMSD) with measurement noise, 6 anchors randomly placed.



Molecule Conformation



Problem Definition
• Consider a molecule with $n$ atoms with unknown coordinates $x_i \in \mathbb{R}^3$, $i = 1, \dots, n$, in a local coordinate system.
• Input: Upper and lower bounds on some interatomic distances in the molecule. NMR measurements can only provide measurements between atoms less than about 6 Å (1 Å = $10^{-8}$ cm) apart. The pairs of unknown points for which these bounds are known are specified in the edge set $N$.
• We define the lower bound distance matrix $\underline{D} = (\underline{d}_{ij})$, where $\underline{d}_{ij}$ is specified if $(i, j) \in N$, and $\underline{d}_{ij} = 0$ otherwise. The upper bound distance matrix $\bar{D} = (\bar{d}_{ij})$ is defined similarly.
• Output: 3-D structure of the molecule.



Previous Work
• EMBED algorithm, developed by Crippen and Havel (1988).
• Multidimensional scaling by Trosset (1998, 2000, 2002).
• Data Box Algorithm, described by Glunt et al. (1999).
• ABBIE, developed by Hendrickson (1991), exploits the concepts of graph rigidity.
• Global optimization methods: a global smoothing and continuation approach is used in the DGSOL algorithm by More and Wu (1994, 1997); GNOMAD algorithm, developed by Williams et al. (2001).
• Geometric build-up (triangulation) algorithms have been demonstrated in work by Wu et al. (2000, 2003).



Anchor-Free Inequality Based Formulation

Find $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$ such that
  $$\underline{d}_{ij}^2 \le \|x_i - x_j\|^2 \le \bar{d}_{ij}^2, \quad \forall (i, j) \in N$$

Find $Y$ such that
  $$\underline{d}_{ij}^2 \le e_{ij}^T Y e_{ij} \le \bar{d}_{ij}^2, \quad \forall (i, j) \in N$$
  $$Y \succeq 0$$

• A regularization term can also be added.
• A translational invariance constraint is added: $\sum_{i=1}^{n} x_i = 0$, or $Y e = 0$.
• $X$ is obtained from the best rank-$d$ approximation of $Y$, computed from its eigenvalue decomposition.
• If there is enough distance data, $X$ would only be a rotated version of the original set of points.
• The use of SDP for Euclidean distance matrix completion has been explored before by Barvinok (1995), Alfakih et al. (1999), Laurent (2001).



Towards a Distributed Algorithm
• Challenge: Large problem size, thousands of atoms.
• The SDP becomes intractable for $n$ in the hundreds.
• The same problem is experienced for large sensor networks.
• Need a distributed method that is robust to error propagation.
• Fully decentralized methods could be used, but they cannot use the global information efficiently.
• Middle path: Break up the larger problem into subproblems.



Distributed Algorithm with Anchors
• Partition into equal sized clusters. Include each unknown point in the cluster with the closest anchors.
• Solve each cluster separately.
• At each stage, discard badly estimated points and solve again using well estimated points as anchors.
• Anchor free: Need a new clustering method. Also need to ensure all clusters are in a common coordinate system.



Clustering
Reverse Cuthill-McKee ordering permutes the distance matrix to block diagonalize it.

Figure: sparsity pattern of the distance matrix before and after reordering (nz = 630 in both).

Block Diagonalization
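
The reordering idea can be illustrated on a tiny graph. The following is a minimal pure-Python sketch of a reverse Cuthill-McKee ordering (breadth-first search from a low-degree node, visiting neighbors in order of increasing degree, then reversing). It is a textbook-style variant written for illustration, not the routine used to produce the results in this presentation, and the five-node graph is made up.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """adj: list of neighbor sets. Returns a node ordering (new position -> old index)."""
    n = len(adj)
    degree = [len(adj[v]) for v in range(n)]
    visited = [False] * n
    order = []
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(range(n), key=lambda v: (degree[v], v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: (degree[u], u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()
    return order

def bandwidth(adj, order):
    """Largest |position(v) - position(w)| over all edges under the given ordering."""
    pos = {v: p for p, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in range(len(adj)) for w in adj[v]), default=0)

# A path graph 0-3-1-4-2 with scrambled labels (illustrative only).
adj = [{3}, {3, 4}, {4}, {0, 1}, {1, 2}]
identity = list(range(len(adj)))
rcm = reverse_cuthill_mckee(adj)
before = bandwidth(adj, identity)
after = bandwidth(adj, rcm)
```

After reordering, connected points receive nearby indices, so nonzeros concentrate near the diagonal and contiguous index ranges can serve as clusters.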



Stitching

Clusters with common points

• After solving the clusters, use common points between neighboring clusters for stitching.
• Discard bad points using error measures.
• Use good points as anchors in subsequent computations.



Simulations
• Chose molecules with given 3-D coordinates from the Protein Data Bank.
• Only distances between atoms within a cutoff radius $R$ were considered. The cutoff radius $R$ is chosen to be 6 Å, the upper limit for NMR measurements.
• Upper and lower bound distances were generated as follows:
  $$\bar{d}_{ij} = \hat{d}_{ij} (1 + |\bar{\epsilon}_{ij}|)$$
  $$\underline{d}_{ij} = \hat{d}_{ij} \max(0, 1 - |\underline{\epsilon}_{ij}|)$$
  where $\bar{\epsilon}_{ij}, \underline{\epsilon}_{ij} \sim \mathcal{N}(0, \sigma_{ij}^2)$.
• Sometimes, not all distances below 6 Å are known. Therefore, we also consider experiments where only a fraction of the entire distance data is used.
• Accuracy of estimation is measured using Root Mean Square Distance error (with respect to true coordinates).
• Experiments were performed on a Pentium IV 3.0 GHz PC with 2 GB RAM.
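
The bound generation rule above can be sketched in a few lines. The function name, the use of a single `sigma` for every pair, and the seeded generator are assumptions made for illustration.

```python
import random

def distance_bounds(true_d, sigma, rng):
    """Return (lower, upper) bounds around a true distance.

    upper = d * (1 + |eps_upper|), lower = d * max(0, 1 - |eps_lower|),
    with eps_upper and eps_lower drawn independently from N(0, sigma^2).
    """
    upper = true_d * (1.0 + abs(rng.gauss(0.0, sigma)))
    lower = true_d * max(0.0, 1.0 - abs(rng.gauss(0.0, sigma)))
    return lower, upper

rng = random.Random(42)  # seeded for reproducibility
bounds = [distance_bounds(4.0, 0.1, rng) for _ in range(1000)]
```

By construction the true distance always lies inside the generated interval, and the `max(0, ...)` clamp keeps the lower bound non-negative for large noise draws.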



(a) 1HOE (558 atoms) with 40% distances, 1% noise, RMSD = 0.2154 Å

(b) 1PHT (814 atoms) with 50% distances, 5% noise, RMSD = 1.2014 Å

(c) 1RHJ (3740 atoms) with 70% distances, 5% noise, RMSD = 0.9535 Å

(d) 1F39 (1534 atoms) with 85% distances, 10% noise, RMSD = 0.9852 Å



1I7W (8629 atoms) with 100% of distances below 6 Å and 10% noise on upper and lower bounds, RMSD = 1.3842 Å



Results for 100% of distances below 6 Å and 10% noise on upper and lower bounds

PDB ID   No. of atoms   % of total pairwise distances used   RMSD (Å)   CPU time (secs)
1PTQ     402            8.79                                 0.1936     107.7
1PHT     814            5.35                                 1.2624     223.9
1AX8     1003           3.74                                 0.6408     280.1
1F39     1534           2.43                                 0.7338     358.0
1KDH     2923           1.34                                 1.1035     959.1
1RHJ     3740           1.10                                 1.8365     1584.4
1MQQ     5681           0.75                                 1.4906     1461.1



Results with 70% of distances below 6 Å and 5% noise on upper and lower bounds, and with 50% of distances below 6 Å and 1% noise on upper and lower bounds

                        70% distances, 5% noise          50% distances, 1% noise
PDB ID   No. of atoms   RMSD (Å)   CPU time (secs)       RMSD (Å)   CPU time (secs)
1PTQ     402            0.2794     93.8                  0.7560     22.1
1PHT     814            0.4701     129.4                 0.6639     53.6
1AX8     1003           0.8184     251.9                 0.0314     71.8
1F39     1534           1.1271     353.1                 0.2809     113.4
1KDH     2923           2.5693     1641.0                2.8222     488.4
1RHJ     3740           0.9535     1286.1                0.1158     361.5
1MQQ     5681           3.1570     1683.4                2.3108     1466.2



Conclusion
• Introduced an SDP relaxation for the distance geometry problem.
• Proposed techniques to improve performance with noisy distances. Demonstrated effectiveness for sensor network localization.
• Proposed a distributed algorithm for solving larger problems. Demonstrated it on molecule conformation problems.
• Other work: Extensions to angle measurements (Biswas et al. 2005).
• Future research directions:

– Better stitching and error measurement.

– Tighter relaxations: Sum of squares (Nie 2006).

– Explore links with dimensionality reduction (Weinberger et al. 2006, Xiao et al. 2006) and tensegrity theory (So and Ye 2006).



Acknowledgements
• Prof. Yinyu Ye and research group.
• Prof. Kim-Chuan Toh (NUS), Anthony So, Tzu-Chen Liang, Ta-Chung Wang.
• Prof. Stephen Boyd, Prof. Leonidas Guibas and Prof. Fabian Pease.
• Dr. Hamid Aghajan, Dr. Renate Fruchter.
• Friends.
• Family.



Computational Complexity
• Complexity depends on the interior-point algorithm used to solve the SDP.
• For a network of $n$ unknown points, let $k = 3 + |N| + |M|$ be the number of constraints.
• The worst-case number of total arithmetic operations to compute an $\epsilon$-solution of (1) is bounded by $O\left((n^3 + n^2 k + k^3) \sqrt{n + k} \, \log \frac{1}{\epsilon}\right)$.
• $\sqrt{n + k} \, \log \frac{1}{\epsilon}$ is the bound on the worst-case number of interior-point iterations. In practice, the number of iterations is roughly constant at 20 to 30.
• $k$ is bounded by $O(n^2)$. Thus, with the iteration count taken as constant, the worst-case complexity is bounded by $O(n^6)$.
• In practice, as the number of points increases, the number of constraints required to solve for all positions typically scales as $O(n)$. The complexity is then typically bounded by $O(n^3)$.
