Greedy Sparse Learning over Network
Ahmed Zaki, Arun Venkitaraman, Saikat Chatterjee and Lars K. Rasmussen
School of Electrical Engineering and ACCESS Linnaeus Center, KTH Royal Institute of Technology, Stockholm
Abstract—In this article, we develop a greedy algorithm for solving the problem of sparse learning over a right stochastic network in a distributed manner. The nodes iteratively estimate the sparse signal by exchanging a weighted version of their individual intermediate estimates over the network. We provide a restricted-isometry-property (RIP)-based theoretical performance guarantee in the presence of additive noise. In the absence of noise, we show that under certain conditions on the RIP constant of the measurement matrix at each node of the network, the individual node estimates collectively converge to the true sparse signal. Furthermore, we provide an upper bound on the number of iterations required by the greedy algorithm to converge. Through simulations, we also show that the practical performance of the proposed algorithm is better than other state-of-the-art distributed greedy algorithms found in the literature.
I. INTRODUCTION
Sparse learning and distributed algorithms are becoming increasingly relevant in signal processing, machine learning and optimization for big data analysis [1]. Furthermore, for big data analysis with real-time constraints, a requirement is fast execution with limited computational and communication costs. In this context, low complexity distributed algorithms for large-scale sparse learning are of particular interest [2], [3]. Sparse learning is used for various applications where sparsity is promoted in the associated signals/systems. Examples are sparse linear and non-linear regression, sparse coding and representations [4], [5], compressed sensing [6], [7], model selection [8] and machine learning [9], [10], [11]. Many real-world and man-made signals/data are sparse in nature. A sparse signal or datum is typically high-dimensional, with a few non-zero scalars or a few significant scalars. In addition, in the context of big data analysis, signals/data may be expected to reside in a distributed fashion. That means signals/data are not in a centralized place for modeling, but reside in the nodes of a network. To analyze the signals/data, nodes communicate over the network and exchange relevant messages to achieve better performance, for example, estimation of signals from measurements observed at individual nodes. Therefore, the design and theoretical analysis of large-scale distributed sparse learning algorithms over networks is of considerable importance. To realize large-scale distributed sparse learning, greedy algorithms were designed and analyzed recently [12], [13]. A major advantage of greedy algorithms is their low computational complexity and hence their suitability for large-scale scenarios. A limitation is that the theoretical analysis of greedy algorithms has limited tractability, as the associated algorithmic steps are non-convex.
Encouraged by earlier success in designing greedy algorithms for sparse representations [14], [15], [16], [17], [18], we develop a greedy algorithm for sparse
learning over a network. The proposed algorithm has a low communication cost (for information exchange over the network) and a low computational cost at each node (for learning). We also provide a theoretical analysis of the proposed algorithm. The main contributions are as follows:
1) We propose a greedy algorithm for distributed sparse learning that is fast to converge and has a low communication cost.
2) We provide a restricted-isometry-property (RIP)-based theoretical analysis of performance. For noiseless measurements, we show that the estimates at all nodes converge to the true signal under certain theoretical conditions.
3) We provide an upper bound on the number of times nodes exchange information before convergence.
4) Numerical simulations show that greed is good: the proposed greedy algorithm provides competitive performance.
A. System Model
We consider a connected network with L nodes. The neighborhood of node l is defined by the set N_l ⊆ {1, 2, ..., L}. Each node is capable of receiving weighted data from other nodes in its neighborhood. The weights assigned to links between nodes can be written as a network matrix H = {h_lr} ∈ R^{L×L}, where h_lr is the link weight from node r to node l. Typically the weights are non-negative, and h_lr = 0 if there is no link from node r to node l. H is also known as the adjacency matrix in the literature.
Assumption 1: H is a right stochastic matrix.
The motivation for the above assumption is that any non-negative network matrix can be recast as a right stochastic matrix. Our task is to learn or estimate a sparse signal x in a distributed manner over the network. The observation vector y_l ∈ R^{M_l} at node l is modeled as
y_l = A_l x + e_l,
(1)
where A_l ∈ R^{M_l×N} is a measurement or dictionary matrix with M_l < N, and e_l is additive noise that represents measurement noise and/or modeling error.
Assumption 2: The sparse signal x has at most s non-zero scalar elements, and s is known a priori.
The above assumption is used for several greedy sparse learning algorithms, such as subspace pursuit (SP) [19], CoSaMP [15], and iterative hard thresholding (IHT) [20].
B. Literature Review
In this subsection, we provide a review of the relevant literature for sparse learning over networks. First, we discuss the
related problem of distributed compressed sensing. Distributed compressed sensing using convex optimization is addressed in [21], [22]. Based on the alternating-direction-method-of-multipliers (ADMM), distributed basis pursuit [23] and distributed LASSO [24] were realized. Distributed LASSO (D-LASSO) is shown to solve the exact convex optimization problem of a centralized scenario in a distributed fashion. Using adaptive signal processing techniques such as gradient search, distributed sparse learning and sparse regression were realized in [25], [26], [27]. These adaptive algorithms typically use a mean-square-error cost averaged over all the nodes in a network to find an optimal solution via gradient search. Distributed learning and regression are then performed via diffusion of information over a network and adaptation in all individual nodes. Using a Bayesian framework of finding a posterior with sparsity-promoting priors, a distributed-message-passing based method is proposed in [28]. Further, to promote sparsity in solutions, distributed system learning, such as distributed dictionary learning, is also considered in [29], [30]. This work is focused on distributed greedy algorithms. In this regard, a distributed iterative hard thresholding algorithm was proposed in [31] for distributed sparse learning in both static and time-varying networks. Each node of a network finds a local intermediate estimate, and the mean value of the set of estimates is found using a global consensus. A further improvement on [31] was proposed in [32] to provide a reduced communication cost. Another distributed hard thresholding (DiHaT) algorithm was developed in [12], where observations, measurement matrices and local estimates are exchanged over the network. The DiHaT algorithm provides fast convergence compared to D-LASSO and provides competitive performance, but at the expense of a high communication cost.
At this point, we mention that our proposed algorithm exchanges estimates of the underlying sparse signal over the network, unlike [12], where other data is also exchanged. In [12], the authors also proposed a simplified algorithm that only uses estimate exchange, but that algorithm lacks any theoretical analysis. Denoting the signal at node l by x_l, we have so far discussed distributed algorithms that mainly use the system setup of Section I-A. For this model, the measured signals are the same for all the nodes; that means ∀l, x_l = x. There exist other signal models in the literature where the measured signals are not the same for all nodes. For example, the supports of x_l are the same in [33], [34], [35], [36], or the x_l have common and private support and/or signal parts in [37], [38], [13]. We note that this work considers the setup where ∀l, x_l = x.
C. Notations and preliminaries
We use calligraphic letters T and S to denote sets that are subsets of Ω ≜ {1, 2, ..., N}. We use |T| and T^c to denote the cardinality and complement of the set T, respectively. For a matrix A ∈ R^{M×N}, the sub-matrix A_T ∈ R^{M×|T|} consists of the columns of A indexed by i ∈ T. Similarly, for x ∈ R^N, the sub-vector x_T ∈ R^{|T|} is composed of the components of x indexed by i ∈ T. We denote by (·)^t and (·)^† the transpose and pseudo-inverse, respectively. In this work
A^†_T ≜ (A_T)^†. We use ‖·‖ and ‖·‖_0 to denote the standard ℓ2 norm and ℓ0 norm of a vector, respectively. For a sparse signal x = [x_1, x_2, ..., x_i, ..., x_N]^t, the support-set T of x is defined as T = {i : x_i ≠ 0}. We use |x| to denote the element-wise amplitudes of x, and x* denotes the elements of |x| in descending order; x*_j denotes the j'th largest element of |x|. We define a function that finds the support of a vector as follows: supp(x, s) ≜ {the set of indices corresponding to the s largest amplitude components of x}. If x has s non-zero elements, then T = supp(x, s). We use the standard definition of the Restricted Isometry Property of a matrix:
Definition 1 (RIP: Restricted Isometry Property [39]): A matrix A ∈ R^{M×N} satisfies the RIP with Restricted Isometry Constant (RIC) δ_s if
(1 − δ_s) ‖x‖² ≤ ‖Ax‖² ≤ (1 + δ_s) ‖x‖²
holds for all vectors x ∈ R^N such that ‖x‖_0 ≤ s, where 0 ≤ δ_s < 1.
The RIC satisfies the monotonicity property, i.e., δ_s ≤ δ_{2s} ≤ δ_{3s}. Next we provide three lemmas for the later theoretical analysis.
Lemma 1: [40, Lemma 1] For nonnegative numbers a, b, c, d, x, y,
(ax + by)² + (cx + dy)² ≤ (√(a² + c²) x + (b + d) y)².
Lemma 2: [40, Lemma 2] Consider the standard sparse representation model y = Ax + e with ‖x‖_0 = s1. Let S ⊆ {1, 2, ..., N} with |S| = s2. Define x̄ such that x̄_S ← A^†_S y; x̄_{S^c} ← 0. If A has RIC δ_{s1+s2} < 1, then we have
‖(x − x̄)_S‖ ≤ δ_{s1+s2} ‖x − x̄‖ + √(1 + δ_{s2}) ‖e‖,
and
‖x − x̄‖ ≤ (1/(1 − δ²_{s1+s2})) ‖x_{S^c}‖ + (√(1 + δ_{s2})/(1 − δ_{s1+s2})) ‖e‖.
Lemma 3: Consider two vectors x and z with ‖x‖_0 = s1, ‖z‖_0 = s2 and s2 ≥ s1. Let S1 ≜ supp(x, s1) and S2 ≜ supp(z, s2), and let S∇ denote the set of indices of the s2 − s1 smallest magnitude elements in z. Then
‖x_{S∇}‖ ≤ √2 ‖(x − z)_{S2}‖ ≤ √2 ‖x − z‖.
Proof: The proof is shown in Section VI.
The remainder of the paper is organized as follows. We propose the algorithm in Section II. The theoretical guarantees of the algorithm are discussed in Section III. The simulation results are presented in Section IV. The necessary supporting lemmas and detailed analytic proofs are given in Section VI.
II. DISTRIBUTED ALGORITHM - NGP
In this section we propose and describe the distributed greedy algorithm. We refer to the algorithm as network greedy pursuit (NGP). The algorithm is motivated by the algorithmic structures of SP and CoSaMP. The pseudo-code of the NGP algorithm is shown in Algorithm 1. The NGP algorithm is executed at each node of the network.
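The support-finding function supp(x, s) defined in Section I-C, which the algorithm uses repeatedly, is straightforward to realize in code. A minimal sketch, assuming numpy (the example vector is ours):

```python
import numpy as np

def supp(x, s):
    """Return the set of indices of the s largest-amplitude components of x."""
    return {int(i) for i in np.argsort(np.abs(x))[::-1][:s]}

# For an exactly s-sparse vector, supp(x, s) recovers its support-set T.
x = np.array([0.0, -3.0, 0.0, 1.5, 0.0, 0.2])
T = supp(x, 3)  # the support-set {1, 3, 5}
```

Note that when the input has more than s significant entries, this operator also performs the pruning mentioned in Section II.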
Algorithm 1 Network Greedy Pursuit - at the l'th node
Known inputs: y_l, A_l, s, {h_{lj}}_{j=1}^{L}
Initialization:
k ← 0 (k denotes the time index)
r_{l,0} ← y_l (residual at the k'th time)
T̂_{l,0} ← ∅ (support-set at the k'th time)
x̂_{l,0} ← 0 (sparse solution at the k'th time)
Iteration:
repeat
k ← k + 1 (iteration counter)
1: T̀_{l,k} ← supp(A_l^t r_{l,k−1}, s)
2: T̃_{l,k} ← T̀_{l,k} ∪ T̂_{l,k−1}
3: x̃_{l,k} such that x̃_{T̃_{l,k}} ← A^†_{l,T̃_{l,k}} y_l ; x̃_{T̃^c_{l,k}} ← 0
4: x̌_{l,k} ← Σ_{r∈N_l} h_{lr} x̃_{r,k} (information exchange)
5: T̂_{l,k} ← supp(x̌_{l,k}, s)
6: x̂_{l,k} such that x̂_{T̂_{l,k}} ← A^†_{l,T̂_{l,k}} y_l ; x̂_{T̂^c_{l,k}} ← 0
7: r_{l,k} ← y_l − A_l x̂_{l,k}
until stopping criterion
Final output: x̂_l, T̂_l, r_l

If δ3s < 0.362 and there is no noise, then the NGP algorithm achieves an exact estimate of x at every node. Note that if H is an identity matrix (that means no information exchange), then NGP reduces to the standard SP algorithm [19]. According to [40], SP requires δ3s < 0.4859. Naturally, a question arises as to why NGP has a poorer δ3s requirement than SP. The main reason is that a typical RIP-based analysis is a worst-case approach, where upper-bound relaxations are used. As NGP has an information-exchange step in addition to the steps of SP, our RIP-based analysis technique has an inherent limitation: it is unable to provide a better δ3s than SP. At this point, we mention that distributed IHT (DiHaT) is the closest distributed greedy algorithm for sparse learning, and DiHaT requires δ3s < 1/3 = 0.3333 [12].
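To make the steps of Algorithm 1 concrete, here is a minimal, non-optimized Python/numpy sketch, simulated synchronously over all L nodes. Function and variable names are ours, and a real deployment would only exchange the intermediate estimates between neighbors; this is a sketch, not the authors' implementation:

```python
import numpy as np

def supp(x, s):
    # indices of the s largest-amplitude components of x (Steps 1 and 5)
    return np.argsort(np.abs(x))[::-1][:s]

def ls_on_support(A, y, T, N):
    # least-squares fit restricted to the columns indexed by T; zero elsewhere
    x = np.zeros(N)
    x[T] = np.linalg.pinv(A[:, T]) @ y
    return x

def ngp(y, A, H, s, num_iter=30):
    """Sketch of NGP: y and A are lists of per-node observations y_l and
    matrices A_l; H is an L x L right stochastic network matrix."""
    L, N = len(y), A[0].shape[1]
    x_hat = [np.zeros(N) for _ in range(L)]
    T_hat = [np.array([], dtype=int) for _ in range(L)]
    r = [y[l].copy() for l in range(L)]
    for _ in range(num_iter):
        x_til = []
        for l in range(L):
            T_grave = supp(A[l].T @ r[l], s)                   # Step 1
            T_til = np.union1d(T_grave, T_hat[l]).astype(int)  # Step 2
            x_til.append(ls_on_support(A[l], y[l], T_til, N))  # Step 3
        for l in range(L):
            x_check = sum(H[l, j] * x_til[j] for j in range(L))  # Step 4
            T_hat[l] = supp(x_check, s)                          # Step 5
            x_hat[l] = ls_on_support(A[l], y[l], T_hat[l], N)    # Step 6
            r[l] = y[l] - A[l] @ x_hat[l]                        # Step 7
    return x_hat

# Usage sketch (noiseless, hypothetical sizes): 3 nodes, uniform weights.
rng = np.random.default_rng(0)
N, M, s, L = 50, 25, 3, 3
x_true = np.zeros(N)
x_true[[4, 17, 30]] = [1.0, -2.0, 0.5]
A = [rng.standard_normal((M, N)) / np.sqrt(M) for _ in range(L)]
y = [A[l] @ x_true for l in range(L)]
H = np.full((L, L), 1.0 / L)  # right (in fact doubly) stochastic
x_hat = ngp(y, A, H, s, num_iter=10)
```

In the noiseless regime this sketch typically recovers x exactly at every node, in line with the consensus result stated above.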
Denoting k as the iteration counter for information exchange, node l receives intermediate estimates from its neighboring nodes. The received intermediate estimates are denoted by {x̃_{r,k}}, r ∈ N_l. Then node l computes a weighted sum (see Step 4 of the algorithm). The support-finding function supp(·) incorporates pruning so that the a-priori knowledge of the sparsity level is used. Note that x̃_{l,k} and x̌_{l,k} can both have a sparsity level higher than s, and pruning helps to sparsify. In practice, examples of stopping criteria for the algorithm are a predetermined number of iterations and a non-decreasing ℓ2-norm of the residual r_{l,k}. Note that the NGP algorithm is not designed for consensus across nodes. The algorithm is designed to achieve better performance via cooperation using information exchange over the network, compared to the case of no cooperation.
Now we state the main theoretical result for the NGP algorithm.
Main result: The NGP algorithm follows the recurrence inequalities
Σ_{l=1}^{L} ‖x − x̂_{l,k}‖ ≤ Σ_{l=1}^{L} ϱ_l ‖x − x̂_{l,k−1}‖ + Σ_{l=1}^{L} ς_l ‖e_l‖,
Σ_{l=1}^{L} ‖x_{T̂^c_{l,k}}‖ ≤ Σ_{l=1}^{L} ϱ_l ‖x_{T̂^c_{l,k−1}}‖ + Σ_{l=1}^{L} ϑ_l ‖e_l‖,
where ϱ_l, ς_l, ϑ_l are known constants in terms of h_lr and the δ3s of A_l. The recurrence inequalities bound the performance of the algorithm at iteration k. Under certain technical conditions, the NGP algorithm converges in a finite number of iterations. At convergence, the performance of the algorithm is upper bounded as
∀l, ‖x − x̂_l‖ ≤ constant × ‖e‖max,
where the scalar ‖e‖max ≜ max_l ‖e_l‖. In particular, one of the technical conditions is a requirement that the δ3s of every A_l has to be bounded as δ3s < 0.362.
III. THEORETICAL ANALYSIS
In this section, we show and prove the main theoretical results. For notational clarity, we use the RIC constant δ3s ≜ max_l δ3s(A_l).
A. On convergence
To prove convergence of the NGP algorithm, we first need a recurrence inequality.
Theorem 1 (Recurrence inequality): At iteration k, we have
Σ_{l=1}^{L} ‖x − x̂_{l,k}‖ ≤ c1 Σ_{l=1}^{L} w_l ‖x − x̂_{l,k−1}‖ + Σ_{l=1}^{L} d3 w_l ‖e_l‖,
Σ_{l=1}^{L} ‖x_{T̂^c_{l,k}}‖ ≤ c1 Σ_{l=1}^{L} w_l ‖x_{T̂^c_{l,k−1}}‖ + d4 Σ_{l=1}^{L} w_l ‖e_l‖,
where
w_l = Σ_r h_{rl},
c1 = √(2δ3s²(3 − δ3s²)) / (1 − δ3s²),
d1 = (√2(1 − δ3s) + √(1 + δ3s)) / (1 − δ3s), d2 = √(1 + δ2s),
d3 = √(2/(1 − δ3s²)) (d1 + d2) + √(1 + δ2s) / (w_l (1 − δ3s)), and
d4 = √2 (d1 + d2) + √(2δ3s²(3 − δ3s²)(1 + δ2s)) / ((1 − δ3s²)(1 − δ3s)).
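As a numeric sanity check on these constants (a sketch; c1 is as defined above), the condition δ3s < 0.362 appearing throughout the convergence results is precisely the point where c1 crosses 1, i.e., where the recurrence stops contracting; it is the positive root of 2δ²(3 − δ²) = (1 − δ²)²:

```python
import math

def c1(delta_3s):
    # c1 from Theorem 1; the recurrence contracts only when c1 < 1
    return math.sqrt(2 * delta_3s**2 * (3 - delta_3s**2)) / (1 - delta_3s**2)

below = c1(0.36)   # slightly below 1
above = c1(0.37)   # slightly above 1
mid = c1(0.2)      # roughly 0.507, used in the numerical examples below
```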
This recurrence inequality is the one stated in the main result in Section II. The detailed proof of the above theorem is shown in Section VI. Next, we use the intermediate steps in the proof of Theorem 1 to address convergence. We show convergence of NGP in two alternative ways, via the following two theorems.
Theorem 2 (Convergence): If δ3s < 0.362 and ‖e‖max ≤ γ x*_s, then after k̄ ≤ cs iterations, the NGP algorithm converges and its performance is bounded by
‖x − x̂_{l,k̄}‖ ≤ d ‖e‖max,
where c = log(16/c1²)/log(1/c1²), d = 2/√(1 − δ3s), and γ < 1 are positive constants in terms of δ3s. Under these conditions, the estimated support-sets across all nodes are correctly identified, that is, ∀l, T̂_l = supp(x, s) = T.
The detailed proof of the above theorem is shown in Section VI. For an interpretation of the above theorem, we provide
a numerical example. If δ3s ≤ 0.2, then we have c ≤ 3 and d ≤ 2.23; for an appropriate γ such that ‖e‖max ≤ γ x*_s, the performance ‖x − x̂_{l,k̄}‖ is upper bounded by 2.23 ‖e‖max after 3s iterations.
Theorem 3 (Convergence): If δ3s < 0.362 and ‖x‖/‖e‖max > 1, then after k̄ = ⌈log(‖x‖/‖e‖max) / log(1/c1)⌉ iterations, the NGP algorithm converges and its performance is bounded by
‖x − x̂_{l,k̄}‖ ≤ d ‖e‖max,
where d = 1 + (1/(1 − c1)) (√(2/(1 − δ3s²)) (d1 + d2) + √(1 + δ2s)/(1 − δ3s)).
The detailed proof of the above theorem is shown in Section VI. A relevant numerical example is as follows: if ‖x‖/‖e‖max = 20 dB and δ3s ≤ 0.2, then we have k̄ = 7 and d = 15.62. These convergence results are the ones stated in the main result in Section II.
Corollary 1: Consider the special case of a doubly stochastic network matrix H. If δ3s < 0.362 and (L‖x‖)/(Σ_{l=1}^{L} ‖e_l‖) > 1, then after k̄ = ⌈log(L‖x‖ / Σ_{l=1}^{L} ‖e_l‖) / log(1/c1)⌉ iterations, the performance of the NGP algorithm is bounded as
Σ_{l=1}^{L} ‖x − x̂_{l,k̄}‖ ≤ d Σ_{l=1}^{L} ‖e_l‖,
where d = 1 + (1/(1 − c1)) (√(2/(1 − δ3s²)) (d1 + d2) + √(1 + δ2s)/(1 − δ3s)).
The above result follows from using the doubly stochastic network matrix assumption in the proof of Theorem 3.
Remark 1: As 0 < γ < 1, the technical condition ‖e‖max ≤ γ x*_s is stricter than the other technical condition ‖x‖/‖e‖max > 1. That is, if ‖e‖max ≤ γ x*_s holds, then ‖x‖/‖e‖max > 1 holds, but not vice-versa. Still, the former technical condition leads to a tighter upper bound on performance.
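The iteration count in Theorem 3's numerical example can be reproduced directly (a sketch; we read the 20 dB figure as 10 log10 of the norm ratio, i.e., a ratio of 100):

```python
import math

def c1(d):
    # c1 from Theorem 1
    return math.sqrt(2 * d**2 * (3 - d**2)) / (1 - d**2)

# Theorem 3: k_bar = ceil(log(||x|| / ||e||max) / log(1 / c1))
ratio = 100.0  # ||x|| / ||e||max = 20 dB
k_bar = math.ceil(math.log(ratio) / math.log(1 / c1(0.2)))
print(k_bar)  # -> 7
```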
B. Without additive noise
In this subsection, we consider the case where there is no additive noise in setup (1), that is, e_l = 0. We show that all nodes in the network learn the same signal perfectly, that is, ∀l, x̂_l = x. In other words, the NGP algorithm in this case achieves consensus over the network.
Theorem 4: If δ3s < 0.362, then after k̄ ≤ cs iterations, the NGP algorithm provides the same x in all nodes, where c = log(4/c1²)/log(1/c1²).
An alternative bound on the number of iterations is presented in the following theorem.
Theorem 5: If δ3s < 0.362, then after k̄ = ⌈log(1/ρ_min)/log(1/c1) + ε⌉ iterations, the NGP algorithm provides the same x in all nodes, where ρ_min = min_{i∈T} |x_i| / ‖x‖ and ε > 0 is an arbitrarily chosen constant.
The proofs of the above two theorems are given in Section VI.
Remark 2: From Theorem 4, if δ3s < 0.2, then we have c ≤ 2. According to Theorem 5, for a binary sparse signal where the non-zero elements are set to one, and δ3s < 0.2, we have k̄ ≈ 0.74 log(s).
Corollary 2: In the absence of noise, if δ3s < 0.362, the number of iterations required for convergence of the NGP algorithm is upper bounded by
k̄ ≤ min { (log(4/c1²)/log(1/c1²)) s , log(1/ρ_min)/log(1/c1) + ε }.
The above result follows from Theorems 4 and 5.
IV. SIMULATION RESULTS
In this section, we study the performance of the NGP algorithm. We first describe the simulation setup and then show the simulation results.
A. Simulation Setup
In the simulations, we use two kinds of network matrices: right stochastic and doubly stochastic. We have already mentioned in Section I-A that a right stochastic matrix is more general. The use of a doubly stochastic matrix is mainly for comparison with DiHaT. We loosely define a term called 'degree of the network' to represent how densely the network is connected. Let us denote the degree of a network by d, which means that each node of the network is connected to d other nodes, or ∀l, |N_l| = d. In other words, each row of the edge matrix of the network contains d ones. Given a random edge matrix, a right stochastic matrix can be easily generated. For doubly stochastic matrix generation, we use a standard setup of optimizing the second largest eigenvalue modulus (SLEM) of the matrix. This is known as the SLEM optimization problem in the literature [41]. Let E denote an edge matrix, µ(H) the SLEM of H, and 1 a column of ones. We use the following convex optimization problem for generating a doubly stochastic H:
arg min_H µ(H) subject to H1 = 1, H^t = H, ∀(i,j) h_ij ≥ 0, and h_ij = 0 if (i,j) ∉ E.
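A right stochastic H is easy to generate from an edge matrix by row normalization. A minimal sketch (the doubly stochastic case requires solving the SLEM problem above with a convex solver, which we do not reproduce here):

```python
import numpy as np

def right_stochastic_from_edges(E):
    """Row-normalize a binary edge matrix E (with self-loops) so that
    each row sums to 1, giving a right stochastic network matrix H."""
    E = np.asarray(E, dtype=float)
    return E / E.sum(axis=1, keepdims=True)

# Hypothetical example: a ring of L = 4 nodes, each linked to itself
# and to one neighbor, so each row of E contains two ones.
E = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]])
H = right_stochastic_from_edges(E)  # each row of H sums to 1
```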
We perform Monte-Carlo simulations using the generative model (1), where we generate y_l by randomly drawing A_l, the sparse signal x, and additive Gaussian noise e_l. The task is to estimate or learn x from {y_l, ∀l}. We use binary and Gaussian sparse signals, meaning that the non-zero components of the sparse signal are either ones or i.i.d. Gaussian scalars. The use of binary and Gaussian sparse signals was shown in [19], [18]. We use the mean signal-to-estimation-noise ratio (mSENR) as a performance metric, defined as
mSENR = (1/L) Σ_{l=1}^{L} E{‖x‖²} / E{‖x − x̂_l‖²}.
Here E{·} denotes expectation, which in practice is computed by sample averaging. For ease of simulation, we consider all A_l to be of the same size, and thus ∀l, M_l = M. We define the signal-to-noise ratio (SNR) for node l as
SNR_l = E{‖x‖²} / E{‖e_l‖²}.
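For one Monte-Carlo realization, the metric can be computed as follows (a sketch, assuming numpy; the expectations in the definition are replaced by sample averages over realizations, and the estimates here are hypothetical):

```python
import numpy as np

def msenr_db(x, x_hat_list):
    """Empirical mSENR (in dB) for one realization: the average over
    nodes of ||x||^2 / ||x - x_hat_l||^2, expressed in decibels."""
    ratios = [np.sum(x**2) / np.sum((x - xh)**2) for xh in x_hat_list]
    return 10 * np.log10(np.mean(ratios))

# Toy check with two hypothetical node estimates:
x = np.array([1.0, 0.0, -2.0])
x_hats = [x + 0.01, x - 0.02]
val = msenr_db(x, x_hats)
```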
We assume that SNRl is the same across all nodes; hence we drop the subscript l and write SNR. In [12], the greedy
Fig. 1: mSENR performance of NGP and DiHaT with respect to SNR, Binary and Gaussian sparse data, M=100, N=500, s=20, L = 20, d = 4, number of information exchange = 30. Network matrix is right stochastic.
Fig. 2: Probability of perfect support-set estimation (PSE) of NGP and DiHaT with respect to sparsity level, M = 100, N = 500, L = 20, d = 4, number of information exchange = 30, No observation noise.
DiHaT algorithm is compared with the convex D-LASSO [24], and D-LASSO is shown to provide slow convergence (see the simulation results in [12]); DiHaT is shown to provide much faster convergence. In our simulations, we compare NGP with DiHaT. For both algorithms, the stopping criterion is a maximum allowable number of iterations, which we set to 30. This number is motivated by the experiment in Section IV-E. For all experiments described below, we use observation dimension M = 100, signal dimension N = 500, L = 20 nodes in the network, and degree of the associated network d = 4.
B. Performance for a right stochastic matrix
Here we show the performance of NGP for a right stochastic network matrix. Fig. 1 shows mSENR versus SNR for both binary and Gaussian sparse signals, where the sparsity level is s = 20. It can be seen that the mSENR of NGP grows linearly with SNR in dB scale. For this simulation setup, we do not show the performance of DiHaT, as it is designed for a doubly stochastic network matrix and we found its performance here to be poor. As a comparison, we also show the performance of NGP for the case where the network matrix H is the identity matrix (d = 1). It was mentioned earlier that for this H, NGP reduces to the classical SP algorithm. It can be seen that the mSENR versus SNR performance for this case is quite poor and does not improve with SNR. It is evident from this study that NGP improves performance through cooperation among nodes.
C. Phase transition experiment
In this case, we are interested in the probability of perfect signal estimation. For the noiseless case, this is equivalent to the probability of perfect support-set estimation (PSE) in all nodes. From this experiment onward, we use a doubly stochastic network and compare NGP with DiHaT. We generate a doubly stochastic matrix and fix it for all the experiments. For the phase transition experiment, we vary the sparsity level s and evaluate the probability of PSE in a frequentist manner, that is, as the fraction of Monte-Carlo runs in which PSE occurred. We perform the experiment for the noiseless condition. Fig. 2 shows the probability of PSE for NGP and DiHaT. It is evident that NGP can recover signals with a higher sparsity level than DiHaT. This can be explained partly by the weaker RIC requirement, δ3s < 0.362 for NGP as compared to δ3s < 1/3 = 0.333 for DiHaT.
D. Experiment on robustness to measurement noise
In this experiment, we check how the algorithm performance varies with respect to SNR. We use binary and Gaussian sparse signals with s = 20. We plot the mSENR versus SNR in Fig. 3. It can be seen that the performance of NGP is significantly better than that of DiHaT, most prominently in the high SNR region.
E. Experiment on convergence speed
In this experiment, we observe how fast the algorithms converge. Fast convergence leads to less usage of communication and computational resources, and less time delay in learning. Here communication resources refers to the cost of transmitting and receiving estimates among nodes. We compare NGP and DiHaT for the noiseless condition and 30 dB SNR. In this case we use s = 20. The results are shown in Fig. 4, where we show mSENR versus the number of iterations. We note that NGP has significantly quicker convergence. In our experiments, NGP typically achieved convergence within five iterations.
F. Experiment on sensitivity to sparsity level
So far, in all previous experiments, NGP has demonstrated a performance advantage over DiHaT. A natural
Fig. 3: mSENR performance of NGP and DiHaT with respect to SNR, M = 100, N = 500, s = 20, L = 20, d = 4, and number of information exchange = 30.
Fig. 4: mSENR performance of NGP and DiHaT with respect to number of information exchange, M = 100, N = 500, s = 20, L = 20, d = 4, and SNR = 30dB.
question arises as to what happens if the sparsity level s is not exactly known. In this section, we experiment to find the sensitivity of the NGP and DiHaT algorithms to knowledge of the sparsity level. For this, we use 30 dB SNR and a Gaussian sparse signal with s = 20. Fig. 5 shows the results for different assumed values of s, namely s = 18, 20, 25, and 30. While NGP shows better performance in the regime of a lower number of iterations, it is difficult to conclude its superiority over DiHaT in the regime of a higher number of iterations.
G. Reproducible research
In the spirit of reproducible research, we provide relevant Matlab codes at the link https://www.kth.se/ en/ees/omskolan/organisation/avdelningar/commth/research/ software and the link https://sites.google.com/site/saikatchatt/ softwares. The code produces the results shown in the figures.
Fig. 5: Sensitivity performance of NGP and DiHaT with respect to knowledge of the sparsity level, Gaussian sparse data, M = 100, N = 500, s = 20, L = 20, d = 4, SNR = 30dB.
V. CONCLUSIONS
In this work, we proposed a greedy algorithm, termed the NGP algorithm, to iteratively estimate a sparse signal over a connected network. The nodes of the network exchange intermediate estimates with their neighbors at every iteration. The nodes perform a weighted average of the received estimates to improve their local estimates. We used an RIP-based analysis to show that, under the condition of no noise at the nodes, the estimates converge to the true value of the sparse signal. The limitation of the RIP-based analysis becomes prominent as the convergence conditions become stricter with an increasing number of steps in the algorithm. We also demonstrated the performance of the NGP algorithm under various typical scenarios using simulations. It was shown that NGP has fast convergence, that is, the performance saturates within five iterations. The quick convergence of NGP makes it a useful tool in sparse learning for big data.
It can be easily seen that Steps 1 and 2 for the NGP are similar to the SP algorithm. Therefore, from Lemma 4, we can write the following inequality for Step 2, p √ ˆ l,k−1 k+ 2(1 + δ2s )kel k. kxT˜ c k≤ 2δ3s kx − x (7) l,k
Next, using Lemma 2, after Step 3, we have s √ 1 1 + δ2s ˜ l,k k ≤ kx − x kxT˜ c k+ kel k 2 l,k 1 − δ3s 1 − δ3s s 2 (7) 2δ3s ˆ (8) ≤ 2 kx − xl,k−1 k+d1 kel k. 1 − δ3s √ √ 2(1−δ3s )+ 1+δ3s where d1 = . Next, we bound kxTˆ c k in 1−δ3s l,k ˆ ˜ Step 5. Define, ∇Tl,k , { ∪ Tr,k } \ Tˆl,k . Then, we have r∈Nl
2
kxTˆ c k =kx∇Tˆl,k k2 +kx X
hlr xT˜ c k2 ,
(9)
r,k
r∈Nl
VI. T HEORETICAL PROOFS
where (a) follows from the assumption that ∀l,
A. Proof of Lemma 3
P
hlr = 1.
r∈Nl
Consider the following relation, = kxS1 ∩S∇ + (z − x)S1 ∩S∇ k ≥ kxS1 ∩S∇ k−k(z − x)S1 ∩S∇ k.
k2
r∈Nl
(a)
≤ kx∇Tˆl,k k2 +k
kzS1 ∩S∇ k
c ∩ T˜r,k
l,k
(2)
Also, from Lemma 3, we have √ ˇ l,k k kx∇Tˆl,k k ≤ 2kx − x X X (a) √ ˜ r,k k = 2k hlr x − hlr x
Rearranging the terms in the above equation gives
r∈Nl
r∈Nl
√ X ˜ r,k k, ≤ 2 hlr kx − x
(b)
kxS1 ∩S∇ k
≤ kzS1 ∩S∇ k+k(z − x)S1 ∩S∇ k.
(3)
Define S´ , S2 \ S∇ as the set of indices of the s1 highest magnitude elements in z. Now, we can write, kzS1 ∩S∇ k2
= kzS1 ∩S∇ k2 +kzS´k2 −kzS´k2 ≤ kzS2 k2 −kzS´k2 = kzS2 \S1 k2 +kzS1 k2 −kzS´k2 ≤ kzS2 \S1 k2 ,
(4)
(5)
From (3) and (5), and from the fact that (S2 \S1 )∩(S1 ∩S∇ ) = ∅, we have kxS∇ k
= kxS1 ∩S∇ k ≤ k(x − z)S2 \S1 k+k(x − z)S1 ∩S∇ k √ ≤ 2k(x − z)(S2 \S1 )∪(S1 ∩S∇ ) k √ ≤ 2k(x − z)S2 k.
Combining (9) and (10), we have #2 " (7) √ P 2 ˜ r,k k 2 hlr kx − x kxTˆ c k ≤ l,k r∈Nl # " 2 √ p P ˆ r,k−1 k+ 2(1 + δ2s )ker k hlr + 2δ3s kx − x r∈Nl " # q 2 2 (8) √ P 2δ3s ˆ r,k−1 k+d1 ker k ≤ 2 hlr 2 kx − x 1−δ3s r∈Nl # " √ 2 p P ˆ r,k−1 k+ 2(1 + δ2s )ker k + hlr 2δ3s kx − x . r∈Nl
(6)
B. Proof of Theorem 1 We consider the bounds on the estimates in the k’th iteration at node l. The following lemma was proved for the SP algorithm in [40]. Lemma 4: [40, Lemma 3] In steps 1 and 2 of SP, we have p √ ˆ l,k−1 k+ 2(1 + δ2s )kel k. kxT˜ c k≤ 2δ3s kx − x l,k
where P (a) and (b) follows from the assumption that ∀l, hlr = 1 and ∀l, r, the value, hlr ≥ 0 respectively. r∈Nl
where we used the highest magnitude property of S´ in the last step. The above equation can be written as kzS1 ∩S∇ k≤ kzS2 \S1 k= k(z − x)S2 \S1 k.
(10)
r∈Nl
Applying Lemma 1 and simplifying, we get, q 2 2 ) P 2δ3s (3−δ3s ˆ r,k−1 k kxTˆ c k≤ hlr kx − x 2 1−δ3s l,k r∈Nl √ P + 2(d1 + d2 ) hlr ker k.
(11)
r∈Nl
Also, from Lemma 2, we have q √ 1+δ2s 1 ˆ l,k k ≤ 1−δ kx − x 2 kxT ˆ c k+ 1−δ3s kel k. 3s
l,k
From equation (11) and (12), we can write √ P 1+δ2s ˆ l,k k≤ c1 ˆ r,k−1 k+ 1−δ kx − x hlr kx − x kel k 3s r∈N ql P 2 + 1−δ hlr ker k. 2 (d1 + d2 ) 3s
r∈Nl
(12)
8
Summing up thePabove equation ∀l and using the network assumption that hrl = wl , we get,
Then, at iteration k2 , we can write the above equation in the vector form as,
r L P
ˆ l,k k≤ c1 kx − x
L P
ˆ l,k−1 k+ wl kx − x
d3 wl kel k.
To prove the second result, we consider (11), which after summing ∀l can be written as q 2 l L 2 ) P P 2δ3s (3−δ3s ˆ l,k−1 k kxTˆ c k≤ wl kx − x 2 1−δ3s l,k l=1 l=1 (13) L √ P + 2(d1 + d2 ) wl kel k. l=1
Substituting k − 1 in (12), we have q 1 ˆ l,k−1 k ≤ 1−δ kx − x 2 kxT ˆc
kxTˆ c k≤ c1
L X
l,k
l=1
wl kxTˆ c
l,k−1
l=1
k+
1+δ2s 1−δ3s kel k.
k+d4
L X
wl kel k.
then, T˜l,k2 , ∀l contains π({1, 2, . √ . . , p + q}) where, c2 = q 2 (1+δ ) √ 2 2δ3s 2δ3s 2s and d5 = 2(1 + δ2s ) + . 1−δ 2 1−δ 2 3s
r,k−1
r∈Nl
h it t Let kxTˆ c k , kxTˆ c k. . . kxTˆ c k , kek , [ke1 k. . . keL k] . 1,k L,k k We can vectorize the above equation to write kxTˆ c k ≤ c1 H kxTˆ c k + d4 Hkek. k−1
From equation (7) and (12), we can write l,k
l,k−1
t
Let kekmax = maxkel k and kekmax , [kekmax . . . kekmax ] . l
Then, we can bound kxT˜ c k as
k+d5 kel k.
c2 (c1 H)k2 −k1 −1 kxTˆ c k
k1 + c2 d4 H IL + . . . + (c1 H)k2 −k1 −2 + d5 kekmax .
max
h it where kx∗{p+1,...,s} k = kx∗{p+1,...,s} k. . . kx∗{p+1,...,s} k . Alternately, for any l, we can write c2 d4 kxT˜ c k≤ c2 c1k2 −k1 −1 kx∗{p+1,...,s} k+ 1−c + d5 kekmax . 1 l,k2
Now, it can be easily seen that if x∗p+q ≥ c2 c1k2 −k1 −1 kx∗{p+1,...,s} k+
(15)
c2 d4 1−c1
+ d5 kekmax ,
then π({1, 2, . . . , p + q}) ⊂ T˜l,k2 , ∀l. Next, we find the condition that π({1, 2, . . . , p+q}) ⊆ Tˆl,k . Lemma 6: Consider k1 , k2 such that k1 < k2 . If ∀l, π({1, 2, . . . , p}) ⊆ Tˆl,k1 , π({1, 2, . . . , p + q}) ⊆ T˜l,k2 and c3 d4 + d6 kekmax , x∗p+q ≥ c3 c1k2 −k1 −1 kx∗{p+1,...,s} k+ 1−c 1 ˆl,k , ∀l contains π({1, 2, . . . , p + q}). The constants, then Tq 2 q √ 2 (1+δ ) 4δ 2 4δ3s 2s c3 = (1−δ3s and d = + 2d1 . 2 )2 6 (1−δ 2 )(1−δ3s )2 3s
3s
Proof It is sufficient to prove that the s highest magnitude ˇ l,k contains the indices π(j) for j ∈ {1, 2, . . . , p + indices of x q}. Mathematically, we need to prove, min j∈{1,2,...,p+q}
Proof From (11) and (14), we can write X X kxTˆ c k≤ c1 hlr kxTˆ c k+d4 hlr ker k.
kxT˜ c k≤ c2 kxTˆ c
k1 + c2 d4 H IL + . . . + (c1 H)k2 −k1 −2 + d5 kek.
k2
Let π be the permutation of indices of x such that |xπ(j) |= x∗j where x∗i ≥ x∗j for i ≤ j. In other words, x∗ is x sorted in the descending order of magnitude. The proof follows in two steps: (1) We find the condition such that π({1, 2, . . . , p + q}) ⊆ Tˆl,k2 assuming that π({1, 2, . . . , p}) ⊆ Tˆl,k1 , where k2 > k1 , (2) We use the above condition to bound the number of iterations of the NGP algorithm. The condition in step-1 is a consequence of the following two lemmas. First, Lemma 5 derives the condition that π({1, 2, . . . , p+q}) ⊆ T˜l,k , i.e., the desired indices are selected in Step 2 of NGP. Lemma 5: Consider k1 , k2 such that k1 < k2 . If ∀l, π({1, 2, . . . , p}) ⊂ Tˆl,k1 and c2 d4 x∗p+q ≥ c2 c1k2 −k1 −1 kx∗{p+1,...,s} k+ 1−c + d 5 kekmax , 1
k
c2 (c1 H)k2 −k1 −1 kxTˆ c k
From the assumptions, we have for any l, kxTˆ c k≤ l,k1 kx∗{p+1,...,s} k. Using the right stochastic property of H, we can write, c2 d4 kxT˜ c k ≤ c2 ck12 −k1 −1 kx∗{p+1,...,s} k + 1−c + d kek 5 1
C. Proof of Theorem 2
r∈Nl
k2
(14)
l,k
kxT˜ c k ≤
k2
l=1
3s
(16)
it h where kxT˜ c k , kxT˜ c k. . . kxT˜ c k . Applying (15) repeat1,k L,k k edly to (16), we can write for k1 < k2 ,
kxT˜ c k ≤ √
Combining (13) and (14), we get L X
k + d5 kek,
k2
l,k−1
3s
k2 −1
k2
l=1
l=1
l=1
kxT˜ c k ≤ c2 kxTˆ c
L P
k(ˇ xl,k )π(j) k> maxc k(ˇ xl,k )d k, ∀l. d∈T
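Before bounding the two sides of (17), note that the repeated collapse of the matrix bounds above into scalar conditions rests only on $H$ being right stochastic: $H\mathbf{1} = \mathbf{1}$, so $(c_1 H)^m$ acts on constant vectors as the scalar $c_1^m$, and the geometric series of matrix powers sums like an ordinary geometric series. A quick numerical sanity check (illustrative sizes and constants, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
L, c1, m = 6, 0.7, 8
H = rng.random((L, L))
H /= H.sum(axis=1, keepdims=True)           # right stochastic: H @ ones = ones

ones = np.ones(L)
lhs = np.linalg.matrix_power(c1 * H, m) @ ones
print(bool(np.allclose(lhs, (c1 ** m) * ones)))   # (c1 H)^m 1 = c1^m 1

# Geometric series I + (c1 H) + ... + (c1 H)^(m-1) acting on a constant vector
series = sum(np.linalg.matrix_power(c1 * H, j) for j in range(m)) @ ones
print(bool(np.allclose(series, (1 - c1 ** m) / (1 - c1) * ones)))
```

This is exactly why factors such as $(c_1 H)^{k_2-k_1-1}$ and $c_2 d_4 H(I_L + \ldots)$ reduce to $c_1^{k_2-k_1-1}$ and $c_2 d_4/(1-c_1)$ once the bounding vector is constant (e.g., $\|\mathbf{e}\|_{\max}$).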
The LHS of (17) can be written as
$$\|(\check{x}_{l,k})_{\pi(j)}\| \overset{(a)}{=} \left\|\left(\sum_{r \in \mathcal{N}_l} h_{lr}\,\tilde{x}_{\tilde{T}_{r,k}}\right)_{\pi(j)}\right\| \overset{(b)}{=} \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\,x_{\pi(j)} + \sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{\pi(j)}\right\|$$
$$\geq \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\,x_{\pi(j)}\right\| - \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{\pi(j)}\right\| \geq x_{p+q}^{*} - \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{\pi(j)}\right\|,$$
where (a) and (b) follow from the fact that $\pi(\{1, 2, \ldots, p + q\}) \subset \tilde{T}_{l,k_2}$, $\forall l$. Similarly, the RHS of (17) can be bounded as
$$\|(\check{x}_{l,k})_{d}\| = \left\|\left(\sum_{r \in \mathcal{N}_l} h_{lr}\,\tilde{x}_{\tilde{T}_{r,k}}\right)_{d}\right\| = \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\,x_{d} + \sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{d}\right\| = \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{d}\right\|,$$
since $x_d = 0$ for $d \in T^{c}$. Using the above two bounds, the condition (17) can now be written as
$$x_{p+q}^{*} > \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{\pi(j)}\right\| + \left\|\sum_{r \in \mathcal{N}_l} h_{lr}\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{d}\right\|. \qquad (18)$$

Define the RHS of the required condition (18) at node $l$ as $\mathrm{RHS}_l$. Then, we can write the sufficient condition as
$$\mathrm{RHS}_l \leq \sqrt{2}\sum_{r \in \mathcal{N}_l} h_{lr}\left\|\left(\tilde{x}_{\tilde{T}_{r,k}} - x_{\tilde{T}_{r,k}}\right)_{\{\pi(j), d\}}\right\| \leq \sqrt{2}\sum_{r \in \mathcal{N}_l} h_{lr}\|x - \tilde{x}_{r,k}\|.$$
From the above equation and (8), we can bound $\mathrm{RHS}_l$ as
$$\mathrm{RHS}_l \leq \sum_{r \in \mathcal{N}_l} h_{lr}\left[\sqrt{\frac{4\delta_{3s}^2}{1 - \delta_{3s}^2}}\,\|x - \hat{x}_{r,k-1}\| + \sqrt{2}\,d_1\|e_r\|\right] \overset{(14)}{\leq} c_3\sum_{r \in \mathcal{N}_l} h_{lr}\|x_{\hat{T}_{r,k-1}^{c}}\| + d_6\sum_{r \in \mathcal{N}_l} h_{lr}\|e_r\|.$$
At iteration $k_2$, $\mathrm{RHS}_l$ can be vectorized as
$$\mathrm{RHS} = \left[\mathrm{RHS}_1 \ldots \mathrm{RHS}_L\right]^{t} \leq c_3 H\,\|x_{\hat{T}_{k_2-1}^{c}}\| + d_6 H\,\|e\|.$$
Applying (15) repeatedly, we can write for $k_1 < k_2$,
$$\mathrm{RHS} \leq c_3 (c_1 H)^{k_2 - k_1 - 1} H\,\|x_{\hat{T}_{k_1}^{c}}\| + c_3 d_4\left(I_L + \ldots + (c_1 H)^{k_2 - k_1 - 2}\right)H\,\|e\| + d_6 H\,\|e\| \overset{(a)}{\leq} c_3 c_1^{k_2 - k_1 - 1}\|\mathbf{x}_{\{p+1,\ldots,s\}}^{*}\| + \left(\frac{c_3 d_4}{1 - c_1} + d_6\right)\|\mathbf{e}\|_{\max},$$
where (a) follows from the assumption that for any $l$, $\|x_{\hat{T}_{l,k_1}^{c}}\| \leq \|x_{\{p+1,\ldots,s\}}^{*}\|$ and the right-stochastic property of $H$. From the above bound on RHS, it can be easily seen that (18) is satisfied when
$$x_{p+q}^{*} > c_3 c_1^{k_2 - k_1 - 1}\|x_{\{p+1,\ldots,s\}}^{*}\| + \left(\frac{c_3 d_4}{1 - c_1} + d_6\right)\|e\|_{\max}.$$

The above two results are compactly stated as follows.

Corollary 3: The sufficient condition for Lemma 5 and Lemma 6 to hold is
$$x_{p+q}^{*} > c_3 c_1^{k_2 - k_1}\|x_{\{p+1,\ldots,s\}}^{*}\| + \left(\frac{c_3 d_4}{1 - c_1} + d_6\right)\|e\|_{\max}.$$
The above corollary follows from the observation that for $\delta_{3s} < 1$, we have $c_2 < c_3 < c_1$ and $d_5 < d_6$.

Next, the above corollary can be used to prove Theorem 2 by similar steps as outlined in [42, Theorem 6]. The proof steps in [42, Theorem 6] require that $c_1 < 1$, which holds as $\delta_{3s} < 0.362$. Note that the constant $\gamma$ is defined as $\gamma = 2\sqrt{4d_7^2 - 1}$, where $d_7 = \frac{c_3 d_4}{1 - c_1} + d_6$. We point out that the constant $d_7$ in our result is different from that derived in [42, Theorem 6]. We provide here an argument for the same. From Step 6 of the NGP at node $l$, we have $\|y_l - A_l\hat{x}_{l,\bar{k}}\| \leq \|y_l - A_l x\| = \|e_l\| \leq \|e\|_{\max}$. This follows because $\hat{T}_{l,\bar{k}} = T$ (see the proof in [42, Theorem 6]). Next, we can write
$$\|x - \hat{x}_{l,\bar{k}}\| \leq \frac{1}{\sqrt{1 - \delta_{2s}}}\,\|A_l(x - \hat{x}_{l,\bar{k}})\| \leq \frac{1}{\sqrt{1 - \delta_{2s}}}\left(\|y_l - A_l\hat{x}_{l,\bar{k}}\| + \|e_l\|\right) \leq \frac{2}{\sqrt{1 - \delta_{3s}}}\,\|e\|_{\max},$$
where the last inequality follows from the fact that $\delta_{2s} < \delta_{3s}$.

D. Proof of Theorem 3

From (8), at the $\bar{k}$'th iteration, we can write in the vector form,
$$\|x - \hat{x}_{\bar{k}}\| \leq c_1 H\,\|x - \hat{x}_{\bar{k}-1}\| + \left[\frac{\sqrt{1 + \delta_{2s}}}{1 - \delta_{3s}}\,I_L + \sqrt{\frac{2}{1 - \delta_{3s}^2}}\,(d_1 + d_2)H\right]\|e\|,$$
where $\|x - \hat{x}_{k}\| \triangleq \left[\|x - \hat{x}_{1,k}\| \ldots \|x - \hat{x}_{L,k}\|\right]^{t}$. The above bound can be simplified as
$$\|x - \hat{x}_{\bar{k}}\| \leq c_1 H\,\|x - \hat{x}_{\bar{k}-1}\| + \left[\frac{\sqrt{1 + \delta_{2s}}}{1 - \delta_{3s}}\,I_L + \sqrt{\frac{2}{1 - \delta_{3s}^2}}\,(d_1 + d_2)H\right]\|\mathbf{e}\|_{\max} \overset{(a)}{\leq} c_1 H\,\|x - \hat{x}_{\bar{k}-1}\| + d_8\,\|\mathbf{e}\|_{\max},$$
where (a) follows from the right-stochastic property of $H$. Also, $d_8 = \frac{\sqrt{1 + \delta_{2s}}}{1 - \delta_{3s}} + \sqrt{\frac{2}{1 - \delta_{3s}^2}}\,(d_1 + d_2)$. Applying the above relation iteratively and using the fact that $c_1 < 1$ (as $\delta_{3s} < 0.362$), we get
$$\|x - \hat{x}_{\bar{k}}\| \leq (c_1 H)^{\bar{k}}\,\|x - \hat{x}_{0}\| + d_8\left(I_L + \ldots + (c_1 H)^{\bar{k}-1}\right)\|\mathbf{e}\|_{\max} \overset{(a)}{\leq} c_1^{\bar{k}}\,\|\mathbf{x}\| + \frac{d_8}{1 - c_1}\,\|\mathbf{e}\|_{\max},$$
where (a) follows from the right-stochastic property of $H$ and the initial condition $\|x - \hat{x}_{l,0}\| = \|x\|$, $\forall l$. Also, we have defined $\|\mathbf{x}\| \triangleq \left[\|x\| \ldots \|x\|\right]^{t}$. Substituting $\bar{k} = \left\lceil \log\frac{\|x\|}{\|e\|_{\max}} \Big/ \log\frac{1}{c_1} \right\rceil$, we get the result.

E. Proof of Theorem 4

To prove the theorem, we use the results of Lemma 5 and Lemma 6 with $\|e\|_{\max} = 0$. The proof then follows from the steps outlined in [42, Theorem 5] and the assumption that $c_1 < 1$, which holds as $\delta_{3s} < 0.362$.

F. Proof of Theorem 5

The proof follows from contradiction-based arguments similar to those presented in the proof of [19, Theorem 7]. Suppose that $T \not\subseteq \hat{T}_{l,\bar{k}}$. Then, $\|x_{\hat{T}_{l,\bar{k}}^{c}}\| \geq \min_{i \in T}|x_i| = \rho_{\min}\|x\|$. From equation (15), we can write (for no measurement noise),
$$\|x_{\hat{T}_{\bar{k}}^{c}}\| \leq c_1 H\,\|x_{\hat{T}_{\bar{k}-1}^{c}}\| \leq (c_1 H)^{\bar{k}}\,\|\mathbf{x}\| \overset{(a)}{<} \rho_{\min}\|\mathbf{x}\|,$$
where (a) follows from the right-stochastic property of $H$ and substituting the value of $\bar{k}$. Note that we need $c_1 < 1$ (which holds as $\delta_{3s} < 0.362$) to have $\bar{k} \geq 1$. The above equation implies that $\|x_{\hat{T}_{l,\bar{k}}^{c}}\| < \rho_{\min}\|x\|$. This contradicts the assumption that $T \not\subseteq \hat{T}_{l,\bar{k}}$. Hence, proved.

REFERENCES

[1] K. Slavakis, G. B. Giannakis, and G. Mateos, "Modeling and optimization for big data analytics: (Statistical) learning tools for our era of data deluge," IEEE Signal Processing Magazine, vol. 31, pp. 18–31, Sept. 2014.
[2] Z. J. Xiang, H. Xu, and P. J. Ramadge, "Learning sparse representations of high dimensional data on large scale dictionaries," in Advances in Neural Information Processing Systems 24, pp. 900–908, Curran Associates, Inc., 2011.
[3] A. C. Gilbert, P. Indyk, M. Iwen, and L. Schmidt, "Recent developments in the sparse Fourier transform: A compressed Fourier transform for big data," IEEE Signal Processing Magazine, vol. 31, pp. 91–100, Sept. 2014.
[4] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, 2010.
[5] R. Rubinstein, A. Bruckstein, and M. Elad, "Dictionaries for sparse representation modeling," Proceedings of the IEEE, vol. 98, pp. 1045–1057, June 2010.
[6] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, pp. 1289–1306, Apr. 2006.
[7] E. Candes and M. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, pp. 21–30, Mar. 2008.
[8] O. Banerjee, L. El Ghaoui, and A. d'Aspremont, "Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," J. Mach. Learn. Res., vol. 9, pp. 485–516, June 2008.
[9] M. Tipping, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.
[10] D. P. Wipf and B. D. Rao, "Sparse Bayesian learning for basis selection," IEEE Trans. Signal Process., vol. 52, pp. 2153–2164, Aug. 2004.
[11] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, pp. 210–227, Feb. 2009.
[12] S. Chouvardas, G. Mileounis, N. Kalouptsidis, and S. Theodoridis, "Greedy sparsity-promoting algorithms for distributed learning," IEEE Trans. Signal Process., vol. 63, pp. 1419–1432, Mar. 2015.
[13] D. Sundman, S. Chatterjee, and M. Skoglund, "Design and analysis of a greedy pursuit for distributed compressed sensing," IEEE Trans. Signal Process., vol. 64, pp. 2803–2818, June 2016.
[14] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inf. Theory, vol. 50, pp. 2231–2242, Oct. 2004.
[15] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Tech. Rep., California Inst. Tech., Jul. 2008.
[16] D. L. Donoho, Y. Tsaig, I. Drori, and J. L. Starck, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 58, pp. 1094–1121, Feb. 2012.
[17] Y. C. Eldar, P. Kuppinger, and H. Bolcskei, "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Trans. Signal Process., vol. 58, pp. 3042–3054, June 2010.
[18] S. Chatterjee, D. Sundman, M. Vehkaperä, and M. Skoglund, "Projection-based and look-ahead strategies for atom selection," IEEE Trans. Signal Process., vol. 60, pp. 634–647, Feb. 2012.
[19] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, pp. 2230–2249, May 2009.
[20] T. Blumensath and M. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, pp. 265–274, 2009.
[21] M. Duarte, S. Sarvotham, D. Baron, M. Wakin, and R. Baraniuk, "Distributed compressed sensing of jointly sparse signals," in Conf. Record of the Thirty-Ninth Asilomar Conf. on Signals, Systems and Computers, pp. 1537–1541, Oct. 28–Nov. 1, 2005.
[22] D. Baron, M. Duarte, M. Wakin, S. Sarvotham, and R. Baraniuk, "Distributed compressive sensing," http://arxiv.org/abs/0901.3403, Jan. 2009.
[23] J. Mota, J. Xavier, P. Aguiar, and M. Puschel, "Distributed basis pursuit," IEEE Trans. Signal Process., vol. 60, pp. 1942–1956, Apr. 2012.
[24] G. Mateos, J. A. Bazerque, and G. B. Giannakis, "Distributed sparse linear regression," IEEE Trans. Signal Process., vol. 58, pp. 5262–5276, Oct. 2010.
[25] S. Chouvardas, K. Slavakis, Y. Kopsinis, and S. Theodoridis, "A sparsity promoting adaptive algorithm for distributed learning," IEEE Trans. Signal Process., vol. 60, pp. 5412–5425, Oct. 2012.
[26] Z. Liu, Y. Liu, and C. Li, "Distributed sparse recursive least-squares over networks," IEEE Trans. Signal Process., vol. 62, pp. 1386–1395, Mar. 2014.
[27] P. D. Lorenzo and A. H. Sayed, "Sparse distributed learning based on diffusion adaptation," IEEE Trans. Signal Process., vol. 61, pp. 1419–1433, Mar. 2013.
[28] P. Han, R. Niu, M. Ren, and Y. C. Eldar, "Distributed approximate message passing for sparse signal recovery," in Proc. IEEE Global Conf. on Signal and Information Processing (GlobalSIP), pp. 497–501, Dec. 2014.
[29] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, and T. J. Sejnowski, "Dictionary learning algorithms for sparse representation," Neural Computation, vol. 15, pp. 349–396, Feb. 2003.
[30] J. Liang, M. Zhang, X. Zeng, and G. Yu, "Distributed dictionary learning for sparse representation in sensor networks," IEEE Trans. Image Process., vol. 23, pp. 2528–2541, June 2014.
[31] S. Patterson, Y. C. Eldar, and I. Keidar, "Distributed compressed sensing for static and time-varying networks," IEEE Trans. Signal Process., vol. 62, pp. 4931–4946, Oct. 2014.
[32] P. Han, R. Niu, and Y. C. Eldar, "Modified distributed iterative hard thresholding," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 3766–3770, Apr. 2015.
[33] T. Wimalajeewa and P. K. Varshney, "OMP based joint sparsity pattern recovery under communication constraints," IEEE Trans. Signal Process., vol. 62, pp. 5059–5072, Oct. 2014.
[34] T. Wimalajeewa and P. K. Varshney, "Wireless compressive sensing over fading channels with distributed sparse random projections," IEEE Trans. Signal and Information Processing over Networks, vol. 1, pp. 33–44, Mar. 2015.
[35] S. M. Fosson, J. Matamoros, C. Antón-Haro, and E. Magli, "Distributed recovery of jointly sparse signals under communication constraints," IEEE Trans. Signal Process., vol. 64, pp. 3470–3482, July 2016.
[36] J. Matamoros, S. M. Fosson, E. Magli, and C. Antón-Haro, "Distributed ADMM for in-network reconstruction of sparse signals with innovations," IEEE Trans. Signal and Information Processing over Networks, vol. 1, pp. 225–234, Dec. 2015.
[37] W. Chen and I. J. Wassell, "A decentralized Bayesian algorithm for distributed compressive sensing in networked sensing systems," IEEE Trans. Wireless Commun., vol. 15, pp. 1282–1292, Feb. 2016.
[38] D. Sundman, S. Chatterjee, and M. Skoglund, "Distributed greedy pursuit algorithms," Signal Processing, vol. 105, pp. 298–315, 2014.
[39] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, pp. 4203–4215, Dec. 2005.
[40] C. B. Song, S. T. Xia, and X. J. Liu, "Improved analysis for subspace pursuit algorithm in terms of restricted isometry constant," IEEE Signal Processing Letters, vol. 21, pp. 1365–1369, Nov. 2014.
[41] S. Boyd, P. Diaconis, and L. Xiao, "Fastest mixing Markov chain on a graph," SIAM Review, vol. 46, pp. 667–689, Apr. 2004.
[42] J.-L. Bouchot, S. Foucart, and P. Hitczenko, "Hard thresholding pursuit algorithms: Number of iterations," Applied and Computational Harmonic Analysis, 2016.