Distributed Compressive Sensing

Dror Baron,1 Marco F. Duarte,2 Michael B. Wakin,3 Shriram Sarvotham,4 and Richard G. Baraniuk2 ∗


1 Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa, Israel
2 Department of Electrical and Computer Engineering, Rice University, Houston, TX
3 Division of Engineering, Colorado School of Mines, Golden, CO
4 Halliburton, Houston, TX

This paper is dedicated to the memory of Hyeokho Choi, our colleague, mentor, and friend.

Abstract

Compressive sensing is a signal acquisition framework based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable recovery. In this paper we introduce a new theory for distributed compressive sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. Our theoretical contribution is to characterize the fundamental performance limits of DCS recovery for jointly sparse signal ensembles in the noiseless measurement setting; our result connects single-signal, joint, and distributed (multi-encoder) compressive sensing. To demonstrate the efficacy of our framework and to show that additional challenges such as computational tractability can be addressed, we study in detail three example models for jointly sparse signals. For these models, we develop practical algorithms for joint recovery of multiple signals from incoherent projections. In two of our three models, the results are asymptotically best-possible, meaning that both the upper and lower bounds match the performance of our practical algorithms. Moreover, simulations indicate that the asymptotics take effect with just a moderate number of signals. DCS is immediately applicable to a range of problems in sensor arrays and networks.

Keywords: Compressive sensing, distributed source coding, sparsity, random projection, random matrix, linear programming, array processing, sensor networks.

1 Introduction

A core tenet of signal processing and information theory is that signals, images, and other data often contain some type of structure that enables intelligent representation and processing. The notion of structure has been characterized and exploited in a variety of ways for a variety of purposes. In this paper, we focus on exploiting signal correlations for the purpose of compression.

∗ This work was supported by the grants NSF CCF-0431150 and CCF-0728867, DARPA HR0011-08-1-0078, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936 and N00014-08-1-1112, AFOSR FA9550-07-1-0301, ARO MURI W311NF-07-1-0185, and the Texas Instruments Leadership University Program. Preliminary versions of this work have appeared at the Allerton Conference on Communication, Control, and Computing [1], the Asilomar Conference on Signals, Systems and Computers [2], the Conference on Neural Information Processing Systems [3], and the Workshop on Sensor, Signal and Information Processing [4]. E-mail: [email protected], {duarte, shri, richb}@rice.edu, [email protected]; Web: dsp.rice.edu/cs

Current state-of-the-art compression algorithms employ a decorrelating transform such as an exact or approximate Karhunen-Loève transform (KLT) to compact a correlated signal's energy into just a few essential coefficients [5–7]. Such transform coders exploit the fact that many signals have a sparse representation in terms of some basis, meaning that a small number K of adaptively chosen transform coefficients can be transmitted or stored rather than N ≫ K signal samples. For example, smooth signals are sparse in the Fourier basis, and piecewise smooth signals are sparse in a wavelet basis [8]; the commercial coding standards MP3 [9], JPEG [10], and JPEG2000 [11] directly exploit this sparsity.

1.1 Distributed source coding

While the theory and practice of compression have been well developed for individual signals, distributed sensing applications involve multiple signals, for which there has been less progress. Such settings are motivated by the proliferation of complex, multi-signal acquisition architectures, such as acoustic and RF sensor arrays, as well as sensor networks. These architectures sometimes involve battery-powered devices, which restrict the communication energy, and high aggregate data rates, limiting bandwidth availability; both factors make the reduction of communication critical. Fortunately, since the sensors presumably observe related phenomena, the ensemble of signals they acquire can be expected to possess some joint structure, or inter-signal correlation, in addition to the intra-signal correlation within each individual sensor's measurements. In such settings, distributed source coding that exploits both intra- and inter-signal correlations might allow the network to save on the communication costs involved in exporting the ensemble of signals to the collection point [12–15].

A number of distributed coding algorithms have been developed that involve collaboration amongst the sensors [16–19]. Note, however, that any collaboration involves some amount of inter-sensor communication overhead. In the Slepian-Wolf framework for lossless distributed coding [12–15], the availability of correlated side information at the decoder (collection point) enables each sensor node to communicate losslessly at its conditional entropy rate rather than at its individual entropy rate, as long as the sum rate exceeds the joint entropy rate. Slepian-Wolf coding has the distinct advantage that the sensors need not collaborate while encoding their measurements, which saves valuable communication overhead. Unfortunately, however, most existing coding algorithms [14, 15] exploit only inter-signal correlations and not intra-signal correlations. To date there has been only limited progress on distributed coding of so-called "sources with memory." The direct implementation for sources with memory would require huge lookup tables [12]. Furthermore, approaches combining pre- or post-processing of the data to remove intra-signal correlations with Slepian-Wolf coding for the inter-signal correlations appear to have limited applicability, because such processing would alter the data in a way that is unknown to other nodes. Finally, although recent papers [20–22] provide compression of spatially correlated sources with memory, the solution is specific to lossless distributed compression and cannot be readily extended to lossy compression settings.

We conclude that the design of constructive techniques for distributed coding of sources with both intra- and inter-signal correlation is a challenging problem with many potential applications.

1.2 Compressive sensing (CS)

A new framework for single-signal sensing and compression has developed under the rubric of compressive sensing (CS). CS builds on the work of Candès, Romberg, and Tao [23] and Donoho [24], who showed that if a signal has a sparse representation in one basis then it can be recovered from a small number of projections onto a second basis that is incoherent with the first; roughly speaking, incoherence means that no element of one basis has a sparse representation in terms of the other basis, a notion with a variety of formalizations in the CS literature [24–26]. CS relies on tractable recovery procedures that can provide exact recovery of a signal of length N and sparsity K, i.e., a signal that can be written as a sum of K basis functions from some known basis, where K can be orders of magnitude less than N.

The implications of CS are promising for many applications, especially for sensing signals that have a sparse representation in some basis. Instead of sampling a K-sparse signal N times, only M = O(K log N) incoherent measurements suffice. Moreover, the M measurements need not be manipulated in any way before being transmitted, except possibly for some quantization. Finally, independent and identically distributed (i.i.d.) Gaussian or Bernoulli/Rademacher (random ±1) vectors provide a useful universal basis that is incoherent with all others. Hence, when using a random basis, CS is universal in the sense that the sensor can apply the same measurement mechanism no matter what basis sparsifies the signal [27].

While powerful, the CS theory at present is designed mainly to exploit intra-signal structures at a single sensor. In a multi-sensor setting, one can naively obtain separate measurements from each signal and recover them separately. However, it is possible to obtain measurements that each depend on all signals in the ensemble by having sensors collaborate with each other in order to combine all of their measurements; we term this process a joint measurement setting. In fact, initial work in CS for multi-sensor settings used standard CS with joint measurement and recovery schemes that exploit inter-signal correlations [28–32]. However, by recovering sequential time instances of the sensed data individually, these schemes ignore intra-signal correlations.

1.3 Distributed compressive sensing (DCS)

In this paper we introduce a new theory for distributed compressive sensing (DCS) to enable new distributed coding algorithms that exploit both intra- and inter-signal correlation structures. In a typical DCS scenario, a number of sensors measure signals that are each individually sparse in some basis and also correlated from sensor to sensor. Each sensor separately encodes its signal by projecting it onto another, incoherent basis (such as a random one) and then transmits just a few of the resulting coefficients to a single collection point. Unlike the joint measurement setting described in Section 1.2, DCS requires no collaboration between the sensors during signal acquisition. Nevertheless, we are able to exploit the inter-signal correlation by using all of the obtained measurements to recover all the signals simultaneously. Under the right conditions, a decoder at the collection point can recover each of the signals precisely.

The DCS theory rests on a concept that we term the joint sparsity — the sparsity of the entire signal ensemble. The joint sparsity is often smaller than the aggregate over individual signal sparsities. Therefore, DCS offers a reduction in the number of measurements, in a manner analogous to the rate reduction offered by the Slepian-Wolf framework [13]. Unlike the single-signal definition of sparsity, however, there are numerous plausible ways in which joint sparsity could be defined. In this paper, we first provide a general framework for joint sparsity using algebraic formulations based on a graphical model. Using this framework, we derive bounds for the number of measurements necessary for recovery under a given signal ensemble model. Similar to Slepian-Wolf coding [13], the number of measurements required for each sensor must account for the minimal features unique to that sensor, while at the same time features that appear among multiple sensors must be amortized over the group. Our bounds are dependent on the dimensionality of the subspaces in which each group of signals reside; they afford a reduction in the number of measurements that we quantify through the notions of joint and conditional sparsity, which are conceptually related to joint and conditional entropies. The common thread is that dimensionality and entropy both quantify the volume that the measurement and coding rates must cover. Our results are also applicable to cases where the signal ensembles are measured jointly, as well as to the single-signal case.

While our general framework does not by design provide insights for computationally efficient recovery, we also provide interesting models for joint sparsity where our results carry through from the general framework to realistic settings with low-complexity algorithms. In the first model, each signal is itself sparse, and so we could use CS to separately encode and decode each signal. However, there also exists a framework wherein a joint sparsity model for the ensemble uses fewer total coefficients. In the second model, all signals share the locations of the nonzero coefficients. In the third model, no signal is itself sparse, yet there still exists a joint sparsity among the signals that allows recovery from significantly fewer than N measurements per sensor. For each model we propose tractable algorithms for joint signal recovery, followed by theoretical and empirical characterizations of the number of measurements per sensor required for accurate recovery. We show that, under these models, joint signal recovery can recover signal ensembles from significantly fewer measurements than would be required to recover each signal individually. In fact, for two of our three models we obtain best-possible performance that could not be bettered by an oracle that knew the indices of the nonzero entries of the signals.

This paper focuses primarily on the basic task of reducing the number of measurements for recovery of a signal ensemble in order to reduce the communication cost of source coding that ensemble. Our emphasis is on noiseless measurements of strictly sparse signals, where the optimal recovery relies on ℓ0-norm optimization (the ℓ0 "norm" ‖x‖0 merely counts the number of nonzero entries in the vector x), which is computationally intractable. In practical settings, additional criteria may be relevant for measuring performance. For example, the measurements will typically be real numbers that must be quantized, which gradually degrades the recovery quality as the quantization becomes coarser [33, 34]. Characterizing DCS in light of practical considerations such as rate-distortion tradeoffs, power consumption in sensor networks, etc., are topics of future research [31, 32].

1.4 Paper organization

Section 2 overviews the single-signal CS theories and provides a new result on CS recovery. While some readers may be familiar with this material, we include it to make the paper self-contained. Section 3 introduces our general framework for joint sparsity models and proposes three example models for joint sparsity. We provide our detailed analysis for the general framework in Section 4; we then address the three models in Section 5. We close the paper with a discussion and conclusions in Section 6. Several appendices contain the proofs.

2 Compressive Sensing Background

2.1 Transform coding

Consider a real-valued signal x ∈ R^N indexed as x(n), n ∈ {1, 2, . . . , N} (without loss of generality, we focus on one-dimensional signals for notational simplicity; the extension to multi-dimensional signals, e.g., images, is straightforward). Suppose that the basis Ψ = [ψ1, . . . , ψN] provides a K-sparse representation of x; that is,

x = Σ_{n=1}^{N} ϑ(n) ψn = Σ_{k=1}^{K} ϑ(nk) ψ_{nk},    (1)

where x is a linear combination of K vectors chosen from Ψ, {nk} are the indices of those vectors, and {ϑ(n)} are the coefficients; the concept is extendable to tight frames [8]. Alternatively, we can write in matrix notation x = Ψϑ, where x is an N × 1 column vector, the sparse basis matrix Ψ is N × N with the basis vectors ψn as columns, and ϑ is an N × 1 column vector with K nonzero elements. Using ‖ · ‖p to denote the ℓp norm, we can write that ‖ϑ‖0 = K; we can also write the set of nonzero indices Ω ⊆ {1, . . . , N}, with |Ω| = K. Various expansions, including wavelets [8], Gabor bases [8], curvelets [35], etc., are widely used for representation and compression of natural signals, images, and other data.

The standard procedure for compressing sparse and nearly-sparse signals, known as transform coding, is to (i) acquire the full N-sample signal x; (ii) compute the complete set of transform coefficients {ϑ(n)}; (iii) locate the K largest, significant coefficients and discard the (many) small coefficients; and (iv) encode the values and locations of the largest coefficients. This procedure has three inherent inefficiencies: First, for a high-dimensional signal, we must start with a large number of samples N. Second, the encoder must compute all of the N transform coefficients {ϑ(n)}, even though it will discard all but K of them. Third, the encoder must encode the locations of the large coefficients, which requires increasing the coding rate since the locations change with each signal.

We will focus our theoretical development on exactly K-sparse signals and defer discussion of the more general situation of compressible signals, where the coefficients decay rapidly with a power law but not to zero. Section 6 contains additional discussion on real-world compressible signals, and [36] presents simulation results.
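To make the transform-coding steps (i)–(iv) concrete, the following sketch (not taken from the paper; the DCT basis, test signal, and parameter values are illustrative assumptions) computes a full set of transform coefficients and keeps only the K largest:

```python
# Illustrative transform coder: compute all N coefficients, keep only the K largest.
import numpy as np
from scipy.fft import dct, idct

def transform_code(x, K):
    """Approximate x by its K largest-magnitude DCT coefficients."""
    theta = dct(x, norm="ortho")            # step (ii): compute all N transform coefficients
    keep = np.argsort(np.abs(theta))[-K:]   # step (iii): locate the K largest coefficients
    theta_K = np.zeros_like(theta)
    theta_K[keep] = theta[keep]             # step (iv): keep values and locations of the largest
    return idct(theta_K, norm="ortho"), keep

N, K = 1024, 32
t = np.linspace(0, 1, N)
x = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)  # smooth, hence nearly DCT-sparse
x_hat, support = transform_code(x, K)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))              # small relative error
```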

2.2 Incoherent projections

These inefficiencies raise a simple question: For a given signal, is it possible to directly estimate the set of large ϑ(n)'s that will not be discarded? While this seems improbable, Candès, Romberg, and Tao [23, 25] and Donoho [24] have shown that a reduced set of projections can contain enough information to recover sparse signals. A framework to acquire sparse signals, often referred to as compressive sensing (CS) [37], has emerged that builds on this principle.

In CS, we do not measure or encode the K significant ϑ(n) directly. Rather, we measure and encode M < N projections y(m) = ⟨x, φmT⟩ of the signal onto a second set of functions {φm}, m = 1, 2, . . . , M, where φmT denotes the transpose of φm and ⟨·, ·⟩ denotes the inner product. In matrix notation, we measure y = Φx, where y is an M × 1 column vector and the measurement matrix Φ is M × N with each row a measurement vector φm. Since M < N, recovery of the signal x from the measurements y is ill-posed in general; however, the additional assumption of signal sparsity makes recovery possible and practical.

The CS theory tells us that when certain conditions hold, namely that the basis {ψn} cannot sparsely represent the vectors {φm} (a condition known as incoherence [24–26]) and the number of measurements M is large enough (proportional to K), then it is indeed possible to recover the set of large {ϑ(n)} (and thus the signal x) from the set of measurements {y(m)} [24, 25]. This incoherence property holds for many pairs of bases, including, for example, delta spikes and the sine waves of a Fourier basis, or the Fourier basis and wavelets. Signals that are sparsely represented in frames or unions of bases can be recovered from incoherent measurements in the same fashion. Significantly, this incoherence also holds with high probability between any arbitrary fixed basis or frame and a randomly generated one. In the sequel, we will restrict our analysis to such random measurement procedures.
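As a minimal illustration of such random measurements (an assumption-laden sketch, not the paper's code; the sizes N, K, M and the identity sparsity basis are chosen only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 256, 8, 64                      # M << N, with M on the order of K log N

theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)   # K-sparse coefficients
Psi = np.eye(N)                           # sparsity basis (identity here for simplicity)
x = Psi @ theta

Phi = rng.standard_normal((M, N))         # i.i.d. Gaussian measurement matrix, incoherent with Psi
y = Phi @ x                               # M incoherent measurements of x
print(y.shape)                            # (64,)
```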


2.3 Signal recovery via ℓ0-norm minimization

The recovery of the sparse set of significant coefficients {ϑ(n)} can be achieved using optimization by searching for the sparsest coefficient vector ϑ̂ that agrees with the M observed measurements in y (recall that M < N). Recovery relies on the key observation that, under mild conditions on Φ and Ψ, the coefficient vector ϑ is the unique solution to the ℓ0-norm minimization

ϑ̂ = arg min ‖ϑ‖0   s.t.   y = ΦΨϑ    (2)

with overwhelming probability. (Thanks to the incoherence between the two bases, if the original signal is sparse in the ϑ coefficients, then no other set of sparse signal coefficients ϑ′ can yield the same projections y.) In principle, remarkably few incoherent measurements are required to recover a K-sparse signal via ℓ0-norm minimization. More than K measurements must be taken to avoid ambiguity; the following theorem, proven in Appendix A, establishes that K + 1 random measurements will suffice. Similar results were established by Venkataramani and Bresler [38].

Theorem 1 Let Ψ be an orthonormal basis for R^N, and let 1 ≤ K < N. Then:

1. Let Φ be an M × N measurement matrix with i.i.d. Gaussian entries with M ≥ 2K. Then all signals x = Ψϑ having expansion coefficients ϑ ∈ R^N that satisfy ‖ϑ‖0 = K can be recovered uniquely from the M-dimensional measurement vector y = Φx via the ℓ0-norm minimization (2) with probability one over Φ.

2. Let x = Ψϑ such that ‖ϑ‖0 = K. Let Φ be an M × N measurement matrix with i.i.d. Gaussian entries (notably, independent of x) with M ≥ K + 1. Then x can be recovered uniquely from the M-dimensional measurement vector y = Φx via the ℓ0-norm minimization (2) with probability one over Φ.

3. Let Φ be an M × N measurement matrix, where M ≤ K. Then, aside from pathological cases (specified in the proof), no signal x = Ψϑ with ‖ϑ‖0 = K can be uniquely recovered from the M-dimensional measurement vector y = Φx.

Remark 1 The second statement of the theorem differs from the first in the following respect: when K < M < 2K, there will necessarily exist K-sparse signals x that cannot be uniquely recovered from the M-dimensional measurement vector y = Φx. However, these signals form a set of measure zero within the set of all K-sparse signals and can safely be avoided with high probability if Φ is randomly generated independently of x.

Comparing the second and third statements of Theorem 1, we see that one measurement separates the achievable region, where perfect recovery is possible with probability one, from the converse region, where with overwhelming probability recovery is impossible. Moreover, Theorem 1 provides a strong converse measurement region in a manner analogous to the strong channel coding converse theorems of information theory [12].

Unfortunately, solving the ℓ0-norm minimization problem is prohibitively complex, requiring a combinatorial enumeration of the \binom{N}{K} possible sparse subspaces. In fact, the ℓ0-norm minimization problem in general is known to be NP-hard [39]. Yet another challenge is robustness; in the setting of Theorem 1, the recovery may be very poorly conditioned. In fact, both of these considerations (computational complexity and robustness) can be addressed, but at the expense of slightly more measurements.
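The combinatorial nature of (2) can be seen in the following brute-force sketch (a toy illustration under assumed small sizes, not an algorithm advocated by the paper), which enumerates candidate supports of size K and accepts one that explains y exactly:

```python
import numpy as np
from itertools import combinations

def l0_recover(y, A, K, tol=1e-9):
    """Search all C(N, K) supports for a K-sparse theta with y = A @ theta."""
    N = A.shape[1]
    for support in combinations(range(N), K):
        cols = list(support)
        sub = A[:, cols]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        if np.linalg.norm(sub @ coef - y) < tol:
            theta = np.zeros(N)
            theta[cols] = coef
            return theta
    return None

rng = np.random.default_rng(1)
N, K = 20, 2
M = K + 1                                             # K + 1 measurements, as in Theorem 1, part 2
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
A = rng.standard_normal((M, N))                       # Phi (with Psi = I, so A = Phi Psi = Phi)
theta_hat = l0_recover(A @ theta, A, K)
print(np.allclose(theta, theta_hat))                  # True with probability one over A
```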

2.4 Signal recovery via ℓ1-norm minimization

The practical revelation that supports the new CS theory is that it is not necessary to solve the ℓ0-norm minimization to recover the set of significant {ϑ(n)}. In fact, a much easier problem yields an equivalent solution (thanks again to the incoherence of the bases); we need only solve for the smallest ℓ1-norm coefficient vector ϑ that agrees with the measurements y [24, 25]:

ϑ̂ = arg min ‖ϑ‖1   s.t.   y = ΦΨϑ.    (3)

This optimization problem, also known as Basis Pursuit, is significantly more approachable and can be solved with traditional linear programming techniques whose computational complexities are polynomial in N. There is no free lunch, however; according to the theory, more than K + 1 measurements are required in order to recover sparse signals via Basis Pursuit. Instead, one typically requires M ≥ cK measurements, where c > 1 is an overmeasuring factor. As an example, we quote a result asymptotic in N. For simplicity, we assume that the sparsity scales linearly with N; that is, K = SN, where we call S the sparsity rate.

Theorem 2 [39–41] Set K = SN with 0 < S ≪ 1. Then there exists an overmeasuring factor c(S) = O(log(1/S)), c(S) > 1, such that, for a K-sparse signal x in basis Ψ, the following statements hold:

1. The probability of recovering x via ℓ1-norm minimization from (c(S) + ε)K random projections, ε > 0, converges to one as N → ∞.

2. The probability of recovering x via ℓ1-norm minimization from (c(S) − ε)K random projections, ε > 0, converges to zero as N → ∞.

In an illuminating series of papers, Donoho and Tanner [40–42] have characterized the overmeasuring factor c(S) precisely. In our work, we have noticed that the overmeasuring factor is quite similar to log2(1 + S^{-1}); we find this expression a useful rule of thumb to approximate the precise overmeasuring ratio. Additional overmeasuring is proven to provide robustness to measurement noise and quantization error [25]. Throughout this paper we use the abbreviated notation c to describe the overmeasuring factor required in various settings, even though c(S) depends on the sparsity K and signal length N.
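A minimal sketch of Basis Pursuit (3) as a linear program, assuming SciPy's generic LP solver (the paper's own experiments need not use this solver, and the problem sizes here are illustrative): split ϑ into nonnegative parts u − v and minimize their sum subject to the measurement constraints.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(y, A):
    """Solve min ||theta||_1 subject to A @ theta = y via an equivalent LP."""
    M, N = A.shape
    c = np.ones(2 * N)                     # minimize sum(u) + sum(v) = ||theta||_1
    A_eq = np.hstack([A, -A])              # A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(2)
N, K = 128, 5
M = 6 * K                                  # M = cK with a modest overmeasuring factor c
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
A = rng.standard_normal((M, N))
theta_hat = basis_pursuit(A @ theta, A)
print(np.max(np.abs(theta - theta_hat)))   # typically near zero when M is large enough
```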

2.5 Signal recovery via greedy pursuit

Iterative greedy algorithms have also been developed to recover the signal x from the measurements y. The Orthogonal Matching Pursuit (OMP) algorithm, for example, iteratively selects the vectors from the matrix ΦΨ that contain most of the energy of the measurement vector y. The selection at each iteration is made based on inner products between the columns of ΦΨ and a residual; the residual reflects the component of y that is orthogonal to the previously selected columns. The algorithm has been proven to successfully recover the acquired signal from incoherent measurements with high probability, at the expense of slightly more measurements, [26, 43]. Algorithms inspired by OMP, such as regularized orthogonal matching pursuit [44], CoSaMP [45], and Subspace Pursuit [46] have been shown to attain similar guarantees to those of their optimization-based counterparts. In the following, we will exploit both Basis Pursuit and greedy algorithms for recovering jointly sparse signals from incoherent measurements.
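A compact sketch of the OMP iteration just described (an illustration under assumed sizes, not the exact variant analyzed in [26, 43]):

```python
import numpy as np

def omp(y, A, K):
    """Orthogonal Matching Pursuit: greedily select K columns of A that explain y."""
    residual = y.copy()
    support = []
    for _ in range(K):
        correlations = np.abs(A.T @ residual)      # inner products with the current residual
        correlations[support] = 0                  # do not reselect chosen columns
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef        # part of y orthogonal to the selected columns
    theta = np.zeros(A.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(3)
N, K, M = 256, 8, 80
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
A = rng.standard_normal((M, N)) / np.sqrt(M)
print(np.linalg.norm(theta - omp(A @ theta, A, K)))   # near zero when M is sufficiently large
```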


2.6 Properties of random measurements

In addition to offering substantially reduced measurement rates, CS has many attractive and intriguing properties, particularly when we employ random projections at the sensors. Random measurements are universal in the sense that any sparse basis can be used, allowing the same encoding strategy to be applied in different sensing environments. Random measurements are also future-proof: if a better sparsity-inducing basis is found for the signals, then the same measurements can be used to recover a more accurate view of the environment. Random coding is also robust: the measurements coming from each sensor have equal priority, unlike Fourier or wavelet coefficients in current coders. Finally, random measurements allow a progressively better recovery of the data as more measurements are obtained; one or more measurements can also be lost without corrupting the entire recovery.

2.7 Related work

Several researchers have formulated joint measurement settings for CS in sensor networks that exploit inter-signal correlations [28–32]. In their approaches, each sensor n ∈ {1, 2, . . . , N} simultaneously records a single reading x(n) of some spatial field (temperature at a certain time, for example); note that in this section only, N refers to the number of sensors, since each sensor acquires a single signal sample. Each of the sensors generates a pseudorandom sequence rn(m), m = 1, 2, . . . , M, and modulates the reading as x(n)rn(m). Each sensor n then transmits its M numbers in sequence to the collection point, where the measurements are aggregated, obtaining the M measurements y(m) = Σ_{n=1}^{N} x(n)rn(m). Thus, defining x = [x(1), x(2), . . . , x(N)]T and φm = [r1(m), r2(m), . . . , rN(m)], the collection point automatically receives the measurement vector y = [y(1), y(2), . . . , y(M)]T = Φx after M transmission steps. The samples x(n) of the spatial field can then be recovered using CS provided that x has a sparse representation in a known basis. These methods have a major limitation: since they operate at a single time instant, they exploit only inter-signal and not intra-signal correlations; that is, they essentially assume that the sensor field is i.i.d. from time instant to time instant. In contrast, we will develop signal models and algorithms that are agnostic to the spatial sampling structure and that exploit both inter- and intra-signal correlations.

Recent work has adapted DCS to the finite rate of innovation signal acquisition framework [47] and to the continuous-time setting [48]. Since the original submission of this paper, additional work has focused on the analysis and proposal of recovery algorithms for jointly sparse signals [49, 50].
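The aggregation scheme described above can be sketched as follows (a toy simulation under assumed sizes and ±1 pseudorandom sequences; it is not the implementation of [28–32]):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 100, 30                               # N sensors, M transmission steps
x = rng.standard_normal(N)                   # x(n): one spatial-field reading per sensor
R = rng.choice([-1.0, 1.0], size=(N, M))     # r_n(m): each sensor's pseudorandom sequence

# Each sensor transmits x(n) * r_n(m); the collection point sums the transmissions.
y = np.array([np.sum(x * R[:, m]) for m in range(M)])

Phi = R.T                                    # row m of Phi is [r_1(m), ..., r_N(m)]
print(np.allclose(y, Phi @ x))               # True: aggregation realizes y = Phi @ x
```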

3 Joint Sparsity Signal Models

In this section, we generalize the notion of a signal being sparse in some basis to the notion of an ensemble of signals being jointly sparse.

3.1 Notation

We will use the following notation for signal ensembles and our measurement model. Let Λ := {1, 2, . . . , J} denote the set of indices for the J signals in the ensemble. Denote the signals in the ensemble by xj, with j ∈ Λ, and assume that each signal xj ∈ R^N. We use xj(n) to denote sample n in signal j, and assume for the sake of illustration — but without loss of generality — that these signals are sparse in the canonical basis, i.e., Ψ = I. The entries of the signals can take arbitrary real values. We denote by Φj the measurement matrix for signal j; Φj is Mj × N and, in general, the entries of Φj are different for each j. Thus, yj = Φj xj consists of Mj < N random measurements of xj. We will emphasize random i.i.d. Gaussian matrices Φj in the following, but other schemes are possible, including random ±1 Bernoulli/Rademacher matrices, and so on.

To compactly represent the signal and measurement ensembles, we denote M = Σ_{j∈Λ} Mj and define X ∈ R^{JN}, Y ∈ R^M, and Φ ∈ R^{M×JN} as

X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_J \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}, \quad \text{and} \quad \Phi = \begin{bmatrix} \Phi_1 & 0 & \cdots & 0 \\ 0 & \Phi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_J \end{bmatrix},    (4)

with 0 denoting a matrix of appropriate size with all entries equal to 0. We then have Y = ΦX. Equation (4) shows that separate measurement matrices have a characteristic block-diagonal structure when the entries of the sparse vector are grouped by signal. Below we propose a general framework for joint sparsity models (JSMs) and three example JSMs that apply in different situations.
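The block-diagonal structure in (4) is easy to reproduce; the sketch below (with assumed sizes and random signals, for illustration only) checks that stacking the per-sensor measurements yj = Φj xj is the same as applying Φ to the stacked vector X:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(5)
J, N = 3, 50
M_j = [12, 15, 10]                                    # per-sensor measurement counts

signals = [rng.standard_normal(N) for _ in range(J)]  # x_1, ..., x_J
Phis = [rng.standard_normal((M_j[j], N)) for j in range(J)]

X = np.concatenate(signals)                           # X in R^{JN}
Phi = block_diag(*Phis)                               # Phi in R^{M x JN}, M = sum_j M_j
Y = Phi @ X                                           # stacked measurements [y_1; ...; y_J]

y_stacked = np.concatenate([Phis[j] @ signals[j] for j in range(J)])
print(np.allclose(Y, y_stacked))                      # True: same measurements either way
```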

3.2 General framework for joint sparsity

We now propose a general framework to quantify the sparsity of an ensemble of correlated signals x1, x2, . . . , xJ, which allows us to compare the complexities of different signal ensembles and to quantify their measurement requirements. The framework is based on a factored representation of the signal ensemble that decouples its location and value information.

To motivate this factored representation, we begin by examining the structure of a single sparse signal, where x ∈ R^N with K ≪ N nonzero entries. As an alternative to the notation used in (1), we can decouple the location and value information in x by writing x = Pθ, where θ ∈ R^K contains only the nonzero entries of x, and P is an identity submatrix, i.e., P contains K columns of the N × N identity matrix I. Any K-sparse signal can be written in similar fashion. To model the set of all possible sparse signals, we can then let P be the set of all identity submatrices of all possible sizes N × K′, with 1 ≤ K′ ≤ N. We refer to P as a sparsity model. Whether a signal is sufficiently sparse is defined in the context of this model: given a signal x, one can consider all possible factorizations x = Pθ with P ∈ P. Among these factorizations, the unique representation with smallest dimensionality for θ equals the sparsity level of the signal x under the model P.

In the signal ensemble case, we consider factorizations of the form X = PΘ where X ∈ R^{JN} as above, P ∈ R^{JN×δ}, and Θ ∈ R^δ for various integers δ. We refer to P and Θ as the location matrix and value vector, respectively. A joint sparsity model (JSM) is defined in terms of a set P of admissible location matrices P with varying numbers of columns; we specify below additional conditions that the matrices P must satisfy for each model. For a given ensemble X, we let PF(X) ⊆ P denote the set of feasible location matrices P ∈ P for which a factorization X = PΘ exists. We define the joint sparsity level of the signal ensemble as follows.

Definition 1 The joint sparsity level D of the signal ensemble X is the number of columns of the smallest matrix P ∈ PF(X).

In contrast to the single-signal case, there are several natural choices for what matrices P should be members of a joint sparsity model P. We restrict our attention in the sequel to what we call common/innovation component JSMs. In these models each signal xj is generated as a combination of two components: (i) a common component zC, which is present in all signals, and

(ii) an innovation component zj, which is unique to each signal. These combine additively, giving xj = zC + zj, j ∈ Λ. Note, however, that the individual components might be zero-valued in specific scenarios. We can express the component signals as

zC = PC θC,   zj = Pj θj,   j ∈ Λ,

where θC ∈ R^{KC} and each θj ∈ R^{Kj} have nonzero entries. Each matrix P ∈ P that can express such signals {xj} has the form

P = \begin{bmatrix} P_C & P_1 & 0 & \cdots & 0 \\ P_C & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ P_C & 0 & 0 & \cdots & P_J \end{bmatrix},    (5)

where PC, {Pj}j∈Λ are identity submatrices. We define the value vector as Θ = [θC^T θ1^T θ2^T . . . θJ^T]^T, where θC ∈ R^{KC} and each θj ∈ R^{Kj}, to obtain X = PΘ. Although the values of KC and Kj are dependent on the matrix P, we omit this dependency in the sequel for brevity, except when necessary for clarity.

If a signal ensemble X = PΘ, Θ ∈ R^δ, were to be generated by a selection of PC and {Pj}j∈Λ where all J + 1 identity submatrices share a common column vector, then P would not be full rank. In other cases, we may observe a vector Θ that has zero-valued entries; i.e., we may have θj(k) = 0 for some 1 ≤ k ≤ Kj and some j ∈ Λ, or θC(k) = 0 for some 1 ≤ k ≤ KC. In both of these cases, by removing one instance of this column from any of the identity submatrices, one can obtain a matrix Q with fewer columns for which there exists Θ′ ∈ R^{δ−1} that gives X = QΘ′. If Q ∈ P, then we term this phenomenon sparsity reduction. Sparsity reduction, when present, reduces the effective joint sparsity of a signal ensemble.

As an example of sparsity reduction, consider J = 2 signals of length N = 2. Consider the coefficient zC(1) ≠ 0 of the common component zC and the corresponding innovation coefficients z1(1), z2(1) ≠ 0. Suppose that all other coefficients are zero. The location matrix P that arises is

P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.

The span of this location matrix (i.e., the set of signal ensembles X that it can generate) remains unchanged if we remove any one of the columns, i.e., if we drop any entry of the value vector Θ. This provides us with a lower-dimensional representation Θ′ of the same signal ensemble X under the JSM P; the joint sparsity of X is D = 2.

3.3 Example joint sparsity models

Since different real-world scenarios lead to different forms of correlation within an ensemble of sparse signals, we consider several possible designs for a JSM P. The distinctions among our three JSMs concern the differing sparsity assumptions regarding the common and innovation components.


3.3.1 JSM-1: Sparse common component + innovations

In this model, we suppose that each signal contains a common component zC that is sparse plus an innovation component zj that is also sparse. Thus, this joint sparsity model (JSM-1) P is represented by the set of all matrices of the form (5) with KC and all Kj smaller than N. Assuming that sparsity reduction is not possible, the joint sparsity D = KC + Σ_{j∈Λ} Kj.

A practical situation well-modeled by this framework is a group of sensors measuring temperatures at a number of outdoor locations throughout the day. The temperature readings xj have both temporal (intra-signal) and spatial (inter-signal) correlations. Global factors, such as the sun and prevailing winds, could have an effect zC that is both common to all sensors and structured enough to permit sparse representation. More local factors, such as shade, water, or animals, could contribute localized innovations zj that are also structured (and hence sparse). A similar scenario could be imagined for a network of sensors recording light intensities, air pressure, or other phenomena. All of these scenarios correspond to measuring properties of physical processes that change smoothly in time and in space and thus are highly correlated [51, 52].

3.3.2 JSM-2: Common sparse supports

In this model, the common component zC is equal to zero, each innovation component zj is sparse, and the innovations {zj} share the same sparse support but have different nonzero coefficients. To formalize this setting in a joint sparsity model (JSM-2), we let P represent the set of all matrices of the form (5), where PC = ∅ and Pj = P for all j ∈ Λ. Here P denotes an arbitrary identity submatrix of size N × K, with K ≪ N. For a given X = PΘ, we may again partition the value vector Θ = [θ1^T θ2^T . . . θJ^T]^T, where each θj ∈ R^K. It is easy to see that the matrices P from JSM-2 are full rank. Therefore, when sparsity reduction is not possible, the joint sparsity D = JK.

The JSM-2 model is immediately applicable to acoustic and RF sensor arrays, where each sensor acquires a replica of the same Fourier-sparse signal but with phase shifts and attenuations caused by signal propagation. In this case, it is critical to recover each one of the sensed signals. Another useful application for this framework is MIMO communication [53].

Similar signal models have been considered in the area of simultaneous sparse approximation [53–55]. In this setting, a collection of sparse signals share the same expansion vectors from a redundant dictionary. The sparse approximation can be recovered via greedy algorithms such as Simultaneous Orthogonal Matching Pursuit (SOMP) [53, 54] or MMV Order Recursive Matching Pursuit (M-ORMP) [55]. We use the SOMP algorithm in our setting (Section 5.2) to recover from incoherent measurements an ensemble of signals sharing a common sparse structure.
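For intuition, here is a hedged sketch of a SOMP-style recovery for JSM-2 (the aggregation rule, function name, and sizes are assumptions for illustration; they need not match the exact variant of [53, 54] used in Section 5.2): at each iteration, the index whose columns correlate most strongly with the residuals, summed across all sensors, joins the shared support.

```python
import numpy as np

def somp(ys, Phis, K):
    """Recover J signals sharing one sparse support from per-sensor measurements."""
    J, N = len(ys), Phis[0].shape[1]
    residuals = [y.copy() for y in ys]
    support = []
    for _ in range(K):
        score = np.zeros(N)
        for j in range(J):                              # aggregate correlations over sensors
            score += np.abs(Phis[j].T @ residuals[j])
        score[support] = 0
        support.append(int(np.argmax(score)))
        for j in range(J):                              # per-sensor least squares on the support
            coef, *_ = np.linalg.lstsq(Phis[j][:, support], ys[j], rcond=None)
            residuals[j] = ys[j] - Phis[j][:, support] @ coef
    X_hat = np.zeros((J, N))
    for j in range(J):
        coef, *_ = np.linalg.lstsq(Phis[j][:, support], ys[j], rcond=None)
        X_hat[j, support] = coef
    return X_hat, sorted(support)

rng = np.random.default_rng(6)
J, N, K, M = 4, 128, 6, 24
shared = rng.choice(N, K, replace=False)                # common support, different coefficients
X = np.zeros((J, N))
X[:, shared] = rng.standard_normal((J, K))
Phis = [rng.standard_normal((M, N)) for _ in range(J)]
X_hat, est_support = somp([Phis[j] @ X[j] for j in range(J)], Phis, K)
print(set(est_support) == set(shared.tolist()))         # typically True for moderate M
```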

3.3.3 JSM-3: Nonsparse common component + sparse innovations

In this model, we suppose that each signal contains an arbitrary common component zC and a sparse innovation component zj; this model extends JSM-1 by relaxing the assumption that the common component zC has a sparse representation. To formalize this setting in the JSM-3 model, we let P represent the set of all matrices (5) in which PC = I, the N × N identity matrix. This implies that each Kj is smaller than N while KC = N; thus, we obtain θC ∈ R^N and θj ∈ R^{Kj}. Assuming that sparsity reduction is not possible, the joint sparsity D = N + Σ_{j∈Λ} Kj.

We also consider the specific case where the supports of the innovations are shared by all signals, which extends JSM-2; in this case we will have Pj = P for all j ∈ Λ, with P an identity submatrix of size N × K. It is easy to see that in this case sparsity reduction is possible, and so the joint sparsity can drop to D = N + (J − 1)K. Note that separate CS recovery is impossible in JSM-3 with any fewer than N measurements per sensor, since the common component is not sparse. However, we will demonstrate that joint CS recovery can indeed exploit the common structure.

A practical situation well-modeled by this framework is where several sources are recorded by different sensors together with a background signal that is not sparse in any basis. Consider, for example, a verification system in a component production plant, where cameras acquire snapshots of each component to check for manufacturing defects. While each image could be extremely complicated, and hence nonsparse, the ensemble of images will be highly correlated, since each camera is observing the same device with minor (sparse) variations. JSM-3 can also be applied in non-distributed scenarios. For example, it motivates the compression of data such as video, where the innovations or differences between video frames may be sparse, even though a single frame may not be very sparse. In this case, JSM-3 suggests that we encode each video frame separately using CS and then decode all frames of the video sequence jointly. This has the advantage of moving the bulk of the computational complexity to the video decoder. The PRISM system proposes a similar scheme based on Wyner-Ziv distributed encoding [56]. There are many possible joint sparsity models beyond those introduced above, as well as beyond the common and innovation component signal model. Further work will yield new JSMs suitable for other application scenarios; an example application consists of multiple cameras taking digital photos of a common scene from various angles [57]. Extensions are discussed in Section 6.

4 Theoretical Bounds on Measurement Rates

In this section, we seek conditions on M = (M1 , M2 , . . . , MJ ), the tuple of number of measurements from each sensor, such that we can guarantee perfect recovery of X given Y . To this end, we provide a graphical model for the general framework provided in Section 3.2. This graphical model is fundamental in the derivation of the number of measurements needed for each sensor, as well as in the formulation of a combinatorial recovery procedure. Thus, we generalize Theorem 1 to the distributed setting to obtain fundamental limits on the number of measurements that enable recovery of sparse signal ensembles. Based on the models presented in Section 3, recovering X requires determining a value vector Θ and location matrix P such that X = P Θ. Two challenges immediately present themselves. First, a given measurement depends only on some of the components of Θ, and the measurement budget should be adjusted between the sensors according to the information that can be gathered on the components of Θ. For example, if a component Θ(d) does not affect any signal coefficient xj (·) in sensor j, then the corresponding measurements yj provide no information about Θ(d). Second, the decoder must identify a location matrix P ∈ PF (X) from the set P and the measurements Y .

4.1 Modeling dependencies using bipartite graphs

We introduce a graphical representation that captures the dependencies between the measurements in Y and the value vector Θ, represented by Φ and P. Consider a feasible decomposition of X into a full-rank matrix P ∈ PF(X) and the corresponding Θ; the matrix P defines the sparsities of the common and innovation components KC and Kj, 1 ≤ j ≤ J, as well as the joint sparsity D = KC + Σ_{j=1}^{J} Kj.

Define the following sets of vertices: (i) the set of value vertices VV has elements with indices d ∈ {1, . . . , D} representing the entries of the value vector Θ(d), and (ii) the set of measurement vertices VM has elements with indices (j, m) representing the measurements yj(m), with j ∈ Λ and m ∈ {1, . . . , Mj}. The cardinalities for these sets are |VV| = D and |VM| = M, respectively. We now introduce a bipartite graph G = (VV, VM, E) that represents the relationships between the entries of the value vector and the measurements (see [4] for details). The set of edges E is defined as follows:



Figure 1: Bipartite graph for distributed compressive sensing (DCS). The bipartite graph G = (VV , VM , E) indicates the relationship between the value vector coefficients and the measurements.

• For every d ∈ {1, 2, . . . , KC} ⊆ VV and j ∈ Λ such that column d of PC does not also appear as a column of Pj, we have an edge connecting d to each vertex (j, m) ∈ VM for 1 ≤ m ≤ Mj.

• For every d ∈ {KC + 1, KC + 2, . . . , D} ⊆ VV, we consider the sensor j associated with column d of P, and we have an edge connecting d to each vertex (j, m) ∈ VM for 1 ≤ m ≤ Mj.

In words, we say that yj(m), the mth measurement of sensor j, measures Θ(d) if the vertex d ∈ VV is linked to the vertex (j, m) ∈ VM in the graph G. An example graph for a distributed sensing setting is shown in Figure 1.

4.2 Quantifying redundancies

In order to obtain sharp bounds on the number of measurements needed, our analysis of the measurement process must account for redundancies between the locations of the nonzero coefficients in the common and innovation components. To that end, we consider the overlaps between common and innovation components in each signal. When we have zC(n) ≠ 0 and zj(n) ≠ 0 for a certain signal j and some index 1 ≤ n ≤ N, we cannot recover the values of both coefficients from the measurements of this signal alone; therefore, we will need to recover zC(n) using measurements of other signals that do not feature the same overlap. We thus quantify the size of the overlap for all subsets of signals Γ ⊂ Λ under a feasible representation given by P and Θ, as described in Section 3.2.

Definition 2 The overlap size for the set of signals Γ ⊂ Λ, denoted KC(Γ, P), is the number of indices in which there is overlap between the common and the innovation component supports at all signals j ∉ Γ:

KC(Γ, P) = |{n ∈ {1, . . . , N} : zC(n) ≠ 0 and ∀ j ∉ Γ, zj(n) ≠ 0}|.    (6)

We also define KC(Λ, P) = KC(P) and KC(∅, P) = 0. For Γ ⊂ Λ, KC(Γ, P) provides a penalty term due to the need for recovery of common component coefficients that are overlapped by innovations in all other signals j ∉ Γ. Intuitively, for each entry counted in KC(Γ, P), some sensor in Γ must take one measurement to account for that entry of the common component — it is impossible to recover such entries from measurements made by sensors outside of Γ. When all signals j ∈ Λ are considered, it is clear that all of the common component coefficients must be recovered from the obtained measurements.
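The quantities in Definitions 2 and 3 can be computed directly from the component supports; the helpers below are an illustrative sketch (the function names and the tiny example supports are assumptions, not notation from the paper):

```python
def overlap_size(gamma, common_support, innovation_supports):
    """K_C(Gamma, P): common-component indices overlapped by innovations at every sensor outside Gamma."""
    if not gamma:                                       # K_C(empty set, P) := 0
        return 0
    outside = [j for j in innovation_supports if j not in gamma]
    return sum(1 for n in common_support
               if all(n in innovation_supports[j] for j in outside))

def conditional_sparsity(gamma, common_support, innovation_supports):
    """K_cond(Gamma, P) = sum_{j in Gamma} K_j(P) + K_C(Gamma, P)."""
    return (sum(len(innovation_supports[j]) for j in gamma)
            + overlap_size(gamma, common_support, innovation_supports))

common_support = {0, 3, 7}                              # support of z_C  (K_C = 3)
innovation_supports = {1: {3, 5}, 2: {3, 9}}            # supports of z_1, z_2  (J = 2)
print(overlap_size({1}, common_support, innovation_supports))             # 1: z_C(3) is overlapped by z_2
print(conditional_sparsity({1, 2}, common_support, innovation_supports))  # 7 = K_C + K_1 + K_2
```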

4.3 Measurement bounds

Converse and achievable bounds for the number of measurements necessary for DCS recovery are given below. Our bounds consider each subset of sensors Γ ⊆ Λ, since the cost of sensing the common component can be amortized across sensors: it may be possible to reduce the rate at one sensor j1 ∈ Γ (up to a point), as long as other sensors in Γ offset the rate reduction. We quantify the reduction possible through the following definition.

Definition 3 The conditional sparsity of the set of signals Γ is the number of entries of the vector Θ that must be recovered by measurements yj, j ∈ Γ:

Kcond(Γ, P) = ( Σ_{j∈Γ} Kj(P) ) + KC(Γ, P).

The joint sparsity gives the number of degrees of freedom for the signals in Λ, while the conditional sparsity gives the number of degrees of freedom for signals in Γ when the signals in Λ \ Γ are available as side information. Note also that Definition 1 for joint sparsity can be extended to a subset of signals Γ by considering the number of entries of Θ that affect these signals:

Kjoint(Γ, P) = D − Kcond(Λ \ Γ, P) = ( Σ_{j∈Γ} Kj(P) ) + KC(P) − KC(Λ \ Γ, P).

Note that Kcond (Λ, P ) = Kjoint (Λ, P ) = D. The bipartite graph introduced in Section 4.1 is the cornerstone of Theorems 3, 4, and 5, which consider whether a perfect matching can be found in the graph; see the proofs in Appendices B, D, and E, respectively, for detail.

Theorem 3 (Achievable, known P) Assume that a signal ensemble X is obtained from a common/innovation component JSM P. Let M = (M1, M2, . . . , MJ) be a measurement tuple, let {Φj}j∈Λ be random matrices having Mj rows of i.i.d. Gaussian entries for each j ∈ Λ, and write Y = ΦX. Suppose there exists a full rank location matrix P ∈ PF(X) such that

Σ_{j∈Γ} Mj ≥ Kcond(Γ, P)    (7)

for all Γ ⊆ Λ. Then with probability one over {Φj}j∈Γ, there exists a unique solution Θ̂ to the system of equations Y = ΦPΘ̂; hence, the signal ensemble X can be uniquely recovered as X = PΘ̂.

Theorem 4 (Achievable, unknown P) Assume that a signal ensemble X and measurement matrices {Φj}j∈Λ follow the assumptions of Theorem 3. Suppose there exists a full rank location matrix P∗ ∈ PF(X) such that

Σ_{j∈Γ} Mj ≥ Kcond(Γ, P∗) + |Γ|    (8)

for all Γ ⊆ Λ. Then X can be uniquely recovered from Y with probability one over {Φj }j∈Γ .

Theorem 5 (Converse) Assume that a signal ensemble X and measurement matrices {Φj}j∈Λ follow the assumptions of Theorem 3. Suppose there exists a full rank location matrix P ∈ PF(X) such that

Σ_{j∈Γ} Mj < Kcond(Γ, P)    (9)


for some Γ ⊆ Λ. Then there exists a solution Θ̂ such that Y = ΦPΘ̂ but X̂ := PΘ̂ ≠ X.

The identification of a feasible location matrix P causes the gap of one measurement per sensor that prevents (8)–(9) from being a tight converse and achievable bound pair. We note in passing that the signal recovery procedure used in Theorem 4 is akin to ℓ0-norm minimization on X; see Appendix D for details.

4.4 Discussion

The bounds in Theorems 3–5 are dependent on the dimensionality of the subspaces in which the signals reside. The number of noiseless measurements required for ensemble recovery is determined by the dimensionality dim(S) of the subspace S in the relevant signal model, because dimensionality and sparsity play a volumetric role akin to the entropy H used to characterize rates in source coding. Whereas in source coding each bit resolves between two options, and 2^{NH} typical inputs are described using NH bits [12], in CS we have M = dim(S) + O(1). Similar to Slepian-Wolf coding [13], the number of measurements required for each sensor must account for the minimal features unique to that sensor, while at the same time features that appear among multiple sensors must be amortized over the group.

Theorems 3–5 can also be applied to the single sensor and joint measurement settings. In the single-signal setting (Theorem 1), we will have x = Pθ with θ ∈ R^K and Λ = {1}; Theorem 4 provides the requirement M ≥ K + 1. It is easy to show that the joint measurement setting is equivalent to the single-signal setting: we stack all the individual signals into a single signal vector, and in both cases all measurements are dependent on all the entries of the signal vector. However, the distribution of the measurements among the available sensors is irrelevant in a joint measurement setting. Therefore, we only obtain a necessary condition Σ_j Mj ≥ D + 1 on the total number of measurements required.

5 Practical Recovery Algorithms and Experiments

Although we have provided a unifying theoretical treatment for the three JSM models, the nuances warrant further study. In particular, while Theorem 4 highlights the basic tradeoffs that must be made in partitioning the measurement budget among sensors, the result does not by design provide insight into tractable algorithms for signal recovery. We believe there is additional insight to be gained by considering each model in turn, and while the presentation may be less unified, we attribute this to the fundamental diversity of problems that can arise under the umbrella of jointly sparse signal representations. In this section, we focus on tractable recovery algorithms for each model and, when possible, analyze the corresponding measurement requirements.

5.1 Recovery strategies for sparse common + innovations (JSM-1)

We first characterize the sparse common signal and innovations model JSM-1 from Section 3.3.1. For simplicity, we limit our description to J = 2 signals, but describe extensions to multiple signals as needed.

5.1.1 Measurement bounds for joint recovery

Under the JSM-1 model, separate recovery of the signal xj via ℓ0-norm minimization would require Kjoint({j}) + 1 = Kcond({j}) + 1 = KC + Kj − KC(Λ \ {j}) + 1 measurements, where KC(Λ \ {j}) accounts for sparsity reduction due to overlap between zC and zj. We apply Theorem 4 to the JSM-1 model to obtain the corollary below. To address the possibility of sparsity reduction, we denote by KR the number of indices in which the common component zC and all innovation components zj, j ∈ Λ, overlap; this results in sparsity reduction for the common component.

Corollary 1 Assume the measurement matrices {Φj}j∈Λ contain i.i.d. Gaussian entries. Then the signal ensemble X can be recovered with probability one if the following conditions hold:

Σ_{j∈Γ} Mj ≥ ( Σ_{j∈Γ} Kj ) + KC(Γ) + |Γ|,   Γ ≠ Λ,

Σ_{j∈Λ} Mj ≥ KC + ( Σ_{j∈Λ} Kj ) + J − KR.

Our joint recovery scheme provides significant savings in measurements, because the common component can be measured as part of any of the J signals.

5.1.2 Stochastic signal model for JSM-1

To give ourselves a firm footing for analysis, in the remainder of Section 5.1 we use a stochastic process for JSM-1 signal generation. This framework provides an information theoretic setting where we can scale the size of the problem and investigate which measurement rates enable recovery.

We generate the common and innovation components as follows. For n ∈ {1, . . . , N}, the decision whether zC(n) and zj(n) is zero or not is an i.i.d. Bernoulli process, where the probability of a nonzero value is given by parameters denoted SC and Sj, respectively. The values of the nonzero coefficients are then generated from an i.i.d. Gaussian distribution. The outcome of this process is that zC and zj have sparsities KC ∼ Binomial(N, SC) and Kj ∼ Binomial(N, Sj). The parameters Sj and SC are sparsity rates controlling the random generation of each signal. Our model resembles the Gaussian spike process [58], which is a limiting case of a Gaussian mixture model.

Likelihood of sparsity reduction and overlap: This stochastic model can yield signal ensembles for which the corresponding generating matrices P allow for sparsity reduction; specifically, there might be overlap between the supports of the common component zC and all the innovation components zj, j ∈ Λ. For J = 2, the probability that a given index is present in all supports is SR := SC S1 S2. Therefore, the distribution of the cardinality of this overlap is KR ∼ Binomial(N, SR). We must account for the reduction obtained from the removal of the corresponding number of columns from the location matrix P when the total number of measurements M1 + M2 is considered. In the same way we can show that the distributions for the number of indices in the overlaps required by Corollary 1 are KC({1}) ∼ Binomial(N, SC,{1}) and KC({2}) ∼ Binomial(N, SC,{2}), where SC,{1} := SC(1 − S1)S2 and SC,{2} := SC S1(1 − S2).

Measurement rate region: To characterize DCS recovery performance, we introduce a measurement rate region. We define the measurement rate Rj in an asymptotic manner as

Rj := lim_{N→∞} Mj / N,   j ∈ Λ.

Additionally, we note that

lim_{N→∞} KC / N = SC   and   lim_{N→∞} Kj / N = Sj,   j ∈ Λ.

Thus, we also set SXj = SC + Sj − SC Sj, j ∈ {1, 2}. For a measurement rate pair (R1, R2) and sources X1 and X2, we evaluate whether we can recover the signals with vanishing probability of error as N increases. In this case, we say that the measurement rate pair is achievable.
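A minimal sketch of this generation process (the helper name and seed are assumptions of ours; the rates match those used later in Figure 2):

```python
import numpy as np

def generate_jsm1(N, S_C, innovation_rates, rng):
    """Draw z_C and x_j = z_C + z_j under the Bernoulli-Gaussian JSM-1 model."""
    z_C = rng.standard_normal(N) * (rng.random(N) < S_C)        # common component, rate S_C
    signals = []
    for S_j in innovation_rates:
        z_j = rng.standard_normal(N) * (rng.random(N) < S_j)    # innovation component, rate S_j
        signals.append(z_C + z_j)
    return z_C, signals

rng = np.random.default_rng(7)
N, S_C, S_I = 1000, 0.2, 0.05
z_C, (x1, x2) = generate_jsm1(N, S_C, [S_I, S_I], rng)
print(np.count_nonzero(z_C) / N)                                # close to S_C = 0.2
```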

For jointly sparse signals under JSM-1, separate recovery via ℓ0-norm minimization would require a measurement rate Rj = SXj. Separate recovery via ℓ1-norm minimization would require an overmeasuring factor c(SXj), and thus the measurement rate would become Rj = SXj · c(SXj). To improve upon these figures, we adapt the standard machinery of CS to the joint recovery problem.

5.1.3 Joint recovery via ℓ1-norm minimization

As discussed in Section 2.3, solving an ℓ0-norm minimization is NP-hard, and so in practice we must relax our ℓ0 criterion in order to make the solution tractable. We now study what penalty must be paid for ℓ1-norm recovery of jointly sparse signals. Using the vector and frame

Z := \begin{bmatrix} z_C \\ z_1 \\ z_2 \end{bmatrix} \quad \text{and} \quad \widetilde{\Phi} := \begin{bmatrix} \Phi_1 & \Phi_1 & 0 \\ \Phi_2 & 0 & \Phi_2 \end{bmatrix},    (11)

we can represent the concatenated measurement vector Y sparsely using the concatenated coefficient vector Z, which contains KC + K1 + K2 − KR nonzero coefficients, to obtain Y = Φ̃Z. With sufficient overmeasuring, we have seen experimentally that it is possible to recover a vector Ẑ, which yields xj = ẑC + ẑj, j = 1, 2, by solving the weighted ℓ1-norm minimization

Ẑ = arg min γC ‖zC‖1 + γ1 ‖z1‖1 + γ2 ‖z2‖1   s.t.   Y = Φ̃Z,    (12)

where γC, γ1, γ2 ≥ 0. We call this the γ-weighted ℓ1-norm formulation; our numerical results (Section 5.1.6 and our technical report [59]) indicate a reduction in the requisite number of measurements via this enhancement. If K1 = K2 and M1 = M2, then without loss of generality we set γ1 = γ2 = 1 and numerically search for the best parameter γC. We discuss the asymmetric case with K1 = K2 and M1 ≠ M2 in the technical report [59].
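For concreteness, the γ-weighted formulation (12) can be prototyped with a generic convex solver; the sketch below uses the cvxpy package (an assumption of ours, not the paper's software) and expresses the constraint Y = Φ̃Z through the per-sensor equations yj = Φj(zC + zj):

```python
import numpy as np
import cvxpy as cp

def gamma_weighted_l1(y1, y2, Phi1, Phi2, gamma_C=1.0):
    """Solve (12) with gamma_1 = gamma_2 = 1 and a user-chosen gamma_C."""
    N = Phi1.shape[1]
    zC, z1, z2 = cp.Variable(N), cp.Variable(N), cp.Variable(N)
    objective = cp.Minimize(gamma_C * cp.norm1(zC) + cp.norm1(z1) + cp.norm1(z2))
    constraints = [Phi1 @ (zC + z1) == y1,      # y_1 = Phi_1 x_1 = Phi_1 (z_C + z_1)
                   Phi2 @ (zC + z2) == y2]      # y_2 = Phi_2 x_2 = Phi_2 (z_C + z_2)
    cp.Problem(objective, constraints).solve()
    return zC.value + z1.value, zC.value + z2.value   # estimates of x_1 and x_2
```

In practice one sweeps γC numerically (with γ1 = γ2 = 1), as described above, and declares success when both recovered signals match the originals to within numerical precision.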

5.1.4 Converse bound on performance of γ-weighted ℓ1-norm minimization

We now provide a converse bound that describes what measurement rate pairs cannot be achieved via the γ-weighted ℓ1-norm minimization. Our notion of a converse focuses on the setting where each signal xj is measured via multiplication by the Mj × N matrix Φj and joint recovery is performed via our γ-weighted ℓ1-norm formulation (12). Within this setting, a converse region is a set of measurement rates for which the recovery fails with overwhelming probability as N increases. We assume that the J = 2 sources have innovation sparsity rates that satisfy S1 = S2 = SI.

Our first result, proved in Appendix F, provides deterministic necessary conditions to recover the components zC, z1, and z2 using the γ-weighted ℓ1-norm formulation (12). We note that the lemma holds for all such combinations of components that generate the same signals x1 = zC + z1 and x2 = zC + z2.

Lemma 1 Consider any γC, γ1, and γ2 in the γ-weighted ℓ1-norm formulation (12). The components zC, z1, and z2 can be recovered using measurement matrices Φ1 and Φ2 only if (i) z1 can be recovered via ℓ1-norm minimization (3) using Φ1 and measurements Φ1z1; (ii) z2 can be recovered via ℓ1-norm minimization using Φ2 and measurements Φ2z2; and (iii) zC can be recovered via ℓ1-norm minimization using the joint matrix [Φ1^T Φ2^T]^T and measurements [Φ1^T Φ2^T]^T zC.

Lemma 1 can be interpreted as follows. If M1 and M2 are not large enough individually, then the innovation components z1 and z2 cannot be recovered. This implies a converse bound on the individual measurement rates R1 and R2. Similarly, combining Lemma 1 with the converse bound


Figure 2: Recovering a signal ensemble with sparse common + innovations (JSM-1). We chose a common component sparsity rate SC = 0.2 and innovation sparsity rates SI = S1 = S2 = 0.05. Our simulation results use the γ-weighted ℓ1-norm formulation (12) on signals of length N = 1000; the measurement rate pairs that achieved perfect recovery over 100 simulations are denoted by circles.

Anticipated converse: As shown in Corollary 1, for indices n such that x1(n) and x2(n) differ and are nonzero, each sensor must take measurements to account for one of the two coefficients. In the case where S1 = S2 = SI, the joint sparsity rate is SC + 2SI − SC SI². We define the measurement function c′(S) := S · c(S) based on Donoho and Tanner's oversampling factor c(S) (Theorem 2). It can be shown that the function c′(·) is concave; in order to minimize the sum rate bound, we "explain" as many of the sparse coefficients in one of the signals and as few as possible in the other. From Corollary 1, we have R1, R2 ≥ SI + SC SI − SC SI². Consequently, one of the signals must "explain" this sparsity rate, whereas the other signal must explain the rest:

[SC + 2SI − SC SI²] − [SI + SC SI − SC SI²] = SC + SI − SC SI.

Unfortunately, the derivation of c′(S) relies on Gaussianity of the measurement matrix, whereas in our case Φ has a block matrix form. Therefore, the following conjecture remains to be proved rigorously.

Conjecture 1 Let J = 2 and fix the sparsity rate of the common component SC and the innovation sparsity rates S1 = S2 = SI. Then the following conditions on the measurement rates are necessary to enable recovery with probability one:

Rj ≥ c′(SI + SC SI − SC SI²),  j = 1, 2,
R1 + R2 ≥ c′(SI + SC SI − SC SI²) + c′(SC + SI − SC SI).
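To make the conjectured region easy to evaluate numerically, the short sketch below computes the two bounds of Conjecture 1 for given sparsity rates; the oversampling factor c(S) is passed in as a user-supplied callable (its closed form is not reproduced here), and the function names are ours.

# Sketch: conjectured converse bounds of Conjecture 1 for JSM-1 with J = 2.
# c_of_S is a user-supplied approximation of the oversampling factor c(S).
def cprime(S, c_of_S):
    """Measurement function c'(S) = S * c(S)."""
    return S * c_of_S(S)

def conjecture1_bounds(SC, SI, c_of_S):
    per_sensor = cprime(SI + SC * SI - SC * SI**2, c_of_S)     # bound on each Rj
    sum_rate = per_sensor + cprime(SC + SI - SC * SI, c_of_S)  # bound on R1 + R2
    return per_sensor, sum_rate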

5.1.5 Achievable bound on performance of ℓ1-norm minimization

We have not yet characterized the performance of the γ-weighted ℓ1-norm formulation (12) analytically. Instead, Theorem 6 below uses an alternative ℓ1-norm based recovery technique. The proof describes a constructive recovery algorithm. We construct measurement matrices Φ1 and Φ2, each consisting of two parts. The first parts of the matrices are identical and recover x1 − x2. The second parts of the matrices are different and enable the recovery of ½x1 + ½x2. Once these two components have been recovered, the computation of x1 and x2 is straightforward. The measurement rate can be computed by considering both identical and different parts of the measurement matrices.

Theorem 6 Let J = 2, N → ∞, and fix the sparsity rate of the common component SC and the innovation sparsity rates S1 = S2 = SI. If the measurement rates satisfy the following conditions:

Rj > c′(2SI − SI²),  j = 1, 2,        (13a)
R1 + R2 > c′(2SI − SI²) + c′(SC + 2SI − 2SC SI − SI² + SC SI²),        (13b)

then we can design measurement matrices Φ1 and Φ2 with random Gaussian entries and an ℓ1-norm minimization recovery algorithm that succeeds with probability approaching one as N increases. Furthermore, as SI → 0 the sum measurement rate approaches c′(SC).

The theorem is proved in Appendix G. The recovery algorithm of Theorem 6 is based on linear programming. It can be extended from J = 2 to an arbitrary number of signals by recovering all signal differences of the form xj1 − xj2 in the first stage of the algorithm and then recovering (1/J) Σ_j xj in the second stage. In contrast, our γ-weighted ℓ1-norm formulation (12) recovers a length-JN signal. Our simulation experiments (Section 5.1.6) indicate that the γ-weighted formulation can recover using fewer measurements than the approach of Theorem 6.

The achievable measurement rate region of Theorem 6 is loose with respect to the region of the anticipated converse Conjecture 1 (see Figure 2). We leave for future work the characterization of a tight measurement rate region for computationally tractable (polynomial time) recovery techniques.
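The two-stage construction behind Theorem 6 can be sketched as follows for J = 2. This is our own illustrative rendering using a generic ℓ1 solver posed with cvxpy, with each sensor's measurements assumed to be stacked as [ΦD; ΦA,j]; it is not the exact procedure analyzed in Appendix G.

# Sketch: two-stage l1 recovery for Theorem 6 (J = 2).
# Phi_D is shared by both sensors; Phi_A1, Phi_A2 are sensor-specific.
import numpy as np
import cvxpy as cp

def l1_recover(A, b):
    """Standard l1-norm minimization: min ||x||_1 s.t. A x = b."""
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()
    return x.value

def theorem6_recover(Phi_D, Phi_A1, Phi_A2, y1, y2):
    MD = Phi_D.shape[0]
    yD1, yA1 = y1[:MD], y1[MD:]        # measurements split as [Phi_D; Phi_A1] x1
    yD2, yA2 = y2[:MD], y2[MD:]
    # Stage 1: recover the sparse difference x1 - x2 from the shared rows.
    d = l1_recover(Phi_D, yD1 - yD2)
    # Stage 2: form measurements of the average (x1 + x2)/2 and recover it.
    A = np.vstack([Phi_D, Phi_A1, Phi_A2])
    b = np.concatenate([yD1 - 0.5 * Phi_D @ d,
                        yA1 - 0.5 * Phi_A1 @ d,
                        yA2 + 0.5 * Phi_A2 @ d])
    avg = l1_recover(A, b)
    return avg + 0.5 * d, avg - 0.5 * d   # x1, x2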

5.1.6 Simulations for JSM-1

We now present simulation results for several different JSM-1 settings. The γ-weighted ℓ1-norm formulation (12) was used throughout, where the optimal choice of γC, γ1, and γ2 depends on the relative sparsities KC, K1, and K2. The optimal values have not been determined analytically. Instead, we rely on a numerical optimization, which is computationally intense. A detailed discussion of our intuition behind the choice of γ appears in the technical report [59].

Recovering two signals with symmetric measurement rates: Our simulation setting is as follows. The signal components zC, z1, and z2 are assumed (without loss of generality) to be sparse in Ψ = IN with sparsities KC, K1, and K2, respectively. We assign random Gaussian values to the nonzero coefficients (a minimal sketch of this signal generation appears at the end of this subsection). We restrict our attention to the symmetric setting in which K1 = K2 and M1 = M2, and consider signals of length N = 50 where KC + K1 + K2 = 15. In our joint decoding simulations, we consider values of M1 and M2 in the range between 10 and 40. We find the optimal γC in the γ-weighted ℓ1-norm formulation (12) using a line search optimization, where simulation indicates the "goodness" of specific γC values in terms of the likelihood of recovery. With the optimal γC, for each set of values we run several thousand trials to determine the empirical probability of success in decoding z1 and z2. The results of the simulation are summarized in Figure 3. The savings in the number of measurements M can be substantial, especially when the common component KC is large (Figure 3). For KC = 11, K1 = K2 = 2, M is reduced by approximately 30%. For smaller KC, joint decoding barely outperforms separate decoding, since most of the measurements are expended on innovation components. Additional results appear in [59].

Recovering two signals with asymmetric measurement rates: In Figure 2, we compare separate CS recovery with the anticipated converse bound of Conjecture 1, the achievable bound of Theorem 6, and numerical results.


Figure 3: Comparison of joint decoding and separate decoding for JSM-1. The advantage of joint over separate decoding depends on the common component sparsity. (The two panels plot the probability of exact reconstruction versus the number of measurements per signal M, for KC = 11, K1 = K2 = 2, γC = 0.905 and for KC = 3, K1 = K2 = 6, γC = 1.425; both use N = 50.)


Figure 4: Multi-sensor measurement results for JSM-1. We choose a common component sparsity rate SC = 0.2, innovation sparsity rates SI = 0.05, and signals of length N = 500; our results demonstrate a reduction in the measurement rate per sensor as the number of sensors J increases.

We use J = 2 signals and choose a common component sparsity rate SC = 0.2 and innovation sparsity rates SI = S1 = S2 = 0.05. We consider several different asymmetric measurement rates. In each such setting, we constrain M2 to have the form M2 = αM1 for some α, with N = 1000. The results plotted indicate the smallest pairs (M1, M2) for which we always succeeded in recovering the signal over 100 simulation runs. In some areas of the measurement rate region our γ-weighted ℓ1-norm formulation (12) requires fewer measurements than the achievable approach of Theorem 6.

Recovering multiple signals with symmetric measurement rates: The γ-weighted ℓ1-norm recovery technique of this section is especially promising when J > 2 sensors are used. These savings may be valuable in applications such as sensor networks, where data may contain strong spatial (inter-source) correlations. We use J ∈ {1, 2, . . . , 10} signals and choose the same sparsity rates SC = 0.2 and SI = 0.05 as in the asymmetric rate simulations; here we use symmetric measurement rates and let N = 500. The results of Figure 4 describe the smallest symmetric measurement rates for which we always succeeded in recovering the signal over 100 simulation runs. As J increases, lower measurement rates can be used; the results compare favorably with the lower bound from Conjecture 1, which gives Rj ≈ 0.232 as J → ∞.
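For reference, the sketch below generates a JSM-1 test ensemble of the kind used in the symmetric-rate simulations (sparse components in Ψ = IN with Gaussian nonzero values, Gaussian measurement matrices). It is a minimal numpy sketch with parameter names of our own choosing; the γC line search simply wraps a recovery routine such as the one sketched after (12).

# Sketch: generate a JSM-1 ensemble and measurements for the symmetric setting.
import numpy as np

def generate_jsm1(N=50, KC=11, K1=2, K2=2, M=20, seed=0):
    rng = np.random.default_rng(seed)
    def sparse_vec(K):
        v = np.zeros(N)
        v[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
        return v
    zC, z1, z2 = sparse_vec(KC), sparse_vec(K1), sparse_vec(K2)
    x1, x2 = zC + z1, zC + z2
    Phi1, Phi2 = rng.standard_normal((M, N)), rng.standard_normal((M, N))
    return (x1, x2), (Phi1, Phi2), (Phi1 @ x1, Phi2 @ x2)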

5.2 Recovery strategies for common sparse supports (JSM-2)

Under the JSM-2 signal ensemble model from Section 3.3.2, separate recovery of each signal via ℓ0-norm minimization would require K + 1 measurements per signal, while separate recovery via ℓ1-norm minimization would require cK measurements per signal. When Theorems 4 and 5 are applied in the context of JSM-2, the bounds for joint recovery match those of individual recovery using ℓ0-norm minimization. Within this context, it is also possible to recover one of the signals using K + 1 measurements from the corresponding sensor, and then, with the prior knowledge of the support set Ω, recover all other signals from K measurements per sensor, thus providing an additional savings of J − 1 measurements [60]. Surprisingly, we will demonstrate below that for large J, the common support set can actually be recovered using only one measurement per sensor and algorithms that are computationally tractable. The algorithms we propose are inspired by conventional greedy pursuit algorithms for CS (such as OMP [26]). In the single-signal case, OMP iteratively constructs the sparse support set Ω; decisions are based on inner products between the columns of Φ and a residual. In the multi-signal case, there are more clues available for determining the elements of Ω.

5.2.1 Recovery via Trivial Pursuit (TP)

When there are many correlated signals in the ensemble, a simple non-iterative greedy algorithm based on inner products will suffice to recover the signals jointly. For simplicity but without loss of generality, we assume that an equal number of measurements Mj = M are taken of each signal. We write Φj in terms of its columns, with Φj = [φj,1, φj,2, . . . , φj,N].

Trivial Pursuit (TP) Algorithm for JSM-2

1. Get greedy: Given all of the measurements, compute the test statistics

ξn = (1/J) Σ_{j=1}^{J} ⟨yj, φj,n⟩²,  n ∈ {1, 2, . . . , N},        (14)

and estimate the elements of the common coefficient support set by

Ω̂ = {n having one of the K largest ξn}.

When the sparse, nonzero coefficients are sufficiently generic (as defined below), we have the following surprising result, which is proved in Appendix H.

Theorem 7 Let Ψ be an orthonormal basis for R^N, let the measurement matrices Φj contain i.i.d. Gaussian entries, and assume that the nonzero coefficients in the θj are i.i.d. Gaussian random variables. Then with M ≥ 1 measurements per signal, TP recovers Ω with probability approaching one as J → ∞.

In words, with fewer than K measurements per sensor, it is actually possible to recover the sparse support set Ω under the JSM-2 model.⁵ Of course, this approach does not recover the K coefficient values for each signal; at least K measurements per sensor are required for this.

⁵ One can also show the somewhat stronger result that, as long as Σ_j Mj ≫ N, TP recovers Ω with probability approaching one. We have omitted this additional result for brevity.

Corollary 2 Assume that the nonzero coefficients in the θj are i.i.d. Gaussian random variables. Then the following statements hold:

1. Let the measurement matrices Φj contain i.i.d. Gaussian entries, with each matrix having an overmeasuring factor of c = 1 (that is, Mj = K for each measurement matrix Φj). Then TP recovers all signals from the ensemble {xj} with probability approaching one as J → ∞.

2. Let Φj be a measurement matrix with overmeasuring factor c < 1 (that is, Mj < K), for some j ∈ Λ. Then with probability one, the signal xj cannot be uniquely recovered by any algorithm for any value of J.

The first statement is an immediate corollary of Theorem 7; the second statement follows because each equation yj = Φj xj would be underdetermined even if the nonzero indices were known. Thus, under the JSM-2 model, the TP algorithm asymptotically performs as well as an oracle decoder that has prior knowledge of the locations of the sparse coefficients. From an information theoretic perspective, Corollary 2 provides tight achievable and converse bounds for JSM-2 signals.

We should note that the theorems in this section have a slightly different flavor than Theorems 4 and 5, which ensure recovery of any sparse signal ensemble, given a suitable set of measurement matrices. Theorem 7 and Corollary 2 above, in contrast, rely on a random signal model and do not guarantee simultaneous performance for all sparse signals under any particular measurement ensemble. Nonetheless, we feel this result is worth presenting to highlight the strong subspace concentration behavior that enables the correct identification of the common support.

In the technical reports [59, 61], we derive an approximate formula for the probability of error in recovering the common support set Ω given J, K, M, and N. While theoretically interesting and potentially practically useful, these results require J to be large. Our numerical experiments show that the number of measurements required for recovery using TP decreases quickly as J increases. However, in the case of small J, TP performs poorly. Hence, we propose next an alternative recovery technique based on simultaneous greedy pursuit that performs well for small J.
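The TP statistic (14) and the support estimate are only a few lines of numpy; the sketch below is ours and uses generic names.

# Sketch: Trivial Pursuit (TP) support estimation for JSM-2, eq. (14).
import numpy as np

def tp_support(Phis, ys, K):
    """Phis: list of M x N matrices, ys: list of measurement vectors."""
    J = len(Phis)
    N = Phis[0].shape[1]
    xi = np.zeros(N)
    for Phi, y in zip(Phis, ys):
        xi += (Phi.T @ y) ** 2            # <y_j, phi_{j,n}>^2 for every column n
    xi /= J
    return np.sort(np.argsort(xi)[-K:])   # indices of the K largest statistics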

5.2.2 Recovery via iterative greedy pursuit

In practice, the common sparse support among the J signals enables a fast iterative algorithm to recover all of the signals jointly. Tropp and Gilbert have proposed one such algorithm, called Simultaneous Orthogonal Matching Pursuit (SOMP) [53], which can be readily applied in our DCS framework. SOMP is a variant of OMP that seeks to identify Ω one element at a time. A similar simultaneous sparse approximation algorithm has been proposed using convex optimization [62]. We dub the DCS-tailored SOMP algorithm DCS-SOMP. To adapt the original SOMP algorithm to our setting, we first extend it to cover a different measurement matrix Φj for each signal xj . Then, in each DCS-SOMP iteration, we select the column index n ∈ {1, 2, . . . , N } that accounts for the greatest amount of residual energy across all signals. As in SOMP, we orthogonalize the remaining columns (in each measurement matrix) after each step; after convergence we obtain an expansion of the measurement vector yj on an orthogonalized subset of the columns of basis vectors. To obtain the expansion coefficients in the sparse basis, we then reverse the orthogonalization process using the QR matrix factorization. Finally, we again assume that Mj = M measurements per signal are taken.

DCS-SOMP Algorithm for JSM-2

1. Initialize: Set the iteration counter ℓ = 1. For each signal index j ∈ Λ, initialize the orthogonalized coefficient vectors β̂j = 0, β̂j ∈ R^M; also initialize the set of selected indices Ω̂ = ∅. Let rj,ℓ denote the residual of the measurement yj remaining after the first ℓ iterations, and initialize rj,0 = yj.

2. Select the dictionary vector that maximizes the value of the sum of the magnitudes of the projections of the residual, and add its index to the set of selected indices:

nℓ = arg max_{n ∈ {1,...,N}} Σ_{j=1}^{J} |⟨rj,ℓ−1, φj,n⟩| / ‖φj,n‖2,
Ω̂ = [Ω̂ nℓ].

3. Orthogonalize the selected basis vector against the orthogonalized set of previously selected dictionary vectors:

γj,ℓ = φj,nℓ − Σ_{t=0}^{ℓ−1} (⟨φj,nℓ, γj,t⟩ / ‖γj,t‖2²) γj,t.

4. Iterate: Update the estimate of the coefficients for the selected vector and the residuals:

β̂j(ℓ) = ⟨rj,ℓ−1, γj,ℓ⟩ / ‖γj,ℓ‖2²,
rj,ℓ = rj,ℓ−1 − (⟨rj,ℓ−1, γj,ℓ⟩ / ‖γj,ℓ‖2²) γj,ℓ.

5. Check for convergence: If ‖rj,ℓ‖2 > ε‖yj‖2 for all j, then increment ℓ and go to Step 2; otherwise, continue to Step 6. The parameter ε determines the target error power level allowed for algorithm convergence.

6. De-orthogonalize: Consider the relationship between Γj = [γj,1, γj,2, . . . , γj,M] and Φj given by the QR factorization Φj,Ω̂ = Γj Rj, where Φj,Ω̂ = [φj,n1, φj,n2, . . . , φj,nM] is the so-called mutilated basis.⁶ Since yj = Γj βj = Φj,Ω̂ xj,Ω̂ = Γj Rj xj,Ω̂, where xj,Ω̂ is the mutilated coefficient vector, we can compute the signal estimates {x̂j} as

x̂j,Ω̂ = Rj⁻¹ β̂j,

where x̂j,Ω̂ is the mutilated version of the sparse coefficient vector x̂j.

In practice, we obtain ĉK measurements from each signal xj for some value of ĉ. We then use DCS-SOMP to recover the J signals jointly. We orthogonalize because as the number of iterations approaches M the norms of the residues of an orthogonal pursuit decrease faster than for a non-orthogonal pursuit; indeed, due to Step 3 the algorithm can only run for up to M iterations. The computational complexity of this algorithm is O(JNM²), which matches that of separate recovery for each signal while reducing the required number of measurements. Thanks to the common sparsity structure among the signals, we believe (but have not proved) that DCS-SOMP will succeed with ĉ < c(S). Empirically, we have observed that a small number of measurements proportional to K suffices for a moderate number of sensors J. Based on our observations, described in Section 5.2.3, we conjecture that K + 1 measurements per sensor suffice as J → ∞. Thus, this efficient greedy algorithm enables an overmeasuring factor ĉ = (K + 1)/K that approaches 1 as J, K, and N increase.

⁶ We define a mutilated basis ΦΩ as a subset of the basis vectors from Φ = [φ1, φ2, . . . , φN] corresponding to the indices given by the set Ω = {n1, n2, . . . , nM}, that is, ΦΩ = [φn1, φn2, . . . , φnM]. This concept can be extended to vectors in the same manner.
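A compact sketch of DCS-SOMP follows. The selection, orthogonalization, and residual updates mirror Steps 2-4 above; for the final coefficients we solve a least-squares problem on the selected columns, which yields the same estimates as the QR-based de-orthogonalization of Step 6. Function and variable names are ours.

# Sketch: DCS-SOMP for JSM-2.
import numpy as np

def dcs_somp(Phis, ys, max_iter, eps=1e-6):
    J, N = len(Phis), Phis[0].shape[1]
    residuals = [y.astype(float).copy() for y in ys]
    orth = [[] for _ in range(J)]          # orthogonalized selected columns
    support = []
    for _ in range(max_iter):
        # Step 2: column with the largest summed normalized correlation.
        score = np.zeros(N)
        for Phi, r in zip(Phis, residuals):
            score += np.abs(Phi.T @ r) / np.linalg.norm(Phi, axis=0)
        n_sel = int(np.argmax(score))
        support.append(n_sel)
        for j in range(J):
            # Step 3: orthogonalize against previously selected columns.
            gamma = Phis[j][:, n_sel].astype(float).copy()
            for g in orth[j]:
                gamma -= (g @ gamma) / (g @ g) * g
            orth[j].append(gamma)
            # Step 4: update the residual.
            residuals[j] -= (residuals[j] @ gamma) / (gamma @ gamma) * gamma
        # Step 5: convergence check.
        if all(np.linalg.norm(r) <= eps * np.linalg.norm(y)
               for r, y in zip(residuals, ys)):
            break
    # Step 6 (equivalent least-squares form): coefficients on the selected support.
    estimates = []
    for Phi, y in zip(Phis, ys):
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(N)
        x[support] = coeffs
        estimates.append(x)
    return estimates, support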


Figure 5: Recovering a signal ensemble with common sparse supports (JSM-2). We plot the probability of perfect recovery via DCS-SOMP (solid lines) and separate CS recovery (dashed lines) as a function of the number of measurements per signal M and the number of signals J. We fix the signal length to N = 50, the sparsity to K = 5, and average over 1000 simulation runs. An oracle encoder that knows the positions of the large signal expansion coefficients would use 5 measurements per signal.

5.2.3 Simulations for JSM-2

We now present simulations comparing separate CS recovery versus joint DCS-SOMP recovery for a JSM-2 signal ensemble. Figure 5 plots the probability of perfect recovery corresponding to various numbers of measurements M as the number of sensors varies from J = 1 to 32, over 1000 trials in each case. We fix the signal lengths at N = 50 and the sparsity of each signal to K = 5.

With DCS-SOMP, for perfect recovery of all signals the average number of measurements per signal decreases as a function of J. The trend suggests that for large J close to K measurements per signal should suffice. In contrast, with separate CS recovery, for perfect recovery of all signals the number of measurements per sensor increases as a function of J. This occurs because each signal experiences an independent probability p ≤ 1 of successful recovery; therefore the overall probability of complete success is p^J. Consequently, each sensor must compensate by making additional measurements. This phenomenon further motivates joint recovery under JSM-2.

Finally, we note that we can use algorithms other than DCS-SOMP to recover the signals under the JSM-2 model. Cotter et al. [55] have proposed additional algorithms (such as M-FOCUSS) that iteratively eliminate basis vectors from the dictionary and converge to the set of sparse basis vectors over which the signals are supported. We hope to extend such algorithms to JSM-2 in future work.

5.3 Recovery strategies for nonsparse common component + sparse innovations (JSM-3)

The JSM-3 signal ensemble model from Section 3.3.3 provides a particularly compelling motivation for joint recovery. Under this model, no individual signal xj is sparse, and so recovery of each signal separately would require fully N measurements per signal. As in the other JSMs, however, the commonality among the signals makes it possible to substantially reduce this number. Again, the potential for this savings is evidenced by specializing Theorem 4 to the context of JSM-3.


Corollary 3 If Φj is a random Gaussian matrix for all j ∈ Λ, P is defined by JSM-3, and

Σ_{j∈Γ} Mj ≥ Σ_{j∈Γ} Kj + KC(Γ, P) + |Γ|,  Γ ⊆ Λ,        (15)
Σ_{j∈Λ} Mj ≥ Σ_{j∈Λ} Kj + N + J − KR,        (16)

then the signal ensemble X can be uniquely recovered from Y with probability one.

This suggests that the number of measurements of an individual signal can be substantially decreased, as long as the total number of measurements is sufficiently large to capture enough information about the nonsparse common component zC. The term KR denotes the number of indices where the common and all innovation components overlap, and appears due to the sparsity reduction that can be performed at the common component before recovery. We also note that when the supports of the innovations are independent, as J → ∞, it becomes increasingly unlikely that a given index will be included in all innovations, and thus the terms KC(Γ, P) and KR will go to zero. On the other hand, when the supports are completely matched (implying Kj = K, j ∈ Λ), we will have KR = K, and after sparsity reduction has been addressed, KC(Γ, P) = 0 for all Γ ⊆ Λ.

5.3.1 Recovery via Transpose Estimation of Common Component (TECC)

Successful recovery of the signal ensemble {xj} requires recovery of both the nonsparse common component zC and the sparse innovations {zj}. To help build intuition about how we might accomplish signal recovery using far fewer than N measurements per sensor, consider the following thought experiment. If zC were known, then each innovation zj could be estimated using the standard single-signal CS machinery on the adjusted measurements yj − Φj zC = Φj zj. While zC is not known in advance, it can be estimated from the measurements. In fact, across all J sensors, a total of Σ_{j∈Λ} Mj random projections of zC are observed (each corrupted by a contribution from one of the zj). Since zC is not sparse, it cannot be recovered via CS techniques, but when the number of measurements is sufficiently large (Σ_{j∈Λ} Mj ≫ N), zC can be estimated using standard tools from linear algebra. A key requirement for such a method to succeed in recovering zC is that each Φj be different, so that their rows combine to span all of R^N. In the limit (again, assuming the sparse innovation coefficients are well-behaved), the common component zC can be recovered while still allowing each sensor to operate at the minimum measurement rate dictated by the {zj}. A prototype algorithm is listed below, where we assume that each measurement matrix Φj has i.i.d. N(0, σj²) entries.

TECC Algorithm for JSM-3

1. Estimate common component: Define the matrix Φ̂ as the vertical concatenation of the regularized individual measurement matrices Φ̂j = (1/(Mj σj²)) Φj, that is, Φ̂ = [Φ̂1^T, Φ̂2^T, . . . , Φ̂J^T]^T. Calculate the estimate of the common component as ẑC = (1/J) Φ̂^T Y.

2. Estimate measurements generated by innovations: Using the previous estimate, subtract the contribution of the common part from the measurements and generate estimates for the measurements caused by the innovations for each signal: ŷj = yj − Φj ẑC.

3. Recover innovations: Using a standard single-signal CS recovery algorithm,⁷ obtain estimates of the innovations ẑj from the estimated innovation measurements ŷj.

⁷ For tractable analysis of the TECC algorithm, the proof of Theorem 8 employs a least-squares variant of ℓ0-norm minimization.


4. Obtain signal estimates: Estimate each signal as the sum of the common and innovation estimates; that is, x̂j = ẑC + ẑj.
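A minimal numpy sketch of TECC is given below. Step 3 substitutes a simple OMP routine for the ℓ0-based estimator used in the analysis of Theorem 8, and all names are ours.

# Sketch: TECC recovery for JSM-3.
import numpy as np

def omp(Phi, y, K):
    """Minimal OMP: pick K columns greedily, then solve least squares."""
    support, r = [], y.copy()
    for _ in range(K):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ sol
    x = np.zeros(Phi.shape[1])
    x[support] = sol
    return x

def tecc(Phis, ys, Ks, sigmas):
    J = len(Phis)
    # Step 1: regularized concatenation and common-component estimate.
    Phi_hat = np.vstack([Phi / (Phi.shape[0] * s**2) for Phi, s in zip(Phis, sigmas)])
    Y = np.concatenate(ys)
    zC_hat = Phi_hat.T @ Y / J
    xs = []
    for Phi, y, K in zip(Phis, ys, Ks):
        y_innov = y - Phi @ zC_hat        # Step 2: remove the common part.
        z_hat = omp(Phi, y_innov, K)      # Step 3: recover the innovation.
        xs.append(zC_hat + z_hat)         # Step 4: combine.
    return xs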

The following theorem, proved in Appendix I, shows that asymptotically, by using the TECC algorithm, each sensor needs only to measure at the rate dictated by the sparsity Kj.

Theorem 8 Assume that the nonzero expansion coefficients of the sparse innovations zj are i.i.d. Gaussian random variables and that their locations are uniformly distributed on {1, 2, . . . , N}. Let the measurement matrices Φj contain i.i.d. N(0, σj²) entries with Mj ≥ Kj + 1. Then each signal xj can be recovered using the TECC algorithm with probability approaching one as J → ∞.

For large J, the measurement rates permitted by Theorem 8 are the best possible for any recovery strategy for JSM-3 signals, even neglecting the presence of the nonsparse component. These rates meet the minimum bounds suggested by Corollary 3, although again Theorem 8 is of a slightly different flavor, as it does not provide a uniform guarantee for all sparse signal ensembles under any particular measurement matrix collection. The CS technique employed in Theorem 8 involves combinatorial searches that estimate the innovation components; we have provided the theorem simply as support for our intuitive development of the TECC algorithm. More efficient techniques could also be employed (including several proposed for CS in the presence of noise [25, 63, 64]). It is reasonable to expect similar behavior; as the error in estimating the common component diminishes, these techniques should perform similarly to their noiseless analogues.

5.3.2 Recovery via Alternating Common and Innovation Estimation (ACIE)

The preceding analysis demonstrates that the number of required measurements in JSM-3 can be substantially reduced through joint recovery. While Theorem 8 shows theoretical gains as J → ∞, practical gains can also be realized with a moderate number of sensors. In particular, suppose in the TECC algorithm that the initial estimate ẑC is not accurate enough to enable correct identification of the sparse innovation supports {Ωj}. In such a case, it may still be possible for a rough approximation of the innovations {zj} to help refine the estimate ẑC. This in turn could help to refine the estimates of the innovations. The Alternating Common and Innovation Estimation (ACIE) algorithm exploits the observation that once the basis vectors comprising the innovation zj have been identified in the index set Ωj, their effect on the measurements yj can be removed to aid in estimating zC.

Suppose that we have an estimate for these innovation basis vectors in Ω̂j. We can then partition the measurements into two parts: the projection into span({φj,n}_{n∈Ω̂j}) and the component orthogonal to that span. We build a basis for R^Mj, where yj lives:

Bj = [Φj,Ω̂j  Qj],

where Φj,Ω̂j is the mutilated matrix Φj corresponding to the indices in Ω̂j, and the Mj × (Mj − |Ω̂j|) matrix Qj has orthonormal columns that span the orthogonal complement of Φj,Ω̂j. This construction allows us to remove the projection of the measurements into the aforementioned span to obtain measurements caused exclusively by vectors not in Ω̂j:

ỹj = Qj^T yj   and   Φ̃j = Qj^T Φj.        (17)

These modifications enable the sparse decomposition of the measurement, which now lives in R^(Mj − |Ω̂j|), to remain unchanged:

ỹj = Σ_{n=1}^{N} αj(n) φ̃j,n.

Thus, the modified measurements Ỹ = [ỹ1^T ỹ2^T . . . ỹJ^T]^T and modified measurement matrix Φ̃ = [Φ̃1^T Φ̃2^T . . . Φ̃J^T]^T can be used to refine the estimate of the common component of the signal,

z̃C = Φ̃† Ỹ,        (18)

where A† = (A^T A)⁻¹ A^T denotes the pseudoinverse of matrix A.

When the innovation support estimate is correct (Ω̂j = Ωj), the measurements ỹj will describe only the common component zC. If this is true for every signal j and the number of remaining measurements Σ_{j=1}^{J} (Mj − Kj) ≥ N, then zC can be perfectly recovered via (18). However, it may be difficult to obtain correct estimates for all signal supports in the first iteration of the algorithm, and so we find it preferable to refine the estimate of the support by executing several iterations.

ACIE Algorithm for JSM-3

1. Initialize: Set Ω̂j = ∅ for each j. Set the iteration counter ℓ = 1.

2. Estimate common component: Update the estimate z̃C according to (17)–(18).

3. Estimate innovation supports: For each sensor j, after subtracting the contribution of z̃C from the measurements, ŷj = yj − Φj z̃C, estimate the support of each signal innovation Ω̂j.

4. Iterate: If ℓ < L, a preset number of iterations, then increment ℓ and return to Step 2. Otherwise proceed to Step 5.

5. Estimate innovation coefficients: For each j, estimate the coefficients for the indices in Ω̂j,

ẑj,Ω̂j = Φ†_{j,Ω̂j} (yj − Φj z̃C),

where ẑj,Ω̂j is a mutilated version of the innovation's sparse coefficient vector estimate ẑj.

6. Recover signals: Compute the estimate of each signal as x̂j = z̃C + ẑj.
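The ACIE iterations can be sketched compactly; support estimation in Step 3 is delegated to a caller-supplied routine (e.g., a few OMP iterations, or DCS-SOMP when the supports are shared), and all names are ours.

# Sketch: ACIE for JSM-3. `estimate_support(Phi, y)` is caller-supplied and
# returns a list of innovation-support indices.
import numpy as np
from scipy.linalg import null_space

def acie(Phis, ys, estimate_support, L=10):
    J, N = len(Phis), Phis[0].shape[1]
    supports = [[] for _ in range(J)]
    zC_tilde = np.zeros(N)
    for _ in range(L):
        # Step 2: project out the estimated innovation columns (17) and refine
        # the common component via the pseudoinverse (18).
        Phi_tildes, y_tildes = [], []
        for Phi, y, Om in zip(Phis, ys, supports):
            Q = null_space(Phi[:, Om].T) if Om else np.eye(Phi.shape[0])
            Phi_tildes.append(Q.T @ Phi)
            y_tildes.append(Q.T @ y)
        zC_tilde = np.linalg.pinv(np.vstack(Phi_tildes)) @ np.concatenate(y_tildes)
        # Step 3: re-estimate the innovation supports.
        supports = [estimate_support(Phi, y - Phi @ zC_tilde)
                    for Phi, y in zip(Phis, ys)]
    # Steps 5-6: innovation coefficients on the final supports, then signals.
    xs = []
    for Phi, y, Om in zip(Phis, ys, supports):
        z = np.zeros(N)
        if Om:
            z[Om] = np.linalg.pinv(Phi[:, Om]) @ (y - Phi @ zC_tilde)
        xs.append(zC_tilde + z)
    return xs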

Estimation of the supports in Step 3 can be accomplished using a variety of techniques. We propose to run a fixed number of iterations of OMP; if the supports of the innovations are known to match across signals — as in JSM-2 — then more powerful algorithms like SOMP can be used. The ACIE algorithm is similar in spirit to other iterative estimation algorithms, such as turbo decoding [65].

5.3.3 Simulations for JSM-3

We now present simulations of JSM-3 recovery for the following scenario. Consider J signals of length N = 50 containing a common white noise component zC(n) ∼ N(0, 1) for n ∈ {1, . . . , N}. Each innovations component zj has sparsity K = 5 (once again in the time domain), resulting in xj = zC + zj. The signals are generated according to the model used in Section 5.1.6.

We study two different cases. The first is an extension of JSM-1: we select the supports for the various innovations separately and then apply OMP to each signal in Step 3 of the ACIE algorithm in order to estimate its innovations component. The second case is an extension of JSM-2: we select one common support for all of the innovations across the signals and then apply the DCS-SOMP algorithm (Section 5.2.2) to estimate the innovations in Step 3. In both cases we use L = 10 iterations of ACIE.


Figure 6: Recovering a signal ensemble with nonsparse common component and sparse innovations (JSM-3) using ACIE. (a) recovery using OMP separately on each signal in Step 3 of the ACIE algorithm (innovations have arbitrary supports). (b) recovery using DCS-SOMP jointly on all signals in Step 3 of the ACIE algorithm (innovations have identical supports). Signal length N = 50, sparsity K = 5. The common structure exploited by DCS-SOMP enables dramatic savings in the number of measurements. We average over 1000 simulation runs.

We test the algorithms for different numbers of signals J and calculate the probability of correct recovery as a function of the (same) number of measurements per signal M.

Figure 6(a) shows that, for sufficiently large J, we can recover all of the signals with significantly fewer than N measurements per signal. As J grows, it becomes more difficult to perfectly recover all J signals. We believe this is inevitable, because even if zC were known without error, then perfect ensemble recovery would require the successful execution of J independent runs of OMP. Second, for small J, the probability of success can decrease at high values of M. We believe this behavior is due to the fact that initial errors in estimating zC may tend to be somewhat sparse (since ẑC roughly becomes an average of the signals {xj}), and these sparse errors can mislead the subsequent OMP processes. For more moderate M, it seems that the errors in estimating zC (though greater) tend to be less sparse. We expect that a more sophisticated algorithm could alleviate such a problem; the problem is also mitigated at higher J.

Figure 6(b) shows that when the sparse innovations share common supports we see an even greater savings. As a point of reference, a traditional approach to signal acquisition would require 1600 total measurements to recover these J = 32 nonsparse signals of length N = 50. Our approach requires only approximately 10 random measurements per sensor (a total of 320 measurements) for high probability of recovery.

6 Discussion and Conclusions

In this paper we have extended the theory and practice of compressive sensing to multi-signal, distributed settings. The number of noiseless measurements required for ensemble recovery is determined by the dimensionality of the subspace in the relevant signal model, because dimensionality and sparsity play a volumetric role akin to the entropy used to characterize rates in source coding. Our three example joint sparsity models (JSMs) for signal ensembles with both intra- and inter-signal correlations capture the essence of real physical scenarios, illustrate the basic analysis and algorithmic techniques, and indicate the significant gains to be realized from joint recovery. In some sense, distributed compressive sensing (DCS) is a framework for distributed compression of sources with memory, which has remained a challenging problem for some time.

In addition to offering substantially reduced measurement rates, the DCS-based distributed source coding schemes we develop here share the properties of CS mentioned in Section 2. Two additional properties of DCS make it well-matched to distributed applications such as sensor networks and arrays [51, 52]. First, each sensor encodes its measurements separately, which eliminates the need for inter-sensor communication. Second, DCS distributes its computational complexity asymmetrically, placing most of it in the joint decoder, which will often have more computational resources than any individual sensor node. The encoders are very simple; they merely compute incoherent projections with their signals and make no decisions.

There are many opportunities for applications and extensions of these ideas. First, natural signals are not exactly sparse but rather can be better modeled as ℓp-compressible with 0 < p ≤ 1. Roughly speaking, a signal in a weak-ℓp ball has coefficients that decay as n^(−1/p) once sorted according to magnitude [24]. The key concept is that the ordering of these coefficients is important. For JSM-2, we can extend the notion of simultaneous sparsity for ℓp-sparse signals whose sorted coefficients obey roughly the same ordering [66]. This condition could perhaps be enforced as an ℓp constraint on the composite signal

{ Σ_{j=1}^{J} |xj(1)|, Σ_{j=1}^{J} |xj(2)|, . . . , Σ_{j=1}^{J} |xj(N)| }.

Second, (random) measurements are real numbers; quantization gradually degrades the recovery quality as the quantization becomes coarser [34, 67, 68]. Moreover, in many practical situations some amount of measurement noise will corrupt the {xj}, making them not exactly sparse in any basis. While characterizing these effects and the resulting rate-distortion consequences in the DCS setting are topics for future work, there has been work in the single-signal CS literature that we should be able to leverage, including variants of Basis Pursuit with Denoising [63, 69], robust iterative recovery algorithms [64], CS noise sensitivity analysis [25, 34], the Dantzig Selector [33], and one-bit CS [70].

Third, in some applications, the linear program associated with some DCS decoders (in JSM-1 and JSM-3) could prove too computationally intense. As we saw in JSM-2, efficient iterative and greedy algorithms could come to the rescue, but these need to be extended to the multi-signal case. Recent results on recovery from a union of subspaces give promise for efficient, model-based algorithms [66].

Finally, we focused our theory on models that assign common and innovation components to the signals in the ensemble. Other models tailored to specific applications can be posed; for example, in hyperspectral imaging applications, it is common to obtain strong correlations only across spectral slices within a certain neighborhood. It would then be appropriate to pose a common/innovation model with separate common components that are localized to a subset of the spectral slices obtained. Results similar to those obtained in Section 4 are simple to derive for models with full-rank location matrices.

A Proof of Theorem 1

Statement 2 is an application of the achievable bound of Theorem 4 to the case of J = 1 signal. It remains then to prove Statements 1 and 3.


Statement 1 (Achievable, M ≥ 2K): We first note that, if K ≥ N/2, then with probability one, the matrix Φ has rank N, and there is a unique (correct) recovery. Thus we assume that K < N/2. With probability one, all subsets of up to 2K columns drawn from Φ are linearly independent. Assuming this holds, then for two index sets Ω ≠ Ω̂ such that |Ω| = |Ω̂| = K, colspan(ΦΩ) ∩ colspan(ΦΩ̂) has dimension equal to the number of indices common to both Ω and Ω̂. A signal projects to this common space only if its coefficients are nonzero on exactly these (fewer than K) common indices; since ‖θ‖0 = K, this does not occur. Thus every K-sparse signal projects to a unique point in R^M.

Statement 3 (Converse, M ≤ K): If M < K, there is insufficient information in the vector y to recover the K nonzero coefficients of θ; thus we assume M = K. In this case, there is a single explanation for the measurements only if there is a single set Ω of K linearly independent columns and the nonzero indices of θ are the elements of Ω. Aside from this pathological case, the rank of subsets ΦΩ̂ will generally be less than K (which would prevent robust recovery of signals supported on Ω̂) or will be equal to K (which would give ambiguous solutions among all such sets Ω̂).

B Proof of Theorem 3

We let

D := KC + Σ_{j∈Λ} Kj        (19)

denote the number of columns in P. Because P ∈ PF(X), there exists Θ ∈ R^D such that X = PΘ. Because Y = ΦX, Θ is a solution to Y = ΦPΘ. We will argue that, with probability one over Φ, Υ := ΦP has rank D, and thus Θ is the unique solution to the equation Y = ΦPΘ = ΥΘ.

We recall that, under our common/innovation model, P has the form (5), where PC is an N × KC submatrix of the N × N identity, and each Pj, j ∈ Λ, is an N × Kj submatrix of the N × N identity. To prove that Υ has rank D, we will require the following lemma, which we prove in Appendix C.

Lemma 2 If (7) holds, then there exists a mapping C : {1, 2, . . . , KC} → Λ, assigning each element of the common component to one of the sensors, such that for each Γ ⊆ Λ,

Σ_{j∈Γ} Mj ≥ Σ_{j∈Γ} Kj + Σ_{k=1}^{KC} 1_{C(k)∈Γ}        (20)

and such that for each k ∈ {1, 2, . . . , KC}, the kth column of PC is not a column of PC(k).

Intuitively, the existence of such a mapping suggests that (i) each sensor has taken enough measurements to cover its own innovation (requiring Kj measurements) and perhaps some of the common component, (ii) for any Γ ⊆ Λ, the sensors in Γ have collectively taken enough extra measurements to cover the requisite KC(Γ, P) elements of the common component, and (iii) the extra measurements are taken at sensors where the common and innovation components do not overlap. Formally, we will use the existence of such a mapping to prove that Υ has rank D.

and such that for each k ∈ {1, 2, . . . , KC }, the kth column of PC is not a column of PC(k) . Intuitively, the existence of such a mapping suggests that (i) each sensor has taken enough measurements to cover its own innovation (requiring Kj measurements) and perhaps some of the common component, (ii) for any Γ ⊆ Λ, the sensors in Γ have collectively taken enough extra measurements to cover the requisite KC (Γ, P ) elements of the common component, and (iii) the extra measurements are taken at sensors where the common and innovation components do not overlap. Formally, we will use the existence of such a mapping to prove that Υ has rank D.

30

We proceed by noting that Υ has the form

Υ = [ Φ1 PC   Φ1 P1   0       · · ·   0
      Φ2 PC   0       Φ2 P2   · · ·   0
        ⋮       ⋮        ⋮       ⋱      ⋮
      ΦJ PC   0       0       · · ·   ΦJ PJ ],

where each Φj PC (respectively, Φj Pj) is an Mj × KC (respectively, Mj × Kj) submatrix of Φj obtained by selecting columns from Φj according to the nonzero entries of PC (respectively, Pj). In total, Υ has D columns (19). To argue that Υ has rank D, we will consider a sequence of three matrices Υ0, Υ1, and Υ2 constructed from small modifications to Υ.

We begin by letting Υ0 denote the "partially zeroed" matrix obtained from Υ using the following construction. We first let Υ0 = Υ and then make the following adjustments:

1. Let k = 1.

2. For each j such that Pj has a column that matches column k of PC (note that by Lemma 2 this cannot happen if C(k) = j), let k′ represent the column index of the full matrix P where this column of Pj occurs. Subtract column k′ of Υ0 from column k of Υ0. This forces to zero all entries of Υ0 formerly corresponding to column k of the block Φj PC.

3. If k < KC, add one to k and go to step 2.

The matrix Υ0 is identical to Υ everywhere except on the first KC columns, where any portion of a column overlapping with a column of Φj Pj to its right has been set to zero. Thus, Υ0 satisfies the next two properties, which will be inherited by matrices Υ1 and Υ2 that we subsequently define:

P1. Each entry of Υ0 is either zero or a Gaussian random variable.

P2. All Gaussian random variables in Υ0 are i.i.d.

Finally, because Υ0 was constructed only by subtracting columns of Υ from one another,

rank(Υ0) = rank(Υ).        (21)

We now let Υ1 be the matrix obtained from Υ0 using the following construction. For each j ∈ Λ, we select Kj + Σ_{k=1}^{KC} 1_{C(k)=j} arbitrary rows from the portion of Υ0 corresponding to sensor j. Using (19), the resulting matrix Υ1 has

Σ_{j∈Λ} ( Kj + Σ_{k=1}^{KC} 1_{C(k)=j} ) = Σ_{j∈Λ} Kj + KC = D

rows. Also, because Υ1 was obtained by selecting a subset of rows from Υ0, it has D columns and satisfies

rank(Υ1) ≤ rank(Υ0).        (22)

We now let Υ2 be the D × D matrix obtained by permuting columns of Υ1 using the following construction:

1. Let Υ2 = [ ], and let j = 1.

2. For each k such that C(k) = j, let Υ1(k) denote the kth column of Υ1, and concatenate Υ1(k) to Υ2, i.e., let Υ2 ← [Υ2 Υ1(k)]. There are Σ_{k=1}^{KC} 1_{C(k)=j} such columns.

3. Let Υ1′ denote the columns of Υ1 corresponding to the entries of Φj Pj (the innovation components of sensor j), and concatenate Υ1′ to Υ2, i.e., let Υ2 ← [Υ2 Υ1′]. There are Kj such columns.

4. If j < J, let j ← j + 1 and go to Step 2.

Because Υ1 and Υ2 share the same columns up to reordering, it follows that

rank(Υ2) = rank(Υ1).        (23)

Based on its dependency on Υ0, and following from Lemma 2, the square matrix Υ2 meets properties P1 and P2 defined above in addition to a third property:

P3. All diagonal entries of Υ2 are Gaussian random variables.

This follows because for each j, Kj + Σ_{k=1}^{KC} 1_{C(k)=j} rows of Υ1 are assigned in its construction, while Kj + Σ_{k=1}^{KC} 1_{C(k)=j} columns of Υ2 are assigned in its construction. Thus, each diagonal element of Υ2 will either be an entry of some Φj Pj, which remains Gaussian throughout our constructions, or it will be an entry of some kth column of some Φj PC for which C(k) = j. In the latter case, we know by Lemma 2 and the construction of Υ0 that this entry remains Gaussian throughout our constructions.

Having identified these three properties satisfied by Υ2, we will prove by induction that, with probability one over Φ, such a matrix has full rank.

Lemma 3 Let Υ^(d−1) be a (d − 1) × (d − 1) matrix having full rank. Construct a d × d matrix Υ^(d) as follows:

Υ^(d) := [ Υ^(d−1)   v1
           v2^T      ω ],

where v1, v2 ∈ R^(d−1) are vectors with each entry being either zero or a Gaussian random variable, ω is a Gaussian random variable, and all random variables are i.i.d. and independent of Υ^(d−1). Then with probability one, Υ^(d) has full rank.

Applying Lemma 3 inductively D times, the success probability remains one. It follows that with probability one over Φ, rank(Υ2) = D. Combining this last result with (21)–(23), we obtain rank(Υ) = D with probability one over Φ. It remains to prove Lemma 3.

Proof of Lemma 3: When d = 1, Υ^(d) = [ω], which has full rank if and only if ω ≠ 0, which occurs with probability one. When d > 1, using expansion by minors, the determinant of Υ^(d) satisfies det(Υ^(d)) = ω · det(Υ^(d−1)) + C, where C = C(Υ^(d−1), v1, v2) is independent of ω. The matrix Υ^(d) has full rank if and only if det(Υ^(d)) ≠ 0, which is satisfied if and only if

ω ≠ −C / det(Υ^(d−1)).

By assumption, det(Υ^(d−1)) ≠ 0 and ω is a Gaussian random variable that is independent of C and det(Υ^(d−1)). Thus, ω ≠ −C / det(Υ^(d−1)) with probability one.

C Proof of Lemma 2

To prove this lemma, we apply tools from graph theory. We seek a matching within the graph G = (VV, VM, E) from Figure 1, i.e., a subgraph (VV, VM, Ē) with Ē ⊆ E that pairs each element of VV with a unique element of VM. Such a matching will immediately give us the desired mapping C as follows: for each k ∈ {1, 2, . . . , KC} ⊆ VV, we let (j, m) ∈ VM denote the single node matched to k by an edge in Ē, and we set C(k) = j.

To prove the existence of such a matching within the graph, we invoke a version of Hall's marriage theorem for bipartite graphs [71]. Hall's theorem states that within a bipartite graph (V1, V2, E), there exists a matching that assigns each element of V1 to a unique element of V2 if for any collection of elements Π ⊆ V1, the set E(Π) of neighbors of Π in V2 has cardinality |E(Π)| ≥ |Π|. In the context of our lemma, Hall's condition requires that for any set of entries in the value vector, Π ⊆ VV, the set E(Π) of neighbors of Π in VM has size |E(Π)| ≥ |Π|. We will prove that if (7) is satisfied, then Hall's condition is satisfied, and thus a matching must exist.

Let us consider an arbitrary set Π ⊆ VV. We let E(Π) denote the set of neighbors of Π in VM joined by edges in E, and we let SΠ = {j ∈ Λ : (j, m) ∈ E(Π) for some m}. Thus, SΠ ⊆ Λ denotes the set of signal indices whose measurement nodes have edges that connect to Π. It follows that |E(Π)| = Σ_{j∈SΠ} Mj. Thus, in order to satisfy Hall's condition for Π, we require

Σ_{j∈SΠ} Mj ≥ |Π|.        (24)

We would now like to show that Σ_{j∈SΠ} Kj + KC(SΠ, P) ≥ |Π|, and thus if (7) is satisfied for all Γ ⊆ Λ, then (24) is satisfied in particular for SΠ ⊆ Λ. In general, the set Π may contain vertices for both common components and innovation components. We write Π = ΠI ∪ ΠC to denote the disjoint union of these two sets.

By construction, |ΠI| = Σ_{j∈SΠ} Kj because we count all innovations with neighbors in SΠ, and because SΠ contains all neighbors for nodes in ΠI. We will also argue that KC(SΠ, P) ≥ |ΠC| as follows. By definition, for a set Γ ⊆ Λ, KC(Γ, P) counts the number of columns in PC that also appear in Pj for all j ∉ Γ. By construction, for each k ∈ ΠC, node k has no connection to nodes (j, m) for j ∉ SΠ; thus it must follow that the kth column of PC is present in Pj for all j ∉ SΠ, due to the construction of the graph G. Consequently, KC(SΠ, P) ≥ |ΠC|.

Thus, Σ_{j∈SΠ} Kj + KC(SΠ, P) ≥ |ΠI| + |ΠC| = |Π|, and so (7) implies (24) for any Π, and so Hall's condition is satisfied, and a matching exists. Because in such a matching a set of vertices in VM matches to a set in VV of lower or equal cardinality, we have in particular that (20) holds for each Γ ⊆ Λ.

D Proof of Theorem 4

Given the measurements Y and measurement matrix Φ, we will show that it is possible to recover some P ∈ PF(X) and a corresponding vector Θ such that X = PΘ using the following algorithm:

• Take the last measurement of each sensor for verification, and sum these J measurements to obtain a single global test measurement y. Similarly, add the corresponding rows of Φ into a single row φ.

• Group all the remaining Σ_{j∈Λ} Mj − J measurements into a vector Ȳ and a matrix Φ̄.

• For each matrix P ∈ P:

  – choose a single solution ΘP to Ȳ = Φ̄PΘP independently of φ; if no solution exists, skip the next two steps;

  – define XP = PΘP;

  – cross-validate: check whether y = φXP; if so, return the estimate (P, ΘP); if not, continue with the next matrix.

We begin by showing that, with probability one over Φ, the algorithm only terminates when it gets a correct solution; in other words, for each P ∈ P the cross-validation measurement y can determine whether XP = X. We note that all entries of the vector φ are i.i.d. Gaussian, and independent from Φ̄. Assume for the sake of contradiction that there exists a matrix P ∈ P such that y = φXP, but XP = PΘP ≠ X; this implies φ(X − XP) = 0, which occurs with probability zero over Φ. Thus, if XP ≠ X, then φXP ≠ y with probability one over Φ. Since we only need to search over a finite number of matrices P ∈ P, cross-validation will determine whether each matrix P ∈ P gives the correct solution with probability one.

We now show that there is a matrix in P for which the algorithm will terminate with the correct solution. We know that the matrix P* ∈ PF(X) ⊆ P will be part of our search, and that the unique solution ΘP* to Ȳ = Φ̄P*ΘP* yields X = P*ΘP* when (8) holds for P*, as shown in Theorem 3. Thus, the algorithm will find at least one matrix P and vector ΘP such that X = PΘP; when such a matrix is found, the cross-validation step will return this solution and end the algorithm.

Remark 2 Consider the algorithm used in the proof: if the matrices in P are sorted by number of columns, then the algorithm is akin to ℓ0-norm minimization on Θ with an additional cross-validation step. The ℓ0-norm minimization algorithm is known to be optimal for recovery of strictly sparse signals from noiseless measurements.
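The cross-validation test at the heart of this proof is easy to state in code. The sketch below checks one candidate location matrix P against the held-out measurement; enumerating the (finite) set P is left to the caller, and all names are ours.

# Sketch: cross-validation check from the proof of Theorem 4. For a candidate
# location matrix P, solve the reduced system and verify the held-out row.
import numpy as np

def cross_validate(P, Phi_bar, Y_bar, phi, y, tol=1e-8):
    """Return the candidate ensemble X_P if it passes cross-validation, else None."""
    Theta, *_ = np.linalg.lstsq(Phi_bar @ P, Y_bar, rcond=None)
    if np.linalg.norm(Phi_bar @ P @ Theta - Y_bar) > tol:
        return None                       # no consistent solution for this P
    X_P = P @ Theta
    return X_P if abs(phi @ X_P - y) <= tol else None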

E Proof of Theorem 5

We let D denote the number of columns in P. Because P ∈ PF(X), there exists Θ ∈ R^D such that X = PΘ. Because Y = ΦX, Θ is a solution to Y = ΦPΘ. We will argue for Υ := ΦP that rank(Υ) < D, and thus there exists Θ̂ ≠ Θ such that Y = ΥΘ = ΥΘ̂. Moreover, since P has full rank, it follows that X̂ := PΘ̂ ≠ PΘ = X.

We let Υ0 be the "partially zeroed" matrix obtained from Υ using the identical procedure detailed in Appendix B. Again, because Υ0 was constructed only by subtracting columns of Υ from one another, it follows that rank(Υ0) = rank(Υ). Suppose Γ ⊆ Λ is a set for which (9) holds. We let Υ1 be the submatrix of Υ0 obtained by selecting the following columns:

• For any k ∈ {1, 2, . . . , KC} such that column k of PC also appears as a column in all Pj for j ∉ Γ, we include column k of Υ0 as a column in Υ1. There are KC(Γ, P) such columns k.

• For any k ∈ {KC + 1, KC + 2, . . . , D} such that column k of P corresponds to an innovation for some sensor j ∈ Γ, we include column k of Υ0 as a column in Υ1. There are Σ_{j∈Γ} Kj such columns k.

This submatrix has Σ_{j∈Γ} Kj + KC(Γ, P) columns. Because Υ0 has the same size as Υ, and in particular has only D columns, then in order to have rank(Υ0) = D, it is necessary that all Σ_{j∈Γ} Kj + KC(Γ, P) columns of Υ1 be linearly independent.

Based on the method described for constructing Υ0, it follows that Υ1 is zero for all measurement rows not corresponding to the set Γ. Therefore, consider the submatrix Υ2 of Υ1 obtained by selecting only the measurement rows corresponding to the set Γ. Because of the zeros in Υ1, it follows that rank(Υ1) = rank(Υ2). However, since Υ2 has only Σ_{j∈Γ} Mj rows, we invoke (9) and have that rank(Υ1) = rank(Υ2) ≤ Σ_{j∈Γ} Mj < Σ_{j∈Γ} Kj + KC(Γ, P). Thus, all Σ_{j∈Γ} Kj + KC(Γ, P) columns of Υ1 cannot be linearly independent, and so Υ does not have full rank.

F Proof of Lemma 1

Necessary conditions on innovation components: We begin by proving that in order to recover zC, z1, and z2 via the γ-weighted ℓ1-norm formulation it is necessary that z1 can be recovered via single-signal ℓ1-norm minimization using Φ1 and measurements ȳ1 = Φ1 z1. Consider the single-signal ℓ1-norm minimization problem

z̄1 = arg min ‖z1‖1   s.t.   ȳ1 = Φ1 z1.

Suppose that this ℓ1-norm minimization for z1 fails; that is, there exists z̄1 ≠ z1 such that ȳ1 = Φ1 z̄1 and ‖z̄1‖1 ≤ ‖z1‖1. Therefore, substituting z̄1 instead of z1 in the γ-weighted ℓ1-norm formulation (12) provides an alternate explanation for the measurements with a smaller or equal modified ℓ1-norm penalty. Consequently, recovery of z1 using (12) will fail, and we will recover x1 incorrectly. We conclude that the single-signal ℓ1-norm minimization of z1 using Φ1 is necessary for successful recovery using the γ-weighted ℓ1-norm formulation. A similar condition for ℓ1-norm minimization of z2 using Φ2 and measurements Φ2 z2 can be proved in an analogous manner.

Necessary condition on common component: We now prove that in order to recover zC, z1, and z2 via the γ-weighted ℓ1-norm formulation it is necessary that zC can be recovered via single-signal ℓ1-norm minimization using the joint matrix [Φ1^T Φ2^T]^T and measurements [Φ1^T Φ2^T]^T zC. The proof is very similar to the previous proof for the innovation component z1. Consider the single-signal ℓ1-norm minimization

z̄C = arg min ‖zC‖1   s.t.   ȳC = [Φ1^T Φ2^T]^T zC.

Suppose that this ℓ1-norm minimization for zC fails; that is, there exists z̄C ≠ zC such that ȳC = [Φ1^T Φ2^T]^T z̄C and ‖z̄C‖1 ≤ ‖zC‖1. Therefore, substituting z̄C instead of zC in the γ-weighted ℓ1-norm formulation (12) provides an alternate explanation for the measurements with a smaller modified ℓ1-norm penalty. Consequently, the recovery of zC using the γ-weighted ℓ1-norm formulation (12) will fail, and thus we will recover x1 and x2 incorrectly. We conclude that the single-signal ℓ1-norm minimization of zC using [Φ1^T Φ2^T]^T is necessary for successful recovery using the γ-weighted ℓ1-norm formulation.

G Proof of Theorem 6

We construct measurement matrices Φ1 and Φ2 that consist of two sets of rows. The first set of rows is identical in both and recovers the signal difference x1 − x2. The second set is different and recovers the signal average ½x1 + ½x2. Let the submatrix formed by the identical rows for the signal difference be ΦD, and let the submatrices formed by unique rows for the signal average be ΦA,1 and ΦA,2. Thus the measurement matrices Φ1 and Φ2 are of the following form:

Φ1 = [ ΦD
       ΦA,1 ]   and   Φ2 = [ ΦD
                             ΦA,2 ].

The submatrices ΦD, ΦA,1, and ΦA,2 contain i.i.d. Gaussian entries. Once the difference x1 − x2 and average ½x1 + ½x2 have been recovered using the above technique, the computation of x1 and x2 is straightforward. The measurement rate can be computed by considering both parts of the measurement matrices.

Recovery of signal difference: The submatrix ΦD is used to recover the signal difference. By subtracting the product of ΦD with the signals x1 and x2, we have

ΦD x1 − ΦD x2 = ΦD (x1 − x2).

In the original representation we have x1 − x2 = z1 − z2 with sparsity rate 2SI. But z1(n) − z2(n) is nonzero only if z1(n) is nonzero or z2(n) is nonzero. Therefore, the sparsity rate of x1 − x2 is equal to the sum of the individual sparsities reduced by the sparsity rate of the overlap, and so we have S(X1 − X2) = 2SI − (SI)². Therefore, any measurement rate greater than c′(2SI − (SI)²) for each ΦD permits recovery of the length-N signal x1 − x2. (As always, the probability of correct recovery approaches one as N increases.)

Recovery of average: Once x1 − x2 has been recovered, we have

x1 − ½(x1 − x2) = ½x1 + ½x2 = x2 + ½(x1 − x2).

At this stage, we know x1 − x2, ΦD x1, ΦD x2, ΦA,1 x1, and ΦA,2 x2. We have

ΦD x1 − ½ ΦD (x1 − x2) = ΦD (½x1 + ½x2),
ΦA,1 x1 − ½ ΦA,1 (x1 − x2) = ΦA,1 (½x1 + ½x2),
ΦA,2 x2 + ½ ΦA,2 (x1 − x2) = ΦA,2 (½x1 + ½x2),

where ΦD(x1 − x2), ΦA,1(x1 − x2), and ΦA,2(x1 − x2) are easily computable because (x1 − x2) has been recovered. The signal ½x1 + ½x2 is of length N; its sparsity rate is equal to the sum of the individual sparsities SC + 2SI reduced by the sparsity rate of the overlaps, and so we have S(½X1 + ½X2) = SC + 2SI − 2SC SI − (SI)² + SC(SI)². Therefore, any measurement rate greater than c′(SC + 2SI − 2SC SI − (SI)² + SC(SI)²) aggregated over the matrices ΦD, ΦA,1, and ΦA,2 enables recovery of ½x1 + ½x2.

Computation of measurement rate: By considering the requirements on ΦD, the individual measurement rates R1 and R2 must satisfy (13a). Combining the measurement rates required for ΦA,1 and ΦA,2, the sum measurement rate satisfies (13b). We complete the proof by noting that c′(·) is continuous and that lim_{S→0} c′(S) = 0. Thus, as SI goes to zero, the limit of the sum measurement rate is c′(SC).

H Proof of Theorem 7

We assume that $\Psi$ is an orthonormal matrix. Like $\Phi_j$ itself, the matrix $\Phi_j \Psi$ also has i.i.d. $\mathcal{N}(0,1)$ entries, since $\Psi$ is orthonormal. For convenience, we assume $\Psi = I_N$; the results extend directly to a more general orthonormal matrix $\Psi$ by replacing $\Phi_j$ with $\Phi_j \Psi$. Assume without loss of generality that $\Omega = \{1, 2, \ldots, K\}$ for convenience of notation. Thus, the correct estimates are $n \le K$, and the incorrect estimates are $n \ge K+1$.

Now consider the statistic $\xi_n$ in (14). This is the sample mean of $J$ i.i.d. variables: the variables $\langle y_j, \phi_{j,n}\rangle^2$ are i.i.d. since each $y_j = \Phi_j x_j$, and $\Phi_j$ and $x_j$ are i.i.d. Furthermore, these variables have finite variance. (In [61], we evaluate the variance of $\langle y_j, \phi_{j,n}\rangle^2$ as
\[
\mathrm{Var}\!\left[\langle y_j, \phi_{j,n}\rangle^2\right] =
\begin{cases}
\frac{M\sigma^4}{2}\left(34MK + 6K^2 + 28M^2 + 92M + 48K + 90 + 2M^3 + 2MK^2 + 4M^2K\right), & n \in \Omega, \\
2MK\sigma^4\left(MK + 3K + 3M + 6\right), & n \notin \Omega,
\end{cases}
\]
which is finite for finite $M$, $K$, and $\sigma$.) Therefore, we invoke the Law of Large Numbers (LLN) to argue that $\xi_n$, the sample mean of the $\langle y_j, \phi_{j,n}\rangle^2$, converges to $E[\langle y_j, \phi_{j,n}\rangle^2]$ as $J$ grows large. We now compute $E[\langle y_j, \phi_{j,n}\rangle^2]$ in two cases: $n \ge K+1$ (the "bad statistics" case) and $n \le K$ (the "good statistics" case).

Bad statistics: Consider one of the bad statistics by choosing $n = K+1$ without loss of generality. We have
\[
E\!\left[\langle y_j, \phi_{j,K+1}\rangle^2\right]
= E\left[\left(\sum_{n=1}^{K} x_j(n)\,\langle \phi_{j,n}, \phi_{j,K+1}\rangle\right)^{\!2}\right]
\]
\[
= E\left[\sum_{n=1}^{K} x_j(n)^2 \langle \phi_{j,n}, \phi_{j,K+1}\rangle^2\right]
+ E\left[\sum_{n=1}^{K} \sum_{\ell=1,\,\ell\ne n}^{K} x_j(\ell)\, x_j(n)\, \langle \phi_{j,\ell}, \phi_{j,K+1}\rangle \langle \phi_{j,n}, \phi_{j,K+1}\rangle\right]
\]
\[
= \sum_{n=1}^{K} E\!\left[x_j(n)^2\right] E\!\left[\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2\right]
+ \sum_{n=1}^{K} \sum_{\ell=1,\,\ell\ne n}^{K} E[x_j(\ell)]\, E[x_j(n)]\, E\!\left[\langle \phi_{j,\ell}, \phi_{j,K+1}\rangle \langle \phi_{j,n}, \phi_{j,K+1}\rangle\right],
\]
since the terms are independent. We also have $E[x_j(n)] = E[x_j(\ell)] = 0$, and so
\[
E\!\left[\langle y_j, \phi_{j,K+1}\rangle^2\right]
= \sum_{n=1}^{K} E\!\left[x_j(n)^2\right] E\!\left[\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2\right]
= \sum_{n=1}^{K} \sigma^2\, E\!\left[\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2\right]. \tag{25}
\]
To compute $E[\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2]$, let $\phi_{j,n}$ be the column vector $[a_1, a_2, \ldots, a_M]^T$, where each element is i.i.d. $\mathcal{N}(0,1)$. Likewise, let $\phi_{j,K+1}$ be the column vector $[b_1, b_2, \ldots, b_M]^T$, where the elements are i.i.d. $\mathcal{N}(0,1)$. We have
\[
\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2 = (a_1 b_1 + a_2 b_2 + \cdots + a_M b_M)^2
= \sum_{m=1}^{M} a_m^2 b_m^2 + 2\sum_{m=1}^{M-1}\sum_{r=m+1}^{M} a_m a_r b_m b_r.
\]
Taking the expected value, we have
\[
E\!\left[\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2\right]
= E\left[\sum_{m=1}^{M} a_m^2 b_m^2\right] + 2\,E\left[\sum_{m=1}^{M-1}\sum_{r=m+1}^{M} a_m a_r b_m b_r\right]
\]
\[
= \sum_{m=1}^{M} E\!\left[a_m^2\right] E\!\left[b_m^2\right]
+ 2\sum_{m=1}^{M-1}\sum_{r=m+1}^{M} E[a_m]\, E[a_r]\, E[b_m]\, E[b_r]
\quad \text{(since the random variables are independent)}
\]
\[
= \sum_{m=1}^{M} (1) + 0
\quad \text{(since } E[a_m^2] = E[b_m^2] = 1 \text{ and } E[a_m] = E[b_m] = 0\text{)}
\]
\[
= M,
\]
and thus
\[
E\!\left[\langle \phi_{j,n}, \phi_{j,K+1}\rangle^2\right] = M. \tag{26}
\]
Combining this result with (25), we find that
\[
E\!\left[\langle y_j, \phi_{j,K+1}\rangle^2\right] = \sum_{n=1}^{K} \sigma^2 M = MK\sigma^2.
\]
Thus we have computed $E[\langle y_j, \phi_{j,K+1}\rangle^2]$ and can conclude that, as $J$ grows large, the statistic $\xi_{K+1}$ converges to
\[
E\!\left[\langle y_j, \phi_{j,K+1}\rangle^2\right] = MK\sigma^2. \tag{27}
\]

Good statistics: Consider one of the good statistics, and without loss of generality choose $n = 1$. Then we have
\[
E\!\left[\langle y_j, \phi_{j,1}\rangle^2\right]
= E\left[\left(x_j(1)\,\|\phi_{j,1}\|^2 + \sum_{n=2}^{K} x_j(n)\,\langle \phi_{j,n}, \phi_{j,1}\rangle\right)^{\!2}\right]
\]
\[
= E\!\left[x_j(1)^2\, \|\phi_{j,1}\|^4\right] + E\left[\sum_{n=2}^{K} x_j(n)^2 \langle \phi_{j,n}, \phi_{j,1}\rangle^2\right]
\quad \text{(all other cross terms have zero expectation)}
\]
\[
= E\!\left[x_j(1)^2\right] E\!\left[\|\phi_{j,1}\|^4\right]
+ \sum_{n=2}^{K} E\!\left[x_j(n)^2\right] E\!\left[\langle \phi_{j,n}, \phi_{j,1}\rangle^2\right]
\quad \text{(by independence)}
\]
\[
= \sigma^2 E\!\left[\|\phi_{j,1}\|^4\right] + \sum_{n=2}^{K} \sigma^2 E\!\left[\langle \phi_{j,n}, \phi_{j,1}\rangle^2\right]. \tag{28}
\]
Extending the result from (26), we can show that $E[\langle \phi_{j,n}, \phi_{j,1}\rangle^2] = M$. Using this result in (28),
\[
E\!\left[\langle y_j, \phi_{j,1}\rangle^2\right] = \sigma^2 E\!\left[\|\phi_{j,1}\|^4\right] + \sum_{n=2}^{K} \sigma^2 M. \tag{29}
\]
To evaluate $E[\|\phi_{j,1}\|^4]$, let $\phi_{j,1}$ be the column vector $[c_1, c_2, \ldots, c_M]^T$, where the elements of the vector are i.i.d. $\mathcal{N}(0,1)$, and define the random variable $Z = \|\phi_{j,1}\|^2 = \sum_{m=1}^{M} c_m^2$. Note that (i) $E[\|\phi_{j,1}\|^4] = E[Z^2]$ and (ii) $Z$ is chi-squared distributed with $M$ degrees of freedom. Thus, $E[\|\phi_{j,1}\|^4] = E[Z^2] = M(M+2)$. Using this result in (29), we have
\[
E\!\left[\langle y_j, \phi_{j,1}\rangle^2\right] = \sigma^2 M(M+2) + (K-1)\sigma^2 M = M(M + K + 1)\sigma^2.
\]
We have thus computed $E[\langle y_j, \phi_{j,1}\rangle^2]$ and can conclude that, as $J$ grows large, the statistic $\xi_1$ converges to
\[
E\!\left[\langle y_j, \phi_{j,1}\rangle^2\right] = (M + K + 1)M\sigma^2. \tag{30}
\]

Conclusion: From (27) and (30) we conclude that
\[
\lim_{J \to \infty} \xi_n = E\!\left[\langle y_j, \phi_{j,n}\rangle^2\right] =
\begin{cases}
(M + K + 1)M\sigma^2, & n \in \Omega, \\
KM\sigma^2, & n \notin \Omega.
\end{cases}
\]
For any $M \ge 1$, these values are distinct; their ratio is $\frac{M+K+1}{K}$. Therefore, as $J$ increases we can distinguish between the two expected values of $\xi_n$ with overwhelming probability.
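The two limiting values can be checked empirically. The sketch below (hypothetical sizes; our own Monte Carlo code, not the paper's) draws $J$ independent pairs $(\Phi_j, x_j)$ with a fixed support $\Omega$ and confirms that the sample means $\xi_n$ concentrate near $(M+K+1)M\sigma^2$ on the support and near $KM\sigma^2$ off it.

```python
import numpy as np

# Monte Carlo sketch of the statistic xi_n: the sample mean over j of
# <y_j, phi_{j,n}>^2 should approach (M + K + 1) M sigma^2 on the common
# support and K M sigma^2 off it.  All sizes are hypothetical.
rng = np.random.default_rng(2)
N, M, K, J, sigma = 64, 16, 4, 10000, 1.0
Omega = np.arange(K)                     # common support {0, ..., K-1}
xi = np.zeros(N)
for _ in range(J):
    Phi = rng.standard_normal((M, N))    # i.i.d. N(0,1) entries
    x = np.zeros(N)
    x[Omega] = sigma * rng.standard_normal(K)
    y = Phi @ x
    xi += (Phi.T @ y) ** 2 / J           # accumulate <y, phi_n>^2 for every column n
print(xi[:K].mean(), (M + K + 1) * M * sigma**2)   # good statistics vs. limit
print(xi[K:].mean(), K * M * sigma**2)             # bad statistics vs. limit
```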

I    Proof of Theorem 8

Our proof has two parts. First we argue that $\lim_{J\to\infty} \hat{z}_C = z_C$. Then we show that this implies a vanishing probability of error in recovering each innovation $z_j$.

Part 1: We can write our estimate as
\[
\hat{z}_C = \frac{1}{J}\widehat{\Phi}^T y
= \frac{1}{J}\sum_{j=1}^{J} \widehat{\Phi}_j^T y_j
= \frac{1}{J}\sum_{j=1}^{J} \frac{1}{M_j \sigma_j^2}\, \Phi_j^T \Phi_j x_j
= \frac{1}{J}\sum_{j=1}^{J} \frac{1}{M_j \sigma_j^2} \sum_{m=1}^{M_j} (\phi^R_{j,m})^T \phi^R_{j,m} x_j,
\]
where $\phi^R_{j,m}$ denotes the $m$-th row of $\Phi_j$, that is, the $m$-th measurement vector for node $j$. Since the elements of each $\Phi_j$ are Gaussian with variance $\sigma_j^2$, the product $(\phi^R_{j,m})^T \phi^R_{j,m}$ has the property
\[
E\!\left[(\phi^R_{j,m})^T \phi^R_{j,m}\right] = \sigma_j^2 I_N.
\]
It follows that
\[
E\!\left[(\phi^R_{j,m})^T \phi^R_{j,m} x_j\right] = \sigma_j^2 E[x_j] = \sigma_j^2 E[z_C + z_j] = \sigma_j^2 z_C
\]
and, similarly, that
\[
E\!\left[\frac{1}{M_j \sigma_j^2} \sum_{m=1}^{M_j} (\phi^R_{j,m})^T \phi^R_{j,m} x_j\right] = z_C.
\]
Thus, $\hat{z}_C$ is a sample mean of $J$ independent random variables with mean $z_C$. From the law of large numbers, we conclude that $\lim_{J\to\infty} \hat{z}_C = z_C$.
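For illustration, here is a minimal simulation of the Part 1 estimator (hypothetical sizes, all nodes sharing the same $M_j = M$ and $\sigma_j = \sigma$, and a zero-mean sparse innovation model assumed for the sketch) showing the averaged backprojection approaching $z_C$.

```python
import numpy as np

# Sketch of the common-component estimator:
# z_hat_C = (1/J) sum_j Phi_j^T y_j / (M sigma^2),
# which averages backprojected measurements and converges to z_C as J grows.
rng = np.random.default_rng(3)
N, M, J, sigma = 128, 32, 2000, 1.0
z_C = np.zeros(N)
z_C[rng.choice(N, 8, replace=False)] = rng.standard_normal(8)   # common part

z_hat = np.zeros(N)
for _ in range(J):
    Phi = sigma * rng.standard_normal((M, N))    # entries with variance sigma^2
    z_j = np.zeros(N)
    z_j[rng.choice(N, 4, replace=False)] = rng.standard_normal(4)  # zero-mean innovation
    y = Phi @ (z_C + z_j)
    z_hat += Phi.T @ y / (M * sigma**2) / J
# Relative error shrinks as J grows (law of large numbers):
print(np.linalg.norm(z_hat - z_C) / np.linalg.norm(z_C))
```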

Part 2: Consider recovery of the innovation $z_j$ from the adjusted measurement vector $\hat{y}_j = y_j - \Phi_j \hat{z}_C$. As a recovery scheme, we consider a combinatorial search over all $K$-sparse index sets drawn from $\{1, 2, \ldots, N\}$. For each such index set $\Omega'$, we compute the distance from $\hat{y}_j$ to the column span of $\Phi_{j,\Omega'}$, denoted by $d(\hat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega'}))$, where $\Phi_{j,\Omega'}$ is the matrix obtained by sampling the columns $\Omega'$ from $\Phi_j$. (This distance can be computed using the pseudoinverse of $\Phi_{j,\Omega'}$.)

For the correct index set $\Omega$, we know that $d(\hat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega})) \to 0$ as $J \to \infty$. For any other index set $\Omega'$, we know from the proof of Theorem 1 that $d(\hat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega'})) > 0$. Let
\[
\zeta := \min_{\Omega' \ne \Omega} d\!\left(\hat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega'})\right).
\]
With probability one, $\zeta > 0$. Thus, for sufficiently large $J$ we will have $d(\hat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega})) < \zeta/2$, and so the correct index set $\Omega$ can be identified. Since $\lim_{J\to\infty} \hat{z}_C = z_C$, the innovation estimates satisfy $\hat{z}_j = z_j$ for each $j$ once $J$ is large enough.
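The combinatorial check in Part 2 can be written down directly. The following sketch (tiny hypothetical sizes, since the search enumerates all $\binom{N}{K}$ supports) measures each distance as a least-squares residual, which is equivalent to projecting with the pseudoinverse of $\Phi_{j,\Omega'}$; for simplicity it assumes $\hat{z}_C$ has already converged, so the adjusted measurements are exactly $\Phi_j z_j$.

```python
import numpy as np
from itertools import combinations

# Distance from y to the column span of a submatrix, via least squares
# (equivalently, via the pseudoinverse of the submatrix).
def dist_to_colspan(y, Phi_sub):
    coeffs, *_ = np.linalg.lstsq(Phi_sub, y, rcond=None)
    return np.linalg.norm(y - Phi_sub @ coeffs)

rng = np.random.default_rng(4)
N, M, K = 12, 8, 2                       # tiny, since the search is combinatorial
Phi = rng.standard_normal((M, N))
Omega = (3, 7)                           # true support of the innovation
z = np.zeros(N)
z[list(Omega)] = rng.standard_normal(K)
y_adj = Phi @ z                          # stands in for y_j - Phi_j z_hat_C

best = min(combinations(range(N), K),
           key=lambda S: dist_to_colspan(y_adj, Phi[:, list(S)]))
print(best == Omega)                     # the correct support attains zero distance
```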

Acknowledgments

Thanks to Emmanuel Candès, Albert Cohen, Ron DeVore, Anna Gilbert, Illya Hicks, Robert Nowak, Jared Tanner, and Joel Tropp for informative and inspiring conversations. Special thanks to Mark Davenport for a thorough critique of the manuscript. MBW thanks his former affiliated institutions, Caltech and the University of Michigan, where portions of this work were performed. Final thanks to Ryan King for supercharging our computational capabilities.

References

[1] D. Baron, M. F. Duarte, S. Sarvotham, M. B. Wakin, and R. G. Baraniuk, “An information-theoretic approach to distributed compressed sensing,” in Allerton Conf. Communication, Control, and Computing, (Monticello, IL), Sept. 2005.
[2] M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, and R. G. Baraniuk, “Distributed compressed sensing of jointly sparse signals,” in Asilomar Conf. Signals, Systems and Computers, (Pacific Grove, CA), Nov. 2005.
[3] M. B. Wakin, S. Sarvotham, M. F. Duarte, D. Baron, and R. G. Baraniuk, “Recovery of jointly sparse signals from few random projections,” in Workshop on Neural Info. Proc. Sys. (NIPS), (Vancouver, Canada), Nov. 2005.
[4] M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, and R. G. Baraniuk, “Performance limits for jointly sparse signals via graphical models,” in Workshop on Sensor, Signal and Information Proc. (SENSIP), (Sedona, AZ), May 2008.
[5] R. A. DeVore, B. Jawerth, and B. J. Lucier, “Image compression through wavelet transform coding,” IEEE Trans. Inform. Theory, vol. 38, pp. 719–746, Mar. 1992.
[6] J. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Proc., vol. 41, pp. 3445–3462, Dec. 1993.
[7] Z. Xiong, K. Ramchandran, and M. T. Orchard, “Space-frequency quantization for wavelet image coding,” IEEE Trans. Image Proc., vol. 6, no. 5, pp. 677–693, 1997.
[8] S. Mallat, A Wavelet Tour of Signal Processing. San Diego: Academic Press, 1999.
[9] K. Brandenburg, “MP3 and AAC explained,” in AES 17th International Conference on High-Quality Audio Coding, Sept. 1999.
[10] W. Pennebaker and J. Mitchell, “JPEG: Still image data compression standard,” Van Nostrand Reinhold, 1993.
[11] D. S. Taubman and M. W. Marcellin, JPEG 2000: Image Compression Fundamentals, Standards and Practice. Kluwer, 2001.
[12] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[13] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inform. Theory, vol. 19, pp. 471–480, July 1973.
[14] S. Pradhan and K. Ramchandran, “Distributed source coding using syndromes (DISCUS): Design and construction,” IEEE Trans. Inform. Theory, vol. 49, pp. 626–643, Mar. 2003.
[15] Z. Xiong, A. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Proc. Mag., vol. 21, pp. 80–94, Sept. 2004.
[16] R. Cristescu, B. Beferull-Lozano, and M. Vetterli, “On network correlated data gathering,” in IEEE INFOCOM, (Hong Kong), Mar. 2004.
[17] M. Gastpar, P. L. Dragotti, and M. Vetterli, “The distributed Karhunen-Loève transform,” IEEE Trans. Inform. Theory, vol. 52, pp. 5177–5196, Dec. 2006.
[18] R. Wagner, V. Delouille, H. Choi, and R. G. Baraniuk, “Distributed wavelet transform for irregular sensor network grids,” in IEEE Statistical Signal Proc. (SSP) Workshop, (Bordeaux, France), July 2005.
[19] A. Ciancio and A. Ortega, “A distributed wavelet compression algorithm for wireless multihop sensor networks using lifting,” in IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), (Philadelphia), Mar. 2005.
[20] T. Uyematsu, “Universal coding for correlated sources with memory,” in Canadian Workshop Inform. Theory, (Vancouver, Canada), June 2001.
[21] J. Garcia-Frias and W. Zhong, “LDPC codes for compression of multi-terminal sources with hidden Markov correlation,” IEEE Communications Letters, vol. 7, pp. 115–117, Mar. 2003.
[22] C. Chen, X. Ji, Q. Dai, and X. Liu, “Slepian-Wolf coding of binary finite memory sources using Burrows Wheeler transform,” in Data Compression Conference, (Snowbird, UT), Mar. 2009.
[23] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, pp. 489–509, Feb. 2006.
[24] D. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, pp. 1289–1306, Apr. 2006.
[25] E. J. Candès, “Compressive sampling,” in International Congress of Mathematicians, vol. 3, (Madrid, Spain), pp. 1433–1452, 2006.
[26] J. Tropp and A. C. Gilbert, “Signal recovery from partial information via orthogonal matching pursuit,” IEEE Trans. Info. Theory, vol. 53, pp. 4655–4666, Dec. 2007.
[27] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, pp. 253–263, Dec. 2008.
[28] W. U. Bajwa, J. D. Haupt, A. M. Sayeed, and R. D. Nowak, “Joint source-channel communication for distributed estimation in sensor networks,” IEEE Trans. Inform. Theory, vol. 53, pp. 3629–3653, Oct. 2007.
[29] M. Rabbat, J. D. Haupt, A. Singh, and R. D. Nowak, “Decentralized compression and predistribution via randomized gossiping,” in Int. Workshop on Info. Proc. in Sensor Networks (IPSN), (Nashville, TN), Apr. 2006.
[30] W. Wang, M. Garofalakis, and K. Ramchandran, “Distributed sparse random projections for refinable approximation,” in Int. Workshop on Info. Proc. in Sensor Networks (IPSN), (Cambridge, MA), 2007.
[31] S. Aeron, M. Zhao, and V. Saligrama, “On sensing capacity of sensor networks for the class of linear observation, fixed SNR models,” 2007. Preprint.
[32] S. Aeron, M. Zhao, and V. Saligrama, “Fundamental limits on sensing capacity for sensor networks and compressed sensing,” 2008. Preprint.
[33] E. Candès and T. Tao, “The Dantzig selector: Statistical estimation when p is much larger than n,” Annals of Statistics, vol. 35, pp. 2313–2351, Dec. 2007.
[34] S. Sarvotham, D. Baron, and R. G. Baraniuk, “Measurements vs. bits: Compressed sensing meets information theory,” in Allerton Conf. Communication, Control, and Computing, (Monticello, IL), Sept. 2006.
[35] E. Candès and D. Donoho, “Curvelets – A surprisingly effective nonadaptive representation for objects with edges,” Curves and Surfaces, 1999.
[36] M. F. Duarte, M. B. Wakin, D. Baron, and R. G. Baraniuk, “Universal distributed sensing via random projections,” in Int. Workshop on Info. Proc. in Sensor Networks (IPSN), (Nashville, TN), Apr. 2006.
[37] “Compressed sensing website.” http://dsp.rice.edu/cs/.
[38] R. Venkataramani and Y. Bresler, “Further results on spectrum blind sampling of 2D signals,” in IEEE Int. Conf. Image Proc. (ICIP), vol. 2, (Chicago), Oct. 1998.
[39] E. Candès, M. Rudelson, T. Tao, and R. Vershynin, “Error correction via linear programming,” in Annual IEEE Symposium on Foundations of Computer Science, (Pittsburgh, PA), pp. 295–308, Oct. 2005.
[40] D. Donoho and J. Tanner, “Neighborliness of randomly projected simplices in high dimensions,” Proc. National Academy of Sciences, vol. 102, no. 27, pp. 9452–9457, 2005.
[41] D. Donoho, “High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension,” Discrete and Computational Geometry, vol. 35, pp. 617–652, Mar. 2006.
[42] D. L. Donoho and J. Tanner, “Counting faces of randomly projected polytopes when the projection radically lowers dimension,” Journal of the American Mathematical Society, vol. 22, pp. 1–53, Jan. 2009.
[43] A. Cohen, W. Dahmen, and R. A. DeVore, “Near optimal approximation of arbitrary vectors from highly incomplete measurements,” 2007. Preprint.
[44] D. Needell and R. Vershynin, “Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit,” Dec. 2007. Preprint.
[45] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, June 2008. To appear.
[46] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing: Closing the gap between performance and complexity,” Mar. 2008. Preprint.
[47] A. Hormati and M. Vetterli, “Distributed compressed sensing: Sparsity models and reconstruction algorithms using annihilating filter,” in IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), (Las Vegas, NV), pp. 5141–5144, Apr. 2008.
[48] M. Mishali and Y. Eldar, “Reduce and boost: Recovering arbitrary sets of jointly sparse vectors,” Feb. 2008. Preprint.
[49] M. Fornasier and H. Rauhut, “Recovery algorithms for vector valued data with joint sparsity constraints,” SIAM Journal on Numerical Analysis, vol. 46, no. 2, pp. 577–613, 2008.
[50] R. Gribonval, H. Rauhut, K. Schnass, and P. Vandergheynst, “Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms,” 2007. Preprint.
[51] D. Estrin, D. Culler, K. Pister, and G. Sukhatme, “Connecting the physical world with pervasive networks,” IEEE Pervasive Computing, vol. 1, no. 1, pp. 59–69, 2002.
[52] G. J. Pottie and W. J. Kaiser, “Wireless integrated network sensors,” Comm. ACM, vol. 43, no. 5, pp. 51–58, 2000.
[53] J. Tropp, A. C. Gilbert, and M. J. Strauss, “Simultaneous sparse approximation via greedy pursuit,” in IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), (Philadelphia), Mar. 2005.
[54] V. N. Temlyakov, “A remark on simultaneous sparse approximation,” East J. Approx., vol. 100, pp. 17–25, 2004.
[55] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, “Sparse solutions to linear inverse problems with multiple measurement vectors,” IEEE Trans. Signal Proc., vol. 51, pp. 2477–2488, July 2005.
[56] R. Puri and K. Ramchandran, “PRISM: A new robust video coding architecture based on distributed compression principles,” in Allerton Conf. Communication, Control, and Computing, (Monticello, IL), Oct. 2002.
[57] R. Wagner, R. G. Baraniuk, and R. D. Nowak, “Distributed image compression for sensor networks using correspondence analysis and super-resolution,” in Data Comp. Conf. (DCC), Mar. 2000.
[58] C. Weidmann and M. Vetterli, “Rate-distortion analysis of spike processes,” in Data Compression Conference (DCC), (Snowbird, UT), pp. 82–91, Mar. 1999.
[59] D. Baron, M. Duarte, S. Sarvotham, M. B. Wakin, and R. G. Baraniuk, “Distributed compressed sensing,” Tech. Rep. TREE0612, Rice University, Houston, TX, Nov. 2006. Available at http://dsp.rice.edu/cs/.
[60] R. D. Nowak. Personal Communication, 2006.
[61] S. Sarvotham, M. B. Wakin, D. Baron, M. F. Duarte, and R. G. Baraniuk, “Analysis of the DCS one-stage greedy algorithm for common sparse supports,” Tech. Rep. TREE0503, Rice University, Houston, TX, Oct. 2005. Available at http://www.ece.rice.edu/~shri/docs/TR0503.pdf.
[62] J. Tropp, “Algorithms for simultaneous sparse approximation. Part II: Convex relaxation,” Signal Proc., vol. 86, pp. 589–602, Mar. 2006.
[63] D. Donoho and Y. Tsaig, “Extensions of compressed sensing,” Signal Proc., vol. 86, pp. 533–548, Mar. 2006.
[64] J. D. Haupt and R. D. Nowak, “Signal reconstruction from noisy random projections,” IEEE Transactions on Information Theory, vol. 52, pp. 4036–4048, Sept. 2006.
[65] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error correcting coding and decoding: Turbo codes,” IEEE Int. Conf. Communications, pp. 1064–1070, May 1993.
[66] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, “Model-based compressive sensing,” 2008. Preprint.
[67] P. T. Boufounos and R. G. Baraniuk, “Quantization of sparse representations,” Tech. Rep. TREE0701, Rice University, Houston, TX, Mar. 2007.
[68] A. K. Fletcher, S. Rangan, and V. K. Goyal, “On the rate-distortion performance of compressed sensing,” in IEEE Int. Conf. Acoustics, Speech, Signal Proc. (ICASSP), (Honolulu, HI), pp. III–885–888, Apr. 2007.
[69] E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. on Pure and Applied Math., vol. 59, pp. 1207–1223, Aug. 2006.
[70] P. T. Boufounos and R. G. Baraniuk, “1-bit compressive sensing,” in Conf. Information Sciences and Systems (CISS), (Princeton, NJ), pp. 16–21, Mar. 2008.
[71] D. B. West, Introduction to Graph Theory. Prentice Hall, 1996.
