Error Bounds for Maximum Likelihood Matrix Completion Under Sparse Factor Models

Akshay Soni, Swayambhoo Jain, and Jarvis Haupt
Department of Electrical and Computer Engineering, University of Minnesota – Twin Cities
Email: {sonix022,jainx174,jdhaupt}@umn.edu

Stefano Gonella
Department of Civil, Environmental, and Geo-Engineering, University of Minnesota – Twin Cities
Email: [email protected]

Abstract—This paper examines a general class of matrix completion tasks where entrywise observations of the matrix are subject to random noise or corruption. Our particular focus here is on settings where the matrix to be estimated follows a sparse factor model, in the sense that it may be expressed as the product of two matrices, one of which is sparse. We analyze the performance of a sparsity-penalized maximum likelihood approach to such problems to provide a general-purpose estimation result applicable to any of a number of noise/corruption models, and describe its implications in two stylized scenarios – one characterized by additive Gaussian noise, and the other by highly-quantized one-bit observations. We also provide some supporting empirical evidence to validate our theoretical claims in the Gaussian setting.

Index Terms—Complexity regularization, matrix completion, maximum likelihood, sparse estimation

I. INTRODUCTION

This work investigates a class of estimation tasks that entail fitting a collection of data, which we assume here to be arranged as a matrix, to a prescribed form of low-dimensional model, in settings where perhaps only a fraction of the data elements are observed, and where each corresponding observation is corrupted by some form of random noise. Our particular focus here is on structural models wherein the vectors comprising the columns of the matrix to be inferred are well-approximated as vectors in some a priori unknown union of (nominally, low-dimensional) linear subspaces. Our investigation here is motivated by recent results establishing performance guarantees for matrix completion techniques employing low-rank modeling assumptions – here, we aim to establish performance guarantees for matrix completion methods that employ these more general types of low-dimensional structural models and for observations characterized by general noise/corruption models (i.e., beyond the commonly employed additive noise model).

Here (and in the remainder of the paper) we denote by X* ∈ R^{n1×n2} an a priori unknown matrix whose entries we seek to estimate (the superscript asterisk is used to distinguish the "true" parameter of interest in our estimation task). Of particular interest to us here are cases where the unknown matrix X* admits a factorization of the form

X^* = D^* A^*,    (1)

where D* ∈ R^{n1×r} and A* ∈ R^{r×n2} for some integer r ≤ n2, and where the matrix A* is sparse in the sense that the number of nonzero elements of A* is much smaller than its dimension. In the remainder of the paper, we will refer to models of this form as sparse factor models. Further, for pragmatic reasons (motivated by natural constraints arising in applications), and also to facilitate our analysis, we assume the largest element of X* satisfies a max-norm constraint, in the sense that there exists some finite Xmax > 0 such that ‖X*‖_max ≜ max_{i,j} |X*_{i,j}| ≤ Xmax/2 (the factor of 1/2 is somewhat arbitrary, and is introduced to simplify analysis).

Rather than acquire all of the elements of X* directly, we suppose here that we only observe X* at a subset of its locations, obtaining at each a noisy version of the underlying matrix entry, where the "noise" is described via a probability density function (pdf) or probability mass function (pmf) for the corresponding observation. We let S ⊆ [n1] × [n2] denote the set of locations at which observations are collected, where [n1] is shorthand for the set of positive integers less than or equal to n1 (and similarly for [n2]), and describe our collection of |S| measurements of X* in terms of a collection {Y_{i,j}}_{(i,j)∈S} ≜ Y_S of conditionally (on S) independent random quantities. Formally, we write the joint pdf (or pmf) of the observations as

p_{X^*_S}(Y_S) \triangleq \prod_{(i,j) \in S} p_{X^*_{i,j}}(Y_{i,j}),    (2)

where p_{X^*_{i,j}}(Y_{i,j}) denotes the corresponding scalar pdf (or pmf), and where we use the shorthand X*_S to denote the collection of elements of X* indexed by (i,j) ∈ S. In terms of the aforementioned model, our task here is described as follows: given S and corresponding noisy observations Y_S of X* distributed according to (2), estimate X* under the assumption that it admits a sparse factor model of the form (1) with A* sparse.
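To make the observation model concrete, the following is a minimal synthetic-data sketch (ours, not part of the original development or of [34]), assuming NumPy: it draws a sparse factor pair (D*, A*), forms X* = D*A*, samples entry locations via an independent Bernoulli(γ) model, and corrupts the sampled entries with additive Gaussian noise as one example of the general per-entry noise channel. The function name and default parameters are illustrative choices.

```python
import numpy as np

def sparse_factor_data(n1=50, n2=400, r=10, nnz_per_col=2,
                       sigma=0.5, gamma=0.5, a_max=3.0, seed=0):
    """Draw (D*, A*), form X* = D* A*, and return noisy entrywise
    observations at Bernoulli(gamma)-sampled locations."""
    rng = np.random.default_rng(seed)
    # D*: entries in [-1, 1] (here, clipped standard Gaussians)
    D = np.clip(rng.standard_normal((n1, r)), -1.0, 1.0)
    # A*: each column has nnz_per_col nonzero entries with amplitudes in [1, a_max]
    A = np.zeros((r, n2))
    for j in range(n2):
        rows = rng.choice(r, size=nnz_per_col, replace=False)
        A[rows, j] = rng.uniform(1.0, a_max, size=nnz_per_col)
    X = D @ A
    # Independent Bernoulli sampling of the entry locations (the set S)
    mask = rng.random((n1, n2)) < gamma
    # One example noise channel: additive zero-mean Gaussian noise at sampled entries
    Y = np.where(mask, X + sigma * rng.standard_normal((n1, n2)), np.nan)
    return D, A, X, mask, Y

D_true, A_true, X_true, mask, Y = sparse_factor_data()
print(X_true.shape, mask.mean())  # matrix size and empirical sampling rate
```

Other corruption models (e.g., the one-bit model of Section II-B2) change only the last step, where the sampled entries are passed through the noise channel.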

A. Our Contributions

We address these matrix estimation tasks using the machinery of complexity-regularized maximum likelihood (ML) estimation. Our main contributions come in the form of estimation error guarantees applicable in settings where the available data correspond to an incomplete collection of noisy observations of elements of the underlying matrix (obtained at random locations), and under general (random) noise/corruption models. At a high level, our main results establish an intuitive fact: the normalized per-element error of estimates obtained by penalized ML methods is upper bounded (up to constants and logarithmic factors) by the ratio between the number of "degrees of freedom" of the data in the prescribed model and the number of measurements obtained. We examine the implications of our results in two stylized settings – one where observations are corrupted by additive zero-mean Gaussian noise, and another where each observation is described in terms of a highly quantized (1-bit) noisy version of the corresponding matrix entry.

B. Connections with Existing Works

Our analysis here entails extension of a main result of [1], [2] for complexity-regularized maximum likelihood estimation – which has been utilized in a number of works to establish error bounds for Poisson estimation problems using multiscale models [3], [4], transform-domain sparsity models [5], and dictionary-based matrix factorization models [6] – to the "missing data" scenarios inherent in matrix completion tasks. More generally, our effort is motivated in large part by a desire to extend the theoretical analyses in several recent works in matrix completion that employ low-rank models (see, e.g., [7]–[10]) – especially noisy matrix completion problems (e.g., [11]–[13]) – to more general structural models, such as the sparse factor models we consider here.

The factor models we employ here essentially enforce that each column of X* lie in a union of linear subspaces. In this sense our efforts here are also closely related to problems in subspace clustering [14], [15], sparse principal component analysis [16], and probabilistic principal component analyses using mixture models [17], as well as recent works on subspace clustering from missing data [18], [19], and "high rank" matrix completion [20], [21]. The sparse factor models we examine here are also at the heart of the emerging sparse modeling discipline known as dictionary learning [22]–[24], where several efforts have proposed algorithmic procedures for coping with missing data [25], [26]. Recently, several efforts in the dictionary learning literature have established theoretical guarantees on identifiability, as well as local correctness of a number of factorization methods [27]–[32], including in noisy settings [33].

C. Outline

The remainder of this paper is organized as follows. In Section II we present our main result establishing estimation error guarantees for a general class of estimation problems characterized by incomplete and noisy observations, and we discuss implications of this result for two specific noise models. In Section III we provide a brief experimental investigation that partially validates our theoretical guarantees for the Gaussian noise setting. We provide a few brief conclusions in Section IV. We state our results here without proof; the full details will be made available in [34].

D. Some Information Theoretic Preliminaries

To set the stage for the statement of our main result, we remind the reader of a few key concepts. First, when p(Y) and q(Y) denote the pdf (or pmf) of a real-valued random variable Y, the Kullback–Leibler divergence (or KL divergence) of q from p is denoted D(p‖q) and given by

D(p \| q) = \mathbb{E}_p\left[ \log \frac{p(Y)}{q(Y)} \right],

where the logarithm is taken to be the natural log. By definition, D(p‖q) is finite only if the support of p is contained in the support of q. Further, the KL divergence satisfies D(p‖q) ≥ 0, and D(p‖q) = 0 when p(Y) = q(Y). We also use the Hellinger affinity, denoted by A(p, q) and given by

A(p, q) = \mathbb{E}_p\left[ \sqrt{\frac{q(Y)}{p(Y)}} \right] = \mathbb{E}_q\left[ \sqrt{\frac{p(Y)}{q(Y)}} \right].

Note that A(p, q) ≥ 0 essentially by definition, and a simple application of the Cauchy–Schwarz inequality gives that A(p, q) ≤ 1, implying overall that 0 ≤ A(p, q) ≤ 1.

When p and q are parameterized by elements X_{i,j} and X̃_{i,j} of matrices X and X̃, respectively, so that p(Y_{i,j}) = p_{X_{i,j}}(Y_{i,j}) and q(Y_{i,j}) = q_{X̃_{i,j}}(Y_{i,j}), we use the shorthand notation

D(p_X \| q_{\widetilde{X}}) \triangleq \sum_{i,j} D(p_{X_{i,j}} \| q_{\widetilde{X}_{i,j}}) \quad \text{and} \quad A(p_X, q_{\widetilde{X}}) \triangleq \prod_{i,j} A(p_{X_{i,j}}, q_{\widetilde{X}_{i,j}}).

II. MAIN RESULTS

A. General Error Bound for Sparse Factor Matrix Completion

Our main result establishes error bounds for sparse factor model matrix completion under general noise/corruption models and random sampling models. Here, our approach will be to estimate X* via sparsity-penalized maximum likelihood methods; specifically, we consider estimates of the form

\widehat{X} = \arg\min_{X = DA \in \mathcal{X}} \left\{ -\log p_{X_S}(Y_S) + \lambda \cdot \|A\|_0 \right\},    (3)

where λ > 0 is a user-specified regularization parameter, X_S is shorthand for the collection {X_{i,j}}_{(i,j)∈S} of entries of X indexed by S, and 𝒳 is an appropriately constructed class of candidate estimates.
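As a concrete illustration of the objective in (3), the sketch below (ours; the function names and the Gaussian example likelihood are assumptions, not the estimator analyzed here) evaluates the penalized negative log-likelihood for a candidate factorization (D, A), given observations at the sampled locations and a per-entry negative log-likelihood supplied by the user.

```python
import numpy as np

def penalized_objective(D, A, Y, mask, neg_log_lik, lam):
    """Evaluate -log p_{X_S}(Y_S) + lam * ||A||_0 for X = D A, where
    neg_log_lik(x, y) is the per-entry negative log-likelihood."""
    X = D @ A
    data_term = np.sum(neg_log_lik(X[mask], Y[mask]))
    return data_term + lam * np.count_nonzero(A)

# Example per-entry model: additive Gaussian noise with known variance sigma^2
sigma = 0.5
gauss_nll = lambda x, y: 0.5 * np.log(2 * np.pi * sigma**2) + (y - x)**2 / (2 * sigma**2)

# Usage (D_true, A_true, Y, mask as generated in the earlier sketch):
# obj = penalized_objective(D_true, A_true, Y, mask, gauss_nll, lam=1.0)
```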

To facilitate our analysis here, we take 𝒳 to be a countable class of estimates constructed as follows: first, for a specified β ≥ 1, we set L_lev = 2^{⌈log₂((n1 ∨ n2)^β)⌉} (where (n1 ∨ n2) = max{n1, n2}), and construct 𝒟 to be the set of all matrices D ∈ R^{n1×r} whose elements are discretized to one of L_lev uniformly-spaced values in the range [−1, 1], and 𝒜 to be the set of all matrices A ∈ R^{r×n2} whose elements either take the value zero, or are discretized to one of L_lev uniformly-spaced values in the range [−Amax, Amax]. Here we assume Amax < (n1 ∨ n2) for convenience. Then, we let

\mathcal{X}' \triangleq \left\{ X = DA : D \in \mathcal{D},\ A \in \mathcal{A},\ \|X\|_{\max} \le X_{\max} \right\},    (4)

and take 𝒳 to be any subset of 𝒳′.
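The construction of this countable class amounts to entrywise quantization of the factors. The sketch below (ours, purely illustrative; the nearest-level snapping rule is our choice) maps a given (D, A) pair onto the L_lev-level grids used to define 𝒟 and 𝒜.

```python
import numpy as np

def quantize_factors(D, A, n1, n2, beta=1.0, a_max=3.0):
    """Snap D onto L_lev uniformly-spaced levels in [-1, 1] and the nonzero
    entries of A onto L_lev levels in [-a_max, a_max] (zeros stay zero)."""
    L_lev = 2 ** int(np.ceil(beta * np.log2(max(n1, n2))))
    d_levels = np.linspace(-1.0, 1.0, L_lev)
    a_levels = np.linspace(-a_max, a_max, L_lev)
    # nearest-level quantizer, applied elementwise
    snap = lambda M, levels: levels[np.argmin(np.abs(M[..., None] - levels), axis=-1)]
    Dq = snap(np.clip(D, -1.0, 1.0), d_levels)
    Aq = np.where(A != 0, snap(np.clip(A, -a_max, a_max), a_levels), 0.0)
    return Dq, Aq
```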

Notice that this overall model comprises (a discretization of) the class of max-norm constrained rank-r matrices; similar models (albeit without the discretization we employ here) were considered in [35] for the low-rank matrix completion problem.

Our main result may be stated as follows. Here, the sample set S is generated according to an independent Bernoulli model, in which each location (i, j) ∈ [n1] × [n2] is included in S independently with probability γ = m(n1 n2)^{−1}, so that E|S| = m.

Theorem II.1. Let the sample set S be drawn from the independent Bernoulli model with γ = m(n1 n2)^{−1} as described above, and let Y_S be described by (2). If C_D is any constant satisfying

C_D \ge \max_{X \in \mathcal{X}} \max_{i,j} D\left(p_{X^*_{i,j}} \| p_{X_{i,j}}\right),

then for any choice

\lambda \ge 2 (\beta + 2) \left(1 + \frac{2 C_D}{3}\right) \log(n_1 \vee n_2),

the complexity-penalized maximum likelihood estimator (3) satisfies the (normalized, per-element) error bound

\frac{\mathbb{E}_{S, Y_S}\left[-2 \log A(p_{\widehat{X}^{\lambda}}, p_{X^*})\right]}{n_1 n_2} \le \frac{8 C_D \log m}{m} + 3 \min_{X \in \mathcal{X}} \left\{ \frac{D(p_{X^*} \| p_X)}{n_1 n_2} + \left(\lambda + \frac{4 C_D (\beta + 2) \log(n_1 \vee n_2)}{3}\right) \frac{n_1 r + \|A\|_0}{m} \right\}.    (5)

Before proceeding to specific examples, we note a few salient points about this result. First, as alluded to above, our result is not specific to any one observation model; thus, our general result will allow us to analyze the error performance of sparse factor matrix completion methods under a variety of different noise or corruption models. Our results are somewhat "plug and play" in that specialization to a particular noise model requires calculation of the corresponding information-theoretic quantities (as well as C_D). Next, we note that here, as is often the case with analyses of penalized empirical risk minimization strategies for sparse inference, the estimation strategies prescribed by our analysis may not be computationally tractable. As written, our estimators would require solving a combinatorial optimization, because of the ℓ0 penalty, as well as an optimization over the discrete set 𝒳. However, inference in the sparse factor models we consider here is fundamentally challenging, since these inference problems, as written, are not (jointly) convex optimizations in the matrix factors. Overall, our results here may be interpreted as benchmarks or guidelines for the performance of sparse factor matrix completion under different corruption models. Below we examine the implications of Theorem II.1 in two stylized scenarios.

B. Implications for Specific Noise Models

In this section we consider the implications of Theorem II.1 in two scenarios, one characterized by additive Gaussian noise and the other by quantized (one-bit) observations described by any of a number of link functions. In each case, our aim is to identify the scaling behavior of the estimation error as a function of the key problem parameters.

To that end, we consider for each case the fixed choice

\beta = \max\left\{ 1, \; 1 + \frac{\log(8 r A_{\max}/X_{\max})}{\log(n_1 \vee n_2)} \right\}    (6)

for describing the number of discretization levels in the elements of each of the matrix factors. Then, for each scenario (characterized by its own likelihood model) we consider a specific choice of 𝒳, and one specific estimate obtained according to (3) with

\lambda = 2 (\beta + 2) \left(1 + \frac{2 C_D}{3}\right) \log(n_1 \vee n_2),    (7)

(where C_D depends on the particular likelihood model), and simplify the resulting oracle bounds for sparse factors. In what follows, we will make use of the fact that our assumption Xmax ≥ 1 implies β = O(log(r ∨ Amax)/log(n1 ∨ n2)), and so

(\beta + 2) \log(n_1 \vee n_2) = O\left(\log(n_1 \vee n_2)\right),    (8)

on account of the fact that r < n2 and Amax < (n1 ∨ n2) by assumption.
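For concreteness, here is a small sketch (ours) that instantiates the discretization and regularization choices in (6)–(7) for example problem sizes; the value of C_D is model-dependent and is passed in, and the numerical values used below are illustrative assumptions only.

```python
import numpy as np

def choose_parameters(n1, n2, r, a_max, x_max, c_d):
    """Compute beta from (6), the number of discretization levels L_lev,
    and the regularization weight lambda from (7)."""
    n_max = max(n1, n2)
    beta = max(1.0, 1.0 + np.log(8 * r * a_max / x_max) / np.log(n_max))
    L_lev = 2 ** int(np.ceil(beta * np.log2(n_max)))
    lam = 2 * (beta + 2) * (1 + 2 * c_d / 3) * np.log(n_max)
    return beta, L_lev, lam

# Illustrative numbers: a Gaussian-noise setting where C_D = 2 X_max^2 / sigma^2
# (cf. Corollary II.1 below); x_max is any valid bound with ||X*||_max <= x_max / 2.
sigma, x_max = 0.5, 12.0
beta, L_lev, lam = choose_parameters(50, 400, 10, a_max=3.0, x_max=x_max,
                                     c_d=2 * x_max**2 / sigma**2)
print(beta, L_lev, lam)
```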

1) Implications for Additive Gaussian Noise Models: We first examine the implications of Theorem II.1 in a setting where observations are corrupted by independent additive zero-mean Gaussian noise with known variance σ². In this case, the observations Y_S are distributed according to a multivariate Gaussian density of dimension |S| whose mean corresponds to the collection of matrix parameters at the sample locations, and with covariance matrix σ² I_{|S|}, where I_{|S|} is the identity matrix of dimension |S|. That is,

p_{X^*_S}(Y_S) = \frac{1}{(2\pi\sigma^2)^{|S|/2}} \exp\left( -\frac{1}{2\sigma^2} \|Y_S - X^*_S\|_F^2 \right),    (9)

where we have used the representative shorthand notation ‖Y_S − X*_S‖²_F ≜ Σ_{(i,j)∈S} (Y_{i,j} − X*_{i,j})². In this setting we have the following result.

Corollary II.1 (Sparse Factor Matrix Completion with Gaussian Noise). Let β be as in (6), let λ be as in (7) with C_D = 2X²max/σ², and let 𝒳 = 𝒳′. The estimate X̂ obtained via (3) satisfies

\frac{\mathbb{E}_{S, Y_S}\left[\|X^* - \widehat{X}\|_F^2\right]}{n_1 n_2} = O\left( (\sigma^2 + X_{\max}^2) \left(\frac{n_1 r + \|A^*\|_0}{m}\right) \log(n_1 \vee n_2) \right)    (10)

when A* is exactly sparse, having ‖A*‖0 nonzero elements.

A few comments are in order regarding these error guarantees. First, we note that our analysis provides some useful (and intuitive) understanding of how the estimation error decreases as a function of the number of measurements obtained, as well as the dimension and sparsity parameters associated with the matrix to be estimated. Consider, for instance, the case when A* is sparse and where log m < n1 r + ‖A*‖0 (which should often be the case, since log(m) ≤ log(n1 n2)). In this setting, our error bound shows that the dependence of the estimation error on the dimension (n1, n2, r) and sparsity (‖A*‖0) parameters, as well as the (nominal) number of measurements m, is ((n1 r + ‖A*‖0)/m) log(n1 ∨ n2). We may interpret the quantity n1 r + ‖A*‖0 as the number of degrees of freedom in the matrix X* to be estimated, and in this sense we see that the error rate of the penalized maximum likelihood estimator exhibits characteristics of the well-known parametric rate (modulo the logarithmic factor).
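To see why the general bound (5) yields the mean-square guarantee (10) in the Gaussian case, recall the standard closed forms (not restated explicitly in this excerpt): for two Gaussian densities with common variance σ² and means x and x̃, the per-entry KL divergence is (x − x̃)²/(2σ²) and the negative log Hellinger affinity satisfies −2 log A = (x − x̃)²/(4σ²), so both sides of (5) are, up to constants, per-entry squared errors. The sketch below (ours) checks both identities by numerical integration.

```python
import numpy as np

def gaussian_kl_and_neg2logA(x, x_tilde, sigma, n_grid=200_001):
    """Numerically integrate the per-entry KL divergence and -2 log Hellinger
    affinity between N(x, sigma^2) and N(x_tilde, sigma^2)."""
    lo = min(x, x_tilde) - 12 * sigma   # integrate where the densities are non-negligible
    hi = max(x, x_tilde) + 12 * sigma
    y = np.linspace(lo, hi, n_grid)
    dy = y[1] - y[0]
    p = np.exp(-(y - x) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    q = np.exp(-(y - x_tilde) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    kl = np.sum(p * np.log(p / q)) * dy        # D(p || q)
    affinity = np.sum(np.sqrt(p * q)) * dy     # A(p, q)
    return kl, -2.0 * np.log(affinity)

x, x_t, sigma = 1.2, -0.7, 0.5
kl_num, neg2logA_num = gaussian_kl_and_neg2logA(x, x_t, sigma)
print(kl_num, (x - x_t) ** 2 / (2 * sigma**2))        # closed form: (x - x~)^2 / (2 sigma^2)
print(neg2logA_num, (x - x_t) ** 2 / (4 * sigma**2))  # closed form: (x - x~)^2 / (4 sigma^2)
```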

We also comment briefly on the presence of the X²max term in the error bounds. Readers familiar with the literature on noisy matrix completion under low-rank assumptions are aware that various forms of "incoherence" assumptions have been utilized to date, and the form of the resulting error bounds depends on the particular type of assumption employed. For example, the work [12] examines matrix completion problems in additive noise settings but utilizes a different form of incoherence assumption formulated in terms of the "spikiness" of the matrix to be estimated (and quantified in terms of the ratio between the max norm and Frobenius norm of X*). There, the estimation approach entails optimization over a set of candidates that each satisfy a "spikiness" constraint, and the resulting bounds do exhibit a dependence on the max-norm of the matrix to be estimated (similar to here). A more direct comparison to our result here is [35], which considers matrix completion problems characterized by entry-wise observations obtained at locations chosen uniformly at random (with replacement), each of which may be modeled as corrupted by independent additive noise, and estimates obtained by nuclear-norm penalized estimators over a class of max-norm and rank-constrained matrices (similar to the sets over which we optimize here). Casting the results of that work (specifically, [35, Corollary 2]) to the setting we consider here, we observe that those results imply rank-r matrices may be accurately estimated in the sense that

\frac{\|X^* - \widehat{X}\|_F^2}{n_1 n_2} \le c \, (\sigma \vee X_{\max})^2 \, \frac{(n_1 \vee n_2)\, r}{m} \log(n_1 + n_2) \le c' \, (\sigma^2 + X_{\max}^2) \, \frac{(n_1 + n_2)\, r}{m} \log(n_1 \vee n_2)

with high probability, where c, c′ are positive constants. Comparing this last result with our result (10), we see that our guarantees exhibit the same effective scaling with the max-norm bound Xmax, but can have a (perhaps significantly) improved error performance in the case where ‖A*‖0 ≪ n2 r – precisely what we sought to identify by considering sparse factor models in our analyses. The two bounds roughly coincide in the case where A* is not sparse, in which case we may take ‖A*‖0 = n2 r in our error bounds.

2) Implications for Quantized (One-bit) Observation Models: We also examine a kind of "resource constrained" scenario where entrywise observations of the matrix are constrained to a single bit each. Formally, given a sampling set S, we suppose that our observations are conditionally (on S) independent random variables described by Y_{i,j} = 1_{{Z_{i,j} ≥ 0}} for (i,j) ∈ S, where Z_{i,j} = X*_{i,j} − W_{i,j}, and the {W_{i,j}}_{(i,j)∈S} are iid continuous zero-mean real scalar random variables having probability density function and cumulative distribution function f(w) and F(w), respectively, for w ∈ R. Note that in this model we assume that the individual noise realizations {W_{i,j}}_{(i,j)∈S} are unknown, but the noise distribution is known. It is also worth noting that the cdf F(·) that we specify here could be replaced by any of a number of commonly-used link functions. For example, choosing F(x) to be the cdf of a standard Gaussian random variable, F(x) = Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−t²/2} dt, gives rise to the well-known probit model, while taking F(x) to be the logistic function, F(x) = 1/(1 + e^{−x}), leads to the logit regression model. We may interpret the observation models above essentially as quantized noisy versions of the true matrix parameters.
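The following sketch (ours; the logistic link is just one admissible choice of F, and the function names are illustrative) simulates one-bit observations Y_{i,j} = 1{X*_{i,j} − W_{i,j} ≥ 0} at the sampled locations and evaluates the corresponding Bernoulli negative log-likelihood (cf. (11) below).

```python
import numpy as np

logistic_cdf = lambda t: 1.0 / (1.0 + np.exp(-t))   # one admissible link F

def one_bit_observations(X, mask, rng, F=logistic_cdf):
    """Y_ij = 1{X_ij - W_ij >= 0}, i.e., Y_ij ~ Bernoulli(F(X_ij)), at sampled entries."""
    return (rng.random(X.shape) < F(X)) & mask

def one_bit_neg_log_lik(X, Y, mask, F=logistic_cdf, eps=1e-12):
    """-log p_{X_S}(Y_S) under the one-bit model, summed over the sampled entries."""
    p = np.clip(F(X[mask]), eps, 1.0 - eps)
    y = Y[mask].astype(float)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Usage with the earlier synthetic X_true and mask:
# rng = np.random.default_rng(1)
# Y_bits = one_bit_observations(X_true, mask, rng)
# nll = one_bit_neg_log_lik(X_true, Y_bits, mask)
```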
The minus sign on the W_{i,j}'s is merely a modeling convenience here, and is intended to simplify the notation. Under this model, it is easy to see that each Y_{i,j} is a Bernoulli random variable whose parameter is related to the true parameter through the cumulative distribution function F. Specifically, note that for any fixed (i,j) ∈ S, we have that Pr(Y_{i,j} = 1) = Pr(W_{i,j} ≤ X*_{i,j}) = F(X*_{i,j}). Thus, in this scenario, we have that Y_S ∈ {0,1}^{|S|} and

p_{X^*_S}(Y_S) = \prod_{(i,j) \in S} \left[ F(X^*_{i,j}) \right]^{Y_{i,j}} \left[ 1 - F(X^*_{i,j}) \right]^{1 - Y_{i,j}}.    (11)

We will also assume here F(Xmax) < 1 and F(−Xmax) > 0; it follows that the true Bernoulli parameters (as well as the Bernoulli parameters associated with candidate estimates X ∈ 𝒳) are bounded away from 0 and 1. Defining

c_{F,f,X_{\max}} \triangleq \left( \sup_{|t| \le X_{\max}} \frac{1}{F(t)(1 - F(t))} \right) \cdot \left( \sup_{|t| \le X_{\max}} f^2(t) \right)

and

c'_{F,f,X_{\max}} \triangleq \inf_{|t| \le X_{\max}} \frac{f^2(t)}{F(t)(1 - F(t))},

and specializing the results of Theorem II.1 to this setting, we obtain the following result.

Fig. 1: Normalized mean-square error comparison of low-rank matrix completion by the Fixed Point Continuation method (• markers) and sparse factor matrix completion ( markers). For γ ≥ 0.3, the error of sparse factor completion exhibits a slope of approximately −1 on this log-log plot, providing some empirical validation of (10).

Corollary II.2 (Sparse Factor Matrix Completion from One-bit Observations). Let β be as in (6), let 𝒳 = 𝒳′, let p_{X*_S} be of the form in (11) with F(Xmax) < 1 and F(−Xmax) > 0, and let λ be as in (7) with C_D = 2 c_{F,f,Xmax} X²max. The estimate X̂ obtained via (3) satisfies

\frac{\mathbb{E}_{S, Y_S}\left[\|X^* - \widehat{X}\|_F^2\right]}{n_1 n_2} = O\left( \frac{c_{F,f,X_{\max}}}{c'_{F,f,X_{\max}}} \left( \frac{1}{c_{F,f,X_{\max}}} + X_{\max}^2 \right) \left( \frac{n_1 r + \|A^*\|_0}{m} \right) \log(n_1 \vee n_2) \right)    (12)

when A* is exactly sparse, having ‖A*‖0 nonzero elements.

Unlike in the Gaussian case, the KL divergence and negative log Hellinger affinities in this case do not directly take the form of squared differences between entries of the underlying matrices, though here we present an error bound in terms of the mean-square error. To obtain error guarantees expressible in terms of the squared errors, our analysis makes use of intermediate results from [36] and [13] to obtain bounds for each of these quantities in terms of squared differences between the underlying matrix parameters. Additional details are provided in [34].

Finally, we briefly compare our results with the results of [13] for low-rank matrix completion from one-bit observations. In that work, the authors consider maximum-likelihood optimizations over a (convex) set of max-norm and nuclear-norm constrained matrices, and show that the estimates so obtained satisfy

\frac{\|X^* - \widehat{X}\|_F^2}{n_1 n_2} = O\left( C_{F,X_{\max}} X_{\max} \sqrt{\frac{(n_1 + n_2)\, r}{m}} \right)    (13)

with high probability, where C_{F,Xmax} is a parameter that depends on the max-norm constraint, the cdf F(·), and the pdf f(·), somewhat analogously to the leading factor of (c_{F,f,Xmax}/c'_{F,f,Xmax}) in our bounds. It is interesting to note a main qualitative difference between that result and ours. For concreteness, let us consider the case where A* is not sparse, so that we may set ‖A*‖0 = n2 r in (12). In that case, it is easy to see that the overall estimation error behavior predicted by our bound (12) scales in proportion to the ratio between the number of degrees of freedom ((n1 + n2) r) and the nominal number of measurements m, while the bound in [13] scales according to the square root of that ratio. The authors of [13] proceed to show that the estimation error rate they obtain is minimax optimal over their set of candidate estimates; on the other hand, our bound appears (at least up to leading factors) to be tighter for this case where X* is exactly low-rank (e.g., setting ‖A*‖0 = n2 r in our bound). That said, our approach also enjoys the benefit of having the rank (or an upper bound for it) be known, while the procedure in [13] assumes only a bound on the nuclear norm of the unknown matrix. Overall, the establishment of general minimax estimation error rates for matrix completion in sparse factor models, and for the various likelihood models we consider here, is (to our knowledge) still an open problem – we defer that pursuit to a future effort.

III. EXPERIMENTAL EVALUATION – GAUSSIAN NOISE

In this section we briefly provide some experimental evidence validating the error performance of sparse factor matrix completion under a Gaussian corruption model. For this example, the underlying matrix X* was such that n1 = 50, n2 = 400, and r = 10. Each column of the sparse matrix A* had 2 nonzero elements, whose amplitudes were uniformly generated values in [1, 3]; entries of the D* matrix were iid Gaussian random variables with unit variance, thresholded to satisfy the range constraint that the elements be in [−1, 1]. The standard deviation of the observation noise was σ = 0.5. The results in Figure 1 depict an error comparison between a low-rank matrix completion method (using the Fixed Point Continuation, or FPC, method of [37]) and a sparse factor completion method (using an algorithmic approach based on convex relaxation of the ℓ0 term, and an alternating minimization approach over D and A where each subproblem is solved using an accelerated first-order method along the lines of [38]) as a function of the downsampling parameter γ = m/(n1 n2) ∈ [0, 1]. Here, the results for the FPC method are best possible in the sense that we (clairvoyantly) selected the regularization parameter yielding the best estimate for each γ; in contrast, the results for the sparse factor matrix completion approach are for one judiciously selected regularization parameter (λ0 = 2.1). Not only does the error of the sparse factor completion method exhibit the predicted error decay, but leveraging sparsity (when present) also allows the method to outperform other low-rank matrix completion methods.
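For readers who wish to reproduce the setup of this section, the skeleton below (ours, not the authors' code; the FPC baseline and the accelerated solver of [38] are omitted, and the "estimator" is passed in as a callable) generates data with the stated parameters and reports the normalized per-entry squared error over a sweep of γ values matching the range plotted in Fig. 1.

```python
import numpy as np

def normalized_error(X_hat, X_true):
    """Normalized per-entry squared error ||X_hat - X*||_F^2 / (n1 n2)."""
    return np.sum((X_hat - X_true) ** 2) / X_true.size

def run_gamma_sweep(estimator, gammas, n1=50, n2=400, r=10, sigma=0.5, seed=0):
    """Generate data as in Section III for each gamma and score `estimator`,
    a callable (Y, mask) -> X_hat (e.g., an alternating-minimization routine)."""
    rng = np.random.default_rng(seed)
    errors = []
    for gamma in gammas:
        D = np.clip(rng.standard_normal((n1, r)), -1.0, 1.0)
        A = np.zeros((r, n2))
        for j in range(n2):
            rows = rng.choice(r, size=2, replace=False)
            A[rows, j] = rng.uniform(1.0, 3.0, size=2)
        X = D @ A
        mask = rng.random((n1, n2)) < gamma
        Y = np.where(mask, X + sigma * rng.standard_normal((n1, n2)), 0.0)
        errors.append(normalized_error(estimator(Y, mask), X))
    return np.array(errors)

# Example: score a trivial "observed-entries-only" baseline over the sweep
gammas = np.logspace(-0.6, 0.0, 7)
errs = run_gamma_sweep(lambda Y, mask: Y * mask, gammas)
print(np.c_[gammas, errs])
```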
IV. CONCLUSIONS

We have established a general result for noisy matrix completion under various noise/corruption models, and described its implications in two stylized scenarios. Full versions of the proofs, as well as extensions and applications to other noise models (e.g., heavy-tailed noise and Poisson distributed observations), are reported in [34].

ACKNOWLEDGMENT

JH graciously acknowledges partial support from the NSF EARS Program, under Award No. AST-1247885.

REFERENCES

[1] J. Q. Li, Estimation of Mixture Models, Ph.D. thesis, Dept. of Statistics, Yale University, New Haven, CT, 1999.
[2] A. Barron, L. Birgé, and P. Massart, "Risk bounds for model selection via penalization," Probability Theory and Related Fields, vol. 113, no. 3, pp. 301–413, 1999.
[3] E. D. Kolaczyk and R. D. Nowak, "Multiscale likelihood analysis and complexity penalized estimation," Annals of Statistics, pp. 500–527, 2004.
[4] R. M. Willett and R. D. Nowak, "Multiscale Poisson intensity and density estimation," IEEE Transactions on Information Theory, vol. 53, no. 9, pp. 3171–3187, 2007.
[5] M. Raginsky, R. Willett, Z. T. Harmany, and R. F. Marcia, "Compressed sensing performance bounds under Poisson noise," IEEE Transactions on Signal Processing, vol. 58, no. 8, pp. 3990–4002, 2010.
[6] A. Soni and J. Haupt, "Estimation error guarantees for Poisson denoising with sparse and structured dictionary models," in Proc. International Symposium on Information Theory, 2014, to appear.
[7] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
[8] E. J. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
[9] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980–2998, 2010.
[10] W. Dai and O. Milenkovic, "SET: An algorithm for consistent matrix completion," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 3646–3649.
[11] E. J. Candès and Y. Plan, "Matrix completion with noise," Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.
[12] S. Negahban and M. J. Wainwright, "Restricted strong convexity and weighted matrix completion: Optimal bounds with noise," The Journal of Machine Learning Research, vol. 13, no. 1, pp. 1665–1697, 2012.
[13] M. A. Davenport, Y. Plan, E. van den Berg, and M. Wootters, "1-bit matrix completion," arXiv preprint arXiv:1209.3672, 2012.
[14] P. Tseng, "Nearest q-flat to m points," Journal of Optimization Theory and Applications, vol. 105, no. 1, pp. 249–252, 2000.
[15] Y. Ma, H. Derksen, W. Hong, and J. Wright, "Segmentation of multivariate mixed data via lossy data coding and compression," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1546–1562, 2007.
[16] H. Zou, T. Hastie, and R. Tibshirani, "Sparse principal component analysis," Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 265–286, 2006.
[17] M. E. Tipping and C. M. Bishop, "Mixtures of probabilistic principal component analyzers," Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.
[18] L. Balzano, B. Recht, and R. Nowak, "High-dimensional matched subspace detection when data are missing," in Proc. IEEE International Symposium on Information Theory, 2010, pp. 1638–1642.
[19] L. Balzano, A. Szlam, B. Recht, and R. Nowak, "k-subspaces with missing data," in Proc. IEEE Statistical Signal Processing Workshop, 2012, pp. 612–615.
[20] B. Eriksson, L. Balzano, and R. Nowak, "High-rank matrix completion and subspace clustering with missing data," arXiv preprint arXiv:1112.5629, 2011.
[21] A. Singh, A. Krishnamurthy, S. Balakrishnan, and M. Xu, "Completion of high-rank ultrametric matrices using selective entries," in Proc. IEEE International Conference on Signal Processing and Communications, 2012, pp. 1–5.
[22] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?," Vision Research, vol. 37, pp. 3311–3325, 1997.
[23] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[24] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online dictionary learning for sparse coding," in Proc. ICML, 2009.
[25] Z. Xing, M. Zhou, A. Castrodad, G. Sapiro, and L. Carin, "Dictionary learning for noisy and incomplete hyperspectral images," SIAM Journal on Imaging Sciences, vol. 5, no. 1, pp. 33–56, 2012.
[26] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin, "Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images," IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 130–144, 2012.
[27] M. Aharon, M. Elad, and A. M. Bruckstein, "On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them," Linear Algebra and its Applications, vol. 416, no. 1, pp. 48–67, 2006.
[28] R. Gribonval and K. Schnass, "Dictionary identification – Sparse matrix factorization via ℓ1 minimization," IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3523–3539, 2010.
[29] Q. Geng, H. Wang, and J. Wright, "On the local correctness of ℓ1 minimization for dictionary learning," Submitted, 2011, online at arxiv.org/abs/1101.5672.
[30] D. A. Spielman, H. Wang, and J. Wright, "Exact recovery of sparsely-used dictionaries," in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, pp. 3087–3090.
[31] K. Schnass, "On the identifiability of overcomplete dictionaries via the minimisation principle underlying K-SVD," Submitted, 2013, online at arxiv.org/abs/1301.3375.
[32] A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, and R. Tandon, "Learning sparsely used overcomplete dictionaries via alternating minimization," arXiv preprint arXiv:1310.7991, 2013.
[33] R. Jenatton, R. Gribonval, and F. Bach, "Local stability and robustness of sparse dictionary learning in the presence of noise," Submitted, 2012, online at arxiv.org/abs/1210.0685.
[34] A. Soni, S. Jain, J. Haupt, and S. Gonella, "Noisy matrix completion under sparse factor models," in preparation, 2014.
[35] V. Koltchinskii, K. Lounici, and A. B. Tsybakov, "Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion," The Annals of Statistics, vol. 39, no. 5, pp. 2302–2329, 2011.
[36] J. D. Haupt, N. D. Sidiropoulos, and G. B. Giannakis, "Sparse dictionary learning from 1-bit data," in Proc. International Conference on Acoustics, Speech and Signal Processing, 2014.
[37] S. Ma, D. Goldfarb, and L. Chen, "Fixed point and Bregman iterative methods for matrix rank minimization," Mathematical Programming, vol. 128, no. 1-2, pp. 321–353, 2011.
[38] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
