Certain uncertainty: using pointwise error estimates in super-resolution microscopy
Martin Lindén§, Vladimir Ćurić§, Elias Amselem, and Johan Elf∗
arXiv:1607.04675v1 [physics.bio-ph] 15 Jul 2016
Department of Cell and Molecular Biology, Uppsala University, Sweden.
§ Contributed equally.
(Dated: July 19, 2016)
Point-wise localization of individual fluorophores is a critical step in super-resolution microscopy and single particle tracking. Although the methods are limited by the accuracy in localizing individual fluorophores, this point-wise accuracy has so far only been estimated by theoretical best-case approximations, disregarding for example motional blur, out-of-focus broadening of the point spread function, and time-varying changes in the fluorescence background. Here, we show that pointwise localization uncertainty can be accurately estimated directly from imaging data using a Laplace approximation constrained by simple microscope properties. We further demonstrate that the estimated localization uncertainty can be used to improve downstream quantitative analysis, such as estimation of diffusion constants and detection of changes in molecular motion patterns. Most importantly, the accuracy of actual point localizations in live cell super-resolution microscopy can be improved beyond the information theoretic lower bound for localization errors in individual images, by modeling the fluorophores’ movement and accounting for their point-wise localization uncertainty.
Super-resolution fluorescence microscopy and live cell single particle tracking rely on computer-intensive data analysis to find and localize single fluorescent emitters in noisy images. Much effort has been spent on developing and testing efficient spot localization algorithms [1] and understanding the theoretical limits for localization accuracy [2–5]. However, the problem of estimating and using the actual accuracy is still unsolved. PALM/STORM type super-resolution imaging [6, 7] relies on the serial activation and localization of sparse photo-switchable fluorophores. Knowledge about the localization accuracy is important for building up a high resolution image, since uncertain points will only contribute blur. Often, only the number of photons, pixel size, and background noise for each emitter are used to estimate the localization accuracy, assuming that it achieves its theoretical limit. However, theoretical estimates neglect many important factors, and are prone to systematic errors, in particular when the background is variable and the emitter is moving, which is the common situation for live cell super-resolution imaging.
Knowledge of the localization uncertainty is also important in single particle tracking (SPT) [9], where it can be used to improve estimators of diffusion constants [10, 11]. It is common in live cell imaging that the localization uncertainty varies throughout an experiment, for example due to out-of-focus motion, drift, motion blur, fluorophore intensity fluctuations, heterogeneous background, or gradual photobleaching of the background or labeled molecule. Examples of heterogeneous and time-varying spot quality are shown in Fig. 1.
Here, we investigate methods to extract and use localization uncertainty of single dots in super-resolved single particle tracking, using a combination of experimental data and highly realistic simulated microscopy experiments [8]. First, we propose and characterize an uncertainty estimator based on the Laplace approximation, combined with information about physical limitations in the detection system. This method outperforms the common practice of combining maximum-likelihood localization with a Gaussian point-spread function (PSF) model and the Cramér-Rao lower bound (CRLB), which systematically underestimates the uncertainty in low light conditions relevant for live cell applications. Second, we demonstrate how estimated localization uncertainties can be used to improve the estimation of diffusion constants, particle positions, and state changes in single- and multi-state diffusion SPT data. For the multi-state case, we derive a variational Expectation Maximization (EM) algorithm for a diffusive Hidden Markov model (HMM) which extends previously described algorithms [10–16] by accounting for multi-state diffusion, localization uncertainty, and motion blur.
FIG. 1. Simulated images of a diffusing fluorophore with varying localization accuracy. (a) Fluorescent spots under different imaging conditions, where N is the number of photons per spot and b is the background level of photons per pixel; (b) Representative frames from a simulated movie with b = 1, N = 150. Images were simulated using SMeagol [8].
∗ [email protected]
FIG. 2. Actual and estimated errors in simulated data. (a) Root mean square (RMS) error $\sqrt{\langle(\mu_\text{est.}-\mu_\text{true})^2\rangle}$ vs. root mean estimated error variance $\sqrt{\langle\varepsilon^2\rangle}$ for MLE localization from the CRLB and Laplace approximations. (b) PSF width prior and fitted parameter distributions with (MAP) and without (MLE) prior. (c) Actual and estimated errors for MAP localization. (d) Probability plot of point-wise errors normalized by point-wise Laplace uncertainty estimates, $(\mu_\text{est.}-\mu_\text{true})/\sqrt{\varepsilon^2}$. The dashed line indicates the reference standard Gaussian distribution.
RESULTS
Estimating point-wise uncertainty
Estimating uncertainty is closely related to estimating positions, where the maximum likelihood estimate (MLE) is generally considered the optimal method. A maximum likelihood method starts with a likelihood function, i.e., the probability density function of a probabilistic model for generating images of spots (pixel counts in a small region around a spot) with the emitter position among the adjustable parameters. The MLE are the parameters that maximize the likelihood function for a particular spot image. Following common practice, we model EMCCD camera noise with the high-gain approximation and Gaussian readout noise [4, 17], and the spot shape by a symmetric Gaussian intensity profile plus a constant background intensity. The fit parameters are thus spot position (µx , µy ), background intensity b, PSF width σ, and spot amplitude N (see Methods, Eq. (1)), while the camera noise is assumed known from calibration. What is localization uncertainty? The localization error is the difference µest. − µtrue between estimated and
true positions. By localization uncertainty, we mean the distribution of the error, either in a Bayesian posterior sense, or in the sense of repeated localizations of spots with the same uncertainty. The uncertainty of the localization is related to the shape of the likelihood maximum: a sharply peaked maximum means that only a narrow set of parameters are likely, while a shallow maximum means greater uncertainty. The Cramér-Rao lower bound (CRLB) is the smallest possible variance of an unbiased estimator for a given set of model parameters, and is related to the expected sharpness of the likelihood maximum (see Methods, Eq. (4)). While this is strictly speaking not a statement about a single image, but rather about the average information content of data generated by a model, it is often used to estimate localization uncertainty. A Bayesian alternative is the Laplace approximation [18], which derives an approximate Gaussian posterior distribution for the fit parameters in terms of the sharpness of the likelihood maximum for each particular image (see Methods, Eq. (5)). Both estimators supply an estimated variance ε², but neither is well characterized as an estimator of localization uncertainty. To test these estimators, we analyzed a large set of simulated movies of a fluorescent particle diffusing at D = 1 µm² s⁻¹ in an E. coli-like geometry. The movies cover a broad range of experimentally relevant imaging conditions (see Methods) and include realistic EMCCD noise, background fluorescence, a non-Gaussian and space-dependent PSF [19], and motion blur [20]. A basic consistency
check is that the average estimated error variance, $\langle\varepsilon^2\rangle$, agrees with the variance of the actual errors, $\langle(\mu_\text{est.}-\mu_\text{true})^2\rangle$. In Fig. 2a, we compare the square root of these quantities for different imaging conditions, based on MLE localizations combined with either CRLB or Laplace uncertainty estimators. We obtain consistency under good imaging conditions, where the spots are bright and the average errors low. However, as conditions worsen and the errors increase, the uncertainty is underestimated, especially by the CRLB. There are several possible reasons for this behavior. The Laplace approximation is based on a truncated Taylor expansion of the log likelihood (see Methods). The CRLB is strictly not a point-wise error estimator at all, and is further defined in terms of the true parameter values, although by necessity evaluated with the fitted ones. The simplified Gaussian PSF model performs well for localization [4], but this does not guarantee good uncertainty estimates. Any of these approximations might fail in noisy or low light conditions. Another possibility is that localizations under poor imaging conditions are corrupted by a sub-population of fits that converge to a local optimum that does not reflect the underlying spot shape, e.g., fitting a single bright pixel as a very narrow spot, or misinterpreting the PSF shoulders as background or vice versa. To address the latter complication, we constrained the localizations using prior distributions on selected parameters, thus replacing MLE with maximum a posteriori estimation (MAP). Fixing parameter values is not suitable here, since both the size and shape of spots fluctuate due to fluorophore motion and varying imaging conditions. Furthermore, it is experimentally easier to obtain independent information about the background and PSF width than about the spot intensity. We therefore limited our attention to background and PSF width, and found the following priors to perform well: a log-normal prior centered on the true value for the background intensity, and a skewed log-normal [21] prior for the PSF width to penalize fits with unphysical widths below that of an in-focus spot. The PSF width prior together with the distribution of fitted PSF widths are shown in Fig. 2b. PSF widths below 1 pixel (80 nm) are virtually eliminated in the MAP fits. Background and spot amplitudes (see Fig. S1) are shifted somewhat downwards and upwards, respectively. As seen in Fig. 2c, this substantially improves the agreement between true and estimated errors under all imaging conditions. The Laplace estimator still outperforms the CRLB by a small margin, and numerical experiments on a wider range of priors (see SI Fig. S2) further confirm that the Laplace estimator is more robust to non-optimal priors than the CRLB. How does the prior help? A direct comparison of the true errors for the MLE and MAP fits (SI Fig. S3) reveals very small differences, indicating that the improvement is mainly due to improved uncertainty estimates. This is consistent with the theoretical observation that the Fisher information matrix for the localization problem is nearly block-diagonal [4], with very weak coupling between two groups of fitting parameters: positions (µx, µy) and shape parameters (N, b, σ). Thus, additional information about one of these groups does not significantly reduce the errors of the other.
For uncertainty estimation, this is not the case: information about the background and PSF shape does help in estimating position uncertainty. While mean-square errors are useful, we are ultimately interested in the full distribution of errors. In particular, most [10–16, 22] (but not all [23]) statistical models of SPT data assume Gaussian errors, but this assumption has not been tested. The Laplace approximation (Eq. (5)) is a Gaussian approximation. If it were exact, the errors normalized by the estimated standard deviation would be Gaussian with unit variance, and produce a straight line in the Gaussian probability plot of Fig. 2d. The actual normalized errors based on MAP localizations are more Gaussian than the MLE ones, and follow the straight line for almost four standard deviations. This shows that the improved consistency of the MAP estimates also translates to more Gaussian error distributions. Overall, these results show that point-wise uncertainty estimation using the Laplace approximation works well in a wide range of experimentally relevant conditions, but that experimentally accessible additional information
about background and PSF shape is necessary to get consistent results in low light conditions. Moreover, we see good support for the common assumption of Gaussian errors.
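The consistency check and the normalized-error test described above can be illustrated on a toy problem. The sketch below is not the localization model of this paper: it assumes a 1D Gaussian PSF with known width, no background, no pixelation and no camera noise, so that the MLE is simply the mean photon position. The function names (`log_likelihood`, `laplace_variance`, `consistency_check`) are hypothetical. It computes a Laplace variance from the numerical curvature of the log likelihood for each simulated spot, and then compares the RMS error with the root mean estimated variance.

```python
import math
import random

def log_likelihood(mu, photons, sigma):
    """Log likelihood of photon positions for a 1D Gaussian PSF centred at mu (toy model)."""
    norm = len(photons) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -0.5 * sum(((x - mu) / sigma) ** 2 for x in photons) - norm

def laplace_variance(mu_hat, photons, sigma, h=1e-2):
    """Laplace estimate: minus the inverse curvature of ln L at its maximum,
    with the second derivative taken by central finite differences."""
    f_plus = log_likelihood(mu_hat + h, photons, sigma)
    f_zero = log_likelihood(mu_hat, photons, sigma)
    f_minus = log_likelihood(mu_hat - h, photons, sigma)
    curvature = (f_plus - 2.0 * f_zero + f_minus) / h ** 2
    return -1.0 / curvature

def consistency_check(n_spots=2000, sigma=1.5, seed=1):
    """Simulate spots of varying brightness at true position 0 and return
    (RMS error, root mean estimated variance, mean squared normalized error)."""
    rng = random.Random(seed)
    sq_errors, variances, z2 = [], [], []
    for _ in range(n_spots):
        n_photons = rng.randint(20, 200)          # heterogeneous spot brightness
        photons = [rng.gauss(0.0, sigma) for _ in range(n_photons)]
        mu_hat = sum(photons) / n_photons         # MLE for this toy model
        var = laplace_variance(mu_hat, photons, sigma)
        sq_errors.append(mu_hat ** 2)             # true position is 0
        variances.append(var)
        z2.append(mu_hat ** 2 / var)
    rms_error = math.sqrt(sum(sq_errors) / n_spots)
    rms_estimate = math.sqrt(sum(variances) / n_spots)
    return rms_error, rms_estimate, sum(z2) / n_spots
```

In this idealized setting the two RMS quantities agree and the normalized errors have unit variance by construction; the point of the simulations in the main text is that this agreement is not guaranteed once background, pixelation, camera noise, and model mismatch are present.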
Validation on real data
To test the above conclusions on real data, we imaged immobilized fluorescent beads, alternating strong and weak excitation as shown in Fig. 3a. We used images under strong excitation conditions to extract a high accuracy ground truth for testing the uncertainty estimates in the dim images. We estimated the position and uncertainty of spots using the MAP estimation described above, with a background prior centered around the mean background (0.8 photons/pixel) seen in dim frames. A drift-corrected ground truth was estimated by cubic spline interpolation between the mean positions obtained from each block of 10 consecutive bright images. Since the intensity differs by about a factor 10 between bright and dim frames, the RMS errors of the ground truth should be approximately 10-fold lower than that of a single dim spot. Fig. 3c shows the resulting error-uncertainty comparison, with every point corresponding to a single bead. It reproduces the behavior on simulated images in Fig. 2c, thus confirming our conclusion that the Laplace approximation is preferable to the CRLB for uncertainty estimation, and that the good performance of our proposed PSF width prior is not limited to that particular simulated data set.
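One way to arrive at the 10-fold figure is the following back-of-the-envelope estimate. It assumes that the localization error scales as the inverse square root of the photon count (a best-case scaling, not something measured here), that the errors in the 10 bright frames of a block are independent, and it ignores errors from the spline interpolation:

```latex
\frac{\varepsilon_\text{ground truth}}{\varepsilon_\text{dim}}
\approx
\underbrace{\frac{1}{\sqrt{10}}}_{\text{10-fold more photons}}
\times
\underbrace{\frac{1}{\sqrt{10}}}_{\text{mean of 10 frames}}
= \frac{1}{10}.
```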
Diffusion constants
Next, we consider how estimated localization uncertainties lead to better estimates of diffusion constants, arguably the most common analysis of single particle tracking data. We divided the synthetic data shown in Fig. 1 into trajectories of length 10, estimated positions and uncertainties with the MAP and Laplace estimators described above, and estimated diffusion constants using the covariance-based estimators of Ref. [10] with and without the use of uncertainty estimates. Fig. 4 shows the mean value and 1% quantiles of the two estimators applied to every imaging condition, plotted against the signal-to-noise ratio. As expected, the use of estimated uncertainties reduces the variability of the diffusion estimates substantially. The covariance-based estimators only use the average uncertainty in each trajectory (see Methods). We also implemented a maximum likelihood estimator for the diffusion constant [11] which makes explicit use of the point-wise uncertainties (see SI text S3), but found no further improvement.
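The two covariance-based estimators compared above follow from solving Eq. (7) in Methods for D. The sketch below is a minimal 1D illustration under simplifying assumptions that differ from the simulations in this paper: positions are measured instantaneously (R = 0), the localization error is constant and exactly known, and the parameter values and function names are made up for the example.

```python
import random
import statistics

def simulate_track(n_steps, D, dt, eps, rng):
    """1D Brownian motion sampled without blur (R = 0), plus Gaussian localization errors."""
    y, xs = 0.0, []
    step_sd = (2.0 * D * dt) ** 0.5
    for _ in range(n_steps + 1):
        xs.append(y + rng.gauss(0.0, eps))
        y += rng.gauss(0.0, step_sd)
    return xs

def step_moments(x):
    """Sample averages of Delta_k^2 and Delta_k * Delta_{k+1}."""
    d = [b - a for a, b in zip(x, x[1:])]
    mean_sq = sum(v * v for v in d) / len(d)
    mean_cross = sum(a * b for a, b in zip(d, d[1:])) / (len(d) - 1)
    return mean_sq, mean_cross

def cve(x, dt):
    """Both relations of Eq. (7) combined: <D^2> + 2<D_k D_k+1> = 2 D dt."""
    mean_sq, mean_cross = step_moments(x)
    return mean_sq / (2.0 * dt) + mean_cross / dt

def cve_known_error(x, dt, eps2, R=0.0):
    """First relation of Eq. (7) alone, with the error variance eps2 supplied."""
    mean_sq, _ = step_moments(x)
    return (mean_sq - 2.0 * eps2) / (2.0 * dt * (1.0 - 2.0 * R))

def compare(n_tracks=5000, n_steps=10, D=1.0, dt=0.005, eps=0.05, seed=2):
    """Return (mean, std) of each estimator over many short trajectories."""
    rng = random.Random(seed)
    plain, informed = [], []
    for _ in range(n_tracks):
        x = simulate_track(n_steps, D, dt, eps, rng)
        plain.append(cve(x, dt))
        informed.append(cve_known_error(x, dt, eps ** 2))
    return (statistics.mean(plain), statistics.stdev(plain),
            statistics.mean(informed), statistics.stdev(informed))
```

Calling `compare()` should show both estimators centered near the true D, with a narrower spread for the one that uses the error variance, which is the qualitative behavior reported in Fig. 4.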
FIG. 3. Validating estimators of localization uncertainty using real data. (a) Intensity in different frames for two different beads and the background. Excitation intensity indicated by yellow background. (b) Image examples. Top: background in high intensity frames. Middle: Bead 1 in a high intensity frame. Bottom: Bead 1 in a low intensity frame. (c) Actual and estimated RMS errors for different beads.
FIG. 4. Estimated diffusion constants vs. signal-to-noise ratio $\mathrm{SNR} = \sqrt{D\Delta t/\langle\varepsilon^2\rangle}$ [10] in simulated 10-step trajectories, mean value and 1% quantiles.
Analysis of multi-state data
We now turn to a more challenging problem where point-wise errors do matter: data where both the diffusion constant and localization error change significantly on similar time scales. In single particle tracking, changes in diffusion constant can be used as a non-invasive reporter on intracellular binding and unbinding events [24]. However, diffusive motion and localization errors contribute additively to the observed step length statistics (see Eq. (7)), and thus changes in diffusion constants and localization errors cannot be reliably distinguished. As an example, we consider a protein that alternates between free diffusion (D = 1 µm² s⁻¹) and a bound state simulated by slow diffusion (D = 0.1 µm² s⁻¹). We focus on a single trajectory with 4 binding/unbinding events, two of which occur about 400 nm out of focus, and thus are accompanied by substantial broadening of the PSF and corresponding increases in localization errors. This defocus roughly matches the radius of an E. coli cell, and the scenario could model tracking experiments with cytoplasmic proteins that can bind to the inner cell membrane. Using SMeagol [8], we simulated 12 000 replicas of the above set of events, at a camera frame rate of 200 Hz, continuous illumination, and 300 photons/spot on average.
Fig. 5a shows the z coordinates in the input trajectory, and the frame-wise RMS errors produced by the MAP localization algorithm described above. The input points are sparse (tens of ms apart), which allows SMeagol to produce simulated movies that contain the same binding events and defocus trends, but vary in the detailed diffusion trajectory as well as in noise realizations. Examples of simulated spots along a trajectory are shown in Fig. 5b. To analyze this challenging data set, we extended the maximum likelihood diffusion constant estimator with explicit point-wise errors to include multiple diffusion states governed by a hidden Markov model (HMM), for which we derived a variational EM algorithm (see SI text S4). We then analyzed each simulated trajectory with three different 2-state HMMs: (i) vbSPT, which models the observed positions as pure diffusion and neglects blur and localization errors [24], (ii) the above-mentioned HMM that explicitly models these effects, and (iii) the Kalman filter limit of the HMM in (ii), which models localization errors but not blur effects. Since multi-state Kalman-type algorithms have been studied previously [13, 15, 16], it is interesting to compare models with and without blur. Fig. 5c shows the inferred average state from the three different models. As expected, the HMMs that include localization errors outperform vbSPT at detecting the strongly defocused first and third binding events. The full HMM does not give the best classification of the two short binding events, but it does give the lowest overall misclassification rate, 9% versus 9.6% and 17% for the Kalman and vbSPT models, respectively, so the apparently worse time resolution might reflect an overall tendency of the Kalman and vbSPT models to invent spurious transitions. Next, we look at estimated diffusion constants. Here, the Kalman and vbSPT models make systematic errors, as seen in the bare parameters in Fig. 5d.
However, by comparing the step length statistics between the full and simplified models, one can derive heuristic correction factors for these models (see Methods, Eq. (9)). As shown in Fig. 5e, this reduces the bias substantially, especially for the high diffusion constant. To finally compare the different HMMs on more well-behaved data, we reran the same experiment but with all z coordinates rescaled by a factor 1/5 in the PSF model, which essentially removes the z-dependent defocus effects. On this less challenging data set, event detection is much improved and the differences between the three HMMs are much less pronounced. However, with misclassification rates of 4.8%, 6.6%, and 7.6% for the HMM, Kalman model, and vbSPT, respectively, the full HMM still performs best overall.
FIG. 5. Detecting binding events in and out of focus. (a) Input z-coordinates and RMS errors, and (b) representative spot images along simulated trajectories, from time points indicated by + in (a). Gray areas indicate binding events. (c) Average state occupancy for the different analysis algorithms (“HMM n.b.” = no-blur HMM). (d) Distribution of estimated diffusion constants for the two states, with true values indicated by dashed lines. (e) Corrected diffusion constant estimates. (f) Spot images and HMM occupancies for trajectories simulated without defocus effects.
FIG. 6. Improved localization accuracy by modeling particle motion. (a) True, measured, and HMM-refined positions from part of a 2-state trajectory. (b) Relative change of RMS localization error after HMM-based refinement, for every frame in Fig. 5a.
Position refinement
Since the new HMM includes the true trajectory as a hidden variable and performs a global analysis, it can be used to refine individual localized positions, and in principle beat the Cramér-Rao lower bound for single image localizations. Fig. 6a shows the true, measured,
and refined positions for part of a two-state trajectory, and Fig. 6b the relative change of the RMS error for each frame in Fig. 5a after refinement. The refinement reduces the RMS errors by up to 50%. Large localization errors and small diffusion constants lead to larger relative improvement, which is intuitive since both factors make a single point less informative relative to its neighbors. A few points show a small error increase. On closer inspection, these turn out to be situated near hard-to-detect state changes, and therefore tend to be refined using an incorrect diffusion constant.
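The intuition behind the refinement can be sketched with a much simpler model than the HMM used here. The example below assumes a single diffusive state with known diffusion constant, 1D motion, no motion blur, and exactly known point-wise error variances, and uses a standard Kalman filter followed by a Rauch-Tung-Striebel smoothing pass. It is a stand-in for illustration only, not the variational EM algorithm of SI text S4; all names and parameter values are hypothetical.

```python
import math
import random

def kalman_smooth(x, var_meas, q):
    """Kalman filter plus backward (RTS) smoother for a 1D random walk.
    x: measured positions, var_meas: point-wise error variances,
    q: variance of one diffusive step (2*D*dt)."""
    n = len(x)
    m_f, p_f = [0.0] * n, [0.0] * n
    m_f[0], p_f[0] = x[0], var_meas[0]        # flat prior on the first position
    for k in range(1, n):
        m_pred, p_pred = m_f[k - 1], p_f[k - 1] + q
        gain = p_pred / (p_pred + var_meas[k])
        m_f[k] = m_pred + gain * (x[k] - m_pred)
        p_f[k] = (1.0 - gain) * p_pred
    m_s, p_s = m_f[:], p_f[:]
    for k in range(n - 2, -1, -1):            # backward smoothing pass
        p_pred = p_f[k] + q
        c = p_f[k] / p_pred
        m_s[k] = m_f[k] + c * (m_s[k + 1] - m_f[k])
        p_s[k] = p_f[k] + c * c * (p_s[k + 1] - p_pred)
    return m_s, p_s

def refinement_demo(n=2000, D=0.1, dt=0.005, seed=3):
    """Return (RMS error of raw positions, RMS error of refined positions)."""
    rng = random.Random(seed)
    q = 2.0 * D * dt
    y, truth, x, var_meas = 0.0, [], [], []
    for k in range(n):
        # alternate between precise and imprecise stretches (illustrative values, in µm)
        eps = 0.02 if (k // 50) % 2 == 0 else 0.06
        truth.append(y)
        x.append(y + rng.gauss(0.0, eps))
        var_meas.append(eps ** 2)
        y += rng.gauss(0.0, math.sqrt(q))

    def rms(est):
        return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / n)

    refined, _ = kalman_smooth(x, var_meas, q)
    return rms(x), rms(refined)
```

Each refined position is a weighted combination of the measurement and its neighbors, so the gain is largest where the point-wise error is large compared with the diffusive step length, in line with the trend in Fig. 6b.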
DISCUSSION
Fluorophore positions are not the only useful kind of information in super-resolution microscopy images. Here, we have shown that point-wise position uncertainty can also be extracted and used to improve quantitative data analysis. This is particularly important for live cell data where dynamic phenomena can be studied, and one may expect more heterogeneous imaging conditions where many of the theoretical estimates of localization errors, which may apply for cells with immobilized internal structure (“fixed”), will have little relevance. In general, our results indicate that estimation of position uncertainty is more sensitive to the fit model than estimation of position. For practical use, we find that an estimate based on the Laplace approximation combined with external information about the fluorescent background and PSF shape performs well in a wide range of experimentally relevant conditions. An intuitive reason why the CRLB performs worse may be that it makes explicit use of the fit model twice, for both fitting and predicting the model-dependent uncertainty, while the Laplace approximation only requires the first step. Theoretically, the posterior density formalism of the Laplace approximation also fits more naturally with model-based time-series analysis where the particle trajectory is treated as a latent variable to be integrated out.
While we have focused on 2D localization using conventional optics, the extension to, e.g., three dimensions using dual plane imaging [25] or engineered PSFs [26] presents no difficulties in principle, as long as appropriate PSF models for localization can be formulated. Note however that even a perfect characterization of an experimental PSF [25] does not constitute a perfect localization model, since it does not describe random PSF shape fluctuations due to motion blur [20]. Current techniques for 3D localization are inherently asymmetric and yield different in-plane and axial accuracy [5], which further underscores the need for downstream analysis methods to incorporate heterogeneous localization uncertainty. Most super-resolution microscopy applications are however not aimed at particle tracking, but imaging. For PALM/STORM type imaging of fixed samples, the ability to estimate the uncertainty of individual spots does not improve the localizations themselves, but it may still improve the final resolution since uncertain points can be omitted on a quantitative basis. However, the consequences for live cell imaging are more interesting, since the same fluorophore may be detected in different positions over different frames if the target is moving. For this case, we have shown that the combination of estimating uncertainty and modeling the fluorophore motion can produce refined position estimates, in principle pushing the localization errors below the single-spot Cramér-Rao lower bound, by merging information from consecutive frames in an optimal way.
METHODS
Synthetic data
We generated synthetic microscopy data using SMeagol, a software package for accurate simulations of dynamic fluorescence microscopy at the single molecule level [8]. We modeled the optics using a Gibson-Lanni PSF model with 584 nm wavelength and NA = 1.4 [19], and an EMCCD camera with the high-gain approximation [4, 17] plus Gaussian readout noise. For localization and diffusion estimation tests, we simulated simple diffusion (D = 1 µm² s⁻¹) in a cylinder of length 14.4 µm and diameter 2 µm, similar to long E. coli cells, in order to avoid confinement artifacts in the longitudinal direction. We generated 48 data sets spanning a wide range of imaging conditions by varying frame duration (8 ms or 10 ms, with exposure times of 2 ms, 4 ms, 6 ms and 3 ms, 5 ms, 8 ms, respectively), background intensity (1 or 2 photons per pixel), average spot brightness (75-300 photons/spot), EM gain (20 or 30), and readout noise level (4 or 8). Each data set contained 10⁴ individual spots. For the simulated multi-state data, we hand-modified a single SMeagol input trajectory from a simulated 2-state model to contain 4 binding events with different durations and z-coordinates as seen in Fig. 5a, and also thinned out the input trajectory to create more variability in the particle paths between different realizations. We then simulated many realizations from this input trajectory, using the same PSF model as above, continuous illumination with a sample time of 5 ms, EMCCD gain 90, an average spot intensity of 300 photons/spot, and a time-dependent background that decays exponentially from 0.95 to 0.75 background photons per pixel with a time constant of 0.75 s.
Real data
For estimating localization errors under real imaging conditions, we used immobilized fluorescent beads with a diameter of 0.1 µm (TetraSpeck Fluorescent Microspheres, ThermoFisher T7284). The beads were diluted in ethanol and then placed on a coverslip, where we let them dry before adding water as a mounting medium. Imaging was done with a Nikon Ti-E microscope configured for EPI-illumination with a 514 nm excitation laser (Coherent Genesis MX STM) together with matching filters (Semrock dichroic mirror Di02-R514 with emission filter Chroma HQ545/50M-2P 70351). Intensity modulation was made possible by an acousto-optic tunable filter (AOTF) (AA Opto Electronics, AOTFnC), which was triggered by a waveform generator (Tektronix, AFG3021B). The waveform used was a sequence of square pulses, high for 200 ms and low for 1800 ms. The two illumination intensities, high and low, correspond to 10.7 kW cm⁻² and 0.63 kW cm⁻², respectively. Fluorescent beads were viewed through a 100x objective (CFI Apo TIRF 100x oil, NA = 1.49) with a 2X extension (Diagnostic instruments DD20NLT) in front of an Andor Ultra 897 EMCCD camera (Andor Technology Ltd.). This configuration gives a pixel size of 80 nm, the same as in the simulated data. The data set consisted of 1000 frames (Fig. 3a) with an exposure time of 30 ms. EMCCD noise characteristics (gain, offset, readout noise) were determined by analyzing a “dark” movie obtained with the shutter closed.
Localization
We perform maximum likelihood (MLE) localization using a high-gain EMCCD noise model [4, 17], which gives the probability q(c_i|E_i) of the offset-subtracted pixel count c_i for a given pixel intensity E_i (expected number of photons/frame) in pixel i. For the intensity E(x, y), we model the PSF with a symmetric Gaussian,
$$E(x,y) = \frac{b}{a^2} + \frac{N}{2\pi\sigma^2}\exp\left[-\frac{(x-\mu_x)^2+(y-\mu_y)^2}{2\sigma^2}\right], \tag{1}$$
with pixel size a, background b (expected number of photons/pixel), spot width σ, amplitude N (expected number of photons/spot), and spot position (µx, µy), and approximate the pixel intensity
$$E_i = \int_{\text{px. } i} E(x,y)\,dx\,dy \tag{2}$$
by numerical quadrature [34]. The log likelihood of an image containing a single spot is then given by
$$\ln L(\theta) = \ln q_0(\theta) + \sum_{i\in\text{ROI}} \ln q(c_i|E_i(\theta)), \tag{3}$$
where θ = (µx, µy, b, N, σ) are fit parameters, and q0 is a prior distribution (we set ln q0 = 0 for MLE fitting). To avoid the complications of spot identification, we use known positions to determine the ROI and initial guess for (µx, µy). All localizations are performed using a 9 × 9 ROI. Fits that failed to converge or estimated uncertainties larger than 2 pixels were discarded.
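As a concrete illustration of Eqs. (1)-(3), the sketch below evaluates the expected pixel intensities and a log likelihood for a 9 × 9 ROI. Two substitutions are made for simplicity and should be read as assumptions, not as the method used here: the pixel integral of Eq. (2) is evaluated in closed form with error functions instead of numerical quadrature, and the EMCCD noise model q(c_i|E_i) is replaced by a plain Poisson distribution. Function names and parameter values are hypothetical.

```python
import math

def pixel_fraction(lo, hi, mu, sigma):
    """Fraction of a 1D Gaussian (mean mu, std sigma) that falls between lo and hi."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((hi - mu) / s) - math.erf((lo - mu) / s))

def expected_counts(mu_x, mu_y, b, N, sigma, n_pix=9, a=1.0):
    """Eq. (2) for the Gaussian model of Eq. (1): E_i = b + N * (x fraction) * (y fraction).
    Pixel (row j, column i) covers [i*a, (i+1)*a] x [j*a, (j+1)*a]; mu and sigma share the units of a."""
    image = []
    for j in range(n_pix):
        fy = pixel_fraction(j * a, (j + 1) * a, mu_y, sigma)
        row = []
        for i in range(n_pix):
            fx = pixel_fraction(i * a, (i + 1) * a, mu_x, sigma)
            row.append(b + N * fx * fy)
        image.append(row)
    return image

def poisson_log_likelihood(counts, expected):
    """Eq. (3) with ln q0 = 0 and a Poisson stand-in for the EMCCD noise model."""
    total = 0.0
    for c_row, e_row in zip(counts, expected):
        for c, e in zip(c_row, e_row):
            total += c * math.log(e) - e - math.lgamma(c + 1.0)
    return total
```

For a spot centered in the ROI, the summed expected counts equal the background contribution plus the fraction of N that falls inside the ROI, which gives a quick sanity check on the pixel integration.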
Cramer-Rao lower bound
Prior distributions
The prior distributions for localization used in the main text are: normal priors with mean value ln btrue and std. 0.2 for the log background intensity, resulting in the log-normal prior b ∈ ln N (ln btrue , 0.22 ). For the PSF width σ, we use a skewed normal prior for ln σ to penalize fits with σ below the minimum in-focus width of the PSF. The skew normal density is given by [21] 2 x − x0 x − x0 f (x, x0 , w, α) = φ Φ α , (6) w w w where φ is the probability distribution function and Φ is the cumulative distribution function for the unit normal distribution N (0, 1). We used x0 = log(1.5), w = 1, and α = 5, as sketched in Fig. 2b, with the pixel width a = 80 nm as the length unit. These priors are shown in SI Fig. S1. Covariance-based diffusion estimator
If xk (k = 0, 1, . . .) is the measured trajectory of a freely diffusing particle with diffusion constant D, the widely used model for camera-based tracking by Berglund [22] predicts that the measured step lengths ∆k = xk+1 − xk are zero-mean Gaussian variables with ! covariances given by r 2τ σa2
2 2 εCRLB = 2 1 + 4τ + , (4) ∆k = 2D∆t(1−2R)+2ε2 , h∆k ∆k±1 i = 2D∆tR−ε2 , N 1 + 4τ (7) and uncorrelated otherwise. Here, 0 ≤ R ≤ 1/4 is a 2 2 2 2 2 with τ = 2πσa b/(N a ), σa = σ + a /12, and the prefblur coefficient that depends on how the images are acactor 2 accounts for EMCCD excess noise [4]. quired (e.g., R = 1/6 for continuous illumination), ∆t is the measurement time-step, and ε2 is the variance of the localization errors.
Laplace approximation Substituting sample averages for ∆2k and h∆k ∆k+1 i and solving for D yields a covariance-based estimator (CVE) with good performance [10]. If ε2 is known or can An alternative way to approximate the uncertainty of be estimated independently, the first relation in Eq. (7) the fit parameters is to Taylor expand the likelihood alone yields a further improved estimate of D. As we araround the maximum-likelihood parameters θ∗ to second gue in Sec. S2 S2.3, these estimators apply also for variorder, that is 2 able
2 localization errors if ε is replaced by the average 2 ∂ ln L 1 ∂ ln L ε . ln L(θ) ≈ ln L(θ∗ )+ (θ−θ∗ )+ (θ−θ∗ )T (θ−θ∗ ). ∂θ θ∗ 2 ∂θ2 θ∗ (5) Maximum likelihood and multi-state diffusion The first-order term is zero, since θ∗ is a local maximum. This approximates the likelihood by a Gaussian with covariance matrix given by the inverse Hessian, i.e., The Berglund model [22] can also be used directly for Σ = [∂ 2 ln L/∂θ2 ]−1 . In a Bayesian setting, this expresses maximum likelihood inference, which has the potential the (approximate) posterior uncertainty about the fit paadvantage that point-wise errors can be modeled [11]. rameters. The estimated uncertainties (posterior variThe basic assumption is to model the observed positions ances) are given by the diagonal entries of the covarixk as averages of the true diffusive particle path y(t) ance matrix, e.g., ε2Lap. (µx ) = Σµx ,µx . We compute the during the camera exposure, plus a Gaussian localization Hessian numerically using Matlab’s built-in optimization error, i.e., routines, and use the log of the scale parameters b, N, σ Z ∆t for fitting, since they are likely to have a more Gaussianxk = y(k∆t + t)f (t)dt + εk ξk , (8) like posterior [37]. 0 The CRLB is a lower bound on the variance of an unbiased estimator [35, 36]. We use an accurate approximation to the CRLB for a symmetric Gaussian PSF from Ref. [5],
where $f(t)$ is the normalized shutter function [22], $\varepsilon_k$ is the localization uncertainty (standard deviation) at time $k$, and $\xi_k$ are independent Gaussian random numbers with unit variance. Continuous illumination is described by a constant shutter function, $f(t) = 1/\Delta t$. The limit where blur effects are neglected can be described by setting $f(t)$ to a delta function, which corresponds to instantaneous position measurement. This reduces Eq. (8) to a standard Kalman filter [12], and leads to $R = 0$ in Eq. (7). In SI Sec. S3, we derive a maximum likelihood estimator that learns both $D$ and $y(t)$. In SI Sec. S4, we extend the model to multi-state diffusion by letting the diffusion constant switch randomly between different values corresponding to different hidden states in an HMM, and derive a variational EM algorithm for maximum likelihood inference of model parameters, hidden states, and refined estimates of the measured positions. To interpret estimated diffusion constants from simplified models, one may "derive" corrected diffusion estimates $D^*$ by equating expressions for the step length variance $\langle \Delta_k^2 \rangle$ from Eq. (7) with and without those effects present. For the Kalman ($R = 0$) and vbSPT ($R = \varepsilon = 0$) models, we get
\[
D^* = \frac{D_{\mathrm{Kalman}}}{1-2R}, \quad\text{and}\quad D^* = \frac{D_{\mathrm{vbSPT}} - \varepsilon^2/\Delta t}{1-2R}, \tag{9}
\]
respectively, which is what we use in Fig. 5e.

Software

Matlab open source code for the EM and localization algorithms is available at github.com/bmelinden/uncertainSPT.

Acknowledgements

We thank David Fange and Irmeli Barkefors for their careful reading of the manuscript. This work was supported by the European Research Council (Grant no. 616047), Vetenskapsrådet, the Knut and Alice Wallenberg Foundation, and the Swedish strategic research programme eSSENCE.

Author contributions

M.L., V.C. and J.E. designed the project. M.L. and V.C. designed and implemented data analysis algorithms. E.A. designed and ran microscopy experiments. V.C. and M.L. analyzed data. All authors wrote the paper.

[1] D. Sage, H. Kirshner, T. Pengo, N. Stuurman, J. Min, S. Manley, and M. Unser, “Quantitative evaluation of software packages for single-molecule localization microscopy,” Nat. Meth. 5 (2015), 10.1038/nmeth.3442.
[2] Russell E. Thompson, Daniel R. Larson, and Watt W. Webb, “Precise nanometer localization analysis for individual fluorescent probes,” Biophys. J. 82, 2775–2783 (2002).
[3] Raimund J. Ober, Sripad Ram, and E. Sally Ward, “Localization accuracy in single-molecule microscopy,” Biophys. J. 86, 1185–1200 (2004).
[4] Kim I. Mortensen, L. Stirling Churchman, James A. Spudich, and Henrik Flyvbjerg, “Optimized localization analysis for single-molecule tracking and super-resolution microscopy,” Nat. Meth. 7, 377–381 (2010).
[5] Bernd Rieger and Sjoerd Stallinga, “The lateral and axial localization uncertainty in super-resolution light microscopy,” ChemPhysChem 15, 664–670 (2014).
[6] Eric Betzig, George H. Patterson, Rachid Sougrat, O. Wolf Lindwasser, Scott Olenych, Juan S. Bonifacino, Michael W. Davidson, Jennifer Lippincott-Schwartz, and Harald F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313, 1642–1645 (2006).
[7] Michael J. Rust, Mark Bates, and Xiaowei Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Meth. 3, 793–796 (2006).
[8] Martin Lindén, Vladimir Ćurić, Alexis Boucharin, David Fange, and Johan Elf, “Simulated single molecule microscopy with SMeagol,” Bioinformatics (2016), 10.1093/bioinformatics/btw109, http://smeagol.sourceforge.net.
[9] Suliana Manley, Jennifer M. Gillette, George H. Patterson, Hari Shroff, Harald F. Hess, Eric Betzig, and Jennifer Lippincott-Schwartz, “High-density mapping of single-molecule trajectories with photoactivated localization microscopy,” Nat. Meth. 5, 155–157 (2008).
[10] Christian L. Vestergaard, Paul C. Blainey, and Henrik Flyvbjerg, “Optimal estimation of diffusion coefficients from single-particle trajectories,” Phys. Rev. E 89, 022726 (2014).
[11] Peter K. Relich, Mark J. Olah, Patrick J. Cutler, and Keith A. Lidke, “Estimation of the diffusion constant from intermittent trajectories with variable position uncertainties,” Phys. Rev. E 93, 042401 (2016).
[12] Christopher P. Calderon, “Motion blur filtering: A statistical approach for extracting confinement forces and diffusivity from a single blurred trajectory,” Phys. Rev. E 93, 053303 (2016).
[13] Jason Bernstein and John Fricks, “Analysis of single particle diffusion with transient binding using particle filtering,” J. Theor. Biol. (2016), 10.1016/j.jtbi.2016.04.013.
[14] Peter K. Koo, Matthew Weitzman, Chandran R. Sabanayagam, Kenneth L. van Golen, and Simon G. J. Mochrie, “Extracting diffusive states of Rho GTPase in live cells: Towards in vivo biochemistry,” PLOS Comput. Biol. 11, e1004297 (2015).
[15] Paddy J. Slator, Christopher W. Cairo, and Nigel J. Burroughs, “Detection of diffusion heterogeneity in single particle tracking trajectories using a hidden Markov model with measurement noise propagation,” PLOS ONE 10, e0140759 (2015).
[16] Christopher P. Calderon, “Data-driven techniques for detecting dynamical state changes in noisily measured 3D single-molecule trajectories,” Molecules 19, 18381–18398 (2014).
[17] Jerry Chao, Sripad Ram, E. Sally Ward, and Raimund J. Ober, “Two approximations for the geometric model of signal amplification in an electron-multiplying charge-coupled device detector,” Proc. SPIE 8589, 858905 (2013).
[18] David MacKay, Information theory, inference, and learning algorithms (Cambridge University Press, 2003).
[19] H. Kirshner, F. Aguet, D. Sage, and M. Unser, “3-D PSF fitting for fluorescence microscopy: implementation and localization application,” J. Microsc.-Oxford 249, 13–25 (2013).
[20] Hendrik Deschout, Kristiaan Neyts, and Kevin Braeckmans, “The influence of movement on the localization precision of sub-resolution particles in fluorescence microscopy,” J. Biophotonics 5, 97–109 (2012).
[21] A. Azzalini, “A class of distributions which includes the normal ones,” Scand. J. Stat. 12, 171–178 (1985).
[22] Andrew J. Berglund, “Statistics of camera-based single-particle tracking,” Phys. Rev. E 82, 011917 (2010).
[23] Trevor T. Ashley and Sean B. Andersson, “Method for simultaneous localization and parameter estimation in particle tracking experiments,” Phys. Rev. E 92, 052707 (2015).
[24] Fredrik Persson, Martin Lindén, Cecilia Unoson, and Johan Elf, “Extracting intracellular diffusive states and transition rates from single-molecule tracking data,” Nat. Meth. 10, 265–269 (2013).
[25] Sheng Liu, Emil B. Kromann, Wesley D. Krueger, Joerg Bewersdorf, and Keith A. Lidke, “Three dimensional single molecule localization using a phase retrieved pupil function,” Opt. Express 21, 29462 (2013).
[26] S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proc. Natl. Acad. Sci. U.S.A. 106, 2995–2999 (2009).
[27] Winston C. Chow, “Brownian bridge,” WIREs Comp. Stat. 1, 325–332 (2009).
[28] Zoubin Ghahramani and Michael I. Jordan, “Factorial hidden Markov models,” Mach. Learn. 29, 245–273 (1997).
[29] Christopher Bishop, Pattern recognition and machine learning (Springer, New York, 2006).
[30] L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Magazine, 4–16 (1986).
[31] Jonathan E. Bronson, Jingyi Fei, Jake M. Hofman, Ruben L. Gonzalez Jr., and Chris H. Wiggins, “Learning rates and states from biophysical time series: A Bayesian approach to model selection and single-molecule FRET data,” Biophys. J. 97, 3196–3205 (2009).
[32] Stephanie Johnson, Jan-Willem van de Meent, Rob Phillips, Chris H. Wiggins, and Martin Lindén, “Multiple LacI-mediated loops revealed by Bayesian statistics and tethered particle motion,” Nucl. Acids Res. 42, 10265–10277 (2014).
[33] Kenneth P. Burnham and David R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Springer, 2013).
[34] Jerry Chao, Sripad Ram, Taiyoon Lee, E. Sally Ward, and Raimund J. Ober, “Investigation of the numerics of point spread function integration in single molecule localization,” Opt. Express 23, 16866 (2015).
[35] Harald Cramér, Mathematical methods of statistics, Vol. 9 (Princeton University Press, 1945).
[36] C. Radhakrishna Rao, “Information and accuracy attainable in the estimation of statistical parameters,” Bull. Calcutta Math. Soc. 37, 81–91 (1945).
[37] David J. C. MacKay, “Choice of basis for Laplace approximation,” Mach. Learn. 33, 77–86 (1998).
SUPPLEMENTARY MATERIAL

Appendix S1: Localization error estimates with different priors
In this section we present how different prior functions assigned to the PSF width $\sigma$ and the background $b$ influence the accuracy of the computed RMS errors and RMS uncertainties for the synthetic data set considered in this paper. We considered log-normal priors for the background intensity, either centered around the true (simulated) background in each data set, or around the average background intensity in all data sets. For the PSF width we tested two types of priors: (i) a log-normal prior around the expected value for the PSF width; (ii) a skew-normal prior on $\ln(\sigma)$, which assigns very low probability to PSF widths below $\approx 0.8$ pixels, as well as low probabilities for very wide PSFs. We do not impose any prior on the spot amplitude $N$. The probability density function for the log-normal prior, denoted as $\ln N(\mu, \sigma^2)$, is defined by
\[
f_{\ln}(x, \sigma, \mu) = \frac{1}{x\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \tag{S1}
\]
where $\mu$ and $\sigma$ are the mean and standard deviation parameters. The skew-normal distribution [21] has a density function given by
\[
f_{sn}(x, x_0, \omega, \alpha) = \frac{2}{\omega}\,\phi\!\left(\frac{x - x_0}{\omega}\right)\Phi\!\left(\alpha\,\frac{x - x_0}{\omega}\right), \tag{S2}
\]
where $\phi$ is the probability density function and $\Phi$ is the cumulative distribution function for the unit normal distribution $N(0, 1)$, and the parameter $\alpha$ determines the degree of asymmetry, with positive (negative) $\alpha$ increasing the weight above (below) the location parameter $x_0$.

For our investigations on the effects of different prior distributions, we used two background priors:
• Log-normal “correct” prior on $b$: $\ln N(\ln b_{\mathrm{true}}, 0.2^2)$, where $b_{\mathrm{true}}$ is either 1 or 2 photons/pixel, which is the background level used for simulating that particular data set.
• Log-normal “average” prior on $b$: $\ln N(\ln 1.5, 0.2^2)$, centered around the average background intensity in all data sets, that is, 1.5 photons/pixel.

For the PSF width, we considered the two following priors:
• Log-normal prior on $\sigma$: $\ln N(\ln 1.5, 0.2^2)$, since the half-width of the focused PSF is around 1.5 pixels.
• Skew-normal prior on $\ln(\sigma)$: $f_{sn}(x, x_0, \omega, \alpha)$, where $x_0 = \ln(1.5)$, $\alpha = 5$ and $\omega = 1$.

Error-uncertainty comparisons for various combinations of the above priors are shown in Fig. S2. As seen in Fig. S2a (which reproduces Fig. 2a in the main text), imposing a flat prior (no prior) leads to estimated uncertainties that substantially underestimate the true errors, especially for points with low photon count. In comparison, all other prior combinations present an average improvement. Overall, the Laplace estimator seems more consistent (RMS uncertainty mostly closer to RMS error), and also less sensitive to the choice of priors. The latter trend is especially clear in the examples using the “average” background prior, which is always either a systematic over- or underestimate, and which results in substantially larger variability for the CRLB results.

FIG. S1. Prior distributions and fit parameter distributions corresponding to Fig. 2a,c in the main text. (a) MLE and MAP background parameters, with b = 1 and b = 2 populations separated. Log-normal priors for the background are indicated by dashed lines. Note how the population near b = 0 is eliminated in the MAP fits. (b) PSF width σ for all backgrounds with (MAP) and without (MLE) priors, together with the skewed log-normal prior (dashed, reproduces Fig. 2b). (c) Spot amplitude N distributions. Again, the population is shifted away from the very smallest amplitude values, even though no prior on N was used in the MAP fit.

FIG. S2. Error-uncertainty plots for different combinations of prior distributions on b and σ.

Appendix S2: Diffusion with time-varying localization errors
Here, we give a detailed derivation of the various estimators and models mentioned in the main text methods section, starting from Eq. (8), the assumption that localizing a moving object amounts to detecting the time-averaged position with some (independent) localization error, which we will assume to be Gaussian. Parts of these derivations have been given elsewhere [11, 22], but are restated here in a different form that facilitates a generalization to multi-state models.

FIG. S3. Localization errors for MLE vs MAP localizations, using the prior described in the main text and Fig. S2f.

S2.1. Diffusive camera-based tracking with blur and localization errors
To streamline the presentation, we will use units where $\Delta t = 1$, use the subscript $t = 0, 1, 2, \ldots$ to denote discrete time dependence, and use the step length and localization variances, $\lambda_t = 2D_t\Delta t$ and $v_t = \varepsilon_t^2$, respectively. We allow both of them to be time-dependent, but assume them to be statistically independent and restrict the diffusion constant to be constant within each frame. Then, Eq. (8) reads
\[
x_t = \int_0^1 f(t')\,y(t + t')\,dt' + \sqrt{v_t}\,\xi_t, \tag{S1}
\]
where $\xi_t$ are independent identically distributed (iid) N(0,1) variables, and $y(t)$ is the true trajectory of the particle being localized. The shutter distribution $f(t)$ is a probability density on $[0, 1]$, which describes the image acquisition process (e.g., $f(t) = 1$ for continuous acquisition), but neglects stochastic elements such as fluorophore blinking. It has the distribution function
\[
F(t) = \int_0^t f(t')\,dt'. \tag{S2}
\]
We divide $y(t)$ in two parts, the true positions $y_t$ at the beginning of each frame, which evolve according to
\[
y_{t+1} = y_t + \sqrt{\lambda_t}\,\eta_t, \tag{S3}
\]
where $\eta_t$ are again iid N(0,1), and a conditional interpolating process between them, described by Brownian bridges [27]. Thus, for $0 \le t' \le 1$, we write
\[
y(t + t') = y_t + t'(y_{t+1} - y_t) + \sqrt{\lambda_t}\,B_t(t'), \tag{S4}
\]
where $B_t$ are a set of iid standard Brownian bridges. These are Gaussian processes on the interval [0,1], defined by
\[
B_t(0) = B_t(1) = 0, \qquad \langle B_t(t') \rangle = 0, \qquad \langle B_t(t') B_t(t'') \rangle = t'(1 - t''), \ \text{for } t' \le t'', \tag{S5}
\]
and also independent on different intervals, so that $\langle B_r(t') B_v(t'') \rangle = 0$ if $r \ne v$. Substituting the interpolation formula Eq. (S4) in the localization model Eq. (S1), we get
\[
x_t = y_t(1 - \tau) + y_{t+1}\tau + \sqrt{v_t}\,\xi_t + \sqrt{\lambda_t}\int_0^1 f(t') B_t(t')\,dt', \tag{S6}
\]
where we have introduced the shutter average, given by
\[
\tau = \int_0^1 t f(t)\,dt. \tag{S7}
\]
Using the properties of Brownian bridges, Eq. (S5), one can show that the last integral in Eq. (S6) is a Gaussian random variable with mean zero and variance
\[
\beta \equiv \mathrm{Var}\left[\int_0^1 f(t') B_t(t')\,dt'\right] = \tau(1 - \tau) - R, \tag{S8}
\]
where $R$ is the blur coefficient of Ref. [22], given by
\[
R = \int_0^1 F(t)\big(1 - F(t)\big)\,dt. \tag{S9}
\]
By assumption, $v_t$ and $B_t$ are statistically independent, and thus one can add up the noise in the measurement model and arrive at
\[
x_t = y_t(1 - \tau) + y_{t+1}\tau + \sqrt{v_t + \beta\lambda_t}\,\zeta_t, \tag{S10}
\]
where ζt are again iid N(0,1).
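The shutter constants $\tau$, $R$ and $\beta$ of Eqs. (S7)-(S9) can be evaluated numerically for any shutter density. The following is a minimal sketch in Python (not the authors' Matlab code), assuming a user-supplied density `f` on $[0,1]$ and a simple midpoint rule; the function name `shutter_moments` and the grid size `n` are illustrative choices.

```python
def shutter_moments(f, n=20000):
    """Midpoint-rule estimates of tau (S7), R (S9) and beta (S8)
    for a shutter density f on [0, 1] (assumed normalized)."""
    h = 1.0 / n
    tau = 0.0
    R = 0.0
    F = 0.0  # running value of the distribution function F(t), Eq. (S2)
    for i in range(n):
        t = (i + 0.5) * h
        w = f(t) * h            # probability mass of this bin
        F_mid = F + 0.5 * w     # F evaluated at the bin midpoint
        tau += t * w
        R += F_mid * (1.0 - F_mid) * h
        F += w
    beta = tau * (1.0 - tau) - R
    return tau, R, beta

# Continuous illumination, f(t) = 1 in units where the frame time is 1.
tau, R, beta = shutter_moments(lambda t: 1.0)
print(tau, R, beta)
```

For continuous illumination this should reproduce $\tau = 1/2$, $R = 1/6$ and $\beta = 1/12$ up to discretization error.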
S2.2. Constant exposure

An important class of shutter distributions are those that are constant during some fraction $t_E$ of each frame, and then zero, i.e.,
\[
f(t) = \begin{cases} 1/t_E, & t \le t_E, \\ 0, & t > t_E, \end{cases} \tag{S11}
\]
which leads to
\[
\tau = \frac{t_E}{2}, \qquad R = \frac{t_E}{6}, \qquad \beta = \frac{t_E}{4}\left(\frac{4}{3} - t_E\right). \tag{S12}
\]
We see that $R \le 1/6$ and $\tau \le 1/2$, with maxima at continuous exposure ($t_E = 1$). On the other hand, $\beta$ has a maximum of $1/9$ at $t_E = 2/3$, and the value of $1/12$ at continuous exposure can only be further lowered when $t_E < 1/3$. It is unclear if this is significant, since with non-constant exposure there is some freedom in how the shutter distribution is defined, and one could also place it symmetrically in the interval and get $\tau = 0.5$ for all exposure times. We defer further investigations of this issue to future work.
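The statements about the extrema of $\beta$ can be checked directly from the closed forms in Eq. (S12). The snippet below is an illustrative sketch; the helper name `constant_exposure` and the grid of exposure fractions are assumptions made for this example.

```python
def constant_exposure(t_E):
    """Closed-form tau, R, beta of Eq. (S12) for exposure fraction 0 < t_E <= 1."""
    tau = t_E / 2.0
    R = t_E / 6.0
    beta = 0.25 * t_E * (4.0 / 3.0 - t_E)
    return tau, R, beta

# Scan exposure fractions on a grid and locate the largest beta.
grid = [k / 300.0 for k in range(1, 301)]
t_best = max(grid, key=lambda t: constant_exposure(t)[2])
print(t_best, constant_exposure(t_best)[2])

# beta takes the same value at t_E = 1/3 and at continuous exposure, t_E = 1.
print(constant_exposure(1.0 / 3.0)[2], constant_exposure(1.0)[2])
```

On this grid the maximum sits at $t_E = 2/3$ with $\beta = 1/9$, and $\beta = 1/12$ at both $t_E = 1/3$ and $t_E = 1$, so $\beta$ exceeds its continuous-exposure value for all $1/3 < t_E < 1$.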
S2.3. Covariance relations

The covariance matrix for the steps $\Delta_t = x_{t+1} - x_t$ can be found from Eqs. (S3,S10). With some manipulations, we get
\begin{align}
\langle \Delta_t^2 \rangle &= (1 - \tau)\lambda_t + \tau\lambda_{t+1} - (\lambda_{t+1} + \lambda_t)R + v_t + v_{t+1}, \tag{S13} \\
\langle \Delta_t \Delta_{t+1} \rangle &= \lambda_{t+1}R - v_{t+1}, \tag{S14} \\
\langle \Delta_t \Delta_{t+t'} \rangle &= 0, \quad \text{if } |t'| > 1, \tag{S15}
\end{align}
where the expectations $\langle\cdot\rangle$ are understood to be over the noise distributions only. If we further assume simple diffusion, $\lambda_t = 2D\Delta t = \text{const.}$, and average over time as well, we recover covariance relations of the same form as Eq. (7),
\[
\langle \Delta_t^2 \rangle = 2D\Delta t(1 - 2R) + 2\langle v_t \rangle, \qquad
\langle \Delta_t \Delta_{t\pm 1} \rangle = 2D\Delta t R - \langle v_t \rangle, \tag{S16}
\]
where the averages are now over time as well, and $\varepsilon^2$ is identified as the time-average $\langle v_t \rangle$. Thus, the covariance-based estimators of Ref. [10] should apply also to non-constant localization errors.
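Solving the two relations in Eq. (S16) for $D$ and $\langle v_t \rangle$ gives a covariance-based estimator of the kind referred to above. The sketch below is an illustration in Python under the stated model, not a transcription of Ref. [10] or of the authors' code; the function names are hypothetical.

```python
def step_moments(x):
    """Sample averages of Delta_t^2 and Delta_t * Delta_{t+1} for a 1D track x."""
    steps = [b - a for a, b in zip(x[:-1], x[1:])]
    msd = sum(s * s for s in steps) / len(steps)
    cov = sum(s * t for s, t in zip(steps[:-1], steps[1:])) / (len(steps) - 1)
    return msd, cov

def cve(msd, cov, dt, R):
    """Solve both relations of Eq. (S16) for D and the mean localization variance."""
    D = (msd + 2.0 * cov) / (2.0 * dt)
    var_loc = R * msd + (2.0 * R - 1.0) * cov
    return D, var_loc

def cve_known_error(msd, var_loc, dt, R):
    """Use only the first relation of Eq. (S16) when <v_t> is known independently."""
    return (msd - 2.0 * var_loc) / (2.0 * dt * (1.0 - 2.0 * R))

# Consistency check with exact moments for D = 1, dt = 1, R = 1/6, <v_t> = 0.25.
D_true, dt, R, v_mean = 1.0, 1.0, 1.0 / 6.0, 0.25
msd = 2.0 * D_true * dt * (1.0 - 2.0 * R) + 2.0 * v_mean
cov = 2.0 * D_true * dt * R - v_mean
print(cve(msd, cov, dt, R), cve_known_error(msd, v_mean, dt, R))
```

The first estimator follows from adding the first relation to twice the second, which eliminates both $R$ and $\langle v_t \rangle$; the second uses an independently estimated $\langle v_t \rangle$, such as the average of the point-wise uncertainty estimates.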
Appendix S3: Maximum likelihood estimator

For the case of a single trajectory of simple diffusion in one dimension, the likelihood for the diffusion constant follows from Eqs. (S3) and (S10),
\begin{align}
p(x|\lambda) &= \int dy\, p(x|y, \lambda)\,p(y|\lambda), \tag{S1} \\
p(y|\lambda) &= \prod_{t=1}^{T} (2\pi\lambda)^{-\frac{1}{2}} \exp\left[-\frac{(y_{t+1} - y_t)^2}{2\lambda}\right], \tag{S2} \\
p(x|y, \lambda) &= \prod_{t=1}^{T} \big(2\pi(v_t + \beta\lambda)\big)^{-\frac{1}{2}} \exp\left[-\frac{1}{2}(v_t + \beta\lambda)^{-1}\big(x_t - (1 - \tau)y_t - \tau y_{t+1}\big)^2\right], \tag{S3}
\end{align}
where we neglected to supply a starting density for $y_1$, since the problem is translation invariant. The integral over the hidden path in Eq. (S1) is a multivariate Gaussian and can be solved exactly in several ways [11]. Defining
\[
y = [y_1, y_2, \ldots, y_{T+1}]^\dagger, \qquad x = [x_1, x_2, \ldots, x_T]^\dagger, \tag{S4}
\]
where $\dagger$ denotes matrix transpose, we get
\begin{align}
p(x|\lambda) &= \int dy\, \exp\left[-T\ln(2\pi) - \frac{T}{2}\ln(\lambda) - \frac{1}{2}\sum_{t=1}^{T}\ln(v_t + \beta\lambda) - \frac{1}{2}\left(y^\dagger\Lambda y - 2y^\dagger W x + x^\dagger V x\right)\right] \nonumber \\
&= \exp\left[-T\ln(2\pi) - \frac{T}{2}\ln(\lambda) - \frac{1}{2}\sum_{t=1}^{T}\ln(v_t + \beta\lambda)\right] \nonumber \\
&\quad \times \int dy\, \exp\left[-\frac{1}{2}\left((y - \Lambda^{-1}Wx)^\dagger\Lambda(y - \Lambda^{-1}Wx) + x^\dagger(V - W^\dagger\Lambda^{-1}W)x\right)\right], \tag{S5}
\end{align}
where $\Lambda$, $W$, $V$ are matrices whose elements are found by comparing terms. This is a multivariate Gaussian in $y$, with mean value $\mu = \Lambda^{-1}Wx$ and covariance matrix $\Sigma = \Lambda^{-1}$, and the marginalized likelihood is therefore given by
\[
p(x|\lambda) = \exp\left[-\frac{T-1}{2}\ln(2\pi) - \frac{T}{2}\ln(\lambda) - \frac{1}{2}\sum_{t=1}^{T}\ln(v_t + \beta\lambda) - \frac{1}{2}x^\dagger(V - W^\dagger\Lambda^{-1}W)x - \frac{1}{2}\ln|\Lambda|\right]. \tag{S6}
\]
By comparing terms, we see that $\Lambda$ is symmetric, tridiagonal, and positive definite, $V$ is diagonal, and $W$ has nonzero elements only on the diagonal and first upper diagonal, so the above matrix expression can be computed in linear time. We maximize $\ln p(x|\lambda)$ by standard optimization routines in Matlab. Repeating the analysis of Fig. 4 in the main text for this MLE estimator, see Fig. S4, we find almost no improvement over the CVE that uses average estimated uncertainties. However, the facts that the MLE estimator is constrained to return positive estimated diffusion constants, and that it handles missing data points (by assigning them infinite or very large uncertainties), may be useful in applications.
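To make the linear-time evaluation concrete, the sketch below assembles $\Lambda$, $Wx$ and $V$ by comparing terms in Eqs. (S2) and (S3), and evaluates $\ln p(x|\lambda)$ from Eq. (S6) with a tridiagonal $LDL^\dagger$ recursion. It is an illustrative Python sketch under those assumptions, not the authors' Matlab implementation; the function name and argument layout are hypothetical.

```python
import math

def log_likelihood(x, v, lam, tau, beta):
    """ln p(x|lambda) of Eq. (S6) for a 1D track, in O(T) operations.

    x: observed positions x_1..x_T; v: localization variances v_1..v_T;
    lam: step variance lambda = 2*D*dt; tau, beta: shutter constants.
    """
    T = len(x)
    w = [1.0 / (vt + beta * lam) for vt in v]   # diagonal of V
    diag = [0.0] * (T + 1)                      # diagonal of Lambda
    off = [0.0] * T                             # first off-diagonal of Lambda
    r = [0.0] * (T + 1)                         # the vector W x
    for t in range(T):
        diag[t] += 1.0 / lam + w[t] * (1.0 - tau) ** 2
        diag[t + 1] += 1.0 / lam + w[t] * tau ** 2
        off[t] = -1.0 / lam + w[t] * tau * (1.0 - tau)
        r[t] += w[t] * (1.0 - tau) * x[t]
        r[t + 1] += w[t] * tau * x[t]
    # LDL^T factorization of the tridiagonal Lambda, fused with a forward solve.
    d = diag[0]
    z = r[0]
    log_det = math.log(d)
    quad = z * z / d                            # accumulates (Wx)^T Lambda^{-1} (Wx)
    for i in range(1, T + 1):
        l = off[i - 1] / d
        d = diag[i] - l * off[i - 1]
        z = r[i] - l * z
        log_det += math.log(d)
        quad += z * z / d
    xVx = sum(wt * xt * xt for wt, xt in zip(w, x))
    return (-0.5 * (T - 1) * math.log(2.0 * math.pi) - 0.5 * T * math.log(lam)
            + 0.5 * sum(math.log(wt) for wt in w)
            - 0.5 * (xVx - quad) - 0.5 * log_det)

# Two points without blur (tau = beta = 0): x_2 - x_1 ~ N(0, lam + v_1 + v_2).
print(log_likelihood([0.0, 1.5], [0.2, 0.3], 1.0, 0.0, 0.0))
```

Two limiting cases serve as sanity checks: a single point carries no information about $\lambda$ (the log-likelihood is zero by translation invariance), and two points without blur reduce to a Gaussian for $x_2 - x_1$ with variance $\lambda + v_1 + v_2$.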
Appendix S4: Variational EM algorithm for diffusive HMM
Here, we extend the above diffusion model to include multiple diffusion constants. We start by writing down a diffusive HMM, which includes both the hidden path of the above diffusion model and a set of hidden states with different diffusion constants that evolve as a discrete Markov process. Similar models (which, however, did not include explicit motion blur effects) have previously been solved by stochastic EM algorithms [13, 23]. Here, we instead describe a deterministic variational approach inspired by algorithms for factorial HMMs [28]. In the rest of this section, we proceed as follows: We start by specifying the diffusive HMM for a single 1-dimensional trajectory. We then outline the variational EM approach, derive high-level update equations, and describe the procedure for re-estimating localized positions. However, we do not give a detailed derivation of all steps in the algorithm, as large parts of it closely resemble previously published derivations of variational algorithms for HMMs [24, 32].
FIG. S4. Mean value and 1% quantiles of estimated diffusion constants ($D_{\mathrm{est.}}/D_{\mathrm{true}}$, plotted against SNR) using the CVE and MLE diffusion constant estimators, both using estimated positions and uncertainties.
S4.1. Model

In addition to the measured ($x$) and true ($y$) positions, we include a hidden state trajectory $s = [s_1, s_2, \ldots, s_T]$, such that $s_t$ determines the diffusion constant on the interval $[t, t + 1]$. The hidden states are numbered from 1 to $N$, and evolve according to a Markov process with transition matrix $A$ and initial state probability $\pi$. For a single 1-dimensional trajectory, this leads to a complete data likelihood of the form
\[
p(x, y, s|\lambda, A, \pi) = p(x|y, s, \lambda)\,p(y|s, \lambda)\,p(s|A, \pi), \tag{S1}
\]
with factors
\begin{align}
p(s|A, \pi) &= \prod_{m=1}^{N} \pi_m^{\delta_{m,s_1}} \prod_{t=2}^{T}\prod_{i,j=1}^{N} A_{ij}^{\delta_{i s_{t-1}}\delta_{j s_t}}, \tag{S2} \\
p(y|s, \lambda) &= \prod_{t=1}^{T}\prod_{j=1}^{N} (2\pi\lambda_j)^{-\frac{1}{2}\delta_{j s_t}} \exp\left[-\delta_{j s_t}\frac{(y_{t+1} - y_t)^2}{2\lambda_j}\right], \tag{S3} \\
p(x|y, s, \lambda) &= \prod_{t=1}^{T}\prod_{j=1}^{N} \big(2\pi(v_t + \beta\lambda_j)\big)^{-\frac{1}{2}\delta_{j s_t}} \exp\left[-\delta_{j s_t}\frac{\big(x_t - (1 - \tau)y_t - \tau y_{t+1}\big)^2}{2(v_t + \beta\lambda_j)}\right]. \tag{S4}
\end{align}

S4.2. Variational EM approach
We would like to perform maximum-likelihood inference of the model parameters, which means maximizing the likelihood with the latent variables $s$, $y$ integrated out,
\[
L(A, \pi, \lambda) = \int dy \sum_s p(x|y, s, \lambda)\,p(y|s, \lambda)\,p(s|A, \pi). \tag{S5}
\]
Since this problem is intractable, we make a variational approximation, meaning we approximate $\ln L$ with a lower bound
\[
\ln L = \ln \int dy \sum_s p(x|y, s, \lambda)\,p(y|s, \lambda)\,p(s|A, \pi) \ge \int dy \sum_s q(s)q(y) \ln\frac{p(x|y, s, \lambda)\,p(y|s, \lambda)\,p(s|A, \pi)}{q(s)q(y)} \equiv F, \tag{S6}
\]
where the inequality follows from Jensen's inequality. Here, $q(s)$, $q(y)$ are arbitrary variational distributions that need to be optimized together with the model parameters to achieve the tightest lower bound, and can be used for approximate inference about the latent variables. In particular, it turns out that the optimal variational distributions approximate the posterior distribution of $y$, $s$ in the sense of minimizing a Kullback-Leibler divergence [29]. We use coordinate ascent for the optimization, i.e., we iteratively optimize each variational distribution and the parameters with the others fixed, which leads to an EM-type algorithm.
To optimize $F$ w.r.t. the variational distributions, we set the functional derivatives of $F$ to zero and use Lagrange multipliers to enforce normalization. After some work, one arrives at
\begin{align}
\ln q(s) &= -\ln Z_s + \langle \ln p(x|y, s, \lambda) \rangle_{q(y)} + \langle \ln p(y|s, \lambda) \rangle_{q(y)} + \ln p(s|A, \pi), \tag{S7} \\
\ln q(y) &= -\ln Z_y + \langle \ln p(x|y, s, \lambda) \rangle_{q(s)} + \langle \ln p(y|s, \lambda) \rangle_{q(s)} + \underbrace{\langle \ln p(s|A, \pi) \rangle_{q(s)}}_{\text{independent of } y}, \tag{S8}
\end{align}
where $Z_s$ and $Z_y$ are normalization constants originating from the Lagrange multipliers, and $\langle\cdot\rangle_f$ denotes an expectation value computed with respect to the distribution $f$. As it turns out, these equations are individually tractable, and result in $q(y)$ being a multivariate Gaussian, and $q(s)$ adopting the standard HMM form amenable to efficient forward-backward iterations. For the initial state and transition probability parameters, the only dependence in $F$ comes from $p(s|A, \pi)$, and one arrives at the classical Baum-Welch reestimation formulas [30]. However, optimizing $F$ w.r.t. step length variances does not lead to a tractable update equation, and we are instead forced to optimize the $\lambda_j$-dependent parts of $F$ numerically, i.e.,
\[
\lambda_j = \operatorname*{argmax}_{\lambda_j} \left[\langle \ln p(x|y, s, \lambda) \rangle_{q(y)q(s)} + \langle \ln p(y|s, \lambda) \rangle_{q(y)q(s)}\right]. \tag{S9}
\]
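Because $q(s)$ has the standard HMM form, its normalization constant $\ln Z_s$ can be accumulated during a rescaled forward recursion. The sketch below illustrates this in Python and checks it against explicit enumeration of all state paths. The input `H[t][j]`, standing for the exponentiated $q(y)$-averaged log terms of Eq. (S7) for state $j$ at time $t$, is a hypothetical pre-computed quantity, and the code is not taken from the authors' implementation.

```python
import math
from itertools import product

def log_norm_forward(pi, A, H):
    """ln Z of q(s) proportional to pi[s_1] * prod_t A[s_{t-1}][s_t] * prod_t H[t][s_t].

    Uses a rescaled forward recursion with cost O(T N^2).
    """
    N = len(pi)
    alpha = [pi[j] * H[0][j] for j in range(N)]
    log_Z = 0.0
    for t in range(len(H)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * H[t][j]
                     for j in range(N)]
        c = sum(alpha)                 # rescale to avoid numerical underflow
        alpha = [a / c for a in alpha]
        log_Z += math.log(c)
    return log_Z

def log_norm_brute(pi, A, H):
    """Same quantity by explicit summation over all N^T state paths (tiny T only)."""
    N, T = len(pi), len(H)
    total = 0.0
    for s in product(range(N), repeat=T):
        p = pi[s[0]] * H[0][s[0]]
        for t in range(1, T):
            p *= A[s[t - 1]][s[t]] * H[t][s[t]]
        total += p
    return math.log(total)

pi = [0.6, 0.4]
A = [[0.9, 0.1], [0.2, 0.8]]
H = [[0.5, 1.2], [0.3, 0.7], [1.1, 0.2], [0.4, 0.9]]
print(log_norm_forward(pi, A, H), log_norm_brute(pi, A, H))
```

The two numbers agree to rounding error; only the forward version scales to realistic trajectory lengths.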
S4.3. The lower bound

The lower bound can be computed as well, and gets a particularly simple form just after the update of $q(s)$. Substituting the update equation Eq. (S7) into the expression for $F$, Eq. (S6), we get
\begin{align}
F &= \int dy \sum_s q(s)q(y) \Big[\ln p(x|y, s, \lambda) + \ln p(y|s, \lambda) + \ln p(s|A, \pi) \nonumber \\
&\qquad + \ln Z_s - \langle \ln p(x|y, s, \lambda) + \ln p(y|s, \lambda) + \ln p(s|A, \pi) \rangle_{q(y)} - \ln q(y)\Big] = \ln Z_s - \langle \ln q(y) \rangle_{q(y)}. \tag{S10}
\end{align}
Here, $\ln Z_s$ is the normalization constant of $q(s)$ that can be computed as part of the forward-backward iteration, and since (as we noted above) $q(y)$ is a multivariate Gaussian with dimension $T + 1$, we get
\[
-\langle \ln q(y) \rangle_{q(y)} = \frac{d(T + 1)}{2}\big(1 + \ln(2\pi)\big) + \frac{1}{2}\ln|\Sigma|, \tag{S11}
\]
where $\Sigma$ is the covariance matrix of $q(y)$ and $d$ denotes the dimensionality of the trajectory ($d = 1$ in the one-dimensional model considered here). The inverse of $\Sigma$ is analogous to the matrix $\Lambda$ appearing in the single-state diffusion estimator, Eqs. (S5,S6), and in particular $\Sigma^{-1}$ is also symmetric, tridiagonal, and positive definite, and thus the log-determinant $\ln|\Sigma| = -\ln|\Sigma^{-1}|$ can be robustly computed in linear time. Since the EM algorithm approximates the parameter likelihood, the lower bound cannot be used for model selection as in the case of variational maximum evidence calculations [24, 31, 32]. However, it is still useful for numerical convergence control, and possibly for model selection together with some complexity penalty such as the Bayesian or Akaike information criterion [33].

S4.4. Refined positions
An additional use for the HMM analysis is to use it to refine the localizations. Since the HMM pools information about many spots, it is in principle possible to beat the Cramér-Rao lower bound for single image localizations in this way. To set up the refinement problem, we refer back to Eq. (S6) and the different contributions to the observed position $x_t = z_t + \sqrt{v_t}\,\xi_t$, where
\[
z_t = y_t(1 - \tau) + y_{t+1}\tau + \sqrt{\lambda_{s_t}}\int_0^1 f(t') B_t(t')\,dt' \tag{S12}
\]
is the motion-averaged position that the localization algorithm tries to estimate (according to this model). To refine the localization, we need to compute the posterior density of $z_t$, i.e., $p(z_t|x, \theta)$. We start by recalling that the Brownian bridge integral in Eq. (S12) is Gaussian with mean zero and variance $\beta\lambda_{s_t}$, and using the compact notation $\bar{y}_t = (1 - \tau)y_t + \tau y_{t+1}$ and $\theta$ to denote all the model parameters, we therefore have
\[
p(z_t|y, s, \theta) = p(z_t|\bar{y}_t, s_t, \theta) = N(\bar{y}_t, \beta\lambda_{s_t}), \tag{S13}
\]
where $N(a, b)$ denotes a Gaussian density with mean $a$ and variance $b$. Furthermore, the localization uncertainty is also assumed Gaussian and independent of the underlying kinetics:
\[
p(x_t|z, y, s, \theta) = p(x_t|z_t) = N(z_t, v_t). \tag{S14}
\]
Applying Bayes theorem to these relations, we get
\[
p(z_t|x_t, y, s, \theta) = \frac{p(x_t|z_t)\,p(z_t|y, s, \theta)}{p(x_t|y, s, \theta)} = \frac{N(z_t, v_t)\,N(\bar{y}_t, \beta\lambda_{s_t})}{N(\bar{y}_t, \beta\lambda_{s_t} + v_t)} = N\!\left(\frac{v_t\bar{y}_t + \beta\lambda_{s_t}x_t}{v_t + \beta\lambda_{s_t}}, \frac{\beta\lambda_{s_t}v_t}{v_t + \beta\lambda_{s_t}}\right). \tag{S15}
\]
The predictive distribution is finally given by marginalizing $p(z_t|x_t, y, s, \theta)$ over the posterior for $y$, $s$. Using the variational distribution, this means
\[
p(z_t|x, \theta) \approx \left\langle N\!\left(\frac{v_t\bar{y}_t + \beta\lambda_{s_t}x_t}{v_t + \beta\lambda_{s_t}}, \frac{\beta\lambda_{s_t}v_t}{v_t + \beta\lambda_{s_t}}\right)\right\rangle_{q(y)q(s)}. \tag{S16}
\]
In particular, the posterior mean of $z_t$ is then given by
\[
\langle z_t|x, \theta \rangle \approx \left\langle \frac{v_t\bar{y}_t + \beta\lambda_{s_t}x_t}{v_t + \beta\lambda_{s_t}} \right\rangle_{q(y)q(s)} = \left\langle \frac{(1 - \tau)\mu_t + \tau\mu_{t+1} + \frac{\beta\lambda_{s_t}}{v_t}x_t}{1 + \frac{\beta\lambda_{s_t}}{v_t}} \right\rangle_{q(s)}, \tag{S17}
\]
which we will use as our estimator for refining the localizations. Here, $\mu_t = \langle y_t \rangle_{q(y)}$ is the variational mean value. The variational average over hidden states is done numerically.
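The refinement estimator of Eq. (S17) is a weighted average between the measured position and the motion model prediction, further averaged over hidden states. The following Python sketch illustrates it for a single time point; the function name and the inputs `occupancy` (the variational state probabilities $q(s_t = j)$) and `lam` (the per-state step variances) are assumptions for this example, not the authors' interface.

```python
def refine_position(x_t, v_t, mu_t, mu_next, tau, beta, lam, occupancy):
    """Posterior mean of z_t from Eq. (S17).

    lam[j]: step variance of hidden state j; occupancy[j]: q(s_t = j), summing to 1.
    mu_t, mu_next: variational means of y_t and y_{t+1}.
    """
    y_bar = (1.0 - tau) * mu_t + tau * mu_next
    mean = 0.0
    for lam_j, q_j in zip(lam, occupancy):
        ratio = beta * lam_j / v_t      # beta * lambda_j / v_t
        mean += q_j * (y_bar + ratio * x_t) / (1.0 + ratio)
    return mean

# One state with beta * lambda = v_t: the estimate is halfway between y_bar and x_t.
print(refine_position(2.0, 1.0, 0.0, 0.0, 0.5, 1.0 / 12.0, [12.0], [1.0]))
```

When $v_t \ll \beta\lambda_j$ the estimate stays close to the measured $x_t$, while for uncertain points ($v_t \gg \beta\lambda_j$) it is pulled towards the path estimate $(1-\tau)\mu_t + \tau\mu_{t+1}$.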