A Note on Systematic Errors in Bayesian Retrieval Algorithms

Journal of the Meteorological Society of Japan, Vol. 85, No. 1, pp. 69--74, 2007


NOTES AND CORRESPONDENCE

Eun-Kyoung SEO, Guosheng LIU, and Kwang-Yul KIM
Department of Meteorology, Florida State University, Tallahassee, Florida, USA

(Manuscript received 20 June 2006, in final form 19 September 2006)

Abstract

Using satellite ice water path (IWP) retrieval as an example, this note documents two types of systematic errors that arise in a Bayesian retrieval algorithm. These errors stem from the discontinuous and heavily skewed distribution of IWP in the supporting database. The discontinuity occurs at IWP = 0, since there are no negative IWPs, and it causes a positive bias in the IWP retrievals near ice-free conditions. On the other hand, since clouds with small IWPs occur more often in nature than clouds with large IWPs, a supporting database built from observed data is heavily populated in the low IWP range. This skewed data distribution leads to a negative bias. A number of remedies for reducing these errors, some of them ad hoc, are also suggested.

1. Introduction

Many satellite retrieval problems do not possess a unique solution for a given set of brightness temperatures ($T_B$s) (Twomey 1977). Many different hydrometeor profiles could be associated with the same set of $T_B$s (Petty 1999; Seo and Biggerstaff 2006). One method of finding an optimal solution to this underdetermined problem is a retrieval algorithm based on Bayes' theorem (e.g., Kummerow et al. 1996; Evans et al. 2002; Seo and Liu 2005). A Bayesian algorithm finds a solution through a set of measurements and a supporting database. Given a set of satellite-measured brightness temperatures $y_o$, the posterior probability density function (PDF) may be expressed, up to a normalization constant, by

$$P(x \mid y_o) \propto P(y_o \mid x)\,P(x), \tag{1}$$

where $P(y_o \mid x)$ is the conditional PDF of an observed state $y_o$ given an atmospheric state $x$, and $P(x)$ is the prior PDF of the atmospheric state $x$. Commonly, the conditional PDF is assumed to be a multivariate normal distribution, with the measured brightness temperatures normally distributed around those calculated by a radiative transfer model, i.e., $P(y_o \mid x) = N(y_o - y_s, \sigma)$, where $N$ denotes a normal distribution, $y_s$ is the brightness temperature vector from the radiative transfer model, and $\sigma$ represents the errors associated with both the observations and the forward radiative transfer model. To calculate the retrieval from the posterior PDF, one implementation of Bayes' theorem finds the mean state by integrating over the posterior PDF, so that the expected value of the atmospheric state can be expressed as (Evans et al. 2002):

$$E(x) = \frac{\int x\,P(y_o \mid x)\,P(x)\,dx}{\int P(y_o \mid x)\,P(x)\,dx}. \tag{2}$$

In other words, the retrieval in this implementation is a weighted average of all possible atmospheric states, and the weighting factors are the (normalized) posterior PDF. If the posterior PDF is symmetric about its maximum, as a Gaussian distribution is, the retrieval based on (2) coincides with the location of the maximum of the posterior PDF. Otherwise, the more heavily the posterior PDF is skewed, the greater the difference between the retrieval and the location of the maximum. Because the prior PDF of most meteorological variables is skewed, the value retrieved by (2) may differ from the desired (true) value, depending on the shape of the posterior PDF, and this causes systematic errors. In this note, we use ice water path (IWP) retrieval as an example to describe this systematic error.

Corresponding author: Eun-Kyoung Seo, Department of Meteorology, Florida State University, 404 Love Bldg., Tallahassee, FL 32306-4520, USA. E-mail: [email protected]
© 2007, Meteorological Society of Japan

2. The IWP algorithm and the systematic error

The IWP retrieval algorithm used in this study has been described in detail by Seo and Liu (2005) and is only summarized here. The algorithm retrieves IWP using the Monte Carlo integration implementation of Bayes' theorem described in (2), with brightness temperatures from the Advanced Microwave Sounding Unit-B (AMSU-B), which measures upwelling radiances at five frequencies: 89, 150, 183.3 ± 1, 183.3 ± 3, and 183.3 ± 7 GHz. To build the supporting database, about 3,700 ice water content profiles were compiled from surface observations by a millimeter-wave cloud radar (MMCR) at the Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) site. These profiles, together with the ground temperature and upper-air soundings measured during the March 2000 ARM Intensive Observation Period (IOP), are then used as inputs to a radiative transfer model to compute brightness temperatures at the AMSU-B frequencies. The simulated brightness temperatures and their corresponding IWPs form the database used by the Bayesian algorithm.

The prior PDF of the IWP in the supporting database is compared with the occurrence frequencies of IWPs in actual clouds. As shown in Fig. 1, the IWP frequencies in the supporting database agree well with those in the observations: the frequency decreases sharply with increasing IWP when the IWP is small (0–30 g m$^{-2}$), and its slope becomes moderate as the IWP increases further. By definition, there are no negative values of IWP, and the minimum value of IWP in our database is near 0 g m$^{-2}$.
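As a rough illustration of how a Monte Carlo integration of (2) can operate on a database, the following Python sketch computes the Gaussian-weighted mean of the database IWPs. It is not the authors' code: the exponential IWP prior, the linear toy forward model, and the per-channel sensitivities `sens` are hypothetical stand-ins for the radar-derived profiles and the radiative transfer model, and a single scalar `sigma` is assumed for all channels. Because the database entries are themselves samples of the prior, the factor $P(x)$ in (2) is represented implicitly by how often each IWP value occurs.

```python
import numpy as np


def bayesian_retrieval(y_obs, y_db, x_db, sigma):
    """Monte Carlo estimate of E(x) in Eq. (2) from a database.

    y_obs : (n_chan,) observed brightness temperature depressions [K]
    y_db  : (n_db, n_chan) simulated depressions for each database entry [K]
    x_db  : (n_db,) IWP of each database entry [g m^-2]
    sigma : assumed combined observation and forward-model error [K]
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_db = np.asarray(y_db, dtype=float)
    x_db = np.asarray(x_db, dtype=float)
    # Sigma-scaled squared distance between the observation and each entry.
    chi2 = np.sum(((y_db - y_obs) / sigma) ** 2, axis=1)
    # Gaussian conditional PDF; subtracting the minimum only rescales all
    # weights by a common factor and guards against numerical underflow.
    weights = np.exp(-0.5 * (chi2 - chi2.min()))
    # The database sampling supplies the prior, so E(x) is a weighted mean.
    return np.sum(weights * x_db) / np.sum(weights)


# Hypothetical toy database (assumption: not the ARM SGP data).
rng = np.random.default_rng(0)
iwp_db = rng.exponential(scale=100.0, size=3700)       # skewed toward low IWP
sens = np.array([0.002, 0.010, 0.004, 0.008, 0.012])   # K per g m^-2, made up
dtb_db = iwp_db[:, None] * sens[None, :]                # toy forward model

# Treat one database entry as a stand-in observation.
retrieved = bayesian_retrieval(dtb_db[0], dtb_db, iwp_db, sigma=1.5)
# With a very small sigma only near-exact matches receive weight.
retrieved_tight = bayesian_retrieval(dtb_db[0], dtb_db, iwp_db, sigma=1e-6)
print(retrieved, retrieved_tight, iwp_db[0])
```

In this sketch the retrieval is always bounded by the smallest and largest IWP in the database, since it is an average with non-negative weights.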

Fig. 1. Relative frequency of the observed (dotted line) and the synthetic (circles) IWPs.

To test how the retrieval algorithm performs, we randomly select about 900 data points from the supporting database to form a subset and use this subset as "observations". Since the IWP values are known for the data in the subset, we can compare the retrieved and the "observed" IWPs, as shown in Figs. 2a and b. As seen in the figures, there are distinct positive and negative biases in the IWP retrievals near ice-free conditions and in the range of ~20–600 g m$^{-2}$, respectively. To reflect the brightness temperature variations due to factors other than the IWP, such as surface emissivity and temperature, profiles of water vapor and temperature, and particle size and shape, noise is added to the "observed" brightness temperatures. Doing so introduces the uncertainties associated with determining clear-sky microwave brightness temperatures. The added noise is normally distributed with a mean of zero and a standard deviation of 1.5 K. Below an IWP of ~600 g m$^{-2}$, even though the retrievals are scattered because of the added noise, they exhibit biases similar to those of the retrievals without added noise (Fig. 2b).

Fig. 2. Relations between the observed and the retrieved IWPs obtained with (a) the ordinary Bayesian approach and (b) the ordinary Bayesian approach with random noise added to the observed brightness temperature vectors. Crosses in (b) represent the mean IWPs for the noise added to each observed brightness temperature vector.

The cause of the biases shown in Fig. 2 may be explained as follows. From (2), the retrieval is the weighted average of all IWPs in the database. The weighting factor is the posterior PDF, which is the product of the prior PDF of the IWPs and the conditional PDF of an observed state (brightness temperatures) given an atmospheric state (IWP). The conditional PDF is a normal distribution determined by the distance between the observed and calculated brightness temperatures. To illustrate the problems that cause the biases in Fig. 2, the posterior PDFs as a function of the IWP are examined for three cases selected from the database, with $|y_o|$ = 0 K, 2.2 K, and 12.0 K (Fig. 3), where $y_o$ denotes $\Delta T_B = T_{BC} - T_B$, and $T_{BC}$ and $T_B$ are the clear-sky and cloudy-sky brightness temperature vectors, respectively.

For the case of $|y_o|$ = 0 K (in which IWP = 0), the posterior PDF (Fig. 3a) has values only in the range IWP ≥ 0 g m$^{-2}$, since there are no negative IWPs in the database. Therefore, the retrieved IWPs will always be greater than 0 g m$^{-2}$, even when the observed brightness temperatures equal the clear-sky background values. As seen from Fig. 3a, a positive retrieval bias near IWP = 0 results from this discontinuity of the data distribution in the supporting database.

For the case of $|y_o|$ = 2.2 K, in which IWP = 184 g m$^{-2}$, the posterior PDF (Fig. 3b) is asymmetric about the true IWP. The contribution to (2) from the left side of the true value is greater than that from the right side, yielding an expected value smaller than the true value. The negative biases are distinct for true IWPs between 20 and 600 g m$^{-2}$ (Figs. 2a and b). This is attributed primarily to the heavily skewed prior PDF of the IWP in the range where $|\Delta T_B|$ is less than 10 K. As $|\Delta T_B|$ increases beyond 10 K, the posterior PDF develops a sufficiently pronounced peak and becomes more symmetric about the true IWP, so that the estimate of the IWP is only slightly influenced by the prior PDF, as in the case of $|y_o|$ = 12.0 K (IWP = 696 g m$^{-2}$) (Fig. 3c). In this range, the original algorithm given by (2) yields little bias.

3. Possible remedies
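Before turning to the remedies, the two bias mechanisms described in Section 2 can be reproduced with a one-dimensional toy problem. The sketch below is not the authors' algorithm. The true IWP of 184 g m$^{-2}$ and the 1.5 K error echo numbers quoted in the text, but the exponential prior with a 100 g m$^{-2}$ scale, the linear forward model, and its sensitivity `k` are hypothetical choices made only for illustration. It evaluates the posterior on a grid that, like the database, contains no negative IWPs, and reports both the weighted mean of (2) and the location of the posterior maximum.

```python
import numpy as np


def posterior_on_grid(y_obs, x_grid, prior, k, sigma):
    """Normalized posterior weights on a uniform IWP grid (toy 1-D problem)."""
    # Gaussian conditional PDF around the toy forward model y = k * x.
    likelihood = np.exp(-0.5 * ((y_obs - k * x_grid) / sigma) ** 2)
    posterior = likelihood * prior
    return posterior / posterior.sum()


# The prior shape and the sensitivity k are illustrative assumptions.
x_grid = np.linspace(0.0, 3000.0, 30001)   # IWP grid [g m^-2], no negative values
prior = np.exp(-x_grid / 100.0)            # skewed prior favoring low IWP
k = 0.012                                  # toy sensitivity [K per g m^-2]
sigma = 1.5                                # assumed error [K]

results = {}
for true_iwp in (0.0, 184.0):
    # Noise-free "observation" generated from the same toy forward model.
    post = posterior_on_grid(k * true_iwp, x_grid, prior, k, sigma)
    mean_iwp = np.sum(post * x_grid)       # weighted mean, as in Eq. (2)
    mode_iwp = x_grid[np.argmax(post)]     # location of the posterior maximum
    results[true_iwp] = (mean_iwp, mode_iwp)
    print(f"true={true_iwp:6.1f}  mean={mean_iwp:6.1f}  mode={mode_iwp:6.1f}")
```

Under these assumptions the posterior mean is roughly 60 g m$^{-2}$ when the true IWP is 0 (a positive bias caused by the cutoff at zero) and roughly 110 g m$^{-2}$ when the true IWP is 184 g m$^{-2}$ (a negative bias caused by the skewed prior). The posterior maximum sits even closer to zero in the second case, which shows how far the mean and the maximum can separate when the posterior is skewed.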

Now that the causes of the systematic errors are understood, we offer some remedies to the problem. It should be noted that some of the remedies are ad hoc and purely tuning in nature. In the following, after presenting each remedy, we comment on the potential problems involved.

3.1 Positive bias near IWP = 0

As discussed above, this bias results from the discontinuity of the data distribution in the supporting database. We propose two remedies: adding (1) mirror (unphysical negative IWP) data points or (2) clear-sky (IWP = 0) data points to the existing database.

In the first remedy, the $\Delta T_B$ and IWP values of the additional data points are made symmetric to those in the existing database about IWP = 0 and $|\Delta T_B|$ = 0. In other words, the IWPs and $\Delta T_B$s of the added data points have the same magnitudes as those in the existing database but the opposite sign. By doing so, the posterior PDF becomes symmetric about IWP = 0 (Fig. 3a, dotted line), ensuring that (2) gives a retrieved IWP = 0 when $|\Delta T_B|$ = 0. In this way, the mirror data points fix the positive bias in the IWP retrievals for brightness temperatures at and near clear-sky values (Fig. 4a). In reality, however, the positive bias near IWP = 0 can be hidden by the uncertainties inherent in the surface emissivity and background atmospheric emissions (water vapor, etc.). Clearly, the problem with this remedy is that the negative IWPs are unphysical and are introduced just for tuning purposes.

Fig. 3. Posterior probability distributions at $|y_o|$ of (a) 0 K, (b) 2.2 K, and (c) 12.0 K. The dotted line in (a) represents the posterior probability distribution obtained by adopting the mirror image. The thick solid line in each panel denotes the true IWP for a given $y_o$.

The second remedy is to include clear-sky data points, with a proper probability of occurrence, in the a priori database. Note that none of the above results included clear-sky (that is, zero-IWP) data points in the a priori database. Including clear-sky data points, (2) may be rewritten as:

$$E(x) = \frac{\int x\,P(y_o \mid x)\,P(x)\,dx}{\int_{x>0} P(y_o \mid x)\,P(x)\,dx + P(y_o \mid x = 0)\,P(x = 0)}. \tag{3}$$
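The two remedies for the positive bias can be sketched on a toy database as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponential IWP prior, the two-channel linear forward model `sens`, and the choice of as many clear-sky entries as cloudy ones (that is, an assumed $P(x = 0)$ of 0.5) are all hypothetical. Appending zero-IWP entries with $\Delta T_B = 0$ to the database is one database-level way of adding the clear-sky term to the normalization in (3).

```python
import numpy as np


def weighted_mean_retrieval(y_obs, y_db, x_db, sigma):
    """Database form of Eq. (2): Gaussian-weighted mean of database IWPs."""
    chi2 = np.sum(((y_db - y_obs) / sigma) ** 2, axis=1)
    weights = np.exp(-0.5 * chi2)
    return np.sum(weights * x_db) / np.sum(weights)


def add_mirror_points(y_db, x_db):
    """Remedy 1: append sign-flipped copies of every (dTB, IWP) pair."""
    return np.vstack([y_db, -y_db]), np.concatenate([x_db, -x_db])


def add_clear_sky_points(y_db, x_db, n_clear):
    """Remedy 2: append n_clear entries with IWP = 0 and dTB = 0."""
    y_clear = np.zeros((n_clear, y_db.shape[1]))
    x_clear = np.zeros(n_clear)
    return np.vstack([y_db, y_clear]), np.concatenate([x_db, x_clear])


# Hypothetical cloudy-only database (assumption: toy prior and forward model).
rng = np.random.default_rng(1)
x_db = rng.exponential(scale=100.0, size=3700)   # IWP [g m^-2], all > 0
sens = np.array([0.004, 0.012])                  # made-up K per g m^-2
y_db = x_db[:, None] * sens[None, :]             # dTB = T_BC - T_B [K]

y_clear_obs = np.zeros(2)                        # observation at clear-sky values
sigma = 1.5                                      # assumed error [K]

# Cloudy-only database: the retrieval at dTB = 0 is forced above zero.
biased = weighted_mean_retrieval(y_clear_obs, y_db, x_db, sigma)

# Remedy 1: the mirrored database is symmetric, so the mean at dTB = 0 vanishes.
y_m, x_m = add_mirror_points(y_db, x_db)
mirrored = weighted_mean_retrieval(y_clear_obs, y_m, x_m, sigma)

# Remedy 2: clear-sky entries enlarge the normalization and lower the mean.
y_c, x_c = add_clear_sky_points(y_db, x_db, n_clear=3700)  # assumes P(x=0) = 0.5
with_clear = weighted_mean_retrieval(y_clear_obs, y_c, x_c, sigma)
print(biased, mirrored, with_clear)
```

In this toy setting the mirrored retrieval is zero to within rounding error, while the clear-sky remedy reduces the positive bias without removing it entirely; how much it is reduced depends on the assumed number of clear-sky entries.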

The difference between (2) and (3) is the second term in the normalization, which represents the clear-sky contribution. The clear-sky term, which does not appear in the numerator because it is multiplied by zero, tends to lower $E(x)$ and thus offsets some of the positive bias for near-clear-sky conditions. The clear-sky conditional PDF, $P(y_o \mid x = 0)$, declines to zero as $\Delta T_B$ increases, so this term becomes negligible for large IWPs. Including the clear-sky contribution substantially reduces the positive bias, which approaches 0.5 g m$^{-2}$ (Fig. 4b). This remedy broadens the database from "cloudy cases" to "all-weather cases". This poses no problem if the database is constructed from all-weather observational data (like the data used in our IWP algorithm). However, some Bayesian algorithm databases are constructed from cloud-resolving model data, which tend to be simulated only when and where there are cloud or rain events. If the data points in the database are biased toward more "eventful" cloud/rain cases, this remedy will not work correctly because $P(x = 0)$ is not correctly represented in the database.

Fig. 4. Relation between "observed" and retrieved IWPs obtained by (a) adding the mirror data points and (b) including the zero-IWP data points in the database.

3.2 Negative bias for low IWPs

This bias is attributed primarily to the heavily skewed prior PDF of the IWP in the range where $|\Delta T_B|$ is less than 10 K. To reduce the negative bias, the retrieval should avoid the overwhelming influence of the heavily skewed prior probability of the IWP and extract more information from the brightness temperature measurements. The remedy we propose here is to effectively narrow the "radius of influence" of data points in the a priori database by artificially assigning smaller error variances ($\sigma$) to the observations and the radiative transfer model results. This remedy effectively ignores the actual random error in the observations and models, and forces the algorithm to find answers from those data points in the a priori database whose brightness temperatures are very close to the observations. The result of an example in which $\sigma$ is reduced to 0.1$\sigma$ is shown in Fig. 5. As anticipated, the negative bias is largely reduced by this remedy because we are using "perfect observational data" in this exercise. However, since observational data and radiative transfer models always contain errors, the correctness of this remedy still needs to be explored.

Fig. 5. Relation between "observed" and retrieved IWPs obtained by both including zero-IWP data points in the database and narrowing $\sigma$.

In addition to the Monte Carlo integration implementation of Bayes' theorem, in this study we also examined a retrieval algorithm that takes the maximum of the posterior probability rather than the weighted mean of the posterior PDF. The retrieval results are shown in Fig. 6. The retrieved IWPs for most of the low-$|\Delta T_B|$ cases tend to approach IWP = 0. Thus, maximizing the posterior PDF does not achieve the correct retrieval where the posterior probability is overwhelmed by the strongly skewed prior PDF.

Fig. 6. Relation between "observed" and retrieved IWPs obtained by finding the maximum of the posterior PDF.

4. Conclusions

Bayesian retrieval algorithms have been used by many investigators to derive atmospheric parameters from satellite measurements. The purpose of this note is to document the existence of two types of systematic errors in the Monte Carlo integration implementation of Bayes' theorem. The causes of the systematic errors are explained by using an IWP retrieval algorithm developed by Seo and Liu (2005) and treating a subset of the a priori database as "observations". The first type of error occurs near clear-sky conditions, where data points in the database have a discontinuous distribution since, by definition, there are no clouds with 0 or negative IWPs. The consequence of this error is a positive bias in the retrieval. The second type of error occurs at the lower portion of the IWP spectrum, where the data point distribution is heavily skewed toward low IWPs. As a result, the a priori information on the likelihood that a certain IWP occurs is so overwhelming that the observational information does not make a sufficient contribution to the outcome of the retrieval. The consequence of this error is a negative bias in the retrieval.

We offered a number of remedies to the above problems, although none of them is without problems. The aim of presenting these remedies is to provoke a discussion in the community on the proper ways to mitigate the systematic errors, rather than to advocate any one of them.

Acknowledgements

This research has been supported by DOE ARM Grant DE-FG02-03ER63526 and NASA Grant NNG05GJ17G. The authors would like to acknowledge the valuable suggestions of three anonymous reviewers.

References

Evans, K.F. and G.L. Stephens, 1993: Microwave remote sensing algorithms for cirrus clouds and precipitation. Tech. Rep. 540, 198 pp. [Available from Colorado State University, Department of Atmospheric Sciences, Fort Collins, CO 80523.]

Evans, K.F., S.J. Walter, A.J. Heymsfield, and G.M. McFarquhar, 2002: Submillimeter-wave cloud ice radiometer: Simulations of retrieval algorithm performance. J. Geophys. Res., 107, 4028, doi:10.1029/2001JD000709.

Kummerow, C., W.S. Olson, and L. Giglio, 1996: A simplified scheme for obtaining precipitation and vertical hydrometeor profiles from passive microwave sensors. IEEE Trans. Geosci. Remote Sens., 34, 1213–1232.

Petty, G.W., 1999: Cloud physical and microwave radiative properties of tropical stratiform precipitation inferred from multichannel microwave radiances. Preprints, 10th Conference of Satellite Meteorology and Oceanography, Long Beach, CA, Amer. Meteor. Soc., 318–320.

Seo, E.-K. and G. Liu, 2005: Retrievals of cloud ice water path by combining ground cloud radar and satellite high-frequency microwave measurements near the ARM SGP site. J. Geophys. Res., 110, D14203, doi:10.1029/2004JD005727.

Seo, E.-K. and M.I. Biggerstaff, 2006: Impact of cloud model microphysics on passive microwave retrievals of cloud properties. Part II: Uncertainty in rain, hydrometeor structure and latent heating retrievals. J. Appl. Meteor. Clim., 45, 955–972.

Twomey, S., 1977: Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements. Dover Publications, 243 pp.
